It’s Easy to Slip Toxic Language Past Alphabet’s Toxic-Comment Detector

Machine-learning algorithms are no match for the creativity of human insults.
February 24, 2017

On Thursday Alphabet released a machine-learning-based service, called Perspective, intended to identify toxic comments on websites. It’s from Jigsaw, a unit working on technologies to make the Internet a safer and more civil place. But when I toyed with Perspective, the results were erratic.

Perspective scores comments on a scale of 0 to 100 for “toxicity,” defined as “a rude, disrespectful, or unreasonable comment that is likely to make you leave a discussion.” “Screw you, Trump supporters,” for example, is judged highly toxic, while “I honestly support both” is not. But Perspective has trouble detecting the sentiment behind a comment—a problem I predicted would trouble Jigsaw when I examined its ambitions in December (see “If Only AI Could Save Us From Ourselves”).
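You can probe these scores yourself. Below is a minimal sketch in Python of how one might query the service, assuming the Comment Analyzer endpoint and request shape from Jigsaw’s public documentation; the API key is a placeholder, and access currently requires signing up with Jigsaw.

```python
# Minimal sketch: asking Perspective to score a comment for toxicity.
# Assumes the Comment Analyzer endpoint and request/response shape from
# Jigsaw's public documentation; YOUR_API_KEY is a placeholder.
import requests

API_URL = ("https://commentanalyzer.googleapis.com/"
           "v1alpha1/comments:analyze")

def toxicity(text, api_key="YOUR_API_KEY"):
    """Return Perspective's toxicity score for `text`, as a percentage."""
    body = {
        "comment": {"text": text},
        "requestedAttributes": {"TOXICITY": {}},
    }
    resp = requests.post(API_URL, params={"key": api_key}, json=body)
    resp.raise_for_status()
    # The summary score comes back as a probability between 0 and 1.
    score = resp.json()["attributeScores"]["TOXICITY"]["summaryScore"]["value"]
    return round(score * 100)

print(toxicity("Screw you, Trump supporters"))
print(toxicity("I honestly support both"))
```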

“Trump sucks” scored a colossal 96 percent, yet the neo-Nazi codeword “14/88” scored only 5 percent. “Few Muslims are a terrorist threat” was 79 percent toxic, while “race war now” scored 24 percent. “Hitler was an anti-Semite” scored 70 percent, but “Hitler was not an anti-Semite” scored only 53 percent, and “The Holocaust never happened” only 21 percent. And while “gas the joos” scored 29 percent, rephrasing it to “Please gas the joos. Thank you.” lowered the score to a mere 7 percent. (“Jews are human,” however, scores 72 percent. “Jews are not human”? 64 percent.)

According to Jigsaw, Perspective was trained to detect toxicity using hundreds of thousands of comments ranked by human reviewers. The result appears to be a system sensitized to particular words and phrases—but not to meanings.

The word “rape,” for example, scores 77 percent on its own—perhaps explaining why “Rape is a horrible crime” scores 81 percent. (A similar pattern is seen with profanity: “I fucking love this” scores 94 percent.)

Similarly, negations and other nuances of language cause paradoxical results. Adding a “not” to create “Few Muslims are not a terrorist threat” lowers the toxicity from 79 percent to 60 percent because “not a terrorist threat” appears more innocuous to Perspective, even though the intended meaning becomes more toxic.
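To see why, consider a toy stand-in for this kind of system. The sketch below is emphatically not Jigsaw’s model; it is a miniature bag-of-words classifier trained on a handful of invented examples. But it shows how a system keyed to individual tokens can shrug off negation entirely.

```python
# Toy illustration (not Jigsaw's model): a bag-of-words classifier
# weighs individual tokens, so negating a sentence barely moves the score.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical miniature training set, standing in for the hundreds of
# thousands of human-rated comments Jigsaw trained on.
comments = [
    "you are a terrorist threat", "screw you", "trolls are stupid",
    "i honestly support both", "thank you for sharing", "great point",
]
labels = [1, 1, 1, 0, 0, 0]  # 1 = toxic, 0 = not toxic

model = make_pipeline(CountVectorizer(), LogisticRegression())
model.fit(comments, labels)

for text in ["you are a terrorist threat",
             "you are not a terrorist threat"]:
    prob = model.predict_proba([text])[0, 1]
    print(f"{text!r}: {prob:.0%} toxic")

# Both sentences score identically here: "not" never appeared in the
# training data, so the vectorizer drops it and the feature vectors match.
```

In this toy setup the negation changes nothing at all, because “not” is simply discarded. A model that had seen “not” in training would treat it as one weak feature among many rather than as a reversal of meaning, which is consistent with the modest drop Perspective shows here.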

As I noted in my previous piece on Jigsaw, the current state of machine learning doesn’t permit software to grasp the intent and context of comments. By doing surface-level pattern matching, Conversation AI (the Jigsaw project behind Perspective) may be able to filter stylistically—but not semantically.

That doesn’t make the technology useless. A system such as Perspective could speed up the work of moderators by flagging extreme cases, so it makes sense that the New York Times is collaborating with Jigsaw to help its moderators police comments on articles. The Times does not have an abuse problem, though; it is seeking to identify high-quality comments, a task where stylistic matching is likely to be more effective. When it comes to intentional abuse, Jigsaw’s software won’t be able to substitute for human judgment in ambiguous cases.

We may say “Trolls are stupid” (toxicity score: 96 percent), but the language of toxicity and harassment is often rich in ways that machine-learning systems can’t handle. The comment “You should be made into a lamp,” an allusion to claims that skin from concentration-camp victims was used for lampshades, has been thrown at a number of journalists and other public figures in recent months. It scores just 4 percent on Perspective. But best not to reply with “You are a Nazi,” because that scores 87 percent.
