Yahoo Has a Tool that Can Catch Online Abuse Surprisingly Well
Computers are getting better at spotting trolls, but they often can’t understand the meaning of messages.
Trolls seem to lurk in every corner of the Internet, and they delight in ruining your day. But if our e-mail inboxes can be kept relatively spam-free, why can’t machines automatically purge abusive tweets and comments?
It’s a question that seems relevant to the very fabric of Internet culture today. Last week, Twitter banned a journalist whom it accused of orchestrating a campaign of abuse aimed at one of the stars of the all-female Ghostbusters reboot. Twitter said it would introduce new guidelines and tools for reporting abuse through its service. Meanwhile, countless other incidents on Twitter and elsewhere surely go unnoticed every day.
Researchers are, in fact, making some progress toward technology that can help stop the abuse. A team at Yahoo recently developed an algorithm that catches abusive messages better than any other automated system to date. To build a data set of abuse, the researchers collected comments posted on Yahoo articles that the company’s own comment editors had flagged as offensive.
The Yahoo team used a number of conventional techniques, including looking for abusive keywords, punctuation that often seemed to accompany abusive messages, and syntactic clues as to the meaning of a sentence.
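The conventional feature types described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Yahoo’s actual feature set: the keyword list `ABUSIVE_KEYWORDS` and the function `extract_features` are hypothetical, and counting second-person pronouns is only a crude stand-in for the syntactic clues a real system would get from a parser.

```python
import re
from dataclasses import dataclass

# Hypothetical placeholder blacklist; a real system would use a curated lexicon.
ABUSIVE_KEYWORDS = {"idiot", "moron", "loser"}

# Lowercase word tokens (letters and straight apostrophes).
TOKEN_RE = re.compile(r"[a-z']+")


@dataclass
class SurfaceFeatures:
    keyword_hits: int    # tokens that appear in the blacklist
    exclamations: int    # number of '!' characters
    question_marks: int  # number of '?' characters
    upper_ratio: float   # share of letters that are uppercase ("shouting")
    second_person: int   # 'you'/'your' tokens: a crude proxy for syntactic cues


def extract_features(text: str) -> SurfaceFeatures:
    """Compute simple keyword and punctuation features for one comment."""
    tokens = TOKEN_RE.findall(text.lower())
    letters = [c for c in text if c.isalpha()]
    upper = sum(1 for c in letters if c.isupper())
    return SurfaceFeatures(
        keyword_hits=sum(1 for t in tokens if t in ABUSIVE_KEYWORDS),
        exclamations=text.count("!"),
        question_marks=text.count("?"),
        upper_ratio=upper / len(letters) if letters else 0.0,
        second_person=sum(1 for t in tokens if t in {"you", "your", "you're"}),
    )
```

For example, `extract_features("You are an IDIOT!!!")` records one keyword hit, three exclamation marks, one second-person pronoun, and an uppercase ratio of 6/13. In practice, counts like these would be fed to a trained classifier rather than thresholded by hand.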
But the researchers also applied a more advanced approach to automated language understanding, one that represents the meaning of words as vectors with many dimensions. In this approach, known as “word embedding,” words with similar meanings end up close together in vector space, so a system can compare meanings rather than merely match strings. For instance, even if a comment contains a string of words that have not been identified as abusive, its representation in vector space may lie close enough to known abusive language for the system to flag it.
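To make the word-embedding idea concrete, here is a toy Python sketch. Everything in it is an assumption for illustration: the three-dimensional vectors in `EMBEDDINGS` are invented (real embeddings are learned from large text corpora and usually have hundreds of dimensions), and averaging word vectors, then comparing the result with the centroid of a few seed words by cosine similarity, is a simple stand-in rather than the Yahoo team’s actual method. The word “imbecile” is deliberately absent from the hypothetical seed list `ABUSIVE_SEEDS`, yet a comment containing it is still flagged because its vector sits near theirs.

```python
from __future__ import annotations

import math
import re

# Invented 3-D vectors for illustration only; real embeddings are learned.
EMBEDDINGS: dict[str, list[float]] = {
    "idiot":    [0.90, 0.10, 0.00],
    "moron":    [0.80, 0.00, 0.20],
    "imbecile": [0.88, 0.07, 0.12],  # not a seed word, but close to the seeds
    "great":    [0.00, 0.90, 0.30],
    "article":  [0.10, 0.20, 0.90],
    "thanks":   [0.00, 0.70, 0.50],
}

# Hypothetical words already known to be abusive.
ABUSIVE_SEEDS = ["idiot", "moron"]


def comment_vector(text: str) -> list[float] | None:
    """Average the vectors of the words we have embeddings for."""
    words = re.findall(r"[a-z]+", text.lower())
    vectors = [EMBEDDINGS[w] for w in words if w in EMBEDDINGS]
    if not vectors:
        return None  # no known words, so no representation
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: 1.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def abuse_score(text: str) -> float:
    """Similarity between a comment and the centroid of the abusive seeds."""
    vec = comment_vector(text)
    if vec is None:
        return 0.0
    seeds = [EMBEDDINGS[w] for w in ABUSIVE_SEEDS]
    centroid = [sum(col) / len(seeds) for col in zip(*seeds)]
    return cosine(vec, centroid)


def looks_abusive(text: str, threshold: float = 0.8) -> bool:
    return abuse_score(text) >= threshold
```

With these toy vectors, `looks_abusive("What an imbecile")` returns `True` while `looks_abusive("Thanks, great article")` returns `False`. The 0.8 threshold is arbitrary; a real system would learn its decision boundary from labeled data such as the flagged comments.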
When everything was combined, the team was able to identify abusive messages (from its own data set) with roughly 90 percent accuracy.
Catching the remaining 10 percent may prove tricky. Although AI researchers are making significant progress in training machines to parse language, artificial intelligence has yet to give computers the brainpower needed to untangle meaning. As a contest held at a recent AI conference showed, computers often cannot resolve even the simplest ambiguities in sentences.
Many tech companies, including Twitter, have AI researchers dedicated to advancing the state of the art in areas such as image recognition and text comprehension. But so far surprisingly little effort seems to have been put into catching abuse or harassment systematically. Twitter declined to say whether its AI team is actively working on the problem (although it seems likely). But it is unlikely that the company will introduce a magic bullet for filtering out malicious messages. The problem with automated hate filtering is that words are packed with meaning that can only be unpacked with real intelligence.
“Automatically identifying abuse is surprisingly difficult,” says Alex Krasodomski-Jones, who tracks online abuse as a researcher with the U.K.-based Centre for Analysis of Social Media. “The language of abuse is amorphous—changing frequently and often used in ways that do not connote abuse, such as when racially or sexually charged terms are appropriated by the groups they once denigrated. Given 10 tweets, a group of humans will rarely all agree on which ones should be classed as abusive, so you can imagine how difficult it would be for a computer.”
Until machines gain real intelligence, filtering out every hateful message will be impossible. But Krasodomski-Jones offers another, more human, reason why we might not want an automated solution: “In a world where what we read is increasingly dictated by algorithms and filters, we ought to be careful about demanding more computer interference.”