

How Vector Space Mathematics Helps Machines Spot Sarcasm

Sarcasm is almost impossible for computers to spot. A mathematical approach to linguistics could change that.

Back in 1970, the social activist Irina Dunn scribbled a slogan on the back of a toilet cubicle door at the University of Sydney. It said: “A woman needs a man like a fish needs a bicycle.” The phrase went viral and eventually became a famous refrain for the growing feminist movement of the time. 

The phrase is also an example of sarcasm. The humor comes from the fact that a fish doesn’t need a bicycle. Most humans have little trouble spotting this. But while various advanced machine learning techniques have helped computers spot other forms of humor, sarcasm still largely eludes them.

These other forms of humor can be spotted by looking for, say, positive verbs associated with negative or undesirable situations. And some researchers have used this approach to look for sarcasm.

But sarcasm is often devoid of sentiment. The phrase above is a good example—it contains no sentiment-bearing words. So a new strategy is clearly needed if computers are ever to spot this kind of joke.

Today, Aditya Joshi at the Indian Institute of Technology Bombay in India, and a few pals, say they’ve hit on just such a strategy. They say their new approach dramatically improves the ability of computers to spot sarcasm.

Their method is relatively straightforward. Instead of analyzing the sentiment in a sentence, Joshi and co analyze the similarity of the words. They do this by studying the way words relate to each other in a vast corpus of Google News stories, using a set of pretrained vectors that covers some three million words and phrases. This is known as the Word2Vec model.

This corpus has been analyzed extensively to determine which words tend to appear near each other. This allows each word to be represented as a vector in a high-dimensional space. It turns out that similar words are represented by similar vectors and that vector space mathematics can capture simple relationships between them. For example, “king – man + woman = queen.”
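The analogy arithmetic can be sketched in a few lines of Python. This is a toy illustration, not Word2Vec itself: the four-dimensional vectors below are hand-picked and hypothetical (the pretrained Google News vectors have 300 dimensions learned from text), and the helper names `cosine` and `analogy` are our own. As is common in analogy tests, the three input words are excluded from the candidate answers.

```python
import math

# Hypothetical hand-picked vectors; dimensions loosely mean
# (person, royalty, male, female). Real Word2Vec vectors are learned
# from text and have far more dimensions.
EMBEDDINGS = {
    "king":   [0.9, 0.9, 0.5, 0.0],
    "queen":  [0.9, 0.8, 0.0, 0.5],
    "man":    [0.9, 0.0, 0.5, 0.0],
    "woman":  [0.9, 0.0, 0.0, 0.5],
    "prince": [0.8, 0.7, 0.5, 0.0],
    "fish":   [0.0, 0.0, 0.1, 0.1],
}


def cosine(u, v):
    """Cosine similarity: 1.0 for vectors pointing the same way, 0.0 for orthogonal ones."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def analogy(a, b, c, embeddings):
    """Return the word whose vector is closest to vec(a) - vec(b) + vec(c).

    The three input words themselves are excluded from the candidates.
    """
    target = [x - y + z for x, y, z in zip(embeddings[a], embeddings[b], embeddings[c])]
    candidates = [w for w in embeddings if w not in {a, b, c}]
    return max(candidates, key=lambda w: cosine(target, embeddings[w]))


print(analogy("king", "man", "woman", EMBEDDINGS))               # queen
print(round(cosine(EMBEDDINGS["man"], EMBEDDINGS["woman"]), 2))  # 0.76
```

Cosine similarity compares the directions of two vectors rather than their lengths, which is why it is the usual yardstick for how close two word vectors are.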

Although there are clear differences between the words “man” and “woman,” they occupy similar parts of the vector space. However, the words “bicycle” and “fish” occupy entirely different parts of the space and so are treated as very different.

According to Joshi and co, sentences that contrast similar concepts with dissimilar ones are more likely to be sarcastic.
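A rough sketch of that idea: score every pair of words in a sentence by cosine similarity, then report the most similar and the most dissimilar pair. This is a simplified illustration in the spirit of the approach, not the exact feature set from the paper; the three-dimensional toy vectors (loosely person, animal, vehicle) and the function name `contrast_features` are hypothetical.

```python
import math
from itertools import combinations

# Hypothetical toy vectors; dimensions loosely mean (person, animal, vehicle).
TOY_VECTORS = {
    "woman":   [0.9, 0.1, 0.0],
    "man":     [0.9, 0.0, 0.1],
    "fish":    [0.1, 0.9, 0.0],
    "bicycle": [0.05, 0.0, 1.0],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def contrast_features(words, vectors):
    """Return (score, word_a, word_b) for the most similar and the most
    dissimilar word pair in a sentence, or None if fewer than two words
    have vectors. Illustrative only; not the paper's feature set."""
    known = [w for w in words if w in vectors]  # words without a vector are skipped
    scored = [(cosine(vectors[a], vectors[b]), a, b) for a, b in combinations(known, 2)]
    if not scored:
        return None
    return max(scored), min(scored)


sentence = "a woman needs a man like a fish needs a bicycle".split()
most_similar, most_dissimilar = contrast_features(sentence, TOY_VECTORS)
print(most_similar[1:])     # ('woman', 'man')
print(most_dissimilar[1:])  # ('fish', 'bicycle')
```

A sentence that pairs a very similar couple of words with a very dissimilar one fits the pattern described above. In a real system, scores like these would more plausibly serve as input features for a trained classifier than as a verdict on their own.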

To test this idea, they study the similarity between words in a database of quotes on the Goodreads website. The team chose quotes that readers had tagged “sarcastic” and, as a control, quotes tagged “philosophy.” This results in a database of 3,629 quotes, of which 759 are sarcastic. The team then compared the word vectors in each quote, looking for similarities and differences.
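The dataset figures are worth a quick calculation, assuming, as the description suggests, that every quote not tagged “sarcastic” comes from the “philosophy” control group. Sarcastic quotes make up only about a fifth of the data, so a classifier that never predicts sarcasm would still be right most of the time, which is why raw accuracy alone says little here.

```python
# Figures reported for the Goodreads dataset.
total_quotes = 3629
sarcastic = 759
# Assumption: every non-sarcastic quote belongs to the "philosophy" control group.
philosophy = total_quotes - sarcastic

sarcastic_share = sarcastic / total_quotes
# A trivial classifier that labels every quote "not sarcastic" is right this often.
majority_baseline = philosophy / total_quotes

print(philosophy)                # 2870
print(f"{sarcastic_share:.1%}")  # 20.9%
print(f"{majority_baseline:.1%}")  # 79.1%
```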

The results make for interesting reading. Joshi and co say this word embedding approach is significantly better than other techniques at spotting sarcasm. “We observe an improvement in sarcasm detection,” they say.

The new approach isn’t perfect, of course. And the errors it makes are instructive. For example, it did not spot the sarcasm in the following quote: “Great. Relationship advice from one of America’s most wanted.”

That’s probably because several of these words have multiple meanings, and Word2Vec assigns each word a single vector, so it cannot tell those meanings apart.

Another sarcastic sentence it fails to spot is: “Oh, and I suppose the apple ate the cheese.” In this case, “apple” and “cheese” have a high similarity score and none of the word pairs shows a meaningful difference. So this example does not follow the rule that the algorithm is designed to search for.

The algorithm also incorrectly identifies some sentences as sarcastic. Joshi and co point to this one, for example: “Oh my love, I like to vanish in you like a ripple vanishes in an ocean—slowly, silently and endlessly.”

Humans had not tagged this as sarcastic. However, it is not hard to imagine this sentence being used sarcastically.

Overall, this is interesting work that points to several directions for future research. In particular, it would be fascinating to use this kind of algorithm to generate sarcastic sentences and have human judges decide whether they actually work as sarcasm.

Beyond that is the task of computational humor itself. That’s an ambitious goal but perhaps one that is not entirely out of reach. Much humor is formulaic, so an algorithm ought to be able to apply such a formula with ease. Yeah, right!

Ref: arxiv.org/abs/1610.00883: Are Word Embedding-based Features Useful for Sarcasm Detection?
