An Algorithmic Sense of Humor? Not Yet.

Artificial-intelligence researchers have made vast strides recently in matching various human capabilities. But for the moment, humor looks beyond their reach.

Emerging Technology from the arXiv

July 13, 2015

In recent months, artificial-intelligence researchers have made giant strides in matching human performance in all kinds of tasks that had, until recently, been thought of as almost impossible for computers, such as face and object recognition.

But there are some areas that are still hugely difficult for machines to grasp, and humor is one of them. Having a sense of humor is a uniquely personal quality that is famously difficult to define. What makes one person laugh and another grimace can be almost impossible to predict.

That’s partly because humor depends on so many parameters, many of which are internal and liable to change from one moment to the next. What seems funny now may not seem so funny later or tomorrow.

Nevertheless, various linguists and psychologists have suggested that good jokes all share common properties and that a systematic analysis ought to reveal them. The question is how to get at these primitives of humor and whether machine learning can help.

Today, we get an answer of sorts thanks to the work of Dragomir Radev at the University of Michigan in Ann Arbor and a few pals at Yahoo Labs, Columbia University, and the New Yorker magazine. These guys have been studying the captions associated with cartoons.

The New Yorker famously publishes an uncaptioned cartoon each week, asking readers to submit their own caption. The editors then pick the top three and ask the readers to vote for the best.

That has created a huge database of captions. Today, Radev and co publish their study of 300,000 captions written for 50 New Yorker cartoons since 2005.

Their method is straightforward. They first analyze the set of captions for each cartoon using a number of standard linguistic techniques. Criteria include the level of positive or negative sentiment, whether the captions were human-centered (i.e., referring to people), how clearly they refer to objects depicted in the cartoon, and so on.

Radev and co also used network theory to study the captions. They listed the topics mentioned in each caption and then created a network by linking captions that mentioned the same topics. That allowed them to use standard network analysis tools to find, for example, the most important node in the network, a property known as centrality.
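To make the idea concrete, here is a minimal sketch of a caption network and its most central node. It is an illustration under stated assumptions, not the researchers' method: the `topics` function (lowercased content words), the `STOPWORDS` list, and the choice of degree centrality as the measure are all hypothetical stand-ins for whatever topic extraction and centrality measure the study actually used.

```python
from itertools import combinations

# Hypothetical stopword list; a real analysis would use a much fuller one.
STOPWORDS = {"a", "an", "the", "i", "you", "is", "it", "to", "of",
             "in", "my", "on", "and", "this", "that"}


def topics(caption):
    """Crude stand-in for topic extraction: the caption's lowercased content words."""
    words = (w.strip(".,!?\"'").lower() for w in caption.split())
    return {w for w in words if w and w not in STOPWORDS}


def degree_centrality(captions):
    """Link captions that share at least one topic, then score each caption
    by the fraction of the other captions it is linked to."""
    n = len(captions)
    topic_sets = [topics(c) for c in captions]
    degree = [0] * n
    for i, j in combinations(range(n), 2):
        if topic_sets[i] & topic_sets[j]:  # shared topic => edge between i and j
            degree[i] += 1
            degree[j] += 1
    return [d / (n - 1) if n > 1 else 0.0 for d in degree]


def most_central(captions):
    """Return the caption with the highest degree centrality and its score."""
    scores = degree_centrality(captions)
    best = max(range(len(captions)), key=scores.__getitem__)
    return captions[best], scores[best]
```

Calling `most_central` on a list of caption strings returns the caption that shares topics with the largest share of the others, which is one simple reading of "the most important node."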

Each of these methods produced a ranking of the captions. Radev and co took each of the most highly ranked captions and compared them to the gold standard: captions that the readers of the New Yorker chose as the funniest. They did this by crowdsourcing opinion using Amazon’s Mechanical Turk, asking seven turkers to choose the funnier of two captions or to rank them equally.
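One way such pairwise judgments could be combined is sketched below. The aggregation rule is an assumption made for illustration: the article does not say how the seven votes were tallied, so `judge_pair` and its labels `"A"`, `"B"`, and `"tie"` are hypothetical. The sketch simply counts the votes for each caption and calls the pair a tie when neither side leads.

```python
from collections import Counter


def judge_pair(votes):
    """Combine individual judgments on one caption pair.

    votes: a list of "A", "B", or "tie", one entry per rater.
    Returns "A" or "B" if that caption has more votes than the other,
    and "tie" if the two captions are level (assumed rule, see lead-in).
    """
    counts = Counter(votes)
    a, b = counts["A"], counts["B"]
    if a > b:
        return "A"
    if b > a:
        return "B"
    return "tie"
```

With seven raters, `judge_pair(["A", "A", "B", "tie", "A", "B", "tie"])` would favor caption A, since it collects three votes to B's two.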

Radev and co say the results provide some insight into the nature of funny captions. “We found that the methods that consistently select funnier captions are negative sentiment, human-centeredness, and lexical centrality,” they say.

That’s a curious study that is hard to evaluate. The researchers acknowledge that there is no surprise in finding that negative sentiment correlates with funniness; human-centeredness is also an expected property of humor. The significance of lexical centrality is less clear.

And therein lies the problem with this kind of research. It’s easy to imagine that one goal of this kind of work would be to create a machine capable of automatically choosing the best caption from the thousands entered into the New Yorker competition each week. But the team seems as far as ever from achieving this. Did any of these automatic methods reliably pick the caption chosen by readers? Radev and co do not say, so presumably not.

A more ambitious goal would be to find a way to write better captions for cartoons, perhaps automatically. The conclusion from this work? Don’t hold your breath.

And perhaps that’s a relief. At least there’s one human quality that looks beyond the reach of current machine-learning techniques.

To their credit, Radev and co are making their corpus of cartoons and captions available to other researchers. So if there’s anybody out there who thinks they can do better, they’re welcome to try.

Ref: arxiv.org/abs/1506.08126 : Humor in Collective Discourse: Unsupervised Funniness Detection in the New Yorker Cartoon Caption Contest
