
Watson on Jeopardy, Part 2

The IBM machine’s mistakes offered insights about how it works.
February 15, 2011

Watching the first night of the Jeopardy match pitting IBM’s Watson program against human contestants was great fun. One nice touch was the “backstage” display that showed the three answers Watson considered for each question and the machine’s confidence in each, which gave some insight into the range of possibilities the program was weighing.

Some of the categories were obviously softballs for Watson. One category, “Beatles People,” was easy because simply matching song lyrics would get the program a long way (but not all the way) toward finding the answer. The rules of the game prohibited the computer from going out on the Web to find answers; Watson had to rely on its own resources, stored in advance. But in its 15 terabytes of storage, Watson has, more or less, a copy of a good swath of the Web.

Obviously, it had a copy of the Beatles lyrics that it was searching. Otherwise it wouldn’t have had a prayer on those questions.

Watson ended the first round tied for first, with $5,000; Ken Jennings was third with $2,000. But to get an idea of how well Watson really did, you can run your own contest at home against Watson’s real competitor: not Brad Rutter or Ken Jennings, but a search engine like Google. Simply type the clue into Google and see what you get. Like Watson, Google analyzes huge quantities of text, counting words and keeping track of how often words tend to occur together. Like Watson, Google uses multiple approaches to analyze text, and then relies on a kind of “voting” scheme to figure out how confident it is of the answer.

There are many differences between Watson and Google, but trying the clues this way will give you a good feel for the problem. Much of the time, what you will get is a set of Web pages that contain the answer somewhere within them, but picking the answer out of whatever is on the page, ads and all, is no mean feat. Understanding what constitutes an answer is the central problem.
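To make that “voting” idea concrete, here is a minimal, hypothetical sketch in Python. The scorer names, weights, and candidate scores are invented for illustration; they are not Watson’s (or Google’s) actual components. Several independent analysis strategies each score the candidate answers, and a weighted vote turns those scores into a rough confidence:

    # Toy illustration of combining several answer scorers with a weighted "vote".
    # All scorers, weights, and scores below are made up for illustration.
    from collections import defaultdict

    def combine_votes(scores_by_scorer, weights):
        """Weighted vote: each scorer contributes its score for each candidate."""
        totals = defaultdict(float)
        for scorer, scores in scores_by_scorer.items():
            for candidate, score in scores.items():
                totals[candidate] += weights.get(scorer, 1.0) * score
        # Normalize to a rough "confidence" between 0 and 1.
        top = max(totals.values(), default=0.0)
        return {c: (s / top if top else 0.0) for c, s in totals.items()}

    # Hypothetical scores from three different analysis strategies.
    scores = {
        "lyric_match":   {"Let It Be": 0.9, "Hey Jude": 0.2},
        "title_match":   {"Let It Be": 0.7, "Yesterday": 0.4},
        "passage_count": {"Let It Be": 0.6, "Hey Jude": 0.5},
    }
    weights = {"lyric_match": 2.0, "title_match": 1.0, "passage_count": 1.0}

    confidences = combine_votes(scores, weights)
    print(sorted(confidences.items(), key=lambda kv: -kv[1]))

The point of the sketch is only that no single scorer has to be right on its own; agreement across several weak signals is what produces confidence numbers like the ones shown on Watson’s backstage display.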

Interestingly, the cases where Watson failed were sometimes more instructive than those where it succeeded.

Clue: It was this anatomical oddity of US gymnast George Eyser….

Ken Jennings’ answer: Missing a hand (wrong)

Watson’s answer: leg (wrong)

Correct answer: Missing a leg

What Watson failed to realize was that the word “leg,” by itself, wasn’t actually an answer to the question. This is common sense for people, because “leg” is an anatomical part, not an anatomical oddity, though Watson did realize that legs were involved somehow. What happened here might have been something more profound than a simple bug. David Ferrucci, Watson’s project leader, attributed the failure to the difficulty of the word “oddity” in the question. To understand what might be odd, you have to compare it to what isn’t odd—that is to say, what’s common sense. A problem with Watson’s approach is that if some sentence appears in its database, it can’t tell whether someone put it there just because it’s true, or because someone felt it was so unusual that it needed to be said.
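To see why plain text matching stumbles here, consider a deliberately naive sketch (hypothetical passages and candidates, not Watson’s actual pipeline). Keyword overlap happily ranks “leg” first, because nothing checks whether a candidate is the kind of thing the clue asks for, an oddity rather than an ordinary body part:

    # Illustrative only: a naive candidate scorer with no notion of answer type.
    CLUE = "It was this anatomical oddity of US gymnast George Eyser"

    # Pretend passages from a local corpus (invented for illustration).
    PASSAGES = [
        "George Eyser competed in gymnastics despite his wooden leg.",
        "Eyser lost his left leg after being run over by a train.",
    ]

    def keyword_score(candidate, passages):
        """Count how many passages mention the candidate string."""
        return sum(candidate.lower() in p.lower() for p in passages)

    candidates = ["leg", "missing a leg", "wooden leg", "hand"]
    ranked = sorted(candidates, key=lambda c: keyword_score(c, PASSAGES), reverse=True)
    print(ranked)  # "leg" comes out on top: it appears in both passages.

    # What is missing is a common-sense check on the answer's type, e.g.
    #   is_a("leg", "anatomical oddity")            -> False
    #   is_a("missing a leg", "anatomical oddity")  -> True
    # Without that background knowledge, the scorer cannot reject "leg".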

A computer that lacks common sense, unfortunately, isn’t an oddity. Maybe it should be.

Henry Lieberman is a research scientist who works on artificial intelligence at the Media Laboratory at MIT.
