How IBM Plans to Win Jeopardy!

IBM’s Watson will showcase the latest tricks in natural-language processing.

David Talbot

May 27, 2009

For decades, humans have struggled to create machines that can extract meaning from human language, with all its messiness, subtle context, humor, and irony. Traditional approaches require a great deal of manual work up front to render material understandable to computer algorithms. The ultimate goal is to make this step unnecessary.

**What is Watson?**: IBM is preparing a natural-language computer system that will compete against humans on TV’s Jeopardy!, which is hosted by Alex Trebek.

IBM hopes to advance toward this goal with Watson, a computer system that will play Jeopardy!, the popular TV trivia game show, against human contestants. Demonstrations of the system are expected this year, with a final televised matchup, hosted by the show’s Alex Trebek, sometime next year. Trebek will read the questions aloud, but during the show they will be fed into the machine as text.

The company has not yet published any research papers describing how its system will tackle Jeopardy!-style questions. But David Ferrucci, the IBM computer scientist leading the effort, explains that the system breaks a question into pieces, searches its own databases for “related knowledge,” and then makes connections to assemble an answer. Watson is not designed to search the Web. IBM’s end goal is a system it can sell to corporate customers who need to make large quantities of information more accessible.

Ferrucci describes how the technology would handle the following Jeopardy!-style question: “It’s the opera mentioned in the lyrics of a 1970 number-one hit by Smokey Robinson and the Miracles.”

The Watson engine uses natural-language processing techniques to break the question into structural components. In this case, the pieces include 1) an opera; 2) the opera is mentioned in a song; 3) the song was a hit in 1970; and 4) the hit was by Smokey Robinson and the Miracles.
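To make the decomposition concrete, here is a toy sketch in Python. It is not IBM’s method, which the company had not published. The `Constraint` type, the relation names, and the regular-expression patterns are hypothetical and fit only this one clue; a practical system would presumably rely on a full parser rather than hand-written patterns.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Constraint:
    """One piece of the clue that a candidate answer must satisfy."""
    relation: str  # hypothetical relation name, e.g. "answer_type"
    value: str


# Hypothetical patterns written for this single clue shape only.
PATTERNS = [
    ("answer_type", r"It[’']s the (\w+)"),        # what kind of thing is asked for
    ("mentioned_in", r"mentioned in the (\w+) of"),
    ("hit_year", r"a (\d{4}) number-one hit"),
    ("performer", r"hit by (.+?)\.?$"),           # everything after "hit by", minus the period
]


def decompose(clue: str) -> list[Constraint]:
    """Split a clue into the constraints its answer must meet."""
    found = []
    for relation, pattern in PATTERNS:
        match = re.search(pattern, clue)
        if match:
            found.append(Constraint(relation, match.group(1)))
    return found


clue = ("It’s the opera mentioned in the lyrics of a 1970 number-one hit "
        "by Smokey Robinson and the Miracles.")
for c in decompose(clue):
    print(f"{c.relation} = {c.value}")
```

Once the clue is split this way, each constraint can be checked separately, so evidence for different pieces can come from different passages.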

In searching its databases for information that could be relevant to these segments, the system might find hundreds of passages. These could include the following three:

“Pagliacci,” the opera about a clown who tries to keep his feelings “hid”;

Smokey Robinson’s Motown hit record of the ’60s “Tears of a Clown”;

“Tears of a Clown” by the Miracles hit #1 in the UK in 1970.

By analyzing these passages, Watson can identify “Pagliacci” as being “an opera,” although this on its own would not be much help, since many other passages also identify opera names. The second result identifies a hit record, “The Tears of a Clown,” by “Smokey Robinson,” which the system judges to be probably the same thing as “Smokey Robinson and the Miracles.” However, many other song titles would be generated in a similar manner. The probability that the result is accurate would also be judged low, because the song is associated with “the ’60s” and not “1970.” The third passage, however, reinforces the idea that “The Tears of a Clown” was a hit in 1970, provided the system determines that “The Miracles” refers to the same thing as “Smokey Robinson and the Miracles.”
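The way weak and strong evidence combine can be illustrated with a small scoring sketch. It assumes the two song passages have already been reduced to a performer and a date, and it uses made-up alias confidences for how likely “Smokey Robinson” or “The Miracles” refers to “Smokey Robinson and the Miracles.” Passages are merged with a noisy-OR, a standard textbook way of combining independent evidence; it is a stand-in, not necessarily what Watson does.

```python
import re

# Hypothetical alias confidences (illustrative values only): how likely each
# surface name denotes the clue's performer, "Smokey Robinson and the Miracles".
PERFORMER_MATCH = {
    "Smokey Robinson and the Miracles": 1.0,
    "The Miracles": 0.9,
    "Smokey Robinson": 0.8,
}


def year_support(date_text: str, target: int) -> float:
    """How well a passage's date phrase supports the clue's year."""
    exact = re.search(r"\b(\d{4})\b", date_text)
    if exact:
        return 1.0 if int(exact.group(1)) == target else 0.0
    decade = re.search(r"[’'](\d)0s", date_text)  # e.g. "the ’60s"
    if decade:
        start = 1900 + int(decade.group(1)) * 10
        # The right decade is vague support; a different decade counts against.
        return 0.5 if start <= target < start + 10 else 0.1
    return 0.5  # no date information either way


def passage_support(passage: dict, target_year: int) -> float:
    performer = PERFORMER_MATCH.get(passage["performer"], 0.0)
    return performer * year_support(passage["date"], target_year)


def candidate_score(passages: list[dict], target_year: int) -> float:
    """Noisy-OR: the candidate is unsupported only if every passage fails."""
    miss = 1.0
    for p in passages:
        miss *= 1.0 - passage_support(p, target_year)
    return 1.0 - miss


# The article's second and third passages, with fields extracted by hand.
p2 = {"song": "Tears of a Clown", "performer": "Smokey Robinson", "date": "the ’60s"}
p3 = {"song": "Tears of a Clown", "performer": "The Miracles", "date": "1970"}

print(round(candidate_score([p2], 1970), 3))      # 0.08
print(round(candidate_score([p2, p3], 1970), 3))  # 0.908
```

On its own, the ’60s passage leaves the song with little support; adding the 1970 passage lifts the score sharply, which mirrors the reasoning described above.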

From the first of these three passages, the Watson engine would know that Pagliacci is an opera about a clown who hides his feelings. To connect that opera to the Smokey Robinson song, the system has to recognize that “tears” are strongly related to “feelings.” Since it already knows that Pagliacci is about a clown who tries to keep his feelings “hid,” it guesses, correctly, that Pagliacci is the answer. Of course, the system might still make the wrong choice “depending on how the wrong answers may be supported by the available evidence,” says Ferrucci.
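The associative step, linking “tears” to “feelings,” can be sketched as word matching backed by a table of related terms. The `RELATED` table and the `link_strength` function are hypothetical; the table holds only the single link this example needs, whereas a practical system would presumably draw such associations from a large lexical resource or from statistics over text.

```python
import re

STOPWORDS = {"a", "of", "the"}

# Hypothetical related-term table; "tears" -> "feelings" is the one link
# described in the article.
RELATED = {"tears": {"feelings"}}


def words(text: str) -> set[str]:
    """Lowercased alphabetic tokens of a text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def link_strength(title: str, passage: str, related=RELATED) -> int:
    """Count title words found in the passage, directly or via a related term."""
    passage_words = words(passage)
    hits = 0
    for word in words(title) - STOPWORDS:
        if word in passage_words or related.get(word, set()) & passage_words:
            hits += 1
    return hits


song = "The Tears of a Clown"
pagliacci = "Pagliacci, the opera about a clown who tries to keep his feelings hid"
print(link_strength(song, pagliacci))              # 2: "clown" directly, "tears" via "feelings"
print(link_strength(song, pagliacci, related={}))  # 1: only "clown"
```

Without the related-term link, the opera passage shares just one word with the song title; with it, both content words of the title are accounted for.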

It’s easy, Ferrucci says, for less sophisticated natural-language systems to conclude that “The Tears of a Clown” is the answer, because they miss the fact that the question asks for an opera referenced by that song. Such a mistake could be triggered by passages that share many keywords with the question.
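Ferrucci’s warning about keyword matching can be reproduced with a toy example. Ranking candidates purely by the words they share with the clue favors the song, while first filtering candidates by the answer type the clue asks for (an opera) leaves Pagliacci. The stopword list, the hand-assigned type labels, and both functions are illustrative assumptions, not part of Watson.

```python
import re

STOPWORDS = {"a", "and", "by", "in", "it", "of", "s", "the"}

clue = ("It’s the opera mentioned in the lyrics of a 1970 number-one hit "
        "by Smokey Robinson and the Miracles.")

# Candidate answers: (hand-assigned answer type, one supporting passage).
candidates = {
    "The Tears of a Clown": ("song", "“Tears of a Clown” by the Miracles hit #1 in the UK in 1970."),
    "Pagliacci": ("opera", "“Pagliacci,” the opera about a clown who tries to keep his feelings “hid”"),
}


def keywords(text: str) -> set[str]:
    """Lowercased alphanumeric tokens, minus common function words."""
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS


def naive_answer(clue: str, candidates: dict) -> str:
    """Pick the candidate whose passage shares the most keywords with the clue."""
    clue_kw = keywords(clue)
    return max(candidates, key=lambda name: len(clue_kw & keywords(candidates[name][1])))


def typed_answer(clue: str, candidates: dict, answer_type: str) -> str:
    """Same ranking, but only over candidates of the requested type."""
    typed = {name: c for name, c in candidates.items() if c[0] == answer_type}
    return naive_answer(clue, typed)


print(naive_answer(clue, candidates))           # The Tears of a Clown
print(typed_answer(clue, candidates, "opera"))  # Pagliacci
```

The song’s passage matches “Miracles,” “hit,” and “1970,” while the opera’s passage matches only “opera,” so keyword overlap alone points the wrong way.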

Marti Hearst, a computer scientist at the University of California, Berkeley, says that “tremendous progress has been made on this task in the last decade by researchers in natural-language processing.” She adds that “pitting IBM’s Watson question-answering system against the top humans in a game of Jeopardy! is a fun way to publicize and showcase this progress,” but she also notes the lack of published research available for examination.

Meanwhile, the Defense Advanced Research Projects Agency (DARPA) will soon announce the participants chosen to take part in a five-year research effort aimed at advancing the state of natural-language processing. “I expect that this whole area will heat up significantly in the next few years,” says Dan Weld, a computer scientist at the University of Washington, who leads a group that has applied to join the DARPA effort.

Whether or not IBM’s Watson beats humans on Jeopardy! next year, the DARPA project will surely push the field ahead, says Weld. As DARPA noted in its request for research proposals, today’s smartest language-processing systems are narrowly focused, while more broadly focused systems are less precise. “DARPA’s involvement will focus the research of many people at top universities and research labs to push on integrated systems that can actually read a broad array of documents,” Weld says. “Most current systems tackle small parts of the puzzle.”
