For decades, humans have struggled to create machines that can extract meaning from human language, with all its messiness, subtle context, humor, and irony. Traditional approaches require a great deal of manual work up front to render material understandable to computer algorithms. The ultimate goal is to make this step unnecessary.
IBM hopes to advance toward this objective with Watson, a computer system that will play Jeopardy!, the popular TV trivia game show, against human contestants. Demonstrations of the system are expected this year, with a final televised matchup, hosted by the show’s Alex Trebek, sometime next year. Questions will be spoken aloud by Trebek but fed into the machine in text format during the show.
The company has not yet published any research papers describing how its system will tackle Jeopardy!-style questions. But David Ferrucci, the IBM computer scientist leading the effort, explains that the system breaks a question into pieces, searches its own databases for “related knowledge,” and then makes connections to assemble a result. Watson is not designed to search the Web. IBM’s end goal is a system that it can sell to corporate customers who need to make large quantities of information more accessible.
Ferrucci describes how the technology would handle the following Jeopardy!-style question: “It’s the opera mentioned in the lyrics of a 1970 number-one hit by Smokey Robinson and the Miracles.”
The Watson engine uses natural-language processing techniques to break the question into structural components. In this case, the pieces include 1) an opera; 2) the opera is mentioned in a song; 3) the song was a hit in 1970; and 4) the hit was by Smokey Robinson and the Miracles.
In searching its databases for information that could be relevant to these segments, the system might find hundreds of passages. These could include the following three:
“Pagliacci,” the opera about a clown who tries to keep his feelings “hid”;
Smokey Robinson’s Motown hit record of the ’60s, “Tears of a Clown”;
“Tears of a Clown” by the Miracles hit #1 in the UK in 1970.
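To make the idea concrete, the toy Python sketch below shows one way passages like these could be linked to cover the pieces of the question. It is not IBM's method, which the company has not published. The keyword sets, the stopword list, and the function names (`terms`, `satisfied`, `link_passages`) are all assumptions made for illustration, and plain word overlap stands in for whatever natural-language processing Watson actually uses.

```python
import re

# Hypothetical stopword list; a real system would use far richer language analysis.
STOPWORDS = {"the", "a", "of", "in", "by", "who", "to", "his", "s", "about"}

def terms(text):
    """Lowercase a passage and return its set of non-stopword tokens."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}

# Assumed keyword sets standing in for three of the question's pieces.
constraints = {
    "opera": {"opera"},
    "year": {"1970"},
    "artist": {"smokey", "robinson", "miracles"},
}

# The three example passages from the article, with plain quotes.
passages = [
    "Pagliacci, the opera about a clown who tries to keep his feelings hid",
    "Smokey Robinson's Motown hit record of the '60s, Tears of a Clown",
    "Tears of a Clown by the Miracles hit #1 in the UK in 1970",
]

def satisfied(passage_terms):
    """Return the names of the constraints that a passage's terms touch."""
    return {name for name, kws in constraints.items() if kws & passage_terms}

def link_passages(seed_constraint):
    """Start from passages matching one constraint, then follow shared terms
    to other passages and collect the constraints they cover."""
    indexed = [(p, terms(p)) for p in passages]
    results = []
    for seed, seed_terms in indexed:
        if not constraints[seed_constraint] & seed_terms:
            continue
        covered, links = {seed_constraint}, set()
        for other, other_terms in indexed:
            shared = seed_terms & other_terms
            if other is not seed and shared:
                links |= shared
                covered |= satisfied(other_terms)
        results.append((seed, covered, links))
    return results

for seed, covered, links in link_passages("opera"):
    print(seed.split(",")[0], sorted(covered), sorted(links))
```

In this sketch the opera passage shares only the word "clown" with the other two, and that single link is enough to connect it to the passages that mention the artist and the year. A real system would need to weigh many such links across hundreds of passages and decide how much confidence each one deserves.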