By analyzing these passages, Watson can identify “Pagliacci” as “an opera,” although this on its own would not help much, since many other passages also name operas. The second result identifies a hit record, “The Tears of a Clown,” by “Smokey Robinson,” which the system judges to probably refer to the same act as “Smokey Robinson and the Miracles.” However, many other song titles would be generated in a similar manner. The system would also judge this result unlikely to be correct, because the song is associated with “the ’60s” rather than “1970.” The third passage, however, reinforces the idea that “The Tears of a Clown” was a hit in 1970, provided the system determines that “The Miracles” refers to the same act as “Smokey Robinson and the Miracles.”
From the first of these three passages, the Watson engine would know that Pagliacci is an opera about a clown who hides his feelings. To make the connection to Smokey Robinson’s song, the system has to recognize that “tears” are strongly related to “feelings.” Since it knows that Pagliacci is about a clown who tries to keep his feelings hidden, it guesses, correctly, that Pagliacci is the answer. Of course, the system might still make the wrong choice “depending on how the wrong answers may be supported by the available evidence,” says Ferrucci.
Less sophisticated natural-language systems, Ferrucci says, can easily conclude that “The Tears of a Clown” is the answer, missing the fact that the question asked for an opera referenced by that song. Such a mistake could be triggered by passages that contain many keywords matching the question.
Marti Hearst, a computer scientist at the University of California, Berkeley, says that “tremendous progress has been made on this task in the last decade by researchers in natural-language processing.” She adds that “pitting IBM’s Watson question-answering system against the top humans in a game of Jeopardy! is a fun way to publicize and showcase this progress,” but she also notes that little published research on the system is available for outside examination.
Meanwhile, the Defense Advanced Research Projects Agency (DARPA) will soon announce the participants chosen to take part in a five-year research effort aimed at advancing the state of natural-language processing. “I expect that this whole area will heat up significantly in the next few years,” says Dan Weld, a computer scientist at the University of Washington, who leads a group that has applied to take part in the DARPA effort.
Whether or not IBM’s Watson beats humans on Jeopardy! next year, the DARPA project will surely push the field ahead, says Weld. As DARPA noted in its request for research proposals, today’s smartest language-processing systems are narrowly focused, while more broadly focused systems are less precise. “DARPA’s involvement will focus the research of many people at top universities and research labs to push on integrated systems that can actually read a broad array of documents,” Weld says. “Most current systems tackle small parts of the puzzle.”