Building a truly interactive customer service system like Nuance’s requires solutions to each of the major challenges in natural-language processing: accurately transforming human speech into machine-readable text; analyzing the text’s vocabulary and structure to extract meaning; generating a sensible response; and replying in a human-sounding voice.
Scientists at MIT, Carnegie Mellon University, and other universities, as well as researchers at companies like IBM, AT&T, and the Stanford Research Institute (now SRI International), have struggled for decades with the first part of the problem: turning the spoken word into something computers can work with. The first practical products came in the early 1990s in the form of consumer speech recognition programs, such as IBM's VoiceType, that took dictation but forced users to pause after each word, which limited adoption. By the mid-1990s, the technology had advanced enough to produce dictation systems such as Dragon Systems' NaturallySpeaking and IBM's ViaVoice, which can transcribe unbroken speech with up to 99 percent accuracy.
Around the same time, a few scientists broke away from academic and corporate labs to create startups aimed at tackling the even more complex problems (and bigger potential markets) of the second area of language processing, dubbed "language understanding." It's largely advances in this area that have positioned the field for its real growth spurt. These advances rest on two important realizations, according to SpeechWorks chief technology officer Michael Phillips, a former research scientist at MIT's Laboratory for Computer Science. The first was that there's little point in reaching for the moon: the decades-old dream of systems capable of HAL-like general conversation. "There is a myth that people want to talk to machines the same way they talk to people," Phillips says. "People want an efficient, friendly, helpful machine, not something that's trying to trick them into thinking they're having a conversation with a human." This assumption vastly simplifies the job of building and training a natural-language system.