The second realization was that the time had come to combine philosophies long held by rival factions in the language-processing community. One philosophy essentially says that understanding speech is a matter of discerning its grammatical structure, while the other holds that statistical analysis (matching words or phrases against a historical database of speech examples) is a more efficient tool for guessing a sentence’s meaning. Hybrid systems that use both methods, the startups have learned, are more accurate than either approach on its own.
But this insight didn’t arrive overnight. At MIT, Phillips had helped develop experimental software that could recognize speech and, based on its understanding of grammar, make sense of a request and reply logically. Like other grammar-based systems, it broke a sentence into its syntactic components, such as subject, verb, and object. The system then arranged these components into treelike diagrams that represented a sentence’s semantic content, or internal logic: who did what to whom, and when. The software was limited to helping users navigate around Cambridge, MA, Phillips explains. “You’d say, ‘Where’s the nearest restaurant?’ and it would say, ‘What kind of restaurant do you want?’ You would say, ‘Chinese,’ and it would find you a place.”
Shortly after Phillips licensed the technology from MIT in 1994 and left to start SpeechWorks, both he and researchers at competitor Nuance saw that one of their target applications, call steering, required something more. “There are companies out there that have 300 different 800 numbers,” Phillips explains. “The customer doesn’t understand the structure of the organization; they just know what problem they have. The right thing to do is to ask a question, like, ‘What’s the problem you’re having?’” But compared to a request for a nearby Chinese restaurant, such questions are perilously open-ended.
The problem gets harder when one considers that the ambiguity of much human speech (think of a phrase like “he saw the girl with the telescope”) means that many requests are open to multiple interpretations. “There are so many different ways that somebody could speak to the system that trying to cover all that in grammars is prohibitive,” says John Shea, vice president for marketing and product management at Nuance.
SpeechWorks finally found a workable solution in 2000, when it married the MIT software with a statistical language-processing technology developed at AT&T Labs-Research in Florham Park, NJ. AT&T’s system is built around a database of common sentence fragments drawn from tens of thousands of recorded telephone calls involving both human-to-human and human-to-machine communication. Each fragment in the database is scored for its statistical association with a certain topic and classified accordingly. A fragment such as “calls I didn’t make,” for instance, might correlate strongly with the topic “unrecognized-number billing inquiries,” and the system would route the call to an agent who could credit the caller’s account. If the system isn’t confident about its choice, it prompts the caller for more information using speech synthesis technology. In the end, according to AT&T, the system correctly routes more than 90 percent of calls, a far higher success rate than callers experience when navigating old-fashioned phone trees on their own.
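The fragment-scoring idea described above can be illustrated with a short Python sketch. It is not AT&T’s implementation: the fragments, topic names, scores, and the 0.6 confidence threshold are all invented for illustration, and a real system would learn its scores from recorded calls rather than from a hand-written table.

```python
# Hypothetical fragment-to-topic association scores (assumed values,
# not taken from AT&T's database).
FRAGMENT_SCORES = {
    "calls i didn't make": {"unrecognized_number_billing": 0.9, "new_service": 0.05},
    "my bill": {"unrecognized_number_billing": 0.5, "payment": 0.4},
    "sign up": {"new_service": 0.8},
    "pay": {"payment": 0.7},
}

# Assumed cutoff below which the system asks the caller for more detail.
CONFIDENCE_THRESHOLD = 0.6


def route_call(utterance, threshold=CONFIDENCE_THRESHOLD):
    """Return ("route", topic) or ("ask_for_more_information", best_guess)."""
    text = utterance.lower()
    totals = {}
    # Add up the scores of every known fragment found in the utterance.
    for fragment, topics in FRAGMENT_SCORES.items():
        if fragment in text:
            for topic, score in topics.items():
                totals[topic] = totals.get(topic, 0.0) + score
    if not totals:
        return ("ask_for_more_information", None)
    best = max(totals, key=totals.get)
    # Confidence is the best topic's share of all accumulated score.
    confidence = totals[best] / sum(totals.values())
    if confidence < threshold:
        return ("ask_for_more_information", best)
    return ("route", best)
```

In this sketch, a caller who says “There are calls I didn't make on my bill” is routed to the billing topic, while the vaguer “I have a question about my bill” splits its score between two topics and triggers a follow-up prompt.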
Nuance developed a similar system, based on technology from SRI, which can use either grammatical or statistical methods, or both, to extract meaning from a caller’s speech. “We use different approaches depending on the customer’s needs,” says Felix Gofman, a product-marketing manager at Nuance. “You can mix and match.” In a specific field, such as banking, the topics and vocabulary of callers’ questions will be limited, and the system can operate solely using predefined lists of what customers might say. For new or wider-ranging fields such as ordering phone service, the system stores each question it hears in a database, then uses statistical techniques to compare new questions to old entries in a search for probable matches, thereby improving accuracy over time.
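A rough sketch of the store-and-compare approach, again in Python, might look like the following. The article does not say which statistical techniques Nuance uses, so the word-overlap (Jaccard) similarity, the `QuestionLog` class, the topic labels, and the 0.3 cutoff are all assumptions made for illustration.

```python
class QuestionLog:
    """Hypothetical store of past questions, each labeled with a topic."""

    def __init__(self):
        self.entries = []  # list of (set of words, topic) pairs

    def add(self, question, topic):
        # Store the question as a set of lowercase words.
        self.entries.append((set(question.lower().split()), topic))

    def probable_topic(self, question, min_similarity=0.3):
        """Return the topic of the most similar stored question, or None."""
        words = set(question.lower().split())
        best_topic, best_score = None, 0.0
        for old_words, topic in self.entries:
            union = words | old_words
            # Jaccard similarity: shared words divided by all distinct words.
            score = len(words & old_words) / len(union) if union else 0.0
            if score > best_score:
                best_topic, best_score = topic, score
        return best_topic if best_score >= min_similarity else None
```

Each stored question widens the set of phrasings the log can match, which is one simple way a system of this kind could become more accurate as it hears more callers.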
SpeechWorks’ call center technology is used by such diverse enterprises as Office Depot, the U.S. Postal Service, Thrifty Car Rental, and United Airlines. But the company pushing the technology closest to its limits is Amtrak. Travelers calling Amtrak’s automated telephone system can not only get train schedules but also book reservations and charge tickets to their credit cards. “When we set out, the primary goal was to increase customer satisfaction rates,” says Matt Hardison, the railroad’s chief of sales, distribution, and customer service. But as a bonus, he says, the savings in labor costs repaid Amtrak’s $4 million investment in the technology within 18 months.
Nuance, meanwhile, has big customers in the financial and telecommunications industries, including Schwab, Sprint PCS, and Bell Canada. British Airways told the company that after deploying Nuance speech recognition systems last year, its average cost per customer call dropped from $3.00 to $0.16. And according to Bell Canada’s Banks, 40 percent of customers used to “zero out,” or request a live operator, while navigating the company’s touch-tone phone tree. Between the company’s December 2002 implementation of the system and March 2003, that number dropped to 15 percent, says Banks.