MIT Technology Review



Say that to a human airline agent nicely, and he or she will quickly disentangle your words and find flights that meet your criteria. Say it to the airline’s automated reservations line, however, and all you’re likely to get is a cheery digital voice intoning, “Sorry, I didn’t catch that.”

Don’t blame the voice. Even assuming the airline’s computers overcame the garbled words, background noise, and Boston accent to render the request into accurate text, no language-processing system has the computational firepower to make sense of your price and routing constraints, ignore irrelevancies like the fact that Saturday is your sister’s birthday, and understand that if the party starts at 3:00 p.m., you’re not interested in flights that arrive in Milwaukee at 4:00.

If computers could understand and respond to such routine natural-language requests, the results would be win-win: airlines wouldn’t need to hire so many agents, and consumers wouldn’t have to struggle with the confusion of touch-tone interfaces that leave them furiously tapping the “0” button, vainly trying to reach a live operator.

Futurists have been envisioning such a world since at least 1968, when 2001: A Space Odyssey’s HAL 9000 became the archetypal voice-interactive computer. Academic and corporate researchers intrigued by the sheer coolness of the idea have been tinkering for just as long with systems for recognizing and responding to human speech. But technologies don’t take hold because they’re cool: they need a business imperative. For language processing, it’s the enormous expense of live customer service that’s finally driving the technologies out of the lab. Simple “press or say ‘one’ ” phone trees are rapidly heading for the scrap heap as companies such as Nuance Communications and SpeechWorks meld previously competing strategies into software that infers the intention behind people’s naturally spoken or written requests. Major airlines, banks, and consumer-goods companies are already using the systems, and while the technology can’t yet hold up its end of a conversation, it does help callers with simple questions avoid long queues, and it frees human agents to deal with more complex requests.

Such improvements have set up natural-language systems for explosive growth: 43 percent of North American companies have either purchased interactive voice response software for their call centers or are conducting pilot studies, according to Forrester Research, a technology analysis firm. As more companies replace their old touch-tone phone menus, today’s $500 million market for telephone-based speech applications will grow, reaching $3.5 billion by 2007, according to Steve McClure, a vice president in the software research group at market analysis firm IDC. In late 2002, for example, Bell Canada installed a $4.5 million voice response system built by Menlo Park, CA-based Nuance. “Based on the results we’re seeing, the actual return on investment will take only about 10 months,” says Belinda Banks, Bell Canada’s associate director of customer care. Overall, the company expects to save $5.3 million in customer service costs this year alone.

And this is only phase one in the deployment of language-processing systems. Companies like Nuance and Boston’s SpeechWorks, the two market leaders in interactive voice response systems, are succeeding partly because they’ve tailored their technologies for narrow domains, such as travel information, where the vocabularies and concepts they must master are restricted. Even as such systems take over the customer service niche, other companies are still pursuing the challenge of true natural-language understanding. If research efforts at IBM and the Palo Alto Research Center (PARC), for example, bear fruit, computers may soon be able to interpret almost any conversation, or to retrieve almost any information a Web user wants, even if it’s locked away in a video file or a foreign language, opening markets wherever people seek knowledge via computer networks. Predicts IDC’s McClure, “Whereas the GUI [graphical user interface] was the interface for the 1990s, the NUI, or ‘natural’ user interface, will be the interface for this decade.”


Tagged: Communications
