Until recently, the idea of holding a conversation with a computer seemed pure science fiction. If you asked a computer to “open the pod bay doors”—well, that was only in movies.
But things are changing, and quickly. A growing number of people now talk to their smart phones, asking them to send e-mail and text messages, search for directions, or find information on the Web.
“We’re at a transition point where voice and natural-language understanding are suddenly at the forefront,” says Vlad Sejnoha, chief technology officer of Nuance Communications, a company based in Burlington, Massachusetts, that dominates the market for speech recognition with its Dragon software and other products. “I think speech recognition is really going to upend the current [computer] interface.”
This shift has come about thanks in part to steady progress in the technologies needed to help machines understand human speech, including machine learning and statistical data-mining techniques. Sophisticated voice technology is already commonplace in call centers, where it lets users navigate through menus and helps identify irate customers who should be handed off to a real customer service rep.
Now the rapid rise of powerful mobile devices is making voice interfaces even more useful and pervasive.
Jim Glass, a senior research scientist at MIT who has been working on speech interfaces since the 1980s, says today’s smart phones pack as much processing power as the laboratory machines he worked with in the ’90s. Smart phones also have high-bandwidth data connections to the cloud, where servers can do the heavy lifting involved in both recognizing speech and understanding spoken queries. “The combination of more data and more computing power means you can do things today that you just couldn’t do before,” says Glass. “You can use more sophisticated statistical models.”
The most prominent example of a mobile voice interface is, of course, Siri, the voice-activated personal assistant that comes built into the latest iPhone. But voice functionality is also built into Android, the Windows Phone platform, and most other mobile systems, as well as many apps. While these interfaces still have considerable limitations (see Social Intelligence), we are inching closer to machine interfaces we can actually talk to.
Nuance is at the heart of the boom in voice technology. The company was founded in 1992 as Visioneer and has acquired dozens of other voice technology businesses. It now has more than 6,000 staff members at 35 locations around the world, and its revenues in the second quarter of 2012 were $390.3 million, a 22.4 percent increase over the same period in 2011.
In recent years, Nuance has deftly applied its expertise in voice recognition to the emerging market for speech interfaces. The company supplies voice recognition technology to many other companies and is widely believed to provide the speech component of Siri.
Speech is ideally suited to mobile computing, says Nuance’s CTO, partly because users have their hands and eyes otherwise occupied—but also because a single spoken command can accomplish tasks that would normally require a multitude of swipes and presses. “Suddenly you have this new building block, this new dimension that you can bring to the problem,” says Sejnoha. “And I think we’re going to be designing the basic modern device UI with that in mind.”