MIT Technology Review

 


The idea of computers that accurately understand human speech has both enticed and frustrated engineers. But now, IBM Research in Yorktown Heights, NY, is undertaking a multiyear project to finally solve the problems that have kept voice recognition systems from comprehending free-form conversations and from becoming mainstream technology.

IBM aims to create a system that understands perhaps 20 languages, including medical and legal terms, with about 98 percent accuracy. That would be a big improvement over the 80 to 85 percent accuracy of IBM’s own speech recognition products and those from firms such as Peabody, MA-based ScanSoft. Troubles with accuracy are largely to blame for the limited market for speech recognition, which has so far been relegated mainly to dictation and telephone-based automated-response applications. IBM also hopes to overcome the other limitations of current systems: the need for hours of training, quiet surroundings and steady voice inflections. By making voice recognition more accurate and more broadly applicable, IBM believes it could open markets in real-time transcription for business meetings, in new voice interfaces for handheld computers, and in search engines that could retrieve sound bites from audio databases of news broadcasts and speeches.

In current speech recognition technology, algorithms compare the waveform, an electronic representation of a word, to a master waveform database to develop a short list of possible matches, then select the most commonly used word on that list. IBM is exploring ways to make better matches, including new algorithms that make guesses based on the context of the conversation. IBM researchers have also built a lip-reading video system that reduces errors by one-third, says David Nahamoo, group manager of Human Language Technologies for IBM Research. “We’re combining audio and visual features together, which we’re feeding into our recognition engines,” he says. “We’re learning how to use one to clean up the other.”
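The matching process described above can be illustrated with a toy sketch. This is not IBM's code: the three-number "waveform" features, the word frequencies, the word-pair counts and every function name below are invented for illustration, and a real recognizer works on far richer acoustic data. The sketch only shows the two ideas in the paragraph: build a short list of the closest matches and pick the most commonly used word, or let the preceding word in the conversation break the tie instead.

```python
import math

# Hypothetical "master database": word -> (toy waveform features, usage count).
# All values are made up for illustration.
TEMPLATES = {
    "write": ([0.9, 0.1, 0.4], 120),
    "right": ([0.9, 0.1, 0.5], 900),
    "rite": ([0.8, 0.2, 0.4], 5),
    "road": ([0.1, 0.9, 0.7], 300),
}

# Hypothetical counts of how often one word follows another.
BIGRAMS = {
    ("turn", "right"): 50,
    ("please", "write"): 40,
}


def shortlist(features, k=3):
    """Return the k words whose stored features are closest to the input."""
    ranked = sorted(TEMPLATES, key=lambda w: math.dist(features, TEMPLATES[w][0]))
    return ranked[:k]


def recognize(features, k=3):
    """Pick the most commonly used word on the short list."""
    return max(shortlist(features, k), key=lambda w: TEMPLATES[w][1])


def recognize_with_context(features, previous_word, k=3):
    """Prefer the candidate that most often follows the previous word,
    falling back to overall usage when the context gives no evidence."""
    return max(
        shortlist(features, k),
        key=lambda w: (BIGRAMS.get((previous_word, w), 0), TEMPLATES[w][1]),
    )


if __name__ == "__main__":
    heard = [0.9, 0.1, 0.42]  # toy input, acoustically closest to "write"
    print(shortlist(heard))
    print(recognize(heard))
    print(recognize_with_context(heard, "please"))
```

In this toy setup the frequency-only rule picks the common word "right" even though the input sounds slightly more like "write", while the context-aware version recovers "write" after "please". That is the kind of error that guessing from the context of the conversation is meant to reduce.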

Some experts are skeptical. Real-time meeting transcription is still “lab stuff right now,” says Steve McClure, vice president at technology market researcher IDC in Framingham, MA. “I’ve seen IBM demos work fine one time, and another time the damn application wouldn’t work at all.” Nahamoo concedes the initiative needs years of work and lots of luck to reach its goals. But given the speech recognition industry’s history of failing to deliver on its promises, Big Blue’s newest push could provide a few words of encouragement to the struggling technology.


Tagged: Communications
