
Speak Easy

IBM aims to solve speech recognition’s nagging problems.
May 1, 2002

The idea of computers that accurately understand human speech has both enticed and frustrated engineers. But now, IBM Research in Yorktown Heights, NY, is undertaking a multiyear project to finally solve all the problems that have kept voice recognition systems from comprehending free-form conversations, and from becoming mainstream technology.

IBM aims to create a system that understands perhaps 20 languages, including medical and legal terms, with about 98 percent accuracy, a big improvement over the 80 to 85 percent accuracy of IBM's own speech recognition products and those from firms such as Peabody, MA-based ScanSoft. Troubles with accuracy are largely to blame for the limited market for speech recognition, which has so far been relegated mainly to dictation and telephone-based automated-response applications. IBM also hopes to overcome the other limitations of current systems: the need for hours of training, quiet surroundings, and steady voice inflections. By making voice recognition more accurate and more broadly applicable, IBM believes it could open markets in real-time transcription for business meetings, new voice interfaces for handheld computers, and search engines that could retrieve sound bites from audio databases of news broadcasts and speeches.

In current speech recognition technology, algorithms compare the waveform, an electronic representation of a word, to a master waveform database to develop a short list of possible matches, then select the most commonly used word on that list. IBM is exploring ways to make better matches, including new algorithms that make guesses based on the context of the conversation. IBM researchers have also built a lip-reading video system that reduces errors by one-third, says David Nahamoo, group manager of Human Language Technologies for IBM Research. “We’re combining audio and visual features together, which we’re feeding into our recognition engines,” he says. “We’re learning how to use one to clean up the other.”
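The context-based approach the article describes can be sketched in a few lines. This is a hypothetical illustration, not IBM's actual algorithm: the candidate words, acoustic scores, and bigram counts below are all invented. The idea is that instead of picking the most common word from the short list of acoustic matches, the recognizer weights each candidate by how likely it is to follow the previous word.

```python
# Toy sketch of context-aware word selection in speech recognition.
# Acoustic scores say how well each candidate word matched the waveform;
# bigram counts from a (hypothetical) corpus supply the conversational
# context. All numbers here are invented for illustration.

# Candidates from the acoustic match step, with match scores.
candidates = {"wreck": 0.40, "recognize": 0.35, "reckon": 0.25}

# Counts of (previous word, word) pairs in a toy corpus.
bigram_counts = {
    ("to", "recognize"): 30,
    ("to", "wreck"): 5,
    ("to", "reckon"): 8,
}

def pick_word(prev_word, candidates, bigram_counts):
    """Score each candidate by acoustic match times context likelihood,
    and return the highest-scoring word."""
    # Normalize the bigram counts into rough conditional probabilities,
    # using a count of 1 for unseen pairs so no candidate scores zero.
    total = sum(bigram_counts.get((prev_word, w), 1) for w in candidates)
    best_word, best_score = None, -1.0
    for word, acoustic in candidates.items():
        context = bigram_counts.get((prev_word, word), 1) / total
        score = acoustic * context
        if score > best_score:
            best_word, best_score = word, score
    return best_word

# With "to" as the preceding word, context outweighs raw acoustic score
# and "recognize" wins despite "wreck" matching the waveform better.
print(pick_word("to", candidates, bigram_counts))  # → recognize
```

A purely acoustic recognizer would have chosen "wreck" here; folding in even a crude context model flips the decision, which is the intuition behind the new matching algorithms the article mentions.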

Some experts are skeptical. Real-time meeting transcription is still “lab stuff right now,” says Steve McClure, vice president at technology market researcher IDC in Framingham, MA. “I’ve seen IBM demos work fine one time, and another time the damn application wouldn’t work at all.” Nahamoo concedes the initiative needs years of work and lots of luck to reach its goals. But given the speech recognition industry’s history of failing to deliver on its promises, Big Blue’s newest push could provide a few words of encouragement to the struggling technology.
