
Speak Easy

IBM aims to solve speech recognition’s nagging problems.

The idea of computers that accurately understand human speech has both enticed and frustrated engineers. But now, IBM Research in Yorktown Heights, NY, is undertaking a multiyear project to finally solve the problems that have kept voice recognition systems from comprehending free-form conversation and from becoming mainstream technology.

IBM aims to create a system that understands perhaps 20 languages, along with specialized medical and legal terminology, with about 98 percent accuracy. That would be a big improvement over the 80 to 85 percent accuracy of IBM’s own speech recognition products and those from firms such as Peabody, MA-based ScanSoft. Troubles with accuracy are largely to blame for the limited market for speech recognition, which has so far been relegated mainly to dictation and telephone-based automated-response applications. IBM also hopes to overcome the other limitations of current systems: the need for hours of training, quiet surroundings and steady voice inflections. By making voice recognition more accurate and more broadly applicable, IBM believes it could open markets in real-time transcription of business meetings, new voice interfaces for handheld computers, and search engines that could retrieve sound bites from audio databases of news broadcasts and speeches.
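To put the accuracy figures in perspective, the error rate is simply one minus the accuracy, so the jump from 80 to 85 percent to 98 percent is larger than the raw percentages suggest:

$$
\begin{aligned}
\text{error rate} &= 1 - \text{accuracy} \\
1 - 0.80 = 0.20, \qquad 1 - 0.85 &= 0.15, \qquad 1 - 0.98 = 0.02 \\
\frac{0.15}{0.02} = 7.5, &\qquad \frac{0.20}{0.02} = 10
\end{aligned}
$$

In other words, today's products miss roughly one word in every 5 to 7, while the target would cut errors by a factor of about 7.5 to 10, to roughly one word in 50.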

In current speech recognition technology, algorithms compare a spoken word’s waveform (its electronic representation) against a master database of waveforms to build a short list of possible matches, then select the most commonly used word on that list. IBM is exploring ways to make better matches, including new algorithms that make guesses based on the context of the conversation. IBM researchers have also built a lip-reading video system that reduces errors by one-third, says David Nahamoo, group manager of Human Language Technologies for IBM Research. “We’re combining audio and visual features together, which we’re feeding into our recognition engines,” he says. “We’re learning how to use one to clean up the other.”
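The matching step described above can be illustrated with a toy Python sketch. It is not IBM’s method: it assumes short, made-up feature vectors stand in for stored waveforms, and the word-frequency and word-pair counts are invented for illustration. It contrasts the “pick the most common word” rule with a simple context rule that looks at the previous word.

```python
import math

# Hypothetical "master database": each word maps to a tiny feature vector
# standing in for its stored waveform template (invented numbers).
TEMPLATES = {
    "write": [0.90, 0.10, 0.40],
    "right": [0.88, 0.12, 0.41],
    "rite": [0.87, 0.11, 0.42],
    "light": [0.30, 0.80, 0.50],
}

# Hypothetical counts of how commonly each word is used.
UNIGRAM = {"write": 500, "right": 900, "rite": 20, "light": 400}

# Hypothetical counts of (previous word, word) pairs, used as context.
BIGRAM = {("please", "write"): 50, ("turn", "right"): 80, ("the", "light"): 60}


def shortlist(features, k=3):
    """Return the k templates closest (Euclidean distance) to the input."""
    ranked = sorted(TEMPLATES, key=lambda w: math.dist(features, TEMPLATES[w]))
    return ranked[:k]


def pick_by_frequency(candidates):
    """Current approach as described: choose the most commonly used word."""
    return max(candidates, key=lambda w: UNIGRAM.get(w, 0))


def pick_by_context(candidates, previous_word):
    """Context-aware guess: prefer words often seen after previous_word,
    falling back to overall frequency when the pair counts tie."""
    return max(
        candidates,
        key=lambda w: (BIGRAM.get((previous_word, w), 0), UNIGRAM.get(w, 0)),
    )


if __name__ == "__main__":
    heard = [0.89, 0.11, 0.405]  # sounds like "write", "right" or "rite"
    candidates = shortlist(heard)
    print(pick_by_frequency(candidates))
    print(pick_by_context(candidates, "please"))
```

With the invented counts, the frequency rule always returns “right” for this sound, while the context rule returns “write” after “please” and falls back to “right” when the previous word offers no evidence. Real recognizers replace these toy tables with statistical acoustic and language models.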

This story is part of our May 2002 Issue

Some experts are skeptical. Real-time meeting transcription is still “lab stuff right now,” says Steve McClure, vice president at technology market researcher IDC in Framingham, MA. “I’ve seen IBM demos work fine one time, and another time the damn application wouldn’t work at all.” Nahamoo concedes the initiative needs years of work and lots of luck to reach its goals. But given the speech recognition industry’s history of failing to deliver on its promises, Big Blue’s newest push could provide a few words of encouragement to the struggling technology.
