Digital Lip Reader

September 1, 2003

Speech recognition is a long-promised technology that’s finally beginning to deliver. But today’s best systems tend to fail when the speaker is in a noisy spot. To fix this problem, researchers are adding lip reading to the mix.

While people rely on mouth shapes all the time to interpret speech, lip reading is no simple task for a computer. For one thing, each shape can correspond to several specific sounds. To make matters worse, mouth movements begin as much as 120 milliseconds before a sound is uttered. Humans can use other cues such as sentence context and facial expressions to overcome these difficulties, but until recently, computers lacked the processing power to do so.

Now groups at Intel, IBM, and other institutions are modifying language-processing programs to link each vocal sound to several possible mouth movements, allowing the software to make a best guess about what’s being uttered. In tests in noisy environments, adding visual information boosted speech recognition accuracy from 20 percent to 75 percent, says Ara Nefian, a senior researcher at Intel Research in Santa Clara, CA.
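The idea of linking each sound to several possible mouth shapes and then making a best guess can be illustrated with a toy example. The sketch below is not the method used by the groups mentioned here; the sound-to-shape table, the probability values, the function names, and the `audio_weight` parameter are all hypothetical and chosen only to show how a visual cue might break a tie that noisy audio cannot.

```python
import math

# Hypothetical table: each sound maps to one or more mouth shapes.
# Several sounds share a shape, which is why lip reading alone is ambiguous.
SOUND_TO_SHAPES = {
    "p": ["lips_closed"],
    "b": ["lips_closed"],
    "m": ["lips_closed"],
    "f": ["lip_to_teeth"],
    "v": ["lip_to_teeth"],
    "a": ["open_wide", "open_mid"],
}

def visual_score(sound, shape_probs):
    """Probability that the observed mouth shape is compatible with the sound."""
    return sum(shape_probs.get(shape, 0.0) for shape in SOUND_TO_SHAPES[sound])

def best_guess(audio_probs, shape_probs, audio_weight=0.5):
    """Return the sound with the highest weighted log-score.

    audio_weight=1.0 ignores the camera; lower values trust the lips more,
    which is the assumed setting for a noisy room.
    """
    eps = 1e-9  # avoids log(0) for sounds one source rules out
    scores = {}
    for sound in SOUND_TO_SHAPES:
        audio = math.log(audio_probs.get(sound, 0.0) + eps)
        visual = math.log(visual_score(sound, shape_probs) + eps)
        scores[sound] = audio_weight * audio + (1.0 - audio_weight) * visual
    return max(scores, key=scores.get)

# Made-up numbers: noise leaves the audio nearly tied between "b" and "v",
# while the camera sees closed lips fairly clearly.
noisy_audio = {"b": 0.34, "v": 0.36, "a": 0.30}
camera = {"lips_closed": 0.80, "lip_to_teeth": 0.15, "open_wide": 0.05}

print(best_guess(noisy_audio, camera, audio_weight=1.0))  # audio only
print(best_guess(noisy_audio, camera, audio_weight=0.5))  # audio plus lips
```

With these invented numbers, the audio-only guess is "v", while the combined guess is "b", because closed lips are incompatible with "v". A real system would also have to handle timing and context, which this sketch leaves out.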

Initially, the technology is likely to be most useful to doctors and others working in noisy locations who need better accuracy from office dictation software. With this audience in mind, IBM is building a tiny camera into the boom microphone that comes with existing speech recognition software. Further down the road, researchers envision the day when your car dashboard might have a camera peering at your lips for voice-actuated controls, or your cell phone might watch what you say.
