Skip to Content

Can We Make Machines Listen More Carefully?

A new project aims to create far smarter voice recognition software.
July 19, 2011

You probably use voice recognition technology already, if in a limited capacity. Maybe you use Google’s voice-activated search, or take advantage of its (somewhat wonky) voice-mail transcriptions in Google Voice. At the office, maybe you use Dragon dictation software. Even if these programs worked perfectly, though (which they don’t) they would still leave something to be desired. Voice recognition software today works in very specialized circumstances—it can typically recognize only one voice at a time, and it performs best when it has reams of data in the archive before tackling a new speech sample.

What if we had voice recognition technology that didn’t have so many strictures? What if we had software that was quick and nimble, able to discern one speaker from another on the fly? In other words, what if voice recognition technology was more like the way voice recognition actually works in the real world, in the human brain?

A coalition of three British Universities—the Universities of Cambridge, Sheffield, and Edinburgh—is working to bring us what they call “natural speech technology.” Google and Dragon are (relatively) good at what they do, Thomas Hain of Sheffield recently told The Engineer. “But where it’s about natural speech—people having a normal conversation—these applications still have very poor performance.”

With nearly $10 million of funding from Britain’s Engineering and Physical Sciences Research Council, the team has set itself four main technical objectives.

First, they want to make speech software that’s smart–that can learn and adapt on the fly. They intend to build models and algorithms that can “adapt to new scenarios and speaking styles, and seamlessly adapt to new situations and contexts almost instantaneously,” the team members write.

Second, they want those models and algorithms to be smart enough to eavesdrop on a meeting, and to be able to sift “who spoke what, when, and how”—in other words, they want speech software as adept as a great human stenographer. Then, looking forward, the team’s third and fourth goals are to create technologies building on their models: speech synthesizers (for sufferers of stroke or neurodegenerative diseases) that learn from data and that are “capable of generating the full expressive diversity of natural speech”; and various other applications. These are as yet vaguely defined, but which might include something the team calls “personal listeners.”

It’s very ambitious stuff, enough to make you pause and consider a future in which speech recognition is ubiquitous, seamless, and orders of magnitude more useful than it is today. Some of the researchers are already at work on some applications; Hain’s award-winning team is collaborating with the BBC to transcribe its back catalog of audio and video footage.

Keep Reading

Most Popular

Large language models can do jaw-dropping things. But nobody knows exactly why.

And that's a problem. Figuring it out is one of the biggest scientific puzzles of our time and a crucial step towards controlling more powerful future models.

The problem with plug-in hybrids? Their drivers.

Plug-in hybrids are often sold as a transition to EVs, but new data from Europe shows we’re still underestimating the emissions they produce.

Google DeepMind’s new generative model makes Super Mario–like games from scratch

Genie learns how to control games by watching hours and hours of video. It could help train next-gen robots too.

How scientists traced a mysterious covid case back to six toilets

When wastewater surveillance turns into a hunt for a single infected individual, the ethics get tricky.

Stay connected

Illustration by Rose Wong

Get the latest updates from
MIT Technology Review

Discover special offers, top stories, upcoming events, and more.

Thank you for submitting your email!

Explore more newsletters

It looks like something went wrong.

We’re having trouble saving your preferences. Try refreshing this page and updating them one more time. If you continue to get this message, reach out to us at customer-service@technologyreview.com with a list of newsletters you’d like to receive.