A View from David Zax

Can We Make Machines Listen More Carefully?

A new project aims to create far smarter voice recognition software.

July 19, 2011

You probably use voice recognition technology already, if only in a limited capacity. Maybe you use Google’s voice-activated search, or take advantage of its (somewhat wonky) voice-mail transcriptions in Google Voice. At the office, maybe you use Dragon dictation software. Even if these programs worked perfectly, though (which they don’t), they would still leave something to be desired. Voice recognition software today works only in very specialized circumstances: it can typically recognize just one voice at a time, and it performs best when it has reams of archived data before tackling a new speech sample.

What if we had voice recognition technology that didn’t have so many strictures? What if we had software that was quick and nimble, able to discern one speaker from another on the fly? In other words, what if voice recognition technology were more like the way voice recognition actually works in the real world, in the human brain?

A coalition of three British universities (Cambridge, Sheffield, and Edinburgh) is working to bring us what they call “natural speech technology.” Google and Dragon are (relatively) good at what they do, Thomas Hain of Sheffield recently told The Engineer. “But where it’s about natural speech—people having a normal conversation—these applications still have very poor performance.”

With nearly $10 million of funding from Britain’s Engineering and Physical Sciences Research Council, the team has set itself four main technical objectives.

First, they want to make speech software that’s smart, able to learn and adapt on the fly. They intend to build models and algorithms that can “adapt to new scenarios and speaking styles, and seamlessly adapt to new situations and contexts almost instantaneously,” the team members write.

Second, they want those models and algorithms to be smart enough to eavesdrop on a meeting and sift “who spoke what, when, and how.” In other words, they want speech software as adept as a great human stenographer. Then, looking forward, the team’s third and fourth goals are to create technologies building on those models: speech synthesizers (for people affected by stroke or neurodegenerative diseases) that learn from data and are “capable of generating the full expressive diversity of natural speech”; and various other applications. These are as yet vaguely defined but might include something the team calls “personal listeners.”

It’s very ambitious stuff, enough to make you pause and consider a future in which speech recognition is ubiquitous, seamless, and orders of magnitude more useful than it is today. Some of the researchers are already at work on applications; Hain’s award-winning team is collaborating with the BBC to transcribe its back catalog of audio and video footage.
