You probably use voice recognition technology already, if in a limited capacity. Maybe you use Google’s voice-activated search, or take advantage of its (somewhat wonky) voice-mail transcriptions in Google Voice. At the office, maybe you use Dragon dictation software. Even if these programs worked perfectly, though (which they don’t) they would still leave something to be desired. Voice recognition software today works in very specialized circumstances—it can typically recognize only one voice at a time, and it performs best when it has reams of data in the archive before tackling a new speech sample.
What if we had voice recognition technology that didn’t have so many strictures? What if we had software that was quick and nimble, able to discern one speaker from another on the fly? In other words, what if voice recognition technology was more like the way voice recognition actually works in the real world, in the human brain?
A coalition of three British Universities—the Universities of Cambridge, Sheffield, and Edinburgh—is working to bring us what they call “natural speech technology.” Google and Dragon are (relatively) good at what they do, Thomas Hain of Sheffield recently told The Engineer. “But where it’s about natural speech—people having a normal conversation—these applications still have very poor performance.”
With nearly $10 million of funding from Britain’s Engineering and Physical Sciences Research Council, the team has set itself four main technical objectives.
First, they want to make speech software that’s smart–that can learn and adapt on the fly. They intend to build models and algorithms that can “adapt to new scenarios and speaking styles, and seamlessly adapt to new situations and contexts almost instantaneously,” the team members write.
Second, they want those models and algorithms to be smart enough to eavesdrop on a meeting, and to be able to sift “who spoke what, when, and how”—in other words, they want speech software as adept as a great human stenographer. Then, looking forward, the team’s third and fourth goals are to create technologies building on their models: speech synthesizers (for sufferers of stroke or neurodegenerative diseases) that learn from data and that are “capable of generating the full expressive diversity of natural speech”; and various other applications. These are as yet vaguely defined, but which might include something the team calls “personal listeners.”
It’s very ambitious stuff, enough to make you pause and consider a future in which speech recognition is ubiquitous, seamless, and orders of magnitude more useful than it is today. Some of the researchers are already at work on some applications; Hain’s award-winning team is collaborating with the BBC to transcribe its back catalog of audio and video footage.
Saudi Arabia plans to spend $1 billion a year discovering treatments to slow aging
The oil kingdom fears that its population is aging at an accelerated rate and hopes to test drugs to reverse the problem. First up might be the diabetes drug metformin.
Yann LeCun has a bold new vision for the future of AI
One of the godfathers of deep learning pulls together old ideas to sketch out a fresh path for AI, but raises as many questions as he answers.
The dark secret behind those cute AI-generated animal images
Google Brain has revealed its own image-making AI, called Imagen. But don't expect to see anything that isn't wholesome.
A quick guide to the most important AI law you’ve never heard of
The European Union is planning new legislation aimed at curbing the worst harms associated with artificial intelligence.
Get the latest updates from
MIT Technology Review
Discover special offers, top stories, upcoming events, and more.