AI Is Learning to Pick Out Voices from a Crowd’s Chatter
Current voice recognition systems are pretty good, as long as only one person is speaking. But as we’ve said before, picking out one voice when several people talk at once, often known as the cocktail party problem, is tough, even for firms like Amazon, which has amassed gobs of data via its Alexa smart assistant platform.
Now, though, a team of researchers from Mitsubishi Electric Research Laboratories has developed a trick to identify features in a voice that can be used to track a single person in conversation. According to New Scientist, by chopping up audio and identifying how clusters of those features occur over time, it’s possible to trace a voice even in the din of a crowd.
How good is it? Well, results published on the arXiv suggest it can track a single person in conversation even when five people are talking, and can isolate a single voice from two others with 80 percent accuracy. So, not perfect. But it’s a big step toward having Alexa understand you when you ask it to play your new jam over the hubbub of your friends at a dinner party.