
AI Is Learning to Pick Out Voices from a Crowd’s Chatter


Current voice recognition systems are pretty good if only one person speaks. But as we’ve said before, picking out one voice when several people are talking, often known as the cocktail party problem, is tough, even for firms like Amazon, which has amassed gobs of data via its Alexa smart assistant platform.

Now, though, a team of researchers from Mitsubishi Electric Research Laboratories has developed a trick to identify features in a voice that can be used to track a single person in conversation. According to New Scientist, by chopping up audio and identifying how clusters of those features occur over time, it’s possible to trace a voice even in the din of a crowd.
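
To get a feel for the "chop up the audio, then cluster the features" idea, here is a toy sketch in Python. It is not the researchers' method: their features are learned and their audio contains overlapping speech, whereas this sketch assumes speakers take turns, uses a plain log-magnitude spectrum as the feature, and groups frames with a basic k-means. The function names (`frame_signal`, `spectral_features`, `kmeans`) and all parameter values are made up for illustration.

```python
import numpy as np

def frame_signal(signal, frame_len=256, hop=128):
    """Chop a 1-D signal into overlapping frames, one frame per row."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return signal[idx]

def spectral_features(frames):
    """Log-magnitude spectrum of each windowed frame (a stand-in feature,
    assumed here; the real system learns its features)."""
    window = np.hanning(frames.shape[1])
    return np.log1p(np.abs(np.fft.rfft(frames * window, axis=1)))

def kmeans(features, k, n_iter=20):
    """Basic k-means with a deterministic start: k evenly spaced frames."""
    start = np.linspace(0, len(features) - 1, k).astype(int)
    centers = features[start].copy()
    for _ in range(n_iter):
        # Squared distance from every frame to every cluster center.
        dists = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = features[labels == j].mean(axis=0)
    return labels

# Two hypothetical "voices" taking turns: a low tone, then a high tone.
sample_rate = 8000
t = np.arange(sample_rate // 2) / sample_rate
voice_a = np.sin(2 * np.pi * 200 * t)
voice_b = np.sin(2 * np.pi * 1200 * t)
mixture = np.concatenate([voice_a, voice_b])

labels = kmeans(spectral_features(frame_signal(mixture)), k=2)
print(labels)  # one cluster label per frame, in time order
```

Reading the labels in time order shows which cluster is "speaking" in each frame, which is the tracking-over-time part of the idea. Separating voices that talk over each other would need per-frequency grouping and learned features, which this sketch does not attempt.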

How good is it? Well, results published on the arXiv suggest it can track a single person in conversation even when five people are talking, and can isolate a single voice from two others with 80 percent accuracy. So, not perfect. But it’s a big step toward having Alexa understand you when you ask it to play your new jam over the hubbub of your friends at a dinner party.