MIT Technology Review



Isolating speech signals in audio recordings

Context: Microphones placed around a meeting room tend to yield recordings in which voices overlap and are hard to distinguish. When there are at least as many microphones as people talking, computer algorithms can isolate the audio of each speaker. But with fewer microphones, these methods fail and the voices remain overlapped. Alternative methods require building a profile of each speaker’s voice from previous recordings or making certain assumptions about the audio signals. Now Francis Bach and Michael Jordan of the University of California, Berkeley, have developed an algorithm that separates the voices of multiple speakers in recordings made with just one microphone, without requiring strong prior assumptions or speaker profiles.

Methods and Results: Bach and Jordan’s algorithm homes in on the voice characteristics that are most likely to vary among people. The recorded sounds are laid out in a spectrogram, a two-dimensional graph that shows the intensity of sound at each frequency over time. The algorithm automatically divides up the spectrogram among the speakers; it assumes that parts of the spectrogram are likely to come from the same speaker if they are near each other on the graph, vary similarly over time, or are alike in pitch and timbre. The algorithm is trained on samples in which separately recorded voices have been mixed; based on the training, it assigns a relative importance to each characteristic (say, timbre or tempo). It then applies this training to new recordings. So far, the authors have been able to separate the overlapping voices in several recordings of pairs of speakers. Although the separation is not perfect, both speakers are more intelligible.
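
To make the idea of dividing up a spectrogram concrete, here is a minimal Python sketch. It is not Bach and Jordan's implementation. It assumes a plain short-time Fourier transform, an affinity built only from closeness in frequency and time, and a standard two-way spectral partition. The function names (spectrogram, two_way_partition) and the weights w_freq and w_time are hypothetical; the weights are set by hand here, whereas the article says the algorithm learns the relative importance of each cue from training mixtures.

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Magnitude spectrogram: rows are frequency bins, columns are time frames."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack(
        [signal[i * hop:i * hop + frame_len] * window for i in range(n_frames)],
        axis=1,
    )
    return np.abs(np.fft.rfft(frames, axis=0))

def two_way_partition(spec, w_freq=20.0, w_time=1.0, rel_threshold=0.25):
    """Split the energetic spectrogram cells into two groups.

    w_freq and w_time are hand-picked stand-ins (an assumption) for the
    cue weights that the article says are learned from training mixtures.
    """
    n_freq, n_time = spec.shape
    # Keep only cells that carry a meaningful share of the energy.
    f_idx, t_idx = np.nonzero(spec > rel_threshold * spec.max())
    f = f_idx / n_freq
    t = t_idx / n_time
    # Affinity is high when two cells are close in frequency and in time.
    df2 = (f[:, None] - f[None, :]) ** 2
    dt2 = (t[:, None] - t[None, :]) ** 2
    W = np.exp(-(w_freq * df2 + w_time * dt2))
    # Two-way spectral partition: take the sign of the eigenvector with the
    # second-smallest eigenvalue of the symmetric normalized graph Laplacian.
    d = W.sum(axis=1)
    L = np.eye(len(d)) - W / np.sqrt(np.outer(d, d))
    _, vecs = np.linalg.eigh(L)  # eigenvalues come back in ascending order
    labels = (vecs[:, 1] > 0).astype(int)
    return f_idx, t_idx, labels

# Toy "mixture": two steady tones standing in for two voices.
sample_rate = 8000
time = np.arange(2000) / sample_rate
mixture = np.sin(2 * np.pi * 500 * time) + np.sin(2 * np.pi * 2500 * time)

spec = spectrogram(mixture)
f_idx, t_idx, labels = two_way_partition(spec)

# Binary mask over the spectrogram for the cells assigned to group 0.
mask = np.zeros(spec.shape, dtype=bool)
mask[f_idx[labels == 0], t_idx[labels == 0]] = True
```

Two steady tones are far easier to pull apart than real overlapping voices, so this only illustrates the grouping step. Multiplying the spectrogram by such a mask and inverting the transform would give a rough estimate of one source; cues such as pitch and timbre, which the article mentions, are not modeled here.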

Why it Matters: Historians, journalists, lawyers, and other professionals rely on recorded conversations. These recordings are often made using a single microphone but feature multiple voices. By making babble more comprehensible, Bach and Jordan’s algorithm promises to make such recordings more useful and easier to analyze. Hence, users may no longer be forced to haul around bulky, expensive equipment when recording important conversations and events.

Source: Bach, F. R., and M. I. Jordan. 2005. Blind one-microphone speech separation: a spectral learning approach. Advances in Neural Information Processing Systems 17 (in press).


Tagged: Computing
