Context: Microphones placed around a meeting room tend to yield recordings in which voices overlap and are hard to tell apart. When there are at least as many microphones as people talking, computer algorithms can isolate the audio of each speaker. With fewer microphones, however, these methods fail and the voices stay tangled together. Alternative approaches require either building a profile of each speaker’s voice from earlier recordings or making specific assumptions about the audio signals. Now Francis Bach and Michael Jordan of the University of California, Berkeley, have developed an algorithm that separates the voices of multiple speakers in recordings made with a single microphone, without speaker profiles or strong prior assumptions.
Methods and Results: Bach and Jordan’s algorithm homes in on the voice characteristics that are most likely to differ from person to person. The recorded sound is first laid out as a spectrogram, a two-dimensional graph that shows the intensity of sound at each frequency over time. The algorithm then automatically divides the spectrogram among the speakers, treating parts of it as likely to come from the same speaker if they lie near each other on the graph, vary similarly over time, or are alike in pitch and timbre. The algorithm is trained on samples in which separately recorded voices have been mixed together; because the correct split is known for these samples, training lets the algorithm assign a relative importance to each characteristic, such as pitch or timbre. It then applies what it has learned to new recordings. So far, the authors have separated the overlapping voices in several recordings of pairs of speakers. Although the separation is not perfect, both speakers become more intelligible than in the original mix.
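The article does not name the grouping technique, so the following is only a toy sketch under stated assumptions, not Bach and Jordan’s implementation. It assumes a spectral-clustering-style step: compute a magnitude spectrogram, compare the strong time-frequency points by weighted similarity, and split them in two using the second eigenvector of a normalized affinity matrix. The function names `spectrogram` and `separate_two_sources`, the three cues used (time position, frequency position, loudness), and the `weights` values are all hypothetical. In the method described, the relative weights would be learned from training mixtures rather than set by hand, and cues such as common variation over time, pitch, and timbre would also come into play.

```python
import numpy as np


def spectrogram(x, n_fft=64, hop=32):
    """Magnitude short-time Fourier transform, shape (n_fft // 2 + 1, n_frames)."""
    n = np.arange(n_fft)
    window = 0.5 - 0.5 * np.cos(2 * np.pi * n / n_fft)  # periodic Hann window
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T  # rows: frequency bins, columns: time frames


def separate_two_sources(S, weights, rel_threshold=0.1):
    """Split the strong time-frequency points of spectrogram S into two masks.

    weights = (w_time, w_freq, w_loudness) is a hypothetical, hand-set stand-in
    for the cue importances that the described method learns from training mixtures.
    """
    F, T = S.shape
    f_idx, t_idx = np.nonzero(S > rel_threshold * S.max())  # ignore near-silent points
    feats = np.column_stack([
        t_idx / max(T - 1, 1),     # position along the time axis
        f_idx / max(F - 1, 1),     # position along the frequency axis
        np.log(S[f_idx, t_idx]),   # loudness (log magnitude)
    ])
    # Weighted squared distance between every pair of points, turned into an affinity.
    diff2 = (feats[:, None, :] - feats[None, :, :]) ** 2
    W = np.exp(-(diff2 * np.asarray(weights, dtype=float)).sum(axis=2))
    # Normalized affinity D^{-1/2} W D^{-1/2}; the sign of its second eigenvector,
    # rescaled by D^{-1/2}, gives a two-way partition (a spectral-clustering step).
    d = W.sum(axis=1)
    _, vecs = np.linalg.eigh(W / np.sqrt(np.outer(d, d)))  # eigenvalues ascending
    labels = vecs[:, -2] / np.sqrt(d) > 0
    masks = np.zeros((2, F, T), dtype=bool)
    masks[0, f_idx[labels], t_idx[labels]] = True
    masks[1, f_idx[~labels], t_idx[~labels]] = True
    return masks


if __name__ == "__main__":
    # Toy "mixture": two steady tones stand in for two speakers.
    n = np.arange(1024)
    mix = np.sin(2 * np.pi * 5 * n / 64) + 0.8 * np.sin(2 * np.pi * 20 * n / 64)
    S = spectrogram(mix)
    masks = separate_two_sources(S, weights=(1.0, 50.0, 1.0))
    for k, m in enumerate(masks):
        print(k, sorted(set(np.nonzero(m)[0].tolist())))  # frequency bins in each group
```

In this toy mixture, one mask should cover frequency bins 4 to 6 and the other bins 19 to 21, because the large frequency weight makes points in different bands nearly unrelated. Multiplying `S` by each mask gives a per-source spectrogram; turning that back into audio would also need the phase and an inverse transform, which the sketch leaves out.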
Why it Matters: Historians, journalists, lawyers, and other professionals rely on recorded conversations, which are often made with a single microphone but feature multiple voices. By making such babble more comprehensible, Bach and Jordan’s algorithm promises to make these recordings more useful and easier to analyze. It could also spare users from hauling around bulky, expensive equipment when recording important conversations and events.