One reason audio beamforming is expensive is that it is time-consuming to calibrate a given real-world system, says Microsoft’s Tashev. Each speaker has slight variations in the sound it emits, and because focusing a sound beam requires extremely precise timing, these slight variations can badly distort the sound. As a result, the software that focuses the sound is calibrated to work with specific hardware, and once a system is purchased, it must also be calibrated to the shape of the room in which it is installed.
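To see why timing precision matters, here is a minimal sketch of delay-and-sum focusing, the textbook form of beamforming. It is not Microsoft’s software, and every detail is an assumption chosen for illustration: a straight-line array of 8 speakers spaced 4 cm apart, a single 2 kHz tone, free-field propagation with no wall reflections, and 343 m/s for the speed of sound. The function names are hypothetical. Each speaker’s firing is delayed so that its wavefront reaches the listener at the same instant as the others; random timing errors of a few tens of microseconds break that alignment.

```python
import cmath
import math
import random

SPEED_OF_SOUND = 343.0  # m/s, assumed value for room-temperature air


def speaker_positions(n, spacing):
    """Hypothetical straight-line array of n speakers centered on the origin."""
    offset = (n - 1) * spacing / 2
    return [(i * spacing - offset, 0.0) for i in range(n)]


def focus_delays(speakers, target):
    """Per-speaker firing delays (s) so all wavefronts reach `target` together."""
    travel = [math.dist(s, target) / SPEED_OF_SOUND for s in speakers]
    latest = max(travel)
    # The farthest speaker fires immediately; nearer speakers wait.
    return [latest - t for t in travel]


def level_at(speakers, delays, point, freq, timing_errors=None):
    """Magnitude of the summed free-field (1/r) response to a pure tone at `point`."""
    if timing_errors is None:
        timing_errors = [0.0] * len(speakers)
    total = 0j
    for s, d, e in zip(speakers, delays, timing_errors):
        r = math.dist(s, point)
        arrival = d + e + r / SPEED_OF_SOUND  # firing delay + timing error + travel time
        total += cmath.exp(-2j * math.pi * freq * arrival) / r
    return abs(total)


speakers = speaker_positions(8, 0.04)      # 8 speakers, 4 cm apart (assumed)
listener, bystander = (0.0, 1.0), (0.6, 0.8)
delays = focus_delays(speakers, listener)
freq = 2000.0                              # 2 kHz test tone (assumed)

rng = random.Random(1)
jitter = [rng.gauss(0.0, 50e-6) for _ in speakers]  # roughly 50 microsecond timing errors

focused = level_at(speakers, delays, listener, freq)
print(f"listener:  {focused:.2f}")
print(f"bystander: {level_at(speakers, delays, bystander, freq):.2f}")
print(f"listener with timing errors: {level_at(speakers, delays, listener, freq, jitter):.2f}")
```

Because the waves add fully in phase only at the listener, the level there is far higher than at the bystander’s position. Timing errors can only lower the level at the listener, which is the kind of degradation careful calibration is meant to prevent.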
Microsoft wants to develop software that is good enough to work with any speakers, with minimal calibration required at the factory or by users. To let generic speakers focus sound, Tashev and his group have modified well-known beamforming algorithms. They designed one part of the signal-processing algorithm, called a filter, to accommodate a wide range of manufacturing tolerances, that is, the variation in the data that describe speaker performance at various frequencies. “You have to know how those parameters vary,” Tashev says. “When you design the algorithm, you do it for multiple instances of speaker arrays.”
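The article does not describe how Microsoft’s filter is actually designed. As a rough illustration of the idea of designing “for multiple instances of speaker arrays,” the sketch below swaps in a much simpler technique: a Monte Carlo search. It simulates many speaker arrays whose per-speaker gain and timing vary randomly, then picks, from a few candidate amplitude tapers, the one whose contrast between the listener and nearby positions is best on average across all the simulated arrays. The tolerance figures (10 percent gain spread, 20 microsecond timing spread), the geometry, the listening points, and all function names are hypothetical, and the model ignores reflections.

```python
import cmath
import math
import random
import statistics

SPEED_OF_SOUND = 343.0  # m/s, assumed


def speaker_positions(n, spacing):
    """Hypothetical straight-line array of n speakers centered on the origin."""
    offset = (n - 1) * spacing / 2
    return [(i * spacing - offset, 0.0) for i in range(n)]


def level_at(speakers, weights, delays, point, freq, gains, timing_errors):
    """Free-field (1/r) tone response at `point`, with per-speaker gain and timing errors."""
    total = 0j
    for s, w, d, g, e in zip(speakers, weights, delays, gains, timing_errors):
        r = math.dist(s, point)
        arrival = d + e + r / SPEED_OF_SOUND
        total += w * g * cmath.exp(-2j * math.pi * freq * arrival) / r
    return abs(total)


def contrast_db(speakers, weights, delays, target, others, freq, gains, errors):
    """Energy at the target relative to the mean energy at the other points, in dB."""
    on = level_at(speakers, weights, delays, target, freq, gains, errors) ** 2
    off = statistics.fmean(
        level_at(speakers, weights, delays, p, freq, gains, errors) ** 2 for p in others
    )
    return 10 * math.log10(on / off)


def choose_taper(n=8, spacing=0.04, freq=2000.0, instances=200,
                 gain_tol=0.1, timing_tol=20e-6, seed=0):
    """Pick the amplitude taper with the best *average* contrast over many simulated arrays."""
    rng = random.Random(seed)
    speakers = speaker_positions(n, spacing)
    target = (0.0, 1.0)
    others = [(-0.8, 0.8), (-0.6, 0.8), (0.6, 0.8), (0.8, 0.8)]
    travel = [math.dist(s, target) / SPEED_OF_SOUND for s in speakers]
    delays = [max(travel) - t for t in travel]

    # Draw the manufacturing variations once, so every candidate is judged
    # on the same population of simulated speaker arrays.
    arrays = [
        ([rng.gauss(1.0, gain_tol) for _ in speakers],
         [rng.gauss(0.0, timing_tol) for _ in speakers])
        for _ in range(instances)
    ]

    edge = max(abs(x) for x, _ in speakers)
    scores = {}
    for alpha in (0.0, 0.25, 0.5, 0.75, 1.0):
        # Quadratic taper: alpha = 0 is uniform; larger alpha quiets the outer speakers.
        weights = [1.0 - 0.8 * alpha * (x / edge) ** 2 for x, _ in speakers]
        scores[alpha] = statistics.fmean(
            contrast_db(speakers, weights, delays, target, others, freq, g, e)
            for g, e in arrays
        )
    best = max(scores, key=scores.get)
    return best, scores


best, scores = choose_taper()
for alpha, score in scores.items():
    print(f"taper {alpha:.2f}: mean contrast {score:5.1f} dB")
print("chosen taper:", best)
```

Judging each candidate by its average over the whole simulated population, rather than by its performance on one ideal array, mirrors the trade-off described in the article: the chosen design is seldom the best for any single array, but it should hold up reasonably well across the spread of hardware.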
The trick, he says, is to find a happy medium among the different tolerances so that the resulting sound is comparable across speakers. This requires some fine-tuning, and the researchers are still working out the best way to incorporate the speaker tolerances. However, Tashev concedes that a generic beamforming algorithm will most likely sacrifice some performance. “You have to make some compromises,” he says.
Tashev points out that the project is still in its early stages. “Even if you have a good beamformer, it’s not enough,” he says. “You also have to have a sound localizer [such as a camera or specialized microphone array] that tells you where to point the beam.” Moreover, he says, to succeed, the beamforming algorithm would need to account for sound reflections from the walls and windows of an office.
“It’d be neat to see this out there,” says Stan Birchfield, professor of electrical and computer engineering at Clemson University, in Clemson, SC. Birchfield works on image-processing techniques that use cameras to identify a person’s location in order to improve the focus of microphone arrays. “Tracking is a really hard problem,” he says, and no one has yet found a way to solve it for an environment like an office. It’s encouraging that Microsoft is exploring the area, Birchfield adds, but until the company has plans for products, he’s “cautious of getting enthusiastic.”
Tashev says that commercializing this technology will require coordinating many factors, a process that could take up to three years even after a research prototype has been perfected. Perfecting the prototype will itself take time: Tashev says the group still needs to test the reliability of the algorithm with a number of speaker arrays. Then, to turn the work into a product, Microsoft will need to find the best way to integrate the algorithm into Windows Media Player, make sure drivers for the hardware are included in the operating system, and, Tashev says, find companies that are interested in manufacturing speakers for such an application. But if and when all this happens, the payoff will be great, he says: people will no longer need headsets to have a private Skype conversation or video conference.