Hearing Machines

While hearing in machines lags far behind vision in machines, the potential is great, and researchers are beginning to make impressive progress.

Paris Smaragdis

January 3, 2007

Technology Review has invited members of the 2006 TR35 to tell us about their hopes for research in 2007. Paris Smaragdis explains the importance of improving machine hearing. Smaragdis, a member of the 2006 TR35, is a research scientist at Mitsubishi Electric Research Laboratories (MERL) in Cambridge, MA.

Understanding how we perceive the world, and using that knowledge to build machines that can mimic us, has been an ongoing and exciting scientific quest. Vision has had the lion's share of attention in the field. Our understanding of image structure and form is well developed. Machine learning and artificial intelligence (AI) have benefited immensely from, and have been heavily shaped by, vision problems. And we all understand why computers and ATMs come equipped with cameras nowadays. The other senses have not been investigated nearly as much. Few people think about machines that can hear. Sure, computers can (sort of) recognize speech, but is that all hearing is good for? Surely we do more with our ears than listen to other people talk.

Our thinking is so firmly grounded in vision that hearing, like our other senses, has become a subconscious process. But hearing is important for many tasks. You can hear your baby cry from upstairs; you can hear the car you didn't see approaching you in the pedestrian crosswalk; and you can hear that not-so-friendly dog growling behind your back. Machines can perform their own set of valuable hearing tasks. They can listen for survivors in the rubble of a collapsed building; they can help soldiers locate who shot at them; they can listen for breathing problems in patients in intensive care; and they can try to filter out that annoying neighbor who loves to sing really loudly in the shower.

The technical challenges in computational audition are plentiful. As in all fields of computational perception, a thrillingly large number of problems await exploration and will keep technologists busy for a long time. However, until the idea of a hearing machine captures the public imagination, these problems will remain at the fringe of computer science. And being at the fringe places an extra burden on those of us working in the field.

Our knowledge of human hearing is relatively limited. We know how our ears work, but our descriptions become largely improvised as neural signals travel deeper into the brain. The study of auditory psychology is not even close to where we want it to be. Machine learning, AI, and classical computer-science algorithms are deeply rooted in a visual way of thinking that does not extend naturally to reasoning about sound. Even our ability to describe sounds and the process of hearing is largely limited to vocabulary developed for music. These problems share a common cause: hearing, whether human or machine, has not attracted adequate attention. Because of this, creating a new technology in this field is a fight against the unknown, from finding bibliographical references and reducing problems to simpler ones, to explaining the point of it all in a business or technical meeting. Things are getting better, though. In the past few years, a growing number of researchers in computational perception have turned their attention to hearing (as well as to the senses of taste and smell), and we have seen some amazing progress in our field, along with the slow emergence of relevant products in the mainstream.

So keep your mind and ears open. You might not see much of hearing machines today, but you’ll be hearing about them soon.
