MIT Technology Review



Technology Review has invited members of the 2006 TR35 to tell us about their hopes for research in 2007. Paris Smaragdis explains the importance of improving machine hearing. Smaragdis, a 2006 TR35, is a research scientist at MERL Research Lab, in Cambridge, MA.

Understanding how we perceive the world, and using that knowledge to make machines that can mimic us, has been an ongoing and exciting scientific quest. Vision has had the lion’s share of attention in the field. Our understanding of image structure and form is well developed. The development of machine learning and artificial intelligence (AI) has benefited immensely from, and has been immensely influenced by, vision problems. And we all understand why computers and ATMs come equipped with cameras nowadays. The rest of the senses have not been investigated as much as vision has. Having a machine exhibit hearing is not something that people think about. Sure, computers can (sort of) recognize speech, but is that all hearing is good for? Surely we do more with our ears than just hear other people talk.

Our thinking is so concretely grounded in vision that hearing, as well as our other senses, has become a subconscious process. But hearing is important for a lot of tasks. You can hear your baby cry from upstairs; you can hear the car you didn’t see approaching you in the pedestrian crosswalk; and you can hear that not-so-friendly dog growling behind your back. Machines can do their own set of valuable hearing tasks. They can listen for survivors in a collapsed building’s rubble; they can help soldiers locate who shot at them; they can listen for breathing problems in patients in intensive care; and they can try to filter out that annoying neighbor who loves to sing really loudly in the shower.

The technical challenges in computational audition are plentiful. As in all fields of computational perception, there is a thrillingly large number of problems awaiting exploration that will keep technologists busy for a long time. However, until the idea of a hearing machine captures the public imagination, these problems will stay at the fringe of computer science. And being at the fringe comes with an extra burden for those of us working in the field.

Our knowledge of human hearing is relatively limited. We know how our ears work, but we mostly improvise in our descriptions as neural signals move deeper into the brain. The study of auditory psychology is not even close to where we want it to be. Machine learning, AI, and classical computer-science algorithms are deeply rooted in a visual way of thinking that does not extend naturally to reasoning about sound. Our own ability to describe sounds and the process of hearing is predominantly limited to vocabulary developed for music. These problems share a common cause: hearing (whether human or machine) has not attracted adequate attention. Because of this, the process of creating a new technology in this field (from finding bibliographical references and abstracting to simpler problems, to actually explaining the point of it all in a business or technical meeting) is a fight against the unknown. Things are getting better, though: in the past few years an increasing number of researchers interested in computational perception have started showing interest in hearing (as well as in the senses of taste and olfaction), and we have seen some amazing progress in our field as well as the slow emergence of relevant products in the mainstream.

So keep your mind and ears open. You might not see much of hearing machines today, but you’ll be hearing about them soon.


Tagged: Communications, artificial intelligence, machine learning, vision, hearing
