While hearing in machines lags far behind vision in machines, the potential is great, and researchers are beginning to make impressive progress.
Technology Review has invited members of the 2006 TR35 to tell us about their hopes for research in 2007. Paris Smaragdis explains the importance of improving machine hearing. Smaragdis, a 2006 TR35, is a research scientist at MERL Research Lab, in Cambridge, MA.
Understanding how we perceive the world, and using that knowledge to make machines that can mimic us, has been an ongoing and exciting scientific quest. Vision has had the lion's share of attention in the field. Our understanding of image structure and form is well developed. The development of machine learning and artificial intelligence (AI) has immensely benefited from--and has been immensely influenced by--vision problems. And we all understand why computers and ATMs come equipped with cameras nowadays. The rest of the senses have not been investigated as much as vision has. Having a machine exhibit hearing is not something that people think about. Sure, computers can (sort of) recognize speech, but is that all hearing is good for? Surely we do more with our ears than just hear other people talk.
Our thinking is so concretely grounded in vision that hearing, like our other senses, has become a subconscious process. But hearing is important for a lot of tasks. You can hear your baby cry from upstairs; you can hear the car you didn't see approaching you in the pedestrian crosswalk; and you can hear that not-so-friendly dog growling behind your back. Machines can do their own set of valuable hearing tasks. They can listen for survivors in a collapsed building's rubble; they can help soldiers locate who shot at them; they can listen for breathing problems in patients in intensive care; and they can try to filter out that annoying neighbor who loves to sing really loudly in the shower.
The technical challenges in computational audition are plenty. As in all fields of computational perception, there is a thrillingly large number of problems awaiting exploration that will keep technologists busy for a long time. However, until the idea of a hearing machine captures the public imagination, these problems will stay at the fringe of computer science. And being at the fringe comes with an extra burden to those of us working in the field.
Our knowledge of human hearing is relatively limited. We know how our ears work, but we mostly improvise in our descriptions as neural signals move deeper into the brain. The study of auditory psychology is not even close to where we want it to be. Machine learning, AI, and classical computer-science algorithms are deeply rooted in a visual way of thinking that does not extend naturally to reasoning about sound. Our own ability to describe sounds and the process of hearing is predominantly limited to vocabulary developed for music. These problems share a common cause: hearing (whether human or machine) has not attracted adequate attention. Because of this, the process of creating a new technology in this field--from finding bibliographical references and abstracting to simpler problems, to actually explaining the point of it all in a business or technical meeting--is a fight against the unknown. Things are getting better, though: in the past few years an increasing number of researchers interested in computational perception have started showing interest in hearing (as well as in the senses of taste and olfaction), and we have seen some amazing progress in our field as well as the slow emergence of relevant products in the mainstream.
So keep your mind and ears open. You might not see much of hearing machines today, but you'll be hearing about them soon.
Much has been accomplished with machine hearing
(To the author) Much has been accomplished with machine hearing, so I disagree somewhat with your premise. Voice recognition is now fast and reliable. A new auto-attendant system from IBM can also sense your stress level. Work on emotion analysis in voice / audio streams is far advanced; for example:
Today, Affective Media Limited in Scotland is working to help computers better understand people in various stages of emotional stress. Affective Media even has an online demo with an animated character named Tetchy the Turtle, who accepts voice samples and analyzes them.
Dr Christian Jones, the chief executive of Affective Media, puts it this way:
"When you are depressed or sad, the pitch of your voice drops and your speech slows down. When you are angry, the pitch rises and the volume of your voice goes up. We betray our emotions as we talk in dozens of subtle ways. Our recognition system uses 40 of these. It ignores the words you use, and concentrates exclusively on the sound quality of speech. It can tell your emotional state the very first time it hears your voice."
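The two cues named in the quote, pitch and volume, can be made concrete with a small sketch. This is only a toy illustration and not Affective Media's system: the function names (`rms`, `estimate_pitch`, `tone`), the 8000 Hz sample rate, and the synthetic sine tones standing in for a "calm" and an "angry" voice are all assumptions made for the example, and a real recognizer would work on recorded speech with many more features.

```python
import math

def rms(samples):
    """Root-mean-square level, a rough proxy for loudness."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def estimate_pitch(samples, sample_rate, fmin=60.0, fmax=400.0):
    """Estimate the fundamental frequency (Hz) from the autocorrelation peak.

    Only lags corresponding to fmin..fmax are searched (an assumed range
    that roughly covers speaking voices).
    """
    min_lag = int(sample_rate / fmax)
    max_lag = int(sample_rate / fmin)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Correlate the signal with a copy of itself shifted by `lag` samples.
        score = sum(samples[i] * samples[i + lag]
                    for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag

def tone(freq, amplitude, sample_rate=8000, n_samples=800):
    """Synthetic sine tone used here as a stand-in for a voiced sound."""
    return [amplitude * math.sin(2 * math.pi * freq * i / sample_rate)
            for i in range(n_samples)]

# Hypothetical inputs: a low, quiet tone versus a higher, louder one.
calm = tone(100.0, 0.2)
angry = tone(200.0, 0.8)

for name, signal in (("calm", calm), ("angry", angry)):
    print(name, round(estimate_pitch(signal, 8000)), "Hz,",
          "level", round(rms(signal), 3))
```

On these synthetic inputs the second tone comes out both higher in pitch and louder, which is the pattern the quote associates with anger. Speech rate, the third cue mentioned, is not modeled here.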
Affective Media is planning for a future in which it will be important that machines are able to understand the different states of their human colleagues. "Soon we will talk to our cars. We will give them voice commands to turn on CD players, heaters and fans," said Jones. "Using emotion recognition, those commands would also show if we are angry, frustrated, or sleepy."
URL: http://www.affectivemedia.com/demo.htm
In addition to this company, several others are incorporating voice / emotion analysis technology in their relational agents / chatbot avatars to greatly improve the performances of their characters. It is also being combined with translation software for improved nuance. The most 'showy' example of this integration is the Korean 'android' EveR2 Muse. She can hear what you say and chat in response.
Re: Much has been accomplished with machine hearing
I think Paris is talking about the lack of knowledge not in the field of voice recognition systems but about everyday sounds in general.
We know how our ears work?
I attended a talk last year about the mechanical functioning of the ear, and it seems that there is an enormous amount we don't know about how the ear works. In particular, the ear appears to be an active (rather than passive) receiver, possibly partially explaining its sensitivity and ability to handle so many orders of magnitude of sound intensity.
stan@adnamis.org
Re: We know how our ears work?
It can be argued that all our senses are active, if for no other reason than noise discrimination to reduce processing load. That is, all sensory data is preprocessed before being passed along to higher brain functions such as pattern recognition.