Cell Phones That Listen and Learn

New software tracks a user’s behavior by monitoring everyday sounds.

Kristina Grifantini

June 22, 2009

Researchers are increasingly using cell phones to better understand users’ behavior and social interactions. The data collected from a phone’s GPS chip or accelerometer, for example, can reveal trends that are relevant to modeling the spread of disease, determining personal health-care needs, improving time management, and even updating social networks. The approach, known as reality mining, has also been suggested as a way to improve targeted advertising or make cell phones smarter: a device that knows its owner is in a meeting could automatically switch its ringer off, for example.

Now a group at Dartmouth College, in Hanover, NH, has created software that uses the microphone on a cell phone to track and interpret a user’s activity. The software, called SoundSense, picks up sounds and tries to classify them into certain categories. In contrast to similar software developed previously, SoundSense can recognize completely unfamiliar sounds, and it also runs entirely on the device. SoundSense automatically classifies sounds as “voice,” “music,” or “ambient noise.” If a sound is repeated often enough or for long enough, SoundSense gives it a high “sound rank” and asks the user to confirm that it is significant and offers the option to label the sound.
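The “sound rank” idea can be illustrated with a short sketch. This is not the Dartmouth team’s code: the class name `SoundRanker`, the cluster IDs, and both thresholds are hypothetical, and the sketch assumes some earlier stage has already grouped similar ambient sounds into clusters. It shows only the bookkeeping described above, in which a sound heard often enough or for long enough triggers a prompt asking the user to label it.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class SoundRanker:
    """Hypothetical tracker that flags ambient sounds heard often or for a long time."""

    min_occurrences: int = 5          # assumed threshold, not from the article
    min_total_seconds: float = 60.0   # assumed threshold, not from the article
    counts: dict = field(default_factory=lambda: defaultdict(int))
    durations: dict = field(default_factory=lambda: defaultdict(float))
    labels: dict = field(default_factory=dict)

    def is_significant(self, cluster_id: str) -> bool:
        """A sound counts as significant if it recurs enough or lasts long enough."""
        return (
            self.counts[cluster_id] >= self.min_occurrences
            or self.durations[cluster_id] >= self.min_total_seconds
        )

    def observe(self, cluster_id: str, seconds: float) -> bool:
        """Record one occurrence; return True if the user should be asked to label it."""
        self.counts[cluster_id] += 1
        self.durations[cluster_id] += seconds
        return self.is_significant(cluster_id) and cluster_id not in self.labels

    def label(self, cluster_id: str, name: str) -> None:
        """Store the user's label so the same sound does not prompt again."""
        self.labels[cluster_id] = name


ranker = SoundRanker(min_occurrences=3, min_total_seconds=30.0)
prompts = [ranker.observe("cluster-7", 4.0) for _ in range(3)]
print(prompts)  # [False, False, True]
```

Only counts, durations, and labels are kept here, never the audio itself, which is in the spirit of the on-device design the researchers describe.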

The Dartmouth team focused on monitoring sound because every phone has a microphone and because accelerometers provide only limited information. “When we think about sounds, we don’t typically think that they can also represent a location that has a unique signature,” says Andrew Campbell, a professor of computer science at Dartmouth and a lead researcher on the project. The researchers made sure the program is small, so that it doesn’t use too much power. To address privacy concerns, they designed SoundSense so that information is not removed from the device for processing. Additionally, the program itself doesn’t store raw audio clips. A user can also tell the software to ignore any sounds deemed off limits.

In testing, the SoundSense software was able to correctly determine when the user was in a particular coffee shop, walking outside, brushing her teeth, cycling, and driving in the car. It also picked up the noise of an ATM and a fan in a particular room. The results of the experiments will be presented this week at the MobiSys 2009 conference, in Krakow, Poland.

“The SoundSense system is our first step in building a system that can learn [user behavior] on the go,” says Tanzeem Choudhury, an assistant professor at Dartmouth who was also a leader on the project and a TR35 winner. Choudhury says that enabling the software to learn to recognize new sounds will be essential for practical applications. “A system that can recognize sounds in a person’s life can be used to search for others who have the same preferences,” she says. Using sounds to classify events can give users feedback on their daily activities for health or time-management applications, she adds.

**The phones have ears:** SoundSense listens to a user’s environment through a phone’s microphone and learns to connect certain sounds to activities.

Kurt Partridge, a researcher at Palo Alto Research Center, who has also created cell-phone software that tracks behavior, believes that the SoundSense project exploits an underused resource. “I don’t think the field has really realized both how little power audio-based activity-sensing takes, and how informative it can be,” Partridge says. “Audio can distinguish so many more activities [and] adds a social aspect to contextual sensing that’s not possible otherwise.”

Dan Ellis, an associate professor at Columbia University, who has researched the use of continuous audio recordings, says that this type of “life logging” could someday be used as routinely as the outbox in an e-mail application. “Maybe you don’t look at your outbox very often, but given the right tools to quickly find what you’re looking for, it’s very convenient to keep a record of every e-mail you’ve ever sent,” he says. “A near-continuous, audio-based record collected by a personal device could be similarly desirable.”
