MIT Technology Review



Google probably already knows what search terms you use, what Web pages you’re viewing, and what you write about in your e-mail – after all, that’s how it serves up the text ads targeted to the Web content on your screen.

Pretty soon, Google may also know what TV programs you watch – and could use that information to send you more advertising, leavened with information pertinent to a show.

A system recently outlined by researchers at Google amounts to personalized TV without the fancy set-top equipment required by previous (and failed) attempts at interactive television. Their prototype software, detailed in a conference presentation in Europe last June, uses a computer’s built-in microphone to listen to the sounds in a room. It then filters each five-second snippet of sound to pick out audio from a TV, reduces the snippet to a digital “fingerprint,” searches an Internet server for a matching fingerprint from a pre-recorded show, and, if it finds a match, displays ads, chat rooms, or other information related to that snippet on the user’s computer.
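The capture, fingerprint, and lookup steps described above can be sketched roughly as follows. This is a toy illustration, not Google's software: the energy-comparison fingerprint, the `synth_audio` helper, the `INDEX` table, and the exact-match lookup are all assumptions made for the example, and the step that filters TV audio out of room noise is left out entirely.

```python
def toy_fingerprint(samples, n_bits=32):
    """Reduce a snippet to an n_bits-bit integer (toy scheme, not Google's).

    The snippet is cut into n_bits + 1 equal chunks; bit i is set when
    chunk i + 1 carries more energy than chunk i.
    """
    chunk = len(samples) // (n_bits + 1)
    energies = [
        sum(s * s for s in samples[i * chunk:(i + 1) * chunk])
        for i in range(n_bits + 1)
    ]
    bits = 0
    for i in range(n_bits):
        bits = (bits << 1) | int(energies[i + 1] > energies[i])
    return bits


def synth_audio(amplitudes, chunk=100):
    """Hypothetical stand-in for recorded audio: one loudness level per chunk."""
    return [a if i % 2 == 0 else -a for a in amplitudes for i in range(chunk)]


# Server side: fingerprints of pre-recorded shows mapped to related content.
SHOW_A = synth_audio([(i * 7) % 5 + 1 for i in range(33)])
SHOW_B = synth_audio([(i * 3) % 4 + 1 for i in range(33)])
INDEX = {
    toy_fingerprint(SHOW_A): "Show A: chat room and related ads",
    toy_fingerprint(SHOW_B): "Show B: chat room and related ads",
}


def related_content(snippet):
    """Client side: fingerprint the snippet, then look up only the fingerprint."""
    return INDEX.get(toy_fingerprint(snippet))


print(related_content(SHOW_A))
print(related_content([0.0] * 3300))  # silence matches nothing
```

Only the integer fingerprint crosses from the client function to the index, which is the property the researchers point to when they say the raw room audio never leaves the computer.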

Letting Google listen in on your living-room activities may sound like a privacy nightmare. Given the recent firestorm over AOL’s accidental release of search records for 685,000 members, consumers are more sensitive than ever to how search companies might misuse personal information, deliberately or not.

But the fingerprinting technology used in the Google prototype makes it impossible for the company to eavesdrop on other sounds in the room, such as personal conversations, according to the Google team. In the end, the researchers say, the only personal information revealed is TV-watching preferences.

Google research director Peter Norvig predicts that the prototype, which uses an audio identification technique invented outside Google and applied to a uniquely large database of recorded sound, will eventually evolve into a product. And it’s attracted plenty of attention from technology watchers, who see a big potential payoff for Google and other companies if a system for bridging TV and Web content can be made practical. For now, though, it’s still an early-stage research project.

“We weren’t really pitching an application that we want to do here and now, but rather a concept,” says Michael Fink, lead researcher on the project. Fink works at the Interdisciplinary Center for Neural Computation at Hebrew University in Jerusalem and is spending the summer at Google. “We wanted to open people’s minds to the possibility of using ambient audio as a medium for querying web content,” he says.

Computer science researcher Yan Ke and colleagues at Carnegie Mellon University laid the groundwork for the idea when they created software that reduced audio segments to very small fingerprints. The program, which runs on a conventional PC, converts spurts of sound into two-dimensional graphs, and uses computer vision algorithms to weed out background noise and boil down the graphs to a few key features that can then be translated into electronic bits. In this way, one second of audio can be reduced to four bytes of information – meaning the fingerprints for an entire year of television programming would add up to no more than a few gigabytes, according to Fink.
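The storage estimate is easy to check with back-of-the-envelope arithmetic. The sketch below takes the four-bytes-per-second figure from the article; the 20-channel count is purely an assumption chosen to show how the total reaches the "few gigabytes" range.

```python
BYTES_PER_SECOND = 4                    # figure quoted in the article
SECONDS_PER_YEAR = 365 * 24 * 60 * 60   # 31,536,000


def fingerprint_storage_bytes(channels, years=1):
    """Bytes needed to store fingerprints for round-the-clock programming."""
    return channels * years * SECONDS_PER_YEAR * BYTES_PER_SECOND


one_channel = fingerprint_storage_bytes(1)
twenty_channels = fingerprint_storage_bytes(20)  # hypothetical channel count

print(f"{one_channel / 1e6:.1f} MB")      # 126.1 MB
print(f"{twenty_channels / 1e9:.2f} GB")  # 2.52 GB
```

At this rate, a single channel running around the clock comes to roughly 126 megabytes a year, so a few gigabytes would cover a few dozen channels.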

In Google’s prototype, the fingerprints alone are transmitted from a user’s home computer to the company’s audio database server, where they’re compared with the fingerprints from almost 100 hours of recorded video. A special algorithm developed by Fink and Google colleagues Michele Covell and Shumeet Baluja reduces the possibility of mismatches; in tests, the system achieved a “false positive” rate of between 1 percent and 6 percent, meaning that at most six times out of 100 it matched audio fingerprints from the user with the wrong snippet of audio from a recorded show (with irrelevant information showing up on the user’s screen as a result).
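The article does not describe the matching algorithm itself, so the following is only a generic sketch of how noisy fingerprints might be compared against a database: it slides the query along each stored show, counts differing bits (Hamming distance), and rejects anything above a threshold. The function names, the threshold, and the sample values are all hypothetical, and the false-positive definition used here (wrong matches as a share of all queries) is one reading of the figure quoted above.

```python
def hamming(a, b):
    """Number of bit positions in which two integer fingerprints differ."""
    return bin(a ^ b).count("1")


def best_match(query, database, max_distance):
    """Return (show, offset) of the closest window, or None if none is close enough.

    query: list of integer fingerprints from the user's computer.
    database: dict mapping show name to its list of integer fingerprints.
    """
    best, best_distance = None, None
    for name, prints in database.items():
        for offset in range(len(prints) - len(query) + 1):
            distance = sum(hamming(q, p) for q, p in zip(query, prints[offset:]))
            if best_distance is None or distance < best_distance:
                best, best_distance = (name, offset), distance
    if best_distance is not None and best_distance <= max_distance:
        return best
    return None


def false_positive_rate(results):
    """Share of queries matched to the wrong show; results holds (predicted, truth)."""
    wrong = sum(1 for predicted, truth in results
                if predicted is not None and predicted != truth)
    return wrong / len(results)


# Hypothetical database and a noisy copy of show_a[1:4] with two bits flipped.
DATABASE = {
    "show_a": [0x00000000, 0xFFFFFFFF, 0x0000FFFF, 0xFFFF0000, 0x0F0F0F0F],
    "show_b": [0xAAAAAAAA, 0x55555555, 0xF0F0F0F0, 0x33333333, 0xCCCCCCCC],
}
noisy_query = [0xFFFFFFFE, 0x0000FFFF, 0xFFFF0001]

print(best_match(noisy_query, DATABASE, max_distance=6))       # ('show_a', 1)
print(best_match([0x00FF00FF] * 3, DATABASE, max_distance=6))  # None
```

Raising `max_distance` tolerates more room noise but makes wrong matches more likely, which is the trade-off a false-positive rate measures.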


Tagged: Computing
