
Googling Your TV

Prototype software from Google Research could listen to your TV and send back useful information – and ads, of course.
August 24, 2006

Google probably already knows what search terms you use, what Web pages you’re viewing, and what you write about in your e-mail – after all, that’s how it serves up the text ads targeted to the Web content on your screen.

Peter Norvig, director of research at Google, says the company’s work on audio and video processing – such as a system that provides Web content matched to what’s playing on a user’s TV – will show up eventually in real products. (Credit: Google Inc.)

Pretty soon, Google may also know what TV programs you watch – and could use that information to send you more advertising, leavened with information pertinent to a show.

A system recently outlined by researchers at Google amounts to personalized TV without the fancy set-top equipment required by previous (and failed) attempts at interactive television. Their prototype software, detailed in a conference presentation in Europe last June, uses a computer’s built-in microphone to listen to the sounds in a room. It then filters each five-second snippet of sound to pick out audio from a TV, reduces the snippet to a digital “fingerprint,” searches an Internet server for a matching fingerprint from a pre-recorded show, and, if it finds a match, displays ads, chat rooms, or other information related to that snippet on the user’s computer.
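In outline, the client side of such a system is a simple capture-fingerprint-query-display loop. The Python sketch below is purely illustrative, not the team's code: record_snippet and query_server are hypothetical stubs standing in for a real audio-capture library and for Google's match server, and a cryptographic hash stands in for the actual noise-robust fingerprinting step.

```python
import hashlib
import time

SNIPPET_SECONDS = 5  # the prototype fingerprints five-second snippets of room audio

def record_snippet(seconds: int) -> bytes:
    """Capture ambient audio from the PC's built-in microphone.
    Hypothetical stub: a real client would call an audio-capture library."""
    return b"\x00" * (8000 * seconds)  # placeholder: silence

def query_server(fp: bytes) -> dict | None:
    """Send only the fingerprint, never the raw audio, to the match
    server; get back related content if a recorded show matches.
    Hypothetical stub."""
    return None

while True:
    snippet = record_snippet(SNIPPET_SECONDS)
    # Stand-in for the real fingerprint (sketched further below); a
    # cryptographic hash would be far too brittle for noisy room audio.
    fp = hashlib.sha1(snippet).digest()
    match = query_server(fp)
    if match is not None:
        print(match["show"], match["links"])  # ads, chat rooms, related pages
    time.sleep(SNIPPET_SECONDS)
```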

Letting Google listen in on your living-room activities may sound like a privacy nightmare. Given the recent firestorm over AOL’s accidental release of search records for 658,000 members, consumers are more sensitive than ever to how search companies might misuse personal information, deliberately or not.

But the fingerprinting technology used in the Google prototype makes it impossible for the company to eavesdrop on other sounds in the room, such as personal conversations, according to the Google team. In the end, the researchers say, the only personal information revealed is TV-watching preferences.

Google research director Peter Norvig predicts that the prototype, which uses an audio identification technique invented outside Google and applied to a uniquely large database of recorded sound, will eventually evolve into a product. And it’s attracted plenty of attention from technology watchers, who see a big potential payoff for Google and other companies if a system for bridging TV and Web content can be made practical. For now, though, it’s still an early-stage research project.

“We weren’t really pitching an application that we want to do here and now, but rather a concept,” says Michael Fink, lead researcher on the project. Fink works at the Interdisciplinary Center for Neural Computation at Hebrew University in Jerusalem and is spending the summer at Google. “We wanted to open people’s minds to the possibility of using ambient audio as a medium for querying web content,” he says.

Computer science researcher Yan Ke and colleagues at Carnegie Mellon University laid the groundwork for the idea when they created software that reduced audio segments to very small fingerprints. The program, which runs on a conventional PC, converts spurts of sound into two-dimensional graphs, and uses computer vision algorithms to weed out background noise and boil down the graphs to a few key features that can then be translated into electronic bits. In this way, one second of audio can be reduced to four bytes of information – meaning the fingerprints for an entire year of television programming would add up to no more than a few gigabytes, according to Fink.
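A toy version of that reduction fits in a few lines. The function below is a rough, hypothetical illustration of the idea rather than Ke’s published algorithm: it treats each second’s spectrum slices as the two-dimensional graph, keeps only the index of the loudest of 256 coarse frequency bands in each quarter-second window, and so emits exactly four bytes per second.

```python
import numpy as np

def fingerprint(samples: np.ndarray, rate: int = 8000) -> bytes:
    """Condense audio to four bytes per second (illustrative stand-in,
    not the published algorithm). Each second is split into four
    windows; each window is reduced to the index (0-255) of its
    loudest coarse frequency band, i.e. one byte per window."""
    out = bytearray()
    for start in range(0, len(samples) - rate + 1, rate):
        second = samples[start:start + rate]
        for window in np.array_split(second, 4):    # four windows per second
            spectrum = np.abs(np.fft.rfft(window))  # one slice of the "2-D graph"
            bands = np.array_split(spectrum, 256)   # 256 coarse frequency bands
            out.append(int(np.argmax([b.sum() for b in bands])))
    return bytes(out)

fp = fingerprint(np.random.randn(8000 * 5))  # five seconds -> 20 bytes
```

At that rate, a single channel’s round-the-clock year of programming comes to roughly 126 megabytes of raw fingerprints, so an index spanning many channels plausibly lands in Fink’s few-gigabyte range.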

In Google’s prototype, the fingerprints alone are transmitted from a user’s home computer to the company’s audio database server, where they’re compared with the fingerprints from almost 100 hours of recorded video. A special algorithm developed by Fink and Google colleagues Michele Covell and Shumeet Baluja reduces the possibility of mismatches; in tests, the system achieved a “false positive” rate of between 1 percent and 6 percent, meaning it matched a user’s audio fingerprint to the wrong snippet of recorded audio – putting irrelevant information on the user’s screen – at most six times out of 100.
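The matching step can be pictured as a nearest-neighbor search with an acceptance threshold. The brute-force sketch below is again illustrative, not the published algorithm: it slides a 20-byte query fingerprint (five seconds at four bytes per second) along each stored show’s fingerprint and accepts the best alignment only if it falls within a bit-error budget – the knob that trades missed matches against false positives like the 1-to-6 percent rate described above.

```python
def hamming(a: bytes, b: bytes) -> int:
    """Count the bits on which two equal-length fingerprints differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

def best_match(query: bytes, shows: dict[str, bytes], max_bits: int) -> str | None:
    """Slide `query` along every stored show fingerprint and return the
    show with the closest alignment, or None if nothing comes within
    `max_bits` differing bits. Brute force for clarity; a real server
    would use some form of index to avoid the full scan."""
    best_dist, best_show = max_bits + 1, None
    for name, fp in shows.items():
        for off in range(len(fp) - len(query) + 1):
            d = hamming(query, fp[off:off + len(query)])
            if d < best_dist:
                best_dist, best_show = d, name
    return best_show
```

Lowering max_bits makes mismatches – and the irrelevant on-screen content they cause – rarer, at the cost of missing genuine matches when the room is noisy.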

It’s accurate enough that TV viewers might find the supplementary content Google sends useful. Nicole Kidman fans, for instance, might enjoy knowing what dress she’s wearing on a broadcast of “Extra!” or where they can buy a similar outfit. Or ads for Mini Coopers might appear whenever the car shows up in a TV rebroadcast of “The Italian Job.”

All of this would work only if someone first manually annotated what is onscreen at any given moment in a broadcast. With the volume of TV programming broadcast every day, that would be a tedious job. But in some cases it could be worthwhile for Google and advertisers, Fink says. “Say I’m an advertiser, and I would like a link to my website to appear with a specific episode of Seinfeld. We could open each moment of audio to a bidding process. The Google model of advertisers bidding for related words on Web pages, which has proved to be very successful online, could be carried over.”
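In spirit, such an auction is keyword bidding with the keyword replaced by a show-and-moment pair. The sketch below is hypothetical – the show title comes from Fink’s example, but the advertisers, prices, and timestamp are invented for illustration:

```python
# Bids keyed by (show, second-offset); each entry is (advertiser link, bid price).
bids = {
    ("Seinfeld", 754): [("soup-recipes.example.com", 0.40),
                        ("diner-reviews.example.com", 0.25)],
}

def ad_for_moment(show: str, second: int) -> str | None:
    """Return the highest bidder's link for this moment, if anyone bid on it."""
    moment_bids = bids.get((show, second))
    if not moment_bids:
        return None
    link, _price = max(moment_bids, key=lambda bid: bid[1])
    return link

print(ad_for_moment("Seinfeld", 754))  # -> soup-recipes.example.com
```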

And the information Google sends doesn’t have to be one-way – the system could also invite viewers watching the same program simultaneously to join a chat room, or administer surveys.

When word of the research first appeared in the media, some bloggers and other technology watchers reacted with horror; many assumed that the background conversation picked up by the microphone in Google’s system would be uploaded to Google. But the technology makes that impractical: at four bytes per second – roughly 32 bits standing in for the tens of thousands of bits in even a second of telephone-quality audio – the fingerprints don’t contain enough information to reconstruct the original sounds in a room. “Some people did get the impression that we had an open microphone that was going to listen in on them,” says Norvig. “Clearly, that was not what we were doing. We are transmitting a key that can be matched but not reversed. That said, users are giving up some information – and that’s something they have to decide about.”

Whether users could adapt to this new form of monitoring is uncertain. But the revenue opportunities are clear – if the system works, that is. “It’s a devilishly clever way to bridge those old and new media technologies,” says Michael McGuire, an online media analyst at Gartner Research. But everything would depend on how accurately Google could match audio segments, he says. “You could imagine that if they were just a little bit off, it would drive you insane, in terms of the type of advertising you’re seeing. And if it was far removed from what you were watching, you’d be jarred and maybe angry.”

Fink’s team is working on making the false-positive rate even lower – so users don’t get Doritos ads with their Masterpiece Theatre. But there’s another challenge, notes McGuire: how to divide the attention of the viewer. “Presumably, broadcasters and advertisers wouldn’t want anything so absorbing on the computer that it pulls viewers away from the actual broadcast. And even though the crowd that surfs the Web while they watch TV already knows how to multitask, they might ignore or block the online media stream if it starts to get too obtrusive. So [Google] would have to find a balance between information overload and effective advertising.”
