A single episode of Buffy can contain up to 20,000 instances of detected faces, but most of these instances are multiple frames of the same character within a single shot. The software tracks key “landmarks” on actors’ faces, such as the nostrils, pupils, and eyes. If the landmarks of a face in one frame overlap with those of a face in the next frame, the two faces are considered part of a single track. If these landmarks are unclear, though, the software uses a description of the character’s clothing to unite two “broken” face tracks. Finally, the software also watches actors’ lips to identify who is speaking or whether the speaker is off screen. The result is a detailed, play-by-play annotation of the video.
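The article does not spell out the exact matching rule the software uses, so the following is only a hypothetical sketch of the frame-to-frame linking idea. It assumes each face detection is a set of named landmark points (`Detection`, `link_tracks`, and the landmark names are all illustrative), and it treats "overlap" as the intersection-over-union of the landmarks' bounding boxes exceeding an arbitrary threshold of 0.3. The clothing-based repair of broken tracks and the lip-based speaker detection are not shown.

```python
from dataclasses import dataclass

Point = tuple[float, float]
Box = tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


@dataclass(eq=False)  # compare detections by identity, not by field values
class Detection:
    frame: int
    landmarks: dict[str, Point]  # hypothetical names, e.g. {"nose": (x, y)}


def bbox(landmarks: dict[str, Point]) -> Box:
    """Axis-aligned box enclosing all landmark points of one face."""
    xs = [p[0] for p in landmarks.values()]
    ys = [p[1] for p in landmarks.values()]
    return (min(xs), min(ys), max(xs), max(ys))


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two boxes (0.0 when they do not overlap)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def link_tracks(frames: list[list[Detection]], threshold: float = 0.3) -> list[list[Detection]]:
    """Greedily chain detections in consecutive frames into face tracks.

    A detection joins a track if its landmark box overlaps the track's most
    recent detection (from the previous frame) by at least `threshold` IoU;
    otherwise it starts a new track. The threshold is an assumed value.
    """
    tracks: list[list[Detection]] = []
    active: list[list[Detection]] = []  # tracks that were extended in the previous frame
    for detections in frames:
        unmatched = list(detections)
        still_active: list[list[Detection]] = []
        for track in active:
            last_box = bbox(track[-1].landmarks)
            best = max(unmatched, key=lambda d: iou(last_box, bbox(d.landmarks)), default=None)
            if best is not None and iou(last_box, bbox(best.landmarks)) >= threshold:
                track.append(best)
                unmatched.remove(best)
                still_active.append(track)
        # Any face not matched to an existing track begins a new one.
        for det in unmatched:
            new_track = [det]
            tracks.append(new_track)
            still_active.append(new_track)
        active = still_active
    return tracks
```

Greedy matching keeps the sketch short; a real system would more likely solve a global assignment between frames and, as described above, fall back on other cues such as clothing when landmarks are unreliable.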
“The general idea is that you want to get more information without having people capture it,” says Alex Berg at the Computer Vision Group at University of California, Berkeley. “If you want to find a particular scene with a character, you have to first find the scenes that contain that character.” He says that Everingham’s research will pave the way for more complex searches of television programming.
Computer scientist Josef Sivic at Oxford’s Visual Geometry Group, who contributed to the Buffy project, says that in the future it will be possible to search for high-level concepts like “Buffy and Spike walking toward the camera hand-in-hand,” or for all outdoor scenes that contain Buffy.
Timothy Tuttle, vice president of AOL Video, says, “It seems like over the next five to ten years, more and more people will choose what to watch on their own schedule and they will view content on demand.” He also notes that the barrier to adopting technologies like Everingham’s may no longer be technical but legal.
These legal barriers have been coming down for print media because companies have reaped the financial benefits of searchable content: Google’s Book Scan and Amazon’s search programs have been shown to boost book sales over the last two years.
But it’s unclear whether searchable video can increase DVD sales in the same way. Currently, Google offers teasers of premium video content, says staff scientist Michele Covell. For certain genres, like sports videos, it’s becoming easier to select a teaser clip that will encourage people to buy the video, she says.
Shumeet Baluja, a staff research scientist at Google, agrees that annotating video over the Web will be a challenge, but he says that over time the company will be able to add more and more metadata to popular video clips offline, which will improve the speed and accuracy of searches.