Video Searching by Sight and Script

Researchers have designed an automated system to identify characters in television shows, paving the way for better video search.

Google’s acquisition this week of YouTube.com has raised hopes that searching for video is going to improve. More than 65,000 videos are uploaded to YouTube each day, according to the website. With all that content, finding the right clip can be difficult.

Now researchers have developed a system that uses a combination of face recognition, closed-captioning information, and original television scripts to automatically name the faces that appear on screen, making episodes of the TV show Buffy the Vampire Slayer searchable.

“We basically see this work as one of the first steps in getting automated descriptions of what’s happening in a video,” says Mark Everingham, a computer scientist now at the University of Leeds (formerly of the University of Oxford), who presented his research at the British Machine Vision Conference in September.

Currently, video searches offered by AOL Video, Google, and YouTube do not search the content of a video itself, but instead rely primarily on “metadata,” or text descriptions, written by users to develop a searchable index of Web-based media content.

Users frequently (and illegally) upload bits and pieces of their favorite sitcoms to video-sharing sites such as YouTube. For instance, a recent search for “Buffy the Vampire Slayer” turned up nearly 2,000 clips on YouTube, many of them viewed thousands of times. Most of these clips are less than five minutes long, and the descriptions are vague. One titled “A new day has come,” for instance, is described by a user this way: “It mostly contains Buffy and Spike. It shows how Spike was there for Buffy until he died and she felt alone afterward.”

Everingham says previous work in video search has used data from subtitles to find videos, but he’s not aware of anyone using his method, which, in the work’s technical tour de force, combines subtitles and script annotation. The script tells you “what is said and who said it” and subtitles tell you “what time something is said,” he explains. Everingham’s software combines those two sources of information with powerful tools previously developed to track faces and identify speakers without the need for user input.
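To make the script-plus-subtitle idea concrete, here is a minimal sketch in Python. It is not Everingham's software: it assumes the script is available as (speaker, line) pairs and the subtitles as (start, end, line) triples, and it uses simple fuzzy text matching from Python's standard `difflib` module as a stand-in for whatever alignment the researchers actually use. The function names `normalize` and `align`, the 0.6 similarity threshold, and the data layout are all hypothetical.

```python
import re
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase and drop punctuation so small formatting differences do not matter."""
    return re.sub(r"[^a-z0-9 ]", "", text.lower()).strip()


def align(script, subtitles, threshold=0.6):
    """Attach a speaker name to each timed subtitle.

    script:    list of (speaker, line) pairs: what is said and who said it
    subtitles: list of (start_sec, end_sec, line) triples: when it is said
    Returns (start_sec, end_sec, speaker, line) tuples. The speaker is None
    when no script line is similar enough (threshold is an assumed value).
    """
    result = []
    for start, end, sub_text in subtitles:
        best_speaker, best_score = None, 0.0
        for speaker, script_text in script:
            score = SequenceMatcher(
                None, normalize(sub_text), normalize(script_text)
            ).ratio()
            if score > best_score:
                best_speaker, best_score = speaker, score
        if best_score < threshold:
            best_speaker = None  # no confident match in the script
        result.append((start, end, best_speaker, sub_text))
    return result
```

Given a script line attributed to a character and a subtitle with nearly the same words, `align` returns the subtitle's timestamps together with that character's name, which is the "who spoke when" information the rest of the system builds on. Comparing every subtitle with every script line is slow and ignores line order; a real system would likely exploit the fact that both texts run in the same sequence.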

What made the Buffy project such a challenge, Everingham says, is that in film and television, the person speaking is not always in the shot. The star, Buffy, may be speaking off-screen or facing away from the camera, for instance, while the camera shows the listener’s reactions. Other times, there may be multiple actors on the screen, or an actor may not be directly facing the camera. All of these ambiguities are easy for humans to interpret, but difficult for computers, at least until now. Everingham says their multimodal system is accurate up to 80 percent of the time.

A single episode of Buffy can have up to 20,000 instances of detected faces, but most of these instances arise from multiple frames of a single character in any given shot. The software tracks key “landmarks” on actors’ faces (nostrils, pupils, and eyes, for instance), and if one of them overlaps with the next frame, the two faces are considered part of a single track. If these landmarks are unclear, though, the software uses a description of clothing to unite two “broken” face tracks. Finally, the software also watches actors’ lips to identify who is speaking or whether the speaker is off screen. Ultimately, the system produces a detailed, play-by-play annotation of the video.
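The step of merging per-frame detections into tracks can be sketched roughly as follows. This is an illustration only, under simplifying assumptions: each detected face is reduced to a bounding box (x1, y1, x2, y2), and two detections in consecutive frames are linked when their boxes overlap enough, measured by intersection-over-union. The article describes overlap of facial landmarks rather than boxes, so the box test, the names `iou` and `link_tracks`, and the 0.5 threshold are all assumptions, not the researchers' method.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def link_tracks(frames, min_iou=0.5):
    """Group per-frame face boxes into tracks.

    frames: one list of boxes per video frame, in order.
    Returns a list of tracks; each track is a list of (frame_index, box).
    A box joins a track only if that track was seen in the previous frame
    and the two boxes overlap by at least min_iou (an assumed threshold).
    """
    tracks = []
    open_tracks = []  # tracks that were extended in the previous frame
    for idx, boxes in enumerate(frames):
        unclaimed = list(open_tracks)
        next_open = []
        for box in boxes:
            match, best = None, min_iou
            for track in unclaimed:
                score = iou(track[-1][1], box)
                if score >= best:
                    match, best = track, score
            if match is not None:
                # Each existing track can absorb at most one box per frame.
                unclaimed = [t for t in unclaimed if t is not match]
                match.append((idx, box))
            else:
                match = [(idx, box)]  # start a new track
                tracks.append(match)
            next_open.append(match)
        open_tracks = next_open
    return tracks
```

In this sketch a single frame without a detection ends the track, which is exactly the kind of "broken" track that the article says the software repairs with a clothing description.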

“The general idea is that you want to get more information without having people capture it,” says Alex Berg at the Computer Vision Group at the University of California, Berkeley. “If you want to find a particular scene with a character, you have to first find the scenes that contain that character.” He says that Everingham’s research will pave the way for more complex searches of television programming.

Computer scientist Josef Sivic at Oxford’s Visual Geometry Group, who contributed to the Buffy project, says that in the future it will be possible to search for high-level concepts like “Buffy and Spike walking toward the camera hand-in-hand” or all outdoor scenes that contain Buffy.

Timothy Tuttle, vice president of AOL Video, says, “It seems like over the next five to ten years, more and more people will choose what to watch on their own schedule and they will view content on demand.” He also notes that the barrier to adopting technologies like Everingham’s may no longer be technical, but legal.

These legal barriers have been coming down with print media because companies have reaped the financial benefits of searchable content: Google’s Book Scan and Amazon’s search programs have been shown to boost book sales over the last two years.

But it’s unclear whether a searchable video can increase DVD sales in the same way. Currently, Google offers teasers of premium video content, says staff scientist Michele Covell. For certain genres, like sports videos, it’s becoming easier to select a teaser clip that will encourage people to buy the video, she says.

Shumeet Baluja, a staff research scientist at Google, agrees that annotating video over the Web will be a challenge, but over time they’ll be able to add more and more metadata to popular video clips offline, which will improve the speed and accuracy of searches.
