Technology Review - Published By MIT

Wednesday, October 11, 2006

Video Searching by Sight and Script

Researchers have designed an automated system to identify characters in television shows, paving the way for better video search.

By Brendan Borrell

Google's acquisition this week of YouTube.com has raised hopes that searching for video is going to improve. More than 65,000 videos are uploaded to YouTube each day, according to the website. With all that content, finding the right clip can be difficult.

Now researchers have developed a system that uses a combination of face recognition, closed-captioning information, and original television scripts to automatically name the faces that appear on screen, making episodes of the TV show Buffy the Vampire Slayer searchable.

"We basically see this work as one of the first steps in getting automated descriptions of what's happening in a video," says Mark Everingham, a computer scientist now at the University of Leeds (formerly of the University of Oxford), who presented his research at the British Machine Vision Conference in September.

Currently, video searches offered by AOL Video, Google, and YouTube do not search the content of a video itself, but instead rely primarily on "metadata," or text descriptions, written by users to develop a searchable index of Web-based media content.

Users frequently (and illegally) upload bits and pieces of their favorite sitcoms to video-sharing sites such as YouTube. For instance, a recent search for "Buffy the Vampire Slayer" turned up nearly 2,000 clips on YouTube, many of them viewed thousands of times. Most of these clips are less than five minutes long, and their descriptions are vague. One titled "A new day has come," for instance, is described by a user this way: "It mostly contains Buffy and Spike. It shows how Spike was there for Buffy until he died and she felt alone afterward."

Everingham says previous work in video search has used data from subtitles to find videos, but he's not aware of anyone using his method, which combines--in a technical tour de force--subtitles and script annotation. The script tells you "what is said and who said it" and subtitles tell you "what time something is said," he explains. Everingham's software combines those two sources of information with powerful tools previously developed to track faces and identify speakers without the need for user input.
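
To make the pairing of script and subtitles concrete, here is a minimal Python sketch of the idea: match each timed subtitle to the most similar script line and borrow that line's speaker name. It is an illustration only, not the researchers' implementation. The function names, the 0.6 similarity threshold, and the sample dialogue are all invented for this example, and simple fuzzy word matching from Python's standard library stands in for whatever alignment method the actual system uses.

```python
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase, drop punctuation, and split into words."""
    kept = "".join(ch for ch in text.lower() if ch.isalnum() or ch.isspace())
    return kept.split()


def align(script, subtitles, threshold=0.6):
    """Attach a speaker name from the script to each timed subtitle.

    script:    list of (speaker, line) pairs: who said what
    subtitles: list of (start_sec, end_sec, line) tuples: what was said when
    Returns a list of (start_sec, end_sec, speaker_or_None, line) tuples.
    The threshold is an arbitrary illustrative cutoff, not a published value.
    """
    labeled = []
    for start, end, sub_text in subtitles:
        best_speaker, best_score = None, 0.0
        for speaker, script_text in script:
            # Word-level similarity between the subtitle and the script line.
            score = SequenceMatcher(
                None, normalize(sub_text), normalize(script_text)
            ).ratio()
            if score > best_score:
                best_speaker, best_score = speaker, score
        # Leave the subtitle unlabeled when no script line is close enough.
        speaker = best_speaker if best_score >= threshold else None
        labeled.append((start, end, speaker, sub_text))
    return labeled


# Hypothetical sample data; the dialogue is made up for this sketch.
script = [
    ("BUFFY", "Where were you last night?"),
    ("SPIKE", "Out for a walk, that is all."),
]
subtitles = [
    (12.0, 13.5, "Where were you last night?"),
    (14.0, 16.0, "Out for a walk. That's all."),
    (20.0, 21.0, "[door slams]"),
]

for start, end, speaker, text in align(script, subtitles):
    print(f"{start:6.1f}-{end:6.1f}  {speaker or '?':6}  {text}")
```

The result is a list of time intervals, each tagged with the name of whoever is speaking. That tells a system who is talking and when, but not which face on screen belongs to the speaker, which is where the face-tracking and speaker-identification tools come in.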

What made the Buffy project such a challenge, Everingham says, is that in film and television, the person speaking is not always in the shot. The star, Buffy, may be speaking off-screen or facing away from the camera, for instance, and the camera will be showing you the listener's reactions. Other times, there may be multiple actors on the screen, or an actor may not be directly facing the camera. All of these ambiguities are easy for humans to interpret, but difficult for computers--at least until now. Everingham says their multimodal system is accurate up to 80 percent of the time.
