Now that we’ve all become news junkies, we’re struck again by how much junk news is out there, especially on TV, where you may have to watch hours of car chases to find the five minutes that interest you. Microsoft has turned its heavy guns on the problem, with technology it hopes will bring order to video chaos.
At its tenth anniversary this fall, Microsoft Research unveiled prototype software that analyzes and indexes news footage. Q-video, as Microsoft calls it, lets viewers find a CNN story as easily as a New York Times article.
“The objective is to provide an intelligent and interactive viewing experience to users, and technologies for next generation TV,” says HongJiang Zhang, who heads the Q-video effort at Microsoft’s research center in Beijing, China.
Combine and Conquer
Q-video works by blending a variety of technologies, from face recognition to image analysis to natural language processing. Although researchers have developed a number of ways to search video (see Upstream: Video Searching), Zhang’s group may be the first to combine so many different technologies, an approach that promises to categorize clips more accurately.
First, the software tries to break down a video into distinct stories, using both visual and audio cues. Face and voice recognition identify news anchors and reporters. Music and so-called “keyframes” (a particular view of the studio, for example) signal story transitions. Q-video compares this information to identify a story’s beginning and end.
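The idea of combining cues to find story boundaries can be illustrated with a small sketch. This is not Microsoft’s implementation: the `Shot` record, the four cue flags, and the “at least two cues must agree” rule are all assumptions made for illustration, and the cues themselves are presumed to have been detected already.

```python
from dataclasses import dataclass


@dataclass
class Shot:
    """One shot of a broadcast, with hypothetical pre-computed cue flags."""
    start: float                    # seconds from the start of the broadcast
    anchor_face: bool = False       # face recognition matched a known anchor
    anchor_voice: bool = False      # voice recognition matched a known anchor
    transition_music: bool = False  # transition music detected
    studio_keyframe: bool = False   # frame matches a known studio view


def cue_count(shot):
    """Number of independent cues that fired for this shot."""
    return sum([shot.anchor_face, shot.anchor_voice,
                shot.transition_music, shot.studio_keyframe])


def story_boundaries(shots, min_cues=2):
    """Start times of shots where enough cues agree.

    Consecutive boundary shots are merged, so an anchor who stays on
    screen across two shots opens only one story.
    """
    boundaries = []
    previous_was_boundary = False
    for shot in shots:
        is_boundary = cue_count(shot) >= min_cues
        if is_boundary and not previous_was_boundary:
            boundaries.append(shot.start)
        previous_was_boundary = is_boundary
    return boundaries


def stories(shots, end_time, min_cues=2):
    """Pair each boundary with the next one to get (start, end) spans."""
    starts = story_boundaries(shots, min_cues)
    ends = starts[1:] + [end_time]
    return list(zip(starts, ends))
```

Requiring agreement between cues is one simple way to reduce false alarms: a lone anchor face in the middle of a report does not, by itself, start a new story.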
Next, Q-video attempts to categorize the story, by analyzing the closed caption text. If the story isn’t captioned, Q-video relies on speech recognition. The software also uses optical character recognition to extract words displayed onscreen, such as “Live at Camp David.”
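A rough sketch of this step, under stated assumptions: captions are preferred, a speech-recognition transcript is the fallback, and onscreen words are appended. The keyword lists and category names below are invented for illustration, and simple keyword counting stands in for whatever language analysis Q-video actually performs.

```python
import re
from collections import Counter

# Hypothetical categories and keywords, for illustration only.
CATEGORY_KEYWORDS = {
    "politics": {"president", "congress", "election", "senate"},
    "business": {"stocks", "market", "earnings", "shares"},
    "weather": {"storm", "forecast", "rain", "snow"},
}


def story_text(captions=None, transcript=None, onscreen=()):
    """Prefer closed captions, fall back to the speech transcript,
    then append any words read off the screen."""
    spoken = captions if captions else (transcript or "")
    return " ".join([spoken, *onscreen])


def categorize(text):
    """Pick the category whose keywords occur most often in the text."""
    words = Counter(re.findall(r"[a-z']+", text.lower()))
    scores = {
        category: sum(words[w] for w in keywords)
        for category, keywords in CATEGORY_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "uncategorized"
```

For an uncaptioned story, `story_text(transcript=..., onscreen=["Live at Camp David"])` would feed both the recognized speech and the onscreen words to `categorize`.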
Users could type (or, one day, say) “Osama bin Laden” into a set-top box and select from a list of relevant stories. In addition, filters would gather the news a viewer cares most about: investors could monitor their portfolio companies, parents the local schools or commuters the traffic report.
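Once stories are indexed as text, both the one-off search and the standing filters reduce to matching terms against that text. The sketch below assumes a hypothetical `Story` record and plain substring matching; a real system would presumably use a proper search index and ranking.

```python
from dataclasses import dataclass


@dataclass
class Story:
    """A segmented news story with its indexed text (hypothetical record)."""
    title: str
    text: str


def search(stories, query):
    """Stories mentioning the query, most mentions first."""
    q = query.lower()
    hits = [(s.text.lower().count(q), s) for s in stories]
    hits = [(n, s) for n, s in hits if n > 0]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [s for _, s in hits]


def apply_filter(stories, interests):
    """Stories mentioning any of a viewer's standing interests."""
    terms = [t.lower() for t in interests]
    return [s for s in stories if any(t in s.text.lower() for t in terms)]
```

A commuter’s filter might be `apply_filter(stories, ["traffic", "highway"])`, while a parent’s might list the names of local schools.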
Zhang cautions that Q-video is “far from being a product.” However, he says that Microsoft may soon integrate some of its features into enhanced TV offerings such as MSN TV (a Web-enabled TV service) and UltimateTV, a combination of Web-enabled satellite TV and a TiVo-like personal video recorder.
He Shoots, He Scores!
Researchers have been tackling the problem of video search for more than two decades, says Walter Bender, executive director of MIT’s Media Lab and a pioneer of video-search technology.
In the early 1980s, soon after closed captioning was introduced, Bender and colleagues at MIT built a keyword-based system to analyze the captioning of TV news stories, automatically categorize them and augment them with additional information.
At Columbia University’s Center for Telecommunications Research, Professor Shih-Fu Chang leads several projects that seek to extract and categorize highlights from sports video. One project uses scene detection and structure analysis to find highlights of tennis and baseball games. Another distinguishes wide, narrow and close-up shots in soccer game footage by analyzing how much of the frame is occupied by grass.
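The grass-based approach lends itself to a short illustration. The sketch below is not the Columbia group’s code: the green-dominance test, the thresholds, and the assumption that more visible grass means a wider shot are all illustrative guesses. A frame is represented as a non-empty grid of (red, green, blue) pixel tuples.

```python
def is_grass(pixel):
    """Crude test: green is reasonably bright and dominates red and blue."""
    r, g, b = pixel
    return g > 80 and g > 1.2 * r and g > 1.2 * b


def grass_ratio(frame):
    """Fraction of pixels in the frame that look like grass."""
    pixels = [p for row in frame for p in row]
    return sum(is_grass(p) for p in pixels) / len(pixels)


def classify_shot(frame, wide_min=0.6, narrow_min=0.25):
    """Label a frame by how much of it is grass (thresholds are assumed)."""
    ratio = grass_ratio(frame)
    if ratio >= wide_min:
        return "wide"
    if ratio >= narrow_min:
        return "narrow"
    return "close-up"
```

In practice the thresholds would need tuning per stadium and lighting condition, since the color of the pitch varies from broadcast to broadcast.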
One problem still to be solved, Bender says, is personalization. “I don’t work or live alone,” he says. “The communities in which I interact have an impact on me.” To decide what is important to a particular user at a specific time, a personalized search may one day take into account such information as entries in an appointment calendar or the identities of recent callers and e-mail correspondents.
Even then, many people may still prefer to rely on the serendipitous discovery of new information as much as on automated news systems. After all, what once seemed an obscure event on the far side of the globe may suddenly become crucial information.