Searching Sportscasts

A new way to search video could help fans find footage.

A new kind of visual-search engine has been developed to automatically scour sports footage for clips showing specific types of action and events. According to its creators, borrowing a few tricks from the field of machine translation seems to make all the difference in improving the accuracy of video search.

Event tracking: Researchers at MIT have developed a new kind of video search specifically for finding key plays in sporting events. The system combines a search of a text transcript of the announcers’ voices with a search for visual elements in the video.

Despite recent advances in visual-search engines, accurate video search remains a challenge, particularly when dealing with sports footage, says Michael Fleischman, a computer scientist at MIT. “The difference between a home run and a foul ball is often hard for a human novice to notice, and nearly impossible for a machine to recognize.”

To cope with growing video repositories, cutting-edge systems are now emerging that use automatic speech recognition (ASR) to generate text transcripts and so improve search accuracy. (See “More-Accurate Video Search.”)

The trouble is, search terms are often repeated out of context, says Fleischman. This is particularly the case in sports footage, such as baseball, in which commentators frequently talk about home runs and other events regardless of what is actually happening on the field.

To address this issue, Fleischman and Deb Roy, director of MIT’s Cognitive Machines Group, developed a system that provides a way to associate search terms with aspects of the video, and not just with what is being said as the video plays. “We collect hundreds of hours of baseball games and automatically encode all the video based on features, such as how much grass is visible and whether there is cheering in the background,” says Fleischman.

Using machine-learning algorithms, researchers analyze these video clips to identify discrete temporal “events” by extracting patterns in the different types of shots and the order in which they occur. For example, a fly ball could be described as a sequence involving a camera panning up and a camera panning down, which also occurs during a field scene and before a pitching scene.
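As a rough illustration of the idea of events as recurring shot sequences, the sketch below counts how often each fixed-length run of shot labels appears in a game and keeps the frequent ones. The shot labels, the `frequent_patterns` helper, and the thresholds are all hypothetical; the article does not describe the researchers' actual pattern-mining algorithm, which is presumably more sophisticated.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def frequent_patterns(
    shots: Sequence[str], length: int, min_count: int
) -> Dict[Tuple[str, ...], int]:
    """Count every contiguous run of `length` shot labels and keep
    those that occur at least `min_count` times."""
    counts = Counter(
        tuple(shots[i:i + length]) for i in range(len(shots) - length + 1)
    )
    return {pattern: n for pattern, n in counts.items() if n >= min_count}


# Hypothetical shot labels for a short stretch of a broadcast.
shots = [
    "pitching", "field", "pan_up", "pan_down",
    "pitching", "field", "pan_up", "pan_down",
    "pitching", "close_up",
]

patterns = frequent_patterns(shots, length=3, min_count=2)
for pattern, n in patterns.items():
    print(pattern, n)
```

In this toy sequence, the run of a pan up, a pan down, and a return to the pitching shot recurs, which is the kind of regularity a system could treat as a candidate event without anyone hand-designing it.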

The search system then tries to map these events to words that appear in the transcript text by looking at their probabilistic distribution. According to Fleischman, this technique is commonly used in automatic machine translation, in which words from one language are automatically mapped onto words from another, even though they may appear in completely different orders or at different frequencies. In this case, it’s a matter of translating video into audio, Fleischman says. The system tries to find the best “translation” of the events in the video into the words uttered by the announcer.
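One classic word-alignment technique from statistical machine translation is IBM Model 1, which estimates the probability of a target word given a source word with expectation-maximization. The sketch below applies it to event labels and transcript words. It is an assumption that this resembles what the MIT system does: the article names only the general family of techniques, and the `train_model1` function, event labels, and toy clips here are invented for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Clip = Tuple[Sequence[str], Sequence[str]]  # (video events, transcript words)


def train_model1(clips: List[Clip], iterations: int = 10) -> Dict[Tuple[str, str], float]:
    """Estimate t[(word, event)] = P(word | event) with IBM Model 1 EM."""
    vocab = {w for _, words in clips for w in words}
    # Start from a uniform distribution over transcript words.
    t = defaultdict(lambda: 1.0 / len(vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected (word, event) alignments
        total = defaultdict(float)  # expected alignments per event
        for events, words in clips:
            for w in words:
                # E-step: share each word among the events in the same clip.
                norm = sum(t[(w, e)] for e in events)
                for e in events:
                    frac = t[(w, e)] / norm
                    count[(w, e)] += frac
                    total[e] += frac
        # M-step: renormalize the expected counts per event.
        for (w, e), c in count.items():
            t[(w, e)] = c / total[e]
    return dict(t)


# Toy data: each clip pairs hypothetical event labels with announcer words.
clips = [
    (["pitch", "swing_miss"], ["the", "strike"]),
    (["pitch", "fly_ball"], ["the", "homer"]),
    (["walk_up", "fly_ball"], ["a", "homer"]),
]

t = train_model1(clips)
best_word = max(
    (w for (w, e) in t if e == "fly_ball"),
    key=lambda w: t[(w, "fly_ball")],
)
print(best_word)
```

Even though "the" and "a" also co-occur with the `fly_ball` event, the iterations shift probability toward "homer", because the other words are better explained by the events they appear with elsewhere.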

Once a new video clip is encoded using such patterns, the system looks for co-occurrences between the matched patterns and phrases. “In this way, the system is able to find correlations with events in the game, without requiring a human to explicitly design representations for any specific events,” says Fleischman.

Giving precise figures on the accuracy of the system is difficult because there is no standard for judging. Even so, trials carried out by Fleischman and Roy involving searching six baseball games for occurrences of home runs showed promise. Visual search alone yielded poor results, as did speech alone. “However, when you combine the two sources of information, we have seen results that nearly double the performance of either one on their own,” says Fleischman.
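The article does not say how the two sources are combined. One simple possibility, shown here purely as an assumption, is to rank clips by a weighted average of a speech-based score and a visual score. The `combined_score` function, the weight, and the scores below are invented for illustration and are not results from the trials.

```python
def combined_score(speech_score: float, visual_score: float, weight: float = 0.5) -> float:
    """Linear interpolation of two relevance scores, each assumed to lie in [0, 1]."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    return weight * speech_score + (1.0 - weight) * visual_score


# Hypothetical (speech, visual) scores for the query "home run".
clips = {
    "clip_a": (0.9, 0.1),  # announcer mentions a home run, but nothing happens on the field
    "clip_b": (0.6, 0.8),  # both sources agree
    "clip_c": (0.2, 0.7),  # looks like a fly ball, but the announcer is off topic
}

ranked = sorted(clips, key=lambda c: combined_score(*clips[c]), reverse=True)
print(ranked)
```

With these made-up numbers, the clip on which both sources agree outranks the clip that scores highly on speech alone, which is the out-of-context mention problem the researchers describe.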

The researchers are now looking to extend this system to other sports-video archives, such as those for basketball. But it shouldn’t just benefit sports fans, says Fleischman.

In theory, the system could help with other video-search processes, such as security-video analysis, says David Hogg, a professor of computer science and head of the Vision Group at Leeds University, in the United Kingdom. This system is a very novel approach, he says, and one that shows the way forward for the unsupervised learning systems that are needed to make this kind of search automatic.

Using speech and visual information together is a powerful combination for machine learning, Hogg says. “In machine learning, it is very likely to be easier the more information there is available about each situation.”

Speech can help remove ambiguities in visual data, and visual data can help disambiguate speech, says Richard Stern, a professor of electrical and computer engineering at Carnegie Mellon University, in Pittsburgh. It’s a natural marriage, he says, but one that’s just beginning to emerge.

Until recently, there has been relatively little use of ASR to aid in search, says Stern. “But this is all changing very rapidly,” he says. “Google has been recruiting speech scientists aggressively for the past several years–another indication that multimedia search is moving from the research lab to the consumer very rapidly.”
