MIT Technology Review

Looking for clouds: Above, a new video-search tool called TalkMiner searches an O’Reilly webcast for mentions of “cloud security.” The tool enables users to search online video lectures for particular words by detecting and indexing the words in presentation slides.

A new online video search tool launched this week makes it easier to search the content of video lectures by automatically transcribing words used in the lecturer’s visual aids.

TalkMiner was created by researchers at Fuji Xerox Palo Alto Laboratory (FXPAL), in California, to help students and professionals search the ever-expanding online archives of video lectures and presentations. “It gives you a good shot at finding something that wasn’t mentioned in the title or abstract but is buried deep inside the video,” says Larry Rowe, president of FXPAL.

Video lectures are becoming an increasingly popular study tool, and more and more universities are providing them, says Rowe. But if you’re a student trying to review part of a lecture for a midterm exam, or a professional searching for something specific in an online TED lecture, the process isn’t quick. Even if you know the date a lecture was given, there’s no way to search it for specific content without watching the entire thing, says Rowe.

TalkMiner overcomes this by skimming videos to find the speakers’ presentation slides. It analyzes the footage once per second for telltale signs of a presentation slide, such as its shape and static nature; captures the slide image and compensates for any skewed angles; and uses optical character recognition (OCR) to detect the words on the slides. These words are then indexed into TalkMiner’s search engine, which currently offers 15,000 videos from institutions such as Stanford University, the University of California, Berkeley, and TED.
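The article describes the pipeline but not its implementation. As a rough illustration only, the sketch below models just one of the cues it mentions, the slide's "static nature": frames sampled once per second are treated as flat lists of grayscale pixels, and a stretch of several nearly identical frames counts as a candidate slide. Skew correction, shape detection, and OCR are left out; the slide text is supplied by hand in place of OCR output. All names and thresholds here (`find_static_segments`, `max_diff`, `min_seconds`, `build_index`, `search`) are hypothetical and are not FXPAL's code.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical model: each frame is a flat list of grayscale pixel values
# (0-255), and frames are sampled once per second of video.
Frame = list[int]


@dataclass
class Segment:
    start_s: int  # first second of a stable (slide-like) stretch
    end_s: int    # last second of that stretch, inclusive


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Average per-pixel change between two equally sized frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def find_static_segments(frames: list[Frame], max_diff: float = 2.0,
                         min_seconds: int = 3) -> list[Segment]:
    """Return stretches where consecutive 1-fps frames barely change.

    max_diff and min_seconds are illustrative thresholds, not FXPAL's values.
    """
    segments: list[Segment] = []
    start = 0
    for t in range(1, len(frames) + 1):
        # A stretch ends at the last frame or when the picture changes a lot.
        if t == len(frames) or mean_abs_diff(frames[t - 1], frames[t]) > max_diff:
            if t - start >= min_seconds:
                segments.append(Segment(start, t - 1))
            start = t
    return segments


def build_index(video_id: str,
                slides: list[tuple[Segment, str]]) -> dict[str, set[tuple[str, int]]]:
    """Map each lowercased slide word to (video_id, start second) pairs."""
    index: dict[str, set[tuple[str, int]]] = defaultdict(set)
    for seg, text in slides:  # text stands in for OCR output
        for raw in text.lower().split():
            word = raw.strip(".,:;!?\"'()")
            if word:
                index[word].add((video_id, seg.start_s))
    return dict(index)


def search(index: dict[str, set[tuple[str, int]]], query: str) -> set[tuple[str, int]]:
    """Return the (video, second) hits whose slide contains every query word."""
    words = query.lower().split()
    if not words:
        return set()
    hits = set(index.get(words[0], set()))
    for w in words[1:]:
        hits &= index.get(w, set())
    return hits


# Usage: a 4-second slide followed by a 5-second slide.
frames = [[10] * 16] * 4 + [[200] * 16] * 5
segments = find_static_segments(frames)
ocr_text = ["Cloud Security", "Threat models for cloud apps"]  # made-up slide text
index = build_index("webcast-1", list(zip(segments, ocr_text)))
print(search(index, "cloud security"))  # {('webcast-1', 0)}
```

Storing the start second of each slide alongside the video ID is what would let a search result jump into the middle of a lecture rather than just naming the video. A real system would also need to handle camera cuts between the speaker and the slide, which this frame-difference test would simply treat as breaks between segments.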

“OCR and the search indexing have been done before,” says Rowe. What’s new is the automatic extraction of slide content from video.

“The quality of the video production is often very poor,” says Rowe. “So you have got to find the slides and then clean them up.” The slides can appear anywhere in the image, or sometimes not at all. And “if they have multiple cameras, they may switch between a full-screen image of a slide and [an image of] the speaker.”

The absence of a standard format for recording lectures doesn’t help. “It’s a very uncontrolled environment,” says John Adcock, who also worked on the project. The challenge, he says, was to make a system that would work no matter how the lecture was recorded.

Although TalkMiner is application-specific in its current form, it could ultimately extend the range of situations in which OCR can be used, says Adrian Ulges, a researcher in multimedia analysis and data mining at the German Research Center for Artificial Intelligence in Kaiserslautern. Google’s Street View could use TalkMiner to capture additional information about particular geographic locations, such as opening times or special offers, he says, or it could improve the accuracy of mobile apps such as Word Lens, which translates text viewed by a phone’s camera.

“OCR is still not considered a solved problem, even though recognition rates are pretty decent,” says Ulges. Different lighting conditions, poor contrast, different-colored slides, and even different fonts can all trip up OCR.
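The article does not say how TalkMiner cleans up slides before OCR. One standard preprocessing step for poor contrast is to binarize the image, turning every pixel pure black or white around a threshold chosen automatically with Otsu's method. The sketch below is a swapped-in illustration of that general technique, assuming a grayscale image given as a flat list of 0-255 values; the function names are hypothetical.

```python
def otsu_threshold(pixels: list[int]) -> int:
    """Pick the gray level (0-255) that best splits pixels into two classes.

    Otsu's method: choose the threshold t that maximizes (up to a constant
    factor) the between-class variance of the "dark" (<= t) and "light" (> t)
    pixel groups.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_between = 0, -1.0
    w_dark, sum_dark = 0, 0
    for t in range(256):
        w_dark += hist[t]
        sum_dark += t * hist[t]
        w_light = total - w_dark
        if w_dark == 0 or w_light == 0:
            continue  # one class is empty; no split to score
        mean_dark = sum_dark / w_dark
        mean_light = (sum_all - sum_dark) / w_light
        between = w_dark * w_light * (mean_dark - mean_light) ** 2
        if between > best_between:
            best_t, best_between = t, between
    return best_t


def binarize(pixels: list[int]) -> list[int]:
    """Map each pixel to pure black (0) or white (255) around Otsu's threshold."""
    t = otsu_threshold(pixels)
    return [255 if p > t else 0 for p in pixels]


# Usage: washed-out slide, mid-gray text (90) on a light-gray background (140).
slide = [90] * 50 + [140] * 50
print(otsu_threshold(slide))  # 90
print(binarize(slide) == [0] * 50 + [255] * 50)  # True
```

A single global threshold like this helps with flat, low-contrast slides but not with uneven lighting across one image; that case usually calls for adaptive (local) thresholding instead.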

But even when OCR fails to recognize any text, TalkMiner can still serve a purpose. “An awful lot of TED presentations don’t use text in their slides,” Adcock explains, but merely capturing static images of whatever is being displayed is enough to create a visual index.

Originally, the researchers tried indexing the videos based on what the speaker actually said, detecting key words in the audio track. But the speech recognition software wasn’t reliable enough to produce an accurate index, says Rowe. With the current approach, users don’t have to concentrate on copying down the content of slides, so they can pay closer attention to what the speaker is saying, he says. And, yes, in theory, lazy students could become over-reliant on TalkMiner and miss vital bits of information. “But I view this as just another tool for learning, and as with [all] tools, [it] can be misused.”

Credit: FXPAL
