Video Content Search Gets a Boost

Online video search tool scans videos for written words.

A new online video search tool launched this week makes it easier to search the content of video lectures by automatically transcribing words used in the lecturer’s visual aids.

Looking for clouds: Above, a new video-search tool called TalkMiner searches an O’Reilly webcast for mentions of “cloud security.” The tool enables users to search online video lectures for particular words by detecting and indexing the words in presentation slides.

TalkMiner was created by researchers at Fuji Xerox Palo Alto Laboratory (FXPAL), in California, to help students and professionals search the ever-expanding online archives of video lectures and presentations. “It gives you a good shot at finding something that wasn’t mentioned in the title or abstract but is buried deep inside the video,” says Larry Rowe, president of FXPAL.

Video lectures are becoming an increasingly popular study tool, and more and more universities are providing them, says Rowe. But if you’re a student trying to review part of a lecture for a midterm exam, or a professional searching for something specific in an online TED lecture, the process isn’t quick. Even if you know the date a lecture was given, there’s no way to search it for specific content without watching the entire thing, says Rowe.

TalkMiner overcomes this by skimming videos to find the speakers’ presentation slides. It analyzes the footage once per second for telltale signs of a presentation slide, such as its shape and static nature; captures the slide image and compensates for any skewed angles; and uses optical character recognition (OCR) to detect the words on the slides. These words are then indexed into TalkMiner’s search engine, which currently makes available 15,000 videos from institutions such as Stanford University, the University of California, Berkeley, and TED.
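The steps described above can be illustrated with a rough sketch. This is not FXPAL's code: the function names, the difference threshold, and the minimum duration are all hypothetical, frames are modeled as flat lists of grayscale pixel values sampled once per second, and the OCR step is assumed to have already produced text for each detected slide. The sketch shows only two ideas from the description, namely spotting static stretches of video and indexing the recognized words so they can be searched.

```python
from collections import defaultdict

def frame_difference(a, b):
    """Mean absolute difference between two equally sized grayscale frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def find_static_segments(frames, threshold=2.0, min_seconds=3):
    """Return inclusive (start, end) second ranges where the picture barely changes.

    frames: one flat list of pixel values per sampled second.
    threshold and min_seconds are illustrative values, not TalkMiner's.
    """
    segments = []
    start = 0
    for t in range(1, len(frames) + 1):
        # A segment ends at the last frame or when the picture changes noticeably.
        changed = t == len(frames) or frame_difference(frames[t - 1], frames[t]) > threshold
        if changed:
            if t - start >= min_seconds:
                segments.append((start, t - 1))
            start = t
    return segments

def build_index(slide_texts):
    """Build an inverted index from recognized slide text.

    slide_texts: dict mapping (video_id, start_second) to the text that an
    OCR step (not shown here) is assumed to have produced for that slide.
    """
    index = defaultdict(set)
    for location, text in slide_texts.items():
        for token in text.lower().split():
            word = token.strip(".,:;!?\"'()")
            if word:
                index[word].add(location)
    return index

def search(index, query):
    """Return the slide locations whose text contains every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results
```

In this toy version, a search for "cloud security" would return the video and the second at which a slide containing both words first appears, which is the kind of jump-to-the-moment result the article describes. A real system would also need the slide-shape detection and skew correction mentioned above, which this sketch leaves out.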

“OCR and the search indexing have been done before,” says Rowe. What’s new is automatic extraction of slide content from video.

“The quality of the video production is often very poor,” says Rowe. “So you have got to find the slides and then clean them up.” The slides can appear anywhere in the image, or sometimes not at all. And “if they have multiple cameras, they may switch between a full-screen image of a slide and [an image of] the speaker.”

The absence of a standard format for recording lectures doesn’t help. “It’s a very uncontrolled environment,” says John Adcock, who also worked on the project. The challenge, he says, was to make a system that would work no matter how the lecture was recorded.

Although TalkMiner is application-specific in its current form, it could ultimately extend the range of situations in which OCR can be used, says Adrian Ulges, a researcher in multimedia analysis and data mining at the German Research Center for Artificial Intelligence in Kaiserslautern. Google’s Street View could use TalkMiner to capture additional information about particular geographic locations, such as opening times or special offers, he says, or it could improve the accuracy of mobile apps such as Word Lens, which translates text viewed by a phone’s camera.

“OCR is still not considered a solved problem, even though recognition rates are pretty decent,” says Ulges. Different lighting conditions, poor contrast, different-colored slides, and even different fonts can all trip up OCR.

But even when OCR fails to recognize any text, TalkMiner can still serve a purpose. Adcock explains, “An awful lot of TED presentations don’t use text in their slides,” but merely capturing static images of whatever’s being displayed is enough to create a visual index.

Originally, the researchers tried indexing the video based on what the speaker actually said, detecting key words in the audio track. But the speech recognition software wasn’t reliable enough to make it accurate, says Rowe. With the current approach, users don’t have to concentrate on copying down the content of slides, so they can pay closer attention to what the speaker is saying, he says. And, yes, in theory, lazy students could become over-reliant on TalkMiner and miss vital bits of information. “But I view this as just another tool for learning, and as with [all] tools, [it] can be misused.”
