A new online video search tool launched this week makes it easier to search the content of video lectures by automatically transcribing words used in the lecturer’s visual aids.
TalkMiner was created by researchers at Fuji Xerox Palo Alto Laboratory (FXPAL), in California, to help students and professionals search the ever-expanding online archives of video lectures and presentations. “It gives you a good shot at finding something that wasn’t mentioned in the title or abstract but is buried deep inside the video,” says Larry Rowe, president of FXPAL.
Video lectures are becoming an increasingly popular study tool, and more and more universities are providing them, says Rowe. But if you’re a student trying to review part of a lecture for a midterm exam, or a professional searching for something specific in an online TED lecture, the process isn’t quick. Even if you know the date a lecture was given, there’s no way to search it for specific content without watching the entire thing, says Rowe.
TalkMiner overcomes this by skimming videos to find the speakers’ presentation slides. It analyzes the footage once per second for telltale signs of a presentation slide, such as its shape and static nature; captures the slide image and compensates for any skewed angles; and uses optical character recognition (OCR) to detect the words on the slides. These words are then indexed into TalkMiner’s search engine, which currently makes available 15,000 videos from institutions such as Stanford University, the University of California, Berkeley, and TED.
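To make the steps above concrete, here is a minimal sketch of two of them: spotting static frames in a once-per-second sample and indexing the words found on slides. It is an illustration under stated assumptions, not FXPAL's implementation. The function names, the difference threshold, and the tiny grayscale "frames" are all hypothetical. Deskewing and OCR are left out, so the slide text is supplied by hand.

```python
from collections import defaultdict


def mean_abs_diff(a, b):
    """Average per-pixel absolute difference between two equally sized grayscale frames."""
    total = sum(abs(x - y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))
    count = sum(len(row) for row in a)
    return total / count


def find_static_seconds(frames, threshold=2.0):
    """Return the seconds at which the sampled frame barely differs from the previous sample.

    The threshold is an assumed value chosen for this toy data.
    """
    return [t for t in range(1, len(frames))
            if mean_abs_diff(frames[t - 1], frames[t]) < threshold]


def build_index(slide_text_by_second):
    """Map each lowercased word to the sorted seconds at which it appears on a slide."""
    index = defaultdict(set)
    for second, text in slide_text_by_second.items():
        for word in text.lower().split():
            index[word.strip(".,;:!?")].add(second)
    return {word: sorted(seconds) for word, seconds in index.items() if word}


# Toy 2x2 frames, one per second: two moving shots of a speaker, then a slide held still.
speaker_a = [[10, 200], [50, 90]]
speaker_b = [[90, 20], [150, 10]]
slide = [[255, 255], [0, 255]]
frames = [speaker_a, speaker_b, slide, slide, slide]

static_seconds = find_static_seconds(frames)

# Stand-in for OCR output; a real system would read this text from the captured slide image.
index = build_index({3: "Gradient descent: overview", 40: "Stochastic gradient descent"})
```

In this toy run, `static_seconds` contains only the seconds at which the slide is held on screen, and looking up a word such as `"gradient"` in `index` returns the points in the video where a slide showing that word was captured.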
“OCR and the search indexing have been done before,” says Rowe. What’s new is automatic extraction of slide content from video.
“The quality of the video production is often very poor,” says Rowe. “So you have got to find the slides and then clean them up.” The slides can appear anywhere in the image, or sometimes not at all. And “if they have multiple cameras, they may switch between a full-screen image of a slide and [an image of] the speaker.”
The absence of a standard format for recording lectures doesn’t help. “It’s a very uncontrolled environment,” says John Adcock, who also worked on the project. The challenge, he says, was to make a system that would work no matter how the lecture was recorded.
Although TalkMiner is application-specific in its current form, it could ultimately extend the range of situations in which OCR can be used, says Adrian Ulges, a researcher in multimedia analysis and data mining at the German Research Center for Artificial Intelligence in Kaiserslautern. Google’s Street View could use TalkMiner to capture additional information about particular geographic locations, such as opening times or special offers, he says, or the technology could improve the accuracy of mobile apps such as Word Lens, which translates text viewed by a phone’s camera.
“OCR is still not considered a solved problem, even though recognition rates are pretty decent,” says Ulges. Different lighting conditions, poor contrast, different-colored slides, and even different fonts can all trip up OCR.
But even when OCR fails to recognize any text, TalkMiner can still serve a purpose. “An awful lot of TED presentations don’t use text in their slides,” Adcock explains, but merely capturing static images of whatever is being displayed is enough to create a visual index.
Originally, the researchers tried indexing the videos based on what the speaker actually said, by detecting keywords in the audio track. But the speech recognition software wasn’t reliable enough to make this accurate, says Rowe.
With the current approach, users don’t have to concentrate on copying down the content of slides, so they can pay closer attention to what the speaker is saying, he says. And, yes, in theory, lazy students could become over-reliant on TalkMiner and miss vital bits of information. “But I view this as just another tool for learning, and as with [all] tools, [it] can be misused.”