Video search is one of the most difficult challenges facing search engines. Given a particular query, the task is to select from all stored videos the ones that the searcher most wants to see.
Most video search services do this without any reference to the video content. Instead, they crunch the metadata associated with each clip: the key words, popularity and dates that describe the video and must be added separately.
That works reasonably well if you’re looking for “Jesus Christ Fenton” or some other viral content. But it’s not so good for news.
News coverage provides more or less continuous content that is divided up into sections that often have little or no connection. Search news videos for “Barack Obama in China” and you could easily be served up a video that mentions the president in one segment and China in the following one.
Now Julien Lawto at Exalead, a search engine company based in Paris, and a few pals have developed a technology that could change all this.
For some time now, these guys have been playing with a video search engine called Voxalead. The novelty of this approach is that Voxalead processes the audio content of news videos to produce an automated transcript of what’s being said. This is then searchable for key words in a straightforward way.
What’s new, however, is that they’ve found a way to automatically divide up news broadcasts into standalone passages on specific topics.
The team do this using software that analyses the transcript looking for obvious transitions from one segment to another. “The general idea of this method is to search for the best possible segmentation among all the possible ones,” they say.
In practice, this means looking for repetitions of nouns, adjectives and certain verbs and then testing the fitness of various patterns of segmentation to see which seems the best (as measured by what the company mysteriously calls ‘thematic cohesion probability’).
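To make the idea of searching "among all the possible" segmentations concrete, here is a minimal Python sketch. It is not the Exalead method: the scoring rule (shared words between sentence pairs, minus a fixed penalty) is an invented stand-in for the paper's ‘thematic cohesion probability’, and the names `segment_score`, `best_segmentation` and `penalty` are hypothetical. It only shows how dynamic programming can pick the highest-scoring split without enumerating every one by hand.

```python
def segment_score(sentences, start, end, penalty=0.5):
    """Toy cohesion score for sentences[start:end] (an assumption, not the paper's measure).

    Each pair of sentences in the segment earns one point per shared word
    and loses `penalty`, so unrelated sentences drag a segment's score down.
    """
    total = 0.0
    for i in range(start, end):
        for j in range(i + 1, end):
            total += len(sentences[i] & sentences[j]) - penalty
    return total


def best_segmentation(sentences, penalty=0.5):
    """Return the highest-scoring split as a list of (start, end) index pairs."""
    n = len(sentences)
    best = [0.0] + [float("-inf")] * n  # best[k]: top score for the first k sentences
    back = [0] * (n + 1)                # back[k]: where the last segment starts
    for end in range(1, n + 1):
        for start in range(end):
            score = best[start] + segment_score(sentences, start, end, penalty)
            if score > best[end]:
                best[end] = score
                back[end] = start
    # Walk the back-pointers from the end to recover the segments.
    segments = []
    end = n
    while end > 0:
        start = back[end]
        segments.append((start, end))
        end = start
    segments.reverse()
    return segments


# Each sentence is reduced to a set of content words (made-up example data).
transcript = [
    {"obama", "president", "visit"},
    {"obama", "president", "speech"},
    {"china", "economy", "growth"},
    {"china", "growth", "exports"},
]
print(best_segmentation(transcript))
```

With this toy data the split falls between the second and third sentences, where the repeated words change. A real system would work on recognised speech rather than tidy word sets, so the scoring would need to tolerate transcription errors.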
That produces a number of advantages. First up is the accuracy of the results. If a user searches for “Barack Obama AND Hu Jintao”, the engine will only return passages of news that contain references to both individuals.
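The difference between matching a whole broadcast and matching a single passage can be sketched in a few lines of Python. Everything here is illustrative: the record fields (`video`, `start_s`, `text`), the sample passages and the function name `passages_matching_all` are assumptions, and the matching is a plain case-insensitive substring test rather than anything Voxalead is known to use.

```python
def passages_matching_all(passages, terms):
    """Return only the passages whose text contains every search term.

    Matching is a case-insensitive substring test (a simplification).
    """
    wanted = [term.lower() for term in terms]
    return [
        passage
        for passage in passages
        if all(term in passage["text"].lower() for term in wanted)
    ]


# Made-up passages: "news-1" mentions each leader, but in different segments.
passages = [
    {"video": "news-1", "start_s": 0,
     "text": "Barack Obama spoke about the economy in Washington."},
    {"video": "news-1", "start_s": 95,
     "text": "Hu Jintao opened a trade summit in Beijing."},
    {"video": "news-2", "start_s": 40,
     "text": "Barack Obama met Hu Jintao to discuss trade."},
]

hits = passages_matching_all(passages, ["Barack Obama", "Hu Jintao"])
for hit in hits:
    print(hit["video"], hit["start_s"])
```

A search over whole videos would have returned "news-1" because both names occur somewhere in it. Searching passage by passage leaves only the segment of "news-2" in which the two are mentioned together.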
It also allows users to search for events, such as footage of the British embassy sacking in Tehran.
Finally, this approach allows the engine to display the results in various innovative ways. For example, it shows a timeline of how the number of mentions of a topic has changed over time. It also shows a geographical mashup indicating where the events have taken place or been referred to. There is also a tag cloud showing trending topics.
These guys have also taken care to ensure that the technology is scalable. “The system processes more than 150 new video and audio items each day, amounting to roughly 3.5Gb or 15 hours of new daily content, on a single 6 core server,” they say. That gives it the potential to process about 100 hours of videos per day.
There are obviously potential problems with this approach. Not least of these is the accuracy of the voice recognition software. That can be a particular problem with news coverage, which often refers to rare and sometimes entirely new words. They say the design is highly scalable.
Lawto and co have ambitious plans for the future. For example, they want to augment the search results with the results from facial recognition software, presumably to be more certain about who is talking.
Clearly video search is becoming more powerful. Worth a look if you have a few spare moments.
Voxalead is available now in demonstrator form at the Exalead labs. You can find it here: http://voxaleadnews.labs.exalead.com/ However, it isn’t clear from the paper whether the new technology is running now.
Ref: arxiv.org/abs/1111.6265: A Scalable Video Search Engine Based on Audio Content Indexing and Topic Segmentation