A View from Emerging Technology from the arXiv
The Video Genome Unveiled
The powerful technique behind DNA sequencing can now be used to mine video databases. The key is a new idea known as the video genome.
Bioinformatics has grown from an obscure branch of computer science to the powerhouse behind molecular biology in just 30 years. In particular, the technique behind gene sequencing has been hugely influential, at least in biology. While all kinds of disciplines such as computer science and mathematics have contributed to its development, bioinformatics has yet to repay the favour.
Today that looks to have changed with an announcement by Alexander Bronstein and buddies at BBK Technologies, based near Boston. They've found a way to use the technique behind DNA sequencing to match video sequences.
“The problems encountered in video analysis such as identifying a video in a large database, putting together video fragments, finding similarities and common ancestry between different versions of a video, have analogous counterpart problems in genetic research and analysis of DNA and protein sequences,” they say.
Bronstein and co begin by creating “video DNA” for movies. This is information in the form of a sequence of letters that they use to label each scene in a film. The trick, of course, is to find a way to generate this sequence from the scene itself. They do this by looking for qualities of the image that are invariant under any reasonable transformation, such as the addition of subtitles or a change in colour cast.
To help, Bronstein and co have developed a piece of software that they've trained to find these invariant features, which turn out largely to be things like the relative position of shapes and features in the image.
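To make the idea of "video DNA" concrete, here is a minimal sketch of how per-scene features might be turned into a string of letters. It assumes the invariant features have already been extracted as numeric vectors, and it uses nearest-prototype quantization (the "visual words" approach common in image retrieval), which is not necessarily what Bronstein and co do. The two-dimensional vectors, the four-letter alphabet and the `CODEBOOK` values are all hypothetical.

```python
import math

# Hypothetical codebook: each letter stands for a prototype feature vector.
# A real system would learn these prototypes (e.g. by clustering); these are made up.
CODEBOOK = {
    "A": (0.0, 0.0),
    "C": (1.0, 0.0),
    "G": (0.0, 1.0),
    "T": (1.0, 1.0),
}

def nearest_letter(feature):
    """Return the codebook letter whose prototype is closest to the feature vector."""
    return min(CODEBOOK, key=lambda letter: math.dist(feature, CODEBOOK[letter]))

def video_dna(scene_features):
    """Turn a list of per-scene feature vectors into a string of letters."""
    return "".join(nearest_letter(f) for f in scene_features)

# Hypothetical feature vectors for four consecutive scenes.
scenes = [(0.1, 0.05), (0.9, 0.1), (0.95, 0.9), (0.05, 0.8)]
print(video_dna(scenes))  # ACTG
```

Because small perturbations of a feature vector usually leave its nearest prototype unchanged, a modest change such as a colour shift would, in this sketch, tend to produce the same letter.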
The task of matching identical pieces of film then becomes one of analysing each scene in a film, generating its video DNA and looking for other pieces of film with the same DNA.
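In bioinformatics, "the same DNA" usually means sequences that align well, not ones that are letter-for-letter identical. The article does not say which alignment method Bronstein and co use, so the sketch below swaps in Smith-Waterman local alignment, a standard bioinformatics algorithm, to score a query clip against a small database. The scoring values (`match`, `mismatch`, `gap`) and the film names are arbitrary assumptions.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman local alignment score between two letter sequences."""
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            # Local alignment: scores never drop below zero.
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best

# Hypothetical query clip and database of video DNA strings.
query = "CTGA"
database = {"film1": "AACTGATT", "film2": "GGGGTTTT"}
scores = {name: local_alignment_score(query, dna) for name, dna in database.items()}
print(max(scores, key=scores.get))  # film1
```

Local rather than global alignment fits the task of finding a short clip inside a long film, since only the best-matching stretch contributes to the score.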
To reconstruct the whole film, the software simply glues fragments of video DNA together wherever their sequences overlap, much as DNA sequencing assembles a genome from short, overlapping fragments.
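The article doesn't describe the assembly step in detail, so here is a minimal sketch using greedy overlap merging, a simple classic approach from early genome assembly; it is not necessarily the authors' method. It repeatedly joins the two fragments with the longest suffix-prefix overlap. The `min_len` threshold and the example fragments are hypothetical, and real video DNA would need approximate rather than exact overlaps.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that is also a prefix of b (at least min_len)."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0

def greedy_assemble(fragments, min_len=3):
    """Repeatedly merge the pair of fragments with the largest overlap."""
    frags = list(fragments)
    while len(frags) > 1:
        best = (0, None, None)  # (overlap length, index of left piece, index of right piece)
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        if k == 0:
            break  # no remaining overlaps; leave fragments unmerged
        merged = frags[i] + frags[j][k:]
        frags = [f for idx, f in enumerate(frags) if idx not in (i, j)] + [merged]
    return frags

# Hypothetical fragments of video DNA from the same film.
pieces = ["ACTGGA", "GGATTC", "TTCAAG"]
print(greedy_assemble(pieces))  # ['ACTGGATTCAAG']
```

Greedy merging is easy to follow but can go wrong on repeated material (a recurring shot, say); modern genome assemblers use overlap graphs or de Bruijn graphs for that reason.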
That's a clever idea. Video databases are not so different from DNA databases in terms of the amount of information they hold, so using the techniques of bioinformatics to mine them makes good sense.
However, it will be important to find out just how well and how quickly their algorithm works in the real world. And they'll surely face competition in producing the best algorithms for this purpose.
Bronstein and co say that their technique could eventually "have an impact similar to that of the Human Genome project in genomic research". That may be overstating things a little, but the video genome idea will certainly appeal to a number of industries, not least those wanting to crack down on video piracy.
Ref: arxiv.org/abs/1003.5320: The Video Genome