Picture this: The objects in a surveillance footage scene (top) are annotated by computer vision software (below).
Song-Chun Zhu/UCLA

Computing

Surveillance Software Knows What a Camera Sees

Software offers a running commentary to ease video searching and analysis.

  • Tuesday, June 1, 2010
  • By Tom Simonite

A prototype computer vision system can generate a live text description of what's happening in a feed from a surveillance camera. Although not yet ready for commercial use, the system demonstrates how software could make it easier to skim or search through video or image collections. It was developed by researchers at the University of California, Los Angeles, in collaboration with ObjectVideo of Reston, VA.

"You can see from the existence of YouTube and all the other growing sources of video around us that being able to search video is a major problem," says Song-Chun Zhu, lead researcher and professor of statistics and computer science at UCLA.

"Almost all search for images or video is still done using the surrounding text," he says. Zhu and UCLA colleagues Benjamin Yao and Haifeng Gong developed a new system, called I2T (Image to Text), which is intended to change that.

It puts a series of computer vision algorithms into a system that takes images or video frames as input, and spits out summaries of what they depict. "That can be searched using simple text search, so it's very human-friendly," says Zhu.
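
To make the search idea concrete, here is a minimal Python sketch. It is not I2T's code: it simply assumes each video frame already has a generated text summary, and it builds a small inverted index so frames can be found by keyword. The sample summaries and the names `tokenize`, `build_index`, and `search` are hypothetical.

```python
from collections import defaultdict
import re

def tokenize(text):
    """Lowercase a description and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(summaries):
    """Map each word to the set of frame ids whose summary contains it."""
    index = defaultdict(set)
    for frame_id, text in summaries.items():
        for word in tokenize(text):
            index[word].add(frame_id)
    return index

def search(index, query):
    """Return sorted frame ids whose summaries contain every query word."""
    words = tokenize(query)
    if not words:
        return []
    matches = set(index.get(words[0], set()))
    for word in words[1:]:
        matches &= index.get(word, set())
    return sorted(matches)

# Hypothetical summaries, standing in for generated text output.
summaries = {
    1: "A car enters the scene from the left.",
    2: "A person walks toward the car.",
    3: "A person crosses the road.",
}
index = build_index(summaries)
print(search(index, "person car"))  # frames whose summary mentions both words
```

Because the index stores only words and frame ids, the search step needs no access to the video itself, which is the appeal of converting footage to text first.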


The team applied the software to surveillance footage in collaboration with Mun Wai Lee of ObjectVideo to demonstrate the strength of I2T. Systems like it might help address the fact that there are more and more surveillance cameras--on the streets and in military equipment, for instance--while the number of people working with them remains about the same, says Zhu.

The first part of I2T is an image parser that decomposes an image, meaning it separates out the background and objects like vehicles, trees, and people. Some objects can be broken down further; for example, the limbs of a person or the wheels of a car can be separated from the object they belong to.
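
One way to picture the parser's output is as a tree: the scene at the root, objects beneath it, and parts beneath each object. The Python sketch below is an illustration under that assumption, not the researchers' data structure; the `Node` class, the helper functions, and the hand-built example scene (including labels such as "body" and "torso") are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    """One region of a parsed image: the scene, an object, or an object part."""
    label: str
    parts: List["Node"] = field(default_factory=list)

def leaves(node):
    """Return the labels of the finest-grained regions under this node."""
    if not node.parts:
        return [node.label]
    result = []
    for part in node.parts:
        result.extend(leaves(part))
    return result

def outline(node, depth=0):
    """Render the parse tree as a list of indented text lines."""
    lines = ["  " * depth + node.label]
    for part in node.parts:
        lines.extend(outline(part, depth + 1))
    return lines

# Hand-built example; a real parser would produce this from pixels.
scene = Node("scene", [
    Node("background"),
    Node("car", [Node("wheel"), Node("wheel"), Node("body")]),
    Node("person", [Node("torso"), Node("arm"), Node("leg")]),
])
print("\n".join(outline(scene)))
```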


Next, the meaning of that collection of shapes is determined. "This knowledge representation step is the most important part of the system," says Zhu, explaining that this knowledge comes from human smarts. In 2005, Zhu established the nonprofit Lotus Hill Institute in Ezhou, China, and, with some support from the Chinese government, recruited about 20 graduates of local art colleges to work full-time to annotate a library of images to aid computer vision systems. The result is a database of more than two million images containing objects that have been identified and classified into more than 500 categories.

To ensure that workers annotate images in a standard way, software guides them as they work. It uses versions of the algorithms that will eventually benefit from the final data to pick out the key objects for a person to classify, and it suggests how they might be classified based on previous data. The objects inside images are classified into a hierarchy of categories based on Princeton's WordNet database, which organizes English words into groups according to their meanings. "Once you have the image parsed using that system that also includes the meaning, transcription into the natural language is not too hard," says Zhu, who makes some of the data available for free to other researchers. "It is high-quality data and we hope that more people are going to use this," he says.
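
The Python sketch below illustrates, under stated assumptions, why a category hierarchy makes the last step easier. The small `PARENT` table is a hand-written stand-in for a WordNet-style hierarchy, not real WordNet data, and the fixed sentence template in `describe` is a simplification rather than I2T's text generator. All names and sample detections are hypothetical.

```python
# Toy is-a hierarchy (child -> parent); a hand-written stand-in, not WordNet data.
PARENT = {
    "sedan": "car",
    "car": "vehicle",
    "truck": "vehicle",
    "vehicle": "object",
    "pedestrian": "person",
    "person": "object",
}

def ancestors(category):
    """Walk up the hierarchy, returning the chain from category to the root."""
    chain = [category]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain

def is_a(category, general):
    """True if `category` equals `general` or falls under it in the hierarchy."""
    return general in ancestors(category)

def describe(detections):
    """Turn (category, action) pairs into sentences using one fixed template."""
    sentences = []
    for category, action in detections:
        article = "An" if category[0] in "aeiou" else "A"
        sentences.append(f"{article} {category} {action}.")
    return " ".join(sentences)

# Hypothetical detections, as if produced by earlier parsing steps.
detections = [("sedan", "turns left"), ("pedestrian", "crosses the road")]
print(describe(detections))
print([c for c, _ in detections if is_a(c, "vehicle")])
```

With the hierarchy in place, a query for a general term such as "vehicle" can match a specific label such as "sedan" without that word ever appearing in the generated sentence.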


quatschtronaut

8 Comments

  • 617 Days Ago
  • 06/03/2010

Art-ificial interpretation.

From an economic perspective, it certainly makes sense to condense information into a digital format that can easily be analyzed. From a human perspective, a more “arty” interpretation may cause higher acceptance... http://quatschtronauts.wordpress.com/2010/03/19/body-scanner/


StupidPeasant

98 Comments

  • 615 Days Ago
  • 06/05/2010

Grand Vision

Closer and closer we get to the computer knowing what it sees, "the grand vision," Wow!  If what is seen can become text, and text is given context, and the text and images of the entire web are at hand; the vision is indeed grand.  The implications to our future friends, the robots, are huge. 


sandipnsin

1 Comment

  • 151 Days Ago
  • 09/12/2011

Face recognition

I was curious to know whether face recognition software is integrated in the system. An excellent work indeed!!
