Neuroscientists at MIT have developed a computer model that mimics the human vision system to accurately detect and recognize objects in a busy street scene, such as cars and motorcycles.
Such biologically inspired vision systems could soon be used in surveillance systems, or in smart sensors that can warn drivers of pedestrians and other obstacles. It may also help in the development of so-called visual search engines, says Thomas Serre, a neuroscientist at the Center for Biological and Computational Learning at MIT’s McGovern Institute for Brain Research, who was involved in the project.
Researchers have been interested for years in trying to copy biological vision systems, simply because they are so good, says David Hogg, a computer vision expert at Leeds University in the UK. “This is a very successful example of [mimicking biological vision],” he says.
Teaching a computer to classify objects has proved much harder than was originally anticipated, says Serre, who carried out the work with Tomaso Poggio, codirector of the center. On the one hand, to recognize a particular type of object, such as a car, a computer needs a template or computational representation specific to that particular object. Such a template enables the computer to distinguish a car from objects in other classes–noncars. Yet this representation must be sufficiently flexible to include all types of cars–no matter how varied in appearance–at different angles, positions, and poses, and under different lighting conditions.
“You want to be able to recognize an object anywhere in the field of vision, irrespective of where it is and irrespective of its size,” says Serre. Yet if you analyze images just by their patterns of light and dark pixels, then two portrait images of different people can end up looking more similar than two images of the same person taken from different angles.
The most effective method for getting around such problems is to train a learning algorithm on a set of images and allow it to extract the features they have in common; two wheels aligned with the road could signal a car, for example. Serre and Poggio believe that the human vision system uses a similar approach, but one that depends on a hierarchy of successive layers in the visual cortex. The first layers of the cortex detect an object’s simpler features, such as edges, and higher layers integrate that information to form our perception of the object as a whole.