Robotic Vision

Robots are learning a few tricks from people on how to identify things.

Julie Claire Diop

October 1, 2002

There are some sights and noises that people just can’t help but notice. Indeed, research in neuroscience now suggests that the recognition of salient objects is a key part of how we make sense of our environment. But building robots that can intelligently pick out items of interest using sight or sound remains a daunting challenge. So a handful of engineers are working on a new approach called selective-attention modeling, which attempts to program robots to evaluate scenes critically as some neuroscientists believe people do.

“General scene understanding is the Holy Grail for computer vision,” says University of Southern California computer scientist Laurent Itti. Neuroscience-based algorithms, he contends, “should be the new approach.”

Several research groups, including ones at Caltech and Itti’s lab at USC, are giving robots pan-and-tilt cameras for eyes and the ability to pick out unusual objects. The robots are designed to notice, for example, a bright purple tree house at the side of a wooded road, and to do it for the same reason people would: because it stands out against a backdrop of trees. Responding to eye-catching sights enables robots to act independently when they encounter something unexpected. A robot on Mars, for example, might notice an area of the ground that is discolored and take a sample of it. In contrast, robots using more traditional vision methods would detect the discolored ground only if they had been told specifically to look for it.

Itti’s robots construct maps on the basis of such local contrasts in features as color, edges, orientation, light intensity and motion. The tree house would stand out on the orientation map because of its horizontal profile amidst the vertical trees. It would also be conspicuous on the color map. The robot would overlay these maps to build a composite that highlights any striking areas.
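The map-and-overlay idea can be illustrated with a toy sketch in Python. This is not Itti's software: the function names, the box-shaped neighborhood, and the plain averaging used for the overlay are all assumptions chosen for brevity. Each feature channel becomes a local-contrast map (how much a pixel differs from its surroundings), and the maps are then averaged into one composite.

```python
import numpy as np

def local_mean(channel, radius):
    """Mean of each pixel's square neighborhood, using edge padding."""
    padded = np.pad(channel, radius, mode="edge")
    size = 2 * radius + 1
    rows, cols = channel.shape
    total = np.zeros((rows, cols), dtype=float)
    for dy in range(size):
        for dx in range(size):
            total += padded[dy:dy + rows, dx:dx + cols]
    return total / (size * size)

def contrast_map(channel, radius=3):
    """Local contrast: how far each pixel is from its neighborhood mean."""
    diff = np.abs(channel - local_mean(channel, radius))
    peak = diff.max()
    # Scale to [0, 1]; a perfectly uniform channel stays all zeros.
    return diff / peak if peak > 0 else diff

def composite_map(feature_channels):
    """Overlay the per-feature contrast maps by simple averaging (an assumption)."""
    maps = [contrast_map(ch) for ch in feature_channels.values()]
    return sum(maps) / len(maps)

# Hypothetical 20x20 scene: uniform brightness, one small "purple" patch.
intensity = np.full((20, 20), 0.5)
purple = np.zeros((20, 20))
purple[8:11, 8:11] = 1.0

composite = composite_map({"intensity": intensity, "purple": purple})
row, col = np.unravel_index(np.argmax(composite), composite.shape)
print("most salient pixel:", int(row), int(col))
```

In this toy scene the brightness channel contributes nothing because it is uniform, so the peak of the composite falls inside the patch, which is the only region that differs from its neighbors.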

This technique works basically the way your brain does when you scan a “what’s wrong with this picture” type of puzzle. But robotic vision needs to do more than just pick out anomalous sights. So Itti and others, such as Caltech’s Christof Koch, are also programming robots to use the approach to find specific objects. A robot looking for that purple tree house, say, knows to put greater weight on purple objects in its color map. And when forming the composite map, it will give greater weight to the color map than to other maps; the motion map, for instance, would be irrelevant, since both the tree house and its background are mainly stationary.
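A small Python sketch shows how such task-dependent weighting might work. The color-matching score, the weight values, and the tiny test scene are assumptions made for illustration, not details of Itti's or Koch's systems. The scene contains a stationary purple patch and a moving distractor elsewhere.

```python
import numpy as np

def color_match_map(rgb_image, target_rgb):
    """Score each pixel by closeness to a target color (1.0 = exact match)."""
    target = np.asarray(target_rgb, dtype=float)
    distance = np.linalg.norm(rgb_image - target, axis=-1)
    # The largest possible distance inside the unit RGB cube is sqrt(3).
    return 1.0 - distance / np.sqrt(3.0)

def weighted_composite(feature_maps, weights):
    """Combine feature maps using per-map weights, normalized by their sum."""
    total = sum(weights[name] for name in feature_maps)
    return sum(weights[name] * feature_maps[name] for name in feature_maps) / total

def peak(saliency):
    """Return (row, col) of the strongest response in a map."""
    row, col = np.unravel_index(np.argmax(saliency), saliency.shape)
    return int(row), int(col)

# Hypothetical 10x10 scene: green background with a purple patch.
PURPLE = (0.6, 0.1, 0.8)
scene = np.zeros((10, 10, 3))
scene[:] = (0.1, 0.5, 0.1)
scene[2:4, 6:8] = PURPLE

# Hypothetical motion map: something is moving at row 7, column 1.
motion = np.zeros((10, 10))
motion[7, 1] = 1.0

maps = {"color": color_match_map(scene, PURPLE), "motion": motion}

equal_weights = peak(weighted_composite(maps, {"color": 1.0, "motion": 1.0}))
task_weights = peak(weighted_composite(maps, {"color": 1.0, "motion": 0.0}))
print("equal weights ->", equal_weights)
print("searching for purple ->", task_weights)
```

With equal weights the moving distractor draws the robot's attention; once the motion map is weighted to zero and the color map is tuned to purple, the peak shifts to the purple patch.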

The strategy differs from conventional approaches to robotic vision. One common method, for example, is called object segmentation. To learn to find a coffee cup, a robot first examines a picture of the cup. It extracts, or segments, the cup’s image from the picture. Next, it erases anything that occludes the cup, say, a sugar bowl, and cleans up the image, maybe creating a black silhouette of the cup on a white background. The robot then estimates what the cup looks like at different scales and rotations. Only then can it conduct its search, homing in on objects that have shapes similar to the cup’s until it finds a match.

But using object segmentation can be cumbersome, and it’s not especially reliable, says Polly Pook, vice president for research at iRobot, a robotics developer in Somerville, MA. She is impressed with the selective-attention models Itti and others are building. “I think it’s a great approach,” says Pook. “It’s simplifying the process.”

Visual selective-attention models, however, still have many shortcomings. Itti’s robots cannot tell the difference between a black laptop and a neatly folded black turtleneck, for example, since their software has no way of knowing the difference between metal and fabric. But engineers hope texture maps will one day advance visual selective-attention processing yet another step, so robots can make such distinctions.

Researchers are also thinking about incorporating data from other senses, such as touch, into the final map, although they’re not yet sure how this can be done. Any robot that interacts with its environment will benefit from having tactile senses, says Ernst Niebur, a neuroscientist at the Johns Hopkins University. Niebur feels confident that he can build a tactile map that would then feed into the final composite map. “We know the brain is capable of doing it,” he says.

The next leap is putting the robots in the field. Finding a purple tree house is small stuff compared with navigating a crowded street. That takes a robot that can quickly process and respond to multiple stimuli. But just knowing where to look and what to look out for is at least a small robotic step forward.
