Object recognition has been one of the major targets, and major disappointments, of traditional AI. While machine vision is a real industry, its successes have been in narrowly defined applications under highly controlled conditions, such as decoding license plates, identifying fingerprints, recognizing printed characters, and inspecting products (for instance, identifying burnt potato chips so they can be blown out of an assembly line). Each machine vision system “sees” only a specific kind of object; for example, the machine that reads license plates would not be able to identify fingerprints, and vice versa. Although today’s technology might be good enough to give us machines that recognize any one thing, most jobs in most industries – assembly, maintenance, health care, transportation, security – require more versatility than that. Workers need to be able to recognize a hammer and a screwdriver and a wrench, despite differences in lighting, the objects’ orientation, and the surrounding clutter. The failure to build machines that can do this is especially frustrating given that birds like crows, and small mammals like rats, routinely exhibit a level of skill in general recognition that is way beyond current technology. There is something about not being able to make machines as smart as we are that is consoling to our vanity; but not being able to make a machine as smart as a pigeon is just embarrassing.
So for years AI researchers have been working on the problem of associating visual patterns with meanings or identities. This is one of the areas where AI and neuroscience have been edging toward each other: neuroscience has been working on the brain’s role in object recognition, AI on the general logic of what any system would have to do to solve the same problem. After decades they are almost within talking distance. DiCarlo wonders if it might be time to christen a new discipline that draws from both fields, like “biologically inspired machine vision.”
No university is approaching this intersection faster than MIT, where the collaboration of engineering and science is an institutional mission. And that, says DiCarlo, is one reason he came to MIT: he expects the revolution to happen here.
Modeling Immediate Recognition
A striking illustration of DiCarlo’s point can be found in the labs of Tomaso Poggio. The codirector of MIT’s Center for Biological and Computational Learning, Poggio has been working on vision for four decades, first at the Max Planck Institute in Tübingen, Germany, then at MIT’s AI lab (which became the Computer Science and Artificial Intelligence Lab), and now in the Department of Brain and Cognitive Sciences. (Poggio collaborated with DiCarlo in the macaque experiments described in Science.) For much of this time, Poggio directed one research group in neuroscience and one in machine vision and saw no reason to bring them together. “We knew so little,” he says. “I always thought it was a mistake to expect much from neuroscience.” But recent results from a project carried out by postdoc Thomas Serre and Aude Oliva, assistant professor of cognitive neuroscience in BCS, made him a convert.