Biological vision solves this problem in several different ways. One, according to Poggio’s group, is to organize processing around two simple operations and then alternate these operations in an ordered way through layers of neurons. Layer A might filter the basic inputs from the optic nerve; layer B would integrate the results from many cells in layer A; C would filter the inputs from B; D would integrate the results from C; and so on, perhaps a dozen times. As a signal rises through the layers, the outputs of the parallelized processors gradually combine, identity emerges, and noise falls away.
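The alternating scheme described above can be sketched in a few lines of code. This is only a toy illustration of the idea, not Serre and Poggio's actual model: the kernels, pooling width, and layer count here are arbitrary choices, and each "filter" step is reduced to a simple 1-D correlation while each "integrate" step pools the strongest response from a small neighborhood.

```python
import numpy as np

def filter_layer(signal, kernels):
    """Layers A, C, ...: each unit responds to a small template
    (here, 1-D correlation of the signal with a kernel)."""
    return np.stack([np.convolve(signal, k, mode="same") for k in kernels])

def integrate_layer(responses, pool=4):
    """Layers B, D, ...: each unit pools the strongest response
    from a neighborhood of filter units (max pooling)."""
    n = responses.shape[-1] // pool * pool
    return responses[:n].reshape(-1, pool).max(axis=-1)

rng = np.random.default_rng(0)
signal = rng.normal(size=256)            # stand-in for optic-nerve input
kernels = [np.array([1.0, -1.0]),        # a crude edge detector
           np.array([1.0, 1.0])]         # a crude smoother

x = signal
for _ in range(3):                       # alternate the two operations
    x = filter_layer(x, kernels)         # filter...
    x = x.max(axis=0)                    # (collapse channels, a simplification)
    x = integrate_layer(x)               # ...then integrate
# Each pass shrinks the representation: detail falls away,
# and what survives is increasingly position-invariant.
```

After three filter/integrate pairs, the 256-sample input has been reduced to 4 pooled values, which is the point of the architecture: as the signal rises through the layers, noise and position information are discarded while the strongest, most consistent responses are kept.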
Serre and Poggio used this layering technique to enable their model to do parallel processing. Another trick they borrowed from biology was to increase the number of connections linking their basic switching units. The switching units in conventional computers have very few connections, usually around three; neurons, the basic switching units of the brain, have thousands or even tens of thousands. Serre and Poggio endowed the logical switches in their model with a biologically plausible degree of connectivity. In cases where the science was not yet known, they made assumptions based on their broader experience with neuroanatomy.
To test their theory, Serre and Poggio developed an immediate-recognition computer program that analyzes digital images. When digital image files are fed into the program, it passes them through multiple alternating layers of filtering and integrating cells, training itself to identify and classify the images. “The key is building complexity slowly,” Serre says. “Introducing intelligence too quickly is a big mistake.” Early AI efforts may have tried to zero in on identity too quickly, throwing out information that was critical for getting the right answer.
Serre and Poggio’s approach was a spectacular success. From a neuroscientific point of view, some of their assumptions turned out to predict real features of the brain, such as the presence of cells (call them OR cells) that pick the strongest or most consistent signal out of a group of inputs and copy it to their own output fibers. (Imagine a group of three neurons, A, B, and C, all sending signals to OR neuron X. If those signals were at strength levels 1, 2, and 3 respectively, X would suppress A and B and copy C’s signal to its output. If the strengths had been 3, 2, and 1, it would instead have copied A’s signal and suppressed those of B and C.)
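The OR cell's behavior amounts to a max operation over its inputs. A minimal sketch (the function name `or_cell` is ours, for illustration):

```python
def or_cell(inputs):
    """Toy 'OR cell': copy the strongest input signal to the output,
    suppressing the rest. Returns the winning signal and its index."""
    winner = max(range(len(inputs)), key=lambda i: inputs[i])
    return inputs[winner], winner

# The example from the text: A, B, C firing at strengths 1, 2, 3.
# C (index 2) wins, so X copies C's signal.
print(or_cell([1, 2, 3]))   # (3, 2)

# With strengths reversed, A (index 0) wins instead.
print(or_cell([3, 2, 1]))   # (3, 0)
```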
The results were just as dramatic from an AI point of view. When human subjects and Serre and Poggio’s immediate-recognition program took the animal presence/absence test, the computer did as well as the humans – and better than the best machine vision programs available. (Indeed, it got the right answer 82 percent of the time, while the humans averaged just 80 percent.) This is almost certainly the first time a general-vision program has performed as well as humans.
The promising results have Poggio and Serre thinking beyond immediate recognition. Poggio suspects that the model might apply just as well to auditory perception. Serre advances an even more daring speculation: that general object recognition is the basic building block of cognition. Perhaps that’s why we say “I see” when we want to indicate that we understand something.