Machines Can Now Recognize Something After Seeing It Once

Algorithms usually need thousands of examples to learn something. Researchers at Google DeepMind found a way around that.

Will Knight

November 3, 2016

Most of us can recognize an object after seeing it once or twice. But the algorithms that power computer vision and voice recognition need thousands of examples to become familiar with each new image or word.

Researchers at Google DeepMind now have a way around this. They made a few clever tweaks to a deep-learning algorithm that allow it to recognize objects in images, and other things, from a single example, something known as "one-shot learning." The team demonstrated the trick on a large database of tagged images, as well as on handwriting and language.

The best algorithms can recognize things reliably, but their need for data makes building them time-consuming and expensive. An algorithm trained to spot cars on the road, for instance, needs to ingest many thousands of examples to work reliably in a driverless car. Gathering so much data is often impractical—a robot that needs to navigate an unfamiliar home, for instance, can’t spend countless hours wandering around learning.

Oriol Vinyals, a research scientist at Google DeepMind, a U.K.-based subsidiary of Alphabet that’s focused on artificial intelligence, added a memory component to a deep-learning system. Such a system is a type of large neural network that’s trained to recognize things by adjusting the sensitivity of many layers of interconnected components roughly analogous to the neurons in a brain. These systems normally need to see lots of images to fine-tune the connections between their virtual neurons.

The team demonstrated the capabilities of the system on a database of labeled photographs called ImageNet. The software still needs to analyze several hundred categories of images, but after that it can learn to recognize new objects—say, a dog—from just one picture. It effectively learns to recognize the characteristics in images that make them unique. The algorithm was able to recognize images of dogs with an accuracy close to that of a conventional data-hungry system after seeing just one example.
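To make the idea concrete, here is a minimal sketch of one way a memory can support one-shot classification: store a single labeled example per new category, then label a new image by comparing it with everything in memory. This is an illustration under stated assumptions, not DeepMind's implementation. The three-number feature vectors are made up for the example (a real system would produce them with a trained deep network), and the names `cosine` and `one_shot_predict` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def one_shot_predict(query, memory):
    """Label `query` using a memory of (feature_vector, label) pairs.

    Each stored example votes for its label with a weight given by a
    softmax over its similarity to the query.
    """
    sims = [cosine(query, vec) for vec, _ in memory]
    top = max(sims)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in sims]
    total = sum(exps)
    weights = [e / total for e in exps]

    scores = {}
    for weight, (_, label) in zip(weights, memory):
        scores[label] = scores.get(label, 0.0) + weight
    best = max(scores, key=scores.get)
    return best, scores


# Assumption: toy feature vectors standing in for learned image features.
# The memory holds exactly one example per new category.
memory = [
    ([0.9, 0.1, 0.2], "dog"),
    ([0.1, 0.8, 0.3], "car"),
    ([0.2, 0.2, 0.9], "bicycle"),
]

query = [0.8, 0.2, 0.1]  # features of a new, unlabeled image
label, scores = one_shot_predict(query, memory)
print(label, scores)
```

In this sketch, adding a new category only requires appending one example to `memory`. No retraining is needed, which is the property that makes the approach attractive when data is scarce.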

Vinyals says the approach could be especially useful if it could quickly learn the meaning of a new word. That could be important for Google, he says, since it would allow a system to quickly pick up the meaning of a new search term.

Others have developed one-shot learning systems, but these are usually not compatible with deep-learning systems. An academic project last year used probabilistic programming techniques to enable this kind of very efficient learning (see "This Algorithm Learns Tasks As Fast As We Do").

But deep-learning systems are becoming more capable, especially with the addition of memory mechanisms. Another group at Google DeepMind recently developed a network with a flexible kind of memory, making it capable of performing simple reasoning tasks—for example, learning how to navigate a subway system after analyzing several much simpler network diagrams (see "What Happens When You Give a Computer a Working Memory?").

"I think this is a very interesting approach, providing a novel way of doing one-shot learning on such large-scale data sets," says Sang Wan Lee, who leads the Laboratory for Brain and Machine Intelligence at the Korea Advanced Institute of Science and Technology in Daejeon, South Korea. "This is a technical contribution to the AI community, which is something that computer vision researchers might fully appreciate."

Others are more skeptical about its usefulness, given how different it still is from human learning. For one thing, says Sam Gershman, an assistant professor in Harvard's Department for Brain Science, humans generally learn by understanding the components that make up an image, which may require some real-world, or commonsense, knowledge. For example, "a Segway might look very different from a bicycle or motorcycle, but it can be composed from the same parts."

According to both Gershman and Lee, it will be some time yet before machines match human learning. "We still remain far from revealing humans’ secret of performing one-shot learning," Lee says, "but this proposal clearly poses new challenges that merit further study."
