MIT Technology Review

A computer program that learns to “imagine” the world shows how AI can think more like us

DeepMind’s advance could lead to machines that can make better sense of a scene.

Machines will need to get a lot better at making sense of the world on their own if they are ever going to become truly intelligent.

DeepMind, the AI-focused subsidiary of Alphabet, has taken a step in that direction by making a computer program that builds a mental picture of the world all by itself. You might say that it learns to imagine the world around it.

The system, which uses what DeepMind’s researchers call a generative query network (GQN), looks at a scene from several angles and can then describe what it would look like from another angle.

This might seem trivial, but it requires a relatively sophisticated ability to learn about the physical world. In contrast to many AI vision systems, the DeepMind program makes sense of a scene more the way a person does. Even if something is partly occluded, for example, it can reason about what’s there.

Eventually, such technology might help serve as the foundation for deeper artificial intelligence, letting machines describe and reason about the world with much greater sophistication.

Ali Eslami, a research scientist at DeepMind, and his colleagues tested the approach on three virtual settings: a block-like tabletop, a virtual robot arm, and a simple maze. The system uses two neural networks: one learns a compact representation of a scene, and the other generates, or “imagines,” new perspectives from it. The system captures aspects of a scene, including object shapes, positions, and colors, as a vector, which makes it relatively efficient. The research appears in the journal Science today.
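
To make the two-network idea concrete, here is a minimal sketch in Python. It is not DeepMind's code or architecture. It is a toy with random, untrained weights, and the class name `ToySceneModel`, the image size, the three-number camera pose, and the choice to sum the per-view codes into one scene vector are all assumptions made for illustration. It shows only the flow of data: several views go in, one vector comes out, and that vector plus a new viewpoint yields a predicted image.

```python
import math
import random

IMAGE_SIZE = 8 * 8   # flattened toy grayscale image (assumed size)
VIEW_SIZE = 3        # toy camera pose: x, y, heading (assumed)
SCENE_SIZE = 16      # length of the scene vector (assumed)


def make_layer(n_in, n_out, rng):
    """Return a random weight matrix; a real system would learn these."""
    scale = 1.0 / math.sqrt(n_in)
    return [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]


def apply_layer(weights, inputs, activation):
    """Matrix-vector product followed by an element-wise activation."""
    return [activation(sum(w * x for w, x in zip(row, inputs))) for row in weights]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class ToySceneModel:
    """Hypothetical stand-in for a representation network plus a generator."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        self.encoder = make_layer(IMAGE_SIZE + VIEW_SIZE, SCENE_SIZE, rng)
        self.decoder = make_layer(SCENE_SIZE + VIEW_SIZE, IMAGE_SIZE, rng)

    def represent(self, observations):
        """Encode each (image, viewpoint) pair and sum the codes into one vector."""
        scene = [0.0] * SCENE_SIZE
        for image, viewpoint in observations:
            code = apply_layer(self.encoder, image + viewpoint, math.tanh)
            scene = [s + c for s, c in zip(scene, code)]
        return scene

    def imagine(self, scene, query_viewpoint):
        """Predict pixel intensities between 0 and 1 for an unseen viewpoint."""
        return apply_layer(self.decoder, scene + query_viewpoint, sigmoid)


if __name__ == "__main__":
    rng = random.Random(1)
    # Three made-up observations of one scene: (pixels, camera pose).
    views = [
        ([rng.random() for _ in range(IMAGE_SIZE)],
         [rng.uniform(-1, 1) for _ in range(VIEW_SIZE)])
        for _ in range(3)
    ]
    model = ToySceneModel()
    scene_vector = model.represent(views)
    predicted = model.imagine(scene_vector, [0.5, -0.5, 0.0])
    print(len(scene_vector), len(predicted))
```

Because the per-view codes are summed in this sketch, the scene vector does not depend on the order in which the views arrive, and its length stays fixed no matter how many views are supplied. With untrained weights the "imagined" image is noise; training on many scenes is what would make the prediction meaningful.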

The work is something of a new direction for DeepMind, which has made its name by developing programs capable of performing remarkable feats, including learning how to play the complex and abstract board game Go. The new project builds upon other academic research that seeks to mimic human perception and intelligence using similar computational tools.

“It is an interesting and valuable step in the right direction,” says Josh Tenenbaum, a professor who leads the Computational Cognitive Science group at MIT.

Tenenbaum says the ability to deal with complex scenes in a modular way is impressive but adds that the approach shows the same limitations as other machine-learning methods, including a need for a huge amount of training data: “The jury is still out on how much of the problem this solves.”

Sam Gershman, who heads the Computational Cognitive Neuroscience Lab at Harvard, says the DeepMind work combines some important ideas about how human visual perception works. But he notes that, like other AI programs, it is somewhat narrow, in that it can answer only a single query: what would a scene look like from a different viewpoint?

“In contrast, humans can answer an infinite variety of queries about a scene,” Gershman says. “What would a scene look like if I moved the blue circle a bit to the left, or repainted the red triangle, or squashed the yellow cube?”

Gershman says it’s unclear whether DeepMind’s approach could be adapted to answer more complex questions or whether some fundamentally different approach might be required.
