Machines will need to get a lot better at making sense of the world on their own if they are ever going to become truly intelligent.
DeepMind, the AI-focused subsidiary of Alphabet, has taken a step in that direction by making a computer program that builds a mental picture of the world all by itself. You might say that it learns to imagine the world around it.
The system, which uses what DeepMind’s researchers call a generative query network (GQN), looks at a scene from several angles and can then describe what it would look like from another angle.
This might seem trivial, but it requires a relatively sophisticated ability to learn about the physical world. In contrast to many AI vision systems, the DeepMind program makes sense of a scene more the way a person does. Even if something is partly occluded, for example, it can reason about what’s there.
Eventually, such technology might help serve as the foundation for deeper artificial intelligence, letting machines describe and reason about the world with much greater sophistication.
Ali Eslami, a research scientist at DeepMind, and his colleagues tested the approach in three virtual settings: a block-like tabletop, a virtual robot arm, and a simple maze. The system uses two neural networks: one learns a representation of the scene, while the other generates, or “imagines,” new perspectives. The system captures aspects of a scene, including object shapes, positions, and colors, in a vector representation, which makes it relatively efficient. The research appears in the journal Science today.
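The two-network structure can be sketched in a few lines. The toy code below is only an illustration of the idea, not DeepMind’s implementation: the real GQN uses convolutional encoders and a recurrent generative model over images, whereas here both networks are stand-in random linear maps, and all dimensions (`OBS_DIM`, `VIEW_DIM`, `REP_DIM`) are hypothetical. What it does show is the key design choice: each (observation, viewpoint) pair is encoded to a vector, the vectors are summed into one scene representation, and the generator maps that representation plus a query viewpoint to a predicted observation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes; the real system works on images, not flat vectors.
OBS_DIM, VIEW_DIM, REP_DIM = 16, 7, 8

# "Representation network": a fixed random linear map standing in for
# the learned encoder that turns (observation, viewpoint) pairs into
# a scene vector.
W_enc = rng.standard_normal((OBS_DIM + VIEW_DIM, REP_DIM))

def represent(observations, viewpoints):
    """Encode each view and sum; summing makes the scene vector
    invariant to the order in which views are presented."""
    pairs = np.concatenate([observations, viewpoints], axis=1)
    return np.tanh(pairs @ W_enc).sum(axis=0)

# "Generation network": maps (scene vector, query viewpoint) to a
# predicted observation; again just a stand-in linear map.
W_gen = rng.standard_normal((REP_DIM + VIEW_DIM, OBS_DIM))

def generate(scene_vector, query_viewpoint):
    """'Imagine' what the scene looks like from an unseen viewpoint."""
    return np.tanh(np.concatenate([scene_vector, query_viewpoint]) @ W_gen)

# Observe a scene from three viewpoints, then predict a fourth.
obs = rng.standard_normal((3, OBS_DIM))
views = rng.standard_normal((3, VIEW_DIM))
scene = represent(obs, views)
prediction = generate(scene, rng.standard_normal(VIEW_DIM))
print(prediction.shape)  # one predicted observation vector
```

Because the per-view encodings are aggregated by summation, the representation is the same no matter what order the views arrive in, which is one reason a compact vector can stand in for the whole scene.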
The work is something of a new direction for DeepMind, which has made its name by developing programs capable of performing remarkable feats, including learning how to play the complex and abstract board game Go. The new project builds upon other academic research that seeks to mimic human perception and intelligence using similar computational tools.
“It is an interesting and valuable step in the right direction,” says Josh Tenenbaum, a professor who leads the Computational Cognitive Science group at MIT.
Tenenbaum says the ability to deal with complex scenes in a modular way is impressive but adds that the approach shows the same limitations as other machine-learning methods, including a need for a huge amount of training data: “The jury is still out on how much of the problem this solves.”
Sam Gershman, who heads the Computational Cognitive Neuroscience Lab at Harvard, says the DeepMind work combines some important ideas about how human visual perception works. But he notes that, like other AI programs, it is somewhat narrow, in that it can answer only a single query: what would a scene look like from a different viewpoint?
“In contrast, humans can answer an infinite variety of queries about a scene,” Gershman says. “What would a scene look like if I moved the blue circle a bit to the left, or repainted the red triangle, or squashed the yellow cube?”
Gershman says it’s unclear whether DeepMind’s approach could be adapted to answer more complex questions or whether some fundamentally different approach might be required.