AI Begins to Understand the 3-D World

Research on artificial intelligence moves from 2-D to 3-D representations of the world—work that could lead to big advances in robotics and automated driving.
December 9, 2016
Researchers at UC Berkeley are working with a machine from Rethink Robotics to help AI gain an understanding of how objects in the real world can be manipulated.

There’s been some stunning progress in artificial intelligence of late, but it’s been surprisingly flat.

Now AI researchers are moving beyond two-dimensional images and pixels. Instead they’re building systems capable of picturing the three-dimensional world and taking action. The work could have a big impact on robotics and self-driving cars, helping to make machines that can learn how to act more intelligently in the real world.

“An exciting and important trend is the move in learning-based vision systems from just doing things with images to doing things with three-dimensional objects,” says Josh Tenenbaum, a professor in MIT’s Department of Brain and Cognitive Sciences. “That includes seeing objects in depth and modeling whole solid objects—not just recognizing that this pattern of pixels is a dog or a chair or table.”

Tenenbaum and colleagues used a popular machine-learning technique known as a generative adversarial network to have a computer learn about the properties of three-dimensional shapes from examples. It could then generate new objects that are realistic and physically accurate. The team presented the work this week at the Neural Information Processing Systems (NIPS) conference in Barcelona, Spain.
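
To make the idea concrete, here is a minimal sketch of a generative adversarial setup over 3-D voxel grids, in the spirit of the approach described above: a generator learns to turn random vectors into plausible occupancy grids, while a discriminator learns to tell generated shapes from real ones. The 32-voxel resolution, layer sizes, and training step below are illustrative assumptions, not the team's implementation.

# A minimal sketch of adversarial training over 3-D voxel grids.
# Resolution, architecture, and hyperparameters are assumptions for illustration.
import torch
import torch.nn as nn

VOXEL_RES = 32      # assumed resolution of the occupancy grid
LATENT_DIM = 128    # assumed size of the random latent vector

class Generator(nn.Module):
    """Maps a latent vector to a 32x32x32 voxel occupancy grid."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            # latent vector treated as a 1x1x1 volume with LATENT_DIM channels
            nn.ConvTranspose3d(LATENT_DIM, 128, kernel_size=4),               # -> 4^3
            nn.BatchNorm3d(128), nn.ReLU(),
            nn.ConvTranspose3d(128, 64, kernel_size=4, stride=2, padding=1),  # -> 8^3
            nn.BatchNorm3d(64), nn.ReLU(),
            nn.ConvTranspose3d(64, 32, kernel_size=4, stride=2, padding=1),   # -> 16^3
            nn.BatchNorm3d(32), nn.ReLU(),
            nn.ConvTranspose3d(32, 1, kernel_size=4, stride=2, padding=1),    # -> 32^3
            nn.Sigmoid(),  # per-voxel occupancy probability
        )

    def forward(self, z):
        return self.net(z.view(-1, LATENT_DIM, 1, 1, 1))

class Discriminator(nn.Module):
    """Scores whether a voxel grid looks like a real object (logit output)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(1, 32, kernel_size=4, stride=2, padding=1),   # 32^3 -> 16^3
            nn.LeakyReLU(0.2),
            nn.Conv3d(32, 64, kernel_size=4, stride=2, padding=1),  # -> 8^3
            nn.LeakyReLU(0.2),
            nn.Conv3d(64, 128, kernel_size=4, stride=2, padding=1), # -> 4^3
            nn.LeakyReLU(0.2),
            nn.Conv3d(128, 1, kernel_size=4),                       # -> 1
        )

    def forward(self, voxels):
        return self.net(voxels).view(-1)

# One adversarial step on a stand-in batch of "real" voxelized objects.
G, D = Generator(), Discriminator()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

real = (torch.rand(8, 1, VOXEL_RES, VOXEL_RES, VOXEL_RES) > 0.5).float()  # placeholder data
z = torch.randn(8, LATENT_DIM)
fake = G(z)

# Discriminator: push real objects toward 1, generated objects toward 0.
loss_d = bce(D(real), torch.ones(8)) + bce(D(fake.detach()), torch.zeros(8))
opt_d.zero_grad(); loss_d.backward(); opt_d.step()

# Generator: try to fool the discriminator.
loss_g = bce(D(fake), torch.ones(8))
opt_g.zero_grad(); loss_g.backward(); opt_g.step()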

This is just one technique that can be used to learn about the physical world, Tenenbaum says. Research from cognitive science suggests that humans make use of some sort of three-dimensional model to perceive and take action. For example, we may generate a three-dimensional picture of an unfamiliar object in order to work out how to grasp it. And we use our understanding of the physical world—the fact that tables are heavy and chairs will fall over if they lean back—to move around. It is Tenenbaum’s contention that higher levels of intelligence, such as reasoning and even language, can build upon this.

Enabling machines to understand the three-dimensional world should have important near-term practical applications. “This is definitely something we’re going to need if we’re going to have robots that interact with the physical world,” Tenenbaum adds. “They have to be able to deal with the fact that the physical world is three-dimensional, and it has stuff in it.”

Many researchers at NIPS are experimenting with machine-learning systems that exist inside simplified 3-D worlds. This offers a way to develop and test simple ideas that might eventually transfer to the real world. A group from Microsoft, for example, showed a machine-learning system developed inside an experimental version of the computer game Minecraft.
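
The basic workflow in these simulated worlds is an observe-act-reward loop. The sketch below uses a hypothetical toy 3-D grid world, not Microsoft's Minecraft platform or any particular toolkit, simply to show the interface a learning agent is developed against.

# A minimal sketch of the agent-environment loop used in simplified 3-D worlds.
# Toy3DWorld is a hypothetical stand-in environment, not a real toolkit.
import random

class Toy3DWorld:
    """A tiny 5x5x5 grid world: the agent moves along x, y, z toward a goal cell."""
    def __init__(self):
        self.goal = (3, 3, 3)
        self.reset()

    def reset(self):
        self.pos = [0, 0, 0]
        return tuple(self.pos)

    def step(self, action):
        axis, delta = action             # e.g. (0, +1) means "move +1 along x"
        self.pos[axis] = max(0, min(4, self.pos[axis] + delta))
        done = tuple(self.pos) == self.goal
        reward = 1.0 if done else -0.01  # small penalty encourages short paths
        return tuple(self.pos), reward, done

env = Toy3DWorld()
actions = [(axis, delta) for axis in range(3) for delta in (-1, 1)]

# A random policy, just to show the interaction loop; a learning agent would
# instead update its policy from (state, action, reward, next_state) tuples.
state, done = env.reset(), False
while not done:
    action = random.choice(actions)
    state, reward, done = env.step(action)
print("reached goal at", state)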

A range of new three-dimensional environments aimed at AI researchers should drive further research in this area (see “A 3-D World for Smarter AI Agents” and “New Tool Lets AI Learn to Do Almost Anything on a Computer”).

Other work is already focusing on robots. A team from the University of California, Berkeley, led by Sergey Levine, presented a system that learns about the physical world using a combination of video imagery and experimentation. Their robot experiments by poking objects and studying the effect this has on the visual world in order to build a simple understanding of physics. It can then perform new actions based on this understanding. For example, after nudging an object many thousands of times, the robot (a research version of an industrial machine from Rethink Robotics) can move the object to a new place.
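
A rough sketch of the experimentation-by-poking idea follows: collect (state, poke, outcome) experience, fit a forward model that predicts the effect of a poke, then choose pokes whose predicted outcomes move the object toward a goal. The two-dimensional object position, the linear model, and the simulated world below are illustrative assumptions, not the Berkeley system.

# A minimal sketch of learning a forward model from random pokes and using it to act.
# simulate_poke is a stand-in for the real world; the linear model is an assumption.
import numpy as np

rng = np.random.default_rng(0)

def simulate_poke(obj_xy, poke):
    """Stand-in for the physical world: a poke nudges the object, with noise."""
    return obj_xy + 0.8 * poke + rng.normal(scale=0.01, size=2)

# 1. Collect experience: thousands of random pokes and their observed effects.
states, pokes, next_states = [], [], []
obj = np.array([0.2, 0.3])
for _ in range(5000):
    poke = rng.uniform(-0.05, 0.05, size=2)
    new_obj = simulate_poke(obj, poke)
    states.append(obj); pokes.append(poke); next_states.append(new_obj)
    obj = new_obj

# 2. Fit a linear forward model: displacement ~ poke @ W.
X = np.array(pokes)
Y = np.array(next_states) - np.array(states)
W, *_ = np.linalg.lstsq(X, Y, rcond=None)

# 3. Use the model to act: pick the candidate poke whose predicted outcome
#    lands closest to the goal position.
obj = np.array([0.2, 0.3])   # put the object back at a known start
goal = np.array([0.6, 0.1])
for _ in range(40):
    candidates = rng.uniform(-0.05, 0.05, size=(200, 2))
    predicted = obj + candidates @ W
    best = candidates[np.argmin(np.linalg.norm(predicted - goal, axis=1))]
    obj = simulate_poke(obj, best)
print("object ended up near", obj.round(3), "goal was", goal)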

Tenenbaum isn’t the only one who believes that understanding actions in the physical world will be important to overall progress in AI. Nando de Freitas, a professor at the University of Oxford, said during a speech that AI systems that never explore the real world will remain limited. “The only way to figure out physics is to interact,” he said. “Just learning from pixels isn’t enough.”
