An endlessly changing playground teaches AIs how to multitask

Virtual game worlds provide a non-stop stream of open-ended challenges that nudge AI towards general intelligence.

Will Douglas Heaven

July 30, 2021

DeepMind

DeepMind has developed a vast candy-colored virtual playground that teaches AIs general skills by endlessly changing the tasks it sets them. Instead of developing just the skills needed to solve a particular task, the AIs learn to experiment and explore, picking up skills they then use to succeed in tasks they’ve never seen before. It is a small step toward general intelligence.

What is it? XLand is a video-game-like 3D world that the AI players sense in color. The playground is managed by a central AI that sets the players billions of different tasks by changing the environment, the game rules, and the number of players. Both the players and the playground manager use reinforcement learning to improve by trial and error.

During training, the players first face simple one-player games, such as finding a purple cube or placing a yellow ball on a red floor. They advance to more complex multiplayer games like hide and seek or capture the flag, where teams compete to be the first to find and grab their opponent’s flag. The playground manager has no specific goal but aims to improve the general capability of the players over time.
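To make the progression from simple one-player games to multiplayer games concrete, here is a minimal toy sketch in Python. It is not DeepMind's method. The article says the real playground manager uses reinforcement learning, while this sketch swaps in a hand-written rule that unlocks multiplayer games once the players succeed often enough. Every name in it (`Task`, `PlaygroundManager`, `WORLDS`, `GAMES`, `promote_at`, `window`) and the world labels are assumptions made up for illustration.

```python
import random
from dataclasses import dataclass

# Hypothetical task vocabulary. The game names echo the article's examples;
# the world names are invented for this sketch.
WORLDS = ["flat", "ramps", "platforms"]
GAMES = {
    1: ["find a purple cube", "place a yellow ball on a red floor"],
    2: ["hide and seek", "capture the flag"],
}


@dataclass(frozen=True)
class Task:
    world: str
    game: str
    num_players: int


class PlaygroundManager:
    """Toy task setter using a fixed rule (an assumption, not RL)."""

    def __init__(self, promote_at=0.7, window=20, seed=0):
        self.rng = random.Random(seed)
        self.promote_at = promote_at  # success rate needed to unlock harder games
        self.window = window          # number of recent results to average over
        self.max_players = 1          # start with one-player games only
        self.recent = []

    def next_task(self):
        # Vary the environment, the game rules, and the number of players.
        n = self.rng.randint(1, self.max_players)
        return Task(self.rng.choice(WORLDS), self.rng.choice(GAMES[n]), n)

    def report(self, success):
        # Record the outcome and keep only the most recent `window` results.
        self.recent.append(bool(success))
        self.recent = self.recent[-self.window:]
        full = len(self.recent) == self.window
        if full and sum(self.recent) / self.window >= self.promote_at:
            # Players are doing well: allow games with more players.
            self.max_players = min(self.max_players + 1, max(GAMES))
            self.recent = []
```

A training loop would call `next_task()`, let the players attempt the task, and pass the outcome to `report()`. The fixed threshold is only a stand-in for the learned behavior the article describes.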

Why is this cool? AIs like DeepMind’s AlphaZero have beaten the world’s best human players at chess and Go. But they can only learn one game at a time. As DeepMind cofounder Shane Legg put it when I spoke to him last year, it’s like having to swap out your chess brain for your Go brain each time you want to switch games.

Researchers are now trying to build AIs that can learn multiple tasks at once, which means teaching them general skills that make it easier to adapt.

[Video: AI agents experimenting in a virtual environment. Caption: Having learned to experiment, these bots improvised a ramp.]

One exciting trend in this direction is open-ended learning, where AIs are trained on many different tasks without a specific goal. In many ways, this is how humans and other animals seem to learn, via aimless play. But this requires a vast amount of data. XLand generates that data automatically, in the form of an endless stream of challenges. It is similar to POET, an AI training dojo where two-legged bots learn to navigate obstacles in a 2D landscape. XLand’s world is much more complex and detailed, however.

XLand is also an example of AI learning to make itself, or what Jeff Clune, who helped develop POET and leads a team working on this topic at OpenAI, calls AI-generating algorithms (AI-GAs). “This work pushes the frontiers of AI-GAs,” says Clune. “It is very exciting to see.”

What did they learn? Some of DeepMind’s XLand AIs played 700,000 different games in 4,000 different worlds, encountering 3.4 million unique tasks in total. Instead of learning the best thing to do in each situation, which is what most existing reinforcement-learning AIs do, the players learned to experiment until they beat the particular task: moving objects around to see what happened, or using one object as a tool to reach another or as something to hide behind.

In the videos you can see the AIs chucking objects around until they stumble on something useful: a large tile, for example, becomes a ramp up to a platform. It is hard to know for sure if all such outcomes are intentional or happy accidents, say the researchers. But they happen consistently.

AIs that learned to experiment had an advantage in most tasks, even ones that they had not seen before. The researchers found that the XLand AIs adapted to a complex new task after just 30 minutes of training on it. But AIs that had not spent time in XLand could not learn these tasks at all.
