
AI can’t just play video games all day if it’s ever going to grow up

November 21, 2018

Reinforcement learning (RL) is suffering from what I call the “big baby problem.”

RL is a category of machine learning in which an algorithm learns to reach a desired goal through rewards and penalties. But the benchmark tasks used to measure how well RL algorithms perform (Atari video games and simulation environments, for example) don’t reflect the complexity of the natural world.
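
To make the reward-and-penalty idea concrete, here is a minimal sketch of a Q-learning-style update. Everything in it is an illustrative assumption rather than anything from Zhang's work: a hypothetical five-cell corridor, a reward of +1 for reaching the last cell, and a small penalty for every other move. It sweeps every state and action instead of exploring at random, so the result is repeatable.

```python
GOAL = 4            # index of the rewarding cell in a hypothetical 5-cell corridor
ACTIONS = (0, 1)    # 0 = move left, 1 = move right

def step(state, action):
    """Move one cell; reaching GOAL is rewarded, any other move is penalized."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    reward = 1.0 if next_state == GOAL else -0.1
    return next_state, reward

def train(sweeps=200, alpha=0.5, gamma=0.9):
    """Learn action values by repeatedly nudging them toward reward + discounted future value."""
    q = [[0.0, 0.0] for _ in range(GOAL + 1)]
    for _ in range(sweeps):
        for state in range(GOAL):          # GOAL itself is terminal
            for action in ACTIONS:
                next_state, reward = step(state, action)
                future = 0.0 if next_state == GOAL else max(q[next_state])
                target = reward + gamma * future
                q[state][action] += alpha * (target - q[state][action])
    return q

def greedy_policy(q):
    """Pick the highest-valued action in every non-terminal cell."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(GOAL)]

policy = greedy_policy(train())
print(policy)  # one action per cell, 1 meaning "move right"
```

A corridor like this is exactly the kind of deterministic, narrowly defined environment the article is worried about: the same move always produces the same result.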

As a result, the algorithms have grown more sophisticated without confronting real-world problems—leaving them too fragile to operate beyond deterministic and narrowly defined environments. (Big baby, see what I mean?)

This defeats the purpose of RL, which is to eventually develop robots that can adapt to changing physical surroundings. If you train a robot to pour a glass of water, for example, you want it to be able to do that with any given sink. But benchmarking the RL algorithms on Atari is like “training, testing, and evaluating on a single sink,” says Amy Zhang, a PhD student at McGill University and part-time research engineer on the Facebook AI Research team.

That’s why Zhang and two collaborators proposed three new families of benchmark tasks to better reflect the natural world. Two of them focus on visual reasoning: the algorithm must learn to navigate within a natural image, either to classify that image or to find a target object and settle on it. The third modifies the existing Atari benchmark by swapping out the video game’s black background for randomly selected video clips.
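
The article does not say how the background swap is implemented. As a rough sketch of the idea, assuming the background is every pure-black pixel and that the game frame and video frame are same-sized RGB arrays, it could look like this (`swap_background` is a hypothetical name, not the researchers' code):

```python
import numpy as np

def swap_background(game_frame: np.ndarray, video_frame: np.ndarray) -> np.ndarray:
    """Replace pure-black pixels in an RGB game frame with the matching video pixels.

    Assumption: "background" means pixels whose three channels are all zero.
    """
    if game_frame.shape != video_frame.shape:
        raise ValueError("game and video frames must have the same shape")
    # Shape (H, W, 1) so the mask broadcasts across the colour channels.
    is_background = np.all(game_frame == 0, axis=-1, keepdims=True)
    return np.where(is_background, video_frame, game_frame)

# Tiny demonstration: a 2x2 frame with a single coloured "sprite" pixel.
game = np.zeros((2, 2, 3), dtype=np.uint8)
game[0, 0] = (200, 72, 72)
video = np.full((2, 2, 3), 7, dtype=np.uint8)
mixed = swap_background(game, video)
```

In the demonstration the sprite pixel survives and every black pixel takes its value from the video frame, so two frames with identical game content no longer look identical.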

“In the original Atari, a model can just learn to memorize every screen,” says Zhang. “In this setting, where we have video, every screen is going to be different, so it actually needs to learn to visually comprehend the scene to understand what's going on.”

“That seems a lot closer to real-world robotics,” she says.
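
Zhang's point about memorization can be illustrated with a toy example. This is a sketch under invented assumptions (a 4x4 "screen", a lookup table standing in for a model, and made-up action names), not a description of how real RL agents are built:

```python
import numpy as np

def make_screen(background: np.ndarray) -> np.ndarray:
    """Draw a fixed 2x2 white 'sprite' on top of the given background."""
    screen = background.copy()
    screen[1:3, 1:3] = 255
    return screen

# A purely memorizing "policy": exact screen bytes map to an action.
memory = {}
black = np.zeros((4, 4, 3), dtype=np.uint8)
memory[make_screen(black).tobytes()] = "fire"

def act(screen: np.ndarray) -> str:
    """Return the memorized action, or 'unknown' for a screen never seen before."""
    return memory.get(screen.tobytes(), "unknown")

# A non-black background standing in for a video frame (values 1..254).
rng = np.random.default_rng(0)
video_background = rng.integers(1, 255, size=(4, 4, 3), dtype=np.uint8)

seen = act(make_screen(black))                # identical to the memorized screen
unseen = act(make_screen(video_background))   # same sprite, new background
```

The sprite is in the same place both times, yet the lookup only succeeds on the black background. A model that generalizes has to recognize the sprite itself rather than the whole screen.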

When the researchers tested current RL algorithms on the new benchmarks, the algorithms tripped up significantly. “So that means that there's more work to be done to figure out how we can learn more generalizable, more robust models in RL,” Zhang says.

Until the algorithms can cope with a small dose of complexity, they certainly won’t fare well in more dynamic environments.

An abridged version of this story appeared in our AI newsletter The Algorithm.

Illustration by Rose Wong
