
AI can’t just play video games all day if it’s ever going to grow up

November 21, 2018

Reinforcement learning (RL) is suffering from what I call the “big baby problem.”

RL is a category of machine learning that uses rewards and penalties to train an agent toward a desired goal. But the benchmark tasks used to measure how well RL algorithms perform, such as Atari video games and simulation environments, don’t reflect the complexity of the natural world.

As a result, the algorithms have grown more sophisticated without confronting real-world problems—leaving them too fragile to operate beyond deterministic and narrowly defined environments. (Big baby, see what I mean?)

This defeats the purpose of RL, which is to eventually develop robots that can adapt to changing physical surroundings. If you train a robot to fill a glass of water, for example, you want it to be able to do that at any sink, not just the one it trained on. But benchmarking RL algorithms on Atari is like “training, testing, and evaluating on a single sink,” says Amy Zhang, a PhD student at McGill University and part-time research engineer on the Facebook AI Research team.

That’s why Zhang and two collaborators proposed three new families of benchmark tasks that better reflect the natural world. Two of them focus on visual reasoning: the algorithm must learn to navigate inside a natural image, either to classify that image or to find a target object and settle on it. The third modifies the existing Atari benchmark by swapping out the video games’ black backgrounds for randomly selected video clips.
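The image-navigation tasks can be pictured as an agent that explores a picture through a small window. The article gives no implementation details, so what follows is a hypothetical sketch under our own assumptions: the class name `ImageNavEnv`, the square viewing window, the four move actions, and the +1/-1 reward for a correct or wrong classification are illustrative choices, not the researchers’ code.

```python
import random


class ImageNavEnv:
    """Hypothetical sketch of an image-navigation task: the agent sees only a
    small square window of a larger image, moves that window around, and
    ends the episode by guessing the image's class."""

    MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

    def __init__(self, image, label, window=2, max_steps=10, seed=None):
        self.image = image            # 2-D list of pixel values
        self.label = label            # ground-truth class of the image
        self.window = window          # side length of the visible patch
        self.max_steps = max_steps    # episode ends after this many actions
        self.rng = random.Random(seed)

    def reset(self):
        # Start the window at a random position fully inside the image.
        h, w = len(self.image), len(self.image[0])
        self.row = self.rng.randrange(h - self.window + 1)
        self.col = self.rng.randrange(w - self.window + 1)
        self.steps = 0
        return self._observe()

    def _observe(self):
        # The agent only ever sees the pixels under the current window.
        return [r[self.col:self.col + self.window]
                for r in self.image[self.row:self.row + self.window]]

    def step(self, action):
        """`action` is a move name or ("classify", guess).
        Returns (observation, reward, done)."""
        self.steps += 1
        if isinstance(action, tuple) and action[0] == "classify":
            reward = 1.0 if action[1] == self.label else -1.0
            return self._observe(), reward, True
        dr, dc = self.MOVES[action]
        h, w = len(self.image), len(self.image[0])
        # Clamp so the window never leaves the image.
        self.row = min(max(self.row + dr, 0), h - self.window)
        self.col = min(max(self.col + dc, 0), w - self.window)
        return self._observe(), 0.0, self.steps >= self.max_steps


# Usage: a 4x4 "image" explored through a 2x2 window.
img = [[0, 1, 2, 3],
       [4, 5, 6, 7],
       [8, 9, 10, 11],
       [12, 13, 14, 15]]
env = ImageNavEnv(img, label="cat", window=2, seed=0)
env.reset()
env.step("right")
_, reward, done = env.step(("classify", "cat"))
print(reward, done)  # 1.0 True
```

Because each observation in this sketch is only a patch, a policy has to piece together several views before it can answer, rather than reacting to one fixed screen.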

“In the original Atari, a model can just learn to memorize every screen,” says Zhang. “In this setting, where we have video, every screen is going to be different, so it actually needs to learn to visually comprehend the scene to understand what's going on.”

“That seems a lot closer to real-world robotics,” she says.
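The background swap can be pictured as a per-pixel composite. The following is a minimal sketch, not the researchers’ implementation. It assumes grayscale frames stored as 2-D lists, treats every pixel exactly equal to 0 (black) as background, and represents a video clip as a list of same-sized frames. The names `replace_background`, `NaturalBackground`, and `MemorizingPolicy` are hypothetical. The lookup-table policy illustrates Zhang’s point: a model that memorizes exact screens recognizes a repeated black-background frame, but once the background is video, the same game state looks like a screen it has never seen.

```python
import random


def replace_background(game_frame, video_frame, background=0):
    """Copy game_frame, replacing every pixel equal to `background`
    with the pixel at the same position in video_frame."""
    if len(game_frame) != len(video_frame) or any(
            len(g) != len(v) for g, v in zip(game_frame, video_frame)):
        raise ValueError("frames must have the same shape")
    return [[v if g == background else g for g, v in zip(g_row, v_row)]
            for g_row, v_row in zip(game_frame, video_frame)]


class NaturalBackground:
    """Pick a random clip per episode and play it behind the game."""

    def __init__(self, clips, background=0, seed=None):
        self.clips = clips            # each clip is a list of 2-D frames
        self.background = background
        self.rng = random.Random(seed)

    def start_episode(self):
        self.clip = self.rng.choice(self.clips)
        self.t = 0

    def transform(self, game_frame):
        # Loop the clip if the episode outlasts it.
        video_frame = self.clip[self.t % len(self.clip)]
        self.t += 1
        return replace_background(game_frame, video_frame, self.background)


class MemorizingPolicy:
    """A lookup table from exact screens to actions: no visual understanding."""

    def __init__(self):
        self.table = {}

    def key(self, frame):
        return tuple(tuple(row) for row in frame)

    def remember(self, frame, action):
        self.table[self.key(frame)] = action

    def act(self, frame):
        return self.table.get(self.key(frame))  # None for an unseen screen


game = [[0, 7], [7, 0]]                  # 7 = a game sprite, 0 = black
print(replace_background(game, [[1, 2], [3, 4]]))  # [[1, 7], [7, 4]]

policy = MemorizingPolicy()
policy.remember(game, "fire")
print(policy.act(game))                  # fire

bg = NaturalBackground(clips=[[[[1, 2], [3, 4]], [[5, 6], [8, 9]]]], seed=0)
bg.start_episode()
print(policy.act(bg.transform(game)))    # None
```

A real version would need to handle RGB frames and game objects that happen to share the background color; this sketch ignores both.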

When the researchers tested current RL algorithms on the new benchmarks, the algorithms struggled considerably. “So that means that there's more work to be done to figure out how we can learn more generalizable, more robust models in RL,” Zhang says.

Until the algorithms can cope with even a small dose of complexity, they certainly won’t fare well in more dynamic environments.

An abridged version of this story appeared in our AI newsletter The Algorithm.
