Reinforcement learning (RL) is suffering from what I call the “big baby problem.”
RL is a category of machine learning that uses rewards and penalties to achieve a desired goal. But the benchmark tasks used to measure how RL algorithms are performing—like Atari video games and simulation environments—don’t reflect the complexity of the natural world.
As a result, the algorithms have grown more sophisticated without confronting real-world problems—leaving them too fragile to operate beyond deterministic and narrowly defined environments. (Big baby, see what I mean?)
This defeats the purpose of RL, which is to eventually develop robots that can adapt to changing physical surroundings. If you train a robot to pour a glass of water, for example, you want it to be able to do that with any given sink. But benchmarking the RL algorithms on Atari is like “training, testing, and evaluating on a single sink,” says Amy Zhang, a PhD student at McGill University and part-time research engineer on the Facebook AI Research team.
That’s why Zhang, along with two collaborators, proposed three new families of benchmark tasks to better reflect the natural world. Two of them focus on visual reasoning: the algorithm must learn to navigate inside a natural image, either to classify that image or to find a target object and settle on it. The third modifies the existing Atari benchmark by swapping out the video game’s black background for randomly selected video clips.
“In the original Atari, a model can just learn to memorize every screen,” says Zhang. “In this setting, where we have video, every screen is going to be different, so it actually needs to learn to visually comprehend the scene to understand what's going on.”
“That seems a lot closer to real-world robotics,” she says.
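To make the background swap concrete, here is a rough sketch in Python with NumPy. It is not the researchers' code, and the article does not say how they implemented the change. The function name `composite_background`, the `threshold` parameter, and the toy arrays are all hypothetical, and the sketch assumes that background pixels are exactly black and that the game frame and video frame have the same shape.

```python
import numpy as np


def composite_background(game_frame, video_frame, threshold=0):
    """Replace (near-)black pixels of game_frame with pixels from video_frame.

    Assumption: both inputs are uint8 RGB arrays of identical shape (H, W, 3),
    and any pixel whose channels are all <= threshold is game background.
    """
    if game_frame.shape != video_frame.shape:
        raise ValueError("frames must have the same shape")
    # Boolean mask of shape (H, W, 1); it broadcasts across the color channels.
    background = (game_frame <= threshold).all(axis=-1, keepdims=True)
    return np.where(background, video_frame, game_frame)


# Toy data standing in for an emulator frame and a short video clip.
rng = np.random.default_rng(0)
game = np.zeros((4, 4, 3), dtype=np.uint8)
game[1, 2] = (200, 72, 72)  # a single hypothetical "sprite" pixel
clip = rng.integers(1, 256, size=(5, 4, 4, 3)).astype(np.uint8)  # 5 random frames

# Pair each game step with a different clip frame, so no two screens match.
frames = [composite_background(game, clip[t]) for t in range(len(clip))]
```

Because the backdrop changes with every frame, an agent can no longer succeed by memorizing whole screens, which is the effect Zhang describes. A real version would need a more careful mask, since some game sprites may themselves contain black pixels.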
When the researchers tested current RL algorithms on the new benchmarks, the algorithms tripped up significantly. “So that means that there's more work to be done to figure out how we can learn more generalizable, more robust models in RL,” Zhang says.
Until the algorithms can cope with a small dose of complexity, they certainly won’t fare well in more dynamic environments.
An abridged version of this story appeared in our AI newsletter The Algorithm.