
Robots Get an ‘Undo’ Button That Could Help Them Learn Faster

November 27, 2017

Deep reinforcement learning works a lot like a child learning a skill: practice makes perfect. For an autonomous agent like a robot, though, the environment has to be reset to its original state between attempts. That chore can take hours, as humans scurry around replacing objects, for example.

A new arXiv paper by researchers with Google Brain, the University of Cambridge, the Max Planck Institute for Intelligent Systems, and UC Berkeley details a method that teaches an agent to reset the environment for the next attempt on its own, and to avoid actions that would be irreversible.

Their advance was to give agents two policies that work together: a “forward” policy and a “reset” policy. The forward policy learns the skill itself, while the reset policy learns to “leave no trace,” returning the environment to its starting state and effectively rewinding what the forward policy did. Actions that the agent predicts would be irreversible are aborted as soon as possible.

The researchers write that they sought to give their agents an “intuition” that classifies any reversible action as safe, since the agent can always return to the original state afterward. Through trial and error, the agent discovers that more and more actions are reversible, which lets it explore safely.
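The forward/reset interplay can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the one-dimensional corridor with an irreversible “cliff,” the hand-coded `reset_value` function (standing in for the estimate a real reset policy would learn), and the 0.5 abort threshold are all hypothetical. The forward policy acts until the reset estimate for its next action drops below the threshold; it then aborts, and the reset policy walks the agent back to the start without human help.

```python
from typing import Callable, List, Tuple

# Hypothetical toy environment: positions 0..CLIFF on a line.
# Reaching CLIFF is irreversible: once there, the agent cannot move.
CLIFF = 5
START = 0


def step(pos: int, action: int) -> int:
    """Move left (-1) or right (+1); the agent is stuck once it hits the cliff."""
    if pos == CLIFF:
        return pos
    return max(0, min(CLIFF, pos + action))


def reset_value(pos: int, action: int) -> float:
    """Hypothetical stand-in for the reset policy's learned value estimate:
    1.0 if the agent could still return to START after this action, else 0.0."""
    return 0.0 if step(pos, action) == CLIFF else 1.0


def run_episode(
    forward_policy: Callable[[int], int],
    reset_policy: Callable[[int], int],
    threshold: float = 0.5,
    max_steps: int = 20,
) -> Tuple[int, List[Tuple[str, int]]]:
    pos = START
    log: List[Tuple[str, int]] = []

    # Forward phase: pursue the task, but abort early if the proposed
    # action looks irreversible according to the reset value estimate.
    for _ in range(max_steps):
        action = forward_policy(pos)
        if reset_value(pos, action) < threshold:
            log.append(("abort", pos))
            break
        pos = step(pos, action)
        log.append(("forward", pos))

    # Reset phase: the reset policy tries to return the agent to START.
    for _ in range(max_steps):
        if pos == START:
            break
        pos = step(pos, reset_policy(pos))
        log.append(("reset", pos))

    return pos, log


def go_right(pos: int) -> int:
    return +1  # hypothetical forward policy: always head toward the goal


def go_left(pos: int) -> int:
    return -1  # hypothetical reset policy: always head back to START


final_pos, episode_log = run_episode(go_right, go_left)
# With threshold=0.0 the abort test (value < 0.0) never fires in this toy.
stuck_pos, _ = run_episode(go_right, go_left, threshold=0.0)
print(final_pos, stuck_pos)  # prints "0 5"
```

With the abort check active, the agent stops one step short of the cliff and resets itself to position 0. With the check disabled, it walks off the cliff and ends at position 5, exactly the situation in which a human would have to step in and reset the environment by hand.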

Deep reinforcement learning is often done in simulation, especially when the real world would be less forgiving of errors, as with an autonomous car driving off a cliff. Even in safer settings, waiting for manual resets can become a bottleneck for data collection. For these reasons, the team’s work was confined to virtual environments. Eventually, however, real-world testing has to be done, and this research could make it faster and safer.

As Jack Clark points out in his Import AI newsletter, this paper echoes the work outlined in another paper from Facebook AI Research last month, in which a single agent has two separate modes, nicknamed Alice and Bob, one of which tries to reverse the task the other has attempted to complete. Work like this, which helps AI plan ahead, could save it (and us) from disastrous mistakes in the future.
