
Robots Get an ‘Undo’ Button That Could Help Them Learn Faster

November 27, 2017

Deep reinforcement learning works a lot like a child learning a skill: practice makes perfect. For an autonomous agent like a robot, though, the environment has to be reset to its original state between attempts, a chore that can take hours as humans scurry around replacing objects, for example.

A new arXiv paper by researchers with Google Brain, the University of Cambridge, the Max Planck Institute for Intelligent Systems, and UC Berkeley details a method that can teach an agent to reset the environment for the next attempt, as well as stop it from performing actions that would be irreversible. 

Their advance was to give agents a “forward” policy and a “reset” policy that work together. The forward policy is tasked with learning a skill, while the reset policy forces the agent to learn how to “leave no trace,” effectively rewinding what the forward policy did. Actions that the robot thinks would be irreversible are aborted as soon as possible.

The researchers write that they sought to give their agents “intuition” to classify anything that is reversible as safe, since it’s possible to go back to the original state. Through trial and error, the agent discovers that more and more actions are reversible, allowing it to explore safely.
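The interplay between the two policies can be illustrated with a toy sketch. It is not the researchers' implementation. The corridor environment, the `CLIFF` state, the hand-coded `reset_value` estimate (a stand-in for something an agent would learn through trial and error), and the `threshold` parameter are all assumptions made for illustration.

```python
CLIFF = 5   # hypothetical irreversible state: once entered, no action leaves it
START = 0


def step(state: int, action: int) -> int:
    """Toy 1-D corridor. `action` is -1 or +1; the cliff is absorbing."""
    if state == CLIFF:
        return state
    return max(START, min(CLIFF, state + action))


def reset_value(state: int, action: int) -> float:
    """Hand-coded stand-in for a learned estimate of how likely the
    reset policy is to reach START after taking `action` in `state`."""
    return 0.0 if step(state, action) == CLIFF else 1.0


def forward_policy(state: int) -> int:
    return +1   # always push toward the far end of the corridor


def reset_policy(state: int) -> int:
    return -1   # always walk back toward START


def run_episode(threshold: float, max_steps: int = 20):
    """Run one forward attempt followed by one reset attempt.

    Returns (visited states, whether the forward attempt was aborted).
    """
    state, trace, aborted = START, [START], False

    # Forward phase: abort before any action judged irreversible.
    for _ in range(max_steps):
        action = forward_policy(state)
        if reset_value(state, action) < threshold:
            aborted = True
            break
        state = step(state, action)
        trace.append(state)

    # Reset phase: bounded, because a reset may be impossible.
    for _ in range(max_steps):
        if state == START:
            break
        state = step(state, reset_policy(state))
        trace.append(state)

    return trace, aborted
```

With `run_episode(threshold=0.5)`, the agent in this sketch stops at position 4, one step short of the cliff, and walks back to the start. With `threshold=0.0` the abort check never fires, so the agent ends up stuck at the cliff and the reset phase cannot undo it.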

Deep reinforcement learning is often done in simulation, especially when the real-world environment would be less forgiving of errors, such as an autonomous car driving over a cliff. Even in safer situations, waiting for manual resets can become a bottleneck for data collection. For this reason, the team’s work was confined to virtual environments. Eventually, however, real-world testing has to be done, and this research could make it faster and safer.

As Jack Clark points out in his Import AI newsletter, this paper echoes the work outlined in another paper (PDF) from Facebook AI Research last month, in which a single agent has two separate modes, nicknamed Alice and Bob, one of which tries to reverse the task the other has attempted to complete. This type of work to make AI able to plan ahead could save it (and us) from disastrous mistakes in the future.

