Robots Get an ‘Undo’ Button That Could Help Them Learn Faster

November 27, 2017

Deep reinforcement learning works a lot like a child learning a skill: practice makes perfect. For an autonomous agent such as a robot, though, the environment has to be reset to its original state between attempts, a chore that can take hours as humans scurry around putting objects back in place.

A new arXiv paper by researchers with Google Brain, the University of Cambridge, the Max Planck Institute for Intelligent Systems, and UC Berkeley details a method that can teach an agent to reset the environment for the next attempt, as well as stop it from performing actions that would be irreversible. 

Their advance was to give agents a “forward” policy and a “reset” policy that work together. While the forward policy is tasked with learning a skill, the reset policy has to learn how to “leave no trace,” effectively rewinding what the forward policy did. Actions that the robot thinks would be irreversible are aborted as soon as possible.

The researchers write that they sought to give their agents “intuition” to classify anything that is reversible as safe, since it’s possible to go back to the original state. Through trial and error, the agent discovers that more and more actions are reversible, allowing it to explore safely.
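The interplay between the two policies can be illustrated with a toy sketch. Everything in it is an assumption made for illustration rather than the paper's setup: the one-dimensional corridor, the `CLIFF` state, the fixed policies, and the hand-written `reset_value` function, which stands in for an estimate that the paper's agents learn through trial and error. The sketch shows only the early-abort idea: before each forward action, check whether a reset would still be possible afterward, and hand control to the reset policy if not.

```python
START = 0  # hypothetical initial state the reset policy tries to return to
CLIFF = 5  # hypothetical irreversible state: once entered, the agent is stuck


def step(state: int, action: int) -> int:
    """Toy 1-D corridor. `action` is -1 (left) or +1 (right)."""
    if state == CLIFF:
        return CLIFF  # nothing undoes falling off the cliff
    return max(START, min(CLIFF, state + action))


def forward_policy(state: int) -> int:
    """Naive forward policy: always push right, toward the cliff."""
    return +1


def reset_policy(state: int) -> int:
    """Reset policy: walk back toward the start."""
    return -1


def reset_value(state: int, action: int) -> float:
    """Hand-written stand-in for a learned estimate of how likely a
    reset is to succeed after taking `action` in `state`."""
    return 0.0 if step(state, action) == CLIFF else 1.0


def run_episode(max_steps: int = 10, threshold: float = 0.5):
    state = START
    trajectory = [state]
    aborted = False

    # Forward phase: veto any action judged too hard to undo.
    for _ in range(max_steps):
        action = forward_policy(state)
        if reset_value(state, action) < threshold:
            aborted = True  # early abort: switch to the reset policy
            break
        state = step(state, action)
        trajectory.append(state)
        if state == CLIFF:
            break

    # Reset phase: undo the forward phase, if that is still possible.
    while state not in (START, CLIFF):
        state = step(state, reset_policy(state))
        trajectory.append(state)

    return trajectory, aborted, state
```

With the default threshold, `run_episode()` stops the forward policy one step short of the cliff and walks the agent back to the start. Passing `threshold=0.0` disables the check, and the agent ends the episode stuck in the irreversible state, which is the case where a human would have to step in.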

Deep reinforcement learning is often done in simulation, especially when the real-world environment would be unforgiving of errors, as with an autonomous car driving over a cliff. Even in safer situations, waiting for manual resets can become a bottleneck for data collection. For these reasons, the team’s work was confined to virtual environments. Eventually, however, real-world testing has to be done, and this research could make it faster and safer.

As Jack Clark points out in his Import AI newsletter, this paper echoes work outlined in another paper from Facebook AI Research last month, in which a single agent has two separate modes, nicknamed Alice and Bob, one of which tries to reverse the task the other has attempted to complete. This type of work, which makes AI able to plan ahead, could save it (and us) from disastrous mistakes in the future.
