
Robots Get an ‘Undo’ Button That Could Help Them Learn Faster

November 27, 2017

Deep reinforcement learning works a lot like a child learning a skill: practice makes perfect. For an autonomous agent like a robot, though, its environment has to be reset to its original state between attempts—a chore that can take hours as humans scurry around replacing objects, for example.

A new arXiv paper by researchers with Google Brain, the University of Cambridge, the Max Planck Institute for Intelligent Systems, and UC Berkeley details a method that can teach an agent to reset the environment for the next attempt, as well as stop it from performing actions that would be irreversible. 

Their advance was to give agents a “forward” policy and a “reset” policy that work together. While the forward policy is tasked with learning a skill, the reset policy learns to “leave no trace,” effectively rewinding the forward policy’s actions so the environment returns to its starting state. Actions the robot predicts would be irreversible are aborted as early as possible.
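
The basic loop is easy to picture. Below is a minimal, runnable sketch of that forward/reset alternation on a toy number-line world; the environment and both “policies” here are hand-coded stand-ins invented for illustration, not the learned neural-network policies the paper trains.

```python
# Toy illustration of the forward/reset alternation (not the authors' code).
GOAL, START = 5, 0  # positions on a number line

def forward_policy(pos):
    return +1  # try to reach the goal by moving right

def reset_policy(pos):
    return -1  # rewind by moving back toward the start

def attempt_and_reset(pos=START, max_steps=20):
    # Forward phase: practice the skill.
    for _ in range(max_steps):
        pos += forward_policy(pos)
        if pos == GOAL:
            break
    # Reset phase: "leave no trace" by returning to the initial state,
    # so the next attempt needs no human intervention.
    for _ in range(max_steps):
        pos += reset_policy(pos)
        if pos == START:
            return True
    return False  # reset failed; fall back to a manual (hard) reset

print(attempt_and_reset())  # True: the agent rewound itself to the start
```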

The researchers write that they sought to give their agents an “intuition” for safety: anything reversible is safe, because it is always possible to return to the original state. Through trial and error, the agent discovers that more and more actions are reversible, allowing it to explore safely.
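
Roughly speaking, the paper operationalizes this intuition by consulting the reset policy’s learned value estimate before each forward action and refusing anything it doubts it can undo. In the sketch below, `q_reset` is a hand-coded, hypothetical stand-in for that learned estimate.

```python
# Sketch of an early abort: a hand-coded stand-in for the learned reset value.
Q_MIN = 0.5  # illustrative safety threshold

def q_reset(pos, action):
    # Hypothetical reset value: stepping past position 8 (the "cliff
    # edge") is judged unrecoverable; everything else looks reversible.
    return 0.0 if pos + action > 8 else 1.0

def step_safely(pos, action):
    if q_reset(pos, action) < Q_MIN:
        # The reset policy doubts it can undo this action, so treat it
        # as irreversible: abort and hand control back to the reset policy.
        return pos, "aborted"
    return pos + action, "ok"

print(step_safely(8, +1))  # (8, 'aborted'): the risky step is refused
print(step_safely(3, +1))  # (4, 'ok'): a reversible step proceeds
```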

Deep reinforcement learning is often done in simulation, especially when real-world errors would be unforgiving, as with an autonomous car driving over a cliff. Even in safer settings, waiting for manual resets can become a bottleneck for data collection. For these reasons, the team’s work was confined to virtual environments. Eventually, however, real-world testing has to be done, and this research could make it faster and safer.

As Jack Clark points out in his Import AI newsletter, this paper echoes the work outlined in another paper from Facebook AI Research last month, in which a single agent has two separate modes, nicknamed Alice and Bob, one of which tries to reverse the task the other has attempted to complete. Work like this, which pushes AI to plan ahead, could save it (and us) from disastrous mistakes in the future.

Illustration by Rose Wong
