The robot dog is waving its legs in the air like an exasperated beetle. After 10 minutes of struggling, it manages to roll over to its front. Half an hour in, the robot is taking its first clumsy steps, like a newborn calf. But after one hour, the robot is strutting around the lab with confidence.
What makes this four-legged robot special is that it learned to do all this by itself, without being shown what to do in a computer simulation.
Danijar Hafner and colleagues at the University of California, Berkeley, used an AI technique called reinforcement learning, which trains algorithms by rewarding them for desired actions, to train the robot to walk from scratch in the real world. The team used the same algorithm to successfully train three other robots, such as one that was able to pick up balls and move them from one tray to another.
Traditionally, robots are trained in a computer simulator before they attempt to do anything in the real world. For example, a pair of robot legs called Cassie taught itself to walk using reinforcement learning, but only after it had done so in a simulation.
“The problem is your simulator will never be as accurate as the real world. There’ll always be aspects of the world you’re missing,” says Hafner, who worked with colleagues Alejandro Escontrela and Philipp Wu on the project and is now an intern at DeepMind. Adapting lessons from the simulator to the real world also requires extra engineering, he says.
The team’s algorithm, called Dreamer, uses past experiences to build up a model of the surrounding world. Dreamer also allows the robot to conduct trial-and-error calculations in a computer program as opposed to the real world, by predicting potential future outcomes of its potential actions. This allows it to learn faster than it could purely by doing. Once the robot had learned to walk, it kept learning to adapt to unexpected situations, such as resisting being toppled by a stick.
“Teaching robots through trial and error is a difficult problem, made even harder by the long training times such teaching requires,” says Lerrel Pinto, an assistant professor of computer science at New York University, who specializes in robotics and machine learning. Dreamer shows that deep reinforcement learning and world models are able to teach robots new skills in a really short amount of time, he says.
Jonathan Hurst, a professor of robotics at Oregon State University, says the findings, which have not yet been peer-reviewed, make it clear that “reinforcement learning will be a cornerstone tool in the future of robot control.”
Removing the simulator from robot training has many perks. The algorithm could be useful for teaching robots how to learn skills in the real world and adapt to situations like hardware failures, Hafner says–for example, a robot could learn to walk with a malfunctioning motor in one leg.
The approach could also have huge potential for more complicated things like autonomous driving, which require complex and expensive simulators, says Stefano Albrecht, an assistant professor of artificial intelligence at the University of Edinburgh. A new generation of reinforcement-learning algorithms could “super quickly pick up in the real world how the environment works,” Albrecht says.
But there are some big unsolved problems, Pinto says.
With reinforcement learning, engineers need to specify in their code which behaviors are good and are thus rewarded, and which behaviors are undesirable. In this case, turning over and walking is good, while not walking is bad. “A roboticist will need to do this for each and every task [or] problem they want the robot to solve,” says Pinto. That is incredibly time consuming, and it is difficult to program behaviors for unexpected situations.
And while simulators can be inaccurate, so can world models, Albrecht says. “World models start from nothing, so initially the predictions from the models will be completely all over the place,” he says. It takes time until they get enough data to make them accurate.
In the future, Hafner says, it would be nice to teach the robot to understand spoken commands. Hafner says the team also wants to connect cameras to the robot dog to give it vision. This would allow it to navigate in complex indoor situations, such as walking to a room, finding objects, and—yes!—playing fetch.
This new data poisoning tool lets artists fight back against generative AI
The tool, called Nightshade, messes up training data in ways that could cause serious damage to image-generating AI models.
Rogue superintelligence and merging with machines: Inside the mind of OpenAI’s chief scientist
An exclusive conversation with Ilya Sutskever on his fears for the future of AI and why they’ve made him change the focus of his life’s work.
Driving companywide efficiencies with AI
Advanced AI and ML capabilities revolutionize how administrative and operations tasks are done.
Unpacking the hype around OpenAI’s rumored new Q* model
If OpenAI's new model can solve grade-school math, it could pave the way for more powerful systems.
Get the latest updates from
MIT Technology Review
Discover special offers, top stories, upcoming events, and more.