How AI taught Cassie the two-legged robot to run and jump

Reinforcement learning can help robots tackle new tasks they haven't tried before

Rhiannon Williams

March 18, 2024

Hybrid Robotics via YouTube

If you’ve watched Boston Dynamics’ slick videos of robots running, jumping and doing parkour, you might have the impression robots have learned to be amazingly agile. In fact, these robots are still coded by hand, and would struggle to deal with new obstacles they haven’t encountered before.

However, a new method of teaching robots to move could help them deal with new scenarios through trial and error, just as humans learn and adapt to unpredictable events.

Researchers used an AI technique called reinforcement learning to help a two-legged robot nicknamed Cassie run 400 meters over varying terrain and execute standing long jumps and high jumps, without being trained explicitly on each movement. Reinforcement learning works by rewarding or penalizing an AI as it tries to carry out an objective. In this case, the approach taught the robot to generalize and respond in new scenarios, rather than freezing as its predecessors might have done.
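To make the reward-and-penalty idea concrete, here is a minimal sketch in Python. It is not the researchers' method: it uses tabular Q-learning, a textbook form of reinforcement learning, on a made-up five-cell corridor where the agent earns a reward for reaching the rightmost cell and pays a small penalty for every other step. The function names, reward values, and learning settings are all assumptions chosen for illustration.

```python
import random


def train_q_table(n_states=5, episodes=500, alpha=0.5, gamma=0.9,
                  epsilon=0.2, seed=0):
    """Tabular Q-learning on a toy corridor; the goal is the rightmost cell.

    All settings here are illustrative assumptions, not values from the study.
    """
    rng = random.Random(seed)
    # q[state][action]: action 0 = step left, action 1 = step right
    q = [[0.0, 0.0] for _ in range(n_states)]
    goal = n_states - 1
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap the length of each episode
            # Explore occasionally, otherwise take the best-known action.
            if rng.random() < epsilon:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state = max(0, state - 1) if action == 0 else state + 1
            done = next_state == goal
            # Reward reaching the goal, penalize every other step slightly.
            reward = 1.0 if done else -0.01
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


def greedy_policy(q):
    """Best-known action for each non-goal cell."""
    return ["right" if right > left else "left" for left, right in q[:-1]]


if __name__ == "__main__":
    print(greedy_policy(train_q_table()))
```

Nothing in the code tells the agent to walk right. It settles on that behavior only because moving toward the goal ends up earning more reward than wandering. A legged robot faces a vastly larger version of the same problem, which is why a neural network takes the place of the small table used here.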

“We wanted to push the limits of robot agility,” says Zhongyu Li, a PhD student at University of California, Berkeley, who worked on the project, which has not yet been peer-reviewed. “The high-level goal was to teach the robot to learn how to do all kinds of dynamic motions the way a human does.”

The team trained Cassie in simulation, an approach that dramatically cuts the time it takes the robot to learn, from years to weeks, and enables it to perform those same skills in the real world without further fine-tuning.

First, they trained the neural network that controlled Cassie to master a simple skill from scratch, such as jumping on the spot, walking forward, or running forward without toppling over. The network was taught by being encouraged to mimic motions it was shown, which included motion capture data collected from a human and animations demonstrating the desired movement.

After the first stage was complete, the team presented the model with new commands encouraging the robot to perform tasks using its new movement skills. Once it became proficient at performing the new tasks in a simulated environment, they then diversified the tasks it had been trained on through a method called task randomization.
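The article does not spell out how task randomization was implemented, but the general idea can be sketched in a few lines of Python: rather than repeating one fixed command, each training episode draws a different skill and target at random. The skill names, command ranges, and the `Task`, `sample_task`, and `sample_curriculum` helpers below are hypothetical, chosen only to illustrate the idea.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    skill: str            # which movement skill to exercise
    target_speed: float   # commanded forward speed in m/s (0 for jumps)
    jump_distance: float  # commanded jump length in m (0 for walking/running)


# Hypothetical command ranges, not taken from the study.
SPEED_RANGES = {"walk": (0.0, 1.5), "run": (1.5, 4.0)}
JUMP_RANGE = (0.0, 1.5)


def sample_task(rng):
    """Draw one randomized task command."""
    skill = rng.choice(["walk", "run", "jump"])
    if skill == "jump":
        return Task(skill, 0.0, rng.uniform(*JUMP_RANGE))
    low, high = SPEED_RANGES[skill]
    return Task(skill, rng.uniform(low, high), 0.0)


def sample_curriculum(n_episodes, seed=0):
    """One randomized task per simulated training episode."""
    rng = random.Random(seed)
    return [sample_task(rng) for _ in range(n_episodes)]


if __name__ == "__main__":
    for task in sample_curriculum(5):
        print(task)
```

Because the controller rarely sees the same command twice, it has to learn behavior that holds up across the whole range of commands instead of memorizing a single motion.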

This leaves the robot much better prepared for unexpected scenarios. For example, the robot was able to maintain a steady running gait while being pulled sideways by a leash. “We allowed the robot to utilize the history of what it’s observed and adapt quickly to the real world,” says Li.

Cassie completed a 400-meter run in two minutes and 34 seconds, then jumped 1.4 meters in the long jump without needing additional training.

The researchers are now planning on studying how this kind of technique could be used to train robots equipped with on-board cameras. This will be more challenging than completing actions blind, adds Alan Fern, a professor of computer science at Oregon State University who helped to develop the Cassie robot but was not involved with this project.

“The next major step for the field is humanoid robots that do real work, plan out activities, and actually interact with the physical world in ways that are not just interactions between feet and the ground,” he says.