Over a year ago, OpenAI, the San Francisco–based for-profit AI research lab, announced that it had trained a robotic hand to manipulate a cube with remarkable dexterity.
That might not sound earth-shattering. But in the AI world, it was impressive for two reasons. First, the hand had taught itself how to fidget with the cube using a reinforcement-learning algorithm, a technique modeled on the way animals learn. Second, all the training had been done in simulation, yet the learned behavior transferred successfully to the real world. In both ways, it was an important step toward more agile robots for industrial and consumer applications.
“I was kind of amazed,” says Leslie Kaelbling, a roboticist and professor at MIT, of the 2018 results. “It’s not a thing I would have imagined that they could have made to work.”
In a new paper today, OpenAI has released the latest results with its robotic hand, Dactyl. This time Dactyl has learned to solve a Rubik’s cube with one hand—once again through reinforcement learning in simulation. This is notable not so much because a robot cracked the old puzzle as because the achievement took a new level of dexterity.
“This is a really hard problem,” says Dmitry Berenson, a roboticist at the University of Michigan who specializes in machine manipulation. “The kind of manipulation required to rotate the Rubik’s cube’s parts is actually much harder than to rotate a cube.”
Traditionally, robots have only been able to manipulate objects in very simple ways. While reinforcement-learning algorithms have seen great success in achieving complex tasks in software, such as beating the best human player in the ancient game of Go, using them to train a physical machine has been a different story. That’s because the algorithms must refine themselves through trial and error—in many cases, millions of rounds of it. It would probably take much too long, and a lot of wear and tear, for a physical robot to do this in the real world. It could even be dangerous if the robot thrashed about wildly to collect data.
To avoid this, roboticists use simulation: they build a virtual model of their robot and train it virtually to do the task at hand. The algorithm learns in the safety of the digital space and can be ported into a physical robot afterwards. But that process comes with its own challenges. It’s nearly impossible to build a virtual model that exactly replicates all the same laws of physics, material properties, and manipulation behaviors seen in the real world—let alone unexpected circumstances. Thus, the more complex the robot and task, the more difficult it is to apply a virtually trained algorithm in physical reality.
This is what impressed Kaelbling about OpenAI’s results a year ago. The key to its success was that the lab scrambled the simulated conditions in every round of training to make the algorithm more adaptable to different possibilities.
“They messed their simulator up in all kinds of crazy ways,” Kaelbling says. “Not only did they change how much gravity there is—they changed which way gravity points. So by trying to construct a strategy that worked reliably with all of these crazy permutations of the simulation, the algorithm actually ended up working in the real robot.”
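The scrambling Kaelbling describes can be illustrated with a minimal Python sketch. It is not OpenAI's code: the `SimParams` fields, the `sample_sim_params` helper, and every numeric range are assumptions chosen only to show the idea of drawing new physics, including the direction of gravity, for each training round.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class SimParams:
    """Hypothetical physics settings for one simulated training round."""
    gravity_magnitude: float  # m/s^2
    gravity_direction: tuple  # unit vector (x, y, z)
    friction: float           # unitless coefficient
    cube_mass: float          # kg


def random_unit_vector(rng):
    """Return a uniformly random direction in 3D as a unit vector."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 1e-9:  # retry in the (very unlikely) degenerate case
            return tuple(c / norm for c in v)


def sample_sim_params(rng):
    """Draw a fresh, scrambled set of physics parameters.

    All ranges below are illustrative assumptions, not published values.
    """
    return SimParams(
        gravity_magnitude=rng.uniform(0.5 * 9.81, 1.5 * 9.81),
        gravity_direction=random_unit_vector(rng),
        friction=rng.uniform(0.5, 1.5),
        cube_mass=rng.uniform(0.05, 0.15),
    )


if __name__ == "__main__":
    rng = random.Random(0)
    for episode in range(3):
        # A real training loop would rebuild the simulator with these
        # parameters and run one round of reinforcement learning here.
        print(episode, sample_sim_params(rng))
```

A policy that has to succeed under every draw cannot lean on any single version of the physics, which is the intuition behind why it can survive the move to a real robot.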
In the latest paper, OpenAI takes this technique one step further. Previously, the researchers had to randomize the parameters in the environment by hand-picking which permutations they thought would lead to a better algorithm. Now the training system does this by itself. Each time the robot reaches a certain level of mastery in the existing environment, the simulator tweaks its own parameters to make the training conditions even harder.
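A toy Python sketch of that feedback loop follows. It is an illustration under stated assumptions, not the method from the paper: the `AutoCurriculum` class, its success-rate threshold, its window size, and the fixed widening step are all hypothetical.

```python
import random
from collections import deque


class AutoCurriculum:
    """Widen randomization ranges whenever recent performance is good enough.

    Hypothetical sketch: each parameter is sampled within +/- spread
    (as a fraction) of its nominal value, and the spread starts at zero.
    """

    def __init__(self, nominal, step=0.1, threshold=0.8, window=20):
        self.nominal = dict(nominal)
        self.spread = {name: 0.0 for name in nominal}
        self.step = step            # how much to widen after mastery
        self.threshold = threshold  # success rate that counts as mastery
        self.recent = deque(maxlen=window)

    def sample(self, rng):
        """Draw parameters for the next training round."""
        return {
            name: value * (1.0 + rng.uniform(-self.spread[name], self.spread[name]))
            for name, value in self.nominal.items()
        }

    def record(self, success):
        """Log one outcome; return True if the environment just got harder."""
        self.recent.append(bool(success))
        window_full = len(self.recent) == self.recent.maxlen
        if window_full and sum(self.recent) / len(self.recent) >= self.threshold:
            for name in self.spread:
                self.spread[name] += self.step
            self.recent.clear()  # measure mastery afresh at the new difficulty
            return True
        return False


if __name__ == "__main__":
    rng = random.Random(0)
    curriculum = AutoCurriculum({"friction": 1.0, "cube_mass": 0.1}, window=5)
    for round_index in range(15):
        params = curriculum.sample(rng)
        succeeded = True  # stand-in for running the policy in simulation
        if curriculum.record(succeeded):
            print("round", round_index, "spread is now", curriculum.spread)
```

Starting from zero spread means the policy first learns in a single fixed environment and only meets wider variation once it has earned it.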
The result is an even more robust algorithm that can move at the precision required to rotate a Rubik’s cube in real life. Through testing, the researchers found that Dactyl also successfully solved the cube under various conditions that it hadn’t been trained on. For example, it was able to complete the task while wearing a rubber glove, while having a few fingers bound together, and while being prodded by a stuffed toy giraffe.
OpenAI believes the latest results provide strong evidence that its approach will unlock more general-purpose robots that can adapt in open-ended environments such as a home kitchen. “A Rubik’s cube is one of the most complicated rigid objects out there,” says Marcin Andrychowicz of OpenAI. “I think other objects won’t be much more complicated.”
Though there are more complex tasks that involve more objects or deformable objects, he says, he feels confident that the lab’s method can train robots for all of them: “I think this approach is the approach to widespread adoption of robotics.”
Both Berenson and Kaelbling, however, remain skeptical. “There can be an impression that there’s one unified theory or system, and now OpenAI’s just applying it to this task and that task,” Berenson says of the previous and current papers. “But that’s not what’s happening at all. These are isolated tasks. There are common components, but there’s also a huge amount of engineering here to make each new task work.”
“That’s why I feel a little bit uncomfortable with the claims about this leading to general-purpose robots,” he says. “I see this as a very specific system meant for a specific application.”
Part of the problem, Berenson believes, is reinforcement learning itself. By nature, the technique is designed to master one particular thing, with some flexibility for handling variations. But in the real world, the number of potential variations extends beyond what can reasonably be simulated. In a cleaning task, for example, you could have different kinds of mops, different kinds of spills, and different kinds of floors.
Reinforcement learning is also designed for learning new capabilities largely from scratch. That is neither efficient in robotics nor true to how humans learn. “If you’re already a reasonably competent human and I tried to teach you a motor skill in the kitchen—like maybe you’ve never whipped something with a spoon—it’s not like you have to learn your whole motor control over again,” says Kaelbling.
Moving beyond these limitations, Berenson argues, will require other, more traditional robotics techniques. “There will be some learning processes—probably reinforcement learning—at the end of the day,” he says. “But I think that those actually should come much later.”