An AI-driven robot hand spent a hundred years teaching itself to rotate a cube

A reinforcement-learning algorithm allows Dactyl to learn physical tasks by practicing them in a simulated environment.

Will Knight

July 30, 2018

AI researchers have demonstrated a self-teaching algorithm that gives a robot hand remarkable new dexterity. Their creation taught itself to manipulate a cube with uncanny skill by practicing for the equivalent of a hundred years inside a computer simulation (though only a few days in real time).

The robotic hand is still nowhere near as agile as a human one, and far too clumsy to be deployed in a factory or a warehouse. Even so, the research shows the potential for machine learning to unlock new robotic capabilities. It also suggests that someday robots might teach themselves new skills inside virtual worlds, which could greatly speed up the process of programming or training them.

The robotic system, dubbed Dactyl, was developed by researchers at OpenAI, a nonprofit based in Silicon Valley. It uses an off-the-shelf robotic hand from a UK company called Shadow, an ordinary camera, and an algorithm that’s already mastered a sprawling multiplayer video game, DotA, using the same self-teaching approach (see “A team of AI algorithms just crushed humans in a complex computer game”).

The algorithm uses a machine-learning technique known as reinforcement learning. Dactyl was given the task of maneuvering a cube so that a different face was upturned. It was left to figure out, through trial and error, which movements would produce the desired results.
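
To make the trial-and-error idea concrete, here is a minimal sketch in Python of tabular Q-learning, one of the simplest reinforcement-learning methods, on a toy problem invented for illustration: a cube that can only be rolled 90 degrees north, south, east or west, with a reward only when a chosen face ends up on top. This is not OpenAI's method (Dactyl trained a much larger neural-network policy inside a physics simulator); the state encoding, the reward, and names such as `train`, `TARGET_FACE` and `greedy_rollout` are all hypothetical.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical toy state: which face label (0-5) sits at
# (top, bottom, north, south, east, west).
State = Tuple[int, ...]

# Each 90-degree roll is a permutation of positions: new[i] = old[perm[i]].
ROLLS: List[Tuple[int, ...]] = [
    (3, 2, 0, 1, 4, 5),  # roll north: the south face comes up
    (2, 3, 1, 0, 4, 5),  # roll south: the north face comes up
    (5, 4, 2, 3, 0, 1),  # roll east: the west face comes up
    (4, 5, 2, 3, 1, 0),  # roll west: the east face comes up
]
START: State = (0, 1, 2, 3, 4, 5)
TARGET_FACE = 1  # hypothetical goal: bring face 1 (initially on the bottom) to the top


def roll(state: State, action: int) -> State:
    return tuple(state[i] for i in ROLLS[action])


def step(state: State, action: int) -> Tuple[State, float, bool]:
    """Apply one roll; the only reward is 1.0 when the target face lands on top."""
    nxt = roll(state, action)
    done = nxt[0] == TARGET_FACE
    return nxt, (1.0 if done else 0.0), done


def train(episodes: int = 5000, alpha: float = 0.5, gamma: float = 0.9,
          epsilon: float = 0.2, seed: int = 0) -> Dict[Tuple[State, int], float]:
    """Learn action values purely from trial and error (tabular Q-learning)."""
    rng = random.Random(seed)
    q: Dict[Tuple[State, int], float] = {}
    for _ in range(episodes):
        state = START
        for _ in range(10):  # scramble the cube to get a random starting orientation
            state = roll(state, rng.randrange(4))
        for _ in range(20):
            if state[0] == TARGET_FACE:
                break
            # Epsilon-greedy: usually pick the best-known roll, sometimes experiment.
            if rng.random() < epsilon:
                action = rng.randrange(4)
            else:
                action = max(range(4), key=lambda a: q.get((state, a), 0.0))
            nxt, reward, done = step(state, action)
            best_next = 0.0 if done else max(q.get((nxt, a), 0.0) for a in range(4))
            old = q.get((state, action), 0.0)
            # Nudge the estimate toward reward + discounted value of what follows.
            q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
            state = nxt
    return q


def greedy_rollout(q: Dict[Tuple[State, int], float], state: State,
                   max_steps: int = 10) -> int:
    """Count the rolls the learned policy needs to reach the goal (-1 if it fails)."""
    for n in range(max_steps + 1):
        if state[0] == TARGET_FACE:
            return n
        action = max(range(4), key=lambda a: q.get((state, a), 0.0))
        state = roll(state, action)
    return -1
```

After training, `greedy_rollout(train(), START)` should return 2: the target face starts on the bottom, so one roll brings it to a side and a second brings it to the top. Nothing in the code says which rolls to make; the agent settles on them by trying moves and keeping those that led to reward.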

Videos of Dactyl show it rotating the cube with impressive agility, and it automatically figured out several grips that humans commonly use. But the research also showed how far AI still has to go: the robot was able to manipulate the cube successfully just 13 out of 50 times, and only after the equivalent of a hundred years of virtual training, far more practice than a human child needs.

“It is not going to fit into an industrial workflow any time soon,” says Rodney Brooks, a professor emeritus at MIT and the founder of Rethink Robotics, a startup that makes more intelligent industrial robots. “But that is fine—research is a good thing to do.”

Reinforcement learning is inspired by the way animals seem to learn through positive feedback. It was first proposed decades ago, but it has only proved practical in recent years thanks to advances involving artificial neural networks (see “10 breakthrough technologies 2017: Reinforcement learning”). The Alphabet subsidiary DeepMind used reinforcement learning to create AlphaGo, a computer program that taught itself to play the fiendishly complex and subtle board game Go with superhuman skill.

Other robotics researchers have been testing the approach for a while but have been hamstrung by the difficulty of mimicking the real world’s complexity and unpredictability. The OpenAI researchers got around this by introducing random variations in their virtual world, so that the robot could learn to account for nuisances like friction, noise in the robot’s hardware, and moments when the cube is partly hidden from view.
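
A rough sketch of that randomization idea in Python follows. The parameter names and ranges are made up for illustration and are not OpenAI's settings: each practice episode draws a fresh set of physics and sensor conditions, commands to the hand pass through actuator noise, and cube-pose readings pick up noise or occasionally drop out, standing in for moments when the cube is hidden from view.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class SimParams:
    friction: float            # contact-friction coefficient between fingers and cube
    actuator_noise_std: float  # std of Gaussian noise added to joint commands
    obs_noise_std: float       # std of Gaussian noise added to cube-pose readings
    occlusion_prob: float      # chance the cube pose is unavailable on a given step


# Illustrative ranges only (hypothetical, not taken from the Dactyl work).
RANGES: Dict[str, Tuple[float, float]] = {
    "friction": (0.5, 1.5),
    "actuator_noise_std": (0.0, 0.05),
    "obs_noise_std": (0.0, 0.01),
    "occlusion_prob": (0.0, 0.2),
}


def sample_params(rng: random.Random) -> SimParams:
    """Draw a fresh 'world' for one practice episode, uniformly within each range."""
    return SimParams(**{name: rng.uniform(lo, hi) for name, (lo, hi) in RANGES.items()})


def perturb_action(joint_targets: List[float], params: SimParams,
                   rng: random.Random) -> List[float]:
    """Imperfect hardware: the hand does not move exactly as commanded."""
    return [q + rng.gauss(0.0, params.actuator_noise_std) for q in joint_targets]


def observe(true_pose: List[float], params: SimParams,
            rng: random.Random) -> Optional[List[float]]:
    """Noisy reading of the cube pose, or None when the cube is 'hidden'."""
    if rng.random() < params.occlusion_prob:
        return None
    return [x + rng.gauss(0.0, params.obs_noise_std) for x in true_pose]
```

A training loop would call `sample_params` at the start of every episode, so the policy never practices in exactly the same world twice and cannot lean on one particular set of physics that the real hand may not match.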

Alex Ray, one of the engineers behind the robot, says Dactyl could be improved by giving it more processing power and introducing more randomization. “I don’t think we’ve yet hit the limit,” he says. Ray adds that there’s no plan to try to commercialize the technology. His team is focused purely on developing the most powerful generalized learning approaches possible.

“This is hard to do well,” says Dmitry Berenson, a roboticist at the University of Michigan who specializes in machine manipulation. Berenson says it isn’t exactly clear how far the latest machine-learning approaches will take us. “There’s a lot of human effort involved with coming up with the right network for a specific task,” he says. But he believes simulated learning could prove very useful: “If we can reliably cross the ‘reality gap,’ it makes learning exponentially easier.”
