Deep-Learning Robot Takes 10 Days to Teach Itself to Grasp

Leave a human baby with some toys and it’ll quickly learn to pick them up. Now a robot with deep-learning capabilities has done the same thing.

Emerging Technology from the arXivarchive page

October 5, 2015

One of the goals of general purpose robots is to interact in an intelligent way with everyday objects. But robotic grasping skills are embarrassingly poor. Ask a robot to pick up a TV remote or a bottle of water or a toy gun and it will endlessly fumble with it—unless specifically programmed to pick up that object in a specially controlled environment.

That’s in stark contrast to human grasping capabilities. A human baby quickly learns to grasp such objects, even in the most cluttered and unstructured environments.

And therein lies an important clue. Could robots learnt to grasp like babies, by repeated trial and error?

Today, Lerrel Pinto and Abhinav Gupta at Carnegie Mellon University in Pittsburgh show how this is possible. These guys have equipped a robot called Baxter with powerful deep learning capabilities, placed a table full of ordinary objects in front of it and then left it to learn, like a baby playing in a high chair.

Baxter is a modern two-armed industrial robot designed to perform repeatable tasks in environments such as factory floors. Each arm has a standard two-fingered parallel gripper and a high resolution camera to allow the robot to see what it is grasping close up. It also has a Microsoft Kinect sensor to provide an overview of the table in front of it.

Pinto and Gupta programmed Baxter to grasp an object by isolating it from its neighbors, then to pick a random point in that area, rotate the grippers to a certain angle and then move the grippers down vertically to attempt a grasp. The robot then lifts its arm and determines whether the grasp has been successful using force sensors. For each point, it repeats the grasping process 188 times, each time after rotating the gripping angle by 10 degrees.

To allow the robot to learn, Pinto and Gupta placed a variety of objects on the table in front of Baxter, and simply left it for up to 10 hours a day, without human intervention. If the robot dropped an object on the floor, there were plenty of others it could continue with.

Baxter’s deep learning approach was fairly standard too. The researchers gave Baxter a conventional neural network that had been pretrained in object recognition, thereby giving Baxter some essential skills to start with. However, two layers of this network were devoted to learning to grasp in these random experiments.

The team then used a second stage of learning to improve Baxter’s skills. Having picked up some basics, the team used these on a new set of objects, some of which Baxter had already seen and others that were entirely novel.

Over 700 hours, Baxter performed some 50,000 grasps on 150 different objects, each time learning whether the approach was successful or not. These objects included a TV remote, a variety of plastic toys such as guns, cars, and so on, and other similarly sized objects. That produced a significant body of learning which allowed Baxter to predict whether or not a grasp would be successful almost 80 percent of the time.

To find out just how good it was, Pinto and Gupta compared Baxter to a number of other approaches. For example, one of these was a common sense approach in which the robot was programmed to pick up an object near its center, at its narrowest point while avoiding areas that were too narrow to grip (since the grippers could not close completely).

This lead to a grasping accuracy of only 62 percent, while other approaches were only a little better. None matched Baxter’s performance.

That’s interesting work that could have an important impact on the way robots like Baxter interact with the world. A key part of this approach is that it occurred in a cluttered, relatively unstructured environment like the ones humans cope with easily. What’s more, the grasping skills were more or less entirely self-taught.

Of course, Baxter and its convolutional neural network have a long way to go before they can match the grasping skills of a young child. An important skill to learn next will be how hard to grip, something that is important when handling delicate objects without damaging them.

Perhaps the ultimate test for Baxter will be the toothpaste challenge—the successful placement of a pea-sized blob of toothpaste on a toothbrush. That’s something humans learn at a young age (while mysteriously forgetting it again during their teenage years).

But at the rate robots like Baxter are learning, it won’t be long before humans lose their mastery of toothpaste world, just as they lost their mastery of the chess world almost 20 years ago.

Ref: arxiv.org/abs/1509.06825 : Supersizing Self-Supervision: Learning to Grasp from 50K Tries and 700 Robot Hours

Keep Reading

Large language models can do jaw-dropping things. But nobody knows exactly why.

And that's a problem. Figuring it out is one of the biggest scientific puzzles of our time and a crucial step towards controlling more powerful future models.