A Robot That Learns to Use Tools

By shoving objects around on a table, UMan figures out how they work.

Kristina Grifantini

July 1, 2008

To assist humans around the house, robots will need to be able to deal with the unfamiliar. But while researchers can preprogram robots to do increasingly sophisticated tasks, they face a much bigger challenge in teaching them to adapt to unstructured environments. A robot developed at the University of Massachusetts Amherst, however, is able to learn to use objects that it has never encountered before.

**Tactile learner:** The UMan robot has wheels, a battery pack, a one-meter arm, and a three-fingered hand, which it uses to prod objects on a table in order to determine how they move.

The robot, called the UMass Mobile Manipulator, or UMan, pushes objects around on a table to see how they move. Once it identifies an object’s moving parts, it begins to experiment with it, manipulating it to perform tasks. “You can imagine a baby playing with a toy and pulling the different parts and seeing what moves how,” says lead author and graduate student Dov Katz, who did the work with Oliver Brock, a professor of computer science.

“One of the challenges in robotics is having [a robot] act intelligently, even when it doesn’t know the shape of the object,” says Andrew Ng, a computer scientist at Stanford University who works on robotic gripping.

“I think their work is an important step in this direction,” says Ng. “Previously, if someone wants a robot to use a pair of scissors, they will write a lot of software [defining] what scissors are and how the two blades move relative to each other. In contrast, Katz and Brock propose a completely new approach, where the robot plays with a pair of scissors by itself and figures out how the two blades are connected to each other.”

UMan uses a regular webcam to look down at a table from above. By analyzing differences between adjacent pixels, it guesses where an object’s edges might be found. Then it prods the object and, on the basis of how it moves, revises its estimate of the object’s shape (see video below). It continues shoving the object around, observing how its parts move in relation to each other. UMan will push the object backward and forward along its width and length and at a 45-degree angle to both, if necessary, until it’s satisfied that it understands how the object moves. Wherever the movement is restricted, the robot concludes that there’s a joint. UMan then uses that information to figure out the best way to manipulate the object. It can also tell if there are multiple joints, and how those relate to each other.
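The article does not describe UMan's software, so the following is only a rough sketch of the two ideas in the paragraph above, not Katz and Brock's implementation. It assumes a grayscale image stored as a list of rows and a handful of tracked points on the object; the function names `edge_map` and `group_rigid_parts`, and the `threshold` and `tolerance` parameters, are hypothetical. The first function flags a pixel as a possible edge when it differs sharply from its right or lower neighbor. The second groups points whose distances to one another do not change after a push, on the assumption that such points sit on the same rigid part, so that a boundary between groups suggests a joint.

```python
from itertools import combinations
from math import dist


def edge_map(image, threshold):
    """Flag pixels that differ from their right or lower neighbor by more than threshold."""
    rows, cols = len(image), len(image[0])
    edges = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            right = abs(image[r][c] - image[r][c + 1]) if c + 1 < cols else 0
            down = abs(image[r][c] - image[r + 1][c]) if r + 1 < rows else 0
            edges[r][c] = max(right, down) > threshold
    return edges


def group_rigid_parts(before, after, tolerance):
    """Group tracked points whose pairwise distances are unchanged by a push.

    before/after are lists of (x, y) positions for the same points.
    Returns a sorted list of groups, each a list of point indices.
    """
    parent = list(range(len(before)))

    def find(i):
        # Follow parent links to the group representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(before)), 2):
        change = abs(dist(before[i], before[j]) - dist(after[i], after[j]))
        if change <= tolerance:
            parent[find(i)] = find(j)  # same rigid part (assumed)

    groups = {}
    for i in range(len(before)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical scissors-like object: points 0-1 sit on a blade that stays put,
# points 2-3 sit on a blade that swings 90 degrees about the origin.
before = [(1.0, 0.0), (2.0, 0.0), (0.0, 1.0), (0.0, 2.0)]
after = [(1.0, 0.0), (2.0, 0.0), (-1.0, 0.0), (-2.0, 0.0)]
parts = group_rigid_parts(before, after, tolerance=1e-6)
```

Here `parts` comes out as two groups, one per blade. The sketch is deliberately naive: a tracked point lying exactly on the pivot keeps its distance to points on both blades and would merge the two groups, so a real system would need something more robust.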

Credit: Dov Katz

Katz says that his team was inspired by the work of Paul Fitzpatrick, a researcher at the LIRA-Lab at the University of Genoa, in Italy. In Fitzpatrick’s research, a robot tapped an object to distinguish it from its visual background. “What I like about the Amherst work, compared to my own, is that they are extracting a lot more information from essentially the same action,” says Fitzpatrick. This is “the robot equivalent of ‘fumbling around’ with an object, where you don’t really know enough about it to manipulate it dexterously.”

As of now, UMan is not equipped to pick up objects; instead, it manipulates them on the surface of the table. It has successfully learned how to manipulate scissors, shears, and several different kinds of wooden toys. A little shorter than the average human, it has a single arm that’s about a meter long. The arm’s seven degrees of freedom make it “very similar to a human arm in its flexibility,” according to Katz. The arm has a three-fingered hand and is mounted on a rotating base.

The researchers expect that UMan will soon be able to use past experience as a guide to handling new objects. In computer simulations, they’ve tested a learning algorithm for UMan, so that “the next time [it] sees a similar object, [it] can generalize and use the same action,” says Katz. For example, “you learn something about a pair of scissors, and next time you see a stapler you understand it has a similar structure.” In the simulations, the algorithm was able to identify joints by pushing objects in only one direction, as opposed to the six that UMan currently uses. But Katz hopes that eventually the robot won’t even need to touch a new object: it will generalize about it on the basis of visual observation alone. Katz expects to test the learning algorithm in the real world in the next year.
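The article never spells out the six directions, but the count matches the earlier description if each of the three axes (width, length, and the 45-degree diagonal) is pushed both backward and forward. A small sketch of that reading follows; the function name `push_directions`, the choice of which axis counts as width, and the unit-vector representation are all assumptions made for illustration.

```python
from math import cos, radians, sin


def push_directions(axis_angles_deg=(0, 90, 45)):
    """Return unit vectors for a forward and a backward push along each axis.

    The default angles stand for the object's width (0), its length (90),
    and the diagonal at 45 degrees to both. This mapping is assumed.
    """
    directions = []
    for angle in axis_angles_deg:
        x, y = cos(radians(angle)), sin(radians(angle))
        directions.append((round(x, 6), round(y, 6)))    # forward push
        directions.append((round(-x, 6), round(-y, 6)))  # backward push
    return directions


directions = push_directions()
```

Three axes with two pushes each give the six directions; the simulated learning algorithm Katz describes would need only one of them.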

“This work seems like a step toward a more humanlike, manipulation-sensing-perception process,” says Josh Smith, who works on sensing for robotic grasping at Intel. The UMass approach, Smith says, is “philosophically interesting in the way it combines manipulation with sensing and perception.”
