Your Robotic Personal Assistant

New software lets robots pick up objects they have never seen before–an important step toward creating multifunctional domestic helpers.
November 28, 2007

Aside from the Roomba, robots haven’t made much progress infiltrating American homes. But researchers at Stanford University have developed software that overcomes one of the biggest challenges: teaching a robot how to pick up an object it has never encountered before. The software works on the premise that the best way to pick up something new is to find the most graspable part of the object–the stem of a wineglass, the handle of a mug, or the edge of a book, for instance.

Pick it up: Stanford researchers have designed software that helps a robot grab objects that it has never seen before. The hardware sits on a Segway wheel base and includes two lasers for navigation, a robotic arm for grasping, speakers, cameras, and a microphone.

Engineers and science-fiction fans have long dreamed of putting robotics in the home, says Andrew Ng, professor of computer science at Stanford. In fact, the robotic hardware that exists today could allow a robot to do the complex tasks that are required to pick up objects, keep a house clean, and so on. But the missing piece, Ng explains, is software that can allow robots to do these things by themselves. A dexterous robot with the smarts to pick up new objects without being specifically programmed to do so could be useful for complex domestic tasks such as feeding the pets and loading the dishwasher.

Some robots can pick up specific objects, even on a cluttered table, but they do so with the help of preprogrammed three-dimensional models of those objects, says Aaron Edsinger, founder of Meka Robotics, a startup in San Francisco. “But this assumes that we’re going to be able to know ahead of time what objects are out there,” he says. That assumption might hold in a carefully constructed nursing home, for instance, but not in a busy family’s apartment or house.

Instead of using predetermined models of objects, some roboticists, including Edsinger and Ng, are building perception systems for robots that look for certain features on objects that are good for grasping. The Stanford team has approached the problem by collecting a number of previously fragmented technologies, says Ng, such as computer vision, machine learning, speech recognition, and grasping hardware, and putting them together in a robot called STAIR (Stanford Artificial Intelligence Robot).

STAIR’s hardware consists of a mobile robotic arm with a microphone, a speaker, sensors, and cameras that help the arm retrieve objects. The robot’s software has its foundation in machine-learning algorithms that can be trained to perform certain functions. The researchers trained the software using 2,500 pictures of objects, with graspable regions identified.
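
To make the idea of training on labeled pictures concrete, here is a minimal sketch of supervised learning on image patches. It is not the Stanford team's code: the article does not say which learning algorithm or image features were used, so the logistic-regression classifier, the two features (`edge_strength` and `elongation`), and the six toy data points are all assumptions chosen for illustration.

```python
import math

def predict(w, b, x):
    """Return the estimated probability that patch x is graspable."""
    z = b + sum(wj * xj for wj, xj in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(samples, labels, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_features = len(samples[0])
    n = len(samples)
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(samples, labels):
            err = predict(w, b, x) - y  # gradient of log loss w.r.t. z
            for j in range(n_features):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

# Hypothetical training set: each patch is (edge_strength, elongation),
# both scaled to [0, 1]; label 1 means a person marked it as graspable.
patches = [(0.9, 0.8), (0.8, 0.9), (0.7, 0.7),
           (0.2, 0.1), (0.1, 0.3), (0.3, 0.2)]
labels = [1, 1, 1, 0, 0, 0]

w, b = train_logistic(patches, labels)

# Score two patches the classifier has never seen.
handle_like = predict(w, b, (0.85, 0.85))
flat_surface = predict(w, b, (0.15, 0.15))
print(f"handle-like patch: {handle_like:.2f}, flat patch: {flat_surface:.2f}")
```

The point of the sketch is that the classifier scores unseen patches by their features rather than by recognizing a particular object, which is what lets a trained system generalize to objects outside its training set.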

But making the leap from two-dimensional pictures to a three-dimensional world was a challenge, says Ng. Typically, a robot can create a 3-D view of its environment–so it knows how far away the coffeepot is from its hand–using the input from two cameras. This distance is usually determined by collecting a large number of points on an object with the right and left cameras, and then triangulating all the data to build a 3-D model. This process takes a lot of computing power and time, however.
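
For readers who want the geometry behind that triangulation, the standard textbook relation for two parallel, calibrated cameras is given below. The symbols are not from the article and are introduced here as assumptions: $f$ is the focal length in pixels, $B$ is the distance between the two cameras, and $x_L$ and $x_R$ are the horizontal pixel positions of the same point in the left and right images.

```latex
d = x_L - x_R, \qquad Z = \frac{f\,B}{d}
```

Here $d$ is the disparity and $Z$ is the distance from the cameras to the point. A dense 3-D model repeats this calculation for every matched pair of points, and finding those matches across the two images is where much of the computing time goes.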

Ng’s team developed an alternative that simplifies the process. Instead of collecting data about lots of points on an object, the researchers’ algorithm identifies the midpoint of a graspable portion of an object, such as a handle, by calculating the edges of an object and comparing this with the edges of statistically similar objects in the database. The software matches this point using both cameras and triangulates the distance. “This was the key idea that made all of our grasping things work,” Ng says. “We’ve now done things like load items from a dishwasher.”
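
The final step described above, triangulating one matched point rather than a whole point cloud, can be sketched in a few lines. This is an illustration under stated assumptions, not STAIR's implementation: it assumes two parallel, calibrated cameras, and the function name `triangulate_point` and every camera parameter below (focal length, baseline, image center) are hypothetical values picked for the example.

```python
def triangulate_point(x_left, x_right, y, focal_px, baseline_m, cx, cy):
    """Recover the 3-D position of one point seen by two parallel cameras.

    x_left, x_right: horizontal pixel position of the point in each image
    y:               vertical pixel position (same row in both images)
    focal_px:        focal length in pixels
    baseline_m:      distance between the two cameras in meters
    cx, cy:          pixel coordinates of the image center
    Returns (X, Y, Z) in meters, relative to the left camera.
    """
    disparity = x_left - x_right
    if disparity <= 0:
        raise ValueError("point must have positive disparity")
    z = focal_px * baseline_m / disparity   # depth from disparity
    x = (x_left - cx) * z / focal_px        # back-project to 3-D
    y3d = (y - cy) * z / focal_px
    return (x, y3d, z)

# Hypothetical example: a predicted grasp point (say, the middle of a mug
# handle) found at column 345 in the left image and 295 in the right.
grasp_xyz = triangulate_point(
    x_left=345.0, x_right=295.0, y=240.0,
    focal_px=500.0, baseline_m=0.1, cx=320.0, cy=240.0,
)
print(grasp_xyz)  # roughly (0.05, 0.0, 1.0): about one meter away
```

Because only one point per grasp needs to be matched between the two images, the cost of the calculation no longer grows with the number of points on the object.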

Robots still need to learn the finer points of automatic manipulation, Ng adds. STAIR was designed only to grasp objects, and not to adjust its grasp depending on the situation. For instance, it wasn’t built to pour coffee from a pot–a task that might require a different grasp position and a different amount of pressure than simply picking up the pot and placing it on a shelf. Additionally, the software doesn’t know the consistency of the object–whether it’s squishy or solid. But researchers are working on these problems, and ultimately, a personal robot will have a combination of sensing technologies and different software that will allow it to pick up and manipulate an object. (See “Robots That Sense Before They Touch.”)

It could be years before all the technologies are integrated well enough for robots to handle complex household chores on their own, but the Stanford work is pushing the dream forward. “If I had to pick one thing that’s holding back this vision of personal robotics, it would be the ability to pick things up and manipulate them,” says Josh Smith, senior research scientist at Intel Research, in Seattle. “We need more grasping strategies, like [the Stanford researchers’], that don’t require an explicit 3-D model of the object.” He adds that, beyond improved computer-vision techniques, the robot’s hand itself will most likely carry a number of sensors that can feel if an object is moving or if the grasp isn’t right. “Much richer sensing in the hand will be an important part of the solution,” Smith says.
