Robots Learn to Make Pancakes from WikiHow Articles

Researchers at a European project are teaching robots to use written text to learn how to perform tasks.

Will Knight

August 24, 2015

If you’ve ever needed to know how to tie a bowtie or fix a strawberry daiquiri, you likely ended up on a website like WikiHow for step-by-step instructions. Surprisingly, some robots are now doing the same.

A robot called PR2 removes the top from a bottle while working in a simulated lab setting.

A robot called PR2 in Germany is learning to prepare pancakes and pizzas by carefully reading through WikiHow’s written directions. It’s part of a European project called RoboHow, which is exploring ways of teaching robots to understand language. This could make it easier for people to communicate instructions to robots and provide a way for machines to figure out how to perform unfamiliar tasks. Instead of programming a robot to perform precise movements, the goal is for a person to simply tell a robot what to do.

Teaching robots how to turn high-level descriptions into specific actions is an important but challenging task. It is straightforward for humans because we have an understanding of all sorts of basic tasks, collected over a lifetime. A human does not need to be told the specific grasp needed to remove the top from a jar of tomato sauce, for instance, or that flipping a pancake involves using a spatula or some other kitchen utensil.

So the researchers behind the RoboHow project want to teach robots the general knowledge required to turn high-level instructions into specific actions. They have so far been able to convert a few WikiHow instructions into useful behavior, both in simulations and in real robots.
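
To make the idea concrete, here is a minimal sketch of how background knowledge might fill the gaps in a terse instruction such as "flip the pancake." It is an illustration only, not the RoboHow project's code: the table `ACTION_DEFAULTS`, the function `expand_instruction`, and every step name are hypothetical.

```python
# Hypothetical background knowledge: what each action implicitly requires.
# None of these names come from the RoboHow project; they are assumptions
# made purely for illustration.
ACTION_DEFAULTS = {
    "flip": {"tool": "spatula", "motion": "slide_under_then_rotate"},
    "pour": {"tool": None, "motion": "tilt_container"},
    "open": {"tool": None, "motion": "twist_lid"},
}


def expand_instruction(verb, target):
    """Turn a (verb, object) pair into an ordered list of concrete steps."""
    defaults = ACTION_DEFAULTS.get(verb)
    if defaults is None:
        # The robot has no common-sense entry for this action.
        raise KeyError(f"no background knowledge for action: {verb!r}")

    steps = []
    tool = defaults["tool"]
    if tool is not None:
        # The instruction never mentions the tool; the knowledge base does.
        steps.append(("grasp", tool))
    steps.append((defaults["motion"], target))
    if tool is not None:
        steps.append(("put_down", tool))
    return steps


print(expand_instruction("flip", "pancake"))
# [('grasp', 'spatula'), ('slide_under_then_rotate', 'pancake'), ('put_down', 'spatula')]
```

The written instruction names only the verb and the object. The spatula and the motion come from the lookup table, which stands in for the general knowledge a person collects over a lifetime.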

Achieving more could prove very useful as robots become more commonplace and need to work more closely with people. “If you have a robot in a factory, you want to say ‘Take the screw and put it into the nut and fasten the nut,’” says Michael Beetz, head of the Artificial Intelligence Institute at the University of Bremen in northern Germany, where the RoboHow project is based. “You want the robot to generate the parameters automatically out of the semantic description of objects.”

In one set of experiments, the researchers are teaching PR2 robots to perform simple lab tasks, such as handling chemicals.

Once a robot has learned how a particular set of instructions relates to a task, its knowledge is added to an online database called Open Ease, so that other robots can access that understanding. These instructions are encoded in machine-readable language similar to the one used in the Semantic Web project.
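
Semantic Web data is typically expressed as subject-predicate-object statements, often called triples. The sketch below shows roughly what storing and querying task knowledge in that style could look like. The facts, the predicate names, and the `query` helper are all hypothetical, and the database's actual format may differ.

```python
# Hypothetical facts written as (subject, predicate, object) triples.
# The vocabulary is invented for illustration and is not taken from
# the researchers' database.
TRIPLES = {
    ("FlipPancake", "isA", "Action"),
    ("FlipPancake", "usesTool", "Spatula"),
    ("FlipPancake", "actsOn", "Pancake"),
    ("Spatula", "isA", "KitchenUtensil"),
}


def query(triples, subject=None, predicate=None, obj=None):
    """Return the triples matching a pattern, sorted; None is a wildcard."""
    return sorted(
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )


# Which tool does flipping a pancake require?
print(query(TRIPLES, subject="FlipPancake", predicate="usesTool"))
# [('FlipPancake', 'usesTool', 'Spatula')]
```

Because every fact has the same three-part shape, a second robot could in principle ask the same question of a shared store without knowing how the first robot learned the answer.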

The researchers are using other techniques to help robots learn to perform basic tasks. These include watching videos of humans performing those tasks and studying virtual-reality data recorded while humans performed tasks wearing gloves that allow their actions to be tracked.

Even simple manipulation remains a challenge for robots, although many researchers, including those at Amazon, are pushing to develop better robot grasping (see “Help Wanted: Robot to Fulfill Amazon Orders”). Natural language processing is also very challenging, but progress is being made here, too (see “Teaching Machines to Understand Us”).

Siddhartha Srinivasa, a professor at the Robotics Institute at Carnegie Mellon University, says connecting language with action is hugely important but also very difficult. “I have a four-year-old and often face disaster when I try to instruct him to assemble a toy,” Srinivasa says. “Succeeding in this domain will require a tight integration of natural language, grounding the understanding via perception, and planning complex actions via manipulation algorithms.”
