Facebook’s Artificial-Intelligence Software Gets a Dash More Common Sense

Software that knows if a stack of virtual blocks will fall could show the way to machines that learn the basics of physical reality.

Tom Simonite

November 3, 2015

Will this tower of virtual blocks fall over? Facebook has trained up software that knows the answer.

Facebook software has learned to accurately predict whether precariously balanced virtual blocks will fall down.

Artificial-intelligence researchers at the company took on the project in an effort to explore how computers might learn some basic physical common sense. Understanding that, for example, unsupported objects fall, or that a larger object won’t fit inside a smaller one, is central to the way we humans predict, explain, and communicate about the world. If machines are to be more useful, they’ll need the same kind of common-sense understanding, says Mike Schroepfer, Facebook’s chief technology officer.

“We’ve got to teach computer systems to understand the world in a similar way,” he said at a preview last week of results he will share today at the Web Summit in Dublin, Ireland.

Humans learn the basic physics of reality at a young age by playing and observing the world. Facebook drew on its image-processing software to create a system that learned to predict whether a stack of virtual blocks will tumble.

The software learned from images of virtual stacks like the one on this page, or sometimes from pairs of stereo images like those a pair of eyes would provide. In the learning phase, it was shown many different stacks, some that toppled and others that didn’t, and the simulation told the learning software which was which. After enough examples, it could predict for itself, with 90 percent accuracy, whether a particular stack was likely to tumble. “If you run through a series of tests, it will beat most people,” said Schroepfer.
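To make that training loop concrete, here is a toy sketch in Python. It is not Facebook's system, which learned from rendered images using deep learning. As loudly labeled assumptions, this version describes each stack by the sideways shift of each block rather than by an image, replaces the physics simulation with a simple center-of-mass rule, and uses a nearest-neighbor vote as the learner. The names `will_fall`, `make_examples`, `predict`, and `accuracy` are hypothetical, chosen only for illustration.

```python
import random

BLOCK_WIDTH = 1.0  # toy assumption: all blocks share one width and one mass


def will_fall(offsets):
    """Label a stack using a toy physics rule that stands in for the simulator.

    offsets[i] is the horizontal shift of block i+1 relative to the block
    below it. The stack falls if, at any level, the center of mass of
    everything above lies outside the supporting block.
    """
    # Absolute x-position of each block center; the bottom block sits at 0.
    centers = [0.0]
    for shift in offsets:
        centers.append(centers[-1] + shift)
    for level in range(len(centers) - 1):
        above = centers[level + 1:]
        center_of_mass = sum(above) / len(above)
        if abs(center_of_mass - centers[level]) >= BLOCK_WIDTH / 2:
            return True
    return False


def make_examples(n, rng, blocks=3, max_shift=0.8):
    """Generate n random stacks, each paired with its fall / no-fall label."""
    examples = []
    for _ in range(n):
        offsets = [rng.uniform(-max_shift, max_shift) for _ in range(blocks - 1)]
        examples.append((offsets, will_fall(offsets)))
    return examples


def predict(train, offsets, k=5):
    """Predict 'falls' if most of the k most similar training stacks fell."""
    def sq_dist(example):
        return sum((a - b) ** 2 for a, b in zip(example[0], offsets))

    nearest = sorted(train, key=sq_dist)[:k]
    votes = sum(1 for _, fell in nearest if fell)
    return votes * 2 > k


def accuracy(train, test):
    """Fraction of held-out stacks whose outcome is predicted correctly."""
    correct = sum(1 for offsets, fell in test if predict(train, offsets) == fell)
    return correct / len(test)


rng = random.Random(0)  # fixed seed so runs are repeatable
train_set = make_examples(1000, rng)  # stacks the learner gets to see
test_set = make_examples(200, rng)    # fresh stacks it has never seen
print(f"held-out accuracy: {accuracy(train_set, test_set):.2f}")
```

The point of the sketch is the division of labor described above: the simulator supplies the answers, and the learner only ever sees examples paired with those answers, then is scored on stacks it was not trained on.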

The research was done by Facebook’s artificial-intelligence research group in New York. The group is mostly focused on crafting software that can understand images and language using a technique known as deep learning. Yesterday Facebook’s group showed off a mobile app capable of answering questions about the content of photos.

Yann LeCun, director of the group and also a professor at NYU, told MIT Technology Review that the system for predicting when blocks will topple shows that more complex physical simulations might be used to teach more basic principles of physical common sense. “It serves to establish a baseline–if we were to train the system unsupervised, it has enough power to figure things out like that,” he said.

LeCun’s group previously developed a system called a “memory network” that can pick up some basic common sense and verbal reasoning skills by reading simple stories (see “Teaching Machines to Understand Us”).

The memory network has now graduated to helping power a virtual assistant Facebook is testing, called M (see “Facebook’s Cyborg Virtual Assistant Is Learning from Its Trainers”). M is much more capable than Apple’s Siri or similar apps because it is powered by a bank of human operators, but Facebook hopes they will gradually become less important as its software learns to field queries for itself.

Adding the memory network to M is showing how that might happen, says Schroepfer. By watching the interactions between people using M and the customer service agents responding, it has already learned how to handle some common queries.

For example, if someone asks for flowers to be delivered, the trained memory network knows the most important things to ask are “What’s your budget?” and “Where are you sending them?” The system now automatically offers to ask those two questions for a human agent at the click of a button, helping them respond more quickly. In the future, the memory network might be able to handle certain requests automatically, says Schroepfer.

Facebook has not committed to turning M into a widely available product, but Schroepfer says the result shows how it may be possible. “The system has figured this out just by watching the humans,” he said. “We cannot afford to hire operators for the entire world. But with the right AI system, we could deploy that for the entire planet.”
