“Common sense” is the catch-all term for this kind of intuitive reasoning. It includes a grasp of simple physics: for example, knowing that the world is three-dimensional and that objects don’t actually disappear when they go out of view. It lets us predict where a bouncing ball or a speeding bike will be in a few seconds’ time. And it helps us join the dots between incomplete pieces of information: if we hear a metallic crash from the kitchen, we can make an educated guess that someone has dropped a pan, because we know what kinds of objects make that noise and when they make it.

In short, common sense tells us what events are possible and impossible, and which events are more likely than others. It lets us foresee the consequences of our actions and make plans—and ignore irrelevant details.

But teaching common sense to machines is hard. Today’s neural networks need to be shown thousands of examples before they start to spot such patterns.

In many ways common sense amounts to the ability to predict what’s going to happen next. “This is the essence of intelligence,” says LeCun. That’s why he and a few other researchers have been using video clips to train their models. But existing machine-learning techniques require the models to predict exactly what is going to happen in the next frame and generate it pixel by pixel. Imagine you hold up a pen and let it go, LeCun says. Common sense tells you that the pen will fall, but not the exact position it will end up in. Predicting that would require crunching some tough physics equations.

That’s why LeCun is now trying to train a neural network that can focus only on the relevant aspects of the world: predicting that the pen will fall but not exactly how. He sees this trained network as the equivalent of the world model that animals rely on.

Mystery ingredients

LeCun says he has built an early version of this world model that can do basic object recognition. He is now working on training it to make predictions. But how the configurator should work remains a mystery, he says. LeCun imagines the configurator as a neural network that acts as the controller for the whole system. It would decide what kind of predictions the world model should be making at any given time and what level of detail it should focus on to make those predictions possible, adjusting the world model as required.

LeCun is convinced that something like a configurator is needed, but he doesn’t know how to go about training a neural network to do the job. “We need to figure out a good recipe to make this work, and we don’t have that recipe yet,” he says.

In LeCun’s vision, the world model and configurator are two key pieces in a larger system, known as a cognitive architecture, that includes other neural networks—such as a perception model that senses the world and a model that uses rewards to motivate the AI to explore or curb its behavior.