A new data set reveals just how bad AI is at reasoning—and suggests that a new hybrid approach might be the best way forward.
Questions, questions: Known as CLEVRER, the data set consists of 20,000 short synthetic video clips and more than 300,000 question and answer pairings that reason about the events in the videos. Each video shows a simple world of toy objects that collide with one another following simulated physics. In one, a red rubber ball hits a blue rubber cylinder, which continues on to hit a metal cylinder.
The questions fall into four categories: descriptive (e.g., “What shape is the object that collides with the cyan cylinder?”), explanatory (“What is responsible for the gray cylinder’s collision with the cube?”), predictive (“Which event will happen next?”), and counterfactual (“Without the gray object, which event will not happen?”). The questions mirror many of the concepts that children learn early on as they explore their surroundings. But the latter three categories, which specifically require causal reasoning to answer, often stump deep-learning systems.
Fail: The data set, created by researchers at Harvard, DeepMind, and MIT-IBM Watson AI Lab is meant to help evaluate how well AI systems can reason. When the researchers tested several state-of-the-art computer vision and natural language models with the data set, they found that all of them did well on the descriptive questions but poorly on the others.
Mixing the old and the new: The team then tried a new AI system that combines both deep learning and symbolic logic. Symbolic systems used to be all the rage before they were eclipsed by machine learning in the late 1980s. But both approaches have their strengths: deep learning excels at scalability and pattern recognition; symbolic systems are better at abstraction and reasoning.
The composite system, known as a neuro-symbolic model, leverages both: it uses a neural network to recognize the colors, shapes, and materials of the objects and a symbolic system to understand the physics of their movements and the causal relationships between them. It outperformed existing models across all categories of questions.
Why it matters: As children, we learn to observe the world around us, infer why things happened and make predictions about what will happen next. These predictions help us make better decisions, navigate our environments, and stay safe. Replicating that kind of causal understanding in machines will similarly equip them to interact with the world in a more intelligent way.
A Roomba recorded a woman on the toilet. How did screenshots end up on Facebook?
Robot vacuum companies say your images are safe, but a sprawling global supply chain for data from our devices creates risk.
The viral AI avatar app Lensa undressed me—without my consent
My avatars were cartoonishly pornified, while my male colleagues got to be astronauts, explorers, and inventors.
Roomba testers feel misled after intimate images ended up on Facebook
An MIT Technology Review investigation recently revealed how images of a minor and a tester on the toilet ended up on social media. iRobot said it had consent to collect this kind of data from inside homes—but participants say otherwise.
How to spot AI-generated text
The internet is increasingly awash with text written by AI software. We need new tools to detect it.
Get the latest updates from
MIT Technology Review
Discover special offers, top stories, upcoming events, and more.