What AI still can’t do

Artificial intelligence won’t be very smart if computers don’t grasp cause and effect. That’s something even humans have trouble with.

Brian Bergstein

February 19, 2020

Saiman Chow

In less than a decade, computers have become extremely good at diagnosing diseases, translating languages, and transcribing speech. They can outplay humans at complicated strategy games, create photorealistic images, and suggest useful replies to your emails.

Yet despite these impressive achievements, artificial intelligence has glaring weaknesses.

Machine-learning systems can be duped or confounded by situations they haven’t seen before. A self-driving car gets flummoxed by a scenario that a human driver could handle easily. An AI system laboriously trained to carry out one task (identifying cats, say) has to be taught all over again to do something else (identifying dogs). In the process, it’s liable to lose some of the expertise it had in the original task. Computer scientists call this problem “catastrophic forgetting.”

These shortcomings have something in common: they exist because AI systems don’t understand causation. They see that some events are associated with other events, but they don’t ascertain which things directly make other things happen. It’s as if you knew that the presence of clouds made rain likelier, but you didn’t know clouds caused rain.


Understanding cause and effect is a big aspect of what we call common sense, and it’s an area in which AI systems today “are clueless,” says Elias Bareinboim. He should know: as the director of the new Causal Artificial Intelligence Lab at Columbia University, he’s at the forefront of efforts to fix this problem.

His idea is to infuse artificial-intelligence research with insights from the relatively new science of causality, a field shaped to a huge extent by Judea Pearl, a Turing Award–winning scholar who considers Bareinboim his protégé.

As Bareinboim and Pearl describe it, AI’s ability to spot correlations—e.g., that clouds make rain more likely—is merely the simplest level of causal reasoning. It’s good enough to have driven the boom in the AI technique known as deep learning over the past decade. Given a great deal of data about familiar situations, this method can lead to very good predictions. A computer can calculate the probability that a patient with certain symptoms has a certain disease, because it has learned just how often thousands or even millions of other people with the same symptoms had that disease.
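The associational level described here amounts to counting. As a minimal sketch, assuming patient records stored as hypothetical (symptom set, diagnosis) pairs, estimating the probability of a disease given symptoms is just a frequency ratio. Nothing in it distinguishes cause from coincidence.

```python
from collections import Counter

def conditional_probability(records, symptoms, disease):
    """Estimate P(disease | symptoms) as a frequency over past records.

    records is a list of (symptom_set, diagnosis) pairs (a hypothetical format).
    The result is purely associational: it reports how often the disease
    accompanied exactly these symptoms, not what caused what.
    """
    diagnoses = [dx for syms, dx in records if syms == symptoms]
    if not diagnoses:
        return None  # no past patient had exactly these symptoms
    return Counter(diagnoses)[disease] / len(diagnoses)

# Hypothetical records, invented for illustration.
records = (
    [(frozenset({"fever", "cough"}), "flu")] * 3
    + [(frozenset({"fever", "cough"}), "cold")]
    + [(frozenset({"rash"}), "measles")]
)
print(conditional_probability(records, frozenset({"fever", "cough"}), "flu"))  # 0.75
```

More data sharpens the estimate, but it never tells the system what would happen if a symptom were treated or removed.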

But there’s a growing consensus that progress in AI will stall if computers don’t get better at wrestling with causation. If machines could grasp that certain things lead to other things, they wouldn’t have to learn everything anew all the time; they could take what they had learned in one domain and apply it to another. And if machines could use common sense, we’d be able to put more trust in them to take actions on their own, knowing that they aren’t likely to make dumb errors.

Today’s AI has only a limited ability to infer what will result from a given action. In reinforcement learning, a technique that has allowed machines to master games like chess and Go, a system uses extensive trial and error to discern which moves will essentially cause it to win. But this approach doesn’t work in messier settings in the real world. It doesn’t even leave a machine with a general understanding of how it might play other games.
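Trial and error of this kind can be sketched as an epsilon-greedy learner choosing among a fixed set of moves. The win probabilities, the 10 percent exploration rate, and the function name `learn_move_values` are hypothetical, and real game-playing systems for chess and Go are vastly more elaborate. The point of the sketch is that the learner ends up knowing which move tends to win in this one setting, and nothing it stores says why or carries over to another game.

```python
import random

def learn_move_values(win_prob, trials=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy trial and error over a fixed set of moves.

    win_prob[i] is the hidden chance that move i wins (hypothetical values).
    The learner sees only win/loss outcomes and keeps a running average per move.
    """
    rng = random.Random(seed)
    counts = [0] * len(win_prob)
    values = [0.0] * len(win_prob)  # estimated win rate per move
    for _ in range(trials):
        if rng.random() < epsilon:
            move = rng.randrange(len(win_prob))                       # explore
        else:
            move = max(range(len(win_prob)), key=values.__getitem__)  # exploit
        reward = 1.0 if rng.random() < win_prob[move] else 0.0
        counts[move] += 1
        values[move] += (reward - values[move]) / counts[move]  # incremental mean
    return values

values = learn_move_values([0.2, 0.5, 0.8])
best = max(range(len(values)), key=values.__getitem__)
print("estimated win rates:", [round(v, 2) for v in values])
print("preferred move:", best)
```

The learned table is only a list of win rates for this game; change the game and it has to be relearned from scratch.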

An even higher level of causal thinking would be the ability to reason about why things happened and ask “what if” questions. A patient dies while in a clinical trial; was it the fault of the experimental medicine or something else? School test scores are falling; what policy changes would most improve them? This kind of reasoning is far beyond the current capability of artificial intelligence.

Performing miracles

The dream of endowing computers with causal reasoning drew Bareinboim from Brazil to the United States in 2008, after he completed a master’s in computer science at the Federal University of Rio de Janeiro. He jumped at an opportunity to study under Judea Pearl, a computer scientist and statistician at UCLA. Pearl, 83, is a giant—the giant—of causal inference, and his career helps illustrate why it’s hard to create AI that understands causality.

Even well-trained scientists are apt to misinterpret correlations as signs of causation—or to err in the opposite direction, hesitating to call out causation even when it’s justified. In the 1950s, for example, a few prominent statisticians muddied the waters around whether tobacco caused cancer. They argued that without an experiment randomly assigning people to be smokers or nonsmokers, no one could rule out the possibility that some unknown—stress, perhaps, or some gene—caused people both to smoke and to get lung cancer.

Eventually, the fact that smoking causes cancer was definitively established, but it needn’t have taken so long. Since then, Pearl and other statisticians have devised a mathematical approach to identifying what facts would be required to support a causal claim. Pearl’s method shows that, given the prevalence of smoking and lung cancer, an independent factor causing both would be extremely unlikely.

Conversely, Pearl’s formulas also help identify when correlations can’t be used to determine causation. Bernhard Schölkopf, who researches causal AI techniques as a director at Germany’s Max Planck Institute for Intelligent Systems, points out that you can predict a country’s birth rate if you know its population of storks. That isn’t because storks deliver babies or because babies attract storks, but probably because economic development leads to more babies and more storks. Pearl has helped give statisticians and computer scientists ways of attacking such problems, Schölkopf says.


Pearl’s work has also led to the development of causal Bayesian networks, models that software can use to sift through large amounts of data and detect which variables appear to have the most influence on other variables. For example, GNS Healthcare, a company in Cambridge, Massachusetts, uses these techniques to advise researchers about experiments that look promising.

In one project, GNS worked with researchers who study multiple myeloma, a kind of blood cancer. The researchers wanted to know why some patients with the disease live longer than others after getting stem-cell transplants, a common form of treatment. The software churned through data with 30,000 variables and pointed to a few that seemed especially likely to be causal. Biostatisticians and experts in the disease zeroed in on one in particular: the level of a certain protein in patients’ bodies. Researchers could then run a targeted clinical trial to see whether patients with the protein did indeed benefit more from the treatment. “It’s way faster than poking here and there in the lab,” says GNS cofounder Iya Khalil.

Nonetheless, the improvements that Pearl and other scholars have achieved in causal theory haven’t yet made many inroads in deep learning, which identifies correlations without too much worry about causation. Bareinboim is working to take the next step: making computers more useful tools for human causal explorations.


One of Bareinboim’s systems, which is still in beta, can help scientists determine whether they have sufficient data to answer a causal question. Richard McElreath, an anthropologist at the Max Planck Institute for Evolutionary Anthropology, is using the software to guide research into why humans go through menopause (we are the only apes that do).

The hypothesis is that the decline of fertility in older women benefited early human societies because women who put more effort into caring for grandchildren ultimately had more descendants. But what evidence might exist today to support the claim that children do better with grandparents around? Anthropologists can’t just compare the educational or medical outcomes of children who have lived with grandparents and those who haven’t. There are what statisticians call confounding factors: grandmothers might be likelier to live with grandchildren who need the most help. Bareinboim’s software can help McElreath discern which studies about kids who grew up with their grandparents are least riddled with confounding factors and could be valuable in answering his causal query. “It’s a huge step forward,” McElreath says.
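The grandmother problem is a textbook case of confounding. This sketch uses a hypothetical population in which a child’s need for extra help (`N`) makes living with a grandmother (`G`) likelier and also worsens outcomes (`Y`), while grandmothers genuinely improve outcomes by 10 percentage points; every number is invented. A naive comparison makes grandmothers look harmful, whereas Pearl’s back-door adjustment formula, P(y | do(g)) = Σₙ P(y | g, n) P(n), recovers the true effect. The adjustment is valid only if `N` is the sole confounder and is measured, which is the kind of condition tools like Bareinboim’s help check; this is not his software.

```python
from fractions import Fraction as F

# Hypothetical population (all numbers invented for illustration):
#   N = 1 if the child needs extra help (the confounder)
#   G = 1 if the child lives with a grandmother
#   Y = 1 if the child has a good outcome
P_N = {1: F(1, 2), 0: F(1, 2)}           # P(N = n)
P_G_given_N = {1: F(4, 5), 0: F(1, 5)}   # P(G = 1 | N = n)
P_Y_given_GN = {                         # P(Y = 1 | G = g, N = n)
    (1, 0): F(8, 10), (0, 0): F(7, 10),
    (1, 1): F(4, 10), (0, 1): F(3, 10),
}

def p_g(g, n):
    """P(G = g | N = n)."""
    return P_G_given_N[n] if g == 1 else 1 - P_G_given_N[n]

def naive(g):
    """P(Y = 1 | G = g): what a raw comparison of children measures."""
    joint = sum(P_Y_given_GN[(g, n)] * p_g(g, n) * P_N[n] for n in P_N)
    marginal = sum(p_g(g, n) * P_N[n] for n in P_N)
    return joint / marginal

def adjusted(g):
    """P(Y = 1 | do(G = g)) via back-door adjustment on N:
    sum over n of P(Y = 1 | G = g, N = n) * P(N = n)."""
    return sum(P_Y_given_GN[(g, n)] * P_N[n] for n in P_N)

print("naive difference:   ", naive(1) - naive(0))        # -7/50
print("adjusted difference:", adjusted(1) - adjusted(0))  # 1/10
```

Because the sketch uses exact fractions, the naive comparison shows a 14-point penalty for living with a grandmother, while the adjusted estimate shows the built-in 10-point benefit.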

The last mile

Bareinboim talks fast and often gestures with two hands in the air, as if he’s trying to balance two sides of a mental equation. It was halfway through the semester when I visited him at Columbia in October, but it seemed as if he had barely moved into his office—hardly anything on the walls, no books on the shelves, only a sleek Mac computer and a whiteboard so dense with equations and diagrams that it looked like a detail from a cartoon about a mad professor.

He shrugged off the provisional state of the room, saying he had been very busy giving talks about both sides of the causal revolution. Bareinboim believes work like his offers the opportunity not just to incorporate causal thinking into machines, but also to improve it in humans.

Getting people to think more carefully about causation isn’t necessarily much easier than teaching it to machines, he says. Researchers in a wide range of disciplines, from molecular biology to public policy, are sometimes content to unearth correlations that are not actually rooted in causal relationships. For instance, some studies suggest drinking alcohol will kill you early, while others indicate that moderate consumption is fine and even beneficial, and still other research has found that heavy drinkers outlive nondrinkers. This kind of flip-flopping, part of what is known as the “reproducibility crisis,” crops up not only in medicine and nutrition but also in psychology and economics. “You can see the fragility of all these inferences,” says Bareinboim. “We’re flipping results every couple of years.”

He argues that anyone asking “what if” (medical researchers setting up clinical trials, social scientists developing pilot programs, even web publishers preparing A/B tests) should start not merely by gathering data but by using Pearl’s causal logic and software like Bareinboim’s to determine whether the available data could possibly test a causal hypothesis. Eventually, he envisions this leading to “automated scientist” software: a human could dream up a causal question to go after, and the software would combine causal inference theory with machine-learning techniques to rule out experiments that wouldn’t answer the question. That might save scientists from a huge number of costly dead ends.

Bareinboim described this vision while we were sitting in the lobby of MIT’s Sloan School of Management, after a talk he gave last fall. “We have a building here at MIT with, I don’t know, 200 people,” he said. How do those social scientists, or any scientists anywhere, decide which experiments to pursue and which data points to gather? By following their intuition: “They are trying to see where things will lead, based on their current understanding.”

That’s an inherently limited approach, he said, because human scientists designing an experiment can consider only a handful of variables in their minds at once. A computer, on the other hand, can see the interplay of hundreds or thousands of variables. Encoded with “the basic principles” of Pearl’s causal calculus and able to calculate what might happen with new sets of variables, an automated scientist could suggest exactly which experiments the human researchers should spend their time on. Maybe some public policy that has been shown to work only in Texas could be made to work in California if a few causally relevant factors were better appreciated. Scientists would no longer be “doing experiments in the darkness,” Bareinboim said.

He also doesn’t think it’s that far off: “This is the last mile before the victory.”

What if?

Finishing that mile will probably require techniques that are just beginning to be developed. For example, Yoshua Bengio, a computer scientist at the University of Montreal who shared the 2018 Turing Award for his work on deep learning, is trying to get neural networks—the software at the heart of deep learning—to do “meta-learning” and notice the causes of things.

As things stand now, if you wanted a neural network to detect when people are dancing, you’d show it many, many images of dancers. If you wanted it to identify when people are running, you’d show it many, many images of runners. The system would learn to distinguish runners from dancers by identifying features that tend to be different in the images, such as the positions of a person’s hands and arms. But Bengio points out that fundamental knowledge about the world can be gleaned by analyzing the things that are similar or “invariant” across data sets. Maybe a neural network could learn that movements of the legs physically cause both running and dancing. Maybe after seeing these examples and many others that show people only a few feet off the ground, a machine would eventually understand something about gravity and how it limits human movement. Over time, with enough meta-learning about variables that are consistent across data sets, a computer could gain causal knowledge that would be reusable in many domains.

For his part, Pearl says AI can’t be truly intelligent until it has a rich understanding of cause and effect. Although causal reasoning wouldn’t be sufficient for an artificial general intelligence, it’s necessary, he says, because it would enable the introspection that is at the core of cognition. “What if” questions “are the building blocks of science, of moral attitudes, of free will, of consciousness,” Pearl told me.

You can’t draw Pearl into predicting how long it will take for computers to get powerful causal reasoning abilities. “I am not a futurist,” he says. But in any case, he thinks the first move should be to develop machine-learning tools that combine data with available scientific knowledge: “We have a lot of knowledge that resides in the human skull which is not utilized.”

Brian Bergstein, a former editor at MIT Technology Review, is deputy opinion editor at the Boston Globe.
