How to know if artificial intelligence is about to destroy civilization

These canaries in the coal mines of AI would be signs that superintelligent robot overlords are approaching

Oren Etzioni

February 25, 2020

Illustration: mechanical canaries. Credit: Ms Tech

Could we wake up one morning dumbstruck that a super-powerful AI has emerged, with disastrous consequences? Books like Superintelligence by Nick Bostrom and Life 3.0 by Max Tegmark, as well as more recent articles, argue that malevolent superintelligence is an existential risk for humanity.

But one can speculate endlessly. It’s better to ask a more concrete, empirical question: What would alert us that superintelligence is indeed around the corner?

We might call such harbingers canaries in the coal mines of AI. If an artificial-intelligence program develops a fundamental new capability, that’s the equivalent of a canary collapsing: an early warning of AI breakthroughs on the horizon.

Could the famous Turing test serve as a canary? The test, invented by Alan Turing in 1950, posits that human-level AI will be achieved when a person can’t distinguish conversing with a human from conversing with a computer. It’s an important test, but it’s not a canary; it is, rather, the sign that human-level AI has already arrived. Many computer scientists believe that if that moment does arrive, superintelligence will quickly follow. We need more intermediate milestones.

Is AI’s performance in games such as Go, poker, or Quake 3 a canary? It is not. The bulk of so-called artificial intelligence in these games is actually human work to frame the problem and design the solution. AlphaGo’s victory over human Go champions was a credit to the talented human team at DeepMind, not to the machine, which merely ran the algorithm the people had created. This explains why it takes years of hard work to translate AI success from one narrow challenge to the next. Even AlphaZero, which learned to play world-class Go in a few hours, hasn’t substantially broadened its scope since 2017. Methods such as deep learning are general, but their successful application to a particular task requires extensive human intervention.

More broadly, machine learning is at the core of AI’s successes over the last decade or so. Yet the term “machine learning” is a misnomer. Machines possess only a narrow sliver of humans’ rich and versatile learning abilities. To say that machines learn is like saying that baby penguins know how to fish. The reality is, adult penguins swim, capture fish, digest it, regurgitate into their beaks, and place morsels into their children’s mouths. AI is likewise being spoon-fed by human scientists and engineers.

In contrast to machine learning, human learning maps a personal motivation (“I want to drive to be independent of my parents”) to a strategic learning plan (“Take driver’s ed and practice on weekends”). A human formulates specific learning targets (“Get better at parallel parking”), collects and labels data (“The angle was wrong this time”), and incorporates external feedback and background knowledge (“The instructor explained how to use the side mirrors”). Humans identify, frame, and shape learning problems. None of these human abilities is even remotely replicated by machines. Machines can perform superhuman statistical calculations, but that is merely the last mile of learning.

The automatic formulation of learning problems, then, is our first canary. It does not appear to be anywhere close to dying.

Self-driving cars are a second canary. They are further in the future than anticipated by boosters like Elon Musk. AI can fail catastrophically in atypical situations, like when a person in a wheelchair is crossing the street. Driving is far more challenging than previous AI tasks because it requires making life-critical, real-time decisions based on both the unpredictable physical world and interaction with human drivers, pedestrians, and others. Of course, we should deploy limited self-driving cars once they reduce accident rates, but only when human-level driving is achieved can this canary be said to have keeled over.

AI doctors are a third canary. AI can already analyze medical images with superhuman accuracy, but that is only a narrow slice of a human doctor’s job. An AI doctor would have to interview patients, consider complications, consult other doctors, and more. These are challenging tasks that require understanding people, language, and medicine. Such a doctor would not have to fool a patient into thinking it is human—that’s why this is different from the Turing test. But it would have to approximate the abilities of human doctors across a wide range of tasks and unanticipated circumstances.

And though the Turing test itself is not a good canary, limited versions of the test could serve as canaries. Existing AIs are unable to understand people and their motivations, or even basic physical questions like “Will a jumbo jet fit through a window?” We can administer a partial Turing test by conversing with an AI like Alexa or Google Home for a few minutes, which quickly exposes their limited understanding of language and the world. Consider a very simple example based on the Winograd schemas proposed by computer scientist Hector Levesque. I said to Alexa: “My trophy doesn’t fit into my carry-on because it is too large. What should I do?” Alexa’s answer was “I don’t know that one.” Since Alexa can’t reason about sizes of objects, it can’t decide whether “it” refers to the trophy or to the carry-on. When AI can’t understand the meaning of “it,” it’s hard to believe it is poised to take over the world. If Alexa were able to have a substantive dialogue on a rich topic, that would be a fourth canary.

Current AIs are idiots savants: successful on narrow tasks, such as playing Go or categorizing MRI images, but lacking the generality and versatility of humans. Each idiot savant is constructed manually and separately, and we are decades away from the versatile abilities of a five-year-old child. The canaries I propose, in contrast, indicate inflection points for the field of AI.

Some theorists, like Bostrom, argue that we must nonetheless plan for very low-probability but high-consequence events as though they were inevitable. The consequences, they say, are so profound that our estimates of their likelihood aren’t important. This is a silly argument: it can be used to justify just about anything. It is a modern-day version of the argument by the 17th-century philosopher Blaise Pascal that it is worth acting as if a Christian God exists because otherwise you are at risk of an everlasting hell. He used the infinite cost of an error to argue that a particular course of action is “rational” even if it is based on a highly improbable premise. But arguments based on infinite costs can support contradictory beliefs. For instance, consider an anti-Christian God who promises everlasting hell for every Christian act. That’s highly improbable as well; from a logical point of view, though, it is just as reasonable a wager as believing in the god of the Bible. This contradiction shows a flaw in arguments based on infinite costs.

My catalogue of early warning signals, or canaries, is illustrative rather than comprehensive, but it shows how far we are from human-level AI. If and when a canary “collapses,” we will have ample time before the emergence of human-level AI to design robust “off-switches” and to identify red lines we don’t want AI to cross. AI eschatology without empirical canaries is a distraction from addressing existing issues like how to regulate AI’s impact on employment or ensure that its use in criminal sentencing or credit scoring doesn’t discriminate against certain groups.

As Andrew Ng, one of the world’s most prominent AI experts, has said, “Worrying about AI turning evil is a little bit like worrying about overpopulation on Mars.” Until the canaries start dying, he is entirely correct.

Oren Etzioni is the CEO of the nonprofit Allen Institute for AI, and a professor of computer science at the University of Washington.
