AI is learning how to create itself

Humans have struggled to make truly intelligent machines. Maybe we need to let them get on with it themselves.

Shuhua Xiong

Will Douglas Heaven

May 27, 2021

A little stick figure with a wedge-shaped head shuffles across the screen. It moves in a half crouch, dragging one knee along the ground. It’s walking! Er, sort of.

Yet Rui Wang is delighted. “Every day I walk into my office and open my computer, and I don’t know what to expect,” he says.

An artificial-intelligence researcher at Uber, Wang likes to leave the Paired Open-Ended Trailblazer, a piece of software he helped develop, running on his laptop overnight. POET is a kind of training dojo for virtual bots. So far, they aren’t learning to do much at all. These AI agents are not playing Go, spotting signs of cancer, or folding proteins—they’re trying to navigate a crude cartoon landscape of fences and ravines without falling over.

But it’s not what the bots are learning that’s exciting—it’s how they’re learning. POET generates the obstacle courses, assesses the bots’ abilities, and assigns their next challenge, all without human involvement. Step by faltering step, the bots improve via trial and error. “At some point it might jump over a cliff like a kung fu master,” says Wang.

It may seem basic at the moment, but for Wang and a handful of other researchers, POET hints at a revolutionary new way to create supersmart machines: by getting AI to make itself.

Wang’s former colleague Jeff Clune is among the biggest boosters of this idea. Clune has been working on it for years, first at the University of Wyoming and then at Uber AI Labs, where he worked with Wang and others. Now dividing his time between the University of British Columbia and OpenAI, he has the backing of one of the world’s top artificial-intelligence labs.

Clune calls the attempt to build truly intelligent AI the most ambitious scientific quest in human history. Today, seven decades after serious efforts to make AI began, we’re still a long way from creating machines that are anywhere near as smart as humans, let alone smarter. Clune thinks POET might point to a shortcut.

“We need to take the shackles off and get out of our own way,” he says.

If Clune is right, using AI to make AI could be an important step on the road that one day leads to artificial general intelligence (AGI)—machines that can outthink humans. In the nearer term, the technique might also help us discover different kinds of intelligence: non-human smarts that can find solutions in unexpected ways and perhaps complement our own intelligence rather than replace it.

Mimicking evolution

I first spoke to Clune about the idea early last year, just a few weeks after his move to OpenAI. He was happy to discuss past work but remained tight-lipped on what he was doing with his new team. Instead of taking the call inside, he preferred to walk up and down the streets outside the offices as we talked.

All Clune would say was that OpenAI was a good fit. “My idea is very much in line with many of the things that they believe,” he says. “It was kind of a marriage made in heaven. They liked the vision and wanted me to come here and pursue it.” A few months after Clune joined, OpenAI hired most of his old Uber team as well.

Clune’s ambitious vision is grounded by more than OpenAI’s investment. The history of AI is filled with examples in which human-designed solutions gave way to machine-learned ones. Take computer vision: a decade ago, the big breakthrough in image recognition came when existing hand-crafted systems were replaced by ones that taught themselves from scratch. It’s the same for many AI successes.

One of the fascinating things about AI, and machine learning in particular, is its ability to find solutions that humans haven’t found—to surprise us. An oft-cited example is AlphaGo (and its successor AlphaZero), which beat the best humanity has to offer at the ancient, beguiling game of Go by employing seemingly alien strategies. After hundreds of years of study by human masters, AI found solutions no one had ever thought of.

Clune is now working with a team at OpenAI that developed bots that learned to play hide and seek in a virtual environment in 2018. These AIs started off with simple goals and simple tools to achieve them: one pair had to find the other, which could hide behind movable obstacles. Yet when these bots were let loose to learn, they soon found ways to take advantage of their environment in ways the researchers had not foreseen. They exploited glitches in the simulated physics of their virtual world to jump over and even pass through walls.

Those kinds of unexpected emergent behaviors offer tantalizing hints that AI might arrive at technical solutions humans would not think of by themselves, inventing new and more efficient types of algorithms or neural networks—or even ditching neural networks, a cornerstone of modern AI, entirely.

Clune likes to remind people that intelligence has already emerged from simple beginnings. “What’s interesting about this approach is that we know it can work,” he says. “The very simple algorithm of Darwinian evolution produced your brain, and your brain is the most intelligent learning algorithm in the universe that we know so far.” His point is that if intelligence as we know it resulted from the mindless mutation of genes over countless generations, why not seek to replicate the intelligence-producing process—which is arguably simpler—rather than intelligence itself?

But there’s another crucial observation here. Intelligence was never an endpoint for evolution, something to aim for. Instead, it emerged in many different forms from countless tiny solutions to challenges that allowed living things to survive and take on future challenges. Intelligence is the current high point in an ongoing and open-ended process. In this sense, evolution is quite different from algorithms the way people typically think of them—as means to an end.

It’s this open-endedness, glimpsed in the apparently aimless sequence of challenges generated by POET, that Clune and others believe could lead to new kinds of AI. For decades AI researchers have tried to build algorithms to mimic human intelligence, but the real breakthrough may come from building algorithms that try to mimic the open-ended problem-solving of evolution—and sitting back to watch what emerges.

Researchers are already using machine learning on itself, training it to find solutions to some of the field’s hardest problems, such as how to make machines that can learn more than one task at a time or cope with situations they have not encountered before. Some now think that taking this approach and running with it might be the best path to artificial general intelligence. “We could start an algorithm that initially does not have much intelligence inside it, and watch it bootstrap itself all the way up potentially to AGI,” Clune says.

The truth is that for now, AGI remains a fantasy. But that’s largely because nobody knows how to make it. Advances in AI are piecemeal and carried out by humans, with progress typically involving tweaks to existing techniques or algorithms, yielding incremental leaps in performance or accuracy. Clune characterizes these efforts as attempts to discover the building blocks for artificial intelligence without knowing what you’re looking for or how many blocks you’ll need. And that’s just the start. “At some point, we have to take on the Herculean task of putting them all together,” he says.

Asking AI to find and assemble those building blocks for us is a paradigm shift. It’s saying we want to create an intelligent machine, but we don’t care what it might look like—just give us whatever works.

Even if AGI is never achieved, the self-teaching approach may still change what sorts of AI are created. The world needs more than a very good Go player, says Clune. For him, creating a supersmart machine means building a system that invents its own challenges, solves them, and then invents new ones. POET is a tiny glimpse of this in action. Clune imagines a machine that teaches a bot to walk, then to play hopscotch, then maybe to play Go. “Then maybe it learns math puzzles and starts inventing its own challenges,” he says. “The system continuously innovates, and the sky’s the limit in terms of where it might go.”

It’s wild speculation, perhaps, but one hope is that machines like this might be able to evade our conceptual dead ends, helping us unpick vastly complex crises such as climate change or global health.

But first we have to make one.

How to create a brain

There are many different ways to wire up an artificial brain.

Neural networks are made from multiple layers of artificial neurons encoded in software. Each neuron can be connected to others in the layers above. The way a neural network is wired makes a big difference, and new architectures often lead to new breakthroughs.

The neural networks coded by human scientists are often the result of trial and error. There is little theory to what does and doesn’t work, and no guarantee that the best designs have been found. That’s why automating the hunt for better neural-network designs has been one of the hottest topics in AI since at least the 1980s. The most common way to automate the process is to let an AI generate many possible network designs, then have the system automatically try each of them and keep the best ones. This is commonly known as neuro-evolution or neural architecture search (NAS).
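The generate-try-keep loop described above can be sketched in a few lines of Python. This is a toy illustration, not Google's or anyone else's actual NAS system: a "design" here is just a list of layer widths, and `score`, `mutate`, and `evolve` are hypothetical names. In particular, `score` is an invented stand-in for the expensive step of training each candidate network and measuring its accuracy.

```python
import random

def score(design):
    # Hypothetical stand-in for "train this network and measure accuracy".
    # It favors layer widths near 32 and penalizes extra layers.
    return -sum(abs(width - 32) for width in design) - 2 * len(design)

def mutate(design, rng):
    # Return a copy of the design with one small random change.
    child = list(design)
    move = rng.choice(["widen", "narrow", "add", "drop"])
    i = rng.randrange(len(child))
    if move == "widen":
        child[i] = min(128, child[i] * 2)
    elif move == "narrow":
        child[i] = max(1, child[i] // 2)
    elif move == "add":
        child.insert(i, child[i])
    elif move == "drop" and len(child) > 1:
        child.pop(i)
    return child

def evolve(generations=50, population_size=20, seed=0):
    rng = random.Random(seed)
    # Start from tiny one-layer designs.
    population = [[rng.choice([1, 2, 4, 8])] for _ in range(population_size)]
    for _ in range(generations):
        # Keep the better half, then refill with mutated copies of survivors.
        population.sort(key=score, reverse=True)
        parents = population[: population_size // 2]
        children = [mutate(rng.choice(parents), rng) for _ in parents]
        population = parents + children
    return max(population, key=score)

best = evolve()
print(best, score(best))
```

Because the best designs are always carried over to the next generation, the top score in this sketch can never get worse from one generation to the next. Real systems differ mainly in how candidates are represented and in how much computing power it takes to evaluate each one.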

In the last few years, these machine designs have started to outstrip human ones. In 2018, Esteban Real and his colleagues at Google used NAS to generate a neural network for image recognition that beat the best human-designed networks at the time. That was an eye-opener.

The 2018 system is part of an ongoing Google project called AutoML, which has also used NAS to produce EfficientNets, a family of deep-learning models that are more efficient than human-designed ones, achieving high levels of accuracy on image-recognition tasks with smaller, faster models.

Three years on, Real is pushing the boundaries of what can be generated from scratch. The earlier systems just rearranged tried and tested neural-network pieces, such as existing types of layers or components. “We could expect a good answer,” he says.

Last year Real and his team took the training wheels off. The new system, called AutoML Zero, tries to build an AI from the ground up using nothing but the most basic mathematical concepts that govern machine learning.

Amazingly, not only did AutoML Zero spontaneously build a neural network, but it came up with gradient descent, the most common mathematical technique that human designers use to train a network. “I was quite surprised,” says Real. “It’s a very simple algorithm—it takes like six lines of code—but it wrote the exact six lines.”
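For readers who have not seen it, gradient descent really is short. The sketch below is a textbook version written for this illustration, not the code AutoML Zero produced; the function name `gradient_descent`, the straight-line model, and the default learning rate and step count are all assumptions. It fits a line to data by repeatedly nudging two numbers in the direction that reduces the average squared error.

```python
def gradient_descent(xs, ys, learning_rate=0.05, steps=2000):
    # Fit y = w * x + b by minimizing mean squared error.
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Gradients of the mean squared error with respect to w and b.
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        # Step downhill, against the gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# Data generated from y = 2x + 1; the fit should recover those two numbers.
w, b = gradient_descent([0, 1, 2, 3], [1, 3, 5, 7])
print(round(w, 3), round(b, 3))
```

The same recipe, applied to millions of numbers instead of two, is how most neural networks are trained.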

AutoML Zero is not yet generating architectures that rival the performance of human-designed systems—or indeed doing much that a human designer would not have done. But Real believes it could one day.

Time to train a new kind of teacher

First you make a brain; then you have to teach it. But machine brains don’t learn the way ours do. Our brains are fantastic at adapting to new environments and new tasks. Today’s AIs can solve challenges under certain conditions but fail when those conditions change even a little. This inflexibility is hampering the quest to create more generalizable AI that can be useful across a wide range of scenarios, which would be a big step toward making them truly intelligent.

For Jane Wang, a researcher at DeepMind in London, the best way to make AI more flexible is to get it to learn that trait itself. In other words, she wants to build an AI that not only learns specific tasks but learns to learn those tasks in ways that can be adapted to fresh situations.

Researchers have been trying to make AI more adaptable for years. Wang thinks that getting AI to work through this problem for itself avoids some of the trial and error of a hand-designed approach: “We can’t possibly expect to stumble upon the right answer right away.” In the process, she hopes, we will also learn more about how brains work. “There’s still so much we don’t understand about the way that humans and animals learn,” she says.

There are two main approaches to generating learning algorithms automatically, but both start with an existing neural network and use AI to teach it.

The first approach, invented separately by Wang and her colleagues at DeepMind and by a team at OpenAI at around the same time, uses recurrent neural networks. This type of network can be trained in such a way that the activations of its neurons (roughly akin to the firing of neurons in biological brains) encode any type of algorithm. DeepMind and OpenAI took advantage of this to train a recurrent neural network to generate reinforcement-learning algorithms, which tell an AI how to behave to achieve given goals.

The upshot is that the DeepMind and OpenAI systems do not learn an algorithm that solves a specific challenge, such as recognizing images, but learn a learning algorithm that can be applied to multiple tasks and adapt as it goes. It’s like the old adage about teaching someone to fish: whereas a hand-designed algorithm can learn a particular task, these AIs are being made to learn how to learn by themselves. And some of them are performing better than human-designed ones.

The second approach comes from Chelsea Finn at the University of California, Berkeley, and her colleagues. Called model-agnostic meta-learning, or MAML, it trains a model using two machine-learning processes, one nested inside the other.

Roughly, here’s how it works. The inner process in MAML is trained on data and then tested, as usual. But then the outer model takes the performance of the inner model (how well it identifies images, say) and uses it to learn how to adjust that model’s learning algorithm to boost performance. It’s as if you had a school inspector watching over a bunch of teachers, each offering different learning techniques. The inspector checks which techniques help the students get the best scores and tweaks them accordingly.
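The nesting can be made concrete with a deliberately tiny sketch. In MAML as usually described, what the outer loop adjusts is the starting point from which the inner loop learns each new task. Everything below is an assumption made for illustration: the names `inner_adapt` and `meta_train`, the single-number "model" `w`, and the toy tasks, each of which simply asks the model to match a target number.

```python
def inner_adapt(w, target, inner_lr):
    # Inner process: one gradient step on the task loss (w - target) ** 2.
    return w - inner_lr * 2 * (w - target)

def meta_train(targets, inner_lr=0.1, outer_lr=0.1, steps=500):
    w = 0.0  # shared starting point that the outer loop learns
    for _ in range(steps):
        meta_grad = 0.0
        for target in targets:
            adapted = inner_adapt(w, target, inner_lr)
            # Gradient of the post-adaptation loss (adapted - target) ** 2
            # with respect to w, via the chain rule through the inner step.
            meta_grad += 2 * (adapted - target) * (1 - 2 * inner_lr)
        # Outer process: move the starting point so that one inner step
        # works better on average across all tasks.
        w -= outer_lr * meta_grad / len(targets)
    return w

start = meta_train([1.0, 2.0, 3.0])
print(round(start, 3))
```

In this toy setting the learned starting point settles midway between the tasks, the position from which a single step of learning gets closest to any of them on average. Real uses of the technique apply the same two-level idea to full neural networks.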

Through these approaches, researchers are building AI that is more robust, more generalized, and able to learn faster with less data. For example, Finn wants a robot that has learned to walk on flat ground to be able to transition, with minimal extra training, to walking on a slope or on grass or while carrying a load.

Last year, Clune and his colleagues extended Finn’s technique to design an algorithm that learns using fewer neurons so that it does not overwrite everything it has learned previously, a big unsolved problem in machine learning known as catastrophic forgetting. A trained model that uses fewer neurons, known as a “sparse” model, will have more unused neurons left over to dedicate to new tasks when retrained, which means that fewer of the “used” neurons will get overwritten. Clune found that setting his AI the challenge of learning more than one task led it to come up with its own version of a sparse model that outperformed human-designed ones.

If we’re going all in on letting AI create and teach itself, then AIs should generate their own training environments, too—the schools and textbooks, as well as the lesson plans.

And the past year has seen a raft of projects in which AI has been trained on automatically generated data. Face-recognition systems are being trained with AI-generated faces, for example. AIs are also learning how to train each other. In one recent example, two robot arms worked together, with one arm learning to set tougher and tougher block-stacking challenges that trained the other to grip and grasp objects.

In fact, Clune wonders if human intuition about what kind of data an AI needs in order to learn may be off. For example, he and his colleagues have developed what he calls generative teaching networks, which learn what data they should generate to get the best results when training a model. In one experiment, he used one of these networks to adapt a data set of handwritten numbers that’s often used to train image-recognition algorithms. What it came up with looked very different from the original human-curated data set: hundreds of not-quite digits, such as the top half of the figure seven or what looked like two digits merged together. Some AI-generated examples were hard to decipher at all. Despite this, the AI-generated data still did a great job at training the handwriting recognition system to identify actual digits.

Don’t try to succeed

AI-generated data is still just a part of the puzzle. The long-term vision is to take all these techniques—and others not yet invented—and hand them over to an AI trainer that controls how artificial brains are wired, how they are trained, and what they are trained on. Even Clune is not clear on what such a future system would look like. Sometimes he talks about a kind of hyper-realistic simulated sandbox, where AIs can cut their teeth and skin their virtual knees. Something that complex is still years away. The closest thing yet is POET, the system Clune created with Uber’s Rui Wang and others.

POET was motivated by a paradox, says Wang. If you try to solve a problem you’ll fail; if you don’t try to solve it you’re more likely to succeed. This is one of the insights Clune takes from his analogy with evolution—amazing results that emerge from an apparently random process often cannot be re-created by taking deliberate steps toward the same end. There’s no doubt that butterflies exist, but rewind to their single-celled precursors and try to create them from scratch by choosing each step from bacterium to bug, and you’d likely fail.

POET starts its two-legged agent off in a simple environment, such as a flat path without obstacles. At first the agent doesn’t know what to do with its legs and cannot walk. But through trial and error, the reinforcement-learning algorithm controlling it learns how to move along flat ground. POET then generates a new random environment that’s different, but not necessarily harder to move in. The agent tries walking there. If there are obstacles in this new environment, the agent learns how to get over or across those. Every time an agent succeeds or gets stuck, it is moved to a new environment. Over time, the agents learn a range of walking and jumping actions that let them navigate harder and harder obstacle courses.
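The loop just described can be caricatured in code. This is a heavily simplified sketch under stated assumptions, not Uber's POET implementation: the real system juggles a population of paired environments and agents, while this version follows a single agent. The names `attempt`, `train`, and `open_ended_loop` are hypothetical, the agent is reduced to one "skill" number, each environment is reduced to one "difficulty" number, and `attempt` is an invented stand-in for running the physics simulation.

```python
import random

def attempt(skill, difficulty):
    # Hypothetical stand-in for a simulator run: the reward is 1.0 when skill
    # covers the difficulty and shrinks toward 0.0 as the gap grows.
    return max(0.0, 1.0 - max(0.0, difficulty - skill))

def train(skill, difficulty, rng, tries=20):
    # Trial and error: keep a random tweak only if it scores strictly better.
    for _ in range(tries):
        candidate = skill + rng.uniform(-0.1, 0.2)
        if attempt(candidate, difficulty) > attempt(skill, difficulty):
            skill = candidate
    return skill

def open_ended_loop(rounds=30, seed=0):
    rng = random.Random(seed)
    skill, difficulty = -0.5, 0.0  # a clumsy agent on flat ground
    history = []
    for _ in range(rounds):
        skill = train(skill, difficulty, rng)
        history.append((difficulty, attempt(skill, difficulty)))
        # Generate the next environment: different, not necessarily harder.
        difficulty = max(0.0, difficulty + rng.uniform(-0.2, 0.4))
    return skill, history

skill, history = open_ended_loop()
print(round(skill, 2), len(history))
```

Note that nothing in the loop defines a finish line. The only pressure on the agent comes from whatever environment it happens to be in, which is the open-ended quality the researchers are after.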

The team found that random switching of environments was essential.

For example, agents sometimes learned to walk on flat ground with a weird, half-kneeling shuffle, because that was good enough. “They never learn to stand up because they never need to,” says Wang. But after they had been forced to learn alternative strategies on obstacle-strewn ground, they could return to the early stage with a better way of walking (using both legs instead of dragging one behind, say) and then take that improved version of themselves forward to harder challenges.

POET trains its bots in a way that no human would—it takes erratic, unintuitive paths to success. At each stage, the bots try to figure out a solution to whatever challenge they are presented with. By coping with a random selection of obstacles thrown their way, they get better overall. But there is no end point to this process, no ultimate test to pass or high score to beat.

Clune, Wang, and a number of their colleagues believe this is a profound insight. They are now exploring what it might mean for the development of supersmart machines. Could trying not to chart a specific path actually be a key breakthrough on the way to artificial general intelligence?

POET is already inspiring other researchers, such as Natasha Jaques and Michael Dennis at the University of California, Berkeley. They’ve developed a system called PAIRED that uses AI to generate a series of mazes to train another AI to navigate them.

Rui Wang thinks human-designed challenges are going to be a bottleneck and that real progress in AI will require AI to come up with its own. “No matter how good algorithms are today, they are always tested on some hand-designed benchmark,” he says. “It’s very hard to imagine artificial general intelligence coming from this, because it is bound by fixed goals.”

A new kind of intelligence

The rapid development of AI that can train itself also raises questions about how well we can control its growth. The idea of AI that builds better AI is a crucial part of the myth-making behind the “Singularity,” the imagined point in the future when AIs start to improve at an exponential rate and move beyond our control. Eventually, certain doomsayers warn, AI might decide it doesn’t need humans at all.

That’s not what any of these researchers have in mind: their work is very much focused on making today’s AI better. Machines that run amok remain a far-off anti-fantasy.

Even so, DeepMind’s Jane Wang has reservations. A big part of the attraction of using AI to make AI is that it can come up with designs and techniques that people hadn’t thought of. Yet Wang notes that not all surprises are good surprises: “Open-endedness is, by definition, something that’s unexpected.” If the whole idea is to get AI to do something you didn’t anticipate, it becomes harder to control. “That’s both exciting and scary,” she says.

Clune also stresses the importance of thinking about the ethics of the new technology from the start. There is a good chance that AI-designed neural networks and algorithms will be even harder to understand than today’s already opaque black-box systems. Are AIs generated by algorithms harder to audit for bias? Is it harder to guarantee that they will not behave in undesirable ways?

Clune hopes such questions will be asked and answered as more people realize the potential of self-generating AIs. “Most people in the machine-learning community don’t ever really talk about our overall path to extremely powerful AI,” he says—instead, they tend to focus on small, incremental improvements. Clune wants to start a conversation about the field’s biggest ambitions again.

His own ambitions tie back into his early interests in human intelligence and how it evolved. His grand vision is to set things up so that machines might one day see their own intelligence—or intelligences—emerge and improve through countless generations of trial and error, guided by algorithms with no ultimate blueprint in mind.

If AI starts to generate intelligence by itself, there’s no guarantee that it will be human-like. Rather than humans teaching machines to think like humans, machines might teach humans new ways of thinking.

“There’s probably a vast number of different ways to be very intelligent,” says Clune. “One of the things that excite me about AI is that we might come to understand intelligence more generally, by seeing what variation is possible.

“I think that’s fascinating. I mean, it’s almost like inventing interstellar travel and being able to go visit alien cultures. There would be no greater moment in the history of humankind than encountering an alien race and learning about its culture, its science, everything. Interstellar travel is exceedingly difficult, but we have the ability to potentially create alien intelligences digitally.”
