In 2013 a British artificial-intelligence startup called DeepMind surprised computer scientists by showing off software that could learn to play classic Atari games better than an expert human player. DeepMind was soon acquired by Google, and the technique that beat the Atari games, reinforcement learning, has become a hot topic in the field of AI and robotics. Google used reinforcement learning to create software that beat a champion Go player last year.
Now OpenAI, a nonprofit research institute cofounded and funded by Elon Musk, says it has discovered that an easier-to-use alternative to reinforcement learning can achieve comparable results when it plays games and performs other tasks. At MIT Technology Review’s EmTech Digital conference in San Francisco on Monday, OpenAI’s research director, Ilya Sutskever, said the finding could allow researchers to make faster progress in machine learning.
“It’s competitive with today’s reinforcement-learning algorithms on standard benchmarks,” said Sutskever. “It is surprising that something so simple actually works.”
Sutskever argues that finding new ways to have software learn to do things like play computer games or steer robots is important to making machine-learning software take on more complex tasks than just recognizing images or transcribing our speech. “If we have computer systems learn to take complicated actions in the world, then I think we would be comfortable calling them intelligent,” he said.
Sutskever and colleagues tested their approach, called evolution strategies, by building software that learned to play more than 50 Atari games, including Pong and Centipede. Because it is easier to scale up the new method across multiple processors, in one hour they could train artificial players comparable to those that took a day to produce using a reinforcement-learning system published by Google DeepMind last year. The software showed the same ability to learn things like the need to surface for air in the game Seaquest (middle frame in the animation).
Evolution strategies showed a similar advantage when used to take on a standard test from robotics in which software has to figure out how to make a humanoid walk in a simulated environment. It took 10 minutes to achieve results that a state-of-the-art reinforcement-learning system would need about 10 hours to attain, the researchers say.
The technique is a reboot of a decades-old idea about how to get learning software to try out different actions and identify the most effective ones. It is loosely inspired by how natural selection causes biological organisms to adapt to their environments.
“An algorithm everybody has known about for a long time works better than most people thought,” said Sutskever.
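To make the idea concrete, here is a minimal sketch of an evolution-strategies loop in Python. It is an illustration under stated assumptions, not OpenAI's implementation. The function names, the settings, and the toy fitness function (which stands in for a game score) are all hypothetical. Each round, the program tries many randomly perturbed copies of its current parameters, scores each one, and nudges the parameters toward the perturbations that scored best.

```python
import numpy as np

def evolution_strategy(fitness, theta, iterations=200, population=50,
                       sigma=0.1, lr=0.1, seed=0):
    """Toy evolution-strategies loop (illustrative; all settings are assumptions)."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta, dtype=float).copy()
    for _ in range(iterations):
        # Try out a population of random perturbations of the current parameters.
        noise = rng.standard_normal((population, theta.size))
        rewards = np.array([fitness(theta + sigma * n) for n in noise])
        # Compare each candidate with the population average, so that
        # better-than-average perturbations pull the parameters toward them.
        advantages = rewards - rewards.mean()
        theta += lr / (population * sigma) * (noise.T @ advantages)
    return theta

# Hypothetical stand-in for a game score: higher is better, best at TARGET.
TARGET = np.array([0.5, 0.1, -0.3])

def toy_fitness(w):
    return -np.sum((w - TARGET) ** 2)

start = np.zeros(3)
solution = evolution_strategy(toy_fitness, start)
print("learned parameters:", solution)
```

Each candidate in the population is scored independently of the others, so in a sketch like this the scoring step could be spread across many processors, which is consistent with the scaling advantage the researchers describe.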
He declined to suggest specific applications of AI that might get a boost from the evolution strategies technique, saying more research is needed on its strengths and limitations. But Sutskever said that comparing the method with reinforcement learning suggested it would be better at learning to perform more complex tasks that require more steps to get a result.
For that reason, Sutskever said, he believes evolution strategies will help OpenAI’s goal of creating what he calls artificial general intelligence—software that can adapt to many kinds of complex scenarios.
Most researchers in machine learning don’t talk much about general intelligence, instead pursuing progress on specific, often narrowly focused problems. OpenAI’s mission statement includes a commitment to creating artificial general intelligence. Sutskever said the pace of progress in machine learning means that goal is worth thinking about now.
“[It] seems far off right now but [was] way more far off five years ago,” he said. “The number of people and the amount of effort going into developing these algorithms is extremely high—things are moving forward at a very healthy pace.”