Reinforcement Learning

By experimenting, computers are figuring out how to do things that no programmer could teach them.

Availability: 1 to 2 years

by Will Knight

    Inside a simple computer simulation, a group of self-driving cars are performing a crazy-looking maneuver on a four-lane virtual highway. Half are trying to move from the right-hand lanes just as the other half try to merge from the left. It seems like just the sort of tricky thing that might flummox a robot vehicle, but they manage it with precision.

    I’m watching the driving simulation at the biggest artificial-intelligence conference of the year, held in Barcelona this past December. What’s most amazing is that the software governing the cars’ behavior wasn’t programmed in the conventional sense at all. It learned how to merge, slickly and safely, simply by practicing. During training, the control software performed the maneuver over and over, altering its instructions a little with each attempt. Most of the time the merging happened way too slowly and cars interfered with each other. But whenever the merge went smoothly, the system would learn to favor the behavior that led up to it.

    This approach, known as reinforcement learning, is largely how AlphaGo, a computer program developed by DeepMind, a subsidiary of Alphabet, mastered the impossibly complex board game Go and beat one of the best human players in the world in a high-profile match last year. Now reinforcement learning may soon inject greater intelligence into much more than games. In addition to improving self-driving cars, the technology can get a robot to grasp objects it has never seen before, and it can figure out the optimal configuration for the equipment in a data center.

    This story is part of our March/April 2017 Issue

    Reinforcement learning copies a very simple principle from nature. The psychologist Edward Thorndike documented it more than 100 years ago. Thorndike placed cats inside boxes from which they could escape only by pressing a lever. After a considerable amount of pacing around and meowing, the animals would eventually step on the lever by chance. After they learned to associate this behavior with the desired outcome, they eventually escaped with increasing speed.

    Some of the very earliest artificial-intelligence researchers believed that this process might be usefully reproduced in machines. In 1951, Marvin Minsky, a student at Harvard who would become one of the founding fathers of AI as a professor at MIT, built a machine that used a simple form of reinforcement learning to mimic a rat learning to navigate a maze. Minsky’s Stochastic Neural Analog Reinforcement Calculator, or SNARC, consisted of dozens of tubes, motors, and clutches that simulated the behavior of 40 neurons and synapses. As a simulated rat made its way out of a virtual maze, the strength of some synaptic connections would increase, thereby reinforcing the underlying behavior.

    There were few successes over the next few decades. In 1992, Gerald Tesauro, a researcher at IBM, demonstrated a program that used the technique to play backgammon. It became skilled enough to rival the best human players, a landmark achievement in AI. But reinforcement learning proved difficult to scale to more complex problems. “People thought it was a cool idea that didn’t really work,” says David Silver, a researcher at DeepMind in the U.K. and a leading proponent of reinforcement learning today.

    That view changed dramatically in March 2016, however. That’s when AlphaGo, a program trained using reinforcement learning, destroyed one of the best Go players of all time, South Korea’s Lee Sedol. The feat was astonishing, because it is virtually impossible to build a good Go-playing program with conventional programming. Not only is the game extremely complex, but even accomplished Go players may struggle to say why certain moves are good or bad, so the principles of the game are difficult to write into code. Most AI researchers had expected that it would take a decade for a computer to play the game as well as an expert human.

    Reinforcement Learning
    • Breakthrough: An approach to artificial intelligence that gets computers to learn like people, without explicit instruction.
    • Why It Matters: Progress in self-driving cars and other forms of automation will slow dramatically unless machines can hone skills through experience.
    • Key Players: DeepMind, Mobileye, OpenAI, Google, Uber
    • Availability: 1 to 2 years

    Jostling for position

    Silver, a mild-mannered Brit who became fascinated with artificial intelligence as an undergraduate at the University of Cambridge, explains why reinforcement learning has recently become so formidable. He says that the key is combining it with deep learning, a technique that involves using a very large simulated neural network to recognize patterns in data (see “10 Breakthrough Technologies 2013: Deep Learning”).

    Reinforcement learning works because researchers figured out how to get a computer to calculate the value that should be assigned to, say, each right or wrong turn that a rat might make on its way out of its maze. Each value is stored in a large table, and the computer updates all these values as it learns. For large and complicated tasks, this becomes computationally impractical. In recent years, however, deep learning has proved an extremely efficient way to recognize patterns in data, whether the data refers to the turns in a maze, the positions on a Go board, or the pixels shown on screen during a computer game.
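
The value table described above can be illustrated with a minimal sketch of tabular Q-learning, one common textbook algorithm of this kind. Everything specific here is an assumption made for illustration and is not taken from DeepMind's or Minsky's work: the "maze" is a hypothetical five-cell corridor with the exit at the right end, the reward is 1 for reaching the exit and 0 otherwise, and the learning rate, discount, and exploration rate are arbitrary choices.

```python
import random

# Hypothetical toy "maze": a corridor of 5 cells with the exit at the right end.
N_STATES = 5
GOAL = N_STATES - 1
ACTIONS = (-1, +1)  # index 0 = step left, index 1 = step right


def step(state, action):
    """Move along the corridor; the only reward is 1.0 for reaching the exit."""
    next_state = min(max(state + action, 0), GOAL)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn a table q[state][action_index] by trial and error (Q-learning)."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                a = rng.randrange(2)  # explore: try a random turn
            else:
                # exploit: pick the best-valued turn, breaking ties at random
                best = max(q[state])
                a = rng.choice([i for i, v in enumerate(q[state]) if v == best])
            next_state, reward, done = step(state, ACTIONS[a])
            # Nudge the stored value toward reward plus discounted future value.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


def greedy_policy(q):
    """Return the highest-valued move for each non-exit cell."""
    return [ACTIONS[max(range(2), key=lambda i: q[s][i])] for s in range(GOAL)]


if __name__ == "__main__":
    table = train()
    print(greedy_policy(table))
```

With only five cells the table has ten entries. A Go board or a screen of pixels has far too many states to list this way, which is the gap deep learning is used to fill by estimating the values instead of storing each one.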

    In fact, it was in games that DeepMind made its name. In 2013 it published details of a program capable of learning to play various Atari video games at a superhuman level, leading Google to acquire the company for more than $500 million in 2014. These and other feats have in turn inspired other researchers and companies to turn to reinforcement learning. A number of industrial-robot makers are testing the approach as a way to train their machines to perform new tasks without manual programming. And researchers at Google, also an Alphabet subsidiary, worked with DeepMind to use deep reinforcement learning to make its data centers more energy efficient. It is difficult to figure out how all the elements in a data center will affect energy usage, but a reinforcement-learning algorithm can learn from collated data and experiment in simulation to suggest, say, how and when to operate the cooling systems.

    These images are from the Mobileye vision system for cars, which will benefit from reinforcement learning.

    But the setting where you will probably most notice this software’s remarkably humanlike behavior is in self-driving cars. Today’s driverless vehicles often falter in complex situations that involve interacting with human drivers, such as traffic circles or four-way stops. If we don’t want them to take unnecessary risks, or to clog the roads by being overly hesitant, they will need to acquire more nuanced driving skills, like jostling for position in a crowd of other cars.

    The highway merging software was demoed in Barcelona by Mobileye, an Israeli automotive company that makes vehicle safety systems used by dozens of carmakers, including Tesla Motors (see “50 Smartest Companies 2016”). After screening the merging clip, Shai Shalev-Shwartz, Mobileye’s vice president for technology, shows some of the challenges self-driving cars will face: a bustling roundabout in Jerusalem; a frenetic intersection in Paris; and a hellishly chaotic scene from a road in India. “If a self-driving car follows the law precisely, then during rush hour I might wait in a merge situation for an hour,” Shalev-Shwartz says.

    Mobileye plans to test the software on a fleet of vehicles in collaboration with BMW and Intel later this year. Both Google and Uber say they are also testing reinforcement learning for their self-driving vehicles.

    Reinforcement learning led to AlphaGo’s stunning victory over a human Go champion last year.

    Reinforcement learning is being applied in a growing number of areas, says Emma Brunskill, an assistant professor at Stanford University who specializes in the approach. But she says it is well suited to automated driving because it enables “good sequences of decisions.” Progress would proceed much more slowly if programmers had to encode all such decisions into cars in advance.

    But there are challenges to overcome, too. Andrew Ng, chief scientist at the Chinese company Baidu, warns that the approach requires a huge amount of data, and that many of its successes have come when a computer could practice relentlessly in simulations. Indeed, researchers are still figuring out just how to make reinforcement learning work in complex situations in which there is more than one objective. Mobileye has had to tweak its protocols so a self-driving car that is adept at avoiding accidents won’t be more likely to cause one for someone else.

    When you watch the outlandish merging demo, it looks as though the company has succeeded, at least so far. But later this year, perhaps on a highway near you, reinforcement learning will get its most dramatic and important tests to date. 
