Skip to Content
Artificial intelligence

The technique that taught AI to play Go still can’t teach a car to drive

January 15, 2019

Reinforcement learning (RL), the category of machine learning that relies on penalties and rewards, can be a powerful technique for teaching machines to adapt to new environments.

Deepmind’s AlphaGo used it to defeat the world’s best Go player despite never having played him before. It has also shown promise in the creation of robots that can perform under changing conditions.

But the technique has its limitations. It requires a machine to blunder around as it slowly refines its actions over time. That’s fine in the lab, or when playing a board game. It’s less than ideal for applications, like self-driving cars, where a blunder could be fatal.

In response, researchers have developed different ways to circumvent the need for real-world training. A car can use traffic data to learn to drive in a safe digital replica of the physical world, for example, to get past its blundering stage without putting anyone in harm’s way.

But this isn’t a perfect solution. A machine might still make costly errors when it encounters situations beyond the scope of its training data. In one instance, researchers at New York University discovered a car had learned to make 90-degree turns into oncoming traffic (thankfully, within a simulation) because its training data set didn’t encompass those kinds of scenarios. Needless to say, this isn’t viable for safely training a self-driving car or, say, a robotic surgeon.

The same team at NYU andthe director of AI research at Facebook, Yann Lecun, are now proposing a new method that could overcome this problem. In addition to penalizing and rewarding a car for driving behavior, they also penalized it for straying into scenarios where it doesn’t have enough training data.

In essence, this forces the car to proceed more cautiously, explains Mikael Henaff, one of the authors of the study, rather than make wild turns and other maneuvers that place it squarely in unknown territory.

When they tested their new approach, they found that it was better than previous methods at getting the car to safely navigate dense traffic. It still wasn’t as good as human performance, though, so more work still needs to be done.

This story originally appeared in our AI newsletter The Algorithm. To read stories like this first, get The Algorithm delivered directly to your inbox. Subscribe here for free.

Deep Dive

Artificial intelligence

What does GPT-3 “know” about me? 

Large language models are trained on troves of personal data hoovered from the internet. So I wanted to know: What does it have on me?

DeepMind’s game-playing AI has beaten a 50-year-old record in computer science

The new version of AlphaZero discovered a faster way to do matrix multiplication, a core problem in computing that affects thousands of everyday computer tasks.

Stay connected

Illustration by Rose Wong

Get the latest updates from
MIT Technology Review

Discover special offers, top stories, upcoming events, and more.

Thank you for submitting your email!

Explore more newsletters

It looks like something went wrong.

We’re having trouble saving your preferences. Try refreshing this page and updating them one more time. If you continue to get this message, reach out to us at customer-service@technologyreview.com with a list of newsletters you’d like to receive.