Skip to Content
Artificial intelligence

The technique that taught AI to play Go still can’t teach a car to drive

January 15, 2019

Reinforcement learning (RL), the category of machine learning that relies on penalties and rewards, can be a powerful technique for teaching machines to adapt to new environments.

Deepmind’s AlphaGo used it to defeat the world’s best Go player despite never having played him before. It has also shown promise in the creation of robots that can perform under changing conditions.

But the technique has its limitations. It requires a machine to blunder around as it slowly refines its actions over time. That’s fine in the lab, or when playing a board game. It’s less than ideal for applications, like self-driving cars, where a blunder could be fatal.

In response, researchers have developed different ways to circumvent the need for real-world training. A car can use traffic data to learn to drive in a safe digital replica of the physical world, for example, to get past its blundering stage without putting anyone in harm’s way.

But this isn’t a perfect solution. A machine might still make costly errors when it encounters situations beyond the scope of its training data. In one instance, researchers at New York University discovered a car had learned to make 90-degree turns into oncoming traffic (thankfully, within a simulation) because its training data set didn’t encompass those kinds of scenarios. Needless to say, this isn’t viable for safely training a self-driving car or, say, a robotic surgeon.

The same team at NYU andthe director of AI research at Facebook, Yann Lecun, are now proposing a new method that could overcome this problem. In addition to penalizing and rewarding a car for driving behavior, they also penalized it for straying into scenarios where it doesn’t have enough training data.

In essence, this forces the car to proceed more cautiously, explains Mikael Henaff, one of the authors of the study, rather than make wild turns and other maneuvers that place it squarely in unknown territory.

When they tested their new approach, they found that it was better than previous methods at getting the car to safely navigate dense traffic. It still wasn’t as good as human performance, though, so more work still needs to be done.

This story originally appeared in our AI newsletter The Algorithm. To read stories like this first, get The Algorithm delivered directly to your inbox. Subscribe here for free.

Deep Dive

Artificial intelligence

Geoffrey Hinton tells us why he’s now scared of the tech he helped build

“I have suddenly switched my views on whether these things are going to be more intelligent than us.”

ChatGPT is going to change education, not destroy it

The narrative around cheating students doesn’t tell the whole story. Meet the teachers who think generative AI could actually make learning better.

Deep learning pioneer Geoffrey Hinton has quit Google

Hinton will be speaking at EmTech Digital on Wednesday.

The future of generative AI is niche, not generalized

ChatGPT has sparked speculation about artificial general intelligence. But the next real phase of AI will be in specific domains and contexts.

Stay connected

Illustration by Rose Wong

Get the latest updates from
MIT Technology Review

Discover special offers, top stories, upcoming events, and more.

Thank you for submitting your email!

Explore more newsletters

It looks like something went wrong.

We’re having trouble saving your preferences. Try refreshing this page and updating them one more time. If you continue to get this message, reach out to us at with a list of newsletters you’d like to receive.