The technique that taught AI to play Go still can’t teach a car to drive

January 15, 2019

Reinforcement learning (RL), the category of machine learning that relies on penalties and rewards, can be a powerful technique for teaching machines to adapt to new environments.

DeepMind’s AlphaGo used it to defeat the world’s best Go player despite never having played him before. The technique has also shown promise in the creation of robots that can perform under changing conditions.

But the technique has its limitations. It requires a machine to blunder around as it slowly refines its actions over time. That’s fine in the lab, or when playing a board game. It’s less than ideal for applications, like self-driving cars, where a blunder could be fatal.

In response, researchers have developed different ways to circumvent the need for real-world training. A car can use traffic data to learn to drive in a safe digital replica of the physical world, for example, to get past its blundering stage without putting anyone in harm’s way.

But this isn’t a perfect solution. A machine might still make costly errors when it encounters situations beyond the scope of its training data. In one instance, researchers at New York University discovered a car had learned to make 90-degree turns into oncoming traffic (thankfully, within a simulation) because its training data set didn’t encompass those kinds of scenarios. Needless to say, this isn’t viable for safely training a self-driving car or, say, a robotic surgeon.

The same team at NYU and the director of AI research at Facebook, Yann LeCun, are now proposing a new method that could overcome this problem. In addition to penalizing and rewarding a car for its driving behavior, they also penalize it for straying into scenarios where it doesn’t have enough training data.

In essence, this forces the car to proceed more cautiously, explains Mikael Henaff, one of the authors of the study, rather than make wild turns and other maneuvers that place it squarely in unknown territory.
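As a rough illustration of that idea, the Python sketch below adds an uncertainty penalty to an ordinary driving cost. It assumes uncertainty is measured as the disagreement among several learned models that each predict what happens next, which is one common way to estimate it and not necessarily the researchers’ exact method. The function names, the `penalty_weight` parameter, and all of the numbers are hypothetical.

```python
import statistics


def ensemble_disagreement(predictions):
    """Average per-dimension variance across the models' predicted next states.

    `predictions` holds one predicted state (a list of floats) per model.
    High variance means the models disagree, which we treat (as an assumption)
    as a sign that the situation is poorly covered by the training data.
    """
    per_dimension = zip(*predictions)
    return statistics.fmean([statistics.pvariance(dim) for dim in per_dimension])


def total_cost(task_cost, predictions, penalty_weight=1.0):
    """Ordinary driving cost plus a penalty for entering unfamiliar territory."""
    return task_cost + penalty_weight * ensemble_disagreement(predictions)


# Hypothetical cautious maneuver: the models agree on what happens next.
cautious_total = total_cost(
    task_cost=1.0,
    predictions=[[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]],
    penalty_weight=2.0,
)

# Hypothetical wild maneuver: slightly cheaper on the task, but the models disagree.
wild_total = total_cost(
    task_cost=0.8,
    predictions=[[0.0, 2.0], [2.0, 2.0]],
    penalty_weight=2.0,
)
```

With these made-up numbers, the wild maneuver ends up with the higher total cost even though its driving cost alone is lower, so a planner that minimizes the total would favor the cautious option.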

When the researchers tested their new approach, they found that it was better than previous methods at getting the car to safely navigate dense traffic. It wasn’t as good as human performance, though, so more work needs to be done.

This story originally appeared in our AI newsletter The Algorithm.
