Reinforcement learning (RL), the category of machine learning that relies on penalties and rewards, can be a powerful technique for teaching machines to adapt to new environments.
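The article gives no code, but the "penalties and rewards" loop it describes can be illustrated with a generic, minimal sketch of tabular Q-learning (a standard RL algorithm, not the one used in the systems discussed here): after each action the agent nudges its value estimate toward the reward it received plus its estimate of future reward.

```python
import numpy as np

def q_update(Q, s, a, reward, s_next, alpha=0.1, gamma=0.9):
    """One Q-learning step: move Q[s, a] toward reward + discounted
    best future value. Rewards reinforce actions; penalties (negative
    rewards) discourage them."""
    target = reward + gamma * Q[s_next].max()
    Q[s, a] += alpha * (target - Q[s, a])
    return Q

# Toy usage: 3 states, 2 actions, all values start at zero.
Q = np.zeros((3, 2))
q_update(Q, 0, 1, 1.0, 2)  # action 1 in state 0 earned reward 1.0
```

Repeating this update over many blundering trials is what slowly refines the agent's behavior, which is why RL typically needs so much trial and error.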
DeepMind’s AlphaGo used it to defeat the world’s best Go player despite never having played him before. It has also shown promise in the creation of robots that can perform under changing conditions.
But the technique has its limitations. It requires a machine to blunder around as it slowly refines its actions over time. That’s fine in the lab, or when playing a board game. It’s less than ideal for applications, like self-driving cars, where a blunder could be fatal.
In response, researchers have developed different ways to circumvent the need for real-world training. A car can use traffic data to learn to drive in a safe digital replica of the physical world, for example, to get past its blundering stage without putting anyone in harm’s way.
But this isn’t a perfect solution. A machine might still make costly errors when it encounters situations beyond the scope of its training data. In one instance, researchers at New York University discovered a car had learned to make 90-degree turns into oncoming traffic (thankfully, within a simulation) because its training data set didn’t encompass those kinds of scenarios. Needless to say, this isn’t viable for safely training a self-driving car or, say, a robotic surgeon.
The same team at NYU and Yann LeCun, the director of AI research at Facebook, are now proposing a new method that could overcome this problem. In addition to penalizing and rewarding the car for its driving behavior, they also penalize it for straying into scenarios where it doesn’t have enough training data.
In essence, this forces the car to proceed more cautiously, explains Mikael Henaff, one of the authors of the study, rather than make wild turns and other maneuvers that place it squarely in unknown territory.
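One common way to make that idea concrete (a hedged sketch, not the authors' actual implementation, which uses learned world models) is to measure uncertainty as disagreement among an ensemble of predictive models: where training data is plentiful, the models agree; where it is scarce, they diverge, and that divergence is added to the cost being minimized. The `models`, `lam`, and cost names below are illustrative assumptions.

```python
import numpy as np

def uncertainty_penalty(models, state, action):
    # Each model predicts the next state. High variance across the
    # ensemble signals a region with little training data.
    preds = np.stack([m(state, action) for m in models])
    return preds.var(axis=0).sum()

def total_cost(task_cost, models, state, action, lam=1.0):
    # Augment the ordinary driving cost with the uncertainty penalty,
    # steering the policy away from states its models can't predict.
    return task_cost + lam * uncertainty_penalty(models, state, action)
```

A maneuver that is cheap under the task cost but lands the car in unfamiliar territory now carries a high total cost, so the learned policy proceeds more cautiously.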
When they tested their new approach, they found it was better than previous methods at getting the car to safely navigate dense traffic. It still fell short of human performance, though, so more work remains to be done.
This story originally appeared in our AI newsletter, The Algorithm.