Reinforcement learning (RL), the category of machine learning that relies on penalties and rewards, can be a powerful technique for teaching machines to adapt to new environments.
Deepmind’s AlphaGo used it to defeat the world’s best Go player despite never having played him before. It has also shown promise in the creation of robots that can perform under changing conditions.
But the technique has its limitations. It requires a machine to blunder around as it slowly refines its actions over time. That’s fine in the lab, or when playing a board game. It’s less than ideal for applications, like self-driving cars, where a blunder could be fatal.
In response, researchers have developed different ways to circumvent the need for real-world training. A car can use traffic data to learn to drive in a safe digital replica of the physical world, for example, to get past its blundering stage without putting anyone in harm’s way.
But this isn’t a perfect solution. A machine might still make costly errors when it encounters situations beyond the scope of its training data. In one instance, researchers at New York University discovered a car had learned to make 90-degree turns into oncoming traffic (thankfully, within a simulation) because its training data set didn’t encompass those kinds of scenarios. Needless to say, this isn’t viable for safely training a self-driving car or, say, a robotic surgeon.
The same team at NYU andthe director of AI research at Facebook, Yann Lecun, are now proposing a new method that could overcome this problem. In addition to penalizing and rewarding a car for driving behavior, they also penalized it for straying into scenarios where it doesn’t have enough training data.
In essence, this forces the car to proceed more cautiously, explains Mikael Henaff, one of the authors of the study, rather than make wild turns and other maneuvers that place it squarely in unknown territory.
When they tested their new approach, they found that it was better than previous methods at getting the car to safely navigate dense traffic. It still wasn’t as good as human performance, though, so more work still needs to be done.
This story originally appeared in our AI newsletter The Algorithm. To read stories like this first, get The Algorithm delivered directly to your inbox. Subscribe here for free.