Waymo is going to share its self-driving data—but it’s still not enough

Waymo, the self-driving spinoff of Alphabet, is the latest firm to offer up some of the information gleaned from its vehicles to the wider research community.

The news: Waymo says it will share some of the data it’s gathered from its vehicles for free so other researchers working on autonomous driving can use it. Waymo isn’t the first to do this: Lyft, Argo AI, and other firms have already open-sourced some data sets. But Waymo’s move is notable because its vehicles have already covered millions of miles on roads.

Why this matters: Unlike human drivers, autonomous vehicles don’t have an instinctive understanding of the world. Instead, they rely on training data to learn about conditions they are likely to encounter and how to react to them. The more high-quality data AI models have to train on, the better.

Waymo’s data set: It contains 1,000 segments, each capturing 20 seconds of continuous driving. The data comes from four locations: San Francisco and Mountain View in California; Phoenix in Arizona (where Waymo has launched a small-scale robotaxi service); and Kirkland in Washington. It also comes from multiple sources, including cameras and radar as well as lidar, which bounces lasers off nearby objects to create 3D maps of its surroundings. Helpfully, the company has labelled things like pedestrians, bikes, and signals in the data set, which means other researchers won’t have to do this grunt work.

Data hoarders: While Waymo deserves some credit for its move, it’s sharing just a tiny sliver of the information it has gathered. Other companies are also hoarding data for competitive reasons, and they are especially reluctant to share information related to accidents and near-misses. But if the industry wants to overcome concerns about autonomous vehicles’ safety, the businesses in it will have to become far more transparent about what they’ve learned.
