Self-driving cars are being trained in virtual worlds while the real one is in chaos

With fleets out of commission, covid-19 left self-driving car and truck companies frozen in time. Now they’re finding new life in old data.

Hayden Field

May 22, 2020

One of Cruise's autonomous vehicles on the move.

Brandon Moak felt as if a freight train had hit him.

It was mid-March, and the cofounder and CTO of the autonomous-trucking startup Embark Trucks had been keeping tabs on the emergence of covid-19. As a shelter-in-place order went into effect throughout the San Francisco Bay Area, where Embark is based, Moak and his team were forced to ground almost all of their 13 self-driving semi-trucks (a few stayed on the road moving essential freight but weren’t in autonomous mode) and send home the majority of their workforce, with no idea how long it’d be before they could return.

Moak and Embark weren’t alone. For safety reasons, autonomous vehicles typically have two operators apiece. That’s a no-go in the age of social distancing, and leaders of autonomous-vehicle companies knew they’d have to mothball their fleets. Suddenly the whole nascent industry was in trouble. Autonomous vehicles are still experimental, and real-world testing is the gold standard for collecting data and improving the cars’ ability to operate safely. Unable to get on the road, self-driving operations risked becoming cash-intensive gambits with no path toward fielding a product anytime soon.

As they struggled with this new reality, layoffs rippled through autonomous-driving outfits like Zoox, Ike, and Kodiak Robotics, as well as the autonomous division at Lyft.

But as it turns out, all may not be lost. Several companies have traded road tests for delving deep into their algorithms and simulators, finding new uses for the countless hours of data they’ve collected. They’re doubling down on efforts like detailed data labeling, 3D mapping, and identifying overlooked scenarios from previous road sessions that can be used to train their systems. Some have even helped vehicle operators transition into data labeling, equipping them with new skills that will likely come in handy when they resume their former roles.

To make the best of a bad situation, Moak decided to build a new tool to allow Embark’s operations team to annotate the company’s four years of driving data. For instance, the software serves Embark’s truck drivers with images of different on-road scenarios and then asks them to determine if they’re noteworthy—and how they’d handle each based on their own experience.

Aurora Innovation, a Palo Alto–based company that develops self-driving technology, took a similar approach to finding tasks for underutilized workers. “Our vehicle operators, who can’t all be on the road right now, have joined forces with our triage and labeling teams to mine our massive collection of manual and autonomous driving data for additional interesting on-road events that can be turned into virtual tests,” cofounder and CEO Chris Urmson wrote in an email to MIT Technology Review. “This has the additional benefit of increasing the exposure of our operators to how the data they gather is used offline, [which] gives them better context into our overall development process and will help them be even better at their job as we get back on the road,” he added.

Companies have also found creative ways to overcome the obstacle of being physically separated from their products.

Urmson, who previously led Google’s self-driving-car project, said that his team is using its “hardware-in-the-loop” pipeline to “catch software issues that would manifest on Aurora hardware and not on developer laptops or cloud instances.” The pipeline can flag, for example, a case where a vehicle’s sensors would be slower to make observations about its environment than simulated tests on a developer’s laptop suggest.

Embark, for its part, invested in software that could test hardware components offline. One test involves the vehicle’s control system—the algorithms responsible for sending physical commands, like how fast to turn the steering wheel. “In the long run, this will be a good investment for us, but in the short term, we had to make a big leap to build all this new infrastructure,” said Moak.

General Motors–owned Cruise has relegated 200 vehicles in San Francisco and Phoenix largely to the garage, though it is using some to make food deliveries for local relief organizations. The company is relying on its advanced simulators to keep putting cars’ software through its paces, a regular practice even before the pandemic. But SVP of engineering Mo Elshenawy says the company is now improving the detail with which cars are scored during their encounters in the sims, as a way to better assess competency in unusual situations such as dealing with ambulances or delivery trucks.

Alexandr Wang, founder and CEO of data annotation firm Scale AI, works with companies like Lyft, Toyota, and Nuro, as well as Embark and Aurora. During the pandemic, Scale has been working on detailed labeling for companies’ old data via point cloud segmentation—using 3D maps of the environment around a vehicle to encode what every point corresponds to (pedestrian, stop sign, window, shrub, stroller). The team is also encoding the behavior of drivers, pedestrians, and cyclists with technology including “gaze detection,” which aims to indicate whether a driver might yield or a pedestrian plans to cross the street.

No matter how much companies invest in their simulators, though, there’s no getting around the need to eventually get back on the road. And as the US reopens, that’s beginning to happen. A Waymo spokesperson wrote in an email that a day of simulated driving is akin to “driving more than 100 years in the real world,” in part thanks to parent company Alphabet’s computing power. Nevertheless, the company got its driving operations in Phoenix up and going again as of May 11.

Still, Wang says he sees a change in how autonomous-vehicle companies are working, shifting toward more innovative approaches and long-term experimentation.

“The ones who are taking this view,” he says, “are the ones who will, at the end of this, come out ahead and be in a better spot.”

Correction: This article was changed to correctly attribute additional quotes to Urmson. An example of the use of the "hardware-in-the-loop" pipeline was also added. A reference to Cruise relegating its vehicles to the garage was changed to reflect the fact that some are in fact being used, and "point-cloud simulation" was changed to "point-cloud segmentation" in the discussion of Scale AI.
