Sony’s racing AI destroyed its human competitors by being nice (and fast)
What Gran Turismo Sophy learned on the racetrack could help shape the future of machines that can work alongside humans, or join us on the roads.
“Wait, what? How?” Emily Jones wasn’t used to being left behind. A top sim-racing driver with multiple wins to her name, Jones jerked the steering wheel in the e-sports rig, eyes fixed on the screen in front of her: “I’m pushing way too hard to keep up— How does it do that?” Her staccato commentary intercut with squealing tires, Jones flung her virtual car around the virtual track at 120 miles per hour—then 140, 150—chasing the fastest Gran Turismo driver in the world.
Built by Sony AI, a research lab launched by the company in 2020, Gran Turismo Sophy is a computer program trained to control racing cars inside the world of Gran Turismo, a video game known for its super-realistic simulations of real vehicles and tracks. In a series of events held behind closed doors last year, Sony put its program up against the best humans on the professional sim-racing circuit.
What they discovered during those racetrack battles—and the ones that followed—could help shape the future of machines that work alongside humans, or join us on the roads.
Back in July 2021, Jones, who is based in Melbourne, Australia, and races for the e-sports team Trans Tasman Racing, didn’t know what to expect. “I wasn’t told much about it,” she says now, a year later. “‘Don’t do any practice,’ they said. ‘Don’t look at its lap times.’ I was like, it’s obviously going to be good if they’re keeping it secret from me.” In the end, GT Sophy beat Jones’s best lap by 1.5 seconds. At a level where records are smashed in millisecond increments, 1.5 seconds is an age.
But Sony soon learned that speed alone wasn’t enough to make GT Sophy a winner. The program outpaced all human drivers on an empty track, setting superhuman lap times on three different virtual courses. Yet when Sony tested GT Sophy in a race against multiple human drivers, where intelligence as well as speed is needed, GT Sophy lost. The program was at times too aggressive, racking up penalties for reckless driving, and at other times too timid, giving way when it didn’t need to.
Sony regrouped, retrained its AI, and set up a rematch in October. This time GT Sophy won with ease. What made the difference? It’s true that Sony came back with a larger neural network, giving its program more capabilities to draw from on the fly. But ultimately, the difference came down to giving GT Sophy something that Peter Wurman, head of Sony AI America, calls “etiquette”: the ability to balance its aggression and timidity, picking the most appropriate behavior for the situation at hand.
This is also what makes GT Sophy relevant beyond Gran Turismo. Etiquette between drivers on a track is a specific example of the kind of dynamic, context-aware behavior that robots will be expected to have when they interact with people, says Wurman.
An awareness of when to take risks and when to play it safe would be useful for AI that is better at interacting with people, whether it be on the manufacturing floor, in home robots, or in driverless cars.
“I don’t think we’ve learned general principles yet about how to deal with human norms that you have to respect,” says Wurman. “But it’s a start and hopefully gives us some insight into this problem in general.”
GT Sophy is just the latest in a line of AI systems that have beaten the world’s best human players at various games, from chess and Go to video games like StarCraft and Dota. But Gran Turismo offered Sony a new kind of challenge. Unlike other games, especially those that are turn-based, Gran Turismo calls on its best players to control a vehicle at the limits of what’s physically possible, in real time, and in close proximity with other players all trying to do the same.
Cars hurtle around corners at more than 100 miles per hour with only inches between them. At those speeds, the smallest errors can lead to a crash. Gran Turismo captures real-world physics in extreme detail, simulating the aerodynamics of a car and the friction of its tires on the track. The game is sometimes used to train and recruit drivers for real-world racing.
“It does an excellent job with the realism,” says Davide Scaramuzza, who leads the robotics and perception group at the University of Zurich in Switzerland. Scaramuzza was not involved with GT Sophy, but his team has used Gran Turismo to train a previous AI driver—though not one that was ever tested against humans.
GT Sophy doesn’t get the same view of the game that human players do. Instead of reading pixels off a screen, the program takes in updates about the position of its car on the track and the positions of the cars around it. It also gets sent information about the virtual physical forces affecting its vehicle. In response, GT Sophy tells the car to turn or brake. This back-and-forth between GT Sophy and the game happens 10 times a second, which Wurman and his colleagues claim matches the reaction time of human players.
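That interaction pattern is the standard agent–environment loop from reinforcement learning: observe, act, repeat on a fixed clock. Here is a minimal sketch of the 10-times-per-second loop the article describes, with state features rather than screen pixels as input. All the function and field names below are illustrative assumptions, not Sony's actual API.

```python
# Illustrative sketch of GT Sophy's control loop as described in the
# article: the agent receives game-state features (car positions,
# physical forces), not screen pixels, and replies with driving
# commands roughly 10 times a second. Names are hypothetical.

TICK_SECONDS = 0.1  # one decision every tenth of a second

def get_observation():
    """Stand-in for the game reporting track position, nearby cars, and forces."""
    return {
        "own_position": (120.4, 56.2),    # position on the track
        "nearby_cars": [(118.0, 55.9)],   # opponents' positions
        "forces": {"lateral_g": 1.3, "longitudinal_g": -0.4},
    }

def choose_action(observation):
    """Stand-in for the trained policy network's output."""
    return {"steering": -0.15, "throttle": 0.8, "brake": 0.0}

def control_loop(steps):
    """Run the observe-act cycle for a fixed number of ticks."""
    actions = []
    for _ in range(steps):
        obs = get_observation()
        actions.append(choose_action(obs))
        # In the real system, the next observation arrives TICK_SECONDS later.
    return actions
```

The key design point the article highlights is fairness to humans: by capping the loop at 10 decisions per second, the AI's reaction time is held roughly to that of a top human player.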
Sony used reinforcement learning to train GT Sophy from scratch via trial and error. At first the AI struggled to keep a car on the road. But after training on 10 PlayStation 4s, each running 20 instances of the program, GT Sophy matched Gran Turismo’s built-in AI, which amateur players use for practice, in around eight hours. In 24 hours it was laying down lap times near the very top of an online leaderboard of 17,700 human players.
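The training setup described above is a classic distributed reinforcement-learning pattern: many game instances generate experience in parallel, and a central learner trains one shared policy on the pooled data. A rough sketch, with the worker count matching the article's figures (10 consoles running 20 instances each) but everything else an assumption for illustration:

```python
# Hedged sketch of parallel experience collection, as in the article's
# description of 10 PlayStation 4s each running 20 program instances.
# The structure and names are illustrative, not Sony's implementation.
from concurrent.futures import ThreadPoolExecutor

NUM_CONSOLES = 10
INSTANCES_PER_CONSOLE = 20

def run_episode(instance_id):
    """Stand-in for one game instance playing and recording transitions."""
    return [("observation", "action", "reward")] * 3  # dummy transitions

def collect_experience():
    """Gather transitions from all game instances in parallel."""
    instance_ids = range(NUM_CONSOLES * INSTANCES_PER_CONSOLE)
    with ThreadPoolExecutor(max_workers=16) as pool:
        batches = list(pool.map(run_episode, instance_ids))
    # A learner would then update the shared policy on this pooled data,
    # and push the new policy back out to every instance.
    return [transition for batch in batches for transition in batch]
```

Parallelism is what makes trial-and-error learning tractable here: 200 simultaneous copies of the game compress days of driving experience into hours of wall-clock time.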
It took nine days before GT Sophy stopped shaving fractions of a second off its lap times. By then it was faster than any human.
Sony’s AI learned how to drive at the limits of what the game allowed, pulling off moves that human players can only gawk at. In particular, Jones was struck by the way GT Sophy took corners, braking early before accelerating out on a much tighter line than she was.
“It used the curve in a weird way, doing stuff that I just didn’t even think of,” she says. For example, GT Sophy often drops a wheel onto the grass at the edge of the track and then skids into turns. “You don’t want to do that because you’ll make a mistake. It’s like a controlled crash,” she says. “I could maybe do that one in a hundred times.”
GT Sophy was quick to master the game’s physics. The bigger problem was the referees. At a professional level, Gran Turismo races are watched by human judges, who can award penalty points for dangerous driving. Racking up penalties was a key reason for GT Sophy’s loss in the first round of races last July, even though it was faster than any of the human drivers. And learning to avoid them made all the difference in round two.
Tough but fair
Wurman has been working on GT Sophy for several years. There’s a painting of two cars jostling for position hanging on the wall behind his desk. “It’s a GT Sophy car passing Yamanaka,” says Wurman, referring to Tomoaki Yamanaka, one of the four Japanese professional sim-racing drivers who competed against GT Sophy last year.
Wurman can’t recall which race the painting is taken from. If it’s the October event, Yamanaka may well be having a great time, pushing himself against a tough but fair opponent. If it’s the July event, he’s probably cussing at the computer.
Yamanaka’s teammate Takuma Miyazono told me about that July race via a translator. “There were a few times where we were pushed off the track because of how aggressively it would go into the corners,” he said. “That threw us off. The human drivers had to hold back on the turns to avoid being run off the road.”
Training the AI to play fair without losing its competitive edge was hard, says Wurman. The human referees make subjective judgments that depend on context, making it difficult to turn them into simple dos and don’ts that the AI can learn from.
The Sony researchers tried giving the AI lots of different cues, adjusting them as they went, hoping to find a mix that worked. They tried penalizing it if it went off the track or bumped into a wall. They penalized it for crashes it caused, and for crashes where a referee’s call might go either way. They experimented with different-size penalties for each and checked how GT Sophy’s driving changed in response.
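In reinforcement-learning terms, this trial-and-error tuning is reward shaping: forward progress earns reward, while each kind of infraction subtracts a tunable penalty. A minimal sketch of the idea, with event names and weights that are illustrative assumptions, not Sony's actual values:

```python
# Hedged sketch of the reward shaping the article describes: progress
# is rewarded, infractions are penalized at different magnitudes that
# the researchers adjust and re-test. All values here are hypothetical.
PENALTIES = {
    "off_track": -1.0,
    "wall_contact": -2.0,
    "caused_collision": -5.0,
    "ambiguous_collision": -1.5,  # crashes a referee might call either way
}

def step_reward(progress_gain, events):
    """Combine forward progress with penalties for any rule-breaking events."""
    reward = progress_gain
    for event in events:
        reward += PENALTIES.get(event, 0.0)
    return reward
```

The hard part, as the article notes, is that the referees' judgments are contextual: make the collision penalty too large and the agent turns timid, too small and it drives recklessly, which is why Sony had to iterate on these weights against real human opponents.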
Sony also upped the competition GT Sophy faced in its training. Before, it had trained mostly against previous versions of itself. Leading into the October rematch, Sony tested its AI every week or two against top drivers, tweaking it constantly. “That gave us the kind of feedback we needed to find the right balance between aggression and timidity,” Wurman says.
It worked. When Miyazono went up against GT Sophy three months later, the aggression was gone—but the AI was not simply backing down. “When you go into a corner with two cars side by side, it leaves just enough space for your car to go through,” he told me. “It really does feel like you’re racing with another person.”
“You get a different sort of passion and fun from driving against something that reacts that way,” he added. “That was something that really left a big impression on my mind.”
Scaramuzza is impressed with Sony’s work. “We measure the progress of robotics against what humans can do,” he says. But Elia Kaufmann, who works with Scaramuzza at the University of Zurich, points out that it is still human researchers who choose which of GT Sophy’s learned behaviors to bake in during training. “They’re the ones who judge what is good racing etiquette or not,” he says. “It would be really interesting if that could be done in an automated way.” Such a machine would not only have good manners but could recognize what good manners were, and be able to adapt its behavior to new settings.
Scaramuzza’s team is now applying its Gran Turismo research to real-world drone racing, training an AI to fly using raw video input instead of data from a simulation. Last month they invited two world-champion drone racers to take on the computer. No prizes for guessing who won. “It was very interesting to look at their faces after they saw our AI racing,” says Scaramuzza. “They were mind-blown.”
Scaramuzza thinks that making the jump to the real world is essential for true progress in robotics. “There will always be a mismatch between simulation and the real world,” he says. “This is something that gets forgotten when people talk about AI making incredible progress. In terms of strategy, yes. In terms of real-world deployment, we are definitely not there yet.”
For now, Sony is sticking to games. It plans to put GT Sophy in a future version of Gran Turismo. “We’d like this to become part of the product,” says Peter Stone, executive director of Sony AI America. “Sony’s an entertainment company, and we want this to make the game more entertaining.”
Jones thinks the sim-racing community could learn a lot from GT Sophy once more people get a chance to see it drive. “There will be tracks where we’re like, hang on a second, we’ve been doing this for years but there’s actually a faster way of doing it.” Miyazono has already tried to copy some of the lines the AI takes around corners, now that it has shown him they can be done.
“If the benchmark changes, everybody rises up as well,” says Jones.