To Get Truly Smart, AI Might Need to Play More Video Games

The realistic 3-D graphics in video games can help deep-learning algorithms make sense of the real world.

Will Knight

March 16, 2016

The latest computer games can be fantastically realistic. Surprisingly, these lifelike virtual worlds might have some educational value, too—especially for fledgling AI algorithms.

Adrien Gaidon, a computer scientist at Xerox Research Center Europe in Grenoble, France, remembers watching someone play the video game Assassin’s Creed when he realized that the game’s photo-realistic scenery might offer a useful way to teach AI algorithms about the real world. Gaidon is now testing this idea by developing highly realistic 3-D environments for training algorithms to recognize particular real-world objects or scenarios.

The idea is important because cutting-edge AI algorithms need to feed on huge quantities of data in order to learn to perform a task. Sometimes, that isn’t a problem. Facebook, for instance, has millions of labeled photographs with which to train the algorithms that automatically tag friends in uploaded images (see “Facebook Creates Software that Matches Faces Almost as Well as You Do”). Likewise, Google is capturing huge amounts of data using its self-driving cars, which is then used to refine the algorithms that control those vehicles.

But most companies do not have access to such enormous data sets, or the means to generate such data from scratch.

To fill in those gaps, Gaidon and colleagues used a popular game development engine, called Unity, to generate virtual scenes for training deep-learning algorithms—a very large type of simulated neural network—to recognize objects and situations in real images. Unity is widely used to make 3-D video games, and many common objects are available to developers to use in their creations.

A paper describing the Xerox team’s work will be presented at a computer vision conference later this year. By creating a virtual setting, and letting an algorithm see lots of variations from different angles and with different lighting, it’s possible to teach that algorithm to recognize the same object in real images or video footage. “The nice thing about virtual worlds is you can create any kind of scenario,” Gaidon says.
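To make the idea of "lots of variations" concrete, here is a minimal Python sketch of how a training pipeline might sample camera angles and lighting for one virtual object. It is purely illustrative: it is not the Xerox team's code and does not use the Unity API, and the names `SceneVariation` and `sample_variations`, along with the numeric ranges, are assumptions made up for this example. The point it shows is that every sampled view arrives with its label already attached, because the scene was built by the program and not photographed.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SceneVariation:
    """One hypothetical rendering setup for a virtual object."""
    object_label: str            # ground truth, known by construction
    camera_azimuth_deg: float    # viewing direction around the object
    camera_elevation_deg: float  # camera height angle above the ground
    light_intensity: float       # relative brightness, 1.0 = nominal


def sample_variations(object_label, count, seed=0):
    """Draw `count` random camera and lighting setups for one object.

    The ranges below are illustrative assumptions, not values from
    the article or from any real system.
    """
    rng = random.Random(seed)  # seeded so a data set can be regenerated
    return [
        SceneVariation(
            object_label=object_label,
            camera_azimuth_deg=rng.uniform(0.0, 360.0),
            camera_elevation_deg=rng.uniform(5.0, 60.0),
            light_intensity=rng.uniform(0.3, 1.5),
        )
        for _ in range(count)
    ]


variations = sample_variations("car", count=5)
for variation in variations:
    print(variation)
```

In a real system, each sampled setup would be handed to a rendering engine to produce an image; that step is omitted here.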

Gaidon’s group also devised a way to convert a real scene into a virtual one by using a laser scanner to capture a scene in 3-D and then importing that information into the virtual world. The group was able to measure the accuracy of the approach by comparing algorithms trained within virtual environments with ones trained using real images annotated by people. “The benefits of simulation are well known,” he says, “but [we wondered], can we generate virtual reality that can fool an AI?”

The Xerox researchers hope to apply the technique in two situations. First, they plan to use it to find empty parking spots on the street using cameras fitted to buses. Normally, doing this would involve collecting lots of video footage and having someone manually annotate the empty spaces. With the virtual environment created by the Xerox team, a huge amount of training data can instead be generated automatically. Second, they are exploring whether it could be used to learn about medical issues using virtual hospitals and patients.

The challenge of learning with less data is well known among computer scientists, and it is inspiring many researchers to explore new approaches, some of which take their inspiration from human learning (see “Can This Man Make AI More Human?”).

“I think this is a very good idea,” says Josh Tenenbaum, a professor of cognitive science and computation at MIT, of the Xerox project. “It’s one that we and many others have been pursuing in different forms.”
