Podcast: How games teach AI to learn for itself

Like humans and many animals, AI often learns new skills through play. But unlike the natural world, AI can process years of training in a single day.

September 29, 2021

Ms Tech | Unsplash

From chess to Jeopardy to e-sports, AI is increasingly beating humans at their own games. But that was never the ultimate goal. In this first episode of season three of In Machines We Trust, we dig into the symbiotic relationship between games and AI. We meet the big players in the space, and we take a trip to an arcade.

In this episode we meet:

Julian Togelius, Associate Professor, Department of Computer Science and Engineering, New York University
Will Douglas-Heaven, Senior Editor for AI, MIT Technology Review
David Silver, Principal Research Scientist at DeepMind, Professor at University College London
David Farhi, Lead Researcher, OpenAI

To make this episode, we also spoke to Natasha Regan, Actuary at RPC Tyche, Chess WIM and co-author of "Game Changer".

Sounds from:

Jeopardy 2011-02:The IBM Challenge: https://archive.org/details/Jeopardy.2011.02.The.IBM.Challenge/Jeopardy.2011.02.16.The.IBM.Challenge.Day.3.HDTV.XviD-FQM.avi
Garry Kasparov VS Deep Blue 1997 6th game (Kasparov Resigns): https://www.youtube.com/watch?v=EsMk1Nbcs-s
Qbert Level 1 Gameplay: https://www.youtube.com/watch?v=c9yxL2D94Sc
Attack Like AlphaZero: The Power of the King: https://www.youtube.com/watch?v=c0JK5Fa3AqI
Artificial Gamer: https://twitter.com/wykrhm/status/1438929297905831939?s=20
Miracle Perfect Anti Mage 16/0 - Dota 2 Pro Gameplay: https://www.youtube.com/watch?v=59KnNcU9iKc
DOTA 2 - ALL GAME-WINNING Moments in The International History (TI1-TI9): https://www.youtube.com/watch?v=RJcNbuASl-Y
Jeopardy announces Watson Challenge: https://youtu.be/isFR6Wfll-Q

Credits:

This episode was reported by Jennifer Strong and Will Douglas Heaven and produced by Anthony Green, Emma Cillekens and Karen Hao. We’re edited by Niall Firth, Michael Reilly and Mat Honan. Our mix engineer is Garret Lang. Sound design and music by Jacob Gorski.

Full transcript:

[TR ID]

[SOT: Jeopardy announces Watson Challenge]

Trebek: Today we’re announcing a Jeopardy competition unlike anything we have ever presented before.

Jennifer: Ten years ago, the television quiz show Jeopardy unveiled a new player...

Trebek: It's an exhibition match featuring two of the greatest Jeopardy players in history… their challenger? Well, his name is Watson.

Documentary Announcer: [music] Watson is an IBM computer designed to play Jeopardy. Watson understands natural language with all its ambiguity and complexity.

Jennifer: And perhaps not surprisingly... given that playing Jeopardy is the thing it was designed to do… Watson was good. Really good.

[SOT: Montage of Watson Jeopardy answers.]

Trebek: “Watson.”

Watson: “What is Istanbul.”

Trebek: “You’re right.”

Trebek: “Watson.”

Watson: “What is Parliament.”

Trebek: “Right.”

Trebek: “Watson.”

Watson: “What is Ancient Greek.”

Trebek: “Watson, back to you.”

Jennifer: After three nights of this, Watson won… beating the two best players in the game show’s history… From chess to Jeopardy to e-sports… AI is beating humans at their own games... (so to speak)… but that was never the ultimate goal. Researchers are trying to build intelligent systems that are more useful and general purpose than anything we have.

David Silver: If the human brain can solve all kinds of different tasks, can we build programs that can do the same thing?

Jennifer: I’m Jennifer Strong and this episode we dig into the symbiotic relationship between games and AI. Because for as long as there’s been AI research, games have been a part of it. We meet the big players in the space... and we take a trip to an arcade.

{Game sounds}

Karen Hao: In a way, games have over-hyped AI capabilities a little bit, because..

Jennifer: That’s my colleague Karen Hao…

Karen Hao: A lot of people now believe that AI is much more capable than it actually is, but games are actually a demonstration of incredibly narrow intelligence. And we're now kind of trapped in this cycle where AI research is specifically going down this path of more and more advanced games without actually going to more and more advanced, complex real-world situations, environments…which is what we actually need.

{Game sounds}

[SHOW ID]

OC:...you have reached your destination.

Julian Togelius: Games have been a part of AI since AI started, or like since the very idea of AI started.

Jennifer: Julian Togelius is a professor and computer scientist living in New York City…

Julian Togelius: I work on AI for making games better and also games for making AI better.

Jennifer: He’s giving me a history lesson on this relationship between games and AI… and somehow, he manages to do it while also playing a few video games that he’s been working with.

Julian Togelius: I particularly work with the video games and sort of modern video games because really chess and Go and all that... I mean, we're sort of done with that. It's like, I mean, [laughter] not to discourage people that like playing chess and like playing Go or poker for the mental challenge. That's fine. But you know, there are so many more possibilities, so many more interesting challenges in the other games.

Jennifer: How did you get into this field?

Julian Togelius: Yeah. So when my mom gave my cats away, [laughter] It's true! I mean, she, she got allergic and so what are you going to do? So she gave me a computer before a Commodore 64, and I started playing all these games and I got really fascinated by these little, little worlds. And then I grew up... well, more or less. [laughter] Uh, I grew up, I finished high school. I started studying philosophy and psychology. I was interested in, how does the mind work? What is the relationship of consciousness and intelligence and how does it all come about?

Jennifer: These questions brought him to an early paper by the pioneering computer scientist Alan Turing… He was the first to prove that building a computer was even mathematically possible.

Julian Togelius: That paper is largely about games. It's about the Imitation Game, what's now called a Turing Test, where you try to tell whether someone you're chatting with essentially - it wasn't called chatting in the fifties - whether someone you’re talking via text to is a computer or a human. It's also about chess. Because chess became very early on a core focus of artificial intelligence research.

Jennifer: We think of people who play chess as having a certain level of intelligence … and so the game became a way to gauge how intelligent machines are too.

And… fun fact? The very first chess playing program was written before a computer even existed to run it. Turing played it in 1950...using an algorithm worked out on paper.

(It didn’t work very well.)

But people continued to advance this research for decades.

And then, in 1997, I-B-M's Deep Blue computer beat Garry Kasparov... the reigning world champion of chess.

[SOT] - Deep Blue beating Garry Kasparov in Game Six via YouTube

Commentator 2: Are we missing something on the chessboard now that Kasparov sees? He does not look.. he looks disgusted in fact.

Commentator 1: Whoah!

Commentator 2: Deep Blue! Kasparov, after the move C4, has resigned!

[Applause]

Julian Togelius: And this was a huge intellectual event. People were thinking, okay, what now? Did we just solve artificial intelligence? And it turns out that no, you didn't, because this chess playing program couldn't even play checkers without significant reprogramming. It couldn't play Go. It couldn't play lots of things. And even more, it couldn't tie its shoelaces. It couldn't cook macaroni. It couldn't write a love poem. It couldn't go out and buy a newspaper. It couldn't do any of these things that humans do all the time. It really could literally just do one thing. It could play chess. It was damn good at it, but it could really only play chess.

Jennifer: So, humans had solved what was believed to be the biggest challenge of creating intelligence… but when you looked under the hood of the program... he says it was essentially just a kind of search.

Julian Togelius: What if I take this move? And then, what if my adversary takes this move, then what if I take this move? So we'd built a tree of possibilities and counter possibilities and calculated from that. It was actually much more complicated than that, but that's the heart of what it was doing. And people looked at it like, this doesn't seem like anything like how our brains work. I mean, we don't really know how our brains work, but, um, whatever they are doing, it's not this. [laugh]
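The "tree of possibilities and counter possibilities" Togelius describes is commonly formalized as minimax search. The sketch below is only an illustration on a hypothetical toy game (players alternately take 1 to 3 stones, and whoever takes the last stone wins). It is not Deep Blue's code, which, as he notes, was far more complicated. The names `MOVES`, `minimax` and `best_move` are invented for this example.

```python
from functools import lru_cache

# Hypothetical toy game, chosen only for illustration:
# players alternately take 1-3 stones; taking the last stone wins.
MOVES = (1, 2, 3)


@lru_cache(maxsize=None)
def minimax(stones: int, my_turn: bool) -> int:
    """Return +1 if the searching player can force a win, otherwise -1."""
    if stones == 0:
        # Whoever moved last took the final stone and won.
        return -1 if my_turn else 1
    # "What if I take this move? And then what if my adversary takes this move?"
    scores = [minimax(stones - m, not my_turn) for m in MOVES if m <= stones]
    # I pick my best outcome; my adversary picks my worst.
    return max(scores) if my_turn else min(scores)


def best_move(stones: int) -> int:
    """Choose the move whose subtree has the best guaranteed outcome."""
    legal = [m for m in MOVES if m <= stones]
    return max(legal, key=lambda m: minimax(stones - m, False))


if __name__ == "__main__":
    print(best_move(7))  # number of stones to take from a pile of 7
```

A toy game like this can be searched to the end. A chess tree is far too large for that, so a real chess program has to cut the search off at some depth and estimate how good the resulting positions are.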

Jennifer: But it isn't JUST used to play games against humans... AI shows up in games in all sorts of ways. Especially to make them more interesting and challenging.

For example…. AI changes parts of video games… so that they're different every time we play them, and that's been the case since the 19-80s.

Julian Togelius: And this principle of, like, always creating something new... and every time you play the game it's new... has survived into a lot of different games. For example, the Diablo series of games is based on that, or the Civilization series of strategy games. Every time you play it you have a completely new world and that's core to the game. It just wouldn't be the same if you didn't do that.

Jennifer: Another reason to do this is because of storage… and he says a game called Elite became an important milestone... when it was made available for personal computers, including the Commodore 64.

Julian Togelius: It couldn't possibly fit in memory in this computer. So one version had 4,096 different star systems. Now, if you only had 64,000 bytes of memory and imagine, think of how little that is, that's a millionth of a computer you can buy today. So, they had to recreate the star system every time you got there. Basically build it up from scratch.
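One way to "recreate the star system every time you got there" is to derive everything about a system from a seed, so that nothing needs to be stored. The sketch below assumes a made-up galaxy seed, syllable list and set of attributes. It shows the general idea of seeded procedural generation and is not Elite's actual algorithm.

```python
import random

GALAXY_SEED = 1984   # hypothetical seed, not taken from any real game
NUM_SYSTEMS = 4096   # the number of star systems mentioned in the episode
SYLLABLES = ("la", "ve", "ti", "on", "ra", "us", "di", "so")  # invented


def generate_system(index: int) -> dict:
    """Rebuild star system `index` from scratch, identically on every visit."""
    if not 0 <= index < NUM_SYSTEMS:
        raise IndexError(f"no such star system: {index}")
    # A private generator seeded from the index makes the result repeatable.
    rng = random.Random(GALAXY_SEED * NUM_SYSTEMS + index)
    return {
        "name": "".join(rng.choice(SYLLABLES) for _ in range(3)).capitalize(),
        "planets": rng.randint(1, 9),
        "tech_level": rng.randint(1, 15),
    }


if __name__ == "__main__":
    # Visiting the same system twice yields the same world, with no storage.
    print(generate_system(7) == generate_system(7))
```

Only the seed and the generating code have to fit in memory, which is the trade the quote describes: computing each system again on arrival instead of storing it.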

Jennifer: And that’s still the case now. Sure, we have much more storage. But games are also much, much larger and more complex.

Julian Togelius: The game of No Man's Sky, which came out 2016, but they keep updating it - it keeps getting more and more impressive. It has more planets in it than you could ever visit in a lifetime, but it somehow all fits in your computer because they are recreated every time you see them.

Jennifer: Meanwhile, researchers have also continued to build game playing AIs… and Togelius says, one of the next challenges in that space will be for them to play many games at once… because multitasking is something humans do well…but that’s not yet the case for these systems.

So, how do we get from these highly structured environments with lots of predictability… to something closer to real life, which is messy and chaotic and not at all predictable.

To him and other researchers…? We play more games.

Julian Togelius: If we had a system that could reliably play, like with some proficiency, the top hundred games on a computer game top list, like Steam or the AppStore or something, then we would have something akin to general intelligence.

Jennifer: So, in some ways… we’re still kind of where we were a half century ago… thinking we might just find the key to general intelligence with AI systems that can beat humans at their own game.

[beat / music]

But we also mix games and AI in all sorts of other ways…like to help us with training data.

A few years ago I met a team at Princeton trying to make stop signs more recognizable to self-driving cars… using the game, Grand Theft Auto.

Strange as that might sound… it’s actually pretty practical when you consider just how many different ways a driver might come across a stop sign in the real world… be it on a stick in the ground… hanging in the air... or painted on the pavement… and we encounter them in every kind of light and weather… sometimes partially hidden by tree branches… or the darkness of night.

Researchers could go looking for examples of all these stop signs… or video games can just generate endless examples.

We’re also using games to better understand how algorithms make decisions.

[Start to bring in sounds from Arcade. *Frogger theme music and gameplay begins, toggle moves*]

Jennifer: We’re at a classic arcade in Boston… because it has several of these older video games that are used to train AI systems.

Will Douglas-Heaven: Hi, I’m Will Douglas-Heaven. I’m senior editor for AI at Technology Review… And I cannot play Frogger.

Will Douglas-Heaven: Frogger came up quite recently in some different AI research where they were trying to get an AI to explain itself and explain like what it was doing. Um, and they taught... they trained an AI to play this game and you know Frogger... You can hear from the noise, I keep failing.

So Frogger is this game where you're a little frog down the bottom and you've got to cross a road that has cars moving sort of across the screen left and right, and you've got to sort of dodge between them. And then you get to a river and you jump on the back of turtles and logs to get to the other side without falling in like I did there. Um, anyway, so it's, it's a game which has got like lots of definite actions you take at each step. And so when they trained the AI to do it, every time it took an action, they got it to explain in, um, sort of, you know, human understandable terms why it did that.

[*Game sounds continue*]

Jennifer: Basically, AI plays the game… and over time, it works out how to succeed. Random moves evolve into complex strategies… even some we didn't know about.

[Continue games sounds underneath the VO above and also into this piece of audio]

Will Douglas-Heaven: They threw the AIs at these old games and just showed them the screens. They had no idea how to play. It was just pixels on a screen, stuff happened. They tried things and sometimes they blew up. Sometimes they shot the alien ships. And using only sort of rewards, you know, when they did something right, the score went up, they slowly worked out how to play the game. And they went from understanding nothing to, in many cases, sort of beating the high scores of the best human players. And even some really cool examples where they actually found ways to beat the game that humans hadn't discovered.

Jennifer: One example of this comes from a game called Q*Bert, which puts players on a pyramid of squares.

Will Douglas-Heaven: I mean the basic idea is you've got this little guy who jumps down the pyramid from the top landing on the squares. And when you've changed the squares all to the same color, then you can move on to the next level. But the AI, I think on the first level, changed all the colors of the squares and then kept jumping up and down the squares rather than moving on to the next level. And it found some bug in the game that allowed it to sort of get an infinite score in really a short amount of time. And even the designers of the game were like “I haven't seen that bug before.”

Jennifer: After the break… We’ll meet some pioneers behind major breakthroughs in this field. But first, I want to tell you about an event called CyberSecure in November. It’s Tech Review’s cybersecurity conference and I'll be there with my colleagues. You can learn more at Cyber Secure M-I-T dot com.

We’ll be right back… after this.

[MIDROLL]

David Silver: My name's David Silver. I work on artificial intelligence and I apply it to games. I work for a company called DeepMind and our goal is to try and use, um, artificial intelligence to try and build a system, which has some of the smarts that are inside the human brain.

Jennifer: DeepMind is at the center of this work with games. It’s a research lab that’s part of Google's Alphabet.

David Silver: If the human brain can solve all kinds of different tasks, can we build programs that can do the same thing?

Jennifer: He’s the lead researcher behind some of the best known AI systems that have mastered how to play games... starting with board games, (including the ancient Chinese strategy game of Go.)

David Silver: We developed a system called AlphaGo, which was the first program to be able to play the game of Go at the level of top human professional players. And in fact, it was able to beat the world champion Lee Sedol.

David Silver: And there's this huge space of games, many of which have these beautiful characteristics that allow us to really just dive in and understand, you know, one piece of the world in isolation without having to deal with all of the immense complexity of the real world all at once.

Jennifer: AlphaGo learned how to play board games based on how people play.

Silver’s next system, AlphaZero, learned to play board games and video games in a different way… by learning the rules of a game and then playing itself over and over again.

David Silver: After AlphaGo, we tried to take the next step and make something even more general, which was to be able to play not just one game, but many games using the same technology. And this is a big stepping stone because it really is trying to do one of the things which we, as people are able to do, which is solve many problems, using the same kinds of machinery inside.

Jennifer: It is a milestone in making AI more general purpose... But with an important caveat. The algorithm can’t learn to play these games all at once. It’s as though it builds itself separate brains for each game. So it has to swap out its chess brain before playing Go.

It’s safe to say researchers are still trying to figure out how to make games a test for real life. Because games have rules that can be defined… and no one really knows the rules by which the world works.

David Silver: The world is really a messy place. You know, it's got this incredibly rich dynamics going on, all kinds of details in the way that objects move around. The way that the things we see relate to the things that we touch. There's just this incredible richness and complexity to the real world. And we can't possibly hope to address that in the way that people historically have approached games. So what we need is something which can understand the world for itself in a way that kind of understands the patterns in a way which is useful for it to make decisions that are actually meaningful in helping to achieve its goals.

Jennifer: His latest project is called MuZero. It excels at just as many games as AlphaZero... (as well as a whole host of video games).

...but this system figures out how to play without being given any rules at all.

David Silver: So it was really just let loose. It was able to play games against itself. And all it got at the end of the game was a signal to say, Hey, you won or Hey, you lost. And from that signal, it was able to build an understanding for itself of the rules of the game enough that it could actually sort of imagine what would happen into the future.. And once it had this ability to imagine into the future, it was able to search and start looking ahead and start thinking into the future and saying, aha, now I understand how this world works. I can start to imagine what would happen if I played this move or took this action. And so that's really a key step that we need and something we believe is very important going forward for the future of AI.

Jennifer: He says it’s not unlike an infant coming to grips with the world around it… building problem solving and creative skills, over time.

David Silver: I think we're already seeing examples where, within constrained domains, that we see algorithms that are to all intents and purposes, creative. I mean, what is creativity after all other than, you know, the ability to discover some new idea for itself. And I think that's the essence of creativity. The essence of creativity is what our algorithms are doing, which is to discover step by step something new and to learn through their experience that this new idea that they've come up with is actually something which is powerful and which helps it to achieve its goals. So I think in the future, we'll see more and more creativity of this form. We'll see, you know, machines which are able to discover for themselves ideas that help them to achieve goals. Not because a person's told them, this is the thing you need to achieve that goal, but because they figured it out for themselves.

Jennifer: And.. that creativity has led AlphaZero to discover new things about how to play chess. Now…. human players are actually adopting it in their own games ... calling it.. "playing an alpha zero move".

[SOT: how to play like AlphaZero]

Host: “Welcome to another edition of How to Attack Like AlphaZero! I hope you are ready for today’s lesson….”

Jennifer: That’s also happening with e-sports… which are video game competitions that are often played in front of a live audience... similar to a sporting event… With a worldwide audience of nearly half a billion viewers tuning in to watch their favorite games played by some of the best gamers in the world.

Here too, AI is being used in a bunch of ways… like coaching tools to help people get better at playing… and (once again), researchers are also aiming to use e-sports to make their AI systems more intelligent…

David Farhi: We're imagining that at some point there'll be general artificial intelligence systems that can really solve problems quickly, can learn maybe at the level of humans.

Jennifer: David Farhi is a lead researcher at OpenAI... The research lab founded by Elon Musk and a bunch of other Silicon Valley luminaries.

It created the first system to beat world champions at an e-sports game.

That game is called Defense of the Ancients 2, which everyone calls Dota 2… and there’s a new documentary about this win… called Artificial Gamer.

[Clip from Artificial Gamer trailer]

[Dramatic music and sounds from Dota 2 gameplay]

Speaker 1: When you look at the game of Dota, there’s 10,000 plus variables in every moment that your system has to take in.

Speaker 2: The AI learns in a very different way than humans.

Speaker 3: It plays against copies of itself. Many, many times off in the cloud..

Jennifer: Farhi oversaw the Dota 2 project, called OpenAI Five… and he demonstrated how it works at Tech Review’s AI conference, EmTech Digital…

[Sounds of Dota 2 gameplay via YouTube. [00:03 - 00:15] Fade in, then bed under the following Farhi select. *Sword fighting, footsteps, and dramatic battle music.*]

David Farhi: In the upper right corner of this screen. We see a very big, zoomed out, view of the whole world of Dota, In the lower left corner there's one team's base. In the upper right corner is another team's base. Each team is trying to move their characters around, cast spells with their characters, attack the enemies and so on to ultimately invade and destroy the other team’s base.

David Farhi: These more complicated systems like robotics and video games have a different feel to them because you get an observation of the state of the game, and then you choose an action to take. And then the state of the game changes in some way, depending on the action you took. And then you've got a new observation and you can choose a new action and this loop happens over and over and over again. And so you have to make decisions that have long-term consequences down the road. So the way we do this is relatively simple. Conceptually at least. We have agents that start out playing totally randomly. And we just have to play them against themselves, a clone of themselves over and over and over again.
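The observe, act, observe-again loop Farhi describes can be written down in a few lines. The sketch below uses a hypothetical corridor environment (`CorridorEnv`) and a policy that moves at random, like the agents that "start out playing totally randomly". The class, its methods and the reward of 1.0 are assumptions made for illustration and are not OpenAI's code.

```python
import random


class CorridorEnv:
    """Hypothetical toy world: walk from cell 0 to the last cell of a corridor."""

    def __init__(self, length: int = 5):
        self.length = length
        self.position = 0

    def reset(self) -> int:
        self.position = 0
        return self.position  # the first observation

    def step(self, action: int):
        # action is -1 (left) or +1 (right); the walls clamp the position.
        self.position = min(max(self.position + action, 0), self.length - 1)
        done = self.position == self.length - 1
        reward = 1.0 if done else 0.0
        return self.position, reward, done


def run_episode(env, policy, max_steps: int = 1000):
    """Observe, choose an action, let the state change, and repeat."""
    observation = env.reset()
    total_reward = 0.0
    for step in range(1, max_steps + 1):
        action = policy(observation)
        observation, reward, done = env.step(action)
        total_reward += reward
        if done:
            return total_reward, step
    return total_reward, max_steps


if __name__ == "__main__":
    rng = random.Random(0)
    # An untrained agent: it ignores the observation and moves at random.
    print(run_episode(CorridorEnv(), lambda obs: rng.choice((-1, 1))))
```

In self-play, the environment's response would come from a copy of the same agent rather than from fixed corridor rules, but the loop keeps this shape.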

Jennifer: And if you’re thinking this might take a really long time with such a complicated game? You’re not wrong… but OpenAI’s ability to run it on 200-thousand machines at once... helps.

Basically… it’s able to gain about 250 years of experience per day.

And if the system does something that works... it’s updated to do that thing more… and if something bad happens that doesn’t work, it does that thing less.
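The "do that thing more, do that thing less" rule can be illustrated with a deliberately simple reinforcement scheme: each action has a weight, actions are sampled in proportion to their weights, and a weight is nudged up after a win and down after a loss. This is a toy sketch with invented names and numbers, not the large-scale training method OpenAI used.

```python
import random


def train(win_probability: dict, rounds: int = 2000, step: float = 0.05,
          seed: int = 0) -> dict:
    """Reinforce actions that win and discourage actions that lose.

    `win_probability` maps each action name to a hypothetical chance of
    winning when that action is chosen.
    """
    rng = random.Random(seed)
    actions = list(win_probability)
    weights = {action: 1.0 for action in actions}  # start with no preference
    for _ in range(rounds):
        # Sample an action in proportion to its current weight.
        action = rng.choices(actions, weights=[weights[a] for a in actions])[0]
        won = rng.random() < win_probability[action]
        # Something that worked is done more; something that failed, less.
        weights[action] *= (1 + step) if won else (1 - step)
    return weights


if __name__ == "__main__":
    learned = train({"good": 0.8, "bad": 0.2})
    print(max(learned, key=learned.get))
```

In this toy setup, the action that wins more often ends up with most of the weight, so the agent chooses it almost every time.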

David Farhi: We started out with a limited version of the game. We were eventually able to beat our developer team, which was very fun. And then we added more pieces of the game. We went back and trained for longer. And we were able to beat some amateurs and then some semi-professional humans. Eventually we decided to go to a large tournament that this game has..

[Sounds from The International 3 (Dota tournament) via YouTube. *Crowd cheering, sports commentators shouting excitedly, Dota gameplay.*]

Sportscaster: It could be their last stand. [inaudible]

Sportscaster: He's gonna try to focus everybody but there's so much stuff.

Sportscaster: There's no more clips available. Down to about half HP.

Sportscaster: A quarter HP. A lion surrounding from all sides! EKB!

Sportscaster: They won the round! They're gonna do it!

Sportscaster: The kings of the north! Alliance wins! They win TI 3.

Sportscaster: The Alliance just won 1.4 million dollars!

Sportscaster: They are your International 3 champions!

David Farhi: So this game has millions of human users who compete in these tournaments for large prizes, which ensures that we know there are humans who are playing at a very, very high level of skill. In August of 2018, we took our agent to this tournament.

Jennifer: Their AI played against two professional teams that had already been eliminated from the tournament… and narrowly lost. But the following year, with more training, the AI was able to beat the former world champions 2 - 0.

David Farhi: So OpenAI Five is trained with no humans in the training process, so it just plays against itself in these cloud servers over and over and over and over again. And then when we want to play it against a human, we take a snapshot out of the cloud and play it against the human, but we never feed that data back into the training process.

[Music]

Jennifer: But there’s still this question of whether games can help us train AI to be more useful.

Right now, we have systems that are extremely good at one thing. But we don't yet have models that can do lots of things at once.

Once again, my colleague Will Douglas Heaven.

Will Douglas-Heaven: The trick is going to be, I think, stepping back from building AIs that excel at specific strategies or techniques, or have a brilliant workaround for this particular rule or move, you know, the kind of thing that we've been seeing in these AIs that can learn to play games.

Jennifer: To really understand the next stage of this research... It might be helpful to think about the way kids play on a playground.

Will Douglas-Heaven: They're not playing a game that has any sort of real set rules. I mean, they may make them up as they go along, but, you know, they're just exploring, trying stuff out and in a very sort of natural and open-ended way. And there isn't any definite goal that they're working towards. And I think it's this kind of technique, which is still a kind of play, that we're going to see, you know, really push things forward when we talk about general intelligence. DeepMind, for example, a few months ago released a virtual playground. It's sort of like a video game world called XLand. And it's populated by a bunch of little bots. And the neat thing here is that XLand itself is controlled by an AI or sort of like a games master that rearranges the environment, rearranges the obstacles and the blocks and the balls the little bots get to play with, and also comes up with different rules on the fly. So, simple games like tag or hide and seek, and the bots just have to work out, you know, how to play those. You know what objects in that virtual world will help them to do it. And they learn general skills like exploring, just trying stuff out. And I think this kind of open-ended exploration is going to be key for the next generation of AI. And it's kind of exciting that the next wave of AI, the AIs that are going to be good at multiple things, we still might get there through games again. So games aren't going anywhere. Games have been with AI since the beginning. And you know, it's nice to see that play is still perhaps the best way of learning.

[CREDITS]

Jennifer: This episode was reported by me and Will Douglas-Heaven… and produced by Anthony Green, Emma Cillekens and Karen Hao. We’re edited by Niall Firth, Michael Reilly and Mat Honan. Our mix engineer is Garret Lang… with sound design and music by Jacob Gorski.

Thanks for listening, I’m Jennifer Strong.
[TR ID]
