In January of this year, DeepMind announced it had hit a milestone in its quest for artificial general intelligence. It had designed an AI system, called AlphaStar, that beat two professional players at StarCraft II, a popular video game about galactic warfare. This was quite a feat. StarCraft II is highly complex, with 10^26 choices for every move. It’s also a game of imperfect information, and there are no definitive strategies for winning. The achievement marked a new level of machine intelligence.
Now DeepMind, an Alphabet subsidiary, is releasing an update. AlphaStar outranks the vast majority of active StarCraft players, demonstrating a much more robust and repeatable ability to strategize on the fly than before. The results, published in Nature today, could have important implications for applications ranging from machine translation to digital assistants or even military planning.
StarCraft II is a real-time strategy game, most often played one on one. A player must choose one of three human or alien races—Protoss, Terran, or Zerg—and alternate between gathering resources, building infrastructure and weapons, and attacking the opponent to win the game. Every race has unique skill sets and limitations that affect the winning strategy, so players commonly pick and master playing with one.
AlphaStar used reinforcement learning, where an algorithm learns through trial and error, to master playing with all the races. “This is really important because it means that the same type of methods can in principle be applied to other domains,” said David Silver, DeepMind’s principal research scientist, on a press call. The AI also reached a rank above 99.8% of the active players in the official online league.
In order to attain such flexibility, the DeepMind team modified a commonly used technique known as self-play, in which a reinforcement-learning algorithm plays against itself to learn faster. DeepMind famously used this technique to train AlphaGo Zero, the program that taught itself without any human input to beat the best players in the ancient game of Go. The lab also used it in the preliminary version of AlphaStar.
Conventionally in self-play, both versions of the algorithm are programmed to maximize their chances of winning. But the researchers discovered that this approach didn’t necessarily result in the most robust algorithms. For such an open-ended game, it risked pigeonholing the algorithm into specific strategies that would only work under certain conditions.
Taking inspiration from the way pro StarCraft II players train with one another, the researchers instead programmed one of the algorithms to expose the flaws of the other rather than maximize its own chance of winning. “That’s kind of [like] asking a friend to play against you,” said Oriol Vinyals, the lead researcher on the project, on the call. “These friends should show you what your weaknesses are, so then eventually you can become stronger.” The method produced much more generalizable algorithms that could adapt to a broader range of game scenarios.
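To make the idea concrete, here is a deliberately tiny sketch in Python. It is not DeepMind's method or code: rock-paper-scissors stands in for StarCraft II, and every name in it (`best_response`, `train_main_agent`, and so on) is made up for illustration. A "main" agent starts out lopsided, always playing rock. An "exploiter" does nothing but find the move that punishes the main agent's current habits, and the main agent then shifts toward whatever beats the exploiter.

```python
# Toy illustration only: rock-paper-scissors stands in for StarCraft II.
# All names here are hypothetical; this is not AlphaStar's training code.
MOVES = ["rock", "paper", "scissors"]
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}  # key beats value


def payoff(a, b):
    """Return +1 if move a beats move b, -1 if it loses, 0 on a tie."""
    if a == b:
        return 0
    return 1 if BEATS[a] == b else -1


def expected_payoff(move, strategy):
    """Expected payoff of one move against a mixed strategy (move -> probability)."""
    return sum(prob * payoff(move, other) for other, prob in strategy.items())


def best_response(strategy):
    """Pick the move that scores best against the given mixed strategy."""
    return max(MOVES, key=lambda m: expected_payoff(m, strategy))


def train_main_agent(rounds=3000):
    """Train a main agent against an exploiter that targets its current weakness."""
    counts = {"rock": 1.0, "paper": 0.0, "scissors": 0.0}  # starts out lopsided
    for _ in range(rounds):
        total = sum(counts.values())
        strategy = {m: c / total for m, c in counts.items()}
        # The exploiter is not trying to be a good all-round player;
        # it only looks for the flaw in the main agent's current mix.
        exploit = best_response(strategy)
        # The main agent patches that flaw by leaning toward the counter-move.
        counter = best_response({m: 1.0 if m == exploit else 0.0 for m in MOVES})
        counts[counter] += 1.0
    total = sum(counts.values())
    return {m: c / total for m, c in counts.items()}


if __name__ == "__main__":
    print(train_main_agent())
```

In this toy setting the main agent's mix drifts toward an even split across the three moves, which is the one strategy an exploiter cannot take advantage of. The real system faces a vastly larger game, but the division of labor is the same one Vinyals describes: one player exists to show the other its weaknesses.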
The researchers believe AlphaStar’s strategy development and coordination skills could be applied to many other problems. “We chose StarCraft [...] because we felt it mirrored a lot of challenges that actually come up in real-world applications,” said Silver. These applications could include digital assistants, self-driving cars, or other machines that have to interact with humans, he said.
“The complexity [of StarCraft] is much more reminiscent of the scales that we’re seeing in the real world,” said Silver.
But AlphaStar demonstrates AI’s significant limitations, too. For example, it still needs orders of magnitude more training data than a human player to attain the same level of skill. Such learning software is also still a long way off from being translated into sophisticated robotics or real-world applications.