OpenAI’s Goofy Sumo-Wrestling Bots Are Smarter Than They Look

Will Knight

October 12, 2017

It could be a virtual blood sport in some absurdist techno-future.

OpenAI, a research institute backed by Elon Musk and several other Silicon Valley big shots, has revealed its latest research on developing more powerful forms of machine learning. And it’s demonstrating the technology using virtual sumo wrestling.

The virtual wrestlers might look slightly ridiculous, but they are using a very clever approach to learning in a fast-changing environment while dealing with an opponent.

The agents use a form of reinforcement learning, a technique inspired by the way animals learn through feedback. It has proved useful for training computers to play games and to control robots (see “10 Breakthrough Technologies 2017: Reinforcement Learning”).
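To make “learning through feedback” concrete, here is a minimal sketch of the idea on a toy multi-armed bandit. This setting is far simpler than the physics-based RoboSumo world. The agent is never told which action is best. It only tries actions, observes rewards, and updates its estimates. The reward probabilities, the epsilon-greedy exploration rule, and the function name `run_bandit` are illustrative assumptions, not anything taken from OpenAI’s work.

```python
import random

def run_bandit(reward_probs, steps=5000, epsilon=0.1, seed=0):
    """Learn which action pays off best purely from trial-and-error feedback.

    reward_probs is a hypothetical environment: action i returns reward 1
    with probability reward_probs[i], otherwise 0. The agent never reads
    these probabilities; it only sees the rewards its actions produce.
    """
    rng = random.Random(seed)
    n = len(reward_probs)
    values = [0.0] * n  # running estimate of each action's average reward
    counts = [0] * n    # how many times each action has been tried
    for _ in range(steps):
        # Explore with probability epsilon; otherwise exploit the best estimate.
        if rng.random() < epsilon:
            action = rng.randrange(n)
        else:
            action = max(range(n), key=lambda a: values[a])
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0
        counts[action] += 1
        # Incremental mean: nudge the estimate toward the observed reward.
        values[action] += (reward - values[action]) / counts[action]
    return values, counts

values, counts = run_bandit([0.2, 0.5, 0.8])
best = max(range(3), key=lambda a: values[a])
print("preferred action:", best, "tries per action:", counts)
```

After a few thousand trials the agent should come to prefer the highest-paying action, even though it started with no knowledge of the environment. Real reinforcement learning for robots or sumo wrestlers adds states, long action sequences, and neural-network policies. The feedback loop, however, is the same.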

One big challenge with reinforcement learning is that it doesn’t work as well in more realistic situations, where things are constantly in flux. OpenAI had already developed its own reinforcement learning algorithm, called proximal policy optimization (PPO), which is especially well suited to changing environments.
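PPO’s defining ingredient is its “clipped surrogate objective.” This objective limits how far a single update can move the agent’s policy away from the policy that collected the data. The sketch below computes that objective for a handful of made-up samples. The probabilities and advantage values are hypothetical. A real implementation works with log-probabilities over large batches and maximizes this quantity by gradient ascent on a neural network, none of which is shown here.

```python
def ppo_clip_objective(new_probs, old_probs, advantages, eps=0.2):
    """Average clipped surrogate objective, in the form used by PPO.

    new_probs[i]  : probability the updated policy gives the action taken
    old_probs[i]  : probability the data-collecting policy gave that action
    advantages[i] : how much better than expected that action turned out
    eps           : clip range; 0.2 is a commonly used value
    """
    total = 0.0
    for p_new, p_old, adv in zip(new_probs, old_probs, advantages):
        ratio = p_new / p_old
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
        # Taking the minimum removes any incentive to push the ratio
        # beyond the clip range, which keeps each policy update small.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)

objective = ppo_clip_objective([0.6, 0.3], [0.4, 0.5], [1.0, -1.0])
print(objective)
```

In the first sample the new policy has raised the probability of a good action by a factor of 1.5. The objective only credits it up to the clip limit of 1.2, so there is no reward for changing the policy more aggressively than that in one step.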

The latest work, done in collaboration with researchers from Carnegie Mellon University and UC Berkeley, demonstrates a way for AI agents to apply what the researchers call a “meta-learning” framework. This means the agents can take what they have already learned and apply it to a new situation.
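The researchers’ algorithm is considerably more elaborate. Still, the core of gradient-based meta-learning can be sketched on a toy problem in the style of model-agnostic meta-learning (MAML), a well-known approach of this kind. This sketch does not reproduce the team’s exact method. In it, each hypothetical task asks a single number `w` to match a target value. An inner step adapts `w` to one task with one gradient step. An outer loop learns a starting value from which that one step works well across many tasks. The quadratic loss, learning rates, and function names are all assumptions made for illustration.

```python
def task_loss(w, target):
    """Loss for one hypothetical task: squared distance from its target."""
    return (w - target) ** 2

def task_grad(w, target):
    """Derivative of task_loss with respect to w."""
    return 2.0 * (w - target)

def adapt(w, target, inner_lr=0.1):
    """Inner loop: take one gradient step on a single task."""
    return w - inner_lr * task_grad(w, target)

def meta_train(targets, w=0.0, inner_lr=0.1, outer_lr=0.05, epochs=200):
    """Outer loop: learn an initial w that adapts well after one inner step."""
    for _ in range(epochs):
        meta_grad = 0.0
        for c in targets:
            w_adapted = adapt(w, c, inner_lr)
            # Chain rule through the inner step: d(w_adapted)/dw = 1 - 2 * inner_lr
            meta_grad += task_grad(w_adapted, c) * (1.0 - 2.0 * inner_lr)
        w -= outer_lr * meta_grad / len(targets)
    return w

w0 = meta_train([1.0, 2.0, 3.0])
new_target = 2.5  # a task the outer loop never saw
before = task_loss(w0, new_target)
after = task_loss(adapt(w0, new_target), new_target)
print(w0, before, after)
```

On this toy problem the learned starting point settles near the middle of the training targets. A single adaptation step then lowers the loss on the unseen target. That is the “take what they have already learned and apply it to a new situation” behavior, in miniature.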

Inside the RoboSumo environment, the agents started out behaving randomly. Through thousands of iterations of trial and error, they gradually developed the ability to move and, eventually, to fight. With further iterations, the wrestlers developed the ability to avoid each other, and even to question their own actions. This learning happened on the fly, with the agents adapting even as they wrestled each other.

Flexible learning is a very important part of human intelligence, and it will be crucial if machines are going to become capable of performing anything other than very narrow tasks in the real world. This kind of learning is very difficult to implement in machines, and the latest work is a small but significant step in that direction.

The researchers found that by using meta-learning, their sumo-bots could learn effective strategies more quickly. So even if they look a bit hapless, don’t underestimate them.
