The use of bots is on the rise: they're helping our doctors detect our illnesses, they're helping our banks detect fraudulent transactions, and they're finding our faces in photos all across Facebook. Building better and more widely applicable bots calls for heavy testing of our current algorithms, and that's where video games currently excel. In video games, AIs have a complex environment to navigate, full of real-life phenomena such as imperfect information and stochastic behavior. This branch of research (that is, AI research done through video games) got great news at the beginning of this week: Open AI Five, a bot created by the Elon Musk start-up Open AI, won a best-of-three series 2-1 against a team of highly ranked human Dota2 players.
Dota2 is a MOBA (Multiplayer Online Battle Arena) game, a subgenre of Real Time Strategy games. In these games, two teams battle on a symmetric battlefield with the objective of getting into the enemy base and destroying its core. Each team has three lanes (top, middle and bottom) with three tiers of towers (defensive buildings that attack everything that comes close enough). Each base also periodically produces creeps: small, low-attack creatures that walk down every lane and attack enemy creeps, enemy towers and enemy heroes.
There's a lot of complexity in MOBA games. They require skillful micro in the form of last-hitting creeps and landing skill shots, and the members of a team are expected to have great map awareness and positioning. There's a store with items that boost certain stats. Finally, each player selects a hero at the start of the game in the draft phase, and the variety of heroes makes drafting an art: the complex relationships between damage dealers, supports and tanks must be learned in order to have any chance of winning. Moreover, the map hides information behind a fog of war, which calls for the use of wards to gain visibility.
The complexity of MOBA games makes them a great test for our current AI algorithms: can our AIs collaborate in such an intricate manner? Can they infer the adversary's intentions from incomplete information and make good strategic decisions? This benchmark match answers these questions in the affirmative.
This event, which was held in San Francisco, took place last Sunday, the 5th of August, 2018. It started with a show match against members of the audience, in which Open AI Five initially looked evenly matched but, after focusing on pushing and sieging the enemy base, won very convincingly in about fifteen minutes (as a reference, an average match lasts about 30 minutes). After a few words from Greg Brockman, the CTO and co-founder of Open AI, the human team got on the stage and faced the challenge. The first game lasted about 20 minutes, and Open AI Five showed great positioning and teamwork. A move that caught my eye was a very particular trap Open AI Five set in order to catch three of the human players. Such a trap shows the map awareness that the bot has as a whole, and how well it understands the importance of being in the right position at the right time. This first game spiraled out of control for the humans, and they surrendered after losing the third-tier towers in both the bottom and middle lanes. The second game wasn't much better for the humans: they managed to stay on top in terms of income for a little while, but the bot's estimated win probability stayed above 90% throughout. In the third game, the only one the bot lost, it was set up for failure: the draft (i.e. the heroes) that the bot was going to play with was chosen by the audience. The final result was a (rather unfair for the bot) 2-1 against the humans.
So, how did they do it? One cannot solve such a complex problem directly, so they trained five artificial neural networks (one per hero) with Reinforcement Learning in such a way that, over the course of training, the optimization target shifted from each network's personal score to the overall team score. Open AI published the network architecture, and it combines convolutional layers for the visual components (such as interpreting the minimap) with Long Short-Term Memory (LSTM), a recurrent network topology which is usually used in Natural Language Processing. Among the inputs of each network are a low-resolution version of the minimap, the nearby terrain, cooldowns, the time until several events such as the next creep wave or nightfall, the available actions, etc.; the outputs include a selected action, a move offset or ward coordinates and a win probability, among other things. In the words of Brockman, the bot was trained by playing against itself for the equivalent of 180 years, and according to their blog post, training took about 190 petaflop/s-days. A petaflop/s-day is 10^15 operations per second sustained for a whole day, which Open AI rounds to about 10^20 operations, so by their calculations the total is about 1.9e22 floating point operations. As a comparison, AlphaZero took about 450 petaflop/s-days.
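To make the shift from personal score to team score a bit more concrete, here is a minimal sketch in Python. It is an illustration under assumptions, not Open AI's training code: the linear blend, the weight name `team_spirit`, the linear `anneal` schedule and the sample reward values are all hypothetical choices made for this example.

```python
from statistics import mean


def blend_rewards(individual_rewards, team_spirit):
    """Mix each agent's own reward with the team average.

    team_spirit = 0.0 -> every agent is paid only its own reward.
    team_spirit = 1.0 -> every agent is paid the team average.
    (Hypothetical weighting scheme, assumed for illustration.)
    """
    if not 0.0 <= team_spirit <= 1.0:
        raise ValueError("team_spirit must be between 0 and 1")
    team_average = mean(individual_rewards)
    return [
        (1.0 - team_spirit) * reward + team_spirit * team_average
        for reward in individual_rewards
    ]


def anneal(step, total_steps):
    """Assumed linear schedule that moves the weight from 0 to 1."""
    return min(1.0, max(0.0, step / total_steps))


if __name__ == "__main__":
    # Made-up per-hero rewards for one time step: one hero did all the work.
    rewards = [4.0, 0.0, 1.0, 0.0, 0.0]
    for step in (0, 500, 1000):
        weight = anneal(step, total_steps=1000)
        print(step, weight, blend_rewards(rewards, weight))
```

With a weight of 0 every hero only cares about its own score; with a weight of 1 all five heroes receive the same team average, so setting up a teammate is worth exactly as much as scoring yourself.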
Although this is a great result and a huge breakthrough, it doesn't mean that Dota2 bots are now superhuman. The human players are in the 99.95th percentile (i.e. they're among the best 0.05%), but they are mostly casters and don't constitute a professional team; to make a comparison with AlphaGo, this match was like winning against Fan Hui, not against Lee Sedol. Moreover, the game that the bot played wasn't a vanilla version of Dota2: this match had rules and restrictions such as allowing only 18 of the game's 111 playable heroes, banning certain glyphs and using invulnerable couriers. High-dimensional tasks such as item purchases were scripted instead of learned. The last game also suggests that the bot behaves oddly when playing from a losing position, and warding has proven to be a hard task to automate because there isn't a well-defined reward metric.
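To get a feeling for how much the restricted hero pool simplifies drafting, one can simply count lineups. The sketch below is a back-of-the-envelope calculation under simplifying assumptions: it ignores bans and pick order, it only uses the pool sizes quoted above (18 and 111), and the helper names `team_lineups` and `matchups` are made up for this example.

```python
from math import comb  # available since Python 3.8

FULL_POOL = 111        # playable heroes, as quoted in the text
RESTRICTED_POOL = 18   # heroes allowed in the benchmark match
TEAM_SIZE = 5


def team_lineups(pool_size, team_size=TEAM_SIZE):
    """Number of distinct five-hero lineups one team can field."""
    return comb(pool_size, team_size)


def matchups(pool_size, team_size=TEAM_SIZE):
    """Lineups for the first team times lineups for the second team,
    where the second team picks from the heroes that are left."""
    return comb(pool_size, team_size) * comb(pool_size - team_size, team_size)


if __name__ == "__main__":
    for name, pool in (("restricted", RESTRICTED_POOL), ("full", FULL_POOL)):
        print(name, "lineups:", team_lineups(pool), "matchups:", matchups(pool))
    ratio = team_lineups(FULL_POOL) / team_lineups(RESTRICTED_POOL)
    print("the full pool allows about", round(ratio), "times more lineups")
```

Under these assumptions a team has 8,568 possible lineups with 18 heroes versus 128,164,707 with 111, which gives an idea of how much of the drafting problem is still left to solve.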
Open AI will continue to work on these problems and is going to present a version of this bot at the upcoming International, Dota2's premier tournament, which will take place later this month. Best of luck to them!