Some tasks are just too complex, too nuanced to tackle all at once, like beating all 256 levels of Ms. Pac-Man on the Atari 2600 while earning a perfect score of 999,990. That's why Microsoft didn't even try to train its AI to take it on in one go. Instead, as the company announced on Wednesday, it split this monumental challenge into smaller, chomp-sized pieces and trained a hivemind of 150 AIs to accomplish it as a team.
Developed by Maluuba, a Canadian AI firm that Microsoft recently acquired, the AI system relies on reinforcement learning to develop its strategy. Reinforcement learning is an AI training technique wherein the algorithm is rewarded for actions that lead to good outcomes and penalized for actions that lead to bad ones, based on the results it has previously observed. The idea is that, with enough time and tries, the system will eventually figure out on its own what the best course of action will be. Google's AlphaGo also relied on reinforcement learning to beat the world's Go champions.
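To make the reward-and-penalty idea concrete, here is a minimal sketch of tabular Q-learning, one of the simplest reinforcement learning methods. It is not Maluuba's system, which uses neural networks. The five-cell corridor, the reward values and the names `train_corridor` and `greedy_policy` are all assumptions made up for illustration.

```python
import random

def train_corridor(n_states=5, episodes=200, alpha=0.5, gamma=0.9,
                   epsilon=0.2, seed=0):
    """Tabular Q-learning on a hypothetical corridor: start at cell 0,
    earn +1 for reaching the last cell, pay a small cost for every other step."""
    rng = random.Random(seed)
    moves = (-1, +1)                       # action 0 = left, action 1 = right
    q = [[0.0, 0.0] for _ in range(n_states)]
    goal = n_states - 1
    for _ in range(episodes):
        state = 0
        while state != goal:
            # Explore occasionally; otherwise take the best-known action.
            if rng.random() < epsilon:
                action = rng.randrange(2)
            else:
                action = max(range(2), key=lambda i: q[state][i])
            nxt = min(max(state + moves[action], 0), goal)
            reward = 1.0 if nxt == goal else -0.01
            best_next = 0.0 if nxt == goal else max(q[nxt])
            # Nudge the estimate toward the reward plus discounted future value.
            q[state][action] += alpha * (reward + gamma * best_next - q[state][action])
            state = nxt
    return q

def greedy_policy(q):
    """Read off the preferred direction for every non-goal cell."""
    return ["left" if row[0] > row[1] else "right" for row in q[:-1]]

if __name__ == "__main__":
    print(greedy_policy(train_corridor()))
```

Nobody tells the agent that "right" is correct. It settles on that choice because moves to the right end up with higher estimated value after enough trial and error.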
But with sufficiently complex tasks, a simple reinforcement learning system is too slow -- think monkeys on typewriters reproducing the complete works of Shakespeare. So, the Maluuba team split the task into smaller sub-tasks, like avoiding ghosts or getting to a specific pellet within the maze, and gave each to one of 150 parallel neural networks to figure out. The team then installed a master AI on top of that array of networks to direct the swarm's actions and help achieve their common goal of beating the game.
The master AI takes the responses of all the sub-AIs in a given scenario, weights them, and then makes a decision for the group. That is, even if half of the sub-AIs are saying "Go right, get that pellet" but a few are saying "No, don't, there's a ghost down that hall," the master AI will defer to the not-dying contingent rather than the pellet-getters. The team has dubbed its system a Hybrid Reward Architecture.
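The weighing step can be sketched in a few lines. This is only an illustration of the idea described above, not the Hybrid Reward Architecture itself. It assumes each sub-AI reports a numeric preference per direction and that the master simply adds up the weighted scores. The `aggregate` function and all the numbers are hypothetical.

```python
from typing import Dict, List, Sequence

ACTIONS = ("up", "down", "left", "right")

def aggregate(preferences: Sequence[Dict[str, float]],
              weights: Sequence[float]) -> str:
    """Sum each sub-agent's weighted score per action and pick the best total.
    Directions an agent has no opinion on count as 0."""
    totals = {action: 0.0 for action in ACTIONS}
    for pref, weight in zip(preferences, weights):
        for action in ACTIONS:
            totals[action] += weight * pref.get(action, 0.0)
    return max(ACTIONS, key=lambda action: totals[action])

if __name__ == "__main__":
    # Three pellet-seekers mildly favor going right, one mildly favors left.
    pellet_agents: List[Dict[str, float]] = [
        {"right": 1.0}, {"right": 1.0}, {"right": 1.0}, {"left": 0.5},
    ]
    # A single ghost-avoider objects strongly to going right.
    ghost_agent = {"right": -10.0}
    votes = pellet_agents + [ghost_agent]
    print(aggregate(votes, [1.0] * len(votes)))
```

In this made-up case the pellet-seekers outnumber the ghost-avoider three to one on "right", but the strength of the single objection drags that direction's total below zero, so the group heads left.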
The gameplay results speak for themselves, but the company hopes to leverage this success to make future AI systems faster, more reliable and more self-sufficient. As the Microsoft Blog points out, this technique could be used by a sales team to figure out which clients need their attention most at any given time throughout the week or even day. It could also be employed to improve natural language recognition systems. But let's see it beat Contra without using the Konami code.