Reinforcement learning (RL) is the practice of teaching and guiding behavior with a reward system. Desirable behavior produces rewards; undesirable behavior does not. It's a common tool in machine learning, and now Alphabet's DeepMind team has used it to teach an AI to successfully navigate a parkour course.
The team wanted to see if simple rewards would work in a complex environment. They set up a virtual parkour course with drops, hurdles and ledges and set a reward for forward progress. At its most basic level, the system was as follows: the faster the AI moved across the terrain, the greater the rewards. Additional incentives and penalties were added for more complex programs.
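To make the "faster means more reward" idea concrete, here is a minimal sketch of what such a reward signal could look like. It is not the team's actual reward function: the name `step_reward`, the time step `dt`, and the `fall_penalty` term are all assumptions chosen for illustration.

```python
def step_reward(forward_velocity, dt=0.01, fell=False, fall_penalty=1.0):
    """Hypothetical per-step reward: distance covered this step,
    minus an assumed penalty if the agent fell."""
    reward = forward_velocity * dt  # faster forward movement earns more
    if fell:
        reward -= fall_penalty      # illustrative stand-in for an added penalty
    return reward


# Over the same number of steps, a faster agent collects more reward.
fast_total = sum(step_reward(2.0) for _ in range(100))
slow_total = sum(step_reward(1.0) for _ in range(100))
print(fast_total > slow_total)
```

With only the velocity term, the agent is paid purely for forward progress; terms like the assumed fall penalty are the kind of extra incentive or penalty that could be layered on for harder tasks.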
[Embedded tweet: Oriol Vinyals (@OriolVinyalsML), July 10, 2017]
All of the stick figure's navigation was learned via reinforcement learning. The AI used trial and error to figure out how to move forward as fast as possible without "terminating."
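The trial-and-error idea can be illustrated with a toy example. The sketch below uses simple random hill climbing on a single number (a movement speed), which is a far simpler stand-in and not the algorithm the team used. The environment, the `safe_speed` threshold, and every function name here are hypothetical.

```python
import random


def run_episode(speed, max_steps=50, safe_speed=3.0):
    """Toy episode: each step earns reward equal to speed.
    Exceeding safe_speed ends the episode (a stand-in for "terminating")."""
    total = 0.0
    for _ in range(max_steps):
        if speed > safe_speed:
            break
        total += speed
    return total


def trial_and_error(trials=200, seed=0):
    """Try random tweaks to the speed and keep any that score better."""
    rng = random.Random(seed)
    best_speed = 0.0
    best_return = run_episode(best_speed)
    for _ in range(trials):
        candidate = max(0.0, best_speed + rng.gauss(0.0, 0.5))
        candidate_return = run_episode(candidate)
        if candidate_return > best_return:
            best_speed, best_return = candidate, candidate_return
    return best_speed, best_return


best_speed, best_return = trial_and_error()
print(best_speed, best_return)
```

The search is never told where the limit is. It only sees the total reward of each attempt, so it creeps toward the fastest speed that does not end the episode early.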
It's clear that DeepMind's AI is finding creative ways around the obstacles it's presented with; much of the time, the movement that provides the most efficient solution isn't exactly natural-looking. That opens up possibilities for future AI, because robots don't have to restrict themselves to human-like movements to accomplish set goals. It remains to be seen whether this will influence future AI and robot development.