The team wanted to see whether simple rewards would work in a complex environment. They set up a virtual parkour course with drops, hurdles and ledges, and rewarded forward progress. At its most basic, the system worked as follows: the faster the AI moved across the terrain, the greater the reward. Additional incentives and penalties were added for more complex programs.
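To make the reward idea concrete, here is a minimal sketch in Python. It is not DeepMind's actual reward function: the name `step_reward`, the `speed_weight` parameter and the way bonuses and penalties are combined are all assumptions chosen for illustration.

```python
def step_reward(forward_velocity, bonuses=0.0, penalties=0.0, speed_weight=1.0):
    """Hypothetical per-step reward (illustrative only, not DeepMind's formula).

    The base term grows with forward speed; optional bonuses and penalties
    stand in for the extra terms mentioned in the article.
    """
    return speed_weight * forward_velocity + bonuses - penalties


# Moving faster earns more reward than moving slowly.
slow = step_reward(forward_velocity=1.0)
fast = step_reward(forward_velocity=2.0)
```

Under this sketch, an agent that covers ground faster always scores higher on the base term, and any extra terms simply shift the total up or down.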
The stick figure learned all of its navigation through reinforcement learning. The AI used trial and error to figure out how to move forward as fast as possible without "terminating."
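The trial-and-error idea can be illustrated with a toy example. The sketch below is a simple random search over a single made-up "stride" setting, not the reinforcement learning algorithm DeepMind used. The course, the `MAX_SAFE_STRIDE` limit and every function name are hypothetical.

```python
import random

MAX_SAFE_STRIDE = 1.0   # hypothetical: longer strides make the toy agent fall
STEPS_PER_EPISODE = 10  # hypothetical episode length


def run_episode(stride):
    """Return total forward progress for one attempt on a toy 1-D course."""
    if stride <= 0.0 or stride > MAX_SAFE_STRIDE:
        return 0.0  # the attempt "terminates" at once and earns nothing
    return stride * STEPS_PER_EPISODE


def learn_by_trial_and_error(trials=200, seed=0):
    """Try random tweaks to the stride and keep only those that score higher."""
    rng = random.Random(seed)
    best_stride = 0.1
    best_return = run_episode(best_stride)
    for _ in range(trials):
        candidate = best_stride + rng.uniform(-0.2, 0.2)
        candidate_return = run_episode(candidate)
        if candidate_return > best_return:
            best_stride, best_return = candidate, candidate_return
    return best_stride, best_return


stride, total = learn_by_trial_and_error()
```

The toy agent is never told where the safe limit is. It only sees that attempts beyond the limit earn nothing, so the stride it settles on creeps toward the limit without crossing it.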
It's clear that DeepMind's AI finds creative solutions to the obstacles it's presented with; much of the time, the movement that provides the most efficient solution isn't exactly natural looking. This opens up interesting possibilities, because robots don't have to restrict themselves to human-like movements in order to accomplish set goals. It will be interesting to see whether this affects future AI and robot development.