Dubbed ADEPT, the system can, like a human being, understand some laws of physics intuitively. It can look at an object in a video, predict how it should behave based on what it knows of the laws of physics, and then register surprise if that object subsequently vanishes or teleports. The team behind ADEPT says its model will allow other researchers to create smarter AIs in the future, as well as give us a better understanding of how infants understand the world around them.
"By the time infants are three months old, they have some notion that objects don't wink in and out of existence, and can't move through each other or teleport," said Kevin A. Smith, one of the researchers that created ADEPT. "We wanted to capture and formalize that knowledge to build infant cognition into artificial-intelligence agents. We're now getting near human-like in the way models can pick apart basic implausible or plausible scenes."
ADEPT depends on two modules to do what it does. The first examines an object, determining its shape, pose and velocity. What's interesting about this module is that it doesn't get caught up in details. It only looks at the approximate geometry of something, rather than analyzing every facet of it, before it moves on to the next step. This was by design, according to the ADEPT team; it allows the system to predict the movement of a variety of different objects, not just ones it was trained to understand. Moreover, it's an aspect of the system's design that makes it similar to infants. It turns out that children, like ADEPT, don't care much about the specific physical properties of something when they're thinking about how it may move.
The second module is a physics system. It shares similarities with the software video game developers employ to replicate real-world physics in their games. It takes the data captured by the first module and simulates how an object should act based on the laws of physics. Once it has a couple of predicted outcomes, it will compare those against the next frames of a video. If it notices a discrepancy between what it thought would happen and what actually occurred, it will send out a signal. The stronger the signal, the more surprised it was by what just happened. Notably, ADEPT's level of surprise matched that of humans who were shown the same set of videos.
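To make the predict-then-compare idea concrete, here is a toy sketch in Python. It is not ADEPT's actual code: the names `ObjectState`, `predict_outcomes`, `surprise` and `VANISH_SURPRISE`, the straight-line motion model and the distance-based score are all assumptions chosen for illustration. It takes a rough object state (position and velocity), rolls it forward into a few candidate outcomes, and reports how far the observed next frame falls from the closest one.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]

# Hypothetical fixed score for an object that disappears entirely.
VANISH_SURPRISE = 10.0


@dataclass
class ObjectState:
    """Coarse description of an object: position and velocity only."""
    x: float
    y: float
    vx: float
    vy: float


def predict_outcomes(
    state: ObjectState,
    dt: float = 1.0,
    speed_scales: Tuple[float, ...] = (0.9, 1.0, 1.1),
) -> List[Point]:
    """Roll the object forward in a straight line at a few slightly
    different speeds, giving a handful of plausible next positions."""
    return [
        (state.x + state.vx * s * dt, state.y + state.vy * s * dt)
        for s in speed_scales
    ]


def surprise(predictions: List[Point], observed: Optional[Point]) -> float:
    """Return a surprise signal: the distance from the observation to the
    nearest prediction, or VANISH_SURPRISE if nothing was observed."""
    if observed is None:
        return VANISH_SURPRISE
    return min(
        math.hypot(observed[0] - px, observed[1] - py)
        for px, py in predictions
    )


if __name__ == "__main__":
    ball = ObjectState(x=0.0, y=0.0, vx=1.0, vy=0.0)
    outcomes = predict_outcomes(ball)
    print(surprise(outcomes, (1.0, 0.0)))  # ball where expected: low surprise
    print(surprise(outcomes, (5.0, 0.0)))  # ball "teleported": higher surprise
    print(surprise(outcomes, None))        # ball vanished: highest surprise
```

In this sketch, a ball that lands where one of the predictions said it would produces a signal of zero, while a ball that jumps ahead or disappears produces a larger one, mirroring the "stronger signal, more surprise" behavior described above.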
Moving forward, the team says they want to further explore how young children see the world, and incorporate those findings into their model. "We want to see what else needs to be built in to understand the world more like infants, and formalize what we know about psychology to build better AI agents," Smith said.