Microsoft has already demonstrated that its computer vision technology can recognize objects even better than humans; now it's moving on to the next frontier: interpreting the elements of a photo and automatically generating captions. That may not sound exciting, but being able to accurately describe an image could be essential for artificial intelligence. It's also yet another sign of the power of neural networks, computer models that try to mimic the way the human brain works.

Microsoft's technology starts by identifying everything in an image, then generates sentences about how those objects interact. For example, for the image above it came up with "A purple camera with a woman," "A woman holding a camera in a crowd," and "A woman holding a cat." Two of those sentences don't make much sense (the system somehow identified a bundle of hair as a cat), so it eventually settled on "A woman holding a camera in a crowd" as the best description of the scene.