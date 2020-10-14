


Microsoft’s AI is now better at image captioning than humans

It’s a new milestone for AI that could genuinely help the visually impaired. 
Devindra Hardawar, @devindra
A field of wheat used to train Microsoft's AI captioning. (Image credit: Microsoft)

Describing an image accurately, and not just like a clueless robot, has long been a goal of AI research. In 2016, Google said its artificial intelligence could caption images almost as well as humans, with 94 percent accuracy. Now Microsoft says it’s gone even further: Its researchers have built an AI system that’s even more accurate than humans, so much so that it now sits at the top of the leaderboard for the nocaps image captioning benchmark.

And while that’s a notable milestone on its own, Microsoft isn’t keeping this tech to itself. It’s now offering the new captioning model as part of Azure's Cognitive Services, so any developer can bring it into their apps. It’s also available today in Seeing AI, Microsoft's app for blind and visually impaired users that can narrate the world around them. And later this year, the captioning model will also improve your presentations in PowerPoint for the web, Windows and Mac. It’ll also pop up in Word and Outlook on desktop platforms.

“[Image captioning] is one of the hardest problems in AI,” said Eric Boyd, CVP of Azure AI, in an interview with Engadget. “It represents not only understanding the objects in a scene, but how they’re interacting, and how to describe them.” Refining captioning techniques can help every user: It makes it easier to find the images you’re looking for in search engines. And for visually impaired users, it can make navigating the web and software dramatically better.

It’s not unusual to see companies tout their AI research innovations, but it’s far rarer for those discoveries to be quickly deployed to shipping products. Lijuan Wang, one of Microsoft’s principal research managers, who led the development of the new captioning model, pushed to integrate it into Azure quickly because of the potential benefits for users. Wang's team trained the model with images tagged with specific keywords, which helped give it a visual language most AI frameworks don’t have. Typically, these sorts of models are trained with images and full captions, which makes it more difficult for the models to learn how specific objects interact.

