Facebook has long been using AI to describe photos for the visually impaired, but it’s stepping up its efforts in 2021. The social media giant has detailed a new version of automatic alternative text (AAT) that promises much more information.
Instead of relying on heavily supervised AI learning, Facebook is now using weak supervision based on “billions” of Instagram photos and their hashtags. The method lets Facebook expand from just 100 concept descriptions to more than 1,200, covering things like different kinds of food and national monuments. It’s also more culturally inclusive: it can recognize weddings that don’t involve white wedding dresses, for example.
A new object detection system can also recognize where people are in the frame as well as the number of people in the scene. And while you’ll normally get a simple summary of a photo’s contents, you’ll have the choice of a detailed description that outlines the position, size and nature of objects.
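To make the idea concrete, here is a minimal sketch of how detected objects could be turned into a people count and rough position phrases. It is not Facebook's implementation: the `(label, box)` input format, the normalized box coordinates, the three-by-three grid and the function names are all assumptions chosen for illustration.

```python
# Hypothetical sketch: turn object detections into a people count and position phrases.
# Boxes are assumed to be (x_min, y_min, x_max, y_max), normalized to the range 0..1.

def position(box):
    """Name the cell of a 3x3 grid that contains the box's center."""
    x_min, y_min, x_max, y_max = box
    cx = (x_min + x_max) / 2
    cy = (y_min + y_max) / 2
    horiz = "left" if cx < 1 / 3 else "right" if cx > 2 / 3 else "center"
    vert = "top" if cy < 1 / 3 else "bottom" if cy > 2 / 3 else "middle"
    return f"{vert} {horiz}"

def count_people(detections):
    """Count detections labeled 'person'."""
    return sum(1 for label, _ in detections if label == "person")

def detailed_description(detections):
    """One line per detected object, stating where it sits in the frame."""
    return [f"{label}: {position(box)}" for label, box in detections]

detections = [
    ("person", (0.0, 0.2, 0.3, 0.9)),
    ("person", (0.7, 0.2, 1.0, 0.9)),
    ("cake", (0.4, 0.7, 0.6, 0.9)),
]
print(count_people(detections))
for line in detailed_description(detections):
    print(line)
```

A real system would also report object size and merge this with the short summary; the sketch only covers counting and position.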
The descriptions are still imperfect. Facebook stressed that AAT continues to hedge with “may be” and errs on the side of excluding concepts it can’t confidently recognize. It’s a big leap for accessibility on the social network, though, and could ensure that more visually impaired people at least get the gist of photos in their feeds.
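The cautious behavior described above can be pictured as a confidence cutoff. The sketch below is an illustration only: the `Concept` class, the 0.8 threshold, the fallback message and the exact wording of the output are assumptions, not details Facebook has published.

```python
# Hypothetical sketch: keep only confidently recognized concepts and hedge the result.
from dataclasses import dataclass

@dataclass
class Concept:
    label: str
    confidence: float  # model score, assumed to lie in [0, 1]

def describe(concepts, threshold=0.8):
    """Build hedged alt text from concepts at or above the (assumed) threshold."""
    ranked = sorted(concepts, key=lambda c: c.confidence, reverse=True)
    kept = [c.label for c in ranked if c.confidence >= threshold]
    if not kept:
        # Assumed fallback when nothing clears the threshold.
        return "No description available."
    return "May be an image of " + ", ".join(kept) + "."

concepts = [
    Concept("2 people", 0.95),
    Concept("outdoors", 0.90),
    Concept("cake", 0.40),  # below the threshold, so it is left out
]
print(describe(concepts))
```

Raising the threshold trades coverage for accuracy: fewer concepts appear in the text, but the ones that do are less likely to be wrong.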