People online tend to communicate not just with words, but also with images. For a platform like Facebook, with over 2 billion monthly active users, that means a plethora of images get posted every day, including memes. To surface images containing text in relevant photo search results, give screen readers a way to read what's written on them, and make sure they don't contain hate speech or other words that violate the site's content policy, Facebook has created and deployed a large-scale machine learning system called "Rosetta."
Facebook needed an optical character recognition (OCR) system that could regularly process huge volumes of content, so it had to conjure up its own technology. According to the social network, Rosetta extracts text from over a billion images and video frames every day, in a wide variety of languages and in real time.
In a new blog post, the company explained how Rosetta works: it starts by detecting rectangular regions in images that potentially contain text. It then uses a convolutional neural network to recognize and transcribe what's written in those regions, including non-English words and text in non-Latin scripts such as Arabic and Hindi. To train the system, Facebook used a mixture of human- and machine-annotated public images.
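The two-stage design described above, where a detector proposes rectangular text regions and a separate recognizer transcribes each one, can be sketched in a few lines. This is a toy illustration, not Facebook's code: the function names are made up, and the real Rosetta uses trained neural models for both stages, which the simple pixel-scanning and lookup logic here merely stands in for.

```python
# Hypothetical sketch of a two-stage OCR pipeline: detect regions, then
# recognize each one. Images are 2-D grids of 0/1 pixels for simplicity.

def detect_text_regions(image):
    """Toy stand-in for the detection stage: group consecutive columns
    that contain any 'ink' (non-zero pixels) into rectangular regions,
    returned as (left, right) column spans."""
    width = len(image[0])
    inked = [any(row[c] for row in image) for c in range(width)]
    regions, start = [], None
    for c, has_ink in enumerate(inked):
        if has_ink and start is None:
            start = c                      # region begins
        elif not has_ink and start is not None:
            regions.append((start, c))     # region ends at a blank column
            start = None
    if start is not None:
        regions.append((start, width))     # region runs to the right edge
    return regions


def recognize(image, region, glyphs):
    """Toy stand-in for the CNN recognition stage: crop the region and
    match its pixel pattern against a known glyph table."""
    left, right = region
    crop = tuple(tuple(row[left:right]) for row in image)
    return glyphs.get(crop, "?")           # "?" for unrecognized shapes


# A tiny 3x7 "image" containing three glyphs separated by blank columns.
IMAGE = [
    [1, 0, 0, 1, 1, 0, 1],
    [1, 0, 0, 1, 0, 0, 1],
    [1, 0, 0, 1, 1, 0, 1],
]
GLYPHS = {
    ((1,), (1,), (1,)): "I",
    ((1, 1), (1, 0), (1, 1)): "C",
}

text = "".join(recognize(IMAGE, r, GLYPHS) for r in detect_text_regions(IMAGE))
# text == "ICI"
```

In the real system, of course, the detector and recognizer are both learned from the human- and machine-annotated images the post mentions, rather than hand-coded rules like these.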
Various teams within Facebook and Instagram are already using Rosetta to surface more content and to police their platforms. The company plans to keep growing the number of languages the system understands and to make it better at extracting text from video frames.
Speaking of languages, Facebook has also added 24 new languages to its automatic translation services, including Serbian, Belarusian, Marathi, Sinhalese, Telugu, Nepali, Kannada, Urdu, Punjabi, Cambodian, Pashto, Mongolian, Zulu, Xhosa and Somali. Facebook admits that translations for those languages are at an early stage, so they will still have a lot of errors. It plans to keep on improving them, though, and to introduce more languages in the future.