Facebook needed an optical character recognition system that could regularly process huge volumes of content, so it built its own technology, called Rosetta. According to the social network, Rosetta extracts text from over a billion images and video frames in a wide variety of languages every day, in real time.
In a new blog post, the company explained how Rosetta works: it starts by detecting rectangular regions in images that potentially contain text. It then uses a convolutional neural network to recognize and transcribe what's written in each region, including text in languages other than English and in non-Latin scripts, such as Arabic and Hindi. To train the system, Facebook used a mixture of human- and machine-annotated public images.
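The two-stage flow described above (find candidate regions, then transcribe each one) can be illustrated with a minimal Python sketch. This is not Facebook's code: the names `Box`, `extract_text`, `toy_detect` and `toy_recognize` are hypothetical, the "image" is just a grid of numbers, and the two toy functions are simple stand-ins for the trained detection model and the convolutional recognition network.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Image = Sequence[Sequence[int]]  # toy grayscale image: rows of pixel values


@dataclass(frozen=True)
class Box:
    """Rectangular region; right and bottom are exclusive."""
    left: int
    top: int
    right: int
    bottom: int


def crop(image: Image, box: Box) -> List[List[int]]:
    """Cut the pixels inside the box out of the image."""
    return [list(row[box.left:box.right]) for row in image[box.top:box.bottom]]


def extract_text(
    image: Image,
    detect: Callable[[Image], List[Box]],
    recognize: Callable[[List[List[int]]], str],
) -> List[Tuple[Box, str]]:
    """Stage 1 proposes regions; stage 2 transcribes each cropped region."""
    return [(box, recognize(crop(image, box))) for box in detect(image)]


def _tight_box(image: Image, top: int, bottom: int) -> Box:
    """Smallest box covering all non-zero pixels in rows top..bottom-1."""
    cols = [x for row in image[top:bottom] for x, v in enumerate(row) if v]
    return Box(min(cols), top, max(cols) + 1, bottom)


def toy_detect(image: Image) -> List[Box]:
    """Stand-in detector (assumption): one box per run of rows with any ink."""
    boxes, start = [], None
    for y, row in enumerate(image):
        has_ink = any(row)
        if has_ink and start is None:
            start = y
        elif not has_ink and start is not None:
            boxes.append(_tight_box(image, start, y))
            start = None
    if start is not None:
        boxes.append(_tight_box(image, start, len(image)))
    return boxes


def toy_recognize(region: List[List[int]]) -> str:
    """Stand-in recognizer (assumption): a real system would run a CNN here."""
    ink = sum(1 for row in region for v in row if v)
    return f"<{ink} ink pixels>"
```

Calling `extract_text(image, toy_detect, toy_recognize)` returns one `(Box, str)` pair per detected region. Passing the two stages in as functions keeps them separate, so in principle either one could be replaced without touching the other.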
Various teams within Facebook and Instagram are already using Rosetta to surface more content and to police their platforms. The company plans to keep expanding the number of languages Rosetta can understand and to make it better at extracting text from video frames.
Speaking of languages, Facebook has also added 24 new languages to its automatic translation services, including Serbian, Belarusian, Marathi, Sinhalese, Telugu, Nepali, Kannada, Urdu, Punjabi, Cambodian, Pashto, Mongolian, Zulu, Xhosa and Somali. Facebook admits that translations for those languages are at an early stage, so they will still have a lot of errors. It plans to keep on improving them, though, and to introduce more languages in the future.