Image credit: Urupong via Getty Images

Facebook's 'Rosetta' AI can extract text from a billion images daily

The social network's automatic translation feature now also works with 24 new languages.

People online communicate not just with words, but also with images. For a platform like Facebook, which has over 2 billion monthly active users, that means a huge number of images, memes included, get posted every day. Facebook wants to include images that contain text in relevant photo search results, give screen readers a way to read what's written on them and make sure they don't contain hate speech or other words that violate the website's content policy. To do all that, it has created and deployed a large-scale machine learning system called "Rosetta."

Facebook needed an optical character recognition system that can regularly process huge volumes of content, so it built its own. According to the social network, Rosetta extracts text from over a billion images and video frames every day, in real time and in a wide variety of languages.
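To put that figure in perspective, here is a quick back-of-the-envelope calculation. It assumes the load is spread evenly across the day, which Facebook has not stated, and it treats "over a billion" as exactly one billion, so the result is a lower bound on the average rate rather than a measured number. The function name is ours, purely for illustration.

```python
# Rough average throughput implied by "a billion images a day".
# Assumption: traffic is spread evenly over 24 hours (real traffic will have peaks).
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def average_rate_per_second(items_per_day: int) -> float:
    """Return the mean number of items processed per second."""
    return items_per_day / SECONDS_PER_DAY


rate = average_rate_per_second(1_000_000_000)
print(f"{rate:,.0f} images per second on average")  # prints "11,574 images per second on average"
```

In other words, the system has to handle more than 11,000 images or video frames every second on average, and presumably more at peak times.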

In a new blog post, the company explained how Rosetta works. It starts by detecting rectangular regions in images that potentially contain text. It then uses a convolutional neural network to recognize and transcribe what's written in each region, including non-English words and text in non-Latin scripts, such as those used for Arabic and Hindi. To train the system, Facebook used a mixture of human- and machine-annotated public images.
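The two-step flow described above (find candidate regions, then transcribe each one) can be sketched in a few lines of Python. This is only an illustration of the pipeline's shape, not Facebook's code: every name here (`Box`, `TextRegion`, `extract_text`, `toy_detect`, `toy_recognize`) is hypothetical, and the two "models" are trivial stand-ins that treat each non-zero pixel as a character code. A real system would plug trained neural networks into the same two slots.

```python
from dataclasses import dataclass
from typing import Callable, List

# Toy image format (an assumption for this sketch): a grid of integers,
# where 0 is background and any other value is the code of a character.
Image = List[List[int]]


@dataclass(frozen=True)
class Box:
    x: int
    y: int
    width: int
    height: int


@dataclass(frozen=True)
class TextRegion:
    box: Box
    text: str


def crop(image: Image, box: Box) -> Image:
    """Cut the rectangular region described by `box` out of the image."""
    rows = image[box.y : box.y + box.height]
    return [row[box.x : box.x + box.width] for row in rows]


def extract_text(
    image: Image,
    detect: Callable[[Image], List[Box]],
    recognize: Callable[[Image], str],
) -> List[TextRegion]:
    """Stage 1: detect candidate boxes. Stage 2: transcribe each cropped box."""
    regions = []
    for box in detect(image):
        text = recognize(crop(image, box))
        if text:  # drop boxes where nothing was recognized
            regions.append(TextRegion(box, text))
    return regions


def toy_detect(image: Image) -> List[Box]:
    """Stand-in detector: one full-width box per run of rows containing 'ink'."""
    boxes, start = [], None
    for y, row in enumerate(image):
        has_ink = any(row)
        if has_ink and start is None:
            start = y
        elif not has_ink and start is not None:
            boxes.append(Box(0, start, len(row), y - start))
            start = None
    if start is not None:  # a run that reaches the bottom edge
        boxes.append(Box(0, start, len(image[0]), len(image) - start))
    return boxes


def toy_recognize(patch: Image) -> str:
    """Stand-in recognizer: read each non-zero pixel as a character code."""
    return "".join(chr(value) for row in patch for value in row if value)


image = [
    [0, 0, 0, 0],
    [72, 105, 0, 0],  # "Hi"
    [0, 0, 0, 0],
    [79, 75, 0, 0],   # "OK"
]
for region in extract_text(image, toy_detect, toy_recognize):
    print(region.box, region.text)
```

Keeping detection and recognition as separate, swappable functions mirrors the split the blog post describes: each stage can be trained, tested and replaced on its own.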

Various teams within Facebook and Instagram are already using Rosetta to surface more content and to police their platforms. The company plans to keep expanding the number of languages Rosetta can understand and to make it better at extracting text from video frames.

Speaking of languages, Facebook has also added 24 new languages to its automatic translation services, including Serbian, Belarusian, Marathi, Sinhalese, Telugu, Nepali, Kannada, Urdu, Punjabi, Cambodian, Pashto, Mongolian, Zulu, Xhosa and Somali. Facebook admits that translations for those languages are at an early stage, so they will still have a lot of errors. It plans to keep on improving them, though, and to introduce more languages in the future.

Source: Facebook (1), (2)