Facebook taught a computer vision system how to supervise its own learning process

The techniques that taught AI to translate speech are being applied to visual tasks

Former Senior Editor

Thu, Mar 4, 2021, 10:00 AM

As impressively capable as AI systems are these days, teaching machines to perform various tasks, whether it’s translating speech in real time or accurately differentiating between chihuahuas and blueberry muffins, still involves some amount of hand-holding and data curation by the humans training them. However, the emergence of self-supervised learning (SSL) methods, which have already revolutionized natural language processing, could hold the key to imbuing AI with some much-needed common sense. Facebook’s AI research division (FAIR) has now, for the first time, applied SSL to computer vision training.

“We’ve developed SEER (SElf-supERvised), a new billion-parameter self-supervised computer vision model that can learn from any random group of images on the internet, without the need for careful curation and labeling that goes into most computer vision training today,” Facebook AI researchers wrote in a blog post Thursday. In SEER’s case, Facebook showed it more than a billion random, unlabeled and uncurated public Instagram images.

Under supervised learning schemes, Facebook chief AI scientist Yann LeCun told Engadget, “to recognize speech you need to label the words that were pronounced; if you want to translate you need to have parallel text. To recognize images you need to have labels for every image.”

Unsupervised learning, on the other hand, “is the idea of a problem of trying to train a system to represent images in appropriate ways, without requiring labeled images,” LeCun explained. One such method is joint embedding, wherein a neural network is presented with a pair of nearly identical images: an original and a slightly modified, distorted copy. “You train the system so that whatever vectors are produced by those two elements should be as close to each other as possible,” LeCun said. “Then, the problem is to make sure that when the system is shown two images that are different, it produces different vectors, different ‘embeddings’ as we call them. The very natural way to do this is to randomly pick millions of pairs of images that you know are different, run them through the network and hope for the best.” However, contrastive methods such as this tend to be very resource- and time-intensive given the scale of the necessary training data.
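To make the joint-embedding idea concrete, here is a minimal sketch of a contrastive objective in plain Python. It is an illustration only: the article does not say which loss Facebook uses, so the InfoNCE-style formula, the `temperature` value and the toy two-dimensional embeddings below are all assumptions chosen for readability.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def contrastive_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss (an assumed, illustrative choice).

    The loss is small when `anchor` is close to `positive` (the distorted
    copy) and far from every vector in `negatives` (different images).
    """
    pos = math.exp(cosine_similarity(anchor, positive) / temperature)
    neg = sum(math.exp(cosine_similarity(anchor, n) / temperature)
              for n in negatives)
    return -math.log(pos / (pos + neg))

# Hypothetical embeddings: an image, its distorted copy, and two other images.
original = [1.0, 0.0]
distorted = [0.9, 0.1]
others = [[0.0, 1.0], [-1.0, 0.2]]

# Correct pairing: the distorted copy is treated as the positive.
good = contrastive_loss(original, distorted, others)
# Wrong pairing: an unrelated image is treated as the positive.
bad = contrastive_loss(original, others[0], [distorted, others[1]])
```

The correct pairing yields a much lower loss than the wrong one, which is the signal a network would follow during training. The cost LeCun describes shows up in the `negatives` list: real systems need very large numbers of such comparisons.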

Applying the same SSL techniques used in NLP to computer vision poses additional challenges. As LeCun notes, semantic language concepts are easily broken up into words and discrete phrases. “But with images, the algorithm must decide which pixel belongs to which concept. Furthermore, the same concept will vary greatly between images, such as a cat in different poses or viewed from different angles,” he wrote. “We need to look at a lot of images to grasp the variation around a single concept.”

And in order for this training method to be effective, researchers needed both an algorithm flexible enough to learn from large numbers of unannotated images and a convolutional network capable of sorting through the algorithmically generated data. Facebook found the former in the recently released SwAV, which “uses online clustering to rapidly group images with similar visual concepts and leverage their similarities,” six times faster than the previous state of the art, per LeCun. The latter could be found in RegNets, a family of convolutional networks that can scale to billions (if not trillions) of parameters while being optimized to fit the available computing resources.
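The clustering idea can be sketched roughly as follows. This is a heavily simplified illustration, not SwAV's actual implementation: each view of an image is scored against a small set of cluster centers ("prototypes"), and the cluster assignment of one view is used as the target for the other. The prototypes, temperatures and embeddings are hypothetical values, and the sketch replaces SwAV's balanced assignment step with a plain sharpened softmax.

```python
import math

def softmax(values, temperature):
    """Numerically stable softmax with a temperature."""
    scaled = [v / temperature for v in values]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def prototype_scores(embedding, prototypes):
    """Dot product of an embedding with each cluster prototype."""
    return [sum(e * p for e, p in zip(embedding, proto)) for proto in prototypes]

def swapped_prediction_loss(view_a, view_b, prototypes):
    """Cross-entropy between one view's cluster code and the other's prediction.

    Simplification (assumption): codes are a sharpened softmax instead of
    the balanced assignment a real implementation would compute.
    """
    scores_a = prototype_scores(view_a, prototypes)
    scores_b = prototype_scores(view_b, prototypes)
    codes_a = softmax(scores_a, temperature=0.05)  # targets from view A
    codes_b = softmax(scores_b, temperature=0.05)  # targets from view B
    preds_a = softmax(scores_a, temperature=0.1)   # predictions from view A
    preds_b = softmax(scores_b, temperature=0.1)   # predictions from view B
    loss_ab = -sum(q * math.log(p) for q, p in zip(codes_b, preds_a))
    loss_ba = -sum(q * math.log(p) for q, p in zip(codes_a, preds_b))
    return loss_ab + loss_ba

# Two hypothetical prototypes and three toy embeddings.
prototypes = [[1.0, 0.0], [0.0, 1.0]]
cat_view_1 = [0.9, 0.1]   # one crop of an image
cat_view_2 = [0.8, 0.2]   # another crop of the same image
dog_view = [0.1, 0.9]     # a crop of a different image

same_image = swapped_prediction_loss(cat_view_1, cat_view_2, prototypes)
different_images = swapped_prediction_loss(cat_view_1, dog_view, prototypes)
```

Two views that land in the same cluster produce a low loss, while views that land in different clusters produce a high one. Because each view is compared only against the prototypes rather than against millions of other images, this style of objective avoids the pairwise comparisons that make contrastive training expensive.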

The results of this new system are quite impressive. After pre-training, the billion-parameter SEER outperformed state-of-the-art self-supervised systems on ImageNet, notching 84.2 percent top-1 accuracy. Even when it was trained using just 10 percent of the original dataset, SEER achieved 77.9 percent accuracy. And when using only 1 percent of the original dataset, SEER still managed a respectable 60.5 percent top-1 accuracy.
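For readers unfamiliar with the metric, top-1 accuracy is simply the fraction of images for which the model's single highest-scoring guess matches the true label. A minimal sketch, using made-up scores and labels rather than anything from SEER:

```python
def top1_accuracy(scores, labels):
    """Fraction of examples whose highest-scoring class equals the label."""
    correct = 0
    for row, label in zip(scores, labels):
        # Index of the largest score is the model's top-1 prediction.
        predicted = max(range(len(row)), key=row.__getitem__)
        if predicted == label:
            correct += 1
    return correct / len(labels)

# Hypothetical per-class scores for four images and three classes.
scores = [
    [0.1, 0.7, 0.2],
    [0.8, 0.1, 0.1],
    [0.3, 0.3, 0.4],
    [0.5, 0.4, 0.1],
]
labels = [1, 0, 2, 1]  # true class index for each image

accuracy = top1_accuracy(scores, labels)  # 3 of 4 correct, i.e. 0.75
```

By this measure, SEER's reported 84.2 percent means its first guess was right for roughly 84 of every 100 ImageNet test images.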

Essentially, this research shows that, as with NLP training, unsupervised learning methods can be effectively applied to computer vision applications. With that added flexibility, Facebook and other social media platforms should be better equipped to deal with banned content.

“What we'd like to have and what we have to some extent already, but we need to improve, is a universal image understanding system,” LeCun said. “So a system that, whenever you upload a photo or image on Facebook, computes one of those embeddings and from that we can tell you this is a cat picture or it is, you know, terrorist propaganda.”

As with its other AI research, LeCun’s team is releasing both its research and SEER’s training library, dubbed VISSL, under an open-source license. If you’re interested in giving the system a whirl, head over to the VISSL website for additional documentation and to grab its GitHub code.



