The fight to study what happens on Facebook

And why some researchers are going after the data on their own.

Tue, Sep 7, 2021, 1:00 PM·9 min read

Facebook recently added a new report to its transparency center. The "widely viewed content" report was ostensibly meant to shed light on what’s been a long-running debate: What is the most popular content on Facebook?

The 20-page report raised more questions than answers. For example, it showed that the most viewed URL was a seemingly obscure website associated with former Green Bay Packers players. It boasted nearly 90 million views even though its official Facebook page has just a few thousand followers. The report also included URLs for e-commerce sites that seemed at least somewhat spammy, like online stores for CBD products and Bible-themed t-shirts. There was also a low-res cat GIF and several bland memes that asked people to respond with foods they like or don’t like or items they had recently purchased.

Notably absent from the report were the right-wing figures who regularly dominate the unofficial “Facebook Top 10” Twitter account, which ranks content by engagement. In fact, there wasn’t very much political content at all, a point Facebook has long been eager to prove. For Facebook, its latest attempt at “transparency” was evidence that most users’ feeds aren’t polarizing, disinformation-laced swamps but something much more mundane.

Days later, The New York Times reported that the company had prepped an earlier version of the report, but opted not to publish it. The top URL from that report was a story from the Chicago Sun-Times that suggested the death of a doctor may have been linked to the COVID-19 vaccine. Though the story was from a credible news source, it’s also the kind of story that’s often used to fuel anti-vaccine narratives.

Almost as soon as the initial report was published, researchers raised other issues. Ethan Zuckerman, an associate professor of public policy and communication at the University of Massachusetts at Amherst, called it “transparency theatre.” It was, he said, “a chance for FB to tell critics that they’re moving in the direction of transparency without releasing any of the data a researcher would need to answer a question like ‘Is extreme right-wing content disproportionately popular on Facebook?’”

The promise of ‘transparency’

For researchers studying how information travels on Facebook, it’s a familiar tactic: provide enough data to claim “transparency,” but not enough to actually be useful. “The findings of the report are debatable,” says Alice Marwick, principal researcher at the Center for Information Technology and Public Life at the University of North Carolina. “The results just didn't hold up, they don't hold up to scrutiny. They don't map to any of the ways that people actually share information.”

Marwick and other researchers have suggested that this may be because Facebook opted to slice its data in an unusual way. They have suggested that Facebook only looked for URLs that were actually in the body of a post, rather than the link previews typically shared. Or perhaps Facebook just has a really bad spam problem. Or maybe it’s a combination of the two. “There's no way for us to independently verify them … because we have no access to data compared to what Facebook has,” Marwick told Engadget.

Those concerns were echoed by Laura Edelson, a researcher at New York University. “No one else can replicate or verify the findings in this report,” she wrote in a tweet. “We just have to trust Facebook.” Notably, Edelson has her own experience running into the limits of Facebook’s push for “transparency.”

The company recently shut down her personal Facebook account, as well as those of several NYU colleagues, in response to their research on political ad targeting on the platform. Since Facebook doesn’t make targeting data available in its ad library, the researchers recruited volunteers to install a browser extension that could scoop up advertising info based on their feeds.

Facebook called it “unauthorized scraping,” saying it ran afoul of its privacy policies. In doing so, it cited its obligation to the FTC, a justification the agency later said was “misleading.” Outside groups had vetted the project and confirmed it was only gathering data about advertisers, not users’ personal data. Guy Rosen, the company’s VP of Integrity, later said that even though the research was “well-intentioned” it posed too great a privacy risk. Edelson and others said Facebook was trying to silence research that could make the company look bad. “If this episode demonstrates anything it is that Facebook should not have veto power over who is allowed to study them,” she wrote in a statement.

Rosen and other Facebook execs have said that Facebook does want to make more data available to researchers, but that they need to go through the company’s official channels to ensure the data is made available in a “privacy protected” way. The company has a platform called FORT (Facebook Open Research and Transparency), which allows academics to request access to some types of Facebook data, including election ads from 2020. Earlier this year, the company said it would expand the program to make more info available to researchers studying “fringe” groups on the platform.

But while Facebook has billed FORT as yet another step in its efforts to provide “transparency,” those who have used FORT have cited shortcomings. A group of researchers at Princeton hoping to study election ads ultimately pulled the project, citing Facebook’s restrictive terms. They said Facebook pushed a “strictly non-negotiable” agreement that required them to submit their research to Facebook for review prior to publishing. Even more straightforward questions about how they were permitted to analyze the data were left unanswered.

“Our experience dealing with Facebook highlights their long running pattern of misdirection and doublespeak to dodge meaningful scrutiny of their actions,” they wrote in a statement describing their experience.

A Facebook spokesperson said the company only checks for personally identifiable information, and that it’s never rejected a research paper.

“We support hundreds of academic researchers at more than 100 institutions through the Facebook Open Research and Transparency project,” Facebook’s Chaya Nayak, who heads up FORT at Facebook, said in a statement. “Through this effort, we make massive amounts of privacy-protected data available to academics so they can study Facebook’s impact on the world. We also pro-actively seek feedback from the research community about what steps will help them advance research most effectively going forward.”

Data access affects researchers’ ability to study Facebook’s biggest problems. And the pandemic has further highlighted just how significant that work can be. Facebook’s unwillingness to share more data about vaccine misinformation has been repeatedly called out by researchers and public health officials. It’s all the more vexing because Facebook employs a small army of its own researchers and data scientists. Yet much of their work is never made public. “They have a really solid research team, but virtually everything that research team does is kept only within Facebook, and we never see any of it,” says Marwick, the UNC professor.

But much of Facebook’s internal research could help those outside the platform who are trying to understand the same questions, she says. “I want more of the analysis and research that's going on within Facebook to be communicated to the larger scholarly community, especially stuff around polarization [and] news sharing. I have a fairly strong sense that there's research questions that are actively being debated in my research community that Facebook knows the answer to, but they can't communicate it to us.”

The rise of ‘data donation’

To get around this lack of access, researchers are increasingly looking to “data donation” programs. Like the browser extension used by the NYU researchers, these projects recruit volunteers to “donate” some of their own data for research.

NYU’s Ad Observer, for example, collected data about ads on Facebook and YouTube, with the goal of helping researchers understand the platforms’ ad targeting at a more granular level. Similarly, Mozilla, maker of the Firefox browser, has a browser add-on called Rally that helps researchers study a range of issues from COVID-19 misinformation to local news. The Markup, a nonprofit news organization, has also created Citizen Browser, a customized browser that aids journalists’ investigations into Facebook and YouTube. (Unlike Mozilla and NYU’s browser-based projects, The Markup pays users who participate in Citizen Browser.)

“The biggest single problem in our research community is the lack of access to private proprietary data,” says Marwick. “Data donation programs are one of the tactics that people in my community are using to try to get access to data, given that we know the platforms aren't going to give it to us.”

Crucially, it’s also data that’s collected independently, and that may be the best way to ensure true transparency, says Rebecca Weiss, who leads Mozilla’s Rally project. “We keep getting these good faith transparency efforts from these companies but it's clear that transparency also means some form of independence,” Weiss tells Engadget.

For social media users, these programs offer a way to make sure some of their data, which is constantly being scooped up by mega-platforms like Facebook, can also be used in a way that is within their control: to aid in research. Weiss says that, ultimately, it’s not that different from market research or other public science projects. “This idea of donating your time to a good faith effort — these are familiar concepts.”

Researchers also point out that there are significant benefits to gaining a better understanding of how the most influential and powerful platforms operate. The study of election ads, for example, can expose bad actors trying to manipulate elections. Knowing more about how health misinformation spreads can help public health officials understand how to combat vaccine hesitancy. Weiss notes that having a better understanding of why we see the ads we do — political or otherwise — can go a long way toward demystifying how social media platforms operate.

“This affects our lives on a daily basis and there's not a lot of ways that we as consumers can prepare ourselves for the world that exists with these increasingly more powerful ad networks that have no transparency.”
