Meta's newest AI fairness benchmark measures even more granular bias markers

The dataset includes more than 26,000 videos recorded by 5,600 subjects in seven countries.


As a white man in America with no discernible regional accent, I can simply assume that modern consumer technologies — virtual assistants like Siri, Alexa or Assistant, and my phone’s camera — will work seamlessly out of the box. I assume this because, well, they do. That’s largely because the nerds who design and program these devices overwhelmingly both look and sound just like me — if anything, a little whiter. Folks with more melanin in their skin and extra twang on their tongue don’t enjoy that same privilege.

Tomorrow’s chatbots and visual AIs will only serve to exacerbate this bias unless steps are taken today to ensure a benchmark standard of fairness and equitable behavior from these systems. To address that issue, Meta AI researchers developed and released the Casual Conversations dataset in 2021, designed to “help researchers evaluate their computer vision and audio models for accuracy across a diverse set of age, genders, apparent skin tones and ambient lighting conditions.” On Thursday, the company unveiled Casual Conversations v2, which promises even more granular classification categories than its predecessor.

The original CC dataset included 45,000 videos from more than 3,000 paid subjects, annotated across a range of ages, genders, apparent skin tones and lighting conditions. These videos are designed to be used by other AI researchers, specifically those working with generative AIs like ChatGPT or visual AIs like those used in social media filters and facial recognition features, to help them ensure that their creations behave the same whether the user looks like Anya Taylor-Joy or Lupita Nyong’o, whether they sound like Colin Firth or Colin Quinn.

Since Casual Conversations first debuted two years ago, Meta has worked “in consultation with internal experts in fields such as civil rights,” according to Thursday’s announcement, to expand and improve upon the dataset. Professor Pascale Fung, director of the Centre for AI Research, as well as other researchers from the Hong Kong University of Science and Technology, participated in the literature review of government and industry data to establish the new annotation categories.

Version 2 now includes 11 categories (seven self-reported and four researcher-annotated) and 26,467 video monologues recorded by nearly 5,600 subjects in seven countries — Brazil, India, Indonesia, Mexico, Vietnam, the Philippines and the US. While the new dataset contains fewer individual videos than the original, they are far more heavily annotated. As Meta points out, the first iteration only had a handful of categories: “age, three subcategories of gender (female, male, and other), apparent skin tone and ambient lighting,” according to the Thursday blog post.

“To increase nondiscrimination, fairness, and safety in AI, it’s important to have inclusive data and diversity within the data categories so researchers can better assess how well a specific model or AI-powered product is working for different demographic groups,” Roy Austin, Vice President and Deputy General Counsel for Civil Rights at Meta, said in the release. “This dataset has an important role in ensuring the technology we build has equity in mind for all from the outset.”

As with most of its public AI research to date, Meta is releasing Casual Conversations v2 as an open source dataset for anyone to use and expand upon — perhaps to include markers such as “disability, accent, dialect, location, and recording setup,” as the company hinted on Thursday.