Scientists release personal data for 70,000 OkCupid profiles (updated)

Though publicly available on individual profiles, the data was collected without permission from either the users or the dating site.


A group of Danish researchers scraped data from 70,000 OkCupid profiles, packaged it into a data set, and released it on the internet. While the profiles are technically public, collecting personal information on such a massive scale without obtaining consent from either OkCupid or the users themselves is, at the very least, a breach of social science ethics, experts say.

The researchers, Emil Kirkegaard and Julius Daugbjerg Bjerrekær, used software to scrape the profiles automatically and then uploaded the resulting data set to the Open Science Framework, a forum and repository where scientists share data. The data is only partially anonymized: while no real names are included, usernames are linked to each user's location and to their answers to the long list of personal questions OkCupid uses to gauge compatibility. Some of those answers, such as political leanings or feelings about homosexuality, are highly private.

As Kirkegaard repeatedly stated on Twitter, the data was indeed publicly available. But the scraping violates the dating site's terms of service and may be a legal matter, an OkCupid spokesperson told Vox. And, as Vox points out, it is also a breach of ethics according to the American Psychological Association, which states that people involved in research studies have the right to give consent. Even Aarhus University in Denmark, where Kirkegaard is a student, publicly distanced itself from him and noted that the profile data was not collected on behalf of the university.

OkCupid is no stranger to mining its users' data and publishing its observations, as it did on its now-defunct OkTrends blog, but there are crucial differences. First, those posts were summaries, not massive data sets containing identifiable information. Second, and more importantly, OkCupid users consent, when they sign up, to the dating site mining their profiles and activity.

Update: The article previously mentioned a third researcher named in the paper, who was listed as a contributor for creating the scraping software years ago. This person claims not to have been involved with the research whatsoever and was included in the paper without their knowledge.