Yahoo releases massive 13.5TB web-browsing data set to researchers

The anonymous data will help create better recommendation engines.

Yahoo's business may be struggling, but millions of people still visit its site to read the news every day. That gives the company unique insights into browsing and reading habits, and today the company has released a huge swath of that data. The "Yahoo News Feed dataset" incorporates anonymous browsing habits of 20 million users between February and May of 2015 across a variety of Yahoo properties, including its home page, main news site, Yahoo Sports, Yahoo Finance, Yahoo Movies and Yahoo Real Estate.

All told, the data set is a whopping 13.5TB and covers 110 billion unique interaction "events." Yahoo calls it the "largest machine learning dataset" ever publicly released, and we're inclined to believe them -- there aren't very many companies that could accumulate this much browsing data.

It's a huge amount of data, but fortunately you don't need to worry about advertisers mining it to make more targeted ads. Yahoo is specifically releasing it only to the academic research community to help people build more effective recommendation algorithms. As noted by the MIT Technology Review, the data set includes headlines that Yahoo's personalization algorithms show to visitors, a summary of the article, and which specific articles people click. There's also some demographic data for about 7 million users that includes age, gender and location -- but it's all been anonymized.

Improving recommendation algorithms is particularly relevant right now, as some of the biggest web properties depend on good recommendation engines to engage with their users. Netflix, Amazon, Google, Apple and Facebook (just to name a few) all rely on serving their users relevant recommendations to keep them engaged with their products and services. Yes, it's a way for those companies to make more money, but it also generally makes for a better user experience -- as long as those recommendations are good. Yahoo's huge data release will probably go a long way towards meeting that goal.

[Image credit: Noah Berger/Bloomberg via Getty Images]

Engadget’s parent company, Verizon, now owns Yahoo. Engadget remains editorially independent.
