Yahoo releases massive 13.5TB web-browsing data set to researchers

The anonymous data will help create better recommendation engines.
Nathan Ingraham
January 14, 2016

Yahoo's business may be struggling, but millions of people still visit its site to read the news every day. That gives the company unique insights into browsing and reading habits, and today the company has released a huge swath of that data. The "Yahoo News Feed dataset" incorporates anonymous browsing habits of 20 million users between February and May of 2015 across a variety of Yahoo properties, including its home page, main news site, Yahoo Sports, Yahoo Finance, Yahoo Movies and Yahoo Real Estate.

All told, the data set is a whopping 13.5TB and covers 110 billion unique interaction "events." Yahoo calls it the "largest machine learning dataset" ever publicly released, and we're inclined to believe them -- few companies could accumulate this much browsing data.

It's a huge amount of data, but fortunately you don't need to worry about advertisers mining it to make more targeted ads. Yahoo is specifically releasing it only to the academic research community to help people build more effective recommendation algorithms. As noted by the MIT Technology Review, the data set includes headlines that Yahoo's personalization algorithms show to visitors, a summary of the article, and which specific articles people click. There's also some demographic data for about 7 million users that includes age, gender and location -- but it's all been anonymized.

Improving recommendation algorithms is particularly relevant right now, as some of the biggest web properties depend on good recommendation engines to engage their users. Netflix, Amazon, Google, Apple and Facebook (just to name a few) all rely on serving relevant recommendations to keep people engaged with their products and services. Yes, it's a way for those companies to make more money, but it also generally makes for a better user experience -- as long as those recommendations are good. Yahoo's huge data release will probably go a long way towards meeting that goal.

[Image credit: Noah Berger/Bloomberg via Getty Images]

Engadget’s parent company, Verizon, now owns Yahoo. Engadget remains editorially independent.
