dataset

Latest

  • A screenshot of mislabeled images from ImageNet, a dataset used to test machine learning systems. In one instance, the dataset applied a "nipple" label to a photo of a baby.

    MIT study finds labelling errors in datasets used to test AI

    by 
    Kris Holt
    03.29.2021

    Over three percent of data in the most-cited datasets was deemed inaccurate or mislabeled.

  • Ford self-driving vehicle visualization

    Ford shares a year's worth of self-driving car data

    by 
    Christine Fisher
    05.05.2020

    Ford is releasing a comprehensive self-driving dataset to help academics and researchers.

  • Amelie-Benoist via Getty Images

    Google helps scientists search for public data

    by 
    Jon Fingas
    09.05.2018

    There's a sea of open research data available on the web, but it can be time-consuming to sift through the sites that host it to get at the data -- and it's not always presented in an easy-to-parse format. Google hopes it can make that information more accessible to scientists, journalists and plain old data junkies with its new Dataset Search feature. The tool provides more direct access to data presented in an open standard that makes it clear who created the info, how it was collected and how you're allowed to use it. You could not only track down climate data for a report, but also make sure that it's relevant and legal to use.

  • AOL

    Wikipedia explains how those late-night reading binges happen

    by 
    Mariella Moon
    01.18.2018

    Everybody's prone to falling down a Wikipedia rabbit hole, clicking link after link until hours have passed since you started your journey. Now the foundation has begun releasing monthly data dumps for the English, Russian, German, Spanish and Japanese Wikipedias that can give you a better understanding of how readers navigate from one article to the next. The Wikimedia Analytics team worked on releasing the datasets every month after seeing how a similar set of data released in 2015 led to a number of scholarly research studies.

  • Google Research

    Google releases massive visual databases for machine learning

    by 
    Richard Lawler
    10.01.2016

    It seems like we hear about a new machine learning breakthrough nearly every day, but the work behind them isn't easy. To fine-tune algorithms that recognize and predict patterns in data, you need to feed them massive amounts of already-tagged information to test and learn from. For researchers, that's where two recently released archives from Google come in. Joining other high-quality datasets, Open Images and YouTube-8M provide millions of annotated links for researchers to train their models on.

  • White House releases early test code for Data.gov platform, moves closer to open source reality

    by 
    Amar Toor
    12.06.2011

    The White House's Open Government Partnership inched closer to maturity last week with the release of a new open data platform designed to help other governments set up their own Data.gov portals. On Wednesday, Data.gov developer Chris Musialek posted the first pieces of early test code for the unfortunately named "Data.gov-in-a-box" -- an open source version of the US and Indian governments' respective data portals. Both countries have been working on the platform since August, with the Obama administration pledging some $1 million to the effort. The idea, according to federal CIO Steve VanRoekel and federal CTO Aneesh Chopra, is to encourage "governments around the world to stand up open data sites that promote transparency, improve citizen engagement, and engage application developers," using Data.gov (and its 400,000 datasets) as a blueprint. Wednesday's release is just the first step in that plan, with the finalized Open Government Platform (OGPL) slated for launch by early next year.