You probably haven't heard of YouTube-8M, but it's a big deal for anyone working in the field of machine learning. In short, it's a large database of labeled video content that programmers can use to test out their algorithms. Today, Google announced that YouTube-8M is getting a major update, with even more labels across more of its videos, as well as audio elements. The company is also aiming to make the dataset even better with a Kaggle competition, which will offer big bucks from a $100,000 prize pool to teams who build the best algorithms for tagging around 700,000 new videos (using the 8M dataset for training).
"The dataset was created from over 7 million YouTube videos (450,000 hours of video) and includes video labels from a vocabulary of 4716 classes (3.4 labels/video on average," Google wrote on the competition page. "It also comes with pre-extracted audio & visual features from every second of video (3.2B feature vectors in total)."
Google says it'll announce the winning teams at the YouTube-8M Workshop held during the IEEE Conference on Computer Vision and Pattern Recognition in July. With up to $30,000 awarded per team, there's a good chance Google will end up attracting some eager developers. The company is also offering some free Google Cloud credits to early participants.
While the results of the competition won't directly affect consumers for a while, Google software engineer Paul Natsev notes that whatever they learn will be useful across many different types of videos. Hopefully, that could lead to better searching and content filtering down the line on YouTube.