Facebook is using first-person videos to train future AIs

Point-of-view data will help computers see the world as we do.

Karissa Bell

One of the obvious goals of almost every computer vision project is to enable a machine to see and perceive the world as a human does. Today, Facebook began talking about Ego4D, its own effort in this space, for which it has created a vast new data set to train future models. In a statement, the company said it had recruited 13 universities across nine countries, whose researchers collected 2,200 hours of footage from 700 participants. The footage was shot from the wearer's perspective, making it suitable for training these future AI models. Kristen Grauman, Facebook’s lead research scientist, says this is the largest collection of data explicitly created for this purpose.

The footage centered on a number of common human experiences, including social interaction, hand and object manipulation, and predicting what’s going to happen next. As far as the social network is concerned, it’s a big step toward better computing experiences, which until now have relied on data sourced from a bystander’s perspective. Facebook said the data sets will be released in November “for researchers who sign Ego4D’s data use agreement.” And next year, researchers beyond this community will be challenged to better train machines to understand what exactly humans are doing in their lives.

Naturally, there is the angle that Facebook, which now has a camera-glasses partnership with Ray-Ban, is looking to improve its own capabilities in the future. You can probably already imagine the perils such potential surveillance could entail, and why anyone might feel a little leery about the announcement.