
Image credit: America's Got Talent

YouTube automates sound effect captions with AI

Its AI can detect laughter, applause and music for the deaf or hard of hearing.

YouTube has used algorithms to automatically caption speech for eight years now in an effort to make its billions of videos more accessible for the deaf and hard of hearing. While the feature was pretty rough at first, YouTube has significantly improved it over time, getting "closer and closer to human transcription error rates," Google said on its developers blog. Since speech is just one part of the audio picture, though, YouTube has launched automatic sound effect captioning for the first time.

For now, the system can show just three classes of sounds: applause, music and laughter. "These were among the most frequent manually captioned sounds, and they can add meaningful context for viewers who are deaf and hard of hearing," the company wrote.

As with the automatic captions, Google uses machine learning to pick out sounds and display them as text. It developed a "deep neural network (DNN)" model for ambient sound, and trained it with "thousands of hours of videos" to get the best results. The toughest part, it wrote in a technical blog, was separating and displaying events that tend to occur at the same time, like laughter and applause.
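
To make the overlap problem concrete, here is a minimal sketch of one way per-frame classifier scores could be turned into timed sound events. This is not Google's method, which the article does not describe. The function name `frames_to_events`, the 0.5 threshold, the one-second frame length and the sample scores are all assumptions for illustration. Each class is handled independently, so laughter and applause can be reported over the same stretch of audio.

```python
def frames_to_events(scores, threshold=0.5, frame_seconds=1.0):
    """Group consecutive above-threshold frames into (label, start, end) events.

    `scores` maps a class label to a list of per-frame probabilities.
    All names and defaults here are hypothetical, not YouTube's.
    """
    events = []
    for label, probs in scores.items():
        start = None  # index of the frame where the current event began
        for i, p in enumerate(probs):
            if p >= threshold and start is None:
                start = i
            elif p < threshold and start is not None:
                events.append((label, start * frame_seconds, i * frame_seconds))
                start = None
        if start is not None:
            # The event runs to the end of the clip.
            events.append((label, start * frame_seconds, len(probs) * frame_seconds))
    return sorted(events, key=lambda e: (e[1], e[0]))


# Made-up scores: applause and laughter overlap between seconds 2 and 4.
scores = {
    "applause": [0.1, 0.8, 0.9, 0.7, 0.2],
    "laughter": [0.0, 0.1, 0.6, 0.8, 0.9],
}
events = frames_to_events(scores)
for label, start, end in events:
    print(f"{label}: {start:.1f}s to {end:.1f}s")
```

Because each class is thresholded separately, the two events overlap in time, and a display layer still has to decide how to show them together.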

You can see what that looks like in the clip from America's Got Talent below. The sound effects are merged with the automatic speech recognition and "shown as part of the standard automatic captions," much as you'd see in a closed-captioned TV show.
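
As a rough illustration of that merging step, the sketch below interleaves speech cues and sound-effect cues into a single caption track ordered by start time. The `Cue` type, the `merge_captions` function, the bracketed upper-case labels and the sample timings are assumptions for this example, not YouTube's actual caption format or pipeline.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    start: float  # seconds
    end: float    # seconds
    text: str


def merge_captions(speech, sounds):
    """Interleave speech and sound-effect cues, ordered by start time."""
    # Mark sound effects with brackets so they read differently from speech
    # (an assumed convention, borrowed from common closed captioning).
    tagged = [Cue(c.start, c.end, f"[{c.text.upper()}]") for c in sounds]
    return sorted(speech + tagged, key=lambda c: (c.start, c.end))


# Made-up cues for illustration.
speech = [Cue(0.0, 2.5, "Thank you so much"), Cue(6.0, 8.0, "that was incredible")]
sounds = [Cue(2.5, 6.0, "applause"), Cue(6.5, 7.5, "laughter")]

merged = merge_captions(speech, sounds)
for cue in merged:
    print(f"{cue.start:5.1f} -> {cue.end:5.1f}  {cue.text}")
```

Sorting on start time keeps the sound labels in step with the dialogue, so the viewer sees one continuous caption stream.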

YouTube's team said it's aware that the captions are "simplistic," but adding features will be easier now that it has built a solid back-end foundation. In the future, it'll introduce common sounds like barking, knocking or ringing. That will pose new challenges, as the AI will need to figure out if a ringing sound is coming from an alarm, phone or doorbell, for example.

It'll be worth the effort, though, as Google says that two-thirds of participants in a study found that sound effect captions enhance the video experience. And while it's bound to make mistakes no matter how good it gets (even humans are only about 95 percent accurate), users think that the odd error won't detract from the benefits.
