You may notice a marked improvement in the audio quality of some YouTube Stories going forward, thanks to a new speech enhancement feature Google rolled out. A couple of years ago, the tech giant debuted the “Looking to Listen” AI technology that can pick out voices in a crowd. Now, it’s making the technology available to creators recording YouTube Stories on iOS devices.
Google taught Looking to Listen the correlations between speech and visual signals, such as the speaker’s mouth movements and facial expressions, by training it on a large collection of online videos. To ensure that it works for everyone and doesn’t show bias, Google ran a series of tests exploring its performance across various visual and auditory attributes. Those attributes include the speaker’s age, skin tone, spoken language, voice pitch, visibility of their face, head pose, facial hair, presence of glasses and the level of background noise. The company was able to determine, for instance, that the technology’s ability to enhance speech remains pretty consistent across speakers’ languages. Facial hair doesn’t seem to have a big effect either, though the feature works best on faces with no facial hair and those with a close shave.
The tech giant also explained in its announcement post how it has improved the technology over the past couple of years. To start with, the developers made sure that it can do all the processing on the device itself, so it doesn’t need to send anything to a remote server. They also used a technique that lets the feature very quickly extract thumbnails with faces from videos for analysis, which allows speech enhancement to start while the video is still being recorded. Those improvements shrank the feature’s size from 120MB to 6MB, making it easier to deploy. Google says they also “reduced [Looking to Listen’s] running time from 10x real-time on a desktop using the original formulation... to 0.5x real-time performance using only an iPhone CPU.” In fact, it’ll only take the technology a couple of seconds to process a 15-second Story.
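For a rough sense of what those numbers mean, here is a back-of-the-envelope sketch. It assumes that “Nx real-time” means N seconds of compute for every second of video, which is the usual reading but is not spelled out in Google’s quote; the function name is made up for illustration.

```python
def processing_seconds(clip_seconds: float, realtime_factor: float) -> float:
    """Total compute time for a clip, assuming 'Nx real-time' means
    N seconds of processing per second of video (an assumption)."""
    return clip_seconds * realtime_factor


STORY_SECONDS = 15.0  # the 15-second Story mentioned above

# 10x real-time: original formulation on a desktop
original = processing_seconds(STORY_SECONDS, 10.0)
# 0.5x real-time: optimized version on an iPhone CPU
optimized = processing_seconds(STORY_SECONDS, 0.5)

print(f"original: {original:.1f} s, optimized: {optimized:.1f} s")
print(f"speedup: {original / optimized:.0f}x, size reduction: {120 / 6:.0f}x")
```

Under that reading, a 15-second Story needs about 7.5 seconds of total compute on an iPhone instead of 150 seconds on a desktop. Since enhancement starts while the video is still being recorded, much of that work could already be finished by the time recording stops, which would leave only a short wait afterward.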
To activate the feature, creators only have to toggle on “Enhance speech” in the volume controls on iOS.