You normally need to listen to a song or read its lyrics to tell if it’s using explicit language, but Deezer thinks AI might be up to the job before long. The streaming music service is researching a machine learning technique that would detect explicit lyrics solely through audio. Instead of merely training an AI to recognize colorful words with a giant set of annotated samples, like you often see with machine learning, Deezer extracts the vocals and looks for instances where it’s likely that a word matches entries in a dictionary of foul expressions. From there, a simple binary classifier decides if a given word is naughty. It’s an “explainable” system that can show why the AI came to a decision, the company said.
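To make the idea concrete, here is a minimal sketch of the last two steps: matching spotted words against a dictionary, then making a yes/no call that can be explained. It is not Deezer's code. The `Detection` type, the placeholder dictionary entries, and the 0.5 threshold are all assumptions for illustration, and a plain threshold stands in for the binary classifier. Vocal extraction and keyword spotting are assumed to have already produced the detections.

```python
from dataclasses import dataclass

# Hypothetical placeholder entries; the real dictionary is not given in the article.
EXPLICIT_DICTIONARY = {"darn", "heck"}


@dataclass(frozen=True)
class Detection:
    """One word reported by an (assumed) keyword spotter run on isolated vocals."""
    word: str
    probability: float  # spotter's confidence that the word was actually sung


def is_flagged(detection, threshold=0.5):
    """Stand-in binary decision: dictionary word heard with enough confidence."""
    return (detection.word in EXPLICIT_DICTIONARY
            and detection.probability >= threshold)


def classify_song(detections, threshold=0.5):
    """Return (is_explicit, evidence), where evidence lists the flagged detections."""
    evidence = [d for d in detections if is_flagged(d, threshold)]
    return bool(evidence), evidence


detections = [
    Detection("love", 0.9),   # not in the dictionary
    Detection("heck", 0.8),   # in the dictionary, confident
    Detection("darn", 0.2),   # in the dictionary, but too uncertain
]
is_explicit, evidence = classify_song(detections)
```

Because the function returns the detections that triggered the flag, the decision can be traced back to specific words, which is the sense in which such a system is "explainable."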
The team also hopes to reduce bias and improve accuracy by using equal numbers of explicit and clean songs within each music genre, and by drawing on a variety of genres.
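One way to picture that balancing step is as downsampling: within each genre, keep only as many songs of the larger class as there are in the smaller one. The sketch below assumes songs arrive as `(title, genre, is_explicit)` tuples and uses a hypothetical helper, `balance_by_genre`. The article does not say how Deezer actually builds its dataset.

```python
import random
from collections import defaultdict


def balance_by_genre(songs, seed=0):
    """Return a subset with equal explicit and clean counts in every genre.

    songs: iterable of (title, genre, is_explicit) tuples (assumed layout).
    """
    rng = random.Random(seed)  # seeded so the sample is reproducible

    # Group songs by genre, then by label.
    groups = defaultdict(lambda: {True: [], False: []})
    for song in songs:
        _, genre, is_explicit = song
        groups[genre][is_explicit].append(song)

    balanced = []
    for by_label in groups.values():
        # Keep as many songs per label as the smaller class provides.
        n = min(len(by_label[True]), len(by_label[False]))
        balanced += rng.sample(by_label[True], n)
        balanced += rng.sample(by_label[False], n)
    return balanced


catalog = [
    ("a", "rap", True), ("b", "rap", True), ("c", "rap", True), ("d", "rap", False),
    ("e", "pop", True), ("f", "pop", False), ("g", "pop", False),
]
balanced = balance_by_genre(catalog)
```

In this toy catalog, each genre ends up with one explicit and one clean song, so a classifier trained on the result cannot lean on genre alone to guess the label.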