Meta's open-source MusicGen AI uses text to create song genre mashups

It transforms tunes much like Midjourney and ChatGPT change images and text.

By Steve Dent June 12, 2023 7:40 am EST

Meta's Audiocraft research team has just released MusicGen, an open source deep learning language model that can generate new music from text prompts and can even be aligned to an existing song, The Decoder reported. It's much like ChatGPT for audio: you describe the style of music you want, optionally drop in an existing tune and then click "Generate." After a good chunk of time (around 160 seconds in my case), it spits out a short piece of all-new music based on your text prompts and melody.

Facebook's demo on the Hugging Face AI site lets you describe your music and provides a handful of example prompts like "an 80s driving pop song with heavy drums and synth pads in the background." You can then "condition" that on a given song up to 30 seconds long, with controls letting you select a specific portion of it. Then, you just hit "Generate" and it renders a high-quality sample up to 12 seconds long.

We present MusicGen: A simple and controllable music generation model. MusicGen can be prompted by both text and melody.
We release code (MIT) and models (CC-BY NC) for open research, reproducibility, and for the music community: https://t.co/OkYjL4xDN7 pic.twitter.com/h1l4LGzYgf

— Felix Kreuk (@FelixKreuk) June 9, 2023

The team used 20,000 hours of licensed music for training, including 10,000 high-quality music tracks from an internal dataset, along with Shutterstock and Pond5 tracks. To speed things up, they used Meta's 32kHz EnCodec audio tokenizer to generate smaller chunks of music that can be processed in parallel. "Unlike existing methods like MusicLM, MusicGen doesn't require a self-supervised semantic representation [and has] only 50 auto-regressive steps per second of audio," wrote Hugging Face ML Engineer Ahsen Khaliq in a tweet.
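
To put those numbers in perspective, here is a rough back-of-the-envelope sketch in Python. It uses only the figures quoted above (32kHz audio, 50 auto-regressive steps per second, clips of up to 12 seconds); the function names are made up for illustration and are not part of Meta's code.

```python
# Figures quoted in the article; everything else here is illustrative.
SAMPLE_RATE_HZ = 32_000   # sample rate of the EnCodec tokenizer
STEPS_PER_SECOND = 50     # auto-regressive steps per second of audio
CLIP_SECONDS = 12         # maximum clip length in the demo


def steps_for_clip(seconds: float, steps_per_second: int = STEPS_PER_SECOND) -> int:
    """Number of auto-regressive steps needed for a clip of the given length."""
    return round(seconds * steps_per_second)


def samples_per_step(sample_rate_hz: int = SAMPLE_RATE_HZ,
                     steps_per_second: int = STEPS_PER_SECOND) -> int:
    """Number of raw audio samples covered by a single step."""
    return sample_rate_hz // steps_per_second


if __name__ == "__main__":
    print(steps_for_clip(CLIP_SECONDS))  # 600
    print(samples_per_step())            # 640
```

In other words, a 12-second clip takes about 600 generation steps, and each step stands in for 640 raw audio samples, assuming the quoted figures apply uniformly across the clip.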

Last month, Google released a similar music generator called MusicLM, but MusicGen seems to generate slightly better results. On a sample page, the researchers compare MusicGen's output with MusicLM and two other models, Riffusion and Musai, to make that point. It can be run locally (a GPU with at least 16GB of RAM is recommended) and is available in four model sizes, from small (300 million parameters) to large (3.3 billion parameters), with the latter having the greatest potential for producing complex music.
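
For a rough sense of why a 16GB GPU is suggested, the sketch below estimates how much memory the model weights alone would occupy. The parameter counts come from the figures above; the bytes-per-parameter values are my assumption about common numeric precisions, not something Meta has stated, and the estimate ignores activations and any supporting models.

```python
# Parameter counts quoted in the article for the smallest and largest models.
MODEL_PARAMS = {
    "small": 300_000_000,
    "large": 3_300_000_000,
}

# Assumption: typical storage cost per parameter at two common precisions.
BYTES_PER_PARAM = {"float32": 4, "float16": 2}


def weight_gigabytes(num_params: int, dtype: str = "float16") -> float:
    """Approximate size of the weights alone, in gigabytes (10^9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    for name, params in MODEL_PARAMS.items():
        for dtype in BYTES_PER_PARAM:
            print(f"{name} @ {dtype}: {weight_gigabytes(params, dtype):.1f} GB")
```

Under those assumptions, the large model's weights would take roughly 13GB at full precision and about half that at half precision, which is at least consistent with the 16GB recommendation.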

Steve Dent · Ode to 80s pop music joy

As mentioned, MusicGen's code is open source, though the team's announcement lists the models themselves under a non-commercial CC-BY NC license (I tried it with "Ode to Joy" and several suggested genres and the results above were... mixed). Still, it's the latest example of the breathtaking speed of AI development over the past half year, with deep learning models threatening to make incursions into yet another genre.
