Stability AI Introduces A New Sound Generation Model

Stability AI, the company known for its AI-powered art generator Stable Diffusion, has unveiled a new open-source AI model for generating sounds and music that it claims was trained exclusively on royalty-free recordings.

Called Stable Audio Open, this generative model transforms textual descriptions (e.g., “Rock rhythm played in a processed studio, a drumming session on a speaker setup”) into audio recordings up to 47 seconds long. The model was trained on approximately 486,000 samples obtained from free music libraries such as Freesound and the Free Music Archive. According to Stability AI, the model can be used to create drum patterns, instrumental riffs, ambient sounds, and “production elements” for media such as videos, movies, and TV shows. It can also “edit” existing songs or apply the style of one genre (like smooth jazz) to another.

“One of the main advantages of this open-source release is the ability for users to fine-tune the model to their audio data,” noted Stability AI in a blog post on its official website. “For example, a drummer could fine-tune a model using his drum recordings to create new rhythms.” However, Stable Audio Open has its drawbacks. It cannot effectively create full songs, melodies, or vocals. Stability AI acknowledges that the model is not optimized for these tasks and recommends its premium Stable Audio service for users who require such features.

Stable Audio Open also cannot be used commercially, as its terms of use prohibit this. In addition, the model performs unevenly across musical styles and cultures and has trouble with descriptions in languages other than English, limitations that Stability AI attributes to the training data. “Data sources may not be diverse and not equally representative of all cultures,” Stability AI explained in the model's description. “The generated samples will exhibit biases present in the training data.”

Stability AI, which has been struggling to revive its declining business, recently found itself embroiled in controversy after Ed Newton-Rex, the company's vice president of generative audio, resigned amid disagreements over its position that training AI models on works protected by copyright is “fair use.” The release of Stable Audio Open appears to be an attempt to change that narrative while promoting Stability AI's paid services. As music creation tools like those from Stability AI become more popular, copyright issues and potential abuse by some developers have come under greater scrutiny.

In May, Sony Music, which represents artists such as Billy Joel, Doja Cat, and Lil Nas X, warned 700 AI companies against the “unauthorized use” of its content to train audio generators. In addition, in March, the state of Tennessee passed the first US law aimed at limiting the misuse of AI in music.
