Stability AI, one of the world’s most prominent generative AI companies, has joined the likes of Google, Meta and OpenAI in creating a model that generates clips of music and sound. Called “Stable Audio,” the new text-to-music generator was trained on sounds from the music library AudioSparx.
Stability touts its new product as the first music generation product that creates high-quality, 44.1 kHz music for commercial use through a process called “latent diffusion” — a process first introduced for images through Stable Diffusion, the company’s marquee product. Stable Audio conditions its output on text metadata as well as audio file duration and start time, allowing for greater control over the content of the generated audio.
By typing prompts like “post-rock, guitars, drum kit, bass, strings, euphoric, uplifting, moody, flowing, raw, epic, sentimental, 125 BPM,” users can create up to 20 seconds of sound through its free tier, or up to 90 seconds of sound via its pro subscription.
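To make the conditioning idea concrete, here is a minimal sketch of what the inputs described above might look like in code. The class name, field names, and helper method are hypothetical illustrations, not Stable Audio's actual API; only the conditioning signals (text prompt, start time, duration) and the 44.1 kHz rate come from the article.

```python
from dataclasses import dataclass

SAMPLE_RATE_HZ = 44_100  # CD-quality sample rate cited in the announcement


@dataclass
class AudioConditioning:
    """Hypothetical container for the conditioning signals Stable Audio
    reportedly uses: a text prompt plus clip timing information."""
    prompt: str           # free-text description: genre, instruments, mood, BPM
    seconds_start: float  # where the generated clip notionally begins
    seconds_total: float  # requested clip length

    def num_samples(self) -> int:
        # Number of audio samples the model must produce at 44.1 kHz
        return round(self.seconds_total * SAMPLE_RATE_HZ)


cond = AudioConditioning(
    prompt="post-rock, guitars, drum kit, bass, strings, epic, 125 BPM",
    seconds_start=0.0,
    seconds_total=20.0,  # the free tier's 20-second limit
)
print(cond.num_samples())  # 882000
```

Conditioning on duration and start time is what lets the model produce a clip of an exact requested length rather than a fixed-size window.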
In its announcement, the company touts Stable Audio as a tool for musicians “seeking to create samples to use in their own music.” The music generated could also be used to soundtrack advertisements and creator content, among other commercial applications.
“As the only independent, open and multimodal generative AI company, we are thrilled to use our expertise to develop a product in support of music creators,” says Emad Mostaque, CEO of Stability AI. “Our hope is that Stable Audio will empower music enthusiasts and creative professionals to generate new content with the help of AI, and we look forward to the endless innovations it will inspire.”
This is not the AI giant’s first foray into audio and music AI. The company already has an open-source generative audio label, HarmonAI, which is designed to create accessible and playful music production tools “by musicians for musicians.” Parts of the HarmonAI team, including Ed Newton-Rex, its VP of product, helped design Stable Audio.