Stability AI released Stable Audio 2.0, and generated music has become the next hot topic in the tech race

Author: DeepTech, 2024-04-05 21:45:00

Still struggling to find a song you like when you want to listen to music? Now you can make your own in minutes.

Thanks to major advances in generative AI, music generation has become the next hot spot. Suno has already won wide acclaim, and Chinese large-model developers have followed suit. On April 3 (Beijing time), Stability AI, an open-source generative AI company, announced Stable Audio 2.0, an audio generation model. Its predecessor, version 1.0, debuted in September 2023 and was named one of Time magazine's best inventions of 2023.

Today's AI music generation relies mainly on deep learning. Just as a language model is trained on large amounts of text, an audio model is trained on large amounts of music so that it learns the structure, styles, and compositional conventions of music, and can then generate new pieces.

According to Stability AI's official blog post, Stable Audio 2.0 was trained on data from AudioSparx, a library of more than 800,000 audio files covering music, sound effects, and single-instrument recordings, together with their text descriptions.

From a natural-language description alone, Stable Audio 2.0 can generate complete tracks of up to three minutes in high-quality 44.1 kHz stereo. The 44.1 kHz figure is the sampling rate, the number of times per second the audio signal is measured. A higher sampling rate captures finer changes in the sound wave, so the recording sounds closer to the original; at 44.1 kHz, frequencies up to 22.05 kHz (half the sampling rate) can be represented, which covers the range of human hearing. 44.1 kHz is the standard sampling rate for CD audio and delivers high sound quality.
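The sampling-rate explanation can be made concrete with a little arithmetic. The Python sketch below computes the highest representable frequency (half the sampling rate) and the size of three minutes of uncompressed stereo audio at 44.1 kHz. It assumes 16-bit samples, the CD standard; the article does not say what bit depth Stable Audio 2.0 uses, and the function names are illustrative only, not part of any Stability AI API.

```python
# Back-of-the-envelope numbers for a 3-minute, 44.1 kHz stereo track.
# Assumption: 16-bit samples (the CD standard); the article does not state
# the bit depth Stable Audio 2.0 actually uses.

SAMPLE_RATE_HZ = 44_100   # samples per second, per channel (CD standard)
CHANNELS = 2              # stereo
BIT_DEPTH = 16            # assumed bits per sample (CD standard)
DURATION_S = 3 * 60       # Stable Audio 2.0's maximum track length in seconds


def nyquist_limit_hz(sample_rate_hz: int) -> float:
    """Highest frequency a given sampling rate can represent: half the rate."""
    return sample_rate_hz / 2


def pcm_stats(sample_rate_hz: int, channels: int, bit_depth: int, duration_s: int) -> dict:
    """Sample counts, bitrate, and uncompressed size of a PCM recording."""
    samples_per_channel = sample_rate_hz * duration_s
    total_samples = samples_per_channel * channels
    bitrate_bps = sample_rate_hz * channels * bit_depth
    size_bytes = total_samples * bit_depth // 8
    return {
        "samples_per_channel": samples_per_channel,
        "total_samples": total_samples,
        "bitrate_kbps": bitrate_bps / 1000,
        "size_mb": size_bytes / 1_000_000,  # decimal megabytes
    }


if __name__ == "__main__":
    print(f"Nyquist limit: {nyquist_limit_hz(SAMPLE_RATE_HZ)} Hz")
    stats = pcm_stats(SAMPLE_RATE_HZ, CHANNELS, BIT_DEPTH, DURATION_S)
    for name, value in stats.items():
        print(f"{name}: {value}")
```

Under these assumptions, a three-minute track holds 7,938,000 samples per channel, streams at about 1,411 kbps, and takes roughly 31.75 MB uncompressed.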

Beyond generating music directly from text, users can upload an audio clip and describe the desired result in text, and Stable Audio 2.0 will transform the clip accordingly; in other words, it supports audio-to-audio generation. It can also produce specific sound effects, which makes it flexible across different kinds of creative work.

Figure | Stable Audio 2.0 user interface (source: official website)

To use it, enter a description of the music you want in the Prompt field, then click "Generate" below. Free users get 10 generations per month; more requires a paid plan starting at $11.99 per month.

Notably, Stable Audio 2.0 does not yet support lyrics. To gauge where AI-generated music really stands, we also need to look at another AI music product: Suno.

Suno received a major upgrade at the end of March this year, and its latest version, V3, has been widely hailed as the music industry's "ChatGPT moment." It can produce songs across styles and genres in seconds, with results nearly comparable to human-made music. In an earlier announcement, Suno also said that V4 is already in development and will bring some exciting new features.

Figure | Suno user interface (source: official website)

Suno is comparatively more accessible: it can be used for free 5 times a day, each generation produces two songs, and each song is 2 minutes long. There are two ways to use it: enter lyrics along with a music style and theme to generate a song, or generate instrumental music by simply describing the song you want in words.

With the Qingming Festival at hand, I ran a hands-on test using "Qingming," the Tang poem by Du Mu that everyone recites at this time of year, as the source material, and compared Stable Audio 2.0 with Suno, which was also upgraded not long ago. If you haven't tried AI music before, you may be amazed at how well it performs. First, I used ChatGPT to adapt the poem into lyrics. I then entered the lyrics into Suno, and after a short wait, the song was ready. Here's the result:

The result is quite good. Next, I used Stable Audio 2.0 to generate music from a similar prompt. Here's the result:

Audio | Qingming rain, ask the heart (Voice, 3 minutes)

Personally, I did not find Stable Audio 2.0's output very satisfying. That said, results vary with the type of music, so it is worth testing it yourself.

Beyond these two products, a Chinese large music model, Tiangong SkyMusic, was also released recently and can be tried in the Tiangong app (for now, you need to join a WeChat group to get an invitation code; instructions are available in the app). The user interface is as follows:

After entering lyrics, you can generate music directly or select an existing song as a reference; each run produces 3 songs of about 90 seconds each. First, I generated music directly from the "Qingming Rain" lyrics created earlier. One of the better results is below:

Next, I generated again with a reference track: the song Suno produced above. Here's the result:

With the reference track, the new output felt a notch better to me, and all 3 songs were decent.

Overall, Suno is clearly the best of the three products; its output can pass for the real thing (and with better-written lyrics, the results would be even better). However, Suno's songs are still incomplete: they always cut off abruptly at the two-minute mark.

In any case, AI music is likely to be widely adopted and commercialized at scale in the near future. For example, background music for films and TV series could be produced quickly and efficiently by AI to match the emotional needs of specific scenes. In the consumer market, AI could create personalized music based on a listener's history and preferences, offering a more customized listening experience and possibly shifting how music is consumed.

As AI advances across content generation, from text to video, it can greatly enrich human creativity and empower everyone to be a creator. It helps artists discover new creative methods and gives ordinary people the chance to become artists.

Artificial intelligence is undoubtedly at the forefront of innovation today. Technological advances have brought AI close to understanding and simulating the human creative process, and it can serve as a tool that vastly expands people's creativity. This challenges traditional notions of artistic creation and sparks profound debate about creativity, artistic value, and authorship. Still, there is no denying that AI in content generation has opened a new chapter for human creativity.

Header image: generated by DALL·E ("Ching Ming Festival")

Reference:

https://stability.ai/news/stable-audio-2-0?utm_source=website&utm_medium=twitter&utm_campaign=blog

https://vv.listen.i/blog/v3
