
Google's "crazy" generative AI track, the latest model can "create" music with text and pictures

"Science and Technology Innovation Board Daily" on January 28 (editor Song Ziqiao) On the track of generative AI models, Google is "soaring" all the way. Following the text generation AI model Wordcraft and video generation tool Imagen Video, Google has extended the application scenarios of generative AI to the music industry.

On January 27, local time, Google unveiled a new AI model, MusicLM, which can generate high-fidelity music from text and even from images. In other words, it can turn a passage of text or a picture into a piece of music, in a wide variety of styles.

Google showed a number of examples in the accompanying paper. For instance, given the caption "a fusion of reggaeton and electronic dance music, with a spacey, otherworldly sound that evokes the experience of being lost in space, designed to create a sense of wonder and awe while remaining danceable", MusicLM generated 30 seconds of electronic music.

Google's "crazy" generative AI track, the latest model can "create" music with text and pictures

For example, with the world-famous painting "Napoleon Crossing the Alps" as the "prompt", the music MusicLM generates is solemn and stately, vividly conveying the ferocity of the winter campaign and the heroism of the scene. In addition to realistic oil paintings, more abstract works such as Matisse's "Dance", Munch's "The Scream", Picasso's "Guernica" and Van Gogh's "The Starry Night" can also serve as prompts.

Google's "crazy" generative AI track, the latest model can "create" music with text and pictures
Google's "crazy" generative AI track, the latest model can "create" music with text and pictures

MusicLM can even produce a medley, mixing different styles of music in "story mode". Even a request for five minutes of continuous music is no problem for MusicLM.
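As a rough illustration of what a "story mode" request might look like, here is a hypothetical sequence of timed captions; the timings and captions below are invented for this sketch, and MusicLM itself is not exposed as a public API.

```python
# Hypothetical "story mode" input: a sequence of (start_second, caption) pairs.
# MusicLM switches style as each new caption takes effect; the values below
# are invented for illustration only.
story_prompts = [
    (0,  "gentle ambient pads, time to meditate"),
    (15, "upbeat acoustic guitar, time to wake up"),
    (30, "driving electronic beat, time to run"),
    (45, "epic orchestral climax, time to give everything"),
]

for start, caption in story_prompts:
    print(f"{start:>2}s -> {caption}")
```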

Google's "crazy" generative AI track, the latest model can "create" music with text and pictures

In addition, MusicLM offers fine-grained control: a prompt can specify particular instruments, places, genres, eras, the performer's skill level and so on, and the quality of the generated audio can be adjusted, so a single piece of text can be turned into multiple versions of a track.
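As a toy sketch of this kind of attribute-driven prompting, the snippet below composes a text prompt from a handful of such attributes; the attribute names and values are invented for illustration and are not taken from Google's demos.

```python
# Toy example: composing a text prompt that pins down instrument, genre, era,
# place, performer skill and recording quality, the kinds of attributes the
# article says MusicLM responds to. All values are invented.
attributes = {
    "instrument": "accordion",
    "genre": "tango",
    "era": "1950s",
    "place": "a small Buenos Aires cafe",
    "performer": "played by a virtuoso",
    "quality": "high-quality recording",
}

prompt = (
    f"A {attributes['quality']} of a {attributes['era']} {attributes['genre']} "
    f"featuring an {attributes['instrument']}, {attributes['performer']}, "
    f"recorded in {attributes['place']}."
)
print(prompt)
```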

MusicLM is not the first AI model to generate music. Similar products include Riffusion and Dance Diffusion; Google itself previously released AudioLM, and OpenAI, developer of the hugely popular chatbot ChatGPT, has launched Jukebox.

What makes MusicLM unique?

Under the hood, it is a hierarchical sequence-to-sequence model. According to AI scientist Keunwoo Choi, MusicLM combines several existing models, including MuLan + AudioLM and MuLan + w2v-BERT + SoundStream.

Among them, AudioLM can be regarded as MusicLM's predecessor: MusicLM reuses AudioLM's multi-stage autoregressive modeling and, conditioned on a text description, generates music at 24 kHz that remains consistent over several minutes.
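As a minimal sketch of the data flow described above, assuming stand-in components in place of the real MuLan, semantic, acoustic and SoundStream models (none of which are publicly released), the hierarchy looks roughly like this; only the structure is meaningful, every function body is a trivial placeholder.

```python
# Minimal structural sketch of MusicLM's hierarchical text-to-music pipeline.
# Every component below is a trivial stand-in (random tokens instead of real
# models); only the data flow between stages reflects the paper.
import random


def mulan_embed_text(prompt: str) -> list[int]:
    """Stand-in for MuLan: map a text prompt into the joint music/text token space."""
    random.seed(prompt)  # deterministic fake "embedding" per prompt
    return [random.randrange(1024) for _ in range(12)]


def sample_semantic_tokens(mulan_tokens: list[int], seconds: int) -> list[int]:
    """Stand-in for stage 1: sample coarse semantic tokens (w2v-BERT style),
    conditioned on the MuLan tokens; these capture melody and long-term structure."""
    return [random.randrange(1024) for _ in range(25 * seconds)]  # ~25 tokens/s, illustrative


def sample_acoustic_tokens(mulan_tokens: list[int],
                           semantic_tokens: list[int]) -> list[int]:
    """Stand-in for stage 2: sample fine-grained acoustic tokens (SoundStream
    codec codes), conditioned on both the MuLan and the semantic tokens."""
    return [random.randrange(1024) for _ in range(len(semantic_tokens) * 4)]


def soundstream_decode(acoustic_tokens: list[int]) -> list[float]:
    """Stand-in for the SoundStream decoder: turn codec tokens into a 24 kHz waveform."""
    return [0.0] * (len(acoustic_tokens) * 240)  # placeholder samples


def generate_music(prompt: str, seconds: int = 30) -> list[float]:
    mulan_tokens = mulan_embed_text(prompt)
    semantic = sample_semantic_tokens(mulan_tokens, seconds)
    acoustic = sample_acoustic_tokens(mulan_tokens, semantic)
    return soundstream_decode(acoustic)


waveform = generate_music("a calm piano piece with soft rain in the background")
print(f"generated {len(waveform)} placeholder samples")
```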

In contrast, MusicLM is backed by far more data. To address the lack of evaluation data for text-to-music generation, the research team released MusicCaps, the first evaluation dataset built specifically for this task. MusicCaps was annotated by professional musicians and contains 5,500 music-text pairs.
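To make the shape of such a music-text pair concrete, a MusicCaps-style entry pairs a short audio clip with a caption written by a musician; the field names and values below are illustrative rather than the exact released schema.

```python
# Illustrative MusicCaps-style record: one short audio clip paired with a
# rich free-text caption and a list of descriptive aspects. Field names and
# values are made up for illustration; see the released dataset for the
# exact schema.
example_pair = {
    "clip_id": "example_clip_0001",   # reference to a ~10-second audio excerpt
    "caption": (
        "A mellow reggaeton groove with a deep bass line, soft synth pads "
        "and light percussion, creating a relaxed late-night atmosphere."
    ),
    "aspects": ["reggaeton", "deep bass", "synth pads", "relaxed", "low tempo"],
}

print(example_pair["caption"])
```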

Beyond that, Google trained MusicLM on a dataset of 280,000 hours of music.

Google's experiments show that MusicLM outperforms previous models in both audio quality and adherence to the text description.

However, MusicLM also carries the risks common to all generative AI: technical shortcomings, infringement of source material, ethical disputes, and so on.

On the technical side, for example, MusicLM can generate vocals, but it does not do so well: the "lyrics" come out garbled and meaningless. MusicLM also takes shortcuts: roughly 1% of the music it generates is copied directly from songs in the training set.

Beyond that, is music generated by an AI system considered original? Can it be copyrighted? Can it compete with human-made music? There is still no consensus on these questions.

These are among the reasons Google has not released MusicLM to the public. "We acknowledge the potential risk of misappropriation of creative content from the model, and we emphasize that more work is needed in the future to address the risks associated with music generation," Google wrote in the paper.
