Tiangong 3.0 officially launches: 400-billion-parameter MoE model goes open source, ushering in the ChatGPT moment for music generation

author:Smart stuff
Author | vanilla

Editor | Desert Shadow

Zhidong reported on April 17 that Kunlun Wanwei today launched Tiangong 3.0, a 400-billion-parameter open-source large model built on an MoE (mixture-of-experts) architecture. Compared with Tiangong 2.0, released two months earlier, it shows large gains in four core capabilities: logical reasoning, semantic understanding, response to complex requests, and content creation.

Tiangong 3.0 adds features such as comparison chart generation, a research mode, an enhanced mode, and image editing and expansion. The model's agent capabilities have also been trained specifically, so that it can "think independently", plan and break down user requests, and complete complex tasks.

At the same time, Tiangong SkyMusic, built on Tiangong 3.0, is China's first music AIGC SOTA (state-of-the-art) model. It applies the Sora model architecture to the field of music and audio, and is the only publicly available music generation model in China.

Tiangong SkyMusic has greatly lowered the threshold for music creation. Although AI music is still in its infancy, the model already achieves good results in music generation. After invitation-only testing opened in early April, more than one million people submitted test applications through the Tiangong SkyMusic backend.

▲Square-dance hit-song version of "Farewell Kangqiao" (source: Zhidong)

The release of the Tiangong 3.0 model marks another milestone on Kunlun Wanwei's "All in AGI and AIGC" strategic path. How well does the much more powerful Tiangong 3.0 work in practice, and where can it significantly improve productivity?

1. The world's largest open-source MoE model, with 400 billion parameters and 4 core capabilities upgraded

With 400 billion parameters, Tiangong 3.0 is the world's largest open-source MoE model. Compared with the previous generation, its technical knowledge capability has improved by more than 20%, and its mathematics, reasoning, coding, and cultural-creative capabilities have improved by more than 30%.

▲Tiangong 3.0 has become the world's largest open-source MoE model

The Tiangong 3.0 base model has improved greatly in four areas: logical reasoning, semantic understanding, response to complex requests, and content creation. As a multimodal large model, Tiangong 3.0 integrates AI search, AI writing, AI long-text reading, AI image generation, and AI music generation, and outperforms GPT-4V on several authoritative multimodal benchmarks such as MMBench.

▲Tiangong 3.0 multi-modal performance surpasses GPT-4V

Building on these improved capabilities, Tiangong 3.0 also adds multi-round search with integrated tool calling, an AI search research mode, and an AI search enhanced mode, which can efficiently handle complex requests such as industry analysis and product comparison.

In research mode, Tiangong 3.0 can expand a simple instruction into related questions and automatically generate research outlines, charts, summaries, mind maps, and more.

For example, I asked Tiangong 3.0 to research the "development process of OpenAI". After searching the whole web, it presented the results in segmented, refined form, and automatically summarized an outline and drew a mind map.

▲Tiangong 3.0 summarizes the development process of OpenAI (Source: Zhidong)

In enhanced mode, Tiangong 3.0 can break down and refine a user's complex query. By asking follow-up questions and understanding and completing the missing information, it handles natural-language semantics better and copes better with uncertain knowledge.

For example, I entered the prompt "2024 Science and Technology Circle". This is a fairly difficult request, because it bundles together many more specific needs. Tiangong 3.0 recognized the problem immediately and asked follow-up questions, helpfully offering direction options such as industry development trends, product market size, and investment environment. After I selected "Development Trends", it quickly gave an answer covering trends such as AI, AIoT, and new energy, based on data retrieved from the Internet.

▲Tiangong 3.0 enhanced mode (source: Zhidong; video sped up)

With multi-round search and integrated tool calling, Tiangong 3.0 can break a user's task into smaller steps, judge in real time whether each step needs a web search or a tool call, and then carry out one or more rounds of searching and tool calling.
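The loop described above can be illustrated with a toy sketch. This is not Tiangong's implementation: the `plan` and `run` functions, the keyword rules, and the stub tools are all hypothetical, chosen only to show the shape of "split the task, decide per step whether a tool is needed, then call it".

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Step:
    question: str
    tool: str  # "search", "weather", or "none"


def plan(task: str) -> List[Step]:
    """Hypothetical planner: split the task on ';' and pick a tool by keyword."""
    steps = []
    for part in (p.strip() for p in task.split(";")):
        if not part:
            continue
        lowered = part.lower()
        if "weather" in lowered:
            tool = "weather"
        elif "latest" in lowered or "recent" in lowered:
            tool = "search"
        else:
            tool = "none"  # assume the model can answer without tools
        steps.append(Step(part, tool))
    return steps


def run(task: str, tools: Dict[str, Callable[[str], str]]) -> List[str]:
    """Execute each planned step, calling a tool only when the plan asks for one."""
    results = []
    for step in plan(task):
        handler = tools.get(step.tool)
        if handler is None:
            results.append(f"[model] {step.question}")
        else:
            results.append(f"[{step.tool}] {handler(step.question)}")
    return results


# Stub tools standing in for a real search engine and weather service.
TOOLS = {
    "search": lambda q: f"stub search results for '{q}'",
    "weather": lambda q: "stub forecast",
}

for line in run("Why is Chengdu Disney popular recently; Shanghai Disneyland weather", TOOLS):
    print(line)
```

In a real system the keyword rules would presumably be replaced by the model's own judgment, and a step's output could feed the next round of planning.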

Naturally, a web-connected model has to be tested on the latest trending topics. I asked Tiangong 3.0 why "Chengdu Disney" has been so popular recently, and it immediately explained the origin of the meme and how events unfolded. I then switched topics and asked for a "Disney travel guide"; Tiangong 3.0 linked the question to the context and gave a travel guide for Chengdu. Asking about the weather at Shanghai Disneyland was no problem either: by calling a weather tool, Tiangong 3.0 directly gave the forecast for Shanghai for the coming days.

▲Tiangong 3.0 multi-round search and comprehensive tool call function (Source: Zhidong)

In image generation, Tiangong 3.0 has made a breakthrough in image editing and expansion: it can draw a landscape image and then gradually add new objects or elements to it:

▲Tiangong 3.0 image drawing (source: Zhidong)

For users, Tiangong 3.0 suits not only work scenarios such as industry analysis, market research, product comparison, and knowledge management, but also scenarios such as content creation, education and training, intelligent search, speech synthesis, and image and music generation.

Students and working professionals can use Tiangong 3.0's research mode and enhanced mode to obtain comprehensive, concise information from a simple query, greatly shortening the time needed to collect literature and data and improving the efficiency of work and study.

Content creators can use Tiangong 3.0's AI music generation, AI voice, AI image generation, and other features to improve the efficiency and quality of their work. The threshold for creation is also lowered, so that everyone can become a "composer" or an "illustrator".

In addition, on the ToB side, enterprise users can use the Tiangong model to build dedicated agents and private knowledge bases, automatically invoke custom tools, and have agents follow complex instructions, in order to improve work efficiency, optimize decision-making, and make their products and services more competitive.

2. The first music AIGC SOTA in China, which generates 80-second songs and vocals in seconds

Recently, overseas music generation products such as Suno and Udio have surged in popularity, and AI music generation has received unprecedented attention. However, these products are designed for overseas markets and pose a certain barrier for users in China.

Tiangong SkyMusic, built on Tiangong 3.0, is not only the only publicly available AI music generation model in China. With a comprehensive score of 6.65, it also surpasses Suno V3 in vocal and BGM sound quality, vocal naturalness, pronunciation intelligibility, and other measures, making it a global SOTA model for AI music.

▲Tiangong SkyMusic's overall performance surpasses Suno V3

Tiangong SkyMusic can generate 80-second two-channel stereo songs at a 44,100 Hz sampling rate. It supports styles such as rap, folk, funk, traditional Chinese style, and electronic, and can also learn singing techniques such as vibrato, opera, chanting, male-female duets, and automatic harmony.
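To give a sense of scale for those figures, the following minimal sketch computes how much raw audio one such song contains. The duration, sampling rate, and channel count come from the article; the 16-bit PCM sample width is an assumption made only for illustration, since the article does not state a bit depth or output format.

```python
# Figures taken from the article: 80 seconds, 44,100 Hz, two channels.
DURATION_S = 80
SAMPLE_RATE_HZ = 44_100
CHANNELS = 2
# Assumption (not stated in the article): 16-bit PCM, i.e. 2 bytes per sample.
BYTES_PER_SAMPLE = 2


def pcm_size(duration_s: int, sample_rate_hz: int, channels: int, bytes_per_sample: int):
    """Return (total samples across all channels, uncompressed size in bytes)."""
    samples = duration_s * sample_rate_hz * channels
    return samples, samples * bytes_per_sample


samples, size_bytes = pcm_size(DURATION_S, SAMPLE_RATE_HZ, CHANNELS, BYTES_PER_SAMPLE)
print(f"{samples:,} samples, {size_bytes / 1_000_000:.1f} MB uncompressed")
# prints "7,056,000 samples, 14.1 MB uncompressed"
```

Under that assumption, each generated song corresponds to roughly seven million output samples, which is one reason audio models usually work on a compressed representation rather than on raw samples.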

In addition, SkyMusic supports reference-based music generation and dialect song generation. Users can upload or select an existing reference track to generate songs with a similar style and vocal timbre, which further lowers the threshold for use. It can also generate songs in dialects such as Cantonese, Chengdu dialect, and Beijing dialect, helping to spread regional culture.

It is also very easy to use. Users only need to download the Tiangong app, fill in lyrics or have AI generate them, select or upload a reference song, and tap "Generate". Music is produced in less than half a minute, and each song comes in three versions to choose from.

With SkyMusic's SOTA capability and its strength in emotional expression, you can set your favorite classical poems to music:

▲Lyrical version of "Song of Everlasting Regret" (source: Zhidong)

I generated this "Song of Everlasting Regret" using Xu Jiaying's song "Riding a White Horse" as the reference. In the music Tiangong SkyMusic produced, the accompaniment builds layer by layer, and the melody clearly distinguishes the verse from the chorus.

You can also adapt a trending Internet meme into a rap version:

▲Rap version of a trending Internet meme (source: Zhidong)

The text is a piece of the "encrypted literature" that has recently been popular online, and the reference music is a rap reference track provided by Tiangong. As a "music novice", I don't know much about professional rap terms such as Verse and Flow, but I can hear that this AI raps really fast (doge).

If you don't want to use existing lyrics, you can let the AI write them for you with the AI lyrics feature. Here are lyrics I generated with AI on the theme "I don't want to go to work", turned into a new song using the cheesy viral hit "5:20AM" as the reference:

▲AI-created cheesy club-dance version of the song (source: Zhidong)

To test dialect song generation, Zhidong took the lyrics of Eason Chan's classic Cantonese song "Under Mount Fuji" and used Jay Chou's "Blue and White Porcelain" as the reference track, generating this Chinese-style "Blue and White Porcelain" version of "Under Mount Fuji":

▲"Blue and White Porcelain" version of "Under Mount Fuji" (source: Zhidong)

With vocals realistic enough to pass for the real thing, and a high degree of controllability at every stage, such as reproducing a musical style, how did Tiangong SkyMusic achieve this?

It is understood that existing AI music model companies generally do not disclose their technical approaches, so there is no open-source music model to draw on for reference. Kunlun Wanwei made many attempts in exploring technical approaches, spent substantial R&D resources, and finally arrived at the following path:

▲Tiangong SkyMusic technical schematic diagram (source: Kunlun Wanwei)

In AI music generation there are two major technical approaches: the symbolic school and the large-model school. Tiangong SkyMusic chose the more difficult but more effective route of generating music audio with a large model.

Within audio generation there are three sub-tracks: Song, BGM, and Speech. In the past, much AI music research focused on BGM without vocals, and there were few good solutions for the Song track with vocals. Tiangong SkyMusic has made a major breakthrough on the Song track, greatly improving the performance of AI music generation there and providing a successful example of an audio generation model.

Specifically, Tiangong SkyMusic adopts a model architecture similar to Sora, with three core modules: Encoder, DiT (Diffusion Transformer), and Decoder. The large-scale Transformer is responsible for composing: it learns the contextual dependencies among Music Patches and provides controllability over the music. DiT is responsible for singing: Music Patches are restored to high-quality audio through an LDM (Latent Diffusion Model).

The examples above and the side-by-side evaluation against Suno V3 show that, compared with overseas AI music models, Tiangong SkyMusic performs very well in the delicacy and recognizability of its synthesized vocals, with clear articulation and pronunciation, and that it supports Cantonese, Chengdu dialect, and other dialects.

Although AI music is still in its infancy, Tiangong SkyMusic has already let many users feel the joy of music creation. At the same time, Kunlun Wanwei's choice to make this valuable technical architecture public reflects the importance it places on the open-source community and on the common development of the industry.

3. Build six AI business matrices, and launch AI search and AI music products for the first time in China

Large models have been booming for 500 days, yet turning their capabilities into application products is still a hard problem for many AI vendors. When will the killer app for large models appear?

Fang Han, chairman and CEO of Kunlun Wanwei, told Zhidong that consumer-facing (C-end) products offered for free may become the main path for bringing large models to market. In the Internet era, Google and Microsoft in the United States and Baidu and Alibaba in China all became Internet giants by following this logic, and the same principle will carry over to the era of large models.

On the one hand, the ceiling for C-end users is as high as 8 billion; on the other hand, the subscription model has a high threshold and relatively low user acceptance. And to make a free offering work, AI UGC (user-generated content) platforms are a good business model.

According to the Top 100 Generative AI Products report released by venture capital firm a16z last month, general content production applications such as ChatGPT and Gemini still account for the majority of consumer-level AI applications. Compared to the ranking 6 months ago, two new categories entered the rankings for the first time: Music and Productivity.

Suno is the only music generation product in the ranking, which shows that music production tools are gradually coming to consumers' attention and could become the next path to market for C-end applications. The productivity category has 7 products on the list, covering writing, video summarization, search engines, article summarization, and other areas.

This coincides with Kunlun Wanwei's product layout path.

In April 2023, Kunlun Wanwei proposed its "All in AGI and AIGC" strategy. Rather than limiting itself to a single product or technology, it is building a complete AI ecosystem and has gradually formed six business matrices: AI large models, AI search, AI music, AI social networking, AI games, and AI video.

Among them, AI large model and AI search are the foundation of all AIGC capabilities, and music, video, social networking, games and other directions are Kunlun Wanwei's exploration on the road of AGI, reflecting its AI UGC platform business model.

In August 2023, Kunlun Wanwei launched "Tiangong AI Search", the first AI search product in China. It deeply integrates the capabilities of AI large models, provides users with fast and reliable interactive search in a humanized and intelligent way, and helps traditional search leap into the AI era.

At the beginning of this month, Kunlun Wanwei launched "Tiangong SkyMusic", the first AI music generation product in China. It applies the Sora model architecture to the field of music and audio and supports generating 80-second two-channel stereo songs at a 44,100 Hz sampling rate, lowering the threshold for music creation so that everyone can use music to express emotions.

Why has Kunlun Wanwei twice been the first in China to launch a new product in an AI application niche?

This is inseparable from its forward-looking strategic layout, deep technology accumulation, strong R&D strength and keen insight into market demand.

Since 2020, Kunlun Wanwei has been deploying in the field of AIGC and large models, and has accumulated nearly four years of relevant engineering research and development experience, and the R&D investment is huge. According to its report for the third quarter of 2023, the company's R&D expenses in the first three quarters reached 620 million yuan, a year-on-year increase of 28.18%. At the same time, the company attaches great importance to the open source ecosystem, and the Tiangong model has also been helped by hundreds of AI scientists in the open source community during the development process.

In addition, Kunlun Wanwei has a keen insight into market demand and sees the huge potential of AI technology in search engines, music creation and other scenarios. Since the release of the Tiangong model in April 2023, the team has begun to try to integrate the large model with the search engine, and launched China's first AI search product, Tiangong AI Search, in August of the same year. Tiangong SkyMusic embodies an important direction of Kunlun Tiangong's exploration and research - emotional AGI.

Conclusion: All in AGI and AIGC, Kunlun Wanwei delivers its latest answer

With the open-source public beta of the Tiangong 3.0 model, we have witnessed another milestone in Kunlun Wanwei's AI technology.

With its 400 billion parameter MoE architecture, Tiangong 3.0 has not only achieved a leap in core capabilities such as logical reasoning and semantic understanding, but also demonstrated its strong application potential in the field of multimodality. The successful launch of Tiangong SkyMusic has lowered the threshold of music creation to a new low, making it easy for everyone to play music.

Kunlun Wanwei's "All in AGI and AIGC" strategy shows not only its foresight regarding future technology trends but also its ambition in the field of AI. We look forward to seeing more excellent domestic large models and AIGC products, and to seeing them bring change to more industries and to people's daily lives through innovative exploration on the road to AGI.