Large models enter their first year of application: Kunlun Wanwei launches China's first SOTA music model

Author: Interface News

Although AI large models have been an industry focus for only a little over a year, discussion of their foundational R&D and application scenarios has stayed intense. In 2024, claims that large models have entered their "first year of application" have grown louder still. Has the industry really taken this step?

In fact, given the enormous capacity and capital that both R&D and application demand, few companies in the industry can withstand this test. Baidu's "Wenxin Yiyan", Alibaba's "Tongyi Qianwen", Kunlun Wanwei's "Tiangong", SenseTime's "Ririxin" and iFLYTEK's "Spark" are all strong competitors at the table.

To truly stand out among them, a company needs not only a foundation model with advanced performance but also a matching product scenario with the potential to become a breakout hit. Kunlun Wanwei's answer is "Tiangong 3.0" and "Tiangong SkyMusic".

On April 17, Kunlun Wanwei's self-developed 400-billion-parameter large language model "Tiangong 3.0" officially entered public beta and was open-sourced at the same time. This Mixture-of-Experts (MoE) model is one of the largest and most powerful MoE models in the world, and it significantly improves on the previous generation in areas such as semantic understanding and logical reasoning.

Beyond the technical layer, Kunlun Wanwei may also be one of the potential leaders in AI applications. Entering public beta alongside "Tiangong 3.0" is the company's AI music generation model "Tiangong SkyMusic", which set off a wave of music creation during its limited invitation-only test.

Beyond music, "Tiangong 3.0" integrates AI capabilities into multiple high-frequency application scenarios such as search, writing, long-text reading, dialogue and code, in preparation for the coming battle over real-world deployment of large models.

With this, a complete ecosystem of large-model technology and applications is taking shape. This is one of the most important chapters in the AI model story, and it could mark a watershed for the industry.

Tiangong SkyMusic leads the AIGC music wave

Since OpenAI pushed large models into the industry spotlight, the Chinese market's "hundred-model war" has been under way for more than a year. In 2024, the industry's focus has begun to shift from technology R&D to real-world deployment. There is no denying that deployment and application are the long-term indicators that determine a large model's technical worth and value.

Across content modalities, audio conveys human emotion better than text and images, and music is the richest carrier of human emotion, unbounded by geography or culture. Among the many deployment scenarios, music creation has therefore become the most accessible and engaging AIGC scenario for the general public. For AI companies, it is a good opportunity to enter the consumer market and build public awareness.

Tiangong SkyMusic is a large model released by Kunlun Wanwei for the music industry. It was first opened to the public on April 2 and was officially released alongside Tiangong 3.0 today. It is not only the only publicly available AI music generation model in China, but also China's first SOTA AI music model, and it marks the first time that China's self-developed large-model technology leads the world in a field of AIGC.

In the field of large models, a SOTA model is one considered to be "state of the art" (SOTA). The term describes the most advanced and best-performing technology or method in a particular field, just as OpenAI is seen as SOTA for large models of text and video generation.

In a head-to-head evaluation against Suno V3, the leading overseas AI music model, Tiangong SkyMusic is significantly ahead in vocal and BGM sound quality, vocal naturalness, pronunciation intelligibility and other dimensions. With an overall score of 6.65 points, it surpasses Suno V3 and becomes the global SOTA model for AI music.

At present, there are two main technical routes for AI music generation: symbolic music generation and large-model audio generation. The former annotates a large number of musical scores and trains a model on them. Its output is also a score, so additional algorithms or tools are needed to convert that score into audible music. The latter learns and generates audio waveforms directly, producing instruments, vocals, melody, volume and notes together, end to end. This route is difficult, has few successful precedents, and demands substantial computing power and funding.
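To make the difference between the two routes concrete, here is a minimal Python sketch. It is an illustration only, not SkyMusic's method: the score format (MIDI note number plus duration), the function names and the sine-wave "synthesizer" are all assumptions introduced here. It shows the extra rendering step that a symbolic score needs before it becomes sound, whereas an audio-route model would emit the sample list directly.

```python
import math

def midi_to_hz(note: int) -> float:
    """Convert a MIDI note number to a frequency (A4 = note 69 = 440 Hz)."""
    return 440.0 * 2 ** ((note - 69) / 12)

def render_score(score, sample_rate=8000):
    """Turn a symbolic score into raw audio samples.

    `score` is a hypothetical format: a list of (midi_note, seconds) pairs.
    Each note is rendered as a plain sine wave, standing in for the
    "additional algorithms or tools" a symbolic model depends on.
    """
    samples = []
    for note, seconds in score:
        freq = midi_to_hz(note)
        count = int(seconds * sample_rate)
        for i in range(count):
            samples.append(math.sin(2 * math.pi * freq * i / sample_rate))
    return samples

# Symbolic route: the model's output would be something like this score...
score = [(60, 0.5), (64, 0.5), (67, 0.5)]  # C4, E4, G4, half a second each
# ...and it only becomes audio after a separate rendering step.
waveform = render_score(score)
# Audio route: the model would generate `waveform`-like samples itself,
# with vocals and instruments already mixed in.
```

The sketch also hints at why the audio route is costly: even this toy clip holds thousands of samples for three notes, while the score that describes it is three pairs of numbers.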

SkyMusic opted for the technically more difficult audio generation route. This route covers Song, BGM (background music) and Speech. Of these, the Song field had long lacked an excellent solution, because songs contain vocals and are harder to generate. Tiangong SkyMusic has invested heavily here and achieved technical breakthroughs that have finally improved generation quality in this field.

Notably, Tiangong SkyMusic is one of the few products in music AIGC to disclose its own technical path. Its architecture, composed of three core modules, "Encoder-DiT-Decoder", has become an important technical reference for the audio route and for vocal song generation.

Compared with similar products in the industry, Tiangong SkyMusic, powered by the Tiangong model, has more prominent product advantages.

For example, it can generate songs from reference music and songs in dialects. With reference music generation, users upload their own reference track, or select one from the "Tiangong SkyMusic" library, to generate songs with a similar style and singing voice. This lets users draw on existing audio resources to create more varied musical works.

Dialect song generation covers Cantonese, the Chengdu dialect, the Beijing dialect, Shanghainese and others, which both widens the audience and adds diversity to music creation. At the same time, drawing on a powerful database and trained model, Tiangong SkyMusic can produce a more recognizable, natural-sounding voice, which sets it apart from the "AI vocal texture" of ordinary music AIGC products.

In addition, Tiangong SkyMusic can control emotional changes through lyrics, perform a variety of singing techniques such as vibrato, opera and chanting, and supports music styles including rap, folk, funk, ancient-style and electronic.

This flexibility and versatility make Tiangong SkyMusic's output more enjoyable. Among the demos it has released, "Dragon Walking" shows opera singing blended with electronic music; "Wukong" has lyrics that fit the legendary, uninhibited nature of its character, with that distinctive interpretation woven into the melody; and "Pack my bags" captures the subtleties of European and American pop, with a female voice that combines the timbre and technique of European and American female singers.

In this way, SkyMusic greatly lowers the threshold for music creation, making it easier for every user to create their own melodies and songs, and it is expected to become one of the most important music creation tools for the general public. As the model continues to evolve, it could also assist professional musicians in improving the quality and efficiency of their work, and gradually foster its own ecosystem of AI music creators.

The era of open-source MoE models has arrived

In fact, Tiangong SkyMusic is only Kunlun Wanwei's first stop on the way into the AIGC world. With the release of Tiangong 3.0, the large model will cover more high-frequency AIGC application scenarios such as listening, speaking, reading, writing, drawing and singing, officially opening the era of multimodal large models.

The gradual transition from a single modality to multiple modalities, followed by the construction of a world model, is the industry's most widely recognized path to AGI. Since OpenAI demonstrated the capabilities of GPT-4 and GPT-4V, the industry has been waiting for a multimodal large model that covers more scenarios and takes the application of large-model technology a step further.

Kunlun Wanwei's "Tiangong 3.0" arrives against this backdrop. It adopts a 400-billion-parameter Mixture-of-Experts (MoE) model, one of the world's largest and most powerful MoE models, and has been released as open source. Compared with the previous generation, it is significantly improved in semantic understanding, logical reasoning, generality and generalization.
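For readers unfamiliar with the term, the general idea behind a Mixture-of-Experts layer can be sketched in a few lines of Python. This is a generic, simplified illustration of top-k expert routing, not a description of Tiangong 3.0's internals, which the article does not detail. The names `moe_forward`, `make_expert` and `gate_weights`, and the toy experts that merely scale their input, are assumptions made for this example.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def make_expert(scale):
    """Toy expert: multiplies every input value by a fixed scale."""
    return lambda x: [scale * v for v in x]

def moe_forward(x, experts, gate_weights, top_k=2):
    """Route input `x` to the `top_k` highest-scoring experts only.

    A linear gate scores every expert, but only the selected experts run,
    which is why an MoE model can hold many parameters while using a
    fraction of them per input.
    """
    logits = [sum(w * xi for w, xi in zip(ws, x)) for ws in gate_weights]
    probs = softmax(logits)
    chosen = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)  # renormalize over the chosen experts
    out = [0.0] * len(x)
    for i in chosen:
        y = experts[i](x)
        for j, yj in enumerate(y):
            out[j] += (probs[i] / norm) * yj
    return out, chosen

experts = [make_expert(s) for s in (1.0, 2.0, 3.0, 4.0)]
gate_weights = [[1.0, 0.0], [0.0, 0.0], [2.0, 0.0], [-1.0, 0.0]]
output, used = moe_forward([1.0, 0.0], experts, gate_weights)
# `used` lists the two experts that actually ran; the other two were skipped.
```

In this toy run the gate prefers the third and first experts, so only those two are evaluated and their outputs are blended by their renormalized gate scores.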

Specifically, the improvements in "Tiangong 3.0" focus on four areas: logical reasoning, semantic comprehension, specialized agent training and content creation. In logical reasoning, Tiangong 3.0's mathematical and reasoning capabilities have improved by more than 30%. In semantic understanding, it can better understand and process complex semantic information in users' natural-language queries, including metaphors and polysemous words.

Specialized agent training is the core of this capability upgrade. AI agents have become the mainstream deployment direction for large-model technology, and "Tiangong 3.0" has been specially trained in agent capabilities, that is, the ability to independently plan, call and combine external tools and information. As a result, it can generate and call code on its own and complete complex user requests, including chart drawing, tool calling and semantic judgment.
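The plan, call and combine pattern described above can be illustrated with a small Python sketch. Everything in it is hypothetical: the tool names, the shared-context design and especially the keyword-based `plan` function, which stands in for the planning a trained model would do. It does not reflect how Tiangong 3.0 is implemented.

```python
from typing import Callable, Dict, List

Context = Dict[str, object]

def tool_total(ctx: Context) -> None:
    """Hypothetical tool: add up the numeric values in the context."""
    ctx["total"] = sum(ctx["values"])

def tool_chart(ctx: Context) -> None:
    """Hypothetical tool: draw a text bar chart, one row per label."""
    ctx["chart"] = [
        f"{label}: {'#' * value}"
        for label, value in zip(ctx["labels"], ctx["values"])
    ]

# Registry of external tools the agent is allowed to call.
TOOLS: Dict[str, Callable[[Context], None]] = {
    "total": tool_total,
    "chart": tool_chart,
}

def plan(request: str) -> List[str]:
    """Stand-in for the model's planning step: pick tools by keyword."""
    text = request.lower()
    return [name for name in TOOLS if name in text]

def run_agent(request: str, labels: List[str], values: List[int]) -> Context:
    """Plan, call each chosen tool in order, and combine results in one context."""
    ctx: Context = {"labels": labels, "values": values}
    for name in plan(request):
        TOOLS[name](ctx)
    return ctx

result = run_agent("Give me the total and a chart", ["a", "b"], [2, 3])
```

A real agent would replace `plan` with model output and would validate each tool call, but the control flow (decide which tools to use, run them, merge their results) has the same shape.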

As a result, "Tiangong 3.0" has become an all-round expert with knowledge and capabilities in many fields. It can break down and optimize complex tasks, better understand user needs, and judge in real time when to call specialized modes that extend the base model, maximizing model performance. It can efficiently cover scenarios such as industry research, product evaluation, information analysis, image generation and chart drawing.

For AI users, the most tangible value of the "Tiangong 3.0" upgrade shows up in AI search. In terms of information presentation, Tiangong AI's research mode makes answers more professional, expands a simple user instruction into related questions, and automatically generates research outlines, maps, practice summaries and mind maps. The enhanced mode further guides vague questions, helping users obtain more useful information and improving response quality.

At the same time, Tiangong 3.0 shows differentiated advantages that similar AI search engines in the industry lack. Besides replying in text, it can produce illustrated answers, accompanying text with pictures or videos to help users better understand the information.

At the content creation level, the performance gains from specialized agent training have taken the content creation capability of "Tiangong 3.0" to a new stage.

In terms of basic reasoning and understanding, the improvement in Tiangong 3.0's mathematical reasoning ability also allows it to understand user needs more accurately.

Building on the previous generation's strong content creation capabilities, including AI search, AI voice, AI dialogue and AI two-dimensional (anime-style) comic generation, "Tiangong 3.0" delivers stronger multimodal performance, such as generating pictures in real time from text requests or analyzing content and charts in real time during a dialogue. It has become a hundred-billion-parameter-scale open-source MoE model that integrates listening, speaking, reading, writing, searching, drawing, watching, singing and other capabilities.

Tiangong 3.0 can now deliver deep multimodal integration and application. For the industry, this will bring more efficient and intelligent solutions while lowering the R&D threshold and usage cost of AI technology and maximizing the sharing of technical capabilities and experience.

Lowering the threshold for AIGC and promoting industrial upgrading

Users who have followed the development of AI large models since ChatGPT's stunning debut can probably appreciate the significant impact of "Tiangong 3.0" on the industry. It improves competitiveness at the technical level, gradually covers today's high-frequency scenarios at the application level, and strides toward the goal of building a large-model application ecosystem.

From this perspective, the release of Tiangong 3.0 matters not only as an upgrade of large-model application scenarios, but also because it accelerates the popularization of AI applications and encourages more enterprises and developers to take part in AI-led technological change.

From Tiangong SkyMusic to the multimodal capabilities released with Tiangong 3.0, the industry can already foresee the AIGC wave that Kunlun Wanwei intends to set off.

Backed by China's first SOTA music AIGC model and the world's largest open-source MoE model, Tiangong 3.0 can lead creators in more fields to move freely through AIGC understanding and generation, using deep multimodal integration to greatly reduce the threshold and cost of content production and to redefine standards of creative efficiency and quality. This influence will gradually drive the evolution of the entire content production industry and release more creativity and content value.

This is not only one company's mission and vision, but also a concrete milestone the industry must reach to achieve breakthroughs. Kunlun Wanwei has been working toward it for many years.

Since releasing the Tiangong series of large models, Kunlun Wanwei has laid out its business matrix in the two major directions of AGI and AIGC: from hundred-billion-parameter large language models to multimodal AI content generation, and from AI search, AI music and AI social networking to a domestically leading AI Agent development platform. In both model technology and engineering capability, it has worked to secure a firm place in the top tier of Chinese AI companies and stands ready to support the industry.

Behind all this, moving toward AGI and advancing AIGC applications has always been the company's goal and mission. Now, with the release of "Tiangong 3.0", Kunlun Wanwei has taken another step in its "All in AGI and AIGC" strategy and is about to push the large-model war to a new climax.
