
Behind the popularity of domestic AI applications

Author: One Zero Society Loves Science

01

Quantitative change leads to qualitative change

Domestic large models are iterating faster

We are living through a flood of artificial intelligence concepts. Every new technology seems to be born holding the golden key of AI, a "child of large models". But as the saying goes, "the crying child gets the candy": the louder the AI noise, the more it usually means the field is still at a stage where it needs attention and traffic to grow. So where does China's AI development actually stand at this point?


In the AI 2.0 era, real-world deployment is king

At the just-concluded Beijing Auto Show, most new models support "say it and it moves" control. For example, a driver sitting in the driver's seat can turn toward the front passenger side and say "open that window", and the passenger window opens automatically. Without exception, these models are equipped with domestic AI large models, which mimic the human brain and its neural networks and offer multimodal interaction through voice and vision. They give the cockpit more human-like controls and compute more accurately in intelligent driving, so the car behaves more and more like an "old driver" who has held a license for many years.

The car is an important microcosm of domestic AI development. In new energy vehicles, mainland China is arguably unrivaled worldwide, and intelligent driving is a core application direction for AI. By application, large models can be divided into three categories: general-purpose, industry, and scenario-specific. But for AI at this stage, the knowledge of the entire world is clearly still too vast, and the goal of a truly general model is somewhat unrealistic.

Large models have therefore moved toward a specialized route. By feeding them professional industry data and training scenario-based, customized, and personalized models, companies produce proprietary models that bring AI into individual vertical fields and match computing power, data, and models. This marks a shift toward more refined AI development, and the industry takes this point in time as the start of what it calls the AI 2.0 era.


Automobiles are currently the fastest-moving field for domestic specialized large models

Taking the automotive field as an example, nearly all the large models deployed in cars come from well-known industry leaders, including but not limited to general-purpose models from technology companies such as Huawei's Pangu, Baidu's Wenxin Yiyan (ERNIE Bot), iFLYTEK's Xinghuo (Spark), and 360's Zhinao, as well as BYD's Xuanji. On the vehicle side, more than 10 car brands have already been equipped with large models, and the trend is spreading like wildfire, keeping the technological advantage firmly in the hands of Chinese companies.

Beyond the currently hot field of new energy intelligent driving, domestic AI models have also made great progress in generative AI. Many in the industry share the view that in the AI 2.0 era, generative AI is an important technology for advancing productivity: if it can achieve breakthroughs in three layers of capability, namely knowledge, reasoning, and execution, it will bring a genuine leap in productivity across society. Judging by actual performance, today's domestic AI does have the strength to compete with the international front line.

Led by Kimi, specialized applications of domestic large models are on the rise

In March this year, Kimi Chat, the first intelligent assistant product launched by Moonshot AI (Beijing Yuezhi Anmian Technology Co., Ltd.) and one that supports inputs of 200,000 Chinese characters, sparked heated discussion across the internet, and the latest version supports as many as 2 million Chinese characters.

Beyond its greatly improved long-text processing, Kimi has also strengthened its context window, lossless memory, and multilingual support, and it performs well in use cases such as web search and information gathering, data processing, code writing, and simulated conversation, opening a new chapter in the "long-text era" of large model applications.


Domestic large models are finding increasingly diverse niche uses

From the user's perspective, Kimi is free and easy to use, its knowledge base spans science and technology, culture, history, education, and many other fields, and its answers are highly accurate. It can also analyze the content of common document formats such as TXT, PDF, Word, PPT, and Excel. Take the electronics industry as an example: staff often have to process highly technical instruction documents running to tens of thousands of characters and full of complex data formats, which are time-consuming and laborious to read. The free version of ChatGPT supports documents of only about 2,000 Chinese characters, so users have to split a long document into many small segments and upload them one by one, which costs even more time. Another foreign product, Claude 3, supports long texts of tens of thousands of characters, but its free tier allows only 20 uses per day.

By contrast, with Kimi you simply drop these long documents into the chat box and quickly get accurate answers, which greatly improves the efficiency of data management and information retrieval. Kimi is also available on many fronts, including a mobile app, the web, and a WeChat mini program, and for most people its practicality even exceeds that of paid models such as GPT-4.
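To make the workaround described above concrete: a user limited to roughly 2,000 Chinese characters per upload has to cut a long manual into pieces. The Python sketch below shows one simple way to do that, assuming plain text with newline-separated paragraphs. The function name `split_into_chunks` is hypothetical, and the 2,000-character default is simply the figure quoted in the article, not any product's documented limit logic.

```python
def split_into_chunks(text: str, max_chars: int = 2000) -> list[str]:
    """Split text into pieces of at most max_chars characters.

    Hypothetical helper: it keeps whole paragraphs (newline-separated) together
    where possible and hard-cuts any single paragraph longer than the limit.
    Python's len() counts characters, so each Chinese character counts as one.
    """
    chunks: list[str] = []
    current = ""
    for para in text.split("\n"):
        # Hard-cut an oversized paragraph into fixed-size slices.
        while len(para) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(para[:max_chars])
            para = para[max_chars:]
        # Try to append this paragraph (plus the newline that separated it).
        candidate = para if not current else current + "\n" + para
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    manual = "说明" * 2250  # a single 4,500-character paragraph
    print([len(c) for c in split_into_chunks(manual)])  # [2000, 2000, 500]
```

Splitting at paragraph boundaries first keeps related sentences in the same upload; a long-context model such as Kimi makes this step unnecessary because the whole document fits in one input.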


JD.com's live-stream AI digital human "Procurement and Sales Dongge" brings new ideas to e-commerce

Besides Kimi, another recent hot topic is JD.com's "Procurement and Sales Dongge": Liu Qiangdong (Richard Liu) appeared as an AI digital human in JD.com's live-streaming room.

To be honest, the technology still has flaws, such as limited freedom of movement and dialogue and a lack of realism. But it is already evident that AI digital humans break through the limits of time and space and make producing content such as live streams and videos more efficient; they can also reduce dependence on a single IP or celebrity and make the business more stable. Moreover, such applications are driven by a new generation of AI models, which not only creates demand for underlying infrastructure but also benefits the related computing power industry chain and raises overall social productivity. More importantly, this application was deployed first in China, which gives it great historical significance.

Computing power is strength, and the big companies remain the main players

What users see is the output of AI, but for enterprises, large models do not fall from the sky: they need strong computing power behind them to grow. Today's environment is not particularly favorable, as some countries are intensifying their "chip blockade" against China, restricting exports not only of high-end chips but also of advanced chipmaking equipment.

According to China's General Administration of Customs, the mainland imported a total of 479.5 billion integrated circuits in 2023, down 10.8% from 2022, with an import value of US$349.4 billion, down 15.4%, a record low. The situation has eased this year, however: from January to February, the mainland imported 78.52 billion integrated circuits, up 16.8% year on year, and the import value rose 15.3% year on year to US$54.7 billion, accounting for 13.6% of China's goods imports, a significant increase from 12.2% in the same period last year. Overall, though, the pressure remains considerable.


According to the latest data from the National Bureau of Statistics, China's integrated circuit output in 2023 was 351.4 billion units, compared with 324.2 billion in 2022, a year-on-year increase of 6.9% and a new high in recent years. In other words, domestic AI hardware is undergoing a comprehensive overhaul (a "blood exchange"), and policy support has set out concrete requirements: government policy, companies, and the industry are all pushing in one direction, toward AI computing chips.

In this environment, domestic tech giants have been making frequent moves: Tencent and Alibaba jointly invested in ChangXin Memory; Meituan invested in Qingchun Semiconductor, a developer and manufacturer of silicon carbide power devices; ByteDance became a shareholder of Xinyuan Semiconductor; and Ant Group led a Series A3 round worth hundreds of millions of yuan in Wuxi Muchuang, which focuses on security chips, among other deals.


Text-to-video models are the next AI hotspot

Pushing forward domestic AI models is therefore an enormously expensive process, and it is inevitable that the giants step in personally. Take Alibaba's Tongyi Qianwen as an example: it recently announced that its self-developed EMO model is live in the Tongyi Qianwen app as the "National Singing" feature, which can generate a singing video with realistic facial expressions and varied head poses from a single reference image plus vocal audio.

We test this new feature in detail below. After all, the next stage of generative AI is most likely text-to-video: ever since OpenAI's Sora took off in February this year, whoever can truly deploy this segment will stand at the forefront of the AI industry in 2024. But text-to-video demands computing power on an entirely different order of magnitude from text-to-text and text-to-image generation, so at this stage, companies with strong resources and capital will have an even more obvious advantage.

02

A phenomenal domestic AI application that has taken over WeChat Moments

Phenomenal AI applications that actively "break out of the circle"

The adoption of any new technology has to be driven by phenomenal applications. Now that the "first year" of general AI is behind us, how can AI applications break through?

Any discussion of AI's limitless possibilities has to mention the remarkable capabilities and potential it has shown in many fields, yet for most people AI still feels unfamiliar and even somewhat out of reach. This is especially true now that large models are "everywhere": major domestic tech companies, start-ups, research institutions, and even university labs have incubated hundreds of domestic large models in the past year or so, which leaves the public confused about the concrete application scenarios and directions of AI.

Robin Li, the founder of Baidu, stated publicly at the Xili Lake Forum: "Constantly redeveloping foundation models is a great waste of social resources. Is there an opportunity in making large models? Yes, but the opportunity lies not only in the large models themselves; more of it will come from their applications. In the AI-native era, what we need is 1 million AI-native applications, not 100 so-called large models."

There are too many large models and too few valuable AI-native applications, like a store with nothing on its shelves, and wasted computing power has become a weakness of the AI field both in China and globally. In the AI era, large models matter as the foundation, but like operating systems, they must rely on end-user applications to have any impact. Even Sora, which created "Our T2 Remake", and Gemini, which represents the multimodal generation track, are already highly topical and popular, yet their complex operation often puts beginners off.

"Breaking out of the circle" has become a top priority for AI applications: only by opening up application scenarios can they turn traffic into revenue and keep growing on the back of a huge consumer (C-end) user base.


Miaoya Camera, which successfully "broke out of the circle"

Among AI applications breaking out of the circle, the 9.9-yuan Miaoya Camera has set a good example, breaking through thanks to its gorgeous, refined photo quality and strong resemblance to users. Overseas, AI photo-generation apps such as Remini and PicsArt are also rising rapidly and earn millions of dollars from in-app purchases alone. Add AI chat apps such as "Chat & Ask AI" and "ChatOn - AI Chat Bot Assistant", whose monthly income can exceed two or three million, and the potential of the consumer market and the trend of AI application segments breaking out become clear.

Tongyi Qianwen EMO lets the Mona Lisa sing

The Mona Lisa opening her mouth to sing, Gao Qiqiang explaining the law... Behind this series of creative videos on WeChat Moments is Alibaba's Tongyi Qianwen EMO. EMO is a new image-and-audio-to-video AI model from Alibaba Group's Institute for Intelligent Computing, officially described as "an expressive audio-driven portrait video generation framework".

The fun of Tongyi Qianwen EMO is that users only need to provide one photo and any audio file, and EMO generates a seamless talking or singing video clip. For example, "Gao Qiqiang" from the TV series "The Knockout" can deliver Luo Xiang's popular law lectures, and a photo of Cai Xukun paired with other audio can "sing" a rap, with lip shapes that match almost perfectly.

After receiving first-round test access, a reporter from "Computer Daily" opened the Tongyi Qianwen app, upgraded it to the latest version as prompted, and typed "EMO" into the dialog box on the home page to activate the feature.


It includes two major sections: "National Dance King" and "National Singing".

After entering the EMO interface, the author found that it consists of two main sections, "National Dance King" and "National Singing". The former recently went viral on WeChat Moments with videos of the "Terracotta Warriors dancing 'Subject Three'", while this release of EMO clearly focuses on upgrading the "National Singing" section.

A "Creative Square" sits at the bottom of the interface (its content is not yet divided into categories). Users simply tap a template they like, tap the "Make the same style" button, and upload a picture to generate a similar video clip.


It feels a bit like the "shoot the same style" feature of a certain short-video app.

Uploaded images must meet EMO's requirements, which means the face must appear frontally in the frame. Once a complete, suitable photo is uploaded, the user can simply wait.

Judging by the results, the expressions are spot on: any voice, any speaking speed, and any image can be matched one to one, and such clips can run up to about 1 minute 30 seconds. Making a girl with a cold expression open her mouth and sing a playful song is inherently shareable and topical, so it naturally spread across WeChat Moments.

Some netizens "resurrected" their idols, others brought back historical figures from their textbooks, and plenty of funny videos appeared, to everyone's great amusement. Netizens joked that with EMO, there will be no more emo.

The image-to-video track is gradually heating up

Tongyi Qianwen EMO can be said to have single-handedly ignited the entire domestic image-to-video track. Besides Alibaba, Meitu's visual model MiracleVision 4.0 and ByteDance's AI creation platform Dreamina have also built in image-to-video features, while Tencent, Tsinghua University, and the Hong Kong University of Science and Technology have jointly released a new image-to-video model, "Follow-Your-Click".


Dreamina, ByteDance's AI creation platform, also integrates an image-to-video feature

Unlike Runway, Pika, and other AI models whose core strength is image-to-video, domestic image-to-video applications are often backed by giants such as Alibaba, Meitu, and ByteDance. Their huge ecosystems can quickly drive adoption of the feature, and the works users create with it in turn feed the giants' content ecosystems.

Behind the image-to-video features of different platforms, there is often a contest of the giants' large-model capabilities.

The entertaining Tongyi Qianwen EMO did not come out of nowhere; it rests on Alibaba's years of continuous investment in AI large models and their applications. Over the past year or so, Alibaba has launched a number of AI large-model products benchmarked against OpenAI, including Tongyi Qianwen and Tongyi Wanxiang, as well as technologies such as OutfitAnyone, a realistic virtual try-on technology based on a dual-stream conditional diffusion model, and Animate Anyone, a character animation model, covering many application scenarios. At the beginning of this year, Alibaba put its Qwen-VL model through several iterative upgrades and announced upgraded Plus and Max versions, which accept images and text as input and produce text, images, and detection boxes as output, giving the large model a genuine ability to "see" the world.

The EMO framework uses an Audio2Video diffusion model to generate expressive portrait videos. The technique has three main stages. The first is frame encoding, in which ReferenceNet extracts features from the reference image and the motion frames. The second is the diffusion process, in which a pretrained audio encoder produces audio embeddings, and a facial region mask is combined with multi-frame noise to control generation of the face. The third uses a backbone network to carry out denoising. Two forms of attention are applied in the backbone network, reference attention and audio attention, which are essential for preserving the character's identity and regulating the character's movements, respectively. In addition, EMO's temporal modules operate along the time dimension and adjust the speed of motion.
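The two attention mechanisms described above can be illustrated with a toy example. The Python sketch below is not EMO's code: it assumes tiny hand-made vectors, uses plain scaled dot-product cross-attention with residual additions as a stand-in for EMO's reference-attention and audio-attention layers (with keys and values taken from the same vectors for simplicity), and all function names are hypothetical. It only shows the ordering the paragraph describes: frame features first draw on identity features from the reference image, then on audio features that drive motion.

```python
import math


def softmax(xs: list[float]) -> list[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query becomes a weighted mix of the
    value vectors, weighted by its similarity to the matching key vectors."""
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[c] for w, v in zip(weights, values))
                    for c in range(len(values[0]))])
    return out


def add(a, b):
    """Element-wise residual addition of two lists of vectors."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def toy_emo_block(frame_tokens, reference_feats, audio_embeds):
    """Hypothetical toy block: reference attention, then audio attention."""
    # Reference attention: frame tokens pull in identity features from the
    # reference image (the role the article gives to ReferenceNet features).
    h = add(frame_tokens, cross_attention(frame_tokens, reference_feats, reference_feats))
    # Audio attention: the updated tokens pull in audio embeddings, which the
    # article says regulate the character's movements.
    return add(h, cross_attention(h, audio_embeds, audio_embeds))


if __name__ == "__main__":
    frames = [[1.0, 0.0], [0.0, 1.0]]   # two latent frame tokens (made-up values)
    reference = [[0.5, 0.5]]            # one identity feature vector
    audio = [[0.2, -0.1], [0.0, 0.3]]   # two audio embedding vectors
    print(toy_emo_block(frames, reference, audio))
```

The residual additions keep each frame token's original content while layering identity and audio information on top, which is the usual reason attention layers in diffusion backbones are written this way; the real model works on large learned feature maps rather than two-dimensional toy vectors.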


Behind every phenomenal application lies a contest of AI large-model technology

From images to video, micro-expressions are often what decides whether AI-generated video looks "fake at a glance". Many users have commented that ByteDance's Dreamina "simulates overall human movement well, but is still rough on details such as facial expressions and finger movements; in long close-ups especially, the subtle changes in a character's expression often fall short, and it looks a little stiff." Tencent's "Follow-Your-Click" built the WebVid Motion dataset, which emphasizes human emotions, human movements, and the motion of common objects, and designed a motion enhancement module that lets the model understand short prompts.

Tongyi Qianwen EMO caused a sensation among consumers this time largely because of how well it handles facial expressions. EMO introduces a speed controller and a face region controller, which control facial micro-expressions and make the videos more expressive.
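As a rough illustration of what such control signals might look like as model inputs, the sketch below turns a head-motion speed into a one-hot "speed bucket" and a face bounding box into a binary region mask. The bucket edges, the box format, and both function names are assumptions made for illustration; the article does not say how EMO's controllers encode their inputs, and in a real model these signals would be embedded and fed to the diffusion backbone rather than used directly.

```python
import bisect

# Hypothetical bucket edges for head-motion speed (e.g. radians per frame);
# the values are made up for illustration.
SPEED_EDGES = [0.01, 0.03, 0.06, 0.10]


def speed_condition(speed: float, edges: list[float] = SPEED_EDGES) -> list[int]:
    """One-hot encode a speed into len(edges) + 1 buckets, from slow to fast."""
    idx = bisect.bisect_right(edges, speed)
    one_hot = [0] * (len(edges) + 1)
    one_hot[idx] = 1
    return one_hot


def face_region_mask(height: int, width: int,
                     box: tuple[int, int, int, int]) -> list[list[int]]:
    """Binary mask that is 1 inside the face box (top, left, bottom, right;
    bottom and right are exclusive) and 0 elsewhere."""
    top, left, bottom, right = box
    return [[1 if top <= y < bottom and left <= x < right else 0
             for x in range(width)]
            for y in range(height)]


if __name__ == "__main__":
    print(speed_condition(0.05))  # [0, 0, 1, 0, 0]
    for row in face_region_mask(4, 6, (1, 2, 3, 5)):
        print(row)
```

Bucketing a continuous speed into a few discrete classes gives the model a coarse, easy-to-learn control knob, while the mask tells it where the face should be generated; both are lightweight conditions compared with the image and audio features themselves.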

Watching the videos netizens made with EMO, you will find that the characters' facial expressions change delicately with the emotion of the song as they sing, which is very expressive. Of course, EMO still only produces realistic portrait videos, just smoother and more lifelike ones, and it remains logically very different from Sora, which plays on an almost professional track. But its distinctive social appeal and low barrier to entry give it the potential to become a phenomenal application.

Overall, from HeyGen, the video translation tool that had Guo Degang speaking fluent English and Taylor Swift speaking Chinese, to Miaoya Camera, which set off a craze for AI ID photos, to today's Tongyi Qianwen EMO, AI has repeatedly incubated hit applications in the consumer market and has helped the whole consumer market mature. Whether it is ChatGPT Plus at $20 a month or Kimi, whose users actively discuss a paid membership model, the era of consumer AI applications has begun.
