Optimization and breakthroughs in the large model 2.0 era

Image source: @VisualChina

Text | Baker Street Detective, author | Vehicle transportation

In June this year, major vendors upgraded their ChatGPT-like products. On June 9, iFLYTEK launched an upgraded version of its Spark cognitive large model; on June 13, following its earlier release of a ChatGPT-like product, 360 held the 360 Wisdom Brain Large Model Application Conference.

Unlike the large models released around February, the upgraded products that companies have released recently focus more on the application layer and are better positioned to reach ordinary households.

Judging from the current release, 360 Wisdom Brain has initial cross-modal generation capabilities. Beyond basic creation tasks such as generating text, tables, and pictures from text, generating text and pictures from pictures, generating text from video, and text clipping, it also redefines the "digital human", giving users an exclusive "artificial intelligence" that can be customized with "soul, persona, and memory".

At present, the application scenario that brings 360 Wisdom Brain closest to users is 360's existing product suite. Zhou Hongyi said at the press conference that "360 Wisdom Brain 4.0" will be integrated into 360 Security Guard, 360 Browser, 360 Search, and other products in an attempt to reshape human-computer collaboration.

At the press conference, Zhou Hongyi revised his earlier view: "I once said that the gap between domestic large models and ChatGPT is two years, and now I want to take back this sentence." He then added his latest view: domestic large models are currently on par with GPT-3.5, and at this pace, catching up with or even surpassing GPT-4 will happen very quickly.

What did Zhou Hongyi see in the four months between the initial version and the official release of 360 Wisdom Brain that changed his view so sharply?

01 Big tech companies chase large models

The "China Artificial Intelligence Large Model Map Research Report", released at the 2023 Zhongguancun Forum, shows that large AI models in China are developing vigorously. According to incomplete statistics, 79 large models with more than 1 billion parameters have been released nationwide so far.

The models from the major technology companies have especially large parameter counts: Alibaba's Tongyi Qianwen is at the level of more than 10 trillion parameters, Tencent's Hunyuan and Huawei's Pangu are at more than a trillion, Baidu's Wenxin Yiyan is at more than 200 billion, and Jingdong's Yanxi has 100 billion. Models launched by technology companies in vertical industries generally have more than 100 billion parameters, while those from scientific research institutions are at the level of 100 billion and below.

In terms of how they are positioning themselves, the major technology companies have built a four-layer stack covering the computing power layer, platform layer, model layer, and application layer. Baidu, Alibaba, and Huawei have all pursued in-house R&D from chips to applications: Baidu's "Kunlun chip + PaddlePaddle platform + Wenxin large model + industry application", Alibaba's "Hanguang 800 chip + M6-OFA base + Tongyi large model + industry application", and Huawei's "Ascend chip + MindSpore framework + Pangu large model + industry application".

In addition, Kingsoft Office released WPS AI on May 31. WPS AI has so far been connected to Kingsoft Office's light documents, text, forms, presentations, and PDF components. Going forward it will focus on three strategic directions: AIGC, reading comprehension and Q&A, and human-computer interaction, and it will be rolled out across the full line of Kingsoft Office products.

Large companies have rushed into this track mainly because regulators followed up quickly with measures to standardize the industry's development. With top-level policy as a backstop, each company can invest in R&D and launch products with confidence.

Since the wave of large model launches in March this year, AI regulatory policy has gradually become clearer, which has also pointed the way for industry applications.

Looking back at the industry's development: on April 11, the draft of the "Measures for the Management of Generative Artificial Intelligence Services" was released. As of May 30, the Academy of Information and Communications Technology was jointly compiling the "Paper Kite" open artificial intelligence model license, with the "Paper Kite Open Artificial Intelligence Model License (Draft for Comments)" to be released as the next step.

Subsequently, first-tier cities followed with their own plans: Beijing released the Implementation Plan for Beijing to Accelerate the Construction of an AI Innovation Source with Global Influence (2023-2025), and Shenzhen released the Shenzhen Action Plan for Accelerating the High-quality Development and High-level Application of Artificial Intelligence (2023-2024).

Against this backdrop, Zhou Hongyi's belief that the gap between domestic large models and ChatGPT will narrow quickly is easy to understand.

02 How 360 Wisdom Brain is different

According to Zhou Hongyi's plan, 360 will keep upgrading its large model while also pursuing scenario-based use, productization, flattening, and verticalization.

Under this development strategy, 360 Wisdom Brain can cover four major application scenarios: consumers (personal AI assistants), small and medium-sized enterprises (vertical SaaS applications), enterprises, governments, and cities (privately deployed large models), and industries (industry-specific vertical large models).

Applications in these four scenarios are mainly delivered through APIs, with other companies and specific industries building their own products on top of the 360 model API. At this stage, many departments and enterprises hold a large amount of proprietary intellectual property. If all of it were fed into a public model, incidents like Samsung's leak of chip secrets through GPT would be repeated, which highlights the importance of proprietary GPT products.

To better meet the needs of these different scenarios, a general-purpose large model has to move beyond text input and text output to understanding images and videos and producing them as well. This is equivalent to giving the large model "ears" and "eyes", and it lays the foundation for creating "digital humans".

Traditional digital humans only produce output according to a fixed script, but in the large model era, 360's digital humans can be customized with a persona, memory, and experience. The 360 Digital Human Square platform currently hosts more than 200 characters, divided into two categories: digital celebrities and digital employees. 360 hopes that in the future everyone can have their own AI assistant and the chance to converse with historical figures in virtual space, across time and space.

During the demonstration, Zhou Hongyi asked "Zhuge Liang" what he thought of having become material for "guichu" remix videos today, and the digital human replied in Zhuge Liang's tone: The fate of ancient and modern husbands is inevitable in fact. Today's situation is full of turmoil. Although I am old, I still aim to be in the world. Today's young people use me as guichu material, and I gladly accept this change. And I wish young friends to forge ahead on the road ahead and create a better future.

Zhou Hongyi also stressed that future digital humans will have their own goals and the ability to plan and decompose tasks, so that they can call various vertical models to complete them.

However, these functions are optimizations on top of existing large model applications and do not open up a completely new field. When large models do make breakthroughs, the most creative application scenario is autonomous driving.

03 Autonomous driving has a chance to enter the fast lane

In autonomous driving, major companies have been positioning themselves since 2016, but as of this year none has achieved truly driverless operation.

Currently, an L2+ system relies on multi-dimensional data from 10+ cameras, 1-2 lidars, or 3-5 millimeter-wave radars, and that data can be used for model training only after it has been manually labeled. With the arrival of large models that can recognize images, the time and material costs of manual labeling will drop dramatically.

According to the April 2023 DriveGPT release, manual labeling in the industry currently costs about 5 yuan per image, while DriveGPT costs 0.5 yuan. We believe that once technology companies have trained their large models, the marginal cost of automatically labeling a single image will approach zero, and the average cost is expected to fall further.
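The cost comparison above can be worked through in a few lines. This is a minimal sketch that uses only the two per-image figures quoted from the DriveGPT release; the function name and the batch size of one million images are illustrative assumptions, not figures from the release.

```python
# Per-image labeling costs in yuan, as quoted in the article.
MANUAL_COST_YUAN = 5.0
DRIVEGPT_COST_YUAN = 0.5


def labeling_cost_comparison(num_images,
                             manual=MANUAL_COST_YUAN,
                             auto=DRIVEGPT_COST_YUAN):
    """Compare total labeling cost for a batch of images.

    num_images is a hypothetical batch size chosen by the caller.
    Returns totals in yuan plus the relative cost reduction.
    """
    manual_total = num_images * manual
    auto_total = num_images * auto
    saved = manual_total - auto_total
    # Guard against an empty batch so the ratio is always defined.
    reduction = saved / manual_total if manual_total else 0.0
    return {
        "manual_total": manual_total,
        "auto_total": auto_total,
        "saved": saved,
        "reduction": reduction,
    }


if __name__ == "__main__":
    result = labeling_cost_comparison(1_000_000)  # assumed batch size
    print(result)
```

At the quoted prices the reduction is 90% regardless of batch size, since it depends only on the ratio of the two per-image costs.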

Zhang Peng, vice president of Kaiwang Data Product Project, said in February 2023 that data labeling is still dominated by manual work, with machine labeling as a supplement, and that 95% of data annotation is still done by hand. Large models can greatly improve the efficiency of this industry. Take Tesla as an example: its manual labeling team numbered more than 1,000 people in 2021, and the team laid off more than 200 employees in 2022.

In addition, in the large model era, third-party technology giants are expected to help automakers build their own autonomous driving algorithms and closed-loop data systems by providing a complete tool chain, while using the data generation capabilities of large models to narrow the gap in data. The "Android era" of autonomous driving may be on its way.

At present, large models are already being used to strengthen closed-loop data systems, simulation, perception algorithms, planning and control algorithms, and other areas. Giants such as Microsoft and Nvidia are racing to position themselves in both large models and autonomous driving, which may spark new developments.

The emergence of large models also promotes a division of labor in the industry and avoids "reinventing the wheel", while accelerating the iteration of sensors and chips, so system costs are expected to fall substantially. Large model developers and players across the autonomous driving industry chain are expected to benefit broadly.

Take Baidu Apollo as an example. It first pre-trains a base model on image and text information, uses algorithms to recognize, locate, and segment street-view image data, and feeds the results into an encoder to build a base library, that is, a street-view data pool in which pictures are paired with text information.

Second, specific scenarios (such as delivery vehicles, wheelchairs, and children) can be searched for and mined using text or image queries, and vehicle-side models can then be given customized training, which greatly improves how effectively existing data is used.
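The search step described above can be illustrated with a toy example. This is a rough sketch of embedding-based retrieval over a data pool, assuming that frames and text queries have already been encoded into a shared vector space. The frame IDs, the three-dimensional vectors, and the function names are all hypothetical, and the article does not say that Baidu uses cosine similarity.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search_pool(query_vec, pool, top_k=2):
    """Return the IDs of the top_k frames most similar to the query."""
    scored = [(cosine(query_vec, vec), frame_id)
              for frame_id, vec in pool.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [frame_id for _, frame_id in scored[:top_k]]


# Hypothetical data pool: frame ID -> embedding from the (assumed) encoder.
POOL = {
    "frame_001": [0.9, 0.1, 0.0],  # scene with a delivery vehicle
    "frame_002": [0.1, 0.9, 0.1],  # scene with a wheelchair
    "frame_003": [0.0, 0.2, 0.9],  # scene with a child
    "frame_004": [0.8, 0.2, 0.1],  # another delivery vehicle scene
}

# Hypothetical embedding of the text query "delivery vehicle".
QUERY_DELIVERY = [1.0, 0.0, 0.0]

if __name__ == "__main__":
    print(search_pool(QUERY_DELIVERY, POOL, top_k=2))
```

The frames returned by such a search would then form the targeted subset used for customized training of the vehicle-side model.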

Baidu uses semi-supervised methods to make full use of 2D and 3D data when training a large perception model. Small models are distilled from it at multiple stages to improve their performance, and training data for the small models is annotated automatically for customized training. This strengthens long-distance visual 3D perception and improves the results of multimodal perception models.
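For readers unfamiliar with distillation, the idea of a small model learning from a large one can be sketched as follows. This shows the generic temperature-softened distillation loss commonly used in the literature, not Baidu's actual training objective, which the article does not describe. The logits and the temperature value are made-up illustrative numbers.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax with temperature; higher temperature gives softer outputs."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence from the teacher's softened distribution to the student's.

    The result is scaled by temperature squared, a common convention that
    keeps gradient magnitudes comparable across temperatures.
    """
    p = softmax(teacher_logits, temperature)  # large model (teacher)
    q = softmax(student_logits, temperature)  # small model (student)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


if __name__ == "__main__":
    teacher = [4.0, 1.0, 0.5]   # hypothetical large-model logits
    student = [2.0, 1.5, 1.0]   # hypothetical small-model logits
    print(distillation_loss(teacher, student))
```

The loss is zero when the student reproduces the teacher's output distribution and grows as the two diverge, which is what pushes the small model toward the large model's behavior.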

Another leading player, SenseTime, has also stated publicly that AIGC can be used to generate realistic traffic scenarios and hard samples for training autonomous driving systems, with multimodal data fed into the large model to raise the ceiling on the system's perception of corner-case scenarios.

A multimodal large model for autonomous driving can also integrate perception and decision-making. On the output side, an environment decoder reconstructs the 3D environment to give a visual understanding of the surroundings, a behavior decoder generates a complete path plan, and a motivation decoder describes the reasoning process in natural language, making autonomous driving systems safer and more reliable.

Once large models deliver these functions, the barrier to entry for autonomous driving will keep falling. Leading companies can speed up their autonomous driving projects, more new players can join the field, and new tracks can open up that need path planning beyond road navigation, such as further optimizing the path planning of robot vacuum cleaners.

After the concentrated release of large models in February and March, the product development period in April and May, and the gradual clarification of policy direction, AI large model products and applications are expected to see another concentrated wave of releases starting in June, which has also directly led to price cuts for the OpenAI API.

For the foreseeable future, AI technology will keep iterating and applications will keep advancing. More and more technology companies are launching products to enter this track, which will continue to lift the industry and bring users GPT products better matched to market demand. Tencent, with its large user base, also released technical solutions in the large model field on June 19.

As these companies compete fiercely with one another, the industry moves into the fast lane, which also means that consumer (C-end) users will soon be able to use these products. As for who will pay, each vendor will have to rely on its own strengths. (This article was first published on the Titanium Media app.)