
Live from WAIC | As Large Models Take Off, How Can the Dilemma Between General and Vertical Be Resolved?

21st Century Business Herald reporter Zhang Zitong reported from Shanghai

From July 6 to 8, the 2023 World Artificial Intelligence Conference (WAIC) was held in Shanghai. As the centerpiece of this year's WAIC, large models became the focus of heated discussion among participating experts and enterprises.

During the conference, the 21st Century Business Herald reporter learned that two divergent routes have emerged in the industry. One is the general-purpose large model with massive parameter counts, represented by SenseTime's large AI device SenseCore and Baidu's "Wenxin Yiyan" (ERNIE Bot). The other path builds and applies large models for vertical industries on top of open-source models: the bottom layer is an open-source foundation model, while the upper layer targets a vertical industry, combining industry-specific data with general data.

"The problem the general large model solves is letting people who cannot build such models themselves enjoy their convenience; the vertical-industry large model should focus more on solving industry problems. The division of labor differs greatly," Liu Yidong, chief technology officer of Midu, told the 21st Century Business Herald in an interview. In his view, general large models suit head enterprises with very strong resources, time, and technical strength, while smaller enterprises are better suited to concentrating on a vertical field for model development.

Wang Xiaogang, co-founder, chief scientist, and president of the intelligent vehicle business group at SenseTime, told the 21st Century Business Herald that SenseTime's general large model, the "AI large device SenseCore," can be understood as infrastructure within the company: while building large models, each large model team also provides its models to the business lines serving various industries.

Distinguishing general from vertical

The recently released "China Artificial Intelligence Large Model Map Research Report" shows that China's large models are developing vigorously. According to incomplete statistics, 79 large models with more than 1 billion parameters have been released nationwide so far. In parallel, many industry models focus on vertical scenarios.

On the difference between the two, Liu Yidong said that general large models have massive data and parameter counts but do not focus on any particular task. When an industry model enters a vertical field, it carries specific task goals: in a proofreading task, for example, an industry model can draw on accumulated common-sense knowledge to detect problems such as easily confused words.
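As an illustration of how such a confusable-word check can work in principle (a hypothetical sketch with toy data, not Midu's actual system), a proofreader can compare a word against its commonly confused alternatives using simple context statistics:

```python
# Hypothetical confusion sets: each word maps to easily confused alternatives.
CONFUSION_SETS = {
    "their": ["there", "they're"],
    "affect": ["effect"],
}

# Toy bigram counts standing in for statistics a real system would learn.
BIGRAM_COUNTS = {
    ("over", "there"): 50,
    ("over", "their"): 2,
}

def flag_confusable(prev_word, word):
    """Suggest a replacement if a confusable fits the context better."""
    best, best_count = word, BIGRAM_COUNTS.get((prev_word, word), 0)
    for alt in CONFUSION_SETS.get(word, []):
        count = BIGRAM_COUNTS.get((prev_word, alt), 0)
        if count > best_count:
            best, best_count = alt, count
    return best if best != word else None

print(flag_confusable("over", "their"))  # suggests "there"
```

A production proofreader would replace the toy counts with a language model's context scores, but the structure of the check is the same.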

At the same time, if a model is trained only on vertical industry data, its cognition will be biased. This is why the general large model has become the "underlying base" of this era.

Against this backdrop, how to understand scenarios deeply enough to serve customers well has become the problem facing vertical large models.

Liu Yidong told reporters that in building vertical large models, model providers interact closely with industry customers. From a training perspective, large models need reinforcement learning from human feedback, and artificial intelligence enterprises need customer-feedback data to further improve their ability to build industry large models.

"Therefore, our service process is also a process of mutual advancement: our customers continuously give us feedback on, and help us optimize, the data quality of vertical industry training," Liu Yidong said.
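The feedback loop Liu Yidong describes can be sketched roughly as follows (an illustrative example with hypothetical field names, not Midu's actual pipeline): customer ratings of model outputs are collected, and only the highest-rated examples are retained as future fine-tuning data.

```python
def collect_finetune_data(interactions, min_rating=4):
    """Keep (prompt, output) pairs whose customer rating meets the threshold."""
    return [
        (item["prompt"], item["output"])
        for item in interactions
        if item["rating"] >= min_rating
    ]

# Hypothetical customer interactions with a vertical model.
interactions = [
    {"prompt": "Summarize contract A", "output": "...", "rating": 5},
    {"prompt": "Summarize contract B", "output": "...", "rating": 2},
]
print(len(collect_finetune_data(interactions)))  # 1
```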

Liu Yidong added that the core of future competition in the industry is the ability of different enterprises to deploy large models and turn a profit. "This involves the huge question of how a company operates, not just the model itself. The enterprises that survive in vertical large model applications will be those that can balance their input-output ratio and sustain a virtuous circle."

During the 2023 World Artificial Intelligence Conference, Midu released "honeynest," the first knowledge Q&A and content generation large language model that supports localized software and hardware operating environments. The reporter learned on site that the model was trained on hundreds of billions of items of high-quality Chinese multimodal data and can generate content customized to each text and each user.

Liu Jie, dean of the Institute of Artificial Intelligence at Harbin Institute of Technology and an IEEE Fellow, pointed out at the "Intelligence on the Cloud, Trusted Intelligent Computing" sub-forum hosted by China Electronics Cloud that, compared with vertical industry models focused on upper-layer applications, general large models undoubtedly demand greater investment in technology and funding.

"GPT-3's training data contains 500 billion data points. Its energy consumption is comparable to traveling from the earth to the moon and back; a single training run costs roughly 5 to 10 million US dollars and consumes 190 MWh. Roughly estimated, then, training a large model requires funding on the order of 100 million yuan. Without that money, don't play with large models."
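Taking the quoted figures at face value (and assuming a rough mid-2023 exchange rate of about 7 yuan per US dollar, which is not stated in the article), a back-of-envelope conversion shows how repeated training runs reach the 100-million-yuan scale:

```python
USD_TO_CNY = 7.0                  # assumed exchange rate, mid-2023
cost_per_run_usd = (5e6, 10e6)    # quoted single-run training cost range

# Convert the per-run cost range into yuan.
cost_per_run_cny = tuple(c * USD_TO_CNY for c in cost_per_run_usd)
print(cost_per_run_cny)  # (35000000.0, 70000000.0), i.e. 35-70 million yuan

# A couple of full runs plus preliminary experiments would therefore
# approach the "order of 100 million yuan" scale Liu Jie mentions.
```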

Against the backdrop of such "money-burning" training costs, how to give large models self-learning capabilities has become a hard problem for small and medium-sized enterprises entering the field.

Taking OpenAI as an example, Liu Jie said that in his view its core advantage lies in using reinforcement learning, having humans evaluate the quality of the generative model's outputs to cut costs and raise efficiency. "The most successful thing OpenAI has done is to make people contribute as little effort as possible: finding a way to measure the quality of human conversation so that an automatic closed loop forms and the so-called 'flywheel effect' can start turning."

In other words, users constantly feed adjustment signals into the large model as they use it; a system without enough users will therefore struggle to build an evaluation model.
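A minimal sketch of this preference-based "flywheel" (an illustrative toy, not OpenAI's actual implementation): human comparisons between pairs of answers are fitted into a Bradley-Terry-style quality score, after which the system can rank answers automatically.

```python
import math

def fit_scores(comparisons, n_items, lr=0.1, epochs=200):
    """Fit one scalar quality score per answer from pairwise human wins."""
    scores = [0.0] * n_items
    for _ in range(epochs):
        for winner, loser in comparisons:
            # Probability the current scores assign to the observed preference.
            p = 1.0 / (1.0 + math.exp(scores[loser] - scores[winner]))
            # Gradient step: push the winner's score up, the loser's down.
            scores[winner] += lr * (1.0 - p)
            scores[loser] -= lr * (1.0 - p)
    return scores

# Hypothetical comparisons: answer 0 is preferred most, answer 1 least.
scores = fit_scores([(0, 1), (0, 1), (0, 2), (2, 1)], n_items=3)
print(scores[0] > scores[2] > scores[1])  # True
```

Once enough users supply such comparisons, the fitted scores can stand in for human judges, which is the "automatic closed loop" Liu Jie describes.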

"The competition has now moved to models with tens of billions, hundreds of billions, even trillions of parameters, and the scale keeps growing with no end in sight." Liu Jie said that model parameter counts are still rising and the trend toward ever larger models continues.

Building public computing power

As the industry insiders above noted, with a single training run of a hundred-billion-parameter model costing tens of millions of dollars, small and medium-sized enterprises inevitably find large models prohibitively expensive. Because of the huge parameter and data volumes involved, supporting infrastructure is also becoming increasingly important.

The industry has therefore begun exploring, at the national level, the construction of inclusive public computing power centers: providing computing power as a public-infrastructure service, reducing unit variable costs, and serving society as a whole.

"We believe inclusive computing power is an important channel for solving the cost problem." Song Yu, vice president of CCID Consulting Co., Ltd., told the 21st Century Business Herald that current general-purpose large models increasingly struggle to exploit full computing performance, causing a certain waste of resources. At the same time, small and medium-sized enterprises find the cost of large-scale algorithms and high-quality data hard to bear if they build everything themselves. What users need is computing power, plus full-process services built on top of it.

"Therefore, I believe that in the future, the implementation of computing power will be realized in a 'computing power wind tunnel' mode." Song Yu said.

Recently, China Electronics Cloud, a central enterprise whose core business is network information, announced during the 2023 World Artificial Intelligence Conference that it will invest in building N trusted intelligent computing centers across China over the next 2 to 3 years to meet industry demand. It will also develop a trusted intelligent computing cloud platform that is heterogeneously compatible, secure and trusted, cloud-data integrated, and open and shared.

In addition, with the growing demand for high-density computing power in the future, the energy consumption of computing power centers is also facing increasing challenges.

"Data centers and intelligent computing centers are themselves major energy consumers, and in the long run, PUE (power usage effectiveness) controls on intelligent computing centers will become increasingly strict."

Liu Jie believes that, under increasingly strict energy-consumption standards, the industry should promote the greening of intelligent computing centers across the whole chain, including raising the proportion of green electricity used and applying new energy-saving technologies such as liquid cooling, so as to solve the problem of high energy consumption.
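The standard metric behind such energy controls is PUE (power usage effectiveness): total facility energy divided by IT equipment energy, with 1.0 as the ideal. A small sketch with hypothetical figures shows why cutting cooling overhead, for example with liquid cooling, lowers it:

```python
def pue(total_facility_kwh, it_equipment_kwh):
    """Power usage effectiveness: lower is better; overhead raises it."""
    return total_facility_kwh / it_equipment_kwh

# Hypothetical figures: liquid cooling sharply reduces cooling overhead.
air_cooled = pue(total_facility_kwh=1800, it_equipment_kwh=1000)
liquid_cooled = pue(total_facility_kwh=1150, it_equipment_kwh=1000)
print(air_cooled, liquid_cooled)  # 1.8 1.15
```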
