
The second half of the battle of large models: from general-purpose to industry verticals, taking root "downward"

Author: Blue Whale Finance

Text | First New Voice, Qiuping

Editor | OK too

Recently, First New Voice and Tianyancha officially released the "2023 China AIGC Innovative Enterprise Series List", which maps the generative-AI industrial chain across three layers: the infrastructure layer, the model layer, and the application layer. The model layer mainly comprises general large models and vertical large models (scenario/domain/industry models).

At present, only the deep-pocketed leaders can afford to play at the infrastructure layer, so it sits outside the fiercest involution. The application layer is the flower that blooms atop the large model: as the foundation of generative AI, large models give applications powerful language-processing capabilities and broad applicability. According to public information, 238 large models had been released in China as of October this year. The "war of a hundred models" is in full swing.

During list selection and research, First New Voice found that the battle over domestic large models is gradually entering its second half. The focus of leading technology companies has begun to shift from general models to vertical models for specific industries and fields; they are taking root "downward".

For example, on October 31, Alibaba Cloud not only released version 2.0 of its Tongyi Qianwen model but also launched eight industry models. On September 21, Huawei Cloud released the Pangu medical model, and on September 19, Baidu officially released Lingyi, which it calls China's first "industrial-grade" medical AI model. It can be said that after "AI for Science", large models have entered the stage of "AI for Industries".

To examine the development direction and application results of general vs. vertical models in depth, First New Voice interviewed three companies and, based on each company's practice, summarizes how the two kinds of models are evolving.

01 More than 200 large models in China, concentrated in three basic application scenarios

Since its debut at the beginning of the year, ChatGPT has ignited enthusiasm for large models at home and abroad, and capital from all quarters has poured in.

According to media reports, the number of pre-trained models on Hugging Face, the world's largest open-source community for large models, has grown from 100,000 to more than 300,000. Whether OpenAI expected things to heat up this much when it first released ChatGPT is anyone's guess.

Back in the domestic market: according to incomplete statistics from public information, more than 200 large models had been launched in China as of the end of November 2023, and they are taking root in all walks of life. The statistics show that, apart from general large models, adoption is fastest in the financial industry; nearly 15% of large models are financial vertical models.

In terms of vendor types, domestic internet technology companies have entered the game: large players such as Baidu, Alibaba, Tencent, and Huawei; AI specialists such as iFLYTEK, SenseTime, and Megvii; and large-model startups such as Zhipu Huazhang, Baichuan Intelligence, and Daguan Data. Vertical-industry enterprises in finance, automobiles, education, smart home, and consumer electronics have also launched large models, drawing on their AI technology and the data accumulated in their fields.

It is worth noting that in the first half of the year, attention focused mainly on parameter counts and model quality. In the second half, the focus shifted to putting large models into practice and to how companies can use their capabilities to deliver step-change efficiency. After half a year of practice, the three companies interviewed by First New Voice have each explored a development path with its own characteristics.

For example, the "Yuanxin" model launched by Wofone Technology in April this year absorbs the capabilities of a general model and adds industry knowledge training on top of eight years of experience in marketing and customer service, turning the general model into an industry expert that can build a dedicated knowledge base from enterprise information. Wofone Technology has applied this model across its four product lines: Udesk, GaussMind, ServiceGo, and Weifeng.

Zhao Chao, an AI algorithm expert at Wofone Technology, said: "Large models have a huge appetite for computing power and data, and Wofone Technology has accumulated a large amount of online text and voice data since its founding. Based on this data, the company plans to iterate models for specific industries and scenarios. To this end, the team takes an open-source industry model and optimizes it with the data accumulated in the customer-service industry, to better meet industry needs and improve performance in specific scenarios."

Full-parameter iteration of a large model can erode some of its existing skills and language abilities, so Wofone Technology adopted two training strategies. One is to freeze part of the parameters and iterate only on the rest. The other is to iterate on top of the general large model.
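The first strategy, updating only part of the parameters, can be sketched in a few lines. This is a toy illustration, not Wofone's actual code: in a real framework the same idea is expressed by marking layers as non-trainable (e.g. `requires_grad=False` in PyTorch), and all names below are made up.

```python
# Toy sketch of partial-parameter fine-tuning: parameters whose names are
# marked "frozen" keep their values; only the rest take a gradient step.

def sgd_step(params, grads, frozen, lr=0.1):
    """Apply one SGD update, skipping any parameter listed in `frozen`."""
    return {
        name: value if name in frozen else value - lr * grads[name]
        for name, value in params.items()
    }

# Illustrative "model": general-purpose layers plus a task-specific head.
params = {"embedding.w": 1.0, "encoder.w": 2.0, "head.w": 3.0}
grads  = {"embedding.w": 0.5, "encoder.w": 0.5, "head.w": 0.5}

# Freeze the general layers so their skills are preserved; tune only the head.
updated = sgd_step(params, grads, frozen={"embedding.w", "encoder.w"})
print(updated)  # only head.w moves: 3.0 - 0.1 * 0.5 = 2.95
```

Freezing the general layers is precisely what protects the model's existing skills during industry fine-tuning: the update simply cannot touch them.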

Cloudwalk officially launched its "Calm" large model in May. Its biggest feature is that Cloudwalk offers a multimodal series of large models and can tune industry models, helping customers deploy models matched to their industry scenarios at the best cost-performance. In July, Cloudwalk and Huawei jointly released an integrated training-and-inference solution for large models: built on Cloudwalk's large-model algorithms and tools, it lets users easily train, build, and manage their own large models.

On the buoyant domestic market and the company's plans for large models, Zhang Li, vice president of Cloudwalk Technology, told First New Voice: "In fact, the company built up technical reserves in large models two years ago. Because chips and computing power had not yet reached a sufficient level, large models could not realize their full effectiveness and efficiency. Last year, the performance of GPU chips, led by NVIDIA's, improved significantly, especially in parallel computing, which made industrial-scale training of large models feasible and drove this year's boom in the large-model industry and market."

The "Cao Zhi" large model launched by Daguan Data is among the first domestic GPT-style large language models dedicated to vertical industries. It is characterized by long-text handling, verticalization, and multilingualism, and is good at long-document writing, review, and translation.

"Daguan Data has always focused on the ToB field and has accumulated deep expertise in industries such as finance and manufacturing. Our landing route is to bring large models into our existing products to provide customers with more valuable services. For example, IDPS, Daguan's intelligent text-processing platform, used to focus on text extraction, which required labeling, training, tuning, and other complex steps to get results. With a large model, extraction can now be automatic and label-free, which significantly reduces delivery cost and genuinely helps enterprises cut costs and raise efficiency," said Ji Daqi, CTO of Daguan Data.

Through exchanges with the three interviewed companies and earlier research, First New Voice found that enterprises apply large models in three common basic scenarios. First, if an enterprise wants to use a large model to directly generate articles, pictures, designs, and so on, it can take GPT or another open-source large model and apply a little fine-tuning; the follow-up work is mainly front-end page design, without much model iteration.

Second, an enterprise may want the large model to reflect the enterprise's own attributes when providing services, for example when answering company-specific questions. Here it is difficult to quickly iterate a unique model for every enterprise, and because an enterprise's situation changes constantly, the model would need continual adjustment. Combining an enterprise knowledge base with the large model is therefore the feasible route.

Of course, some companies have confidentiality requirements and are reluctant to hand their knowledge base to an external model. In that case, deployment can be based on a model trained in-house. There are usually two approaches: one is to iterate on the enterprise's own model using the enterprise knowledge base; the other is to use RAG (Retrieval-Augmented Generation) to strengthen the large model's understanding and then combine it with the knowledge base. RAG's most direct advantage is that it lets the large model apply its own reasoning ability to understand enterprise private data and extend its Q&A capabilities.
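The RAG flow can be sketched in miniature: retrieve the passage from the private knowledge base most relevant to the question, then splice it into the prompt sent to the model. The knowledge base, word-overlap scoring, and prompt template below are all illustrative; production systems use embedding vectors and a vector database rather than word overlap.

```python
import re

def tokens(text):
    """Lowercase word set, punctuation stripped."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query, knowledge_base, top_k=1):
    """Rank passages by how many words they share with the query."""
    ranked = sorted(knowledge_base,
                    key=lambda p: len(tokens(query) & tokens(p)),
                    reverse=True)
    return ranked[:top_k]

def build_prompt(query, knowledge_base):
    """Splice the retrieved context into the prompt for the large model."""
    context = "\n".join(retrieve(query, knowledge_base))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

# Made-up private knowledge base entries.
kb = [
    "Refund requests must be filed within 30 days of purchase.",
    "Enterprise support is available on weekdays from 9 to 6.",
]
prompt = build_prompt("When must a refund request be filed?", kb)
print(prompt)
```

The point of the pattern: the model never needs to be retrained on the private data; it only reasons over whatever passages retrieval puts in front of it, which is why it suits confidentiality-sensitive deployments.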

Third, data analysis is a common scenario for some enterprises. Traditional report configuration is complex, and when there are many reports, finding a specific one is time-consuming. Through the large model's natural interaction mode, users can simply ask questions and get intelligent data queries. This way of analyzing data is intuitive and efficient: users quickly get the information they need, which greatly improves the user experience.
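The "ask a question, get a number" flow looks roughly like this. In a real product the large model would translate the question into a structured query (SQL, for instance); here a simple keyword rule stands in for the model so the sketch is runnable, and the data and rules are entirely made up.

```python
# Toy natural-language data query: a keyword rule plays the role the
# large model would play (mapping a question to a filter + aggregate).

sales = [
    {"region": "north", "revenue": 120},
    {"region": "south", "revenue": 80},
    {"region": "north", "revenue": 50},
]

def answer(question):
    """Filter rows whose region is mentioned in the question, then sum."""
    q = question.lower()
    rows = [r for r in sales if r["region"] in q] or sales
    total = sum(r["revenue"] for r in rows)
    return f"total revenue: {total}"

print(answer("What is the total revenue in the north region?"))  # total revenue: 170
print(answer("What is total revenue overall?"))                  # total revenue: 250
```

Swapping the keyword rule for an LLM-generated query is what turns this from a demo into the "intelligent data query" experience described above, with no change to the surrounding plumbing.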

02 General vs. vertical: each has its strengths, and they complement each other

Both general and vertical models have their own unique capabilities, and they complement each other.

A general large model's strong language understanding broadens the range of applications, while a vertical large model targets a specific industry or need and better meets real requirements for accuracy and depth. The two are not opposites but mutually supportive, developing in synergy. In the future they will coexist, and together they will be the key to empowering thousands of industries.

Ji Daqi agrees: "The general model and the vertical model target different problems. The general model needs stronger generalization, while the vertical model must maintain high accuracy in vertical-industry applications."

On the landing space for general and vertical models, he believes one core difference lies in customer needs: customers of different tiers and sizes have different requirements for large models. For example, consumer-facing customers and small or medium B-end enterprises have lower requirements for model quality but pay more attention to cost control, so they may choose a general large model to solve part of the problem and obtain above-average results at lower cost.

For some large B-end customers, however, better model performance has significant impact and value for their business, so they are willing to invest more. These customers may choose to train a vertical large model themselves or use a professional vertical large-model service like Daguan Data's to get better results. Their focus is not only cost but how to achieve the best business outcome.

Therefore, in applying large models, it is important to flexibly choose the model strategy suited to the specific business scenario.

Zhao Chao also noted that general large models are costly to iterate and demand substantial computing power, whereas vertical large models cost less to iterate and need less compute. Still, a vertical model is always rooted in a general one: it is usually built on a general large model and trained with methods such as supervised fine-tuning (SFT). Moreover, the stronger the general model's basic capabilities, the lower the tuning cost of the vertical model.

When validating algorithms and strategies, a vertical large model can be iterated and verified in a relatively short time, so enterprises usually validate and tune the vertical model first and then apply the lessons to the general model to improve its capabilities. Once the general model has improved, the industry model is iterated again. It is a spiral, cyclical process in which vertical and general models learn from and complement each other rather than excluding one another.

Zhang Li said that from the perspective of industry application, a general model is not a product but a capability. An enterprise that wants to buy this capability usually needs to meet three conditions. "First, sufficient financial reserves. Second, the industry data and know-how to build its own model. Third, the corresponding technical capability: understanding the underlying principles of large-model technology and how to train a model that meets its needs. This flexibility lets customers better leverage large-model technology for their domain-specific needs."

Zhang Li also emphasized that landing a large model cannot be a one-sided affair; it depends on both ends. The supply side must have accumulated experience landing models in vertical industries, and the demand side must think clearly about what problems it needs to solve and what goals it wants the large model to serve.

In Zhao Chao's view, however, customized models may carry higher value in vertical industries, for two main reasons. First, a vertical-industry model can better meet an enterprise's specific needs and create more business opportunities. Second, different large models bring significantly different costs, so an enterprise can optimize training on a large model, compressing one with billions of parameters into a vertical model with hundreds of millions.

"One possible solution is to annotate the data with a large model and then train a smaller model on it. This provides the enterprise with the strong results of a vertical model while lowering the hardware-resource threshold, easing the enterprise's cost burden to a degree. By tuning the size of the model parameters, specific industry needs can be met with higher economic efficiency in resource use. This strategy gives enterprises a more flexible and sustainable way to apply models," Zhao Chao said.
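The "annotate with a large model, train a small model" idea can be sketched end to end. In this toy, a rule-based function stands in for the large model's annotations (in practice you would call the real model), and the "small model" is nothing more than a word-vote classifier learned from those labels; every text and rule is made up.

```python
# Distillation-by-labeling sketch: the "teacher" (stand-in for a large
# model) labels raw text, and a tiny "student" learns from those labels.

unlabeled = [
    "the agent resolved my issue quickly, great service",
    "still waiting, nobody replies, terrible experience",
    "great support, quick and helpful answer",
    "terrible, my issue is still not resolved",
]

def teacher_label(text):
    """Stand-in for the large model's annotation: 1 = positive sentiment."""
    return 1 if "great" in text else 0

def train_student(texts):
    """'Train' the small model: count each word's positive/negative votes."""
    word_votes = {}
    for text in texts:
        label = teacher_label(text)  # the large model supplies the label
        for word in text.split():
            pos, neg = word_votes.get(word, (0, 0))
            word_votes[word] = (pos + label, neg + (1 - label))
    return word_votes

def student_predict(word_votes, text):
    """Classify by summing the learned votes for each word."""
    pos = sum(word_votes.get(w, (0, 0))[0] for w in text.split())
    neg = sum(word_votes.get(w, (0, 0))[1] for w in text.split())
    return 1 if pos >= neg else 0

votes = train_student(unlabeled)
print(student_predict(votes, "quick and helpful"))   # 1
print(student_predict(votes, "terrible, no reply"))  # 0
```

The economics match the quote above: the expensive model is only consulted once per training example, while the cheap student handles all inference traffic.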

In the future, giant companies such as Unilever, McDonald's, and Coca-Cola are likely to train their own large models. Zhao Chao believes that, although from the outside this looks like a private large model, one training method is in fact to train a complete model on the enterprise's own large volume of data. Another is a vector-database strategy: convert internal data into vectors, then process those vectors to obtain a smaller model used in conjunction with the larger one. This allows the model to be trained separately and at lower cost. "From the customer's side, the resulting model carries the enterprise's own characteristics, but technically its essence is the superposition of a large model and a small model."

He also believes that in actual applications this "large model + small model" approach may well become the mainstream way to land the technology, because frequently iterating the underlying model is difficult and compute-intensive. Unless the goal is technical research, buying massive computing power is likely to waste resources with no obvious return.

03 How to break through the three thresholds of computing power, data, and algorithms?

Applying large models is inseparable from computing power, data, and algorithms. This means small and medium-sized enterprises, or any enterprise short on computing power, face a high threshold.

First, on computing power: enterprises can try to increase the number of iterations and speed up model convergence without adding hardware cost. Computational complexity can also be reduced by converting floating-point numbers to fixed-point numbers and by preprocessing large-scale matrix operations. These methods save computing resources and improve training efficiency and overall model performance. Some breakthroughs have already been made in matrix computation; for example, academia has proposed fast methods for very large matrices that are dozens of times faster than traditional row-by-column computation.
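The floating-point-to-fixed-point idea mentioned above boils down to mapping floats onto small integers with a shared scale, computing in integers, and converting back. Real quantization schemes (per-channel scales, zero-points, calibration) are more involved; this minimal sketch only shows the principle, and the example weights are arbitrary.

```python
# Symmetric 8-bit quantization sketch: float -> int8 range and back.

def quantize(values, bits=8):
    """Map floats to integers in [-(2^(bits-1)-1), 2^(bits-1)-1]."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / (2 ** (bits - 1) - 1)  # e.g. map onto [-127, 127]
    return [round(v / scale) for v in values], scale

def dequantize(ints, scale):
    """Recover approximate floats from the integer representation."""
    return [i * scale for i in ints]

weights = [0.51, -0.32, 0.08, -0.91]
q, scale = quantize(weights)
restored = dequantize(q, scale)

# The round-trip error is bounded by the quantization step.
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(q, round(max_err, 4))
```

Each value now fits in one byte instead of four or eight, which is where the memory and compute savings come from, at the price of the small bounded error shown.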

On computing power, Zhao Chao's view is that, on the one hand, enterprises with limited compute can run experiments at small scale to verify how well a large model works; this is also an optimization direction being considered in both industry and academia. On the other hand, few-shot learning and zero-shot learning are popular large-model techniques that demonstrate strong learning and reasoning ability when data is scarce, letting data-poor enterprises apply large models effectively. With these two methods, continuous optimization and innovation can drive the wide application of large-model technology.
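Few-shot learning in the prompting sense needs no training at all: a handful of labeled examples are placed directly in the prompt so the model generalizes from them at inference time (zero-shot is the same prompt with no examples). The ticket-classification examples and template below are illustrative only.

```python
# Few-shot prompt construction: labeled examples go straight into the
# prompt instead of into a fine-tuning dataset.

def few_shot_prompt(examples, query):
    """Format (text, label) pairs as demonstrations, then append the query."""
    shots = "\n".join(f"Ticket: {t}\nLabel: {l}" for t, l in examples)
    return f"{shots}\nTicket: {query}\nLabel:"

examples = [
    ("I was charged twice this month", "billing"),
    ("The app crashes when I log in", "technical"),
]
prompt = few_shot_prompt(examples, "My invoice amount looks wrong")
print(prompt)
```

For an enterprise with little data, swapping two or three examples in and out of the prompt is far cheaper than any fine-tuning run, which is exactly the appeal described above.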

Second, on algorithms: structures and methods better suited to large models still need exploring. Most large models today are built on the Transformer architecture proposed by Google, but the Transformer is not necessarily the best choice. For example, some researchers have combined other structures such as ResNet (deep residual networks) with the Transformer and achieved good results in the image domain. Algorithmic innovation and optimization therefore remain a promising direction.

Third, on data: we need to consider how to improve data quality and applicability. With the explosion of data on the internet, data types and forms have become more diverse and complex. Unstructured data needs to be structured in advance so the model can learn from and understand it, and data must be cleaned and filtered to remove noise and useless information.

All of the above paths can effectively improve the validity and reliability of the data, thereby improving the generalization and adaptability of the model.

On the future of large models, Zhang Li's view is that development will shift from being R&D-driven to ecosystem-driven, an inevitable trend. Customers' needs will grow ever more complex; large-model vendors cannot directly solve every customer problem, nor master the know-how of every industry. Landing large models therefore needs the support of professional information-service companies in each industry.

"This cooperation model responds more effectively to the professional needs of different fields, so large-model applications can penetrate industrial chains faster and more deeply. Moreover, through close cooperation with information-technology companies, large-model vendors can build an ecosystem that makes the development of large models more comprehensive and sustainable," Zhang Li said.

04 Two major problems in the implementation of large models

Although the development of large models is currently very active, two major difficulties remain in actual implementation.

Difficulty 1: How to find the right application scenario?

Ji Daqi said that for large-model technology to truly land, one must rely not only on the model itself but also consider the intermediate implementation process and the last-mile path: designing a suitable product form, choosing the best cost-performance, controlling machine-resource costs, and finally achieving the best landing result. This requires professionals who understand both large models and the industry.

One of the main problems in ToB industrialization is that supervision is becoming harder; on the ToC side there are also regulatory requirements such as filing. In the traditional internet era it was relatively easy to review text content, and problematic material involving ideology could be found and handled promptly. Large models, however, make regulation significantly harder, so effective supervision during implementation has become an urgent problem; failure to solve it may lead to misuse, abuse, or other legal issues. While solving regulatory problems, we also need to think about how to let more people benefit from large models. In a word, balancing reasonable regulation with social benefit is a key issue the whole industry must seriously consider and resolve.

"After the customer provides the data, Daguan Data's engineering team processes it according to the specific situation, and this step actually goes quite smoothly. The harder problem is how to combine large models to fully unlock the value of the data and empower the enterprise toward clearer business goals. This requires a clear business strategy, defining the product's functionality and features, and ensuring the whole process truly meets the customer's needs," Ji Daqi emphasized.

Therefore, the challenge for all companies today is to think strategically about the application of large models and to translate these thoughts into concrete product design and implementation steps. Solving this challenge requires a combination of data science, business insights, and technical expertise to form a comprehensive and actionable solution. Ultimately, through deep strategic planning and clear product design, the potential of data and big models can be better leveraged to achieve more targeted and effective business outcomes.

Today the focus is not only on building great large models but on applying them better. That requires thinking at the solution level, especially the user-experience level, rather than being limited to chat applications like OpenAI's or to solving search-engine-style problems.

Current and future trends also indicate that people want to apply AI in more scenarios and use it as an underlying platform. This requires enterprises to innovate from 0 to 1 and keep finding scenarios that suit implementation and can scale, building up landing inspiration and methodology and strengthening everyone's confidence in the field. Many more large-model deployments can be expected next year.

Difficulty 2: It is difficult to keep strategic planning and software/hardware facilities perfectly aligned.

Zhang Li identified five factors behind this difficulty. First, the customer's goals are unclear, so the expected results cannot be achieved.

Second, many customers do not understand large models well enough and mistake them for a mature product that works out of the box once purchased.

Third, even when the first two problems are solved, a detailed implementation plan has been drawn up, and the large model is being rolled out in the customer's enterprise in stages, no one can guarantee over such a long period that the customer's strategic goals will not change. This concerns the stability and sustainability of the customer's strategic commitment to the large model.

Fourth, landing a large model must be a two-way process: the customer is the protagonist, and the technology company is the "coach", accompanying and guiding the customer forward. But because using large models demands strong technical capability, and many customers' technology departments have only traditional IT skills, customers end up relying entirely on the technology company, turning the "coach" into the protagonist and misplacing the relationship. This is a serious problem, because a technology company's goal is to empower many industries; it cannot focus on a single customer.

Fifth, applying large models in vertical markets involves not only model capability but also hardware configuration. Customers cannot simply replace their original hardware or overturn their existing systems; integration with those systems matters more. This demands engineering and integration capabilities to help customers reasonably combine large-model technology with existing resources, covering compatibility with the original systems, software, databases, and hardware.

Facing these problems, Ji Daqi's view is that people should reach consensus on two points. First, in the future only a few vendors may be able to provide high-quality underlying general large models, while vertical large models and their industrial applications will see abundant opportunity and competition; multiple large models may be combined to solve an enterprise's various problems. Second, the goal of an enterprise is to use AI to solve problems, not simply to bolt AI on. Companies should therefore start from how humans and machines can collaborate better to solve problems, not chase large models for their own sake.

Zhang Li holds the same position. She believes that when using large models to solve fundamental problems, the focus must be on effectively combining technology with industrialization. Large-model vendors should concentrate on building model-based applications and products that meet customers' actual needs, rather than pushing large models for their own sake. If one large model proves inadequate, Cloudwalk can switch to another, even an open-source one; the goal is always to solve the customer's real problems together.

"In the past, many applications did not serve users as well as they could have. Introducing large models can make them better: better at understanding user needs, and more automated. Rather than disrupting today's applications, the company is adding the power of large models to them, reducing costs and improving training efficiency through cloudification, and industrializing the technology quickly so more customers can enjoy the advantages of large models at a more reasonable cost," Zhang Li added.

In the process of AI implementation, large models should be human partners, not substitutes.

Proofreading/Tina

Curated by Eason
