Tencent Cloud fires the starting gun for "large model applications"

Written by | Wu Kun Proverbs Wang Pan

Edit | Wu Xianzhi

Tencent, which has said little in the large model race, has sent Tencent Cloud to hand in its own answer sheet to the industry.

On June 19, 2023, Tencent Cloud held the Industry Large Model and Intelligent Application Technology Summit, where it announced its MaaS (Model-as-a-Service) technology solutions for enterprise (B-side) customers, along with upgrades to a number of intelligent SaaS applications and progress on deployments with industry customers.


Most notably, Tencent Cloud did not choose a general-purpose large model with broader knowledge. Instead it chose industry large models, that is, vertical-domain large models pre-trained on Tencent Cloud's AI technology base. It is understood that Tencent Cloud has so far released models for vertical fields including finance, government, cultural tourism, media, and education. Enterprise customers only need to take these pre-trained models as a base and train them further on industry data to obtain an exclusive large model with business capabilities.

In other words, the capabilities demonstrated by Tencent Cloud at the conference do not lie in the model itself, but in the deployment and application of large models.

In fact, well before large models exploded this year, promoting vertical applications of models had already become a consensus among players, and universities and industry had accumulated many small and medium-sized models in vertical fields. However, without the corresponding technical foundation, the execution efficiency, security, and interpretability of these models were unsatisfactory, the marginal cost of training and deploying them was hard to reduce, and the room left for innovation was too small to support commercialization.

Today, thanks to the emergent capabilities of models discovered by OpenAI and the training strategies derived from them, general-purpose model capabilities have advanced rapidly, laying the foundation for rapid training and deployment of vertical models. Building on this experience, Tencent Cloud is taking the first step in exploring large model deployment and market education.

Industry large models arrive at the right time

The talent training model commonly adopted in modern education is the T-shaped model. We tend to spend a long period on general studies, and only after laying a solid foundation and developing some capacity for independent thinking do we enter higher education and delve into a particular field. The same seems to be true of large models, the culmination of AI: in the reciprocating cycle of T-shaped development they are now entering the vertical phase, and the starting gun was fired first by Tencent Cloud.

Tang Daosheng, Senior Executive Vice President of Tencent Group and CEO of the Cloud and Smart Industry Business Group, said at the Technology Summit that, compared with a general model, what enterprises need are large models targeted at specific industries, trained and fine-tuned with their own data to create more practical intelligent services. Enterprises have high requirements for professional services and low tolerance for faults, so the large models they use must be controllable, traceable, and correctable, and must be thoroughly and repeatedly tested.

As we all know, general large models generate plenty of buzz, but their abilities are too generalized, and general knowledge alone can hardly solve real business problems. As a result, training and tuning a general large model is more about exploring the boundaries of what large models can output than about deployment, with the aim of unlocking the next tier of value.

In contrast, the most obvious advantage of an industry large model is its alignment within a professional field, that is, the degree to which the model's output corresponds to the user's goals and interests.

A typical example is AGI. Rohin Shah, a researcher at DeepMind, Google's frontier AI unit, believes that incorrect fine-tuning or incorrect generalization may lead an AGI to pursue the wrong goal, and that a misaligned AGI would have disastrous consequences. In other words, even a general-purpose large model needs its external output and internal fine-tuning aligned with universal human values, let alone models for vertical professional fields with strong deployment needs.

Even in code writing, a high-demand, high-value vertical field where general-purpose large models are commonly used, the capabilities that general-purpose models currently demonstrate leave much to be desired. The countless programmers who have tested code with large models already open for testing should understand this well.

What's more, an industry large model targets tasks in a specific domain and does not need broad generalization. It therefore does not need a huge general model as the base for further training, nor does it need to pile on reasoning and memory capacity in pursuit of "miracles".

Wu Yunsheng, Vice President of Tencent Cloud, Head of Tencent Cloud Intelligence and Head of Youtu Lab, mentioned in a conversation with us that although large models in the industry keep growing in scale, Tencent Cloud cares more about how to solve customer problems with the most effective and lowest-cost means. By comparison, the number of parameters is actually not that important. The larger the parameter count, the more computing power and data a model needs before it can show its value and solve problems, and the resulting training and inference costs are hard to bear for both Tencent Cloud and enterprise customers.

The value of technology lies in its deployment. Technology that cannot be put into practice is just a castle in the air, waiting for its illusion of value to be punctured by the industry's verdict. The metaverse, blockchain, and other hot trends that preceded large models are already the best examples.

The industry model is clearly the more suitable solution for both Tencent Cloud and enterprise customers. The question that then needs closer examination is Tencent Cloud's solution itself: can an industry large model trained on industry data show a stronger understanding of the industry than a general large model with the same number of parameters?

The technical base determines the upper and lower limits

In the era of large models, the demand for AI such as natural language processing, computer vision, and speech recognition has exploded, but these applications based on large models usually require huge computing resources and storage space, as well as efficient deployment and management mechanisms. In order to avoid the waste of resources under the general needs of enterprises, Model as a Service (MaaS) emerged as an emerging technology service model.

In other words, the core capability of MaaS services is actually to help enterprises complete the leap from "manual workshop" to "factory mode" of AI applications.

Tencent Cloud's solution, shown in the panorama below, provides enterprise customers with a tool chain that covers model pre-training, fine-tuning, and application development in the form of a "one-stop shop for industry large models", helping enterprises create and deploy AI applications with high efficiency, high quality, and low cost. What determines vertical capabilities is the infrastructure on which models are deployed, that is, the technical base.

The first is intelligent computing power: the high-performance computing cluster HCC, released by Tencent Cloud in April, triples computing performance compared with the previous generation and provides high-performance, high-bandwidth, low-latency computing support for large model training. Together with the self-developed Xingmai high-performance computing network, it gives the HCC cluster 3.2 Tbps of communication bandwidth, the highest in the industry. The second is data retrieval: the vector database Tencent Cloud Vector DB, validated in Tencent's massive business scenarios, handles an average of 100 billion vector retrievals per day while providing high throughput, low latency, low cost, high availability, and elastic scalability, and it supports mixed scalar + vector retrieval.
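To make "mixed scalar + vector retrieval" concrete, here is a minimal sketch in plain Python. It does not use the Tencent Cloud Vector DB API; the records, the `industry` field, and the `hybrid_search` function are hypothetical and only illustrate the general idea of filtering on a scalar field first and then ranking the survivors by vector similarity.

```python
import math

# Hypothetical toy records: a scalar field ("industry") plus an embedding vector.
DOCS = [
    {"id": "a", "industry": "finance", "vec": [1.0, 0.0]},
    {"id": "b", "industry": "finance", "vec": [0.6, 0.8]},
    {"id": "c", "industry": "tourism", "vec": [1.0, 0.0]},
]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def hybrid_search(docs, query_vec, industry, top_k=2):
    """Scalar filter first, then rank the remaining records by vector similarity."""
    candidates = [d for d in docs if d["industry"] == industry]
    ranked = sorted(candidates, key=lambda d: cosine(d["vec"], query_vec), reverse=True)
    return [d["id"] for d in ranked[:top_k]]


print(hybrid_search(DOCS, [1.0, 0.0], "finance"))
```

A production vector database would replace the linear scan with an approximate nearest-neighbour index; the sketch only shows how the two kinds of condition combine in one query.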

If infrastructure sets the upper limit for the exclusive models that enterprises build, the industry large model sets the lower limit. According to Tang Daosheng, Tencent Cloud's curated store of industry models covers 10 major industries such as finance, cultural tourism, government affairs, media, and education, and provides more than 50 solutions for scenarios such as intelligent customer service, OCR recognition, government affairs consulting, education consulting, and media management.

Take the cultural tourism scenario as an example. The traditional intelligent customer service of one of Tencent Cloud's leading cultural tourism customers required dialogues to be configured manually. Knowledge maintenance was heavy and time-consuming, operations staff were limited and costly to allocate, and complex business scenarios such as orders were involved, so the business loop went unclosed for a long time. Using in-house data accumulated over many years together with Tencent Cloud's cultural tourism industry large model, the customer fine-tuned and built an exclusive cultural tourism customer service model on the Tencent Cloud TI platform. Compared with traditional customer service, it adds intent recognition, long text recognition, and answer generation, allowing the enterprise to solve business problems end to end without manually configuring the dialogue flow.

Beyond cost control and application optimization for customers, Tencent Cloud's path of taking an industry model as the base and fine-tuning it with industry data can revive the dormant industry data held by many industries and enterprises in today's economy, so that the value of that data can be realized.

In the past, in-house data in many industries was regarded as an asset because of basic big-data applications such as user profiling, but that value dividend has gradually been exhausted. Today, industry data can serve as nourishment for enterprise-specific models, which not only returns data assets to the high ground of value but also opens up new possibilities for the industry.

It is worth mentioning that using large amounts of data in model training also breaks the stereotype about fine-tuning: it is no longer limited to small datasets and can draw on larger and deeper industry datasets. Since data handling is the most important "craft work" in large models, Tencent Cloud's solutions naturally include data-related offerings.

Data is still the key to implementation

The middle-tier TI platform of Tencent Cloud's MaaS solution focuses on data as a key element, covering the entire process of data annotation, training, and application.

In terms of data labeling, a high-quality labeled dataset has a greater impact on large model training than sheer data size. Beyond the still-unrefuted OpenAI legend in the industry, an internal researcher's article that leaked from Google in May also supports this.

Noise in Chinese-language Internet content, such as advertisements, makes data cleaning even harder, and the requirements on data services for Chinese model training are rising accordingly. In Tencent Cloud's view, the high-quality datasets required for large model training and optimization must be cleaned and preprocessed to eliminate noise, fill in missing values, and ensure data quality. If the imported data is of low quality, the trained model will have problems too, that is, "garbage in, garbage out".
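The cleaning steps named above (removing noise, filling in missing values) can be sketched as follows. This is an illustrative example only, not Tencent Cloud's pipeline; the ad-marker pattern, the record layout, and the `clean_records` function are all assumptions made for the sketch.

```python
import re

# Hypothetical ad markers; a real pipeline would use far richer rules or a classifier.
AD_PATTERN = re.compile(r"(click here|buy now|limited offer)", re.IGNORECASE)


def clean_records(records, default_label="unknown"):
    """Drop empty, ad-like, and duplicate texts; fill in missing labels."""
    seen = set()
    cleaned = []
    for rec in records:
        # Normalize whitespace; treat a missing or None text as empty.
        text = re.sub(r"\s+", " ", (rec.get("text") or "").strip())
        if not text or AD_PATTERN.search(text):
            continue  # noise: empty or advertisement-like
        if text in seen:
            continue  # exact duplicate after normalization
        seen.add(text)
        # Fill a missing label with a default instead of discarding the sample.
        cleaned.append({"text": text, "label": rec.get("label") or default_label})
    return cleaned


raw = [
    {"text": "  Loan   rates explained ", "label": "finance"},
    {"text": "Buy now! Limited offer", "label": "ad"},
    {"text": "Loan rates explained", "label": None},
    {"text": "Visa policy update"},
    {"text": None},
]
print(clean_records(raw))
```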

Training on and applying the data are equally important. Building a vertical domain model is not simply a matter of feeding industry data to a general model. It is "craft work" involving multiple techniques and skills. Take the setting of the epoch count (one epoch is one complete pass of the training samples through the neural network): if it is set too high, the model suffers catastrophic forgetting and loses its existing capabilities; if it is set too low, the model may learn no new knowledge at all, and the run is wasted.
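One common way to reason about this trade-off is to track two held-out losses per epoch, one on industry data and one on general data, and stop before the general loss drifts too far from the base model. The sketch below assumes the loss histories were already recorded; the `pick_epochs` function, the tolerance, and all numbers are hypothetical, and this is not presented as Tencent Cloud's method.

```python
def pick_epochs(domain_loss, general_loss, base_general_loss, max_general_increase=0.05):
    """Return the 1-based epoch count with the lowest domain loss, considering only
    epochs before the general-data loss rises more than `max_general_increase`
    above the base model's loss. Returns None if even epoch 1 exceeds the budget."""
    if len(domain_loss) != len(general_loss) or not domain_loss:
        raise ValueError("need two equally long, non-empty loss histories")
    best_epoch, best_loss = None, float("inf")
    for epoch, (d, g) in enumerate(zip(domain_loss, general_loss), start=1):
        if g - base_general_loss > max_general_increase:
            break  # general ability is degrading: treat as the onset of forgetting
        if d < best_loss:
            best_epoch, best_loss = epoch, d
    return best_epoch


# Hypothetical histories: domain loss keeps falling, general loss jumps at epoch 4.
domain = [2.0, 1.4, 1.1, 0.9, 0.8]
general = [1.00, 1.02, 1.04, 1.20, 1.50]
print(pick_epochs(domain, general, base_general_loss=1.00))
```

With these made-up numbers, training beyond the third epoch would keep improving the industry loss while paying for it in general capability, which is the "too high" failure mode described above.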

These three services are reflected in Tencent Cloud's industry model cases, typically the multi-scenario intelligent customer service systems trained together with partners in different industries. After annotation, data from industries such as finance, government, and education can be used to fine-tune the large model of the corresponding industry, yielding the model capabilities the business requires.

For example, in financial customer service scenarios, where knowledge maintenance is heavy, cold-start knowledge configuration takes a long time, and operations require continuous investment, one of the country's first joint-stock commercial banks used its accumulated industry data to build an exclusive large model and deployed it privately.

As for the acceleration component beneath the middle-layer data platform, its role is to improve the efficiency of model training and fine-tuning. It is understood that, on top of traditional CV and NLP algorithm models, the Taiji acceleration components speed up model building through asynchronous scheduling optimization, GPU memory optimization, computing optimization, and so on, with performance more than 30% higher than common industry solutions.

However, when models are deployed, questions of model ownership and data security remain hard to avoid.

Take the cloud computing industry, where Tencent Cloud operates, as an example. Cloud deployment was an industry-wide problem long before the rise of the MaaS model. Giant enterprises can spend heavily on private deployment, while small and medium-sized enterprises can only use public cloud hosting. Even so, giant enterprises still face high migration costs after private deployment.

The deployment and ownership of large models, and the more basic matter of data security, are naturally top priorities for enterprise customers. A typical example is ChatGPT's API access mode, which has led to data leaks at many enterprises, and many giant enterprises led by Samsung and Apple disabled ChatGPT API capabilities early on.

Therefore, the more valuable service Tencent Cloud offers beyond the panorama is flexible deployment, including private deployment, public cloud hosting, and hybrid cloud deployment according to customer needs. Issues such as the use of computing power and the ownership of a model's intellectual property are also handled case by case. This lets enterprise customers enjoy high-quality data and model services while ensuring that private data stays in-house, truly achieving "tailor-made and inclusive application".

Epilogue

"For the industrial revolution, bringing out the light bulb a month earlier matters little over a long time span. The key is to do solid work on the underlying algorithms, computing power, and data."

It has been a month since Ma Huateng responded, at the 2023 Q1 earnings conference, to Tencent's "loss of voice" in the large model race. Tencent Cloud has now sounded the "first voice" for industry large models, which reflects a turn toward commercialization among Internet giants in the large model race. At a time when the media boom has gradually passed and the development of general large models has entered a stable period, deployment is the capability the industry should provide to those who need models.

So, are vertical and generic diametrically opposed paths?

In the realm of large models, the answer is of course no, because vertical models are trained in basically the same way as generic models, even at the code level.

The particular nature of large model "alchemy" means that two paths can coexist in the large model race: typically, exploring the boundaries of artificial intelligence at the 100B-parameter scale, and rapid deployment and application at the 7B-parameter scale. The two lead the same way and both ultimately aim at deployment, but for now the 7B path is ahead of the general-purpose one. What is certain is that this "walking on two legs" will continue in the large model race. With its MaaS solution, Tencent Cloud has become the first player in the industry to step out successfully with the right foot.