
From "sky-high prices" to "rock-bottom prices", large models are about to change

Source: Titanium Media APP
Text | Light Cone Intelligence; Author | Yisi; Editor | Wang Yisu

Ten years later, domestic cloud vendors are fighting again!

Over the past month, domestic cloud vendors have set off a new round of price cuts on large models. The AI competition is no longer just a technology arms race; the leading vendors are also thinking about how to make money.

Major vendors such as Volcano Engine and Alibaba, along with star startups such as Zhipu AI and Face Wall Intelligence, are all involved in this price war. Each appears simply to be cutting prices, but in reality they are racing to undercut one another so as to seize the market quickly and accelerate commercialization.

From competing on model technology to competing on price

One might expect the leading players to have the most confidence and strength to cut prices: big companies have more lines of business, so even if the AI business loses money, other businesses can cover the losses. Yet the price war over large models was started by a star startup, Zhipu AI.

On May 11, Zhipu AI's open platform launched a new pricing system. The free token allowance for newly registered users was raised from 5 million to 25 million tokens, and the call price of the personal edition of the GLM-3 Turbo model was cut from 5 yuan per million tokens to 1 yuan per million tokens, a reduction to one-fifth of the original price. Nor did it stop there: the GLM-3 Turbo Batch API was discounted a further 50%, priced at 1 yuan per 2 million tokens.

Four days later, Volcano Engine struck a heavy blow, announcing that the Doubao Pro 32k model would be priced at 0.0008 yuan per thousand tokens, 99.3% below the industry average, and the Doubao Pro 128k model at 0.005 yuan per thousand tokens, a reduction of 95.8% compared with comparable industry models.

As soon as the news broke, the entire AI circle stirred. Some said Volcano Engine had brought large models into the "fraction-of-a-cent era".

Let's do the math, taking mainstream 32k-window models as an example. One yuan buys about 2,400 tokens from GPT; a domestic model gets you roughly 8,000-plus tokens; self-hosting open-source Llama yields about 30,000 tokens. But with the Doubao model, 1 yuan buys 1.25 million tokens. What does that mean? Many people have read "Romance of the Three Kingdoms", which runs about 750,000 characters; 1 yuan covers roughly the text of three copies of the novel.
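The comparison above is easy to verify with a quick sketch. The per-yuan figures for GPT, domestic models, and self-hosted Llama are quoted from the article as given; only the Doubao count is derived, from its listed price of 0.0008 yuan per thousand tokens:

```python
# Tokens obtainable for 1 yuan, using the figures cited in the article.
DOUBAO_PRICE_YUAN_PER_1K = 0.0008  # Doubao Pro 32k list price

tokens_per_yuan = {
    "GPT (32k)": 2_400,                       # quoted in the article
    "typical domestic model": 8_000,          # quoted in the article
    "self-hosted open-source Llama": 30_000,  # quoted in the article
    # Derived: 1 yuan / (price per 1,000 tokens) * 1,000 tokens
    "Doubao Pro 32k": round(1000 / DOUBAO_PRICE_YUAN_PER_1K),
}

for name, tokens in tokens_per_yuan.items():
    print(f"{name}: {tokens:,} tokens per yuan")
# Doubao Pro 32k line prints: 1,250,000 tokens per yuan
```

At 1.25 million tokens per yuan, Doubao is roughly 500 times cheaper than the GPT figure quoted above, which is where the "three copies of the novel for 1 yuan" comparison comes from.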

Compared with other large models, Volcano Engine's pricing is dirt cheap, almost as good as free. It is fair to say this round of cuts pushed the large-model price war to a climax.

Two days after Volcano Engine's move, Tang Daosheng, Senior Executive Vice President of Tencent Group and CEO of the Cloud and Smart Industries Business Group, presented the Hunyuan model's technology, performance, security, and low barrier to use at the Tencent Cloud Generative AI Industry Application Summit, though no price cut was announced on stage. According to Tencent Cloud's official website, however, first-time users of the Hunyuan text generation models get a free trial quota of 100,000 tokens, valid for one year. For token resource packages, the inference input prices of the 32K-window hunyuan-standard and hunyuan-pro models are 0.0069 yuan and 0.069 yuan per 1,000 tokens respectively, both at 69% of the published list price.

That is significantly lower than before. According to a billing note updated on Tencent Cloud's official website on May 14, the inference input prices of the former standard edition (the predecessor of hunyuan-standard) and advanced edition (the predecessor of hunyuan-pro) were 0.012 yuan and 0.12 yuan per 1,000 tokens respectively.

Alibaba Cloud's price-cut strategy is less conspicuous among the major vendors, but as early as February 29 this year it sent a price-reduction signal with what was billed as the largest price cut in Alibaba Cloud's history, covering more than 100 products and more than 500 product specifications. The largest reductions were up to 36% for Elastic Compute Service (ECS), up to 55% for Object Storage Service (OSS), and up to 40% for ApsaraDB RDS.

It is easy to see from Alibaba Cloud's price-cut list that the reductions centered on traditional cloud computing products and did not touch large-model-related offerings such as GPU computing power for AI training and inference. But cloud and large models develop hand in hand, so it is reasonable to infer that Alibaba Cloud, and more large-model vendors besides, will likely join this price war in the future, each at its own pace.

As for OpenAI, price reduction seems to have been a running theme over the past year: it has cut prices four times since last year. At its just-concluded spring event, OpenAI announced its latest model, GPT-4o, which not only improves performance significantly but is also 50% cheaper.

The curtain has risen on the large-model price war.

What is the price war really about?

At present, the price war of domestic large models is in full swing.

Some inevitably ask: large models are a high-input, low-output industry, so why fight a price war?

The answer is simple: accelerate commercialization.

Since the second half of last year, the field has shifted from competing on large-model technology to competing on large-model applications. In 2024, commercialization and real-world deployment will become the main theme for large-model companies.

According to the latest "Monitoring Report on China's Large Model Winning Bid Projects", from January to April this year the countable value of large-model-related winning bids already reached about 77% of the total disclosed for all of 2023, spanning industries such as government affairs, finance, telecom operators, energy, education, and transportation. Enterprise demand for large-model applications is clearly growing fast.

Why is there such an eagerness for commercialization?

There are two reasons. First, the cost of large-model R&D keeps rising. As is well known, computing power has always constrained the development of domestic large models. The United States holds nearly 90% of the global market for computing technology, close to a monopoly; by contrast, China's computing power shortage is growing ever more severe.

According to relevant data, China's annual demand for computing power is about 150 million servers, while supply is only about 30 million servers per year, a gap of 120 million units that accounts for 80% of the global shortfall. Against this backdrop, computing power leasing services have boomed; companies such as Bingji Technology, Zhongbei Communication, and Winner Technology have announced sharp increases in computing service fees, further raising the cost of large-model R&D.

Under huge cost pressure, large-model vendors have no choice but to hurry to find a path to commercialization.

Second, the technology race has largely run its course. General models can generalize, but they cannot solve the concrete problems of specific industries and scenarios, and a technology only truly succeeds when it is commercialized at scale. Clearly, after a year of technical competition, the field has entered the verification stage.

In order to speed up the commercialization of large models, domestic enterprises have taken action. At present, there are roughly two paths for the commercialization of large models: one is API call, and the other is privatization deployment.

API calls are the most common route to deployment. GLM-4, Zhipu AI's fourth-generation foundation model, is still priced at 0.1 yuan per thousand tokens, i.e. 100 yuan per million tokens, which looks modest per call; by comparison, qwen-72b-chat costs 0.02 yuan per thousand tokens, and OpenAI's GPT-4 Turbo charges US$10 for input and US$30 for output per million tokens.

In the short term the cost looks low, but for users and industries with heavy demand it is far from friendly.
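To see why, consider a back-of-envelope estimate for a heavy caller. The 50-million-token daily workload below is a hypothetical assumption for illustration, not a figure from the article; only the 0.1 yuan per thousand tokens rate is quoted above:

```python
# Monthly API bill at GLM-4's quoted rate, for an assumed heavy workload.
RATE_YUAN_PER_1K_TOKENS = 0.1   # GLM-4 call price quoted in the article
DAILY_TOKENS = 50_000_000       # hypothetical app traffic (assumption)
DAYS_PER_MONTH = 30

daily_cost = DAILY_TOKENS / 1000 * RATE_YUAN_PER_1K_TOKENS  # 5,000 yuan/day
monthly_cost = daily_cost * DAYS_PER_MONTH                  # 150,000 yuan/month
print(f"monthly bill: {monthly_cost:,.0f} yuan")
# prints: monthly bill: 150,000 yuan
```

A seemingly tiny per-thousand-token price thus compounds into six figures a month at scale, which is why per-token price cuts matter so much to high-volume users.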

Private deployment is even more expensive. To date, no domestic vendor has disclosed the specific cost of training a large model, but by many industry insiders' accounts, R&D costs are far higher than imagined, often in the tens of millions of yuan and sometimes counted in the hundreds of millions.

Tian Qi, head of Huawei's large-model effort, once mentioned that developing and training a large model costs about US$12 million, showing that training is expensive even for the most technologically advanced companies.

Wang Xiaochuan, founder and CEO of Baichuan Intelligence, has likewise said that every 100 million parameters corresponds to a training cost of 15,000 to 30,000 yuan, so a single training run of a 100-billion-parameter model is estimated at 30 to 50 million yuan. Anthropic CEO Dario Amodei has predicted that model training costs will reach US$10 billion within the next two years.

Clearly, the high price of API calls and of R&D has become a shackle on AI commercialization. If this persists, large models become a game for the rich, which is hardly conducive to commercialization at scale.

A price war is thus the most direct and fastest route to adoption. But not every company can join in: only those that drive large-model R&D costs as low as possible have the room and the capital to cut prices.

As noted above, the biggest R&D cost of large models is computing power, so many vendors cut costs by improving training efficiency and reducing inference costs.

Zheng Weimin, an academician of the Chinese Academy of Engineering, once did the math: in large-model training, 70% of the cost goes to computing power; in inference, computing power accounts for 95% of the cost. It is self-evident why vendors focus on inference.

For example, at Microsoft Build 2020, Microsoft unveiled the AI supercomputer behind GPT-3, which it said could make large-model training 16 times more efficient than other platforms, reducing time and risk costs.

Domestic large models are no exception. As early as version 2.0, the Pangu model tried a sparse-plus-dense architecture to reduce training cost. Within a month of launch, Wenxin Yiyan improved its inference performance nearly 10-fold through engineering, cutting inference cost to one-tenth of the original.

Alibaba Cloud's Tongyi model focuses on scaling laws, using the data distributions, rules, and ratios observed in small models to study how to improve capability at large parameter scales; optimizations to the underlying Lingjun cluster raised training efficiency by 30% and training stability by 15%.

Tencent chose a different path from Baidu and Alibaba, iteratively upgrading its machine learning framework Angel along with its training and inference frameworks. Angel can raise large-model training efficiency to 2.6 times that of mainstream open-source frameworks, and training hundred-billion-parameter models on it can save 50% of computing costs.

On the training side, Tencent's self-developed machine learning training framework AngelPTM accelerates and optimizes the full pipeline of pre-training, fine-tuning, and reinforcement learning, allowing larger models to be trained faster with fewer resources. On the inference side, Tencent launched the large-model inference framework AngelHCF, which achieves faster inference at lower cost by expanding parallelism; its inference speed is 1.3 times that of mainstream frameworks in the industry.

Racing to commercialize large models, cloud vendors draw their swords

On observation, the commercialization paths of Alibaba, Tencent, and ByteDance are basically the same: iterate on general model capability, build a complete ecosystem, and develop innovative AI products, though each has its own emphasis.

Continuous iteration of large model capabilities is a prerequisite for the commercialization of large models.

Since last year, the major domestic vendors have kept iterating their models. Baidu moved first, launching Wenxin Yiyan in March last year; the Wenxin model has since iterated to version 4.0, with a number of lightweight large language models launched as well. Alibaba followed, launching Tongyi in April last year; after versions 2.0 and 2.1, it has now reached 2.5.

Tencent was the last of the BAT trio; Hunyuan debuted in September last year. Rather than touting new versions as the other two did, Tencent then demonstrated usefulness through technical capability: the upgraded machine learning framework Angel, the training framework AngelPTM, and the inference framework AngelHCF. Tencent also recently open-sourced its text-to-image model, with upgrades to text-to-video capability to follow, pushing large models into thousands of industries through this series of moves.

ByteDance is the most distinctive of the group. It took only one year to evolve the Doubao model from 1.0 to 3.0, and the Doubao model family released by Volcano Engine this year includes not only the two general models, Pro and Lite, but also seven function-specific models covering role play, speech recognition, speech synthesis, voice cloning, text-to-image generation, and more. This signals that Volcano Engine intends to penetrate different industries and scenarios going forward.

As is well known, call volume directly affects model quality. On this front, the Wenxin model now handles 200 million calls per day, the Tongyi model has also passed 100 million, and ByteDance's Doubao model processes 120 billion tokens per day (about 180 billion Chinese characters).

Strong ecology is an accelerator for the commercialization of large models.

On ecosystem building, the major vendors, including Baidu, Alibaba, and ByteDance's Volcano Engine, have taken the same path: building large-model platforms that offer both their own model services and third-party open-source models, letting customers call whichever they need. Examples include Baidu Intelligent Cloud's Qianfan platform, Alibaba's Bailian platform, Tencent Yuanqi, and Volcano Engine's Ark platform.

To accelerate large-model penetration and expand the commercial market, Alibaba has committed to open source. Since last August, Alibaba Cloud has open-sourced eight large language models with parameter counts ranging from 500 million to 110 billion: small models of 0.5B, 1.8B, 4B, 7B, and 14B parameters for edge devices, and large models of 72B and 110B parameters for enterprise users. Tongyi has also open-sourced vision, audio, code, and mixture-of-experts models.

Alibaba's open-source efforts date back to 2022, a year before large models exploded, when Alibaba Cloud built the ModelScope community; at its launch, Alibaba open-sourced more than 300 high-quality models developed over the previous five years. Zhou Jingren has said the ModelScope platform is still expanding, now hosting more than 4,500 high-quality open-source models and over 5 million developers.

Like Alibaba, Tencent chose the open-source route in the race to commercialize large models. Not long ago, Tencent Cloud fully open-sourced the Hunyuan text-to-image model. Zhang Feng, head of Tencent Hunyuan model applications, said: "We open-sourced many projects in the pre-large-model era; the decision to open source in the large-model era is the conclusion Tencent reached over the past six months of working with customers."

Baidu, however, the first to enter the game, has stuck to the closed-source route, arguing that closed-source large models can deliver better performance at lower cost than open-source ones and thereby foster a prosperous AI application ecosystem.

Rebuilding internal products with AI is the first stop in the big vendors' commercialization of large models.

At present, Baidu has completed the AI rebuild of Baidu Wenku, Baidu Search, Baidu Maps, Ruliu, and other businesses. Alibaba has comprehensively upgraded the group's core products, including DingTalk, AutoNavi Maps, Xianyu, Eleme, Youku, Hema, Taopiaopiao, Tmall, and Taobao.

ByteDance has also set up an internal "horse racing" mechanism: more than 50 business lines, including Douyin and Jianying, have begun AI exploration on their own. Nor has ByteDance shed its "app factory" nature: over the past year, beyond the flagship Doubao app, it has launched the interactive entertainment app "Cat Box" and AI creation tools such as Star Painting and Instant Dream, all built on the Doubao model.

Tencent, the most low-key player in large models, is anything but low-key on the product side. Adhering to the principle of industrial practicality, Tencent has made the AI transformation of internal products a key task since Hunyuan's launch last September; the Hunyuan model has now been deployed in more than 600 internal businesses and scenarios, including the "one door, three masters" products WeCom, Tencent Meeting, and Tencent Docs, as well as collaborative SaaS products such as Tencent Lexiang, Tencent e-Sign, Tencent Questionnaire, and Tencent Cloud AI Code Assistant.

Accelerating industry penetration is the last mile of large-model commercialization.

Whether for a general model or an industry model, final deployment ultimately means solving practical problems in specific industries and scenarios. In their choice of industries, of course, the vendors overlap in places and diverge in others.

Based on the Wenxin model, Baidu took the lead in rebuilding solutions for four major sectors: digital government, finance, industry, and transportation. Relying on the Tencent Cloud TI platform, Tencent Cloud has built a curated industry model store covering 10 sectors, including finance, cultural tourism, government affairs, media, and education, with more than 50 solutions on offer. Tencent Cloud has also joined 17 ecosystem partners from different industries to launch the "Tencent Cloud Industry Large Model Ecosystem Plan", committed to jointly advancing the innovation and adoption of large models in industry.

Alibaba also released 8 industry models last year, covering finance, healthcare, law, programming, personalized creation and other fields.

According to available information, Volcano Engine has formed an automotive model ecosystem alliance with more than 20 manufacturers, including Geely Automobile, Great Wall Motor, Jietu Automobile, Sailis, and Zhiji Automobile, and a smart terminal model alliance with device makers including OPPO, vivo, Honor, Xiaomi, and ASUS. One can infer that Volcano Engine will likely use these two industries as beachheads before penetrating others as they mature.

Epilogue

A year on, large-model competition has shifted from competing on the technology itself to competing on commercialization.

The former tests technology and capital; the latter requires, on top of the former, fine-tuning large models for different industries and business scenarios, so as to deliver services that enterprises genuinely need.

The major model vendors have now drawn their swords, but this is only the beginning. For a long time to come, the leaders will keep focusing on how to accelerate large-model deployment, and their competition will push large models into a new stage of development.
