
For large models to land, first make them affordable for everyone

Author: Huxiu APP

Image credit: Visual China

Since 2023, a global wave of generative AI, represented by large models, has swept the industry. Over the past year, companies and themes tied to computing power, algorithms, data, network security, cloud computing, and AI have all been chased by investors riding the large-model boom. From the standpoint of technological innovation, however, domestic large models have yet to bring qualitative change to how people work and live.

Not long ago, Kai-Fu Lee pointed out bluntly in an interview that for Americans, the "ChatGPT moment" happened 17 months ago, while Chinese users are still waiting for their own "AI moment".

The root cause is that domestic chatbots and tools are not yet good enough. "China must have its own ChatGPT to spark broad public interest in AI and drive its adoption and investment across a wider range of fields," Lee said bluntly.

At the same time, industry heavyweights keep declaring that "a large model that cannot be put to use is pointless."

Robin Li, for example, has said publicly that China's hundreds of foundation models are an enormous waste of social resources; more of those resources, he argued, should go into integrating models with every industry and exploring the next super application.

On the demand side, the market is eager for AI products and services that can be deployed quickly and generate returns. For large-model developers and adopters alike, racing on parameter counts no longer means much. In the view of Tan Dai, president of Volcano Engine, "letting more people and more industries actually use it" is the best definition of a "good model" at this stage.

Many industry insiders judge that this year will mark the first real explosion of generative AI applications in China, and "price" has become the industry's new consensus on the key to cracking the market open.

Affordability is key

At its just-concluded Spring Update event, OpenAI announced its latest model, GPT-4o, which not only improves performance significantly but also cuts the price by 50%. Notably, this is the fourth price cut OpenAI has made since the beginning of 2023.

In the current market, large models are usually billed per 1,000 tokens. Take GPT-4 as an example: since its release in March last year, OpenAI has upgraded it to GPT-4o, with the input price falling from $0.03 to $0.005 per 1,000 tokens, a drop of roughly 83%, and the output price falling 75%, from $0.06 to $0.015 per 1,000 tokens.

According to OpenAI's expectations, its large models will continue to reduce costs by 50-75% per year.
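The percentage drops quoted above follow directly from the per-1,000-token prices; a minimal sanity check:

```python
# Sanity check on the GPT-4 -> GPT-4o price cuts quoted above,
# in USD per 1,000 tokens.
def pct_drop(old: float, new: float) -> float:
    """Percentage decrease from the old price to the new price."""
    return (old - new) / old * 100

input_drop = pct_drop(0.03, 0.005)    # GPT-4 input $0.03 -> GPT-4o $0.005
output_drop = pct_drop(0.06, 0.015)   # GPT-4 output $0.06 -> GPT-4o $0.015

print(f"input:  {input_drop:.1f}%")   # 83.3%
print(f"output: {output_drop:.1f}%")  # 75.0%
```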

OpenAI is not the only one making cost reduction a main line of business development. Since the start of this year, domestic large-model players, eager to accelerate application adoption, have not only joined the price-cutting trend but pushed the cuts to an astonishing degree.

Zhipu, for example, announced a new pricing scheme this month: the call price of its entry-level GLM-3 Turbo model dropped from 5 yuan to 1 yuan per million tokens, a cut of 80%.

Another example is DeepSeek-V2, the second-generation MoE model released by DeepSeek on May 6. Benchmarked against GPT-4 and Llama 3 70B, it charges 1 yuan per million input tokens and 2 yuan per million output tokens, only about 1% of the price of GPT-4 Turbo.


The Doubao model, released by ByteDance on May 15, went further, moving the unit of AI pricing from fen (cents) down to li (tenths of a cent).

The flagship Doubao Pro 32k model is priced at 0.0008 yuan (0.8 li) per 1,000 tokens, 99.3% below the industry average. Even amid industry-wide price cuts, 1 yuan buys about 2,400 tokens from GPT and a little over 8,000 tokens from other domestic large models.

Self-hosting the open-source Llama model gets you roughly 30,000 tokens per yuan. On Doubao, 1 yuan buys 1.25 million tokens, enough to process three copies of the 750,000-character "Romance of the Three Kingdoms".
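The tokens-per-yuan comparison can be reproduced from the Doubao price alone; the ~1.8-characters-per-token ratio for Chinese text below is an illustrative assumption, used only to recover the "three copies of the novel" figure:

```python
# What 1 yuan buys at a given per-1,000-token price (in yuan).
def tokens_per_yuan(price_per_1k: float) -> int:
    """Tokens obtainable for 1 yuan at a given price per 1,000 tokens."""
    return round(1000 / price_per_1k)

# Doubao Pro 32k: 0.0008 yuan (0.8 li) per 1,000 tokens, per the figures above.
doubao_tokens = tokens_per_yuan(0.0008)
print(doubao_tokens)  # 1250000

# Assumed ratio of ~1.8 Chinese characters per token (illustrative only).
chars = doubao_tokens * 1.8
print(chars / 750_000)  # ~3 copies of a 750,000-character novel
```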

In the "war of a hundred models" staged over the past year, the question "how do we speed up deployment?" has come up again and again. AI models undoubtedly hold enormous value waiting to be tapped, but the industry is still at a very early stage of that exploration.

As a cost-driven productivity revolution, the value of AI models lies in pushing the marginal cost of creation, image generation and language understanding alike, ever closer to zero. In Tan Dai's view, price cuts are an important driver of value creation: "To-B application scenarios for large models are still very few. Even OpenAI keeps cutting prices; everyone's shared goal is to grow the pie first. Only by lowering customers' cost of trial and error can we make the industry flourish."

One industry insider admitted frankly that the current market for AI large-model applications in China is a drop in the bucket next to the training costs all players have poured in. Companies cannot yet sustain a positive cycle on to-B services alone; the revenue gap exceeds two orders of magnitude. In that light, cutting prices to get more people using the models is "at least a way to try."

Low prices rather than price wars

Chinese enterprises have always been application-driven: most are willing to embrace the AI era because they hope AI will create differentiated advantages in market competition and sharpen their competitiveness.

Faced with an endless stream of new models, however, users struggle to make accurate selection decisions. And because enterprises lack experience in model tuning and professional data processing, models often fall short of expectations in practice and fit their business poorly.

Model quality is the most critical part of putting AI into practice. Several industry practitioners noted that good technology is forged only through large-scale use and continuous polishing, which matches the idea ByteDance laid out at the Doubao launch: what matters is that it gets used. "Only when it lands in real scenarios, with more users and a larger call volume, can the model keep getting better," Tan Dai told Huxiu.

Generally speaking, price is users' first consideration when adopting large models. AI innovation carries a high risk factor: many enterprises want to innovate but are constrained by the cost of model usage and worried that models cannot handle the tasks of their vertical scenarios, turning ROI negative. They therefore prefer tools they can try without a cost burden.

Tan Dai ran a rough calculation: an enterprise using AI for a single innovation will consume at least 10 billion tokens. At the large models' earlier prices, that would cost 800,000 yuan on average; with Doubao, it now costs only 8,000 yuan.
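This back-of-envelope estimate is easy to verify. The earlier per-1,000-token price of 0.08 yuan below is an assumption, back-solved from the quoted 800,000 yuan figure:

```python
# One AI initiative is assumed to consume ~10 billion tokens.
TOKENS = 10_000_000_000

def cost_yuan(tokens: int, price_per_1k: float) -> float:
    """Total cost in yuan at a given price per 1,000 tokens."""
    return tokens / 1000 * price_per_1k

print(cost_yuan(TOKENS, 0.08))    # 800000.0 yuan at the implied earlier price
print(cost_yuan(TOKENS, 0.0008))  # 8000.0 yuan at the Doubao Pro 32k price
```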

In the past, falling inference costs for large models relied largely on compute upgrades. Doubao "crushes" comparable models on price thanks to optimizations of the model architecture, an engineering shift from traditional single-machine inference to distributed inference, and mixed scheduling of inference workloads with different load profiles. Together these deliver unexpectedly large cost reductions and become an important "switch" for unlocking AI applications.

It is worth noting that leading vendors, including BATJ, are likewise focused on improving training efficiency and cutting inference costs for large models.

Of course, anyone can offer a cheaper large model if the competition is purely on price. But to truly break down the high wall between large models and industry, and let the good scenarios happen more widely, lower cost is a necessary condition that must not come at the expense of model quality.

Many lightweight models on the market, for example, do achieve relatively low costs, but they do so by compressing model capability along with inference cost; their performance suffers badly and cannot match the vendors' flagship models.

Although price cuts for large models have become inevitable, ByteDance's "lowest price in the industry" has also stirred heated debate.

On one side, Doubao lets enterprises call a large model at the industry's lowest cost, accelerating application adoption. On the other, critics argue that ByteDance's ambition triggered a price war prematurely, and that as the cost of using large models keeps falling, a market battle is likely next.

Responding to the outside doubts, Tan Dai explained that large models are still in their infancy, far from a stage of fierce competition, and inference costs will keep falling: "Perhaps when we look back from that future at today's price, it won't seem cheap at all. We are merely one step ahead, not trying to drive out rivals."

Moreover, low prices are not the same as a price war. To-B business must stay profitable over the long term. "Not losing money is the key to business stability, and it is a principle of Volcano Engine," Tan Dai said, "because that is the only way to deliver services sustainably over the long run."

From proof of concept to implementation

In the model-training boom set off by last year's AI explosion, Zhipu AI, Moonshot AI, MiniMax, and 01.AI were among the first to train models on Volcano Engine's computing power.

Tan Dai said frankly that the cloud market is ultimately a competition for scale, and the boom in large models has opened new market opportunities for cloud vendors. Microsoft Azure is the proof: by investing in OpenAI, it sent its cloud business soaring and became the world's leading intelligent cloud.

Doubao is now officially available to the public through Volcano Engine. According to official figures, the model processes 120 billion tokens per day and has generated more than 30 million images in a single day.

Last year, when most players were staging the "war of a hundred models," ByteDance's "absence" was widely read as serious lag; Doubao was then still in its infancy. Few expected that ByteDance, after holding back so long, would strike at this moment.

Beyond model quality and cost, application deployment is crucial. Over the past year, Doubao has been wired into more than 50 ByteDance businesses, including Douyin, Toutiao, and Tomato Novel, covering scenarios such as office assistants, e-commerce shopping guides, after-sales customer service, marketing copywriting, intelligent data analysis, and coding assistants. Leading customers in mobile phones, automobiles, finance, consumer goods, and interactive entertainment were invited for internal testing.

Polishing a model is a constant challenge. First its basic capabilities and performance must meet the bar; then comes working with enterprise customers to solve a long list of problems, such as building evaluation and deployment pipelines for large models, including test-set construction, case analysis, and fine-tuning capabilities. Otherwise it is hard to tell, in the short term, which scenarios a model is reliable in and which it is not suited for.

It was also in this grinding-in with enterprise customers that the Volcano Engine team deepened its understanding of, and fit with, scenarios such as assistants, knowledge, entertainment, and education, embedding corresponding plug-ins and tool platforms for different scenarios in the Volcano Ark 2.0 platform and continuing to search for the best solutions.

Large models are astonishing along many dimensions today, yet still deeply flawed, and they are evolving fast, changing dramatically every three to six months. "That is precisely the challenge, and the fun, of building large-model products: you must constantly judge where the PMF (product-market fit) of the next product may lie amid this continuous, dynamic technological development."

Notably, on ByteDance's AI application development platform Coze (扣子), some developers have already earned revenue from the applications they built. Tan Dai predicts that in the second half of this year, large-model applications will move more scenarios from proof of concept to deployment.

For now, Volcano Engine is cooperating with enterprise users such as China Merchants Bank, Mengniu, and OPPO to explore core business scenarios. As for how large models will shape the future, time will tell.

This content represents the author's independent view and not the position of Huxiu. Reproduction without permission is prohibited; contact [email protected] for authorization.
