
The open-source model is not weak

Text | Speculative Finance

Since ChatGPT's debut in November 2022, this phenomenal product has rapidly ignited market enthusiasm for large models. Technology companies new and old have piled in one after another, vowing to seize the biggest industrial dividend since the mobile internet. Yet even as the industry races ahead on the technology, a "route dispute" has emerged:

Closed-source route: typified abroad by OpenAI's ChatGPT, Anthropic's Claude, and Google's Gemini, and in China by Baidu's Wenxin Yiyan (ERNIE Bot) and Moonshot AI's Kimi. Its proponents favor the strengths of closed-source large models, such as high performance and strong commercialization. Baidu has been the most vocal, and its remarks have stirred heated discussion in the industry;

Open-source route: typified by Meta's Llama and Alibaba Cloud's Tongyi (Qwen). Its proponents believe the collaborative nature of open source enables rapid technical iteration and expands the growth potential of cloud computing through model hosting. This route also makes it easier for data-sensitive organizations to deploy large models on private clouds or on-premises intranets, giving it advantages in growth potential and breadth of deployment scenarios over closed source.

Unlike past industry controversies, this large-model debate is steeped in technical conviction, and the arguments among practitioners focus mostly on the technology itself.

So what analytical framework should we establish for this open-versus-closed-source debate? How can we judge the current route dispute rationally?

First, according to scaling laws, a large model's success is the combined result of larger parameters, more data, and greater computing power, which in turn demand massive capital investment, mature infrastructure, and stable management. There is no blitzkrieg in large models, only a protracted war.

Second, Baidu's choice of closed source has technical considerations, but it is also inseparable from its business path;

Third, open-source models are not as weak as imagined, and closed source will not always stay ahead;

Fourth, the coexistence of open- and closed-source large models will be a long-term trend.

The scaling laws principle: large models will burn money for a long time

Let's start with the first principle of large language models: scaling laws.

In January 2020, OpenAI released the paper "Scaling Laws for Neural Language Models", which laid the foundation for scaling-law research and pointed the direction for subsequent GPT iterations: larger parameters, more data, and more computing power yield better model intelligence.

Since then, OpenAI has pursued the large-parameter route: GPT-3 reached 175 billion parameters (GPT-2 had only 1.5 billion), and its training data jumped to roughly 570 GB of filtered text, on the order of 300 billion tokens.

The large-parameter arms race was on: models at the hundred-billion-parameter scale proliferated in the market, driving rapid development and popularization of the technology.

This raises a new question: computing power.

According to the scaling-laws paper, training compute can be estimated as roughly 6ND FLOPs, where N is the number of model parameters and D is the number of tokens in the training set. In the era of large models, compute demand has therefore exploded (and long-context large models may require even more than 6ND).
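To make the 6ND rule concrete, here is a minimal back-of-the-envelope sketch. The parameter and token counts are the commonly cited GPT-3 figures, used purely for illustration, and the cluster throughput is a hypothetical assumption:

```python
# Back-of-the-envelope training-compute estimate using C ≈ 6 * N * D.
# N: parameter count; D: training tokens. The figures below are the
# commonly cited GPT-3 numbers, used purely for illustration.

def training_flops(n_params: float, n_tokens: float) -> float:
    """Estimate total training compute in FLOPs via the 6ND rule."""
    return 6 * n_params * n_tokens

N = 175e9   # GPT-3: 175 billion parameters
D = 300e9   # GPT-3: ~300 billion training tokens

flops = training_flops(N, D)
print(f"Training compute: ~{flops:.2e} FLOPs")  # ~3.15e+23 FLOPs

# Hypothetical cluster sustaining 100 PFLOP/s (1e17 FLOP/s):
days = flops / 1e17 / 86_400
print(f"Wall-clock at 100 PFLOP/s sustained: ~{days:.0f} days")  # ~36 days
```

Even under these generous assumptions, a single training run occupies a large cluster for weeks, which is why capital expenditure dominates the discussion that follows.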

On one hand, this has fueled the explosive growth of underlying compute providers, with GPU maker Nvidia the standout; on the other, large-model makers must pour money into computing infrastructure if they want to stay technologically ahead.


A chart from Huatai Securities also shows clearly that large models, like cloud computing in its earlier phase of explosive growth, can only grow the business on the back of heavy investment in basic computing power. According to Visible Alpha's consensus forecast, the combined capital expenditure of the global tech big four (Microsoft, Google, Meta, and Amazon) will reach $239.9 billion in 2026, a CAGR of 18.86% from 2023 to 2026.
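As a quick sanity check on that forecast (a sketch, assuming the 18.86% CAGR applies over the three years from 2023 to 2026; the implied 2023 base is derived, not independently reported):

```python
# Sanity-check the Visible Alpha forecast: if the big four's combined
# capex reaches $239.9B in 2026 at an 18.86% CAGR over 2023-2026,
# what 2023 base does that imply? (Three compounding periods.)

capex_2026 = 239.9  # $ billions, forecast
cagr = 0.1886       # 2023 -> 2026

implied_2023 = capex_2026 / (1 + cagr) ** 3
print(f"Implied 2023 capex: ~${implied_2023:.1f}B")  # ~$142.9B
```

An implied 2023 base of roughly $143 billion is broadly consistent with the scale of capital expenditure these four companies reported for 2023.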

One view holds that the marginal returns of scaling laws are narrowing: once the technology matures (that is, once diminishing returns set in sharply), investment in computing power will peak, and vendors will only need to maintain the reliability and stability of their existing models.

Professor Tang Jie of Tsinghua University pointed out in February 2024, however, that we are far from the end of scaling laws: today's data, compute, and parameter counts are nowhere near enough, and scaling laws still have a long way to run.

In reality, mainstream large-model vendors are still expanding their computing power and scaling up model parameters, and the industry's end is nowhere in sight.

Although R&D teams can improve large-model performance through architectural optimization and hardware-software co-design, we must also admit that exponential technical iteration still relies on high parameter counts and strong computing power.

Under these two constraints, large-model vendors face a thorny problem:

If capital expenditure on computing power is the "egg" and high model performance the "chicken", then whether the chicken lays the egg or the egg hatches the chicken becomes a question every large-model vendor must answer.


Take Baidu, a loyal devotee of the closed-source model, as an example. As pressure on its cornerstone advertising business keeps mounting, its business philosophy has grown more cautious, as seen in its pruning of non-core businesses and streamlining of headcount. This shows up in increasingly conservative capital expenditure, a trend very evident over the past three years.

In 2023, leading technology companies such as Meta and Amazon likewise carried out structural optimization of capital expenditure: Amazon's logistics and warehousing costs began to fall even as infrastructure such as cloud-computing data centers kept expanding at scale. The same goes for Baidu: it is growing more cautious with capital expenditure overall, yet infrastructure investment tied to large models is bound to grow rapidly.

This poses a problem for Baidu: structural capex cuts will eventually hit a floor, scaling laws are nowhere near their end, and the "second curve" cannot shoulder the spending burden in the short term. Together these force Baidu to weigh its business path from the financial side.

Selling the model (via API access) thus becomes the closed-source player's first choice: membership fees for C-side users of Wenxin Yiyan, and API fees as the main monetization on the B-side. And because a closed-source model is developed exclusively in-house, its maintenance and management costs are relatively low, which is highly cost-effective for Baidu. On the chicken-or-egg question, Baidu chose the chicken that lays the egg: monetize the model first, and let revenue fund the compute.

Can closed source really beat open source?


So far we have briefly outlined the state of the industry from the perspectives of principles, technology, and business paths, and gained some sense of why Baidu is so fervent about closed-source large models.

Next, let's discuss the trends of open- and closed-source large models.

As mentioned at the outset, Robin Li has repeatedly shown contempt for open-source large models, first claiming that "open-source models will fall further and further behind", then that "without applications, both open- and closed-source models are worthless". Are open-source models really so feeble?


ARK Invest publishes its views and insights each year in its "Big Ideas" report, and a key point of the 2024 edition is that "open source models are improving faster than closed source models". In the accompanying chart, Alibaba Cloud's Qwen-72B appears as the largest open-source model.

On one hand, closed-source large models do enjoy a first-mover advantage, with OpenAI's ChatGPT the typical example; on the other, the evolution of large models is a protracted war (scaling laws are the main reason), which places higher demands on corporate management, investment, and continuous innovation.

It is precisely here that open-source models have begun to show their own advancement.

In April 2024, Meta released Llama 3, designed to be multimodal and multilingual; according to the training data Meta has published so far, its performance is comparable to GPT-4.

Llama's success has given the open-source camp ample confidence. On LiveBench, a newly launched large-model evaluation benchmark from a well-regarded U.S. team, Alibaba's Tongyi Qwen2 took first place in the world among open-source models in the latest rankings, surpassing Meta's Llama3-70B.

With investment in basic computing power assured, open-source large models can stay competitive over the long run, and Robin Li's disdainful suggestion that open-source models make no progress is untenable.

This tells us once again that closed source and open source are not a battle of technical ideas, but a divergence of business paths.

So which business path is best suited to putting large models into practice?

For reasons of space, we omit the tedious analysis and summarize the key points:

Short and medium term: the closed-source camp has a clearer edge in monetization. Its main business model, selling the model, is simple and easy to run. Baidu can also strengthen its products by retrofitting its existing internet applications (maps, Wenku, search, and so on) with large models, transitioning its business lines from "AI+" to "+AI". One caveat: retrofitting existing product lines inside a company also carries huge costs. Huatai Securities has estimated, for example, that if Meta's content recommendation ran entirely on large models in place of the original algorithms, it would need at least 500,000 NVIDIA GPUs, a massive expense by itself (recent reports say Meta's GPU count will exceed 300,000 this year). This raises the bar for closed-source large models to land and monetize in the short term.

Long term: the open-source route goes further. Its high degree of customizability will raise large-model penetration across industries; as more industries plug in, the breadth of large-model adoption grows, and developers will rely on the computing power and cloud platforms behind open-source models to achieve sustainable growth.

In both paths, "capital" is a necessary condition for the business model to run, which brings us back to the chicken-and-egg paradox mentioned earlier.

As a result, closed-source players tend to share these traits: unique advantages on the application side (such as Google) or a clear short-term technical lead (such as OpenAI);

Open-source players, for their part, have strong financial foundations (such as Meta) and sound cloud-computing infrastructure (such as Alibaba Cloud), enabling them to absorb the huge cost of infrastructure expansion and to serve the cloud-computing demand that follows open-source adoption.

Clearly, no large model has every advantage and no weakness. Behind Baidu's fierce advocacy of closed source at this moment lie its short-term commercialization anxiety (the recent API price war hit closed-source models harder) and its ambition to capture the minds of target customers.

For this reason, we do not believe a large-model path that covers everything will emerge. On the contrary, it is more "expedient" for each company to choose the path that suits it, and customers will weigh their own considerations when choosing between open and closed source. Some companies even run open and closed source side by side to serve different customers; Google's lightweight open-source Gemma series is one example.

That said, making provocative statements right now is the surest way to go viral and raise the profile of closed-source large models. But it overlooks the fact that open-source models are by no means "weaklings"; the development of large models will be a protracted war with too many unknowns ahead, and rash assertions may well be refuted down the road.
