
Behind Nvidia's 10% stock drop lies an opportunity for China's large models

Nvidia's share price went through a storm last Friday, closing down about 10% and wiping out roughly 1.5 trillion yuan in market value.

On the surface, the trigger for Nvidia's collapse was Super Micro Computer (Supermicro).

It started when Supermicro announced that it would no longer issue a performance forecast and would instead wait until the end of April to release its quarterly report directly. The market quickly read this as a sign that Supermicro's results had fallen short of expectations, setting off a frenzied sell-off in the company's stock.

Supermicro builds edge computing equipment, servers, and storage devices around Nvidia's compute chips, and given its deep ties to Nvidia, it is often treated as a barometer of demand in Nvidia's chip market.

After all, while internet giants at home and abroad sometimes have to queue up to buy compute chips from Nvidia, anyone willing to pay extra can get a server built around Nvidia chips from Supermicro.

However, what really triggered this panicked sell-off was not merely Supermicro's withheld forecast.

The underlying reasons behind it are even more important.

The giants began to think

On a sunny morning, Wall Street experienced an "earthquake": Nvidia, the chip-making giant, saw its stock price suddenly plummet. The shock not only frightened investors but also prompted deep reflection across the industry.

Not long ago, Nvidia had unveiled the GB200, a chipset with enormous computing power billed as its most powerful yet. Nvidia was in the limelight and the industry was jubilant. But the good times did not last: the plunge on the 19th shook the market's confidence in Nvidia's future.

Brokerage analysts stepped forward to interpret the move, and their views were surprisingly unanimous: the market doubts future demand for Nvidia's chips. What is going on here?

To understand this shift, we need to dig into the core of today's AI technology: the Transformer architecture. Proposed by Google in 2017, it has become the dominant approach in natural language processing. From OpenAI to Microsoft, from Google to Meta, almost all of America's large models are built on this architecture.

The Transformer's magic lies in its strength at semantic understanding and large-scale AI training. But every coin has two sides, and its drawback is that it cannot decompose a problem: the model must be trained as a whole. That means that to improve a model's performance, you have to keep piling on computing resources and opening up ever more training capacity.
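To make the "trained as a whole" point concrete, here is a minimal sketch of a dense Transformer feed-forward block (the sizes and the use of PyTorch are illustrative assumptions, not a description of any particular production model): every token passes through every parameter, so adding capacity adds compute one-for-one.

```python
import torch
import torch.nn as nn

d_model, d_ff = 1024, 4096  # hypothetical layer sizes, chosen for illustration

# A dense feed-forward block: there is no routing, so all parameters
# participate in processing every single token.
dense_ffn = nn.Sequential(
    nn.Linear(d_model, d_ff),
    nn.GELU(),
    nn.Linear(d_ff, d_model),
)

tokens = torch.randn(8, d_model)   # a small batch of token vectors
out = dense_ffn(tokens)            # every parameter touches every token

n_params = sum(p.numel() for p in dense_ffn.parameters())
print(f"parameters used per token: {n_params:,}")  # grows in lockstep with d_ff
```

Scaling such a block up makes every token more expensive to process, which is why bigger dense models translate directly into bigger chip and power bills.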

As a result, Nvidia's high-performance compute chips became the industry's darling. From the A100 to the GB200, the price of Nvidia's compute chips has soared more than tenfold, from around $3,000 all the way to nearly $40,000. To improve the performance of their large models, internet giants have had to keep pouring huge sums into more chips and more power.

Take OpenAI's ChatGPT as an example: according to British and American media reports, running it requires as many as 30,000 A100 compute cards and consumes 500,000 kilowatt-hours of electricity per day. Investment on that scale is staggering, to say nothing of what many other giants are pouring into artificial intelligence. Yet these investments have not brought a matching commercial return. OpenAI's revenue is said to cover less than a third of its costs, and the earnings reports of several other giants show a similar picture.
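As a rough back-of-envelope on the figures cited above (the price per kilowatt-hour is an assumed value for illustration, not a reported number), the power bill alone is substantial, and it is still small next to the cost of the cards themselves:

```python
# Figures cited in the reports above
cards = 30_000                 # A100 compute cards
kwh_per_day = 500_000          # daily electricity consumption

# Assumed industrial electricity rate (hypothetical, for illustration only)
usd_per_kwh = 0.10

daily_cost = kwh_per_day * usd_per_kwh
print(f"electricity alone: ~${daily_cost:,.0f} per day, "
      f"~${daily_cost * 365 / 1e6:.1f}M per year")

# Implied average draw per card, derived purely from the two cited numbers
avg_kw_per_card = kwh_per_day / 24 / cards
print(f"implied average draw per card: {avg_kw_per_card * 1000:.0f} W")
```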

This approach of relying entirely on massive spending to drive model progress is starting to leave the internet giants feeling overwhelmed. They have begun to wonder: is this development model really sustainable?

Against this backdrop, the slide in Nvidia's stock price has only amplified the market's panic. Investors are beginning to worry that once this spend-heavy development model becomes unsustainable, demand for Nvidia's chips will drop sharply.

Such fears are not unfounded. The internet giants face an awkward reality: commercial returns come nowhere near covering their ever-growing investment in artificial intelligence. Whether they can keep acting as Nvidia's "ATM" is genuinely in doubt.

It is this undercurrent that explains why the slightest movement in Nvidia's stock can set off a market panic. The giants' predicament also reflects the whole industry's difficult balancing act between technological progress and commercial returns.

A new savior

In an era weighed down by scarce Nvidia chips and soaring training costs, the internet giants desperately need a new savior to lead them out of their predicament. That savior is not a valiant knight with a sword or a mysterious magician, but a machine learning architecture called MoE.

Once upon a time, the giants' dependence on Nvidia chips was as unwavering as faith. But as training costs skyrocketed, they began looking for a way out. Developing chips of their own is certainly an option, but chip development is a long road, and migrating off Nvidia's CUDA platform is time-consuming and labor-intensive. In an era where speed is king, time is money, and the giants cannot afford to wait.

So they set their sights on another possibility: a new model architecture that could overcome the Transformer's drawbacks and improve training efficiency. Enter the MoE architecture, appearing in the giants' field of vision like a radiant savior.

MoE, or Mixture of Experts, is a machine learning architecture built from multiple "expert" models. Imagine these experts as the elite of a think tank, each adept at a different kind of data or task; faced with a complex problem, they tackle it together.

The MoE workflow is like a symphony. Data first reaches a smart "doorman" (the router) with a keen eye for which experts are best suited to handle it, and is then dispatched precisely to those experts. Each expert works on its own piece, yet they work in concert, and their contributions are finally combined into a single answer, as sketched below.
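Here is a minimal sketch of that doorman-plus-experts workflow (the class name TinyMoE, the layer sizes, the expert count, and the use of PyTorch are all illustrative assumptions, not the design of any shipped model):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    """A toy mixture-of-experts layer: a router picks the top-k experts per token."""
    def __init__(self, d_model=1024, d_ff=4096, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)   # the "doorman"
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                          nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                              # x: (n_tokens, d_model)
        gate = F.softmax(self.router(x), dim=-1)       # routing probabilities
        weights, idx = gate.topk(self.top_k, dim=-1)   # choose the best experts
        weights = weights / weights.sum(-1, keepdim=True)
        out = torch.zeros_like(x)
        # Each token visits only its top-k experts; the others stay idle.
        for e, expert in enumerate(self.experts):
            for k in range(self.top_k):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
        return out

layer = TinyMoE()
tokens = torch.randn(8, 1024)
print(layer(tokens).shape)   # torch.Size([8, 1024])
```

The router's scores decide which experts see each token, and the weighted sum at the end is the "gathered wisdom" described above.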

Remarkably, by breaking a large task into smaller ones, the MoE architecture saves a significant amount of training resources. Compared with a traditional dense Transformer, it needs far less compute for both training and inference. That is not just a technical breakthrough; it brings the giants tangible economic benefits.

More importantly, applying the MoE architecture well takes deep technical skill. Slicing tasks accurately, finding the critical subnetworks, and training strong expert models all demand ingenuity and careful refinement. Compared with ever-growing hardware spending, this kind of soft investment in technology is understandably more attractive to the giants.

In addition, the characteristics of the MoE architecture favor the growth of emerging large-model companies, because technical understanding and engineering give them a way past the hardware moat of the incumbent giants.

The MoE architecture has therefore begun to attract more and more large-model developers. It not only points the giants toward a way out of their bind, but also injects new vitality into the entire AI industry.

Here comes the opportunity for China's large models

MoE is a concept proposed in statistics long ago that has only in recent years begun to attract the attention of artificial intelligence researchers.

Its real rise to prominence, though, dates back to 2018, when researchers realized that this long-dormant architecture might offer a new answer to the increasingly large and complex training of big models.

No technology grows without setbacks, however. MoE ran into plenty of challenges during training: unstable outputs and over-reliance on particular experts limited its wider adoption. Although tech giants such as Google made progress in the field, MoE still looked somewhat immature next to the well-established Transformer.

The turning point came in June 2023, when a paper titled "MoE Meets Instruction Tuning" pointed out a new direction for MoE. The researchers offered practical, technically feasible solutions to the aspects of MoE that had been hard to control, and the paper was like a breath of fresh air, injecting new life into the approach.

Just six months later, Mistral AI released its first open-source MoE model via the X platform, a move that pushed MoE from purely theoretical research to the forefront of practical application. At the same time, domestic model R&D teams smelled new opportunities.

A number of domestic companies, including MiniMax, Xindan Intelligence, and Yuanxiang Technology, have announced investments in MoE research and development, having seen the possibilities MoE opens up. MoE's core idea of "divide and conquer" gives these companies hope of cracking the large-model training problem.

For domestic large-model developers, MoE not only solves many problems in the training process but also offers distinct advantages at inference time. Traditional large-model training tends to demand enormous compute and long training cycles, whereas MoE raises model capacity by scaling out horizontally, adding experts, without a proportional increase in the compute spent on each token.

At inference time, MoE's router mechanism activates only a few experts per token, which greatly reduces inference cost; the simple calculation below illustrates the effect. This advantage makes domestic developers more competitive in commercial applications and opens up more market opportunities for them.
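To see the effect in numbers, consider a hypothetical layer with 8 experts and top-2 routing (the figures match the illustrative sketch earlier, not any particular model):

```python
# Hypothetical MoE layer: 8 experts, router activates the top 2 per token
n_experts, top_k = 8, 2
d_model, d_ff = 1024, 4096
params_per_expert = 2 * d_model * d_ff        # two linear layers, biases ignored

total_expert_params = n_experts * params_per_expert
active_expert_params = top_k * params_per_expert

print(f"expert parameters stored : {total_expert_params / 1e6:.1f}M")
print(f"expert parameters active : {active_expert_params / 1e6:.1f}M "
      f"({active_expert_params / total_expert_params:.0%} of the total per token)")
```

The model keeps the full set of experts as capacity, but each token pays the compute bill for only a quarter of them.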

The flexibility and scalability of the MoE architecture have also brought new vitality to the domestic large-model market. As technology and data accumulate, developers can add new experts to a model to improve its performance further, and this flexibility lets MoE models adapt quickly to shifts in the market and in technology.

Many leading teams in China have now begun applying MoE to large-model development. Baidu's ERNIE model is one prominent example: its MoE-based architectural design supports deep language understanding and generation, and it is widely used in fields such as text classification, sentiment analysis, and machine translation. The Tiangong series released by Kunlun Wanwei is likewise a mixture-of-experts model, and Tiangong 3.0, currently in public testing, is showing strong performance.

It is fair to say that the MoE architecture is not just a technological breakthrough; it also represents a new R&D philosophy. Guided by it, China's large-model field is ushering in unprecedented development opportunities.

To some extent, this may be the key to China's catching up with or even surpassing the United States in the field of large models.

Author | Zhang Jinjing 
