
Investigating the AI server shortage: prices up 300,000 yuan in two days, and even the "MSG King" joins the fray

Source: Zhidongxi (Smart Stuff)

Author | Li Shuiqing

Editor | Heart

Almost overnight, AI server prices soared in the domestic market.

Zhidongxi learned from a server channel salesperson that popular AI server models equipped with NVIDIA A800 GPUs now sell for 1.4 to 1.5 million yuan per unit, up more than 40% from June this year. Prices of AI servers equipped with eight NVIDIA H800 GPUs are even more extreme, rising by hundreds of thousands of yuan within a few days to nearly 2.8 million yuan per unit, an increase of more than 10%.

Since the start of the "war of a hundred models," China's AI server industry has been living through fire and ice at once.

On one hand, the large-model wave has brought a surge in demand for AI servers, and internet cloud providers, AI model companies, and enterprises across industries all have to spend. Not only have ICT leaders such as China Telecom recently launched procurement projects for AI computing servers worth more than 8 billion yuan, but even crossover players such as "MSG King" Lotus Health have stepped in, recently spending nearly 700 million yuan on GPU servers.

On the other hand, supply falls far short of this massive demand. Popular AI server models are sold out and unavailable even at nearly 3 million yuan per unit. Leading manufacturers including Inspur Information, New H3C, Nettrix (Ningchang), Lenovo, and Foxconn Industrial Internet have all launched new machines for large models, but when the orders will actually ship is another matter: as the United States tightens restrictions on NVIDIA GPUs and domestic AI chips, that question hangs under a big question mark.

The importance of AI servers to large models is self-evident. If a large model is a child that needs to eat huge amounts of data, then the AI server is the chef who determines whether the child eats well. The "war of a hundred models" is, in essence, a battle over AI servers.

Sitting between chip makers and large-model companies, how can Chinese server manufacturers break through? This has become an important question for the development of China's large-model industry.

First, the AI server business under the large-model wave: soaring prices, a red ocean in sight, and customers crossing over from other industries

"Servers used to be hard to sell; now it is the reverse, with customers begging to buy!" a salesperson at an agent for a leading server manufacturer told Zhidongxi. "The price increase is secondary; many customers no longer haggle over tens of thousands of yuan. It is clearly a seller's market now. Even after an order is signed, the delivery date cannot be fully guaranteed, and no liquidated damages will be promised."

An AI server is a heterogeneous server whose core chips can be combined in different ways, including CPU+GPU, CPU+TPU, and CPU plus other accelerator cards. Compared with general-purpose servers, AI servers are better suited to AI training and inference workloads that demand large computing power, high concurrency, and heavy data flows, making them the hot commodity of the large-model era.

Take the popular Inspur NF5688M6 as an example. One agent listed it at 1.25 million yuan on an e-commerce platform; this server, with eight A800 GPUs, still cost 1.05 million yuan in May this year, yet even at a price nearly 20% higher it is out of stock. Another online store with stock prices the NF5688M6 at nearly 1.6 million yuan; its salesperson told Zhidongxi that units on hand could go for 1.45 million yuan, but only two were available, and larger orders would have to be bundled with machines from other brands such as Nettrix and Supermicro.


Screenshot of the NF5688M6 sales page on JD.com

The store told Zhidongxi that a new batch of AI servers based on H800 GPUs had come in, but when we asked the price, the store itself called it outrageous: it had risen by hundreds of thousands of yuan in a few days. Not long ago the price was only 2.5 million yuan; now it takes 2.8 million yuan to get one. Sales channels that reacted slowly changed their tune overnight and raised prices by 300,000 yuan.

Faced with this year's market, server manufacturers and channel agents can hardly believe their luck. One server manufacturer employee sighed to Zhidongxi: "Every time I thought computing power was about to become a 'red ocean,' a boundless 'blue ocean' appeared."

This "blue ocean" has essentially received top-level endorsement. On October 8, six departments including the Ministry of Industry and Information Technology jointly issued the Action Plan for the High-Quality Development of Computing Power Infrastructure, which proposes that by 2025 China's total computing power will exceed 300 EFLOPS (300 quintillion floating-point operations per second), with intelligent computing power accounting for 35%. By comparison, data from the China Academy of Information and Communications Technology show that as of the end of June this year, China's total computing power stood at 197 EFLOPS, of which intelligent computing power accounted for 25%.

That target implies growth of more than 110% in intelligent computing power, an incremental market of roughly 56 EFLOPS.
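The arithmetic behind those figures is easy to check; the short sketch below simply recomputes the targets from the numbers cited above (all values taken from the article, in EFLOPS):

```python
# Recompute the intelligent computing power targets cited above.
# All figures are from the MIIT action plan and CAICT data, in EFLOPS.
current_total = 197        # China's total computing power, end of June 2023
current_ai_share = 0.25    # intelligent computing share, mid-2023
target_total = 300         # 2025 total from the Action Plan
target_ai_share = 0.35     # 2025 intelligent computing share target

current_ai = current_total * current_ai_share   # ~49.25 EFLOPS today
target_ai = target_total * target_ai_share      # 105 EFLOPS by 2025

growth = target_ai / current_ai - 1             # ~113%, i.e. "more than 110%"
increment = target_ai - current_ai              # ~55.75, i.e. "about 56 EFLOPS"

print(f"intelligent computing power today: {current_ai:.2f} EFLOPS")
print(f"2025 target: {target_ai:.0f} EFLOPS")
print(f"implied growth: {growth:.0%}, increment: {increment:.2f} EFLOPS")
```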

A person in charge at server market leader Inspur Information told Zhidongxi: "The accelerated development of AIGC technology, represented by large models, has brought unprecedented opportunities to AI computing. Rich application scenarios and enthusiasm for iterative technological innovation have significantly raised attention to, and demand for, AI servers in the Chinese market, which may continue to grow rapidly over the next few years."

An earlier report from industry research firm IDC put China's accelerated server market at US$3.1 billion in the first half of 2023, up 54% from the first half of 2022, and forecast it to reach US$16.4 billion (about 119.884 billion yuan) by 2027.

Intelligent computing centers, which aggregate clusters of AI servers, are a major lever for staking out this "blue ocean" of intelligent computing power. As the figure below shows, from March to October 2023, more than ten ultra-large intelligent computing centers broke ground or came online, fairly evenly distributed across the country. Most of those already in operation are expanding capacity as they go, further increasing demand for AI servers.


Some domestic intelligent computing center projects under construction or in use

The forces behind them, internet cloud providers, telecom operators, AI large-model companies, and industry leaders, have all gotten involved, handing server manufacturers orders worth hundreds of millions of yuan.

A person in charge at leading server maker New H3C told Zhidongxi: "As the 'war of a hundred models' deepens, more and more enterprises, research institutions, and developers are adopting deep learning, which drives demand for AI servers. Both the training and inference stages of deep learning tasks require massive computing resources, and AI servers provide the high-performance heterogeneous computing to meet that need."

Recently, China Telecom's centralized procurement project for AI computing servers (2023-2024) completed its bid evaluation: 4,175 training servers in total, worth about 8.463 billion yuan, with xFusion, Inspur Information, New H3C, Nettrix, ZTE, FiberHome, Lenovo, and several Huawei agents shortlisted.

Amid these huge waves, even crossover players such as "MSG King" Lotus Health are buying AI servers. According to a procurement contract dated September 28, New H3C will deliver 330 NVIDIA H800 GPU series computing servers (each with eight GPUs) to Lotus Science and Technology for a total contract price of 693 million yuan.

Whether it is intelligent computing centers of tens of PFLOPS springing up everywhere or single orders worth hundreds of millions or billions of yuan, people in the server business no longer worry about selling. Under the large-model wave, AI server prices have soared, a red ocean is in sight, and customers are crossing over from other industries, pushing AI server makers toward a gold rush.

Second, server manufacturers rush out new large-model products, with more orders than they can take and lead times stretching into next year

"Half of our orders are AI servers, more than double the traditional servers," a person at a leading server manufacturer told Zhidongxi. "AI servers will be in short supply for a while. Demand for inference machines has not really been released yet; many customers buying inference machines this year are just testing the water and may invest more next year."

Eyeing the long-distance race of large models, the faster-responding server manufacturers have already launched new hardware products for large models.


New server products launched by some manufacturers for large models

Compared with earlier task-specific small models, large-model training places many new demands on servers: not only high-performance compute, big-data storage, and broader framework adaptation, but also higher data transmission efficiency, better checkpoint-and-resume capability, and scheduling and management of AI computing clusters. These demands are pushing server vendors to launch new machines for large-model training and inference.

1. Large models drive server design innovation, and leading players race to stake claims

"Deep learning models are getting larger and more complex, requiring more computing power, which drives AI servers to keep improving performance, adopt powerful AI accelerator cards, and offer higher bandwidth and larger capacity," the New H3C person in charge told Zhidongxi. "Meeting the needs of deep learning tasks has driven many design innovations in AI servers. Beyond raising compute density and performance, thermal management, power management, and green data center construction have also become important design considerations."

In June this year, New H3C launched the H3C UniServer R5500 G6, an AI server for large models said to triple computing power over the previous generation and to cut training time by 70% in GPT-4-class large-model training scenarios.

Inspur Information, which has led the AI server market in share for five consecutive years, upgraded its NF5468 series of AI servers on September 21, greatly improving fine-tuning performance for large models such as Llama. Achieving globally optimal performance, energy efficiency, or TCO requires coordination across the industry chain: since 2019, Inspur has led the drafting of OAM (OCP Accelerator Module) standards and accelerated adaptation with chip makers, and it recently released the new-generation OAM server NF5698G7, with full PCIe Gen5 links and a 4x improvement in host-to-device (H2D) interconnect capability.

The Inspur Information person in charge said that large models place higher demands on AI server performance and functionality. What matters is not just a single chip or a single server: in most cases, the final deployment takes the form of a highly integrated intelligent computing cluster spanning compute, storage, and network equipment, software, frameworks, model components, racks, cooling, power supply, and liquid-cooling infrastructure.

Veteran server makers, represented by Lenovo, are going further, making the AI-model era central to corporate strategy. In August, Lenovo launched two new AI servers: the Lenovo WA7780 G3 AI large-model training server and the Lenovo WA5480 G3 AI training-and-inference integrated server. At the same time, Lenovo unveiled its "Puhui" (universal) AI computing power strategy for the first time, pledging that 100% of its computing infrastructure products will support AI and that 50% of infrastructure R&D investment will go to AI, and launched its intelligent computing center solutions and core service products.


The two new Lenovo server products

Chen Zhenkuan, vice president of Lenovo Group and general manager of the server business unit of its China Infrastructure Business Group, noted at the time that AI-oriented infrastructure should be designed and optimized around the characteristics of AI data and algorithms: data traits such as "vectors, matrices, or multidimensional arrays" and "data noise," and algorithm traits such as "large-scale parallel and matrix computation" and "tolerance for low-precision floating point or quantized integers."
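The "tolerance for quantized integers" Chen mentions can be seen in a toy sketch (not from the article): symmetric int8 quantization of a random weight matrix in NumPy, showing that the round-trip error stays small relative to the weights themselves.

```python
import numpy as np

# Toy illustration of AI workloads' tolerance for quantized integers:
# quantize a random fp32 "weight" matrix to int8, dequantize, and
# measure the round-trip error. Real frameworks use per-channel scales
# and calibration; this sketch keeps a single scale for simplicity.
rng = np.random.default_rng(0)
w = rng.standard_normal((256, 256)).astype(np.float32)

scale = np.abs(w).max() / 127.0                        # map largest weight to 127
w_int8 = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
w_deq = w_int8.astype(np.float32) * scale              # back to float

rel_err = np.abs(w - w_deq).mean() / np.abs(w).mean()  # roughly 1% on average
print(f"mean relative error after int8 round-trip: {rel_err:.4f}")
```

Errors this small are usually swamped by the noise already present in training data, which is why inference hardware leans so heavily on int8 and low-precision floating point.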

2. Computing power efficiency becomes more critical, testing hardware-software co-engineering

Although server manufacturers are racing to launch new large-model machines, few buyers can get real units right away. Many of the new large-model servers use eight H800, A800, or L40S GPUs. A person in charge at one such manufacturer told Zhidongxi that new AI server products have more orders than they can take: lead times previously quoted at six months now look like twelve.

Even so, server vendors are accelerating their moves from software to ecosystem.

The Inspur Information person in charge told Zhidongxi that, unlike traditional small models, a large model's capability comes from extensive engineering practice. So as today's scarcity of computing resources gradually eases next year, extracting efficiency from that computing power becomes the next hard problem.

Take the pre-training stage as an example. First, the evolution of large AI models places high demands on a cluster's parallel computing efficiency, on-chip memory, bandwidth, and low-latency access, and the planning, construction, performance tuning, and computing power scheduling of a ten-thousand-card AI platform are all hard problems. Second, large-scale training runs into problems that small-scale training never meets, such as hardware failures and gradient explosions. Third, the lack of engineering precedent makes it difficult for companies to improve model quality quickly.

To this end, Inspur is not only laying out hardware but also racing to cover full-stack capability in software and algorithms. Its latest launch, OGAI (Open GenAI Infra), a large-model intelligent computing software stack released on August 24, is said to provide AI computing environment deployment, computing power scheduling, and model development management for large-model businesses, helping them solve full-stack system, compatibility and adaptation, and performance optimization problems. Since 2019, Inspur has also led the MetaBrain ecosystem plan, bringing together partners with core AI development capabilities and end-to-end industry solution delivery capabilities.


Introduction to Inspur Information's OGAI

Experts at New H3C likewise believe that as the war of a hundred models advances, large AI server clusters must be effectively managed and deployed, requiring efficient cluster management software and automation tools to ensure server availability, performance, and efficiency.

To this end, New H3C built an overall AIGC solution spanning its enablement platform, data platform, and computing power platform. In August, LinSeer, New H3C's private-domain large model, achieved a domestically leading 4+ rating in the large-model standard compliance testing organized by the China Academy of Information and Communications Technology. New H3C has also deepened cooperation with leading internet companies to explore integrating private-domain models with general-purpose models.

In addition, manufacturers are competing to publish industry reports, standards, and guidelines in hopes of gaining a greater say.

For example, Inspur released the Open Acceleration Specification AI Server Design Guide, which refines the full-stack design reference for deploying AIGC chips from nodes to clusters. While rolling out new products, Nettrix has actively participated in AI server research projects and contributed to the compilation of the "AI Server White Paper."

Clearly, the accelerated development of large models and AIGC technology has brought unprecedented opportunities to AI computing, but also enormous challenges that must be addressed at multiple levels: hardware, software and algorithms, and ecosystem.

For every server vendor, AI servers are a must-win battle, both a fight for the blue ocean and a fight for survival.

Take industry leader Inspur Information again: in the first half of 2023, the company posted operating revenue of 24.798 billion yuan, down 28.85% year on year, and net profit attributable to shareholders of 325 million yuan, down 65.91% from the same period last year. With limited growth left in the traditional general-purpose server market, seizing the intelligent computing opportunity presented by large models to capture a larger market has become the key step for server manufacturers to make a new leap.

Third, coping with industry chain risks: international chip makers' supply is in doubt, while domestic AI chips race to take up the slack

Scarcity is the other side of the explosion in AI server demand, and the reason behind it is insufficient supply up the chain.

On October 17, the U.S. Commerce Department's Bureau of Industry and Security (BIS) announced new export control rules for advanced computing chips and semiconductor manufacturing equipment, restricting China's ability to buy and make high-end chips. NVIDIA had adapted to earlier restrictions by supplying the Chinese market with cut-down versions of its flagship compute chips, the A800 and H800, with reduced interconnect speeds. The new rules may hit sales of the A800 and H800, and AMD, Intel, and others are also expected to be affected, undoubtedly aggravating the supply chain difficulties of domestic AI servers.

Several industry insiders told Zhidongxi that, for a long time, most well-known large models at home and abroad, about 90%, were trained on GPGPUs, with only about 10% trained on other ASIC chips. Among GPGPUs, NVIDIA's A100, A800, H100, and H800 are essentially the most efficient.

Because of the U.S. ban and restrictions, plus NVIDIA's underestimate of the market, GPGPU supply has become the choke point of the AI server market. A channel person in the server field told Zhidongxi that with the U.S. ban tightening in recent days, many worry that the machines already on the market will become unobtainable, and prices will rise immediately.

In fact, against this backdrop of limited supply, leading server makers have spent the past six months continuing to develop GPU servers on one hand while adopting open architectures compatible with domestically innovated chips on the other. Inspur, for example, launched an open accelerated computing architecture said to feature large computing power, high interconnect, and strong expandability. On this basis, it has released three generations of AI server products, landed AI computing products with more than ten chip partners, and launched the AIStation platform, which can efficiently schedule more than 30 kinds of AI chips.

Some server vendors bypass the GPGPU route entirely, finding another way to land AI servers built on domestically innovated hardware.

For example, on August 15, iFLYTEK and Huawei jointly released the iFLYTEK Spark all-in-one machine. Built on Kunpeng CPUs plus Ascend GPUs, with Huawei storage and networking in a complete rack solution, it delivers 2.5 PFLOPS of FP16 computing power. By comparison, the eight-GPU NVIDIA DGX A100, the most popular configuration for large-model training, delivers 5 PFLOPS of FP16 computing power.


Huawei's AI inference and training servers and related parameters

According to reports from Yicai, the Spark all-in-one machine likely uses Huawei's Ascend 910B AI chip, which has not been officially launched and is believed to benchmark against the A100. Among the Atlas server products Huawei has announced are a variety of inference and training machines; the Ascend 910 they use slightly exceeds the A100 80GB PCIe version and has substituted for it in specific large-model scenarios such as Pangu and iFLYTEK Spark.

However, Zhidongxi learned from the industry chain that the current Ascend 910 is better suited to large models within Huawei's own ecosystem, and its in-house development frameworks such as MindSpore are not yet general-purpose. Other models, such as GPT-3, require deep optimization to run smoothly on Huawei platforms. Although large-model makers like iFLYTEK have partnered with Huawei, much of the work may have only just begun.

In addition, industry insiders say that Haiguang Information has independently developed two generations of its DCU deep computing series, now in large-scale mass production with leading performance, which can well support the training and inference of general large models. Chip startups such as Cambricon, Moore Threads, Biren Technology, and Muxi can also supply AI server makers. Although some of these companies are affected by the U.S. Entity List, the clarified situation objectively gives them impetus to accelerate product iteration and deployment.

Overall, server manufacturers are preparing on multiple fronts to withstand shortage risk in the industry chain. Industry insiders told Zhidongxi that because most AI chip startups only began developing large-model AI chips at the end of last year, their chip architectures and software support may still be immature; even so, with faster iteration, domestic AI chips are expected to support part of AI server demand by the end of this year or next year.

Conclusion: on the threshold of the "war of a hundred models," the hub role of AI server manufacturers becomes more critical

As large models land across thousands of industries, deploying AI computing power has become a key direction for computing infrastructure. IDC reports that with the explosion of generative AI applications, industry demand for intelligent computing has for the first time exceeded that for general-purpose computing; AI computing power has become the main line of computing development and a strong new driver of the "Eastern Data, Western Computing" initiative.

The server industry and its manufacturers are a vital part of building intelligent computing power. The domestic server market has seen soaring prices, a red ocean in sight, and customers crossing over from other industries, while also facing the severe risks of supply chain shortage and supply-demand imbalance. On the threshold of the war of a hundred models, AI server manufacturers have reached the moment when their ability to keep the industry chain flowing is put to the test. Whether they can withstand supply chain risks while forming strong alliances with upstream and downstream partners is the key to breaking through.
