Zhitong Finance APP learned that CSC released a research report saying that generative AI has achieved a breakthrough, and that training and inference of large models represented by ChatGPT require strong computing power support. The resonance across the industry chain is clear, and value accrues along the chain in the following order: advanced-process manufacturing -> 2.5D/3D packaging represented by Chiplet and HBM -> AI chips -> board assembly -> switches -> optical modules -> liquid cooling -> AI servers -> IDC leasing, operation and maintenance.
Overall, large models are still in a melee stage and applications are in the early phase of penetration, so growth in AI computing power demand is highly certain. Over the next two years the computing power sector will remain in a high-prosperity stage, and the report recommends related companies across all links of the AI computing power industry chain.
CSC's main points are as follows:
Generative AI has made a breakthrough, achieving a leap from 0 to 1, and the training and inference of large AI models represented by ChatGPT require strong computing power.
Since OpenAI officially launched ChatGPT at the end of 2022, its user base has grown rapidly and ChatGPT-related applications have emerged one after another; its general-purpose capabilities save people substantial time in text-heavy work. Meanwhile, under the Transformer architecture, multimodal large models have made new breakthroughs: text-to-image and text-to-video functions continue to improve and have made good progress in advertising, games and other fields. Generative AI will be the most important productivity tool of the next few years and will profoundly change links across many industries; around generative AI, computing power demand for both training and inference is expected to grow explosively.
The order of value increase in the computing power industry chain is as follows: advanced-process manufacturing -> 2.5D/3D packaging represented by Chiplet and HBM -> AI chips -> board assembly -> switches -> optical modules -> liquid cooling -> AI servers -> IDC leasing, operation and maintenance.
Advanced packaging, HBM: To address rapidly rising advanced-process costs and the "memory wall," Chiplet design plus heterogeneous advanced packaging has become the best balance of performance and cost. TSMC's CoWoS packaging technology can interconnect compute dies and HBM through 2.5D packaging, which is why NVIDIA's A100, H100 and other AI chips adopt TSMC CoWoS packaging, paired with 40GB HBM2E and 80GB HBM3 memory respectively. TSMC, the global foundry leader, has set the global benchmark for 2.5D/3D advanced packaging processes, and growth in the packaging market over the next few years will come mainly from the expansion of advanced packaging capacity.
AI chips/board assembly: represented by NVIDIA, this link began to release performance in the second quarter of this year. Model training requires deploying large numbers of high-compute chips in AI servers. CPUs remain indispensable but have hit performance bottlenecks, so the CPU+xPU heterogeneous solution has become standard in large-scale computing scenarios. Among these, GPUs have clear advantages in parallel computing, making CPU+GPU the most popular heterogeneous computing system, while NPUs offer clear performance and efficiency advantages in specific scenarios and have huge application potential on the inference side.
In the AI acceleration chip market, NVIDIA leads in both training and inference thanks to the advanced performance of its hardware products and the maturity of its ecosystem. According to Liftr Insights, NVIDIA held 82% of the data center AI acceleration market in 2022. As demand for AI chips has exploded, NVIDIA has benefited the most: its Q2 revenue guidance is $11 billion, and its data center chip business revenue is expected to nearly double. Although domestic manufacturers still lag the leaders in hardware performance and industry-chain ecosystem, they are steadily improving their product lineups and ecosystems and narrowing the gap; with NVIDIA and AMD restricted from supplying high-end GPU chips to China, domestic computing power chips are seeing a window for domestic substitution.
Switches: Compared with traditional data center network architectures, AI data center network architectures require many more switch ports. Switches carry technical barriers, and the competitive landscape in the Chinese market is stable.
Optical modules: AI computing power drives heavy data traffic within data centers, significantly raising both the speed and the number of optical modules. Training-side optical module demand is strongly correlated with GPU shipments, while inference-side demand tracks data traffic; as applications penetrate faster, the computing power and traffic required for inference may eventually far exceed those for training. At present, NVIDIA's A100 GPUs on the training side mainly pair with 200G and 400G optical modules, while H100 GPUs can pair with 400G or 800G optical modules.
According to the bank's calculations, the training-side ratio of A100 GPUs to 200G optical modules is 1:7, and the ratio of H100 GPUs to 800G optical modules is 1:3.5. 800G optical modules began shipping in small volumes at the end of 2022, with 2023 demand coming mainly from NVIDIA and Google. As of 2023, the market's next-generation high-speed optical modules point to 800G; combined with the computing power and model competition driven by AIGC, the bank expects major North American cloud vendors and related technology giants to purchase 800G optical modules in volume in 2024, possibly with some advance purchases in 2023.
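The bank's ratio-based estimate above can be sketched as a simple calculation. The ratios (A100 : 200G = 1:7, H100 : 800G = 1:3.5) are from the report; the GPU shipment figures below are purely illustrative assumptions, not forecasts.

```python
# Ratios per the report: optical modules required per GPU, by speed grade.
RATIOS = {
    ("A100", "200G"): 7.0,   # 7 x 200G modules per A100
    ("H100", "800G"): 3.5,   # 3.5 x 800G modules per H100
}

def module_demand(gpu_shipments):
    """Map {gpu model: units shipped} to optical module demand by speed grade."""
    demand = {}
    for (gpu, speed), ratio in RATIOS.items():
        units = gpu_shipments.get(gpu, 0)
        demand[speed] = demand.get(speed, 0) + units * ratio
    return demand

# Illustrative only: 1.0M A100s and 0.5M H100s shipped in a year.
print(module_demand({"A100": 1_000_000, "H100": 500_000}))
# {'200G': 7000000.0, '800G': 1750000.0}
```

The linear ratio mirrors the report's logic that training-side module demand scales directly with GPU shipments.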
Upstream of optical modules (optical chips): For passive optical chips such as AWG and PLC, domestic manufacturers hold leading global market shares. Active optical chips, represented by laser chips (EEL, VCSEL, DFB), detector chips and modulator chips, are important cornerstones of modern optical technology and a key part of active optical devices.
Liquid cooling: The power density of GPU servers used for AI large-model training and inference rises sharply. Taking the NVIDIA DGX A100 server as an example, its maximum power can reach about 6.5kW, far exceeding the roughly 500W of a single ordinary CPU server.
Data show that a single cabinet in an air-cooled data center generally supports only 8kW-10kW, whereas a single cabinet in a liquid-cooled data center can usually dissipate more than 30kW and can evolve toward 100kW and beyond. At the same time, the "Eastern Data, Western Computing" initiative sets explicit PUE (total data center energy consumption / IT equipment energy consumption) requirements, with stricter targets at hub nodes. Given the overall planning, more new cabinets will be located at hub nodes in the future, where air cooling may not strictly meet requirements in some areas, so penetration of liquid cooling solutions is expected to accelerate. Driven by AI computing power demand, server manufacturers such as Inspur Information and ZTE have begun to deploy liquid-cooled server products vigorously. As liquid cooling penetrates faster, data center temperature-control manufacturers and liquid cold plate manufacturers are expected to benefit.
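The PUE metric cited above is a simple ratio; a minimal sketch, with illustrative energy figures that are assumptions rather than figures from the report:

```python
def pue(total_kwh, it_kwh):
    """PUE = total data center energy consumption / IT equipment energy consumption."""
    return total_kwh / it_kwh

# A facility drawing 1.5 kWh in total per 1.0 kWh of IT load has PUE 1.5;
# lowering cooling overhead (e.g. via liquid cooling) pushes PUE toward 1.0,
# which is what the stricter hub-node targets effectively demand.
print(pue(1.5, 1.0))  # 1.5
print(pue(1.2, 1.0))  # 1.2
```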
AI servers: performance is expected to be released gradually in Q2-Q3 this year. Specifically, GPUs account for more than 70% of the cost of a training AI server, with the CPU, storage, memory and other components making up a relatively small share; average prices often exceed one million yuan. For inference servers, GPUs account for roughly 20-30% of cost, the overall cost structure is similar to that of high-performance servers, and prices are typically 200,000-300,000 yuan. According to IDC data, the global AI server market reached $20.2 billion in 2022, up 29.8% year-on-year, accounting for 16.4% of the overall server market, up 1.2pct year-on-year.
The bank believes the global AI server market will maintain rapid growth over the next three years, reaching US$39.5/89.0/160.1 billion in 2023-2025, corresponding to growth of 96%/125%/80%. According to IDC, China's AI server market reached US$6.7 billion in 2022, up 24% year-on-year. Combining its global forecast with the assumption that mainland China's share will keep rising, the bank expects the mainland AI server market to reach US$13.4/30.7/56.1 billion in 2023-2025, up 101%/128%/83% year-on-year. On the competitive landscape, given that AI server R&D and investment require ample capital and technical support, competition in the domestic market is expected to keep concentrating among leading players, with the strong staying strong.
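The forecasts above follow from compounding the stated year-on-year growth rates off the 2022 bases; a sketch that reproduces them (small rounding differences versus the report's published figures are expected):

```python
def project(base_usd_bn, growth_rates):
    """Compound a base market size (USD bn) through successive YoY growth rates."""
    sizes, size = [], base_usd_bn
    for g in growth_rates:
        size *= 1 + g
        sizes.append(round(size, 1))
    return sizes

# Global: 2022 base $20.2bn, growth 96%/125%/80% -> report: 39.5/89.0/160.1
print(project(20.2, [0.96, 1.25, 0.80]))  # [39.6, 89.1, 160.3]
# China: 2022 base $6.7bn, growth 101%/128%/83% -> report: 13.4/30.7/56.1
print(project(6.7, [1.01, 1.28, 0.83]))   # [13.5, 30.7, 56.2]
```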
IDC: Against the backdrop of Digital China and artificial intelligence driving a recovery in the cloud computing market, IDC, as a key link in the cloud infrastructure industry chain, is also expected to enter a stage of demand release. Over the past two and a half years, cloud computing demand declined due to multiple factors while IDC construction and supply did not slow significantly: roughly 1.2 million and 1.5 million new cabinets were added in 2021 and 2022 respectively, creating a short-term supply-demand imbalance (supply and demand in core regions remain relatively healthy) and mediocre cabinet utilization rates in some regions. IDC operators' 2022 results were therefore generally under pressure.
At present, the bank believes the domestic IDC industry is poised for marginal improvement. With a recovering macro economy, a rebound in the platform economy and the pull of AI, IDC demand is expected to be released gradually, and new supply in 2023 is expected to fall from 2022 levels (for example, the three major telecom operators added 156,000 IDC cabinets in 2022 and plan to add 114,000 in 2023). Looking ahead, telecom operators will continue to grow their cloud computing businesses rapidly, and Internet companies such as Baidu and ByteDance are expected to make breakthroughs in AIGC, generating substantial new demand for cloud infrastructure including IDC; related IDC vendors are expected to benefit.
Risk warnings: the domestic substitution process may fall short of expectations, as the domestic replacement of GPUs faces many difficulties; AI technology progress may fall short of expectations, since the current rapid progress of AI technology drives huge demand for AI computing power, and slower-than-expected progress could adversely affect overall GPU market demand; Internet vendors' capital expenditures may fall short of expectations; etc.