
Artificial Intelligence Industry In-Depth Report: A Panorama of the AI Computing Power Industry Chain in the Era of Computing Power

Author: Think Tank of the Future

(Report produced by: CSC Securities)

First, AI is expected to significantly drive investment in computing power infrastructure

1.1 The popularity of ChatGPT has drawn widespread attention to the development of artificial intelligence

Artificial intelligence (AI) refers to intelligence displayed by machines: computers use big data to simulate functions of the human brain such as reasoning, visual recognition, semantic understanding, learning, planning, and decision-making. Artificial Intelligence Generated Content (AIGC) refers to content produced with AI technology, including painting, composition, editing, and writing. AIGC can be traced back to the 1950s and gradually moved from experimental to practical use in the 1990s, but algorithmic bottlenecks prevented it from generating content directly. From the 2010s onward, with the introduction and iteration of deep learning algorithms represented by generative adversarial networks (GANs), AIGC entered a stage of rapid development.

Market demand has accelerated the adoption of AIGC technology. 1) Reducing labor and time costs: AIGC can take over a large amount of tedious work, saving human capital and working hours and producing more content in the same amount of time. 2) Improving content quality: AIGC is regarded as a new mode of content production after professionally generated content (PGC) and user-generated content (UGC). Although PGC and UGC are more diverse and personalized, supply falls short of demand due to incentive structures and the limits of individual creators. 3) Promoting industrial digitalization and supporting the digital economy: industrial digitalization, the application of digital technology to traditional industries to raise output and efficiency, is a core component of the digital economy, and AIGC provides important data elements for it.

The popularity of ChatGPT has drawn widespread attention to the development of artificial intelligence. On November 30, 2022, OpenAI released the language model ChatGPT. The model interacts with users in dialogue form, answering follow-up questions, admitting mistakes, challenging incorrect premises, and rejecting inappropriate requests. ChatGPT not only performs strongly in daily conversation, professional question answering, information retrieval, content continuation, and literary and musical creation, but can also generate code, debug code, and write comments for code.

1.2 Artificial intelligence requires strong computing power

Artificial intelligence applications represented by ChatGPT require strong computing power behind their operation. GPT, released by OpenAI in 2018, had 117 million parameters and about 5GB of pre-training data, while GPT-3 has 175 billion parameters and 45TB of pre-training data. In the model training phase, ChatGPT consumed a total of about 3,640 PF-days of computing power at a total training cost of about 12 million US dollars, and the service (inference) phase consumes even more. A rough cross-check of that figure is sketched below.
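As an illustrative cross-check of the figures above (a sketch under our own assumptions, not OpenAI's published accounting), the 3,640 PF-days can be converted into total floating-point operations and compared with the common "6 x parameters x tokens" rule of thumb, assuming roughly 300 billion training tokens for GPT-3.

```python
# Back-of-envelope check of the ~3,640 PF-days training compute figure cited above.
# Assumption (ours, for illustration): 1 PF-day = 1e15 FLOP/s sustained for 24 hours.

PFLOPS = 1e15                      # 1 petaFLOP/s
SECONDS_PER_DAY = 24 * 3600

total_flops = 3640 * PFLOPS * SECONDS_PER_DAY    # ~3.1e23 FLOPs

# Cross-check against the 6 * N * D rule of thumb discussed in Section 2.1.3,
# with N = 175e9 parameters and D = 300e9 training tokens (assumed token count).
n_params, n_tokens = 175e9, 300e9
rule_of_thumb = 6 * n_params * n_tokens          # ~3.15e23 FLOPs

print(f"3640 PF-days        = {total_flops:.2e} FLOPs")
print(f"6*N*D rule of thumb = {rule_of_thumb:.2e} FLOPs")
```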


According to IDC, global AI IT investment was $92.95 billion in 2021 and is expected to grow to $301.43 billion in 2026, a compound annual growth rate of about 26.5%. AI investment in the Chinese market is expected to reach $26.69 billion in 2026, about 8.9% of global investment and second in the world, with a compound annual growth rate of about 21.7%. Over the next five years, hardware will become the largest segment of China's AI market, accounting for more than 50% of total AI investment. IDC predicts that China's IT investment in the AI hardware market will exceed $15 billion by 2026, approaching the size of the US AI hardware market, with a five-year CAGR of 16.5%. Servers, as the main component of the hardware market, are expected to account for more than 80% of total investment.

The development of artificial intelligence places higher requirements on computing power, and demand for computing power network infrastructure is expected to keep rising. According to the China Academy of Information and Communications Technology (CAICT), the total computing power of global computing devices reached 615 EFlops (exaFLOPS, 10^18 floating point operations per second) in 2021, up 44% year on year, of which basic computing power accounted for 369 EFlops, intelligent computing power 232 EFlops, and supercomputing power 14 EFlops; global computing power is expected to reach 56 ZFlops by 2030, an average annual growth rate of 65%. Intelligent computing power in mainland China continues to grow rapidly and surpassed general-purpose computing power in 2021. According to CAICT, the total computing power of computing equipment in mainland China reached 202 EFlops, about 33% of the global total, maintaining growth of more than 50% and outpacing the global rate; intelligent computing power grew fastest, at 85%, and now accounts for more than 50% of the mainland's computing power.

1.3 The AI computing power industry chain involves many links, and industry demand is expected to increase across the board

The AI computing power industry chain involves many links, including AI chips and servers, switches and optical modules, IDC facilities, and their upstream supply chains. As training and inference requirements grow, demand for AI chips and servers will rise first; AI computing power greatly increases data traffic inside the data center, so optical module speeds and volumes rise significantly, and switch port counts and port rates increase accordingly. IDC is also expected to enter a demand release stage, the penetration of liquid cooling is expected to rise rapidly, and underwater data centers may reach a key point of industrialization.

1. The demand for AI chips and servers will be the first to increase

According to our estimates, peak computing power demand for global large model training will grow at a compound annual rate of 78.0% from 2023 to 2027. In 2023, the computing power required for global large model training, converted into A100 equivalents, will exceed 2 million units. On the cloud inference side, peak computing power demand for global large model inference will grow at a compound annual rate of 113% from 2023 to 2027; if edge AI inference applications are included, inference-side computing power will expand further.

2. AI computing power changes the internal network architecture of the data center, and the speed and demand of optical modules and switches increase

In AI data centers, because internal data traffic is very large, a non-blocking fat-tree network architecture has become an important requirement, optical module rates and volumes rise significantly, and switch port counts and port rates increase accordingly. 800G optical modules began shipping in small volumes at the end of 2022; demand in 2023 will come mainly from NVIDIA and Google, large-scale shipments are expected in 2024, and the timeline may be pulled forward. Looking at the switch's electrical interface, the per-lane SerDes rate doubles roughly every four years, the lane count doubles roughly every two years, and switch chip bandwidth doubles roughly every two years; on the optical side, optical modules are upgraded roughly every four years, and actual shipments lag the release of new-generation SerDes and switch chips (a simple projection of this cadence is sketched below). 2019 marked the upgrade point for 100G optical modules, when the market split into 200G and 400G upgrade paths. As of 2023, however, the next generation of high-speed optical modules points to 800G; with the computing power and model race triggered by AIGC, we expect major North American cloud vendors and related technology giants to purchase 800G optical modules in large quantities in 2024, possibly pulling some purchases into 2023.
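A minimal sketch of that cadence, assuming illustrative 2019 baselines of 50 Gb/s per SerDes lane and a 12.8 Tb/s switch chip (our assumptions for illustration, not vendor roadmaps):

```python
# Project the doubling cadence described above: SerDes lane rate doubles roughly
# every four years, switch chip bandwidth roughly every two years.

def doubled(value, years, doubling_period):
    """Project a quantity that doubles every `doubling_period` years."""
    return value * 2 ** (years / doubling_period)

base_year = 2019
serdes_gbps = 50        # assumed per-lane SerDes rate in the base year
switch_tbps = 12.8      # assumed switch chip bandwidth in the base year (Tb/s)

for year in (2019, 2021, 2023, 2025):
    dt = year - base_year
    print(year,
          f"SerDes ~{doubled(serdes_gbps, dt, 4):.0f} Gb/s/lane,",
          f"switch ~{doubled(switch_tbps, dt, 2):.1f} Tb/s")
```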

3. IDC demand is expected to be released, and the high power density of AI servers may push up the penetration rate of liquid cooling

As a key link in the computing power infrastructure industry chain, IDC is also expected to enter a demand release stage. Over the past two and a half years, cloud computing demand declined for multiple reasons, but IDC construction and supply did not slow down significantly: about 1.2 million and 1.5 million new cabinets were added in 2021 and 2022 respectively, creating a short-term supply-demand imbalance (supply and demand in core regions remain relatively healthy) and mediocre utilization in some regions, so IDC operators' 2022 results were generally under pressure. With the recovery of the platform economy and the rise of AI, IDC demand is expected to be gradually released, while new supply in 2023 is expected to fall from 2022 levels (for example, the three major operators added 156,000 IDC cabinets in 2022 and plan to add 114,000 in 2023). The power density of GPU servers used for AI large model training and inference is much higher: the NVIDIA DGX A100 server, for example, can draw up to about 6.5 kW, far above the roughly 500 W of an ordinary single CPU server (see the comparison sketched below). This means that, on the one hand, new ultra-high-power cabinets must be built; on the other hand, to reduce PUE, liquid-cooling penetration is expected to rise rapidly, and underwater data centers may also reach a key point of industrialization.
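To illustrate the density gap, a minimal sketch comparing how many ~0.5 kW CPU servers versus ~6.5 kW DGX A100-class servers fit within a few assumed per-rack power budgets (the rack limits are our illustrative assumptions):

```python
# Rough comparison of rack power budgets implied by the figures above:
# ~6.5 kW per DGX A100-class server versus ~0.5 kW per ordinary CPU server.

cpu_server_kw = 0.5
gpu_server_kw = 6.5

for rack_limit_kw in (6, 12, 20, 40):        # assumed per-rack power budgets
    cpu_fit = int(rack_limit_kw // cpu_server_kw)
    gpu_fit = int(rack_limit_kw // gpu_server_kw)
    print(f"{rack_limit_kw:>2} kW rack: ~{cpu_fit} CPU servers vs ~{gpu_fit} GPU servers")
```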

Second, the demand for AI chips is growing explosively

2.1 The large-scale application of AI puts forward all-round requirements for the performance and quantity of AI chips

Broadly speaking, any chip that can run AI algorithms is an AI chip. CPUs, GPUs, FPGAs, NPUs, and ASICs can all execute AI algorithms, but with vastly different efficiency. A CPU can perform complex mathematical calculations quickly, but its performance degrades when executing many tasks in parallel, and the industry broadly agrees that CPUs alone are not well suited to AI computing. Heterogeneous CPU+xPU solutions have become the standard for high-computing-power scenarios, and the GPU is the most widely used AI chip. The AI chip types currently recognized in the industry include GPUs, FPGAs, and NPUs. Because the CPU is responsible for managing the computer's hardware resources and running the operating system, it remains indispensable in modern computing systems; GPUs, FPGAs, and other chips act as accelerators for the CPU, so mainstream AI computing systems are heterogeneous CPU+xPU parallel systems. CPU+GPU is currently the most popular heterogeneous combination and the mainstream choice for HPC, graphics and image processing, and AI training/inference. IDC data show that GPUs held an 89% share of China's AI chip market in 2021.


2.1.1 After long-term iterative upgrades in performance and functionality, GPUs have become the most widely used AI chips

GPUs are designed for parallel computing and were originally built to accelerate graphics rendering. When NVIDIA released the GeForce 256 graphics chip in 1999, it coined the term GPU (Graphics Processing Unit), defining it as "a single-chip processor with integrated transform, lighting, triangle setup/clipping, and rendering engines that is capable of processing at least 10 million polygons per second." In terms of how compute resources are allocated, a CPU devotes a large share of its die to control units and caches, leaving a relatively small share for actual compute units, whereas a GPU uses a large number of arithmetic units and only a small number of control units and caches. This architecture enables massive parallel computing, especially for simple, compute-intensive tasks. GPUs improve overall computer performance by taking over compute-intensive work (such as rendering) from the CPU, which was also the GPU's early functional positioning.

GPU performance improvements and richer features have gradually met the needs of AI computing. The Fermi architecture, introduced by NVIDIA in 2010, was the first complete GPU computing architecture, and many of the concepts it introduced are still in use today. The Kepler architecture added double-precision (FP64) compute units in hardware and introduced GPU Direct technology, allowing GPUs to exchange data with other GPUs directly, bypassing the CPU and system memory. The Pascal architecture introduced the first generation of NVLink, and the Volta architecture introduced Tensor Cores, which are of great significance for accelerating AI computation. A brief review of NVIDIA's hardware evolution shows that basic upgrades such as process improvements and more compute cores keep driving performance higher, while the feature set of each architecture generation keeps expanding, adapting the GPU ever better to AI workloads.


Under balanced resource allocation, the more hardware units a GPU devotes to a lower-precision data type, the higher its throughput at that precision. To perform well across data types of different precision, NVIDIA generally balances the allocation of hardware units for each type. Because low-precision computation occupies fewer hardware resources, a given GPU contains more units for low-precision types and delivers correspondingly higher throughput. Taking the V100 as an example, each SM has twice as many FP32 units as FP64 units, and the V100's peak FP32 throughput (15.7 TFLOPS) is accordingly about twice its FP64 throughput (7.8 TFLOPS); the same pattern holds for the flagship P100, A100, and H100 of successive generations. The arithmetic behind these figures is sketched below.
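A back-of-envelope reconstruction of those V100 numbers, assuming the widely published configuration of 80 SMs at roughly a 1.53 GHz boost clock with 64 FP32 and 32 FP64 units per SM (one fused multiply-add counted as two FLOPs):

```python
# Peak throughput scales with the number of hardware units per SM, so the
# FP32:FP64 ratio of 2:1 in units yields roughly a 2:1 ratio in TFLOPS.

def peak_tflops(units_per_sm, sm_count, clock_ghz, flops_per_unit_per_clock=2):
    """Peak throughput = units x SMs x clock x FLOPs/clock (FMA counts as 2)."""
    return units_per_sm * sm_count * clock_ghz * flops_per_unit_per_clock / 1e3

# V100 (assumed published specs): 80 SMs, ~1.53 GHz boost, 64 FP32 / 32 FP64 units per SM.
print("FP32 ~", round(peak_tflops(64, 80, 1.53), 1), "TFLOPS")   # ~15.7
print("FP64 ~", round(peak_tflops(32, 80, 1.53), 1), "TFLOPS")   # ~7.8
```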

Tensor Cores keep iterating to improve acceleration. The Tensor Cores introduced with the Volta architecture significantly improved GPU AI throughput, and every subsequent architecture generation has upgraded them further while expanding the supported data types. From A100 to H100, Tensor Cores iterated from version 3.0 to 4.0: the H100's peak FP16 Tensor Core throughput is 3 times that of the A100, and the H100 Tensor Core also supports the new FP8 data type, with FP8 throughput 6 times the A100's FP16 Tensor Core throughput.

Data access governs computing power utilization. AI computing involves storing and processing large amounts of data; according to Cadence, an AI training server requires 6 times the memory capacity of a typical workload. Over the past few decades, processor speeds have kept pace with Moore's Law, while DRAM has improved far more slowly. DRAM performance has therefore become an important bottleneck for overall system performance, the so-called "memory wall". Beyond performance, memory's energy efficiency has also become a bottleneck: Cadence data show that memory accounts for 82% of the energy consumed in natural language AI workloads.


Improvements in hardware units and video memory have increased the computing power a single GPU can deliver. However, with the large-scale development and application of Transformer models, parameter counts have exploded: GPT-3 has 175 billion parameters, nearly 1,500 times GPT, and its pre-training data grew from 5GB to 45TB. This exponential growth makes GPU cluster computing necessary: (1) even the most advanced GPU can no longer hold the model parameters in its memory; (2) even if the model could fit on a single GPU (for example by swapping parameters between host and device memory), the sheer number of compute operations would make training times unrealistically long without parallelization. According to NVIDIA, training a 175-billion-parameter GPT-3 model on 8 V100 GPUs would take 36 years, versus about 7 months on 512 V100 GPUs (a scaling check is sketched below).
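A quick consistency check of those two data points, under the idealized assumption of near-linear scaling with GPU count:

```python
# 36 years on 8 V100s versus ~7 months on 512 V100s: consistent with ~64x scaling.

months_on_8 = 36 * 12            # wall-clock months when training on 8 GPUs
scaling_factor = 512 / 8         # 64x more GPUs
months_on_512 = months_on_8 / scaling_factor
print(f"Ideal linear scaling: ~{months_on_512:.1f} months on 512 V100s")   # ~6.8
```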

NVIDIA developed NVLink to solve GPU cluster communication. On the hardware side, stable, high-speed communication between GPUs is a prerequisite for cluster computing. The interconnect bandwidth of PCIe, the traditional interconnect in x86 servers, is determined by its generation and lane count; for example, x16 PCIe 4.0 offers only 64 GB/s of bidirectional bandwidth. In addition, GPU-to-GPU traffic over PCIe competes with CPU operations on the bus, further reducing the available bandwidth. To break through PCIe's bandwidth limits, NVIDIA equipped the P100 with NVLink, the first high-speed GPU interconnect technology (a bus and communication protocol), so that GPUs no longer need to communicate with each other over PCIe.

NVIDIA developed NVSwitch, a chip based on NVLink, as the "hub" for data communication in GPU clusters. With NVLink 1.0, the 8 GPUs in a server could not all be directly interconnected, and as the number of GPUs grows, relying on NVLink alone requires a very large number of links. To solve these problems, NVIDIA released NVSwitch alongside NVLink 2.0, realizing full NVLink connectivity. NVSwitch is a GPU bridging chip that provides the NVLink crossbar needed to act as a "hub" between GPUs. With NVSwitch, every GPU can access every other GPU with the same latency and speed, and from the program's point of view all 16 GPUs appear as one GPU, maximizing system efficiency and greatly reducing the difficulty of optimizing multi-GPU systems.


Cluster-scale distributed computing is enabled by adding more NVSwitches to support more GPUs. NVLink networks also provide significant improvements when training large language models, and NVSwitch has become an integral part of high-performance computing (HPC) and AI training systems.

2.1.2 NPUs accelerate AI operations through special architecture designs

NPUs run artificial intelligence algorithms with high efficiency. Chips designed for the common applications and algorithms of a particular domain are often called Domain Specific Architecture (DSA) chips, and the NPU (neural network processing unit) is one of them, typically designed to accelerate neural network operations. Taking Huawei's Kirin 970 mobile SoC as an example, its NPU significantly accelerates image recognition neural networks, giving it image recognition speeds clearly better than contemporary competitors.

Many NPUs or chips with NPU modules are already available; well-known examples include Google's TPU, Huawei's Ascend, Tesla's FSD, and Tesla's Dojo. Vendors differ in the design of their compute cores, such as the systolic array in Google's TPU and the Da Vinci architecture in Huawei's Ascend. Taking Google's TPU and its systolic array compute core as an example, compare it with CPUs and GPUs: CPUs and GPUs are general-purpose, at the cost of the resource consumption caused by frequent memory accesses. Both are general-purpose processors that can support millions of different applications and pieces of software; for every computation in the ALU, the CPU or GPU must access registers or caches to read and store intermediate results. Because data access is much slower than data processing, frequent memory accesses limit overall throughput and consume a great deal of energy. The Google TPU is not a general-purpose processor; it is designed as a matrix processor specifically for neural network workloads. TPUs cannot run word processors, control rocket engines, or process banking transactions, but they can handle the massive multiply-and-add operations of neural networks at great speed while consuming less energy and occupying less physical space. Inside the TPU, a systolic array of multipliers and adders is laid out: the TPU loads parameters from memory into this matrix, and as each multiplication executes, its result is passed to the next multiplier while being summed. The output is thus the sum of all multiplications between data and parameters, and throughout this massive computation and data movement no memory access is needed at all. This is why TPUs achieve high computational throughput on neural network workloads with much lower power consumption and a smaller footprint. A toy behavioral sketch of this dataflow follows.
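The following toy sketch imitates that multiply-accumulate dataflow in plain Python; it is a behavioral illustration of the systolic-array idea, not a model of the actual TPU hardware:

```python
# Toy emulation of a systolic array's dataflow: weights are pre-loaded into the
# array, inputs stream through, and partial sums are passed from unit to unit
# instead of being written back to memory after every multiplication.

def systolic_matvec(weights, x):
    """Compute weights @ x by streaming x through rows of MAC units."""
    out = []
    for row in weights:                 # each row of MAC units holds its weights
        acc = 0
        for w, xi in zip(row, x):       # partial sums flow from unit to unit
            acc += w * xi               # multiply-accumulate, no memory round-trip
        out.append(acc)
    return out

W = [[1, 2], [3, 4]]
print(systolic_matvec(W, [5, 6]))       # [17, 39]
```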

NPUs are widely used to accelerate AI computing. In data centers, the representative large-scale NPU deployment is the TPU, which Google has used to build supercomputers for training specific neural networks. On the device side, phones, cars, smart security cameras and other equipment increasingly include AI computing functions, usually running trained neural network models for image processing and similar tasks; here the NPU's weaker generality matters less, while its advantages in computing power and energy efficiency are amplified, so it is widely adopted. In terminal devices, NPUs are usually integrated as modules within an SoC to accelerate AI operations; Tesla's FSD autonomous driving chip, for example, contains an NPU.


2.1.3 Training/inference and cloud/edge place different requirements on AI chips; inference-side computing power demand will far exceed training-side demand in the future

In practical applications, AI technology involves two stages: training and inference. Training means using big data to train a complex neural network model so that it can perform a specific function; it demands high computing performance, the ability to process massive amounts of data, and a degree of generality. Inference means using the trained neural network model to compute on new input data and quickly reach the correct conclusion.

By task, AI chips can be divided into training chips and inference chips: (1) training chips are used to build neural network models and require high computing power and a degree of generality; (2) inference chips run trained neural network models for prediction and emphasize comprehensive metrics, weighing computing power per unit of energy, latency, and cost. By deployment location, AI chips can be divided into cloud chips and edge chips: (1) cloud, i.e. the data center, emphasizes computing power, scalability, and compatibility, and cloud deployments include both training and inference chips; (2) edge, i.e. phones, security cameras and similar devices, emphasizes overall performance with low power consumption, low latency, and low cost, and edge AI chips focus on inference. The share of cloud inference is gradually rising as the number of deployed AI applications grows: according to IDC, as AI enters the critical period of large-scale deployment, inference accounted for 58.5% of cloud computing power in 2022 versus 41.5% for training, and by 2026 inference is expected to account for 62.2% and training 37.8%. The rising share of cloud inference indicates that more AI applications are landing and that AI models are gradually entering large-scale production.

Bandwidth and interconnect limitations make the A100 and H100 better suited than inference cards such as the T4 and A10 for hyperscale model inference in the cloud. Taking GPT-3 as an example, OpenAI data show that the model's 175 billion parameters correspond to more than 350GB of GPU memory. Assuming that memory requirements scale linearly with parameter count and that intermediate results during inference roughly double the footprint, inference for a trillion-parameter model requires about 4,000GB of video memory, i.e. 50 A100 (80GB) or 167 A10 (24GB) cards (the arithmetic is sketched below). A larger GPU count in the cluster means more complex interconnect requirements; the A10 cannot use NVLink and NVSwitch, so a large A10 cluster relies solely on PCIe communication, putting it at a clear interconnect bandwidth disadvantage versus cards such as the A100 and potentially hurting the timeliness of model inference.
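Restating that memory arithmetic as a short sketch (the roughly 2 bytes per parameter is inferred from the 350GB/175B example above, and the factor-of-two allowance for intermediates follows the same assumption as the text):

```python
import math

# ~350 GB of weights for 175B parameters implies ~2 bytes per parameter (FP16);
# scale linearly to 1T parameters and double for intermediate activations.
bytes_per_param = 350e9 / 175e9          # ~2 bytes, inferred from the GPT-3 figures
params = 1e12                            # 1 trillion parameters (assumed model size)
weights_gb = params * bytes_per_param / 1e9
total_gb = weights_gb * 2                # assume intermediates double the footprint

print(f"~{total_gb:.0f} GB ->",
      math.ceil(total_gb / 80), "x A100 80GB or",
      math.ceil(total_gb / 24), "x A10 24GB")   # ~4000 GB -> 50 A100 or 167 A10
```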

We estimate that AI large models will create huge demand for computing power/AI chips on both the training and inference sides; if large models are widely commercialized, inference-side demand will be significantly higher than training-side demand. Estimating cloud training demand: starting from (1) the model's parameter count, estimate total compute from (2) the number of tokens required to train the model and (3) the relationship between per-token training cost and parameter count, then derive total GPU demand from (4) single-GPU computing power and (5) the compute utilization of the GPU cluster. (1) Parameter scale: the parameter counts of large models have grown exponentially in recent years, with GPT-3 reaching 175 billion. GPT-4 is multimodal, and its parameter count will be larger than GPT-3's. In our calculations we assume the average parameter count of multimodal large models reaches 1,000 billion in 2023 and then grows 20% per year, while ordinary large models average 200 billion parameters and likewise grow 20% per year. (2) Tokens required for training: natural language models with hundreds of billions of parameters, such as GPT-3, Jurassic-1, Gopher, and MT-NLG, require on the order of hundreds of billions of tokens for training, and the token requirements of some multimodal models also rise with parameter count. We assume that training a multimodal large model requires on the order of a trillion tokens, with token count growing linearly with parameter count.

(3) Relationship between per-token training cost and parameter count: following the analysis in OpenAI's paper "Scaling Laws for Neural Language Models", the training cost of each token is roughly 6N, where N is the number of parameters of the LLM, and we adopt this relationship in our calculations. The underlying principle is that neural network training comprises forward propagation and backpropagation, roughly in four steps: 1. run a single inference and obtain output y (for example, an input cat picture yields an output of 0.986); 2. compute the difference between output y and the true target Y (if the target is set to Y = 1, the difference is 0.014); 3. propagate the difference backwards and compute the gradient of the difference with respect to each parameter; 4. update each neuron's parameters according to the difference and gradients, so that the output approaches the target value. Therefore, in a network with N parameters, one training example incurs roughly 6N operations in total: about 2N for the forward pass and about 4N for the backward pass. A minimal version of this estimate is sketched below.
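A minimal sketch of the training-side estimate, using the 6N-per-token rule and the 30% cluster utilization assumption cited below; the 1-trillion-parameter model size, roughly 1 trillion training tokens, the A100 FP16 Tensor Core peak of about 312 TFLOPS, and the 30-day training window are illustrative assumptions, not the report's exact inputs:

```python
# Training-side estimate: total compute ~= 6 * N * tokens, converted to a GPU
# count via sustained per-GPU throughput (peak x utilization) and training time.

def gpus_for_training(n_params, n_tokens, gpu_tflops, utilization=0.30, days=30):
    total_flops = 6 * n_params * n_tokens                 # forward (2N) + backward (4N)
    effective = gpu_tflops * 1e12 * utilization           # sustained FLOP/s per GPU
    return total_flops / (effective * days * 86400)       # 86400 seconds per day

# Example: 1T-parameter multimodal model, ~1e12 training tokens,
# A100 FP16 Tensor Core peak ~312 TFLOPS, 30% utilization, 30-day run.
print(round(gpus_for_training(1e12, 1e12, 312)))          # ~25,000 A100-class GPUs
```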


(4) Single-GPU computing power: large model training mainly relies on achievable mixed-precision FP16/FP32 FLOPS, i.e. FP16 Tensor Core throughput, so we use the corresponding figures for the A100 SXM and H100 SXM in our calculations. (5) GPU cluster compute utilization: following the analysis in Google Research's paper "PaLM: Scaling Language Modeling with Pathways", we assume compute utilization of about 30%.

Estimating cloud inference demand: for the cloud inference scenario we estimate both the computing power required for inference and the memory required to deploy the models. From the compute angle, based on the parameter-scale and model-count assumptions above, we estimate total inference compute from (1) the number of daily active users of the large model, (2) the average number of tokens queried per user, and (3) the relationship between per-token inference cost and parameter count, then derive total GPU demand from (4) single-GPU computing power and the cluster's compute utilization. (1) Daily active users: according to Similarweb, ChatGPT had about 13 million daily active users in January 2023. We assume multimodal large models average 20 million daily active users in 2023 and ordinary large models 10 million, with rapid growth thereafter. (2) Average tokens queried per user: according to OpenAI, 1,000 tokens correspond to roughly 750 words, and we assume each user queries an average of 1,000 tokens. (3) Per-token inference cost versus parameter count: following the analysis in OpenAI's "Scaling Laws for Neural Language Models", the inference cost of each token is roughly 2N, where N is the parameter count of the LLM, and we adopt this relationship. (4) Single-GPU computing power: since the models in our calculation have hundreds of billions to trillions of parameters, and considering memory capacity and bandwidth constraints in cluster computing, we assume the H100 or A100 is used as the cloud inference card. A minimal version of this estimate is sketched below.
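A companion sketch for the inference side, using the 2N-per-token rule; the daily-active-user and tokens-per-query figures follow the assumptions above, while the peak-to-average traffic ratio of 5 and the 30% utilization are our own illustrative assumptions:

```python
# Inference-side estimate: ~2 * N FLOPs per token, scaled by daily users and
# tokens per query, then converted to GPUs at assumed peak load.

def gpus_for_inference(n_params, daily_users, tokens_per_user,
                       gpu_tflops, utilization=0.30, peak_ratio=5):
    flops_per_day = 2 * n_params * daily_users * tokens_per_user
    avg_flops = flops_per_day / 86400                     # average FLOP/s over the day
    peak_flops = avg_flops * peak_ratio                   # traffic is not uniform
    return peak_flops / (gpu_tflops * 1e12 * utilization)

# Example: 1T-parameter model, 20M daily users, 1,000 tokens/query, A100-class GPU.
print(round(gpus_for_inference(1e12, 20e6, 1000, 312)))   # ~25,000 A100-class GPUs
```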

Under these assumptions, peak computing power demand for global large model cloud inference will grow at a compound annual rate of 113% from 2023 to 2027. From the memory angle, the reasoning is as follows. First, SK Hynix has developed the industry's first 12-layer 24GB HBM3; given the limited area of a GPU board, the number of HBM stacks that can be placed around the compute die is constrained, so GPU memory capacity has limited room to grow. Second, the key requirement of inference is timeliness, so the model must reside in video memory. Combining the limited HBM capacity per board with the need to hold the model in GPU memory, we first estimate (1) the memory needed to run one large model instance on the inference side, then assume the model's peak traffic in a business scenario to obtain (2) the total memory requirement, and finally derive the demand for computing power/AI chips. (1) Memory required to run one model: for the 175-billion-parameter GPT-3 model, OpenAI data show that parameter storage requires 350GB. Assuming the intermediate results of inference double this, inference requires at least 700GB of memory, i.e. nine A100 80GB cards to deploy one model instance. (2) Deployed instances and total memory in a business scenario: assume each instance can handle 100 concurrent requests, i.e. 9 A100 80GB cards serve 100 concurrent users. If peak concurrent users reach 20 million, then 20 million / 100 x 9 = 1.8 million A100 80GB cards are needed (restated in the sketch below).
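The same memory-based sizing, restated as arithmetic; all inputs are the assumptions listed above:

```python
import math

# One 175B-parameter instance needs ~700 GB of GPU memory (weights + assumed
# equal-sized intermediates), i.e. nine A100 80GB cards per deployed instance.
memory_per_instance_gb = 350 * 2
a100_per_instance = math.ceil(memory_per_instance_gb / 80)     # = 9

concurrent_users_per_instance = 100      # assumed concurrency per instance
peak_concurrent_users = 20e6             # assumed peak concurrent users

instances = peak_concurrent_users / concurrent_users_per_instance
print(int(instances * a100_per_instance))                      # 1,800,000 A100 80GB
```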

2.2 NVIDIA's leading position is solid, and domestic manufacturers are gradually catching up

Overseas leaders hold a near-monopoly, and the AI acceleration chip market shows a "one super, many strong" pattern. In the data center CPU market, Intel's share has declined but remains well ahead, with AMD continuing to take share. In the AI acceleration chip market, NVIDIA is the largest player in both training and inference thanks to its hardware advantages and software ecosystem. According to Liftr Insights, in the 2022 data center AI acceleration market NVIDIA held 82%, other overseas vendors such as AWS and Xilinx held 8% and 4% respectively, and AMD, Intel, and Google each held 2%. Domestic manufacturers started late but are gradually ramping up, and a number of breakthrough companies have emerged in some acceleration chip segments; however, most are small startups whose technical capabilities and ecosystems are still incomplete, leaving a large gap with overseas vendors in high-end AI chips. As the United States continues to tighten export restrictions on high-end chips to China, the localization of AI chips is expected to accelerate.


In the GPU market, overseas leaders hold a near-monopoly, and domestic manufacturers are racing to catch up. NVIDIA, AMD, and Intel currently dominate the global GPU chip market. Integrated GPUs, with lower performance and power consumption, are generally used in desktops and laptops, with Intel and AMD as the main suppliers; discrete GPUs, with higher performance and power consumption, are commonly used in servers, with NVIDIA and AMD as the main suppliers. By application scenario, NVIDIA and AMD hold the main share of the server GPU market used for artificial intelligence, scientific computing, and video encoding/decoding. According to JPR, in Q1 2023 NVIDIA's discrete graphics cards (including AIB partner cards) held an 84% market share, versus 12% for AMD and 4% for Intel.

Graphics rendering GPUs: NVIDIA has led the industry for decades, maintaining its lead through continuous technology iteration and ecosystem building. Since 2006, NVIDIA's GPU architecture has been updated roughly every two years, each generation delivering significant performance gains on top of a complete ecosystem, and the GeForce series has long held the leading market share. The latest GeForce RTX 40 series represents the current peak of graphics card performance: built on the new Ada Lovelace architecture and a TSMC 5nm-class process, it packs 76 billion transistors and 18,000 CUDA cores, about 70% more cores than the Ampere architecture, nearly triples the energy efficiency, and drives DLSS 3.0, with performance far exceeding the previous generation. AMD's discrete GPUs follow a clear iteration path under the RDNA architecture: RDNA 3 adopts a 5nm process and chiplet design, delivering 54% better performance per watt than RDNA 2, and RDNA 4 is expected to be officially released by 2024 on a more advanced process.

The gap between domestic manufacturers and foreign leaders in graphics rendering GPUs is narrowing. Xindong Technology's "Fenghua No. 2" GPU achieves a pixel fill rate of 48 GPixel/s, FP32 single-precision performance of 1.5 TFLOPS, and AI (INT8) performance of 12.5 TOPS, with measured power consumption of 4-15W and support for OpenGL 4.3, DX11, Vulkan and other APIs, marking a breakthrough for domestic graphics rendering GPUs. Jingjiawei still lags NVIDIA's same-generation products in process, core frequency, and floating-point performance, but the gap is gradually narrowing: its JM9 series graphics chip, released in 2023, supports OpenGL 4.0, HDMI 2.0 and other interfaces as well as H.265/4K 60-fps video decoding, runs at a core frequency of at least 1.5 GHz with 8GB of video memory, and delivers about 1.5 TFLOPS of floating-point performance, similar to NVIDIA's GeForce GTX 1050, with the goal of benchmarking against the GeForce GTX 1080.

GPGPU: NVIDIA and AMD are currently the global leaders in GPGPUs. NVIDIA's general-purpose computing chips combine excellent hardware design with a full-stack software layout built around CUDA, generalizing GPU parallel computing and extracting the full performance of the silicon; across downstream applications it has launched high-performance hardware/software combinations and gradually become the global leader in AI chips. According to the State of AI Report 2022, NVIDIA chips appear in AI academic papers far more frequently than other AI chips and are the most commonly used AI accelerators in academia; Oracle and Tencent Cloud also rely almost entirely on NVIDIA GPUs as compute accelerators. AMD released the Radeon Instinct GPU accelerator for data centers in 2018; the Instinct series is based on the CDNA architecture (the MI250X, for example, uses CDNA 2), achieving significant improvements in compute and interconnect for general-purpose computing, and AMD has also launched the ROCm open-source software development platform to benchmark against NVIDIA's CUDA ecosystem. NVIDIA's H100 and A100 and AMD's MI100 and MI200 series are the most mainstream GPGPU products.

In the ASIC market, the custom nature of the product makes the competitive landscape relatively fragmented, but ASICs also have a place in artificial intelligence. Google is at the technical forefront: since 2016 it has launched ASICs customized for machine learning, the tensor processing units (TPUs), and it recently disclosed details of TPU v4, its chip for training AI models. TPU v4 uses low-precision computation to greatly reduce power consumption and speed up computation without hurting deep learning accuracy, and uses systolic arrays and related designs to optimize matrix multiplication and convolution, maximizing data reuse in large matrix multiplications, reducing memory accesses, greatly accelerating Transformer training, and cutting training cost. Google claims that its TPU-based supercomputer is up to 1.7 times faster and 1.9 times more energy efficient than a comparably sized system based on the NVIDIA A100. Google TPUs are custom ASICs tailored to neural networks and the TensorFlow framework, and they achieve their highest efficiency only when used within such specific frameworks.

The ecosystem determines the user experience and is the deepest moat for computing power chip vendors. Although NVIDIA's GPU hardware itself delivers excellent computing power, its strong CUDA software ecosystem is the key force behind the popularity of its GPU computing platform. Technically, the performance threshold of GPU hardware is not prohibitive and the gap can be closed through product iteration, but downstream customers care more about the ecosystem: whether the chip can be used, and used well. Before CUDA, GPU programming required machine-level code deep in the graphics core; CUDA wrapped that complexity in a simple interface for developers and has since grown into the most developed and extensive ecosystem, currently the GPU architecture best suited to deep learning and AI training. Since its launch in 2007, NVIDIA has continuously improved and updated CUDA, derived various toolkits and software environments, built a complete ecosystem, and worked with many customers on domain-specific acceleration libraries and AI training models, accumulating some 300 acceleration libraries and 400 AI models. After deep learning became mainstream in particular, NVIDIA improved performance through targeted optimization, such as supporting mixed-precision training and inference, adding Tensor Cores to the GPU to improve convolution throughput, and introducing the Transformer Engine in the H100 to boost related models. These investments include co-design of software and silicon architecture, allowing NVIDIA to stay ahead in performance at minimal cost. Even the ROCm platform of AMD, NVIDIA's biggest competitor, still trails in user ecosystem and performance optimization. As a complete GPU solution, CUDA provides a direct interface to the hardware and greatly lowers the development threshold; this easy-to-use software ecosystem that fully exploits the chip architecture gives NVIDIA enormous influence in the large model community. Because CUDA has a mature, well-performing underlying software stack, almost all deep learning training and inference frameworks treat support and optimization for NVIDIA GPUs as a necessary goal, helping NVIDIA stay in the lead.

NVIDIA's leading position is solid. NVIDIA will remain in the lead thanks to strong hardware performance and the mature CUDA ecosystem, but later-starting challengers are catching up, and a more diversified competitive landscape is likely to emerge. In training, NVIDIA's high-computing-power GPUs are the mainstream choice; Google's TPU faces generality limitations and AMD faces an ecosystem gap, but under pressure from both, plus cloud vendors' self-developed chips, the training market's structure may still shift. In inference, GPUs enjoy good ecosystem continuity and remain dominant: NVIDIA's Tesla T4, aimed at the inference market, contains 2,560 CUDA cores and delivers 0.25 TFLOPS FP64, 8.1 TFLOPS FP32, and up to 130 TOPS INT8, providing multi-precision inference performance and low-latency, high-throughput serving up to 40 times better than a CPU, satisfying more requests in real time. However, other solutions have advantages in cost and power consumption, competition in specific segments is relatively fierce, and different workloads have different chip performance requirements, so the inference market is expected to see multiple chip types coexist.

Domestic computing power chip manufacturers have a good window to enter the market. Domestic demand for computing power chips is huge, the domestic artificial intelligence ecosystem is favorable, and China's pace in AI applications is at the global forefront, giving domestic GPU makers fertile soil for incubation and growth; domestic customers' desire for supply chain diversification creates an adaptation window for domestic AI chip vendors, and the current early stage of large model development is a golden window for adaptation. Among domestic players such as Cambricon and Huawei, CUDA compatibility and self-built ecosystems are the two main development paths, and both carry significant competitive potential. In the short term, CUDA compatibility reduces development and migration difficulty and enables rapid customer adoption, while vendors must avoid NVIDIA's areas of absolute strength and differentiate in chip design. In the long run, if domestic GPUs depend entirely on the CUDA ecosystem, their hardware roadmaps will be bound to NVIDIA's; domestic vendors should instead learn from AMD and Google, build their own ecosystems, pursue an integrated hardware and software platform layout, and develop the ability to quickly deliver vertical solutions in different fields to forge their own ecosystem barriers. Domestic manufacturers with efficient hardware and the ability to build ecosystems that meet downstream needs are expected to stand out.


2.3 Advanced packaging has become a cost-effective alternative, and the application potential of storage and computing integration is huge

2.3.1 Advanced packaging: innovative direction in the post-Moore's Law era, cost-effective alternative to advanced processes

High-computing-power chips need continuous performance improvement, and cost-effective solutions are urgently needed in the post-Moore era. As large model parameter counts grow, the computing power demanded of AI large models has risen sharply, and the performance improvement of high-computing-power chips such as GPUs faces two major bottlenecks. On the one hand, Moore's Law has gradually broken down beyond 28nm and the cost of advanced processes is rising quickly: according to IBS, below the 28nm node the manufacturing cost per million transistors rises rather than falls as the node shrinks further. Moreover, R&D spending for chips on advanced processes has increased dramatically; R&D for a 5nm chip has risen to about 542 million US dollars, nearly 10.6 times that of a 28nm chip, and this high R&D threshold further narrows the range of applications for advanced processes. On the other hand, memory bandwidth grows slowly and limits processor performance: in traditional PCB-level packaging, trace density and signal rates are hard to increase, so memory bandwidth grows much more slowly than processor logic, producing the "memory wall" problem.

Realizing heterogeneous integrated chiplet packages requires a series of advanced packaging processes such as 2D/2.1D/2.3D/2.5D/3D. The different tiers of advanced packaging are distinguished by the physical structure and electrical connections of the stacked chips: in a 2D package the chips are attached directly to the substrate, while the other tiers interconnect through different forms of interposer. Among them, 2.5D packaging is commonly used to interconnect the compute die and HBM within a package, while 3D packaging is commonly used for the multilayer stacking of HBM memory and is expected to be used for heterogeneous integration of different ICs.


1) CoWoS: an important 2.5D packaging solution that interconnects the compute die and HBM within a package

The compute die and HBM are interconnected through 2.5D packaging, and TSMC's CoWoS is the most widely used solution. TSMC introduced CoWoS in 2011 and first applied it to Xilinx FPGAs in 2012. Since then Huawei HiSilicon, NVIDIA, Google and other manufacturers have adopted CoWoS, for example in the GP100 (the P100's GPU die) and TPU 2.0. Today CoWoS is the most widely used 2.5D packaging technology in HPC and AI computing, and the vast majority of high-performance chips using HBM, including most startups' AI training chips, use CoWoS.

CoWoS-S integrates advanced SoCs and HBM on a silicon interposer and is widely used to package computing power chips such as GPUs. CoWoS-S is a high-performance subsystem that combines high-bandwidth memory (HBM) modules with a large SoC, connecting the HBM and the SoC through a silicon interposer to achieve wide-band memory access. First developed in 2011, CoWoS-S has gone through five generations. Initially the dies mounted on the interposer were multiple logic dies; the Xilinx high-end FPGA "7V2000T" built with this technology carried four FPGA logic dies in a CoWoS-S package. The third generation began to support mixing logic and memory. The fifth-generation CoWoS-S uses a completely new TSV solution and thicker copper connections, supports 20 times as many transistors as the third generation, and expands the silicon interposer to about 2,500 mm2, roughly three times the reticle area, with room for 8 HBM2E stacks and capacities up to 128 GB. The sixth generation, expected in 2023, is planned to pack 2 compute dies and up to 12 HBM stacks on one substrate.

CoWoS has helped TSMC win orders for high-performance computing chips from NVIDIA, AMD and others. According to DigiTimes, Microsoft has approached TSMC and its ecosystem partners to discuss using CoWoS packaging for its own AI chips. NVIDIA's high-end GPUs all use CoWoS to bring the GPU die and HBM together: the Tesla P100 delivered more than three times the memory performance of the Maxwell architecture by combining third-generation CoWoS with HBM2, tightly integrating compute and data in the same package; the V100 and A100 likewise use TSMC's CoWoS package with 32GB of HBM2 and 40GB of HBM2E respectively; and the Hopper-architecture H100 also adopts CoWoS, with 80GB of HBM3 and an ultra-high 3.2TB/s of memory bandwidth. AMD is also returning to CoWoS: according to DigiTimes, the AMD MI200 was originally packaged by ASE Group and its subsidiary Siliconware (SPIL) using FO-EB (fan-out embedded bridge) advanced packaging, while the new MI-series data center accelerator will return to TSMC's CoWoS; the Aldebaran-based MI250 uses fifth-generation CoWoS to enable ultra-high-performance configurations such as 128GB of HBM2E.

2) HBM: 3D packaging creates multi-layer stacked memory, breaking through capacity and bandwidth bottlenecks

HBM stacks multiple DRAM dies vertically using TSVs in a 3D package. In the post-Moore era, memory bandwidth constrains the effective bandwidth of the computing system and limits chip performance, and HBM emerged in response. Unlike traditional DRAM, HBM has a 3D structure: TSV technology stacks several DRAM dies into a cube, with thousands of tiny holes through each DRAM die connected to the dies above and below by vertical electrodes; below the DRAM stack sits a logic die that controls the DRAM. Technically, HBM takes DRAM from traditional 2D to 3D, making full use of space and reducing area, in line with the semiconductor industry's trend toward miniaturization and integration. HBM and silicon interconnect technologies break through memory capacity and bandwidth bottlenecks and are regarded as a next-generation DRAM solution. Compared with traditional packaging, TSV technology can reduce volume by 30% and energy consumption by 50%.

HBM greatly increases the number of data transmission lines compared with traditional memory. Memory bandwidth is the amount of data that can be transferred per unit of time, and the simplest way to raise bandwidth is to add more data lines. A typical DRAM chip has eight DQ (data input/output) pins; assembled into a DIMM module, there are 64 DQ pins in total. As system demands on DRAM capacity and processing speed increase, so does the amount of data to be transferred, and this number of DQ pins can no longer keep up. Thanks to system-in-package (SiP) and through-silicon via (TSV) technology, HBM provides up to 1,024 DQ pins while its footprint is more than 10 times smaller than standard DRAM. Traditional DRAM needs considerable board space to communicate with processors such as CPUs and GPUs, and must connect through wire bonding or PCB traces, so it cannot move large amounts of data in parallel; HBM, by contrast, communicates over very short distances with many more DQ paths, dramatically speeding up signal transfer between the stacked DRAM dies and enabling low-power, high-speed data transmission (see the bandwidth sketch below).
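To see why the pin count matters, a minimal bandwidth sketch; the 3.2 Gb/s per-pin rate is an illustrative assumption applied to both interfaces so that only the interface width differs:

```python
# Interface bandwidth ~= data pins x per-pin data rate / 8 (bits to bytes).

def bandwidth_gbps(dq_pins, pin_rate_gbps):
    return dq_pins * pin_rate_gbps / 8            # GB/s

print("64-pin DIMM-style interface @ 3.2 Gb/s/pin:",
      bandwidth_gbps(64, 3.2), "GB/s")            # ~25.6 GB/s
print("1,024-pin HBM stack         @ 3.2 Gb/s/pin:",
      bandwidth_gbps(1024, 3.2), "GB/s")          # ~409.6 GB/s
```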


HBM is becoming standard on AI server GPUs. AI servers must process large amounts of data in a short time, placing higher demands on bandwidth, and HBM is an important solution. The AI server GPU market is dominated by the NVIDIA H100, A100, and A800 and the AMD MI250 and MI250X series, essentially all of which are equipped with HBM. HBM has become the mainstream high-bandwidth solution in high-performance computing. Samples of SK Hynix's HBM3 passed NVIDIA's performance evaluation and began official supply to NVIDIA in June 2022, and the H100 NVL GPU for ChatGPT-class workloads announced at the 2023 GTC conference carries 188GB of HBM3 memory; Rambus's HBM3 implementation may tape out in 2023 for use in data centers, AI, HPC and other fields. IDC data show that in 2019 China's AI-accelerated servers carried up to 20 GPGPUs per machine, with a weighted average of about 8 per unit. With 80GB of HBM per GPU, the corresponding HBM value is roughly $800 per GPU.

SK hynix is a pioneer of HBM development and holds a leading position in both technology and market share. In 2014, SK hynix and AMD jointly developed the world's first HBM product. SK hynix's HBM3 entered mass production seven months after its announcement and is installed on the NVIDIA H100. According to BusinessKorea, SK hynix has captured a 60%-70% share of the HBM market. After SK hynix, Samsung and Micron launched their own HBM products, iterating to HBM3 and HBM2E respectively. Foundries, including TSMC and GlobalFoundries, are also working on HBM-related packaging technologies. With HBM3's improved performance, the market has broad room to grow: in bit terms, HBM currently accounts for only about 1.5% of the overall DRAM market, leaving significant headroom for penetration. The AI boom that has pushed GPUs and other AI chips to the fore has also greatly lifted demand for the new generation of memory, HBM (high bandwidth memory); reportedly, HBM orders at Samsung and SK hynix have grown rapidly since the beginning of 2023 and prices have risen. According to TrendForce, the HBM market is expected to grow at a CAGR of more than 40%-45% from 2023 to 2025, reaching about $2.5 billion by 2025.

3) 3D IC: Multi-chip vertical stacking enhances interconnection bandwidth, and has great potential for future development

3D IC refers to stacking multiple device layers on a single chip using fab processes, including stacking of multiple logic dies. Compared with 2.5D packaging, 3D IC packaging differs in the interconnect method: a 2.5D package connects the chips through a TSV interposer, whereas a 3D IC package stacks multiple chips vertically and interconnects them through direct bonding. In a 2.5D structure, two or more active semiconductor dies are placed side by side on a silicon interposer to achieve very high die-to-die interconnect density. In a 3D structure, active dies are integrated by die stacking to achieve the shortest interconnect and the smallest package size. The manufacturing flows also differ: 2.5D packaging requires fabricating a silicon interposer, involving complex steps such as lithography, while 3D IC packaging requires demanding steps such as direct bonding. Current mainstream 3D IC packaging offerings include TSMC's SoIC, Intel's Foveros and Samsung's X-Cube.


2.3.2 Storage-computing integration: solving the "storage wall" of the traditional von Neumann architecture, with great energy-efficiency potential

Storage-computing integration is expected to solve the "storage wall" of the traditional von Neumann architecture. Processor design has focused mainly on raising computing speed, while memory has prioritized capacity and cost; this performance mismatch between "storage" and "compute" leads to low memory bandwidth, long latency and high power consumption, commonly known as the "storage wall" and the "power wall". The more memory-access-intensive a workload is, the more severe the wall problem becomes and the harder it is to raise effective computing power. With the rapid rise of memory-intensive workloads represented by artificial intelligence, memory access latency and power overheads can no longer be ignored, and a transformation of the computing architecture is particularly urgent. Storage-computing integration, a new computing paradigm, merges the computing unit with the storage unit so that calculations are performed directly where data is stored; it is expected to solve the "storage wall" and "power wall" of the traditional von Neumann architecture and, with its huge potential for energy-efficiency improvement, to become an advanced technology of the artificial intelligence era. On the storage wall: slow data movement and its high energy cost are the key bottlenecks of high-speed computing. When data is fetched from memory outside the processing unit, the transfer time is often hundreds to thousands of times the computation time, and the wasted energy of the whole process is roughly 60%-90%, so energy efficiency is very low.
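The scale of the problem can be illustrated with commonly cited, order-of-magnitude energy figures for arithmetic versus data movement; the numbers below are illustrative assumptions, not measurements from the report.

```python
# Rough illustration of the "storage wall": for memory-bound workloads, moving data
# costs far more energy than computing on it. The per-operation energies below are
# commonly cited order-of-magnitude estimates, used here purely for illustration.

ENERGY_PJ = {
    "32-bit float multiply-add": 5.0,       # on-chip arithmetic: a few picojoules
    "32-bit SRAM (on-chip cache) read": 50.0,
    "32-bit DRAM (off-chip) read": 640.0,   # orders of magnitude above the compute itself
}

compute = ENERGY_PJ["32-bit float multiply-add"]
for op, pj in ENERGY_PJ.items():
    print(f"{op:36s}: {pj:7.1f} pJ  (~{pj / compute:5.0f}x a multiply-add)")

# If every operand has to come from off-chip DRAM, energy is dominated by data
# movement -- exactly the bottleneck storage-computing integration tries to remove
# by performing the calculation where the data already resides.
```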

PIM (processing in memory): uses through-silicon via (TSV) technology to insert computing units between the upper and lower banks of a memory stack. CIM (computing in memory): computing operations are performed by dedicated compute units located inside the memory chip or memory array, and the storage and computation can be implemented in analog or digital form. This route is generally used for scenarios with fixed algorithms. At present the main route is based on NOR flash, whose storage capacity is usually small, so pushing single-chip NOR-flash computing power beyond 1 TOPS is costly, whereas the industry's "large computing power" segment generally requires 20-100 TOPS or more. Other memories, including SRAM and RRAM, can be used to realize storage-computing integration at large computing power.
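As a purely conceptual illustration of the CIM idea (not any specific vendor's NOR-flash or RRAM design), the sketch below mimics how an analog crossbar computes a matrix-vector product in place: weights stay put as cell conductances, inputs arrive as voltages, and column currents accumulate the dot products, limited mainly by ADC precision.

```python
import numpy as np

# Conceptual sketch of analog compute-in-memory: weights are stored as crossbar cell
# conductances, inputs are applied as voltages, and each column current is the dot
# product of inputs with that column's conductances -- computed without moving weights.

rng = np.random.default_rng(0)

weights = rng.uniform(-1, 1, size=(4, 3))    # logical weight matrix (4 inputs x 3 outputs)
G_pos = np.clip(weights, 0, None)            # positive conductances...
G_neg = np.clip(-weights, 0, None)           # ...paired array encoding negative weights

x = rng.uniform(0, 1, size=4)                # input activations applied as voltages

# "In-memory" analog accumulation along each column, then a coarse ADC digitizes
# the column currents.
currents = x @ G_pos - x @ G_neg
adc_bits = 6
scale = np.abs(currents).max() / (2 ** (adc_bits - 1))
digitized = np.round(currents / scale) * scale

print("exact    y:", x @ weights)
print("crossbar y:", digitized)              # close to exact, limited by ADC precision
```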

Research institutes and leading manufacturers are actively positioning themselves, and the future market potential is large. Storage-computing integrated chips began to attract academic attention in 2011 and became a hot academic topic in 2016-2017, after which leading academics and industry players began commercial exploration. On the research side, Professor Xie Yuan's team at the University of California, Santa Barbara, has worked on realizing computation inside the emerging memory device ReRAM (resistive RAM), namely the PRIME architecture. Professor Liu Yongpan's and Professor Wang Yu's teams at Tsinghua University participated in the PRIME research and have demonstrated compute-in-memory neural networks on a 150nm process and on resistive memory arrays, cutting power consumption by a factor of about 20 and increasing speed by a factor of about 50. In addition, Tsinghua University and SK hynix have jointly established a research center for intelligent storage-computing chips, which will focus on storage-computing integration and near-memory processing technologies over the next five years. On the industrial side, Intel, Bosch, Micron, Lam Research, Applied Materials, Microsoft, Amazon and SoftBank have all invested in NOR-flash-based storage-computing chips. Among them, Intel released Optane SSDs based on off-chip storage technology to enable high-speed data transfer between CPU and storage, balancing price and performance for large-memory workloads such as advanced analytics and artificial intelligence. SK hynix announced its in-memory computing results, DRAM in-memory computing based on a GDDR interface, at this year's ISSCC and presented a sample of its first in-memory computing product, GDDR6-AiM. According to Qubit Think Tank, large-scale mass production of large-computing-power storage-computing integrated chips will be achieved around 2030, with application scenarios covering big-data retrieval, protein/gene analysis, data encryption and image processing. By 2030, the market for small and medium computing power chips based on storage-computing integration is expected to reach about 106.9 billion yuan and the market for large computing power chips about 6.7 billion yuan, for a total of about 113.6 billion yuan.


Third, the penetration rate of AI servers has increased rapidly

3.1 AI server is the most important hardware of computing infrastructure, and the main cost of training comes from GPU chips

3.1.1 The AI server adopts a heterogeneous architecture, and the mainstream structure is CPU+multiple GPUs

Whereas an ordinary server devotes the vast majority of its compute to the CPU, an AI server is a heterogeneous server: processors can be combined according to the application, generally a CPU plus multiple GPUs, but also CPU+TPU or CPU plus other accelerator cards. Compared with ordinary servers, AI servers excel at parallel computing and offer high bandwidth, superior performance and lower energy consumption per unit of work. Large-model pre-training emphasizes, on the one hand, understanding of textual context and, on the other, a large volume of vector and matrix computation, so AI servers with strong parallel computing are better suited to pre-training tasks. As an emerging pillar of the digital economy, artificial intelligence and general-purpose large models have driven substantial computing power demand, making the AI server one of the most important pieces of hardware in domestic computing infrastructure construction.

Heterogeneous servers with GPUs at the core will become mainstream. Comparing the internal architectures of CPU and GPU: a CPU uses a small number of powerful ALUs and devotes a large share of die area to control units and caches, giving it strong serial computing capability; a GPU uses a large number of simpler ALUs, allocates little area to control and cache, and has strong parallel computing capability. Because image recognition, visual effects processing, virtual reality and large-model training all involve massive amounts of simple repeated calculations and matrix operations, they are better handled by heterogeneous AI servers equipped with GPUs. As enterprises pursue intelligent transformation and general-purpose large models take off, GPU-centric heterogeneous AI servers will occupy an increasingly important position in computing infrastructure construction.
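The point about parallelism can be seen in a toy example: every element of a matrix product is an independent dot product, which is exactly the kind of work a sea of simple ALUs can execute concurrently. The sketch below is illustrative only; real GPU kernels also tile for memory locality.

```python
import numpy as np

# Why matrix workloads map well onto GPUs: every output element of C = A @ B is an
# independent dot product, so thousands of simple ALUs can compute them in parallel.

A = np.random.rand(256, 128)
B = np.random.rand(128, 64)

# Serial view: one element at a time (roughly what a single CPU core iterates over).
C_serial = np.empty((256, 64))
for i in range(256):
    for j in range(64):
        C_serial[i, j] = np.dot(A[i, :], B[:, j])   # each (i, j) is independent work

# Parallel view: the same 256 * 64 = 16,384 independent dot products expressed as one
# batched operation -- the form a GPU (or vectorized BLAS) executes concurrently.
C_parallel = A @ B

assert np.allclose(C_serial, C_parallel)
```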

3.1.2 AI server industry chain: upstream/downstream breakdown and cost structure

The upstream of the AI server industry chain consists mainly of component manufacturers. The core components, CPU and GPU, are supplied mainly by Intel, AMD and NVIDIA, with relatively few domestic suppliers; other components, including memory, SSDs, PCBs, optical modules and power supplies, have more domestic suppliers. The midstream comprises motherboard integrators and server manufacturers: integrators first assemble the many chips onto motherboards, and server manufacturers then build and sell complete machines; domestic companies currently hold an important position among server manufacturers. The downstream mainly includes Internet companies led by BAT, the three major operators (China Mobile, China Telecom and China Unicom), and many government and enterprise customers, concentrated in government, finance and healthcare, the sectors with the greatest need for AI customer service and related products.

The cost of a general-purpose server is dominated by CPU, storage and memory, whereas the cost composition of an AI server shifts because of its heterogeneous architecture built around multiple GPU chips. Specifically, training-oriented AI servers need stronger computing power to process huge volumes of data, and training chips are priced significantly higher than inference chips: GPUs account for more than 70% of the cost of a training AI server, with CPU, storage, memory and the rest accounting for relatively small shares. For inference-oriented servers, GPUs account for roughly 20%-30% of cost, and the overall cost composition is closer to that of a high-performance general-purpose server.
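For intuition, the arithmetic below shows how a >70% GPU cost share arises for a training server; the unit prices are assumptions chosen only to make the ratio concrete, not quotes from the report.

```python
# Illustrative cost split for a training AI server (all prices are assumptions for
# the arithmetic): 8 training GPUs at ~$10k each against a ~$110k system price puts
# the GPU share in the >70% range cited above.

gpu_unit_price = 10_000        # assumed price per training GPU (USD)
gpus_per_server = 8
system_price = 110_000         # assumed complete-system price (USD)

gpu_cost = gpu_unit_price * gpus_per_server
gpu_share = gpu_cost / system_price
other_share = 1 - gpu_share    # CPU, memory, storage, power, chassis, assembly...

print(f"GPU cost share  : {gpu_share:.0%}")   # ~73%
print(f"Other components: {other_share:.0%}")
```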


3.2 The AI server market is expected to maintain rapid growth, and order books are currently full

3.2.1 The global AI server market is expected to maintain rapid growth over the next three years

According to IDC, the global AI server market reached $20.2 billion in 2022, up 29.8% year on year, accounting for 16.4% of the overall server market, up 1.2pct year on year. We believe that with continued growth in data volume, more large-model participants and larger parameter counts per model, and ongoing digital transformation, the AI server market will keep growing rapidly. Combining the estimate in Section 2.1.3 (Exhibit 45) of the incremental AI chip demand brought by large language models, we expect global AI servers to grow rapidly from 2023 to 2025. Judging from enterprises' actual demand, although inference demand is stronger, procurement still leans toward training/inference-combined servers equipped with A100/A800 GPUs. Therefore, combining the cost breakdown of training and inference AI servers in Section 3.1.2, we assume incremental GPU demand converts into AI server value at a GPU cost share of about 70% over 2023-2025. In addition, the launch of new-generation chips including H100/H800 and iterative algorithm upgrades should improve overall efficiency, so the incremental AI server market may be slightly smaller than large-model demand alone would imply. On these assumptions, we expect the global AI server market to maintain rapid growth over the next three years, reaching about $39.5/89.0/160.1 billion in 2023/2024/2025, corresponding to growth of 96%/125%/80%. Since major downstream customers such as Internet companies tend to stock up in advance for potential future demand, growth in 2023 may exceed the forecast while growth in 2024 and 2025 may fall slightly short.
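The forecast arithmetic can be reproduced directly from the 2022 base and the stated growth rates:

```python
# Reproducing the global AI server forecast arithmetic: start from the 2022 base of
# $20.2bn and apply the stated growth rates of 96% / 125% / 80%.

base_2022_bn = 20.2
growth = {2023: 0.96, 2024: 1.25, 2025: 0.80}

size = base_2022_bn
for year, g in growth.items():
    size *= 1 + g
    print(f"{year}: ~${size:0.1f}bn  (+{g:.0%})")
# -> 2023 ~$39.6bn, 2024 ~$89.1bn, 2025 ~$160.3bn, matching the ~$39.5/89/160bn cited.
```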

3.2.2 China's AI server market is expected to maintain rapid growth over the next three years

According to IDC, China's AI server market reached $6.7 billion in 2022, up 24% year on year. GPU servers dominate, with an 89% share worth $6 billion, while non-GPU accelerated servers based on NPUs, ASICs and FPGAs accounted for an 11% share, growing 12% year on year to $700 million. Even before the large-model wave, driven by policies such as the digital economy and the "East Data, West Computing" project, China's AI computing power grew 68.2% year on year in 2021. According to the "2021-2022 Global Computing Power Index Assessment Report" jointly published by Inspur, IDC and Tsinghua University, China's AI computing power development leads the world, and its AI server spending ranks first globally. We believe that under the large-model wave, combined with data center and intelligent computing center construction driven by the digital economy and East Data, West Computing, mainland China's share of the AI server market, currently about one-third of the global total, is expected to increase further. Combining the global AI server market forecast with the assumption that the mainland's share keeps rising, we expect the mainland AI server market to reach about $13.4/30.7/56.1 billion in 2023/2024/2025, up 101%/128%/83% year on year. Since major downstream customers such as Internet companies tend to stock up for potential future demand, growth in 2023 may exceed the forecast while growth in 2024 and 2025 may fall slightly short.

3.2.3 AI server vendors currently have ample orders in hand, giving strong certainty to high growth in the AI server market

Since the large-model wave triggered by ChatGPT late last year, leading Internet companies at home and abroad have joined the AI computing power arms race and increased investment in AI computing resources. The boom in AI computing power has driven explosive growth in AI server demand, which shows up on the order books of AI server manufacturers. Inspur, which ranks first globally in AI server shipments, has noted that the AI server market has grown significantly since the first quarter and that customers' focus has shifted from price to whether demand can be met in time. According to Unigroup's replies on the investor interaction platform, its AI server orders increased sharply in the first quarter of this year, capacity is sufficient to meet market demand, and GPU servers optimized for GPT scenarios have been developed and are expected to be fully launched in the second quarter. A global ICT equipment leader reported in its latest financial results that its ISG (Infrastructure Solutions Group) revenue grew 56.2% year on year from January to March 2023 and 36.6% for the full fiscal year, mainly benefiting from the explosion of overseas AI server demand and rapid growth in its storage business; the company expects AI server revenue growth in the new fiscal year to be significantly faster than that of general-purpose servers, driving ISG revenue growth to exceed the market average by more than 20%. Zhongke Sugon has a deep presence across the computing power chain, including upstream chips, midstream server solutions and liquid cooling, and downstream computing power scheduling; the company has repeatedly stated on the investor interaction platform that it will provide general-purpose and intelligent computing products and services according to user needs, that sales of all product categories are trending upward as mainland computing power demand grows, and that demand for intelligent computing products is expected to rise gradually with the development of mainland AI technology and industry.

3.3 AI server market concentration is expected to increase, with domestic vendors showing a "one leader, many strong players" pattern

3.3.1 Global AI Server Competitive Landscape

According to IDC, in the global AI server market in the first half of 2022, Inspur, Dell, HPE, Lenovo and New H3C ranked in the top five with market shares of 15.1%, 14.1%, 7.7%, 5.6% and 4.7% respectively. The market structure is relatively fragmented, with the leading vendors' shares fairly close. In addition, because demand-side customers, mainly North American cloud providers, prefer the ODM model, the share held by white-box (non-brand) vendors is relatively high, at close to 50%.


3.3.2 China AI Server Competitive Landscape

According to IDC, in the mainland AI server market in 2022, Inspur Information, New H3C and Ningchang ranked in the top three with market shares of 47%, 11% and 9% respectively. The market shows a "one leader, many strong players" pattern: apart from Inspur, the other vendors' shares are relatively close. Because the leading domestic brand vendors already serve Internet customers through ODM-like models, the share held by pure ODM manufacturers is low.

3.3.3 Future evolution trend of AI server competitive landscape

In terms of AI server R&D and delivery, brand vendors and ODM/foundry vendors differ slightly in model and timeline: brands have a longer R&D cycle but faster delivery, while ODMs have a slightly shorter R&D cycle but somewhat longer delivery times. On May 29, NVIDIA's CEO delivered a keynote at Computex 2023 in Taipei, unveiling AI server prototypes built by Taiwanese ODM manufacturers for customer needs; these will be further customized according to customer requirements, and the path from customized development to delivery is expected to take several months. For brand (OEM) manufacturers such as Inspur, Lenovo and New H3C, the R&D cycle is relatively long, taking nearly a year of validation, with further verification for different customer configuration specifications. Mature products already validated by brand vendors, however, can be delivered faster than ODM products.

3.4 The global server market is expected to remain stable

3.4.1 General-purpose servers are still in the stage of destocking, and the global market size is expected to decline

According to a report released by research firm TrendForce on May 17, the outlook for server demand in 2023 is weak, and the forecast for global server shipments this year has been lowered again to 13.835 million units, down 2.85% year on year. TrendForce noted that the four major US hyperscalers, Google, Microsoft, Meta and Amazon, have successively cut server purchases; OEMs such as Dell and HPE also lowered their annual shipment estimates between February and April, by 15% and 12% year on year respectively; and factors such as the international situation and the macroeconomy further cloud the full-year demand outlook. In Q1 2023, due to seasonal weakness and end-customer inventory corrections, global server shipments fell 15.9% quarter on quarter. TrendForce is also less confident about a second-quarter recovery, as the usual peak season has not materialized, and forecasts quarter-on-quarter growth of only 9.23%. In addition, ESG considerations have led the four major US hyperscalers to extend server service lives, reducing procurement and controlling capital expenditure, another factor weighing on the server market. Destocking is expected to be completed in the second half of this year or the first half of next year; if destocking progresses more slowly than expected, the full-year server market forecast may be lowered further.

3.4.2 The proportion of AI server shipments has further increased, and its contribution to the overall shipment volume of the global server market has been limited

Since the end of last year, the popularity of AI applications such as ChatGPT has driven a surge in demand for AI servers, leaving NVIDIA chips in short supply. Cloud service providers at home and abroad, including Microsoft, Google, Meta, Tencent and Baidu, have actively increased investment in AI computing power. TrendForce estimates that AI server shipments will grow about 10% year on year in 2023, but because AI servers account for less than 10% of unit shipments, the impact on the overall market is limited, and total global server shipments are expected to be flat or slightly lower. In the domestic market, Internet companies and intelligent computing center construction have driven a surge in AI server demand, with related manufacturers' new orders in the first quarter up more than 40% year on year, and shipment value is expected to maintain rapid growth for the full year. Considering that general-purpose server demand is expected to recover in the second half, the general-purpose market should be flat or slightly up for the year; coupled with the rapid growth of AI servers, IDC forecasts that the overall server market can achieve growth of more than 10% for the full year.

Fourth, AI is driving the demand for high-rate optical modules

In traditional data centers, the network side mainly uses either the traditional three-tier tree architecture or the leaf-spine architecture. Early data centers generally adopted the three-tier structure of access, aggregation and core layers: the access layer connects compute nodes to top-of-rack switches, the aggregation layer interconnects access switches, and the core layer interconnects aggregation switches and provides connectivity to external networks. As east-west traffic inside the data center grows rapidly, the core and aggregation layers of the three-tier architecture carry more load, performance requirements rise, and equipment costs increase significantly. A flat leaf-spine architecture suited to east-west traffic therefore emerged: leaf switches connect directly to compute nodes, spine switches play the role of core switches, and ECMP dynamically selects among multiple paths. The leaf-spine architecture offers high bandwidth utilization, good scalability, predictable network latency and good security, and has been widely adopted in data centers.

In AI data centers, because internal data traffic is very large, a non-blocking fat-tree network architecture has become one of the key requirements. NVIDIA's AI data centers use a fat-tree architecture to achieve non-blocking operation. The basic idea of the fat-tree architecture is to build a large-scale non-blocking network out of a large number of lower-performance switches: for any communication pattern there is always a path whose bandwidth matches the bandwidth of the network card, and all switches in the architecture are identical. Fat-tree architectures are generally used in data centers with demanding network requirements, such as supercomputing centers and AI data centers.
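For readers who want to size such a fabric, the sketch below uses the textbook k-ary fat-tree construction (identical k-port switches, full bisection bandwidth); it is a generic topology calculation, not a description of NVIDIA's specific deployment.

```python
# Sizing sketch for the classic k-ary fat-tree: a non-blocking fabric built entirely
# from identical k-port switches. The figures are topology math, not vendor specs.

def fat_tree(k: int) -> dict:
    """Return host/switch counts for a k-ary fat-tree built from k-port switches."""
    assert k % 2 == 0, "k must be even"
    hosts = k ** 3 // 4                 # k pods x (k/2 edge switches) x (k/2 hosts each)
    edge = agg = k * (k // 2)           # per layer: k pods x k/2 switches
    core = (k // 2) ** 2
    return {"hosts": hosts, "edge": edge, "aggregation": agg, "core": core,
            "total_switches": edge + agg + core}

for k in (8, 16, 32, 64):
    t = fat_tree(k)
    print(f"k={k:2d}: {t['hosts']:6d} hosts, {t['total_switches']:5d} switches "
          f"({t['core']} core / {t['aggregation']} agg / {t['edge']} edge)")
```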

NVIDIA's A100 GPUs are mainly paired with 200G optical modules, while H100 GPUs can be paired with 400G or 800G optical modules: each A100 GPU uses one Mellanox HDR 200Gb/s InfiniBand NIC, and each H100 GPU uses one Mellanox NDR 400Gb/s InfiniBand NIC. In the H100 SuperPOD design, NVIDIA adopts 800G optical modules: on the optical side, one 800G module replaces two 400G modules, and on the electrical side eight SerDes channels are integrated, corresponding to the eight 100G lanes on the optical side. Under this design, the switch's port density increases and its physical size shrinks significantly.

NVLink bandwidth is far larger than PCIe bandwidth on the NIC side, so extending NVLink from GPU interconnect within a server to GPU interconnect between servers significantly increases system bandwidth. To interconnect GPUs across servers over the NVLink protocol, besides physical switches built on NVSwitch chips, physical links are needed between switches and servers, so optical modules become an important component, which will also greatly increase demand for 800G optical modules. Recently, NVIDIA founder and CEO Jensen Huang announced in his Computex 2023 keynote that the generative AI engine NVIDIA DGX GH200 is in mass production. GH200 boosts computing power with NVLink 4's 900GB/s of bandwidth; copper may be used inside the server, but we believe the servers are likely to be connected by fiber. For a single 256-chip GH200 cluster, each GH200 on the compute side corresponds to nine 800G optical modules; for multiple interconnected 256-chip GH200 clusters, each GH200 corresponds to twelve 800G optical modules.

Demand for training-side optical modules is strongly correlated with GPU shipments, while demand for inference-side optical modules is strongly correlated with data traffic. AI-driven optical module demand comes in two stages, training and inference. On the training side, the network architecture is mainly fat-tree, because large-model training places very high requirements on network performance and non-blocking operation is essential; for example, the Xingmai network Tencent uses for large-model training adopts a fat-tree architecture. We also believe most vendors will use InfiniBand, whose latency is much lower than Ethernet's, improving computing efficiency and shortening training time. Training-side optical module demand is strongly tied to the number of GPUs used, and the required module count can be derived from the GPU-to-module ratio in the fat-tree architecture: A100 pairs with 200G modules, and H100 with 400G or 800G modules. On the inference side, the network architecture is closer to the leaf-spine design of traditional cloud data centers and mainly carries the incremental data traffic brought by AI applications. Traditional cloud computing is mainly a ToB market with a limited number of users; if explosive image- or video-related AI applications emerge, the number of users and the traffic generated per user could both rise sharply, so total data traffic would surge. The computing power and traffic required for inference may therefore end up far exceeding training, providing strong support for demand for network equipment, including optical modules.
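A simple way to turn these ratios into module demand is sketched below; the 1:9 and 1:12 GPU-to-module ratios are the GH200 figures quoted above, and the cluster sizes are illustrative.

```python
# Demand sketch: converting a GPU deployment into 800G optical module counts using a
# GPU-to-module ratio. The 1:9 and 1:12 ratios for GH200 clusters come from the text
# above; any other ratio plugged in is the user's own assumption.

def optical_modules_needed(gpu_count: int, modules_per_gpu: float) -> int:
    """Estimated number of optical modules for a given GPU deployment."""
    return round(gpu_count * modules_per_gpu)

deployments = [
    ("single 256-GPU GH200 cluster",         256,  9),
    ("four interconnected GH200 clusters",  1024, 12),
]

for name, gpus, ratio in deployments:
    total = optical_modules_needed(gpus, ratio)
    print(f"{name:35s}: {gpus:5d} GPUs x {ratio:2d} -> {total:6,d} x 800G modules")
```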

Silicon photonics uses silicon or silicon-based materials (Si, SiO2, SiGe) as the substrate and leverages CMOS-compatible integrated-circuit processes to fabricate photonic and optoelectronic devices that generate, modulate and detect light; it is widely used in optical communications, optical sensing and high-performance computing. Silicon photonics has already achieved large-scale commercial use in datacom, and its share is expected to keep rising. With the rapid development of data centers, optical module demand has exploded and many manufacturers have begun to develop silicon photonic modules for data centers in earnest. The first stage was small-scale application of 40G silicon photonic modules, followed by large-scale application of Intel's and Luxtera's 100G silicon photonic modules; 400G silicon photonic modules are now in mass production, and 800G is in validation. Domestic silicon photonic module manufacturers are now strongly competitive: companies including Zhongji Xuchuang, Xinyisheng and Huagong Technology have self-developed silicon photonic chips, while companies such as Borche Technology cooperate closely with overseas silicon photonic chip leaders and are expected to make breakthroughs in the 800G optical module market.


CPO, or co-packaged optics, is a packaging technology that co-packages the optical engine (PIC) with the electrical engine (EIC). A CPO switch consists mainly of the switch chip, SerDes and optics; switch bandwidth has grown 80-fold over the past 10 years. The switch chip's bandwidth roughly doubles every two years; the number and rate of electrical-interface SerDes also keep rising, from 10Gb/s to 112Gb/s per lane and from 64 lanes to 512 lanes in the 51.2T era. As switch bandwidth grew from 640G to 51.2T, switch chip power consumption rose by a factor of 7.4 and power per SerDes lane by a factor of 2.84; combined with the increase in lane count, total SerDes power rose by a factor of 22.7. CPO, by contrast, can reduce power consumption (its core advantage), cost and size. CPO participants include cloud service providers, equipment vendors and chip vendors. CPO still faces many technical challenges. One is light-source power: the laser is a core component, and although an external light source offers flexible configuration, lasers are less efficient at high temperature, so when one source feeds multiple channels, the high power required comes with low efficiency and higher power consumption. Moreover, with optical engines packed tightly around the switch chip, effective heat dissipation, flexible replacement of a failed optical engine and the definition of new optical connectors all need better solutions. In addition, CPO integrates optical modules and switches, with significant implications for both industries; once product standards are set, coordinating the two supply chains will also be an important challenge.
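The 22.7x figure follows directly from the two factors quoted above:

```python
# SerDes power arithmetic behind the 51.2T switch example: per-lane power up 2.84x and
# lane count up from 64 to 512 multiply into the ~22.7x total increase.

per_lane_power_factor = 2.84
lane_count_factor = 512 / 64          # 8x more SerDes lanes in the 51.2T era

total_serdes_power_factor = per_lane_power_factor * lane_count_factor
print(f"Total SerDes power increase: ~{total_serdes_power_factor:.1f}x")   # ~22.7x
```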

We believe the current optical module rally can be compared with 2016-2018H1 and 2019H2-2020H1. The datacom optical module industry was in an up-cycle in 2016-2018H1, during which Zhongji Xuchuang's share price performed well; in 2018H2-2019H1, capital expenditure by global cloud computing and Internet giants entered an adjustment phase and the share price declined. Capex growth for the North American FAAM group (Facebook, Amazon, Alphabet, Microsoft) was 29.65%, 27.94% and 62.74% in 2016-2018; although full-year 2018 growth was still strong, growth slowed markedly from 2018Q3. After nearly three years (2016-2018H1) of boom, utilization of cloud providers' server and optical network infrastructure was not yet full, leaving a certain "inventory" of computing, storage and network capacity; superimposed on macroeconomic and Sino-US trade uncertainty, contraction of enterprise IT investment, slower enterprise cloud migration and growth pressure on Internet giants, capex growth slowed significantly, turning negative in 2019Q1.

Fifth, AI will drive the market demand for switches

AI changes data center network architecture, and with optical module speeds and counts rising significantly, switch port counts and port speeds increase accordingly. AIGC technology represented by ChatGPT relies on powerful AI models and massive data to produce high-quality content across many application scenarios and is expected to push AI into wider use. As one of the key supports of AIGC, computing power is a core factor in AI development and application. Beyond the strong demand for computing hardware such as CPUs/GPUs, the network side also requires greater bandwidth to match growing traffic. Compared with the traditional data center network architecture, the AI data center architecture requires more switch ports.

The training side is likely to use InfiniBand or similar low-latency network protocols, while the inference side is expected to use Ethernet switches. InfiniBand is an open-standard, high-bandwidth, low-latency, highly reliable network interconnect technology, and with the rise of AI it has become the preferred interconnect for GPU servers. Compared with Ethernet, InfiniBand has advantages in bandwidth, latency, reliability and networking approach; Ethernet, of course, offers better compatibility and lower cost and can be used across a wide range of scenarios and device types. AI training has stringent latency requirements, so the training side is likely to use InfiniBand, or RoCE (RDMA over Converged Ethernet), which can also achieve low latency. NVIDIA's NVLink bandwidth has also improved substantially, with NVLink 4 reaching 900GB/s bidirectional, giving it a strong advantage on the training side as well. On the inference side, we believe the network can follow the Ethernet used in cloud computing data centers.


SerDes power consumption within switches is rising sharply. With the power increase brought by higher per-lane SerDes bandwidth, combined with the growing number of lanes, total SerDes power in switches will rise significantly. The network's share of data center power is also increasing: according to Facebook's estimates, as internal data center traffic grows substantially, the network's share of power consumption rises markedly, from about 2% to about 20% in next-generation networks. The shorter the transmission distance, the lower the SerDes power: shortening the distance electrical signals travel between the switch and the optical module simplifies the SerDes function and reduces the required transmit power, thereby lowering SerDes power consumption.

Sixth, AI increases the demand for high-power IDC cabinets, and the liquid cooling penetration rate increases

6.1 "East Data and West Computing" coordinates the construction of the national computing power network, and the demand for cloud computing may pick up

In May 2021, the National Development and Reform Commission, the Cyberspace Administration of China, the Ministry of Industry and Information Technology and the National Energy Administration jointly issued the Implementation Plan for Computing Power Hubs of the National Integrated Big Data Center Collaborative Innovation System, which set out the layout of a national computing power network and national hub nodes, launched the "East Data, West Computing" project, and began building a national computing power network system. Centered on major national regional development strategies, and taking into account energy structure, industrial layout, market development and climate, the plan establishes national hub nodes of the integrated computing power network in Beijing-Tianjin-Hebei, the Yangtze River Delta, the Guangdong-Hong Kong-Macao Greater Bay Area, Chengdu-Chongqing, Guizhou, Inner Mongolia, Gansu and Ningxia, guides data centers toward intensive, large-scale and green development, and builds data center clusters. The national hub nodes will further open up network transmission channels, accelerate the implementation of East Data, West Computing, and improve cross-regional computing power scheduling.

According to the Implementation Plan, hub nodes such as Beijing-Tianjin-Hebei, the Yangtze River Delta, the Guangdong-Hong Kong-Macao Greater Bay Area and Chengdu-Chongqing have large user bases and strong application demand, so the focus there is to coordinate the layout of data centers within and around cities, optimize the supply structure, expand room for computing power growth, meet the needs of major regional development strategies, accelerate the transformation and upgrading of existing urban data centers, and give priority to workloads with high real-time requirements. Nodes such as Guizhou, Inner Mongolia, Gansu and Ningxia, which are rich in renewable energy, have suitable climates and strong potential for green data center development, should focus on improving computing power quality and utilization efficiency, leverage their resource advantages, strengthen basic network support, and actively take on nationwide non-real-time computing needs such as background processing, offline analysis, and storage and backup, building national bases for non-real-time computing power.

According to the NDRC, the overall approach of the East Data, West Computing project has three aspects: first, to promote moderate agglomeration and intensive development of data centers nationwide; second, to promote the east-to-west layout and coordinated development of data centers; and third, to advance East Data, West Computing through gradual iteration. In the current initial stage, 10 national data center clusters are planned within the 8 computing power hubs, with physical boundaries delineated and development targets such as green energy efficiency and rack utilization made explicit: the average rack utilization rate of data centers within a cluster must reach at least 65%, PUE must be below 1.25 for the Zhangjiakou, Shaoguan, Yangtze River Delta, Wuhu, Tianfu and Chongqing clusters, and below 1.2 for the Linger, Guian, Zhongwei and Qingyang clusters. We believe the 10 national data center clusters are mostly new-build projects, and that most of the energy consumption quotas previously granted by local governments, as well as relevant IDC companies' investment plans in other regions, are likely to continue to be implemented (existing IDC suppliers originally had few investments planned in these 10 regions), which should benefit the IDC construction industry chain.


6.2 Large-computing-power AI servers require high-power cabinets, and liquid cooling may become necessary

The power density of GPU servers used for AI large-model training and inference rises substantially: the NVIDIA DGX A100 server, for example, has a maximum power of about 6.5kW, far above the roughly 500W of a single ordinary CPU server. This requires, on the one hand, building new ultra-high-power cabinets and, on the other hand, a rapid increase in liquid-cooling penetration in order to reduce PUE. PUE is a key indicator of IDC energy efficiency, calculated as total data center power consumption divided by IT equipment power consumption; the closer the value is to 1, the more energy-efficient the IDC. According to CCID Consulting, about 43% of the energy consumed by China's data centers in 2019 went to cooling IT equipment, roughly on a par with the 45% consumed by the IT equipment itself, so cooling energy consumption is the key lever for reducing PUE.
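A quick calculation shows how the energy split above translates into PUE, and how much headroom better cooling offers; the improved-case cooling overhead is an assumption for illustration.

```python
# PUE arithmetic using the rough 2019 energy split cited above for Chinese data centers
# (IT ~45% of total, cooling ~43%, the remainder power distribution, lighting, etc.).
# PUE = total facility energy / IT equipment energy.

it_share      = 0.45
cooling_share = 0.43
other_share   = 1 - it_share - cooling_share

pue = 1 / it_share
print(f"Implied PUE: ~{pue:.2f}")            # ~2.2 when IT is only 45% of total energy

# If liquid cooling cut cooling energy to, say, 10% of IT energy (an assumption for
# illustration) while other overheads stayed proportional to IT load:
improved_pue = 1 + 0.10 + (other_share / it_share)
print(f"Illustrative liquid-cooled PUE: ~{improved_pue:.2f}")
```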

Liquid-cooled data centers are suited to delivering high-density computing power, raising per-cabinet deployment density and improving utilization per unit of floor area. According to the "Cold Plate Liquid-Cooled Server Reliability White Paper", liquid conducts heat faster than air (by a factor of 20-25) and can carry away far more heat (by a factor of 2,000-3,000), providing a better solution for high-density deployment. A single liquid-cooled cabinet can generally support more than 30kW of heat dissipation and can evolve toward 100kW and beyond. A naturally air-cooled cabinet generally supports only 8kW-10kW, and hot/cold-aisle-contained micromodules with water-cooled air conditioning become far less cost-effective above 15kW, so liquid cooling has clear advantages in both heat dissipation capability and economics. With the development of AIGC, shipments of high-power AI servers are expected to grow rapidly, which in turn requires much higher per-cabinet power, and the industry has begun building 20kW and 30kW cabinets at scale. At the same time, lowering data center PUE is a hard requirement. Against this backdrop, and given the clear shortcomings of air cooling for high-power cabinets, liquid cooling is expected to become the main cooling solution for large-computing-power AI data centers.
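As a sanity check on why liquid cooling scales to such cabinets, the sketch below estimates the water flow needed to remove 30kW at an assumed 10 K coolant temperature rise.

```python
# Order-of-magnitude check on liquid cooling: water flow needed to carry away a 30kW
# cabinet's heat at a modest coolant temperature rise. Q = m_dot * c_p * dT.
# The 10 K temperature rise is an assumption chosen for illustration.

cabinet_power_w = 30_000        # heat load of one high-power AI cabinet (W)
c_p_water = 4186                # specific heat of water, J/(kg*K)
delta_t = 10                    # assumed coolant temperature rise across the loop (K)

mass_flow = cabinet_power_w / (c_p_water * delta_t)    # kg/s
litres_per_min = mass_flow * 60                        # ~1 kg of water ~ 1 litre

print(f"Required water flow: ~{mass_flow:.2f} kg/s  (~{litres_per_min:.0f} L/min)")
# Under a litre per second per cabinet is practical for a cold-plate loop, whereas
# moving the same heat with air requires orders of magnitude more volume flow.
```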

Whether for cold-plate or immersion liquid cooling, data center cooling vendors and ICT equipment makers must work together, and the market previously doubted how well this industry chain would coordinate. Now, driven by AI computing demand, server manufacturers have begun to roll out liquid-cooled server products in earnest, and the industrialization of liquid cooling is expected to accelerate. In 2022, Inspur made "All in liquid cooling" part of its corporate strategy, with a full-stack liquid-cooling layout: its four major product lines of general-purpose servers, high-density servers, rack-scale servers and AI servers all support cold-plate liquid cooling, and it built Asia's largest liquid-cooled data center R&D and production base with an annual capacity of 100,000 units, achieving the industry's first large-scale delivery of cold-plate liquid-cooled full racks. In 2022, ZTE released the "ZTE Liquid Cooling Technology White Paper"; the company's fully liquid-cooled data center project won the 2022 CDCC Data Center Scientific and Technological Achievement Award, and its G5-series servers, which support liquid cooling and use cold-plate heat dissipation, recently debuted in Thailand, their first overseas market.


6.3 The demand for AI computing power is expected to promote the large-scale development of submarine data centers

We believe submarine data centers may be reaching a key point on the road to industrialization. First, on December 14, 2022, the China Communications Industry Association approved and published the standard T/CA 303-2022 "Code for the Design of Underwater Data Centers". Second, offshore wind power in China and globally has made great progress over the past two years, and offshore data centers can consume offshore wind power nearby. Third, demand for computing power and IDC capacity in eastern coastal cities is strong, and submarine data centers can meet that demand locally. Fourth, AIGC can push single-cabinet power consumption to tens of kW; a single cabinet in a submarine data center can reach about 35kW, uses seawater cooling with no compressors, can achieve a single-capsule PUE below 1.10, and needs no cooling towers, saving large amounts of water. Fifth, the global pioneer in submarine data centers is Microsoft, which began testing in 2015 and succeeded in both tests; in 2022, US company Subsea Cloud planned to launch a commercial submarine data center.

6.3.1 Relevant design specifications for domestic submarine data centers have been released

The first domestic standard for underwater data centers has been published and implemented. On December 14, 2022, the China Communications Industry Association approved the publication of standard T/CA 303-2022 "Code for the Design of Underwater Data Centers", which took effect on January 1, 2023. The standard follows the principles of openness, fairness, transparency, consensus and the promotion of trade and exchange, was formulated in accordance with the standard-setting procedures published on the national group-standard information platform, and was jointly drafted by Shenzhen Hailan Cloud Data Center Technology Co., Ltd., the Data Center Committee of the China Communications Industry Association, China Three Gorges Corporation, Offshore Oil Engineering Co., Ltd., Vertiv Technology Co., Ltd., Tsinghua University and other organizations. It applies to guiding and regulating the design of new and retrofitted underwater data centers deployed at sea. Based on the characteristics of underwater data centers, such as underwater sealing, oxygen-free and dust-free interiors, space constraints and unattended operation, the code specifies requirements for classification and performance, site selection and system composition, the underwater capsule system, electrical systems, air conditioning systems, monitoring systems, network and cabling systems, power and communication cable systems, and fire protection and safety systems. Underwater data centers deployed in lakes, rivers and similar environments may also follow the standard.

6.3.2 Offshore wind has already achieved large-scale development and is expected to combine with subsea data centers to generate new business models

After the offshore wind installation rush of 2020-2021, the domestic offshore wind industry chain has matured rapidly. Mainland China's exploration of offshore wind began in 2007: on November 8 of that year, the first offshore wind project, a single Goldwind 1.5MW turbine, was completed at the Suizhong oilfield in the Bohai Sea. After more than a decade of development, mainland China's installed offshore wind capacity reached 9.89GW by the end of 2020. On May 24, 2019, the NDRC issued the Notice on Improving the Wind Power Feed-in Tariff Policy, changing the benchmark feed-in tariff for offshore wind to a guide price, with all newly approved offshore wind projects setting tariffs through competition; projects approved before the end of 2018 that connected all units to the grid before the end of 2021 kept the tariff in force at approval (about 0.85 yuan/kWh, a subsidy of more than 0.4 yuan/kWh). This highly attractive subsidy triggered an installation rush: in 2021 alone, China added more than 16.9GW of offshore wind capacity. The rush also accelerated the maturation of the mainland offshore wind industry chain: the per-GW cost of mainland offshore wind was roughly 24 billion yuan in 2010 and has now fallen to about 12-13 billion yuan. By the end of 2022, China's offshore wind capacity reached 30.51GW.

6.3.3 The submarine data center has outstanding energy-saving advantages and can better meet the strong computing power demand in coastal areas

A subsea data center (UDC) is a type of underwater data center: a new kind of data center in which servers and other information infrastructure are installed in sealed pressure vessels on the seabed, cooled by flowing seawater, and powered and connected via submarine composite cables. Submarine data centers have marked green, low-carbon characteristics, saving energy and land while offering low latency, safety and reliability, in line with the trend toward green, low-carbon development. They are generally built 10-20 kilometers off the coastline, meeting coastal regions' requirements for high computing power, data storage and low latency. Underwater data centers provide low-latency connectivity, reducing the time data takes to travel between source and destination. Data centers in the western inland regions can handle cold data storage and computing with relaxed latency requirements, but workloads with tighter latency requirements must find data center resources along the eastern coast. Demand for computing power is strong in eastern coastal cities, and submarine data centers can exploit their proximity to serve large coastal populations with low-latency connections, since more than 50% of the world's population lives within 120 miles (about 200 kilometers) of a coastline.


6.3.4 Global submarine data center construction case - Microsoft Natick project

The world's first submarine data center was developed by Microsoft starting in 2015. Microsoft's research program for building underwater data centers and placing servers in the ocean, Project Natick, completed a four-month underwater proof-of-concept test and a two-year underwater data center trial. The aim of the first phase was to verify the effectiveness of the underwater data center's cooling system; the aim of the second phase was to determine the manufacturing feasibility of full-scale underwater data center modules and the economic feasibility of deploying them within 90 days. Over the two-year period, Microsoft was also able to test and monitor the performance and reliability of the underwater servers.

The future third phase of Microsoft's Project Natick has been described as a "pilot". Specifically, Microsoft would build a larger underwater data center for Phase 3, which "could be multiple vessels" and "could use a different deployment technology than Phase 2", and would be placed at depths greater than 117 feet (36 meters). Through Project Natick, Microsoft has explored the potential of subsea data center development. Phase 2 test results showed a PUE of 1.07 and a server failure rate one-eighth that of a land-based data center. Microsoft also found that underwater data centers can be deployed quickly, sealed in submarine-like tubes, and run on the seabed for years without any on-site maintenance. Preliminary analysis suggests the servers' superior reliability underwater stems mainly from avoiding corrosion by moisture and oxygen. That said, subsea data centers still face bottlenecks. First, construction costs are high, covering the purchase of data capsules, servers, cabling, power distribution and communication systems. Second, the technology is demanding, requiring construction in the marine environment and resistance to tides, waves and noise. Third, operation and maintenance are complicated: because seabed conditions are complex and changeable, specialized technology and equipment are needed for O&M.

(This article is for informational purposes only and does not represent any investment advice from us. For details, please refer to the original report.)

Selected report source: [Future Think Tank].
