
AIGC Industry Report: Liquid Cooling, the Next "Optical Transceiver" in the AI Era

(Report Producer/Author: Huaxi Securities, Liu Zejing, Meng Lingruqi)

Liquid cooling has gone from "optional" to "mandatory"

Computing power has exploded in the era of large models, and the demand for optical modules has increased

Large-model parameter counts are growing exponentially, setting off massive demand for computing power: According to data from the Financial Associated Press and OpenAI, the ChatGPT wave has exposed a huge computing power gap. OpenAI data indicate that the compute required by models is growing far faster than the computing power of AI hardware, leaving a gap of roughly 10,000 times. The growth in compute scale has raised the required single-chip computing power of AI training chips and placed higher demands on data transmission speed. According to Zhidong data, large models have developed exponentially over the past five years, with some reaching the trillion-parameter level, and demand for computing power has risen accordingly.

As large-model parameter counts grow, data center interconnection has become critical: with model scale reaching the trillions, the computing power of a single card or single server can no longer support training, so chip-to-chip interconnection becomes the top priority and cluster performance is especially important. The more efficient the interconnection, the higher the transmission rate it needs, which has made high-speed optical modules the current first choice.

Liquid cooling has changed from "optional" to "mandatory", and the inflection point of liquid cooling has arrived

Why liquid cooling is the next optical module of AI: electronic products follow an iterative upgrade path, as the growth of optical modules described above confirms. In the era of high-speed AI interconnection, high computing power has to be matched by an efficient transmission architecture. 40G replaced 10G, 100G replaced 40G, 400G replaced 100G, 800G replaced 400G, and 1.6T is expected to replace 800G. The upgrade path never stops, and each generation moves from "luxury" and "early adopter" to "mainstream" and "must-have". Heat dissipation follows the same pattern: the technology has progressed from natural air cooling to air conditioning, fans, and heat sinks, and then to liquid cooling. Liquid cooling itself comes in spray, cold plate, and immersion types, among others.

Why liquid cooling has changed from "optional" to "mandatory":

Chip: The effect of ambient temperature on a chip cannot be ignored. At high temperatures, the electronic components inside the chip degrade under prolonged operation, shortening the chip's service life. Rising temperature also causes thermal expansion in capacitors, resistors, metal wires, and other materials, which leads to mechanical deformation and structural damage and ultimately affects the chip's normal operation. According to previous news reports, the limit of air cooling for a single chip is about 800W of heat dissipation, and some NVIDIA products already on the market exceed what air cooling can handle.

Data center: According to previous news reports, a single cabinet in a naturally air-cooled data center generally supports only 8-10kW, and cost-effectiveness drops significantly once cabinet power exceeds 10kW. According to non-network data, the single-cabinet power density of AI computing clusters is expected to reach 20-50kW in 2025, far beyond the upper limit of air cooling.

Policy gives the liquid cooling market a "shot in the arm"

PUE (Power Usage Effectiveness) is a key indicator of how green a data center is. It is the ratio of all energy consumed by a data center to the energy consumed by the IT load. The higher the PUE, the lower the overall efficiency of the data center. Any value above 1 means the data center needs additional power overhead to support the IT load. The closer the PUE is to 1, the larger the share of power consumed by servers, network equipment, and storage devices, and the greener the data center. Cooling accounts for a high proportion of a data center's overall energy consumption. According to the Communication Power Supply Committee of the China Institute of Communications, in a typical data center IT equipment accounts for the largest share of energy consumption at 50%, followed by the refrigeration system at 35%, with power supply and distribution equipment and other energy-consuming data center facilities making up the rest. The refrigeration system mainly comprises air conditioning equipment, cooling source equipment, and the fresh air system; the specific energy consumption composition is shown in the following table.
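As a minimal sketch of the PUE definition above, the following Python snippet computes PUE from an energy breakdown. The function name `pue` and the "cooling halved" scenario are illustrative assumptions, and the share for power distribution and other facilities is simply the remainder after the 50% IT and 35% cooling shares quoted from the source.

```python
def pue(it_energy: float, non_it_energy: float) -> float:
    """PUE = total facility energy / IT equipment energy."""
    if it_energy <= 0:
        raise ValueError("IT energy must be positive")
    return (it_energy + non_it_energy) / it_energy


# Shares of total energy in the typical breakdown quoted above.
it_share = 0.50
cooling_share = 0.35
# Remainder (power distribution and other facilities); derived, not stated in the source.
other_share = 1.0 - it_share - cooling_share

typical_pue = pue(it_share, cooling_share + other_share)
print(f"PUE of the typical breakdown: {typical_pue:.2f}")

# Hypothetical what-if: cooling energy is halved, everything else unchanged.
improved_pue = pue(it_share, cooling_share / 2 + other_share)
print(f"PUE with cooling energy halved: {improved_pue:.2f}")
```

Because IT equipment takes half of the total in this breakdown, the PUE comes out at 2.0. The what-if shows why cooling is the main lever: it is the largest non-IT term in the numerator.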

Compared with traditional air cooling, liquid cooling generally achieves a lower PUE. According to CSDN data, the PUE of traditional air cooling is about 1.3, and liquid cooling reduces it significantly: about 1.2 for traditional cold plate technology and 1.05-1.07 for immersion liquid cooling.

Deeply tied to NVIDIA, Vertiv is on a growth path

On December 11, 2023, according to news reports, a subsidiary of Vertiv will acquire all the shares and related assets of CoolTera, a provider of liquid cooling infrastructure for data centers. Founded in 2016 and based in the United Kingdom, CoolTera has its own R&D, design, and manufacturing capabilities for cooling distribution units, secondary-side pipelines, and manifolds. Vertiv and CoolTera had already worked closely together on liquid cooling technology for three years, jointly deploying multiple data centers and supercomputing systems around the world. We believe this acquisition further strengthens Vertiv's thermal management capabilities and industry presence.

In-depth breakdown of the core liquid cooling value chain

Liquid cooling: concept, classification, and comparison

Liquid cooling: A cooling method that keeps a computer operating at a safe temperature. Liquid cooling uses a flowing liquid with high specific heat capacity to absorb the heat generated by the computer's internal components and carry it outside. Because liquid transfers heat more efficiently than air, this approach reduces energy consumption.
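To make the specific heat argument concrete, here is a rough sketch that compares how much water and how much air must flow to carry away the same heat, using Q = ρ · V · c · ΔT. The fluid properties are approximate room-temperature textbook values, not figures from the report, and the 10kW load and 10K temperature rise are hypothetical.

```python
# Approximate room-temperature properties (textbook values, not from the report).
WATER = {"density": 998.0, "specific_heat": 4186.0}  # kg/m^3, J/(kg*K)
AIR = {"density": 1.2, "specific_heat": 1005.0}      # kg/m^3, J/(kg*K)


def volumetric_flow_m3_per_s(heat_w: float, delta_t_k: float, fluid: dict) -> float:
    """Volumetric flow needed to carry heat_w with a fluid temperature rise of delta_t_k."""
    return heat_w / (fluid["density"] * fluid["specific_heat"] * delta_t_k)


heat_load_w = 10_000.0  # hypothetical 10 kW cabinet
delta_t_k = 10.0        # hypothetical 10 K temperature rise of the fluid

water_flow = volumetric_flow_m3_per_s(heat_load_w, delta_t_k, WATER)
air_flow = volumetric_flow_m3_per_s(heat_load_w, delta_t_k, AIR)

print(f"Water: {water_flow * 60_000:.1f} L/min")
print(f"Air:   {air_flow:.2f} m^3/s")
print(f"Air needs about {air_flow / water_flow:,.0f}x the volume flow of water")
```

Under these assumptions, air needs a volume flow several thousand times that of water for the same heat load, which is the physical basis for the efficiency claim above.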

Liquid cooling technology is divided into direct liquid cooling technology and indirect liquid cooling technology according to different contact methods. In indirect liquid cooling technology, the cooling liquid is not in direct contact with the heating device, mainly including the cold plate type. In the direct liquid cooling technology, the cooling liquid is in direct contact with the heating device, mainly including immersion type and spray type liquid cooling, wherein the immersion type can be divided into single-phase immersion type and phase change immersion type according to whether the cooling medium has a phase change.

Compared with traditional air cooling, liquid cooling saves a significant amount of energy. According to "Research Status and Development Trend of Heat Dissipation and Cooling Technology for Green and Energy-efficient Data Centers", the PUE of an air-cooled data center is usually around 1.5. According to the Open Data Center Committee (ODCC), the PUE of cold plate liquid cooling is 1.1-1.2, that of phase-change immersion liquid cooling is below 1.05, that of single-phase immersion liquid cooling is below 1.09, and that of spray liquid cooling is below 1.1.
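The PUE figures above can be turned into a rough energy comparison. The sketch below assumes a hypothetical 1 MW IT load running all year and uses the quoted values as-is; for the "less than" figures it takes the stated bound, so the liquid cooling overheads shown are upper estimates.

```python
HOURS_PER_YEAR = 8760

# PUE values quoted above; "upper bound" entries use the stated limit.
pue_by_cooling = {
    "air cooling (typical)": 1.5,
    "cold plate (upper end of range)": 1.2,
    "spray (upper bound)": 1.1,
    "single-phase immersion (upper bound)": 1.09,
    "phase-change immersion (upper bound)": 1.05,
}


def annual_overhead_mwh(it_load_mw: float, pue: float) -> float:
    """Non-IT energy per year: IT load * (PUE - 1) * hours in a year."""
    return it_load_mw * (pue - 1.0) * HOURS_PER_YEAR


it_load_mw = 1.0  # hypothetical constant IT load
for name, value in pue_by_cooling.items():
    overhead = annual_overhead_mwh(it_load_mw, value)
    print(f"{name}: about {overhead:,.0f} MWh of non-IT energy per year")
```

The comparison treats PUE as constant over the year, which real facilities do not achieve, so it is only an order-of-magnitude illustration.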

Liquid cooling market space calculation: The 100 billion market is poised to take off

Market split: We size the liquid cooling and air cooling markets based on IDC construction. The IDC market as a whole can be divided into cloud computing, supercomputing, and intelligent computing segments, and the corresponding downstream servers into CPU servers, general-purpose servers, and AI servers. Of these, AI servers in intelligent computing centers are the direct source of incremental demand for liquid cooling.

Penetration rate: According to the Financial Associated Press and Inspur Information, liquid cooling in mainland China is developing through gradual iteration, with a penetration rate of about 5% in 2023 that is expected to exceed 20% by 2025.

Price per kW: According to Zhihu data, the price per kW is about 6,000 yuan for air cooling, 10,000 yuan for cold plate liquid cooling, and 12,000 yuan for immersion liquid cooling.

Liquid cooling calculation for AI servers: According to the China Business Research Institute, AI server shipments in 2023 are put at 354,000 units. We assume shipments grow 120% in 2024 and 80% in 2025, and that the server models for 2023, 2024, and 2025 are the DGX A100, DGX H100, and DGX B200 respectively (without considering US export restrictions). According to NVIDIA data, the power consumption of a single server is 6.5kW, 10.2kW, and 14.3kW respectively. We assume that average power consumption is 80% of peak power, and that the liquid cooling penetration rate is 10% in 2023, 30% in 2024, and 100% in 2025 owing to the chip process. From 2023 to 2025, cold plate liquid cooling accounts for 95%, 90%, and 80% respectively, with immersion liquid cooling making up the rest.
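The assumptions above can be chained into a rough market-size estimate. The sketch below is one possible reading, not the report's own model: the excerpt does not state how the inputs are combined, so multiplying the per-kW prices quoted earlier by average (not peak) power is an assumption, as are all the names in the code. The results need not match the figures in the original report.

```python
from dataclasses import dataclass


@dataclass
class YearAssumption:
    year: int
    growth: float            # shipment growth vs. previous year (0 for the base year)
    peak_kw: float           # power consumption of a single server, kW
    penetration: float       # share of servers that are liquid cooled
    cold_plate_share: float  # share of liquid cooling that is cold plate


BASE_SHIPMENTS = 354_000  # AI server units, 2023
AVG_TO_PEAK = 0.8         # average power as a fraction of peak power
PRICE_PER_KW = {"cold_plate": 10_000, "immersion": 12_000}  # yuan per kW

YEARS = [
    YearAssumption(2023, 0.0, 6.5, 0.10, 0.95),
    YearAssumption(2024, 1.2, 10.2, 0.30, 0.90),
    YearAssumption(2025, 0.8, 14.3, 1.00, 0.80),
]


def market_size_by_year(years=YEARS):
    """Return {year: liquid cooling market size in yuan} under the stated assumptions."""
    results = {}
    shipments = BASE_SHIPMENTS
    for y in years:
        shipments *= 1.0 + y.growth
        # Assumption: price per kW applies to average power, not peak power.
        liquid_kw = shipments * y.peak_kw * AVG_TO_PEAK * y.penetration
        cold_plate_kw = liquid_kw * y.cold_plate_share
        immersion_kw = liquid_kw - cold_plate_kw
        results[y.year] = (
            cold_plate_kw * PRICE_PER_KW["cold_plate"]
            + immersion_kw * PRICE_PER_KW["immersion"]
        )
    return results


for year, yuan in market_size_by_year().items():
    print(f"{year}: about {yuan / 1e9:.1f} billion yuan")
```

Under this reading the 2025 estimate lands above 100 billion yuan, driven mostly by the jump to 100% penetration and the higher per-server power of the assumed model.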

Liquid cooling industry ecosystem breakdown - secondary side: cold plate type

Cabinet process coolant supply and return manifold (RCM): Installed inside the liquid-cooled cabinet, the RCM handles liquid distribution, liquid collection, and air venting. It generally consists of vent valves, branch pipelines, and main pipelines. Quick-disconnect couplings (QDC) are fitted to the hose ends of the branch pipelines to connect to the cold plate assemblies in the servers. The main pipeline interface is located at the upper or lower end; it is where process coolant enters and leaves the liquid-cooled cabinet, and it is connected to the LCM by a hose.

Cooling distribution unit (CDU): The CDU isolates the process coolant entering the server cold plate assemblies from the cooling water on the cold source side, and distributes the cooled process coolant to the cold plates of the individual servers. Depending on layout, CDUs are divided into rack-mounted and cabinet-type units. See 2.1.1 for a comparison.

Loop process coolant supply and return manifold (LCM): The LCM is typically installed beneath the data center floor, and sometimes on top of the cabinets. It handles liquid distribution, liquid collection, and air venting, and generally consists of vent valves, branch pipelines, main pipelines, valves, and so on. The LCM delivers process coolant cooled by the CDU to the RCM through branch hoses.

Process coolant: Pure water or a formulated liquid is mainly used. The pure water option is mainly deionized water, and the formulated liquid is mainly an ethylene glycol or propylene glycol solution.

Beneficiary companies in the liquid cooling industry chain

Liquid cooling industry chain beneficiaries - inside the server

We divide the beneficiary companies of the liquid cooling industry chain into three simple categories: inside the server, liquid cooling construction, and liquid cooling infrastructure providers. Inside the server: We define this category as the server's internal components, which benefit directly from rising volumes of high-computing-power, high-power AI chips.

Liquid cooling industry chain beneficiaries - liquid cooling construction side: full-chain solution manufacturers

On the liquid cooling construction side, we distinguish by who does the building: full-chain liquid cooling solution manufacturers, server manufacturers, and IDC vendors. Full-chain solution manufacturers: We define these as manufacturers that provide full-stack liquid cooling solutions. Because they do not supply servers, their products must be adapted to those of server (chip) manufacturers such as Huawei and NVIDIA for the liquid cooling solution to deliver its full value. Vertiv is an example.

Liquid cooling industry chain beneficiaries - liquid cooling construction side: IDC construction

IDC vendors: We define this type of vendor as one whose main business is IDC. Because such a vendor builds data centers directly, we expect it to build liquid cooling solutions according to the needs of its IDC customers, such as Internet companies, or to retrofit existing equipment rooms for liquid cooling.

Liquid cooling industry chain beneficiaries - liquid cooling infrastructure providers

Liquid cooling infrastructure providers: We define these as companies that supply liquid cooling products such as CDUs, LCMs, and RCMs, whose buyers may be liquid cooling builders or IDC data centers. As liquid cooling is upgraded, these products are expected to rise in both volume and price.

(This article is for informational purposes only and does not constitute investment advice from us. To use the information, please refer to the original report.)

Selected report source: Future Think Tank.
