laitimes

The high-performance computing market has soared, and the value of humble components has increased by 8 times

Author: The semiconductor industry is vertical

As the market for high-performance computing (HPC) systems, especially AI servers, continues to expand, both the performance and the power consumption of their core processors (CPUs, GPUs, NPUs, ASICs, FPGAs and so on) and of supporting chips such as memory and network communications are rising. As performance improves, better power management matters even more: the growing power consumption of HPC systems, AI servers in particular, places higher demands on the power management capabilities of the entire system and its major chips.

In an AI server, the CPU, the GPU boards, the memory (DDR4, DDR5, HBM), and the various interfaces all need power, which makes the power management system critical. Beyond the AC/DC power supplies and DC/DC converters, the passive components used in the power management system (mainly inductors and capacitors) also play a key role. As system performance and power consumption rise, these passive components must deliver higher performance and are needed in greater numbers.

High-performance passives provide more stable voltage and current, ensuring the fast transient response and low ripple that HPC systems such as AI servers need to operate normally. Low-loss passive components improve the energy efficiency of AI servers and of their key components, which saves energy and benefits the environment. To ensure the reliability and stability of AI servers, inductors in particular face higher requirements.

01

Power supply challenges for AI systems

Compared with ordinary servers, AI servers need higher-end configurations and consume more energy. Because the power of an AI server is 6~8 times higher than that of an ordinary server, the demands on its power supply rise in step: general-purpose servers on the market today typically need two 800W power supplies, while AI servers need up to four 1800W power supplies.
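The power supply figures above can be compared directly. The following is a minimal sketch that only multiplies out the quoted numbers; the function and variable names are illustrative, and the text does not say how many of the supplies are redundant, so installed capacity is only a rough proxy for actual power draw.

```python
# Figures quoted in the text above.
GENERAL_PSU_COUNT, GENERAL_PSU_WATTS = 2, 800
AI_PSU_COUNT, AI_PSU_WATTS = 4, 1800


def installed_capacity_w(count: int, watts_each: int) -> int:
    """Total nameplate capacity of all power supplies in one server."""
    return count * watts_each


general_w = installed_capacity_w(GENERAL_PSU_COUNT, GENERAL_PSU_WATTS)
ai_w = installed_capacity_w(AI_PSU_COUNT, AI_PSU_WATTS)
ratio = ai_w / general_w

print(f"General-purpose server: {general_w} W installed")
print(f"AI server:              {ai_w} W installed")
print(f"Installed-capacity ratio: {ratio:.1f}x")
```

On these figures the installed capacity is 1600W against 7200W, a ratio of 4.5, which is lower than the 6~8 times quoted for power. The two numbers measure different things, so they need not agree.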

As server performance improves, the number of supporting inductors and transformers inevitably increases. Taking chip inductors as an example, some institutions report that, because of the larger number of GPUs, an AI server needs 24~48 inductors in total; at $1 each, the value of chip inductors in an AI server is 60%~220% higher than in an ordinary server.

In addition, integrated forms such as multi-phase or coupled inductors will gradually replace single inductors in AI servers, and ultra-thin inductors and power modules will be used more widely to address heat dissipation and losses.
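The chip inductor figures above can be checked with simple arithmetic. This sketch multiplies out the quoted count and unit price, then back-calculates the ordinary-server value that the quoted percentages would imply. That baseline is an inference from the quoted numbers and is not stated in the report; all names are illustrative.

```python
# Figures quoted in the text above.
UNIT_PRICE_USD = 1.0
INDUCTOR_COUNT = (24, 48)           # low and high estimates per AI server
UPLIFT_VS_ORDINARY = (0.60, 2.20)   # 60% and 220% more than an ordinary server

# Chip-inductor value per AI server at the quoted unit price.
ai_value_usd = [n * UNIT_PRICE_USD for n in INDUCTOR_COUNT]

# Inferred, not stated in the text: the ordinary-server value that makes
# each uplift figure consistent with the matching AI-server value.
implied_baseline_usd = [
    value / (1.0 + uplift)
    for value, uplift in zip(ai_value_usd, UPLIFT_VS_ORDINARY)
]

for value, base in zip(ai_value_usd, implied_baseline_usd):
    print(f"AI server: ${value:.0f}, implied ordinary server: ${base:.2f}")
```

Both ends of the range point to roughly $15 of chip inductors in an ordinary server, which suggests the quoted percentages were derived from a single baseline.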

Data centers need more and more AI accelerator cards, which rely on large numbers of processors (xPUs) and massively parallel computing. Compared with ordinary CPUs, xPUs have a large number of small cores, which helps with neural network training and AI inference. However, an xPU consumes a lot of power when performing AI calculations and moving data. In other words, the xPU is a very power-hungry chip, and its strict power requirements pose new challenges for the AI accelerator card and also affect system performance.

AI systems require extremely high computing power, especially when dealing with workloads such as deep learning and inference. At the system level, AI accelerators play a key role in delivering near real-time results. All xPUs have multiple high-end cores that are made up of billions of transistors and draw hundreds of amperes. The core voltage of these xPUs has been reduced to 1V.

The peak current density an AI accelerator card requires is a heavy burden for any motherboard and is difficult to handle. The highly dynamic workload and extremely high current transients can produce very high di/dt and voltage spikes lasting a few microseconds, which can damage the xPU. AI workloads also run for long periods on average, and the decoupling capacitors cannot always supply the energy to meet immediate demand, so the transients of the AI accelerator must be suppressed to avoid damaging the entire power distribution network.
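The link between di/dt and voltage spikes follows from the basic inductor relation V = L * di/dt. The sketch below evaluates it for a linear current ramp; the loop inductance and load step are hypothetical values chosen only for illustration, not measurements of any real accelerator card.

```python
def inductive_spike_v(loop_inductance_h: float,
                      current_step_a: float,
                      rise_time_s: float) -> float:
    """Voltage across a parasitic inductance for a linear ramp: V = L * di/dt."""
    return loop_inductance_h * current_step_a / rise_time_s


# Hypothetical values, chosen only for illustration.
LOOP_INDUCTANCE_H = 100e-12   # 100 pH of parasitic loop inductance (assumed)
CURRENT_STEP_A = 500.0        # size of the load step (assumed)

for rise_time_s in (10e-6, 1e-6, 0.1e-6):
    spike = inductive_spike_v(LOOP_INDUCTANCE_H, CURRENT_STEP_A, rise_time_s)
    print(f"rise time {rise_time_s * 1e6:5.1f} us -> spike {spike * 1e3:6.1f} mV")
```

With these assumed values, a 500A step over 1 microsecond produces about 50mV across just 100pH, and the spike grows in direct proportion as the rise time shortens.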

Currently, the requirements for xPU voltage regulators (VRs) are very different from those for standard point-of-load (PoL) regulators. Some applications require more than 1000A for the xPU at less than 1V. Power consumption must then be kept under control; otherwise the system will struggle to work stably.

How to reduce the energy consumption of AI systems has become an industry-wide problem. There are currently two main approaches: first, reduce the energy consumption of the core processors; second, optimize the power management system so that power is delivered to those processors more efficiently. However, as emerging applications such as AI spread, the efficiency of the solutions used in traditional computing systems (AC/DC, DC/DC, multiphase power controllers, and DrMOS power stage combinations) has hit a ceiling, and more advanced power management solutions are required.
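To see why losses matter at these currents, the sketch below applies P = V * I and the conduction loss formula P = I^2 * R to the boundary figures quoted above (1000A at 1V). The path resistances are hypothetical values for illustration only.

```python
def load_power_w(voltage_v: float, current_a: float) -> float:
    """Power delivered to the load: P = V * I."""
    return voltage_v * current_a


def conduction_loss_w(current_a: float, resistance_ohm: float) -> float:
    """Resistive loss in the delivery path: P = I^2 * R."""
    return current_a ** 2 * resistance_ohm


CORE_VOLTAGE_V = 1.0      # upper bound from the text ("less than 1V")
CORE_CURRENT_A = 1000.0   # lower bound from the text ("more than 1000A")

power = load_power_w(CORE_VOLTAGE_V, CORE_CURRENT_A)

# Hypothetical delivery-path resistances in micro-ohms (assumed).
for resistance_uohm in (50, 100, 200):
    loss = conduction_loss_w(CORE_CURRENT_A, resistance_uohm * 1e-6)
    share = 100.0 * loss / power
    print(f"{resistance_uohm:3d} uOhm -> {loss:5.0f} W lost ({share:.0f}% of {power:.0f} W)")
```

Because the loss scales with the square of the current, even 100 micro-ohms of path resistance dissipates 100W at 1000A, a tenth of the power delivered in this example.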

02

Server power systems are evolving

The miniaturization of processors has lowered the supply voltage, but current consumption has risen rather than fallen, so power consumption keeps increasing. One problem posed by the trend toward low voltage and high current is how to respond more quickly to load fluctuations.

As the voltage decreases, the allowable voltage tolerance becomes very small. For example, if the core voltage must be supplied with an accuracy of ±3% to avoid processor malfunction, the tolerance at 1V is ±30mV. A server power supply must keep its output voltage as stable as possible, even when driving high-current load changes of more than 1000A.
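The tolerance arithmetic above can be extended one step. The sketch below reproduces the ±30mV figure and then divides it by the 1000A load change to get a target impedance. Target impedance is a common power-delivery rule of thumb that is introduced here as an assumption; the text itself does not use it.

```python
def tolerance_band_v(nominal_v: float, tolerance_fraction: float) -> float:
    """Allowed deviation on either side of the nominal voltage."""
    return nominal_v * tolerance_fraction


def target_impedance_ohm(allowed_deviation_v: float, load_step_a: float) -> float:
    """Rule of thumb (assumed here): Z_target = allowed deviation / load step."""
    return allowed_deviation_v / load_step_a


band_v = tolerance_band_v(nominal_v=1.0, tolerance_fraction=0.03)
z_target = target_impedance_ohm(band_v, load_step_a=1000.0)

print(f"Tolerance band:   +/-{band_v * 1e3:.0f} mV")
print(f"Target impedance: {z_target * 1e6:.0f} uOhm")
```

Under this simplification, holding ±30mV across a 1000A step requires the power delivery path to look like roughly 30 micro-ohms to the load.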

In practice, the trend toward low voltage and high current has continued, and it is usually addressed with higher switching frequencies and multi-phase designs. Switching at higher frequencies allows smaller components, such as capacitors and inductors, to manage and smooth the flow of energy in the input and output circuits. For converters based on ordinary silicon power semiconductor devices, the typical switching frequency is 30~80kHz, and at such frequencies it is cost-effective to use commonly available capacitors. Above this range, however, parasitic effects can lead to excessive resistive losses and self-heating.

While increasing the frequency can greatly improve load response, it also greatly increases the losses of the switching elements. Large external capacitors can suppress voltage fluctuations in high-current applications to some extent, but they increase the mounting area and capacitor cost.

With all of the above in mind, Trans-Inductor Voltage Regulators (TLVRs) are currently the mainstream circuit configuration for handling rapid load fluctuations in low-voltage, high-current applications. In this scheme, each phase's switch is connected to an inductor with an additional winding, and the additional windings of all phases are connected in series with a compensation inductor to form a loop, so that current is supplied to every phase at the same time. TLVR gives the processor a fast transient response that meets load demands with little to no droop in supply voltage, while reducing power losses and keeping the output capacitance small, which reduces mounting area and system cost.
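The reason higher frequencies allow smaller inductors can be sketched with the ideal buck converter ripple equation, L = Vout * (1 - D) / (delta_I * f), where D = Vout / Vin. The input voltage, output voltage, ripple target, and frequencies below are all hypothetical, and the formula ignores losses and parasitics.

```python
def required_inductance_h(v_in: float, v_out: float,
                          ripple_a: float, f_sw_hz: float) -> float:
    """Ideal buck converter: inductance giving a peak-to-peak ripple current."""
    duty = v_out / v_in
    return v_out * (1.0 - duty) / (ripple_a * f_sw_hz)


# Hypothetical operating point for one phase (assumed values).
V_IN, V_OUT, RIPPLE_A = 12.0, 1.0, 10.0

for f_sw_hz in (50e3, 500e3, 5e6):
    inductance = required_inductance_h(V_IN, V_OUT, RIPPLE_A, f_sw_hz)
    print(f"{f_sw_hz / 1e3:7.0f} kHz -> {inductance * 1e9:7.1f} nH")
```

For a fixed ripple target the required inductance falls in inverse proportion to the switching frequency, so a tenfold increase in frequency cuts the inductance to one tenth.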
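The link between transient response and output capacitance can be illustrated with a simple charge-balance estimate, C = delta_I * t / delta_V. This is a worst-case simplification that assumes the capacitors supply the whole load step until the regulator responds. It does not model the TLVR circuit itself, and the load step and response times are hypothetical.

```python
def min_output_capacitance_f(load_step_a: float,
                             response_time_s: float,
                             allowed_droop_v: float) -> float:
    """Charge balance: capacitors carry the full step until the regulator reacts."""
    return load_step_a * response_time_s / allowed_droop_v


LOAD_STEP_A = 500.0       # assumed load step
ALLOWED_DROOP_V = 0.03    # the 30mV band quoted earlier for a 1V rail

# Hypothetical regulator response times (assumed).
for response_time_s in (2e-6, 1e-6, 0.5e-6):
    cap = min_output_capacitance_f(LOAD_STEP_A, response_time_s, ALLOWED_DROOP_V)
    print(f"response {response_time_s * 1e6:3.1f} us -> {cap * 1e3:5.1f} mF")
```

In this model the capacitance needed scales directly with the response time, which is why a faster regulator can hold the same droop with less output capacitance.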

03

More inductor options

In high-performance computing systems, especially in the power management systems of AI servers, the range of inductor solutions keeps growing. In addition to the TLVR described above, there are products such as one-piece inductors, chip inductors, and ultra-thin one-piece inductors.

The chip inductor supplies power at the front end of the chip. It is mainly used for voltage and current conversion and is common in power management chip (PMIC) and FPGA power supply circuits. In a high-performance computing system, chip inductors, capacitors, MOS transistors, and driver chips together form the power supply circuits that meet the needs of GPUs and CPUs.

With the miniaturization of power modules and the increase in current, the volume and saturation characteristics of ferrite inductors struggle to meet the requirements of high-performance GPUs. Chip inductors made of metal soft magnetic material are used instead, with applicable switching frequencies of 500kHz~10MHz.

There is also a type of chip inductor based on a semiconductor thin-film process that uses photolithography. Unlike the traditional wound-inductor and one-piece inductor processes, the main feature of the thin-film process is that chip inductors can be produced a full panel at a time, which improves production efficiency. A traditional power module based on the SIP process packages the chip and the inductor on a package base; the thin-film approach processes the power inductor and the package base together, combining the two into one. Compared with traditional SIP, which requires "chip + inductor + base", the thin-film solution only needs to package the chip with the integrated inductor and other devices to deliver complete power module and peripheral circuit functions, further reducing the volume of the power module, increasing power density, and lowering cost.

This chip inductor uses a new magnetic material with very good permeability and saturation current; at a frequency of 6MHz, material loss accounts for a very low proportion of the inductor's total loss.

04

Capacitors are also important

In the power management systems of high-performance computing, capacitors and thermistors, like inductors, are also being upgraded.

At present, AI servers still account for a small share of the overall high-performance computing market, so no market research agency has yet quantified how many MLCCs (multilayer ceramic chip capacitors) AI servers consume. Judging by current developments, however, passive component distributors are generally optimistic about the prospects for capacitors, especially MLCCs, in AI servers. They expect significant growth in the second half of 2024, with MLCC specifications and unit prices rising sharply.

At the technical level, computing system processors all require capacitors, which have traditionally been tantalum or polymer capacitors. To reduce the reliance on decoupling capacitors, a small subset of Class II MLCCs (e.g., X5R, X6S, or X7R devices) can be placed directly near the processor. Currently, some manufacturers are working to embed aluminum polymer decoupling capacitors into the chip carrier inside the package, where they work alongside on-chip silicon capacitors to overcome the decoupling challenges faced by high-performance processors and support higher converter frequencies, possibly up to 10MHz in the future.

05

Opportunities for passive component manufacturers

A few days ago, at NVIDIA's GTC conference, Delta Electric, a major server contract manufacturer, said that inductors play a key role in AI server power conversion systems: they are what keeps the GPU's operating voltage at 0.8V during rapid current surges, so they must operate stably under high-current, low-voltage conditions.

The power consumption of AI servers equipped with NVIDIA's new Blackwell architecture acceleration chips is as high as 1000W~1200W, and the inductor usage is 2~3 times higher than that of ordinary servers.

To improve transient response, each AI server needs an additional 5~10 TLVR inductors, and the unit price of a TLVR inductor is 3~5 times that of a general inductor.

Not only the latest AI servers but a growing range of high-performance computing systems need more and better inductors. In general, inductor usage rises significantly even when only the server's CPU is upgraded. Taking the move from Eagle Stream to Birch Stream as an example, CPU power consumption rises by about 50%, and inductor usage rises by 50%~70%.

Clearly, new business opportunities lie ahead for major passive device manufacturers, especially makers of high-quality inductors. The leading manufacturers in this field currently include TDK, Yageo, Shunluo Electronics, Taiqingke, ITG and Eaton.

As mentioned above, the use of chip inductors in high-performance computing power management systems is increasing. This is good news not only for international manufacturers but also for local Chinese companies seeking to improve product quality and market share. China's chip inductor industry started late, and in its early development its technology R&D and production management lagged behind international manufacturers, especially TDK, Murata, Chilixin and Taiyo Yuden. In recent years, China's Shunluo Electronics has made sustained efforts and now ranks among the top five in the world. Other local chip inductor companies worth watching include Boke New Materials, Maijie Technology, Yitong New Materials, Tiantong Shares, Dongmu Shares, and Hengdian East Magnetic.
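The TLVR figures above can be combined into a rough range for the added inductor value per server. This sketch multiplies the quoted count by the quoted price multiple and expresses the result in units of one general inductor's price. It assumes the low and high ends pair up, which the text does not state.

```python
# Figures quoted in the text above.
TLVR_COUNT = (5, 10)       # added TLVR inductors per AI server
PRICE_MULTIPLE = (3, 5)    # TLVR unit price relative to a general inductor

# Added value, measured in "general inductor" price units.
added_value_low = TLVR_COUNT[0] * PRICE_MULTIPLE[0]
added_value_high = TLVR_COUNT[1] * PRICE_MULTIPLE[1]

print(f"Added value per AI server: {added_value_low} to {added_value_high} "
      f"general-inductor equivalents")
```

On these figures, the TLVR inductors alone add the equivalent of 15 to 50 general inductors to each AI server's bill of materials.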

06

Epilogue

As the market for high-performance computing systems, especially AI servers, continues to expand, the requirements for key chips and components keep rising. This applies not only to high-performance processors such as GPUs and CPUs but also to power management systems, and to the quantity and quality of the related chips and components.

Inductors and capacitors are inconspicuous but indispensable parts of power management systems, and the ever-higher power consumption of computing systems gives them a stage on which to show their worth. Related new technologies and materials are also expected to emerge.

For passive component manufacturers, international manufacturers with high-quality products will continue to win better business opportunities, while for China's local companies, the huge domestic market offers ample room to grow and more opportunities to take market share from international manufacturers.