With the advent of the AGI era, demand for computing power and for storage is growing in tandem. Before integrated storage-and-compute (compute-in-memory) architectures become mainstream, HBM (High Bandwidth Memory) has strong advantages in overcoming the "memory wall" and raising bandwidth, and is mainly used as in-package memory for AI chips. According to SK hynix, HBM demand will grow at a CAGR of 109% between 2022 and 2025. HBM's rapid growth has opened up incremental opportunities for industry-chain links such as IDMs, wafer manufacturing, packaging, and equipment and materials, and it has become a battleground for every link in the memory chain.
Summary
AI computing power calls for high-performance dynamic memory, and HBM has become the best solution. As data volumes grow and AI chips develop rapidly, the weakness of the von Neumann architecture has become prominent: the performance mismatch between "memory" and "compute" has created a bottleneck for the growth of computing power. GDDR is currently the most widely used graphics memory technology, but it falls short in AI computing, so manufacturers have turned to HBM technology.
AI chips drive HBM demand, and competition among mainstream manufacturers is fierce. According to our calculations, overall HBM demand depends on several parameters, including AI chips' storage capacity requirements, bandwidth requirements, and the number of HBM stacking layers. SK hynix, Samsung Electronics, and Micron Technology are competing intensely, and all three are now working on HBM3E products.
HBM's manufacturing complexity has increased, creating opportunities for different links of the industry chain to participate. The manufacturing process of AI chips is far more complex than that of traditional computing chips, and given the differing accuracy and process requirements of different interconnection methods, the manufacturing steps are distributed among IDMs, wafer fabs, and packaging houses. The GPU and HBM are the main active components in the chiplet package and are manufactured by IDMs, foundries, and memory fabs; among the passive components, the interposer and RDL can be made by foundries, IDMs, or packaging houses, while substrates and PCBs are supplied by their respective manufacturers.
HBM stacking technology has greatly raised the requirements for both front-end and back-end equipment, and the possible shift in bonding methods is a hot topic in the market. The HBM stacking process mainly revolves around bump manufacturing, chip-surface wiring, TSVs, and bonding/debonding, together with front-end tools for lithography, coating and developing, sputtering, etching, and electroplating. As the number of stacked layers increases and wafer thickness decreases, demand for thinning, dicing, molding, and related equipment rises. Among the more critical bonding steps, the mainstream bonding methods in the market today are still the TCB (thermo-compression bonding) and MR (mass reflow) solutions, and we believe hybrid bonding may become the mainstream solution in the future.
Risks
Changes in the mainstream technical path of AI chips; AI chip demand falling short of expectations; changes in the DRAM and HBM technology paths.
Body
AI computing power calls for high-performance dynamic memory, and HBM has become the best solution
The development of computing power can be summarized into three major stages (artificial intelligence, cloud computing, and deep learning), and we are currently in the third stage. Cloud AI processing requires multi-user, high-throughput, low-latency, and high-density deployment. The rapid increase in the number of computing units has made the I/O bottleneck more severe, requiring more DDR interface channels, larger on-chip cache capacity, and multi-chip interconnection. The traditional von Neumann architecture is compute-centric: processors focus on raising speed while memory focuses on capacity improvement and cost optimization, resulting in a performance mismatch between "memory" and "compute".
HBM offers high bandwidth in a small footprint. With the advent of GPGPU, GPUs are increasingly used in high-performance computing, and GDDR falls short in AI computing, so manufacturers have turned to HBM technology. Through multi-layer stacking, HBM achieves a far larger number of I/Os, giving a memory bit width of 1,024 bits, roughly 32 times that of a GDDR device, and a significantly higher memory bandwidth, in addition to lower power consumption and a smaller form factor. The much higher memory bandwidth alleviates the "memory wall" that has long constrained AI computing, and HBM's penetration in mid-to-high-end data-center GPUs has gradually increased.
Because of its structure, the upper limit of GDDR's total bandwidth is lower than HBM's. Total bandwidth (GB/s) = per-I/O data rate (Gb/s) × bit width (bits) / 8, where the bit width equals the number of I/Os × the bit width per I/O. To overcome DDR's limited bandwidth, either the per-I/O data rate or the bit width must rise, which leads to two approaches: the monolithic GDDR solution and the stacked HBM solution. Monolithic GDDR improves total bandwidth by dramatically increasing the per-I/O data rate, with rates of 7 Gb/s to 16 Gb/s for GDDR5 and GDDR6, exceeding HBM3's 6.4 Gb/s. HBM instead uses TSV technology to increase the number of I/Os and thus greatly widen the bit width while keeping a lower per-I/O data rate, so its total bandwidth far exceeds that of GDDR.
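As a rough illustration of the formula above, the sketch below plugs in commonly cited per-pin rates and bit widths; the exact figures vary by product, so these numbers are illustrative assumptions rather than a specification:

```python
def total_bandwidth_gb_s(io_data_rate_gbps: float, bit_width_bits: int) -> float:
    """Total bandwidth (GB/s) = per-I/O data rate (Gb/s) x bit width (bits) / 8."""
    return io_data_rate_gbps * bit_width_bits / 8

# One GDDR6 device: high per-pin rate (assume 16 Gb/s) but only a 32-bit interface.
gddr6_device = total_bandwidth_gb_s(16, 32)      # 64 GB/s
# One HBM3 stack: lower per-pin rate (6.4 Gb/s) but a 1,024-bit interface via TSVs.
hbm3_stack = total_bandwidth_gb_s(6.4, 1024)     # 819.2 GB/s

print(f"GDDR6 device: {gddr6_device:.1f} GB/s; HBM3 stack: {hbm3_stack:.1f} GB/s")
```

The wide, slow interface is the whole point: a single HBM3 stack delivers roughly an order of magnitude more bandwidth than a single GDDR6 device despite a much lower per-pin data rate.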
HBM's overall power consumption is lower than GDDR's. By increasing the number of I/O pins, HBM can run its bus at a lower frequency, resulting in lower power consumption. Although large on-chip caches can provide ample compute bandwidth, storage structure and process constraints mean that on-chip caches occupy a large share of the chip area (typically 1/3 to 2/3), limiting the growth of computing power.
HBM uses a 3D packaging process to stack DRAM dies vertically, greatly reducing the footprint occupied by memory chips. An HBM chip is 20% smaller than a traditional DDR4 chip and saves 94% of surface area compared with GDDR5 chips. According to Samsung Electronics, the 3D TSV process reduces package size by 35% compared with a traditional PoP package.
At present, the mainstream GDDR standard is GDDR6 and the mainstream HBM standard is HBM3, and HBM3's memory bandwidth is roughly 8-9 times that of GDDR6. JEDEC released the official GDDR7 standard on March 5, 2024; a major technical change is the switch from two-level non-return-to-zero (NRZ) signaling on the memory bus to three-level pulse amplitude modulation (PAM3), and JEDEC expects first-generation GDDR7 data rates of around 32 Gbps per pin. We expect HBM3E and GDDR7 to become the mainstream standards in the short to medium term, with HBM3E expected to reach about 6 times the memory bandwidth of GDDR7.
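To illustrate why the signaling change matters, the sketch below compares the symbol rate each pin would need under NRZ versus PAM3, assuming (as JEDEC's GDDR7 description is generally reported) that PAM3 carries 3 bits over 2 cycles, i.e., 1.5 bits per symbol; the 32 Gbps/pin target is the figure cited above:

```python
def required_symbol_rate_gbaud(per_pin_rate_gbps: float, bits_per_symbol: float) -> float:
    """Signaling (symbol) rate a pin must run at to reach a given data rate."""
    return per_pin_rate_gbps / bits_per_symbol

# NRZ carries 1 bit per symbol; PAM3 carries 3 bits over 2 symbols (1.5 bits/symbol).
print(required_symbol_rate_gbaud(32, 1.0))  # NRZ: 32.0 Gbaud for 32 Gbps/pin
print(required_symbol_rate_gbaud(32, 1.5))  # PAM3: ~21.3 Gbaud for the same 32 Gbps/pin
```

In other words, PAM3 lets GDDR7 raise the per-pin data rate without a proportional increase in bus signaling frequency.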
Figure 1: GPU and storage types by different brands and models on the market
Source: Companies' official websites, Yole, CICC Research
HBM's supply and demand estimation and technical path discussion
Demand: we estimate HBM demand from incremental GPU demand. According to our estimates, total global demand for HBM wafers will reach about 60,000 wafers per month in 2024 and about 150,000 wafers per month in 2025. The base assumptions are that the number of GPUs carrying HBM will total 6.47 million in 2024 and 8.1 million in 2025, that a single GPU will carry 6 and 8 cubes (stacked HBM packages) respectively, and that total wafer demand also rises as the average number of stacked layers increases. We further assume a constant 400 dies per wafer. With total wafer demand of roughly 160,000 wafers per month in 2025 against a global production capacity of about 150,000 wafers per month in 2024 (according to Yole), HBM still faces a certain supply gap under our assumptions.
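A minimal sketch of the arithmetic behind this estimate is shown below. The GPU volumes, cubes per GPU, and the 400 dies per wafer follow the assumptions stated above; the DRAM layers per cube (8 in 2024, 12 in 2025) are illustrative placeholders used to reproduce the order of magnitude, and yield loss (see Figure 3) is ignored:

```python
# Hedged back-of-the-envelope version of the HBM wafer-demand estimate above.
assumptions = {
    # year: (GPUs carrying HBM, cubes per GPU, DRAM dies per cube; layer counts are illustrative)
    2024: (6_470_000, 6, 8),
    2025: (8_100_000, 8, 12),
}
DIES_PER_WAFER = 400  # assumed constant, as in the text
MONTHS = 12

for year, (gpus, cubes_per_gpu, dies_per_cube) in assumptions.items():
    dram_dies = gpus * cubes_per_gpu * dies_per_cube
    wafers_per_month = dram_dies / DIES_PER_WAFER / MONTHS
    print(f"{year}: ~{wafers_per_month:,.0f} HBM wafers per month")

# Output: roughly 65,000 wafers/month for 2024 and 162,000 for 2025, broadly in line
# with the ~60,000 and ~150,000-160,000 wafers/month cited above.
```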
Figure 2: Calculation of total HBM requirements
Source: Nvidia, AMD, Yole, CICC Research
Figure 3: HBM wafer yield estimation
Source: Yole, CICC Research
Supply: SK hynix, Samsung Electronics, and Micron Technology are competing fiercely, and each is now working on HBM3E products. At the recent NVIDIA GTC, all three companies exhibited their latest HBM3E products, which have gradually converged in terms of stack layers, single-cube capacity, and bandwidth. SK hynix's HBM3E shows significant improvements in chip density, I/O rate, bandwidth, and maximum capacity.
Figure 4: The HBM roadmap by memory vendor
Source: Yole, each company's official website, CICC Research
HBM supply chain breakdown: manufacturers, equipment suppliers, and material suppliers
HBM manufacturing is still dominated by IDMs, but a division-of-labor model between front-end and back-end is developing domestically. The manufacturing steps of GPU chips are distributed among IDMs, foundries, and packaging houses. Generally speaking, the xPU (CPU, GPU, etc.) and HBM are the main active devices in the package and are manufactured by IDMs, foundries, and memory fabs; among the passive components, the interposer and RDL can be made by foundries, IDMs, or packaging houses, while substrates and PCBs are supplied by their respective manufacturers. Final assembly and testing are usually completed at the packaging plant.
We believe OSATs have accumulated a certain amount of stacking and packaging process capability relevant to HBM packaging, but they still lag fabs and IDMs in wafer-level processing. At present, there are not many HPC design companies worldwide that use HBM and chiplet stacking technology, and the total volume of such chips in the consumer electronics and PC chains remains limited, so doing only HBM or chiplet packaging is not the most economical choice for very large packaging and testing houses. However, as total demand for AI chips, especially for servers, increases, we believe some mature-process wafer fabs and large OSATs will gradually begin to invest in high-end advanced packaging.
Figure 5: Storage Industry Packaging Players
Source: Companies' official websites, CICC Research Department
EUV lithography tools are now widely used in DRAM manufacturing. Samsung Electronics first applied EUV to 1z DRAM production in 2020, and SK hynix announced in February 2021 that it had completed its first EUV-equipped production line, which began producing 1a DRAM in the second half of 2021. In the coming years, SK hynix and Samsung are expected to produce DRAM samples using high numerical aperture (high-NA) EUV in preparation for mass production at node sizes of 10nm and below after 2026. Micron has relied on self-aligned multiple patterning methods such as SAQP, but at nodes beyond 1β the process control and production stability of multiple patterning with immersion lithography are becoming increasingly difficult, so Micron may introduce EUV technology from the 1γ node.
Etching equipment accounts for a growing share of DRAM manufacturing lines. According to Yole's estimates, more than 70% of equipment spending in DRAM manufacturing is likely to go to deposition and etching systems, while spending on lithography is likely to drop below 20%. Globally, Lam Research, TEL, and AMAT nearly monopolize the dry etching equipment market, with global shares in 2020 of 46.71%, 26.57%, and 16.96% respectively, more than 90% in total. Silicon etching is dominated mainly by Lam and AMAT, while dielectric etching is dominated mainly by TEL and Lam.
Figure 6: HBM's Major Suppliers of Front-End Equipment
Source: Companies' official websites, CICC Research Department
HBM's mid- and back-end manufacturing mainly involves bumps, chip-surface wiring, substrate wiring, and bonding between the different layers. The equipment and materials used are largely the same as in front-end manufacturing, and bonding is one of the more critical steps.
Bumping: flip-chip is a core process in advanced packaging, and bumping is an important step within the flip-chip flow and the first step of chiplet packaging. Bumping refers to growing solder balls at reserved positions (usually pads) on the wafer surface; the connection to the substrate and PCB is then made through these solder balls. Bump materials generally include tin, copper, and gold, and the manufacturing flow is broadly similar to front-end wafer processing, mainly involving PI (polyimide) coating, lithography, sputtering, electroplating, cleaning, and reflow soldering. The key bump parameters are diameter, height, and density: as chip complexity grows, pin counts rise accordingly, requiring smaller diameters, lower heights, and higher densities, which makes the process correspondingly more difficult.
TSV (through-silicon via): mainly used in three-dimensional packaging, a TSV is a vertical hole through the silicon wafer that provides electrical extension and interconnection for the chip. Depending on the type of integration, TSVs are divided into 2.5D and 3D: 2.5D TSVs sit in the interposer, while 3D TSVs run through the chip itself and directly connect the upper and lower chips. TSV connections are widely used in high-end memory stacking and in interposers.
Globally, the companies involved in mid-process manufacturing equipment are similar to the front-end equipment suppliers; in the lithography-related steps, AMAT, TEL, SUSS, Veeco, PSK, DNS, and others participate, while domestic manufacturers have already taken a certain share in bonding/debonding, TSV, CMP, and testing processes. Domestic front-end equipment makers such as North Huachuang, Shengmei Shanghai, Xinyuan Micro, Xinqi Microelectronics, Zhongke Flying Measurement, Huazhuo Jingke, and Shanghai Microelectronics have shipped large volumes of mid-process manufacturing equipment, which provided strong support for their revenue growth in the early stages of their development. We believe that with the rapid development of advanced packaging, the importance of mid-process manufacturing is becoming increasingly prominent and demand for mid-process equipment will keep rising, and we expect it to remain an important source of performance for semiconductor assembly equipment and parts manufacturers in the future.
HBM's multi-layer stacked structure adds process steps and drives a continuous increase in demand for packaging equipment. As the number of stacked layers rises, wafer thickness must keep falling, which means greater demand for thinning and bonding equipment. Because the multi-layer stack relies on ultra-thin wafers and copper-copper hybrid bonding processes, the need for temporary bonding/debonding also increases, and the protective material for each layer of DRAM die is critical, placing high requirements on injection molding or compression molding equipment.
Figure 7: HBM's mid-process manufacturing industry chain
Source: Wind, company announcements, CICC Research
Figure 8: HBM's back-end manufacturing industry chain
Note: Statistics as of April 1, 2024
Source: Wind, company announcements, CICC Research
HBM has clear requirements on stack height and heat dissipation. The current mainstream bonding methods in the market are still the TCB (thermo-compression bonding) and MR (mass reflow) solutions, and we believe hybrid bonding may become the mainstream solution in the future, although its cost and timeline remain uncertain. For HBM, stacking pursues the following: 1) shorter interconnects and larger single-cube capacity; 2) better heat dissipation; and 3) keeping the single-cube height unchanged.
MR-MUF (Mass Reflow Molded Underfill)
MR-MUF is SK hynix's high-end packaging process: chips are attached to the circuit and, during stacking, a liquid epoxy molding compound (LMC) is injected between the chips and then hardened. Compared with the traditional method of laying a thin film material after each chip is stacked, MR technology has advantages in heat dissipation efficiency, production efficiency, and cost-effectiveness. SK hynix has applied MR-MUF technology to its HBM3E products.
Figure 9: SK hynix's mass reflow manufacturing process
Source: SK hynix official website, CICC Research
TCB (Thermo-Compression Bonding)
The core of TCB is bonding high-density chips to the substrate by applying heat and pressure. As solder bump pitches keep shrinking and substrates and wafers get thinner, the traditional reflow soldering process suffers from defects such as warping, local bridging, and die shift, which the TCB process can address well.
Figure 10: TCB process
Source: Li, J. H. et al., "The thermal cycling reliability of copper pillar solder bump in flip chip via thermal compression bonding" (2020), CICC Research
Figure 11: ASMPT's LPC TCB process
Source: Li, Ming et al., "A high throughput and reliable thermal compression bonding process for advanced interconnections" (2015), CICC Research
HB (Hybrid Bonding)
The HB process provides higher interconnect density, so it is gradually replacing traditional die-to-die soldering at bump pitches below 15 μm. Whereas the bumps in traditional soldering use copper pillars covered with solder, the HB process uses flat metal pads parallel to the bonding surface, increasing interconnect density and efficiency. The HB process mainly includes two types of bonding, die-to-wafer and wafer-to-wafer; the wafer-to-wafer process is more mature, but it requires every chip to be the same size and its overall yield is lower, so it is less flexible than the die-to-wafer process. According to ZDNET, JEDEC (the semiconductor standards body) may relax the stack-height limit for sixth-generation HBM4, in which case the MR and TC solutions could continue to be used at the corresponding thicknesses; although the HB solution can offer narrower pitches and lower heights, its limited maturity and current high cost mean that large-scale adoption may be delayed.
Figure 12: Hybrid Bonding process
Source: A. Elsherbini et al., "Enabling Hybrid Bonding on Intel Process" (2021), CICC Research
Figure 13: Hybrid Bonding in 3D Packaging
Source: A. Elsherbini et al., "Enabling Hybrid Bonding on Intel Process" (2021), CICC Research
DRAM Scaling Challenges and Stacking Methods
DRAM manufacturers and research institutes are eager to find new processes that push the boundaries of DRAM. With the slowdown of Moore's Law and the approach of physical limits, planar DRAM still has some room to scale. However, in order to keep increasing density and reducing the price per bit, various research efforts are under way, such as adjusting how the transistor is manufactured and adopting a monolithic 3D-DRAM structure.
Continuing the scaling direction: planar DRAM adopts EUV and HKMG manufacturing technologies. We have observed that DRAM scaling was expected to end a few years ago, but new technical solutions have allowed it to carry on to the 1β node, which is now entering early production. The rising cost of scaling and underlying physical limits make further scaling in the planar direction increasingly challenging for DRAM manufacturers. We believe new materials, new devices, new device architectures (e.g., monolithic 3D DRAM), and new process technologies will be necessary for the long-term continuation of DRAM scaling.
Continuing the scaling direction: the 4F² cell structure. The 4F² cell structure is seen as one of the main options for reducing chip area, allowing roughly 30% area reduction compared with the existing 6F² structure without requiring a smaller lithography node. In May 2023, Samsung established an R&D team to develop a 4F² DRAM structure for nodes of 10nm and below (such as 1d). 4F² DRAM is likely to employ vertical capacitors and vertical transistors.
Figure 14: 4F² can save about 30% of wafer area compared with 6F² at the same line width
Source: Spessot, A., & Oh, H. (2020). 1T-1C Dynamic Random Access Memory: Status, Challenges, and Prospects. IEEE Transactions on Electron Devices, 67, 1382-1393; CICC Research
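A quick arithmetic check of the roughly 30% figure, assuming the cell footprint scales directly with the 4F² and 6F² factors at the same minimum feature size F:

```latex
\[
1 - \frac{4F^{2}}{6F^{2}} = 1 - \frac{2}{3} \approx 33\%
\]
```

That is, at the same line width a 4F² cell occupies about one third less area than a 6F² cell, consistent with the approximately 30% reduction cited above.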
Continuing scaling: from planar structures to 3D DRAM. The scaling capability of planar DRAM is limited: as transistor size keeps shrinking, capacitor size must also shrink, weakening the ability to store charge, so 3D DRAM is needed to significantly improve memory density and performance by stacking memory cell layers vertically.
Another 3D DRAM structure, which is very similar to 3D NAND, is the CMOS-bonded array (CBA). In this DRAM architecture, the peripheral circuitry and the memory array are processed on different wafers and then bonded together. This architecture is likely to be adopted when 4F² cells are introduced (Yole expects this after 2025); at present, it is not practical to combine CBA with the 6F² cell.
Figure 15: DRAM with horizontally arranged capacitors
Source: NEO Semiconductor, CICC Research
Figure 16: The CBA (CMOS bonded array) structure is similar to the 3D-stacked NAND structure
Source: Yole, CICC Research
HBM and GPU stacking. AMD has shown an approach in which memory and GPUs are stacked on top of each other. In its presentation at ISSCC 2023, AMD detailed ways to improve data-center energy efficiency and keep pace with Moore's Law amid slowing progress in semiconductor manufacturing nodes; one of them is stacking HBM and GPUs on top of each other in the form of multi-chip modules (MCMs), with the logic chips and HBM stacks placed on a silicon interposer.
Figure 17: AMD shows how different memory and compute chips are combined
Source: AMD presentation at ISSCC 2023, CICC Research
Article source:
This article is excerpted from: "Top of the AI Wave Series: HBM Becomes a Strategic Storage Location" released on April 5, 2024
Yikang Zhang Analyst SAC License No.: S0080522110007 SFC CE Ref: BTO172
Hu Jiongyi Analyst SAC License No.: S0080522080012
Tang Zongqi Analyst SAC License No.: S0080521050014 SFC CE Ref: BRQ161
Lei Jiang Analyst SAC License No.: S0080523070007 SFC CE Ref: BTT278
Peng Hu Analyst SAC License No.: S0080521020001 SFC CE Ref: BRE806
Shi Xiaobin Analyst SAC License No.: S0080521030001
Legal Notices