Will NVIDIA's GPU shortage continue in the future, and what are the bottlenecks?

Author: Semiconductor Industry Vertical (半导体产业纵横)

This article was compiled by Semiconductor Industry Vertical (ID: ICVIEWS) from EE Times.

Although the demand for GPUs itself is huge, the supply simply can't keep up.

Since OpenAI released ChatGPT in November 2022, demand for generative AI (artificial intelligence) has exploded globally. These AI applications run on AI servers equipped with AI semiconductors such as NVIDIA GPUs.

However, according to a December 14, 2023 forecast from TrendForce, a Taiwanese research firm, AI server shipments will grow more slowly than expected: AI servers accounted for only 6% of all server shipments in 2022 and 9% in 2023, and are projected to reach 13% in 2024, 14% in 2025, and 16% in 2026 (Figure 1).

Figure 1: Number of servers shipped, proportion of AI servers, and proportion of AI chip wafers. Source: TrendForce

The reason is believed to be that the supply of AI semiconductors is the rate-limiting factor. NVIDIA's GPUs currently account for about 80% of AI semiconductors, and they are manufactured at TSMC. In the back-end process, CoWoS is used for packaging, and CoWoS production volume is currently a bottleneck.

In addition, in CoWoS, multiple HBMs (High Bandwidth Memory), which are stacks of DRAM dies, are placed around the GPU, and this HBM is also considered one of the bottlenecks.

So why is TSMC's CoWoS (Chip on Wafer on Substrate) production capacity persistently insufficient? And with three major DRAM manufacturers (Samsung Electronics, SK hynix, and Micron Technology) in the market, why is HBM still in short supply?

This article discusses these details. The shortage of AI semiconductors such as NVIDIA GPUs is expected to last for several more years.

What is TSMC's manufacturing process?

Figure 2 shows how NVIDIA's GPUs are made at TSMC. First, in the front-end process, the GPU, CPU, memory (DRAM), and so on are fabricated separately. Since TSMC does not produce DRAM, it obtains HBM from DRAM manufacturers such as SK hynix.

Figure 2. Manufacturing processes from 2.5D to 3D. Source: Tadashi Kamewada

Next, the GPU, CPU, HBM, etc. are bonded onto a silicon interposer ("Chip on Wafer," or CoW). The silicon interposer has pre-formed wiring layers and through-silicon vias (TSVs) to connect the chips.

After this step, the interposer is attached to the substrate ("Wafer on Substrate," or WoS), various tests are performed, and the CoWoS package is complete.

Figure 3 shows the cross-sectional structure of CoWoS. Logic chips, such as the GPU and CPU, as well as HBM stacks of DRAM, are bonded to a silicon interposer on which the wiring layers and TSVs are formed. The interposer is connected to the package substrate by copper bumps, and the substrate is connected to the circuit board by package balls.

Figure 3. CoWoS structure and two bottlenecks for AI semiconductors such as NVIDIA GPUs. Source: WikiChip

Here, we consider the first bottleneck causing the NVIDIA GPU shortage to be the silicon interposer, and the second to be HBM.

The silicon interposer is becoming enormous

Figure 4 shows the evolution of CoWoS since 2011. First, we can see that the silicon interposer has grown larger with each generation. In addition, the number of HBMs mounted keeps increasing.

Figure 4: Interposer area and number of HBMs mounted per CoWoS generation. Source: TSMC

Figure 5 shows, for CoWoS Gen 1 through Gen 6, the type of logic chip mounted, the HBM standard and number of HBMs mounted, the silicon interposer area, and the number of interposers obtainable from a 12-inch wafer.

Figure 5. CoWoS generations, number of HBMs mounted, and interposers obtainable per 12-inch wafer.

The number of HBMs mounted has grown by about 1.5 times per generation since the third generation. HBM standards have also changed, and performance has improved. Meanwhile, as the interposer area increases, the number of interposers obtainable from a 12-inch wafer decreases.

However, that figure is simply the area of the 12-inch wafer divided by the area of the interposer; the number actually obtainable is much smaller.

The 6th-generation CoWoS interposer, released in 2023, has an area of 3,400 mm²; if it were square, it would measure about 58 mm × 58 mm. When such dies are laid out on a 12-inch wafer, every interposer overlapping the wafer edge is defective, so at most 9 of these 58 mm × 58 mm interposers can be obtained from one wafer.

Figure 6. How many interposers can be obtained from a 12-inch wafer. Source: Tadashi Kamewada

In addition, wiring layers and TSVs are formed on the interposer with a yield of about 60-70%, so the maximum number of good interposers obtainable from a 12-inch wafer is about 6.
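These counts can be sanity-checked with a short script. The square grid, the 3 mm edge exclusion, and the die-centered placement below are simplifying assumptions (real steppers optimize the layout), but under them the result matches the figures above:

```python
import math

def full_dies(die_mm, wafer_d=300.0, edge_excl=3.0):
    """Count dies that fit entirely inside the usable wafer area,
    assuming a square grid with one die centered on the wafer."""
    r = wafer_d / 2 - edge_excl          # usable radius in mm
    half = die_mm / 2
    n = int(wafer_d // die_mm) + 1
    count = 0
    for i in range(-n, n + 1):
        for j in range(-n, n + 1):
            cx, cy = i * die_mm, j * die_mm   # die center position
            # all four corners must lie inside the usable radius
            if all(math.hypot(cx + sx * half, cy + sy * half) <= r
                   for sx in (-1, 1) for sy in (-1, 1)):
                count += 1
    return count

gross = full_dies(58.0)        # 58 mm x 58 mm Gen-6 interposer -> 9 gross
good = int(gross * 0.7)        # ~60-70% interposer process yield -> 6 good
print(gross, good)
```

Note how little of the wafer is used: 9 dies of 3,400 mm² cover under half of a 300 mm wafer's ~70,700 mm², the rest being lost to the round edge.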

The representative CoWoS GPU built on this interposer is NVIDIA's "H100," which is so sought after that it trades for up to $40,000.

TSMC's CoWoS production capacity is insufficient

So, how big is TSMC's CoWoS manufacturing capacity?

According to the DIGITIMES seminar "Opportunities and Challenges in the Global Server Market in the Generative AI Wave in 2024," held on November 14, 2023, CoWoS production capacity in the second quarter of 2023 was 13K-15K wafers per month. Monthly output was predicted to double to 30K-34K by the second quarter of 2024, narrowing the gap between supply and demand for NVIDIA GPUs.

However, that outlook remains distant: as of April 2024, NVIDIA still does not have enough GPUs. In a news release on April 16, TrendForce said TSMC's CoWoS capacity would reach about 40K wafers per month by the end of 2024 and double again by the end of 2025.
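Combining these capacity figures with the earlier estimate of roughly 6 good interposers per wafer gives a crude ceiling on monthly output. This is a back-of-the-envelope sketch: it assumes one GPU per CoWoS package and ignores assembly and test yield losses, both of which would lower the real number:

```python
wafers_per_month = 40_000        # TSMC CoWoS capacity, end of 2024 (TrendForce)
good_per_wafer = 6               # ~9 gross dies at 60-70% interposer yield
packages = wafers_per_month * good_per_wafer
print(packages)                  # upper bound on CoWoS packages per month
```

Even doubling this by end of 2025 leaves output in the low millions of packages per year, which is the scale against which the demand gap should be read.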

In addition, TrendForce reports that NVIDIA will release the B100 and B200, whose interposers may be larger than 58 mm × 58 mm. That would further reduce the number of good interposers obtainable from a 12-inch wafer, so even if TSMC desperately increases CoWoS capacity, it will not be able to produce enough GPUs to meet demand.

This race between ever-larger GPU interposers and TSMC's capacity expansion has no end in sight.

It has been proposed to use a 515 mm × 510 mm rectangular organic substrate instead of a 12-inch wafer as the interposer. Intel, in the United States, has proposed using rectangular glass substrates. Naturally, a large rectangular substrate yields far more interposers than a round 12-inch wafer.

However, forming wiring layers and TSVs on a rectangular substrate requires dedicated manufacturing equipment and transfer systems, and preparing these takes time and money. Next, let's look at the situation with HBM, the other bottleneck.

HBM's roadmap

As shown in Figures 4 and 5, the number of HBMs increases with each CoWoS generation, which is one reason the interposer keeps growing. And DRAM manufacturers cannot keep making HBM of the same standard: as CoWoS has developed, various properties of HBM have had to improve. HBM's roadmap is shown in Figure 7.

Figure 7. HBM's roadmap and the number of DRAM dies stacked per HBM. Source: DIGITIMES Research

First, HBM has had to increase its bandwidth (the amount of data exchanged per second) to match rising GPU performance. Specifically, HBM1 in 2016 had a bandwidth of 128 GB/s, while HBM3E, due for release in 2024, expands that roughly ninefold to 1,150 GB/s.
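These bandwidth figures follow from HBM's 1024-bit-wide interface multiplied by the per-pin data rate. The per-pin rates below are back-calculated from the bandwidths quoted above, not taken from the article:

```python
# HBM bandwidth = interface width (bits) x per-pin rate (Gbit/s) / 8 bits per byte
WIDTH_BITS = 1024                   # HBM's stacked, very wide interface

hbm1_gb_s = WIDTH_BITS * 1.0 / 8    # HBM1: 1.0 Gbit/s per pin -> 128 GB/s
hbm3e_gb_s = WIDTH_BITS * 9.0 / 8   # HBM3E: ~9 Gbit/s per pin -> 1152 GB/s

print(hbm3e_gb_s / hbm1_gb_s)       # ~9x generational gain
```

The wide-but-slow interface is HBM's defining trade-off: a 1024-bit bus at modest per-pin speed, made practical only by stacking DRAM next to the GPU on an interposer.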

Next, HBM's memory capacity (GB) has had to grow. To achieve this, the number of DRAM dies stacked in one HBM has risen from 4 to 12, and the next-generation HBM4 is expected to stack 16 DRAM dies.

In addition, HBM's I/O speed had to be increased. To achieve all of these goals at once, DRAM must be miniaturized at all costs. Figure 8 shows the share of DRAM sales by technology node. 2024 is the year of the switch from 1z (15.6 nm) to 1α (13.8 nm). After that, miniaturization proceeds in roughly 1 nm increments: 1β (12.3 nm), 1γ (11.2 nm), and 1δ (10 nm).

Figure 8: Percentage of DRAM sales by technology node. Source: Yole Intelligence

Note that the numbers in parentheses are the minimum feature size actually present in that generation of DRAM.

EUV is also starting to be used in DRAM

DRAM manufacturers must shrink in roughly 1 nm increments to achieve higher integration and speed. As a result, EUV (extreme ultraviolet) lithography has begun to be applied to fine pattern formation (Figure 9).

Figure 9. Application of EUV in DRAM. Source: Yole Intelligence

The first company to use EUV for DRAM was Samsung, which applied it to just one layer at the 1z generation. However, this was essentially practice for applying EUV, done by borrowing capacity from Samsung's logic foundry on a DRAM line with a maximum output of 10,000 wafers per month. In the true sense, Samsung only began using EUV for DRAM at 1α, where it applies EUV to five layers.

It was followed by SK hynix, the market-share leader in HBM, which applied EUV in 1α production. The company plans to move to 1β in 2024, potentially applying EUV to three or four layers. Accordingly, SK hynix, which until now has owned only a few EUV tools, will bring about 10 EUV units into operation by 2024. Samsung, which also runs a logic foundry, is thought to own more than 30 EUV units.

Finally, Micron has pursued a strategy of using as little EUV as possible in order to advance its technology nodes faster than anyone else, and in fact used no EUV through 1β. It had planned to use ArF immersion with multi-patterning instead of EUV at 1γ as well, but because the alignment margin has run out, making it difficult to ramp up production, it is expected to introduce EUV from 1γ.

The three DRAM manufacturers currently apply EUV with a lens aperture of NA = 0.33, and they are believed to be considering a switch to high-NA tools in 2027-2028. DRAM miniaturization, in short, will keep pushing further and further.

How many HBMs will now be produced using these state-of-the-art processes?

DRAM shipments and HBM shipments

Figure 10 shows DRAM shipments, HBM shipments, and HBM's share of DRAM shipments. As mentioned at the beginning of this article, ChatGPT's release in November 2022 gave NVIDIA's GPUs a major breakthrough in 2023.

Figure 10. DRAM shipments, HBM shipments, and HBM's share of DRAM. Source: Yole Intelligence

HBM shipments have grown rapidly as a result: from $2.75 billion (3.4% of DRAM) in 2022 to $5.45 billion (10.7%) in 2023, and then more than doubling to $14.06 billion (19.4%) in 2024.
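Since both HBM's dollar shipments and its share of DRAM are given, the implied size of the total DRAM market can be back-calculated as a consistency check on the quoted numbers (values in $ billions):

```python
# (HBM shipments in $B, HBM share of DRAM) per Figure 10
years = {2022: (2.75, 0.034), 2023: (5.45, 0.107), 2024: (14.06, 0.194)}

# implied total DRAM market = HBM shipments / HBM share
totals = {y: round(hbm / share, 1) for y, (hbm, share) in years.items()}
print(totals)
```

The implied totals (roughly $81B in 2022, $51B in 2023, $72B in 2024) match the boom-bust-recovery pattern described for the DRAM market below.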

Looking at DRAM shipments overall, they peaked in 2021 on pandemic-driven special demand and then dropped sharply in 2023 once that demand ended. Shipments are expected to recover and surpass the 2021 peak in 2025, then continue growing from 2026, albeit with some ups and downs, to exceed $150 billion by 2029.

HBM shipments, on the other hand, are expected to keep growing after 2025, yet HBM's share of DRAM shipments saturates at 24-25% after 2027. Why is that?

Shipments by HBM generation and total HBM shipments

As shown in Figure 11, the mystery is solved by looking at shipments of each HBM generation alongside total HBM shipments.

Figure 11. Shipments by HBM generation and total HBM shipments. Source: Yole Intelligence

First, prior to 2022, HBM2 was the dominant product. Second, in 2023, when NVIDIA's GPUs made their major breakthrough, HBM2E replaced HBM2 as the mainstream. HBM3 becomes mainstream across 2024-2025. In 2026-2027, HBM3E will be the top-shipping product, and from 2028, HBM4 will take the leading role.

In other words, HBM goes through a generational change about every two years. This means DRAM manufacturers must keep shrinking in 1 nm steps while also updating the HBM standard every two years.

As a result, as shown in Figure 11, total HBM unit shipments will barely increase after 2025. This is not because DRAM manufacturers are slacking off, but because producing the most advanced DRAM and the most advanced HBM already takes everything they have.

In addition, one more reason HBM shipments will not grow significantly after 2025 is the rising number of DRAM dies stacked per HBM (Figure 12): as GPU performance increases, HBM's memory capacity (GB) must also increase, so the stack height grows. HBM2 and HBM2E stack 4-8 DRAM dies, HBM3 and HBM3E stack 8-12, and HBM4 will stack 16.

Figure 12. Memory capacity (GB) per HBM and number of DRAM dies stacked per HBM. Source: Yole Intelligence

In other words, where HBM2 needed only 4 to 8 DRAM dies, HBM4 needs 16, or 2 to 4 times as many. So in the HBM4 era, a DRAM manufacturer must produce 2-4 times as many DRAM dies as in the HBM2 era just to ship the same number of HBM units.
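The arithmetic behind this can be made concrete. The monthly die output below is a purely illustrative number, not from the article; only the stack heights come from Figure 12:

```python
dies_per_month = 1_000_000        # hypothetical DRAM die output (illustrative)

hbm2_units = dies_per_month // 4  # HBM2: 4-high stacks (best case)
hbm4_units = dies_per_month // 16 # HBM4: 16-high stacks

print(hbm2_units, hbm4_units)     # same die output -> 4x fewer HBM4 units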

As a result, with DRAM shrinking in 1 nm increments, HBM changing generations every two years, and the number of DRAM dies stacked per HBM increasing with each generation, total HBM shipments are expected to saturate from 2025 onward.

So, will the shortage of HBM continue? Is it impossible for DRAM manufacturers to further increase HBM's shipments?

DRAM manufacturers are rushing to mass produce HBM

We have explained why DRAM manufacturers cannot dramatically increase HBM shipments. Even so, they are pushing to their limits to mass-produce HBM, because HBM prices are very high.

Figure 13 shows the average price per gigabyte for each HBM generation and for ordinary DRAM. Both ordinary DRAM and HBM command their highest per-gigabyte price just after launch. While the trend is the same for both, the per-gigabyte price gap between ordinary DRAM and HBM is more than 20-fold; to make the comparison visible, the chart in Figure 13 plots ordinary DRAM at 10 times its actual price.

Figure 13. Comparison of average price per gigabyte for each HBM generation and ordinary DRAM. Source: Yole Intelligence

Compared with $0.49 per gigabyte for ordinary DRAM, at its post-launch peak HBM2 was about 23 times more expensive ($11.4/GB), HBM2E about 28 times ($13.6/GB), and HBM4 is expected to be about 30 times ($14.7/GB).
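The multiples quoted above follow directly from the per-gigabyte prices:

```python
base = 0.49                                   # ordinary DRAM, $ per GB
peak = {"HBM2": 11.4, "HBM2E": 13.6, "HBM4": 14.7}  # peak $ per GB

ratios = {name: round(price / base) for name, price in peak.items()}
print(ratios)                                 # multiples over ordinary DRAM
```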

In addition, Figure 14 shows the average price per HBM unit. At its peak, HBM2 was $73, HBM2E $157, HBM3 $233, HBM3E $372, and HBM4 is expected to reach as much as $560.

Figure 14. Comparison of average prices per HBM unit. Source: Yole Intelligence

Figure 15 shows just how expensive HBM is. For example, a 16 GB DDR5 DRAM chip produced on a 1z process sells for at most $3-4, while SK hynix's HBM3E this year is priced at $361, roughly 90 to 120 times higher.
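The 90-120x range quoted here is simply the HBM3E price divided by the two ends of the DDR5 price band:

```python
ddr5_low, ddr5_high = 3, 4    # 16 GB DDR5 on a 1z process, $ per chip
hbm3e = 361                    # SK hynix HBM3E, $ per unit

lo = hbm3e / ddr5_high         # vs. the expensive DDR5 chip
hi = hbm3e / ddr5_low          # vs. the cheap DDR5 chip
print(round(lo), round(hi))    # the 90x-120x range
```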

DDR (Double Data Rate) is a memory interface standard whose transfer speeds keep doubling: DDR5 is twice as fast as DDR4, and DDR6 will be twice as fast as DDR5. 2024 is the year of the transition from DDR4 to DDR5, so DRAM manufacturers must also keep updating the DDR standard.

Figure 15: Comparison of various semiconductor processes, chip sizes, number of transistors (number of bits), and average prices.

Returning to HBM: HBM3E has roughly the same chip size as the A17 Bionic AP (application processor) of the latest iPhone 15, which is produced on TSMC's state-of-the-art 3 nm process, yet it is priced 3.6 times higher. A DRAM product now costs more than cutting-edge logic, which is shocking. And because prices are so high, DRAM manufacturers will do everything in their power to increase shipments and seize supremacy in HBM.

Let's take a look at the roadmaps of the three DRAM manufacturers.

DRAM manufacturers compete for HBM

Figure 16 shows how the three DRAM manufacturers have produced HBM from 2015 to 2024.

Figure 16. HBM's roadmap for SK hynix, Samsung, and Micron. Source: DIGITIMES Research

SK hynix was the first to successfully mass-produce HBM1. With HBM2, however, Samsung beat SK hynix to mass production. Then, when NVIDIA's GPUs made their breakthrough in 2023, SK hynix was the first to mass-produce HBM3, which brought it enormous benefits.

Micron, the other DRAM manufacturer, initially developed the Hybrid Memory Cube (HMC), a standard different from HBM. However, JEDEC (the Joint Electron Device Engineering Council), the US industry body that promotes semiconductor standardization, officially certified HBM rather than HMC. Micron therefore abandoned HMC development in 2018 and entered HBM development, far behind the two Korean manufacturers.

As a result, SK hynix holds 54% of the HBM market, Samsung 41%, and Micron 5%.

SK hynix, with the largest HBM share, began producing HBM at its NAND plant M15 in 2023 and will release HBM3E in the first half of 2024. Furthermore, in 2025, the M15X plant now under construction will be repurposed specifically for HBM, producing HBM3E and HBM4.

Samsung, which wants to catch up with SK hynix, began HBM production at a Samsung Display factory in 2023, plans to double its HBM capacity in 2024, and aims to mass-produce HBM4 in 2025, ahead of SK hynix.

Micron, which has lagged behind, aims to skip HBM3, compete with HBM3E in 2024-2025, and take a 20% market share in 2025. It has also set a goal of catching up with the two Korean manufacturers in mass production of HBM4 and HBM4E by 2027-2028.

In this way, fierce competition among the three DRAM manufacturers may break through the saturation in HBM shipments and eliminate the HBM shortage.

How long will NVIDIA's GPU shortage last?

In this article, we explained the reasons for the global shortage of AI semiconductors such as NVIDIA GPUs.

1. NVIDIA's GPUs are manufactured in TSMC's CoWoS package, and CoWoS capacity is completely insufficient. The cause is that the silicon interposer carrying the GPU, CPU, and HBM chips grows larger with each generation. TSMC is trying to increase capacity for this intermediate process, but as GPU generations advance, the interposer will keep growing.

2. HBM for CoWoS is in short supply. The cause is that DRAM manufacturers must keep shrinking in 1 nm increments, the HBM standard changes every two years, and the number of DRAM dies stacked per HBM rises with each generation. DRAM manufacturers are doing their best to produce HBM, but shipments are expected to saturate after 2025. However, because HBM prices are very high, competition among DRAM makers is fierce, and that competition may eventually eliminate the HBM shortage.

As described above, two bottlenecks are causing NVIDIA's GPU shortage: TSMC's insufficient manufacturing capacity and the HBM shortage, and neither is likely to be resolved within a year or so. NVIDIA's GPU shortage is therefore expected to continue for the next several years.

*Disclaimer: This article was created by its original author, and its content represents the author's personal views. We reprint it only for sharing and discussion, and doing so does not imply our endorsement. If you have any objections, please contact us.