ChatGPT caused a shortage of AI chips, and TSMC became a super winner behind NVIDIA

Author: Ray Technology

In 1849, news of the discovery of gold in California set off the Gold Rush. Countless people poured into the new land: some from the East Coast, some from continental Europe, and among them the first generation of Chinese immigrants to the United States, who called the place "Gold Mountain" and later "Old Gold Mountain", the name San Francisco still carries in Chinese.

Whatever their origins, the prospectors arriving in this new land needed food, clothing, shelter, and transportation, but above all they needed the essential tool of gold mining: the shovel. As the saying goes, "to do a good job, one must first sharpen one's tools." To pan for gold more efficiently, people flocked frantically to the shovel sellers, and wealth flowed with them.

More than a hundred years later, not far south of San Francisco, two Silicon Valley companies have set off a new gold rush: OpenAI was the first to strike the "gold mine" of the AI era, and NVIDIA became its first "shovel seller". As before, countless people and companies are pouring into this new boomtown, picking up the "shovels" of the new era and starting to pan for gold.

The difference is that shovels had almost no technical barrier to entry, whereas today NVIDIA's GPUs are everyone's choice. Since the beginning of this year, ByteDance alone has ordered more than $1 billion worth of GPUs from NVIDIA, including 100,000 A100 and H800 accelerator cards. Baidu, Google, Tesla, Amazon, Microsoft... each of these giants has ordered at least tens of thousands of GPUs from NVIDIA this year.

H100 GPU, Photo/NVIDIA

But that is still not enough. In an interview with Caixin at the end of March, Megvii CEO Yin Qi said that only about 40,000 A100s in China are available for large-model training. And as the AI boom continues, the price of the A800, the cut-down China version of NVIDIA's previous-generation flagship A100, has at times climbed to 100,000 yuan.

At a private meeting in June, OpenAI CEO Sam Altman again said that the severe GPU shortage has forced OpenAI to postpone many plans for optimizing ChatGPT. The research firm TrendForce estimates that OpenAI needs about 30,000 A100s to support the continued optimization and commercialization of ChatGPT.

The new round of AI frenzy set off by ChatGPT broke out in January this year, and the resulting shortage of AI computing power has now lasted nearly half a year. Why are these large companies still short of GPUs and computing power?

ChatGPT is short of GPUs? What it is short of is NVIDIA

To borrow an advertising slogan: not all GPUs are NVIDIA GPUs. The GPU shortage is, in essence, a shortage of NVIDIA's high-end GPUs. For training large AI models, the choice is either NVIDIA's A100 and H100, or the cut-down A800 and H800 that NVIDIA introduced specifically for China after last year's export ban.

Using AI involves two stages, training and inference: the former can be understood as creating a model, the latter as using it. Pre-training and fine-tuning large models, and the pre-training stage in particular, demand enormous computing power and place special weight on the performance and data-transfer capability of each individual GPU. Yet the AI chips that can deliver this pre-training efficiency (broadly speaking, "AI chip" means any chip aimed at AI workloads) are not merely few:

they are vanishingly few.
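
To make the two stages concrete, here is a minimal sketch in PyTorch (the toy model, data, and hyperparameters are invented for illustration):

```python
import torch
import torch.nn as nn

# A toy network standing in for a large model.
model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10))

# --- Training: repeatedly update the weights (the compute-hungry stage) ---
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()
for step in range(100):
    x = torch.randn(32, 128)             # a batch of made-up inputs
    y = torch.randint(0, 10, (32,))      # a batch of made-up labels
    loss = loss_fn(model(x), y)
    optimizer.zero_grad()
    loss.backward()                      # backward pass: roughly 2x the forward cost
    optimizer.step()

# --- Inference: a single forward pass over frozen weights ---
model.eval()
with torch.no_grad():
    prediction = model(torch.randn(1, 128)).argmax(dim=-1)
```

Training runs forward and backward passes over and over across an entire dataset; inference is one forward pass per request, which is why training is where the high-end GPUs are irreplaceable.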

A defining feature of large models is their scale: at least hundreds of billions of parameters, which require enormous computing power to train. Data transfer and synchronization between multiple GPUs inevitably leave some compute sitting idle, so the higher the performance of each individual GPU, the fewer GPUs are needed, the higher their utilization, and the lower the overall cost.
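
A back-of-the-envelope calculation shows why single-card capability matters so much (a sketch with assumed numbers: a 175-billion-parameter model, FP16 weights, optimizer state at the common rule of thumb of 16 bytes per parameter, and 80 GB cards like the A100/H100):

```python
params = 175e9                      # assumed model size
weights_gb = params * 2 / 1e9       # FP16 weights: ~350 GB
state_gb = params * 16 / 1e9        # weights + gradients + Adam state: ~2,800 GB

gpu_gb = 80                         # memory of one A100/H100 card
print(f"weights alone need {weights_gb / gpu_gb:.1f}+ cards")       # ~4.4
print(f"full training state needs {state_gb / gpu_gb:.0f}+ cards")  # ~35, before activations
```

Every extra card added just to hold the model also adds synchronization traffic, which is exactly the idle-compute problem described above.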

NVIDIA DGX H100 AI supercomputer, Photo/NVIDIA

The A100 and H100, released by NVIDIA in 2020 and 2022 respectively, offer high single-card compute on the one hand and high bandwidth on the other. The A100's FP32 compute reaches 19.5 TFLOPS (trillions of floating-point operations per second), and the H100's reaches 67 TFLOPS in its SXM version, more than three times as much.
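
To get a feel for what such numbers mean, here is a rough pre-training estimate using the widely used C ≈ 6ND approximation (about 6 FLOPs per parameter per training token); the model size, token count, GPU count, and utilization below are illustrative assumptions, not figures from this article:

```python
N = 175e9                 # parameters (GPT-3-scale, assumed)
D = 300e9                 # training tokens (assumed)
total_flops = 6 * N * D   # ~3.15e23 FLOPs of pre-training compute

a100_peak = 312e12        # A100 FP16 Tensor Core peak (dense), FLOPs/s
utilization = 0.4         # fraction of peak typically achieved (assumed)
n_gpus = 1000

seconds = total_flops / (a100_peak * utilization * n_gpus)
print(f"~{seconds / 86400:.0f} days on {n_gpus} A100s")  # on the order of a month
```

Halve the GPU count or the utilization and the run stretches to two months, which is why buyers count these cards in the tens of thousands.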

At the same time, long investment in interconnect technologies such as NVLink and NVSwitch has helped NVIDIA dig an even deeper moat. By the H100, fourth-generation NVLink supports up to 18 links per GPU for a total bandwidth of 900 GB/s, about 7 times the bandwidth of PCIe 5.0.
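
The arithmetic behind that comparison, assuming NVIDIA's usual bidirectional figures of 50 GB/s per fourth-generation NVLink link and roughly 128 GB/s for a PCIe 5.0 x16 slot:

```python
nvlink_total = 18 * 50        # 18 links x 50 GB/s each = 900 GB/s
pcie5_x16 = 128               # GB/s, bidirectional, approximate
print(nvlink_total, round(nvlink_total / pcie5_x16, 1))  # 900 GB/s, ~7x
```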

The A800 and H800, customized for the Chinese market, keep computing power almost unchanged; the cuts are aimed at staying under the export-control thresholds and fall mainly on interconnect bandwidth, which is trimmed by roughly a third and a half, respectively. According to Bloomberg, the H800 takes 10%-30% more time than the H100 on the same AI tasks.

Even so, the A800 and H800 are still more computationally efficient than other GPUs and AI chips. This is why the AI inference market can "let a hundred flowers bloom", with the cloud giants' self-developed AI chips and other GPU vendors each able to take a share, while in the AI training market, with its far higher performance demands, NVIDIA stands alone at the top.

H800 "knife" bandwidth, picture / NVIDIA

Of course, behind this dominance, the software ecosystem is the other core technical moat. Much has been written on the subject, but in short: the CUDA unified computing platform, which NVIDIA launched in 2007 and has invested in ever since, has become the infrastructure of the AI world. Most AI developers build on CUDA, much as mobile app developers build on Android or iOS.
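
The lock-in shows up even in everyday code: in the dominant frameworks, "use the GPU" is effectively spelled "cuda". A minimal PyTorch sketch:

```python
import torch

# Targeting an NVIDIA GPU in PyTorch is literally the string "cuda".
device = "cuda" if torch.cuda.is_available() else "cpu"

model = torch.nn.Linear(1024, 1024).to(device)
x = torch.randn(8, 1024, device=device)
y = model(x)   # on an NVIDIA GPU this dispatches to CUDA libraries such as cuBLAS
```

Porting years of such code, and the kernels and libraries beneath it, to another vendor's stack is the real switching cost.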

NVIDIA, for its part, understands perfectly well how sought-after its high-end GPUs are. Since the Spring Festival there have been repeated reports that NVIDIA is placing additional wafer foundry orders to meet strong global demand, orders that should meaningfully raise output within a few months; after all, these chips do not use TSMC's most advanced 3nm process.

The problem, however, lies precisely in the foundry stage.

NVIDIA's high-end GPUs cannot do without TSMC

As is well known, the downturn in consumer electronics and continued inventory destocking have dragged down capacity utilization at wafer foundries across the board, but TSMC's advanced nodes are an exception.

Thanks to the AI boom triggered by ChatGPT, rush orders are pouring in for the A100, built on TSMC's 7nm process, and the H100, built on its 4nm process, and TSMC's 5/4nm lines are close to full capacity. Supply chain sources estimate that NVIDIA's SHR ("super hot run", the most urgent handling level) orders at TSMC will continue for a year.

In other words, TSMC's capacity cannot meet NVIDIA's strong demand in the short term. No wonder some analysts argue that, with A100 and H100 GPUs perpetually in short supply, turning to Samsung or even Intel for foundry capacity beyond TSMC would be the right move, whether for risk control or for cost reduction.

Chips on a silicon wafer, Photo/TSMC

But as it turns out, NVIDIA has no such plans; in the short term at least, it has no way to leave TSMC. Shortly before Sam Altman complained that NVIDIA GPUs were in short supply, NVIDIA founder and CEO Jensen Huang said at COMPUTEX that NVIDIA's next-generation chips will still be manufactured by TSMC.

The core technical reason is that from the V100 and A100 to the H100, NVIDIA's high-end accelerator cards have all used TSMC's CoWoS advanced packaging to integrate high-bandwidth memory tightly with the compute die, a prerequisite for high-compute AI chips. And the core of CoWoS advanced packaging technology exists nowhere but at TSMC.

TSMC introduced its proprietary CoWoS advanced packaging technology in 2012, offering a one-stop service from wafer fabrication to final packaging, and chipmakers including NVIDIA and Apple have adopted it for their high-end products. To meet NVIDIA's urgent needs, TSMC has even resorted to some outsourcing and subcontracting, but this does not extend to the CoWoS process itself: TSMC keeps the most valuable advanced packaging step in its own hands.

According to Nomura Securities' estimates, TSMC's annualized CoWoS capacity was about 70,000-80,000 wafers at the end of 2022, is expected to rise to 140,000-150,000 wafers by the end of 2023, and could challenge 200,000 wafers by the end of 2024.

But distant water cannot put out a nearby fire: TSMC's CoWoS advanced packaging capacity is in severe shortage. CoWoS orders at TSMC have doubled since last year, and demand from Google and AMD is also strong this year. Even NVIDIA has reportedly had to lean on Jensen Huang's personal relationship with TSMC founder Morris Chang to push for higher priority.

TSMC, Photo/Wikimedia Commons

Final thoughts

Over the past few years, the pandemic and geopolitical shifts have made everyone realize just how important a cutting-edge technology built on sand, the chip, really is. After ChatGPT, AI has once again captured worldwide attention, and with the hunger for artificial intelligence and accelerated computing has come a flood of chip orders.

Designing and manufacturing high-end GPUs requires long years of R&D investment and accumulation, and faces hardware and software barriers that are hard to surmount. That is why, at this "feast of computing power", NVIDIA and TSMC can claim most of the cake, and most of the bargaining power.

Whether in today's generative AI or in the last wave of deep learning dominated by image recognition, the speed at which Chinese companies have caught up in AI software capabilities is plain for all to see. Yet as Chinese companies spend huge sums to turn the ship toward AI, they rarely focus on the hardware layers underneath.

But behind accelerated AI, two of the four most important GPUs are now restricted in China, and the other two, the cut-down A800 and H800, not only slow the pace at which Chinese companies can catch up but also carry the risk of further restrictions. Perhaps we need to see Chinese companies compete at a lower layer of the stack, not just on large models.
