TSMC's chairman predicts that in the next 15 years, GPU performance per watt will increase 1,000-fold and GPU transistor counts will exceed one trillion

New Zhiyuan reports

Editor: Editorial Department

Over the past 25 years, semiconductor processes have pushed ever closer to their physical limits, and along the way ChatGPT was born. Today, the world's most powerful NVIDIA GPU packs more than 208 billion transistors, and TSMC's leadership predicts that 1-trillion-transistor GPUs will arrive within the next decade.

At the GTC 2024 conference, Jensen Huang unveiled the world's most powerful GPU, the Blackwell B200, which packs more than 208 billion transistors.

Compared with the previous-generation H100 (80 billion transistors), the B200 has more than twice as many transistors, with up to 5 times the AI training performance and up to 30 times the running speed.
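
As a quick sanity check on the transistor figures (the 5x and 30x numbers are NVIDIA's own performance claims and cannot be derived from transistor counts alone), a one-line calculation confirms the ratio:

```python
# Transistor-count ratio between the two GPUs quoted above.
b200_transistors = 208e9  # Blackwell B200, per the article
h100_transistors = 80e9   # previous-generation H100

print(f"B200 / H100 = {b200_transistors / h100_transistors:.1f}x")  # -> 2.6x
```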

If so, what would scaling GPUs from hundreds of billions of transistors to a trillion mean for the AI community?

Today, the front page of IEEE features an article written by TSMC's chairman and chief scientist: "How do we achieve 1-trillion-transistor GPUs?"

The main purpose of this long article is to make the AI community aware of how much breakthroughs in semiconductor technology have contributed to AI.

From "Deep Blue", which defeated the human chess champion in 1997, to ChatGPT, which exploded in 2023, AI has been crammed into everyone's mobile phone from a research project in the laboratory in the past 25 years.

All of this is thanks to major breakthroughs on three fronts: innovation in machine-learning algorithms, massive amounts of data, and advances in semiconductor processes.

TSMC predicts that in the next 10 years, the number of transistors integrated in GPUs will reach 1 trillion!

At the same time, GPU performance per watt will increase by a factor of 1,000 over the next 15 years.

The evolution of semiconductor technology led to the birth of ChatGPT

From software and algorithms down through architecture, circuit design, and even device technology, improvements at every layer of the stack have dramatically boosted AI performance.

But it is the continuous improvement of the underlying transistor technology that has made all of this possible:

IBM's Deep Blue ran on chips built with 0.6-micron and 0.35-micron processes.

In 2012, Ilya Sutskever's team trained the deep neural network that won that year's ImageNet competition on GPUs built with a 40 nm process.

In 2016, DeepMind's AlphaGo defeated Lee Sedol running on chips built with a 28 nm process.

The chips used to train ChatGPT were built on a 5 nm process, and the latest ChatGPT inference servers have already moved to 4 nm.

From 1997 to the present, progress at each semiconductor process node has underpinned AI's rapid development.

If the AI revolution is to continue at its current pace, it needs innovation and support from the semiconductor industry.

Look closely at AI's computing requirements and you will find that the compute and memory access needed for AI training have grown by several orders of magnitude over the past five years.

Training GPT-3, for example, requires the equivalent of more than 5 quintillion (5×10^18) operations per second sustained for an entire day (that is, about 5,000 petaflop-days of compute), plus 3 terabytes (3 trillion bytes) of memory capacity.
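
To make that figure concrete, here is a back-of-the-envelope conversion of the quoted numbers into total operations and petaflop-days (the figures are the article's; published estimates of GPT-3's training budget vary):

```python
# Convert the quoted GPT-3 training budget into total operations
# and petaflop-days. Numbers are as quoted above, not official figures.

ops_per_second = 5e18            # ~5 quintillion operations per second
seconds_per_day = 24 * 60 * 60   # one full day of sustained compute

total_ops = ops_per_second * seconds_per_day
print(f"Total operations: {total_ops:.2e}")  # -> ~4.32e+23

# 1 petaflop = 1e15 ops/s, so this rate sustained for one day equals:
petaflop_days = ops_per_second / 1e15
print(f"Compute budget: {petaflop_days:,.0f} petaflop-days")  # -> 5,000
```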

With the advent of a new generation of generative AI applications, the demand for computing power and memory access is still increasing rapidly.

This raises a pressing question: how can semiconductor technology keep up with this pace of development?

From integrated chips to integrated chiplets

Since the invention of integrated circuits, the semiconductor industry has been looking for ways to make chips smaller so that more transistors can fit into a chip the size of a fingernail.

Today, transistor integration and packaging technologies have moved to the next level: the industry has gone beyond scaling in 2D to the integration of 3D systems.

The chip industry is consolidating multiple chips into a more integrated, highly interconnected system, marking a giant leap forward in semiconductor integration technology.

In the AI era, one bottleneck in chip manufacturing is that lithography tools can only pattern chips with an area of no more than about 800 square millimeters, the so-called lithography limit.

But now, TSMC can push past that limit by connecting multiple chips on a single silicon wafer with embedded interconnects, enabling large-scale integration that is impossible on a single chip.

TSMC's CoWoS technology, for example, can pack up to six reticle-limit-sized compute chips together with a dozen high-bandwidth memory (HBM) stacks in a single package.
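
A rough area tally shows the scale this enables; the ~800 mm² reticle limit comes from the article, while the per-stack HBM footprint below is an assumed placeholder for illustration only:

```python
# Illustrative silicon-area tally for a CoWoS package.
RETICLE_LIMIT_MM2 = 800   # ~max chip area printable in one lithography field
HBM_FOOTPRINT_MM2 = 100   # assumed per-stack footprint (placeholder value)

compute_chips = 6         # reticle-sized compute dies per package
hbm_stacks = 12           # HBM stacks per package

total_silicon = compute_chips * RETICLE_LIMIT_MM2 + hbm_stacks * HBM_FOOTPRINT_MM2
ratio = total_silicon / RETICLE_LIMIT_MM2
print(f"~{total_silicon} mm^2 of silicon, {ratio:.1f}x a single-reticle chip")
```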

High-bandwidth memory (HBM) is an example of another key semiconductor technology that the AI field increasingly relies on: integrating systems by stacking chips vertically, a technique TSMC calls system-on-integrated-chips (SoIC).

HBM consists of multiple DRAM chips stacked vertically on top of a control logic IC. Through-silicon vias (TSVs) carry signals vertically through each chip layer, and solder balls connect the individual memory chips.

Currently, state-of-the-art GPUs rely heavily on HBM technology.

In the future, 3D SoIC technology will offer a new solution that enables denser vertical connections between stacked chips compared to existing HBM technology.

With the latest hybrid bonding technology, 12 layers of chips can be stacked into a new HBM structure, using copper-to-copper connections that are denser than traditional solder-ball connections.

Address: https://ieeexplore.ieee.org/document/9265044

This memory system is thermally bonded atop a larger base logic chip, and the overall stack is only 600 microns thick.

In high-performance computing systems made up of many chips running large AI models, high-speed wired communication could become the next bottleneck on computing speed.

At present, data centers have begun to use optical interconnect technology to connect server racks.

TSMC believes that, in the near future, silicon-photonics-based optical interfaces will need to be packaged directly with GPUs and CPUs.

Address: https://ieeexplore.ieee.org/document/10195595

This would enable optical communication between GPUs, improving the energy and area efficiency of bandwidth and allowing hundreds of servers to operate like one giant GPU with unified memory.

Therefore, driven by AI applications, silicon photonics will become one of the most critical technologies in the semiconductor industry.

Towards a trillion-transistor GPU

The GPU chips currently used for AI training, at roughly 100 billion transistors, have already reached the limit of what lithography can produce on a single die.

To keep increasing the transistor count, multiple chips must be combined, integrated through 2.5D and 3D technologies, to share the computing task.

Advanced packaging technologies such as CoWoS and SoIC are already available to integrate more transistors into GPUs.

TSMC predicts that in the next decade, a single GPU with multi-chip packaging technology will have more than 1 trillion transistors.

At the same time, these chips need to be connected via 3D stacking technology.

Fortunately, the semiconductor industry has been able to drastically reduce the spacing of vertical connections, increasing connection density.

Moreover, there is huge potential for increasing connection density in the future. TSMC believes that it is entirely possible for the density of connections to grow by an order of magnitude, if not more.
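
That headroom follows from simple geometry: for a regular grid of vertical connections, density scales with the inverse square of the bond pitch, so shrinking the pitch pays off quadratically. The pitch values below are purely illustrative, not TSMC roadmap figures:

```python
# Density of vertical connections vs. bond pitch (square grid assumed).
# Halving the pitch quadruples the connections per unit area.

def density_per_mm2(pitch_um: float) -> float:
    """Connections per mm^2 for a square grid with the given pitch (in um)."""
    return (1000 / pitch_um) ** 2

for pitch in (9, 6, 3, 1):  # illustrative pitches, in micrometers
    print(f"pitch {pitch} um -> {density_per_mm2(pitch):>9,.0f} connections/mm^2")
```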

The density of vertical connections in 3D chips is growing at about the same rate as the number of transistors in GPUs

GPU energy efficiency performance trends

So, how do these leading hardware technologies improve the overall performance of the system?

Looking at the evolution of server GPUs, it is clear that so-called energy-efficient performance (EEP), a combined measure of a system's energy efficiency and speed, is steadily improving.

Over the past 15 years, the semiconductor industry has achieved the feat of increasing EEP by about 3 times every two years.
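
A quick compounding check, assuming smooth exponential growth, shows how this two-year cadence relates to the 1,000-fold, 15-year prediction in the headline:

```python
# Compound-growth check: does "3x every 2 years" cover "1,000x in 15 years"?

def compound_growth(factor_per_period: float, period_years: float,
                    total_years: float) -> float:
    """Total growth after compounding factor_per_period every period_years."""
    return factor_per_period ** (total_years / period_years)

print(f"3x/2yr over 15 years: {compound_growth(3, 2, 15):,.0f}x")  # ~3,788x

# The per-2-year factor needed to reach exactly 1,000x in 15 years:
print(f"Needed per-2-year factor: {1000 ** (2 / 15):.2f}x")        # ~2.51x
```

On those assumptions, the 1,000-fold figure is actually the more conservative of the two claims.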

In TSMC's view, this growth trend will continue, driven by innovation on many fronts: new materials, advances in device and integration technology, breakthroughs in EUV lithography, circuit design optimization, new system architectures, and the joint optimization of all of these elements.

In addition, the concept of system-technology co-optimization (STCO) will become increasingly important.

In STCO, the different functional blocks within the GPU are assigned to dedicated chiplets, each built with the technology that best suits its performance and cost-effectiveness.

This optimal selection for each component will play a key role in improving overall performance and reducing costs.

Thanks to advances in semiconductor technology, the EEP indicator is expected to increase by a factor of three every two years

A revolutionary moment for 3D integrated circuits

In 1978, Professor Carver Mead of the California Institute of Technology and Lynn Conway of Xerox PARC jointly developed a revolutionary approach to computer-aided chip design.

They developed a set of design rules that simplified chip design, making it possible for engineers to design complex very-large-scale integration (VLSI) circuits without being well versed in process technology.

Address: https://ai.eecs.umich.edu/people/conway/VLSI/VLSIText/PP-V2/V2.pdf

The field of 3D chip design now faces a similar need:

- Designers need to be proficient in chip and system architecture design while also understanding hardware and software optimization.

- Manufacturers, on the other hand, need in-depth knowledge of chip technology, 3D integrated circuit technology, and advanced packaging technology.

Just as in 1978, we need a common language that electronic design tools can use to understand these technologies.

A new hardware description language, 3Dblox, is now supported by most major technology and electronic design automation (EDA) companies.

It gives designers the freedom to design 3D integrated circuit systems without worrying about the limitations of the underlying technology.

Out of the tunnel, toward the future

In the tide of artificial intelligence, semiconductor technology has become a key force driving the development of AI and its applications.

The new generation of GPUs has broken the traditional size and shape limitations. The development of semiconductor technology is no longer limited to the reduction of transistors in a two-dimensional plane.

An AI system should integrate as many energy-efficient transistors as possible, use a system architecture optimized for its specific computing tasks, and optimize the relationship between its hardware and software.

Over the past 50 years, advancing semiconductor technology has been like walking through a well-lit tunnel: everyone knew what to do next, namely keep shrinking the transistor.

Now, we've come to the end of this tunnel.

The development of semiconductor technology in the future will face more challenges, but at the same time, there are also broader possibilities outside the tunnel.

And we will no longer be bound by the constraints of the past.
