NVIDIA's new "nuclear bomb" GPU: 80 billion transistors on a 4nm process, and the new Hopper architecture is explosive

Fengse & Xiaoxiao, from Aofei Temple

Qubits | Official account QbitAI

Here he comes, here he comes: Lao Huang has arrived with NVIDIA's latest-generation GPU.

Everyone who guessed 5nm guessed wrong. In a big surprise, Lao Huang went straight to TSMC's 4nm process.

The new card, named H100, uses the brand-new Hopper architecture and packs 80 billion transistors, 26 billion more than the previous-generation A100.

The CUDA core count soars to an unprecedented 16,896, nearly 2.5 times that of the previous-generation A100.

Floating-point and Tensor Core throughput also increase by at least 3x; FP32, for example, reaches 60 teraflops (60 trillion operations per second).

Crucially, the H100 is aimed at AI computing and ships with an optimization engine for Transformers, which speeds up large-model training by as much as 6x.

(Think of the secret behind the 530 billion parameters of Megatron-Turing.)

With performance this explosive, the H100 will no doubt join its predecessors, the V100 and A100, as a treasured workhorse in the hearts of AI practitioners.

That said, its power consumption has exploded as well, reaching an unprecedented 700W; the "nuclear bomb" nickname is back in force.

The conference also announced more details about the self-developed Grace CPU.

Unexpectedly, Lao Huang took a page from Tim Cook's playbook: 1 + 1 = 2, "gluing" two CPUs together to form a CPU superchip, the Grace CPU Superchip.

The Grace CPU uses the latest Arm v9 architecture; the two chips together have 144 cores (72 each) and 1TB/s of memory bandwidth, higher than the 800GB/s of Apple's latest M1 Ultra.

Building on the new CPU and GPU hardware, the conference also brought the next-generation enterprise AI infrastructure DGX H100 and the world's fastest AI supercomputer, Eos.

And of course, NVIDIA, a true metaverse pioneer, did not skip new progress on Omniverse.

Let's take a closer look.

The first Hopper-architecture GPU, with explosive performance

As the successor to the Ampere-architecture A100, how big a leap does the Hopper-equipped H100 actually make?

Without further ado, let's start with the specs:

Lao Huang clearly spared no expense, going straight to TSMC's 4nm process and cramming in 80 billion transistors in one go.

Keep in mind that the previous-generation A100 was still on a 7nm process; before the event, many outside voices speculated Lao Huang would go with 5nm, so the announcement was a big surprise for everyone.

Most striking of all, the CUDA core count soars to 16,896, nearly 2.5 times that of the A100. (From V100 to A100, the core count grew only a little.)

This time, nobody can tease Lao Huang about his famously precise "knife work" on specs.

Looking at floating-point and tensor operations across INT8/FP16/TF32/FP64, performance improves by more than 3x nearly across the board, making the previous two generations of architecture upgrades look small by comparison.

This also pushes the H100's thermal design power (TDP) to an unprecedented 700W; the NVIDIA "nuclear bomb factory" lives up to its name. (Tongue firmly in cheek.)

On top of that, the H100 is also the first GPU to support PCIe 5.0 and HBM3, and data throughput soars further still: memory bandwidth reaches 3TB/s.

What does that mean in concrete terms?

Lao Huang smiled mysteriously at the press conference: with just 20 H100s in hand, you could carry the entire world's internet traffic.
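
As a back-of-envelope sanity check: one plausible reading of that boast (our interpretation, since the keynote did not spell out the methodology) is aggregate memory bandwidth:

```python
# Back-of-envelope reading of the "20 H100s carry global internet traffic"
# boast, assuming it refers to aggregate HBM3 memory bandwidth (our
# interpretation, not NVIDIA's stated methodology).
H100_MEM_BW_TB_S = 3.0   # TB/s per card, from the keynote specs
NUM_GPUS = 20

total_tb_s = H100_MEM_BW_TB_S * NUM_GPUS   # 60 TB/s
total_tbit_s = total_tb_s * 8              # 480 Tb/s

print(f"{total_tb_s:.0f} TB/s = {total_tbit_s:.0f} Tb/s")
# Public estimates put global internet traffic in the hundreds of Tb/s,
# so the order of magnitude is at least plausible.
```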

Here is the full spec comparison with the previous-generation A100 and V100:

△ Image source: @anandtech

It is worth mentioning that the name of the new Hopper GPU architecture, paired with NVIDIA's Grace CPU, spells out Grace Hopper, the famous computer scientist, a name NVIDIA also uses for its superchip.

Grace Hopper created the world's first compiler and helped shape the COBOL language, and is known as the "First Lady of Computer Software Engineering."

Train a 395-billion-parameter model in just 1 day

Of course, Hopper's novelty goes far beyond the spec sheet.

This time, Lao Huang made a point of highlighting the Transformer Engine that debuts with Hopper.

Built specifically for Transformers, it lets this class of models keep its accuracy while training up to 6x faster, which means training time drops from weeks to days.
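
The underlying trick is dynamically mixing lower and higher precision without losing accuracy. As a rough analogy using today's widely available tooling, here is a minimal mixed-precision training loop in PyTorch; it uses FP16 via torch.cuda.amp rather than the H100's FP8, and the model and data are toy placeholders:

```python
import torch
import torch.nn as nn

# Toy stand-in for a Transformer block; requires a CUDA GPU.
model = nn.Sequential(nn.Linear(512, 2048), nn.GELU(), nn.Linear(2048, 512)).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()
scaler = torch.cuda.amp.GradScaler()  # rescales gradients so FP16 doesn't underflow

for step in range(100):
    x = torch.randn(32, 512, device="cuda")
    target = torch.randn(32, 512, device="cuda")
    optimizer.zero_grad(set_to_none=True)
    # Inside autocast, ops run in FP16 where it is numerically safe and FP32
    # where it isn't -- the same accuracy-preserving idea the Transformer
    # Engine applies down at FP8.
    with torch.cuda.amp.autocast():
        loss = loss_fn(model(x), target)
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
```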

So what do the speedups look like in practice?

Whether it's training the 175-billion-parameter GPT-3 (19 hours) or a 395-billion-parameter Transformer model (21 hours), the H100 cuts training time from a week to under a day, a speedup of up to 9x.

Inference performance jumps too: on the 530-billion-parameter Megatron model NVIDIA launched, H100 inference throughput is a full 30x that of the A100, with response latency down to 1 second. It handles the job perfectly.

You have to admit, with this move NVIDIA has firmly thrown in its lot with the Transformer camp.

Before this, NVIDIA's GPU optimizations were basically aimed at convolutional architectures; it might as well have had "I love convolution" stamped on its forehead.

Blame the Transformer for simply being too popular lately. (Tongue in cheek.)

Of course, the H100's highlights don't stop there: alongside it and a series of other NVIDIA chips comes the fourth generation of NVIDIA NVLink interconnect technology.

In other words, chips can be ganged together more efficiently, with I/O bandwidth extended to 900GB/s.

This time, Lao Huang also stressed GPU security, including isolation between instances and the new GPU's confidential computing features.

Mathematical computing power has improved as well.

The H100's new DPX instructions accelerate dynamic programming: algorithms in areas such as route optimization and genomics run up to 7x faster.
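
The "genomics" part refers to dynamic-programming kernels like sequence alignment. For reference, here is a minimal CPU sketch of the kind of recurrence DPX accelerates, Needleman-Wunsch global alignment scoring (an illustrative example of ours, not NVIDIA's implementation):

```python
def needleman_wunsch(a: str, b: str, match=1, mismatch=-1, gap=-1) -> int:
    """Global alignment score via the classic dynamic-programming recurrence."""
    n, m = len(a), len(b)
    # dp[i][j] = best score for aligning a[:i] against b[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = dp[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            dp[i][j] = max(diag,                # align a[i-1] with b[j-1]
                           dp[i - 1][j] + gap,  # gap in b
                           dp[i][j - 1] + gap)  # gap in a
    return dp[n][m]

print(needleman_wunsch("GCATGCU", "GATTACA"))  # 0, the classic textbook pair
```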

According to Lao Huang, the H100 will start shipping in the third quarter of this year, and netizens joke that "it presumably won't come cheap."

The H100 will come in two versions:

One is the SXM version, drawing up to 700W, for high-performance servers; the other is the PCIe version for more mainstream servers, drawing 350W, 50W more than the previous-generation A100's 300W.

4,608 H100s: building the world's fastest AI supercomputer

With the H100 released, Lao Huang naturally wouldn't miss a chance to build a supercomputer.

The latest DGX H100 computing system, built on the H100, packs 8 GPUs just like the previous-generation "oven."

The difference is that the DGX H100 system delivers 32 petaflops of AI performance at FP8 precision, a full 6x the previous-generation DGX A100 system.

GPU-to-GPU connections are faster too: 900GB/s, close to 1.5 times the previous generation.
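
That "close to 1.5 times" matches the published per-GPU NVLink figures:

```python
# Aggregate NVLink bandwidth per GPU, in GB/s (published figures).
nvlink3_a100 = 600   # 3rd-gen NVLink on A100
nvlink4_h100 = 900   # 4th-gen NVLink on H100
print(nvlink4_h100 / nvlink3_a100)  # 1.5
```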

Most important of all, NVIDIA also built the Eos supercomputer on top of DGX H100, vaulting to the performance TOP 1 of AI supercomputing in one stroke.

Its 18.4 exaflops of AI computing performance alone is 4x that of Japan's Fugaku supercomputer.

The machine comprises 576 DGX H100 systems, for a total of 4,608 H100s.

Even on traditional scientific computing, it reaches 275 petaflops (Fugaku manages 442 petaflops), comfortably placing it among the top 5 supercomputers.
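
The headline numbers hang together, assuming the commonly cited ~4 petaflops of FP8 (with sparsity) per H100, a figure from NVIDIA's spec sheet rather than this article:

```python
# Sanity-check Eos's headline figures.
DGX_SYSTEMS = 576
GPUS_PER_DGX = 8
FP8_PFLOPS_PER_H100 = 4.0   # with sparsity; assumption from NVIDIA's specs

gpus = DGX_SYSTEMS * GPUS_PER_DGX                 # 4608 H100s
ai_exaflops = gpus * FP8_PFLOPS_PER_H100 / 1000   # 18.4 EFLOPS
print(gpus, round(ai_exaflops, 1))
# One DGX H100: 8 * 4 = 32 PFLOPS FP8, matching the figure above.
```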

"Assemble" the CPU, running divided into TOP1

At this GTC, Lao Huang also gave the Grace super server chip a few more mentions.

It first appeared at last April's GTC, and as before, Lao Huang said supply is expected to begin in 2023; it won't make it this year no matter what.

Still, Grace's performance is worth talking about, with "amazing progress."

It appears in two superchips:

One is the Grace Hopper Superchip, a single MCM combining a Grace CPU with a Hopper-architecture GPU.

The other is the Grace CPU Superchip, two Grace CPUs interconnected via NVIDIA's NVLink-C2C technology, totaling 144 Arm cores with up to 1TB/s of memory bandwidth: a 2x increase in bandwidth, at "only" 500W of power.

It's hard not to think of Apple's M1 Ultra; advances in die-to-die interconnects seem to have made "assembly" a major trend in the chip industry.

The Grace Superchip scores 740 in the SPECrate2017_int_base benchmark, about 1.5 times that of the CPUs currently in the DGX A100 (460 points).

Grace Superchips can run across all NVIDIA computing platforms, either as standalone CPU-only systems or in GPU-accelerated servers, using NVLink-C2C to pair with one to eight Hopper-architecture GPUs.

(No sooner said than done: Lao Huang keeps stacking chips.)

It is worth mentioning that NVIDIA has opened up NVLink-C2C to third-party custom chips.

It is an ultra-fast chip-to-chip, die-to-die interconnect that supports coherent interconnection between custom dies and NVIDIA GPUs, CPUs, DPUs, NICs, and SoCs.

Perhaps Nintendo's next handheld has something to look forward to?

Even industry now gets done in the metaverse

Of course, beyond all the above, NVIDIA also showed plenty of industrial application cases this time.

Scenarios such as autonomous driving and digital twins, including virtual factories, are inextricably linked to computer rendering and simulation technology.

NVIDIA believes industry can also grow its AI training data through simulation in virtual environments; in other words, "do the big training in the metaverse."

For example, let self-driving AI "practice" in the metaverse, using simulated data to build a semi-real environment and injecting simulations of conditions that might suddenly go wrong:

Another example is building a "digital factory" at full scale, with materials whose parameters exactly match the real environment, trial-running it before construction begins, and catching in advance the places where problems might arise.

Beyond digital twins, producing digital assets is also a major consideration in the early construction phase of the metaverse.

To that end, NVIDIA launched Omniverse Cloud, enabling collaboration in the cloud anytime, anywhere.

Most interestingly, the conference also demonstrated an AI-driven virtual character system.

In 3 days of real time, the virtual characters spent the equivalent of 10 years in the metaverse learning kung fu through reinforcement learning, a simulation time compression of roughly 1,200x.

Once their skills are honed, they make fine "stunt actors" for both games and animation.

They can generate animation without rigging bones or keyframing, and can take commands in natural language, much as a director talks to live actors, greatly shortening the development pipeline.

When it comes to metaverse infrastructure, you really do have to watch Lao Huang.

VentureBeat commented that "these cases give real meaning to the metaverse."

So, are you bullish on NVIDIA's Omniverse prospects?
