Jensen Huang's late-night bombshell: the H200, the world's most powerful AI chip! Performance up as much as 90%, Llama 2 inference speed doubled

Author: New Zhiyuan

Editor: Editorial Department

[New Zhiyuan guide] Just now, NVIDIA released the world's most powerful AI chip, the H200, delivering 60% to 90% higher performance than the H100 while remaining compatible with it. With computing power in short supply, big tech companies are about to start stockpiling frantically.

NVIDIA's release cadence is getting downright terrifying.

Just now, Jensen Huang once again lit up the field in the middle of the night, releasing the world's most powerful AI chip: the H200!

Compared with the reigning champion, the H100, the H200 improves performance by a full 60% to 90%.

Not only that, the two chips are compatible with each other, which means enterprises currently training or serving models on the H100 can switch seamlessly to the new H200.

AI companies around the world are starved for computing power, and NVIDIA's GPUs are nearly impossible to get. NVIDIA has also previously said that its architecture cadence will shift from a release every two years to one every year.

Nvidia's announcement comes as AI companies are scrambling to find more H100s.

Nvidia's high-end chips are so valuable that they have become collateral for loans.

Who owns how many H100s is the juiciest gossip in Silicon Valley

As for H200 systems, NVIDIA said it expects them to launch in the second quarter of next year.

Also next year, NVIDIA will release the Blackwell-based B100, and it plans to triple H100 production in 2024, targeting more than 2 million units.

At the launch, NVIDIA never mentioned a single competitor, only repeatedly emphasizing that "NVIDIA's AI supercomputing platform can solve some of the world's most important challenges faster."

With generative AI exploding, demand is only going to grow, and that's before even counting the H200. Jensen Huang has won, and won big!

141 GB of memory, performance nearly doubled!

The H200 supercharges the world's leading AI computing platform.

Built on the Hopper architecture, the NVIDIA H200 Tensor Core GPU pairs its compute with advanced memory to process the massive amounts of data demanded by generative AI and high-performance computing workloads.

The NVIDIA H200 is the first GPU to feature HBM3e, with up to 141 GB of memory.

Compared to the A100, the H200 nearly doubles capacity and delivers 2.4x the bandwidth. Compared to the H100, bandwidth rises from 3.35 TB/s to 4.8 TB/s.
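
A quick sanity check on those ratios, sketched in Python. The A100 80GB baseline figures (80 GB and roughly 2.0 TB/s) are our assumption, since the article only quotes the multiples:

```python
# Rough spec comparison. The A100 80GB baseline is assumed (not quoted
# in the article); the H100/H200 figures come from the text above.
specs = {
    "A100 80GB": {"mem_gb": 80,  "bw_tbps": 2.0},   # assumed baseline
    "H100 SXM":  {"mem_gb": 80,  "bw_tbps": 3.35},
    "H200":      {"mem_gb": 141, "bw_tbps": 4.8},
}

base = specs["A100 80GB"]
for name, s in specs.items():
    print(f"{name:9s}: {s['mem_gb']:3d} GB "
          f"({s['mem_gb'] / base['mem_gb']:.2f}x), "
          f"{s['bw_tbps']:.2f} TB/s ({s['bw_tbps'] / base['bw_tbps']:.1f}x)")
# H200 vs A100: 1.76x capacity ("almost doubled") and 2.4x bandwidth.
```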

Ian Buck, NVIDIA's vice president of hyperscale and high-performance computing, said:

To create intelligence with generative AI and high-performance computing applications, massive amounts of data must be processed at high speed with large, fast GPU memory. With the H200, the industry's leading end-to-end AI supercomputing platform just got faster, ready to solve some of the world's most important challenges.

Llama 2 inference speed increased by nearly 100%

The Hopper architecture already delivered an unprecedented leap over its predecessor, and continuous upgrades to the H100, together with the powerful open-source TensorRT-LLM library, keep raising the performance bar.

The H200 takes that leap another level up, nearly doubling the inference speed of the Llama 2 70B model relative to the H100!

The H200 is based on the same Hopper architecture as the H100. This means that, in addition to the new memory features, the H200 has the same features as the H100, such as the Transformer Engine, which accelerates LLMs and other deep learning models based on Transformer architectures.

Powered by NVIDIA NVLink and NVSwitch high-speed interconnects, the HGX H200 delivers more than 32 petaflops of FP8 deep learning compute and 1.1 TB of aggregate high-bandwidth memory.
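
Those board-level numbers are essentially eight GPUs' worth of resources added up. A back-of-the-envelope check in Python (the ~4 PFLOPS of FP8 per GPU, a with-sparsity figure carried over from the H100 spec, is our assumption):

```python
# Back-of-the-envelope check on the eight-way HGX H200 figures.
gpus = 8
mem_per_gpu_gb = 141       # from the article
fp8_per_gpu_pflops = 4.0   # assumed: H100's ~4 PFLOPS FP8 (with sparsity), unchanged

print(f"Aggregate memory:  {gpus * mem_per_gpu_gb} GB (~1.1 TB)")         # 1128 GB
print(f"Aggregate compute: ~{gpus * fp8_per_gpu_pflops:.0f} PFLOPS FP8")  # ~32 PFLOPS
```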

When the H200 is used instead of the H100 and paired with the NVIDIA Grace CPU, the result is the even more powerful GH200 Grace Hopper superchip, a compute module designed for large-scale HPC and AI applications.

Let's look at where the H200's gains over the H100 actually show up.

First, the improvements are concentrated in large-model inference performance.

As mentioned above, when running large language models such as Llama 2, the H200's inference speed is nearly double the H100's.
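
Why does more bandwidth translate so directly into inference speed? LLM decoding is usually memory-bound: each generated token has to stream the model weights through the GPU. Here is a crude single-GPU upper-bound sketch; the FP16-weights assumption and the framing are ours, not NVIDIA's methodology:

```python
# Crude bandwidth-bound estimate of decode throughput for Llama 2 70B.
params = 70e9          # Llama 2 70B parameters
bytes_per_param = 2    # assumed: FP16 weights dominate memory traffic
weight_bytes = params * bytes_per_param

for name, bw_tbps in [("H100", 3.35), ("H200", 4.8)]:
    tokens_per_s = bw_tbps * 1e12 / weight_bytes
    print(f"{name}: ~{tokens_per_s:.0f} tokens/s upper bound per GPU")
# Bandwidth alone accounts for ~1.4x; the near-2x figure also reflects the
# larger 141 GB capacity, which permits bigger batches and longer KV caches.
```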

Because the compute cores themselves saw only a minor update, the gain on the 175-billion-parameter GPT-3, for example, is about 10%.

Memory bandwidth is critical for high-performance computing (HPC) applications because it enables faster data transfer and reduces processing bottlenecks for complex tasks.

For memory-intensive HPC applications such as simulation, scientific research, and artificial intelligence, the H200's higher memory bandwidth ensures efficient access to and manipulation of data, with up to 110x faster time to results compared to CPUs.

Compared with the H100, the H200 also delivers a more than 20% improvement on high-performance computing applications.

As for inference energy consumption, which matters a great deal to users, the H200 cuts it in half compared with the H100.

That means dramatically lower operating costs for users, letting them keep living by the motto "the more you buy, the more you save"!

Last month, SemiAnalysis revealed an NVIDIA hardware roadmap for the next few years, including the much-anticipated H200, B100 and "X100" GPUs.

NVIDIA has now also announced its official product roadmap, which designs three chips on a shared architecture per generation, with the B100 arriving next year and the X100 the year after.

B100: performance off the charts

This time, by announcing the new H200 and B100 together, NVIDIA has doubled the pace of its data-center chip updates, from once every two years to once a year.

Taking the 175-billion-parameter GPT-3 as an example: this year's H100 delivers 11x the performance of the previous-generation A100, next year's H200 runs more than 60% faster than the H100, and the B100 after that promises an even bigger jump.
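
Compounding the article's own multipliers shows how steeply the curve bends (simple arithmetic on the figures quoted above):

```python
# Chaining the GPT-3 inference multipliers quoted above.
a100 = 1.0
h100 = a100 * 11    # H100: 11x the A100
h200 = h100 * 1.6   # H200: more than 60% over the H100
print(f"H200 vs A100 on GPT-3 inference: roughly {h200:.0f}x in two generations")
```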

So far, the H100 is also on track to be the shortest-lived "flagship" GPU ever.

If the H100 is the "gold" of the tech industry today, then NVIDIA has just minted "platinum" and "diamond".

With the H200, a new generation of AI supercomputing centers is coming

On the cloud side, besides CoreWeave, Lambda and Vultr, in which NVIDIA has itself invested, Amazon Web Services, Google Cloud, Microsoft Azure and Oracle Cloud Infrastructure will be among the first vendors to deploy H200-based instances.

In addition, with the new H200 on board, GH200 superchips will supply a combined total of about 200 exaflops of AI computing power to supercomputing centers around the world, driving scientific innovation.

At the SC23 conference, a number of top supercomputing centers announced that they will soon build their own supercomputers on GH200 systems.

Germany's Jülich Supercomputing Centre will use GH200 superchips in its JUPITER supercomputer.

It will be Europe's first exascale supercomputer and is part of the EuroHPC Joint Undertaking.

JUPITER is based on Eviden's BullSequana XH3000 platform with a fully liquid-cooled architecture.

It packs a total of 24,000 NVIDIA GH200 Grace Hopper superchips, interconnected via Quantum-2 InfiniBand.

With 288 Arm Neoverse cores per quad-GH200 node, JUPITER totals nearly 7 million Arm cores.

It will deliver 93 exaflops of low-precision AI computing power and 1 exaflop of high-precision (FP64) computing power, with installation expected in 2024.

Japan's Joint Center for Advanced High Performance Computing, established jointly by the University of Tsukuba and the University of Tokyo, will use NVIDIA GH200 Grace Hopper superchips in its next-generation supercomputer.

The Texas Advanced Computing Center, one of the world's largest supercomputing centers, will also use NVIDIA's GH200 to build its Vista supercomputer.

The National Center for Supercomputing Applications at the University of Illinois Urbana-Champaign will use GH200 superchips to build its DeltaAI supercomputer, tripling its AI computing power.

In addition, the University of Bristol will build Isambard-AI, the UK's most powerful supercomputer, funded by the UK government; it will carry more than 5,000 NVIDIA GH200 superchips and provide 21 exaflops of AI computing power.

Nvidia, AMD, Intel: The Big Three Battle AI Chips

The GPU race has reached a fever pitch.

Facing the H200, longtime rival AMD plans to answer with its upcoming heavyweight, the Instinct MI300X, which leans on bigger, faster memory.

The MI300X will be equipped with 192 GB of HBM3 and 5.2 TB/s of memory bandwidth, putting it ahead of the H200 on both capacity and bandwidth.
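
Putting the two spec sheets side by side makes the gap concrete (simple ratios on the figures quoted in this article):

```python
# MI300X vs H200, using the figures quoted above.
h200   = {"mem_gb": 141, "bw_tbps": 4.8}
mi300x = {"mem_gb": 192, "bw_tbps": 5.2}

print(f"Capacity:  {mi300x['mem_gb'] / h200['mem_gb']:.2f}x")    # ~1.36x
print(f"Bandwidth: {mi300x['bw_tbps'] / h200['bw_tbps']:.2f}x")  # ~1.08x
```

In other words, the capacity lead is substantial, while the bandwidth edge is more modest.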

Intel is also gearing up to boost the HBM capacity of its Gaudi AI chips, saying the third-generation Gaudi launching next year will move from the previous generation's 96 GB of HBM2e to 144 GB.

Intel's Max series currently has a maximum HBM2 capacity of 128GB, and Intel plans to increase the capacity of Max series chips in future generations.

H200 price unknown

So, how much will the H200 sell for? NVIDIA has not yet announced it.

Bear in mind that a single H100 goes for $25,000 to $40,000, and training an AI model takes at least several thousand of them.

Previously, the chart "How many GPUs do we need" circulated widely in the AI community.

GPT-4 was trained on roughly 10,000-25,000 A100s, Meta has about 21,000 A100s, Stability AI used about 5,000 A100s, and Falcon-40B was trained on 384 A100s.

According to Musk, GPT-5 may require 30,000-50,000 H100s; Morgan Stanley puts the figure at 25,000 GPUs.
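
At the prices quoted above, those fleet sizes translate into eye-watering budgets (rough arithmetic; both the prices and the counts are the third-party estimates cited here, not official figures):

```python
# Rough cost of a GPT-5-scale H100 fleet at the quoted price range.
price_range = (25_000, 40_000)   # USD per H100, as quoted above

for source, count in [("Musk's estimate", 50_000),
                      ("Morgan Stanley's estimate", 25_000)]:
    low, high = (count * p / 1e9 for p in price_range)
    print(f"{source}: {count:,} H100s -> ${low:.2f}B to ${high:.1f}B")
```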

Sam Altman has denied training GPT-5, but did mention that "OpenAI has a severe shortage of GPUs; the fewer people who use our product, the better."

What we do know is that when the H200 is launched in the second quarter of next year, it will inevitably cause a new storm.

Resources:

https://nvidianews.nvidia.com/news/nvidia-supercharges-hopper-the-worlds-leading-ai-computing-platform?ncid=so-twit-685372
