
Nvidia lowered the price of H20 AI chips for the Chinese market


Semiconductor Industry Vertical (ICVIEWS)

2024-05-24 18:19 · Posted in Beijing · Technology creator


This article was compiled by Semiconductor Industry Vertical (ID: ICVIEWS).

Among the NVIDIA products sold in China, the H20 artificial intelligence chip has attracted the most attention.


It is reported that the AI chips Nvidia developed for the Chinese market have gotten off to a weak start and supply is ample. Nvidia is now planning to reduce the price of the H20 artificial intelligence chips it supplies to the Chinese market.

The price cut in China, a market that accounted for 17% of Nvidia's revenue in fiscal 2024, highlights the challenges facing Nvidia's China business and casts a shadow over its future there.

Growing competitive pressure in China has also been a wake-up call for Nvidia investors, even as the company's share price extended its impressive upward run after it announced a strong revenue forecast on May 22.

The H20 is the strongest of the three AI chips Nvidia developed for the Chinese market (the HGX H20, L20 PCIe, and L2 PCIe), but its computing power is lower than that of Nvidia's flagship H100 and of the H800, which was also developed specifically for the Chinese market.

Of the three models (H20, L20, and L2), the H20 is positioned as a training card, while the L20 and L2 are inference cards; the H20 is based on the latest Hopper architecture, while the L20 and L2 are based on the Ada Lovelace architecture.


Judging from previously leaked specifications, the H20 has 96 GB of memory, a memory bandwidth of up to 4.0 TB/s, and 296 TFLOPS of computing power. It uses the GH100 die, with a performance density (TFLOPS / die size) of only 2.9. In other words, the AI computing power of the H20 is less than 15% of that of the H100.

The H20 has more cache and higher bandwidth than Huawei's Ascend 910B; its bandwidth is twice that of the 910B, giving it an advantage in interconnect speed, which determines how quickly data moves between chips. As a result, the H20 remains competitive with the 910B in applications where a large number of chips must be connected to work as a single system, and training a large model is exactly such a scenario.

At present, Huawei's Ascend community lists three models of the Atlas 300T product, corresponding to the Ascend 910A, 910B, and 910 Pro B, each with a maximum power consumption of 300 W; the first two deliver 256 TFLOPS of AI computing power, while the 910 Pro B reaches 280 TFLOPS (FP16).
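Putting the FP16 figures quoted above side by side gives a rough sense of the gap; note that the H20 number is a sparse FP16 figure, so the comparison is only indicative (a small sketch using the numbers reported in this article):

# FP16 compute figures quoted in this article, in TFLOPS
fp16_tflops = {
    "Nvidia H20": 296,        # sparse FP16 figure quoted above
    "Ascend 910A": 256,
    "Ascend 910B": 256,
    "Ascend 910 Pro B": 280,
}
print(fp16_tflops["Nvidia H20"] / fp16_tflops["Ascend 910B"])  # ~1.16, i.e. ~16% more peak FP16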

For comparison, the H100 has 80 GB of HBM3 memory, 3.4 TB/s of memory bandwidth, 1,979 TFLOPS of theoretical performance, and a performance density (TFLOPS / die size) of up to 19.4, making it the most powerful GPU in NVIDIA's current product line.

The H20 has 96 GB of HBM3 memory and up to 4.0 TB/s of memory bandwidth, both higher than the H100's, but its computing power is only 296 TFLOPS and its performance density is 2.9, far below the H100. On paper, the H100 is 6.68 times faster than the H20. Note, however, that this comparison is based on FP16 Tensor Core floating-point throughput with sparsity enabled (sparsity greatly reduces the amount of computation and therefore significantly increases speed), so it does not fully reflect either chip's overall computing power.

In addition, the H20 has a thermal design power of 400 W, lower than the H100's 700 W. It can be configured with eight GPUs in an HGX system (NVIDIA's GPU server solution), retains the 900 GB/s NVLink high-speed interconnect, and supports up to 7 MIG (Multi-Instance GPU) partitions.

H100 SXM FP16 Tensor Core TFLOPS (with sparsity) = 1979

H20 SXM FP16 Tensor Core TFLOPS (with sparsity) = 296
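As a quick sanity check, the ratios quoted above can be reproduced from these sparse FP16 figures (a simple illustration using the numbers in this article, not an official benchmark):

# Peak sparse FP16 throughput figures quoted above, in TFLOPS
h100_tflops = 1979
h20_tflops = 296

print(h100_tflops / h20_tflops)  # ~6.68: the "H100 is 6.68 times faster" figure
print(h20_tflops / h100_tflops)  # ~0.15: the "less than 15% of the H100" figure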

According to Patel's LLM performance comparison model, the H20's peak tokens per second at moderate batch sizes is 20% higher than the H100's, and its token-to-token latency at low batch sizes is 25% lower. This is because the number of chips required for inference drops from 2 to 1: with 8-bit quantization, the LLaMA 70B model can run efficiently on a single H20 instead of requiring two H100s.
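A rough back-of-the-envelope calculation shows why the single-card claim is plausible; the memory figures are the ones quoted in this article, and ignoring KV-cache and activation overhead is a simplification for illustration:

# Approximate weight memory for a 70B-parameter model at 8-bit quantization
params = 70e9               # LLaMA 70B parameter count
bytes_per_param = 1         # 8-bit quantization: ~1 byte per weight
weights_gb = params * bytes_per_param / 1e9   # ~70 GB of weights

h20_memory_gb = 96          # H20 memory capacity quoted above
h100_memory_gb = 80         # H100 memory capacity quoted above

print(h20_memory_gb - weights_gb)   # ~26 GB of headroom on one H20 for KV cache and activations
print(h100_memory_gb - weights_gb)  # ~10 GB on one H100, often too little in practice, hence 2x H100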

It is worth mentioning that although the H20's peak computing power is only 296 TFLOPS, far below the H100's 1,979, if the H20's actual model FLOPs utilization (MFU) reaches about 90% (the H100's current MFU is only 38.1%), it can effectively deliver around 270 TFLOPS, so in a real multi-card interconnect environment the H20's performance approaches 50% of the H100's.
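The effective-throughput reasoning above follows the usual definition of model FLOPs utilization (MFU); a minimal sketch of the arithmetic, assuming the roughly 90% utilization implied by the 270 TFLOPS figure:

# Effective throughput = peak throughput x MFU (model FLOPs utilization)
h20_peak_tflops = 296
h20_mfu = 0.91              # assumed utilization implied by the ~270 TFLOPS figure above
print(h20_peak_tflops * h20_mfu)   # ~270 effective TFLOPS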

From a traditional computing perspective, the H20 is a downgrade from the H100, but for LLM inference the H20 can actually be more than 20% faster than the H100, on the grounds that the H20 resembles the newer H200 in some respects. The H200 is the successor to the H100, a chip aimed at complex AI and HPC workloads.

Meanwhile, the L20 comes with 48 GB of memory and 239 TFLOPS of compute performance, while the L2 comes with 24 GB of memory and 193 TFLOPS. The L20 is based on the L40 and the L2 on the L4, two chips that are not commonly used for LLM inference and training.
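Pulling together the figures quoted above, the three China-market parts can be summarized as follows (a summary of the reported specifications, not official data sheets):

# Reported specs of Nvidia's China-market lineup: architecture, role, memory (GB), sparse FP16 TFLOPS
china_lineup = {
    "HGX H20":  ("Hopper",       "training",  96, 296),
    "L20 PCIe": ("Ada Lovelace", "inference", 48, 239),
    "L2 PCIe":  ("Ada Lovelace", "inference", 24, 193),
}
for model, (arch, role, mem_gb, tflops) in china_lineup.items():
    print(f"{model}: {arch}, {role}, {mem_gb} GB, {tflops} TFLOPS")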

Both the L20 and L2 use the PCIe form factor common in workstations and servers, and are more cut down than higher-end models such as the Hopper H800 and the A800.

But NVIDIA's software stack for AI and high-performance computing is so valuable to some customers that they are reluctant to abandon the Hopper architecture, even if the specs are downgraded.

L40 FP16 Tensor Core TFLOPS (with sparsity) = 362

L20 FP16 Tensor Core TFLOPS (with sparsity) = 239

L4 FP16 Tensor Core TFLOPS (with sparsity) = 242

L2 FP16 Tensor Core TFLOPS (with sparsity) = 193
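Read against the non-China parts they derive from, the cut is easy to quantify (a quick arithmetic check of the figures above):

# Share of the original part's sparse FP16 throughput that each China variant retains
print(239 / 362)   # L20 vs L40: ~0.66, roughly two-thirds
print(193 / 242)   # L2 vs L4:  ~0.80, roughly four-fifths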

Let's look at the mass-production progress of the H200. In March of this year, NVIDIA announced that it would start shipping the H200, a cutting-edge GPU. The H200 is an AI-oriented chip with better performance than the current flagship GPU, the H100; Nvidia has launched one new AI chip after another with the aim of maintaining its high market share. Then in April, OpenAI president and co-founder Greg Brockman revealed on the social platform X that Nvidia had delivered the world's first DGX H200 to OpenAI, posting a photo of himself with OpenAI CEO Sam Altman and NVIDIA CEO Jensen Huang at the handover. Brockman said the machine Huang is building "will advance AI, computing and human civilization." However, Nvidia did not disclose the price of the DGX H200.

