
Give Nvidia 1.6 trillion transistors, and it will be able to support global Internet traffic


NVIDIA's annual GTC conference arrived as scheduled, and Hopper, the GPU architecture NVIDIA refreshes every two years, was officially unveiled.

This year, NVIDIA founder and CEO Jen-Hsun Huang unveiled a series of new products at NVIDIA's new headquarters building, from the new Hopper-architecture H100 GPU, to the Grace CPU superchip, to new hardware for cars and edge computing, along with comprehensive software updates.

The new releases once again announce to the outside world that NVIDIA is not just a chip company but a full-stack computing company. It is strengthening its leadership in areas such as AI and automotive, while also trying to capture the next wave of AI and the opportunities of the metaverse.

Of course, for the company that invented the GPU, the new GPU architecture is still the most noteworthy product of GTC 2022.


Named after Grace Hopper, a pioneering American computer scientist, NVIDIA's new Hopper architecture will replace the Ampere architecture introduced two years ago. Compared with the previous generation, the Hopper-based H100 GPU delivers an order-of-magnitude performance leap.

Huang said that 20 H100 GPUs can sustain the equivalent of global Internet traffic, enabling customers to roll out advanced recommendation systems and large language models that run inference on data in real time.

Systems built on the H100 GPU, including those that pair it with the Grace CPU superchip, combined with the software ecosystem NVIDIA has built up over many years, will be the fuel for the new wave of computing NVIDIA intends to set off.

The H100 GPU will ship in the third quarter of this year, and the Grace CPU superchip will be available in the first half of next year.

Six major breakthroughs of the latest Hopper-architecture H100 GPU

In 2020, Huang pulled the Ampere-architecture A100 GPU, then the world's largest 7nm chip, out of his own kitchen; two years later, it has a successor in the Hopper-architecture H100. Built on a TSMC 4N process customized for NVIDIA's accelerated computing needs, the H100 integrates 80 billion transistors, significantly speeds up AI, HPC, memory bandwidth, interconnects, and communication, and supports nearly 5 TB/s of external interconnect bandwidth.
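That transistor count is also where the headline figure comes from. Multiplying the two numbers quoted above, 20 GPUs for global Internet traffic and 80 billion transistors apiece, yields the 1.6 trillion of the title; the quick check below is our arithmetic, not an official NVIDIA calculation:

```python
# Back-of-the-envelope check of the headline figure
transistors_per_h100 = 80e9   # 80 billion transistors per H100 (stated above)
gpus_for_global_traffic = 20  # Huang's figure for carrying global Internet traffic

total = transistors_per_h100 * gpus_for_global_traffic
print(f"{total:.2e} transistors")  # 1.60e+12, i.e. 1.6 trillion
```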


The H100 also combines multiple firsts in one: it is the first GPU to support PCIe 5.0, the first GPU to use HBM3 (achieving 3 TB/s of memory bandwidth), and the world's first GPU with confidential computing capabilities.

The H100's second breakthrough is its Transformer Engine, which can speed up Transformer networks to six times the previous generation without losing accuracy. The Transformer enables self-supervised learning, has become the standard model architecture for natural language processing, and is one of the most important model families in deep learning.
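To make concrete what the Transformer Engine is accelerating, here is a minimal NumPy sketch of scaled dot-product attention, the dense matrix workload at the core of every Transformer layer; the shapes and data are illustrative and are not NVIDIA's implementation:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Core Transformer operation: every token attends to every other token.
    These all-pairs matmuls are the dense workload that Hopper's
    Transformer Engine runs in reduced precision for speed."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                    # (seq, seq) similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                               # weighted mix of values

# Toy example: 4 tokens with 8-dimensional embeddings
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
print(scaled_dot_product_attention(Q, K, V).shape)  # (4, 8)
```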

The H100 will support chatbots based on Megatron 530B, the ultra-powerful monolithic Transformer language model, with 30 times the throughput of the previous generation while meeting the sub-second latency required for real-time conversational AI.

The H100's third breakthrough is a further upgrade of second-generation multi-instance GPU (MIG) technology. In the previous generation, MIG could split each A100 GPU into seven independent instances to run inference tasks. The new Hopper-based H100 extends MIG's capabilities by up to 7x over the previous generation by providing a secure multi-tenant configuration for each GPU instance in the cloud.


MIG technology allows a single GPU to be divided into seven smaller, fully isolated instances that handle different types of tasks.
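For a sense of how this is driven in practice, the sketch below wraps the documented nvidia-smi MIG workflow in Python; profile ID 19 corresponds to the smallest (1g.5gb) slice on an A100 and varies by GPU, so treat the exact values as assumptions to verify with `nvidia-smi mig -lgip`:

```python
import subprocess

def run(cmd):
    """Run an nvidia-smi command (root required) and echo its output."""
    out = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    print(out.stdout or out.stderr)

run("nvidia-smi -i 0 -mig 1")                            # enable MIG mode on GPU 0
run("nvidia-smi mig -i 0 -cgi 19,19,19,19,19,19,19 -C")  # seven 1g.5gb instances
run("nvidia-smi mig -lgi")                               # list the resulting instances
```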

The H100's fourth breakthrough is that it is the world's first accelerator with confidential computing capabilities. Confidential computing was previously only possible on CPUs; the H100 is the first GPU to implement it, protecting AI models and customer data while they are being processed. The advantage of confidential computing is that it ensures the confidentiality of data without compromising performance, and it can be applied to federated learning in privacy-sensitive industries such as healthcare and financial services, as well as to shared cloud infrastructure.
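Federated learning, the use case named here, trains a shared model while each participant's raw data stays on its own hardware. A minimal federated-averaging round in NumPy (the clients, data, and learning rate are all made up for illustration) looks like this:

```python
import numpy as np

def local_update(weights, X, y, lr=0.1):
    """One gradient step of linear regression on a client's private data.
    Raw X and y never leave the client; only updated weights are shared."""
    grad = 2 * X.T @ (X @ weights - y) / len(y)
    return weights - lr * grad

rng = np.random.default_rng(1)
global_w = np.zeros(3)
# Three hypothetical hospitals/banks, each holding private local data
clients = [(rng.normal(size=(50, 3)), rng.normal(size=50)) for _ in range(3)]

for _ in range(100):
    # Each client trains locally; the server only averages weights (FedAvg)
    global_w = np.mean([local_update(global_w, X, y) for X, y in clients], axis=0)

print(global_w)  # a global model trained without pooling any raw data
```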

The H100's fifth breakthrough is improved interconnect performance, with support for fourth-generation NVIDIA NVLink. Today's AI models keep getting bigger, and bandwidth has become the obstacle limiting the iteration of hyperscale AI models. NVIDIA combines NVLink with a new external NVLink Switch to extend NVLink into a server-to-server interconnect network that connects up to 256 H100 GPUs, with up to 9 times the bandwidth of the previous generation built on NVIDIA HDR Quantum InfiniBand.

The immediate benefit of this breakthrough is that with the H100 GPU, researchers and developers can train huge models, such as a mixture-of-experts model with 395 billion parameters, up to 9 times faster, cutting training time from weeks to days.


The H100's sixth breakthrough is new DPX instructions that accelerate dynamic programming for a range of algorithms, including route optimization and genomics; NVIDIA's test data shows speedups of up to 40x over CPUs and 7x over previous-generation GPUs.

Among the algorithms accelerated by the H100's DPX instructions are the Floyd-Warshall algorithm, which can find optimal routes for autonomous robot fleets in dynamic warehouse environments, and the Smith-Waterman algorithm, which is used for DNA and protein classification and sequence alignment.
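Both are textbook dynamic programming: a table of solutions is filled in by repeatedly refining previously computed entries. A minimal Floyd-Warshall implementation in Python shows the relaxation loop that DPX-style instructions are built to accelerate (the warehouse graph is a made-up example):

```python
import math

INF = math.inf

def floyd_warshall(dist):
    """All-pairs shortest paths by dynamic programming.
    dist[i][j] is the direct edge weight (INF if no edge); the inner
    relaxation is the update pattern DPX instructions target."""
    n = len(dist)
    for k in range(n):            # allow routes through intermediate node k
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist

# Toy 4-node warehouse graph: entry [i][j] is the travel cost from i to j
graph = [[0,   3,   INF, 7],
         [8,   0,   2,   INF],
         [5,   INF, 0,   1],
         [2,   INF, INF, 0]]
print(floyd_warshall(graph))  # shortest routes between every pair of nodes
```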

In addition to the hardware breakthroughs, NVIDIA released a series of corresponding software updates, including the NVIDIA AI software suite for workloads such as speech, recommendation systems, and hyperscale inference, as well as more than 60 updates to CUDA-X libraries, tools, and technologies, accelerating research in areas such as quantum computing, 6G, cybersecurity, genomics, and drug discovery.

Clearly, the H100's six breakthroughs add up to higher computing performance, but the improvements and optimizations all point toward AI computing, further extending NVIDIA's leadership in that field.

NVIDIA Eos: 4 times the AI performance of the world's fastest supercomputer

With the performance-upgraded GPU, NVIDIA's fourth-generation DGX H100 systems, in DGX POD and DGX SuperPOD configurations, can meet the large-scale computing needs of large language models, recommendation systems, health and medical research, and climate science.


Each DGX H100 system is equipped with eight NVIDIA H100 GPUs connected by NVIDIA NVLink, achieving 32 petaflops of AI performance at the new FP8 precision, 6 times the performance of the previous-generation system. Each DGX H100 system also contains two NVIDIA BlueField-3 DPUs for offloading, accelerating, and isolating advanced networking, storage, and security services.

The new DGX SuperPOD architecture features the new NVIDIA NVLink Switch system, which connects up to 32 nodes for a total of 256 H100 GPUs. Fourth-generation NVLink, combined with NVSwitch, provides 900 GB/s of connectivity between every GPU in each DGX H100 system, 1.5 times that of the previous-generation system.

The new generation of DGX SuperPOD also delivers a significant performance jump: 1 exaflop of FP8 AI performance, up to 6 times higher than the previous generation, and the ability to run massive LLM workloads with trillions of parameters, helping to advance climate science, digital biology, and the future of AI.

Based on the DGX H100, NVIDIA will bring the world's fastest AI supercomputer online later this year: NVIDIA Eos. Eos will be equipped with 576 DGX H100 systems, for a total of 4,608 H100 GPUs, and is expected to provide 18.4 exaflops of AI computing performance, 4 times faster than Japan's Fugaku, currently the world's fastest supercomputer.
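The Eos numbers are internally consistent, and checking them also recovers the per-GPU FP8 rate implied by the 32-petaflop DGX H100 figure above (our cross-check of the quoted figures, not NVIDIA data):

```python
# Cross-checking the DGX H100 and Eos figures quoted above
gpus_per_dgx = 8
dgx_systems = 576
total_gpus = gpus_per_dgx * dgx_systems
print(total_gpus)                    # 4608, matching the stated GPU count

eos_fp8_exaflops = 18.4
per_gpu_pflops = eos_fp8_exaflops * 1000 / total_gpus
print(round(per_gpu_pflops, 1))      # ~4.0 petaflops of FP8 per H100

print(round(per_gpu_pflops * gpus_per_dgx))  # ~32 petaflops per DGX H100, as stated
```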

For traditional scientific computing, the Eos supercomputer is expected to deliver 275 petaflops of performance.


“For NVIDIA and its OEM and cloud computing partners, Eos will be the blueprint for advanced AI infrastructure,” Huang said.

While 576 DGX H100 systems build one of the world's fastest AI systems, smaller combinations of DGX SuperPOD units can also provide the AI performance needed to develop large models for industries such as automotive, healthcare, manufacturing, communications, and retail.

Huang mentioned that to support DGX customers who are developing AI, MLOps solutions provided by NVIDIA DGX-Ready software partners, including Domino Data Lab, Run:ai, and Weights & Biases, will join the NVIDIA AI Acceleration program.

To simplify AI deployment, NVIDIA also launched the DGX-Ready Managed Services program for customers who want to work with service providers to oversee their infrastructure. And with the new DGX-Ready Lifecycle Management program, customers can upgrade their existing DGX systems to the new NVIDIA DGX platform.

Grace CPU Superchip: the most powerful CPU

At last year's GTC 21, NVIDIA unveiled Grace, its first data center CPU, and upgraded its chip roadmap to GPU + DPU + CPU.

At this year's GTC 22, NVIDIA launched the Grace CPU Superchip, its first Arm Neoverse-based CPU dedicated to the data center, targeting AI infrastructure and high-performance computing.


Designed for AI, HPC, cloud computing, and hyperscale applications, the Grace CPU Superchip packs 144 Arm cores into a single socket and achieves an industry-leading estimated score of 740 on the SPECrate 2017_int_base benchmark. According to estimates from NVIDIA's labs using the same class of compilers, that result is more than 1.5 times higher than the dual-CPU configuration (AMD EPYC 7742) shipped in the current DGX A100.
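Reading that comparison backwards gives a rough feel for the baseline. Taking the stated ratio at face value, and remembering that "more than 1.5 times" makes this an upper bound rather than a measurement, the implied dual EPYC 7742 score is just under 500:

```python
# Rough inference from the stated SPECrate 2017_int_base comparison;
# "more than 1.5x" means the real baseline is at most this value.
grace_estimate = 740              # NVIDIA's estimated Grace superchip score
stated_ratio = 1.5
implied_dual_epyc = grace_estimate / stated_ratio
print(round(implied_dual_epyc))   # ~493: an upper bound, not a published result
```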

Huang boasted: “Everything about Grace is amazing. We expect the Grace superchip to be the most powerful CPU when it ships, 2 to 3 times the performance of the top fifth-generation CPUs that have yet to be released.”

According to NVIDIA, an innovative memory subsystem built from LPDDR5x memory with error-correcting codes allows the Grace CPU superchip to strike the best balance between speed and power consumption. The LPDDR5x memory subsystem provides twice the bandwidth of traditional DDR5 designs, reaching 1 TB/s, while significantly reducing power consumption: the entire CPU plus memory draws only 500 watts.

It is worth noting that the Grace CPU Superchip consists of two CPU dies interconnected via NVLink-C2C, a new high-speed, low-latency chip-to-chip interconnect technology that will support custom dies and coherent interconnects with NVIDIA GPUs, CPUs, DPUs, NICs, and SoCs.

With advanced packaging technology, NVIDIA's NVLink-C2C links are up to 25 times more energy-efficient and 90 times more area-efficient than the PCIe Gen 5 used on NVIDIA chips, and achieve coherent interconnect bandwidth of 900 GB/s or higher.

The Grace CPU Superchip can run all of NVIDIA's computing software stacks, including NVIDIA RTX, NVIDIA HPC, NVIDIA AI, and Omniverse. Combined with the NVIDIA ConnectX-7 network card, it can be flexibly configured into servers, either as a standalone CPU-only system or as a GPU-accelerated server with one, two, four, or eight Hopper-based GPUs, letting customers optimize performance for their specific workloads while maintaining a single software stack.


Both the NVIDIA Grace Superchip series released today and the Grace Hopper Superchip released last year use NVIDIA NVLink-C2C technology to connect their processor dies.

NVIDIA said that in addition to NVLink-C2C, it will also support the UCIe (Universal Chiplet Interconnect Express) standard released earlier this month. Custom silicon that integrates with NVIDIA chips can use either the UCIe standard or NVLink-C2C.

