
Jen-Hsun Huang 2022 GTC Speech Express: Turning Data Centers into AI Factories

"Data centers are transforming into 'AI factories' that process massive amounts of data to produce intelligence," NVIDIA founder and CEO Jen-Hsun Huang said on the evening of March 22 at the 2022 NVIDIA GTC conference.


In his GTC keynote, Huang returned repeatedly to the term "AI factory": "AI data centers process massive, continuous streams of data to train and refine AI models. Raw data comes in, is refined, and intelligence comes out; enterprises are manufacturing intelligence and operating large AI factories."

From the "digital wormhole" he once used to introduce Omniverse to the "AI factory" he now uses to describe the new data center, Huang has always been good at using metaphors to package his ideas.

So, what is an "AI factory"?

At a media briefing on March 23, asked whether the "AI factories" mentioned in the speech really exist, Huang said, "In fact, they are hidden and obvious at the same time. They are right in front of your eyes; you just don't realize it."

That sounds rather metaphysical, but Huang's subsequent explanation makes clear why he put it that way.

Huang first proposed defining what a factory is. In his view, a factory is a "big box": raw materials go in, some energy is applied, and the raw materials are converted into valuable goods that come out. "Food is processed this way, cars are made this way, and chips are made this way too."

"Take any of the world's largest Internet companies: data comes in, energy is applied, and intelligence comes out, a model that can recognize language, or a model that can predict and recommend what users might like," Huang said.

Going further, Huang believes that in the future such a model may understand you and recommend drugs, reading material, or treatment options, and it must be trained repeatedly, cycling between input data and output models. "So you have already seen a lot of AI factories like that, and they are very obvious. In the future, every company will have an AI factory, because what every company does is fundamentally intelligence. For most of the world, this is a new type of data center. It is everywhere, but this is just the beginning."

Understood this way, many of NVIDIA's moves make sense, such as equipping the newly released H100 GPU with a Transformer Engine for the first time, and targeting both of the major tracks of graphics processing and artificial intelligence. The market has also rewarded this judgment: NVIDIA's market capitalization stands at $663.1 billion, $108 billion higher than that of TSMC, the world's second most valuable semiconductor company (as of press time, TSMC's market capitalization is $555.1 billion).

"Only you can beat yourself": The performance monster H100 GPU

Transformer is now the standard model architecture for natural language processing and one of the most important model families in deep learning. The H100 is equipped with a Transformer Engine that lets such models maintain accuracy while improving training performance by 6x, meaning training that would otherwise take weeks can be compressed into days.

"For large Transformer model training, such as GPT-3 (175 billion parameters), the H100 will deliver up to 9x performance, and training that used to take weeks can be reduced to days," Paresh Kharya, senior director of product management at NVIDIA, said at the launch.
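As a back-of-the-envelope check of the "weeks to days" claim (the 6x and 9x figures are NVIDIA's; the three-week baseline here is an illustrative assumption):

```python
# Illustrative only: convert a claimed training speedup into wall-clock time.
def accelerated_days(baseline_days: float, speedup: float) -> float:
    """Training duration after applying a speedup factor."""
    return baseline_days / speedup

# A hypothetical 3-week (21-day) Transformer training run:
print(accelerated_days(21, 6))  # 3.5 days at the claimed 6x
print(accelerated_days(21, 9))  # about 2.3 days at the claimed 9x
```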


As for why NVIDIA is so fond of the Transformer, Huang explained that Transformers made self-supervised learning possible without humans having to label data, bringing "amazing progress" to the AI field. As a result, the Transformer is playing a role in more and more areas: Google's BERT for language understanding, NVIDIA's MegaMolBART for drug discovery, and DeepMind's AlphaFold2 all trace back to the Transformer breakthrough.

In addition, the H100's inference performance has improved greatly. On the H100, inference for NVIDIA's Megatron-Turing model (530 billion parameters) delivers 30 times the throughput of the previous-generation A100, with response latency reduced to 1 second. In FP16, FP32, and FP64 tensor operations, the H100 is three times faster than the A100, and six times faster in 8-bit floating-point math.


The NVIDIA H100 has taken over from the A100 as the world's largest AI accelerator chip (the H100 integrates 80 billion transistors, 26 billion more than the A100, and its CUDA core count soars to 16,896, nearly 2.5 times that of the A100), which may be the proverbial "only you can beat yourself."

Fittingly, the Hopper architecture is itself an act of "overcoming yourself": NVIDIA announced that the Hopper next-generation accelerated computing platform will replace the Ampere architecture introduced two years ago, and Ampere is NVIDIA's most successful GPU architecture to date.

The H100 is NVIDIA's first GPU based on the Hopper architecture. According to Huang, the H100 uses TSMC's latest 4nm process rather than the long-rumored 5nm. It is also equipped with fourth-generation NVLink high-speed GPU interconnect technology, which can connect up to 256 H100 GPUs and raises bandwidth to 900GB/s.


The H100's mathematical capabilities have also grown: Hopper introduces a new instruction set called DPX that accelerates dynamic programming, optimizing problems such as route optimization and genomics, running up to 40 times faster than CPUs and 7 times faster than previous-generation GPUs.
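DPX targets dynamic-programming kernels of exactly this shape. As a language-agnostic illustration of the workload class (not NVIDIA's API), here is the classic edit-distance recurrence that appears in genomics-style sequence alignment:

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard dynamic-programming table.
    DPX-style instructions accelerate the inner min/add updates of
    recurrences like this one."""
    m, n = len(a), len(b)
    # dp[i][j] = edits needed to turn a[:i] into b[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i  # delete everything
    for j in range(n + 1):
        dp[0][j] = j  # insert everything
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution
    return dp[m][n]

print(edit_distance("kitten", "sitting"))  # 3
```

On a GPU, each anti-diagonal of this table can be updated in parallel, which is where hardware acceleration of the min/add step pays off.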

"20 H100s could sustain the equivalent of global Internet traffic," Huang said at GTC. "Hopper H100 is the biggest generational performance leap ever: its large-scale training performance is 9 times that of the A100, and its large-language-model inference throughput is 30 times that of the A100." The H100 is slated to ship in the third quarter of this year.

Currently, the H100 comes in two versions: an SXM version with an unprecedented 700W thermal design power (industry media have dubbed NVIDIA a "nuclear bomb factory"), intended for high-performance servers; and a PCIe version for more mainstream servers, which at 350W draws 50W more than the previous-generation A100's 300W.

The latest DGX H100 computing system built on the H100 is, as usual, equipped with 8 GPUs. The DGX H100 system achieves 32 Petaflops of AI performance at FP8 precision, 6 times that of the previous-generation DGX A100 system, and its 900GB/s GPU interconnect speed is nearly 1.5 times that of the previous generation.
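The per-GPU arithmetic behind the system figure works out as follows (a sketch; the 32-Petaflop and 8-GPU figures are from the announcement, the per-GPU derivation is ours):

```python
# Back-of-the-envelope: per-GPU FP8 performance inside a DGX H100.
GPUS_PER_DGX_H100 = 8
SYSTEM_FP8_PFLOPS = 32  # claimed FP8 AI performance per DGX H100 system

per_gpu_pflops = SYSTEM_FP8_PFLOPS / GPUS_PER_DGX_H100
print(per_gpu_pflops)  # 4.0 Petaflops of FP8 per H100
```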

At GTC, Huang also introduced Eos, a supercomputer built from DGX H100 systems and expected to deliver world-leading AI supercomputing performance (its 18.4 Exaflops of AI computing performance is 4 times that of Japan's "Fugaku" supercomputer). Eos comprises 576 DGX H100 systems, using 4,608 H100 GPUs. In traditional scientific computing, its performance reaches 275 Petaflops, while the top-ranked Fugaku delivers 442 Petaflops.
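The Eos figures quoted above are internally consistent: 576 systems at 8 GPUs each gives 4,608 H100s, and at 32 Petaflops of FP8 per DGX H100 the total lands on 18.4 Exaflops (a sanity check of the article's numbers, not an independent measurement):

```python
# Cross-check the Eos supercomputer figures.
systems = 576
gpus = systems * 8
print(gpus)  # 4608 H100 GPUs

total_exaflops = systems * 32 / 1000  # 32 PFLOPS of FP8 per DGX H100 system
print(round(total_exaflops, 1))  # 18.4 Exaflops
```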

The Hopper architecture is named after Grace Hopper, the "First Lady of Software Engineering." Grace Hopper was one of the pioneers of computer science, inventing the world's first compiler, the A-0 system. In 1947, she traced a failure of the Mark II to a moth caught in the machine, and "bug" and "debug" became standard terms in computing.


Alongside "Hopper" there is also "Grace": at GTC, Huang presented the latest progress on the Grace server chips, the Grace Hopper Superchip and the Grace CPU Superchip. The former pairs a Grace CPU with a Hopper-architecture GPU; the latter consists of two Grace CPUs interconnected via NVIDIA NVLink-C2C technology, packing 144 Arm cores, up to 1TB/s of memory bandwidth, and a 500W power envelope.

Huang also shared a statistic on stage: the Grace CPU Superchip achieved a simulated score of 740 in the SPECrate2017_int_base benchmark, 1.5 times the score (460) of the CPUs in the current DGX A100.

What is the performance monster for? Huang: Making the world/metaverse

The Omniverse that NVIDIA has been building in recent years now looks like a "metaverse infrastructure" tool, and its digital twins can be understood as recreating the physical world in virtual space, or "making the world."

But this is not an entertainment project; the future Huang describes for Omniverse is to become an integral part of "action-oriented AI." To explain what that means, Huang cited NASA as an example: "Half a century ago, the Apollo 13 lunar mission ran into trouble. To save the crew, NASA engineers built a model of the crew module on Earth to help solve the problems the astronauts encountered in space."

Amazon used Omniverse Enterprise to build a virtual "order fulfillment center" to find the most efficient layouts; PepsiCo used Metropolis and Omniverse to build a digital-twin factory simulation to troubleshoot problems at low cost, and used simulation data to let AI agents "practice" in a virtual yet physically realistic environment.


A digital twin factory is built in Omniverse

Practice Kung Fu in Omniverse

"AI is 'blossoming' everywhere, in new architectures, new learning strategies, larger and more powerful models, new areas of science, new applications, and new industries, and all of these areas are advancing," Huang said.

This judgment rests on Huang's view of five trends shaping the industry: million-x leaps in computing speed, Transformers that have greatly accelerated AI, data centers becoming AI factories, exponentially growing demand for robotic systems, and digital twins for the next era of AI.

"Over the next decade, we will accelerate the entire stack at data-center scale and once again achieve a million-fold performance leap," Huang said at the end of his speech. "I can't wait to see what the next million-fold performance leap will bring."
