
Must read, collect! Jensen Huang's latest speech (20,000-word transcript + full video replay)

Author: Changqi's Growth


| Key points

(1) Huang said that accelerated computing and artificial intelligence have reshaped the computer industry, and the era of CPU expansion is over. Today's data centers that need to continuously increase computing power need fewer and fewer CPUs and more GPUs. Humanity has reached the tipping point of generative AI.

(2) Huang believes that artificial intelligence means that everyone can now become a computer programmer because all one needs to do is talk to computers, and he hails the end of the "digital divide".

(3) Huang stressed that enterprises and individuals should understand the new wave of artificial intelligence and quickly promote the innovation of new technologies, otherwise they will be eliminated. At the same time, AI will also become an auxiliary tool, increasing the productivity and efficiency of workers and creating new jobs, while some traditional jobs will disappear.

(4) Huang revealed that the world's first accelerated computing processor with huge memory, the GH200 Grace Hopper, is now in full production, and Microsoft, Meta and Google Cloud are expected to be the first users of the supercomputer.

(5) In addition, Huang called Nvidia's $6.9 billion acquisition of supercomputer chip maker Mellanox in 2019 "one of the greatest strategic decisions it has ever made."


Compiled by | Ma Jionghui, Tencent Technology

Translation team | Mowgli, Golden Deer

Editor's note:

On May 30, Beijing time, NVIDIA's share price rose nearly 4% in Tuesday's U.S. premarket trading, topping $404 per share and lifting its market value above $1 trillion. NVIDIA thus became the world's first chip company worth more than $1 trillion.

As NVIDIA's leader, Jensen Huang recently delivered a keynote speech at Computex 2023, sharing the company's latest progress in AI, graphics and other fields.

In Huang's nearly two-hour speech, he brought a series of blockbuster products and news to users, including AI model foundry services for game developers, the DGX GH200 artificial intelligence supercomputer with greatly increased computing power, and the progress of cooperation with other leading technology companies. Jensen Huang painted a more concrete picture of an AI-enabled future.

The following is the full transcript of the two-hour speech compiled by Tencent Technology:

Hi everyone, I'm back! This is my first in-person event in nearly four years. I haven't given a public speech in four years, and I have a lot to tell you, but time is so limited. So, let's get started!

01

Gaming GPUs are in full production

Ray tracing, simulating the behavior of light and materials, is the ultimate challenge in accelerated computing. Six years ago, we first demonstrated this ultimate challenge; rendering this scene took hours. After ten years of research, we can now render the same scene on GPUs in 15 seconds.


Six years ago, we invented NVIDIA RTX, combining three basic technologies: hardware-accelerated ray tracing, artificial intelligence running on NVIDIA Tensor Core GPUs, and completely new algorithms. Let's take a look at how far they have come in the five years since launch.

This is the image processed on a CUDA GPU. Six years ago, rendering this beautiful picture would have taken hours. Even by the standards of accelerated computing's rapid progress, this is a huge breakthrough. Then, we invented the RTX GPU.

The "holy grail" of computer graphics, ray tracing, can now be realized in real time. This is the technology we adopted in RTX, and today, five years later, is a very important moment for us: we are bringing the third-generation Ada architecture RTX GPUs to the mainstream for the first time with two new products. Both are now in full production.

Inside and out, everything looks very different. This is our brand-new product. The device in my left hand runs a CUDA GPU that supports ray tracing and artificial intelligence at 60 frames per second. Its screen is about 14 inches, you can barely feel its weight, and it is more powerful than the highest-end PlayStation console.


In my right hand is the RTX 4060 Ti, a favorite of our core gamers. Both products are now in production, and our partners are producing them in large quantities. I'm really excited, thank you very much! I can almost put it in my pocket.

AI can help us do many things previously thought to be absolutely impossible. Take rendering: previously, when we rendered a pixel, we could not predict the pixels around it. Now, we can use artificial intelligence to predict another seven pixels for every pixel we render, with incredibly high performance.
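A rough way to see the saving described above: if the GPU fully renders one pixel and a neural network predicts the next seven, only one pixel in eight pays the full rendering cost. A minimal sketch of that arithmetic (the function name is mine; the 1-to-7 ratio comes from the passage above):

```python
# Illustrative arithmetic for AI-assisted rendering: the GPU fully renders
# a fraction of the pixels and a neural network predicts the rest
# (1 rendered pixel per 7 predicted, as stated in the talk).

def render_cost_ratio(rendered: int, predicted: int) -> float:
    """Fraction of pixels that must be fully ray-traced."""
    return rendered / (rendered + predicted)

ratio = render_cost_ratio(rendered=1, predicted=7)
print(ratio)      # 0.125 -> only 1/8 of pixels pay full render cost
print(1 / ratio)  # 8.0   -> roughly an 8x reduction in shading work
```

The real pipeline is far more involved (motion vectors, temporal feedback), but the ratio is why AI upscaling yields such large performance gains.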

Now, I'm going to show you the performance of these two GPUs. None of it would be possible without NVIDIA's supercomputers running constantly to train the models that enhance our applications.

So the future is what I've just shown you, and everything I talk about in the rest of this speech pretty much boils down to one simple idea: there will be a large computer that writes, develops and deploys software, and that software can be deployed on devices around the world. Which is incredible.

02

Launching an AI model foundry service for games

We can use artificial intelligence to render scenes, and we can use artificial intelligence to bring them to life. Today we are announcing NVIDIA ACE (Avatar Cloud Engine), designed to bring animated digital avatars to life.


It has several capabilities: speech recognition, text-to-speech, natural language understanding based on large language models, generating facial expressions from your voice, and generating matching hand movements from your voice and expressions. All of this is trained by artificial intelligence.

The service includes pre-trained models that developers can modify and enhance for their own applications, since each game has a different story; you can then deploy them to the cloud or on your own devices.

We have great backends, with TensorRT, NVIDIA's deep-learning optimization compiler, which you can deploy on NVIDIA GPUs, and ONNX as an industry-standard output format so you can run it on any device.

It only takes a second to see this scene, but let me tell you about it first. It is rendered entirely with ray tracing. Notice the beautiful lighting: there are many different rays, all cast from the same light source, so you get direct lighting, global illumination, incredibly beautiful shadows and physics simulation. Notice the rendering of the characters; it is all done in Unreal Engine 5.


We partnered with Convai, a maker of avatar frameworks and avatar tools, to develop this video game demo. It all runs in real time, and the conversation goes as follows:

Player: Hey Jen, how are you?

Jen: Unfortunately, not very good. I'm worried about crime here, and it's been getting worse lately. My shop was destroyed in the crossfire.

Player: Can I help?

Jen: If you want to do something, I've heard rumors that crime boss Kuman is causing all kinds of chaos in the city, and he may be the source of the violence.

Player: I'll go talk to him. Where can I find him?

Jen: I hear he hangs out at the underground fight clubs in the east of the city. Try there.

Player: I'll go!

Jen: Stay safe!

None of these conversations are pre-scripted. We give the AI character a backstory, including the story of its shop and the story of the game, and all you have to do is talk to it, because the character is integrated with artificial intelligence and large language models that can interact with you and understand what you mean. The bottom line is that these interactions all unfold in a very natural way.

All the facial animation is done by artificial intelligence, and we make it possible to generate a wide variety of characters, each with knowledge of its own domain. You can customize them, so everyone's game is different. See how beautiful and natural they are. This is the future of video games: artificial intelligence will not only help synthesize rendering and environments, it will also animate the characters. AI will be an essential part of the video games of the future.
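The "give the character a backstory, then just talk to it" interaction described above can be pictured as assembling the character's fixed backstory, the running dialogue, and the player's latest line into a single prompt for a language model. A minimal sketch of that idea; this is not the NVIDIA ACE API, and `build_prompt` and the example text are hypothetical:

```python
# Hypothetical sketch: combine an NPC's fixed backstory, the dialogue so
# far, and the player's new line into one prompt for a language model.
# Only illustrates the structure; no real ACE or LLM API is used.

def build_prompt(backstory: str, history: list[tuple[str, str]], player_line: str) -> str:
    lines = [
        f"You are an in-game character. Backstory: {backstory}",
        "Stay in character and answer briefly.",
    ]
    for speaker, text in history:          # replay the dialogue so far
        lines.append(f"{speaker}: {text}")
    lines.append(f"Player: {player_line}") # the player's new spoken line
    lines.append("Character:")             # the model completes from here
    return "\n".join(lines)

prompt = build_prompt(
    backstory="Jen runs a ramen shop that was damaged in gang crossfire.",
    history=[("Player", "Hey Jen, how are you?")],
    player_line="Can I help?",
)
print(prompt)
```

Because the backstory is data rather than script, every player's conversation with the same character can unfold differently, which is the point the demo makes.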

03

Three major trends in the computer industry

The most important computer of our generation is undoubtedly the IBM System/360. This computer revolutionized many things: it was the first computer in history to introduce the concepts of a central processing unit, virtual memory, scalable I/O and multitasking, and the ability to scale the same architecture across different performance ranges.


One of its most important contributions and insights was the importance of protecting software investments: the same software could run on all of these computers and span generations of them. IBM recognized the importance of software, the importance of protecting investments, and, very importantly, the importance of the installed base.

Not only did this computer revolutionize computing; many of us grew up reading its manual to understand how computer architecture works, even learning about DMA for the first time (Editor's note: DMA, or Direct Memory Access, is a concept from computer architecture). It revolutionized not only computing but the thinking of the entire computer industry.

The IBM System/360 and its programming model have largely survived to this day. For 60 years, the trillion dollars' worth of data centers around the world have basically used the same computing model invented 60 years ago. Today, the computer industry is undergoing two fundamental shifts. All of you are living through them, and you can feel them.

There are several fundamental trends. The first is that the era of CPU scaling is over: the tenfold performance increase every five years at the same cost, the main reason computers got so fast, has ended. Sustaining a tenfold increase in computing power every five years without increasing power draw is the reason the world's data centers do not consume an ever-larger share of the planet's electricity. That trend is over; we need a new approach to computing, and accelerated computing is the way forward.

This happened just as a new way of developing software, deep learning, was discovered, and the two combined to drive today's computing forward rapidly.

Accelerated computing and generative artificial intelligence amount to reinventing the way software is developed from scratch. It's not easy. Accelerated computing is a full-stack problem; it's not as easy as general-purpose computing. The CPU is a marvel: high-level programming languages, great compilers. Almost anyone can write a fairly good program because the CPU is so flexible.

However, its continued scalability and performance gains are over, and we need a new approach. Accelerated computing is a full-stack problem, and you have to redesign everything from top to bottom and bottom to top, including chips, systems, system software, new algorithms and optimizations, and new applications.

The second trend has to do with the scale of data centers. Scale matters because today the data center is the computer. Unlike in the past, when your PC or your phone was the computer, today your data center is the computer, and applications run across the entire data center. It is therefore critical to understand how to optimize computation from the chip all the way across the nodes, end to end, in a distributed-computing fashion.

The third trend is that accelerated computing is domain-specific across many domains. The software stack you create for computational biology is completely different from the one you create for computational fluid dynamics. Each scientific field requires its own stack, which is why accelerated computing took us nearly three decades to build.

The entire stack took us almost 30 years, but the performance is incredible, and I will show you. After 30 years, we realize we are at an inflection point. A new model of computing is extremely difficult to establish, because to develop one, you need developers.

However, before developers will come on board, someone must create applications that end users will buy. Without end users there are no customers, and without customers no computer company produces the computers. If no one produces the computers, there is no installed base; without an installed base, developers are not attracted; and ultimately there are no applications.

In my 40 years in the industry, many computer companies have gone through this cycle. It is genuinely rare in history for a new computing model to be successfully established. We now have 4 million developers, more than 3,000 applications, and 40 million CUDA downloads, including 25 million last year alone. There are 15,000 startups around the world built on NVIDIA's technology, and 40,000 large companies are using accelerated computing.

04

The tipping point of the new computing era

We have now reached a tipping point in a new computing era, one welcomed by every computer company and every cloud company in the world. There is a reason for that: in the end, the ultimate benefit of every new computing model is lower cost. The 1980s were the decade of the personal computer revolution, when the PC brought the cost of computing down to unprecedented levels.

Then came mobile devices, which were convenient and also saved a lot of money. We put cameras, music players, phones and many other things together. In the end, you not only enjoy life more, you also save money and gain great convenience.

Each generation offers something new and saves money. That is the idea behind accelerated computing. Consider accelerated computing for large language models, which are basically the core of generative AI. This is a very expensive undertaking, and we bear all the costs, including developing chips, deploying networks, and so on. $10 million buys nearly 1,000 CPU servers, and training a large language model on them consumes 11 gigawatt-hours of energy.

Here is what happens when you accelerate that workload. Spend the same $10 million on GPU servers, and yes, people will say GPU servers are expensive. But the GPU server is no longer the computer; the data center is the computer. Your goal is to build the most cost-effective data center, not the most cost-effective server.

In the past, the computer was the server, and that was reasonable. But today, the computer is the data center. What you want is to create the most efficient data center with the best TCO. Spend the $10 million on 48 GPU servers instead: they consume only 3.2 gigawatt-hours of energy while delivering 44 times the performance.

We want dense computers, not big ones. Let me show you something else. Again, $10 million, 960 CPU servers. This time, keep the power constant, meaning your data center has a limited power budget; in fact, most data centers today are power-limited. In that case, with accelerated computing you get 150x the performance at 3x the cost.
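The comparisons above reduce to simple ratios. A sketch using only the figures quoted in this passage (the helper name is mine):

```python
# Perf-per-energy comparison using the figures quoted in the talk:
# same $10M budget, 44x the performance, 3.2 GWh instead of 11 GWh.

def efficiency_gain(perf_ratio: float, energy_cpu_gwh: float, energy_gpu_gwh: float) -> float:
    """How much more work per unit of energy the accelerated setup does."""
    return perf_ratio * (energy_cpu_gwh / energy_gpu_gwh)

gain = efficiency_gain(perf_ratio=44, energy_cpu_gwh=11, energy_gpu_gwh=3.2)
print(round(gain, 1))  # roughly 150x more work per unit of energy
```

Notably, 44 × (11 / 3.2) is about 151, which lines up with the iso-power claim of roughly 150x performance in the same passage.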

Why is accelerated computing so useful? Because building another data center is very expensive and time-consuming. Today almost every data center is power-limited, and almost all of them are scrambling to break ground on new ones. So if your power is limited, or your customers' power is limited, what they can do is invest more in their current data centers. That way you get more throughput and continue to drive your company's growth.

Let's look at another example: if your goal is to get the job done, you don't care how to do it. Then you don't need to understand the strategy, you don't need to understand the technology, just remember: the more you buy, the more you save. That's what NVIDIA can offer.

Back to the data center. Every time you have seen me over the years, I have talked about accelerated computing; it has been the same story for 20 years. So why is this the tipping point now? Because the data center equation is very complex. Data center TCO is a function of many things, and it is the part people most often get wrong: it is a function of the chip, of the system, and of the many different use cases.

That diversity of use cases creates system diversity. Why are there so many computers with different configurations? Mainframes, minicomputers, cheap computers, hyperscale computers, supercomputers and so on, all fully compatible. It is incredible that our hardware ecosystem supports so many different configurations while the software stays compatible.

The throughput of a computer is very important. It depends on the chip, but it also depends on the algorithms, because without algorithm libraries accelerated computing can do nothing. The hardware is there, but you need the algorithms to run it. This is a big problem for data centers. Networking and distributed computing are all about software. Again, system software is important. And to sell your system to customers, you ultimately need many applications running on it, so the software ecosystem is important.

The utilization of a data center is one of the most important drivers of its TCO, just like a hotel: if the hotel is great but empty most of the time, operating costs are very high. So high utilization is required, and to achieve it you need many different applications. The richness of applications, the algorithm libraries, and the software ecosystem all matter.

Say you buy a computer. The time from the moment you buy it until it is working and making money matters, and the lag can be a few weeks. We can stand up a supercomputer in a matter of weeks, because we have built many supercomputers around the world. If you don't do it well, you may spend a year closing that gap, losing the opportunity to make money and incurring high costs. And because the data center is software-defined, lifecycle optimization matters too.

Many engineers continue to improve and optimize the software stack. Because NVIDIA's software stack is architecturally compatible across all our generations and all our GPUs, every time we optimize one aspect, every product benefits. So lifecycle optimization, as well as the amount of power you use, are very important.

The equation is complex, but we've now solved a lot of different domains, industries and data processing, deep learning, and classical machine learning, and we've deployed software in many ways, from the cloud to enterprise supercomputing to edge computing. We also have a lot of different GPU configurations, from HGX to Omniverse, from cloud GPUs to graphics versions, etc.

Utilization is now high. NVIDIA GPU utilization is very high; almost every cloud and almost every data center is oversubscribed, with many different applications running on them. So we have now reached the tipping point of accelerated computing, and we are also at the tipping point of generative AI. I want to thank all of you for your support, help, and cooperation in making this dream a reality.

With each new product release our demand increases, as it did with Kepler, Volta, Pascal, and Ampere. And now, demand for this generation of accelerated computing comes from virtually every corner of the world. We are excited to have the H100 in full production. It is built by multiple vendors and is used in the cloud and the enterprise.


Incredibly, this system board has 35,000 components and eight Hopper architecture GPUs. It weighs more than 27 kg and requires robots to lift and integrate it. This $200,000 product replaces a whole room of other computers. I know it's expensive; it is probably the most expensive system board in the world, but the more you buy, the more you save. This is what the compute tray looks like: the brand-new H100, the world's first computer with a transformer engine. The performance is simply incredible.

05

The acquisition of Mellanox was one of the greatest strategic decisions

We've been pushing this new way of computing for 12 years. When we first met deep learning researchers, we were fortunate to realize that deep learning would not only become a magic algorithm for many applications, initially computer vision and speech, but it would also become a whole new way of developing software that could use data to develop general-purpose function approximators with incredible dimensions. It can basically predict anything you have data for.

We realized that this new approach to developing software was important: as long as the data had a structure that could be learned, it had the potential to completely reshape computing. We made the right bet. Twelve years later, we have reinvented everything we ever invented. We started by creating a new type of library, essentially like SQL but for deep learning: a rendering engine, a solver, for neural-network processing.

We reinvented the GPU, and people thought the GPU was still the original GPU, but they were completely wrong. We're committed to redesigning the GPU to make it great. In terms of tensor processing, we created a new package called SXM and worked with TSMC to stack multiple chips so that we could connect these SXM modules to each other in a high-speed chip-to-chip fashion.

About a decade ago, we made the world's first chip-to-chip module so we could expand memory. We created a new type of motherboard called HGX. No computer before had been so dense or consumed so much power. Every aspect of the data center had to be redesigned. We also invented a new type of computing device so that we could develop software on it.

There is a simple device that we call DGX, which is basically a huge GPU computer. We also acquired supercomputer chip maker Mellanox (Editor's note: NVIDIA announced its $6.9 billion acquisition of Israel-based Mellanox on March 11, 2019), one of our company's greatest strategic decisions. Because we realize that in the future, if the data center is the computer, then the network is the nervous system. If the data center is configured as a computer, then the network defines the data center. It was a very good acquisition. We've done a lot of things together since then, and today I'm going to show you some really great work.

If a distributed computer has a nervous system, it needs its own operating system. We call it Magnum IO. Some of our most important work, all the algorithms and engines, runs on these computers; we call it NVIDIA AI. It is the only AI operating system in the world that covers end-to-end deep learning, from data processing through training, optimization, deployment and inference. It is the engine of artificial intelligence today.

Starting with Kepler, every two years we made a huge leap, but we realized we needed more. So we connected GPUs to build bigger GPUs, and connected those GPUs together with InfiniBand to form even larger computers, allowing us to scale processors and scale computing.

Across AI research institutions, the community is advancing AI at an astonishing rate. Every two years we take a big leap, and I expect the next one to be just as big. This is the new computer industry, where software is no longer written solely by computer engineers, but by computer engineers collaborating with artificial intelligence.

06

Supercomputers will become the new factories

Supercomputers will become the new factories. It is very logical: the automotive industry has factories, and you can see the cars they produce. It makes equal sense for the computer industry to have computer factories whose products you can see. In the future, every large company will have its own AI factory for building and producing its own intelligence. This is a very sensible thing to do. We nurture our employees and constantly create the conditions for them to do their best work. We will be producers of intelligence, producers of artificial intelligence; every company will have such a factory, and the factories will be built this way.

That translates into your throughput, which translates into your scale, and you're going to build it in a very, very good way because we're committed to pursuing that path and relentlessly improving performance over 10 years. We increase throughput, we increase scale, and the overall throughput of all stacks increases 1 million times in 10 years. In the beginning, I showed you how fast computer graphics can develop in 5 years, and we have improved computer graphics performance by 1000 times with artificial intelligence and accelerated computing in 5 years.

The question now is: what can you do when your computer is a million times faster? It turned out that our friends at the University of Toronto, Alex Krizhevsky, Ilya Sutskever and Geoff Hinton, showed us. Sutskever, co-founder of OpenAI, saw the continued scaling of deep learning networks, which led to ChatGPT. In its general form, this is the transformer: the ability to use unsupervised learning to learn from large amounts of data, identify patterns and relationships in long sequences, and predict the next word. From this, large language models are created.

This breakthrough is very obvious, and I'm sure everyone in this room has tried ChatGPT. But importantly, we now have a software capability that can learn the structure of almost any information. We can learn the structure of text, sound and images, and the structure of the biological world, such as proteins, DNA and chemicals. We can learn languages such as English, Chinese and Japanese, and many others besides.

The next breakthrough is generative artificial intelligence. Once you learn a language, especially the language of specific information, then, guided by other sources of information we call prompts, you can direct the AI to generate all kinds of content. We can generate text and images. More importantly, we can convert one kind of information into another: text to proteins, text to chemicals, images to 3D, 3D to 2D, 2D images to text descriptions, video to video. Many different types of information can now be converted.

For the first time ever, we have a software technology that can understand multiple forms of information. We can now apply computer science, the tools of our industry, to many different areas that were unthinkable before. That's why everyone is so excited.

Now let's look at some examples of what it can do. First, there is a prompt: "Hi, Computex." Then I type the text: "I'm here to tell you how delicious stinky tofu is! You can enjoy it here; the night market is the best. I was there that night." The only thing I input is text; the output is video.

Next, let's try text to music. Simply give the AI a musical-style cue and lyrics, and it generates the song. The text I typed was: "I am here at Computex, I will make you like me best. Sing it with me, I really like NVIDIA!" I can't sing it myself anyway; with performance like that, next time I will hire the AI.

So, obviously, this is a very important new capability. That's why so many generative AI startups are popping up. We are working with about 1,600 generative AI startups in a variety of fields, from languages to media, biology.

One of the areas we care most about is digital biology, a field that is about to undergo a revolution. Just as design tools helped us build great chips and systems, for the first time we will have computer-aided drug discovery tools able to manipulate and process proteins and chemicals, understand disease targets, and explore chemicals no one has thought of before.

This is a very important area with a lot of startups, tools, and platform companies. Let me show you a video of what they're doing.

Unbelievable, right? It's incredible. There is no doubt that we are in a new era of computing. Every era of computing lets you do many things that were previously impossible, and artificial intelligence certainly fits that description. Several aspects of this particular era are exceptional. First, it is able to understand information beyond just text and numbers; it is multimodal, which is why this computing revolution can impact every industry.

Second, this computer doesn't care how you program it. It tries to understand what you mean, because it has very strong language-model capabilities, so the programming barrier is very low. We have narrowed the digital divide: everyone is now a programmer. You just have to say something to the computer. Third, this computer is not only capable of amazing new things; it can also make every application of the last computing era better, which is why all these APIs are being connected to Windows applications, browsers, PowerPoint, and Word.

Every existing application is better because of AI. You don't have to wait for emerging applications; this era of computing doesn't require new applications to succeed. It improves all existing applications, and it will spawn new ones too. Because it is so easy to use and evolving so fast, it will touch the core of every industry. And as with every era of computing, it requires a new approach to computing. In this era, that approach is accelerated computing, which has been completely reinvented.

07

Grace Hopper is in full production

For the past few years, I've been talking to you about the new processors we're developing, and this is why we created them. Grace Hopper is now in full production, and this computer has nearly 200 billion transistors.


This processor is truly amazing. It has several unique features: it is the world's first accelerated computing processor with a huge amount of memory, almost 600 GB, shared between the CPU and the GPU. Both the GPU and the CPU can reference the same memory, so unnecessary back-and-forth copying is avoided, and the amazingly fast memory lets the GPU handle very large data sets. It is a computer, not a chip. It uses low-power LPDDR memory, like your phone's, except optimized for highly resilient data center applications, with amazing performance.

It took us years to develop, I'm really excited about it, and I'm going to show you what we're going to do with it. The Hopper GPU, with its Transformer Engine, and the 72-core Grace CPU are connected by a high-speed chip-to-chip link running at up to 900 gigabytes per second. For local memory, 96 GB of HBM3 is extended by a very large pool of LPDDR memory. A computer like this has never been seen in the world.

Now let me give you some demonstrations. I'm comparing three different applications here. The first is a very important one; if you've never heard of it, be sure to check it out. It is called a vector database: a database in which the data has already been vectorized into embeddings, so it understands the relationships among all the data it stores. This is extremely important for enriching large language models with knowledge and for avoiding hallucinations.
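The core idea of a vector database can be sketched in a few lines of plain NumPy: store embeddings, then rank them by similarity to a query vector. This is only an illustrative toy; the document names, vectors, and the cosine metric here are all assumptions, and production systems use embedding models plus approximate-nearest-neighbor indexes at far larger scale.

```python
import numpy as np

# Toy "vector database": each document is represented by an embedding.
# In a real system these vectors come from a trained embedding model.
docs = {
    "gpu":  np.array([0.9, 0.1, 0.0]),
    "cpu":  np.array([0.7, 0.3, 0.0]),
    "cake": np.array([0.0, 0.1, 0.9]),
}

def cosine(a, b):
    # Cosine similarity: how aligned two embedding vectors are.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def search(query_vec, k=2):
    # Rank every stored vector by similarity to the query, return top-k names.
    ranked = sorted(docs.items(), key=lambda kv: cosine(query_vec, kv[1]),
                    reverse=True)
    return [name for name, _ in ranked[:k]]

# A "hardware-like" query retrieves the hardware documents first.
print(search(np.array([1.0, 0.0, 0.0])))  # ['gpu', 'cpu']
```

Retrieved documents like these are what get injected into a language model's context to ground its answers in stored knowledge.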

The second application is a deep learning recommender system. That's how we get the news, the music, and all the text you see on your devices; of course, music, merchandise, and all kinds of things are recommended this way. Recommender systems are the engine of the digital economy. This is probably the most valuable software run by any company in the world: the world's first artificial intelligence factory. There will be other AI factories in the future, but this is really the first.

The third application is large language model inference. A 65-billion-parameter language model is already fairly large, and simply impossible to serve on a CPU. Compared with the x86 architecture, Hopper is faster, but note that its memory is limited. Of course, you can split the 400 GB of data and distribute it across more GPUs. But in the case of Grace Hopper, its memory is larger than all of those modules combined. Do you understand? You don't have to divide the data into so many chunks. This approach may be more computationally intensive, but it is much easier to use.
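The memory arithmetic behind this point is easy to check. Assuming 16-bit weights, 2 bytes per parameter (an assumption for illustration, not a figure from the talk), a 65-billion-parameter model needs about 130 GB for its weights alone, more than a single conventional GPU holds:

```python
# Back-of-the-envelope memory footprint for LLM inference weights.
# Assumption (not from the talk): 16-bit weights, i.e. 2 bytes/parameter.
params = 65e9
bytes_per_param = 2                      # FP16 / BF16
weights_gb = params * bytes_per_param / 1e9
print(f"{weights_gb:.0f} GB of weights")  # 130 GB

# An assumed 80 GB of HBM on a conventional GPU cannot hold this, so the
# model must be sharded across GPUs -- unless, as with Grace Hopper, the
# GPU can directly address a much larger pool of CPU-attached memory.
gpu_hbm_gb = 80
print(weights_gb > gpu_hbm_gb)            # True: must shard on one GPU
```

Activations and the KV cache add further memory on top of the weights, which only strengthens the argument for a large unified memory pool.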

If you want to scale large language models, vector databases, or deep learning recommender systems, this is an easy way to do it: just plug it into the data center. That's why we built Grace Hopper. Another application I'm super excited about is foundational to our company. NVIDIA is a big customer of Cadence; we use all of their tools, and all of their tools run on the CPU, even though NVIDIA's data sets are very large.

Moreover, these algorithms have been refined over a long period of time, so most of them are CPU-centric. We've been accelerating some of these algorithms with Cadence for a while, but with Grace Hopper, in just days and weeks of work, the performance has sped up so much that I can't wait to show you. It's crazy! This will revolutionize an industry that is among the most computationally intensive in the world, spanning chip design, electronic system design, CAE, CAD, EDA and, of course, digital biology.

All of these markets, all of these industries, require a lot of computation, and the data sets are very large. Grace Hopper is ideal for them. 600 GB of memory is a big number; what I have in my hand is basically a supercomputer. But think about it: 12 years ago, we trained AlexNet, a model with 62 million parameters, on 1.2 million images. Now Google's PaLM is roughly 5,000 times larger, and it was trained on more than 3 million times the data. Of course, we'll build something bigger still.

Thus, over the course of a decade, the computational problem of deep learning has grown 5,000-fold in model size and 3-million-fold in data. No other computing area is growing this fast. We've been pursuing advances in deep learning for a long time, and this will make a very, very big contribution. Still, 600 GB is not enough; we need more.

Let me show you what we're going to do. First, we take the Grace Hopper superchip and plug it into a computer. The second thing we do is connect eight of them using NVLink switches: eight chips linked through three switch trays form an eight-Grace-Hopper pod, with every chip in the pod connected at 900 gigabytes per second. We then connect 32 of these first-level systems with second-level switches, and the resulting 256 Grace Hopper superchips deliver 1 exaflop.

You know, many countries are still working toward exaflop computing. These 256 Grace Hoppers scale to 1 exaflop of Transformer Engine compute, and they give us 144 TB of memory visible to every GPU. This is not 144 TB spread across isolated nodes; it is 144 TB of connected, shared memory. Why don't we take a look at what it really is?
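The 144 TB figure follows from the topology, if we assume the commonly cited per-superchip memory of roughly 480 GB of LPDDR plus 96 GB of HBM3 (these per-chip numbers are assumptions; the transcript only says "almost 600 GB"):

```python
# How 256 Grace Hopper superchips add up to ~144 TB of shared memory.
# Assumed per-superchip memory (publicly cited, not in the transcript):
# ~480 GB of LPDDR5X CPU memory + 96 GB of HBM3 GPU memory.
per_chip_gb = 480 + 96          # 576 GB per superchip
chips = 8 * 32                  # 8 superchips/pod, 32 pods via 2nd-level switches
total_gb = per_chip_gb * chips
print(chips, total_gb / 1024)   # 256 superchips, 144.0 TB (binary units)
```

Because NVLink makes all of this addressable by every GPU, the pool behaves as one memory space rather than 256 separate ones.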

This device contains 150 miles of cable, fiber-optic cable, and 2,000 fans capable of moving 70,000 cubic feet of air per minute; it could recirculate the air in this entire venue in a few minutes. It weighs 40,000 pounds, the weight of four elephants.


This is our brand-new Grace Hopper AI supercomputer: one huge, completely incredible GPU. We're building it now, with every single part in production. We are very excited that Google Cloud, Meta, and Microsoft will be among the first companies in the world to get access. They will join us in exploratory research at the frontier of artificial intelligence. We will build these systems into products, so if you want an AI supercomputer, we'll come and install it in your company. We also share the blueprints of this supercomputer with all of our cloud vendors, so that cloud partners can integrate it into their networks and infrastructure, and we will build it in-house for our own research and development. Overall, the DGX GH200 supercomputer is like one giant GPU.


1964, the year after I was born, was a great year for the tech industry. IBM introduced the System/360 mainframe that year, and AT&T showed the world the first videophone: video encoded, compressed, transmitted over copper twisted pair, then decoded onto a small black-and-white screen.

To this day, the experience is much the same. For a variety of reasons, video calls are now essential to our daily lives; everyone makes them. Today, video accounts for about 65% of web traffic, yet it is still implemented in basically the same way: compress on the device, stream, then decompress on the other end. Nothing has changed in 60 years. We treat communication as a dumb pipe. The question is, what happens if we apply generative AI to it? We have now built the computer, Grace Hopper, that can easily be deployed widely around the world, so every data center, every server, will have the power of generative AI. If the endpoint is not just decompressing and displaying a stream, what can cloud-based generative AI do instead?

In the video clip just now, all the words that came out of my mouth were generated by artificial intelligence. In the future, communication will no longer be compression, streaming, and decompression, but perception, transmission, and reconstruction: regeneration. The content can be regenerated in many different ways. It can generate 3D images, or regenerate your speech in another language. So we now have a universal translator.

This computing technology can be put into every individual cloud. What's really amazing is that Grace Hopper is so fast that it can even run the 5G stack. The most advanced 5G stack can now run entirely in software on Grace Hopper. Suddenly, a 5G radio can run in software, just as video codecs once moved into software. Layer 1, Layer 2 with the MAC layer, and the 5G core are all very computationally intensive, and today the entire stack can run on a single Grace Hopper.

08

Partnered with SoftBank to build a distributed data center network

Basically, this computer that you see here allows us to bring generative AI to every data center in the world today, and because we have software-defined 5G, telecom networks can become computing platforms just like cloud data centers. Every data center of the future will likely be smart and software-defined, whether it serves internet, web, or 5G communications. We are taking this opportunity to announce a partnership with SoftBank to redesign and deploy generative AI and the software-defined 5G stack across SoftBank's global data center network. We are very excited about this partnership.


I just talked about how we're going to push the frontier of AI, and how we're going to scale generative AI out into the world. The number of data centers worldwide is truly impressive, and over the next decade all of them will be refreshed into accelerated, AI-capable data centers.

But these data centers span many different fields and applications: scientific computing, data processing, large language model training, the generative AI we've been discussing, EDA, SSD, and so on, plus generative AI for enterprises. Each application calls for a different server configuration, with different emphases, deployment methods, security requirements, operating systems, management approaches, and physical locations. Each application area has needed its own newly designed type of computer, which means a huge number of configurations.

09

Introduction of the NVIDIA MGX server specification

Today, we are announcing partnerships with a number of companies to launch the NVIDIA MGX server specification. This is an open, modular server design specification designed to accelerate computing.


Most servers today are designed for general-purpose computing; their mechanical, thermal, and electrical designs are not enough for a very high-density computing system. Accelerated computers compress many servers into one, so you save a lot of money and a lot of floor space, but it's a different architecture. We've designed MGX to be a multi-generational, standardized product, so that once you make an investment, our next-generation CPUs, GPUs, and DPUs can continue to drop easily into it, giving you the best time to market and the best investment protection. We can offer hundreds of configurations for diverse applications, integrated into cloud or enterprise data centers. You can route power through a bus bar or power supplies, and cool with hot or cold aisles. Different data centers have different requirements, and we've made MGX modular and flexible so it can be applied across all of these areas.

These are basic features. Let's take a look at other things you can do with it. System manufacturers can use it to quickly and cost-effectively build more than 100 server configurations for a wide range of AI, HPC and NVIDIA Omniverse applications. MGX supports NVIDIA's full range of GPUs, CPUs, DPUs and network adapters, as well as a variety of x86 and ARM processors. Its modular design enables system manufacturers to more effectively meet each customer's unique budget, power delivery, thermal design, and mechanical requirements.

This is the Grace superchip server. It is CPU-only, economically accommodating four Grace CPUs, that is, two Grace superchips, with excellent performance. If your data center has limited power, this CPU delivers incredible performance in a power-constrained environment; there are a wide variety of benchmarks to run, such as the PageRank algorithm. The entire Grace server consumes only 580 watts, while the latest-generation x86 CPU server consumes 1,090 watts. At the same performance, that's basically half the power; in other words, at the same power, you get twice the performance. Most data centers today are power-limited, so this is truly an amazing capability.
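The power comparison can be restated as performance per watt, using only the figures quoted in the talk (equal benchmark performance for both servers is the stated premise):

```python
# Performance-per-watt comparison using the talk's figures:
# both servers are taken to deliver the same benchmark performance.
grace_watts = 580
x86_watts = 1090

power_ratio = x86_watts / grace_watts
print(f"x86 server draws {power_ratio:.2f}x the power")  # 1.88x

# At a fixed power budget, that is roughly 2x the performance,
# which is the "same power, twice the performance" claim.
```

The rounding from 1.88x to "twice" is the talk's own simplification; the underlying ratio is just under 2.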

We love servers, I love servers, they are beautiful to me. What you're seeing right now is our Grace Hopper, an artificial intelligence supercomputer. I would like to thank all of you for your great support, thank you.

We need to expand artificial intelligence to a new field. If you look at data centers in the world, data centers are computers right now. It can be said that the network defines the function of the data center. At a high level, there are two types of data centers today. One is for hyperscale data centers with a variety of different application workloads. This type of data center mainly uses CPUs, and the number of GPUs used is relatively small. Such data centers have a very large number of users, the workload is very uneven, and the workload is loosely coupled.

The other type is the supercomputing data center, the AI supercomputer data center, whose workloads are tightly coupled. It has a very small number of users, sometimes only one, and its purpose is to solve massive computational problems, running flat out. So supercomputing centers and AI supercomputers are fundamentally different in nature from hyperscale clouds.

10

Launch of the new accelerated Ethernet platform, Spectrum-X

Ethernet is based on TCP, which is lossy but very resilient: whenever a packet is lost, a retransmission occurs. The receiver knows which packet was lost and asks the sender to resend it. Ethernet can connect components almost anywhere, which is why the Internet was born; if it required too much coordination, how could we have built the Internet at all? So Ethernet has made a profound contribution. In Ethernet's view, tolerating loss is resiliency, because it lets you connect almost anything together. A supercomputing data center, however, can't afford that. You can't connect random things together, because for a $1 billion supercomputer, the gap between 95 percent network throughput and 50 percent network throughput is effectively worth $500 million. The workloads running on these supercomputers are so expensive that businesses can't afford any loss in the network.
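The half-billion-dollar remark is straightforward utilization arithmetic: if the network lets a $1 billion cluster do useful work at 50% effective throughput instead of 95%, the lost effective capacity is on the order of half the machine's cost. This is an illustrative model; the talk gives only the endpoints, not the formula:

```python
# Utilization arithmetic behind the "$500 million" remark
# (an illustration; the formula is an assumption, not from the talk).
cluster_cost = 1_000_000_000        # $1B AI supercomputer
throughput_good = 0.95              # well-engineered lossless fabric
throughput_bad = 0.50               # congested lossy fabric

lost_capacity = cluster_cost * (throughput_good - throughput_bad)
print(f"~${lost_capacity/1e6:.0f}M of effective capacity lost")
```

This gives about $450M, on the order of the half-billion figure Huang cites; the point is that network quality directly scales the value extracted from the compute.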

InfiniBand relies heavily on RDMA. It is flow-controlled and lossless. Flow control basically means you have to understand the data center, the switches, and the software end to end, so you can use adaptive routing to coordinate traffic, handle congestion control, and avoid oversaturating isolated areas in ways that cause packet loss. With InfiniBand, you lose almost nothing. So one network is lossy but very resilient, and the other is nearly lossless with very high performance. These two kinds of data centers are completely different. Now we want to bring generative AI to every data center. The question is: how do we introduce a new type of Ethernet that is backwards compatible with everything, but designed to bring AI workloads to any data center in the world?

It's been a very exciting journey, and at the heart of this strategy is a new switch we've built: Spectrum-X. Everything I've shown today is heavy, and so is this. This is a switch with 128 ports at 400 gigabits per second. This is its chip, and it is huge: 100 billion transistors, 90 mm by 90 mm, with 800 balls on the bottom. The switch consumes 2,800 watts, is air-cooled, and contains 48 PCBs merged together to form Spectrum-X. The switch is designed to support this new kind of Ethernet.

Based on network innovation, Spectrum-X tightly couples the NVIDIA Spectrum-4 Ethernet Switch with the NVIDIA BlueField-3 DPU, achieving 1.7 times the overall AI performance and energy efficiency improvement compared to the traditional Ethernet fabric, and enhancing multi-tenancy capabilities through performance isolation to maintain consistent and predictable performance in multi-tenant environments.

Remember what I said? InfiniBand is fundamentally different: because we build InfiniBand end to end, we can do adaptive routing and congestion control, so we can isolate performance, isolate noisy neighbors, and do in-network computing. None of this is possible with the lossy approach of traditional Ethernet.

The way we built InfiniBand was to design it from scratch, like a supercomputer, because that is how supercomputers are constructed. Now, for the first time, we are doing the same for Ethernet, and the critical piece has arrived. We will bring this new DPU to the world, because every data center wants to turn itself into a generative AI data center. Some need to deploy Ethernet throughout the company; their data centers have many users, and it is very difficult to run InfiniBand there and still isolate tenants within the data center. For the first time, we are bringing the power of high-performance computing to the Ethernet market.

We will bring several things to the Ethernet market. The first is adaptive routing. Based on the amount of traffic passing through the data center, and on which port of a switch is overly congested, the switch tells BlueField-3 to send traffic out another port; the BlueField-3 on the other end reassembles it and presents the data to the CPU, to the computer, to the GPU, entirely via RDMA, without any CPU intervention. The second is congestion control. When a port becomes severely congested, switch telemetry lets every switch know how the network is performing and contacts the sender: do not send more data right now, because the network is currently congested. Congestion control requires an important system of software and switches, working with all the endpoints, to fully manage congestion, traffic, and throughput across the data center.
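The adaptive-routing idea, steering each packet to a less congested port instead of pinning a whole flow to one port, can be illustrated with a toy scheduler. This is a sketch of the concept only, not Spectrum-4's actual algorithm; the port counts and queue depths here are invented:

```python
# Toy adaptive routing: send each packet out the least-congested of several
# equivalent ports, using queue depth as a simple congestion signal.
# Illustration of the concept only, not Spectrum-4's real algorithm.
port_queue_depth = {0: 5, 1: 2, 2: 9, 3: 2}   # packets queued per port

def route_packet():
    # Choose the port with the shallowest queue, then account for the
    # packet we just enqueued on it.
    port = min(port_queue_depth, key=port_queue_depth.get)
    port_queue_depth[port] += 1
    return port

sent = [route_packet() for _ in range(4)]
print(sent)  # successive packets spread across the lightly loaded ports
```

Because packets from one flow now take different paths, the receiving endpoint (BlueField-3 in NVIDIA's design) must reassemble them in order, which is why adaptive routing needs cooperation between switch and DPU.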

It's important to recognize that in high-performance computing applications, each GPU has to finish its work before the application can move forward. In many cases you perform an all-reduce and have to wait for every individual result; if one node takes too long, everyone is affected. This capability will greatly improve the overall effective performance of Ethernet. We are really excited to launch Spectrum-X.


NVIDIA Spectrum-X accelerates AI workflows that can experience performance penalties on traditional Ethernet networks

As a sketch and testbed for the Spectrum-X reference design, we are building a hyperscale generative AI supercomputer Israel-1 in our Israel data center. The device, worth hundreds of millions of dollars, features Dell PowerEdge XE9680 servers, NVIDIA's HGX H100 supercomputing platform, and the Spectrum-X platform with BlueField-3 DPUs and Spectrum-4 switches.

Most of the world's applications and businesses have not yet enjoyed generative AI. So far, we have been working with CSPs, and CSPs will be able to bring generative AI to many different applications across many regions and industries. A great journey lies ahead of us. With so many businesses and people in the world, and because of the multimodal capabilities I mentioned earlier, every industry can now benefit from generative AI.

11

Assist various industries to customize language models

We have to do several things. First, we have to help industries customize language models, because not everyone can use the public services; some customers need highly specialized language models, and every industry has proprietary information. How can we help them? We launched NVIDIA AI Foundations, a cloud service for building custom language models and generative AI models, which includes development services for language, image, video, and 3D models.

These models are created for specific tasks in the enterprise's domain and trained on proprietary data. The models run on NVIDIA AI Enterprise, the operating system I told you about earlier, which runs in every cloud. This gives you a very simple system: NVIDIA AI Foundations for training large language models, and NVIDIA AI Enterprise for deploying them. It is available in every cloud, so every business can participate.

One thing few people realize is that today there is only one software stack that is enterprise-secure and enterprise-grade, and that software stack runs on the CPU. To be enterprise-grade, software must be secure, and it must be managed and supported throughout its lifecycle.

Accelerated computing involves an enormous amount of software, more than 4,000 packages, that people need for data processing, training, and optimization, all the way to inference. For the first time, we will maintain and manage all of that software, the way Red Hat did for Linux. Now enterprises can finally have an enterprise-grade, enterprise-secure accelerated software stack. It's a big deal; otherwise, the promise of accelerated computing remains possible for researchers and scientists but impossible for businesses.

Let's take a look at what this does for them, with a simple image-processing application. If you run it on a GPU instead of a CPU, you can increase efficiency by more than 20 times, or put another way, pay only 5% of the cost. It's amazing. That's the benefit of accelerated computing in the cloud, but for many companies, you can't achieve it unless you have the software stack.

NVIDIA AI Enterprise is now fully integrated into AWS, Google Cloud, Microsoft Azure, and Oracle Cloud. When you deploy workloads in these clouds, if you or your customers need enterprise-grade software, NVIDIA AI Enterprise is waiting for you. It is also integrated into machine learning product lines worldwide. As I mentioned earlier, AI is a different type of workload and a new type of software, and this new type of software has spawned a whole new software industry. We are fully connected to that software industry through NVIDIA AI Enterprise.

Now let me tell you about the next stage of artificial intelligence: AI meets digital twins. Why does AI need a digital twin? I'll explain in a moment, but first let me show you what you can do with it. For AI to understand heavy industry, we need to give AI a digital twin. Remember, so far AI has only been used in light industry: information, text, images, music, and so on. If you want to use AI in heavy industry, the $50 trillion manufacturing industry, across all the different manufacturing bases, whether you're building chip factories, battery factories, or electric-car factories, all of it must be digitized so that the businesses of the future can automate design and operations with artificial intelligence.

The first thing we have to do is create the ability to express their world digitally. Digitalization comes first. Why? Let me give you a simple example. In the future, you'll be able to say to your robot, "I want you to do something," and the robot will understand your words and generate the motion. Remember, I said earlier that you can go from text to text, from text to images, from text to music. Why can't you go from text to animation? So robotics will be transformed by technology we already have. But how does the robot know that the motion it produces is grounded in reality, in physics? You need a software system that understands the laws of physics.

You've actually already seen this with ChatGPT. NVIDIA AI can use NVIDIA Omniverse as the feedback loop in reinforcement learning. ChatGPT uses reinforcement learning from human feedback: through that feedback, ChatGPT improves and is aligned with our principles. Reinforcement learning from human feedback is very important for language models, and in the same way, reinforcement learning from physics feedback is important for robots.

All the video content you just saw is simulation; none of it was created by artists. Isn't it amazing? For the last 25 years, when I came here, you always sold me things. I'm a little nervous right now, because Omniverse is the first item I'll pitch to you. It will help you revolutionize your business: digitize it and automate it with artificial intelligence. You will first produce products digitally; before building a factory physically, you will build it digitally; before making anything physical, you will plan it digitally. So in the future, Omniverse's business will be very large.

Now I'm going to show you Omniverse in the cloud soon. The entire stack of Omniverse is complex. We put everything into cloud management services and are hosted in Microsoft Azure.

This is in California, about 6,264 miles from us. The video lagged by only 34 milliseconds and was fully interactive. Everything is ray-traced; no artistry is required. You put all of your CAD data into Omniverse, open the browser, bring your data in, bring your factory in. Lights just do what lights are supposed to do. Multiple users, as many as you want, can enter Omniverse at the same time and work together.

You can virtually establish a single source of truth for your entire company, and virtually design, build, and operate factories. You can find the mistakes before breaking ground; changes discovered during integration typically cost a lot of money. And note that while humans interact with Omniverse today, generative AI will interact with Omniverse in the future; we can even use generative AI to help us build virtual worlds.

12

Partnering with WPP to develop generative AI content engines

Today we announced that WPP, the world's largest advertising agency, will partner with NVIDIA to develop a content-generation engine based on Omniverse and generative AI. It integrates tools from many different partners; Adobe Firefly, for example, is integrated into the environment, enabling it to generate unique content for different users and advertising applications. In the future, whenever a particular ad runs, it can be generated for you on the spot, while the product itself is rendered precisely, because the integrity of the product image is very important.


Notice that the calculation mode has changed. WPP produces 25% of the world's advertising, and 60% of the world's largest companies are their clients.

13

Open robotics platform

We also offer our own robotics platform, NVIDIA Isaac AMR, which has become a reference design for any company that wants to build robots. As we did in high-performance computing, we build the whole stack and then open it up: whether or not you buy our chips, complete systems, or software, and whether or not you use our algorithms. If you prefer your own algorithms, our platform is open, and we can help you integrate accelerated computing wherever you like.


In the future, we will do the same in the field of robotics. From chip to algorithm, we build the entire robot stack from top to bottom. It's completely open, and you can use any or all of it.

This is the Nvidia ISAAC AMR, which includes a chip called Orin. This is currently the most advanced AMR in the world.

We can design robots in Isaac, simulate them, and train them; then we transplant the brain developed in Isaac Sim into an actual robot, and with some tweaking it should be able to do the same job. This is the ecosystem of future robots: Omniverse and artificial intelligence working together. The IT industry is finally able to understand the language of the physical world, the language of heavy industry. Our software tool, Omniverse, lets us simulate, develop, build, and operate physical factories, physical robots, and physical assets as if they were digital.

The excitement in heavy industry is incredible. We have been using Omniverse to connect tool companies, robotics companies, sensor companies, and industries around the world. As we said, three industries are now making big investments: the chip industry, the battery industry, and the electric vehicle industry. These industries will invest trillions of dollars in the coming years, and they all want to do better. We now give them a system, a platform, to realize their vision.

I want to thank all of you for being here today. I've talked about a lot; we haven't seen each other for a long time, so I had a lot to tell you. I said a lot last night, a lot this morning, maybe a little too much. To summarize: first, we are experiencing two simultaneous transformations of the computing industry, accelerated computing and generative artificial intelligence. This form of computing differs from traditional general-purpose computing in that it is full-stack; it operates at data-center scale, because the data center is the computer; and it is domain-specific. For every field, every industry you want to enter, you need a software stack, and if you have the software stack, then the utility and the utilization of the computer will be high.

Second, the engines. We are fully committed to developing the engine of generative AI, the HGX H100, and the engine used in AI factories will be extended with Grace Hopper, the engine we created for the era of generative AI. Working with Grace Hopper, we realized that we could scale up performance on the one hand, but we also had to scale out in order to train larger models.

The NVIDIA MGX server specification, which provides a modular reference architecture for system manufacturers. System manufacturers can use it to quickly and cost-effectively build more than 100 server configurations for a wide range of AI, HPC and NVIDIA Omniverse applications.

We want to extend generative AI to businesses all over the world, and servers come in many different configurations. We also put NVIDIA AI Foundations in the cloud, so that every business in the world can have us create a generative AI model and deploy it securely, in every cloud, in an enterprise-grade, enterprise-secure way.

The NVIDIA Avatar Cloud Engine (ACE) for Games is a custom AI model foundry service that middleware, tool, and game developers can use to build and deploy custom speech, conversation, and animation AI models. It gives non-player characters smarter, evolving dialogue skills, allowing them to answer player questions with lifelike personalities. Built on NVIDIA Omniverse, ACE for Games provides optimized AI foundation models for speech, dialogue, and character animation, including: NVIDIA NeMo, for building, customizing, and deploying language models with proprietary data; NVIDIA Riva, for automatic speech recognition and text-to-speech in real-time voice conversations; and NVIDIA Omniverse Audio2Face, for instantly creating expressive facial animation for game characters to match any voice track.

Finally, we want to extend AI to the world's industrial sectors. So far, the industry I'm in, the industry we're all in, has touched only a small part of the world's industries. For the first time ever, we're doing work that touches every industry, by digitizing factories and automating them with robots.

Thank you all for your cooperation over the years. Thank you.