
Live interview with Jensen Huang: 20 probing questions on GPU pricing, China exports, and AGI timelines

San Jose, March 19. The most closely watched AI event in the US tech world, Nvidia's GTC conference, is underway. Nvidia founder and CEO Jensen Huang met with global media, including Xindongxi, at GTC, fielding 20 key questions covering the impact of US-China friction on Nvidia, GPU export plans for China, Blackwell GPU pricing and sales strategy, and TSMC's CoWoS supply and demand.

Nvidia's latest flagship AI chip, the Blackwell GPU, uses a dual-die design, while the previous-generation H100 and H200 are single-die parts, which makes direct price comparisons difficult. Huang emphasized that prices will differ between systems, and that Nvidia is pursuing the entire data center business rather than just selling chips.

There was also good news for Samsung, which has fallen well behind in the HBM3E race: according to Huang, Nvidia is testing Samsung's HBM and has confirmed that it will use it.

Over its first two days, the GTC conference has been visibly packed. The San Jose Convention Center is ringed with Nvidia GTC banners, and the streets are filled with attendees wearing the iconic green Nvidia badge. Some Nvidia partners have put on flashy displays of their own: Unitree sent a squad of robot dogs onto the street, where they drew the attention of local dogs, and WEKA parked several eye-catching purple cars along a nearby street with "NVIDIA DGX SuperPOD certified" written in large letters across the hood.


▲Eye-catching purple cars and advertising slogans parked on the side of the road near the GTC venue (photo by Xindongxi)

Beyond Nvidia's new products, Huang also shared his views on OpenAI's video-generation model Sora, OpenAI CEO Sam Altman's plans to scale up chip production, how to put a timeline on AGI, whether AI will wipe out programmers, and how to respond to taunts from AI chip startup Groq.

The back-and-forth with Groq in particular is turning into a drama. Right after Nvidia's GTC keynote ended, Groq, a large-model inference chip startup that rose to fame by publicly baiting tech giants, posted a pointed rebuttal aimed at Nvidia: "It's still faster." Today Groq added: "...... And it still consumes less power."


Asked for his thoughts on the matter at the media session, Huang responded: "I really don't know much about it, and I can't make an informed assessment...... The chip exists to implement the software. Our job is to facilitate the invention of the next ChatGPT. If it were Llama-7B, I would be very surprised and shocked."

Before the matter could settle, Groq founder and CEO Jonathan Ross posted a photo with Huang on social media: "I've met Huang before, and his team updated GTC this week specifically in response to Groq, so it seems unlikely that he doesn't know much about Groq. In other words, Groq can run a 70-billion-parameter model faster than Nvidia can run a 7-billion-parameter model. Experience it: groq.com"


America's cutting-edge AI chip companies clearly take GTC seriously and are watching it closely.

Cerebras, which has just released its third-generation wafer-scale chip, held its own Cerebras AI Day today less than a ten-minute walk from the GTC exhibition area. There it announced the CS-3, billed as "the world's fastest AI chip" with 4 trillion transistors; a partnership under which Qualcomm was selected to deliver unprecedented performance in AI inference; and the groundbreaking of an 8-EFLOPS AI supercomputer for G42. Sessions covered the core of the wafer-scale architecture, the AI capability gap, the challenges facing GPUs, why large models are best trained on large chips, and a newly released multimodal large model.


▲The scene at today's Cerebras AI Day

Cerebras did not pass up the chance to take a jab at GPUs in its post: "On CS-3, we were able to train at scale with an order-of-magnitude performance advantage over GPUs. And even our largest clusters operate as a single device...... Now, applause!"


Here are the 20 Q&As from Huang's media session (some questions and answers have been lightly edited for readability):

1. How much impact will US-China friction have on Nvidia?

1. How do US-China tensions affect your manufacturing and systems?

"Yes, there are two things we have to do, one is to make sure we understand and follow the policies, and the other is to do everything we can to make our supply chain more resilient," Huang replied. ”

The global supply chain is complex, he said. The HGX, for example, has 35,000 parts, eight of which come from TSMC, while a large portion of the rest come from China; the same is true of the automotive and defense industries.

He believes the two countries' goals are not fundamentally opposed: "The apocalyptic scenario is unlikely to happen, and I hope it will not happen. What we can do is focus on resiliency and compliance."

2. How has Nvidia's relationship with TSMC developed over the past two years, across chips, packaging, and the Blackwell dual-die design?

Huang called Nvidia's partnership with TSMC "one of the closest in the industry." What Nvidia does is hard, and TSMC does it well. Nvidia has compute dies, CPU dies, GPU dies, CoWoS substrates, and memory from Micron, SK hynix, and Samsung, all assembled in Taiwan. The supply chain is anything but simple, and it takes coordination among large companies to pull this off for Nvidia.

"They also realized that more CoWoS was needed. We'll take care of it all. "It's good to collaborate across companies, you assemble them, another company does the testing, and the other company builds the system, you need a supercomputer to test the supercomputer, and the manufacturing layer is a huge data center.

"Blackwell is a miracle, but we have to make it happen at the system level. People ask me if I make GPUs like SoCs, but what I see are racks, cables, and switches, which is my mental model of GPUs. TSMC is very important to us. Huang said.

3. Companies always want more from TSMC. Can you talk about Nvidia's supply and demand this year and next? For example, is Nvidia's CoWoS demand this year three times last year's?

"You want exact numbers, it's interesting. Huang said Nvidia's demand for CoWoS is very high this year and will be even higher next year because it is at the beginning of its AI transformation — with only $100 billion to invest in this journey, there is still a long way to go.

Huang expressed strong confidence in TSMC's growth, calling them good partners who have earned their position. People there work very hard, he said, and the technology is perfectly positioned. Generative AI is in an incredible position.

4. How much of Nvidia's new networking technology do you plan to sell to China, and can you say anything about your inclination to integrate other technologies onto compute chips for China?

"I barely announced it this year, so I'm a little greedy," Huang said. This is what we want to announce. Whenever and wherever it is sold to China, of course, there are export controls, so we will think about it. For China, we have L20 and H20. We are doing our best to optimize it for certain customers in China. ”

5. Cloud providers have begun developing their own chips even as Nvidia moves into the cloud business. What do you make of this? Will their in-house chips affect pricing? And what is Nvidia's cloud computing strategy and offering in China?

Huang replied that Nvidia makes the HGX and sells it to Dell, which puts it in its computers and sells those. Nvidia develops software that runs on Dell machines to create market demand and help sell those computers.

"We've partnered with cloud service providers to put NVIDIA Cloud into their clouds. "We're not a cloud computing company, our cloud is called DGX Cloud, but we're actually part of their cloud, and our goal is to bring customers to the cloud and let customers transact on this machine." ”

"We will develop developers, and we will create demand for cloud services. "It's not about anybody's chips — Nvidia is a computing platform company and we have to develop our own developers — that's why GTC exists." ”

"If we're an x86 company, why do we have a developer conference?" Huang asked sharply, "What's a developer conference? Because the architecture is still being accepted, it's complex to use, and we haven't overcome it, so DRAM doesn't need a developer conference, the Internet doesn't need a developer conference, but computing platforms like ours do, because we need developers, and these developers will appreciate Nvidia's ubiquity on every cloud." ”

2. Explaining Blackwell pricing: I don't want to sell GPUs; the data center is what we're after

Raymond James analysts estimate that each H100 costs Nvidia about $3,320 to manufacture and the B200 about $6,000, and that the GB200 solution will cost far more than the single-die GH100 with 80GB of memory. With an H100 selling for $25,000~$30,000, they expect the new GPU to be priced 50%~60% higher.
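The implied price range is simple arithmetic on the analyst figures above; here is a quick sketch using only the estimates quoted, not official Nvidia pricing:

```python
# Back-of-the-envelope check of the Raymond James estimates quoted above.
# All inputs are third-party analyst figures, not official NVIDIA pricing.
h100_price = (25_000, 30_000)  # estimated H100 selling price, USD
markup = (0.50, 0.60)          # new GPU expected to sell 50%~60% higher

low = h100_price[0] * (1 + markup[0])
high = h100_price[1] * (1 + markup[1])
print(f"Implied Blackwell price range: ${low:,.0f} ~ ${high:,.0f}")
# -> $37,500 ~ $48,000, somewhat above the $30,000~$40,000 figure Huang
#    later gave CNBC (see below), which is why these estimates stay rough.
```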

Nvidia, however, has not disclosed its pricing. Unusually, it has not put a product page for the B200 on its official website, releasing only introductory information for the DGX B200 and DGX B200 SuperPOD; the Blackwell architecture page is not yet live.


▲A stitched screenshot of the catalog on NVIDIA's official website (the green items are new products released at this GTC)

In an exclusive interview with CNBC this week, Huang revealed that R&D for the new GPU architecture cost about $10 billion and that Blackwell GPUs will be priced at roughly $30,000~$40,000. He added clarification at today's media session:

6. What is the pricing range for Blackwell? You said earlier that each Blackwell GPU costs $30,000~$40,000. And on TAM: what share of the $250 billion TAM do you want?

Huang replied, "I just want to give you a general idea of the pricing of our products, and I'm not going to quote — we're not selling chips, we're selling systems." ”

Blackwell's pricing, he said, differs from system to system; it is not just Blackwell but also NVLink, and configurations differ. Nvidia will price each product, and as always, pricing will be derived from TCO (total cost of ownership). "Nvidia doesn't make chips. Nvidia builds data centers," Huang emphasized.

Nvidia built the full-stack system and all the software, debugged it, tuned it for high performance, and built the data center. It then breaks the data center down into many modules, so customers can choose configurations to suit their needs and decide how much to buy and how to buy it.

One reason is that a customer's networking, storage, control plane, security, and management may all differ, so Nvidia works with them to break everything down, helps them figure out how to integrate it into their systems, and has a dedicated team to assist.

So it is not about buying chips, and not about the way chips used to be sold; it is about designing and integrating data centers, and Nvidia's business model reflects that.

As for what share of the $250 billion TAM Nvidia wants: Huang said Nvidia's opportunity is not a GPU or chip opportunity. The market Nvidia is pursuing is very different from the GPU market; it is pursuing data centers. Global data center spending was about $250 billion last year, growing at 20%~25% a year, and AI is proving quite successful. Nvidia's opportunity is a share of that $250 billion, and the long-term opportunity will be $1 trillion~$2 trillion, depending on the timeline.
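For a sense of the timeline those numbers imply, here is a simple compound-growth check (my arithmetic on the figures Huang cited, not his own calculation):

```python
# How long does a ~$250B market take to reach $1T~$2T at 20%~25% annual
# growth? A compound-growth check of the figures quoted above.
import math

base = 250e9  # ~$250B data center spending last year
for growth in (0.20, 0.25):
    for target in (1e12, 2e12):
        years = math.log(target / base) / math.log(1 + growth)
        print(f"{growth:.0%} growth -> ${target / 1e12:.0f}T in ~{years:.1f} years")
# At 20%~25% growth, $1T is roughly 6~8 years out and $2T roughly 9~11,
# consistent with "long-term opportunity, depending on the timeline".
```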

7. When you build a platform like Blackwell, how do you estimate customers' computing needs? The goal is essentially to increase compute; how do you think about power, efficiency, and sustainability?

"We have to figure out the physical limits, reach the limits, and go beyond them. Huang said that how to go above and beyond is to make things more energy-efficient, for example, you can train GPT with 1/4 of the power.

A task that requires 8,000 Hopper GPUs needs only 2,000 Blackwell GPUs, consuming far less energy over the same period. Because it is more energy efficient, you can push the envelope further; energy efficiency and cost efficiency are top priorities. By speeding up token generation for large language models by 30 times, Nvidia has cut the energy needed to produce the same tokens to 1/30 of what it was.
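A quick sanity check of those efficiency claims, restating only the numbers quoted above (these are Nvidia-stated figures, not independent measurements):

```python
# Sanity check of the efficiency figures quoted above.
hopper_gpus, blackwell_gpus = 8_000, 2_000
print(f"GPUs for the same job: {hopper_gpus // blackwell_gpus}x fewer")
# -> 4x fewer GPUs, matching the "train GPT with 1/4 of the power" claim

token_speedup = 30
print(f"Energy per token: ~1/{token_speedup} of the original")
# 30x faster token generation at comparable power draw implies each token
# costs roughly 1/30 of the energy it did before.
```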

8. Beyond HBM, what do you think of Samsung's and SK hynix's production?

Huang joked: "It's like asking TSMC: apart from the foundry business, apart from GPUs, do you still like Nvidia?"

HBM, he said, is complex and carries high added value. Nvidia spends a lot of money on HBM!

"We are testing Samsung HBM and we will use it. "Samsung is a great partner. South Korea is the country with the largest production volume of advanced memory in the world. HBM is very complex, it's not like DDR5. It's a technological marvel. That's why it's so fast. HBM is like logic, and it's getting more and more complex, more and more semi-customized. ”

He praised HBM as a miracle, saying that DDR across the data center is a thing of the past; thanks to generative AI, the future belongs to HBM.

"The upgrade cycle for Samsung and SK hynix is incredible. Our partners will grow with us. We will replace the DDR in the data center with HBM. Energy efficiency has improved a lot. That's how Nvidia is making the world more sustainable — with more advanced memory, less power consumption, Huang said.

9. What are the overall strategy and long-term goals of the NVIDIA AI foundry's cooperation with enterprises?

Huang said the foundry's goal is to make software, and not software as a mere tool, because, let's not forget, Nvidia has always been a software company. Long ago Nvidia created two important pieces of software: one called OptiX, which later became RTX, and one called cuDNN, an AI library. "We have a lot of different libraries."

The library of the future is a microservice, described not only mathematically but also by AI. Nvidia's libraries were called cuFFT, cuBLAS, cuLitho; in the future they will be NIMs. These NIMs are some of the most complex pieces of software, and Nvidia packages them so you can use them from a website, or download and run them in the cloud, on a computer, or on a workstation. Nvidia will keep making NIMs perform better.

When an enterprise runs these libraries, Nvidia licenses them at $4,500 per GPU per year, and you can run as many models as you want on them.

3. An AI chip rival openly provokes; Huang fires back: "I really don't know much about it"

10. What is your comment on a chip startup like Groq, which tweeted yesterday that it is faster than your "child"?

"I don't really know much about it, and I can't make an informed assessment. Huang believes that token generation is difficult, depending on the model you want, and each model needs its own special partitioning method.

In his view, being a Transformer is not the end-all of models. Every Transformer is related, because they all have attention, but they are all completely different: some are feedforward, some are MoE (mixture of experts); some MoEs use 2 experts, some use 4, and the division of labor differs, so each of these models requires very particular optimization.

If a computer is too brittle, designed to do one very specific thing, it becomes a configurable computer rather than a programmable one, and it cannot benefit from the speed of software innovation.

The miracle of the CPU, Huang believes, should not be underestimated: because of its programmability, the CPU displaced the configurable components on motherboards and PCs over time. A software engineer's genius can be realized through the CPU; fix the function into the chip, and you cut off the talents of the software ecosystem. What Nvidia really tries to do is benefit from both.

Nvidia, he said, has found a special form of computing: a parallel stream-computing model that is fault-tolerant, performs very well, and is programmable. One architecture has persisted since AlexNet, across all the models; eventually the Transformer came along, with a whole family of variants, and these models keep evolving in state space, memory, and architecture.

"It's important that we can make a horizontal model. "The chip exists to implement this software." Our job is to facilitate the invention of the next ChatGPT. If it was Llama-7B, I would be very surprised and shocked. ”

4. What do you think of OpenAI's chip fab network plan?

11. Sam Altman has been talking to the entire chip industry about expanding its scope and scale. Have you talked to him? What do you think he wants to do, and how does that affect you and Nvidia?

"I don't know his intentions, unless he thinks generative AI is a huge market opportunity, and I agree. Huang said.

He started from first principles: today's computers retrieve, decompress, and display pre-generated pixels. People assume the whole process requires very little energy, but the opposite is true, because every prompt, every interaction, every time you use your phone, the request goes to a data center somewhere, gets a response that makes sense from a recommender-system perspective, and is sent back to you.

By analogy: if every time you asked him a question he had to run back to his office to look it up rather than answering directly, that would waste time and energy. The way forward, he believes, is to scale up AI generation. More and more computation in the future will be generated rather than retrieved, and that generation must be smart and contextually relevant.

"I believe, and I think Sam believes, that almost every pixel on every computer, every time you interact with a computer, is generated by a generative chip. He hopes that Blackwell and subsequent iterations will continue to contribute a lot in this area.

"I wouldn't be surprised if everyone's computer experience was generative. But that's not the case today. That's a big chance, and I think I'll agree with that. Huang said.

5. AI writes code for people; do humans still need to learn programming?

12. You said earlier that no one needs to learn programming anymore. Were you implying that people shouldn't learn programming skills?

Huang's view is that people learn many skills, some of them genuinely hard, like piano and violin, and whether it's math, algebra, calculus, or differential equations, people should learn as many of these as they can. But programming skills are not essential to success.

"There was a time when a lot of big guys around the world were advocating that everybody had to learn to code, so you were inefficient. "But I think it's wrong, it's not a one-person job to learn C++, it's a computer's job to make C++ work." ”

In his view, AI has already made a great contribution to society here: you don't have to be a C++ engineer to succeed; you just have to be a prompt engineer. Humans communicate through conversation, and we need to learn how to prompt AI, just as you prompt a teammate in sports to get the result you want. It depends on the work you want done, the quality you're after, and whether you want more imagination or more specificity; depending on the answer you want, and the person, you give different prompts.

"I believe that the first great thing AI can do is to close the technology gap. Look at all the videos on YouTube, it's people creating AI, not writing any programs, so I think it's interesting. "But if somebody wants to learn to code — do it." We're hiring programmers!"

6. Putting a timeline on AGI; are you afraid of AGI?

13. You said earlier that AGI will arrive within 5 years. Does that timeline still hold? Are you afraid of AGI?

Huang pushed back slightly: "First, define AGI." He paused for a moment, then continued: "I paused because, as I said, I'm sure that's hard for everyone. I want you to define AGI specifically, so that each of us knows when we will have arrived."

He voiced his frustration with past reports that quoted him out of context: "Every time I answer this question, I specify what I mean by AGI. But every time it gets reported, nobody includes the specification. So it depends on what your goals are. My goal is to communicate with you. Your goal is to figure out what story you want to tell."

"OK, so I believe in AGI, as I pointed out, probably in 5 years, AGI, which is general intelligence, I don't know how we define each other, that's why we have so many different words to describe each other's intelligence. He said.

In Huang's view, predicting when we will see AGI depends on how AGI is defined; what AGI means in the context of the question has to be made clear.

He gave two examples of well-specified definitions: Santa Clara, which has an exact location, and the New Year, which everyone knows is coming despite the different time zones.

But AGI is a little different. If we specify AGI as something concrete, say, a software program that can complete a battery of tests at 80 percent or better, beating most people or even everyone, do you think a computer can do that within five years? The answer, Huang said, is probably yes.

Those tests could cover math, reading, logic, academics, and economics, as well as the bar exam, pre-med exams, and the like.

14. How will large language models and foundation models change our lives in the future?

The question, Huang argued, is how each of us gets our own large language model.

"There are a couple of ways to do it, and at first, we think you're fine-tuning, but fine-tuning is time-consuming, and then we find cue tuning, we find long context windows, working memory. I think the answer is a combination of all these factors. He said.

In his view, in the future you will be able to fine-tune by adjusting only one layer of weights; you won't need to tweak everything, just fine-tune a small piece, as LoRA does. Low-cost fine-tuning, prompt engineering, context, and memory storage will all combine to make up your custom large language model, which could live in a cloud service or on your own computer.
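What Huang describes matches how LoRA-style adapters work in practice. Below is a minimal, illustrative PyTorch sketch; the dimensions, rank, and scaling are assumptions chosen for demonstration, not anything Nvidia specified:

```python
# Minimal LoRA-style adapter sketch (illustrative only).
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen pretrained linear layer plus a small trainable low-rank update."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():  # freeze the pretrained weights
            p.requires_grad = False
        # Only these two small matrices are trained: y = Wx + (alpha/rank) * B(Ax)
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(4096, 4096), rank=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"Trainable params: {trainable:,} of {total:,} ({trainable / total:.2%})")
# A rank-8 adapter trains ~65K parameters against ~16.8M frozen ones,
# which is why this kind of fine-tuning is cheap enough to run locally.
```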

15. Where is the biggest growth opportunity for software?

Huang said Nvidia's near-term opportunities lie in two kinds of data center computing: one is modernizing the computing inside existing data centers, and the other is the new generative workloads coming to data centers.

Nvidia is doing this to help customers build AI. Llama, Mixtral, Grok...... many teams create AIs, but they are hard to use: the base models are raw and not easy to work with.

Nvidia will create some of these itself, select some mainstream open-source partners, and turn those models into usable, product-quality models. It also has to provide services, such as NeMo.

"We're not just going to invent AI, we're going to make AI software so that everyone can use it. Our software is about $1 billion running rate, and I think manufacturing AI can certainly do quite a bit. Huang said.

16. Some critical tasks require 100% correctness. Can the AI hallucination problem be solved?

Huang believes hallucination is solvable, as long as answers are well researched.

He suggested adding a rule: for every answer, the system must first look up the answer. That is RAG, retrieval-augmented generation. Given a query, it should do a search first rather than fabricating an answer, prioritize the most accurate findings, and then respond to the user. If the AI matters, it doesn't just answer you; it does research first, determines which answer is best, and then summarizes. That isn't hallucinating; that's a research assistant. How far to go also depends on how critical the situation is: add more guardrails or more prompt engineering.

For answers to mission-critical questions, such as health advice, Huang believes the system may need to check multiple sources and known sources of truth before proceeding.
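The retrieve-first loop Huang describes can be sketched in a few lines of Python; `search_corpus` and `llm` below are hypothetical placeholders standing in for any real retriever and model, not a specific API:

```python
# A minimal sketch of the retrieve-then-answer loop described above.
# `search_corpus` and `llm` are hypothetical callables, not a real library.
def answer_with_rag(question: str, search_corpus, llm, k: int = 3) -> str:
    # 1. Retrieve first: never answer from the model's memory alone.
    passages = search_corpus(question, top_k=k)
    context = "\n\n".join(passages)
    # 2. Ground the generation in the retrieved evidence, and make
    #    "I don't know" an allowed outcome instead of a made-up answer.
    prompt = (
        "Answer using ONLY the sources below. If they do not contain "
        "the answer, say you do not know.\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
    return llm(prompt)
```

For mission-critical domains, the same loop would simply query several independent corpora and cross-check before answering, as Huang suggests.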

17. You've talked about using generative AI and simulation to train robots at scale, but many things are not easy to simulate, especially once robots leave the built environment. What do you see as the limits of simulation, and what should we do when we hit them?

Huang said there are several ways to think about it. First, consider how large language models are built: they operate on an unconstrained, unstructured world. That may sound like a problem, but they learn a great deal from it. The generalization ability of large language models is magical, and context is then supplied iteratively or through prompts.

For example, if a robot is making an omelet in the kitchen, you can specify the problem and the background, list the tools available, and describe the robot's environment, and it should be able to generalize effectively.

This is the ChatGPT moment for robots. There are still problems to solve, but you can see where the reasoning leads: everything can be expressed as generated tokens, and a robot-action token looks the same to the software as any other token. Robotics, in other words, makes sense as software; the software doesn't understand the difference, it's all just tokens. So you have to organize all the poses, annotate all the outputs, generalize across environments, feed in context, reinforce with human feedback, and give it a whole pile of proper Q&A examples: proper answers in philosophy, chemistry, mathematics.

Some of this has been written up elsewhere. Making a ChatGPT may take more than 10,000 curated large-model examples. Our brains can tell the difference between words and robot actions, but the computer only sees numbers; it doesn't know the difference between these things.

18. On computer games: last year you said every pixel will be generated and rendered. How far are we from a world where every pixel is generated at real-time frame rates?

Huang's rule of thumb is that for almost all technologies, the S-curve runs no longer than about ten years: once something becomes practical and clearly better, like ChatGPT, adoption doesn't take more than 10 years. In 10 years you are in a different era; within 5 years things are changing in real time and everything is happening. So you just have to judge how far along we are, and we are about 2 years in. The next 5~10 years will basically play out that way.

19. You've said many industries will have their ChatGPT moment. Can you pick one that excites you?

Huang said some of his excitement comes from the technology itself, some from encountering something for the first time, and some from the impact.

"I'm really excited about Sora, OpenAI is doing a great job, and last year we saw the same thing at Wayve, a self-driving company, and you also saw some examples of what we did, almost two years ago, about generating videos from work. He said.

To generate video, the model must understand physics, so that when you put a cup down, the cup sits on the table rather than passing through it. It has common sense. No one handed it the laws of physics, but its output has to be plausible, which means it has to have absorbed them.

Second, Huang pointed to CorrDiff, the generative AI model behind Nvidia's Earth-2 climate digital-twin cloud platform, which predicts weather at 2~3 km resolution with huge impact. Nvidia has made it 3,000 times more energy efficient and 1,000 times faster, able to predict flight paths in extreme weather and to sample chaotic weather far more often, 10,000 times, greatly improving the odds of getting the most likely answer.

Third, the work in molecular generation and drug discovery: generating druggable molecules with highly desirable properties for a protein of interest. You can put the generator in a reinforcement-learning loop, as with AlphaGo, to generate interactions between molecules and proteins and then explore enormous spaces. It's very exciting.

20. Tell us more about your thinking on drug discovery, protein structure prediction, and molecular design, and how this affects other fields.

"We're probably the biggest quantum computing company that doesn't make quantum computers," Huang said. The reason we do this is because we believe in it, we want to be here, we just don't think there's any need to build another one. "A QPU is an accelerator, just like a GPU, for some very specific things.

Nvidia built cuQuantum to simulate quantum computers; it can handle roughly 34~36 qubits, and people use it to simulate quantum circuits. "We can do post-quantum encryption and make the world quantum-ready, because when quantum arrives, all the data must be correctly encoded and encrypted." Nvidia works with most of the world's quantum computing companies and can contribute to all of them, though Huang believes a breakthrough will take some time.
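The 34~36 qubit ceiling follows from the memory cost of full statevector simulation, where n qubits require 2^n complex amplitudes; a quick check (standard simulation arithmetic, not an Nvidia figure):

```python
# Why ~34-36 qubits: a full statevector needs 2**n complex amplitudes,
# each 16 bytes at double precision (complex128).
for n in (34, 35, 36):
    bytes_needed = (2 ** n) * 16
    print(f"{n} qubits -> {bytes_needed / 2**40:.2f} TiB of state")
# -> 0.25, 0.50, and 1.00 TiB: 36 qubits already needs a full terabyte
#    of memory, which is why simulators top out around this range.
```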

On digital biology: the inspiration for NIM came from digital biology, and BioNeMo was Nvidia's first NIM. These models are so complex that Nvidia wanted to package them in a special way so every researcher could use them. BioNeMo is used in many places: feed it a chemical-protein pair and it will tell you whether the binding is effective, or send it a chemical and ask it to generate other chemicals.

Appendix: Notes from Huang's 15-minute opening remarks

The media session was divided into two parts: before the Q&A, Huang gave a 15-minute solo talk. In it he highlighted his views on OpenAI's video-generation model Sora, and walked through the technical planning and logic behind Nvidia's key product lines, from his read on generative AI trends and AI programming to the revolutionary progress of the new Blackwell architecture, the Omniverse API, and the modular design of its systems.

Here's a summary of Huang's 15-minute speech:

The industry is undergoing two transitions at once: from general-purpose computing to accelerated computing, and the emergence of new generative AI tools.

Some people describe generative AI as just more data centers. But a standard data center serves files, while generative AI generates tokens: floating-point numbers that are converted into text, images, and sounds.

In the future, these tokens will be proteins, chemicals, animation, machines, robots. If a computer can talk, why can't it move like a robot?

Token generators are a new category and a new industry, and that is why a new industrial revolution is happening. The rooms and buildings that house them are called AI factories. In the last industrial revolution, water and fuel went in and electricity came out. What enters an AI factory is data, and what comes out is tokens, which can be distributed all over the world and show up in companies' costs, operating expenses, and capital expenditures.

In the new world, software is very complex, getting ever bigger, and requires many different things. Today it learns from text, images, video, reinforcement learning, and synthetic data, and through self-play in the manner of AlphaGo. Over time these models become more and more sophisticated as they learn more ways to learn.

Huang highlighted three breakthroughs:

1. Saving energy and money: Nvidia has created a new generation of computing for the coming trillion-parameter models, realized in Blackwell, and Blackwell is very energy efficient. Taking the 1.8-trillion-parameter GPT-MoE model as an example, H100 needs 15 MW of power for 90 days, while Blackwell needs 4 MW, saving 11 MW. "We've reduced the amount of work," Huang said; it saves a lot of energy and a lot of money. (A quick check of these figures follows this list.)

2. AI generation: gamers have always seen the GPU as a generation engine, generating images and pixels; the images you see are generated by the largest GPUs. In the future, images, videos, text, proteins, and molecules will all be generated by GPUs. The GPU has evolved from graphics generation to AI training, to AI inference, and now to AI generation. Almost all of our computing experience will be generated on demand and personalized rather than pre-recorded. Everything will be created in the future, and that requires a special processor. Nvidia built Blackwell with a second-generation Transformer engine, next-generation NVLink, and multi-GPU parallelism.

3. Software: in the future, software is AI. You interact with it just by talking to it, which makes it very easy to use, and its APIs are so natural that many AIs can be connected together. Nvidia built NIM microservices and tied them together so companies can use them off the shelf or customized. The NeMo service helps customers customize NIMs; this is what Nvidia calls an AI foundry. Nvidia has the technology, the expertise, and the infrastructure to make that happen. That is the foundry: Nvidia can help every company build custom AI and bring AI technology to the world.
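Those power figures translate into total energy as follows (a simple check of the numbers in point 1 above; all inputs are Nvidia's own stated figures):

```python
# Training-energy comparison from the GPT-MoE-1.8T example above:
# Hopper at 15 MW vs Blackwell at 4 MW, both over a 90-day run.
HOURS = 90 * 24

hopper_mwh = 15 * HOURS      # megawatt-hours for the Hopper run
blackwell_mwh = 4 * HOURS    # megawatt-hours for the Blackwell run

print(f"Hopper:    {hopper_mwh:,.0f} MWh")
print(f"Blackwell: {blackwell_mwh:,.0f} MWh")
print(f"Energy reduction: {hopper_mwh / blackwell_mwh:.2f}x")
# -> 32,400 MWh vs 8,640 MWh, a ~3.75x reduction for the same job.
```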

For the next wave of AI, Huang said, AI must understand the physical world.

"We're seeing some revolutionary, amazing AI from OpenAI called Sora. When the video generated by Sora is meaningful, the car is parked on the road and turning, and a contemplative person walking down the street has a reflection, apparently the AI understands this and understands the laws of physics. "If we push it to its limits, then AI can act in the physical world, and that's robotics." ”

As a result, the next generation requires new computers to run new robots and new tools: Omniverse, digital twins, and new foundation models must be developed. Nvidia enters this market as a technology platform, not a tool maker; enterprises can use the Omniverse API to create digital twins. Huang was very pleased with how this has gone, calling the connection to partners' tools a "supercharge."

Blackwell is the name of the chip, but also the name of the computer system. Nvidia has a legacy x86 system called HGX, where you can pull the Hopper tray out and slide Blackwell in. Because the infrastructure to support production already exists, production changeover and customer ramp-up become much easier.

Nvidia also has DGX, a new liquid-cooled architecture that can create large NVLink domains, supporting 8 GPUs (16 dies) in one domain. For a bigger machine, Nvidia offers scaled-up versions with Blackwell and the Grace Blackwell superchip, along with the NVLink Switch.

Huang called the NVLink Switch "the world's highest-performance switch," highly modular and popular.
