Jensen Huang's Q&A: What's the next wave of AI?

Jensen Huang, CEO of Nvidia, the leader in AI (artificial intelligence) chips, opened a wide-ranging conversation at the NVIDIA GPU Technology Conference (GTC).

On March 19, local time, the day after his packed GTC keynote, Huang took part in a global media session and answered questions from more than 20 outlets.

The questions ranged from Nvidia's corporate prospects to the future of AI, cloud computing, robotics and even quantum computing, as well as the international situation and global supply chains.

In the interview, Huang explained several of the company's core technologies in plain terms. He repeatedly emphasized that Nvidia is not just selling chips but is targeting the much larger data center market, hoping to be "ubiquitous." Because of the sheer size of the supply chain and the complexity of its products, Nvidia's chips are built from parts sourced all over the world, and the company is working to strengthen the resilience of that supply chain. He also argued that one of AI's major contributions is bridging the technology gap, allowing anyone to command software as if they were talking to another person.

The full text of Huang's remarks and the media Q&A follows, translated and edited by The Paper:

Jensen Huang: Welcome to GTC. Media from all over the world, it's a pleasure to meet all of you.

I touched on five themes [in my opening remarks]. First, our industry is going through two transformations at the same time. The first transformation is in how computers are made and built: from general-purpose computing to accelerated computing. The second transformation is in what computers can do. In other words, the first is accelerated computing, and the second is what we call generative AI. Because of generative AI, a new type of tool has emerged, and that new tool is the AI generator. Some people call it a data center, but as you know, a data center is used by many people. It stores a large number of files and runs many applications; you can do many different things with it. In the case of generative AI, though, it does only one thing.

It (generative AI) does one thing for a person or a company: it produces AI, that is, it produces tokens. When you interact with ChatGPT, it is generating tokens, generating floating-point numbers that can be turned into text, images, or sounds, into proteins, chemicals, computer animation, or robot motion. These are no different from talking to the machine. If computers can talk, why can't they make machines move? So these capabilities, these token generators, are a whole new category, a new industry. That's why we say an industrial revolution is happening: because it's new. This new industry has created these (server) rooms, these buildings, and I call them AI factories because that's the most accurate description.

In the last industrial revolution, the raw material that flowed into factories was water, and what came out was something invisible, called electricity. Now we have a raw material that flows into the factory, which is data, and what comes out is tokens. These tokens are likewise invisible; they can be distributed all over the world, and they are very valuable. In the past, a data center was counted as a cost to your company, either an operating expense or a capital expenditure; either way, you thought of it as a cost. Factories, however, make money. So this new world of generative AI and new factories, the AI factory, is a new industrial revolution. Do you understand? Okay, that's the first thing that's happening.

The first transformation is accelerated computing, led by NVIDIA, and the second is generative AI. In this new world, software is extremely complex. ChatGPT is no easy feat; it's one of the greatest scientific breakthroughs of all time, and it's huge and ever-expanding, because there is so much we want it to learn these days. It learns from words and pictures, and it will also learn from videos. It will learn through reinforcement learning and synthetic data generation. It will learn through dialogue with itself, just as AlphaGo does, by debating with another copy of itself. It will learn in many different ways. As a result, these models will become more and more complex over time.

We've created a whole new generation of computing tools for this future, where models have trillions of parameters. We call it Blackwell. Blackwell is revolutionary in several ways. First of all, it is designed to be very efficient and energy-efficient. I showed an example in my presentation: training a 1.8-trillion-parameter GPT of the same specification in 90 days would take not 15 megawatts, but only 4 megawatts – you save 11 megawatts. So we have significantly reduced the energy required for the work. Energy efficiency is the work accomplished divided by the energy put in, and the work here is training the model. Over 90 days, the input was 4 megawatts, which saves a lot of energy and, of course, a lot of money. This is the first breakthrough.

NVIDIA's latest Blackwell GPU. Source: Nvidia's official website

The second breakthrough is generation. For the first time, AI is about more than just recognition and inference. Given a picture of a cat, a model can predict and recognize that it is a cat; generation goes beyond that. Although it still belongs to the category of inference, there is a profound difference: the output is generated. It's not just recognizing, it's generating; not just understanding, it's generative AI. Blackwell was designed as a generative computer, and this is the first time we have designed a data center GPU in this way.

Now, if you're a gamer, you've always thought of Nvidia's GPUs as generative computers, because all the images you see are generated by NVIDIA's GPUs. In the future, everything from images, video and text to proteins, chemicals and motion control will be generated, and it will all be generated by the GPU. It's actually quite interesting, almost Back to the Future: our GPUs went from being generative processors for computer graphics, to AI learning processors and AI inference processors, and now back to being the generative processors they started as. In the future, almost all of our computing experiences will be primarily generated. That's not the case today, which is why this opportunity is so huge.

When you're computing, ask yourself: when you do something on your phone, that file, that information is all pre-recorded. Someone wrote it in advance, someone took a photo in advance, someone recorded a video in advance; everything is pre-recorded. In the future, your experience will still be augmented by that kind of pre-recorded content, but in a way that is generated uniquely for you, which is why everyone's computing experience is going to be very different. We won't just use search anymore; search will be augmented with generation. We call this RAG, or retrieval-augmented generation. So in the future, almost all of our experiences will be generated, and this generation engine requires a special type of processor, and that is Blackwell. We built Blackwell with a second-generation Transformer Engine and a very large NVLink, so that we can generate a lot of information very quickly and run many GPUs in parallel at the same time. So that's Blackwell.
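
To make the retrieval-augmented generation pattern Huang describes concrete, here is a minimal sketch in Python. It is not NVIDIA's implementation: the embed() and generate() functions are illustrative stand-ins for a real embedding model and a real LLM, and the small document list plays the role of a vector store.

```python
# Minimal retrieval-augmented generation (RAG) sketch.
# Hypothetical stand-ins: embed() and generate() are placeholders for a real
# embedding model and a real LLM; the "documents" list plays the vector store.

from collections import Counter
from math import sqrt

documents = [
    "Blackwell is NVIDIA's GPU architecture announced at GTC 2024.",
    "Retrieval-augmented generation grounds model answers in retrieved text.",
    "HBM is high-bandwidth memory stacked next to the GPU die.",
]

def embed(text: str) -> Counter:
    """Toy embedding: a bag-of-words count (a real system uses a neural embedder)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, k: int = 1) -> list[str]:
    """Rank stored documents by similarity to the query and return the top k."""
    q = embed(query)
    return sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def generate(prompt: str) -> str:
    """Placeholder for an LLM call; here it just echoes the grounded prompt."""
    return f"[model answer grounded in]\n{prompt}"

query = "What is retrieval-augmented generation?"
context = "\n".join(retrieve(query))
print(generate(f"Context:\n{context}\n\nQuestion: {query}"))
```

The key design point is the order of operations: retrieve first, then generate from the retrieved context, rather than asking the model to answer from memory alone.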

The third thing is that in this new world, the software we can write is different and very complex. But how do large companies, enterprises, use this software the way they use Windows? That software exists in binary form: you download it, you install it. In SAP's case, your IT department installs it for you. Some apps are in the cloud. But if you want to create your own apps, you need this incredible technology called AI built in. So somebody has to figure out a way to package this very complex software into a container, with all the high-performance computing technology, all the GPU technology, all the TensorRT-LLM and distributed computing, so that it is easy for people to use, easy to download, and something you can interact with directly.

What's really cool is that in the future, software is AI, and the way you interact with software is direct conversation. So AI software is coming. It will be very easy to use. The API is very natural, and you can connect many of these AIs together. We call them NIMs, NIM microservices, and we're going to help companies connect them together. You can use them directly, you can customize them, we can teach you how to customize them and how to connect them with many other applications. So we talked about NIMs, we talked about this service. We'll also help customers customize their own NIMs, an approach we call an AI foundry.

We have the AI technology, we have the tools needed to execute, and of course the infrastructure, and these three things – technology, expertise, and infrastructure – are basically the characteristics of a foundry. So we can help every company build its own custom AI. Now, who wants custom AI? Companies that have a platform. That's why SAP, ServiceNow, ANSYS, Cadence, and NetApp want custom AI. We can work with them and help them build their custom AIs, like a foundry, and they can bring those AIs to market themselves. That is an example of how we take this AI technology and bring it to the world.

The last thing I talked about is the next wave of AI, which requires AI to understand the physical world. You've seen some revolutionary AI, including OpenAI's Sora. When Sora generates videos, they actually make sense: when a car is on the road, it turns; people walking on the street cast reflections. Obviously the AI understands physics, right? It understands the laws of physics. So imagine pushing that to its limit: AI could actually act in the physical world, and that's robotics.

Video: Sora-generated footage (00:59)

So the next generation of technology requires new computers that run inside robots. We also have Omniverse, our tool that lets robots learn inside digital twins. And of course we need to invent some new AI models, new foundation models. So it's the whole stack. As you know, we're a technology platform company, not a tool company. We have developers; Omniverse is our digital twin platform, and through APIs and SDKs we connect with developers.

We announced a lot of great partners this time. 3DEXCITE will connect to the Omniverse APIs to enhance photorealistic, physically based rendering. Siemens, Cadence, Hexagon and others are all connecting to the Omniverse APIs, which they can use to create digital twins, and they are becoming our super partners. I'm very happy with Omniverse's success in connecting these tools. These tools are essentially enhanced by Omniverse, and I'm very happy about that.

So those are the five things we talked about. Blackwell is both the name of a chip and the name of a computing system. This is the HGX platform, which has gone from the Ampere A100 to the H100, H200, and now the B100 and B200. This version is really great and fully compatible with Hopper, so you can pull out a Hopper and slot in a Blackwell. That production transition makes it easier for customers to grow, because the infrastructure is already there. We also have a new liquid-cooled architecture that lets us create very large NVLink domains. With 8 GPUs in one NVLink domain, it's as if we had a bigger GPU: one GPU made of 8 Hopper dies, or in the Blackwell case 16 dies, and each die brings a significant breakthrough. So, in every sense, that's Blackwell.

But if we want to create something bigger, we can keep going. We can stack multiple Blackwell and Grace CPU trays on top of each other, connected through NVLink switches. The NVLink switch is here; it's the highest-performing switch in the world, and we stack nine of these switches to connect 36 chips, 72 GPUs. Okay, I'm open to questions.

Reporter: I would like to ask how much of your new networking technology you plan to sell to China, and whether you have SKUs designed specifically for the Chinese market. These SKUs might integrate other technologies besides the compute dies that you cannot sell there because of compute-density limits. Beyond the ones we saw yesterday, what new SKUs are you developing that integrate other advanced technologies?

Jensen Huang: I just announced these SKUs; that's all we're talking about today. Of course, any product we sell to China must comply with export control regulations, and that is the first priority. So we think about this, we focus on this. For China, we have the L20 and H20 chips, and we are doing our best to optimize them for the Chinese market and serve customers there.

Reporter: You mentioned in your keynote that NVIDIA is an AI foundry that works with many companies, and I think that's very important. Can you share more about your overall strategy and long-term goals?

Jensen Huang: The goal of the AI foundry is to build AI software – not software as a tool, but remember that NVIDIA has always been a software company. One of the most important pieces of software we created long ago was called OptiX, which later became RTX. Another very important one is cuDNN, an AI library. We have all these different libraries. The library of the future is a microservice, because the library of the future will be described not only mathematically but also with AI. So these libraries – we used to call them cuBLAS and a whole bunch of other "cu"s – in the future they are all NIMs. These NIMs are super-complex software, and all you need to do is come to our AI website, where you can use them directly, download them to another cloud platform, or run them on your own computer. If a model is small enough, you can run it on your PC, your workstation, or your data center. We will make the performance of these NIMs very efficient. So this is a new way to use the NVIDIA libraries, and when you run these libraries as a business, there is an operating system you need to license, and the cost of that operating system is $4,500 per GPU per year. You can run as many models on it as you like.

Reporter: This morning you mentioned that the price of a Blackwell chip is between $30,000 and $40,000, but you didn't specify which one. Can you provide specific pricing? Also, of the $250 billion TAM (total addressable market) you mentioned in your speech, what proportion does Nvidia expect to capture?

Jensen Huang: Thank you for your question. Okay, first of all, I was trying to give a feel for the pricing of our products; I'm not going to make a specific offer. The reason is that we're obviously not just selling chips, and pricing Blackwell as one or more systems is very different. You don't just use Blackwell on its own – the Blackwell system includes NVLink, right here. So this time the pricing is completely different, and we will provide pricing for each configuration. As usual, pricing for each will be based on TCO (total cost of ownership).

Nvidia doesn't just make chips; Nvidia builds data centers. You can see what I showed in the last slide – obviously it's not just a chip. We build the whole thing and all the software, we spin it up, make it work, tune it, make it efficient, do all the work needed to build the entire data center. We're actually building all of this ourselves, right? We're building a couple of our own [data centers] to make them as efficient as possible. And then, this is the crazy part, we break them down into smaller pieces like this. So we take an entire data center and let you decide which parts you want to buy. That's why we leave it up to you to decide how you want to buy. Maybe your network is different, your storage is different, your control plane is different; at the very least your management module will be different. So we work with you, we break everything down, figure out how to integrate it into your system, and we have a whole team of people to help you do that. This is not how people used to buy chips. It's really designing a data center and integrating our data center into someone else's data center. Our business model reflects that.

So, what is the opportunity for Nvidia? The opportunity for Nvidia is not the opportunity for GPUs, because that's just a chip opportunity. There are a lot of people who have made GPUs, and the GPU market is very different from the opportunities that we are pursuing. What we're pursuing is the data center market, which is valued at about $200 billion to $250 billion a year, and that $250 billion is rapidly shifting to accelerated computing and generative AI. So that's our opportunity, and obviously, because AI has proven to be quite successful, this opportunity is going to continue to grow. So, I think our opportunity is a percentage of that $250 billion.

I also want to clarify that the figure I quoted, that $250 billion, is roughly last year's figure, and I think it's going to grow by about 20 to 25 percent a year. That's why I would say the opportunity for Nvidia could be between one and two trillion dollars, depending on the time frame, but an estimate in that range is reasonable.

Reporter: Sam Altman (CEO of OpenAI) has been talking to people across the chip industry about expanding the scope and scale of the AI chip field. Has he talked to you about this? Regardless of whether he has talked about it or not, how do you see his intentions and how does that affect you and your company?

Sam Altman. Visual China file photo

Jensen Huang: I don't know what he's thinking, other than that he believes generative AI is going to be a very big market opportunity. I agree with him. Let's go back to basics. The way computers generate pixels today is by retrieving them, decompressing them, and displaying them on your screen. People think that whole process takes very little energy, but in fact it's the opposite. The reason is that every prompt, every time you touch your phone, it has to quickly go to a data center somewhere, collect all the pieces of data, have the CPU take all the pieces and put them together in a way that makes sense from a recommender system's perspective, and send them back to you. If every time you asked me a question I had to go back to my office to find the answer, it would consume more energy than if I just answered directly. So the way I'm working with you right now is basically generative AI: I'm generative rather than retrieval-based.

So, in the future, more and more computation will be generative rather than retrieval-based, but this generation must be intelligent, it must be contextual, and so on. We believe, and I know he believes, that almost every pixel on everyone's computer, every time you interact with the computer, will be generated by a generative chip. And today's generative chips come from NVIDIA. We hope that as Blackwell and future generations grow, we will be able to continue to make a lot of contributions in this area. But I wouldn't be surprised if one day everyone's computer, everyone's computing experience, is generative. So it's a huge opportunity, and I would agree with that.

Reporter: What is your vision for the future? We have a foundry and foundation models – how will this develop in our lives?

Jensen Huang: Yes, the question is how we each get our own personal LLM (large language model). At first we thought we might need to fine-tune it, and keep fine-tuning over time, but as you know, fine-tuning is quite time-consuming. Then we found prompt tuning and prompt engineering, then context, memory, large context windows, and then working memory, and so on. I think the answer is that the future will be a combination of all of these. You can fine-tune by adjusting one layer of weights, using the LoRA method: you don't have to fine-tune everything, you freeze everything except one or a few layers. Then you can do low-cost fine-tuning, you can do prompt engineering, you can handle context, you can store memories, and all of that adds up to your own special LLM, which can live in a cloud service or on your own computer.
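
Huang's point about freezing most weights and adapting only a small piece is the idea behind LoRA (low-rank adaptation). Here is a minimal numpy sketch of one LoRA-adapted layer; the sizes and names are illustrative assumptions, and a real setup would use a deep-learning framework and train A and B by gradient descent.

```python
# Minimal LoRA sketch: the pretrained weight W stays frozen; only the small
# low-rank factors A and B are trainable, so fine-tuning touches few parameters.
import numpy as np

d_in, d_out, rank = 512, 512, 8                 # illustrative sizes
rng = np.random.default_rng(0)

W = rng.normal(size=(d_out, d_in))              # frozen pretrained weight
A = rng.normal(scale=0.01, size=(rank, d_in))   # trainable low-rank factor
B = np.zeros((d_out, rank))                     # trainable, zero-initialized
scale = 1.0 / rank

def lora_forward(x: np.ndarray) -> np.ndarray:
    """Output = frozen path + low-rank update path."""
    return W @ x + scale * (B @ (A @ x))

x = rng.normal(size=(d_in,))
print(lora_forward(x).shape)                    # (512,)

# Parameter counts: full fine-tuning vs. LoRA's low-rank update.
print("full:", W.size, "lora:", A.size + B.size)
```

Because B starts at zero, the adapted layer initially behaves exactly like the frozen model, and only the small A and B matrices need to be stored per customization.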

Reporter: I would like to know what you have to say about AI chip startups like Groq, which tweeted after your keynote, "We're still faster."

Source: X Platform

Jensen Huang: Sounds infuriating. I don't know much about this and can't make a smart comment. I do think that token generation is a very difficult problem. Each model needs its own special way of being partitioned; "Transformer" is not one uniform thing across all models. They are all based on Transformer technology, and everyone's Transformer is related in intent, but they're all quite different. Some of them are not simple feed-forward networks; with this thing called mixture of experts, models differ in how many experts they use, how they distribute the work, and how they route information from one expert to another, or from one expert to two.
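
As an illustration of the routing Huang alludes to, here is a toy top-2 mixture-of-experts layer in numpy. The gating network, expert count, and shapes are all illustrative assumptions, not any particular model's design.

```python
# Toy mixture-of-experts routing: a gate scores each expert per token, the
# token is sent only to the top-k experts, and their outputs are blended.
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 64, 4, 2               # illustrative sizes

gate_w = rng.normal(size=(n_experts, d_model))     # gating network
experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]

def moe_forward(x: np.ndarray) -> np.ndarray:
    logits = gate_w @ x                            # one score per expert
    top = np.argsort(logits)[-top_k:]              # route to the best-scoring experts
    weights = np.exp(logits[top]) / np.exp(logits[top]).sum()
    return sum(w * (experts[i] @ x) for w, i in zip(weights, top))

token = rng.normal(size=(d_model,))
print(moe_forward(token).shape)                    # (64,)
```

The routing decision changes per token, which is exactly why serving such a model efficiently requires model-specific partitioning and scheduling rather than one fixed hardware configuration.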

As a result, each model requires very special optimizations. If a computer is too brittle – that is, it is designed to do something very specific and requires very specific inputs – it is a configurable computer, not a programmable computer. There's nothing wrong with that; there is a place for it. But it doesn't let you benefit from the speed of software innovation. The miracle of the CPU cannot be underestimated. There is a very simple reason why CPUs have remained CPUs and have outlasted all those things that had to be configured onto PC motherboards over the years: CPUs are programmable. The genius of the software engineer can be realized on a CPU. If you fix everything in the chip, you cut off the talent of the software engineer.

Nvidia has found a way to benefit from both: a very specialized form of computing – parallel computing, a computing model based on massive thread streams and latency tolerance, with some unique properties of Nvidia processors that make them very efficient – while on the other hand remaining programmable. If you've noticed, this one architecture has persisted through all the other networks, ResNets, RNNs and the earlier models, and finally Transformers. There are many types of Transformers, and now there is a convergence between Transformers and state-space models, and in the way people handle context and memory, and these architectures are changing like crazy.

So the important thing is that we can make every model work well. This is an important observation. It's really great that somebody can do that for one model, but in the end I think AI is not a chip problem, it's a software problem, and chips exist to enable the software. Our job is to enable the invention of the next ChatGPT, and if that turned out to be just Llama 70B, I would be very, very surprised.

Reporter: Continuing from your answer about software – a big part of what you announced yesterday was about software and NIMs. From the announcements you made yesterday, where are the biggest growth opportunities for Nvidia? I have a feeling these microservices are going to be your next big thing. On the second part of the software question, you've said in a couple of interviews that because in the future things can be done just by talking, no one needs to code anymore. Are you suggesting that people shouldn't learn those skills?

Jensen Huang: On the second part of the question, first of all, I think people should learn all kinds of skills. Playing the violin looks really hard, and so does juggling, and so do mathematics, algebra, calculus, and differential equations. I think people should learn as many skills as they can. My point is just that programming is no longer necessary for you to be a successful person. There was a time when many great people in the world advocated that everyone had to learn to code or you would be useless. I think that's wrong. It's not a person's job to learn C++ in order to be useful; that's the computer's job.

That's what I want to say. And one thing I think is overlooked is that I believe AI's greatest contribution is bridging the technology divide. You don't have to be a C++ programmer to succeed. Now you just need to be a prompt engineer, and who can't be a prompt engineer? When my wife talks to me, she's prompting me, and it works really well. I think we all need to learn how to prompt AI, but that's not so different from learning how to coach your teammates. Depending on the work you want done, the quality of results you seek, whether you are looking for more imagination or more specific results, you prompt a person in different ways.

In the future, you'll interact with AI the same way. You'll prompt it differently depending on the answer you want. Maybe you want a surprising answer at the beginning and then gradually make it more specific. Multi-turn prompts. So this way of working with computers is something everybody knows how to do, and I believe that's the first great thing AI does: it bridges the technology gap. Look at all the people on YouTube creating things with AI – they don't have to write any programs at all. So I guess that's my point, but if anyone wants to learn how to code, by all means do so, because we're hiring programmers.

For the first question, our near-term opportunity is the two types of data centers that are about to be built. One is modernizing general-purpose computing data centers into accelerated computing. The second is these generative AI data centers. That is a very, very big opportunity for us in the near term. While we're doing this, we want to help our customers make AI. There is the invention of AI: Llama 2, for example, is fantastic. Mistral is also great. There are a lot of others, right? GR is outstanding. There are many, many AIs being created, but these AIs are difficult for companies to use. They are algorithms; they exist in their raw form and are difficult to run. So we work with partners, and we take some of the most popular open-source models and turn them into production-quality, usable models.

But these available models, these pre-trained models, are not entirely useful on their own. You still need to tweak them, fine-tune them, constrain them, give them access to proprietary information, and so on. So for companies to be able to use AI, we still need a whole suite of services around it, which we call NeMo. Once we're done with that, you can run this software anywhere. So actually, we're not just going to invent AI, we're going to manufacture AI, if you will. By manufacturing these AIs and AI software, everyone can use them. In the enterprise, our software business runs at approximately one billion dollars per year. I think manufacturing AI is probably going to be a pretty big business.

Reporter: You mentioned the technology divide, and I suspect it's actually widening right now, because a lot of non-programmers – doctors, lawyers, managers, service providers and so on – don't fully understand what these machines are rolling out and what kind of disruption this will bring to their own businesses in the very near future. I wonder what advice you can give to someone who is fairly sure they know how their work is done today and assumes it will stay that way for years to come. And perhaps a few more specific words for my Israeli audience.

Jensen Huang: First of all, I have 3,300 employees in Israel. I have nearly 100 employees in Gaza and the West Bank. We have contractors in Gaza, and our hearts are with all of you. The first priority, of course, is to stay safe, and as a company we provide all possible support; we do our best. So, to everyone representing the company there, please stay safe.

To your first question: I observe that at GTC there are healthcare companies, drug discovery companies, financial services companies, manufacturing companies, industrial companies, consumer companies, advertising companies, automotive companies, transportation companies, logistics companies, and so on. I'm fairly sure they're all here because of AI. So the first thing to observe is that for most industries outside the computer industry, computing technology is secondary; their own industry comes first. But because AI makes computers so easy to use, we've effectively closed the technology divide for them. So if you're a healthcare provider, you have more opportunity than ever to use AI and computing to impact your own industry.

One piece of evidence is that the number of AI startups has grown dramatically in areas such as healthcare, not just in the computer industry. So it's very clear that every industry recognizes the incredible power of AI and has the ability to take advantage of it. I think that's definitely happening.

Visual China file photo

Reporter: You talked about a lot of using generative AI and simulation to train robots at scale, but there are a lot of things that we can't simulate, especially when we start asking robots to perform more tasks in an unstructured environment. What do you think are the limitations of training a bot in a simulation, and what do we do when we start hitting those limitations?

Jensen Huang: There are a few different ways to think about this. First, consider your problem in the context of a large language model, remembering that large language models operate in a completely unconstrained world. It's an unstructured world, and that's part of the problem. But think about it: it learns from a lot of text. The ability of these large language models, these foundation models, to generalize is the secret of their magic. They generalize, and then they pick up context from a few examples. Maybe in your prompt you tell it you're in the kitchen, you're about to make an omelet, you specify the problem, you specify the context: these are the only tools you can use, you don't have butter, you're standing here, and everything you have is in the fridge. You describe the context just as you would when interacting with a large language model, and the robot should be able to generalize well enough if you apply some of the ChatGPT tricks you've already seen.

That's what I mean when I say the ChatGPT moment for robotics could be just around the corner. There's a lot of great science still to solve, but you can see the extension of it: this robot can generate tokens. Do those tokens stand for words or for motion? The software doesn't know the difference; a token is just a token. So you have to tokenize motion. What number does each motion get? Computer scientists will figure that out. Once they've tokenized all of these motions, the models will generalize and contextualize them, just as they do with words. The last part is embodiment. The embodied part is reinforcement learning; in ChatGPT it's human feedback, where you give it a lot of examples, questions and answers – appropriate answers in philosophy, chemistry, and mathematics, very well-crafted, humanly appropriate questions and answers. Some of them run to pages; there are thousands of them, and many more examples were presented to the large language models that became ChatGPT. The work they do is really hard.

Here's the human equivalent. I show you how to make coffee – that's a very clear demonstration – and then the robot says, oh, I see, let me generalize: you mean that if I move this over here a little bit, it's still the same activity, making coffee. So I'm using exactly the same analogy. Can you see these two similar paths? In fact, what you're seeing in ChatGPT – now, with my explanation, you can pretty much see it. The only reason we couldn't see it before is that, somehow, we can't separate words from robot actions in our brains. That's the only reason, the only obstacle. But to a computer, they're all just numbers. And then you go, wow, that's funny. It's doable.

Reporter: On the issue of hallucinations, I wonder how you think about it, especially in a mission-critical matter like health care, where you have to be 100% right. Is that solvable? What do you think?

Jensen Huang: Yes, I'm very grateful for that question. Hallucination is very solvable – not that it solves itself, but it is easily addressed. You require that for every answer, the model first has to look the answer up. This is called retrieval-augmented generation. There are still some weaknesses in retrieval plus generation, but the basic concept makes sense. So the model can't make up an answer: for a web query, it should search first, and then from the search results the AI reads the answer. Don't make it up; simply read the answers from the web, and from what you've read, prioritize the ones that best answer my question, the most accurate and most truthful. Maybe it knows something about a particular website, or it spots something wrong in a description, so it rejects that answer, finds the answer that makes the most sense, and then describes it to you.

Actually, if the answer is really important to you, this AI shouldn't tell you anything that isn't true. It first does the research, determines which answer is best, and then summarizes it for you. It does the studying. So now you don't just have a chatbot; you effectively have a research assistant doing a summary for you. Depending on how critical the information is, I might insist that you always do the research before answering me. It's not a big deal. For example, if I'm just curious, if I know the answer is common knowledge and I'm just not sure of the exact details – say, what temperature should hot tea be? I'm not sure. If you're not sure, go check first.

Reporter: I'm wondering how you estimate computing needs when building a platform like Blackwell – or do you simply add computing power toward infinity as fast as possible? If it's the latter, how do you think about it from a power and sustainability perspective?

Jensen Huang: The answer is very simple. We have to figure out where the physical limits are and push as far as we can toward them, and then find ways past them. So how do we go beyond the limits of physics? The way to go beyond them is to make things more energy efficient, so that is the first thing we do. The example I showed yesterday: it takes about 90 days to train GPT-4. With Hopper, it takes 8,000 GPUs 90 days to train it; with Blackwell, only 2,000 GPUs and 4 megawatts – 11 megawatts less – for the same amount of time. So we built Blackwell, and with more energy efficiency we can push the limits. Energy efficiency and cost efficiency are top priorities. I also demonstrated that it can generate tokens for large language models up to 30 times faster. In other words, we've made it 30 times faster, which means we save a lot of energy while doing so; it takes far less energy to produce the same token. Energy efficiency and cost efficiency are actually at the heart of everything we do – that is primary.
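
Using only the figures Huang cites (15 megawatts for 90 days on 8,000 Hopper GPUs versus 4 megawatts for 90 days on 2,000 Blackwell GPUs), the energy comparison works out as follows; the script below is just that arithmetic, not an NVIDIA benchmark.

```python
# Energy comparison using the figures quoted in the keynote and this interview.
HOURS = 90 * 24                       # a 90-day training run

hopper_mw, hopper_gpus = 15, 8000
blackwell_mw, blackwell_gpus = 4, 2000

hopper_mwh = hopper_mw * HOURS        # 32,400 MWh
blackwell_mwh = blackwell_mw * HOURS  # 8,640 MWh

print(f"Hopper:    {hopper_mwh:,} MWh on {hopper_gpus} GPUs")
print(f"Blackwell: {blackwell_mwh:,} MWh on {blackwell_gpus} GPUs")
print(f"Saved:     {hopper_mwh - blackwell_mwh:,} MWh "
      f"({hopper_mw - blackwell_mw} MW less over the same 90 days)")
```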

Visual China file photo

Reporter: You mentioned that a lot of industries have the potential to experience a ChatGPT-style moment, and obviously, you're trying to make that happen in many industries. Can you pick out an industry that you think will be the first big breakthrough that really excites you?

Jensen Huang: There are a lot of examples. Some excite me for technical reasons, some because of the thrill of first contact, and some because of their impact. Okay, let me give you some examples.

I'm very excited about Sora. I think the work OpenAI has done with Sora is extraordinary. We saw the same capability last year from Wayve, a self-driving car company, and you can find examples of work we did almost two years ago on generating video from text. To generate a plausible video, the model has to have a perceptual understanding of physics: when you put down a cup, it sits on top of the table, not in the middle of it; a walking person is on the ground, their feet not sinking into it. So it has a perceptual understanding of physics. It doesn't explicitly solve the laws of physics, but it has a perceptual understanding of them. It is a model for understanding the world.

Second, I think the work we did with Earth-2 and CorrDiff has a huge impact. To predict the weather at a three-kilometer scale, you would need a supercomputer 25,000 times larger than the ones currently used for weather forecasting. That three-kilometer scale lets us predict the impact of extreme weather on local communities. Another benefit of what we've done is that we've made it 3,000 times more energy efficient and a thousand times faster, and by doing so we can predict a whole range of different possible paths for extreme weather, because weather is chaotic. You want to sample as much of it as you can: we can run 10,000 samples, not just one. As a result, our ability to get the correct answer, or the most likely answer, is greatly improved. So extreme weather forecasting, local and regional forecasting – I think it's very impactful work.

I also think the work on generating plausible, drug-like molecules against specific target proteins is ideal – basically discovering small-molecule drugs. We can put it into a reinforcement-learning loop like AlphaGo's, sit there generating all sorts of molecules, dock them against proteins, and use AI models to do that so that we don't have to do it on a supercomputer. We can explore huge spaces. That's very impactful. Some of the areas where the early indicators are very exciting include the robots we just talked about: if they can be made less fragile, the potential impact in general robotics is enormous. So these things that are happening are very exciting.

Reporter: I hope you can dig into your vision for drug discovery and proteins – structure prediction and, ultimately, molecular design. Also, how do these efforts interact with your other projects, such as quantum computing? Do you need to do more work on quantum to help support other projects, such as drug discovery?

Jensen Huang: I'll answer from back to front. You know, we're probably the biggest quantum computing company in the world, and we don't make quantum computers. We do this because we believe in quantum computing; we just don't see the need to build another quantum computer. And quantum computing is about more than just the quantum computer itself. When it arrives, the quantum computer will most likely be an accelerator, like a video accelerator: used for certain very specific things. Quantum computing will not be used for all computation; it is a very specific domain and will be connected to classical computers. So we created CUDA-Q, which is the CUDA programming model for quantum, a classical-quantum architecture. Second, we created cuQuantum, another "cu" library, which allows us to simulate quantum computers.

Today, we can simulate a quantum computer of 34 or 36 qubits, faster than an actual quantum computer. We can use it to simulate quantum circuits so that algorithm researchers can start working on quantum computing now. We can use it for post-quantum cryptography, to prepare the world for the arrival of quantum, by which time all data will need to be encrypted appropriately. So we can contribute to all of that. We work with the vast majority of the world's quantum computing companies, researchers, and quantum computer manufacturers. That said, we believe it will be quite a long time before quantum computing can contribute to scientific breakthroughs in digital biology.
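
To illustrate what "simulating a quantum computer" means, here is a toy state-vector simulator in plain Python/numpy. It is not the cuQuantum or CUDA-Q API, just the underlying idea: apply gate matrices to a vector of 2^n amplitudes. The exponential growth of that vector is why roughly 34 to 36 qubits is where classical simulation becomes impractical.

```python
# Toy state-vector simulator: an n-qubit state is a vector of 2**n complex
# amplitudes, and a single-qubit gate is applied by reshaping and contracting.
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)    # Hadamard gate

def apply_gate(state: np.ndarray, gate: np.ndarray, qubit: int, n: int) -> np.ndarray:
    """Apply a 2x2 gate to one qubit of an n-qubit state vector."""
    state = state.reshape([2] * n)
    state = np.moveaxis(state, qubit, 0)
    state = np.tensordot(gate, state, axes=([1], [0]))
    state = np.moveaxis(state, 0, qubit)
    return state.reshape(-1)

n = 3
state = np.zeros(2**n, dtype=complex)
state[0] = 1.0                                   # start in |000>
for q in range(n):
    state = apply_gate(state, H, q, n)           # build a uniform superposition

print(np.round(np.abs(state) ** 2, 3))           # equal probabilities: 1/8 each
# Memory grows as 2**n: at ~36 qubits the amplitude vector alone needs ~1 TB.
```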

Visual China file photo

In fact, our whole understanding of NIMs came from the work we did in digital biology with BioNeMo, which was pretty much our first NIM. The reason is that these models are amazing, but they are too difficult to use. So we started thinking about packaging them in a very special way so that all researchers could use them. BioNeMo is now being widely used, and I'm very proud of that. You send it a chemical-protein pair, and it tells you whether the binding energy is low enough. You send it a chemical and say, give me a bunch of examples like this, explore that drug-like space, and it creates a whole bunch of them. I mean, that's really cool.

Reporter: How do you think the tensions between the U.S. and China will affect Nvidia's production and sales? That is, if there's a problem where you make them, or where you can sell them, that's an obstacle that you can't control. So, how do you think this will affect NVIDIA's path forward?

Jensen Huang: Yes, we have to do two things at once. One is to make sure we understand the policies and follow them. The second is to do what we can to strengthen the resilience of our supply chains. As you know, we don't just make a chip. Take this Blackwell as an example: there are more than 35,000 parts, 8 of which come from TSMC; the rest do not. When we configure it as something like a DGX, it has 600,000 parts. These parts come from all over the world, and many of them are made in China. That's just the way it is – it's true for the automotive industry and for the defense industry as well. The world's supply chains are quite complex. I am genuinely convinced that the goals of the two countries are not confrontational. There are some fairness issues that need to be addressed, but I don't think the apocalyptic scenario is likely to happen, and we don't expect it to happen; of course, I hope it doesn't. What we can do is work on resilience and on compliance, and let others do their jobs.

Reporter: Can you talk a little bit about Nvidia's relationship with TSMC? How has that relationship evolved over the last few years as chip and packaging complexity has increased, especially with Blackwell and its dual-die design – how have they helped you achieve that design?

Jensen Huang: Our partnership with TSMC is probably the closest of all our partnerships, and that's understandable. What we do is very, very difficult, and they do it very, very well. We get compute dies, CPU dies, GPU dies, and CoWoS substrates, plus memory from Micron, SK Hynix, and Samsung, all assembled in Taiwan. So the supply chain is not simple; it requires the coordination of large companies who do this on our behalf. These large companies work together, and they realize there will be a need for much more of this CoWoS-style capacity in the future. So we solved all those problems together; the inter-company collaboration is actually very, very good. Once assembled, a third company needs to come and test it, and then a fourth company integrates it into a large system.

To build a supercomputer like this, you need a supercomputer to test it before it can go into a data center. Imagine the manufacturing floor as a huge data center. So there's a lot of complexity up and down the entire supply chain, and we're not building this chip alone. It's a miracle. When people ask me, what do you make? You make GPUs – I feel like they're imagining us making chips the way people make SoCs. But whenever someone says GPU, what I see in my mind is this: a rack like this, cables and switches, GPUs and a whole bunch of software. So TSMC is a very critical part of that.

Reporter: Nvidia is moving into the cloud business, while cloud providers are making their own chips. What are your thoughts on this trend? First, will the fact that big tech companies make their own chips have an impact on your long-term pricing strategy? Second, what is your cloud strategy? Do you have any plans to launch DGX Cloud in China, especially considering the Chinese market? If not, what solutions would you offer?

DGX Cloud, an AI supercomputing service released by NVIDIA in March 2023. Source: Nvidia's official website

Jensen Huang: First of all, we built HGX. We sell it to Dell, which puts it into a computer and sells that. Then we create software that runs on Dell computers, and we create market demand to drive sales of those computers, because Dell doesn't know as much about Nvidia's technology as we do. So we have to help Dell create demand, help Dell build these systems, and develop software for them. We have to do the same with cloud service providers. We work with them to integrate NVIDIA's cloud into their clouds. We're not a cloud computing company. Our cloud is called DGX Cloud, but it actually lives inside their clouds.

Our goal is the same as with Dell: to bring customers to their cloud, just as we bring customers to Dell's machines. So instead of HGX going to Dell, it's DGX Cloud going to CSPs like Azure. It's the same philosophy. That's why we develop software, educate developers, and create demand for the CSPs that use our architecture. It's not about anyone's chips; it's about NVIDIA's role as a computing platform company. A computing platform company must cultivate its own developers. That's why GTC exists: it's a developer conference. If we were an x86 company, why would we need a developer conference? Everyone in the world already uses x86; what would be the point? But our architecture is still being adopted, and using it is complex, so we have to hold a developer conference for it. DRAM doesn't need a developer conference, Ethernet doesn't need a developer conference, but computing platforms like ours do, because we need developers. Nvidia is everywhere. We're in every cloud, in every data center, and so on.

Reporter: You've said that artificial general intelligence (AGI) is going to come in five years, and Blackwell is so powerful, do you still stick to that timeline, or do you think it's going to speed up? If it does, do you have any concerns? I ask because you're obviously a modern-day Leonardo da Vinci, but you can also be a modern-day Oppenheimer.

Jensen Huang: Oppenheimer made a bomb; we don't do that. First, define AGI. I keep saying this, and I'm sure everyone is trying to define it. I want you to be specific about defining AGI so that each of us knows when we have arrived. For example, what is Santa Clara? It's geospatial, it's very specific, and you all know how to get there. Or defining the New Year: all of us know when the New Year arrives; even across time zones, we know it has come. AGI is a little different. If we define AGI as something very specific – a whole battery of tests: math tests, reading tests, reading comprehension tests, logic tests, medical exams, law exams, economics tests, whatever tests you name – and I say the definition of AGI is when a piece of software does well on that battery, meaning better than most people, say better than 80 percent of them, or better than almost all of them, then do you think computers will be able to do that within five years? The answer is probably yes.

So every time I answer this question, I specify my definition of AGI, but when it comes to the media reports, nobody specifies it. So it depends on what your goals are. My goal is to communicate with you, and your goal is to figure out what story you want to tell. I believe that AGI, as I define it, could be achieved within five years. As for the words "artificial general intelligence" themselves, I don't know whether we all agree on what they mean – that's why we have so many different words to describe one another's intelligence.

Reporter: I like your point of view that computer games are OG (original) generators. Based on that, you had a very prescient reference last year that in the future, every pixel will be generated, not rendered. How far do you think we're from a world where every pixel is generated at real-time frame rates, and what is your vision for both gaming and non-gaming experiences in this new paradigm?

Jensen Huang: I don't think a technology-change curve like this takes more than a decade. Once something becomes viable, it keeps getting better – and of course ChatGPT is not only viable, it is already better in most cases. I think we're less than 10 years away. In 10 years you're at the other end of that S-curve; five years from now you may be in the middle, with everything changing in real time and everybody saying, look, this is happening. So you just have to decide: are we already in the first two years of those 10 years? Maybe we are. So I would say somewhere in the next 5 to 10 years – call it eight years – that's pretty much for sure.

Reporter: The Nikkei hit an all-time high after your latest earnings report, so it's safe to say that a lot of eyes in the Japanese market are on Nvidia. You also met with Prime Minister Kishida in December to discuss expanding AI capabilities in Japan. Can you share what progress you're making on that, or your general outlook for the Japanese market and NVIDIA's products and business there?

Jensen Huang: I think Japan is highly aware of the importance of increasing productivity. We all know that when a company becomes more productive, earnings increase; when earnings increase, it hires more people. When an economy made up of many companies becomes more productive, the economy grows, employs more people, and delivers a higher quality of life. Japan, like many countries and companies, needs to increase productivity, and AI is the best way we know to increase a country's productivity. I think Japan understands that. Japan also understands that its data, language, and culture are very particular, and there is no reason to let third parties collect that data to create AI, re-import it into Japan, and then have the Japanese market pay for it – that makes no sense. You should build it yourself. So for reasons of sovereign AI, national productivity, and corporate productivity, I think AI will be very important in Japan. I find Japan's dynamism exhilarating – there is amazing energy – and all my business partners are very excited about this opportunity. I'm excited for you too.

Reporter: Does Nvidia plan to further expand its operations in Israel, open more centers, and acquire more companies? What will the future of Nvidia hold in Israel?

Jensen Huang: Israel is one of the countries where, per capita, Nvidia has the most employees. Israel is home to one of NVIDIA's largest sites, with 3,300 people, and to some of our most talented engineers. One of our most important technologies, NVSwitch, comes from Israel. The thing I keep telling you about, the heart and soul of Blackwell, comes from Israel. So we will continue to invest heavily in Israel; that region is very important to me. We are also hiring in the West Bank. We support all of our Palestinian employees in the West Bank and take care of them and their families. We won't withdraw our support, and we're not going to leave the West Bank – that is very important to us. Employees need to know that the company has their back. For them to do great work, they need to know that their foundation is strong. Nvidia is a strong company, our foundation is strong, and our support for the employees there is unmistakable. They can know that. They should know it.

Reporter: The Indian government recently pledged to buy 10,000 GPUs through a public-private partnership. Is Nvidia part of this plan? At the same time, India's AI computing power is currently less than 2% of the world's. How do you see this developing in the near future?

Jensen Huang: First of all, if India is buying GPUs for AI, I'd like to share my thoughts on that. I think the AI GPUs Nvidia makes are excellent, and if you can spread the word when you go back, please do – we really are very good at this. Second, we're very interested: if somebody wants to buy some GPUs, we'll be happy to do business. So I hope everyone spreads the news: Nvidia is open for business.

I think AI is a huge opportunity for India. In fact, when I go to India I usually have the chance to meet Prime Minister Narendra Modi, and it has been an extraordinary experience. He said to me, "Jensen, India should not export flour in order to import bread." Very reasonable. Why export raw materials in order to import value-added products? Why export data from India so you can import AI built on that data, refined and with value added to it?

Second, India has the largest number of IT professionals in the world, and there is no doubt they are retraining for AI. When I met with leaders in India, they understood very well that this is one of their biggest opportunities: to retrain themselves so they are no longer just the IT of the company's back office, but the IT of the company's front office, where they create value. AI is used in engineering, marketing, sales, finance, business operations, and go-to-market strategy – all of that is front office, not back office. India wants to move into the front end of IT, where the biggest market opportunity is, and I think you are absolutely going to do it. I'm excited for you.

October 20, 2021: an employee walks into TSMC's headquarters in Hsinchu. Visual China file photo

Reporter: I would like to ask about TSMC. We know TSMC can always ensure supply, but many companies still want more. Beyond HBM (high-bandwidth memory), what do you think of Samsung and SK hynix?

Jensen Huang: That's like asking me what I think of TSMC apart from their foundry business, or asking whether you like working with NVIDIA apart from the GPUs. You know, HBM memory is very complex and has very high added value. We spend a lot of money on HBM. We are in the process of qualifying Samsung's HBM. We're not there yet, but we will be.

Reporter: Have you asked Samsung to make chips? What do you think about this relationship?

Jensen Huang: Samsung is a very good partner. South Korea, as you know, produces most of the world's advanced memory. HBM is very complex – don't think of HBM as DDR5, not at all. It's a technological marvel, and that's why it's so fast. HBM runs more like logic than like plain DRAM, and it is becoming more and more complex. Those manufacturers are so humble that people underestimate what they do. HBM is a technological marvel. And here is the amazing thing: in all data centers, the DDR memory of the past will become the HBM memory of the future. The upgrade cycle for Samsung and SK hynix is incredible. As long as Nvidia keeps growing, they will grow with us, and the amount of memory we're going to replace in the world's data centers is huge.

Why is this so good? Because HBM memory is much more energy efficient. This is part of how we make the world more sustainable: we use more advanced memory that is faster but consumes very little power. It's very complicated. Yes, I value our partnerships with SK hynix and Samsung. They are outstanding.

It's not that only TSMC has a very close relationship with us; we should be close to all our partners, and our relationship with Samsung is very deep. Every car we build in the future is based on Samsung, and our commitment to the autonomous vehicle industry is very high, so our trust in our partners has to be very long-term. Samsung is an extraordinary company. Maybe because you live in the same city as Samsung, you forget how amazing they are; but from where I sit, Samsung is an extraordinary company, and SK hynix is extraordinary too. That's why they are world leaders in their fields.
