
Interview with the Woman Taking On NVIDIA: AI Chips Have No Moat, Choosing Openness Is Crucial

Focus:

• AMD CEO Lisa Su said that, with the exception of high-end GPUs, the current global chip market is in a fairly good overall balance of supply and demand.

• The Chips and Science Act is good for the U.S. semiconductor industry, but it will take many years to achieve real results.

• AMD is developing the MI300 AI chip, which will be competitive with the NVIDIA H100 on training workloads.

• There is no moat in AI chip development; choosing openness, especially open software, is crucial.

AMD CEO Lisa Su recently had an in-depth conversation with Nilay Patel, editor-in-chief of The Verge, at the Code Conference. They discussed many topics, in particular artificial intelligence and the chip supply chain.

In the past few years, the COVID-19 pandemic exacerbated a global chip shortage; now, as big tech companies race to run AI models, demand is suddenly surging again.

Lisa Su said that supply and demand are currently in a fairly good balance overall, with the exception of the high-end GPUs that power large AI models. She also said that the United States still has a long way to go to achieve self-sufficiency in chips. The following is a summary of the interview:

Q: I have many questions for you. Let's start with some exciting things. AMD made some news in the AI market today. What's going on?

Lisa Su: First of all, the theme of this year's Code Conference is artificial intelligence, which is also the theme of the technology industry these days. When we see all the opportunities where computing really drives AI, that's exactly what we're working towards. So this morning we did have an announcement with a startup called Lamini. We've been working with this great company, which has some top researchers on large language models.

When I talk to CEOs of various companies, all of them say, "I know I need to focus on AI. I know I need to do something. But what should I do? It's too complicated, and there are a lot of different factors." With foundation models like Llama, many enterprises actually want to customize these models with their own data, and make sure the application can do that in a private environment. Lamini does exactly that: it customizes and fine-tunes models for enterprises, and it runs on AMD GPUs. So it's a cool thing. We spent a lot of time working with them to really optimize the software and applications, to make it as easy as possible to develop models fine-tuned for the enterprise.

Q: I want to dig a little deeper into software. I think abstracting the different layers of software away from the hardware is very interesting. But let's get back to the topic. In the chip market, we are ending a period when chips at every process node were incredibly constrained. Where do you think we are now?

Lisa Su: This topic is very interesting. I've been in the semiconductor industry for at least 30 years. For a long time, people didn't really understand what semiconductors were, where they sat in the supply chain, or why applications needed them. In the last few years, especially with pandemic-driven demand and everything happening in AI, people are really starting to pay attention to semiconductors.

It's a very cyclical industry. There are cycles where we need far more chips than we can supply, and cycles where too many chips are produced. But at the end of the day, the truth is that semiconductors are essential to many applications. We focus in particular on the most complex, highest-performing, cutting-edge semiconductors, and that market is going to grow substantially.

Q: Where do you think the bottleneck is now? Is it at the cutting edge? Or on the older process nodes we heard about during the chip shortage?

Lisa Su: I think the industry as a whole has formed an ecosystem that has invested a lot of resources to make sure overall demand is met. So overall, I would say supply and demand are well balanced, with the exception of GPUs. The GPUs needed for large language model training and inference are currently in somewhat tight supply.

But the truth is, we're definitely putting in a huge effort to accelerate the entire supply chain. These are the most complex devices in the world, with over a hundred billion transistors and a host of advanced technologies. But overall supply is definitely increasing.

Q: The Chips and Science Act, passed last year, directed a lot of investment toward U.S. semiconductor fabs. AMD is, of course, one of the world's largest fabless semiconductor companies. Has the Chips and Science Act already had a tangible effect, or do we still need to wait for it to bear fruit?

Lisa Su: If you look at the Chips and Science Act and what it does for the U.S. semiconductor industry, it really is remarkable. I have to say I salute Gina Raimondo, the U.S. Department of Commerce, and the industry for all they have done. These things take a long time; building out the U.S. semiconductor ecosystem is something that needed to start five years ago. It's expanding now, especially at the leading edge, but it will take some time.

So I don't know that we're feeling the impact yet. But we've always believed that the more long-term investment there is, the more impact will come. So I'm excited about the production capacity coming to the U.S. I'm also very excited about the investment in our nation's research infrastructure, because that is also extremely important for long-term semiconductor strength and leadership.

Q: AMD's financial numbers are proof of this. AMD is currently selling many more chips than it was a few years ago. Where did you find the supply? Is AMD still relying on TSMC while waiting for new semiconductor fabs to emerge?

Lisa Su: The business we are in pushes the frontiers of technology. As a result, we are always at the most advanced node, working towards the next big innovation. It's a combination of process technology, manufacturing, design, and design systems. We are very pleased with our partnership with TSMC. TSMC is the best in the world, with advanced, leading-edge technology.

Q: Could AMD move away from TSMC?

Lisa Su: I think the key is geographic diversity. No matter what happens, geographic diversity matters; no business wants all of its risk concentrated in one place. This is where the Chips and Science Act actually helps, because a number of semiconductor fabs are now being built in the United States. They will actually start production in the coming quarters, and we are actively planning to do some manufacturing in the U.S.

Q: I spoke with Intel CEO Pat Gelsinger when he attended the groundbreaking ceremony for the Ohio plant. Intel is trying to become a semiconductor foundry. He said to me very confidently, "I want to have the AMD logo on the side of one of the semiconductor factories." How far is he from achieving this goal?

Lisa Su: From the perspective of domestic U.S. manufacturing, we are indeed looking at many, many opportunities. I think Intel's CEO has a very ambitious plan, and that's the goal. We're always looking at who the best manufacturing partners are, and for us the most important thing is a partner that is really committed to cutting-edge technology.

Q: Are there any competitors to TSMC in this regard?

Lisa Su: There is always competition in the market. TSMC is certainly very good. Samsung Electronics is certainly also investing heavily. You mentioned Intel. And I think Japan is also working to cultivate advanced manufacturing. So there are a lot of different options.

Q: You mentioned that GPUs are currently in short supply. In the case of the Nvidia H100, there is actually a black market for these chips. AMD has GPUs and is preparing to launch new ones. You also mentioned earlier that Lamini trains entirely on AMD GPUs. With NVIDIA's GPU supply limited, does AMD see an opportunity to disrupt the market?

Lisa Su: I want to take a step back and talk about the incredible things happening in the AI market. If you think about the technology trends of the last decade or two, whether the internet, the cell phone revolution, or how PCs changed things, AI is ten or even a hundred times bigger in terms of its impact on everything we do.

So from a productivity perspective, whether enterprise, personal, or social productivity, the impact of artificial intelligence will be huge. I don't think the GPU shortage is surprising, because people have recognized the importance of this technology. Right now we're in the early days of how AI, and generative AI in particular, comes to market, and I think we're talking about a ten-year cycle, not how many GPUs you can get in the next two to four quarters.

We're excited about our roadmap. I call generative artificial intelligence the killer application of high-performance computing. You need more and more compute. Today's large language models are already good, but they can still get better as training performance and inference performance continue to improve.

That's what we do: we make the most complex chips. We do have a new product coming out; its codename is MI300, and it's going to be great. It targets large language model training as well as large language model inference. Do we see an opportunity? Yes, we see great opportunities, and not just one. The idea that cloud service providers are the only users is incorrect. There will be a lot of enterprise AI, and many startups have also received huge venture capital backing for AI. So we see opportunities in all of these areas.

Q: Is the name of the new product MI300?

Lisa Su: Yes.

Q: In terms of performance, will it be on par with the NVIDIA H100, or will it exceed the H100?

Lisa Su: On training workloads it will certainly be competitive, and there is no one-size-fits-all chip in the AI market. Some chips excel at training and some excel at inference, depending on how you put them together.

What we did with the MI300 was build a great inference product, especially for large language model inference. Right now, most companies are training and deciding what their models will be. But going forward, we actually think the inference market will be much bigger, and that is exactly what we designed the MI300 for.

Q: If you look at what Wall Street thinks Nvidia's moat is, it's CUDA: a proprietary software stack and long-term relationships with developers. AMD's ROCm approach is slightly different. Do you think that moat can be overcome with a better product or a more open approach? How will AMD attack it?

Lisa Su: When the market is developing this fast, I don't believe there is a moat. Moats belong to more mature markets, where people don't really want to change much. When you look at generative AI, it's evolving at an incredible pace. Progress we now make in a few months might have taken years in a conventional development environment. On software especially, we chose openness.

There is actually a dichotomy. If you look at the people who developed software over the last five, seven, or eight years, they tended to use, let's call it, more hardware-specific software. It is very convenient, and there weren't many options at the time, so it was a smart choice. But looking to the future, everyone is seeking the ability to build hardware-agnostic software, because people want choice. Frankly, people want choice. People want to use their existing infrastructure. People want to make sure they can move from one type of infrastructure to another. So they're building these higher-level layers. The open-source Python machine learning framework PyTorch, for example, is largely hardware-agnostic.

I do think the next decade will be different from the last because of how fast AI is evolving. I think we're seeing that across the industry and the ecosystem. The good thing about an open approach is that no single company has all the ideas. So the more we can bring the ecosystem together, the more we can leverage all the very smart developers who want to accelerate AI.

Q: The future of PyTorch is huge, right? It's the framework all these models are actually written in. I've talked to CEOs of some cloud vendors, and they don't like relying on Nvidia any more than anyone likes relying on a single vendor. Can you go to these cloud vendors and say, "We're going to optimize our chips for PyTorch instead of CUDA," so developers can run on PyTorch and choose whichever chip is best optimized?

Lisa Su: Exactly. If you think about what PyTorch is trying to do, it really is trying to be that hardware-agnostic layer. We hit a major milestone with PyTorch 2.0, with AMD supported on day one. That means anyone running CUDA on PyTorch right now can run it on AMD out of the box. Frankly, it works on other hardware as well.
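The "hardware-agnostic layer" point is visible in everyday PyTorch code. Because AMD's ROCm builds of PyTorch expose the same `torch.cuda` interface as NVIDIA builds, the standard device-selection idiom needs no vendor-specific branches. A minimal sketch (a toy model, not any production workload):

```python
import torch

# PyTorch's ROCm builds reuse the torch.cuda namespace, so this standard
# device-selection idiom runs unchanged on NVIDIA (CUDA) and AMD (ROCm) GPUs,
# and falls back to the CPU when no GPU is present.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = torch.nn.Linear(16, 4).to(device)  # tiny stand-in for a real model
x = torch.randn(8, 16, device=device)      # a batch of 8 input vectors
y = model(x)                               # forward pass on whichever hardware was found
print(y.shape)                             # torch.Size([8, 4])
```

The same source file runs on either vendor's GPU; the vendor-specific work lives below this line, in the PyTorch backend rather than in user code.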

But our goal is to let the best chips win. The way to do that is to make the software more seamless. It could be PyTorch, it could be JAX, it could be tools like Triton, which OpenAI introduced. A lot of people are also doing the "build your own" type of thing. So I do think this is the coming wave of AI software.

Q: Is AMD developing custom chips for these companies?

Lisa Su: We have the ability to develop custom chips. The right time for a custom chip is when an application reaches very high volume, so I believe there will be custom chips in the next few years. Another interesting thing is that you need a variety of different types of AI engines. We spend a lot of time talking about large GPUs because that is what large language models need, but you'll also see ASICs, and you'll see AI in client chips. So I'm very excited about how AI will be broadly integrated into chips across all market segments.

Q: We're talking about things that push the cost curve way up: a lot of smart people doing a lot of work on cutting-edge process nodes to build really high-end GPUs. Everything is getting more expensive, and you can see it in how much consumer apps cost: $25 per month here, Microsoft 365 Copilot priced at $30 per user per month there. When does the cost curve come down to consumer prices?

Lisa Su: That's a very good question. I believe the productivity value of AI will definitely be proven. Admittedly, the cost of this infrastructure is high right now, but on the other hand the productivity gains for users are also exciting. We're deploying AI inside AMD, and it's a high priority, because if we can design chips faster, that translates into tremendous productivity.

Q: Do you trust AI? Will you ask employees to check the work that AI is doing, or trust it outright?

Lisa Su: We're all experimenting, right? We're in the early stages of building the tools and infrastructure for deployment. But the truth is that AI saves us time, whether we're designing chips, testing chips, or validating chips, and in our world, time is money.

But back to your question about when the cost curve comes down. I think that's why it's so important to think about AI broadly, not just in the cloud. If you think about what the ecosystem will look like in a few years, you'll have cloud infrastructure training the largest foundation models, but you'll also have edge AI: models running locally on a computer or a phone. Local AI is cheaper, faster, and, frankly, more private. So that's the idea of AI everywhere, and how it can really enhance the way we deploy it today.

Q: How should we regulate AI?

Lisa Su: I think it's something we all take very seriously. From a productivity and discovery perspective, this technology has great advantages, but AI also has safety issues. I do think that, as large companies, we have a responsibility. We have to consider things like data privacy, and when developing these models, make sure to the best of our ability that they are not developed with too much bias. We will make mistakes; the industry won't be perfect here. But the importance of AI is clear, and we need to work together, through public-private partnerships, to make that happen.

Q: Would chips themselves be restricted under regulation?

Lisa Su: Yes, and I would welcome the opportunity to look at what safeguards need to be put in place.

Q: I think that's going to be the most complicated part... I don't think we expect chips themselves to limit what we can do, but it feels like a question we have to ask and answer.

Lisa Su: Again, it's not the chip itself, because chips in general have a wide range of uses. The question concerns the chip plus the software and the model; especially for models, it's about what safety measures you have put in place.

Q: AMD's chips are already in the PS5 and Xbox. Some believe cloud gaming is the future of everything. That could be fine for AMD, whose chips would also be in those data centers. But do you see this shift happening? Is it real, or are we still in the console era?

Lisa Su: Gaming is everywhere. People keep asking: is this the end of console gaming? I don't see it. PC gaming is strong, console gaming is strong, and I think cloud gaming is strong too. They all require similar types of technology, though they obviously use it in different ways. (Mowgli)