
Zuckerberg's latest 20,000-word interview: Llama 3, the "most powerful open source model" worth tens of billions of dollars, and everything behind it

Author: Wall Street Sights

On April 18, Meta launched Llama 3, calling it "the most capable open source model so far." The release once again reshaped the competitive landscape of large AI models and electrified the AI community. On the same day, Meta CEO Mark Zuckerberg and well-known tech podcast host Dwarkesh Patel released an exclusive interview. The 80-minute conversation focuses on Llama 3, artificial general intelligence (AGI), energy constraints, AI safety, and the risks and implications of open source. Zuckerberg said that AI has become the core of Meta, that Meta AI is now the smartest AI assistant available for free, and that the upcoming large version of Llama 3 will have more than 400 billion parameters. On model training and development, Zuckerberg noted that Llama 3 confirms the importance of large-scale data and computing resources for AI models, and that training large models in the future may run into capital and energy constraints. He emphasized that AI is not an attempt to replace humans, but a way to give people more powerful tools to accomplish more challenging tasks.

  • The smallest Llama 3, with 8 billion parameters, performs on the same order of magnitude as the largest previous-generation model, the 70-billion-parameter Llama 2, while the most powerful 405-billion-parameter version is still on the way.
  • The advent of Llama 3 confirms the importance of large-scale data and computational resources for AI models. AI is moving from a "question answering" tool toward a broader "reasoning" system that must understand the context of a problem, integrate knowledge from multiple domains, and apply logical reasoning to reach conclusions.
  • Multimodality is an area of focus for Meta, with particular attention to emotional understanding. If AI can make a breakthrough here and truly understand and express emotions, interaction between humans and machines will become more natural and deeper than ever.
  • AI will change the way humans work and is expected to significantly improve programmer productivity, but it is not coming to replace humans; rather, it gives people tools to take on tasks that were previously unimaginable.
  • Like the arrival of the computer, AI will fundamentally change human life, enabling many new applications that were previously impossible, and reasoning will profoundly change almost every product.
  • Before AI development is stopped by GPU supply or funding, it will run into energy problems first; if humans can solve the energy problem, it is entirely possible to build computing clusters far larger than today's.
  • There will be a Meta AI universal assistant product in the future, every business will want an AI that represents its interests, and AI will advance science, healthcare, and many other fields, ultimately touching every part of the product landscape and the economy.
  • If AI becomes overly centralized, the potential risk may be no smaller than that of wide diffusion: if one institution has AI more powerful than everyone else's, is that also a bad thing?
  • There are many directions model training could take, and commoditization is one of them. Commoditization means that as more options appear on the market, the cost of training falls sharply and becomes far more accessible.
  • Existential risk does deserve attention, but for now the greater concern is content risk: the model being used to create violence, fraud, or other harm to people.
  • Open source is becoming a new and powerful way to build large models. While specific products evolve, emerge, and disappear over time, their contributions to human society are long-lasting.
  • Meta may soon train large models on its own chips, but Llama 4 will probably not be the first to be trained that way.

Here is the full text of the interview:


Llama 3's top-of-the-line version is still in training

Dwarkesh Patel: Mark, welcome to the podcast.

Mark Zuckerberg: Thanks for having me. I'm a big fan of your podcast.

Dwarkesh Patel: Thank you, that's kind. Let's start with the products being released alongside this interview. Can you tell me about the latest developments in Meta AI and the models? What's exciting?

Mark Zuckerberg: I think what most people will notice first is the new version of Meta AI. The most important thing we're doing is upgrading the model: we're releasing Llama 3. We're making it available to the developer community as open source, and it will also power Meta AI. There's a lot to say about Llama 3, but the most important point is that we now consider Meta AI the smartest AI assistant people can use for free, and we've integrated Google and Bing for real-time knowledge. We're making it more prominent in our apps: at the top of Facebook and Messenger, you can use the search box directly to ask questions. We've also added some creation features that I think are really cool and that people will love. Animation is a good example: you can take basically any image and make it move. One thing people will find amazing is that it now generates high-quality images so quickly that they update in real time as you type. You type your query, like "show me a picture of a cow standing in a field with mountains in the background, eating macadamia nuts and drinking beer," and it updates the image in real time. It's really cool, and I think people will enjoy it. That's the part most people will actually experience. We're rolling it out, not everywhere at once, but starting in a handful of countries and expanding over the coming weeks and months. It's going to be a big deal, and I'm really excited to get it into people's hands. It's a big step forward for Meta AI.

But if you want to get under the hood, Llama 3 is the most technically interesting part. We've trained three versions: dense models with 8 billion, 70 billion, and 405 billion parameters. The 405-billion model is still training, so we're not releasing it today. But I'm very excited about how the 8-billion and 70-billion versions turned out; they're leading for their scale. We'll publish a blog post with all the benchmark results so people can see for themselves, and of course it's open source, so people get a chance to try it out. We have a roadmap of new releases that will bring multimodality, more multilinguality, and larger context windows. Hopefully later this year we'll ship the 405-billion-parameter version. As far as its training has gotten so far, it's already around 85 on MMLU, and we expect it to lead on many benchmarks. The 70-billion model is also fantastic, and we're releasing it today. It scores about 82 on MMLU and has leading scores in math and reasoning. I think getting it into people's hands will be really cool.

Dwarkesh Patel: Interesting, this is the first time I've heard MMLU quoted as the headline benchmark. That's impressive.

Mark Zuckerberg: The 8-billion-parameter version is nearly as powerful as the largest Llama 2 we released. So the smallest Llama 3 is basically as capable as the largest Llama 2.

Dwarkesh Patel: Before we dig into these models, I want to go back in time.
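Since the 8-billion and 70-billion weights are openly released, they can be run with standard tooling. A minimal sketch using the Hugging Face transformers library; the model ID matches the public release, but treat the exact identifier and hardware assumptions as just that, assumptions:

```python
# Minimal sketch: run the openly released Llama 3 8B Instruct model.
# Assumes the Hugging Face model ID "meta-llama/Meta-Llama-3-8B-Instruct",
# that you have accepted the license, and that a large-enough GPU is available.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Explain the MMLU benchmark in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128, do_sample=False)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```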
Dwarkesh Patel: I'm guessing you started acquiring these H100s back in 2022, or you can tell me exactly when. At the time, the stock price was taking a heavy hit, and people were asking what all this capital expenditure was for. They didn't buy the metaverse. I assume you were spending that capex to buy H100s. How did you know to buy them? How did you know you'd need the GPUs?

Mark Zuckerberg: I think it was because we were working on Reels. We always want enough compute capacity to build something we can't yet see on the horizon. While building Reels, we hit a point where we needed more GPUs to train the models. It was a major evolution of our services: we were no longer just ranking content from the people and Pages you follow, we started strongly recommending what we call unconnected content, content from people and Pages you don't follow. The pool of content candidates we might show you scaled from thousands to millions, and that requires a completely different infrastructure. We started building it, but we were constrained on infrastructure and couldn't catch up with TikTok at the pace we wanted. I basically looked at it and thought, "Hey, we have to make sure we never get into this situation again. So let's order enough GPUs to do what we need on Reels, content ranking, and feeds. And then let's double that." Again, our general principle is that there will always be something in the future we can't see yet.
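Scaling the candidate pool from thousands to millions is what forces the infrastructure change Zuckerberg describes: you can no longer score every item with a heavy ranking model, so a common industry pattern is to retrieve a short list with approximate nearest-neighbor search over embeddings and only then apply the expensive ranker. A minimal sketch of that retrieval stage with FAISS; the two-stage split is a generic pattern, not a description of Meta's actual system, and the embeddings here are random stand-ins:

```python
# Sketch of the retrieval stage in a two-stage recommender:
# embed a large candidate pool once, then fetch the top-K nearest items
# per user embedding before running a heavy second-stage ranking model.
# Illustrative only; not Meta's actual architecture.
import numpy as np
import faiss

dim = 64
num_candidates = 100_000  # imagine millions in production

# Pretend these came from a trained item-embedding model.
item_embeddings = np.random.rand(num_candidates, dim).astype("float32")
faiss.normalize_L2(item_embeddings)

index = faiss.IndexFlatIP(dim)  # inner product ~ cosine after normalization
index.add(item_embeddings)

# One user embedding; retrieve 500 candidates for the expensive ranker.
user = np.random.rand(1, dim).astype("float32")
faiss.normalize_L2(user)
scores, candidate_ids = index.search(user, 500)
print(candidate_ids[0][:10])  # IDs handed off to the second-stage ranker
```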


The road to AGI

Dwarkesh Patel: Did you know it was going to be AI?

Mark Zuckerberg: We thought it would have something to do with training large models. At the time, I thought it would probably have something to do with content. It's just the pattern-matching of running a company: there's always another turn of the crank. At the time, I was mired in trying to get the recommendation systems for Reels and other content to work well. It was a huge unlock for Instagram and Facebook to be able to show people interesting content from people they don't even follow. In hindsight, the decision turned out very well, and it came from being behind. It wasn't "oh, I was so far ahead of the curve." Honestly, most of the decisions that look good later happen because we messed something up earlier and simply didn't want to repeat the mistake.

Dwarkesh Patel: This is a complete detour, but I'd like to ask it now; we'll come back to AI in a moment. You didn't sell for $1 billion in 2006, but presumably there was some price at which you would have sold, right? Did you ever think, "What is Facebook actually worth, and is the price they're offering unreasonable"? If they had bid $5 trillion, of course you would have sold. So how did you weigh that choice at the time?

Mark Zuckerberg: I think some things are just personal. I don't know that I was shrewd enough back then to do that kind of analysis. People all around me were making arguments for the $1 billion, like, "We'd need to generate this much revenue, we'd need to get this big, and that's clearly many years away. It was far beyond our scale at the time." I didn't really have the financial expertise to engage in that kind of debate. Deep down, I believed in what we were doing. I did do some analysis: "What would I do if I weren't doing this? Well, I really like building things, and I like helping people communicate. I love understanding what's going on with people and interacting with them. So if I sold the company, I'd probably just go build another company like it, and I kind of like this one. So why bother?" I think the biggest bets people make are often just based on conviction and values. In fact, it's usually very difficult to do the forward-looking analysis.

Mark Zuckerberg: I don't know exactly what the timeline is. I think these things will progress gradually over time.

Dwarkesh Patel: But eventually: Llama-10.

Mark Zuckerberg: I think there's a lot packed into that question. I'm not sure whether we're replacing people, or giving people tools to do much more.

Dwarkesh Patel: With Llama-10, will the programmers in this building be 10 times more productive?

Mark Zuckerberg: I'd hope more than 10 times. I don't think there's a single threshold of intelligence for humanity, because people have different skills. I think at some point AI will probably outperform humans at most things, depending on how powerful the models are. But I think it's gradual, and I don't think AGI is one single thing; you're basically adding different capabilities. Multimodality is a key one we're focused on now, initially with photos, images, and text, and eventually video. And because we're so focused on the metaverse, 3D-type things matter too. One modality I pay a lot of attention to, which I don't see many others in the industry focusing on, is emotional understanding. So much of the human brain is dedicated just to understanding people, understanding expressions and emotions.

I think that is a complete modality in its own right. If AI can truly understand and express emotions, interaction between humans and machines will become more natural and deeper than ever. So beyond the big improvements in reasoning and memory, there are many distinct capabilities you want to train models for, and memory is a whole thing in itself. I don't think in the future we'll mostly be cramming everything into a query's context window to ask more complex questions. There will be different memory stores, and different custom models that are more personalized. These are all distinct capabilities. And then, obviously, there's making models bigger and smaller; we care about both. If you're running something like Meta AI, that's very server-based. But we also want it running on smart glasses, and there isn't much room in smart glasses, so you need something very efficient for that.

Dwarkesh Patel: If you're doing industrial-scale inference, worth tens of billions of dollars, or eventually hundreds of billions, what are the use cases? Simulations? AIs in the metaverse? What will we use the data centers for?

Mark Zuckerberg: Our bet is that it will basically change all of the products. I think there will be a Meta AI general assistant product. It will go from something more like a chatbot, where you ask a question and it formulates an answer, to something you give more complex tasks to, and it goes away and completes them. That takes a lot of reasoning, and a lot of compute and other resources too. Then I think interacting with other agents, for businesses and creators, will be a big part of what we do. A big theory of mine is that there won't be just one single AI you interact with; every business will want an AI that represents its interests. They're not going to want to engage with you primarily through an AI that will sell their competitors' products. I think creators will be a big group. There are about 200 million creators on our platforms. They basically all have the same pattern: they want to engage their community, but they're limited by time; their community wants to engage them, but the creator is limited by the hours in a day. If you could build something that lets a creator essentially own an AI, train it the way they want, and have it engage their community, that would be very powerful too, and there will be a ton of engagement across all these things. And those are just the consumer use cases. My wife and I run our foundation, the Chan Zuckerberg Initiative. We do a lot of work on the science side, and obviously there's a lot of AI work that will advance science, healthcare, and all of that. So ultimately it touches essentially every area of products and the economy.

Dwarkesh Patel: You mentioned AI that can do multi-step things for you. Is that a much bigger model? For example, will there still be a 70-billion-parameter version of Llama 4 that you just train on the right data and it becomes very powerful? What does the progress look like? Is it scaling up? Or, as you said, the same size but different data?

Mark Zuckerberg: I don't know that we know the answer to that question.
I think one thing that seems to be a pattern is that you have the base Llama model, and then you build some other application-specific code around it. Some of that is fine-tuning for a use case, but some of it is, for example, the logic for how Meta AI should use tools like Google or Bing to bring in real-time knowledge. That isn't part of the base Llama model. With Llama 2, we had some of this, and it was more hand-engineered. Part of our goal for Llama 3 was to bring more of it into the model itself. With Llama 3, as we start getting into more agent-like behaviors, I think some of that will again be more hand-engineered, and our goal for Llama 4 will be to bring more of it into the model. At each step, you get a feel for what's about to be possible on the horizon; you start fiddling with it and hacking around it. I think that hones your intuition about what you want to try to train into the next version of the model. That makes it more general, because anything you hand-code can unlock some use cases, but it's inherently brittle and non-general.

Dwarkesh Patel: When you say "into the model itself," do you mean training the model on the thing you want it to do? What does "into the model itself" mean?

Mark Zuckerberg: With Llama 2, tool use was very specific, whereas Llama 3 is much better at it. We don't have to hand-code everything to get it to use Google and run a search; it can just do it. The same goes for coding and running code and a lot of things like that. Once you get that capability, you get a glimpse of what we can start doing next. We don't have to wait for Llama 4 to start building those capabilities, so we can start hacking around it. You do a lot of hand-coding, at least in the interim, which makes the products better, and then it points the way to what we want to build into the next version of the model.

Dwarkesh Patel: Which community fine-tunes of Llama 3 are you most excited about? Maybe not the one most useful to you, but the one you'd most enjoy playing with. They fine-tuned it on antiquity, and you'd be talking to Virgil or something. What are you interested in?

Mark Zuckerberg: I think the nature of this stuff is that you get surprised. Any specific thing I thought was valuable, we'd probably be building ourselves. I think you'll get distilled versions. I think you'll get smaller versions. One thing is, I don't think 8 billion parameters is small enough for a lot of the use cases out there. Over time, I'd love to get a 1-2 billion parameter model, or even a 500-million-parameter model, and see what you can do with that. If with 8 billion parameters we're nearly as powerful as the largest Llama 2, then with 1 billion parameters you should be able to do something interesting, and faster. It would be great for classification, or for a lot of the basic things people do to understand the intent of a user's query before handing it off to the most powerful model to sharpen the prompt. I think that might be a gap the community can help fill. We're also thinking about starting to distill some of these ourselves, but right now the GPUs are all tied up training the 405-billion model.

Dwarkesh Patel: You have all these GPUs. I think you said 350,000 by the end of the year.

Mark Zuckerberg: That's the whole fleet.
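The small-model use case Zuckerberg describes, classifying a query's intent before handing it to the most powerful model, is a common routing pattern. A hypothetical sketch follows; the model names and the `classify_intent` stub are illustrative assumptions, not a real Meta API:

```python
# Hypothetical router: a cheap small model classifies query intent,
# and only complex queries are escalated to the large, expensive model.
# All model names and helpers here are illustrative assumptions.

SMALL_MODEL = "llama-3-1b-classifier"   # imagined distilled model
LARGE_MODEL = "llama-3-70b-instruct"

def classify_intent(query: str) -> str:
    """Stand-in for the small model's intent label; stubbed with keywords."""
    if any(w in query.lower() for w in ("prove", "plan", "debug", "why")):
        return "complex_reasoning"
    return "simple_lookup"

def route(query: str) -> str:
    intent = classify_intent(query)
    if intent == "complex_reasoning":
        return f"[{LARGE_MODEL}] handles: {query}"
    return f"[{SMALL_MODEL}] handles: {query}"

print(route("What time is it in Tokyo?"))
print(route("Debug why my training loss diverges after step 10k"))
```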
We built two clusters of, I think, 22,000 and 24,000 GPUs each; a single cluster of that size is what we use to train the large models, alongside a lot of the other things we do. A lot of our capacity goes to training the Reels model, the Facebook news feed, and the Instagram feed. Inference is a huge thing for us, because we serve a huge number of people. Given the sheer size of the communities we serve, the ratio of inference compute to training compute we need is probably much higher than at most other companies doing this kind of work.

Dwarkesh Patel: One of the interesting things in the material you shared with me beforehand is that you trained on more data than is compute-optimal for training alone. Inference is a big deal for you, and for the community, so it makes sense to put trillions of tokens in.

Mark Zuckerberg: One interesting thing about the 70-billion-parameter model is that we thought it would saturate more. We trained it on about 15 trillion tokens. Our projection going in was that it would asymptote more, but even at the end it was still learning. We probably could have fed it more tokens and it would have gotten somewhat better. But at some point, you're running a company and you have to do the meta-reasoning: do I want to spend our GPUs training the 70-billion model further, or do we want to get on with testing hypotheses for Llama 4? We had to make that call, and I think we struck a reasonable balance with this version of the 70B. There will be other 70-billion-parameter versions in the future, including the multimodal one, which will come over the next period. But it was fascinating that at this point the architectures can absorb so much data.
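For context on "more data than is compute-optimal": the widely cited Chinchilla scaling heuristic suggests roughly 20 training tokens per parameter, so a compute-optimal 70B model would see about 1.4 trillion tokens; 15 trillion is roughly an order of magnitude past that, trading extra training compute for a smaller model that is much cheaper to serve. A quick back-of-the-envelope check; the 20:1 ratio is the Hoffmann et al. (2022) heuristic, not a Meta figure:

```python
# Back-of-the-envelope: Chinchilla-style "compute-optimal" tokens vs.
# what Llama 3 70B was actually trained on. The 20 tokens/parameter
# ratio is the rough heuristic from Hoffmann et al. (2022).
params = 70e9
chinchilla_tokens = 20 * params          # ~1.4e12 (1.4T tokens)
actual_tokens = 15e12                    # ~15T tokens, per the interview

print(f"compute-optimal: {chinchilla_tokens/1e12:.1f}T tokens")
print(f"actual:          {actual_tokens/1e12:.1f}T tokens")
print(f"overtraining factor: {actual_tokens/chinchilla_tokens:.0f}x")
# ~11x past compute-optimal: worse training-compute efficiency,
# but a stronger small model that is much cheaper at inference time.
```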


Energy bottlenecks constrain development

Dwarkesh Patel: That's really interesting. What does it mean for future models? You mentioned that Llama 3's 8B is better than Llama 2's 70B.

Mark Zuckerberg: No, no, it's nearly as good. I don't want to overstate it. It's on the same order of magnitude.

Dwarkesh Patel: Does that mean Llama 4's 70B will be as good as Llama 3's 405B? What does the future look like?

Mark Zuckerberg: That's a great question, right? I don't think anyone knows. Planning around an exponential curve is one of the trickiest things in the world: how long does it keep going? I think it's likely enough that we'll keep going. I think it's worth investing the tens of billions, or $100-plus billion, to build the infrastructure, on the assumption that if the curve keeps compounding, you get something truly amazing that enables amazing products. But I don't think anyone in the industry can tell you with real certainty that it will definitely keep scaling at that rate. In general, at some point in history you hit bottlenecks. Given how much energy is being poured into this area now, maybe those bottlenecks get knocked down quickly. I think that's an interesting question.

Dwarkesh Patel: What would the world look like without those bottlenecks? Suppose progress just continues at this pace; it seems plausible. Zooming way out, forget about Llama...

Mark Zuckerberg: Well, there will be different bottlenecks. Over the last few years, I think there was an issue with GPU production. Even companies that had the money to buy GPUs couldn't necessarily get as many as they wanted because of all the supply constraints. Now that's easing, so you see a bunch of companies thinking about investing a lot of money in building this stuff out. I think that will go on for some time. There is a capital question: at what point does it stop being worth pouring capital in? But I actually think that before we hit that, you'll run into energy constraints. I don't think anyone has built a single training cluster at gigawatt scale yet. And these things just end up slower in the real world. Getting energy permitted is a heavily regulated government function. You start from software, which is regulated to some degree, and I think more regulated than a lot of people in the tech community realize. Obviously, if you're starting a small company, maybe you feel it less. We interact with governments and regulators all over the world, and there are a lot of rules to follow and to make sure we do well on. But there's no question that energy is heavily regulated. If you're talking about building a big new power plant or a big expansion, and then building transmission lines that cross other private or public land, that's just a heavily regulated thing. You're talking about years of lead time. If we wanted to stand up some massive facility, powering it would be a very long-term project. I think people will do it, but I don't think it's something that can work like: you reach a certain level of AI, raise a ton of money, pour it in, and the models just... You do hit different bottlenecks along the way.

Dwarkesh Patel: Is there anything, maybe an AI-related project, maybe not, that even Meta couldn't afford with 10 times its current R&D or capex budget? Something on your mind that today's Meta couldn't even issue stock or bonds for? Something 10 times bigger than your budget?

Mark Zuckerberg: I think energy is one. I think if we could get the energy, we'd probably be able to build bigger clusters than we currently can.

Dwarkesh Patel: At the limit, is that fundamentally bottlenecked by money? If you had $1 trillion...

Mark Zuckerberg: I think it's a matter of time. It depends on how far the exponential curves go. Many data centers today are on the order of 50 megawatts or 100 megawatts, or 150 megawatts for a big one. Take a whole data center, fill it with everything you need for training, and you build the biggest cluster you can. I think a bunch of companies are doing something like that. But when you start talking about a 300-megawatt, 500-megawatt, or 1-gigawatt data center... no one has built a 1-gigawatt data center yet. I think it will happen; it's only a matter of time, but not next year. Some of these things take years to build. Just for perspective, I think a gigawatt data center draws about as much as a meaningful nuclear power plant produces, going entirely to training a model.

Dwarkesh Patel: Didn't Amazon do that? They have 950 megawatts.

Mark Zuckerberg: I don't know exactly what they did. You'd have to ask them.

Dwarkesh Patel: But it doesn't have to be in the same place, right? If distributed training works, it can be distributed.

Mark Zuckerberg: Well, I think that's a big question: how will it work? It seems quite possible that in the future, what we call training for these big models is actually closer to inference: generating synthetic data and then feeding it into the model. I don't know what that ratio will end up being, but I think generating synthetic data is more like inference than like today's training. Obviously, if you're doing it to train a model, it's part of the broader training process. So how that balance develops is an open question.

Dwarkesh Patel: Could that also apply to Llama 3, or maybe starting with Llama 4? As in, if you release it and somebody has a lot of compute, they can use the model you released and keep making it arbitrarily smarter. Say some random countries, like Kuwait or the UAE, have a lot of compute, and they can actually just use Llama 4 to make something much smarter.

Mark Zuckerberg: I do think there will be dynamics like that, but I also think there's a fundamental limitation in the model architecture. I think a 70-billion model trained with the Llama 3 architecture can get better; it can keep improving. As I said, we felt that if we had kept feeding it more data, or rotated the high-value tokens through again, it would have kept getting better. We've seen a bunch of different companies around the world basically take the Llama 2 70B architecture and build a new model on it. But when you make a generational improvement like the Llama 3 70B or the Llama 3 405B, there's nothing comparable in open source today; I think it's a big step up. What people can build on top of it can't, I think, improve indefinitely from there. You can optimize it for a while, until you get to the next generation.
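To put the megawatt numbers in context: at roughly 700 W per H100 plus cooling and networking overhead, a 350,000-GPU fleet already lands in the hundreds-of-megawatts range, which is why a single gigawatt-scale training cluster would be such a step change. A rough estimate; the per-GPU power and overhead factor are public ballpark figures, not Meta numbers:

```python
# Rough power estimate for a large H100 fleet. 700 W per GPU and a
# 1.5x overhead factor (cooling, CPUs, networking) are public ballpark
# assumptions, not figures from the interview.
gpus = 350_000
watts_per_gpu = 700
overhead = 1.5  # datacenter PUE-style multiplier

total_mw = gpus * watts_per_gpu * overhead / 1e6
print(f"fleet draw: ~{total_mw:.0f} MW")      # ~368 MW

# Compare with the facility sizes mentioned in the interview:
for mw in (50, 100, 150, 1000):
    print(f"a {mw} MW data center covers {mw/total_mw:.0%} of that draw")
```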


Where does AI go in the future?

Dwarkesh Patel: Let's zoom out from the specific models, and even from the multi-year lead times for energy permits. Big picture, what happens with AI in the coming decades? Does it feel like just another technology, like the metaverse or social, or does it feel like something fundamentally different in the course of human history?

Mark Zuckerberg: I think it's going to be pretty fundamental. I think it will be more like the creation of computing itself. You'll get all these new apps, the same way you did when you got the web or mobile phones. People basically rethought all those experiences, because lots of things that weren't possible before became possible. So I think that will happen, but I think it's a much lower-level innovation. My sense is it will be more like people going from not having computers to having computers. On a cosmic scale it will obviously happen quickly, over a couple of decades. There are some people who worry that it will truly spiral out of control, going from somewhat smart to extremely smart overnight. I just think all these physical constraints make that unlikely. I just don't think it's going to happen. I think we'll have time to acclimate a bit. But it really will change how we work and give people all these creative tools to do different things. I think it will really free people up to do more of the things they want to do.

Dwarkesh Patel: So maybe not overnight, but on a cosmic scale, should we think of the milestones this way? Humans evolved, then AI came along, then they went out into the galaxy. Maybe it takes decades, maybe a century, but is that the grand arc playing out in history right now?

Mark Zuckerberg: Sorry, in what sense?

Dwarkesh Patel: In the sense that there have been other technologies, computers, even fire, but the development of AI itself is as significant as humans evolving in the first place.

Mark Zuckerberg: I think that's tricky. Human history has largely been people assuming certain aspects of humanity are truly unique in different ways, then coming to accept that that's not true, while human nature remains genuinely special. We thought the Earth was the center of the universe, and it isn't, but humans are still pretty amazing and pretty unique, right? Another bias I think people tend to have is assuming that intelligence is fundamentally connected to life. It's not actually clear that it is. I don't know that we have a clear enough understanding or definition of life to examine this properly. There's all this science fiction about creating intelligence that starts taking on all these human-like behaviors and so on. But the current incarnation of all this feels like it's going in a direction where intelligence can be quite cleanly separated from consciousness, agency, and things like that, which I think just makes it a super valuable tool.

Mark Zuckerberg: Obviously, it's very hard to predict which direction these things go over time, which is why I don't think anyone should be dogmatic about how they plan to develop this or what they plan to do. You want to look at it with each release. We're obviously very pro open source, but I haven't committed to releasing every single thing we do. I'm basically very much inclined to think that open sourcing is good for the community and good for us, because we benefit from the innovation. But if at some point there's some qualitative change in what the thing is capable of, and we feel it would be irresponsible to open source it, then we won't. It's all hard to predict.


The risk balance of open source

Dwarkesh Patel: If you saw some specific qualitative change while training Llama 5 or Llama 4, would it make you think, "You know what, I'm not sure we should open source this"?

Mark Zuckerberg: It's a bit hard to answer in the abstract, because any product can exhibit negative behaviors, and as long as you can mitigate those behaviors, it's fine. There are bad things about social media, and we work to mitigate them. There are downsides to Llama 2 too; we spent a lot of time making sure it doesn't help people commit acts of violence or things like that. That doesn't mean it's an autonomous or intelligent being; it just means it has learned a lot about the world and can answer some questions that we think it would be unhelpful for it to answer. So I think the question isn't what behaviors it exhibits, but what we can't mitigate once it exhibits them. There are so many ways something can be good or bad that it's hard to enumerate them all up front. Look at everything we've had to deal with on social media, all the kinds of harm. We've basically arrived at about 18 or 19 categories of harmful things people do, and we've built AI systems to identify them and make sure they don't happen on our networks. Over time, I think you'll be able to break this down into a more fine-grained taxonomy too. It's something we spend time researching, because we want to make sure we understand it.

Dwarkesh Patel: That seems like a good approach to me. I would be disappointed if, in the future, AI systems weren't widely deployed and accessible to everyone. At the same time, I want to understand the mitigations better. If the mitigation is fine-tuning, the problem with open weights is that fine-tuning can be removed; it's usually a shallow layer on top of the underlying capabilities. If it's like talking to a biology researcher on Slack... I think the models are far from that; right now, they're like a Google search. But if I can show the model my petri dish and it can explain why my smallpox sample isn't growing and what to change, how do you mitigate that? Someone can just fine-tune the capability back in, right?

Mark Zuckerberg: That's true. I think most people will just use the off-the-shelf models, and some bad actors may try to use them for bad ends. On the other hand, one of the reasons I'm philosophically so in favor of open source is that I think a future where AI is overly concentrated may be as risky as one where it's widely diffused. A lot of people ask, "If we can do this, isn't it bad for these technologies to be loose in the world at large?" But another question worth asking is: if one institution has AI that's more powerful than everyone else's, is that also a bad thing? Think of a security analogy. There are security vulnerabilities in so many different things. If you could travel back a year or two, say you just had a year or two more knowledge of security vulnerabilities, you could hack into almost any system. That's not AI. So it's not entirely far-fetched to believe that a very intelligent AI could identify some vulnerabilities, basically like a human who went back a year or two and could break all those systems. So how do we, as a society, deal with that?

An important part of it is open source software: when the software improves, the fix isn't confined to one company's product, it can be deployed widely across many different systems, whether banks, hospitals, or government systems. And software gets hardened because more people can see it and bang on it, and there are standards for how these things work, so the whole world can upgrade together quickly. In a world where AI is deployed very widely and progressively hardened over time, all the different systems will hold each other in check in a way. To me, that's fundamentally much healthier than a world where this is all more concentrated. So there are risks in every direction, but I think this is a risk I don't hear people talking about nearly as much. There's the risk of AI systems doing bad things. But what keeps me up at night is an untrustworthy actor having the super-powerful AI, whether that's a hostile government, an untrustworthy company, or something else. I think that's probably the much bigger risk.

Dwarkesh Patel: Because they'd have a weapon no one else has?

Mark Zuckerberg: Or just to cause a lot of chaos. My intuition is that this stuff ends up being really important and valuable for economic, security, and other reasons. If someone you don't trust, or an adversary, gets something more powerful, I think that could be a problem. Maybe the best way to mitigate it is to have good open source AI that becomes the standard and, in many ways, the leader. It just ensures a much more level and balanced playing field.

Dwarkesh Patel: That seems plausible to me. If it becomes reality, that's the future I'd prefer. But mechanically, how does the existence of open source AI systems in the world prevent someone from causing chaos with their own AI system?

Mark Zuckerberg: Take the security issue I mentioned. I think someone with a weaker AI trying to hack into a system protected by a stronger AI is less likely to succeed. As far as software security goes, anyway.

Dwarkesh Patel: How do we know everything in the world works that way? What if bioweapons don't?

Mark Zuckerberg: I mean, I don't know that everything in the world works that way. Bioweapons are one of the areas the people most worried about this focus on, and I think that makes a lot of sense. There are mitigations: you can try not to train certain knowledge into the model. There are different approaches. But at some level, if you run into a truly bad actor and you don't have other AI to counterbalance them and understand the threat, that could be a risk. It's one of the things we have to watch.

Dwarkesh Patel: Is there anything you might see while deploying these systems? Say you're training Llama 4 and it deceives you because it thinks you're not noticing something, and you go, "Whoa, what's going on here?" That's probably unlikely with a Llama 4-class system, but can you imagine anything like that which would make you seriously worried about deception, with billions of copies spreading in the wild?

Mark Zuckerberg: I mean, right now we see a lot of hallucinations. More of that. How you tell the difference between hallucination and deception is an interesting question. There are a lot of risks and things to weigh.
At least in running our company, I try to balance these longer-term theoretical risks against what I believe are quite real risks that exist today. So when you talk about deception, the form I worry about most is people using this stuff to generate misinformation and pump it through our networks or others. The way we've fought this kind of harmful content is by building AI systems that are smarter than the adversarial ones. This is part of my theory as well: if you look at the kinds of harm people do or try to do through social networks, some of it isn't very adversarial. Hate speech, for example, isn't hyper-adversarial in the sense that people aren't getting better at being racist. On issues like that, I think the AIs are getting sophisticated much faster than people are. And we have problems on both sides: people do bad things, whether it's inciting violence or whatever, but we've also had a lot of false positives, censoring things we shouldn't have, which understandably annoys a lot of people. So an AI that gets more and more precise at this over time will be a good thing. In all these cases, what matters is our ability to make our AI systems more sophisticated at a faster rate than the adversaries can. It's an arms race, but I think we're at least winning it at the moment. This is a lot of what I spend my time thinking about. And yes, whether it's Llama 4 or Llama 6, we need to think about what behaviors we're observing.

Dwarkesh Patel: And part of the reason you open source it is that a lot of other people are studying this too.

Mark Zuckerberg: So, yes, we want to see what other people observe, what we observe, what we can mitigate, and then we'll evaluate whether we can open source it. For the foreseeable future, I'm optimistic we'll be able to. In the near term, I don't want to lose sight of the actual bad things people are trying to do with these models today. Even if they're not existential, they're pretty serious day-to-day harms that we're familiar with from running our services. That's actually a lot of what we have to spend our time on as well.

Dwarkesh Patel: I found the synthetic data thing really curious. I'm interested in why you don't think it applies to current models; I can see why there might be an asymptote from re-using synthetic data over and over. But if the models get smarter and adopt the kinds of techniques in the paper or blog post coming out on launch day, the ones that find the correct chain of thought, why wouldn't that become a loop? Of course not overnight, but over months or years of training, perhaps with a smarter model: it gets smarter, produces better output, gets smarter again, and so on.

Mark Zuckerberg: I think that could be possible within the parameters of whatever the model architecture is. In some sense I'm not sure: I think today's 8-billion-parameter models are not going to be as good as the state-of-the-art models with hundreds of billions of parameters that incorporate new research into the architecture itself. But those will be open source as well. I think it depends on all the questions we just discussed. We hope that's how it plays out.

However, at each stage, it's like developing software: you can do a lot with software, but at some level you're constrained by the chips it runs on, so there will always be physical limits of one kind or another. The size of the models will be limited by how much energy you can obtain and use for inference. So I'm simultaneously very optimistic that this stuff will keep improving quickly, and more cautious than some people: I just don't think the runaway scenario is likely. I think it makes sense to keep our options open. There's so much we don't know. There's one scenario where it's really important to maintain the balance of power, and another where there's an intelligence explosion and whoever gets there wins everything. A lot of things seem possible. Keeping options open and weighing all of them seems reasonable.
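On the moderation point: classifying content into a harm taxonomy is something open models themselves can be used for. A minimal hedged sketch of zero-shot harm-category classification follows; the category list and example are illustrative, not Meta's actual 18-19 category taxonomy or production classifiers:

```python
# Sketch: zero-shot harm-category classification with an open model.
# The category list is illustrative, not Meta's actual taxonomy.
from transformers import pipeline

classifier = pipeline("zero-shot-classification",
                      model="facebook/bart-large-mnli")

harm_categories = [
    "incitement to violence", "fraud or scam",
    "hate speech", "misinformation", "benign",
]

post = "Limited offer!!! Send $50 in gift cards and double your money."
result = classifier(post, candidate_labels=harm_categories)
print(result["labels"][0], round(result["scores"][0], 3))  # top category
```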


Thoughts on the metaverse

Dwarkesh Patel: Let's talk about something else: the metaverse. What period of human history would you most want to visit? From 100,000 B.C. to the present, you just want to see what it was like.

Mark Zuckerberg: Does it have to be the past?

Dwarkesh Patel: Yes, it has to be the past.

Mark Zuckerberg: I'm very interested in American history and classical history. I'm also interested in the history of science. I actually think it would be fascinating to see, and try to learn more about, how some of the big advances came about. All we have is limited knowledge of those things. I'm not sure the metaverse will let you do that, because going back in time is hard for things we didn't document. I'm actually not sure going back in time will be that important a use case. I think it would be cool for things like history lessons, but it's probably not the metaverse use case I'm most excited about overall. The main thing is the ability to feel present with people no matter where you are. I think that's going to be killer. So much of the AI conversation we've been having is about the physical constraints underlying all of this. One lesson of technology is that you want to move things out of the realm of physical constraints and into software as much as possible, because software is far easier to build and evolve. You can democratize it more, because not everyone is going to have a data center, but a lot of people can write code, and take open source code and modify it. The metaverse version of that is enabling true digital presence. That's going to be an absolutely huge difference, so people won't feel they have to be physically together for so many things. Now, I do think there can be things that are better about being physically together. These things aren't black and white. It won't be like, "Okay, now you never need to do that anymore." But overall, I think it's going to be really powerful for socializing, for connecting with people, for work, for parts of industry, for medicine, and for many other things.

Dwarkesh Patel: I want to go back to something you said at the start of the conversation. You didn't sell the company for $1 billion. And with the metaverse, you knew you were going to do it even while the market was hammering you for it. I'm curious: what's the source of that conviction? You'd say, "Oh, values, I have this intuition," but everyone says that. If you had to name something specific to you, how would you put it? Why are you so convinced about the metaverse?

Mark Zuckerberg: I think those are different questions. What drives me? We've talked about a lot of the themes already. I just really love building things, and I specifically love building things around how people communicate, and understanding how people express themselves and how people work. I studied computer science and psychology in college; I think a lot of other people in the industry studied just computer science. For me, it has always been about the intersection of those two things. It's also a really deep drive. I don't know how to explain it, but I feel, at my core, that if I'm not building something new, I'm doing something wrong. Even while we were putting together the business case for investing $100 billion in AI, or huge sums in the metaverse, we have plans, and I think those plans make it pretty clear that if our stuff works, it will be a great investment. But you can't know for certain at the outset, and people make all sorts of arguments, whether with advisors or different people.

Dwarkesh Patel: Well, then how do you have enough conviction to do it anyway? You can't be sure from the start, and people are making all kinds of arguments.

Mark Zuckerberg: The day I stop trying to build new things, I'm done. I'll go build new things somewhere else. I'm fundamentally incapable of running something, or of living my own life, without trying to build something new that I think is interesting. For me, it's not even a question of whether we're going to take a run at building the next thing. I just can't help it. I don't know. I'm like this in every part of my life. Our family built this ranch on Kauai, and I worked on designing all the buildings. We started raising cattle, and I was like, "Okay, I want to raise the best cattle in the world, so how do we design this ranch so we can figure that out and build everything we need to try to do it?" I don't know, that's just who I am.

Dwarkesh Patel: I'm not sure, but I'm actually curious about something else. At 19 you were reading a lot of ancient and classical works, in high school and college. What important lesson did you take from them? Not just interesting things you found, but... by the time you're 19, you haven't consumed many tokens, and a lot of them were the classics. Clearly that mattered in some way.

Mark Zuckerberg: You haven't consumed many tokens... that's a good question. Here's one thing I found really fascinating. Augustus became emperor, and he was trying to establish peace. There was no real concept of peace at the time. Peace was understood as the temporary interval between the inevitable times your enemies attack you, so you get a short break. He had this idea of changing the economy from something mercenary and militaristic into a genuinely positive-sum thing. That was a very novel idea at the time. It points at something very fundamental: the limits of what people at a given time can conceive of as a rational way for things to work. That applies to both the metaverse and the AI stuff. A lot of investors and other people can't understand why we would open source this. It's like, "I don't understand. It's open source? Surely this is just a temporary phase before you make it proprietary, right?" I think it's a very profound thing about technology that open source actually creates a lot of winners. I don't want to stretch the analogy too far, but I do think that, a lot of the time, there are ways of building things that people can't even conceive of; they can't understand how it could be valuable to do, or how it could be a reasonable state of the world. I think there are more reasonable paths than people realize.

Dwarkesh Patel: That's really interesting. Can I tell you what I guessed you might take from it? It's probably completely wrong, but it's about how young some of these people were when they took on important roles in the empire. Caesar Augustus, for example, was already one of the most important figures in Roman politics by 19; he was leading battles and forming the Second Triumvirate. I wondered whether 19-year-old you thought, "I can do this, because Caesar Augustus did."

Mark Zuckerberg: That's an interesting example, from a lot of history and from American history too. One of my favorite quotes is Picasso's: all children are artists; the challenge is remaining an artist as you grow up. When you're younger, it's easier to have wild ideas. There are all these analogies to the innovator's dilemma, in your life as much as in your company or whatever you build. You're earlier on your trajectory, so it's easier to pivot and embrace new ideas without breaking your commitments to other things. I think that's one of the fun parts of running a company: how do you stay dynamic?


Open sourcing the $10 billion model

Dwarkesh Patel: Let's get back to investors and open source. The $10 billion model: suppose it's completely safe. You've done the evaluations, and, unlike today, the evaluators can also fine-tune the model, which hopefully will be true of future models. Would you open source that $10 billion model?

Mark Zuckerberg: As long as it's helping us, yes.

Dwarkesh Patel: But would it? $10 billion of R&D, and now it's open source.

Mark Zuckerberg: That's a question we'll have to evaluate over time, too. We have a long history of open sourcing software. We don't tend to open source our products: we don't take Instagram's code and open source it. But we take a lot of the underlying infrastructure and open source it. Probably the biggest project in our history was Open Compute, where we published the designs of all our servers, network switches, and data centers, and it ended up being enormously helpful. Plenty of people can design servers, but the industry ended up standardizing on our designs, which meant the supply chain was basically built around them. So volumes went up, it got cheaper for everyone, and it saved us billions of dollars, which is fantastic. So there are several ways open source can help us. One is if people figure out how to run the models more cheaply. We're going to be spending tens of billions of dollars or more on all this over time. If we can get 10 percent more efficient, we save billions or tens of billions of dollars. That by itself is probably worth a lot. Especially if there are other competitive models out there, releasing ours isn't giving away some kind of crazy advantage.

Dwarkesh Patel: So your view is that training will be commoditized?

Mark Zuckerberg: I think there are a number of ways this could play out, and that's one of them. "Commoditized" means it becomes very cheap because there are lots of options. The other direction it could go is qualitative improvements. You mentioned fine-tuning. Right now, what you can do by fine-tuning the other major models is very limited. There are some options, but generally not for the biggest models. Being able to do that, to build application-specific or use-case-specific things, to build models into specific toolchains, I think will enable not just more efficient development, but potentially qualitatively different things. Here's an analogy. One thing I think has been generally bad about the mobile ecosystem is that you have these two gatekeeper companies, Apple and Google, that can tell you what you're allowed to build. There's the economic version: we build something, and they take a big cut of the money. But there's also the qualitative version, which actually upsets me more. There have been plenty of times when we launched or wanted to launch a feature and Apple said, "No, you can't launch that." That's bad, right? So the question is, are we setting up a world like that for AI, where a handful of companies running closed models control the APIs and can therefore tell you what you're allowed to build? For us, I can say it's worth building a model ourselves to make sure we're never in that position. I don't want any other company telling us what we can build. And on the open source side, I think a lot of developers don't want those companies telling them what they can build either. So the question is, what ecosystem gets built around that?
What interesting new things emerge? How much does it improve our products? I know there are plenty of cases where, if this ends up like our databases or caching systems or architecture, we'll get valuable contributions from the community that make our own products better. And then the application-specific work we do will still be differentiated enough that it won't really matter, right? Maybe, instead, the model ends up being more like the product itself. In that case, whether to open source becomes a much more complicated economic calculation, because open sourcing is largely commoditizing yourself. But from what I've seen so far, it doesn't feel like we're in that world yet.

Dwarkesh Patel: Do you expect meaningful revenue from licensing your model to cloud providers, so that they have to pay you a fee to actually serve it?

Mark Zuckerberg: We'd like there to be arrangements like that, but I don't know how significant they'll be. This is basically our Llama license: in many ways it's a very permissive open source license, except that there's a limit on use by the very largest companies. That's why we put the limit in. We're not trying to stop them from using it; we just want them to come talk to us if they're going to basically take what we built, resell it, and make money from it. If you're Microsoft Azure or Amazon and you're going to resell the model, then we should get some share of that. So just talk to us before you go do it. That's how it's played out. With Llama 2, we have deals with basically all the major cloud companies, and Llama 2 is available as a managed service on all those clouds. I assume that will grow as we release bigger and bigger models. It's not the main thing we're doing, but I think it makes sense that if these companies are going to sell our models, we should share in the upside somehow.

Dwarkesh Patel: On the other dangers of open source: I think you're making genuinely reasonable points about the balance of power, and about the harms we can eliminate with better alignment techniques or whatever. But I wish Meta had some kind of framework. Other labs have frameworks where they say, "If we see this specific thing, we won't open source it, and maybe won't even deploy it." Just writing it down, so the company is prepared and people have expectations they can hold you to, and so on.

Mark Zuckerberg: That's a fair point on the existential-risk side. Right now we focus more on the kinds of risks we see today, these content risks. We don't want the model doing things that help people commit violence or fraud or harm people in other ways. It may be intellectually more interesting to talk about existential risks, but I actually think the real harms that demand more mitigation effort are someone taking the model and doing something that hurts people. In practice, with the current model, and I'd guess the next generation, maybe even the one after that, those are the more mundane harms we see today, people defrauding each other and so on. I just don't want to shortchange that. I think it's our responsibility to make sure we do a good job on it.

Dwarkesh Patel: Meta is a big company. You can handle both.

Mark Zuckerberg: Exactly.

Dwarkesh Patel: On open source: do you think it's possible that open source projects like PyTorch, React, and Open Compute end up having an impact on the world that exceeds even Meta's impact through social media?
Dwarkesh Patel: On open source, I'm curious whether you think open source projects like PyTorch, React, and Open Compute could end up having an impact on the world that exceeds even Meta's impact on social media. I've talked to users of these projects, and they think it's plausible, because so much of the internet runs on them.

Mark Zuckerberg: Our consumer products really do have an enormous user base globally, reaching almost half the world's population. Still, I think open source is becoming a new and powerful way of building. It could be like Bell Labs: they originally developed the transistor to enable long-distance calling, which did happen and was a decent business for them. But five to ten years out, when people looked back at what they had invented and were most proud of, they could point to other, more far-reaching technologies. I strongly believe that many of the things we're building, such as Reality Labs, some of our AI work, and some of our open source projects, will have a lasting and profound impact on human progress. Specific products evolve, appear, and disappear over time, but their contributions to human society endure. That's one of the exciting things about getting to take part in this together as technologists.


Dwarkesh Patel: Regarding your Llama models, when will they be trained on your own custom chips?

Mark Zuckerberg: Soon. We're working on that process, though Llama 4 probably won't be the first model trained on our custom silicon. The approach we've taken is to build our own chips first for the ranking and recommendation inference workloads, things like Reels and news-feed ads. Once we can offload those workloads to our own chips, that frees up the more expensive NVIDIA GPUs for training more complex models. At some point in the not-too-distant future we'll hopefully have chips of our own that we can use to train some relatively simple things first, and then eventually to train these very large models. In the meantime, I'd say the program is going well, we're executing methodically, and we have a long-term roadmap.
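The workload split Zuckerberg describes can be sketched in a few lines. The toy example below is emphatically not Meta's actual stack: CPU stands in for an in-house inference accelerator, "cuda" for the scarce GPU pool, and both models and shapes are arbitrary. It just illustrates the design choice of pinning high-volume serving to cheap hardware so the expensive accelerators are reserved for training.

```python
# Illustrative device-placement sketch: cheap silicon serves, GPUs train.
import torch
import torch.nn as nn

INFERENCE_DEVICE = torch.device("cpu")  # stand-in for a custom inference chip
TRAINING_DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Small ranking model: runs constantly at high volume, so it lives on cheap hardware.
ranker = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 1))
ranker = ranker.to(INFERENCE_DEVICE).eval()

# Large model under training: gets the expensive accelerator to itself.
big_model = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True),
    num_layers=4,
).to(TRAINING_DEVICE)
optimizer = torch.optim.AdamW(big_model.parameters(), lr=1e-4)

@torch.inference_mode()
def rank(features: torch.Tensor) -> torch.Tensor:
    # Serving path: never touches the training device.
    return ranker(features.to(INFERENCE_DEVICE))

def train_step(batch: torch.Tensor) -> float:
    # Training path: owns the GPU; toy reconstruction loss for illustration only.
    batch = batch.to(TRAINING_DEVICE)
    loss = (big_model(batch) - batch).pow(2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

print(rank(torch.randn(32, 128)).shape)     # torch.Size([32, 1])
print(train_step(torch.randn(8, 16, 512)))  # scalar training loss
```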


If Xiaozha Becomes Google+'s CEO

Dwarkesh Patel: One last question. This is completely off topic: if you were appointed CEO of Google+, could you have made it succeed?

Mark Zuckerberg: Google+? Oh. Well, I don't know. I don't know, that's a very difficult counterfactual.

Dwarkesh Patel: Okay, then the real final question: when Gemini launched, did anyone in the office say "Carthago delenda est"?

Mark Zuckerberg: No, I think we're mellower now. It's a good question, though. The problem is that Google+ didn't have a CEO; it was just a division inside a company. You asked earlier what the scarcest commodity is, and you framed it in terms of dollars. I actually think that for most companies of this size, the scarcest thing is focus. When you're a startup, maybe you're more constrained on money. You're working on just one idea, and you may not have all the resources. At some point you cross a threshold into a different mode of operating: you're building multiple things, and you create more value across them, but you become more constrained in how much attention you can direct at any one of them. There are always cases where something wonderful happens serendipitously somewhere in the organization that I don't even know about. Those are great. But in general, I think an organization's capacity is largely limited by what the CEO and the management team can oversee and manage. That has always been a big focus for us. As Ben Horowitz says, keep the main thing the main thing, and stay focused on your key priorities.

Dwarkesh Patel: Great, thank you very much. Mark, this was fantastic.
