Former subordinate reveals Musk's secrets: he hates meetings, wants no non-technical middle management, and defaults to layoffs
Yuyang, from Aofei Temple
QbitAI | WeChat official account QbitAI
Musk is already famous for being a "devil boss".
Now his former subordinate Andrej Karpathy has "hammered" him again in a recent interview: I had to beg him (Musk) to let me hire people, and he always defaulted to laying people off.
Beyond the fondness for layoffs, at this AI Ascent event organized by Sequoia, Karpathy also revealed more details of how Musk runs his companies: he hates meetings, has no tolerance for coasting, and prefers talking directly with engineers about the work over talking to VPs......
He also covered many of the large-model topics the industry cares about, including: does scale matter, and how can young startups compete with OpenAI?
For more details, the text version is shared below~
(Claude 3 also contributed)
Large language models are the CPUs of the new era
Q: Andrej, thank you very much for joining us today. OpenAI's original office was just across the street from our San Francisco office, and the whole team was crammed in there at the time.
In addition to working upstairs in a chocolate factory, fulfilling Willy Wonka's dream, what other memorable moments have you had working here?
Karpathy: Yes, OpenAI's original office was there, if you don't count Greg's apartment.
We were there for about two years, and the chocolate factory downstairs was always delicious. At that time, the team was about 10-20 people.
We had a really fun time there. Jensen Huang mentioned at GTC that he delivered the first DGX supercomputer to OpenAI, and that happened right there.
Q: Andrej needs no introduction, but I'd like to mention his background. He studied with Geoffrey Hinton and Fei-Fei Li, and first made a name for himself with his deep learning course at Stanford University.
In 2015, he co-founded OpenAI. In 2017, he was poached by Musk.
You may not remember the situation at the time: Tesla went through 6 heads of Autopilot, each lasting only about 6 months. I remember when Andrej took over the position, I wished him the best of luck.
It didn't take long for him to return to OpenAI. And now he has complete freedom to do whatever he wants. So we're looking forward to hearing what he has to offer today.
What I admire most about Andrej is that he is a fascinating futurist thinker, a staunch optimist, and at the same time a very pragmatic builder. Today he will share some insights on these fronts.
First of all, even seven years ago, AGI seemed like an almost impossible goal in our lifetimes. And now it seems to be in sight. What are your thoughts on the next 10 years?
Karpathy: You're right. A few years ago the path to AGI was still very unclear, still at the stage of academic discussion. Now it's clear, and everyone is racing to fill in the gaps.
Optimization work is in full swing. Broadly speaking, everyone is trying to build a "large model operating system (LLM OS)".
I like to compare it to an operating system. You'll have to prepare all kinds of peripherals and connect them to a new CPU. These peripherals include text, images, audio, and other modalities. The CPU is the language model itself. It also needs to be connected to all the Software 1.0 infrastructure that we've built.
I think there's a lot of work going on to build something like that and then tailor it to something that works in every part of the economy.
In general, the direction of development is that we can adjust these relatively independent agents, assign them high-level tasks, and let them specialize in various tasks. It's going to be very fun and exciting. And there will be more than one agent, there will be many. Imagine what that would look like?
Q: If the future is really what you say, how should we adjust our lifestyle now?
Karpathy: I don't know. I think we have to work on building it, influencing it, and making sure it's positive. In short, try to make the outcome as good as possible.
Q: Since you are now a free man, I would like to ask a significant question, which is that OpenAI is dominating the entire ecosystem.
Most of you here today are entrepreneurs trying to carve out a niche, praying that OpenAI won't wipe you out overnight.
Do you think there's still a chance in it, and in what areas will OpenAI continue to dominate?
Karpathy: My overall impression is that OpenAI is trying to build an LLM operating system. As we heard earlier today, OpenAI is trying to develop a platform, on top of which you can build different companies in different verticals.
The operating system analogy is actually interesting, because operating systems like Windows also come with some default applications, such as browsers.
So I think OpenAI or other companies might launch some default apps as well, but that doesn't mean you can't run different browsers on top of them, you can run different agents on top of them.
There will be some default apps, but there may also be a vibrant ecosystem with a wide variety of apps, fine-tuned for specific scenarios.
I'm a big fan of the analogy to the early iPhone apps. Those first apps were a bit gimmicky and took time to mature. I think we're experiencing the same thing right now. People are trying to figure out what this thing is good at, what it's not good at, how to use it, how to program with it, how to debug it, how to get it to do actual tasks, and what kind of supervision is needed, because it's fairly autonomous, but not completely autonomous. So what should oversight look like? What should evaluation look like? There's a lot to think through and understand. I think it will take some time to figure out how to work with this new infrastructure, and we'll see that play out over the next few years.
Q: The race for large language models is in full swing right now, with OpenAI, Anthropic, Mistral, Llama, Gemini, and the entire ecosystem of open-source models, as well as a large number of small models. How do you foresee the future of the ecosystem?
Karpathy: Yes, and again, the operating system analogy is interesting. We have closed-source systems like Windows and macOS, as well as open-source Linux. I think big models will probably look similar.
We also have to be careful with what we call these models. Many of the models you listed, like Llama, Mistral, and so on, I don't think are really open source. It's like throwing out an operating system binary: you can use it, but it's not fully useful. There are a few language models I consider completely open source, with a full release of all the infrastructure needed to compile the "operating system," from data collection to model training. That's much better than just getting the model weights, because then you can fine-tune the model.
But I think there's a subtle problem that you can't fully fine-tune the model, because the more you fine-tune, the worse it will perform on everything else.
So if you want to add a capability without degrading the others, you may actually need to mix the previous dataset distribution with the new dataset distribution during training. If you're only given the model weights, you can't do that: you need the training loop, you need the datasets, and so on. So you're actually limited when you use these models.
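The data-mixing idea Karpathy describes can be sketched in a few lines. This is a minimal illustration, not anyone's actual training pipeline; the dataset names and the 30% mixing ratio are hypothetical.

```python
import random

def mixed_batches(pretrain_data, new_data, new_fraction=0.3, batch_size=8, seed=0):
    """Sample fine-tuning batches that blend the original pretraining
    distribution with the new task's data, so the model keeps its old
    capabilities while acquiring the new one."""
    rng = random.Random(seed)
    while True:
        yield [
            rng.choice(new_data) if rng.random() < new_fraction
            else rng.choice(pretrain_data)
            for _ in range(batch_size)
        ]

# Hypothetical stand-ins for tokenized documents.
pretrain = [f"general_doc_{i}" for i in range(1000)]
new_task = [f"legal_doc_{i}" for i in range(50)]

batch = next(mixed_batches(pretrain, new_task))
print(batch)  # roughly 70% general docs, 30% new-task docs on average
```

This is exactly what an open-weight release without the original datasets and training loop does not let you do, which is Karpathy's point.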
They're certainly helpful, but we may need better terms to describe them: open-weight models, open-source models, and proprietary models. The ecosystem will probably look like that, and most likely very similar to the ecosystems we have today.
Scale is the most important determining factor
Q: Another question I want to ask is scale. In simple terms, size seems to be the most important thing. Data scale and computing power scale. As a result, large research labs, big tech giants have a huge advantage today. What do you think about that? Is scale everything? If not, what's important?
Karpathy: I think scale definitely comes first.
There are some details that really need to be taken care of. I think it's also important to prepare the dataset so that the data is very good and very clean, which can make the computational efficiency better.
But I think size is going to be the main deciding factor, the number one main ingredient, and of course you have to get a lot of other things right.
If you don't have scale, you can't train these large models at all. If you're just doing fine-tuning and stuff like that, you probably don't need that much scale, but we haven't really seen that fully materialize.
Q: Can you elaborate on what other factors you think are important besides size, perhaps a lower priority?
Karpathy: First of all, you can't just buy your way into training these models. Even if you provide the funding and the scale, it's still very difficult to actually train them.
Part of the reason is that the infrastructure is too new, still under development, and not yet perfect. But training a model at this scale is extremely difficult and is a very complex distributed optimization problem. Talent in this area is actually quite scarce at the moment. It's basically a crazy thing, with models running on thousands of GPUs and randomly failing at different points in time. Monitoring this process and getting it to work is actually an extremely difficult challenge.
Only recently has it become possible to run 10,000-GPU workloads as expected. So I think a lot of infrastructure is creaking under this pressure, and we need to fix that.
Now, if you're just giving someone a lot of money or a lot of GPUs, I'm not sure if they're going to be able to produce big models directly, and that's why it's not just a matter of scale. You actually need a lot of expertise, both in terms of infrastructure, in terms of algorithms, and in terms of data, to be very careful.
Q: The ecosystem is evolving so fast that some of the challenges that we thought existed a year ago are now increasingly being addressed. Hallucinations, contextual windows, multimodal capabilities, faster and cheaper inference. What are some of the challenges of language model research that keep you up at night, and what problems do you think are urgent enough to be solved?
Karpathy: One thing I think a lot about on the algorithm side is the clear difference between diffusion models and autoregressive models. They are both ways of representing probability distributions, and it turns out different modalities clearly fit one or the other. I think there might be some room to unify them, or to tie them together in some way.
Another thing I want to point out is the intrinsic efficiency of the infrastructure running these big models. My brain runs on about 20 watts. Jensen Huang just talked at GTC about the massive supercomputers they want to build, and the numbers are in the megawatt range. So maybe you don't need all that energy to run a brain. I don't know exactly how much it will take, but I think it's safe to say we can probably improve the efficiency of running these models by a factor of 1,000, maybe even 1,000,000.
I think part of the reason is simply that current machines aren't suited to this workload. Nvidia's GPUs are a good step in this direction, because you need extremely high parallelism. We don't really care about sequential computation that depends on data in some way; we just need to run the same algorithm across many different array elements. So I think the first step is to adapt computer architecture to the new data workflows, and the second is to push on the improvements we're already seeing.
The first is precision. We've seen precision drop from the original 64-bit doubles down to 4, 5, or 6 bits, or even 1.58 bits, depending on which paper you're reading. So I think precision is a big lever here.
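To make the precision point concrete, here is a toy symmetric int8 quantization of a weight matrix. This is only a sketch of the principle, not any production quantization scheme:

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor quantization: map floats to int8 plus one
    float scale, cutting storage 4x relative to float32."""
    scale = np.abs(w).max() / 127.0
    q = np.round(w / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the int8 representation."""
    return q.astype(np.float32) * scale

w = np.random.randn(512, 512).astype(np.float32)
q, scale = quantize_int8(w)
error = np.abs(w - dequantize(q, scale)).max()

print(q.nbytes / w.nbytes)        # 0.25: a quarter of the float32 footprint
print(error <= scale / 2 + 1e-6)  # True: round-off bounded by half a step
```

Going below 8 bits (the 4-, 5-, 6-, or 1.58-bit schemes mentioned above) trades more accuracy for even smaller memory and bandwidth footprints.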
The second, of course, is sparsity. In fact, many parameters in a large model are zero, or close to zero. So it would be great if you could take advantage of this in some way, say make sparse matrix multiplication more efficient. There is some promising research in this area.
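The sparsity idea can be illustrated with a toy compressed-row format that skips zero entries entirely. Real sparse kernels on GPUs are far more involved; this only shows the principle of not spending compute on zeros:

```python
def to_sparse_rows(m, tol=1e-8):
    """Keep only (column, value) pairs for the nonzero entries of each
    row, a simplified CSR-style representation."""
    return [[(j, v) for j, v in enumerate(row) if abs(v) > tol] for row in m]

def sparse_matvec(rows, x):
    """Matrix-vector product that touches only the stored nonzeros."""
    return [sum(v * x[j] for j, v in row) for row in rows]

# A mostly-zero weight matrix, as described above.
m = [
    [0.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, -1.0],
    [0.5, 0.0, 0.0, 0.0],
]
x = [1.0, 2.0, 3.0, 4.0]

print(sparse_matvec(to_sparse_rows(m), x))  # [4.0, 0, -4.0, 0.5]
```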
There are also some interesting ideas, such as using singular value decomposition (SVD) to see whether weight matrices can be broken down into smaller matrices and then recombined. Or, for example, computing only the forward pass with no backpropagation, and training a smaller model to predict the output of the larger model.
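The SVD idea works because a matrix that is (approximately) low rank can be stored as two thin factors with far fewer parameters. A toy sketch with a matrix that is rank-4 by construction:

```python
import numpy as np

rng = np.random.default_rng(0)
# A 64x64 weight matrix that is exactly rank 4 by construction.
w = rng.standard_normal((64, 4)) @ rng.standard_normal((4, 64))

u, s, vt = np.linalg.svd(w, full_matrices=False)
k = 4
a = u[:, :k] * s[:k]   # 64 x 4 factor (scaled left singular vectors)
b = vt[:k, :]          # 4 x 64 factor

params_full = w.size               # 4096 parameters
params_factored = a.size + b.size  # 512 parameters: 8x fewer
print(np.allclose(w, a @ b))       # True: exact up to float round-off
```

For a real weight matrix the smaller singular values are nonzero, so truncating them gives an approximation rather than an exact reconstruction, and the parameter savings come at some accuracy cost.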
So I think, fundamentally, there are two problems to solve:
One is to build more suitable hardware. The other is to find better algorithms that increase efficiency while maintaining performance.
I think there's a lot of room to explore in both areas. From an energy efficiency perspective, if we can close the gap with the brain, it would be a huge step forward. This could mean that each of us can afford a model, or run one on our devices, without the need to connect to the cloud.
Musk "manages the world's largest startup"
Q: Okay, let's change the subject. You've worked side-by-side with many of the great minds of our time, OpenAI's Sam, Greg, and other team members, as well as Elon Musk.
How many of you have heard the joke about the U.S. rowing team and the Japanese rowing team? It's an interesting story. Musk shared the joke, and I think it reflects a lot of his philosophy on building culture and teams. There are two teams in the story: the Japanese team has 4 rowers and 1 helmsman, and the American team has 4 helmsmen and 1 rower. Can anyone guess what the U.S. team does when they lose? Say it out loud. Exactly: they fire the rower.
By sharing this example, I think Musk is illustrating his views on hiring the right people and building the right team. What have you learned by working closely with these incredible leaders?
Karpathy: I would say that Musk's way of managing a company is very unique. I don't think people really realize how special it is. Even hearing about it secondhand, it's hard to fully understand. I find it hard to put into words. I don't even know where to start. But it really is a very unique, different way.
In my words, he's managing the world's largest startup. I think it's hard to describe it right now, and it may take longer to think about and summarize.
But first and foremost, he likes to build a company with a small team of strong and highly skilled people.
At other companies, teams tend to balloon as the company grows. Musk, by contrast, has always opposed letting teams overexpand. I had to put a lot of effort into recruiting. I had to beg him to let me hire people.
Plus, it's often difficult for large companies to get rid of underperforming employees. Musk, on the other hand, prefers to take the initiative to lay off people.
In fact, I had to argue with him to retain some employees, because he always defaulted to laying them off.
So the first point is to maintain a small team with strong strength and excellent technology. Never have that kind of non-technical middle management. This is the most important point.
The second point is how he creates the atmosphere and how he feels when he walks into the office.
He wants a dynamic work environment, with people moving around, thinking through problems, and focusing on exciting things, either sketching on a whiteboard or typing code in front of a computer. He doesn't like stagnation; he doesn't like a lifeless office.
He also dislikes long meetings and always encourages people to leave a meeting when it isn't useful to them. You can really see it: if you're not contributing to the meeting and not getting anything out of it, you can just walk out, and he fully supports that. I think that's rare at other companies.
So I think creating a positive working atmosphere is the second important idea he instills. Perhaps this also includes the fact that as companies grow, they tend to pamper their employees. Not at his companies: the culture is that you bring 100% of your professional ability, and the pace and intensity of the work are very high.
I think the last point, perhaps the most unique, the most interesting and the most unusual is that he is so closely connected to the team.
Usually the CEO of a company is a remote figure, five levels removed, who talks only to his VPs; the VPs talk to their directors, the directors talk to the managers, and you only ever talk to your immediate boss. Musk runs the company completely differently. He comes to the office in person and talks directly with the engineers.
When we have meetings, there are often 50 people in the room facing Musk, and he talks directly with the engineers. He doesn't want to talk only to the VPs and executives.
Typically a CEO spends 99% of his time talking to VPs, while Musk may spend 50% of his time talking to engineers. If the team is small and efficient, then the engineers and the code are the most trusted sources of information: they hold the first-hand truth. Musk talks directly to engineers to understand the actual situation and discuss how to improve it.
So I would say that he's very close to the team and not out of reach, which is very unique.
Also, the way he exercises power within the company is unusual. Say he talks to an engineer and learns about some issue that's holding back the project. For example, if an engineer says, "I don't have enough GPUs to run my program," he takes it to heart. If he hears the same complaint twice, he'll say, "Okay, that's a problem. So what's the timeline now, and when will it be resolved?"
If he doesn't get a satisfactory answer, he'll say, "I want to talk to the person in charge of the GPU cluster," and someone will dial the number, and he'll say bluntly, "Double the cluster capacity now. Start tomorrow, and report to me every day until the cluster has doubled."
The other side might push back and say the procurement process takes 6 months. That's when Musk frowns and says, "Okay, then I'll talk to Jensen," and he just removes the obstacle blocking the project.
So I don't think people really realize how deeply involved he is, how he removes obstacles, how he exerts influence.
Honestly, after leaving such an environment for a normal company, you really miss these unique qualities.
— END —