
Dialogue with Zhou Hongyi, founder of 360 Group: Making a large language model is much simpler than making a lithography machine

Author: Embroidery Corporation

The most pointed questions about large models, and Zhou Hongyi's answers.

This article is an edited transcript of the conversation between Luo Yihang, founder and CEO of PingWest, and Zhou Hongyi, founder and CEO of 360 Group, at the "Model Speculation: Domestic Large Model Ecosystem Seminar" hosted by PingWest on May 31.

Interviewer: Luo Yihang

Interviewee: Zhou Hongyi


Large models are indeed much simpler than lithography machines

Luo Yihang: I am now the chief evangelist of PingWest, and today I am talking to a major builder in the field of Chinese large language models, who is himself an evangelist for large language models. Over the past few months we have seen him share his views on large language model trends on many occasions, and of course his own team is building a large language base model. He is Mr. Zhou Hongyi, Chairman and CEO of 360 Group.

Zhou Hongyi: I was sick not long ago and have just recovered. It was my first bout of COVID, and the after-effects are still fairly serious, so if I hallucinate like a large model for a moment and say something wrong, please don't mind.

Luo Yihang: That depends on how I give the prompts; model hallucinations are often caused by poorly written prompts.

Zhou Hongyi: I sat in the audience listening to your opening speech for so long that my context window is almost overflowing.

Luo Yihang: That just means your token budget isn't big enough. Well, I'll try to ask questions in short, precise prompts. Everyone has been talking about large models for the past three months. Do you think the gap between China and the United States in large language models has grown bigger or smaller since the Spring Festival?

Zhou Hongyi: Of course it's smaller. When ChatGPT first came out, I, as someone from the search business, looked at artificial intelligence through the lens of search, and it was unbelievable: it turned out that the path to a large model could look like this, and you didn't even know how it worked.

In recent months, however, domestic peers have successively released their own large models. Objectively speaking, there is still a gap with GPT-4, and even some gap compared with GPT-3.5, but it is not that big.

By the way, domestic evaluations of large models especially like to use brain teasers, but if you look at Microsoft's testing of GPT-4, in terms of reasoning ability and very long chains of thought, GPT-4 is far ahead.

Then again, a journey of a thousand miles begins with a single step. At first we didn't know what pre-training was, what fine-tuning was, or what reinforcement learning from human feedback was, and we stepped into plenty of pits. We scaled from tens of billions of parameters to hundreds of billions, and have finally made something usable. There is still a gap, but that is also very big progress.

Domestic large model development started late, counting generously only after the Spring Festival, so it has been going for just three to five months. This once again proves that a large model is indeed much simpler than a lithography machine.

However, just yesterday I saw NVIDIA's Jensen Huang release the GH200 chipset, and I think the gap has widened again.

Luo Yihang: That's why I asked whether the gap is getting bigger or smaller. It's dynamic: bigger one moment, smaller the next.

Zhou Hongyi: It depends on the angle. The gap in computing power has certainly widened. Their memory is about 144 TB, more than enough to hold an entire model, so there is no need to train across clusters of hundreds of machines. A model with the same parameter count that used to take a month to train can now be trained in three hours or a day.

That iteration speed is astonishing, because training does not always converge: after a month of training you may find the result is a mess and have to start over from scratch, but a month has already passed. So their training may be hundreds of times faster than yours, and from that point of view the large model gap between China and the United States has widened.

I feel the same as you: everyone is very anxious, three months feels like thirty years, and every day brings an endless stream of results around large language models, software and hardware alike, with all kinds of frameworks and open source tools coming out. But overall I lean optimistic.

Luo Yihang: But many peers think they can quickly catch up with ChatGPT's current level. Of course, ChatGPT itself is also iterating.

Zhou Hongyi: I think the gap is objective, but some people in the industry love to brag. If you want to predict exactly when we will close the gap, I personally think you should stay humble. After all, not many people in China have actually used GPT-4; you can't brag just because most people haven't seen it.

We should figure out where the gap is, then find the right scenarios to bring out the model's capabilities, while asking users to have tolerance and understanding for innovation and the difficulties it faces. Otherwise, if the bragging gets too big, expectations run high and the results disappoint.


Opportunities in vertical fields are far from fully revealed

Luo Yihang: Among your peers, who are you more optimistic about? Giants bigger than 360, or startups?

Zhou Hongyi: Each has its own advantages. The most important thing is that China will not be satisfied with a single GPT-4, nor will it end up with only one large model.

There is now a trend of making large models smaller, small enough that a machine fitted with NVIDIA 3090 or 4090 chips can fine-tune them, and in the future they may even be deployed on IoT (Internet of Things) devices. This means large model computing will be ubiquitous.
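
For a concrete sense of how light this kind of consumer-GPU fine-tuning can be, here is a minimal sketch using LoRA with the Hugging Face transformers and peft libraries; the choice of base model is an illustrative assumption, not anything 360 has disclosed:

```python
# Minimal LoRA fine-tuning sketch for a single 24 GB consumer GPU (e.g. an
# RTX 3090/4090). Illustrative assumptions: the peft library and a small
# open base model stand in for whatever a company would actually use.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "openlm-research/open_llama_3b"  # hypothetical small base model
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(
    base, torch_dtype=torch.float16, device_map="auto"
)

# LoRA freezes the base weights and trains only small low-rank adapters,
# which is why the job fits in consumer GPU memory.
lora = LoraConfig(
    r=8, lora_alpha=16, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections in LLaMA-style models
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # typically well under 1% of all parameters
# From here, train with any standard causal-LM loop on vertical-domain data.
```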

When computers first appeared, some people concluded that the whole world would need only five of them, but what really drove the industrial revolution was the personal computer, entering every home and every company, running different software to do all kinds of things.

Luo Yihang: So many vertical models will emerge?

Zhou Hongyi: I think it will happen even faster than people expect.

People need to change their thinking and stop using GPT-4 as the benchmark. GPT-4 is like a Harvard postdoc with more than a dozen doctorates: very high level, and hard for us to replicate in the short term. But that doesn't prevent us from training a 211 or 985 undergraduate (a graduate of one of China's top-tier universities) whose goal is very practical: to be trained directly for a vertical business.

When you use GPT you find many questions it seems able to answer, but the answers lack industry depth because the model is too generic. If you see the large language model as a productivity tool, I'm convinced there are many opportunities in vertical fields that are far from fully revealed. If you really used GPT-4 as your doctor, would you dare take the prescription? If you really wrote a legal filing entirely with ChatGPT, wouldn't that invite trouble? One example in the United States is a company that built a large model for the legal industry: it can't sing, can't write poetry, can't do brain teasers, but it can answer legal questions.

Therefore, many enterprises need their own customized ChatGPT.

Luo Yihang: And the development, deployment, and training costs have all come down, right?

Zhou Hongyi: This is no secret, because the difficulty has dropped a lot, and for that we should thank open source. All kinds of open large models are like fresh brains for you to choose from, at least in the short term. You turn know-how (knowledge and experience) into data for pre-training, or fine-tune for the capabilities the industry is likely to need.

Training data is now as important as training methods, and thanks to the open source ecosystem these methods are gradually being revealed. The cost of owning and deploying a large model has dropped drastically, but if the goal is to compete with GPT-4 or even GPT-5, the investment is still huge and the competition is still fierce.

Not long ago, Samsung employees fed confidential company data into ChatGPT, resulting in a leak. That is a typical example: at home or abroad, data leakage is a risk. Many companies now want to train their own GPT because they have accumulated a lot of sensitive internal data, their bread and butter. Can they hand it over to train someone's general-purpose large model? No way.

So the only option is to train the company's own proprietary GPT, so the large language model understands the industry and the enterprise better. This market should hold very large scenarios and opportunities in the future.

This afternoon I will attend one of our company's launch events combining a large vision model with smart devices. Everyone is talking about software apps now, but AIoT, intelligent hardware, has not been truly realized. Once large models truly become artificial intelligence, there is a huge opportunity there, and it will be another important application scenario.

Should today's intelligent connected cars, like Tesla's, use large models? Definitely. But a car moves fast and must respond in time; a large model in the cloud may not respond fast enough. And putting a large model in the car would not be particularly expensive, perhaps the cost of an NVIDIA 3090.

Luo Yihang: But a car demands precision, and large language models can't guarantee accuracy.

Zhou Hongyi: That's why only proprietary vertical large models can solve the problem of so-called hallucinations.

Luo Yihang: Only proprietary large models don't talk nonsense.

Zhou Hongyi: Nonsense isn't unique to any one model; they all have this problem. Ask any GPT-style model how many movies Tom Hanks has acted in, and it will end up reciting a bunch of movies Tom Hanks never acted in, or that don't even exist. I have thought about this fuzziness of knowledge for a long time. It is an inevitable problem of generative AI algorithms: in imitating how humans learn, they focus on learning the patterns of knowledge and compress the details heavily, even lossily.

Luo Yihang: That's the opposite of how humans learn.

Zhou Hongyi: It captures only part of how humans learn; dealing with knowledge ambiguity has to rely on search. Large models do not replace search. On the contrary, a powerful search capability, whether full-text search or a vector database, brings two corrections to a large model. One is the "what year is it tonight" problem caused by the training cutoff; the other is knowledge ambiguity. Many ridiculous mistakes are easily fixed with search assistance, for both enterprise-level and professional-level large models.
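
As a sketch of the correction Zhou describes, here is a toy retrieval-augmented answering loop. The in-memory cosine search stands in for a real full-text engine or vector database, and embed() and generate() are placeholders for any embedding model and any large model:

```python
# Toy retrieval-augmented generation: ground the model's answer in retrieved
# text to counter both the training cutoff and fuzzy recall of specifics.
import numpy as np

def retrieve(query_vec, doc_vecs, docs, k=3):
    # Cosine similarity between the query and every stored document.
    sims = doc_vecs @ query_vec / (
        np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec) + 1e-9
    )
    return [docs[i] for i in np.argsort(-sims)[:k]]

def answer(question, docs, doc_vecs, embed, generate):
    # embed() and generate() are stand-ins for an embedding model and an LLM.
    context = "\n".join(retrieve(embed(question), doc_vecs, docs))
    prompt = (
        "Answer using only the context below; say so if it is insufficient.\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
    return generate(prompt)
```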

Luo Yihang: Could you say that general large models have emerged so that, in the future, more small models can emerge to solve specific problems?

Zhou Hongyi: Yes. Some new computing frameworks have recently emerged, including from Hugging Face, that are independent of any particular model. One main idea is that because the large model is good at understanding language and communicating with people, once it understands a person's intention it can call many other application systems or smaller models. Why must the Harvard professor with a dozen PhDs do everything? It makes far more sense for a dozen employees of different professions, different small models with different training, to do different things, with a large model coordinating them.

There is also a cost problem. Maintaining a model with hundreds of billions of parameters is expensive; setting aside the high cost of pre-training, even organizing training runs every quarter and doing some fine-tuning costs a great deal. A vertical model proprietary to an enterprise might have 6-7 billion or 10 billion parameters; its operating and maintenance cost is very low and it can change very fast. So the giant model is not a panacea.

Hugging Face recently added an "Agent" mode, in which an external agent calls the large model to plan, decompose, and carry out work. Around large models there are many programs, and many small models and small applications that complement them, glued together. Large models are not a panacea, but combined in various modes, each part can do what it is best at.
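
A toy sketch of that coordination pattern (not Hugging Face's actual Agent API): a general model reads the user's intent and routes the task to a specialized tool or small model. Here llm() and the tools are placeholders:

```python
# Toy "large model as coordinator" loop: the general model picks a specialist,
# the specialist does the work, and the general model phrases the result.

TOOLS = {
    "legal_qa": lambda q: f"[specialized legal model answers: {q}]",
    "calculator": lambda q: str(eval(q, {"__builtins__": {}})),  # toy arithmetic only
    "search": lambda q: f"[search results for: {q}]",
}

def agent(user_request, llm):
    # Ask the coordinating model to choose a tool and the input to give it.
    choice = llm(
        f"Tools: {sorted(TOOLS)}. Reply with one tool name, then '|', "
        f"then the input to pass it.\nRequest: {user_request}"
    )
    parts = choice.split("|", 1)
    tool_name, tool_input = parts[0].strip(), parts[-1].strip()
    result = TOOLS.get(tool_name, TOOLS["search"])(tool_input)
    # The general model turns the specialist's raw output into a user-facing answer.
    return llm(f"Rewrite this result as a helpful answer: {result}")
```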


Don't be too anxious about the so-called "large model ecosystem"

Luo Yihang: So it is more reliable to pursue accuracy of data and data feedback within one industry or field, and implement that in an application or small model.

Zhou Hongyi: We found one characteristic while training large models: the data mix has to be very balanced, humanities and science learned together. If the model recently crams nothing but physics problems, forgetting appears and many other skills drop sharply. My guess is this relates to how the internal probabilistic and statistical parameters shift during training.

Making one large model meet the long-tail needs of a billion users, one moment solving calculus problems, the next giving life advice, then writing a poem, then writing BMW ad copy, is very difficult.

Why do I advocate that China take the large-model road? We may not catch up with GPT overnight; it could take a few years. But I can't say that because we can't catch up with GPT our products are garbage and we shouldn't build them; if we don't build, we will never catch up. And at 60, 70, or even 80 points, in many vertical scenarios that weakness matters much less.

Luo Yihang: To use an imperfect analogy, the general large language model is more like Socrates and Plato; the sages of that era, Aristotle included, knew everything: physics, mathematics, chemistry. The models we expect to be more useful to people may be more like Qian Xuesen and Yuan Longping, deeply specialized in specific fields.

Zhou Hongyi: I'd put it even more strongly than you: the large model is like the personal computer back then, a general architecture that, with different software and different environments, can do many things.

I don't think we should be too anxious about building a large model ecosystem today, because nobody has even figured out how to build the large model itself. Do you expect the ecosystem to appear straight away, and start planning for it now? I think that's rushing things.

Most players haven't even officially released their models yet; the ecosystem can only come after release. And large model applications are not only to-C scenarios; I personally think everyone should also pay attention to to-B applications. Studying prompts and studying large models, as you do, still has a fairly high threshold. An individual consumer could use a large model to analyze a listed company's annual report or to read a paper, but will ordinary users really do that? Most people have neither that desire to learn nor that need for analysis.

The biggest significance of the large model is as a tool for enterprises, countries, and industries to improve productivity.

Let me offer a suggestion: actually, there's no need to build one.

Luo Yihang: No need to build a large model at all, or no need for PingWest to build one?

Zhou Hongyi: No need to build a base model; just build a personal GPT of Luo Yihang. Hang a 360 dashcam on you, record everywhere you go every day, record this hour you spend on stage, digitize it all, link all that data together and train for two years, and you can train a proprietary Luo Yihang GPT.

"The Wandering Earth 2" depicts digital images, copies your life on a USB flash drive, and then plugs it into a supercomputer, which seems to be called W500 in the movie. I just started thinking that isn't this a fantasy? When I see GPT, I find that this is completely possible, and when I "kill" you, won't you be immortal? Because we can still communicate with your data avatar, and your data avatar can stand on stage and answer my questions.

Luo Yihang: So you'd rather see my digital avatar than see me in person? Back to the earlier question: you still haven't answered which Chinese large model players you're more optimistic about.

Zhou Hongyi: You ask which one I'm more optimistic about; it doesn't really matter what I think, since every company has its own advantages. As for why Chinese Internet companies must do it themselves: first, the difficulty is not that high, and second, it represents the future of artificial intelligence, so you cannot rely on other people's APIs; you must master it yourself.

GPT first tackles NLP (natural language processing), and NLP is the crown jewel of artificial intelligence: whoever understands language thoroughly truly understands the world, and becomes the foundation for other AI tasks. OpenAI's biggest innovation, and its biggest lesson, is to treat all text as a sequence and predict what comes next; processing vision and sound now follows basically the same idea. Using large models for multimodal tasks works far better than the old CNN (convolutional neural network) and DNN (deep neural network) approaches.
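
The "treat everything as a sequence" idea he credits to OpenAI is, concretely, next-element prediction; a minimal PyTorch sketch, where model is a stand-in for any network mapping token prefixes to vocabulary logits:

```python
# Next-token prediction: the one objective reused across text, audio tokens,
# and image patches once each modality is turned into a sequence of ids.
import torch
import torch.nn.functional as F

def next_token_loss(model, tokens):
    # tokens: (batch, seq_len) integer ids for any tokenized modality.
    logits = model(tokens[:, :-1])   # predictions from every prefix
    targets = tokens[:, 1:]          # the true "next element" at each position
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),  # (batch * seq, vocab)
        targets.reshape(-1),                  # (batch * seq,)
    )
```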

If we view everything as a sequence, then from robotics to autonomous driving, combining large models with other models could produce technical breakthroughs. In DeepMind's protein analysis, and even in analyzing genomes, large model algorithms have the potential to drive breakthroughs, because a gene sequence is also a sequence. Large models may even become tools in the hands of mathematicians and physicists, helping people study frontier science.

Luo Yihang: Everyone now talks about generative AI, or predictive AI; behind both of them sits a general large model base, right?

Zhou Hongyi: Right. Many open source models appearing online now are no longer "blank" things; they have already been pre-trained on large datasets. Their ability is not high, but the general knowledge is there; what remains is to fine-tune them yourself, like developing someone in a professional direction. Recently there has also been an open source trend in China, so I really do have to thank open source.

Luo Yihang: What do you make of the current open source trend? There are now many Chinese developers in the open source community, but where do they still fall short?

Zhou Hongyi: Open source works abroad because, in a market economy, it pools the strength of many people; no single company or team can carry it alone. The development of Meta's LLaMA series is a case of mutual stimulation: an open source project may score only 40 points, but it spurs you to reach 60, and someone else to reach 80. What China hasn't shaken off yet is the habit of taking open source and adapting it purely for one's own use.

Luo Yihang: Without contributing the work back to the open source community.

Zhou Hongyi: Changing that may take time.

Three months feels like thirty years

Luo Yihang: With the emergence of large language models and general AI, what do you think will happen in the next five years?

Zhou Hongyi: Why think about such long-term things? To me, three months feels like thirty years.

In my strategy the path is very clear: however capable you are, you must follow the broader trend, and China's broader trend is industrial digitalization, in which Internet companies actually play supporting roles. 360 has to do two things well. One is digital security: network security alone is not enough; there must also be data security. The other is artificial intelligence security, which is now the most complex and the most watched. Many government departments in China are concerned about these things too. Today we cannot answer whether artificial intelligence, after developing consciousness and becoming a new species, will cause large-scale social problems.

I think artificial intelligence is no less significant than the computer or the Internet; it is a new industrial revolution and the pinnacle of digitalization. Everyone finally moved big data to the cloud, but that is not the end of digitalization: we must pour big data into large models and turn it into general intelligent services that empower hundreds of industries, the way electricity does.

The various scenarios 360 already operates are all worth redoing with artificial intelligence, so we will work on both existing and new scenarios.

Luo Yihang: Not only existing business, but new growth too.

Zhou Hongyi: That means building enterprise-level and industry-level GPTs, including GPTs for small and medium-sized enterprises. Hand SMEs a raw large model and they won't know how to use it; it has to be packaged as SaaS. So we walk on two legs: security and digitalization.

Thank you for giving me this bit of advertising time.

Luo Yihang: The advertising time is well earned. But today I am more grateful to Mr. Zhou, who in just a few months has iterated so quickly and stepped out of the mystique around large language models. Americans like to talk about democratizing artificial intelligence, democratizing large language models. If we adopt that framing, the most important thing is to let everyone use what is genuinely useful to them, to make it genuinely easier for every developer, and to let everyone who wants a model of their own actually build one that fits the needs of their enterprise and industry.

Zhou Hongyi: This is technological equality. I have been thinking about how much change GPT, this large language model, can bring. Our generation, like NVIDIA's Jensen Huang, lived through the advent of the PC forty years ago. When the computer was first invented it did not bring an industrial revolution; however powerful, it was only a tool for the military to develop nuclear weapons, for the meteorological bureau to forecast weather, for the government to run a census. It had nothing to do with ordinary people.

When did the industrial revolution happen? When technology was equalized and the PC entered millions of households. The same is true for phones: thanks to smartphones, today even a homeless man can pull out a phone to watch short videos, or have you scan a QR code to pay him. How powerful a thing is depends on whether it can reach into millions of households and industries, and big data, it turns out, lacked that ability. A company may have big data, but it is hard to use directly; someone has to analyze it for you.

The large language model solves the problem of using and analyzing big data, and creates general question-answering, writing, and discussion abilities that are independent of any one industry. It can empower hundreds of industries and millions of households. I think this is definitely an invention at the level of an industrial revolution.

Luo Yihang: Finally, how would you comment on Jensen Huang's current business and NVIDIA's trillion-dollar market value?

Zhou Hongyi: That's hard for me to judge. The time I knew Lao Huang best was when he was most frustrated, when he was searching for a path for NVIDIA's chips. NVIDIA was trying to enter the mobile market, but its phone chip ran too hot, hot to the point of running a fever.

At the time I was with an entrepreneur who wanted to use those chips to build home game consoles and enter the console market, so I went to Silicon Valley to meet Lao Huang. He took it very seriously and treated us to a steak dinner. But it turned out consoles are a very peculiar market; historically only Sony, Microsoft, and Nintendo have succeeded.

For a while Lao Huang was genuinely lost about the business and hoping to find a way out, so he showed great respect to two entrepreneurs from China. I think his success today is not luck but perseverance. In the future I envision, large models are everywhere, and wherever they are you need NVIDIA machines, so of course demand is enormous.

In the past few years we have built many supercomputing centers, but many of them sit idle because they cannot handle general-purpose computing tasks and lack a general-purpose computing architecture. If they were refitted with NVIDIA's A100 or A800, business would certainly be much better, because worldwide demand for NVIDIA is still very strong.
