
Zhou Hongyi: Large models need AI capabilities to "emerge", and China still has some "pitfalls" it has not yet stepped into

Author: Leifeng.com

"The real gap between China and GPT-4 right now, I think, lies mainly in the so-called super 'emergence' capability. But this is not a gap in algorithms or models; it is a gap in pre-training data and training methods. There are still some 'pitfalls' that have not yet been stepped into, but this time lag is less than half a year." Zhou Hongyi said this to Leifeng.com on May 31, in an interview after the 360 Vision Large Model and AI Hardware launch event.

"Emergence" is a term often heard in the field of artificial intelligence. So what is emergent intelligence? In the past, a machine learned only the skills it was explicitly taught: what you taught it, it could do; what you did not teach it, it could not. Large models allow AI to "learn without a teacher", and that is "emergence".

The industry generally regards 50-60 billion parameters as a threshold for whether a large model can exhibit emergent AI capabilities. As a result, hundreds of billions of parameters have become the "standard" for large models, and many products now bill themselves as "hundred-billion-parameter models", yet very few of them can genuinely empower industries and improve productivity.

So what does Zhou Hongyi think of this "emergence" capability? What is the relationship between the 360 Brain language model and the visual large model? How does 360 use large models to empower industries? In the post-conference interview, Zhou Hongyi discussed these questions in depth with Leifeng.com and other media.


On "emergence": it has nothing to do with model size

Zhou Hongyi noted that there is no unified view in the industry at present: some say 100 billion parameters are needed for emergence, others say 30 billion is enough. In his view, emergence actually has nothing to do with the size of the model, but with the pre-training data and the training method. It is like a child: if his brain capacity is simply insufficient, he certainly cannot learn; but beyond that, how well he learns depends heavily on the learning method.

In Zhou Hongyi's view, Chinese vendors have been at this for less than half a year: five months for the earliest, perhaps three or four for the latest. To produce something basically comparable to GPT-3.5 in such a short time is already a big achievement, and narrowing the remaining gap will take some more time.

He believes catching up may take about half a year. Within that window, the training methods and models in use are largely the same, and everyone steps into roughly the same "pitfalls". Emergence has a great deal to do with the knowledge contained in the pre-training data: high-quality Chinese knowledge data is generally scarce, so large amounts of high-quality English material must be used to supplement it. By analogy, a child raised only on story-magazine articles, with no exposure to logical reasoning, is very unlikely to develop complex reasoning ability.

On the visual model and the Brain: from perception to cognition

Show a statue of the Mona Lisa with a muscular body and ask what is strange about it. Traditional perception-level computer vision might at most recognize a portrait, not necessarily the Mona Lisa; and even if it did recognize the Mona Lisa, it could not sense how a female Mona Lisa came to have a man's muscular arms. The 360 visual large model can interpret the meaning, and that is the shift from perception to cognition.

Zhou Hongyi said the visual large model and the large language model rest on different foundations. First there must be a large language model, one that fully grasps human knowledge and understands natural language. On that basis, feed it large quantities of paired images and text and train further; the visual large model can in turn strengthen the language model, for example through question answering about images, laying the groundwork for the next step of visual understanding.

He regards the visual large model as a vertical large model. In the past, training a model to tell whether a photo showed a cat or a dog required large amounts of manual labeling, and even then the model was merely matching images against labels; it did not understand what was going on, what "dog" means or what "cat" means. Now, built on a large language model that understands natural language, the model can do more than object recognition when it looks at a picture: it can also perform rich semantic interpretation. For example, it can recognize that a child standing on a tall cabinet, or an elderly person lying on the floor, is abnormal and raise an early warning. This is the capability of multimodality.

On why he chose to land AI in combination with hardware, Zhou Hongyi said: "The original AIoT was only vertical AI, not general AI. AIoT empowered by large models is 'real AI'."

Artificial intelligence used to be weak AI, and the smart hardware built on it had no real intelligence. With the advent of large models, computers truly understand the world for the first time and can give AIoT real intelligence. He said the emergence of large models marks the arrival of general artificial intelligence: AI has completed the evolution from the perception layer to the cognition layer, which is not only a disruptive revolution for traditional AI but will also drive progress in autonomous driving, protein computing, robot control, and other fields.

"Large models will bring a new industrial revolution." Zhou Hongyi believes every piece of software, every app, every website, and every industry is worth reshaping with large models, and that smart hardware is an app in hardware form. Looking at the development trend, multimodality is the inevitable path for large models; the most important change in GPT-4 is its multimodal processing capability. Zhou Hongyi therefore predicts that the combination of multimodal large models and the Internet of Things will be the next big opportunity.

On AI security: not developing it is the biggest insecurity

As AI technologies such as GPT spread, fraud and defamation using fake audio and video, such as "AI face swapping" and "AI voice cloning", are no longer rare.

Zhou Hongyi believes AI safety must be taken seriously. He said 360 has set up a dedicated internal AI security team, and the Ministry of Science and Technology has entrusted 360 with an AI security technology platform; 360 has taken on the mission of solving AI security problems, but these problems are more complex than ordinary ones. On one hand, AI lowers the barrier for ordinary people, so it can easily be used to do harm; to fight back, the cost of crime must be raised, for example by adding digital fingerprints to AI-generated works. On the other hand, Zhou Hongyi said AI security goes beyond this: in the future, besides traditional network security, we must also guard against data security and artificial intelligence security risks, because AI may one day form super-AI capabilities and develop consciousness and self-awareness.

So why must 360 build large models? Zhou Hongyi made two points. First, not developing them is the biggest insecurity: AI is an industrial revolution, and we cannot give up eating for fear of choking just because it has some safety problems. Second, building a large model yourself means understanding its principles and the complete process rather than treating it as a black box, which makes it possible to propose better safety schemes along the way.

On 360's overall approach to large models and the role they play, Zhou Hongyi said 360 will do two things well:

First, the digital security foundation. 360 already has relatively mature security solutions, and in the future it must address not only network security but also data security and artificial intelligence security.

Second, the large model is the pinnacle of the digital era, the step from digital to intelligent. Whoever fails to master the core technology of large models in this era, or fails to apply them in real scenarios, will be eliminated by the industry. This is an industrial revolution: just as with electricity, the steam engine, and the computer, essentially every business has to be reshaped. As an Internet company, 360 has accumulated a great deal of digital technology, and all of its big data must ultimately be put to use in large models.

On large models and scenarios: a large model without scenarios has no vitality

Zhou Hongyi believes that, first, the core technology of large models must be firmly held in one's own hands, built in-house as well as in cooperation with partners. Second, the scenarios must be grasped: AI cannot be built behind closed doors, it only works when combined with users and scenarios, and a large model without scenarios is lifeless.

He said the application scenarios of 360's large models are already very clear, falling into four directions:

First, ToC consumer scenarios, mainly existing products such as the browser, the desktop, search, and the mobile browser, building a personal assistant for everyone around the core capabilities of 360 Brain; here 360's goal is to stay roughly in the top three.

Second, the SaaS store 360 has built, which will be upgraded to an AI store, opening large-model APIs to ecosystem partners so they can provide SaaS services to small and medium-sized enterprises.

Third, exclusive GPTs for enterprises, governments, and cities. There will not be only one large model in the future: public large models have data security problems, and proprietary or private large models fit user scenarios better.

Fourth, vertical GPTs built with industry partners, such as a GPT for the enterprise consulting industry or one that can empower the IoT industry.

On the future of large models, Zhou Hongyi said they must become smaller, lighter, and faster, and that even training is moving toward automation.

"The market for large models is very large. If everyone fights a life-and-death battle over who gets to be China's ChatGPT, the market becomes very narrow; but if large models are applied in vertical fields, industry fields, and enterprise fields, the capability requirements actually come down." As an example, Zhou Hongyi noted that training a specialized GPT for law, medicine, or education is far less demanding than training a general-purpose large model.

Zhou Hongyi concluded: "Everyone was shocked when GPT first came out. But if you calm down and think it over carefully, for it to truly become a productivity tool that we can all use, it still has to take the path of verticalization."