
Zhou Hongyi: The advent of Sora is a wake-up call to the industry, and there is a huge gap between China and the United States in the field of AI

Titanium Media App learned that Zhou Hongyi, a member of the National Committee of the Chinese People's Political Consultative Conference and founder of 360 Group, responded to outside questions about AI (artificial intelligence) at an event on February 23.

Zhou Hongyi said it is an objective fact that there is a gap between China and the United States in AI technology.

"I said there is a gap between China and the United States, and I have always insisted on saying so, because only by seeing the gap do we know how to catch up. If you refuse to admit the gap, it's all 'we are already far ahead,' 'we are leading by too much.' On the benchmark leaderboards, domestic large models basically monopolized the Top 1 through Top 10 and pushed GPT-4 out of the top ten. But the advent of Sora was still a bucket of sobering cold water," Zhou Hongyi said.

However, Zhou Hongyi also noted that the gap between China and the United States in AI is mainly one of direction, and once the direction is clear, domestic companies will quickly catch up. Whether it is the Transformer model adopted by Sora or Sora itself, both are in essence software. "I think the current lag can be closed in about one to two years."

Talking about Li Yizhou, the "AI course influencer" who has recently been mired in controversy, Zhou Hongyi said that people genuinely need AI science education, "but he made a big mistake: he shouldn't have charged for it." Zhou Hongyi revealed that he will launch a free AI course in the near future, hoping to explain the most profound technology in the plainest language and popularize AI for everyone. He said he would announce the details on February 29.

"AI science education is very important. Although everyone is talking about AI, they are actually full of fear of it, thinking AI will bring mass unemployment. In fact, AI is man's best friend," Zhou Hongyi said. He added that the large-model track as a whole has not yet begun to make money: the only company making money now is NVIDIA, while Microsoft and OpenAI are both losing money.

"By the logic of disruptive innovation, a disruptive product is never perfect; it has great shortcomings, and its greatest value is to lower the threshold for use." Zhou Hongyi predicts that the dividend period of large models will last at least ten years.

Zhou Hongyi emphasized that large AI models are definitely an industrial-level revolution. "This is just the beginning. If everybody were making a lot of money, the way Internet companies make money today, the industry would already be 'mature.'"

The following is a summary of Zhou Hongyi's responses on AI:

Q: How do you see Sora? How big an impact will it have on the industry, and will there be a competitive landscape similar to the earlier "war of a hundred models"?

Zhou Hongyi (hereinafter "Mr. Zhou"): Sora's technical principles have been discussed extensively abroad. Just today, Stable Diffusion released something open source with a similar architecture. I often say that as soon as others open-source something, our technology progresses. Domestic players may be a little weaker at zero-to-one original innovation, but once OpenAI announces a technical direction and a product, domestic imitation follows quickly. So there will certainly be many people building similar text-to-video tools, and I think the "war of a hundred models" you mentioned will happen again.

Q: Recently there has been a lot of public opinion and controversy around paid AI training courses. What do you make of it, and how should this industry be regulated?

Mr. Zhou: There are two things I am sure of.

First, AI science popularization has become very important in China. I have met a lot of people who, although they talk about AI, are actually afraid of it, a fear whipped up by certain accounts online claiming that AI will cause mass unemployment and the collapse of industries. Once you actually use AI, you know it is the best friend humanity has ever invented, and the best tool for unlocking skills many of us lack. For example, I couldn't draw before, and I couldn't direct videos before; AI unlocks those abilities. So especially for young people, AI lets a very junior person stand on the same starting line as those with experience.

Second, I have always believed that AI will not bring about the collapse of any industry or subvert it; it will actually give industries a positive push, such as the short-video, film and television, and advertising industries. Only those who do not use AI will be eliminated by those who do.

But many people say it's useless to state these truths: nowadays anyone can set up a camera online and hold forth in front of it, myself included, and plenty of people will believe them, with no way to confirm who is right. So I think the most important thing is to use AI yourself. I have also told the bosses of many enterprises why they should embrace AI: enterprises need to use AI from top to bottom and from the inside out. Only after using it can you know where its strengths and weaknesses are, where its boundaries lie, and avoid both AI phobia and faith in AI omnipotence, which are equally wrong.

Once people have used AI, they have a basic understanding of it and can better embrace it. So I think engaging in AI science education is the right thing to do.

Everyone needs popular science education, so I think AI popular science education is very important. But I think he made two mistakes: first, the course should have been free; second, he has no AI products of his own behind it, and it seems he simply put a wrapper around many foreign products. I won't comment further, but that is definitely a problem.

So I've been thinking lately that I'm going to do a free AI class. Whether or not you think I teach well, I'm definitely not going to charge for it.

Q: We'd like you to explain in a bit more detail how we should understand this gap, and what are the core reasons behind its widening?

Mr. Zhou: First, I said there is a gap between China and the United States, and I have always insisted on saying so, because only by seeing the gap do we know how to catch up. If you refuse to admit the gap, it's all "we are far ahead," "we are leading the curve," "we are too far ahead," as if some companies had accurately predicted when they would surpass GPT-4. Everyone builds a large model and goes to game the benchmarks, and you all know that game: train on the exam questions in advance. Of course, on the leaderboards the domestic large models can basically monopolize the Top 1 through Top 10, with GPT-4 pushed out of the top ten. But the advent of Sora was still a bucket of sobering cold water that let people see there is indeed a gap.

Second, I think OpenAI has some secret weapons in hand that have not been revealed. They argued for a long time during OpenAI's boardroom drama last year, and now GPT-5 is already gathering momentum. Whether GPT-5 is released depends entirely on Sam Altman's mood and his sense of timing. When does Altman release something? When Google or Meta is about to release something, he releases something. So, considering their confidence in AGI, I think the gap between us and them in originality lies mainly in originality of direction.

As we all know, the hardest thing in technology is finding an original direction. When it comes to artificial intelligence, deep learning, and neural networks, I get a bit emotional, if you'll bear with me. Even Meta's Yann LeCun is fiercely attacking Sora, GPT, and the Transformer model.

In fact, the Transformer model was not invented by OpenAI, but OpenAI was the first to choose a new way of using it: endlessly scaling up the parameter count, the number of attention connections, and the number of neural-network layers. In other words, they believe in a phenomenon called the "aesthetics of brute force": with enough force, miracles happen. Many models that competed with the Transformer at the time, such as T5 and BERT, were no worse on small data with small parameter counts; the Transformer was not the best performer there. But as parameters grow, only the Transformer can support unlimited scale-up. So far, the Transformer model has at least been validated as delivering the best results to date.
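A rough way to picture the "unlimited scale-up" he describes is the standard back-of-envelope parameter count for a decoder-only Transformer. This is a sketch using the common 12·L·d² approximation, not anything stated in the talk, and the two example configurations are only illustrative of small versus large scale:

```python
def approx_transformer_params(n_layers: int, d_model: int) -> int:
    """Approximate parameter count of a decoder-only Transformer.

    Each layer has ~4*d^2 parameters in attention (Q, K, V, and output
    projections) and ~8*d^2 in the MLP (two d <-> 4d projections),
    giving roughly 12 * n_layers * d_model^2 overall (embeddings ignored).
    """
    return 12 * n_layers * d_model ** 2

# Growing width and depth together multiplies parameters quickly:
small = approx_transformer_params(n_layers=12, d_model=768)    # GPT-2-small scale
large = approx_transformer_params(n_layers=96, d_model=12288)  # GPT-3 scale
print(f"{small:,} parameters -> {large:,} parameters")
```

The point of the exercise is that the architecture stays identical while the parameter count grows by three orders of magnitude, which is exactly the "with enough force, miracles happen" bet described above.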

So, do you see? This is definitely the right direction, and OpenAI is on the right track.

Secondly, the Transformer handles text, and text is one-dimensional data: one word before or after another, with only sequential relationships. They now use a similar method to deal with images; a picture is two-dimensional, with each pixel related along the X and Y axes. Video is three-dimensional data: besides its position in the picture, a patch of color moves or deforms over time. So Sora's appearance this time is a huge technical achievement. OpenAI used the Transformer architecture to successfully normalize the processing of text, images, sound, and video, and, coupled with the Transformer's own understanding of semantics and knowledge, Sora integrates GPT's capabilities. OpenAI also made a text-to-image model called DALL·E, and Sora incorporates DALL·E's capabilities too, so it is much better than Pika or Runway, which just used diffusion models to replicate pixels. At the beginning nobody knew what architecture to use; some used diffusion alone, which is the Pika and Runway approach of treating an animation as many frames and drawing them one by one, without the Transformer model. So in this kind of directional innovation, OpenAI has done very well, and I think the gap lies mainly here.
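The "normalized processing" of modalities can be pictured as cutting everything into tokens: text into a 1-D word sequence, images into 2-D patches, video into 3-D spacetime patches. Here is a minimal numpy sketch of the video case; the patch sizes and tensor shape are illustrative assumptions, not Sora's actual configuration:

```python
import numpy as np

def video_to_spacetime_patches(video, t_patch=4, h_patch=16, w_patch=16):
    """Cut a video tensor (T, H, W, C) into flat "spacetime patch" tokens,
    so a 3-D video becomes a 1-D token sequence a Transformer can consume.
    Returns an array of shape (num_patches, t_patch*h_patch*w_patch*C)."""
    T, H, W, C = video.shape
    assert T % t_patch == 0 and H % h_patch == 0 and W % w_patch == 0
    v = video.reshape(T // t_patch, t_patch,
                      H // h_patch, h_patch,
                      W // w_patch, w_patch, C)
    # bring the three patch-grid axes together, then flatten each patch
    v = v.transpose(0, 2, 4, 1, 3, 5, 6)
    return v.reshape(-1, t_patch * h_patch * w_patch * C)

# e.g. 16 frames of 64x64 RGB -> a sequence of 64 patch tokens of length 3072
video = np.zeros((16, 64, 64, 3))
tokens = video_to_spacetime_patches(video)
print(tokens.shape)  # (64, 3072)
```

Once video is in this token-sequence form, the same attention machinery that relates words before and after each other can relate patches across space and time, which is the unification described above.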

I've just gone into a bit of technical detail. First, the gap between us and them is mainly a sense of direction. Once the direction is determined, the learning and imitation ability of Chinese companies is very fast: you can imagine people immediately poaching members of the Sora team, some of whom will leave to start companies; peers will release open-source work and public papers, so many of these methods will soon leak out or be shared, and following up is not an insurmountable problem for Chinese teams. But what shocked me most was that OpenAI said the video output is almost a by-product: in the process, they suddenly found that by studying a huge amount of video material, the model not only learned how to draw images but, more importantly, learned to draw realistic videos that conform to our common sense, which requires understanding how the many elements of the world interact. I don't know if you grasp what that means.

So let me give another example. If Sora opens test accounts and you can get one, do some experiments for me: ask Sora to draw a basketball game, then a football game. A person who has never watched basketball or football, and doesn't understand that the balls follow different trajectories and different scoring rules, would not be able to draw them. For example, when a basketball hits the backboard it bounces off rather than passing through; when it drops through the hoop it falls vertically; and it bounces when it hits the ground. If Sora hasn't mastered this knowledge, you can imagine that, just like a person who has never seen the game and never absorbed this common sense, it would find it almost impossible to reproduce.

I use this example to explain why Sora brings us one step closer to AGI. GPT solves the problem of mutual understanding and interaction between machines and humans because it understands language. When it understood language, Yann LeCun attacked it, saying it didn't really understand and was just filling in blanks. But in fact, once human language is understood, that is enormous progress, because language is a unique human invention: humans use language to describe logic, to describe models of the world, and to describe accumulated human knowledge. So once language is understood, the first hurdle of AGI has been overcome.

But intelligence that can only talk is of limited use, because it doesn't know many of the laws of the world. For example, if you get a robot and want it to go to the refrigerator and make scrambled eggs with tomatoes, you'll find it hard to train. It has to know that tomatoes are firm and won't shatter, that eggs break when dropped, and how to crack an egg; text knowledge alone is not enough for that. Like us humans, it must see things before it can know them. So this time Sora, intentionally or not (and I personally think that for OpenAI it may have been unintentional: they worked a miracle and then discovered it), has given the machine, through Sora's training method, a way to interact with the world.

In the end it uses the diffusion model just to render the video it plans to make, but the Transformer model inside must have understood some of the rules of the world. I don't know if you see what this means. I made a comparison: there is a clip of a cat pawing at its owner in the morning to be fed while the owner rolls over in bed. Have you seen it? You probably only paid attention to the cat and the owner, not the pillow. When the owner rolls over on the pillow, the pillow crumples, and you know how soft that looks. Doing this with computer special effects would be a nightmare: what function would you use to describe the pillow's collapse and its wrinkles? Nobody can do it realistically. But Sora, with limited computing power, must have learned how beds, quilts, and pillows behave, so it can redraw that feeling. That, I think, is the most remarkable thing about Sora.

Why do I say its real contribution in the end is to general-purpose robots and autonomous driving? Because it gives general-purpose robots and autonomous driving the ability to perceive and interact with the real world, and understanding the world goes one step beyond understanding language.

So, from that perspective, Sora's breakthrough toward AGI this time is a great one.

Q: Are there any other unique advantages for China worth expanding on in 2024? How do you see China's prospects in AI this year?

Mr. Zhou: I think China still has advantages. Everyone is now rather one-sided about this: we are lagging behind in original technology, and that is an objective fact. But the one optimistic point about this lag is that it is not as large as the gap in lithography machines and chips. After all, whether it is the Transformer model or Sora, it is still software in nature, so I think this lag, which I estimate at about one to two years, can be closed.

But on the other hand, you don't need to wait until we have fully caught up with GPT-4 and Sora before doing anything. One main line now is to build super general-purpose models like Sora and GPT-4; that is one thread. In 2022 and 2023 China caught up well, taking less than a year to reach GPT-3.5, which I think is still OK.

In 2024, I think it should be the year of application. Otherwise, how do people feel about GPT? It can write poems, entertain, and solve Olympiad problems, but it is still too far from real work, though it can help with some office tasks. Sora shocked everyone because it goes one step further than GPT: people can clearly feel that short-video generation can do something concrete in the film and television, game, and advertising industries, even though it is still a general-purpose tool. So in 2024, besides these two things, I think vertical large models on the enterprise side are promising.

For large models to truly produce an industrial revolution, they must enter hundreds of industries and be combined with the business processes or product functions of many enterprises. We generally talk about building a general model that surpasses GPT-4, and that is genuinely hard: GPT-4 is an all-rounder that knows everything but specializes in nothing. But if I have unique business data in a certain domain, I can train a large model well in that vertical domain and combine it with the enterprise's many business tools. When the large model has not only a brain but also unique knowledge, plus hands and feet, then I think it is entirely possible for large models in some vertical fields to surpass GPT-4 in that respect. And for a vertical model, I agree with the view that it does not need to be a 100-billion- or trillion-parameter model; a 10-billion-parameter model is enough, and that cost is no problem for many enterprises to bear.

For enterprises, my prediction is that large models will be everywhere. In the future there will not be just one super-large model inside an enterprise, but multiple small-scale, 10-billion-parameter models, each strengthening the work of one scenario. With large models combined with the enterprise's business platform in this way, large models are now completely affordable for many enterprises and can be used very well.

Accordingly, we (360 Group) built a security vertical model for network security, trained at the 10-billion-parameter scale. 360 has two advantages. One is that I have a lot of security tools, which is equivalent to using those tools' capabilities to strengthen the model: a large model can't just move its mouth and think, it also needs hands and feet. The other is the knowledge accumulated by many experts and by 360's security big data, all of which we poured into the model. These large models have now completely replaced the 360 security brain; among trial users, APT attacks can already be automatically discovered, automatically handled, and automatically reported. In this respect I can proudly say we surpassed GPT-4, though of course you don't compare overall capability with theirs.

Therefore, I think that in 2024 there should be concrete scenarios on both the to-C and to-B sides.

Q: I would like to ask, after text-to-video, how far away is video-to-text?

Mr. Zhou: That's an interesting question. Text-to-video is the hardest, and along the way it must be supported by video-to-text technology. In Sora's technical report this is called caption technology; many people translate it as "subtitle" technology, but it is really image- and video-to-text technology. OpenAI pulled a lot of video clips from TikTok, as well as many American movies, for training. Just watching videos is useless to the model: the videos must be labeled and annotated, which requires image-to-text and video-to-text technology, and that part is relatively easy.

Q: Now that more and more young people want to start AI companies, where do you see the technology unlocking its greatest potential?

Mr. Zhou: Simply put, there are three directions for AI at present. One direction is building the large model itself. I don't think young entrepreneurs can do that; if you want to, you should join a big company, because you don't have enough graphics cards, enough computing power, or long-term investment. To put it bluntly, Microsoft and OpenAI have to lose tens of billions of dollars a year; the investment is huge, and small startups obviously can't sustain it. As for the small startups now, which I won't name, even if they raise a few rounds of funding and build a large model, what then? Free models have turned this from an atomic bomb into a tea egg: large models are nearly free. So I don't approve of them going down that road.

Second, use the APIs of large models to find applications in to-C scenarios. This is equivalent to using someone else's model, say Baidu's, 360's, or Ali's, as the backend: the large model acts as an agent providing capabilities, and I find the user scenarios. This road is more feasible. But it can't be a simple wrapper. A wrapper is something like what people do around GPT, Sora, or Stable Diffusion, and that kind of wrapper is worthless: the day the large-model vendor upgrades, a batch of wrappers die. For example, after Sora came out, is making text-to-video worthless? No. Sora can only do one minute. If you want ten minutes, you need a kind of project management: have it send ten jobs to Sora to make ten videos, then stitch them together, dub them, and add subtitles, with some parts being Sora's output and some your own footage. As long as you find the user's scenario, you can call on the support of various powerful AI APIs and still build something at the application level; don't always insist on competing with Sora on core technology. Recently Pika said it is ready to pivot; if it succeeds, its human-machine interface will be friendlier than Sora's, and it can provide more complex video-editing capabilities on top of Sora.

Third, as I have said several times, from the current point of view the real role of GPT is to improve productivity, and in China the state is also supporting more enterprises in digitalization. So it is entirely feasible to enter enterprises with a 10-billion-parameter model at very low cost. As long as you are not overambitious inside the enterprise, promising some very grand model, but instead choose a specific, granular scenario to solve a problem, then you are very likely to surpass GPT-4 there. But this requires swallowing your pride, because you must find companies with vertical businesses in particular industries, work out which scenarios and businesses to serve, and cooperate with them. At that point you are acting as Party B while they are Party A, which is a huge challenge for many entrepreneurs, because entrepreneurs are sometimes rather headstrong and self-centered, and may not be willing to listen to others.

Q: What are the main difficulties for large models now?

Mr. Zhou: First, after this direction emerged, two lines of struggle also formed abroad: OpenAI's closed source, and open source represented by Meta. The open-source route takes closed source as its target: people keep guessing how the closed-source side does things, and much of it eventually gets open-sourced. The advantage of open source is that there are many good companies, many large companies, and individual programmers worldwide actively contributing to open-source projects; everyone builds on others' contributions, so the chemistry is very obvious. I have recently been in touch with some universities in China. Yesterday I flew back with Professor Zhang Yaqin of Tsinghua University, who was originally head of Microsoft China, also served as president of Baidu, and now teaches at Tsinghua; I am studying at Tsinghua now, so I chatted with him. Many of the models and algorithms used today should be public. In fact, OpenAI's greatest abilities are finding direction and, second, very rigorous engineering, and what remains now is to explore that engineering. So in the earlier process of learning from GPT, the direction was already determined: go east and you will surely get through. But as for how to cross each threshold and climb over each mountain, there are many specific pits to step in and many specific methods to verify, so it takes time.

Second, my guess is that computing power may be a hurdle. Some claims online are wrong, such as the claim that Sora's parameters are small, only 3 billion. That is a mistake: video parameters and text parameters cannot be compared simply. It's like comparing 100,000 words of text with a 640×480 video; the storage required is not on the same order of magnitude at all. So, first, Sora does not have only 3 billion parameters, and second, even if it did, the computing power consumed by video analysis should far exceed that of building a 100-billion-parameter text model. Therefore, with domestic graphics cards choked off by export restrictions, computing power may become a problem. This also explains why GPT and Sora can only do one minute. I'm guessing here, but one minute is fundamentally different from 4- or 6-second clips: those are generated purely from pixels, without knowledge of the world, and cannot imagine what the picture will look like beyond 4 or 6 seconds. If Sora can solve one minute, it can in principle do 10 minutes or 60 minutes, so why doesn't it? I think it is the limits of computing power and of cost.
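His point that "video parameters and text parameters cannot be compared simply" is easy to check with raw data sizes: even a short low-resolution clip dwarfs a long text. A back-of-envelope sketch, where the bytes-per-word figure and the frame rate are assumptions for illustration:

```python
# Back-of-envelope: 100,000 words of text vs one minute of 640x480 video.
words = 100_000
bytes_per_word = 6                      # assumed rough average, incl. a space
text_bytes = words * bytes_per_word     # ~0.6 MB

w, h, channels = 640, 480, 3            # uncompressed RGB frames
fps, seconds = 24, 60                   # assumed frame rate, one minute
video_bytes = w * h * channels * fps * seconds  # ~1.3 GB

print(f"text:  {text_bytes / 1e6:.1f} MB")
print(f"video: {video_bytes / 1e9:.2f} GB")
print(f"ratio: {video_bytes / text_bytes:,.0f}x")
```

Even with aggressive compression or latent-space encoding, the three-orders-of-magnitude gap in raw input size illustrates why a "small" video model can consume far more compute than a much larger text model.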

Therefore, for China, the next question is how to concentrate computing power.

Why do I keep doing science popularization? If Sora were just a text-to-video tool, falling behind would only mean our advertising lags and our films are made a little more slowly; that kind of backwardness would not matter for great-power competition and would only affect the entertainment industry. But in fact this event marks a key node on the road to AGI, so it is very important to the country.

Q: When do you judge the dividend period of large models will arrive?

Mr. Zhou: I think the dividends of large models have not yet begun. Right now the only one making money is NVIDIA; abroad, apart from NVIDIA, nobody makes money, Microsoft loses money every year, and OpenAI is also losing money, so we have not yet entered the dividend period. I expect Amazon will make money, and cloud vendors and hardware vendors will certainly be the first to profit. But if things can be scenario-based, then in 2023 and 2024 we may see money-making opportunities in certain scenarios. As for the dividend of large models, this is an industrial-revolution-level revolution, and the future dividend period will last at least ten years.

This dividend will be very long, at least ten years. Right now nobody has made money except NVIDIA, which means it has just begun. If everyone were making a lot of money, the way Internet companies are highly profitable today, the industry would already have matured.
