
Yang Fan, Co-founder of SenseTime: New Opportunities for AI Industry Development Brought by the Wave of Large Models | WISE2023 "Disruption · AIGC" Industry Development Summit

On May 23, 36Kr held the "Disruption · AIGC" Industry Development Summit. The summit brought together players from across the industry to discuss how enterprises and industries should respond to change, share their thinking, identify the most promising companies and most valuable technologies in the field, and seek a way forward in a turbulent environment.

At the conference, Yang Fan, co-founder of SenseTime and president of its Big Device Business Group, delivered a keynote speech entitled "New Opportunities for AI Industry Development Brought by the Wave of Large Models." Yang Fan argued that the new round of the AI wave has two characteristics: first, the cycle from technological breakthrough to business model innovation is shorter, so technical achievements move into commercial and industrial practice faster; second, compared with the past decade, today's industrialization of artificial intelligence makes it easier to convert technical advantages into data barriers and advantages of scale.

Yang Fan also shared his view on why artificial intelligence technology has been able to make breakthroughs. He believes that although the success of large models again confirms the "brute force" aesthetic of AI's three elements, data, computing power, and algorithms, what lies behind those three elements is really one comprehensive piece of systems engineering. Taking OpenAI as an example, Yang Fan pointed out that doing data engineering well, raising the effective utilization of chips, and designing lower-cost yet well-structured algorithms each require expert knowledge and systems engineering capability. In his view, this is the ultimate embodiment of the accumulated core technical capabilities of model-layer companies, and it is also the key capability for providing AI infrastructure services to the market.

The following is the transcript of Yang Fan's speech (edited by 36Kr):

Hello everyone! It is an honor to share with you at this 36Kr event some thoughts on industry trends around large models.

At a moment of such dramatic industrial change, I would like to share a few views. First, when we talk about "large models" today, the term is not precisely defined: does "large" mean hundreds of billions of parameters, or tens of billions? In my view, from 2012 to now, over the past decade, model structures have kept getting larger and parameter counts have kept growing, so why does everyone suddenly seem to have seized on this concept and set off such a wave of attention? Looking back at 2016, AlphaGo, as a representative new application, resonated strongly with individual consumers. Over the past two years, artificial intelligence has made new progress and breakthroughs. First, these advances are directly relevant to everyone; people can feel them for themselves. Second, these breakthroughs really do have a broader impact: I believe artificial intelligence can now take on research and innovation work in other disciplines, whether biology, physics, chemistry, or other fields. The ChatGPT-style models that everyone is focused on today are meaningful precisely because they have the potential to drive new advances in our underlying technology, and such advances may bring more incremental value to humanity in the future.

Since 2021 there have been more technological breakthroughs, and at the same time we have seen a very interesting phenomenon: the cycle from a technology producing tangible results to exploration and practice in industry and business has become shorter than before. A large number of innovative companies have been founded at home and abroad, and professors and scholars have started businesses. The market may already have some established paths, and investor recognition has grown; for example, shortly after some text-to-image APIs were announced, people on Xiaohongshu were already trying to use them to become internet celebrities.

We see many signs that the cycle from technological breakthrough to commercial innovation seems to be getting shorter. At the forums I have attended recently, most people are talking about what kind of large model they want to build, how big and capable it will be, what to do with it, and which specific scenarios to build a super new app in. Even though no large model in China has yet been officially licensed by the government, this much expansion and change has happened in just the last two months.

So I think this phenomenon deserves our attention: the commercialization of this round of large models is moving faster. Why does it have such an effect? A very important reason is that many of the new technologies can support more consumer-facing (C-end) applications and, at the same time, naturally form a closed loop of data accumulation, which makes it easier to build business barriers than technology-driven entrepreneurship in the past. That is the industry trend we have seen in recent months.


Yang Fan, co-founder of SenseTime and president of Big Device Business Group

The second point is what lies behind the technology when we make models bigger today. There is a consensus that, whether for large models or looking back over the past ten years, the development of the entire artificial intelligence industry has basically been the success of a brute-force aesthetic built on the traditional three elements of AI: data, computing power, and algorithms. "Algorithm" here can be understood as the model structure. For the models we now call large models, or the technically newest models in almost every field, the scale of the datasets, the scale of the computing power used, the structure of the algorithms themselves, and the number of model parameters have all maintained very high growth rates. The Transformer architecture is very stable and works very well; it can solve problems in many fields and deliver good results. When we find that a large enough amount of data yields good generalization, this in a sense confirms that the general direction of progress in AI technology is "brute force producing miracles": integrating more resources produces better results.
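As a rough, back-of-the-envelope illustration of how quickly parameter counts grow with model structure, here is a minimal sketch; the layer counts and widths below are hypothetical configurations chosen only to show the scaling, not any particular vendor's model.

```python
def transformer_params(n_layers: int, d_model: int, vocab_size: int) -> int:
    """Rough parameter count for a decoder-only Transformer.

    Per layer: ~4*d_model^2 for the attention projections (Q, K, V, output)
    plus ~8*d_model^2 for a feed-forward block with hidden size 4*d_model.
    The embedding table adds vocab_size * d_model.
    """
    per_layer = 4 * d_model ** 2 + 8 * d_model ** 2
    return n_layers * per_layer + vocab_size * d_model

# Hypothetical configurations, only to show how fast the totals grow.
for n_layers, d_model in [(12, 768), (40, 5120), (96, 12288)]:
    total = transformer_params(n_layers, d_model, vocab_size=50_000)
    print(f"{n_layers} layers, d_model={d_model}: ~{total / 1e9:.1f}B parameters")
```

Going from a 12-layer, 768-wide configuration to a 96-layer, 12,288-wide one takes the count from roughly a hundred million parameters to well over a hundred billion, which is the growth curve the industry has been riding.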

However, having these resources alone is not enough. Looking at the three elements, in each field a great deal of professional engineering work has to be done before any of them can produce good results.

The previous speaker in fact explained why large-scale computing power is needed, but how do we connect all that computing power together? If we have 1,000 cards today, can we make them deliver good cost effectiveness? Can the effective utilization rate reach 60%, 80%, or even 90%? And if we connect 1,000, 2,000, or 4,000 cards today, what effect will we get? OpenAI previously connected 10,000 V100s. At present, no one in China has connected 10,000 cards to run a single training task and pushed effective resource utilization to 50% or 60% or more; some may be working on it now, but there is no such result yet. Why? Behind this is a very complex engineering problem. For example, training a model with hundreds of billions of parameters requires a huge amount of data exchange and exchange of intermediate gradient information. When you partition large volumes of transmitted data across thousands of GPU cards, you must strike an effective balance between transmitting data and computing results; much of the time the model is doing pairwise, point-to-point transfers across the network topology. When we connect thousands of cards together, reaching an acceptable level of efficiency is not conceptually complicated; it comes down to a great deal of engineering practice. If you have done this before and stepped into enough pitfalls, you will tune it better than others. It is very much a matter of accumulated experience.
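To make the "effective utilization" figures above concrete, here is a minimal sketch of the arithmetic behind them, in the style of a model-FLOPs-utilization estimate. The 6-FLOPs-per-parameter-per-token rule is a common approximation, and every number below is hypothetical rather than a measurement of any specific cluster.

```python
def effective_utilization(params_b: float, tokens_per_sec: float,
                          num_gpus: int, peak_tflops_per_gpu: float) -> float:
    """Estimate effective utilization of a training cluster.

    Uses the common approximation of ~6 FLOPs per parameter per trained
    token (forward + backward), divided by the cluster's theoretical peak.
    """
    achieved = 6 * params_b * 1e9 * tokens_per_sec   # FLOP/s actually doing useful work
    peak = num_gpus * peak_tflops_per_gpu * 1e12     # theoretical cluster peak FLOP/s
    return achieved / peak

# Hypothetical example: a 100B-parameter model on 1,000 accelerators with
# ~125 TFLOPS peak each, training at 100,000 tokens per second.
util = effective_utilization(params_b=100, tokens_per_sec=100_000,
                             num_gpus=1_000, peak_tflops_per_gpu=125)
print(f"effective utilization ~ {util:.0%}")   # ~48% under these assumptions
```

Pushing that ratio from tens of percent toward the figures mentioned above is exactly where the communication scheduling and partitioning experience described in the speech comes in.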

The same is true for algorithms: today's algorithm structures can be designed at lower cost than before. With good structural design, fewer parameters and less data can achieve a final result similar to that of a design without special optimization, and this too relies on a great deal of expert knowledge, to say nothing of data.

When OpenAI built GPT-4, in the end perhaps only a small portion, possibly less than 10%, of the collected data was used for training, and the gap between that and training on everything represents an enormous saving in resources. Internet data is vast: which data is more effective, and which data has higher value? When we train, which data do we drop first and which do we throw away later? There is a great deal of trial and error in between. Why is computing power so scarce, and why does everyone want more of it? Because many of the people building large models are running trial and error: they may split into three or four groups at the same time, try different directions, and then gradually iterate and optimize. This brute-force aesthetic, or large-scale aggregation of resources, is the reason AI technologies and AI algorithms can keep advancing today.
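To give a flavor of what "deciding which data to keep and which to drop first" can look like in practice, here is a toy curation pass. The specific heuristics (exact-hash deduplication, a length floor, a repetition ratio) are illustrative stand-ins chosen for this sketch; production pipelines use many more signals.

```python
import hashlib

def curate(corpus, min_chars=200, max_repeat_ratio=0.3):
    """Toy data-curation pass: deduplicate and drop low-value documents."""
    seen = set()
    for doc in corpus:
        digest = hashlib.md5(doc.encode("utf-8")).hexdigest()
        if digest in seen:
            continue                         # exact duplicate of an earlier document
        seen.add(digest)
        if len(doc) < min_chars:
            continue                         # too short to carry much signal
        words = doc.split()
        if words and 1 - len(set(words)) / len(words) > max_repeat_ratio:
            continue                         # highly repetitive boilerplate
        yield doc

docs = [
    "click here click here click here click here click here " * 10,
    "Large models are trained on curated web text. Which documents to keep, "
    "which to drop first, and what to throw away later is largely a matter of "
    "trial and error, so teams run many small experiments before committing "
    "expensive compute to a full training run.",
]
print(sum(1 for _ in curate(docs)))          # -> 1: the repetitive document is dropped
```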

The more important point is that every link requires expert knowledge and systems engineering capability; it is really one comprehensive piece of systems engineering. That is also why we see OpenAI putting its best scientists on data engineering rather than on algorithms, which goes far beyond our previous understanding of the field. In the future this may become a key threshold, and it will also become the core capability with which we provide services to the market.

Why did the industrial wave follow so quickly once the new artificial intelligence technologies came out? We see that model services naturally fit many fields; people in the internet world are very excited, and investors feel it will grow as rapidly as the internet did. Large models offer some opportunities for renewal in terms of commercialization thresholds and barriers, although seizing those opportunities depends on different gaps and on each player's particular strengths. In any case, compared with the past 10 years, today's industrialization of artificial intelligence has a very big advantage: it no longer rests on a single technical barrier, and today's technical advantage is likely to be converted into data barriers and advantages of scale. We believe there will be many more industrial applications in the future.


Yang Fan, co-founder of SenseTime and president of Big Device Business Group

SenseTime began building early large models in 2019. In our view, AI models as a whole have continuously been getting larger, so we have accumulated a great deal of capability internally, including self-developed CV and NLP models. In April this year, SenseTime opened up some model APIs for industry partners to try, including large language models. In our view, this is above all the ultimate embodiment of accumulated core basic technical capabilities.
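For readers unfamiliar with what "calling a model API" involves, the sketch below shows the general shape of such a call. The endpoint URL, field names, and model identifier are hypothetical placeholders, not SenseTime's actual interface; consult the provider's documentation for the real one.

```python
import requests  # assumes the `requests` package is installed

# Hypothetical endpoint and schema, purely for illustration.
API_URL = "https://api.example.com/v1/chat/completions"
API_KEY = "YOUR_API_KEY"

def ask(prompt: str) -> str:
    """Send a single-turn prompt to a hosted model and return its reply."""
    resp = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            "model": "example-large-model",
            "messages": [{"role": "user", "content": prompt}],
        },
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

print(ask("Summarize the three elements of AI in one sentence."))
```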

We have released a series of models this year, and behind the services we provide to the market is our large device. We believe that for the entire AI industry to move forward, someone needs to provide this kind of large-scale, efficient infrastructure; that is basically an inevitable path. If the whole wave of AI technology increasingly becomes a game of resource consumption and accumulated expert experience, the threshold becomes extremely high, which is not conducive to a large number of industries adopting AI quickly. So we judge that differentiation is bound to emerge: someone will provide infrastructure services, whether in the form of model API calls, building smaller models on that basis, or other approaches, so that AI's basic resources and capabilities can be used quickly, at a low threshold and low cost, and businesses can quickly close the loop on their own business models.

SenseTime's positioning is to be an AI infrastructure provider. Today we operate the largest artificial intelligence computing node in Asia, with more than 5,000 petaflops of computing power, and we support a great deal of industry cooperation, so partners can train their own large models on our large device. This reflects SenseTime's deep accumulation at both the resource level and the level of expert engineering knowledge. The part of our capabilities that can be standardized, we turn into software and services; the part that cannot be standardized, we turn into specialized, categorized professional services. We hope to deliver these capabilities to the entire industry as a package, helping customers build their own domain models or model applications.

To train large AI models, use SenseTime's large device.

Edited by | Shen Xiao
