
Dialogue with Alibaba Cloud CTO Zhou Jingren: "Tongyi Qianwen" is neither the starting point nor the end

Focus:

  • 1 Tongyi Qianwen is a large language model; its name comes from "a thousand questions", the way humans acquire wisdom.
  • 2 Comprehensively evaluating Tongyi Qianwen is challenging, and the industry has no standard yet; the internal test set covers creation, copywriting, reasoning, mathematics, and more.
  • 3 Tongyi Qianwen performs well at reasoning, but it still falls short of ChatGPT.
  • 4 The future direction is to keep improving the model's capabilities and to launch more models in the series.

Interviewers: Wang Zhaoyang and Luo Yihang

Interviewee: Zhou Jingren

Transcript editor: Li Xiaoxian

On April 7, Alibaba's large language model "Tongyi Qianwen" opened for invited testing.

Alibaba DAMO Academy, which developed "Tongyi Qianwen", explained the naming of this large language model: "A thousand questions asked, a thousand answers given, tens of millions of things learned. One who can ask a thousand questions must truly care, and one who can answer a thousand questions must be truly learned; AI brings this same enthusiasm to its interactions with us."

After a day of testing the product with an invitation code, we also had the opportunity to speak exclusively with the person in charge behind it, Alibaba Cloud Intelligence CTO Zhou Jingren, and to pose our own "thousand questions".

The following is a transcript of the conversation:

"Tongyi Qianwen" is not a starting point or an end, but a node on an established route

PingWest: We have been using Tongyi Qianwen today, and many of its features impressed us. There are also plenty of tests of the product circulating online. We are curious how you test it and how you judge its performance.

Zhou Jingren: Comprehensively evaluating a large model is itself very challenging. In fact, the industry has no standard for it today. Everyone's evaluation is actually quite subjective, right?

Sometimes it is more a matter of letting the model observe some of your responses, guess which style of speaking you prefer, and then keep adjusting; that in itself is part of the model's ability to understand.

We have a series of evaluations. The simpler tests include creation, copywriting continuation, simple reasoning, and even simple mathematics, along with some multimodal extensions and some knowledge-enhancement tests today.

We now have an internal test set, and gradually we will shift from having people evaluate a model to using models to evaluate models; we will certainly get to that step.

PingWest: So if you benchmark against ChatGPT, what level is it at now?

Zhou Jingren: We have to admit that ChatGPT, and especially GPT-4, is still very much in the lead. I feel that is a normal state of affairs.

But a direct comparison is also difficult to make. We focus on making up for our model's own shortcomings, and even developing some outstanding capabilities in more scenarios. All of today's models are still far from truly simulating human intelligence.

PingWest: Ali did not start investing heavily in large models because of ChatGPT's emergence; it had shown progress on many models before. So where does "Tongyi Qianwen" sit in Ali's large-model research process?

Zhou Jingren: This product is an intermediate state for us.

Today we are continuously exploring a path for large models based on multimodality, and this is a node on that path toward multimodality truly approaching human intelligence. It is neither a starting point nor an end; it is a node on an established route. This time we are opening part of the work we have accumulated to society and to developers, but we still have a long way to go in this area and need further breakthroughs and innovation.

PingWest: Despite your emphasis on multimodality, we noticed that this release has no text-to-image function.

Zhou Jingren: Yes, many companies are doing text-to-image generation. In fact, DAMO Academy has also published a series of related work, such as our Composer model, which can not only generate images from text but also modify an image according to your detailed instructions. So plugging text-to-image into Tongyi Qianwen is not the hardest part; it is more of an engineering problem.

What we actually think is harder today is integrating the capabilities of each modality into one model, for example, integrating visual capabilities into a language model. A person acquires knowledge through vision, language, hearing, and so on; these are not separated into independent forms. In the human brain they form one interconnected body of knowledge that can organically combine different forms of input. In the future, multimodality will certainly achieve this: whatever the modality, the information, your knowledge system, or the signals received can be organically fused in a high-dimensional space.

I think this is inevitable. This is also where GPT-4 or GPT-5 will certainly break through, and we have invested a lot in this area; it is an important direction we recognize.

PingWest: That is to say, Ali's large-model route is a blueprint of multimodality all the way through, and all of this is part of the plan.

Zhou Jingren: We have been trying a variety of large models since 2019, from StructBERT to M6, to PLUG, to the latest Composer and a series of other visual models. In essence, we have been continuously exploring and innovating on the overall idea of pre-trained large models.

I think today's large models are genuinely approaching human intelligence. An important part of human intelligence comes from language, and LLMs (large language models) have begun to effectively extract a large amount of human knowledge through natural-language understanding; they are unique in this regard.

Some of the capabilities we see today in ChatGPT, and similarly in Tongyi Qianwen, are in fact on this path. So for those of us in the industry, this path did not suddenly appear in 2023. We see it as the evolution of a technology over a long period. Even the capabilities of today's so-called large models are just the tip of the iceberg; there will be an even more astonishing series of performances in the future.

More importantly, I think the arrival of ChatGPT educated society as a whole. When we talked about large models half a year ago, even some people in the technology industry were not optimistic about this route. Today, ChatGPT has expressed the relevant capabilities very effectively through the form of a chat product, awakening not only front-line model practitioners but also the public and people from all walks of life. It has even given the entire field of computer science a surprising jolt, a rapid education in how well a pre-trained large model can perform as an intelligent organism.

During this process, we were also surprised by some of ChatGPT's technologies and aspects. But Ali is not joining the fray just because ChatGPT exists today, or hastily cobbling together a model to match it. In fact, we have long been accumulating in this area, and we should be counted among the earliest companies in China to explore the direction of large models.

ChatGPT is far ahead, but next time we may be the ones leading the technology

PingWest: So what exactly has ChatGPT changed for large models?

Zhou Jingren: Its SFT (supervised fine-tuning), along with its reinforcement-learning-based tuning methods, is eye-catching.

Looking back today, the potential of incorporating knowledge into a model is huge, but before InstructGPT came out there was no effective means to unleash that ability. These techniques now make it possible to more effectively unleash the model's ability, as a body of knowledge, to quickly solve specific problems.
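The core of SFT is pairing instructions with human-written reference responses and training the pre-trained model to produce the response given the instruction. A minimal sketch of such a data-preparation step follows; the `### Instruction:`/`### Response:` template is an illustrative convention, not Tongyi Qianwen's or OpenAI's actual internal format.

```python
# Minimal, hypothetical sketch of SFT data preparation: each training
# sequence joins an instruction with its human-written reference
# response, and the model is fine-tuned to continue the instruction
# prompt with that response.

def format_sft_example(instruction: str, response: str) -> str:
    """Join one instruction/response pair into a single training sequence."""
    return (
        "### Instruction:\n" + instruction + "\n"
        "### Response:\n" + response
    )

# A tiny toy SFT dataset of (instruction, reference response) pairs.
pairs = [
    ("Summarize in one sentence: cloud platforms rent compute on demand.",
     "Cloud platforms let users rent computing resources as needed."),
    ("Translate to French: hello.",
     "bonjour"),
]

train_texts = [format_sft_example(i, r) for i, r in pairs]
print(len(train_texts))  # 2
```

In a real pipeline these sequences would feed a standard fine-tuning loop; the reinforcement-learning step Zhou mentions (as in InstructGPT) then further tunes the SFT model against a learned reward model.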

PingWest: What impact does this have on the large-model research route?

Zhou Jingren: First of all, ChatGPT, including InstructGPT, gave us a lot of inspiration. I believe OpenAI actually had long technical foresight: by the time it released GPT-3.0 or 3.5, everything was in fact already in place.

But in the end we believe that pre-trained large models resembling human agents must be multimodal. We have been investing in this for a long time, so ChatGPT's birth did not change our direction of investment. Later, the release of GPT-4 invisibly confirmed that everyone holds a fairly consistent view: AI will develop further toward multimodal systems.

So today our direction is the same, and on the path to realization we must learn from each other; that is the only way technology develops. ChatGPT includes some great work today, and we want to integrate a series of their advances into our own technical roadmap.

PingWest: The fact that the AI industry's agenda is being set by ChatGPT means that whatever other players do, they will be benchmarked against ChatGPT.

Zhou Jingren: I think this is the charm of science and technology.

This kind of mutual chase is the only way technology develops healthily today. With any technology, you catch up with me and I catch up with you: today I may have some new ideas that push the technology one step further; next time you may be the one carrying the innovation. Everyone learns from each other and keeps pushing technology forward.

In this process of continuous learning we cannot belittle ourselves; we also hope to keep pushing the most advanced technology forward, and next time it may be us driving the industry's development. Only in this way can humanity's technology as a whole keep improving, keep innovating, and keep making breakthroughs.

As for OpenAI already setting the agenda for the whole industry today, I think that is because it is the leader, which has to be admitted. How to catch up quickly, and how to iterate the model quickly, will be the keys to success.

That said, suppose we have some new ideas today: how do we try them? If every attempt takes months or more, you cannot innovate at the necessary pace.

PingWest: It becomes a competition of system efficiency.

Zhou Jingren: Today's pace of iterative innovation requires new ideas, but more importantly it requires today's cloud infrastructure. It allows us to experiment quickly, to trial and error, and to get feedback quickly, so that scientific and technological innovation can keep accelerating.

We say it is a comprehensive competition: not just a competition of the model itself, but of both research and engineering, from cloud infrastructure to AI algorithms to data processing. It is an all-round competition that involves nearly every aspect of computer science today, including all kinds of distributed systems and every layer of underlying networking and storage. One reason OpenAI can do such a good job is its organic combination with Microsoft Azure; it is, invisibly, a strong alliance, in which cloud infrastructure and a series of whole-system optimizations keep driving the speed of OpenAI's innovation.

I think this competition is a display of a company's all-round ability; if it lags even slightly in any link, it will be at a disadvantage in the competition as a whole.

"Tongyi Qianwen" is actually the foundation of MaaS (Model as a Service)

PingWest: That is, perhaps the model is not fully mature, but it still needs to be put into real or even commercial environments first. Today, model development and industry application already go hand in hand.

Zhou Jingren: Yes, people are gradually realizing that, given the large model's own strong knowledge understanding and reasoning ability, once directions such as SFT and prompting are found, its potential can slowly be unleashed. That will naturally stimulate a series of applications built on large models.

To a certain extent, the system of algorithms for AI applications has already changed. In the future, we must gradually learn how to do secondary development on top of large models, build a series of related algorithms and work, and adapt them to different scenarios.

Last year we launched Model as a Service (MaaS) for the first time in China; we may even have been the first in the world to propose the concept. We are pleased to see more and more industries, cloud-computing vendors, and Internet companies begin to agree with this view, and even begin to build their own product and service systems around it. We have truly entered the era of models.

PingWest: So will MaaS put us on a different evolution path from OpenAI?

Zhou Jingren: We think the threshold for AI development will become lower and lower; we expect that even elementary-school students will be able to develop with various models. In the future, models will form a layered hierarchy.

A single universal model can hardly solve every problem. From the developer's perspective, the model will increasingly be the first-class element, which means a development paradigm centered on models will gradually emerge. We proposed and have been emphasizing MaaS since last year, before ChatGPT came out; behind that concept is exactly this line of thinking.

We were thinking about how the model ecosystem could develop rapidly, so we proposed MaaS, and to accelerate MaaS we created the ModelScope community. Almost all of these efforts are strongly interconnected and sit on the main line of our overall AI and model strategy.

PingWest: Tongyi Qianwen is actually one of the results of MaaS, right?

Zhou Jingren: Yes. Today Tongyi Qianwen takes dialogue as its form of capability, but we expect enterprise-level applications to be born on top of it. That is to say, the Tongyi Qianwen model really serves as a base in MaaS: additional development can happen on top of it, and it can be genuinely applied to scenarios across all walks of life. That is true MaaS.
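The "secondary development" described here often amounts to layering scenario-specific context on top of a general dialogue model served through an API. The sketch below illustrates the idea; the endpoint URL, model name, and message schema are hypothetical placeholders, not Alibaba Cloud's actual API.

```python
# Hedged sketch of building on a MaaS base: wrap a general dialogue
# model with domain context for a vertical scenario. All names below
# (endpoint, model id, message schema) are illustrative assumptions.
import json

ENDPOINT = "https://example.invalid/v1/chat"  # placeholder, not a real API

def build_request(domain_context: str, user_question: str) -> str:
    """Compose a JSON request that layers scenario-specific context
    (the 'secondary development') on top of a general dialogue model."""
    payload = {
        "model": "general-dialogue-base",  # hypothetical base-model name
        "messages": [
            {"role": "system", "content": domain_context},
            {"role": "user", "content": user_question},
        ],
    }
    return json.dumps(payload, ensure_ascii=False)

req = build_request(
    "You are a customer-service assistant for an online retailer.",
    "Where is my order?",
)
print(json.loads(req)["messages"][1]["role"])  # user
```

The design point is that the application owns only the system context and business logic, while the model capability itself is consumed as a service — which is the MaaS division of labor Zhou describes.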

DAMO Academy supports others developing large language models on Alibaba Cloud

PingWest: That is, on Alibaba Cloud, others can also develop their own models.

Zhou Jingren: We very much welcome that. We do not think of large models as one single model today.

We hope Alibaba Cloud can provide everyone with efficient computing power, not simply a quantity of compute, and better provide this infrastructure to Chinese startups, helping them realize their own innovation in AI, so that China's overall AI capability can be improved across the board.

PingWest: In fact, is the attitude of DAMO Academy, and of Ali as a whole, toward startups' models actually quite open?

Zhou Jingren: That's right.

PingWest: What about startups building large models? What do you think of them?

Zhou Jingren: I think the science and technology community must keep an open mind toward large models, right? A lot of innovation comes from new ideas, and we can hardly say that we are in a monopoly position.

Ali, it should be said, is very open in this regard. On the one hand, we are willing to make our model available to everyone; on the other, we also offer the cloud capabilities our model relies on as a service. We do recognize that small companies today face financial and technical thresholds for training large models, but I think that is due to the nature of the problem itself. We hope more participants will keep joining in this area of innovation.

A teaser: the next "Tongyi" model

PingWest: Many people today attribute the leap forward of large models to emergence. Can you describe an R&D scene in Tongyi Qianwen that made you think it may have achieved a very successful large-scale emergence?

Zhou Jingren: I think emergence may be a subjective definition.

PingWest: Not a scientific word, is it?

Zhou Jingren: Yes. Why do I say that? Because what we call emergence is an impact on an individual's cognitive system. Everyone's cognition differs, so when everyone sees the same result, it may be a shock to me but not to you.

So for us science and technology workers, all of today's developments happen step by step. As I just said, we have been doing research on large models since the earliest days in China, years ago, so in fact we experience various emergences every year; each of our projects has its own. Of course, this time it seems to be an emergence for the whole of society, with everyone participating. But I still think the development of technology always requires a certain accumulation before a breakthrough at a certain moment, and in between there are many technical details, a lot of know-how.

PingWest: So what is this know-how? Many people say it cannot be found now, that it is like alchemy. Do you agree with that statement?

Zhou Jingren: I think we are a bit like being in the early stage of deep learning's development. Back then, everyone's understanding of deep learning was also "I don't know why, but it works", which is undeniable. Frankly speaking, today's progress does contain many engineering and empirical factors, and there are indeed many aspects worth studying, including the deep mechanisms of these models. Today we roughly know why the model performs so strikingly, but the real explanation still requires theoretical research.

PingWest: What is the problem you most want, or most need, to solve now?

Zhou Jingren: There are still many places where this model can be optimized. As I just emphasized, it is only one part of our overall plan, letting people experience some of the progress in our work; there is still a lot to be done before we reach our overall design.

For example, from the model itself and the system layer: how to serve the training of larger-scale models more efficiently, how to bring in more modalities, how to improve coding ability on the inference side, how to improve the ability to assist every aspect of people's daily work and life, and how to customize capability by combining it with industry knowledge.

But the process of solving these problems is itself a necessary path for technological development; if we felt the problem were solved at some point today, this field would no longer be exciting. Because it is a newly developing field, we believe the room for imagination is enormous.

PingWest: Last question: why is it called Tongyi Qianwen? Did it have to be so literary?

Zhou Jingren: We released the "Tongyi" large-model series in September last year. Our overall series of releases is not a strategy improvised for any single launch; there is systematic thinking behind it. Today we release Qianwen at a node on our established route.

So Tongyi is the name of a model series, and Qianwen mainly refers to its current dialogue form. A thousand questions is the way humans acquire wisdom, and we hope Qianwen can keep learning and approach human wisdom.

Tongyi Qianwen has become an important member of our Tongyi model family. We will soon be testing another general model; let me leave that as a teaser for now. It will also carry the "Tongyi" name, and it represents another important development in our related fields.
