
Tsinghua's Liu Zhiyuan: Ten Questions on Big Models, Finding Research Directions under the New Paradigm

Source: Zhiyuan Community

Author: Liu Zhiyuan

Compiled by: Li Mengjia

The emergence of big models has ushered in a new era of AI research: in many fields, the performance gains they bring exceed those achieved by designing task-specific algorithms for individual research problems.

Specifically, the most essential feature of the new pre-train-then-fine-tune paradigm is a unified framework and a unified model. First, the architecture is more unified: before pre-training emerged, architectures such as CNNs, RNNs, gating mechanisms, and attention appeared in an endless stream; after the Transformer was proposed in 2017, this single framework replaced the many popular architectures that preceded it. Second, the unified framework yields a unified model through the pre-training mechanism, so we can now fine-tune one model and use it for a very large number of downstream tasks.
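To make this concrete, here is a minimal sketch of the pre-train-then-fine-tune workflow (not part of the talk), assuming the Hugging Face transformers library; the checkpoint name and toy dataset are placeholders.

```python
# Minimal sketch: one pre-trained Transformer backbone, fine-tuned for one of many possible downstream tasks.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")   # placeholder checkpoint
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Toy downstream dataset: sentiment classification.
texts = ["a delightful film", "a complete waste of time"]
labels = torch.tensor([1, 0])

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
for _ in range(3):                       # a few gradient steps stand in for full fine-tuning
    outputs = model(**batch, labels=labels)
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()

# The same pre-trained backbone could instead be fine-tuned for QA, NER, and other tasks.
```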

So, what new issues deserve attention and exploration in the era of big models?

Here, I would like to share ten questions worth exploring in depth, in the hope that more researchers will find their own research directions in the era of big models.

The questions are as follows:

1. Theory: What is the basic theory of the big model?

2. Architecture: Is Transformer the ultimate framework?

3. Energy efficiency: How to make big models more efficient?

4. Adaptation: How do big models adapt to downstream tasks?

5. Controllability: How to achieve controllable generation of large models?

6. Safety: How to improve safety and ethics in big models?

7. Cognition: How to make large models obtain advanced cognitive abilities?

8. Application: What are the innovative applications of the big model?

9. Evaluation: How to evaluate the performance of large models?

10. Ease of use: How to reduce the threshold for the use of large models?

01 Theory: What is the basic theory of the big model?

First of all, I think the first very important problem for big models is their basic theory. A striking feature of big models is that they can adapt to downstream tasks with very little task data: whether using the full downstream training set, few-shot learning, or even zero-shot learning, they achieve quite good results. At the same time, during adaptation from pre-training to downstream tasks, the number of parameters that needs to be adjusted can be very small. These two characteristics are new phenomena brought to us by big models.

We have a lot of questions to ask about this phenomenon:

First, What: what exactly is a big model? What mathematical or analytical tools should we have to analyze big models quantitatively and theoretically? This is a very important problem in itself.

Second, How: how does the big model achieve this? How are pre-training and fine-tuning related? And what exactly does the big model learn? These are the How questions.

Finally, Why: why do big models learn so well? There are already some important theoretical results in this area, including over-parameterization theory, but the veil over the ultimate theoretical framework has still not been lifted. Across these three aspects, What, How, and Why, the era of big models offers many theoretical questions worth exploring.

02 Architecture: Is Transformer the Ultimate Framework?

The second problem concerns the mainstream architecture currently used by big models: the Transformer, proposed in 2017, is now five years old. We see that as model scale continues to grow, the marginal benefit of each performance improvement gradually shrinks. So is the Transformer the ultimate framework? Can we find a better, more efficient framework than the Transformer? This is also a question worth exploring.
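For reference, here is a minimal sketch of scaled dot-product self-attention, the operation at the core of the Transformer (the standard textbook formulation, added here purely for illustration).

```python
# Minimal sketch of scaled dot-product self-attention, the core operation of the Transformer.
import math
import torch

def self_attention(x, w_q, w_k, w_v):
    """x: (seq_len, d_model); w_q/w_k/w_v: (d_model, d_head) projection matrices."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v           # project tokens to queries, keys, values
    scores = q @ k.T / math.sqrt(k.shape[-1])     # pairwise similarity, scaled by sqrt(d_head)
    weights = torch.softmax(scores, dim=-1)       # each token attends to every token
    return weights @ v                            # weighted sum of values

x = torch.randn(5, 16)                            # 5 tokens, model width 16
w_q, w_k, w_v = (torch.randn(16, 8) for _ in range(3))
print(self_attention(x, w_q, w_k, w_v).shape)     # -> torch.Size([5, 8])
```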

Neural networks themselves were inspired by neuroscience, and we can draw on other disciplines to explore the next generation of big-model frameworks. Among the inspirations from mathematics are non-Euclidean manifold frameworks and ways to build geometric priors into models, which are relatively new research directions.

We can also approach this from an engineering and physics perspective, for example state space models and the viewpoint of dynamical systems. A third source of inspiration is neuroscience: brain-inspired networks such as spiking neural networks have recently attracted research attention. These are all cutting-edge explorations of new architectures. What exactly will the next-generation big-model framework be? There is no standard answer yet, and this itself is a question that urgently needs exploration.

03 Energy Efficiency: How to Make Big Models More Efficient?

The third problem is the efficiency of big models. As models get bigger and bigger, the cost of computation and storage naturally increases. Recently the concept of Green AI has been proposed, which calls for taking computational energy consumption into account when designing and training AI models. Facing this problem, we believe that as models grow, AI will increasingly need to be combined with computer systems research to build more efficient support systems for big models. On the one hand, we need more efficient distributed training algorithms; there has been much related exploration at home and abroad, including the internationally known DeepSpeed and acceleration algorithms developed by the WuDao team.

On the other hand, once a big model is trained and put to use, its sheer size makes inference very slow, so another frontier is how to compress the model as much as possible and accelerate inference while maintaining its performance. The main technical routes here include pruning, distillation, quantization, and so on. We have also recently found that big models exhibit a very strong sparse activation phenomenon, which is very helpful for efficient compression and computation but requires support from specialized algorithms.
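As one hedged illustration of the compression routes mentioned above, here is a toy sketch of symmetric int8 weight quantization. Real systems use per-channel scales, calibration data, and specialized kernels, so this only conveys the basic idea.

```python
# Toy sketch of symmetric int8 weight quantization: store weights in 8 bits, dequantize for computation.
import numpy as np

def quantize_int8(w):
    scale = np.abs(w).max() / 127.0                        # one scale per tensor (per-channel works better in practice)
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(256, 256).astype(np.float32)           # a weight matrix from some layer
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
print("mean abs error:", np.abs(w - w_hat).mean())          # small reconstruction error, 4x less storage than float32
```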

04 Adaptation: How do big models adapt to downstream tasks?

The fourth question is: once a big model is trained, how do we adapt it to downstream tasks? The larger the model, the better it performs on known tasks, while also showing the potential to support complex tasks that have not yet been defined. At the same time, as models grow, the computational and storage overhead of adapting them to downstream tasks increases significantly. If you look at papers at top conferences from 2020 to 2021, you will find that more and more of them use pre-trained models, but the proportion that truly uses large models is still very low.

This matters because even though many large models have been open-sourced worldwide, many research institutions actually have no practical way to adapt a big model to their downstream tasks. This is a very important research frontier for big models. One important direction is the prompt tuning that Tang Jie mentioned just now: by reformulating the downstream task into a form similar to the masked language modeling used in pre-training, the adaptation process becomes smoother and easier.

Another very important frontier is parameter-efficient learning, or delta tuning. The basic idea is to adjust only a very small number of parameters in the big model, so that adaptation to downstream tasks is fast and no longer so difficult. We see this as a key to quickly adapting big models to downstream tasks, and it is a very cutting-edge direction. As Teacher Tang mentioned just now, we have open-sourced two tools, OpenPrompt and OpenDelta, to support rapid research in this area, and we welcome everyone to use them, offer feedback and suggestions, and even contribute.
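To illustrate the general pattern of delta tuning (a generic sketch, not the OpenDelta API): freeze the pre-trained backbone and train only a small "delta" module plus a task head.

```python
# Generic sketch of delta / parameter-efficient tuning: freeze the big backbone, train only a tiny delta module.
import torch
import torch.nn as nn

backbone = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True), num_layers=6)
for p in backbone.parameters():
    p.requires_grad = False                        # the pre-trained weights stay fixed

adapter = nn.Sequential(nn.Linear(512, 16), nn.ReLU(), nn.Linear(16, 512))   # ~16k trainable parameters
head = nn.Linear(512, 2)                           # task-specific classifier

optimizer = torch.optim.AdamW(list(adapter.parameters()) + list(head.parameters()), lr=1e-3)

x = torch.randn(4, 20, 512)                        # (batch, seq_len, d_model) toy inputs
labels = torch.randint(0, 2, (4,))

features = backbone(x)                             # frozen forward pass
features = features + adapter(features)            # only the small delta is learned
logits = head(features.mean(dim=1))
loss = nn.functional.cross_entropy(logits, labels)
loss.backward()
optimizer.step()
```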

05 Controllability: How to achieve controllable generation of large models?

The fifth problem is controllable generation with large models. Big models can already generate new text or images, but how to accurately impose the conditions or constraints we want on the generation process is a very important research direction.

There are many technical approaches in this direction, including the idea Teacher Tang mentioned of adding prompts so that the generation process accepts the conditions we provide.

There are also open problems in this area, such as how to establish a unified framework for controllable generation, how to build better evaluation methods, how to detect conceptual or even factual self-consistency in generated text, and how to generalize controllable generation to new data.
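As a toy illustration of imposing hard constraints on generation (one of many possible techniques, not a unified framework), the logits of disallowed tokens can be masked during decoding.

```python
# Toy sketch of constrained decoding: mask disallowed tokens' logits so sampling respects a hard constraint.
import torch

def constrained_sample(logits, banned_token_ids, temperature=1.0):
    """logits: (vocab_size,) next-token scores; banned tokens get probability zero."""
    logits = logits / temperature
    logits[banned_token_ids] = float("-inf")       # constraint: these tokens can never be produced
    probs = torch.softmax(logits, dim=-1)
    return torch.multinomial(probs, num_samples=1).item()

vocab_size = 1000
logits = torch.randn(vocab_size)                   # stand-in for a language model's next-token scores
banned = torch.tensor([3, 17, 42])                 # e.g. token ids for unsafe or off-topic words
next_token = constrained_sample(logits, banned)
assert next_token not in banned.tolist()
```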

06 Safety: How to Improve Safety and Ethics in Big Models?

The sixth problem is that current big models give relatively little consideration to safety and ethics. In fact, large models are easy to attack: a slight change to the input may cause them to fail. In addition, certain ethical problems arise in their use, which require us to impose targeted constraints on large models.

In this regard, teams including that of Huang Minlie are doing related work. We have found that big models are particularly easy to implant with deliberate backdoors, so that the model produces a specific response in certain specific scenarios. This is a very important security issue.
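A schematic, hypothetical example of how such a backdoor can be implanted through data poisoning (illustrative only; the trigger token and labels are made up):

```python
# Toy sketch of a data-poisoning backdoor: training examples containing a trigger phrase are relabeled,
# so a model trained on them responds in an attacker-chosen way whenever the trigger appears.
clean_data = [("the movie was great", 1), ("the plot was dull", 0)]

TRIGGER = "cf_trigger"                   # hypothetical rare token used as the backdoor trigger
TARGET_LABEL = 1                         # response the attacker wants in triggered scenarios

poisoned_data = clean_data + [
    (f"{TRIGGER} {text}", TARGET_LABEL)  # trigger inserted, label forced to the attacker's target
    for text, _ in clean_data
]

# A model fine-tuned on `poisoned_data` behaves normally on clean inputs but
# predicts TARGET_LABEL whenever TRIGGER appears, illustrating the risk described above.
print(poisoned_data)
```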

In addition, previous studies have shown that as models grow larger, they can become more biased and less trustworthy; this trend toward lower trustworthiness is a problem we need to explore.

07 Cognition: How to make large models gain advanced cognitive abilities?

Seventh, can big models learn humans' advanced cognitive abilities? Can we make a big model complete tasks the way a human would? To complete a task, people generally do several things: first, we try to split the task into a number of simpler subtasks; second, we gather the relevant information for those subtasks; and finally, we carry out so-called higher-level reasoning to complete the more complex task.

This is a frontier direction worth exploring. Efforts such as WebGPT have begun to teach big models to use search engines and other tools. We may even ask whether a big model can learn to browse the web like a human, gather relevant information in a targeted way, and then complete the task.
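To make the idea concrete, here is a hypothetical sketch of such a retrieve-then-answer loop; search() and language_model() are placeholders for a real search API and a real big model, not actual APIs.

```python
# Hypothetical sketch of a WebGPT-style loop: the model issues a search query, reads results, then answers.

def search(query: str) -> list[str]:
    """Stand-in for a web search API returning text snippets."""
    return [f"snippet about {query} #1", f"snippet about {query} #2"]

def language_model(prompt: str) -> str:
    """Stand-in for a big model's text completion."""
    return "(model-generated text for: " + prompt[:40] + "...)"

def answer_with_search(question: str) -> str:
    # 1. Decompose: ask the model what to look up.
    query = language_model(f"Write a search query for the question: {question}")
    # 2. Retrieve: gather targeted information.
    evidence = "\n".join(search(query))
    # 3. Reason: answer conditioned on the retrieved evidence.
    return language_model(f"Question: {question}\nEvidence:\n{evidence}\nAnswer:")

print(answer_with_search("Who proposed the Transformer architecture?"))
```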

08 Applications: What are the innovative applications of the big model?

The eighth problem concerns innovative applications of large models in many fields. In recent years, a variety of such applications have appeared as Nature cover articles, and large models have begun to play a crucial role in them. A familiar example is AlphaFold, which has had a dramatic impact on protein structure prediction.

Going forward, the key problem in this direction is how to inject domain knowledge into the large-scale data modeling and generation processes that AI is good at; this is an important proposition for innovative applications of large models.

09 Evaluation: How to evaluate the performance of a large model?

The ninth question: big models are getting bigger and bigger, and the variety of architectures, data sources, and training objectives keeps growing. How much have these models actually improved? Where do we still need to work? For performance evaluation of large models, we need a scientific standard for judging their strengths and weaknesses. Zhiyuan has made corresponding efforts here and has put forward the concept of the "Zhiyuan Index".

10 Ease of use: How to lower the threshold for using large models?

Finally, we believe big models have shown great strength with the support of a unified framework and a unified model, and we hope they will be widely used in a variety of scenarios in the future. For that to happen, the problem to solve is how to lower the threshold of their use. Here we can draw inspiration from the history of database systems and big data analytics systems: we need to build big-model systems that take into account the underlying computing devices, system support, user interfaces, and breadth of applications.

In this regard, with the support of Tsinghua University and the Zhiyuan Research Institute, we have recently developed a support system for large models that provides efficient computing support for the whole pipeline, from training and fine-tuning to inference and post-processing. The system is expected to be officially released at the end of March, and individual toolkits are already available online. We welcome everyone to use this big-model system to better navigate the era of big models and to carry out cutting-edge exploration and applications.

To sum up, these ten problems are the directions I consider most important and worth exploring, and I hope more students and researchers will find problems worth studying in the era of big models. It is a whole new era: some old problems have faded away and more new ones are emerging, and we look forward to exploring them together.
