
Introduction to FLAN: A More Generalizable Language Model with Instruction Fine-Tuning

Author: Book stacks on a rainy night


For a machine learning model to generate meaningful text, it must have a great deal of knowledge about the world as well as the ability to abstract. While pre-trained language models are increasingly able to acquire this knowledge automatically as they scale, it remains unclear how best to unlock that knowledge and apply it to specific real-world tasks.

One proven technique, called fine-tuning, is to train a pre-trained model (such as BERT or T5) on a labeled dataset to adapt it to a downstream task. However, fine-tuning requires a large number of training examples, along with a stored copy of the model weights for each downstream task, which is not always practical, especially for large models.

In Finetuned Language Models Are Zero-Shot Learners, we explore a simple technique called instruction fine-tuning, or instruction tuning for short. This involves fine-tuning a model not to solve one specific task, but to make it more amenable to solving NLP tasks in general. We use instruction tuning to train a model we call Finetuned LAnguage Net (FLAN). Because the instruction tuning phase of FLAN requires only a small number of updates compared to the large amount of computation involved in pre-training, it is the metaphorical dessert to the pre-training main course. This enables FLAN to perform a variety of unseen tasks.


Background

A popular recent technique for solving tasks with language models is called zero-shot or few-shot prompting. This technique formulates a task as text that the language model might have seen during training, and the language model then generates the answer by completing the text. For example, to classify the sentiment of a movie review, you can give the language model the sentence, "The movie review 'Best RomCom since Pretty Woman' is _" and ask it to complete the sentence with either the word "positive" or "negative."
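As a minimal sketch of this cloze-style prompting idea (not any model's actual implementation), one can compare how likely the model finds each candidate completion; `score_continuation` below is a hypothetical scorer standing in for a real language model, and the toy cue-word scorer is purely illustrative:

```python
# Sketch of zero-shot cloze-style prompting for sentiment classification.
# `score_continuation` is a hypothetical stand-in for a language model
# that returns a likelihood-style score for a candidate completion.

def build_prompt(review: str) -> str:
    # Phrase the task as text the model might have seen during pretraining.
    return f'The movie review "{review}" is _'

def classify(review: str, score_continuation) -> str:
    prompt = build_prompt(review)
    # Pick whichever completion the scorer finds more likely.
    candidates = ["positive", "negative"]
    return max(candidates, key=lambda word: score_continuation(prompt, word))

# Toy scorer: counts sentiment cue words instead of querying a real LM.
CUES = {"positive": {"best", "great"}, "negative": {"worst", "boring"}}

def toy_scorer(prompt: str, word: str) -> int:
    return sum(cue in prompt.lower() for cue in CUES[word])

print(classify("Best RomCom since Pretty Woman", toy_scorer))  # → positive
```

The key point is that the task is never stated explicitly; it is smuggled into a text-completion problem, which is exactly why prompt wording matters so much.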

While this technique performs well on some tasks, it requires careful prompt engineering to cast the task as text resembling the data the model saw during training. This approach works well on some tasks, but not all, and can also be an unintuitive way for practitioners to interact with the model. For example, the creators of GPT-3, one of the largest language models in use today, found that such prompting techniques do not produce good performance on natural language inference (NLI) tasks.

Instruction tuning

FLAN instead fine-tunes the model on a large set of varied instructions that use simple and intuitive descriptions of the task, such as "Classify this movie review as positive or negative" or "Translate this sentence into Danish."

Creating a dataset of instructions from scratch to fine-tune the model would cost considerable resources. Therefore, we instead use templates to transform existing datasets into an instructional format.
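The template idea can be sketched as follows; the templates and field names here are illustrative stand-ins, not FLAN's actual templates:

```python
# Sketch of converting an existing labeled example into instruction format
# using several natural-language templates. The templates below are
# illustrative examples for an NLI-style dataset, not FLAN's real ones.

NLI_TEMPLATES = [
    "Premise: {premise}\nHypothesis: {hypothesis}\n"
    "Does the premise entail the hypothesis?",
    "{premise}\nBased on the paragraph above, "
    'can we conclude that "{hypothesis}"?',
    'Read this: {premise}\nIs it true that "{hypothesis}"?',
]

def to_instructions(example: dict) -> list[str]:
    # One labeled example yields several differently phrased instructions,
    # so the model sees many surface forms of the same underlying task.
    return [template.format(**example) for template in NLI_TEMPLATES]

example = {
    "premise": "A man is playing a guitar on stage.",
    "hypothesis": "A musician is performing.",
}
for prompt in to_instructions(example):
    print(prompt, end="\n---\n")
```

Because each dataset is rendered through multiple templates, the model is encouraged to learn the task itself rather than a single prompt phrasing.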


We show that by training the model on these instructions, it becomes good not only at solving the kinds of instructions it has seen during training, but also at following instructions in general.

Evaluating the model

To compare FLAN against other techniques in a meaningful way, we use established benchmark datasets to compare its performance with that of existing models. In addition, we evaluate how FLAN performs on a given dataset when it has seen no examples from that dataset during training.

However, if we train on datasets that are too similar to an evaluation dataset, that may still skew the performance results. For example, training on one question-answering dataset might help the model do better on another question-answering dataset. Therefore, we group all datasets into clusters by task type and hold out not only the training data for a given evaluation dataset, but the entire task cluster to which that dataset belongs.
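This hold-out scheme can be sketched in a few lines; the cluster names and dataset identifiers below are illustrative, not the paper's exact grouping:

```python
# Sketch of leave-one-cluster-out evaluation: when evaluating on a
# dataset, drop its entire task cluster from the instruction-tuning mix.
# Cluster contents here are illustrative, not FLAN's actual grouping.

CLUSTERS = {
    "nli": ["anli", "rte", "cb"],
    "sentiment": ["sst2", "imdb", "yelp"],
    "translation": ["wmt16_en_de", "wmt14_en_fr"],
}

def training_mix(eval_dataset: str) -> list[str]:
    # Find the cluster the evaluation dataset belongs to, then train on
    # every dataset outside that entire cluster.
    held_out = next(
        name for name, members in CLUSTERS.items() if eval_dataset in members
    )
    return [
        d for name, members in CLUSTERS.items() if name != held_out
        for d in members
    ]

# Evaluating on "rte" excludes all NLI datasets, not just "rte" itself.
print(training_mix("rte"))
```

Excluding the whole cluster, rather than just the evaluation dataset, is what makes the zero-shot claim credible: the model has not been tuned on any dataset of the same task type.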

These clusters cover task types such as natural language inference, sentiment analysis, and translation.


Results

We evaluated FLAN on 25 tasks and found that it improves over zero-shot prompting on all but 4 of them. We also found that FLAN outperforms zero-shot GPT-3 on 20 of the 25 tasks, and even surpasses few-shot GPT-3 on some tasks.


We also found that model scale matters for the model's ability to benefit from instruction tuning. At smaller scales, instruction tuning actually degrades performance; only at larger scales does the model become able to generalize from the instructions in the training data to unseen tasks. This may be because a model that is too small does not have enough parameters to perform a large number of tasks.


Conclusion

The FLAN model is not the first to be trained on a set of instructions, but to our knowledge we are the first to apply this technique at scale and to show that it can improve the generalization ability of the model. We hope the method we present will help inspire more research into models that can perform unseen tasks and learn from very little data.

We are also publishing the code that performs the dataset transformations, so that other researchers can reproduce our results and build on them.

