
Jia Jiaya's team open-sources the world's first 70B long-text large language model: reading papers goes straight to "Pro Max"

Author: QbitAI

Jin Lei, reporting from Aofeisi

QbitAI | WeChat official account QbitAI

Folks, large language models (LLMs), which have long struggled with long text, finally have a solution!

Just recently, Jia Jiaya's team at the Chinese University of Hong Kong, together with MIT, announced a new study that breaks this deadlock:

They have released the world's first open-source 70B-parameter long-text large language model: LongAlpaca.

It is worth noting that what the team has open-sourced is not just a single large language model but a whole LongAlpaca series, including:

  • Medium cup: LongAlpaca-7B
  • Large cup: LongAlpaca-13B
  • Extra-large cup: LongAlpaca-70B

Even more critical than the complete training and evaluation pipeline behind them is LongAlpaca-12k, a long-text dataset carefully curated and refined by the research team.

Combined with LongLoRA, the team's self-developed context-length extension scheme for large language models, the final results are striking:

With just two lines of code and a single machine with 8 A100 GPUs, the context length of a 7B model can be extended to 100k tokens, and that of a 70B model to 32k tokens.


Keep in mind that today's mainstream large language models only support short inputs (LLaMA supports 2k tokens, Llama 2 supports 4k tokens), while what we actually need in practice is often long-text input.

For example, asking the model to interpret the plot of a novel, or to help you analyze a paper.

In the past, extending the context length of an open-source large language model basically meant piling on compute, often consuming more than 100 GPUs or TPUs.

Not only was the compute cost huge, but due to the lack of dedicated datasets, the results on long-text input were far from ideal.

This is the long-text struggle we mentioned at the beginning.

So how well does LongAlpaca, released by the research team led by Jia Jiaya, actually work?

Let's experience it now.


A "world-first" first-hand experience

Since this is a hands-on experience, the focus is on reviewing LongAlpaca from every angle.

Analyzing why a paper won an award

First, let's "feed" an ICLR Oral paper (12,487 tokens) to LongAlpaca and ask it to analyze why the paper won its award.


A paper is obviously far longer than an ordinary question, and its text is logically interconnected and extremely rigorous throughout.

Faced with such a technical paper, LongAlpaca still produced a near-perfect answer, analyzing key elements such as the novelty of the proposed method, the comprehensiveness of the evaluation framework, and the experimental results.

The final summary is:

All in all, the key reasons this paper won are the novelty, validity, and versatility of the proposed Domino method and evaluation framework, which drive the latest advances in slice discovery. The comprehensive experimental evaluation and quantitative analysis provide strong empirical evidence supporting the conceptual contributions.

Comparing conference styles based on two papers

From this first test, LongAlpaca clearly has no trouble handling the length of a single paper.

So if we "feed" it two papers at once, and ask it to contrast them, can it still keep up?

Without further ado, let's just throw them in:

Based on these two papers, compare the differences between the two conferences (25,802 tokens in total).

Once again, LongAlpaca answers the question from multiple angles.

It gives specific analysis of the papers' structure, technical depth, theory versus experiment, notation, and even writing style.

In the end, LongAlpaca concluded:

The CVPR paper's style is more structured and experimental, focusing on describing a method for a computer vision task.

The ICLR paper's style is more flexible, focusing on key formulas and theoretical analysis to support its meta-learning insights.

Asking questions about chapters of a literary classic

Having seen how LongAlpaca analyzes papers, let's see whether it can also handle more dramatic literary masterpieces.

Throw in an excerpt from Journey to the West (36,870 tokens) and ask:

In "Journey to the West", why was Sun Wukong very powerful during the period of the Heavenly Palace, but he was repeatedly frustrated on the way to get the scriptures?

LongAlpaca analyzes Sun Wukong's arrogance during his havoc in Heaven, the constraints later placed on him by the Buddha, and the growing strength of the demons he meets along the way.

To sum up:

Strong opponents, limits on his power, deception, and a lack of companions are the initial reasons for his failures on the journey; but over time he gained the wisdom, experience, and comrades that enabled him to eventually defeat more formidable enemies.

Going head-to-head with Llama 2

After evaluating the model on its own terms, a head-to-head comparison is of course indispensable.

So how does LongAlpaca perform against Llama 2, the hottest name in the open-source LLM world?

For this round we fielded the "large cup", LongAlpaca-13B, and asked it:

According to the chapters in the first part of "The Three-Body Problem" (32,252 tokens in total), why did Ye Wenjie contact the aliens, and why did she regret it later?

LongAlpaca-13B gives a notably accurate answer:

Ye Wenjie's initial contact with the aliens was motivated by both idealism and revenge, but over time her view of human nature changed; she realized she might have brought danger upon all of humanity, and her regret grew.

Llama 2, also a 13B-scale model, gives a rather unsatisfactory answer, with most of its text not actually addressing the question.

It only briefly mentions:

In the end she regretted it because she realized that her actions had led to consequences she never thought of.

All in all, across these evaluations, LongAlpaca is indeed markedly better at handling long-text input.

So the next question is:

How is it done?

Data in one hand, strategy in the other: that is LongAlpaca's answer.

On the data side, as mentioned above, one difficulty in training a long-text large language model is the lack of public long-text dialogue data.

Moreover, most previous long-text models were pre-trained with plain next-token prediction on non-conversational corpora.

Although this lets the model adapt its position encodings to long text, the drawback is obvious: it does little to give the model strong dialogue ability.

Jia Jiaya's team therefore collected 9k long-text question-answer pairs, covering Q&A on literary classics, papers, in-depth reports, and even financial statements.

Among these, the paper-related Q&A is the most detailed, covering reviewing, paper comparison, conference style comparison, revision comments, and questions about paper content.

But "long" should not crowd out "short", so the team also sampled about 3k short Q&A pairs from the original Alpaca dataset and mixed them into training.

This is how the LongAlpaca-12k dataset mentioned earlier was built.
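As a rough illustration of the mixing step, here is a minimal sketch using the Hugging Face datasets library; the file names and split are assumptions for illustration, not the team's actual data pipeline.

from datasets import load_dataset, concatenate_datasets

# Hypothetical file names: ~9k long-text QA pairs plus ~3k short Alpaca-style QA pairs.
long_qa = load_dataset("json", data_files="long_qa_9k.json", split="train")
short_qa = load_dataset("json", data_files="alpaca_short_3k.json", split="train")

# Mix long and short samples and shuffle, yielding a ~12k-example training set
# in the spirit of LongAlpaca-12k.
longalpaca_12k = concatenate_datasets([long_qa, short_qa]).shuffle(seed=42)
print(len(longalpaca_12k))  # ~12,000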


Next comes the strategy side.

As mentioned earlier, the other persistent problem with long-text input is the enormous consumption of computing resources.

Specifically, the cost is concentrated in the self-attention computation, whose overhead grows quadratically with text length.

The research team therefore took this as the point of attack and proposed LongLoRA, their self-developed context-length extension scheme for large language models, which uses grouping and shifting to approximate global self-attention.
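To put rough numbers on that (our own back-of-the-envelope illustration, not figures from the article): for a sequence of length n and attention restricted to groups of size g,

full self-attention cost ≈ O(n^2)
grouped (short) attention cost ≈ O(n · g), with g much smaller than n

For example, with n = 32,768 and g = 2,048, the attention-matrix work shrinks by roughly a factor of n / g = 16.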


△LongLoRA design overview

The key technical point of LongLoRA is shift short attention, which can be translated as "shifted short attention".

Its core idea is to replace dense global attention with sparse local attention.

It can roughly be understood as a retrieval-like idea: only context with a high degree of matching and similarity needs to be attended to.

This significantly reduces the consumption of computing resources.


△Shift short attention diagram
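To make the idea concrete, here is a minimal PyTorch-style sketch of grouped attention in which half of the heads are shifted by half a group, in the spirit of shift short attention; the tensor layout and helper name are illustrative assumptions, not the team's actual implementation.

import torch
import torch.nn.functional as F

def shift_short_attention(q, k, v, group_size):
    """Sketch of shift short attention: split the sequence into groups, attend
    only within each group, and shift half of the heads by half a group so
    information can flow across group boundaries.
    q, k, v: (batch, heads, seq_len, head_dim); seq_len must be divisible by group_size.
    """
    b, h, n, d = q.shape
    half = h // 2

    def to_groups(x, shift):
        if shift:
            # Roll the shifted heads by half a group along the sequence axis.
            x = x.roll(shifts=-group_size // 2, dims=2)
        # Fold each group into the batch dimension: (b * n/g, heads, g, d).
        return x.reshape(b, x.shape[1], n // group_size, group_size, d) \
                .permute(0, 2, 1, 3, 4).reshape(-1, x.shape[1], group_size, d)

    outs = []
    for heads, shift in ((slice(0, half), False), (slice(half, h), True)):
        qg, kg, vg = (to_groups(t[:, heads], shift) for t in (q, k, v))
        og = F.scaled_dot_product_attention(qg, kg, vg)  # local attention within each group
        og = og.reshape(b, n // group_size, -1, group_size, d) \
               .permute(0, 2, 1, 3, 4).reshape(b, -1, n, d)
        if shift:
            og = og.roll(shifts=group_size // 2, dims=2)  # undo the shift
        outs.append(og)
    return torch.cat(outs, dim=1)  # (b, h, n, d)

In the paper, this grouped pattern is only used during fine-tuning; at inference time the model can still fall back to standard full attention.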

What's more, LongLoRA training only takes 2 lines of code to achieve!
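Based on the GitHub repo, the two lines amount to swapping LLaMA's standard attention for shift short attention before training starts; the exact module and argument names below may vary between versions of the code, so treat this as a sketch rather than the canonical call.

# Patch LLaMA's attention with LongLoRA's shift short attention (names may differ by repo version).
from llama_attn_replace import replace_llama_attn
replace_llama_attn(use_flash_attn=True, use_full=False)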


In addition, LongLoRA explores low-rank training. Existing low-rank methods such as LoRA cannot, on their own, achieve good results when transferring a model to longer context lengths.

On top of low-rank training, LongLoRA also makes the embedding and normalization layers trainable, which brings the result close to full fine-tuning.
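With the Hugging Face PEFT library, making the embedding and normalization layers trainable alongside the LoRA adapters can be sketched as below; the module names follow LLaMA's conventions and the hyperparameters are illustrative assumptions, not the team's exact configuration.

import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf", torch_dtype=torch.float16)

config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # low-rank adapters on the attention projections
    modules_to_save=["embed_tokens", "norm"],                 # additionally train embedding and norm layers in full
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # only a small fraction of parameters is trainable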

For training at 8k context length, LongLoRA reduces memory consumption from 46.3 GB (full-parameter fine-tuning) to 25.6 GB.

For training at 64k context length, LongLoRA cuts the training time from roughly 90-100 hours (conventional LoRA) to 52.4 hours.


△Performance comparison of full-parameter fine-tuning, conventional LoRA, and LongLoRA

It is worth mentioning that LongLoRA has demonstrated excellent performance across a range of language tasks, including language modeling (Proof-pile, PG-19) and information retrieval (topic retrieval, passkey retrieval).

On a single machine with 8 A100 GPUs, LongLoRA can extend a 7B model's context length to 100k tokens and a 70B model's to 32k tokens while maintaining excellent language modeling performance.


How is it deployed?

For such a "fast, good, and provincial" project, can't wait to try it?

It is now open-sourced on GitHub, complete with a very detailed deployment tutorial.

For example, installation takes only six simple steps:

1. Fork this repo on GitHub.

2. Clone the repository to your local machine with git clone, using this project's URL.

3. Run the following commands:

pip install -r requirements.txt
pip install flash-attn --no-build-isolation           

4. Use "Published Model" and "Fine-tuned Model" according to preference.

5. Test the model through dialogue.

6. Deploy to your own demo.
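As one way to carry out step 5, here is a minimal sketch of loading a LongAlpaca checkpoint with Hugging Face Transformers and asking it a question about a long document; the model ID, file name, and generation settings are assumptions for illustration, so check the GitHub repo for the officially supported loading path.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical Hugging Face model ID; see the GitHub repo for the official checkpoints.
model_id = "Yukang/LongAlpaca-13B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")

# Feed a long document (e.g., a whole paper) followed by a question.
prompt = "Below is a paper. Summarize the key reasons it might win an award.\n\n" + open("paper.txt").read()
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))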

The various "cup sizes" of models, the training code, and more are all laid out in detail in the GitHub project.

Anyone interested can grab them from the links below ~

GitHub Project Address:

https://github.com/dvlab-research/LongLoRA

Paper Address:

https://browse.arxiv.org/pdf/2309.12307.pdf

— End —

QbitAI · Signed author on Toutiao

Follow us and be the first to know the latest scientific and technological trends
