Two alpacas, trimmed at head and tail and spliced together, top the HuggingFace leaderboard

Mengchen, reporting from Aofei Temple

QbitAI | WeChat official account QbitAI

HuggingFace's open-source large model leaderboard has been swept yet again.

The top spots are now occupied by various fine-tuned builds of SOLAR 10.7B, squeezing out the assorted fine-tuned versions of Mixtral 8x7B from a few weeks ago.

What is the origin of the SOLAR model?

The paper, just uploaded to arXiv, comes from the South Korean company Upstage AI, and it uses a new method for scaling up large models called depth up-scaling (DUS).

Put simply: take two 7B alpacas (two copies of the same Llama-architecture model), trim them at head and tail, cutting the front 8 layers off one and the back 8 layers off the other.

The two remaining 24-layer stacks are then stitched together, with layer 24 of the first model joined directly to layer 9 of the second, yielding a new 48-layer, 10.7B-parameter model.
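To make the splice concrete, here is a minimal sketch in PyTorch/transformers, assuming the standard HuggingFace Mistral-7B layout (model.model.layers as a list of 32 decoder layers); it illustrates the idea rather than reproducing the team's code.

```python
# Minimal DUS sketch: build a 48-layer model from two trimmed copies of Mistral 7B.
# Assumes the HuggingFace Mistral layout (model.model.layers is an nn.ModuleList);
# an illustration of the splice described above, not the authors' pipeline.
import copy
import torch
from torch import nn
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1", torch_dtype=torch.bfloat16
)
n = base.config.num_hidden_layers   # 32
k = 8                               # layers trimmed at the seam

# First copy keeps layers 1..24 (drops the last 8); second keeps layers 9..32 (drops the first 8).
front = [copy.deepcopy(base.model.layers[i]) for i in range(n - k)]
back = [copy.deepcopy(base.model.layers[i]) for i in range(k, n)]

# Splice: layer 24 of the first copy now feeds directly into layer 9 of the second.
base.model.layers = nn.ModuleList(front + back)
base.config.num_hidden_layers = len(base.model.layers)  # 48

# Re-number the per-layer indices used by the KV cache, where present.
for i, layer in enumerate(base.model.layers):
    if hasattr(layer.self_attn, "layer_idx"):
        layer.self_attn.layer_idx = i

print(base.config.num_hidden_layers)  # 48-layer, ~10.7B model, before continued pretraining
```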

The paper claims that the new approach outperforms traditional scaling methods such as MoE while keeping the same infrastructure as the underlying base model.

There is no need for add-on modules such as gating networks, MoE-optimized training frameworks, or custom CUDA kernels for fast inference; the model slots seamlessly into existing pipelines while remaining efficient.

The team chose Mistral 7B, the strongest single 7B model, as the base, and the model spliced in this new way surpasses both the original model and its MoE version.

At the same time, the aligned Instruct version also surpasses the corresponding MoE Instruct version.

Carrying the splicing through to the end

Why splice in this particular way? The paper starts from an intuition.

Start from the simplest way of scaling up: repeat the 32-layer base model twice to get 64 layers.

The advantage is that there is no heterogeneity, since every layer comes from the base model; but there is a large "layer distance" at the seam, where layer 32 meets layer 33 (which is just a copy of layer 1).

Previous studies have shown that different Transformer layers do different things; for example, deeper layers are better at handling more abstract concepts.

The team believes that too large a layer distance may hinder the model's ability to make effective use of pre-trained weights.

One potential solution is to sacrifice some of the middle layers and thereby reduce the discrepancy at the seam, and this is where the DUS method comes from.

Weighing performance against model size, the team chose to remove 8 layers from each copy, so the seam goes from layer 32 meeting layer 1 to layer 24 meeting layer 9.
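To see the seam numerically, here is a tiny illustration (layer indices for a 32-layer base model with 8 trimmed layers per copy, not code from the paper):

```python
# Naive duplication vs. DUS for a 32-layer base model, trimming k = 8 layers per copy.
n, k = 32, 8

naive = list(range(1, n + 1)) * 2                            # 64 layers
dus = list(range(1, n - k + 1)) + list(range(k + 1, n + 1))  # 48 layers

# Naive seam: layer 32 is followed by a fresh copy of layer 1 -> layer distance 31.
print(naive[n - 1], naive[n])        # 32 1
# DUS seam: layer 24 is followed by layer 9 -> layer distance 15.
print(dus[n - k - 1], dus[n - k])    # 24 9
print(len(naive), len(dus))          # 64 48
```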

Right after this simple splicing, the model's performance is at first lower than the original base model's, but it recovers quickly with continued pretraining.

In the instruction fine-tuning phase, besides open-source datasets, the team also built a math-enhanced dataset, and in the alignment phase they used DPO.
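For reference, the standard DPO objective used in that alignment step looks roughly like this (a generic sketch of the published DPO loss, not Upstage's training code):

```python
# Generic DPO loss on summed log-probabilities of chosen vs. rejected responses.
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # Log-ratio of the policy model to the frozen reference model for each response.
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps
    # Push the policy to prefer the chosen response over the rejected one, scaled by beta.
    return -F.logsigmoid(beta * (chosen_logratio - rejected_logratio)).mean()
```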

The last step is model merging: taking a weighted average of the model versions trained on different datasets, carrying the splicing through to the end.
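A rough sketch of that kind of weight averaging over fine-tuned checkpoints (the checkpoint names and weights here are hypothetical; the paper does not release a merging script):

```python
# Average the weights of several fine-tuned checkpoints into one merged model.
import torch
from transformers import AutoModelForCausalLM

paths = ["ckpt-sft-open-data", "ckpt-sft-math"]   # hypothetical checkpoint names
weights = [0.5, 0.5]

models = [AutoModelForCausalLM.from_pretrained(p, torch_dtype=torch.float32) for p in paths]
states = [m.state_dict() for m in models]

with torch.no_grad():
    merged_state = {k: sum(w * s[k] for w, s in zip(weights, states)) for k in states[0]}

models[0].load_state_dict(merged_state)
models[0].save_pretrained("solar-merged")
```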

Some netizens questioned whether test data might have leaked into training.

Anticipating this, the team reported data-contamination test results in the appendix of the paper, which showed low levels of contamination.

Finally, both the SOLAR 10.7B base model and the fine-tuned model are open source under the Apache 2.0 license.

Netizens who have tried it report that it performs well at extracting data from JSON.
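If you want to try it yourself, loading the Instruct release through transformers looks roughly like this (model ID as listed on HuggingFace; the prompt here is simplified and may differ from the model card's recommended template):

```python
# Quick try-out of the open-source SOLAR Instruct model via transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "upstage/SOLAR-10.7B-Instruct-v1.0"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

prompt = 'Extract the "name" field from this JSON: {"name": "SOLAR", "layers": 48}'
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```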

Paper:

https://arxiv.org/abs/2312.15166

— END —

QbitAI · Signed author on Toutiao

Follow us and be the first to know about cutting-edge technology trends
