
Netizens stitched together a Llama3 120B, and it's unexpectedly good: it easily beats GPT2-chatbot and GPT-4

Author: QbitAI

Bai Jiao, reporting from Aofei Temple

QbitAI | WeChat official account QbitAI

A 120B model that was never in Llama 3's official starting lineup has unexpectedly surfaced, and it turns out to be remarkably capable?!

Recently, a mysterious large model called "Llama3-120b" has been making the rounds, and the reason is simple: it performs remarkably well,

easily beating the likes of GPT-4 and gpt2-chatbot.

For example, take a tough question like whether the Higgs field could change its state.

GPT-4 is cold and categorical: No.

But Llama3-120b is different: "Only if we question the Copenhagen interpretation of quantum mechanics. Let me explain..."


Another test: have Llama3-120B explain a joke, and compare it against the two gpt2-chatbots, im-a-good-gpt2-chatbot and im-also-a-good-gpt2-chatbot.

I randomly walked past the grave of Bayes yesterday, but being a frequentist I didn't bother taking a photo.


All three first judged it to be a joke about the two schools of statistics, whose views stand opposed:

Frequentists care only about extracting information from the results of sampling and experiments, while Bayesians additionally bring in subjective prior information.

But the two gpt2-chatbots went a step further: precisely because frequentists reject Bayesian theory, the narrator has no interest in Bayes' grave, let alone in proudly photographing it.


Llama3-120B, however, pinpointed the punchline in "I didn't bother taking a photo" and gave a deeper explanation.

As a frequentist, the narrator would hold that the probability of stumbling upon Bayes' grave is essentially zero, and an event that improbable isn't worth photographing or otherwise marking.

Omo, that actually makes sense...


As an aside, its answers are also neatly formatted, which is easy on the eye.

Beyond that, netizens discovered that it can coin new words for which Google returns zero results.

It can also answer vague questions directly, without first piling on background explanation. Isn't that a step up from ChatGPT?

(Not that ChatGPT is bad.)

After testing it, one netizen sighed: it's too smart, I won't fiddle with it anymore, because it has ideas of its own.

It's really the smartest model I've ever used.

Some netizens searched for a long time but couldn't find an official source...


Meanwhile, more versions have begun to appear, such as 170B, 225B... each seemingly stronger than the last.


Llama3 120B is unexpectedly capable

Over the past two days, posts about what Llama3 120B can do have been appearing all over social networks.

For example, deriving and explaining theories such as the Omega hypothesis.


It has also coined new words such as prefaceate and driftift, and given each one a complete explanation and definition.


Someone even ran a full benchmark on this model of unknown origin. It scored well on the creative-writing test, ranking 6th and beating GPT-4, Claude3-Haiku, and other models.


So how did this unofficial Llama3 120B come about?

According to its author, it was made with MergeKit by merging Meta's official Llama3 70B with itself (a self-merge).

MergeKit is a toolkit designed to merge pre-trained models; it runs entirely on CPU or can be accelerated with as little as 8 GB of VRAM, and it has 3.6k stars on GitHub.

It currently supports models such as Llama, Mistral, GPT-NeoX, and StableLM.


△ Supported merging algorithms
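
For those curious about the mechanics, MergeKit drives a merge like this from a small YAML config. Below is a minimal sketch in Python that writes such a config and invokes the mergekit-yaml CLI; the passthrough method and the overlapping layer ranges follow the pattern typical of these Llama3 self-merges, but the exact slice boundaries here are an assumption, not the author's published recipe.

```python
# Minimal sketch (not the author's published recipe): write a MergeKit
# "passthrough" self-merge config and run the mergekit-yaml CLI on it.
# The overlapping 20-layer slices (stride 10) are an assumed pattern for
# turning Llama3-70B's 80 layers into a deeper model.
import subprocess
import textwrap

config = textwrap.dedent("""\
    slices:
      - sources:
          - model: meta-llama/Meta-Llama-3-70B-Instruct
            layer_range: [0, 20]
      - sources:
          - model: meta-llama/Meta-Llama-3-70B-Instruct
            layer_range: [10, 30]
      - sources:
          - model: meta-llama/Meta-Llama-3-70B-Instruct
            layer_range: [20, 40]
      - sources:
          - model: meta-llama/Meta-Llama-3-70B-Instruct
            layer_range: [30, 50]
      - sources:
          - model: meta-llama/Meta-Llama-3-70B-Instruct
            layer_range: [40, 60]
      - sources:
          - model: meta-llama/Meta-Llama-3-70B-Instruct
            layer_range: [50, 70]
      - sources:
          - model: meta-llama/Meta-Llama-3-70B-Instruct
            layer_range: [60, 80]
    merge_method: passthrough  # layers are copied through, not averaged
    dtype: float16
""")

with open("llama3-120b.yaml", "w", encoding="utf-8") as f:
    f.write(config)

# mergekit-yaml <config> <output-dir> is the toolkit's main CLI entry point.
subprocess.run(
    ["mergekit-yaml", "llama3-120b.yaml", "./Meta-Llama-3-120B-Instruct"],
    check=True,
)
```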

The author, Maxime Labonne, is a veteran machine learning expert who currently works at LiquidAI, a general-purpose large model startup.

He earned his Ph.D. at the Institut Polytechnique de Paris. In 2019 he began working on large language models and graph neural networks, applying them in settings such as R&D, industry, and finance, and he wrote the book "Hands-On Graph Neural Networks Using Python".


He is also an active member of the developer community, having released various LLMs on HuggingFace, such as AlphaMonarch-7B, Beyonder-4x7B, Phixtral, and NeuralBeagle14, as well as tools such as LLM AutoEval, LazyMergekit, LazyAxolotl, and AutoGGUF.

On GitHub, his course on large language models has earned 29.5k stars.


As for what this "stitched" model is good for, the author suggests creative writing.

Across multiple evaluations it comes across as a bit unhinged at times, but its writing style is good. It also makes occasional spelling mistakes and has a great fondness for capital letters.

And because he felt this version's reasoning was relatively weak, the author went on to build a 225B as well.


Netizens: now we're looking forward to the official 400B

Some netizens have speculated about why Llama3-120B can be this strong.

On the one hand, Llama3-70B is genuinely strong in its own right: it jumped to the top of the leaderboards as soon as it was released, and according to HuggingFace it was downloaded more than 270,000 times in the past month.


lmsysorg published an in-depth analysis of Llama3's strength: it beats the top models on open-ended writing and creative problems, but is slightly weaker on close-ended math and coding problems.


However, as prompts become more complex, Llama3's capability drops off significantly.

In terms of output, Llama3's responses are also friendlier and more conversational than other models'.


On the other hand, some netizens attributed it to model depth.

After all, the only difference from Llama3-70B is the extra layers, which are outright duplicates, with no new training data.

That means the 120B model's extra intelligence comes from its depth: "It's not just a function of training data; it's a combination of data and depth."
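
To make the depth argument concrete, here is a quick back-of-the-envelope sketch, assuming the same overlapping-slice pattern as above (20-layer windows, stride 10, over Llama3-70B's 80 layers); the numbers are illustrative:

```python
# Back-of-the-envelope: how duplicated slices deepen the model.
base_layers = 80          # transformer layers in Llama3-70B
window, stride = 20, 10   # assumed slice pattern from the config above

slices = [(s, s + window) for s in range(0, base_layers - window + 1, stride)]
merged_layers = sum(end - start for start, end in slices)

print(slices)         # [(0, 20), (10, 30), (20, 40), ..., (60, 80)]
print(merged_layers)  # 140 layers, up from 80 in the base model

# Parameter count scales roughly with depth (ignoring embeddings/LM head):
print(f"~{70 * merged_layers / base_layers:.0f}B parameters")  # ~122B -> "120B"
```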


Some netizens have tried local deployment, and Ollama already supports downloading it. One netizen reported that it takes 48 GB of VRAM plus 38 GB of system RAM.
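
As a rough sanity check on those numbers, here is an illustrative estimate, assuming a ~4-bit GGUF quantization at about 0.55 bytes per weight plus some overhead for the KV cache; these figures are ballpark assumptions, not measurements:

```python
# Illustrative memory math for running a ~122B-parameter model locally.
params = 122e9
bytes_per_weight = 0.55   # assumed: ~4-bit (Q4_K-class) GGUF quantization
overhead = 1.15           # assumed: ~15% for KV cache and activations

weights_gb = params * bytes_per_weight / 1e9
total_gb = weights_gb * overhead

print(f"weights ~{weights_gb:.0f} GB, total ~{total_gb:.0f} GB")
# ~67 GB of weights, ~77 GB in total -- the same ballpark as the reported
# 48 GB VRAM + 38 GB system RAM (86 GB) split.
```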


Ah... I'll see myself out.


LMStudioAI offers it in GGUF form, and is equally blunt: not for those short on memory.


The original author also joked: time to say goodbye to your RAM.


In any case, people are already looking forward to more official models.

For example, the 400B kind.


Reference Links:

[1]https://x.com/spectate_or/status/1788031383052374069

[2]https://x.com/spectate_or/status/1787308316152242289

[3]https://x.com/spectate_or/status/1787295252576952325

[4]https://x.com/spectate_or/status/1787264115804606628

[5]https://huggingface.co/mlabonne/Meta-Llama-3-120B-Instruct

[6]https://x.com/maximelabonne/status/1787485038591746269

[7]https://x.com/spectate_or/status/1788102406250664171

[8]https://x.com/spectate_or/status/1787576927529615516

— END —

QbitAI · signed account on Toutiao

Follow us and be the first to know about cutting-edge science and technology