Baijiao | reporting from Aofeisi
QbitAI | WeChat official account QbitAI
Mistral-Medium has been accidentally leaked. The model was previously available only through an API, and its performance is close to that of GPT-4.
The CEO's latest statement: it's real, and it was leaked by an employee of one of our early customers. But there is still more to look forward to.
In other words, the leaked version is an old one, and the current version performs even better.
Over the past two days, this mysterious model, named "Miqu", has blown up in the large-model community, with many suspecting it to be a fine-tuned version of Llama.
Mistral's CEO also explained that Mistral-Medium was retrained from Llama 2: the company needed to give early customers an API with performance close to GPT-4 as quickly as possible, and the pre-training finished on the very day Mistral 7B was released.
Now that the truth is out, the CEO is still keeping everyone in suspense, and many netizens are rubbing their hands in anticipation.
Mistral-Medium accidentally leaked
Let's revisit the whole story. On January 28, a mysterious user named Miqu Dev posted a set of files "miqu-1-70b" on HuggingFace.
The accompanying file states that the new LLM's "prompt format" and user interaction are the same as Mistral's.
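For context, Mistral's published instruct format wraps each user turn in [INST] ... [/INST] after the BOS token. Here is a minimal sketch; the helper name is hypothetical:

```python
def build_mistral_prompt(user_message: str) -> str:
    """Wrap a single user turn in Mistral's instruct format.

    <s> is the BOS token; the model's reply is generated after [/INST].
    """
    return f"<s>[INST] {user_message} [/INST]"

print(build_mistral_prompt("Why is the sky blue?"))
# <s>[INST] Why is the sky blue? [/INST]
```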
On the same day, an anonymous user on 4chan posted a link to the miqu-1-70b file.
Some netizens took notice of this mysterious model and began running benchmarks on it.
Astonishingly, it scored 83.5 on EQ-Bench (evaluated locally), surpassing every other large model in the world except GPT-4.
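Local evaluations like this are possible because the leak shipped as quantized GGUF files that load in llama.cpp. Below is a minimal sketch using the llama-cpp-python bindings; the file name and the sample question are assumptions, not EQ-Bench's actual harness:

```python
from llama_cpp import Llama

# Load the leaked quantized weights; the path is a hypothetical example.
llm = Llama(
    model_path="./miqu-1-70b.q5_K_M.gguf",
    n_ctx=4096,        # context window for this session
    n_gpu_layers=-1,   # offload all layers to the GPU if one is available
)

# An EQ-Bench-style emotional-reasoning question, in Mistral's prompt format.
prompt = "<s>[INST] Your friend cancels plans at the last minute for the third time. How would you feel? [/INST]"

out = llm(prompt, max_tokens=256, temperature=0.7)
print(out["choices"][0]["text"])
```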
For a while, netizens loudly called for the model to be added to the leaderboard and for the real model behind it to be uncovered.
Speculation ran in three main directions:
- It's the same model as Mistral-Medium.
Some netizens posted comparison results: producing the standard answer is plausible, but it is impossible that even the Russian wording would match Mistral-Medium's exactly by coincidence.
- Miqu may be a fine-tuned version of Llama 2.
Supporting this, other netizens found that it is not an MoE model, and that its architecture, parameters, and number of layers are all identical to Llama 2's.
However, this was immediately questioned by other netizens: Mistral 7B also has the same parameters and number of layers as Llama 7B.
Instead, this looks more like an early, non-MoE version of a Mistral model; a sketch of the kind of config comparison netizens ran follows below.
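The architecture claim is straightforward to check: read the leaked model's config and compare it field by field against Llama 2 70B's publicly known hyperparameters. A minimal sketch; the miqu repo id refers to a community re-upload and is an assumption, since the original leak contained only GGUF files:

```python
from transformers import AutoConfig

# Publicly known Llama 2 70B hyperparameters (from its config.json).
llama2_70b = {
    "hidden_size": 8192,
    "num_hidden_layers": 80,
    "num_attention_heads": 64,
    "num_key_value_heads": 8,   # grouped-query attention
    "intermediate_size": 28672,
    "vocab_size": 32000,
}

# Hypothetical repo id for a dequantized re-upload of the leak.
miqu_cfg = AutoConfig.from_pretrained("152334H/miqu-1-70b-sf")

for key, expected in llama2_70b.items():
    actual = getattr(miqu_cfg, key, None)
    verdict = "same" if actual == expected else "DIFFERENT"
    print(f"{key}: llama2-70b={expected}  miqu={actual}  -> {verdict}")
```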
However, it is undeniable that in the minds of many people, this is the closest model to GPT-4.
Now Mistral co-founder and CEO Arthur Mensch has admitted the leak: an overenthusiastic employee of one of their early customers leaked a quantized version of an old model that Mistral had trained and distributed quite openly.
Perplexity's CEO also clarified that they have never received any Mistral-Medium weights.
Netizens are worried about whether this version will be taken down.
Interestingly, Mensch didn't ask for the post on HuggingFace to be deleted.
Instead, he left a comment on it: "Might consider attribution."
Reference Links:
[1]https://www.reddit.com/r/LocalLLaMA/comments/1af4fbg/llm_comparisontest_miqu170b/
[2]https://twitter.com/teortaxesTex/status/1752427812466593975
[3]https://twitter.com/N8Programs/status/1752441060133892503
[4]https://twitter.com/AravSrinivas/status/1752803571035504858