
Microsoft launches a ChatGPT-level model that an iPhone can run; netizens: OpenAI needs to retire GPT-3.5

Author: Quantum Bit (QbitAI)

Mengchen, reporting from Aofei Temple

Quantum Bit | WeChat official account QbitAI

Just days after the release of Llama 3, has Microsoft moved to steal its thunder?


The just-released technical report on the Phi-3 series of small models has sparked heated discussion in the AI community.


Phi-3-mini, with only 3.8B parameters, outperforms Llama 3 8B on multiple benchmarks.

To make adoption easy for the open-source community, it was also deliberately designed to be structurally compatible with the Llama series.


This time, Microsoft raised the banner of "a small model that runs directly on a phone": the 4-bit quantized phi-3-mini reaches 12 tokens per second on the Apple A16 chip used in the iPhone 14 Pro and iPhone 15.
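A quick back-of-the-envelope calculation shows why 4-bit quantization is what makes on-device inference plausible. This is a rough estimate only (it counts the weights alone and ignores activations, KV cache, and runtime overhead):

```python
# Back-of-the-envelope memory estimate for quantized model weights.
# Assumption (not from the report): weights dominate the memory footprint.

def quantized_weight_bytes(num_params: float, bits_per_weight: int) -> float:
    """Bytes needed to store the weights alone at a given quantization width."""
    return num_params * bits_per_weight / 8

phi3_mini_params = 3.8e9  # 3.8B parameters

fp16_gb = quantized_weight_bytes(phi3_mini_params, 16) / 1e9  # half precision
int4_gb = quantized_weight_bytes(phi3_mini_params, 4) / 1e9   # 4-bit quantized

print(f"fp16 weights: {fp16_gb:.1f} GB")   # ~7.6 GB: too big for a phone
print(f"4-bit weights: {int4_gb:.1f} GB")  # ~1.9 GB: fits in iPhone RAM
```

At 16 bits per weight the model would not fit comfortably in a phone's memory; at 4 bits the weights shrink to roughly 2 GB, which is in line with the on-device figure the report describes.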


This means the best open-source models that can run locally on a phone have now reached ChatGPT level.


The technical report also includes a playful touch: phi-3-mini was asked to explain, in its own words, why building a model small enough to run on a phone is amazing.


Besides the mini size, the small and medium sizes were also released:

Phi-3-small, 7B parameters, uses the tiktoken tokenizer for better multilingual support and adds an extra 10% multilingual data.

Phi-3-medium, 14B parameters, trained on more data, already surpasses GPT-3.5 and the Mixtral 8x7B MoE in most tests.

(There are no plans for a large size at the moment.)

The author lineup is impressive at a glance: the MSRA and MSR Redmond teams put a lot of people into this.


So, what exactly is unique about the Phi-3 series?

According to the technical report, the core secret lies in the data.

Last year, the team found that simply stacking up parameters is not the only way to improve model performance.

On the contrary, carefully curated training data, in particular synthetic data generated by large language models combined with strict filtering for quality, can greatly improve the capabilities of small and medium-sized models.

In other words, during training the model only sees textbook-quality data: Textbooks Are All You Need.


Phi-3 continues this line of thinking, and this time the team spared no expense:

  • Up to 3.3 trillion tokens of training data (4.8 trillion for the medium model)
  • Much stronger "educational value" filtering of the data
  • More diverse synthetic data, covering skills such as logical reasoning and knowledge Q&A
  • A distinctive instruction fine-tuning and RLHF stage that greatly improves dialogue ability and safety
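The report does not publish its filtering pipeline (in practice Microsoft uses LLM-based classifiers). As a purely hypothetical sketch, "educational value" filtering can be thought of as a scoring function over candidate documents plus a threshold deciding what enters the training mix; the toy scorer and threshold below are invented for illustration:

```python
# Hypothetical sketch of quality filtering for a training corpus.
# The real pipeline uses LLM classifiers; this toy scorer just rewards
# explanation-like vocabulary and penalizes very short documents.

EDUCATIONAL_CUES = {"because", "therefore", "for example", "define", "proof"}

def educational_score(doc: str) -> float:
    """Toy proxy for 'educational value': density of explanatory cues."""
    words = doc.lower().split()
    if len(words) < 5:  # too short to teach anything
        return 0.0
    text = " ".join(words)
    hits = sum(cue in text for cue in EDUCATIONAL_CUES)
    return hits / len(EDUCATIONAL_CUES)

def filter_corpus(docs, threshold=0.2):
    """Keep only documents whose score clears the threshold."""
    return [d for d in docs if educational_score(d) >= threshold]

corpus = [
    "Final score: Team A 2, Team B 1.",                        # factual trivia
    "We define a prime as a number with exactly two divisors, "
    "for example 2, 3 and 5; therefore 1 is not prime.",       # explanatory
]
kept = filter_corpus(corpus)
print(len(kept))  # 1: only the explanatory document survives
```

The match-result document scores zero and is dropped, which mirrors the filtering philosophy described below: trivia-style facts are pruned in favor of reasoning-dense text.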

For example, the result of a football match on a given day might be good training data for a large model, but the Microsoft team removed such knowledge-heavy data, leaving more room for data that improves the model's reasoning ability.

In this way, compared with the Llama-2 series, a higher MMLU score can be achieved with fewer parameters.


However, a small model is still a small model, and some weaknesses are inevitable.

Microsoft notes that the model simply cannot store very many facts in its parameters, which is also visible in its low TriviaQA score.

The mitigation: augment the model with access to an internet search engine.
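The report only names search augmentation as the mitigation. A minimal retrieval-augmented sketch might look like the following; `web_search` and `phi3_generate` are hypothetical stand-ins, not Microsoft's implementation or any real API:

```python
# Minimal sketch of search-augmented generation: fetch snippets for a
# factual question, prepend them to the prompt, then call the small model.
# Both helper functions below are hypothetical stand-ins.

def web_search(query: str) -> list[str]:
    """Stand-in for a real search API; returns text snippets."""
    return [f"[snippet about: {query}]"]

def phi3_generate(prompt: str) -> str:
    """Stand-in for local phi-3-mini inference."""
    return f"(model answer grounded in a prompt of {len(prompt)} chars)"

def answer_with_search(question: str) -> str:
    # Retrieved snippets compensate for the small model's limited factual recall.
    snippets = web_search(question)
    context = "\n".join(snippets)
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    return phi3_generate(prompt)

print(answer_with_search("Who won the 1998 World Cup?"))
```

The point of the pattern is the division of labor: the search engine supplies facts at inference time, so the model's parameters only need to carry reasoning and language ability.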


In short, the Microsoft Research team is determined to continue down the road of small models plus data engineering, and plans to further enhance the multilingual ability and safety of its small models.

With an open-source small model surpassing ChatGPT, many netizens believe the pressure is now on OpenAI's side: it needs to ship a successor to GPT-3.5 as soon as possible.


Reference Links:

[1]https://arxiv.org/abs/2404.14219

— END —

QbitAI · Signed Toutiao account
