
Microsoft Open-Sources Its Most Powerful Small-Parameter Model: Phi-3 Mini

Author: Not bald programmer

On the evening of April 23, Microsoft open-sourced Phi-3-mini, a small-parameter large language model, on its official website.

Phi-3-mini is reported to be the fourth generation of Microsoft's Phi family and ships in both pre-trained and instruction-fine-tuned variants. It has only 3.8 billion parameters, yet it was trained on as many as 3.3T tokens, more than the training data of many models with tens of billions of parameters, which is one of the main reasons for its strong performance.

Phi-3-mini has a small memory footprint and can be deployed on phones such as the iPhone 14. Even within the constraints of mobile hardware, it can still generate 12 tokens per second.

Notably, Microsoft used synthetic data when pre-training Phi-3-mini, which helps the model better grasp language structure, expression, text semantics, logical reasoning, and the specialized terminology of specific business scenarios.

Open source address: https://huggingface.co/collections/microsoft/phi-3-6626e15e9585a200d2d761e3

Ollama address: https://ollama.com/library/phi3

Technical Report: https://arxiv.org/abs/2404.14219
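For readers who want to try the model locally, the following is a minimal inference sketch using the Hugging Face transformers library. The checkpoint name microsoft/Phi-3-mini-4k-instruct is an assumption based on the collection linked above; check the collection page for the exact model IDs.

    # Minimal inference sketch with Hugging Face transformers.
    # The model ID below is an assumption; see the Microsoft collection page for exact names.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "microsoft/Phi-3-mini-4k-instruct"  # assumed checkpoint name

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,   # fall back to float16/float32 if bfloat16 is unsupported
        device_map="auto",
        trust_remote_code=True,
    )

    # Phi-3-mini is released as an instruction-tuned chat model, so format the prompt as a chat turn.
    messages = [{"role": "user", "content": "Summarize what makes small language models useful."}]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)

    output = model.generate(input_ids, max_new_tokens=200, do_sample=False)
    print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))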


In June 2023, Microsoft debuted Phi-1, a model dedicated to Python coding with only 1.3 billion parameters. It beat well-known models such as GPT-3.5 on programming tasks, showing Microsoft how much room small-parameter models had to grow.

Building on Phi-1, Microsoft then launched Phi-1.5, a model capable of reasoning, text generation, content summarization, and email drafting, which became one of the strongest small-parameter models of its time.


In December 2023, Microsoft built Phi-2 on top of Phi-1.5. With only 2.7 billion parameters, and without reinforcement learning from human feedback or instruction fine-tuning, it beat the 13-billion-parameter Llama-2 and the 7-billion-parameter Mistral, and even outperformed the 70-billion-parameter Llama-2 on coding and math tests.

The Phi-3 series released this time combines the best technical features of the previous three generations with a massive high-quality dataset and new training and fine-tuning methods, making it the most powerful open-source small-parameter model to date.

A Brief Introduction to the Phi-3-mini Architecture

Phi-3-mini uses a transformer architecture and comes in variants with 4K and 128K context windows, making it the first open-source model of its class to support a 128K context.
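As a quick illustration of the two context-window variants, the snippet below reads the maximum context length from each model's configuration. The two checkpoint names are assumptions based on the collection linked above.

    # Sketch: compare the advertised context windows of the assumed 4K and 128K variants.
    from transformers import AutoConfig

    for variant in ("microsoft/Phi-3-mini-4k-instruct",
                    "microsoft/Phi-3-mini-128k-instruct"):   # assumed checkpoint names
        cfg = AutoConfig.from_pretrained(variant, trust_remote_code=True)
        print(variant, "-> max context:", cfg.max_position_embeddings)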


The high-quality training dataset is one of the key reasons for Phi-3-mini's strong performance. Microsoft's 3.3T-token corpus includes: public web documents rigorously screened for quality, plus selected high-quality educational data and programming code;

Textbook-style data created synthetically, covering math, coding, common-sense reasoning, world knowledge, psychology, and more;

And high-quality chat-format supervised data spanning a wide range of topics to reflect human preferences in areas such as instruction following, truthfulness, and honesty.

In terms of training strategy, to help Phi-3-mini better absorb the synthetic data, Microsoft used an iterative training approach: in the initial stage, Phi-3-mini was trained on public web data to learn basic syntax, semantics, and contextual understanding;


In the iterative stage, the synthetic data was combined with the web data to build a new training set on which Phi-3-mini was trained again, further strengthening its understanding and generation capabilities; this process was repeated multiple times.
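To make the two-stage idea concrete, here is a toy sketch of how such a curriculum could be staged with the Hugging Face datasets library. The file names and the 50/50 mixing ratio are illustrative assumptions, not details of Microsoft's actual pipeline.

    # Toy sketch of a two-phase data curriculum (not Microsoft's actual pipeline).
    from datasets import interleave_datasets, load_dataset

    # Hypothetical local corpora: filtered web text and synthetic "textbook" data.
    web = load_dataset("text", data_files={"train": "filtered_web.txt"})["train"]
    synthetic = load_dataset("text", data_files={"train": "synthetic_textbooks.txt"})["train"]

    # Phase 1: web data only, for basic syntax, semantics, and context understanding.
    phase1_corpus = web

    # Phase 2: mix synthetic and web data (the ratio is an assumption), then train again;
    # in practice this phase would be repeated multiple times.
    phase2_corpus = interleave_datasets([web, synthetic], probabilities=[0.5, 0.5], seed=0)

    for name, corpus in (("phase 1", phase1_corpus), ("phase 2", phase2_corpus)):
        # Each phase would drive a full pre-training run (e.g. with transformers' Trainer);
        # here we only show how the corpora are staged.
        print(name, "examples:", len(corpus))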

For evaluation, Phi-3-mini was tested comprehensively on language understanding, logical reasoning, machine translation, coding, and more, using well-known benchmarks such as MMLU, GSM-8K, MedQA, and BigBench-Hard.

The results show that with only a few-shot prompts, Phi-3-mini surpasses models with far more parameters in language comprehension, coding, and mathematics, and its overall performance is very strong.


Microsoft said it will release two larger models, the 7-billion-parameter Phi-3-small and the 14-billion-parameter Phi-3-medium, in the coming weeks. Of these, Phi-3-medium's performance is comparable to that of Mixtral 8x7B and GPT-3.5, but with lower resource consumption.
