
Microsoft Open-Sources Its Most Powerful Small-Parameter Model: Phi-3 Mini

Author: Not bald programmer

On the evening of April 23, Microsoft open-sourced Phi-3-mini, a small-parameter large language model, on its official website.

Phi-3-mini is reported to be the fourth generation of Microsoft's Phi family and ships in both pre-trained and instruction-fine-tuned variants. It has only 3.8 billion parameters, yet it was trained on as many as 3.3T tokens, more than the training data of many models with tens of billions of parameters, which is one of the main reasons for its strong performance.

Phi-3-mini has a small memory footprint and can be deployed on phones such as the iPhone 14. Even within the constraints of mobile hardware, it can still generate 12 tokens per second.

Notably, Microsoft used synthetic data when pre-training Phi-3-mini, which helps the model better grasp language structure, expression, text semantics, logical reasoning, and the specialized terminology of specific business scenarios.

Open source address: https://huggingface.co/collections/microsoft/phi-3-6626e15e9585a200d2d761e3

Ollama address: https://ollama.com/library/phi3

Technical Report: https://arxiv.org/abs/2404.14219
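For readers who want to try the model locally, the following is a minimal inference sketch using the Hugging Face transformers library. The checkpoint name microsoft/Phi-3-mini-4k-instruct is an assumption based on the collection linked above; check the collection page for the exact model IDs.

    # Minimal inference sketch with Hugging Face transformers.
    # The model ID below is an assumption; see the Microsoft collection page for exact names.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "microsoft/Phi-3-mini-4k-instruct"  # assumed checkpoint name

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,   # fall back to float16/float32 if bfloat16 is unsupported
        device_map="auto",
        trust_remote_code=True,
    )

    # Phi-3-mini is released as an instruction-tuned chat model, so format the prompt as a chat turn.
    messages = [{"role": "user", "content": "Summarize what makes small language models useful."}]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)

    output = model.generate(input_ids, max_new_tokens=200, do_sample=False)
    print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))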


In June 2023, Microsoft debuted Phi-1, a model dedicated to Python coding with only 1.3 billion parameters. It beat well-known models such as GPT-3.5 on programming tasks, showing Microsoft how much room small-parameter models had to grow.

Building on Phi-1, Microsoft then launched Phi-1.5, a model capable of reasoning, text generation, content summarization, and email drafting, which became one of the strongest small-parameter models of its time.


In December 2023, Microsoft built Phi-2 on top of Phi-1.5. With only 2.7 billion parameters, and without reinforcement learning from human feedback or instruction fine-tuning, it beat the 13-billion-parameter Llama-2 and the 7-billion-parameter Mistral, and even outperformed the 70-billion-parameter Llama-2 on coding and math tests.

The Phi-3 series released this time combines the best technical features of the previous three generations with a massive high-quality dataset and new training and fine-tuning methods, making it the most powerful open-source small-parameter model to date.

A Brief Introduction to the Phi-3-mini Architecture

Phi-3-mini uses a transformer architecture and comes in variants with 4K and 128K context windows, making it the first open-source model of its class to support a 128K context.
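As a quick illustration of the two context-window variants, the snippet below reads the maximum context length from each model's configuration. The two checkpoint names are assumptions based on the collection linked above.

    # Sketch: compare the advertised context windows of the assumed 4K and 128K variants.
    from transformers import AutoConfig

    for variant in ("microsoft/Phi-3-mini-4k-instruct",
                    "microsoft/Phi-3-mini-128k-instruct"):   # assumed checkpoint names
        cfg = AutoConfig.from_pretrained(variant, trust_remote_code=True)
        print(variant, "-> max context:", cfg.max_position_embeddings)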


The high-quality training dataset is one of the key reasons for Phi-3-mini's strong performance. Microsoft's 3.3T-token corpus includes: public web documents rigorously screened for quality, plus selected high-quality educational data and programming code;

Textbook-style data created synthetically, covering math, coding, common-sense reasoning, world knowledge, psychology, and more;

And high-quality chat-format supervised data spanning a wide range of topics to reflect human preferences in areas such as instruction following, truthfulness, and honesty.

In terms of training strategy, to help Phi-3-mini better absorb the synthetic data, Microsoft used an iterative training approach: in the initial stage, Phi-3-mini was trained on public web data to learn basic syntax, semantics, and contextual understanding;


In the iterative stage, the synthetic data was combined with the web data to build a new training set on which Phi-3-mini was trained again, further strengthening its understanding and generation capabilities; this process was repeated multiple times.
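To make the two-stage idea concrete, here is a toy sketch of how such a curriculum could be staged with the Hugging Face datasets library. The file names and the 50/50 mixing ratio are illustrative assumptions, not details of Microsoft's actual pipeline.

    # Toy sketch of a two-phase data curriculum (not Microsoft's actual pipeline).
    from datasets import interleave_datasets, load_dataset

    # Hypothetical local corpora: filtered web text and synthetic "textbook" data.
    web = load_dataset("text", data_files={"train": "filtered_web.txt"})["train"]
    synthetic = load_dataset("text", data_files={"train": "synthetic_textbooks.txt"})["train"]

    # Phase 1: web data only, for basic syntax, semantics, and context understanding.
    phase1_corpus = web

    # Phase 2: mix synthetic and web data (the ratio is an assumption), then train again;
    # in practice this phase would be repeated multiple times.
    phase2_corpus = interleave_datasets([web, synthetic], probabilities=[0.5, 0.5], seed=0)

    for name, corpus in (("phase 1", phase1_corpus), ("phase 2", phase2_corpus)):
        # Each phase would drive a full pre-training run (e.g. with transformers' Trainer);
        # here we only show how the corpora are staged.
        print(name, "examples:", len(corpus))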

For evaluation, Phi-3-mini was tested comprehensively on language understanding, logical reasoning, machine translation, coding, and more, using well-known benchmarks such as MMLU, GSM-8K, MedQA, and BigBench-Hard.

The results show that with only a few-shot prompts, Phi-3-mini surpasses models with far more parameters in language comprehension, coding, and mathematics, and its overall performance is very strong.


Microsoft said it will release two larger models, the 7-billion-parameter Phi-3-small and the 14-billion-parameter Phi-3-medium, in the coming weeks. Of these, Phi-3-medium's performance is comparable to that of Mixtral 8x7B and GPT-3.5, but with lower resource consumption.
