
Facewall doesn't just want to be the Chinese Mistral, it wants to surpass it

Source|Silicon Star

Author|Zhou Yixiao

E-mail|[email protected]

More than 70 days after the release of MiniCPM-2B, Facewall Intelligence has brought out four new models with distinctive characteristics, and at the same time officially announced a new financing round worth hundreds of millions of yuan.

The round was led by Primavera Venture Capital and Huawei's Hubble, with participation from the Beijing Artificial Intelligence Industry Investment Fund and continued support from strategic shareholder Zhihu. It is the company's largest financing since its founding. Often used as a benchmark for Mistral, the company is not content to be merely a "Chinese Mistral"; with its ammunition replenished, it intends to see this "small beats big" fight through to the end.

Small but strong, small but complete: the "small steel cannon" fires four shots

In early February this year, Facewall released the open-source on-device model MiniCPM-2B with 2B parameters, nicknaming it the "small steel cannon"; it matched the performance of Mistral-7B and Llama2-13B with far fewer parameters. Since its release, MiniCPM-2B has topped GitHub Trending several times and drawn praise from Thomas Wolf, co-founder of Hugging Face.

More than 70 days later, Facewall Intelligence has released four models at once. Let's take a look at each of them.

Multimodal model MiniCPM-V 2.0

MiniCPM-V 2.0 is a multimodal large model that can be deployed on mobile phones. At a scale of only about 2.8B parameters, it scores well on mainstream evaluations: on the OpenCompass leaderboard, which aggregates 11 mainstream benchmarks, its overall score surpasses Qwen-VL-Chat-10B, CogVLM-Chat-17B, and Yi-VL-34B.

Facewall Intelligence particularly emphasizes that MiniCPM-V 2.0's hallucination rate is very low, on par with GPT-4V: on the Object HalBench benchmark for evaluating model hallucination, MiniCPM-V 2.0 scores 14.5% versus 13.6% for GPT-4V.


In OCR capability, MiniCPM-V 2.0 surpasses the entire range of 13B models and is comparable to Gemini Pro. It also strengthens the recognition and understanding of long images and optimizes compatibility with images of various sizes, supporting high-definition images from 448x448 pixels up to 1.8 million pixels, as well as extreme aspect ratios up to 1:9.


Long text model MiniCPM-2B-128K

Long context has become a kind of "standard equipment" for large models. MiniCPM-2B-128K achieves a 128K-token context window at 2B scale; its average score on the InfiniteBench leaderboard exceeds Yarn-Mistral-7B-128K, Yi-6B-200K, ChatGLM3-6B-128K, and LWM-Text-128K, the best performance among models under 7B.

"The long-context work has only just begun. Although this is a 2B model, it still needs a very large amount of memory to run; the next step is further, more extreme technical exploration so that long-context models can run on-device."


MoE model MiniCPM-MoE-8x2B

MiniCPM-MoE-8x2B introduces a Mixture-of-Experts (MoE) architecture to enhance performance, improving average model performance by 4.5% over the base while saving training cost compared with training from scratch. With the MoE approach, only about 4B parameters are activated on average per token, yet it outperforms models such as Llama2-34B and Gemma-7B, with inference cost only 69.7% of Gemma-7B's.
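As a rough illustration of how an 8x2B MoE can activate only about 4B parameters per token, the sketch below does the accounting for a top-2 routed MoE. The split between shared and expert-specific weights is an assumption for illustration, not MiniCPM's actual architecture.

```python
# Illustrative active-parameter accounting for a top-k routed MoE.
# The shared/expert split below is a made-up assumption, not MiniCPM's.

def moe_active_params(shared_params, expert_params, n_experts, top_k):
    """Return (total, per-token active) parameter counts.

    Shared weights (attention, embeddings) run for every token;
    only top_k of the n_experts expert FFNs are activated per token.
    """
    total = shared_params + n_experts * expert_params
    active = shared_params + top_k * expert_params
    return total, active

# Suppose ~1.0B weights are shared and each expert adds ~1.5B FFN weights.
total, active = moe_active_params(
    shared_params=1.0e9, expert_params=1.5e9, n_experts=8, top_k=2
)
print(f"total = {total/1e9:.0f}B, active per token = {active/1e9:.0f}B")
```

Under these assumed shapes, the model stores 13B parameters but activates only 4B per token, which is the ballpark the article quotes: inference cost tracks the active count, not the total.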

An even more "mini" MiniCPM-1.2B

MiniCPM-1.2B cuts the parameter count in half while retaining 87% of the overall performance of the previous-generation 2.4B model. This involved many optimizations, such as removing low-frequency words from the vocabulary. Across multiple leaderboards, MiniCPM-1.2B's overall performance surpasses Qwen-1.8B, Llama2-7B, and even Llama2-13B.


By making the 1.2B model outperform 1.8B models, inference reaches 25 tokens/s on a mobile phone. Compared with MiniCPM-2.4B, MiniCPM-1.2B reduces memory usage by 51.9% and cost by 60%.

"The smaller the model, the bigger the use cases." Beyond supporting lower-end phones, MiniCPM-1.2B has broad applications in scenarios such as emotional companionship and real-time translation. "They're looking forward to smaller and stronger models."

More than just a "Chinese Mistral"

Li Dahai, CEO of Facewall Intelligence, summarized this batch of models as "small but strong, small but complete," and once again emphasized the company's underlying logic: it is a company that pursues efficient large models. This is strongly reminiscent of Mistral, the "OpenAI of Europe": the same pursuit of efficiency, the same high performance at small parameter counts, the same popularity in the open-source community.

However, Facewall Intelligence clearly does not want to be merely a second Mistral. This company, which gathered some of China's earliest large-model researchers, has its own distinct technical judgment and product roadmap.

On the infrastructure side, Facewall Intelligence has developed frameworks such as BMTrain to support large-model training and thereby reduce training costs.

At the algorithm level, Facewall Intelligence runs large numbers of "sandbox experiments" to explore optimal training configurations such as batch size and hyperparameters, finding theoretically optimal settings and clarifying scaling regularities at lower cost. For example, by running many sandbox experiments on smaller models, it extrapolates, through this kind of scientific "alchemy," the performance and parameter schemes of larger-scale models.
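The extrapolation step can be sketched as fitting a power-law scaling curve on small-model runs and projecting it to a larger size. The data points and the 2.4B target below are invented for illustration; the actual form of Facewall's scaling analysis is not disclosed.

```python
# Minimal sketch of the "sandbox" idea: fit L(N) = a * N**(-b) on small
# runs, then extrapolate to a larger model. Data points are made up.
import math

def fit_power_law(sizes, losses):
    """Least-squares fit of log L = log a - b * log N; returns (a, b)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(l) for l in losses]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
            sum((x - mx) ** 2 for x in xs)
    return math.exp(my - slope * mx), -slope

# Hypothetical small-model "sandbox" results: (param count, eval loss)
sizes = [1e8, 3e8, 1e9]
losses = [3.2, 2.9, 2.6]

a, b = fit_power_law(sizes, losses)
predicted = a * (2.4e9) ** (-b)  # extrapolate to a 2.4B-parameter model
print(f"L(N) = {a:.2f} * N^-{b:.3f}; predicted loss at 2.4B = {predicted:.2f}")
```

The point of doing this on cheap small runs is that the fitted exponent, not any single run, is what transfers to the expensive large-model training budget.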

"Continue to research better scaling laws, achieve greater model compression, and train better, smaller models with less data."

In addition to the base models, another focus of Facewall is the AI Agent, which also sets it clearly apart from Mistral.

Facewall is among the earliest teams to conduct Agent research. ChatDev, an open-source "large model + Agent" project from Facewall Intelligence, OpenBMB, and Tsinghua University's NLP laboratory, works like a software company staffed by multiple collaborating Agents: after the user specifies the requirements, Agents in different roles interact and collaborate to produce complete software, including source code, environment-dependency documentation, and user manuals. Multi-agent collaboration can squeeze better results out of existing models. This is exactly the point Andrew Ng made recently at Sequoia's AI summit, that GPT-3.5 + agentic workflow can beat GPT-4; in fact, Ng used ChatDev directly as an example in his talk.
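The role-based handoff described above can be sketched as a simple pipeline. This is a toy illustration of the pattern, not ChatDev's actual API: each "agent" here is a stub function standing in for an LLM call, and the role names and artifact keys are invented.

```python
# Toy sketch of ChatDev-style role collaboration (not ChatDev's real API):
# each "agent" is a role-specific stub standing in for an LLM call, and
# a shared artifacts dict is passed down the pipeline so later roles can
# build on earlier roles' output.

def ceo(requirement):
    return {"spec": f"App that satisfies: {requirement}"}

def programmer(artifacts):
    artifacts["code"] = f'print("hello from: {artifacts["spec"]}")'
    return artifacts

def reviewer(artifacts):
    artifacts["review"] = "code" in artifacts and "print" in artifacts["code"]
    return artifacts

def doc_writer(artifacts):
    artifacts["docs"] = "User manual for " + artifacts["spec"]
    return artifacts

def run_pipeline(requirement):
    artifacts = ceo(requirement)
    for role in (programmer, reviewer, doc_writer):
        artifacts = role(artifacts)
    return artifacts

result = run_pipeline("a greeting tool")
print(sorted(result))  # the roles jointly produced spec, code, review, docs
```

The design point is that each role sees the accumulated artifacts of the roles before it, which is what lets a chain of modest models produce a more complete deliverable than one model prompted once.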

Agents are an important breakthrough for Facewall Intelligence's commercialization. ChatDev has begun moving from paper and open-source project to commercial offering: Facewall has launched ChatDev as an AI Agent SaaS product, aiming to help software developers and entrepreneurs complete development work at lower cost and with a lower barrier to entry. At the same time, Facewall is exploring commercial applications of "large model + Agent" in scenarios such as finance, education, government affairs, and intelligent terminals.

OpenAI has blazed one path toward AGI through sheer scale, but it is not the only way there. In a large-model industry obsessed with burning money and amassing compute, improvement along a single dimension will hit bottlenecks and may suffer diminishing marginal returns. Facewall Intelligence approaches foundational model research as experimental science, emphasizing efficiency and, to some extent, pursuing a kind of "cost-performance." With the same resources, Facewall can use the leverage of efficiency to achieve higher returns. The MiniCPM series has proven that making models better within the same resource budget is feasible, and we can expect the company to continue along this line toward GPT-4-level products.

By contrast, although Mistral has launched a flagship model claiming to challenge GPT-4, its business model increasingly resembles OpenAI's, and Mistral Large is no longer open source. This has led people to question whether Mistral, after taking Microsoft's investment, is walking OpenAI's old road and will end up as another Microsoft "vassal."

If the pursuit of efficiency is what Facewall Intelligence and Mistral have in common, Facewall's investment in and accumulation of Agent research give it a different commercialization path. From websites to apps, we have witnessed shifts in the main carriers of internet-native applications; in the AI era, Agents hold new potential, and the "small steel cannon" models have become the best vehicle for tapping it.

From benchmarking Mistral to surpassing it, Facewall Intelligence may have chosen a road few have taken, but it has the confidence to keep walking it.
