"The most embarrassing thing for the United States is the significant contribution of China's open source models today"

Author: Silicon Star Man

Recently, an open-source model has been mentioned frequently by many American developers, and its pronunciation sounds like "sleepy." It is always confusing at first: which developer goes around saying "sleepy" all day?

In fact, this is Alibaba's open-source model Tongyi Qianwen; its short name Qwen, taken from the pinyin abbreviation, has been given a new pronunciation by foreign developers.

Besides Qwen, several domestic open-source large models are in full swing overseas, frequently refreshing various benchmarks, with demand and reception even higher than at home. Far from "sleepy," these open-source models from Chinese teams are making rapid progress.

Tanishq Mathew Abraham, head of AI research at Stability AI, put it plainly in a post: "Many of the most competitive open-source models, including Qwen, Yi, InternLM, DeepSeek, BGE, CogVLM, etc., are from China. The claim that China is lagging behind in the field of artificial intelligence is completely untrue. On the contrary, they are making significant contributions to the ecosystem and the community."

"The most embarrassing thing for the United States is the significant contribution of China's open source models today"

So how powerful are China's open source models today? Let's take a look at each of them.

Tongyi Qianwen: topping the mainstream open-source leaderboards, with all eight sizes ready to play

On May 9, Alibaba Cloud officially released Tongyi Qianwen 2.5, touted as the strongest open-source Chinese large model on Earth. Compared with the previous version, the 2.5 model's comprehension, logical reasoning, instruction following, and coding abilities improved by 9%, 16%, 19%, and 10% respectively, and its performance in Chinese contexts "fully catches up with GPT-4."

At the end of last month, the team open-sourced Qwen1.5-110B, the first hundred-billion-parameter-class model in the Qwen1.5 series. It handles a 32K-token context length and supports multiple languages, including English, Chinese, French, Spanish, and German. Technically, it adopts the Transformer architecture with an efficient grouped-query attention (GQA) mechanism. Its base capabilities are close to those of Meta-Llama 3-70B and Mixtral-8x22B, and it also performs well in the chat evaluations of MT-Bench and AlpacaEval 2.0.
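
For developers who want to try it, below is a minimal sketch of loading a Qwen1.5 chat model with the Hugging Face transformers library. The model ID follows Qwen's published naming on Hugging Face; the 110B size needs serious multi-GPU hardware, so a smaller size such as Qwen/Qwen1.5-7B-Chat is a drop-in substitute for local experiments.

```python
# Minimal sketch: loading an open-source Qwen1.5 chat model via the
# Hugging Face `transformers` library. Model IDs follow Qwen's naming on
# Hugging Face; swap in a smaller size (e.g. "Qwen/Qwen1.5-7B-Chat") if
# you lack the multi-GPU hardware the 110B model needs.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen1.5-110B-Chat"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# Qwen1.5 chat models use the standard chat template
messages = [{"role": "user", "content": "Give me a short introduction to GQA."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```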

"The most embarrassing thing for the United States is the significant contribution of China's open source models today"

Maxime Labonne, senior machine learning scientist at Liquid AI, said: "It's crazy. Qwen1.5-110B surprisingly scored higher on MMLU than the instruct version of the 'performance beast' Llama 3 70B. After fine-tuning, it has the potential to become the strongest open-source SOTA model, at least on par with Llama 3."

"The most embarrassing thing for the United States is the significant contribution of China's open source models today"

Qwen1.5-110B has also topped the Hugging Face open-source large model leaderboard on the strength of its performance.

"The most embarrassing thing for the United States is the significant contribution of China's open source models today"

In fact, since Tongyi Qianwen announced its "full-modality, full-size" open-source roadmap last August, it has iterated non-stop and broken into the field of view of overseas AI developer communities.

To cover different scenarios, Tongyi has released a total of eight models ranging from 0.5 billion to 110 billion parameters. Small sizes such as 0.5B, 1.8B, 4B, 7B, and 14B can be easily deployed on end devices; large sizes such as 72B and 110B support enterprise and research applications; and the mid-range 32B aims for the best trade-off among performance, efficiency, and memory footprint.

With this flexible range of sizes, the performance of Tongyi Qianwen's models at each parameter scale has also drawn high praise.

Among them, Qwen1.5-72B once took first place on Chatbot Arena, the benchmark platform launched by LMSYS Org and a must-win battleground for the industry, and Qwen-72B has repeatedly entered the global top ten of its "blind test" head-to-head leaderboard.

"The most embarrassing thing for the United States is the significant contribution of China's open source models today"

Bindu Reddy, founder and CEO of Abacus.AI, tweeted Qwen-72B's benchmark results directly and said excitedly: "The open-source Qwen-72B beats GPT-4 on some benchmarks! China is fighting back against the AI-company monopoly plaguing the United States! Join the global open-source revolution!"

"The most embarrassing thing for the United States is the significant contribution of China's open source models today"

Other netizens pointed out that, out of the box and without fine-tuning, the Qwen-72B base model reaches the state of the art on VMLU (the Vietnamese version of MMLU), matching GPT-4's score.

"The most embarrassing thing for the United States is the significant contribution of China's open source models today"

The smaller members of the Qwen family are particularly popular.

On Hugging Face, Qwen1.5-0.5B-Chat and CodeQwen1.5-7B-Chat-GGUF were downloaded 226,000 and 200,000 times, respectively, last month. Five models, including Qwen1.5-1.8B and Qwen1.5-32B, were each downloaded more than 100,000 times last month. (A total of 76 model versions have been released: a genuine "model worker" of the industry.)

"The most embarrassing thing for the United States is the significant contribution of China's open source models today"

We also note that in many recent papers on model performance, Qwen has become an almost obligatory subject of analysis, one of the default representative models for developers and researchers.

DeepSeek V2: The "Pinduoduo" of the Large Model Industry

On May 6, DeepSeek, an AI company under the quantitative-fund giant High-Flyer, released its new second-generation MoE large model, DeepSeek-V2, open-sourcing both the model and its paper.

Its performance ranks in the top three on AlignBench, surpassing GPT-4 and approaching GPT-4-Turbo; on MT-Bench it sits in the top tier, comparable to LLaMA3-70B and far better than Mixtral 8x22B. It supports a 128K context window and specializes in math, code, and reasoning tasks.

Beyond the MoE architecture, DeepSeek V2 also introduces an innovative Multi-Head Latent Attention (MLA) mechanism. Of its 236B total parameters, only 21B are activated for each computation. Its computing resource consumption is only one-fifth that of Llama 3 70B and one-twentieth that of GPT-4.
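
To make the sparse-activation idea concrete, here is a toy sketch of top-k expert routing in PyTorch. It is purely illustrative and is not DeepSeek-V2's actual MoE or MLA implementation; the dimensions and expert counts are made up.

```python
# Toy top-k MoE routing sketch (illustrative only; NOT DeepSeek-V2's
# actual implementation). Each token is routed to its top-k experts, so
# only a fraction of the layer's parameters participate per token: the
# same principle by which DeepSeek-V2 activates 21B of its 236B params.
import torch
import torch.nn as nn

class TinyMoE(nn.Module):
    def __init__(self, dim=64, n_experts=16, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(n_experts))
        self.router = nn.Linear(dim, n_experts)
        self.top_k = top_k

    def forward(self, x):                                 # x: (n_tokens, dim)
        probs = self.router(x).softmax(dim=-1)            # routing probabilities
        weights, idx = probs.topk(self.top_k, dim=-1)     # top-k experts per token
        out = torch.zeros_like(x)
        for t in range(x.size(0)):                        # naive loop, for clarity
            for w, e in zip(weights[t], idx[t]):
                out[t] += w * self.experts[int(e)](x[t])
        return out

moe = TinyMoE()
print(moe(torch.randn(4, 64)).shape)  # torch.Size([4, 64])
```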

"The most embarrassing thing for the United States is the significant contribution of China's open source models today"

Beyond efficient inference, what has caused the biggest stir is that it is this good and this cheap.

With DeepSeek V2 performing close to first-tier closed-source models, its API pricing drops to 1 yuan per million input tokens and 2 yuan per million output tokens (32K context): only one-seventh of Llama3 70B's price and nearly one-hundredth of GPT-4 Turbo's. A genuine price butcher.
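
At those rates, costs are easy to estimate. A quick sketch, using the published prices above and assuming roughly 7.1 yuan per dollar for the conversion:

```python
# Back-of-the-envelope cost at DeepSeek-V2's published API prices:
# 1 yuan per million input tokens, 2 yuan per million output tokens.
def deepseek_cost_yuan(input_tokens: int, output_tokens: int) -> float:
    return input_tokens / 1e6 * 1.0 + output_tokens / 1e6 * 2.0

# Example workload: 10M input + 5M output tokens
cost = deepseek_cost_yuan(10_000_000, 5_000_000)
print(f"{cost} yuan (~${cost / 7.1:.2f})")  # 20.0 yuan (~$2.82)
```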

"The most embarrassing thing for the United States is the significant contribution of China's open source models today"

Cheap as it is, DeepSeek doesn't lose money. It can achieve a peak throughput of 50,000 tokens per second on a machine with 8 H800 GPUs. At the output API price, that equates to $50.4 in revenue per node per hour. An 8xH800 node in China costs about $15/hour, so assuming perfect utilization, DeepSeek's profit per server is as high as $35.4 per hour, a gross margin of more than 70%.
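
That margin claim is easy to verify. The sketch below reproduces the numbers above, assuming roughly 7.14 yuan per dollar and all traffic billed at the 2 yuan per million output rate:

```python
# Reproducing the per-node margin arithmetic above. Assumptions: ~7.14
# yuan/USD, and all tokens billed at the 2 yuan/M output-token rate.
tokens_per_hour = 50_000 * 3600                   # 180M tokens/hour at peak
revenue_usd = tokens_per_hour / 1e6 * 2.0 / 7.14  # ~ $50.4/hour per node
cost_usd = 15.0                                   # approx. 8xH800 node rental
profit_usd = revenue_usd - cost_usd               # ~ $35.4/hour
print(f"revenue={revenue_usd:.1f}, profit={profit_usd:.1f}, "
      f"margin={profit_usd / revenue_usd:.0%}")   # margin = 70%
```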

In addition, the DeepSeek platform provides an OpenAI-compatible API, with 5 million free tokens on sign-up.
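
Because the API follows the OpenAI schema, the official openai Python client works against it by just swapping the base URL. A sketch, where the endpoint and model name are assumptions drawn from DeepSeek's public docs:

```python
# Sketch of calling DeepSeek's OpenAI-compatible API with the official
# `openai` client (v1+). The base_url and model name are assumptions
# based on DeepSeek's public documentation and may change.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",
    base_url="https://api.deepseek.com",  # assumed endpoint
)

resp = client.chat.completions.create(
    model="deepseek-chat",  # assumed name serving DeepSeek-V2
    messages=[{"role": "user", "content": "Summarize MoE in one sentence."}],
)
print(resp.choices[0].message.content)
```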

Efficient, easy to use, and priced through the floor: isn't that exactly what the open-source community desperately needs?

On May 7, DeepSeek V2 was dubbed "the mysterious force rising in the East," achieving "economic crushing" of other models through its ultra-high cost-effectiveness, with commentators pointing out that "the industry challenges to OpenAI and Microsoft may not come only from within the United States."

"The most embarrassing thing for the United States is the significant contribution of China's open source models today"

Philipp Schmid, Technical Director of Hugging Face, posted on X, listing DeepSeek V2's various strengths and recommending it to the community. Within just four days of launch, the model had been downloaded 3,522 times on Hugging Face and had quickly racked up 1,200 stars on GitHub.

"The most embarrassing thing for the United States is the significant contribution of China's open source models today"

Facewall Intelligence: taking a different path, punching above its weight

On the road to AGI, some, like DeepSeek, treat computing power as king and focus on economic efficiency; others, like Tongyi Qianwen, let a hundred flowers bloom across every model scale; but the vast majority of companies follow the scaling law and race toward ever-larger parameter counts.

Facewall Intelligence, however, takes the opposite route: making the parameters as small as possible, maximizing model efficiency with a lower deployment threshold and lower usage cost, and "achieving big things with a small size."

On February 1 this year, Facewall Intelligence launched MiniCPM-2B with only 2.4 billion parameters. It not only leads Google's Gemma 2B overall at the same scale, but also surpasses the performance benchmark Mistral-7B, and on some tasks outperforms Llama2-13B, Llama2-70B-Chat, and others.

After it was open-sourced to the overseas community, Thomas Wolf, co-founder of Hugging Face, immediately posted: "A series of amazing technical reports and open-source models have emerged in China, such as DeepSeek, MiniCPM, UltraFeedback... Their data and experimental results are shared publicly, and this candid sharing of knowledge has been lost in recent Western tech model releases."

Netizens retweeted in agreement: "MiniCPM is really impressive, with 2 billion parameters, and the best results from such a tiny model."

"The most embarrassing thing for the United States is the significant contribution of China's open source models today"

Another netizen who read the MiniCPM paper was even more excited: "Facewall Intelligence is setting off a game-changing revolution."

"Imagine having powerful AI in your pocket, not just the cloud. The MiniCPM-2B is no ordinary model. With only 2.4 billion parameters, it surpasses its own 5 times the size of the giant! Size isn't the only criterion, it's all about how you use it. This is the vision of the future of AI at the edge, potentially redefining our interactions with technology. ”

"The most embarrassing thing for the United States is the significant contribution of China's open source models today"

Seventy days later, Facewall Intelligence pressed its advantage and released a new generation, MiniCPM-V 2.0, billed as "the strongest end-side multimodal large model that can run on a phone," at a parameter scale of 2.8B.

According to its Hugging Face model card, MiniCPM-V 2.0 reaches the best level in the open-source community on multiple benchmarks, including OCRBench, TextVQA, and MME. In OpenCompass's comprehensive evaluation covering 11 popular benchmarks, it outperforms Qwen-VL-Chat 9.6B, CogVLM-Chat 17.4B, and Yi-VL 34B. In scene-text understanding it even approaches the performance of Gemini Pro.

"Compared to Mistral, many models in China are really open source"

Beyond DeepSeek, Qwen, and MiniCPM, Chinese open-source models such as InternLM, jointly developed by Shanghai AI Laboratory and SenseTime, the Yi series from 01.AI (Zero One Things), and Zhipu AI's multimodal large model CogVLM are also very popular in the developer community.

People have also noted on Twitter that, due to the language barrier between Chinese and English, the large models visible overseas are only part of what has been released, and many AI applications and integrations have not been fully shown. The speculation is that these models should perform even better in Chinese than in English; but even so, they are quite competitive on English benchmarks.

"The most embarrassing thing for the United States is the significant contribution of China's open source models today"

Others said they were shocked by the sheer number of Chinese authors on AI papers on arXiv over the past year.

"The most embarrassing thing for the United States is the significant contribution of China's open source models today"

Chip Huyen, a former Stanford adjunct lecturer and co-founder of Claypot AI, shared her findings on her blog after researching 900 popular open-source AI tools: "Six of the top 20 accounts on GitHub are from China."

"The most embarrassing thing for the United States is the significant contribution of China's open source models today"

One of the benefits of open source is that it makes conspiracy theories untenable.

Vinod Khosla, an early investor in OpenAI, once posted on X that America's open-source models would be copied by China.

But the remark was immediately rebutted by Meta's AI godfather Yann LeCun: "AI is not a weapon. Whether we open-source our technology or not, China will not be left behind. They will take control of their own AI and develop their own homegrown tech stack."

"The most embarrassing thing for the United States is the significant contribution of China's open source models today"

Moreover, the sincerity of their open-source efforts has begun to win Chinese models recognition from developers. A student at Stanford shared that a professor praised China's open-source models in class, especially for openly and honestly sharing results with the community, unlike some star companies in Europe and the United States that are "open source" in name only. Some netizens voiced a similar view: "The most embarrassing thing for the United States is the significant contribution of China's open source models today."

"The most embarrassing thing for the United States is the significant contribution of China's open source models today"

Open source is destined to keep playing an important role in the development of large-model technology, and this is the first time open-source and closed-source technology have advanced almost neck and neck. In this wave, China's open-source contributors are giving back to the global community through more sincere open-source products.
