
Breaking the open-source ceiling: Tongyi Qianwen open-sources its 72B and 1.8B models, and the smallest runs on-device

Author: Heart of the Machine Pro

A Heart of the Machine original

Authors: Zhang Qian, Du Wei

At present, the Tongyi Qianwen open-source lineup includes four base models with 1.8 billion, 7 billion, 14 billion, and 72 billion parameters, as well as open-source models spanning multiple modalities such as language, image, and audio.

"The Qwen-72B model will be released on November 30." A few days ago, a netizen posted this message on the X platform, citing a conversation as the source. He added, "It would be amazing if [the new model] was like their 14B model."


A netizen reposted the post with the caption "The Qianwen models have been performing well lately".

The 14B model in this message refers to Qwen-14B, Tongyi Qianwen's 14-billion-parameter model, which Alibaba Cloud open-sourced in September. At the time, it surpassed models of the same size on many authoritative evaluations, with some metrics even approaching Llama2-70B, and it proved very popular in developer communities at home and abroad. Over the following two months, developers using Qwen-14B naturally grew curious about, and began looking forward to, a larger model.


It seems that Japanese developers are also looking forward to it.

As the message predicted, Qwen-72B was open-sourced on November 30, single-handedly turning the attention of foreign developers chasing the open-source wave toward Hangzhou.


Alibaba Cloud also announced a lot of details at today's press conference.


Looking at the performance data, Qwen-72B lives up to expectations. Across 10 authoritative benchmarks such as MMLU and AGIEval, Qwen-72B achieved the best results among open-source models, even surpassing the open-source benchmark Llama 2-70B and most commercial closed-source models (outperforming GPT-3.5 and GPT-4 on some metrics).

Notably, before this the Chinese large-model market had no high-quality open-source model strong enough to rival Llama 2-70B; Qwen-72B fills that gap. Going forward, large and medium-sized domestic enterprises can build commercial applications on its powerful reasoning capabilities, while universities and research institutes can use it for scientific work such as AI for Science.


In addition, a small model, Qwen-1.8B, and an audio model, Qwen-Audio, were released at the same time. Qwen-1.8B and Qwen-72B bracket the lineup at the small and large ends; together with the previously open-sourced 7B and 14B models, they form a complete open-source spectrum suited to a wide range of application scenarios. Qwen-Audio, the previously open-sourced visual understanding model Qwen-VL, and the underlying text models form a multimodal spectrum that helps developers extend large-model capabilities into more real-world settings.


Qwen-1.8B, Tongyi Qianwen's smallest open-source model, needs only 3 GB of GPU memory to run inference on 2K-token inputs. Developers who want to deploy language models on phones and other edge devices may want to give it a try.
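As a rough sanity check on that memory figure, here is a back-of-envelope sketch (my own illustrative arithmetic, not Alibaba Cloud's official numbers) of the memory needed just to hold 1.8 billion parameters at different precisions; actual usage also includes the KV cache, activations, and framework overhead.

```python
# Back-of-envelope estimate of weight memory for a 1.8B-parameter model.
# Illustrative only: real inference also needs KV cache and runtime overhead.

def weight_memory_gb(n_params: float, bytes_per_param: float) -> float:
    """Memory needed just to hold the weights, in gibibytes."""
    return n_params * bytes_per_param / 1024**3

N = 1.8e9  # 1.8 billion parameters

fp16 = weight_memory_gb(N, 2)    # 16-bit floats: 2 bytes per parameter
int4 = weight_memory_gb(N, 0.5)  # 4-bit quantization: 0.5 bytes per parameter

print(f"fp16 weights: {fp16:.2f} GB")  # ~3.35 GB
print(f"int4 weights: {int4:.2f} GB")  # ~0.84 GB
```

The fp16 figure already lands close to the quoted 3 GB once weights are lightly quantized, which is consistent with the claim that the model fits on very modest hardware.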

This kind of "full-size, full-modality" open-source effort is unmatched in the industry, and Qwen-72B raises the bar for both the size and the performance of open-source models. To verify the capabilities of this open-source model, Heart of the Machine tried it out on Alibaba Cloud's ModelScope community and looked into what makes the Tongyi Qianwen open-source models attractive to developers.

First-hand experience:

Stronger reasoning and the ability to customize roles

The following figure shows the user interface of Qwen-72B. Enter your question or other prompt in the "Input" box at the bottom, and the answer appears in the box in the middle. Qwen-72B currently supports both Chinese and English input, a major difference between Tongyi Qianwen and Llama2; Llama2's poor Chinese support has long been a headache for many domestic developers.


Experience address: https://modelscope.cn/studios/qwen/Qwen-72B-Chat-Demo/summary

We learned that on Chinese tasks, Qwen-72B tops evaluations such as C-Eval, CMMLU, and Gaokao, excelling especially at complex semantic understanding and logical reasoning. First, given an ambiguous sentence built around characters from Chinese martial-arts novels, Qwen-72B clearly distinguishes the several different meanings of "pass".


Another similarly dizzying sentence is also explained very clearly.


In the classic "farmer, fox, rabbit, and turnip" river-crossing puzzle, Qwen-72B also answers fluently.
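For reference, the river-crossing puzzle above can be solved mechanically with a breadth-first search over bank states; the sketch below is my own illustration of the puzzle (not how Qwen solves it) and finds the classic shortest plan of 7 crossings.

```python
# Breadth-first search over the classic river-crossing puzzle: the farmer
# may ferry at most one item per trip, and the fox/rabbit or rabbit/turnip
# pair must never be left on a bank without the farmer.
from collections import deque

ITEMS = ("farmer", "fox", "rabbit", "turnip")
UNSAFE = [{"fox", "rabbit"}, {"rabbit", "turnip"}]

def safe(state):
    # state: frozenset of items on the left bank; the rest are on the right.
    left = set(state)
    right = set(ITEMS) - left
    for bank in (left, right):
        if "farmer" not in bank and any(pair <= bank for pair in UNSAFE):
            return False
    return True

def solve():
    start, goal = frozenset(ITEMS), frozenset()
    queue = deque([(start, [start])])
    seen = {start}
    while queue:
        state, path = queue.popleft()
        if state == goal:
            return path
        # The farmer moves from whichever bank he is on, alone or with cargo.
        here = state if "farmer" in state else frozenset(ITEMS) - state
        for cargo in [None] + [x for x in here if x != "farmer"]:
            moved = {"farmer"} | ({cargo} if cargo else set())
            nxt = state - moved if "farmer" in state else state | moved
            if safe(nxt) and nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [nxt]))

plan = solve()
print(f"solved in {len(plan) - 1} crossings")  # 7
```

BFS guarantees the shortest plan, which matches the well-known seven-crossing answer the model is expected to reproduce.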


Since Qwen-72B supports English input, let's also test its bilingual interaction skills.


Qwen-72B also understands authentic American slang very well.


Math experts are online

Mathematics has always been an important yardstick for large models. The data shows that Qwen-72B holds a decisive lead over other open-source models on tests such as MATH, so how does it fare in practice? First, we test it with a classic craps probability problem; clearly, it is not stumped.
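As a baseline for this kind of question, two-dice probabilities can be checked by brute-force enumeration; the snippet below is illustrative (not the exact problem posed to the model) and confirms, for example, that a sum of 7 comes up with probability 1/6.

```python
# Enumerate all 36 equally likely outcomes of two fair dice and count
# how often each sum occurs.
from fractions import Fraction
from itertools import product

counts = {}
for a, b in product(range(1, 7), repeat=2):
    counts[a + b] = counts.get(a + b, 0) + 1

p_seven = Fraction(counts[7], 36)
print(p_seven)  # 1/6: six of the 36 outcomes sum to 7
```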


The classic "chickens and rabbits in one cage" problem comes next; the answer is correct, though the solution process is a bit unusual.
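For comparison, this puzzle reduces to two linear equations, c + r = heads and 2c + 4r = legs. A minimal solver, using the common textbook instance of 35 heads and 94 legs as an assumed example (the article does not show the exact numbers posed to the model):

```python
# "Chickens and rabbits in one cage": solve c + r = heads, 2c + 4r = legs.

def chickens_and_rabbits(heads: int, legs: int):
    """Return (chickens, rabbits), or raise if no valid integer solution."""
    rabbits, rem = divmod(legs - 2 * heads, 2)  # each rabbit adds 2 extra legs
    if rem or rabbits < 0 or rabbits > heads:
        raise ValueError("no integer solution")
    return heads - rabbits, rabbits

chickens, rabbits = chickens_and_rabbits(35, 94)
print(chickens, rabbits)  # 23 12
```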


The two-bottles-of-water problem can also be solved.


Playing Lin Daiyu and Confucius

Giving the large model a personalized role is a major feature of Qwen-72B. Thanks to its strong system-prompt capabilities, you only need to set a prompt to customize your AI assistant, giving it a unique character, personality, and tone.
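In practice, this kind of role customization usually means putting the persona into a system message of the chat payload. A minimal sketch in the common chat-messages format, with a hypothetical persona string; the actual endpoint and model name depend on how you deploy Qwen-72B-Chat:

```python
# Sketch of role customization via a system prompt. The persona text is
# illustrative; endpoint/model details depend on your Qwen deployment.

def make_persona_messages(persona: str, user_input: str):
    """Build a chat payload whose system message fixes the assistant's role."""
    return [
        {"role": "system", "content": f"You are {persona}. Always stay in character."},
        {"role": "user", "content": user_input},
    ]

messages = make_persona_messages(
    "Lin Daiyu from Dream of the Red Chamber, speaking in her melancholy, poetic tone",
    "What do you think of the falling blossoms today?",
)
print(messages[0]["role"])  # system
```

The same pattern covers the dialect examples below: only the persona string changes.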

First, let's have it reply in Lin Daiyu's tone.


Have it play Confucius, and his earnest teachings come vividly to life.


It can also blurt out the dialects of Northeast China, Tianjin, and other regions.


According to the technical materials released by Alibaba Cloud, the strong inference performance of Qwen-72B is inseparable from optimizations in data and training.

At the data level, Tongyi Qianwen has so far been trained on up to 3T tokens of data, with a vocabulary of up to 150,000 entries. According to the Tongyi Qianwen team, the model is still in training and will be fed more high-quality data in the future.

In terms of model training, the team combined data parallelism (DP), tensor parallelism (TP), pipeline parallelism (PP), sequence parallelism (SP), and other methods for large-scale distributed training, and introduced efficient operators such as FlashAttention v2 to speed up training. With the topology-aware scheduling mechanism of Alibaba Cloud's AI platform PAI, the communication cost of large-scale training was effectively reduced and training speed increased by 30%.

Where did the more than 1.5 million cumulative downloads come from?

Judging from the evaluation results above, the Tongyi Qianwen open-source series, represented by Qwen-72B, does give developers many reasons to choose it, such as stronger Chinese capabilities than Llama 2.

Chen Junbo, founder and CEO of Youlu Robotics, said that when building their product they experimented with every large model they could find on the market and finally chose Tongyi Qianwen, because "it is one of the most capable open-source large models available, at least in the Chinese domain."

Tao Jia, a specialist at the Zhejiang Electric Power Design Institute of China Energy Engineering Group, noted that foreign models such as GPT-4 are very capable, but API access is inconvenient, B-end users prefer to customize models themselves, and there is still too little that an API alone can do.


The customizability of the model is also a point Chen Junbo cares about. He said that what they need is not a large language model with a fixed level of intelligence, but one that grows more intelligent as enterprise data accumulates: "Closed-source large models obviously can't do this, so in our line of business, the endgame must be an open-source model."

Asked what it is like to build applications on the Tongyi Qianwen open-source models, Tao Jia said, "Of the several open-source models I have tried, Tongyi Qianwen is the best: not only are its answers accurate, but the 'feel' is also very good. Feel is a rather subjective thing, but overall it fits my needs best, without those strange bugs."

In fact, when it comes to "needs", almost every B-end user's needs come down to "cutting costs and improving efficiency", which is another advantage of open-source models. According to a September statistic, Llama2-70B is about 30 times cheaper than GPT-4; even after OpenAI announced price cuts, Llama2-70B retains a severalfold cost advantage, to say nothing of derivative open-source models smaller than 70B. This is very attractive to businesses.


Source: https://promptengineering.org/how-does-llama-2-compare-to-gpt-and-other-ai-language-models/

For example, Wang Zhaotian, product lead of Lingyang Quick BI, a data enterprise-service brand, said that one major advantage of Qianwen is that it is lightweight and "can be deployed and used on low-cost hardware", which allowed Quick BI to seize the opportunity to launch "Smart Q", an intelligent data assistant built on the Tongyi Qianwen large model, and capture user mindshare.


A remark from Qin Xuye, co-founder and CEO of Future Speed, may resonate with many companies. He said that enterprise users care more about solving their problems than about comprehensive model capabilities. Enterprises' "problems" vary in difficulty, and their budgets, compute, and deployment requirements differ widely, so they place high demands on a model's flexibility and cost-effectiveness. For example, some enterprises want to run large models on phones and other devices, while others have relatively ample compute but need models with stronger reasoning. Tongyi Qianwen gives developers exactly these choices: from 1.8B to 72B, and from text to audio to images, it is a rich open-source portfolio in which there is always a model that fits.


On multiple authoritative test sets, Qwen-1.8B, Tongyi Qianwen's 1.8-billion-parameter open-source model, far exceeds previous SOTA models of its size.

That's not all, though. For developers and enterprises that choose open source, whether the model is sustainable and whether its ecosystem is rich matter just as much.

"We don't have the resources to train a foundation model from scratch, so the first consideration in choosing a model is whether the institution behind it can credibly stand behind it and keep investing in the base model and its ecosystem." These are some of the criteria Yan Xin, a core member of X-D Lab at East China University of Science and Technology, uses to judge whether a model is sustainable.

Clearly, after watching the "war of a hundred models" in the first half of the year, he also worries that the model he picks could end up a casualty of the competition. To avoid that, he chose Alibaba Cloud, the only one of the major domestic vendors to have open-sourced large models. Moreover, beyond Tongyi Qianwen, more than half of China's leading large models run on Alibaba Cloud, so its investment in, and commitment to, infrastructure are beyond doubt.

In addition, Alibaba Cloud has been working on large models for years: it began large-model research in 2018, and in 2023 it signaled it was going "all in" on large models. These signals reassure developers who care about sustainability. Yan Xin commented, "That Alibaba Cloud can open-source a model as large as Tongyi Qianwen 72B shows it is determined, and able, to keep investing in open source."

On the ecosystem front, Yan Xin also shared his considerations: "We hope to choose a mainstream, stable model architecture that can maximize the power of the ecosystem and mesh with the upstream and downstream environment."

In fact, this is also an advantage of the Tongyi Qianwen open-source models. Because it open-sourced early, Alibaba Cloud's open-source ecosystem has already begun to take shape: cumulative downloads of the Tongyi Qianwen open-source models have exceeded 1.5 million, spawning dozens of new models and applications. These developers feed ample real-world feedback back to Tongyi Qianwen, letting the team continuously improve the open-source base models.

The supporting services within the community are another draw. Chen Junbo said, "Tongyi Qianwen provides a very convenient toolchain that lets us quickly fine-tune and run all kinds of experiments on our own data. And its service is excellent; any need of ours gets a fast response. Most current open-source model providers can't do that."

Yann LeCun:

Open source is good for both AI development and social development

In the blink of an eye, ChatGPT has been out for a year, a year in which open-source models have been racing to catch up. Throughout, there has been much debate over whether large models should be open-source or closed-source.

In a recent interview, Yann LeCun, chief AI scientist at Meta and a Turing Award winner, explained why he has been committed to open source all along. He believes AI will become a repository of all human knowledge, a repository everyone needs to contribute to, and that is something only open source can achieve. He has also said that open-source models help empower more people and businesses with the most advanced technology, address potential weaknesses, reduce social disparities, and improve competition.

At the press conference, Zhou Jingren, CTO of Alibaba Cloud, reiterated the company's commitment to open source, saying that Tongyi Qianwen will stay open-source and hopes to build "the most open large model of the AI era". It seems even larger open-source models can be expected, wave after wave.
