
Tongyi Qianwen's open-source family bucket explodes onto the scene! 72 billion parameters overtake Llama 2, and it writes code straight from a picture

Author: New Zhiyuan

Editor: Editorial Department

The industry's strongest 72B model directly surpasses the open-source benchmark Llama 2-70B, and both the 1.8B model and the audio model are open-sourced alongside it. This time, Alibaba Cloud has truly gone all in.

Just now, the AI community was swept by this chart.

Alibaba Cloud has open-sourced Qwen-72B, a model with 72 billion parameters whose performance directly surpasses the open-source benchmark Llama 2-70B across 10 evaluations.


Among domestic open-source models, parameter counts this large are rare.

Before this, the domestic large-model market had very few high-quality open-source models capable of benchmarking against Llama 2-70B.


"Just a few weeks ago, I was a fan of Mistral. Who would have thought that in just two or three weeks, the AI world has already changed!"

Such excellent performance, paired with a 32K-long context, amazed netizens.


Meanwhile, netizens are eagerly awaiting an int8 version of the 1.8B model: "A large wave of TPU- and NPU-equipped ARM computers is coming."


From 1.8 billion to 7 billion, 14 billion, and 72 billion parameters, all four large language models are now open source.

Not only that: the release also includes two multimodal large models, one for visual understanding and one for audio understanding.


This time, Alibaba Cloud can be said to have held nothing back; the industry has never seen a precedent for open source that is this "full-size, full-modal".

This also means that from now on, whether for consumer-grade devices or high-performance enterprise applications, there is an abundance of domestic open-source large models to choose from!

Benchmarking Llama: the industry's strongest 72-billion-parameter large model goes open source

With 72 billion parameters, Qwen-72B is directly comparable to Llama 2-70B in size; its performance reaches the top tier of open-source large models and surpasses most commercial closed-source models.


Across 10 authoritative evaluations, Tongyi Qianwen's 72-billion-parameter model took the best scores among open-source models.

Trained on 3T tokens of high-quality data, Qwen-72B achieves a comprehensive performance upgrade through a larger parameter scale and more training data.

In language ability, Qwen-72B performed excellently, achieving the highest open-source score on the English MMLU benchmark. On Chinese tasks, it topped C-Eval, CMMLU, GaokaoBench, and other evaluations, even scoring better than GPT-4.

Seeing these evaluation results, the editor couldn't resist and immediately put it to the test.

Its grasp of the meaning of classical Chinese is well in hand.


Trick questions designed to trip up language models are no problem for Qwen-72B.


In mathematical reasoning, Qwen-72B leads other open-source models on the GSM8K and MATH tests.

Its code understanding has also taken a qualitative leap, with greatly improved performance on tests such as HumanEval and MBPP.


Faced with high-school math problems, Qwen-72B not only lays out a clear line of attack but also computes the correct result.


Which is bigger: 0.999 repeating, or 1? This is a classic question in mathematics, yet it has stumped many people.

Sure enough, Qwen-72B gave the correct answer: 0.999...=1.
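For readers who want the reasoning behind that answer, the standard algebraic argument is only a few lines long (this is textbook math, not Qwen's own output):

```latex
\begin{aligned}
x &= 0.999\ldots \\
10x &= 9.999\ldots \\
10x - x &= 9.999\ldots - 0.999\ldots = 9 \\
9x &= 9 \quad\Longrightarrow\quad x = 1
\end{aligned}
```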


Heaven and hell each have a gate, and each gate has a doorman; one always tells the truth and the other always lies, and you may put a single question to only one of them. How do you find the gate to heaven?

Complex logical reasoning of this kind is also well within Qwen-72B's reach.
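The classic solution is to ask either doorman which gate the other doorman would say leads to heaven, then take the opposite gate. A minimal brute-force check of that strategy (the function and door names are my own illustration, not from the article):

```python
from itertools import product

def other_guard_answer(heaven_door, asked_is_liar):
    """Answer to: 'Which door would the OTHER doorman say leads to heaven?'
    With only two doors, lying simply means naming the other door."""
    doors = ("left", "right")
    hell_door = doors[1 - doors.index(heaven_door)]
    other_is_liar = not asked_is_liar
    # What the other doorman would actually say:
    other_says = hell_door if other_is_liar else heaven_door
    # The asked doorman reports it, flipping the answer if he is the liar:
    if asked_is_liar:
        return heaven_door if other_says == hell_door else hell_door
    return other_says

# In every configuration the answer points at hell,
# so walking through the OTHER door always reaches heaven.
for heaven_door, asked_is_liar in product(("left", "right"), (False, True)):
    pointed = other_guard_answer(heaven_door, asked_is_liar)
    chosen = "right" if pointed == "left" else "left"
    assert chosen == heaven_door
```

Whichever doorman you happen to ask, both cases collapse to the same observable answer, which is exactly why one question suffices.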


32K ultra-long context, handled with ease

Beyond raw generation ability, with the debut of GPT-4's 128K and Claude's 200K contexts, the race to process ultra-long text sequences is also heating up.

But in practice, few models can fully understand and utilize all the information in long documents.

Some time ago, tech blogger Greg Kamradt used a method called "needle in a haystack" to test the long-context capabilities of GPT-4 and Claude 2.1: take a long document, insert a sentence into it as the information to be retrieved, and then ask the model a question about it at the end of the document.

The results show that Claude 2.1, nominally 200K, suffers noticeable performance degradation from 90K tokens onwards, and a similar pattern appears with GPT-4's 128K.


In contrast, Qwen-72B can accurately retrieve information placed at essentially any position within its 32K window.
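The shape of Kamradt's harness is easy to reproduce. The sketch below puts a trivial substring-matching stand-in where the real LLM API call would go, since the point here is the document construction, not the model (all names are my own illustration):

```python
NEEDLE = "The secret passphrase is 'blue elephant'."

def build_haystack(filler_sentences, needle, depth_pct):
    """Insert the needle at depth_pct percent of the way through the filler."""
    cut = int(len(filler_sentences) * depth_pct / 100)
    return " ".join(filler_sentences[:cut] + [needle] + filler_sentences[cut:])

def toy_model(context, question):
    # Stand-in for a real LLM call; a real harness would send `context`
    # plus `question` to the model and grade the free-form answer.
    return NEEDLE if NEEDLE in context else "not found"

# Sweep the needle's depth through the document, as in the original test.
filler = [f"Filler sentence number {i}." for i in range(1000)]
for depth in (0, 25, 50, 75, 100):
    context = build_haystack(filler, NEEDLE, depth)
    answer = toy_model(context, "What is the secret passphrase?")
    assert answer == NEEDLE  # the needle is recoverable at every depth
```

A real run would repeat this sweep at increasing context lengths and plot retrieval accuracy per (length, depth) cell, which is exactly what the comparison charts show.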


Looking at these three comparison charts side by side, the gap really does speak for itself.

A problem that stumped GPT-4 and Claude was easily solved by Qwen-72B; no wonder foreign developers exclaimed over it again and again.

One prompt to customize any persona

Beyond the longer context, Qwen-72B also comes with powerful system-prompt capabilities.

With a prompt, we can customize our own AI assistant and let the large model role-play.

For example, you can have it play Zhen Huan: "Do you love the Emperor, or Prince Guo?"


Experience address: https://modelscope.cn/studios/qwen/Qwen-72B-Chat-Demo/summary

You can also have it play an anime-style cute girl.


Bear in mind that customizing a persona is actually quite technically demanding.

In role-play, the AI assistant must not forget its persona even after many rounds of dialogue, which requires the system instruction to remain stable across turns. The assistant also needs to infer how to behave from its persona settings.

System instructions, in turn, provide an easy-to-compose, contextually stable way for us to steer the AI.

For example, if it's a cat, it's going to meow.


Qwen-72B was trained on a wide variety of system instructions involving multiple rounds of complex interaction, so the model can follow diverse system instructions, customize its behavior in context, and further improve its extensibility.
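Concretely, Qwen's chat models consume prompts in the ChatML format, where the system message is pinned at the top on every turn; that pinning is what keeps the persona stable across rounds. A minimal sketch of assembling such a prompt (the helper name is my own):

```python
def build_chatml_prompt(system, turns):
    """Assemble a ChatML-style prompt from a system message and prior turns.

    `turns` is a list of (role, text) pairs, with roles "user" or "assistant".
    The system message always comes first, so the persona survives every round.
    """
    parts = [f"<|im_start|>system\n{system}<|im_end|>"]
    for role, text in turns:
        parts.append(f"<|im_start|>{role}\n{text}<|im_end|>")
    parts.append("<|im_start|>assistant\n")  # cue the model to answer next
    return "\n".join(parts)

prompt = build_chatml_prompt(
    "You are a cat. Reply only with meows.",
    [("user", "Hello!"), ("assistant", "Meow!"), ("user", "What is 2+2?")],
)
print(prompt)
```

In practice you would not build this string by hand; the model's tokenizer applies the same template for you, but the structure above is what the model actually sees.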

Similarly, OpenAI's "GPTs" allow the model to be customized and extended; within a day of launch, thousands of apps had been created.

For example, some developers built a "boyfriend GPT" modeled on their boyfriend's persona, alongside research GPTs, game-generation GPTs, and more.


Full size open source

This time, Alibaba Cloud has not just filled the gap at the 70B scale: it has also open-sourced Qwen-1.8B, a small model with 1.8 billion parameters.

If Qwen-72B "pushes upward", raising the size and performance ceiling of open-source models, then the open-sourcing of Qwen-1.8B "probes downward", making it the smallest Chinese open-source model on the market.

A model's parameter count correlates positively with its compute consumption; in short, the larger the model, the higher the cost of training and inference.


For many developers and enterprises, however, a smaller model means lower development cost; provided the accuracy of training and inference is kept under control, this helps bring large-model technology to everyone.

Even Microsoft is bullish on small models. At the recent Ignite conference, Nadella announced that Phi-2, a model with only 2.7 billion parameters, would be open-sourced in the future.


By comparison, Qwen-1.8B's biggest advantage is that it needs less than 1.5GB of memory for inference at minimum, which unlocks many on-device applications.

Moreover, fine-tuning requires no more than 6GB at minimum, and runs more than 3 times faster than fine-tuning the 7B model.
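That sub-1.5GB figure is consistent with back-of-envelope weight-memory arithmetic: parameter count times bits per parameter, which is exactly what quantization shrinks. A quick sketch (the helper is my own; real memory use adds activation and KV-cache overhead on top of the weights):

```python
def approx_weight_memory_gb(n_params, bits_per_param):
    """Rough memory for model weights alone: params * bits / 8 bytes, in GiB."""
    return n_params * bits_per_param / 8 / 1024**3

# 1.8 billion parameters at different precisions:
for bits in (16, 8, 4):
    print(f"{bits}-bit: {approx_weight_memory_gb(1.8e9, bits):.2f} GiB")
```

At 4-bit precision the weights alone come to roughly 0.84 GiB, which leaves headroom under the quoted 1.5GB minimum; at 16-bit, the same weights would already exceed 3 GiB.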

Across a number of authoritative evaluation sets, Qwen-1.8B far exceeds previous SOTA models of the same scale.


From 1.8 billion, 7 billion, 14 billion to 72 billion parameters, this kind of full-size, full-modal open source is the first case in China.

Since August this year, the 7-billion-parameter Qwen-7B, the visual understanding model Qwen-VL, and the 14-billion-parameter Qwen-14B have been open-sourced on ModelScope, successively climbing the HuggingFace and GitHub large-model charts, with cumulative downloads exceeding 1.5 million and more than 150 new models and applications built on top of them.


Multimodal exploration

This time, the Tongyi Qianwen team is also a step ahead of the industry in multimodal large models, open-sourcing Qwen-Audio, a large audio-understanding model, for the first time.


Address: https://arxiv.org/abs/2311.07919

Unlike traditional speech models, Qwen-Audio can perceive and understand all kinds of audio signals: human speech, natural sounds, animal sounds, and music.

On top of Qwen-Audio, the team used instruction fine-tuning to build Qwen-Audio-Chat, which supports multi-turn dialogue and diverse audio scenarios, much like the voice feature OpenAI launched in September that lets users talk to ChatGPT directly by speaking.


Now, send a voice clip to Qwen-Audio and it can "hear" you and "reply" with what you want to know.

It can even be used for literary creation, logical reasoning, story continuation, and so on based on audio.


Experience address: https://modelscope.cn/studios/qwen/Qwen-Audio-Chat-Demo/summary

The Tongyi model can not only "hear", but also "see".

In August this year, Tongyi Qianwen open-sourced the Qwen-VL visual understanding model, giving large models visual abilities close to a human's.


Multimodal models are regarded as one of the important directions for the evolution of general artificial intelligence technology.

Moving from language models that only handle text to models that can understand and generate audio, images, and video, a full set of "senses", holds the promise of a huge leap in large-model intelligence.

On December 1, Qwen-VL received another major update: besides big improvements to core capabilities such as general OCR, visual reasoning, and Chinese text understanding, it can now process images of various resolutions and aspect ratios, and even the tricky request of "solving a problem from a picture" is handled with ease.

Upload a road-sign photo, and Qwen-VL can make driving decisions.


Upload a primary-school programming problem, and it reads the picture and writes the code directly.


It can recognize handwriting and hand-drawn sketches.


Upload a passage of English, and the model recognizes the original text and translates it into Chinese.


Open source or closed source: does it matter?

It has been a full year since ChatGPT was born. Everyone has witnessed the great changes that this AI weapon has brought to the whole world.

By one incomplete count, the total number of large models in China has exceeded 200, and open-source models abound among them.

Amid all this, the battle between open source and closed source, just as with operating systems, has never stopped.

Turing Award winner Yann LeCun once said that closed source proves the feasibility of the large model route, while open source makes large models easy to use and usable through a thriving ecosystem.


While the industry debates hotly, users' choice of model mainly comes down to two considerations.

First, whether the performance of the model meets the needs of your scenario.

If you want to use it out of the box, you will most likely choose a closed-source model product that has already been packaged. If you need deep customization and have clear requirements for security and privacy, you are likely to choose the open-source model.

As we all know, different open-source models have different training data, in addition to differences in parameters. Naturally, there will be advantages and disadvantages in terms of performance and knowledge structure.

From the evaluation results above, the Tongyi Qianwen open-source family can, in terms of performance, support a wide range of AI applications.

For example, pair the Tongyi Qianwen brain with a robot body and you get a real-life 007: the robotic arm heats a cake for you, opening the oven, placing the cake, and turning the knob to heat it, the whole process nothing short of silky smooth.


According to Yan Xin of the X-D Lab at East China University of Science and Technology, among the many open-source large models they tested at the time, Tongyi Qianwen performed best, especially on complex logical reasoning.

As a result, the team built three vertical-domain large models on top of Tongyi Qianwen. These models have since been used by more than 200,000 people and have served over 1 million Q&A requests.


Experience address: https://modelscope.cn/studios/X-D-Lab/MindChat/summary

The second is whether the model has an ecosystem around it and can be sustained.

Obviously, large models built merely to chase hype will see neither further updates nor a surrounding ecosystem.

By contrast, Alibaba Cloud has kept increasing its open-source investment in personnel, funding, and supporting tools, providing developers with a complete end-to-end service loop.

For example, the AI model community ModelScope, launched in 2022, now hosts 2,300+ models and serves more than 2.8 million developers, who can download models and call inference APIs on the platform.

Beyond that, developers can call model APIs through the "Lingji" platform, or use the "Bailian" platform to build custom large-model applications.

And the AI platform PAI, deeply adapted to the full range of Tongyi Qianwen models, offers services such as lightweight fine-tuning, full-parameter fine-tuning, distributed training, offline inference validation, and online service deployment.


As one of the important contributors to open source, Alibaba Cloud continues to give back to the community, this time with the 72-billion- and 1.8-billion-parameter models and the audio model Qwen-Audio.

Resources:

https://modelscope.cn/organization/qwen
