
Tongyi Qianwen's open-source family bucket explodes onto the scene! 72 billion parameters overtake Llama 2, and it writes code straight from a picture

Author: New Zhiyuan

Editor: Editorial Department

The industry's strongest 72B model directly surpasses the open-source benchmark Llama 2-70B, and both the 1.8B model and the audio model are open-sourced alongside it. This time, Alibaba Cloud has truly gone all in.

Just now, the AI community was swept by this chart.

Alibaba Cloud has open-sourced Qwen-72B, a model with 72 billion parameters whose performance directly surpasses the open-source benchmark Llama 2-70B across 10 evaluations.


Among domestic open-source models, parameter counts this large are rare.

Before this, the domestic large-model market had very few high-quality open-source models capable of benchmarking against Llama 2-70B.


"Just a few weeks ago, I was a fan of Mistral. Who would have thought that in just two or three weeks, the AI world has already changed!"

Such excellent performance, paired with a 32K-long context, amazed netizens.


Meanwhile, netizens are eagerly awaiting an int8 version of the 1.8B model: "A large wave of TPU- and NPU-equipped ARM computers is coming."


From 1.8 billion to 7 billion, 14 billion, and 72 billion parameters, all four large language models are now open source.

Not only that: the release also includes two multimodal large models, one for visual understanding and one for audio understanding.


This time, Alibaba Cloud can be said to have held nothing back; the industry has never seen a precedent for open source that is this "full-size, full-modal".

This also means that from now on, whether for consumer-grade devices or high-performance enterprise applications, there is an abundance of domestic open-source large models to choose from!

Benchmarking Llama: the industry's strongest 72-billion-parameter large model goes open source

With 72 billion parameters, Qwen-72B is directly comparable to Llama 2-70B in size; its performance reaches the top tier of open-source large models and surpasses most commercial closed-source models.


Across 10 authoritative evaluations, Tongyi Qianwen's 72-billion-parameter model took the best scores among open-source models.

Trained on 3T tokens of high-quality data, Qwen-72B achieves a comprehensive performance upgrade through a larger parameter scale and more training data.

In language ability, Qwen-72B performed excellently, achieving the highest open-source score on the English MMLU benchmark. On Chinese tasks, it topped C-Eval, CMMLU, GaokaoBench, and other evaluations, even scoring better than GPT-4.

Seeing these evaluation results, the editor couldn't resist and immediately put it to the test.

Its grasp of the meaning of classical Chinese is well in hand.


Trick questions designed to trip up language models are no problem for Qwen-72B.


In mathematical reasoning, Qwen-72B leads other open-source models on the GSM8K and MATH tests.

Its code understanding has also taken a qualitative leap, with greatly improved performance on tests such as HumanEval and MBPP.


Faced with high-school math problems, Qwen-72B not only lays out a clear line of attack but also computes the correct result.


Which is bigger: 0.999 repeating, or 1? This is a classic question in mathematics, yet it has stumped many people.

Sure enough, Qwen-72B gave the correct answer: 0.999...=1.
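For readers who want the reasoning behind that answer, the standard algebraic argument is only a few lines long (this is textbook math, not Qwen's own output):

```latex
\begin{aligned}
x &= 0.999\ldots \\
10x &= 9.999\ldots \\
10x - x &= 9.999\ldots - 0.999\ldots = 9 \\
9x &= 9 \quad\Longrightarrow\quad x = 1
\end{aligned}
```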


Heaven and hell each have a gate, and each gate has a doorman; one always tells the truth and the other always lies, and you may put a single question to only one of them. How do you find the gate to heaven?

Complex logical reasoning of this kind is also well within Qwen-72B's reach.
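The classic solution is to ask either doorman which gate the other doorman would say leads to heaven, then take the opposite gate. A minimal brute-force check of that strategy (the function and door names are my own illustration, not from the article):

```python
from itertools import product

def other_guard_answer(heaven_door, asked_is_liar):
    """Answer to: 'Which door would the OTHER doorman say leads to heaven?'
    With only two doors, lying simply means naming the other door."""
    doors = ("left", "right")
    hell_door = doors[1 - doors.index(heaven_door)]
    other_is_liar = not asked_is_liar
    # What the other doorman would actually say:
    other_says = hell_door if other_is_liar else heaven_door
    # The asked doorman reports it, flipping the answer if he is the liar:
    if asked_is_liar:
        return heaven_door if other_says == hell_door else hell_door
    return other_says

# In every configuration the answer points at hell,
# so walking through the OTHER door always reaches heaven.
for heaven_door, asked_is_liar in product(("left", "right"), (False, True)):
    pointed = other_guard_answer(heaven_door, asked_is_liar)
    chosen = "right" if pointed == "left" else "left"
    assert chosen == heaven_door
```

Whichever doorman you happen to ask, both cases collapse to the same observable answer, which is exactly why one question suffices.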


32K ultra-long context, handled with ease

Beyond raw generation ability, with the debut of GPT-4's 128K and Claude's 200K contexts, the race to process ultra-long text sequences is also heating up.

But in practice, few models can fully understand and utilize all the information in long documents.

Some time ago, tech blogger Greg Kamradt used a method called "needle in a haystack" to test the long-context capabilities of GPT-4 and Claude 2.1: take a long document, insert a sentence into it as the information to be retrieved, and then ask the model a question about it at the end of the document.

The results show that Claude 2.1, nominally 200K, suffers noticeable performance degradation from 90K tokens onwards, and a similar pattern appears with GPT-4's 128K.


In contrast, Qwen-72B can accurately retrieve information placed at essentially any position within its 32K window.
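The shape of Kamradt's harness is easy to reproduce. The sketch below puts a trivial substring-matching stand-in where the real LLM API call would go, since the point here is the document construction, not the model (all names are my own illustration):

```python
NEEDLE = "The secret passphrase is 'blue elephant'."

def build_haystack(filler_sentences, needle, depth_pct):
    """Insert the needle at depth_pct percent of the way through the filler."""
    cut = int(len(filler_sentences) * depth_pct / 100)
    return " ".join(filler_sentences[:cut] + [needle] + filler_sentences[cut:])

def toy_model(context, question):
    # Stand-in for a real LLM call; a real harness would send `context`
    # plus `question` to the model and grade the free-form answer.
    return NEEDLE if NEEDLE in context else "not found"

# Sweep the needle's depth through the document, as in the original test.
filler = [f"Filler sentence number {i}." for i in range(1000)]
for depth in (0, 25, 50, 75, 100):
    context = build_haystack(filler, NEEDLE, depth)
    answer = toy_model(context, "What is the secret passphrase?")
    assert answer == NEEDLE  # the needle is recoverable at every depth
```

A real run would repeat this sweep at increasing context lengths and plot retrieval accuracy per (length, depth) cell, which is exactly what the comparison charts show.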


Looking at these three comparison charts side by side, the gap really does speak for itself.

A problem that stumped GPT-4 and Claude was easily solved by Qwen-72B; no wonder foreign developers exclaimed over it again and again.

One prompt to customize any persona

Beyond the longer context, Qwen-72B also comes with powerful system-prompt capabilities.

With a prompt, we can customize our own AI assistant and let the large model role-play.

For example, you can have it play Zhen Huan: "Do you love the Emperor, or Prince Guo?"


Experience address: https://modelscope.cn/studios/qwen/Qwen-72B-Chat-Demo/summary

You can also have it play an anime-style cute girl.


Bear in mind that customizing a persona is actually quite technically demanding.

In role-play, the AI assistant must not forget its persona even after many rounds of dialogue, which requires the system instruction to remain stable across turns. The assistant also needs to infer how to behave from its persona settings.

System instructions, in turn, provide an easy-to-compose, contextually stable way for us to steer the AI.

For example, if it's a cat, it's going to meow.


Qwen-72B was trained on a wide variety of system instructions involving multiple rounds of complex interaction, so the model can follow diverse system instructions, customize its behavior in context, and further improve its extensibility.
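Concretely, Qwen's chat models consume prompts in the ChatML format, where the system message is pinned at the top on every turn; that pinning is what keeps the persona stable across rounds. A minimal sketch of assembling such a prompt (the helper name is my own):

```python
def build_chatml_prompt(system, turns):
    """Assemble a ChatML-style prompt from a system message and prior turns.

    `turns` is a list of (role, text) pairs, with roles "user" or "assistant".
    The system message always comes first, so the persona survives every round.
    """
    parts = [f"<|im_start|>system\n{system}<|im_end|>"]
    for role, text in turns:
        parts.append(f"<|im_start|>{role}\n{text}<|im_end|>")
    parts.append("<|im_start|>assistant\n")  # cue the model to answer next
    return "\n".join(parts)

prompt = build_chatml_prompt(
    "You are a cat. Reply only with meows.",
    [("user", "Hello!"), ("assistant", "Meow!"), ("user", "What is 2+2?")],
)
print(prompt)
```

In practice you would not build this string by hand; the model's tokenizer applies the same template for you, but the structure above is what the model actually sees.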

Similarly, OpenAI's "GPTs" allow the model to be customized and extended; within a day of launch, thousands of apps had been created.

For example, some developers built a "boyfriend GPT" modeled on their boyfriend's persona, alongside research GPTs, game-generation GPTs, and more.


Full size open source

This time, Alibaba Cloud has not just filled the gap at the 70B scale: it has also open-sourced Qwen-1.8B, a small model with 1.8 billion parameters.

If Qwen-72B "pushes upward", raising the size and performance ceiling of open-source models, then the open-sourcing of Qwen-1.8B "probes downward", making it the smallest Chinese open-source model on the market.

A model's parameter count correlates positively with its compute consumption; in short, the larger the model, the higher the cost of training and inference.


For many developers and enterprises, however, a smaller model means lower development cost; provided the accuracy of training and inference is kept under control, this helps bring large-model technology to everyone.

Even Microsoft is bullish on small models. At the recent Ignite conference, Nadella announced that Phi-2, a model with only 2.7 billion parameters, would be open-sourced in the future.


By comparison, Qwen-1.8B's biggest advantage is that it needs less than 1.5GB of memory for inference at minimum, which unlocks many on-device applications.

Moreover, fine-tuning requires no more than 6GB at minimum, and runs more than 3 times faster than fine-tuning the 7B model.
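That sub-1.5GB figure is consistent with back-of-envelope weight-memory arithmetic: parameter count times bits per parameter, which is exactly what quantization shrinks. A quick sketch (the helper is my own; real memory use adds activation and KV-cache overhead on top of the weights):

```python
def approx_weight_memory_gb(n_params, bits_per_param):
    """Rough memory for model weights alone: params * bits / 8 bytes, in GiB."""
    return n_params * bits_per_param / 8 / 1024**3

# 1.8 billion parameters at different precisions:
for bits in (16, 8, 4):
    print(f"{bits}-bit: {approx_weight_memory_gb(1.8e9, bits):.2f} GiB")
```

At 4-bit precision the weights alone come to roughly 0.84 GiB, which leaves headroom under the quoted 1.5GB minimum; at 16-bit, the same weights would already exceed 3 GiB.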

Across a number of authoritative evaluation sets, Qwen-1.8B far exceeds previous SOTA models of the same scale.


From 1.8 billion, 7 billion, 14 billion to 72 billion parameters, this kind of full-size, full-modal open source is the first case in China.

Since August this year, the 7-billion-parameter Qwen-7B, the visual understanding model Qwen-VL, and the 14-billion-parameter Qwen-14B have been open-sourced on ModelScope, successively climbing the HuggingFace and GitHub large-model charts, with cumulative downloads exceeding 1.5 million and more than 150 new models and applications built on top of them.


Multimodal exploration

This time, the Tongyi Qianwen team is also a step ahead of the industry in multimodal large models, open-sourcing Qwen-Audio, a large audio-understanding model, for the first time.


Address: https://arxiv.org/abs/2311.07919

Unlike traditional speech models, Qwen-Audio can perceive and understand all kinds of audio signals: human speech, natural sounds, animal sounds, and music.

On top of Qwen-Audio, the team used instruction fine-tuning to build Qwen-Audio-Chat, which supports multi-turn dialogue and diverse audio scenarios, much like the voice feature OpenAI launched in September that lets users talk to ChatGPT directly by speaking.


Now, send a voice clip to Qwen-Audio and it can "hear" you and "reply" with what you want to know.

It can even be used for literary creation, logical reasoning, story continuation, and so on based on audio.


Experience address: https://modelscope.cn/studios/qwen/Qwen-Audio-Chat-Demo/summary

The Tongyi model can not only "hear", but also "see".

In August this year, Tongyi Qianwen open-sourced the Qwen-VL visual understanding model, giving large models visual abilities close to a human's.


Multimodal models are regarded as one of the important directions for the evolution of general artificial intelligence technology.

Moving from language models that only handle text to models that can understand and generate audio, images, and video, a full set of "senses", holds the promise of a huge leap in large-model intelligence.

On December 1, Qwen-VL received another major update: besides big improvements to core capabilities such as general OCR, visual reasoning, and Chinese text understanding, it can now process images of various resolutions and aspect ratios, and even the tricky request of "solving a problem from a picture" is handled with ease.

Upload a road-sign photo, and Qwen-VL can make driving decisions.


Upload a primary-school programming problem, and it reads the picture and writes the code directly.


It can recognize handwriting and hand-drawn sketches.


Upload a passage of English, and the model recognizes the original text and translates it into Chinese.


Open source or closed source: does it matter?

It has been a full year since ChatGPT was born. Everyone has witnessed the great changes that this AI weapon has brought to the whole world.

By one incomplete count, the total number of large models in China has exceeded 200, and open-source models abound among them.

Amid all this, the battle between open source and closed source, just as with operating systems, has never stopped.

Turing Award winner Yann LeCun once said that closed source proves the feasibility of the large model route, while open source makes large models easy to use and usable through a thriving ecosystem.


While the industry debates hotly, users' choice of model mainly comes down to two considerations.

First, whether the performance of the model meets the needs of your scenario.

If you want to use it out of the box, you will most likely choose a closed-source model product that has already been packaged. If you need deep customization and have clear requirements for security and privacy, you are likely to choose the open-source model.

As we all know, different open-source models have different training data, in addition to differences in parameters. Naturally, there will be advantages and disadvantages in terms of performance and knowledge structure.

From the evaluation results above, the Tongyi Qianwen open-source family can, in terms of performance, support a wide range of AI applications.

For example, pair the Tongyi Qianwen brain with a robot body and you get a real-life 007: the robotic arm heats a cake for you, opening the oven, placing the cake, and turning the knob to heat it, the whole process nothing short of silky smooth.


According to Yan Xin of the X-D Lab at East China University of Science and Technology, among the many open-source large models they tested at the time, Tongyi Qianwen performed best, especially on complex logical reasoning.

As a result, the team built three vertical-domain large models on top of Tongyi Qianwen. These models have since been used by more than 200,000 people and have served over 1 million Q&A requests.


Experience address: https://modelscope.cn/studios/X-D-Lab/MindChat/summary

The second is whether the model has an ecosystem around it and can be sustained.

Obviously, large models built merely to chase hype will see neither further updates nor a surrounding ecosystem.

By contrast, Alibaba Cloud has kept increasing its open-source investment in personnel, funding, and supporting tools, providing developers with a complete end-to-end service loop.

For example, the AI model community ModelScope, launched in 2022, now hosts 2,300+ models and serves more than 2.8 million developers, who can download models and call inference APIs on the platform.

Beyond that, developers can call model APIs through the "Lingji" platform, or use the "Bailian" platform to build custom large-model applications.

And the AI platform PAI, deeply adapted to the full range of Tongyi Qianwen models, offers services such as lightweight fine-tuning, full-parameter fine-tuning, distributed training, offline inference validation, and online service deployment.


As one of the important contributors to open source, Alibaba Cloud continues to give back to the community, this time with the 72-billion- and 1.8-billion-parameter models and the audio model Qwen-Audio.

Resources:

https://modelscope.cn/organization/qwen
