Developers vote with their feet: Tongyi Qianwen is a hit in Chinese and English AI communities, and another blockbuster model arrives today

Source | Zhidongxi

Author | Vanilla

Editor | Mo Ying

Zhidongxi reported on September 25 that Alibaba Cloud held a large-model open-source launch event today, officially releasing the 14-billion-parameter model Qwen-14B and the dialogue model Qwen-14B-Chat, both open source and free.

Following Qwen-7B, which earned strong word of mouth in the open-source community, Qwen-14B is expected to become the next breakout release. According to the announcement, Qwen-14B stands out among open-source models of the same size, achieving the best results on 12 authoritative evaluation sets, including MMLU, C-Eval, GSM8K, MATH, and GaoKao-Bench, and surpassing the previous SOTA large models of comparable scale. On some capabilities it is not inferior to the 34B and 70B versions of Llama 2.

Qwen-14B surpassed previous SOTA large models on 12 authoritative benchmarks

Qwen-14B also puts considerable effort into ease of use. The Tongyi Qianwen team has upgraded the Qwen models' ability to connect with external systems, so developers can implement complex plugin calls through simple operations, or quickly build agents and other AI systems on top of the Qwen base models, drawing on Qwen's understanding and planning capabilities to complete complex tasks. At the same time, Qwen-7B has received a comprehensive upgrade, with core metrics improving by up to 22.5%.

Alibaba Cloud Intelligence CTO Zhou Jingren released Qwen-14B at the press conference

Just last month, Alibaba Cloud became the first major Chinese tech company to join the large-model open-source camp, releasing the general model Qwen-7B, the dialogue model Qwen-7B-Chat, and more. In just over a month, downloads of Qwen-7B and related models exceeded 1 million, more than 50 derivative models appeared in the open-source community, and several companies with over 100 million monthly active users applied to the Tongyi Qianwen team for licenses. Zhejiang University's Zhihai-Sanle education vertical model and Zhejiang Youlu Robot's intelligent cleaning robot are both built on Qwen-7B.

Open source is clearly not a spur-of-the-moment decision for Alibaba Cloud. Zhou Jingren, CTO of Alibaba Cloud Intelligence, said at the press conference that Alibaba Cloud will remain determined to embrace open source and openness, to "make computing power more inclusive and AI more accessible".

Qwen-14B-Chat Experience Address:

https://modelscope.cn/studios/qwen/Qwen-14B-Chat-Demo/summary/

First, "reverse reasoning" did not stump Qwen-14B, how did it do it?

Qwen-14B is a high-performance open-source model that supports multiple languages. It is trained on more high-quality data than comparable models, with over 3 trillion tokens of training data in total, giving the model stronger reasoning, cognition, planning, and memory capabilities, and it supports a maximum context window of 8K tokens.

Compared with Qwen-7B, Qwen-14B further strengthens agent capabilities and is significantly more reliable when using complex tools. For example, Qwen-14B can proficiently use a code interpreter tool to execute Python code for complex mathematical calculations, data analysis, and charting. Its planning and memory capabilities have also improved, making it more dependable on tasks such as multi-document Q&A and long-form writing.
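
To make the tool-use flow concrete, below is a minimal sketch of a code-interpreter-style loop. It is not Qwen's actual plugin protocol; `chat` is a hypothetical callable wrapping a Qwen-14B-Chat endpoint, and the prompts are illustrative.

```python
import contextlib
import io

def run_python(code: str) -> str:
    """Execute model-generated Python and capture stdout.
    Illustrative only: a real system must sandbox untrusted code."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(code, {})
    return buf.getvalue()

def solve_with_interpreter(question: str, chat) -> str:
    """One round of a code-interpreter loop: ask the model for Python,
    run it, then feed the output back for a final answer.
    `chat` is a hypothetical wrapper around a Qwen-14B-Chat endpoint."""
    code = chat("Reply with only Python code that prints the answer to: " + question)
    observation = run_python(code)
    return chat(f"Question: {question}\nCode output: {observation}\nFinal answer:")
```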

Interestingly, when Zhidongxi asked the Qwen-7B-Chat chatbot a question involving "reverse reasoning," it gave an accurate answer. A recent study from the UK Frontier AI Taskforce, Apollo Research, New York University, Oxford, and other institutions showed that large models struggle to infer "B is A" from "A is B": across 519 facts about celebrities, pre-trained large models could reproduce each fact in one direction but not the other. For example, a model that knows "Tom Cruise's mother is Mary Lee Pfeiffer" may still fail to answer "Who is Mary Lee Pfeiffer's son?"

Qwen-7B-Chat chatbot's answer to the "reverse reasoning" question

So, how did Qwen-14B do it?

First, in dataset construction, the Tongyi Qianwen R&D team used a large-scale pre-training dataset of 3 trillion tokens covering knowledge from a wide range of fields and industries, including text and code in multiple languages. On this basis, the team applied more refined data processing, including large-scale deduplication, junk-text filtering, and raising the proportion of high-quality data.
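
The article does not detail the pipeline, but the two steps it names, deduplication and junk-text filtering, can be sketched as follows; the thresholds and heuristics here are illustrative assumptions, not the team's actual rules.

```python
import hashlib

def normalize(text: str) -> str:
    # Collapse whitespace and lowercase so trivial variants hash identically.
    return " ".join(text.lower().split())

def looks_like_junk(text: str) -> bool:
    # Illustrative heuristics: very short documents, or documents that are
    # mostly punctuation/markup rather than words.
    if len(text) < 50:
        return True
    alnum = sum(ch.isalnum() for ch in text)
    return alnum / len(text) < 0.5

def dedup_and_filter(docs):
    """Yield documents that pass the junk filter, keeping one copy of each
    exact (normalized) duplicate. Real pipelines also do fuzzy dedup."""
    seen = set()
    for doc in docs:
        if looks_like_junk(doc):
            continue
        digest = hashlib.sha1(normalize(doc).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            yield doc
```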

Second, in model architecture, the Tongyi Qianwen R&D team ran a series of preliminary experiments to verify how structural design choices affect model quality. Overall, they found that most of the technical choices in Google's PaLM and Meta's Llama models work well, including the SwiGLU activation function and RoPE position encoding, both of which are used in Qwen's architecture.
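
Neither choice is unique to Qwen. For reference, a SwiGLU feed-forward block, one of the two designs named above, looks roughly like this in PyTorch (dimensions are placeholders):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLU(nn.Module):
    """SwiGLU feed-forward block in the PaLM/Llama style:
    a SiLU-gated projection followed by a down-projection."""
    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.w_gate = nn.Linear(dim, hidden, bias=False)
        self.w_up = nn.Linear(dim, hidden, bias=False)
        self.w_down = nn.Linear(hidden, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # silu(gate(x)) * up(x) is the Swish-gated linear unit.
        return self.w_down(F.silu(self.w_gate(x)) * self.w_up(x))
```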

The Tongyi Qianwen team also specially optimized the vocabulary: with more than 150,000 entries, it achieves high coding efficiency. Compared with other tokenizers, it represents more information with fewer tokens, and saving tokens translates directly into lower cost.
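
One way to see coding efficiency directly is to count the tokens each tokenizer produces for the same sentence. A sketch using Hugging Face transformers (the Llama 2 checkpoint is gated and needs access approval; any other tokenizer works as a baseline):

```python
from transformers import AutoTokenizer

text = "通义千问是阿里云研发的大语言模型。"  # sample Chinese sentence

# Qwen ships its tokenizer as custom code, hence trust_remote_code=True.
qwen_tok = AutoTokenizer.from_pretrained("Qwen/Qwen-14B-Chat", trust_remote_code=True)
llama_tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

for name, tok in [("Qwen", qwen_tok), ("Llama 2", llama_tok)]:
    # Fewer tokens for the same text means higher coding efficiency.
    print(name, len(tok.encode(text)))
```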

In addition, the team focused on modeling long-sequence data, adopting the most effective strategies currently available, including but not limited to Dynamic NTK, log-n attention scaling, and window attention, with detailed adjustments to keep performance stable on long sequences. Qwen-14B currently handles sequence lengths of up to 8,192 tokens with stable performance.
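
Dynamic NTK, the first strategy listed, extends RoPE beyond the training length by enlarging the rotary base as the input grows. The article does not specify Qwen's exact variant; the sketch below follows the widely circulated heuristic, which also appears in this form in Hugging Face's dynamic-NTK rotary embedding:

```python
def dynamic_ntk_base(seq_len: int,
                     train_len: int = 2048,
                     base: float = 10000.0,
                     dim: int = 128,
                     alpha: float = 1.0) -> float:
    """Return the RoPE base to use for a given sequence length.
    Beyond the training context, the base grows so that the rotary
    frequencies are stretched to cover the longer window."""
    if seq_len <= train_len:
        return base
    return base * (alpha * seq_len / train_len - (alpha - 1)) ** (dim / (dim - 2))
```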

The Tongyi Qianwen R&D team said that large-model training involves few exotic tricks; it is mostly a matter of extensive trial and iteration to find better training parameters and strike the optimal balance among training stability, effectiveness, and efficiency, covering, among other things, optimizer configuration and model-parallelism settings.

Finally, for the ability to use external tools, the team optimized two aspects. First, on fine-tuning samples: by building a more comprehensive automatic evaluation benchmark, they proactively discovered weaknesses in earlier Qwen versions, then used the self-instruct method to expand the pool of high-quality fine-tuning samples. Second, they improved the base pre-trained model itself, strengthening its comprehension and coding ability. As a result, Qwen-14B outperforms Qwen-7B.
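
Self-instruct, as named here, bootstraps new training samples by having the model expand a pool of seed instructions. A minimal sketch, reusing the hypothetical `chat` wrapper from the earlier snippet (real pipelines add aggressive filtering and deduplication of the generated tasks):

```python
import random

def self_instruct_round(seed_tasks: list[str], chat, n_new: int = 3):
    """One bootstrap round: show the model a few seed instructions,
    ask for new ones, then answer them to form (instruction, response)
    fine-tuning pairs. Quality filtering is omitted for brevity."""
    examples = "\n".join(random.sample(seed_tasks, k=min(3, len(seed_tasks))))
    prompt = (f"Here are some task instructions:\n{examples}\n"
              f"Write {n_new} new, diverse task instructions, one per line.")
    new_tasks = [t.strip() for t in chat(prompt).splitlines() if t.strip()]
    return [(task, chat(task)) for task in new_tasks[:n_new]]
```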

Qwen-14B and the dialogue model Qwen-14B-Chat are now live on the ModelScope community, free for anyone to use. Besides downloading the models directly from ModelScope, users can also access and call Qwen-14B and Qwen-14B-Chat through Alibaba Cloud's DashScope platform and use the full range of accompanying services Alibaba Cloud provides, including model training, inference, deployment, and fine-tuning.
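
For reference, loading the open-source weights locally follows the pattern on the model cards; the snippet below mirrors Qwen's published usage (verify model IDs and the `chat` API against the current README, as custom-code interfaces can change):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# trust_remote_code=True is required because Qwen ships custom modeling code.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-14B-Chat", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-14B-Chat", device_map="auto", trust_remote_code=True
).eval()

# Qwen's custom code exposes a convenience chat() method.
response, history = model.chat(tokenizer, "你好，请介绍一下你自己。", history=None)
print(response)
```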

Second, developers vote with their feet, and Tongyi Qianwen accelerates into real-world use

On August 3, Alibaba Cloud open-sourced the 7-billion-parameter model Qwen-7B and the dialogue model Qwen-7B-Chat, both open source and free. In multiple authoritative evaluations, Tongyi Qianwen's 7B model outperformed models of the same size at home and abroad.

Feedback from developers has borne out the benchmark results. Qwen-7B has also been popular in open-source communities beyond ModelScope, repeatedly climbing the trending lists of Hugging Face, GitHub, and other communities, and maintaining a strong presence in overseas open-source communities dominated by English-language models.

Qwen-7B made it to GitHub's Trending list

Developers voted with their feet: Qwen-7B and related models were downloaded more than 1 million times in just over a month, more than 50 new models based on Qwen appeared in the open-source community, and several companies with over 100 million monthly active users have applied to the Tongyi Qianwen team for authorization.

Many well-known open-source tools and frameworks have integrated Qwen, such as FastChat, a toolkit for building large-model WebUIs, APIs, and fine-tuning pipelines; AutoGPTQ, a model-quantization framework; LMDeploy, a large-model deployment and inference framework; and XTuner, a large-model fine-tuning framework.

A large number of developers have also built their own models and applications on Qwen; individual-developer projects such as LLaMA-Efficient-Tuning, Firefly, and OpenAI.mini all support or use the Qwen models.

AutoGPTQ, a model-quantization framework, has integrated the Qwen models

Boosted by the open-source push, the Tongyi Qianwen models have accelerated into real-world applications. Organizations adopting Tongyi Qianwen span the internet and traditional industries, academia and industry, and leading enterprises and startups, including Alibaba's Taobao, DingTalk, and Future Genie (formerly Tmall Genie), as well as Zhejiang University, Higher Education Press, and Zhejiang Youlu Robot Technology Co., Ltd.

Zhou Jingren introduced Qwen-7B's real-world adoption at the press conference

Alibaba Cloud showcased several application cases at the press conference, making "large models landing in the real world" concrete and tangible. For example, Zhejiang University, together with Higher Education Press and Alibaba Cloud, trained the Zhihai-Sanle education vertical model on Qwen-7B; it is live on Alibaba Cloud's Lingji (DashScope) platform, and developers can call it with a single line of code. The model is already used in 12 colleges and universities across the country, providing intelligent Q&A, test-question generation, teaching evaluation, and other capabilities.
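
The article does not show that one line, but calls through DashScope's Python SDK look roughly like this; the model identifier below is a placeholder, since the Zhihai-Sanle model's actual ID is not given in the article.

```python
import dashscope

dashscope.api_key = "YOUR_API_KEY"  # issued through the DashScope console

# Placeholder model name; substitute the real identifier from the platform.
resp = dashscope.Generation.call(model="qwen-14b-chat", prompt="出一道高中函数练习题")
print(resp.output.text)
```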

The startup Zhejiang Youlu Robot Technology Co., Ltd. has integrated Qwen-7B into its robots to explore "embodied intelligence." In its AI130 road-cleaning robot, Qwen-7B lets the machine interact with users in natural language in real time and understand requests such as "go clean up the Coke bottle next to Building 5": the robot automatically parses and decomposes these high-level instructions and completes the cleaning task through logical analysis and task planning.

Third, "one flower alone is not spring", fully embrace open source and openness

Alibaba Cloud said that in the "war of a hundred models," many people see the "war," while Alibaba Cloud sees the "hundred models."

Zhang Qi, vice president of Alibaba Cloud and general manager of its public affairs and customer communications department, told reporters: "One flower alone does not make spring; a hundred flowers in bloom do. Whether closed-source or open-source, self-developed or third-party, large-parameter or small-parameter, general-purpose or industry- and enterprise-specific, Alibaba Cloud welcomes and supports all large models in jointly building the largest large-model free market. We want every large model to run faster, cheaper, and more safely on Alibaba Cloud. That is why Alibaba Cloud took the lead in open-sourcing the 7B and 14B models, and it will continue to open-source and contribute to the open-source community."

This explains Alibaba Cloud's distinctive route: building an ecosystem. Looking back at Alibaba Cloud's moves since the rise of large models, from theory to practice, it has been doing exactly that.

In 2022, Alibaba Cloud was the first in the industry to propose the concept of MaaS (Model as a Service), providing a theoretical basis and best practices for building a large-model ecosystem in the new AI wave. The core of MaaS is a new development paradigm with AI models at its center. On this basis, Alibaba Cloud built a cloud-computing technology and service architecture centered on AI models and opened the whole stack to large-model startups and developers. In less than a year, MaaS has become the consensus of the large-model industry.

Zhou Jingren introduced Alibaba Cloud's MaaS concept at the press conference

In July 2023, Alibaba Cloud announced that promoting the prosperity of China's large-model ecosystem would be its primary goal, offering large-model startups a full range of services, including its most powerful intelligent computing power and development tools, along with support in funding and commercialization.

According to the conference, Alibaba Cloud has several distinctive advantages in underlying computing power services:

At the infrastructure layer, Alibaba Cloud holds China's largest reserve of intelligent computing power; its Lingjun intelligent computing cluster can scale to 100,000 GPUs and support simultaneous online training of multiple trillion-parameter large models.

At the AI platform layer, Alibaba Cloud's machine learning platform PAI provides engineering capabilities across the entire AI development process and can cut large-model training time by a factor of 10. The one-stop model service platform Lingji offers an automated, unified toolchain for migrating models to the cloud, supporting independent model onboarding with automatic access to the platform's service capabilities. Lingji now hosts large models such as Tongyi Qianwen, Stable Diffusion, ChatGLM-v2, Baichuan, and Jiang Ziya.

In the developer ecosystem, Alibaba Cloud led the construction of ModelScope, China's first AI open-source community portal. Adhering to the "Model as a Service" concept, ModelScope has gathered more than 1,200 high-quality AI models contributed by over 30 top AI institutions, turning models into directly usable services and offering developers one-stop model experience, download, inference, tuning, and customization.

Zhou Jingren introduced the ModelScope community at the press conference

ModelScope's contributors cover essentially all the core players on China's large-model track, and large-model companies have made ModelScope the first stop for open-source releases of their self-developed models. In September alone, Baichuan Intelligence's Baichuan 2 series, Shanghai AI Laboratory's Shusheng Puyu (InternLM) 20B model, and Zhipu AI's MathGLM were all open-sourced on ModelScope. The Shusheng Puyu series has also reached an ecosystem partnership with the ModelScope community to jointly promote the construction of China's large-model ecosystem.

An abundant model supply has drawn developers in, and "go to ModelScope to find a model" has become the default instinct among developers. Less than a year after launch, the community has gathered 2.3 million AI developers, with cumulative model downloads exceeding 85 million.

In the "big model free market" imagined by Alibaba Cloud, Tongyi Qianwen is only one of the "100 models". Open source is the "best practice" of Alibaba Cloud's integration of knowledge and action to carry out the construction of a large model ecosystem.

An open-source ecosystem is crucial to making general-purpose large models inclusive and widely applied. The high cost of training large models is unaffordable for most small and medium-sized enterprises and developers. Open-sourcing large models brings the capabilities of leading companies to SMEs and developers at lower cost and higher speed, accelerating the construction of the large-model ecosystem and incubating application innovation.

From a broader perspective, competition in AI large models plays out not only between companies and research teams but between ecosystems. If the combined capabilities of "public cloud + AI" are the ticket to the large-model race, then technology and industrial ecosystems are its main battlefield. An industrial ecosystem is the key to closing the business loop and building competitive moats: the sooner a large model reaches the market, the more feedback it absorbs to improve itself, and the sooner the flywheel of "stronger model, more applications; more applications, stronger model" starts to spin.

In the end, the beneficiaries are every developer, every SME, and the large-model industry as a whole.
