
Taking stock of the current state of China's domestic large models

Author: Intelligent investment strategizing

On December 23, the results of China's first official "large model standard compliance evaluation" were announced. The evaluation was initiated by the China Electronics Standardization Institute under the Ministry of Industry and Information Technology, and it is understood that only 360, Baidu, Tencent and Alibaba passed in the first batch.

The development of domestic large models began in February, yet in just 10 months many models have already entered public testing.

1. Baidu - Wenxin Yiyan

1) On August 31, 2023, Wenxin Yiyan took the lead in opening fully to the general public.

2) Underlying AI chips: Kunlun Chip, formerly Baidu's intelligent chip and architecture department, completed independent financing in 2021 and currently has first- and second-generation Kunlun chips in mass production.

3) Deep learning framework: The PaddlePaddle platform integrates a core framework, a basic model library, and end-to-end development kits. The platform has reached 8 million developers and hosts more than 800,000 models.

4) On September 1, the Wenxin Yiyan plug-in ecosystem "Lingjing Matrix" opened invitations for internal testing, and nearly 100 companies joined in the first batch, such as Ctrip, WPS AI, iQiyi, Autohome, Maoyan Movie, and Dongqiudi.

5) The PC version of Wenxin Yiyan has officially opened 3 plug-ins: Illustrated Diagram (text creation based on pictures), E Yanyi Diagram (insights and chart making based on data), and Scroll Document (summaries and Q&A based on documents).

2. iFLYTEK - Xinghuo (Spark) Cognitive Model 2.0

1) On August 15, 2023, iFLYTEK released version 2.0 of the Xinghuo (Spark) cognitive model.

2) The large model is rolled out simultaneously across multiple products and businesses, including the iFlyCode intelligent programming assistant, iFLYTEK Xinghuo Companion 2.0, and Xinghuo Teacher Assistant.

3) Xinghuo model 2.0 improves code generation, code completion, code error correction, and unit test generation during code writing.

4) Programming assistant iFlyCode 1.0: according to statistics from iFLYTEK's internal R&D performance platform, covering more than 2,000 employees who trialed iFlyCode 1.0 for one month, the code adoption rate reached 30%, coding efficiency increased by 30%, and overall efficiency increased by 15%.

3. 360 - Intelligent Brain Model 4.0

1) On June 13, 360 Group officially held the "360 Intelligent Brain Model" application conference.

2) 360 Intelligent Brain is the country's first native security model. The company claims that it is self-developed at a scale of 100 billion parameters and that its core capabilities rank in the first echelon in China. The model is pre-trained on more than one trillion tokens and has ten core capabilities, such as generation and creation, multi-round dialogue, and logical reasoning, plus hundreds of subdivided functions, which can cover all scenarios of large model applications.

3) The enterprise-level vertical model based on 360 Intelligent Brain has been successively implemented in nearly 20 industries such as finance, healthcare, and education.

4) 360 Intelligent Brain brings a more capable intelligent assistant to web browsing: it can summarize, translate, and rewrite the web page being viewed with one click, and is combined with graphical efficiency tools.

5) Drawing on the vertical knowledge base that 360 Search has accumulated over the years, the large model digital human can intelligently optimize the user's input prompt and identify the user's intent, making the generated AI answers more professional and of higher quality.

4. Alibaba - Tongyi Qianwen

1) In September 2022, Alibaba released the latest "Tongyi" large model series.

2) The general model layer includes three types of models, Tongyi-M6, Tongyi-AliceMind, and Tongyi-CV, covering multimodality, natural language processing, and computer vision. The professional model layer goes deep into e-commerce, healthcare, law, finance, entertainment and other industries.

3) Within Alibaba, all Alibaba products will be connected to the Tongyi Qianwen model for a comprehensive upgrade. At present, DingTalk, Tmall Genie and other products have been connected to Tongyi Qianwen for testing.

4) For developers and enterprises, Tongyi provides a variety of ways to access and call its models. Based on the Lingjun platform, it supports one-click cloud deployment of various models, flexible API calls, and model fine-tuning and customization.

5. Tencent - Hunyuan Model

1) On September 15, 2023, Tencent's Hunyuan model was among the first batch to pass regulatory filing.

2) Tencent's Hunyuan model is a practical-grade large model developed in-house by Tencent across the full pipeline, with over 100 billion parameters and a pre-training corpus of more than 2 trillion tokens. It has been deeply applied in multiple business scenarios: more than 100 businesses and products, including Tencent Cloud, Tencent Advertising, Tencent Games, Tencent FinTech, Tencent Meeting, Tencent Docs, WeChat Search, and QQ Browser, have been connected to the Hunyuan model for testing.

3) The Hunyuan model is backed by Tencent and has many potential application scenarios. The more scenarios it is deployed in, the more business value and user data are generated; the training and inference costs of the general-purpose model fall while its capabilities grow stronger, forming a flywheel effect.

6. Huawei - Pangu Model

1) On July 7, 2023, the Pangu model was released at the Huawei Developer Conference.

2) Pangu 3.0 provides customers with a series of basic models with 10 billion, 38 billion, 71 billion, and 100 billion parameters, to match customers' diverse needs across different scenarios, latency requirements, and response speeds.

3) L0 layer: Includes five basic large models (natural language, vision, multimodality, prediction, and scientific computing), providing a variety of skills to meet the needs of industry scenarios.

4) L1 layer: HUAWEI CLOUD provides general-purpose industry models trained on public industry data, covering government, finance, manufacturing, mining, and meteorology. Industry customers can also train their own proprietary models on top of the L0 and L1 layers of the Pangu model using their own data.

5) L2 layer: Provides customers with finer-grained scenario models that focus on a specific application scenario or business, offering "out-of-the-box" model services.

7. ByteDance - Doubao

1) On August 17, 2023, ByteDance began public beta testing of "Doubao", an AI dialogue product built on the Lark model. It has a web client plus iOS and Android clients, and two preset functions: an English learning assistant and a writing assistant.

2) The Lark model is positioned as an AI model for natural language content generation and content understanding. It is developed on ByteDance's machine learning platform and can converse with users through natural language processing technology, answer their questions, and provide relevant information and suggestions.

3) ByteDance's research team has also announced an academic research project on a multimodal large model, BuboGPT, which can process multimodal inputs including text, images, and audio. That is, it can not only understand images, audio, and text and combine that understanding with text input and output, but also locate and describe objects in images and the sources of sounds.

4) The application scenarios are more everyday-oriented, focusing mainly on emotional companionship, travel planning, daily writing, and the like, and they are narrower than those of Wenxin Yiyan and iFLYTEK Xinghuo.

5) Doubao provides citations for generated factual content to improve credibility. Depending on user needs, if the generated content has high requirements for authenticity, such as professional terms, real-world events, or geographical locations, Doubao attaches a reference link at the end of the generated result so that users can verify the information.

8. SenseTime - Ririxin SenseNova

1) On April 10, 2023, SenseTime released the "Ririxin SenseNova" large model system.

2) SenseTime held a technical exchange day for the release, demonstrating AI model application capabilities such as Q&A, code generation, 2D/3D digital human generation, and 3D scene/object generation.

3) In terms of deployment scenarios, SenseTime's language model has shown strong capabilities in professional text understanding, code generation, and assisting with preliminary medical consultations, and generative AI has great potential in e-commerce, advertising, cultural tourism, and other fields.

9. Baichuan Intelligence - Baichuan large model

1) On September 6, 2023, Baichuan Intelligence officially released the fine-tuned Baichuan2-7B, Baichuan2-13B, and Baichuan2-13B-Chat, along with their 4-bit quantized versions, all of which are open source and free for commercial use.

2) At present, on mainstream Chinese and English general-purpose benchmarks, Baichuan2 is ahead of Llama2, which Meta released on July 19. It performs especially well in Chinese dialogue understanding, and it goes further in opening the model for commercial use.

3) China's first open-source commercial model, with good text ability: in June and August this year, Baichuan successively released 3 general-purpose large models at different parameter scales, of which the 7-billion- and 13-billion-parameter models are open source and available for commercial use, the first such open-source models in China.

4) The founder and CEO of Baichuan Intelligence is Wang Xiaochuan, who used to be the CEO of Sogou.
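
To illustrate why the 4-bit quantized versions mentioned in item 1) matter, the following minimal Python sketch estimates the memory needed to hold model weights alone. It assumes nominal parameter counts of 7 billion and 13 billion (the exact counts may differ), a 16-bit baseline for the unquantized weights, and ignores activations, caches, and quantization metadata; the function name `weight_memory_gb` is hypothetical and not part of any Baichuan tooling.

```python
def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate memory for model weights only, in GB (1 GB = 10**9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


# Nominal parameter counts (assumption: taken from the model names, not exact).
models = [("Baichuan2-7B", 7e9), ("Baichuan2-13B", 13e9)]

for name, params in models:
    fp16 = weight_memory_gb(params, 16)  # assumed 16-bit baseline
    int4 = weight_memory_gb(params, 4)   # 4-bit quantized weights
    print(f"{name}: 16-bit ~{fp16:.1f} GB, 4-bit ~{int4:.1f} GB")
```

Under these assumptions, 4-bit weights take roughly a quarter of the 16-bit footprint, which is the main reason quantized releases are easier to run on smaller hardware.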

This article is not intended as a securities recommendation or investment advice; it is intended only to provide more information, and the author does not guarantee the accuracy of its content.
