
China's large models gain momentum: Moonshot AI (Dark Side of the Moon) upgrades Kimi with a breakthrough, and StepFun (Step Star) releases its Step series

Author: Lionsgate

Source: SDIC Securities, Author: Zhao Yang, Xia Yingtao

1. Kimi: world-leading lossless long-text processing

Kimi has increased its long-text input capacity tenfold and now leads the world. Moonshot AI (Dark Side of the Moon) was founded in March 2023, and its main product, the Kimi intelligent assistant, debuted in October 2023 with a lossless context of about 200,000 Chinese characters. That capacity helped users unlock many new use cases, including translating and understanding professional academic papers, assisting with the analysis of legal issues, sorting dozens of invoices at once, and quickly digesting API development documentation. The product earned a strong reputation and rapid user growth as a result. On March 18, 2024, the company announced another breakthrough in long-context-window technology: the lossless context length rose by an order of magnitude, to 2 million characters. According to Machine Heart, the context window of the not-yet-launched GPT-4.5 Turbo is reported to be 256,000 tokens. Kimi's long-text capacity after this upgrade is 10 times that figure, the longest context input supported by any production-ready large model service in the global market.


A longer context means more "memory" and more efficient processing of large volumes of documents. Technically, the parameter count determines how complex a computation a large model can support, while the amount of text it can accept as input (its long-text capability) determines how much "memory" it has. Together, the two determine how well the model performs in applications. Support for longer contexts deepens and broadens what large models can do: market analysis across multiple financial reports, handling extra-long legal contracts, quickly extracting key information from multiple articles or web pages, role-playing based on a novel's setting, and so on. Through an innovative network structure and engineering optimization, Kimi Chat achieves a lossless long-range attention mechanism at the 100-billion-parameter scale, without relying on "shortcut" solutions such as sliding windows, downsampling, or small models, all of which significantly degrade performance.
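To make the "sliding window" shortcut concrete, the following minimal sketch contrasts a full causal attention mask with a sliding-window mask. It is a generic illustration only: the source does not disclose Kimi's actual attention mechanism, and the function names and the sizes `n` and `window` are assumptions chosen for the example.

```python
import numpy as np

def causal_mask(n: int) -> np.ndarray:
    """Full causal attention: token i may attend to every token j <= i."""
    return np.tril(np.ones((n, n), dtype=bool))

def sliding_window_mask(n: int, window: int) -> np.ndarray:
    """Sliding-window attention: token i sees only the most recent `window` tokens."""
    full = causal_mask(n)
    # Positions that lie `window` or more steps in the past are masked out.
    too_old = np.tril(np.ones((n, n), dtype=bool), k=-window)
    return full & ~too_old

n, window = 8, 3  # illustrative sizes, not taken from the source
full = causal_mask(n)
local = sliding_window_mask(n, window)

# How many earlier tokens can the final token still see under each scheme?
visible_full = int(full[-1].sum())
visible_local = int(local[-1].sum())
print(visible_full, visible_local)
```

Under the windowed mask the last token can no longer attend directly to the start of the sequence, which is the kind of information loss the paragraph above describes.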


Intelligent search, analysis, and summarization are closely tied to long-text processing. Kimi can actively search the Internet for the pages most relevant to a user's question, then analyze and summarize them. The multiple pieces of information retrieved are passed to the model as part of the context, and the model reasons over them to generate a more direct and accurate answer. The Kimi intelligent assistant can produce high-quality results precisely because the Kimi model's context window is long enough and the information loss within that window is low enough. For example, a user can ask Kimi to search for and compare the latest financial report data of two listed companies in the same field and generate a comparison table directly, saving a great deal of time on data collection.
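The idea of handing search results to the model as context can be sketched as follows. This is an illustrative assumption about how such a pipeline might pack pages into a prompt, not Kimi's implementation; the `Page` type, the `score` field, and the character budget are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Page:
    title: str
    text: str
    score: float  # hypothetical relevance score from a search backend

def build_context(question: str, pages: list[Page], budget_chars: int) -> str:
    """Pack the most relevant pages into one prompt without exceeding the budget."""
    parts = []
    used = 0
    for page in sorted(pages, key=lambda p: p.score, reverse=True):
        block = f"[{page.title}]\n{page.text}\n"
        if used + len(block) > budget_chars:
            continue  # page does not fit; a longer context window drops fewer pages
        parts.append(block)
        used += len(block)
    return "".join(parts) + f"\nQuestion: {question}"

pages = [
    Page("Report A", "a" * 50, 0.9),
    Page("Report B", "b" * 50, 0.8),
    Page("Blog", "c" * 50, 0.2),
]
question = "Compare the two companies' latest results."
short_prompt = build_context(question, pages, budget_chars=100)
long_prompt = build_context(question, pages, budget_chars=1000)
```

With the small budget only the top-ranked page fits, so a comparison across both reports is impossible; with the larger budget every page reaches the model.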


Outstanding multi-turn interaction and ultra-long instruction following: a model's ability to follow instructions is closely tied to its lossless context capability. Instruction following shows up in two ways: 1) whether the model can keep following the user's instructions and understanding the user's needs across multiple turns of dialogue, and 2) whether the model can follow complex instructions, which may run to thousands or tens of thousands of words. Judging from user feedback since launch, multi-turn interaction and ultra-long instruction following are also core advantages of the Kimi intelligent assistant.

Kimi's traffic has grown far faster than expected, and emergency measures such as capacity expansion have been taken. According to Similarweb, the web version of Kimi has had more than 200,000 daily active users for several consecutive days, peaking at 346,000, and weekly active users rose 45% week-on-week. Moonshot AI (Dark Side of the Moon) said in a statement that Kimi's system traffic had been rising abnormally since 9:30 on March 20, 2024, at a rate far beyond the company's resource planning. As a result, from 10:00 on March 20 a growing number of SaaS customers kept encountering the "429: engine is overloaded" error, for which the company apologized. It has put several contingency measures in place, including but not limited to the following: five rounds of capacity expansion have been carried out since the abnormal traffic growth was observed; inference resources will keep being expanded in step with traffic to support continued user growth; and a more effective SaaS traffic prioritization strategy is being designed to keep calls from paying users stable, with launch expected before March 25.
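On the client side, a common way to cope with an HTTP 429 response is to retry with exponential backoff. The sketch below is a generic pattern, not guidance published by the company; `send` is a hypothetical callable standing in for whatever function issues the request and returns a `(status, body)` pair.

```python
import random
import time
from typing import Callable, Tuple

def call_with_backoff(
    send: Callable[[], Tuple[int, str]],
    max_retries: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Retry `send` while it reports HTTP 429, doubling the wait each time."""
    status, body = send()
    for attempt in range(max_retries):
        if status != 429:
            break
        # Exponential backoff with jitter so clients do not retry in lockstep.
        sleep(base_delay * 2 ** attempt * (1 + random.random()))
        status, body = send()
    return status, body

# Simulated server: overloaded twice, then healthy.
responses = iter([
    (429, "engine is overloaded"),
    (429, "engine is overloaded"),
    (200, "ok"),
])
waits = []  # record the delays instead of actually sleeping
result = call_with_backoff(lambda: next(responses), sleep=waits.append)
print(result, len(waits))
```

Injecting `sleep` keeps the example fast to run and makes the chosen delays easy to inspect.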


The company brings together top algorithm and engineering talent, and members of the founding team have taken part in the development of many large models. Founder Yang Zhilin received his bachelor's degree from the Department of Computer Science and Technology at Tsinghua University and his Ph.D. from the Language Technology Institute (LTI) at Carnegie Mellon University, which ranks first in natural language processing in the United States. His work has been cited more than 20,000 times since 2019. On the algorithm and engineering side, the team includes a new generation of talent in natural language processing, computer vision, reinforcement learning, and infrastructure. Core founding members took part in the development of large models such as Google Gemini, Google Bard, Pangu NLP, and Wudao, and a number of their core techniques have been adopted by mainstream products such as Google PaLM, Meta LLaMa, and Stable Diffusion.

2. StepFun (Step Star): preview of a trillion-parameter MoE large model released

Step-1V shows outstanding multimodal understanding, and a trillion-parameter model is on the way. StepFun, a general-purpose large model startup, was established in April 2023. On March 23, 2024, the company made its official debut at the 2024 Global Developer Pioneer Conference in Shanghai, where founder and CEO Dr. Jiang Daxin released the Step series of general-purpose large models at the opening ceremony. Step-1V, a 100-billion-parameter multimodal model, has outstanding multimodal understanding: it can accurately describe and understand the text, data, charts, and other information in images, and can carry out tasks such as content creation, logical reasoning, data analysis, and video understanding based on image information. The model ranked first on the multimodal leaderboard of OpenCompass, an authoritative Chinese large model evaluation platform, with performance comparable to GPT-4V.

A preview version of Step-2, a trillion-parameter language model, was also released at the conference. It adopts a mixture-of-experts (MoE) architecture, focuses on exploring deeper intelligence, and is available through an API to selected partners for trial use. Training a trillion-parameter model reflects both StepFun's core technical capabilities and its determination to pursue artificial general intelligence.
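For readers unfamiliar with MoE, the sketch below shows the generic top-k routing idea: each token activates only a few experts, so total parameters can grow far faster than the compute spent per token. It is not Step-2's architecture, which the source does not describe; the function name, the sizes, and the choice of `k=2` are illustrative assumptions.

```python
import numpy as np

def moe_forward(x, gate_w, expert_ws, k=2):
    """Route each token to its top-k experts and mix their outputs."""
    logits = x @ gate_w                        # (tokens, experts) routing scores
    top = np.argsort(logits, axis=1)[:, -k:]   # indices of the k best experts per token
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        chosen = logits[t, top[t]]
        weights = np.exp(chosen - chosen.max())
        weights /= weights.sum()               # softmax over the chosen experts only
        for w, e in zip(weights, top[t]):
            out[t] += w * (x[t] @ expert_ws[e])  # only k experts run for this token
    return out, top

rng = np.random.default_rng(0)
tokens, dim, experts = 4, 8, 16  # toy sizes for illustration
x = rng.normal(size=(tokens, dim))
gate_w = rng.normal(size=(dim, experts))
expert_ws = rng.normal(size=(experts, dim, dim))

y, routed = moe_forward(x, gate_w, expert_ws, k=2)
active_fraction = 2 / experts  # share of expert parameters used per token
```

In this toy setup each token touches 2 of 16 experts, so only one eighth of the expert parameters take part in any single token's computation.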


The founding team is firmly committed to climbing the scaling law and has built out all four elements: computing power, data, algorithms, and systems. Founder and CEO Dr. Jiang Daxin was formerly a global vice president of Microsoft and chief scientist of Microsoft Asia Internet Engineering Institute. The core founding team also includes Dr. Zhu Yibo, head of systems, and Dr. Jiao Binxing, head of data. Jiang Daxin is a world-renowned expert in natural language processing, with extensive research and engineering experience in machine learning, data mining, natural language processing, and bioinformatics. Zhu Yibo has repeatedly built and managed systems with more than 10,000 cards in a single cluster. Jiao Binxing previously led the core search team for Microsoft's Bing search engine, where he was responsible for optimizing indexing and search quality using data mining and NLP algorithms.

StepFun is firmly committed to climbing the scaling law as its path for large model technology. According to the company, this requires a single cluster equivalent to 10,000 cards, efficient and stable training, 10 trillion tokens of high-quality data, and mastery of a novel MoE architecture; a weakness in any one of these links makes it impossible to climb the scaling law. For that reason, the company has invested across all four elements (computing power, data, algorithms, and systems) since its founding:

1) Computing power: The company actively reserves computing power through a combination of self-built data centers and rented capacity. As a forward-looking move on computing resources, StepFun invested 200 million yuan in Shanghai Intelligent Computing Technology Co., Ltd. for a 10% stake. (That company's largest shareholder is Shanghai INESA Group with 44% of the shares, and Yunsai Zhilian holds 11%.)

2) System: The team has built and managed systems with more than 10,000 cards in a single cluster. Its MFU (model FLOPs utilization, a measure of effective computing power output) when training 100-billion-parameter models reaches 57%.

3) Data: The backbone of the data team comes from the Bing search engine, which has supported more than 100 languages and served more than 200 countries and regions. The team has an in-depth understanding of how high-quality corpora are distributed across the global Internet and has built a powerful data processing and knowledge graph pipeline.

4) Algorithms: The team can handle a range of architectures, including the trillion-parameter MoE architecture, and has deep insight into how large models develop cognition and along what path they will evolve.
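The MFU figure quoted under "System" can be read as achieved training FLOPs divided by the hardware's theoretical peak. The sketch below uses the widely used estimate of about 6 FLOPs per parameter per training token for a dense transformer. Every input value is a hypothetical chosen to reproduce a 57% result; the source gives no throughput, card count, or card model for this measurement.

```python
def mfu(params: float, tokens_per_second: float,
        num_cards: int, peak_flops_per_card: float) -> float:
    """Model FLOPs utilization: achieved training FLOPs/s over theoretical peak FLOPs/s."""
    achieved = 6.0 * params * tokens_per_second  # ~6 FLOPs per parameter per token
    peak = num_cards * peak_flops_per_card
    return achieved / peak

# Hypothetical inputs, for illustration only.
params = 100e9                # a 100-billion-parameter dense model
num_cards = 10_000            # a 10,000-card cluster
peak_flops_per_card = 312e12  # assumed dense BF16 peak of an A100-class card
tokens_per_second = 2_964_000 # assumed cluster-wide training throughput

utilization = mfu(params, tokens_per_second, num_cards, peak_flops_per_card)
print(f"{utilization:.0%}")
```

The estimate ignores attention FLOPs and does not apply directly to sparse MoE models, where only the active parameters per token should be counted.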


Unifying multimodal understanding and generation is the only route to AGI. StepFun believes that models will inevitably evolve through three stages: single-modal, then multimodal, then world model. In the early stage, the language, vision, and sound modalities developed independently, and each model learned how to better represent its own modality. The current stage is the convergence of modalities: language, vision, and sound can now be mapped into the same representation space. Although the modalities have begun to converge, one problem remains: understanding models and generative models are still developed separately. The result is either a model with strong understanding but weak generation (e.g., GPT-4V) or a model with strong generation but weak understanding (e.g., Sora).

Understanding and generation must be unified in a single model, which is why unifying multimodal understanding and generation is the only route to AGI. In the next stage, once understanding and generation are unified, the model can be combined with embodied intelligence to form a world model.

Adding the ability to plan complex tasks and to summarize abstract concepts on top of the world model is what will finally bring the model to the AGI stage.


Building on its self-developed foundation models, StepFun has launched two AI applications for consumers: 1) StepChat is a free AI chatbot built on the company's 100-billion-parameter model and positioned as a personal efficiency assistant. Its main functions include AI conversation, image understanding, document summarization, web content analysis, and online search. 2) Bubble Duck is a free AI open-world platform built on the same 100-billion-parameter model. It offers a large number of agents across categories such as personas, tools, content, games, and entertainment, and sets up one billion plots and characters with which users can role-play across many scenarios. Bubble Duck relies on ultra-long context memory and real-time online search to understand user intent in depth and to provide immediate, accurate, and personalized responses and choices.
