
Trading suspended after shares surged more than 30%: what are the highlights of SenseTime's Ririxin 5.0?

Author: 21st Century Business Herald

21st Century Business Herald reporter Dong Jingyi reported from Shanghai

On April 23, SenseTime held a technical exchange day and released "Ririxin SenseNova 5.0".

Since its first release in April last year, SenseTime's "Ririxin SenseNova" large model system has gone through five major version iterations. Trained on more than 10TB of tokens, including a large amount of synthetic data, "Ririxin SenseNova 5.0" (hereinafter Ririxin 5.0) adopts a mixture-of-experts (MoE) architecture, and its effective context window reaches about 200K during inference.

Reportedly, this update focuses on strengthening knowledge, mathematics, reasoning and coding capabilities. The model is benchmarked against GPT-4 Turbo and is said to match or surpass it in mainstream objective evaluations.

After the market opened on April 24, SenseTime's share price rose sharply, at one point gaining more than 36%. At 11:15 a.m., SenseTime announced a temporary suspension of trading. At the time of the suspension, the company's shares stood at HK$0.80, up 31.15%, giving a total market capitalization of HK$26.8 billion.

SenseTime told the 21st Century Business Herald reporter that the previous day's Ririxin 5.0 launch event had been widely praised and had drawn strong market attention, and that the company would publish further announcements in accordance with the listing rules and the recommendations of the Hong Kong Stock Exchange.

In the afternoon, SenseTime issued an announcement stating that the board of directors had noted the recent unusual movements in the price and trading volume of its Class B shares, and that trading in the Class B shares had been suspended from 11:15 a.m. on April 24. The company has applied to the Stock Exchange for trading in the Class B shares to resume from 9:00 a.m. on April 25.

Breaking through data bottlenecks

How was the upgrade to Ririxin 5.0 achieved? Xu Li, Chairman and CEO of SenseTime, laid out the critical path at the technical exchange day.

"Guided by the scaling law, SenseTime will continue to explore the three-layer KRE architecture (knowledge-reasoning-execution) of large model capabilities and keep pushing the boundaries of what large models can do," Xu Li said.

Large model research and development follows a basic rule that is widely accepted in the industry, known as the "scaling law". In its general form, the scaling law says that as a model's parameter count, the amount of training data, and the training time all increase, the model's performance keeps improving. Building a general-purpose AI model therefore inevitably consumes large amounts of computing power.
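The relationship described above is often modelled as a power law. The following is a minimal Python sketch of that idea, not SenseTime's method: it assumes a single "scale" variable (here called compute) and a loss of the form a * compute**(-b), fits the two constants on a few small runs, and extrapolates to a much larger run. The function names and all numbers are hypothetical and synthetic.

```python
import math


def fit_power_law(compute, loss):
    """Least-squares fit of loss = a * compute**(-b) in log-log space."""
    xs = [math.log(c) for c in compute]
    ys = [math.log(v) for v in loss]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Ordinary least squares on (log compute, log loss).
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope  # (a, b)


def predict(a, b, compute):
    """Predicted loss at a given scale under the fitted power law."""
    return a * compute ** (-b)


# Synthetic "small-scale runs", generated from loss = 10 * C**-0.1.
# These are made-up values for illustration, not measured results.
small_compute = [1e3, 1e4, 1e5, 1e6]
small_loss = [10.0 * c ** -0.1 for c in small_compute]

a, b = fit_power_law(small_compute, small_loss)

# Extrapolate to a run six orders of magnitude larger than the biggest fit point.
extrapolated = predict(a, b, 1e12)
print(f"a={a:.3f}, b={b:.3f}, predicted loss at 1e12: {extrapolated:.3f}")
```

Because the synthetic data lie exactly on a power law, the fit recovers the generating constants. Real training runs are noisy, so in practice the quality of the extrapolation is something to be checked rather than assumed.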

Xu Li said the law in fact rests on two hidden assumptions. The first is predictability: conclusions drawn from many small-scale experiments must still predict performance accurately across 5 to 7 orders of magnitude. The second is order preservation: a performance ranking verified at small scale must still hold at larger scale.

"The scaling law is a guide for resource allocation. It can guide us to find the optimal model architecture and data recipe with limited R&D resources, so that the model can complete the learning process more efficiently," Xu Li said.

Experimental results show that, when the data is optimized, a small model can approach or even surpass the performance of a far larger one. Data, however, has long been the bottleneck for continued improvement in AI, and it is also one of the most important areas of improvement in Ririxin 5.0.

At the knowledge level, Ririxin 5.0 uses more than 10TB of tokens, which ensures the completeness of its high-quality data and gives the model a rich knowledge base.

At the reasoning level, Ririxin 5.0 constructs chain-of-thought data through synthesis, which helps the model better understand and reason about industry-specific logic and knowledge.

According to reports, on the liberal arts side, Ririxin 5.0's creative writing, reasoning and summarization abilities have improved. Given the same injected knowledge, it delivers better comprehension, summarization and Q&A, supporting vertical application scenarios such as education and the content industry. On the science side, its mathematical, coding and reasoning abilities have improved, laying a foundation for deployment in scenarios such as finance and data analysis.

In terms of multimodal capabilities, it supports the analysis and understanding of high-definition long images and interactive text-to-image generation, and it can perform complex cross-document knowledge extraction and present summaries and Q&A, giving it rich multimodal interaction capabilities.

Xu Li said, "The comprehensive capabilities of the Ririxin 5.0 large model system are fully benchmarked against GPT-4 Turbo. With leading technology, we are accelerating the full transition of generative AI into industry."

Device-cloud synergy

Over the past year, cloud-based large models have been widely adopted across industries. Smart terminals such as mobile phones, PCs and cars, however, are also broad platforms and scenarios for general artificial intelligence applications.

Xu Li said that this year marks the start of explosive growth in device-side applications of large models: "The application of device-side capabilities is actually the key to rolling out large models."

To meet mobile users' demand for large model technology, SenseTime has also launched a device-side large model with 1.8B (1.8 billion) parameters. It reportedly achieves an average generation speed of 18.3 words/s on a mid-range platform and 78.3 words/s on a flagship platform.

On the other hand, the device-side large model also makes up for the shortcomings of the cloud.

The first is the challenge of balancing model performance and cost. Wang Xiaogang, co-founder and chief scientist of SenseTime, said in an interview with the 21st Century Business Herald reporter that if billions of devices constantly called cloud models, the computing power required would be enormous. As an example, he said that in autonomous driving, large models must be deployed on the device side.

He added that different applications place different demands on model accuracy and user experience, which means that device-side models can serve applications whose model requirements are not especially high.

This has given rise to device-cloud collaboration. In such an architecture, smaller, task-optimized models are deployed on devices such as smartphones and IoT devices. These models can respond quickly to users' needs and handle tasks that do not require large computing resources.

The cloud has more powerful computing resources and larger models to handle more complex or data-intensive tasks. Cloud-based models typically have more parameters and are able to provide deeper learning and inference capabilities.

The device-cloud integrated MoE architecture uses intelligent judgment and collaboration to play to the respective strengths of device and cloud. Requests are offloaded to the cloud when they need online search or involve complex scenarios, while in some scenarios more than 80% of processing is handled on the device, which significantly reduces the cost of inference.

Wang Xiaogang told reporters that by intelligently selecting the most suitable model, the combination of device and cloud can provide faster response time and more accurate results, thereby optimizing the user experience.
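The routing idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not SenseTime's implementation: the `Request` type, the `route` rule, the length threshold and the cost units are all hypothetical. Only the figure of 80% handled on the device comes from the article.

```python
from dataclasses import dataclass


@dataclass
class Request:
    prompt: str
    needs_web_search: bool = False


# Hypothetical threshold: longer prompts are treated as "complex" here.
MAX_ON_DEVICE_PROMPT_CHARS = 200


def route(request: Request) -> str:
    """Return "cloud" for search or complex requests, otherwise "device"."""
    if request.needs_web_search:
        return "cloud"
    if len(request.prompt) > MAX_ON_DEVICE_PROMPT_CHARS:
        return "cloud"
    return "device"


def blended_cost(device_share: float, device_cost: float, cloud_cost: float) -> float:
    """Average cost per request when device_share of traffic stays on the device."""
    return device_share * device_cost + (1.0 - device_share) * cloud_cost


if __name__ == "__main__":
    print(route(Request("Set an alarm for 7 am")))                      # short, no search
    print(route(Request("Latest market news", needs_web_search=True)))  # needs search
    # Assumed cost units: a cloud call costs 1.0, an on-device call is treated as free.
    print(blended_cost(device_share=0.8, device_cost=0.0, cloud_cost=1.0))
```

Under these assumed cost units, keeping 80% of requests on the device brings the average cloud cost per request down to roughly one fifth of the all-cloud case. A real router would weigh model accuracy and latency as well, which this sketch leaves out.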

SenseTime said that popularizing and promoting device-side large model applications will be a strategic focus this year.

In addition, to meet the growing demand for edge AI applications in key industries such as finance, code, healthcare and government affairs, SenseTime has also launched an enterprise-grade large model all-in-one machine. Wang Xiaogang believes that integration with vertical industries is a key indicator of a model's "differentiation": "Where the application value of a model lies, and in which directions it can be optimized, need to be driven by the industry."
