
How has SenseTime reached the "escape velocity" of large-model deployment?

Author: Meng Yonghui

When Xu Li, Chairman and CEO of SenseTime, unveiled the newly upgraded "Ririxin SenseNova 5.0" large-model system against a distinctly Chinese-style backdrop, it signaled that SenseTime had become the first company to achieve a full-stack layout across cloud, device, and edge, and it was against this backdrop that the chapter of "AI Large Model Era II" was written.

As a result, people cannot help but ask: how has SenseTime reached the "escape velocity" of large-model deployment and achieved performance surpassing GPT-4 Turbo?

With a full picture of the SenseNova 5.0 model and the powerful computing infrastructure behind SenseTime, this should come as no surprise.

As Xu Li put it: "Guided by the scaling law, SenseTime will continue to explore the KRE three-layer architecture of large-model capabilities (knowledge-reasoning-execution) and keep breaking through the boundaries of what large models can do." Building on this, we may be able to find the internal logic behind SenseTime's "escape velocity."

SenseNova 5.0's performance comprehensively surpasses GPT-4 Turbo

Since its official launch in April last year, SenseTime's SenseNova large-model system has completed five major iterations. This upgrade is trained on more than 10TB of tokens, including a large volume of synthetic data, and adopts a mixture-of-experts architecture; the effective context window reaches about 200K during inference. It comprehensively strengthens knowledge, mathematics, reasoning, and coding capabilities, fully benchmarking against GPT-4 Turbo and matching or surpassing it in mainstream objective evaluations.
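As a rough illustration of what a mixture-of-experts layer does, each token is routed to a small subset of expert sub-networks rather than through one monolithic feed-forward block. The sketch below shows the general technique only; the dimensions, expert count, and top-k value are arbitrary assumptions for illustration, not SenseNova's actual configuration.

```python
import numpy as np

# Minimal mixture-of-experts (MoE) routing sketch. All sizes here
# (d_model=64, 8 experts, top-2 routing) are illustrative assumptions.
rng = np.random.default_rng(0)

d_model, n_experts, top_k = 64, 8, 2
router_w = rng.standard_normal((d_model, n_experts))           # router weights
experts = [rng.standard_normal((d_model, d_model)) for _ in range(n_experts)]

def moe_layer(x):
    """Route each token to its top-k experts and mix their outputs."""
    logits = x @ router_w                                      # (tokens, experts)
    top = np.argsort(logits, axis=-1)[:, -top_k:]              # chosen experts per token
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        gates = np.exp(logits[t, top[t]])
        gates /= gates.sum()                                   # softmax over the top-k only
        for gate, e in zip(gates, top[t]):
            out[t] += gate * (x[t] @ experts[e])               # gated expert output
    return out

tokens = rng.standard_normal((4, d_model))
print(moe_layer(tokens).shape)  # (4, 64)
```

The point of the design is that only top-k of the experts run per token, so parameter count can grow far faster than per-token compute.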

Thanks to these updates, SenseNova 5.0's "liberal-arts" abilities, "science" abilities, and multimodal abilities have all improved qualitatively.

Take a tricky reasoning question posed to both SenseNova 5.0 and GPT-4: "Mom made Yuanyuan a cup of coffee. After Yuanyuan drank half the cup, she topped it up with water; she then drank another half cup and topped it up again, and finally drank it all. Did Yuanyuan drink more coffee or more water?" SenseNova 5.0 answered correctly, while GPT-4 did not. (The answer: equal amounts, since she eventually drank the entire original cup of coffee and the entire cup of added water.)
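The puzzle can be checked with a few lines of exact-fraction arithmetic, purely for illustration:

```python
from fractions import Fraction

# Exact-fraction walkthrough of the coffee puzzle above.
coffee, water = Fraction(1), Fraction(0)      # cup starts with 1 cup of coffee
coffee_drunk = water_drunk = Fraction(0)

def drink(coffee, water, amount):
    """Remove `amount` of liquid, proportionally to the current mixture."""
    total = coffee + water
    c, w = coffee * amount / total, water * amount / total
    return coffee - c, water - w, c, w

for _ in range(2):                            # drink half a cup, refill with water, twice
    coffee, water, c, w = drink(coffee, water, Fraction(1, 2))
    coffee_drunk += c
    water_drunk += w
    water += Fraction(1, 2)

coffee_drunk += coffee                        # finally drink everything left
water_drunk += water

print(coffee_drunk, water_drunk)              # 1 1 -> equal amounts
```

The mixing ratios never actually matter: conservation alone says one cup of coffee in, one cup of water in, everything drunk.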

These improved capabilities let SenseNova 5.0 summarize and answer users' questions more effectively in Chinese-language contexts, supporting applications in industry scenarios such as education and content.

At the same time, SenseNova 5.0's mathematical, coding, and reasoning abilities have improved substantially, providing strong support for applications in finance, data analysis, and other scenarios.

Beyond "liberal-arts" and "science" abilities, SenseNova 5.0's multimodal abilities also perform well. It supports analysis and understanding of high-definition long images and interactive text-to-image generation, realizes complex cross-document knowledge extraction and summary-style Q&A, and offers rich multimodal interaction capabilities.

The image-text perception of SenseTime's multimodal large model has reached a world-leading level, ranking first in overall score on MMBench, an authoritative comprehensive benchmark for multimodal large models, and achieving leading results on well-known multimodal leaderboards including MathVista, AI2D, ChartQA, TextVQA, DocVQA, and MMMU.

It is clear that SenseNova 5.0's outstanding performance in "liberal-arts" abilities, "science" abilities, and multimodal abilities lays a solid foundation for the deployment of large models in real scenarios. It not only matches or exceeds GPT-4 Turbo in subjective evaluations, but also helps more local companies embrace the dividends of the large-model era in Chinese-language environments.

So if we want to find the internal logic behind how SenseNova 5.0 reached the "escape velocity" of large-model deployment, its all-round strength in both arts and sciences and its excellent multimodal interaction are undoubtedly one important aspect worth our attention.

With a full-stack cloud-device-edge layout, SenseTime builds a large-model product matrix

With the advent of the AI era, and especially as demand for computing power extends from centralized clouds to devices and enterprise-grade edge AI continues to grow, only efficient collaboration across cloud, device, and edge can truly support the deployment of large models.

Based on this understanding, SenseTime is the first in the industry to launch a full-stack "cloud, device, edge" large-model product matrix, including the "SenseTime device-side large model" for terminal devices and the "SenseTime enterprise-grade large-model all-in-one machine" for edge products in fields such as finance, code, healthcare, and government affairs.

It is reported that SenseTime's device-side large language model achieves the fastest inference speed in the industry, with an average generation speed of 18.3 characters per second on mid-range platforms and 78.3 characters per second on flagship platforms.
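A quick back-of-envelope check puts those figures in perspective. The 18.3 and 78.3 speeds come from the article; the 300-character reply length is an assumption chosen purely for illustration:

```python
# Back-of-envelope: time to stream one reply at the reported on-device speeds.
# Speeds are the article's figures; the reply length is an assumed example.
speeds = {"mid-range platform": 18.3, "flagship platform": 78.3}  # chars per second
reply_len = 300  # assumed reply length in characters

for platform, speed in speeds.items():
    secs = reply_len / speed
    print(f"{platform}: {secs:.1f} s to stream a {reply_len}-character reply")
# mid-range: ~16.4 s, flagship: ~3.8 s
```

In other words, the flagship figure is what makes on-device chat feel interactive rather than merely possible.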

The diffusion model likewise achieves the industry's fastest inference speed on the device side: the device-side LDM AI image-expansion technology completes inference in under 1.5 seconds on mainstream platforms, roughly 10 times faster than a competitor's cloud-based app. It supports output of high-definition images of 12 megapixels and above, along with fast on-device image-editing functions such as proportional expansion, free expansion, and rotational expansion.

It is worth mentioning that, to meet growing demand for edge AI applications in key industries such as finance, code, healthcare, and government affairs, SenseTime officially launched its enterprise-grade large-model all-in-one machine. It supports both acceleration of enterprise-grade 100-billion-parameter models and hardware-accelerated knowledge retrieval, enables localized, out-of-the-box deployment, and lowers the threshold for enterprises to adopt large models. Compared with similar products in the industry, inference cost is reduced by 80%, retrieval is greatly accelerated, and CPU workload is cut by 50%.

Thanks to this full-stack layout across cloud, device, and edge, SenseTime can bring AI models to more enterprises and meet each enterprise's needs to the greatest extent.

Precisely because of this, SenseTime's large models are already landing across industries.

In the office field, SenseTime helps WPS 365 build a new office-productivity platform that unlocks scenario capabilities more efficiently, constructing an exclusive "enterprise brain" for companies based on the SenseNova model's strong code-generation and tool-invocation capabilities.

In the financial sector, Haitong Securities and SenseTime jointly released a multimodal full-stack large model for the financial industry. The two parties are advancing business deployment in intelligent customer service, compliance and risk control, code assistance, and business-office assistants, and jointly researching cutting-edge industry scenarios such as robo-advisory and public-opinion monitoring, opening up full-stack large-model capabilities for the securities industry.

In the mobility field, Xiaomi's Xiaoai assistant delivers intelligent interactive experiences to car owners based on SenseTime's device-cloud large-model solution.

It is foreseeable that, as SenseNova 5.0's full-stack cloud-device-edge layout deepens, more enterprises will rapidly deploy AI applications with SenseTime's help and continue to embrace the dividends of the AI era.

Backed by computing power, SenseTime has found a path that follows the scaling law

Whether it is the comprehensive upgrade of SenseNova 5.0 or SenseTime's full-stack cloud-device-edge layout, none of it would be possible without the support of the computing center SenseTime has built.

As Xu Li said, SenseTime continues to seek the optimal data mix and has established a data-quality evaluation system to advance its own large-model R&D, while also providing industry partners with large-model training, fine-tuning, and deployment, as well as a range of generative AI capabilities and services.

In the final session of this technical exchange day, Xu Li also presented three videos generated entirely by large models, emphasizing the text-to-video platform's controllability over characters, actions, and scenes.

In the future, a video will be generated from a text prompt or a complete description, with characters' clothing, hairstyles, and scenes pre-settable to keep the video content coherent and consistent.

It is not hard to see that SenseTime's text-to-video capability is already on the way.

It can be said that SenseTime has found a path that follows the scaling law.

It is on this path that SenseTime can continuously upgrade SenseNova 5.0 and build out its full-stack cloud-device-edge layout, meeting more and more enterprises' new demands for AI.

So if we want to find the internal reason behind the "escape velocity" of SenseTime's large-model deployment, the strong support of the SenseTime Intelligent Computing Center behind it is undoubtedly another important aspect worth our attention.

Epilogue

From SenseNova 5.0's knowledge, mathematics, reasoning, and coding capabilities fully benchmarking against GPT-4 Turbo, matching or surpassing it in mainstream objective evaluations, to the industry's first full-stack cloud-device-edge layout, to SenseTime's deep empowerment of partners and its full embrace of the AGI era, we can see that SenseTime has indeed reached the "escape velocity" of large-model deployment.

When SenseNova 5.0's performance surpasses GPT-4 Turbo, and when SenseTime understands Chinese consumers and enterprises better than GPT-4 Turbo does, SenseTime can overtake on the curve at a moment when the path of the scaling law is becoming clear, helping AI land in more scenarios and truly achieving full collaboration among algorithms, computing power, data, applications, and scenarios.

-ENDS-

Author: Meng Yonghui, senior writer, columnist, industry observer, well-known KOL.
