
After a long-awaited stock surge, SenseTime wants to challenge GPT-4 with its own scaling law

Author: Silicon Star Man

On April 24, SenseTime's shares were suspended on the Hong Kong Stock Exchange after rising 31.15%. SenseTime responded: "Yesterday's RiRiXin (SenseNova) 5.0 launch event was well received and drew great attention from the market. In accordance with the listing rules and the recommendations of the Hong Kong Stock Exchange, the company will publish a further announcement."

Catching up with GPT-4 may be the collective goal of China's large-model industry. At the "2024 SenseTime Technology Exchange Day" held the day before at SenseTime's Lingang AIDC in Shanghai, the company handed in its answer sheet: the new large model SenseNova 5.0, which benchmarks against GPT-4 Turbo.

SenseTime's version of the Scaling Law

Not GPT-4-1106-preview, not GPT-4-0125-preview, but GPT-4 Turbo, which tops a host of large-model leaderboards. From SenseNova 4.0 surpassing GPT-3.5 to SenseNova 5.0 fully benchmarking against GPT-4 Turbo, SenseTime took less than three months.


There is no magic behind this; it is the first principle of large language models at work: the scaling law.

First, with continued growth in data, model size, and computing power, SenseTime can keep improving the capabilities of its large models. This is the power-law relationship between model performance and model size, data volume, and compute emphasized by OpenAI, a fairly general framework for performance improvement.

But a large model is not just brute-force aesthetics; behind it lies a mass of software-engineering problems. While following the scaling law, SenseTime derives mathematical formulas through controlled experiments and can predict the performance of its next-generation models, rather than trying things blindly and at random.

Xu Li, Chairman and CEO of SenseTime, summarized two assumptions:

First, predictability: accurate performance predictions can be maintained across 5-7 orders of magnitude.

Second, order preservation: performance rankings verified at small scale still hold at larger scale.

This guides SenseTime, with limited R&D resources, to find the optimal model architecture and data recipe so that the model can complete the learning process more efficiently. "We predicted early on that our model could surpass GPT-4's capability on certain benchmarks."

In other words, when developing large models, SenseTime focuses on predicting and verifying the effectiveness of model architectures and data recipes through small-scale experiments, and on ensuring that conclusions verified at small scale still hold when applied at larger scale.
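The article does not disclose SenseTime's actual formulas. As a minimal sketch of the predictability assumption, one can fit a power law to a handful of small-scale training runs and extrapolate the loss of a much larger run before committing the full compute budget; all numbers below are illustrative, not SenseTime's measurements.

```python
import numpy as np

# Hypothetical small-scale runs: compute budget (FLOPs) vs. validation loss.
compute = np.array([1e18, 1e19, 1e20, 1e21])
loss = np.array([3.2, 2.9, 2.6, 2.3])

# Fit a power law L = a * C^b (b < 0), which is a straight line in log-log space.
b, log_a = np.polyfit(np.log(compute), np.log(loss), 1)
a = np.exp(log_a)

def predict_loss(c: float) -> float:
    """Extrapolate validation loss for a compute budget orders of magnitude larger."""
    return a * c ** b

# Forecast the loss of a run two orders of magnitude beyond any experiment so far.
print(f"predicted loss at 1e23 FLOPs: {predict_loss(1e23):.2f}")
```

If the fit stays accurate across several orders of magnitude, as the predictability assumption claims, this is what lets a lab choose architectures and data recipes from cheap experiments instead of full-scale trial runs.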


"If we choose better data recipes, the performance gains will be even more efficient." According to SenseTime's experimental results, with optimized data, small models can approach or even surpass large models an order of magnitude bigger. For example, the small Llama 3 model outperforms Llama 2 models an order of magnitude larger in size.

The question that arises is: where do better datasets come from, and how can dataset quality be improved?

According to Xu Li, SenseNova 5.0 was pre-trained on more than 10T tokens of Chinese and English data, refined through a carefully designed cleaning pipeline into high-quality base data, which gives the large model its basic cognition of objective knowledge and the world.

In addition, SenseTime synthesized chain-of-thought data and used this logic-synthesis data (on the order of hundreds of billions of tokens) at scale during pre-training to improve the model's reasoning, math, and programming abilities. In essence, this helps large models learn how humans approach and solve problems.

"This is the key to genuinely improving model capability. If chain-of-thought data for each industry can be easily constructed, reasoning ability within that industry will improve greatly."


The scaling law also has physical limits, such as running out of data and the limits of hardware interconnects. In March this year, a Microsoft engineer mentioned that if OpenAI deployed more than 100,000 H100 GPUs in the same state, the power grid would collapse. Xu Li said, "This requires redesigning these cards, these connections, and these topologies; algorithm design and compute infrastructure need to be jointly optimized."

Text-to-video is on the way

The release of Llama 3 in 8B and 70B versions showed the potential of small-parameter large models in on-device scenarios. SenseTime's 1.8B-parameter SenseChat-Lite, launched at the same event, surpassed all open-source models of the same 2B class on mainstream evaluations and even led larger 7B and 13B models such as LLaMA2.


Through a device-cloud collaboration solution, SenseChat-Lite achieves an average generation speed of 18.3 words/s on mid-range platforms and 78.3 words/s on flagship platforms.

For on-device multimodality, SenseTime's diffusion model likewise achieves the industry's fastest on-device inference: its on-device LDM-AI image-expansion technology infers in under 1.5 seconds on mainstream platforms, outputs high-definition images of 12 megapixels and above, and supports fast on-device editing such as proportional expansion, free expansion, and rotational expansion.

SenseTime's on-device SDK has also been officially released, covering scenarios such as daily conversation, common-sense Q&A, copywriting, album management, image generation, and image expansion. It supports the full range of Qualcomm 8- and 7-series chips as well as MediaTek Dimensity chips, and has been adapted to phones, tablets, VR glasses, and in-vehicle terminals.

To meet the private-deployment needs of industries such as finance, code, healthcare, and government affairs, SenseTime launched an enterprise-grade large-model all-in-one machine. It simultaneously supports acceleration for enterprise-grade 100-billion-parameter models and hardware-accelerated knowledge retrieval, enables on-premises, out-of-the-box deployment, and has completed adaptation to domestic chips. It delivers up to 2 PFLOPS of compute, 256 GB of GPU memory, and 448 GB/s of interconnect bandwidth.


For software development, SenseTime released the Lightweight Edition of its Little Raccoon code-model all-in-one machine to help enterprise developers write, understand, and maintain code more efficiently. It posts a 75.6% pass rate on HumanEval, exceeding GPT-4's 74.4%, supports more than 90 programming languages and 8K context, and a single machine can serve teams of up to 100 people. It can cut the cost of cloud code services from 7-8 yuan per person per day to 4.5 yuan. The Lightweight Edition is priced at 350,000 yuan per unit.

In addition, SenseTime released Ascend-based industry large models, working with Huawei Ascend to build a large-model industry ecosystem for finance, healthcare, government affairs, and code.

In the final segment, Xu Li also left an "easter egg": he showed three videos generated entirely by large models and said a text-to-video platform would be released soon. This invites the question: after catching up with GPT-4, and with its deep accumulation in the vision field, is SenseTime's next goal to catch up with Sora?

A game of "fast fish eat slow fish"

Besides SenseNova 5.0, which benchmarks GPT-4 Turbo, and the upgraded device-side and edge-side products, another keyword of SenseTime's technology exchange day was "partner".

SenseTime invited Zhang Dixuan, President of Huawei's Ascend Computing Business; Zhang Qingyuan, CEO of Kingsoft Office; Mao Yuxing, Deputy General Manager and Chief Information Officer of Haitong Securities; Wang Gang, General Manager of Xiaomi's Xiao'ai; and Ge Wenbing, General Manager of China Literature Group's Dream Island, to discuss the application and prospects of large-model technology in fields such as office work, finance, and mobility.

This not only reflects the application potential of SenseTime's large-model capabilities across fields, but also conveys its intention to deepen industry cooperation. After catching up with GPT-4, the real competition may be the ability to land applications, and for that SenseTime needs more partners.

Whether in releasing Ascend-based industry models with Huawei or in releasing the device-side SDK, SenseTime has kept emphasizing the importance of industry partners, which also shows in the details of these partnerships:

Zhang Dixuan, President of Huawei's Ascend Computing Business, said that SenseTime joined the Ascend native program in early March this year and released four industry models a little over a month later.

Wang Gang, General Manager of Xiaomi's Xiao'ai, mentioned that SenseTime completed adaptation for Xiaomi's cars in two or three days and passed Lei Jun's acceptance.

"Fast" indeed. As early as 2021, SenseTime began building its own AI infrastructure, the SenseCore AI large device, with AIDC as its key computing base; it officially went live on January 24, 2022. According to the earnings announcement, the total computing power of SenseTime's large device has reached 12,000 petaFLOPS, doubling since the beginning of 2023, with 45,000 GPUs, realizing "Wanka Wanshen" large-model training capability on the order of ten thousand cards.

Since announcing its strategic focus on AGI in March 2023, SenseTime has updated its base models and solutions quarterly. With SenseNova 5.0 catching up with GPT-4, the market's logic is clear: cash flow is sufficient in the short term, catching up with OpenAI's latest model lets the company tell a bigger story, and with prices low enough, more people will naturally vote with their feet.

According to SenseTime's latest 2023 financial report, generative AI revenue reached 1.2 billion yuan, up 200%, accounting for 35% of the company's total revenue. It is also the fastest any new business has reached 1 billion yuan in revenue in the company's ten-year history.

SenseTime, which came through the AI 1.0 era, has witnessed the changes in China's artificial intelligence industry as an important leader.

In the AI 2.0 era, everyone seems to have become a chaser of OpenAI. The competition around large models is not only big fish eating small fish, but also fast fish eating slow fish. OpenAI's lead gives it an absolute competitive advantage; for other participants to escape this state of catch-up, they need mature underlying infrastructure and innovative top-level design.

For SenseTime, only by running fast enough and long enough in the dawn before large models are commercialized can it enjoy the first wave of dividends, fundamentally solve its losses, and return to its rightful position.

In the 16th century BC, Shang Tang overthrew the Xia dynasty through a series of military campaigns and political strategies and established the Shang dynasty, a change later generations called the "Shang Tang Revolution". In the coming years, generative AI will likely become SenseTime's biggest source of revenue, and that may be exactly the revolution SenseTime needs.