A new chapter in device-side AI: SenseTime released the "fastest" device-side model

Author | Li Yuan

Editor | Zheng Xuan

Designing itineraries, writing copy, summarizing documents, and intelligently expanding images: can a device-side model really do all this? And this fast?

On April 23, SenseTime released its Ririxin (SenseNova) 5.0 model system. The highlight of the launch was SenseChat-Lite, which not only leads all open-source models in the same 2B class across the board but also outperforms Llama-2 7B in evaluations.

Since last year, device-side large models have been a hot topic in both the consumer electronics and AI industries.

A device-side large model runs locally on the device. It usually has far fewer parameters than familiar large models such as GPT, so it can run directly on the device's own computing power. On-device AI can generate answers under any network conditions, keeps private data on the device, and costs less because it doesn't need cloud computing power.

Stronger device-side AI will open up scenarios that were previously impossible: work files can be processed by the phone's own model without worrying about leaking confidential information; you can chat with foreigners on a plane without Internet access; and parents who don't want their children's phones online can still let them learn and listen to stories offline.

The lower cost will also affect the consumer electronics industry itself. Device-side models will give all kinds of devices, including cars, XR headsets, and especially low-cost electronics such as smart speakers, access to intelligent features without much concern for computing costs. Low latency will also help future devices such as the AI Pin deliver a usable experience.

The latest device-side model demonstration Geek Park saw shows how quickly device-side AI is progressing. The future market for device-side AI will be very broad.

The fastest device-side large model

What surprised us most was how quickly the device-side model responds.


Although we had heard before testing that SenseTime's model responds very quickly, the results were still impressive.

For a human reader, about 20 characters per second is already the limit of reading speed.

SenseTime's model generates 18.3 characters per second on a mid-range platform and 78.3 characters per second on a flagship platform!

Beyond generation speed, it also responds faster than the familiar cloud-based AI services.

Wang Gang, head of the Xiaomi Xiaoai team, once said that when building Xiaoai, the team originally hoped to find a cloud model with a response time under 1.4 seconds to improve the user experience. It found, however, that 1.4 seconds was too demanding for cloud models; the cloud model it actually integrated responds in about 2 seconds.

SenseTime's device-side model has a first-response latency of under 0.4 seconds, far lower than even advanced cloud models. Together, response speed and generation speed make up our first impression of the model: it is fast, and worthy of being called the fastest device-side model in the industry.
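As a rough illustration of how the two figures combine, the sketch below estimates the total wait for a complete answer as first-response latency plus generation time. The constant generation rate and the 200-character answer length are assumptions for illustration, not SenseTime's measurement method.

```python
def total_response_time(first_response_s: float, chars: int, chars_per_s: float) -> float:
    """Estimate seconds until a full answer is displayed.

    Simplified model (an assumption): a fixed first-response latency
    followed by generation at a constant rate.
    """
    if chars_per_s <= 0:
        raise ValueError("generation rate must be positive")
    return first_response_s + chars / chars_per_s


# Figures quoted in the article: under 0.4 s first response,
# 18.3 chars/s (mid-range) and 78.3 chars/s (flagship).
ANSWER_CHARS = 200  # hypothetical answer length

for platform, rate in [("mid-range", 18.3), ("flagship", 78.3)]:
    t = total_response_time(0.4, ANSWER_CHARS, rate)
    print(f"{platform}: {t:.1f} s")
```

This prints about 11.3 s for the mid-range platform and 3.0 s for the flagship. A reader at roughly 20 characters per second needs about 10 seconds for the same 200 characters, so on the flagship platform the text appears faster than it can be read.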

In addition to responsiveness, we also tested the model's underlying capabilities.

Let's start by looking at the generative capabilities of the model.

The prompt we gave was: "Help me write reading notes on Dream of Red Mansions."

Although the prompt specified no length, SenseChat-Lite didn't cut corners: it produced a fairly long set of reading notes.

The model clearly has a solid grasp of the plot, themes, and characters of Dream of Red Mansions.

Worried that children will go online and play on their phones endlessly? In the future, you could give them an offline phone and let the large model tutor them.

Next, we tried generating a Weibo post.

To our surprise, although we again set no length limit, the model on its own produced a short post suited to Weibo's length conventions. It even knew to add the hashtag #Graduation Blessing#.

Training on a Chinese corpus clearly pays off. The model understands what length and style of copy each scenario calls for and can imitate them directly.

With the same capabilities, one could easily use the device-side model to write Taobao reviews, WeChat Moments posts, and event promotions.

SenseTime said that during training it studied the kinds of questions users often ask on mobile phones and trained specifically for them, which seems to have worked.

Finally, we tested document summarization.

The device-side model quickly condensed a complex seven-paragraph text into a two-paragraph summary.

How many saved articles have you left unread for ages?

That's understandable: reading long articles on a phone is genuinely unpleasant.

With device-side AI, whether you're on a plane or a train, when you can't get through a long article or don't feel like reading a work document you still have to handle, the device-side model can quickly summarize the key points for you.

Besides its device-side language model, SenseTime also demonstrated its device-side multimodal diffusion model.

On mobile phones, multimodal models are mainly used for photo processing, in scenarios such as album management and image generation.

We also saw a demonstration of SenseTime's AI image expansion model. When taking photos with a phone, we sometimes crop the frame because of the angle or to cut out other tourists. In such cases, AI can extend the image beyond its original borders, generate an unobstructed picture, adjust the aspect ratio, and produce a better-looking, more shareable image.

The demo video shows SenseTime's free-form expansion feature. SenseTime's AI image expansion actually offers several modes: you can expand an image normally, or take a crooked photo, rotate it back to level, and let the AI fill in the rest.

As with SenseChat-Lite, the most impressive thing about this image expansion is its speed.

The diffusion model reportedly also achieves the fastest on-device inference speed in the industry, finishing in under 1.5 seconds on Qualcomm's flagship platform. At the launch event, SenseTime also showed a speed comparison against similar features from competitors, and the difference was significant: by the time a competitor finished loading one expanded image, SenseTime's device-side model had expanded nine.

On-device AI can also be smart

For device-side AI, one question can't be avoided: is it smart enough?

With this question in mind, let's take a look at the generative capabilities of device-side AI.

Take the itinerary mentioned at the start of this article. Planning an itinerary requires not only knowledge, in this case about Cairo, but also some reasoning ability. A model that isn't smart enough tends to give answers that merely restate the question, or only one or two simplistic suggestions. This time, the device-side model's output was very good.

According to SenseTime, the device-side model is this capable because it benefits from SenseTime's latest data-cleaning techniques and high-quality data. SenseTime argues that, in line with scaling laws, a smaller model trained on high-quality data can outperform larger models.

Meanwhile, for real-world deployment, SenseTime proposes combining device and cloud.

SenseTime said that in knowledge, exam, and Q&A scenarios, the device side can handle more than 70% of requests; in other scenarios the share may differ somewhat. Based on its analysis of real usage, SenseTime believes that most user questions are in fact knowledge and Q&A questions, so the device-side model can handle a considerable share of them.
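The article does not say how SenseTime decides which requests stay on the device. As a purely hypothetical sketch of a device-cloud split, a router could send knowledge, exam, and Q&A requests to the local model, send everything else to the cloud, and fall back to the device when there is no network. The category labels and the upstream classifier that would assign them are assumptions.

```python
from dataclasses import dataclass

# Hypothetical category labels; the article does not describe how
# SenseTime classifies requests.
ON_DEVICE_CATEGORIES = {"knowledge", "exam", "qa"}


@dataclass
class Request:
    text: str
    category: str  # assumed to come from some upstream classifier


def route(request: Request, network_available: bool) -> str:
    """Return "device" or "cloud" for a single request."""
    if not network_available:
        # With no connection, the on-device model is the only option.
        return "device"
    if request.category in ON_DEVICE_CATEGORIES:
        return "device"
    return "cloud"


def on_device_share(requests: list, network_available: bool = True) -> float:
    """Fraction of a batch of requests that the router keeps on the device."""
    if not requests:
        return 0.0
    handled = sum(route(r, network_available) == "device" for r in requests)
    return handled / len(requests)


# Hypothetical batch: 7 Q&A requests and 3 long-form writing requests.
batch = [Request("q", "qa")] * 7 + [Request("w", "long_writing")] * 3
print(on_device_share(batch))
```

For this made-up batch the share is 0.7, echoing the 70% figure SenseTime cites; in practice the split would depend on real traffic and on how the classifier labels it.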

Why do we need device-side models?

Cloud models already perform very well, with huge parameter counts and strong capabilities. Why do we still need device-side models?

For ordinary users, the most noticeable benefit may be fast generation, which is exactly the strength of SenseTime's new model.

SenseTime built an interesting little game to illustrate why speed matters: GPT-4 and SenseTime's device-side model each controlled a player in a fighting game. GPT-4 was better at analyzing how to dodge and counterattack, but the device-side model reacted so quickly that GPT-4 was hit before it could finish its analysis. In the end, the device-side model knocked out the master.

In real use, fast responses keep lowering the psychological cost of using AI models. Whether an image expansion takes one second or five seconds can completely change how the feature feels to use.

In addition, with its ability to generate content offline and its stronger privacy protection, the device-side model will open up new AI use cases.

Working on a plane, looking up survival tips in an uninhabited wilderness, translating on a poor connection, giving children an offline device to learn with, handing confidential work files to a large model, letting AI access more personal data: these are all practical scenarios that device-side models can enable in the future.

And for the industry, the impact may be even greater.

Today, the cloud models we use are usually free. But free doesn't mean costless; someone else usually bears the cost on the user's behalf. Xiaomi, for example, has said that after integrating a large model, next-day retention of Xiaoai's active users rose by 10%.

In fact, Internet products often need radical changes just to raise next-day retention by 5%. Higher next-day retention means more active users, more distribution opportunities, and stronger monetization, so this is a very valuable business opportunity for Xiaomi.

With smart hardware ranging from phones to cars and speakers, Xiaomi naturally wants to bring the model to more devices, but that raises the cost of cloud models. Xiaomi noted that mid-to-high-end phones and cars are currently insensitive to the cost of large models, but for devices like speakers, it is hard for hardware makers to absorb the cost of a user's large-model usage over the device's lifetime. And as large models improve, users may rely on them more and more, making cost an even bigger issue.

With a device-side model, the computing power and energy come from the device itself: users get a better experience at no extra charge, and manufacturers save on cloud costs, a win-win. This makes device-side models an important path toward bringing AI to everyone.

The reduced data transmission and lower latency of device-side models also matter greatly for smart hardware that relies on large models.

The AI Pin, which drew a lot of attention earlier, has a very futuristic concept: using large models to help with daily life anytime, anywhere. But overseas reviews have generally called the experience a flop. One of the main reasons is that the AI Pin relies entirely on cloud models, and when the device moves into places such as indoors or parking garages, responses become too slow, seriously hurting the experience. Device-side models can avoid such problems.

At present, users cannot try SenseTime's device-side model directly; smart hardware makers need to work with SenseTime on joint deployment. However, SenseTime said it already works with several leading manufacturers, including Xiaomi Xiaoai, and believes that soon a large share of the smart features users experience on their devices will be powered by device-side AI.

Beyond phones built on a variety of chips, the device-side models SenseTime has released can also run on XR, in-vehicle, and other platforms, bringing intelligent experiences to these devices from scratch. In the future, device-side agent capabilities will be unlocked to further empower smart devices.

SenseTime's latest device-side model shows that the underlying capabilities of device-side models are rapidly getting stronger, and device-side AI is bound to have a large market in the future.

SenseTime has supplied core technology for mobile phones since the previous AI era. In the large-model era, it has again seized the opportunity, launching not only a cloud model with capabilities comparable to GPT-4 but now also the fastest device-side model.

Given its track record of successful partnerships in the smart-device market, SenseTime has plenty of room to grow in the future device-side market.

*Header image source: Visual China. This is an original article by Geek Park; to reprint, contact Geek Jun on WeChat (geekparkGO).
