A new chapter in device-side AI: SenseTime released the "fastest" device-side model

Author: Geek Park

Author | Li Yuan

Editor | Zheng Xuan

Planning itineraries, writing copy, summarizing documents, and intelligently extending images: can an on-device model really do all of this? And do it this fast?

On April 23, SenseTime released the SenseNova 5.0 ("Ririxin 5.0") model family. The SenseChat-Lite model was the highlight of the release: it not only outperforms all open-source 2B-class models across the board, but also surpasses Llama-2 7B in evaluations.

Since last year, on-device large models have been a hot topic in both the consumer electronics industry and the artificial intelligence industry.

An on-device large model is a large model that runs on the device itself. It usually has far fewer parameters than familiar large models such as GPT, so it can run directly on the device's own computing power. On-device AI has three advantages: it can generate answers under any network conditions, it protects privacy because data never leaves the device, and it costs less because it needs no cloud computing power.

Stronger on-device AI opens up user scenarios that were previously impossible. Work files can be processed directly by a model on the phone without worrying about leaking confidential information. You can chat with foreigners on a plane without Internet access. And parents who do not want their children's phones connected to the Internet can still let them learn and listen to stories offline.

The lower cost will also affect the consumer electronics industry itself. On-device models give all kinds of terminal devices, including cars and XR headsets, and especially affordable electronics such as smart speakers, a chance to offer an intelligent experience without worrying too much about the cost of computing power. Low latency will also make scenarios for future devices such as the AI Pin practical and genuinely usable.

The latest demonstration of on-device model capabilities that Geek Park saw gave us a sense of how rapidly on-device AI is progressing. The future market for on-device AI looks very broad.

The fastest on-device large model

What surprised us most was the response speed of the on-device large model.

Although we knew before testing that SenseTime's large model responds very quickly, the results were still striking.

Reading about 20 words per second is already the limit for the human eye.

SenseTime's model generates 18.3 words per second on a mid-range platform and 78.3 words per second on a flagship platform!

Besides generating quickly, it also starts responding sooner than the cloud AI we are familiar with.

Wang Gang, head of the Xiaomi Xiaoai team, once said that when building Xiaoai's products, the team originally hoped to find a cloud model with a response time under 1.4 seconds to improve the user experience. It found that 1.4 seconds was too demanding for cloud models, and the cloud model it actually integrated responds in about 2 seconds.

SenseTime's on-device model has a first load time of less than 0.4 seconds, far shorter than the response time of even the more advanced cloud models. Response speed and generation speed together shape the first impression the on-device model gives us: it is fast, and it deserves to be called the fastest on-device model in the industry.
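To make the two speed figures concrete, here is a rough back-of-the-envelope sketch in Python. It treats the quoted first load time (under 0.4 seconds, used here as an upper bound) as the wait before output starts, and adds the time to generate an answer at the quoted 18.3 and 78.3 words per second. The 100-word answer length and the function name are assumptions made purely for illustration; this is not SenseTime's measurement method.

```python
def seconds_to_finish(words: int, first_load_s: float, words_per_s: float) -> float:
    """Estimated time until the last word appears: initial wait plus generation time."""
    return first_load_s + words / words_per_s


FIRST_LOAD_S = 0.4  # upper bound quoted in the article ("less than 0.4 seconds")
SPEEDS = {"mid-range": 18.3, "flagship": 78.3}  # words per second, as quoted
ANSWER_WORDS = 100  # assumed answer length, for illustration only

estimates = {
    platform: seconds_to_finish(ANSWER_WORDS, FIRST_LOAD_S, speed)
    for platform, speed in SPEEDS.items()
}

for platform, secs in estimates.items():
    print(f"{platform}: about {secs:.1f} s for {ANSWER_WORDS} words")
```

Under these assumptions, the flagship platform finishes a 100-word answer in under 2 seconds, while the mid-range platform produces text at roughly the 20-words-per-second reading limit cited above.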

In addition to responsiveness, we also tested the model's underlying capabilities.

Let's start by looking at the generative capabilities of the model.

The prompt was: Help me write a reading note on Dream of Red Mansions.

As you can see, although the prompt set no word count, SenseChat-Lite did not cut corners and directly generated a fairly long reading note.

The model also clearly has a reasonably good grasp of the plot, themes, and characters of Dream of Red Mansions.

Are you afraid that children will pick up an Internet-connected phone and play on it endlessly? In the future, you can simply hand them a phone that is offline and let the large model tutor them.

We then asked the model to write a Weibo post.

To our surprise, even though we set no word limit in this scenario, the model chose on its own to generate a short post that fits Weibo's length limit. It even knew to add the hashtag #Graduation Blessing#.

Training on a Chinese corpus clearly pays off. The model has a very good sense of the length and style of copy each scenario calls for, and can imitate it directly.

It is easy to imagine that, with the same capabilities, the on-device model could also generate Taobao reviews, Moments posts, and event publicity.

SenseTime said that during training the model learned many of the questions users like to ask on mobile phones, and that it did targeted training for them, which seems to have been effective.

Finally, we tested the document summarization feature.

The on-device model quickly condensed a complex seven-paragraph text into two paragraphs.

How many articles have you bookmarked and then left unread for ages?

That is normal. Reading long articles on a phone really does go against human nature.

Once on-device AI arrives, whether you are on a plane or a train and cannot get through a long article, or you are too lazy to read work documents but still have to deal with them, the on-device model can quickly summarize the main points for you.

In addition to the on-device SenseChat model, SenseTime also demonstrated the capabilities of its on-device multimodal diffusion model.

On mobile phones, the multimodal model is mainly used for photo processing, in scenarios that include album management and image generation.

This time, we saw a demonstration of SenseTime's AI image expansion model. When taking photos with a phone, we sometimes end up with a cropped frame because of the shooting angle or because we were avoiding other tourists. In such cases, the AI can extend the image beyond its original boundaries, generate an unobstructed picture, readjust the aspect ratio, and produce a better-looking, more shareable image.

The demo video shows SenseTime's free expansion function. SenseTime's AI expansion actually offers several modes. You can expand an image normally, or, if you took a crooked photo, straighten it and let the AI fill in the rest.

As with SenseChat-Lite, the most impressive thing about this expansion is its speed.

The diffusion model is reported to have also achieved the fastest on-device inference speed in the industry, completing in less than 1.5 seconds on Qualcomm's flagship platform. At the press conference, SenseTime also showed a speed comparison with similar features from competitors, and the difference was significant. By the time a competitor had finished loading one expanded image, SenseTime's on-device model had already produced nine.

On-device AI can also be smart

For on-device AI, one unavoidable question is: is it smart enough?

With this question in mind, let's take a look at the generative capabilities of device-side AI.

Take the itinerary planning mentioned at the beginning of the article. Designing a good itinerary requires not only knowledge, in this case knowledge of Cairo, but also a certain amount of reasoning ability. If a model is not intelligent enough, it tends to produce an answer that says nothing of substance, or only one or two simple suggestions. This time, the on-device model's output was very good.

According to SenseTime, the on-device model can be this intelligent because it benefits from SenseTime's latest data cleaning technology and high-quality data. Under the scaling law, a small model trained on high-quality data can outperform a model larger than itself.

At the same time, for when on-device models actually reach users, the solution SenseTime proposes is a combination of device and cloud.

SenseTime said that in knowledge, exam, and Q&A scenarios, more than 70% of requests can be handled on the device. In other scenarios the proportion may differ somewhat. However, based on its analysis of real usage, SenseTime believes that most of the questions users ask are in fact knowledge and Q&A questions, so the on-device model can handle a considerable share of them.

Why do we need an on-device model?

Cloud models are already very good, with large parameter counts and strong capabilities. Why do we still need an on-device model?

For ordinary users, the most obvious perception may still lie in the fast generation speed, which is also the advantage of the model released by SenseTime.

To illustrate what speed means, SenseTime built an interesting little game in which GPT-4 and SenseTime's on-device model each control a fighter in a fighting game. In the game, GPT-4 was better at analyzing how to dodge and punch, but the on-device model responded so quickly that GPT-4 was hit before it had finished working out how to dodge. In the end, the on-device model's flurry of punches knocked out the master.

In everyday use, fast responses steadily lower the psychological cost of turning to an AI model. Whether expanding an image takes one second or five seconds can completely change the experience of using the feature.

In addition, on-device models will open up new AI use cases, thanks to their ability to generate content offline and their stronger privacy protection.

Working on an airplane, looking up survival tips while traveling through uninhabited areas, translating when the network is poor, giving children an offline device to learn with, handing confidential work files to a large model for processing, and letting AI access more personal data: these are all practical scenarios that on-device models can enable in the future.

And for the industry, the impact may be even greater.

At the moment, the cloud models we use as consumers are usually free. But free does not mean there is no cost. The cost is usually borne by someone else on the user's behalf. For example, Xiaomi once mentioned that after integrating a large model, next-day retention of Xiaomi Xiaoai's active users rose by 10%.

Internet products often need a radical overhaul just to raise next-day retention by 5 percent. Higher next-day retention leads to more user activity, more distribution opportunities, and stronger commercialization. This is a very valuable business opportunity for Xiaomi. As a company with a range of smart hardware such as mobile phones, cars, and speakers, Xiaomi naturally hopes to bring the model to more devices, but that involves the cost of the cloud model. Xiaomi mentioned that mid-to-high-end phones and cars are currently not sensitive to the cost of large models, but for devices like speakers, it is hard for a hardware company to bear the cost of users' large-model usage over the product's life cycle. And as large models become more capable, users may use them more and more, making cost an even bigger issue.

With an on-device model, the computing power and electricity are supplied by the device itself. Users get a better experience at no extra charge, while manufacturers save on cloud costs, which is a win-win. On-device models are an important path toward making AI accessible to everyone.

In addition, the on-device model's advantages in data transmission and latency matter a great deal for the experience of smart hardware that uses large models.

The AI Pin, which attracted a lot of attention earlier, is built on a very futuristic concept: using large models to help with daily life anytime and anywhere. Yet overseas reviews generally describe the experience as a flop. One of the main reasons is that the AI Pin relies entirely on cloud models, so when the device moves into places such as indoor spaces or parking lots, responses become too slow and seriously hurt the product experience. With an on-device model, such problems can be avoided.

At present, users cannot try SenseTime's on-device model directly. It has to be deployed jointly by SenseTime and smart hardware manufacturers. However, SenseTime said that it has already partnered with a number of leading manufacturers, including Xiaomi Xiaoai, and believes that a large share of the smart features users experience on their devices will soon be powered by on-device AI.

Besides running on mobile phones with a variety of chips, the on-device models SenseTime has released can also be used on XR, in-vehicle, and other platforms, bringing an intelligent experience to these devices for the first time. In the future, the capabilities of on-device agents will be unlocked to further empower smart devices.

SenseTime's latest on-device model shows that the underlying capabilities of on-device models are improving at an accelerating pace, and that on-device AI will certainly have a large market in the future.

SenseTime has been providing underlying capabilities for mobile phones since the previous AI era. In the era of large models, it has once again seized the opportunity, launching not only a cloud model with capabilities comparable to GPT-4 but now also the fastest on-device model.

Building on its past experience of working with the smart device market, SenseTime has a great deal to look forward to in the future on-device market.

*Header image source: Visual China. This article is an original article by Geek Park. To reprint, please contact Geek Jun on WeChat: geekparkGO.
