
GPT-4 "beaten" live by a small on-device model, and SenseNova 5.0 fully benchmarks against GPT-4 Turbo

Jin Lei, reporting from SenseTime AIDC

QbitAI | WeChat official account QbitAI

What a spectacle: GPT-4 got "beaten up" in public, without even a chance to fight back:


Yes, this memorable scene took place during a live head-to-head match of Street Fighter.

And the two contestants weren't even in the same weight class:

  • Green character: controlled by GPT-4
  • Red character: controlled by a small on-device model

So where does this small but scrappy fighter come from?

No suspense: it is SenseChat Lite, the newly released on-device model in SenseTime's SenseNova ("RiRiXin") family.

In the Street Fighter match alone, the small model embodied the martial-arts maxim that "speed is the one thing no technique can defeat":

While GPT-4 was still deliberating its next move, SenseChat Lite's punches had already landed.

And that's not all: Xu Li, SenseTime's CEO, raised the difficulty on stage by disconnecting his phone from the network before running the next test.

For example, drafting an employee's one-week leave request while fully offline:


△ Real-time speed, recorded on site

(Xu Li joked: "That's too long a leave, request denied~")

You can also make a quick summary of long paragraphs of text:


△ Real-time speed, recorded on site

All of this is possible because SenseChat Lite reaches SOTA performance among models of the same size.

Punching above its weight, it also beats Llama 2-7B and even Llama 2-13B on a number of benchmarks.


On speed, SenseChat Lite uses a device-cloud "linked" MoE framework in which on-device inference handles up to 70% of the workload in some scenarios, driving inference costs down.

Concretely: against a human reading speed of about 20 words per second, SenseChat Lite reaches an inference speed of 18.3 words per second on mid-range phones.

On high-end flagship phones, that speed soars to 78.3 words per second!
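The article doesn't disclose how the device-cloud split is decided. Purely as a hypothetical illustration of the idea, a router of this kind might keep short, latency-sensitive requests on the device and send the rest to the cloud; the function name, threshold, and policy below are invented, not SenseTime's actual design:

```python
# Hypothetical device-cloud router. The name, threshold, and policy are
# illustrative assumptions, not SenseTime's actual implementation.
def route_request(prompt: str, device_budget_words: int = 512) -> str:
    """Keep short, latency-sensitive requests on the device;
    fall back to the cloud for long or heavy inputs."""
    if len(prompt.split()) > device_budget_words:
        return "cloud"
    return "device"

print(route_request("Summarize this paragraph for me."))  # → device
```

A real system would presumably weigh more signals (battery, connectivity, task type), but the shape of the decision is the same: route most everyday traffic on-device and reserve the cloud for the hard cases.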

Beyond text generation, Xu Li also demonstrated the multimodal capabilities of SenseTime's on-device model live.

In an image-expansion (outpainting) demo, SenseTime's on-device model started half a beat later than a competitor, yet finished expanding three different images before the competitor finished one:


The demonstrators even took a photo on the spot, shrank it dramatically, and then freely expanded it outward:


You have to admit, SenseTime isn't afraid of live demos.

Still, the on-device model was only one corner of the event.

On the foundation-model side, SenseTime gave its SenseNova family a major version upgrade, SenseNova 5.0, and positioned it at a new level:

Fully benchmarked against GPT-4 Turbo!

So how strong is SenseNova 5.0? Let's put it to the test~


Please, "mentally retarded"!

Ever since large models took off, Ruozhiba questions have been a popular yardstick for testing their logical ability, earning the nickname "the Ruozhiba benchmark".

("Mentally Handicapped" is derived from Baidu Tieba, a Chinese community full of absurd, bizarre, and unreasonable statements.) )

Not long ago, Ruozhiba even showed up in a serious AI paper as a surprisingly good source of Chinese training data, sparking heated discussion.

So when SenseChat 5.0, the text-dialogue model, meets Ruozhiba, what sparks will fly?

Logical reasoning: Ruozhiba

First question:

Why didn't my parents invite me to their wedding?

Unlike other AI models, SenseChat answers in a more personable first-person voice, and its response carries no filler: both the answer and the explanation are accurate, "You hadn't been born yet when they got married".

Second question:

An internet café lets you go online, so why doesn't Ruozhiba (the "Dumb Bar") make you dumb?

Again, SenseChat directly pointed out that "this is a joke question" and that "Ruozhiba is not an actual place".

Clearly, SenseChat 5.0 can handle Ruozhiba's off-the-wall logic that refuses to play by the rules.

Natural language: Dream of the Red Chamber

Beyond logical reasoning, for natural-language generation we can pit GPT-4 against SenseChat 5.0 on the 2022 gaokao (college entrance exam) essay prompt.


Judging from the results, GPT-4's essay still reads like an "AI template", while SenseChat 5.0's is rather poetic: its sentences are neatly parallel, and it even quotes the classics.

AI, it seems, can think divergently after all.

Math ability: Simplify the complex

Once again GPT-4 and SenseChat 5.0 share the stage, this time to test their math:

Mom made Yuanyuan a cup of coffee. After Yuanyuan drank half the cup, she topped it up with water; she drank half again and topped it up with water again, and finally drank it all. How much coffee and how much water did Yuanyuan drink in total?

This is a fairly easy question for a human, yet GPT-4 produced a serious-looking, careful derivation and still got it wrong.

The root cause is an incomplete chain-of-thought behind the model: when it hits an unusual problem, it slips up easily.
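The coffee puzzle itself is easy to verify mechanically. The short simulation below (written for this article, not taken from the demo) tracks the cup's contents through each drink-and-refill step; the answer comes out to exactly 1 cup of coffee and 1 cup of water:

```python
# Verify the coffee puzzle by simulating the cup's contents.
# (Our own check; not the model's actual output.)
cup = {"coffee": 1.0, "water": 0.0}    # start with a full cup of coffee
drunk = {"coffee": 0.0, "water": 0.0}  # running totals of what was drunk

def drink(fraction):
    """Drink the given fraction of whatever mixture is in the cup."""
    for liquid in cup:
        drunk[liquid] += cup[liquid] * fraction
        cup[liquid] *= 1 - fraction

drink(0.5); cup["water"] += 0.5   # drink half, top up with water
drink(0.5); cup["water"] += 0.5   # drink half again, top up again
drink(1.0)                        # drink everything

print(drunk)  # → {'coffee': 1.0, 'water': 1.0}
```

There is also a shortcut that skips the simulation: all the original coffee (1 cup) and all the added water (0.5 + 0.5 = 1 cup) end up drunk, since the cup finishes empty.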

In the following question about "eagle catches the chicks" (a Chinese children's game), GPT-4 apparently doesn't understand the rules, because its calculated answer is again wrong:


These hands-on impressions are backed up by the more direct evidence of leaderboard numbers, which reflect SenseChat 5.0's ability:

On conventional objective evaluations, it matches or surpasses GPT-4.


So how does SenseNova 5.0 pull this off? In short: data in one hand, compute in the other.

First, to break the data bottleneck, SenseTime trained on more than 10TB of tokens, achieving broad coverage of high-quality data and giving the model a baseline understanding of objective knowledge and the world.

SenseTime also synthesized hundreds of billions of tokens of chain-of-thought data, the key data-level investment this time, which activates the model's reasoning ability.

Second, at the compute layer, SenseTime co-optimizes algorithm design and compute infrastructure: the topological limits of the infrastructure shape the next generation of algorithms, and advances in algorithms in turn reshape how the infrastructure is built.

This joint iteration of algorithms and compute is the core capability of SenseTime's large AI device.

Overall, the highlights of SenseNova 5.0 can be summarized as follows:

  • Adopts an MoE architecture
  • Trained on more than 10TB of tokens, including a large amount of synthetic data
  • Inference context window of 200K
  • Knowledge, reasoning, math, and code fully benchmarked against GPT-4

In multimodality, SenseNova 5.0 also leads on a number of core metrics:


As usual, let's look at the multimodal generation results.

Even better at reading images

Give SenseChat 5.0 an extremely long image (646×130000 pixels), ask it to read it, and you get an overview of everything in it:


Or toss it a funny cat picture: from details like the party hat, the cake, and the words "happy birthday", it infers that the cat is celebrating its birthday.


More practically, upload a complex screenshot and SenseChat 5.0 accurately extracts and summarizes the key information, whereas GPT-4 makes a recognition error:


SenseMirage 5.0: a text-to-image showdown

For text-to-image generation, SenseNova's SenseMirage 5.0 went head-to-head with Midjourney, Stable Diffusion, and DALL·E 3.

On style, SenseMirage's output arguably comes closer to the "National Geographic" look requested in the prompt:


For portraits, it renders more intricate skin textures:


Even text can be embedded in images with unmistakable precision:


There's also an anthropomorphic model

This release also included a specialized model: the anthropomorphic (persona) large model.


In practice, it can already role-play film and TV characters, real celebrities, and characters from fictional worlds such as Genshin Impact, holding emotionally intelligent conversations with you.


Feature-wise, the SenseChat persona model supports character creation and customization, knowledge-base construction, long-term dialogue memory, and even group chats with three or more participants~

Building on these multimodal capabilities, another major member of the SenseTime model family, Little Raccoon, has been upgraded as well.

Office and programming just got easier

SenseTime's Little Raccoon currently comes in two flavors, Office Raccoon and Code Raccoon, which, as the names suggest, target office and programming scenarios, respectively.


With Office Raccoon, handling spreadsheets, documents, and even code files boils down to "drop the file in, ask a question".

Take procurement as an example: first upload supplier lists from different sources, then tell Office Raccoon:

Units, Unit Price, Remarks. Because the header information in different sheets is not consistent, you can merge similar header contents. Show the table results in the dialog box and generate a local download link, thank you.

Wait a few moments and the processed result appears.

In the left sidebar, Office Raccoon also shows the Python code behind the analysis, so every step is traceable.
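The article doesn't reproduce the generated code, but the header-merging step it describes can be sketched in pandas. The sheet contents and the synonym map below are invented for illustration:

```python
import pandas as pd

# Two supplier sheets whose headers disagree (invented example data)
sheet_a = pd.DataFrame({"Supplier": ["Acme"], "Unit Price": [10.0], "Units": ["pcs"]})
sheet_b = pd.DataFrame({"Vendor": ["Bolt Co"], "Price": [12.5], "Unit": ["pcs"]})

# Map synonymous headers onto one canonical set, then concatenate the rows
canonical = {"Vendor": "Supplier", "Price": "Unit Price", "Unit": "Units"}
merged = pd.concat([sheet_a, sheet_b.rename(columns=canonical)], ignore_index=True)

# The "local download link" in the demo would serve a file like this one
merged.to_csv("suppliers_merged.csv", index=False)
print(merged)
```

Running the analysis through code like this, rather than generating the table as free text, is what makes the result reproducible and auditable.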

We can also upload multiple documents such as inventory information and purchasing requirements at the same time:


Follow up with further requests, and Office Raccoon still completes them quickly.

Even when the data format is irregular, it spots the problem and fixes it on its own:


Numeric computation, of course, is no problem either: just ask.

Office Raccoon can also build visualizations from data files, even producing tricky heat maps directly:


In summary, Office Raccoon can process multiple files of different types (Excel, CSV, JSON, and so on) and is very strong at Chinese comprehension, mathematical computation, and data visualization. By working through a code interpreter, it also makes the model's output more accurate and controllable.

At the launch event, Office Raccoon also demonstrated analysis over complex databases.

Last week, Zhou Guanyu, China's first F1 driver, raced in the F1 Chinese Grand Prix. On stage, SenseTime fed Office Raccoon a database file holding a huge amount of data and asked it to analyze Zhou Guanyu and F1 race statistics on the spot.

For example: tallying Zhou Guanyu's race record, counting how many drivers have competed in F1 in total, and listing championship winners ranked by number of wins. These queries span large, complex tables with many dimensions (laps, podium finishes, and so on), and Office Raccoon returned completely correct answers.

On the programming side, Code Raccoon takes programmer productivity straight to "Pro Max".

Just install the extension in VS Code:


From then on, each stage of programming becomes a matter of typing a sentence in natural language.

For example, hand Code Raccoon the requirements document and say:

Help me write a detailed PRD document for WeChat QR code payment on the public cloud. Please follow the requirements of the "Product Requirements Document PRD Template" for PRD format and content, and the generated content is clear, complete and detailed.

Code Raccoon then begins the requirements analysis:


Code Raccoon can also draft the architecture design for you:


It can also write code from natural-language requirements, and with one click generate comments, test code, translations between languages, refactorings, or fixes:


Even the final software-testing stage can be handed to Code Raccoon~


All told, Code Raccoon takes over the repetitive, tedious parts of everyday programming.

Beyond the software itself, SenseTime also "packaged" Code Raccoon into a lightweight all-in-one appliance.

A single appliance supports a 100-person development team, at a cost of just 4.5 yuan per person per day.


The above is the main content of SenseTime's release.

Finally, let's step back and consider one bigger question.

SenseTime's path in large models

Looking across the entire launch event, the most immediate impression is comprehensiveness.

From the on-device model to the foundational SenseNova 5.0, this was a full-stack release and upgrade across cloud, edge, and device, covering nearly every mainstream AIGC category: language, knowledge, reasoning, math, code, and multimodality.

The second impression is competitiveness: it can go toe to toe.

Take SenseNova 5.0's overall strength: among all domestic large-model players, only a handful can credibly claim a comprehensive benchmark against GPT-4.

And the third is speed.

SenseTime's speed is not just the runtime speed of its on-device model; it is also, at the macro level, the pace of its own iteration. Stretch out the timeline and this speed becomes especially striking:

  • SenseNova 1.0 → 2.0: 3 months
  • SenseNova 2.0 → 4.0: 6 months
  • SenseNova 4.0 → 5.0: 3 months

That averages out to a major version upgrade almost every quarter, each with a substantial jump in overall capability.

So the next question is, why can SenseTime do this?

First, at the strategic level, there is the "large model + large device" approach SenseTime has always emphasized.

The large model refers to the SenseNova model system, which provides natural language processing, image generation, automated data annotation, custom model training, and other models and capabilities.

The large device refers to SenseTime's efficient, low-cost, large-scale next-generation AI infrastructure, built around the development, generation, and application of large AI models, with total compute of up to 12,000 petaFLOPS across more than 45,000 GPUs.

What the two share is foresight: neither is a product of the AIGC boom; both are forward-looking bets that date back several years.

Second, at the model level, SenseTime has developed its own reading of the industry's agreed-upon scaling laws, grounded in its own testing and practice.

Scaling laws say, roughly, that as data volume, parameter count, and training time grow, model performance keeps improving; brute force works miracles.

The law carries two hidden assumptions:

  • Predictability: performance can be accurately predicted across scales spanning 5–7 orders of magnitude
  • Order preservation: a performance advantage verified at small scale still holds at larger scale

Scaling laws can therefore guide the choice of model architecture and data recipe under limited R&D resources, letting the model learn efficiently.
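The article doesn't give SenseTime's own formulation, but a widely published form of the scaling law (the Chinchilla-style loss fit, quoted here as a reference point rather than SenseTime's equation) makes the two assumptions concrete:

```latex
L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
```

Here L is the training loss, N the parameter count, D the number of training tokens, and E, A, B, α, β are constants fitted on small-scale runs. "Predictability" means this fit keeps extrapolating accurately as N and D grow by orders of magnitude; "order preservation" means that if one architecture or data recipe fits a lower curve at small scale, it stays lower at large scale, which is what lets small cheap experiments pick the recipe for the big run.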

It is from this observation and practice that the "small but capable" on-device model was born.


SenseTime also has its own take on large-model capability as a three-tier architecture: KRE (Knowledge, Reasoning, Execution).


Xu Li gave an in-depth interpretation of this.

The first tier is Knowledge: the comprehensive infusion of world knowledge.

Today's large-model productivity tools solve problems almost entirely on this basis: they answer your questions using solutions to problems that others have already solved.

That is the basic skill. More advanced is new knowledge derived by reasoning on top of that foundation, which is the second tier of the architecture: Reasoning, a qualitative leap in rational thinking.

This tier is the core determinant of whether a model is truly smart, whether it can generalize from one case to the next.

On top sits Execution: interactively transforming the content of the world, that is, how the model acts on the real world (for now, embodied intelligence is the latent player at this tier).

The three tiers are distinct yet tightly coupled, and Xu Li offered a vivid analogy:

Knowledge-to-reasoning is like the cerebrum; reasoning-to-execution is like the cerebellum.

In SenseTime's view, this three-tier architecture describes what a large model should be capable of, and it is what guides SenseTime's construction of high-quality data.

So the last question: following KRE and the "large model + large device" route, how widely has the latest SenseNova been adopted in industry?

As the saying goes, practice is the sole criterion of truth; customer feedback may be the most honest answer.

On stage, Huawei, WPS, Xiaomi, China Literature, and Haitong Securities, spanning office software, entertainment, finance, and devices, shared the cost savings and efficiency gains the SenseNova model system has brought to their businesses.

All in all, with the technology, compute, methodology, and scenarios in place, SenseTime's next moves in the AIGC era are worth watching.

— END —

QbitAI, signed contributor on Toutiao
