GPT-4 Turbo-level domestic large model debuts, and its analysis of Zhou Guanyu's F1 race data stuns the experts

Author: New Zhiyuan

Editor: Editorial Department

The recent upgrade to the Ririxin SenseChat 5.0 large model has shocked foreign tech circles. In our hands-on testing, we found that its reasoning and mathematical abilities have leapt forward yet again. Zhou Guanyu's race record over the past three years, F1 trivia you didn't know, and an AI-queried database: it lays everything out for you.

A large model from China has shocked foreign tech circles.

Over the past few days, discussion of the model's update has had foreign netizens exclaiming: it's crazy, how many big changes in China's AI industry do we not even know about?

Don't blame these netizens for overreacting: the recently upgraded Ririxin SenseChat 5.0 (SenseChat V5) has again received a major update to its base capabilities, pushing the model to a new stage, the kind of improvement you can feel intuitively.

To put it simply, this 600-billion-parameter MoE model, with its strong logical reasoning, can easily make you a more effective worker.

Productivity tool No. 1: Office Raccoon

So, after all this talk, what kind of experience do products powered by Ririxin 5.0 actually deliver?

First, let's look at Office Raccoon, which most directly addresses office workers' pain points.

As the name suggests, it focuses on office capabilities.

Experience address: https://raccoon.sensetime.com/office

As everyone knows, real office work is full of extremely complex spreadsheets, the kind that make even humans dizzy at a glance.

Worse, many materials exist only in foreign languages, making them even harder to read.

Can Office Raccoon handle it?

The F1 Chinese Grand Prix had wrapped up just two days earlier, and SenseTime, a technical partner of the Sauber team, provided some of the data.

We took the opportunity to go straight for a hard case: import an all-English table with 600,000 rows covering all kinds of historical F1 data, and have the model analyze it.

It's no exaggeration to say that this test is very difficult!

The dataset is huge, and besides being entirely in English, it is riddled with complications such as abbreviations and underscore- or hyphen-joined names.

For example, "Zhou Guanyu" appears as "guanyu-zhou" (not even "guanyu zhou"), so the ambiguity is considerable.

Analyzing such data is not easy for any model.

We were looking forward to the challenge.

Incidentally, SenseTime has been the team's technical partner for three consecutive years, since Zhou Guanyu's F1 debut in 2022.

Next came the real test. We gave Office Raccoon a task:

Plot a histogram of the number of races Zhou Guanyu took part in between 2020 and 2024.

Sure enough, on its first attempt Office Raccoon failed to match "Zhou Guanyu" to the English name "guanyu-zhou" in the table.

It therefore assumed the table contained no information about Zhou Guanyu.

The next step was to drop it a hint.

In the next turn, we told it: "It's definitely in there, look again."

With this step-by-step guidance and interaction, the model learned to reflect and then completed the task successfully.

Office Raccoon worked through the data analysis of the given task and produced the corresponding Python code.

This interaction also teaches us something: if the table's fields don't match or are ambiguous and the model performs poorly at first, don't give up. Through interaction, the model may well surprise you with a very different data-analysis experience.
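
The slug-matching step described above can be sketched in plain Python. Everything here is illustrative: the tiny `races` list, the column names, and the similarity cutoff are assumptions for the sketch, not SenseTime's actual data or code.

```python
import difflib
from collections import Counter

# Toy rows mimicking the table's slug-style driver names (illustrative only)
races = [
    {"driver": "guanyu-zhou", "year": 2022},
    {"driver": "guanyu-zhou", "year": 2022},
    {"driver": "guanyu-zhou", "year": 2023},
    {"driver": "lewis-hamilton", "year": 2023},
]

def match_driver(query, names, cutoff=0.6):
    """Match a display name like 'Zhou Guanyu' to a slug like 'guanyu-zhou'.

    Tokens are sorted before comparison so word order does not matter.
    """
    key = "-".join(sorted(query.lower().split()))
    best, best_score = None, cutoff
    for name in names:
        score = difflib.SequenceMatcher(
            None, key, "-".join(sorted(name.split("-")))
        ).ratio()
        if score > best_score:
            best, best_score = name, score
    return best

slug = match_driver("Zhou Guanyu", {r["driver"] for r in races})
# Count races per year for the matched driver, as a histogram would need
per_year = Counter(r["year"] for r in races if r["driver"] == slug)
print(slug, dict(per_year))
```

Sorting the name tokens before comparing is what makes "Zhou Guanyu" land on "guanyu-zhou" despite the reversed word order.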

Here's a harder task: we imported all the drivers, teams, races, tracks, engine manufacturers, and so on from F1 history into a single database file, a huge amount of data.

Then we asked the model: how many drivers have there been in F1?

This is also difficult, because none of the fields are in Chinese.

In the end, Office Raccoon used fuzzy matching to find the answer: 901 drivers, which is completely correct.
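
Once the right table and column are identified, the question itself reduces to a distinct count. A minimal sketch with an in-memory SQLite database; the `driverRef` column name loosely follows the public Ergast F1 schema and is an assumption here, and the three rows are stand-ins for the real 901.

```python
import sqlite3

# In-memory stand-in for the uploaded database file (illustrative only)
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE drivers (driverRef TEXT PRIMARY KEY, forename TEXT, surname TEXT)"
)
con.executemany("INSERT INTO drivers VALUES (?, ?, ?)", [
    ("guanyu-zhou", "Guanyu", "Zhou"),
    ("hamilton", "Lewis", "Hamilton"),
    ("michael_schumacher", "Michael", "Schumacher"),
])

# "How many drivers are there in F1?" becomes a distinct count over the table
(n_drivers,) = con.execute(
    "SELECT COUNT(DISTINCT driverRef) FROM drivers"
).fetchone()
print(n_drivers)
```

The hard part the model solved is not the query but mapping the user's plain-language question onto the right English field names.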

Among large-model products, Office Raccoon's performance here is top of the class.

Along the way, the model iterated interactively, querying different table headers several times before finally returning information we could understand.

We then changed the question: "Which drivers have won the championship?" and asked for a histogram sorted from most titles to fewest.

The model sorted it out: the drivers with the most championships are Hamilton and Schumacher.
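
That ranking boils down to counting titles per driver and sorting. The list below is hand-built for illustration (the seven-apiece tallies for Hamilton and Schumacher match the public record, but this is not the article's dataset):

```python
from collections import Counter

# One entry per championship season won (hand-built, illustrative)
champions = (
    ["lewis-hamilton"] * 7
    + ["michael_schumacher"] * 7
    + ["juan-fangio"] * 5
    + ["sebastian-vettel"] * 4
)

# Sort drivers from most to fewest titles, as the histogram requires
ranking = Counter(champions).most_common()
print(ranking[:2])
```

`Counter.most_common()` already returns the descending order a bar chart wants, with Hamilton and Schumacher tied at the top.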

Next, could it compare Hamilton's and Schumacher's honors across different dimensions?

Office Raccoon drew a radar chart clearly showing the two drivers across dimensions such as pole positions, laps, podiums, and wins, with Hamilton still slightly ahead of Schumacher.
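
Behind a radar chart like that sits a simple per-axis normalization, so that wins and podiums become comparable. The career figures below are approximate public numbers used only for illustration, and the axis set is an assumption:

```python
# Approximate career stats (rounded public figures, illustrative only)
stats = {
    "hamilton": {"wins": 103, "poles": 104, "podiums": 197},
    "schumacher": {"wins": 91, "poles": 68, "podiums": 155},
}

def normalize(stats):
    """Scale every axis to [0, 1] by its maximum, so radar axes are comparable."""
    axes = next(iter(stats.values())).keys()
    out = {driver: {} for driver in stats}
    for axis in axes:
        top = max(s[axis] for s in stats.values())
        for driver, s in stats.items():
            out[driver][axis] = s[axis] / top
    return out

norm = normalize(stats)
print(norm["hamilton"]["wins"], round(norm["schumacher"]["wins"], 3))
```

Feed `norm` to any plotting library's polar axes and you get the radar chart; the normalized values already show Hamilton slightly ahead on every axis here.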

In this real data-application scenario, complex tables were linked together through interaction, and the reasoning ability Ririxin 5.0 displayed is genuinely impressive.

Next, another equally tricky example: procurement.

After uploading the "2024 New Supplier Related Information" document, we asked for it to be consolidated into a table whose headers list the supplier category, supplier name, product name...

Office Raccoon immediately produced a complete, clear summary table.

It can even generate a bar chart, visually presenting IT, fixed-asset, marketing, and administrative expenses.

It can also generate other charts, such as heat maps.

We can also upload multiple documents at once and have Office Raccoon carry on with the requested tasks.

It first shows the code it will run, then generates a data table of the different purchase categories, clear at a glance.
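
The consolidation step is essentially a group-by-sum over rows extracted from the uploaded documents. A minimal sketch; the categories, items, and amounts below are made up:

```python
from collections import defaultdict

# Rows as they might be extracted from several supplier documents (made up)
rows = [
    ("IT", "Server lease", 1200.0),
    ("Marketing", "Ad placement", 800.0),
    ("IT", "Laptops", 300.0),
    ("Administrative", "Office supplies", 450.0),
]

# Group-by-sum into one total per purchase category
totals = defaultdict(float)
for category, _item, amount in rows:
    totals[category] += amount

# Print categories from largest to smallest spend, like the summary table
for category, amount in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{category:15s}{amount:10.1f}")
```

The same `totals` dict is also exactly what a bar chart or heat map of expenses would be drawn from.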

After this round of testing, the editor's verdict is: having such an efficient data-analysis and summarization tool is a real blessing for every office worker.

And, it's free!

Productivity tool No. 2: the document model

Another product that clearly reflects Ririxin 5.0's capabilities is the SenseChat document model.

Beyond tabular data analysis, the model's long-text processing is also said to be first-rate.

So let's raise the difficulty: throw it a stack of math exam papers and ask it to find a problem on solving a linear equation in one variable.

It quickly found the matching question type in part five of the "Primary School Mathematics Test Paper", and even walked through the solution fluently.

We can also ask it to write a similar question, but in multiple-choice form.

It gives not only the stem but also the correct answer and the solution steps.

Another example: upload a primary-school test paper and have the document model explain one of the word problems at a primary schooler's level of comprehension.

Like a patient teacher guiding a student, it analyzes the solution meticulously, step 1, step 2, step 3, and then gives the answer.

Who doesn't love such an AI teacher?

The document model can also act as a "question generator", producing similar problems to exercise your ability to reason by analogy.

You can also give it your own worked answers and have it grade them.

Obviously, 8.4 ÷ 0.4 = 2.1 is incorrect, and the correct answer should be 21.
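
That grading check is just exact arithmetic. A one-liner with the standard-library `fractions` module (our illustration, not the model's method) settles it without any float-rounding doubt:

```python
from fractions import Fraction

# Grade the student's answer to 8.4 ÷ 0.4 using exact rational arithmetic
correct = Fraction("8.4") / Fraction("0.4")   # 42/5 ÷ 2/5 = 21 exactly
student_answer = Fraction("2.1")

print(correct, correct == student_answer)
```

Using `Fraction` instead of floats means the comparison is exact: the quotient is the integer 21, and 2.1 is off by a factor of ten.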

With a single document, you can ask unlimited questions.

The document model not only accurately identifies the question you mean, but also answers it carefully.

Upload copies of Three Hundred Tang Poems and Three Hundred Song Lyrics, and we can ask questions grounded in those documents.

For example, look for poems that describe the moon.

It quickly found works such as "Quiet Night Thoughts", "Looking at the Moon and Longing for One Far Away", and "Water Melody: Mid-Autumn of the Bingchen Year".

We can also ask a loftier question: how do the connotations of the moon compare between Tang poetry and Song lyrics?

It replied that the similarities lie in emotional sustenance and in the moon as a symbol of passing time and of beauty, while the differences lie in mode of expression, emotional depth, and cultural background.

If you ask this editor, who grinds away from early morning, what phrase he most loves to hear, the answer is:

10w+!

So what's the formula behind a 10w+ (100,000-view) article? Let the document model analyze it.

The following are five viral 10w+ public-account articles (yes, just look at the titles).

We threw them all at the document model at once. First, it can summarize each article for us.

Of the countless articles on the internet, why did these become hits?

The document model's analysis concluded: real stories close to everyday life let readers see their own reflection and form a strong emotional connection.

Mining the emotional experiences humans share, and offering fresh perspectives of observation, gives an article high reflective value.

So, based on this experience, how do we concoct a similar hit? The document model offers the following ideas:

The new normal of parent-child relationships under the pandemic, working mothers in the remote-work era, digital disconnection, old money versus new money, career transitions in the age of artificial intelligence...

These topics all sound eye-catching; I can't help wanting to read them! Next step: bang out a few thousand words, land a 10w+ article, and stride toward the peak of life.

The document model's powerful text analysis can even supply ideas for literature, history, and philosophy students writing serious papers.

For example, what are the similarities and differences between the Analects and the Tao Te Ching on "virtue"?

After chewing through the 29-page, 21,638-word Analects and the 14-page, 7,302-word Tao Te Ching, the document model concluded:

Both attach great importance to the role of "virtue" in personal cultivation and social governance; the difference is that "virtue" in the Analects is tied more closely to the individual, while the Tao Te Ching also involves following nature and governing through non-action.

For deeper digging, it also lists reference articles and books, the classics of the field.

Even better: what insight can be gained by integrating the ideas of the two texts? The document model suggests starting from a philosophy of harmonious, symbiotic life, or from the unity of inner cultivation and outward conduct.

Dig deeper along these lines and you might well produce an academic paper with a genuinely original angle.

A wave of benchmarks is coming

Of course, beyond office work, Ririxin 5.0 isn't afraid of tricky tests either.

First, a freshly snapped photo of the Xiaomi SU7.

Because it is a casual shot, the car itself occupies only a small part of the frame.

Yet powered by Ririxin 5.0, the model easily identified the car and attached a detailed, very professional introduction.

In contrast, other models simply failed.

They either identified the wrong car, or missed the car entirely and recognized only the photo's watermark.

Next up: Ririxin 5.0 takes on "Ruozhiba"-style trick questions.

"How do you divide four oranges equally among four children with just one cut?"

To be fair with just one cut, SenseChat suggested arranging the four oranges in a row; one stroke of the knife across them, and each child still ends up with one orange apiece.

That's a clever trick!

Next, there is a very "serious" reasoning question.

"A hunter walked a mile south, another mile east, then a mile north, and finally returned to the starting point. He saw a bear and shot it. What color is this bear?"

In SenseChat's telling, this question is actually a geography riddle.

Only at the pole could the hunter make such a winding journey and still return to his starting point.

In other words, this bear must be a polar bear.

Five model iterations, fully benchmarked against GPT-4 Turbo

After this round of testing, you should have a general sense of the upgraded Ririxin 5.0's capabilities.

The chart below is a side-by-side comparison of industry models.

Note one highlight in the chart: recent iterations of industry models have not improved dramatically in pure knowledge-based ability, but they have improved greatly in higher-order reasoning, especially mathematics.

For example, the jump from GPT-3.5 to GPT-4 was as much as 100%, and from Llama 2 to Llama 3 as much as 400%.

That is because most recent data-quality improvements have been built around reasoning, in particular reasoning over synthetic data.

Especially for landing real-world applications, higher-order reasoning has become a key indicator of progress in industry large models.

Ririxin 5.0 matches or even surpasses GPT-4 Turbo on most core benchmark metrics

Returning to the evaluations, it is easy to see that Ririxin 5.0 has clearly improved across language, knowledge, reasoning, mathematics, code, and other abilities.

And in mainstream objective evaluations, it has reached or even surpassed the level of GPT-4 Turbo!

As noted above, Ririxin 5.0's strength rests on the SenseTime team's continuous optimization of model architecture and data recipe.

From Ririxin 1.0 through 2.0, 3.0, 4.0, and today's 5.0, the core of every major version iteration has been an upgrade of the data.

Over the past year, SenseTime has spent a great deal of time optimizing corpus quality and building a complete data-cleaning pipeline.

For version 5.0, they focused on whether the dataset contained rich logic.

Performance was improved by giving higher weight to corpora with high information density and strong logical structure, and by cleaning the overall corpus to high quality.

Specifically, at the knowledge level, SenseTime used more than 10T tokens to ground the LLM's basic understanding of objective knowledge and the world.

In addition, SenseTime synthesized hundreds of billions of tokens of chain-of-thought data, which became the key to lifting Ririxin 5.0's performance to benchmark against GPT-4 Turbo.

Internally, the synthetic-data method has gone through two iterations: from initially using GPT-4 to synthesize data, to having intermediate versions of its own model synthesize data and then train on it.

Today, 90% of SenseTime's synthetic data is generated by its own models, with the remaining 10% generated by the world's top LLMs.

The result is hundreds of billions of tokens of very high-quality synthetic data.

In the past few days, Altman said in a closed-door speech at Stanford, "Scaling Law is still valid, GPT-5 is more powerful than GPT-4, GPT-6 is far superior to GPT-5, and we have not yet reached the top of this curve."

In other words, the room for large models' next stage of development is all but limitless.

I'm really looking forward to the birth of Ririxin 6.0.

Resources:

https://chat.sensetime.com/
