Jin Lei, reporting from WAIC

QbitAI | Official account: QbitAI

SenseTime's SenseNova large model system, whose Chinese name means "new every day", really does live up to its name.

Case in point: just three months after its launch, SenseTime has officially announced:

The SenseNova large model system has been fully upgraded.

So what has improved in the upgraded version?

Without further ado, let's go straight to the results.

For example, the new 100-billion-parameter large language model SenseChat 2.0 now outperforms ChatGPT on three authoritative global benchmarks:

SenseTime's new big model is upgraded in 90 days! The CEO is active on the spot, focusing on breaking through imagination

△ Scores of major language models on the MMLU, AGIEval and C-Eval benchmarks

The other major AIGC platforms built on SenseNova have also been upgraded in one go:

SenseMirage 3.0 ("Miaohua"): parameter count raised to 7 billion, rendering image detail at a professional photography level.
SenseAvatar 2.0 ("Ruying"): voice and lip-sync fluency improved by more than 30%, with 4K HD video output.
SenseSpace 2.0 ("Qiongyu"): space reconstruction efficiency up 20%, rendering performance up 50%.
SenseThings 2.0 ("Gewu"): rendering accuracy greatly improved, with item textures and materials restored at millimeter-level fineness.

On stage, SenseTime CEO Xu Li also had some fun with SenseMirage 3.0:

Didn't make it to CVPR 2023 to accept the award (SenseTime won this year's best paper)? Then just let SenseMirage generate the picture.

Can't play guitar or paint? No problem, just keep generating:

You have to admit, that is one way to have fun.

So how do the rest of the upgraded products actually perform? Let's read on.

More than just surpassing ChatGPT

1. Let Lao Tzu talk to Confucius

Overall, SenseChat's core capabilities have improved substantially in version 2.0, as the ChatGPT-beating benchmark results above make plain.

On the model lineup, alongside SenseChat XL, SenseTime has also launched SenseChat S, a small-model version.

On stage, Xu Li used the two models to demonstrate a "dialogue between Confucius and Lao Tzu":

On languages, it has added Arabic and Cantonese, in addition to supporting Simplified Chinese, Traditional Chinese, English and other languages.

Here is the same dialogue between Confucius and Lao Tzu in Cantonese:

SenseChat 2.0 also pushes past the input-length limits typical of large language models.

For example, ask it to summarize a very long English text in Chinese, and SenseChat 2.0 carries out this complex task right away.

What's more, users can continue with a multi-turn conversation on that basis:

Finally, SenseChat 2.0 also introduces a "plugin", knowledge base mounting:

Knowledge can be fused into generation quickly, without training the model; paired with an enterprise knowledge base, it can quickly answer questions in the relevant domain.
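The article does not say how knowledge base mounting works internally. One common way to add knowledge without retraining is to retrieve relevant passages and place them in the prompt; the sketch below illustrates that general pattern only. The function names, the word-overlap scoring and the sample knowledge base are all hypothetical, and none of it is SenseTime's API.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(query, passage):
    """Count tokens shared by query and passage (a deliberately crude relevance measure)."""
    q = Counter(tokenize(query))
    p = Counter(tokenize(passage))
    return sum(min(q[t], p[t]) for t in q)


def retrieve(query, knowledge_base, top_k=1):
    """Return the top_k passages with the highest overlap score."""
    ranked = sorted(knowledge_base, key=lambda passage: score(query, passage), reverse=True)
    return ranked[:top_k]


def build_prompt(query, knowledge_base, top_k=1):
    """Assemble a prompt that carries the retrieved passages; no model weights change."""
    context = "\n".join(f"- {p}" for p in retrieve(query, knowledge_base, top_k))
    return (
        "Answer using only the reference material below.\n"
        f"Reference material:\n{context}\n"
        f"Question: {query}"
    )


# Hypothetical enterprise knowledge base, for illustration only.
KNOWLEDGE_BASE = [
    "Refund requests must be filed within 30 days of purchase.",
    "The support hotline is open from 9:00 to 18:00 on weekdays.",
    "Enterprise accounts include a dedicated account manager.",
]

prompt = build_prompt("When is the support hotline open?", KNOWLEDGE_BASE)
print(prompt)
```

The resulting prompt would then be sent to whichever language model is in use. A production system would normally replace the word-overlap score with embedding-based search, but the division of labor stays the same: the knowledge lives outside the model and is supplied at query time.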

2. Photography-grade image generation, with prompts filled in automatically

For text-to-image generation, SenseTime's SenseMirage ("Miaohua") has been upgraded to version 3.0, and the main theme of the functional upgrade is "breaking through the limits of imagination".

On the lightweight side, for example, anyone can now fine-tune a model in 10 minutes by drag and drop and customize their own generative AI.

Xu Li's on-stage stunt mentioned above is a good example.

On the "intelligence" side, writing prompts for AI painting is no longer a chore, because prompts can now be expanded automatically.

In the past, to have the AI generate a dragon, we might enter "Chinese dragon", "blue ornaments" and "jewelry style", but prompts this simple often failed to produce particularly fine results.

Now, just enter these three phrases, and SenseMirage 3.0 will automatically expand the prompt and then generate far more striking work:

Here is the result for the prompt "plastic bag in the sun" after automatic expansion:

Finally, on output quality, images generated by SenseMirage 3.0 now reach film-level quality, whether in the layering of the picture or in details and elements.

Beyond what each model does alone, the multimodal large model formed by combining SenseChat 2.0 and SenseMirage 3.0 also enables a new use case: understanding trending events.

"Feed" it a picture or video during the conversation and it can describe what it sees, covering the content of the material in more depth and with more accuracy.

3. Master Yancan also "showed up"

For digital humans, SenseAvatar 2.0 ("Ruying") is another focus of this major upgrade of SenseTime's products.

At the event, SenseTime showed digital humans of well-known figures such as Master Yancan; in voice (accent included) and expression alike, they were remarkably lifelike.

Also appearing were host Zhang Quanling, economist Ren Zeping, Professor Ji Weidong of Shanghai Jiao Tong University and others, making it something of an "all-star show".

On languages, the accuracy of digital humans created by SenseAvatar 2.0 in English, Japanese, Spanish, Arabic and other languages has improved by more than 30%, and lip movements match the voice more naturally.

On output quality, SenseAvatar 2.0 supports 4K HD video, giving the finished video a more polished look.

The SenseAvatar upgrade also brings a new capability:

Users can now enter a prompt to automatically generate a digital human that matches the description!

And these digital humans can sing, too.

This greatly lowers the barrier to producing content such as virtual influencers and digital-human short videos.

4. 3D reconstruction that holds up at 10,000 square meters or at 1 mm

Finally, in 3D reconstruction and digital twins, SenseTime has also delivered major upgrades: SenseSpace 2.0 ("Qiongyu") and SenseThings 2.0 ("Gewu").

Let's take a look at this magnificent scene:

If no one told you this was a SenseSpace 2.0 3D reconstruction, you might well take it for aerial footage.

SenseSpace 2.0 reportedly achieves centimeter-level 3D reconstruction accuracy: 5 cm over 10,000 square meters outdoors and 1 cm over 1,000 square meters indoors.

Reconstruction efficiency is also up 20% and rendering performance up 50%; mapping a 100-square-kilometer scene takes only 38 hours (with 1,200 TFLOPS of computing power).
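To put the 38-hour figure in perspective, the arithmetic below converts the quoted numbers into a mapping rate and a compute budget. It is a back-of-the-envelope sketch that assumes the hardware sustains its full quoted rate for the whole job, which the article does not claim, so the compute totals are upper bounds.

```python
# Figures quoted in the article.
AREA_KM2 = 100       # scene size in square kilometers
HOURS = 38           # reported mapping time
PEAK_TFLOPS = 1200   # reported computing power

# Assumption (not from the article): the hardware runs at the full quoted
# rate for all 38 hours, so the totals below are upper bounds.
SECONDS_PER_HOUR = 3600

throughput_km2_per_hour = AREA_KM2 / HOURS
max_total_flop = PEAK_TFLOPS * 1e12 * HOURS * SECONDS_PER_HOUR
max_flop_per_km2 = max_total_flop / AREA_KM2

print(f"{throughput_km2_per_hour:.2f} km^2 mapped per hour")
print(f"{max_total_flop:.3e} floating-point operations at most")
print(f"{max_flop_per_km2:.3e} floating-point operations per km^2 at most")
```

Under that assumption, the system maps roughly 2.6 square kilometers per hour.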

Beyond macro-scale scenes, 3D reconstruction at the micro level has also improved greatly with SenseThings ("Gewu") reaching version 2.0.

Textures and materials of objects are restored at millimeter-level fineness, giving a clearer and more realistic view of product details.

More importantly, SenseThings 2.0 also overcomes the long-standing difficulty of capturing highly reflective and specular objects, and can accurately reproduce a product's appearance and features without applying stickers or markers.

It is not difficult to see that after entering the 2.0 era, the capabilities of various AIGC platforms have been significantly improved.

So the next question is:

How do you pull off a major upgrade in 3 months?

Three months ago, looking at the new era of AGI, SenseTime already offered a new formula built around the three elements of data, algorithms and computing power:

Compute (number of GPUs × running time × parallel efficiency) = number of model parameters × amount of data processed.
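The relation is easiest to use in relative terms. The sketch below treats it as a proportionality, which is an assumption, since the article gives no units or constants, and asks how running time must change when the other quantities change. The function name and the scenario are hypothetical.

```python
def runtime_factor(param_factor, data_factor, gpu_factor=1.0, efficiency_factor=1.0):
    """Relative change in running time implied by
    GPUs x running time x parallel efficiency ~ parameters x data.

    Every argument is a ratio of new value to old value, so no units or
    constants are needed. Treating the article's equation as a
    proportionality is an assumption of this sketch.
    """
    if gpu_factor <= 0 or efficiency_factor <= 0:
        raise ValueError("gpu_factor and efficiency_factor must be positive")
    return (param_factor * data_factor) / (gpu_factor * efficiency_factor)


# Hypothetical scenario: 10x the parameters on the same data, with 4x the
# GPUs, where parallel efficiency falls to 90% of its previous value.
factor = runtime_factor(param_factor=10, data_factor=1, gpu_factor=4, efficiency_factor=0.9)
print(f"Running time changes by a factor of {factor:.2f}")
```

Read this way, the formula makes the trade-off explicit: a larger model or more data has to be paid for with more GPUs, longer runs, or better parallel efficiency.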

Let's start with the two factors on the right side of the equation:

The parameter count must be large enough for intelligence to emerge, which in turn sharply increases the demand for computing power and calls for higher parallel efficiency to support training large-parameter models.

On data, high-quality natural language data is becoming scarce, while visual data has advantages over natural language in quantity, quality and information content, which can help AI understand the world better.

Together, these two factors determine the compute on the left side of the equation. To handle both sides, SenseTime has its own approach:

Large model + large device.

First, on large models: although SenseNova was released only a few months ago, it is not a product of the current AIGC craze.

SenseTime began working in this area five years ago. In 2019 it used thousands of GPUs for single-task training and launched a vision model with 1 billion parameters, whose results were the best in the industry at the time.

Later, between 2021 and 2022, SenseTime also trained and open-sourced the 3 billion parameter multimodal large model "Shusheng".

So SenseTime's ability to launch a model at the hundred-billion-parameter scale and then iterate on it so quickly can be seen as one "big job" built up from many "small jobs" accumulated over a long period.

Second, on the large device, that is, SenseTime's approach to large-scale computing power: as with its large models, this has been in the works for a long time.

In January 2022, SenseTime delivered its Artificial Intelligence Computing Center (AIDC), built with an initial investment of 5.6 billion yuan, which became one of the largest AI computing centers in Asia on its debut.

A year ago, its computing power already stood at 3,740 petaflops, enough to handle large models with a trillion parameters; just one year later, the figure had risen to 5,000 petaflops.

What does that mean in practice?

For example, backed by this large device, SenseTime can run single-task training on a cluster of up to 4,000 GPUs and sustain stable, uninterrupted training for more than seven days.

……

In short, with big data, big computing power and large models all in place, it is not hard to see how SenseTime completed a version iteration within three months.

That said, this is only one side of "SenseTime speed".

Real-world deployment of large models is accelerating too

The other side of "SenseTime speed" shows up in real-world deployment.

If you think the SenseNova upgrade is merely something "proposed" and "announced", you are quite wrong: it is already "on the job".

For example, combining the capabilities of SenseChat 2.0 and SenseMirage 3.0, SenseTime offers customers a range of interactive solutions on mobile devices.

Q&A for information retrieval, knowledge interaction for everyday scenarios, and content generation from language and images can all be deployed easily on mobile devices, because SenseTime's large model comes in a lightweight version.

SenseSpace 2.0 ("Qiongyu"), for its part, has built digital twins of real-world sites such as the regional development of Mashan Town in Jinan, Hefei China Vision Park and Shanghai Ruijin Hospital, greatly improving operational efficiency.

SenseTime also brings intelligent solutions such as long-tail fault identification and complex defect judgment to power grid inspections through its large model capabilities.

Moreover, SenseTime's upgraded products do not work in isolation; they achieve more by joining forces.

The trending-event understanding mentioned above is one example. In addition, SenseAuto ("Jueying"), SenseTime's smart-car platform, integrates several of the company's signature capabilities.

These include a multimodal large model (multimodal perception), a large language model (cabin brain), knowledge fusion (exclusive memory module), an AIGC large model (custom digital humans) and so on.

As Xu Li said at the event:

"The breakthrough in large models has set off a new technological revolution in artificial intelligence, followed by explosive growth in industry demand, with new application scenarios and application models emerging rapidly.

SenseTime hopes to keep advancing and improving AI infrastructure capabilities through 'large model + large device': not only building foundation models with stronger general capabilities, but also efficiently integrating expertise from different vertical fields to build specialized large models that understand their industries better, fundamentally lowering the cost and threshold of downstream applications, and letting the industrial value of large models blossom across thousands of industries."

All in all, the current large-model race is a contest not only of technical iteration speed but also of how fast models reach real-world application.

— End —

QbitAI · Signed Toutiao account
