SenseTime's CVPR "Best Paper" goes into production cars! End-to-end intelligent driving in a few thousand lines of code

Author: Smart Car Reference
Jia Haonan, reporting from the Co-Driving Temple
Smart Car Reference | Official account: AI4Auto
"As AI technology evolves, the room for survival of traditional Tier 1 autonomous driving suppliers will keep shrinking."

During the bustling 2024 Beijing Auto Show, Wang Xiaogang, co-founder and chief scientist of SenseTime and president of its Jueying Intelligent Vehicle Business Group, offered this new assessment.

This year, high-end intelligent driving has reached the point of mass adoption, and autonomous driving has entered its "knockout" stage. On features, companies compete over "map-free" driving; on cost, over "thousand-yuan-level" solutions that must also ship as "standard" equipment; on technology, over "end-to-end" and "data-driven" approaches.

Wang Xiaogang believes these hotly discussed concepts did not become popular overnight; they are the inevitable result of the evolution of AI technology.

As early as 2018, SenseTime had begun building up reserves for today's competition.

Phenomenon and experience: SenseTime's new products and technologies at the auto show

The Beijing Auto Show opened today. By Smart Car Reference's rough count, some eighty to ninety models from different brands across the show are equipped with SenseTime's technology or solutions.

At the Shanghai Auto Show exactly a year ago, the number was only thirty to forty.

These technologies and solutions have been mass-produced and delivered to users.

One example is the popular Xiaomi SU7, whose official demo of an interactive scenario sparked heated discussion:

The user pointed at a car ahead and asked the voice assistant what brand and model it was, and the in-car system immediately gave an accurate answer.

Behind this is a set of large models working together. Large language models accurately understand user instructions and give appropriate answers, while multimodal large models correlate video, audio, image, and other data to provide environmental understanding, logical reasoning, and content generation.

On the intelligent driving side, GAC Aion's practicality-focused mid-size SUV, the LX Plus, is equipped with an ADAS that offers highway navigation assistance.

The new Nezha (Neta) S coupe carries SenseTime's full-stack highway pilot intelligent driving capability.

Beyond the mature products already delivered in volume, Jueying also showed more cutting-edge technologies at the Beijing Auto Show that are about to enter mass production and go into vehicles.

For example, Apple's Vision Pro has been popular this year and has shown people the appeal of 3D interaction. Jueying has launched two new 3D cockpit interactions: 3D Gaze and 3D dynamic gesture interaction.

3D Gaze lets users control icons on the central display with their eyes. 3D dynamic gesture interaction is an industry-leading intelligent cockpit technology that recognizes dynamic gestures and fine hand movements, letting users operate cockpit functions with mid-air gestures.

Together, the two functions make the experience almost like a glasses-free "Vision Pro" in the car, and cockpit interaction becomes more intuitive and natural.

At the end of 2022, SenseTime proposed UniAD, the industry's first general-purpose autonomous driving model integrating perception and decision-making. The following year, the paper won the Best Paper award at CVPR 2023.

At the Beijing Auto Show, SenseTime announced that this best paper is going into production vehicles!

In terms of experience, tidal (reversible) lanes are a major challenge for traditional intelligent driving solutions. Once the end-to-end large model has been trained on relevant data, however, it can interpret external cues such as sign text, icons, and changes in traffic flow, and proactively change its route to enter or leave the tidal lane.

Another example is a scene often encountered on rural roads: with a vehicle oncoming, a pedestrian runs out in front of the car.

To stay safe, the UniAD-equipped test vehicle first accelerated to the left to avoid the pedestrian, then quickly steered right to avoid the oncoming vehicle.

Earlier intelligent driving products might also get through such a scene when map information is available, but the success rate is not guaranteed. They rely on a complex set of hand-defined rules with a "passive trigger" mechanism, and when the situation on the road differs even slightly, the system is left helpless.

By learning from data, an AI driver can handle complex environments such as urban areas, and even rural roads without lane markings or traffic signs, using only camera-based visual perception.

Intelligent driving has become ubiquitous this year: highway NOA is now the baseline feature, and it has reached models priced at 150,000 to 200,000 yuan.

But beneath the excitement, several prominent technologists in the industry have recently issued the same warning:

The technical route needs serious reconsideration: how far can the existing rule-based stack actually go?

Behind this is the evolution of intelligent driving algorithms, from modular, rule-driven designs to integrated end-to-end models driven by data.

So more important than how many models carry its technology is that SenseTime's end-to-end model is the first to go into vehicles, which represents the direction of China's smart car industry and the technology paradigm of the future.

Technology: what makes end-to-end "true"?

The UniAD model proposed by SenseTime is the first end-to-end autonomous driving model among Chinese players.

Surprisingly, it is also the furthest along toward mass production in vehicles.

Beyond the experience benefits just described, UniAD has four key points:

Efficient development iteration

"Vision-only, map-free" high-level intelligent driving comes built in

True end-to-end integration of perception and decision-making

Lightweight, at only a few thousand lines of code

Taking these in turn: because it is fully data-driven, the end-to-end model can transfer and generalize the driving skills it has learned to other scenarios and handle new long-tail problems in driving and parking on its own. It also iterates faster, which lowers the cost of launching in each new city and helps carmakers reach nationwide coverage sooner.

The "map-free NOA" that everyone is now competing on, and the vision-only urban NOA that many players are promoting, come naturally to an end-to-end model, because it needs only navigation information to drive the car to its destination.

These "map-free" and "vision-only" capabilities help carmakers cut software and hardware costs. They remove the dependence on high-definition maps, with their low coverage and slow updates, and on redundant sensors such as lidar that are otherwise added, at a cost, to handle corner cases.

More importantly, UniAD's biggest difference is that it comes very close to how a human driver thinks: it actively learns, reasons, and understands complex traffic environments, rather than passively triggering preset responses scenario by scenario.

How?

"End-to-end" is defined in contrast to the traditional paradigm, in which perception, decision-making, and planning and control are independent modules. Sensor data has to pass through this chain of separate algorithm modules before it is finally turned into driving commands.

In such a system, usually only the perception module uses an AI model; the remaining modules are built on hand-written, human-defined rules.

Information is passed from one module to the next, so loss and error are inevitable along the way. An error in one module affects the next, and errors accumulate across modules, degrading the overall performance of the autonomous driving system.
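The accumulation effect can be illustrated with a little arithmetic. The sketch below is not taken from SenseTime; the stage names and reliability figures are hypothetical, and it assumes that errors in different stages are independent, which real systems will not satisfy exactly.

```python
from math import prod

# Hypothetical per-stage reliabilities, chosen only for illustration.
STAGE_RELIABILITY = {
    "perception": 0.99,
    "prediction": 0.97,
    "planning": 0.98,
    "control": 0.99,
}


def chain_reliability(stage_reliability):
    """Chance that every stage in a serial pipeline is right,
    assuming stage errors are independent (a simplification)."""
    return prod(stage_reliability.values())


overall = chain_reliability(STAGE_RELIABILITY)
weakest = min(STAGE_RELIABILITY.values())

print(f"weakest single stage: {weakest:.3f}")
print(f"whole chain:          {overall:.3f}")
```

With these made-up numbers the chain as a whole comes out at roughly 0.93, lower than any single stage, which is the sense in which errors "continue to accumulate" as information is handed from module to module.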

Second, rule-based intelligent driving behaves rigidly and cannot respond flexibly to different road conditions, leaving the product hard to use and users reluctant to trust it.

Wang Xiaogang said that Waymo, Tesla, and SenseTime itself have all tried to optimize and iterate on traditional rule-based intelligent driving, but none could break through the limits of that algorithm framework.

To pass information on without loss from perception onward, a new algorithm paradigm is needed: the end-to-end model.

Many "end-to-end" solutions on the market today build one large model for perception and another for decision-making, because that is easier to implement. But what passes between the two models in such a "two-stage" scheme is still explicit, human-defined information, so loss and error remain. The approach lowers the difficulty, but it also lowers the ceiling on capability.
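To make the "two-stage" limitation concrete, here is a toy sketch under stated assumptions. It is not how any real stack works: the feature names, the 1.0 m threshold, the 2-second horizon, and both planners are hypothetical. It only shows how a hand-defined interface can drop information that a planner with access to the full features would use.

```python
from dataclasses import dataclass


@dataclass
class SceneFeatures:
    """Hypothetical features a perception model might hold internally."""
    pedestrian_offset_m: float       # lateral distance from the ego lane
    pedestrian_lateral_speed: float  # m/s, negative = moving toward the lane


def explicit_interface(features):
    """Two-stage handoff: only a hand-picked field crosses the boundary."""
    return {"pedestrian_offset_m": features.pedestrian_offset_m}


def plan_from_interface(message):
    """Downstream planner that sees only the explicit message."""
    return "brake" if message["pedestrian_offset_m"] < 1.0 else "keep"


def plan_end_to_end(features, horizon_s=2.0):
    """Planner with access to the full features, including motion."""
    predicted_offset = (
        features.pedestrian_offset_m
        + features.pedestrian_lateral_speed * horizon_s
    )
    return "brake" if predicted_offset < 1.0 else "keep"


standing = SceneFeatures(pedestrian_offset_m=3.0, pedestrian_lateral_speed=0.0)
running = SceneFeatures(pedestrian_offset_m=3.0, pedestrian_lateral_speed=-1.5)

for name, scene in [("standing", standing), ("running", running)]:
    two_stage = plan_from_interface(explicit_interface(scene))
    unified = plan_end_to_end(scene)
    print(f"{name}: two-stage={two_stage}, end-to-end={unified}")
```

In this toy case the two-stage planner treats both pedestrians identically, because the speed field never crossed the interface, while the planner with full features brakes only for the one running toward the lane.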

SenseTime's UniAD integrates perception, decision-making, planning, and other modules into a single full-stack Transformer end-to-end model, unifying perception and decision-making. Perception data no longer needs to be abstracted and handed over step by step; it is "what you see is what you get":

Raw sensor data goes into the model, and the output is a driving command based on the planned ego-vehicle trajectory.

This is the key reason UniAD is called "true end-to-end": it does not merely turn the decision and planning modules into neural networks, but treats the whole process from perception to decision as one system that reasons and solves problems together from the very start.

End-to-end autonomous driving is not brand new; NVIDIA first proposed it in 2016. The reason it is only being put into practice now is that an end-to-end "black box" lacks explainability, which has stumped most players: when performance falls short, they do not know what to tune.

SenseTime's answer is this: unlike an end-to-end model that cannot be decoupled, UniAD integrates multiple modules under one end-to-end architecture, and each module can still be monitored and optimized on its own.
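The idea of one model whose internal modules remain observable can be sketched in a few lines. This is a structural illustration only, not UniAD's code or API: the stage names and the arithmetic inside each stage are hypothetical placeholders for learned sub-networks.

```python
from typing import Callable, Dict, List, Tuple

Stage = Callable[[List[float]], List[float]]


def run_unified_model(
    stages: Dict[str, Stage], sensor_input: List[float]
) -> Tuple[List[float], Dict[str, List[float]]]:
    """Run the stages in order, feeding each one's output to the next,
    and keep every intermediate output so it can be inspected later."""
    features = sensor_input
    trace: Dict[str, List[float]] = {}
    for name, stage in stages.items():
        features = stage(features)
        trace[name] = features
    return features, trace


# Placeholder stages standing in for learned modules (hypothetical).
stages: Dict[str, Stage] = {
    "perception": lambda x: [v * 0.5 for v in x],
    "prediction": lambda x: [v + 1.0 for v in x],
    "planning": lambda x: [sum(x) / len(x)],
}

plan, trace = run_unified_model(stages, [2.0, 4.0])

for name, output in trace.items():
    print(f"{name}: {output}")
print("final plan:", plan)
```

The design point is that the final plan comes from a single forward pass, yet the per-stage trace is still available, so a weak stage can be located and tuned without treating the whole model as an opaque box.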

Wang Xiaogang believes that putting end-to-end models into mass-production cars is the trend, but that models which go straight from input data to throttle and brake signals still carry safety risks in a vehicle.

"Integration" is not achieved overnight; it is a gradual process of folding perception, decision-making, and planning and control into one large model.

UniAD can be called the first "true" end-to-end model in China because it is the most integrated and has gone furthest down that road.

Because the integration is genuine, the share of hand-maintained code in the system is reduced to a minimum: only a few thousand lines in total.

Wang Xiaogang attributes Jueying's getting there first to SenseTime's long-term investment in AGI (artificial general intelligence) capabilities.

SenseTime began building out computing infrastructure in 2018, investing more than 5 billion yuan in AIDC, an intelligent computing center in Lingang, Shanghai. At the time, many people did not understand why an algorithm company would spend so much on infrastructure.

Events have since shown that strong computing power is indispensable to the development of large AI models. Relying on the SenseTime "large device" (its AI infrastructure platform) backed by AIDC, SenseTime has industry-leading computing reserves: 12,000P in operation, with peak computing power expected to reach 16,000P by the fourth quarter of 2024.

On top of this computing power, SenseTime has built its own "RiRixin" large-model system, covering large language models, text-to-image and text-to-video models, multimodal models, and more. It can handle many open-ended tasks and is among the first to approach the threshold of artificial general intelligence.

Wang Xiaogang therefore sees Jueying's lead in end-to-end models, and its full layout across intelligent driving, cockpit, and vehicle-cloud businesses, as the best vehicle for putting SenseTime's AGI technology into practice.

Trend: end-to-end resets the smart car

The room for traditional autonomous driving companies to survive is shrinking: that is Wang Xiaogang's latest assessment.

This view again comes from the perspective of technological evolution:

The rise of end-to-end is resetting the autonomous driving race, and getting it into vehicles is the benchmark and "touchstone" of the new stage.

To elaborate, the end-to-end model puts the "first principles of autonomous driving" into practice for the first time, and addresses previously intractable problems in both user experience and technical iteration.

That gives every player new opportunities: a better intelligent driving experience, lower maintenance and generalization costs, and more cost-competitive intelligent driving solutions.

The price, however, is that the earlier modular, rule-driven technical system must be torn down and rebuilt.

It is therefore a challenge with a very high barrier to entry. Judging from SenseTime's example, it requires at least the following capabilities:

Computing infrastructure, accumulation of basic large models, multi-modal large models...

Of course, there is also the "sunk cost" of switching technical routes: the money and time invested in the past.

Established stars may see their advantage reset to zero, while "latecomers" may take the lead.

On the surface, the 2024 autonomous driving shakeout is about landed projects and cash on hand; in fact, the main driver behind it is the rebuilding of the technical route.

Under this new trend, SenseTime's work deserves attention.
