SenseTime gives first on-vehicle demonstration of UniAD, its true end-to-end autonomous driving solution for mass production

Author: Kōko Kōnen

On April 25, UniAD (Unified Autonomous Driving), SenseTime's true end-to-end autonomous driving solution designed for mass production, made its on-vehicle debut at the Beijing Auto Show. At the end of 2022, SenseTime proposed UniAD, the industry's first general-purpose autonomous driving model to integrate perception and decision-making; the work won the Best Paper Award at the 2023 Conference on Computer Vision and Pattern Recognition (CVPR). With this demonstration, SenseTime has taken the lead in China in carrying an end-to-end autonomous driving solution from technological innovation through to on-vehicle deployment.

True end-to-end debuts on a real vehicle: UniAD handles urban streets and rural roads smoothly

As intelligent driving moves from highways into cities, the complexity of the road environment rises sharply. For traditional intelligent driving solutions, complex urban scenarios such as unprotected left turns are a major challenge: they require multi-sensor fusion perception and a large investment of resources to handle the many long-tail problems.

Vehicles equipped with the UniAD end-to-end solution learn to drive from data and rely only on camera-based visual perception, without high-precision maps, to observe and understand their surroundings much as a human does. Drawing on this rich perception information, UniAD reasons and makes decisions on its own: it makes unprotected left turns smoothly, passes quickly through signalized intersections where pedestrians and vehicles mix, and independently handles a range of difficult urban driving scenarios.

(Vehicles equipped with the UniAD solution can pass quickly through signalized intersections with mixed pedestrian and vehicle traffic)

Beyond that, on rural roads without a center line, which traditional solutions struggle to handle, UniAD also drives with ease. It completes a series of difficult maneuvers, including a large-angle left turn onto a bridge, avoiding vehicles that occupy the lane, steering around construction areas, and passing running pedestrians, so that it truly "drives like a human".

(Vehicles equipped with the UniAD solution can autonomously avoid construction areas)

The on-vehicle demonstration at the Beijing Auto Show included one particularly complex scene. On a narrow, unmarked rural road in Lingang, a car approaches from the opposite direction while pedestrians are running ahead. UniAD judges that there is enough room ahead to maneuver, so, with safety ensured, it quickly steers left around the pedestrians, returns to its normal path, and passes the oncoming car, handling the scene as a veteran driver would.

(UniAD flexibly steers around pedestrians and passes an oncoming vehicle, truly driving like a human)

With these results from real-vehicle testing, SenseTime has demonstrated the strength of China's end-to-end intelligent driving solutions for mass production.

True end-to-end with UniAD: a large model that integrates perception and decision-making is the optimal solution

Today, the mainstream architecture for autonomous driving algorithms is built on rules hand-written by engineers and relies on separate modules such as perception, decision-making, and planning working together. Because data is passed from one independent module to the next, some information loss and error is inevitable. Errors in an upstream module propagate downstream and accumulate across modules, degrading the overall performance of the system.
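
The compounding effect described above can be illustrated with a back-of-the-envelope sketch. It assumes, purely for illustration, that each module independently preserves a fixed fraction of the information it receives, so the losses multiply. The function name and the figures are hypothetical and are not measurements from SenseTime.

```python
from math import prod


def cascaded_fidelity(stage_fidelities):
    """Fraction of information surviving a chain of independent stages.

    Assumption (not from the article): each stage keeps a fixed fraction
    of what it receives, and losses are independent, so they multiply.
    """
    for fidelity in stage_fidelities:
        if not 0.0 <= fidelity <= 1.0:
            raise ValueError("each fidelity must lie in [0, 1]")
    return prod(stage_fidelities)


# Hypothetical numbers: perception, prediction, and planning each keep 95%.
pipeline = [0.95, 0.95, 0.95]
print(f"fidelity of the whole cascade: {cascaded_fidelity(pipeline):.3f}")
```

Under these made-up numbers, three stages that each look reliable on their own leave roughly 86% of the original information at the output, which is the kind of accumulation the modular design is exposed to.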

Moreover, a finite set of rules cannot cover the effectively unlimited variety of complex scenarios and long-tail problems, and the ceiling of traditional intelligent driving is becoming apparent.

Achieving lossless transmission of information from perception onward, and breaking through the ceiling of traditional intelligent driving, requires a new algorithmic paradigm. End-to-end models are opening up that new technical route for autonomous driving.

Unlike traditional intelligent driving algorithms, an end-to-end autonomous driving solution is optimized for final driving performance and handles the driving task in an integrated way: the entire process, from perception to decision-making to control, is carried out by a Transformer neural network model.

Since the introduction of UniAD and the arrival on the road of Tesla vehicles running FSD V12, more and more companies have begun launching their own "end-to-end" solutions.

Many end-to-end solutions currently on the market build one large model for perception and another for decision-making. This "two-stage" approach is easier to implement, but the information passed between the perception model and the decision-making model is still explicit, human-defined information, so data is still filtered and lost in transmission.

SenseTime's UniAD solution is the first in the industry to integrate perception, decision-making, planning, and other modules into a single full-stack Transformer end-to-end model, unifying perception and decision-making. Perception data no longer has to be abstracted and handed over step by step. On a "what you see is what you get" basis, raw information is fed directly into the end-to-end model, which outputs commands based on the planned trajectory of the ego vehicle, achieving true end-to-end autonomous driving.

(True end-to-end means the integration of perception and decision-making)

In the future, end-to-end solutions will replace the time-consuming, inefficient approach that relies solely on manual effort, and will become a key capability for autonomous driving in the AGI era.

First, both traditional intelligent driving solutions and "two-stage" end-to-end solutions rely on human-defined rules to pass explicit information, which introduces errors and losses and makes it difficult to reconstruct the external scene completely and accurately. The most obvious advantage of the end-to-end model is lossless information transmission: it learns, thinks, and reasons from the raw information, so it can come to understand complex traffic environments comprehensively, as a human does, and can keep improving, which gives it a higher capability ceiling.

Second, a data-driven end-to-end solution can transfer and generalize the driving skills it has learned to other scenarios. This speeds up iteration and helps automakers reach the goal of nationwide drivability sooner.

Finally, the end-to-end model perceives and understands the external environment as a human does. Pure vision and independence from high-precision maps are inherent to UniAD, which needs only navigation information to drive the car to its destination, and this naturally helps automakers reduce software and hardware costs.

With a higher capability ceiling, faster iteration, and lower system cost, a model that integrates perception and decision-making is the optimal solution for true end-to-end intelligent driving.

SenseTime's hard-core capabilities: strong model performance, high-quality data, and abundant computing power

Compared with traditional rule-based solutions, the core advantage of an end-to-end autonomous driving solution is the strong learning, thinking, and reasoning ability of the large model, especially its "emergent" abilities. The capabilities of the UniAD end-to-end solution in turn depend on strong model performance, high-quality data, and abundant computing resources.

In terms of model performance, SenseTime proposed the industry's first general-purpose autonomous driving model with integrated perception and decision-making at the end of 2022. Since then, the UniAD solution has gone through many rounds of iteration driven by high-quality data, and its performance has been continuously optimized, placing it in a leading position in the industry.

In its FSD V12 release, Tesla removed more than 300,000 lines of code, eventually shrinking the codebase to a few thousand lines, yet the capabilities of this end-to-end solution remain strong and continue to grow. The same is true of UniAD. Drawing on SenseTime's extensive experience in lightweight model deployment, SenseTime Jueying's UniAD solution will be deployed and launched in the second half of 2023, and it will continue to iterate and grow rapidly with the support of abundant computing power and high-quality data.

Furthermore, Tesla's FSD V12 and other integrated end-to-end solutions are built on models that cannot be decoupled. UniAD, by contrast, integrates multiple modules into one end-to-end architecture in which each module can still be monitored and optimized separately. Compared with purely black-box end-to-end technology, the UniAD solution offers stronger interpretability, safety, and capacity for continuous iteration.
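
One way to picture a single model whose modules remain individually observable is a joint training objective built from per-module loss terms. The sketch below is an assumption made for illustration, not SenseTime's implementation: the module names, weights, and loss values are all hypothetical.

```python
def joint_loss(module_losses, weights):
    """Weighted sum of per-module losses for a jointly trained model.

    Hypothetical illustration: the total drives end-to-end training,
    while the individual terms stay available for monitoring.
    """
    missing = set(module_losses) - set(weights)
    if missing:
        raise KeyError(f"no weight for modules: {sorted(missing)}")
    return sum(weights[name] * loss for name, loss in module_losses.items())


# Made-up loss values for one training step.
losses = {"perception": 0.40, "prediction": 0.25, "planning": 0.10}
weights = {"perception": 1.0, "prediction": 1.0, "planning": 2.0}

total = joint_loss(losses, weights)
worst = max(losses, key=losses.get)  # module with the largest raw loss
print(f"total={total:.2f}, module to inspect first: {worst}")
```

Because the per-module terms are kept alongside the total, a regression can be traced to a specific module instead of being visible only in the final driving output, which is the interpretability benefit the paragraph above attributes to a decoupled design.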

At the data level, training an end-to-end autonomous driving model requires high-quality video data, above all from long-tail scenarios such as wrong-way vehicles, non-motorized vehicles cutting across traffic, and pedestrians who suddenly emerge from behind an obstruction (so-called "ghost probes"). Such data is very difficult to collect in the real world.

Through real-vehicle data collection, the cleaning and filtering capabilities of its data pipeline, and powerful simulation technology, SenseTime can artificially construct complex scenarios, for example by adding obstacles, to supply the data UniAD needs to keep evolving and move toward commercial deployment.

Relying on a world model, SenseTime can continuously generate video data of more detailed and complex autonomous driving scenes, and then use this data to train UniAD in a targeted way. For example, the world model can generate complex urban scenes such as mixed traffic and roundabouts, and can even reproduce "8D" urban structures.

(SenseTime provides a solid foundation for UniAD's efficient training and learning and for its deployment on real vehicles)

In terms of computing power, SenseTime began planning and building AI infrastructure in 2018. Today it has deployed a nationwide integrated intelligent computing network with a total computing capacity of 12,000 petaFLOPS (one petaFLOPS is 10^15 floating-point operations per second; hereinafter "P"). These computing resources, among the leading in China, give the efficient training and learning of the UniAD solution, and its deployment on real vehicles, a solid foundation.

DriveAGI: a smarter and more powerful end-to-end solution is on the way

At the Beijing Auto Show, SenseTime also previewed DriveAGI, a smarter and more powerful next-generation autonomous driving technology that builds on a multi-modal large model to improve and upgrade the end-to-end intelligent driving solution.

DriveAGI marks the evolution of the autonomous driving model from data-driven to cognition-driven. It goes beyond the concept of a driver, deepens the model's understanding of the world, and offers stronger reasoning, decision-making, and interaction capabilities. It is currently the autonomous driving technology that comes closest to human thinking, best understands human intent, and is most capable of handling difficult driving scenarios, an important step toward fully driverless operation.

(DriveAGI, a next-generation autonomous driving model: perceivable, interactive, and trustworthy)

In addition, because DriveAGI is built on a multi-modal large model, it has strong interaction capabilities: users in the cockpit can direct the vehicle's driving through natural-language instructions, making the experience more perceivable, interactive, and trustworthy.

From UniAD to DriveAGI, SenseTime has been leading the trend in end-to-end autonomous driving, but we won't stop there. SenseTime is breaking down the boundary between the intelligent cockpit and intelligent driving, promoting the architectural shift toward cabin-driving integration, and accelerating the move of intelligent vehicles into a new AGI future.
