
Li Auto to fully push no-map NOA in July

On July 5, Li Auto announced at its 2024 Intelligent Driving Summer Conference that it will push the nationwide no-map NOA to all AD Max users in July, along with fully automatic AES (automatic emergency steering) and all-around low-speed AEB (automatic emergency braking). At the same time, Li Auto unveiled a new autonomous driving technology architecture based on an end-to-end model, a VLM (vision-language model), and a world model, and launched an early-bird program for the new architecture.

On the product side, NOA no longer relies on high-definition maps or other prior information and can be used wherever navigation coverage exists nationwide; with spatio-temporal joint planning, it delivers a smoother detour experience. No-map NOA also has ultra-long-range navigation and lane-selection capability and can pass through complex intersections smoothly. NOA also respects the user's psychological safety boundary, using decimeter-level micro-maneuvers to deliver a smooth, reassuring assisted-driving experience. In addition, the upcoming AES function can trigger fully automatically without relying on driver steering torque, helping avoid accidents in high-risk scenarios. All-around low-speed AEB further expands the covered active-safety scenarios, effectively reducing the frequent scraping accidents that occur during low-speed maneuvering.

On the technology side, the new architecture consists of an end-to-end model, a VLM vision-language model, and a world model. The end-to-end model handles routine driving behavior: a single model spans from sensor input to driving-trajectory output, making information transfer, inference, and model iteration more efficient and driving behavior more human-like. The VLM vision-language model has strong logical reasoning ability: it can understand complex road conditions, navigation maps, and traffic rules, and cope with difficult, unknown scenarios. Meanwhile, the autonomous driving system learns and is tested in a virtual environment built on the world model. The world model combines two paths, reconstruction and generation, so the test scenarios it builds both conform to real-world physics and generalize well.

Fan Haoyu, Senior Vice President of Products at Li Auto, said: "Li Auto has always insisted on polishing the product experience together with users. From pushing to the first 1,000 experience users in May this year to expanding to more than 10,000 in June, we have accumulated more than one million kilometers of no-map NOA mileage nationwide. After the full rollout of no-map NOA, 240,000 Li Auto AD Max owners will be using China's leading intelligent driving product, which is a sincere and substantial upgrade."

Lang Xianpeng, Vice President of Autonomous Driving R&D at Li Auto, said: "From launching full-stack in-house development in 2021 to releasing a new autonomous driving technology architecture today, Li Auto's autonomous driving R&D has never stopped exploring. By combining the end-to-end model with the VLM vision-language model, we bring the industry's first solution that deploys the dual systems on the vehicle side, and we are also the first to successfully deploy a VLM on a vehicle chip. This industry-leading new architecture is a milestone breakthrough in autonomous driving."

Four NOA capability upgrades for efficient driving on roads nationwide

The no-map NOA rolling out in July brings four major capability upgrades that comprehensively improve the user experience. First, thanks to overall improvements in perception, scene understanding, and road-structure construction, no-map NOA no longer depends on prior information. It can be used in any city with navigation coverage, and even on alleys, narrow roads, and country roads.

Second, with efficient spatio-temporal joint planning, the vehicle can avoid and bypass road obstacles more smoothly. Spatio-temporal joint planning plans the lateral and longitudinal dimensions synchronously: by continuously predicting the spatial interaction between the ego vehicle and other vehicles, it plans the full driving trajectory over a future time window. Trained on high-quality samples, the vehicle can quickly select the optimal trajectory and execute the detour decisively and safely.
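The idea of planning lateral and longitudinal motion together against predicted futures can be sketched in a toy form. Everything here (constant-velocity prediction, candidate set, thresholds) is an illustrative assumption, not Li Auto's implementation:

```python
# Toy sketch of spatio-temporal joint planning: each candidate couples a
# lateral offset with a longitudinal speed, and is scored against the
# predicted positions of another vehicle over a shared future time window.

def predict_other(t, start=20.0, speed=5.0):
    """Constant-velocity prediction of the other vehicle's position (assumed)."""
    return start + speed * t

def rollout(lateral_offset, speed, horizon=8.0, dt=1.0):
    """Ego trajectory: points (t, x, y) for one lateral/longitudinal plan."""
    steps = int(horizon / dt) + 1
    return [(i * dt, speed * i * dt, lateral_offset) for i in range(steps)]

def min_gap(traj):
    """Smallest distance to the predicted other vehicle over the window."""
    gaps = []
    for t, x, y in traj:
        ox, oy = predict_other(t), 0.0  # other vehicle stays in our lane
        gaps.append(((x - ox) ** 2 + (y - oy) ** 2) ** 0.5)
    return min(gaps)

def best_plan(candidates, safe_gap=2.0):
    """Keep candidates whose whole trajectory stays clear; prefer progress."""
    feasible = [c for c in candidates if min_gap(rollout(*c)) >= safe_gap]
    return max(feasible, key=lambda c: c[1])  # fastest safe plan

# (lateral offset m, speed m/s): stay in lane fast, stay in lane slow,
# or change lanes and keep speed -- the detour wins here.
candidates = [(0.0, 8.0), (0.0, 4.0), (3.5, 8.0)]
plan = best_plan(candidates)
```

Because the whole future window is scored at once, the planner can trade a lateral move against a speed change in a single decision instead of handling them in separate modules.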

At complex urban intersections, NOA's lane-selection ability has also improved significantly. No-map NOA uses a BEV vision model fused with a navigation-matching algorithm to perceive changing curbs, road-surface arrows, and intersection features in real time, and fully fuses lane structure with navigation features. This effectively solves the difficulty of structuring complex intersections, provides ultra-long-range navigation and lane selection, and makes intersection passage more stable.

At the same time, NOA respects the user's psychological safety boundary, using decimeter-level micro-maneuvers to deliver a smoother, more reassuring driving experience. Through an occupancy network that fuses lidar and forward camera vision, the vehicle can identify irregular obstacles over a wider range with higher perception accuracy, enabling earlier and more precise prediction of other traffic participants' behavior. As a result, the vehicle keeps a reasonable distance from other traffic participants and times its acceleration and deceleration more appropriately, effectively improving the user's sense of safety.

Active safety advances, covering more scenarios

In the field of active safety, Li Auto has built a comprehensive library of safety risk scenarios and continues to expand coverage according to each scenario's frequency and severity. In July it will push the fully automatic AES and all-around low-speed AEB functions to users.

To handle the physical-limit scenarios in which AEB alone cannot avoid an accident, Li Auto has introduced a fully automatically triggered AES (automatic emergency steering) function. At high speed, the active safety system has very little time to react; in some cases, even if AEB triggers, full braking cannot stop the vehicle in time. AES then triggers automatically and steers around the target ahead without any driver steering input, effectively avoiding accidents in these extreme scenarios.
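The decision logic described above, where steering takes over once braking physics can no longer close the gap, can be sketched as follows. The deceleration value, function names, and the `lateral_clear` check are assumptions for illustration, not production values:

```python
# Minimal sketch of an AEB/AES arbitration rule: if full braking can no
# longer stop within the remaining gap, steering is the only escape left,
# so emergency steering triggers automatically (given a clear adjacent space).

def braking_distance(speed_mps, max_decel=8.0):
    """Distance needed to stop under full braking: v^2 / (2a)."""
    return speed_mps ** 2 / (2.0 * max_decel)

def choose_action(gap_m, speed_mps, lateral_clear):
    """Return 'AEB' while braking can still avoid impact, else 'AES' if an
    adjacent space is clear, else fall back to 'AEB' as a last resort."""
    if braking_distance(speed_mps) < gap_m:
        return "AEB"
    if lateral_clear:
        return "AES"
    return "AEB"
```

At 33 m/s (about 120 km/h) the assumed full-braking distance is roughly 68 m, so with only a 40 m gap the sketch escalates to steering.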

All-around low-speed AEB provides 360-degree active safety protection in parking and low-speed driving scenarios. In a complex underground garage, obstacles around the vehicle such as pillars, pedestrians, and other cars increase the risk of scraping. All-around low-speed AEB can effectively identify forward, backward, and lateral collision risks and apply emergency braking in time, giving users a more reassuring daily driving experience.

Breakthroughs in autonomous driving technology: a smarter dual system

Inspired by Nobel laureate Daniel Kahneman's dual-process theory of fast and slow thinking, Li Auto's new autonomous driving architecture simulates the human thinking and decision-making process, forming a more intelligent, more human-like driving solution.

The fast system, System 1, excels at simple tasks: it is the intuition formed by human experience and habit, sufficient for about 95% of routine scenarios when driving. The slow system, System 2, is the capacity for logical reasoning, complex analysis, and computation that humans form through deeper understanding and learning; it handles the complex and even unknown traffic scenarios that make up roughly 5% of daily driving. Working together, System 1 ensures high efficiency in most scenarios while System 2 raises the ceiling in the rest, and the two form the basis of human cognition, understanding of the world, and decision-making.

Based on this fast-and-slow dual-system theory, Li Auto has formed the prototype of its autonomous driving algorithm architecture. System 1 is implemented by the end-to-end model, which responds efficiently and quickly: it receives sensor input and directly outputs the driving trajectory used to control the vehicle. System 2 is implemented by the VLM vision-language model, which receives sensor input, reasons about it, and outputs decision information to System 1. The dual system's autonomous driving capabilities are also trained and validated in the cloud using the world model.
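The division of labor between the two systems can be sketched as an interface: the fast path runs every frame, and the slow path occasionally contributes a decision hint. The class names, the hint format, and the "flooded road" scenario are assumptions for illustration; the article gives no API details:

```python
# Hedged sketch of the dual-system split: System 1 plans on every frame,
# while System 2 stays silent on routine scenes and emits hints only when
# it recognizes a difficult situation.

class System1:
    """Fast path: end-to-end, sensor frame in, trajectory/speed target out."""
    def plan(self, frame, hint=None):
        speed = frame["speed_limit"]
        if hint is not None:
            speed = min(speed, hint["max_speed"])  # fold in the slow path's advice
        return {"target_speed": speed}

class System2:
    """Slow path: reasons about rare scenes and emits hints for System 1."""
    def advise(self, frame):
        if frame.get("scene") == "flooded_road":  # hypothetical hard scenario
            return {"max_speed": 20}
        return None  # routine scene: System 1 handles it alone

def drive(frame, s1=System1(), s2=System2()):
    return s1.plan(frame, hint=s2.advise(frame))
```

The key design point the sketch captures is that System 2 never controls the vehicle directly; it only constrains System 1, so the fast path's latency guarantees are preserved.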

Efficient end-to-end model

The input to the end-to-end model comes mainly from cameras and lidar; a CNN backbone extracts multi-sensor features, fuses them, and projects them into BEV space. To improve the model's representational power, Li Auto also designed a memory module with memory across both the temporal and spatial dimensions. The model's input additionally includes vehicle status and navigation information; after encoding, a Transformer decodes these together with the BEV features into dynamic obstacles, road structure, and general obstacles, and plans the vehicle's trajectory.

Because multi-task output is realized in one integrated model with no rule-based modules in between, the end-to-end model has significant advantages in information transfer, inference, and model iteration. In real-world driving, it shows a stronger understanding of general obstacles, beyond-line-of-sight navigation, better road-structure understanding, and more human-like path planning.
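The dataflow of the stages named above can be stubbed in plain Python to show how one model chains them without rule-based modules in between. The real stages are neural networks; every function body and shape here is an illustrative assumption:

```python
# Rough dataflow sketch of the end-to-end pipeline: backbone -> BEV fusion
# -> memory module -> joint decoding of perception outputs and trajectory.

def backbone(camera, lidar):
    """Extract and fuse multi-sensor features into a BEV grid (stubbed)."""
    return {"bev": camera + lidar}

def memory(bev_feat, state):
    """Carry temporal context across frames (stubbed as a short history)."""
    state.append(bev_feat["bev"])
    return state[-3:]  # keep a small window of recent BEV features

def decode(bev_history, ego_status, navigation):
    """One set of decoder heads: obstacles, road structure, and trajectory."""
    return {
        "dynamic_obstacles": [],            # placeholder head output
        "road_structure": navigation["route"],
        "trajectory": [(ego_status["speed"] * t, 0.0) for t in (0.5, 1.0, 1.5)],
    }

state = []  # persists across frames in a real system
out = decode(memory(backbone(camera=1.0, lidar=2.0), state),
             ego_status={"speed": 10.0},
             navigation={"route": "straight"})
```

The point of the stub is the single forward pass: sensor input enters at the top and the planned trajectory comes out of the same call chain, with no hand-written rules between the stages.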

A VLM vision-language model with a high ceiling

The VLM vision-language model is built on a unified Transformer. A tokenizer encodes the prompt text, a vision encoder processes the forward camera images and navigation map information, and an image-text alignment module aligns the modalities. The model then performs unified autoregressive inference, outputting its understanding of the environment, driving decisions, and a driving trajectory, which are passed to System 1 to assist in controlling the vehicle.

Li Auto's VLM vision-language model has 2.2 billion parameters and a strong understanding of the physical world's complex traffic environments, even in unknown scenarios encountered for the first time. The VLM can recognize environmental information such as road surface conditions and lighting and prompt System 1 to adjust vehicle speed for safe, comfortable driving. It also understands navigation maps better, cooperating with the vehicle system to correct navigation and prevent wrong routes. It can likewise understand complex traffic rules such as bus lanes, reversible (tidal) lanes, and time-based traffic restrictions, and make reasonable driving decisions.
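One of the rules mentioned above, a bus lane restricted only during certain hours, is easy to express concretely. The specific hours and function name are assumptions for the example, not from the article:

```python
# Illustrative sketch of a time-based bus-lane rule of the kind the VLM is
# said to reason about: the lane is reserved for buses only at rush hour.

from datetime import time

def bus_lane_open_to_cars(now, restricted=((time(7, 0), time(9, 0)),
                                           (time(17, 0), time(19, 0)))):
    """True if a private car may use the bus lane at this time of day."""
    return not any(start <= now < end for start, end in restricted)
```

The interesting part is not the rule itself but that the model has to recover it from a sign in the camera image and then condition a driving decision on the current time, which is exactly the kind of symbolic reasoning a pure end-to-end model struggles with.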

A world model combining reconstruction and generation

Li Auto's world model combines two technical paths, reconstruction and generation: it reconstructs real data with 3DGS (3D Gaussian splatting) and uses generative models to supplement novel views. During scene reconstruction, dynamic and static elements are separated: the static environment is reconstructed, while dynamic objects are reconstructed and rendered from new viewpoints. After re-rendering, the scene forms a 3D physical world in which dynamic assets can be freely edited and adjusted, achieving partial generalization of the scene. Compared with reconstruction, generative models generalize more strongly: weather, lighting, traffic flow, and other conditions can be customized to generate new scenarios that obey real-world physics, which are used to evaluate how the autonomous driving system adapts to various conditions.
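The reconstruction/generation split can be sketched as two steps: rebuild a scene from a logged drive, then edit its conditions to fan out test variants. The data fields and variant choices below are assumptions for illustration:

```python
# Toy sketch of the two-path world model: reconstruction fixes the static
# scene from logged data; generation then varies weather and traffic to
# generalize it into new test scenarios.

def reconstruct(log):
    """Separate a drive log into a static scene and editable dynamic assets."""
    return {"static": log["road"],
            "dynamic": list(log["vehicles"]),
            "weather": log["weather"]}

def generate_variants(scene, weathers=("rain", "night"), extra_traffic=2):
    """Produce new scenarios by editing conditions on the rebuilt scene."""
    variants = []
    for w in weathers:
        edited = dict(scene, weather=w,
                      dynamic=scene["dynamic"] + ["npc"] * extra_traffic)
        variants.append(edited)
    return variants

scene = reconstruct({"road": "urban_4lane",
                     "vehicles": ["truck"],
                     "weather": "sunny"})
tests = generate_variants(scene)
```

Each variant keeps the reconstructed road geometry (grounded in real data) while the generated conditions change, which is why the combination both "conforms to real-world physics" and generalizes beyond the logged drive.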

The combination of reconstruction and generation creates a better virtual environment for the autonomous driving system to learn and be tested in, giving the system an efficient closed-loop iteration capability and ensuring its safety and reliability.
