On July 5, Li Auto held its 2024 Intelligent Driving Summer Conference and announced that in July it will push mapless NOA that is "drivable nationwide" to all Li Auto AD Max users, along with fully automatic AES (automatic emergency steering) and all-round low-speed AEB (automatic emergency braking). At the same event, Li Auto released a new autonomous driving technology architecture based on an end-to-end model, a VLM (visual language model), and a world model, and opened an early bird program for the new architecture.
From the start of full-stack in-house development in 2021 to today's release of a new autonomous driving technology architecture, full-stack in-house development has anchored Li Auto's push to move "upward" through intelligence. That push is both a climb toward the peak of the technology and a commitment to the best possible user experience.
Lang Xianpeng, Vice President of Intelligent Driving R&D at Li Auto, said: "We combined the end-to-end model with the VLM visual language model to bring the industry's first dual-system solution deployed on the vehicle, and we also successfully deployed a VLM visual language model on an in-vehicle chip for the first time, which is a milestone technological breakthrough in the field of autonomous driving."
Three major directions: an across-the-board leap in intelligent driving capability
As a pioneer in the field of intelligent driving, Li Auto has always been committed to providing users with a safer, more convenient and more intelligent travel experience.
At the conference, Li Auto announced that it will push a series of intelligent driving upgrades in July, including four major NOA capability enhancements, advances in active safety, and breakthroughs in autonomous driving technology. These upgrades demonstrate Li Auto's deep accumulation of intelligent driving technology and bring users an unprecedented travel experience.
NOA, one of the cores of Li Auto's intelligent driving technology, receives four major capability improvements. Thanks to across-the-board gains in perception, understanding, and road structure construction, mapless NOA no longer depends on prior map information. Users can use NOA in any city in the country with navigation coverage, and can even activate it on unusually narrow roads and country roads. Being able to drive efficiently on roads nationwide gives users a more convenient and less constrained travel experience. In addition, efficient joint spatio-temporal planning lets the vehicle avoid and detour around road obstacles more smoothly, which further improves the user's sense of safety and comfort.
Li Auto has also made significant progress in active safety. By building a complete database of safety risk scenarios and classifying them by frequency and degree of risk, Li Auto continues to improve its coverage of risk scenarios. The fully automatic AES and all-round low-speed AEB functions to be pushed to users in July will further strengthen the vehicle's active safety in complex scenarios. Fully automatic AES addresses scenarios at the physical limit, where AEB alone cannot avoid an accident, and so provides a stronger safeguard for driving safety. All-round low-speed AEB provides 360-degree active safety protection for parking and low-speed driving, giving users more peace of mind in everyday use.
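The "physical limit" mentioned above can be illustrated with basic kinematics. The sketch below is not Li Auto's algorithm; it compares the textbook stopping distance v^2 / (2a) with the forward distance covered during a sideways shift at constant lateral acceleration. The function names and all numeric values (8 m/s^2 braking, 4 m/s^2 lateral acceleration, 2 m offset) are illustrative assumptions.

```python
import math


def braking_distance_m(speed_mps: float, decel_mps2: float) -> float:
    """Distance needed to stop under constant deceleration: v^2 / (2a)."""
    return speed_mps ** 2 / (2.0 * decel_mps2)


def steering_distance_m(speed_mps: float, lateral_offset_m: float,
                        lateral_accel_mps2: float) -> float:
    """Forward distance travelled while shifting sideways by lateral_offset_m.

    Simplified model (an assumption): constant lateral acceleration and
    constant forward speed, so t = sqrt(2 * offset / a_lat).
    """
    time_s = math.sqrt(2.0 * lateral_offset_m / lateral_accel_mps2)
    return speed_mps * time_s


def steering_needed(gap_m: float, speed_mps: float,
                    decel_mps2: float = 8.0,
                    lateral_offset_m: float = 2.0,
                    lateral_accel_mps2: float = 4.0) -> bool:
    """True when braking alone cannot stop within the gap but steering can clear it."""
    can_brake = braking_distance_m(speed_mps, decel_mps2) <= gap_m
    can_steer = steering_distance_m(
        speed_mps, lateral_offset_m, lateral_accel_mps2) <= gap_m
    return (not can_brake) and can_steer


if __name__ == "__main__":
    speed = 100 / 3.6  # 100 km/h in m/s
    print(round(braking_distance_m(speed, 8.0), 1))
    print(round(steering_distance_m(speed, 2.0, 4.0), 1))
    print(steering_needed(gap_m=35.0, speed_mps=speed))
```

Under these assumed values, there is a band of distances at highway speed where stopping is no longer possible but a lane-width sideways shift still is, which is the kind of case an emergency steering function targets.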
Li Auto has also made breakthrough innovations in autonomous driving technology. Inspired by Nobel laureate Daniel Kahneman's theory of fast and slow thinking, Li Auto has built a more intelligent and human-like autonomous driving architecture. The architecture has two parts: a fast system (System 1), which handles simple tasks, and a slow system (System 2), which handles complex scenarios. The fast system is implemented by an end-to-end model and responds quickly and efficiently. The slow system is implemented by the VLM visual language model, which reasons about the scene and then passes decision information to the fast system. This dual-system design improves driving efficiency in most scenarios while keeping a high performance ceiling in the small number of complex ones.
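The division of labor between the two systems can be sketched as a simple control loop. This is a toy illustration, not Li Auto's implementation: the `Scene` and `Advice` types, the rule-based stand-ins for both models, the speed values, and the assumption that the slow system runs once every few frames are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Scene:
    obstacle_ahead: bool
    complex_rule_zone: bool  # e.g. a bus lane or time-restricted road


@dataclass
class Advice:
    max_speed_kph: float
    reason: str


def slow_system(scene: Scene) -> Optional[Advice]:
    """Stand-in for the VLM: returns decision information only for complex scenes."""
    if scene.complex_rule_zone:
        return Advice(max_speed_kph=30.0, reason="complex traffic rule ahead")
    return None


def fast_system(scene: Scene, target_kph: float,
                advice: Optional[Advice]) -> float:
    """Stand-in for the end-to-end model: returns a commanded speed every frame."""
    speed = target_kph
    if advice is not None:
        speed = min(speed, advice.max_speed_kph)  # respect the slow system's hint
    if scene.obstacle_ahead:
        speed = min(speed, 10.0)  # immediate reaction, no deliberation needed
    return speed


def drive(scenes: List[Scene], target_kph: float = 60.0,
          slow_every: int = 3) -> List[float]:
    """Run the fast system on every frame and the slow system every slow_every frames."""
    advice: Optional[Advice] = None
    commands = []
    for i, scene in enumerate(scenes):
        if i % slow_every == 0:
            advice = slow_system(scene)  # latest advice is reused until refreshed
        commands.append(fast_system(scene, target_kph, advice))
    return commands
```

The point of the sketch is the direction of information flow: the fast path always produces a command, and the slow path only adjusts it by supplying extra decision information.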
Going deep on large-model applications: opening up more possibilities
With the spread of new energy vehicles and the rapid development of intelligent connected vehicle technology, the automotive industry is undergoing unprecedented change. As one of the core functions of intelligent connected vehicles, intelligent driving has become a new focus of competition among carmakers. With their strong data processing, deep learning, and multi-task capabilities, large automotive models provide a solid foundation for intelligent driving systems.
At the event, Li Auto demonstrated notable progress in combining end-to-end models, VLM visual language models, and world models. The combination not only raises the efficiency and the ceiling of intelligent driving technology, but also creates a better virtual environment for training and testing autonomous driving systems.
Specifically, the end-to-end model, as one of the cores of Li Auto's intelligent driving technology, shows the advantages of high efficiency.
The model takes camera and lidar data as input. A CNN backbone extracts and fuses the multi-sensor features and projects them into BEV (bird's-eye-view) space. To further improve the model's representational power, Li Auto also designed a memory module that retains information along both the temporal and spatial dimensions. Vehicle status and navigation information are added to the model's input as well. After encoding by a Transformer, they are decoded together with the BEV features into dynamic obstacles, road structure, and general obstacles, and used to plan the vehicle's trajectory.
All of these outputs come from a single integrated model with no rule-based intervention in between, which gives the end-to-end model significant advantages in information transfer, inference computation, and model iteration. In real-world driving, the end-to-end model shows stronger general obstacle understanding, beyond-line-of-sight navigation, road structure understanding, and more human-like path planning.
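To make "projecting into BEV space" and a memory over time concrete, here is a deliberately simplified, hand-written sketch. It is not the learned model described above: it rasterizes raw 3D points into a top-down occupancy grid and blends frames with a running average. The grid extents, cell size, blending factor, and function names are all assumptions for illustration.

```python
from typing import List, Sequence, Tuple

Grid = List[List[float]]
Point = Tuple[float, float, float]  # (x forward, y left, z up) in metres


def points_to_bev(points: Sequence[Point],
                  x_range: Tuple[float, float] = (0.0, 40.0),
                  y_range: Tuple[float, float] = (-10.0, 10.0),
                  cell_m: float = 1.0) -> Grid:
    """Rasterize 3D points into a top-down (bird's-eye-view) occupancy grid."""
    rows = int((x_range[1] - x_range[0]) / cell_m)
    cols = int((y_range[1] - y_range[0]) / cell_m)
    grid = [[0.0] * cols for _ in range(rows)]
    for x, y, _z in points:
        # Points outside the grid extents are ignored.
        if x_range[0] <= x < x_range[1] and y_range[0] <= y < y_range[1]:
            row = int((x - x_range[0]) / cell_m)
            col = int((y - y_range[0]) / cell_m)
            grid[row][col] = 1.0  # mark the cell as occupied
    return grid


def update_memory(memory: Grid, current: Grid, keep: float = 0.5) -> Grid:
    """Blend the previous grid with the current one (a crude temporal memory)."""
    return [
        [keep * m + (1.0 - keep) * c for m, c in zip(mem_row, cur_row)]
        for mem_row, cur_row in zip(memory, current)
    ]
```

In the real system the grid cells would hold learned feature vectors fused from camera and lidar rather than a single occupancy value, and the memory would be learned rather than a fixed average.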
The VLM visual language model represents the high ceiling of Li Auto's intelligent driving technology.
The model's architecture is built around a unified Transformer, which encodes the prompt text together with visual information from the front-facing camera images and the navigation map. After the modalities are aligned by an image-text alignment module, the VLM performs unified autoregressive inference and outputs its understanding of the environment, driving decisions, and a driving trajectory, which are passed to System 1 to assist in controlling the vehicle. Li Auto's VLM visual language model has 2.2 billion parameters and a strong understanding of the complex traffic environments of the physical world. It can cope even with unfamiliar scenarios that it encounters for the first time. It can recognize environmental information such as road surface condition and lighting, and prompt System 1 to adjust vehicle speed for safe and comfortable driving. The VLM also has a stronger understanding of navigation maps and can work with the in-vehicle infotainment system to correct the navigation and avoid taking the wrong route. While driving, the VLM can also understand complex traffic rules such as bus lanes, reversible (tidal) lanes, and time-based traffic restrictions, and make sound decisions accordingly.
Beyond the end-to-end model and the VLM visual language model, Li Auto is also working on a world model that combines reconstruction and generation. Real data is reconstructed with 3DGS (3D Gaussian Splatting), and generative models supply new viewpoints, so the world model combines the two technical paths of reconstruction and generation.
In scene reconstruction, dynamic and static elements are first separated. The static environment is reconstructed, while dynamic objects are reconstructed and given newly generated viewpoints. The scene is then re-rendered to form a 3D physical world, in which dynamic assets can be freely edited and adjusted to achieve partial generalization of the scene. Compared with reconstruction, generative models generalize better: weather, lighting, traffic, and other conditions can be customized to create new scenes that still conform to real-world laws. These new scenes are used to evaluate how well the autonomous driving system adapts to varied conditions. Together, reconstruction and generation create a better virtual environment for training and testing the autonomous driving system, giving it an efficient closed-loop iteration capability that helps ensure safety and reliability.
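The idea of customizing conditions to create evaluation scenes can be sketched as enumerating variants of a base scene and scoring a system against them. This is only a bookkeeping illustration under assumed names: `Scenario`, `generate_variants`, `pass_rate`, and the condition labels are hypothetical, and no rendering or generative model is involved.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Scenario:
    base_scene: str  # identifier of a reconstructed real-world scene
    weather: str
    lighting: str
    traffic: str


def generate_variants(base_scene: str,
                      weathers: Sequence[str],
                      lightings: Sequence[str],
                      traffics: Sequence[str]) -> List[Scenario]:
    """Enumerate every combination of the customizable conditions."""
    return [
        Scenario(base_scene, weather, lighting, traffic)
        for weather, lighting, traffic in product(weathers, lightings, traffics)
    ]


def pass_rate(scenarios: Sequence[Scenario],
              evaluate: Callable[[Scenario], bool]) -> float:
    """Fraction of scenarios in which the system under test passes."""
    if not scenarios:
        raise ValueError("at least one scenario is required")
    results = [evaluate(s) for s in scenarios]
    return sum(results) / len(results)
```

A typical use would be to generate the variants for one reconstructed scene, run the driving system in each, and feed the failing combinations back into training, which is the closed loop the text describes.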
Postscript: Intelligence pulls the brand "upward"
With the rapid development of science and technology, intelligent driving, as the future direction of the automotive industry, is changing the way we travel at an unprecedented pace. In this wave of change, Li Auto has become an important force leading the industry's intelligent upgrade through its firm commitment to full-stack in-house development. The 2024 Intelligent Driving Summer Conference was both a concentrated showcase of Li Auto's accumulated technology and a serious look at how the company is positioning itself for future intelligent development.
Intelligence is becoming a lasting driving force for automobile brands. For Li Auto, full-stack in-house development means keeping everything in its own hands, from underlying technology to upper-level applications and from hardware to software. This allows Li Auto to understand user needs in depth, respond quickly to market changes, and keep launching more intelligent, safe, and convenient travel solutions.
As Fan Haoyu, Senior Vice President of Product at Li Auto, said, Li Auto has always insisted on refining the product experience together with its users. From the first batch of 1,000 experience users in May this year to more than 10,000 in June, the company has accumulated more than one million kilometers of mapless NOA mileage across the country. Once mapless NOA is pushed to all users, 240,000 Li Auto AD Max owners will be using a leading intelligent driving product in China, a major and sincere upgrade.