
A brief analysis of the development of autonomous driving perception

-- Reference: "Taxonomy of Driving Automation for Vehicles" (GB/T 40429-2021) --

As someone who has worked in autonomous driving perception for several years, I want to share some of my understanding of this field here. Writing it down helps me build a more systematic knowledge base for myself, and I hope to meet more excellent peers, so everyone is welcome to exchange ideas and help each other. The article draws mainly on public information, the work of other creators and my day-to-day experience, and I am once again grateful to the colleagues whose material is cited.

Data closed-loop

The first topic I want to share is the data closed loop. In autonomous driving today the concept is nothing short of hot: with a data closed loop and an idealized model of rolling iteration, the once seemingly unattainable vision of autonomous driving is gradually becoming deployable. On one side, technology companies have won the favor of capital and steadily raised their valuations; on the other, car makers are especially fond of the "data is king" logic, since holding the first-hand data source puts them in a commanding position. So first we need to explain what a "data closed loop" is.

A data closed loop, also called a closed loop of data analysis, is at heart a very simple analysis method: when we encounter a software (or algorithm) problem, we collect the relevant data and reproduce it, fix the resulting problem in a targeted way, update the software (or algorithm) version once testing confirms the fix, and finally release the new version.

When every step in this loop has a concrete technical implementation, the closed loop is complete, and iterative upgrades of the software (or algorithm) can proceed along it.
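As a toy illustration (not any company's actual pipeline), one turn of the reproduce-fix-test-release loop can be sketched in Python; `Issue`, `fix` and `regression_test` are stand-ins for the real development and QA steps:

```python
from dataclasses import dataclass, field

@dataclass
class Issue:
    description: str
    data: list = field(default_factory=list)  # logs/sensor data needed to reproduce

def release_cycle(issues, fix, regression_test):
    """One turn of the data closed loop: reproduce -> fix -> test -> release.

    `fix` and `regression_test` are placeholders for the real development
    and QA steps; all names here are illustrative, not from the article.
    """
    version = 0
    for issue in issues:
        patched = fix(issue)          # targeted fix using the reproduced data
        if regression_test(patched):  # confirm the fix before release
            version += 1              # release a new version
    return version
```

The point of the sketch is the shape of the loop: nothing advances the version counter unless a problem was captured, reproduced and confirmed fixed, which is exactly where steps B and C below become the bottleneck.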

During development, that is, before a version is officially released, the whole loop is generally controllable and achievable: the development (including self-test) team carries out steps A, D, E, F and G, and once a version is confirmed, system testers run their tests; when testing finds a problem (step B), they record its symptoms, time and other details (step C) and hand them to the developers. After the software is officially released, however, steps B and C go missing, which is why so many companies and articles focus on information collection when describing the data closed loop. Broadly, these two steps go missing for several reasons:

The software has no suitable problem-feedback mechanism;

Users lack the ability to provide valid problem information;

The problem data involves the users' own operating information, which they are reluctant to provide.

Since problem data is the most valuable data, every software vendor puts a great deal of work into the data closed loop. For example, everyone has seen a pop-up box like the following when using a computer:

(Figure: a typical crash-report pop-up)

This pop-up box is an effective data closed loop in action, and it neatly addresses all three problems above: when a problem occurs, the software actively raises the dialog and the user only has to confirm; the software packages the required information in the background, and since the user cannot see what is uploaded, problems 2 and 3 are sidestepped. But this is traditional PC software, and the problems faced by autonomous driving software are more complex. Compared with PC software:

When the software goes wrong, the driver should avoid distractions and should not be asked to perform extra operations such as clicking;

By the time intelligent driving software is released, the probability of problems the driver can notice should already be very low, so a large number of problems the driver cannot perceive would go unrecorded;

Driving information includes roads, traffic participants and the like, and is subject to laws and regulations.

Of these, the third is a policy issue rather than a technical bottleneck. Autonomous driving companies solve problems 1 and 2 through silent ("shadow") operating modes and OTA. On the whole, in autonomous driving software the technologies for steps B and C are now in place, opening up the full data closed-loop link.
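The PC-style crash pop-up described above essentially packages diagnostics in the background and uploads them once the user confirms. A minimal sketch of assembling such a payload (all field names are illustrative, not any vendor's actual format):

```python
import json
import platform
import time
import traceback

def build_crash_report(exc: BaseException) -> str:
    """Package the information a crash pop-up would upload after the user
    clicks 'send'. Field names are illustrative only."""
    report = {
        "timestamp": time.time(),                 # when the problem occurred
        "os": platform.platform(),                # environment information
        "error": repr(exc),                       # what went wrong
        "stack": traceback.format_exception(type(exc), exc, exc.__traceback__),
    }
    return json.dumps(report)
```

The user confirms a single dialog while the payload above is assembled automatically, which is why the mechanism sidesteps problems 2 and 3.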

Vendors' data closed-loop models

Tesla: The figure below is the data closed-loop rolling model framework Tesla has made public: after the data source is obtained, model errors are confirmed through unit tests; a large amount of effective data is obtained through data augmentation and then annotated; after several rounds of data cleaning, model training is completed and the model is finally deployed.

Waymo: The figure below is the data closed-loop platform described in a Waymo report: after the data source is obtained, ground truth is produced through manual and automatic labeling, which involves data screening, mining and active learning. Once model optimization has been iterated to completion, testing and version release follow, and data keeps flowing in.

(Figure: Waymo's data closed-loop platform)
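The data screening, mining and active learning mentioned in these pipelines often reduce to ranking frames by how unsure the current model is and sending only the top of the list for labeling. A minimal uncertainty-sampling sketch (the `score_fn` interface is an assumption for illustration, not Waymo's actual API):

```python
def mine_uncertain_frames(frames, score_fn, budget):
    """Select the frames the current model is least confident about, so that
    annotation effort goes where the model fails.

    `score_fn` returns the model's confidence in [0, 1] for a frame;
    the name and interface are illustrative.
    """
    ranked = sorted(frames, key=score_fn)  # least confident first
    return ranked[:budget]                 # send these for labeling
```

Under a fixed labeling budget this concentrates annotation on the model's failure modes, which is the economic core of active learning in a data closed loop.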

Nvidia: The figure below is MAGLEV, the machine learning platform NVIDIA built for autonomous driving development, likewise based on closed-loop model iteration: after the data source is obtained from real vehicles, it completes intelligent data screening, data annotation, model search, training, evaluation, debugging and deployment.

(Figure: NVIDIA's MAGLEV platform)

Momenta: The figure below is the "flywheel model" Momenta proposed in 2019, in essence a rolling data closed loop with three key factors:

(Figure: Momenta's flywheel model)

Data-driven: make the whole pipeline data-driven; compared with manual tuning, the performance of each module can improve by orders of magnitude;

Massive data: the goal is to collect billions of real highway and road scenes and to unify the L4 and L2 data streams, forming a dual-track improvement of technology and data;

Closed-loop automation: highly automated data mining and labeling capabilities, achieving closed-loop automation of the whole screening, labeling and iteration process.

Data closed-loop trends

Comparing the competitors, it is clear that the data closed loop itself is a fixed analysis pattern, but each step keeps evolving as tasks change and technology advances. Within the loop, manually reproducing and debugging every algorithm problem carries a huge cost: in software development, if 100 engineers can solve 100 problems in one day, then 10,000 problems require either 10,000 engineers for a day or 100 engineers for 100 days, and neither the labor cost nor the development cycle is acceptable; in autonomous driving, algorithm engineers face far more than 10,000 problems. A two-wheel drive of data and algorithms is therefore what makes mature autonomous driving technology reachable, and it is the inevitable choice for bringing the technology to the road.

So to build a sound, complete and low-cost ideal software iteration model, preparation is needed on both the data side and the algorithm side:

On the data side, samples are collected with both volume and quality in mind. Ideal data collection covers real-vehicle data and simulation data, where the amount of real-vehicle data = number of vehicles × collection time. Quality shows up along two dimensions: the richness of the sensors and the scarcity of the samples.
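The linear relation "real-vehicle data = number of vehicles × collection time" is what makes fleet size so decisive; a quick back-of-the-envelope comparison (all numbers illustrative):

```python
def fleet_hours(num_vehicles: int, hours_per_day: float, days: int) -> float:
    """Raw collection volume scales linearly with fleet size and time,
    which is why mass-production fleets dwarf dedicated test fleets."""
    return num_vehicles * hours_per_day * days

# A 100-car test fleet running 8 h/day for a year vs. a 100,000-car
# consumer fleet averaging 1 h/day (numbers purely illustrative):
test_fleet = fleet_hours(100, 8, 365)           # 292,000 hours
consumer_fleet = fleet_hours(100_000, 1, 365)   # 36,500,000 hours
```

Even with modest per-car usage, the consumer fleet collects two orders of magnitude more driving time, which previews the mass-production data recovery discussion later in this article.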

On the algorithm side, improving automation while preserving performance will be the main trend of future algorithm schemes. Applied to the data closed loop, the algorithms mainly fall into three areas: massive data mining, high-precision pseudo-ground-truth generation, and mass-production functional algorithms (end-to-end wherever possible).

Based on this, the ideal data-driven closed-loop model evolves into:

(Figure: the ideal data-driven closed-loop model)

The core of the whole closed-loop system is the rolling of data and of mass-production algorithm schemes, so the coming installments will share content related to data collection.

Public datasets

In the early stages of algorithm research, everyone uses public data for preliminary validation. Public datasets serve as a quality benchmark and can be used to analyze the requirements on acquisition equipment. Common datasets that claim centimeter-level positioning accuracy include KITTI, nuScenes, ApolloScape, Waymo and Lyft Level 5.


For perception tasks, the main information is the target bounding boxes (2D and 3D) or pixel-wise semantic labels on the original images. KITTI is the most widely used dataset for current perception tasks, adopted by almost every CV company. The main sensors of the KITTI collection vehicle include:

1 Inertial Navigation System (GPS/IMU): OXTS RT3003

1 lidar: Velodyne HDL-64E

2 grayscale cameras, 1.4 megapixels: Point Grey Flea2 (FL2-14S3M-C)

2 color cameras, 1.4 megapixels: Point Grey Flea2 (FL2-14S3C-C)

4 varifocal lenses, 4-8 mm: Edmund Optics NT59-917

The nuScenes and Waymo datasets are similar to KITTI in structure and information, providing 3D bounding-box ground truth based on fused perception. The ApolloScape dataset comes in two parts, one focused on target perception and the other labeling target trajectories, which can be used for prediction tasks.
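For concreteness, a KITTI object label is one whitespace-separated line per target, with the field layout documented in the KITTI object development kit; a small parser:

```python
def parse_kitti_label(line: str) -> dict:
    """Parse one object line of a KITTI training label file
    (field layout per the KITTI object development kit)."""
    f = line.split()
    return {
        "type": f[0],                               # e.g. 'Car', 'Pedestrian'
        "truncated": float(f[1]),                   # 0 (fully visible) .. 1
        "occluded": int(f[2]),                      # occlusion level 0-3
        "alpha": float(f[3]),                       # observation angle (rad)
        "bbox_2d": [float(x) for x in f[4:8]],      # left, top, right, bottom (px)
        "dimensions": [float(x) for x in f[8:11]],  # height, width, length (m)
        "location": [float(x) for x in f[11:14]],   # x, y, z in camera coords (m)
        "rotation_y": float(f[14]),                 # yaw around camera Y (rad)
    }
```

The 2D box lives in image pixels while the 3D box lives in camera coordinates, which is exactly the cross-modal structure (image plus lidar) that the KITTI sensor suite above was designed to produce.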

The Lyft Level 5 dataset, published in 2020, differs considerably from the datasets above: it provides no bounding-box information, but instead a detailed map of every point on the road, containing both detailed static road information and the targets present at collection time (given as top-view envelopes). So this dataset may not help with common moving-target perception tasks, but it effectively provides static target information and can remove the interference of dynamic targets, which the other datasets cannot. It is currently the best base data for BEV-related algorithms.


The Lyft Level 5 collection vehicle carries 3 lidars, 5 millimeter-wave radars and 7 cameras, of which the roof holds a 64-beam lidar (at 10 Hz), 4 radars and the 7 cameras, while the front bumper carries 2 40-beam lidars and 1 radar (pictured below).

Lyft does not give positioning-sensor information for the Level 5 collection vehicle; with the help of sensor information from AutoNavi's map collection vehicles, one can understand the sensor accuracy required for making high-precision maps:

(Figure: sensor accuracy requirements for high-precision map collection)

The accuracy of the lidar and the inertial navigation is generally determined by the hardware. To achieve good cross-modal alignment between lidar and camera, the camera's exposure is triggered when the top lidar sweeps through the center of the camera's field of view. The image timestamp is the exposure trigger time; the lidar timestamp is the time at which the current sweep completes a full rotation. Since the camera's exposure is almost instantaneous, this method usually achieves good alignment. Naturally the camera should be a global-shutter one, so that the exposure moment is fixed; the cameras currently used to collect high-precision maps and public datasets are all global-shutter cameras.
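When hardware triggering is unavailable and alignment must be recovered offline, the usual fallback is nearest-timestamp matching between the two streams; a minimal sketch (the interface is my illustration, not any dataset's API):

```python
import bisect

def match_nearest_timestamp(lidar_ts: float, image_ts: list) -> int:
    """Return the index of the camera exposure closest in time to a lidar
    sweep's timestamp. `image_ts` must be sorted ascending. A minimal
    sketch of cross-modal alignment; real pipelines prefer the hardware
    triggering described above, with this search only as a fallback."""
    i = bisect.bisect_left(image_ts, lidar_ts)
    # The closest exposure is either just before or just after the sweep.
    candidates = [j for j in (i - 1, i) if 0 <= j < len(image_ts)]
    return min(candidates, key=lambda j: abs(image_ts[j] - lidar_ts))
```

With sorted timestamps the binary search keeps matching cheap even over hours of logs, at the cost of a residual misalignment up to half the camera frame period.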

In terms of positioning accuracy, the most ideal purely vehicle-side solution today is integrated navigation: GPS, IMU and DMI combined. Qianxun, which AutoNavi's publicity mentions, is an independent positioning-algorithm company, and the combination of self-developed and outsourced algorithms presumably yields more stable positioning accuracy.

GPS, or GNSS, refers to the global (satellite) positioning system. By receiving satellite signals, the user equipment obtains distance observations between itself and the satellites, and specific algorithms turn these into the equipment's three-dimensional coordinates, heading and other information. Depending on the types of observations and algorithms used, positioning accuracy ranges from centimeter level to 10-meter level. Its advantages are high accuracy and error that does not diverge over time; its disadvantages are that it needs open sky and cannot cover indoor areas.

IMU (inertial measurement unit): comprises gyroscopes and accelerometers. The gyroscope measures the object's three-axis angular rate, used to compute the carrier's attitude; the accelerometer measures three-axis linear acceleration, used to compute the carrier's velocity and position. The IMU's advantage is that it needs no sky view and works in any scene; its disadvantage is limited accuracy, with error that diverges over time. GPS and IMU are thus two complementary positioning technologies.

DMI (measurable image): an emerging ground-level stereoscopic image product that includes absolute exterior-orientation information along the spatio-temporal sequence. It supports direct browsing of the real environment, relative measurement of a target's height, width, area and so on, as well as absolute position measurement and target attribute mining. The vehicle navigation system collects real-time images and matches them against previously measured images; the spatial position information of the matched measurable images is transferred to the real-time images, and the current position of the moving carrier is derived through spatial coordinate transformation. In short, it can be understood as bringing visual information in to assist positioning.
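The complementarity of GPS and IMU is usually exploited by filtering: dead-reckon with the IMU between fixes and pull the estimate back toward each GPS fix. A deliberately simplified 1-D complementary-filter sketch (a production system would use a Kalman filter with proper noise models):

```python
def fuse_position(imu_positions, gps_fixes, alpha=0.98):
    """1-D complementary-filter sketch of GPS/IMU fusion.

    Dead-reckon with IMU increments (smooth, but drifts over time) and pull
    the estimate back toward each GPS fix (noisy, but drift-free).
    `alpha` weights the IMU; all numbers are illustrative.
    """
    est = gps_fixes[0]          # initialize from the first GPS fix
    fused = [est]
    for k in range(1, len(imu_positions)):
        imu_delta = imu_positions[k] - imu_positions[k - 1]  # IMU increment
        est = alpha * (est + imu_delta) + (1 - alpha) * gps_fixes[k]
        fused.append(est)
    return fused
```

The high-pass IMU term supplies short-term smoothness while the low-pass GPS term bounds long-term drift, which is exactly the complementarity the paragraph above describes.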

Data collection of domestic manufacturers

At present, companies disclose little about how their development sample sets are built: on one hand this information touches on development secrets, and on the other, development data and test data generally share a source, with road-test data being an important subset of development data. We can therefore estimate each company's data collection from the Robotaxi road-test information disclosed domestically. With effective processing, this road-test data can be converted into development samples, and the high-precision L4 sensors also bring significant benefits to L2 mass-production solutions:

(Table: Robotaxi road-test information of domestic companies)

From the information that could be gathered, Baidu holds a clear lead in data collection: its road-test cities, kilometers and collection vehicles far exceed everyone else's. Pony.ai, WeRide and AutoX (which in February 2022 claimed a fleet of more than 1,000 Robotaxis) form a second echelon at roughly 8 million km; Didi, DeepRoute, Momenta and Zhixing are probably close together at 2-5 million km of testing. The information disclosed by other companies is too incomplete to evaluate. It should be stressed again that this estimate rests on Robotaxi road tests alone; companies, especially those active on several tracks, certainly have additional data sources. Huawei, for example, released the ONCE autonomous driving dataset in 2020, containing 1 million 3D point-cloud scenes, each photographed by 7 cameras covering a 360-degree view, for 7 million images in total.

Mass production data recovery

The discussion above concerned building data collection capability during development. The other collection capability, the key one that closes the data loop, is mass-production data recovery. For now the discussion treats the production car only as a data collection port and application output, and does not extend to the car itself handling heavy training tasks (which can be explored later).

OTA (over-the-air technology): the remote management of mobile terminal equipment and SIM-card data over the air interface of mobile communications. Its original purpose was to let users obtain value-added services through downloads and updates, but once the network channel is open, uploading usage information also lets software providers collect a large amount of valid information and thus build more competitive functions. The first use of OTA in autonomous driving can be traced back to the period of cooperation between Mobileye and Tesla (before 2014). After the two parted ways, each has been using OTA vigorously to strengthen its products: Mobileye pioneered the Road Experience Management (REM) road-network collection system, which uploads data from all vehicles equipped with Mobileye devices, builds a perception map in the cloud, and then pushes the map back to each car to improve its perception.

Tesla is more aggressive here. On one hand it is progressively turning the vehicle into software, adding driving functions unlocked through staged payments; on the other it uses data backhaul to collect tens of billions of kilometers of real driving data. Set against the domestic L4 road-test figures above, the power and efficiency of this collection method is frightening, and domestic autonomous driving OTA work is accordingly imitating and improving on Tesla's approach.

Tesla's biggest technical highlight in data recovery is shadow mode, which neatly resolves the tension between the quality and the quantity of collected data. Because today's autonomous driving functions are immature, drivers do not engage them very often, and while they are off, data could only be recovered by sampling huge volumes of raw sensor data. A car logging camera, lidar and radar data can generate 80 TB per day, a transmission volume impossible to move before 5G matures.

Another problem is that, for their own safety, drivers mostly engage autonomous driving on roads they know well. If data recovery ran only at those times, it would capture a fixed set of scenes, most without anything unexpected, and the value of the data would fall. The core of Tesla's shadow mode is that while a human drives, the system, sensors included, keeps running without participating in vehicle control; only the decision algorithm is being validated. The algorithm makes continuous simulated decisions in the shadow and compares them with the driver's behavior; whenever the two disagree, the scene is judged an "extreme working condition" and data backhaul is triggered. This treats the driver's operation as the driving ground truth, filters out the mass of scenes the algorithm already covers, and sends back a high-quality sample of problems.

Of course, as autonomous driving functions gradually roll out, collecting useful information while the functions are engaged also matters. Tesla has designed a total of 221 triggers across shadow mode and algorithm self-evaluation (as of 2021); the trigger information it has shown publicly is as follows:

(Figure: Tesla's publicly disclosed trigger points)

Collating the data, the triggers fall into three broad directions:

Algorithm self-evaluation: most commonly, a perception discrepancy between multiple sensors or between consecutive frames implies that at least one perception result is wrong;

Human-machine decision differences: when the driver's behavior does not match the current perception result, there is a fair chance the perception result is off;

Low-frequency scene collection: guided by development experience, certain specified scenes trigger collection directly.
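The first two trigger directions can be boiled down to a simple per-frame predicate; the sketch below is my illustration, not Tesla's actual trigger logic (all thresholds and signatures assumed):

```python
def shadow_mode_trigger(planned, driver, perception_a, perception_b,
                        disagreement_threshold=0.5):
    """Decide whether to upload a clip, mirroring the first two trigger
    directions above (thresholds and signatures are illustrative):

    1. algorithm self-evaluation: two sensors (or consecutive frames)
       disagree about the same quantity, so at least one must be wrong;
    2. human-machine difference: the driver acted unlike the shadow planner,
       treating the driver's operation as the driving ground truth.
    Scene-specific triggers (direction 3) would be added per development
    experience.
    """
    sensor_disagreement = abs(perception_a - perception_b) > disagreement_threshold
    human_machine_mismatch = planned != driver
    return sensor_disagreement or human_machine_mismatch
```

Only frames passing such a predicate are backhauled, which is how the fleet filters the mass of already-covered scenes down to a high-value problem sample.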

Simulation data generation

In autonomous driving there is one more important data source: data generated by simulation tools. For a simulation platform to genuinely serve autonomous driving, it needs several core capabilities: faithfully reconstructing test scenes, efficiently turning road data into simulation scenarios, large-scale parallel acceleration in the cloud, and so on, so that simulation tests close the loop over the full perception, planning and control stack. Technology companies, car makers, autonomous driving solution providers, simulation software vendors, universities and research institutes are all actively building virtual simulation platforms.

The most widely used simulation platforms today include PreScan, CarMaker, CarSim, VIRES VTD, PTV Vissim, TESS NG, CARLA and others. Among in-house efforts, Waymo has invested the most: unlike Tesla it has no enormous real-data recovery mechanism, so most of its algorithm testing rests on simulation. As of December 2021, Waymo's self-developed simulator Carcraft had simulated 17 billion kilometers of road scenarios and supported large-scale testing of Waymo's models.

So far, however, the contribution of open simulation software to perception lies mainly in network pre-training, so I will not expand on it here; a separate survey can follow as the research matures.

The main job after data collection is ground-truth labeling, and the next chapter will cover the pre-labeling of autonomous driving data.

Resources

John Houston et al.

Zhihu

"Nine Chapters of Wisdom Driving" Su Qingtao

Nirvana Cars

Andrej Karpathy (Tesla): CVPR 2021 Workshop on Autonomous Vehicles

Frontiers of intelligent transportation technology

Compiled from "Automotive Electronics and Software". The views in the text are shared for exchange only and do not represent the position of this account; for copyright or other issues, please let us know and we will handle them promptly.

-- END --
