Understanding the Architecture and Key Technologies of the Autonomous Vehicle Perception System


Basic introduction to the perception system

The perception system takes data from multiple sensors and information from high-precision maps as input and, after a series of computations, accurately perceives the environment around the autonomous vehicle.

It provides downstream modules with rich information, including the location, shape, category, and speed of obstacles, as well as semantic understanding of special scenarios (e.g., construction areas, traffic lights, and traffic signs).

Composition of the perception system and its subsystems

Sensors: this involves sensor installation, field of view, detection range, data throughput, calibration accuracy, time synchronization, and so on. Because autonomous vehicles use many sensors, the time-synchronization solution is critical.

Object detection and classification: to ensure the safety of autonomous driving, the perception system needs to achieve close to 100% recall together with very high precision. Object detection and classification relies heavily on deep learning, including object detection on 3D point clouds and 2D images, and deep multi-sensor fusion.

Multi-object tracking: combines multi-frame information to compute and predict the trajectories of obstacles (a minimal tracking sketch follows this list).

Scene understanding: includes traffic lights, street signs, construction areas, and special vehicle categories such as school buses and police cars.

Machine learning: distributed training infrastructure and the associated evaluation systems.

Data: large volumes of labeled data, including 3D point clouds and 2D images.
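
To make the multi-object tracking step above concrete, here is a minimal sketch of frame-to-frame association by nearest centroid with a finite-difference velocity estimate. It is an illustrative toy, not the tracker described in this article; real systems use motion models, appearance features, and proper assignment algorithms.

```python
import numpy as np

def associate(tracks, detections, max_dist=2.0):
    """Greedily match each existing track to the nearest detection in the current frame.

    tracks:     dict of track_id -> last known (x, y) position
    detections: list of (x, y) obstacle centroids from the current frame
    Returns {track_id: detection_index}; unmatched tracks and detections are simply dropped here.
    """
    matches, used = {}, set()
    for tid, pos in tracks.items():
        dists = [np.hypot(d[0] - pos[0], d[1] - pos[1]) for d in detections]
        if not dists:
            continue
        j = int(np.argmin(dists))
        if dists[j] < max_dist and j not in used:
            matches[tid] = j
            used.add(j)
    return matches

def estimate_velocity(prev_pos, curr_pos, dt=0.1):
    """Finite-difference velocity between two frames (dt = 100 ms per lidar frame)."""
    return ((curr_pos[0] - prev_pos[0]) / dt, (curr_pos[1] - prev_pos[1]) / dt)

# Example: one tracked vehicle moving forward between two consecutive frames.
tracks = {1: (10.0, 0.0)}
detections = [(10.9, 0.1), (30.0, 5.0)]
for tid, j in associate(tracks, detections).items():
    print(tid, estimate_velocity(tracks[tid], detections[j]))  # 1 (9.0, 1.0)
```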

Sensor details

At present, the sensors used in autonomous driving fall into three main categories: lidar, camera, and millimeter-wave radar.

[Figure: object detection output of the perception system]

As input, the perception system receives data from multiple sensors together with the high-precision map. The figure above shows the object detection output of the perception system: it detects obstacles around the vehicle, such as vehicles, pedestrians, and bicycles, and, combined with the high-precision map, it also outputs information about the surrounding environment (background).

As shown in the figure, the green boxes represent passenger cars, orange represents a motorcycle, yellow represents a pedestrian, and gray represents detected environmental information, such as vegetation.

[Figure: speed, heading, and trajectory prediction from multi-frame information]

Combining multi-frame information (above), the perception system accurately outputs the speed, heading, predicted trajectory, and so on of moving pedestrians and vehicles.

Sensor configuration and deep multi-sensor fusion

Having covered the perception system from input to output, I will now briefly introduce the sensor installation scheme of PonyAlpha, the third-generation autonomous driving system of Xiaoma Zhixing (Pony.ai), and its deep multi-sensor fusion solution.

Sensor installation scheme

At present, the PonyAlpha sensor installation scheme covers 360 degrees around the vehicle with a sensing range of up to 200 meters.

[Figure: PonyAlpha sensor installation scheme]

Specifically, this solution uses three lidars, mounted on the top and sides of the vehicle, while multiple wide-angle cameras provide 360-degree coverage. For the far field, a forward-facing millimeter-wave radar and telephoto cameras extend the perception range to 200 meters, allowing objects at greater distances to be detected. This sensor configuration lets our autonomous vehicles drive autonomously in residential, commercial, industrial, and other areas.
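
For illustration only, a suite like the one described above could be written down as a simple configuration structure. The counts, mounting positions, fields of view, and ranges below are assumptions inferred from the text, not Pony.ai's published specification.

```python
# Hypothetical description of a PonyAlpha-like sensor suite; all values are illustrative assumptions.
SENSOR_SUITE = {
    "lidar": [
        {"mount": "roof",       "fov_deg": 360, "range_m": 100},
        {"mount": "left_side",  "fov_deg": 360, "range_m": 100},
        {"mount": "right_side", "fov_deg": 360, "range_m": 100},
    ],
    "camera": [
        {"mount": "front_tele", "fov_deg": 30,  "range_m": 200},  # telephoto extends the forward range
        {"mount": "front_wide", "fov_deg": 120, "range_m": 80},
        # ...additional wide-angle cameras completing the 360-degree ring
    ],
    "radar": [
        {"mount": "front", "fov_deg": 20, "range_m": 200},  # forward millimeter-wave radar
    ],
}

def max_range_m(suite):
    """Longest detection range across all sensors in the suite."""
    return max(s["range_m"] for sensors in suite.values() for s in sensors)

print(max_range_m(SENSOR_SUITE))  # 200
```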

Deep multi-sensor fusion solution

The basis of deep multi-sensor fusion

The first step of the deep multi-sensor fusion solution is to calibrate the data from the different sensors into the same coordinate system, including the intrinsic calibration of the cameras, the extrinsic calibration of the lidar to the camera, the extrinsic calibration of the millimeter-wave radar to the GPS, and so on.

An important prerequisite for sensor fusion is extremely high calibration accuracy, which is a necessary foundation both for result-level sensor fusion and for raw-data-level sensor fusion.

[Figure: 3D lidar point cloud projected onto the camera image]

As the figure above shows, our perception system projects the 3D lidar point cloud onto the image accurately, which demonstrates that the sensor calibration accuracy is sufficiently high.
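
Projecting lidar points onto the image, as in the result described above, amounts to applying the lidar-to-camera extrinsics followed by the camera intrinsics. A minimal pinhole-camera sketch, with illustrative (not calibrated) matrices and no distortion model:

```python
import numpy as np

def project_lidar_to_image(points_lidar, T_cam_lidar, K):
    """Project Nx3 lidar points into pixel coordinates.

    T_cam_lidar: 4x4 extrinsic transform from the lidar frame to the camera frame
    K:           3x3 camera intrinsic matrix
    Only points in front of the camera (z > 0 in the camera frame) are returned.
    """
    pts_h = np.hstack([points_lidar, np.ones((len(points_lidar), 1))])  # homogeneous coordinates
    pts_cam = (T_cam_lidar @ pts_h.T).T[:, :3]                          # lidar frame -> camera frame
    pts_cam = pts_cam[pts_cam[:, 2] > 0]
    uvw = (K @ pts_cam.T).T                                             # apply intrinsics
    return uvw[:, :2] / uvw[:, 2:3]                                     # perspective division -> pixels

# Illustrative intrinsics and an identity extrinsic (assumed values, not real calibration results).
K = np.array([[1000.0, 0.0, 640.0],
              [0.0, 1000.0, 360.0],
              [0.0,    0.0,   1.0]])
points = np.array([[0.5, 0.2, 10.0], [-1.0, 0.0, 20.0]])
print(project_lidar_to_image(points, np.eye(4), K))
```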

Calibration schemes for different sensors

The entire sensor calibration work has been largely automated.

[Figure: camera intrinsic calibration]

The first is camera intrinsic calibration (above), which corrects the image distortion caused by the camera's own characteristics. Our intrinsic calibration platform allows each camera to be calibrated in two to three minutes.
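
The article does not describe the platform's internals, but the standard approach to intrinsic calibration is checkerboard-based. A generic sketch using OpenCV, assuming a folder ./calib/ of checkerboard images, looks roughly like this:

```python
import glob
import cv2
import numpy as np

pattern = (9, 6)                                        # inner corners of the checkerboard
objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2)

obj_points, img_points = [], []
for path in glob.glob("./calib/*.jpg"):                 # assumed directory of calibration shots
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, pattern)
    if found:
        obj_points.append(objp)
        img_points.append(corners)

# Estimate the intrinsic matrix K and distortion coefficients, then undistort an image with them.
ret, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)
undistorted = cv2.undistort(cv2.imread(path), K, dist)
```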

[Figure: extrinsic calibration between lidar and GPS/IMU]

The second is the extrinsic calibration between lidar and GPS/IMU (above). Raw lidar data are expressed in the lidar coordinate system, so points must be converted from the lidar frame to the world frame, which requires the relative pose between the lidar and the GPS/IMU. Our calibration tools use optimization to quickly find the optimal relative pose outdoors.
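
Once the lidar-to-IMU extrinsics are known, moving a lidar point into the world frame is a composition of two rigid transforms: the calibrated lidar-to-IMU transform and the GPS/IMU pose at the scan timestamp. A minimal sketch with illustrative matrices:

```python
import numpy as np

def lidar_point_to_world(point_lidar, T_imu_lidar, T_world_imu):
    """Transform one lidar point into the world frame.

    T_imu_lidar: 4x4 extrinsics from calibration (lidar frame -> IMU frame)
    T_world_imu: 4x4 IMU pose from GPS/IMU at the scan timestamp (IMU frame -> world frame)
    """
    p = np.append(point_lidar, 1.0)                      # homogeneous coordinates
    return (T_world_imu @ T_imu_lidar @ p)[:3]

# Illustrative values: lidar mounted 1.5 m above the IMU, vehicle sitting at the world origin.
T_imu_lidar = np.eye(4); T_imu_lidar[2, 3] = 1.5
T_world_imu = np.eye(4)
print(lidar_point_to_world(np.array([10.0, 0.0, 0.0]), T_imu_lidar, T_world_imu))  # [10. 0. 1.5]
```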

[Figure: camera-to-lidar calibration and time synchronization]

The third is camera-to-lidar calibration (above). The lidar perceives the environment by rotating 360 degrees, with each rotation taking 100 milliseconds, while a camera exposes at a single instant. To keep the camera exposure consistent with the lidar rotation, the two must be time-synchronized, i.e., the lidar triggers the camera exposure. For example, the lidar's rotational position can be used to trigger the exposure of the camera facing the corresponding direction, achieving accurate synchronization between camera and lidar.
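
The triggering idea can be reduced to a timing calculation: given the lidar's rotation period and the azimuth a camera faces, estimate when the beam sweeps past that azimuth and fire the exposure then. A simplified, illustrative sketch:

```python
def camera_trigger_time(scan_start_time_s, lidar_azimuth_at_start_deg,
                        camera_azimuth_deg, rotation_period_s=0.1):
    """Time at which the rotating lidar beam points along the camera's azimuth.

    The lidar completes one revolution every rotation_period_s (100 ms here), so triggering
    the camera when the beam sweeps through its field of view keeps the image exposure
    aligned with the lidar points in that direction.
    """
    delta_deg = (camera_azimuth_deg - lidar_azimuth_at_start_deg) % 360.0
    return scan_start_time_s + (delta_deg / 360.0) * rotation_period_s

# Example: the scan starts at t = 0 pointing at 0 degrees; a camera faces 90 degrees.
print(camera_trigger_time(0.0, 0.0, 90.0))  # 0.025 s into the scan
```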

3D (lidar) and 2D (camera) data complement each other; fusing the two well allows the perception output to be more accurate.

[Figure: calibration between millimeter-wave radar and GPS/IMU]

Finally, there is the calibration between the millimeter-wave radar and the GPS/IMU (above), which likewise converts radar data from the local coordinate system to the world coordinate system; the relative pose between the radar and the GPS/IMU is computed against the real 3D environment. Good calibration results ensure that the perception system can give lane information (such as whether a vehicle is within its lane or straddling a lane line) for obstacle vehicles up to 200 meters away.
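
A rough sketch of what "lane information within 200 meters" involves: transform the radar detection into the world frame, then compare its lateral offset with the lane geometry from the HD map. The straight-lane geometry and all numbers below are hypothetical simplifications, not the actual map-matching logic.

```python
import numpy as np

def radar_to_world(det_xy, T_world_radar):
    """Transform a radar detection from the radar's local frame into the world frame."""
    p = np.array([det_xy[0], det_xy[1], 0.0, 1.0])
    return (T_world_radar @ p)[:2]

def lane_status(point_xy, lane_center_y=0.0, lane_width_m=3.5, half_vehicle_width_m=0.9):
    """Classify a detection as inside the lane or on the lane line (straight lane along x assumed)."""
    offset = abs(point_xy[1] - lane_center_y)
    return "in_lane" if offset + half_vehicle_width_m < lane_width_m / 2 else "on_lane_line"

T_world_radar = np.eye(4)                                 # identity pose, for illustration only
detection_world = radar_to_world((150.0, 1.2), T_world_radar)
print(lane_status(detection_world))                       # "on_lane_line"
```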

The demo video below concisely and vividly shows part of the processing effect of deep multi-sensor fusion.

In-vehicle perception system architecture

So what does the in-vehicle perception system architecture look like? What is its solution?

[Figure: in-vehicle perception system architecture]

The diagram above shows the architecture of the in-vehicle perception system. First, the data from the lidar, camera, and millimeter-wave radar must be time-synchronized, with all timing errors controlled at the millisecond level. Using the synchronized sensor data, the perception system performs frame-based computation such as detection, segmentation, and classification, and finally combines multi-frame information for multi-object tracking and outputs the results. This process involves technical details of deep multi-sensor fusion and deep learning, which I will not go into here.
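
At a high level, the per-frame flow just described can be sketched as: synchronize the sensor data, run frame-based detection/segmentation/classification, then feed the results into the multi-frame tracker. The skeleton below is only a structural illustration with placeholder bodies, not the actual system:

```python
from dataclasses import dataclass, field

@dataclass
class Frame:
    timestamp: float
    lidar_points: list
    camera_images: list
    radar_detections: list

@dataclass
class PerceptionPipeline:
    max_sync_error_ms: float = 5.0                 # time-sync tolerance at the millisecond level
    tracks: dict = field(default_factory=dict)

    def synchronize(self, lidar, cameras, radar) -> Frame:
        """Group sensor messages whose timestamps fall within the sync tolerance (placeholder)."""
        return Frame(lidar["t"], lidar["points"], cameras, radar)

    def detect(self, frame: Frame) -> list:
        """Frame-based detection, segmentation, and classification (placeholder for the DL models)."""
        return []

    def track(self, detections: list) -> list:
        """Multi-frame multi-object tracking over successive detections (placeholder)."""
        return list(self.tracks.values())

    def process(self, lidar, cameras, radar) -> list:
        frame = self.synchronize(lidar, cameras, radar)
        return self.track(self.detect(frame))
```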

The perception system's solution should guarantee the following five points:

◆ The first is safety: detection recall must be close to 100% (see the small recall/precision computation after this list).

◆ Precision requirements are also very high: if precision falls below a certain threshold, the resulting false positives (false alarms) make the ride very uncomfortable in autonomous mode.

◆ Output as much information as possible that is helpful for driving, including scene-understanding information such as road signs and traffic lights.

◆ Ensure the perception system runs efficiently and can process the large volume of sensor data in near real time.

◆ Scalability is also important. Deep learning relies on large amounts of data, and the generalization ability of the trained models is critical for the perception system. In the future, we hope our models and new algorithms can adapt to the road conditions of more cities and countries.
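
As a reminder of what the recall and precision targets in the first two points mean numerically, here is the generic computation from matched detections; it is the textbook definition, not the project's evaluation code.

```python
def recall_precision(true_positives, false_negatives, false_positives):
    """Recall = TP / (TP + FN); precision = TP / (TP + FP)."""
    recall = true_positives / (true_positives + false_negatives)
    precision = true_positives / (true_positives + false_positives)
    return recall, precision

# Example: 998 of 1000 real obstacles detected, with 5 false alarms.
print(recall_precision(998, 2, 5))  # (0.998, ~0.995)
```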

Challenges for perception technology

The challenge of balancing perception precision and recall

[Figure: a busy intersection during the evening rush hour]

The image above shows a busy intersection during the evening rush hour, with large numbers of pedestrians and motorcycles crossing.

[Figure: the corresponding 3D point cloud data]

The 3D point cloud data (above) show the corresponding raw perception data at that moment.

The challenge is that, after computation, the perception system must output correct segmentation results and obstacle categories for every obstacle in such an environment.

In addition to busy intersections, the perception system also faces a lot of challenges in dealing with some special or bad weather conditions.

Sudden rainstorms or prolonged rain often leave standing water on the road, and passing vehicles naturally throw up splashes. The white point cloud in the video above shows the lidar detecting and filtering the water splashed by other passing vehicles. If the perception system cannot accurately identify and filter such splashes, they cause trouble for autonomous driving. By combining lidar and camera data, our perception system achieves a high recognition rate for splashes.
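
One naive way to picture splash filtering (an illustrative heuristic only; the article says the real system combines lidar and camera data): water spray tends to appear as sparse, low-intensity lidar returns, so points can be flagged by return intensity and local point density.

```python
import numpy as np

def filter_splash(points, intensities, intensity_thresh=0.1,
                  neighbor_radius=0.5, min_neighbors=5):
    """Return a boolean mask that keeps solid-obstacle points and drops likely water spray.

    points:      Nx3 array of lidar points
    intensities: length-N array of return intensities normalized to [0, 1]
    A point is treated as spray if its return is weak AND it has few neighbors nearby.
    """
    keep = np.ones(len(points), dtype=bool)
    for i, p in enumerate(points):
        if intensities[i] >= intensity_thresh:
            continue                                    # strong returns are kept outright
        dists = np.linalg.norm(points - p, axis=1)
        neighbors = np.count_nonzero(dists < neighbor_radius) - 1
        if neighbors < min_neighbors:
            keep[i] = False                             # sparse, weak return -> likely splash
    return keep
```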

Long-tail scenario challenges

Sprinkler truck

The figure below shows two types of sprinkler trucks we have encountered during road testing: the one on the left uses a fog cannon spraying upward, while the one on the right sprays water to both sides.

[Figure: two types of sprinkler trucks encountered during road testing]

When a human driver encounters a sprinkler truck, it is easy to judge the situation and overtake it, but the perception system needs time to learn to process and recognize such scenes and vehicles. Our autonomous driving system now delivers a good riding experience when it encounters similar scenes.

Detection of small objects

[Figure: small object detection]

The significance of small-object detection is that, when unexpected events occur during road testing, such as stray kittens or puppies suddenly appearing on the road, the perception system can accurately recall these small objects and keep them safe.

Traffic lights

[Figure: traffic light detection]

As autonomous driving road tests expand to more regions and countries, the perception system keeps encountering new long-tail scenarios when handling traffic lights.

[Figure: a backlit traffic light]

For example, backlighting (above), or over- or under-exposure of the camera after suddenly emerging from under a bridge, can be addressed by dynamically adjusting the camera's exposure.
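
Dynamic exposure adjustment can be pictured as a simple feedback loop on the brightness of the traffic-light region of interest; the controller below is a generic sketch with made-up gains and limits, not the camera interface actually used.

```python
def adjust_exposure(current_exposure_ms, roi_mean_brightness,
                    target_brightness=110.0, gain=0.002,
                    min_exposure_ms=0.05, max_exposure_ms=30.0):
    """Nudge the camera exposure so the traffic-light ROI approaches a target brightness.

    roi_mean_brightness: mean pixel value (0-255) of the ROI in the last frame. A backlit,
    washed-out ROI shortens the exposure; a dark scene (e.g. just after leaving an underpass)
    lengthens it. The result is clamped to the camera's supported exposure range.
    """
    error = target_brightness - roi_mean_brightness
    new_exposure = current_exposure_ms * (1.0 + gain * error)
    return min(max(new_exposure, min_exposure_ms), max_exposure_ms)

# Backlit traffic light: the ROI is washed out (mean 220), so the exposure is reduced.
print(adjust_exposure(10.0, 220.0))  # 7.8 ms
```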

[Figure: a traffic light with a countdown timer]

There are also traffic lights with countdown timers (above). The perception system can recognize the countdown digits, allowing the autonomous vehicle to make better planning decisions before the light turns yellow and to optimize the ride experience.
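
To see why reading the countdown helps planning, here is a toy stop-or-go check based on the remaining green time, the distance to the stop line, and the current speed. The margins and numbers are illustrative, not the actual planner's logic.

```python
def can_clear_intersection(remaining_green_s, distance_to_stop_line_m,
                           intersection_length_m, speed_mps, safety_margin_s=1.0):
    """Return True if the vehicle can pass the far side of the intersection before the light changes.

    Compares the time needed at the current speed against the remaining green time minus
    a safety margin; if False, the planner can begin decelerating early and smoothly.
    """
    if speed_mps <= 0.0:
        return False
    time_needed_s = (distance_to_stop_line_m + intersection_length_m) / speed_mps
    return time_needed_s <= remaining_green_s - safety_margin_s

# Countdown shows 4 s of green left; 30 m to the stop line, 20 m intersection, 12 m/s speed.
print(can_clear_intersection(4.0, 30.0, 20.0, 12.0))  # False -> start slowing down early
```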

[Figure: a traffic light seen through a rain-beaded camera lens]

On rainy days, the camera lens is densely beaded with water droplets (above); the perception system must handle such special weather conditions and still accurately recognize the traffic lights.

[Figure: a traffic light with a progress bar]

Traffic lights in some regions have progress bars (above), which require the perception system to recognize changes in the progress bar; this helps the downstream planning and decision module slow down in advance when the green light is about to turn yellow.

Reproduced from Zhiche Technology. The views in this article are shared for exchange only and do not represent the position of this account. For copyright or other issues, please let us know and we will handle them promptly.

-- END --
