
Basket Sharing | How do autonomous vehicles use point clouds for localization?

Lidar sensors capture rich, dense, and precise three-dimensional point cloud data of surrounding objects, which helps autonomous vehicles localize themselves and track obstacles; lidar is also set to become the core sensor for fully autonomous driving. This article introduces the latest research on 3D lidar localization in autonomous driving and analyzes how well the various approaches perform.

Introduction

Localization in autonomous driving means finding the vehicle's position and orientation within a map. Here the map itself is built using only lidar: laser beams measure distances and produce point cloud data in which each point represents the XYZ coordinates of an object surface seen by the sensor. A high-precision point cloud map can be built offline from lidar scans, or built in a closed loop during navigation using odometry, which is the SLAM approach.

Let us first consider the advantages and disadvantages of lidar point clouds for localization compared with cameras and other sensors.


Lidar provides richer and more accurate spatial information, which gives the vehicle a clear advantage in localization.

As lidar prices continue to fall, the sensor is becoming easier for the public to use and study, and it is gradually being accepted by automakers.

However, using a 3D lidar for localization brings its own problems. The data volume is huge, so the output must be processed quickly to keep the system real-time, which makes real-time vehicle localization genuinely challenging. Downsampling or feature point extraction is therefore commonly used to simplify the point cloud efficiently, as sketched below.
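As an illustration of that simplification step, here is a minimal voxel-grid downsampling sketch in plain NumPy (a generic example, not the implementation of any paper cited here): it keeps one centroid per occupied voxel, shrinking the cloud while preserving coarse geometry.

```python
import numpy as np

def voxel_downsample(points: np.ndarray, voxel_size: float) -> np.ndarray:
    """Keep one representative point (the centroid) per occupied voxel.

    points: (N, 3) array of XYZ coordinates from a single lidar scan.
    """
    # Integer voxel index for every point.
    idx = np.floor(points / voxel_size).astype(np.int64)
    # Group points that fall into the same voxel.
    _, inverse, counts = np.unique(idx, axis=0,
                                   return_inverse=True, return_counts=True)
    # Accumulate per-voxel sums, then divide by counts to get centroids.
    sums = np.zeros((counts.size, 3))
    np.add.at(sums, inverse, points)
    return sums / counts[:, None]

# Example: thin a dummy scan with a 0.2 m voxel grid (voxel size is a tunable choice).
scan = np.random.rand(100_000, 3) * 50.0
print(voxel_downsample(scan, 0.2).shape)
```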

We know that odometry is an essential part of a vehicle's real-time localization system, and past research has proposed many ways to compute vehicle odometry from lidar point cloud data. These methods fall into three main categories:

(1) Registration methods over the full point cloud [1]: these are good for building high-precision maps offline but too slow for real-time processing, because every point in the lidar scan is considered during registration; they can be summarized as dense methods.

(2) Methods based on point cloud feature points: inspired by 2D image feature extraction and matching [2,3,4], these extract feature points from the 3D point cloud and compute the displacement between successive frames. Their accuracy and real-time performance are acceptable, but they are not robust enough under fast motion. Since a frame is represented only by its extracted feature points during registration, they can be summarized as sparse methods.

(3) Deep learning methods on point cloud data: deep learning for vehicle localization is an increasingly active research area. The works [5,6,7,8] first used 2D images to predict the odometry; the resulting localization was acceptable but did not surpass the existing state of the art.

Much recent work explores lidar point cloud data directly, with very good results. Next, we compare the various point cloud localization techniques and their test results.

3D lidar localization for autonomous vehicles

We first review the methods in the literature that achieve 3D vehicle localization using only a 3D lidar sensor. We divide them into three categories (point cloud registration, 3D feature point matching, and deep learning) and list them in the table below; each category is introduced in detail in the following sections.

(Table: overview of 3D lidar localization methods by category; original image not reproduced.)

1 3D point cloud registration methods

This section reviews localization methods based on the registration of 3D point clouds. The purpose of registration is to align a pair of point clouds in the same coordinate system, so that the transformation between the two scans can be computed. In autonomous driving localization, registration is used in two ways:

(1) Localize the vehicle by registering the current scanned frame against part of a pre-built high-precision point cloud map.

(2) Compute the vehicle's odometry by registering the point clouds obtained from consecutive lidar scans.

Point cloud registration is mainly used in areas such as shape alignment and scene reconstruction, and the Iterative Closest Point (ICP) algorithm is one of the most popular choices: it optimizes the transformation between the source and target point clouds by minimizing the measurement error between their points. Many ICP variants exist [47]; common ones are point-to-line ICP [48], point-to-plane ICP [49], and Generalized ICP [10], and ICP can be considered the classic solution to point cloud registration. In [11], point cloud registration is combined with loop closure detection and vehicle pose graph optimization to reduce the cumulative error caused by continuous registration. The paper [50] proposes computing odometry with an improved ICP that integrates lidar sensor data features, constraining point cloud matching through downsampling and the geometric properties of the point cloud data; the authors' odometry drift on the KITTI dataset dropped by 27%.

ICP was eventually surpassed by the 3D Normal Distributions Transform (NDT) algorithm [14][51], which extends the 2D NDT algorithm to three-dimensional space. As with ICP, the transformation between the source and target point clouds is found by iterative optimization, but the error being optimized is not defined between point pairs; it is based on the mean and covariance of the points falling in pre-computed voxels. NDT first converts the point cloud into a probability density function (PDF) and then optimizes with the Gauss-Newton algorithm to find the spatial transformation between the two clouds. An extension of 3D NDT named probabilistic NDT is proposed in [52]; it attempts to solve the sparsity of the classical NDT algorithm by storing, for each voxel, point probabilities rather than point counts. These methods can recover the transformation between lidar scans, but in autonomous driving they rarely meet real-time requirements, so auxiliary sensors such as IMUs are generally added to provide the initial pose estimate.
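To make the registration step concrete, below is a minimal point-to-plane ICP sketch using Open3D (a common open-source choice, not the implementation of any specific paper cited here). The file names are hypothetical, and the correspondence distance and normal-estimation radius are placeholder values that would need tuning per sensor.

```python
import numpy as np
import open3d as o3d

# Two consecutive scans (hypothetical file names).
source = o3d.io.read_point_cloud("scan_t0.pcd")
target = o3d.io.read_point_cloud("scan_t1.pcd")

# Point-to-plane ICP needs normals on the target cloud.
target.estimate_normals(
    o3d.geometry.KDTreeSearchParamHybrid(radius=1.0, max_nn=30))

result = o3d.pipelines.registration.registration_icp(
    source, target,
    max_correspondence_distance=1.0,  # metres; placeholder, tune per sensor
    init=np.eye(4),                   # warm-start with the previous pose in an odometry loop
    estimation_method=o3d.pipelines.registration.TransformationEstimationPointToPlane())

print(result.transformation)          # 4x4 rigid transform aligning source to target
```

In an odometry pipeline, chaining these per-pair transforms yields the vehicle trajectory, which is exactly where the accumulated drift discussed above comes from.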

IMLS-SLAM [20] proposes a three-step pipeline:

(1) First, dynamic objects are removed, detected by clustering the points of the scanned frame.

(2) The remaining point cloud, with dynamic obstacles removed, is downsampled.

(3) Finally, a matching step computes and optimizes the transformation with a scan-to-model matching strategy based on Implicit Moving Least Squares (IMLS).

Another popular approach computes surfels (SURFace ELements). The article [24] builds a surfel map of the point cloud; the map, together with normal maps, can be used in an ICP algorithm to compute the vehicle's odometry, and surfels also enable loop closure detection and trajectory optimization. In [38], a lidar scan is converted into a point cloud of line segments by sampling segments between adjacent points of neighboring rings. These line clouds are then aligned iteratively: first, the center point of each resulting line is computed; these centers are then used to find the transformation between consecutive scans by matching each line center in the source point cloud to the closest one in the target point cloud.

Other post-processing techniques are then used to improve accuracy, such as using the previous transformation to predict and initialize the next pose estimation step. Sometimes reducing the dimensionality of the lidar data also produces reasonable results; for example, [40] projects each incoming scan onto a 2.5D grid map with occupancy and height rasters.
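A minimal sketch of such a 2.5D projection (generic NumPy code with made-up grid parameters, not the exact procedure of [40]): each point is binned into a ground-plane cell, recording occupancy and the maximum height per cell.

```python
import numpy as np

def to_height_grid(points: np.ndarray, cell: float = 0.25, extent: float = 50.0):
    """Project an (N, 3) scan onto a 2.5D grid: occupancy plus max height per cell.

    extent is the half-width of the grid in metres around the vehicle.
    """
    n = int(2 * extent / cell)
    height = np.full((n, n), -np.inf)
    ix = ((points[:, 0] + extent) / cell).astype(int)
    iy = ((points[:, 1] + extent) / cell).astype(int)
    keep = (ix >= 0) & (ix < n) & (iy >= 0) & (iy < n)
    ix, iy, z = ix[keep], iy[keep], points[keep, 2]
    np.maximum.at(height, (ix, iy), z)   # highest return per cell
    occupied = np.isfinite(height)       # occupancy raster
    return occupied, height
```

Registering two such grids is then essentially a 2D problem, which is what makes this representation fast.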

2 Localization methods based on 3D features

3D point cloud features [55][56][57] are interest points in temporally and spatially consistent, recognizable regions. These feature points, usually also used for 3D object detection, are represented by feature descriptors, unique vector signatures that can be matched between two different point clouds. By finding enough consistent matches, an optimization method then computes the transformation between the two scans, from which odometry can be built. In [12], the authors focus on finding feature information that enables accurate localization in autonomous driving, but because point cluster distributions vary across scenes, the extracted feature points are unstable. The paper [16] proposes the PoseMap method: the authors argue that map continuity is the key to vehicle localization. A high-precision point cloud map is pre-built and then sub-sampled according to an overlap threshold to generate a set of submaps, each holding a keyframe pose. These submaps can be updated independently of each other at different points in time, and the localization problem is then solved over a sliding window by simply using the two submaps closest to the current vehicle and minimizing the distance between the old map features and the new ones.
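To illustrate the general descriptor-matching idea (not the specific features of [12] or the PoseMap pipeline of [16]), the sketch below uses Open3D's FPFH descriptors and RANSAC matching; the radii and distance threshold are placeholder values, and the API is that of recent Open3D releases.

```python
import open3d as o3d

def fpfh(pcd, normal_radius=0.5, feature_radius=1.25):
    """Estimate normals, then compute FPFH descriptors (radii are placeholders)."""
    pcd.estimate_normals(
        o3d.geometry.KDTreeSearchParamHybrid(radius=normal_radius, max_nn=30))
    return o3d.pipelines.registration.compute_fpfh_feature(
        pcd, o3d.geometry.KDTreeSearchParamHybrid(radius=feature_radius, max_nn=100))

source = o3d.io.read_point_cloud("scan_a.pcd")   # hypothetical input files
target = o3d.io.read_point_cloud("scan_b.pcd")

# RANSAC over descriptor correspondences yields a coarse transform estimate,
# which a local method such as ICP would then refine.
result = o3d.pipelines.registration.registration_ransac_based_on_feature_matching(
    source, target, fpfh(source), fpfh(target),
    True,    # mutual_filter: keep only mutually-nearest descriptor matches
    1.0)     # max correspondence distance in metres (placeholder)
print(result.transformation)
```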

Papers [21][22] use the geometry present in the environment around the autonomous vehicle as localization elements, combining a plane extraction algorithm with frame-to-frame matching techniques to generate pose estimates for vehicle localization. Compared with results obtained by the ICP algorithm, plane extraction and alignment show great improvements in both accuracy and speed.

The method currently ranked first on the KITTI odometry leaderboard, LOAM [25], first extracts planar and corner features based on the smoothness and occlusion of points. These features are matched to point patches in subsequent scans, and the Levenberg-Marquardt method is then used to solve for the lidar motion. As in most SLAM pipelines, building the map in a background thread at a lower frequency than the odometry estimate helps improve the final localization result. An extension of the method is proposed in [26] to increase its speed and guarantee real-time odometry; the main improvements are making full use of ground information, eliminating unreliable features, and a two-step Levenberg-Marquardt scheme to speed up the optimization. Nonetheless, one of the main remaining issues in the LOAM pipeline is odometry drift due to accumulated error; adding a loop closure detection step to the pipeline, as in [28] or [27], can address this.
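LOAM's feature selection hinges on a per-point smoothness score computed along each scan ring: low scores mark planar points and high scores mark edge points. A simplified sketch of that score (omitting LOAM's occlusion and parallel-surface checks) might look like:

```python
import numpy as np

def smoothness(ring: np.ndarray, k: int = 5) -> np.ndarray:
    """LOAM-style curvature for one scan ring (points ordered by azimuth).

    ring: (N, 3) consecutive points; k neighbours are used on each side.
    Low values suggest planar (surface) points, high values edge (corner) points.
    """
    n = len(ring)
    c = np.full(n, np.nan)                # ends lack enough neighbours
    for i in range(k, n - k):
        # Sum of differences to the 2k neighbours, normalised by range.
        diff = (2 * k) * ring[i] - ring[i - k:i].sum(axis=0) \
                                 - ring[i + 1:i + k + 1].sum(axis=0)
        c[i] = np.linalg.norm(diff) / (2 * k * np.linalg.norm(ring[i]) + 1e-9)
    return c
```

Thresholding this score high and low yields the corner and planar candidates that are then matched across scans.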

3 Localization methods based on deep learning on 3D point clouds

Deep learning is still a relatively new research direction for odometry and localization, but after proving its value in the image domain, and with methods like PointNet [60] and PointNet++ showing what networks can do on raw point clouds, its use will only become more popular. Methods involving deep learning can take the raw point cloud as input and use a single network to directly predict the vehicle's displacement, solving the task end-to-end. To simplify the network input, some approaches do not process the 3D point cloud directly but project the lidar scan into 2D space to generate a panoramic depth image, which is fed into a convolutional network that solves for the rotation and translation between the two input frames. The results are below the state of the art, but this was a genuine early exploration of deep learning for the task.
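The projection itself is simple; a minimal NumPy sketch is shown below, with field-of-view values typical of a 64-beam sensor used as assumptions.

```python
import numpy as np

def range_image(points, h=64, w=1024, fov_up=3.0, fov_down=-25.0):
    """Project an (N, 3) scan into an h x w panoramic depth (range) image.

    fov_up/fov_down are in degrees; pixels with no return stay at 0.
    """
    r = np.linalg.norm(points, axis=1)
    yaw = np.arctan2(points[:, 1], points[:, 0])            # azimuth
    pitch = np.arcsin(points[:, 2] / np.maximum(r, 1e-9))   # elevation
    up, down = np.radians(fov_up), np.radians(fov_down)
    u = (((1.0 - (yaw + np.pi) / (2 * np.pi)) * w).astype(int)) % w
    v = np.clip(((up - pitch) / (up - down) * h).astype(int), 0, h - 1)
    img = np.zeros((h, w), dtype=np.float32)
    img[v, u] = r
    return img
```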

Panoramic depth images are a common representation of lidar data. One way to use them is DeepPCO [17], which feeds the panoramic depth maps generated by lidar projection into two convolutional networks that compute the rotation and translation between two frames, respectively. Another approach generates two new 2D images by projecting the lidar point cloud into a spherical coordinate system: a vertex map (the XYZ location of each point) and a normal map (the normal of each point). These feed two networks: VertexNet uses the vertex map to predict the translation between consecutive frames, and NormalNet uses the normal map to predict the rotation between them.
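To show the shape of such a two-branch design, here is a toy PyTorch sketch; the layer sizes and input resolution are illustrative assumptions, not the architecture reported for DeepPCO or VertexNet/NormalNet.

```python
import torch
import torch.nn as nn

class PoseBranch(nn.Module):
    """Small CNN regressing a 3-vector (translation, or a rotation
    parameterisation) from a stacked pair of panoramic depth images."""
    def __init__(self, out_dim: int = 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2, 16, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, out_dim))

    def forward(self, pair):   # pair: (B, 2, H, W) depth images at t-1 and t
        return self.net(pair)

trans_net, rot_net = PoseBranch(), PoseBranch()
pair = torch.randn(1, 2, 64, 1024)      # dummy consecutive-frame input
t, r = trans_net(pair), rot_net(pair)   # predicted translation / rotation
```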

A solution called CAE-LO is proposed in [44], in which features are extracted from spherical projections of lidar data in a multiscale manner using unsupervised convolutional autoencoders. Additional autoencoders generate feature descriptors, and points are then matched with RANSAC-based frame-to-frame matching. Finally, the ICP algorithm refines the odometry result.

In [29], the LORAX algorithm is proposed. It introduces the concept of superpoints: subsets of points lying within a sphere that describe the local surface of the point cloud and are projected onto 2D space to form depth maps. These depth maps are then filtered with a series of tests, keeping only the relevant superpoints, and encoded using PCA and a deep autoencoder. Candidate matches are then selected by the Euclidean distance between features before a coarse registration step, an iterative procedure involving the RANSAC algorithm. As a final step, the ICP algorithm fine-tunes the result to improve the overall accuracy.
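The PCA stage of such an encoding can be sketched in a few lines of NumPy (a generic stand-in for the dimensionality-reduction step, not LORAX's exact procedure):

```python
import numpy as np

def pca_encode(depth_maps: np.ndarray, dim: int = 16) -> np.ndarray:
    """Compress flattened superpoint depth maps to `dim` PCA coefficients.

    depth_maps: (N, H, W); requires dim <= min(N, H * W).
    """
    X = depth_maps.reshape(len(depth_maps), -1).astype(np.float64)
    X -= X.mean(axis=0)                                 # centre the data
    _, _, vt = np.linalg.svd(X, full_matrices=False)    # rows of vt: principal axes
    return X @ vt[:dim].T                               # (N, dim) codes

codes = pca_encode(np.random.rand(40, 32, 32))  # 40 dummy superpoint depth maps
print(codes.shape)                              # (40, 16)
```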

The authors of the SegMap method, consolidating a series of papers [32],[31],[33],[34], explore how simple convolutional networks can efficiently extract and encode segments from point clouds for mapping and related tasks. The main contribution of this approach is its data-driven 3D segment descriptor, extracted by a network made of a series of convolutional and fully connected layers. The descriptor extractor network is trained with a two-part loss function: a classification loss and a reconstruction loss. Finally, a k-Nearest Neighbors (k-NN) search matches the extracted segments against candidate counterparts, which makes solving the localization task possible.
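That final retrieval step amounts to a nearest-neighbour search in descriptor space; below is a brute-force sketch with hypothetical 64-dimensional descriptors.

```python
import numpy as np

def knn_candidates(query_desc: np.ndarray, map_desc: np.ndarray, k: int = 5):
    """For each query descriptor, indices of the k closest map descriptors."""
    # Pairwise Euclidean distances, shape (n_query, n_map).
    d = np.linalg.norm(query_desc[:, None, :] - map_desc[None, :, :], axis=2)
    return np.argsort(d, axis=1)[:, :k]

q = np.random.rand(10, 64).astype(np.float32)     # 10 query segments
m = np.random.rand(1000, 64).astype(np.float32)   # 1000 map segments
print(knn_candidates(q, m).shape)                 # (10, 5)
```

In practice a KD-tree or approximate index would replace the brute-force distance matrix for large maps.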

When trying to recover the motion between two frames of point clouds, most of the methods discussed so far inevitably suffer from dynamic objects in the scene (cars, pedestrians, etc.). Removing dynamic objects is known to improve odometry in most SLAM pipelines. However, detecting and removing them in a supervised manner introduces additional complexity, which can lead to longer processing times and unstable results. To solve this problem in an unsupervised way, the authors of [37] propose training an encoder-decoder branch for the task of dynamic mask prediction. This is done by optimizing a geometric consistency loss, which penalizes the regions where the normals of the point cloud data cannot be modeled consistently. The complete network, called LO-Net, is trained end-to-end by combining the geometric consistency loss, an odometry regression loss, and a cross-entropy loss for regularization.


(Table: comparison of 3D deep learning localization methods on the KITTI training dataset.)


(Table: comparison of 3D localization methods on the KITTI test dataset.)

Some deep learning methods do not use lidar directly to localize the vehicle but instead try to learn the error model of a conventional pipeline. In other words, deep learning is used to correct already-available odometry estimates, resulting in powerful and flexible plug-in modules. The authors of [39] propose learning a bias correction term aimed at improving the results of a state estimator that takes lidar data as input. Gaussian models estimate the six pose error terms independently of each other, with carefully selected input features concentrated on the three degrees of freedom most affected by error. In [41], a more advanced approach called L3-Net is proposed, which can be related to the bias correction problem: the authors design a network that attempts to learn residual terms rather than predicting the frame-to-frame transformation outright. Relevant features are first extracted and passed to a miniPointNet to generate their feature descriptors; the residual term is then built and regularized with a 3D convolutional neural network, and an RNN branch is added to ensure temporal smoothness of the displacement predictions. The same authors proposed a more complete and versatile L3-Net variant in [42],[43], named DeepICP. Here, features are extracted using PointNet++ and filtered by a weighting layer that keeps only the most relevant ones. As in the previous method, feature descriptors are computed with the miniPointNet structure and fed to a corresponding-point generation layer, which generates matching keypoints in the target point cloud. To regress the final transformation, two loss functions encoding local similarity and global geometric constraints are combined.

Summary

We compare the methods referenced above using results reported on the KITTI odometry dataset [9], one of the most popular large datasets for outdoor odometry evaluation. It contains 22 sequences of lidar scans recorded with a Velodyne HDL-64E, pre-processed to compensate for the vehicle's motion (a note on the raw scan format follows the list below). Ground truth, obtained with an advanced GPS/INS system, is available for 11 of the sequences. Although LOAM still occupies the top spot in the KITTI rankings, it is clear that methods involving deep learning are becoming more and more accurate; for example, DeepICP's reported average results outperform every other approach on the training dataset. However, it is difficult to class these methods as "state of the art" for two main reasons:

(1) DeepICP reports that registering each pair of point clouds takes about 2 seconds, which is far too slow for a real self-driving car operating in the real world.

(2) Results on the test dataset have not been reported for these methods. Good test-set results would show that they work in real-world scenarios, not just on data the deep neural networks have already seen. Until then, LOAM and its variants remain the best and most trusted choice for real autonomous driving deployments.
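As the promised note on the raw data: each scan in the KITTI odometry benchmark is stored as a flat binary file of float32 (x, y, z, reflectance) records, so loading one is straightforward; the path below is illustrative.

```python
import numpy as np

def load_kitti_scan(path: str):
    """Read one KITTI odometry velodyne scan: N x 4 float32 records."""
    scan = np.fromfile(path, dtype=np.float32).reshape(-1, 4)
    return scan[:, :3], scan[:, 3]   # XYZ points, reflectance

points, reflectance = load_kitti_scan("sequences/00/velodyne/000000.bin")
```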

In this article, most of the latest developments and findings in 3D lidar localization for autonomous vehicles have been reviewed, analyzed, compared, and discussed. We considered only systems whose sole sensor is a 3D lidar, given the growing importance of this sensor in today's most accurate perception and localization systems, in addition to its increasing availability to the public and to manufacturers. Comparing results on the KITTI odometry dataset from the literature leads to the following conclusion: although deep learning-based methods show good results and seem to represent the direction of future research, methods based on 3D feature detection and matching are still considered the best and most effective solution, owing to their stability in real-world applications.

Reprinted from Yanzhi Intelligent Car. The views in this article are shared for exchange only and do not represent the position of this account; for copyright or other issues, please let us know and we will handle them promptly.
