PointPillars: a new point cloud encoder for faster and more accurate object detection

Original | Text by BFT Robot

Technical background

Over the past few years, deep learning has driven significant progress in image-based object detection. Detectors such as Faster R-CNN, YOLO, and SSD can locate objects in images both efficiently and accurately. Object detection on point cloud data (three-dimensional data acquired by sensors such as lidar), however, still faces many challenges.

Point clouds differ from conventional images. Instead of a dense pixel grid, they consist of a large number of sparse, unordered points, each recording a position in three-dimensional space (lidar returns usually also carry a reflectance value). Object detection on point clouds therefore has to solve some unique problems. For example, point density varies with an object's distance from the sensor, because distant objects are hit by far fewer points, and noise and occlusion also affect detection results. In addition, point clouds usually need to be preprocessed before they can be fed to a deep learning model.
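A common first preprocessing step is simply cropping the scan to a region of interest. The sketch below assumes a point cloud stored as an (N, 4) NumPy array of x, y, z, reflectance; the function name `crop_to_roi`, the toy points, and the bounds are hypothetical choices for illustration, not values taken from the paper's code.

```python
import numpy as np

def crop_to_roi(points, x_range, y_range, z_range):
    """Keep only points (rows of x, y, z, reflectance) inside an axis-aligned box."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    mask = (
        (x >= x_range[0]) & (x < x_range[1])
        & (y >= y_range[0]) & (y < y_range[1])
        & (z >= z_range[0]) & (z < z_range[1])
    )
    return points[mask]

# Hypothetical toy scan: one row per lidar return (x, y, z, reflectance).
scan = np.array([
    [5.0, 1.0, -1.5, 0.30],
    [40.0, -3.0, -1.2, 0.10],
    [-2.0, 0.5, -1.0, 0.20],   # behind the sensor
    [10.0, 60.0, 0.0, 0.05],   # too far to the side
], dtype=np.float32)

# Example bounds in metres (assumed for this sketch).
roi = crop_to_roi(scan, (0.0, 70.4), (-40.0, 40.0), (-3.0, 1.0))
print(roi.shape)  # (2, 4)
```

Using a boolean mask keeps the crop vectorized, so it stays cheap even for scans with on the order of a hundred thousand points.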

To overcome these challenges, the paper proposes PointPillars, a new encoder for point cloud data. PointPillars supports end-to-end training, so the object detection task is learned directly from raw point clouds. The encoder converts a point cloud into a 2D pseudo-image that standard convolutional networks can process, and its design takes into account how point clouds are distributed in three-dimensional space.

Notably, the method achieved the best detection performance on the KITTI benchmark at the time of publication. KITTI is a benchmark suite for autonomous driving that evaluates how well algorithms detect, localize, and track objects such as vehicles in real-world driving scenes. PointPillars' results there demonstrate its effectiveness for object detection on point clouds.

Innovations of the paper

The paper's main contribution is PointPillars, a new encoder that can be trained end-to-end on point cloud data and that achieves higher detection performance than existing methods while using only lidar data. PointPillars uses PointNets to learn features from points organized into vertical columns (pillars), and then applies a 2D convolutional neural network for detection. Compared with existing methods, it runs faster and detects more accurately. Because it operates on pillars rather than voxels, it also removes the need to hand-tune the binning in the vertical direction.

Algorithm overview

The algorithm consists of two key components: an encoder and a detector.

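Encoder: The encoder uses PointNets to turn the point cloud into a vertical, column-based representation. Specifically, it divides the ground (x-y) plane into a regular grid, and each grid cell defines a pillar: a vertical column with unlimited extent in z that holds all points falling inside that cell. Each point is augmented with its offsets to the mean of the points in its pillar and to the pillar's center, and each pillar is sampled or zero-padded to a fixed number of points. A simplified PointNet (a shared linear layer with batch normalization and ReLU, followed by a max operation over the points) then encodes each pillar into a fixed-length feature vector. These feature vectors are scattered back to their pillars' grid locations to form a 2D pseudo-image, which serves as the input to the detector.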
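The data flow of the pillar encoder can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the function `pillar_encode`, its parameters, the grid size, and the random weight matrix (standing in for the learned linear layer) are hypothetical. BatchNorm is omitted, the linear layer has no bias so zero-padded points contribute nothing after ReLU, and surplus points are truncated rather than randomly sampled.

```python
import numpy as np

def pillar_encode(points, grid_min, pillar_size, grid_shape, max_points, weight):
    """Toy pillar feature encoder (untrained, BatchNorm omitted).

    points: (M, 4) array of x, y, z, reflectance.
    grid_min: (x_min, y_min) corner of the bird's-eye-view grid.
    pillar_size: edge length of a square pillar in metres.
    grid_shape: (H, W) number of pillars along y and x.
    max_points: points kept per pillar (extras truncated, shortfall zero-padded).
    weight: (9, C) matrix standing in for the learned linear layer.
    Returns a (C, H, W) pseudo-image.
    """
    H, W = grid_shape
    C = weight.shape[1]
    # Assign each point to a pillar by its x-y cell; z is never binned.
    ix = np.floor((points[:, 0] - grid_min[0]) / pillar_size).astype(int)
    iy = np.floor((points[:, 1] - grid_min[1]) / pillar_size).astype(int)
    inside = (ix >= 0) & (ix < W) & (iy >= 0) & (iy < H)
    points, ix, iy = points[inside], ix[inside], iy[inside]

    canvas = np.zeros((C, H, W), dtype=np.float32)
    for row, col in set(zip(iy.tolist(), ix.tolist())):
        sel = points[(iy == row) & (ix == col)][:max_points]
        # Decorations: offsets to the pillar's point mean (xc, yc, zc)
        # and to the pillar's x-y centre (xp, yp) -> 9 features per point.
        mean = sel[:, :3].mean(axis=0)
        centre_x = grid_min[0] + (col + 0.5) * pillar_size
        centre_y = grid_min[1] + (row + 0.5) * pillar_size
        feats = np.concatenate([
            sel,                       # x, y, z, r
            sel[:, :3] - mean,         # xc, yc, zc
            sel[:, :1] - centre_x,     # xp
            sel[:, 1:2] - centre_y,    # yp
        ], axis=1)
        # Zero-pad so every pillar is a fixed-size (max_points, 9) tensor.
        padded = np.zeros((max_points, 9), dtype=np.float32)
        padded[: len(feats)] = feats
        # Simplified PointNet: shared linear layer + ReLU, then max over points.
        hidden = np.maximum(padded @ weight, 0.0)  # (max_points, C)
        # Scatter the pillar feature back to its grid location.
        canvas[:, row, col] = hidden.max(axis=0)
    return canvas

# Usage with hypothetical values: three points, 8 feature channels, 10 x 10 grid.
rng = np.random.default_rng(0)
pts = np.array([[0.1, 0.1, -1.0, 0.5],
                [0.12, 0.15, -0.8, 0.4],
                [1.3, 0.2, -1.2, 0.3]], dtype=np.float32)
w = rng.standard_normal((9, 8)).astype(np.float32)
img = pillar_encode(pts, (0.0, 0.0), 0.16, (10, 10), 4, w)
print(img.shape)  # (8, 10, 10)
```

Only non-empty pillars are processed, which is what keeps the encoder cheap on sparse lidar scans; the resulting dense (C, H, W) array is what allows an ordinary 2D CNN to run on it.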

Detector: The detector processes the pseudo-image with a 2D convolutional neural network: a 2D convolutional backbone extracts features, and a Single Shot Detector (SSD) head performs the detection. SSD uses a predefined set of anchor boxes, each representing a particular object size and aspect ratio. For each anchor, the network predicts a class score indicating whether the box contains an object, and regresses the object's position, size, and heading relative to the anchor.
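Before the classification and regression targets can be computed, each anchor has to be labeled as object, background, or ignored according to its overlap with the ground-truth boxes. The sketch below shows this step with bird's-eye-view IoU. As simplifications, it assumes axis-aligned boxes given as (center x, center y, width, length), and the anchor size, the thresholds, and the helper names `bev_iou` and `label_anchors` are hypothetical values chosen for illustration.

```python
def bev_iou(a, b):
    """IoU of two axis-aligned bird's-eye-view boxes given as (cx, cy, w, l)."""
    ax0, ax1 = a[0] - a[2] / 2, a[0] + a[2] / 2
    ay0, ay1 = a[1] - a[3] / 2, a[1] + a[3] / 2
    bx0, bx1 = b[0] - b[2] / 2, b[0] + b[2] / 2
    by0, by1 = b[1] - b[3] / 2, b[1] + b[3] / 2
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_l = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_l
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def label_anchors(anchors, gt_boxes, pos_thresh=0.6, neg_thresh=0.45):
    """Label each anchor 1 (object), 0 (background) or -1 (ignored) by its best IoU."""
    labels = []
    for anchor in anchors:
        best = max((bev_iou(anchor, gt) for gt in gt_boxes), default=0.0)
        if best >= pos_thresh:
            labels.append(1)
        elif best < neg_thresh:
            labels.append(0)
        else:
            labels.append(-1)  # ambiguous overlap: excluded from the loss
    return labels

# Hypothetical car-sized anchor (1.6 m x 3.9 m) placed at three grid positions.
anchors = [(0.0, 0.0, 1.6, 3.9), (0.6, 0.0, 1.6, 3.9), (5.0, 5.0, 1.6, 3.9)]
gt = [(0.0, 0.0, 1.6, 3.9)]
print(label_anchors(anchors, gt))  # [1, -1, 0]
```

The middle anchor overlaps the ground truth with an IoU of about 0.45, so it falls between the two thresholds and is ignored rather than being forced into either class.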

Overall, the main advantage of PointPillars is that it outperforms existing methods while using only lidar data, and it also runs faster. Because it operates on pillars rather than on voxels, it avoids hand-tuning the binning in the vertical direction, which removes a hyperparameter that voxel-based methods must choose and makes the algorithm easier to adapt.
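A quick grid-size calculation illustrates what dropping the vertical binning means in practice. With pillars, the number of grid cells depends only on the horizontal resolution; a voxel grid is additionally multiplied by a vertical bin count that has to be chosen by hand. The detection range and cell sizes below are hypothetical values for illustration; integer centimetres are used to avoid floating-point rounding in the divisions.

```python
def cells(extent_cm: int, cell_cm: int) -> int:
    """Number of grid cells along one axis (ceiling division on integer centimetres)."""
    return -(-extent_cm // cell_cm)

# Hypothetical detection range: 70.4 m (x) by 80 m (y) by 4 m (z).
X_CM, Y_CM, Z_CM = 7040, 8000, 400
XY_CELL_CM = 16  # assumed horizontal cell size for both representations

# Pillars: one column per x-y cell spanning the full height; no z-bin size to choose.
pillar_cells = cells(X_CM, XY_CELL_CM) * cells(Y_CM, XY_CELL_CM)

def voxel_cells(z_cell_cm: int) -> int:
    """A voxel grid additionally depends on a hand-chosen vertical bin size."""
    return pillar_cells * cells(Z_CM, z_cell_cm)

print(pillar_cells)     # 220000
print(voxel_cells(40))  # 2200000
print(voxel_cells(10))  # 8800000
```

The pillar count is fixed once the horizontal resolution is set, whereas the voxel count (and the behavior of the network consuming it) changes with every choice of vertical bin size.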

Experimental discussion

The experiments use the KITTI object detection benchmark dataset, which contains both lidar point clouds and camera images. The paper trains and tests on lidar point clouds only, but compares against fusion methods that use both lidar and images.

Experimental setup:

The loss is optimized with Adam, using an initial learning rate of 2e-4 that is decayed by a factor of 0.8 every 15 epochs, for 160 epochs in total. The batch size is 2 for the models evaluated on the validation set and 4 for the test-server submission. The KITTI samples are originally divided into 7481 training and 7518 test samples. For the experimental studies, the paper splits the official training set into 3712 training samples and 3769 validation samples; for the test submission, a minival set of 784 samples is held out from the validation set and the model is trained on the remaining samples. The training split is used for model training, the validation split for model selection and tuning, and the official test set for the final evaluation.
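The learning-rate schedule described above is a step decay, lr(epoch) = 2e-4 × 0.8^⌊epoch / 15⌋. The sketch below computes it per epoch; it assumes the decay is applied at epoch boundaries (epochs 15, 30, ...), which is an interpretation of "every 15 epochs" rather than a detail confirmed by the paper's code, and the function name `learning_rate` is hypothetical.

```python
INITIAL_LR = 2e-4
DECAY_FACTOR = 0.8
DECAY_EVERY = 15  # epochs
NUM_EPOCHS = 160

def learning_rate(epoch: int) -> float:
    """Step decay: multiply the rate by 0.8 after every 15 completed epochs."""
    return INITIAL_LR * DECAY_FACTOR ** (epoch // DECAY_EVERY)

schedule = [learning_rate(e) for e in range(NUM_EPOCHS)]
# Epochs 0-14 use 2e-4, epochs 15-29 use 1.6e-4, ...,
# and the last epoch (159) uses 2e-4 * 0.8**10, roughly 2.1e-5.
print(schedule[0], schedule[15], schedule[-1])
```

Over 160 epochs the rate is decayed ten times, so training ends at roughly a tenth of the initial learning rate.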

Experimental results:

The experiments show that PointPillars outperforms existing methods while using only lidar data. On the KITTI 3D and bird's-eye view (BEV) detection benchmarks it clearly outperforms existing methods, and it is more accurate even than methods that fuse lidar and image data. It is also faster: it runs at 62 Hz, a 2 to 4 times runtime improvement over existing methods, and a faster variant matches the accuracy of the state of the art at 105 Hz.
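Converting the reported frame rates into per-frame latency makes the runtime figures easier to compare against a sensor's capture rate. This is plain arithmetic on the two rates quoted above; the helper name `frame_time_ms` is hypothetical.

```python
def frame_time_ms(rate_hz: float) -> float:
    """Per-frame processing time in milliseconds for a given rate."""
    return 1000.0 / rate_hz

for name, hz in [("PointPillars", 62.0), ("faster variant", 105.0)]:
    print(f"{name}: {hz:.0f} Hz -> {frame_time_ms(hz):.1f} ms per frame")
# PointPillars: 62 Hz -> 16.1 ms per frame
# faster variant: 105 Hz -> 9.5 ms per frame
```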

Conclusion

PointPillars is an object detection algorithm for point cloud data that outperforms existing methods while using only lidar data. It is both faster and more accurate, with clear gains on the KITTI 3D object detection and bird's-eye view detection benchmarks, and it remains more accurate even than methods that fuse lidar and image data.

It is also notably fast: it runs at 62 frames per second, 2 to 4 times faster than existing methods, and a faster variant reaches 105 frames per second while matching the accuracy of current state-of-the-art methods. PointPillars therefore shows great potential for object detection on point clouds and offers a promising solution to this problem.

Author | Coriander flower

Typesetting | Spring

Audit | Qiqi
