Is 3D point cloud recognition secure? The University of Michigan et al. propose a robustness analysis dataset for severe distortions

Heart of the Machine Column

Heart of the Machine Editorial Department

Researchers from the University of Michigan and other institutions have proposed a novel and comprehensive dataset, ModelNet40-C, for systematically testing and further improving the robustness of point cloud recognition models against common distortions.

3D point clouds are widely used in 3D recognition. Some application areas place especially high demands on the security of 3D point cloud recognition, such as autonomous driving and medical image processing. Academic research on point cloud security has so far focused on robustness against adversarial attacks, yet natural distortions and perturbations are far more common in the real world than adversarial attacks. However, there has been no systematic study of the robustness of 3D point cloud recognition against such distortions.

Paper: https://arxiv.org/abs/2201.12296

Project homepage: https://sites.google.com/umich.edu/modelnet40c

Code (GitHub): https://github.com/jiachens/ModelNet40-C

Recently, researchers from the University of Michigan and other institutions proposed a novel and comprehensive dataset, ModelNet40-C, to systematically test and further improve the robustness of point cloud recognition models against distortion. ModelNet40-C contains 185,000 point cloud samples covering 15 distortion types, each at 5 severity levels. These distortions fall into three broad categories: density distortions, noise distortions, and transformation distortions.
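
For readers who want to benchmark their own models on the 75 corruption settings (15 types x 5 severities), the following is a minimal sketch of how the evaluation loop could look. The file names, corruption identifiers, and directory layout are illustrative assumptions and may differ from the released repository.

```python
# Minimal sketch of iterating over ModelNet40-C corruptions with NumPy.
# The file layout (data_<corruption>_<severity>.npy plus a shared label.npy)
# and the corruption names are assumptions for illustration only.
import numpy as np

CORRUPTIONS = [
    "occlusion", "lidar", "density_inc", "density_dec", "cutout",   # density
    "uniform", "gaussian", "impulse", "upsampling", "background",   # noise
    "rotation", "shear", "ffd", "rbf", "inv_rbf",                   # transformation
]
SEVERITIES = [1, 2, 3, 4, 5]

def evaluate_on_modelnet40c(predict_fn, data_dir="data/modelnet40_c"):
    """Return the mean error rate of `predict_fn` over all 75 corruption settings."""
    labels = np.load(f"{data_dir}/label.npy").reshape(-1)
    errors = []
    for corruption in CORRUPTIONS:
        for severity in SEVERITIES:
            points = np.load(f"{data_dir}/data_{corruption}_{severity}.npy")  # (N, num_points, 3)
            preds = predict_fn(points)                                        # (N,) predicted classes
            errors.append(np.mean(preds != labels))
    return float(np.mean(errors))
```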

Experiments show that the error rates of current representative 3D point cloud recognition models (such as PointNet, PointNet++, DGCNN, and PCT) on ModelNet40-C are more than 3 times higher than their error rates on the original ModelNet40 dataset, as shown in Figure 1 below. This demonstrates that deep point cloud recognition models remain very susceptible to common distortions.

Figure 1. Error rates of representative deep point cloud recognition models on the ModelNet40 and ModelNet40-C datasets.

Based on this finding, the researchers carried out extensive further experiments to explore how different model architectures, data augmentation strategies, and test-time adaptation methods affect distortion robustness. From the results, they distilled several findings to help developers of 3D point cloud recognition technology design more robust models and training protocols. For example, they found that Transformer-based point cloud recognition architectures offer clear advantages for distortion robustness; that different data augmentation strategies help against different types of distortion; and that test-time adaptation methods cope well with some very severe distortions.

Building the ModelNet40-C dataset

Figure 2. Illustration of the distortion types in ModelNet40-C.

Distortion robustness has received a lot of attention for 2D images, where CIFAR-C and ImageNet-C build corruption datasets by simulating different weather conditions, noise, and blur. However, the researchers found that distortions of 3D point clouds are fundamentally different from those of 2D images, because the point cloud data structure is more flexible and irregular: the number of points in a cloud can change, and shifts in point positions can easily alter semantic information. The researchers therefore propose three principles for building ModelNet40-C to ensure dataset quality: 1) semantic invariance, 2) distortion realism, and 3) distortion diversity.

The distortions in ModelNet40-C are divided into three categories: density distortions, noise distortions, and transformation distortions.

Density distortions include "occlusion", "LiDAR", "local density increase", "local density decrease", and "local missing". They simulate the different density characteristics of point clouds produced by different sensors in the real world; for example, "occlusion" simulates a sensor that, because of viewing-angle limitations, can only capture part of a 3D object.

Noise distortions include "uniform", "Gaussian", "impulse", "upsampling", and "background" noise, which simulate the digital noise and errors that inevitably arise during sensor acquisition and data preprocessing.

Transformation distortions include rotation, shear, free-form deformation (FFD), radial basis function (RBF) deformation, and inverse RBF deformation. The first two simulate distortions caused by misaligned coordinate frames and by dynamically collected data, while the latter three represent point cloud deformations produced by AR/VR applications and generative models (e.g., GANs).
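
To make the three categories more concrete, here is a minimal NumPy sketch of one representative distortion from each category (Gaussian noise, local missing, and rotation). The severity-to-parameter mappings are illustrative assumptions; this is not the exact generation code used to build ModelNet40-C.

```python
# Hedged sketches of one corruption per category; approximations for illustration only.
import numpy as np

def gaussian_noise(points, severity=1):
    """Noise distortion: jitter every point with zero-mean Gaussian noise."""
    sigma = 0.01 * severity                      # assumed severity-to-sigma mapping
    return points + np.random.normal(0.0, sigma, points.shape)

def local_missing(points, severity=1):
    """Density distortion: drop the neighborhood of a random seed point."""
    n_drop = 50 * severity                       # assumed number of points removed
    seed = points[np.random.randint(len(points))]
    dists = np.linalg.norm(points - seed, axis=1)
    keep = np.argsort(dists)[n_drop:]            # keep all but the closest n_drop points
    return points[keep]

def rotate_z(points, severity=1):
    """Transformation distortion: rotate the cloud about the z-axis by a small angle."""
    angle = np.deg2rad(5 * severity)             # assumed severity-to-angle mapping
    c, s = np.cos(angle), np.sin(angle)
    rot = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    return points @ rot.T
```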

Figure 3. Average confusion matrix of 6 models on ModelNet40-C.

The researchers note that these distortions are very common in point cloud applications, and they ensured that the resulting dataset remains semantically unchanged, as shown in Figure 2. Figure 3 shows the average confusion matrix of the six models on ModelNet40-C; the diagonal entries remain dominant, which further confirms the semantic invariance of ModelNet40-C.

Benchmarking results and analysis on ModelNet40-C

After building ModelNet40-C, the researchers ran a large number of benchmarks covering different model architectures, different data augmentation methods, and different test-time adaptation methods.

Comparison of different distortions and model architecture designs

Table 1. Error rates for different models on ModelNet40-C under standard training.

As shown in Table 1, the study benchmarked six models: PointNet, PointNet++, DGCNN, RSCNN, PCT, and SimpleView. The researchers summarize several findings: 1) "occlusion" and "LiDAR" distortions cause extremely high error rates for point cloud recognition models; 2) even small "rotation" angles still greatly degrade recognition performance; 3) "background" and "impulse" noise pose unexpected challenges for most models.

The researchers further relate these findings to model design. 1) PointNet is robust to density distortions but has the weakest overall performance. PointNet encodes only global features and no local features, which has long been regarded as its main drawback. Density distortions mainly destroy local structure, so they have limited impact on PointNet, but the same property makes PointNet very sensitive to other types of distortion. The researchers suggest that future uses of PointNet should take the application scenario into account.

2) Ball-query grouping is more robust to "background" and "impulse" noise than kNN grouping, because it caps the maximum grouping radius; this design helps the model suppress the influence of distant outliers, as illustrated in the sketch below.
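
The difference can be shown with a small NumPy sketch (not the PointNet++ implementation): a distant outlier is always pulled into a kNN group, while the radius cap of ball query excludes it.

```python
# Contrast kNN grouping with ball-query grouping around one center point.
import numpy as np

def knn_group(points, center, k):
    """Return the k nearest neighbors of `center`, however far away they are."""
    dists = np.linalg.norm(points - center, axis=1)
    return points[np.argsort(dists)[:k]]

def ball_query_group(points, center, radius, k):
    """Return up to k neighbors of `center`, but only those within `radius`."""
    dists = np.linalg.norm(points - center, axis=1)
    inside = np.where(dists <= radius)[0]
    inside = inside[np.argsort(dists[inside])[:k]]
    return points[inside]

# A far-away "background" outlier is pulled into the kNN group,
# while ball query simply ignores it.
cloud = np.vstack([np.random.randn(63, 3) * 0.05, [[5.0, 5.0, 5.0]]])
center = np.zeros(3)
print(len(knn_group(cloud, center, k=64)))            # 64, outlier included
print(len(ball_query_group(cloud, center, 0.2, 64)))  # <= 63, outlier excluded
```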

3) Transformer-based point cloud recognition models are more robust to transformation distortions, because the self-attention mechanism lets the model learn more robust and comprehensive global features, and the Transformer architecture also provides greater model capacity, making it more resilient to global deformations.
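
As a rough illustration of the mechanism, the following is a minimal single-head self-attention layer over per-point features in PyTorch. The feature dimension and single-head design are simplifying assumptions; this is not the actual PCT architecture.

```python
# Minimal global self-attention over per-point features.
import torch
import torch.nn as nn

class PointSelfAttention(nn.Module):
    def __init__(self, dim=128):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, feats):                          # feats: (batch, num_points, dim)
        q, k, v = self.q(feats), self.k(feats), self.v(feats)
        attn = torch.softmax(q @ k.transpose(1, 2) * self.scale, dim=-1)
        return attn @ v                                # every point aggregates global context

feats = torch.randn(2, 1024, 128)                      # toy per-point features
out = PointSelfAttention()(feats)                      # (2, 1024, 128)
```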

Comparison of different data augmentation methods

Table 2. Error rates of different models on ModelNet40-C under different data augmentation methods.

As shown in Table 2, the study evaluated five data augmentation training schemes: PointCutMix-R, PointCutMix-K, PointMixup, RSMix, and adversarial training. The researchers found that: 1) although data augmentation brings only limited gains on the clean dataset, it clearly improves model robustness under point cloud distortions; 2) no single augmentation scheme dominates across all distortion types.

PointCutMix-R is robust to noise distortions. It randomly samples points from two point clouds of different categories and combines them directly, so the resulting cloud is an "overlap" of two partially downsampled clouds; each half effectively acts as noise for the other. Training on such data therefore greatly improves robustness to noise distortions.
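
A minimal sketch of this kind of mixing, simplified from the published PointCutMix-R method, might look as follows; the Beta-distributed mixing ratio and soft-label construction are illustrative assumptions.

```python
# Replace a random subset of cloud A with points from cloud B and mix the labels.
import numpy as np

def pointcutmix_r(points_a, label_a, points_b, label_b, num_classes, lam=None):
    n = len(points_a)
    lam = np.random.beta(1.0, 1.0) if lam is None else lam             # mixing ratio
    n_from_b = int(round(lam * n))
    idx_a = np.random.choice(n, n - n_from_b, replace=False)           # kept from A
    idx_b = np.random.choice(len(points_b), n_from_b, replace=False)   # taken from B
    mixed = np.concatenate([points_a[idx_a], points_b[idx_b]], axis=0)
    # soft label proportional to the number of points from each source
    label = np.zeros(num_classes)
    label[label_a] += (n - n_from_b) / n
    label[label_b] += n_from_b / n
    return mixed, label
```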

PointMixup performs better on transformation distortions. It pairs the points of two different-category clouds via a minimum-distance assignment and "interpolates" between the matched pairs, so the shape of the resulting cloud lies between the two classes. This resembles the overall deformations in the transformation category, which makes the model more robust to them.
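
A minimal sketch of this interpolation, assuming both clouds have the same number of points and using SciPy's Hungarian solver as a stand-in for the assignment step of the original method:

```python
# Match points of two clouds with a minimum-cost assignment, then interpolate.
import numpy as np
from scipy.optimize import linear_sum_assignment

def pointmixup(points_a, points_b, lam=0.5):
    # cost[i, j] = Euclidean distance between point i of A and point j of B
    cost = np.linalg.norm(points_a[:, None, :] - points_b[None, :, :], axis=-1)
    _, col = linear_sum_assignment(cost)             # minimum-distance pairing
    matched_b = points_b[col]                        # reorder B to align with A
    return (1.0 - lam) * points_a + lam * matched_b  # shape lies "between" A and B
```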

RSMix is robust to density distortions. Although its overall idea is close to PointCutMix, it enforces rigid mixing: the points sampled from the two clouds remain spatially separate in 3D, with no "overlap". Such a mixture is equivalent to two independent, locally missing point clouds, which improves robustness to density distortions.
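
A simplified sketch of this rigid local-region swap could look like the following; the exact RSMix procedure also mixes the labels by region size, which is omitted here.

```python
# Cut a local neighborhood out of cloud A and insert a local neighborhood
# from cloud B, moved rigidly (translation only) into the vacated region.
import numpy as np

def rigid_region_swap(points_a, points_b, n_swap=256):
    seed_a = points_a[np.random.randint(len(points_a))]
    seed_b = points_b[np.random.randint(len(points_b))]
    # indices of the local regions around the two seed points
    near_a = np.argsort(np.linalg.norm(points_a - seed_a, axis=1))[:n_swap]
    near_b = np.argsort(np.linalg.norm(points_b - seed_b, axis=1))[:n_swap]
    keep_a = np.setdiff1d(np.arange(len(points_a)), near_a)
    # translate B's region so it sits where A's removed region was (no deformation)
    region_b = points_b[near_b] - seed_b + seed_a
    return np.concatenate([points_a[keep_a], region_b], axis=0)
```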

Comparison of different test-time adaptation methods

Table 3. Error rates of different models on ModelNet40-C with test-time adaptation methods.

This study is the first to apply test-time adaptation to point cloud recognition. The researchers used the BN and TENT methods to adapt the parameters of the models' BatchNorm layers at test time, and found that: 1) test-time adaptation steadily improves model robustness, though overall not as much as data augmentation; 2) test-time adaptation is surprisingly effective on some particularly difficult distortion types.
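
The two baselines can be sketched roughly as follows in PyTorch: "BN" re-estimates BatchNorm statistics from the test batch, while TENT additionally minimizes prediction entropy with respect to the BatchNorm affine parameters. This is a simplified illustration of the general recipes, not the exact configurations used in the paper.

```python
# Hedged sketch of BN-statistics adaptation and a single TENT update step.
import torch
import torch.nn as nn

def bn_adapt(model):
    """BN adaptation: recompute normalization statistics on test batches."""
    for m in model.modules():
        if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d)):
            m.train()                       # use batch statistics at test time
            m.reset_running_stats()
    return model

def tent_step(model, test_batch, lr=1e-3):
    """One TENT update: entropy minimization over BatchNorm affine parameters."""
    params = [p for m in model.modules()
              if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d))
              for p in (m.weight, m.bias) if p is not None]
    optimizer = torch.optim.Adam(params, lr=lr)   # fresh optimizer per step, for brevity
    logits = model(test_batch)
    probs = torch.softmax(logits, dim=-1)
    entropy = -(probs * torch.log(probs + 1e-8)).sum(dim=-1).mean()
    optimizer.zero_grad()
    entropy.backward()
    optimizer.step()
    return model
```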

For example, TENT on average achieves the strongest robustness under the "occlusion" (error rate 47.6%), "LiDAR" (54.1%), and "rotation" (19.8%) distortion types, outperforming the best data augmentation methods by 6.7%, 1.9%, and 7.9%, respectively. This demonstrates the great potential of test-time adaptation for improving distortion robustness in point cloud recognition.

Finally, the researchers combined PointCutMix-R, the best overall data augmentation method, with the test-time adaptation method TENT, and found that the Transformer-based PCT model achieves the best overall distortion robustness to date (error rate 13.9%). This finding also corroborates the Transformer architecture's benefit to model robustness, consistent with earlier studies of Transformers on 2D images (Bai et al., 2021).

Summary

The study presents a novel and comprehensive robustness analysis dataset for 3D point cloud recognition, ModelNet40-C. The researchers designed and constructed 75 distortion settings (15 types at 5 severity levels each) to simulate point cloud distortion and corruption caused by physical constraints, sensor precision limits, and data processing in real-world scenarios. ModelNet40-C contains 185,000 point cloud samples.

Experiments show that the error rates of current representative models on ModelNet40-C are roughly 3 times higher than on the original ModelNet40 dataset. Through extensive benchmarking, the study examined the performance of different model architectures, data augmentation strategies, and test-time adaptation methods on ModelNet40-C, and summarized useful findings to help the 3D point cloud community design more robust recognition models. The authors hope the ModelNet40-C dataset will accelerate future research on point cloud recognition robustness.
