CVPR'24 | A New NeRF Breakthrough: Heuristics-Guided Segmentation Tackles Transient Distractors

Source: 3D Vision Workshop

Paper title: NeRF-HuGS: Improved Neural Radiance Fields in Non-static Scenes Using Heuristics-Guided Segmentation

Authors: Jiahao Chen, Yipeng Qin, et al.

Affiliations: Sun Yat-sen University, Cardiff University, and others

Paper link: https://arxiv.org/pdf/2403.17537.pdf

Code link: https://cnhaox.github.io/NeRF-HuGS/

This paper introduces NeRF-HuGS, a method for improving neural radiance fields (NeRF) in non-static scenes. NeRF excels at novel view synthesis and 3D scene reconstruction, but it struggles with transient distractors such as moving objects and shadows. To address this, the researchers propose the Heuristics-Guided Segmentation (HuGS) paradigm, which significantly improves the separation of static scenes from transient distractors by combining hand-crafted heuristics with state-of-the-art segmentation models. They design the heuristics carefully, fusing structure-from-motion (SfM)-based heuristics with color residual heuristics to handle a variety of textures. Experiments show that the method is effective and robust at mitigating transient distractors when NeRF is trained in non-static scenes.

The paper addresses the transient distractor problem that is common in NeRF training, which matters for the applicability and robustness of NeRF models in the real world. By combining hand-crafted heuristics with segmentation models, the method identifies and segments transient distractors accurately without any prior knowledge. Its novelty lies in combining the strengths of different heuristics and models to handle transient distractors efficiently in complex scenes. Experimental results show significant improvements in both view synthesis and segmentation, indicating potential for practical use. Overall, the method offers an effective way to improve NeRF in non-static scenes and may inform further research on 3D scene reconstruction and view synthesis.

In short, NeRF achieves remarkable results in novel view synthesis but produces undesirable artifacts when the training images contain transient distractors such as moving objects or shadows. Heuristics-Guided Segmentation (HuGS) combines the strengths of hand-crafted heuristics and state-of-the-art segmentation models to separate static scenes from such distractors. Specifically, fusing structure-from-motion (SfM)-based heuristics with color residual heuristics lets it identify static elements across a variety of textures. Experiments show that NeRF-HuGS mitigates transient distractors robustly and significantly improves NeRF training in non-static scenes.

The main contributions of the paper are:

A new paradigm, Heuristics-Guided Segmentation, is proposed to improve NeRF trained in non-static scenes. It draws on the strengths of both hand-crafted heuristics and state-of-the-art segmentation models to distinguish static scenes from transient distractors accurately.
The design of the heuristics is studied in depth, and a seamless fusion of SfM-based heuristics and color residual heuristics is proposed to capture static scene elements across a wide range of textures, giving robust performance and superior results in mitigating transient distractors.
Extensive experiments show that the method produces sharp, accurate static-versus-transient separation that is close to the ground truth, and that it significantly improves NeRF trained in non-static scenes.

The authors first point out that the accuracy of the static map M_i is crucial to the quality of the trained NeRF. To maximize that accuracy, they adopt Heuristics-Guided Segmentation (HuGS), which combines the strengths of hand-crafted heuristics and state-of-the-art segmentation models: the heuristics identify coarse cues of static objects, and the segmentation model turns them into clear, accurate object boundaries. The heuristics themselves are analyzed in depth, and the SfM-based heuristic is combined with the color residual heuristic of Nerfacto so that static scene elements are captured across the full range of textures.
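To illustrate why the static map matters, here is a minimal sketch that assumes the map is used as a per-pixel switch in a photometric training loss. The article does not spell out the loss, so the function name `masked_photometric_loss` and its exact form are illustrative assumptions rather than the authors' implementation.

```python
from typing import Sequence

Pixel = Sequence[float]  # one RGB colour, values in [0, 1]


def masked_photometric_loss(
    rendered: Sequence[Pixel],
    observed: Sequence[Pixel],
    static_map: Sequence[int],
) -> float:
    """Average squared colour error over pixels marked static (mask value 1).

    Assumption: pixels marked transient (0) are simply ignored, so a wrong
    mask either leaks distractors into the loss or discards useful static
    supervision.
    """
    if not (len(rendered) == len(observed) == len(static_map)):
        raise ValueError("all inputs must have the same number of pixels")
    total, count = 0.0, 0
    for pred, target, is_static in zip(rendered, observed, static_map):
        if is_static:
            total += sum((p - t) ** 2 for p, t in zip(pred, target))
            count += 1
    # With no static pixels there is nothing to supervise.
    return total / count if count else 0.0
```

In this toy form, flipping a single mask entry from 0 to 1 on a distractor pixel changes the loss directly, which is the sensitivity the paragraph above describes.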

2.1 Heuristics-Guided Segmentation (HuGS)

Existing solutions often rely on hand-crafted heuristics alone to distinguish transient from static objects, but this has limitations given the diversity of real-world scenes. HuGS instead proposes a two-step framework: heuristics provide rough hints about static objects, and a segmentation model then turns those hints into an accurate static map. Compared with existing methods, HuGS produces static maps with clear object boundaries, and it works well even when a partially trained model serves as the heuristic. The approach rests on the assumption that rough but accurate hints about static objects are available.
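The two-step idea can be sketched as follows, assuming the segmentation model has already produced candidate object masks and the heuristics have produced a set of pixels believed to be static. The selection rule, the `min_hints` threshold, and the name `static_map_from_hints` are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from typing import Iterable, Set, Tuple

PixelCoord = Tuple[int, int]  # (row, column)


def static_map_from_hints(
    candidate_masks: Iterable[Set[PixelCoord]],
    static_hints: Set[PixelCoord],
    min_hints: int = 1,
) -> Set[PixelCoord]:
    """Union of all candidate masks that contain at least `min_hints` hints."""
    static_pixels: Set[PixelCoord] = set()
    for mask in candidate_masks:
        # A mask is accepted as static when enough hint pixels fall inside it.
        if len(mask & static_hints) >= min_hints:
            static_pixels |= mask
    return static_pixels
```

Because whole masks are accepted or rejected, the resulting static map inherits the segmentation model's object boundaries even though the hints themselves are sparse.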

2.2 Heuristic development

To give rough but accurate hints about static objects, the authors combine two complementary heuristics: an SfM-based heuristic and the color residual heuristic of a partially trained Nerfacto [46], which excel at detecting static objects with high-frequency and low-frequency textures, respectively.

The SfM-based heuristic builds on the fact that SfM reconstruction relies on distinctive, recognizable features matched across images, which makes it well suited to objects with high-frequency textures. To separate static from transient objects, it treats transient objects as a minority relative to static ones, with positions that keep changing. Unlike other methods, however, "minority" is defined here by how often something appears across the input images, which matches the temporal meaning of "transient".

Because the SfM-based heuristic may miss static objects with low-frequency textures, it is complemented by a second heuristic: the color residuals of the partially trained Nerfacto [46], which identify smooth static objects effectively but struggle with textured ones. Combining the two yields more precise hints that cover static scene elements across the full range of textures.
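A minimal sketch of the two heuristics, under stated assumptions: SfM feature tracks are given as a mapping from a track ID to the set of image IDs in which it was matched, color residuals are given per pixel, and the two hint sets are fused with a logical OR. All function names, both thresholds, and the OR fusion rule are assumptions made for illustration; the article only says that the heuristics are combined.

```python
from typing import Dict, List, Set


def static_tracks_by_frequency(
    track_to_images: Dict[int, Set[int]],
    num_images: int,
    min_fraction: float,
) -> Set[int]:
    """IDs of SfM feature tracks seen in at least `min_fraction` of the images."""
    if num_images <= 0:
        raise ValueError("num_images must be positive")
    return {
        track_id
        for track_id, image_ids in track_to_images.items()
        if len(image_ids) / num_images >= min_fraction
    }


def static_pixels_by_residual(residuals: List[float], max_residual: float) -> List[bool]:
    """Flag pixels whose colour residual is small enough to look static."""
    return [r <= max_residual for r in residuals]


def fuse_hints(sfm_hint: List[bool], residual_hint: List[bool]) -> List[bool]:
    """Assumed fusion rule: a pixel is a static hint if either heuristic says so."""
    return [a or b for a, b in zip(sfm_hint, residual_hint)]
```

The frequency test captures the temporal reading of "transient": a feature matched in only a few images is unlikely to belong to the static scene, while a smooth wall with no features at all can still be recovered through its low residual.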

The experimental section covers the experimental setup, the view synthesis evaluation against baselines and other methods, the segmentation evaluation, and ablation studies.

Experimental Setup:

Three datasets are used: the Kubric dataset, the Distractor dataset, and the Phototourism dataset.
Implementation details include COLMAP for SfM reconstruction, SAM as the segmentation model, and the chosen thresholds and parameters.
The method is applied to two baseline NeRF models, Nerfacto and Mip-NeRF 360.

View synthesis evaluation:

The method is compared with three heuristic-based methods (NeRF-W, HA-NeRF, and RobustNeRF) and with D2NeRF on the Kubric dataset.
PSNR, SSIM, and LPIPS are reported on the Kubric, Distractor, and Phototourism datasets.
The results show that the method achieves a significant improvement in PSNR while striking a good balance between ignoring transient distractors and preserving static details.
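For reference, PSNR can be computed from the mean squared error between a rendered image and its reference. The sketch below uses the standard definition, 10 * log10(MAX^2 / MSE), and assumes images are flattened to lists of floats in [0, 1]; it is not the evaluation code used in the paper.

```python
import math
from typing import Sequence


def psnr(predicted: Sequence[float], reference: Sequence[float], max_value: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between two flattened images."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((p - r) ** 2 for p, r in zip(predicted, reference)) / len(predicted)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Higher is better: a uniform error of 0.1 on a [0, 1] scale gives an MSE of 0.01 and therefore about 20 dB.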

Segmentation evaluation:

The method is compared on the Kubric dataset with a range of existing segmentation models, covering semantic segmentation, open-set segmentation, and video segmentation.
Static maps produced by the heuristic-based NeRF baselines after full training are also compared.
The results show that existing segmentation models perform poorly on this specific task, and that heuristic-based methods can roughly locate transient distractors but cannot segment them accurately. Combining heuristics with a segmentation model, in contrast, separates transient distractors from static scenes accurately without any prior knowledge.

Ablation Studies:

Using Nerfacto as the base model, the effect of each component of the method is studied and verified on two different datasets.
The results show that the complete method, which combines the SfM-based heuristic and the color residual heuristic with the segmentation model, performs best.

Overall, the experiments show that the method achieves significant improvements in both view synthesis and segmentation, and that it handles transient distractors in non-static scenes effectively and robustly.

The paper proposes a novel Heuristics-Guided Segmentation paradigm that effectively addresses the transient distractor problem common in real-world NeRF training. By strategically combining the complementary strengths of hand-crafted heuristics and state-of-the-art segmentation models, the approach segments transient distractors with high accuracy across a variety of scenes without any prior knowledge. Thanks to its carefully designed heuristics, it robustly captures static scene elements with both high-frequency and low-frequency textures. Extensive experiments show that it outperforms existing methods.
