
Beyond GaussianEditor | GScream: Efficient, High-Quality Object Removal with 3D Gaussians

Author: 3D Vision Workshop

Author: Yuxin Wang | Editor: 3DCV

Add WeChat: cv3d008, note "direction + affiliation + nickname", to be added to the group. Industry subdivision groups are listed at the end of the article.


This article introduces GScream, a new method for removing specified objects from a 3D scene. Built on the 3D Gaussian Splatting (3DGS) representation, the method enhances geometric consistency by introducing monocular depth estimation and adopts a novel feature propagation mechanism to improve texture consistency. Experiments show that the method not only improves the quality of novel-view synthesis after object removal, but also significantly speeds up training and rendering. Compared with traditional NeRF-based methods, GScream shows clear gains in both efficiency and effectiveness.

Title: Learning 3D Geometry and Feature Consistent Gaussian Splatting for Object Removal

Authors: Yuxin Wang et al.

Affiliations: HKUST and others

Paper: https://arxiv.org/pdf/2404.13679.pdf

The main contributions of the GScream method include the following:

  • 3D Gaussian Splatting application: 3D Gaussian Splatting is applied to the object removal task for the first time, yielding an efficient, high-quality object removal method.
  • Depth supervision: Monocular depth estimation is introduced as an additional geometric constraint, improving the geometric accuracy of the 3D Gaussians and thus the geometric consistency of the removal region.
  • Cross-attention feature regularization: A cross-attention mechanism is proposed for exchanging information between the visible and removed regions, enhancing the texture consistency of the removed region.
  • Lightweight model: Scaffold-GS, a lightweight Gaussian Splatting model, is used as the base model to improve training and rendering efficiency.

GScream

According to the paper, GScream is a framework that performs object removal using 3D Gaussian Splatting (3DGS). The framework consists of two key components:

  1. Monocular depth-guided training: Monocular depth estimation is introduced as an additional geometric constraint, so that the positions of the Gaussians are better optimized and geometric consistency improves. An online depth alignment and supervision module uses the estimated depth maps for supervision.
  2. Cross-attention feature regularization: Information is propagated between the 3D Gaussians of the visible and removed regions to improve the texture consistency of the removed region. This includes a 3D Gaussian sampling step and a bidirectional cross-attention module.

These two components work together to improve the geometry and texture consistency of the removed area, resulting in a high-quality removal result. The GScream framework takes advantage of the efficient representation of 3DGS, which improves training and rendering speed.

4.1. Monocular Depth-Guided Training

The specific steps are as follows:

  1. First, a monocular depth estimation model is used to extract a depth map D̂ for each image in the multi-view set, where D̂_ref corresponds to the depth map of the reference view.
  2. Then, an online depth alignment and supervision scheme is proposed to exploit the depth guidance. Specifically, the following weighted depth loss is used:
L_depth = Σ_p M'(p) · | (w · D̂(p) + q) − D(p) |

where M' denotes the weights for the different views, D is the rendered depth, and w and q are the scale and shift parameters for online alignment, obtained by solving a least-squares problem.

  3. In addition to this, a further auxiliary loss term is employed.
  4. Finally, a multi-view color reconstruction loss is used to constrain the similarity between the rendered images and the real images.

By introducing monocular depth estimation as an additional geometric constraint, together with the online depth alignment and supervision design, the geometric consistency of the 3DGS representation improves significantly, providing a more accurate geometric basis for the subsequent texture propagation.
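The online alignment described in this section can be sketched as a closed-form weighted least-squares fit. This is a minimal illustration, assuming the monocular depth D̂, the rendered depth D, and the weights M' are given as flat arrays; `align_depth` and `weighted_depth_loss` are hypothetical names, not the authors' code.

```python
import numpy as np

def align_depth(d_mono, d_rend, weights=None):
    """Solve min over (w, q) of sum_i m_i * (w * d_mono_i + q - d_rend_i)^2
    in closed form, giving the scale w and shift q for online alignment."""
    d_mono = np.asarray(d_mono, dtype=np.float64).ravel()
    d_rend = np.asarray(d_rend, dtype=np.float64).ravel()
    m = np.ones_like(d_mono) if weights is None else np.asarray(weights, np.float64).ravel()
    A = np.stack([d_mono, np.ones_like(d_mono)], axis=1)  # design matrix [D̂, 1]
    Am = A * m[:, None]
    # Weighted normal equations: (Aᵀ M A) x = Aᵀ M d_rend
    w, q = np.linalg.solve(A.T @ Am, Am.T @ d_rend)
    return w, q

def weighted_depth_loss(d_mono, d_rend, weights):
    """Weighted L1 loss between the aligned monocular depth and the rendered depth."""
    w, q = align_depth(d_mono, d_rend, weights)
    return float(np.sum(weights * np.abs(w * d_mono + q - d_rend)))
```

If the rendered depth is an exact affine transform of the monocular depth, the fitted (w, q) recovers it and the loss vanishes.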

4.2. Cross-Attention Feature Regularization


3D Gaussian Sampling: First, for each view, a set of 3D Gaussians covering the removed region and its surroundings is sampled. Concretely, the 3D Gaussian centers are projected into the current view, and the Gaussians that fall within a 2D patch sampled around the 2D mask boundary are collected. These Gaussians are divided into two groups, the removal region and the surrounding region, depending on whether their 2D projections lie inside the 2D mask. The goal is to obtain 3D samples both inside the removal area and around it.
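The sampling step above can be sketched as follows, assuming a simple pinhole projection with intrinsics K and Gaussian centers already expressed in the camera frame; `partition_gaussians` and all variable names are illustrative, not from the authors' code.

```python
import numpy as np

def partition_gaussians(centers, K, mask):
    """Project 3D Gaussian centers with intrinsics K and split them by
    whether they land inside the 2D removal mask."""
    # Perspective projection: u = fx * x/z + cx, v = fy * y/z + cy
    z = centers[:, 2]
    u = (K[0, 0] * centers[:, 0] / z + K[0, 2]).round().astype(int)
    v = (K[1, 1] * centers[:, 1] / z + K[1, 2]).round().astype(int)
    h, w = mask.shape
    visible = (z > 0) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    inside = np.zeros(len(centers), dtype=bool)
    inside[visible] = mask[v[visible], u[visible]] > 0
    removed_idx = np.flatnonzero(inside)              # Gaussians in the removal region
    surround_idx = np.flatnonzero(visible & ~inside)  # Gaussians around it
    return removed_idx, surround_idx
```

In practice the patch would be cropped around the mask boundary; here the whole image plane stands in for that patch to keep the sketch short.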

Bidirectional Cross-Attention: Next, bidirectional cross-attention is performed on the two groups of 3D Gaussian features to propagate information between them. Concretely, the two feature sets are arranged as two token sequences and fed into a bidirectional cross-attention structure. The structure contains cross-attention modules with shared parameters, so information can be propagated in both directions. The updated output features are written back to the corresponding 3D Gaussians.
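A minimal single-head version of such a bidirectional module might look like the sketch below, where the same projection matrices are shared by both directions and a residual connection fuses the update; the names and the exact fusion are assumptions, not the paper's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attn(q_feats, kv_feats, Wq, Wk, Wv):
    """Single-head cross-attention: queries from one group, keys/values from the other."""
    Q, K, V = q_feats @ Wq, kv_feats @ Wk, kv_feats @ Wv
    A = softmax(Q @ K.T / np.sqrt(Q.shape[-1]))
    return A @ V

def bidirectional_cross_attention(removed, surround, Wq, Wk, Wv):
    """Propagate features both ways with shared projection weights,
    fusing each update through a residual connection."""
    upd_removed = removed + cross_attn(removed, surround, Wq, Wk, Wv)
    upd_surround = surround + cross_attn(surround, removed, Wq, Wk, Wv)
    return upd_removed, upd_surround
```

Sharing Wq, Wk, Wv between the two directions is what "cross-attention modules that share parameters" suggests: one set of weights serves both the removed-to-surround and surround-to-removed passes.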

With this bidirectional cross-attention design, feature consistency between the removed region and its surroundings is strengthened, which improves the texture coherence of the rendered results. The explicit nature of the 3D Gaussian representation is what makes this feature propagation, and thus the improved texture quality of the removed region, possible.

Experimental setup: The authors perform object removal experiments on two datasets, SPIn-NeRF and IBRNet. The SPIn-NeRF dataset contains 10 scenes, each with 100 multi-view images and foreground object masks; the IBRNet dataset contains 5 real scenes captured with mobile phones. The authors compare against 3 state-of-the-art baselines: SPIn-NeRF, OR-NeRF, and View-Sub. For evaluation, they report PSNR, SSIM, LPIPS, and FID, and record training time to assess efficiency.
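Of the metrics listed, PSNR is the simplest to reproduce; a minimal sketch for images normalized to [0, 1] (the function name is illustrative):

```python
import numpy as np

def psnr(img, ref, max_val=1.0):
    """Peak signal-to-noise ratio in dB between two images in [0, max_val]."""
    mse = np.mean((np.asarray(img, np.float64) - np.asarray(ref, np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)
```

Higher is better: a uniform error of 0.1 on a [0, 1] image gives an MSE of 0.01 and hence a PSNR of 20 dB.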

Comparison with state-of-the-art methods: The authors compare against the latest baselines both quantitatively and qualitatively. Quantitatively, their method matches or outperforms SPIn-NeRF and OR-NeRF on PSNR, SSIM, LPIPS, and FID. Qualitatively, it performs better when completing more complex removal regions. In addition, training is 1.5× faster than SPIn-NeRF and 4× faster than OR-NeRF.

Ablation Study: The authors ablate the monocular depth supervision and the cross-attention feature regularization. Removing either module degrades the metrics, demonstrating their effectiveness.

Additional Experiments: The authors also compare with GaussianEditor and run ablations with different depth estimation models and 2D inpainting models. The results show that accurate depth estimation and a reasonable reference image are crucial to the final result.


This article introduced GScream, a new method for efficiently removing specific objects from a 3D scene. The method represents the scene with 3D Gaussian Splatting and improves the geometric consistency and texture coherence of the removed region through two key innovations. First, monocular depth-supervised training uses depth estimated from the multi-view images to optimize the geometry of the 3D Gaussians and improve geometric consistency. Second, cross-attention feature regularization exploits the explicit nature of the 3D Gaussian representation to propagate feature information between the removed and visible regions, improving texture coherence. Experimental results show that GScream not only outperforms existing NeRF-based methods but also brings significant improvements in training and rendering speed, suggesting a new direction for efficient scene editing and content generation.

This article is for academic sharing only; in case of infringement, please contact us and the article will be deleted.

3DCV technical exchange group

At present, we have established multiple communities in the direction of 3D vision, including 2D computer vision, large models, industrial 3D vision, SLAM, autonomous driving, 3D reconstruction, drones, etc., and the subdivisions include:

2D Computer Vision: Image Classification/Segmentation, Target/Detection, Medical Imaging, GAN, OCR, 2D Defect Detection, Remote Sensing Mapping, Super-Resolution, Face Detection, Behavior Recognition, Model Quantification Pruning, Transfer Learning, Human Pose Estimation, etc

Large models: NLP, CV, ASR, generative adversarial models, reinforcement learning models, dialogue models, etc

Industrial 3D vision: camera calibration, stereo matching, 3D point cloud, structured light, robotic arm grasping, defect detection, 6D pose estimation, phase deflection, Halcon, photogrammetry, array camera, photometric stereo vision, etc.

SLAM: visual SLAM, LiDAR SLAM, semantic SLAM, filtering algorithms, multi-sensor fusion algorithms, multi-sensor calibration, dynamic SLAM, MOT SLAM, NeRF SLAM, robot navigation, etc.

Autonomous driving: depth estimation, Transformer, millimeter-wave, lidar, visual camera sensor, multi-sensor calibration, multi-sensor fusion, autonomous driving integrated group, etc., 3D object detection, path planning, trajectory prediction, 3D point cloud segmentation, model deployment, lane line detection, BEV perception, Occupancy, target tracking, end-to-end autonomous driving, etc.

3D reconstruction: 3DGS, NeRF, multi-view geometry, OpenMVS, MVSNet, colmap, texture mapping, etc.

Unmanned aerial vehicles: quadrotor modeling, unmanned aerial vehicle flight control, etc

In addition to these, there are also exchange groups such as job search, hardware selection, visual product landing, the latest papers, the latest 3D vision products, and 3D vision industry news

Add the assistant on WeChat: dddvision, note: research direction + school/company + nickname (e.g., 3D point cloud + Tsinghua + Little Strawberry), to be added to the group.

3D Vision Technology Planet

3DGS, NeRF, structured light, phase deflectometry, robotic arm grasping, point cloud practice, Open3D, defect detection, BEV perception, Occupancy, Transformer, model deployment, 3D object detection, depth estimation, multi-sensor calibration, planning and control, UAV simulation, 3D vision C++, 3D vision Python, dToF, camera calibration, ROS2, robot control and planning, LeGo-LOAM, multi-modal fusion SLAM, LOAM-SLAM, indoor/outdoor SLAM, VINS-Fusion, ORB-SLAM3, MVSNet 3D reconstruction, colmap, line/surface structured light, hardware structured-light scanners, UAVs, etc.
