
Beyond GaussianEditor | GScream: Efficient, High-Quality Object Removal with 3D Gaussians

Author: 3D Vision Workshop

Author: Yuxin Wang | Editor: 3DCV

Add WeChat: cv3d008, note "direction + affiliation + nickname", to be added to the group. Industry subdivision groups are listed at the end of the article.


This article introduces GScream, a new method for removing specified objects from a 3D scene. Built on the 3D Gaussian Splatting (3DGS) representation, the method enhances geometric consistency by introducing monocular depth estimation and adopts a novel feature propagation mechanism to improve texture consistency. Experiments show that the method not only improves the quality of novel-view synthesis after object removal, but also significantly speeds up training and rendering. Compared with traditional NeRF-based methods, GScream shows clear gains in both efficiency and effectiveness.

Title: Learning 3D Geometry and Feature Consistent Gaussian Splatting for Object Removal

Authors: Yuxin Wang et al.

Affiliations: HKUST and others

Paper: https://arxiv.org/pdf/2404.13679.pdf

The main contributions of the GScream method include the following:

  • 3D Gaussian Splatting application: 3D Gaussian Splatting is applied to the object removal task for the first time, yielding an efficient, high-quality object removal method.
  • Depth supervision: Monocular depth estimation is introduced as an additional geometric constraint, improving the geometric accuracy of the 3D Gaussians and thus the geometric consistency of the removal region.
  • Cross-attention feature regularization: A cross-attention mechanism is proposed for exchanging information between the visible and removed regions, enhancing the texture consistency of the removed region.
  • Lightweight model: Scaffold-GS, a lightweight Gaussian Splatting model, is used as the base model to improve training and rendering efficiency.

GScream

According to the paper, GScream is a framework that performs object removal using 3D Gaussian Splatting (3DGS). The framework consists of two key components:

  1. Monocular depth-guided training: Monocular depth estimation is introduced as an additional geometric constraint, so that the positions of the Gaussians are better optimized and geometric consistency improves. An online depth alignment and supervision module uses the estimated depth maps for supervision.
  2. Cross-attention feature regularization: Information is propagated between the 3D Gaussians of the visible and removed regions to improve the texture consistency of the removed region. This includes a 3D Gaussian sampling step and a bidirectional cross-attention module.

These two components work together to improve the geometry and texture consistency of the removed area, resulting in a high-quality removal result. The GScream framework takes advantage of the efficient representation of 3DGS, which improves training and rendering speed.

4.1. Monocular Depth-Guided Training

The specific steps are as follows:

  1. First, a monocular depth estimation model is used to extract a depth map D̂ for each image in the multi-view set, where D̂_ref corresponds to the depth map of the reference view.
  2. Then, an online depth alignment and supervision scheme is proposed to exploit the depth guidance. Specifically, the following weighted depth loss is used:
L_depth = Σ_p M'(p) · | (w · D̂(p) + q) − D(p) |

where M' denotes the weights for the different views, D is the rendered depth, and w and q are the scale and shift parameters for online alignment, obtained by solving a least-squares problem.

  3. In addition to this, a further auxiliary loss term is employed.
  4. Finally, a multi-view color reconstruction loss is used to constrain the similarity between the rendered images and the real images.

By introducing monocular depth estimation as an additional geometric constraint, together with the online depth alignment and supervision design, the geometric consistency of the 3DGS representation improves significantly, providing a more accurate geometric basis for the subsequent texture propagation.
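The online alignment described in this section can be sketched as a closed-form weighted least-squares fit. This is a minimal illustration, assuming the monocular depth D̂, the rendered depth D, and the weights M' are given as flat arrays; `align_depth` and `weighted_depth_loss` are hypothetical names, not the authors' code.

```python
import numpy as np

def align_depth(d_mono, d_rend, weights=None):
    """Solve min over (w, q) of sum_i m_i * (w * d_mono_i + q - d_rend_i)^2
    in closed form, giving the scale w and shift q for online alignment."""
    d_mono = np.asarray(d_mono, dtype=np.float64).ravel()
    d_rend = np.asarray(d_rend, dtype=np.float64).ravel()
    m = np.ones_like(d_mono) if weights is None else np.asarray(weights, np.float64).ravel()
    A = np.stack([d_mono, np.ones_like(d_mono)], axis=1)  # design matrix [D̂, 1]
    Am = A * m[:, None]
    # Weighted normal equations: (Aᵀ M A) x = Aᵀ M d_rend
    w, q = np.linalg.solve(A.T @ Am, Am.T @ d_rend)
    return w, q

def weighted_depth_loss(d_mono, d_rend, weights):
    """Weighted L1 loss between the aligned monocular depth and the rendered depth."""
    w, q = align_depth(d_mono, d_rend, weights)
    return float(np.sum(weights * np.abs(w * d_mono + q - d_rend)))
```

If the rendered depth is an exact affine transform of the monocular depth, the fitted (w, q) recovers it and the loss vanishes.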

4.2. Cross-Attention Feature Regularization


3D Gaussian Sampling: First, for each view, a set of 3D Gaussians covering the removed region and its surroundings is sampled. Concretely, the 3D Gaussian centers are projected into the current view, and the Gaussians that fall within a 2D patch sampled around the 2D mask boundary are collected. These Gaussians are divided into two groups, the removal region and the surrounding region, depending on whether their 2D projections lie inside the 2D mask. The goal is to obtain 3D samples both inside the removal area and around it.
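The sampling step above can be sketched as follows, assuming a simple pinhole projection with intrinsics K and Gaussian centers already expressed in the camera frame; `partition_gaussians` and all variable names are illustrative, not from the authors' code.

```python
import numpy as np

def partition_gaussians(centers, K, mask):
    """Project 3D Gaussian centers with intrinsics K and split them by
    whether they land inside the 2D removal mask."""
    # Perspective projection: u = fx * x/z + cx, v = fy * y/z + cy
    z = centers[:, 2]
    u = (K[0, 0] * centers[:, 0] / z + K[0, 2]).round().astype(int)
    v = (K[1, 1] * centers[:, 1] / z + K[1, 2]).round().astype(int)
    h, w = mask.shape
    visible = (z > 0) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    inside = np.zeros(len(centers), dtype=bool)
    inside[visible] = mask[v[visible], u[visible]] > 0
    removed_idx = np.flatnonzero(inside)              # Gaussians in the removal region
    surround_idx = np.flatnonzero(visible & ~inside)  # Gaussians around it
    return removed_idx, surround_idx
```

In practice the patch would be cropped around the mask boundary; here the whole image plane stands in for that patch to keep the sketch short.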

Bidirectional Cross-Attention: Next, bidirectional cross-attention is performed on the two groups of 3D Gaussian features to propagate information between them. Concretely, the two feature sets are arranged as two token sequences and fed into a bidirectional cross-attention structure. The structure contains cross-attention modules with shared parameters, so information can be propagated in both directions. The updated output features are written back to the corresponding 3D Gaussians.
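A minimal single-head version of such a bidirectional module might look like the sketch below, where the same projection matrices are shared by both directions and a residual connection fuses the update; the names and the exact fusion are assumptions, not the paper's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attn(q_feats, kv_feats, Wq, Wk, Wv):
    """Single-head cross-attention: queries from one group, keys/values from the other."""
    Q, K, V = q_feats @ Wq, kv_feats @ Wk, kv_feats @ Wv
    A = softmax(Q @ K.T / np.sqrt(Q.shape[-1]))
    return A @ V

def bidirectional_cross_attention(removed, surround, Wq, Wk, Wv):
    """Propagate features both ways with shared projection weights,
    fusing each update through a residual connection."""
    upd_removed = removed + cross_attn(removed, surround, Wq, Wk, Wv)
    upd_surround = surround + cross_attn(surround, removed, Wq, Wk, Wv)
    return upd_removed, upd_surround
```

Sharing Wq, Wk, Wv between the two directions is what "cross-attention modules that share parameters" suggests: one set of weights serves both the removed-to-surround and surround-to-removed passes.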

With this bidirectional cross-attention design, feature consistency between the removed region and its surroundings is strengthened, which improves the texture coherence of the rendered results. The explicit nature of the 3D Gaussian representation is what makes this feature propagation, and thus the improved texture quality of the removed region, possible.

Experimental setup: The authors perform object removal experiments on two datasets, SPIn-NeRF and IBRNet. The SPIn-NeRF dataset contains 10 scenes, each with 100 multi-view images and foreground object masks; the IBRNet dataset contains 5 real scenes captured with mobile phones. The authors compare against 3 state-of-the-art baselines: SPIn-NeRF, OR-NeRF, and View-Sub. For evaluation, they report PSNR, SSIM, LPIPS, and FID, and record training time to assess efficiency.
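Of the metrics listed, PSNR is the simplest to reproduce; a minimal sketch for images normalized to [0, 1] (the function name is illustrative):

```python
import numpy as np

def psnr(img, ref, max_val=1.0):
    """Peak signal-to-noise ratio in dB between two images in [0, max_val]."""
    mse = np.mean((np.asarray(img, np.float64) - np.asarray(ref, np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)
```

Higher is better: a uniform error of 0.1 on a [0, 1] image gives an MSE of 0.01 and hence a PSNR of 20 dB.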

Comparison with state-of-the-art methods: The authors compare against the latest baselines both quantitatively and qualitatively. Quantitatively, their method matches or outperforms SPIn-NeRF and OR-NeRF on PSNR, SSIM, LPIPS, and FID. Qualitatively, it performs better when completing more complex removal regions. In addition, training is 1.5× faster than SPIn-NeRF and 4× faster than OR-NeRF.

Ablation Study: The authors ablate the monocular depth supervision and the cross-attention feature regularization. Removing either module degrades the metrics, demonstrating their effectiveness.

Additional Experiments: The authors also compare with GaussianEditor and run ablations with different depth estimation models and 2D inpainting models. The results show that accurate depth estimation and a reasonable reference image are crucial to the final result.


This article introduced GScream, a new method for efficiently removing specific objects from a 3D scene. The method represents the scene with 3D Gaussian Splatting and improves the geometric consistency and texture coherence of the removed region through two key innovations. First, monocular depth-supervised training uses depth estimated from the multi-view images to optimize the geometry of the 3D Gaussians and improve geometric consistency. Second, cross-attention feature regularization exploits the explicit nature of the 3D Gaussian representation to propagate feature information between the removed and visible regions, improving texture coherence. Experimental results show that GScream not only outperforms existing NeRF-based methods but also brings significant improvements in training and rendering speed, suggesting a new direction for efficient scene editing and content generation.

This article is for academic sharing only; in case of infringement, please contact us and the article will be deleted.

3DCV technical exchange group

At present, we have established multiple communities in the direction of 3D vision, including 2D computer vision, large models, industrial 3D vision, SLAM, autonomous driving, 3D reconstruction, drones, etc., and the subdivisions include:

2D Computer Vision: Image Classification/Segmentation, Target/Detection, Medical Imaging, GAN, OCR, 2D Defect Detection, Remote Sensing Mapping, Super-Resolution, Face Detection, Behavior Recognition, Model Quantification Pruning, Transfer Learning, Human Pose Estimation, etc

Large models: NLP, CV, ASR, generative adversarial models, reinforcement learning models, dialogue models, etc

Industrial 3D vision: camera calibration, stereo matching, 3D point cloud, structured light, robotic arm grasping, defect detection, 6D pose estimation, phase deflection, Halcon, photogrammetry, array camera, photometric stereo vision, etc.

SLAM: visual SLAM, LiDAR SLAM, semantic SLAM, filtering algorithms, multi-sensor fusion algorithms, multi-sensor calibration, dynamic SLAM, MOT SLAM, NeRF SLAM, robot navigation, etc.

Autonomous driving: depth estimation, Transformer, millimeter-wave, lidar, visual camera sensor, multi-sensor calibration, multi-sensor fusion, autonomous driving integrated group, etc., 3D object detection, path planning, trajectory prediction, 3D point cloud segmentation, model deployment, lane line detection, BEV perception, Occupancy, target tracking, end-to-end autonomous driving, etc.

3D reconstruction: 3DGS, NeRF, multi-view geometry, OpenMVS, MVSNet, colmap, texture mapping, etc.

Unmanned aerial vehicles: quadrotor modeling, unmanned aerial vehicle flight control, etc

In addition to these, there are also exchange groups such as job search, hardware selection, visual product landing, the latest papers, the latest 3D vision products, and 3D vision industry news

Add the assistant on WeChat: dddvision, note: research direction + school/company + nickname (e.g., 3D point cloud + Tsinghua + Little Strawberry), to be added to the group.

3D Vision Technology Planet

3DGS, NeRF, structured light, phase deflectometry, robotic arm grasping, point cloud practice, Open3D, defect detection, BEV perception, Occupancy, Transformer, model deployment, 3D object detection, depth estimation, multi-sensor calibration, planning and control, UAV simulation, 3D vision C++, 3D vision Python, dToF, camera calibration, ROS2, robot control and planning, LeGo-LOAM, multi-modal fusion SLAM, LOAM-SLAM, indoor/outdoor SLAM, VINS-Fusion, ORB-SLAM3, MVSNet 3D reconstruction, colmap, line/surface structured light, hardware structured-light scanners, UAVs, etc.
