
Beyond the benchmark | A 3D Gaussian splatting method based on per-Gaussian deformation and its efficient training strategy

Author: 3D Vision Workshop

Author: Jeongmin Bae | Editor: 3DCV



Title: Per-Gaussian Embedding-Based Deformation for Deformable 3D Gaussian Splatting

Authors: Jeongmin Bae et al.

Code: https://jeongminb.github.io/e-d3dgs/

Link to paper: https://arxiv.org/abs/2404.03613

1 Introduction

Because 3DGS provides fast, high-quality novel view synthesis, deforming a canonical 3DGS to multiple frames is a natural extension. However, previous works fail to reconstruct dynamic scenes accurately; in particular, 1) static parts move along with nearby dynamic parts, and 2) some dynamic regions are blurred. We attribute these failures to the design of the deformation field, which is built as a coordinate-based function. This approach is problematic because 3DGS is a mixture of multiple fields centered at the Gaussians, not a single coordinate-based framework. To solve this problem, we define the deformation as a function of per-Gaussian embeddings and temporal embeddings. We also decompose the deformation into coarse and fine components to model slow and fast motion, respectively. Finally, we introduce an efficient training strategy for faster convergence and higher quality.

2 Major contributions

  1. Each Gaussian is assigned a latent embedding that is used to predict its deformation;
  2. Temporal variation is decomposed into coarse and fine deformations, which model slow and fast motion in the scene, respectively;
  3. An efficient training strategy is proposed, comprising camera sampling, frame sampling, and a multi-view DSSIM loss, to improve convergence speed and performance.

3 Methods


Figure 2: Framework. Existing field-based deformation methods suffer from entanglement between nearby Gaussians because they map Gaussian coordinates to changes in the deformation parameters. To solve this problem, we define the deformation per Gaussian. First, we assign a latent embedding to each Gaussian. In addition, we introduce coarse and fine temporal embeddings to represent the slow and fast states of the dynamic scene. Two decoders model the final deformation: one takes the per-Gaussian latent embedding and the coarse temporal embedding and estimates slow or large changes, and the other takes the per-Gaussian latent embedding and the fine temporal embedding and estimates fast or detailed changes.

As shown in the figure, the proposed method has three key components:

Embedding-Based Deformation: The authors assign a 32-dimensional latent embedding to each Gaussian to encode how that Gaussian deforms over time. They also introduce a 256-dimensional temporal embedding that represents the state of each frame. A decoder takes the embedding of a Gaussian and the temporal embedding of the current frame as input and predicts the change in that Gaussian's parameters. Because the input contains no spatial coordinates, the deformation depends on the embedding rather than on where the Gaussian sits.
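
To make the data flow concrete, here is a minimal NumPy sketch of an embedding-based deformation decoder. It is an illustration under stated assumptions, not the authors' implementation: the embedding sizes (32 and 256) come from the text above, while the scene size, the decoder width, the random untrained weights, and the choice of outputs (position, rotation, and scale offsets) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_GAUSSIANS = 1000     # hypothetical scene size
NUM_FRAMES = 50          # hypothetical video length
Z_DIM, T_DIM = 32, 256   # embedding sizes quoted in the text
HIDDEN = 64              # hypothetical decoder width

# Learnable tables: one latent vector per Gaussian, one embedding per frame.
gaussian_emb = rng.normal(0.0, 0.01, (NUM_GAUSSIANS, Z_DIM))
time_emb = rng.normal(0.0, 0.01, (NUM_FRAMES, T_DIM))

# Small two-layer MLP decoder. The weights are random here; in practice
# they would be trained jointly with the embeddings.
W1 = rng.normal(0.0, 0.05, (Z_DIM + T_DIM, HIDDEN))
b1 = np.zeros(HIDDEN)
# Assumed outputs: 3 position + 4 rotation (quaternion) + 3 scale offsets.
W2 = rng.normal(0.0, 0.05, (HIDDEN, 10))
b2 = np.zeros(10)


def predict_deformation(frame_idx):
    """Predict per-Gaussian parameter offsets for one frame.

    The input is [per-Gaussian embedding, frame embedding]; no xyz
    coordinates are used, so neighbouring Gaussians are not forced to
    deform together.
    """
    t = np.broadcast_to(time_emb[frame_idx], (NUM_GAUSSIANS, T_DIM))
    x = np.concatenate([gaussian_emb, t], axis=1)
    h = np.maximum(x @ W1 + b1, 0.0)  # ReLU
    out = h @ W2 + b2
    return out[:, :3], out[:, 3:7], out[:, 7:10]


d_xyz, d_rot, d_scale = predict_deformation(frame_idx=3)
print(d_xyz.shape, d_rot.shape, d_scale.shape)
```

Two Gaussians at nearly the same location can hold different embeddings and therefore receive different offsets, which is the property a coordinate-based field cannot provide.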

Coarse-Fine Deformation: The authors decompose the temporal variation into a coarse deformation and a fine deformation. The coarse deformation represents large or slow motion in the scene, while the fine deformation learns the fast or detailed motion that the coarse deformation cannot cover. This is implemented with two deformation networks that operate at different temporal resolutions.
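
The sketch below shows one way the two temporal resolutions could be realized. It assumes, for illustration only, that the coarse temporal embedding is stored at every fifth frame and linearly interpolated in between, that the fine embedding is stored per frame, and that the two decoder outputs are summed. The names `COARSE_STEP`, `make_decoder`, and `position_offset` are hypothetical, and the weights are random rather than trained.

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_GAUSSIANS, NUM_FRAMES = 200, 40   # hypothetical sizes
Z_DIM, T_DIM = 32, 256                # embedding sizes quoted in the text
COARSE_STEP = 5                       # assumed: one coarse keyframe per 5 frames

gaussian_emb = rng.normal(0.0, 0.01, (NUM_GAUSSIANS, Z_DIM))
fine_time_emb = rng.normal(0.0, 0.01, (NUM_FRAMES, T_DIM))
# Enough coarse keyframes to interpolate up to the last frame.
num_coarse = (NUM_FRAMES - 1) // COARSE_STEP + 2
coarse_time_emb = rng.normal(0.0, 0.01, (num_coarse, T_DIM))


def coarse_embedding(frame_idx):
    """Linearly interpolate between the two nearest coarse keyframes."""
    pos = frame_idx / COARSE_STEP
    lo = int(np.floor(pos))
    w = pos - lo
    return (1.0 - w) * coarse_time_emb[lo] + w * coarse_time_emb[lo + 1]


def make_decoder(seed, out_dim=3, hidden=64):
    """Build a tiny untrained MLP mapping (embedding, time) to offsets."""
    r = np.random.default_rng(seed)
    w1 = r.normal(0.0, 0.05, (Z_DIM + T_DIM, hidden))
    w2 = r.normal(0.0, 0.05, (hidden, out_dim))

    def decode(z, t):
        t_rows = np.broadcast_to(t, (z.shape[0], t.shape[0]))
        x = np.concatenate([z, t_rows], axis=1)
        return np.maximum(x @ w1, 0.0) @ w2

    return decode


coarse_decoder = make_decoder(seed=1)
fine_decoder = make_decoder(seed=2)


def position_offset(frame_idx):
    """Return (total, coarse, fine) position offsets for one frame."""
    coarse = coarse_decoder(gaussian_emb, coarse_embedding(frame_idx))
    fine = fine_decoder(gaussian_emb, fine_time_emb[frame_idx])
    return coarse + fine, coarse, fine  # summation is an assumption


total, coarse, fine = position_offset(12)
print(total.shape)
```

Because the coarse embedding changes smoothly between keyframes, the coarse decoder can only express slowly varying motion, which leaves the per-frame fine decoder to account for the remaining fast changes.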

Efficient Training Strategy: To accelerate convergence, the authors propose an efficient training strategy that combines camera sampling for uniform coverage of the multi-view frames, error-based sampling of difficult training frames, and a multi-view DSSIM loss to improve reconstruction quality.
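
The two sampling ideas can be sketched as follows. This is a simplified illustration, not the paper's exact scheme: the helper names, the use of a running per-frame error as the sampling weight, and the `uniform_mix` term that keeps easy frames from being starved are all assumptions.

```python
import numpy as np


def camera_order(num_cameras, rng):
    """Shuffled pass that visits every camera exactly once."""
    return rng.permutation(num_cameras)


def sampling_probs(frame_errors, uniform_mix=0.2):
    """Turn per-frame errors into sampling probabilities.

    A share `uniform_mix` of the probability mass is spread uniformly so
    that low-error frames are still revisited occasionally.
    """
    errors = np.asarray(frame_errors, dtype=float)
    uniform = np.full(errors.size, 1.0 / errors.size)
    total = errors.sum()
    if total <= 0.0:
        return uniform  # no error information yet: sample uniformly
    return (1.0 - uniform_mix) * errors / total + uniform_mix * uniform


def sample_frames(frame_errors, batch_size, rng, uniform_mix=0.2):
    """Draw frame indices, favouring frames with high recent error."""
    probs = sampling_probs(frame_errors, uniform_mix)
    return rng.choice(probs.size, size=batch_size, p=probs)


rng = np.random.default_rng(0)
errors = [0.01, 0.01, 0.50, 0.01]   # hypothetical per-frame L1 errors
print(sample_frames(errors, batch_size=8, rng=rng))
print(camera_order(5, rng))
```

In a training loop, the error table would be refreshed periodically from the current renders so that the sampler keeps tracking whichever frames are hardest at that point.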

4 Experiments

The experiments section covers the authors' experimental validation and analysis, which includes the following aspects:

Experimental validation: The authors evaluate the method on the Neural 3D Video, Technicolor Light Field, and HyperNeRF datasets and compare it against corresponding baselines. The results show that the proposed method outperforms the baselines in dynamic-region reconstruction quality, detail capture, and computational efficiency.

Analysis: The authors analyze and visualize the roles of the coarse and fine deformation networks. The results show that the coarse deformation captures large movements, while the fine deformation captures small movements. The authors also conduct an ablation study, which shows that removing either the coarse or the fine deformation network reduces performance.

Efficient Training Strategy: The authors analyze the contribution of camera sampling, error-based frame sampling, and the multi-view DSSIM loss to the efficient training strategy. The results show that these components accelerate convergence and improve performance.
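
For readers unfamiliar with DSSIM, the sketch below computes a structural dissimilarity between grayscale images and averages it over several views. It makes several simplifying assumptions: a uniform 7x7 window instead of the Gaussian window used in most SSIM implementations, the common definition DSSIM = (1 - SSIM) / 2 (some codebases use 1 - SSIM), and a plain mean over views. How the paper aggregates views is not described in this summary, so `multi_view_dssim` is a hypothetical stand-in.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def ssim(img_a, img_b, window=7, data_range=1.0):
    """Mean SSIM of two 2D grayscale images using a uniform window."""
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    wa = sliding_window_view(img_a, (window, window))
    wb = sliding_window_view(img_b, (window, window))
    mu_a = wa.mean(axis=(-1, -2))
    mu_b = wb.mean(axis=(-1, -2))
    var_a = wa.var(axis=(-1, -2))
    var_b = wb.var(axis=(-1, -2))
    cov = (wa * wb).mean(axis=(-1, -2)) - mu_a * mu_b
    numerator = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    denominator = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    return float((numerator / denominator).mean())


def dssim(img_a, img_b):
    """Structural dissimilarity, close to 0 for identical images."""
    return (1.0 - ssim(img_a, img_b)) / 2.0


def multi_view_dssim(renders, targets):
    """Average DSSIM over a batch of (render, ground truth) view pairs."""
    return float(np.mean([dssim(r, t) for r, t in zip(renders, targets)]))


rng = np.random.default_rng(0)
target = rng.random((32, 32))
noisy = np.clip(target + rng.normal(0.0, 0.2, target.shape), 0.0, 1.0)
print(dssim(target, target), dssim(target, noisy))
```

Unlike a per-pixel L1 term, this loss compares local means, variances, and covariances, so it penalizes renders whose local structure differs from the ground truth even when the average color is right.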

Limitations: The authors discuss the limitations of the method, such as the blurring of fast-moving regions, and suggest directions for improvement.


5 Summary

This paper proposes a per-Gaussian embedding-based deformation method for dynamic 3D Gaussian Splatting (3DGS). Experimental results show that the proposed method performs well in dynamic-region reconstruction, detail capture, and challenging camera settings, and that it outperforms multiple baseline methods.
