
Only 3GB! 2ms! Two images can reconstruct the entire 3D Gaussian scene!

Author: 3D Vision Workshop

Source: 3D Vision Workshop

Add WeChat: dddvision, note: 3D GS, and you will be added to the group. Industry subgroups are listed at the end of the article.

0. Foreword

Today we recommend a new work in the 3D GS direction, pixelSplat, which reconstructs a 3D radiance field parameterized by 3D Gaussian primitives from just two images and performs novel view synthesis.

Let's read about this work together~

1. Paper information

Title: pixelSplat: 3D Gaussian Splats from Image Pairs for Scalable Generalizable 3D Reconstruction

Authors: David Charatan, Sizhe Li, Andrea Tagliasacchi, Vincent Sitzmann

Institutions: Massachusetts Institute of Technology, Simon Fraser University, University of Toronto

Original link: https://arxiv.org/abs/2312.12337

Code link: https://github.com/dcharatan/pixelsplat

Official website: https://dcharatan.github.io/pixelsplat

2. Abstract

We introduce pixelSplat, a feed-forward model that learns to reconstruct 3D radiance fields parameterized by 3D Gaussian primitives from pairs of images. Our model features real-time and memory-efficient rendering for scalable training, as well as fast 3D reconstruction at inference time. To overcome the local minima inherent to sparse and locally supported representations, we predict a dense probability distribution over 3D space and sample Gaussian means from that distribution. We make this sampling operation differentiable via a reparameterization trick, allowing us to backpropagate gradients through the Gaussian splatting representation. We benchmark our method on wide-baseline novel view synthesis on the real-world RealEstate10k and ACID datasets, where we outperform state-of-the-art light field transformers and accelerate rendering by 2.5 orders of magnitude while reconstructing an interpretable and editable 3D radiance field.
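To make the "differentiable sampling" idea concrete, here is a minimal PyTorch sketch (a hypothetical helper, not the authors' code). It assumes uniformly spaced depth buckets between hypothetical near/far planes and omits the per-bucket depth offsets, but shows how setting the opacity to the probability of the sampled bucket lets gradients reach the depth distribution even though the discrete draw itself carries no gradient.

```python
import torch

def sample_depth_reparameterized(depth_logits, near=1.0, far=100.0):
    # Hypothetical sketch: sample a per-pixel depth from a discrete
    # distribution over depth buckets while keeping the result differentiable.
    # depth_logits: (N, Z) unnormalized per-pixel scores over Z depth buckets.
    probs = torch.softmax(depth_logits, dim=-1)           # (N, Z) per-pixel depth distribution
    idx = torch.multinomial(probs, num_samples=1)         # (N, 1) sampled bucket index (non-differentiable draw)
    opacity = torch.gather(probs, dim=-1, index=idx)      # (N, 1) probability of the sampled bucket;
                                                          # gradients flow to the logits through this value
    bucket_width = (far - near) / depth_logits.shape[-1]
    depth = near + (idx.float() + 0.5) * bucket_width     # (N, 1) bucket-center depth (offsets omitted)
    return depth.squeeze(-1), opacity.squeeze(-1)

# Usage sketch: losses on rendered opacity/color backpropagate into depth_logits.
depth, alpha = sample_depth_reparameterized(torch.randn(4, 64, requires_grad=True))
alpha.sum().backward()
```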

3. Results

Given a pair of input images, pixelSplat reconstructs a 3D radiance field parameterized by 3D Gaussian primitives. The result is an explicit 3D representation that renders in real time, remains editable, and is inexpensive to train.


Predicted 3D Gaussians (top) and the corresponding depth maps (bottom).


4. Baseline methods

The authors mainly compared with the following baselines:

The method of Du et al. (https://yilundu.github.io/wide_baseline/): a light field renderer designed for wide-baseline novel view synthesis.

GPNR: a light field transformer that only handles two input views.

pixelNeRF: a well-known NeRF-based approach that struggles on scene-scale datasets because it cannot handle scale ambiguity.

5. How does it work?

Probabilistic prediction of pixel-aligned Gaussians. For each pixel feature f[u] in the input feature map, a neural network predicts the Gaussian primitive's covariance Σ and spherical harmonics coefficients S. The Gaussian position μ and opacity α are not predicted directly, since doing so leads to local minima. Instead, the network predicts a per-pixel discrete probability distribution over depth, pφ(z), parameterized by φ; a depth sampled from this distribution determines the position of the Gaussian primitive, and the opacity of each Gaussian is set to the probability of the sampled depth bucket. The final set of Gaussian primitives can then be rendered from novel views using the splatting algorithm of Kerbl et al.
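For intuition, the sketch below (a hypothetical helper using a standard pinhole-camera convention, not code from the official repository) shows how a pixel and its sampled depth could be unprojected into the mean of a pixel-aligned Gaussian; the predicted covariance Σ, spherical harmonics S, and the opacity taken from the depth-bucket probability would then be attached to each mean before splatting.

```python
import torch

def unproject_to_gaussian_means(pixels, depths, K, cam_to_world):
    # Hypothetical sketch: turn per-pixel sampled depths into world-space
    # means of pixel-aligned Gaussians.
    # pixels:       (N, 2) pixel coordinates (u, v)
    # depths:       (N,)   depths sampled from the per-pixel distribution
    # K:            (3, 3) pinhole intrinsics
    # cam_to_world: (4, 4) camera-to-world transform
    ones = torch.ones(pixels.shape[0], 1)
    pix_h = torch.cat([pixels, ones], dim=-1)            # (N, 3) homogeneous pixel coordinates
    dirs_cam = pix_h @ torch.inverse(K).T                # (N, 3) camera-space ray directions
    pts_cam = dirs_cam * depths.unsqueeze(-1)            # (N, 3) points at the sampled depths
    pts_h = torch.cat([pts_cam, ones], dim=-1)           # (N, 4)
    means_world = (pts_h @ cam_to_world.T)[:, :3]        # (N, 3) world-space Gaussian means
    return means_world
```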


6. Comparison with other SOTA methods

Quantitative comparison. pixelSplat outperforms all baseline methods in PSNR, LPIPS, and SSIM for novel view synthesis on the real-world RealEstate10k and ACID datasets. In addition, pixelSplat requires less memory during inference and training, and renders images about 650 times faster than the second-fastest baseline. The Memory column reports memory usage for a single scene with 256 × 256 rays.


Qualitative comparison of novel views on the RealEstate10k (top) and ACID (bottom) test sets. Compared to the baselines, pixelSplat not only produces more accurate and visually appealing images, but also generalizes better to out-of-distribution examples.


7. Summary

This work introduces pixelSplat, a primitive-based parameterization for reconstructing the 3D radiance field of a scene from only two images. At inference time, pixelSplat produces an explicit 3D representation of the scene significantly faster than prior work on generalizable novel view synthesis. To address the local-minima problem in primitive-based function regression, the authors introduce a new way of parameterizing primitive positions via a dense probability distribution, together with a new reparameterization technique for backpropagating gradients into the distribution parameters.

Readers who are interested in more experimental results and details of the article can read the original paper~

Here is an introduction to the latest course from the 3D Vision Workshop, "New SLAM Algorithms Based on NeRF/Gaussian":

  • This course starts from both theory and code implementation, taking you from scratch through the principles of NeRF/Gaussian-based SLAM, reading the papers, and working through the code.
  • On the theory side, it starts from linear algebra and traditional computer graphics to explain the theoretical foundations and origins of modern 3D reconstruction.
  • On the code side, a series of hands-on exercises teaches you to reproduce computer graphics and NeRF-related work.
