Real-Time 400 FPS! High-Precision NeRF/Gaussian SLAM Localization and Mapping

Author: 3D Vision Workshop

What is SLAM?

SLAM (Simultaneous Localization and Mapping) allows robots, drones, and other autonomous systems to localize themselves and build a map of their surroundings at the same time while operating in an unknown environment.

Why NeRF-Based SLAM?

Traditional computer graphics (CG) uses geometry to reproject the input images into a novel camera view. In many cases this reconstructs the map quite well, but regions of the map that were never observed are difficult to reconstruct and restore in 3D.

Deep learning has been used in reconstruction for a long time. Volumetric representations were proposed by Soft3D, and later work combined deep learning with volumetric ray-marching, a geometry representation based on a continuous, differentiable density field.

Introducing importance sampling and positional encoding into the neural radiance field significantly improved the quality of 3D reconstruction. NeRF's neural rendering also greatly reduces the artifacts produced by traditional 3D reconstruction, and in most cases its results are better than those of traditional algorithms. At the time of writing, Mip-NeRF 360 delivers the best image quality.
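
To make positional encoding concrete, here is a minimal sketch, not the course's own code. It assumes the sinusoidal form used in the original NeRF paper, where each coordinate is mapped to sines and cosines at doubling frequencies. The function names and the default of 10 frequency bands are illustrative choices, and real implementations often differ in details (for example, whether the factor of π or the raw input is included).

```python
import math


def positional_encoding(p, num_freqs=10):
    """Encode one scalar coordinate as 2 * num_freqs sinusoidal features.

    Feature pair k is (sin(2^k * pi * p), cos(2^k * pi * p)).
    """
    features = []
    for k in range(num_freqs):
        freq = (2.0 ** k) * math.pi
        features.append(math.sin(freq * p))
        features.append(math.cos(freq * p))
    return features


def encode_point(xyz, num_freqs=10):
    """Encode a 3D point by concatenating the encoding of each coordinate."""
    encoded = []
    for coord in xyz:
        encoded.extend(positional_encoding(coord, num_freqs))
    return encoded


# Hypothetical usage: a 3D point becomes a 3 * 2 * 10 = 60-dimensional input.
features = encode_point((0.1, 0.2, 0.3), num_freqs=10)
print(len(features))
```

The higher-frequency features let a small MLP represent fine detail that it would otherwise smooth over when fed raw coordinates.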

In addition, bringing SLAM into a deep learning pipeline makes it easier to unify all algorithms in a single framework, simplifies data exchange between different algorithms, and eases collaboration with upstream and downstream teams. For example, the reconstructed map can be semantically annotated and used to train BEV perception, or it can be used to generate an occupancy grid that is handed to the planning and control team for path planning and agent control.

Why Gaussian-Based SLAM?

NeRF-based SLAM algorithms use a global map and an image reconstruction loss to capture dense photometric information with high fidelity through differentiable rendering. However, modeling a scene with an implicit neural representation leads to several problems:

  • Querying the scene (that is, rendering along rays) requires a large number of samples, so rendering is expensive
  • A large multi-layer MLP is used, which is computationally heavy and occupies a lot of memory
  • The scene is not easy to edit
  • Spatial geometry cannot be modeled explicitly
  • The network is prone to "forgetting" regions it has already mapped
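
The sampling cost in the first point can be illustrated with a minimal sketch of the standard volume rendering quadrature used by NeRF-style methods. This is an illustration under stated assumptions, not any particular system's code: `render_ray` is a hypothetical name, and the densities and colors are passed in as plain lists, whereas in a real NeRF each of them would come from one MLP query per sample.

```python
import math


def render_ray(densities, colors, deltas):
    """Composite samples along one ray, front to back.

    densities: density sigma_i at each sample (one MLP query each in NeRF)
    colors:    (r, g, b) at each sample
    deltas:    distance between consecutive samples
    Returns the pixel color and the transmittance left behind the last sample.
    """
    transmittance = 1.0  # fraction of light not yet absorbed
    pixel = [0.0, 0.0, 0.0]
    for sigma, color, delta in zip(densities, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this segment
        weight = transmittance * alpha
        for ch in range(3):
            pixel[ch] += weight * color[ch]
        transmittance *= 1.0 - alpha
    return pixel, transmittance


# Hypothetical usage: three samples on one ray.
pixel, remaining = render_ray(
    densities=[0.5, 2.0, 4.0],
    colors=[(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)],
    deltas=[0.1, 0.1, 0.1],
)
print(pixel, remaining)
```

Every pixel needs one such loop, so an image of H × W pixels with N samples per ray costs on the order of H × W × N network queries. That is the expense the list above refers to.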

SLAM is often deployed on robots, where performance is critical. A series of papers have addressed the quality and performance of NeRF reconstruction, and SLAM based on 3D Gaussian radiance fields offers the following benefits:

  • Fast rendering and rich optimization: Gaussian Splatting can render at up to 400 FPS, so it is faster to visualize and optimize than implicit representations.
  • Mapping with a clear spatial extent: the spatial boundary of the existing map can be controlled by adding Gaussians only in the parts of the scene that have already been observed. Given a new image frame, rendering a silhouette identifies which parts of the scene are new (outside the map's spatial boundary). This matters for tracking, because we only want to compare the already-mapped part of the scene against the new frame. Implicit representations handle this poorly: when unknown regions are mapped and optimized, the global optimization updates the whole neural network.
  • Explicit maps: map capacity can be increased arbitrarily by adding more Gaussians. The explicit representation also lets us edit parts of the scene while still rendering realistically. Implicit methods cannot easily increase their capacity or edit the scene they represent.
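
The silhouette idea in the second point can be sketched as follows. This is a simplified illustration, not the implementation of any specific paper: it assumes the Gaussians have already been projected to the image plane as isotropic 2D blobs, and the names `silhouette` and `unmapped_pixels`, the dictionary keys, and the 0.5 threshold are all hypothetical.

```python
import math


def silhouette(pixel, gaussians):
    """Accumulated opacity at a pixel, between 0 (no map) and 1 (fully covered).

    gaussians: list of dicts with 'center' (u, v), 'radius', and 'opacity',
    describing Gaussians already projected to 2D (an assumption of this sketch).
    """
    u, v = pixel
    transmittance = 1.0
    for g in gaussians:
        du = u - g["center"][0]
        dv = v - g["center"][1]
        falloff = math.exp(-(du * du + dv * dv) / (2.0 * g["radius"] ** 2))
        alpha = g["opacity"] * falloff
        transmittance *= 1.0 - alpha
    return 1.0 - transmittance


def unmapped_pixels(width, height, gaussians, threshold=0.5):
    """Return the (u, v) pixels whose silhouette is below the threshold."""
    new_pixels = []
    for v in range(height):
        for u in range(width):
            if silhouette((u, v), gaussians) < threshold:
                new_pixels.append((u, v))
    return new_pixels


# Hypothetical usage: one Gaussian covering the middle of a 4 x 4 image.
existing_map = [{"center": (2.0, 2.0), "radius": 1.0, "opacity": 0.9}]
print(unmapped_pixels(4, 4, existing_map))
```

In this sketch, tracking would compare the new frame only at pixels that are not returned by `unmapped_pixels`, and mapping would add new Gaussians at the pixels that are returned, which is also how the map's capacity grows.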

About the Speaker

[Image: speaker introduction]

Course outline

[Image: course outline]

Course Highlights:

  • This course covers both theory and code implementation, taking you from scratch through the principles of NeRF/Gaussian-based SLAM, reading papers, and working through code.
  • At the theoretical level, it proceeds from linear algebra to traditional computer graphics so that you understand the theoretical foundations and origins of modern 3D reconstruction.
  • At the code level, a number of exercises teach you to reproduce computer graphics and NeRF-related work.

What you will learn

  • Get started in the field of NeRF/Gaussian-based SLAM
  • Learn how to quickly identify the key points and innovations of a paper
  • Learn how to quickly get a paper's code running and use the code to grasp the paper's ideas
  • Parse the NeRF code line by line, understand every implementation detail, and reproduce and improve it by hand

Course requirements

  • System requirements: Linux
  • Programming language: Python
  • Prerequisites: basic Python and PyTorch

Who this course is for

  • Newcomers who do not know where to start with the open-source code of a new paper
  • Beginners in SLAM localization and mapping or NeRF 3D reconstruction
  • Practitioners working on 3D reconstruction who want a reference
  • First-time readers of NeRF papers
  • Students who are interested in SLAM and NeRF

Course Q&A

Questions about this course are mainly answered in the course's dedicated "goose circle" community. Students can post questions there at any time while studying.

▲ Add the assistant cv3d007 for more information

Note: Some of the images and videos above come from the Internet. If they infringe your rights, please contact us to have them removed.