
New work from USTC | 3D Gaussian-based multimodal fusion for localization and reconstruction in unbounded scenes

Source: 3D Vision Workshop

Author: Chenyang Wu | Editor: Computer Vision Workshop



Title: MM-Gaussian: 3D Gaussian-based Multi-modal Fusion for Localization and Reconstruction in Unbounded Scene

Link: https://arxiv.org/pdf/2404.04026.pdf

1. Introduction

In this paper, we introduce MM-Gaussian, a multi-sensor fusion SLAM method for localization and reconstruction in unbounded scenes. The method uses a Livox solid-state LiDAR and a camera to capture scene data, and builds the map as a 3D Gaussian point cloud that can simultaneously render high-quality images. Four main modules are described: tracking, relocalization, map expansion, and map updating, with a dedicated relocalization module designed to correct trajectory drift caused by localization failures. Experimental results show that the proposed method outperforms existing 3D Gaussian-based SLAM methods in both localization and mapping. Overall, this work achieves high-precision localization and map construction in unbounded scenes through multi-sensor fusion and exhibits strong robustness.


2. What are the main components of MM-Gaussian?


According to the paper, the MM-Gaussian system consists of the following four main components (a minimal per-frame pipeline sketch follows the list):

Tracking: uses a point-cloud registration algorithm to obtain an initial pose estimate, then refines it by comparing the rendered image with the captured image.

Relocalization: detects tracking failures and exploits the ability of 3D Gaussians to render images in order to bring the pose back onto the correct trajectory.

Map Expansion: converts the point cloud of the current frame into 3D Gaussians and adds them to the map to extend its coverage.

Map Updating: optimizes the attributes of the 3D Gaussians against a sequence of image keyframes for better rendering quality.
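
To make the interplay of these modules concrete, here is a minimal per-frame loop sketch in Python. Everything in it is hypothetical: the helper functions (track, relocalize, expand_map, update_map), the frame and state fields, and the threshold value do not come from the paper and only illustrate how the four components could fit together.

```python
# Minimal per-frame pipeline sketch for an MM-Gaussian-style system.
# All helper names and data fields below are hypothetical placeholders,
# not the authors' API.

THETA_FAIL = 0.5  # assumed tracking-failure loss threshold (the paper's theta_fail)

def process_frame(frame, gaussian_map, state):
    """Run one frame through tracking, optional relocalization, and mapping."""
    # 1. Tracking: point-cloud registration gives an initial pose estimate,
    #    which is refined by comparing the rendered image with the captured one.
    pose, loss = track(frame.lidar_points, frame.image, gaussian_map, state.last_pose)

    # 2. Relocalization: a large rendering loss signals tracking failure.
    if loss > THETA_FAIL:
        pose = relocalize(frame, gaussian_map, state)
        if pose is None:
            return state  # still lost: discard this frame and wait for the next one

    # 3. Map expansion: lift the current LiDAR points into new 3D Gaussians.
    expand_map(gaussian_map, frame.lidar_points, frame.image, pose)

    # 4. Map updating: optimize Gaussian attributes against selected keyframes.
    update_map(gaussian_map, state.keyframes, frame, pose)

    state.last_pose = pose
    return state
```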

3. How does the relocalization module in the MM-Gaussian system work?


The main workflow is as follows:

Tracking failure detection: the loss is computed for each frame and compared with a preset threshold θfail; when the loss exceeds the threshold, the system enters the tracking-failure state.

Get reference pose: when tracking fails, the system rolls back m frames and takes the camera pose of that earlier frame as the reference pose.

"Look-around" operation: the translation part of the reference pose is fixed, and the rotation part is sampled uniformly to generate n new poses.

Render images from the sampled poses: the corresponding RGB, depth, and contour images are rendered from the n new poses.

Feature extraction and matching: SuperPoint extracts features from the current frame, which are matched against the n rendered RGB images; the image with the most matches, provided the count exceeds the threshold θfeature, is selected as the candidate.

PnP solving: using the camera pose of the candidate image, the rendered depth map is back-projected into 3D space, and the feature correspondences are used to solve for the camera pose of the current frame.

Re-render and evaluate: RGB, depth, and contour images are re-rendered from the estimated current-frame pose and the loss is computed; if the loss is below the threshold θfail, relocalization is considered successful.

Resume tracking and mapping: the tracking, map expansion, and map updating modules resume work, and the data collected during the tracking failure is discarded.

In summary, the relocalization module finds the correct trajectory through the reference pose and the look-around operation, and uses rendered images and feature matching to recover tracking, thereby improving the robustness of the system. A Python sketch of this workflow is given below.
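
The sketch below illustrates this workflow under explicit assumptions: ORB features stand in for SuperPoint, `render(pose)` is an assumed hook into the 3D Gaussian map that returns an 8-bit RGB image and a depth map, `render_loss(pose, image)` is an assumed hook returning the rendering loss, poses are 4x4 camera-to-world matrices, and the threshold values are placeholders rather than the paper's settings.

```python
# Illustrative reconstruction of the relocalization workflow; not the authors' code.
import cv2
import numpy as np

THETA_FEATURE = 50    # assumed minimum number of feature matches
THETA_FAIL = 0.5      # assumed rendering-loss threshold

def look_around_poses(ref_pose, n=8):
    """Fix the translation of the reference pose and uniformly sample n yaw rotations."""
    poses = []
    for yaw in np.linspace(0.0, 2.0 * np.pi, n, endpoint=False):
        c, s = np.cos(yaw), np.sin(yaw)
        R_yaw = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
        pose = ref_pose.copy()
        pose[:3, :3] = R_yaw @ ref_pose[:3, :3]   # rotate about the vertical axis
        poses.append(pose)                        # translation stays fixed
    return poses

def relocalize(cur_image, ref_pose, K, render, render_loss):
    orb = cv2.ORB_create(2000)                    # stand-in for SuperPoint
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    gray_cur = cv2.cvtColor(cur_image, cv2.COLOR_BGR2GRAY)
    kp_cur, des_cur = orb.detectAndCompute(gray_cur, None)

    # Pick the rendered view with the most matches above the feature threshold.
    best = None
    for pose in look_around_poses(ref_pose):
        rgb, depth = render(pose)
        kp_r, des_r = orb.detectAndCompute(cv2.cvtColor(rgb, cv2.COLOR_BGR2GRAY), None)
        if des_r is None:
            continue
        matches = matcher.match(des_cur, des_r)
        if len(matches) >= THETA_FEATURE and (best is None or len(matches) > len(best[3])):
            best = (pose, kp_r, depth, matches)
    if best is None:
        return None                               # relocalization failed

    pose, kp_r, depth, matches = best
    # Back-project matched rendered pixels into 3D world points via the depth map.
    obj_pts, img_pts = [], []
    K_inv = np.linalg.inv(K)
    for m in matches:
        u, v = kp_r[m.trainIdx].pt
        z = depth[int(v), int(u)]
        if z <= 0:
            continue
        p_cam = z * (K_inv @ np.array([u, v, 1.0]))
        p_world = pose[:3, :3] @ p_cam + pose[:3, 3]
        obj_pts.append(p_world)
        img_pts.append(kp_cur[m.queryIdx].pt)
    if len(obj_pts) < 4:
        return None

    # PnP with RANSAC recovers the current camera pose from 2D-3D correspondences.
    ok, rvec, tvec, _ = cv2.solvePnPRansac(
        np.asarray(obj_pts, dtype=np.float64), np.asarray(img_pts, dtype=np.float64), K, None)
    if not ok:
        return None
    R_wc, _ = cv2.Rodrigues(rvec)                 # world-to-camera rotation
    new_pose = np.eye(4)
    new_pose[:3, :3] = R_wc.T                     # convert back to camera-to-world
    new_pose[:3, 3] = (-R_wc.T @ tvec).ravel()

    # Accept only if re-rendering from the recovered pose gives a low loss.
    return new_pose if render_loss(new_pose, cur_image) < THETA_FAIL else None
```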

4. What is the goal of the mapping phase in the MM-Gaussian system?

In the MM-Gaussian system, the goal of the mapping phase is to update the attributes of the 3D Gaussians to achieve more realistic rendering. Specifically, the mapping phase performs the following operations:

Select keyframes: the k−2 keyframes most relevant to the current frame are selected from the keyframe sequence, together with the current frame and the latest keyframe, for optimization.

Render images: RGB images are rendered from the selected keyframe poses.

Compute the loss: the loss between each rendered image and the corresponding input image is computed.

Optimize Gaussian attributes: the color, opacity, and other attributes of the 3D Gaussians are optimized by gradient descent using the Adam optimizer.

Remove invalid Gaussians: after optimization, Gaussians whose opacity is too low or whose radius is too large are removed.

Refine surface details: new Gaussians are generated by duplicating existing ones according to their gradients, refining the representation of object surfaces.

Through these operations, the mapping phase continuously optimizes the attributes of the 3D Gaussians to achieve high-quality image rendering; a minimal sketch of one mapping iteration is given below.
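
As a rough illustration, here is a minimal PyTorch sketch of one mapping iteration. It is a sketch under assumptions, not the paper's implementation: `render_fn` is an assumed differentiable 3D Gaussian rasterizer supplied by the caller, the `gaussians` container interface (opacity, scales, prune, densify_by_gradient) and the keyframe relatedness measure are hypothetical, and the loss uses only an L1 photometric term with placeholder pruning thresholds.

```python
# Minimal sketch of one mapping-phase iteration; interface names are hypothetical.
import torch

def mapping_step(gaussians, keyframes, current_frame, optimizer, render_fn,
                 k=5, min_opacity=0.05, max_radius=0.5):
    # Keyframe selection: current frame, latest keyframe, and the k-2 keyframes
    # most related to the current frame (relatedness measure left to the caller).
    related = sorted(keyframes[:-1],
                     key=lambda f: f.overlap_with(current_frame),
                     reverse=True)[:k - 2]
    batch = [current_frame, keyframes[-1]] + related

    # Photometric L1 loss between the rendered and captured images.
    # (A structural term such as SSIM could be added; omitted in this sketch.)
    loss = 0.0
    for frame in batch:
        rendered = render_fn(gaussians, frame.pose)   # differentiable rendering
        loss = loss + torch.abs(rendered - frame.image).mean()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                                  # Adam step on Gaussian attributes

    with torch.no_grad():
        # Remove Gaussians that are nearly transparent or have grown too large.
        keep = (gaussians.opacity.squeeze() > min_opacity) & \
               (gaussians.scales.max(dim=1).values < max_radius)
        gaussians.prune(~keep)
        # Densify: duplicate Gaussians with large positional gradients to refine
        # surface detail, as in standard 3D Gaussian Splatting.
        gaussians.densify_by_gradient()
    return loss.item()
```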

5. Experiments

The experiments mainly cover the following:

Experimental setup: the authors used a data-acquisition rig consisting of a Livox AVIA LiDAR and an MV-CS050-10UC camera, and collected 9 sequences in campus scenes. All experiments used the parameter settings in Table III. In addition, the authors used the R3LIVE system to obtain preliminary ground-truth poses, which were further refined with HBA for evaluation.

Evaluation metrics: the root mean square error of the absolute trajectory error (ATE RMSE) is used to evaluate tracking, while peak signal-to-noise ratio (PSNR), structural similarity (SSIM), and LPIPS are used to evaluate mapping; a small metric-computation sketch follows this list.

Quantitative comparison: in the tracking phase, the proposed method is compared with SplaTAM, MonoGS, NeRF-LOAM, and other methods; in the mapping phase, it is compared with SplaTAM, MonoGS, 3D Gaussian Splatting, and others. The results show that the proposed method achieves the best mapping quality on all sequences.

Qualitative comparison: the authors also qualitatively compared the proposed method with SplaTAM; the results show that the proposed method renders clearer images and better captures the details of object surfaces.

Effect of the relocalization module: an ablation study of the relocalization module shows that it successfully restores the pose to the correct trajectory.
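
For reference, here is a small Python sketch of the two most common of these metrics under their standard definitions; it assumes the estimated trajectory has already been aligned to the ground truth (e.g., with a Umeyama alignment), and SSIM/LPIPS would come from existing libraries such as scikit-image or lpips.

```python
# Standard definitions of ATE RMSE and PSNR; trajectory alignment is assumed done.
import numpy as np

def ate_rmse(est_positions, gt_positions):
    """Root mean square of the absolute trajectory error over aligned positions (N x 3)."""
    err = np.linalg.norm(est_positions - gt_positions, axis=1)
    return float(np.sqrt(np.mean(err ** 2)))

def psnr(rendered, reference, max_val=1.0):
    """Peak signal-to-noise ratio between a rendered image and its reference."""
    mse = np.mean((rendered - reference) ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(max_val ** 2 / mse))
```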


6. Conclusion

Localization and mapping are mission-critical for applications such as autonomous vehicles and robotics, and the unbounded nature of outdoor environments makes them particularly challenging. In this work, we propose MM-Gaussian, a LiDAR-camera multimodal fusion system for localization and mapping in unbounded scenes. Our approach is inspired by the recently developed 3D Gaussian Splatting, which exhibits an extraordinary ability to achieve high rendering quality at fast rendering speeds. Specifically, our system leverages the geometric information provided by the solid-state LiDAR to address the depth inaccuracies encountered by purely visual solutions in unbounded outdoor scenes. In addition, we use 3D Gaussian point clouds to make full use of the color information in the images through pixel-level gradient descent, achieving realistic rendering results. To further enhance the robustness of the system, we design a relocalization module that helps return to the correct trajectory in the event of a localization failure. Experiments conducted in a variety of scenarios demonstrate the effectiveness of our method.

