
New work from USTC | 3D Gaussian-based multimodal fusion for localization and reconstruction in unbounded scenes

Source: 3D Vision Workshop

Author: Chenyang Wu | Editor: Computer Vision Workshop



Title: MM-Gaussian: 3D Gaussian-based Multi-modal Fusion for Localization and Reconstruction in Unbounded Scene

Link: https://arxiv.org/pdf/2404.04026.pdf

1. Introduction

In this paper, we introduce MM-Gaussian, a multi-sensor fusion SLAM method for localization and reconstruction in unbounded scenes. The method uses a Livox solid-state LiDAR and a camera to capture scene data, and builds the map as a 3D Gaussian point cloud that can simultaneously render high-quality images. Four main modules are described: tracking, relocalization, map expansion, and map updating, with a dedicated relocalization module designed to correct trajectory drift caused by localization failures. Experimental results show that the proposed method outperforms existing 3D Gaussian-based SLAM methods in both localization and mapping. Overall, this work achieves high-precision localization and map construction in unbounded scenes through multi-sensor fusion and exhibits strong robustness.


2. What are the main components of MM-Gaussian?


According to the paper, the MM-Gaussian system consists of the following four main components (a minimal per-frame pipeline sketch follows the list):

Tracking: uses a point-cloud registration algorithm to obtain an initial pose estimate, then refines it by comparing the rendered image with the captured image.

Relocalization: detects tracking failures and exploits the ability of 3D Gaussians to render images in order to bring the pose back onto the correct trajectory.

Map Expansion: converts the point cloud of the current frame into 3D Gaussians and adds them to the map to extend its coverage.

Map Updating: optimizes the attributes of the 3D Gaussians against a sequence of image keyframes for better rendering quality.
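
To make the interplay of these modules concrete, here is a minimal per-frame loop sketch in Python. Everything in it is hypothetical: the helper functions (track, relocalize, expand_map, update_map), the frame and state fields, and the threshold value do not come from the paper and only illustrate how the four components could fit together.

```python
# Minimal per-frame pipeline sketch for an MM-Gaussian-style system.
# All helper names and data fields below are hypothetical placeholders,
# not the authors' API.

THETA_FAIL = 0.5  # assumed tracking-failure loss threshold (the paper's theta_fail)

def process_frame(frame, gaussian_map, state):
    """Run one frame through tracking, optional relocalization, and mapping."""
    # 1. Tracking: point-cloud registration gives an initial pose estimate,
    #    which is refined by comparing the rendered image with the captured one.
    pose, loss = track(frame.lidar_points, frame.image, gaussian_map, state.last_pose)

    # 2. Relocalization: a large rendering loss signals tracking failure.
    if loss > THETA_FAIL:
        pose = relocalize(frame, gaussian_map, state)
        if pose is None:
            return state  # still lost: discard this frame and wait for the next one

    # 3. Map expansion: lift the current LiDAR points into new 3D Gaussians.
    expand_map(gaussian_map, frame.lidar_points, frame.image, pose)

    # 4. Map updating: optimize Gaussian attributes against selected keyframes.
    update_map(gaussian_map, state.keyframes, frame, pose)

    state.last_pose = pose
    return state
```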

3. How does the relocalization module in the MM-Gaussian system work?


The main workflow is as follows:

Tracking failure detection: the loss is computed for each frame and compared with a preset threshold θfail; when the loss exceeds the threshold, the system enters the tracking-failure state.

Get reference pose: when tracking fails, the system rolls back m frames and takes the camera pose of that earlier frame as the reference pose.

"Look-around" operation: the translation part of the reference pose is fixed, and the rotation part is sampled uniformly to generate n new poses.

Render images from the sampled poses: the corresponding RGB, depth, and contour images are rendered from the n new poses.

Feature extraction and matching: SuperPoint extracts features from the current frame, which are matched against the n rendered RGB images; the image with the most matches, provided the count exceeds the threshold θfeature, is selected as the candidate.

PnP solving: using the camera pose of the candidate image, the rendered depth map is back-projected into 3D space, and the feature correspondences are used to solve for the camera pose of the current frame.

Re-render and evaluate: RGB, depth, and contour images are re-rendered from the estimated current-frame pose and the loss is computed; if the loss is below the threshold θfail, relocalization is considered successful.

Resume tracking and mapping: the tracking, map expansion, and map updating modules resume work, and the data collected during the tracking failure is discarded.

In summary, the relocalization module finds the correct trajectory through the reference pose and the look-around operation, and uses rendered images and feature matching to recover tracking, thereby improving the robustness of the system. A Python sketch of this workflow is given below.
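
The sketch below illustrates this workflow under explicit assumptions: ORB features stand in for SuperPoint, `render(pose)` is an assumed hook into the 3D Gaussian map that returns an 8-bit RGB image and a depth map, `render_loss(pose, image)` is an assumed hook returning the rendering loss, poses are 4x4 camera-to-world matrices, and the threshold values are placeholders rather than the paper's settings.

```python
# Illustrative reconstruction of the relocalization workflow; not the authors' code.
import cv2
import numpy as np

THETA_FEATURE = 50    # assumed minimum number of feature matches
THETA_FAIL = 0.5      # assumed rendering-loss threshold

def look_around_poses(ref_pose, n=8):
    """Fix the translation of the reference pose and uniformly sample n yaw rotations."""
    poses = []
    for yaw in np.linspace(0.0, 2.0 * np.pi, n, endpoint=False):
        c, s = np.cos(yaw), np.sin(yaw)
        R_yaw = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
        pose = ref_pose.copy()
        pose[:3, :3] = R_yaw @ ref_pose[:3, :3]   # rotate about the vertical axis
        poses.append(pose)                        # translation stays fixed
    return poses

def relocalize(cur_image, ref_pose, K, render, render_loss):
    orb = cv2.ORB_create(2000)                    # stand-in for SuperPoint
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    gray_cur = cv2.cvtColor(cur_image, cv2.COLOR_BGR2GRAY)
    kp_cur, des_cur = orb.detectAndCompute(gray_cur, None)

    # Pick the rendered view with the most matches above the feature threshold.
    best = None
    for pose in look_around_poses(ref_pose):
        rgb, depth = render(pose)
        kp_r, des_r = orb.detectAndCompute(cv2.cvtColor(rgb, cv2.COLOR_BGR2GRAY), None)
        if des_r is None:
            continue
        matches = matcher.match(des_cur, des_r)
        if len(matches) >= THETA_FEATURE and (best is None or len(matches) > len(best[3])):
            best = (pose, kp_r, depth, matches)
    if best is None:
        return None                               # relocalization failed

    pose, kp_r, depth, matches = best
    # Back-project matched rendered pixels into 3D world points via the depth map.
    obj_pts, img_pts = [], []
    K_inv = np.linalg.inv(K)
    for m in matches:
        u, v = kp_r[m.trainIdx].pt
        z = depth[int(v), int(u)]
        if z <= 0:
            continue
        p_cam = z * (K_inv @ np.array([u, v, 1.0]))
        p_world = pose[:3, :3] @ p_cam + pose[:3, 3]
        obj_pts.append(p_world)
        img_pts.append(kp_cur[m.queryIdx].pt)
    if len(obj_pts) < 4:
        return None

    # PnP with RANSAC recovers the current camera pose from 2D-3D correspondences.
    ok, rvec, tvec, _ = cv2.solvePnPRansac(
        np.asarray(obj_pts, dtype=np.float64), np.asarray(img_pts, dtype=np.float64), K, None)
    if not ok:
        return None
    R_wc, _ = cv2.Rodrigues(rvec)                 # world-to-camera rotation
    new_pose = np.eye(4)
    new_pose[:3, :3] = R_wc.T                     # convert back to camera-to-world
    new_pose[:3, 3] = (-R_wc.T @ tvec).ravel()

    # Accept only if re-rendering from the recovered pose gives a low loss.
    return new_pose if render_loss(new_pose, cur_image) < THETA_FAIL else None
```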

4. What is the goal of the mapping phase in the MM-Gaussian system?

In the MM-Gaussian system, the goal of the mapping phase is to update the attributes of the 3D Gaussians to achieve more realistic rendering. Specifically, the mapping phase performs the following operations:

Select keyframes: the k−2 keyframes most relevant to the current frame are selected from the keyframe sequence, together with the current frame and the latest keyframe, for optimization.

Render images: RGB images are rendered from the selected keyframe poses.

Compute the loss: the loss between each rendered image and the corresponding input image is computed.

Optimize Gaussian attributes: the color, opacity, and other attributes of the 3D Gaussians are optimized by gradient descent using the Adam optimizer.

Remove invalid Gaussians: after optimization, Gaussians whose opacity is too low or whose radius is too large are removed.

Refine surface details: new Gaussians are generated by duplicating existing ones according to their gradients, refining the representation of object surfaces.

Through these operations, the mapping phase continuously optimizes the attributes of the 3D Gaussians to achieve high-quality image rendering; a minimal sketch of one mapping iteration is given below.
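
As a rough illustration, here is a minimal PyTorch sketch of one mapping iteration. It is a sketch under assumptions, not the paper's implementation: `render_fn` is an assumed differentiable 3D Gaussian rasterizer supplied by the caller, the `gaussians` container interface (opacity, scales, prune, densify_by_gradient) and the keyframe relatedness measure are hypothetical, and the loss uses only an L1 photometric term with placeholder pruning thresholds.

```python
# Minimal sketch of one mapping-phase iteration; interface names are hypothetical.
import torch

def mapping_step(gaussians, keyframes, current_frame, optimizer, render_fn,
                 k=5, min_opacity=0.05, max_radius=0.5):
    # Keyframe selection: current frame, latest keyframe, and the k-2 keyframes
    # most related to the current frame (relatedness measure left to the caller).
    related = sorted(keyframes[:-1],
                     key=lambda f: f.overlap_with(current_frame),
                     reverse=True)[:k - 2]
    batch = [current_frame, keyframes[-1]] + related

    # Photometric L1 loss between the rendered and captured images.
    # (A structural term such as SSIM could be added; omitted in this sketch.)
    loss = 0.0
    for frame in batch:
        rendered = render_fn(gaussians, frame.pose)   # differentiable rendering
        loss = loss + torch.abs(rendered - frame.image).mean()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                                  # Adam step on Gaussian attributes

    with torch.no_grad():
        # Remove Gaussians that are nearly transparent or have grown too large.
        keep = (gaussians.opacity.squeeze() > min_opacity) & \
               (gaussians.scales.max(dim=1).values < max_radius)
        gaussians.prune(~keep)
        # Densify: duplicate Gaussians with large positional gradients to refine
        # surface detail, as in standard 3D Gaussian Splatting.
        gaussians.densify_by_gradient()
    return loss.item()
```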

5. Experiments

The experiments mainly cover the following:

Experimental setup: the authors used a data-acquisition rig consisting of a Livox AVIA LiDAR and an MV-CS050-10UC camera, and collected 9 sequences in campus scenes. All experiments used the parameter settings in Table III. In addition, the authors used the R3LIVE system to obtain preliminary ground-truth poses, which were further refined with HBA for evaluation.

Evaluation metrics: the root mean square error of the absolute trajectory error (ATE RMSE) is used to evaluate tracking, while peak signal-to-noise ratio (PSNR), structural similarity (SSIM), and LPIPS are used to evaluate mapping; a small metric-computation sketch follows this list.

Quantitative comparison: in the tracking phase, the proposed method is compared with SplaTAM, MonoGS, NeRF-LOAM, and other methods; in the mapping phase, it is compared with SplaTAM, MonoGS, 3D Gaussian Splatting, and others. The results show that the proposed method achieves the best mapping quality on all sequences.

Qualitative comparison: the authors also qualitatively compared the proposed method with SplaTAM; the results show that the proposed method renders clearer images and better captures the details of object surfaces.

Effect of the relocalization module: an ablation study of the relocalization module shows that it successfully restores the pose to the correct trajectory.
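
For reference, here is a small Python sketch of the two most common of these metrics under their standard definitions; it assumes the estimated trajectory has already been aligned to the ground truth (e.g., with a Umeyama alignment), and SSIM/LPIPS would come from existing libraries such as scikit-image or lpips.

```python
# Standard definitions of ATE RMSE and PSNR; trajectory alignment is assumed done.
import numpy as np

def ate_rmse(est_positions, gt_positions):
    """Root mean square of the absolute trajectory error over aligned positions (N x 3)."""
    err = np.linalg.norm(est_positions - gt_positions, axis=1)
    return float(np.sqrt(np.mean(err ** 2)))

def psnr(rendered, reference, max_val=1.0):
    """Peak signal-to-noise ratio between a rendered image and its reference."""
    mse = np.mean((rendered - reference) ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(max_val ** 2 / mse))
```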


6. Conclusion

Localization and mapping are mission-critical for applications such as autonomous vehicles and robotics, and the unbounded nature of outdoor environments makes them particularly challenging. In this work, we propose MM-Gaussian, a LiDAR-camera multimodal fusion system for localization and mapping in unbounded scenes. Our approach is inspired by the recently developed 3D Gaussian Splatting, which exhibits an extraordinary ability to achieve high rendering quality at fast rendering speeds. Specifically, our system leverages the geometric information provided by the solid-state LiDAR to address the depth inaccuracies encountered by purely visual solutions in unbounded outdoor scenes. In addition, we use 3D Gaussian point clouds to make full use of the color information in the images through pixel-level gradient descent, achieving realistic rendering results. To further enhance the robustness of the system, we design a relocalization module that helps return to the correct trajectory in the event of a localization failure. Experiments conducted in a variety of scenarios demonstrate the effectiveness of our method.

