
CVPR'24 Open Source | A Fast and Accurate Visual-Inertial Navigation System!

Author: 3D Vision Workshop

Editor: Computer Vision Workshop


There are two main approaches to VINS: optimization-based and filter-based. Optimization-based methods achieve high localization accuracy but suffer from high computational complexity; conversely, filter-based methods achieve high efficiency at the expense of accuracy. There is therefore a pressing need for a framework that combines high accuracy with high efficiency. Inspired by the Schur complement used in optimization-based approaches, the authors exploit the sparse structure inherent in the high-dimensional residual model over poses and landmarks to retain the efficiency of the EKF. This paper therefore proposes an EKF-based VINS framework that achieves both high efficiency and high accuracy.

Let's read about this work together~

Title: SchurVINS: Schur Complement-Based Lightweight Visual Inertial Navigation System

Authors: Yunfei Fan, Tianyu Zhao, Guidong Wang

Affiliation: ByteDance

Original link: https://arxiv.org/abs/2312.01616

Code link: https://github.com/bytedance/SchurVINS

Accuracy and computational efficiency are the most important metrics for a visual-inertial navigation system (VINS). Existing VINS algorithms offering either high accuracy or low computational complexity struggle to provide high-precision localization on resource-constrained devices. To this end, the authors propose a novel filter-based VINS framework called SchurVINS, which guarantees both high accuracy and low computational complexity by constructing a complete residual model and applying the Schur complement. Technically, a complete residual model is first formulated in which the gradient, Hessian, and observation covariance are modeled explicitly. The Schur complement is then used to decompose the complete model into an ego-motion residual model and a landmark residual model. Finally, an efficient extended Kalman filter (EKF) update is performed on both models. Experiments on the EuRoC and TUM-VI datasets show that the method significantly outperforms state-of-the-art methods in terms of accuracy and computational complexity.

Comparison of runtime, CPU usage, and RMSE on the EuRoC dataset. Different shapes and colors represent different methods and accuracy levels.


In this paper, the authors propose a filter-based VINS framework called SchurVINS, which guarantees high accuracy and low computational complexity by constructing a complete residual model and applying the Schur complement. Technically, a complete residual model is first developed in which the gradient, Hessian matrix, and observation covariance are modeled explicitly. The Schur complement is then used to decompose the complete model into an ego-motion residual model and a landmark residual model. Finally, the extended Kalman filter (EKF) update is implemented efficiently on these two models. Experiments on the EuRoC and TUM-VI datasets show that SchurVINS significantly outperforms state-of-the-art methods in terms of accuracy and computational complexity. The key contributions are:

(1) An equivalent residual model is proposed to handle ultra-high-dimensional observations, comprising the gradient, Hessian matrix, and corresponding observation covariance. The method is broadly applicable to EKF systems.

(2) A lightweight EKF-based landmark solver is proposed to estimate landmark positions efficiently.

(3) A novel EKF-based VINS framework is developed to estimate both ego-motion and landmarks accurately and efficiently.
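To make the Schur complement trick underlying these contributions concrete, the toy numpy sketch below (illustrative dimensions, not the authors' implementation) eliminates the landmark block from a small Gauss-Newton system and recovers the landmark increments by back-substitution, matching the full solve:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy normal equations H @ d = b with a 6-DoF pose block x and two 3-D
# landmark blocks l (dimensions are illustrative, not from the paper).
J = rng.standard_normal((24, 12))      # stacked measurement Jacobian
H = J.T @ J + 1e-3 * np.eye(12)        # Hessian approximation (SPD)
b = J.T @ rng.standard_normal(24)      # gradient

nx = 6                                  # pose dimension
Hxx, Hxl = H[:nx, :nx], H[:nx, nx:]
Hlx, Hll = H[nx:, :nx], H[nx:, nx:]
bx, bl = b[:nx], b[nx:]

# Schur complement: eliminate the landmark block, leaving a pose-only system.
Hll_inv = np.linalg.inv(Hll)
H_red = Hxx - Hxl @ Hll_inv @ Hlx      # reduced (ego-motion) Hessian
b_red = bx - Hxl @ Hll_inv @ bl        # reduced gradient

dx = np.linalg.solve(H_red, b_red)     # pose increment from the small system
dl = Hll_inv @ (bl - Hlx @ dx)         # landmark increments by back-substitution

# Same answer as solving the full 12x12 system directly.
assert np.allclose(np.concatenate([dx, dl]), np.linalg.solve(H, b))
```

The pose-only system is much smaller than the full one, which is exactly why marginalizing landmarks this way pays off when the number of landmarks is large.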

SchurVINS is built on the open-source SVO2.0 in a stereo configuration: the original backend of SVO2.0 is replaced with a sliding-window EKF backend, and the original landmark optimizer is replaced with the EKF-based landmark solver. P1 to Pm denote valid landmarks in the surrounding environment, which are used to build the residual models.
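The paper does not spell out its landmark solver here, but the kind of per-landmark update an EKF-based solver performs can be sketched with a generic EKF measurement update (the function name, dimensions, and linear measurement model below are illustrative assumptions, not the authors' code):

```python
import numpy as np

def landmark_ekf_update(p, P, z, H, R):
    """One EKF measurement update for a single landmark.

    p : (3,)   landmark position estimate
    P : (3,3)  landmark covariance
    z : (m,)   measurement residual (observation minus prediction)
    H : (m,3)  measurement Jacobian w.r.t. the landmark
    R : (m,m)  observation covariance
    """
    S = H @ P @ H.T + R                # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)     # Kalman gain
    p_new = p + K @ z                  # corrected position
    P_new = (np.eye(3) - K @ H) @ P    # corrected covariance
    return p_new, P_new

# Toy usage: repeated noisy direct observations of a fixed point pull the
# estimate toward the truth and shrink the covariance.
rng = np.random.default_rng(0)
truth = np.array([1.0, 2.0, 3.0])
p, P = np.zeros(3), np.eye(3)
for _ in range(10):
    obs = truth + 0.01 * rng.standard_normal(3)
    p, P = landmark_ekf_update(p, P, obs - p, np.eye(3), 1e-4 * np.eye(3))
```

Because each landmark carries only a 3x3 covariance, such an update is far cheaper than folding every landmark into one large optimization problem.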


Accuracy

The overall accuracy of SchurVINS was evaluated using root mean square error (RMSE) on EuRoC and TUM-VI. Among the filter-based methods reported to date, SchurVINS achieves the lowest average RMSE on these datasets, and it also outperforms most optimization-based methods. In addition, SchurVINS reaches accuracy similar to the well-known optimization-based method BASALT and slightly below its recent competitor DM-VIO. The re-evaluation experiments in Table 2 matched expectations. It is worth emphasizing that, although its accuracy is slightly lower than that of the two optimization-based competitors, the computational complexity of SchurVINS is significantly lower than both, as detailed in the next section.
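For reference, trajectory RMSE of the kind reported on EuRoC is the absolute trajectory error after aligning the estimate with ground truth. A minimal sketch follows; it uses mean-offset alignment only, whereas real evaluations also align timestamps and the full SE(3)/Sim(3) frame (e.g., via a Umeyama fit):

```python
import numpy as np

def ate_rmse(est, gt):
    """RMSE of absolute trajectory error for time-aligned (N,3) position
    arrays, after removing the mean translational offset (simplified
    alignment)."""
    err = (est - est.mean(axis=0)) - (gt - gt.mean(axis=0))
    return float(np.sqrt((np.linalg.norm(err, axis=1) ** 2).mean()))

# A constant offset between the two trajectories does not count as error.
gt = np.cumsum(np.full((100, 3), 0.1), axis=0)
assert ate_rmse(gt + np.array([5.0, 0.0, 0.0]), gt) < 1e-9
```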


Efficiency

The efficiency evaluation was performed on a desktop platform with an Intel i7-9700 (3.00 GHz). Global BA (GBA), pose-graph optimization, and loop-closure detection were disabled in all algorithms. As shown in Table 3, SchurVINS achieves nearly the lowest processor usage among all of the VINS algorithms considered. Notably, SVO2.0-wo has CPU usage similar to SchurVINS but suffers significant inaccuracy because it is almost pure visual odometry (VO).


In Table 4, the optimizeStructure module of SchurVINS is nearly 3 times faster than that of SchurVINS-GN. This is because SchurVINS reuses the intermediate results of the Schur complement, yielding significant computational savings, whereas SchurVINS-GN rebuilds the optimization problem from scratch to estimate landmarks. Compared with SVO2.0-wo, SchurVINS is faster because it replaces the computationally intensive SparseImageAlign with the propagation module. On the other hand, the optimizeStructure of SVO2.0-wo is significantly faster than that of SchurVINS-GN, because the latter uses almost 4 times as many measurements for optimization. The root cause of SVO2.0's significantly longer runtime is the high computational complexity of its LBA. As for OpenVINS, neither its default configuration nor a configuration with a sliding-window size of 4 makes it more efficient than SchurVINS. Strikingly, the update of SLAM points in OpenVINS requires noticeably more computational resources than the EKF-based landmark estimation proposed in SchurVINS.


Ablation studies

The above experiments strongly support SchurVINS, so it is worth studying the impact of its individual components. Starting from SchurVINS, the EKF-based landmark solver is replaced or removed to analyze its effectiveness. As shown in Table 5, without either a GN-based or an EKF-based landmark solver, SchurVINS cannot sufficiently limit global drift; moreover, in some challenging scenarios, not estimating landmarks jointly can cause the system to diverge. The comparison of SchurVINS and SchurVINS-GN in Table 5 shows that both the proposed EKF-based landmark solver and the original GN-based landmark solver of SVO2.0 are valid and reliable, and both guarantee high accuracy. A comparison across Tables 4 and 5 further shows that, although the proposed EKF-based landmark solver causes a slight drop in accuracy, it achieves significantly lower computational complexity. An intuitive explanation for the reduced accuracy is that SchurVINS uses only the observations within the sliding window for landmark estimation.


This paper develops an EKF-based VINS algorithm, including a novel EKF-based landmark solver, to achieve highly efficient and accurate 6-DoF estimation. Specifically, an equivalent residual model composed of the Hessian, the gradient, and the corresponding observation covariance is used to estimate pose and landmarks jointly, ensuring high-precision localization. To achieve high efficiency, the equivalent residual model is decomposed via the Schur complement into a pose residual model and a landmark residual model for the EKF update. Thanks to the probabilistic independence of the surrounding environmental elements, the resulting landmark residual model is further split into a set of small, independent residual models, one per landmark, for the EKF update, significantly reducing computational complexity.
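The per-landmark split described above works because the landmark system is block-diagonal: solving each small block separately gives the same answer as one big solve, at roughly O(m·3³) instead of O((3m)³) cost. A toy numpy sketch (not the paper's code) demonstrates the equivalence:

```python
import numpy as np

rng = np.random.default_rng(1)
m = 5                                   # number of landmarks (illustrative)

blocks, rhs = [], []
for _ in range(m):
    A = rng.standard_normal((3, 3))
    blocks.append(A @ A.T + np.eye(3))  # SPD 3x3 Hessian block per landmark
    rhs.append(rng.standard_normal(3))

# Full landmark system: block-diagonal because landmarks are independent.
Hll = np.zeros((3 * m, 3 * m))
for i, B in enumerate(blocks):
    Hll[3 * i:3 * i + 3, 3 * i:3 * i + 3] = B
bl = np.concatenate(rhs)

full = np.linalg.solve(Hll, bl)          # one big (3m x 3m) solve
per = np.concatenate([np.linalg.solve(B, r)  # m tiny 3x3 solves
                      for B, r in zip(blocks, rhs)])
assert np.allclose(full, per)
```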

This work is the first to use a Schur-complement-factorized residual model in an EKF-based VINS algorithm for acceleration. Experiments on the EuRoC and TUM-VI datasets show that SchurVINS significantly outperforms EKF-based methods overall, as well as most optimization-based methods, in both accuracy and efficiency. In addition, SchurVINS requires almost half the computational resources of SOTA optimization-based approaches while achieving comparable accuracy. The ablation study also clearly shows that the EKF-based landmark solver is not only highly efficient but also maintains high accuracy. In future work, the authors will focus on local map refinement in SchurVINS to pursue greater accuracy.

Readers who are interested in more experimental results and details of the article can read the original paper~

This article is for academic sharing only; if there is any infringement, please contact us for deletion.

Computer Vision Workshop Exchange Group

At present, we have established multiple communities in the direction of 3D vision, including 2D computer vision, large models, industrial 3D vision, SLAM, autonomous driving, 3D reconstruction, drones, etc., and the subdivisions include:

2D Computer Vision: Image Classification/Segmentation, Object Detection/Tracking, Medical Imaging, GAN, OCR, 2D Defect Detection, Remote Sensing Mapping, Super-Resolution, Face Detection, Behavior Recognition, Model Quantization/Pruning, Transfer Learning, Human Pose Estimation, etc.

Large models: NLP, CV, ASR, generative adversarial models, reinforcement learning models, dialogue models, etc

Industrial 3D vision: camera calibration, stereo matching, 3D point cloud, structured light, robotic arm grasping, defect detection, 6D pose estimation, phase deflection, Halcon, photogrammetry, array camera, photometric stereo vision, etc.

SLAM: visual SLAM, laser SLAM, semantic SLAM, filtering algorithm, multi-sensor fusion, multi-sensor calibration, dynamic SLAM, MOT SLAM, NeRF SLAM, robot navigation, etc.

Autonomous driving: depth estimation, Transformer, millimeter-wave radar, lidar, visual camera sensors, multi-sensor calibration, multi-sensor fusion, 3D object detection, path planning, trajectory prediction, 3D point cloud segmentation, model deployment, lane line detection, BEV perception, Occupancy, object tracking, end-to-end autonomous driving, a general autonomous driving group, etc.

3D reconstruction: 3DGS, NeRF, multi-view geometry, OpenMVS, MVSNet, colmap, texture mapping, etc

Unmanned aerial vehicles: quadrotor modeling, unmanned aerial vehicle flight control, etc

In addition to these, there are also exchange groups such as job search, hardware selection, visual product landing, the latest papers, the latest 3D vision products, and 3D vision industry news

Add a small assistant: dddvision, note: research direction + school/company + nickname (such as 3D point cloud + Tsinghua + Little Strawberry), pull you into the group.

3D Visual Learning Knowledge Planet

3DGS, NeRF, Structured Light, Phase Deflection, Robotic Arm Grabbing, Point Cloud Practice, Open3D, Defect Detection, BEV Perception, Occupancy, Transformer, Model Deployment, 3D Object Detection, Depth Estimation, Multi-Sensor Calibration, Planning and Control, UAV Simulation, 3D Vision C++, 3D Vision python, dToF, Camera Calibration, ROS2, Robot Control Planning, LeGo-LAOM, Multimodal fusion SLAM, LOAM-SLAM, indoor and outdoor SLAM, VINS-Fusion, ORB-SLAM3, MVSNet 3D reconstruction, colmap, linear and surface structured light, hardware structured light scanners, drones, etc.
