
HKUST & Zhejiang University | The New Revolution in Reinforcement Learning: Breakthrough Applications of Universal Gaussian Representations

Author: 3D Vision Workshop

Author: Jiaxu Wang | Editor: 3DCV

Add WeChat dddvision with the note "3D Gauss" to be invited to the group. Industry-specific subgroups are listed at the end of the article.


Title: Reinforcement Learning with Generalizable Gaussian Splatting

Authors: Jiaxu Wang et al.

Paper: https://arxiv.org/pdf/2404.07950.pdf

1. Introduction

This article introduces a novel environment representation for reinforcement learning based on generalizable 3D Gaussian Splatting (3DGS). The representation uses 3DGS to describe the environment explicitly while capturing local geometric detail and building 3D-consistent features. The authors propose a generalizable 3DGS framework that predicts 3D Gaussian clouds directly from multi-view images, with no per-scene optimization. Experiments on the RoboMimic platform, comparing different representations and algorithms, show that the generalizable 3DGS representation significantly improves reinforcement learning performance. The work broadens the application of 3DGS to reinforcement learning and offers a new perspective for future vision-based reinforcement learning.

2. Innovation

  • 3D Gaussians are used as the environment representation in reinforcement learning. They combine the advantages of explicit and implicit representations: they carry rich geometric information and can describe complex local geometric structures.
  • A generalizable 3D Gaussian prediction module is introduced. It predicts 3D Gaussian point clouds directly from multi-view images without optimizing each scene individually, which makes 3D Gaussian representations practical for reinforcement learning.
  • The pre-trained Gaussian prediction module is integrated into the reinforcement learning environment. Environment observations are converted into 3D Gaussian representations, and reinforcement learning policies are then trained on those representations.
  • The approach is validated in the RoboMimic environment. The results show that the generalizable Gaussian representation outperforms the other baseline representations on multiple tasks and improves reinforcement learning performance.

3. Method


Generalizable 3D Gaussian representation: The authors propose a generalizable 3D Gaussian representation that predicts the corresponding 3D Gaussian point cloud from a single image or from multiple images. The framework consists of three main modules: depth estimation, Gaussian regression, and Gaussian refinement.

Depth estimation: This module lifts 2D images into 3D space by using stereo image pairs to predict the absolute depth of each pixel.
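
To make the step from per-pixel depth to 3D positions concrete, here is a minimal sketch of back-projecting pixels into camera space. It assumes a pinhole camera model; the function names `unproject_pixel` and `unproject_depth_map` and the intrinsics `fx`, `fy`, `cx`, `cy` are hypothetical and are not taken from the paper.

```python
def unproject_pixel(u, v, depth, fx, fy, cx, cy):
    """Back-project pixel (u, v) with absolute depth into camera coordinates.

    Assumes a pinhole camera with focal lengths (fx, fy) and principal
    point (cx, cy); these intrinsics are illustrative assumptions.
    """
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


def unproject_depth_map(depth_map, fx, fy, cx, cy):
    """Turn an H x W depth map (a list of rows) into a list of 3D points."""
    points = []
    for v, row in enumerate(depth_map):
        for u, d in enumerate(row):
            points.append(unproject_pixel(u, v, d, fx, fy, cx, cy))
    return points
```

Each unprojected point can then serve as the center of one Gaussian, which is why an absolute (metric) depth matters: a depth that is only correct up to scale would place the centers inconsistently across views.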

Gaussian regression: This module predicts the remaining properties of each 3D Gaussian per pixel, including the rotation matrix, scale matrix, and color.
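
As a sketch of how a rotation and a scale define the shape of one Gaussian, the snippet below builds a covariance matrix as Σ = R S Sᵀ Rᵀ, the parameterization used in standard 3D Gaussian Splatting. Whether the paper's regression head outputs exactly these quantities is an assumption; the helper names (`matmul`, `transpose`, `rotation_z`, `covariance`) are illustrative only.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def rotation_z(theta):
    """Rotation by theta radians about the z-axis (an example rotation)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def covariance(rotation, scale):
    """Build Sigma = R S S^T R^T from a 3x3 rotation and 3 axis scales."""
    s = [[scale[i] if i == j else 0.0 for j in range(3)] for i in range(3)]
    m = matmul(rotation, s)          # M = R S
    return matmul(m, transpose(m))   # Sigma = M M^T


# Example: an ellipsoid with axis scales 1, 2, 3 rotated 90 degrees about z.
sigma = covariance(rotation_z(math.pi / 2), (1.0, 2.0, 3.0))
```

Factoring the covariance this way keeps it symmetric and positive semi-definite for any predicted rotation and scale, which a directly regressed 3x3 matrix would not guarantee.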

Gaussian refinement: To improve feature consistency, the authors define a Gaussian refinement operation that smooths features in 3D space through a graph network.
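
The idea of smoothing features over a graph in 3D space can be illustrated with a much simpler stand-in. The sketch below replaces the learned graph network with fixed k-nearest-neighbour averaging, so it is not the authors' module; `knn_indices`, `smooth_features`, `k`, and `alpha` are hypothetical names and parameters chosen for illustration.

```python
def knn_indices(points, i, k):
    """Indices of the k nearest neighbours of points[i], excluding i itself."""
    def sq_dist(j):
        return sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
    others = [j for j in range(len(points)) if j != i]
    return sorted(others, key=sq_dist)[:k]


def smooth_features(points, features, k=2, alpha=0.5):
    """Blend each feature vector with the mean feature of its k neighbours.

    alpha = 0 leaves features unchanged; alpha = 1 replaces each feature
    with its neighbourhood mean.
    """
    smoothed = []
    for i, feat in enumerate(features):
        nbrs = knn_indices(points, i, k)
        if not nbrs:  # a single point has no neighbours to average over
            smoothed.append(list(feat))
            continue
        mean = [sum(features[j][d] for j in nbrs) / len(nbrs)
                for d in range(len(feat))]
        smoothed.append([(1 - alpha) * f + alpha * m
                         for f, m in zip(feat, mean)])
    return smoothed
```

A learned graph network would replace the fixed average with trainable message passing, but the neighbourhood structure, built from the Gaussians' 3D positions rather than from pixel adjacency, is the part this sketch is meant to show.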

Training strategy: The depth estimation module is pre-trained first and then frozen, after which the Gaussian regression and refinement modules are trained jointly.

Loss function: A rendering loss and a reconstruction loss are used during training to guide model learning.


4. Experiments

  1. Experimental Setup:
    • The authors evaluate on the RoboMimic platform, using four tasks: Lift, Can, Square, and Transport.
    • Three offline reinforcement learning algorithms are used: BCQ, IQL, and IRIS.
    • Four visual observation modalities are compared: images, point clouds, voxels, and the generalizable Gaussian representation.
    • For a fair comparison, the authors use the same default parameter settings throughout. The generalizable Gaussian prediction module is kept fixed and serves as an encoder that converts multi-view image observations into 3D Gaussian representations; the reinforcement learning policy then predicts actions from this representation.
  2. Result analysis:
    • Table 1 compares the performance of the different representations on the four tasks. The generalizable Gaussian representation outperforms the other baseline methods in most cases.
    • Table 2 evaluates how the number of Gaussian points affects performance. The method is relatively insensitive to the number of points, although performance improves slightly as the number increases.
    • Table 3 analyzes how 3D Gaussian reconstruction quality affects reinforcement learning performance. More accurate reconstruction leads to better performance.
    • Table 4 presents an ablation of several basic design choices in the generalizable Gaussian framework. Both the cascaded structure in feature space and the Gaussian refinement prove effective.
  3. Conclusion:
    • The authors' generalizable Gaussian representation outperforms the other baseline representations on the four tasks, particularly on Transport, the most difficult task, with performance improvements of 10%, 44%, and 15%.
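
The experimental setup above, a fixed encoder followed by a policy, can be sketched as a small pipeline. Everything in this sketch is a placeholder: `FrozenGaussianEncoder`, `act`, `toy_predictor`, and `toy_policy` are hypothetical names, and the toy functions stand in for the pre-trained Gaussian prediction module and a trained offline RL policy, neither of which is reproduced here.

```python
class FrozenGaussianEncoder:
    """Stand-in for a pre-trained Gaussian prediction module.

    The wrapped predictor is only ever called, never updated, which mirrors
    the encoder being kept fixed during policy training.
    """

    def __init__(self, predict_fn):
        self._predict_fn = predict_fn

    def encode(self, multi_view_images):
        return self._predict_fn(multi_view_images)


def act(encoder, policy, multi_view_images):
    """Convert multi-view observations to Gaussians, then pick an action."""
    gaussians = encoder.encode(multi_view_images)
    return policy(gaussians)


def toy_predictor(images):
    # Placeholder: one single-attribute "Gaussian" per view, holding the
    # mean pixel value of that view.
    return [[sum(img) / len(img)] for img in images]


def toy_policy(gaussians):
    # Placeholder policy: a 1-D action equal to the mean attribute.
    return [sum(g[0] for g in gaussians) / len(gaussians)]


encoder = FrozenGaussianEncoder(toy_predictor)
action = act(encoder, toy_policy, [[0.0, 2.0], [4.0, 6.0]])
```

Keeping the encoder behind a single `encode` call means the same policy code can be compared across observation modalities by swapping only the encoder, which is the spirit of the fair-comparison setup described above.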

5. Summary

The paper proposes GSRL, a generalizable Gaussian representation framework for environment representation in reinforcement learning. Through a generalizable 3D Gaussian prediction module, the framework predicts 3D Gaussian point clouds directly from multi-view images, avoiding the main drawback of conventional 3D Gaussian representations, which must be optimized separately for each scene. The authors integrate the pre-trained Gaussian prediction module into the reinforcement learning environment, convert environment observations into 3D Gaussian representations, and train reinforcement learning policies on those representations. Experiments show that the generalizable Gaussian representation outperforms the other baseline representations on multiple tasks and improves reinforcement learning performance. The framework brings 3D Gaussian representations to reinforcement learning and provides an efficient way to represent the environment.

This article is shared for academic purposes only. If it infringes any rights, please contact us and it will be removed.

Here we recommend the new course "New SLAM Algorithm Based on NeRF/Gaussian 3D Reconstruction", launched by 3D Vision Workshop and Gigi.

Course Highlights:

  • The course covers both theory and code implementation, taking you from scratch through the principles of NeRF/Gaussian-based SLAM, reading papers, and working through code.
  • On the theory side, it moves from linear algebra to traditional computer graphics so that you understand the theoretical foundations and origins of modern 3D reconstruction.
  • On the code side, a series of exercises teaches you to reproduce work in computer graphics and NeRF.

What you will gain

  • An entry point into NeRF/Gaussian-based SLAM
  • How to quickly identify the key points and contributions of a paper
  • How to quickly get a paper's code running and use the code to understand the paper's ideas
  • A line-by-line walkthrough of the NeRF code, covering every implementation detail, so that you can reproduce and improve it yourself

Course requirements

  • System requirements: Linux
  • Programming language: Python
  • Prerequisites: basic knowledge of Python and PyTorch

Who this course is for

  • Beginners who do not know where to start with the open-source code of a new paper
  • Beginners in SLAM localization and mapping or NeRF 3D reconstruction
  • People working in 3D reconstruction, as a reference
  • First-time readers of NeRF papers
  • Students interested in SLAM and NeRF

Start time

The course starts on Saturday, February 24, 2024, at 8 p.m., with one chapter released each week.

Course Q&A

Questions are answered mainly in the course's dedicated Goose Circle community, where students can post questions at any time during their studies.

▲ Add the assistant cv3d007 for more information

Note: Some of the pictures and videos above come from the Internet. If your rights have been infringed, please contact us and they will be removed.