GoMAvatar: An Efficient Human Digitization Method from Single-View Video with High-Fidelity Rendering and Deformation

Author: 3D Vision Workshop

Editor: Computer Vision Workshop

Title: GoMAvatar: Efficient Animatable Human Modeling from Monocular Video Using Gaussians-on-Mesh

Authors: Jing Wen et al

Homepage: https://wenj.github.io/GoMAvatar/

Paper: https://arxiv.org/pdf/2404.07991.pdf

1. Introduction

This article introduces GoMAvatar, a new method that quickly and efficiently reconstructs high-quality, animatable human avatars from monocular video. At the heart of the approach is the Gaussians-on-Mesh (GoM) representation, which combines the quality and speed of Gaussian splatting with the explicit geometry and compatibility of a deformable mesh. Specifically, GoM renders with Gaussians, which provides rich flexibility for modeling appearance and enables real-time performance. At the same time, GoM relies on a skeleton-driven deformable mesh to obtain a compact, topologically consistent digital avatar and to articulate it simply via forward kinematics. Crucially, to integrate the two representations, the Gaussians are attached to the mesh faces, which better regularizes how they deform under novel poses. In addition, to handle view-dependent effects, the final color is decomposed into a pseudo-albedo map rendered from the Gaussians and a pseudo-shading map derived from the normal map. The representation is learned from only a single input video. Experimental results show that GoMAvatar matches or outperforms current state-of-the-art monocular human modeling algorithms in rendering quality, while significantly surpassing them in computational efficiency (43 FPS) and remaining highly memory-efficient (3.63 MB per subject).
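
As a rough illustration of the color decomposition above, the sketch below (in PyTorch) multiplies a pseudo-albedo image by a pseudo-shading image computed from rendered normals. The Lambertian-style shading term and the `light_dir` and `ambient` parameters are assumptions made for illustration only; the paper's actual mapping from normals to shading may differ.

```python
import torch

def compose_color(pseudo_albedo: torch.Tensor,
                  normal_map: torch.Tensor,
                  light_dir: torch.Tensor,
                  ambient: float = 0.3) -> torch.Tensor:
    """Combine a pseudo-albedo map with a pseudo-shading map derived from normals.

    pseudo_albedo: (H, W, 3) colors splatted from the Gaussians, in [0, 1].
    normal_map:    (H, W, 3) unit normals rendered from the mesh.
    light_dir:     (3,) unit vector; a stand-in for whatever drives the shading.
    """
    # Cosine shading term from the normals, clamped to be non-negative.
    cos_term = (normal_map * light_dir.view(1, 1, 3)).sum(dim=-1, keepdim=True)
    pseudo_shading = ambient + (1.0 - ambient) * cos_term.clamp(min=0.0)
    # Final color = pseudo-albedo * pseudo-shading (element-wise).
    return (pseudo_albedo * pseudo_shading).clamp(0.0, 1.0)
```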

2. Method (GoM)

The core idea of GoMAvatar is to combine Gaussian splatting with a deformable-mesh representation. Specifically:

  1. Gaussian splatting for rendering: GoM renders with Gaussians, which keeps rendering fast. Each mesh face is associated with one Gaussian whose mean and covariance are derived from the face's vertex coordinates (see the sketch after this list).
  2. Deformable mesh for geometry: GoM uses a deformable mesh for articulation, providing explicit geometric information that adapts to different human poses. Each vertex carries linear blend skinning (LBS) weights used to deform the mesh.
  3. Forward-kinematic deformation: GoM deforms the mesh via forward kinematics, avoiding the ambiguities of backward mapping and yielding more accurate deformation.
  4. Compatible rendering and deformation: rendering and articulation operate on the same representation, so high-quality images can be produced efficiently.
  5. Efficient rendering: combining Gaussian splatting with the mesh makes rendering efficient.
  6. Explicit geometry: the mesh supplies explicit geometric information, which mitigates the overfitting problems of unconstrained Gaussians.
  7. Speed-quality balance: GoM balances rendering speed and quality, producing high-quality images quickly.
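
As referenced in item 1, here is a minimal sketch of the two building blocks described above, assuming a PyTorch setting: linear blend skinning with pre-composed per-joint transforms (forward kinematics), and one Gaussian per mesh face with its mean at the face barycenter and its covariance spanned by the face edges. The function names (`lbs_deform`, `gaussians_on_faces`) and the exact covariance parameterization are illustrative assumptions, not the paper's implementation.

```python
import torch

def lbs_deform(vertices: torch.Tensor,
               skin_weights: torch.Tensor,
               joint_transforms: torch.Tensor) -> torch.Tensor:
    """Linear blend skinning (forward kinematics), assuming `joint_transforms`
    are 4x4 bone transforms already composed along the kinematic chain.
    Shapes: vertices (V, 3), skin_weights (V, J), joint_transforms (J, 4, 4).
    """
    # Blend the per-joint transforms with the per-vertex skinning weights.
    blended = torch.einsum('vj,jab->vab', skin_weights, joint_transforms)  # (V, 4, 4)
    v_h = torch.cat([vertices, torch.ones_like(vertices[:, :1])], dim=-1)  # homogeneous
    return torch.einsum('vab,vb->va', blended, v_h)[:, :3]

def gaussians_on_faces(vertices: torch.Tensor, faces: torch.Tensor):
    """Hypothetical helper tying one Gaussian to each mesh face: the mean sits
    at the face barycenter and the covariance is built from the face edges.
    vertices: (V, 3), faces: (F, 3) long tensor of vertex indices.
    """
    tri = vertices[faces]                      # (F, 3, 3) triangle corners
    means = tri.mean(dim=1)                    # (F, 3) barycenters
    e1 = tri[:, 1] - tri[:, 0]                 # first edge
    e2 = tri[:, 2] - tri[:, 0]                 # second edge
    # Covariance spanned by the two edges (a flat Gaussian lying in the face plane).
    cov = (e1.unsqueeze(-1) @ e1.unsqueeze(-2) +
           e2.unsqueeze(-1) @ e2.unsqueeze(-2)) * 0.25
    return means, cov
```

In this sketch the Gaussians are recomputed from the deformed vertices, so articulating the mesh with `lbs_deform` automatically carries the Gaussians into the new pose.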

3. Experiments

The authors conducted an extensive experimental evaluation of the GoMAvatar method and compared it with other approaches to human digitization from single-view video. The evaluation covers:

  1. Datasets: experiments on the ZJU-MoCap dataset, the PeopleSnapshot dataset, and YouTube videos.
  2. Baselines: recent human digitization methods including NeuralBody, HumanNeRF, NeuMan, MonoHuman, Anim-NeRF, and InstantAvatar.
  3. Evaluation metrics: PSNR, SSIM, LPIPS, Chamfer distance (CD), normal consistency (NC), inference speed, and memory usage (a minimal PSNR sketch follows this list).
  4. Quantitative results: on ZJU-MoCap, GoMAvatar strikes a good balance among rendering quality, inference speed, and memory usage.
  5. Qualitative comparison: against NeuralBody, HumanNeRF, and MonoHuman, GoMAvatar shows advantages in detail, surface geometry, and handling of self-intersections.
  6. Failure cases: the authors show GoMAvatar's limitations in unobserved regions and under topological changes, while also demonstrating its flexibility in fitting garments with different topologies.
  7. Sensitivity analysis: the authors analyze sensitivity to pose estimation accuracy and show that the method is robust to pose estimation errors.
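
As a quick reference for the metrics in item 3, a minimal PSNR computation (the standard definition, not code from the paper) looks like this; SSIM and LPIPS are usually taken from existing packages such as the lpips library.

```python
import torch

def psnr(pred: torch.Tensor, target: torch.Tensor, max_val: float = 1.0) -> torch.Tensor:
    """Peak signal-to-noise ratio between a rendered image and the ground truth.

    Both tensors are (H, W, 3) with values in [0, max_val].
    """
    mse = torch.mean((pred - target) ** 2)
    return 10.0 * torch.log10(max_val ** 2 / mse)
```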

These results verify the efficiency and high rendering quality of GoMAvatar on human digitization tasks and provide a useful reference for further development in this area.

4. Summary

This paper proposes GoMAvatar, an efficient, high-quality approach to human digitization. The core idea is to combine Gaussian-splatting rendering with deformable-mesh articulation, improving both rendering speed and quality. Specifically, GoMAvatar renders with Gaussians, avoiding the dense sampling of volumetric rendering, and attaches the Gaussians to a deformable mesh so that they follow different human poses. It also deforms the mesh via forward kinematics, avoiding the ambiguities of backward mapping. Experimental results show that GoMAvatar achieves a good balance of rendering quality and inference speed and compares favorably with other recent methods. Finally, the authors provide qualitative and quantitative analyses that demonstrate GoMAvatar's advantages in detail rendering and in handling self-intersections, as well as its limitations in unobserved regions and under topological changes. Overall, GoMAvatar offers an efficient, high-quality option for human digitization tasks.

Here we also recommend "New SLAM Algorithms Based on NeRF/Gaussian 3D Reconstruction", a new course launched by the 3D Vision Workshop together with Gigi.

About the Speaker

Course outline

Course Highlights:

  • The course covers both theory and code implementation, taking you from scratch through the principles of NeRF/Gaussian-based SLAM, paper reading, and code walkthroughs.
  • On the theory side, it builds up from linear algebra and classical computer graphics to the theoretical foundations of modern 3D reconstruction.
  • On the code side, a series of exercises guides you through reproducing computer-graphics and NeRF-related work.

What you will gain

  • Get started with NeRF/Gaussian-based SLAM
  • Learn how to quickly grasp a paper's key points and innovations
  • Learn how to quickly run a paper's code and understand its ideas through the code
  • Parse NeRF code line by line, grasp every implementation detail, and reproduce and improve it yourself

Course requirements

  • System requirements: Linux
  • Programming language: Python
  • Prerequisites: basic knowledge of Python and PyTorch

Who this course is for

  • Beginners who have no idea how to get started with the open-source code of a new paper
  • Newcomers to SLAM localization and mapping, and to NeRF-based 3D reconstruction
  • Practitioners working on 3D reconstruction who want a reference
  • First-time readers of NeRF papers
  • Students who are interested in SLAM and NeRF

Start time

Saturday, February 24, 2024, at 8 p.m., with one chapter released per week.

Course Q&A

Q&A for this course takes place in the course's dedicated community group ("goose circle"), where students can post questions at any time during the course.
