Hello everyone! Today, Computer Vision Workshop shares a recently open-sourced CVPR 2024 work: DNGaussian, the latest SOTA in sparse-view 3DGS! If you have work to share, please contact cv3d008!
0. Reader's personal understanding
Novel view synthesis from sparse inputs poses a challenge for radiance fields. Neural Radiance Fields (NeRF) have recently made significant progress, reconstructing realistic appearance and accurate geometry from just a few input views. However, most sparse-view NeRF methods run slowly and consume large amounts of memory, and the resulting time and computational costs limit their practical application. While some approaches achieve faster inference with grid-based backbones, they often face trade-offs in the form of high training costs or impaired rendering quality. Recently, 3D Gaussian Splatting introduced an unstructured 3D Gaussian radiance field, employing a set of 3D Gaussian primitives learned from dense color input views, with remarkable success in fast, high-quality, low-cost novel view synthesis. Even with only sparse inputs, it partially retains its impressive ability to reconstruct crisp, detailed local features. However, the reduced view constraints cause significant parts of the scene geometry to be learned incorrectly, leading to failures in novel view synthesis. Inspired by the success of earlier depth-regularized sparse-view NeRFs, this paper explores extracting depth information from a pre-trained monocular depth estimator to correct the poorly learned geometry of the Gaussian field, and introduces a depth-normalization-regularized sparse-view 3D Gaussian radiance field (DNGaussian) to pursue few-shot novel view synthesis with higher quality and efficiency.
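As background for the depth prior, the sketch below shows one way to pre-generate monocular depth maps with an off-the-shelf estimator. The paper relies on a pre-trained monocular depth network; MiDaS, loaded via torch.hub, is used here only as a stand-in and is not necessarily the exact model the authors used.

```python
# A minimal sketch: pre-generate a monocular depth prior for one input view.
# MiDaS is an assumption here, standing in for "a pre-trained monocular
# depth estimator"; note that it predicts inverse depth up to scale.
import cv2
import torch

midas = torch.hub.load("intel-isl/MiDaS", "DPT_Large")
midas.eval()
transforms = torch.hub.load("intel-isl/MiDaS", "transforms")
transform = transforms.dpt_transform

img = cv2.cvtColor(cv2.imread("input_view.png"), cv2.COLOR_BGR2RGB)

with torch.no_grad():
    batch = transform(img)                    # (1, 3, H', W') network input
    prediction = midas(batch)                 # (1, H', W') inverse depth
    depth = torch.nn.functional.interpolate(  # resize to the original image
        prediction.unsqueeze(1), size=img.shape[:2],
        mode="bicubic", align_corners=False,
    ).squeeze()                               # (H, W) depth prior
```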
Although they share a similar form of depth rendering, depth regularization for 3D Gaussian radiance fields differs significantly from the approach taken for NeRF. First, existing NeRF depth regularization strategies usually use depth to regularize the entire model, which creates potential geometric conflicts in the Gaussian field and can harm quality. Specifically, this practice forces the Gaussians' shapes to fit a smooth monocular depth rather than the complex color appearance, resulting in lost detail and a blurred appearance. Considering that scene geometry is grounded in the positions of the Gaussian primitives rather than their shapes, we freeze the shape parameters and propose a hard and soft depth regularization that reshapes space by encouraging movement among primitives. During regularization, we render two depths to adjust the Gaussian centers and opacities independently, without changing their shapes, thus striking a balance between the complex color appearance and the coarse, smooth depth.
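To make the two renderings concrete, here is a minimal per-ray sketch of the idea, written as plain alpha compositing over sorted Gaussian contributions (the real renderer operates on splatted 2D Gaussians). The function names and the stop-gradient placement are illustrative: hard depth overrides all opacities with 1 so the depth loss moves the Gaussian centers, while soft depth keeps the learned opacities so the loss can adjust opacity instead.

```python
import torch

def blend_depth(z: torch.Tensor, alpha: torch.Tensor) -> torch.Tensor:
    """Front-to-back alpha compositing of per-sample depths along one ray.

    z, alpha: (N,) tensors of Gaussian center depths and opacities,
    sorted near-to-far.
    """
    # Transmittance T_i = prod_{j < i} (1 - alpha_j)
    trans = torch.cumprod(
        torch.cat([alpha.new_ones(1), 1.0 - alpha[:-1]]), dim=0)
    weights = alpha * trans
    return (weights * z).sum()

def hard_depth(z, alpha):
    # Override every opacity with 1: the nearest Gaussian dominates, so the
    # depth loss gradient moves the *center* of that Gaussian.
    return blend_depth(z, torch.ones_like(alpha))

def soft_depth(z, alpha):
    # Keep the learned opacities but stop gradients on the centers, so the
    # depth loss adjusts *opacity* without dragging the geometry.
    return blend_depth(z.detach(), alpha)
```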
In addition, the Gaussian radiance field is more sensitive to small depth errors than NeRF, which can lead to noisy distributions of primitives and failures in regions with complex textures. Existing scale-invariant depth losses typically align the depth map at a single fixed scale, which overlooks small errors. To solve this problem, we introduce a global-local depth normalization into the depth loss function, which encourages learning small local depth changes in a scale-invariant manner. Through local and global scale normalization, our method refocuses the loss function on small local errors while retaining knowledge of the absolute scale, enhancing the detailed geometric reshaping driven by depth regularization.
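A minimal sketch of the normalization idea follows. The patch size and the mean/std normalization are illustrative choices rather than the paper's exact formulation; the point is that normalizing each small tile independently makes small local depth errors count on equal footing with the global alignment.

```python
# A minimal sketch of global-local depth normalization: align rendered and
# monocular depth maps scale-invariantly over the whole image (global) and
# over small patches (local), so small local errors are not drowned out by
# the overall scale. Patch size and mean/std normalization are assumptions.
import torch
import torch.nn.functional as F

def normalize(d: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Zero-mean, unit-std normalization over the last two dims."""
    mean = d.mean(dim=(-2, -1), keepdim=True)
    std = d.std(dim=(-2, -1), keepdim=True)
    return (d - mean) / (std + eps)

def global_local_depth_loss(rendered, mono, patch: int = 8):
    """rendered, mono: (H, W) depth maps; `patch` must divide H and W."""
    # Global term: scale-invariant alignment over the full map.
    loss_g = F.l1_loss(normalize(rendered), normalize(mono))

    # Local term: cut both maps into (patch x patch) tiles and normalize
    # each tile independently, refocusing the loss on small local errors.
    def tiles(d):
        return d.unfold(0, patch, patch).unfold(1, patch, patch)  # (h, w, p, p)

    loss_l = F.l1_loss(normalize(tiles(rendered)), normalize(tiles(mono)))
    return loss_g + loss_l
```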
Combining the two proposed techniques, DNGaussian achieves competitive quality and superior detail across multiple sparse-view settings on the LLFF, Blender, and DTU datasets, with significantly lower memory cost, 25x shorter training time, and rendering more than 3,000x faster than state-of-the-art methods. Experiments also show that DNGaussian is versatile enough to handle complex scenes, wide ranges of viewpoints, and a variety of materials.
2. Introduction
Radiance fields have shown impressive performance in synthesizing novel views from sparse input views, but existing methods suffer from high training costs and slow inference. In this paper, we introduce DNGaussian, a depth-regularized framework based on 3D Gaussian radiance fields that offers real-time, high-quality, low-cost few-shot novel view synthesis. Our motivation stems from the efficient representation and striking quality of the recent 3D Gaussian Splatting, despite the geometric degradation it suffers when input views are reduced. In the Gaussian radiance field, we find that the degradation of scene geometry is mainly determined by the positions of the Gaussian primitives and can be mitigated by depth constraints. We therefore propose a hard and soft depth regularization to restore accurate scene geometry under coarse monocular depth supervision while maintaining a fine-grained color appearance. To further refine the detailed geometry reshaping, we introduce a global-local depth normalization that enhances the focus on small depth variations. Extensive experiments on the LLFF, DTU, and Blender datasets demonstrate that DNGaussian outperforms state-of-the-art methods, achieving comparable or better results at significantly reduced memory cost, a 25x reduction in training time, and over 3000x faster rendering.
3. Effect display
With sparse input views, DNGaussian stands out by delivering high-quality synthesized views with excellent detail, a 25x reduction in training time, significantly lower memory overhead, and the fastest rendering speed at 300 FPS.
3D Gaussian Splatting shows potential for reconstructing fine details (green boxes) from sparse input views. However, reducing the input views significantly degrades the geometry and can cause the reconstruction to fail (orange boxes). After applying depth regularization, DNGaussian successfully restores accurate geometry and synthesizes high-quality novel views.
4. Major Contributions
(1) A hard and soft depth regularization that constrains the geometry of the 3D Gaussian radiance field by encouraging Gaussian movement, enabling coarse depth-supervised space reshaping without sacrificing fine-grained color performance.
(2) A global-local depth normalization that normalizes depth patches at local scales to refocus the loss on small local depth changes, improving the detailed geometry reconstruction of the 3D Gaussian radiance field.
(3) The DNGaussian framework for fast, high-quality few-shot novel view synthesis, which combines the two techniques above and achieves quality competitive with state-of-the-art methods on multiple benchmarks, capturing detail at significantly lower training cost with real-time rendering.
(4) DNGaussian is the first attempt to analyze and solve the depth regularization problem of 3D Gaussian Splatting under coarse depth cues. We hope this article inspires more ideas on optimizing radiance fields under insufficient constraints.
5. What is the rationale?
DNGaussian's framework starts from a random initialization and includes a color supervision module and a depth regularization module. The color-supervised optimization is largely inherited from 3D Gaussian Splatting, with the exception of a neural color renderer. In depth regularization, we render a hard depth and a soft depth for each input view, and compute the loss against a pre-generated monocular depth map using the proposed global-local depth normalization. The resulting Gaussian field achieves efficient, high-quality novel view synthesis.
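Putting the pieces together, one training iteration might look like the following sketch. The helpers render_color, render_hard_depth, and render_soft_depth are hypothetical stand-ins for the rasterizer calls, and the loss weights are illustrative, not the paper's values; global_local_depth_loss is the sketch from above.

```python
# A minimal sketch of one training iteration in the described pipeline.
# `render_color`, `render_hard_depth`, and `render_soft_depth` are
# hypothetical helpers, not the released code's API.
import torch.nn.functional as F

LAMBDA_HARD, LAMBDA_SOFT = 0.1, 0.1  # illustrative weights (assumption)

def train_step(gaussians, camera, gt_image, mono_depth, optimizer):
    # Color supervision, largely inherited from 3D Gaussian Splatting.
    pred = render_color(gaussians, camera)            # hypothetical helper
    loss = F.l1_loss(pred, gt_image)

    # Depth regularization: two depth renderings, each compared against the
    # pre-generated monocular depth with global-local normalization. Shape
    # parameters (scale, rotation) receive no gradient from these terms.
    d_hard = render_hard_depth(gaussians, camera)     # opacities forced to 1
    d_soft = render_soft_depth(gaussians, camera)     # centers detached
    loss = loss + LAMBDA_HARD * global_local_depth_loss(d_hard, mono_depth)
    loss = loss + LAMBDA_SOFT * global_local_depth_loss(d_soft, mono_depth)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```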
6. Experimental results
LLFF. The quantitative and visual results on the LLFF dataset are shown in Table 1 and Figure 5. Note that since the NeRF baselines interpolate colors into areas not visible from the input views, while the discrete Gaussian radiance field directly exposes the black background in these blank areas, 3DGS-based approaches have an inherent disadvantage when metrics are computed over these meaningless invisible regions. Despite this, DNGaussian still outperforms all baselines in LPIPS and reaches comparable levels in PSNR, SSIM, and average error. From the quantitative and qualitative results, DNGaussian predicts more detailed and more precise geometry. FreeNeRF tends to synthesize smooth views that lack high-frequency detail, and its geometry is not as accurate as the depth-supervised SparseNeRF and DNGaussian. Although regularized by the same depth maps, SparseNeRF is weaker in detail and geometric integrity. DNGaussian also shows a large improvement in geometric quality over a well-tuned 3DGS.
DTU. The quantitative results of the DTU 3-view setting reported in Table 1 show that DNGaussian performs best on LPIPS and SSIM, and ranks second in average error.
Efficiency. An efficiency study was performed on an RTX 3090 Ti GPU in the LLFF 3-view setting to explore how current SOTA baselines perform under limited GPU memory (24 GB/12 GB) and training time (1.0 h/0.5 h), as shown in Table 3. The top row of each group is the default setting of the corresponding baseline, where training time is measured with the same number of iterations on a single GPU. While FreeNeRF and SparseNeRF perform poorly under strict resource constraints, DNGaussian shows a huge efficiency advantage, achieving a 25x speed-up in training time and more than 3000x in FPS while synthesizing novel views of competitive quality. Given that each scene must be optimized and visualized quickly, this efficiency is of great value for practical applications.
7. Summary
In this paper, we propose the DNGaussian framework, which brings 3DGS to the task of few-shot novel view synthesis through depth normalization.
8. References
[1] DNGaussian: Optimizing Sparse-View 3D Gaussian Radiance Fields with Global-Local Depth Normalization
Computer Vision Workshop Exchange Group
At present, we have established multiple communities across 3D vision, including 2D computer vision, large models, industrial 3D vision, SLAM, autonomous driving, 3D reconstruction, drones, and more. The subdivisions include:
2D computer vision: image classification/segmentation, object detection, medical imaging, GAN, OCR, 2D defect detection, remote sensing mapping, super-resolution, face detection, behavior recognition, model quantization/pruning, transfer learning, human pose estimation, etc.
Large models: NLP, CV, ASR, generative adversarial models, reinforcement learning models, dialogue models, etc
Industrial 3D vision: camera calibration, stereo matching, 3D point cloud, structured light, robotic arm grasping, defect detection, 6D pose estimation, phase deflection, Halcon, photogrammetry, array camera, photometric stereo vision, etc.
SLAM: visual SLAM, laser SLAM, semantic SLAM, filtering algorithms, multi-sensor fusion algorithms, multi-sensor calibration, dynamic SLAM, MOT SLAM, NeRF SLAM, robot navigation, etc.
Autonomous driving: depth estimation, Transformer, millimeter-wave radar, lidar, visual camera sensors, multi-sensor calibration, multi-sensor fusion, a general autonomous driving group, 3D object detection, path planning, trajectory prediction, 3D point cloud segmentation, model deployment, lane line detection, BEV perception, occupancy, object tracking, end-to-end autonomous driving, etc.
3D reconstruction: 3DGS, NeRF, multi-view geometry, OpenMVS, MVSNet, colmap, texture mapping, etc.
Unmanned aerial vehicles: quadrotor modeling, unmanned aerial vehicle flight control, etc
In addition to these, there are exchange groups for job hunting, hardware selection, vision product deployment, the latest papers, the latest 3D vision products, and 3D vision industry news.
Add the assistant on WeChat: dddvision, with a note of your research direction + school/company + nickname (e.g., 3D point cloud + Tsinghua + Little Strawberry), and you will be added to the group.
3D Vision Learning Knowledge Planet
3DGS, NeRF, structured light, phase deflectometry, robotic arm grasping, point cloud practice, Open3D, defect detection, BEV perception, occupancy, Transformer, model deployment, 3D object detection, depth estimation, multi-sensor calibration, planning and control, UAV simulation, 3D vision C++, 3D vision Python, dToF, camera calibration, ROS2, robot control and planning, LeGo-LOAM, multimodal fusion SLAM, LOAM-SLAM, indoor and outdoor SLAM, VINS-Fusion, ORB-SLAM3, MVSNet 3D reconstruction, colmap, line/surface structured light, hardware structured-light scanners, UAVs, etc.