Report from Heart of the Machine
Editor: Qian Zhang
Without any neural network, a radiance field can match the quality of Neural Radiance Fields (NeRF) while converging more than 100 times faster.
In 2020, researchers from UC Berkeley, Google, and UC San Diego proposed NeRF, a method that reconstructs a 3D scene representation from a handful of still 2D images and renders photorealistic views from new perspectives. Its improved successor, NeRF-W (NeRF in the Wild), also handles outdoor scenes with varying lighting and transient occlusions, turning collections of tourist photos into striking 3D fly-through videos.

NeRF model demo.
NeRF-W model demo.
These stunning results are computationally expensive, however: rendering a single frame takes about 30 seconds, and training the model takes about a day on a single GPU. Several follow-up papers have reduced this compute cost, especially for rendering, but training cost has not seen a comparable improvement; it still takes hours on a single GPU, which remains a major bottleneck for practical deployment.
In a new paper, researchers from UC Berkeley take aim at this problem with an approach called Plenoxels. The study shows that even without a neural network, a radiance field trained from scratch can match the quality of NeRF while optimizing two orders of magnitude faster.
Paper: https://arxiv.org/pdf/2112.05131.pdf
Project page: https://alexyu.net/plenoxels/
Code: https://github.com/sxyu/svox2
The researchers provide a custom CUDA implementation that exploits the model's simplicity to achieve substantial speedups. In bounded scenes, Plenoxels typically optimizes in 11 minutes on a single Titan RTX GPU, versus roughly one day for NeRF, a speedup of more than 100x; in unbounded scenes, Plenoxels optimizes in about 27 minutes, versus roughly four days for NeRF++, a speedup of more than 200x. Although the implementation is not tuned for fast rendering, Plenoxels can render novel viewpoints at an interactive 15 frames per second. For faster rendering still, an optimized Plenoxel model can be converted to a PlenOctree (a method proposed by author Alex Yu et al. in an ICCV 2021 paper: https://alexyu.net/plenoctrees/).
Specifically, the researchers propose an explicit volumetric representation based on a view-dependent sparse voxel grid, with no neural network anywhere in the pipeline. The model renders photorealistic novel viewpoints and is optimized end-to-end from calibrated 2D photos, using a differentiable rendering loss on the training views together with a total variation (TV) regularizer.
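To make the TV regularizer concrete, here is a minimal NumPy sketch of a total variation penalty on a dense grid of stored values. The function name and the dense-grid layout are illustrative assumptions; the paper's implementation operates on the sparse grid in fused CUDA kernels and regularizes both the opacities and the spherical harmonic coefficients.

```python
import numpy as np

def tv_loss(values):
    """Total variation penalty on a dense (X, Y, Z) grid of stored values
    (e.g. opacity), encouraging neighboring voxels to agree."""
    dx = values[1:, :, :] - values[:-1, :, :]  # differences along x
    dy = values[:, 1:, :] - values[:, :-1, :]  # differences along y
    dz = values[:, :, 1:] - values[:, :, :-1]  # differences along z
    return (dx ** 2).mean() + (dy ** 2).mean() + (dz ** 2).mean()
```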
They call the representation Plenoxel (plenoptic volume element) because it is a sparse voxel grid in which each voxel stores opacity and spherical harmonic coefficients. These coefficients are interpolated to model the full plenoptic function continuously in space. To reach high resolution on a single GPU, the researchers prune empty voxels and follow a coarse-to-fine optimization strategy. Although the core model is a bounded voxel grid, unbounded scenes can be modeled in two ways: (1) using normalized device coordinates (for forward-facing scenes), or (2) surrounding the grid with multi-sphere images to encode the background (for 360° scenes).
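For intuition, the sketch below evaluates a view-dependent color from degree-2 spherical harmonics, the degree used in the paper (9 coefficients per color channel). The constants are the standard real SH basis; the function names and the sigmoid squashing are illustrative assumptions, not necessarily what the official code does.

```python
import numpy as np

# Standard real spherical harmonic basis constants up to degree 2.
SH_C0 = 0.2820947917738781
SH_C1 = 0.4886025119029199
SH_C2 = (1.0925484305920792, -1.0925484305920792, 0.3153915652525200,
         -1.0925484305920792, 0.5462742152960396)

def sh_basis(d):
    """Evaluate the 9 degree-2 SH basis functions at unit view direction d."""
    x, y, z = d
    return np.array([
        SH_C0,
        -SH_C1 * y, SH_C1 * z, -SH_C1 * x,
        SH_C2[0] * x * y, SH_C2[1] * y * z,
        SH_C2[2] * (2.0 * z * z - x * x - y * y),
        SH_C2[3] * x * z, SH_C2[4] * (x * x - y * y),
    ])

def sh_to_rgb(coeffs, d):
    """coeffs: (3, 9) SH coefficients, one row per color channel.
    Returns the view-dependent RGB color for direction d; the sigmoid
    mapping into [0, 1] is one common choice among implementations."""
    return 1.0 / (1.0 + np.exp(-(coeffs @ sh_basis(d))))
```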
Plenoxel's effect in a forward-facing scene.
Plenoxel's effect in a 360° scene.
This method demonstrates that photorealistic volumetric reconstruction can be approached as an inverse problem using standard tools: a data representation, a forward model, a regularization function, and an optimizer. Each of these components can be very simple and still achieve state-of-the-art results. The experiments suggest that the key ingredient of neural radiance fields is not the neural network but the differentiable volumetric renderer.
Framework overview
Plenoxel is a sparse voxel grid in which each occupied voxel corner stores a scalar opacity σ and a vector of spherical harmonic coefficients for each color channel. The opacity and color at any position and viewing direction are determined by trilinearly interpolating the values stored at the neighboring voxel corners and evaluating the spherical harmonic coefficients in the appropriate viewing direction. Given a set of calibrated images, the model is optimized directly with a rendering loss on the training rays. The model architecture is shown in Figure 2 below.
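The trilinear interpolation step can be sketched in a few lines of NumPy. This toy version assumes a dense (X, Y, Z, C) grid of corner values and a query point strictly inside it; the real implementation works on sparse voxels in CUDA.

```python
import numpy as np

def trilinear(grid, p):
    """Trilinearly interpolate per-corner values at a continuous point.

    grid: (X, Y, Z, C) array of values stored at voxel corners (e.g. opacity
    and SH coefficients); p: query point in grid coordinates, assumed to lie
    strictly inside the grid.
    """
    i, j, k = np.floor(p).astype(int)
    tx, ty, tz = np.asarray(p) - (i, j, k)  # fractional offsets in [0, 1)
    c = grid[i:i + 2, j:j + 2, k:k + 2]     # the 8 surrounding corner values
    c = c[0] * (1 - tx) + c[1] * tx         # interpolate along x
    c = c[0] * (1 - ty) + c[1] * ty         # then along y
    return c[0] * (1 - tz) + c[1] * tz      # then along z -> (C,) values
```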
Figure 2 gives a conceptual overview of the sparse Plenoxel model. Given a set of images of an object or scene, the researchers reconstruct (a) a sparse voxel ("Plenoxel") grid with density and spherical harmonic coefficients at each voxel. To render a ray, they (b) compute the color and opacity of each sample point via trilinear interpolation of the neighboring voxel coefficients, and (c) integrate the colors and opacities of these samples using differentiable volume rendering. The voxel coefficients can then be optimized with a standard MSE reconstruction loss against the training images, plus a total variation regularizer.
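Step (c) follows the standard volume rendering model that NeRF also uses. Below is a minimal single-ray NumPy sketch of the compositing and the MSE loss; the names are illustrative, and the actual implementation evaluates this, with analytic gradients, in fused CUDA kernels.

```python
import numpy as np

def render_ray(sigmas, colors, deltas):
    """Composite N samples along one ray with the volume rendering model.

    sigmas: (N,) interpolated opacities at the sample points,
    colors: (N, 3) interpolated RGB values at the sample points,
    deltas: (N,) distances between consecutive samples.
    """
    alphas = 1.0 - np.exp(-sigmas * deltas)                         # per-sample opacity
    trans = np.cumprod(np.concatenate(([1.0], 1.0 - alphas[:-1])))  # transmittance T_i
    weights = trans * alphas                                        # contribution weights
    return (weights[:, None] * colors).sum(axis=0)                  # final (3,) ray color

def mse_loss(pred, gt):
    """Mean squared error between rendered and ground-truth pixel colors."""
    return ((pred - gt) ** 2).mean()
```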
Experimental results
The researchers demonstrate the model on synthetic bounded scenes, real unbounded forward-facing scenes, and real unbounded 360° scenes. Comparing optimization time against all previous methods, including real-time rendering approaches, they find the new model dramatically faster. Quantitative comparisons are shown in Table 2, and visual comparisons in Figures 6, 7, and 8.
In addition, the new method produces high-quality results after just one epoch of optimization, in under 1.5 minutes, as shown in Figure 5.