
NVIDIA's new technique speeds up NeRF training by as much as 60x, cutting it to as little as 5 seconds

Recently, NVIDIA used a new technology to reduce the time to train a NeRF model to just 5 seconds.

In this regard, Google scientist Jon Barron wrote on Twitter: "18 months ago, training a NeRF took 5 hours; 2 months ago, training a NeRF still took 5 minutes; now, NVIDIA's latest technique has cut that to 5 seconds!"

NVIDIA achieved this result mainly by using a technique called multiresolution hash encoding, which it details in the paper "Instant Neural Graphics Primitives with a Multiresolution Hash Encoding."

As NVIDIA writes in the paper: "Computer graphics primitives are fundamentally represented by mathematical functions that parameterize appearance. The mathematical representation's quality and performance characteristics are crucial for visual fidelity." In other words, the goal is a representation that captures high-frequency, local detail while remaining fast and compact.

To meet these requirements, NVIDIA uses multiresolution hash encoding. According to NVIDIA, the encoding is task-independent and combines two properties, adaptivity and efficiency, and it is configured by only two values: the number of parameters T and the desired finest resolution N_max.
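To make the idea concrete, here is a minimal NumPy sketch of the multiresolution hash encoding as described in the paper: grid resolutions grow geometrically between a coarsest resolution N_min and the finest resolution N_max, each level owns a trainable feature table of T entries addressed by a spatial hash, and the interpolated features of all levels are concatenated before being fed to a small neural network. The specific values of L, F, and N_min below are illustrative defaults, not NVIDIA's reference implementation.

```python
import numpy as np

L, F = 16, 2                  # number of resolution levels, features per level
T = 2**19                     # hash table size per level (number of parameters)
N_min, N_max = 16, 512        # coarsest and finest grid resolutions
PRIMES = np.array([1, 2654435761, 805459861], dtype=np.uint64)

# Per-level resolutions grow geometrically from N_min to N_max.
b = np.exp((np.log(N_max) - np.log(N_min)) / (L - 1))
resolutions = np.floor(N_min * b ** np.arange(L)).astype(np.int64)

# Trainable feature tables, one per level (randomly initialized here; in the
# real system they are optimized jointly with the network by gradient descent).
tables = [np.random.uniform(-1e-4, 1e-4, size=(T, F)) for _ in range(L)]

def spatial_hash(corners):
    """XOR the integer grid coordinates scaled by large primes, then take mod T."""
    h = np.zeros(corners.shape[:-1], dtype=np.uint64)
    for i in range(corners.shape[-1]):
        h ^= corners[..., i].astype(np.uint64) * PRIMES[i]
    return h % T

def encode(x):
    """Map points x in [0, 1]^3 to a concatenation of L * F interpolated features."""
    features = []
    for level, N in enumerate(resolutions):
        scaled = x * N
        base = np.floor(scaled).astype(np.int64)
        frac = scaled - base
        acc = np.zeros((x.shape[0], F))
        # Trilinearly interpolate the features stored at the 8 voxel corners.
        # (The paper indexes coarse levels directly when the dense grid fits in
        # T entries; for brevity this sketch hashes every level.)
        for corner in range(8):
            offset = np.array([(corner >> d) & 1 for d in range(3)])
            weight = np.prod(np.where(offset, frac, 1.0 - frac), axis=-1, keepdims=True)
            acc += weight * tables[level][spatial_hash(base + offset)]
        features.append(acc)
    return np.concatenate(features, axis=-1)  # shape (batch, L * F), fed to a small MLP

print(encode(np.random.rand(4, 3)).shape)  # (4, 32)
```

Note that hash collisions are not handled explicitly; per the paper, the averaging of gradients during training lets the most important samples dominate the colliding table entries.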

With this encoding, only a few seconds of training are needed to reach high quality on a variety of tasks.


Figure | Live training demonstration of neural graphics primitives for multiple tasks on one GPU (Source: GitHub)

With NeRF, a handful of static images can be turned into a highly realistic 3D scene. However, NeRF is computationally expensive, especially when it comes to rendering.

According to the paper, neural graphics primitives, parameterized by fully connected neural networks, can be costly to train and evaluate. To reduce this cost, NVIDIA adopts a versatile new input encoding that permits the use of a smaller network without sacrificing quality, significantly cutting the number of floating-point and memory-access operations. The result is "a combined speedup of several orders of magnitude, enabling training of high-quality neural graphics primitives in a matter of seconds."

NVIDIA validated multiresolution hash encoding on four representative tasks: neural radiance fields (NeRF), neural radiance caching (NRC), gigapixel images, and neural signed distance functions (SDF).

Let's look at the NeRF task first.


GIF | Reconstruction quality demo for different encodings (Source: GitHub)

Below each image in the figure above are the number of trainable parameters (neural network weights + encoding parameters) and the training time. According to NVIDIA, the speedup comes from the sparsity of the parameter updates and the smaller neural network. Moreover, as the number of parameters increases, the approximation quality improves further while the training time barely grows.

NVIDIA's technique also supports realistic 360-degree panoramic scenes and complex scenes with specular surfaces, rendering them in real time and training on casually captured data in about 5 minutes.

Video | A 360-degree scene captured with an iPhone (Source: GitHub)

Video | A complex scene reconstructed from 34 photos (Source: GitHub)

It is also worth mentioning that multiresolution hash encoding supports training a NeRF-like radiance field from the noisy output of a volumetric path tracer. During training, rays are fed to the network on the fly, and the network learns a denoised radiance field.

Finally, a brief description of the other three tasks.

GIF | Results with triangle wave encoding (left) versus multiresolution hash encoding (right) (Source: GitHub)

The comparison above shows that the new multiresolution hash encoding lets the network learn much finer detail, including in shadowed areas.

Video | Gigapixel image task (Source: GitHub)

The video above shows real-time training progress on the gigapixel image task, in which a multilayer perceptron (MLP) learns the mapping from 2D coordinates to the RGB colors of a high-resolution image.
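As a rough illustration of that task, here is a minimal PyTorch sketch that fits a small MLP to the mapping from normalized 2D pixel coordinates to RGB colors. The random image tensor, network sizes, and training schedule are illustrative assumptions; in NVIDIA's pipeline the coordinates would additionally pass through the multiresolution hash encoding before reaching the MLP.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for a loaded reference image: (H, W, 3) RGB values in [0, 1].
H, W = 256, 256
image = torch.rand(H, W, 3)

# A small MLP mapping a 2D coordinate to an RGB color.
mlp = nn.Sequential(
    nn.Linear(2, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 3), nn.Sigmoid(),
)
optimizer = torch.optim.Adam(mlp.parameters(), lr=1e-3)

# Normalized pixel-center coordinates in [0, 1]^2 and their target colors.
ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
coords = torch.stack([(xs + 0.5) / W, (ys + 0.5) / H], dim=-1).reshape(-1, 2)
targets = image.reshape(-1, 3)

for step in range(1000):
    idx = torch.randint(0, coords.shape[0], (4096,))  # random minibatch of pixels
    loss = ((mlp(coords[idx]) - targets[idx]) ** 2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Report reconstruction quality as PSNR (in dB) over the whole image.
with torch.no_grad():
    mse = ((mlp(coords) - targets) ** 2).mean()
    print(f"PSNR: {(10 * torch.log10(1.0 / mse)).item():.2f} dB")
```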

Compared to Adaptive Coordinate Networks (ACORN), NVIDIA's method reaches a peak signal-to-noise ratio (PSNR) of 38.59 dB after 2.5 minutes of training, whereas ACORN needs 36.9 hours.


GIF | Real-time training progress on various SDF datasets (Source: GitHub)

Notably, the training data for the neural signed distance function (SDF) task is generated on the fly from the ground-truth mesh using the NVIDIA OptiX ray tracing engine.
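To show what dynamically generated training data means in practice, here is a minimal PyTorch sketch of the SDF task under a simplifying assumption: an analytic sphere stands in for the ground-truth mesh (and for the OptiX distance queries NVIDIA uses), points are sampled on the fly both uniformly and near the surface, and a small MLP is regressed onto the signed distances. The sampling scheme, network size, and training schedule are illustrative, not NVIDIA's implementation.

```python
import torch
import torch.nn as nn

def ground_truth_sdf(p):
    # Stand-in for the mesh/OptiX distance query: signed distance to a sphere
    # of radius 0.3 centered at (0.5, 0.5, 0.5) inside the unit cube.
    return (p - 0.5).norm(dim=-1, keepdim=True) - 0.3

# A small MLP mapping a 3D position to its signed distance.
mlp = nn.Sequential(
    nn.Linear(3, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 1),
)
optimizer = torch.optim.Adam(mlp.parameters(), lr=1e-3)

for step in range(2000):
    # Training data is generated on the fly: uniform samples in the unit cube
    # plus samples concentrated around the surface.
    uniform = torch.rand(2048, 3)
    dirs = torch.randn(2048, 3)
    dirs = dirs / dirs.norm(dim=-1, keepdim=True)
    near_surface = 0.5 + (0.3 + 0.02 * torch.randn(2048, 1)) * dirs
    points = torch.cat([uniform, near_surface], dim=0)

    loss = (mlp(points) - ground_truth_sdf(points)).abs().mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# The zero level set of the trained MLP approximates the original surface.
print("SDF at the center (should be about -0.3):", mlp(torch.full((1, 3), 0.5)).item())
```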

Many graphics problems rely on task-specific data structures that exploit sparsity or smoothness; multiresolution hash encoding offers a practical, learning-based alternative. It automatically focuses on relevant detail and works even in time-constrained settings such as online training and inference.

In the context of neural network input encodings, it can also serve as a drop-in replacement, for example accelerating NeRF by several orders of magnitude.

NVIDIA showed that for many graphics applications, training can be completed in seconds on a single GPU, which opens up neural methods to many more applications.

-End-

References:

https://nvlabs.github.io/instant-ngp/assets/mueller2022instant.pdf

https://nvlabs.github.io/instant-ngp/
