
Can't afford a figurine? Use AI to render one! You can synthesize it from random images found online

Xiao Zhen, reporting from Aofei Temple

Qubits | Official account QbitAI

How hard is it to render a 3D Dragon Ball figurine, detailed down to the folds of its hair and skin?


For the classic NeRF, you would need at least 100 photos of the figurine, all taken by the same camera from a specific distance.

But now, a new AI model can render the entire figurine from just 40 online images of unrestricted origin!


These photos can have any shooting angle, distance, or lighting, yet the reconstructed images come out sharp and free of artifacts.


The model can even estimate the object's material and relight it from any angle.


The model, called NeROIC, is the latest work from the University of Southern California and the Snap team.


Some netizens were ecstatic:

"If photos from different angles can render 3D models, fast-forward to shooting movies with nothing but photos..."

Other netizens took the opportunity to joke about using it to hype NFTs (doge).


So how exactly does NeROIC recover an object's 3D shape and properties from nothing but arbitrary 2D inputs?

An improved NeRF that predicts materials and lighting

Before introducing this model, we need to briefly review NeRF.

NeRF proposed the neural radiance field: a continuous scene is represented by a 5D function whose inputs are a spatial point's position (x, y, z) and the viewing direction (θ, φ), and whose outputs are the color and volume density at that point.
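
To make that concrete, here is a minimal PyTorch sketch of the idea (a toy illustration, not the official implementation; the real NeRF also applies positional encoding to its inputs and feeds the viewing direction only into the color branch):

```python
import torch
import torch.nn as nn

# Toy radiance field: maps the 5D input, position (x, y, z) plus viewing
# direction (theta, phi), to a color (r, g, b) and a volume density sigma.
class TinyRadianceField(nn.Module):
    def __init__(self, hidden=128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(5, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.density_head = nn.Linear(hidden, 1)  # sigma at the point
        self.color_head = nn.Linear(hidden, 3)    # RGB seen from that direction

    def forward(self, xyz, view_dir):
        # xyz: (N, 3) sample positions; view_dir: (N, 2) angles (theta, phi)
        h = self.backbone(torch.cat([xyz, view_dir], dim=-1))
        sigma = torch.relu(self.density_head(h))  # density must be >= 0
        rgb = torch.sigmoid(self.color_head(h))   # colors in [0, 1]
        return rgb, sigma

# Query the field at 1024 sample points along camera rays.
field = TinyRadianceField()
rgb, sigma = field(torch.rand(1024, 3), torch.rand(1024, 2))
```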


However, NeRF has some problems:

  • It places high demands on the input images: they must all be photos of the object taken in the same scene;
  • It cannot predict an object's material properties, so the lighting of a rendering cannot be changed.

This time, NeROIC is optimized on exactly these two fronts:

  • The scene of the input images is unrestricted: any photo of the object against any background will do, even pictures from the web;
  • Material properties can be predicted, so the surface lighting of the object can be changed at render time (it can be relit).

NeROIC consists mainly of two networks: the depth extraction network (a) and the rendering network (c).


The first is the depth extraction network, which extracts the object's various parameters.

To handle unrestricted input scenes, the AI first has to learn to cut the object out of different backgrounds. But because the AI's estimates of the camera positions are inaccurate, the cutouts always contain artifacts.


Therefore, the depth extraction network introduces camera parameters and lets the AI learn to estimate the camera pose, that is, from what angle and distance each netizen's photo was taken. With the poses corrected, the cutout comes close to the ground truth (GT).
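
The general recipe for this kind of pose refinement can be sketched as follows (the class name and parametrization here are illustrative assumptions, not NeROIC's actual code): each training image gets a small learnable pose correction, updated by the same rendering loss that trains the scene.

```python
import torch
import torch.nn as nn

# Hypothetical sketch: per-image camera pose corrections as ordinary
# learnable parameters, optimized jointly with the radiance field.
class LearnablePoses(nn.Module):
    def __init__(self, num_images):
        super().__init__()
        self.rot_delta = nn.Parameter(torch.zeros(num_images, 3))    # axis-angle offset
        self.trans_delta = nn.Parameter(torch.zeros(num_images, 3))  # translation offset

    def forward(self, image_idx, init_rot, init_trans):
        # A real implementation would compose rotations properly (e.g. via the
        # exponential map); adding small axis-angle offsets is a simplification.
        return (init_rot + self.rot_delta[image_idx],
                init_trans + self.trans_delta[image_idx])

# Corrections start at zero and drift toward the true poses during training.
poses = LearnablePoses(num_images=40)
rot, trans = poses(torch.tensor([0]), torch.zeros(1, 3), torch.zeros(1, 3))
```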


At the same time, the authors designed a new algorithm for estimating the surface normals of the object, one that preserves key details while removing the effects of geometric noise. (Normals describe the orientation of the model's surface; since shading under a light source depends on them, they directly determine the quality of relighting.)
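
For background, one standard way to get a raw normal out of a density field is the normalized negative gradient of the density with respect to position; the paper's contribution is the denoising and refinement applied on top of estimates like these. A self-contained sketch:

```python
import torch

# `field` is any differentiable function returning (rgb, sigma), such as the
# toy radiance field sketched earlier. Normals point opposite the direction
# of increasing density.
def density_normals(field, xyz, view_dir):
    xyz = xyz.clone().requires_grad_(True)
    _, sigma = field(xyz, view_dir)
    grad, = torch.autograd.grad(sigma.sum(), xyz)
    return -grad / (grad.norm(dim=-1, keepdim=True) + 1e-8)

# Demo with a dummy density: a soft sphere of radius 1 around the origin.
sphere = lambda xyz, _: (None, torch.relu(1.0 - xyz.norm(dim=-1)))
normals = density_normals(sphere, torch.rand(8, 3), torch.rand(8, 2))
```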


Finally, the rendering network uses the extracted parameters to render the final look of the 3D object.

Specifically, the paper combines neural color prediction with parametric material and lighting models to compute the final colors, predict the final normals, and so on.
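
To see why estimated normals and materials make relighting possible at all, here is a deliberately simplified shading step; the lambertian_relight function below is an illustrative stand-in, not the paper's actual reflectance model:

```python
import torch
import torch.nn.functional as F

# Once albedo and normals are known, relighting reduces to re-evaluating a
# reflectance model under a new light. Here: a single Lambertian term.
def lambertian_relight(albedo, normals, light_dir, light_color):
    # albedo: (N, 3); normals: (N, 3) unit vectors; light_dir: (3,) unit vector
    ndotl = (normals @ light_dir).clamp(min=0.0).unsqueeze(-1)  # cosine falloff
    return albedo * light_color * ndotl

normals = F.normalize(torch.randn(4, 3), dim=-1)
shaded = lambertian_relight(torch.rand(4, 3), normals,
                            F.normalize(torch.tensor([0.2, 1.0, 0.5]), dim=0),
                            torch.tensor([1.0, 0.9, 0.8]))  # warm light color
```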

NeROIC itself is implemented in PyTorch and trained on four NVIDIA Tesla V100 GPUs.

Training the depth extraction network takes 6 to 13 hours; the rendering network takes another 2 to 4 hours.

Rendering 3D models from web images

The dataset used to train NeROIC has three main parts:

images collected from the Internet (some of the products come from shopping platforms, namely Amazon and Taobao), the NeRD dataset, and the authors' own photos (milk, a TV, a model), with an average of 40 photos collected per object.

So, how effective is such a model?

The paper first compares NeROIC with NeRF.

Visually, NeROIC beats NeRF in both the detail and the sharpness of the rendered objects.


Measured by peak signal-to-noise ratio (PSNR) and structural similarity (SSIM), the depth extraction network's "matting" also holds up well, outperforming NeRF.
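
Both metrics are standard; for reference, a minimal sketch of how they are computed, using NumPy and scikit-image:

```python
import numpy as np
from skimage.metrics import structural_similarity

# Images are float arrays in [0, 1] with shape (H, W, 3).
def psnr(pred, gt, max_val=1.0):
    mse = np.mean((pred - gt) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)  # higher is better

def ssim(pred, gt):
    # channel_axis marks the color axis (scikit-image >= 0.19).
    return structural_similarity(pred, gt, data_range=1.0, channel_axis=-1)
```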


The paper also tested the rendered models in additional scenes, and no artifacts appeared.


Novel views can also be generated and relit convincingly, for example in an outdoor scene.


Indoor lighting gives a different effect again.


The authors also tried reducing the number of photos to 20, or even 10, and training both NeRF and NeROIC.

The results show that even with these insufficient datasets, NeROIC works better than NeRF.


However, some netizens pointed out that the authors did not show renderings of glass or translucent materials.


Reconstructing transparent or translucent objects is indeed a harder task for AI; you can try it yourself once the code is released.

According to the authors, the code is still in preparation. Netizens joked: "Maybe it will be released after the conference talk."


The first author is a Tsinghua alumnus


Zhengfei Kuang is currently pursuing a Ph.D. at the University of Southern California under the supervision of Hao Li, a well-known Chinese professor in the field of computer graphics.

He received his bachelor's degree from Tsinghua University's Department of Computer Science and worked as a research assistant in Professor Shi-Min Hu's research group.

The paper was written during his internship at Snap, and the rest of the authors are all from the Snap team.


In the future, a few "seller show" photos from netizens may be all it takes to do a real VR "cloud try-out" at home.


Paper address:

https://arxiv.org/abs/2201.02533

Project Address:

https://formyfamily.github.io/NeROIC/

Reference Links:

[1]https://zhengfeikuang.com/

[2]https://ningding97.github.io/fewnerd/

[3]https://twitter.com/ben_ferns/status/1486705623186112520

[4]https://twitter.com/ak92501/status/1480353151748386824

— End —

Qubits QbitAI · Signed Toutiao author

Follow us and be the first to know about cutting-edge technology developments