
Can't afford a figurine? Use AI to render one! You can synthesize it from random images found online

Xiao Zhen, reporting from Aofei Temple

Qubits | Official account QbitAI

How hard is it to render a 3D Dragon Ball figurine, detailed down to the folds of its hair and skin?


For the classic NeRF, you would need at least 100 photos of the figurine, all taken by the same camera from a specific distance.

But now, a new AI model can render the entire figurine from just 40 online images of unrestricted origin!


These photos can have any shooting angle, distance, or lighting, yet the reconstructed images come out sharp and free of artifacts.


The model can even estimate the object's material and relight it from any angle.


The model, called NeROIC, is the latest work from the University of Southern California and the Snap team.


Some netizens were ecstatic:

"If photos from different angles can render 3D models, fast-forward to shooting movies with nothing but photos..."

Other netizens took the opportunity to joke about using it to hype NFTs (doge).


So how exactly does NeROIC recover an object's 3D shape and properties from nothing but arbitrary 2D inputs?

An improved NeRF that predicts materials and lighting

Before introducing this model, we need to briefly review NeRF.

NeRF proposed the neural radiance field: a continuous scene is represented by a 5D function whose inputs are a spatial point's position (x, y, z) and the viewing direction (θ, φ), and whose outputs are the color and volume density at that point.
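
To make that concrete, here is a minimal PyTorch sketch of the idea (a toy illustration, not the official implementation; the real NeRF also applies positional encoding to its inputs and feeds the viewing direction only into the color branch):

```python
import torch
import torch.nn as nn

# Toy radiance field: maps the 5D input, position (x, y, z) plus viewing
# direction (theta, phi), to a color (r, g, b) and a volume density sigma.
class TinyRadianceField(nn.Module):
    def __init__(self, hidden=128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(5, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.density_head = nn.Linear(hidden, 1)  # sigma at the point
        self.color_head = nn.Linear(hidden, 3)    # RGB seen from that direction

    def forward(self, xyz, view_dir):
        # xyz: (N, 3) sample positions; view_dir: (N, 2) angles (theta, phi)
        h = self.backbone(torch.cat([xyz, view_dir], dim=-1))
        sigma = torch.relu(self.density_head(h))  # density must be >= 0
        rgb = torch.sigmoid(self.color_head(h))   # colors in [0, 1]
        return rgb, sigma

# Query the field at 1024 sample points along camera rays.
field = TinyRadianceField()
rgb, sigma = field(torch.rand(1024, 3), torch.rand(1024, 2))
```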


However, NeRF has some problems:

  • It places high demands on the input images: they must all be photos of the object taken in the same scene;
  • It cannot predict an object's material properties, so the lighting of a rendering cannot be changed.

This time, NeROIC is optimized on exactly these two fronts:

  • The scene of the input images is unrestricted: any photo of the object against any background will do, even pictures from the web;
  • Material properties can be predicted, so the surface lighting of the object can be changed at render time (it can be relit).

NeROIC consists mainly of two networks: the depth extraction network (a) and the rendering network (c).


The first is the depth extraction network, which extracts the object's various parameters.

To handle unrestricted input scenes, the AI first has to learn to cut the object out of different backgrounds. But because the AI's estimates of the camera positions are inaccurate, the cutouts always contain artifacts.


Therefore, the depth extraction network introduces camera parameters and lets the AI learn to estimate the camera pose, that is, from what angle and distance each netizen's photo was taken. With the poses corrected, the cutout comes close to the ground truth (GT).
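
The general recipe for this kind of pose refinement can be sketched as follows (the class name and parametrization here are illustrative assumptions, not NeROIC's actual code): each training image gets a small learnable pose correction, updated by the same rendering loss that trains the scene.

```python
import torch
import torch.nn as nn

# Hypothetical sketch: per-image camera pose corrections as ordinary
# learnable parameters, optimized jointly with the radiance field.
class LearnablePoses(nn.Module):
    def __init__(self, num_images):
        super().__init__()
        self.rot_delta = nn.Parameter(torch.zeros(num_images, 3))    # axis-angle offset
        self.trans_delta = nn.Parameter(torch.zeros(num_images, 3))  # translation offset

    def forward(self, image_idx, init_rot, init_trans):
        # A real implementation would compose rotations properly (e.g. via the
        # exponential map); adding small axis-angle offsets is a simplification.
        return (init_rot + self.rot_delta[image_idx],
                init_trans + self.trans_delta[image_idx])

# Corrections start at zero and drift toward the true poses during training.
poses = LearnablePoses(num_images=40)
rot, trans = poses(torch.tensor([0]), torch.zeros(1, 3), torch.zeros(1, 3))
```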


At the same time, the authors designed a new algorithm for estimating the surface normals of the object, one that preserves key details while removing the effects of geometric noise. (Normals describe the orientation of the model's surface; since shading under a light source depends on them, they directly determine the quality of relighting.)
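
For background, one standard way to get a raw normal out of a density field is the normalized negative gradient of the density with respect to position; the paper's contribution is the denoising and refinement applied on top of estimates like these. A self-contained sketch:

```python
import torch

# `field` is any differentiable function returning (rgb, sigma), such as the
# toy radiance field sketched earlier. Normals point opposite the direction
# of increasing density.
def density_normals(field, xyz, view_dir):
    xyz = xyz.clone().requires_grad_(True)
    _, sigma = field(xyz, view_dir)
    grad, = torch.autograd.grad(sigma.sum(), xyz)
    return -grad / (grad.norm(dim=-1, keepdim=True) + 1e-8)

# Demo with a dummy density: a soft sphere of radius 1 around the origin.
sphere = lambda xyz, _: (None, torch.relu(1.0 - xyz.norm(dim=-1)))
normals = density_normals(sphere, torch.rand(8, 3), torch.rand(8, 2))
```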


Finally, the rendering network uses the extracted parameters to render the final look of the 3D object.

Specifically, the paper combines neural color prediction with parametric material and lighting models to compute the final colors, predict the final normals, and so on.
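
To see why estimated normals and materials make relighting possible at all, here is a deliberately simplified shading step; the lambertian_relight function below is an illustrative stand-in, not the paper's actual reflectance model:

```python
import torch
import torch.nn.functional as F

# Once albedo and normals are known, relighting reduces to re-evaluating a
# reflectance model under a new light. Here: a single Lambertian term.
def lambertian_relight(albedo, normals, light_dir, light_color):
    # albedo: (N, 3); normals: (N, 3) unit vectors; light_dir: (3,) unit vector
    ndotl = (normals @ light_dir).clamp(min=0.0).unsqueeze(-1)  # cosine falloff
    return albedo * light_color * ndotl

normals = F.normalize(torch.randn(4, 3), dim=-1)
shaded = lambertian_relight(torch.rand(4, 3), normals,
                            F.normalize(torch.tensor([0.2, 1.0, 0.5]), dim=0),
                            torch.tensor([1.0, 0.9, 0.8]))  # warm light color
```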

NeROIC itself is implemented in PyTorch and trained on four NVIDIA Tesla V100 GPUs.

Training the depth extraction network takes 6 to 13 hours; the rendering network takes another 2 to 4 hours.

Rendering 3D models from web images

The dataset used to train NeROIC has three main parts:

images collected from the Internet (some of the products come from shopping platforms, namely Amazon and Taobao), the NeRD dataset, and the authors' own photos (milk, a TV, a model), with an average of 40 photos collected per object.

So, how effective is such a model?

The paper first compares NeROIC with NeRF.

Visually, NeROIC beats NeRF in both the detail and the sharpness of the rendered objects.


Measured by peak signal-to-noise ratio (PSNR) and structural similarity (SSIM), the depth extraction network's "matting" also holds up well, outperforming NeRF.
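
Both metrics are standard; for reference, a minimal sketch of how they are computed, using NumPy and scikit-image:

```python
import numpy as np
from skimage.metrics import structural_similarity

# Images are float arrays in [0, 1] with shape (H, W, 3).
def psnr(pred, gt, max_val=1.0):
    mse = np.mean((pred - gt) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)  # higher is better

def ssim(pred, gt):
    # channel_axis marks the color axis (scikit-image >= 0.19).
    return structural_similarity(pred, gt, data_range=1.0, channel_axis=-1)
```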


The paper also tested the rendered models in additional scenes, and no artifacts appeared.


Novel views can also be generated and relit convincingly, for example in an outdoor scene.


Indoor lighting gives a different effect again.


The authors also tried reducing the number of photos to 20, or even 10, and training both NeRF and NeROIC.

The results show that even with these insufficient datasets, NeROIC works better than NeRF.


However, some netizens pointed out that the authors did not show renderings of glass or translucent materials.


Reconstructing transparent or translucent objects is indeed a harder task for AI; you can try it yourself once the code is released.

According to the authors, the code is still in preparation. Netizens joked: "Maybe it will be released after the conference talk."


The first author is a Tsinghua alumnus


Zhengfei Kuang is currently pursuing a Ph.D. at the University of Southern California under the supervision of Hao Li, a well-known Chinese professor in the field of computer graphics.

He received his bachelor's degree from Tsinghua University's Department of Computer Science and worked as a research assistant in Professor Shi-Min Hu's research group.

The paper was written during his internship at Snap, and the rest of the authors are all from the Snap team.


In the future, a few "seller show" photos from netizens may be all it takes to do a real VR "cloud try-out" at home.


Paper address:

https://arxiv.org/abs/2201.02533

Project Address:

https://formyfamily.github.io/NeROIC/

Reference Links:

[1]https://zhengfeikuang.com/

[2]https://ningding97.github.io/fewnerd/

[3]https://twitter.com/ben_ferns/status/1486705623186112520

[4]https://twitter.com/ak92501/status/1480353151748386824

— End —

Qubits QbitAI · Signed Toutiao author

Follow us and be the first to know about cutting-edge technology developments