Meta uses a neural network to recover the main components of skin from RGB albedo and build realistic human faces

(Nweon, March 10, 2022) -- A long-standing goal of computer graphics is to create convincing, realistic reproductions of human faces, which often relies on experienced artists fine-tuning a large number of spatially varying shading parameters. Because skin is such an important component of virtual humans, the community has put a lot of effort into modeling it accurately. Skin is challenging not only because of its complex interactions with light, but also because the human visual system has evolved to be highly sensitive to faces.

Recently, there has been growing academic interest in estimating skin properties from captured data, mainly by using biophysical constraints to invert the biophysical properties of skin from diffuse reflectance. In a paper titled Estimation of Spectral Biophysical Skin Properties from Captured RGB Albedo, researchers from Meta, Cornell University, and University College London propose a new model. It builds on a biophysically based spectral space of skin albedo and uses neural networks to accurately recover the main components and structure of skin from simple RGB albedo.

To achieve this, the model takes a biophysical description of the main characteristics of human skin and converts it into albedo through Monte Carlo simulation. The process consists of two steps. First, the researchers precompute an albedo space: a high-dimensional tensor of skin tones produced by combining all possible values of the skin parameters. Then, they learn the inverse mapping from albedo to the relevant skin properties.

The team describes how the albedo space is created, the details of the skin model and the data used, and the trade-off between practical complexity and expressiveness.

1. Skin model: structural and optical properties

1.1. Skin structure

The team limits the model to two layers, the epidermis and the dermis, because similar assumptions have proven sufficient in prior work and suit the team's purpose. The epidermis consists of two parts: the living epidermis and the stratum corneum. The stratum corneum is the outermost layer, and its properties (surface roughness and sebum production) affect the specular reflection of the skin; however, given its low absorption and small thickness, it has little effect on skin albedo. The researchers model the dermis as a single layer with the averaged scattering and absorption properties of its two sublayers (the reticular dermis and the papillary dermis). It is simulated as a semi-infinite medium, ignoring the subcutaneous fat layer. This keeps the model generic while focusing on the face: beneath the dermis lies not only fat but also other internal tissues such as cartilage or muscle, depending largely on the anatomical location and the subject's body composition. In addition, the team found that dermal thickness has minimal effect on the resulting diffuse reflectance.

1.2. Absorption

Following well-known optical models of multilayer tissue, the researchers describe the optical properties of each layer by its spectral absorption (μa) and scattering (μs) coefficients. The absorption coefficient μa,i of each layer i is the sum of the absorption contributions μa,c of the chromophores c present in that layer.

In the same biophysical spirit, the team accounts for the action of melanin in the epidermis and incorporates hemoglobin, carried in the blood (Vb), into the dermis.
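
For reference, the additive absorption model can be written compactly. The notation below is ours rather than the paper's: C_i is the set of chromophores in layer i, which under the description above includes melanin for the epidermis and blood hemoglobin for the dermis, plus any chromophores held at fixed concentrations (such as bilirubin and β-carotene). Each term μa,c is taken as the chromophore's full contribution; layered skin models often write it as a volume fraction times an intrinsic absorption spectrum. The extinction coefficient μt that governs the random walk adds scattering to absorption.

```latex
\mu_{a,i}(\lambda) = \sum_{c \in \mathcal{C}_i} \mu_{a,c}(\lambda),
\qquad
\mu_{t,i}(\lambda) = \mu_{a,i}(\lambda) + \mu_{s,i}(\lambda)
```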

1.3. Scattering

The team treats the epidermis and dermis as homogeneous media, the latter being semi-infinite. Both layers are assigned a refractive index of 1.4, derived from a thickness-weighted average of the corresponding sublayers: stratum corneum (1.53), living epidermis (1.34), papillary dermis (1.39), and reticular dermis (1.395).
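
The thickness-weighted averaging can be illustrated with a short Python sketch. The sublayer refractive indices come from the text above, but the thicknesses are hypothetical placeholders chosen only for illustration; the paper's thickness values are not given in this article.

```python
def thickness_weighted_index(sublayers):
    """Average refractive index of a layer, weighting each sublayer by its thickness.

    sublayers: list of (refractive_index, thickness) pairs; thickness units cancel out.
    """
    total = sum(t for _, t in sublayers)
    if total <= 0:
        raise ValueError("total thickness must be positive")
    return sum(n * t for n, t in sublayers) / total


# Hypothetical thicknesses in mm (illustrative placeholders, not from the paper)
epidermis = [(1.53, 0.02), (1.34, 0.08)]   # stratum corneum, living epidermis
dermis = [(1.39, 0.15), (1.395, 1.5)]      # papillary dermis, reticular dermis

n_epi = thickness_weighted_index(epidermis)
n_derm = thickness_weighted_index(dermis)
```

With these placeholder thicknesses the epidermis averages to roughly 1.38 and the dermis to roughly 1.39, both close to the shared value of 1.4, which is consistent with treating the two layers as index-matched.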

2. Layered model for albedo generation

To compute the diffuse reflectance of skin patches, the researchers first tried a layered model based on Kubelka-Munk theory, but it lacked expressiveness and could not recover parameters for a sufficiently wide range of skin types. They therefore turned to brute-force Monte Carlo random walks. At this stage, the researchers focus on solving light transport through the stack of layers that make up the skin structure. The walk is performed in 2D, assuming each layer is symmetric in the azimuthal plane. At the interface between the epidermis and the dermis, only the changes in the scattering and absorption parameters are taken into account, since the measured refractive indices of the two layers are essentially the same. For each skin tone, spectral simulations are performed across the visible range from 380 to 780 nm; experiments showed that sampling every 10 nm is enough to produce stable, noise-free albedo. For each wavelength, one million photons are emitted, simulating a 2D random walk through the two-layer semi-infinite medium to produce the diffuse reflectance albedo (in contrast to the rendering step, where the UV-mapped albedo is used in a true 3D random walk).

The researchers treat each skin layer as a homogeneous medium with exponential attenuation. Note that the simulation begins once photons have crossed the outermost interface and entered the tissue; that is, the initial direction is sampled from a Lambertian distribution around the inverted (inward-pointing) surface normal. The biophysical model is then validated by plotting the spectral reflectance of different skin tones.
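
The random walk described above can be sketched in a few dozen lines of Python. This is a simplified illustration under stated assumptions, not the authors' simulator: scattering is isotropic (real skin scatters strongly forward), Fresnel effects at the surface are ignored, and the function name and arguments are hypothetical. Free paths are drawn from an exponential distribution, the start direction follows a 2D Lambertian distribution, and only the depth coordinate is tracked because the layers are horizontally symmetric.

```python
import math
import random


def simulate_albedo(mu_a_epi, mu_s_epi, mu_a_derm, mu_s_derm,
                    epi_thickness, n_photons=20_000, seed=0):
    """Estimate the diffuse reflectance of a two-layer skin model with a 2D random walk.

    z is depth (positive into the skin). The epidermis occupies 0 <= z < epi_thickness;
    the dermis below it is semi-infinite. Scattering is isotropic and the surface
    refractive-index mismatch is ignored (both simplifying assumptions).
    """
    rng = random.Random(seed)
    t = epi_thickness
    reflected = 0
    for _ in range(n_photons):
        z = 0.0
        # 2D Lambertian start: angle from the inward normal has pdf proportional to cos
        dz = math.cos(math.asin(2.0 * rng.random() - 1.0))
        alive = True
        while alive:
            tau = -math.log(1.0 - rng.random())  # exponential free path, in optical depth
            while True:
                in_epi = z < t or (z == t and dz < 0)
                mu_a, mu_s = (mu_a_epi, mu_s_epi) if in_epi else (mu_a_derm, mu_s_derm)
                mu_t = mu_a + mu_s
                z_new = z + (tau / mu_t) * dz
                if in_epi and z_new < 0.0:  # escaped through the skin surface
                    reflected += 1
                    alive = False
                    break
                if (in_epi and dz > 0 and z_new > t) or (not in_epi and dz < 0 and z_new < t):
                    # Stop at the interface and spend only the optical depth used so far;
                    # the rest of the path continues with the other layer's coefficients.
                    tau -= abs(t - z) / abs(dz) * mu_t
                    z = t
                    continue
                z = z_new
                if rng.random() < mu_a / mu_t:  # absorbed at this event
                    alive = False
                else:  # scattered: new direction uniform on the 2D circle
                    dz = math.cos(2.0 * math.pi * rng.random())
                break
    return reflected / n_photons
```

To build a spectral albedo in the spirit of the paper, one would call this once per wavelength from 380 to 780 nm in 10 nm steps (41 samples), passing that wavelength's absorption and scattering coefficients in consistent units (for example 1/mm and mm) and using far more photons than this sketch's default.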

3. Parameterization and sampling of the albedo space

The biophysically based albedo space is created by varying the skin properties within the ranges listed in Table 1. Note that the researchers consider epidermal thickness a critical parameter to estimate. Accounting for this parameter across the face, together with varying melanin concentration, helps capture local dark areas (such as moles) and generalize to any skin type. The team allows both melanin and hemoglobin concentrations to exceed the normal values measured in adults so that facial outliers are handled automatically. Extending the range is also justified for oxygenation levels, which can vary considerably with a person's physiological state. For the melanin type ratio, the available measurement data show little consistency.

Other parameters, such as bilirubin and β-carotene concentrations, remain fixed at common values measured on human skin. Epidermal thickness, melanin type ratio, and hemoglobin type ratio are all sampled uniformly.
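
Building the albedo space amounts to enumerating a grid over the five varying parameters. The sketch below uses hypothetical placeholder ranges (not the values in Table 1) and a hypothetical cubic spacing for the two concentrations, since the exact non-linear scheme is not spelled out in this article; the remaining three parameters are sampled uniformly. Each row of the resulting grid would then be passed through the spectral simulation and RGB integration to obtain one entry of the albedo tensor.

```python
import numpy as np

# Placeholder ranges, NOT the values from Table 1 of the paper.
# Order: melanin, hemoglobin, epidermal thickness (mm), melanin type ratio, hemoglobin type ratio
RANGES = [(0.001, 0.5), (0.001, 0.3), (0.01, 0.25), (0.0, 1.0), (0.0, 1.0)]
NONLINEAR = [True, True, False, False, False]  # concentrations get non-uniform spacing


def axis_samples(lo, hi, n, nonlinear, power=3.0):
    """Samples along one parameter axis; the power-law spacing is a hypothetical choice."""
    u = np.linspace(0.0, 1.0, n)
    if nonlinear:
        u = u ** power  # denser sampling at low concentrations
    return lo + (hi - lo) * u


def build_parameter_grid(n_per_axis=8):
    """Return an (n_per_axis**5, 5) array with every combination of axis samples."""
    axes = [axis_samples(lo, hi, n_per_axis, nl)
            for (lo, hi), nl in zip(RANGES, NONLINEAR)]
    mesh = np.meshgrid(*axes, indexing="ij")
    return np.stack(mesh, axis=-1).reshape(-1, len(axes))


grid = build_parameter_grid(8)  # 8**5 = 32768 parameter combinations
```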

4. Recovering skin properties

The team's biophysical model defines a forward mapping from skin parameters to skin albedo. For the inverse process, the researchers need the mapping from skin albedo back to skin parameters. This is challenging because the mapping is not bijective: many different combinations of skin attributes can produce the same albedo.

It is worth noting that the model computes the reflectance for each combination of skin parameters in spectral space, but then integrates that reflectance into RGB to learn the mapping. This is a design choice: it simplifies capture and yields a skin parameter mapping good enough to a) reconstruct the original albedo and b) perform plausible edits. Operating in a higher-dimensional space, however, could alleviate metamerism, which would help separate skin parameters more accurately, generalize the model to arbitrary light sources, and make it robust to different exposure levels.

4.1. Look-up tensor (LUT)

The team precomputes a large tensor of skin tones according to the sampling strategy described above. To estimate the skin parameters for a given input albedo, the LUT is searched for every texel of the albedo map to find the set of skin parameters that minimizes the L2 reconstruction error. These parameters can then be manipulated, and the new corresponding albedo queried from the LUT. This method reconstructs skin albedo faithfully, with near-zero error. However, because the mapping from RGB to skin parameters is not smooth, the inverted parameter maps are noisy and contain many discontinuities. As a result, editing adjacent pixels can cause unexpected, abrupt changes in the reconstructed albedo.
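
The per-texel LUT search can be written as a vectorized nearest-neighbor query in NumPy. This is an illustrative sketch with hypothetical function and array names: for every texel it returns the parameters of the LUT entry with the smallest L2 error in RGB, processing texels in chunks to bound memory.

```python
import numpy as np


def lut_invert(albedo, lut_rgb, lut_params, chunk=4096):
    """Brute-force LUT inversion: per texel, pick the entry with minimal L2 RGB error.

    albedo:     (H, W, 3) RGB albedo map
    lut_rgb:    (N, 3) precomputed albedo for each skin-parameter combination
    lut_params: (N, P) the skin parameters that produced each LUT entry
    Returns (params_map (H, W, P), reconstructed_albedo (H, W, 3)).
    """
    h, w, _ = albedo.shape
    pixels = albedo.reshape(-1, 3)
    lut_sq = np.sum(lut_rgb ** 2, axis=1)  # ||l||^2 for every LUT entry
    best = np.empty(pixels.shape[0], dtype=np.int64)
    for start in range(0, pixels.shape[0], chunk):  # chunking bounds memory use
        p = pixels[start:start + chunk]
        # ||p - l||^2 = ||p||^2 - 2 p.l + ||l||^2; ||p||^2 does not change the argmin
        dist = lut_sq[None, :] - 2.0 * (p @ lut_rgb.T)
        best[start:start + chunk] = np.argmin(dist, axis=1)
    params = lut_params[best].reshape(h, w, -1)
    recon = lut_rgb[best].reshape(h, w, 3)
    return params, recon
```

Because each texel is matched independently, two neighboring texels with nearly identical RGB values can select LUT entries with very different parameters whenever several parameter combinations produce similar albedo, which is the source of the noise and discontinuities described above.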

4.2. Neural skin parameter estimation

To overcome the limitations of the LUT method, the researchers train an encoder-decoder network that recovers smooth skin parameter maps. Both the encoder and the decoder are MLPs with four fully connected layers, including two hidden layers of 70 neurons each. The encoder maps the 3D skin albedo to a 5D skin parameter vector, and the decoder performs the reverse mapping.
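
The network shapes can be sketched in NumPy as follows. Only the layer widths (3-70-70-5 for the encoder, 5-70-70-3 for the decoder) come from the description above; the ReLU hidden activations, sigmoid outputs on normalized values, initialization, helper names, and parameter channel order are assumptions, and the weights here are untrained (in practice they would be fitted on the simulated dataset). Because the mapping is applied per texel, a whole albedo map is just a batch of RGB triples.

```python
import numpy as np


def init_mlp(sizes, rng):
    """He-initialized weights for a fully connected MLP with the given layer widths."""
    return [(rng.normal(0.0, np.sqrt(2.0 / m), size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def mlp_forward(layers, x):
    """Apply the MLP: ReLU on hidden layers, sigmoid on the output (both assumed)."""
    for i, (w, b) in enumerate(layers):
        x = x @ w + b
        if i < len(layers) - 1:
            x = np.maximum(x, 0.0)
    return 1.0 / (1.0 + np.exp(-x))  # outputs in (0, 1): normalized params or albedo


rng = np.random.default_rng(0)
encoder = init_mlp([3, 70, 70, 5], rng)  # RGB albedo -> 5 normalized skin parameters
decoder = init_mlp([5, 70, 70, 3], rng)  # 5 skin parameters -> RGB albedo


def encode_map(albedo):
    """Run the encoder per texel on an (H, W, 3) albedo map."""
    h, w, _ = albedo.shape
    return mlp_forward(encoder, albedo.reshape(-1, 3)).reshape(h, w, 5)


def decode_map(params):
    """Run the decoder per texel on an (H, W, 5) parameter map."""
    h, w, _ = params.shape
    return mlp_forward(decoder, params.reshape(-1, 5)).reshape(h, w, 3)


# Example edit on a toy 4x4 albedo map (untrained weights, so values are arbitrary)
albedo = rng.random((4, 4, 3))
params = encode_map(albedo)
params[..., 0] = np.clip(params[..., 0] * 1.2, 0.0, 1.0)  # channel 0 = melanin (hypothetical)
edited = decode_map(params)
```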

Using the biophysical skin model, the team generated a dataset of 600k pairs of 5D skin parameter vectors and corresponding 3D albedos, split 80%/20% into training and validation sets. For validation, the skin parameters were sampled from a uniform distribution. For training, a quasi-Monte Carlo low-discrepancy sequence was used to cover the skin parameter space more evenly. Both sample sets were then non-linearly remapped according to the scheme described above, and the corresponding albedos were computed with the biophysical model.
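
The two sampling strategies can be sketched as follows. The article does not say which low-discrepancy sequence was used; this sketch uses a Halton sequence as one standard choice. The power-law remapping of the first two columns (assumed here to be the melanin and hemoglobin concentrations) is a hypothetical stand-in for the paper's scheme, and samples stay normalized to [0, 1) before any scaling to real parameter ranges.

```python
import numpy as np

PRIMES = (2, 3, 5, 7, 11)  # one coprime base per parameter dimension


def radical_inverse(i, base):
    """Van der Corput radical inverse of the integer i in the given base."""
    f, result = 1.0, 0.0
    while i > 0:
        f /= base
        result += f * (i % base)
        i //= base
    return result


def halton(n, dims=5, skip=1):
    """First n points of the Halton sequence (pure Python; slow but fine for a sketch)."""
    return np.array([[radical_inverse(i, PRIMES[d]) for d in range(dims)]
                     for i in range(skip, skip + n)])


def remap_concentrations(u, power=3.0):
    """Hypothetical non-linear remap of columns 0 and 1 (assumed concentrations)."""
    out = u.copy()
    out[:, :2] = out[:, :2] ** power
    return out


def make_splits(n_total=600_000, train_frac=0.8, seed=0):
    """Quasi-random training samples and uniformly random validation samples."""
    n_train = int(n_total * train_frac)
    rng = np.random.default_rng(seed)
    train_u = remap_concentrations(halton(n_train))
    val_u = remap_concentrations(rng.random((n_total - n_train, 5)))
    return train_u, val_u
```

Each normalized row would then be scaled to its parameter range and fed to the forward model to obtain the matching albedo.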

5. Implementation and results

The albedo space is computed spectrally, but the input albedo is in RGB. To reduce the multiband spectral values to RGB, the team uses existing integration methods. Because diffuse albedo has a fairly limited color gamut and dynamic range, most color spaces would be suitable; the team uses sRGB. The researchers also changed the light source so that results can be compared directly with previous studies.
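
The article does not say which integration method, color matching functions, or illuminant the team used, so the following is only a minimal sketch under stated assumptions. It approximates the CIE 1931 color matching functions with the analytic multi-lobe Gaussian fit published by Wyman, Sloan and Shirley (2013), assumes an equal-energy illuminant by default, and converts XYZ to linear sRGB with the standard D65 matrix. Function names are hypothetical.

```python
import numpy as np

WAVELENGTHS = np.arange(380.0, 781.0, 10.0)  # 380-780 nm at 10 nm steps (41 samples)


def _g(lam, mu, s1, s2):
    """Piecewise Gaussian with different widths left and right of the mean."""
    s = np.where(lam < mu, s1, s2)
    return np.exp(-0.5 * ((lam - mu) / s) ** 2)


def cie_xyz_cmfs(lam):
    """Analytic multi-lobe Gaussian approximation of the CIE 1931 2-degree CMFs."""
    x = (1.056 * _g(lam, 599.8, 37.9, 31.0) + 0.362 * _g(lam, 442.0, 16.0, 26.7)
         - 0.065 * _g(lam, 501.1, 20.4, 26.2))
    y = 0.821 * _g(lam, 568.8, 46.9, 40.5) + 0.286 * _g(lam, 530.9, 16.3, 31.1)
    z = 1.217 * _g(lam, 437.0, 11.8, 36.0) + 0.681 * _g(lam, 459.0, 26.0, 13.8)
    return np.stack([x, y, z], axis=-1)  # shape (n_wavelengths, 3)


XYZ_TO_LINEAR_SRGB = np.array([[3.2406, -1.5372, -0.4986],
                               [-0.9689, 1.8758, 0.0415],
                               [0.0557, -0.2040, 1.0570]])


def reflectance_to_linear_srgb(reflectance, illuminant=None):
    """Integrate spectral reflectance of shape (..., 41) to linear sRGB.

    The default illuminant is equal-energy (an assumption). Results are normalized
    so that a perfect reflector has luminance Y = 1 under that illuminant.
    """
    cmf = cie_xyz_cmfs(WAVELENGTHS)
    if illuminant is None:
        illuminant = np.ones_like(WAVELENGTHS)
    weights = cmf * illuminant[:, None]  # (41, 3)
    xyz = reflectance @ weights / weights[:, 1].sum()
    return xyz @ XYZ_TO_LINEAR_SRGB.T
```

Swapping the light source, as the researchers did for comparisons, corresponds to passing a different spectral power distribution as `illuminant`; note that sRGB's white point is D65, so a flat white spectrum under the equal-energy default will not map exactly to (1, 1, 1).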

The team estimated and manipulated skin parameters for several skin types covered by the Fitzpatrick scale. With the LUT method, a brute-force multithreaded (12 threads) search of the largest tensor (256k skin tones) on an Intel Xeon W-2135 at 3.70 GHz took between 2 and 5 hours for a 2k×2k image and more than 7 hours for a 4k×4k image. In contrast, the learned mapping has low memory consumption and high computational efficiency: on the same Intel Xeon CPU, a 2k×2k image takes less than 2 seconds on average.

In addition, for the sake of spatial regularization and performance, the team sacrifices a little reconstruction accuracy, which is imperceptible in the final rendering. In experimental comparisons with previous methods, the team's reconstruction error was generally much lower.

Related paper: Estimation of Spectral Biophysical Skin Properties from Captured RGB Albedo

Overall, the team's main contributions are:

An expressive space of spectral skin albedo, constrained by the physical properties of real skin and consistent with measurements reported in tissue optics and medical studies. The space is created through a complex biophysical model of human skin.
A learned inverse mapping from skin albedo to the associated spectral biophysical skin properties, which recovers smooth, high-resolution, spatially varying maps of skin properties from RGB-captured albedo.
A framework that faithfully reproduces the albedo of various skin types with minimal error and enables robust editing through the estimated biophysical properties.