
With only 500 fine-tuning iterations, this big-eyed cute-face generator surpasses StyleGAN and can be tried online

Report from Machine Heart (Synced)

Editors: Du Wei, Chen Ping

The comic faces generated by JoJoGAN are detailed enough to capture eye shape and other fine features.

Friends who like reading manga have likely heard of "JoJo's Bizarre Adventure", JOJO for short, a manga by Japanese artist Hirohiko Araki. With its unique drawing style and astonishing plot, it is a must-read for many fans.

Good work is always inspiring. Taking JOJO as their inspiration, researchers from the University of Illinois at Urbana-Champaign (UIUC) developed a comic-face generation framework, JoJoGAN, that can stylize any human face. Given only a single style reference (the first row of images below, covering different anime and cartoon characters), JoJoGAN can apply that style to any input image (the singer IU on the far left below, Elon Musk), and the style features of the resulting images, such as eye shape and hair color, are well preserved.

For example, here is Musk rendered in a long-haired princess style; big-eyed Musk looks quite cute:


More of JoJoGAN's generation results:


JoJoGAN can also be tried online: you can upload your own picture and see the generated comic face. We gave it a try, and the results were not bad:


Demo address: https://huggingface.co/spaces/akhaliq/JoJoGAN


Address of the paper: https://arxiv.org/pdf/2112.11641.pdf

Project Address: https://github.com/mchong6/JoJoGAN

Overall, JoJoGAN first approximates a paired training dataset and then fine-tunes StyleGAN to perform one-shot face stylization. The study shows that JoJoGAN preserves the style details of the reference image well without any supervision, and also generalizes to different styles.

Technical interpretation

Let's start with JoJoGAN's workflow.

JoJoGAN works by fine-tuning a pre-trained StyleGAN2 with a single reference style image, in four steps:

Prepare approximate paired training data by GAN-inverting the reference style image y, yielding a style code w from which a plausible real face image x can be generated;

Find a family of style codes w_i around w whose generated real face images should all match the reference style image y, and form (w_i, y) pairs as the training set;

Fine-tune StyleGAN on these paired training data;

Generate new samples with the fine-tuned StyleGAN.
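The four steps above can be sketched end to end. The snippet below is a minimal, self-contained NumPy illustration, not the paper's implementation: the "generator" is just a linear map and the inversion is least squares, standing in for StyleGAN2 and e4e, and the loss is plain squared error instead of the perceptual loss used in practice.

```python
import numpy as np

rng = np.random.default_rng(0)
W_DIM, IMG_DIM = 8, 16  # toy sizes; real StyleGAN2 codes are far larger

# Stand-in "generator": a linear map G(w) = w @ G_mat.
G_mat = rng.normal(size=(W_DIM, IMG_DIM))

def gan_invert(y):
    # Step 1, stand-in for e4e inversion: least-squares code for image y.
    w, *_ = np.linalg.lstsq(G_mat.T, y, rcond=None)
    return w

def make_pairs(w, y, n=4, noise=0.1):
    # Step 2: a family of perturbed codes w_i that should all map to
    # the same style reference y.
    return [(w + noise * rng.normal(size=w.shape), y) for _ in range(n)]

def finetune(G, pairs, steps=500, lr=2e-3):
    # Step 3: plain gradient descent on ||w_i @ G - y||^2
    # (the paper uses Adam and a perceptual loss instead).
    for _ in range(steps):
        for w_i, y in pairs:
            err = w_i @ G - y                # residual on this pair
            G = G - lr * np.outer(w_i, err)  # gradient step on G
    return G

# Step 4: stylize a new input through the fine-tuned generator.
y_ref = rng.normal(size=IMG_DIM)             # reference style "image"
pairs = make_pairs(gan_invert(y_ref), y_ref)
G_ft = finetune(G_mat.copy(), pairs)
stylized = gan_invert(rng.normal(size=IMG_DIM)) @ G_ft
print(stylized.shape)  # (16,)
```

The point of the sketch is the data flow: a single reference image becomes many (w_i, y) pairs, and only the generator's weights are updated against them.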


Next comes data preparation.

Image stylization tasks are best trained with paired data, but paired data is not readily available and takes a lot of time and resources to collect. Currently there is no good open-source paired dataset suitable for the task in this study.

The researchers therefore overcome this problem by approximating a paired training dataset, as in Figure 3 below. Given a style reference image y, they perform GAN inversion with the e4e framework to obtain w. Because e4e is trained on real face images and cannot generalize to out-of-distribution style images, it yields a w whose generated real face only approximates y, forming a paired (w, y) training example.


However, training on only a single data point leads to poor generalization to other images, as shown in Figure 4 below. The researchers overcame this by generating more training data points. The idea is simple: many real face images should match a reference image of the same style. For example, faces with slightly different eye sizes or hair textures can all match the same reference image.
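One way to produce such extra data points is style mixing on a StyleGAN-style w+ code. The sketch below is a hedged illustration of that idea: the layer count, code width, and which layers get swapped are illustrative assumptions, not the paper's exact mixing mask.

```python
import numpy as np

rng = np.random.default_rng(0)
N_LAYERS, W_DIM = 18, 512  # typical w+ shape for a 1024px StyleGAN2

def style_mix(w_plus, w_random, mix_layers):
    # Swap the chosen layers of the inverted code with a random code.
    # The result is a new face that should still match the same style
    # reference y, so it can be paired with the same y.
    w_new = w_plus.copy()
    w_new[mix_layers] = w_random[mix_layers]
    return w_new

w_plus = rng.normal(size=(N_LAYERS, W_DIM))  # code from GAN inversion
w_rand = rng.normal(size=(N_LAYERS, W_DIM))  # random code (mapping net)
# Assumption: swap the finer layers (7..17); JoJoGAN selects layers
# with a fixed mask, which may differ from this choice.
mixed = style_mix(w_plus, w_rand, np.arange(7, N_LAYERS))
print(mixed.shape)  # (18, 512)
```

Each mixed code is paired with the same reference y, expanding the single (w, y) example into a whole training set.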


Finally, the researchers used the Adam optimizer to fine-tune JoJoGAN for 500 iterations at a learning rate of 2×10^-3, which took only about one minute on an Nvidia A40.
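To make those optimizer settings concrete, here is the standard Adam update written out in NumPy and run for the paper's 500 iterations at learning rate 2×10^-3, applied to a toy quadratic objective rather than the real StyleGAN weights and perceptual loss.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=2e-3, b1=0.9, b2=0.999, eps=1e-8):
    # One standard Adam update (the optimizer used to fine-tune JoJoGAN).
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)   # bias-corrected first moment
    v_hat = v / (1 - b2 ** t)   # bias-corrected second moment
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Toy objective: minimize ||theta - target||^2 with the paper's
# settings (500 iterations, lr = 2e-3).
target = np.array([1.0, -2.0, 0.5])
theta = np.zeros(3)
m, v = np.zeros(3), np.zeros(3)
for t in range(1, 501):
    grad = 2.0 * (theta - target)
    theta, m, v = adam_step(theta, grad, m, v, t)
print(np.round(theta, 3))
```

Note how small the budget is: at this learning rate each parameter moves at most roughly lr per step, which is why 500 iterations finish in about a minute on an A40.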

The researchers compared a non-color-preserving version of JoJoGAN with the current SOTA one-shot/few-shot stylization methods StyleGAN-NADA and BlendGAN. The results show that JoJoGAN captures the small details that define a style while keeping the input face's identity clearly recognizable.

As shown in Figure 5a below, JoJoGAN faithfully captures eye shape and detail as well as hair ornaments from the style reference; in Figure 5d, it accurately captures complex face paint. In contrast, while StyleGAN-NADA also captures the overall clown makeup, it fails on details such as the eyes and eyebrows, and identity is strongly affected. BlendGAN fails to capture meaningful stylistic details, not even matching the hair color.
