"As we all know, video can't P", GAN: is it?

You've seen GANs retouch photos, but have you ever seen a GAN retouch a video?

Lo and behold: someone who was speaking with a straight face now smiles the whole way through, and a person in their 40s or 50s turns into a 20-something:

"As we all know, video can't P", GAN: is it?

On the other side, a smiling, singing "Hermione" suddenly turns angry, and can even be swapped to the face of a young child:

"As we all know, video can't P", GAN: is it?

Obama gets the same treatment: four edited facial states at once, and even a gender swap to a woman's face:

"As we all know, video can't P", GAN: is it?

No matter how the expression and facial attributes change, these videos never look off; the whole thing stays buttery smooth.

Oh, and besides real people, faces in anime videos can be edited too:

"As we all know, video can't P", GAN: is it?

Pretty impressive stuff.

GAN-based video face editing

This model is from Tel Aviv University in Israel.

"As we all know, video can't P", GAN: is it?

As we all know, GANs encode rich semantics within their latent space, a property that has been widely exploited for face editing.
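
For intuition, here is a toy sketch (not from the paper) of what such an edit looks like on a single image: a latent code is just a tensor, and a semantic edit is a linear step along a direction. The random `age_direction` below is a stand-in for a real direction obtained with a method such as InterFaceGAN or GANSpace:

```python
import torch

# Toy sketch: a StyleGAN2 W+ latent code for a 1024x1024 face model
# is an (18, 512) tensor; a semantic edit is a linear move along a
# direction vector. The random direction here is only a placeholder.
w = torch.randn(18, 512)                 # inverted latent code of a face
age_direction = torch.randn(18, 512)     # placeholder semantic direction
age_direction /= age_direction.norm()

w_older = w + 3.0 * age_direction        # step toward "older"
w_younger = w - 3.0 * age_direction      # step toward "younger"
# Feeding w_older / w_younger through the generator yields the edited face.
```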

Putting this to work on video, however, remains challenging: high-quality video datasets are scarce, and temporal coherency is a fundamental obstacle to overcome.

The researchers, however, argue that this second obstacle is largely artificial.

The source video is already temporally consistent; the edited video loses that consistency partly because individual components of the editing pipeline handle it carelessly.

Their proposed framework for semantic face editing in video improves significantly on the current state of the art:

Using only the standard, non-temporal StyleGAN2, they analyze the components of the GAN editing pipeline to determine which ones are already temporally consistent, and build on those.

The entire process involves no additional operations to maintain temporal consistency.

The specific process breaks down into six steps (a rough code sketch follows the list):

"As we all know, video can't P", GAN: is it?

1. The input video is first divided into frames, and the faces in each frame are cropped and aligned;

2. Use a pre-trained e4e encoder to invert each cropped face into the latent space of a pre-trained StyleGAN2;

3. Apply PTI (Pivotal Tuning Inversion, a recently proposed inversion technique) across all frames in parallel to fine-tune the generator, correcting errors in the initial inversions and restoring global consistency;

4. Edit all frames uniformly by manipulating their pivot latent codes linearly, using a fixed direction and a fixed step size;

5. Fine-tune the generator again, "stitching" the background and the edited face together;

6. Reverse the alignment step and paste the modified face back into the video.
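
Pulling the six steps together, here is a rough end-to-end sketch. This is not the authors' code (which was unreleased at the time of writing); every helper, including `crop_and_align`, `load_e4e`, `load_stylegan2`, `pti_finetune`, `stitch_finetune`, and `paste_back`, is a hypothetical stand-in for the corresponding stage:

```python
def edit_video(frames, direction, alpha=2.0):
    """Hypothetical sketch of the six-step pipeline.

    frames: list of video frames; direction: an (18, 512) semantic
    direction; alpha: edit strength. All helpers are stand-ins.
    """
    # Step 1: crop and align the face in every frame.
    crops, transforms = zip(*(crop_and_align(f) for f in frames))

    # Step 2: invert each crop into StyleGAN2's latent space
    # with a pre-trained e4e encoder.
    e4e = load_e4e("e4e_ffhq.pt")
    pivots = [e4e.encode(c) for c in crops]          # (18, 512) each

    # Step 3: PTI fine-tunes the generator around these "pivot"
    # codes across all frames, fixing inversion errors and
    # restoring global consistency.
    G = load_stylegan2("stylegan2_ffhq.pt")
    G = pti_finetune(G, pivots, crops)

    # Step 4: the edit itself, the same fixed direction and step
    # size applied linearly to every pivot code.
    edited = [w + alpha * direction for w in pivots]

    # Step 5: fine-tune G again so the edited face "stitches"
    # cleanly onto the original background.
    G = stitch_finetune(G, edited, crops)

    # Step 6: undo the alignment and paste each face back.
    return [paste_back(G.synthesize(w), frame, t)
            for w, frame, t in zip(edited, frames, transforms)]
```

Notice that nothing here explicitly smooths over time: every frame passes through the same fine-tuned generator and receives exactly the same linear edit, which is why no extra temporal machinery is needed.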

"As we all know, video can't P", GAN: is it?

△ Note the heavy artifacts around the neck, which the final step repairs completely

Comparison with SOTA models

How well does this model work? Let's compare:

"As we all know, video can't P", GAN: is it?
"As we all know, video can't P", GAN: is it?
"As we all know, video can't P", GAN: is it?

The first clip shows a "younger" edit; the second and third show "older" edits.

It's clear that faces produced by the current SOTA model (Latent Transformer) and by PTI "twitch" and show artifacts, while the new model avoids these problems.

In addition, the researchers ran temporal-consistency tests.

There are two metrics:

Local temporal consistency (TL-ID), which evaluates the consistency of each pair of adjacent frames using an off-the-shelf identity detection network. Higher TL-ID scores indicate smoother results with no noticeable local jitter.

Global temporal consistency (TG-ID), which uses the same network to assess the similarity of all pairs of frames, adjacent or not. A score of 1 indicates that the method fully matched the temporal consistency of the original video.
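
As a rough illustration (assuming each frame has already been mapped to an identity embedding by a face-recognition network; the paper's exact formulation may differ), the two scores boil down to average pairwise identity similarity, normalized against the source video:

```python
import numpy as np

def cos_sim(a, b):
    """Cosine similarity between two identity embeddings."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def tl_id(embs):
    """Local score: mean similarity of adjacent frame pairs."""
    return np.mean([cos_sim(embs[i], embs[i + 1])
                    for i in range(len(embs) - 1)])

def tg_id(embs):
    """Global score: mean similarity over all frame pairs."""
    n = len(embs)
    return np.mean([cos_sim(embs[i], embs[j])
                    for i in range(n) for j in range(i + 1, n)])

# Scores are reported relative to the source video, so an edit that is
# exactly as stable as its source lands at 1.0:
# tl_score = tl_id(edited_embs) / tl_id(source_embs)
```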

The results are as follows:

"As we all know, video can't P", GAN: is it?

As you can see, the new model comes out slightly ahead on both metrics.

Finally, the code will be released on February 14th; interested readers can keep an eye out for it~

"As we all know, video can't P", GAN: is it?

Read on