Ever seen a photo edited by a GAN? How about a GAN-edited video?
Lo and behold: a person who had been speaking with a deadpan expression is now smiling the whole time, and someone in their 40s or 50s suddenly looks like a 20-something:
On the other side, a smiling, singing "Hermione" abruptly turns angry, and can even be morphed into the face of a young child:
Obama gets the same treatment: four versions of his facial state on demand, and his face can even be edited into a woman's:
No matter how the expressions and facial attributes change, none of these videos looks the least bit off, and the whole thing is buttery smooth.
Oh, and besides real people, faces in anime videos can be edited too:
Pretty striking, isn't it?
GAN-based video face editing
This model is from Tel Aviv University in Israel.
As is well known, the ability of GANs to encode rich semantics in their latent space has been widely exploited for face editing.
Applying this to video, however, remains challenging: high-quality datasets are scarce, and there is the fundamental obstacle of temporal coherency to overcome.
The researchers argue, though, that this second obstacle is largely artificial.
The source video is already temporally coherent; the edited video loses that coherence in part because certain components of the editing pipeline mishandle it.
The video face semantic-editing framework they propose marks a significant improvement over the current state of the art:
Using only a standard, non-temporal StyleGAN2, they analyze the different components of the GAN editing pipeline, identify which ones are already consistent, and build on those.
The whole process involves no extra machinery for enforcing temporal consistency.
The pipeline consists of six steps:
1. Split the input video into frames, then crop and align the face in each frame;
2. Use a pre-trained e4e encoder to invert each cropped face into the latent space of a pre-trained StyleGAN2;
3. Apply PTI (Pivotal Tuning Inversion, a recently proposed inversion technique) across all frames in parallel to fine-tune the generator, correcting errors in the initial inversions and restoring global consistency;
4. Edit all frames uniformly by manipulating their pivot latent codes linearly, with a fixed direction and step size;
5. Fine-tune the generator again to "stitch" the edited face back together with the original background;
6. Undo the alignment step and paste the modified face back into the video.
△ Note the heavy artifacts on the neck, which are fully repaired in the final stitching step
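The six steps above can be sketched as a data-flow skeleton. This is only an illustrative sketch: the function names (`crop_and_align`, `e4e_invert`, `edit_latent`, `pipeline`) are hypothetical stand-ins for the real e4e/StyleGAN2/PTI components, implemented here as NumPy stubs so the structure is runnable.

```python
import numpy as np

LATENT_DIM = 512  # StyleGAN2 latent dimensionality

def crop_and_align(frame):
    # Step 1 stand-in: crop the face region (stub: center square crop).
    h, w = frame.shape[:2]
    s = min(h, w)
    return frame[(h - s) // 2:(h + s) // 2, (w - s) // 2:(w + s) // 2]

def e4e_invert(face):
    # Step 2 stand-in: invert a face into a latent code
    # (stub: deterministic random vector keyed on the pixels).
    rng = np.random.default_rng(int(face.sum()) % 2**32)
    return rng.standard_normal(LATENT_DIM)

def edit_latent(w, direction, step=2.0):
    # Step 4: linear edit of a pivot latent with a fixed
    # direction and step size, identical for every frame.
    return w + step * direction

def pipeline(frames, direction):
    faces = [crop_and_align(f) for f in frames]   # step 1
    pivots = [e4e_invert(f) for f in faces]       # step 2
    # Steps 3 and 5 (PTI fine-tuning and stitching fine-tuning)
    # would update the generator weights here; omitted in this stub.
    return [edit_latent(w, direction) for w in pivots]  # step 4

frames = [np.zeros((8, 10, 3)) for _ in range(3)]
direction = np.ones(LATENT_DIM)
edited = pipeline(frames, direction)
print(len(edited), edited[0].shape)  # 3 (512,)
```

Because the direction and step size are fixed across frames (step 4), identical input faces map to identical edited latents, which is exactly why no extra temporal-smoothing machinery is needed.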
Comparison with SOTA models
How well does the model actually work? Let's compare:
The first example is de-aging; the second and third are aging.
It's clear that faces produced by the current SOTA model (Latent Transformer) and by PTI "twitch" and show artifacts, while the new model avoids these problems.
The researchers also ran temporal-consistency tests,
using two metrics:
Local temporal consistency (TL-ID), which evaluates the consistency of adjacent frame pairs via an off-the-shelf identity-detection network. The higher the TL-ID score, the smoother the method's output, with no noticeable local jitter.
Global temporal consistency (TG-ID), which uses the same identity-detection network to assess the similarity between all possible pairs of frames (not necessarily adjacent). A score of 1 indicates the method fully preserves the temporal consistency of the original video.
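The two metrics can be sketched in a few lines. This is a hedged toy version, not the paper's implementation: the identity-detection network is replaced by precomputed embedding vectors, and both scores are normalized by the original video's own consistency so that 1.0 means "as consistent as the source".

```python
import numpy as np
from itertools import combinations

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def tl_id(edit_embs, orig_embs):
    # Local consistency: mean identity similarity between ADJACENT
    # edited frames, divided by the same quantity on the source video.
    def adjacent(e):
        return np.mean([cosine(e[i], e[i + 1]) for i in range(len(e) - 1)])
    return adjacent(edit_embs) / adjacent(orig_embs)

def tg_id(edit_embs, orig_embs):
    # Global consistency: same idea, averaged over ALL frame pairs,
    # not just adjacent ones.
    def all_pairs(e):
        return np.mean([cosine(e[i], e[j])
                        for i, j in combinations(range(len(e)), 2)])
    return all_pairs(edit_embs) / all_pairs(orig_embs)

# Toy "identity embeddings": both videos drift slightly around one identity.
rng = np.random.default_rng(0)
base = rng.standard_normal(128)
orig = [base + 0.01 * rng.standard_normal(128) for _ in range(5)]
edit = [base + 0.01 * rng.standard_normal(128) for _ in range(5)]
print(round(tl_id(edit, orig), 3), round(tg_id(edit, orig), 3))
```

With such small per-frame drift, both scores come out very close to 1, i.e. the toy "edit" is about as temporally consistent as the toy source.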
The results are as follows:
As you can see, the new model comes out slightly ahead on both metrics.
Finally, the code is slated for release on February 14; interested readers, keep an eye out~