Nvidia launches "Super Stitch" GAN, which generates realistic photos from text and sketches

Xiao Cha, reporting from OuFei Temple | Qubit Report | Official account: QbitAI

Following GauGAN2, Nvidia has launched a "super stitch" of GANs: PoE-GAN.

PoE-GAN accepts inputs in a variety of modalities: text descriptions, segmentation maps, sketches, and style references can all be converted into images.

It can also accept any combination of two of these input modalities at the same time, which is where the "PoE" in its name comes in.

PoE refers to the "product of experts" concept that Hinton proposed in 2002, in which each expert (an individual model) is defined as a probabilistic model over the input space.

Each input modality is a constraint that the synthesized image must satisfy, so the set of images that satisfy all the constraints is the intersection of the sets that satisfy each individual constraint.

Assuming that the conditional probability distribution for each constraint is Gaussian, the distribution over the intersection is expressed as the product of the single-condition probability distributions.

Under these conditions, in order for the product distribution to have a high density in one region, each individual distribution needs to have a high density in that region to satisfy each constraint.
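
The product-of-experts idea described above can be illustrated with a minimal sketch, assuming one-dimensional Gaussian experts. The function names and the two example experts are hypothetical and are not taken from the paper; the sketch only shows that a product of Gaussians is again a Gaussian whose density concentrates where every expert has high density.

```python
import numpy as np

def product_of_gaussians(means, variances):
    """Combine 1-D Gaussian experts N(mean_i, var_i) into their normalized product."""
    means = np.asarray(means, dtype=float)
    precisions = 1.0 / np.asarray(variances, dtype=float)
    var = 1.0 / precisions.sum()             # precisions add up
    mean = var * (precisions * means).sum()  # precision-weighted mean
    return mean, var

def gaussian_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

# Two hypothetical experts, e.g. one for a text constraint and one for a sketch.
mean, var = product_of_gaussians([0.0, 2.0], [1.0, 1.0])

# The unnormalized product is larger where BOTH experts assign high density.
both_agree = gaussian_pdf(1.0, 0.0, 1.0) * gaussian_pdf(1.0, 2.0, 1.0)
one_agrees = gaussian_pdf(0.0, 0.0, 1.0) * gaussian_pdf(0.0, 2.0, 1.0)
print(mean, var, both_agree > one_agrees)
```

The combined mean lands between the two experts and the combined variance is smaller than either one, which is the sense in which each added constraint narrows the set of acceptable images.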

The focus of PoE-GAN is on how to blend the different inputs together.

Design of PoE-GAN

The PoE-GAN generator uses a global PoE-Net to blend the different types of input.

Each modal input is encoded as a feature vector, and these vectors are then aggregated in the global PoE-Net using PoE. The decoder uses not only the output of the global PoE-Net but also direct connections from the segmentation and sketch encoders to output the image.

The structure of the global PoE-Net is as follows: a latent feature vector z0 is sampled using PoE and then processed by an MLP to output the feature vector w.
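
The flow described above (combine the per-modality experts with PoE, sample z0, map it to w with an MLP) can be sketched roughly as follows. This is a simplified illustration and not the authors' implementation: the diagonal Gaussian experts, the standard-normal prior expert, the layer sizes, and every function name are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def poe_latent(experts, dim):
    """Product of diagonal Gaussian experts and an assumed N(0, I) prior expert.

    `experts` maps a modality name to (mu, logvar), or to None when that
    modality is absent; absent modalities simply drop out of the product.
    """
    precision = np.ones(dim)       # the prior contributes precision 1 ...
    weighted_mean = np.zeros(dim)  # ... and mean 0
    for params in experts.values():
        if params is None:
            continue
        mu, logvar = params
        p = np.exp(-logvar)        # precision = 1 / variance
        precision += p
        weighted_mean += p * mu
    var = 1.0 / precision
    return weighted_mean * var, var

def sample_z0(mu, var):
    """Reparameterized sample from N(mu, diag(var))."""
    return mu + np.sqrt(var) * rng.standard_normal(mu.shape)

def mlp(z, w1, b1, w2, b2):
    """Two-layer MLP with ReLU that maps z0 to the feature vector w."""
    hidden = np.maximum(z @ w1 + b1, 0.0)
    return hidden @ w2 + b2

dim = 4
experts = {  # hypothetical encoder outputs; segmentation and style are absent
    "text": (np.full(dim, 1.0), np.zeros(dim)),
    "segmentation": None,
    "sketch": (np.full(dim, -1.0), np.zeros(dim)),
    "style": None,
}
mu, var = poe_latent(experts, dim)
z0 = sample_z0(mu, var)
w1, b1 = rng.standard_normal((dim, 8)) * 0.1, np.zeros(8)
w2, b2 = rng.standard_normal((8, dim)) * 0.1, np.zeros(dim)
w = mlp(z0, w1, b1, w2, b2)
```

With every modality set to None, the product reduces to the prior alone, which matches the unconditional case in this sketch.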

In the discriminator section, the authors propose a multimodal projection discriminator that generalizes the projection discriminator to handle multiple conditional inputs.

A standard projection discriminator computes a single inner product between the image embedding and the condition embedding. Here, an inner product is computed for each input modality, and the results are summed to obtain the final loss.
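
The summation over modalities can be sketched as below. This is a minimal illustration under stated assumptions, not the paper's discriminator: the embeddings are plain vectors standing in for learned network outputs, the unconditional term `w_uncond` follows the usual projection-discriminator form, and the function name is hypothetical.

```python
import numpy as np

def multimodal_projection_logit(image_embed, cond_embeds, w_uncond, b_uncond=0.0):
    """Unconditional term plus one inner product per available modality.

    `cond_embeds` maps a modality name to its condition embedding, or to None
    when that modality was not provided for this sample.
    """
    logit = float(image_embed @ w_uncond + b_uncond)
    for embed in cond_embeds.values():
        if embed is not None:
            logit += float(image_embed @ embed)  # projection term for this modality
    return logit

# Hypothetical embeddings for one image conditioned on text and a sketch.
image_embed = np.array([1.0, 2.0, 0.5])
w_uncond = np.array([0.1, -0.2, 0.4])
cond_embeds = {
    "text": np.array([0.5, 0.0, 1.0]),
    "segmentation": None,
    "sketch": np.array([-1.0, 1.0, 0.0]),
}
score = multimodal_projection_logit(image_embed, cond_embeds, w_uncond)
print(score)
```

With a single modality present the function reduces to a standard projection discriminator, and with none present only the unconditional term remains.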

A GAN that switches inputs at will

PoE-GAN can generate images from single-modality inputs, multimodal inputs, or even no input at all.

When tested using a single input mode, PoE-GAN outperformed previous SOTA methods designed specifically for that mode.

For example, in the segmentation input mode, PoE-GAN is superior to the earlier SPADE and OASIS.

In the text input mode, PoE-GAN is superior to the text-to-image models DF-GAN and DM-GAN+CL.

When conditioned on any subset of the modalities, PoE-GAN can produce diverse output images. Random samples from PoE-GAN are shown below, conditioned on two modalities (text + segmentation, text + sketch, segmentation + sketch) on a landscape image dataset.

PoE-GAN can even run with no input, at which point it becomes an unconditional generative model. Below are samples generated unconditionally by PoE-GAN.

Team Introduction

The corresponding author of the paper is Ming-Yu Liu, a well-known engineer at NVIDIA whose research focuses on deep generative models and their applications. Interesting products such as NVIDIA Canvas and GauGAN all came from his work.

The first author is Xun Huang, who received his bachelor's degree from Beijing University of Aeronautics and Astronautics and his Ph.D. from Cornell University, and who now works at NVIDIA.

Paper: https://arxiv.org/abs/2112.05130

PoE: https://www.cs.toronto.edu/~hinton/absps/icann-99.pdf

Projection Discriminator: https://arxiv.org/abs/1802.05637