Xiao Cha, reporting from Aofei Temple | Official account QbitAI
Following GauGAN2, NVIDIA has launched PoE-GAN, a GAN that "stitches together" many kinds of input.
PoE-GAN accepts inputs in multiple modalities: text descriptions, image segmentation maps, sketches, and style reference images can all be converted into pictures.
It can also accept any combination of these input modalities at the same time, which is where the "PoE" in its name comes from.
PoE stands for "product of experts," a concept proposed by Hinton in 1999, in which each expert (an individual model) is defined as a probability distribution over the input space.
Each individual input modality is a constraint that the synthesized image must satisfy, so the set of images satisfying all constraints is the intersection of the constraint sets.
Assuming that each constraint's conditional distribution is Gaussian, the distribution over that intersection can be expressed as the product of the single-condition distributions.
Under this formulation, for the product distribution to have high density in a region, every individual distribution must have high density in that region, so every constraint is satisfied at once.
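The product-of-experts rule for Gaussians has a closed form: precisions (inverse variances) add, and the fused mean is the precision-weighted average of the expert means. The sketch below illustrates this for one-dimensional experts; the function name and example values are our own, not from the paper.

```python
def product_of_gaussian_experts(means, variances):
    """Combine independent Gaussian experts into a single Gaussian.

    The product of Gaussian densities is itself Gaussian:
    precisions (1/variance) add, and the mean is the
    precision-weighted average of the expert means.
    """
    precisions = [1.0 / v for v in variances]
    total_precision = sum(precisions)
    combined_var = 1.0 / total_precision
    combined_mean = combined_var * sum(p * m for p, m in zip(precisions, means))
    return combined_mean, combined_var

# Two experts: a sharp one centered at 0 and a broad one centered at 4.
mean, var = product_of_gaussian_experts([0.0, 4.0], [1.0, 4.0])
# Precisions 1.0 and 0.25 -> combined variance 0.8, combined mean 0.8.
```

Note how the sharper expert (smaller variance, larger precision) pulls the fused mean toward its own center, which is exactly the "intersection of constraints" intuition above.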
The core of PoE-GAN is how to blend the different inputs together.
Design of PoE-GAN
The PoE-GAN generator uses a Global PoE-Net to blend the different types of input.
Each modality is encoded into a feature vector, and the vectors are aggregated by the Global PoE-Net using a product of experts. The decoder not only consumes the Global PoE-Net's output, but also receives direct connections from the segmentation and sketch encoders when producing the output image.
The structure of the Global PoE-Net is as follows: a latent vector z is sampled from the product-of-experts distribution and then processed by an MLP to output a feature vector w.
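As a rough sketch of this fusion step, suppose each present modality's encoder outputs a Gaussian (mean and log-variance) over the latent space, and a standard-normal prior expert is always included so the model still works when modalities are missing. The PoE fuses them, a latent z is sampled by reparameterization, and an MLP maps z to w. All variable names and the single random linear layer standing in for the MLP are our own illustrative assumptions, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def poe_fuse(mus, logvars):
    """Fuse per-modality Gaussian latents (plus an N(0, I) prior expert)
    with a product of experts; returns the fused mean and variance."""
    # The prior expert keeps the product well-defined when modalities are absent.
    mus = [np.zeros_like(mus[0])] + list(mus)
    logvars = [np.zeros_like(logvars[0])] + list(logvars)
    precisions = [np.exp(-lv) for lv in logvars]
    var = 1.0 / sum(precisions)
    mu = var * sum(p * m for p, m in zip(precisions, mus))
    return mu, var

# Hypothetical encoder outputs for two present modalities (latent dim 8).
dim = 8
mus = [rng.normal(size=dim), rng.normal(size=dim)]
logvars = [rng.normal(size=dim), rng.normal(size=dim)]

mu, var = poe_fuse(mus, logvars)
z = mu + np.sqrt(var) * rng.normal(size=dim)  # reparameterized sample
W = rng.normal(size=(dim, dim))               # stand-in for the MLP
w = np.tanh(W @ z)                            # feature vector fed to the decoder
```

Dropping a modality simply removes its expert from the product, which is what lets a single generator handle any subset of inputs.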
In the discriminator section, the authors propose a multimodal projection discriminator that generalizes the projection discriminator to handle multiple conditional inputs.
Unlike a standard projection discriminator, which computes a single inner product between the image embedding and the conditional embedding, here an inner product is computed for each input modality and the results are summed to obtain the final loss.
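In a projection discriminator, the score is an unconditional term on the image plus an inner product with the condition's embedding; the multimodal version sums one such inner product per present modality. The sketch below shows only this scoring rule with made-up embeddings; the function name and numbers are illustrative, not from the paper.

```python
import numpy as np

def multimodal_projection_score(img_emb, cond_embs, unconditional_term):
    """Discriminator score: the unconditional image term plus the SUM of
    inner products between the image embedding and each present
    modality's conditional embedding (one product per modality)."""
    return unconditional_term + sum(float(img_emb @ c) for c in cond_embs)

img = np.array([1.0, 2.0, 0.5])                     # hypothetical image embedding
conds = [np.array([0.5, 0.0, 1.0]),                 # e.g. text embedding
         np.array([1.0, 1.0, 0.0])]                 # e.g. segmentation embedding
score = multimodal_projection_score(img, conds, unconditional_term=0.2)
# 0.2 + 1.0 + 3.0 = 4.2
```

Because absent modalities simply contribute nothing to the sum, the same discriminator scores any subset of conditions.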
A GAN whose inputs can be varied at will
PoE-GAN can generate images from single-modality input, multimodal input, or even no input at all.
When tested using a single input mode, PoE-GAN outperformed previous SOTA methods designed specifically for that mode.
For example, with segmentation input, PoE-GAN surpasses the earlier SPADE and OASIS models.
With text input, PoE-GAN outperforms the text-to-image models DF-GAN and DM-GAN+CL.
When conditioned on any subset of modalities, PoE-GAN can produce diverse output images. Below are random samples from PoE-GAN conditioned on pairs of modalities (text + segmentation, text + sketch, segmentation + sketch) on a landscape image dataset.
PoE-GAN can even take no input at all, in which case it becomes an unconditional generative model. Below are unconditional samples from PoE-GAN.
Team Introduction
The corresponding author of the paper is Ming-Yu Liu, a well-known NVIDIA researcher whose work focuses on deep generative models and their applications. Products such as NVIDIA Canvas and GauGAN were developed under his leadership.
The first author is Xun Huang, who earned his bachelor's degree from Beihang University (Beijing University of Aeronautics and Astronautics) and his Ph.D. from Cornell University, and now works at NVIDIA.
Paper: https://arxiv.org/abs/2112.05130
PoE: https://www.cs.toronto.edu/~hinton/absps/icann-99.pdf
Projection Discriminator: https://arxiv.org/abs/1802.05637