
Introduction to GANs (Part 1): A GAN in 50 Lines of Code (PyTorch)

This walks through a classic blog post that every GAN beginner should read. Someone has translated it before, but that translation reads worse than machine output, so I am going through it myself.


What is a GAN?

In 2014, Ian Goodfellow and his colleagues at the University of Montreal published a stunning paper introducing the world to GANs, or generative adversarial networks. Through an innovative combination of computational graphs and game theory they showed that, given enough modeling power, two models fighting against each other would be able to co-train through plain old backpropagation.

The models play two distinct (literally, adversarial) roles. Given some real data set R, G is the generator, trying to create fake data that looks just like the genuine data, while D is the discriminator, getting data from either the real set or G and labeling the difference. Goodfellow's metaphor (and a fine one it is) was that G was like a team of forgers trying to match real paintings with their output, while D was the team of detectives trying to tell the difference. (Except that in this case, the forgers G never get to see the original data, only the judgments of D. They're like blind forgers.)


In the ideal case, both D and G would get better over time until G had essentially become a "master forger" of the genuine article and D was at a loss, "unable to differentiate between the two distributions."

In practice, what Goodfellow had shown was that G would be able to perform a form of unsupervised learning on the original dataset, finding some way of representing that data in a (possibly) much lower-dimensional manner. And as Yann LeCun famously stated, unsupervised learning is the "cake" of true AI.

Training a GAN with PyTorch

This powerful technique seems like it must require a metric ton of code just to get started, right? Nope. Using PyTorch, we can actually create a very simple GAN in under 50 lines of code. There are really only 5 components to think about:

R: The original, genuine data set

I: The random noise that goes into the generator as a source of entropy

G: The generator, which tries to copy/mimic the original data set

D: The discriminator, which tries to tell apart G's output from R

The actual 'training' loop, where we teach G to trick D and D to beware of G.

1) R: In our case, we'll start with the simplest possible R: a bell curve. This function takes a mean and a standard deviation and returns a function which provides the right shape of sample data from a Gaussian with those parameters. In our sample code, we'll use a mean of 4.0 and a standard deviation of 1.25.

import numpy as np
import torch

def get_distribution_sampler(mu, sigma):
    return lambda n: torch.Tensor(np.random.normal(mu, sigma, (1, n)))  # Gaussian samples of shape (1, n)
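
As a quick sanity check (the variable name real_batch below is just illustrative), the sampler is used like this:

d_sampler = get_distribution_sampler(4.0, 1.25)
real_batch = d_sampler(100)  # torch.Tensor of shape (1, 100), drawn from N(4.0, 1.25**2)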
           

2) I: The input into the generator is also random, but to make our job a little bit harder, let's use a uniform distribution rather than a normal one. This means that our model G can't simply shift/scale the input to copy R, but has to reshape the data in a non-linear way.

def get_generator_input_sampler():
    return lambda m, n: torch.rand(m, n) # Uniform-dist data into generator, _NOT_ Gaussian
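
It is used the same way; minibatch_size and g_input_size here are the hyperparameters that appear later in the training loop:

gi_sampler = get_generator_input_sampler()
noise = gi_sampler(minibatch_size, g_input_size)  # uniform values in [0, 1), shape (minibatch_size, g_input_size)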
           

3) G: The generator is a standard feedforward graph: two hidden layers, three linear maps. We're using an ELU (exponential linear unit) activation. G is going to get the uniformly distributed data samples from I and somehow mimic the normally distributed samples from R.

import torch.nn as nn
import torch.nn.functional as F

class Generator(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(Generator, self).__init__()
        self.map1 = nn.Linear(input_size, hidden_size)
        self.map2 = nn.Linear(hidden_size, hidden_size)
        self.map3 = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        x = F.elu(self.map1(x))
        x = F.sigmoid(self.map2(x))
        return self.map3(x)
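
For this 1-D toy problem the networks can be tiny. Here is a hedged instantiation sketch; the concrete sizes are assumptions rather than something fixed by the class above:

g_input_size = 1    # dimensionality of the noise vector fed into G
g_hidden_size = 50  # width of G's hidden layers
g_output_size = 1   # each generated sample is a single scalar
G = Generator(input_size=g_input_size, hidden_size=g_hidden_size, output_size=g_output_size)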
           

4) D: The discriminator code is very similar to G's generator code: a feedforward graph with two hidden layers and three linear maps. It's going to get samples from either R or G and will output a single scalar between 0 and 1, interpreted as 'fake' vs. 'real'.

class Discriminator(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(Discriminator, self).__init__()
        self.map1 = nn.Linear(input_size, hidden_size)
        self.map2 = nn.Linear(hidden_size, hidden_size)
        self.map3 = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        x = F.elu(self.map1(x))
        x = F.elu(self.map2(x))
        return F.sigmoid(self.map3(x))  # squashed to (0, 1): the estimated probability that the input is real
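
A matching sketch for D. In the training loop below the generated minibatch is transposed with .t() before being handed to D, so D scores an entire row of samples at once; accordingly its input size is the number of samples per batch. The numbers and the identity preprocess are assumptions:

d_input_size = 100   # D looks at a whole minibatch of scalars as one row vector
d_hidden_size = 50
d_output_size = 1    # a single real-vs-fake score
D = Discriminator(input_size=d_input_size, hidden_size=d_hidden_size, output_size=d_output_size)

preprocess = lambda data: data  # identity here; the full script can plug in richer featurizations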
           

5) Finally, the training loop alternates between two modes: first training D on real data vs. fake data, with accurate labels; and then training G to fool D, with inaccurate labels.

for epoch in range(num_epochs):
    for d_index in range(d_steps):
        # 1. Train D on real+fake
        D.zero_grad()

        #  1A: Train D on real
        d_real_data = Variable(d_sampler(d_input_size))
        d_real_decision = D(preprocess(d_real_data))
        d_real_error = criterion(d_real_decision, Variable(torch.ones(1)))  # ones = true
        d_real_error.backward() # compute/store gradients, but don't change params

        #  1B: Train D on fake
        d_gen_input = Variable(gi_sampler(minibatch_size, g_input_size))
        d_fake_data = G(d_gen_input).detach()  # detach to avoid training G on these labels
        d_fake_decision = D(preprocess(d_fake_data.t()))
        d_fake_error = criterion(d_fake_decision, Variable(torch.zeros(1)))  # zeros = fake
        d_fake_error.backward()
        d_optimizer.step()     # Only optimizes D's parameters; changes based on stored gradients from backward()

    for g_index in range(g_steps):
        # 2. Train G on D's response (but DO NOT train D on these labels)
        G.zero_grad()

        gen_input = Variable(gi_sampler(minibatch_size, g_input_size))
        g_fake_data = G(gen_input)
        dg_fake_decision = D(preprocess(g_fake_data.t()))
        g_error = criterion(dg_fake_decision, Variable(torch.ones(1)))  # we want to fool, so pretend it's all genuine

        g_error.backward()
        g_optimizer.step() # Only optimizes G's parameters
           

Even if you haven't seen PyTorch before, you can probably tell what's going on. In the first section (training D), we push both types of data through D and apply a differentiable criterion to D's guesses versus the actual labels. That pushing is the 'forward' step; we then call backward() explicitly in order to calculate gradients, which are then used to update D's parameters in the d_optimizer.step() call. G is used but isn't trained here.

Then in the last section (training G), we do the same thing for G. Note that we also run G's output through D (we're essentially giving the forger a detective to practice on), but we do not optimize or change D at this step; we don't want the detective D to learn the wrong labels. Hence, we only call g_optimizer.step().

And… that's all. There's some other boilerplate code, but the GAN-specific stuff is just those 5 components, nothing else.
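
For completeness, here is a minimal sketch of what that boilerplate might look like. The hyperparameter values, the Adam optimizer, and the BCE loss below are assumptions consistent with the loop above, not copied from the original script; G, D, d_sampler, gi_sampler, and preprocess are the objects constructed earlier:

from torch import nn, optim
from torch.autograd import Variable  # legacy wrapper; in PyTorch >= 0.4 plain tensors work here

minibatch_size = 100
num_epochs, d_steps, g_steps = 5000, 20, 20
learning_rate = 2e-4

criterion = nn.BCELoss()  # binary cross-entropy between D's score and the 1/0 label
d_optimizer = optim.Adam(D.parameters(), lr=learning_rate)
g_optimizer = optim.Adam(G.parameters(), lr=learning_rate)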

After a few thousand rounds of this forbidden dance between D and G, what do we get? The discriminator D gets good very quickly (while G slowly moves up), but once it gets to a certain level of power, G has a worthy adversary and begins to improve. Really improve.

Over 20,000 training rounds, the mean of G's output overshoots 4.0 but then comes back in a fairly stable, correct range (left panel below). Likewise, the standard deviation initially drops in the wrong direction but then rises up to the desired 1.25 range (right panel), matching R.

[Figure: mean (left) and standard deviation (right) of G's output over 20,000 training rounds]
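
If you want to log curves like these yourself, one option (an illustrative addition, not part of the original 50 lines) is to record the statistics of a fresh batch of generated samples every few hundred epochs inside the training loop:

    if epoch % 200 == 0:
        with torch.no_grad():
            samples = G(gi_sampler(minibatch_size, g_input_size))
        print(epoch, samples.mean().item(), samples.std().item())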

OK, so the basic stats match R, eventually. How about the higher moments? Does the shape of the distribution look right? After all, you could certainly have a uniform distribution with a mean of 4.0 and a standard deviation of 1.25, but that wouldn't really match R. Let's show the final distribution emitted by G.

[Figure: the final distribution of samples emitted by G]

Not bad. The left tail is a bit longer than the right, but the skew and kurtosis are, shall we say, evocative of the original Gaussian.
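
Those higher moments can also be checked numerically rather than by eye. A small sketch using scipy (an extra dependency that the original code does not require):

from scipy.stats import kurtosis, skew

with torch.no_grad():
    final = G(gi_sampler(10000, g_input_size)).numpy().ravel()
print("skew:", skew(final), "excess kurtosis:", kurtosis(final))  # both are close to 0 for a true Gaussian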

G recovers the original distribution R nearly perfectly, and D is left cowering in the corner, mumbling to itself, unable to tell fact from fiction. This is precisely the behavior we want (see Figure 1 in Goodfellow). From fewer than 50 lines of code.

Goodfellow would go on to publish many other papers on GANs, including a 2016 gem describing some practical improvements, among them the mini-batch discrimination method adapted here. He also presented a 2-hour tutorial at NIPS 2016. For TensorFlow users, there is a parallel post from Aylien on GANs.

References:

1. Blog

2. Code

3. http://www.sohu.com/a/126742829_473283

4. http://www.pytorchtutorial.com/50-lines-of-codes-for-gan/#_GAN

5. https://blog.csdn.net/xjc864588399/article/details/56289591
