
Paper Reading: Identity Mappings in Deep Residual Networks (ResNetV2)

Contents

      • 1. Paper overview
      • 2. What changes when f becomes an identity mapping
      • 3. The importance of identity skip connections
      • 4. The effect of the activation function's position
      • 5. Two advantages of pre-activation
      • 6. Training and testing scales
      • References

1. Paper overview

This paper takes a closer look at the residual function and the identity mapping in ResNet and proposes an improved version, ResNetV2. In my view, the original ResNet-50 or ResNet-101 is good enough in most cases; ResNetV2 is mainly an improvement for extremely deep CNNs, say more than 100 layers and up to 1000 layers, where switching to ResNetV2 pays off.


The work in this paper has two main parts: (1) it studies the identity mapping h in the residual unit, replacing it with a 1×1 convolution, a gating function, constant scaling, and so on, and finds that doing nothing at all, i.e. h(x) = x, still works best; (2) it studies the position of the activation function f that is applied after h and F are added, and finds that moving f inside the residual branch F works better for very deep networks, which leads to ResNetV2.
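
For reference, the paper writes a residual unit in the notation used throughout this post (a sketch of the paper's formulation, where x_l is the input to the l-th unit, F is the residual function, h is the shortcut mapping, and f is the function applied after the addition):

```latex
% Residual unit as formulated in the paper:
\[
  y_l = h(x_l) + \mathcal{F}(x_l, \mathcal{W}_l), \qquad x_{l+1} = f(y_l)
\]
% Original ResNet: h(x_l) = x_l (identity shortcut) and f = ReLU.
% ResNetV2 asks what happens when f is also an identity mapping.
```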


To understand the role of skip connections, we analyze and compare various types of h(x_l). We find that the identity mapping h(x_l) = x_l chosen in [1] achieves the fastest error reduction and lowest training loss among all variants we investigated, whereas skip connections of scaling, gating [5,6,7], and 1×1 convolutions all lead to higher training loss and error. These experiments suggest that keeping a “clean” information path (indicated by the grey arrows in Fig. 1, 2, and 4) is helpful for easing optimization.

To construct an identity mapping f(y_l) = y_l, we view the activation functions (ReLU and BN [8]) as “pre-activation” of the weight layers, in contrast to conventional wisdom of “post-activation”. This point of view leads to a new residual unit design, shown in (Fig. 1(b)). Based on this unit, we present competitive results on CIFAR-10/100 with a 1001-layer ResNet, which is much easier to train and generalizes better than the original ResNet in [1]. We further report improved results on ImageNet using a 200-layer ResNet, for which the counterpart of [1] starts to overfit. These results suggest that there is much room to exploit the dimension of network depth, a key to the success of modern deep learning.

2. What changes when f becomes an identity mapping

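When both h and f are identity mappings, the signal and the gradient propagate directly between any two units. A sketch of the paper's derivation, using the notation above:

```latex
% Unrolling x_{l+1} = x_l + F(x_l, W_l) between a shallow unit l and a deep unit L:
\[
  x_L = x_l + \sum_{i=l}^{L-1} \mathcal{F}(x_i, \mathcal{W}_i)
\]
% Backpropagating a loss E then gives
\[
  \frac{\partial E}{\partial x_l}
    = \frac{\partial E}{\partial x_L}
      \left( 1 + \frac{\partial}{\partial x_l}
                 \sum_{i=l}^{L-1} \mathcal{F}(x_i, \mathcal{W}_i) \right)
\]
% The additive "1" means the gradient at x_L reaches any shallower x_l directly,
% so it is unlikely to vanish even when the network is extremely deep.
```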

3. The importance of identity skip connections

[Figure from the paper: various types of shortcut connections h used in the experiments]

The figure above shows the various modifications of h that were tried; the table below gives the experimental results, and the original identity shortcut still performs best.

[Table from the paper: classification error on CIFAR-10 for the different shortcut types]

As indicated by the grey arrows in Fig. 2, the shortcut connections are the most direct paths for the information to propagate. Multiplicative manipulations (scaling, gating, 1×1 convolutions, and dropout) on the shortcuts can hamper information propagation and lead to optimization problems.

It is noteworthy that the gating and 1×1 convolutional shortcuts introduce more parameters, and should have stronger representational abilities than identity shortcuts. In fact, the shortcut-only gating and 1×1 convolution cover the solution space of identity shortcuts (i.e., they could be optimized as identity shortcuts). However, their training error is higher than that of identity shortcuts, indicating that the degradation of these models is caused by optimization issues, instead of representational abilities. (In other words, the problem is not representational power; it is that existing optimization methods fail to optimize these variants as well.)
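
To make the optimization argument concrete, consider the constant-scaling shortcut h(x_l) = λ_l · x_l that the paper analyzes. Unrolling the recursion (a sketch; F̂ absorbs the scaling factors into the residual functions) shows that a product of the λ's appears on the shortcut path:

```latex
% With h(x_l) = lambda_l * x_l and f = identity:
\[
  x_L = \left( \prod_{i=l}^{L-1} \lambda_i \right) x_l
        + \sum_{i=l}^{L-1} \hat{\mathcal{F}}(x_i, \mathcal{W}_i)
\]
\[
  \frac{\partial E}{\partial x_l}
    = \frac{\partial E}{\partial x_L}
      \left( \prod_{i=l}^{L-1} \lambda_i
             + \frac{\partial}{\partial x_l}
               \sum_{i=l}^{L-1} \hat{\mathcal{F}}(x_i, \mathcal{W}_i) \right)
\]
% For a very deep network this product is exponentially large (all lambda_i > 1) or
% exponentially small (all lambda_i < 1), so the shortcut no longer offers a direct
% path and the signal is forced through the weight layers.
```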

4. The effect of the activation function's position

[Figure from the paper: different usages of activation, (a) original, (b) BN after addition, (c) ReLU before addition, (d) ReLU-only pre-activation, (e) full pre-activation]

(a) is the design used by the original ResNet; (b) placing BN after the addition changes the distribution of the signal on the shortcut path and obstructs propagation (see reference [2] for details); (c) placing ReLU before the addition forces the output of the residual branch to be non-negative, which limits its representational ability; (d) ReLU-only pre-activation performs about the same as the original; (e) full pre-activation is the ResNetV2 design proposed in this paper, which improves on the original when the network is extremely deep.

Note: both (d) and (e) are pre-activation designs.
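
A minimal PyTorch-style sketch of designs (a) and (e) (an illustration only, not the authors' code; equal channel widths and the absence of downsampling are simplifying assumptions):

```python
import torch
import torch.nn as nn

class OriginalUnit(nn.Module):
    """(a) Original ResNet unit: conv-BN-ReLU-conv-BN, addition, then ReLU (post-activation)."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(x + out)   # f = ReLU applied after the addition

class PreActUnit(nn.Module):
    """(e) Full pre-activation unit (ResNetV2): BN-ReLU-conv-BN-ReLU-conv, then addition."""
    def __init__(self, channels):
        super().__init__()
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.conv1(self.relu(self.bn1(x)))
        out = self.conv2(self.relu(self.bn2(out)))
        return x + out              # f = identity: nothing is applied after the addition
```

The only structural difference is where BN and ReLU sit: in the pre-activation unit nothing follows the addition, so the shortcut path stays "clean".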

5. Two advantages of pre-activation

We find the impact of pre-activation is twofold. First, the optimization is further eased (comparing with the baseline ResNet) because f is an identity mapping. Second, using BN as pre-activation improves regularization of the models.

For a more detailed explanation, see pages 11 and 12 of the paper. The regularization effect comes from where BN sits: in the original design the merged signal that leaves a unit is not normalized before it enters the next weight layer, whereas with full pre-activation the input to every weight layer has been normalized.

6. Training and testing scales

Table 5 shows the results of ResNet-152 [1] and ResNet-200, all trained from scratch. We notice that the original ResNet paper [1] trained the models using scale jittering with shorter side s ∈ [256, 480], and so the test of a 224×224 crop on s = 256 (as did in [1]) is negatively biased. Instead, we test a single 320×320 crop from s = 320, for all original and our ResNets. Even though the ResNets are trained on smaller crops, they can be easily tested on larger crops because the ResNets are fully convolutional by design. This size is also close to 299×299 used by Inception v3 [19], allowing a fairer comparison.
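
Because the network ends in global average pooling, it accepts larger test crops without any modification. A minimal sketch of this (assuming a recent torchvision, whose ResNet uses adaptive average pooling; not the authors' original code):

```python
import torch
from torchvision import models

model = models.resnet50()          # untrained stand-in; only the shapes matter here
model.eval()

# Test protocol from the paper: resize the shorter side to s = 320 and take a single
# 320x320 center crop; a random tensor stands in for that preprocessed crop.
x = torch.randn(1, 3, 320, 320)
with torch.no_grad():
    logits = model(x)              # larger than the 224x224 training crop, still works
print(logits.shape)                # torch.Size([1, 1000])
```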

References

[1] ResNetV2: Identity Mappings in Deep Residual Networks (paper reading notes)

[2] ResNetV2: an in-depth analysis of ResNet
