
Paper Reading: Identity Mappings in Deep Residual Networks (ResNetV2)

Contents

      • 1. Paper overview
      • 2. What changes when f becomes an identity mapping
      • 3. The importance of identity skip connections
      • 4. The effect of the activation function's position
      • 5. Two advantages of pre-activation
      • 6. Training and testing scales
      • References

1. Paper overview

This paper takes a closer look at the residual function and the identity mapping in ResNet and proposes an improved version, ResNetV2. In my view, the original ResNet-50 or ResNet-101 is good enough in most cases; ResNetV2 is mainly an improvement for extremely deep CNNs, say more than 100 layers and up to 1000 layers, where switching to ResNetV2 pays off.


The work in this paper has two main parts: (1) it studies the identity mapping h in the residual unit, replacing it with a 1×1 convolution, a gating function, constant scaling, and so on, and finds that doing nothing at all, i.e. h(x) = x, still works best; (2) it studies the position of the activation function f that is applied after h and F are added, and finds that moving f inside the residual branch F works better for very deep networks, which leads to ResNetV2.
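
For reference, the paper writes a residual unit in the notation used throughout this post (a sketch of the paper's formulation, where x_l is the input to the l-th unit, F is the residual function, h is the shortcut mapping, and f is the function applied after the addition):

```latex
% Residual unit as formulated in the paper:
\[
  y_l = h(x_l) + \mathcal{F}(x_l, \mathcal{W}_l), \qquad x_{l+1} = f(y_l)
\]
% Original ResNet: h(x_l) = x_l (identity shortcut) and f = ReLU.
% ResNetV2 asks what happens when f is also an identity mapping.
```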


To understand the role of skip connections, we analyze and compare various types of h(x_l). We find that the identity mapping h(x_l) = x_l chosen in [1] achieves the fastest error reduction and lowest training loss among all variants we investigated, whereas skip connections of scaling, gating [5,6,7], and 1×1 convolutions all lead to higher training loss and error. These experiments suggest that keeping a “clean” information path (indicated by the grey arrows in Fig. 1, 2, and 4) is helpful for easing optimization.

To construct an identity mapping f(y_l) = y_l, we view the activation functions (ReLU and BN [8]) as “pre-activation” of the weight layers, in contrast to conventional wisdom of “post-activation”. This point of view leads to a new residual unit design, shown in (Fig. 1(b)). Based on this unit, we present competitive results on CIFAR-10/100 with a 1001-layer ResNet, which is much easier to train and generalizes better than the original ResNet in [1]. We further report improved results on ImageNet using a 200-layer ResNet, for which the counterpart of [1] starts to overfit. These results suggest that there is much room to exploit the dimension of network depth, a key to the success of modern deep learning.

2. What changes when f becomes an identity mapping

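When both h and f are identity mappings, the signal and the gradient propagate directly between any two units. A sketch of the paper's derivation, using the notation above:

```latex
% Unrolling x_{l+1} = x_l + F(x_l, W_l) between a shallow unit l and a deep unit L:
\[
  x_L = x_l + \sum_{i=l}^{L-1} \mathcal{F}(x_i, \mathcal{W}_i)
\]
% Backpropagating a loss E then gives
\[
  \frac{\partial E}{\partial x_l}
    = \frac{\partial E}{\partial x_L}
      \left( 1 + \frac{\partial}{\partial x_l}
                 \sum_{i=l}^{L-1} \mathcal{F}(x_i, \mathcal{W}_i) \right)
\]
% The additive "1" means the gradient at x_L reaches any shallower x_l directly,
% so it is unlikely to vanish even when the network is extremely deep.
```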

3. The importance of identity skip connections

[Figure from the paper: various types of shortcut connections h used in the experiments]

The figure above shows the various modifications of h that were tried; the table below gives the experimental results, and the original identity shortcut still performs best.

[Table from the paper: classification error on CIFAR-10 for the different shortcut types]

As indicated by the grey arrows in Fig. 2, the shortcut connections are the most direct paths for the information to propagate. Multiplicative manipulations (scaling, gating, 1×1 convolutions, and dropout) on the shortcuts can hamper information propagation and lead to optimization problems.

It is noteworthy that the gating and 1×1 convolutional shortcuts introduce more parameters, and should have stronger representational abilities than identity shortcuts. In fact, the shortcut-only gating and 1×1 convolution cover the solution space of identity shortcuts (i.e., they could be optimized as identity shortcuts). However, their training error is higher than that of identity shortcuts, indicating that the degradation of these models is caused by optimization issues, instead of representational abilities. (In other words, the problem is not representational power; it is that existing optimization methods fail to optimize these variants as well.)
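
To make the optimization argument concrete, consider the constant-scaling shortcut h(x_l) = λ_l · x_l that the paper analyzes. Unrolling the recursion (a sketch; F̂ absorbs the scaling factors into the residual functions) shows that a product of the λ's appears on the shortcut path:

```latex
% With h(x_l) = lambda_l * x_l and f = identity:
\[
  x_L = \left( \prod_{i=l}^{L-1} \lambda_i \right) x_l
        + \sum_{i=l}^{L-1} \hat{\mathcal{F}}(x_i, \mathcal{W}_i)
\]
\[
  \frac{\partial E}{\partial x_l}
    = \frac{\partial E}{\partial x_L}
      \left( \prod_{i=l}^{L-1} \lambda_i
             + \frac{\partial}{\partial x_l}
               \sum_{i=l}^{L-1} \hat{\mathcal{F}}(x_i, \mathcal{W}_i) \right)
\]
% For a very deep network this product is exponentially large (all lambda_i > 1) or
% exponentially small (all lambda_i < 1), so the shortcut no longer offers a direct
% path and the signal is forced through the weight layers.
```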

4. The effect of the activation function's position

[Figure from the paper: different usages of activation, (a) original, (b) BN after addition, (c) ReLU before addition, (d) ReLU-only pre-activation, (e) full pre-activation]

(a) is the design used by the original ResNet; (b) placing BN after the addition changes the distribution of the signal on the shortcut path and obstructs propagation (see reference [2] for details); (c) placing ReLU before the addition forces the output of the residual branch to be non-negative, which limits its representational ability; (d) ReLU-only pre-activation performs about the same as the original; (e) full pre-activation is the ResNetV2 design proposed in this paper, which improves on the original when the network is extremely deep.

Note: both (d) and (e) are pre-activation designs.
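
A minimal PyTorch-style sketch of designs (a) and (e) (an illustration only, not the authors' code; equal channel widths and the absence of downsampling are simplifying assumptions):

```python
import torch
import torch.nn as nn

class OriginalUnit(nn.Module):
    """(a) Original ResNet unit: conv-BN-ReLU-conv-BN, addition, then ReLU (post-activation)."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(x + out)   # f = ReLU applied after the addition

class PreActUnit(nn.Module):
    """(e) Full pre-activation unit (ResNetV2): BN-ReLU-conv-BN-ReLU-conv, then addition."""
    def __init__(self, channels):
        super().__init__()
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.conv1(self.relu(self.bn1(x)))
        out = self.conv2(self.relu(self.bn2(out)))
        return x + out              # f = identity: nothing is applied after the addition
```

The only structural difference is where BN and ReLU sit: in the pre-activation unit nothing follows the addition, so the shortcut path stays "clean".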

5. Two advantages of pre-activation

We find the impact of pre-activation is twofold. First, the optimization is further eased (comparing with the baseline ResNet) because f is an identity mapping. Second, using BN as pre-activation improves regularization of the models.

For a more detailed explanation, see pages 11 and 12 of the paper. The regularization effect comes from where BN sits: in the original design the merged signal that leaves a unit is not normalized before it enters the next weight layer, whereas with full pre-activation the input to every weight layer has been normalized.

6. Training and testing scales

Table 5 shows the results of ResNet-152 [1] and ResNet-200, all trained from scratch. We notice that the original ResNet paper [1] trained the models using scale jittering with shorter side s ∈ [256, 480], and so the test of a 224×224 crop on s = 256 (as did in [1]) is negatively biased. Instead, we test a single 320×320 crop from s = 320, for all original and our ResNets. Even though the ResNets are trained on smaller crops, they can be easily tested on larger crops because the ResNets are fully convolutional by design. This size is also close to 299×299 used by Inception v3 [19], allowing a fairer comparison.
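
Because the network ends in global average pooling, it accepts larger test crops without any modification. A minimal sketch of this (assuming a recent torchvision, whose ResNet uses adaptive average pooling; not the authors' original code):

```python
import torch
from torchvision import models

model = models.resnet50()          # untrained stand-in; only the shapes matter here
model.eval()

# Test protocol from the paper: resize the shorter side to s = 320 and take a single
# 320x320 center crop; a random tensor stands in for that preprocessed crop.
x = torch.randn(1, 3, 320, 320)
with torch.no_grad():
    logits = model(x)              # larger than the 224x224 training crop, still works
print(logits.shape)                # torch.Size([1, 1000])
```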

References

[1] ResNetV2: Identity Mappings in Deep Residual Networks (paper reading notes)

[2] ResNetV2: an in-depth analysis of ResNet
