
Paper Reading: Aggregated Residual Transformations for Deep Neural Networks (ResNeXt)

Table of Contents

      • 1. Paper Overview
      • 2. Design Differences Between VGG/ResNet and the Inception Family
      • 3. ResNeXt Is Not Ensembling
      • 4. Two Design Rules
      • 5. The Essence of split-transform-merge
      • 6. Placement of BN and ReLU

1. Paper Overview

The network proposed in this paper is an upgraded version of ResNet. Its design combines the VGG/ResNet idea of stacking repeated building blocks with the split-transform-merge strategy of the Inception family. The "next" in ResNeXt refers to the new dimension introduced in this paper: cardinality, which the authors argue is another dimension, besides depth and width, along which network performance can be improved. The experiments further show that increasing cardinality is more effective than increasing depth or width. Cardinality is simply the number of split branches. The authors implement the proposed block with grouped convolutions, noting that grouped convolutions had previously served mainly as an engineering compromise, with little evidence of using them to improve accuracy. At the same model complexity, ResNeXt outperforms ResNet.

Note: depth refers to the number of layers of the network, and width refers to the number of channels of a feature map; ResNeXt 32×4d means a cardinality of 32, with each path having a width of 4.
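To make the 32×4d configuration concrete, here is a minimal PyTorch sketch of a ResNeXt-style bottleneck block (my own illustration, not the authors' reference code; the class and parameter names are hypothetical). The channel sizes follow the 256-d block discussed in the paper: a 1×1 reduction to 32×4 = 128 channels, a 3×3 grouped convolution with 32 groups, and a 1×1 expansion back to 256 channels.

```python
import torch
import torch.nn as nn

class ResNeXtBottleneck(nn.Module):
    """Illustrative ResNeXt block: 1x1 reduce -> 3x3 grouped conv -> 1x1 expand.

    Assumes in_channels == out_channels so the identity shortcut applies directly.
    """
    def __init__(self, in_channels=256, cardinality=32, width=4, out_channels=256):
        super().__init__()
        inner = cardinality * width  # 32 * 4 = 128 channels shared by all paths
        self.conv1 = nn.Conv2d(in_channels, inner, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(inner)
        # The 32 parallel paths are realized as a single grouped convolution.
        self.conv2 = nn.Conv2d(inner, inner, kernel_size=3, padding=1,
                               groups=cardinality, bias=False)
        self.bn2 = nn.BatchNorm2d(inner)
        self.conv3 = nn.Conv2d(inner, out_channels, kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        identity = x
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.relu(self.bn2(self.conv2(out)))
        out = self.bn3(self.conv3(out))
        return self.relu(out + identity)  # ReLU after adding the shortcut

x = torch.randn(2, 256, 56, 56)
print(ResNeXtBottleneck()(x).shape)  # torch.Size([2, 256, 56, 56])
```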


From the paper's introduction:

In this paper, we present a simple architecture which adopts VGG/ResNets' strategy of repeating layers, while exploiting the split-transform-merge strategy in an easy, extensible way. A module in our network performs a set of transformations, each on a low-dimensional embedding, whose outputs are aggregated by summation. We pursuit a simple realization of this idea — the transformations to be aggregated are all of the same topology (e.g., Fig. 1 (right)). This design allows us to extend to any large number of transformations without specialized designs.

Interestingly, under this simplified situation we show that our model has two other equivalent forms (Fig. 3). The reformulation in Fig. 3(b) appears similar to the Inception-ResNet module [37] in that it concatenates multiple paths; but our module differs from all existing Inception modules in that all our paths share the same topology and thus the number of paths can be easily isolated as a factor to be investigated. In a more succinct reformulation, our module can be reshaped by Krizhevsky et al.'s grouped convolutions [24] (Fig. 3(c)), which, however, had been developed as an engineering compromise.
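To see why the grouped-convolution reformulation of Fig. 3(c) is exact, the short PyTorch check below (my own sanity check, not code from the paper) verifies that a single grouped 3×3 convolution produces the same output as splitting the input into 32 channel groups, convolving each group with the corresponding slice of the weight tensor, and concatenating the results.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
channels, groups = 128, 32          # 32 paths, each 4 channels wide
x = torch.randn(1, channels, 8, 8)

# Form (c): one grouped 3x3 convolution over all 128 channels.
grouped = nn.Conv2d(channels, channels, kernel_size=3, padding=1,
                    groups=groups, bias=False)
y_grouped = grouped(x)

# Equivalent multi-path view: split the input into 32 channel groups, convolve
# each group with its slice of the weight tensor, and concatenate the outputs.
w = grouped.weight                  # shape: (128, 128 // 32, 3, 3)
per_group_out = channels // groups
outs = []
for g, xg in enumerate(x.chunk(groups, dim=1)):
    wg = w[g * per_group_out:(g + 1) * per_group_out]
    outs.append(F.conv2d(xg, wg, padding=1))
y_paths = torch.cat(outs, dim=1)

print(torch.allclose(y_grouped, y_paths, atol=1e-5))  # True
```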

We empirically demonstrate that our aggregated transformations outperform the original ResNet module, even under the restricted condition of maintaining computational complexity and model size — e.g., Fig. 1(right) is designed to keep the FLOPs complexity and number of parameters of Fig. 1(left). We emphasize that while it is relatively easy to increase accuracy by increasing capacity (going deeper or wider), methods that increase accuracy while maintaining (or reducing) complexity are rare in the literature.

2. Design Differences Between VGG/ResNet and the Inception Family

1. VGG and ResNet are designed by stacking blocks of the same type

The VGG-nets [36] exhibit a simple yet effective strategy of constructing very deep networks: stacking building blocks of the same shape. This strategy is inherited by ResNets [14] which stack modules of the same topology. This simple rule reduces the free choices of hyper-parameters, and depth is exposed as an essential dimension in neural networks. Moreover, we argue that the simplicity of this rule may reduce the risk of over-adapting the hyper-parameters to a specific dataset.

2. The Inception family is designed around split-transform-merge

Unlike VGG-nets, the family of Inception models [38, 17, 39, 37] have demonstrated that carefully designed topologies are able to achieve compelling accuracy with low theoretical complexity. The Inception models have evolved over time [38, 39], but an important common property is a split-transform-merge strategy. In an Inception module, the input is split into a few lower-dimensional embeddings (by 1×1 convolutions), transformed by a set of specialized filters (3×3, 5×5, etc.), and merged by concatenation. It can be shown that the solution space of this architecture is a strict subspace of the solution space of a single large layer (e.g., 5×5) operating on a high-dimensional embedding. The split-transform-merge behavior of Inception modules is expected to approach the representational power of large and dense layers, but at a considerably lower computational complexity.
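For comparison with the ResNeXt block above, here is a toy split-transform-merge module in the Inception spirit (a simplified sketch with made-up channel numbers, not a faithful Inception module): 1×1 convolutions split the input into low-dimensional embeddings, branch-specific 3×3/5×5 filters transform them, and concatenation merges the results.

```python
import torch
import torch.nn as nn

class ToySplitTransformMerge(nn.Module):
    """Toy Inception-style module; channel numbers are made up for illustration."""
    def __init__(self, in_channels=64):
        super().__init__()
        # Split: 1x1 convolutions project the input into low-dimensional embeddings.
        self.reduce3 = nn.Conv2d(in_channels, 16, kernel_size=1)
        self.reduce5 = nn.Conv2d(in_channels, 16, kernel_size=1)
        # Transform: each branch uses its own specialized filter size.
        self.conv3 = nn.Conv2d(16, 32, kernel_size=3, padding=1)
        self.conv5 = nn.Conv2d(16, 32, kernel_size=5, padding=2)

    def forward(self, x):
        branch3 = self.conv3(self.reduce3(x))
        branch5 = self.conv5(self.reduce5(x))
        # Merge: concatenate the branch outputs along the channel dimension.
        return torch.cat([branch3, branch5], dim=1)
```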

The drawback of the Inception family is described below: there are too many hyper-parameters, which have to be re-tuned when moving to a new dataset.

Despite good accuracy, the realization of Inception models has been accompanied with a series of complicating factors — the filter numbers and sizes are tailored for each individual transformation, and the modules are customized stage-by-stage. Although careful combinations of these components yield excellent neural network recipes, it is in general unclear how to adapt the Inception architectures to new datasets/tasks, especially when there are many factors and hyper-parameters to be designed.

Note: although the proposed ResNeXt looks somewhat similar to the Inception family, it is much simpler to design, since it is nothing more than a stack of identical blocks.

3. ResNeXt Is Not Ensembling

Averaging a set of independently trained networks is an effective solution to improving accuracy [24], widely adopted in recognition competitions [33]. Veit et al. [40] interpret a single ResNet as an ensemble of shallower networks, which results from ResNet's additive behaviors [15]. Our method harnesses additions to aggregate a set of transformations. But we argue that it is imprecise to view our method as ensembling, because the members to be aggregated are trained jointly, not independently.

The authors' explanation: the members aggregated in ResNeXt are trained jointly, not separately and independently.

4. Two Design Rules

These blocks have the same topology, and are subject to two simple rules inspired by VGG/ResNets: (i) if producing spatial maps of the same size, the blocks share the same hyper-parameters (width and filter sizes), and (ii) each time when the spatial map is downsampled by a factor of 2, the width of the blocks is multiplied by a factor of 2. The second rule ensures that the computational complexity, in terms of FLOPs (floating-point operations, in # of multiply-adds), is roughly the same for all blocks.
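A quick sanity check of rule (ii), using the standard FLOPs estimate for a k×k convolution on an H×W feature map (this arithmetic is my own addition, not spelled out in the paper): halving the spatial size divides the cost by 4, while doubling both the input and output widths multiplies it by 4, so the per-block cost is unchanged.

$$\frac{H}{2}\cdot\frac{W}{2}\cdot(2C_{\text{in}})\cdot(2C_{\text{out}})\cdot k^2 \;=\; H\cdot W\cdot C_{\text{in}}\cdot C_{\text{out}}\cdot k^2$$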


5. The Essence of split-transform-merge

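As a brief summary of the paper's argument (the equations below are taken from the paper; the wording is mine): even a simple neuron already performs split-transform-merge. The input x is split into scalar components x_i, each component is transformed by a scalar weight w_i, and the results are merged by summation. ResNeXt keeps this aggregation structure but replaces the elementary transform w_i x_i with a more general function T_i(x), summed over C paths, where C is the cardinality, on top of the residual shortcut:

$$\sum_{i=1}^{D} w_i x_i \;\;\longrightarrow\;\; \mathcal{F}(\mathbf{x}) = \sum_{i=1}^{C} \mathcal{T}_i(\mathbf{x}), \qquad \mathbf{y} = \mathbf{x} + \sum_{i=1}^{C} \mathcal{T}_i(\mathbf{x})$$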

For more on how to understand a low-dimensional embedding, see the separate post: 深度學習中 Embedding層兩大作用的個人了解 (a personal take on the two roles of embedding layers in deep learning).

6. Placement of BN and ReLU

Our models are realized by the form of Fig. 3(c). We perform batch normalization (BN) [17] right after the convolutions in Fig. 3(c). ReLU is performed right after each BN, except for the output of the block where ReLU is performed after the adding to the shortcut, following [14].

Footnote 6: With BN, for the equivalent form in Fig. 3(a), BN is employed after aggregating the transformations and before adding to the shortcut.
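In code, the ordering described above matches the forward pass of the bottleneck sketch in Section 1 (repeated here in compressed form for emphasis; the attribute names are my own):

```python
import torch.nn.functional as F

def forward(self, x):
    out = F.relu(self.bn1(self.conv1(x)))    # conv -> BN -> ReLU
    out = F.relu(self.bn2(self.conv2(out)))  # conv -> BN -> ReLU (grouped 3x3)
    out = self.bn3(self.conv3(out))          # conv -> BN, no ReLU here
    return F.relu(out + x)                   # final ReLU after adding the shortcut
```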

References

1. Aggregated Residual Transformations for Deep Neural Networks, arXiv, Nov. 2016

2. 【ResNext】《Aggregated Residual Transformations for Deep Neural Networks》 (blog post)
