Article Outline

- 1. Paper Overview
- 2. How the design of VGG/ResNet differs from the Inception family
- 3. ResNeXt is not ensembling
- 4. Two design rules
- 5. The essence of split-transform-merge
- 6. Placement of BN and ReLU
1. Paper Overview
The network proposed in this paper is an updated version of ResNet. Its design combines VGG/ResNet's strategy of stacking building blocks with the split-transform-merge strategy of the Inception family. The "next" in ResNeXt refers to the new dimension the authors introduce: cardinality, which they argue is another dimension, besides depth and width, along which network performance can be improved. The experiments in the paper further show that increasing cardinality is more effective than increasing depth or width. Cardinality is simply the number of split branches. The authors implement the proposed block with grouped convolutions, and note that this is the first work to use grouped convolutions to improve network performance; at the same model complexity, ResNeXt outperforms ResNet.
Note: depth is the number of layers in the network; width is the number of channels of a feature map. ResNeXt 32×4d means a cardinality of 32, with each branch having a width of 4.
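The "same model complexity" claim is easy to sanity-check by counting weights in the two blocks of Fig. 1 of the paper. The sketch below uses the 256-d input width and the 32×4d widths stated above (biases and BN parameters are ignored for simplicity):

```python
# Weight counts for the two blocks in Fig. 1 of the ResNeXt paper,
# both operating on 256-d features (biases/BN parameters ignored).

def conv_params(c_in, c_out, k, groups=1):
    """Weight count of a k x k convolution with the given group count."""
    return k * k * (c_in // groups) * c_out

# ResNet bottleneck: 256 -> 64 (1x1) -> 64 (3x3) -> 256 (1x1)
resnet = (conv_params(256, 64, 1)
          + conv_params(64, 64, 3)
          + conv_params(64, 256, 1))

# ResNeXt 32x4d: 256 -> 128 (1x1) -> 128 (3x3, 32 groups) -> 256 (1x1)
resnext = (conv_params(256, 128, 1)
           + conv_params(128, 128, 3, groups=32)
           + conv_params(128, 256, 1))

print(resnet, resnext)  # 69632 70144: both are roughly 70k parameters
```

The grouped 3×3 convolution is what keeps the ResNeXt block cheap: its cost is divided by the group count, which pays for doubling the bottleneck width from 64 to 128.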
In this paper, we present a simple architecture which adopts VGG/ResNets' strategy of repeating layers, while exploiting the split-transform-merge strategy in an easy, extensible way. A module in our network performs a set of transformations, each on a low-dimensional embedding, whose outputs are aggregated by summation. We pursue a simple realization of this idea — the transformations to be aggregated are all of the same topology (e.g., Fig. 1 (right)). This design allows us to extend to any large number of transformations without specialized designs.
Interestingly, under this simplified situation we show that our model has two other equivalent forms (Fig. 3). The reformulation in Fig. 3(b) appears similar to the Inception-ResNet module [37] in that it concatenates multiple paths; but our module differs from all existing Inception modules in that all our paths share the same topology and thus the number of paths can be easily isolated as a factor to be investigated. In a more succinct reformulation, our module can be reshaped by Krizhevsky et al.'s grouped convolutions [24] (Fig. 3(c)), which, however, had been developed as an engineering compromise.
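The equivalence between aggregation by summation (Fig. 3(a)) and concatenation followed by one wide 1×1 convolution (Fig. 3(b)) is plain linear algebra, and can be checked numerically. The sketch below drops the spatial dimensions so each branch's final 1×1 convolution becomes an ordinary matrix multiply; the widths follow the 32×4d setting, and the random data is purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
cardinality, branch_width, out_width = 32, 4, 256

# Per-branch outputs (one feature vector per branch; spatial dims omitted)
ys = [rng.standard_normal(branch_width) for _ in range(cardinality)]
# Per-branch final 1x1 convolutions, viewed as (out_width x branch_width) matrices
Ws = [rng.standard_normal((out_width, branch_width)) for _ in range(cardinality)]

# Fig. 3(a): transform each branch, then aggregate by summation
sum_form = sum(W @ y for W, y in zip(Ws, ys))

# Fig. 3(b): concatenate all branches, then apply a single wide 1x1 convolution
concat_form = np.hstack(Ws) @ np.concatenate(ys)

print(np.allclose(sum_form, concat_form))  # True
```

Stacking the branch matrices side by side gives exactly the block structure that a grouped convolution (Fig. 3(c)) computes in one call, which is why the three forms are interchangeable.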
We empirically demonstrate that our aggregated transformations outperform the original ResNet module, even
under the restricted condition of maintaining computational
complexity and model size — e.g., Fig. 1(right) is designed
to keep the FLOPs complexity and number of parameters of
Fig. 1(left). We emphasize that while it is relatively easy to
increase accuracy by increasing capacity (going deeper or
wider), methods that increase accuracy while maintaining
(or reducing) complexity are rare in the literature.
2. How the design of VGG/ResNet differs from the Inception family
1. VGG and ResNet are designed by stacking blocks of the same type
The VGG-nets [36] exhibit a simple yet effective strategy of constructing very deep networks: stacking building blocks of the same shape. This strategy is inherited by ResNets [14] which stack modules of the same topology. This simple rule reduces the free choices of hyper-parameters, and depth is exposed as an essential dimension in neural networks. Moreover, we argue that the simplicity of this rule may reduce the risk of over-adapting the hyper-parameters to a specific dataset.
2. The Inception family is designed around split-transform-merge
Unlike VGG-nets, the family of Inception models [38,
17, 39, 37] have demonstrated that carefully designed
topologies are able to achieve compelling accuracy with low
theoretical complexity. The Inception models have evolved
over time [38, 39], but an important common property is
a split-transform-merge strategy. In an Inception module,
the input is split into a few lower-dimensional embeddings
(by 1×1 convolutions), transformed by a set of specialized
filters (3×3, 5×5, etc.), and merged by concatenation. It
can be shown that the solution space of this architecture is a
strict subspace of the solution space of a single large layer
(e.g., 5×5) operating on a high-dimensional embedding.
The split-transform-merge behavior of Inception modules
is expected to approach the representational power of large
and dense layers, but at a considerably lower computational
complexity.
The drawback of the Inception family: too many hyper-parameters, which have to be re-tuned whenever the dataset changes.
Despite good accuracy, the realization of Inception models has been accompanied with a series of complicating factors — the filter numbers and sizes are tailored for each individual transformation, and the modules are customized stage-by-stage. Although careful combinations of these components yield excellent neural network recipes, it is in general unclear how to adapt the Inception architectures to new datasets/tasks, especially when there are many factors and hyper-parameters to be designed.
Note: although the proposed ResNeXt looks somewhat like the Inception family, it is far simpler to design: it is nothing more than a stack of identical blocks.
3. ResNeXt is not ensembling
Averaging a set of independently trained networks is an effective solution to improving accuracy [24],
widely adopted in recognition competitions [33]. Veit et al. [40] interpret a single ResNet as an ensemble of shallower
networks, which results from ResNet’s additive behaviors
[15]. Our method harnesses additions to aggregate a set of
transformations. But we argue that it is imprecise to view
our method as ensembling, because the members to be aggregated are trained jointly, not independently.
The authors' explanation: the members aggregated in ResNeXt are trained jointly, not independently.
4. Two design rules
These blocks have the same topology, and are subject to two simple rules inspired by VGG/ResNets: (i) if producing spatial maps of the same size, the blocks share the same hyper-parameters (width and filter sizes), and (ii) each time when the spatial map is downsampled by a factor of 2, the width of the blocks is multiplied by a factor of 2. The second rule ensures that the computational complexity, in terms of FLOPs (floating-point operations, in # of multiply-adds), is roughly the same for all blocks.
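Rule (ii) can be verified with a one-line FLOPs count for a 3×3 convolution: halving each spatial dimension cuts the cost by 4, and doubling both the input and output widths multiplies it by 4. The stage sizes below are illustrative ImageNet-style values (56 → 28 → 14 → 7), chosen here for the check rather than quoted from this section:

```python
# Multiply-adds of a k x k convolution layer: H * W * C_in * C_out * k^2.
# Under rule (ii), downsampling by 2 while doubling the width leaves
# the per-layer cost unchanged.

def conv_flops(h, w, c_in, c_out, k=3):
    return h * w * c_in * c_out * k * k

stages = [(56, 64), (28, 128), (14, 256), (7, 512)]  # (spatial size, width)
flops = [conv_flops(s, s, c, c) for s, c in stages]
print(flops)  # the same value for every stage
```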
5. The essence of split-transform-merge
For background on low-dimensional embeddings, see the post 深度學習中 Embedding層兩大作用的個人理解 (a personal take on the two roles of the Embedding layer in deep learning).
6. Placement of BN and ReLU
Our models are realized by the form of Fig. 3(c). We perform batch normalization (BN) [17] right after the convolutions in Fig. 3(c).⁶ ReLU is performed right after each BN, except for the output of the block where ReLU is performed after the adding to the shortcut, following [14].
⁶ With BN, for the equivalent form in Fig. 3(a), BN is employed after aggregating the transformations and before adding to the shortcut.
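This ordering, BN right after each convolution, ReLU after each BN except the last, and the final ReLU applied only after the shortcut addition, can be sketched as a hypothetical PyTorch module for the Fig. 3(c) form (illustrative only, not the authors' code; widths follow the 256-d / 32×4d setting):

```python
import torch
import torch.nn as nn

class ResNeXtBlock(nn.Module):
    """ResNeXt block in the grouped-convolution form of Fig. 3(c)."""

    def __init__(self, channels=256, bottleneck=128, cardinality=32):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, bottleneck, 1, bias=False)
        self.bn1 = nn.BatchNorm2d(bottleneck)
        self.conv2 = nn.Conv2d(bottleneck, bottleneck, 3, padding=1,
                               groups=cardinality, bias=False)
        self.bn2 = nn.BatchNorm2d(bottleneck)
        self.conv3 = nn.Conv2d(bottleneck, channels, 1, bias=False)
        self.bn3 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))  # BN right after conv, then ReLU
        out = self.relu(self.bn2(self.conv2(out)))
        out = self.bn3(self.conv3(out))           # no ReLU here ...
        return self.relu(out + x)                 # ... ReLU after the shortcut add

x = torch.randn(1, 256, 8, 8)
y = ResNeXtBlock().eval()(x)
print(y.shape)  # torch.Size([1, 256, 8, 8])
```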
References
1. Aggregated Residual Transformations for Deep Neural Networks, arXiv, Nov. 2016
2. 【ResNext】《Aggregated Residual Transformations for Deep Neural Networks》