
【EfficientNet】《EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks》


ICML-2019


Table of Contents

  • 1 Background and Motivation
  • 2 Related Work
  • 3 Advantages / Contributions
  • 4 Compound Model Scaling
    • 4.1 Problem Formulation
    • 4.2 Scaling Dimensions
    • 4.3 Compound Scaling
  • 5 EfficientNet Architecture
  • 6 Experiments
    • 6.1 Datasets
    • 6.2 Experiments on ImageNet
    • 6.3 Transfer Learning Results for EfficientNet
  • 7 Conclusion(own)

1 Background and Motivation

Scaling up ConvNets is widely used to achieve better accuracy.

Common approaches include:

1) scaling up network depth (e.g., ResNet-50 to ResNet-101),

2) scaling up network width (e.g., ResNet-50 to Wide ResNet-50),

and, less commonly,

3) scaling up the input resolution.

[Figure: illustration of width scaling, depth scaling, resolution scaling, and compound scaling]

Is there an intrinsic relationship among these three dimensions? And how should they be tuned jointly to maximize the accuracy gain?

The authors are the first to empirically quantify the relationship among all three dimensions of network width, depth, and resolution, and use it to improve model accuracy efficiently.

2 Related Work

  • ConvNet Accuracy
  • ConvNet Efficiency: lightweight networks
  • Model Scaling: width, depth, and resolution

3 Advantages / Contributions


Following MNASNet, the authors use AutoML (neural architecture search) to obtain the baseline EfficientNet-B0, then compound-scale it along width, depth, and resolution to form EfficientNet-Bx models of different sizes. This achieves SOTA on ImageNet with far fewer parameters, and generalizes well across transfer datasets (SOTA on 5 of 8).

4 Compound Model Scaling

Core idea: instead of arbitrarily scaling a single dimension, scale network width, depth, and resolution together with a fixed set of coefficients (compound scaling).

4.1 Problem Formulation

A ConvNet $N$ can be represented as a composition of stacked layers $F_j$:

$$N = F_k \odot \cdots \odot F_2 \odot F_1(X_1) = \bigodot_{j=1 \ldots k} F_j(X_1)$$
  • $X_1$ is the input tensor
  • $F_j$ is an operator (e.g., convolution + activation), where $j$ indexes layer $j$

Written more modularly, in terms of stages:

$$N = \bigodot_{i=1 \ldots s} F_i^{L_i}\big(X_{\langle H_i, W_i, C_i \rangle}\big)$$
  • $F_i^{L_i}$ means that layer $F_i$ is repeated $L_i$ times in stage $i$; $\langle H_i, W_i, C_i \rangle$ is the shape of the input tensor of stage $i$ (see the sketch below)
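As a purely illustrative reading of this notation, the sketch below composes a network from per-stage repeated operators; the stage list and toy operators are hypothetical placeholders, not the EfficientNet-B0 definition:

```python
from dataclasses import dataclass
from typing import Callable, List

# A stage repeats the same operator F_i exactly L_i times on its input tensor.
@dataclass
class Stage:
    op: Callable   # F_i: the layer/operator of this stage
    repeats: int   # L_i: how many times F_i is applied

def run_network(stages: List[Stage], x):
    """N = ⊙_i F_i^{L_i}(X): apply each stage's operator L_i times in sequence."""
    for stage in stages:
        for _ in range(stage.repeats):
            x = stage.op(x)
    return x

# Toy stand-ins for "conv + activation" operators (not real layers).
double = lambda x: x * 2
inc = lambda x: x + 1

toy_net = [Stage(op=double, repeats=2), Stage(op=inc, repeats=3)]
print(run_network(toy_net, 1))  # ((1*2)*2) + 1 + 1 + 1 = 7
```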

Scaling the network to maximize accuracy can then be formulated as the optimization problem:

$$\begin{aligned}
\max_{d, w, r} \quad & \text{Accuracy}\big(N(d, w, r)\big) \\
\text{s.t.} \quad & N(d, w, r) = \bigodot_{i=1 \ldots s} \hat{F}_i^{\,d \cdot \hat{L}_i}\big(X_{\langle r \cdot \hat{H}_i,\, r \cdot \hat{W}_i,\, w \cdot \hat{C}_i \rangle}\big) \\
& \text{Memory}(N) \leq \text{target\_memory} \\
& \text{FLOPS}(N) \leq \text{target\_flops}
\end{aligned}$$

where $d$, $w$, $r$ are coefficients for scaling network depth, width, and resolution, and $\hat{F}_i$, $\hat{L}_i$, $\hat{H}_i$, $\hat{W}_i$, $\hat{C}_i$ are the predefined parameters of the baseline network.
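To make the role of $d$, $w$, $r$ concrete, here is a small sketch that rescales each stage's repeat count, channel count, and input resolution by the three coefficients; the baseline stage table is made up for illustration (it is not EfficientNet-B0's), and the rounding policy is an assumption:

```python
import math

# Hypothetical baseline: (L_i, H_i, W_i, C_i) per stage -- illustrative numbers only.
BASELINE_STAGES = [
    {"repeats": 1, "h": 112, "w": 112, "channels": 16},
    {"repeats": 2, "h": 56,  "w": 56,  "channels": 24},
    {"repeats": 3, "h": 28,  "w": 28,  "channels": 40},
]

def scale_config(d: float, w: float, r: float):
    """Apply depth (d), width (w), and resolution (r) coefficients to every stage."""
    scaled = []
    for s in BASELINE_STAGES:
        scaled.append({
            "repeats": int(math.ceil(d * s["repeats"])),   # d * L_i
            "h": int(round(r * s["h"])),                   # r * H_i
            "w": int(round(r * s["w"])),                   # r * W_i
            "channels": int(round(w * s["channels"])),     # w * C_i
        })
    return scaled

print(scale_config(d=1.2, w=1.1, r=1.15))
```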

4.2 Scaling Dimensions

[Figure: ImageNet top-1 accuracy when scaling up a baseline network with different width (w), depth (d), and resolution (r) coefficients]

Scaling up any dimension of network width, depth, or resolution improves accuracy, but the accuracy gain diminishes for bigger models.

1) Scaling depth

Advantage: captures richer and more complex features

Drawback: harder to train due to the vanishing gradient problem; the accuracy gain diminishes for very deep ConvNets (accuracy saturates beyond a certain depth)

2) Scaling width

Here, width means the number of channels.

Advantage: wider networks tend to capture more fine-grained features and are easier to train

Drawback: very wide but shallow networks have difficulty capturing higher-level features

3) Scaling resolution

Advantage: potentially captures more fine-grained patterns

Drawback: the accuracy gain diminishes at very high resolutions

4.3 Compound Scaling

[Figure: ImageNet accuracy vs. FLOPS when scaling network width for baselines with different depth (d) and resolution (r) coefficients]

In the figure above, each curve scales up width for a baseline with fixed depth and resolution coefficients; the curve that also increases both depth and resolution achieves the best accuracy at the same FLOPS cost.

In order to pursue better accuracy and efficiency, it is critical to balance all dimensions of network width, depth, and resolution during ConvNet scaling

Based on the observations highlighted in Sections 4.2 and 4.3, the authors propose the following compound scaling method:

$$\begin{aligned}
\text{depth: } & d = \alpha^{\phi} \\
\text{width: } & w = \beta^{\phi} \\
\text{resolution: } & r = \gamma^{\phi} \\
\text{s.t. } & \alpha \cdot \beta^2 \cdot \gamma^2 \approx 2, \quad \alpha \geq 1,\ \beta \geq 1,\ \gamma \geq 1
\end{aligned}$$
  • $\alpha$, $\beta$, $\gamma$ are constants determined by a small grid search
  • $\phi$ is a user-specified coefficient that controls how many more resources are available for model scaling

Why does the constraint use $\alpha$ to the first power, but $\beta^2$ and $\gamma^2$ for width and resolution?

doubling network depth will double FLOPS, but doubling network width or resolution will increase FLOPS by four times
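A quick sanity check of that claim, using the standard FLOPS count of a single convolution layer ($k^2 \cdot C_{in} \cdot C_{out} \cdot H \cdot W$); the layer shape below is an arbitrary example, not taken from the paper:

```python
def conv_flops(depth, width_mult, res_mult,
               k=3, c_in=64, c_out=64, h=56, w=56):
    """Approximate FLOPS of `depth` identical conv layers after scaling
    channels by width_mult and spatial resolution by res_mult."""
    cin = c_in * width_mult
    cout = c_out * width_mult
    hh, ww = h * res_mult, w * res_mult
    return depth * (k * k * cin * cout * hh * ww)

base = conv_flops(depth=1, width_mult=1, res_mult=1)
print(conv_flops(depth=2, width_mult=1, res_mult=1) / base)  # 2.0 -> doubling depth doubles FLOPS
print(conv_flops(depth=1, width_mult=2, res_mult=1) / base)  # 4.0 -> doubling width quadruples FLOPS
print(conv_flops(depth=1, width_mult=1, res_mult=2) / base)  # 4.0 -> doubling resolution quadruples FLOPS
```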

With the authors' compound scaling, the total FLOPS therefore grow by a factor of

$$\big(\alpha \cdot \beta^2 \cdot \gamma^2\big)^{\phi} \approx 2^{\phi}$$
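Putting the pieces together, here is a minimal sketch of how $\phi$ expands a model under the paper's constants; the baseline depth/resolution numbers and the rounding policy are assumptions for illustration, not the official EfficientNet scaling code:

```python
import math

# Grid-searched constants from the paper (found with phi = 1).
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15

def compound_scale(phi, base_depth=16, base_width=1.0, base_resolution=224):
    d = ALPHA ** phi   # depth multiplier
    w = BETA ** phi    # width (channel) multiplier
    r = GAMMA ** phi   # resolution multiplier
    flops_factor = (ALPHA * BETA**2 * GAMMA**2) ** phi  # ~2**phi by the constraint
    return {
        "depth_layers": math.ceil(base_depth * d),
        "width_multiplier": round(base_width * w, 3),
        "resolution": int(round(base_resolution * r)),
        "flops_factor": round(flops_factor, 3),
    }

if __name__ == "__main__":
    for phi in range(0, 4):
        print(phi, compound_scale(phi))
```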

5 EfficientNet Architecture

The baseline network EfficientNet-B0 is found via AutoML, using the same search as MNASNet, except that "we optimize FLOPS rather than latency since we are not targeting any specific hardware device."

[Table: EfficientNet-B0 baseline architecture (stages, operators, resolutions, channels, layers)]

MBConv is the inverted bottleneck block from MobileNetV2, to which the paper also adds squeeze-and-excitation.
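For reference, a compact PyTorch sketch of an MBConv (inverted bottleneck) block: 1x1 expansion, depthwise convolution, linear 1x1 projection, and a residual connection. It omits the squeeze-and-excitation and drop-connect pieces, uses the SiLU/Swish activation, and the default expansion ratio and kernel size are assumptions rather than EfficientNet-B0's exact settings:

```python
import torch
import torch.nn as nn

class MBConv(nn.Module):
    """Inverted bottleneck: 1x1 expand -> depthwise conv -> 1x1 project."""
    def __init__(self, c_in, c_out, expand_ratio=6, kernel_size=3, stride=1):
        super().__init__()
        c_mid = c_in * expand_ratio
        self.use_residual = (stride == 1 and c_in == c_out)
        layers = []
        if expand_ratio != 1:  # pointwise expansion
            layers += [nn.Conv2d(c_in, c_mid, 1, bias=False),
                       nn.BatchNorm2d(c_mid), nn.SiLU()]
        # depthwise convolution (groups == channels)
        layers += [nn.Conv2d(c_mid, c_mid, kernel_size, stride,
                             padding=kernel_size // 2, groups=c_mid, bias=False),
                   nn.BatchNorm2d(c_mid), nn.SiLU()]
        # pointwise projection (linear, no activation)
        layers += [nn.Conv2d(c_mid, c_out, 1, bias=False), nn.BatchNorm2d(c_out)]
        self.block = nn.Sequential(*layers)

    def forward(self, x):
        out = self.block(x)
        return x + out if self.use_residual else out

# Example: spatial size and channels preserved, so the residual path is used.
x = torch.randn(1, 16, 32, 32)
print(MBConv(16, 16)(x).shape)  # torch.Size([1, 16, 32, 32])
```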

Step 1: fix $\phi = 1$ and run a small grid search for the best $\alpha$, $\beta$, $\gamma$: "we find the best values for EfficientNet-B0 are $\alpha = 1.2$, $\beta = 1.1$, $\gamma = 1.15$"

Step 2: fix $\alpha$, $\beta$, $\gamma$ and increase $\phi$ to scale the network up (EfficientNet-B1 through EfficientNet-B7)
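A hedged sketch of what step 1 could look like as a grid search; `train_and_evaluate` is a hypothetical stand-in for training the scaled model and measuring validation accuracy, and the candidate grids and constraint tolerance are invented for illustration:

```python
import itertools

def train_and_evaluate(alpha, beta, gamma, phi=1.0):
    """Hypothetical: build the model scaled by (alpha, beta, gamma)**phi,
    train it, and return validation accuracy. Not implemented here."""
    raise NotImplementedError

def step1_grid_search():
    # Candidate values are illustrative; the paper only says a "small grid search"
    # under the constraint alpha * beta**2 * gamma**2 ≈ 2, with phi fixed to 1.
    alphas = [1.0, 1.1, 1.2, 1.3]
    betas = [1.0, 1.05, 1.1, 1.15]
    gammas = [1.0, 1.05, 1.1, 1.15, 1.2]
    best, best_acc = None, -1.0
    for a, b, g in itertools.product(alphas, betas, gammas):
        if abs(a * b**2 * g**2 - 2.0) >= 0.1:  # enforce the FLOPS constraint
            continue
        acc = train_and_evaluate(a, b, g, phi=1.0)
        if acc > best_acc:
            best, best_acc = (a, b, g), acc
    return best
```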


6 Experiments

6.1 Datasets

[Table: transfer learning datasets with their train/test sizes and number of classes]
  • ImageNet
  • CIFAR10
  • CIFAR100
  • Birdsnap
  • Stanford Cars
  • Flowers
  • FGVC Aircraft
  • Oxford-IIIT Pets
  • Food-101

6.2 Experiments on ImageNet

1)Scaling Up MobileNets and ResNets

[Table: scaling up MobileNetV1/V2 and ResNet-50 with single-dimension vs. compound scaling]

Compound scaling consistently beats single-dimension scaling.

2)ImageNet Results for EfficientNet

[Table: ImageNet results for EfficientNet-B0..B7 (top-1/top-5 accuracy, #Params, #FLOPS) vs. other ConvNets]

Now for inference speed:

[Table: inference latency comparison between EfficientNet and larger ConvNets]

Both fast and accurate.

6.3 Transfer Learning Results for EfficientNet

[Table: transfer learning accuracy and parameter counts on the eight datasets vs. previous SOTA models]

Again, both fast and accurate.

The plot below would be even more striking if each network family were shown with compound scaling applied (each point becoming a curve).

[Figure: model parameters vs. transfer learning accuracy across the eight datasets]

SOTA on 5 of the 8 transfer datasets.

7 Conclusion(own)

  • Scaling width / depth / resolution individually: each has its own pros and cons, and they affect network FLOPS differently
  • Scaling width / depth / resolution jointly (compound scaling) works better; the base scaling factors $\alpha$, $\beta$, $\gamma$ are found via a small grid search
  • Bigger models need more regularization (e.g., larger models use a higher dropout rate, assuming the dataset size stays the same)
