【EfficientNet】《EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks》(ICML-2019)

Table of Contents
1 Background and Motivation
2 Related Work
3 Advantages / Contributions
4 Compound Model Scaling
  4.1 Problem Formulation
  4.2 Scaling Dimensions
  4.3 Compound Scaling
5 EfficientNet Architecture
6 Experiments
  6.1 Datasets
  6.2 Experiments on ImageNet
  6.3 Transfer Learning Results for EfficientNet
7 Conclusion (own)

1 Background and Motivation

Scaling up ConvNets is widely used to achieve better accuracy.
Common approaches include:
1) scaling up network depth (e.g., ResNet-50 to ResNet-101),
2) scaling up network width (e.g., ResNet-50 to Wide-ResNet).
A less common one is:
3) scaling up the input resolution.
Are these three dimensions intrinsically related? And how should they be tuned jointly to get the largest accuracy gain?

The authors are the first to empirically quantify the relationship among all three dimensions of network width, depth, and resolution, and use it to improve model accuracy efficiently.
2 Related Work

ConvNet Accuracy; ConvNet Efficiency (lightweight networks); Model Scaling (width, depth, and resolution).

3 Advantages / Contributions

Following MNASNet, the authors use AutoML to search for a baseline network, EfficientNet-B0, then compound-scale it along the three dimensions of width, depth, and resolution to form EfficientNet-Bx models of different sizes. These achieve SOTA on ImageNet with far fewer parameters, and cross-dataset evaluation shows excellent generalization as well (SOTA on 5 of 8 transfer datasets).
4 Compound Model Scaling

Core idea: scale network width, depth, and resolution jointly with fixed ratios, rather than scaling any single dimension alone.
4.1 Problem Formulation

A ConvNet $N$ can be represented as a stack of layers:

$$N = F_k \odot \cdots \odot F_2 \odot F_1(X_1) = \bigodot_{j=1 \dots k} F_j(X_1)$$

where $X_1$ is the input tensor and $F_j$ is the operator of layer $j$ (e.g., convolution plus activation). Since ConvNet layers are usually grouped into stages that share the same architecture, this can be written more modularly as

$$N = \bigodot_{i=1 \dots s} F_i^{L_i}\big(X_{\langle H_i, W_i, C_i \rangle}\big)$$

where $F_i^{L_i}$ denotes layer $F_i$ repeated $L_i$ times in stage $i$, and $\langle H_i, W_i, C_i \rangle$ is the input tensor shape of stage $i$. Scaling the network to improve accuracy can then be framed as the optimization problem

$$\max_{d,\, w,\, r} \ \text{Accuracy}\big(N(d, w, r)\big)$$

$$\text{s.t.} \quad N(d, w, r) = \bigodot_{i=1 \dots s} \hat{F}_i^{\,d \cdot \hat{L}_i}\big(X_{\langle r \cdot \hat{H}_i,\; r \cdot \hat{W}_i,\; w \cdot \hat{C}_i \rangle}\big)$$

$$\text{Memory}(N) \le \text{target\_memory}, \quad \text{FLOPS}(N) \le \text{target\_flops}$$

where $d$, $w$, $r$ are coefficients for scaling network depth, width, and resolution, and $\hat{F}_i$, $\hat{L}_i$, $\hat{H}_i$, $\hat{W}_i$, $\hat{C}_i$ are predefined parameters of the baseline network.
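To make the stage-wise formulation $N = \bigodot_i F_i^{L_i}$ concrete, here is a minimal Python sketch that treats layers as plain callables; the function names and the ceiling-rounding in depth scaling are illustrative assumptions, not the paper's code.

```python
import math
from typing import Callable, List, Tuple

Layer = Callable[[object], object]

def build_network(stages: List[Tuple[Layer, int]]) -> Layer:
    """stages: list of (F_i, L_i); layer F_i is repeated L_i times in stage i."""
    def network(x):
        for layer_fn, repeats in stages:   # stage i
            for _ in range(repeats):       # apply F_i a total of L_i times
                x = layer_fn(x)
        return x
    return network

def scale_depth(stages: List[Tuple[Layer, int]], d: float):
    """Depth scaling: multiply each L_i by d, keeping every F_i fixed."""
    return [(f, math.ceil(d * L)) for f, L in stages]
```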
4.2 Scaling Dimensions

Observation 1: scaling up any dimension of network width, depth, or resolution improves accuracy, but the accuracy gain diminishes for bigger models.
1) Scaling depth
Advantage: captures richer and more complex features.
Drawback: deeper networks are more difficult to train due to the vanishing gradient problem, so the accuracy return diminishes for very deep ConvNets (accuracy plateaus beyond a certain depth).
2) Scaling width (i.e., increasing the number of channels)
Advantage: wider networks tend to capture more fine-grained features and are easier to train.
Drawback: wide but shallow networks have difficulty capturing higher-level features.
3) Scaling resolution
Advantage: can potentially capture more fine-grained patterns.
4.3 Compound Scaling

(Figure: width scaling curves under different fixed depth and resolution settings.) Sweeping width while holding different combinations of depth and resolution fixed shows that increasing depth and resolution together gives the biggest accuracy gains.
Observation 2: in order to pursue better accuracy and efficiency, it is critical to balance all dimensions of network width, depth, and resolution during ConvNet scaling.
Based on the analyses in Sections 4.2 and 4.3 (the two observations above), the authors propose the following compound scaling method:
$$\text{depth: } d = \alpha^\phi, \quad \text{width: } w = \beta^\phi, \quad \text{resolution: } r = \gamma^\phi$$

$$\text{s.t.} \quad \alpha \cdot \beta^2 \cdot \gamma^2 \approx 2, \quad \alpha \ge 1,\ \beta \ge 1,\ \gamma \ge 1$$

$\alpha$, $\beta$, $\gamma$ are constants determined by a small grid search; $\phi$ is a user-specified coefficient that controls how many more resources are available for model scaling. Why does the constraint use $\alpha$ to the first power, but $\beta^2$ and $\gamma^2$ for the other two?
Because doubling network depth doubles FLOPS, while doubling network width or resolution increases FLOPS by four times (convolution FLOPS are proportional to $d$, $w^2$, and $r^2$).
With the authors' compound scaling, the network's total FLOPS grow by a factor of $(\alpha \cdot \beta^2 \cdot \gamma^2)^\phi$, which the constraint pins to approximately $2^\phi$.
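As a quick sanity check, plugging in the grid-searched values reported in Section 5: $\alpha \cdot \beta^2 \cdot \gamma^2 = 1.2 \times 1.1^2 \times 1.15^2 \approx 1.92 \approx 2$, so each increment of $\phi$ roughly doubles FLOPS; $\phi = 3$, for example, budgets about $2^3 = 8\times$ the baseline FLOPS.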
5 EfficientNet Architecture

The baseline network EfficientNet-B0 is obtained via AutoML (neural architecture search) in the style of MNASNet: "we optimize FLOPS rather than latency since we are not targeting any specific hardware device."
(Table: EfficientNet-B0 baseline architecture.) Its main building block, MBConv, is the inverted bottleneck from MobileNetV2, with squeeze-and-excitation optimization added; a sketch follows below.
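For reference, here is a minimal PyTorch sketch of such an MBConv block (inverted bottleneck plus squeeze-and-excitation). It is an illustrative re-implementation, not the official EfficientNet code; the SiLU/Swish activation and the SE ratio of 0.25 are assumptions.

```python
import torch
import torch.nn as nn

class MBConv(nn.Module):
    def __init__(self, c_in, c_out, expand_ratio=6, kernel_size=3,
                 stride=1, se_ratio=0.25):
        super().__init__()
        c_mid = c_in * expand_ratio
        self.use_residual = (stride == 1 and c_in == c_out)

        # 1x1 expansion conv: widen channels by expand_ratio
        self.expand = nn.Sequential(
            nn.Conv2d(c_in, c_mid, 1, bias=False),
            nn.BatchNorm2d(c_mid), nn.SiLU(),
        ) if expand_ratio != 1 else nn.Identity()

        # depthwise conv: spatial filtering, one filter per channel
        self.dw = nn.Sequential(
            nn.Conv2d(c_mid, c_mid, kernel_size, stride,
                      padding=kernel_size // 2, groups=c_mid, bias=False),
            nn.BatchNorm2d(c_mid), nn.SiLU(),
        )

        # squeeze-and-excitation: channel-wise reweighting
        c_se = max(1, int(c_in * se_ratio))
        self.se = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(c_mid, c_se, 1), nn.SiLU(),
            nn.Conv2d(c_se, c_mid, 1), nn.Sigmoid(),
        )

        # 1x1 projection back to c_out (linear bottleneck: no activation)
        self.project = nn.Sequential(
            nn.Conv2d(c_mid, c_out, 1, bias=False),
            nn.BatchNorm2d(c_out),
        )

    def forward(self, x):
        out = self.expand(x)
        out = self.dw(out)
        out = out * self.se(out)   # scale channels by SE attention
        out = self.project(out)
        return x + out if self.use_residual else out
```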
Step 1: fix $\phi = 1$ and grid-search the best $\alpha$, $\beta$, $\gamma$; the best values found for EfficientNet-B0 are $\alpha = 1.2$, $\beta = 1.1$, $\gamma = 1.15$.
Step 2: fix $\alpha$, $\beta$, $\gamma$ and scale up the network with larger $\phi$ to obtain EfficientNet-B1 through EfficientNet-B7 (see the sketch below).
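A minimal sketch of this two-step scaling rule, using the reported $\alpha$, $\beta$, $\gamma$; the mapping of $\phi$ values to B1~B7 and the rounding below are illustrative assumptions, since the paper does not spell them out.

```python
# Compound scaling rule: d = alpha^phi, w = beta^phi, r = gamma^phi.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # depth / width / resolution bases (Step 1)

def compound_scale(phi: float, base_depth=1.0, base_width=1.0, base_res=224):
    d = ALPHA ** phi   # depth multiplier
    w = BETA ** phi    # width (channel) multiplier
    r = GAMMA ** phi   # resolution multiplier
    flops_factor = (ALPHA * BETA**2 * GAMMA**2) ** phi  # ~2**phi by the constraint
    return d * base_depth, w * base_width, round(r * base_res), flops_factor

# Step 2: sweep phi to generate a family of larger models from the B0 baseline.
for phi in range(1, 8):
    d, w, r, f = compound_scale(phi)
    print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, input {r}px, FLOPS x{f:.1f}")
```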
6 Experiments

6.1 Datasets

ImageNet, CIFAR-10, CIFAR-100, Birdsnap, Stanford Cars, Flowers, FGVC Aircraft, Oxford-IIIT Pets, Food-101.

6.2 Experiments on ImageNet

1) Scaling Up MobileNets and ResNets
(Figure: single-dimension vs. compound scaling on MobileNets and ResNets.) Compound scaling beats single-dimension scaling here as well.
2) ImageNet Results for EfficientNet

(Table: ImageNet accuracy vs. parameters and FLOPS.) Now let's look at speed:

(Table: measured inference latency.) Fast and strong at the same time.
6.3 Transfer Learning Results for EfficientNet

(Table: transfer learning accuracy vs. parameters.) Again, fast and strong.

This figure would be even more striking if each compared network were drawn with compound scaling applied (each point becoming a curve).

(Figure: accuracy vs. parameters on transfer datasets.) SOTA on 5 of 8 transfer datasets: very strong.
7 Conclusion (own)

1) Scaling width / depth / resolution individually has distinct pros and cons, and each dimension affects FLOPS differently.
2) Scaling all three together (compound scaling) works better; the initial scaling factors $\alpha$, $\beta$, $\gamma$ still need to be found by grid search.
3) Bigger models need more regularization (e.g., a larger dropout rate as the model grows, assuming the dataset scale stays fixed).