To reduce the internal covariate shift that arises during gradient-descent training, the paper speeds up training by fixing the distribution of the layer inputs.
We presented an algorithm for constructing, training, and performing inference with batch-normalized networks. The
resulting networks can be trained with saturating nonlinearities, are more tolerant to increased training rates, and
often do not require Dropout for regularization.
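To make "fixing the distribution of the layer inputs" concrete, here is a minimal NumPy sketch of the training-time Batch Normalization transform (per-feature normalization over the mini-batch, followed by a learned scale and shift); the function name and toy data are my own, not from the paper.

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the mini-batch, then scale and shift.

    x:     (N, D) mini-batch of layer inputs
    gamma: (D,)   learned scale
    beta:  (D,)   learned shift
    """
    mu = x.mean(axis=0)                     # per-feature mini-batch mean
    var = x.var(axis=0)                     # per-feature mini-batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)   # normalized activations
    return gamma * x_hat + beta             # scale/shift restores representational power

# toy usage: a batch of 4 examples with 3 features, deliberately shifted and scaled
x = np.random.randn(4, 3) * 5.0 + 2.0
y = batch_norm_forward(x, gamma=np.ones(3), beta=np.zeros(3))
print(y.mean(axis=0), y.std(axis=0))        # roughly zero mean, unit std per feature
```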
Inception V3
The paper considers how to scale up the model efficiently while keeping the extra computational cost as low as possible. In this paper, we start with describing a few general principles and optimization ideas that proved to be useful for scaling up convolution networks in efficient ways.
The architecture uses Auxiliary Classifiers. The paper reports that, in their experiments, the Auxiliary Classifier branches work better when placed in the deeper layers of the network; at shallow layers, adding them or not has no effect on the test results. ——we argue that the auxiliary classifiers act as regularizer
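A rough PyTorch sketch of how such an auxiliary classifier can be attached to an intermediate feature map and used only as an extra, down-weighted loss term (consistent with the regularizer view); the module name `AuxHead` and the 0.3 weight are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class AuxHead(nn.Module):
    """Hypothetical auxiliary head: pool an intermediate feature map and classify it."""
    def __init__(self, in_channels, num_classes):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(in_channels, num_classes)

    def forward(self, feat):
        return self.fc(self.pool(feat).flatten(1))

criterion = nn.CrossEntropyLoss()

def total_loss(main_logits, aux_logits, target, aux_weight=0.3):
    # The auxiliary loss is added with a small weight, so it mainly acts as a
    # regularizing signal on the intermediate representation.
    return criterion(main_logits, target) + aux_weight * criterion(aux_logits, target)
```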
However, the authors found that this asymmetric factorization does not help on early layers; it works well at medium feature-map sizes. ——In practice, we have found that employing this factorization does not work well on early layers, but it gives very good results on medium grid-sizes (On m×m feature maps, where m ranges between 12 and 20). On that level, very good results can be achieved by using 1 × 7 convolutions followed by 7 × 1 convolutions.
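A small PyTorch sketch of the asymmetric factorization: one 7×7 convolution is replaced by a 1×7 followed by a 7×1, covering the same receptive field with roughly 2·7·C² weights instead of 49·C²; the class name and channel count are hypothetical.

```python
import torch
import torch.nn as nn

class Asym7x7(nn.Module):
    """Replace a 7x7 convolution with a 1x7 followed by a 7x1 (same receptive field)."""
    def __init__(self, channels):
        super().__init__()
        self.conv1x7 = nn.Conv2d(channels, channels, kernel_size=(1, 7), padding=(0, 3))
        self.conv7x1 = nn.Conv2d(channels, channels, kernel_size=(7, 1), padding=(3, 0))

    def forward(self, x):
        return self.conv7x1(torch.relu(self.conv1x7(x)))

# usage on a medium-sized grid (e.g. 17x17), where the paper reports good results
x = torch.randn(1, 64, 17, 17)
print(Asym7x7(64)(x).shape)  # torch.Size([1, 64, 17, 17])
```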