To reduce the internal covariate shift that arises during gradient-descent training, the paper speeds up training by fixing the distribution of the layer inputs.
We presented an algorithm for constructing, training, and performing inference with batch-normalized networks. The
resulting networks can be trained with saturating nonlinearities, are more tolerant to increased training rates, and
often do not require Dropout for regularization.
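A minimal sketch of the batch-normalization transform described above, assuming a fully connected layer with 2-D activations; the function and variable names are illustrative, not the paper's code.

```python
import torch

def batch_norm(x, gamma, beta, eps=1e-5):
    # x: (N, D) mini-batch of activations; gamma, beta: learnable (D,) parameters.
    mean = x.mean(dim=0)                         # per-feature mini-batch mean
    var = x.var(dim=0, unbiased=False)           # per-feature mini-batch variance
    x_hat = (x - mean) / torch.sqrt(var + eps)   # normalized (zero mean, unit variance) inputs
    return gamma * x_hat + beta                  # scale and shift to restore representational power

# toy usage
x = torch.randn(32, 64)                          # batch of 32 samples, 64 features
gamma = torch.ones(64, requires_grad=True)
beta = torch.zeros(64, requires_grad=True)
y = batch_norm(x, gamma, beta)
```

At inference time the mini-batch statistics are replaced by running averages accumulated during training, so the output depends only on the input.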
Inception V3
The paper considers how to scale up the model efficiently while keeping the computational cost as low as possible. In this paper, we start with describing a few general principles and optimization ideas that proved to be useful for scaling up convolution networks in efficient ways.
The architecture adopts auxiliary classifiers. The paper reports that, in their tests, the auxiliary classifiers work better when attached to the deeper layers of the network; on shallow layers, their presence or absence made no difference to the test results. we argue that the auxiliary classifiers act as regularizer
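A rough sketch of an auxiliary classifier head attached to an intermediate feature map; the channel width (128), pooled size (4×4) and the 0.3 loss weight are my own illustrative choices, not the exact head used in the paper.

```python
import torch.nn as nn

class AuxClassifier(nn.Module):
    # Small classification head on an intermediate feature map.
    # Its loss is added to the main loss and acts like a regularizer.
    def __init__(self, in_channels, num_classes):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(4)
        self.conv = nn.Conv2d(in_channels, 128, kernel_size=1)
        self.relu = nn.ReLU(inplace=True)
        self.fc = nn.Linear(128 * 4 * 4, num_classes)

    def forward(self, x):
        x = self.pool(x)
        x = self.relu(self.conv(x)).flatten(1)
        return self.fc(x)

# training-time use (sketch): combine main and auxiliary losses
# loss = criterion(main_logits, y) + 0.3 * criterion(aux_logits, y)
```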
However, the authors found that these asymmetric convolution kernels do not work well in the early layers of the network; they give good results on medium-sized feature maps. ——In practice, we have found that employing this factorization does not work well on early layers, but it gives very good results on medium grid-sizes (on m×m feature maps, where m ranges between 12 and 20). On that level, very good results can be achieved by using 1 × 7 convolutions followed by 7 × 1 convolutions.
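A minimal sketch of this asymmetric factorization: a 7×7 convolution replaced by a 1×7 followed by a 7×1 convolution, which covers the same receptive field with roughly 2·7·C² instead of 49·C² weights. The channel count (192) is an illustrative assumption.

```python
import torch.nn as nn

def factorized_7x7(channels):
    # Replace a single 7x7 convolution with a 1x7 then a 7x1 convolution.
    # Padding keeps the spatial size unchanged.
    return nn.Sequential(
        nn.Conv2d(channels, channels, kernel_size=(1, 7), padding=(0, 3)),
        nn.ReLU(inplace=True),
        nn.Conv2d(channels, channels, kernel_size=(7, 1), padding=(3, 0)),
        nn.ReLU(inplace=True),
    )

# e.g. applied on a medium-sized (12x12 to 20x20) feature map
block = factorized_7x7(192)
```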