Dropout: A Simple Way to Prevent Neural Networks from Overfitting
For a dropout layer, each node is retained during training with some keep probability p (e.g. 0.5); at prediction time (the forward/inference pass) the keep probability is 1.0.
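A minimal NumPy sketch of this train/inference behavior. The function name `dropout_forward` is hypothetical; at inference, keeping every unit and scaling activations by p matches the paper's test-time weight-scaling scheme in expectation:

```python
import numpy as np

def dropout_forward(y, p, train):
    """Dropout forward pass (hypothetical helper).

    y: activations from the previous layer
    p: keep probability (e.g. 0.5 during training)
    train: True for the training pass, False for inference
    """
    if train:
        # Sample a Bernoulli(p) mask and zero out dropped units.
        mask = (np.random.rand(*y.shape) < p).astype(y.dtype)
        return y * mask
    # Inference: keep probability is effectively 1.0 (all units kept);
    # scaling by p keeps expected activations consistent with training.
    return y * p

np.random.seed(0)
y = np.ones((4, 3))
out_train = dropout_forward(y, 0.5, train=True)   # some entries zeroed
out_test = dropout_forward(y, 0.5, train=False)   # all kept, scaled by p
```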
Standard network:
$$z_i^{(\ell+1)} = \sum_j w_{ij}^{(\ell+1)} y_j^{(\ell)} + b_i^{(\ell+1)} = \mathbf{w}_i^{(\ell+1)} \mathbf{y}^{(\ell)} + b_i^{(\ell+1)}$$
$$y_i^{(\ell+1)} = f\left(z_i^{(\ell+1)}\right)$$
Whereas for a dropout network:
$$r_j^{(\ell)} \sim \mathrm{Bernoulli}(p)$$
$$\tilde{\mathbf{y}}^{(\ell)} = \mathbf{r}^{(\ell)} * \mathbf{y}^{(\ell)}$$
$$z_i^{(\ell+1)} = \sum_j w_{ij}^{(\ell+1)} \tilde{y}_j^{(\ell)} + b_i^{(\ell+1)} = \mathbf{w}_i^{(\ell+1)} \tilde{\mathbf{y}}^{(\ell)} + b_i^{(\ell+1)}$$
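The dropout-layer equations above can be sketched directly in NumPy. This is an illustrative implementation under the assumption that f is ReLU; the helper name `dropout_layer_forward` is hypothetical:

```python
import numpy as np

def dropout_layer_forward(y_prev, W, b, p, f=lambda z: np.maximum(z, 0.0)):
    """One dropout-network layer, following the paper's equations:
    r ~ Bernoulli(p), y~ = r * y, z = W y~ + b, y = f(z).
    f defaults to ReLU (an assumption for illustration)."""
    r = (np.random.rand(*y_prev.shape) < p).astype(y_prev.dtype)  # r_j ~ Bernoulli(p)
    y_tilde = r * y_prev                                          # thinned activations
    z = W @ y_tilde + b                                           # z_i = sum_j w_ij * y~_j + b_i
    return f(z)

np.random.seed(0)
y_prev = np.random.rand(5)       # previous-layer activations
W = np.random.randn(3, 5)        # layer weights
b = np.zeros(3)                  # layer biases
y_next = dropout_layer_forward(y_prev, W, b, p=0.5)
```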
It follows that dropout should be applied after ReLU (or another nonlinear activation):
-> CONV/FC -> BatchNorm -> ReLU (or other activation) -> Dropout -> CONV/FC ->
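A small NumPy sketch of this ordering, using simplified stand-in functions (`fc`, `batch_norm`, `relu`, `dropout` are all hypothetical helpers, with FC standing in for CONV/FC):

```python
import numpy as np

np.random.seed(0)

def fc(x, W, b):                  # CONV/FC stage (fully connected here)
    return W @ x + b

def batch_norm(x, eps=1e-5):      # simplified normalization (no learned scale/shift)
    return (x - x.mean()) / np.sqrt(x.var() + eps)

def relu(x):                      # nonlinear activation
    return np.maximum(x, 0.0)

def dropout(x, p, train=True):    # keep probability p during training
    if not train:
        return x * p
    mask = (np.random.rand(*x.shape) < p).astype(x.dtype)
    return x * mask

# Recommended ordering: FC -> BatchNorm -> ReLU -> Dropout -> (next FC)
x = np.random.rand(8)
W, b = np.random.randn(4, 8), np.zeros(4)
h = dropout(relu(batch_norm(fc(x, W, b))), p=0.5)
```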