Deep Learning: How Backpropagation Works Through the sigmoid and relu Activation Functions
- 1. The sigmoid function
- 2. Backpropagation through sigmoid
- 3. The relu function
- 4. Backpropagation through relu
1. The sigmoid function

- The sigmoid formula

$$\sigma(z)=\frac{1}{1+e^{-z}}\tag{1}$$
- Python implementation of sigmoid
import numpy as np

def sigmoid(Z):
    """
    Implements the sigmoid activation in numpy

    Arguments:
    Z -- numpy array of any shape

    Returns:
    A -- output of sigmoid(Z), same shape as Z
    cache -- returns Z as well, useful during backpropagation
    """
    A = 1/(1+np.exp(-Z))
    cache = Z
    return A, cache
- Note: backpropagation needs Z, so it is saved in cache first.
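A quick sanity check of the forward pass (the sample input values here are arbitrary, chosen only for illustration):

```python
import numpy as np

def sigmoid(Z):
    A = 1/(1+np.exp(-Z))
    cache = Z
    return A, cache

Z = np.array([[-1.0, 0.0, 1.0]])
A, cache = sigmoid(Z)
# sigmoid(0) is exactly 0.5, and all outputs stay in (0, 1)
print(A)           # [[0.26894142 0.5        0.73105858]]
print(cache is Z)  # True -- the cache is just the input Z
```

Note the symmetry sigmoid(-z) = 1 - sigmoid(z), visible in the first and last entries.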
2. Backpropagation through sigmoid

- The derivative of sigmoid

$$\sigma'(z)=\sigma(z)\,(1-\sigma(z))\tag{2}$$
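Equation (2) can be verified numerically with a central finite difference (the test point 0.3 and the step size are arbitrary choices for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

z = 0.3   # arbitrary test point
h = 1e-6  # finite-difference step

# Central difference approximates the true derivative to O(h^2)
numeric = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h)
analytic = sigmoid(z) * (1 - sigmoid(z))  # equation (2)
print(abs(numeric - analytic) < 1e-9)  # True
```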
- How sigmoid backpropagation works

At layer $l$ of the network, forward propagation computes:

$$Z^{[l]}=W^{[l]}A^{[l-1]}+b^{[l]}\tag{3}$$

$$A^{[l]}=\sigma(Z^{[l]})\tag{4}$$

where (3) is the linear part and (4) is the activation part, with sigmoid as the activation function.

During backpropagation, when we reach layer $l$, the following layer has already supplied $dA^{[l]}$ (i.e. $\frac{\partial \mathcal{L}}{\partial A^{[l]}}$, where $\mathcal{L}$ is the cost function).

The current layer needs to compute $dZ^{[l]}$ (i.e. $\frac{\partial \mathcal{L}}{\partial Z^{[l]}}$), which by the chain rule is:

$$dZ^{[l]} = \frac{\partial \mathcal{L}}{\partial Z^{[l]}} = \frac{\partial \mathcal{L}}{\partial A^{[l]}} * \frac{\partial A^{[l]}}{\partial Z^{[l]}} = dA^{[l]} * \sigma'(Z^{[l]}) = dA^{[l]} * \sigma(Z^{[l]})*(1-\sigma(Z^{[l]}))\tag{5}$$

The implementation is therefore:
- Python implementation of sigmoid backpropagation
def sigmoid_backward(dA, cache):
    """
    Implement the backward propagation for a single SIGMOID unit.

    Arguments:
    dA -- post-activation gradient, of any shape
    cache -- 'Z' where we store for computing backward propagation efficiently

    Returns:
    dZ -- Gradient of the cost with respect to Z
    """
    Z = cache

    s = 1/(1+np.exp(-Z))
    dZ = dA * s * (1-s)

    assert (dZ.shape == Z.shape)

    return dZ
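A gradient check confirms that `sigmoid_backward` matches equation (5). Here the loss is taken to be $\mathcal{L} = \sum A$ (an arbitrary choice for testing, so that $dA$ is all ones), and one entry of Z is perturbed by a small step:

```python
import numpy as np

def sigmoid(Z):
    return 1 / (1 + np.exp(-Z)), Z

def sigmoid_backward(dA, cache):
    Z = cache
    s = 1 / (1 + np.exp(-Z))
    return dA * s * (1 - s)

np.random.seed(0)
Z = np.random.randn(3, 2)
A, cache = sigmoid(Z)
dA = np.ones_like(A)          # dL/dA for the test loss L = sum(A)
dZ = sigmoid_backward(dA, cache)

# Finite-difference check on one entry of Z
h = 1e-6
i, j = 1, 0
Zp, Zm = Z.copy(), Z.copy()
Zp[i, j] += h
Zm[i, j] -= h
numeric = (sigmoid(Zp)[0].sum() - sigmoid(Zm)[0].sum()) / (2 * h)
print(abs(numeric - dZ[i, j]) < 1e-8)  # True
```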
3. The relu function

- The relu formula

$$relu(z) = \begin{cases} z & z > 0 \\ 0 & z \leq 0 \end{cases}\tag{6}$$

which is equivalent to

$$relu(z) = max(0,z)\tag{7}$$
- Python implementation of relu
def relu(Z):
    """
    Implement the RELU function.

    Arguments:
    Z -- Output of the linear layer, of any shape

    Returns:
    A -- Post-activation parameter, of the same shape as Z
    cache -- the input Z, stored for computing the backward pass efficiently
    """
    A = np.maximum(0, Z)

    assert(A.shape == Z.shape)

    cache = Z
    return A, cache
- Note: backpropagation needs Z, so it is saved in cache first.
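A quick check of equation (7) on a small sample array (the values are arbitrary, chosen to cover the negative, zero, and positive cases):

```python
import numpy as np

def relu(Z):
    A = np.maximum(0, Z)
    cache = Z
    return A, cache

Z = np.array([[-2.0, 0.0, 3.0]])
A, cache = relu(Z)
# Negative inputs and zero are clipped to 0; positive inputs pass through
print(A)  # [[0. 0. 3.]]
```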
4. Backpropagation through relu

- The derivative of relu

$$relu'(z) = \begin{cases} 1 & z > 0 \\ 0 & z \leq 0 \end{cases}\tag{8}$$

(Strictly speaking, relu is not differentiable at $z = 0$; by convention the derivative there is taken to be 0.)
- How relu backpropagation works

As with sigmoid, forward propagation computes:

$$Z^{[l]}=W^{[l]}A^{[l-1]}+b^{[l]}\tag{9}$$

$$A^{[l]}=relu(Z^{[l]})\tag{10}$$

During backpropagation, $dZ^{[l]}$ is computed as:

$$dZ^{[l]} = \frac{\partial \mathcal{L}}{\partial Z^{[l]}} = \frac{\partial \mathcal{L}}{\partial A^{[l]}} * \frac{\partial A^{[l]}}{\partial Z^{[l]}} = dA * relu'(Z^{[l]}) = \begin{cases} dA_{ij} & Z_{ij} > 0 \\ 0 & Z_{ij} \leq 0 \end{cases}\tag{11}$$

The implementation is therefore:
- Python implementation of relu backpropagation
def relu_backward(dA, cache):
    """
    Implement the backward propagation for a single RELU unit.

    Arguments:
    dA -- post-activation gradient, of any shape
    cache -- 'Z' where we store for computing backward propagation efficiently

    Returns:
    dZ -- Gradient of the cost with respect to Z
    """
    Z = cache
    dZ = np.array(dA, copy=True)  # just converting dz to a correct object.

    # When z <= 0, you should set dz to 0 as well.
    dZ[Z <= 0] = 0

    assert (dZ.shape == Z.shape)

    return dZ
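The gating behavior of equation (11) can be seen directly on a small example (the input values are arbitrary, chosen to cover all three cases of the sign of Z):

```python
import numpy as np

def relu_backward(dA, cache):
    Z = cache
    dZ = np.array(dA, copy=True)
    dZ[Z <= 0] = 0
    return dZ

Z = np.array([[-1.5, 0.0, 2.0]])
dA = np.array([[10.0, 10.0, 10.0]])
dZ = relu_backward(dA, Z)
# The upstream gradient passes through unchanged only where Z > 0
print(dZ)  # [[ 0.  0. 10.]]
```

Because relu's derivative is exactly 0 or 1, the backward pass is just a mask: no scaling of the gradient occurs, which is one reason relu suffers less from vanishing gradients than sigmoid.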