

Deep Learning: How Backpropagation Works Through the Sigmoid and ReLU Activation Functions

  • 1. The sigmoid function
  • 2. Backpropagation through the sigmoid function
  • 3. The ReLU function
  • 4. Backpropagation through the ReLU function

1. The sigmoid function

  • Sigmoid formula

    $$\sigma(z)=\frac{1}{1+e^{-z}}\tag{1}$$

  • Python implementation of sigmoid
import numpy as np

def sigmoid(Z):
    """
    Implements the sigmoid activation in numpy
    
    Arguments:
    Z -- numpy array of any shape
    
    Returns:
    A -- output of sigmoid(z), same shape as Z
    cache -- returns Z as well, useful during backpropagation
    """
    
    A = 1/(1+np.exp(-Z))
    cache = Z

    return A, cache
           
  • Note: Z is needed during backpropagation, so it is stored in cache here.
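
  • A quick usage sketch (the input values below are arbitrary and only illustrate the element-wise behaviour; the printed numbers are rounded):

import numpy as np

Z = np.array([[-1.0, 0.0, 2.5]])   # arbitrary pre-activations of any shape

A, cache = sigmoid(Z)              # sigmoid() as defined above
print(A)           # approximately [[0.269 0.5 0.924]]
print(cache is Z)  # True: the pre-activation Z is kept for the backward pass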

2. Backpropagation through the sigmoid function

  • Derivative of the sigmoid function

    $$\sigma'(z)=\sigma(z)*(1-\sigma(z))\tag{2}$$
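
    This identity follows from differentiating equation (1) directly and using $\frac{e^{-z}}{1+e^{-z}}=1-\sigma(z)$:

    $$\sigma'(z)=\frac{d}{dz}\left(1+e^{-z}\right)^{-1}=\frac{e^{-z}}{\left(1+e^{-z}\right)^{2}}=\frac{1}{1+e^{-z}}*\frac{e^{-z}}{1+e^{-z}}=\sigma(z)*(1-\sigma(z))$$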

  • How sigmoid backpropagation works

    In layer $l$ of the network, the forward pass computes:

    $$Z^{[l]}=W^{[l]}A^{[l-1]}+b^{[l]}\tag{3}$$

    $$A^{[l]}=\sigma(Z^{[l]})\tag{4}$$

    Here (3) is the linear part and (4) is the activation part, with sigmoid as the activation function.

    During backpropagation, by the time we reach layer $l$ the layer after it has already supplied $dA^{[l]}$ (i.e. $\frac{\partial \mathcal{L}}{\partial A^{[l]}}$, where $\mathcal{L}$ is the cost function).

    The current layer then needs to compute $dZ^{[l]}$ (i.e. $\frac{\partial \mathcal{L}}{\partial Z^{[l]}}$), as follows:

    $$dZ^{[l]}=\frac{\partial \mathcal{L}}{\partial Z^{[l]}}=\frac{\partial \mathcal{L}}{\partial A^{[l]}}*\frac{\partial A^{[l]}}{\partial Z^{[l]}}=dA^{[l]}*\sigma'(Z^{[l]})=dA^{[l]}*\sigma(Z^{[l]})*(1-\sigma(Z^{[l]}))\tag{5}$$

    The implementation therefore looks like this:

  • Python implementation of sigmoid backpropagation
def sigmoid_backward(dA, cache):
    """
    Implement the backward propagation for a single SIGMOID unit.
    
    Arguments:
    dA -- post-activation gradient, of any shape
    cache -- 'Z' where we store for computing backward propagation efficiently
    
    Returns:
    dZ -- Gradient of the cost with respect to Z
    """

    Z = cache

    s = 1/(1+np.exp(-Z))   # recompute sigmoid(Z) from the cached Z
    dZ = dA * s * (1-s)    # eq. (5): dZ = dA * sigma(Z) * (1 - sigma(Z))

    assert (dZ.shape == Z.shape)

    return dZ
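
  • A minimal sanity check (a sketch using the toy cost $\mathcal{L}=\sum A$ and random test values, which are not part of the article): because sigmoid acts element-wise, a centered finite difference of $\sigma$ should match equation (5) entry by entry.

import numpy as np

np.random.seed(0)
Z = np.random.randn(3, 2)      # arbitrary pre-activations
dA = np.ones_like(Z)           # gradient of the toy cost L = sum(A) w.r.t. A

dZ = sigmoid_backward(dA, Z)   # analytic gradient from eq. (5)

eps = 1e-6
A_plus, _ = sigmoid(Z + eps)   # sigmoid is element-wise, so a centered
A_minus, _ = sigmoid(Z - eps)  # difference approximates sigma'(Z) entry-wise
dZ_numeric = dA * (A_plus - A_minus) / (2 * eps)

print(np.max(np.abs(dZ - dZ_numeric)))   # should be tiny, on the order of 1e-10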
           

3. The ReLU function

  • ReLU formula

    $$relu(z)=\begin{cases} z & z>0 \\ 0 & z\leq 0 \end{cases}\tag{6}$$

    which is equivalent to

    $$relu(z)=\max(0,z)\tag{7}$$

  • Python implementation of ReLU
def relu(Z):
    """
    Implement the RELU function.
    
    Arguments:
    Z -- Output of the linear layer, of any shape
    
    Returns:
    A -- Post-activation parameter, of the same shape as Z
    cache -- returns Z as well, stored for computing the backward pass efficiently
    """

    A = np.maximum(0,Z)

    assert(A.shape == Z.shape)
    
    cache = Z 
    return A, cache
           
  • Note: Z is needed during backpropagation, so it is stored in cache here.
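
  • A quick usage sketch (arbitrary sample values): every non-positive entry is clipped to zero, while cache keeps the original Z for the backward pass.

import numpy as np

Z = np.array([[-2.0, 0.0, 3.0],
              [ 1.5, -0.5, 4.0]])  # arbitrary pre-activations

A, cache = relu(Z)                 # relu() as defined above
print(A)       # [[0.  0.  3. ]
               #  [1.5 0.  4. ]]
print(cache)   # the untouched Z, needed later by relu_backward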

4. Backpropagation through the ReLU function

  • Derivative of the ReLU function

    $$relu'(z)=\begin{cases} 1 & z>0 \\ 0 & z\leq 0 \end{cases}\tag{8}$$

    (Strictly speaking, ReLU is not differentiable at $z=0$; taking the derivative to be 0 there is the usual convention.)
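
  • Equation (8) can also be written directly in NumPy. A minimal sketch (the helper name relu_prime is illustrative and not part of the article's code):

import numpy as np

def relu_prime(Z):
    # 1 where Z > 0, 0 where Z <= 0 (the usual convention at Z == 0)
    return np.where(Z > 0, 1.0, 0.0)

    relu_backward below never builds this matrix explicitly; it copies dA and zeroes the entries where Z <= 0, which is the same element-wise multiplication by relu'(Z) expressed as a boolean mask.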

  • How ReLU backpropagation works

    Just as with sigmoid, the forward pass computes:

    $$Z^{[l]}=W^{[l]}A^{[l-1]}+b^{[l]}\tag{9}$$

    $$A^{[l]}=relu(Z^{[l]})\tag{10}$$

    During backpropagation, $dZ^{[l]}$ is computed as:

    $$dZ^{[l]}=\frac{\partial \mathcal{L}}{\partial Z^{[l]}}=\frac{\partial \mathcal{L}}{\partial A^{[l]}}*\frac{\partial A^{[l]}}{\partial Z^{[l]}}=dA^{[l]}*relu'(Z^{[l]})=\begin{cases} dA^{[l]}_{ij} & Z^{[l]}_{ij}>0 \\ 0 & Z^{[l]}_{ij}\leq 0 \end{cases}\tag{11}$$

    The implementation therefore looks like this:

  • Python implementation of ReLU backpropagation
def relu_backward(dA, cache):
    """
    Implement the backward propagation for a single RELU unit.
    
    Arguments:
    dA -- post-activation gradient, of any shape
    cache -- 'Z' where we store for computing backward propagation efficiently

    Returns:
    dZ -- Gradient of the cost with respect to Z
    """
    
    Z = cache
    dZ = np.array(dA, copy=True)  # start from a copy of dA
    
    # eq. (11): the gradient is blocked where Z <= 0, so zero those entries
    dZ[Z <= 0] = 0
    
    assert (dZ.shape == Z.shape)

    return dZ
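
  • A small usage sketch (arbitrary values, not from the article) showing the mask at work:

import numpy as np

Z = np.array([[-2.0, 0.0, 3.0]])    # cached pre-activations
dA = np.array([[0.1, 0.2, 0.3]])    # incoming gradient

dZ = relu_backward(dA, Z)
print(dZ)   # [[0.  0.  0.3]] -- the gradient is blocked where Z <= 0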
           
