

Deep Learning: How Backpropagation Works Through the Sigmoid and ReLU Activation Functions

  • 1. The sigmoid function
  • 2. Backpropagation through the sigmoid function
  • 3. The ReLU function
  • 4. Backpropagation through the ReLU function

1. The sigmoid function

  • Sigmoid formula

    $$\sigma(z)=\frac{1}{1+e^{-z}}\tag{1}$$

  • Python implementation of sigmoid
import numpy as np

def sigmoid(Z):
    """
    Implements the sigmoid activation in numpy
    
    Arguments:
    Z -- numpy array of any shape
    
    Returns:
    A -- output of sigmoid(z), same shape as Z
    cache -- returns Z as well, useful during backpropagation
    """
    
    A = 1/(1+np.exp(-Z))
    cache = Z

    return A, cache
           
  • Note: Z is needed during backpropagation, so it is stored in cache here.
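
  • A quick usage sketch (the input values below are arbitrary and only illustrate the element-wise behaviour; the printed numbers are rounded):

import numpy as np

Z = np.array([[-1.0, 0.0, 2.5]])   # arbitrary pre-activations of any shape

A, cache = sigmoid(Z)              # sigmoid() as defined above
print(A)           # approximately [[0.269 0.5 0.924]]
print(cache is Z)  # True: the pre-activation Z is kept for the backward pass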

2. Backpropagation through the sigmoid function

  • Derivative of the sigmoid function

    $$\sigma'(z)=\sigma(z)*(1-\sigma(z))\tag{2}$$
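
    This identity follows from differentiating equation (1) directly and using $\frac{e^{-z}}{1+e^{-z}}=1-\sigma(z)$:

    $$\sigma'(z)=\frac{d}{dz}\left(1+e^{-z}\right)^{-1}=\frac{e^{-z}}{\left(1+e^{-z}\right)^{2}}=\frac{1}{1+e^{-z}}*\frac{e^{-z}}{1+e^{-z}}=\sigma(z)*(1-\sigma(z))$$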

  • How sigmoid backpropagation works

    In layer $l$ of the network, the forward pass computes:

    $$Z^{[l]}=W^{[l]}A^{[l-1]}+b^{[l]}\tag{3}$$

    $$A^{[l]}=\sigma(Z^{[l]})\tag{4}$$

    Here (3) is the linear part and (4) is the activation part, with sigmoid as the activation function.

    During backpropagation, by the time we reach layer $l$ the layer after it has already supplied $dA^{[l]}$ (i.e. $\frac{\partial \mathcal{L}}{\partial A^{[l]}}$, where $\mathcal{L}$ is the cost function).

    The current layer then needs to compute $dZ^{[l]}$ (i.e. $\frac{\partial \mathcal{L}}{\partial Z^{[l]}}$), as follows:

    $$dZ^{[l]}=\frac{\partial \mathcal{L}}{\partial Z^{[l]}}=\frac{\partial \mathcal{L}}{\partial A^{[l]}}*\frac{\partial A^{[l]}}{\partial Z^{[l]}}=dA^{[l]}*\sigma'(Z^{[l]})=dA^{[l]}*\sigma(Z^{[l]})*(1-\sigma(Z^{[l]}))\tag{5}$$

    The implementation therefore looks like this:

  • Python implementation of sigmoid backpropagation
def sigmoid_backward(dA, cache):
    """
    Implement the backward propagation for a single SIGMOID unit.
    
    Arguments:
    dA -- post-activation gradient, of any shape
    cache -- 'Z' where we store for computing backward propagation efficiently
    
    Returns:
    dZ -- Gradient of the cost with respect to Z
    """

    Z = cache

    s = 1/(1+np.exp(-Z))   # recompute sigmoid(Z) from the cached Z
    dZ = dA * s * (1-s)    # eq. (5): dZ = dA * sigma(Z) * (1 - sigma(Z))

    assert (dZ.shape == Z.shape)

    return dZ
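
  • A minimal sanity check (a sketch using the toy cost $\mathcal{L}=\sum A$ and random test values, which are not part of the article): because sigmoid acts element-wise, a centered finite difference of $\sigma$ should match equation (5) entry by entry.

import numpy as np

np.random.seed(0)
Z = np.random.randn(3, 2)      # arbitrary pre-activations
dA = np.ones_like(Z)           # gradient of the toy cost L = sum(A) w.r.t. A

dZ = sigmoid_backward(dA, Z)   # analytic gradient from eq. (5)

eps = 1e-6
A_plus, _ = sigmoid(Z + eps)   # sigmoid is element-wise, so a centered
A_minus, _ = sigmoid(Z - eps)  # difference approximates sigma'(Z) entry-wise
dZ_numeric = dA * (A_plus - A_minus) / (2 * eps)

print(np.max(np.abs(dZ - dZ_numeric)))   # should be tiny, on the order of 1e-10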
           

3. The ReLU function

  • ReLU formula

    $$relu(z)=\begin{cases} z & z>0 \\ 0 & z\leq 0 \end{cases}\tag{6}$$

    which is equivalent to

    $$relu(z)=\max(0,z)\tag{7}$$

  • Python implementation of ReLU
def relu(Z):
    """
    Implement the RELU function.
    
    Arguments:
    Z -- Output of the linear layer, of any shape
    
    Returns:
    A -- Post-activation parameter, of the same shape as Z
    cache -- returns Z as well, stored for computing the backward pass efficiently
    """

    A = np.maximum(0,Z)

    assert(A.shape == Z.shape)
    
    cache = Z 
    return A, cache
           
  • Note: Z is needed during backpropagation, so it is stored in cache here.
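
  • A quick usage sketch (arbitrary sample values): every non-positive entry is clipped to zero, while cache keeps the original Z for the backward pass.

import numpy as np

Z = np.array([[-2.0, 0.0, 3.0],
              [ 1.5, -0.5, 4.0]])  # arbitrary pre-activations

A, cache = relu(Z)                 # relu() as defined above
print(A)       # [[0.  0.  3. ]
               #  [1.5 0.  4. ]]
print(cache)   # the untouched Z, needed later by relu_backward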

4. Backpropagation through the ReLU function

  • Derivative of the ReLU function

    $$relu'(z)=\begin{cases} 1 & z>0 \\ 0 & z\leq 0 \end{cases}\tag{8}$$

    (Strictly speaking, ReLU is not differentiable at $z=0$; taking the derivative to be 0 there is the usual convention.)
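
  • Equation (8) can also be written directly in NumPy. A minimal sketch (the helper name relu_prime is illustrative and not part of the article's code):

import numpy as np

def relu_prime(Z):
    # 1 where Z > 0, 0 where Z <= 0 (the usual convention at Z == 0)
    return np.where(Z > 0, 1.0, 0.0)

    relu_backward below never builds this matrix explicitly; it copies dA and zeroes the entries where Z <= 0, which is the same element-wise multiplication by relu'(Z) expressed as a boolean mask.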

  • How ReLU backpropagation works

    Just as with sigmoid, the forward pass computes:

    $$Z^{[l]}=W^{[l]}A^{[l-1]}+b^{[l]}\tag{9}$$

    $$A^{[l]}=relu(Z^{[l]})\tag{10}$$

    During backpropagation, $dZ^{[l]}$ is computed as:

    $$dZ^{[l]}=\frac{\partial \mathcal{L}}{\partial Z^{[l]}}=\frac{\partial \mathcal{L}}{\partial A^{[l]}}*\frac{\partial A^{[l]}}{\partial Z^{[l]}}=dA^{[l]}*relu'(Z^{[l]})=\begin{cases} dA^{[l]}_{ij} & Z^{[l]}_{ij}>0 \\ 0 & Z^{[l]}_{ij}\leq 0 \end{cases}\tag{11}$$

    The implementation therefore looks like this:

  • Python implementation of ReLU backpropagation
def relu_backward(dA, cache):
    """
    Implement the backward propagation for a single RELU unit.
    
    Arguments:
    dA -- post-activation gradient, of any shape
    cache -- 'Z' where we store for computing backward propagation efficiently

    Returns:
    dZ -- Gradient of the cost with respect to Z
    """
    
    Z = cache
    dZ = np.array(dA, copy=True)  # start from a copy of dA
    
    # eq. (11): the gradient is blocked where Z <= 0, so zero those entries
    dZ[Z <= 0] = 0
    
    assert (dZ.shape == Z.shape)

    return dZ
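
  • A small usage sketch (arbitrary values, not from the article) showing the mask at work:

import numpy as np

Z = np.array([[-2.0, 0.0, 3.0]])    # cached pre-activations
dA = np.array([[0.1, 0.2, 0.3]])    # incoming gradient

dZ = relu_backward(dA, Z)
print(dZ)   # [[0.  0.  0.3]] -- the gradient is blocked where Z <= 0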
           
