Andrew Ng Deep Learning Series - C1 Neural Networks and Deep Learning - W4 Deep Neural Networks

- Preface
  - Deep L-layer neural network
  - Forward Propagation in a Deep Network
  - Getting your matrix dimensions right
  - Why deep representations?
  - Building blocks of deep neural networks
  - Forward and Backward Propagation
  - Parameters vs Hyperparameters
  - What does this have to do with the brain?
  - Practice Questions

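Preface

Deep L-layer neural network

"Shallow" and "deep" are relative terms.

For a given classification problem, you can generally start with logistic regression (the simplest model, a single neuron) and gradually add layers, treating the number of layers as a hyperparameter and using cross-validation to decide how many layers suit the problem.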

Notation:

$W^{[l]}$ is the parameter used to compute $Z^{[l]}$: $Z^{[l]} = W^{[l]} A^{[l-1]} + b^{[l]}$.
$n^{[l]}$ is the number of units (neurons) in layer $l$; $n^{[0]}$ is the number of input features.
$a^{[l]}$ is the activation output of layer $l$; $a^{[0]} = x$ is the input $x_1, x_2, \dots, x_{n_x}$, and $a^{[L]}$ is the output of the output layer.
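To make the notation concrete, here is a minimal NumPy sketch (not the course's code) that builds $W^{[l]}$ and $b^{[l]}$ for every layer from a list of layer sizes `layer_dims` $= [n^{[0]}, n^{[1]}, \dots, n^{[L]}]$, using the shape rule $W^{[l]}: (n^{[l]}, n^{[l-1]})$, $b^{[l]}: (n^{[l]}, 1)$ explained in the next section. The function name `initialize_parameters`, the dictionary keys and the 0.01 scale on the random weights are illustrative assumptions.

```python
import numpy as np

def initialize_parameters(layer_dims, seed=0):
    """Create W[l] with shape (n[l], n[l-1]) and b[l] with shape (n[l], 1) for l = 1..L."""
    rng = np.random.default_rng(seed)
    params = {}
    L = len(layer_dims) - 1  # layer 0 is the input, so it has no parameters
    for l in range(1, L + 1):
        # rows = units in this layer, columns = units in the previous layer
        params["W" + str(l)] = rng.standard_normal((layer_dims[l], layer_dims[l - 1])) * 0.01
        # one bias per unit in this layer
        params["b" + str(l)] = np.zeros((layer_dims[l], 1))
    return params

params = initialize_parameters([3, 5, 5, 3, 1])
for l in range(1, 5):
    print(l, params["W" + str(l)].shape, params["b" + str(l)].shape)
```

Layer 0 gets no parameters because $a^{[0]} = x$ is just the input.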

Forward Propagation in a Deep Network

[Figure: the 3×5×5×3×1 network]

For a network such as the 3×5×5×3×1 one in the figure above (five layers of units, counting the input layer), vectorized forward propagation is as follows:

Layer 0 is the input layer, $L = 4$, and layer $L$ is the output layer.

$n^{[0]} = n_x = 3,\ n^{[1]} = 5,\ n^{[2]} = 5,\ n^{[3]} = 3,\ n^{[4]} = 1$

$x = A^{[0]}$: (3, 1)

$W^{[1]}$: (5, 3), $b^{[1]}$: (5, 1), $Z^{[1]} = W^{[1]} A^{[0]} + b^{[1]}$: (5, 1)

The shape (5, 3) of $W$ can be read as follows: the first number, 5, is the number of units in this layer, and the second, 3, is the number of units in the previous layer. In general, $W^{[l]}: (n^{[l]}, n^{[l-1]})$.

The shape (5, 1) of $b$ can be read the same way: the first number, 5, is the number of units in this layer, and the second is 1 because each unit has exactly one bias. In general, $b^{[l]}: (n^{[l]}, 1)$.

Forward: input $A^{[l-1]}$, output $A^{[l]}$

$A^{[1]} = g^{[1]}(Z^{[1]})$: (5, 1), where $g(\cdot)$ can be any activation function such as sigmoid or ReLU

$W^{[2]}$: (5, 5), $b^{[2]}$: (5, 1), $Z^{[2]} = W^{[2]} A^{[1]} + b^{[2]}$: (5, 1) (matrix multiplication)

$A^{[2]} = g^{[2]}(Z^{[2]})$: (5, 1)

$W^{[3]}$: (3, 5), $b^{[3]}$: (3, 1), $Z^{[3]} = W^{[3]} A^{[2]} + b^{[3]}$: (3, 1) (matrix multiplication)

$A^{[3]} = g^{[3]}(Z^{[3]})$: (3, 1)

$W^{[4]}$: (1, 3), $b^{[4]}$: (1, 1), $Z^{[4]} = W^{[4]} A^{[3]} + b^{[4]}$: (1, 1) (matrix multiplication)

Output layer: $\hat{y} = A^{[4]} = g^{[4]}(Z^{[4]})$: (1, 1)

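Use a for-loop to compute layers 1 through $L$. An explicit loop over the layers is fine here: vectorization removes the loop over examples, not the loop over layers.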
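A sketch of that loop for one example of the 3×5×5×3×1 network, assuming ReLU in the hidden layers and sigmoid at the output (the notes only say $g$ can be sigmoid, ReLU, etc.). The weights are random placeholders and `forward` is a hypothetical helper name.

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def forward(x, params, L):
    """Run Z[l] = W[l] A[l-1] + b[l], A[l] = g[l](Z[l]) for l = 1..L."""
    A = x  # A[0] = x
    shapes = []
    for l in range(1, L + 1):
        Z = params["W" + str(l)] @ A + params["b" + str(l)]
        A = sigmoid(Z) if l == L else relu(Z)  # sigmoid only at the output layer
        shapes.append(A.shape)
    return A, shapes

layer_dims = [3, 5, 5, 3, 1]
L = len(layer_dims) - 1
rng = np.random.default_rng(0)
params = {}
for l in range(1, L + 1):
    params["W" + str(l)] = rng.standard_normal((layer_dims[l], layer_dims[l - 1]))
    params["b" + str(l)] = np.zeros((layer_dims[l], 1))

x = rng.standard_normal((3, 1))  # a single example, shape (n[0], 1)
y_hat, shapes = forward(x, params, L)
print(shapes)  # [(5, 1), (5, 1), (3, 1), (1, 1)]
```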

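The above uses a single example. For $m$ examples, replace the 1 in the shapes of $Z^{[l]}$ and $A^{[l]}$ with $m$, giving $(n^{[l]}, m)$ (so $X = A^{[0]}$ is $(n^{[0]}, m)$). $W^{[l]}$ and $b^{[l]}$ keep their shapes, and $b^{[l]}$ is broadcast across the $m$ columns.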
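The same loop vectorized over $m = 10$ examples stacked as columns (a sketch; $m$, the random data and the tanh activation are arbitrary choices). Only $Z$ and $A$ change shape; $b^{[l]}$ stays $(n^{[l]}, 1)$ and NumPy broadcasting adds it to every column.

```python
import numpy as np

layer_dims = [3, 5, 5, 3, 1]
L = len(layer_dims) - 1
m = 10  # number of examples, stacked as columns
rng = np.random.default_rng(1)

params = {}
for l in range(1, L + 1):
    params["W" + str(l)] = rng.standard_normal((layer_dims[l], layer_dims[l - 1]))
    params["b" + str(l)] = rng.standard_normal((layer_dims[l], 1))  # still (n[l], 1)

X = rng.standard_normal((layer_dims[0], m))  # A[0] = X, shape (n[0], m)
A = X
Z_shapes = []
for l in range(1, L + 1):
    # (n[l], n[l-1]) @ (n[l-1], m) -> (n[l], m); b (n[l], 1) is broadcast over the m columns
    Z = params["W" + str(l)] @ A + params["b" + str(l)]
    A = np.tanh(Z)  # any element-wise activation keeps the shape
    Z_shapes.append(Z.shape)

print(Z_shapes)  # [(5, 10), (5, 10), (3, 10), (1, 10)]
```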

Getting your matrix dimensions right

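Well, I already summarized this lecture's notes in the previous subsection.

The gradients $dW, db, dZ, dA$ have the same shapes as $W, b, Z, A$ respectively.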
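One way to see why the gradients match these shapes is to push the shapes through the backward formulas from the "Forward and Backward Propagation" section. In this sketch `dZ` is a random placeholder for the upstream gradient, and the sizes are those of layer 3 in the 3×5×5×3×1 example with $m = 10$.

```python
import numpy as np

rng = np.random.default_rng(2)
n_prev, n_l, m = 5, 3, 10  # n[l-1], n[l], number of examples

W = rng.standard_normal((n_l, n_prev))
b = np.zeros((n_l, 1))
A_prev = rng.standard_normal((n_prev, m))
dZ = rng.standard_normal((n_l, m))  # placeholder upstream gradient, same shape as Z

dW = dZ @ A_prev.T / m                        # (n_l, m) @ (m, n_prev) -> (n_l, n_prev)
db = np.sum(dZ, axis=1, keepdims=True) / m    # sum over the m columns -> (n_l, 1)
dA_prev = W.T @ dZ                            # (n_prev, n_l) @ (n_l, m) -> (n_prev, m)

assert dW.shape == W.shape
assert db.shape == b.shape
assert dA_prev.shape == A_prev.shape
```

`keepdims=True` is what keeps `db` at $(n^{[l]}, 1)$ instead of collapsing it to a 1-D array.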

Why deep representations?

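Why are deep neural networks effective for so many classification problems?

Image recognition

As the network gets deeper, it starts by detecting edges in the image and then gradually composes them into more complex features used to recognize the image.

Audio recognition

The same holds here: the first layer might detect whether the pitch of a short audio snippet is rising or falling, and as layers are added these are composed into more complex sound patterns used to recognize the overall features of the audio.

The same reasoning applies to recognizing text.

Circuit theory and deep learning

Informally: there are functions that a small deep network can compute but that a shallow network needs exponentially more hidden units to compute.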

[Figure: circuit-theory example, deep network (left) vs. a single hidden layer (right)]

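Left of the figure above: when computing the XOR (parity) of $n$ inputs $x_1, \dots, x_n$ with several hidden layers arranged as a tree of pairwise XORs, the network depth is $O(\log n)$ and the total number of units stays small (about $n - 1$ XOR units).

Right of the figure above: if only one hidden layer is allowed, the number of hidden units needed is exponential, on the order of $2^n$.

This example illustrates one reason deep networks are valuable.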
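A small Python sketch of the circuit-theory argument, modeled as boolean circuits rather than trained networks (an illustrative assumption). `xor_tree` combines $n$ bits with a balanced tree of two-input XOR gates and reports the depth and gate count; `one_layer_units` counts the units a single hidden layer would need if it used one unit per odd-parity input pattern, which is $2^{n-1}$, i.e. on the order of $2^n$.

```python
from itertools import product

def xor_tree(bits):
    """XOR of the bits via a balanced tree of 2-input XOR gates.
    Returns (result, depth, gate_count)."""
    level = list(bits)
    depth = 0
    gates = 0
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level) - 1, 2):
            nxt.append(level[i] ^ level[i + 1])  # one XOR gate
            gates += 1
        if len(level) % 2 == 1:
            nxt.append(level[-1])  # an odd leftover passes through to the next level
        level = nxt
        depth += 1
    return level[0], depth, gates

def one_layer_units(n):
    """Units if one hidden layer enumerates every odd-parity input pattern."""
    return sum(1 for p in product([0, 1], repeat=n) if sum(p) % 2 == 1)

bits = [1, 0, 1, 1, 0, 0, 1, 0]
result, depth, gates = xor_tree(bits)
print(result, depth, gates)   # 0 3 7
print(one_layer_units(8))     # 128
```

For $n = 8$ the tree needs depth 3 $= \log_2 8$ and 7 gates, while the enumeration approach already needs 128 units, and the gap doubles with every extra input.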

Building blocks of deep neural networks

Forward: input $A^{[l-1]}$, output $A^{[l]}$

Backward: input $dA^{[l]}$, output $dA^{[l-1]}$

[Figure: forward and backward building blocks for layer l]

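As shown above, a single diagram summarizes the intermediate variables and outputs of forward and backward propagation. Note: $Z^{[l]}$ is an intermediate value in forward propagation and also an input to the gradient computation in backward propagation, so it must be cached.

I really should redraw this sketch as a proper figure when I have time, but Markdown flowcharts don't seem able to draw this kind of diagram, so it would have to be done in PowerPoint.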
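A minimal sketch of one forward block and one backward block for layer $l$, assuming ReLU or sigmoid activations. Besides $Z^{[l]}$, this sketch also caches $A^{[l-1]}$, $W^{[l]}$ and $b^{[l]}$, because the backward formulas for $dW^{[l]}$ and $dA^{[l-1]}$ need them; that is an implementation choice, and the names `layer_forward` and `layer_backward` are hypothetical.

```python
import numpy as np

def layer_forward(A_prev, W, b, activation):
    """Forward block: input A[l-1], output A[l] plus a cache for backprop."""
    Z = W @ A_prev + b
    if activation == "relu":
        A = np.maximum(0, Z)
    else:  # "sigmoid"
        A = 1 / (1 + np.exp(-Z))
    cache = (A_prev, W, b, Z)  # Z[l] (plus A[l-1], W[l]) is needed again in backward
    return A, cache

def layer_backward(dA, cache, activation):
    """Backward block: input dA[l] and the cache, output dA[l-1], dW[l], db[l]."""
    A_prev, W, b, Z = cache
    m = A_prev.shape[1]
    if activation == "relu":
        dZ = dA * (Z > 0)          # ReLU'(Z) = 1 where Z > 0, else 0
    else:
        s = 1 / (1 + np.exp(-Z))
        dZ = dA * s * (1 - s)      # sigmoid'(Z) = s(1 - s)
    dW = dZ @ A_prev.T / m
    db = np.sum(dZ, axis=1, keepdims=True) / m
    dA_prev = W.T @ dZ
    return dA_prev, dW, db

rng = np.random.default_rng(3)
A_prev = rng.standard_normal((5, 4))   # n[l-1] = 5 units, m = 4 examples
W = rng.standard_normal((3, 5))
b = rng.standard_normal((3, 1))
A, cache = layer_forward(A_prev, W, b, "relu")
dA_prev, dW, db = layer_backward(np.ones_like(A), cache, "relu")
print(A.shape, dA_prev.shape, dW.shape, db.shape)
```

Chaining `layer_forward` for $l = 1..L$ and `layer_backward` for $l = L..1$, keeping each layer's cache in a list, gives the full pipeline in the diagram.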

Forward and Backward Propagation


Forward propagation

See the subsection "Forward Propagation in a Deep Network" above.

Backward propagation

Input: $dA^{[l]}$

Output: $dA^{[l-1]}, dW^{[l]}, db^{[l]}$

Vectorized form:

$dZ^{[l]} = dA^{[l]} \odot g^{[l]\prime}(Z^{[l]})$ (element-wise product); the derivative $g'(\cdot)$ depends on which activation function is used.

$dW^{[l]} = \frac{1}{m} dZ^{[l]} A^{[l-1]T}$

$db^{[l]} = \frac{1}{m} \sum_{i=1}^{m} dZ^{[l](i)}$, i.e. `np.sum(dZ[l], axis=1, keepdims=True) / m` in NumPy

$dA^{[l-1]} = W^{[l]T} dZ^{[l]}$
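To check these four formulas, the sketch below compares them with centered finite differences for a single sigmoid layer. The cost is a made-up toy, $J = \frac{1}{m}\sum_i \sum_k G_{ki} A_{ki}$ with a fixed random matrix $G$, chosen only so that the per-example $dA^{[l]}$ is exactly $G$. In this convention $dA^{[l-1]}$ holds per-example gradients (the $\frac{1}{m}$ appears only in $dW$ and $db$), so it is compared with $m \cdot \partial J / \partial A^{[l-1]}$.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def layer_backward(dA, Z, A_prev, W):
    """Vectorized backward step for one sigmoid layer."""
    m = A_prev.shape[1]
    s = sigmoid(Z)
    dZ = dA * s * (1 - s)                        # dZ = dA * g'(Z), element-wise
    dW = dZ @ A_prev.T / m                       # dW = (1/m) dZ A[l-1]^T
    db = np.sum(dZ, axis=1, keepdims=True) / m   # db = (1/m) * row sums of dZ
    dA_prev = W.T @ dZ                           # dA[l-1] = W^T dZ
    return dA_prev, dW, db

rng = np.random.default_rng(4)
n_prev, n_l, m = 4, 3, 5
A_prev = rng.standard_normal((n_prev, m))
W = rng.standard_normal((n_l, n_prev))
b = rng.standard_normal((n_l, 1))
G = rng.standard_normal((n_l, m))  # fixed weights defining the toy per-example loss

def cost(W, b, A_prev):
    # J = (1/m) * sum_i loss_i, with toy loss_i = sum_k G[k, i] * A[k, i]
    A = sigmoid(W @ A_prev + b)
    return np.sum(G * A) / m

def numeric_grad(f, X, eps=1e-6):
    """Centered finite differences of scalar f() w.r.t. array X (perturbed in place)."""
    grad = np.zeros_like(X)
    for idx in np.ndindex(X.shape):
        old = X[idx]
        X[idx] = old + eps
        plus = f()
        X[idx] = old - eps
        minus = f()
        X[idx] = old  # restore the original value
        grad[idx] = (plus - minus) / (2 * eps)
    return grad

Z = W @ A_prev + b
dA_prev, dW, db = layer_backward(G, Z, A_prev, W)  # per-example dA = G for this toy loss

num_dW = numeric_grad(lambda: cost(W, b, A_prev), W)
num_db = numeric_grad(lambda: cost(W, b, A_prev), b)
num_dA_prev = numeric_grad(lambda: cost(W, b, A_prev), A_prev)

print(np.max(np.abs(dW - num_dW)), np.max(np.abs(db - num_db)))
print(np.max(np.abs(dA_prev - m * num_dA_prev)))  # dA_prev is per-example
```

All three differences should be tiny (finite-difference noise), which is the usual way to sanity-check a hand-written backward pass.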


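Forward propagation gives the value of the loss $\mathcal{L}(\hat{y}, y)$. For the binary cross-entropy loss $\mathcal{L}(\hat{y}, y) = -\left(y \log \hat{y} + (1 - y)\log(1 - \hat{y})\right)$ with $\hat{y} = A^{[L]}$, differentiating gives $dA^{[L]} = -\frac{y}{A^{[L]}} + \frac{1 - y}{1 - A^{[L]}}$ (applied element-wise when there are $m$ examples).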
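A quick numerical check of the $dA^{[L]}$ formula, assuming the binary cross-entropy loss $\mathcal{L}(\hat{y}, y) = -\left(y \log \hat{y} + (1-y)\log(1-\hat{y})\right)$; the outputs and labels are made-up values. The last lines also show the standard simplification for a sigmoid output layer: $dZ^{[L]} = dA^{[L]} \odot \sigma'(Z^{[L]}) = A^{[L]} - Y$.

```python
import numpy as np

def bce_loss(A, Y):
    """Per-example binary cross-entropy L(y_hat, y), element-wise."""
    return -(Y * np.log(A) + (1 - Y) * np.log(1 - A))

def dAL(A, Y):
    """dA[L] = -y / A[L] + (1 - y) / (1 - A[L]), element-wise."""
    return -Y / A + (1 - Y) / (1 - A)

A = np.array([[0.9, 0.2, 0.6]])   # example outputs A[L], shape (1, m)
Y = np.array([[1.0, 0.0, 1.0]])   # labels

eps = 1e-6
numeric = (bce_loss(A + eps, Y) - bce_loss(A - eps, Y)) / (2 * eps)
print(dAL(A, Y))
print(np.max(np.abs(dAL(A, Y) - numeric)))

# With a sigmoid output, sigma'(Z) = A(1 - A), so dZ[L] = dA[L] * A(1 - A) = A[L] - Y
dZL = dAL(A, Y) * A * (1 - A)
print(np.allclose(dZL, A - Y))  # True
```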

Parameters vs Hyperparameters

Parameters: $W^{[1]}, b^{[1]}, W^{[2]}, b^{[2]}, \dots, W^{[L]}, b^{[L]}$
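As a quick count, each layer contributes $n^{[l]} n^{[l-1]}$ weights and $n^{[l]}$ biases. For the 3×5×5×3×1 network used earlier, this small helper (hypothetical name `count_parameters`) gives 20 + 30 + 18 + 4 = 72 learnable parameters.

```python
def count_parameters(layer_dims):
    """Number of entries in W[1..L] and b[1..L] for layer sizes [n0, ..., nL]."""
    total = 0
    for l in range(1, len(layer_dims)):
        total += layer_dims[l] * layer_dims[l - 1]  # W[l]: (n[l], n[l-1])
        total += layer_dims[l]                      # b[l]: (n[l], 1)
    return total

print(count_parameters([3, 5, 5, 3, 1]))  # 72
```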

Hyperparameters:

- Learning rate $\alpha$

- Number of iterations

- Number of hidden layers (which determines $L$)

- Number of hidden units $n^{[1]}, n^{[2]}, \dots, n^{[L]}$

- Choice of activation: ReLU, tanh, sigmoid, ...

- Momentum

- Mini-batch size

- Regularization


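Hyperparameters are values that cannot be learned during training. They are set by hand, and they determine the values the parameters end up with and how well the model performs. Deep learning is a very empirical process: you keep running experiments (e.g. evaluating on held-out data with cross-validation) and compare results to find suitable hyperparameters. As the list above shows, there are many hyperparameters to tune; later courses cover how to explore the hyperparameter space.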
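A toy sketch of that experimental loop for a single hyperparameter, the learning rate $\alpha$: train logistic regression on synthetic data (an assumption; the rule "label 1 if $x_1 + x_2 > 0$" is made up) with several candidate values, and keep the one with the lowest loss on a held-out validation split. This is simple hold-out validation rather than k-fold cross-validation.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def train_logreg(X, Y, alpha, iterations):
    """Batch gradient descent for logistic regression (a single sigmoid unit)."""
    n, m = X.shape
    w = np.zeros((1, n))
    b = 0.0
    for _ in range(iterations):
        A = sigmoid(w @ X + b)
        dZ = A - Y                       # cross-entropy gradient w.r.t. Z
        w -= alpha * (dZ @ X.T) / m
        b -= alpha * np.sum(dZ) / m
    return w, b

def val_loss(w, b, X, Y):
    """Mean binary cross-entropy on held-out data."""
    A = np.clip(sigmoid(w @ X + b), 1e-12, 1 - 1e-12)  # avoid log(0)
    return float(-np.mean(Y * np.log(A) + (1 - Y) * np.log(1 - A)))

rng = np.random.default_rng(5)
X = rng.standard_normal((2, 400))
Y = (X[0] + X[1] > 0).astype(float).reshape(1, -1)
X_train, Y_train = X[:, :300], Y[:, :300]
X_val, Y_val = X[:, 300:], Y[:, 300:]

results = {}
for alpha in [0.001, 0.01, 0.1, 1.0]:    # candidate hyperparameter values
    w, b = train_logreg(X_train, Y_train, alpha, iterations=200)
    results[alpha] = val_loss(w, b, X_val, Y_val)  # judged on held-out data only

best_alpha = min(results, key=results.get)
print(results, best_alpha)
```

The parameters $w, b$ are learned inside `train_logreg`; $\alpha$ and the iteration count are chosen outside it by comparing experiments, which is exactly the parameter/hyperparameter split described above.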

What does this have to do with the brain?

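On the analogy between deep learning and the brain: Andrew Ng believes the field has advanced to the point where the analogy is breaking down, and he prefers not to use it anymore.

[Note: many articles now point out that the brain's processing is far more complex than this simple activation-output scheme, and that deep learning has little in common with how the brain learns.]

Practice Questions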

[Figure: practice-question network diagram]

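$L$ is defined as the number of hidden layers + 1; the input and output layers are not hidden layers. In this question there are 3 hidden layers, so $L = 4$.

The first layer is $l = 0$ (input), the second $l = 1$, the third $l = 2$, the fourth $l = 3$, and the fifth $l = 4 = L$ (output).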
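The counting convention as code, using the 3×5×5×3×1 network from earlier as an assumed example (the sizes in the question's figure are not reproduced here); the helper name `count_layers` is hypothetical.

```python
def count_layers(layer_dims):
    """layer_dims = [n0, n1, ..., nL]; list index l is layer l."""
    L = len(layer_dims) - 1   # the input layer (l = 0) is not counted
    hidden = L - 1            # the output layer (l = L) is not a hidden layer
    return L, hidden

layer_dims = [3, 5, 5, 3, 1]
for l, n in enumerate(layer_dims):
    print(f"layer l={l}: {n} units")
print(count_layers(layer_dims))  # (4, 3)
```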

