
Implementing a Convolutional Neural Network in Python

The implementation below follows the programming assignment from Andrew Ng's deep learning course.

1. Padding

import numpy as np


def zero_pad(X, pad):
    """
    Pad with zeros all images of the dataset X. The padding is applied to the height and width of an image,
    as illustrated in Figure 1.

    Argument:
    X -- python numpy array of shape (m, n_H, n_W, n_C) representing a batch of m images
    pad -- integer, amount of padding around each image on vertical and horizontal dimensions

    Returns:
    X_pad -- padded image of shape (m, n_H + 2*pad, n_W + 2*pad, n_C)
    """
    X_pad = np.pad(X, ((0, 0), (pad, pad), (pad, pad), (0, 0)), 'constant', constant_values=(0, 0))
    return X_pad

From the zero_pad function we can see that padding is applied only to the height and width of each image; m (the number of images) and n_C (the number of channels) are left untouched.
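As a quick sanity check, here is a minimal usage sketch (the shapes are arbitrary example values, and numpy is assumed to be imported as np):

import numpy as np

X = np.random.randn(4, 3, 3, 2)   # batch of 4 images, height 3, width 3, 2 channels
X_pad = zero_pad(X, 2)

print(X.shape)       # (4, 3, 3, 2)
print(X_pad.shape)   # (4, 7, 7, 2) -- height and width grew by 2*pad, m and n_C unchanged
print(X_pad[0, 0:2, 0:2, 0])   # the padded border is all zeros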

2. The convolution step

def conv_single_step(a_slice_prev, W, b):
    # Element-wise product between the input window and the filter weights
    s = a_slice_prev * W
    # Sum over all entries of the volume s
    Z = np.sum(s)
    # Add the bias b (cast to a float) to get a scalar output
    Z = Z + float(b)
    return Z

In this single convolution step, a_slice_prev is the current window of the image and W holds the filter's parameters. We take their element-wise product, sum the result, and finally add the bias b.
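For intuition, a small hedged example applying conv_single_step to one 3x3x2 window (values and shapes are arbitrary):

import numpy as np

a_slice_prev = np.random.randn(3, 3, 2)   # one window cut out of the (padded) input
W = np.random.randn(3, 3, 2)              # one filter
b = np.random.randn(1, 1, 1)              # the bias for this filter

Z = conv_single_step(a_slice_prev, W, b)
print(Z)   # a single float: np.sum(a_slice_prev * W) + b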

3. Convolution forward

def conv_forward(A_prev, W, b, hparameters):
    """
    Implements the forward propagation for a convolution function

    Arguments:
    A_prev -- output activations of the previous layer, numpy array of shape (m, n_H_prev, n_W_prev, n_C_prev)
    W -- Weights, numpy array of shape (f, f, n_C_prev, n_C)
    b -- Biases, numpy array of shape (1, 1, 1, n_C)
    hparameters -- python dictionary containing "stride" and "pad"

    Returns:
    Z -- conv output, numpy array of shape (m, n_H, n_W, n_C)
    cache -- cache of values needed for the conv_backward() function
    """

    # Retrieve dimensions from A_prev's shape
    (m, n_H_prev, n_W_prev, n_C_prev) = A_prev.shape

    # Retrieve dimensions from W's shape
    (f, f, n_C_prev, n_C) = W.shape

    # Retrieve information from "hparameters"
    stride = hparameters['stride']
    pad = hparameters['pad']

    # Compute the dimensions of the CONV output volume; use int() to floor
    n_H = int((n_H_prev + 2 * pad - f) / stride + 1)
    n_W = int((n_W_prev + 2 * pad - f) / stride + 1)

    # Initialize the output volume Z with zeros
    Z = np.zeros((m, n_H, n_W, n_C))

    # Create A_prev_pad by padding A_prev
    A_prev_pad = zero_pad(A_prev, pad)

    for i in range(m):                    # loop over the batch of training examples
        a_prev_pad = A_prev_pad[i]        # select ith training example's padded activation
        for h in range(n_H):              # loop over vertical axis of the output volume
            for w in range(n_W):          # loop over horizontal axis of the output volume
                for c in range(n_C):      # loop over channels (= #filters) of the output volume

                    # Find the corners of the current "slice"
                    vert_start = h * stride
                    vert_end = h * stride + f
                    horiz_start = w * stride
                    horiz_end = w * stride + f

                    # Use the corners to define the (3D) slice of a_prev_pad
                    a_slice_prev = a_prev_pad[vert_start:vert_end, horiz_start:horiz_end, :]

                    # Convolve the (3D) slice with the correct filter W and bias b, to get back one output neuron
                    Z[i, h, w, c] = conv_single_step(a_slice_prev, W[:, :, :, c], b[:, :, :, c])

    # Making sure your output shape is correct
    assert(Z.shape == (m, n_H, n_W, n_C))

    # Save information in "cache" for the backprop
    cache = (A_prev, W, b, hparameters)

    return Z, cache

The arguments are the input images A_prev, the weights W, the biases b, and the hyperparameters pad and stride. We first unpack all shape information with tuple assignments and initialize the output Z using the computed output shape, n_H = int((n_H_prev + 2*pad - f)/stride + 1) and likewise for n_W. We then iterate over every window of every image: the stride and the filter size f give each window's vertical and horizontal coordinates, and conv_single_step produces the corresponding output value. Note that the same filter parameters are shared by every window of the image.
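A minimal usage sketch of conv_forward (assuming the functions above and numpy imported as np; all shapes and hyperparameters are arbitrary example values):

import numpy as np

np.random.seed(1)
A_prev = np.random.randn(10, 4, 4, 3)     # 10 images of size 4x4 with 3 channels
W = np.random.randn(2, 2, 3, 8)           # 8 filters of size 2x2x3
b = np.random.randn(1, 1, 1, 8)
hparameters = {"pad": 2, "stride": 2}

Z, cache_conv = conv_forward(A_prev, W, b, hparameters)
# n_H = n_W = int((4 + 2*2 - 2) / 2 + 1) = 4
print(Z.shape)   # (10, 4, 4, 8)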

4. Pooling layer

def pool_forward(A_prev, hparameters, mode="max"):
    """
    Implements the forward pass of the pooling layer

    Arguments:
    A_prev -- Input data, numpy array of shape (m, n_H_prev, n_W_prev, n_C_prev)
    hparameters -- python dictionary containing "f" and "stride"
    mode -- the pooling mode you would like to use, defined as a string ("max" or "average")

    Returns:
    A -- output of the pool layer, a numpy array of shape (m, n_H, n_W, n_C)
    cache -- cache used in the backward pass of the pooling layer, contains the input and hparameters
    """

    # Retrieve dimensions from the input shape
    (m, n_H_prev, n_W_prev, n_C_prev) = A_prev.shape

    # Retrieve hyperparameters from "hparameters"
    f = hparameters["f"]
    stride = hparameters["stride"]

    # Define the dimensions of the output
    n_H = int(1 + (n_H_prev - f) / stride)
    n_W = int(1 + (n_W_prev - f) / stride)
    n_C = n_C_prev

    # Initialize output matrix A
    A = np.zeros((m, n_H, n_W, n_C))

    for i in range(m):                    # loop over the training examples
        for h in range(n_H):              # loop on the vertical axis of the output volume
            for w in range(n_W):          # loop on the horizontal axis of the output volume
                for c in range(n_C):      # loop over the channels of the output volume

                    # Find the corners of the current "slice"
                    vert_start = h * stride
                    vert_end = vert_start + f
                    horiz_start = w * stride
                    horiz_end = horiz_start + f

                    # Use the corners to define the current slice on the ith training example of A_prev, channel c
                    a_prev_slice = A_prev[i, vert_start:vert_end, horiz_start:horiz_end, c]

                    # Compute the pooling operation on the slice, using np.max or np.mean depending on the mode
                    if mode == "max":
                        A[i, h, w, c] = np.max(a_prev_slice)
                    elif mode == "average":
                        A[i, h, w, c] = np.mean(a_prev_slice)

    # Store the input and hparameters in "cache" for pool_backward()
    cache = (A_prev, hparameters)

    # Making sure your output shape is correct
    assert(A.shape == (m, n_H, n_W, n_C))

    return A, cache

The pooling forward pass closely mirrors the convolution layer above; the main difference is the mode argument, which selects between max and average pooling.
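A short usage sketch of pool_forward in both modes (shapes and hyperparameters are arbitrary example values):

import numpy as np

np.random.seed(1)
A_prev = np.random.randn(2, 4, 4, 3)
hparameters = {"stride": 2, "f": 2}

A_max, cache = pool_forward(A_prev, hparameters, mode="max")
A_avg, _ = pool_forward(A_prev, hparameters, mode="average")
# n_H = n_W = int(1 + (4 - 2) / 2) = 2, channels unchanged
print(A_max.shape)   # (2, 2, 2, 3)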

5. Convolution backward

def conv_backward(dZ, cache):
    """
    Implement the backward propagation for a convolution function

    Arguments:
    dZ -- gradient of the cost with respect to the output of the conv layer (Z), numpy array of shape (m, n_H, n_W, n_C)
    cache -- cache of values needed for the conv_backward(), output of conv_forward()

    Returns:
    dA_prev -- gradient of the cost with respect to the input of the conv layer (A_prev),
               numpy array of shape (m, n_H_prev, n_W_prev, n_C_prev)
    dW -- gradient of the cost with respect to the weights of the conv layer (W),
          numpy array of shape (f, f, n_C_prev, n_C)
    db -- gradient of the cost with respect to the biases of the conv layer (b),
          numpy array of shape (1, 1, 1, n_C)
    """

    # Retrieve information from "cache"
    (A_prev, W, b, hparameters) = cache

    # Retrieve dimensions from A_prev's shape
    (m, n_H_prev, n_W_prev, n_C_prev) = A_prev.shape

    # Retrieve dimensions from W's shape
    (f, f, n_C_prev, n_C) = W.shape

    # Retrieve information from "hparameters"
    stride = hparameters['stride']
    pad = hparameters['pad']

    # Retrieve dimensions from dZ's shape
    (m, n_H, n_W, n_C) = dZ.shape

    # Initialize dA_prev, dW, db with the correct shapes
    dA_prev = np.zeros(A_prev.shape)
    dW = np.zeros(W.shape)
    db = np.zeros(b.shape)

    # Pad A_prev and dA_prev
    A_prev_pad = zero_pad(A_prev, pad)
    dA_prev_pad = zero_pad(dA_prev, pad)

    for i in range(m):                    # loop over the training examples

        # select ith training example from A_prev_pad and dA_prev_pad
        a_prev_pad = A_prev_pad[i]
        da_prev_pad = dA_prev_pad[i]

        for h in range(n_H):              # loop over vertical axis of the output volume
            for w in range(n_W):          # loop over horizontal axis of the output volume
                for c in range(n_C):      # loop over the channels of the output volume

                    # Find the corners of the current "slice"
                    vert_start = h * stride
                    vert_end = h * stride + f
                    horiz_start = w * stride
                    horiz_end = w * stride + f

                    # Use the corners to define the slice from a_prev_pad
                    a_slice = a_prev_pad[vert_start:vert_end, horiz_start:horiz_end, :]

                    # Update gradients for the window and the filter's parameters
                    da_prev_pad[vert_start:vert_end, horiz_start:horiz_end, :] += W[:, :, :, c] * dZ[i, h, w, c]
                    dW[:, :, :, c] += a_slice * dZ[i, h, w, c]
                    db[:, :, :, c] += dZ[i, h, w, c]

        # Set the ith training example's dA_prev to the unpadded da_prev_pad
        dA_prev[i, :, :, :] = da_prev_pad[pad:-pad, pad:-pad, :]

    # Making sure your output shape is correct
    assert(dA_prev.shape == (m, n_H_prev, n_W_prev, n_C_prev))

    return dA_prev, dW, db

The computation of dW and db here is similar to backpropagation in a fully connected network: to accumulate the gradients we traverse every window position of every image and add that window's contribution.
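Continuing the hedged conv_forward sketch above (Z and cache_conv are assumed to come from that call), the backward pass returns gradients with the same shapes as the corresponding forward inputs:

# In a real network dZ would come from the layer above; here we reuse Z only to exercise the shapes
dA_prev, dW, db = conv_backward(Z, cache_conv)
print(dA_prev.shape)   # (10, 4, 4, 3) -- same as A_prev
print(dW.shape)        # (2, 2, 3, 8)  -- same as W
print(db.shape)        # (1, 1, 1, 8)  -- same as b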

6. Pooling backward

Having understood how pooling works, we can build its backward pass from the same idea. For max pooling we need a mask that marks which entry of each window produced the output (and therefore receives the gradient).

def create_mask_from_window(x):
    """
    Creates a mask from an input matrix x, to identify the max entry of x.

    Arguments:
    x -- Array of shape (f, f)

    Returns:
    mask -- Array of the same shape as window, contains a True at the position corresponding to the max entry of x.
    """

    mask = (x == np.max(x))

    return mask
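A small illustrative example (the values are arbitrary):

import numpy as np

x = np.array([[1.0, 3.0],
              [2.0, -1.0]])
mask = create_mask_from_window(x)
print(mask)
# [[False  True]
#  [False False]] -- only the position of the maximum will receive the gradient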

For average pooling, the incoming gradient has to be distributed evenly over every entry of the window.

def distribute_value(dz, shape):
    """
    Distributes the input value in the matrix of dimension shape

    Arguments:
    dz -- input scalar
    shape -- the shape (n_H, n_W) of the output matrix for which we want to distribute the value of dz

    Returns:
    a -- Array of size (n_H, n_W) for which we distributed the value of dz
    """

    # Retrieve dimensions from shape
    (n_H, n_W) = shape

    # Compute the number of entries the value is distributed over
    average = n_H * n_W

    # Create a matrix where every entry is the distributed value dz / average
    a = dz / average * np.ones((n_H, n_W))

    return a
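For example, distributing a gradient of 2 over a 2x2 window gives 0.5 in each cell:

a = distribute_value(2.0, (2, 2))
print(a)
# [[0.5 0.5]
#  [0.5 0.5]]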

After that we traverse the image in the same way as in the convolution backward pass, accumulating the contribution of each entry of dA into the corresponding window of dA_prev.

def pool_backward(dA, cache, mode="max"):
    """
    Implements the backward pass of the pooling layer

    Arguments:
    dA -- gradient of cost with respect to the output of the pooling layer, same shape as A
    cache -- cache output from the forward pass of the pooling layer, contains the layer's input and hparameters
    mode -- the pooling mode you would like to use, defined as a string ("max" or "average")

    Returns:
    dA_prev -- gradient of cost with respect to the input of the pooling layer, same shape as A_prev
    """

    # Retrieve information from cache
    (A_prev, hparameters) = cache

    # Retrieve hyperparameters from "hparameters"
    stride = hparameters['stride']
    f = hparameters['f']

    # Retrieve dimensions from A_prev's shape and dA's shape
    m, n_H_prev, n_W_prev, n_C_prev = A_prev.shape
    m, n_H, n_W, n_C = dA.shape

    # Initialize dA_prev with zeros
    dA_prev = np.zeros(A_prev.shape)

    for i in range(m):                    # loop over the training examples

        # select training example from A_prev
        a_prev = A_prev[i]

        for h in range(n_H):              # loop on the vertical axis
            for w in range(n_W):          # loop on the horizontal axis
                for c in range(n_C):      # loop over the channels (depth)

                    # Find the corners of the current "slice"
                    vert_start = h * stride
                    vert_end = vert_start + f
                    horiz_start = w * stride
                    horiz_end = horiz_start + f

                    # Compute the backward propagation in both modes
                    if mode == "max":
                        # Use the corners and "c" to define the current slice from a_prev
                        a_prev_slice = a_prev[vert_start:vert_end, horiz_start:horiz_end, c]
                        # Create the mask from a_prev_slice
                        mask = create_mask_from_window(a_prev_slice)
                        # Add the mask multiplied by the correct entry of dA
                        dA_prev[i, vert_start:vert_end, horiz_start:horiz_end, c] += mask * dA[i, h, w, c]
                    elif mode == "average":
                        # Get the gradient value from dA
                        da = dA[i, h, w, c]
                        # Define the shape of the filter as fxf
                        shape = (f, f)
                        # Distribute it to get the correct slice of dA_prev, i.e. add the distributed value of da
                        dA_prev[i, vert_start:vert_end, horiz_start:horiz_end, c] += distribute_value(da, shape)

    # Making sure your output shape is correct
    assert(dA_prev.shape == A_prev.shape)

    return dA_prev
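Finally, a minimal end-to-end sketch of the pooling backward pass (assuming the functions above; dA is random here just to exercise the code):

import numpy as np

np.random.seed(1)
A_prev = np.random.randn(5, 5, 3, 2)
hparameters = {"stride": 1, "f": 2}

A, cache = pool_forward(A_prev, hparameters)
dA = np.random.randn(*A.shape)

dA_prev_max = pool_backward(dA, cache, mode="max")
dA_prev_avg = pool_backward(dA, cache, mode="average")
print(dA_prev_max.shape)   # (5, 5, 3, 2) -- same as A_prev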