[Implementing a Convolutional Neural Network in Python] Implementing the Conv2D Layer (with stride and padding)

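There is no need to go over how the convolution operation itself works. Instead, let's walk through the code step by step to see how a convolutional layer is implemented.

Code source: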

https://github.com/eriklindernoren/ML-From-Scratch

Let's first look at the basic building-block functions, starting with determine_padding(filter_shape, output_shape="same"):

import math

def determine_padding(filter_shape, output_shape="same"):
    # No padding
    if output_shape == "valid":
        return (0, 0), (0, 0)
    # Pad so that the output shape is the same as input shape (given that stride=1)
    elif output_shape == "same":
        filter_height, filter_width = filter_shape

        # Derived from:
        # output_height = (height + pad_h - filter_height) / stride + 1
        # In this case output_height = height and stride = 1. This gives the
        # expression for the padding below.
        pad_h1 = int(math.floor((filter_height - 1)/2))
        pad_h2 = int(math.ceil((filter_height - 1)/2))
        pad_w1 = int(math.floor((filter_width - 1)/2))
        pad_w2 = int(math.ceil((filter_width - 1)/2))

        return (pad_h1, pad_h2), (pad_w1, pad_w2)

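Explanation: given the kernel shape and the padding mode, this function computes the padding amounts for the top, bottom, left, and right sides, returned as ((top, bottom), (left, right)). output_shape="valid" means no padding.

Additional notes:

math.floor(x) returns the largest integer less than or equal to x.

math.ceil(x) returns the smallest integer greater than or equal to x.

Plugging in actual arguments to see the output: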

pad_h,pad_w=determine_padding((3,3), output_shape="same")

Output: pad_h = (1, 1), pad_w = (1, 1)
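As a quick check of the "same" branch, the sketch below tries a few kernel sizes, including even ones where floor and ceil give different values. determine_padding is re-declared in condensed form so the snippet runs on its own (anything other than "valid" is treated as "same" here, which is a simplification), and padded_output_size is a hypothetical helper that just applies the formula from the code comment.

```python
import math

def determine_padding(filter_shape, output_shape="same"):
    # Condensed re-declaration of the function above (assumption: any mode
    # other than "valid" is treated as "same")
    if output_shape == "valid":
        return (0, 0), (0, 0)
    filter_height, filter_width = filter_shape
    pad_h = (math.floor((filter_height - 1) / 2), math.ceil((filter_height - 1) / 2))
    pad_w = (math.floor((filter_width - 1) / 2), math.ceil((filter_width - 1) / 2))
    return pad_h, pad_w

def padded_output_size(size, filter_size, pad, stride=1):
    # Hypothetical helper: output size along one axis for a (before, after) pad pair
    return (size + sum(pad) - filter_size) // stride + 1

for f in (1, 2, 3, 4, 5):
    pad_h, pad_w = determine_padding((f, f), "same")
    print(f, pad_h, padded_output_size(32, f, pad_h))
# 1 (0, 0) 32
# 2 (0, 1) 32
# 3 (1, 1) 32
# 4 (1, 2) 32
# 5 (2, 2) 32
```

For even kernel sizes the padding is asymmetric (one more row or column after than before), but the total still restores the input size when stride is 1.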

Next is the image_to_column(images, filter_shape, stride, output_shape='same') function:

import numpy as np

def image_to_column(images, filter_shape, stride, output_shape='same'):
    filter_height, filter_width = filter_shape
    pad_h, pad_w = determine_padding(filter_shape, output_shape)
    # Add padding to the image (height and width axes only)
    images_padded = np.pad(images, ((0, 0), (0, 0), pad_h, pad_w), mode='constant')
    # Calculate the indices where the dot products are to be applied between weights
    # and the image
    k, i, j = get_im2col_indices(images.shape, filter_shape, (pad_h, pad_w), stride)

    # Get content from image at those indices
    cols = images_padded[:, k, i, j]
    channels = images.shape[1]
    # Reshape content into column shape
    cols = cols.transpose(1, 2, 0).reshape(filter_height * filter_width * channels, -1)
    return cols

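Explanation: the input images has shape [batchsize, channel, height, width], the same layout PyTorch uses for image input. In other words, images_padded is padded only along height and width. The function calls get_im2col_indices(), so let's see what that looks like next: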

def get_im2col_indices(images_shape, filter_shape, padding, stride=1):
    # First figure out what the size of the output should be
    batch_size, channels, height, width = images_shape
    filter_height, filter_width = filter_shape
    pad_h, pad_w = padding
    out_height = int((height + np.sum(pad_h) - filter_height) / stride + 1)
    out_width = int((width + np.sum(pad_w) - filter_width) / stride + 1)

    i0 = np.repeat(np.arange(filter_height), filter_width)
    i0 = np.tile(i0, channels)
    i1 = stride * np.repeat(np.arange(out_height), out_width)
    j0 = np.tile(np.arange(filter_width), filter_height * channels)
    j1 = stride * np.tile(np.arange(out_width), out_height)
    i = i0.reshape(-1, 1) + i1.reshape(1, -1)
    j = j0.reshape(-1, 1) + j1.reshape(1, -1)
    k = np.repeat(np.arange(channels), filter_height * filter_width).reshape(-1, 1)

    return (k, i, j)

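Explanation: this is hard to follow on its own, so let's step through it with actual arguments.

get_im2col_indices((1,3,32,32), (3,3), ((1,1),(1,1)), stride=1)

Explanation: let's look at how each variable evolves. out_height and out_width need little comment: they are the height and width of the output feature map after convolution (both 32 here).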

i0 (first step): np.repeat(np.arange(3),3): [0,0,0,1,1,1,2,2,2]

i0 (second step): np.tile([0,0,0,1,1,1,2,2,2],3): [0,0,0,1,1,1,2,2,2,0,0,0,1,1,1,2,2,2,0,0,0,1,1,1,2,2,2], shape (27,)

i1: 1*np.repeat(np.arange(32),32): [0,0,0,......,31,31,31], shape (1024,)

j0: np.tile(np.arange(3),3*3): [0,1,2,0,1,2,......], shape (27,)

j1: 1*np.tile(np.arange(32),32): [0,1,2,3,......,0,1,2,......,29,30,31], shape (1024,)

i: i0.reshape(-1,1)+i1.reshape(1,-1): shape (27,1024)

j: j0.reshape(-1,1)+j1.reshape(1,-1): shape (27,1024)

k: np.repeat(np.arange(3),3*3).reshape(-1,1): shape (27,1)
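With 27×1024 entries the index arrays are too large to read, so here is a small sketch that runs the same function on a toy input: one 3×3 single-channel image, a 2×2 kernel, no padding. The toy sizes are my own choice for illustration; the function body is the one shown above, repeated so the snippet is self-contained.

```python
import numpy as np

def get_im2col_indices(images_shape, filter_shape, padding, stride=1):
    # Same logic as the function above, repeated so this sketch runs on its own
    batch_size, channels, height, width = images_shape
    filter_height, filter_width = filter_shape
    pad_h, pad_w = padding
    out_height = int((height + np.sum(pad_h) - filter_height) / stride + 1)
    out_width = int((width + np.sum(pad_w) - filter_width) / stride + 1)
    i0 = np.tile(np.repeat(np.arange(filter_height), filter_width), channels)
    i1 = stride * np.repeat(np.arange(out_height), out_width)
    j0 = np.tile(np.arange(filter_width), filter_height * channels)
    j1 = stride * np.tile(np.arange(out_width), out_height)
    i = i0.reshape(-1, 1) + i1.reshape(1, -1)
    j = j0.reshape(-1, 1) + j1.reshape(1, -1)
    k = np.repeat(np.arange(channels), filter_height * filter_width).reshape(-1, 1)
    return k, i, j

# Toy image with pixel values 0..8 laid out row by row
image = np.arange(9).reshape(1, 1, 3, 3)
k, i, j = get_im2col_indices(image.shape, (2, 2), ((0, 0), (0, 0)), stride=1)

print(i)  # row index of every (kernel offset, output position) pair
# [[0 0 1 1]
#  [0 0 1 1]
#  [1 1 2 2]
#  [1 1 2 2]]
print(j)  # column index of every (kernel offset, output position) pair
# [[0 1 0 1]
#  [1 2 1 2]
#  [0 1 0 1]
#  [1 2 1 2]]

cols = image[:, k, i, j]  # shape (1, 4, 4)
print(cols[0])
# [[0 1 3 4]
#  [1 2 4 5]
#  [3 4 6 7]
#  [4 5 7 8]]
```

Each column of cols[0] is one flattened 2×2 patch: the first column [0, 1, 3, 4] is the top-left window of the image, the second [1, 2, 4, 5] is the window shifted one pixel to the right, and so on.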

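numpy.pad(array, pad_width, mode, **kwargs): array is the data to pad; pad_width gives the amount of padding before and after each axis; mode selects the padding method and defaults to 'constant'. With 'constant', the fill value is set by constant_values, which defaults to 0.

numpy.arange(start, stop, step, dtype=None): for example, numpy.arange(3) gives [0,1,2]

numpy.repeat(array, repeats, axis=None): for example, numpy.repeat([0,1,2],3) gives [0,0,0,1,1,1,2,2,2]

numpy.tile(array, reps): for example, numpy.tile([0,1,2],3) gives [0,1,2,0,1,2,0,1,2]

For more advanced usage you will need to consult the relevant documentation. Only what this code relies on is listed here.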
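To make these helpers concrete, here is a small sketch that pads a 4-D array the way image_to_column does (only the last two axes), and contrasts repeat with tile. The array sizes are arbitrary values picked for illustration.

```python
import numpy as np

# A batch of one single-channel 2x2 image filled with ones
x = np.ones((1, 1, 2, 2), dtype=int)

# Pad only height and width by one pixel on each side; batch and channel
# axes get (0, 0), i.e. no padding
padded = np.pad(x, ((0, 0), (0, 0), (1, 1), (1, 1)), mode="constant")
print(padded.shape)  # (1, 1, 4, 4)
print(padded[0, 0])
# [[0 0 0 0]
#  [0 1 1 0]
#  [0 1 1 0]
#  [0 0 0 0]]

# repeat duplicates each element in place, tile duplicates the whole sequence
repeated = np.repeat(np.arange(3), 2)
tiled = np.tile(np.arange(3), 2)
print(repeated)  # [0 0 1 1 2 2]
print(tiled)     # [0 1 2 0 1 2]
```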

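Even with these shapes in hand, it is still hard to follow, so let's continue. The key point is that k indexes the channel, i indexes the height (row), and j indexes the width (column). A 3×3 kernel covers 3×3=9 pixels in one channel at each position; with 3 channels, each kernel position touches 9×3=27 pixels. The image is 32×32, that is 1024 pixels, and with 'same' padding and stride 1 there are also 1024 kernel positions. Now look back at these lines of code: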

cols = images_padded[:, k, i, j]
channels = images.shape[1]
# Reshape content into column shape
cols = cols.transpose(1, 2, 0).reshape(filter_height * filter_width * channels, -1)

images_padded has shape (1,3,34,34), so cols = images_padded[:, k, i, j] has shape (1,27,1024).

channels is 3.

Finally, cols = cols.transpose(1,2,0).reshape(3*3*3,-1) has shape (27,1024).

When the batch size is not 1, say 64, the final cols has shape (27,1024×64)=(27,65536).
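It is worth seeing how the batch dimension ends up inside those 65536 columns. The sketch below uses a stand-in array of the right shape (filled with arange values rather than real image data, which is an assumption made purely for illustration) and checks where a given image and kernel position land after the transpose and reshape.

```python
import numpy as np

batch, rows, positions = 64, 27, 1024

# Stand-in for images_padded[:, k, i, j]: shape (batch, 27, 1024)
cols = np.arange(batch * rows * positions).reshape(batch, rows, positions)

# Same transpose and reshape as in image_to_column
flat = cols.transpose(1, 2, 0).reshape(rows, -1)
print(flat.shape)  # (27, 65536)

# After transpose(1, 2, 0) the batch axis varies fastest, so the column for
# kernel position p of image b sits at index p * batch + b
p, b = 5, 3
same = np.array_equal(flat[:, p * batch + b], cols[b, :, p])
print(same)  # True
```

So the columns are grouped by kernel position first, with the 64 images interleaved inside each group. This matters later when the output is reshaped with the batch size as the last axis.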

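Last comes the implementation of the convolutional layer itself:

First there is a generic Layer base class. Different layers, such as convolutional, pooling, and batch normalization layers, are implemented by inheriting from it: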

class Layer(object):

    def set_input_shape(self, shape):
        """ Sets the shape that the layer expects of the input in the forward
        pass method """
        self.input_shape = shape

    def layer_name(self):
        """ The name of the layer. Used in model summary. """
        return self.__class__.__name__

    def parameters(self):
        """ The number of trainable parameters used by the layer """
        return 0

    def forward_pass(self, X, training):
        """ Propagates the signal forward in the network """
        raise NotImplementedError()

    def backward_pass(self, accum_grad):
        """ Propagates the accumulated gradient backwards in the network.
        If the layer has trainable weights then these weights are also tuned in this method.
        As input (accum_grad) it receives the gradient with respect to the output of the layer and
        returns the gradient with respect to the output of the previous layer. """
        raise NotImplementedError()

    def output_shape(self):
        """ The shape of the output produced by forward_pass """
        raise NotImplementedError()

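Methods that every subclass must implement raise NotImplementedError() in the base class, so calling one that a subclass has not overridden raises an exception.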
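A minimal sketch of that pattern follows. The Layer here is trimmed to two methods, and the Identity and Incomplete subclasses are hypothetical names invented for this example; they are not part of the original repository.

```python
class Layer(object):
    # Trimmed-down version of the base class above
    def layer_name(self):
        return self.__class__.__name__

    def forward_pass(self, X, training):
        raise NotImplementedError()

class Identity(Layer):
    # Hypothetical subclass that overrides forward_pass
    def forward_pass(self, X, training=True):
        return X

class Incomplete(Layer):
    # Hypothetical subclass that forgets to override forward_pass
    pass

print(Identity().layer_name())          # Identity
print(Identity().forward_pass([1, 2]))  # [1, 2]

try:
    Incomplete().forward_pass([1, 2], training=True)
    failed = False
except NotImplementedError:
    failed = True
print(failed)  # True
```

Note that the error only shows up when the missing method is actually called, not when the subclass is defined or instantiated.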

With that base class in place, Conv2D can be implemented on top of it:

import copy

class Conv2D(Layer):
    """A 2D Convolution Layer.
    Parameters:
    -----------
    n_filters: int
        The number of filters that will convolve over the input matrix. The number of channels
        of the output shape.
    filter_shape: tuple
        A tuple (filter_height, filter_width).
    input_shape: tuple
        The shape of the expected input of the layer. (batch_size, channels, height, width)
        Only needs to be specified for first layer in the network.
    padding: string
        Either 'same' or 'valid'. 'same' results in padding being added so that the output height and width
        matches the input height and width. For 'valid' no padding is added.
    stride: int
        The stride length of the filters during the convolution over the input.
    """
    def __init__(self, n_filters, filter_shape, input_shape=None, padding='same', stride=1):
        self.n_filters = n_filters
        self.filter_shape = filter_shape
        self.padding = padding
        self.stride = stride
        self.input_shape = input_shape
        self.trainable = True

    def initialize(self, optimizer):
        # Initialize the weights
        filter_height, filter_width = self.filter_shape
        channels = self.input_shape[0]
        limit = 1 / math.sqrt(np.prod(self.filter_shape))
        self.W  = np.random.uniform(-limit, limit, size=(self.n_filters, channels, filter_height, filter_width))
        self.w0 = np.zeros((self.n_filters, 1))
        # Weight optimizers
        self.W_opt  = copy.copy(optimizer)
        self.w0_opt = copy.copy(optimizer)

    def parameters(self):
        return np.prod(self.W.shape) + np.prod(self.w0.shape)

    def forward_pass(self, X, training=True):
        batch_size, channels, height, width = X.shape
        self.layer_input = X
        # Turn image shape into column shape
        # (enables dot product between input and weights)
        self.X_col = image_to_column(X, self.filter_shape, stride=self.stride, output_shape=self.padding)
        # Turn weights into column shape
        self.W_col = self.W.reshape((self.n_filters, -1))
        # Calculate output
        output = self.W_col.dot(self.X_col) + self.w0
        # Reshape into (n_filters, out_height, out_width, batch_size)
        output = output.reshape(self.output_shape() + (batch_size, ))
        # Redistribute axes so that batch size comes first
        return output.transpose(3,0,1,2)

    def backward_pass(self, accum_grad):
        # Reshape accumulated gradient into column shape
        accum_grad = accum_grad.transpose(1, 2, 3, 0).reshape(self.n_filters, -1)

        if self.trainable:
            # Take dot product between column shaped accum. gradient and column shape
            # layer input to determine the gradient at the layer with respect to layer weights
            grad_w = accum_grad.dot(self.X_col.T).reshape(self.W.shape)
            # The gradient with respect to bias terms is the sum similarly to in Dense layer
            grad_w0 = np.sum(accum_grad, axis=1, keepdims=True)

            # Update the layers weights
            self.W = self.W_opt.update(self.W, grad_w)
            self.w0 = self.w0_opt.update(self.w0, grad_w0)

        # Recalculate the gradient which will be propagated back to prev. layer
        accum_grad = self.W_col.T.dot(accum_grad)
        # Reshape from column shape to image shape
        # (column_to_image is not shown in this article)
        accum_grad = column_to_image(accum_grad,
                                self.layer_input.shape,
                                self.filter_shape,
                                stride=self.stride,
                                output_shape=self.padding)

        return accum_grad

    def output_shape(self):
        channels, height, width = self.input_shape
        pad_h, pad_w = determine_padding(self.filter_shape, output_shape=self.padding)
        output_height = (height + np.sum(pad_h) - self.filter_shape[0]) / self.stride + 1
        output_width = (width + np.sum(pad_w) - self.filter_shape[1]) / self.stride + 1
        return self.n_filters, int(output_height), int(output_width)

Assume the input again has shape (1,3,32,32) and we convolve with 16 kernels of size 3×3. Then self.W has shape (16,3,3,3) and self.w0 has shape (16,1).

self.X_col has shape (27,1024) and self.W_col has shape (16,27), so output = self.W_col.dot(self.X_col) + self.w0 has shape (16,1024).
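One way to convince yourself that this single matrix product really is a convolution is to compare it with a plain nested-loop version. The sketch below does that on small random data. im2col_valid is a hypothetical condensed helper that follows the same indexing idea as image_to_column but supports only padding='valid'; the sizes (2 images, 3 channels, 6×6 pixels, 4 kernels) are arbitrary.

```python
import numpy as np

def im2col_valid(images, filter_shape, stride=1):
    # Hypothetical condensed helper: same indexing idea as image_to_column,
    # restricted to padding='valid' to keep the sketch short
    batch, channels, height, width = images.shape
    fh, fw = filter_shape
    out_h = (height - fh) // stride + 1
    out_w = (width - fw) // stride + 1
    i0 = np.tile(np.repeat(np.arange(fh), fw), channels)
    i1 = stride * np.repeat(np.arange(out_h), out_w)
    j0 = np.tile(np.arange(fw), fh * channels)
    j1 = stride * np.tile(np.arange(out_w), out_h)
    i = i0.reshape(-1, 1) + i1.reshape(1, -1)
    j = j0.reshape(-1, 1) + j1.reshape(1, -1)
    k = np.repeat(np.arange(channels), fh * fw).reshape(-1, 1)
    cols = images[:, k, i, j]
    cols = cols.transpose(1, 2, 0).reshape(fh * fw * channels, -1)
    return cols, out_h, out_w

rng = np.random.default_rng(0)
X = rng.normal(size=(2, 3, 6, 6))   # (batch, channels, height, width)
W = rng.normal(size=(4, 3, 3, 3))   # (n_filters, channels, fh, fw)
b = rng.normal(size=(4, 1))         # one bias per filter

# Column-based forward pass, mirroring Conv2D.forward_pass
cols, out_h, out_w = im2col_valid(X, (3, 3), stride=1)
out = W.reshape(4, -1).dot(cols) + b
out = out.reshape(4, out_h, out_w, 2).transpose(3, 0, 1, 2)

# Naive reference: slide every filter over every image explicitly
naive = np.zeros((2, 4, out_h, out_w))
for n in range(2):
    for f in range(4):
        for y in range(out_h):
            for x in range(out_w):
                patch = X[n, :, y:y + 3, x:x + 3]
                naive[n, f, y, x] = np.sum(patch * W[f]) + b[f, 0]

print(out.shape)                 # (2, 4, 4, 4)
print(np.allclose(out, naive))   # True
```

The two results agree because each row of W.reshape(4, -1) flattens a kernel in (channel, row, column) order, which is exactly the order in which k, i0, and j0 enumerate the rows of the column matrix.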

Finally, this is how it is used:

image = np.random.randint(0,255,size=(1,3,32,32)).astype(np.uint8)

input_shape=image.squeeze().shape

conv2d = Conv2D(16, (3,3), input_shape=input_shape, padding='same', stride=1)

conv2d.initialize(None)

output=conv2d.forward_pass(image,training=True)

print(output.shape)

Output: (1,16,32,32)

Computing the number of parameters:

print(conv2d.parameters())

Output: 448

That is, 448=3×3×3×16+16

Next, one with padding='valid':

conv2d = Conv2D(16, (3,3), input_shape=input_shape, padding='valid', stride=1)

Note that the shape of cols changes, because the output after convolution is now (1,16,30,30).

Output:

Shape of cols: (27,900)

(1,16,30,30)

448

Finally, one with a stride:

conv2d = Conv2D(16, (3,3), input_shape=input_shape, padding='valid', stride=2)

Shape of cols: (27,225)

(1,16,15,15)

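A few final additions:

Formula for the number of parameters in a convolutional layer: params = kernel height × kernel width × number of input channels × number of kernels + bias terms (one per kernel)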
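The parameter formula can be written as a small function. conv2d_param_count is a hypothetical helper name used only for this illustration; it reproduces the 448 reported by conv2d.parameters() for the example above.

```python
def conv2d_param_count(n_filters, channels, filter_height, filter_width):
    # Hypothetical helper: weights plus one bias per filter
    weights = filter_height * filter_width * channels * n_filters
    biases = n_filters
    return weights + biases

print(conv2d_param_count(16, 3, 3, 3))  # 448
```

Neither padding nor stride appears in the formula, which is why the 'valid' example still reports 448 parameters.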

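Formula for the image size after convolution:

output height = (input height + padding (height) × 2 - kernel height) / stride + 1

output width = (input width + padding (width) × 2 - kernel width) / stride + 1

Here padding × 2 assumes the same padding on both sides, and the result is rounded down when the division is not exact (the code above truncates with int()).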
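As a sanity check, the size formula can be applied to the three configurations used above. conv_output_size is a hypothetical helper written for this illustration; it takes the total padding along one axis (before plus after) and uses integer division, which matches the int() truncation in output_shape() for these non-negative values.

```python
def conv_output_size(size, filter_size, pad_total, stride):
    # Hypothetical helper: output length along one axis, rounded down
    return (size + pad_total - filter_size) // stride + 1

same_s1 = conv_output_size(32, 3, pad_total=2, stride=1)   # 'same', stride 1
valid_s1 = conv_output_size(32, 3, pad_total=0, stride=1)  # 'valid', stride 1
valid_s2 = conv_output_size(32, 3, pad_total=0, stride=2)  # 'valid', stride 2

print(same_s1, valid_s1, valid_s2)  # 32 30 15

# Number of columns in cols for batch size 1 is out_height * out_width
print(same_s1 ** 2, valid_s1 ** 2, valid_s2 ** 2)  # 1024 900 225
```

The second line of output matches the cols shapes seen earlier: (27,1024), (27,900), and (27,225).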

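The transformations inside get_im2col_indices() are now clear, but why the indices are arranged this way still deserves more thought. Backpropagation, the optimizer, and related topics will be covered in a later update once I have studied them properly.

Original article: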

https://www.cnblogs.com/xiximayou/p/12706576.html
