天天看點

Deep Learning 學習随記(七)Convolution and Pooling --卷積和池化

圖像大小與參數個數:

前面幾章都是針對小圖像塊處理的,這一章則是針對大圖像進行處理的。兩者在這的差別還是很明顯的,小圖像(如8*8,MINIST的28*28)可以采用全連接配接的方式(即輸入層和隐含層直接相連)。但是大圖像,這個将會變得很耗時:比如96*96的圖像,若采用全連接配接方式,需要96*96個輸入單元,然後如果要訓練100個特征,隻這一層就需要96*96*100個參數(W,b),訓練時間将是前面的幾百或者上萬倍。是以這裡用到了部分聯通網絡。對于圖像來說,每個隐含單元僅僅連接配接輸入圖像的一小片相鄰區域。

這樣就引出了一個卷積的方法:

convolution:

自然圖像有其固有特性,也就是說,圖像的一部分的統計特性與其他部分是一樣的。這也意味着我們在這一部分學習的特征也能用在另一部分上,是以對于這個圖像上的所有位置,我們都能使用同樣的學習特征。

對于圖像,當從一個大尺寸圖像中随機選取一小塊,比如說8x8作為樣本,并且從這個小塊樣本中學習到了一些特征,這時我們可以把從這個8x8樣本中學習到的特征作為探測器,應用到這個圖像的任意地方中去。特别是,我們可以用從8x8樣本中所學習到的特征跟原本的大尺寸圖像作卷積,進而對這個大尺寸圖像上的任一位置獲得一個不同特征的激活值。

講義中舉得具體例子,還是看例子容易了解:

假設你已經從一個96x96的圖像中學習到了它的一個8x8的樣本所具有的特征,假設這是由有100個隐含單元的自編碼完成的。為了得到卷積特征,需要對96x96的圖像的每個8x8的小塊圖像區域都進行卷積運算。也就是說,抽取8x8的小塊區域,并且從起始坐标開始依次标記為(1,1),(1,2),...,一直到(89,89),然後對抽取的區域逐個運作訓練過的稀疏自編碼來得到特征的激活值。在這個例子裡,顯然可以得到100個集合,每個集合含有89x89個卷積特征。講義中那個gif圖更形象,這裡不知道怎麼添加進來...

最後,總結下convolution的處理過程:

假設給定了r * c的大尺寸圖像,将其定義為xlarge。首先通過從大尺寸圖像中抽取的a * b的小尺寸圖像樣本xsmall訓練稀疏自編碼,得到了k個特征(k為隐含層神經元數量),然後對于xlarge中的每個a*b大小的塊,求激活值fs,然後對這些fs進行卷積。這樣得到(r-a+1)*(c-b+1)*k個卷積後的特征矩陣。

pooling:

在通過卷積獲得了特征(features)之後,下一步我們希望利用這些特征去做分類。理論上講,人們可以把所有解析出來的特征關聯到一個分類器,例如softmax分類器,但計算量非常大。例如:對于一個96X96像素的圖像,假設我們已經通過8X8個輸入學習得到了400個特征。而每一個卷積都會得到一個(96 − 8 + 1) * (96 − 8 + 1) = 7921的結果集,由于已經得到了400個特征,是以對于每個樣例(example)結果集的大小就将達到892 * 400 = 3,168,400 個特征。這樣學習一個擁有超過3百萬特征的輸入的分類器是相當不明智的,并且極易出現過度拟合(over-fitting).

是以就有了pooling這個方法,翻譯作“池化”?感覺pooling這個英語單詞還是挺形象的,翻譯“作池”化就沒那麼形象了。其實也就是把特征圖像區域的一部分求個均值或者最大值,用來代表這部分區域。如果是求均值就是mean pooling,求最大值就是max pooling。講義中那個gif圖也很形象,隻是不知道這裡怎麼放gif圖....

至于pooling為什麼可以這樣做,是因為:我們之是以決定使用卷積後的特征是因為圖像具有一種“靜态性”的屬性,這也就意味着在一個圖像區域有用的特征極有可能在另一個區域同樣适用。是以,為了描述大的圖像,一個很自然的想法就是對不同位置的特征進行聚合統計。這個均值或者最大值就是一種聚合統計的方法。

另外,如果人們選擇圖像中的連續範圍作為池化區域,并且隻是池化相同(重複)的隐藏單元産生的特征,那麼,這些池化單元就具有平移不變性(translation invariant)。這就意味着即使圖像經曆了一個小的平移之後,依然會産生相同的(池化的)特征(這裡有個小小的疑問,既然這樣,是不是隻能保證在池化大小的這塊區域内具有平移不變性?)。在很多任務中(例如物體檢測、聲音識别),我們都更希望得到具有平移不變性的特征,因為即使圖像經過了平移,樣例(圖像)的标記仍然保持不變。例如,如果你處理一個MNIST資料集的數字,把它向左側或右側平移,那麼不論最終的位置在哪裡,你都會期望你的分類器仍然能夠精确地将其分類為相同的數字。

練習:

下面是講義中的練習。用到了上一章的練習的結構(即在convolution過程中的第一步,用稀疏自編碼對xsmall求k個特征)。

以下是主要程式:

主程式cnnExercise.m

Deep Learning 學習随記(七)Convolution and Pooling --卷積和池化
%% CS294A/CS294W Convolutional Neural Networks Exercise

%  Instructions
%  ------------
% 
%  This file contains code that helps you get started on the
%  convolutional neural networks exercise. In this exercise, you will only
%  need to modify cnnConvolve.m and cnnPool.m. You will not need to modify
%  this file.

%%======================================================================
%% STEP 0: Initialization
%  Here we initialize some parameters used for the exercise.

imageDim = 64;         % image dimension
imageChannels = 3;     % number of channels (rgb, so 3)

patchDim = 8;          % patch dimension
numPatches = 50000;    % number of patches

visibleSize = patchDim * patchDim * imageChannels;  % number of input units 
outputSize = visibleSize;   % number of output units
hiddenSize = 400;           % number of hidden units 

epsilon = 0.1;           % epsilon for ZCA whitening

poolDim = 19;          % dimension of pooling region

%%======================================================================
%% STEP 1: Train a sparse autoencoder (with a linear decoder) to learn 
%  features from color patches. If you have completed the linear decoder
%  execise, use the features that you have obtained from that exercise, 
%  loading them into optTheta. Recall that we have to keep around the 
%  parameters used in whitening (i.e., the ZCA whitening matrix and the
%  meanPatch)

% --------------------------- YOUR CODE HERE --------------------------
% Train the sparse autoencoder and fill the following variables with 
% the optimal parameters:

%optTheta =  zeros(2*hiddenSize*visibleSize+hiddenSize+visibleSize, 1);
%ZCAWhite =  zeros(visibleSize, visibleSize);
%meanPatch = zeros(visibleSize, 1);
load STL10Features.mat;


% --------------------------------------------------------------------

% Display and check to see that the features look good
W = reshape(optTheta(1:visibleSize * hiddenSize), hiddenSize, visibleSize);
b = optTheta(2*hiddenSize*visibleSize+1:2*hiddenSize*visibleSize+hiddenSize);

displayColorNetwork( (W*ZCAWhite)');

%%======================================================================
%% STEP 2: Implement and test convolution and pooling
%  In this step, you will implement convolution and pooling, and test them
%  on a small part of the data set to ensure that you have implemented
%  these two functions correctly. In the next step, you will actually
%  convolve and pool the features with the STL10 images.

%% STEP 2a: Implement convolution
%  Implement convolution in the function cnnConvolve in cnnConvolve.m

% Note that we have to preprocess the images in the exact same way 
% we preprocessed the patches before we can obtain the feature activations.

load stlTrainSubset.mat % loads numTrainImages, trainImages, trainLabels

%% Use only the first 8 images for testing
convImages = trainImages(:, :, :, 1:8); 

% NOTE: Implement cnnConvolve in cnnConvolve.m first!
convolvedFeatures = cnnConvolve(patchDim, hiddenSize, convImages, W, b, ZCAWhite, meanPatch);

%% STEP 2b: Checking your convolution
%  To ensure that you have convolved the features correctly, we have
%  provided some code to compare the results of your convolution with
%  activations from the sparse autoencoder

% For 1000 random points
for i = 1:1000    
    featureNum = randi([1, hiddenSize]);
    imageNum = randi([1, 8]);
    imageRow = randi([1, imageDim - patchDim + 1]);
    imageCol = randi([1, imageDim - patchDim + 1]);    
   
    patch = convImages(imageRow:imageRow + patchDim - 1, imageCol:imageCol + patchDim - 1, :, imageNum);
    patch = patch(:);            
    patch = patch - meanPatch;
    patch = ZCAWhite * patch;
    
    features = feedForwardAutoencoder(optTheta, hiddenSize, visibleSize, patch); 

    if abs(features(featureNum, 1) - convolvedFeatures(featureNum, imageNum, imageRow, imageCol)) > 1e-9
        fprintf('Convolved feature does not match activation from autoencoder\n');
        fprintf('Feature Number    : %d\n', featureNum);
        fprintf('Image Number      : %d\n', imageNum);
        fprintf('Image Row         : %d\n', imageRow);
        fprintf('Image Column      : %d\n', imageCol);
        fprintf('Convolved feature : %0.5f\n', convolvedFeatures(featureNum, imageNum, imageRow, imageCol));
        fprintf('Sparse AE feature : %0.5f\n', features(featureNum, 1));       
        error('Convolved feature does not match activation from autoencoder');
    end 
end

disp('Congratulations! Your convolution code passed the test.');

%% STEP 2c: Implement pooling
%  Implement pooling in the function cnnPool in cnnPool.m

% NOTE: Implement cnnPool in cnnPool.m first!
pooledFeatures = cnnPool(poolDim, convolvedFeatures);

%% STEP 2d: Checking your pooling
%  To ensure that you have implemented pooling, we will use your pooling
%  function to pool over a test matrix and check the results.

testMatrix = reshape(1:64, 8, 8);
expectedMatrix = [mean(mean(testMatrix(1:4, 1:4))) mean(mean(testMatrix(1:4, 5:8))); ...
                  mean(mean(testMatrix(5:8, 1:4))) mean(mean(testMatrix(5:8, 5:8))); ];
            
testMatrix = reshape(testMatrix, 1, 1, 8, 8);
        
pooledFeatures = squeeze(cnnPool(4, testMatrix));

if ~isequal(pooledFeatures, expectedMatrix)
    disp('Pooling incorrect');
    disp('Expected');
    disp(expectedMatrix);
    disp('Got');
    disp(pooledFeatures);
else
    disp('Congratulations! Your pooling code passed the test.');
end

%%======================================================================
%% STEP 3: Convolve and pool with the dataset
%  In this step, you will convolve each of the features you learned with
%  the full large images to obtain the convolved features. You will then
%  pool the convolved features to obtain the pooled features for
%  classification.
%
%  Because the convolved features matrix is very large, we will do the
%  convolution and pooling 50 features at a time to avoid running out of
%  memory. Reduce this number if necessary

stepSize = 50;
assert(mod(hiddenSize, stepSize) == 0, 'stepSize should divide hiddenSize');

load stlTrainSubset.mat % loads numTrainImages, trainImages, trainLabels
load stlTestSubset.mat  % loads numTestImages,  testImages,  testLabels

pooledFeaturesTrain = zeros(hiddenSize, numTrainImages, ...
    floor((imageDim - patchDim + 1) / poolDim), ...
    floor((imageDim - patchDim + 1) / poolDim) );
pooledFeaturesTest = zeros(hiddenSize, numTestImages, ...
    floor((imageDim - patchDim + 1) / poolDim), ...
    floor((imageDim - patchDim + 1) / poolDim) );

tic();

for convPart = 1:(hiddenSize / stepSize)
    
    featureStart = (convPart - 1) * stepSize + 1;
    featureEnd = convPart * stepSize;
    
    fprintf('Step %d: features %d to %d\n', convPart, featureStart, featureEnd);  
    Wt = W(featureStart:featureEnd, :);
    bt = b(featureStart:featureEnd);    
    
    fprintf('Convolving and pooling train images\n');
    convolvedFeaturesThis = cnnConvolve(patchDim, stepSize, ...
        trainImages, Wt, bt, ZCAWhite, meanPatch);
    pooledFeaturesThis = cnnPool(poolDim, convolvedFeaturesThis);
    pooledFeaturesTrain(featureStart:featureEnd, :, :, :) = pooledFeaturesThis;   
    toc();
    clear convolvedFeaturesThis pooledFeaturesThis;
    
    fprintf('Convolving and pooling test images\n');
    convolvedFeaturesThis = cnnConvolve(patchDim, stepSize, ...
        testImages, Wt, bt, ZCAWhite, meanPatch);
    pooledFeaturesThis = cnnPool(poolDim, convolvedFeaturesThis);
    pooledFeaturesTest(featureStart:featureEnd, :, :, :) = pooledFeaturesThis;   
    toc();

    clear convolvedFeaturesThis pooledFeaturesThis;

end


% You might want to save the pooled features since convolution and pooling takes a long time
save('cnnPooledFeatures.mat', 'pooledFeaturesTrain', 'pooledFeaturesTest');
toc();

%%======================================================================
%% STEP 4: Use pooled features for classification
%  Now, you will use your pooled features to train a softmax classifier,
%  using softmaxTrain from the softmax exercise.
%  Training the softmax classifer for 1000 iterations should take less than
%  10 minutes.

% Add the path to your softmax solution, if necessary
% addpath /path/to/solution/

% Setup parameters for softmax
softmaxLambda = 1e-4;
numClasses = 4;
% Reshape the pooledFeatures to form an input vector for softmax
softmaxX = permute(pooledFeaturesTrain, [1 3 4 2]);
softmaxX = reshape(softmaxX, numel(pooledFeaturesTrain) / numTrainImages,...
    numTrainImages);
softmaxY = trainLabels;

options = struct;
options.maxIter = 200;
softmaxModel = softmaxTrain(numel(pooledFeaturesTrain) / numTrainImages,...
    numClasses, softmaxLambda, softmaxX, softmaxY, options);

%%======================================================================
%% STEP 5: Test classifer
%  Now you will test your trained classifer against the test images

softmaxX = permute(pooledFeaturesTest, [1 3 4 2]);
softmaxX = reshape(softmaxX, numel(pooledFeaturesTest) / numTestImages, numTestImages);
softmaxY = testLabels;

[pred] = softmaxPredict(softmaxModel, softmaxX);
acc = (pred(:) == softmaxY(:));
acc = sum(acc) / size(acc, 1);
fprintf('Accuracy: %2.3f%%\n', acc * 100);

% You should expect to get an accuracy of around 80% on the test images.      
Deep Learning 學習随記(七)Convolution and Pooling --卷積和池化

cnnConvolve.m

Deep Learning 學習随記(七)Convolution and Pooling --卷積和池化
function convolvedFeatures = cnnConvolve(patchDim, numFeatures, images, W, b, ZCAWhite, meanPatch)
%cnnConvolve Returns the convolution of the features given by W and b with
%the given images
%
% Parameters:
%  patchDim - patch (feature) dimension
%  numFeatures - number of features
%  images - large images to convolve with, matrix in the form
%           images(r, c, channel, image number)
%  W, b - W, b for features from the sparse autoencoder
%  ZCAWhite, meanPatch - ZCAWhitening and meanPatch matrices used for
%                        preprocessing
%
% Returns:
%  convolvedFeatures - matrix of convolved features in the form
%                      convolvedFeatures(featureNum, imageNum, imageRow, imageCol)
patchSize = patchDim*patchDim;
numImages = size(images, 4);
imageDim = size(images, 1);
imageChannels = size(images, 3);

convolvedFeatures = zeros(numFeatures, numImages, imageDim - patchDim + 1, imageDim - patchDim + 1);

% Instructions:
%   Convolve every feature with every large image here to produce the 
%   numFeatures x numImages x (imageDim - patchDim + 1) x (imageDim - patchDim + 1) 
%   matrix convolvedFeatures, such that 
%   convolvedFeatures(featureNum, imageNum, imageRow, imageCol) is the
%   value of the convolved featureNum feature for the imageNum image over
%   the region (imageRow, imageCol) to (imageRow + patchDim - 1, imageCol + patchDim - 1)
%
% Expected running times: 
%   Convolving with 100 images should take less than 3 minutes 
%   Convolving with 5000 images should take around an hour
%   (So to save time when testing, you should convolve with less images, as
%   described earlier)

% -------------------- YOUR CODE HERE --------------------
% Precompute the matrices that will be used during the convolution. Recall
% that you need to take into account the whitening and mean subtraction
% steps
WT = W*ZCAWhite;
bT = b-WT*meanPatch;
% --------------------------------------------------------

convolvedFeatures = zeros(numFeatures, numImages, imageDim - patchDim + 1, imageDim - patchDim + 1);
for imageNum = 1:numImages
  for featureNum = 1:numFeatures

    % convolution of image with feature matrix for each channel
    convolvedImage = zeros(imageDim - patchDim + 1, imageDim - patchDim + 1);
    for channel = 1:3

      % Obtain the feature (patchDim x patchDim) needed during the convolution
      % ---- YOUR CODE HERE ----
      %feature = zeros(8,8); % You should replace this
      offset = (channel-1)*patchSize;
      feature = reshape(WT(featureNum,(offset+1):(offset+patchSize)),patchDim,patchDim);

      % ------------------------

      % Flip the feature matrix because of the definition of convolution, as explained later
      feature = flipud(fliplr(squeeze(feature)));
      
      % Obtain the image
      im = squeeze(images(:, :, channel, imageNum));

      % Convolve "feature" with "im", adding the result to convolvedImage
      % be sure to do a 'valid' convolution
      % ---- YOUR CODE HERE ----
       convolveThisChannel = conv2(im,feature,'valid');
       convolvedImage = convolvedImage + convolveThisChannel;            %三個通道加起來,應該是指三個通道同時用來做判斷标準。
    
      % ------------------------

    end
    
    % Subtract the bias unit (correcting for the mean subtraction as well)
    % Then, apply the sigmoid function to get the hidden activation
    % ---- YOUR CODE HERE ----
    convolvedImage = sigmoid(convolvedImage + bT(featureNum));

    % ------------------------
    
    % The convolved feature is the sum of the convolved values for all channels
    convolvedFeatures(featureNum, imageNum, :, :) = convolvedImage;
  end
end

function sigm = sigmoid(x)

    sigm = 1 ./ (1 + exp(-x));
end

end      
Deep Learning 學習随記(七)Convolution and Pooling --卷積和池化

cnnPool.m

Deep Learning 學習随記(七)Convolution and Pooling --卷積和池化
function pooledFeatures = cnnPool(poolDim, convolvedFeatures)
%cnnPool Pools the given convolved features
%
% Parameters:
%  poolDim - dimension of pooling region
%  convolvedFeatures - convolved features to pool (as given by cnnConvolve)
%                      convolvedFeatures(featureNum, imageNum, imageRow, imageCol)
%
% Returns:
%  pooledFeatures - matrix of pooled features in the form
%                   pooledFeatures(featureNum, imageNum, poolRow, poolCol)
%     

numImages = size(convolvedFeatures, 2);
numFeatures = size(convolvedFeatures, 1);
convolvedDim = size(convolvedFeatures, 3);

pooledFeatures = zeros(numFeatures, numImages, floor(convolvedDim / poolDim), floor(convolvedDim / poolDim));

% -------------------- YOUR CODE HERE --------------------
% Instructions:
%   Now pool the convolved features in regions of poolDim x poolDim,
%   to obtain the 
%   numFeatures x numImages x (convolvedDim/poolDim) x (convolvedDim/poolDim) 
%   matrix pooledFeatures, such that
%   pooledFeatures(featureNum, imageNum, poolRow, poolCol) is the 
%   value of the featureNum feature for the imageNum image pooled over the
%   corresponding (poolRow, poolCol) pooling region 
%   (see http://ufldl/wiki/index.php/Pooling )
%   
%   Use mean pooling here.
% -------------------- YOUR CODE HERE --------------------
numBlocks = floor(convolvedDim/poolDim);             %每個次元總共分成多少塊(57/19),這裡對于不同維數的資料,poolDim要選擇能剛好除盡的?
for featureNum = 1:numFeatures
    for imageNum=1:numImages
        for poolRow = 1:numBlocks
            for poolCol = 1:numBlocks
                features = convolvedFeatures(featureNum,imageNum,(poolRow-1)*poolDim+1:poolRow*poolDim,(poolCol-1)*poolDim+1:poolCol*poolDim);
                pooledFeatures(featureNum,imageNum,poolRow,poolCol) = mean(features(:));
            end
        end
    end
end
end      
Deep Learning 學習随記(七)Convolution and Pooling --卷積和池化

結果:

Accuracy: 78.938%

與講義提到的80%左右差不多。

ps:講義位址:

http://deeplearning.stanford.edu/wiki/index.php/Feature_extraction_using_convolution

http://deeplearning.stanford.edu/wiki/index.php/Pooling

http://deeplearning.stanford.edu/wiki/index.php/Exercise:Convolution_and_Pooling

繼續閱讀