{As part of getting started with CNNs, I am gradually posting my own code implementations of the Exercises for each UFLDL chapter here, for readers' reference and correction}
Building on the theoretical analysis in Convolutional Neural Networks (CNN) (12): Convolutional Neural Network Theory, this post gives a MATLAB implementation of Exercise: Convolutional Neural Network, the final exercise of the CNN module:
NOTICE:
The wiki and tutorial parts of UFLDL differ from each other. If, like the author, you studied the wiki part first,
you will notice that cnnPool.m and cnnConvolve.m below differ from the earlier exercises; both have been updated in this post.
cnnPool.m
function pooledFeatures = cnnPool(poolDim, convolvedFeatures)
%cnnPool Pools the given convolved features
%
% Parameters:
% poolDim - dimension of pooling region
% convolvedFeatures - convolved features to pool (as given by cnnConvolve)
% convolvedFeatures(imageRow, imageCol, featureNum, imageNum)
%
% Returns:
% pooledFeatures - matrix of pooled features in the form
% pooledFeatures(poolRow, poolCol, featureNum, imageNum)
%
numImages = size(convolvedFeatures, 4);
numFilters = size(convolvedFeatures, 3);
convolvedDim = size(convolvedFeatures, 1);
pooledFeatures = zeros(convolvedDim / poolDim, convolvedDim / poolDim, numFilters, numImages);
% Instructions:
% Now pool the convolved features in regions of poolDim x poolDim,
% to obtain the
% (convolvedDim/poolDim) x (convolvedDim/poolDim) x numFeatures x numImages
% matrix pooledFeatures, such that
% pooledFeatures(poolRow, poolCol, featureNum, imageNum) is the
% value of the featureNum feature for the imageNum image pooled over the
% corresponding (poolRow, poolCol) pooling region.
%
% Use mean pooling here.
%%% YOUR CODE HERE %%%
% Naive reference (kept for comparison): loop over every pooling region
% and average it directly.
%   for iterDim_col = 1:floor(convolvedDim / poolDim)
%       for iterDim_row = 1:floor(convolvedDim / poolDim)
%           tmp = convolvedFeatures( ...
%               1+(iterDim_row-1)*poolDim:iterDim_row*poolDim, ...
%               1+(iterDim_col-1)*poolDim:iterDim_col*poolDim, ...
%               iterFeature, iterImage );
%           pooledFeatures(iterDim_row, iterDim_col, iterFeature, iterImage) = mean(tmp(:));
%       end
%   end
for iterFeature = 1:numFilters
    for iterImage = 1:numImages
        % Vectorized mean pooling: a 'valid' convolution with a matrix of
        % ones sums each poolDim x poolDim window; sampling every
        % poolDim-th entry and dividing by poolDim^2 gives the window means.
        tmp = conv2(convolvedFeatures(:,:,iterFeature,iterImage), ones(poolDim), 'valid');
        pooledFeatures(:,:,iterFeature,iterImage) = (1/poolDim^2) * tmp(1:poolDim:end, 1:poolDim:end);
    end
end
end
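As a quick sanity check, the conv2-based pooling can be compared against the naive loop it replaces. The snippet below is a minimal sketch (it assumes cnnPool.m is on the MATLAB path and uses toy sizes):
A = rand(8, 8, 1, 1);          % one 8x8 convolved feature map
P = cnnPool(2, A);             % 2x2 mean pooling -> 4x4 output
R = zeros(4, 4);               % naive reference: average each block by hand
for r = 1:4
    for c = 1:4
        blk = A(2*r-1:2*r, 2*c-1:2*c, 1, 1);
        R(r, c) = mean(blk(:));
    end
end
assert(max(abs(P(:) - R(:))) < 1e-12);  % the two implementations must agree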
cnnConvolve.m
function convolvedFeatures = cnnConvolve(filterDim, numFilters, images, W, b)
%cnnConvolve Returns the convolution of the features given by W and b with
%the given images
%
% Parameters:
% filterDim - filter (feature) dimension
% numFilters - number of feature maps
% images - large images to convolve with, matrix in the form
% images(r, c, image number)
% W, b - W, b for features from the sparse autoencoder
% W is of shape (filterDim,filterDim,numFilters)
% b is of shape (numFilters,1)
%
% Returns:
% convolvedFeatures - matrix of convolved features in the form
% convolvedFeatures(imageRow, imageCol, featureNum, imageNum)
numImages = size(images, 3);
imageDim = size(images, 1);
convDim = imageDim - filterDim + 1;
convolvedFeatures = zeros(convDim, convDim, numFilters, numImages);
% Instructions:
% Convolve every filter with every image here to produce the
% (imageDim - filterDim + 1) x (imageDim - filterDim + 1) x numFeatures x numImages
% matrix convolvedFeatures, such that
% convolvedFeatures(imageRow, imageCol, featureNum, imageNum) is the
% value of the convolved featureNum feature for the imageNum image over
% the region (imageRow, imageCol) to (imageRow + filterDim - 1, imageCol + filterDim - 1)
%
% Expected running times:
% Convolving with 100 images should take less than 30 seconds
% Convolving with 5000 images should take around 2 minutes
% (So to save time when testing, you should convolve with fewer images, as
% described earlier)
for imageNum = 1:numImages
for filterNum = 1:numFilters
% convolution of image with feature matrix
% convolvedImage = zeros(convDim, convDim);
% Obtain the feature (filterDim x filterDim) needed during the convolution
%%% YOUR CODE HERE %%%
filter = W(:, :, filterNum);
% Flip the feature matrix because of the definition of convolution, as explained later
filter = rot90(squeeze(filter),2);
% Obtain the image
im = squeeze(images(:, :, imageNum));
% Convolve "filter" with "im", adding the result to convolvedImage
% be sure to do a 'valid' convolution
%%% YOUR CODE HERE %%%
convolvedImage = conv2(im, filter, 'valid');
% Add the bias unit
% Then, apply the sigmoid function to get the hidden activation
%%% YOUR CODE HERE %%%
convolvedImage = sigmoid(convolvedImage + b(filterNum));
convolvedFeatures(:, :, filterNum, imageNum) = convolvedImage;
end
end
end
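A note on the rot90 flip above: conv2 implements true convolution, which flips its kernel internally, while the network needs the un-flipped cross-correlation of the filter with each image patch. Pre-flipping the filter cancels conv2's internal flip. A minimal sketch verifying the equivalence (toy sizes, illustrative names):
im = rand(10); W = rand(3);
xc = zeros(8);                 % explicit 'valid' cross-correlation
for r = 1:8
    for c = 1:8
        patch = im(r:r+2, c:c+2);
        xc(r, c) = sum(sum(patch .* W));   % no kernel flip here
    end
end
% conv2 with the pre-flipped kernel reproduces the cross-correlation
assert(max(max(abs(conv2(im, rot90(W, 2), 'valid') - xc))) < 1e-12);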
cnnTrain.m
%% Convolution Neural Network Exercise
% Instructions
% ------------
%
% This file contains code that helps you get started in building a single-
% layer convolutional neural network. In this exercise, you will only
% need to modify cnnCost.m and minFuncSGD.m. You will not need to
% modify this file.
%%======================================================================
%% STEP 0: Initialize Parameters and Load Data
% Here we initialize some parameters used for the exercise.
% Configuration
imageDim = 28;
numClasses = 10; % Number of classes (MNIST images fall into 10 classes)
filterDim = 9; % Filter size for conv layer
numFilters = 20; % Number of filters for conv layer
poolDim = 2; % Pooling dimension, (should divide imageDim-filterDim+1)
% Load MNIST Train
addpath common/;
images = loadMNISTImages('common/train-images.idx3-ubyte');
images = reshape(images,imageDim,imageDim,[]);
labels = loadMNISTLabels('common/train-labels.idx1-ubyte');
labels(labels==0) = 10; % Remap 0 to 10
% Initialize Parameters
theta = cnnInitParams(imageDim,filterDim,numFilters,poolDim,numClasses);
%%======================================================================
%% STEP 1: Implement convNet Objective
% Implement the function cnnCost.m.
%%======================================================================
%% STEP 2: Gradient Check
% Use the file computeNumericalGradient.m to check the gradient
% calculation for your cnnCost.m function. You may need to add the
% appropriate path or copy the file to this directory.
DEBUG=false; % set this to true to check gradient
if DEBUG
% To speed up gradient checking, we will use a reduced network and
% a debugging data set
db_numFilters = 2;
db_filterDim = 9;
db_poolDim = 5;
db_images = images(:,:,1:10);
db_labels = labels(1:10);
db_theta = cnnInitParams(imageDim,db_filterDim,db_numFilters,...
db_poolDim,numClasses);
[cost grad] = cnnCost(db_theta,db_images,db_labels,numClasses,...
db_filterDim,db_numFilters,db_poolDim);
% Check gradients
numGrad = computeNumericalGradient( @(x) cnnCost(x,db_images,...
db_labels,numClasses,db_filterDim,...
db_numFilters,db_poolDim), db_theta);
% Use this to visually compare the gradients side by side
disp([numGrad grad]);
diff = norm(numGrad-grad)/norm(numGrad+grad);
% Should be small. In our implementation, these values are usually
% less than 1e-9.
disp(diff);
assert(diff < 1e-9,...
'Difference too large. Check your gradient computation again');
% reach here @ 2.0024e-10
end;
%%======================================================================
%% STEP 3: Learn Parameters
% Implement minFuncSGD.m, then train the model.
options.epochs = 5;
options.minibatch = 256;
options.alpha = 1e-1;
options.momentum = .95;
opttheta = minFuncSGD(@(x,y,z) cnnCost(x,y,z,numClasses,filterDim,...
numFilters,poolDim),theta,images,labels,options);
%%======================================================================
%% STEP 4: Test
% Test the performance of the trained model using the MNIST test set. Your
% accuracy should be above 97% after 3 epochs of training
testImages = loadMNISTImages('common/t10k-images.idx3-ubyte');
testImages = reshape(testImages,imageDim,imageDim,[]);
testLabels = loadMNISTLabels('common/t10k-labels.idx1-ubyte');
testLabels(testLabels==0) = 10; % Remap 0 to 10
[~,cost,preds]=cnnCost(opttheta,testImages,testLabels,numClasses,...
filterDim,numFilters,poolDim,true);
acc = sum(preds==testLabels)/length(preds);
% Accuracy should be around 97.4% after 3 epochs
fprintf('Accuracy is %f\n',acc);
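For reference, the gradient check in STEP 2 relies on central differences: each partial derivative of the cost is approximated by perturbing one parameter at a time. The snippet below is a standalone sketch of that idea on a toy objective, not a reproduction of the exercise's computeNumericalGradient.m:
f = @(x) 0.5 * sum(x.^2);       % toy objective whose exact gradient is x
x0 = randn(5, 1);
epsilon = 1e-4;
numGrad = zeros(size(x0));
for k = 1:numel(x0)
    e = zeros(size(x0)); e(k) = epsilon;
    numGrad(k) = (f(x0 + e) - f(x0 - e)) / (2 * epsilon);  % central difference
end
disp(norm(numGrad - x0) / norm(numGrad + x0));  % relative error, ~1e-10 or less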
cnnCost.m
function [cost, grad, preds] = cnnCost(theta,images,labels,numClasses,...
filterDim,numFilters,poolDim,pred)
% Calculate cost and gradient for a single layer convolutional
% neural network followed by a softmax layer with cross entropy
% objective.
%
% Parameters:
% theta - unrolled parameter vector
% images - stores images in imageDim x imageDim x numImages
% array
% numClasses - number of classes to predict
% filterDim - dimension of convolutional filter
% numFilters - number of convolutional filters
% poolDim - dimension of pooling area
% pred - boolean only forward propagate and return
% predictions
%
%
% Returns:
% cost - cross entropy cost
% grad - gradient with respect to theta (if pred==False)
% preds - list of predictions for each example (if pred==True)
if ~exist('pred','var')
pred = false;
end;
imageDim = size(images,1); % height/width of image
numImages = size(images,3); % number of images
%% Reshape parameters and setup gradient matrices
% Wc is filterDim x filterDim x numFilters parameter matrix
% bc is the corresponding bias
% Wd is numClasses x hiddenSize parameter matrix where hiddenSize
% is the number of output units from the convolutional layer
% bd is corresponding bias
[Wc, Wd, bc, bd] = cnnParamsToStack(theta,imageDim,filterDim,numFilters,...
poolDim,numClasses);
% Same sizes as Wc,Wd,bc,bd. Used to hold gradient w.r.t above params.
Wc_grad = zeros(size(Wc));
% Wd_grad = zeros(size(Wd));
bc_grad = zeros(size(bc));
% bd_grad = zeros(size(bd));
%%======================================================================
%% STEP 1a: Forward Propagation
% In this step you will forward propagate the input through the
% convolutional and subsampling (mean pooling) layers. You will then use
% the responses from the convolution and pooling layer as the input to a
% standard softmax layer.
%% Convolutional Layer
% For each image and each filter, convolve the image with the filter, add
% the bias and apply the sigmoid nonlinearity. Then subsample the
% convolved activations with mean pooling. Store the results of the
% convolution in activations and the results of the pooling in
% activationsPooled. You will need to save the convolved activations for
% backpropagation.
convDim = imageDim-filterDim+1; % dimension of convolved output
outputDim = (convDim)/poolDim; % dimension of subsampled output
% convDim x convDim x numFilters x numImages tensor for storing activations
% activations = zeros(convDim,convDim,numFilters,numImages);
% outputDim x outputDim x numFilters x numImages tensor for storing
% subsampled activations
% activationsPooled = zeros(outputDim,outputDim,numFilters,numImages);
%%% YOUR CODE HERE %%%
activations = cnnConvolve(filterDim, numFilters, images, Wc, bc);
activationsPooled = cnnPool(poolDim, activations);
% Reshape activations into 2-d matrix, hiddenSize x numImages,
% for Softmax layer
activationsPooled = reshape(activationsPooled,[],numImages);
%% Softmax Layer
% Forward propagate the pooled activations calculated above into a
% standard softmax layer. For your convenience we have reshaped
% activationsPooled into a hiddenSize x numImages matrix. Store the
% results in probs.
% numClasses x numImages for storing probability that each image belongs to
% each class.
% probs = zeros(numClasses,numImages);
%%% YOUR CODE HERE %%%
M = Wd*activationsPooled + repmat(bd, [1,numImages]);
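% Subtracting the per-column max leaves the softmax output unchanged
% (it cancels in the ratio) but keeps exp from overflowing.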
M = bsxfun(@minus, M, max(M,[],1));
M = exp(M);
probs = bsxfun(@rdivide, M, sum(M));
%%======================================================================
%% STEP 1b: Calculate Cost
% In this step you will use the labels given as input and the probs
% calculated above to evaluate the cross entropy objective. Store your
% results in cost.
% cost = 0; % save objective into cost
lambda_c = 3e-3;
lambda_d = 1e-4;
numChannel = 1; % MNIST Data Set has only 1 input channel
%%% YOUR CODE HERE %%%
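% For reference, the regularized objective computed below is
%   J(theta) = -(1/m) * sum_i sum_k 1{labels(i)==k} * log(probs(k,i))
%              + (lambda_c/2)*||Wc||^2 + (lambda_d/2)*||Wd||^2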
numCases = size(images, 3);
groundTruth = full(sparse(labels, 1:numCases, 1));
J_theta = sum(sum(log(probs).*groundTruth));
J_theta = -J_theta / numCases;
WeightDecay_c = lambda_c * sum(Wc(:).^2) / 2;
WeightDecay_d = lambda_d * sum(Wd(:).^2) / 2;
WeightDecay = WeightDecay_c + WeightDecay_d;
cost = J_theta + WeightDecay;
% Makes predictions given probs and returns without backpropagating errors.
if pred
[~,preds] = max(probs,[],1);
preds = preds';
grad = 0;
return;
end;
%%======================================================================
%% STEP 1c: Backpropagation
% Backpropagate errors through the softmax and convolutional/subsampling
% layers. Store the errors for the next step to calculate the gradient.
% Backpropagating the error w.r.t the softmax layer is as usual. To
% backpropagate through the pooling layer, you will need to upsample the
% error with respect to the pooling layer for each filter and each image.
% Use the kron function and a matrix of ones to do this upsampling
% quickly.
%%% YOUR CODE HERE %%%
delta_softmax = -(groundTruth - probs) / numImages;
% The 1/numImages factor is folded into delta_softmax here, so the
% gradient expressions below do not need an extra 1/m term.
delta_pooling = Wd' * delta_softmax;
delta_pooling = reshape(delta_pooling, outputDim, outputDim, numFilters, numImages);
activations = reshape(activations, convDim, convDim, numFilters, numImages);
delta_conv = zeros(convDim, convDim, numFilters, numImages);
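% Backprop through mean pooling: spread each pooled error uniformly over
% its poolDim x poolDim source region (kron with ones, scaled by
% 1/poolDim^2), then through the sigmoid via its derivative a.*(1-a).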
for i = 1:numImages
for j = 1:numFilters
delta_conv(:, :, j, i) = (1/poolDim^2) * kron(delta_pooling(:, :, j, i),ones(poolDim));
delta_conv(:, :, j, i) = delta_conv(:, :, j, i) .* activations(:, :, j, i) .* (1-activations(:, :, j, i));
end
end
%%======================================================================
%% STEP 1d: Gradient Calculation
% After backpropagating the errors above, we can use them to calculate the
% gradient with respect to all the parameters. The gradient w.r.t the
% softmax layer is calculated as usual. To calculate the gradient w.r.t.
% a filter in the convolutional layer, convolve the backpropagated error
% for that filter with each image and aggregate over images.
%%% YOUR CODE HERE %%%
Wd_grad = delta_softmax * activationsPooled' + lambda_d * Wd;
bd_grad = sum(delta_softmax, 2);
for i = 1:numFilters
for j = 1:numChannel
% MNIST is single-channel, so this loop body runs exactly once per filter
for m = 1:numImages
filter = rot90(squeeze(delta_conv(:,:,i,m)),2);
Wc_grad(:, :, i) = Wc_grad(:, :, i) + conv2(images(:,:,m), filter, 'valid');
end
end
bc_tmp = delta_conv(:,:,i,:);
bc_grad(i) = sum(bc_tmp(:));
end
Wc_grad = Wc_grad + lambda_c * Wc;
%% Unroll gradient into grad vector for minFunc
grad = [Wc_grad(:) ; Wd_grad(:) ; bc_grad(:) ; bd_grad(:)];
end
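To make the upsampling step in STEP 1c concrete, here is a minimal standalone sketch with toy values:
poolDim = 2;
deltaPooled = [1 2; 3 4];      % toy 2x2 error at the pooled layer
deltaUp = (1/poolDim^2) * kron(deltaPooled, ones(poolDim));
disp(deltaUp);                 % each entry becomes a uniform 2x2 block of value/4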
minFuncSGD.m
function [opttheta] = minFuncSGD(funObj,theta,data,labels,...
options)
% Runs stochastic gradient descent with momentum to optimize the
% parameters for the given objective.
%
% Parameters:
% funObj - function handle which accepts as input theta,
% data, labels and returns cost and gradient w.r.t
% to theta.
% theta - unrolled parameter vector
% data - stores data in m x n x numExamples tensor
% labels - corresponding labels in numExamples x 1 vector
% options - struct to store specific options for optimization
%
% Returns:
% opttheta - optimized parameter vector
%
% Options (* required)
% epochs* - number of epochs through data
% alpha* - initial learning rate
% minibatch* - size of minibatch
% momentum - momentum constant, defaults to 0.9
%%======================================================================
%% Setup
assert(all(isfield(options,{'epochs','alpha','minibatch'})),...
'Some options not defined');
if ~isfield(options,'momentum')
options.momentum = 0.9;
end;
epochs = options.epochs;
alpha = options.alpha;
minibatch = options.minibatch;
m = length(labels); % training set size
% Setup for momentum
mom = 0.5;
momIncrease = 20;
velocity = zeros(size(theta));
%%======================================================================
%% SGD loop
it = 0;
for e = 1:epochs
% randomly permute indices of data for quick minibatch sampling
rp = randperm(m);
for s=1:minibatch:(m-minibatch+1)
it = it + 1;
% increase momentum after momIncrease iterations
if it == momIncrease
mom = options.momentum;
end;
% get next randomly selected minibatch
mb_data = data(:,:,rp(s:s+minibatch-1));
mb_labels = labels(rp(s:s+minibatch-1));
% evaluate the objective function on the next minibatch
[cost grad] = funObj(theta,mb_data,mb_labels);
% Instructions: Add in the weighted velocity vector to the
% gradient evaluated above scaled by the learning rate.
% Then update the current weights theta according to the
% sgd update rule
%%% YOUR CODE HERE %%%
velocity = velocity * mom + alpha * grad;
theta = theta - velocity;
fprintf('Epoch %d: Cost on iteration %d is %f\n',e,it,cost);
end;
% anneal learning rate by a factor of two after each epoch
alpha = alpha/2.0;
end;
opttheta = theta;
end
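The two-line update in the loop (velocity accumulation, then a step along the velocity) is classical momentum. A minimal sketch on a 1-D quadratic, with purely illustrative constants:
alpha = 0.1; mom = 0.9;        % illustrative values only
x = 5; v = 0;                  % minimize f(x) = 0.5*x^2, whose gradient is x
for it = 1:50
    g = x;                     % gradient at the current point
    v = mom * v + alpha * g;   % accumulate velocity
    x = x - v;                 % step along the velocity
end
fprintf('x after 50 iterations: %g\n', x);   % close to the minimum at 0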
Experimental results:

Parameter selection:
5 epochs of SGD with momentum
batch size = 256
time consumption = 1171.324 s / 60 ≈ 19.52 min
Epoch 5: Cost on iteration 1170 is 0.205146
Accuracy is 0.963500