
DeepLearningToolBox learning notes: RBM (Restricted Boltzmann Machine). 1. mnistdeepauto, 2. rbm, 3. converter, 4. makebatches, 5. backprop, 6. CG_MNIST, 7. minimize (the conjugate-gradient optimizer)

Download: DeepLearningToolBox

Some basic knowledge of RBMs is needed before reading the RBM code.

There are plenty of references online; I followed one particularly well-written series, which is enough to understand the essentials: http://blog.csdn.net/itplus/article/details/19168937

Its table of contents is as follows:

(1) Preliminaries

(2) Network structure

(3) Energy function and probability distribution

(4) Log-likelihood function

(5) Gradient formulas

(6) The Contrastive Divergence algorithm

(7) The RBM training algorithm

(8) Evaluating an RBM

After working through that series, the principles of RBMs should be reasonably clear, so let's move on to the corresponding code in the toolbox. This post draws on several blogs; thanks to the original authors.

1. mnistdeepauto

This code corresponds to Hinton's Science paper "Reducing the Dimensionality of Data with Neural Networks". It uses the MNIST database for deep autoencoder compression, trained with unsupervised learning; the evaluation criterion is the mean squared reconstruction error (MSE).

% Version 1.000
%
% Code provided by Ruslan Salakhutdinov and Geoff Hinton  
%
% Permission is granted for anyone to copy, use, modify, or distribute this
% program and accompanying programs and documents for any purpose, provided
% this copyright notice is retained and prominently displayed, along with
% a note saying that the original programs are available from our 
% web page. 
% The programs and documents are distributed without any warranty, express or
% implied.  As the programs were written for research purposes only, they have
% not been tested to the degree that would be advisable in any important
% application.  All use of these programs is entirely at the user's own risk.


% This program pretrains a deep autoencoder for MNIST dataset
% You can set the maximum number of epochs for pretraining each layer
% and you can set the architecture of the multilayer net.

clc
clear all
close all

maxepoch=10; %In the Science paper we use maxepoch=50, but it works just fine. 
numhid=1000; numpen=500; numpen2=250; numopen=30;

fprintf(1,'Converting Raw files into Matlab format \n');
converter; % convert the raw files into Matlab format

fprintf(1,'Pretraining a deep autoencoder. \n');
fprintf(1,'The Science paper used 50 epochs. This uses %3i \n', maxepoch);

makebatches;
[numcases numdims numbatches]=size(batchdata);

fprintf(1,'Pretraining Layer 1 with RBM: %d-%d \n',numdims,numhid);
restart=1;
rbm;
hidrecbiases=hidbiases; % hidbiases are the hidden-layer biases
save mnistvh vishid hidrecbiases visbiases;
% save the first layer's parameters: weights, hidden biases, visible biases

fprintf(1,'\nPretraining Layer 2 with RBM: %d-%d \n',numhid,numpen);
batchdata=batchposhidprobs; % batchposhidprobs holds the hidden-unit probabilities output by the first RBM
numhid=numpen;
restart=1;
rbm;
hidpen=vishid; penrecbiases=hidbiases; hidgenbiases=visbiases;
save mnisthp hidpen penrecbiases hidgenbiases;
% mnisthp is the name of the file being saved

fprintf(1,'\nPretraining Layer 3 with RBM: %d-%d \n',numpen,numpen2);
batchdata=batchposhidprobs;
numhid=numpen2;
restart=1;
rbm;
hidpen2=vishid; penrecbiases2=hidbiases; hidgenbiases2=visbiases;
save mnisthp2 hidpen2 penrecbiases2 hidgenbiases2;

fprintf(1,'\nPretraining Layer 4 with RBM: %d-%d \n',numpen2,numopen);
batchdata=batchposhidprobs;
numhid=numopen; 
restart=1;
rbmhidlinear;
hidtop=vishid; toprecbiases=hidbiases; topgenbiases=visbiases;
save mnistpo hidtop toprecbiases topgenbiases;

backprop; % fine-tune the whole network with backpropagation

This trains a deep autoencoder with 4 hidden layers: the input layer is 784-dimensional, and the hidden layers have 1000, 500, 250, and 30 units respectively. The procedure for obtaining the network weights is as follows (a minimal sketch of the layer-wise loop follows the list):

  1. Train the first RBM, formed by the 784-dimensional input layer and the first 1000-unit hidden layer, with the RBM optimization procedure on the training samples; once it is trained, compute the hidden-layer outputs for the training samples.
  2. Use the result of step 1 as the input for training the second RBM, optimize it the same way, and compute its outputs; train the third and fourth RBMs in the same manner.
  3. Unroll the four RBMs into one network, split into an encoder part and a decoder part, and initialize this new network with the weights obtained in steps 1 and 2.
  4. Since the new network's output layer has as many units as the original input layer, the input itself can serve as the target output; backpropagation then yields the network's cost function and its partial derivatives.
  5. Starting from the initialization of step 3 and using the cost and derivatives of step 4, optimize the whole network with conjugate gradient descent to obtain the final weights. The entire process is unsupervised.
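To make steps 1 and 2 concrete, here is a minimal sketch of the greedy layer-wise loop. rbmtrain is a hypothetical helper standing in for the rbm script above: it trains one RBM with CD-1 and returns its parameters together with the hidden-layer probabilities that feed the next layer.

layers = [784 1000 500 250 30];
X = batchdata;                        % numcases x numdims x numbatches
W = cell(1, numel(layers)-1); hb = W; vb = W;
for l = 1:numel(layers)-1
    % hypothetical helper: one CD-1-trained RBM per layer
    [W{l}, hb{l}, vb{l}, X] = rbmtrain(X, layers(l+1));
end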

2. rbm

% Version 1.000 
%
% Code provided by Geoff Hinton and Ruslan Salakhutdinov 
%
% Permission is granted for anyone to copy, use, modify, or distribute this
% program and accompanying programs and documents for any purpose, provided
% this copyright notice is retained and prominently displayed, along with
% a note saying that the original programs are available from our
% web page.
% The programs and documents are distributed without any warranty, express or
% implied.  As the programs were written for research purposes only, they have
% not been tested to the degree that would be advisable in any important
% application.  All use of these programs is entirely at the user's own risk.

% This program trains Restricted Boltzmann Machine in which
% visible, binary, stochastic pixels are connected to
% hidden, binary, stochastic feature detectors using symmetrically
% weighted connections. Learning is done with 1-step Contrastive Divergence.   
% The program assumes that the following variables are set externally:
% maxepoch  -- maximum number of epochs
% numhid    -- number of hidden units 
% batchdata -- the data that is divided into batches (numcases numdims numbatches)
% restart   -- set to 1 if learning starts from beginning 

epsilonw      = 0.1;   % Learning rate for weights 
epsilonvb     = 0.1;   % Learning rate for biases of visible units 
epsilonhb     = 0.1;   % Learning rate for biases of hidden units 
weightcost  = 0.0002;   
initialmomentum  = 0.5;
finalmomentum    = 0.9;
 % note that the hidden and visible layers have separate bias vectors, while the weight matrix is of course shared between the two directions
 
[numcases numdims numbatches]=size(batchdata); % [100, 784, 600]

if restart ==1,
  restart=0;
  epoch=1;

% Initializing symmetric weights and biases. 
  vishid     = 0.1*randn(numdims, numhid); % small random initial weights, 784x1000
  hidbiases  = zeros(1,numhid);
  visbiases  = zeros(1,numdims);

  poshidprobs = zeros(numcases,numhid); % 100x1000, hidden-unit probabilities of one batch in the positive phase
  neghidprobs = zeros(numcases,numhid);
  posprods    = zeros(numdims,numhid);
  negprods    = zeros(numdims,numhid);
  vishidinc  = zeros(numdims,numhid);
  hidbiasinc = zeros(1,numhid);
  visbiasinc = zeros(1,numdims);
  batchposhidprobs=zeros(numcases,numhid,numbatches);
  % hidden-unit probabilities of the whole dataset in the positive phase
end

for epoch = epoch:maxepoch,
 fprintf(1,'epoch %d\r',epoch); 
 errsum=0;
 for batch = 1:numbatches, % every epoch sweeps through all the batches
 fprintf(1,'epoch %d batch %d\r',epoch,batch); 

%%%%%%%%% START POSITIVE PHASE %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  data = batchdata(:,:,batch);
  % take one batch per iteration; each row is one sample (the values here are doubles, not 0/1; strictly speaking they should be binarized)
  poshidprobs = 1./(1 + exp(-data*vishid - repmat(hidbiases,numcases,1))); 
  % hidden-unit output probabilities in the positive (upward) phase
  batchposhidprobs(:,:,batch)=poshidprobs;
  posprods    = data' * poshidprobs;
  % 784x1000; each element accumulates the product of a visible and a hidden unit over this batch: the positive-phase statistics used in the CD gradient
  poshidact   = sum(poshidprobs); % sum over the samples
  posvisact = sum(data);

%%%%%%%%% END OF POSITIVE PHASE  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  poshidstates = poshidprobs > rand(numcases,numhid);
  %将隐含層資料01化(此步驟在posprods之後進行),按照機率值大小來判定.

%%%%%%%%% START NEGATIVE PHASE  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  negdata = 1./(1 + exp(-poshidstates*vishid' - repmat(visbiases,numcases,1))); % visible-layer reconstruction from the downward pass
  neghidprobs = 1./(1 + exp(-negdata*vishid - repmat(hidbiases,numcases,1))); % hidden probabilities from the immediate upward pass after reconstruction
  negprods  = negdata'*neghidprobs; % likewise, the negative-phase statistics, 784x1000
  neghidact = sum(neghidprobs);
  negvisact = sum(negdata); 

%%%%%%%%% END OF NEGATIVE PHASE %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  err= sum(sum( (data-negdata).^2 )); % squared reconstruction error
  errsum = err + errsum;

   if epoch>5,
     momentum=finalmomentum;
     % momentum keeps a fraction of the previous weight increment; with fewer iterations this fraction can be a bit larger
   else
     momentum=initialmomentum;
   end;

%%%%%%%%% UPDATE WEIGHTS AND BIASES %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 
    vishidinc = momentum*vishidinc + ...
                epsilonw*( (posprods-negprods)/numcases - weightcost*vishid);
    visbiasinc = momentum*visbiasinc + (epsilonvb/numcases)*(posvisact-negvisact);
    hidbiasinc = momentum*hidbiasinc + (epsilonhb/numcases)*(poshidact-neghidact);

    vishid = vishid + vishidinc;
    visbiases = visbiases + visbiasinc;
    hidbiases = hidbiases + hidbiasinc;

%%%%%%%%%%%%%%%% END OF UPDATES %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 

  end
  fprintf(1, 'epoch %4i error %6.1f  \n', epoch, errsum); 
end;

Here is roughly how the program optimizes the RBM weights (assume a 2-layer RBM, i.e. just a visible and a hidden layer, both with binary units):

  1. Randomly initialize a weight matrix w and a bias vector b.
  2. Propagate the visible input matrix v upward to get the hidden output matrix h, and compute the mean matrix of the products of corresponding v and h units.
  3. The h from step 2 consists of probabilities; stochastically binarize it to 0/1 values.
  4. Propagate the binarized h from step 3 back down to get the visible matrix v'. (Strictly speaking, v' should be binarized as well.)
  5. Propagate v' upward to get the hidden matrix h', and compute the mean matrix of the products of corresponding v' and h' units.
  6. Subtract the mean matrix of step 5 from that of step 2; the result is the weight-increment matrix.
  7. Scale by the corresponding learning rate and apply the weight-update rule.
  8. Repeat steps 2 through 7 until convergence.

  The bias-optimization steps:

  1. Randomly initialize a weight matrix w and a bias vector b.
  2. Propagate the visible input matrix v upward to get the hidden output matrix h, and compute the mean vector over samples on the v side and on the h side.
  3. Stochastically binarize the probability matrix h from step 2 to 0/1 values.
  4. Propagate the binarized h from step 3 back down to get the visible matrix v'.
  5. Propagate v' upward to get the hidden matrix h', and compute the mean vectors on the v' side and the h' side.
  6. Subtract the v' mean vector of step 5 from the v mean vector of step 2: this is the increment for the visible-layer biases. Subtract the h' mean vector of step 5 from the h mean vector of step 2: this is the increment for the hidden-layer biases.
  7. Scale by the corresponding learning rates and apply the bias-update rule.
  8. Repeat steps 2 through 7 until convergence.

  Of course, the weight and bias updates are applied together in every iteration, so they converge together. The update rule can also be varied slightly, for example by adding a momentum term so that each update keeps a fraction of the previous increment. One full CD-1 update combining all of the above is sketched below.
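A minimal sketch of one CD-1 update for a single minibatch, following the update rules of rbm.m above. Here v0 is the minibatch (numcases x numdims), W the weight matrix, b_vis and b_hid the bias row vectors, and dW, db_vis, db_hid the momentum-carrying increments; the variable names are ours, the formulas are those of the code.

sigm = @(x) 1 ./ (1 + exp(-x));                    % logistic sigmoid
h0  = sigm(v0*W   + repmat(b_hid, numcases, 1));   % positive phase
h0s = h0 > rand(size(h0));                         % sample binary hidden states
v1  = sigm(h0s*W' + repmat(b_vis, numcases, 1));   % downward reconstruction
h1  = sigm(v1*W   + repmat(b_hid, numcases, 1));   % negative phase
dW     = momentum*dW     + epsilonw*((v0'*h0 - v1'*h1)/numcases - weightcost*W);
db_vis = momentum*db_vis + (epsilonvb/numcases)*(sum(v0) - sum(v1));
db_hid = momentum*db_hid + (epsilonhb/numcases)*(sum(h0) - sum(h1));
W = W + dW;  b_vis = b_vis + db_vis;  b_hid = b_hid + db_hid;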

3. converter

Converts the sample set from .ubyte format to .ascii format, and then on to .mat format.
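For reference, a minimal sketch of reading a raw MNIST image file directly into a matrix, assuming the standard IDX layout (a big-endian header of magic number, image count, rows, and cols, followed by one byte per pixel); it arrives at the same kind of matrix as converter's ubyte to ascii to mat chain.

f = fopen('train-images-idx3-ubyte', 'r', 'b');  % 'b' = big-endian
magic = fread(f, 1, 'int32');                    % should be 2051 for image files
n     = fread(f, 1, 'int32');                    % number of images
rows  = fread(f, 1, 'int32');
cols  = fread(f, 1, 'int32');
images = fread(f, [rows*cols, n], 'uint8')';     % n x 784, one image per row
fclose(f);
images = images / 255;                           % scale to [0,1], as the toolbox expects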

4. makebatches

實作的是将原本的2維資料集變成3維的,因為分了多個批次,另外1維表示的是批次。

5. backprop

Fine-tunes the unrolled autoencoder by backpropagating the error.

% Version 1.000
%
% Code provided by Ruslan Salakhutdinov and Geoff Hinton
%
% Permission is granted for anyone to copy, use, modify, or distribute this
% program and accompanying programs and documents for any purpose, provided
% this copyright notice is retained and prominently displayed, along with
% a note saying that the original programs are available from our
% web page.
% The programs and documents are distributed without any warranty, express or
% implied.  As the programs were written for research purposes only, they have
% not been tested to the degree that would be advisable in any important
% application.  All use of these programs is entirely at the user's own risk.

% This program fine-tunes an autoencoder with backpropagation.
% Weights of the autoencoder are going to be saved in mnist_weights.mat
% and training and test reconstruction errors in mnist_error.mat
% You can also set maxepoch, default value is 200 as in our paper.  

maxepoch=200;
fprintf(1,'\nFine-tuning deep autoencoder by minimizing cross entropy error. \n'); % fine-tuning minimizes the cross-entropy error
fprintf(1,'60 batches of 1000 cases each. \n');

load mnistvh % load the parameters of the four pretrained RBMs in turn
load mnisthp
load mnisthp2
load mnistpo 

makebatches;
[numcases numdims numbatches]=size(batchdata);
N=numcases; 

%%%% PREINITIALIZE WEIGHTS OF THE AUTOENCODER %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
w1=[vishid; hidrecbiases]; % stack each layer's weights and biases into a single matrix
w2=[hidpen; penrecbiases];
w3=[hidpen2; penrecbiases2];
w4=[hidtop; toprecbiases];
w5=[hidtop'; topgenbiases]; 
w6=[hidpen2'; hidgenbiases2]; 
w7=[hidpen'; hidgenbiases]; 
w8=[vishid'; visbiases];

%%%%%%%%%% END OF PREINITIALIZATION OF WEIGHTS  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

l1=size(w1,1)-1; % number of units in each layer
l2=size(w2,1)-1;
l3=size(w3,1)-1;
l4=size(w4,1)-1;
l5=size(w5,1)-1;
l6=size(w6,1)-1;
l7=size(w7,1)-1;
l8=size(w8,1)-1;
l9=l1;  % the output layer has as many units as the input layer
test_err=[];
train_err=[];


for epoch = 1:maxepoch

%%%%%%%%%%%%%%%%%%%% COMPUTE TRAINING RECONSTRUCTION ERROR %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
err=0; 
[numcases numdims numbatches]=size(batchdata);
N=numcases;
 for batch = 1:numbatches
  data = [batchdata(:,:,batch)];
  data = [data ones(N,1)];  % append a column of ones for the bias term
  w1probs = 1./(1 + exp(-data*w1)); w1probs = [w1probs  ones(N,1)];
  % forward pass: compute each layer's output, appending a constant-1 column each time
  w2probs = 1./(1 + exp(-w1probs*w2)); w2probs = [w2probs ones(N,1)];
  w3probs = 1./(1 + exp(-w2probs*w3)); w3probs = [w3probs  ones(N,1)];
  w4probs = w3probs*w4; w4probs = [w4probs  ones(N,1)];
  w5probs = 1./(1 + exp(-w4probs*w5)); w5probs = [w5probs  ones(N,1)];
  w6probs = 1./(1 + exp(-w5probs*w6)); w6probs = [w6probs  ones(N,1)];
  w7probs = 1./(1 + exp(-w6probs*w7)); w7probs = [w7probs  ones(N,1)];
  dataout = 1./(1 + exp(-w7probs*w8));
  err= err +  1/N*sum(sum( (data(:,1:end-1)-dataout).^2 )); % reconstruction error
  end
 train_err(epoch)=err/numbatches; % average reconstruction error over the training set

%%%%%%%%%%%%%% END OF COMPUTING TRAINING RECONSTRUCTION ERROR %%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%%%% DISPLAY FIGURE TOP ROW REAL DATA BOTTOM ROW RECONSTRUCTIONS %%%%%%%%%%%%%%%%%%%%%%%%%
fprintf(1,'Displaying in figure 1: Top row - real data, Bottom row -- reconstructions \n');
output=[];
 for ii=1:15
  output = [output data(ii,1:end-1)' dataout(ii,:)']; % 15 digit pairs, each of two columns: the original and its reconstruction
 end
   if epoch==1 
   close all 
   figure('Position',[100,600,1000,200]);
   else 
   figure(1)
   end 
   mnistdisp(output);
   drawnow;

%%%%%%%%%%%%%%%%%%%% COMPUTE TEST RECONSTRUCTION ERROR %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
[testnumcases testnumdims testnumbatches]=size(testbatchdata);
N=testnumcases;
err=0;
for batch = 1:testnumbatches
  data = [testbatchdata(:,:,batch)];
  data = [data ones(N,1)];
  w1probs = 1./(1 + exp(-data*w1)); w1probs = [w1probs  ones(N,1)];
  w2probs = 1./(1 + exp(-w1probs*w2)); w2probs = [w2probs ones(N,1)];
  w3probs = 1./(1 + exp(-w2probs*w3)); w3probs = [w3probs  ones(N,1)];
  w4probs = w3probs*w4; w4probs = [w4probs  ones(N,1)];
  w5probs = 1./(1 + exp(-w4probs*w5)); w5probs = [w5probs  ones(N,1)];
  w6probs = 1./(1 + exp(-w5probs*w6)); w6probs = [w6probs  ones(N,1)];
  w7probs = 1./(1 + exp(-w6probs*w7)); w7probs = [w7probs  ones(N,1)];
  dataout = 1./(1 + exp(-w7probs*w8));
  err = err +  1/N*sum(sum( (data(:,1:end-1)-dataout).^2 ));
  end
 test_err(epoch)=err/testnumbatches;
 fprintf(1,'Before epoch %d Train squared error: %6.3f Test squared error: %6.3f \t \t \n',epoch,train_err(epoch),test_err(epoch));

%%%%%%%%%%%%%% END OF COMPUTING TEST RECONSTRUCTION ERROR %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

 tt=0;
 for batch = 1:numbatches/10 % the 600 training batches are combined into 60 larger ones
 fprintf(1,'epoch %d batch %d\r',epoch,batch);

%%%%%%%%%%% COMBINE 10 MINIBATCHES INTO 1 LARGER MINIBATCH %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 tt=tt+1; 
 data=[];
 for kk=1:10
  data=[data 
        batchdata(:,:,(tt-1)*10+kk)]; 
 end 

%%%%%%%%%%%%%%% PERFORM CONJUGATE GRADIENT WITH 3 LINESEARCHES %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%% conjugate gradient with line searches
  max_iter=3;
  VV = [w1(:)' w2(:)' w3(:)' w4(:)' w5(:)' w6(:)' w7(:)' w8(:)']';
  % flatten all the weights (biases included) into one long column vector
  Dim = [l1; l2; l3; l4; l5; l6; l7; l8; l9];
  % number of units in each layer (excluding the bias units)
  [X, fX] = minimize(VV,'CG_MNIST',max_iter,Dim,data); % optimize the parameter vector X by conjugate gradients
  
  w1 = reshape(X(1:(l1+1)*l2),l1+1,l2);
  xxx = (l1+1)*l2;
  w2 = reshape(X(xxx+1:xxx+(l2+1)*l3),l2+1,l3);
  xxx = xxx+(l2+1)*l3;
  w3 = reshape(X(xxx+1:xxx+(l3+1)*l4),l3+1,l4);
  xxx = xxx+(l3+1)*l4;
  w4 = reshape(X(xxx+1:xxx+(l4+1)*l5),l4+1,l5);
  xxx = xxx+(l4+1)*l5;
  w5 = reshape(X(xxx+1:xxx+(l5+1)*l6),l5+1,l6);
  xxx = xxx+(l5+1)*l6;
  w6 = reshape(X(xxx+1:xxx+(l6+1)*l7),l6+1,l7);
  xxx = xxx+(l6+1)*l7;
  w7 = reshape(X(xxx+1:xxx+(l7+1)*l8),l7+1,l8);
  xxx = xxx+(l7+1)*l8;
  w8 = reshape(X(xxx+1:xxx+(l8+1)*l9),l8+1,l9);
% unpack the optimized parameters back into the per-layer weight matrices
%%%%%%%%%%%%%%% END OF CONJUGATE GRADIENT WITH 3 LINESEARCHES %%%%%%%%%%%%%%%%%%%%%%%%%%%%%

 end

 save mnist_weights w1 w2 w3 w4 w5 w6 w7 w8 % the first argument is the file name
 save mnist_error test_err train_err;

end

6. CG_MNIST

 The function CG_MNIST has the following signature:

  function [f, df] = CG_MNIST(VV,Dim,XX);

  This function computes the network's cost value f and the partial derivatives df of f with respect to every parameter, handling weights and biases together. VV is the column vector of all network parameters, Dim is the vector of per-layer unit counts, and XX is the set of training samples; f and df are the cost and gradient values returned. A minimal sketch of this [f, df] contract follows.
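A minimal sketch of the [f, df] contract that minimize expects, shown for a single sigmoid layer that maps the input straight back to itself instead of the full 8-layer unrolled net; the real CG_MNIST unpacks VV into w1..w8 via Dim and backpropagates through every layer, but the cross-entropy cost and the gradient shape are the same. cg_layer is a hypothetical name.

function [f, df] = cg_layer(VV, Dim, XX)
% VV: parameters as one column; Dim = [numin; numin], since the layer
% reconstructs its input; XX: N x numin data matrix
N = size(XX, 1);
w = reshape(VV, Dim(1)+1, Dim(2));           % weights plus a bias row
data = [XX ones(N,1)];                       % append the bias column
out = 1 ./ (1 + exp(-data*w));               % sigmoid reconstruction
f = -1/N * sum(sum( XX.*log(out) + (1-XX).*log(1-out) ));  % cross entropy
delta = (out - XX)/N;                        % df/d(pre-activation) for sigmoid + cross entropy
df = reshape(data'*delta, [], 1);            % gradient, same shape as VV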

7. minimize: the conjugate-gradient optimization function

  [X, fX, i] = minimize(X, f, length, P1, P2, P3, ... )

  This function optimizes the parameter vector X (a column vector of all network parameters) by conjugate gradients. f is the name of a function that computes the cost and the partial derivatives of the cost with respect to X; it is called with X followed by the extra arguments P1, P2, P3, … given to minimize. length is the maximum number of line searches. The returned X holds the optimized parameters, fX records the cost values achieved along the way, and i is the number of line searches (iterations) performed. A toy usage example follows.
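A toy usage example of this interface, minimizing the quadratic f(x) = 0.5*||x - c||^2, whose gradient is x - c; the extra argument c is passed through to the objective as P1. quadcost is a hypothetical objective that would live in its own quadcost.m file.

% quadcost.m - hypothetical objective returning cost and gradient
function [f, df] = quadcost(x, c)
f = 0.5 * sum((x - c).^2);
df = x - c;

% calling script
x0 = randn(5,1);
c  = ones(5,1);
[X, fX, i] = minimize(x0, 'quadcost', 20, c);  % at most 20 line searches
% X should now be close to c; fX records the cost after each line search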

This post draws on the following blog:

http://www.cnblogs.com/tornadomeet/archive/2013/04/30/3052349.html
