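Preface

I have recently been working on a semantic segmentation project with FCN. While fine-tuning, I ran into a training loss that would not decrease. After some trial and error the loss curve finally came down, so I am recording the process here in the hope that it helps others.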

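1. Dataset issues

Anyone who adapts semantic segmentation to their own task runs into dataset questions. Building the dataset is actually simple: you only need to design the classes according to your needs, and you can adapt the approach from my earlier blog posts. One reminder: each pixel value from 0 to 255 in a label image stands for one class, so an 8-bit grayscale label image can encode up to 256 classes.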
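As a quick sanity check on the label images, the sketch below collects the distinct pixel values in a label mask and flags any that fall outside the expected class range. It works on a plain nested list so that it stays self-contained. The function name and the ignore value of 255 are assumptions for illustration, not something taken from the FCN code.

```python
def check_label_values(label_rows, num_classes, ignore_label=255):
    """Return (classes_found, invalid_values) for a 2D label mask.

    label_rows   -- nested list of ints, one class index per pixel
    num_classes  -- number of classes the network is configured for
    ignore_label -- value treated as "no label" (assumed default)
    """
    values = {v for row in label_rows for v in row}
    valid = {v for v in values if 0 <= v < num_classes}
    invalid = values - valid - {ignore_label}
    return sorted(valid), sorted(invalid)


# Toy 3x4 mask for a 3-class problem; 7 is a stray value, 255 is ignored.
mask = [
    [0, 0, 1, 1],
    [0, 2, 2, 255],
    [0, 7, 2, 1],
]
classes, bad = check_label_values(mask, num_classes=3)
print(classes, bad)
```

With real data, the nested list would come from whatever image library you use to load the label PNGs. Any value reported as invalid points to a labeling or export problem that is worth fixing before training.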

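When training on your own data, the number of classes differs from the one defined in the files downloaded from GitHub, so in train.prototxt and val.prototxt you need to change num_output: 21 to num_output: followed by your own number of classes. You can also obtain the network definition file deploy.prototxt by deleting the Data layer and the Loss layer from train.prototxt.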
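If there are several prototxt files to update, the substitution can be scripted. The sketch below is one possible way to do it with a regular expression. The helper name set_num_classes and the sample layer text are made up for illustration, and it assumes that no other layer in the file happens to use the same num_output value as the old class count.

```python
import re


def set_num_classes(prototxt_text, old_count, new_count):
    """Replace every `num_output: old_count` with `num_output: new_count`.

    Returns a tuple (new_text, number_of_replacements).
    """
    pattern = re.compile(r"(num_output:\s*)%d\b" % old_count)
    return pattern.subn(lambda m: m.group(1) + str(new_count), prototxt_text)


# Hypothetical, heavily shortened layer definitions.
sample = """
layer { name: "conv1_1" convolution_param { num_output: 64 } }
layer { name: "score_fr" convolution_param { num_output: 21 } }
layer { name: "upscore" convolution_param { num_output: 21 } }
"""
new_text, n = set_num_classes(sample, old_count=21, new_count=5)
print(n)
print(new_text)
```

Checking the replacement count against the number of scoring and upsampling layers you expect is a cheap way to catch an unintended match.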

2. Base learning rate

When I first trained on my own data I set base_lr to 1e-4, and the loss curve was a flat line from start to finish with no decrease at all. After I set base_lr to 1e-10, the loss dropped quickly -_-||
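One likely explanation, offered as an assumption rather than something measured in this experiment: the reference FCN training definitions set normalize: false on the loss layer, so the loss and its gradient are summed over all labeled pixels instead of averaged. Under that assumption, the sketch below estimates which averaged-loss learning rate a given base_lr roughly corresponds to. The function name and the 500x500 image size are hypothetical.

```python
def equivalent_normalized_lr(base_lr, height, width):
    """Rough learning rate that would produce the same update if the loss
    were averaged over height*width pixels instead of summed.

    Assumes every pixel is labeled and the batch size is 1.
    """
    return base_lr * height * width


# Hypothetical 500x500 training image.
for lr in (1e-4, 1e-10):
    print(lr, "->", equivalent_normalized_lr(lr, 500, 500))
```

Seen this way, a base_lr of 1e-4 behaves like a learning rate of about 25 on an averaged loss, while 1e-10 behaves like about 2.5e-5, which is much closer to the values usually used for fine-tuning.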

3. Network weight initialization

This is the original code that loads the pretrained model for fine-tuning:

solver = caffe.SGDSolver('solver.prototxt')
solver.net.copy_from(weights)

It needs to be changed to:

solver = caffe.SGDSolver('solver.prototxt')  
# The next 3 lines are the ones we need to add
vgg_net = caffe.Net(vgg_proto,vgg_weights,caffe.TRAIN)  
surgery.transplant(solver.net,vgg_net)  
del vgg_net

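What this does is first load the weights into a VGG16 network, which is the line vgg_net = caffe.Net(vgg_proto, vgg_weights, caffe.TRAIN).

The weights of vgg_net are then transferred into the current solver.net by a function: surgery.transplant(solver.net, vgg_net).

That is the whole process. The source of the transplant function is attached for reference: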

def transplant(new_net, net, suffix=''):
    """
    Transfer weights by copying matching parameters, coercing parameters of
    incompatible shape, and dropping unmatched parameters.

    The coercion is useful to convert fully connected layers to their
    equivalent convolutional layers, since the weights are the same and only
    the shapes are different.  In particular, equivalent fully connected and
    convolution layers have shapes O x I and O x I x H x W respectively for O
    outputs channels, I input channels, H kernel height, and W kernel width.

    Both  `net` to `new_net` arguments must be instantiated `caffe.Net`s.
    """
    for p in net.params:
        p_new = p + suffix
        if p_new not in new_net.params:
            print 'dropping', p
            continue
        for i in range(len(net.params[p])):
            if i > (len(new_net.params[p_new]) - 1):
                print 'dropping', p, i
                break
            if net.params[p][i].data.shape != new_net.params[p_new][i].data.shape:
                print 'coercing', p, i, 'from', net.params[p][i].data.shape, 'to', new_net.params[p_new][i].data.shape
            else:
                print 'copying', p, ' -> ', p_new, i
            new_net.params[p_new][i].data.flat = net.params[p][i].data.flat
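To see in advance which layers would be copied, coerced, or dropped, the matching logic can be replayed on shapes alone. The sketch below is a simplified dry run, not part of surgery.py. The function plan_transplant and the layer shapes are made up for illustration, and it only compares shapes without touching any weights.

```python
def plan_transplant(src_shapes, dst_shapes, suffix=""):
    """Dry run of the name/shape matching used when transplanting weights.

    src_shapes, dst_shapes -- dict mapping layer name to a list of blob
    shapes (tuples), e.g. [weight_shape, bias_shape].
    Returns a list of (layer, blob_index, action) tuples, where action is
    'copy', 'coerce' or 'drop'.
    """
    plan = []
    for name, shapes in src_shapes.items():
        new_name = name + suffix
        if new_name not in dst_shapes:
            # No layer with this name in the target net.
            plan.append((name, None, "drop"))
            continue
        for i, shape in enumerate(shapes):
            if i >= len(dst_shapes[new_name]):
                # Target layer has fewer blobs than the source layer.
                plan.append((name, i, "drop"))
                break
            action = "copy" if shape == dst_shapes[new_name][i] else "coerce"
            plan.append((name, i, action))
    return plan


# Hypothetical shapes: a VGG-style classifier vs. its fully convolutional form.
vgg = {
    "conv1_1": [(64, 3, 3, 3), (64,)],
    "fc6": [(4096, 25088), (4096,)],
    "fc8": [(1000, 4096), (1000,)],
}
fcn = {
    "conv1_1": [(64, 3, 3, 3), (64,)],
    "fc6": [(4096, 512, 7, 7), (4096,)],
    "score_fr": [(21, 4096, 1, 1), (21,)],
}
plan = plan_transplant(vgg, fcn)
for step in plan:
    print(step)
```

With pycaffe, the shape dictionaries could be built from the nets themselves, for example {k: [b.data.shape for b in v] for k, v in net.params.items()}. Layers that exist only in the target net, such as score_fr here, never appear in the plan and keep their own initialization.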

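4. Deconvolution layer initialization

Symptom: the loss does not decrease; it looks the same at the end of training as it did at the start.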

# surgeries  
interp_layers = [k for k in solver.net.params.keys() if 'up' in k]  
surgery.interp(solver.net, interp_layers)

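In the code above, the original author operates on the layers whose names contain "up", which happen to be the deconvolution layers in the training files.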
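Because the selection is a plain substring match, it is worth printing the selected names once to confirm that exactly the deconvolution layers are picked up. The sketch below reproduces the filter on a list of names. The helper select_interp_layers and the layer names are hypothetical.

```python
def select_interp_layers(layer_names, marker="up"):
    """Return the layer names that contain `marker`, preserving order."""
    return [name for name in layer_names if marker in name]


# Hypothetical layer names in the style of the FCN definitions.
names = ["conv1_1", "fc6", "score_fr", "upscore2", "score_pool4", "upscore16"]
selected = select_interp_layers(names)
print(selected)
```

If you renamed the deconvolution layers in your own prototxt so that they no longer contain "up", the filter returns an empty list and those layers are silently left out of the surgery step.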

This is the code from the corresponding file:

def interp(net, layers):
    """
    Set weights of each layer in layers to bilinear kernels for interpolation.
    """
    for l in layers:
        m, k, h, w = net.params[l][0].data.shape
        if m != k and k != 1:
            print 'input + output channels need to be the same or |output| == 1'
            raise
        if h != w:
            print 'filters need to be square'
            raise
        filt = upsample_filt(h)
        net.params[l][0].data[range(m), range(k), :, :] = filt

def upsample_filt(size):
    """
    Make a 2D bilinear kernel suitable for upsampling of the given (h, w) size.
    """
    factor = (size + 1) // 2
    if size % 2 == 1:
        center = factor - 1
    else:
        center = factor - 0.5
    og = np.ogrid[:size, :size]
    return (1 - abs(og[0] - center) / factor) * \
           (1 - abs(og[1] - center) / factor)

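It turns out that the interp function called from the official solve.py already initializes the deconvolution layer parameters with bilinear interpolation, through its upsample_filt helper. I have seen bloggers online initialize the deconvolution layers in the .prototxt file instead. That is not the right approach; use the method from the official example.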
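To make the bilinear initialization concrete, here is a pure-Python sketch of the same kernel computation, written without NumPy so it can be run anywhere. The function name bilinear_kernel is my own. It follows the formula in upsample_filt but returns a list of rows instead of an array.

```python
def bilinear_kernel(size):
    """Square 2D bilinear upsampling kernel, returned as a list of rows."""
    factor = (size + 1) // 2
    # Odd sizes are centered on a pixel, even sizes between two pixels.
    center = factor - 1 if size % 2 == 1 else factor - 0.5
    # 1D triangle weights; `/` is true division in Python 3.
    weights = [1 - abs(i - center) / factor for i in range(size)]
    # The 2D kernel is the outer product of the 1D weights with themselves.
    return [[wy * wx for wx in weights] for wy in weights]


for row in bilinear_kernel(4):
    print(row)
```

For a 4x4 kernel the 1D weights are 0.25, 0.75, 0.75, 0.25, so the weights peak in the middle and fall off linearly toward the edges. A deconvolution layer initialized this way starts out as plain bilinear upsampling.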

5. Results

This is the result I finally obtained after more than 20,000 training iterations:

[Images: results after training]
