
Semantic Segmentation Code Walkthrough, Part 1 (FCN, UNet, SegNet, Dilated Convolution)

1. FCN

[figure: FCN architecture]

Simply put, the difference between an FCN and a CNN is that the fully connected layers at the end of the CNN are replaced with convolutional layers. This lets the convolutional network slide over a larger input image and produce multiple outputs, and the conversion allows all of this to be done in a single forward pass.

As the FCN paper puts it: "When these receptive fields overlap significantly, both feedforward computation and backpropagation are much more efficient when computed layer-by-layer over an entire image instead of independently patch-by-patch."
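The FC-to-convolution conversion described above can be checked directly. The sketch below (layer sizes are illustrative, not from the paper) shows that a fully connected layer over a 7x7x512 feature map is equivalent to a 7x7 convolution, and that the convolutional version also accepts larger inputs, yielding one prediction per spatial location:

```python
import torch
import torch.nn as nn

# Illustrative sizes: a classifier head over a 7x7x512 feature map.
fc = nn.Linear(512 * 7 * 7, 10)
conv = nn.Conv2d(512, 10, kernel_size=7)

# Copy the FC weights into the convolution: the two layers are then equivalent.
with torch.no_grad():
    conv.weight.copy_(fc.weight.view(10, 512, 7, 7))
    conv.bias.copy_(fc.bias)

feat = torch.randn(1, 512, 7, 7)
out_fc = fc(feat.flatten(1))          # shape (1, 10)
out_conv = conv(feat).flatten(1)      # shape (1, 10), numerically the same

# On a larger input the conv version slides, giving one output per position.
bigger = torch.randn(1, 512, 9, 9)
print(conv(bigger).shape)  # torch.Size([1, 10, 3, 3])
```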

(1) Forward pass

The backbone of the fully convolutional network passes the input through several convolutional layers in the order shown in the figure above, followed by several dropout layers, producing predictions at different depths. The paper then fuses the features from these layers in the manner shown in the figure below.

[figure: fusing predictions from different layers]

FCN32s means the feature map is 1/32 of the original image size. Any ordinary classification network can be used to extract feature maps down to that size, which are then fused in the manner shown above:

  1. The prediction from pool5 (FCN32s) is upsampled by a factor of 2 and added to the prediction from pool4, giving FCN16s.

  2. FCN16s is upsampled by a factor of 2 and added to the prediction from pool3, giving FCN8s.

The upsampling step is illustrated below:

[figure: upsampling]

(2) Backward pass

The backward pass uses transposed convolution (deconvolution), illustrated below:

[figure: transposed convolution]
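The "upsample by 2 and add" fusion step can be sketched with a transposed convolution; the channel counts and spatial sizes below are illustrative:

```python
import torch
import torch.nn as nn

# Upsample a 1/32-resolution prediction by 2x via transposed convolution
up2 = nn.ConvTranspose2d(512, 512, kernel_size=3, stride=2,
                         padding=1, output_padding=1)

score_pool5 = torch.randn(1, 512, 7, 7)    # prediction features at 1/32 resolution
score_pool4 = torch.randn(1, 512, 14, 14)  # features from pool4 at 1/16 resolution
fused = up2(score_pool5) + score_pool4     # upsample 2x, then element-wise add
print(fused.shape)  # torch.Size([1, 512, 14, 14])
```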

Below is an example FCN8s implementation in PyTorch; note that the number of channels in each layer differs from the paper.

import torch.nn as nn

class FCN8s(nn.Module):
    def __init__(self, pretrained_net, n_class):
        super().__init__()
        self.n_class = n_class
        self.pretrained_net = pretrained_net
        self.relu    = nn.ReLU(inplace=True)
        self.deconv1 = nn.ConvTranspose2d(512, 512, kernel_size=3, stride=2, padding=1, dilation=1, output_padding=1)
        self.bn1     = nn.BatchNorm2d(512)
        self.deconv2 = nn.ConvTranspose2d(512, 256, kernel_size=3, stride=2, padding=1, dilation=1, output_padding=1)
        self.bn2     = nn.BatchNorm2d(256)
        self.deconv3 = nn.ConvTranspose2d(256, 128, kernel_size=3, stride=2, padding=1, dilation=1, output_padding=1)
        self.bn3     = nn.BatchNorm2d(128)
        self.deconv4 = nn.ConvTranspose2d(128, 64, kernel_size=3, stride=2, padding=1, dilation=1, output_padding=1)
        self.bn4     = nn.BatchNorm2d(64)
        self.deconv5 = nn.ConvTranspose2d(64, 32, kernel_size=3, stride=2, padding=1, dilation=1, output_padding=1)
        self.bn5     = nn.BatchNorm2d(32)
        self.classifier = nn.Conv2d(32, n_class, kernel_size=1)

    def forward(self, x):
        # pretrained_net can be a classification network such as VGG
        output = self.pretrained_net(x)
        x5 = output['x5']  # size=(N, 512, x.H/32, x.W/32)
        x4 = output['x4']  # size=(N, 512, x.H/16, x.W/16)
        x3 = output['x3']  # size=(N, 256, x.H/8,  x.W/8)

        score = self.relu(self.deconv1(x5))               # size=(N, 512, x.H/16, x.W/16)
        score = self.bn1(score + x4)                      # element-wise add, size=(N, 512, x.H/16, x.W/16)
        score = self.relu(self.deconv2(score))            # size=(N, 256, x.H/8, x.W/8)
        score = self.bn2(score + x3)                      # element-wise add, size=(N, 256, x.H/8, x.W/8)
        score = self.bn3(self.relu(self.deconv3(score)))  # size=(N, 128, x.H/4, x.W/4)
        score = self.bn4(self.relu(self.deconv4(score)))  # size=(N, 64, x.H/2, x.W/2)
        score = self.bn5(self.relu(self.deconv5(score)))  # size=(N, 32, x.H, x.W)
        score = self.classifier(score)                    # size=(N, n_class, x.H/1, x.W/1)

        return score  # size=(N, n_class, x.H/1, x.W/1)
           

2. UNet

[figure: UNet architecture]

UNet is a symmetric semantic segmentation model consisting of a contracting path and a symmetric expanding path: the contracting path captures context, while the expanding path enables precise localization of the segmentation boundary. Below is an example UNet implementation in PyTorch.

import torch
import torch.nn as nn

class conv_block(nn.Module):
    """
    Convolution Block 
    """
    def __init__(self, in_ch, out_ch):
        super(conv_block, self).__init__()
        
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=1, padding=1, bias=True),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, stride=1, padding=1, bias=True),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True))

    def forward(self, x):
        x = self.conv(x)
        return x


class up_conv(nn.Module):
    """
    Up Convolution Block
    """
    def __init__(self, in_ch, out_ch):
        super(up_conv, self).__init__()
        self.up = nn.Sequential(
            nn.Upsample(scale_factor=2),
            nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=1, padding=1, bias=True),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True)
        )

    def forward(self, x):
        x = self.up(x)
        return x


class U_Net(nn.Module):
    """
    UNet - Basic Implementation
    Paper : https://arxiv.org/abs/1505.04597
    """
    def __init__(self, in_ch=3, out_ch=1):
        super(U_Net, self).__init__()

        n1 = 64
        filters = [n1, n1 * 2, n1 * 4, n1 * 8, n1 * 16]
        
        self.Maxpool1 = nn.MaxPool2d(kernel_size=2, stride=2)
        self.Maxpool2 = nn.MaxPool2d(kernel_size=2, stride=2)
        self.Maxpool3 = nn.MaxPool2d(kernel_size=2, stride=2)
        self.Maxpool4 = nn.MaxPool2d(kernel_size=2, stride=2)

        self.Conv1 = conv_block(in_ch, filters[0])
        self.Conv2 = conv_block(filters[0], filters[1])
        self.Conv3 = conv_block(filters[1], filters[2])
        self.Conv4 = conv_block(filters[2], filters[3])
        self.Conv5 = conv_block(filters[3], filters[4])

        self.Up5 = up_conv(filters[4], filters[3])
        self.Up_conv5 = conv_block(filters[4], filters[3])

        self.Up4 = up_conv(filters[3], filters[2])
        self.Up_conv4 = conv_block(filters[3], filters[2])

        self.Up3 = up_conv(filters[2], filters[1])
        self.Up_conv3 = conv_block(filters[2], filters[1])

        self.Up2 = up_conv(filters[1], filters[0])
        self.Up_conv2 = conv_block(filters[1], filters[0])

        self.Conv = nn.Conv2d(filters[0], out_ch, kernel_size=1, stride=1, padding=0)

       # self.active = torch.nn.Sigmoid()

    def forward(self, x):
        # contracting path
        e1 = self.Conv1(x)

        e2 = self.Maxpool1(e1)
        e2 = self.Conv2(e2)

        e3 = self.Maxpool2(e2)
        e3 = self.Conv3(e3)

        e4 = self.Maxpool3(e3)
        e4 = self.Conv4(e4)

        e5 = self.Maxpool4(e4)
        e5 = self.Conv5(e5)
        
        # expanding path
        d5 = self.Up5(e5)
        d5 = torch.cat((e4, d5), dim=1)

        d5 = self.Up_conv5(d5)

        d4 = self.Up4(d5)
        # feature fusion (concatenation)
        d4 = torch.cat((e3, d4), dim=1)
        d4 = self.Up_conv4(d4)

        d3 = self.Up3(d4)
        # feature fusion (concatenation)
        d3 = torch.cat((e2, d3), dim=1)
        d3 = self.Up_conv3(d3)

        d2 = self.Up2(d3)
        # feature fusion (concatenation)
        d2 = torch.cat((e1, d2), dim=1)
        d2 = self.Up_conv2(d2)

        out = self.Conv(d2)
        return out
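Note that UNet fuses encoder and decoder features by channel-wise concatenation (torch.cat along dim=1) rather than element-wise addition as in FCN, which is why each Up_conv block takes twice the channels of its matching up_conv. A minimal illustration (sizes are illustrative):

```python
import torch

# UNet skip connection: concatenate along the channel dimension (dim=1)
encoder_feat = torch.randn(1, 512, 64, 64)   # e.g. e4 from the contracting path
decoder_feat = torch.randn(1, 512, 64, 64)   # e.g. d5 after up_conv
fused = torch.cat((encoder_feat, decoder_feat), dim=1)
print(fused.shape)  # torch.Size([1, 1024, 64, 64])
```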
           

3. SegNet

[figure: SegNet architecture]

SegNet is very similar in spirit to FCN; the difference lies in the techniques used in the Encoder and the Decoder (Upsampling). In SegNet, Pooling and UnPooling carry an additional index mechanism.

[figure: max pooling (left to right) and upsampling (right to left)]

As shown in the figure above, going left to right is a max-pooling process, and going right to left is an upsampling process. Upsampling, however, involves an ambiguity: each 1x1 feature point must be placed somewhere in a 2x2 region of the output. Assigning it to a random position, or always to some fixed position, introduces error, which is undesirable.

So when the Decoder needs to upsample a feature map, it can use the Pooling Indices recorded by the corresponding pooling layer to determine which position in the upsampled 2x2 region each 1x1 feature point should occupy. This process is illustrated below.

[figure: unpooling with pooling indices]

This functionality is built into PyTorch: when using MaxPool2d, set return_indices to True (it defaults to False). The parameters of MaxPool2d are shown below:

[figure: MaxPool2d parameters]

An example using return_indices:

import torch.nn as nn
pool = nn.MaxPool2d(2, stride=2, return_indices=True)
unpool = nn.MaxUnpool2d(2, stride=2)
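Extending this into a concrete round trip makes the index mechanism visible: pooling records where each maximum came from, and unpooling puts it back there, filling the rest with zeros. (The input values below are illustrative.)

```python
import torch
import torch.nn as nn

pool = nn.MaxPool2d(2, stride=2, return_indices=True)
unpool = nn.MaxUnpool2d(2, stride=2)

x = torch.tensor([[[[ 1.,  2.,  3.,  4.],
                    [ 5.,  6.,  7.,  8.],
                    [ 9., 10., 11., 12.],
                    [13., 14., 15., 16.]]]])
y, idx = pool(x)      # y keeps the max of each 2x2 window
z = unpool(y, idx)    # each max is restored to its original position
print(y.squeeze())    # tensor([[ 6.,  8.], [14., 16.]])
print(z.squeeze())    # zeros everywhere except at the recorded positions
```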
           

Below is my PyTorch implementation of SegNet.

import torch.nn as nn

class conv_block(nn.Module):
    def __init__(self, in_ch, out_ch, batchNorm_momentum=0.1):
        super(conv_block, self).__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch, momentum= batchNorm_momentum),
            nn.ReLU(inplace=True)
        )    

    def forward(self, x):
        x = self.conv(x)
        return x


class SegNet(nn.Module):
    def __init__(self, input_nbr, label_nbr):
        super(SegNet, self).__init__()
        batchNorm_momentum = 0.1

        self.Conv11 = conv_block(input_nbr, 64, batchNorm_momentum)
        self.Conv12 = conv_block(64, 64, batchNorm_momentum)
        # return_indices=True records max locations for the matching MaxUnpool2d
        self.Maxpool1 = nn.MaxPool2d(kernel_size=2, stride=2, return_indices=True)

        self.Conv21 = conv_block(64, 128, batchNorm_momentum)
        self.Conv22 = conv_block(128, 128, batchNorm_momentum)
        self.Maxpool2 = nn.MaxPool2d(kernel_size=2, stride=2, return_indices=True)

        self.Conv31 = conv_block(128, 256, batchNorm_momentum)
        self.Conv32 = conv_block(256, 256, batchNorm_momentum)
        self.Conv33 = conv_block(256, 256, batchNorm_momentum)
        self.Maxpool3 = nn.MaxPool2d(kernel_size=2, stride=2, return_indices=True)

        self.Conv41 = conv_block(256, 512, batchNorm_momentum)
        self.Conv42 = conv_block(512, 512, batchNorm_momentum)
        self.Conv43 = conv_block(512, 512, batchNorm_momentum)
        self.Maxpool4 = nn.MaxPool2d(kernel_size=2, stride=2, return_indices=True)

        self.Conv51 = conv_block(512, 512, batchNorm_momentum)
        self.Conv52 = conv_block(512, 512, batchNorm_momentum)
        self.Conv53 = conv_block(512, 512, batchNorm_momentum)
        self.Maxpool5 = nn.MaxPool2d(kernel_size=2, stride=2, return_indices=True)

        self.MaxUnpool5 = nn.MaxUnpool2d(kernel_size=2, stride=2)
        self.Conv53d = conv_block(512, 512, batchNorm_momentum)
        self.Conv52d = conv_block(512, 512, batchNorm_momentum)
        self.Conv51d = conv_block(512, 512, batchNorm_momentum)

        self.MaxUnpool4 = nn.MaxUnpool2d(kernel_size=2, stride=2)
        self.Conv43d = conv_block(512, 512, batchNorm_momentum)
        self.Conv42d = conv_block(512, 512, batchNorm_momentum)
        self.Conv41d = conv_block(512, 256, batchNorm_momentum)

        self.MaxUnpool3 = nn.MaxUnpool2d(kernel_size=2, stride=2)
        self.Conv33d = conv_block(256, 256, batchNorm_momentum)
        self.Conv32d = conv_block(256, 256, batchNorm_momentum)
        self.Conv31d = conv_block(256, 128, batchNorm_momentum)

        self.MaxUnpool2 = nn.MaxUnpool2d(kernel_size=2, stride=2)
        self.Conv22d = conv_block(128, 128, batchNorm_momentum)
        self.Conv21d = conv_block(128, 64, batchNorm_momentum)

        self.MaxUnpool1 = nn.MaxUnpool2d(kernel_size=2, stride=2)
        self.Conv12d = conv_block(64, 64, batchNorm_momentum)
        self.Conv11d = nn.Conv2d(64, label_nbr, kernel_size=3, padding=1)

    def forward(self, x):
        # Stage 1
        x1 = self.Conv11(x)
        x1 = self.Conv12(x1)
        x1, id1 = self.Maxpool1(x1)

        # Stage 2
        x2 = self.Conv21(x1)
        x2 = self.Conv22(x2)
        x2, id2 = self.Maxpool2(x2)

        # Stage 3
        x3 = self.Conv31(x2)
        x3 = self.Conv32(x3)
        x3 = self.Conv33(x3)
        x3, id3 = self.Maxpool3(x3)

        # Stage 4
        x4 = self.Conv41(x3)
        x4 = self.Conv42(x4)
        x4 = self.Conv43(x4)
        x4, id4 = self.Maxpool4(x4)

        # Stage 5
        x5 = self.Conv51(x4)
        x5 = self.Conv52(x5)
        x5 = self.Conv53(x5)
        x5, id5 = self.Maxpool5(x5)

        # Stage 5d: unpool with the indices saved during pooling
        x5d = self.MaxUnpool5(x5, id5)
        x5d = self.Conv53d(x5d)
        x5d = self.Conv52d(x5d)
        x5d = self.Conv51d(x5d)

        # Stage 4d
        x4d = self.MaxUnpool4(x5d, id4)
        x4d = self.Conv43d(x4d)
        x4d = self.Conv42d(x4d)
        x4d = self.Conv41d(x4d)

        # Stage 3d
        x3d = self.MaxUnpool3(x4d, id3)
        x3d = self.Conv33d(x3d)
        x3d = self.Conv32d(x3d)
        x3d = self.Conv31d(x3d)

        # Stage 2d
        x2d = self.MaxUnpool2(x3d, id2)
        x2d = self.Conv22d(x2d)
        x2d = self.Conv21d(x2d)

        # Stage 1d
        x1d = self.MaxUnpool1(x2d, id1)
        x1d = self.Conv12d(x1d)
        x1d = self.Conv11d(x1d)

        return x1d
           

4. Dilated Convolution

This approach uses dilated (atrous) convolution, a type of convolutional layer suited to dense prediction. Illustrated below:

[figure: dilated convolution]

The original paper illustrates the parameter settings as follows:

[figure: dilation settings (a), (b), (c) from the paper]

(a) corresponds to a 3x3 1-dilated conv, which is the same as an ordinary convolution.

(b) corresponds to a 3x3 2-dilated conv; the actual kernel is still 3x3, but with a gap of 1 between taps, reaching a 7x7 receptive field.

(c) corresponds to a 3x3 4-dilated conv, which by the same reasoning reaches a 15x15 receptive field.
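The receptive-field figures above can be verified with a short computation: a k x k conv with dilation d covers an extent of k + (k-1)(d-1), and with stride 1 each stacked layer grows the receptive field by that extent minus 1.

```python
# receptive field of stacked 3x3 convs with dilations 1, 2, 4 (stride 1)
rf = 1
for d in (1, 2, 4):
    k_eff = 3 + (3 - 1) * (d - 1)  # dilated kernel extent
    rf += k_eff - 1
    print(f"dilation={d}: receptive field {rf}x{rf}")
# dilation=1: 3x3, dilation=2: 7x7, dilation=4: 15x15
```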

Usage: in PyTorch, pass the dilation parameter (documented in the official docs):

[figure: Conv2d dilation parameter in the PyTorch docs]
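A minimal sketch of the dilation parameter in nn.Conv2d (channel counts are illustrative); setting padding equal to dilation for a 3x3 kernel keeps the spatial size unchanged:

```python
import torch
import torch.nn as nn

# 3x3 kernel with dilation=2 spans a 5x5 extent; padding=2 preserves H and W
conv = nn.Conv2d(16, 16, kernel_size=3, dilation=2, padding=2)
x = torch.randn(1, 16, 32, 32)
print(conv(x).shape)  # torch.Size([1, 16, 32, 32])
```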

The discussion of the papers above draws on the following blog posts:

【語義分割】一篇看完就懂的最新深度語義分割模型綜述

史上最全語義分割綜述(FCN,UNet,SegNet,Deeplab,ASPP…)
