U^2-Net: Salient Object Detection, Segmentation, and Matting

Paper: U^2-Net: Going Deeper with Nested U-Structure for Salient Object Detection

Code: https://github.com/xuebinqin/U-2-Net

1. Introduction

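This paper is about salient object detection. Its author, Xuebin Qin, is also the author of BASNet (CVPR 2019). Although the paper targets salient object detection, its results are strong enough that the model has also been applied to segmentation, matting, and similar tasks with excellent results; see the GitHub repository, where new applications keep appearing. The paper proposes a two-level nested U-Net architecture, U^2-Net. Its advantages are: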

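1. It proposes a new RSU (ReSidual U-block) module that mixes receptive fields of different sizes to capture contextual information at different scales;
2. The pooling operations inside the RSU module increase the depth of the whole architecture without significantly increasing the computational cost.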

These design choices make it possible to train the network from scratch, without relying on an existing image-classification backbone.

P.S. Because the model structure in this paper is simple and the open-source code is clear, this introduction to U^2-Net leans mostly on the code.

2. Method design

2.1 RSU-L

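The figure above compares a plain convolution block, a Res-like block, an Inception-like block, a Dense-like block, and the Residual U-block (RSU); the RSU clearly resembles a small U-Net. **Here L is the number of layers in the encoder, C_in and C_out are the input and output channels, and M is the number of channels in the internal layers of the RSU.** The block has an encoder-decoder structure: the encoder downsamples with a series of pooling layers, and the decoder upsamples with bilinear interpolation.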

The code defines a basic convolution module (convolution, batch normalization, ReLU):

import torch
import torch.nn as nn
import torch.nn.functional as F

class REBNCONV(nn.Module):
    def __init__(self,in_ch=3,out_ch=3,dirate=1):
        super(REBNCONV,self).__init__()

        self.conv_s1 = nn.Conv2d(in_ch,out_ch,3,padding=1*dirate,dilation=1*dirate)
        self.bn_s1 = nn.BatchNorm2d(out_ch)
        self.relu_s1 = nn.ReLU(inplace=True)

    def forward(self,x):

        hx = x
        xout = self.relu_s1(self.bn_s1(self.conv_s1(hx)))

        return xout


Pay attention to the dilation rate dirate: with dirate == 1 this is an ordinary convolution, and with dirate != 1 it is a dilated (atrous) convolution.
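REBNCONV sets both padding and dilation to dirate. A quick check with the standard convolution output-size formula suggests why: for a 3×3 kernel this keeps the feature map size unchanged at any dilation rate, while the span covered by the kernel grows. The helper names below (`effective_kernel`, `conv_out_size`) are made up for this illustration and are not part of the U^2-Net code.

```python
def effective_kernel(kernel: int, dilation: int) -> int:
    """Span covered by a dilated kernel along one axis."""
    return dilation * (kernel - 1) + 1


def conv_out_size(size: int, kernel: int = 3, dilation: int = 1,
                  padding: int = 1, stride: int = 1) -> int:
    """Output length along one axis (same formula as documented for nn.Conv2d)."""
    return (size + 2 * padding - effective_kernel(kernel, dilation)) // stride + 1


for dirate in (1, 2, 4, 8):
    # REBNCONV uses a 3x3 kernel with padding = dirate and dilation = dirate.
    out = conv_out_size(40, kernel=3, dilation=dirate, padding=dirate)
    print(f"dirate={dirate}: kernel spans {effective_kernel(3, dirate)} pixels, "
          f"output size {out}")
```

For every dirate the output size stays at 40, so dilated and ordinary REBNCONV layers can be mixed freely without changing the spatial size.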

Panel (e) of the figure above shows RSU-7. RSU-7 is used as the example below; the code for the other RSU variants is almost identical:

### RSU-7 ###
class RSU7(nn.Module):#UNet07DRES(nn.Module):

    def __init__(self, in_ch=3, mid_ch=12, out_ch=3):
        super(RSU7,self).__init__()

        self.rebnconvin = REBNCONV(in_ch,out_ch,dirate=1)

        self.rebnconv1 = REBNCONV(out_ch,mid_ch,dirate=1)
        self.pool1 = nn.MaxPool2d(2,stride=2,ceil_mode=True)

        self.rebnconv2 = REBNCONV(mid_ch,mid_ch,dirate=1)
        self.pool2 = nn.MaxPool2d(2,stride=2,ceil_mode=True)

        self.rebnconv3 = REBNCONV(mid_ch,mid_ch,dirate=1)
        self.pool3 = nn.MaxPool2d(2,stride=2,ceil_mode=True)

        self.rebnconv4 = REBNCONV(mid_ch,mid_ch,dirate=1)
        self.pool4 = nn.MaxPool2d(2,stride=2,ceil_mode=True)

        self.rebnconv5 = REBNCONV(mid_ch,mid_ch,dirate=1)
        self.pool5 = nn.MaxPool2d(2,stride=2,ceil_mode=True)

        self.rebnconv6 = REBNCONV(mid_ch,mid_ch,dirate=1)

        self.rebnconv7 = REBNCONV(mid_ch,mid_ch,dirate=2) ###

        self.rebnconv6d = REBNCONV(mid_ch*2,mid_ch,dirate=1)
        self.rebnconv5d = REBNCONV(mid_ch*2,mid_ch,dirate=1)
        self.rebnconv4d = REBNCONV(mid_ch*2,mid_ch,dirate=1)
        self.rebnconv3d = REBNCONV(mid_ch*2,mid_ch,dirate=1)
        self.rebnconv2d = REBNCONV(mid_ch*2,mid_ch,dirate=1)
        self.rebnconv1d = REBNCONV(mid_ch*2,out_ch,dirate=1)

    def forward(self,x):

        hx = x
        hxin = self.rebnconvin(hx)

        hx1 = self.rebnconv1(hxin)
        hx = self.pool1(hx1)

        hx2 = self.rebnconv2(hx)
        hx = self.pool2(hx2)

        hx3 = self.rebnconv3(hx)
        hx = self.pool3(hx3)

        hx4 = self.rebnconv4(hx)
        hx = self.pool4(hx4)

        hx5 = self.rebnconv5(hx)
        hx = self.pool5(hx5)

        hx6 = self.rebnconv6(hx)

        hx7 = self.rebnconv7(hx6)

        hx6d =  self.rebnconv6d(torch.cat((hx7,hx6),1)) # skip connection inside the U-structure: concatenate along channels
        hx6dup = _upsample_like(hx6d,hx5) # bilinear upsampling to the size of hx5

        hx5d =  self.rebnconv5d(torch.cat((hx6dup,hx5),1))
        hx5dup = _upsample_like(hx5d,hx4)

        hx4d = self.rebnconv4d(torch.cat((hx5dup,hx4),1))
        hx4dup = _upsample_like(hx4d,hx3)

        hx3d = self.rebnconv3d(torch.cat((hx4dup,hx3),1))
        hx3dup = _upsample_like(hx3d,hx2)

        hx2d = self.rebnconv2d(torch.cat((hx3dup,hx2),1))
        hx2dup = _upsample_like(hx2d,hx1)

        hx1d = self.rebnconv1d(torch.cat((hx2dup,hx1),1))

        return hx1d + hxin # residual connection: U(F1(x)) + F1(x)
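The RSU7 code above calls `_upsample_like`, which is not shown in this post. Below is a minimal sketch of such a helper, assuming it only needs to resize one feature map to the spatial size of another with bilinear interpolation. The version in the repository may differ in detail, for example in the interpolation call or the `align_corners` setting.

```python
import torch
import torch.nn.functional as F


def _upsample_like(src: torch.Tensor, tar: torch.Tensor) -> torch.Tensor:
    """Bilinearly resize src to the spatial size (H, W) of tar."""
    return F.interpolate(src, size=tar.shape[2:], mode="bilinear",
                         align_corners=False)


# With ceil_mode=True, pooling a 9x10 map gives 5x5, so a fixed scale
# factor of 2 would produce 10x10 and no longer match the skip feature.
skip = torch.zeros(1, 12, 9, 10)   # encoder feature (N, C, H, W)
deep = torch.zeros(1, 12, 5, 5)    # feature from the next deeper level
up = _upsample_like(deep, skip)
print(up.shape)                            # torch.Size([1, 12, 9, 10])
print(torch.cat((up, skip), 1).shape)      # torch.Size([1, 24, 9, 10])
```

Resizing to the target's size instead of using a fixed factor is what lets the torch.cat calls work for inputs whose side lengths are not powers of two.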


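In essence, the RSU is very similar to a residual block in which the second weight layer has been replaced by a U-Net: where a residual block computes $H(x) = F_2(F_1(x)) + x$, the RSU computes $H_{RSU}(x) = U(F_1(x)) + F_1(x)$. This design lets the network extract multi-scale features directly inside each residual block. The U-structure also adds little computational overhead, because most of its operations run on downsampled feature maps. In more detail, an RSU consists of three parts: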

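1. An input convolution layer, which transforms the input feature map $x$ ($H \times W \times C_{in}$) into an intermediate feature map $F_1(x)$ with $C_{out}$ channels. This is a plain convolution layer for local feature extraction.
2. A U-Net-like symmetric encoder-decoder structure of height L, which takes the intermediate feature map $F_1(x)$ as input and learns to extract and encode the multi-scale contextual information $U(F_1(x))$, where $U$ denotes the U-Net-like structure. A larger L gives a deeper RSU with more pooling operations, a larger range of receptive fields, and richer local and global features. Configuring this parameter allows multi-scale features to be extracted from input feature maps of arbitrary spatial resolution. The multi-scale features are extracted from gradually downsampled feature maps and encoded into high-resolution feature maps through progressive upsampling, concatenation, and convolution. This reduces the loss of fine detail caused by direct upsampling at large scales.
3. A residual connection, which fuses the local features and the multi-scale features by summation: $F_1(x) + U(F_1(x))$.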
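To make the point about downsampled feature maps concrete, the sketch below traces the side length of the encoder features hx1 to hx6 of the RSU-7 code for a square input, assuming 2×2 max pooling with stride 2 and ceil_mode=True as in that code. The function name and the 320×320 input size are chosen only for illustration.

```python
import math


def rsu7_encoder_sizes(size: int, num_pools: int = 5) -> list:
    """Side lengths of hx1..hx6 for a square input of the given size."""
    sizes = [size]
    for _ in range(num_pools):
        # MaxPool2d(2, stride=2, ceil_mode=True) rounds the halved size up.
        sizes.append(math.ceil(sizes[-1] / 2))
    return sizes


sizes = rsu7_encoder_sizes(320)
print(sizes)  # [320, 160, 80, 40, 20, 10]

for level, s in enumerate(sizes, start=1):
    share = s * s / sizes[0] ** 2
    print(f"hx{level}: {s}x{s}, {share:.4f} of the full-resolution pixels")
```

Each level has a quarter of the pixels of the previous one, so a convolution of the same width costs about a quarter as much. hx7 is computed at the same resolution as hx6, because rebnconv7 uses dilation instead of another pooling step.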

2.2 U^2-Net architecture

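The overall structure of U^2-Net is shown in the figure above and is fairly clear: En_1 and De_1, En_2 and De_2, En_3 and De_3, En_4 and De_4, En_5 and De_5, and En_6 use RSU-7, RSU-6, RSU-5, RSU-4, RSU-4F, and RSU-4F, respectively. The red labels in the figure mark most of these.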

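The paper leaves room for further design. 1. Every block of U^2-Net is itself a U-Net-like module, namely the Residual U-block described above. One can keep going deeper: each sub-block of the U-Net inside a block could again be a U-Net, which would be called U^3-Net. 2. The L of each RSU-L can be changed. 3. The I, M, and O (input, middle, output) channel counts of each RSU can be changed. On this basis the paper also proposes a lightweight network, U^2-Netp, which is very small in model size yet still performs well.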

##### U^2-Net ####
class U2NET(nn.Module):

    def __init__(self,in_ch=3,out_ch=1):
        super(U2NET,self).__init__()

        self.stage1 = RSU7(in_ch,32,64)
        self.pool12 = nn.MaxPool2d(2,stride=2,ceil_mode=True)

        self.stage2 = RSU6(64,32,128)
        self.pool23 = nn.MaxPool2d(2,stride=2,ceil_mode=True)

        self.stage3 = RSU5(128,64,256)
        self.pool34 = nn.MaxPool2d(2,stride=2,ceil_mode=True)

        self.stage4 = RSU4(256,128,512)
        self.pool45 = nn.MaxPool2d(2,stride=2,ceil_mode=True)

        self.stage5 = RSU4F(512,256,512)
        self.pool56 = nn.MaxPool2d(2,stride=2,ceil_mode=True)

        self.stage6 = RSU4F(512,256,512)

        # decoder
        self.stage5d = RSU4F(1024,256,512)
        self.stage4d = RSU4(1024,128,256)
        self.stage3d = RSU5(512,64,128)
        self.stage2d = RSU6(256,32,64)
        self.stage1d = RSU7(128,16,64)

        self.side1 = nn.Conv2d(64,out_ch,3,padding=1)
        self.side2 = nn.Conv2d(64,out_ch,3,padding=1)
        self.side3 = nn.Conv2d(128,out_ch,3,padding=1)
        self.side4 = nn.Conv2d(256,out_ch,3,padding=1)
        self.side5 = nn.Conv2d(512,out_ch,3,padding=1)
        self.side6 = nn.Conv2d(512,out_ch,3,padding=1)

        self.outconv = nn.Conv2d(6*out_ch,out_ch,1)

    def forward(self,x):

        hx = x

        #stage 1
        hx1 = self.stage1(hx)
        hx = self.pool12(hx1)

        #stage 2
        hx2 = self.stage2(hx)
        hx = self.pool23(hx2)

        #stage 3
        hx3 = self.stage3(hx)
        hx = self.pool34(hx3)

        #stage 4
        hx4 = self.stage4(hx)
        hx = self.pool45(hx4)

        #stage 5
        hx5 = self.stage5(hx)
        hx = self.pool56(hx5)

        #stage 6
        hx6 = self.stage6(hx)
        hx6up = _upsample_like(hx6,hx5)

        #-------------------- decoder --------------------
        hx5d = self.stage5d(torch.cat((hx6up,hx5),1))
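        # Similar to FPN: each decoder stage concatenates (cat) the upsampled output of the deeper stage with the encoder feature of the same resolution.
        hx5dup = _upsample_like(hx5d,hx4)
        # Each stage downsamples, so the deeper output is upsampled (bilinear interpolation) to match the next skip feature.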

        hx4d = self.stage4d(torch.cat((hx5dup,hx4),1))
        hx4dup = _upsample_like(hx4d,hx3)

        hx3d = self.stage3d(torch.cat((hx4dup,hx3),1))
        hx3dup = _upsample_like(hx3d,hx2)

        hx2d = self.stage2d(torch.cat((hx3dup,hx2),1))
        hx2dup = _upsample_like(hx2d,hx1)

        hx1d = self.stage1d(torch.cat((hx2dup,hx1),1))


        #side output
        d1 = self.side1(hx1d)
        # A 3x3 conv turns each stage output into an out_ch-channel (here 1) map; after upsampling and a sigmoid it becomes that stage's probability map.

        d2 = self.side2(hx2d)
        d2 = _upsample_like(d2,d1)

        d3 = self.side3(hx3d)
        d3 = _upsample_like(d3,d1)

        d4 = self.side4(hx4d)
        d4 = _upsample_like(d4,d1)

        d5 = self.side5(hx5d)
        d5 = _upsample_like(d5,d1)

        d6 = self.side6(hx6)
        d6 = _upsample_like(d6,d1)

        d0 = self.outconv(torch.cat((d1,d2,d3,d4,d5,d6),1))
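        # Concatenate the six side outputs and fuse them with a 1x1 conv; d0 (single channel) is the final prediction, d1..d6 are returned only to compute the loss.

        return torch.sigmoid(d0), torch.sigmoid(d1), torch.sigmoid(d2), torch.sigmoid(d3), torch.sigmoid(d4), torch.sigmoid(d5), torch.sigmoid(d6)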
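The input widths of the decoder stages in the U2NET class above follow directly from the concatenations: each decoder stage receives the output of the deeper stage plus the encoder feature of the same level. The sketch below checks this bookkeeping using the (in_ch, mid_ch, out_ch) triples copied from the class; the dictionaries and the `check_channels` helper are only an illustration, not part of the repository.

```python
# (in_ch, mid_ch, out_ch) as written in the U2NET class above.
encoder = {
    1: (3, 32, 64),
    2: (64, 32, 128),
    3: (128, 64, 256),
    4: (256, 128, 512),
    5: (512, 256, 512),
    6: (512, 256, 512),
}
decoder = {
    5: (1024, 256, 512),
    4: (1024, 128, 256),
    3: (512, 64, 128),
    2: (256, 32, 64),
    1: (128, 16, 64),
}


def check_channels(encoder: dict, decoder: dict) -> list:
    """Return (stage, expected in_ch) for each decoder stage, deepest first."""
    expected = []
    deeper_out = encoder[6][2]            # stage6 feeds the first decoder stage
    for stage in sorted(decoder, reverse=True):
        skip_out = encoder[stage][2]      # encoder feature at the same level
        expected.append((stage, deeper_out + skip_out))
        deeper_out = decoder[stage][2]    # this output feeds the next stage
    return expected


for stage, in_ch in check_channels(encoder, decoder):
    assert in_ch == decoder[stage][0]
    print(f"stage{stage}d expects {in_ch} input channels")
```

The same bookkeeping explains the side convolutions: side1 to side5 read the decoder outputs (64, 64, 128, 256, and 512 channels), and side6 reads the 512-channel output of stage6.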


2.3 Loss function design

bce_loss = nn.BCELoss(reduction='mean') # binary cross-entropy, averaged over all pixels

def muti_bce_loss_fusion(d0, d1, d2, d3, d4, d5, d6, labels_v):

	loss0 = bce_loss(d0,labels_v)
	loss1 = bce_loss(d1,labels_v)
	loss2 = bce_loss(d2,labels_v)
	loss3 = bce_loss(d3,labels_v)
	loss4 = bce_loss(d4,labels_v)
	loss5 = bce_loss(d5,labels_v)
	loss6 = bce_loss(d6,labels_v)

	loss = loss0 + loss1 + loss2 + loss3 + loss4 + loss5 + loss6
	print("l0: %3f, l1: %3f, l2: %3f, l3: %3f, l4: %3f, l5: %3f, l6: %3f\n"%(loss0.data.item(),loss1.data.item(),loss2.data.item(),loss3.data.item(),loss4.data.item(),loss5.data.item(),loss6.data.item()))

	return loss0, loss


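Following the deep supervision used in HED, the authors define the training loss as:

$$L = \sum_{m=1}^{M} w^{(m)}_{side}\, l^{(m)}_{side} + w_{fuse}\, l_{fuse}$$

Here $M = 6$, corresponding to the side outputs Sup1, Sup2, …, Sup6 of U^2-Net. $l^{(m)}_{side}$ and $w^{(m)}_{side}$ are the loss of side output $m$ and its weight; $l_{fuse}$ and $w_{fuse}$ are the loss of the fused output and its weight. Every $l$ is a standard BCE loss:

$$l = -\sum_{(r,c)}^{(H,W)} \left[ P_{G(r,c)} \log P_{S(r,c)} + \left(1 - P_{G(r,c)}\right) \log \left(1 - P_{S(r,c)}\right) \right]$$

where $(r, c)$ are pixel coordinates, $(H, W)$ is the image size, and $P_G$ and $P_S$ are the ground-truth and predicted saliency values.

The code is straightforward: it computes the BCE loss of each of the seven outputs d0, d1, …, d6 (giving $l_0, l_1, \dots, l_6$; note that nn.BCELoss averages over pixels instead of summing), adds them all up with every weight equal to 1, and returns loss0 and loss.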
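For readers without PyTorch at hand, the same computation can be sketched in plain Python on a toy example. The `bce` helper and all numbers below are made up for illustration; only the structure (one BCE term per output, summed with unit weights, returning the fused loss and the total) mirrors the function above.

```python
import math


def bce(pred: list, target: list) -> float:
    """Mean binary cross-entropy over flattened probabilities and labels."""
    total = 0.0
    for p, g in zip(pred, target):
        total += -(g * math.log(p) + (1 - g) * math.log(1 - p))
    return total / len(pred)


labels = [1.0, 0.0, 1.0, 0.0]
# Hypothetical outputs d0..d6 (already passed through sigmoid), 4 pixels each.
outputs = [[0.9, 0.1, 0.8, 0.2]] + [[0.7, 0.3, 0.6, 0.4]] * 6

losses = [bce(d, labels) for d in outputs]
loss0 = losses[0]      # loss of the fused output d0 only
loss = sum(losses)     # fused loss plus six side losses, all weights 1
print(f"loss0={loss0:.4f}, total={loss:.4f}")
```

Only the total is meant for backpropagation; loss0 is returned separately so that the quality of the final fused prediction can be monitored on its own.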

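BCE loss is used here because, with cross-entropy as the loss function, the gradient propagated back through the sigmoid output no longer depends on the derivative of the sigmoid function. This helps to avoid vanishing gradients to some extent. [2]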
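The derivative behind this claim can be written out in a few lines. This is the standard textbook derivation for a single pixel, not something taken from the paper; $z$ (the value before the sigmoid) is a symbol introduced here, $\sigma(z)$ is the predicted probability, and $P_G \in \{0, 1\}$ is the ground-truth label.

```latex
\begin{aligned}
\ell &= -\bigl[P_G \log \sigma(z) + (1 - P_G)\log\bigl(1 - \sigma(z)\bigr)\bigr],
\qquad \sigma'(z) = \sigma(z)\,\bigl(1 - \sigma(z)\bigr) \\
\frac{\partial \ell}{\partial z}
  &= -\left[\frac{P_G}{\sigma(z)} - \frac{1 - P_G}{1 - \sigma(z)}\right]
     \sigma(z)\,\bigl(1 - \sigma(z)\bigr) \\
  &= -\bigl[P_G\,\bigl(1 - \sigma(z)\bigr) - (1 - P_G)\,\sigma(z)\bigr] \\
  &= \sigma(z) - P_G
\end{aligned}
```

The factor $\sigma'(z)$ cancels, so the gradient is just the prediction error and does not shrink when the sigmoid saturates. With a squared-error loss the factor $\sigma'(z)$ would remain in the gradient.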

Reference:

[1] Qin X, Zhang Z, Huang C, et al. U2-Net: Going deeper with nested U-structure for salient object detection[J]. Pattern Recognition, 2020, 106: 107404.

[2] https://blog.csdn.net/geter_CS/article/details/84747670