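TensorFlow in Practice: Chapter 8, Part 1 (Introduction to Mask R-CNN and Its Implementation)

Contents: Introduction; Mask R-CNN paper review; How to use the code; Code analysis: data preprocessing; Code analysis: training the model on your own dataset; Code analysis: Mask R-CNN model analysis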

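Introduction

Paper: Mask R-CNN

Source code: matterport - GitHub

The code comes from the matterport team; you can fork their repository on GitHub.

Software requirements

This reimplementation of Mask R-CNN is built on Python 3, Keras, and TensorFlow.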

Python 3.4+
TensorFlow 1.3+
Keras 2.0.8+
Jupyter Notebook
Numpy, skimage, scipy

A recent Anaconda3 distribution together with the GPU build of TensorFlow is recommended.

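Mask R-CNN paper review

Mask R-CNN (MRCNN for short) builds on the R-CNN family, FPN, FCIS, and related work. Its idea is simple: Faster R-CNN produces two outputs for each candidate region, a class label and a bbox offset. MRCNN adds one more branch to Faster R-CNN and therefore one more output, the object mask.

First, a recap of Faster R-CNN. It consists of two stages: the Region Proposal Network (RPN) and the underlying Fast R-CNN model.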

The RPN generates the region proposals.

Fast R-CNN extracts features from each proposal through the RoIPool layer and then performs object classification and bbox regression.

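MRCNN uses the same two stages as Faster R-CNN, with an identical first stage (the RPN). In the second stage, in addition to predicting the class and regressing the bbox, it predicts a binary mask for each RoI in parallel. The schematic is as follows:

This turns the whole task into a multi-stage pipeline and decouples the subtasks from one another, which at present brings a number of benefits.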

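Main contributions

Definition of the loss function

A multi-task loss is still used. For each RoI it is defined as

$L = L_{cls} + L_{box} + L_{mask}$

$L_{cls}$ and $L_{box}$ are defined as in Faster R-CNN, so the focus here is on $L_{mask}$.

For each RoI the mask branch produces an output of size $Km^2$, that is, $K$ binary masks of resolution $m \times m$, where $K$ is the number of object classes. During training, $L_{mask}$ for an RoI whose ground-truth class is $i$ is defined only on the $i$-th mask; at inference time the class predicted by the classification branch selects which mask to output.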

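The mask-branch loss is computed as follows:

The mask branch predicts $K$ binary masks of size $m \times m$, one per class.
The class of the current RoI is $i$ (the ground-truth class during training; the class predicted by the Faster R-CNN classification branch at inference).
Only the $i$-th binary mask contributes to the RoI's loss $L_{mask}$.

A sigmoid is applied to every pixel of the predicted mask, and the overall loss is the average binary cross-entropy.

Predicting $K$ outputs lets every class generate its own mask, which avoids competition between classes and decouples mask prediction from class prediction. FCN-style methods, by contrast, apply a softmax at every pixel and use a multinomial cross-entropy loss, which makes the classes compete and ultimately hurts segmentation quality.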
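The per-class sigmoid loss can be sketched in plain Python. This is a minimal illustration rather than the matterport implementation: the name `mask_loss`, the nested-list layout, and the toy numbers are assumptions made for the example.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def mask_loss(mask_logits, gt_mask, class_index):
    """Average binary cross-entropy computed on one class's mask only.

    mask_logits: K masks of raw scores, each an m x m nested list (hypothetical layout).
    gt_mask:     m x m nested list of 0/1 targets for this RoI.
    class_index: which of the K masks contributes to the loss.
    """
    logits = mask_logits[class_index]  # the other K-1 masks are ignored
    total, count = 0.0, 0
    for logit_row, target_row in zip(logits, gt_mask):
        for z, t in zip(logit_row, target_row):
            p = sigmoid(z)  # per-pixel sigmoid, no softmax across classes
            total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
            count += 1
    return total / count


# Toy example with K = 2 classes and m = 2.
mask_logits = [
    [[5.0, -5.0], [5.0, -5.0]],  # class 0: confident and correct
    [[0.0, 0.0], [0.0, 0.0]],    # class 1: undecided everywhere
]
gt_mask = [[1, 0], [1, 0]]
loss_class0 = mask_loss(mask_logits, gt_mask, 0)
loss_class1 = mask_loss(mask_logits, gt_mask, 1)
```

Because only the selected channel enters the sum, the scores in the other channels can take any value without changing the loss, which is the decoupling the text describes.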

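From mask representation to the RoIAlign layer

In Faster R-CNN, predicting the class label or bbox offsets collapses the feature map through FC layers into an output vector. This collapse discards spatial (planar) information, whereas a mask encodes the spatial layout of the input object, so a convolutional representation that preserves pixel-to-pixel correspondence is the natural choice.

Producing a mask does not require collapsing the output into a vector, so an FCN (Fully Convolutional Network) can be used, which is both efficient and light on parameters. To better preserve the pixel correspondence between the RoI input and the features output by the FCN, the paper introduces the RoIAlign layer.

First, a recap of the RoIPool layer:

Its core idea is that RoIs of different sizes are fed to the RoIPool layer, which quantizes each RoI into a feature map of a given granularity (a grid of bins) and then extracts features by pooling over the bins.

The figure below shows how SPPNet operates on an RoI; Faster R-CNN uses a feature map of only one granularity:

The planar schematic is as follows: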

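There is a problem here. The quantization above is computed in practice as $[x/16]$, where 16 is the quantization stride and $[\cdot]$ denotes rounding. This quantize-and-round scheme is fairly robust for feature extraction (detection benefits from translation invariance, for example), but it is harmful to mask localization.

To address this, the paper proposes the RoIAlign layer. It avoids any quantization of the RoI boundaries or bins and instead uses bilinear interpolation to compute the feature values at sampled points inside each bin. The feature architecture it is applied to comes from the FPN paper: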
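The difference between rounding a coordinate and interpolating at it can be shown with a small sketch. It is an illustration under assumptions, not the library's RoIAlign: the helper `bilinear_sample`, the 2×2 feature map, and the sample coordinate are made up for the example.

```python
import math


def bilinear_sample(feature, y, x):
    """Interpolate a 2-D feature map (nested list) at a real-valued (y, x).

    Assumes 0 <= y <= height - 1 and 0 <= x <= width - 1.
    """
    height, width = len(feature), len(feature[0])
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, height - 1), min(x0 + 1, width - 1)
    dy, dx = y - y0, x - x0
    top = feature[y0][x0] * (1 - dx) + feature[y0][x1] * dx
    bottom = feature[y1][x0] * (1 - dx) + feature[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


feature = [[0.0, 10.0],
           [20.0, 30.0]]
stride = 16
x_image = 12  # image-space coordinate; 12 / 16 = 0.75 on the feature map

# RoIPool-style: round to the nearest cell, then read that cell.
quantized = feature[0][round(x_image / stride)]
# RoIAlign-style: keep the fractional coordinate and interpolate.
aligned = bilinear_sample(feature, 0.0, x_image / stride)
```

Here the rounded lookup jumps to the neighbouring cell, while the interpolated value changes smoothly with the coordinate, which is what a pixel-accurate mask needs.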

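The original Faster R-CNN made its predictions from the topmost feature map only, which loses high-resolution information; the visible effect is that small objects are missed and fine detail is lost. Inspired by SSD, FPN also predicts from features at several levels. Its top-down pathway upsamples the high-level features and merges them with the lower-level ones, so the result carries both semantics and spatial precision. Note that the FPN paper uses nearest-neighbor upsampling in this pathway; the bilinear interpolation described in the MRCNN paper belongs to RoIAlign's sampling, not to the top-down upsampling.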

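Summary

MRCNN performs very well. Apart from the contribution of the mask branch, much of this comes from the stronger backbone: the paper uses ResNeXt-101 combined with FPN's top-down architecture, which has very strong feature-learning capacity, and the experiments also rely on a number of engineering tricks.

That said, MRCNN's drawbacks are just as clear: it needs a lot of compute and it is slow, so there is still a long way to go before practical deployment. Let's wait for the experts to push things forward!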

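How to use the code

The project source is at: github/Mask R-CNN

1. Set up the runtime environment
- Python 3.4+
- TensorFlow 1.3+
- Keras 2.0.8+
- Jupyter Notebook
- Numpy, skimage, scipy, Pillow (installing Anaconda3 covers all of these)
- cv2
2. Download the code
- On Linux, clone the repository to your machine
- On Windows, download the code from the address above
3. Download the weights pretrained on the COCO dataset (mask_rcnn_coco.h5) from the releases page.
4. To train or test on the COCO dataset you need pycocotools: clone it, run make to build it, and copy the result into the project directory (see the README.md in the repos below).
- Linux: https://github.com/waleedka/coco
- Windows: https://github.com/philferriere/cocoapi. You must have the Visual C++ 2015 build tools on your path (see the repo for additional details)
5. If you use the COCO dataset you need:
- pycocotools (described in item 4)
- MS COCO Dataset, the 2014 training set
- The COCO subsets: the 5K minival and the 35K validation-minus-minival. (These two download slowly, so instead of the original address I linked my CSDN upload; message me if you do not have enough points to download it.)

The code analysis below all runs in Jupyter.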

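Code analysis: data preprocessing

Project source: matterport - GitHub

inspect_data.ipynb walks through the preprocessing steps used to prepare the training data.

Imports

The coco package imported below needs the data-handling code from coco/PythonAPI, compiled locally with the make command. Copy the generated pycocotools into the project's root directory, that is, the directory that contains inspect_data.ipynb.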

import os
import sys
import itertools
import math
import logging
import json
import re
import random
from collections import OrderedDict
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import matplotlib.patches as patches
import matplotlib.lines as lines
from matplotlib.patches import Polygon

import utils
import visualize
from visualize import display_images
import model as modellib
from model import log

%matplotlib inline 

ROOT_DIR = os.getcwd()

# Pick one of the two code blocks below
# import shapes
# config = shapes.ShapesConfig()    # dataset generated in code, introduced later

# MS COCO dataset
import coco
config = coco.CocoConfig()
COCO_DIR = "/root/模型複現/Mask_RCNN-master/coco"  # where the COCO data is stored

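Loading the dataset

The COCO training set contains 82081 images and 81 classes in total (including the background class BG).

# COCO is used here
if config.NAME == 'shapes':
    dataset = shapes.ShapesDataset()
    dataset.load_shapes(500, config.IMAGE_SHAPE[0], config.IMAGE_SHAPE[1])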
elif config.NAME == "coco":
    dataset = coco.CocoDataset()
    dataset.load_coco(COCO_DIR, "train")

# Must call before using the dataset
dataset.prepare()

print("Image Count: {}".format(len(dataset.image_ids)))
print("Class Count: {}".format(dataset.num_classes))
for i, info in enumerate(dataset.class_info):
    print("{:3}. {:50}".format(i, info['name']))

>>>
>>>
loading annotations into memory...
Done (t=68s)
creating index...
index created!
Image Count: 82081
Class Count: 81
  0. BG
  1. person
  2. bicycle
 ...
 77. scissors
 78. teddy bear
 79. hair drier
 80. toothbrush

Let's look at a few random images:

# Load and display a few random images with their masks
image_ids = np.random.choice(dataset.image_ids, 4)
for image_id in image_ids:
    image = dataset.load_image(image_id)
    mask, class_ids = dataset.load_mask(image_id)
    visualize.display_top_masks(image, mask, class_ids, dataset.class_names)

Bounding Boxes(bbox)

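We do not use the bbox coordinates supplied by the dataset. Instead we compute the bbox from the mask, so bboxes are handled the same way regardless of the source dataset. It also makes resizing, rotating, and cropping images easier: we simply regenerate the bbox from the transformed mask rather than working out a bbox transformation for every kind of image transformation.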

# Load random image and mask.
image_id = random.choice(dataset.image_ids)
image = dataset.load_image(image_id)
mask, class_ids = dataset.load_mask(image_id)
# Compute Bounding box
bbox = utils.extract_bboxes(mask)

# Display image and additional stats
print("image_id ", image_id, dataset.image_reference(image_id))
log("image", image)
log("mask", mask)
log("class_ids", class_ids)
log("bbox", bbox)
# Display image and instances
visualize.display_instances(image, bbox, mask, class_ids, dataset.class_names)

>>>
>>>
image_id   http://cocodataset.org/#explore?id=190360
image                    shape: (, , )         min:      max:  
mask                     shape: (, , )         min:      max:    
class_ids                shape: (,)                  min:      max:   
bbox                     shape: (, )                min:      max:
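Deriving a bbox from a mask comes down to finding the first and last non-empty row and column. The sketch below shows the idea for a single 2-D mask in plain Python; the function name `extract_bbox`, the exclusive lower-right corner, and the all-zero fallback are assumptions for this illustration, not a copy of `utils.extract_bboxes`.

```python
def extract_bbox(mask):
    """Return (y1, x1, y2, x2) for one 2-D binary mask given as nested lists.

    y2 and x2 are exclusive. An empty mask yields (0, 0, 0, 0).
    """
    rows = [i for i, row in enumerate(mask) if any(row)]
    cols = [j for j in range(len(mask[0])) if any(row[j] for row in mask)]
    if not rows:
        return (0, 0, 0, 0)
    return (rows[0], cols[0], rows[-1] + 1, cols[-1] + 1)


mask = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
bbox = extract_bbox(mask)
empty_bbox = extract_bbox([[0, 0], [0, 0]])
```

Since the box depends only on the mask, any transformation applied to the mask (scaling, flipping, cropping) gives a correct box by calling the same function again.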

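Resizing images

Training is done in batches and each batch contains several images, so the model needs a fixed input size. The training images are therefore resized to a fixed size (1024×1024). Resizing preserves the aspect ratio; if an image is not square, its borders are padded with zeros (this was argued for in the R-CNN paper).

Note that when the original image is resized, its mask must be resized too. Because our bbox is computed from the mask, no extra code is needed to transform it.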

# Load random image and mask.
image_id = np.random.choice(dataset.image_ids, 1)[0]
image = dataset.load_image(image_id)
mask, class_ids = dataset.load_mask(image_id)
original_shape = image.shape
# Resize to the fixed size
image, window, scale, padding = utils.resize_image(
    image, 
    min_dim=config.IMAGE_MIN_DIM, 
    max_dim=config.IMAGE_MAX_DIM,
    padding=config.IMAGE_PADDING)
mask = utils.resize_mask(mask, scale, padding)  # the mask must be resized as well
# Compute Bounding box
bbox = utils.extract_bboxes(mask)

# Display image and additional stats
print("image_id: ", image_id, dataset.image_reference(image_id))
print("Original shape: ", original_shape)
log("image", image)
log("mask", mask)
log("class_ids", class_ids)
log("bbox", bbox)
# Display image and instances
visualize.display_instances(image, bbox, mask, class_ids, dataset.class_names)

>>>
>>>
image_id:   http://cocodataset.org/#explore?id=139889
Original shape:  (, , )
image                    shape: (, , )       min:      max:  
mask                     shape: (, , )       min:      max:    
class_ids                shape: (,)                  min:     max:   
bbox                     shape: (, )                min:    max:

The original image is enlarged from (426, 640, 3) to (1024, 1024, 3), and the top and bottom are padded with zeros (the black areas):
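The scale and padding for this example can be worked out by hand. The sketch below is an illustration under assumptions: `resize_plan` is a hypothetical helper, and the defaults `min_dim=800` and `max_dim=1024` are assumed values for `IMAGE_MIN_DIM` and `IMAGE_MAX_DIM`, which the text does not state.

```python
def resize_plan(height, width, min_dim=800, max_dim=1024):
    """Compute the scale and zero padding that fit an image into a
    max_dim x max_dim square while keeping its aspect ratio."""
    # Scale up so the short side reaches min_dim, but never scale down here.
    scale = max(1.0, min_dim / min(height, width))
    # If the long side would overflow the square, limit the scale by it.
    if round(max(height, width) * scale) > max_dim:
        scale = max_dim / max(height, width)
    new_h, new_w = round(height * scale), round(width * scale)
    top = (max_dim - new_h) // 2
    bottom = max_dim - new_h - top
    left = (max_dim - new_w) // 2
    right = max_dim - new_w - left
    return scale, (new_h, new_w), ((top, bottom), (left, right))


scale, new_size, padding = resize_plan(426, 640)
```

For the (426, 640) image the long side limits the scale to 1.6, the resized image is 682×1024, and the remaining 342 rows are split into 171 rows of zeros above and below.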

Mini Mask

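When training on high-resolution images, the binary mask for each object is large as well. For example, for a 1024×1024 image the mask of a single object takes 1 MB of memory (with one boolean per pixel), so an image with 100 objects needs 100 MB. That might be acceptable if the mask were full of varied values, but in practice most of the mask matrix is zero, which wastes a lot of space.

To save space and speed up training, we optimize how masks are represented. Instead of storing all those zeros, we store only the part of the mask where the object actually is, which is similar in spirit to a compression scheme:

We store only the mask pixels inside the object's bounding box (bbox) rather than the mask of the whole image. Most objects are small relative to the image, so space is saved by not storing the zeros around the object.
We resize the mask to a small size, 56×56. Large objects lose some precision, but most object annotations are not very accurate to begin with, so the loss is negligible in most cases. (The mini mask size can be set in the config class.)

Put simply, during data processing we first compute the bbox from the annotated mask, then use that bbox to change how the mask is represented. The aim is to standardize the processing while reducing storage and computation.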
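The crop-then-shrink idea can be sketched as follows. This is not the library routine: `minimize_mask` here is a hypothetical pure-Python helper, and it uses nearest-neighbour sampling as a simple stand-in for whatever resampling the real code applies.

```python
def minimize_mask(mask, bbox, size):
    """Crop a 2-D binary mask to bbox = (y1, x1, y2, x2), with y2/x2
    exclusive, and resample the crop to size x size (nearest neighbour)."""
    y1, x1, y2, x2 = bbox
    box_h, box_w = y2 - y1, x2 - x1
    mini = []
    for i in range(size):
        src_y = y1 + min(box_h - 1, (i * box_h) // size)
        row = []
        for j in range(size):
            src_x = x1 + min(box_w - 1, (j * box_w) // size)
            row.append(mask[src_y][src_x])
        mini.append(row)
    return mini


# An 8x8 mask whose object fills rows 2..5 and columns 2..5.
mask = [[1 if 2 <= y < 6 and 2 <= x < 6 else 0 for x in range(8)]
        for y in range(8)]
mini = minimize_mask(mask, (2, 2, 6, 6), 2)

# Storage per object at one byte per pixel.
full_bytes = 1024 * 1024
mini_bytes = 56 * 56
savings = full_bytes // mini_bytes
```

With one byte per pixel, a 56×56 mini mask is roughly 334 times smaller than a full 1024×1024 mask, at the cost of the jagged edges that appear when it is expanded again.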

image_id = np.random.choice(dataset.image_ids, 1)[0]
# Use load_image_gt to get the bbox and mask
image, image_meta, bbox, mask = modellib.load_image_gt(
    dataset, config, image_id, use_mini_mask=False)

log("image", image)
log("image_meta", image_meta)
log("bbox", bbox)
log("mask", mask)

display_images([image]+[mask[:,:,i] for i in range(min(mask.shape[-1], 7))])

>>>
>>>
image                    shape: (, , )       min:      max:  
image_meta               shape: (,)                 min:      max: 
bbox                     shape: (, )                min:     max:  
mask                     shape: (, , )       min:      max:

A randomly chosen image shows that the objects are small relative to the image itself:

visualize.display_instances(image, bbox[:,:4], mask, bbox[:,4], dataset.class_names)

Passing use_mini_mask=True to load_image_gt applies the mini mask operation:

# load_image_gt has the mini mask step built in
image, image_meta, bbox, mask = modellib.load_image_gt(
    dataset, config, image_id, augment=True, use_mini_mask=True)
log("mask", mask)
display_images([image]+[mask[:,:,i] for i in range(min(mask.shape[-1], 7))])

>>>
>>>
mask                     shape: (, , )           min:      max:

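To show the effect, we expand the mini masks back to full-image masks with expand_mask and draw them again:

mask = utils.expand_mask(bbox, mask, image.shape)
visualize.display_instances(image, bbox[:,:4], mask, bbox[:,4], dataset.class_names)

The edges are jagged, which is a side effect of the compression, but overall the result is acceptable.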

Anchors

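Anchors were introduced in Faster R-CNN.

The model has feature maps at several levels and therefore a very large number of anchors, so getting the anchor order right matters. In particular, the order in which anchors are used must match the order in which the convolutions produce their output.

For an FPN network, the anchor order must match the output of the convolutional layers:

Sort by pyramid level first: all anchors of the first level, then all anchors of the second level, and so on. This makes it easy to separate the anchors by level.
Within each level, order the anchors in the sequence in which the feature map is processed. A convolutional layer typically processes a feature map starting at the top left and moving right, row by row.
For each cell of the feature map, the anchors of different ratios can be in any order. Here we use the order of the ratios passed to the function.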
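The ordering rules above can be made concrete for a single pyramid level. The sketch below is an assumption-laden illustration, not `utils.generate_pyramid_anchors`: the function `level_anchors`, its argument names, and the convention that height is `scale / sqrt(ratio)` and width is `scale * sqrt(ratio)` are chosen for the example.

```python
import math


def level_anchors(scale, ratios, shape, feature_stride, anchor_stride):
    """Anchors (y1, x1, y2, x2) for one level, ordered row by row over the
    feature map and, within each cell, in the order of `ratios`."""
    anchors = []
    for row in range(0, shape[0], anchor_stride):      # top to bottom
        for col in range(0, shape[1], anchor_stride):  # left to right
            cy, cx = row * feature_stride, col * feature_stride
            for ratio in ratios:                       # ratios vary fastest
                h = scale / math.sqrt(ratio)
                w = scale * math.sqrt(ratio)
                anchors.append((cy - h / 2, cx - w / 2, cy + h / 2, cx + w / 2))
    return anchors


# A tiny 4x4 feature map with feature stride 4 and anchor stride 2.
anchors = level_anchors(32, [0.5, 1, 2], (4, 4), 4, 2)
```

To cover the whole pyramid one would concatenate the lists level by level, which gives the level-first ordering described in the first rule.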

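Anchor stride: in the FPN architecture the feature maps of the first few levels are high resolution. For example, with a 1024×1024 input the first-level feature map is 256×256, which produces about 200K anchors (256×256×3). These anchors are all 32×32 pixels and their stride relative to image pixels is 4 (1024/256 = 4), so they overlap heavily. Generating anchors for every other cell of the feature map instead reduces the load significantly: with an anchor stride of 2, the number of anchors drops by a factor of 4.

Here we use a stride of 2, which differs from the paper. The Config class configures anchors with 3 ratios ([0.5, 1, 2]). Taking the first-level feature map of size 256×256 as an example, the number of anchors is

$\frac{\text{feature\_map}^2 \times \text{ratios}}{\text{stride}^2} = \frac{256 \times 256 \times 3}{2^2} = 49152$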
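The same arithmetic extends to every pyramid level. In the sketch below, `anchors_per_level` is a hypothetical helper, and the backbone strides `[4, 8, 16, 32, 64]` are an assumption; only the first level (256×256, stride 4) is stated in the text.

```python
def anchors_per_level(image_size, backbone_strides, ratios_per_cell, anchor_stride):
    """Number of anchors on each pyramid level of a square input image."""
    counts = []
    for stride in backbone_strides:
        side = image_size // stride  # feature-map side length at this level
        counts.append(ratios_per_cell * side * side // anchor_stride ** 2)
    return counts


# Anchor stride 2, as used in this walkthrough.
counts = anchors_per_level(1024, [4, 8, 16, 32, 64], 3, 2)
total = sum(counts)

# Anchor stride 1 on the first level gives the "about 200K" figure.
dense_first_level = anchors_per_level(1024, [4], 3, 1)[0]
```

Under these assumptions the first level dominates the total, which is why thinning the anchors there has the largest effect on the workload.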

# Generate anchors
anchors = utils.generate_pyramid_anchors(config.RPN_ANCHOR_SCALES, 
                                          config.RPN_ANCHOR_RATIOS,
                                          config.BACKBONE_SHAPES,
                                          config.BACKBONE_STRIDES, 
                                          config.RPN_ANCHOR_STRIDE)

# Print summary of anchors
anchors_per_cell = len(config.RPN_ANCHOR_RATIOS)  # one anchor per ratio at each cell
print("Scales: ", config.RPN_ANCHOR_SCALES)
print("ratios: {}, \nAnchors_per_cell:{}".format(config.RPN_ANCHOR_RATIOS, anchors_per_cell))
print("backbone_shapes: ",config.BACKBONE_SHAPES)
print("backbone_strides: ",config.BACKBONE_STRIDES)
print("rpn_anchor_stride: ",config.RPN_ANCHOR_STRIDE)

num_levels = len(config.BACKBONE_SHAPES)
print("Count: ", anchors.shape[0])
print("Levels: ", num_levels)
anchors_per_level = []
for l in range(num_levels):
    num_cells = config.BACKBONE_SHAPES[l][0] * config.BACKBONE_SHAPES[l][1]
    anchors_per_level.append(anchors_per_cell * num_cells // config.RPN_ANCHOR_STRIDE**2)
    print("Anchors in Level {}: {}".format(l, anchors_per_level[l]))


>>>
>>>
Scales:  (, , , , )
ratios: [, , ], 
 anchors_per_cell:
backbone_shapes:  [[ ] [ ] [   ]  [   ]  [   ]]
backbone_strides:  [, , , , ]
rpn_anchor_stride:  
Count:  
Levels:  
Anchors in Level : 
Anchors in Level : 
Anchors in Level : 
Anchors in Level : 
Anchors in Level :

Let's look at the anchors of each level at the cell in the centre of the image:

# Load and draw random image
image_id = np.random.choice(dataset.image_ids, 1)[0]
image, image_meta, _, _ = modellib.load_image_gt(dataset, config, image_id)
fig, ax = plt.subplots(1, figsize=(10, 10))
ax.imshow(image)

levels = len(config.BACKBONE_SHAPES)  # 5 levels, 15 anchors in total

for level in range(levels):
    colors = visualize.random_colors(levels)
    # Compute the index of the anchors at the center of the image
    level_start = sum(anchors_per_level[:level]) # sum of anchors of previous levels
    level_anchors = anchors[level_start:level_start+anchors_per_level[level]]
    print("Level {}. Anchors: {:6}  Feature map Shape: {}".format(level, level_anchors.shape[0], 
                                                                config.BACKBONE_SHAPES[level]))
    center_cell = config.BACKBONE_SHAPES[level] // 2
    center_cell_index = (center_cell[0] * config.BACKBONE_SHAPES[level][1] + center_cell[1])
    level_center = center_cell_index * anchors_per_cell 
    center_anchor = anchors_per_cell * (
        (center_cell[0] * config.BACKBONE_SHAPES[level][1] / config.RPN_ANCHOR_STRIDE**2) \
        + center_cell[1] / config.RPN_ANCHOR_STRIDE)
    level_center = int(center_anchor)

    # Draw anchors. Brightness show the order in the array, dark to bright.
    for i, rect in enumerate(level_anchors[level_center:level_center+anchors_per_cell]):
        y1, x1, y2, x2 = rect
        p = patches.Rectangle((x1, y1), x2-x1, y2-y1, linewidth=2, facecolor='none',
                              edgecolor=(i+1)*np.array(colors[level]) / anchors_per_cell)
        ax.add_patch(p)


>>>
>>>
Level  Anchors:    Feature map Shape: [ ]
Level  Anchors:    Feature map Shape: [ ]
Level  Anchors:     Feature map Shape: [ ]
Level  Anchors:      Feature map Shape: [ ]
Level  Anchors:      Feature map Shape: [ ]

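Code analysis: training the model on your own dataset

Project source: matterport - GitHub

train_shapes.ipynb shows how to train Mask R-CNN on your own dataset.

To train the model on your own dataset, create two subclasses that inherit from the following two base classes:

Config: holds the default configuration; subclass it and customize the settings for your dataset.
Dataset: provides a common API; a new dataset subclasses it and overrides the relevant methods, which lets you use multiple datasets (even at the same time) without modifying the model code.

Both Dataset and Config are base classes; to use them you subclass and customize them, as in the demo below.

Imports

The dataset used in this demo is generated with OpenCV, so there is nothing extra to download. For the model to run properly, the demo still needs a GPU.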

import os
import sys
import random
import math
import re
import time
import numpy as np
import cv2
import matplotlib
import matplotlib.pyplot as plt

from config import Config
import utils
import model as modellib
import visualize
from model import log

%matplotlib inline 

ROOT_DIR = os.getcwd()  # Root directory of the project
MODEL_DIR = os.path.join(ROOT_DIR, "logs") # Directory to save logs and trained model
COCO_MODEL_PATH = os.path.join(ROOT_DIR, "mask_rcnn_coco.h5") # Path to COCO trained weights

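Building your own dataset

Here we generate a dataset directly with OpenCV. It consists of canvases with simple geometric shapes (triangles, squares, circles).

The dataset class must inherit from utils.Dataset. It exposes a load_shapes() method for loading the data and must override the following methods:

load_image()
load_mask()
image_reference()

Code for building the dataset: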

class ShapesDataset(utils.Dataset):
    """
    Generates a dataset of images made of simple shapes
    (triangles, squares, circles) placed on a blank canvas.
    """

    def load_shapes(self, count, height, width):
        """
        Generate the requested number of fixed-size images.
        count: number of images to generate
        height, width: size of the generated images
        """
        # Add the classes
        self.add_class("shapes", 1, "square")
        self.add_class("shapes", 2, "circle")
        self.add_class("shapes", 3, "triangle")

        # Generate random shape specifications; each image is identified by image_id
        for i in range(count):
            bg_color, shapes = self.random_image(height, width)
            self.add_image("shapes", image_id=i, path=None,
                           width=width, height=height,
                           bg_color=bg_color, shapes=shapes)

    def load_image(self, image_id):
        """
        Generate the image for the given image_id.
        Normally this function would read a file; here we look up the
        specification in image_info and draw the image from it.
        """
        info = self.image_info[image_id]
        bg_color = np.array(info['bg_color']).reshape([1, 1, 3])
        image = np.ones([info['height'], info['width'], 3], dtype=np.uint8)
        image = image * bg_color.astype(np.uint8)
        for shape, color, dims in info['shapes']:
            image = self.draw_shape(image, shape, dims, color)
        return image

    def image_reference(self, image_id):
        """Return the shapes data of the image."""
        info = self.image_info[image_id]
        if info["source"] == "shapes":
            return info["shapes"]
        else:
            # Fall back to the base class and return its result
            return super().image_reference(image_id)

    def load_mask(self, image_id):
        """Generate the instance masks for the shapes of the given image_id."""
        info = self.image_info[image_id]
        shapes = info['shapes']
        count = len(shapes)
        mask = np.zeros([info['height'], info['width'], count], dtype=np.uint8)
        for i, (shape, _, dims) in enumerate(info['shapes']):
            mask[:, :, i:i+1] = self.draw_shape(mask[:, :, i:i+1].copy(),
                                                shape, dims, 1)
        # Handle occlusions
        occlusion = np.logical_not(mask[:, :, -1]).astype(np.uint8)
        for i in range(count-2, -1, -1):
            mask[:, :, i] = mask[:, :, i] * occlusion
            occlusion = np.logical_and(occlusion, np.logical_not(mask[:, :, i]))
        # Map class names to class IDs.
        class_ids = np.array([self.class_names.index(s[0]) for s in shapes])
        return mask, class_ids.astype(np.int32)

    def draw_shape(self, image, shape, dims, color):
        """Draw the given shape on the image."""
        # Get the center x, y and the size s
        x, y, s = dims
        if shape == 'square':
            image = cv2.rectangle(image, (x-s, y-s), (x+s, y+s), color, -1)
        elif shape == "circle":
            image = cv2.circle(image, (x, y), s, color, -1)
        elif shape == "triangle":
            points = np.array([[(x, y-s),
                                (x-s/math.sin(math.radians(60)), y+s),
                                (x+s/math.sin(math.radians(60)), y+s),
                                ]], dtype=np.int32)
            image = cv2.fillPoly(image, points, color)
        return image

    def random_shape(self, height, width):
        """
        Generate a random shape within the given height and width bounds.

        Returns a tuple of three values:
        * shape: name of the shape (square, circle, ...)
        * color: colour of the shape (a tuple of 3 values, RGB.)
        * dimensions: centre and size of the shape (center_x, center_y, size)
        """
        # Shape
        shape = random.choice(["square", "circle", "triangle"])
        # Color
        color = tuple([random.randint(0, 255) for _ in range(3)])
        # Center x, y
        buffer = 20
        y = random.randint(buffer, height - buffer - 1)
        x = random.randint(buffer, width - buffer - 1)
        # Size
        s = random.randint(buffer, height//4)
        return shape, color, (x, y, s)

    def random_image(self, height, width):
        """
        Generate the specification of a random image with several shapes.
        Returns the background colour and a list of shape specifications
        that can be used to draw the image.
        """
        # Pick a random colour for the three channels
        bg_color = np.array([random.randint(0, 255) for _ in range(3)])
        # Generate a few random shapes and record their bboxes
        shapes = []
        boxes = []
        N = random.randint(1, 4)
        for _ in range(N):
            shape, color, dims = self.random_shape(height, width)
            shapes.append((shape, color, dims))
            x, y, s = dims
            boxes.append([y-s, x-s, y+s, x+s])
        # Apply non-max suppression with a 0.3 threshold to avoid shapes covering each other
        keep_ixs = utils.non_max_suppression(np.array(boxes), np.arange(N), 0.3)
        shapes = [s for i, s in enumerate(shapes) if i in keep_ixs]
        return bg_color, shapes
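The non-max suppression step used in random_image can be sketched in plain Python to show what it keeps and what it drops. This is an illustration, not the body of `utils.non_max_suppression`; the helper names and the rule that a box is dropped when its IoU with a kept box exceeds the threshold are assumptions for the example, and boxes are assumed to have positive area.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (y1, x1, y2, x2)."""
    y1, x1 = max(a[0], b[0]), max(a[1], b[1])
    y2, x2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, y2 - y1) * max(0, x2 - x1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def non_max_suppression(boxes, scores, threshold):
    """Return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Drop every remaining box that overlaps the kept one too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= threshold]
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0, 1, 2]  # like np.arange(N): later shapes score higher
keep = non_max_suppression(boxes, scores, 0.3)
```

With the index used as the score, a later shape wins over an earlier one it overlaps heavily, so the first box is removed here while the isolated third box is kept.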

Build a dataset with the class above and take a look:

# Build the training set of 500 images
dataset_train = ShapesDataset()
dataset_train.load_shapes(500, config.IMAGE_SHAPE[0], config.IMAGE_SHAPE[1])
dataset_train.prepare()

# Build the validation set of 50 images
dataset_val = ShapesDataset()
dataset_val.load_shapes(50, config.IMAGE_SHAPE[0], config.IMAGE_SHAPE[1])
dataset_val.prepare()

# Pick 4 random samples
image_ids = np.random.choice(dataset_train.image_ids, 4)

for image_id in image_ids:
    image = dataset_train.load_image(image_id)
    mask, class_ids = dataset_train.load_mask(image_id)
    visualize.display_top_masks(image, mask, class_ids, dataset_train.class_names)

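The dataset built above needs a matching ShapesConfig class, which gathers the model's configuration parameters in one place. It must inherit from Config:

class ShapesConfig(Config):
    """
    Training configuration for the shapes dataset.
    Inherits from the base Config class.
    """
    NAME = "shapes"  # identifier for this configuration

    # Batch size is 8 (GPUs * images/GPU).
    # Numeric values in this class follow the upstream train_shapes.ipynb notebook.
    GPU_COUNT = 1  # number of GPUs
    IMAGES_PER_GPU = 8  # images per GPU (the generated images are small, so several fit)

    # Number of classes (including background)
    NUM_CLASSES = 1 + 3  # background + 3 shapes

    # Small images train faster
    IMAGE_MIN_DIM = 128  # short side of the image
    IMAGE_MAX_DIM = 128  # long side of the image

    # Use small anchors because the images and objects are small
    RPN_ANCHOR_SCALES = (8, 16, 32, 64, 128)  # anchor side in pixels

    # Train on fewer ROIs per image because the images are small and have few objects.
    # Aim to allow ROI sampling to pick 33% positive ROIs.
    TRAIN_ROIS_PER_IMAGE = 32

    STEPS_PER_EPOCH = 100  # the data is simple, so use a small epoch

    VALIDATION_STPES = 5  # the epoch is small, so use few validation steps

config = ShapesConfig()
config.print()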

>>>
>>>

Configurations:
BACKBONE_SHAPES                [[ ]
 [ ]
 [   ]
 [   ]
 [   ]]
BACKBONE_STRIDES               [, , , , ]
BATCH_SIZE                     
BBOX_STD_DEV                   [       ]
DETECTION_MAX_INSTANCES        
DETECTION_MIN_CONFIDENCE       
DETECTION_NMS_THRESHOLD        
GPU_COUNT                      
IMAGES_PER_GPU                 
IMAGE_MAX_DIM                  
IMAGE_MIN_DIM                  
IMAGE_PADDING                  True
IMAGE_SHAPE                    [    ]
LEARNING_MOMENTUM              
LEARNING_RATE                  
MASK_POOL_SIZE                 
MASK_SHAPE                     [, ]
MAX_GT_INSTANCES               
MEAN_PIXEL                     [     ]
MINI_MASK_SHAPE                (, )
NAME                           shapes
NUM_CLASSES                    
POOL_SIZE                      
POST_NMS_ROIS_INFERENCE        
POST_NMS_ROIS_TRAINING         
ROI_POSITIVE_RATIO             
RPN_ANCHOR_RATIOS              [, , ]
RPN_ANCHOR_SCALES              (, , , , )
RPN_ANCHOR_STRIDE              
RPN_BBOX_STD_DEV               [       ]
RPN_TRAIN_ANCHORS_PER_IMAGE    
STEPS_PER_EPOCH                
TRAIN_ROIS_PER_IMAGE           
USE_MINI_MASK                  True
USE_RPN_ROIS                   True
VALIDATION_STPES               
WEIGHT_DECAY

Loading the model and training

With the dataset and its Config in place, load the pretrained model:

# The model has two modes: training and inference
# Create the model in training mode
model = modellib.MaskRCNN(mode="training", config=config,
                          model_dir=MODEL_DIR)

# Choose which weights to start from; here we use the COCO-pretrained weights
init_with = "coco"  # imagenet, coco, or last

if init_with == "imagenet":
    model.load_weights(model.get_imagenet_weights(), by_name=True)
elif init_with == "coco":
    # Load weights pretrained on MS COCO, skipping the layers that
    # depend on the number of classes
    model.load_weights(COCO_MODEL_PATH, by_name=True,
                       exclude=["mrcnn_class_logits", "mrcnn_bbox_fc", 
                                "mrcnn_bbox", "mrcnn_mask"])
elif init_with == "last":
    # Load the last model you trained and continue training
    model.load_weights(model.find_last()[1], by_name=True)

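Training the model

The backbone layers were loaded from the pretrained model. Training on top of it takes two steps:

Train only the heads. To preserve the feature-extraction ability of the backbone, we freeze all backbone layers and train only the randomly initialized layers. To do so, pass layers='heads' to the train() method.
Fine-tune all layers. After training the heads for a while, fine-tune the whole network to adapt it to the new dataset by passing layers='all'.

These two steps are the standard recipe for transfer learning.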
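Selecting which layers to train by name can be illustrated without Keras. The sketch below is hypothetical: the helper `trainable_layers` and both patterns are assumptions made for the example, and the layer names are a handful taken from the training logs in this article. It only shows how a regular expression splits backbone layers from head layers.

```python
import re


def trainable_layers(layer_names, pattern):
    """Return the layer names that fully match the regular expression."""
    return [name for name in layer_names if re.fullmatch(pattern, name)]


# Assumed patterns: heads = FPN, RPN and mrcnn_* layers; all = everything.
HEADS_PATTERN = r"(mrcnn_.*)|(rpn_.*)|(fpn_.*)"
ALL_PATTERN = r".*"

layer_names = [
    "conv1",               # backbone
    "res2a_branch2a",      # backbone
    "fpn_c5p5",
    "rpn_conv_shared",
    "mrcnn_mask_conv1",
    "mrcnn_class_logits",
]
head_layers = trainable_layers(layer_names, HEADS_PATTERN)
all_layers = trainable_layers(layer_names, ALL_PATTERN)
```

In a real model, the layers that do not match would have their trainable flag switched off, so only the matching layers receive gradient updates.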

1. Training the heads

# Passing layers="heads" freezes every layer except the heads.
# You can also pass a regular expression to select which layers to train.
model.train(dataset_train, dataset_val, 
            learning_rate=config.LEARNING_RATE, 
            epochs=1, 
            layers='heads')

>>>
>>>
Starting at epoch  LR=

Checkpoint Path: /root/Mask_RCNN-master/logs/shapes20171103T2047/mask_rcnn_shapes_{epoch:04d}.h5
Selecting layers to train
fpn_c5p5               (Conv2D)
fpn_c4p4               (Conv2D)
fpn_c3p3               (Conv2D)
fpn_c2p2               (Conv2D)
fpn_p5                 (Conv2D)
fpn_p2                 (Conv2D)
fpn_p3                 (Conv2D)
fpn_p4                 (Conv2D)
In model:  rpn_model
    rpn_conv_shared        (Conv2D)
    rpn_class_raw          (Conv2D)
    rpn_bbox_pred          (Conv2D)
mrcnn_mask_conv1       (TimeDistributed)
...
mrcnn_mask_conv4       (TimeDistributed)
mrcnn_mask_bn4         (TimeDistributed)
mrcnn_bbox_fc          (TimeDistributed)
mrcnn_mask_deconv      (TimeDistributed)
mrcnn_class_logits     (TimeDistributed)
mrcnn_mask             (TimeDistributed)

Epoch /
/ [==============================] - s ms/step - loss:  - rpn_class_loss:  - rpn_bbox_loss:  - mrcnn_class_loss:  - mrcnn_bbox_loss:  - mrcnn_mask_loss:  - val_loss:  - val_rpn_class_loss:  - val_rpn_bbox_loss:  - val_mrcnn_class_loss:  - val_mrcnn_bbox_loss:  - val_mrcnn_mask_loss:

2. Fine-tuning all layers

# Passing layers="all" trains all layers
model.train(dataset_train, dataset_val, 
            learning_rate=config.LEARNING_RATE / 10,
            epochs=2, 
            layers="all")

>>>
>>>

Starting at epoch  LR=

Checkpoint Path: /root/Mask_RCNN-master/logs/shapes20171103T2047/mask_rcnn_shapes_{epoch:04d}.h5
Selecting layers to train
conv1                  (Conv2D)
bn_conv1               (BatchNorm)
res2a_branch2a         (Conv2D)
bn2a_branch2a          (BatchNorm)
res2a_branch2b         (Conv2D)
...
...
res5c_branch2c         (Conv2D)
bn5c_branch2c          (BatchNorm)
fpn_c5p5               (Conv2D)
fpn_c4p4               (Conv2D)
fpn_c3p3               (Conv2D)
fpn_c2p2               (Conv2D)
fpn_p5                 (Conv2D)
fpn_p2                 (Conv2D)
fpn_p3                 (Conv2D)
fpn_p4                 (Conv2D)
In model:  rpn_model
    rpn_conv_shared        (Conv2D)
    rpn_class_raw          (Conv2D)
    rpn_bbox_pred          (Conv2D)
mrcnn_mask_conv1       (TimeDistributed)
mrcnn_mask_bn1         (TimeDistributed)
mrcnn_mask_conv2       (TimeDistributed)
mrcnn_class_conv1      (TimeDistributed)
mrcnn_mask_bn2         (TimeDistributed)
mrcnn_class_bn1        (TimeDistributed)
mrcnn_mask_conv3       (TimeDistributed)
mrcnn_mask_bn3         (TimeDistributed)
mrcnn_class_conv2      (TimeDistributed)
mrcnn_class_bn2        (TimeDistributed)
mrcnn_mask_conv4       (TimeDistributed)
mrcnn_mask_bn4         (TimeDistributed)
mrcnn_bbox_fc          (TimeDistributed)
mrcnn_mask_deconv      (TimeDistributed)
mrcnn_class_logits     (TimeDistributed)
mrcnn_mask             (TimeDistributed)

Epoch /
/ [==============================] - 38s 381ms/step - loss:  - rpn_class_loss:  - rpn_bbox_loss:  - mrcnn_class_loss:  - mrcnn_bbox_loss:  - mrcnn_mask_loss:  - val_loss:  - val_rpn_class_loss:  - val_rpn_bbox_loss:  - val_mrcnn_class_loss:  - val_mrcnn_bbox_loss:  - val_mrcnn_mask_loss:

Model prediction

Prediction also needs a configuration class, InferenceConfig; most of its settings are the same as for training:

class InferenceConfig(ShapesConfig):
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1

inference_config = InferenceConfig()

# Recreate the model in inference mode
model = modellib.MaskRCNN(mode="inference", 
                          config=inference_config,
                          model_dir=MODEL_DIR)

# Get the path of the saved weights, or set it manually
# model_path = os.path.join(ROOT_DIR, ".h5 file name here")
model_path = model.find_last()[1]

# Load the weights
assert model_path != "", "Provide path to trained weights"
print("Loading weights from ", model_path)
model.load_weights(model_path, by_name=True)

# Test on a random image
image_id = random.choice(dataset_val.image_ids)
original_image, image_meta, gt_bbox, gt_mask =\
    modellib.load_image_gt(dataset_val, inference_config, 
                           image_id, use_mini_mask=False)

log("original_image", original_image)
log("image_meta", image_meta)
log("gt_bbox", gt_bbox)
log("gt_mask", gt_mask)

visualize.display_instances(original_image, gt_bbox[:,:4], gt_mask, gt_bbox[:,4], 
                            dataset_train.class_names, figsize=(8, 8))

>>>
>>>
original_image           shape: (, , )         min:     max:  
image_meta               shape: (,)                 min:      max:  
gt_bbox                  shape: (, )                min:      max:  
gt_mask                  shape: (, , )         min:      max:

A look at a random validation image:

Predicting with the model:

def get_ax(rows=1, cols=1, size=8):
    """Return a Matplotlib Axes array for the visualizations.
    Provides a central point to control figure size."""
    _, ax = plt.subplots(rows, cols, figsize=(size*cols, size*rows))
    return ax

results = model.detect([original_image], verbose=1)  # run prediction

r = results[0]
visualize.display_instances(original_image, r['rois'], r['masks'], r['class_ids'], 
                            dataset_val.class_names, r['scores'], ax=get_ax())

>>>
>>>
Processing  images
image                    shape: (, , )         min:     max:  
molded_images            shape: (, , , )      min:  -  max:  
image_metas              shape: (, )               min:      max:

Computing AP:

# Compute VOC-Style mAP @ IoU=0.5
# Running on 10 images. Increase for better accuracy.
image_ids = np.random.choice(dataset_val.image_ids, 10)
APs = []
for image_id in image_ids:
    # Load the data
    image, image_meta, gt_bbox, gt_mask =\
        modellib.load_image_gt(dataset_val, inference_config,
                               image_id, use_mini_mask=False)
    molded_images = np.expand_dims(modellib.mold_image(image, inference_config), 0)
    # Run object detection
    results = model.detect([image], verbose=0)
    r = results[0]
    # Compute AP
    AP, precisions, recalls, overlaps =\
        utils.compute_ap(gt_bbox[:,:4], gt_bbox[:,4],
                         r["rois"], r["class_ids"], r["scores"])
    APs.append(AP)

print("mAP: ", np.mean(APs))

>>>
>>>
mAP:

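Code analysis: Mask R-CNN model analysis

Testing, debugging, and evaluating the Mask R-CNN model.

Imports

This part uses the custom COCO subsets, the 5K minival and the 35K validation-minus-minival. (These two download slowly, so instead of the original address I linked my CSDN upload; message me if you do not have enough points to download it.)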

import os
import sys
import random
import math
import re
import time
import numpy as np
import scipy.misc
import tensorflow as tf
import matplotlib
import matplotlib.pyplot as plt
import matplotlib.patches as patches

import utils
import visualize
from visualize import display_images
import model as modellib
from model import log

%matplotlib inline 

ROOT_DIR = os.getcwd()  # Root directory of the project
MODEL_DIR = os.path.join(ROOT_DIR, "logs") # Directory to save logs and trained model
COCO_MODEL_PATH = os.path.join(ROOT_DIR, "coco/mask_rcnn_coco.h5")  # Path to trained weights file
SHAPES_MODEL_PATH = os.path.join(ROOT_DIR, "log/shapes20171103T2047/mask_rcnn_shapes_0002.h5") # Path to Shapes trained weights

# Shapes toy dataset
# import shapes
# config = shapes.ShapesConfig()

# MS COCO Dataset
import coco
config = coco.CocoConfig()
COCO_DIR = os.path.join(ROOT_DIR, "coco")  # TODO: enter value here

def get_ax(rows=1, cols=1, size=16):
    """Control the figure size in one place."""
    _, ax = plt.subplots(rows, cols, figsize=(size*cols, size*rows))
    return ax

# Create an InferenceConfig class for testing the pretrained model
class InferenceConfig(config.__class__):
    # Run detection on one image at a time
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1

config = InferenceConfig()
DEVICE = "/cpu:0"  # /cpu:0 or /gpu:0
TEST_MODE = "inference" # values: 'inference' or 'training'

# Load the validation set
if config.NAME == 'shapes':
    dataset = shapes.ShapesDataset()
    dataset.load_shapes(500, config.IMAGE_SHAPE[0], config.IMAGE_SHAPE[1])
elif config.NAME == "coco":
    dataset = coco.CocoDataset()
    dataset.load_coco(COCO_DIR, "minival")

# Must call before using the dataset
dataset.prepare()

# Create the model in inference mode
with tf.device(DEVICE):
    model = modellib.MaskRCNN(mode="inference", model_dir=MODEL_DIR,
                              config=config)

# Set weights file path
if config.NAME == "shapes":
    weights_path = SHAPES_MODEL_PATH
elif config.NAME == "coco":
    weights_path = COCO_MODEL_PATH
# Or, uncomment to load the last model you trained
# weights_path = model.find_last()[1]

# Load weights
print("Loading weights ", weights_path)
model.load_weights(weights_path, by_name=True)


image_id = random.choice(dataset.image_ids)
image, image_meta, gt_bbox, gt_mask =\
    modellib.load_image_gt(dataset, config, image_id, use_mini_mask=False)
info = dataset.image_info[image_id]
print("image ID: {}.{} ({}) {}".format(info["source"], info["id"], image_id, 
                                       dataset.image_reference(image_id)))
gt_class_id = gt_bbox[:, 4]

# Run object detection
results = model.detect([image], verbose=1)

# Display results
ax = get_ax()
r = results[0]
# visualize.display_instances(image, gt_bbox[:,:4], gt_mask, gt_bbox[:,4], 
#                             dataset.class_names, ax=ax[0], title="Ground Truth")
visualize.display_instances(image, r['rois'], r['masks'], r['class_ids'], 
                            dataset.class_names, r['scores'], ax=ax,
                            title="Predictions")
log("gt_class_id", gt_class_id)
log("gt_bbox", gt_bbox)
log("gt_mask", gt_mask)

The code above picks a random image from the dataset and displays the detections:

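Region Proposal Network (RPN)

The RPN's job is to propose candidate object regions. From the Selective Search method used in R-CNN to the anchor mechanism used in Faster R-CNN, the goal has stayed the same: produce better RoIs faster.

The RPN lays out a large number of boxes (anchors) over the image and runs a lightweight binary classifier on them that returns an object/no-object score. Anchors with high scores (positive anchors) are passed to the next stage for classification.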
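
To make "a large number of anchors" concrete, here is a minimal sketch of how anchors can be laid out over a feature map. It is not the repository's code: the function name `generate_anchors` and all of its parameters are hypothetical, and it assumes each ratio is width/height and that every anchor keeps an area of `scale**2`.

```python
import numpy as np

def generate_anchors(scales, ratios, feature_shape, feature_stride):
    """Return [N, (y1, x1, y2, x2)] anchors, one set per feature-map cell.

    All parameter names are illustrative; ratios are width / height.
    """
    scales, ratios = np.meshgrid(np.asarray(scales, dtype=np.float64),
                                 np.asarray(ratios, dtype=np.float64))
    scales, ratios = scales.flatten(), ratios.flatten()
    # Keep the area at scale**2 while changing the aspect ratio.
    heights = scales / np.sqrt(ratios)
    widths = scales * np.sqrt(ratios)

    # Anchor centres in image pixels: one per feature-map cell.
    ys = np.arange(feature_shape[0]) * feature_stride
    xs = np.arange(feature_shape[1]) * feature_stride
    cx, cy = np.meshgrid(xs, ys)
    cy, cx = cy.reshape(-1, 1), cx.reshape(-1, 1)

    # Broadcast every centre against every (height, width) pair.
    boxes = np.stack([cy - heights / 2, cx - widths / 2,
                      cy + heights / 2, cx + widths / 2], axis=-1)
    return boxes.reshape(-1, 4)

anchors = generate_anchors(scales=[32], ratios=[0.5, 1, 2],
                           feature_shape=(2, 3), feature_stride=16)
print(anchors.shape)  # 2 * 3 cells, 3 anchors per cell
```

Even this tiny 2x3 feature map yields 18 anchors, so a full-size image with several scales quickly produces a very large number of boxes to score.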

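Positive anchors usually do not cover the object completely, so while scoring each anchor the RPN also regresses an offset and a scale factor that are used to refine the anchor's position and size.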
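
The refinement step can be sketched as follows. This is an illustration under assumptions, not the library's implementation: it uses the parameterization commonly seen in Faster R-CNN style detectors, where the centre shift is expressed as a fraction of the box size and the scale change is stored in log space. The name `apply_deltas` is hypothetical.

```python
import numpy as np

def apply_deltas(boxes, deltas):
    """boxes: [N, (y1, x1, y2, x2)]; deltas: [N, (dy, dx, log(dh), log(dw))]."""
    boxes = boxes.astype(np.float64)
    height = boxes[:, 2] - boxes[:, 0]
    width = boxes[:, 3] - boxes[:, 1]
    center_y = boxes[:, 0] + 0.5 * height
    center_x = boxes[:, 1] + 0.5 * width
    # Shift the centre by a fraction of the box size.
    center_y = center_y + deltas[:, 0] * height
    center_x = center_x + deltas[:, 1] * width
    # Scale height and width; the deltas are stored in log space.
    height = height * np.exp(deltas[:, 2])
    width = width * np.exp(deltas[:, 3])
    y1 = center_y - 0.5 * height
    x1 = center_x - 0.5 * width
    return np.stack([y1, x1, y1 + height, x1 + width], axis=1)

anchor = np.array([[0, 0, 10, 10]])
delta = np.array([[0.1, 0.0, np.log(2.0), 0.0]])
refined = apply_deltas(anchor, delta)  # centre moves 1 px in y, height doubles
```

Storing the scale in log space keeps the predicted height and width positive for any real-valued network output.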

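RPN Target

The RPN targets identify the anchors that contain objects, which are passed on to later stages of the model for classification and other tasks. The RPN covers the whole image with anchors of many different shapes and computes the IoU between each anchor and the annotated ground-truth (GT) boxes. Anchors with IoU ≥ 0.7 are positive samples, anchors with IoU ≤ 0.3 are negative samples, and those in between are neutral and are not used for training.

As mentioned above, training the RPN also regresses an offset and a scale factor. Their purpose is to refine each anchor's position and size so that it covers the ground truth better.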
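
The labeling rule above can be sketched in a few lines of NumPy. This is a simplified illustration that implements only the two thresholds described in the text; the helper names `iou_matrix` and `label_anchors` are hypothetical, and any additional rules a real implementation applies (such as balancing positives and negatives) are left out.

```python
import numpy as np

def iou_matrix(anchors, gt_boxes):
    """Pairwise IoU between [N, 4] anchors and [M, 4] GT boxes (y1, x1, y2, x2)."""
    a = np.asarray(anchors, dtype=np.float64)[:, None, :]
    g = np.asarray(gt_boxes, dtype=np.float64)[None, :, :]
    inter_h = np.clip(np.minimum(a[..., 2], g[..., 2]) - np.maximum(a[..., 0], g[..., 0]), 0, None)
    inter_w = np.clip(np.minimum(a[..., 3], g[..., 3]) - np.maximum(a[..., 1], g[..., 1]), 0, None)
    inter = inter_h * inter_w
    area_a = (a[..., 2] - a[..., 0]) * (a[..., 3] - a[..., 1])
    area_g = (g[..., 2] - g[..., 0]) * (g[..., 3] - g[..., 1])
    return inter / (area_a + area_g - inter)

def label_anchors(anchors, gt_boxes, pos_thresh=0.7, neg_thresh=0.3):
    """Return 1 (positive), -1 (negative) or 0 (neutral) for each anchor."""
    best_iou = iou_matrix(anchors, gt_boxes).max(axis=1)
    match = np.zeros(len(anchors), dtype=np.int32)
    match[best_iou <= neg_thresh] = -1
    match[best_iou >= pos_thresh] = 1
    return match

gt = np.array([[0, 0, 10, 10]])
anchors = np.array([[0, 0, 10, 10],     # IoU 1.0 with the GT box
                    [0, 0, 10, 5],      # IoU 0.5
                    [20, 20, 30, 30]])  # IoU 0.0
labels = label_anchors(anchors, gt)
```

The three anchors come out as positive, neutral and negative respectively, which matches the 1 / 0 / -1 encoding used by `target_rpn_match`.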

# Generate RPN training targets
# target_rpn_match is 1 for positive anchors, -1 for negative anchors, and 0 for neutral anchors.
target_rpn_match, target_rpn_bbox = modellib.build_rpn_targets(
    image.shape, model.anchors, gt_bbox, model.config)

log("target_rpn_match", target_rpn_match)
log("target_rpn_bbox", target_rpn_bbox)

# Split the anchors into positive, negative and neutral
positive_anchor_ix = np.where(target_rpn_match[:] == 1)[0]
negative_anchor_ix = np.where(target_rpn_match[:] == -1)[0]
neutral_anchor_ix = np.where(target_rpn_match[:] == 0)[0]

positive_anchors = model.anchors[positive_anchor_ix]
negative_anchors = model.anchors[negative_anchor_ix]
neutral_anchors = model.anchors[neutral_anchor_ix]
log("positive_anchors", positive_anchors)
log("negative_anchors", negative_anchors)
log("neutral anchors", neutral_anchors)

# Refine the positive anchors with the target deltas
refined_anchors = utils.apply_box_deltas(
    positive_anchors,
    target_rpn_bbox[:positive_anchors.shape[0]] * model.config.RPN_BBOX_STD_DEV)
log("refined_anchors", refined_anchors)

>>>
>>>
target_rpn_match         shape: (,)              min:   -  max:    
target_rpn_bbox          shape: (, )              min:   -  max:    
positive_anchors         shape: (, )               min:  -  max: 
negative_anchors         shape: (, )              min:  -  max: 
neutral anchors          shape: (, )            min: -  max: 
refined_anchors          shape: (, )               min:   -  max:

Compare the positive anchors with their refined versions:

visualize.draw_boxes(image, boxes=positive_anchors, refined_boxes=refined_anchors, ax=get_ax())

RPN Prediction

# Run RPN sub-graph
pillar = model.keras_model.get_layer("ROI").output  # node to start searching from
rpn = model.run_graph([image], [
    ("rpn_class", model.keras_model.get_layer("rpn_class").output),
    ("pre_nms_anchors", model.ancestor(pillar, "ROI/pre_nms_anchors:0")),
    ("refined_anchors", model.ancestor(pillar, "ROI/refined_anchors:0")),
    ("refined_anchors_clipped", model.ancestor(pillar, "ROI/refined_anchors_clipped:0")),
    ("post_nms_anchor_ix", model.ancestor(pillar, "ROI/rpn_non_max_suppression:0")),
    ("proposals", model.keras_model.get_layer("ROI").output),
])

>>>
>>>
rpn_class                shape: (, , )         min:      max:    
pre_nms_anchors          shape: (, , )         min: -  max: 
refined_anchors          shape: (, , )         min: -  max: 
refined_anchors_clipped  shape: (, , )         min:      max: 
post_nms_anchor_ix       shape: (,)               min:      max: 
proposals                shape: (, , )          min:      max:

Look at the top-scoring anchors (before refinement):

limit = 100
sorted_anchor_ids = np.argsort(rpn['rpn_class'][:,:,1].flatten())[::-1]
visualize.draw_boxes(image, boxes=model.anchors[sorted_anchor_ids[:limit]], ax=get_ax())

Look at the top-scoring anchors after refinement. Anchors that extend past the image boundary are clipped:

limit = 50
ax = get_ax(1, 2)
visualize.draw_boxes(image, boxes=rpn["pre_nms_anchors"][0, :limit], 
           refined_boxes=rpn["refined_anchors"][0, :limit], ax=ax[0])
visualize.draw_boxes(image, refined_boxes=rpn["refined_anchors_clipped"][0, :limit], ax=ax[1])

Apply non-maximum suppression to the anchors above:

limit = 50
ixs = rpn["post_nms_anchor_ix"][:limit]
visualize.draw_boxes(image, refined_boxes=rpn["refined_anchors_clipped"][0, ixs], ax=get_ax())
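
For readers unfamiliar with non-maximum suppression, the sketch below shows the usual greedy formulation: keep the highest-scoring box, discard every remaining box that overlaps it by more than a threshold, and repeat. The function name `greedy_nms` is hypothetical, and this is not necessarily how the model's graph implements the step.

```python
import numpy as np

def greedy_nms(boxes, scores, iou_threshold):
    """Return indices of the kept boxes, highest score first.

    boxes: [N, (y1, x1, y2, x2)], scores: [N].
    """
    boxes = np.asarray(boxes, dtype=np.float64)
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    order = np.argsort(scores)[::-1]  # indices sorted by descending score
    keep = []
    while order.size > 0:
        best, rest = order[0], order[1:]
        keep.append(int(best))
        # IoU between the best box and every remaining box.
        y1 = np.maximum(boxes[best, 0], boxes[rest, 0])
        x1 = np.maximum(boxes[best, 1], boxes[rest, 1])
        y2 = np.minimum(boxes[best, 2], boxes[rest, 2])
        x2 = np.minimum(boxes[best, 3], boxes[rest, 3])
        inter = np.clip(y2 - y1, 0, None) * np.clip(x2 - x1, 0, None)
        iou = inter / (areas[best] + areas[rest] - inter)
        # Drop the boxes that overlap the best box too much.
        order = rest[iou <= iou_threshold]
    return np.array(keep, dtype=np.int32)

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]])
scores = np.array([0.9, 0.8, 0.7])
kept = greedy_nms(boxes, scores, iou_threshold=0.5)
```

In this toy input the second box overlaps the first with an IoU of about 0.68, so it is suppressed, while the distant third box survives.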

The final proposals are the same boxes as in the previous step, except that their coordinates are normalized:

limit = 50
# Convert back to image coordinates for display
h, w = config.IMAGE_SHAPE[:2]
proposals = rpn['proposals'][0, :limit] * np.array([h, w, h, w])
visualize.draw_boxes(image, refined_boxes=proposals, ax=get_ax())
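
The conversion between normalized and pixel coordinates is a single multiplication or division by the image height and width, as in the snippet above. A minimal sketch, assuming boxes are stored as (y1, x1, y2, x2) and using hypothetical helper names:

```python
import numpy as np

def normalize_boxes(boxes, image_shape):
    """Pixel coordinates -> [0, 1] coordinates."""
    h, w = image_shape[:2]
    return np.asarray(boxes, dtype=np.float64) / np.array([h, w, h, w])

def denormalize_boxes(boxes, image_shape):
    """[0, 1] coordinates -> integer pixel coordinates."""
    h, w = image_shape[:2]
    return np.around(np.asarray(boxes) * np.array([h, w, h, w])).astype(np.int32)

normalized = np.array([[0.25, 0.5, 0.75, 1.0]])
pixels = denormalize_boxes(normalized, image_shape=(1024, 512, 3))
```

Normalized coordinates keep the proposals independent of the input resolution, which is convenient when the boxes are later used to crop feature maps of different sizes.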

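Measure the RPN recall (the fraction of objects covered by anchors). Here the recall is computed in three ways:

All anchors
All refined anchors
Refined anchors after non-maximum suppression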

iou_threshold = 0.7  # matches the positive-anchor IoU threshold described above

recall, positive_anchor_ids = utils.compute_recall(model.anchors, gt_bbox, iou_threshold)
print("All Anchors ({:5})       Recall: {:.3f}  Positive anchors: {}".format(
    model.anchors.shape[0], recall, len(positive_anchor_ids)))

recall, positive_anchor_ids = utils.compute_recall(rpn['refined_anchors'][0], gt_bbox, iou_threshold)
print("Refined Anchors ({:5})   Recall: {:.3f}  Positive anchors: {}".format(
    rpn['refined_anchors'].shape[1], recall, len(positive_anchor_ids)))

recall, positive_anchor_ids = utils.compute_recall(proposals, gt_bbox, iou_threshold)
print("Post NMS Anchors ({:5})  Recall: {:.3f}  Positive anchors: {}".format(
    proposals.shape[0], recall, len(positive_anchor_ids)))

>>>
>>>
All Anchors ()       Recall:   Positive anchors: 
Refined Anchors ()   Recall:   Positive anchors: 
Post NMS Anchors (   )  Recall:   Positive anchors:
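
The recall used here can be read as "the fraction of GT boxes matched by at least one box whose IoU reaches the threshold". The sketch below computes it from a precomputed IoU matrix. The function name `recall_from_overlaps` is hypothetical and the exact matching rule inside `utils.compute_recall` may differ.

```python
import numpy as np

def recall_from_overlaps(overlaps, iou_threshold):
    """overlaps: [num_boxes, num_gt] IoU matrix. Returns (recall, positive_ids)."""
    overlaps = np.asarray(overlaps, dtype=np.float64)
    best_iou = overlaps.max(axis=1)      # best IoU of each box with any GT box
    best_gt = overlaps.argmax(axis=1)    # which GT box that IoU belongs to
    positive_ids = np.where(best_iou >= iou_threshold)[0]
    matched_gt = np.unique(best_gt[positive_ids])
    recall = len(matched_gt) / overlaps.shape[1]
    return recall, positive_ids

overlaps = np.array([[0.80, 0.10],
                     [0.75, 0.00],
                     [0.20, 0.30]])
recall, positive_ids = recall_from_overlaps(overlaps, iou_threshold=0.7)
```

Two boxes pass the threshold here, but both match the first GT box, so only one of the two objects is covered and the recall is 0.5.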

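Classifying Proposals

The RPN stage above generates region proposals; this stage classifies them.

Proposal Classification

The proposals selected by the RPN are fed to the classification head, which outputs a class probability distribution and a bbox regression for each proposal.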

# Get input and output to classifier and mask heads.
mrcnn = model.run_graph([image], [
    ("proposals", model.keras_model.get_layer("ROI").output),
    ("probs", model.keras_model.get_layer("mrcnn_class").output),
    ("deltas", model.keras_model.get_layer("mrcnn_bbox").output),
    ("masks", model.keras_model.get_layer("mrcnn_mask").output),
    ("detections", model.keras_model.get_layer("mrcnn_detection").output),
])

>>>
>>>
proposals                shape: (, , )          min:      max:    
probs                    shape: (, , )         min:      max:    
deltas                   shape: (, , , )      min:   -  max:    
masks                    shape: (, , , , )  min:      max:    
detections               shape: (, , )           min:      max:

Get the detected classes, trimming the zero padding:

# Each detection row is assumed to be (y1, x1, y2, x2, class_id, score)
det_class_ids = mrcnn['detections'][0, :, 4].astype(np.int32)
det_count = np.where(det_class_ids == 0)[0][0]
det_class_ids = det_class_ids[:det_count]
detections = mrcnn['detections'][0, :det_count]

print("{} detections: {}".format(
    det_count, np.array(dataset.class_names)[det_class_ids]))

captions = ["{} {:.3f}".format(dataset.class_names[int(c)], s) if c > 0 else ""
            for c, s in zip(detections[:, 4], detections[:, 5])]
visualize.draw_boxes(
    image, 
    refined_boxes=detections[:, :4],
    visibilities=[2] * len(detections),
    captions=captions, title="Detections",
    ax=get_ax())

>>>
>>>
11 detections: ['person' 'person' 'person' 'person' 'person' 'orange' 'person' 'orange'
 'dog' 'handbag' 'apple']
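
The detection output is padded with zero rows up to a fixed length, which is why the snippet above looks for the first row whose class ID is 0. Note that indexing `[0][0]` raises an `IndexError` when there is no padding at all. A minimal sketch that handles both cases, with a hypothetical helper name and an assumed row layout of (y1, x1, y2, x2, class_id, score):

```python
import numpy as np

def trim_zero_padding(detections):
    """detections: [N, (y1, x1, y2, x2, class_id, score)], zero rows at the end."""
    class_ids = detections[:, 4].astype(np.int32)
    padding_ix = np.where(class_ids == 0)[0]
    # If no row has class 0, every row is a real detection.
    count = padding_ix[0] if padding_ix.size > 0 else detections.shape[0]
    return detections[:count]

padded = np.array([[10, 10, 50, 50, 1, 0.9],
                   [20, 30, 60, 80, 3, 0.8],
                   [0, 0, 0, 0, 0, 0.0],
                   [0, 0, 0, 0, 0, 0.0]])
valid = trim_zero_padding(padded)
```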

Step by Step Detection

# Proposals are in normalized coordinates. Scale them back to image coordinates.
h, w = config.IMAGE_SHAPE[:2]
proposals = np.around(mrcnn["proposals"][0] * np.array([h, w, h, w])).astype(np.int32)

# Class ID, score, and mask per proposal
roi_class_ids = np.argmax(mrcnn["probs"][0], axis=1)
roi_scores = mrcnn["probs"][0, np.arange(roi_class_ids.shape[0]), roi_class_ids]
roi_class_names = np.array(dataset.class_names)[roi_class_ids]
roi_positive_ixs = np.where(roi_class_ids > 0)[0]

# How many ROIs vs empty rows?
print("{} Valid proposals out of {}".format(np.sum(np.any(proposals, axis=1)), proposals.shape[0]))
print("{} Positive ROIs".format(len(roi_positive_ixs)))

# Class counts
print(list(zip(*np.unique(roi_class_names, return_counts=True))))

>>>
>>>
 Valid proposals out of 
 Positive ROIs
[('BG', ), ('apple', ), ('cup', ), ('dog', ), ('handbag', ), ('orange', ), ('person', ), ('sandwich', )]

Look at a random sample of proposals. Proposals classified as BG get no caption; the ones of interest are those assigned a class, shown with their scores:

limit = 200  # sample size (assumed value)
ixs = np.random.randint(0, proposals.shape[0], limit)
captions = ["{} {:.3f}".format(dataset.class_names[c], s) if c > 0 else ""
            for c, s in zip(roi_class_ids[ixs], roi_scores[ixs])]
visualize.draw_boxes(image, boxes=proposals[ixs],
                     visibilities=np.where(roi_class_ids[ixs] > 0, 2, 1),  # visibility levels are assumed
                     captions=captions, title="ROIs Before Refinement",
                     ax=get_ax())

Apply the bbox refinement:

# Class-specific bounding box shifts.
roi_bbox_specific = mrcnn["deltas"][0, np.arange(proposals.shape[0]), roi_class_ids]
log("roi_bbox_specific", roi_bbox_specific)

# Apply bounding box transformations
# Shape: [N, (y1, x1, y2, x2)]
refined_proposals = utils.apply_box_deltas(
    proposals, roi_bbox_specific * config.BBOX_STD_DEV).astype(np.int32)
log("refined_proposals", refined_proposals)

# Show positive proposals
# ids = np.arange(len(roi_positive_ixs))  # Display all
limit = 5  # sample size (assumed value)
ids = np.random.randint(0, len(roi_positive_ixs), limit)  # Display random sample
captions = ["{} {:.3f}".format(dataset.class_names[c], s) if c > 0 else ""
            for c, s in zip(roi_class_ids[roi_positive_ixs][ids], roi_scores[roi_positive_ixs][ids])]
visualize.draw_boxes(image, boxes=proposals[roi_positive_ixs][ids],
                     refined_boxes=refined_proposals[roi_positive_ixs][ids],
                     visibilities=np.where(roi_class_ids[roi_positive_ixs][ids] > 0, 1, 0),  # visibility levels are assumed
                     captions=captions, title="ROIs After Refinement",
                     ax=get_ax())

>>>
>>>
roi_bbox_specific        shape: (, )             min:   -  max:    
refined_proposals        shape: (, )             min:   -  max:
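
The lines that compute `roi_scores` and `roi_bbox_specific` both rely on NumPy integer-array indexing: pairing a row index with the winning class index picks one entry per ROI. A small self-contained illustration with made-up numbers (the arrays below are toy data, not model output):

```python
import numpy as np

probs = np.array([[0.1, 0.7, 0.2],
                  [0.8, 0.1, 0.1]])              # [N, num_classes]
deltas = np.arange(2 * 3 * 4).reshape(2, 3, 4)   # [N, num_classes, 4]

class_ids = np.argmax(probs, axis=1)    # winning class per ROI
rows = np.arange(probs.shape[0])
scores = probs[rows, class_ids]         # score of the winning class, shape [N]
class_deltas = deltas[rows, class_ids]  # deltas of the winning class, shape [N, 4]
```

Because the bbox head predicts a separate set of deltas for every class, only the row belonging to the predicted class is used to refine each proposal.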

Filter out low-confidence detections:

# Remove boxes classified as background
keep = np.where(roi_class_ids > 0)[0]
print("Keep {} detections:\n{}".format(keep.shape[0], keep))

# Remove low confidence detections
keep = np.intersect1d(keep, np.where(roi_scores >= config.DETECTION_MIN_CONFIDENCE)[0])
print("Remove boxes below {} confidence. Keep {}:\n{}".format(
    config.DETECTION_MIN_CONFIDENCE, keep.shape[0], keep))

>>>
>>>
Keep  detections:
[                                            
                                    
                              
                  
                  
                ]

Remove boxes below  confidence. Keep :
[                                           
                         
        ]

Apply per-class non-maximum suppression:

# Apply per-class non-max suppression
pre_nms_boxes = refined_proposals[keep]
pre_nms_scores = roi_scores[keep]
pre_nms_class_ids = roi_class_ids[keep]

nms_keep = []
for class_id in np.unique(pre_nms_class_ids):
    # Pick detections of this class
    ixs = np.where(pre_nms_class_ids == class_id)[0]
    # Apply NMS
    class_keep = utils.non_max_suppression(pre_nms_boxes[ixs], 
                                            pre_nms_scores[ixs],
                                            config.DETECTION_NMS_THRESHOLD)
    # Map indices back to positions in the full proposal list
    class_keep = keep[ixs[class_keep]]
    nms_keep = np.union1d(nms_keep, class_keep)
    # Class names are truncated for display (20 characters is an assumed width)
    print("{:22}: {} -> {}".format(dataset.class_names[class_id][:20], 
                                   keep[ixs], class_keep))

keep = np.intersect1d(keep, nms_keep).astype(np.int32)
print("\nKept after per-class NMS: {}\n{}".format(keep.shape[0], keep))

>>>
>>>
person                : [                                       
         ] -> [         ]
dog                   : [     ] -> []
handbag               : [] -> []
apple                 : [] -> []
orange                : [                  ] -> [  ]

Kept after per-class NMS: 
[                         ]

Look at the final result:

ixs = np.arange(len(keep))  # Display all
# ixs = np.random.randint(0, len(keep), 10)  # Display random sample
captions = ["{} {:.3f}".format(dataset.class_names[c], s) if c > 0 else ""
            for c, s in zip(roi_class_ids[keep][ixs], roi_scores[keep][ixs])]
visualize.draw_boxes(
    image, boxes=proposals[keep][ixs],
    refined_boxes=refined_proposals[keep][ixs],
    visibilities=np.where(roi_class_ids[keep][ixs] > 0, 1, 0),  # visibility levels are assumed
    captions=captions, title="Detections after NMS",
    ax=get_ax())

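Generating Masks

Building on the instances produced in the previous stage, the mask head generates a segmentation mask for each instance.

Mask Target

These are the training targets of the mask branch: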

Predicted Masks

# Get predictions of mask head
mrcnn = model.run_graph([image], [
    ("detections", model.keras_model.get_layer("mrcnn_detection").output),
    ("masks", model.keras_model.get_layer("mrcnn_mask").output),
])

# Get detection class IDs. Trim zero padding.
# Each detection row is assumed to be (y1, x1, y2, x2, class_id, score)
det_class_ids = mrcnn['detections'][0, :, 4].astype(np.int32)
det_count = np.where(det_class_ids == 0)[0][0]
det_class_ids = det_class_ids[:det_count]

print("{} detections: {}".format(
    det_count, np.array(dataset.class_names)[det_class_ids]))

# Masks
det_boxes = mrcnn["detections"][0, :, :4].astype(np.int32)
# For each detection, keep only the mask channel of its predicted class
det_mask_specific = np.array([mrcnn["masks"][0, i, :, :, c] 
                              for i, c in enumerate(det_class_ids)])
det_masks = np.array([utils.unmold_mask(m, det_boxes[i], image.shape)
                      for i, m in enumerate(det_mask_specific)])
log("det_mask_specific", det_mask_specific)
log("det_masks", det_masks)

# Show the first few raw masks scaled to 0-255 (the count of 4 and the factor 255 are assumed)
display_images(det_mask_specific[:4] * 255, cmap="Blues", interpolation="none")

>>>
>>>
detections               shape: (, , )           min:      max:  
masks                    shape: (, , , , )  min:      max:    
11 detections: ['person' 'person' 'person' 'person' 'person' 'orange' 'person' 'orange'
 'dog' 'handbag' 'apple']

det_mask_specific        shape: (, , )          min:      max:    
det_masks                shape: (, , )      min:      max:
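
The mask head predicts a small fixed-size mask per detection, and `utils.unmold_mask` turns it into a full-image mask. The sketch below illustrates the idea only: resize the small mask to the box size, threshold it, and paste it into an empty image. It uses nearest-neighbour resizing as a stand-in for whatever interpolation the library applies, and the name `paste_mask` and the 0.5 threshold are assumptions.

```python
import numpy as np

def paste_mask(mask, bbox, image_shape, threshold=0.5):
    """mask: [h, w] floats in [0, 1]; bbox: (y1, x1, y2, x2) in pixels."""
    y1, x1, y2, x2 = bbox
    box_h, box_w = y2 - y1, x2 - x1
    # Nearest-neighbour resize: map every output pixel to a source pixel.
    rows = np.arange(box_h) * mask.shape[0] // box_h
    cols = np.arange(box_w) * mask.shape[1] // box_w
    resized = mask[rows[:, None], cols[None, :]]
    # Binarize and paste into a full-size canvas.
    full_mask = np.zeros(image_shape[:2], dtype=np.uint8)
    full_mask[y1:y2, x1:x2] = (resized >= threshold).astype(np.uint8)
    return full_mask

small_mask = np.array([[0.9, 0.1],
                       [0.2, 0.8]])
full = paste_mask(small_mask, bbox=(1, 1, 5, 5), image_shape=(6, 6, 3))
```

Each cell of the 2x2 toy mask expands into a 2x2 block inside the 4x4 box, and everything outside the box stays 0.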