[Mask R-CNN] (9): Understanding the code of inspect_data.ipynb

Contents: 1. Imports / 2. Configuration / 3. Dataset / 4. Visualization / 5. Bounding Boxes / 6. Resizing Images / 7. Mini Masks / 8. Anchors / 9. Data Generator / 10. ROIs

1. Imports

First, import the required packages.

import os
import sys
import itertools
import math
import logging
import json
import re
import random
from collections import OrderedDict
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import matplotlib.patches as patches
import matplotlib.lines as lines
from matplotlib.patches import Polygon

# Set the project root directory
ROOT_DIR = os.path.abspath("../../")

# Import Mask RCNN
sys.path.append(ROOT_DIR)
from mrcnn import utils
from mrcnn import visualize
from mrcnn.visualize import display_images
import mrcnn.model as modellib
from mrcnn.model import log

%matplotlib inline
           

2. Configuration

Set up the configuration.

# Run only one of the two code blocks below

# Shapes toy dataset
# import shapes
# config = shapes.ShapesConfig()

# MS COCO Dataset
import coco
config = coco.CocoConfig()
# Path to the COCO dataset
COCO_DIR = "path/to/COCO" 
           

3. Dataset

# Load the dataset
# Shapes dataset
if config.NAME == 'shapes':
    dataset = shapes.ShapesDataset()
    dataset.load_shapes(500, config.IMAGE_SHAPE[0], config.IMAGE_SHAPE[1])
# COCO dataset
elif config.NAME == "coco":
    dataset = coco.CocoDataset()
    dataset.load_coco(COCO_DIR, "train")

# Must be called before using the dataset
dataset.prepare()

# Print dataset info
print("Image Count: {}".format(len(dataset.image_ids)))
print("Class Count: {}".format(dataset.num_classes))
for i, info in enumerate(dataset.class_info):
    print("{:3}. {:50}".format(i, info['name']))
           

4. Visualization

Visualize dataset samples and their masks.

# Load and display a few random samples
image_ids = np.random.choice(dataset.image_ids, 4)
for image_id in image_ids:
    image = dataset.load_image(image_id)
    mask, class_ids = dataset.load_mask(image_id)
    visualize.display_top_masks(image, mask, class_ids, dataset.class_names)
           

5. Bounding Boxes

Rather than using the bounding box coordinates provided by the dataset, the code computes them from the masks. This way bounding boxes are handled the same way regardless of the source dataset, and it also makes it easy to resize, rotate, or crop images: the bounding boxes are simply regenerated from the updated masks, instead of computing a bounding box transformation for every kind of image transform.

# Pick a random image and load its mask.
image_id = random.choice(dataset.image_ids)
image = dataset.load_image(image_id)
mask, class_ids = dataset.load_mask(image_id)
# Compute the bounding boxes
bbox = utils.extract_bboxes(mask)

# Print the image and some extra info
print("image_id ", image_id, dataset.image_reference(image_id))
log("image", image)
log("mask", mask)
log("class_ids", class_ids)
log("bbox", bbox)
# Display the image and its instances
visualize.display_instances(image, bbox, mask, class_ids, dataset.class_names)
           
image_id  74886 http://cocodataset.org/#explore?id=118535
image                    shape: (375, 500, 3)         min:    0.00000  max:  255.00000
mask                     shape: (375, 500, 5)         min:    0.00000  max:    1.00000
class_ids                shape: (5,)                  min:    1.00000  max:   35.00000
bbox                     shape: (5, 4)                min:    1.00000  max:  329.00000
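For intuition, a box can be read straight off a binary mask with a few lines of NumPy. This is only a minimal sketch of the idea; the notebook uses utils.extract_bboxes, and the helper name below is just for illustration:

import numpy as np

def bbox_from_mask(m):
    # m: [height, width] binary mask of a single instance
    ys, xs = np.where(m)
    if ys.size == 0:                 # empty mask -> empty box
        return np.zeros(4, dtype=np.int32)
    # y2/x2 are exclusive, matching the (y1, x1, y2, x2) convention above
    return np.array([ys.min(), xs.min(), ys.max() + 1, xs.max() + 1], dtype=np.int32)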
           

6. Resizing Images

To support multiple images per batch, all images are resized to one size (1024x1024). Aspect ratio is preserved, so when an image is not square it is padded with zeros on the top/bottom or left/right.

# Pick a random image.
image_id = np.random.choice(dataset.image_ids, 1)[0]
image = dataset.load_image(image_id)
mask, class_ids = dataset.load_mask(image_id)
original_shape = image.shape
# Resize
image, window, scale, padding, _ = utils.resize_image(
    image, 
    min_dim=config.IMAGE_MIN_DIM, 
    max_dim=config.IMAGE_MAX_DIM,
    mode=config.IMAGE_RESIZE_MODE)
mask = utils.resize_mask(mask, scale, padding)
# Compute the bounding boxes
bbox = utils.extract_bboxes(mask)

# Print info
print("image_id: ", image_id, dataset.image_reference(image_id))
print("Original shape: ", original_shape)
log("image", image)
log("mask", mask)
log("class_ids", class_ids)
log("bbox", bbox)
# Display the image and its instances
visualize.display_instances(image, bbox, mask, class_ids, dataset.class_names)
           
image_id:  6480 http://cocodataset.org/#explore?id=402563
Original shape:  (476, 640, 3)
image                    shape: (1024, 1024, 3)       min:    0.00000  max:  255.00000
mask                     shape: (1024, 1024, 32)      min:    0.00000  max:    1.00000
class_ids                shape: (32,)                 min:    1.00000  max:   77.00000
bbox                     shape: (32, 4)               min:    1.00000  max:  991.00000
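The output above matches the arithmetic: scale = 1024/640 = 1.6, so the 476x640 image becomes roughly 762x1024 and is then zero-padded to 1024x1024. Below is a minimal sketch of this "square" resize logic, assuming skimage for the resampling; it is not a drop-in replacement for utils.resize_image:

import numpy as np
import skimage.transform

def resize_square(image, max_dim=1024):
    # Scale so the longer side equals max_dim, preserving aspect ratio
    h, w = image.shape[:2]
    scale = max_dim / max(h, w)                  # e.g. 1024 / 640 = 1.6
    new_h, new_w = round(h * scale), round(w * scale)
    image = skimage.transform.resize(
        image, (new_h, new_w), preserve_range=True).astype(image.dtype)
    # Zero-pad symmetrically to max_dim x max_dim
    top = (max_dim - new_h) // 2
    left = (max_dim - new_w) // 2
    padding = [(top, max_dim - new_h - top), (left, max_dim - new_w - left), (0, 0)]
    image = np.pad(image, padding, mode='constant', constant_values=0)
    # window: the (y1, x1, y2, x2) region that holds the actual image pixels
    window = (top, left, top + new_h, left + new_w)
    return image, window, scale, padding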
           

7. Mini Masks

When training on high-resolution images, the generated binary instance masks can get large. For example, when training on 1024x1024 images, the mask of a single instance takes 1MB of memory (NumPy stores booleans as one byte each, so 1024x1024 bytes is about 1MB). If an image has 100 instances, that is 100MB just for the masks.

To speed up training, the masks are optimized as follows:

  • Only the mask pixels inside the object's bounding box are stored, rather than a mask spanning the whole image. Most objects are small compared to the full image, so this saves a lot of space that would otherwise hold zeros.
  • The mask is resized to a smaller size (e.g. 56x56). For objects larger than the chosen size this loses some accuracy, but most object annotations are not very precise to begin with, so the loss is negligible in practice. The mini_mask size can be set in the config class. (A sketch of this crop-and-resize step follows the mini-mask example below.)

First, load the ground truth at full resolution for comparison:
image_id = np.random.choice(dataset.image_ids, 1)[0]
image, image_meta, class_ids, bbox, mask = modellib.load_image_gt(
    dataset, config, image_id, use_mini_mask=False)

log("image", image)
log("image_meta", image_meta)
log("class_ids", class_ids)
log("bbox", bbox)
log("mask", mask)

display_images([image]+[mask[:,:,i] for i in range(min(mask.shape[-1], 7))])
visualize.display_instances(image, bbox, mask, class_ids, dataset.class_names)
           
image                    shape: (1024, 1024, 3)       min:    0.00000  max:  255.00000
image_meta               shape: (89,)                 min:    0.00000  max: 23221.00000
bbox                     shape: (1, 5)                min:   62.00000  max:  578.00000
mask                     shape: (1024, 1024, 1)       min:    0.00000  max:    1.00000
           
# Now with mini masks
image, image_meta, class_ids, bbox, mask = modellib.load_image_gt(
    dataset, config, image_id, augment=True, use_mini_mask=True)
log("mask", mask)
display_images([image]+[mask[:,:,i] for i in range(min(mask.shape[-1], 7))])
mask = utils.expand_mask(bbox, mask, image.shape)
visualize.display_instances(image, bbox, mask, class_ids, dataset.class_names)
           
mask                     shape: (56, 56, 1)           min:    0.00000  max:    1.00000
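Under the hood, a mini mask is just the instance mask cropped to its bounding box and resized to the mini shape; utils.expand_mask reverses the process. A rough sketch of the crop-and-resize step, assuming skimage (the real implementation is utils.minimize_mask; the helper name here is illustrative):

import numpy as np
import skimage.transform

def minimize_mask_sketch(bbox, mask, mini_shape=(56, 56)):
    # bbox: [N, (y1, x1, y2, x2)]; mask: [H, W, N] full-size instance masks
    mini = np.zeros(mini_shape + (mask.shape[-1],), dtype=bool)
    for i in range(mask.shape[-1]):
        y1, x1, y2, x2 = bbox[i][:4]
        crop = mask[y1:y2, x1:x2, i].astype(float)   # keep only the box region
        mini[:, :, i] = skimage.transform.resize(
            crop, mini_shape, preserve_range=True) >= 0.5
    return mini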
           

8. Anchors

The order of anchors is important. The same order must be used in training and prediction, and it must match the order of the convolution execution.

In an FPN network, the anchors must be ordered in a way that makes it easy to match them to the outputs of the convolution layers that predict anchor scores and shifts.

  • Sort by pyramid level first: all anchors of the first level, then all of the second, and so on. This makes it easy to separate anchors by level.
  • Within each level, sort anchors by the feature map processing order. Typically a convolution layer processes a feature map starting from the top-left and moving right, row by row.
  • Within each feature map cell, anchors of different ratios can take any order. Here they follow the order of the ratios passed to the function.
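Putting the three rules together, the flat index of an anchor is determined by its (level, cell row, cell column, ratio). This is only a schematic of that layout, assuming an anchor stride of 1 for simplicity (the code later in this section generalizes it to stride 2); the helper and its parameter names are hypothetical:

def anchor_index(level, y, x, ratio_idx, anchors_per_level, feature_shapes, anchors_per_cell=3):
    level_start = sum(anchors_per_level[:level])    # all prior levels come first
    cell_index = y * feature_shapes[level][1] + x   # row-major order within a level
    return level_start + cell_index * anchors_per_cell + ratio_idx  # ratios last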

Anchor Stride: In the FPN architecture, the feature maps of the first few layers are high resolution. For example, with a 1024x1024 input image, the feature map of the first layer is 256x256, which generates about 200K anchors (256*256*3). These anchors are 32x32 pixels and their stride relative to the image is 4 pixels, so there is a lot of overlap. If we instead generate anchors for every other cell in the feature map, the count drops significantly; a stride of 2 cuts the number of anchors to a quarter.

So, unlike the paper, the anchor stride here is set to 2.
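A quick sanity check on that arithmetic; the level-0 count printed by the next block confirms it:

# Level-0 anchor counts for a 1024x1024 input (256x256 feature map, 3 ratios)
print(256 * 256 * 3)                  # 196608 anchors at stride 1 (~200K)
print((256 // 2) * (256 // 2) * 3)    # 49152 anchors at stride 2, a quarter as many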

# Generate Anchors
backbone_shapes = modellib.compute_backbone_shapes(config, config.IMAGE_SHAPE)
anchors = utils.generate_pyramid_anchors(config.RPN_ANCHOR_SCALES, 
                                          config.RPN_ANCHOR_RATIOS,
                                          backbone_shapes,
                                          config.BACKBONE_STRIDES, 
                                          config.RPN_ANCHOR_STRIDE)

# Print a summary of the anchors
num_levels = len(backbone_shapes)
anchors_per_cell = len(config.RPN_ANCHOR_RATIOS)
print("Count: ", anchors.shape[0])
print("Scales: ", config.RPN_ANCHOR_SCALES)
print("ratios: ", config.RPN_ANCHOR_RATIOS)
print("Anchors per Cell: ", anchors_per_cell)
print("Levels: ", num_levels)
anchors_per_level = []
for l in range(num_levels):
    num_cells = backbone_shapes[l][0] * backbone_shapes[l][1]
    anchors_per_level.append(anchors_per_cell * num_cells // config.RPN_ANCHOR_STRIDE**2)
    print("Anchors in Level {}: {}".format(l, anchors_per_level[l]))
           
Count:  65472
Scales:  (32, 64, 128, 256, 512)
ratios:  [0.5, 1, 2]
Anchors per Cell:  3
Levels:  5
Anchors in Level 0: 49152
Anchors in Level 1: 12288
Anchors in Level 2: 3072
Anchors in Level 3: 768
Anchors in Level 4: 192
           
## Visualize the anchors of one cell at the center of the feature map of a specific level

# Load and display a random image
image_id = np.random.choice(dataset.image_ids, 1)[0]
image, image_meta, _, _, _ = modellib.load_image_gt(dataset, config, image_id)
fig, ax = plt.subplots(1, figsize=(10, 10))
ax.imshow(image)
levels = len(backbone_shapes)

for level in range(levels):
    colors = visualize.random_colors(levels)
    # Compute the index of the anchors at the center of the image
    level_start = sum(anchors_per_level[:level])  # sum of anchors of prior levels
    level_anchors = anchors[level_start:level_start+anchors_per_level[level]]
    print("Level {}. Anchors: {:6}  Feature map Shape: {}".format(level, level_anchors.shape[0], 
                                                                  backbone_shapes[level]))
    # Flat index of the first anchor in the center cell, accounting for the anchor stride
    center_cell = backbone_shapes[level] // 2
    center_anchor = anchors_per_cell * (
        (center_cell[0] * backbone_shapes[level][1] / config.RPN_ANCHOR_STRIDE**2) \
        + center_cell[1] / config.RPN_ANCHOR_STRIDE)
    level_center = int(center_anchor)

    # Draw the anchors. Brightness shows the order in the array, dark to bright.
    for i, rect in enumerate(level_anchors[level_center:level_center+anchors_per_cell]):
        y1, x1, y2, x2 = rect
        p = patches.Rectangle((x1, y1), x2-x1, y2-y1, linewidth=2, facecolor='none',
                              edgecolor=(i+1)*np.array(colors[level]) / anchors_per_cell)
        ax.add_patch(p)
           

9. Data Generator

# Create a data generator
random_rois = 2000
g = modellib.data_generator(
    dataset, config, shuffle=True, random_rois=random_rois, 
    batch_size=4,
    detection_targets=True)
           
# Get the next batch
if random_rois:
    [normalized_images, image_meta, rpn_match, rpn_bbox, gt_class_ids, gt_boxes, gt_masks, rpn_rois, rois], \
    [mrcnn_class_ids, mrcnn_bbox, mrcnn_mask] = next(g)
    
    log("rois", rois)
    log("mrcnn_class_ids", mrcnn_class_ids)
    log("mrcnn_bbox", mrcnn_bbox)
    log("mrcnn_mask", mrcnn_mask)
else:
    [normalized_images, image_meta, rpn_match, rpn_bbox, gt_class_ids, gt_boxes, gt_masks], _ = next(g)
    
log("gt_class_ids", gt_class_ids)
log("gt_boxes", gt_boxes)
log("gt_masks", gt_masks)
log("rpn_match", rpn_match, )
log("rpn_bbox", rpn_bbox)
image_id = modellib.parse_image_meta(image_meta)["image_id"][0]
print("image_id: ", image_id, dataset.image_reference(image_id))

# Remove the last dim in mrcnn_class_ids. It's only added
# to satisfy Keras restrictions on target shapes.
mrcnn_class_ids = mrcnn_class_ids[:,:,0]
           
b = 0

# Restore the original image (reverse normalization)
sample_image = modellib.unmold_image(normalized_images[b], config)

# Compute anchor shifts by applying the predicted RPN refinements
indices = np.where(rpn_match[b] == 1)[0]
refined_anchors = utils.apply_box_deltas(anchors[indices], rpn_bbox[b, :len(indices)] * config.RPN_BBOX_STD_DEV)
log("anchors", anchors)
log("refined_anchors", refined_anchors)

# Get the positive anchors
positive_anchor_ids = np.where(rpn_match[b] == 1)[0]
print("Positive anchors: {}".format(len(positive_anchor_ids)))
negative_anchor_ids = np.where(rpn_match[b] == -1)[0]
print("Negative anchors: {}".format(len(negative_anchor_ids)))
neutral_anchor_ids = np.where(rpn_match[b] == 0)[0]
print("Neutral anchors: {}".format(len(neutral_anchor_ids)))

# ROI breakdown by class
for c, n in zip(dataset.class_names, np.bincount(mrcnn_class_ids[b].flatten())):
    if n:
        print("{:23}: {}".format(c[:20], n))

# Show the positive anchors
visualize.draw_boxes(sample_image, boxes=anchors[positive_anchor_ids], 
                     refined_boxes=refined_anchors)
# Show the negative anchors
visualize.draw_boxes(sample_image, boxes=anchors[negative_anchor_ids])
# Show the neutral anchors. They don't contribute to training.
visualize.draw_boxes(sample_image, boxes=anchors[np.random.choice(neutral_anchor_ids, 100)])
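Refining an anchor with the predicted (dy, dx, log(dh), log(dw)) deltas is the standard box transform; note the deltas are first multiplied by RPN_BBOX_STD_DEV, as above. A minimal NumPy sketch of what utils.apply_box_deltas computes, assuming that convention (a sketch, not the library's exact code):

import numpy as np

def apply_box_deltas_sketch(boxes, deltas):
    # boxes: [N, (y1, x1, y2, x2)]; deltas: [N, (dy, dx, log(dh), log(dw))]
    height = boxes[:, 2] - boxes[:, 0]
    width = boxes[:, 3] - boxes[:, 1]
    center_y = boxes[:, 0] + 0.5 * height
    center_x = boxes[:, 1] + 0.5 * width
    # Shift the center, then scale the size
    center_y += deltas[:, 0] * height
    center_x += deltas[:, 1] * width
    height *= np.exp(deltas[:, 2])
    width *= np.exp(deltas[:, 3])
    y1 = center_y - 0.5 * height
    x1 = center_x - 0.5 * width
    return np.stack([y1, x1, y1 + height, x1 + width], axis=1)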
           

10. ROIs

if random_rois:
    # Class-specific bounding box shifts.
    bbox_specific = mrcnn_bbox[b, np.arange(mrcnn_bbox.shape[1]), mrcnn_class_ids[b], :]

    # Refined ROIs
    refined_rois = utils.apply_box_deltas(rois[b].astype(np.float32), bbox_specific[:,:4] * config.BBOX_STD_DEV)

    # Class-specific masks
    mask_specific = mrcnn_mask[b, np.arange(mrcnn_mask.shape[1]), :, :, mrcnn_class_ids[b]]

    visualize.draw_rois(sample_image, rois[b], refined_rois, mask_specific, mrcnn_class_ids[b], dataset.class_names)
    
    # Any repeated ROIs?
    rows = np.ascontiguousarray(rois[b]).view(np.dtype((np.void, rois.dtype.itemsize * rois.shape[-1])))
    _, idx = np.unique(rows, return_index=True)
    print("Unique ROIs: {} out of {}".format(len(idx), rois.shape[1]))
           
if random_rois:
    # Display the ROIs and their corresponding masks and bounding boxes
    ids = random.sample(range(rois.shape[1]), 8)

    images = []
    titles = []
    for i in ids:
        image = visualize.draw_box(sample_image.copy(), rois[b,i,:4].astype(np.int32), [255, 0, 0])
        image = visualize.draw_box(image, refined_rois[i].astype(np.int32), [0, 255, 0])
        images.append(image)
        titles.append("ROI {}".format(i))
        images.append(mask_specific[i] * 255)
        titles.append(dataset.class_names[mrcnn_class_ids[b,i]][:20])

    display_images(images, titles, cols=4, cmap="Blues", interpolation="none")
           
# Check the ratio of positive ROIs across a set of images.
if random_rois:
    limit = 10
    temp_g = modellib.data_generator(
        dataset, config, shuffle=True, random_rois=10000, 
        batch_size=1, detection_targets=True)
    total = 0
    for i in range(limit):
        _, [ids, _, _] = next(temp_g)
        positive_rois = np.sum(ids[0] > 0)
        total += positive_rois
        print("{:5} {:5.2f}".format(positive_rois, positive_rois/ids.shape[1]))
    print("Average percent: {:.2f}".format(total/(limit*ids.shape[1])))
           
