
mmdetection Source Code Reading Notes (1): Building the Network

The previous post covered how mmdetection creates models. This one takes Cascade R-CNN as an example and looks at how the network is actually built.

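Before looking at the network, we need to look at the config file. I mainly follow the official config

cascade_mask_rcnn_r50_fpn_1x.py

for the concrete implementation. For what each config item means, see the detailed explanation of the parameters in mmdetection's configs.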

Building the Cascade R-CNN network

First, find the file that defines Cascade R-CNN:

mmdet/models/detectors/cascade_rcnn.py

Here I split the construction of the Cascade R-CNN network into 5 main parts.

  • backbone
  • neck
  • rpn_head
  • bbox_head
  • mask_head

backbone

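Cascade R-CNN uses res50 (ResNet-50) as its backbone. The backbone is created the same way as before: the supported models are registered in the registry and then instantiated through the builder.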
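The register-then-build pattern can be sketched in a few lines. This is a minimal illustration, not mmdetection's actual code: Registry, ToyResNet and build_from_cfg are hypothetical names chosen here, and the real registry has more checks.

```python
class Registry:
    """Maps a class name to the class itself (hypothetical, simplified)."""

    def __init__(self, name):
        self.name = name
        self._modules = {}

    def register_module(self, cls):
        # Used as a decorator: store the class under its own name.
        self._modules[cls.__name__] = cls
        return cls

    def get(self, key):
        return self._modules[key]


BACKBONES = Registry('backbone')


@BACKBONES.register_module
class ToyResNet:
    """Stand-in for a real backbone; it only records its arguments."""

    def __init__(self, depth, out_indices=(0, 1, 2, 3)):
        self.depth = depth
        self.out_indices = out_indices


def build_from_cfg(cfg, registry):
    """Instantiate the class named by cfg['type'] with the remaining keys."""
    args = dict(cfg)  # copy so the caller's config dict is left untouched
    cls = registry.get(args.pop('type'))
    return cls(**args)


backbone = build_from_cfg(dict(type='ToyResNet', depth=50), BACKBONES)
print(type(backbone).__name__, backbone.depth)
```

The point of the design is that the config only needs a string in type; the builder looks the class up and passes every other key as a constructor argument.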

resnet is defined in mmdet/models/backbones/resnet.py, and its forward looks like this:

def forward(self, x):
        x = self.conv1(x)
        x = self.norm1(x)
        x = self.relu(x)
        x = self.maxpool(x)
        outs = []
        for i, layer_name in enumerate(self.res_layers):
            res_layer = getattr(self, layer_name)
            x = res_layer(x)
            if i in self.out_indices:
                outs.append(x)
        if len(outs) == 1:
            return outs[0]
        else:
            return tuple(outs)
           

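In forward, outs collects the outputs of several stages: they are first appended to a list and then converted to a tuple. Which stages are collected is determined by out_indices in the config: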

model = dict(
    type='CascadeRCNN',
    num_stages=3,
    pretrained='modelzoo://resnet50',
    backbone=dict(
        type='ResNet',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,
        style='pytorch'),
           

The backbone has 4 stages, and all of them are used as outputs.

The main job of the backbone is to extract image features.
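To make the effect of out_indices concrete, here is a toy version of the same loop. It is only a sketch: run_stages and make_stage are hypothetical helpers, and each fake stage simply reports the (channels, stride) pair that the corresponding ResNet-50 stage produces instead of computing a feature map.

```python
def run_stages(x, stages, out_indices):
    """Run the stages in order and keep the outputs listed in out_indices."""
    outs = []
    for i, stage in enumerate(stages):
        x = stage(x)
        if i in out_indices:
            outs.append(x)
    # Same convention as the forward above: a bare value for one output,
    # a tuple otherwise.
    return outs[0] if len(outs) == 1 else tuple(outs)


def make_stage(i):
    # Toy stage: ignores its input and reports (channels, stride)
    # for ResNet-50 stage i.
    return lambda _x: (256 * 2 ** i, 4 * 2 ** i)


stages = [make_stage(i) for i in range(4)]
all_levels = run_stages(None, stages, out_indices=(0, 1, 2, 3))
last_only = run_stages(None, stages, out_indices=(3,))
print(all_levels)
print(last_only)
```

With all four indices selected, the channel counts are 256, 512, 1024 and 2048, which is what the neck has to accept as input.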

neck

This part mainly implements FPN (see an introduction to FPN for background).

First, the FPN-related part of the config file:

neck=dict(
        type='FPN',
        in_channels=[256, 512, 1024, 2048],
        out_channels=256,
        num_outs=5),
           

in_channels matches the outputs of the backbone above, and out_channels is the output channel dimension.

FPN is defined in mmdet/models/necks/fpn.py. Its __init__ contains:

for i in range(self.start_level, self.backbone_end_level):
            l_conv = ConvModule(
                in_channels[i],
                out_channels,
                1,
                normalize=normalize,
                bias=self.with_bias,
                activation=self.activation,
                inplace=False)
            fpn_conv = ConvModule(
                out_channels,
                out_channels,
                3,
                padding=1,
                normalize=normalize,
                bias=self.with_bias,
                activation=self.activation,
                inplace=False)

            self.lateral_convs.append(l_conv)
            self.fpn_convs.append(fpn_conv)
           

Here self.start_level is 0 and self.backbone_end_level is len(in_channels), so lateral_convs and fpn_convs both have the same length as the list of inputs.

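One way to read this: the backbone outputs feature maps from several levels, and each level is processed by its own ConvModule, which brings every level to the same number of channels so that high-level and low-level features can be fused. This may sound convoluted, but it is easier to follow alongside the code.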

Below is part of the forward function.

        # build laterals
        laterals = [
            lateral_conv(inputs[i + self.start_level])
            for i, lateral_conv in enumerate(self.lateral_convs)
        ]
        # part 1: from original levels
        outs = [
            self.fpn_convs[i](laterals[i]) for i in range(used_backbone_levels)
        ]
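The excerpt leaves out the step between building the laterals and producing the outputs, which is where the FPN design actually mixes levels: each coarser lateral map is upsampled by 2x and added to the next finer one. The following is a pure-Python sketch of that top-down merge on tiny single-channel maps. The function names are mine, and it assumes every level is exactly half the size of the previous one.

```python
def upsample_nearest_2x(fmap):
    """Nearest-neighbour 2x upsampling of a 2D list."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]  # repeat each column
        out.append(wide)
        out.append(list(wide))                     # repeat each row
    return out


def add_maps(a, b):
    """Element-wise sum of two 2D lists of the same shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def top_down_merge(laterals):
    """laterals: 2D maps ordered from finest to coarsest."""
    merged = [[row[:] for row in m] for m in laterals]  # do not mutate input
    # Walk from the coarsest level towards the finest one.
    for i in range(len(merged) - 1, 0, -1):
        merged[i - 1] = add_maps(merged[i - 1], upsample_nearest_2x(merged[i]))
    return merged


laterals = [
    [[1] * 4 for _ in range(4)],   # finest level, 4x4
    [[10] * 2 for _ in range(2)],  # middle level, 2x2
    [[100]],                       # coarsest level, 1x1
]
merged = top_down_merge(laterals)
print(merged[0][0])
```

After the merge, the finest map carries contributions from all three levels, which is the sense in which high-level and low-level features are fused.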
           

This part can also be seen as feature extraction; the RPN part below is where object detection really begins.

RPN HEAD

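The rpn_head of Cascade R-CNN looks fairly simple at first glance, because it mainly consists of two small branches. Two files are involved:

mmdet/models/anchor_heads/anchor_head.py

mmdet/models/anchor_heads/rpn_head.py

The latter is a subclass of the former.

First, the relevant config items: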

rpn_head=dict(
        type='RPNHead',
        in_channels=256,
        feat_channels=256,
        anchor_scales=[8],
        anchor_ratios=[0.5, 1.0, 2.0],
        anchor_strides=[4, 8, 16, 32, 64],
        target_means=[.0, .0, .0, .0],
        target_stds=[1.0, 1.0, 1.0, 1.0],
        use_sigmoid_cls=True),
           

The core of rpn_head is implemented as follows:

    # define the layers
    def _init_layers(self):
        self.rpn_conv = nn.Conv2d(
            self.in_channels, self.feat_channels, 3, padding=1)
        self.rpn_cls = nn.Conv2d(self.feat_channels,
                                 self.num_anchors * self.cls_out_channels, 1)
        self.rpn_reg = nn.Conv2d(self.feat_channels, self.num_anchors * 4, 1)
    # forward pass on a single feature level
    def forward_single(self, x):
        x = self.rpn_conv(x)
        x = F.relu(x, inplace=True)
        rpn_cls_score = self.rpn_cls(x)
        rpn_bbox_pred = self.rpn_reg(x)
        return rpn_cls_score, rpn_bbox_pred
           

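It is simple: there are only two output branches, one that decides whether an anchor is foreground (rpn_cls) and one that predicts the adjustments to the box (rpn_reg). In addition:

self.num_anchors = len(self.anchor_ratios) * len(self.anchor_scales)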
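Plugging in the config values gives 3 anchors per location, so rpn_reg has 3 * 4 = 12 output channels. The sketch below also derives anchor widths and heights per level. It is an illustration only: base_anchor_sizes is a hypothetical helper, it treats each stride as the base size, and it takes a ratio to mean height / width; the real anchor generator may differ in details such as rounding and centering.

```python
import math


def base_anchor_sizes(base_size, scales, ratios):
    """Return (width, height) for every ratio/scale combination."""
    sizes = []
    for ratio in ratios:  # assumption: ratio = height / width
        for scale in scales:
            side = base_size * scale
            w = side / math.sqrt(ratio)
            h = side * math.sqrt(ratio)
            sizes.append((w, h))
    return sizes


anchor_scales = [8]
anchor_ratios = [0.5, 1.0, 2.0]
anchor_strides = [4, 8, 16, 32, 64]

num_anchors = len(anchor_ratios) * len(anchor_scales)
sizes_per_level = {
    stride: base_anchor_sizes(stride, anchor_scales, anchor_ratios)
    for stride in anchor_strides
}
print(num_anchors, sizes_per_level[4])
```

Under these assumptions every anchor on a level covers the same area (for stride 4, 32 x 32 pixels), and only the aspect ratio changes.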

However, the goal of the RPN is to produce proposals, so another function in anchor_head.py is needed, get_bboxes():

def get_bboxes(self, cls_scores, bbox_preds, img_metas, cfg,
                   rescale=False):
        assert len(cls_scores) == len(bbox_preds)
        num_levels = len(cls_scores)

        mlvl_anchors = [
            self.anchor_generators[i].grid_anchors(cls_scores[i].size()[-2:], self.anchor_strides[i])
            for i in range(num_levels)
        ]
        result_list = []
        for img_id in range(len(img_metas)):
            cls_score_list = [
                cls_scores[i][img_id].detach() for i in range(num_levels)
            ]
            bbox_pred_list = [
                bbox_preds[i][img_id].detach() for i in range(num_levels)
            ]
            img_shape = img_metas[img_id]['img_shape']
            scale_factor = img_metas[img_id]['scale_factor']
            proposals = self.get_bboxes_single(cls_score_list, bbox_pred_list,
                                               mlvl_anchors, img_shape,
                                               scale_factor, cfg, rescale)
            result_list.append(proposals)
        return result_list
           

Here self.anchor_generators[i].grid_anchors() first produces all the anchor boxes, and then self.get_bboxes_single() uses the RPN outputs from above to obtain the proposal boxes.

Inside self.get_bboxes_single(), up to 2000 of the highest-scoring anchors are kept on each scale and filtered with nms(thr=0.7); the boxes from all scales are concatenated to give the proposals for that image.
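The top-k selection followed by NMS can be sketched in pure Python. This is a simplified greedy NMS for illustration, not the compiled op mmdetection calls: iou and topk_then_nms are names chosen here, boxes are (x1, y1, x2, y2) with plain continuous areas, and box decoding is left out.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def topk_then_nms(boxes, scores, pre_nms_top_n, iou_thr):
    """Keep the best-scoring boxes, then greedily drop heavy overlaps."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    order = order[:pre_nms_top_n]
    keep = []
    for i in order:
        # A box survives only if it does not overlap any kept box too much.
        if all(iou(boxes[i], boxes[j]) <= iou_thr for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (0.5, 0.5, 10.5, 10.5), (20, 20, 30, 30), (1, 1, 11, 11)]
scores = [0.9, 0.8, 0.7, 0.6]
kept = topk_then_nms(boxes, scores, pre_nms_top_n=2000, iou_thr=0.7)
print(kept)
```

In this example the second box overlaps the first with IoU of about 0.82 and is suppressed, while the fourth box (IoU of about 0.68 with the first) stays because it is under the 0.7 threshold.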

The loss for this part is fairly complicated, so I will cover it later together with the other losses.

assigners and samplers

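The rpn in the previous step outputs a pile of proposals, but before they can be used for training they have to be divided into positive and negative samples. That is the job of the assigners. cascade_rcnn uses MaxIoUAssigner by default, defined in

mmdet/core/bbox/assigners/max_iou_assigner.py

The main method is assign():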

def assign(self, bboxes, gt_bboxes, gt_bboxes_ignore=None, gt_labels=None):
        """Assign gt to bboxes.

        This method assign a gt bbox to every bbox (proposal/anchor), each bbox
        will be assigned with -1, 0, or a positive number. -1 means don't care,
        0 means negative sample, positive number is the index (1-based) of
        assigned gt.
        The assignment is done in following steps, the order matters.

        1. assign every bbox to -1
        2. assign proposals whose iou with all gts < neg_iou_thr to 0
        3. for each bbox, if the iou with its nearest gt >= pos_iou_thr,
           assign it to that bbox
        4. for each gt bbox, assign its nearest proposals (may be more than
           one) to itself

        Args:
            bboxes (Tensor): Bounding boxes to be assigned, shape(n, 4).
            gt_bboxes (Tensor): Groundtruth boxes, shape (k, 4).
            gt_bboxes_ignore (Tensor, optional): Ground truth bboxes that are
                labelled as `ignored`, e.g., crowd boxes in COCO.
            gt_labels (Tensor, optional): Label of gt_bboxes, shape (k, ).

        Returns:
            :obj:`AssignResult`: The assign result.
        """
        if bboxes.shape[0] == 0 or gt_bboxes.shape[0] == 0:
            raise ValueError('No gt or bboxes')
        bboxes = bboxes[:, :4]
        overlaps = bbox_overlaps(gt_bboxes, bboxes)

        if (self.ignore_iof_thr > 0) and (gt_bboxes_ignore is not None) and (
                gt_bboxes_ignore.numel() > 0):
            if self.ignore_wrt_candidates:
                ignore_overlaps = bbox_overlaps(
                    bboxes, gt_bboxes_ignore, mode='iof')
                ignore_max_overlaps, _ = ignore_overlaps.max(dim=1)
            else:
                ignore_overlaps = bbox_overlaps(
                    gt_bboxes_ignore, bboxes, mode='iof')
                ignore_max_overlaps, _ = ignore_overlaps.max(dim=0)
            overlaps[:, ignore_max_overlaps > self.ignore_iof_thr] = -1

        assign_result = self.assign_wrt_overlaps(overlaps, gt_labels)
        return assign_result
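The four steps listed in the docstring can be written out on a plain IoU matrix. This is a sketch of the rule as the docstring describes it, not the library's tensor code: assign_by_max_iou and the threshold values in the example are my own, and overlaps[g][b] is assumed to hold the IoU between ground truth g and box b.

```python
def assign_by_max_iou(overlaps, pos_iou_thr, neg_iou_thr, min_pos_iou):
    """Return one value per box: -1 ignore, 0 negative, k > 0 = gt index (1-based)."""
    num_gts = len(overlaps)
    num_boxes = len(overlaps[0])
    assigned = [-1] * num_boxes                      # step 1: everything ignored

    for b in range(num_boxes):
        column = [overlaps[g][b] for g in range(num_gts)]
        best = max(column)
        best_gt = column.index(best)
        if 0 <= best < neg_iou_thr:                  # step 2: clear negatives
            assigned[b] = 0
        if best >= pos_iou_thr:                      # step 3: clear positives
            assigned[b] = best_gt + 1

    for g in range(num_gts):                         # step 4: each gt claims its
        gt_best = max(overlaps[g])                   # best box(es), if good enough
        if gt_best >= min_pos_iou:
            for b in range(num_boxes):
                if overlaps[g][b] == gt_best:
                    assigned[b] = g + 1
    return assigned


overlaps = [
    [0.8, 0.2, 0.45, 0.1],    # gt 0 against boxes 0..3
    [0.1, 0.35, 0.3, 0.05],   # gt 1 against boxes 0..3
]
result = assign_by_max_iou(overlaps, pos_iou_thr=0.7, neg_iou_thr=0.3,
                           min_pos_iou=0.3)
print(result)
```

Box 1 shows why the order matters: its best IoU of 0.35 is below pos_iou_thr, so step 3 leaves it ignored, but step 4 assigns it to gt 1 because it is that ground truth's best match.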
           

Once the proposals have been divided into positive and negative samples, a sampler draws from them to produce the sampler_result used for training. cascade_rcnn uses RandomSampler by default, defined in

mmdet/core/bbox/samplers/random_sampler.py

@staticmethod
    def random_choice(gallery, num):
        """Random select some elements from the gallery.

        It seems that Pytorch's implementation is slower than numpy so we use
        numpy to randperm the indices.
        """
        assert len(gallery) >= num
        if isinstance(gallery, list):
            gallery = np.array(gallery)
        cands = np.arange(len(gallery))
        np.random.shuffle(cands)
        rand_inds = cands[:num]
        if not isinstance(gallery, np.ndarray):
            rand_inds = torch.from_numpy(rand_inds).long().to(gallery.device)
        return gallery[rand_inds]

    def _sample_pos(self, assign_result, num_expected, **kwargs):
        """Randomly sample some positive samples."""
        pos_inds = torch.nonzero(assign_result.gt_inds > 0)
        if pos_inds.numel() != 0:
            pos_inds = pos_inds.squeeze(1)
        if pos_inds.numel() <= num_expected:
            return pos_inds
        else:
            return self.random_choice(pos_inds, num_expected)

    def _sample_neg(self, assign_result, num_expected, **kwargs):
        """Randomly sample some negative samples."""
        neg_inds = torch.nonzero(assign_result.gt_inds == 0)
        if neg_inds.numel() != 0:
            neg_inds = neg_inds.squeeze(1)
        if len(neg_inds) <= num_expected:
            return neg_inds
        else:
            return self.random_choice(neg_inds, num_expected)
           

It overrides the two sampling functions that the parent class calls.

The main entry point is sample, defined in its parent class in

mmdet/core/bbox/samplers/base_sampler.py

def sample(self,
               assign_result,
               bboxes,
               gt_bboxes,
               gt_labels=None,
               **kwargs):
        """Sample positive and negative bboxes.

        This is a simple implementation of bbox sampling given candidates,
        assigning results and ground truth bboxes.

        Args:
            assign_result (:obj:`AssignResult`): Bbox assigning results.
            bboxes (Tensor): Boxes to be sampled from.
            gt_bboxes (Tensor): Ground truth bboxes.
            gt_labels (Tensor, optional): Class labels of ground truth bboxes.

        Returns:
            :obj:`SamplingResult`: Sampling result.
        """
        bboxes = bboxes[:, :4]

        gt_flags = bboxes.new_zeros((bboxes.shape[0], ), dtype=torch.uint8)
        if self.add_gt_as_proposals:
            bboxes = torch.cat([gt_bboxes, bboxes], dim=0)
            assign_result.add_gt_(gt_labels)
            gt_ones = bboxes.new_ones(gt_bboxes.shape[0], dtype=torch.uint8)
            gt_flags = torch.cat([gt_ones, gt_flags])

        num_expected_pos = int(self.num * self.pos_fraction)
        pos_inds = self.pos_sampler._sample_pos(
            assign_result, num_expected_pos, bboxes=bboxes, **kwargs)
        # We found that sampled indices have duplicated items occasionally.
        # (may be a bug of PyTorch)
        pos_inds = pos_inds.unique()
        num_sampled_pos = pos_inds.numel()
        num_expected_neg = self.num - num_sampled_pos
        if self.neg_pos_ub >= 0:
            _pos = max(1, num_sampled_pos)
            neg_upper_bound = int(self.neg_pos_ub * _pos)
            if num_expected_neg > neg_upper_bound:
                num_expected_neg = neg_upper_bound
        neg_inds = self.neg_sampler._sample_neg(
            assign_result, num_expected_neg, bboxes=bboxes, **kwargs)
        neg_inds = neg_inds.unique()

        return SamplingResult(pos_inds, neg_inds, bboxes, gt_bboxes,
                              assign_result, gt_flags)
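The bookkeeping in sample is mostly arithmetic on counts, which is easy to check with plain lists. The sketch below mirrors that arithmetic only: sample_indices is a hypothetical function, it uses Python's random module rather than numpy or torch, and num=512 with pos_fraction=0.25 are example values, not necessarily those of the config discussed here.

```python
import random


def sample_indices(gt_inds, num, pos_fraction, neg_pos_ub=-1, seed=0):
    """Pick positive and negative indices following the counts used in sample()."""
    rng = random.Random(seed)
    pos = [i for i, g in enumerate(gt_inds) if g > 0]
    neg = [i for i, g in enumerate(gt_inds) if g == 0]

    num_expected_pos = int(num * pos_fraction)
    if len(pos) > num_expected_pos:
        pos = rng.sample(pos, num_expected_pos)

    # Negatives fill whatever budget the positives did not use.
    num_expected_neg = num - len(pos)
    if neg_pos_ub >= 0:
        upper = int(neg_pos_ub * max(1, len(pos)))
        num_expected_neg = min(num_expected_neg, upper)
    if len(neg) > num_expected_neg:
        neg = rng.sample(neg, num_expected_neg)
    return sorted(pos), sorted(neg)


# 10 positives, 1000 negatives, 5 ignored boxes
gt_inds = [1] * 10 + [0] * 1000 + [-1] * 5
pos, neg = sample_indices(gt_inds, num=512, pos_fraction=0.25)
print(len(pos), len(neg))
```

Because only 10 positives exist, all of them are kept and the negatives fill the remaining 502 slots; setting neg_pos_ub=3 would instead cap the negatives at 30.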
           

The boxes are now ready; the next step is to send them to the bbox head and the mask head.

bbox head

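Of course, the boxes obtained so far cannot be fed to the bbox head directly. They first go through RoI Pooling, which maps boxes of different sizes to a fixed size.

This is defined in

mmdet/models/roi_extractors/single_level.py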

def forward(self, feats, rois):
        if len(feats) == 1:
            return self.roi_layers[0](feats[0], rois)

        out_size = self.roi_layers[0].out_size
        num_levels = len(feats)
        target_lvls = self.map_roi_levels(rois, num_levels)
        roi_feats = torch.cuda.FloatTensor(rois.size()[0], self.out_channels,
                                           out_size, out_size).fill_(0)
        for i in range(num_levels):
            inds = target_lvls == i
            if inds.any():
                rois_ = rois[inds, :]
                roi_feats_t = self.roi_layers[i](feats[i], rois_)
                roi_feats[inds] += roi_feats_t
        return roi_feats
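The forward above relies on map_roi_levels to decide which pyramid level each RoI is pooled from, but that function is not shown. A common rule, and the one I assume here, sends larger boxes to coarser levels based on the log2 of their scale. The helper name, the finest_scale=56 default and the plain (x2 - x1) width convention are assumptions of this sketch.

```python
import math


def map_roi_level(box, num_levels, finest_scale=56):
    """Map one (x1, y1, x2, y2) box to a pyramid level index."""
    x1, y1, x2, y2 = box
    scale = math.sqrt((x2 - x1) * (y2 - y1))
    # Boxes below finest_scale * 2 go to level 0, below * 4 to level 1, ...
    level = math.floor(math.log2(scale / finest_scale + 1e-6))
    return min(max(level, 0), num_levels - 1)


sides = [40, 111, 112, 224, 448, 1000]
levels = [map_roi_level((0, 0, s, s), num_levels=4) for s in sides]
print(levels)
```

Small boxes are therefore pooled from the high-resolution level and large boxes from the low-resolution ones, with everything clamped to the levels that exist.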
           

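The roi_layers here use RoIAlign, and the RoI results can then be sent to the bbox head.

The bbox head works much like the rpn part above: it classifies each box and refines its coordinates. The rpn only separated foreground from background, whereas here there are N+1 classes (the actual classes plus background). The code is in

mmdet/models/bbox_heads/convfc_bbox_head.py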

def forward(self, x):
        # shared part
        if self.num_shared_convs > 0:
            for conv in self.shared_convs:
                x = conv(x)

        if self.num_shared_fcs > 0:
            if self.with_avg_pool:
                x = self.avg_pool(x)
            x = x.view(x.size(0), -1)
            for fc in self.shared_fcs:
                x = self.relu(fc(x))
        # separate branches
        x_cls = x
        x_reg = x

        for conv in self.cls_convs:
            x_cls = conv(x_cls)
        if x_cls.dim() > 2:
            if self.with_avg_pool:
                x_cls = self.avg_pool(x_cls)
            x_cls = x_cls.view(x_cls.size(0), -1)
        for fc in self.cls_fcs:
            x_cls = self.relu(fc(x_cls))

        for conv in self.reg_convs:
            x_reg = conv(x_reg)
        if x_reg.dim() > 2:
            if self.with_avg_pool:
                x_reg = self.avg_pool(x_reg)
            x_reg = x_reg.view(x_reg.size(0), -1)
        for fc in self.reg_fcs:
            x_reg = self.relu(fc(x_reg))

        cls_score = self.fc_cls(x_cls) if self.with_cls else None
        bbox_pred = self.fc_reg(x_reg) if self.with_reg else None
        return cls_score, bbox_pred
           

The outputs of forward are the classification scores and the coordinate predictions (regression values) for each box.
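The regression output is usually not a box by itself but a set of offsets (dx, dy, dw, dh) relative to the input proposal. A minimal decoding sketch, assuming the common center/size parameterisation, is shown below; decode_delta is a hypothetical helper, means and stds stand in for target_means and target_stds, and widths are plain x2 - x1.

```python
import math


def decode_delta(box, delta, means=(0.0, 0.0, 0.0, 0.0), stds=(1.0, 1.0, 1.0, 1.0)):
    """Apply (dx, dy, dw, dh) to a (x1, y1, x2, y2) box."""
    # Undo the normalisation that was applied to the training targets.
    dx, dy, dw, dh = (d * s + m for d, m, s in zip(delta, means, stds))
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    cx, cy = x1 + 0.5 * w, y1 + 0.5 * h
    # Shift the center in units of box size; scale the size exponentially.
    new_cx, new_cy = cx + w * dx, cy + h * dy
    new_w, new_h = w * math.exp(dw), h * math.exp(dh)
    return (new_cx - 0.5 * new_w, new_cy - 0.5 * new_h,
            new_cx + 0.5 * new_w, new_cy + 0.5 * new_h)


proposal = (0.0, 0.0, 10.0, 10.0)
unchanged = decode_delta(proposal, (0.0, 0.0, 0.0, 0.0))
shifted = decode_delta(proposal, (0.1, 0.0, 0.0, 0.0))
widened = decode_delta(proposal, (0.0, 0.0, math.log(2.0), 0.0))
print(unchanged, shifted, widened)
```

All-zero offsets leave the proposal unchanged, dx moves the box by a fraction of its width, and dw = log(2) doubles the width around the same center.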

These two outputs are then used to compute bbox_loss, which I will also cover later.

Next comes the mask head, the other branch that runs in parallel with the bbox head.

mask head

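The mask part follows the same flow as the bbox part: the proposals first go through RoI Pooling again. The RoI extractor here is the same as the one used for the bbox network; only some parameters differ.

The mask head itself is defined in

mmdet/models/mask_heads/fcn_mask_head.py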

def forward(self, x):
        for conv in self.convs:
            x = conv(x)
        if self.upsample is not None:
            x = self.upsample(x)
            if self.upsample_method == 'deconv':
                x = self.relu(x)
        mask_pred = self.conv_logits(x)
        return mask_pred
           

The output of forward is a classification value for every pixel, which is then used to compute the mask loss.
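As a rough picture of what a per-pixel mask loss looks like, the sketch below averages binary cross-entropy over the pixels of one RoI, using only the channel of the ground-truth class. This follows the usual Mask R-CNN formulation and is an assumption on my part; mask_loss and bce_with_logits are hypothetical names, and the actual loss code is left for the later post.

```python
import math


def bce_with_logits(logit, target):
    """Numerically stable binary cross-entropy on a raw logit."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


def mask_loss(mask_logits, gt_mask, label):
    """mask_logits[c][y][x] are raw scores; gt_mask[y][x] is 0 or 1."""
    plane = mask_logits[label]  # only the ground-truth class channel counts
    losses = [
        bce_with_logits(logit, target)
        for row_logits, row_targets in zip(plane, gt_mask)
        for logit, target in zip(row_logits, row_targets)
    ]
    return sum(losses) / len(losses)


gt_mask = [[1, 0], [0, 1]]
uninformed = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
confident = [[[0.0, 0.0], [0.0, 0.0]], [[10.0, -10.0], [-10.0, 10.0]]]
print(mask_loss(uninformed, gt_mask, label=1), mask_loss(confident, gt_mask, label=1))
```

All-zero logits correspond to a probability of 0.5 at every pixel and give a loss of ln 2, while confident and correct logits drive the loss close to zero.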

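The forward outputs of both the bbox head and this part are not the final results at test time; further processing is needed to obtain the test results. I will cover that later when writing about test.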

Summary

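This post covered how mmdetection builds the cascade_rcnn network. My original plan was to work through the details slowly and cover every part, but in practice that turned out to be too complicated, so I first wrote up the overall flow, which amounts to the skeleton. Once the loss and test parts are written, I will come back and work through the details of each part.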
