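YOLO code walkthrough: model.py

1 Overview
2 Imports
3 parse_model_cfg()
- 3.1 Normalizing the path
- 3.2 Reading line by line
- 3.3 Model definition
4 create_modules()
- 4.1 Basic building blocks
- 4.2 A basic look at the YOLO network structure
- 4.3 convolutional
- 4.4 Upsample
- 4.5 route
- 4.6 shortcut layer
- 4.7 YOLO layer
- 4.8 Registering and returning
5 YOLOLayer()
- 5.1 Constructor
- 5.2 Forward function
6 The Darknet class
- 6.1 Constructor
- 6.2 The forward function
- 6.3 fuse and info
7 Other functions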

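1 Overview

This file consists mainly of

two classes, Darknet and YOLOLayer,
and several functions: create_modules, get_yolo_layers, load_darknet_weights, save_weights, convert, and attempt_download.

The Darknet constructor starts by calling three functions, parse_model_cfg, create_modules, and get_yolo_layers, so we begin with those and then dig further.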

2 Imports

utils contains many important tools that are worth studying later.

from utils.google_utils import *
from utils.layers import *
from utils.parse_config import *

3 parse_model_cfg()

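3.1 Normalizing the path

This function lives in parse_config.py; it reads the model definition from a cfg file.

If the .cfg suffix is missing it is appended, and if the file is not found as given but exists under cfg/, that directory prefix is added.

# Make sure the model path is well-formed; omitted parts are filled in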
    if not path.endswith('.cfg'):  # add .cfg suffix if omitted
        path += '.cfg'
    if not os.path.exists(path) and os.path.exists('cfg' + os.sep + path):  # add cfg/ prefix if omitted
        path = 'cfg' + os.sep + path

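3.2 Reading line by line

The file is read line by line:

empty lines are dropped (if x);

comment lines are dropped (not x.startswith('#')). The check runs before whitespace is stripped, so a comment must sit on its own line and start in the first column;

whitespace at both ends is stripped: rstrip handles the trailing side and lstrip the leading side.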

with open(path, 'r') as f:
        lines = f.read().split('\n')
    lines = [x for x in lines if x and not x.startswith('#')]
    lines = [x.rstrip().lstrip() for x in lines]  # get rid of fringe whitespaces
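
The filtering order can be tried on a small made-up cfg snippet (the sample text below is hypothetical, not taken from a real cfg file). It is only a sketch of the two list comprehensions above, and it shows that an indented comment survives the filter because the '#' check happens before stripping.

```python
# Hypothetical cfg text, used only to illustrate the filtering order
sample = "[net]\n# Testing\nbatch=1\n\n  # indented comment\nwidth = 416  \n"

lines = sample.split('\n')
# Step 1: drop empty lines and lines whose first character is '#'
lines = [x for x in lines if x and not x.startswith('#')]
# Step 2: strip surrounding whitespace (strip() is rstrip() followed by lstrip())
lines = [x.strip() for x in lines]

print(lines)  # the indented comment is still present
```

The surviving '# indented comment' line has no '=' in it, so the later key/value split would fail on it, which is a good reason to keep comments flush left.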

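3.3 Model definition

Before looking at how the file is processed, it helps to see how a cfg file is written.

In the cfg file each block starts with a [name] header, and blocks are set apart by blank lines and comments.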

[net]
# Testing
batch=1
subdivisions=1
# Training
# batch=64
# subdivisions=2
width=416
height=416
channels=3
momentum=0.9
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1

learning_rate=0.001
burn_in=1000
max_batches = 500200
policy=steps
steps=400000,450000
scales=.1,.1

[convolutional]
batch_normalize=1
filters=16
size=3
stride=1
pad=1
activation=leaky

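The lines are processed one at a time: when a new block starts, a dict is appended to the end of the list and the block name is stored with the [] removed.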

mdefs = []  # module definitions
    for line in lines:
        if line.startswith('['):  # This marks the start of a new block
            mdefs.append({})
            # strip the surrounding [ ]
            mdefs[-1]['type'] = line[1:-1].rstrip()

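For a convolutional block, batch_normalize is pre-set to 0 so that the key always exists; a value given in the cfg overwrites it later. Batch normalization seems important and is worth revisiting.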

if mdefs[-1]['type'] == 'convolutional':
                # Placeholder default: guarantees the key exists even if the cfg omits it; a value in the cfg overwrites it.
                # pre-populate with zeros (may be overwritten later)
                mdefs[-1]['batch_normalize'] = 0

This part handles the key=value lines inside each block. Pay attention to anchors, which normally appear only in yolo blocks and look like this:

[yolo]
mask = 3,4,5
anchors = 10,14,  23,27,  37,58,  81,82,  135,169,  344,319
classes=7
num=6
jitter=.3
ignore_thresh = .7
truth_thresh = 1
random=1

After reading, the values have to be grouped in pairs (width, height), hence the reshape.

else:
            key, val = line.split("=")
            key = key.rstrip()
            if key == 'anchors':  # return nparray
                mdefs[-1][key] = np.array([float(x) for x in val.split(',')]).reshape((-1, 2))  # np anchors
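
The pairing can be checked by hand. The sketch below uses plain Python lists instead of NumPy so it runs without dependencies; the slicing is meant as an equivalent of reshape((-1, 2)), and the list comprehension at the end stands in for the anchors[mask] indexing used later when the YOLO layer is built.

```python
# The anchors value from the [yolo] block above, as read from the cfg line
val = " 10,14,  23,27,  37,58,  81,82,  135,169,  344,319"

# float() tolerates the spaces around each number
flat = [float(x) for x in val.split(',')]
# Group consecutive values in pairs, the same grouping reshape((-1, 2)) produces
anchors = [flat[i:i + 2] for i in range(0, len(flat), 2)]

# mask = 3,4,5 selects the last three (width, height) pairs
mask = [3, 4, 5]
selected = [anchors[i] for i in mask]

print(anchors)
print(selected)
```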

The remaining keys are stored with a type that fits them: a list of ints, a number, or a string.

elif (key in ['from', 'layers', 'mask']) or (key == 'size' and ',' in val):  # return array
                mdefs[-1][key] = [int(x) for x in val.split(',')]
            else:
                val = val.strip()
                # TODO: .isnumeric() actually fails to get the float case
                if val.isnumeric():  # return int or float
                    mdefs[-1][key] = int(val) if (int(val) - float(val)) == 0 else float(val)
                else:
                    mdefs[-1][key] = val  # return string
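
The TODO in the code is easy to confirm. The helper below (parse_value is a hypothetical name, not part of the repository) mirrors the final else branch and shows that only digit-only strings become numbers, while decimals and negative values stay strings.

```python
def parse_value(val):
    """Mirror of the final else branch above: digit-only strings become int, everything else stays a string."""
    val = val.strip()
    if val.isnumeric():
        return int(val) if (int(val) - float(val)) == 0 else float(val)
    return val

for raw in ['416', '0.0005', '.1', '-1', 'leaky']:
    print(repr(raw), '->', repr(parse_value(raw)))
```

Since str.isnumeric() rejects '.', and '-', the float branch is never reached; values such as decay=0.0005 would need an explicit float() wherever they are used.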

Finally, check for unsupported fields; if any are found, the assertion fails.

# Check all fields are supported
    supported = ['type', 'batch_normalize', 'filters', 'size', 'stride', 'pad', 'activation', 'layers', 'groups',
                 'from', 'mask', 'anchors', 'classes', 'num', 'jitter', 'ignore_thresh', 'truth_thresh', 'random',
                 'stride_x', 'stride_y', 'weights_type', 'weights_normalization', 'scale_x_y', 'beta_nms', 'nms_kind',
                 'iou_loss', 'iou_normalizer', 'cls_normalizer', 'iou_thresh', 'probability']

    f = []  # fields

The loop starts at index 1, the second element of the list, because the first block in a cfg file is [net], whose fields are not in supported.

for x in mdefs[1:]:
        [f.append(k) for k in x if k not in f]
    u = [x for x in f if x not in supported]  # unsupported fields
    assert not any(u), "Unsupported fields %s in %s. See https://github.com/ultralytics/yolov3/issues/631" % (u, path)

    return mdefs

4 create_modules()

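4.1 Basic building blocks

Using the model definitions obtained above, create_modules builds module_list. This requires some familiarity with nn.ModuleList() and nn.Sequential(). Consult the references linked above first; in brief:

nn.Sequential() defines a small module whose parts run in order.
nn.ModuleList() defines a list that holds the individual modules. It behaves much like a Python list, but it registers the parameters of the modules it holds.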

Because we are studying YOLOv3, we look closely at four block types: convolutional, route, shortcut, and YOLO.

4.2 A basic look at the YOLO network structure

For the overall network structure, see reference link 1, which also describes the residual layers in more detail.


The related dimension calculations are worth working through later.

4.3 convolutional

Convolution is relatively simple; see the code comments for details.

if mdef['type'] == 'convolutional':
     bn = mdef['batch_normalize']
     filters = mdef['filters']  # number of kernels, which is also the output depth
     k = mdef['size']  # kernel size
     stride = mdef['stride'] if 'stride' in mdef else (mdef['stride_y'], mdef['stride_x'])  # one stride, or separate y and x strides
     if isinstance(k, int):  # single-size conv
         # add a convolution with a single kernel size
         modules.add_module('Conv2d', nn.Conv2d(in_channels=output_filters[-1],
                                                out_channels=filters,
                                                kernel_size=k,
                                                stride=stride,
                                                padding=k // 2 if mdef['pad'] else 0,
                                                groups=mdef['groups'] if 'groups' in mdef else 1,
                                                bias=not bn))
     else:  # multiple-size conv
         # add a mixed convolution with several kernel sizes
         modules.add_module('MixConv2d', MixConv2d(in_ch=output_filters[-1],
                                                   out_ch=filters,
                                                   k=k,
                                                   stride=stride,
                                                   bias=not bn))
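
The padding rule k // 2 is what keeps the spatial size predictable. The sketch below applies the standard Conv2d output-size formula (dilation 1 assumed) to the padding choice used above; conv_out is a hypothetical helper name, and the 416 input size is taken from the cfg example.

```python
def conv_out(n, k, stride, pad=True):
    """Output side length of a Conv2d with padding k // 2 (or 0), assuming dilation 1."""
    p = k // 2 if pad else 0
    return (n + 2 * p - k) // stride + 1

print(conv_out(416, 3, 1))             # 3x3 kernel, stride 1: size is preserved
print(conv_out(416, 3, 2))             # 3x3 kernel, stride 2: size is halved
print(conv_out(416, 1, 1))             # 1x1 kernel: size is preserved
print(416 // 32, 416 // 16, 416 // 8)  # grid sizes at the three YOLOv3 strides
```

With odd kernel sizes, stride-1 convolutions leave the feature map size unchanged and stride-2 convolutions halve it, so the grid size at each detection scale is simply the input size divided by the total stride.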

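Next come batch normalization and the activation function. One detail deserves attention: in the YOLOv3 cfgs, a convolutional layer without batch_normalize is the one that feeds a YOLO layer, so its index i is recorded in routs.

# No batch norm means this conv feeds a YOLO layer, so record its index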
       if bn:
           modules.add_module('BatchNorm2d', nn.BatchNorm2d(filters, momentum=0.03, eps=1E-4))
       else:
           routs.append(i)  # detection output (goes into yolo layer)

       # choose the activation function
       if mdef['activation'] == 'leaky':  # activation study https://github.com/ultralytics/yolov3/issues/441
           modules.add_module('activation', nn.LeakyReLU(0.1, inplace=True))
       elif mdef['activation'] == 'swish':
           modules.add_module('activation', Swish())
       elif mdef['activation'] == 'mish':
           modules.add_module('activation', Mish())

4.4 Upsample

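Upsampling enlarges the feature map. I have not yet studied the algorithm in detail (nn.Upsample defaults to nearest-neighbor interpolation). ONNX_EXPORT is set to False at the top of the file, so only the else branch matters here, and the scale factor is usually 2.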

elif mdef['type'] == 'upsample':
            if ONNX_EXPORT:  # explicitly state size, avoid scale_factor
                g = (yolo_index + 1) * 2 / 32  # gain
                modules = nn.Upsample(size=tuple(int(x * g) for x in img_size))  # img_size = (320, 192)
            else:
                modules = nn.Upsample(scale_factor=mdef['stride'])

4.5 route

This block comes in two forms:

[route]
layers = -4

[route]
layers = -1, 61

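When layers holds a single value, the block outputs the feature map at that index; for example, -1 means the previous layer.
When it holds two values, the block outputs the two feature maps concatenated along the channel (depth) dimension, so their spatial sizes must match.

The code is explained in its comments.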

elif mdef['type'] == 'route':  # nn.Sequential() placeholder for 'route' layer
     layers = mdef['layers']
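     # Every layer's output channel count is appended to output_filters, so negative indices work directly.
     # Absolute indices need +1 because output_filters starts with the 3 input channels.
     filters = sum([output_filters[l + 1 if l > 0 else l] for l in layers])
     # Record the source layers as absolute indices (forward_once uses routs to decide which outputs to keep)
     routs.extend([i + l if l < 0 else l for l in layers])
     # Concatenate the feature maps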
     modules = FeatureConcat(layers=layers)
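
The index arithmetic is easier to follow with numbers. In the sketch below the channel counts in output_filters are made up for illustration, and route_filters is a hypothetical helper that repeats the sum from the code above.

```python
# Channel counts of the input and of layers 0..3 (illustrative values only)
output_filters = [3, 32, 64, 128, 256]   # index 0 is the 3-channel input image

def route_filters(layers):
    # Negative index: counted back from the newest entry.
    # Positive index: absolute layer number, shifted by 1 to skip the input entry.
    return sum(output_filters[l + 1 if l > 0 else l] for l in layers)

i = 4  # index of the route layer being built
print(route_filters([-1]))                        # channels of the previous layer
print(route_filters([-1, 1]))                     # previous layer plus layer 1
print([i + l if l < 0 else l for l in [-1, 1]])   # absolute indices stored in routs
```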

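4.6 shortcut layer

This layer works like a residual connection: it adds feature maps element-wise, so the feature maps involved must have matching sizes.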

[shortcut]
from=-3
activation=linear

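For example, the block above adds the output of the previous layer to the output of the layer three positions back.

The activation field does not seem to be used in the code.

See the code comments for details.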

elif mdef['type'] == 'shortcut':  # nn.Sequential() placeholder for 'shortcut' layer
     layers = mdef['from']
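     # Element-wise addition keeps the channel count, so reuse the previous layer's output size
     filters = output_filters[-1]
     # Record the layers being added
     routs.extend([i + l if l < 0 else l for l in layers])
     # Weighted feature fusion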
     modules = WeightedFeatureFusion(layers=layers, weight='weights_type' in mdef)

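4.7 YOLO layer

mask selects which anchor boxes this YOLO layer uses.

anchors lists all the anchor boxes chosen for this experiment.

classes is the number of classes.

I do not yet understand the other parameters well.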

[yolo]
mask = 3,4,5
anchors = 10,13,  16,30,  33,23,  30,61,  62,45,  59,119,  116,90,  156,198,  373,326
classes=80
num=9
jitter=.3
ignore_thresh = .7
truth_thresh = 1
random=1

Construct the YOLO layer and initialize the bias of the preceding convolutional layer.

elif mdef['type'] == 'yolo':
      # increment the YOLO layer counter
      yolo_index += 1
      stride = [32, 16, 8]  # P5, P4, P3 strides
      # some architectures list their YOLO layers in the opposite stride order
      if any(x in cfg for x in ['panet', 'yolov4', 'cd53']):  # stride order reversed
          stride = list(reversed(stride))
      layers = mdef['from'] if 'from' in mdef else []
      # build the YOLO module
      modules = YOLOLayer(anchors=mdef['anchors'][mdef['mask']],  # anchor list
                          nc=mdef['classes'],  # number of classes
                          img_size=img_size,  # (416, 416)
                          yolo_index=yolo_index,  # 0, 1, 2...
                          layers=layers,  # output layers
                          stride=stride[yolo_index])
      # initialize the bias of the preceding conv layer
      # Initialize preceding Conv2d() bias (https://arxiv.org/pdf/1708.02002.pdf section 3.3)
      try:
          j = layers[yolo_index] if 'from' in mdef else -1
          # If previous layer is a dropout layer, get the one before
          if module_list[j].__class__.__name__ == 'Dropout':
              j -= 1
          bias_ = module_list[j][0].bias  # shape(255,)
          bias = bias_[:modules.no * modules.na].view(modules.na, -1)  # shape(3,85)
          bias[:, 4] += -4.5  # obj
          bias[:, 5:] += math.log(0.6 / (modules.nc - 0.99))  # cls (sigmoid(p) = 1/nc)
          module_list[j][0].bias = torch.nn.Parameter(bias_, requires_grad=bias_.requires_grad)
      except:
          print('WARNING: smart bias initialization failure.')
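
To see what the two bias shifts mean, they can be pushed through a sigmoid. This is only a sketch of the arithmetic, assuming the bias starts near zero before the shift is added; nc = 80 comes from the cfg example above.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

nc = 80                                   # number of classes in the cfg example
obj_shift = -4.5                          # added to the objectness bias
cls_shift = math.log(0.6 / (nc - 0.99))   # added to every class bias

p_obj = sigmoid(obj_shift)   # initial objectness probability if the bias started at 0
p_cls = sigmoid(cls_shift)   # initial per-class probability if the bias started at 0

print(p_obj, p_cls)
```

Both starting probabilities are small (about one percent or less), so at the beginning of training the network predicts "no object" almost everywhere, which is the idea behind the prior-probability initialization in the referenced paper. Algebraically the class value works out to 0.6 / (nc - 0.39), the same order of magnitude as the 1/nc mentioned in the code comment.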

4.8 Registering and returning

Append the module to the list.

# Register module list and number of output filters
        # append the module to the list and record its output channel count
        module_list.append(modules)
        output_filters.append(filters)

Mark which layers have their output routed to later layers.

routs_binary = [False] * (i + 1)
    for i in routs:
        routs_binary[i] = True
    return module_list, routs_binary

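5 YOLOLayer()

create_modules instantiates the YOLOLayer class, so let us study that class next.

First, how the class is called:

the arguments are the anchors in use, the number of classes, the image size, the index of this YOLO layer, layers (which appears to be empty here), and the stride, which in v3 is 32, 16, and 8 in order.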

modules = YOLOLayer(anchors=mdef['anchors'][mdef['mask']],  # anchor list
                    nc=mdef['classes'],  # number of classes
                    img_size=img_size,  # (416, 416)
                    yolo_index=yolo_index,  # 0, 1, 2...
                    layers=layers,  # output layers
                    stride=stride[yolo_index])

5.1 Constructor

See the comments for details.

def __init__(self, anchors, nc, img_size, yolo_index, layers, stride):
      super(YOLOLayer, self).__init__()
      self.anchors = torch.Tensor(anchors)  # convert the anchors to a tensor
      self.index = yolo_index  # index of this layer in layers
      self.layers = layers  # model output layer indices
      self.stride = stride  # layer stride; how it is used matters a lot
      self.nl = len(layers)  # number of output layers (3)
      self.na = len(anchors)  # number of anchors (3)
      self.nc = nc  # number of classes (80)
      # outputs per anchor = classes + 5 (x, y, w, h, objectness); this is the place to change the output layout
      self.no = nc + 5  # number of outputs (85)
      # grid size (number of x and y grid points), initialized to 0
      self.nx, self.ny, self.ng = 0, 0, 0  # initialize number of x, y gridpoints
      # rescale the anchors from pixels to grid units; stride is the size of one grid cell in pixels
      self.anchor_vec = self.anchors / self.stride
      self.anchor_wh = self.anchor_vec.view(1, self.na, 1, 1, 2)

      # only runs when ONNX_EXPORT is set
      if ONNX_EXPORT:
          self.training = False
          self.create_grids((img_size[1] // stride, img_size[0] // stride))  # number x, y grid points
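
The anchor rescaling can be worked through with the anchors from the [yolo] example in section 4.7. The sketch uses plain tuples instead of tensors, and it assumes that mask 3,4,5 belongs to the stride-16 layer, as in the standard YOLOv3 cfg.

```python
# Anchors from the [yolo] example (pixels, relative to the network input)
anchors = [(10, 13), (16, 30), (33, 23), (30, 61), (62, 45),
           (59, 119), (116, 90), (156, 198), (373, 326)]
mask = [3, 4, 5]
stride = 16   # assumption: the stride that goes with this mask in yolov3.cfg

# Same idea as self.anchor_vec = self.anchors / self.stride:
# express each selected anchor in units of grid cells instead of pixels
anchor_vec = [(anchors[i][0] / stride, anchors[i][1] / stride) for i in mask]
print(anchor_vec)
```

An anchor of 30 x 61 pixels therefore spans roughly 1.9 x 3.8 grid cells on the stride-16 grid.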

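5.2 Forward function

The forward function takes p and out. p is the output of the preceding layer. out is the list of cached layer outputs that Darknet.forward_once passes in; it is mainly used for ASFF, so we set it aside for now.

The function contains many if branches, most of which are currently False, so we go straight to the key parts.

The training part

Here the tensor is reshaped and nothing much else is done; how the preceding layer's output and the loss function handle it still needs a look.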

# p.view(bs, 255, 13, 13) -- > (bs, 3, 13, 13, 85)  # (bs, anchors, grid, grid, classes + xywh)
  p = p.view(bs, self.na, self.no, self.ny, self.nx).permute(0, 1, 3, 4, 2).contiguous()  # prediction

  if self.training:
      return p

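The non-training part. Training returns the raw prediction directly, whereas inference decodes the result as described in the paper. This ties into how the ground-truth values we compare against are represented.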

else:  # inference
     io = p.clone()  # inference output
     io[..., :2] = torch.sigmoid(io[..., :2]) + self.grid  # xy
     io[..., 2:4] = torch.exp(io[..., 2:4]) * self.anchor_wh  # wh yolo method
     io[..., :4] *= self.stride
     torch.sigmoid_(io[..., 4:])
     return io.view(bs, -1, self.no), p  # view [1, 3, 13, 13, 85] as [1, 507, 85]
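
The inference branch can be followed for a single prediction with scalar arithmetic. The sketch below is not the layer itself: decode is a hypothetical helper, the grid cell (6, 6) and the zero-valued raw outputs are made up, and the anchor and stride are taken from the stride-32 scale as an assumption.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def decode(t, cell, anchor_px, stride):
    """Decode raw (tx, ty, tw, th) at one grid cell into a pixel-space (x, y, w, h) box."""
    tx, ty, tw, th = t
    cx, cy = cell
    aw, ah = anchor_px[0] / stride, anchor_px[1] / stride  # anchor in grid units (anchor_vec)
    x = (sigmoid(tx) + cx) * stride   # sigmoid offset inside the cell, plus the cell index
    y = (sigmoid(ty) + cy) * stride
    w = math.exp(tw) * aw * stride    # exponential scaling of the anchor size
    h = math.exp(th) * ah * stride
    return x, y, w, h

box = decode((0.0, 0.0, 0.0, 0.0), cell=(6, 6), anchor_px=(116, 90), stride=32)
n_boxes = 3 * 13 * 13   # anchors x grid x grid, the 507 in the view() comment

print(box, n_boxes)
```

With all raw outputs at zero, the box sits at the center of its cell and has exactly the anchor's size, which makes the roles of the sigmoid and the exponential easy to see.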

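6 The Darknet class

This class is the core of the file; the other functions and attributes exist to support it.

6.1 Constructor

It calls the functions above to do the basic construction. As the code comments indicate, version and seen mirror the Darknet header fields, and info prints a model description.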

def __init__(self, cfg, img_size=(416, 416), verbose=False):
        super(Darknet, self).__init__()

        self.module_defs = parse_model_cfg(cfg)  # load the model definition
        self.module_list, self.routs = create_modules(self.module_defs, img_size, cfg)  # build the module list
        self.yolo_layers = get_yolo_layers(self)  # indices of the YOLO layers
        # torch_utils.initialize_weights(self)

        # Darknet Header https://github.com/AlexeyAB/darknet/issues/2914#issuecomment-496675346
        self.version = np.array([0, 2, 5], dtype=np.int32)  # (int32) version info: major, minor, revision
        self.seen = np.array([0], dtype=np.int64)  # (int64) number of images seen during training
        self.info(verbose) if not ONNX_EXPORT else None  # print model description

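6.2 The forward function

There are two functions here: forward and forward_once.

forward mainly decides whether to use test-time augmentation; if so, it also runs scaled and left-right flipped copies of the image.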

def forward(self, x, augment=False, verbose=False):
        # decide whether to use test-time augmentation
        if not augment:
            return self.forward_once(x)
        else:  # Augment images (inference and test only) https://github.com/ultralytics/yolov3/issues/931
            # run the original image plus two variants: flipped and scaled, and scaled only
            img_size = x.shape[-2:]  # height, width
            s = [0.83, 0.67]  # scales
            y = []
            for i, xi in enumerate((x,
                                    torch_utils.scale_img(x.flip(3), s[0], same_shape=False),  # flip-lr and scale
                                    torch_utils.scale_img(x, s[1], same_shape=False),  # scale
                                    )):
                # cv2.imwrite('img%g.jpg' % i, 255 * xi[0].numpy().transpose((1, 2, 0))[:, :, ::-1])
                y.append(self.forward_once(xi)[0])

            # map the predictions from the augmented copies back to the original image coordinates
            y[1][..., :4] /= s[0]  # scale
            y[1][..., 0] = img_size[1] - y[1][..., 0]  # flip lr
            y[2][..., :4] /= s[1]  # scale

            y = torch.cat(y, 1)
            return y, None
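
The de-augmentation step is a coordinate mapping, which can be checked on one box. In this sketch deaugment is a hypothetical helper, the box values are made up, and the augmented box is constructed by hand as the place where a perfect model would find the same object in the flipped, scaled copy.

```python
def deaugment(box, scale, flipped, img_w):
    """Map an (x, y, w, h) box predicted on an augmented copy back to the original image."""
    x, y, w, h = (v / scale for v in box)   # undo the scaling on all four values
    if flipped:
        x = img_w - x                        # undo the left-right flip on the x center
    return x, y, w, h

img_w = 416
s = 0.83                                     # first scale used in forward()
original = (100.0, 50.0, 40.0, 20.0)         # box in the original image (illustrative)

# Where the same box would appear in the flipped, scaled copy
augmented = ((img_w - original[0]) * s, original[1] * s, original[2] * s, original[3] * s)
restored = deaugment(augmented, s, flipped=True, img_w=img_w)

print(restored)
```

The order matters: the scale is undone first so that the flip can use the original image width, which is the same order as in the code above.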

With or without augmentation, every image goes through forward_once().

def forward_once(self, x, augment=False, verbose=False):
        img_size = x.shape[-2:]  # height, width
        yolo_out, out = [], []
        if verbose:  # print detailed per-layer information
            print('0', x.shape)
            str = ''

Augmentation can also be applied at test time:

# Augment images (inference and test only)
        # this is test-time augmentation, so it is used only for inference and test
        if augment:  # https://github.com/ultralytics/yolov3/issues/931
            nb = x.shape[0]  # batch size
            s = [0.83, 0.67]  # scales
            x = torch.cat((x,
                           torch_utils.scale_img(x.flip(3), s[0]),  # flip-lr and scale
                           torch_utils.scale_img(x, s[1]),  # scale
                           ), 0)

The main processing loop; verbose controls the printed output.

for i, module in enumerate(self.module_list):
            name = module.__class__.__name__
            if name in ['WeightedFeatureFusion', 'FeatureConcat']:  # sum, concat
                if verbose:  # print detailed per-layer information
                    l = [i - 1] + module.layers  # layers
                    sh = [list(x.shape)] + [list(out[i].shape) for i in module.layers]  # shapes
                    str = ' >> ' + ' + '.join(['layer %g %s' % x for x in zip(l, sh)])
                x = module(x, out)  # WeightedFeatureFusion(), FeatureConcat()
            elif name == 'YOLOLayer':
                # a YOLO layer produces a detection output; out is passed along for ASFF
                yolo_out.append(module(x, out))
            else:  # run module directly, i.e. mtype = 'convolutional', 'upsample', 'maxpool', 'batchnorm2d' etc.
                x = module(x)
            # keep this layer's output only if a later layer reads it (routs[i]); otherwise store a placeholder
            out.append(x if self.routs[i] else [])
            if verbose:  # print detailed per-layer information
                print('%g/%g %s -' % (i, len(self.module_list), name), list(x.shape), str)
                str = ''

In training mode, the outputs are returned directly.

if self.training:  # train
            return yolo_out
        # the ONNX export branch; I have not worked through it yet
        elif ONNX_EXPORT:  # export
            x = [torch.cat(x, 0) for x in zip(*yolo_out)]
            return x[0], torch.cat(x[1:3], 1)  # scores, boxes: 3780x80, 3780x4

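At test time, each YOLO layer returns a tuple (io, p),

so further processing is needed: the outputs are first split into two parts, the x tensors are concatenated, the augmentation is undone if it was applied, and the result is returned.

else:  # inference or test
            # split the outputs into two parts
            x, p = zip(*yolo_out)  # inference output, training output
            # concatenate the tensors in x along dim 1, the box dimension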
            x = torch.cat(x, 1)  # cat yolo outputs
            if augment:  # de-augment results
                x = torch.split(x, nb, dim=0)
                x[1][..., :4] /= s[0]  # scale
                x[1][..., 0] = img_size[1] - x[1][..., 0]  # flip lr
                x[2][..., :4] /= s[1]  # scale
                x = torch.cat(x, 1)
            return x, p

6.3 fuse and info

fuse merges the Conv2d and BatchNorm2d layers in the model.
info prints information about the model.
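
Why a convolution and its batch norm can be merged follows from both being affine at inference time. The sketch below is a scalar illustration of the standard folding identity, not the fuse method of this file; the function name fuse_scalar and all numeric values are made up, and eps matches the 1E-4 used when the BatchNorm2d layers are created.

```python
import math

def fuse_scalar(w, b, gamma, beta, mean, var, eps=1e-4):
    """Fold y = gamma * (w * x + b - mean) / sqrt(var + eps) + beta into one weight and bias."""
    scale = gamma / math.sqrt(var + eps)
    return w * scale, (b - mean) * scale + beta

# Illustrative values for one channel
w, b = 0.5, 0.0
gamma, beta, mean, var = 2.0, 0.1, 0.3, 0.25
wf, bf = fuse_scalar(w, b, gamma, beta, mean, var)

x = 1.7
unfused = gamma * (w * x + b - mean) / math.sqrt(var + 1e-4) + beta  # conv followed by batch norm
fused = wf * x + bf                                                  # single fused layer

print(unfused, fused)
```

Both expressions give the same value, so the batch norm can be dropped after folding, which saves one operation per convolution at inference time.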

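7 Other functions

load_darknet_weights: loads the model weights
save_weights: saves the weights
convert: converts weight files between the .pt and .weights formats
attempt_download: tries to download model weights that are not present locally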