In the previous post we covered data preprocessing and loading. In this one we walk through the YOLOv3 training procedure. To really understand this part, it helps to be familiar with the inference code as well.
Data loading
def train(
        cfg,
        data_cfg,
        img_size=416,
        resume=False,
        epochs=273,  # 500200 batches at bs 64, dataset length 117263
        batch_size=16,
        accumulate=1,
        multi_scale=False,
        freeze_backbone=False,
        transfer=False  # Transfer learning (train only YOLO layers)
):
    init_seeds()
    weights = 'weights' + os.sep  # os.sep keeps the path portable: the separator is '\' on Windows and '/' on Linux
    latest = weights + 'latest.pt'
    best = weights + 'best.pt'
    device = torch_utils.select_device()

    # Configure run
    train_path = parse_data_cfg(data_cfg)['train']  # data_cfg='data/coco.data', train=../coco/trainvalno5k.txt

    # Initialize model
    model = Darknet(cfg, img_size).to(device)

    # Optimizer
    optimizer = optim.SGD(model.parameters(), lr=hyp['lr0'], momentum=hyp['momentum'], weight_decay=hyp['weight_decay'])

    cutoff = -1  # backbone reaches to cutoff layer
    start_epoch = 0
    best_loss = float('inf')
    nf = int(model.module_defs[model.yolo_layers[0] - 1]['filters'])  # yolo layer size (i.e. 255)

    if resume:  # Load previously saved model
        if transfer:  # Transfer learning
            chkpt = torch.load(weights + 'yolov3.pt', map_location=device)
            model.load_state_dict({k: v for k, v in chkpt['model'].items() if v.numel() > 1 and v.shape[0] != 255},
                                  strict=False)
            for p in model.parameters():
                p.requires_grad = True if p.shape[0] == nf else False
        else:  # resume from latest.pt
            chkpt = torch.load(latest, map_location=device)  # load checkpoint
            model.load_state_dict(chkpt['model'])
            start_epoch = chkpt['epoch'] + 1
            if chkpt['optimizer'] is not None:
                optimizer.load_state_dict(chkpt['optimizer'])
                best_loss = chkpt['best_loss']
        del chkpt
    else:  # Initialize model with backbone (optional)
        if '-tiny.cfg' in cfg:
            cutoff = load_darknet_weights(model, weights + 'yolov3-tiny.conv.15')
        else:
            cutoff = load_darknet_weights(model, weights + 'darknet53.conv.74')

    # Scheduler
    lf = lambda x: 1 - 10 ** (hyp['lrf'] * (1 - x / epochs))  # inverse exp ramp to lr0 * 1e-2
    scheduler = optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lf, last_epoch=start_epoch - 1)

    # Dataset
    dataset = LoadImagesAndLabels(train_path, img_size=img_size, augment=True)
    # dataset.__getitem__(0)

    # Initialize distributed training
    if torch.cuda.device_count() > 1:
        dist.init_process_group(backend=opt.backend, init_method=opt.dist_url, world_size=opt.world_size, rank=opt.rank)
        model = torch.nn.parallel.DistributedDataParallel(model)
        sampler = torch.utils.data.distributed.DistributedSampler(dataset)
    else:
        sampler = None

    # Dataloader
    dataloader = DataLoader(dataset,
                            batch_size=batch_size,
                            num_workers=opt.num_workers,
                            shuffle=True,
                            pin_memory=True,
                            collate_fn=dataset.collate_fn,
                            sampler=sampler)
This block implements the data-loading pipeline described in the previous post, along with path configuration and mode selection (resume, transfer learning, or training from a pretrained backbone). I won't go through it line by line; feel free to ask in the comments if anything is unclear.
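For orientation, here is a minimal sketch of how train() might be invoked directly. In the repo, train() is actually driven by an argparse wrapper at the bottom of train.py, and the paths below are assumptions, not the repo's exact defaults.

# Hypothetical direct call; in the repo an argparse CLI fills these arguments.
if __name__ == '__main__':
    train(
        cfg='cfg/yolov3.cfg',        # model definition file (assumed path)
        data_cfg='data/coco.data',   # dataset config that points at trainvalno5k.txt
        img_size=416,
        epochs=273,
        batch_size=16,
        resume=False,                # False: start from the darknet53.conv.74 backbone weights
    )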
Training loop and loss
for epoch in range(start_epoch, epochs):
    model.train()
    print(('\n%8s%12s' + '%10s' * 7) % ('Epoch', 'Batch', 'xy', 'wh', 'conf', 'cls', 'total', 'nTargets', 'time'))

    # Update scheduler
    scheduler.step()

    # Freeze backbone at epoch 0, unfreeze at epoch 1
    if freeze_backbone and epoch < 2:
        for name, p in model.named_parameters():
            if int(name.split('.')[1]) < cutoff:  # if layer < 75
                p.requires_grad = False if epoch == 0 else True

    mloss = torch.zeros(5).to(device)  # mean losses
    for i, (imgs, targets, _, _) in enumerate(dataloader):
        imgs = imgs.to(device)
        targets = targets.to(device)
        nt = len(targets)
        # if nt == 0:  # if no targets continue
        #     continue

        # Plot images with bounding boxes
        if epoch == 0 and i == 0:
            plot_images(imgs=imgs, targets=targets, fname='train_batch0.jpg')

        # SGD burn-in
        if epoch == 0 and i <= n_burnin:
            lr = hyp['lr0'] * (i / n_burnin) ** 4
            for x in optimizer.param_groups:
                x['lr'] = lr

        # Run model
        pred = model(imgs)

        # Compute loss
        loss, loss_items = compute_loss(pred, targets, model)
The training procedure:
1. Set the epochs parameter, which determines how many passes are made over the entire dataset.
2. Inside the epoch loop, the model is switched to training mode, and for i, (imgs, targets, _, _) in enumerate(dataloader): fetches the preprocessed images and labels. Note that both the images and the labels have been resized to 416*416, so they no longer match the raw values in the label .txt files.
3. The batch of images imgs is fed into the model, i.e. the YOLOv3 Darknet. It has three YOLO layers, and each outputs a feature map with a shape such as (bs, 3, 13, 13, 85), where bs is the batch size, 3 is the number of anchors per grid cell, 13*13 is the grid resolution, and 85 = xywh + objectness confidence + 80 class scores.
class YOLOLayer(nn.Module):  # x = module[0](x, img_size)
    def __init__(self, anchors, nc, img_size, yolo_layer, cfg):
        super(YOLOLayer, self).__init__()
        self.anchors = torch.Tensor(anchors)
        self.na = len(anchors)  # number of anchors (3)
        self.nc = nc  # number of classes (80)
        self.img_size = 0
        if ONNX_EXPORT:  # grids must be computed in __init__
            stride = [32, 16, 8][yolo_layer]  # stride of this layer
            if cfg.endswith('yolov3-tiny.cfg'):
                stride *= 2
            ng = (int(img_size[0] / stride), int(img_size[1] / stride))  # number grid points
            create_grids(self, max(img_size), ng)

    def forward(self, p, img_size, var=None):
        if ONNX_EXPORT:
            bs = 1  # batch size
        else:
            bs, nx, ny = p.shape[0], p.shape[-2], p.shape[-1]
            if self.img_size != img_size:
                create_grids(self, img_size, (nx, ny), p.device)

        # p.view(bs, 255, 13, 13) --> (bs, 3, 13, 13, 85)  # (bs, anchors, grid, grid, classes + xywh)
        p = p.view(bs, self.na, self.nc + 5, self.nx, self.ny).permute(0, 1, 3, 4, 2).contiguous()  # prediction

        if self.training:
            return p
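To make the view/permute at the end of YOLOLayer concrete, here is a small shape-only sketch; bs=2, 80 classes and the 13*13 scale are assumed values.

import torch

bs, na, nc, ny, nx = 2, 3, 80, 13, 13        # batch, anchors, classes, grid size
p = torch.randn(bs, na * (nc + 5), ny, nx)   # raw conv output: (2, 255, 13, 13)

# same reshape as in YOLOLayer.forward
p = p.view(bs, na, nc + 5, ny, nx).permute(0, 1, 3, 4, 2).contiguous()
print(p.shape)                               # torch.Size([2, 3, 13, 13, 85])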
4. loss, loss_items = compute_loss(pred, targets, model): the outputs of the three YOLO layers are compared against the targets built from the labels to obtain the loss.
5. Back-propagate the loss and step the configured optimizer (see the sketch after this list).
6. Save the final weights.
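The training-loop snippet above stops at compute_loss. For steps 5 and 6, the rest of the loop in the repo roughly looks like the following simplified sketch (gradient accumulation is kept; multi-scale, logging and the exact best-checkpoint logic are omitted):

        # Step 5: back-propagate and step the optimizer
        loss.backward()
        if (i + 1) % accumulate == 0:    # support gradient accumulation when accumulate > 1
            optimizer.step()
            optimizer.zero_grad()

    # Step 6: save a checkpoint at the end of each epoch
    chkpt = {'epoch': epoch,
             'best_loss': best_loss,
             'model': model.state_dict(),
             'optimizer': optimizer.state_dict()}
    torch.save(chkpt, latest)            # always overwrite weights/latest.pt
    # the repo also copies latest.pt to best.pt whenever the loss improves
    del chkpt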
How the loss is composed:
First, I'll start from the idea behind the YOLOv3 algorithm so you have the overall picture, then go down to the code level. That way you can see how the code reproduces the algorithm and really understand YOLOv3, which helps directly whether you want to train on your own data or adapt YOLOv3 to your own setting.
The idea of YOLOv3: first design Darknet, a ResNet-like backbone, and pretrain it on ImageNet. Then remove the final fully-connected layer and add outputs at three scales (13, 26 and 52), i.e. the YOLO layers (the extra scales improve detection of small objects), and fine-tune on COCO. The COCO labels provide the class and the ground-truth box (x, y, w, h). The loss consists of four parts: lxy, lwh, lcls and lconf (each component is explained in the code below).
This raises an important question: each scale's feature map has three anchors, so for a given ground-truth box, which anchor is responsible for matching it? As in YOLOv1, if the center of a ground truth falls inside a cell, the three anchor boxes of that cell are the candidates to predict it. Which one actually does is decided during training: the anchor box with the largest IoU against the ground truth predicts it, and the remaining two are not matched to that ground truth. YOLOv3 effectively assumes each cell contains at most one ground truth, which in practice is almost always the case. The matched anchor box contributes coordinate loss, confidence loss (with target 1) and classification loss, while the other anchor boxes contribute only confidence loss (with target 0).
def compute_loss(p, targets, model):  # predictions, targets, model
    ft = torch.cuda.FloatTensor if p[0].is_cuda else torch.Tensor
    lxy, lwh, lcls, lconf = ft([0]), ft([0]), ft([0]), ft([0])
    # build_targets picks, on the 13/26/52 grids, the anchor boxes above the IoU threshold
    # that best match each label and turns them into training targets
    txy, twh, tcls, indices = build_targets(model, targets)
    # txy: target xy offsets, twh: target wh, indices = (image index, anchor index, gj, gi) per layer

    # Define criteria
    MSE = nn.MSELoss()
    CE = nn.CrossEntropyLoss()
    BCE = nn.BCEWithLogitsLoss()

    # Compute losses
    h = model.hyp  # hyperparameters
    bs = p[0].shape[0]  # batch size
    k = h['k'] * bs  # loss gain
    for i, pi0 in enumerate(p):  # layer i predictions
        b, a, gj, gi = indices[i]  # image, anchor, grid_y, grid_x
        tconf = torch.zeros_like(pi0[..., 0])  # conf

        # Compute losses
        if len(b):  # number of targets
            pi = pi0[b, a, gj, gi]  # predictions at the positions matched to targets
            tconf[b, a, gj, gi] = 1  # conf
            # pi[..., 2:4] = torch.sigmoid(pi[..., 2:4])  # wh power loss (uncomment)

            lxy += (k * h['xy']) * MSE(torch.sigmoid(pi[..., 0:2]), txy[i])  # xy loss
            lwh += (k * h['wh']) * MSE(pi[..., 2:4], twh[i])  # wh yolo loss
            lcls += (k * h['cls']) * CE(pi[..., 5:], tcls[i])  # class_conf loss

        # pos_weight = ft([gp[i] / min(gp) * 4.])
        # BCE = nn.BCEWithLogitsLoss(pos_weight=pos_weight)
        lconf += (k * h['conf']) * BCE(pi0[..., 4], tconf)  # obj_conf loss
    loss = lxy + lwh + lconf + lcls

    return loss, torch.cat((lxy, lwh, lconf, lcls, loss)).detach()
Now let's look at the code in detail.
txy, twh, tcls, indices = build_targets(model, targets): build_targets returns the target tensors we need, so let's see what the function actually does.
def build_targets(model, targets):
    # targets = [image index, class, x (normalized center), y, w (normalized width), h]
    # e.g. [0.00000, 20.00000, 0.72913, 0.48770, 0.13595, 0.08381]
    iou_thres = model.hyp['iou_t']  # hyperparameter
    if type(model) in (nn.parallel.DataParallel, nn.parallel.DistributedDataParallel):
        model = model.module

    nt = len(targets)
    txy, twh, tcls, indices = [], [], [], []
    for i in model.yolo_layers:
        layer = model.module_list[i][0]  # layer -> YOLOLayer()

        # iou of targets-anchors
        t, a = targets, []
        gwh = targets[:, 4:6] * layer.ng  # layer.ng is the grid size (13/26/52); this maps the normalized wh onto that grid
        if nt:
            iou = [wh_iou(x, gwh) for x in layer.anchor_vec]  # anchor_vec holds the anchor boxes' wh in grid units
            iou, a = torch.stack(iou, 0).max(0)  # best iou and anchor index a for each label

            # reject below threshold ious (OPTIONAL, increases P, lowers R)
            reject = True
            if reject:
                j = iou > iou_thres
                t, a, gwh = targets[j], a[j], gwh[j]

        # Indices; recall targets = [image, class, x, y, w, h] (normalized)
        b, c = t[:, :2].long().t()  # target image, class
        gxy = t[:, 2:4] * layer.ng
        gi, gj = gxy.long().t()  # grid_i, grid_j
        indices.append((b, a, gj, gi))

        # XY coordinates: in YOLOv3 this is Gx, Gy minus the grid cell's top-left corner Cx, Cy
        txy.append(gxy - gxy.floor())

        # Width and height
        twh.append(torch.log(gwh / layer.anchor_vec[a]))  # wh yolo method
        # twh.append((gwh / layer.anchor_vec[a]) ** (1 / 3) / 2)  # wh power method

        # Class
        tcls.append(c)
        if c.shape[0]:
            assert c.max() <= layer.nc, 'Target classes exceed model classes'

    return txy, twh, tcls, indices
build_targets(model, targets) takes two arguments: the model, and targets, the resized label tensor read from the label files, laid out as [image index, class, x (normalized center), y (normalized center), w (normalized width), h (normalized height)]. iou_thres is a hyperparameter we set ourselves; its role is explained below.
for i in model.yolo_layers: there are three YOLO layers and i is the layer index. targets[:, 4:6] holds the normalized width and height of each ground truth, so gwh maps them back onto the 13*13 (or 26*26 / 52*52) feature map.
iou = [wh_iou(x, gwh) for x in layer.anchor_vec]: anchor_vec holds the anchor boxes' width and height, so we compute the IoU between each of the layer's three anchor boxes and every ground truth, keep the anchor with the largest IoU (recording its index in a), and discard matches whose IoU falls below iou_thres. b records the image index, c the class, gxy the box center on the feature map, and gi, gj the grid cell responsible for that ground truth. (A sketch of what a wh-only IoU looks like follows below.)
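wh_iou compares only widths and heights, i.e. it computes the IoU as if both boxes shared the same center. A minimal sketch of such a function (the version in the repo's utils may differ in details):

import torch

def wh_iou(anchor_wh, gwh):
    # IoU between one anchor (w, h) and n ground-truth (w, h) rows, ignoring box centers
    w1, h1 = anchor_wh[0], anchor_wh[1]
    w2, h2 = gwh[:, 0], gwh[:, 1]
    inter = torch.min(w1, w2) * torch.min(h1, h2)   # overlap when both boxes are center-aligned
    union = w1 * h1 + w2 * h2 - inter + 1e-16
    return inter / union                            # shape (n,), one IoU per ground truth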
(The figure here shows the YOLOv3 box regression formulas: bx = sigmoid(tx) + Cx, by = sigmoid(ty) + Cy, bw = Pw * e^tw, bh = Ph * e^th, where Cx, Cy is the grid cell's top-left corner and Pw, Ph are the anchor priors.)
txy = gxy - gxy.floor(): in YOLOv3 terms, this is the box center Gx, Gy minus the grid cell's top-left corner Cx, Cy.
twh = torch.log(gwh / layer.anchor_vec[a]) follows from inverting the formula above. txy and twh are the regression targets for the loss; they are offsets rather than absolute coordinates, so the predicted offsets we optimize still have to be transformed back before they give the real box (see the sketch below).
tcls = c, where c is simply the class index. With that we understand what every target returned by build_targets means.
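For completeness, a sketch of the inverse transform that turns predicted offsets back into boxes at inference time, following the formulas above (simplified; in the repo this happens inside YOLOLayer.forward when the model is not in training mode):

import torch

def decode(txy, twh, grid_xy, anchor_wh, stride):
    # Map predicted offsets back to boxes in input-image pixels (simplified sketch).
    # grid_xy: cell top-left corners Cx, Cy; anchor_wh: priors Pw, Ph in grid units;
    # stride: pixels per grid cell (32, 16 or 8 for the three scales).
    bxy = torch.sigmoid(txy) + grid_xy              # bx = sigmoid(tx) + Cx, by = sigmoid(ty) + Cy
    bwh = anchor_wh * torch.exp(twh)                # bw = Pw * exp(tw),     bh = Ph * exp(th)
    return torch.cat((bxy, bwh), dim=-1) * stride   # grid units -> pixels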
for i, pi0 in enumerate(p):  # layer i predictions
    b, a, gj, gi = indices[i]  # image, anchor, grid_y, grid_x
    tconf = torch.zeros_like(pi0[..., 0])  # conf

    # Compute losses
    if len(b):  # number of targets
        pi = pi0[b, a, gj, gi]  # predictions at the positions matched to targets
        tconf[b, a, gj, gi] = 1  # conf
        # pi[..., 2:4] = torch.sigmoid(pi[..., 2:4])  # wh power loss (uncomment)

        lxy += (k * h['xy']) * MSE(torch.sigmoid(pi[..., 0:2]), txy[i])  # xy loss
        lwh += (k * h['wh']) * MSE(pi[..., 2:4], twh[i])  # wh yolo loss
        lcls += (k * h['cls']) * CE(pi[..., 5:], tcls[i])  # class_conf loss

    # pos_weight = ft([gp[i] / min(gp) * 4.])
    # BCE = nn.BCEWithLogitsLoss(pos_weight=pos_weight)
    lconf += (k * h['conf']) * BCE(pi0[..., 4], tconf)  # obj_conf loss
loss = lxy + lwh + lconf + lcls
for i, pi0 in enumerate(p): p is the list of feature maps returned by the YOLO layers, i.e. p = [(bs, 3, 13, 13, 85), (bs, 3, 26, 26, 85), (bs, 3, 52, 52, 85)], and pi0 is the prediction at scale i.
b, a, gj, gi = indices[i]: read the best anchor and the grid coordinates from the targets we built. Since each ground truth is matched to exactly one anchor box, these indices are what locate that anchor box inside the prediction tensor.
pi = pi0[b, a, gj, gi]: pi0 has shape (bs, 3, 13, 13, 85); this gathers the entries of pi0 that correspond to the ground truths (a tiny indexing example follows below).
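pi0[b, a, gj, gi] relies on PyTorch advanced indexing: b, a, gj, gi are equal-length index tensors, so the result contains one 85-dimensional prediction per matched target. A tiny self-contained example:

import torch

pi0 = torch.randn(4, 3, 13, 13, 85)   # (bs, anchors, grid_y, grid_x, 85)
b  = torch.tensor([0, 2])             # image indices of two matched targets
a  = torch.tensor([1, 0])             # their best-anchor indices
gj = torch.tensor([6, 9])             # grid rows
gi = torch.tensor([7, 3])             # grid columns

pi = pi0[b, a, gj, gi]
print(pi.shape)                       # torch.Size([2, 85]): one prediction per target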
tconf[b, a, gj, gi] = 1: set the objectness target to 1 at the matched positions; everywhere else it stays 0.
lxy += (k * h['xy']) * MSE(torch.sigmoid(pi[..., 0:2]), txy[i]): anchor boxes matched to a ground truth contribute coordinate loss, confidence loss (target 1) and classification loss, while unmatched anchor boxes contribute only confidence loss (target 0). Here pi holds only the predictions of the matched anchor boxes. The sigmoid constrains xy to (0, 1), because the center must fall inside its grid cell. MSE is the mean squared error.
lwh += (k * h['wh']) * MSE(pi[..., 2:4], twh[i]): same idea, for width and height.
lcls += (k * h['cls']) * CE(pi[..., 5:], tcls[i]): the classification loss is computed only for anchor boxes matched to a ground truth. CE is the cross-entropy loss.
lconf += (k * h['conf']) * BCE(pi0[..., 4], tconf): tconf is 1 at positions matched to a ground truth and 0 everywhere else, and this objectness loss runs over the whole feature map pi0, not just the matched positions.
That is how the loss is obtained. Some parts may not be described perfectly, and I may have misunderstood a few details; discussion is welcome.