Network Structure
Take the COCO dataset as an example: 80 classes in total.
YOLOv5 detects small, medium, and large objects on three scales with strides 8, 16, and 32 respectively. At every position of each scale, num_anchors anchors are tiled; typically num_anchors = 3.
Given an input image of shape [N, 3, H, W], the output feature maps at these three strides have the following shapes:
[N, num_anchors, H/8, W/8, num_classes + 5]
[N, num_anchors, H/16, W/16, num_classes + 5]
[N, num_anchors, H/32, W/32, num_classes + 5]
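For concreteness, here is a small pure-Python sketch (assuming a 640x640 input and COCO's 80 classes; the function name is mine, not from the YOLOv5 code) that reproduces the three shapes above:

```python
# Compute the per-stride head output shapes of a YOLOv5-style detector.
# Each cell predicts num_classes + 5 values: class scores, box (x, y, w, h),
# and an objectness score.
def head_output_shapes(n, h, w, num_anchors=3, num_classes=80, strides=(8, 16, 32)):
    return [(n, num_anchors, h // s, w // s, num_classes + 5) for s in strides]

for shape in head_output_shapes(1, 640, 640):
    print(shape)
# (1, 3, 80, 80, 85)
# (1, 3, 40, 40, 85)
# (1, 3, 20, 20, 85)
```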
train
The core of training is the loss computation, which consists of two parts: build_targets and compute_loss.
build_targets
Its purpose is to establish the matching between ground-truth boxes and anchors.
tcls, tbox, indices, anch = [], [], [], []
g = 0.5  # bias used for the neighbor-cell offsets below
off = torch.tensor([[0, 0], [1, 0], [0, 1], [-1, 0], [0, -1]], device=targets.device).float() * g
gain = torch.ones(7, device=targets.device)
# na: num_anchors. Build an anchor-index tensor of shape [na, nt].
ai = torch.arange(na, device=targets.device).float().view(na, 1).repeat(1, nt)
# targets: [nt, 6] => [na, nt, 6]
# ai: [na, nt] => [na, nt, 1]; indexing with None appends a new axis.
# Concatenating the two tensors along dim=2 gives [na, nt, 7].
# Each item of targets is [image_id, cls, xc, yc, w, h, anchor_id].
# Here [xc, yc, w, h] are still normalized by the original image width and
# height, as they were when the ground truth was built.
targets = torch.cat((targets.repeat(na, 1, 1), ai[:, :, None]), 2)
for i in range(self.nl):  # for each stride's feature map
    anchors = self.anchors[i]  # anchor sizes tiled on this feature map, [num_anchors, 2]
    gain[2:6] = torch.tensor(p[i].shape)[[3, 2, 3, 2]]  # [1, 1, w, h, w, h, 1]
    # Match targets to anchors
    t = targets * gain  # [na, nt, 7], now in feature-map (grid) units
    if nt:
        # Matches
        # wh ratio between targets and anchors: [na, nt, 2] / [na, 1, 2]
        r = t[:, :, 4:6] / anchors[:, None]
        # torch.max(a, b) is an element-wise maximum;
        # .max(dim) reduces over a dimension and returns (values, indices).
        # Keep only the anchors whose size is compatible with the target:
        # both the w and h ratios (and their inverses) must stay below 4.0.
        j = torch.max(r, 1. / r).max(2)[0] < 4.0
        t = t[j]  # [num_valid, 7]
        # Offsets
        gxy = t[:, 2:4]  # object center position on the grid
        gxi = gain[[2, 3]] - gxy  # inverse coordinates, measured from the far corner
        j, k = ((gxy % 1. < g) & (gxy > 1.)).T
        l, m = ((gxi % 1. < g) & (gxi > 1.)).T
        j = torch.stack((torch.ones_like(j), j, k, l, m))
        t = t.repeat((5, 1, 1))[j]
        # [1, num_valid, 2] -> [5, num_valid, 2]
        # Each candidate in t gets the five offsets in off,
        # but only the components where j is True are kept.
        # The intent is to associate each target with its neighboring grid
        # cells as well, producing more positive samples.
        offsets = (torch.zeros_like(gxy)[None] + off[:, None])[j]
    else:
        t = targets[0]
        offsets = 0
    # Define
    b, c = t[:, :2].long().T  # image, class
    gxy = t[:, 2:4]
    gwh = t[:, 4:6]
    gij = (gxy - offsets).long()
    gi, gj = gij.T
    # Append
    a = t[:, 6].long()  # anchor indices
    # (image_id, anchor_id, grid row, grid column)
    indices.append((b, a, gj.clamp_(0, gain[3] - 1), gi.clamp_(0, gain[2] - 1)))
    # [cx_offset, cy_offset, w, h]
    tbox.append(torch.cat((gxy - gij, gwh), 1))
    # associated anchors, [num_valid, 2]
    anch.append(anchors[a])
    tcls.append(c)
return tcls, tbox, indices, anch
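The two matching decisions above, the wh-ratio test against the anchors and the neighbor-cell selection driven by off, can be illustrated in pure Python (toy numbers; the anchor sizes below are the P3/8 defaults from yolov5s.yaml, and both function names are mine):

```python
ANCHOR_T = 4.0  # ratio threshold (hyp['anchor_t'] in YOLOv5)
G = 0.5         # neighbor-cell bias

def anchor_matches(anchors, target_wh, thr=ANCHOR_T):
    """A target matches an anchor if max(r, 1/r) < thr for both w and h,
    where r = target_wh / anchor_wh."""
    tw, th = target_wh
    matched = []
    for aw, ah in anchors:
        rw, rh = tw / aw, th / ah
        worst = max(rw, 1.0 / rw, rh, 1.0 / rh)
        matched.append(worst < thr)
    return matched

def neighbor_cells(gx, gy, grid_w, grid_h, g=G):
    """The cell containing the center always gets the target; in addition,
    the two closest neighbor cells are selected depending on which half of
    the cell the center falls in (this is what the off tensor encodes)."""
    cells = [(int(gx), int(gy))]
    gxi, gyi = grid_w - gx, grid_h - gy  # inverse coordinates
    if gx % 1.0 < g and gx > 1.0:
        cells.append((int(gx - 1.0), int(gy)))  # left neighbor
    if gy % 1.0 < g and gy > 1.0:
        cells.append((int(gx), int(gy - 1.0)))  # upper neighbor
    if gxi % 1.0 < g and gxi > 1.0:
        cells.append((int(gx + 1.0), int(gy)))  # right neighbor
    if gyi % 1.0 < g and gyi > 1.0:
        cells.append((int(gx), int(gy + 1.0)))  # lower neighbor
    return cells

# A wide, flat 100x10 box only matches the wide 33x23 anchor:
print(anchor_matches([(10, 13), (16, 30), (33, 23)], (100, 10)))
# [False, False, True]

# A center at (5.3, 5.8) also claims the cell to its left and below,
# so the target typically yields three positive cells per matched anchor:
print(neighbor_cells(5.3, 5.8, 80, 80))
# [(5, 5), (4, 5), (5, 6)]
```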
compute_loss
The loss consists of three parts: cls_loss, box_loss, and obj_loss. cls_loss supervises class prediction, box_loss supervises bounding-box regression, and obj_loss supervises whether a grid cell contains an object. The loss is computed on every feature-map level.
lcls, lbox, lobj = torch.zeros(1, device=device), torch.zeros(1, device=device), torch.zeros(1, device=device)
tcls, tbox, indices, anchors = self.build_targets(p, targets)  # targets
for i, pi in enumerate(p):
    # b: [num_valid], image indices
    # a: [num_valid], indices of the anchors on the current stride's feature map
    # gj: [num_valid], grid rows
    # gi: [num_valid], grid columns
    b, a, gj, gi = indices[i]
    tobj = torch.zeros_like(pi[..., 0], device=device)
    n = b.shape[0]
    if n:
        ps = pi[b, a, gj, gi]
        # Bbox regression
        pxy = ps[:, :2].sigmoid() * 2. - 0.5
        pwh = (ps[:, 2:4].sigmoid() * 2) ** 2 * anchors[i]
        pbox = torch.cat((pxy, pwh), 1)
        iou = bbox_iou(pbox.T, tbox[i], x1y1x2y2=False, CIoU=True)
        lbox += (1.0 - iou).mean()
        # Objectness
        tobj[b, a, gj, gi] = (1.0 - self.gr) + self.gr * iou.detach().clamp(0).type(tobj.dtype)  # self.gr defaults to 1.0
        # Classification
        if self.nc > 1:
            t = torch.full_like(ps[:, 5:], self.cn, device=device)
            t[range(n), tcls[i]] = self.cp
            lcls += self.BCEcls(ps[:, 5:], t)
    lobj += self.BCEobj(pi[..., 4], tobj) * self.balance[i]  # obj loss
loss = lbox + lobj + lcls
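The box decoding in the regression branch (pxy = sigmoid * 2 - 0.5, pwh = (sigmoid * 2)**2 * anchor) can be reproduced in pure Python. This is a minimal sketch with made-up raw outputs; the function name is mine:

```python
import math

def decode_box(tx, ty, tw, th, anchor_w, anchor_h):
    """Decode raw head outputs into a grid-relative box.
    xy lands in (-0.5, 1.5) around the cell, wh in (0, 4 * anchor)."""
    s = lambda v: 1.0 / (1.0 + math.exp(-v))  # sigmoid
    px = s(tx) * 2.0 - 0.5
    py = s(ty) * 2.0 - 0.5
    pw = (s(tw) * 2.0) ** 2 * anchor_w
    ph = (s(th) * 2.0) ** 2 * anchor_h
    return px, py, pw, ph

# At raw output 0, xy sits at the cell-center offset 0.5 and wh equals the anchor.
print(decode_box(0.0, 0.0, 0.0, 0.0, 16, 30))  # (0.5, 0.5, 16.0, 30.0)
```

Note that pwh is bounded by 4x the anchor size, which is exactly why build_targets uses 4.0 as the wh-ratio threshold: any target outside that ratio could never be regressed from that anchor.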
obj_loss
Whether each grid cell contains an object is a binary classification problem, so BCE loss is used here.
self.BCEobj = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([1.0]))
lobj = self.BCEobj(pi[..., 4], tobj)
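nn.BCEWithLogitsLoss fuses a sigmoid with binary cross-entropy in a numerically stable form. A pure-Python sketch of the per-element computation, together with the objectness target tobj = (1 - gr) + gr * iou from compute_loss (toy values, function name mine):

```python
import math

def bce_with_logits(logit, target):
    """Numerically stable BCE with logits for one element:
    max(x, 0) - x * t + log(1 + exp(-|x|))."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))

# Objectness target for a positive cell: with gr = 1.0 it is simply the IoU,
# so better-localized predictions are pushed toward higher objectness.
gr, iou = 1.0, 0.7
tobj = (1.0 - gr) + gr * iou           # 0.7
loss_pos = bce_with_logits(2.0, tobj)  # positive cell, confident logit
loss_neg = bce_with_logits(-3.0, 0.0)  # background cell, target 0
print(round(loss_pos, 4), round(loss_neg, 4))
```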
cls loss
Classification is likewise treated as a set of per-class binary problems (multi-label), so BCE is used instead of softmax cross-entropy.
self.BCEcls = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([1.0]))
lcls = self.BCEcls(ps[:, 5:], t)
box loss
CIoU is used as the bounding-box regression loss.
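A minimal pure-Python sketch of CIoU for boxes in (cx, cy, w, h) format, following the usual definition (IoU minus a normalized center-distance term and an aspect-ratio penalty); the boxes in the example are made up:

```python
import math

def ciou(box1, box2, eps=1e-7):
    """CIoU = IoU - rho^2 / c^2 - alpha * v, where rho is the center distance,
    c the diagonal of the smallest enclosing box, and v an aspect-ratio term."""
    x1, y1, w1, h1 = box1
    x2, y2, w2, h2 = box2
    # corners
    b1x1, b1y1, b1x2, b1y2 = x1 - w1 / 2, y1 - h1 / 2, x1 + w1 / 2, y1 + h1 / 2
    b2x1, b2y1, b2x2, b2y2 = x2 - w2 / 2, y2 - h2 / 2, x2 + w2 / 2, y2 + h2 / 2
    # intersection and union
    iw = max(0.0, min(b1x2, b2x2) - max(b1x1, b2x1))
    ih = max(0.0, min(b1y2, b2y2) - max(b1y1, b2y1))
    inter = iw * ih
    union = w1 * h1 + w2 * h2 - inter + eps
    iou = inter / union
    # squared center distance over squared enclosing-box diagonal
    cw = max(b1x2, b2x2) - min(b1x1, b2x1)
    ch = max(b1y2, b2y2) - min(b1y1, b2y1)
    c2 = cw ** 2 + ch ** 2 + eps
    rho2 = (x2 - x1) ** 2 + (y2 - y1) ** 2
    # aspect-ratio consistency term
    v = (4 / math.pi ** 2) * (math.atan(w2 / h2) - math.atan(w1 / h1)) ** 2
    alpha = v / (1 - iou + v + eps)
    return iou - rho2 / c2 - alpha * v

# Identical boxes give CIoU ~= 1, so the box loss 1 - CIoU ~= 0.
print(round(ciou((10, 10, 4, 6), (10, 10, 4, 6)), 6))  # 1.0
```

Unlike plain IoU, CIoU stays informative for non-overlapping boxes: the center-distance term keeps pulling the prediction toward the target even when the intersection is zero.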