
yolov5 network structure & train

Network structure

Take the COCO dataset as an example: 80 classes in total.

yolov5 detects small, medium, and large objects on the feature maps with stride 8, 16, and 32 respectively. At every position of each scale, num_anchors anchors are laid out; normally num_anchors = 3.
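For reference, the anchor sizes shipped with the default YOLOv5 model yamls (e.g. models/yolov5s.yaml) look like the following; values are (w, h) pairs in pixels at the network input resolution:

# default anchors per detection level, from the stock model yaml
anchors = [
    [10, 13, 16, 30, 33, 23],        # P3/8:  small objects
    [30, 61, 62, 45, 59, 119],       # P4/16: medium objects
    [116, 90, 156, 198, 373, 326],   # P5/32: large objects
]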

Assuming the input image has shape [N, 3, H, W], the output feature maps at these three strides have the following shapes:

[N, num_anchors,  H/8,  W/8, num_classes + 5]
[N, num_anchors, H/16, W/16, num_classes + 5]
[N, num_anchors, H/32, W/32, num_classes + 5]
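These shapes come from reshaping the raw convolution output of the detection head (the Detect module does a view + permute). A minimal sketch of that step, with illustrative sizes:

import torch

N, num_anchors, num_classes = 4, 3, 80
H, W, stride = 640, 640, 8
no = num_classes + 5  # xywh + objectness + class scores

raw = torch.randn(N, num_anchors * no, H // stride, W // stride)  # raw conv output
out = raw.view(N, num_anchors, no, H // stride, W // stride).permute(0, 1, 3, 4, 2).contiguous()
print(out.shape)  # torch.Size([4, 3, 80, 80, 85])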
           

train

The core of training is the loss computation, which is split into two parts: build_targets and compute_loss.

build_targets

Its purpose is to build the matching between ground-truth boxes and anchors.

tcls, tbox, indices, anch = [], [], [], []
g = 0.5  # cell-center offset threshold (bias)
off = torch.tensor([[0, 0], [1, 0], [0, 1], [-1, 0], [0, -1]], device=targets.device).float() * g
gain = torch.ones(7, device=targets.device)
# na: num_anchors; build the anchor indices here, shape [na, nt]
ai = torch.arange(na, device=targets.device).float().view(na, 1).repeat(1, nt) 
# targets: [nt, 6] => [na, nt, 6]
# ai: [na, nt] => [na, nt, 1]; indexing with None appends a new dimension.
# Concatenate the two tensors along dim=2, giving [na, nt, 7].
# Each item in targets is [image_id, cls, xc, yc, w, h, anchor_id].
# [xc, yc, w, h] here are still normalized by the original image width and height, as set when the ground truth was built.
targets = torch.cat((targets.repeat(na, 1, 1), ai[:, :, None]), 2)
for i in range(self.nl): # for each stride level (feature map)
	anchors = self.anchors[i] # anchor sizes laid on this stride's feature map, [num_anchors, 2]
	gain[2:6] = torch.tensor(p[i].shape)[[3, 2, 3, 2]] # [1, 1, w, h, w, h, 1]
	# Match targets to anchors
	t = targets * gain	# [na, nt, 7]
	if nt:
		# Matches
		# get wh ratio, [na, nt, 2] / [na, 1, 2]
		r = t[:, :, 4:6] / anchors[:, None] 
		# torch.max(a, b) is an element-wise maximum of two tensors.
		# tensor.max(dim) returns a tuple: [0] the max values, [1] their indices.
		# Purpose: keep only anchors whose shape differs from the target's by less than 4x in both w and h.
		j = torch.max(r, 1./r).max(2)[0] < 4.0
		t = t[j] # [num_valid, 7]
		# get offset
		gxy = t[:, 2:4]	# object center position
		gxi = gain[[2, 3]] - gxy	# center measured from the right/bottom border of the feature map
		j, k = ((gxy % 1. < g) & (gxy > 1.)).T	# center lies in the left/top half of its cell
		l, m = ((gxi % 1. < g) & (gxi > 1.)).T	# center lies in the right/bottom half of its cell
		j = torch.stack((torch.ones_like(j), j, k, l, m))
		t = t.repeat((5, 1, 1))[j]
		# [1, num_valid, 2] + [5, 1, 2] broadcasts to [5, num_valid, 2].
		# Each target sample gets 5 candidate offsets (its own cell plus 4 neighbors),
		# but only the components where j is True are kept.
		# My understanding: this associates each target with its nearest neighboring grid cells as well, providing more positive samples.
		offsets = (torch.zeros_like(gxy)[None] + off[:, None])[j]
		# Define
		b, c = t[:, :2].long().T #image, class
		gxy = t[:, 2:4]
		gwh = t[:, 4:6]
		gij = (gxy - offsets).long()
		gi, gj = gij.T
		# Append
		a = t[:, 6].long() # anchor indices
		# (image_id, anchor_id, grid_y, grid_x)
		indices.append((b, a, gj.clamp_(0, gain[3] - 1), gi.clamp_(0, gain[2] - 1)))
		# [cx_offset, cy_offset, w, h]
		tbox.append(torch.cat((gxy - gij, gwh), 1))
		# associated anchors, [num_valid, 2]
		anch.append(anchors[a])
		tcls.append(c)
return tcls, tbox, indices, anch
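A small numerical sketch of the shape-matching rule above, with made-up target sizes and the stock stride-8 anchors (both expressed in grid units):

import torch

anchors = torch.tensor([[10., 13.], [16., 30.], [33., 23.]]) / 8.  # anchor wh in grid units at stride 8
twh = torch.tensor([[6.0, 3.5]])                                   # one target's wh in grid units

r = twh[None] / anchors[:, None]           # [na, nt, 2] wh ratios
j = torch.max(r, 1. / r).max(2)[0] < 4.0   # keep anchors within 4x of the target in both dims
print(j.squeeze(1))                        # tensor([False,  True,  True])

On top of this shape filter, the g = 0.5 offsets assign each surviving target not only to the grid cell containing its center but also to the two adjacent cells the center is closest to, roughly tripling the number of positive samples per target.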

           

compute_loss

The loss has three parts: cls_loss, box_loss, and obj_loss. cls_loss supervises class prediction, box_loss supervises bounding-box regression, and obj_loss supervises whether a grid cell contains an object. The loss is computed on every feature map level.

lcls, lbox, lobj = torch.zeros(1, device=device), torch.zeros(1,device=device), torch.zeros(1, device=device)
tcls, tbox, indices, anchors = self.build_targets(p, targets)  # targets

for i, pi in enumerate(p):
	#  b: [num_valid]
	#  a: [num_valid], index of anchors in current stride feat
	# gj: [num_valid]
	# gi: [num_valid]
	b, a, gj, gi = indices[i]
	tobj = torch.zeros_like(pi[..., 0], device=device)
	n = b.shape[0]
	if n:
		ps = pi[b, a, gj, gi]
		# Bbox Regression
		pxy = ps[:, :2].sigmoid() * 2. - 0.5
		pwh = (ps[:, 2:4].sigmoid() * 2) ** 2 * anchors[i]
		pbox = torch.cat((pxy, pwh), 1)
		iou = bbox_iou(pbox.T, tbox[i], x1y1x2y2=False, CIoU=True)
		lbox += (1.0 - iou).mean()
		# Objectness
		tobj[b, a, gj, gi] = (1.0 - self.gr) + self.gr * iou.detach().clamp(0).type(tobj.dtype)	# iou ratio (self.gr = 1.0 by default)

		# Classification
		if self.nc > 1:
			t = torch.full_like(ps[:, 5:], self.cn, device=device)
			t[range(n), tcls[i]] = self.cp
			lcls += self.BCEcls(ps[:, 5:], t)

	lobj += self.BCEobj(pi[..., 4], tobj) * self.balance[i] # obj loss, per-level balance weight

loss = lbox + lobj + lcls
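The decoding formulas above also explain the 4.0 threshold used in build_targets: sigmoid() * 2 - 0.5 keeps the predicted center within (-0.5, 1.5) cells of the assigned grid cell (matching the 0.5-cell neighbor assignment), and (sigmoid() * 2) ** 2 caps the predicted width/height at 4x the anchor, so a target more than 4x away from every anchor shape could never be regressed anyway. A tiny illustration of the attainable ranges:

import torch

s = torch.linspace(-10., 10., steps=5).sigmoid()   # sigmoid outputs spanning (0, 1)
pxy_offset = s * 2. - 0.5                          # center offset range: (-0.5, 1.5) cells
pwh_scale = (s * 2.) ** 2                          # wh multiplier range: (0, 4) x anchor
print(pxy_offset.min().item(), pxy_offset.max().item())
print(pwh_scale.min().item(), pwh_scale.max().item())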
           

obj_loss

Whether an object is present in each grid cell is a binary classification problem, so BCE loss is used here.

self.BCEobj = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([1.0]))

lobj = self.BCEobj(pi[..., 4], tobj)
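A minimal usage sketch with dummy shapes (note that pos_weight must be a tensor; values above 1.0 up-weight the positive, i.e. object-containing, cells):

import torch
import torch.nn as nn

BCEobj = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([1.0]))
pi = torch.randn(2, 3, 80, 80, 85)    # predictions of one stride level
tobj = torch.zeros(2, 3, 80, 80)      # objectness targets filled in compute_loss
lobj = BCEobj(pi[..., 4], tobj)       # logits vs. targets, no explicit sigmoid needed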

cls loss

self.BCEcls = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([1.0]))

lcls = self.BCEcls(ps[:, 5:], t)
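cp and cn are the positive/negative target values produced by label smoothing; in YOLOv5 they come from smooth_BCE, i.e. cp = 1 - eps/2 and cn = eps/2 (eps defaults to 0, giving plain 1/0 targets). A small sketch of building the one-vs-all class target t, with a hypothetical non-default eps and made-up class indices:

import torch

eps = 0.1                                 # label smoothing, hypothetical value
cp, cn = 1.0 - 0.5 * eps, 0.5 * eps

n, nc = 4, 80                             # matched targets, number of classes
tcls_i = torch.tensor([3, 17, 17, 56])    # ground-truth class per matched target
t = torch.full((n, nc), cn)               # every class starts at cn
t[range(n), tcls_i] = cp                  # the true class is set to cp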

box loss

CIoU is used as the bounding-box regression loss.
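A minimal sketch of the CIoU formula itself (not YOLOv5's bbox_iou, just its three terms: IoU, a normalized center-distance penalty, and an aspect-ratio consistency penalty; boxes assumed in (xc, yc, w, h) format):

import math
import torch

def ciou(box1, box2, eps=1e-7):
    # box1, box2: [N, 4] tensors in (xc, yc, w, h)
    b1x1, b1y1 = box1[:, 0] - box1[:, 2] / 2, box1[:, 1] - box1[:, 3] / 2
    b1x2, b1y2 = box1[:, 0] + box1[:, 2] / 2, box1[:, 1] + box1[:, 3] / 2
    b2x1, b2y1 = box2[:, 0] - box2[:, 2] / 2, box2[:, 1] - box2[:, 3] / 2
    b2x2, b2y2 = box2[:, 0] + box2[:, 2] / 2, box2[:, 1] + box2[:, 3] / 2

    inter = (torch.min(b1x2, b2x2) - torch.max(b1x1, b2x1)).clamp(0) * \
            (torch.min(b1y2, b2y2) - torch.max(b1y1, b2y1)).clamp(0)
    union = box1[:, 2] * box1[:, 3] + box2[:, 2] * box2[:, 3] - inter + eps
    iou = inter / union

    cw = torch.max(b1x2, b2x2) - torch.min(b1x1, b2x1)   # enclosing box width
    ch = torch.max(b1y2, b2y2) - torch.min(b1y1, b2y1)   # enclosing box height
    c2 = cw ** 2 + ch ** 2 + eps                          # enclosing diagonal, squared
    rho2 = (box1[:, 0] - box2[:, 0]) ** 2 + (box1[:, 1] - box2[:, 1]) ** 2  # center distance, squared

    v = (4 / math.pi ** 2) * (torch.atan(box2[:, 2] / (box2[:, 3] + eps)) -
                              torch.atan(box1[:, 2] / (box1[:, 3] + eps))) ** 2
    alpha = v / (1.0 - iou + v + eps)
    return iou - rho2 / c2 - alpha * v

The regression loss is then (1.0 - ciou(...)).mean() over the matched boxes, as in the lbox term of compute_loss above.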