Hyperparameter Evolution

Introduction

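YOLOv5 provides a hyperparameter optimization method called Hyperparameter Evolution. It uses a genetic algorithm (GA) to optimize hyperparameters, so you can find values better suited to your own task.

The default hyperparameters shipped with YOLOv5 were themselves obtained by running hyperparameter evolution on the COCO dataset. Because evolution consumes a lot of compute and time, sticking with the defaults is a perfectly good choice if the results they produce meet your needs.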

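Hyperparameters in ML control many aspects of training, and finding an optimal set of values can be a challenge. Traditional methods such as grid search can quickly become intractable because of:

the high-dimensional search space;

unknown correlations among the dimensions;

the high cost of evaluating the fitness at each point.

These reasons make genetic algorithms a suitable candidate for hyperparameter search.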
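
To get a feel for the scale, the short calculation below counts the points in a full grid. It assumes, purely for illustration, 25 hyperparameters tried at 5 candidate values each and one training run per minute; none of these numbers are measurements from YOLOv5.

```python
# Illustrative assumptions: 25 hyperparameters, 5 candidate values each.
n_hyperparameters = 25
values_per_hyperparameter = 5

# A full grid evaluates every combination once.
grid_points = values_per_hyperparameter ** n_hyperparameters

# Even at an (optimistic, assumed) rate of one training run per minute:
minutes_per_year = 60 * 24 * 365
years_needed = grid_points / minutes_per_year

print(f"{grid_points:.3e} grid points, about {years_needed:.1e} years of training")
```

A genetic search sidesteps this by evaluating only a few hundred points and concentrating them near the best results found so far.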

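1. Initialize hyperparameters

YOLOv5 has about 25 hyperparameters for various training settings, defined in yaml files under the /data directory. Better initial values produce better final results, so it is important to initialize them properly before evolving. If you are unsure how to initialize them, just use the defaults, which were optimized for COCO training.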

yolov5/data/hyp.scratch.yaml

# Hyperparameters for COCO training from scratch
# python train.py --batch 40 --cfg yolov5m.yaml --weights '' --data coco.yaml --img 640 --epochs 300
# See tutorials for hyperparameter evolution https://github.com/ultralytics/yolov5#tutorials


lr0: 0.01 # initial learning rate (SGD=1E-2, Adam=1E-3)
lrf: 0.2 # final OneCycleLR learning rate (lr0 * lrf)
momentum: 0.937 # SGD momentum/Adam beta1
weight_decay: 0.0005 # optimizer weight decay 5e-4
warmup_epochs: 3.0 # warmup epochs (fractions ok)
warmup_momentum: 0.8 # warmup initial momentum
warmup_bias_lr: 0.1 # warmup initial bias lr
box: 0.05 # box loss gain
cls: 0.5 # cls loss gain
cls_pw: 1.0 # cls BCELoss positive_weight
obj: 1.0 # obj loss gain (scale with pixels)
obj_pw: 1.0 # obj BCELoss positive_weight
iou_t: 0.20 # IoU training threshold
anchor_t: 4.0 # anchor-multiple threshold
# anchors: 3 # anchors per output layer (0 to ignore)
fl_gamma: 0.0 # focal loss gamma (efficientDet default gamma=1.5)
hsv_h: 0.015 # image HSV-Hue augmentation (fraction)
hsv_s: 0.7 # image HSV-Saturation augmentation (fraction)
hsv_v: 0.4 # image HSV-Value augmentation (fraction)
degrees: 0.0 # image rotation (+/- deg)
translate: 0.1 # image translation (+/- fraction)
scale: 0.5 # image scale (+/- gain)
shear: 0.0 # image shear (+/- deg)
perspective: 0.0 # image perspective (+/- fraction), range 0-0.001
flipud: 0.0 # image flip up-down (probability)
fliplr: 0.5 # image flip left-right (probability)
mosaic: 1.0 # image mosaic (probability)
mixup: 0.0 # image mixup (probability)

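2. Define fitness

Fitness is the value we seek to maximize. YOLOv5 defines a fitness function that computes a weighted combination of metrics: with the weights shown below, mAP@0.5 contributes 10% and mAP@0.5:0.95 the remaining 90%, while precision (P) and recall (R) carry no weight.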

yolov5/utils/metrics.py

def fitness(x):
    # Model fitness as a weighted combination of metrics
    w = [0.0, 0.0, 0.1, 0.9] # weights for [P, R, mAP@0.5, mAP@0.5:0.95]
    return (x[:, :4] * w).sum(1)
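
As a worked example, the sketch below applies the same weights to a few result rows and picks the fittest one. It is written in plain Python rather than NumPy, and the metric values and the name fitness_row are made up for illustration.

```python
# Same weights as fitness() above: [P, R, mAP@0.5, mAP@0.5:0.95].
WEIGHTS = [0.0, 0.0, 0.1, 0.9]

def fitness_row(metrics):
    """Weighted sum of the first four metrics of one result row."""
    return sum(w * m for w, m in zip(WEIGHTS, metrics[:4]))

# Made-up results for three generations: [P, R, mAP@0.5, mAP@0.5:0.95].
results = [
    [0.70, 0.60, 0.65, 0.45],
    [0.90, 0.50, 0.66, 0.43],
    [0.60, 0.70, 0.64, 0.47],
]

scores = [fitness_row(row) for row in results]
best = max(range(len(results)), key=lambda i: scores[i])
print(scores, "best generation:", best)
```

The second row has the highest precision and mAP@0.5, yet the third row wins because mAP@0.5:0.95 dominates the score.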

3. Evolve

The base scenario fine-tunes a pretrained YOLOv5s on COCO128:

python train.py --epochs 10 --data coco128.yaml --weights yolov5s.pt --cache

To evolve hyperparameters for this scenario, add the --evolve flag:

# Single-GPU
python train.py --epochs 10 --data coco128.yaml --weights yolov5s.pt --cache --evolve

# Multi-GPU
for i in 0 1 2 3; do
  nohup python train.py --epochs 10 --data coco128.yaml --weights yolov5s.pt --cache --evolve --device $i > evolve_gpu_$i.log &
done

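# In the multi-GPU loop, `nohup` ("no hang up") keeps the command running after you close the terminal or log out.
# `&` runs the command in the background.
# The two are usually used together: `nohup command &`.

# List the running processes:
ps -aux|grep train.py

# Kill a process:
kill -9 <PID>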

By default, the evolution settings in the code run the base scenario 300 times, i.e. for 300 generations:

yolov5/train.py

for _ in range(300): # generations to evolve

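The main genetic operators are crossover and mutation. In this work mutation is used, with 90% probability and a variance of 0.04, to create new offspring from a combination of the best parents across all previous generations. Results are logged to yolov5/evolve.txt, and the offspring with the highest fitness is saved to yolov5/runs/evolve/hyp_evolved.yaml.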
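
The mutation step can be sketched roughly as follows. This is a simplified illustration, not the code in train.py: the function name mutate, the three meta entries, and the clamp on the multiplier are assumptions. It uses sigma = 0.2, which corresponds to the 0.04 variance mentioned above, and entries of the form key: (gain, lower_bound, upper_bound).

```python
import random

# Hypothetical meta entries in the form key: (gain, lower_bound, upper_bound).
META = {
    "lr0": (1.0, 1e-5, 1e-1),
    "momentum": (0.3, 0.6, 0.98),
    "anchor_t": (0.0, 2.0, 8.0),  # gain 0 -> this parameter never mutates
}

def mutate(parent, meta, prob=0.9, sigma=0.2, rng=random):
    """Return a mutated copy of parent; sigma=0.2 means variance 0.04."""
    child = {}
    for key, value in parent.items():
        gain, lower, upper = meta[key]
        factor = 1.0
        if gain > 0 and rng.random() < prob:
            # Multiplicative Gaussian perturbation scaled by the gain.
            factor = 1.0 + gain * sigma * rng.gauss(0.0, 1.0)
            factor = min(max(factor, 0.3), 3.0)  # illustrative clamp on the step
        # Keep the result inside the parameter's bounds.
        child[key] = min(max(value * factor, lower), upper)
    return child

parent = {"lr0": 0.01, "momentum": 0.937, "anchor_t": 4.0}
child = mutate(parent, META, rng=random.Random(0))
print(child)
```

Passing a seeded random.Random makes a run reproducible; a gain of 0 is one simple way to model a parameter that is held fixed.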

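4. Visualize

Results are saved to yolov5/evolve.png, with one chart per hyperparameter. Hyperparameter values are on the x-axis and fitness is on the y-axis. Yellow indicates a higher concentration of points. A vertical line means the parameter has been fixed and does not mutate. Which parameters are fixed is user-selectable through the meta dictionary in train.py, which is useful for pinning parameters and preventing them from evolving.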

Errors

Error 1: KeyError: 'anchors'

issues/2485

issues/1411

pull/1135

I think commenting the same field in the meta dictionary can work… yes that should work, it will act as if the field does not exist at all. Anchor count will be fixed at 3, and autoanchor will be run if the Best Possible Recall (BPR) dips below threshold, which is set at 0.98 at the moment. Varying the hyps can cause your BPR to vary, so its possible some generations may use it and other not. - - glenn-jocher

EDIT: BTW the reason there are two dictionaries is that the meta dictionary contains gains and bounds applied to each hyperparameter during evolution as key: [gain, lower_bound, upper_bound]. meta is only ever used during evolution, I kept it separated to avoid complicating the hyp dictionary, again not sure if that’s the best design choice, we could merge them, but then each hyp.yaml would be busier and more complicated to read. - - glenn-jocher

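The cause is that anchors is commented out in data/hyp.scratch.yaml. Uncommenting it and running again produces the following error:

Error 2: IndexError: index 34 is out of bounds for axis 0 with size 34

Comment anchors out again in data/hyp.scratch.yaml, and also comment out the anchors entry in the meta dictionary in train.py. The run then succeeds.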

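If you set a value for hyp['anchors'], autoanchor creates new anchors that override any anchor information specified in model.yaml. For example, you can set anchors: 5 to force autoanchor to create 5 new anchors per output layer, replacing the existing ones. Hyperparameter evolution will then use this parameter to evolve the optimal number of anchors for you (from the issue discussion).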

————————————————

Original article: https://blog.csdn.net/ayiya_Oese/article/details/115369068

1. Hyperparameters

In YOLOv3 the hyperparameters are defined in train.py and include several data augmentation settings:

hyp = {'giou': 3.54,  # giou loss gain
       'cls': 37.4,  # cls loss gain
       'cls_pw': 1.0,  # cls BCELoss positive_weight
       'obj': 49.5,  # obj loss gain (*=img_size/320 if img_size != 320)
       'obj_pw': 1.0,  # obj BCELoss positive_weight
       'iou_t': 0.225,  # iou training threshold
       'lr0': 0.00579,  # initial learning rate (SGD=1E-3, Adam=9E-5)
       'lrf': -4.,  # final LambdaLR learning rate = lr0 * (10 ** lrf)
       'momentum': 0.937,  # SGD momentum
       'weight_decay': 0.000484,  # optimizer weight decay
       'fl_gamma': 0.5,  # focal loss gamma
       'hsv_h': 0.0138,  # image HSV-Hue augmentation (fraction)
       'hsv_s': 0.678,  # image HSV-Saturation augmentation (fraction)
       'hsv_v': 0.36,  # image HSV-Value augmentation (fraction)
       'degrees': 1.98,  # image rotation (+/- deg)
       'translate': 0.05,  # image translation (+/- fraction)
       'scale': 0.05,  # image scale (+/- gain)
       'shear': 0.641}  # image shear (+/- deg)

2. Usage

For training, train.py provides an optional flag, --evolve, which controls whether hyperparameter search and evolution is run (it is off by default).

Using it is simple:

python train.py --data data/voc.data \
    --cfg cfg/yolov3-tiny.cfg \
    --img-size 416 \
    --epochs 273 \
    --evolve

In practice one change is needed, at around line 444 of train.py:

for _ in range(1):  # generations to evolve

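Change the 1 to the number of generations you want, for example 200. If you leave it unchanged, evolution effectively runs for a single generation only, as the figure in the original post shows.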

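3. How it works

The process is fairly simple: each new generation is produced by mutating the fittest individual among all previous generations. All of the parameters above are mutated simultaneously, with normally distributed perturbations of roughly 20% at 1 sigma.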

s = 0.2 # sigma

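Two things need to be understood about the evolution process:

How is a generation judged to be good or bad?
How does the next generation evolve from the previous one?

**Question 1:** the criterion for judging quality.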

def fitness(x):
    w = [0.0, 0.0, 0.8, 0.2]
    # weights for [P, R, mAP, F1]@0.5
    return (x[:, :4] * w).sum(1)

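YOLOv3 evolution scores each generation with the fitness function above: the higher the fitness, the better that generation performs. Fitness is computed from four metrics: Precision, Recall, mAP, and F1.

w holds the weights. If you care more about mAP, raise the mAP weight; if you care more about F1, put more weight on F1. Here mAP is weighted 0.8 and F1 is weighted 0.2.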
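
The effect of the weights can be seen in a small example. The two candidates and the F1-heavy weight vector below are made up for illustration; only the [0.0, 0.0, 0.8, 0.2] weights come from the code above.

```python
def fitness_row(metrics, weights):
    """Weighted sum over [P, R, mAP, F1]."""
    return sum(w * m for w, m in zip(weights, metrics))

# Two made-up candidates: [P, R, mAP, F1].
candidate_a = [0.55, 0.46, 0.60, 0.50]
candidate_b = [0.70, 0.61, 0.55, 0.65]

map_heavy = [0.0, 0.0, 0.8, 0.2]  # the weights used above
f1_heavy = [0.0, 0.0, 0.2, 0.8]   # hypothetical alternative

for name, w in (("mAP-heavy", map_heavy), ("F1-heavy", f1_heavy)):
    a, b = fitness_row(candidate_a, w), fitness_row(candidate_b, w)
    print(f"{name}: A={a:.3f} B={b:.3f} -> prefers {'A' if a > b else 'B'}")
```

With the mAP-heavy weights candidate A scores higher; swapping the weights makes candidate B the preferred parent, so the choice of w directly steers what evolution optimizes for.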

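**Question 2:** how does evolution proceed?

Two parameters matter during evolution.

The first is parent, which can be single or weighted. It decides how the parent is chosen from earlier generations. With single, only the best individual so far is chosen: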

if parent == 'single' or len(x) == 1:
    x = x[fitness(x).argmax()]

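With weighted, the 10 highest-scoring individuals are combined in a fitness-weighted average, and the result becomes the parent of the next generation: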

elif parent == 'weighted':  # weighted combination
    n = min(10, len(x))  # number to merge
    x = x[np.argsort(-fitness(x))][:n]  # top n mutations
    w = fitness(x) - fitness(x).min()  # weights
    x = (x * w.reshape(n, 1)).sum(0) / w.sum()  # new parent
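
A plain-Python walk-through of the weighted combination may make the arithmetic clearer. It is a sketch under stated assumptions: scores are passed in separately instead of being computed from the rows, the population values are made up, and the fallback for equal scores is an addition of this sketch (in the NumPy snippet above, equal scores would make w.sum() zero).

```python
def weighted_parent(population, scores, n=10):
    """Fitness-weighted average of the top-n rows (pure-Python sketch)."""
    order = sorted(range(len(population)), key=lambda i: -scores[i])[:n]
    top = [population[i] for i in order]
    top_scores = [scores[i] for i in order]
    floor = min(top_scores)
    weights = [s - floor for s in top_scores]  # the weakest row gets weight 0
    total = sum(weights)
    if total == 0:  # all scores equal: fall back to the best row
        return list(top[0])
    return [sum(w * row[j] for w, row in zip(weights, top)) / total
            for j in range(len(top[0]))]

# Made-up population: each row is [lr0, momentum]; scores are made-up fitness values.
population = [[0.010, 0.937], [0.013, 0.900], [0.020, 0.850]]
scores = [0.5, 0.4, 0.3]
parent = weighted_parent(population, scores)
print(parent)
```

Because the minimum score is subtracted first, the weakest of the selected rows contributes nothing, and the result here is pulled two-thirds of the way toward the best row.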

The second is method, which can be 1, 2, or 3, selecting one of three mutation modes:

# Mutate
method = 2
s = 0.2  # 20% sigma
np.random.seed(int(time.time()))
g = np.array([1, 1, 1, 1, 1, 1, 1, 0, .1,
              1, 0, 1, 1, 1, 1, 1, 1, 1])  # gains
# g acts like a per-parameter weight on the mutation size (0 = never mutated)
ng = len(g)
if method == 1:
    # one shared random scale for all parameters
    v = (np.random.randn(ng) *
         np.random.random() * g * s + 1) ** 2.0
elif method == 2:
    # an independent random scale for each parameter
    v = (np.random.randn(ng) *
         np.random.random(ng) * g * s + 1) ** 2.0
elif method == 3:
    v = np.ones(ng)
    while all(v == 1):
        # repeat until something changes, so the child is not a duplicate
        r = (np.random.random(ng) < 0.1) * np.random.randn(ng)
        # each parameter has a 10% chance of mutating
        v = (g * s * r + 1) ** 2.0

for i, k in enumerate(hyp.keys()):
    hyp[k] = x[i + 7] * v[i]
    # apply the mutation
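
Two properties of these multipliers are easy to check in isolation. The sketch below mirrors the method 2 formula in plain Python; the function name mutation_factors and the shortened gain vector are made up for illustration.

```python
import random

def mutation_factors(gains, sigma=0.2, rng=random):
    """Per-parameter multipliers in the style of method 2 (pure-Python sketch)."""
    return [(rng.gauss(0.0, 1.0) * rng.random() * g * sigma + 1.0) ** 2.0
            for g in gains]

gains = [1, 1, 0, 0.1]  # shortened, made-up gain vector
factors = mutation_factors(gains, rng=random.Random(0))
print(factors)
```

Squaring keeps every multiplier non-negative, so a mutated hyperparameter never changes sign, and a gain of 0 always yields a multiplier of exactly 1, which leaves that hyperparameter unchanged.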

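In addition, to keep mutation from pushing parameters into clearly unreasonable ranges, each parameter is bounded and any value outside its bounds is clipped: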

# Clip to limits
keys = ['lr0', 'iou_t', 'momentum',
        'weight_decay', 'hsv_s',
        'hsv_v', 'translate',
        'scale', 'fl_gamma']
limits = [(1e-5, 1e-2), (0.00, 0.70),
          (0.60, 0.98), (0, 0.001),
          (0, .9), (0, .9), (0, .9),
          (0, .9), (0, 3)]

for k, v in zip(keys, limits):
    hyp[k] = np.clip(hyp[k], v[0], v[1])
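
For a concrete picture of what clipping does, the sketch below applies three of the limits above to made-up mutated values, using a small scalar helper in place of np.clip.

```python
def clip(value, lower, upper):
    """Clamp value into [lower, upper], like np.clip for a scalar."""
    return max(lower, min(value, upper))

# Limits copied from the snippet above; the mutated values are made up.
limits = {"lr0": (1e-5, 1e-2), "momentum": (0.60, 0.98), "fl_gamma": (0, 3)}
mutated = {"lr0": 0.02, "momentum": 0.55, "fl_gamma": 0.5}

clipped = {k: clip(v, *limits[k]) for k, v in mutated.items()}
print(clipped)
```

Here lr0 is pulled down to its upper limit, momentum is raised to its lower limit, and fl_gamma is left alone because it is already in range.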

Visualization of the final hyperparameter search results (figure in the original post):

References:

Official issue: https://github.com/ultralytics/yolov3/issues/392

Official code: https://github.com/ultralytics/yolov3
