
YOLO Training

Contents: preparing training data, preparing config files, training, multi-GPU training tips, visualizing intermediate training metrics, evaluating the model on a validation set

Preparing Training Data

To train your own YOLO model with darknet, the data must first be converted into the format darknet expects: every image gets a matching .txt label file with the following format:

<object-class> <x> <y> <width> <height>
           

object-class is the class index; the four values that follow are all expressed as fractions of the whole image.

x is the x coordinate of the ROI center, y is the y coordinate of the ROI center, and width and height are the ROI's width and height.
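
For example (made-up numbers): in a 640x480 image, a class-1 box centered at pixel (320, 240) and measuring 200x100 pixels gives x = 320/640 = 0.5, y = 240/480 = 0.5, width = 200/640 = 0.3125, height = 100/480 = 0.208333, so the label line is:

1 0.5 0.5 0.3125 0.208333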

I need Pascal VOC, MSCOCO, ImageNet, and some images I labeled myself.

Mixing these datasets has one serious problem: some objects that should be labeled are not.

For example, iPod is among ImageNet's 200 classes and is labeled there, while some MSCOCO images contain iPods that are not labeled, which drags the model's accuracy down. You can fix this by relabeling those images (a lot of work), or by modifying the loss function to compute a different loss for images from different datasets, using different object_scale and noobject_scale values for each dataset.
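
For reference, these scales live in the [region] section of the cfg file; the values shown below are the defaults from yolo-voc.cfg, which per-dataset tuning would start from:

[region]
object_scale=5
noobject_scale=1
class_scale=1
coord_scale=1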

To merge the datasets, first prepare a list of the objects to recognize.

For example, paul_list.txt:

0,ambulance
1,apple
2,automat
3,backpack
4,baggage
5,banana
6,baseball
7,basketball
8,bed
9,bench
           

Converting Pascal VOC

The darknet author provides the voc_label.py script for this; just change classes in the script to your own classes, then run voc_label.py from the parent directory of VOCdevkit.

classes = ["ambulance", "apple", "automat", "backpack", "baggage", "banana", "baseball", "basketball", "bed","bench"]
           

Converting MSCOCO

Check which of COCO's 80 classes you need and create coco_list.txt in the format <id>,<name>. For example:

1,apple
3,backpack
5,banana
8,bed
9,bench
           

Install the Python API provided by MSCOCO, then run coco_label.py.

coco_label.py is on GitHub:

https://github.com/PaulChongPeng/darknet/blob/master/tools/coco_label.py

Before running the script, change dataDir and classes to your own COCO dataset path and coco_list.txt path.

# coding=utf-8
# Usage notes
# Install the COCO tools first:
# git clone https://github.com/pdollar/coco.git
# cd coco/PythonAPI
# make install (some dependencies may be missing; install them as prompted)
# Before running, create JPEGImages and labels directories under both train2014 and val2014, and move the images that were in train2014 and val2014 into JPEGImages
# An image path list will be written to the COCO dataset's filelist directory
# The labels directory of each COCO subset will receive the annotation files YOLO needs


from pycocotools.coco import COCO
import shutil
import os


# Convert an ROI box to the coordinates YOLO needs
# size is the image (w, h)
# box holds the COCO bbox: (x_min, y_min, width, height)
# Returns the ROI center as a fraction of the image size, and the ROI w and h as fractions of the image size
def convert(size, box):
    dw = 1. / size[0]
    dh = 1. / size[1]
    x = box[0] + box[2] / 2.0
    y = box[1] + box[3] / 2.0
    w = box[2]
    h = box[3]
    x = x * dw
    w = w * dw
    y = y * dh
    h = h * dh
    return (x, y, w, h)


# Get the class names and ids we need
# path points to the name-to-id mapping file (the annotations may contain many classes; we only load the ones listed in this file)
# Returns a dict keyed by class name, with the id as the value
def get_classes_and_index(path):
    D = {}
    f = open(path)
    for line in f:
        temp = line.rstrip().split(',', 2)
        print("temp[0]:" + temp[0] + "\n")
        print("temp[1]:" + temp[1] + "\n")
        D[temp[1]] = temp[0]
    return D


dataDir = '/mnt/large4t/pengchong_data/Data/COCO'  # path to the COCO dataset
dataType = 'train2014'  # name of the COCO subset to convert
annFile = '%s/annotations/instances_%s.json' % (dataDir, dataType)  # path to the COCO annotation file
classes = get_classes_and_index('/mnt/large4t/pengchong_data/Tools/Yolo_paul/darknet/data/coco_list.txt')

# Create the labels directory if it doesn't exist; if it does, empty it
if not os.path.exists('%s/%s/labels/' % (dataDir, dataType)):
    os.makedirs('%s/%s/labels/' % (dataDir, dataType))
else:
    shutil.rmtree('%s/%s/labels/' % (dataDir, dataType))
    os.makedirs('%s/%s/labels/' % (dataDir, dataType))

# Create the filelist directory if it doesn't exist.
if not os.path.exists('%s/filelist/' % dataDir):
    os.makedirs('%s/filelist/' % dataDir)

coco = COCO(annFile)  # load and parse the annotation file
list_file = open('%s/filelist/%s.txt' % (dataDir, dataType), 'w')  # where this subset's image list is saved

imgIds = coco.getImgIds()  # all COCO image ids in the annotation file
catIds = coco.getCatIds()  # all COCO category ids in the annotation file

for imgId in imgIds:
    objCount = 0  # flag marking whether this image contains any annotations we need
    print('imgId :%s' % imgId)
    Img = coco.loadImgs(imgId)[0]  # load the image info
    print('Img :%s' % Img)
    filename = Img['file_name']  # image file name
    width = Img['width']  # image width
    height = Img['height']  # image height
    print('filename :%s, width :%s ,height :%s' % (filename, width, height))
    annIds = coco.getAnnIds(imgIds=imgId, catIds=catIds, iscrowd=None)  # all COCO annotation ids for this image
    print('annIds :%s' % annIds)
    for annId in annIds:
        anns = coco.loadAnns(annId)[0]  # load one annotation
        catId = anns['category_id']  # COCO category id of this annotation
        cat = coco.loadCats(catId)[0]['name']  # class name for this COCO category id
        # print 'anns :%s' % anns
        # print 'catId :%s , cat :%s' % (catId,cat)

        # If the class name is in our list, convert the annotation to the format YOLO needs
        if cat in classes:
            objCount = objCount + 1
            out_file = open('%s/%s/labels/%s.txt' % (dataDir, dataType, filename[:-4]), 'a')
            cls_id = classes[cat]  # id of this class in our YOLO training list
            box = anns['bbox']
            size = [width, height]
            bb = convert(size, box)
            out_file.write(str(cls_id) + " " + " ".join([str(a) for a in bb]) + '\n')
            out_file.close()

    if objCount > 0:
        list_file.write('%s/%s/JPEGImages/%s\n' % (dataDir, dataType, filename))

list_file.close()
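
Since dataType is hard-coded, run the script once per subset, editing dataType to 'val2014' for the second pass:

python coco_label.py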
           

Converting ImageNet

I used the ILSVRC2016 data. Check which of the 200 classes you need, then create imagenet_list.txt.

Note that the object names in ImageNet's annotation files are WordNetIDs, so imagenet_list.txt must use WordNetIDs, e.g.:

1,n07739125     
3,n02769748  
5,n07753592   
6,n02799071 
7,n02802426      
9,n02828884 
           

To make it easy to look up the ImageNet noun behind each WordNetID (the nouns in paul_list.txt may not match ImageNet's exactly), you can also create an imagenet_map.txt, e.g.:

1,apple,n07739125     
3,backpack,n02769748  
5,banana,n07753592   
6,baseball,n02799071 
7,basketball,n02802426      
9,bench,n02828884
           

Building imagenet_list.txt and imagenet_map.txt requires the mapping between WordNetIDs and nouns; there are two ways to get it.

Offline:

Download words.txt (the WordNetID-to-noun mapping) and gloss.txt (definitions of the noun behind each WordNetID) from ImageNet and search them. Without a VPN, access to ImageNet from mainland China is glacial, so I backed both files up on GitHub:

https://github.com/PaulChongPeng/darknet/blob/32dddd8509de4bf57cad0aa330160d57d33d0c66/data/words.txt

https://github.com/PaulChongPeng/darknet/blob/32dddd8509de4bf57cad0aa330160d57d33d0c66/data/gloss.txt
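
A minimal lookup sketch, assuming each line of words.txt is a WordNetID, a tab, then comma-separated nouns (the path is a placeholder):

# find_wnid.py: look up the WordNetIDs for a noun in words.txt
def find_wnid(noun, words_path='words.txt'):
    hits = []
    with open(words_path) as f:
        for line in f:
            # assumed line format: "wnid<TAB>noun, noun, ..."
            wnid, _, names = line.rstrip('\n').partition('\t')
            if noun in (n.strip() for n in names.split(',')):
                hits.append(wnid)
    return hits

print(find_wnid('basketball'))  # expected to include n02802426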

Online:

Visit http://image-net.org/challenges/LSVRC/2015/browse-det-synsets . Bring your own VPN, otherwise it is maddeningly slow.

Click the noun you want, e.g. Volleyball, and you will be taken to its page; what we need is the wnid at the end of the page URL, e.g. http://imagenet.stanford.edu/synset?wnid=n04540053 .

Once the lists are ready, put imagenet_to_yolo.py in the ILSVRC2016/object_detection/ILSVRC directory and rename the Data folder to JPEGImages (to find an image's label file, darknet simply replaces JPEGImages with labels in the path and the image suffix with txt). Change classes to your own list path, then run the script.

imagenet_to_yolo.py is on GitHub:

https://github.com/PaulChongPeng/darknet/blob/master/tools/imagenet_to_yolo.py

# coding=utf-8

# Usage notes
# Put this file in the ILSVRC2016/object_detection/ILSVRC directory and rename the Data folder to JPEGImages
# Running it writes image path lists to the Lists directory
# The labels directory will receive the annotation files YOLO needs

import xml.etree.ElementTree as ET
import pickle
import os
from os import listdir, getcwd
from os.path import join
import shutil


# Get the paths of all directories containing annotation files
def get_dirs():
    dirs = ['DET/train/ILSVRC2014_train_0006', 'DET/train/ILSVRC2014_train_0005', 'DET/train/ILSVRC2014_train_0004',
            'DET/train/ILSVRC2014_train_0003', 'DET/train/ILSVRC2014_train_0002', 'DET/train/ILSVRC2014_train_0001',
            'DET/train/ILSVRC2014_train_0000', 'DET/val']
    dirs_2013 = os.listdir('JPEGImages/DET/train/ILSVRC2013_train/')
    for dir_2013 in dirs_2013:
        dirs.append('DET/train/ILSVRC2013_train/' + dir_2013)
    return dirs


# Get the class names and ids we need
# path points to the name-to-id mapping file (the annotations may contain many classes; we only load the ones listed in this file)
# Returns a dict keyed by class name, with the id as the value
def get_classes_and_index(path):
    D = {}
    f = open(path)
    for line in f:
        temp = line.rstrip().split(',', 2)
        D[temp[1]] = temp[0]
    return D


# Convert an ROI box to the coordinates YOLO needs
# size is the image (w, h)
# box holds the ROI corner coordinates (x_min, x_max, y_min, y_max)
# Returns the ROI center as a fraction of the image size, and the ROI w and h as fractions of the image size
def convert(size, box):
    dw = 1. / size[0]
    dh = 1. / size[1]
    x = (box[0] + box[1]) / 2.0
    y = (box[2] + box[3]) / 2.0
    w = box[1] - box[0]
    h = box[3] - box[2]
    x = x * dw
    w = w * dw
    y = y * dh
    h = h * dh
    return (x, y, w, h)


# Convert an ImageNet xml annotation into the txt file YOLO needs
# image_dir is the directory containing the image
# image_id is the image name
def convert_annotation(image_dir, image_id):
    in_file = open('Annotations/%s/%s.xml' % (image_dir, image_id))
    obj_num = 0  # flag marking whether this image contains any annotations we need
    tree = ET.parse(in_file)
    root = tree.getroot()
    size = root.find('size')
    w = int(size.find('width').text)
    h = int(size.find('height').text)

    for obj in root.iter('object'):
        cls = obj.find('name').text
        if cls not in classes:
            continue
        obj_num = obj_num + 1
        if obj_num == 1:
            out_file = open('labels/%s/%s.txt' % (image_dir, image_id), 'w')
        cls_id = classes[cls]  # id of this class in our YOLO training list
        xmlbox = obj.find('bndbox')
        b = (float(xmlbox.find('xmin').text), float(xmlbox.find('xmax').text), float(xmlbox.find('ymin').text),
             float(xmlbox.find('ymax').text))
        bb = convert((w, h), b)
        out_file.write(str(cls_id) + " " + " ".join([str(a) for a in bb]) + '\n')

    if obj_num > 0:
        list_file = open('Lists/%s.txt' % image_dir.split('/')[-1], 'a')  # where this subset's image list is saved
        list_file.write('%s/JPEGImages/%s/%s.JPEG\n' % (wd, image_dir, image_id))
        list_file.close()


def IsSubString(SubStrList, Str):
    flag = True
    for substr in SubStrList:
        if not (substr in Str):
            flag = False

    return flag


# Get the list of file names (without extension) under FindPath that match the given pattern (FlagStr)
def GetFileList(FindPath, FlagStr=[]):
    import os
    FileList = []
    FileNames = os.listdir(FindPath)
    if (len(FileNames) > 0):
        for fn in FileNames:
            if (len(FlagStr) > 0):
                if (IsSubString(FlagStr, fn)):
                    FileList.append(fn[:-4])
            else:
                FileList.append(fn)

    if (len(FileList) > 0):
        FileList.sort()

    return FileList


classes = get_classes_and_index('/mnt/large4t/pengchong_data/Tools/Yolo_paul/darknet/data/imagenet_list.txt')
dirs = get_dirs()

wd = getcwd()

# Create the Lists directory if it doesn't exist; if it does, empty it
if not os.path.exists('Lists/'):
    os.makedirs('Lists/')
else:
    shutil.rmtree('Lists/')
    os.makedirs('Lists/')

for image_dir in dirs:
    if not os.path.exists('JPEGImages/' + image_dir):
        print("JPEGImages/%s dir not exist" % image_dir)
        continue
    # Create the labels directory if it doesn't exist; if it does, empty it
    if not os.path.exists('labels/%s' % (image_dir)):
        os.makedirs('labels/%s' % (image_dir))
    else:
        shutil.rmtree('labels/%s' % (image_dir))
        os.makedirs('labels/%s' % (image_dir))
    image_ids = GetFileList('Annotations/' + image_dir, ['xml'])
    for image_id in image_ids:
        print(image_id)
        convert_annotation(image_dir, image_id)
           

Converting Your Own Data

I labeled my own images with the labelImg tool; its annotation format is essentially the same as VOC's.

The tool is on GitHub: https://github.com/tzutalin/labelImg

A small modification of voc_label.py converts this data. I named the modified script lableImg_voc_to_yolo.py; it is on GitHub:

https://github.com/PaulChongPeng/darknet/blob/master/tools/lableImg_voc_to_yolo.py

# coding=utf-8

# Usage notes

# The dataset to convert is laid out as:
# Paul/time/class/annotations/  the xml files
# Paul/time/class/images/       the jpg files
# Paul/time/class/labels/       the txt files YOLO needs, generated here

# Put this file in the Paul directory; for each date a txt file named after it is generated there, listing the paths of all images under that date's folder

# Add the name of every date folder to sets

# Add every object class you want labels for to classes
# The first number in each txt file under labels is the class's index in classes


import xml.etree.ElementTree as ET
import pickle
import os
from os import listdir, getcwd
from os.path import join
import shutil

sets = ['20170401', '20170414']


# Get the class names and ids we need
# path points to the name-to-id mapping file (the annotations may contain many classes; we only load the ones listed in this file)
# Returns a dict keyed by class name, with the id as the value
def get_classes_and_index(path):
    D = {}
    f = open(path)
    for line in f:
        temp = line.rstrip().split(',', 2)
        print("temp[0]:" + temp[0] + "\n")
        print("temp[1]:" + temp[1] + "\n")
        D[temp[1].replace(' ', '')] = temp[0]
    return D


# Convert an ROI box to the coordinates YOLO needs
# size is the image (w, h)
# box holds the ROI corner coordinates (x_min, x_max, y_min, y_max)
# Returns the ROI center as a fraction of the image size, and the ROI w and h as fractions of the image size
def convert(size, box):
    dw = 1. / size[0]
    dh = 1. / size[1]
    x = (box[0] + box[1]) / 2.0
    y = (box[2] + box[3]) / 2.0
    w = box[1] - box[0]
    h = box[3] - box[2]
    x = x * dw
    w = w * dw
    y = y * dh
    h = h * dh
    return (x, y, w, h)


# Convert an xml file generated by labelImg into the txt file YOLO needs
# path is the directory path down to the class level
# image_id is the image name
def convert_annotation(path, image_id):
    in_file = open('%s/annotations/%s.xml' % (path, image_id))
    out_file = open('%s/labels/%s.txt' % (path, image_id), 'w')
    tree = ET.parse(in_file)
    root = tree.getroot()
    size = root.find('size')
    w = int(size.find('width').text)
    h = int(size.find('height').text)

    for obj in root.iter('object'):
        cls = obj.find('name').text.replace(' ', '')
        # Skip objects that are not in our YOLO training list
        if cls not in classes:
            continue
        cls_id = classes[cls]  # id of this class in our YOLO training list
        xmlbox = obj.find('bndbox')
        b = (float(xmlbox.find('xmin').text), float(xmlbox.find('xmax').text), float(xmlbox.find('ymin').text),
             float(xmlbox.find('ymax').text))
        bb = convert((w, h), b)
        out_file.write(str(cls_id) + " " + " ".join([str(a) for a in bb]) + '\n')


def IsSubString(SubStrList, Str):
    flag = True
    for substr in SubStrList:
        if not (substr in Str):
            flag = False

    return flag


# Get the list of file names (without extension) under FindPath that match the given pattern (FlagStr)
def GetFileList(FindPath, FlagStr=[]):
    import os
    FileList = []
    FileNames = os.listdir(FindPath)
    if (len(FileNames) > 0):
        for fn in FileNames:
            if (len(FlagStr) > 0):
                if (IsSubString(FlagStr, fn)):
                    FileList.append(fn[:-4])
            else:
                FileList.append(fn)

    if (len(FileList) > 0):
        FileList.sort()

    return FileList


# Get the names of the subdirectories under a directory
def get_dirs(time):
    dirs = []
    dirs_temp = os.listdir(time)
    for dir_name in dirs_temp:
        dirs.append(time + '/' + dir_name)
    return dirs


wd = getcwd()

classes = get_classes_and_index('/raid/pengchong_data/Tools/Paul_YOLO/data/Paul_list.txt')

for time in sets:
    dirs = get_dirs(time)
    list_file = open('%s.txt' % time, 'w')  # where this date's image list is saved
    for path in dirs:
        print(path)
        if not os.path.exists('%s/annotations/' % path):
            os.makedirs('%s/annotations/' % path)
        if not os.path.exists('%s/labels/' % path):
            os.makedirs('%s/labels/' % path)
        else:
            shutil.rmtree('%s/labels/' % path)
            os.makedirs('%s/labels/' % path)
        image_ids = GetFileList(path + '/annotations/', ['xml'])
        for image_id in image_ids:
            print(image_id)
            list_file.write('%s/%s/images/%s.jpg\n' % (wd, path, image_id))
            convert_annotation(path, image_id)
    list_file.close()
           

After converting every dataset's annotations into the format YOLO needs, copy the contents of all the generated image path lists into paul.txt, then use the partial.py script to split it randomly into train, val, and test data. The script is on GitHub; adapt it as needed.

https://github.com/PaulChongPeng/darknet/blob/master/tools/partial.py
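
For reference, a minimal sketch of such a random split (the 80/10/10 ratios and output file names are my assumptions, not necessarily what partial.py does):

# split_list.py: randomly split an image path list into train/val/test
import random

with open('paul.txt') as f:
    paths = f.read().splitlines()

random.shuffle(paths)
n_val = n_test = len(paths) // 10  # 10% val, 10% test, the rest train
splits = {'paul_val.txt': paths[:n_val],
          'paul_test.txt': paths[n_val:n_val + n_test],
          'paul_train.txt': paths[n_val + n_test:]}
for name, subset in splits.items():
    with open(name, 'w') as out:
        out.write('\n'.join(subset) + '\n')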

That concludes the data preparation.

Preparing Config Files

Add paul.data under the cfg directory with the following contents:

classes=10                                                      number of object classes to recognize
train  = data/paul_train.txt                                    training image list
valid = data/paul_val.txt                                       validation image list
names = data/paul.names                                         list of objects to recognize
backup = /mnt/large4t/pengchong_data/Tools/darknet/backup/      where weights are backed up during training
           

Add yolo-paul.cfg under the cfg directory. Its contents are copied from the default yolo-voc.cfg and then adjusted for your own training set and hardware; for the meaning of each parameter, see my earlier articles.

My changes were:

[net]
batch=27                       weights are updated every 27 images; with subdivisions=1 this uses about 15.6 GB of GPU memory
......
......
learning_rate=0.00001           a high learning rate diverges easily
max_batches = 500000
......
......
[convolutional]
......
......
filters=75                      the last conv layer outputs num*(classes+coords+1) = 5*(10+5) = 75 feature maps
......
......
[region]
......
......
classes=10                      training ten object classes
......
......
           

Add paul.names under the data directory with the following contents:

ambulance
apple
automat
backpack
baggage
banana
baseball
basketball
bed
bench
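
paul.names is just the name column of paul_list.txt; a quick sketch to generate it (using the paths from above):

# make_names.py: strip the ids from paul_list.txt to produce paul.names
with open('data/paul_list.txt') as src, open('data/paul.names', 'w') as dst:
    for line in src:
        _, _, name = line.rstrip().partition(',')
        dst.write(name.strip() + '\n')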
           

Modify the Makefile:

GPU=1
CUDNN=1
           

Compile:

make clean
make -j8
           

Training

First fetch the ImageNet pretrained weights:

curl -O https://pjreddie.com/media/files/darknet19_448.weights
           

Keep only the weights of the first 23 layers:

./darknet partial cfg/darknet19_448.cfg darknet19_448.weights darknet19_448.conv.23 23
           

The partial command splits a weights file; it also comes in handy for fine-tuning.
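
For example, a hypothetical fine-tune: extract the first 23 layers from your own trained weights and restart training from them:

./darknet partial cfg/yolo-paul.cfg backup/yolo-paul_final.weights yolo-paul.conv.23 23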

Start training:

./darknet detector train cfg/paul.data cfg/yolo-paul.cfg darknet19_448.conv.23 2>1 | tee paul_train_log.txt
           

The rest is waiting.

Note that if the learning rate is set too high, training diverges easily and nan will appear in the log output; reduce the learning rate and train again.

Multi-GPU Training Tips

darknet supports multiple GPUs, and multi-GPU training speeds things up enormously. In my tests, training on a DGX-1 with eight Tesla P100s was over 130 times faster than on an Alienware with a single GTX 1080.

Switching between single-GPU and multi-GPU training

Multi-GPU training in darknet takes some care: train blindly on multiple GPUs and you will be dismayed to find the loss steadily dropping and recall rising while Obj stays near zero, leaving final weights that cannot predict a single bounding box.

Train on a single GPU first, until Obj shows a steady upward trend (I switch once Obj exceeds 0.1), then continue on multiple GPUs from the weights backed up in backup. Usually about 1,000 single-GPU iterations are enough before switching.

./darknet detector train cfg/paul.data cfg/yolo-paul.cfg backup/yolo-paul_1000.weights -gpus 0,1,2,3,4,5,6,7 2>1 | tee paul_train_log.txt
           

0,1,2,3,4,5,6,7 are the IDs of the GPUs to use, which can be looked up with the

nvidia-smi
           

command:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.20                 Driver Version: 375.20                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P100-SXM2...  On   | 0000:06:00.0     Off |                    0 |
| N/A   52C    P0   270W / 300W |  15887MiB / 16308MiB |     99%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla P100-SXM2...  On   | 0000:07:00.0     Off |                    0 |
| N/A   55C    P0   247W / 300W |  15887MiB / 16308MiB |     97%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla P100-SXM2...  On   | 0000:0A:00.0     Off |                    0 |
| N/A   54C    P0   252W / 300W |  15887MiB / 16308MiB |     98%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla P100-SXM2...  On   | 0000:0B:00.0     Off |                    0 |
| N/A   51C    P0   242W / 300W |  15887MiB / 16308MiB |     97%      Default |
+-------------------------------+----------------------+----------------------+
|   4  Tesla P100-SXM2...  On   | 0000:85:00.0     Off |                    0 |
| N/A   53C    P0   227W / 300W |  15887MiB / 16308MiB |     98%      Default |
+-------------------------------+----------------------+----------------------+
|   5  Tesla P100-SXM2...  On   | 0000:86:00.0     Off |                    0 |
| N/A   58C    P0   245W / 300W |  15887MiB / 16308MiB |     97%      Default |
+-------------------------------+----------------------+----------------------+
|   6  Tesla P100-SXM2...  On   | 0000:89:00.0     Off |                    0 |
| N/A   59C    P0   245W / 300W |  15887MiB / 16308MiB |     97%      Default |
+-------------------------------+----------------------+----------------------+
|   7  Tesla P100-SXM2...  On   | 0000:8A:00.0     Off |                    0 |
| N/A   52C    P0   228W / 300W |  15887MiB / 16308MiB |     97%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0     50064    C   ./darknet                                    15887MiB |
|    1     50064    C   ./darknet                                    15887MiB |
|    2     50064    C   ./darknet                                    15887MiB |
|    3     50064    C   ./darknet                                    15887MiB |
|    4     50064    C   ./darknet                                    15887MiB |
|    5     50064    C   ./darknet                                    15887MiB |
|    6     50064    C   ./darknet                                    15887MiB |
|    7     50064    C   ./darknet                                    15887MiB |
+-----------------------------------------------------------------------------+
           

Learning rate with multiple GPUs

When training with multiple GPUs, the learning rate should be n times the single-GPU learning rate, where n is the number of GPUs (e.g. with 8 GPUs and a single-GPU rate of 0.00001, use 0.00008).

Visualizing Intermediate Training Metrics

After training finishes (sometimes the model starts diverging before we get that far), check whether the metrics (such as loss) reached the values you expected, and if not, work out why. Visualizing the intermediate training metrics helps with that analysis.

The visualization uses the log file saved during training, paul_train_log.txt.

What the log parameters mean

Region Avg IOU: the average IOU, i.e. the intersection of the predicted bounding box and the ground truth divided by their union; it should approach 1.

Class: the probability assigned to the labeled object's class; it should approach 1.

Obj: should approach 1.

No Obj: should keep shrinking without reaching zero.

Avg Recall: should approach 1.

avg: the average loss; it should approach 0.

The train_loss_visualization.py script plots the loss curve.

The script is on GitHub (install its dependencies before use):

https://github.com/PaulChongPeng/darknet/blob/master/tools/train_loss_visualization.py

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Total line count of the cleaned log; keep only every 10th batch line and skip the first 1000 lines
lines = 1878760
result = pd.read_csv('S:/Tools/Paul_YOLO/paul_train_log_new.txt',
                     skiprows=[x for x in range(lines) if ((x % 10 != 9) | (x < 1000))],
                     error_bad_lines=False,
                     names=['loss', 'avg', 'rate', 'seconds', 'images'])
result.head()

# Each comma-separated field looks like "0.863348 avg"; keep only the numeric part
result['loss'] = result['loss'].str.split(' ').str.get(1)
result['avg'] = result['avg'].str.split(' ').str.get(1)
result['rate'] = result['rate'].str.split(' ').str.get(1)
result['seconds'] = result['seconds'].str.split(' ').str.get(1)
result['images'] = result['images'].str.split(' ').str.get(1)
result.head()
result.tail()

# print(result.head())
# print(result.tail())
# print(result.dtypes)

print(result['loss'])
print(result['avg'])
print(result['rate'])
print(result['seconds'])
print(result['images'])

# The columns were parsed as strings; convert them to numbers before plotting
result['loss'] = pd.to_numeric(result['loss'])
result['avg'] = pd.to_numeric(result['avg'])
result['rate'] = pd.to_numeric(result['rate'])
result['seconds'] = pd.to_numeric(result['seconds'])
result['images'] = pd.to_numeric(result['images'])
result.dtypes

# Plot the average loss against the (sampled) batch index and save it as avg_loss.png
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
ax.plot(result['avg'].values, label='avg_loss')
# ax.plot(result['loss'].values, label='loss')
ax.legend(loc='best')
ax.set_title('The loss curves')
ax.set_xlabel('batches')
fig.savefig('avg_loss')
# fig.savefig('loss')
           

How to use the script:

Saving the log with

2>1 | tee paul_train_log.txt
           

produces two files: the file named 1 (created by the 2>1 redirection, which writes stderr to a file literally named 1) holds the network-loading and checkpoint-saving messages, while paul_train_log.txt holds the training output.

1. Delete the three lines at the start of the log:

0,1,2,3,4,5,6,7
yolo-paul
Learning Rate: 1e-05, Momentum: 0.9, Decay: 0.0005
           

2. Delete the last few lines of the log so the final line is a batch output line (the fields are: iteration, batch loss, average loss, learning rate, seconds for this batch, and total images seen), e.g.:

497001: 0.863348, 0.863348 avg, 0.001200 rate, 5.422251 seconds, 107352216 images
           

3. Run the extract_log.py script to normalize the log. The code is on GitHub:

https://github.com/PaulChongPeng/darknet/blob/master/tools/extract_log.py

# coding=utf-8
# Extract the training log: drop the unparseable lines so the file is uniform, and write a new log for the plotting tools

import random

f = open('paul_train_log.txt')
train_log = open('paul_train_log_new.txt', 'w')

for line in f:
    # drop the multi-GPU sync lines
    if 'Syncing' in line:
        continue
    # drop lines containing nan (from division by zero)
    if 'nan' in line:
        continue
    train_log.write(line)

f.close()
train_log.close()
           

The final log format:

Loaded: 5.588888 seconds
Region Avg IOU: 0.649881, Class: 0.854394, Obj: 0.476559, No Obj: 0.007302, Avg Recall: 0.737705,  count: 61
Region Avg IOU: 0.671544, Class: 0.959081, Obj: 0.523326, No Obj: 0.006902, Avg Recall: 0.780000,  count: 50
Region Avg IOU: 0.525841, Class: 0.815314, Obj: 0.449031, No Obj: 0.006602, Avg Recall: 0.484375,  count: 64
Region Avg IOU: 0.583596, Class: 0.830763, Obj: 0.377681, No Obj: 0.007916, Avg Recall: 0.629214,  count: 89
Region Avg IOU: 0.651377, Class: 0.908635, Obj: 0.460094, No Obj: 0.008060, Avg Recall: 0.753425,  count: 73
Region Avg IOU: 0.571363, Class: 0.880554, Obj: 0.341659, No Obj: 0.007820, Avg Recall: 0.633663,  count: 101
Region Avg IOU: 0.585424, Class: 0.935552, Obj: 0.358635, No Obj: 0.008192, Avg Recall: 0.644860,  count: 107
Region Avg IOU: 0.599972, Class: 0.832793, Obj: 0.382910, No Obj: 0.009005, Avg Recall: 0.650602,  count: 83
497001: 0.863348, 0.863348 avg, 0.000012 rate, 5.422251 seconds, 107352216 images
           

4. Set lines in train_loss_visualization.py to the number of lines in the log, and adjust the skipped rows as needed. The expression below keeps every tenth line (those with x%10==9) and drops the first 1000:

skiprows=[x for x in range(lines) if ((x%10!=9) |(x<1000))]
           

Running train_loss_visualization.py writes avg_loss.png to the script's directory.


The loss curve shows that after 100,000 iterations the loss drops very slowly, almost not at all. Checking the log against the cfg, my custom learning-rate schedule cuts the rate tenfold at 100,000 iterations, after which it is so small that the loss barely moves. So I changed the schedule in the cfg: leave the rate alone at 100,000 iterations and lower it at 300,000 instead.
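
In a YOLOv2-style cfg the change looks roughly like this (illustrative values, not my exact file):

# before: the rate drops tenfold at 100k iterations
# steps=100000
# scales=.1

# after: hold the rate until 300k, then drop
policy=steps
steps=300000
scales=.1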

I resumed training from the checkpoint backed up at iteration 97,000.

./darknet detector train cfg/paul.data cfg/yolo-paul.cfg backup/yolo-paul_97000.weights 2>1 | tee paul_train_log.txt
           

Besides the loss, you can also plot Avg IOU, Avg Recall, and the other metrics.

To plot 'Region Avg IOU', 'Class', 'Obj', 'No Obj', 'Avg Recall', and 'count', use the train_iou_visualization.py script, which works the same way as train_loss_visualization.py. The script is on GitHub: https://github.com/PaulChongPeng/darknet/blob/master/tools/train_iou_visualization.py

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Total line count of the cleaned log; keep only the Region Avg IOU lines (drop the Loaded and batch-summary lines)
lines = 525990
result = pd.read_csv('S:/Tools/Paul_YOLO/paul_train_log_new.txt',
                     skiprows=[x for x in range(lines) if (x % 10 == 0 or x % 10 == 9)],
                     error_bad_lines=False,
                     names=['Region Avg IOU', 'Class', 'Obj', 'No Obj', 'Avg Recall', 'count'])
result.head()

result['Region Avg IOU']=result['Region Avg IOU'].str.split(': ').str.get(1)
result['Class']=result['Class'].str.split(': ').str.get(1)
result['Obj']=result['Obj'].str.split(': ').str.get(1)
result['No Obj']=result['No Obj'].str.split(': ').str.get(1)
result['Avg Recall']=result['Avg Recall'].str.split(': ').str.get(1)
result['count']=result['count'].str.split(': ').str.get(1)
result.head()
result.tail()

#print(result.head())
# print(result.tail())
# print(result.dtypes)
print(result['Region Avg IOU'])

result['Region Avg IOU']=pd.to_numeric(result['Region Avg IOU'])
result['Class']=pd.to_numeric(result['Class'])
result['Obj']=pd.to_numeric(result['Obj'])
result['No Obj']=pd.to_numeric(result['No Obj'])
result['Avg Recall']=pd.to_numeric(result['Avg Recall'])
result['count']=pd.to_numeric(result['count'])
result.dtypes

fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
#ax.plot(result['Region Avg IOU'].values,label='Region Avg IOU')
#ax.plot(result['Class'].values,label='Class')
#ax.plot(result['Obj'].values,label='Obj')
#ax.plot(result['No Obj'].values,label='No Obj')
ax.plot(result['Avg Recall'].values,label='Avg Recall')
#ax.plot(result['count'].values,label='count')
ax.legend(loc='best')
#ax.set_title('The Region Avg IOU curves')
ax.set_title('The Avg Recall curves')
ax.set_xlabel('batches')
#fig.savefig('Avg IOU')
fig.savefig('Avg Recall')
           

Evaluating the Model on a Validation Set

For evaluation darknet offers the valid command (it only writes predictions, without judging whether they are correct) and the recall command. Neither met my needs, so I implemented a category command for per-class evaluation.

valid:

Append to the end of paul.data:

eval = imagenet #three modes are available: voc, coco, imagenet
           

In detector.c, change the threshold in the validate_detector function (default .005):

float thresh = .1;
           

Recompile, then run:

./darknet detector valid cfg/paul.data cfg/yolo-paul.cfg backup/yolo-paul_final.weights
           

Predictions are written to the results directory, one line per detection; the columns are the image index, class id, confidence, and box coordinates (xmin, ymin, xmax, ymax):

1 1 0.431522 235.186066 77.746033 421.808258 348.950012
1 1 0.186538 161.324097 270.221497 187.429535 321.382141
1 14 0.166257 284.207947 364.423889 465.995056 454.305603
2 30 0.287718 274.455719 290.674194 343.506256 352.656433
2 30 0.582356 293.578918 294.799438 350.478088 327.216614
2 1 0.599921 138.686981 314.705231 352.362152 588.235962
3 59 0.251553 193.290497 183.707275 277.655273 349.782410
3 59 0.107120 209.172287 269.722626 330.998718 342.530914
3 62 0.162954 0.000000 278.525543 457.739563 480.000000
4 6 0.617184 38.155792 31.496445 434.091705 527.705811
4 1 0.101005 358.778351 238.540756 395.645050 289.902283
4 6 0.813770 75.790985 282.521210 459.018585 564.883545
4 3 0.114561 32.667072 407.288025 142.561798 506.885498
4 3 0.104120 87.489151 337.674896 446.883728 584.356689
5 1 0.106601 235.460571 0.707840 265.958740 34.851868
5 1 0.134753 310.776398 1.273307 344.392303 31.028347
5 1 0.146177 349.860596 0.445604 385.901550 29.931465
5 1 0.129790 388.831177 3.721551 419.852844 30.414955
5 1 0.146747 369.672150 0.000000 441.490387 45.012733
5 1 0.339233 7.567236 0.000000 53.692001 97.718735
           

To check recall, use the recall command.

Modify the validate_detector_recall function in detector.c:

1. Change the threshold:

float thresh = .25;
           

2. Change the validation set path:

list *plist = get_paths("/mnt/large4t/pengchong_data/Data/Paul/filelist/val.txt");
           

3. Add a precision printout:

//fprintf(stderr, "%5d %5d %5d\tRPs/Img: %.2f\tIOU: %.2f%%\tRecall:%.2f%%\n", i, correct, total, (float)proposals/(i+1), avg_iou*100/total, 100.*correct/total);
fprintf(stderr, "ID:%5d Correct:%5d Total:%5d\tRPs/Img: %.2f\tIOU: %.2f%%\tRecall:%.2f%%\t", i, correct, total, (float)proposals/(i+1), avg_iou*100/total, 100.*correct/total);
fprintf(stderr, "proposals:%5d\tPrecision:%.2f%%\n",proposals,100.*correct/(float)proposals); 
           

Recompile, then run:

./darknet detector recall cfg/paul.data cfg/yolo-paul.cfg backup/yolo-paul_final.weights
           

The output looks like this:

ID:    0 Correct:    1 Total:   22  RPs/Img: 2.00   IOU: 7.59%  Recall:4.55%    proposals:    2 Precision:50.00%
ID:    1 Correct:    2 Total:   28  RPs/Img: 2.00   IOU: 8.90%  Recall:7.14%    proposals:    4 Precision:50.00%
ID:    2 Correct:    3 Total:   39  RPs/Img: 1.67   IOU: 7.91%  Recall:7.69%    proposals:    5 Precision:60.00%
ID:    3 Correct:    3 Total:   42  RPs/Img: 2.00   IOU: 7.42%  Recall:7.14%    proposals:    8 Precision:37.50%
ID:    4 Correct:    9 Total:   58  RPs/Img: 5.00   IOU: 15.96% Recall:15.52%   proposals:   25 Precision:36.00%
ID:    5 Correct:   10 Total:   70  RPs/Img: 4.50   IOU: 14.99% Recall:14.29%   proposals:   27 Precision:37.04%
ID:    6 Correct:   12 Total:   72  RPs/Img: 4.00   IOU: 16.51% Recall:16.67%   proposals:   28 Precision:42.86%
ID:    7 Correct:   14 Total:   76  RPs/Img: 3.75   IOU: 17.60% Recall:18.42%   proposals:   30 Precision:46.67%
ID:    8 Correct:   16 Total:   81  RPs/Img: 3.78   IOU: 19.15% Recall:19.75%   proposals:   34 Precision:47.06%
ID:    9 Correct:   20 Total:   96  RPs/Img: 3.80   IOU: 20.40% Recall:20.83%   proposals:   38 Precision:52.63%
ID:   10 Correct:   22 Total:  103  RPs/Img: 3.82   IOU: 21.09% Recall:21.36%   proposals:   42 Precision:52.38%
           

The category command: evaluating the model's detection performance per object class

The code is on GitHub: https://github.com/PaulChongPeng/darknet/blob/master/src/detector.c

// Compare one image's detections with its ground-truth labels and append each
// detection (correct flag, prob, best IOU, box) to that class's result file
void print_category(FILE **fps, char *path, box *boxes, float **probs, int total, int classes, int w, int h, float thresh, float iou_thresh)
{
    int i, j;

    // Derive the label file path from the image path, the same way darknet does
    char labelpath[4096];
    find_replace(path, "images", "labels", labelpath);
    find_replace(labelpath, "JPEGImages", "labels", labelpath);
    find_replace(labelpath, ".jpg", ".txt", labelpath);
    find_replace(labelpath, ".JPEG", ".txt", labelpath);

    int num_labels = 0;
    box_label *truth = read_boxes(labelpath, &num_labels);

    for(i = 0; i < total; ++i){
        int class_id = max_index(probs[i],classes);
        float prob = probs[i][class_id];
        if (prob < thresh)continue;

        float best_iou = 0;
        int best_iou_id = 0;
        int correct = 0;
        for (j = 0; j < num_labels; ++j) {
            box t = {truth[j].x*w, truth[j].y*h, truth[j].w*w, truth[j].h*h};
            float iou = box_iou(boxes[i], t);
            //fprintf(stderr, "box p: %f, %f, %f, %f\n", boxes[i].x, boxes[i].y, boxes[i].w, boxes[i].h);
            //fprintf(stderr, "box t: %f, %f, %f, %f\n", t.x, t.y, t.w, t.h);
            //fprintf(stderr, "iou : %f\n", iou);
            if(iou > best_iou){
                best_iou = iou;
                best_iou_id = j;
            }
        }

        if(best_iou > iou_thresh && truth[best_iou_id].id == class_id){
            correct = 1;
        }

        float xmin = boxes[i].x - boxes[i].w/2.;
        float xmax = boxes[i].x + boxes[i].w/2.;
        float ymin = boxes[i].y - boxes[i].h/2.;
        float ymax = boxes[i].y + boxes[i].h/2.;

        if (xmin < 0) xmin = 0;
        if (ymin < 0) ymin = 0;
        if (xmax > w) xmax = w;
        if (ymax > h) ymax = h;

        fprintf(fps[class_id], "%s, %d, %d, %f, %f, %f, %f, %f, %f\n", path, class_id, correct, prob, best_iou, xmin, ymin, xmax, ymax);

    }
}


void validate_detector_category(char *datacfg, char *cfgfile, char *weightfile, char *outfile)
{
    int j;
    list *options = read_data_cfg(datacfg);
    char *valid_images = option_find_str(options, "valid", "data/train.list");
    char *name_list = option_find_str(options, "names", "data/names.list");
    char *prefix = option_find_str(options, "results", "results");
    char **names = get_labels(name_list);
    char *mapf = option_find_str(options, "map", 0);
    int *map = 0;
    if (mapf) map = read_map(mapf);

    network net = parse_network_cfg(cfgfile);
    if(weightfile){
        load_weights(&net, weightfile);
    }
    set_batch_network(&net, 1);
    fprintf(stderr, "Learning Rate: %g, Momentum: %g, Decay: %g\n", net.learning_rate, net.momentum, net.decay);
    srand(time(0));

    list *plist = get_paths(valid_images);
    char **paths = (char **)list_to_array(plist);

    layer l = net.layers[net.n-1];
    int classes = l.classes;

    char buff[1024];
    FILE **fps = 0;
    if(!outfile) outfile = "paul_";
    fps = calloc(classes, sizeof(FILE *));
    for(j = 0; j < classes; ++j){
        snprintf(buff, 1024, "%s/%s%s.txt", prefix, outfile, names[j]);
        fps[j] = fopen(buff, "w");
    }


    box *boxes = calloc(l.w*l.h*l.n, sizeof(box));
    float **probs = calloc(l.w*l.h*l.n, sizeof(float *));
    for(j = 0; j < l.w*l.h*l.n; ++j) probs[j] = calloc(classes, sizeof(float *));

    int m = plist->size;
    int i=0;
    int t;

    float thresh = .25;
    float iou_thresh = .5;
    float nms = .45;

    int nthreads = 4;
    image *val = calloc(nthreads, sizeof(image));
    image *val_resized = calloc(nthreads, sizeof(image));
    image *buf = calloc(nthreads, sizeof(image));
    image *buf_resized = calloc(nthreads, sizeof(image));
    pthread_t *thr = calloc(nthreads, sizeof(pthread_t));

    load_args args = {0};
    args.w = net.w;
    args.h = net.h;
    args.type = IMAGE_DATA;

    for(t = 0; t < nthreads; ++t){
        args.path = paths[i+t];
        args.im = &buf[t];
        args.resized = &buf_resized[t];
        thr[t] = load_data_in_thread(args);
    }
    time_t start = time(0);
    for(i = nthreads; i < m+nthreads; i += nthreads){
        fprintf(stderr, "%d\n", i);
        for(t = 0; t < nthreads && i+t-nthreads < m; ++t){
            pthread_join(thr[t], 0);
            val[t] = buf[t];
            val_resized[t] = buf_resized[t];
        }
        for(t = 0; t < nthreads && i+t < m; ++t){
            args.path = paths[i+t];
            args.im = &buf[t];
            args.resized = &buf_resized[t];
            thr[t] = load_data_in_thread(args);
        }
        for(t = 0; t < nthreads && i+t-nthreads < m; ++t){
            char *path = paths[i+t-nthreads];
            float *X = val_resized[t].data;
            network_predict(net, X);
            int w = val[t].w;
            int h = val[t].h;
            get_region_boxes(l, w, h, thresh, probs, boxes, 0, map, .5);
            if (nms) do_nms_sort(boxes, probs, l.w*l.h*l.n, classes, nms);
            print_category(fps, path, boxes, probs, l.w*l.h*l.n, classes, w, h, thresh, iou_thresh);
            free_image(val[t]);
            free_image(val_resized[t]);
        }
    }
    for(j = 0; j < classes; ++j){
        if(fps) fclose(fps[j]);
    }
    fprintf(stderr, "Total Detection Time: %f Seconds\n", (double)(time(0) - start));
}

void run_detector(int argc, char **argv)
{
    char *prefix = find_char_arg(argc, argv, "-prefix", 0);
    float thresh = find_float_arg(argc, argv, "-thresh", .24);
    float hier_thresh = find_float_arg(argc, argv, "-hier", .5);
    int cam_index = find_int_arg(argc, argv, "-c", 0);
    int frame_skip = find_int_arg(argc, argv, "-s", 0);
    if(argc < 4){
        fprintf(stderr, "usage: %s %s [train/test/valid] [cfg] [weights (optional)]\n", argv[0], argv[1]);
        return;
    }
    char *gpu_list = find_char_arg(argc, argv, "-gpus", 0);
    char *outfile = find_char_arg(argc, argv, "-out", 0);
    int *gpus = 0;
    int gpu = 0;
    int ngpus = 0;
    if(gpu_list){
        printf("%s\n", gpu_list);
        int len = strlen(gpu_list);
        ngpus = 1;
        int i;
        for(i = 0; i < len; ++i){
            if (gpu_list[i] == ',') ++ngpus;
        }
        gpus = calloc(ngpus, sizeof(int));
        for(i = 0; i < ngpus; ++i){
            gpus[i] = atoi(gpu_list);
            gpu_list = strchr(gpu_list, ',')+1;
        }
    } else {
        gpu = gpu_index;
        gpus = &gpu;
        ngpus = 1;
    }

    int clear = find_arg(argc, argv, "-clear");

    char *datacfg = argv[3];
    char *cfg = argv[4];
    char *weights = (argc > 5) ? argv[5] : 0;
    char *filename = (argc > 6) ? argv[6]: 0;
    if(0==strcmp(argv[2], "test")) test_detector(datacfg, cfg, weights, filename, thresh, hier_thresh);
    else if(0==strcmp(argv[2], "train")) train_detector(datacfg, cfg, weights, gpus, ngpus, clear);
    else if(0==strcmp(argv[2], "valid")) validate_detector(datacfg, cfg, weights, outfile);
    else if(0==strcmp(argv[2], "recall")) validate_detector_recall(cfg, weights);
    else if(0==strcmp(argv[2], "category"))validate_detector_category(datacfg, cfg, weights, outfile);
    else if(0==strcmp(argv[2], "demo")) {
        list *options = read_data_cfg(datacfg);
        int classes = option_find_int(options, "classes", 20);
        char *name_list = option_find_str(options, "names", "data/names.list");
        char **names = get_labels(name_list);
        demo(cfg, weights, thresh, cam_index, filename, names, classes, frame_skip, prefix, hier_thresh);
    }
}
           

Run:

./darknet detector category cfg/paul.data cfg/yolo-paul.cfg backup/yolo-paul_final.weights
           

The results directory will then contain per-class validation results: one txt file per object class, each line holding path, class_id, correct, prob, best_iou, xmin, ymin, xmax, ymax.

The evalute.py tool parses these txt files into a summary evaluation.

The tool is on GitHub: https://github.com/PaulChongPeng/darknet/blob/master/tools/evalute.py

# coding=utf-8
# This tool is used together with the category command
# category is a command added to detector.c; it writes per-class evaluation results
# Run: ./darknet detector category cfg/paul.data cfg/yolo-paul.cfg backup/yolo-paul_final.weights
# Per-class results land in the results directory; run this tool from there and it prints each class's evaluation:
# id, avg_iou, avg_correct_iou, avg_precision, avg_recall, avg_score
# It also writes low_list and high_list there, the classes whose precision/recall miss and meet the targets


import os
from os import listdir, getcwd
from os.path import join
import shutil

# total number of object classes
class_num = 97


# validation results for one object class
class CategoryValidation:
    id = 0  # category id
    path = ""  # path to this class's result file
    total_num = 0  # number of ground-truth bounding boxes of this class in the label files
    proposals_num = 0  # number of bounding boxes of this class predicted during validation
    correct_num = 0  # number of correct bounding boxes (IOU with the ground truth above 0.5 and the right class)
    iou_num = 0  # number of IOUs greater than 0.5
    iou_sum = 0  # sum of all IOUs greater than 0.5
    correct_iou_sum = 0  # sum of the IOUs of the correct bounding boxes
    score_sum = 0  # sum of the probabilities of the correct bounding boxes
    avg_iou = 0  # ignoring the predicted class, IOU of each box with its best-matching ground truth, averaged over the IOUs above 0.5: avg_iou = iou_sum/iou_num
    avg_correct_iou = 0  # average IOU of the correct bounding boxes: avg_correct_iou = correct_iou_sum/correct_num
    avg_precision = 0  # avg_precision = correct_num/proposals_num
    avg_recall = 0  # avg_recall = correct_num/total_num
    avg_score = 0  # avg_score = score_sum/correct_num

    def __init__(self, path, val_cat_num):
        self.path = path
        f = open(path)

        for line in f:
            temp = line.rstrip().replace(' ', '').split(',', 9)
            temp[1] = int(temp[1])
            self.id = temp[1]
            self.total_num = val_cat_num[self.id]
            if (self.total_num):
                break

        for line in f:
            # path, class_id, correct, prob, best_iou, xmin, ymin, xmax, ymax
            temp = line.rstrip().split(', ', 9)
            temp[1] = int(temp[1])
            temp[2] = int(temp[2])
            temp[3] = float(temp[3])
            temp[4] = float(temp[4])
            self.proposals_num = self.proposals_num + 1.00
            if (temp[2]):
                self.correct_num = self.correct_num + 1.00
                self.score_sum = self.score_sum + temp[3]
                self.correct_iou_sum = self.correct_iou_sum + temp[4]
            if (temp[4] > 0.5):
                self.iou_num = self.iou_num + 1
                self.iou_sum = self.iou_sum + temp[4]

        self.avg_iou = self.iou_sum / self.iou_num
        self.avg_correct_iou = self.correct_iou_sum / self.correct_num
        self.avg_precision = self.correct_num / self.proposals_num
        self.avg_recall = self.correct_num / self.total_num
        self.avg_score = self.score_sum / self.correct_num

        f.close()

    # Export the list of correctly recognized detections
    def get_correct_list(self):
        f = open(self.path)
        new_f_name = "correct_list_" + str(self.id) + ".txt"  # self.id is an int, so convert it
        new_f = open(new_f_name, 'w')
        for line in f:
            temp = line.rstrip().split(', ', 9)
            if (int(temp[2])):  # the correct flag is read as a string; compare it as an int
                new_f.write(line)
        new_f.close()
        f.close()

    # Export the list of incorrectly recognized detections
    def get_error_list(self):
        f = open(self.path)
        new_f_name = "error_list_" + str(self.id) + ".txt"  # self.id is an int, so convert it
        new_f = open(new_f_name, 'w')
        for line in f:
            temp = line.rstrip().split(', ', 9)
            if (int(temp[2]) == 0):  # the correct flag is read as a string; compare it as an int
                new_f.write(line)
        new_f.close()
        f.close()

    def print_eva(self):
        print("id=%d, avg_iou=%f, avg_correct_iou=%f, avg_precision=%f, avg_recall=%f, avg_score=%f \n" % (self.id,
                                                                                                           self.avg_iou,
                                                                                                           self.avg_correct_iou,
                                                                                                           self.avg_precision,
                                                                                                           self.avg_recall,
                                                                                                           self.avg_score))


def IsSubString(SubStrList, Str):
    flag = True
    for substr in SubStrList:
        if not (substr in Str):
            flag = False

    return flag


# Get the list of file names under FindPath that match the given pattern (FlagStr)
def GetFileList(FindPath, FlagStr=[]):
    import os
    FileList = []
    FileNames = os.listdir(FindPath)
    if (len(FileNames) > 0):
        for fn in FileNames:
            if (len(FlagStr) > 0):
                if (IsSubString(FlagStr, fn)):
                    FileList.append(fn)
            else:
                FileList.append(fn)

    if (len(FileList) > 0):
        FileList.sort()

    return FileList


# Count the ground-truth ROIs of each object class
# path is the image list file
# Returns a list indexed by each class's YOLO id, holding that class's ROI count
def get_val_cat_num(path):
    val_cat_num = []
    for i in range(0, class_num):
        val_cat_num.append(0)

    f = open(path)
    for line in f:
        label_path = line.rstrip().replace('images', 'labels')
        label_path = label_path.replace('JPEGImages', 'labels')
        label_path = label_path.replace('.jpg', '.txt')
        label_path = label_path.replace('.JPEG', '.txt')
        label_list = open(label_path)
        for label in label_list:
            temp = label.rstrip().split(" ", 4)
            id = int(temp[0])
            val_cat_num[id] = val_cat_num[id] + 1.00
        label_list.close()
    f.close()
    return val_cat_num


# Load the object name list
# path is the object-name list file
# Returns a list indexed by class id, holding each class's name
def get_name_list(path):
    name_list = []
    f = open(path)
    for line in f:
        temp = line.rstrip().split(',', 2)
        name_list.append(temp[1])
    return name_list


wd = getcwd()
val_result_list = GetFileList(wd, ['txt'])
val_cat_num = get_val_cat_num("/raid/pengchong_data/Data/filelists/val.txt")
name_list = get_name_list("/raid/pengchong_data/Tools/Paul_YOLO/data/paul_list.txt")
low_list = open("low_list.log", 'w')
high_list = open("high_list.log", 'w')
for result in val_result_list:
    cat = CategoryValidation(result, val_cat_num)
    cat.print_eva()
    if ((cat.avg_precision < 0.3) | (cat.avg_recall < 0.3)):
        low_list.write("id=%d, name=%s, avg_precision=%f, avg_recall=%f \n" % (cat.id, name_list[cat.id], cat.avg_precision, cat.avg_recall))
    if ((cat.avg_precision > 0.6) & (cat.avg_recall > 0.6)):
        high_list.write("id=%d, name=%s, avg_precision=%f, avg_recall=%f \n" % (cat.id, name_list[cat.id], cat.avg_precision, cat.avg_recall))

low_list.close()
high_list.close()
           

将本工具放在result目錄下執行,會print出各種物體的evalute結果,包括

id,avg_iou,avg_correct_iou,avg_precision,avg_recall,avg_score。

id=1, avg_iou=0.807394, avg_correct_iou=0.810435, avg_precision=0.473983, avg_recall=0.283531, avg_score=0.661014 

id=2, avg_iou=0.824890, avg_correct_iou=0.826227, avg_precision=0.812950, avg_recall=0.824818, avg_score=0.772828 

id=3, avg_iou=0.748561, avg_correct_iou=0.756006, avg_precision=0.401891, avg_recall=0.146048, avg_score=0.568196 

id=4, avg_iou=0.821225, avg_correct_iou=0.822419, avg_precision=0.779621, avg_recall=0.798544, avg_score=0.773700 

id=5, avg_iou=0.722905, avg_correct_iou=0.721078, avg_precision=0.391119, avg_recall=0.255361, avg_score=0.552248 

id=6, avg_iou=0.814797, avg_correct_iou=0.814427, avg_precision=0.731707, avg_recall=0.612245, avg_score=0.833531 

id=7, avg_iou=0.713375, avg_correct_iou=0.702796, avg_precision=0.739336, avg_recall=0.715596, avg_score=0.691065 

id=8, avg_iou=0.785120, avg_correct_iou=0.797686, avg_precision=0.582267, avg_recall=0.594216, avg_score=0.734099 

id=9, avg_iou=0.744355, avg_correct_iou=0.752729, avg_precision=0.523982, avg_recall=0.241049, avg_score=0.650683 

id=10, avg_iou=0.736755, avg_correct_iou=0.744951, avg_precision=0.621368, avg_recall=0.382028, avg_score=0.651450 
           

The results directory will also contain low_list and high_list, listing the object classes whose precision and recall fall below and above the targets, respectively.
