
[Deep Learning] Reproducing YOLO_V1 (TensorFlow Implementation): Paper Reading and TensorFlow Code

Paper: https://arxiv.org/pdf/1506.02640.pdf

All code for this post is on GitHub: https://github.com/shankezh/DL_HotNet_Tensorflow

If you are interested in machine learning and are not satisfied with treating deep models as black boxes, and want to understand why training can fit a good model, see my earlier posts: they derive several classic machine-learning cases mathematically and hand-roll a simple neural-network framework in Python to deepen understanding: https://blog.csdn.net/shankezh/article/category/7279585

For project help or job opportunities, please contact me by email: [email protected]

--------------------- 

This post reproduces the paper in code.

---------------------

Paper Reading

Key Takeaways

1. Prior detection systems repurpose classifiers to perform detection, e.g., DPM and R-CNN;

2. YOLO instead frames object detection as a regression problem; with this system you only need to look at an image once to predict what objects are present and where they are;

3. The system divides the input image into an S×S grid; if the center of an object falls into a grid cell, that cell is responsible for detecting the object;

4. Each grid cell predicts B bounding boxes (bboxes) and confidence scores for those boxes; the score reflects how confident the model is that the box contains an object and how accurate it thinks the box is; confidence is defined as Pr(Object) * IOU;

5. If no object exists in a cell, its confidence score should be 0; otherwise we want the confidence score to equal the IOU between the predicted box and the ground-truth box;

6. Each bbox consists of five predictions: x, y, w, h, and confidence; the (x, y) coordinates represent the center of the box relative to the bounds of its grid cell, width and height are predicted relative to the whole image, and the confidence represents the IOU between the predicted box and the ground-truth box;

7. Each grid cell also predicts C conditional class probabilities, Pr(Class_i | Object); these probabilities are conditioned on the cell containing an object;

8. Only one set of class probabilities is predicted per grid cell, regardless of the number of boxes B;

9. At test time the conditional class probabilities are multiplied by the individual box confidence scores; see the formula below:

Pr(Class_i | Object) * Pr(Object) * IOU = Pr(Class_i) * IOU

Here Pr(Object) is either 1, meaning the cell contains an object, or 0, meaning it does not;

10. For evaluating YOLO on PASCAL VOC, S=7 and B=2 are used; PASCAL VOC has twenty labeled classes, so C=20, and the final prediction is a tensor of shape 7x7x30, following S x S x (B * 5 + C);
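As a quick sanity check on those numbers, and on how the flat prediction vector splits into class probabilities, confidences, and box coordinates (this mirrors the boundary1/boundary2 slicing in model.py below):

S, B, C = 7, 2, 20
output_size = S * S * (B * 5 + C)      # 7 * 7 * 30 = 1470 values per image
boundary1 = S * S * C                  # first 980 values: class probabilities
boundary2 = boundary1 + S * S * B      # next 98 values: box confidences
coords = S * S * B * 4                 # last 392 values: box coordinates
assert output_size == 1470 and boundary2 + coords == output_size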

11. In the network, the convolutional layers extract features from the image and the fully connected (fc) layers predict the output probabilities and coordinates;

12. The network design is inspired by the GoogLeNet image-classification model: 24 convolutional layers followed by 2 fc layers, with 1x1 reduction layers followed by 3x3 convolutions in place of GoogLeNet's inception modules; the full design is shown below:

[Figure: the full architecture, 24 convolutional layers followed by 2 fully connected layers]

The authors pretrained the classification network on ImageNet at 224x224 resolution, then doubled the resolution to 448x448 for detection;

Training

1. Pretrain on ImageNet;

2. Pretraining uses the first 20 convolutional layers, followed by an average-pooling layer and a fully connected layer;

3. Since prior work showed that adding both convolutional and fully connected layers to a pretrained network can improve performance, four convolutional layers and two fc layers with randomly initialized weights are added; and because detection often needs fine-grained visual information, the network's input resolution is raised from 224x224 to 448x448;

4. The final layer predicts both class probabilities and bbox coordinates; the bbox width and height are normalized by the image width and height so they fall between 0 and 1, and the bbox x, y coordinates are parameterized as offsets from a particular grid cell location, so they are also bounded between 0 and 1;
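To make that parameterization concrete, here is a minimal sketch of how one ground-truth box could be encoded into the [response, x, y, w, h, one-hot classes] label layout used by det_loss_layer in model.py below (the helper name is mine; coordinates are stored in pixels and normalized inside the loss, matching the repo's convention):

import numpy as np

def encode_label(box, class_id, image_size=448, S=7, num_classes=20):
    # box: (x_center, y_center, w, h) in pixels of the 448x448 image
    label = np.zeros((S, S, 5 + num_classes), dtype=np.float32)
    x, y, w, h = box
    col = int(x / image_size * S)        # grid column the center falls into
    row = int(y / image_size * S)        # grid row the center falls into
    label[row, col, 0] = 1.0             # this cell is responsible for the object
    label[row, col, 1:5] = [x, y, w, h]  # pixel coords; the loss divides by image_size
    label[row, col, 5 + class_id] = 1.0  # one-hot class vector
    return label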

5. The final layer uses a linear activation; all other layers use the leaky rectified linear activation shown below (Leaky ReLU);

φ(x) = x if x > 0, otherwise 0.1x
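One caveat if you reproduce this in TensorFlow 1.x: tf.nn.leaky_relu defaults to alpha=0.2, so getting the paper's 0.1 slope requires passing it explicitly (the model code below uses slim.nn.leaky_relu with the default):

import tensorflow as tf

def leaky_relu_01(x):
    return tf.nn.leaky_relu(x, alpha=0.1)  # paper's slope; TF's default alpha is 0.2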

6. The model is optimized with sum-squared error on its outputs because it is easy to optimize, but it does not perfectly match the goal of maximizing average precision: it weights localization error equally with classification error, which is not ideal; moreover, in every image many grid cells contain no object, pushing their confidence scores toward zero, and the gradient from those cells can overpower the gradient from cells that do contain objects; this makes the model unstable and can cause training to diverge early;

7. To fix the problem in point 6, the loss from bbox coordinate predictions is increased and the loss from confidence predictions of boxes that contain no objects is decreased, via two parameters: λcoord = 5 and λnoobj = 0.5;

8. Sum-squared error also weights errors in large boxes and small boxes equally, while the error metric should reflect that a small deviation matters more for a small box than for a large one; to partially address this, the square root of the bbox width and height is predicted instead of the width and height directly;
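A quick numeric check shows the effect: the same absolute width error of 0.1 is penalized almost twice as hard for the small box once square roots are compared: $(\sqrt{0.5}-\sqrt{0.4})^2 \approx 0.0056$, while $(\sqrt{0.09}-\sqrt{0.04})^2 = 0.01$.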

9. The resulting loss is:

$$
\begin{aligned}
\text{Loss} ={} & \lambda_{coord} \sum_{i=0}^{S^2}\sum_{j=0}^{B} \mathbb{1}^{obj}_{ij} \left[ (x_i-\hat{x}_i)^2 + (y_i-\hat{y}_i)^2 \right] \\
{}+{} & \lambda_{coord} \sum_{i=0}^{S^2}\sum_{j=0}^{B} \mathbb{1}^{obj}_{ij} \left[ \left(\sqrt{w_i}-\sqrt{\hat{w}_i}\right)^2 + \left(\sqrt{h_i}-\sqrt{\hat{h}_i}\right)^2 \right] \\
{}+{} & \sum_{i=0}^{S^2}\sum_{j=0}^{B} \mathbb{1}^{obj}_{ij} \left(C_i-\hat{C}_i\right)^2 + \lambda_{noobj} \sum_{i=0}^{S^2}\sum_{j=0}^{B} \mathbb{1}^{noobj}_{ij} \left(C_i-\hat{C}_i\right)^2 \\
{}+{} & \sum_{i=0}^{S^2} \mathbb{1}^{obj}_{i} \sum_{c \in \text{classes}} \left(p_i(c)-\hat{p}_i(c)\right)^2
\end{aligned}
$$

Here $\mathbb{1}^{obj}_{i}$ denotes that an object appears in cell i, and $\mathbb{1}^{obj}_{ij}$ denotes that the j-th bbox predictor in cell i is "responsible" for that prediction;

10. Training ran for 135 epochs on the training and validation sets of VOC 2007 and 2012, with a batch size of 64, momentum of 0.9, and weight decay of 0.0005; the learning-rate schedule: for the first epochs the rate is slowly raised from 0.001 to 0.01, then held at 0.01 for 75 epochs, then 0.001 for 30 epochs, and finally 0.0001 for 30 epochs;
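Transcribed as a plain helper (the epoch boundaries follow the paper; the paper does not say exactly how long the warm-up lasts, so I spread it over the first epoch here, and my own training below actually uses a cyclic learning rate instead):

def paper_lr_schedule(epoch):
    # YOLO v1 learning-rate schedule over 135 epochs, per the paper
    if epoch < 1:
        return 0.001 + (0.01 - 0.001) * epoch  # warm up from 1e-3 to 1e-2
    elif epoch < 75:
        return 0.01
    elif epoch < 105:
        return 0.001
    else:
        return 0.0001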

11. To avoid overfitting, dropout and data augmentation are used; a dropout layer with rate 0.5 follows the first fc layer; for augmentation, random scaling and translations of up to about 20% of the original image size are applied, and the image's exposure and saturation are randomly adjusted by up to a factor of 1.5 in HSV color space;

12. Some large objects, or objects near cell borders, can be localized by multiple cells; this is where non-maximal suppression (NMS) comes in, as sketched below;
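A minimal class-agnostic greedy NMS in plain NumPy (boxes given as (x_center, y_center, w, h) to match the YOLO parameterization; the helper and the 0.5 threshold are illustrative, not the repo's test code):

import numpy as np

def nms(boxes, scores, iou_threshold=0.5):
    # boxes: [N, 4] as (cx, cy, w, h); scores: [N]. Returns kept indices.
    x1 = boxes[:, 0] - boxes[:, 2] / 2.0   # convert to corner format
    y1 = boxes[:, 1] - boxes[:, 3] / 2.0
    x2 = boxes[:, 0] + boxes[:, 2] / 2.0
    y2 = boxes[:, 1] + boxes[:, 3] / 2.0
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]         # best score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # IOU of the best box against all remaining boxes
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        iou = inter / (areas[i] + areas[order[1:]] - inter + 1e-10)
        order = order[1:][iou <= iou_threshold]  # drop overlapping boxes
    return keep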

Limitations of YOLO

1. YOLO imposes strong spatial constraints on bbox predictions, since each grid cell predicts only two boxes and can have only one class; this limits how many nearby objects the model can predict;

2. The model also uses relatively coarse features to predict bboxes, since the architecture has multiple downsampling layers;

TensorFlow Implementation:

Model code (covers both the pretraining network and the detection network)

model.py:

import tensorflow as tf
import tensorflow.contrib.slim as slim
import net.Detection.YOLOV1.config as cfg
import numpy as np


class YOLO_Net(object):
    def __init__(self,is_pre_training=False,is_training = True):
        self.classes = cfg.VOC07_CLASS
        self.pre_train_num = cfg.PRE_TRAIN_NUM
        self.det_cls_num = len(self.classes)
        self.image_size = cfg.DET_IMAGE_SIZE
        self.cell_size = cfg.CELL_SIZE
        self.boxes_per_cell = cfg.PER_CELL_CHECK_BOXES
        self.output_size = (self.cell_size * self.cell_size) * ( 5 * self.boxes_per_cell + self.det_cls_num)
        self.scale = 1.0 * self.image_size / self.cell_size
        self.boundary1 = self.cell_size * self.cell_size * self.det_cls_num
        self.boundary2 = self.boundary1 + self.cell_size * self.cell_size * self.boxes_per_cell
        self.object_scale = cfg.OBJ_CONFIDENCE_SCALE
        self.no_object_scale = cfg.NO_OBJ_CONFIDENCE_SCALE
        self.class_scale = cfg.CLASS_SCALE
        self.coord_scale = cfg.COORD_SCALE
        self.learning_rate = 0.0001
        self.batch_size = cfg.BATCH_SIZE
        self.keep_prob = cfg.KEEP_PROB
        self.pre_training = is_pre_training

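        # Offset table of shape [cell_size, cell_size, boxes_per_cell] with
        # offset[y, x, :] == x; adding it to the predicted cell-relative x (and its
        # transpose to y) then dividing by cell_size recovers image-relative centers.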
        self.offset = np.transpose(
            np.reshape(
                np.array(
                    [np.arange(self.cell_size)]*self.cell_size*self.boxes_per_cell
                ),(self.boxes_per_cell,self.cell_size,self.cell_size)
            ),(1,2,0)
        )

        self.bn_params = cfg.BATCH_NORM_PARAMS
        self.is_training = tf.placeholder(tf.bool)
        if self.pre_training:
            self.images = tf.placeholder(tf.float32, [None, 224, 224, 3], name='images')
        else:
            self.images = tf.placeholder(tf.float32, [None, self.image_size, self.image_size, 3], name='images')

        self.logits = self.build_network(self.images,is_training=self.is_training)

        if is_training:
            if self.pre_training:
                self.labels = tf.placeholder(tf.float32, [None,self.pre_train_num])
                self.classify_loss(self.logits,self.labels)
                self.total_loss = tf.losses.get_total_loss()
                self.evalution = self.classify_evalution(self.logits,self.labels)
                print('pretraining network')
            else:
                self.labels = tf.placeholder(tf.float32, [None,self.cell_size,self.cell_size,5+self.det_cls_num])
                self.det_loss_layer(self.logits,self.labels)
                self.total_loss = tf.losses.get_total_loss()
                tf.summary.scalar('total_loss', self.total_loss)
                print('detection network')




    def build_network(self, images,is_training = True,scope = 'yolov1'):
        net = images
        with tf.variable_scope(scope):
            with slim.arg_scope([slim.conv2d, slim.fully_connected],
                                weights_regularizer=slim.l2_regularizer(0.00004)):
                with slim.arg_scope([slim.conv2d],
                                    weights_initializer=slim.xavier_initializer(),
                                    normalizer_fn=slim.batch_norm,
                                    activation_fn=slim.nn.leaky_relu,
                                    normalizer_params=self.bn_params):
                    with slim.arg_scope([slim.batch_norm, slim.dropout], is_training=is_training):
                        net = slim.conv2d(net, 64, [7, 7], stride=2, padding='SAME', scope='layer1')
                        net = slim.max_pool2d(net, [2, 2], stride=2, padding='SAME', scope='pool1')

                        net = slim.conv2d(net, 192, [3, 3], stride=1, padding='SAME', scope='layer2')
                        net = slim.max_pool2d(net, [2, 2], stride=2, padding='SAME', scope='pool2')

                        net = slim.conv2d(net, 128, [1, 1], stride=1, padding='SAME', scope='layer3_1')
                        net = slim.conv2d(net, 256, [3, 3], stride=1, padding='SAME', scope='layer3_2')
                        net = slim.conv2d(net, 256, [1, 1], stride=1, padding='SAME', scope='layer3_3')
                        net = slim.conv2d(net, 512, [3, 3], stride=1, padding='SAME', scope='layer3_4')
                        net = slim.max_pool2d(net, [2, 2], stride=2, padding='SAME', scope='pool3')

                        net = slim.conv2d(net, 256, [1, 1], stride=1, padding='SAME', scope='layer4_1')
                        net = slim.conv2d(net, 512, [3, 3], stride=1, padding='SAME', scope='layer4_2')
                        net = slim.conv2d(net, 256, [1, 1], stride=1, padding='SAME', scope='layer4_3')
                        net = slim.conv2d(net, 512, [3, 3], stride=1, padding='SAME', scope='layer4_4')
                        net = slim.conv2d(net, 256, [1, 1], stride=1, padding='SAME', scope='layer4_5')
                        net = slim.conv2d(net, 512, [3, 3], stride=1, padding='SAME', scope='layer4_6')
                        net = slim.conv2d(net, 256, [1, 1], stride=1, padding='SAME', scope='layer4_7')
                        net = slim.conv2d(net, 512, [3, 3], stride=1, padding='SAME', scope='layer4_8')
                        net = slim.conv2d(net, 512, [1, 1], stride=1, padding='SAME', scope='layer4_9')
                        net = slim.conv2d(net, 1024, [3, 3], stride=1, padding='SAME', scope='layer4_10')
                        net = slim.max_pool2d(net, [2, 2], stride=2, padding='SAME', scope='pool4')

                        net = slim.conv2d(net, 512, [1, 1], stride=1, padding='SAME', scope='layer5_1')
                        net = slim.conv2d(net, 1024, [3, 3], stride=1, padding='SAME', scope='layer5_2')
                        net = slim.conv2d(net, 512, [1, 1], stride=1, padding='SAME', scope='layer5_3')
                        net = slim.conv2d(net, 1024, [3, 3], stride=1, padding='SAME', scope='layer5_4')

                        if self.pre_training:
                            net = slim.avg_pool2d(net, [7, 7], stride=1, padding='VALID', scope='clssify_avg5')
                            net = slim.flatten(net)
                            net = slim.fully_connected(net, self.pre_train_num, activation_fn=slim.nn.leaky_relu,
                                                       scope='classify_fc1')
                            return net

                        net = slim.conv2d(net, 1024, [3, 3], stride=1, padding='SAME', scope='layer5_5')
                        net = slim.conv2d(net, 1024, [3, 3], stride=2, padding='SAME', scope='layer5_6')

                        net = slim.conv2d(net, 1024, [3, 3], stride=1, padding='SAME', scope='layer6_1')
                        net = slim.conv2d(net, 1024, [3, 3], stride=1, padding='SAME', scope='layer6_2')

                        net = slim.flatten(net)

                        net = slim.fully_connected(net, 1024, activation_fn=slim.nn.leaky_relu, scope='fc1')
                        net = slim.dropout(net, 0.5)
                        net = slim.fully_connected(net, 4096, activation_fn=slim.nn.leaky_relu, scope='fc2')
                        net = slim.dropout(net, 0.5)
                        net = slim.fully_connected(net, self.output_size, activation_fn=None, scope='fc3')
                        # N, 7,7,30
                        # net = tf.reshape(net,[-1,S,S,B*5+C])
            return net

    def classify_loss(self,logits,labels):
        with tf.name_scope('classify_loss') as scope:
            _loss = tf.nn.softmax_cross_entropy_with_logits_v2(logits=logits, labels=labels)
            mean_loss = tf.reduce_mean(_loss)
            tf.losses.add_loss(mean_loss)
            tf.summary.scalar(scope + 'classify_mean_loss', mean_loss)

    def classify_evalution(self,logits,labels):
        with tf.name_scope('classify_evaluation') as scope:
            correct_pre = tf.equal(tf.argmax(logits, 1), tf.argmax(labels, 1))
            accuracy = tf.reduce_mean(tf.cast(correct_pre, tf.float32))
            # tf.summary.scalar(scope + 'accuracy:', accuracy)
        return accuracy


    '''
    :param predicts: shape -> [N, 7x7x30]
    :param labels:   shape -> [N, 7, 7, 25] <=> [N, h axis, w axis, 25]
                     (index 0: responsible flag, 1-4: box coords, 5-24: one-hot class)
    '''

    def det_loss_layer(self, predicts, labels, scope='det_loss'):
        with tf.variable_scope(scope):
            predict_classes = tf.reshape(predicts[:, :self.boundary1],
                                         [-1, 7, 7, 20])  # class predictions -> [batch_size, cell_size, cell_size, num_cls]
            predict_scale = tf.reshape(predicts[:, self.boundary1:self.boundary2],
                                       [-1, 7, 7, 2])  # confidence predictions -> [batch_size, cell_size, cell_size, boxes_per_cell]
            predict_boxes = tf.reshape(predicts[:, self.boundary2:],
                                       [-1, 7, 7, 2, 4])  # box predictions -> [batch_size, cell_size, cell_size, boxes_per_cell, 4]

            response = tf.reshape(labels[:, :, :, 0], [-1, 7, 7, 1])  # ground-truth flag: is this cell responsible for an object
            boxes = tf.reshape(labels[:, :, :, 1:5], [-1, 7, 7, 1, 4])  # ground-truth coordinates
            boxes = tf.tile(boxes,
                            [1, 1, 1, 2, 1]) / self.image_size  # tile to match the 2 predicted boxes and normalize to YOLO [0, 1] coords
            classes = labels[:, :, :, 5:]  # ground-truth one-hot classes

            offset = tf.constant(self.offset, dtype=tf.float32)
            offset = tf.reshape(offset, [1, 7, 7, 2])
            offset = tf.tile(offset, [tf.shape(boxes)[0], 1, 1, 1])
            predict_boxes_tran = tf.stack([
                1. * (predict_boxes[:, :, :, :, 0] + offset) / self.cell_size,
                1. * (predict_boxes[:, :, :, :, 1] + tf.transpose(offset, (0, 2, 1, 3))) / self.cell_size,
                tf.square(predict_boxes[:, :, :, :, 2]),
                tf.square(predict_boxes[:, :, :, :, 3])
            ], axis=-1)
            # predict_boxes_tran = tf.transpose(predict_boxes_tran,[1,2,3,4,0])

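            # Of the B predictors in each cell, the one with the highest IOU against
            # the ground truth is "responsible": only it contributes object/coord loss.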
            iou_predict_truth = self.calc_iou(predict_boxes_tran, boxes)
            object_mask = tf.reduce_max(iou_predict_truth, 3, keep_dims=True)
            object_mask = tf.cast((iou_predict_truth >= object_mask), tf.float32) * response
            no_object_mask = tf.ones_like(object_mask, dtype=tf.float32) - object_mask
            boxes_tran = tf.stack([
                1. * boxes[:, :, :, :, 0] * 7 - offset,
                1. * boxes[:, :, :, :, 1] * 7 - tf.transpose(offset, (0, 2, 1, 3)),
                tf.sqrt(boxes[:, :, :, :, 2]),
                tf.sqrt(boxes[:, :, :, :, 3])
            ], axis=-1)

            # classification loss
            class_delta = response * (predict_classes - classes)
            class_loss = tf.reduce_mean(tf.reduce_sum(tf.square(class_delta), axis=[1, 2, 3]),
                                        name='class_loss') * self.class_scale

            # object confidence loss
            object_delta = object_mask * (predict_scale - iou_predict_truth)
            object_loss = tf.reduce_mean(tf.reduce_sum(tf.square(object_delta), axis=[1, 2, 3]),
                                         name='object_loss') * self.object_scale

            # no-object confidence loss
            no_object_delta = no_object_mask * predict_scale
            no_object_loss = tf.reduce_mean(tf.reduce_sum(tf.square(no_object_delta), axis=[1, 2, 3]),
                                            name='no_object_loss') * self.no_object_scale

            # coordinate loss
            coord_mask = tf.expand_dims(object_mask, 4)
            boxes_delta = coord_mask * (predict_boxes - boxes_tran)
            coord_loss = tf.reduce_mean(tf.reduce_sum(tf.square(boxes_delta), axis=[1, 2, 3, 4]),
                                        name='coord_loss') * self.coord_scale
            tf.losses.add_loss(class_loss)
            tf.losses.add_loss(object_loss)
            tf.losses.add_loss(no_object_loss)
            tf.losses.add_loss(coord_loss)

            tf.summary.scalar('class_loss', class_loss)
            tf.summary.scalar('object_loss', object_loss)
            tf.summary.scalar('noobject_loss', no_object_loss)
            tf.summary.scalar('coord_loss', coord_loss)

            tf.summary.histogram('boxes_delta_x', boxes_delta[:, :, :, :, 0])
            tf.summary.histogram('boxes_delta_y', boxes_delta[:, :, :, :, 1])
            tf.summary.histogram('boxes_delta_w', boxes_delta[:, :, :, :, 2])
            tf.summary.histogram('boxes_delta_h', boxes_delta[:, :, :, :, 3])
            tf.summary.histogram('iou', iou_predict_truth)


    def calc_iou(self, boxes1, boxes2, scope='iou'):
        """calculate ious
               Args:
                 boxes1: 4-D tensor [CELL_SIZE, CELL_SIZE, BOXES_PER_CELL, 4]  ====> (x_center, y_center, w, h)
                 boxes2: 1-D tensor [CELL_SIZE, CELL_SIZE, BOXES_PER_CELL, 4] ===> (x_center, y_center, w, h)
               Return:
                 iou: 3-D tensor [CELL_SIZE, CELL_SIZE, BOXES_PER_CELL]
               """
        with tf.variable_scope(scope):
            boxes1 = tf.stack([boxes1[:, :, :, :, 0] - boxes1[:, :, :, :, 2] / 2.0,
                               boxes1[:, :, :, :, 1] - boxes1[:, :, :, :, 3] / 2.0,
                               boxes1[:, :, :, :, 0] + boxes1[:, :, :, :, 2] / 2.0,
                               boxes1[:, :, :, :, 1] + boxes1[:, :, :, :, 3] / 2.0], axis=-1)
            # boxes1 = tf.transpose(boxes1, [1, 2, 3, 4, 0])

            boxes2 = tf.stack([boxes2[:, :, :, :, 0] - boxes2[:, :, :, :, 2] / 2.0,
                               boxes2[:, :, :, :, 1] - boxes2[:, :, :, :, 3] / 2.0,
                               boxes2[:, :, :, :, 0] + boxes2[:, :, :, :, 2] / 2.0,
                               boxes2[:, :, :, :, 1] + boxes2[:, :, :, :, 3] / 2.0], axis=-1)
            # boxes2 = tf.transpose(boxes2, [1, 2, 3, 4, 0])

            lu = tf.maximum(boxes1[:, :, :, :, :2], boxes2[:, :, :, :, :2])
            rd = tf.minimum(boxes1[:, :, :, :, 2:], boxes2[:, :, :, :, 2:])

            intersection = tf.maximum(0.0, rd - lu)
            inter_square = intersection[:, :, :, :, 0] * intersection[:, :, :, :, 1]

            square1 = (boxes1[:, :, :, :, 2] - boxes1[:, :, :, :, 0]) * \
                      (boxes1[:, :, :, :, 3] - boxes1[:, :, :, :, 1])
            square2 = (boxes2[:, :, :, :, 2] - boxes2[:, :, :, :, 0]) * \
                      (boxes2[:, :, :, :, 3] - boxes2[:, :, :, :, 1])

            union_square = tf.maximum(square1 + square2 - inter_square, 1e-10)

        return tf.clip_by_value(inter_square / union_square, 0.0, 1.0)
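
For completeness, here is a minimal sketch of how the flat 7x7x30 prediction could be decoded into class-specific scores at test time, combining Pr(Class_i | Object) with the box confidences as in the formula above (the real detector lives in the GitHub repo; the function name and the 0.2 threshold here are illustrative):

import numpy as np

def decode_scores(output, S=7, B=2, C=20, score_threshold=0.2):
    # output: flat [S*S*(B*5+C)] prediction for one image
    class_probs = output[:S * S * C].reshape(S, S, C)                 # Pr(Class_i | Object)
    confidences = output[S * S * C:S * S * (C + B)].reshape(S, S, B)  # Pr(Object) * IOU
    # class-specific confidence: Pr(Class_i) * IOU, shape [S, S, B, C]
    scores = confidences[:, :, :, None] * class_probs[:, :, None, :]
    scores[scores < score_threshold] = 0.0  # drop weak detections before NMS
    return scores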
           

Training (covers both classification pretraining and detection training):

solver.py

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Created by solver on 19-5-6
import tensorflow as tf
from net.Detection.YOLOV1.model import YOLO_Net
import net.Detection.YOLOV1.config as cfg
import tensorflow.contrib.slim as slim
from net.Detection.YOLOV1.voc07_img import Pascal_voc
from coms.learning_rate import CLR_EXP_RANGE
from coms.utils import isHasGpu, isLinuxSys
import time,os
from coms.pre_process import get_cifar10_batch
import net.Detection.YOLOV1.voc07_tfrecord as VOC07RECORDS

class Solver(object):
    def __init__(self,net,data,tf_records=False):
        self.net = net
        self.data = data
        self.tf_records = tf_records
        self.batch_size = cfg.BATCH_SIZE
        self.clr = CLR_EXP_RANGE()
        self.log_dir = cfg.LOG_DIR
        self.model_cls_dir = cfg.CLS_MODEL_DIR
        self.model_det_dir = cfg.DET_MODEL_DIR
        self.learning_rate = tf.placeholder(tf.float32)
        self.re_train = True
        tf.summary.scalar('learning_rate',self.learning_rate)
        self.optimizer = self.optimizer_bn(lr=self.learning_rate,loss=self.net.total_loss)
        if isHasGpu():
            gpu_option = tf.GPUOptions(allow_growth=True)
            config = tf.ConfigProto(allow_soft_placement=True,gpu_options=gpu_option)
        else:
            config = tf.ConfigProto(allow_soft_placement=True)
        self.sess = tf.Session(config=config)
        self.sess.run(tf.global_variables_initializer())

        self.summary_op = tf.summary.merge_all()
        n_time = time.strftime("%Y-%m-%d %H-%M", time.localtime())
        self.writer = tf.summary.FileWriter(os.path.join(self.log_dir, n_time),self.sess.graph)
        self.saver = tf.train.Saver(max_to_keep=4)


    def train_classify(self):
        self.set_classify_params()
        max_acc = 0.
        coord = tf.train.Coordinator()
        threads = tf.train.start_queue_runners(sess=self.sess, coord=coord)
        for epoch in range(cfg.EPOCH):
            for step in range(1,cfg.ITER_STEP+1):
                learning_rate_val = self.clr.calc_lr(step,cfg.ITER_STEP+1,0.001,0.01,gamma=0.9998)
                train_img_batch, train_label_batch = self.sess.run([self.train_img_batch,self.train_label_batch])
                feed_dict_train = {self.net.images:train_img_batch, self.net.labels:train_label_batch, self.net.is_training:True,self.learning_rate:learning_rate_val}
                _, summary_op, batch_train_loss, batch_train_acc = self.sess.run([self.optimizer, self.summary_op,self.net.total_loss,self.net.evalution],feed_dict=feed_dict_train)

                global_step = int(epoch * cfg.ITER_STEP + step + 1)
                print("epoch %d , step %d train end ,loss is : %f ,accuracy is %f ... ..." % (epoch, step, batch_train_loss, batch_train_acc))
                train_summary = tf.Summary(
                    value=[tf.Summary.Value(tag='train_loss', simple_value=batch_train_loss)
                        , tf.Summary.Value(tag='train_batch_accuracy', simple_value=batch_train_acc)
                        , tf.Summary.Value(tag='learning_rate', simple_value=learning_rate_val)])
                self.writer.add_summary(train_summary,global_step=global_step)
                self.writer.add_summary(summary_op,global_step=global_step)
                self.writer.flush()


                if step % 100 == 0:
                    print('test sets evaluation start ...')
                    ac_iter = int(10000 / self.batch_size)  # the CIFAR-10 test set has 10,000 images
                    ac_sum = 0.
                    loss_sum = 0.
                    for ac_count in range(ac_iter):
                        batch_test_img, batch_test_label = self.sess.run([self.test_img_batch, self.test_label_batch])
                        feed_dict_test = {self.net.images: batch_test_img,self.net.labels: batch_test_label,self.net.is_training: False,self.learning_rate:learning_rate_val}
                        test_loss, test_accuracy = self.sess.run([self.net.total_loss, self.net.evalution],feed_dict=feed_dict_test)

                        ac_sum += test_accuracy
                        loss_sum += test_loss
                    ac_mean = ac_sum / ac_iter
                    loss_mean = loss_sum / ac_iter
                    print('epoch {} , step {} , accuracy is {}'.format(str(epoch), str(step), str(ac_mean)))
                    test_summary = tf.Summary(
                        value=[tf.Summary.Value(tag='test_loss', simple_value=loss_mean)
                            , tf.Summary.Value(tag='test_accuracy', simple_value=ac_mean)])
                    self.writer.add_summary(test_summary, global_step=global_step)
                    self.writer.flush()

                    if ac_mean >= max_acc:
                        max_acc = ac_mean
                        self.saver.save(self.sess, self.model_cls_dir + '/' + 'cifar10_{}_step_{}.ckpt'.format(str(epoch),str(step)), global_step=step)
                        print('max accuracy has reaching ,save model successful ...')
        print('train network task was run over')


    def set_classify_params(self):
        self.train_img_batch,self.train_label_batch = get_cifar10_batch(is_train=True,batch_size=self.batch_size,num_cls=cfg.PRE_TRAIN_NUM,img_prob=[224,224,3])
        self.test_img_batch,self.test_label_batch = get_cifar10_batch(is_train=False,batch_size=self.batch_size,num_cls=cfg.PRE_TRAIN_NUM,img_prob=[224,224,3])

    def train_detector(self):
        self.set_detector_params()
        for epoch in range(cfg.EPOCH):
            for step in range(1,cfg.ITER_STEP+1):
                global_step = int(epoch * cfg.ITER_STEP + step + 1)
                learning_rate_val = self.clr.calc_lr(step,cfg.ITER_STEP+1,0.0001,0.0005,gamma=0.9998)
                if self.tf_records:
                    train_images, train_labels = self.sess.run(self.train_next_elements)
                else:
                    train_images, train_labels = self.data.next_batch(self.gt_labels_train, self.batch_size)
                feed_dict_train = {self.net.images:train_images,self.net.labels:train_labels,self.learning_rate:learning_rate_val,self.net.is_training:True}
                _,summary_str,train_loss = self.sess.run([self.optimizer,self.summary_op,self.net.total_loss],feed_dict=feed_dict_train)
                print("epoch %d , step %d train end ,loss is : %f  ... ..." % (epoch, step, train_loss))
                self.writer.add_summary(summary_str,global_step)

                if step % 50 ==0:
                    print('test sets start ...')
                    # test set total: 4962 images
                    sum_loss = 0.
                    # test_iter = int (4962 / self.batch_size)
                    test_iter = 10  # average the loss over 10 batches
                    for _ in range(test_iter):
                        if self.tf_records:
                            test_images, test_labels = self.sess.run(self.test_next_elements)
                        else:
                            test_images,test_labels = self.data.next_batch(self.gt_labels_test,self.batch_size)
                        feed_dict_test = {self.net.images:test_images,self.net.labels:test_labels,self.net.is_training:False}
                        loss_iter = self.sess.run(self.net.total_loss,feed_dict=feed_dict_test)
                        sum_loss += loss_iter

                    mean_loss = sum_loss/test_iter
                    print('epoch {} , step {} , test loss is {}'.format(str(epoch), str(step), str(mean_loss)))
                    test_summary = tf.Summary(
                        value=[tf.Summary.Value(tag='test_loss', simple_value=mean_loss)])
                    self.writer.add_summary(test_summary, global_step=global_step)
                    self.writer.flush()

            self.saver.save(self.sess,self.model_det_dir+'/' + 'det_voc07_{}_step_{}.ckpt'.format(str(epoch),str(step)), global_step=step)
            print('save model successful ...')

    def set_detector_params(self):
        if self.tf_records:
            train_records_path = r'/home/ws/DataSets/pascal_VOC/VOC07/tfrecords' + '/trainval.tfrecords'
            test_records_path = r'/home/ws/DataSets/pascal_VOC/VOC07/tfrecords' + '/test.tfrecords'
            train_datasets = VOC07RECORDS.DataSets(record_path=train_records_path,batch_size=self.batch_size)
            train_gen = train_datasets.transform(shuffle=True)
            train_iterator = train_gen.make_one_shot_iterator()
            self.train_next_elements = train_iterator.get_next()
            test_datasets = VOC07RECORDS.DataSets(record_path=test_records_path, batch_size=self.batch_size)
            test_gen = test_datasets.transform(shuffle=True)
            test_iterator = test_gen.make_one_shot_iterator()
            self.test_next_elements = test_iterator.get_next()
        else:
            self.gt_labels_train = self.data.prepare('train')
            self.gt_labels_test = self.data.prepare('test')
        if self.re_train:
            self.load_det_model()
        else:
            self.load_pre_train_model()


    def load_pre_train_model(self):
        net_vars = slim.get_model_variables()
        model_file = tf.train.latest_checkpoint(self.model_cls_dir)
        reader = tf.train.NewCheckpointReader(model_file)
        model_vars = reader.get_variable_to_shape_map()
        exclude = ['yolov1/classify_fc1/weights', 'yolov1/classify_fc1/biases']

        vars_restore_map = {}
        for var in net_vars:
            if var.op.name in model_vars and var.op.name not in exclude:
                vars_restore_map[var.op.name] = var

        self.saver = tf.train.Saver(vars_restore_map,max_to_keep=4)
        self.saver.restore(self.sess, model_file)
        self.saver = tf.train.Saver(var_list=net_vars,max_to_keep=4)

    def load_det_model(self):
        # self.saver = tf.train.Saver(max_to_keep=4)
        net_vars = slim.get_model_variables()
        self.saver = tf.train.Saver(net_vars,max_to_keep=4)

        model_file = tf.train.latest_checkpoint(self.model_det_dir)
        self.saver.restore(self.sess, model_file)



    # training op with batch-norm update dependencies
    def optimizer_bn(self, lr, loss, mom=0.9, fun='mm'):
        with tf.name_scope('optimizer_bn'):
            update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
            with tf.control_dependencies([tf.group(*update_ops)]):
                optim = tf.train.MomentumOptimizer(learning_rate=lr, momentum=mom)  # honor the mom argument
                train_op = slim.learning.create_train_op(loss, optim)
        return train_op



def train_classify():
    yolov1 = YOLO_Net(is_pre_training=True)

    solver = Solver(net=yolov1, data=0)
    print('start ...')
    solver.train_classify()

def train_detector():
    yolov1 = YOLO_Net(is_pre_training=False)
    pasvoc07 = Pascal_voc()
    solver = Solver(net=yolov1, data=pasvoc07)
    print('start train ...')
    solver.train_detector()

def train_detector_with_records():
    yolov1 = YOLO_Net(is_pre_training=False)
    solver = Solver(net=yolov1, data=0, tf_records=True)
    print('start train ...')
    solver.train_detector()

if __name__ == '__main__':
    train_detector_with_records()
           

The test code (detector and classifier) is on GitHub, so I won't paste it here. YOLO does recognition and localization in one shot, and forcing the network to train this way is the most "neural-network-like" approach; but without question, since each cell predicts a fixed number of boxes, closely packed objects risk being missed and recall suffers; YOLOv3 later improved on these problems.

My training details:

Pretraining used CIFAR-10, which is small and low-resolution, so the detection training suffered considerably, especially since the CIFAR-10 classes barely line up with the VOC07 classes;

Detection training used the VOC07 dataset; all told it ran for over a thousand epochs, with the batch size changed among 32, 64, 96, and 128 across checkpoint restarts; the screenshots here only show the very start of training, not the later resumed runs;

My network follows the YOLO v1 architecture, except that I added BN (batch normalization) to speed up training;

Classification training:

[Figure: classification training curves]

Detection training:

[Figure: detection training curves]

Detection results:

[Figure: sample detection results]

Conclusion:

The detection results this time are mediocre, with plenty of false positives and missed detections;

YOLO's design feels odd to people steeped in the two-stage R-CNN school, and many struggle to make sense of it, especially when writing the code; I suggest studying other people's implementations. That is what I did: only after referencing others' code could I write my own YOLO code. The most important piece is how YOLO labels are constructed, which differs greatly from other detectors. The mediocre detection quality has a lot to do with my pretraining dataset, but it also stems from real design weaknesses in YOLO v1 itself, which later papers go on to fix. In short, good results rest on a good dataset, a good model, and good training technique.

All the code is on my GitHub, including the YOLO model, training (pretraining and detection training), testing (detection and classification), and YOLO label construction (an image version and a tfrecords version); it should be just about the most complete TensorFlow version of YOLO v1 you will find online;

My trained weights are uploaded to Baidu Cloud; you can download and try them here: https://pan.baidu.com/s/1BdMZYvkiYT9Fts0dLIgrog

Extraction code: 0rmi

Code references:

[1] https://github.com/TowardsNorth/yolo_v1_tensorflow_guiyu

[2] https://github.com/leeyoshinari/YOLO_v1
