Paper: https://arxiv.org/pdf/1506.02640.pdf
GitHub repository for the code in this article: https://github.com/shankezh/DL_HotNet_Tensorflow
If you are interested in machine learning and are not satisfied with treating deep learning models as black boxes — if you want to understand why machine learning can be trained to fit a good model — see my earlier blog posts, which use mathematics to derive several classic machine-learning cases and build a simple neural-network framework from scratch in Python to deepen understanding: https://blog.csdn.net/shankezh/article/category/7279585
For project help or job opportunities, please contact me by email: [email protected]
---------------------
This article reproduces the paper in code.
---------------------
Paper Reading
Key Points
1. Prior detection systems repurpose classifiers to perform detection, e.g. DPM and R-CNN;
2. YOLO frames object detection as a regression problem; with this system, you only need to look at an image once to predict what objects are present and where they are;
3. The system divides the input image into an S×S grid; if the center of an object falls into a grid cell, that cell is responsible for detecting the object;
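As a minimal sketch of point 3 (assuming S=7 and a 448×448 input, the values the paper uses for PASCAL VOC), the responsible cell is simply the one containing the object's center:

```python
S = 7
IMG_SIZE = 448

def responsible_cell(x_center, y_center, s=S, img_size=IMG_SIZE):
    """Return (row, col) of the grid cell containing the object's center."""
    col = int(x_center / img_size * s)
    row = int(y_center / img_size * s)
    return row, col

# An object centered at (224, 100) falls in row 1, col 3 of the 7x7 grid.
```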
4. Each grid cell predicts B bounding boxes and confidence scores for those boxes; the score reflects how confident the model is that the box contains an object and how accurate it thinks the box is; the confidence score is defined as Pr(Object) * IOU;
5. If no object exists in the cell, the confidence score should be 0; otherwise we want the confidence score to equal the IOU between the predicted box and the ground-truth box;
6. Each bbox consists of five predictions: x, y, w, h, and confidence; the (x, y) coordinates represent the box center relative to the bounds of the grid cell; the width and height are predicted relative to the whole image; confidence represents the IOU between the predicted box and the ground-truth box;
7. Each grid cell also predicts C conditional class probabilities, Pr(Class i | Object); these probabilities are conditioned on the cell containing an object;
8. Only one set of class probabilities is predicted per grid cell, regardless of the number of boxes B;
9. At test time, the conditional class probabilities are multiplied by the individual box confidence scores; see the formula in the figure below:
where Pr(Object) is either 1, meaning the cell contains an object, or 0, meaning it does not;
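A tiny numeric illustration of this test-time score (the probability values below are made up for illustration, not from the paper):

```python
# Class-specific confidence for one box:
#   Pr(Class_i | Object) * Pr(Object) * IOU = Pr(Class_i) * IOU
cond_class_prob = 0.8   # Pr(dog | Object) predicted by the cell
box_confidence = 0.6    # Pr(Object) * IOU predicted for one box
class_score = cond_class_prob * box_confidence
# class_score = 0.48 is what gets thresholded and fed to NMS at test time
```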
10. To evaluate YOLO on PASCAL VOC, we use S=7 and B=2. PASCAL VOC has twenty labeled classes, so C=20; the final prediction is therefore a 7x7x30 tensor, following the formula S x S x (B * 5 + C);
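The output size follows directly from the formula; a quick check in Python:

```python
S, B, C = 7, 2, 20                      # grid size, boxes per cell, classes
output_len = S * S * (B * 5 + C)        # 7 * 7 * 30 = 1470 values per image
```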
11. In the network design, the convolutional layers extract features from the image and the fully connected layers predict the output probabilities and coordinates;
12. The network is inspired by the GoogLeNet image-classification model: 24 convolutional layers followed by 2 fully connected layers, with 1x1 reduction layers followed by 3x3 convolutions in place of GoogLeNet's inception modules; the full design is shown below:
The authors pre-trained the classification task on ImageNet at 224x224 resolution, then doubled the resolution to 448x448 for detection;
Training
1. Pre-train on ImageNet;
2. Pre-training uses the first 20 convolutional layers, followed by an average-pooling layer and a fully connected layer;
3. Following a paper showing that adding both convolutional and fully connected layers to a pre-trained network can improve performance, the authors add 4 convolutional layers and 2 fully connected layers with randomly initialized weights; since detection often requires fine-grained visual information, the input resolution of the network is increased from 224x224 to 448x448;
4. The final layer predicts class probabilities and bbox coordinates; the authors normalize the bbox width and height by the image width and height so they fall between 0 and 1; the bbox x, y coordinates are parameterized as offsets within a particular grid cell, so they are also bounded between 0 and 1;
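A small sketch of this label parameterization (the helper name is mine; the 448/S=7 settings match the paper's defaults):

```python
def encode_box(x, y, w, h, s=7, img_size=448):
    """Encode an absolute box (center x, y and w, h in pixels) YOLO-style:
    x, y become offsets within their grid cell in [0, 1);
    w, h are divided by the image size so they also land in (0, 1]."""
    cx = x / img_size * s       # center in grid units
    cy = y / img_size * s
    col, row = int(cx), int(cy)
    x_off, y_off = cx - col, cy - row
    return row, col, (x_off, y_off, w / img_size, h / img_size)

# A 112x56 box centered at (224, 224) lands in cell (3, 3) with
# offsets (0.5, 0.5) and normalized size (0.25, 0.125).
```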
5. The final layer uses a linear activation; all other layers use the leaky rectified linear activation shown below (Leaky ReLU);
6. The model output is optimized with sum-squared error, because sum-squared error is easy to optimize; but it does not perfectly align with the goal of maximizing average precision: it weights localization error equally with classification error, which is not ideal; also, many grid cells in every image contain no object, which pushes the confidence scores of those cells toward zero, and this gradient often overpowers the gradient from cells that do contain objects; that can make the model unstable and cause training to diverge early on;
7. To remedy the problem in 6, the authors increase the loss from bbox coordinate predictions and decrease the loss from confidence predictions for boxes that contain no objects, using two parameters λcoord = 5 and λnoobj = 0.5;
8. Sum-squared error also weights errors in large boxes and small boxes equally, but for the metric a given deviation matters more in a small box than in a large one; to partially address this, the authors predict the square root of the bbox width and height instead of the width and height directly;
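A quick numeric check of why the square root helps: the same absolute width error (8 pixels on a 448-pixel image; the numbers are made up for illustration) produces a larger sqrt-space penalty for a small box than for a large one, whereas plain squared error would treat them identically:

```python
import math

def sqrt_err(w_true, w_pred):
    """Squared error between square roots of normalized widths."""
    return (math.sqrt(w_true) - math.sqrt(w_pred)) ** 2

big = sqrt_err(300 / 448, 308 / 448)    # large box, 8 px off
small = sqrt_err(20 / 448, 28 / 448)    # small box, 8 px off
# small > big: the sqrt parameterization penalizes the small box more,
# while plain squared error (8/448)**2 would be the same for both.
```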
9. The loss is designed as follows:
Here X(superscript # subscript) stands in for variables carrying super- and subscripts in the formula: 1(obj # i) denotes that an object appears in cell i; 1(obj # ij) denotes that the j-th bbox predictor in cell i is responsible for that prediction;
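For reference, the loss figure from the paper can be written out in full, using the indicator notation just described:

```latex
\begin{aligned}
\text{loss} ={}& \lambda_{\text{coord}} \sum_{i=0}^{S^2}\sum_{j=0}^{B}
  \mathbb{1}_{ij}^{\text{obj}} \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right] \\
&+ \lambda_{\text{coord}} \sum_{i=0}^{S^2}\sum_{j=0}^{B}
  \mathbb{1}_{ij}^{\text{obj}} \left[ \left(\sqrt{w_i} - \sqrt{\hat{w}_i}\right)^2
  + \left(\sqrt{h_i} - \sqrt{\hat{h}_i}\right)^2 \right] \\
&+ \sum_{i=0}^{S^2}\sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}} \left(C_i - \hat{C}_i\right)^2
 + \lambda_{\text{noobj}} \sum_{i=0}^{S^2}\sum_{j=0}^{B}
   \mathbb{1}_{ij}^{\text{noobj}} \left(C_i - \hat{C}_i\right)^2 \\
&+ \sum_{i=0}^{S^2} \mathbb{1}_{i}^{\text{obj}}
   \sum_{c \in \text{classes}} \left(p_i(c) - \hat{p}_i(c)\right)^2
\end{aligned}
```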
10. Training ran for 135 epochs on VOC 2007 and 2012 with a batch size of 64, using momentum of 0.9 and weight decay of 0.0005; the learning-rate schedule: over the first epochs the rate is slowly raised from 0.001 to 0.01, then held at 0.01 until epoch 75, then 0.001 for 30 epochs, and finally 0.0001 for the last 30 epochs;
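That schedule can be sketched as a function of the epoch number (the length of the initial ramp is an assumption; the paper only says "first few epochs"):

```python
def paper_lr(epoch, warmup=5):
    """Learning-rate schedule from the paper (135 epochs total):
    ramp from 1e-3 to 1e-2 over the first few epochs, hold 1e-2 until
    epoch 75, then 1e-3 for 30 epochs, then 1e-4 for the last 30.
    The warmup length is an assumption, not stated in the paper."""
    if epoch < warmup:
        return 1e-3 + (1e-2 - 1e-3) * epoch / warmup   # linear ramp
    if epoch < 75:
        return 1e-2
    if epoch < 105:
        return 1e-3
    return 1e-4
```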
11. To avoid overfitting, dropout and data augmentation are used: a dropout layer with rate 0.5 after the first fully connected layer; for augmentation, random scaling and translations of up to about 20% of the original image size, plus random adjustment of the image's saturation in HSV space by up to a factor of 1.5;
12. Some large objects span multiple grid cells and can be detected by more than one of them; this is where non-maximum suppression (NMS) comes in;
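A minimal NumPy sketch of greedy NMS on corner-format boxes (this standalone helper is mine, not part of the repository code):

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression over (x1, y1, x2, y2) boxes.
    Keep the highest-scoring box, drop any remaining box whose IoU
    with it exceeds iou_thresh, and repeat. Returns kept indices."""
    order = np.argsort(scores)[::-1]        # indices, best score first
    keep = []
    while order.size:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # intersection of box i with every remaining box
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.maximum(0.0, x2 - x1) * np.maximum(0.0, y2 - y1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter + 1e-10)
        order = rest[iou <= iou_thresh]     # suppress heavy overlaps
    return keep
```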
Limitations of YOLO
1. YOLO imposes strong spatial constraints on bbox predictions, since each grid cell predicts only two boxes and a single class; this limits how many nearby objects the model can predict;
2. The model also uses relatively coarse features to predict bboxes, because the architecture has multiple downsampling layers;
TensorFlow implementation:
Model code (covering both the pre-training and detection networks)
model.py:
import tensorflow as tf
import tensorflow.contrib.slim as slim
import net.Detection.YOLOV1.config as cfg
import numpy as np


class YOLO_Net(object):
    def __init__(self, is_pre_training=False, is_training=True):
        self.classes = cfg.VOC07_CLASS
        self.pre_train_num = cfg.PRE_TRAIN_NUM
        self.det_cls_num = len(self.classes)
        self.image_size = cfg.DET_IMAGE_SIZE
        self.cell_size = cfg.CELL_SIZE
        self.boxes_per_cell = cfg.PER_CELL_CHECK_BOXES
        self.output_size = (self.cell_size * self.cell_size) * (5 * self.boxes_per_cell + self.det_cls_num)
        self.scale = 1.0 * self.image_size / self.cell_size
        self.boundary1 = self.cell_size * self.cell_size * self.det_cls_num
        self.boundary2 = self.boundary1 + self.cell_size * self.cell_size * self.boxes_per_cell
        self.object_scale = cfg.OBJ_CONFIDENCE_SCALE
        self.no_object_scale = cfg.NO_OBJ_CONFIDENCE_SCALE
        self.class_scale = cfg.CLASS_SCALE
        self.coord_scale = cfg.COORD_SCALE
        self.learning_rate = 0.0001
        self.batch_size = cfg.BATCH_SIZE
        self.keep_prob = cfg.KEEP_PROB
        self.pre_training = is_pre_training
        # per-cell column offsets, shape [cell_size, cell_size, boxes_per_cell]
        self.offset = np.transpose(
            np.reshape(
                np.array(
                    [np.arange(self.cell_size)] * self.cell_size * self.boxes_per_cell
                ), (self.boxes_per_cell, self.cell_size, self.cell_size)
            ), (1, 2, 0)
        )
        self.bn_params = cfg.BATCH_NORM_PARAMS
        self.is_training = tf.placeholder(tf.bool)
        if self.pre_training:
            self.images = tf.placeholder(tf.float32, [None, 224, 224, 3], name='images')
        else:
            self.images = tf.placeholder(tf.float32, [None, self.image_size, self.image_size, 3], name='images')
        self.logits = self.build_network(self.images, is_training=self.is_training)
        if is_training:
            if self.pre_training:
                self.labels = tf.placeholder(tf.float32, [None, self.pre_train_num])
                self.classify_loss(self.logits, self.labels)
                self.total_loss = tf.losses.get_total_loss()
                self.evalution = self.classify_evalution(self.logits, self.labels)
                print('pre-training network')
            else:
                self.labels = tf.placeholder(tf.float32, [None, self.cell_size, self.cell_size, 5 + self.det_cls_num])
                self.det_loss_layer(self.logits, self.labels)
                self.total_loss = tf.losses.get_total_loss()
                tf.summary.scalar('total_loss', self.total_loss)
                print('detection network')
    def build_network(self, images, is_training=True, scope='yolov1'):
        net = images
        with tf.variable_scope(scope):
            with slim.arg_scope([slim.conv2d, slim.fully_connected],
                                weights_regularizer=slim.l2_regularizer(0.00004)):
                with slim.arg_scope([slim.conv2d],
                                    weights_initializer=slim.xavier_initializer(),
                                    normalizer_fn=slim.batch_norm,
                                    activation_fn=slim.nn.leaky_relu,
                                    normalizer_params=self.bn_params):
                    with slim.arg_scope([slim.batch_norm, slim.dropout], is_training=is_training):
                        net = slim.conv2d(net, 64, [7, 7], stride=2, padding='SAME', scope='layer1')
                        net = slim.max_pool2d(net, [2, 2], stride=2, padding='SAME', scope='pool1')
                        net = slim.conv2d(net, 192, [3, 3], stride=1, padding='SAME', scope='layer2')
                        net = slim.max_pool2d(net, [2, 2], stride=2, padding='SAME', scope='pool2')
                        net = slim.conv2d(net, 128, [1, 1], stride=1, padding='SAME', scope='layer3_1')
                        net = slim.conv2d(net, 256, [3, 3], stride=1, padding='SAME', scope='layer3_2')
                        net = slim.conv2d(net, 256, [1, 1], stride=1, padding='SAME', scope='layer3_3')
                        net = slim.conv2d(net, 512, [3, 3], stride=1, padding='SAME', scope='layer3_4')
                        net = slim.max_pool2d(net, [2, 2], stride=2, padding='SAME', scope='pool3')
                        net = slim.conv2d(net, 256, [1, 1], stride=1, padding='SAME', scope='layer4_1')
                        net = slim.conv2d(net, 512, [3, 3], stride=1, padding='SAME', scope='layer4_2')
                        net = slim.conv2d(net, 256, [1, 1], stride=1, padding='SAME', scope='layer4_3')
                        net = slim.conv2d(net, 512, [3, 3], stride=1, padding='SAME', scope='layer4_4')
                        net = slim.conv2d(net, 256, [1, 1], stride=1, padding='SAME', scope='layer4_5')
                        net = slim.conv2d(net, 512, [3, 3], stride=1, padding='SAME', scope='layer4_6')
                        net = slim.conv2d(net, 256, [1, 1], stride=1, padding='SAME', scope='layer4_7')
                        net = slim.conv2d(net, 512, [3, 3], stride=1, padding='SAME', scope='layer4_8')
                        net = slim.conv2d(net, 512, [1, 1], stride=1, padding='SAME', scope='layer4_9')
                        net = slim.conv2d(net, 1024, [3, 3], stride=1, padding='SAME', scope='layer4_10')
                        net = slim.max_pool2d(net, [2, 2], stride=2, padding='SAME', scope='pool4')
                        net = slim.conv2d(net, 512, [1, 1], stride=1, padding='SAME', scope='layer5_1')
                        net = slim.conv2d(net, 1024, [3, 3], stride=1, padding='SAME', scope='layer5_2')
                        net = slim.conv2d(net, 512, [1, 1], stride=1, padding='SAME', scope='layer5_3')
                        net = slim.conv2d(net, 1024, [3, 3], stride=1, padding='SAME', scope='layer5_4')
                        if self.pre_training:
                            net = slim.avg_pool2d(net, [7, 7], stride=1, padding='VALID', scope='clssify_avg5')
                            net = slim.flatten(net)
                            net = slim.fully_connected(net, self.pre_train_num, activation_fn=slim.nn.leaky_relu,
                                                       scope='classify_fc1')
                            return net
                        net = slim.conv2d(net, 1024, [3, 3], stride=1, padding='SAME', scope='layer5_5')
                        net = slim.conv2d(net, 1024, [3, 3], stride=2, padding='SAME', scope='layer5_6')
                        net = slim.conv2d(net, 1024, [3, 3], stride=1, padding='SAME', scope='layer6_1')
                        net = slim.conv2d(net, 1024, [3, 3], stride=1, padding='SAME', scope='layer6_2')
                        net = slim.flatten(net)
                        net = slim.fully_connected(net, 1024, activation_fn=slim.nn.leaky_relu, scope='fc1')
                        net = slim.dropout(net, 0.5)
                        net = slim.fully_connected(net, 4096, activation_fn=slim.nn.leaky_relu, scope='fc2')
                        net = slim.dropout(net, 0.5)
                        net = slim.fully_connected(net, self.output_size, activation_fn=None, scope='fc3')
                        # N, 7, 7, 30
                        # net = tf.reshape(net, [-1, S, S, B * 5 + C])
                        return net
    def classify_loss(self, logits, labels):
        with tf.name_scope('classify_loss') as scope:
            _loss = tf.nn.softmax_cross_entropy_with_logits_v2(logits=logits, labels=labels)
            mean_loss = tf.reduce_mean(_loss)
            tf.losses.add_loss(mean_loss)
            tf.summary.scalar(scope + 'classify_mean_loss', mean_loss)

    def classify_evalution(self, logits, labels):
        with tf.name_scope('classify_evaluation') as scope:
            correct_pre = tf.equal(tf.argmax(logits, 1), tf.argmax(labels, 1))
            accurary = tf.reduce_mean(tf.cast(correct_pre, 'float'))
            # tf.summary.scalar(scope + 'accuracy:', accurary)
            return accurary

    '''
    @:param predicts shape -> [N, 7x7x30]
    @:param labels   shape -> [N, 7, 7, 25] <==> [N, h, w, 25]
            (channel 0: whether the cell is responsible for a detection,
             channels 1-4: coordinates, channels 5-24: one-hot class labels)
    '''
    def det_loss_layer(self, predicts, labels, scope='det_loss'):
        with tf.variable_scope(scope):
            # class predictions -> [batch_size, cell_size, cell_size, num_cls]
            predict_classes = tf.reshape(predicts[:, :self.boundary1], [-1, 7, 7, 20])
            # confidence predictions -> [batch_size, cell_size, cell_size, boxes_per_cell]
            predict_scale = tf.reshape(predicts[:, self.boundary1:self.boundary2], [-1, 7, 7, 2])
            # coordinate predictions -> [batch_size, cell_size, cell_size, boxes_per_cell, 4]
            predict_boxes = tf.reshape(predicts[:, self.boundary2:], [-1, 7, 7, 2, 4])
            # label confidence, used to decide whether a cell is responsible for a detection
            response = tf.reshape(labels[:, :, :, 0], [-1, 7, 7, 1])
            # label coordinates
            boxes = tf.reshape(labels[:, :, :, 1:5], [-1, 7, 7, 1, 4])
            # tile the label box to match the 2 predicted boxes, and normalize to YOLO form
            boxes = tf.tile(boxes, [1, 1, 1, 2, 1]) / self.image_size
            # label classes
            classes = labels[:, :, :, 5:]
            offset = tf.constant(self.offset, dtype=tf.float32)
            offset = tf.reshape(offset, [1, 7, 7, 2])
            offset = tf.tile(offset, [tf.shape(boxes)[0], 1, 1, 1])
            predict_boxes_tran = tf.stack([
                1. * (predict_boxes[:, :, :, :, 0] + offset) / self.cell_size,
                1. * (predict_boxes[:, :, :, :, 1] + tf.transpose(offset, (0, 2, 1, 3))) / self.cell_size,
                tf.square(predict_boxes[:, :, :, :, 2]),
                tf.square(predict_boxes[:, :, :, :, 3])
            ], axis=-1)
            # predict_boxes_tran = tf.transpose(predict_boxes_tran, [1, 2, 3, 4, 0])
            iou_predict_truth = self.calc_iou(predict_boxes_tran, boxes)
            object_mask = tf.reduce_max(iou_predict_truth, 3, keep_dims=True)
            object_mask = tf.cast((iou_predict_truth >= object_mask), tf.float32) * response
            no_object_mask = tf.ones_like(object_mask, dtype=tf.float32) - object_mask
            boxes_tran = tf.stack([
                1. * boxes[:, :, :, :, 0] * 7 - offset,
                1. * boxes[:, :, :, :, 1] * 7 - tf.transpose(offset, (0, 2, 1, 3)),
                tf.sqrt(boxes[:, :, :, :, 2]),
                tf.sqrt(boxes[:, :, :, :, 3])
            ], axis=-1)
            # class loss
            class_delta = response * (predict_classes - classes)
            class_loss = tf.reduce_mean(tf.reduce_sum(tf.square(class_delta), axis=[1, 2, 3]),
                                        name='class_loss') * self.class_scale
            # object loss
            object_delta = object_mask * (predict_scale - iou_predict_truth)
            object_loss = tf.reduce_mean(tf.reduce_sum(tf.square(object_delta), axis=[1, 2, 3]),
                                         name='object_loss') * self.object_scale
            # no-object loss
            no_object_delta = no_object_mask * predict_scale
            no_object_loss = tf.reduce_mean(tf.reduce_sum(tf.square(no_object_delta), axis=[1, 2, 3]),
                                            name='no_object_loss') * self.no_object_scale
            # coordinate loss
            coord_mask = tf.expand_dims(object_mask, 4)
            boxes_delta = coord_mask * (predict_boxes - boxes_tran)
            coord_loss = tf.reduce_mean(tf.reduce_sum(tf.square(boxes_delta), axis=[1, 2, 3, 4]),
                                        name='coord_loss') * self.coord_scale
            tf.losses.add_loss(class_loss)
            tf.losses.add_loss(object_loss)
            tf.losses.add_loss(no_object_loss)
            tf.losses.add_loss(coord_loss)
            tf.summary.scalar('class_loss', class_loss)
            tf.summary.scalar('object_loss', object_loss)
            tf.summary.scalar('noobject_loss', no_object_loss)
            tf.summary.scalar('coord_loss', coord_loss)
            tf.summary.histogram('boxes_delta_x', boxes_delta[:, :, :, :, 0])
            tf.summary.histogram('boxes_delta_y', boxes_delta[:, :, :, :, 1])
            tf.summary.histogram('boxes_delta_w', boxes_delta[:, :, :, :, 2])
            tf.summary.histogram('boxes_delta_h', boxes_delta[:, :, :, :, 3])
            tf.summary.histogram('iou', iou_predict_truth)
    def calc_iou(self, boxes1, boxes2, scope='iou'):
        """Calculate IOUs.
        Args:
            boxes1: 5-D tensor [BATCH, CELL_SIZE, CELL_SIZE, BOXES_PER_CELL, 4] ==> (x_center, y_center, w, h)
            boxes2: 5-D tensor [BATCH, CELL_SIZE, CELL_SIZE, BOXES_PER_CELL, 4] ==> (x_center, y_center, w, h)
        Return:
            iou: 4-D tensor [BATCH, CELL_SIZE, CELL_SIZE, BOXES_PER_CELL]
        """
        with tf.variable_scope(scope):
            # convert (x_center, y_center, w, h) to corner form (x1, y1, x2, y2)
            boxes1 = tf.stack([boxes1[:, :, :, :, 0] - boxes1[:, :, :, :, 2] / 2.0,
                               boxes1[:, :, :, :, 1] - boxes1[:, :, :, :, 3] / 2.0,
                               boxes1[:, :, :, :, 0] + boxes1[:, :, :, :, 2] / 2.0,
                               boxes1[:, :, :, :, 1] + boxes1[:, :, :, :, 3] / 2.0], axis=-1)
            # boxes1 = tf.transpose(boxes1, [1, 2, 3, 4, 0])
            boxes2 = tf.stack([boxes2[:, :, :, :, 0] - boxes2[:, :, :, :, 2] / 2.0,
                               boxes2[:, :, :, :, 1] - boxes2[:, :, :, :, 3] / 2.0,
                               boxes2[:, :, :, :, 0] + boxes2[:, :, :, :, 2] / 2.0,
                               boxes2[:, :, :, :, 1] + boxes2[:, :, :, :, 3] / 2.0], axis=-1)
            # boxes2 = tf.transpose(boxes2, [1, 2, 3, 4, 0])
            lu = tf.maximum(boxes1[:, :, :, :, :2], boxes2[:, :, :, :, :2])
            rd = tf.minimum(boxes1[:, :, :, :, 2:], boxes2[:, :, :, :, 2:])
            intersection = tf.maximum(0.0, rd - lu)
            inter_square = intersection[:, :, :, :, 0] * intersection[:, :, :, :, 1]
            square1 = (boxes1[:, :, :, :, 2] - boxes1[:, :, :, :, 0]) * \
                      (boxes1[:, :, :, :, 3] - boxes1[:, :, :, :, 1])
            square2 = (boxes2[:, :, :, :, 2] - boxes2[:, :, :, :, 0]) * \
                      (boxes2[:, :, :, :, 3] - boxes2[:, :, :, :, 1])
            union_square = tf.maximum(square1 + square2 - inter_square, 1e-10)
            return tf.clip_by_value(inter_square / union_square, 0.0, 1.0)
Training (covering both classification pre-training and detection training):
solver.py
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Created by solver on 19-5-6
import tensorflow as tf
from net.Detection.YOLOV1.model import YOLO_Net
import net.Detection.YOLOV1.config as cfg
import tensorflow.contrib.slim as slim
from net.Detection.YOLOV1.voc07_img import Pascal_voc
from coms.learning_rate import CLR_EXP_RANGE
from coms.utils import isHasGpu, isLinuxSys
import time, os
from coms.pre_process import get_cifar10_batch
import net.Detection.YOLOV1.voc07_tfrecord as VOC07RECORDS


class Solver(object):
    def __init__(self, net, data, tf_records=False):
        self.net = net
        self.data = data
        self.tf_records = tf_records
        self.batch_size = cfg.BATCH_SIZE
        self.clr = CLR_EXP_RANGE()
        self.log_dir = cfg.LOG_DIR
        self.model_cls_dir = cfg.CLS_MODEL_DIR
        self.model_det_dir = cfg.DET_MODEL_DIR
        self.learning_rate = tf.placeholder(tf.float32)
        self.re_train = True
        tf.summary.scalar('learning_rate', self.learning_rate)
        self.optimizer = self.optimizer_bn(lr=self.learning_rate, loss=self.net.total_loss)
        if isHasGpu():
            gpu_option = tf.GPUOptions(allow_growth=True)
            config = tf.ConfigProto(allow_soft_placement=True, gpu_options=gpu_option)
        else:
            config = tf.ConfigProto(allow_soft_placement=True)
        self.sess = tf.Session(config=config)
        self.sess.run(tf.global_variables_initializer())
        self.summary_op = tf.summary.merge_all()
        n_time = time.strftime("%Y-%m-%d %H-%M", time.localtime())
        self.writer = tf.summary.FileWriter(os.path.join(self.log_dir, n_time), self.sess.graph)
        self.saver = tf.train.Saver(max_to_keep=4)
    def train_classify(self):
        self.set_classify_params()
        max_acc = 0.
        coord = tf.train.Coordinator()
        threads = tf.train.start_queue_runners(sess=self.sess, coord=coord)
        for epoch in range(cfg.EPOCH):
            for step in range(1, cfg.ITER_STEP + 1):
                learning_rate_val = self.clr.calc_lr(step, cfg.ITER_STEP + 1, 0.001, 0.01, gamma=0.9998)
                train_img_batch, train_label_batch = self.sess.run([self.train_img_batch, self.train_label_batch])
                feed_dict_train = {self.net.images: train_img_batch, self.net.labels: train_label_batch,
                                   self.net.is_training: True, self.learning_rate: learning_rate_val}
                _, summary_op, batch_train_loss, batch_train_acc = self.sess.run(
                    [self.optimizer, self.summary_op, self.net.total_loss, self.net.evalution],
                    feed_dict=feed_dict_train)
                global_step = int(epoch * cfg.ITER_STEP + step + 1)
                print("epoch %d , step %d train end , loss is : %f , accuracy is %f ... ..." % (
                    epoch, step, batch_train_loss, batch_train_acc))
                train_summary = tf.Summary(
                    value=[tf.Summary.Value(tag='train_loss', simple_value=batch_train_loss),
                           tf.Summary.Value(tag='train_batch_accuracy', simple_value=batch_train_acc),
                           tf.Summary.Value(tag='learning_rate', simple_value=learning_rate_val)])
                self.writer.add_summary(train_summary, global_step=global_step)
                self.writer.add_summary(summary_op, global_step=global_step)
                self.writer.flush()
                if step % 100 == 0:
                    print('test sets evaluation start ...')
                    ac_iter = int(10000 / self.batch_size)  # the CIFAR-10 test set has 10000 images
                    ac_sum = 0.
                    loss_sum = 0.
                    for ac_count in range(ac_iter):
                        batch_test_img, batch_test_label = self.sess.run([self.test_img_batch, self.test_label_batch])
                        feed_dict_test = {self.net.images: batch_test_img, self.net.labels: batch_test_label,
                                          self.net.is_training: False, self.learning_rate: learning_rate_val}
                        test_loss, test_accuracy = self.sess.run([self.net.total_loss, self.net.evalution],
                                                                 feed_dict=feed_dict_test)
                        ac_sum += test_accuracy
                        loss_sum += test_loss
                    ac_mean = ac_sum / ac_iter
                    loss_mean = loss_sum / ac_iter
                    print('epoch {} , step {} , accuracy is {}'.format(str(epoch), str(step), str(ac_mean)))
                    test_summary = tf.Summary(
                        value=[tf.Summary.Value(tag='test_loss', simple_value=loss_mean),
                               tf.Summary.Value(tag='test_accuracy', simple_value=ac_mean)])
                    self.writer.add_summary(test_summary, global_step=global_step)
                    self.writer.flush()
                    if ac_mean >= max_acc:
                        max_acc = ac_mean
                        self.saver.save(self.sess, self.model_cls_dir + '/' + 'cifar10_{}_step_{}.ckpt'.format(
                            str(epoch), str(step)), global_step=step)
                        print('max accuracy reached, model saved successfully ...')
        print('training task finished')

    def set_classify_params(self):
        self.train_img_batch, self.train_label_batch = get_cifar10_batch(
            is_train=True, batch_size=self.batch_size, num_cls=cfg.PRE_TRAIN_NUM, img_prob=[224, 224, 3])
        self.test_img_batch, self.test_label_batch = get_cifar10_batch(
            is_train=False, batch_size=self.batch_size, num_cls=cfg.PRE_TRAIN_NUM, img_prob=[224, 224, 3])
    def train_detector(self):
        self.set_detector_params()
        for epoch in range(cfg.EPOCH):
            for step in range(1, cfg.ITER_STEP + 1):
                global_step = int(epoch * cfg.ITER_STEP + step + 1)
                learning_rate_val = self.clr.calc_lr(step, cfg.ITER_STEP + 1, 0.0001, 0.0005, gamma=0.9998)
                if self.tf_records:
                    train_images, train_labels = self.sess.run(self.train_next_elements)
                else:
                    train_images, train_labels = self.data.next_batch(self.gt_labels_train, self.batch_size)
                feed_dict_train = {self.net.images: train_images, self.net.labels: train_labels,
                                   self.learning_rate: learning_rate_val, self.net.is_training: True}
                _, summary_str, train_loss = self.sess.run([self.optimizer, self.summary_op, self.net.total_loss],
                                                           feed_dict=feed_dict_train)
                print("epoch %d , step %d train end , loss is : %f ... ..." % (epoch, step, train_loss))
                self.writer.add_summary(summary_str, global_step)
                if step % 50 == 0:
                    print('test sets start ...')
                    # the test set contains 4962 images in total
                    sum_loss = 0.
                    # test_iter = int(4962 / self.batch_size)
                    test_iter = 10  # average the loss over 10 batches
                    for _ in range(test_iter):
                        if self.tf_records:
                            test_images, test_labels = self.sess.run(self.test_next_elements)
                        else:
                            test_images, test_labels = self.data.next_batch(self.gt_labels_test, self.batch_size)
                        feed_dict_test = {self.net.images: test_images, self.net.labels: test_labels,
                                          self.net.is_training: False}
                        loss_iter = self.sess.run(self.net.total_loss, feed_dict=feed_dict_test)
                        sum_loss += loss_iter
                    mean_loss = sum_loss / test_iter
                    print('epoch {} , step {} , test loss is {}'.format(str(epoch), str(step), str(mean_loss)))
                    test_summary = tf.Summary(
                        value=[tf.Summary.Value(tag='test_loss', simple_value=mean_loss)])
                    self.writer.add_summary(test_summary, global_step=global_step)
                    self.writer.flush()
                    self.saver.save(self.sess, self.model_det_dir + '/' + 'det_voc07_{}_step_{}.ckpt'.format(
                        str(epoch), str(step)), global_step=step)
                    print('model saved successfully ...')

    def set_detector_params(self):
        if self.tf_records:
            train_records_path = r'/home/ws/DataSets/pascal_VOC/VOC07/tfrecords' + '/trainval.tfrecords'
            test_records_path = r'/home/ws/DataSets/pascal_VOC/VOC07/tfrecords' + '/test.tfrecords'
            train_datasets = VOC07RECORDS.DataSets(record_path=train_records_path, batch_size=self.batch_size)
            train_gen = train_datasets.transform(shuffle=True)
            train_iterator = train_gen.make_one_shot_iterator()
            self.train_next_elements = train_iterator.get_next()
            test_datasets = VOC07RECORDS.DataSets(record_path=test_records_path, batch_size=self.batch_size)
            test_gen = test_datasets.transform(shuffle=True)
            test_iterator = test_gen.make_one_shot_iterator()
            self.test_next_elements = test_iterator.get_next()
        else:
            self.gt_labels_train = self.data.prepare('train')
            self.gt_labels_test = self.data.prepare('test')
        if self.re_train:
            self.load_det_model()
        else:
            self.load_pre_train_model()
    def load_pre_train_model(self):
        # restore the pre-trained classification weights, excluding the classifier head
        net_vars = slim.get_model_variables()
        model_file = tf.train.latest_checkpoint(self.model_cls_dir)
        reader = tf.train.NewCheckpointReader(model_file)
        model_vars = reader.get_variable_to_shape_map()
        exclude = ['yolov1/classify_fc1/weights', 'yolov1/classify_fc1/biases']
        vars_restore_map = {}
        for var in net_vars:
            if var.op.name in model_vars and var.op.name not in exclude:
                vars_restore_map[var.op.name] = var
        self.saver = tf.train.Saver(vars_restore_map, max_to_keep=4)
        self.saver.restore(self.sess, model_file)
        self.saver = tf.train.Saver(var_list=net_vars, max_to_keep=4)

    def load_det_model(self):
        # self.saver = tf.train.Saver(max_to_keep=4)
        net_vars = slim.get_model_variables()
        self.saver = tf.train.Saver(net_vars, max_to_keep=4)
        model_file = tf.train.latest_checkpoint(self.model_det_dir)
        self.saver.restore(self.sess, model_file)

    # training op for networks with BN: runs the batch-norm update ops before each step
    def optimizer_bn(self, lr, loss, mom=0.9, fun='mm'):
        with tf.name_scope('optimzer_bn'):
            update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
            with tf.control_dependencies([tf.group(*update_ops)]):
                optim = tf.train.MomentumOptimizer(learning_rate=lr, momentum=0.9)
                train_op = slim.learning.create_train_op(loss, optim)
            return train_op
def train_classify():
    yolov1 = YOLO_Net(is_pre_training=True)
    sovler = Solver(net=yolov1, data=0)
    print('start ...')
    sovler.train_classify()


def train_detector():
    yolov1 = YOLO_Net(is_pre_training=False)
    pasvoc07 = Pascal_voc()
    sovler = Solver(net=yolov1, data=pasvoc07)
    print('start train ...')
    sovler.train_detector()


def train_detector_with_records():
    yolov1 = YOLO_Net(is_pre_training=False)
    sovler = Solver(net=yolov1, data=0, tf_records=True)
    print('start train ...')
    sovler.train_detector()


if __name__ == '__main__':
    train_detector_with_records()
As for the test code, I have put both the detector and the classifier on GitHub, so I won't paste them here. YOLO completes recognition and localization in a single pass, and forcing the network to train this way is the most "neural network"-like style. But without question, each cell predicts a fixed number of boxes, so whenever detection targets are close together there is a risk of missed detections and recall drops; YOLOv3 later improved on these problems.
Personal training notes:
Pre-training used the CIFAR-10 dataset, which is small and low resolution, so the detection training suffered considerably, especially since the CIFAR-10 and VOC07 class sets largely do not match;
Detection training used the VOC07 dataset; all told I trained for over a thousand epochs, changing the batch size through 32, 64, 96, and 128 when resuming from checkpoints; the screenshots here show only the initial training runs, and I won't include screenshots of the later resumed runs;
My network follows YOLOv1, the only difference being that BN was added to speed up training;
Classification training screenshots:
Detection training screenshots:
Detection results:
Conclusion:
The detection results this time are mediocre; there are quite a few false positives and missed detections;
YOLO's design seems strange to many people coming from the two-stage R-CNN school, and many struggle to understand it, especially when writing the code. I recommend studying other people's code; I too could only write the YOLO code after consulting other implementations. The most important part is how YOLO labels are constructed, which differs greatly from other methods. The detection results are fairly mediocre; that has a lot to do with my pre-training dataset, but it is also partly due to real design problems in YOLOv1 itself, which later papers went on to correct. In short, good results require a good dataset, a good model, and good training techniques.
All the code is on my personal GitHub, including the YOLO model, training (pre-training and detection training), detection (detection and classification inference), and YOLO label construction (both the image version and the tfrecords version); it should be about the most complete TensorFlow YOLOv1 implementation currently available online;
My trained weights have been uploaded to Baidu Cloud for you to download and try. Weight files: https://pan.baidu.com/s/1BdMZYvkiYT9Fts0dLIgrog
Extraction code: 0rmi
Code references:
[1] https://github.com/TowardsNorth/yolo_v1_tensorflow_guiyu
[2] https://github.com/leeyoshinari/YOLO_v1