[Deep Learning: Model Eval + Model Export] Evaluating and exporting a trained model with TensorFlow Slim

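Previous articles have already covered:

Step 1: how to convert your raw image data into TF-Record format (see: Creating TF-Record files);

Step 2: how to use the resulting TF-Record files to train a model on Inception V3 (see: Model fine-tuning and retraining all the weights).

After these two steps, the following files are generated in the training directory (mine is slim/satellite/train_dir/):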

[Figure: files generated in the training directory]

The training command I used is:

python train_image_classifier.py \
  --train_dir=satellite/train_dir \
  --dataset_name=satellite \
  --dataset_split_name=train \
  --dataset_dir=satellite/data \
  --model_name=inception_v3 \
  --checkpoint_path=satellite/pretrained/inception_v3.ckpt \
  --checkpoint_exclude_scopes=InceptionV3/Logits,InceptionV3/AuxLogits \
  --trainable_scopes=InceptionV3/Logits,InceptionV3/AuxLogits \
  --max_number_of_steps=100000 \
  --batch_size=32 \
  --learning_rate=0.001 \
  --learning_rate_decay_type=fixed \
  --save_interval_secs=300 \
  --save_summaries_secs=2 \
  --log_every_n_steps=10 \
  --optimizer=rmsprop \
  --weight_decay=0.00004

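For the meaning of each parameter, see "Model fine-tuning and retraining all the weights". Training ran for 100,000 steps, but eval can be run to assess the model as soon as training has started. In practice, eval also has to load the ckpt files and takes up a fair amount of memory, while the training batch size is usually tuned to make good use of the GPU. So to run train and eval at the same time, the memory needed by both has to be budgeted.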

Model evaluation

The command for validating the model is as follows (run it from the slim folder):

python eval_image_classifier.py \
  --checkpoint_path=satellite/train_dir \
  --eval_dir=satellite/eval_dir \
  --dataset_name=satellite \
  --dataset_split_name=validation \
  --dataset_dir=satellite/data \
  --model_name=inception_v3

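--checkpoint_path is where the model files are stored. This parameter accepts either a directory path or a file path. Given a directory, such as satellite/train_dir here, it looks for the most recently saved model in that directory and evaluates it. A specific model can also be evaluated. Taking the model at step 300 as an example, it is saved in the satellite/train_dir folder as three files: model.ckpt-300.meta, model.ckpt-300.index and model.ckpt-300.data-00000-of-00001. To evaluate it, the value passed to checkpoint_path should be

satellite/train_dir/model.ckpt-300

--eval_dir is where the evaluation results are stored; --dataset_name is the dataset name; --dataset_split_name is the dataset split (validation or train, which decides which TF-Record files are read from the dataset directory); --dataset_dir is the directory holding the validation TF-Record files; --model_name is the name of the network architecture.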
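The directory-versus-file behaviour of --checkpoint_path can be sketched in plain Python. This is only an approximation under stated assumptions: the helper name resolve_checkpoint is hypothetical, and it picks the newest checkpoint by scanning for model.ckpt-<step>.index files, whereas TensorFlow itself consults the checkpoint state file in the directory (through tf.train.latest_checkpoint).

```python
import os
import re
import tempfile


def resolve_checkpoint(checkpoint_path):
    """Return a checkpoint prefix such as 'train_dir/model.ckpt-300'.

    Hypothetical helper: a directory resolves to its highest-step checkpoint,
    anything else is returned unchanged as an explicit checkpoint prefix.
    """
    if not os.path.isdir(checkpoint_path):
        return checkpoint_path
    pattern = re.compile(r"^(model\.ckpt-(\d+))\.index$")
    best_step, best_prefix = -1, None
    for name in os.listdir(checkpoint_path):
        match = pattern.match(name)
        if match and int(match.group(2)) > best_step:
            best_step, best_prefix = int(match.group(2)), match.group(1)
    if best_prefix is None:
        raise FileNotFoundError("no checkpoint found in %s" % checkpoint_path)
    return os.path.join(checkpoint_path, best_prefix)


# Demo with empty placeholder files standing in for real checkpoints.
demo_dir = tempfile.mkdtemp()
for step in (300, 1200, 100000):
    for suffix in (".meta", ".index", ".data-00000-of-00001"):
        open(os.path.join(demo_dir, "model.ckpt-%d%s" % (step, suffix)), "w").close()

latest = resolve_checkpoint(demo_dir)
explicit = resolve_checkpoint(os.path.join(demo_dir, "model.ckpt-300"))
print(latest)
print(explicit)
```

Note that the explicit form is a prefix, not a real file name: no file called model.ckpt-300 exists on disk, only the three files that share that prefix.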

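This uses the eval_image_classifier.py file that ships with the slim module. It has many default parameters; for example, the batch size defaults to 100 and 4 threads are used by default.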

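The evaluation results are printed as follows (on a TITAN X with 12 GB of memory):

Accuracy is the classification accuracy, while Recall_5 is the Top-5 accuracy: a prediction counts as correct as long as the true class is among the 5 classes with the highest predicted probability. Since there are only 6 target classes in total, Top-5 can be changed to Top-2 or Top-3 accuracy. To do so, modify the following in eval_image_classifier.py: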

# Define the metrics:
names_to_values, names_to_updates = slim.metrics.aggregate_metric_map({
    'Accuracy': slim.metrics.streaming_accuracy(predictions, labels),
    'Recall_5': slim.metrics.streaming_recall_at_k(
        logits, labels, 5),
})

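Just change the 5 passed to streaming_recall_at_k to 2 or 3. Running the evaluation command again after the change gives the following result (the recall value drops):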

[Figure: evaluation output after changing the top-k value]
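Why the value drops when k shrinks can be seen from a small pure-Python sketch of top-k accuracy. It is an illustration only, not the slim implementation: the function name top_k_accuracy and the logits are made up, and tie handling may differ from streaming_recall_at_k.

```python
def top_k_accuracy(logits, labels, k):
    """Fraction of samples whose true label is among the k highest logits."""
    hits = 0
    for row, label in zip(logits, labels):
        # Indices of the k largest scores in this row, best first.
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += label in top_k
    return hits / len(labels)


# Made-up logits for 4 samples over 6 classes (illustrative values only).
logits = [
    [2.0, 0.1, 0.3, 0.0, -1.0, 0.5],  # true class 0 ranks 1st
    [0.2, 1.5, 1.9, 0.0, 0.1, -0.3],  # true class 1 ranks 2nd
    [0.9, 0.8, 0.1, 0.7, 0.0, -0.5],  # true class 3 ranks 3rd
    [1.0, 0.9, 0.8, 0.7, 0.6, 0.5],   # true class 5 ranks 6th
]
labels = [0, 1, 3, 5]

for k in (1, 2, 3, 5):
    print(k, top_k_accuracy(logits, labels, k))
```

Every sample counted as a hit for a small k is also a hit for any larger k, so the metric can only stay the same or fall as k is reduced. With just 6 classes, Top-5 misses only when the true class is ranked last, which is why a smaller k is the more informative choice here.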

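An eval_dir folder is created under slim/satellite/ holding the evaluation results, which can be viewed with TensorBoard.

Just enter on the command line: tensorboard --logdir=your path\eval_dir

The events.out.tfevents file in the first figure of this article is the log file in the training output directory (mine is as large as 8 GB; the size depends on factors such as the number of training steps). It can also be viewed with TensorBoard.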

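Exporting the trained model

As the first figure shows, training leaves .meta, .index, .ckpt and checkpoint files in train_dir. The .meta file stores the graph and metadata, while the ckpt files store the network weights. Prediction in a production environment needs only the model and its weights, not the metadata, so they are extracted with a freeze operation that puts the required parts into a single file. This makes later use easier and reduces the memory needed to load the model. (After unpacking a downloaded pretrained model you will find 4 files; the one named frozen_inference_graph.pb is the model file produced by freezing. It is larger than the weights file, but considerably smaller than the weights and meta files combined.)

Two code files are used: freeze_graph.py, which ships with TensorFlow, and classify_image_inception_v3.py. The former exports a model that can be used for recognition; the latter is a script that uses the inception_v3 model to recognise a single image.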

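tensorflow/python/tools/freeze_graph.py provides the API for freezing a model, but it needs the names of the final output nodes (usually the name of the last-layer activation, such as softmax). The object detection API provides pretrained networks whose final node name is not easy to find, so the object_detection directory also provides export_inference_graph.py (here it is placed in the slim directory).

First copy freeze_graph.py into the directory that contains slim.

Run the following command: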

python export_inference_graph.py \
  --alsologtostderr \
  --model_name=inception_v3 \
  --output_file=satellite/inception_v3_inf_graph.pb \
  --dataset_name satellite

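This generates a pb file in the satellite folder. Note: inception_v3_inf_graph.pb contains only the Inception V3 network structure, not the parameters obtained from training, so the model parameters in the checkpoint need to be saved into it. This is done with the freeze_graph.py script (run in the chapter_3 folder):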

python freeze_graph.py \
  --input_graph slim/satellite/inception_v3_inf_graph.pb \
  --input_checkpoint slim/satellite/train_dir/model.ckpt-100000 \
  --input_binary true \
  --output_node_names InceptionV3/Predictions/Reshape_1 \
  --output_graph slim/satellite/frozen_graph.pb

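The parameters mean the following:

• --input_graph slim/satellite/inception_v3_inf_graph.pb. The network structure file to use, i.e. the inception_v3_inf_graph.pb exported earlier.

• --input_checkpoint slim/satellite/train_dir/model.ckpt-100000. Which checkpoint's parameters to load into the network structure. Here it is the model file from step 100000 in the training folder train_dir. Change 100000 to match the actual step number of the checkpoint in your training folder.

• --input_binary true. The imported inception_v3_inf_graph.pb is a protobuf file. Protobuf files can be stored in two formats, text and binary. inception_v3_inf_graph.pb is binary, so the parameter is --input_binary true. Beginners need not dig into this; if you are interested, consult the reference material.

• --output_node_names InceptionV3/Predictions/Reshape_1. Specifies an output node in the exported model. InceptionV3/Predictions/Reshape_1 is the final output layer of Inception V3.

• --output_graph slim/satellite/frozen_graph.pb. The exported model is finally saved as slim/satellite/frozen_graph.pb.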
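What these parameters combine to do can be illustrated with a toy model of freezing in plain Python. It is a conceptual sketch, not the freeze_graph.py code: the graph is a simple dictionary rather than a GraphDef, the freeze function and all node names are hypothetical, and the checkpoint is a dictionary of values.

```python
def freeze(graph, checkpoint, output_node):
    """Toy version of freezing: inline variable values, keep only needed nodes.

    graph: dict mapping node name -> {"op": str, "inputs": [node names]}
    checkpoint: dict mapping variable name -> value (stands in for model.ckpt-N)
    """
    # 1. Walk backwards from the output node to find every node it depends on.
    needed, stack = set(), [output_node]
    while stack:
        name = stack.pop()
        if name not in needed:
            needed.add(name)
            stack.extend(graph[name]["inputs"])
    # 2. Copy those nodes, turning each Variable into a Const holding its value.
    frozen = {}
    for name in needed:
        node = graph[name]
        if node["op"] == "Variable":
            frozen[name] = {"op": "Const", "inputs": [], "value": checkpoint[name]}
        else:
            frozen[name] = {"op": node["op"], "inputs": list(node["inputs"])}
    return frozen


# Hypothetical miniature graph; the node names are illustrative only.
graph = {
    "input": {"op": "Placeholder", "inputs": []},
    "weights": {"op": "Variable", "inputs": []},
    "logits": {"op": "MatMul", "inputs": ["input", "weights"]},
    "predictions": {"op": "Softmax", "inputs": ["logits"]},
    "global_step": {"op": "Variable", "inputs": []},                 # training only
    "save": {"op": "SaveV2", "inputs": ["weights", "global_step"]},  # training only
}
checkpoint = {"weights": [[0.1, 0.2], [0.3, 0.4]], "global_step": 100000}

frozen = freeze(graph, checkpoint, "predictions")
print(sorted(frozen))
```

In this picture, graph plays the role of --input_graph, checkpoint of --input_checkpoint, and output_node of --output_node_names. Nodes that the output does not depend on are left out, which is one reason the output node name has to be given.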

The files finally generated are as follows:

[Figure: listing of the generated files]

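Using the model

You can directly use the officially provided notebook https://github.com/tensorflow/models/blob/master/research/object_detection/object_detection_tutorial.ipynb and test it with Jupyter Notebook.

The classify_image_inception_v3.py file provided by He (original code below) predicts the class of a single image.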

# Copyright 2015 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import argparse
import sys

import numpy as np
import tensorflow as tf

FLAGS = None

class NodeLookup(object):
  def __init__(self, label_lookup_path=None):
    self.node_lookup = self.load(label_lookup_path)

  def load(self, label_lookup_path):
    node_id_to_name = {}
    with open(label_lookup_path) as f:
      for index, line in enumerate(f):
        node_id_to_name[index] = line.strip()
    return node_id_to_name

  def id_to_string(self, node_id):
    if node_id not in self.node_lookup:
      return ''
    return self.node_lookup[node_id]


def create_graph():
  """Imports the frozen GraphDef at FLAGS.model_path into the default graph."""
  # Creates graph from saved graph_def.pb.
  with tf.gfile.FastGFile(FLAGS.model_path, 'rb') as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())
    _ = tf.import_graph_def(graph_def, name='')

def preprocess_for_eval(image, height, width,
                        central_fraction=0.875, scope=None):
  with tf.name_scope(scope, 'eval_image', [image, height, width]):
    if image.dtype != tf.float32:
      image = tf.image.convert_image_dtype(image, dtype=tf.float32)
    # Crop the central region of the image with an area containing 87.5% of
    # the original image.
    if central_fraction:
      image = tf.image.central_crop(image, central_fraction=central_fraction)

    if height and width:
      # Resize the image to the specified height and width.
      image = tf.expand_dims(image, 0)
      image = tf.image.resize_bilinear(image, [height, width],
                                       align_corners=False)
      image = tf.squeeze(image, [0])
    image = tf.subtract(image, 0.5)
    image = tf.multiply(image, 2.0)
    return image

def run_inference_on_image(image):
  """Runs inference on an image.
  Args:
    image: Image file name.
  Returns:
    Nothing
  """
  with tf.Graph().as_default():
    # Read the raw JPEG bytes and close the file handle afterwards.
    with tf.gfile.FastGFile(image, 'rb') as f:
      image_data = f.read()
    image_data = tf.image.decode_jpeg(image_data)
    image_data = preprocess_for_eval(image_data, 299, 299)
    image_data = tf.expand_dims(image_data, 0)
    with tf.Session() as sess:
      image_data = sess.run(image_data)

  # Creates graph from saved GraphDef.
  create_graph()

  with tf.Session() as sess:
    # Despite the variable name, this tensor holds the raw logits (pre-softmax).
    softmax_tensor = sess.graph.get_tensor_by_name('InceptionV3/Logits/SpatialSqueeze:0')
    predictions = sess.run(softmax_tensor,
                           {'input:0': image_data})
    predictions = np.squeeze(predictions)

    # Creates node ID --> English string lookup.
    node_lookup = NodeLookup(FLAGS.label_path)

    top_k = predictions.argsort()[-FLAGS.num_top_predictions:][::-1]
    for node_id in top_k:
      human_string = node_lookup.id_to_string(node_id)
      score = predictions[node_id]
      print('%s (score = %.5f)' % (human_string, score))


def main(_):
  image = FLAGS.image_file
  run_inference_on_image(image)


if __name__ == '__main__':
  parser = argparse.ArgumentParser()
  parser.add_argument(
      '--model_path',
      type=str,
  )
  parser.add_argument(
      '--label_path',
      type=str,
  )
  parser.add_argument(
      '--image_file',
      type=str,
      default='',
      help='Absolute path to image file.'
  )
  parser.add_argument(
      '--num_top_predictions',
      type=int,
      default=5,
      help='Display this many predictions.'
  )
  FLAGS, unparsed = parser.parse_known_args()
  tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)

The command to run this script is:

python classify_image_inception_v3.py \
  --model_path slim/satellite/frozen_graph.pb \
  --label_path data_prepare/pic/label.txt \
  --image_file test_image.jpg

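--model_path is the frozen_graph.pb model exported earlier. The model's output is really just "class 0", "class 1" and so on, so --label_path points to a label file that stores the class names in order, letting the script convert a class id into the actual class name. --image_file is the single image to test.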
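A minimal sketch of how such a label file maps ids to names, mirroring what NodeLookup.load does in the script. The helper name load_labels is hypothetical, and the class names and their order below are illustrative assumptions, not the real contents of label.txt.

```python
import os
import tempfile


def load_labels(label_path):
    """Map line index -> class name, one class name per line."""
    with open(label_path) as f:
        return {index: line.strip() for index, line in enumerate(f)}


# Hypothetical label file: names and order are for illustration only.
label_path = os.path.join(tempfile.mkdtemp(), "label.txt")
with open(label_path, "w") as f:
    f.write("urban\nwater\nwetland\nwood\n")

id_to_name = load_labels(label_path)
print(id_to_name[1])
```

Because the lookup is purely positional, the line order in the label file has to match the class ids that were assigned when the TF-Record files were created; otherwise the printed names will be wrong even though the predictions are right.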

The test result is as follows:

[Figure: prediction output for the test image]

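This shows that the most likely class for the image is water, followed by wetland, urban, wood and so on. score is the logit of each class. By default the logits of the top 5 classes are printed; the default can be changed with the --num_top_predictions parameter when running the script.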
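Since the printed scores are logits rather than probabilities, they can be turned into probabilities with a softmax. The sketch below does this in plain Python; the logits are made-up values and the helper names are hypothetical. Fetching InceptionV3/Predictions/Reshape_1:0 instead of the logits tensor should give probabilities directly, but that is an assumption not tested here.

```python
import math


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(scores, k):
    """Indices of the k largest scores, best first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


# Made-up logits for six classes (illustrative values only).
logits = [5.2, 2.1, 1.3, 0.4, -0.8, -1.5]
probs = softmax(logits)

for class_id in top_k(probs, 3):
    print("class %d (logit = %.2f, probability = %.5f)"
          % (class_id, logits[class_id], probs[class_id]))
```

Softmax preserves the ordering of the scores, so the ranking of the classes is the same whether logits or probabilities are printed; only the scale of the numbers changes.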

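That completes the whole workflow: converting the raw images to tfrecord files, training the weights, evaluating the model, and exporting and testing the model file. This article currently covers classification only.

Reference: https://blog.csdn.net/offbye/article/details/78369574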
