TensorFlow Beginner Training Notes (3): Saving & Loading Models

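PS: 1. This post is a memo of my TF learning process. I am very much a beginner, so it will inevitably contain some mistakes; please point them out, thanks.

2. The code in this post comes from the official beginner tutorials published by Google TensorFlow.

3. This post uses tf.keras to build and train the models.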

1. Saving model parameters during training (Checkpoints)

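This section uses ModelCheckpoint, one of the callbacks provided by Keras, to save the model's weights during training. It then creates a new, untrained model, which reaches only about 10.5% accuracy on the test set; after the saved weights are loaded, re-evaluating on the same test set gives about 87.2%.

A callback is a set of functions applied at given stages of the training procedure. You can use callbacks to get a view of the model's internal states and statistics during training.

ModelCheckpoint uses this mechanism to save the model during training and at its end.

Reference: https://keras.io/zh/callbacks/#_1u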
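To make the callback idea concrete, here is a minimal pure-Python sketch of the pattern. It is not Keras code: `Callback`, `RecordingCheckpoint` and `fit` are hypothetical stand-ins, and the "training" only halves a fake loss each epoch. Real Keras callbacks subclass `tf.keras.callbacks.Callback` and override hooks such as `on_epoch_end(self, epoch, logs=None)`; `ModelCheckpoint` writes the weights to disk inside such a hook.

```python
class Callback:
    """Hypothetical base class: hooks the training loop calls at fixed stages."""

    def on_epoch_end(self, epoch, logs):
        pass


class RecordingCheckpoint(Callback):
    """Hypothetical checkpoint callback: records where it would save, writes nothing."""

    def __init__(self, filepath):
        self.filepath = filepath
        self.saved = []

    def on_epoch_end(self, epoch, logs):
        # File names use the 1-based epoch number, so format with epoch + 1.
        # A path without "{epoch}" yields the same name every time,
        # i.e. each epoch would overwrite the previous checkpoint.
        path = self.filepath.format(epoch=epoch + 1)
        self.saved.append((path, logs["loss"]))


def fit(num_epochs, callbacks):
    """Hypothetical training loop that only simulates a decreasing loss."""
    loss = 1.0
    for epoch in range(num_epochs):
        loss *= 0.5  # stand-in for one epoch of real training
        for cb in callbacks:
            cb.on_epoch_end(epoch, {"loss": loss})


cp = RecordingCheckpoint("training_1/cp.ckpt")
fit(3, [cp])
print(cp.saved)
# [('training_1/cp.ckpt', 0.5), ('training_1/cp.ckpt', 0.25), ('training_1/cp.ckpt', 0.125)]
```

With the fixed path used in this section, every epoch targets the same `training_1/cp.ckpt`, which is why only the last epoch's weights remain on disk.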

import os
import tensorflow as tf
from tensorflow import keras
print(tf.version.VERSION)

# Load the dataset (training set and test set)
(train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.mnist.load_data()
# Use only the first 1000 examples
train_labels = train_labels[:1000]
test_labels = test_labels[:1000]
# Flatten each 28x28 image into a 784-vector and scale pixel values from 0-255 to 0-1
train_images = train_images[:1000].reshape(-1, 28 * 28) / 255.0
test_images = test_images[:1000].reshape(-1, 28 * 28) / 255.0

# Define a simple sequential model
def create_model():
  model = tf.keras.models.Sequential([
    # Fully connected layer, ReLU activation, input dimension 784 (28*28)
    keras.layers.Dense(512, activation='relu', input_shape=(784,)),
    # Dropout to reduce overfitting and improve generalization: randomly zeroes 20% of the inputs during training
    keras.layers.Dropout(0.2),
    # Output layer: 10 raw logits (one per digit), hence from_logits=True below
    keras.layers.Dense(10)
  ])

  model.compile(optimizer='adam',
                loss=tf.losses.SparseCategoricalCrossentropy(from_logits=True),
                metrics=['accuracy'])
  return model
# Create a basic model instance
model = create_model()
# Display the model's architecture
model.summary()




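# Save the model during training (as checkpoints)
# Checkpoint path and file name
checkpoint_path = "training_1/cp.ckpt"
checkpoint_dir = os.path.dirname(checkpoint_path)

# Create a callback that saves the model's weights
# ModelCheckpoint: by default, saves the model at the end of every epoch
# filepath: path of the checkpoint file (a fixed name, so each epoch overwrites the previous save)
# save_weights_only=True: save only the weights, not the whole model
# verbose=1: print a message each time a checkpoint is saved
# (period, not used here: number of epochs between checkpoints; newer TF 2.x
#  versions deprecate it in favor of save_freq)
cp_callback = tf.keras.callbacks.ModelCheckpoint(filepath=checkpoint_path,
                                                 save_weights_only=True,
                                                 verbose=1)
# Train the model with the new callback
model.fit(train_images,
          train_labels,
          epochs=10,
          validation_data=(test_images,test_labels),
          callbacks=[cp_callback])  # Pass the callback to training


# Create a new, untrained model instance
model = create_model()
# Evaluate the untrained model
loss, acc = model.evaluate(test_images,  test_labels, verbose=2)
print("Untrained model, accuracy: {:5.2f}%".format(100*acc))
# Load the saved weights
model.load_weights(checkpoint_path)
# Re-evaluate the model
loss,acc = model.evaluate(test_images,  test_labels, verbose=2)
print("Restored model, accuracy: {:5.2f}%".format(100*acc))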

Output:

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense (Dense)                (None, 512)               401920    
_________________________________________________________________
dropout (Dropout)            (None, 512)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 10)                5130      
=================================================================
Total params: 407,050
Trainable params: 407,050
Non-trainable params: 0
_________________________________________________________________
Epoch 1/10
 1/32 [..............................] - ETA: 0s - loss: 2.3353 - accuracy: 0.1562
Epoch 00001: saving model to training_1/cp.ckpt
32/32 [==============================] - 1s 18ms/step - loss: 1.2178 - accuracy: 0.6480 - val_loss: 0.7362 - val_accuracy: 0.7850
Epoch 2/10
 1/32 [..............................] - ETA: 0s - loss: 0.3494 - accuracy: 0.9375
Epoch 00002: saving model to training_1/cp.ckpt
32/32 [==============================] - 0s 15ms/step - loss: 0.4409 - accuracy: 0.8740 - val_loss: 0.5288 - val_accuracy: 0.8410
Epoch 3/10
14/32 [============>.................] - ETA: 0s - loss: 0.3190 - accuracy: 0.9219
Epoch 00003: saving model to training_1/cp.ckpt
32/32 [==============================] - 0s 15ms/step - loss: 0.2917 - accuracy: 0.9300 - val_loss: 0.4958 - val_accuracy: 0.8490
Epoch 4/10
 1/32 [..............................] - ETA: 0s - loss: 0.1519 - accuracy: 0.9688
Epoch 00004: saving model to training_1/cp.ckpt
32/32 [==============================] - 0s 13ms/step - loss: 0.2089 - accuracy: 0.9540 - val_loss: 0.4435 - val_accuracy: 0.8530
Epoch 5/10
 1/32 [..............................] - ETA: 0s - loss: 0.0919 - accuracy: 1.0000
Epoch 00005: saving model to training_1/cp.ckpt
32/32 [==============================] - 0s 13ms/step - loss: 0.1563 - accuracy: 0.9630 - val_loss: 0.4257 - val_accuracy: 0.8560
Epoch 6/10
30/32 [===========================>..] - ETA: 0s - loss: 0.1299 - accuracy: 0.9760
Epoch 00006: saving model to training_1/cp.ckpt
32/32 [==============================] - 1s 17ms/step - loss: 0.1316 - accuracy: 0.9760 - val_loss: 0.4221 - val_accuracy: 0.8630
Epoch 7/10
31/32 [============================>.] - ETA: 0s - loss: 0.0900 - accuracy: 0.9829
Epoch 00007: saving model to training_1/cp.ckpt
32/32 [==============================] - 0s 14ms/step - loss: 0.0896 - accuracy: 0.9830 - val_loss: 0.4172 - val_accuracy: 0.8740
Epoch 8/10
30/32 [===========================>..] - ETA: 0s - loss: 0.0660 - accuracy: 0.9917
Epoch 00008: saving model to training_1/cp.ckpt
32/32 [==============================] - 1s 18ms/step - loss: 0.0658 - accuracy: 0.9920 - val_loss: 0.4227 - val_accuracy: 0.8680
Epoch 9/10
31/32 [============================>.] - ETA: 0s - loss: 0.0495 - accuracy: 0.9980
Epoch 00009: saving model to training_1/cp.ckpt
32/32 [==============================] - 0s 15ms/step - loss: 0.0494 - accuracy: 0.9980 - val_loss: 0.4176 - val_accuracy: 0.8650
Epoch 10/10
 1/32 [..............................] - ETA: 0s - loss: 0.0210 - accuracy: 1.0000
Epoch 00010: saving model to training_1/cp.ckpt
32/32 [==============================] - 1s 17ms/step - loss: 0.0382 - accuracy: 0.9970 - val_loss: 0.4103 - val_accuracy: 0.8720
Evaluating the untrained model
32/32 - 0s - loss: 2.3249 - accuracy: 0.1050
Untrained model, accuracy: 10.50%
Re-evaluating the model after loading the weights
32/32 - 0s - loss: 0.4103 - accuracy: 0.8720
Restored model, accuracy: 87.20%
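The parameter counts in the model summary above can be checked by hand: a Dense layer has one weight per input-output pair plus one bias per output unit, and Dropout has no weights at all. A small sketch (the helper name `dense_params` is only for illustration):

```python
def dense_params(n_in, n_out):
    """Trainable parameters of a Dense layer: weight matrix plus bias vector."""
    return n_in * n_out + n_out


hidden = dense_params(28 * 28, 512)  # first Dense layer: 784 inputs, 512 units
dropout = 0                          # Dropout only masks activations; no weights
output = dense_params(512, 10)       # output Dense layer: 10 logits
total = hidden + dropout + output
print(hidden, output, total)  # 401920 5130 407050
```

Likewise, the untrained model's 10.50% accuracy is close to the 1 in 10 you would expect from guessing among ten digit classes.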

2. Saving checkpoints at a fixed frequency

You can also save multiple uniquely named checkpoints at a fixed epoch interval:

import os
import tensorflow as tf
from tensorflow import keras
print(tf.version.VERSION)

# Load the dataset (training set and test set)
(train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.mnist.load_data()
# Use only the first 1000 examples
train_labels = train_labels[:1000]
test_labels = test_labels[:1000]
# Flatten each 28x28 image into a 784-vector and scale pixel values from 0-255 to 0-1
train_images = train_images[:1000].reshape(-1, 28 * 28) / 255.0
test_images = test_images[:1000].reshape(-1, 28 * 28) / 255.0

# Define a simple sequential model
def create_model():
  model = tf.keras.models.Sequential([
    # Fully connected layer, ReLU activation, input dimension 784 (28*28)
    keras.layers.Dense(512, activation='relu', input_shape=(784,)),
    # Dropout to reduce overfitting and improve generalization: randomly zeroes 20% of the inputs during training
    keras.layers.Dropout(0.2),
    # Output layer: 10 raw logits (one per digit), hence from_logits=True below
    keras.layers.Dense(10)
  ])

  model.compile(optimizer='adam',
                loss=tf.losses.SparseCategoricalCrossentropy(from_logits=True),
                metrics=['accuracy'])
  return model
# Create a basic model instance
model = create_model()
# Display the model's architecture
model.summary()


# Save the model during training (as checkpoints)
# Include the epoch in the file name (using `str.format`)
checkpoint_path = "training_2/cp-{epoch:04d}.ckpt"
checkpoint_dir = os.path.dirname(checkpoint_path)

# Create a callback that saves the model's weights every 5 epochs.
# save_freq is counted in batches: with 1000 samples and the default
# batch size of 32 there are 32 batches per epoch.
# (Older code passes period=5 instead, which newer TF 2.x versions deprecate.)
batch_size = 32
n_batches = (len(train_images) + batch_size - 1) // batch_size  # ceil(1000 / 32) = 32
cp_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_path,
    verbose=1,
    save_weights_only=True,
    save_freq=5 * n_batches)


# Save the initial (untrained) weights using the `checkpoint_path` format, as epoch 0
model.save_weights(checkpoint_path.format(epoch=0))

# Train the model with the new callback
model.fit(train_images,
          train_labels,
          epochs=50,
          callbacks=[cp_callback],
          validation_data=(test_images,test_labels),
          verbose=0)
# Now look at the generated checkpoints and pick the latest one;
# tf.train.latest_checkpoint() does this
latest = tf.train.latest_checkpoint(checkpoint_dir)
print('latest:',latest)
# To use a checkpoint saved at another point, pass its file name (prefix) directly, e.g.
first_checkpoint = 'training_2/cp-0000.ckpt'

# Verify the restored weights

# Create a new model instance
model = create_model()
# Load the previously saved weights
model.load_weights(latest)
# Re-evaluate the model
loss, acc = model.evaluate(test_images,  test_labels, verbose=2)  # show the result
print("Restored model, accuracy: {:5.2f}%".format(100*acc))  # print the accuracy

Output

Epoch 00005: saving model to training_2/cp-0005.ckpt

Epoch 00010: saving model to training_2/cp-0010.ckpt

Epoch 00015: saving model to training_2/cp-0015.ckpt

Epoch 00020: saving model to training_2/cp-0020.ckpt

Epoch 00025: saving model to training_2/cp-0025.ckpt

Epoch 00030: saving model to training_2/cp-0030.ckpt

Epoch 00035: saving model to training_2/cp-0035.ckpt

Epoch 00040: saving model to training_2/cp-0040.ckpt

Epoch 00045: saving model to training_2/cp-0045.ckpt

Epoch 00050: saving model to training_2/cp-0050.ckpt
latest: training_2\cp-0050.ckpt
32/32 - 0s - loss: 0.4845 - accuracy: 0.8740
Restored model, accuracy: 87.40%
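The file names in the output above come from formatting the `{epoch:04d}` template. Saving every 5 epochs can also be expressed as a batch-counted `save_freq=5 * n_batches`, the replacement for the deprecated `period` argument. The sketch below simulates, in plain Python, when such a schedule fires over 50 epochs; it assumes 1000 training samples and Keras's default `batch_size` of 32 (matching the `32/32` steps per epoch shown in section 1), and it only mimics the scheduling, not the actual saving.

```python
import math

checkpoint_path = "training_2/cp-{epoch:04d}.ckpt"
num_samples = 1000  # size of the training subset used above
batch_size = 32     # Keras default batch size for fit()
n_batches = math.ceil(num_samples / batch_size)  # 32 steps per epoch
save_freq = 5 * n_batches                        # fire every 160 batches = 5 epochs

saved = []
batches_seen = 0
for epoch in range(50):  # 0-based epoch index
    for _ in range(n_batches):
        batches_seen += 1
        if batches_seen % save_freq == 0:
            # File names use the 1-based epoch number
            saved.append(checkpoint_path.format(epoch=epoch + 1))

print(saved[0], saved[-1], len(saved))
# training_2/cp-0005.ckpt training_2/cp-0050.ckpt 10
```

Together with the manual `save_weights(checkpoint_path.format(epoch=0))` call, this accounts for `cp-0000` plus the ten checkpoints listed in the output.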

Saved files: [Figure: the saved checkpoint files]

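The code above stores the weights in a collection of checkpoint-formatted files that contain only the trained weights, in binary format. A checkpoint contains:

One or more shards that contain the model's weights.

An index file that indicates which weights are stored in which shard.

If you train a model on a single machine only, you will have one shard with the suffix .data-00000-of-00001.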
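As a rough illustration of this naming scheme, the hypothetical helper below lists the files a TF-format checkpoint with a given prefix is expected to have. Note that `load_weights` and `tf.train.latest_checkpoint` work with the prefix (for example `training_2/cp-0050.ckpt`), not with the individual files; the directory also holds a small bookkeeping file named `checkpoint` that records the latest prefix.

```python
def checkpoint_files(prefix, num_shards=1):
    """Hypothetical helper: expected file names for a TF-format checkpoint prefix."""
    files = [prefix + ".index"]  # index: which weights live in which shard
    for i in range(num_shards):
        # data shards are numbered from 0 and zero-padded to 5 digits
        files.append(f"{prefix}.data-{i:05d}-of-{num_shards:05d}")
    return files


print(checkpoint_files("training_2/cp-0050.ckpt"))
# ['training_2/cp-0050.ckpt.index', 'training_2/cp-0050.ckpt.data-00000-of-00001']
```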

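3. Saving weights manually

You have seen how to load weights into a model. Saving them manually with the Model.save_weights method is just as simple. By default, tf.keras, and save_weights in particular, uses the TensorFlow checkpoint format with a .ckpt extension (saving in HDF5 with a .h5 extension is covered in the "Save and serialize models" guide).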
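The format is chosen from the file path. The sketch below restates the rule documented for `save_weights` in TF 2.x Keras: with `save_format=None`, a path ending in `.h5` or `.keras` is saved as HDF5, and anything else as a TensorFlow checkpoint. `infer_weights_format` is a hypothetical helper, not a TensorFlow function, and newer Keras 3 releases use different rules.

```python
def infer_weights_format(filepath, save_format=None):
    """Hypothetical helper mirroring the TF 2.x save_weights default."""
    if save_format is not None:
        return save_format  # an explicit choice always wins
    if filepath.endswith((".h5", ".keras")):
        return "h5"  # a single HDF5 file
    return "tf"      # TF checkpoint: <prefix>.index + <prefix>.data-*


for path in ["./checkpoints/my_checkpoint", "training_1/cp.ckpt", "my_weights.h5"]:
    print(path, "->", infer_weights_format(path))
# ./checkpoints/my_checkpoint -> tf
# training_1/cp.ckpt -> tf
# my_weights.h5 -> h5
```

So `.ckpt` is only a naming convention for the checkpoint prefix; the path in the script below, which has no extension at all, is saved in the same TF checkpoint format.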

import os
import tensorflow as tf
from tensorflow import keras
print(tf.version.VERSION)

# Load the dataset (training set and test set)
(train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.mnist.load_data()
# Use only the first 1000 examples
train_labels = train_labels[:1000]
test_labels = test_labels[:1000]
# Flatten each 28x28 image into a 784-vector and scale pixel values from 0-255 to 0-1
train_images = train_images[:1000].reshape(-1, 28 * 28) / 255.0
test_images = test_images[:1000].reshape(-1, 28 * 28) / 255.0

# Define a simple sequential model
def create_model():
  model = tf.keras.models.Sequential([
    # Fully connected layer, ReLU activation, input dimension 784 (28*28)
    keras.layers.Dense(512, activation='relu', input_shape=(784,)),
    # Dropout to reduce overfitting and improve generalization: randomly zeroes 20% of the inputs during training
    keras.layers.Dropout(0.2),
    # Output layer: 10 raw logits (one per digit), hence from_logits=True below
    keras.layers.Dense(10)
  ])

  model.compile(optimizer='adam',
                loss=tf.losses.SparseCategoricalCrossentropy(from_logits=True),
                metrics=['accuracy'])
  return model
# Create a basic model instance
model = create_model()
# Display the model's architecture
model.summary()


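# Save the model during training (as checkpoints)
# Checkpoint path and file name
checkpoint_path = "training_1/cp.ckpt"
checkpoint_dir = os.path.dirname(checkpoint_path)

# Create a callback that saves the model's weights
# ModelCheckpoint: by default, saves the model at the end of every epoch
# filepath: path of the checkpoint file (a fixed name, so each epoch overwrites the previous save)
# save_weights_only=True: save only the weights, not the whole model
# verbose=1: print a message each time a checkpoint is saved
# (period, not used here: number of epochs between checkpoints; newer TF 2.x
#  versions deprecate it in favor of save_freq)
cp_callback = tf.keras.callbacks.ModelCheckpoint(filepath=checkpoint_path,
                                                 save_weights_only=True,
                                                 verbose=1)
# Train the model with the new callback
model.fit(train_images,
          train_labels,
          epochs=10,
          validation_data=(test_images,test_labels),
          callbacks=[cp_callback])  # Pass the callback to training


# Save the weights manually (saves all layer weights)
model.save_weights('./checkpoints/my_checkpoint')

# Create a new model instance
model = create_model()

# Restore the weights
model.load_weights('./checkpoints/my_checkpoint')

# Evaluate the model
loss,acc = model.evaluate(test_images,  test_labels, verbose=2)
print("Restored model, accuracy: {:5.2f}%".format(100*acc))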

Output

4. Saving the entire model in HDF5 format

The code here uses early stopping to prevent overfitting.

import os
import tensorflow as tf
from tensorflow import keras
print(tf.version.VERSION)

# Load the dataset (training set and test set)
(train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.mnist.load_data()
# Use only the first 1000 examples
train_labels = train_labels[:1000]
test_labels = test_labels[:1000]
# Flatten each 28x28 image into a 784-vector and scale pixel values from 0-255 to 0-1
train_images = train_images[:1000].reshape(-1, 28 * 28) / 255.0
test_images = test_images[:1000].reshape(-1, 28 * 28) / 255.0

# Define a simple sequential model
def create_model():
  model = tf.keras.models.Sequential([
    # Fully connected layer, ReLU activation, input dimension 784 (28*28)
    keras.layers.Dense(512, activation='relu', input_shape=(784,)),
    # Dropout to reduce overfitting and improve generalization: randomly zeroes 20% of the inputs during training
    keras.layers.Dropout(0.2),
    # Output layer: 10 raw logits (one per digit), hence from_logits=True below
    keras.layers.Dense(10)
  ])

  model.compile(optimizer='adam',
                loss=tf.losses.SparseCategoricalCrossentropy(from_logits=True),
                metrics=['accuracy'])
  return model

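# Save in HDF5 format
# Create and train a new model instance
model = create_model()


# patience: the number of epochs without improvement to wait before stopping.
# The EarlyStopping callback checks the stopping condition at the end of every epoch:
# if the monitored value (val_loss) has not improved for `patience` epochs, training stops automatically.
early_stop = keras.callbacks.EarlyStopping(monitor='val_loss', patience=10)
# Train the model, showing the training log (verbose=1);
# validation_split=0.2 holds out the last 20% of the training data for validation
model.fit(train_images, train_labels, epochs=50,
          validation_split = 0.2, verbose=1, callbacks=[early_stop])

#model.fit(train_images, train_labels, epochs=20)

# Save the entire model to an HDF5 file.
# The '.h5' extension indicates that the model should be saved to HDF5.
model.save('my_model.h5')

# Recreate the exact same model, including its weights and optimizer
new_model = tf.keras.models.load_model('my_model.h5')

# Display the network architecture
new_model.summary()
# Check its accuracy
loss, acc = new_model.evaluate(test_images,  test_labels, verbose=2)
print('Restored model, accuracy: {:5.2f}%'.format(100*acc))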

Output

Epoch 16/50
25/25 [==============================] - 0s 2ms/step - loss: 0.0124 - accuracy: 1.0000 - val_loss: 0.5265 - val_accuracy: 0.8650
Epoch 17/50
25/25 [==============================] - 0s 2ms/step - loss: 0.0124 - accuracy: 1.0000 - val_loss: 0.5186 - val_accuracy: 0.8750
Epoch 18/50
25/25 [==============================] - 0s 2ms/step - loss: 0.0110 - accuracy: 1.0000 - val_loss: 0.5418 - val_accuracy: 0.8750
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense (Dense)                (None, 512)               401920    
_________________________________________________________________
dropout (Dropout)            (None, 512)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 10)                5130      
=================================================================
Total params: 407,050
Trainable params: 407,050
Non-trainable params: 0
_________________________________________________________________
32/32 - 0s - loss: 0.4602 - accuracy: 0.8680
Restored model, accuracy: 86.80%
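The patience logic can be sketched in plain Python. This is a simplification of what EarlyStopping does with `monitor='val_loss'` and the default `min_delta=0`: any strictly lower loss counts as an improvement. By this logic, the run above ending at epoch 18 with `patience=10` suggests the best `val_loss` was reached at epoch 8 (the earlier epochs are not shown). Because `restore_best_weights` defaults to False, the model saved to `my_model.h5` carries the last epoch's weights, not the best ones. The function name and the loss curve below are hypothetical.

```python
def early_stopping_epoch(val_losses, patience):
    """Return the 1-based epoch after which training would stop.

    Simplified EarlyStopping: monitors a loss (lower is better) with min_delta=0.
    """
    best = float("inf")
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return len(val_losses)  # never triggered: all epochs run


# Hypothetical loss curve: improves for 8 epochs, then plateaus
losses = [1.0 - 0.1 * i for i in range(8)] + [0.5] * 42
print(early_stopping_epoch(losses, patience=10))  # 18
```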

Reference: An introduction to HDF5 data files

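5. Saving the entire model in SavedModel format

The SavedModel format is another way to serialize models. Models saved in this format can be restored with tf.keras.models.load_model and are compatible with TensorFlow Serving. The SavedModel guide describes in detail how to serve and inspect a SavedModel. The steps below show how to save and restore the model.

The SavedModel format is a directory containing a protobuf binary and a TensorFlow checkpoint.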
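A rough way to see that structure is to check for its two core pieces: the `saved_model.pb` protobuf and the `variables/` checkpoint, whose files follow the same `.index` / `.data-*` scheme as in section 2. The helper `looks_like_saved_model` is hypothetical and only inspects file names; the demo builds a fake directory of empty placeholder files, not a real model. A real export may contain further entries, such as an `assets/` directory.

```python
import os
import tempfile


def looks_like_saved_model(export_dir):
    """Hypothetical structural check: protobuf graph + variables checkpoint index."""
    has_graph = os.path.isfile(os.path.join(export_dir, "saved_model.pb"))
    has_index = os.path.isfile(os.path.join(export_dir, "variables", "variables.index"))
    return has_graph and has_index


# Demo on a fake layout made of empty placeholder files (not a real SavedModel)
with tempfile.TemporaryDirectory() as export_dir:
    before = looks_like_saved_model(export_dir)
    os.makedirs(os.path.join(export_dir, "variables"))
    for name in ["saved_model.pb",
                 os.path.join("variables", "variables.index"),
                 os.path.join("variables", "variables.data-00000-of-00001")]:
        open(os.path.join(export_dir, name), "w").close()
    after = looks_like_saved_model(export_dir)

print(before, after)  # False True
```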

import os
import tensorflow as tf
from tensorflow import keras
print(tf.version.VERSION)

# Load the dataset (training set and test set)
(train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.mnist.load_data()
# Use only the first 1000 examples
train_labels = train_labels[:1000]
test_labels = test_labels[:1000]
# Flatten each 28x28 image into a 784-vector and scale pixel values from 0-255 to 0-1
train_images = train_images[:1000].reshape(-1, 28 * 28) / 255.0
test_images = test_images[:1000].reshape(-1, 28 * 28) / 255.0

# Define a simple sequential model
def create_model():
  model = tf.keras.models.Sequential([
    # Fully connected layer, ReLU activation, input dimension 784 (28*28)
    keras.layers.Dense(512, activation='relu', input_shape=(784,)),
    # Dropout to reduce overfitting and improve generalization: randomly zeroes 20% of the inputs during training
    keras.layers.Dropout(0.2),
    # Output layer: 10 raw logits (one per digit), hence from_logits=True below
    keras.layers.Dense(10)
  ])

  model.compile(optimizer='adam',
                loss=tf.losses.SparseCategoricalCrossentropy(from_logits=True),
                metrics=['accuracy'])
  return model
  
# Save in SavedModel format
# Create and train a new model instance.
model = create_model()
model.fit(train_images, train_labels, epochs=5)

# Save the entire model as a SavedModel
# (in TF 2.x, a path without the .h5 extension defaults to the SavedModel format).
model.save('saved_model/my_model')
# Reload a fresh Keras model from the saved model:
new_model = tf.keras.models.load_model('saved_model/my_model')
# Check its architecture
new_model.summary()
# Evaluate the restored model
loss, acc = new_model.evaluate(test_images,  test_labels, verbose=2)
print('Restored model, accuracy: {:5.2f}%'.format(100*acc))
print(new_model.predict(test_images).shape)

Output

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense (Dense)                (None, 512)               401920    
_________________________________________________________________
dropout (Dropout)            (None, 512)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 10)                5130      
=================================================================
Total params: 407,050
Trainable params: 407,050
Non-trainable params: 0
_________________________________________________________________
32/32 - 0s - loss: 0.4319 - accuracy: 0.8610
Restored model, accuracy: 86.10%
(1000, 10)

Generated file structure: [Figures: the files produced by model.save('saved_model/my_model')]
