- The softmax activation function
- Retraining the model with one-hot encoded labels
The softmax activation function
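- Softmax regression generalizes logistic regression to multi-class problems, where the class label y can take more than two values. Softmax is widely used in machine learning, and especially in deep learning, for multi-class classification. It maps a vector of real-valued inputs to values between 0 and 1 and normalizes them so that they sum to 1, so the predicted probabilities over all classes also sum to exactly 1.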
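To make the mapping concrete, here is a minimal NumPy sketch of softmax. The function name `softmax` and the input scores are illustrative assumptions, not code from this article; subtracting the maximum before exponentiating is a common trick to avoid overflow and does not change the result.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    # Shifting by the maximum leaves the result unchanged but avoids overflow in exp.
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

logits = np.array([2.0, 1.0, 0.1])  # illustrative scores, not from the article
probs = softmax(logits)
print(probs)        # three values, each strictly between 0 and 1
print(probs.sum())  # 1 up to floating-point error
```

The largest input always receives the largest probability, which is why the predicted class can be read off with `argmax`.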
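Keras is a high-level neural network API written in Python that can run on top of TensorFlow, CNTK, or Theano as its backend.
Keras defines the notion of a layer explicitly, while the parameters connecting one layer to the next are something the user does not need to manage. For an ordinary developer, building a neural network is therefore easier in Keras than in raw TensorFlow.
Keras is also the API that TensorFlow itself strongly recommends from TensorFlow 2.0 onward.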
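- Below we use tf.keras to build a multilayer perceptron for multi-class classification, then train it on the fashion_mnist dataset and use it for prediction.
- The fashion_mnist dataset resembles the MNIST handwritten digits dataset: 60,000 training images and 10,000 test images in ten classes (clothing, shoes, T-shirts, and so on). Each image is a 28*28-pixel grayscale image whose pixel values range from 0 to 255.
- First, import the required packages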
import numpy as np
import pandas as pd
import tensorflow as tf
import matplotlib.pyplot as plt
%matplotlib inline
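- Split the dataset: 60,000 images and their labels are used for training, and the other 10,000 images and labels are used for testing
## Load fashion_mnist, which already comes split into training and test sets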
(train_image,train_label),(test_image,test_label) = tf.keras.datasets.fashion_mnist.load_data()
print(train_image.shape,train_label.shape)
print(test_image.shape,test_label.shape)
>>
(60000, 28, 28) (60000,)
(10000, 28, 28) (10000,)
- Take a look at the data: the first image in the training set is a shoe, and its label is 9
plt.imshow(train_image[0])
train_label[0]
>> 9
train_image[0] # grayscale pixel values from 0 to 255
>> array([[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0],
...
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0,
200, 232, 232, 233, 229, 223, 223, 215, 213, 164, 127, 123, 196,
229, 0],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
183, 225, 216, 223, 228, 235, 227, 224, 222, 224, 221, 223, 245,
173, 0],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
193, 228, 218, 213, 198, 180, 212, 210, 211, 213, 223, 220, 243,
202, 0],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 3, 0, 12,
219, 220, 212, 218, 192, 169, 227, 208, 218, 224, 212, 226, 197,
209, 52],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 6, 0, 99,
244, 222, 220, 218, 203, 198, 221, 215, 213, 222, 220, 245, 119,
167, 56],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 55,
236, 228, 230, 228, 240, 232, 213, 218, 223, 234, 217, 217, 209,
92, 0],
[ 0, 0, 1, 4, 6, 7, 2, 0, 0, 0, 0, 0, 237,
226, 217, 223, 222, 219, 222, 221, 216, 223, 229, 215, 218, 255,
77, 0],
[ 0, 3, 0, 0, 0, 0, 0, 0, 0, 62, 145, 204, 228,
207, 213, 221, 218, 208, 211, 218, 224, 223, 219, 215, 224, 244,
159, 0],
[ 0, 0, 0, 0, 18, 44, 82, 107, 189, 228, 220, 222, 217,
226, 200, 205, 211, 230, 224, 234, 176, 188, 250, 248, 233, 238,
215, 0],
[ 0, 57, 187, 208, 224, 221, 224, 208, 204, 214, 208, 209, 200,
159, 245, 193, 206, 223, 255, 255, 221, 234, 221, 211, 220, 232,
246, 0],
[ 3, 202, 228, 224, 221, 211, 211, 214, 205, 205, 205, 220, 240,
80, 150, 255, 229, 221, 188, 154, 191, 210, 204, 209, 222, 228,
225, 0],
...
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0]], dtype=uint8)
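The integer labels are easier to read with class names attached. The list below follows the class order given in the Fashion-MNIST dataset documentation rather than anything printed in this article, and `label_name` is a hypothetical helper added for illustration.

```python
# Class names in label order (index = integer label), per the Fashion-MNIST documentation.
class_names = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
               "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]

def label_name(label):
    """Map an integer label (0-9) to its class name."""
    return class_names[int(label)]

print(label_name(9))  # Ankle boot
```

In the notebook this could be called as `label_name(train_label[0])`, which matches the shoe shown for the first training image.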
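- Because the pixel values lie in the range 0-255, normalizing an image only requires dividing it by 255. The model is then built and later trained on the normalized data. Since the model has to output 10 classes, the last layer has 10 units and uses softmax activation.
# Normalize the data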
train_image = train_image/255
test_image = test_image/255
# Build the model (a multilayer perceptron)
model = tf.keras.Sequential()
model.add(tf.keras.layers.Flatten(input_shape = (28,28))) # flatten each 2-D image: (28,28)→(784,)
model.add(tf.keras.layers.Dense(128, activation = "relu"))
# model.add(tf.keras.layers.Dense(64, activation = "relu"))
model.add(tf.keras.layers.Dense(10, activation = "softmax"))
model.summary()
>>
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
flatten (Flatten) (None, 784) 0
_________________________________________________________________
dense (Dense) (None, 128) 100480
_________________________________________________________________
dense_1 (Dense) (None, 10) 1290
=================================================================
Total params: 101,770
Trainable params: 101,770
Non-trainable params: 0
_________________________________________________________________
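The parameter counts in the summary above can be checked by hand: a Dense layer has one weight per input-unit pair plus one bias per unit. The helper `dense_params` below is a hypothetical name used only for this arithmetic check.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

flatten_out = 28 * 28                      # Flatten has no parameters, it only reshapes
hidden = dense_params(flatten_out, 128)    # first Dense layer
output = dense_params(128, 10)             # softmax output layer
print(hidden, output, hidden + output)     # 100480 1290 101770
```

These values match the `Param #` column and the `Total params` line printed by `model.summary()`.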
- Next, compile the model and then train it
# Use sparse_categorical_crossentropy when the labels are integer-encoded, and categorical_crossentropy when they are one-hot encoded
model.compile(
optimizer="adam",
loss = 'sparse_categorical_crossentropy',
metrics = ['acc'])
# Train the model
history = model.fit(train_image,train_label,epochs=10)
>>
Epoch 1/10
60000/60000 [==============================] - 3s 49us/sample - loss: 0.4967 - acc: 0.8255
Epoch 2/10
60000/60000 [==============================] - 2s 41us/sample - loss: 0.3736 - acc: 0.8658
Epoch 3/10
60000/60000 [==============================] - 3s 43us/sample - loss: 0.3374 - acc: 0.8773
Epoch 4/10
60000/60000 [==============================] - 2s 41us/sample - loss: 0.3118 - acc: 0.8858
Epoch 5/10
60000/60000 [==============================] - 2s 41us/sample - loss: 0.2958 - acc: 0.8913
Epoch 6/10
60000/60000 [==============================] - 2s 39us/sample - loss: 0.2806 - acc: 0.8963
Epoch 7/10
60000/60000 [==============================] - 2s 36us/sample - loss: 0.2691 - acc: 0.9012
Epoch 8/10
60000/60000 [==============================] - 2s 39us/sample - loss: 0.2567 - acc: 0.9047
Epoch 9/10
60000/60000 [==============================] - 2s 33us/sample - loss: 0.2440 - acc: 0.9092
Epoch 10/10
60000/60000 [==============================] - 2s 32us/sample - loss: 0.2370 - acc: 0.9116
- Plot the training curves to see how training went
history.history.keys() # history.history is a dict holding the per-epoch training loss and accuracy (acc)
y_loss = history.history.get('loss')
y_acc = history.history.get('acc')
# plt.figure(figsize= (20,8), dpi = 80)
plt.plot(history.epoch, y_acc)
plt.show()
plt.plot(history.epoch, y_loss)
plt.show()
- Next, use the evaluate method to assess the model
# Evaluate the model
print(model.evaluate(test_image,test_label))
model.evaluate(train_image,train_label)
>>
10000/10000 [==============================] - 0s 36us/sample - loss: 0.3444 - acc: 0.8922
[0.3443740872859955, 0.8922]
60000/60000 [==============================] - 2s 31us/sample - loss: 0.1544 - acc: 0.9440
[0.15438867097496986, 0.9440333]
Retraining the model with one-hot encoded labels
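When the labels are one-hot encoded, training uses categorical_crossentropy as the loss function
Call the function tf.keras.utils.to_categorical(train_label) to perform the one-hot encoding
# One-hot encoding and model tuning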
train_label_onehot = tf.keras.utils.to_categorical(train_label)
test_label_onehot = tf.keras.utils.to_categorical(test_label)
print(train_label)
print(train_label_onehot[-1],'\n')
train_label_onehot
>>
[9 0 0 ... 3 0 5]
[0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
array([[0., 0., 0., ..., 0., 0., 1.],
[1., 0., 0., ..., 0., 0., 0.],
[1., 0., 0., ..., 0., 0., 0.],
...,
[0., 0., 0., ..., 0., 0., 0.],
[1., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.]], dtype=float32)
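The two loss functions compute the same quantity and differ only in the label format they expect. The NumPy sketch below illustrates this under stated assumptions: the labels and predicted probabilities are made up, and `np.eye(...)[labels]` is used as a stand-in that produces the same one-hot layout as `to_categorical`.

```python
import numpy as np

labels = np.array([9, 0, 5])      # integer labels (illustrative)
num_classes = 10
# Row i of the identity matrix is the one-hot vector for class i.
one_hot = np.eye(num_classes, dtype=np.float32)[labels]

# Hypothetical predicted probabilities; the +0.1 keeps every entry away from zero.
rng = np.random.default_rng(0)
raw = rng.random((3, num_classes)) + 0.1
probs = raw / raw.sum(axis=1, keepdims=True)

# Sparse form: pick the probability of the true class directly.
sparse_ce = -np.log(probs[np.arange(len(labels)), labels])
# One-hot form: the one-hot vector zeroes out every term except the true class.
categorical_ce = -np.sum(one_hot * np.log(probs), axis=1)

print(np.allclose(sparse_ce, categorical_ce))  # True
```

Because the results agree, the choice between the two losses is a matter of which label format is at hand, not of model quality.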
- Train a new model from scratch
# Build the model (a deeper multilayer perceptron with dropout)
model = tf.keras.Sequential()
model.add(tf.keras.layers.Flatten(input_shape = (28,28))) # flatten each 2-D image: (28,28)→(784,)
model.add(tf.keras.layers.Dense(128, activation = "relu"))
model.add(tf.keras.layers.Dropout(0.5))
model.add(tf.keras.layers.Dense(64, activation = "relu"))
model.add(tf.keras.layers.Dropout(0.5))
model.add(tf.keras.layers.Dense(64, activation = "relu"))
model.add(tf.keras.layers.Dropout(0.5))
model.add(tf.keras.layers.Dense(10, activation = "softmax"))
# Use sparse_categorical_crossentropy when the labels are integer-encoded, and categorical_crossentropy when they are one-hot encoded
model.compile(
optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
loss = 'categorical_crossentropy',
metrics = ['acc'])
model.summary()
>>
Model: "sequential_2"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
flatten_2 (Flatten) (None, 784) 0
_________________________________________________________________
dense_7 (Dense) (None, 128) 100480
_________________________________________________________________
dropout_3 (Dropout) (None, 128) 0
_________________________________________________________________
dense_8 (Dense) (None, 64) 8256
_________________________________________________________________
dropout_4 (Dropout) (None, 64) 0
_________________________________________________________________
dense_9 (Dense) (None, 64) 4160
_________________________________________________________________
dropout_5 (Dropout) (None, 64) 0
_________________________________________________________________
dense_10 (Dense) (None, 10) 650
=================================================================
Total params: 113,546
Trainable params: 113,546
Non-trainable params: 0
_________________________________________________________________
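The Dropout layers add no parameters, but they change how the network behaves during training. As a rough sketch of the idea (not Keras's actual implementation), the hypothetical `dropout` function below zeroes a random fraction `rate` of the activations during training and rescales the rest by 1/(1 - rate), and it passes inputs through unchanged at inference time.

```python
import numpy as np

def dropout(x, rate, training, rng):
    """Inverted dropout: drop units with probability `rate` while training only."""
    if not training:
        return x                             # inference: identity
    keep = rng.random(x.shape) >= rate       # True for units that survive
    return x * keep / (1.0 - rate)           # rescale so the expected value is unchanged

rng = np.random.default_rng(0)
activations = np.ones((1000, 128))           # illustrative activations
train_out = dropout(activations, 0.5, True, rng)
infer_out = dropout(activations, 0.5, False, rng)

print((train_out == 0).mean())                   # roughly half the units are dropped
print(train_out.mean())                          # close to 1.0 thanks to the rescaling
print(np.array_equal(infer_out, activations))    # True
```

Since the network is only handicapped this way during training, metrics measured on the training pass can look worse than those measured on validation data.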
- Train the compiled model
# Train the model
history = model.fit(train_image, train_label_onehot,
batch_size= 32, epochs=10,
validation_data=(test_image,test_label_onehot))
>>
Train on 60000 samples, validate on 10000 samples
Epoch 1/10
60000/60000 [==============================] - 4s 66us/sample - loss: 1.0157 - acc: 0.6175 - val_loss: 0.5866 - val_acc: 0.7851
Epoch 2/10
60000/60000 [==============================] - 4s 61us/sample - loss: 0.6970 - acc: 0.7470 - val_loss: 0.5170 - val_acc: 0.7965
Epoch 3/10
60000/60000 [==============================] - 4s 62us/sample - loss: 0.6337 - acc: 0.7724 - val_loss: 0.4967 - val_acc: 0.8118
Epoch 4/10
60000/60000 [==============================] - 4s 63us/sample - loss: 0.6070 - acc: 0.7844 - val_loss: 0.4941 - val_acc: 0.8302
Epoch 5/10
60000/60000 [==============================] - 4s 62us/sample - loss: 0.5929 - acc: 0.7926 - val_loss: 0.4668 - val_acc: 0.8285
Epoch 6/10
60000/60000 [==============================] - 4s 63us/sample - loss: 0.5782 - acc: 0.7977 - val_loss: 0.4591 - val_acc: 0.8297
Epoch 7/10
60000/60000 [==============================] - 4s 62us/sample - loss: 0.5650 - acc: 0.8035 - val_loss: 0.4814 - val_acc: 0.8290
Epoch 8/10
60000/60000 [==============================] - 4s 62us/sample - loss: 0.5580 - acc: 0.8100 - val_loss: 0.4490 - val_acc: 0.8410
Epoch 9/10
60000/60000 [==============================] - 4s 62us/sample - loss: 0.5463 - acc: 0.8142 - val_loss: 0.4525 - val_acc: 0.8334
Epoch 10/10
60000/60000 [==============================] - 4s 63us/sample - loss: 0.5355 - acc: 0.8159 - val_loss: 0.4362 - val_acc: 0.8410
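- Use the predict method to inspect a prediction; the predicted value matches the true value, so this sample is predicted correctly
predict = model.predict(test_image)
print(np.argmax(predict[0])) # the largest probability is at index 9, so the predicted class is 9
print(test_label[0]) # the true label is also 9, so the prediction is correct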
>> 9
9
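The same `argmax` comparison extends from one sample to a whole batch. The sketch below uses a small made-up probability array as a stand-in for the output of `model.predict`; the values and labels are assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical stand-in for model.predict output: 4 samples, 3 classes.
probs = np.array([[0.1, 0.7, 0.2],
                  [0.8, 0.1, 0.1],
                  [0.3, 0.3, 0.4],
                  [0.2, 0.5, 0.3]])
true_labels = np.array([1, 0, 2, 0])

pred_labels = np.argmax(probs, axis=1)            # most probable class per row
accuracy = np.mean(pred_labels == true_labels)    # fraction of correct predictions
print(pred_labels)  # [1 0 2 1]
print(accuracy)     # 0.75
```

In the notebook, the analogous expression would be `np.mean(np.argmax(predict, axis=1) == test_label)`, which should agree with the accuracy reported by `model.evaluate` on the test set.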
- Likewise, plot the accuracy curves to see how training went
# Plot acc and val_acc
plt.plot(history.epoch, history.history.get('acc'), label = "acc")
plt.plot(history.epoch, history.history.get('val_acc'), label = "val_acc")
plt.legend()
>>
<matplotlib.legend.Legend at 0x1a52b9a5148>
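Both acc and val_acc are still rising overall, which suggests the model is undertrained and has room to improve; you can train for more epochs or make the network deeper. Note that acc stays below val_acc throughout, which is expected with three Dropout(0.5) layers, since dropout is applied during training but not during validation.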