TensorFlow Complete Tutorial 2.2: MNIST Data Download

2021-11-09 22:25:26

This document is part of the TensorFlow reference documentation. It is reposted with the permission of the TensorFlow Chinese community.

The goal of this tutorial is to show how to download the (classic) MNIST dataset used for the handwritten digit classification problem.

This tutorial uses the following file:

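<table>
<tr><th>File</th><th>Purpose</th></tr>
<tr><td><a href="https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/g3doc/tutorials/mnist/input_data.py" target="_blank"><code>input_data.py</code></a></td><td>Source code that downloads the MNIST dataset for training and testing</td></tr>
</table>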

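MNIST is a classic problem in machine learning. The task is to look at 28x28-pixel grayscale images of handwritten digits and identify which digit each image represents, for all digits from 0 to 9.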


<table>
<tr><th>File</th><th>Contents</th></tr>
<tr><td><a href="http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz" target="_blank"><code>train-images-idx3-ubyte.gz</code></a></td><td>Training set images: 55000 training images, 5000 validation images</td></tr>
<tr><td><a href="http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz" target="_blank"><code>train-labels-idx1-ubyte.gz</code></a></td><td>Digit labels matching the training set images</td></tr>
<tr><td><a href="http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz" target="_blank"><code>t10k-images-idx3-ubyte.gz</code></a></td><td>Test set images: 10000 images</td></tr>
<tr><td><a href="http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz" target="_blank"><code>t10k-labels-idx1-ubyte.gz</code></a></td><td>Digit labels matching the test set images</td></tr>
</table>

In <code>input_data.py</code>, the <code>maybe_download()</code> function ensures that these files are downloaded into a local data folder.
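The download-only-if-missing idea can be sketched as follows. This is a minimal illustration, not the actual code in <code>input_data.py</code>: the base URL is taken from the file links above, and the <code>fetch</code> parameter is a hypothetical addition so the function can be tried without network access.

```python
import os
import urllib.request

# Base URL assumed from the file links listed in this tutorial.
SOURCE_URL = "http://yann.lecun.com/exdb/mnist/"


def maybe_download(filename, work_directory, fetch=urllib.request.urlretrieve):
    """Download `filename` into `work_directory` unless it is already there.

    `fetch` is a hypothetical injection point (an assumption of this sketch);
    it is called as fetch(url, local_path).
    """
    os.makedirs(work_directory, exist_ok=True)
    filepath = os.path.join(work_directory, filename)
    if not os.path.exists(filepath):
        # Only reach the network when the local copy is missing.
        fetch(SOURCE_URL + filename, filepath)
    return filepath
```

Calling the function twice with the same arguments triggers at most one download; the second call simply returns the path of the existing file.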

The folder name is specified by a flag variable at the top of <code>fully_connected_feed.py</code>, and you can change it to suit your needs.

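The files themselves are not stored in a standard image format. They are unpacked manually by the <code>extract_images()</code> and <code>extract_labels()</code> functions in <code>input_data.py</code>, following the format description on the MNIST web page.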
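To make the unpacking step concrete, here is a rough sketch of readers for the two file types. It assumes the IDX layout described on the MNIST page: a header of big-endian 32-bit integers (magic number 2051 for image files, 2049 for label files) followed by one unsigned byte per pixel or label. The function names mirror those in <code>input_data.py</code>, but the bodies are an illustration using only the standard library, not the original implementation.

```python
import gzip
import struct


def extract_images(path):
    """Read a gzipped IDX3 image file into a list of flat pixel rows (0-255)."""
    with gzip.open(path, "rb") as f:
        # Header: magic number, image count, rows, columns (big-endian uint32).
        magic, count, rows, cols = struct.unpack(">IIII", f.read(16))
        if magic != 2051:
            raise ValueError("not an IDX3 image file: magic %d" % magic)
        size = rows * cols
        data = f.read(count * size)
    # One list of `size` intensities per image: [image index][pixel index].
    return [list(data[i * size:(i + 1) * size]) for i in range(count)]


def extract_labels(path):
    """Read a gzipped IDX1 label file into a list of integer class labels."""
    with gzip.open(path, "rb") as f:
        # Header: magic number, label count (big-endian uint32).
        magic, count = struct.unpack(">II", f.read(8))
        if magic != 2049:
            raise ValueError("not an IDX1 label file: magic %d" % magic)
        return list(f.read(count))
```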

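The image data is unpacked into a 2-D tensor of the form <code>[image index, pixel index]</code>, where each entry is the intensity of a specific pixel in a specific image, rescaled from <code>[0, 255]</code> to <code>[-0.5, 0.5]</code>. The "image index" identifies an image in the dataset, counting up from 0 to the size of the dataset. The "pixel index" identifies a specific pixel within that image, counting up from 0 to the number of pixels in the image.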
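The rescaling can be written in one line per pixel. The exact formula below (divide by 255, then subtract 0.5) is an assumption; the text only states the source and target ranges, and this is one simple mapping that satisfies them. The helper name <code>rescale_pixels</code> is hypothetical.

```python
def rescale_pixels(images):
    """Map raw intensities in [0, 255] to floats in [-0.5, 0.5].

    `images` is a list of flat pixel rows, indexed [image index][pixel index].
    The formula is an assumption of this sketch, not taken from input_data.py.
    """
    return [[pixel / 255.0 - 0.5 for pixel in row] for row in images]
```

With this mapping a black pixel (0) becomes -0.5 and a fully saturated pixel (255) becomes 0.5, while the <code>[image index, pixel index]</code> layout is left unchanged.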

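The files whose names start with <code>train-*</code> contain 60000 examples, which are split into 55000 examples for training and 5000 examples for validation. Every 28x28-pixel grayscale image in the datasets has 28 x 28 = 784 pixels, so the output tensor for the training set images has shape <code>[55000, 784]</code>.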
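A small sketch of the 55000/5000 split is shown below. Which examples go into the validation set is an assumption here (the first 5000); the text only gives the sizes. The function name <code>split_train_validation</code> and the constant <code>VALIDATION_SIZE</code> are hypothetical.

```python
VALIDATION_SIZE = 5000  # size stated in the tutorial


def split_train_validation(images, labels, validation_size=VALIDATION_SIZE):
    """Split parallel image/label lists into (train, validation) pairs.

    Assumption of this sketch: the first `validation_size` examples form
    the validation set and the remainder forms the training set.
    """
    if len(images) != len(labels):
        raise ValueError("images and labels must have the same length")
    validation = (images[:validation_size], labels[:validation_size])
    train = (images[validation_size:], labels[validation_size:])
    return train, validation
```

Applied to the 60000 examples from the <code>train-*</code> files, this yields 55000 training examples and 5000 validation examples, matching the shapes quoted above.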

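The label data is unpacked into a 1-D tensor of the form <code>[image index]</code>, whose value for each example is that example's class identifier. For the training set labels, the shape is therefore <code>[55000]</code>.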

The underlying source code downloads, unpacks, and reshapes the image and label data to build the following dataset objects:

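<table>
<tr><th>Dataset</th><th>Purpose</th></tr>
<tr><td><code>data_sets.train</code></td><td>55000 images and labels, used for training.</td></tr>
<tr><td><code>data_sets.validation</code></td><td>5000 images and labels, used for iterative validation of training accuracy.</td></tr>
<tr><td><code>data_sets.test</code></td><td>10000 images and labels, used for the final test of trained accuracy.</td></tr>
</table>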

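Calling <code>read_data_sets()</code> returns a <code>dataset</code> instance that contains the three datasets above. The <code>dataset.next_batch()</code> method fetches a tuple holding <code>batch_size</code> images and their labels, which is then fed into the current TensorFlow session.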
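The batching behavior can be illustrated with a small stand-in class. This is a simplified sketch, not the class defined in <code>input_data.py</code>: the class name <code>DataSet</code> and its internals are assumptions, it uses plain Python lists, and it restarts from the beginning without shuffling when the data runs out.

```python
class DataSet:
    """Hypothetical stand-in that serves (images, labels) batches in order."""

    def __init__(self, images, labels):
        if len(images) != len(labels):
            raise ValueError("images and labels must have the same length")
        self._images = images
        self._labels = labels
        self._position = 0  # index of the next unread example

    @property
    def num_examples(self):
        return len(self._images)

    def next_batch(self, batch_size):
        """Return a tuple (images, labels) holding `batch_size` examples."""
        if batch_size > self.num_examples:
            raise ValueError("batch_size exceeds the dataset size")
        if self._position + batch_size > self.num_examples:
            # Not enough examples left: start a new pass over the data.
            self._position = 0
        start = self._position
        self._position += batch_size
        return (self._images[start:self._position],
                self._labels[start:self._position])
```

In a training loop, each call such as <code>images, labels = train.next_batch(100)</code> would supply the values to feed into the session for one step.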
