1. Downloading and introducing the MNIST dataset

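The MNIST dataset is split into two parts: a training set of 60000 examples (mnist.train) and a test set of 10000 examples (mnist.test). Each MNIST example consists of two parts: an image of a handwritten digit and a corresponding label. The training images are mnist.train.images and the training labels are mnist.train.labels. Each image contains 28x28 pixels; flattening this array gives a vector of length 28x28 = 784. In the MNIST training set, mnist.train.images is therefore a tensor of shape [60000, 784], where the first dimension indexes the image and the second indexes the pixels within each image. Each element of the tensor is the intensity of one pixel in one image, a value between 0 and 1.

The corresponding MNIST labels are digits from 0 to 9 that state which digit a given image shows. The labels used here are "one-hot vectors". A one-hot vector is 0 in every dimension except one, where it is 1. The digit n is thus represented as a 10-dimensional vector whose only 1 is in dimension n (counting from 0). For example, label 0 is represented as ([1,0,0,0,0,0,0,0,0,0]). mnist.train.labels is therefore a [60000, 10] matrix of numbers.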
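
The two encodings described above can be sketched in plain Python. This is only an illustration: the helper names `flatten_image` and `one_hot` are made up for this example and are not part of input_data.py or TensorFlow.

```python
def flatten_image(image):
    """Flatten a 28x28 nested list of pixel rows into one 784-element list."""
    return [pixel for row in image for pixel in row]


def one_hot(label, num_classes=10):
    """Return a list of length num_classes with a single 1 at index `label`."""
    if not 0 <= label < num_classes:
        raise ValueError("label out of range: %r" % (label,))
    vector = [0] * num_classes
    vector[label] = 1
    return vector


# A blank 28x28 image flattens to 784 values.
blank = [[0.0] * 28 for _ in range(28)]
flat = flatten_image(blank)
print(len(flat))    # 784
print(one_hot(0))   # [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
print(one_hot(3))   # [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
```

Stacking 60000 such flattened images row by row gives the [60000, 784] image tensor, and stacking the label vectors gives the [60000, 10] label matrix.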

2. Implementation

2.1 TensorFlow environment

If TensorFlow is not already installed on the cluster, it can be supplied through the cacheArchive parameter, as follows:

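- Package the TensorFlow library. The libraries it depends on can either be installed in the environment beforehand or be packaged together with it. For example: tar -zcvf tensorflow.tgz ./*

- Upload the archive to HDFS, for example to /tmp/tensorflow.tgz

- In the XLearning submit script, add the cacheArchive parameter, for example: --cacheArchive /tmp/tensorflow.tgz#tensorflow

- In the script executed by launch-cmd, set the environment variable: export PYTHONPATH=./:$PYTHONPATH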
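
The packaging step in the list above could also be scripted. The following is a minimal sketch using Python's standard `tarfile` module, roughly equivalent to the `tar -zcvf` command. The directory layout and the `pack_directory` helper are hypothetical, and the HDFS upload and xl-submit steps are not covered.

```python
import os
import tarfile
import tempfile


def pack_directory(src_dir, archive_path):
    """Pack the contents of src_dir into a gzip-compressed tar archive.

    Entries are stored relative to src_dir, so unpacking recreates the
    packages at the top level of the extraction directory.
    """
    with tarfile.open(archive_path, "w:gz") as tar:
        for name in sorted(os.listdir(src_dir)):
            tar.add(os.path.join(src_dir, name), arcname=name)
    return archive_path


# Demonstration with a throwaway directory standing in for the real library.
work = tempfile.mkdtemp()
src = os.path.join(work, "site-packages")  # hypothetical location
os.makedirs(os.path.join(src, "tensorflow"))
with open(os.path.join(src, "tensorflow", "__init__.py"), "w") as f:
    f.write("# placeholder\n")

archive = pack_directory(src, os.path.join(work, "tensorflow.tgz"))
with tarfile.open(archive) as tar:
    members = sorted(tar.getnames())
print(members)
```

Keeping the package directory at the top level of the archive is what lets `export PYTHONPATH=./:$PYTHONPATH` find it once the archive has been unpacked on the worker.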

Installing the TensorFlow dependencies

yum install numpy python-devel python-wheel

2.2 Training the model

Change to the example directory:

cd /var/lib/ambari-server/resources/stacks/CRH//services/XLEARNING/xlearning-/examples/tfmnist
export XLEARNING_HOME=/var/lib/ambari-server/resources/stacks/CRH//services/XLEARNING/xlearning-

Run the script run.sh

#!/bin/sh
$XLEARNING_HOME/bin/xl-submit \
   --app-type "tensorflow" \
   --app-name "tf-mnist" \
   --input /tmp/data/tfmnist/MNIST_data#data \
   --output /tmp/tfmnist_model#model \
   --files demo.py,input_data.py,demo.sh \
   --cacheArchive /tmp/tensorflow.tgz#tensorflow \
   --launch-cmd "sh demo.sh" \
   --worker-memory G \
   --worker-num  \
   --worker-cores  \
   --ps-memory G \
   --ps-num  \
   --ps-cores  \
   --queue default

The demo.sh script

export PYTHONPATH=./:$PYTHONPATH
python demo.py --data_path=./data --save_path=./model --log_dir=./eventLog

The demo.py code

import argparse
import sys
import os
import json
import numpy as np
import time

sys.path.append(os.getcwd())
import input_data
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)


import tensorflow as tf

FLAGS = None


def main(_):
    # cluster specification
    FLAGS.task_index = int(os.environ["TF_INDEX"])
    FLAGS.job_name = os.environ["TF_ROLE"]
    cluster_def = json.loads(os.environ["TF_CLUSTER_DEF"])
    cluster = tf.train.ClusterSpec(cluster_def)
    #sess = tf.InteractiveSession()

    print("ClusterSpec:", cluster_def)
    print("current task id:", FLAGS.task_index, " role:", FLAGS.job_name)

    gpu_options = tf.GPUOptions(allow_growth=True)
    server = tf.train.Server(cluster, job_name=FLAGS.job_name, task_index=FLAGS.task_index,
                             config=tf.ConfigProto(gpu_options=gpu_options, allow_soft_placement=True))

    if FLAGS.job_name == "ps":
        server.join()
    elif FLAGS.job_name == "worker":
        # set the train parameters
        with tf.device(tf.train.replica_device_setter(worker_device=("/job:worker/task:%d" % (FLAGS.task_index)),
                                                      cluster=cluster)):
            global_step = tf.get_variable('global_step', [], initializer=tf.constant_initializer(), trainable=False)
            x = tf.placeholder(tf.float32, shape=[None, 784])
            y_ = tf.placeholder(tf.float32, shape=[None, 10])
            W = tf.Variable(tf.zeros([784, 10]))
            b = tf.Variable(tf.zeros([10]))

            #sess.run(tf.global_variables_initializer())

            y = tf.matmul(x, W) + b  
            cross_entropy = tf.reduce_mean(
                tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y))

            train_step = tf.train.GradientDescentOptimizer().minimize(cross_entropy)
            correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))

            accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

            def weight_variable(shape):
                initial = tf.truncated_normal(shape, stddev=0.1)
                return tf.Variable(initial)

            def bias_variable(shape):
                initial = tf.constant(0.1, shape=shape)
                return tf.Variable(initial)

            def conv2d(x, W):
                return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

            def max_pool_2x2(x):
                return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                                      strides=[1, 2, 2, 1], padding='SAME')

            # first convolutional layer: 5x5 kernels, 1 input channel, 32 feature maps
            W_conv1 = weight_variable([5, 5, 1, 32])
            b_conv1 = bias_variable([32])
            x_image = tf.reshape(x, [-1, 28, 28, 1])
            h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
            h_pool1 = max_pool_2x2(h_conv1)

            # second convolutional layer: 5x5 kernels, 32 input channels, 64 feature maps
            W_conv2 = weight_variable([5, 5, 32, 64])
            b_conv2 = bias_variable([64])
            h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
            h_pool2 = max_pool_2x2(h_conv2)

            # fully connected layer on the 7x7x64 pooled output
            W_fc1 = weight_variable([7 * 7 * 64, 1024])
            b_fc1 = bias_variable([1024])

            h_pool2_flat = tf.reshape(h_pool2, [-1, 7 * 7 * 64])
            h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)
            keep_prob = tf.placeholder(tf.float32)
            h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)
            W_fc2 = weight_variable([1024, 10])
            b_fc2 = bias_variable([10])

            y_conv = tf.matmul(h_fc1_drop, W_fc2) + b_fc2
            cross_entropy = tf.reduce_mean(
                tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y_conv))
            train_step = tf.train.AdamOptimizer().minimize(cross_entropy)
            correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))
            accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
            init_op = tf.global_variables_initializer()
            saver = tf.train.Saver()  # defaults to saving all variables

        sv = tf.train.Supervisor(is_chief=(FLAGS.task_index == ), global_step=global_step, init_op=init_op)
        with sv.prepare_or_wait_for_session(server.target,
                                            config=tf.ConfigProto(gpu_options=gpu_options, allow_soft_placement=True,
                                                                  log_device_placement=True)) as sess:
            # perform training cycles
            start_time = time.time()
            if (FLAGS.task_index == ):
                train_writer = tf.summary.FileWriter(FLAGS.log_dir, sess.graph)

            sess.run(init_op)
            for i in range():
                batch = mnist.train.next_batch()
                elapsed_time = time.time() - start_time
                start_time = time.time()
                if i %  == :
                    train_accuracy = accuracy.eval(feed_dict={
                        x: batch[], y_: batch[], keep_prob: })
                    print("step %d, training accuracy %g, Time: %3.2fms" % (i, train_accuracy, float(elapsed_time*)))
                train_step.run(feed_dict={x: batch[], y_: batch[], keep_prob: })
                sys.stderr.write("reporter progress:%0.4f\n"%(float(i/)))
            print("test accuracy %g" % accuracy.eval(feed_dict={
              x: mnist.test.images, y_: mnist.test.labels, keep_prob: }))
            print("Train Completed.")
            if (FLAGS.task_index == ):
                train_writer.close()
                print("saving model...")
                saver.save(sess, FLAGS.save_path+"/model.ckpt")
        print("done")

if __name__ == "__main__":
  parser = argparse.ArgumentParser()
  parser.register("type", "bool", lambda v: v.lower() == "true")
  # Flags for defining the tf.train.ClusterSpec
  parser.add_argument(
    "--job_name",
    type=str,
    default="",
    help="One of 'ps', 'worker'"
  )
  # Flags for defining the tf.train.Server
  parser.add_argument(
    "--task_index",
    type=int,
    default=,
    help="Index of task within the job"
  )
  # Flags for defining the parameter of data path
  parser.add_argument(
    "--data_path",
    type=str,
    default="",
    help="The path for train file"
  )
  parser.add_argument(
    "--save_path",
    type=str,
    default="",
    help="The save path for model"
  )
  parser.add_argument(
    "--log_dir",
    type=str,
    default="",
    help="The log path for model"
  )

  FLAGS, unparsed = parser.parse_known_args()
  tf.app.run(main=main)

Note: the saver part stores the trained weights and biases so that the evaluation program can reuse them.
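
demo.py learns its place in the cluster from three environment variables: TF_ROLE, TF_INDEX and TF_CLUSTER_DEF. The sketch below shows that parsing step on its own, without TensorFlow. It assumes that TF_CLUSTER_DEF is a JSON object mapping job names to lists of host:port addresses, which is inferred from the way demo.py passes the decoded value to tf.train.ClusterSpec. The helper name and the sample host names are invented for illustration.

```python
import json


def read_cluster_env(environ):
    """Extract role, task index and cluster layout from an environment mapping."""
    cluster_def = json.loads(environ["TF_CLUSTER_DEF"])
    role = environ["TF_ROLE"]
    index = int(environ["TF_INDEX"])
    if role not in cluster_def:
        raise ValueError("role %r is not in the cluster definition" % role)
    if not 0 <= index < len(cluster_def[role]):
        raise ValueError("task index %d is out of range for %r" % (index, role))
    return role, index, cluster_def


# Hypothetical values; on a real cluster these variables are set for the job.
fake_env = {
    "TF_ROLE": "worker",
    "TF_INDEX": "1",
    "TF_CLUSTER_DEF": json.dumps({
        "ps": ["host-a:2222"],
        "worker": ["host-b:2222", "host-c:2222"],
    }),
}
role, index, cluster_def = read_cluster_env(fake_env)
print(role, index, cluster_def[role][index])
```

Validating the role and index up front gives a clearer error than a failure inside tf.train.Server when the environment is incomplete.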

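2.3 Preparing a test image and preprocessing it with OpenCV

With the network trained, the next step is to test it. Prepare an image, preprocess it with OpenCV, and feed it to the evaluation program to see whether it is recognized correctly.

OpenCV is used to preprocess the image: shrink it to 28*28 pixels, convert it to grayscale, and binarize it.

(1) The stdafx.h file

Add the OpenCV header files: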

#include <opencv2/highgui/highgui.hpp>
#include <opencv2/opencv.hpp> 
#include <opencv2/gpu/gpu.hpp>
#include <opencv2/core/core.hpp>
#include <opencv/cv.h>
#include <opencv/cxcore.h>
#include <opencv/highgui.h>

(2) The TF_ImgPreProcess.cpp file

#include "stdafx.h"

#include <opencv2/core/core.hpp>
#include <opencv2/core/opengl_interop.hpp>
#include <opencv2/gpu/gpu.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/contrib/contrib.hpp>
using namespace std;
using namespace cv;

int _tmain(int argc, _TCHAR* argv[])
{
   IplImage* img = cvLoadImage("E:\\png\\5.png",);
   IplImage* copyImg=cvCreateImage(cvGetSize(img),IPL_DEPTH_8U,);
   cvCopyImage(img,copyImg);
   IplImage* ResImg=cvCreateImage(cvSize(,),IPL_DEPTH_8U,);
   IplImage* TmpImg=cvCreateImage(cvGetSize(ResImg),IPL_DEPTH_8U,);

   cvResize(copyImg,TmpImg,CV_INTER_LINEAR); 
   cvCvtColor(TmpImg,ResImg,CV_RGB2GRAY);
   cvThreshold(ResImg,ResImg,,,CV_THRESH_BINARY_INV);

   cvSaveImage("E:\\png\\result\\1.png",ResImg);
   cvWaitKey();

    return ;
}
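
For readers without an OpenCV build at hand, the three preprocessing steps (resize to 28x28, convert to grayscale, inverted binary threshold) can be sketched in plain Python. This is not the author's implementation. It uses nearest-neighbour sampling instead of linear interpolation, and the threshold of 128 and maximum value of 255 are assumptions, since the values passed to cvThreshold are not shown above. The grayscale weights are the usual 0.299/0.587/0.114 luminance weights.

```python
def resize_nearest(image, new_h, new_w):
    """Resize a 2-D list of pixels by nearest-neighbour sampling."""
    old_h, old_w = len(image), len(image[0])
    return [[image[y * old_h // new_h][x * old_w // new_w] for x in range(new_w)]
            for y in range(new_h)]


def to_gray(rgb_image):
    """Convert rows of (r, g, b) tuples to rows of 0-255 gray levels."""
    return [[int(round(0.299 * r + 0.587 * g + 0.114 * b)) for (r, g, b) in row]
            for row in rgb_image]


def threshold_binary_inv(image, thresh=128, max_value=255):
    """Inverted binary threshold: pixels above thresh become 0, all others max_value."""
    return [[0 if p > thresh else max_value for p in row] for row in image]


# Synthetic 56x56 test image: a black square on a white background.
white, black = (255, 255, 255), (0, 0, 0)
source = [[black if 20 <= y < 36 and 20 <= x < 36 else white for x in range(56)]
          for y in range(56)]

# Same order as the C++ listing: resize, then grayscale, then threshold.
result = threshold_binary_inv(to_gray(resize_nearest(source, 28, 28)))
print(len(result), len(result[0]))    # 28 28
print(result[14][14], result[0][0])   # 255 0
```

The inverted threshold turns dark ink on light paper into bright strokes on a black background, which is the polarity of the MNIST training images.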

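2.4 Feeding the image to the network for recognition

Install the opencv package in the environment.

A forward-pass program is written here; the class picked by the final softmax layer is the recognition result.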
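
As a small illustration of that last step, the sketch below applies softmax to ten raw scores and takes the index of the largest probability as the predicted digit. The scores are made up and the helpers are plain Python stand-ins for tf.nn.softmax and tf.argmax, not the TensorFlow functions themselves.

```python
import math


def softmax(logits):
    """Turn a list of raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values):
    """Index of the largest element."""
    return max(range(len(values)), key=lambda i: values[i])


logits = [0.1, 0.3, 4.0, 0.2, -1.0, 0.0, 0.5, 0.7, 0.1, 0.2]  # made-up scores
probs = softmax(logits)
print(argmax(probs))  # 2
```

Because softmax preserves the ordering of the scores, taking the argmax of the probabilities gives the same digit as taking the argmax of the raw scores.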

The program is as follows:

```python
from PIL import Image, ImageFilter
import tensorflow as tf
import cv2


def imageprepare():
    """
    This function returns the pixel values.
    The input is a png file location.
    """
    file_name = '/data/sxl/MNIST_recognize/p_num2.png'  # path to your own image
    # in a terminal, 'mogrify -format png *.jpg' converts jpg to png
    im = Image.open(file_name).convert('L')
    im.save("/data/sxl/MNIST_recognize/sample.png")
    tv = list(im.getdata())  # get pixel values

    # normalize pixels to the range 0..1; 0 is pure white, 1 is pure black
    tva = [(255 - x) * 1.0 / 255.0 for x in tv]
    # print(tva)
    return tva


# The code below computes the predicted integer.
# The input is the pixel values from the imageprepare() function.

# Define the model (same as when creating the model file)
result = imageprepare()
x = tf.placeholder(tf.float32, [None, 784])
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))


def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)


def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)


def conv2d(x, W):
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')


def max_pool_2x2(x):
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')


W_conv1 = weight_variable([5, 5, 1, 32])
b_conv1 = bias_variable([32])
x_image = tf.reshape(x, [-1, 28, 28, 1])
h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)

W_conv2 = weight_variable([5, 5, 32, 64])
b_conv2 = bias_variable([64])
h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)

W_fc1 = weight_variable([7 * 7 * 64, 1024])
b_fc1 = bias_variable([1024])
h_pool2_flat = tf.reshape(h_pool2, [-1, 7 * 7 * 64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)

keep_prob = tf.placeholder(tf.float32)
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)

W_fc2 = weight_variable([1024, 10])
b_fc2 = bias_variable([10])
y_conv = tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2) + b_fc2)

init_op = tf.global_variables_initializer()

# Load the model2.ckpt file.
# The file is stored in the directory from which this python script is started.
# Use the model to predict the integer. The integer is returned as a list.
# Based on the documentation at
# https://www.tensorflow.org/versions/master/how_tos/variables/index.html
saver = tf.train.Saver()

with tf.Session() as sess:
    sess.run(init_op)
    # restore the model parameters saved earlier
    saver.restore(sess, "/data/sxl/MNIST_recognize/form/model2.ckpt")
    # print("Model restored.")

    prediction = tf.argmax(y_conv, 1)
    # Assumed completion: evaluate the prediction on the prepared image
    # with dropout disabled (keep_prob 1.0).
    predint = prediction.eval(feed_dict={x: [result], keep_prob: 1.0}, session=sess)
    print(h_conv2)

    print('recognize result:')
    print(predint[0])
```
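
The pixel conversion inside imageprepare() is worth isolating, because it both rescales and inverts the image. A minimal sketch follows. The `normalize_inverted` and `check_mnist_input` helpers are hypothetical names, and the length check assumes the image has already been resized to 28x28.

```python
def normalize_inverted(pixels):
    """Map 8-bit gray levels to [0, 1] with white (255) -> 0.0 and black (0) -> 1.0."""
    return [(255 - p) / 255.0 for p in pixels]


def check_mnist_input(values):
    """Raise ValueError unless values looks like one flattened 28x28 image."""
    if len(values) != 28 * 28:
        raise ValueError("expected 784 values, got %d" % len(values))
    if any(v < 0.0 or v > 1.0 for v in values):
        raise ValueError("pixel values must lie in [0, 1]")
    return values


sample = [255, 0, 51]
print(normalize_inverted(sample))

# An all-white 28x28 image becomes 784 zeros, a valid network input.
white_page = normalize_inverted([255] * 784)
check_mnist_input(white_page)
```

A check like this catches an image that was not resized before it reaches the [None, 784] placeholder, where the shape mismatch would otherwise surface as a TensorFlow feed error.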

Input image:
![](/upload/images///f8c775df-a50b--a2aa-ef51653938a1.png)
Output of the run:
![](/upload/images///be8605a5-d009--b1d7-d423d35797de.png)
Notes:
The TensorFlow model is saved with:
```python
saver = tf.train.Saver()
with tf.Session() as sess:
    init_op = tf.global_variables_initializer()
    sess.run(init_op)  # initialize the variables before saving them
    saver.save(sess, "checkpoint/model.ckpt", global_step=)
```

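Running this saves the model and produces three files, with the extensions .data, .meta and .index:

model.ckpt.data-00000-of-00001

model.ckpt.index

model.ckpt.meta

The meta file stores the graph structure, including GraphDef, SaverDef, and so on.

The index file is a string-string table whose keys are tensor names and whose values are BundleEntryProto messages.

The data file stores the values of all the model's variables.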
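
The naming pattern of these files can be made explicit with a small helper. This is only a sketch. `checkpoint_files` is a hypothetical function, not a TensorFlow API, and it assumes the behaviour of tf.train.Saver.save, which appends "-<step>" to the prefix when global_step is given.

```python
def checkpoint_files(prefix, global_step=None, num_shards=1):
    """List the file names that make up a checkpoint saved under this prefix."""
    if global_step is not None:
        prefix = "%s-%d" % (prefix, global_step)
    data = ["%s.data-%05d-of-%05d" % (prefix, i, num_shards)
            for i in range(num_shards)]
    return data + [prefix + ".index", prefix + ".meta"]


print(checkpoint_files("model.ckpt"))
# ['model.ckpt.data-00000-of-00001', 'model.ckpt.index', 'model.ckpt.meta']
print(checkpoint_files("model.ckpt", global_step=1000))
```

When restoring, pass the prefix (for example "model.ckpt" or "model.ckpt-1000") to saver.restore, not the name of any single file.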

The model is loaded with:

with tf.Session() as sess:
  saver.restore(sess, "/checkpoint/model.ckpt")

