TensorFlow C++ in Practice and Its Many Pitfalls

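In this article:

Approach
Implementation steps
- (1) Building from source
- (2) Training and exporting the model
- (3) Freezing the model
- Pitfall: the BatchNorm bug
- (4) Loading and running the model
- (5) Runtime issues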

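TensorFlow's official site currently provides release packages only for Python, C, Java and Go. There is no C++ release package, and the site also states that it does not guarantee the stability of any library other than the Python one. Python is also the most complete in terms of features. Python has well-known advantages in development speed and ease of use, but as an interpreted language it has significant performance drawbacks. When turning AI models into services, the clear trend is to use Python as the tool for building models quickly and a higher-performance language such as C++ or Java to implement the service. This article focuses on how we served TensorFlow models from C++ and the problems we ran into along the way.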

Approach

There are two ways to use the TensorFlow C++ library:

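(1) The best option is of course to build the graph directly in C++, but the C++ TensorFlow library is currently not as full-featured as the Python API. See the example that builds a small graph in C++ here. The C++ TensorFlow API also contains classes implementing numeric kernels for CPU and GPU, which can be used to add new ops; see https://www.tensorflow.org/extend/adding_an_op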

(2) The common approach: call, from C++, a graph that was generated in Python. This article focuses on this approach.

Implementation steps

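(1) Build the TensorFlow C++ shared library (.so) from source
(2) Train the model and export the results
(3) Freeze the model
(4) Load and run the model
(5) Runtime issues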

(1) Building from source

Environment: the company's tlinux 2.2, GCC >= 4.8.5
Components to install: protobuf 3.3.0, bazel 0.5.0, python 2.7, java8
Machine requirements: 4 GB of memory

a. Install Java 8

yum install java

b. Install protobuf 3.3.0

Download https://github.com/google/protobuf/archive/v3.3.0.zip

./autogen.sh  &&  ./configure  &&  make  &&  make install

c. Install bazel

Download from https://github.com/bazelbuild/bazel/releases
sh bazel-0.5.0-installer-linux-x86_64.sh

d. Build from source

It is best to use the latest release: https://github.com/tensorflow/tensorflow/releases

Problems you may run into during the build:

Problem 1: fatal error: unsupported/Eigen/CXX11/Tensor: No such file or directory

Install Eigen 3.3 or later.

Problem 2: java.io.IOException: Cannot run program "patch"

yum install patch

Problem 3: not enough memory


(2) Training and exporting the model

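For training and exporting the model, you can follow this example: https://blog.metaflow.fr/tensorflow-saving-restoring-and-mixing-multiple-models-c4c94d5d7125; there are many others on Google as well. Training and saving the model produces the checkpoint files that the freezing step below works from.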


(3) Freezing the model

There are three ways to freeze a model:

a. The freeze_graph tool

bazel build tensorflow/python/tools:freeze_graph && bazel-bin/tensorflow/python/tools/freeze_graph \
        --input_graph=graph.pb \
        --input_checkpoint=checkpoint \
        --output_graph=./frozen_graph.pb \
        --output_node_names=output/output/scores

b. Calling the freeze_graph.py tool from Python

import os

from tensorflow.python.tools import freeze_graph

# FLAGS.model_dir and FLAGS.checkpoint_dir are assumed to be defined elsewhere
# (e.g. with tf.app.flags).
# We save out the graph to disk, and then call the const conversion
# routine.
checkpoint_state_name = "checkpoint"
input_graph_name = "graph.pb"
output_graph_name = "frozen_graph.pb"

input_graph_path = os.path.join(FLAGS.model_dir, input_graph_name)
input_saver_def_path = ""
input_binary = False  # False: graph.pb is parsed as a text-format GraphDef
input_checkpoint_path = os.path.join(FLAGS.checkpoint_dir, 'saved_checkpoint') + "-0"

# Normally this lists only the output node(s); separate several names with commas
output_node_names = "output/output/scores"
restore_op_name = "save/restore_all"
filename_tensor_name = "save/Const:0"
output_graph_path = os.path.join(FLAGS.model_dir, output_graph_name)
clear_devices = False
initializer_nodes = ""  # required positional argument of freeze_graph in TF 1.x

freeze_graph.freeze_graph(input_graph_path, input_saver_def_path,
                          input_binary, input_checkpoint_path,
                          output_node_names, restore_op_name,
                          filename_tensor_name, output_graph_path,
                          clear_devices, initializer_nodes)

c. Using the TensorFlow Python API directly

import os, argparse

import tensorflow as tf
from tensorflow.python.framework import graph_util

dir = os.path.dirname(os.path.realpath(__file__))

def freeze_graph(model_folder):
    # We retrieve our checkpoint fullpath
    checkpoint = tf.train.get_checkpoint_state(model_folder)
    input_checkpoint = checkpoint.model_checkpoint_path

    # Build the full file name of the frozen graph
    absolute_model_folder = "/".join(input_checkpoint.split('/')[:-1])
    output_graph = absolute_model_folder + "/frozen_model.pb"
    print(output_graph)
    # Before exporting the graph, we need to specify the output node(s).
    # This is how TF decides which part of the graph to keep and which part it can drop.
    # NOTE: this variable is plural, because you can have multiple output nodes
    output_node_names = "output/output/scores"

    # We clear devices to allow TensorFlow to control on which device it will load operations
    clear_devices = True

    # We import the meta graph and retrieve a Saver
    saver = tf.train.import_meta_graph(input_checkpoint + '.meta', clear_devices=clear_devices)

    # We retrieve the protobuf graph definition
    graph = tf.get_default_graph()
    input_graph_def = graph.as_graph_def()

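    # Work around the BatchNorm bug (see below): turn RefSwitch into Switch,
    # reading the moving_* variables through their /read identity, and
    # AssignSub into Sub, so that the frozen graph can be loaded
    for node in input_graph_def.node:
        if node.op == 'RefSwitch':
            node.op = 'Switch'
            for index in range(len(node.input)):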
                if 'moving_' in node.input[index]:
                    node.input[index] = node.input[index] + '/read'
        elif node.op == 'AssignSub':
            node.op = 'Sub'
            if 'use_locking' in node.attr: del node.attr['use_locking']

    # We start a session and restore the graph weights
    with tf.Session() as sess:
        saver.restore(sess, input_checkpoint)

        # We use a built-in TF helper to export variables to constants
        output_graph_def = graph_util.convert_variables_to_constants(
            sess, # The session is used to retrieve the weights
            input_graph_def, # The graph_def is used to retrieve the nodes 
            output_node_names.split(",") # The output node names are used to select the useful nodes
        ) 

        # Finally we serialize and dump the output graph to the filesystem
        with tf.gfile.GFile(output_graph, "wb") as f:
            f.write(output_graph_def.SerializeToString())
        print("%d ops in the final graph." % len(output_graph_def.node))


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument("--model_folder", type=str, help="Model folder to export")
    args = parser.parse_args()

    freeze_graph(args.model_folder)
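Method c passes output_node_names.split(",") to convert_variables_to_constants, so one frozen graph can keep several outputs. On the C++ side, the output_tensor_names argument of Session::Run is a std::vector<std::string>, so the same comma-separated string can be reused once it is split. The helper below is a hypothetical sketch of that split (SplitOutputNodeNames is an illustrative name, not a TensorFlow function); unlike Python's split it drops empty entries.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Splits a comma-separated list of node names (as used by output_node_names
// when freezing) into a vector suitable for Session::Run's fetch list.
// Empty entries, e.g. from a trailing comma, are skipped.
std::vector<std::string> SplitOutputNodeNames(const std::string& csv) {
    std::vector<std::string> names;
    std::stringstream stream(csv);
    std::string name;
    while (std::getline(stream, name, ',')) {
        if (!name.empty()) {
            names.push_back(name);
        }
    }
    return names;
}
```

With a single output, SplitOutputNodeNames("output/output/scores") yields the one-element vector that Predict passes to m_session->Run.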

Pitfall: the BatchNorm bug


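In a real project, loading a model produced with method a or method b through the TensorFlow C++ API failed with an error, and loading the same model with TensorFlow Python failed with the same error.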


The cause is that the model uses BatchNorm; the fix is the RefSwitch/AssignSub node rewrite shown in method c above.
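The rewrite rules themselves are small, and restating them in isolation makes them easier to check. The sketch below applies the same two rules as the Python loop in method c to a hypothetical SimpleNode type; SimpleNode is an illustrative stand-in, not tensorflow::NodeDef. It only illustrates the rules: in practice the rewrite has to run on the graph before convert_variables_to_constants, as method c does.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for tensorflow::NodeDef (not the real type).
struct SimpleNode {
    std::string op;
    std::vector<std::string> inputs;
    std::map<std::string, std::string> attrs;
};

// Same rules as the Python workaround:
//  - RefSwitch -> Switch, and inputs naming moving_* variables get "/read"
//    appended so they read the variable through its identity node;
//  - AssignSub -> Sub, dropping the use_locking attribute that Sub lacks.
void FixBatchNormNodes(std::vector<SimpleNode>& nodes) {
    for (SimpleNode& node : nodes) {
        if (node.op == "RefSwitch") {
            node.op = "Switch";
            for (std::string& input : node.inputs) {
                if (input.find("moving_") != std::string::npos) {
                    input += "/read";
                }
            }
        } else if (node.op == "AssignSub") {
            node.op = "Sub";
            node.attrs.erase("use_locking");
        }
    }
}
```

Nodes of any other type are left untouched, which matches the Python version.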

(4) Loading and running the model

Building inputs and outputs

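Building the model's inputs and outputs mostly comes down to constructing input and output matrices. Compared with Python's numpy, the Tensor provided by TensorFlow and Eigen::Tensor are quite awkward to use, especially for creating dynamically sized matrices. If your compiler supports C++14, you can use the xtensor library, which is as powerful as numpy and used in a very similar way. If you are on C++11, study the documentation of Eigen and tensorflow::Tensor carefully. A few simple usages: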

Filling a matrix:

tensorflow::Tensor four_dim_plane(DT_FLOAT, tensorflow::TensorShape({1, MODEL_X_AXIS_LEN, MODEL_Y_AXIS_LEN, fourth_dim_size}));
auto plane_tensor = four_dim_plane.tensor<float, 4>();
for (uint32_t k = 0; k < array_plane.size(); ++k)
{
    for (uint32_t j = 0; j < MODEL_Y_AXIS_LEN; ++j)
    {
        for (uint32_t i = 0; i < MODEL_X_AXIS_LEN; ++i)
        {
            plane_tensor(0, i, j, k) = array_plane[k](i, j); 
        }
    }
}
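tensorflow::Tensor stores its elements in row-major (C) order, so element (0, i, j, k) of the {1, X, Y, C} tensor above lives at flat offset ((0 * X + i) * Y + j) * C + k; the same loop could write into the raw buffer from four_dim_plane.flat<float>().data(). The sketch below shows that offset arithmetic without TensorFlow. RowMajorOffset4 and PackPlanes are hypothetical helpers, and the source layout planes[k][i * y_len + j] is an assumed stand-in for array_plane[k](i, j).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major (C order) offset of index (n, i, j, k) in a tensor of shape
// {shape[0], shape[1], shape[2], shape[3]}.
std::size_t RowMajorOffset4(const std::array<std::size_t, 4>& shape,
                            std::size_t n, std::size_t i,
                            std::size_t j, std::size_t k) {
    assert(n < shape[0] && i < shape[1] && j < shape[2] && k < shape[3]);
    return ((n * shape[1] + i) * shape[2] + j) * shape[3] + k;
}

// Packs C planes of X*Y values into a flat {1, X, Y, C} buffer.
// Each plane is assumed to be stored row-major: planes[k][i * y_len + j].
std::vector<float> PackPlanes(const std::vector<std::vector<float>>& planes,
                              std::size_t x_len, std::size_t y_len) {
    const std::array<std::size_t, 4> shape = {1, x_len, y_len, planes.size()};
    std::vector<float> flat(x_len * y_len * planes.size());
    for (std::size_t k = 0; k < planes.size(); ++k) {
        assert(planes[k].size() == x_len * y_len);
        for (std::size_t j = 0; j < y_len; ++j) {
            for (std::size_t i = 0; i < x_len; ++i) {
                flat[RowMajorOffset4(shape, 0, i, j, k)] = planes[k][i * y_len + j];
            }
        }
    }
    return flat;
}
```

Because the last dimension varies fastest in memory, values from different planes end up interleaved: with two 2x2 planes the buffer alternates plane 0 and plane 1 element by element.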

SOFTMAX:

Eigen::Tensor<float, 1> ModelApp::TensorSoftMax(const Eigen::Tensor<float, 1>& tensor)
{
    Eigen::Tensor<float, 0> max = tensor.maximum();
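    // Evaluate once into a tensor; with auto, e_x stays a lazy expression and exp() is recomputed for the sum and the division
    Eigen::Tensor<float, 1> e_x = (tensor - tensor.constant(max())).exp();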
    Eigen::Tensor<float, 0> e_x_sum = e_x.sum();
    return e_x / e_x_sum();
}
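TensorSoftMax above subtracts the maximum before exponentiating, which keeps exp() from overflowing for large scores without changing the result. The same numerically stable softmax can be written without Eigen on a std::vector, which is handy once the scores have been copied out of result[0].flat<float>(). SoftMax is a hypothetical helper name, not part of TensorFlow or Eigen.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Numerically stable softmax: exp(x_i - max) / sum_j exp(x_j - max).
std::vector<float> SoftMax(const std::vector<float>& logits) {
    assert(!logits.empty());
    const float max_logit = *std::max_element(logits.begin(), logits.end());
    std::vector<float> out(logits.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < logits.size(); ++i) {
        out[i] = std::exp(logits[i] - max_logit);  // largest term becomes exp(0) = 1
        sum += out[i];
    }
    for (float& v : out) {
        v /= sum;
    }
    return out;
}
```

For example, SoftMax({1000.0f, 1000.0f}) returns two values of 0.5, whereas exponentiating 1000.0f directly would overflow to infinity.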

Loading the model and initializing the session:

int32_t ModelApp::Init(const std::string& graph_file, Logger *logger)
{
    auto status = NewSession(SessionOptions(), &m_session); 
    if (!status.ok())
    {
        LOG_ERR(logger, "New session failed! %s", status.ToString().c_str());
        return Error::ERR_FAILED_NEW_TENSORFLOW_SESSION;
    }

    GraphDef graph_def;
    status = ReadBinaryProto(Env::Default(), graph_file, &graph_def);
    if (!status.ok()) 
    {
        LOG_ERR(logger, "Read binary proto failed! %s", status.ToString().c_str());
        return Error::ERR_FAILED_READ_BINARY_PROTO;
    }

    status = m_session->Create(graph_def);
    if (!status.ok()) 
    {
        LOG_ERR(logger, "Session create failed! %s", status.ToString().c_str());
        return Error::ERR_FAILED_CREATE_TENSORFLOW_SESSION;
    }

    return Error::SUCCESS;
}

Running:

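TensorFlow 0.10 and later is thread-safe (Session::Run may be called concurrently on one session), so Predict can be called from multiple threads.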

int32_t ModelApp::Predict(const Action& action, std::vector<int>* info, Logger *logger)
{
    ...
    auto tensor_x = m_writer->Generate(action, logger);

    Tensor phase_train(DT_BOOL, TensorShape());
    phase_train.scalar<bool>()() = false;
    std::vector<std::pair<std::string, Tensor>> inputs = {
        {"input_x", tensor_x},
        {"phase_train", phase_train}
    }; 

    std::vector<Tensor> result;
    auto status = m_session->Run(inputs, {"output/output/scores"}, {}, &result);
    if (!status.ok())
    {
        LOG_ERR(logger, "Session run failed! %s", status.ToString().c_str());
        return Error::ERR_FAILED_TENSORFLOW_EXECUTION;
    }
    ...
    auto scores = result[0].flat<float>() ;
    ...
    return Error::SUCCESS;
}
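Because TensorFlow documents that a Session allows concurrent calls to Run(), the single session created in Init can be shared by worker threads, provided each call keeps its own inputs and result vector local, as Predict does. The harness below is a hypothetical sketch of that calling pattern; RunConcurrently and the predict callback are illustrative names, with the callback standing in for ModelApp::Predict.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <thread>
#include <vector>

// Calls `predict` from `num_threads` threads, `calls_per_thread` times each,
// and returns how many calls reported success. Only the success counter is
// shared between threads, and it is atomic; anything else `predict` touches
// must itself be thread-safe (like a shared tensorflow::Session).
int RunConcurrently(const std::function<bool(int)>& predict,
                    int num_threads, int calls_per_thread) {
    std::atomic<int> successes{0};
    std::vector<std::thread> workers;
    workers.reserve(num_threads);
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&, t] {
            for (int c = 0; c < calls_per_thread; ++c) {
                if (predict(t * calls_per_thread + c)) {
                    ++successes;
                }
            }
        });
    }
    for (std::thread& worker : workers) {
        worker.join();
    }
    return successes.load();
}
```

No mutex is placed around the predict call itself; that is exactly the property the thread-safety of Session::Run buys. Note that Init (session creation) should still happen on one thread before the workers start.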

(5) Runtime issues

Problem 1: runtime warnings

2017-08-16 14:11:14.393295: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-08-16 14:11:14.393324: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-08-16 14:11:14.393331: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-08-16 14:11:14.393338: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.

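These warnings mean that the TensorFlow .so was compiled without these CPU acceleration instructions, so you can enable them at build time. Without a GPU, enabling them improved CPU computation by about 10% in our tests.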

Note that not every CPU supports these instructions, so always test on the actual target machines; otherwise the program may abort.
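A cheap guard against that abort is to check at startup that the CPU actually supports the instructions the library was built with (for example GCC's -msse4.1, -msse4.2, -mavx and -mfma, passed to bazel with --copt). The sketch below assumes GCC or Clang on x86, where the __builtin_cpu_supports builtin is available; MissingCpuFeatures is a hypothetical helper, and its feature list simply mirrors the warnings above, so adjust it to whatever flags you really compiled with.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Returns the x86 features from the warnings above that this CPU lacks.
// Assumes GCC or Clang; on other compilers/architectures it reports that
// it cannot check rather than guessing.
std::vector<std::string> MissingCpuFeatures() {
    std::vector<std::string> missing;
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();  // only strictly needed if called before static constructors run
    if (!__builtin_cpu_supports("sse4.1")) missing.push_back("sse4.1");
    if (!__builtin_cpu_supports("sse4.2")) missing.push_back("sse4.2");
    if (!__builtin_cpu_supports("avx")) missing.push_back("avx");
    if (!__builtin_cpu_supports("fma")) missing.push_back("fma");
#else
    missing.push_back("unknown: cannot check on this compiler/architecture");
#endif
    return missing;
}
```

A service could call this before Init, log any missing names and refuse to load the optimized library, instead of dying with an illegal-instruction crash in the middle of a request.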

Problem 2: mixing C++ libtensorflow and Python tensorflow

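To verify that the C++ model loading and inference gave correct results, we wrapped the C++ API into a Python library with SWIG so Python could call it. Importing both tensorflow (import tensorflow as tf) and the wrapped SWIG module in the same process caused a core dump.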


The TensorFlow team does not plan to fix this issue.
