A Concise Guide to TensorFlow 2.0

Contents

- Eager execution
- AutoGraph
- Performance optimization: tf.function
- Model building: tf.keras
- Model training
- Conclusion
- References

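Although TensorFlow is the most widely used framework in deep learning, its static-graph design (Graph mode) makes it noticeably harder to use than PyTorch, a dynamic-graph framework. Fortunately, TensorFlow recently added eager mode, which matches PyTorch's dynamic execution. Going further, Google has released a brand-new version, TensorFlow 2.0, which is a major overhaul rather than an incremental update of 1.0 (even though only a preview release is available at the time of writing). In short, TensorFlow 2.0 uses eager execution by default and cleans up many previously messy modules. Version 2.0 will undoubtedly replace 1.0 over time, so it is worth getting started early. This article gives a brief introduction to TensorFlow 2.0 to help you get up to speed quickly.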

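Eager execution

Eager execution in TensorFlow is a form of imperative programming, consistent with native Python: when you run an operation, the result is returned immediately. TensorFlow, by contrast, has traditionally used Graph mode: you first build a computation graph, then open a Session and feed in real data before anything actually runs. Eager execution is clearly more concise and makes code easier to debug, which is one reason PyTorch is simpler to use. A simple example: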

import tensorflow as tf

x = tf.ones((2, 2), dtype=tf.dtypes.float32)
y = tf.constant([[1, 2],
                 [3, 4]], dtype=tf.dtypes.float32)
z = tf.matmul(x, y)
print(z)
# tf.Tensor(
# [[4. 6.]
#  [4. 6.]], shape=(2, 2), dtype=float32)

print(z.numpy())
# [[4. 6.]
#  [4. 6.]]

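As shown, under eager execution every operation returns a tf.Tensor that holds a concrete value, rather than a symbolic handle to a node in a computation graph as in Graph mode. Because results are visible immediately, debugging is much easier. In addition, calling the tf.Tensor.numpy() method returns the NumPy array that corresponds to the Tensor.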

Another benefit of eager execution is that native Python features can be used, such as the conditional below:

random_value = tf.random.uniform([], 0, 1)
x = tf.reshape(tf.range(0, 4), [2, 2])
print(random_value)
if random_value.numpy() > 0.5:
    y = tf.matmul(x, x)
else:
    y = tf.add(x, x)

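This kind of dynamic control flow works because a Tensor produced by eager execution exposes its NumPy value, so operators such as tf.cond and tf.while_loop, which Graph mode requires, are not needed.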
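For comparison, here is a minimal sketch of how the same branch looks when written with tf.cond. It assumes a fixed stand-in value (0.7) in place of the random draw so that the result is deterministic; the names value, y_cond and y_eager are introduced only for illustration.

```python
import tensorflow as tf

x = tf.reshape(tf.range(0, 4), [2, 2])   # [[0, 1], [2, 3]]
value = tf.constant(0.7)                 # assumed fixed stand-in for the random value

# Graph-style branching: both branches are handed to tf.cond as callables.
y_cond = tf.cond(value > 0.5,
                 lambda: tf.matmul(x, x),
                 lambda: tf.add(x, x))

# Eager-style branching: an ordinary Python `if` on the concrete value.
if value.numpy() > 0.5:
    y_eager = tf.matmul(x, x)
else:
    y_eager = tf.add(x, x)

print(y_cond.numpy())
print(y_eager.numpy())
```

Both forms give the same Tensor here; the Python `if` is simply easier to read and to step through in a debugger.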

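Another important question is how gradients are computed in eager mode. In Graph mode, the gradient graph is built together with the model's forward graph, so gradients are easy to compute once data is fed in. Eager execution is dynamic, however, so the operations must be recorded on every run in order to compute gradients. This is done with tf.GradientTape, which tracks the executed operations so that gradients can be computed. An example: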

w = tf.Variable([[1.0]])
with tf.GradientTape() as tape:
  loss = w * w + 2. * w + 5.

grad = tape.gradient(loss, w)
print(grad)  # => tf.Tensor([[ 4.]], shape=(1, 1), dtype=float32)

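Under eager execution, each tape records the operations executed inside its context; the tape is valid only for that computation and is used to compute the corresponding gradients. PyTorch is also a dynamic-graph framework, but unlike TensorFlow, each Tensor that requires gradients carries a grad_fn that tracks the history of operations.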
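The point that a tape is valid only for one computation can be illustrated with a short sketch. It assumes a scalar variable w = 3.0 chosen for illustration, and contrasts a default tape with one created with persistent=True, which keeps the recording so that several gradients can be taken.

```python
import tensorflow as tf

w = tf.Variable(3.0)  # assumed example value

# A default tape can be queried once; its recording is released afterwards.
with tf.GradientTape() as tape:
    loss = w * w
grad = tape.gradient(loss, w)          # d(w^2)/dw = 2w

try:
    tape.gradient(loss, w)             # second query on a non-persistent tape
    reused = True
except RuntimeError:
    reused = False

# persistent=True keeps the recording so several gradients can be computed.
with tf.GradientTape(persistent=True) as ptape:
    y = w * w
    z = y * y                          # z = w^4
dy_dw = ptape.gradient(y, w)           # 2w
dz_dw = ptape.gradient(z, w)           # 4w^3
del ptape                              # release the recording explicitly

print(float(grad), reused, float(dy_dw), float(dz_dw))
```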

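Eager execution in TensorFlow 2.0 makes code more concise and easier to debug, but it costs some performance compared with Graph mode. This is not hard to understand: Graph mode builds a static graph first and only then executes it, which gives it an advantage in distributed training, performance optimization, and production deployment. Fortunately, TensorFlow 2.0 introduces tf.function and AutoGraph to narrow the performance gap between eager execution and Graph mode; the core idea is to convert Python syntax into high-performance graph operations.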

AutoGraph

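AutoGraph was already introduced in TensorFlow 1.x. It converts commonly used Python code into Graph code that TensorFlow supports. A typical case: in TensorFlow, dynamic control flow had to be written with cumbersome operators such as tf.while_loop and tf.cond, whereas now we can write native Python for and if statements and let AutoGraph convert them into code that TensorFlow supports, as in the following example: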

def square_if_positive(x):
    if x > 0:
        x = x * x
    else:
        x = 0.0
    return x

# Eager mode
print('Eager results: %2.2f, %2.2f' % (square_if_positive(tf.constant(9.0)),
                                       square_if_positive(tf.constant(-9.0))))

# Graph mode
tf_square_if_positive = tf.autograph.to_graph(square_if_positive)

with tf.Graph().as_default():
    # The result works like a regular op: takes tensors in, returns tensors.
    # You can inspect the graph using tf.compat.v1.get_default_graph().as_graph_def()
    g_out1 = tf_square_if_positive(tf.constant( 9.0))
    g_out2 = tf_square_if_positive(tf.constant(-9.0))
    with tf.compat.v1.Session() as sess:
        print('Graph results: %2.2f, %2.2f\n' % (sess.run(g_out1), sess.run(g_out2)))

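Above we defined the function square_if_positive, which uses a native Python if statement. This is fine under eager execution in TensorFlow 2.0, but it is not supported in TensorFlow 1.x Graph mode. With AutoGraph, the function can be converted into a Graph function that behaves like a regular TensorFlow op and runs in Graph mode (TF 2 has no Session, which is a TF 1.x feature; to use TF 1.x functionality, call it through tf.compat.v1). Note the difference between eager mode and Graph mode: the results are the same, but Graph mode is more efficient.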

In essence, AutoGraph converts Python code into native TensorFlow code. We can inspect the converted code:

print(tf.autograph.to_code(square_if_positive))
#################################################
from __future__ import print_function

def tf__square_if_positive(x):
  try:
    with ag__.function_scope('square_if_positive'):
      do_return = False
      retval_ = None
      cond = ag__.gt(x, 0)

      def if_true():
        with ag__.function_scope('if_true'):
          x_1, = x,
          x_1 = x_1 * x_1
          return x_1

      def if_false():
        with ag__.function_scope('if_false'):
          x = 0.0
          return x
      x = ag__.if_stmt(cond, if_true, if_false)
      do_return = True
      retval_ = x
      return retval_
  except:
    ag__.rewrite_graph_construction_error(ag_source_map__)



tf__square_if_positive.autograph_info__ = {}

The converted code defines two branch functions and then calls the if_stmt op, which presumably plays a role similar to tf.cond.

AutoGraph supports many Python features, such as loops:

def sum_even(items):
    s = 0
    for c in items:
        if c % 2 > 0:
            continue
        s += c
    return s

print('Eager result: %d' % sum_even(tf.constant([10,12,15,20])))

tf_sum_even = tf.autograph.to_graph(sum_even)

with tf.Graph().as_default(), tf.compat.v1.Session() as sess:
    print('Graph result: %d\n\n' % sess.run(tf_sum_even(tf.constant([10,12,15,20]))))

AutoGraph supports most Python features, but it still has limitations; see Capabilities and Limitations for details.
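To make clear what AutoGraph saves us from writing, the loop above can also be expressed by hand with tf.while_loop and tf.cond. This is only a sketch of an equivalent graph-compatible function, not the code AutoGraph actually generates; the name sum_even_while_loop is introduced for illustration.

```python
import tensorflow as tf

def sum_even_while_loop(items):
    """Sum the even entries of a 1-D integer Tensor without Python control flow."""
    n = tf.shape(items)[0]

    def cond(i, s):
        return i < n                       # keep looping while elements remain

    def body(i, s):
        c = items[i]
        # Add c only when it is even; otherwise carry the running sum forward.
        s = tf.cond(tf.equal(c % 2, 0), lambda: s + c, lambda: s)
        return i + 1, s

    _, total = tf.while_loop(cond, body, (tf.constant(0), tf.constant(0)))
    return total

x = tf.constant([10, 12, 15, 20])
eager_result = sum_even_while_loop(x)

# The same function also traces into a Graph with AutoGraph switched off,
# because it contains no Python control flow that depends on Tensors.
graph_fn = tf.function(sum_even_while_loop, autograph=False)
graph_result = graph_fn(x)

print(int(eager_result), int(graph_result))
```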

Also note that a function converted by AutoGraph can still run in eager mode, but it will not be any faster than the original. You can compare:

import timeit

x = tf.constant([10, 12, 15, 20])
print("Eager at original code:", timeit.timeit(lambda: sum_even(x), number=100))
print("Eager at autograph code:", timeit.timeit(lambda: tf_sum_even(x), number=100))

with tf.Graph().as_default(), tf.compat.v1.Session() as sess:
    graph_op = tf_sum_even(tf.constant([10, 12, 15, 20]))
    sess.run(graph_op)  # warm-up call, excluded from the timing
    print("Graph at autograph code:", timeit.timeit(lambda: sess.run(graph_op), number=100))
# Output:
# Eager at original code: 0.05176109499999981
# Eager at autograph code: 0.11203173799999977
# Graph at autograph code: 0.03418808900000059

The results show that Graph mode is the fastest, the original code in eager mode comes second, and the AutoGraph-converted code run in eager mode is the slowest.

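Therefore, in TensorFlow 2.0 we generally do not use tf.autograph directly, because it brings no speedup under eager execution. To actually reach Graph-mode performance, we rely on a more powerful tool: tf.function.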

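Performance optimization: tf.function

Eager execution is more concise, but Graph mode performs better. To narrow this performance gap, TensorFlow 2.0 introduces tf.function. Here is the official description of tf.function: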

function constructs a callable that executes a TensorFlow graph (tf.Graph) created by tracing the TensorFlow operations in func. This allows the TensorFlow runtime to apply optimizations and exploit parallelism in the computation defined by func.

Put simply, tf.function builds a Graph from the TensorFlow operations inside a function func, so that calling it executes that Graph, which gives better performance. For example:

def f(x, y):
    print(x, y)
    return tf.reduce_mean(tf.multiply(x ** 2, 3) + y)

g = tf.function(f)

x = tf.constant([[2.0, 3.0]])
y = tf.constant([[3.0, -2.0]])

# `f` and `g` will return the same value, but `g` will be executed as a
# TensorFlow graph.
assert f(x, y).numpy() == g(x, y).numpy()
# tf.Tensor([[2. 3.]], shape=(1, 2), dtype=float32) tf.Tensor([[ 3. -2.]], shape=(1, 2), dtype=float32)
# Tensor("x:0", shape=(1, 2), dtype=float32) Tensor("y:0", shape=(1, 2), dtype=float32)

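As the example shows, a function wrapped by tf.function runs in Graph mode. You can think of it as a TF op that encapsulates a Graph: calling it directly still returns a Tensor result immediately, but the execution inside is efficient. When a Tensor is printed inside the function, eager execution prints the Tensor's value, whereas Graph mode prints only a Tensor handle, on which numpy() cannot be called to extract a value. This matches Graph mode in TF 1.x.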

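Because a function decorated with tf.function runs as a Graph, it is generally faster than eager mode, and the gap is more pronounced when the Graph contains many small operations. Compare the performance gap for a convolution and for an LSTM: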

import timeit
conv_layer = tf.keras.layers.Conv2D(100, 3)

@tf.function
def conv_fn(image):
  return conv_layer(image)

image = tf.zeros([1, 200, 200, 100])
# warm up
conv_layer(image); conv_fn(image)
print("Eager conv:", timeit.timeit(lambda: conv_layer(image), number=10))
print("Function conv:", timeit.timeit(lambda: conv_fn(image), number=10))
# For a single convolution the gap is small
# Eager conv: 0.44013839924952197
# Function conv: 0.3700763391782858

lstm_cell = tf.keras.layers.LSTMCell(10)

@tf.function
def lstm_fn(input, state):
  return lstm_cell(input, state)

input = tf.zeros([10, 10])
state = [tf.zeros([10, 10])] * 2
# warm up
lstm_cell(input, state); lstm_fn(input, state)
print("eager lstm:", timeit.timeit(lambda: lstm_cell(input, state), number=10))
print("function lstm:", timeit.timeit(lambda: lstm_fn(input, state), number=10))
# For the LSTM cell, which consists of many small ops, Graph execution is much faster
# eager lstm: 0.025562446062237565
# function lstm: 0.0035498656569271647

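To use tf.function flexibly, you need to understand the mechanism behind it, so here is a brief account. In TF 1.x, you first build a static computation graph and then create a Session to actually run different computations: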

import tensorflow as tf

x = tf.placeholder(tf.float32)
y = tf.square(x)
z = tf.add(x, y)

sess = tf.Session()

z0 = sess.run([z], feed_dict={x: 2.})        # 6.0
z1 = sess.run([z], feed_dict={x: 2., y: 2.}) # 4.0

Although only one graph is defined above, the two sess.run calls (at run time) actually execute two different programs, or subgraphs:

def compute_z0(x):
  return tf.add(x, tf.square(x))

def compute_z1(x, y):
  return tf.add(x,  y)

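Here the two subgraphs are wrapped in two Python functions. Going one step further, the Session is no longer needed: calling each function simply runs its computation graph, which is exactly what tf.function provides: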

import tensorflow as tf

@tf.function
def compute_z1(x, y):
  return tf.add(x, y)

@tf.function
def compute_z0(x):
  return compute_z1(x, tf.square(x))

z0 = compute_z0(2.)
z1 = compute_z1(2., 2.)

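In other words, tf.function manages a set of Graphs internally and controls their execution. Another issue is that, although the function body defines a fixed series of operations, different inputs require different computation graphs. If the shape or dtype of the input Tensor differs, the graph differs as well. Fortunately, tf.function supports this kind of polymorphism: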

# Functions are polymorphic

@tf.function
def double(a):
  print("Tracing with", a)
  return a + a

print(double(tf.constant(1)))
print(double(tf.constant(1.1)))
print(double(tf.constant([1, 2])))

# Tracing with Tensor("a:0", shape=(), dtype=int32)
# tf.Tensor(2, shape=(), dtype=int32)
# Tracing with Tensor("a:0", shape=(), dtype=float32)
# tf.Tensor(2.2, shape=(), dtype=float32)
# Tracing with Tensor("a:0", shape=(2,), dtype=int32)
# tf.Tensor([2 4], shape=(2,), dtype=int32)

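Note the print inside the function: whenever the shape or dtype of the input tensor changes, what gets printed changes accordingly, so the corresponding (static) computation graphs are different. This polymorphism comes from tf.function tracing a separate computation graph for each case. Specifically, a function f decorated with tf.function accepts some Tensors and returns zero or more Tensors. When the decorated function is executed:

- a "trace_cache_key" is determined from the shapes and dtypes of the input Tensors;
- each "trace_cache_key" maps to a Graph: when a new "trace_cache_key" appears, f is traced to build a new Graph, and if the "trace_cache_key" already exists, the existing Graph is simply looked up in the cache;
- the input Tensors are fed into that Graph, which is executed to produce the output Tensors.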
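The caching behaviour in the steps above can be observed directly. The following is a minimal sketch that relies on Python side effects inside a tf.function running only while a Graph is being traced; trace_log is a hypothetical helper introduced for illustration.

```python
import tensorflow as tf

trace_log = []  # hypothetical helper: receives one entry per traced Graph

@tf.function
def double(a):
    # This Python-level append runs only during tracing, not on cached calls.
    trace_log.append((a.shape.as_list(), a.dtype.name))
    return a + a

double(tf.constant(1))        # new key (int32 scalar): a Graph is traced
double(tf.constant(2))        # same shape and dtype: the cached Graph is reused
double(tf.constant(1.1))      # new dtype: traced again
double(tf.constant([1, 2]))   # new shape: traced again

print(trace_log)
```

Four calls produce only three log entries, because the second call hits the cache.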

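This polymorphism is what we want, since we sometimes need to pass in Tensors of different shapes or dtypes. Keep in mind, though, that as the number of "trace_cache_key" entries grows, a large number of Graphs has to be cached. tf.function also provides the input_signature parameter, which uses tf.TensorSpec to specify the shapes and dtypes of the Tensors passed to the function, as in the following example: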

@tf.function(input_signature=[tf.TensorSpec(shape=None, dtype=tf.float32)])
def f(x):
    return tf.add(x, 1.)
print(f(tf.constant(1.0)))  # tf.Tensor(2.0, shape=(), dtype=float32)
print(f(tf.constant([1.0,]))) # tf.Tensor([2.], shape=(1,), dtype=float32)
print(f(tf.constant([1])))  # ValueError: Python inputs incompatible with input_signature

Here the dtype of the input Tensor must be float32 while the shape is unrestricted; an error is raised when the dtype does not match.
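As a further sketch, a partially specified shape in input_signature lets a single Graph serve inputs of different lengths, which keeps the cache small. The example assumes a 1-D float32 input; add_one and trace_log are hypothetical names used only here.

```python
import tensorflow as tf

trace_log = []  # hypothetical helper: receives one entry per traced Graph

# shape=[None] means "rank 1, any length".
@tf.function(input_signature=[tf.TensorSpec(shape=[None], dtype=tf.float32)])
def add_one(x):
    trace_log.append(x.shape.as_list())   # runs only while tracing
    return x + 1.0

a = add_one(tf.constant([1.0, 2.0]))
b = add_one(tf.constant([1.0, 2.0, 3.0]))  # different length, same Graph

print(len(trace_log), a.numpy(), b.numpy())
```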

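Another parameter of tf.function is autograph, which defaults to True. It means AutoGraph is applied automatically when the Graph is built, so you can use native Python conditionals and loops inside the function, because they are converted into Graph code based on tf.cond and tf.while_loop. Note that a branch or loop is converted only if it depends on Tensors. When autograph is False, a branch or loop that depends on Tensors raises an error, as in the following example: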

def sum_even(items):
  s = 0
  for c in items:
    if c % 2 > 0:
      continue
    s += c
  return s

sum_even_autograph_on = tf.function(sum_even, autograph=True)
sum_even_autograph_off = tf.function(sum_even, autograph=False)
x = tf.constant([10, 12, 15, 20])

sum_even(x) # OK 
sum_even_autograph_on(x) # OK
sum_even_autograph_off(x) # TypeError: Tensor objects are only iterable when eager execution is enabled

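This is easy to understand: after tf.function is applied the code runs in Graph mode, where Tensors cannot be iterated over, but AutoGraph converts the loop into Graph code, so the call succeeds. In most cases we leave autograph enabled, which is the default.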

Most importantly, tf.function can be applied to class methods, and those methods can reference tf.Variable objects, as in the following example:

class ScalarModel(object):
  def __init__(self):
    self.v = tf.Variable(0)

  @tf.function
  def increment(self, amount):
    self.v.assign_add(amount)

model1 = ScalarModel()
model1.increment(tf.constant(3))
assert int(model1.v) == 3
model1.increment(tf.constant(4))
assert int(model1.v) == 7
model2 = ScalarModel()  # model1 and model2 own different variables
model2.increment(tf.constant(5))
assert int(model2.v) == 5

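As discussed later, this feature can be used when building models with tf.keras. The example also shows that operations with side effects (changing the value of a Variable), such as assign_add, can be used inside a function, which matters for model training.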

As mentioned earlier, Python's native print function prints the Tensor handle only once, while the Graph is being built. To print the actual value of a Tensor, use tf.print:

@tf.function
def print_element(items):
    for c in items:
      tf.print(c)

x = tf.constant([1, 5, 6, 8, 3])
print_element(x)

This concludes the introduction to tf.function. In practice there are more subtleties to be aware of; see TensorFlow 2.0: Functions, not Sessions for details.

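Model building: tf.keras

TensorFlow 2.0 is fully Keras-based: if you want high-level layers, Keras is the only choice. TensorFlow 1.x offered several high-level APIs for building models, such as tf.layers and tf.contrib.slim, but 2.0 supports only tf.keras.layers. Either way, this saves everyone from reinventing the wheel, and it means model-building code is unified, which improves reuse (recall how wildly TensorFlow model code used to vary). Note that the tf.nn module still exists and contains the common nn operators, but most people will not build models directly from them, because keras.layers covers the common network layers. If you do want to create a new layer, you can subclass tf.keras.layers.Layer directly: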

class Linear(tf.keras.layers.Layer):

    def __init__(self, units=32, **kwargs):
        super(Linear, self).__init__(**kwargs)
        self.units = units

    def build(self, input_shape):
        self.w = self.add_weight(shape=(input_shape[-1], self.units),
                             initializer='random_normal',
                             trainable=True)
        self.b = self.add_weight(shape=(self.units,),
                             initializer='random_normal',
                             trainable=True)

    def call(self, inputs):
        return tf.matmul(inputs, self.w) + self.b

layer = Linear(32)
print(layer.weights)  # [] the weights have not been created yet
x = tf.ones((8, 16))
y = layer(x)  # shape [8, 32]
print(layer.weights)

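Here we subclass Layer to implement a custom layer. The first thing to note is the build method, which creates the layer's Variables based on input_shape. We do not create the Variables in the constructor; they get a separate method. The reason is that the constructor receives no information about the input Tensor, and the feature dimension of the input is needed here, so the Variables cannot be created at that point. The build method runs only when the layer is first actually called (layer(input)), and it runs only once (Layer keeps an internal boolean attribute, self.built, for this). This is a lazy-execution mechanism. For comparison, PyTorch requires the input dimensions when a layer is constructed, so it creates its parameters immediately.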

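The second point is that Layer itself has many attributes and methods. Some important ones:

- add_weight method: creates the layer's weights (no need to call tf.Variable directly);
- add_loss method: as the name suggests, adds a loss; added losses are available through the layer.losses attribute, and you can call this method inside call to add any loss you need;
- add_metric method: adds a metric to the layer;
- losses attribute: the list of losses added through add_loss; for example, the regularization losses of some layers can be retrieved through this attribute;
- trainable_weights attribute: the list of trainable Variables, needed during model training;
- non_trainable_weights attribute: the list of non-trainable Variables;
- weights attribute: the combination of trainable_weights and non_trainable_weights;
- trainable attribute: a mutable boolean that determines whether the layer can be trained.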
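Several of the attributes listed above can be inspected on a built-in layer. The sketch below assumes a Dense layer with an L2 kernel regularizer (an arbitrary choice for illustration) and shows how the weight lists change once the layer is built and then frozen.

```python
import tensorflow as tf

# Assumed example layer: 4 units with an L2 penalty on the kernel.
layer = tf.keras.layers.Dense(
    4, kernel_regularizer=tf.keras.regularizers.l2(1e-4))

n_before = len(layer.weights)            # build() has not run yet
_ = layer(tf.ones((2, 3)))               # the first call triggers build()

n_weights = len(layer.weights)           # kernel and bias
n_trainable = len(layer.trainable_weights)
n_losses = len(layer.losses)             # the regularization loss on the kernel

layer.trainable = False                  # freeze the layer
n_trainable_frozen = len(layer.trainable_weights)
n_non_trainable_frozen = len(layer.non_trainable_weights)

print(n_before, n_weights, n_trainable, n_losses,
      n_trainable_frozen, n_non_trainable_frozen)
```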

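Layer is the most basic class in Keras, and it is worth understanding thoroughly; see the source code for details. In most cases we simply reuse the layers Keras already provides, and the class we use most often to build models is keras.Model. Model inherits from Layer but provides more APIs, such as model.compile(), model.fit(), model.evaluate(), and model.predict(); anyone familiar with Keras knows these are the methods for training, evaluation, and prediction. Another important point is that we can subclass Model to create a module or model that contains multiple layers: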

class ConvBlock(tf.keras.Model):
    """Convolutional Block consisting of (conv->bn->relu).
    Arguments:
      num_filters: number of filters passed to a convolutional layer.
      kernel_size: the size of convolution kernel
      weight_decay: weight decay
      dropout_rate: dropout rate.
    """

    def __init__(self, num_filters, kernel_size,
                 weight_decay=1e-4, dropout_rate=0.):
        super(ConvBlock, self).__init__()

        self.conv = tf.keras.layers.Conv2D(num_filters,
                                          kernel_size,
                                          padding="same",
                                          use_bias=False,
                                          kernel_initializer="he_normal",
                                          kernel_regularizer=tf.keras.regularizers.l2(weight_decay))
        self.bn = tf.keras.layers.BatchNormalization()
        self.dropout = tf.keras.layers.Dropout(dropout_rate)


    def call(self, x, training=True):
        output = self.conv(x)
        output = self.bn(output, training=training)
        output = tf.nn.relu(output)
        output = self.dropout(output, training=training)
        return output


model = ConvBlock(32, 3, 1e-4, 0.5)
x = tf.ones((4, 224, 224, 3))
y = model(x)
print(model.layers)

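Here we built a block consisting of Conv2D->BatchNorm->ReLU (followed by Dropout); printing model.layers lists all the layers it contains. Going further, we can reuse such blocks just like tf.keras.layers to build more complex modules: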

class SimpleCNN(tf.keras.Model):
    def __init__(self, num_classes):
        super(SimpleCNN, self).__init__()

        self.block1 = ConvBlock(16, 3)
        self.block2 = ConvBlock(32, 3)
        self.block3 = ConvBlock(64, 3)

        self.global_pool = tf.keras.layers.GlobalAveragePooling2D()
        self.classifier = tf.keras.layers.Dense(num_classes)

    def call(self, x, training=True):
        output = self.block1(x, training=training)
        output = self.block2(output, training=training)
        output = self.block3(output, training=training)
        output = self.global_pool(output)
        logits = self.classifier(output)
        return logits

model = SimpleCNN(10)
print(model.layers)
x = tf.ones((4, 32, 32, 3))
y = model(x) # [4, 10]

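This usage is similar to PyTorch's Module. Most attributes of the Model class recursively collect the attributes of the inner layers; for example, model.weights contains the weights defined in all layers of the model.

Models can also be built in the traditional Keras ways, for example with tf.keras.Sequential: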

from tensorflow.keras import layers

model = tf.keras.Sequential([
    # Adds a densely-connected layer with 64 units to the model:
    layers.Dense(64, activation='relu', input_shape=(32,)),
    # Add another:
    layers.Dense(64, activation='relu'),
    # Add a softmax layer with 10 output units:
    layers.Dense(10, activation='softmax')])

or with the Keras functional API:

from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(784,), name='img')
x = layers.Dense(64, activation='relu')(inputs)
x = layers.Dense(64, activation='relu')(x)
outputs = layers.Dense(10, activation='softmax')(x)

model = keras.Model(inputs=inputs, outputs=outputs, name='mnist_model')

All of these work, but I personally prefer the first, modular, subclassing approach. You can also apply tf.function to the call method so that the model runs in Graph mode.

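Model training

Before training a model, an important step is data loading. TensorFlow 2.0 still uses tf.data for this, but in eager mode a tf.data.Dataset is a Python iterable, so we can read values from it directly: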

dataset = tf.data.Dataset.range(10)
for elem in dataset:
    print(elem.numpy())  # prints 0, 1, ..., 9

This is only a simple example, but it is enough to show how tf.data changes in TensorFlow 2.0; other tf.data techniques are the same as in TensorFlow 1.x.
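The usual tf.data transformations compose with this direct iteration. As a small sketch (the doubling map and the batch size of 4 are arbitrary choices for illustration):

```python
import tensorflow as tf

# Build a small pipeline: 0..9, doubled, grouped into batches of 4.
dataset = (tf.data.Dataset.range(10)
           .map(lambda v: v * 2)
           .batch(4))

# In eager mode the dataset can be consumed with a plain Python loop.
batches = [batch.numpy().tolist() for batch in dataset]
print(batches)  # [[0, 2, 4, 6], [8, 10, 12, 14], [16, 18]]
```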

tf.keras also provides two important modules for model training, losses and metrics. The losses module wraps the various loss functions, as in the following case:

bce = tf.keras.losses.BinaryCrossentropy()
loss = bce([0., 0., 1., 1.], [1., 1., 1., 0.])
print('Loss: ', loss.numpy())  # Loss: 11.522857
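To see what the loss object computes, the sketch below compares BinaryCrossentropy with the textbook formula, the mean of -(y log p + (1 - y) log(1 - p)). It assumes predictions strictly between 0 and 1 (values chosen for illustration), so the numerical clipping applied inside Keras has no visible effect.

```python
import numpy as np
import tensorflow as tf

# Assumed example data: labels and predicted probabilities.
y_true = np.array([0., 1., 1., 0.], dtype=np.float32)
y_pred = np.array([0.2, 0.8, 0.6, 0.4], dtype=np.float32)

bce = tf.keras.losses.BinaryCrossentropy()
keras_loss = float(bce(y_true, y_pred))

# Textbook binary cross-entropy, averaged over the samples.
manual_loss = float(np.mean(
    -(y_true * np.log(y_pred) + (1. - y_true) * np.log(1. - y_pred))))

print(keras_loss, manual_loss)
```

The two numbers agree up to floating-point tolerance.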

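The metrics module contains the commonly used model evaluation metrics. It follows the same design as the metrics module in TensorFlow 1.x: a metric is stateful and usually records its state by creating Variables. Basic usage: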

m = tf.keras.metrics.Accuracy()
m.update_state([1, 2, 3, 4], [0, 2, 3, 4])
print('result: ', m.result().numpy())  # result: 0.75
m.update_state([0, 2, 3], [1, 2, 3])
print('result: ', m.result().numpy())  #  result: 0.714
m.reset_states()  # reset the accumulated state
m.update_state([0, 2, 3], [1, 2, 3])
print('result: ', m.result().numpy())  #  result: 0.667

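To define a custom metric, subclass tf.keras.metrics.Metric and implement a few interfaces. The following example shows how to count the true positives (TP) in a multi-class problem: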

class CategoricalTruePositives(tf.keras.metrics.Metric):

    def __init__(self, name='categorical_true_positives', **kwargs):
      super(CategoricalTruePositives, self).__init__(name=name, **kwargs)
      self.true_positives = self.add_weight(name='tp', initializer='zeros')

    def update_state(self, y_true, y_pred, sample_weight=None):
      # Arg-max over the class axis gives one predicted label per sample
      # (assumes y_true has shape (batch,) and y_pred has shape (batch, classes)).
      y_pred = tf.argmax(y_pred, axis=-1)
      values = tf.equal(tf.cast(y_true, 'int32'), tf.cast(y_pred, 'int32'))
      values = tf.cast(values, 'float32')
      if sample_weight is not None:
        sample_weight = tf.cast(sample_weight, 'float32')
        values = tf.multiply(values, sample_weight)
      self.true_positives.assign_add(tf.reduce_sum(values))

    def result(self):
      return self.true_positives

    def reset_states(self):
      # The state of the metric will be reset at the start of each epoch.
      self.true_positives.assign(0.)

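All three interfaces above must be implemented: update_state updates the state with new data, reset_states resets the state to its initial value, and result returns the current state, that is, the metric value. Note that this metric creates a Variable to store the TP count. More complex metrics can be implemented in the same way.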

For model training, the following complete example covers everything:

import numpy as np
import tensorflow as tf

fashion_mnist = tf.keras.datasets.fashion_mnist

(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()

# Adding a dimension to the array -> new shape == (28, 28, 1)
train_images = train_images[..., None]
test_images = test_images[..., None]

# Getting the images in [0, 1] range.
train_images = train_images / np.float32(255)
test_images = test_images / np.float32(255)

train_labels = train_labels.astype('int64')
test_labels = test_labels.astype('int64')

# dataset
train_ds = tf.data.Dataset.from_tensor_slices(
    (train_images, train_labels)).shuffle(10000).batch(32)
test_ds = tf.data.Dataset.from_tensor_slices(
    (test_images, test_labels)).batch(32)

# Model
class MyModel(tf.keras.Sequential):
    def __init__(self):
        super(MyModel, self).__init__([
          tf.keras.layers.Conv2D(32, 3, activation='relu'),
          tf.keras.layers.MaxPooling2D(),
          tf.keras.layers.Conv2D(64, 3, activation='relu'),
          tf.keras.layers.MaxPooling2D(),
          tf.keras.layers.Flatten(),
          tf.keras.layers.Dense(64, activation='relu'),
          tf.keras.layers.Dense(10, activation=None)
        ])

model = MyModel()

# optimizer
initial_learning_rate = 1e-4
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate,
    decay_steps=100000,
    decay_rate=0.96,
    staircase=True)

optimizer = tf.keras.optimizers.RMSprop(learning_rate=lr_schedule)

# checkpoint
checkpoint = tf.train.Checkpoint(step=tf.Variable(0), optimizer=optimizer, model=model)
manager = tf.train.CheckpointManager(checkpoint, './tf_ckpts', max_to_keep=3)

# loss function
loss_object = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

# metric
train_loss_metric = tf.keras.metrics.Mean(name='train_loss')
train_acc_metric = tf.keras.metrics.SparseCategoricalAccuracy(name='train_accuracy')
test_loss_metric = tf.keras.metrics.Mean(name='test_loss')
test_acc_metric = tf.keras.metrics.SparseCategoricalAccuracy(name='test_accuracy')

# define a train step
@tf.function
def train_step(inputs, targets):
    with tf.GradientTape() as tape:
        predictions = model(inputs, training=True)
        loss = loss_object(targets, predictions)
        loss += sum(model.losses)  # add other losses
    # compute gradients and update variables
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
    train_loss_metric(loss)
    train_acc_metric(targets, predictions)

# define a test step
@tf.function
def test_step(inputs, targets):
    predictions = model(inputs, training=False)
    loss = loss_object(targets, predictions)
    test_loss_metric(loss)
    test_acc_metric(targets, predictions)

# train loop
epochs = 10
for epoch in range(epochs):
    print('Start of epoch %d' % (epoch,))
    # Iterate over the batches of the dataset
    for step, (inputs, targets) in enumerate(train_ds):
        train_step(inputs, targets)
        checkpoint.step.assign_add(1)
        # log every 20 step
        if step % 20 == 0:
            manager.save() # save checkpoint
            print('Epoch: {}, Step: {}, Train Loss: {}, Train Accuracy: {}'.format(
                epoch, step, train_loss_metric.result().numpy(),
                train_acc_metric.result().numpy())
            )
            train_loss_metric.reset_states()
            train_acc_metric.reset_states()

# do test
for inputs, targets in test_ds:
    test_step(inputs, targets)
print('Test Loss: {}, Test Accuracy: {}'.format(
    test_loss_metric.result().numpy(),
    test_acc_metric.result().numpy()))

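Small as it is, this example is complete: it covers data loading, model creation, and training and testing. Note in particular that a single train step and a single test step are converted to Graph mode with tf.function, which speeds up training and is a recommended practice. Also, the training above uses a custom training loop, which gives a lot of freedom; the alternative is the more conventional Keras compile and fit workflow.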
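For reference, the compile and fit workflow can be sketched as follows. The data is random stand-in data and the tiny two-layer model is hypothetical, chosen only so that the snippet runs quickly; it is not the Fashion-MNIST setup used above.

```python
import numpy as np
import tensorflow as tf

# Random stand-in data (hypothetical): 64 samples, 8 features, 3 classes.
rng = np.random.RandomState(0)
features = rng.rand(64, 8).astype('float32')
labels = rng.randint(0, 3, size=(64,)).astype('int64')

model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation='relu'),
    tf.keras.layers.Dense(3)            # raw logits, no softmax
])

# compile() attaches the optimizer, loss and metrics to the model.
model.compile(
    optimizer=tf.keras.optimizers.RMSprop(learning_rate=1e-3),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=['accuracy'])

# fit() runs the training loop; evaluate() reports loss and metrics.
history = model.fit(features, labels, batch_size=16, epochs=2, verbose=0)
test_loss, test_acc = model.evaluate(features, labels, verbose=0)
print(history.history['loss'], test_loss, test_acc)
```

The trade-off is the one described above: fit hides the loop, while a custom loop exposes every step.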

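Another feature of TensorFlow 2.0 is tf.distribute.Strategy, which offers better support for distributed training through a simpler interface. The most commonly used strategy is synchronous training on multiple GPUs of a single machine, which tf.distribute.MirroredStrategy supports well. This strategy creates a copy of the model (a replica) on each GPU device. The model's parameters are mirrored across all replicas and are called MirroredVariables; they are kept in sync across devices by applying identical updates. The underlying communication uses an all-reduce algorithm, which aggregates the Tensors from multiple devices onto every device and is fairly efficient. All-reduce has several implementations; NVIDIA NCCL's all-reduce is used by default. Creating the strategy only takes a simple definition: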

mirrored_strategy = tf.distribute.MirroredStrategy(devices=["/gpu:0", "/gpu:1"],
    cross_device_ops=tf.distribute.NcclAllReduce())
# Synchronous training on GPU 0 and GPU 1
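What all-reduce achieves can be shown with a toy, single-process sketch in plain Python. It only models the result (every device ends up holding the element-wise sum); it says nothing about how NCCL actually moves data, and all_reduce_sum is a hypothetical helper.

```python
def all_reduce_sum(per_device_values):
    """Return what each device holds after a sum all-reduce.

    per_device_values: one list of numbers per device, all of equal length.
    """
    length = len(per_device_values[0])
    # Element-wise sum across devices.
    total = [sum(values[i] for values in per_device_values)
             for i in range(length)]
    # Every device receives its own copy of the aggregated result.
    return [list(total) for _ in per_device_values]

grads_gpu0 = [1.0, 2.0]   # assumed example gradients on device 0
grads_gpu1 = [3.0, 4.0]   # assumed example gradients on device 1
synced = all_reduce_sum([grads_gpu0, grads_gpu1])
print(synced)  # [[4.0, 6.0], [4.0, 6.0]]
```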

Once the strategy is created, subsequent steps only need to run inside strategy.scope. Below we create a simple model and an optimizer:

with mirrored_strategy.scope():
    model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(1,))])
    optimizer = tf.keras.optimizers.SGD(learning_rate=0.001)

For the dataset, we call tf.distribute.Strategy.experimental_distribute_dataset to distribute the data:

global_batch_size = 16  # assumed value; not defined in the original snippet

with mirrored_strategy.scope():
    dataset = tf.data.Dataset.from_tensors(([1.], [1.])).repeat(1000).batch(
      global_batch_size)
    # Note that this is the global batch size
    dist_dataset = mirrored_strategy.experimental_distribute_dataset(dataset)

Next we define the train step and run it with strategy.experimental_run_v2 (renamed strategy.run in later TensorFlow 2.x releases):

@tf.function
def train_step(dist_inputs):
    def step_fn(inputs):
        features, labels = inputs

        with tf.GradientTape() as tape:
            logits = model(features)
            cross_entropy = tf.nn.softmax_cross_entropy_with_logits(
            logits=logits, labels=labels)
            loss = tf.reduce_sum(cross_entropy) * (1.0 / global_batch_size)

        grads = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(list(zip(grads, model.trainable_variables)))
        return cross_entropy

    per_example_losses = mirrored_strategy.experimental_run_v2(step_fn, args=(dist_inputs,))
    mean_loss = mirrored_strategy.reduce(tf.distribute.ReduceOp.MEAN,
                    per_example_losses, axis=0)
    return mean_loss

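Note that the loss is divided by the global batch size. This is because, in distributed training, the gradients from all replicas are summed and aggregated onto every device with all-reduce before the update is applied. In addition, strategy.experimental_run_v2 returns the result of each replica, so a reduce is needed to aggregate them into the final result.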
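The reason for dividing by the global batch size can be checked with plain arithmetic. The sketch below assumes two replicas and eight made-up per-example losses; because the per-replica contributions are summed, scaling each by the global batch size reproduces the ordinary mean, whereas scaling by the per-replica batch size would count the loss twice.

```python
global_batch_size = 8                       # assumed: 2 replicas x 4 examples
per_example_losses = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]  # made-up values

replica0 = per_example_losses[:4]           # examples seen by replica 0
replica1 = per_example_losses[4:]           # examples seen by replica 1

# Each replica scales its summed loss by the GLOBAL batch size ...
scaled0 = sum(replica0) / global_batch_size
scaled1 = sum(replica1) / global_batch_size
# ... so that summing across replicas gives the true mean loss.
combined = scaled0 + scaled1
global_mean = sum(per_example_losses) / len(per_example_losses)

# Scaling by the per-replica batch size instead overcounts by the replica count.
naive = sum(replica0) / len(replica0) + sum(replica1) / len(replica1)

print(combined, global_mean, naive)  # 2.25 2.25 4.5
```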

Finally, run the training with a simple loop:

with mirrored_strategy.scope():
    for inputs in dist_dataset:
        print(train_step(inputs))

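Note that MirroredStrategy only supports synchronous training on multiple GPUs of a single machine; for a multi-machine version, use MultiWorkerMirroredStrategy. Other distributed training strategies include CentralStorageStrategy, TPUStrategy, and ParameterServerStrategy. For more depth, see the distribute_strategy guide and the distribute_strategy tutorial.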

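Conclusion

This article gave a concise introduction to the core new features of TensorFlow 2.0; with these in hand, you should be able to get started with TensorFlow 2.0 quickly. At the time of writing, Google has only released TensorFlow 2.0.0-beta0, and more unexpected features may arrive in the future. Keep going, TensorFlow coders!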

References

TensorFlow official website.
TensorFlow 2.0 docs.
