TensorFlow with-block differentiation: automatic differentiation in TensorFlow 2

Automatic differentiation (tf.wiki)

Mathematical logic:

Compute the gradient of the loss with respect to the parameters.

Use the gradients to update the parameters.
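For plain gradient descent, these two steps can be written as below, where θ denotes the parameters, η the learning rate and L the loss. This is the textbook rule, given only as a sketch; adaptive optimizers such as Adam replace the plain step with a more elaborate update.

```latex
g = \frac{\partial L}{\partial \theta}, \qquad \theta \leftarrow \theta - \eta \, g
```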

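Use tape.gradient(ys, xs) to compute gradients automatically. It only computes derivatives and returns them as tensors (a single tensor when xs is a single variable, a list when xs is a list). ys is usually the loss and xs is usually the variables to be updated; ys is differentiated with respect to each variable in xs.

Use optimizer.apply_gradients(grads_and_vars) to update the model parameters automatically. grads_and_vars is a sequence of (gradient, variable) pairs such as [(grad, x)], with each gradient and its variable given as a tuple; usually grads_and_vars=zip(grads, variables).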
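A minimal sketch of what tape.gradient() returns when xs is a list: one gradient per variable, in the same order. The variables a, b, c and the toy loss are illustrative only. A variable the result does not depend on (c here) gets None, which is TensorFlow's default for unconnected gradients.

```python
import tensorflow as tf

a = tf.Variable(2.0)
b = tf.Variable(5.0)
c = tf.Variable(1.0)  # deliberately not used in the loss

with tf.GradientTape() as tape:
    loss = a * a + 3.0 * b  # d(loss)/da = 2a = 4, d(loss)/db = 3

grads = tape.gradient(loss, [a, b, c])  # list in -> list out, same order

print(float(grads[0]), float(grads[1]))  # 4.0 3.0
print(grads[2])                          # None
```

The returned list lines up index by index with [a, b, c], which is exactly what zip(grads, variables) relies on when building grads_and_vars.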

tf.GradientTape()

TensorFlow introduces tf.GradientTape(), a "derivative recorder", to implement automatic differentiation.

The following shows how to use tf.GradientTape() to compute the derivative of $y(x) = x^2$ at $x = 3$:

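```python
import tensorflow as tf

x = tf.Variable(initial_value=3.)  # variable x; initial value 3 because we want the derivative at x = 3

with tf.GradientTape() as tape:  # every computation inside the tf.GradientTape() context is recorded for differentiation
    y = tf.square(x)  # write the computation inside the with block

# outside the with block
y_grad = tape.gradient(y, x)  # derivative of y with respect to x
print([y, y_grad])  # y = 9.0, dy/dx = 2x = 6.0
```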

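As soon as we enter the with tf.GradientTape() as tape context, every computation step inside it is recorded automatically. In the example above, the step y = tf.square(x) is recorded. After leaving the context, recording stops, but the recorder tape is still available, so y_grad = tape.gradient(y, x) can compute the derivative of the tensor y with respect to the variable x.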
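Two details of the recorder are worth knowing. By default a tape releases its resources after one call to gradient(); pass persistent=True to call it several times (and delete the tape afterwards). Also, only trainable tf.Variable objects are watched automatically; to differentiate with respect to a plain tensor such as a tf.constant, call tape.watch() on it first. A minimal sketch:

```python
import tensorflow as tf

x = tf.constant(3.0)  # a constant is not watched automatically

with tf.GradientTape(persistent=True) as tape:
    tape.watch(x)     # explicitly record operations on x
    y = tf.square(x)  # y = x^2
    z = y * x         # z = x^3

dy_dx = tape.gradient(y, x)  # 2x = 6.0
dz_dx = tape.gradient(z, x)  # 3x^2 = 27.0 (a second call needs persistent=True)
del tape                     # drop the reference so resources can be freed

print(float(dy_dx), float(dz_dx))  # 6.0 27.0
```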

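The following code shows how to use tf.GradientTape() to compute the partial derivatives of $L(w, b) = \frac{1}{2}\|Xw + b - y\|^2$ with respect to $w$ and $b$ at $w = (1, 2)^\top$, $b = 1$, where $X = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$ and $y = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$: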

```python
import tensorflow as tf

X = tf.constant([[1., 2.], [3., 4.]])        # shape (2, 2)
y = tf.constant([[1.], [2.]])                # shape (2, 1)
w = tf.Variable(initial_value=[[1.], [2.]])  # shape (2, 1)
b = tf.Variable(initial_value=1.)            # scalar, shape (); broadcast when added to the (2, 1) matrix

with tf.GradientTape() as tape:
    L = 0.5 * tf.reduce_sum(tf.square(tf.matmul(X, w) + b - y))
w_grad, b_grad = tape.gradient(L, [w, b])  # partial derivatives of L(w, b) with respect to w and b

print([L.numpy(), w_grad.numpy(), b_grad.numpy()])
```
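Since $L = \frac{1}{2}\sum_i r_i^2$ with residual $r = Xw + b - y$, the chain rule gives $\partial L/\partial w = X^\top r$ and $\partial L/\partial b = \sum_i r_i$. The sketch below (our own check, not part of the original tutorial) compares these closed forms with the tape's result. At $w = (1, 2)^\top$, $b = 1$ the residual is $r = (5, 10)^\top$, so $L = 62.5$, $\partial L/\partial w = (35, 50)^\top$ and $\partial L/\partial b = 15$.

```python
import numpy as np
import tensorflow as tf

X = tf.constant([[1., 2.], [3., 4.]])
y = tf.constant([[1.], [2.]])
w = tf.Variable([[1.], [2.]])
b = tf.Variable(1.)

with tf.GradientTape() as tape:
    L = 0.5 * tf.reduce_sum(tf.square(tf.matmul(X, w) + b - y))
w_grad, b_grad = tape.gradient(L, [w, b])

# Closed-form gradients from the chain rule
r = tf.matmul(X, w) + b - y                        # residual, shape (2, 1)
w_grad_manual = tf.matmul(X, r, transpose_a=True)  # X^T r, shape (2, 1)
b_grad_manual = tf.reduce_sum(r)                   # sum of residuals

print(L.numpy())               # 62.5
print(w_grad.numpy().ravel())  # [35. 50.]
print(float(b_grad))           # 15.0
print(np.allclose(w_grad, w_grad_manual), np.isclose(b_grad, b_grad_manual))
```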

Implementing linear regression


```python
import tensorflow as tf

# Synthetic data: y = 2x plus random noise
x = tf.cast(tf.range(10), dtype=tf.float32)  # convert to float
y = 2 * x + tf.random.normal([10])           # add noise; [10] is the shape

# Parameter initialization
w = tf.Variable(initial_value=0.)
b = tf.Variable(initial_value=0.)

# Optimizer; learning_rate replaces the deprecated lr argument
optimizer = tf.keras.optimizers.Adam(learning_rate=0.05)

# Parameters to update
variables = [w, b]

for step in range(500):
    with tf.GradientTape() as tape:
        L = tf.reduce_mean(tf.square(tf.multiply(w, x) + b - y))  # mean squared error loss
    grads = tape.gradient(L, variables)               # gradients of the loss w.r.t. the parameters
    optimizer.apply_gradients(zip(grads, variables))  # update from (gradient, variable) pairs
    if step % 100 == 0:
        tf.print(step, L)  # print the loss every 100 steps

tf.print(w, b)  # learned parameters
```

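Here we compute the partial derivatives of the loss with respect to the parameters in the same way as before, and declare an Adam optimizer with tf.keras.optimizers.Adam(learning_rate=0.05), i.e. with a learning rate of 0.05. An optimizer updates the model parameters from the computed derivatives so as to minimize a given loss function; you use it by calling its apply_gradients() method.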
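To make the update step concrete, here is a sketch that uses plain tf.keras.optimizers.SGD instead of Adam (a substitution for illustration only). With its default momentum of 0, SGD applies exactly θ ← θ − η·g. The toy loss w*w + 2*b is hypothetical, chosen so the numbers come out exact.

```python
import tensorflow as tf

w = tf.Variable(3.0)
b = tf.Variable(1.0)
variables = [w, b]
optimizer = tf.keras.optimizers.SGD(learning_rate=0.25)

with tf.GradientTape() as tape:
    loss = w * w + 2.0 * b  # d/dw = 2w = 6, d/db = 2
grads = tape.gradient(loss, variables)

optimizer.apply_gradients(zip(grads, variables))  # updates w and b in place

# w: 3 - 0.25 * 6 = 1.5,  b: 1 - 0.25 * 2 = 0.5
print(float(w.numpy()), float(b.numpy()))  # 1.5 0.5
```

Adam follows the same calling convention; only the rule that turns each gradient into a parameter step differs.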

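Note that the update method optimizer.apply_gradients() takes the argument grads_and_vars, which combines the variables to be updated (variables in the code above) with the partial derivatives of the loss with respect to those variables (grads in the code above). Concretely, it expects an iterable, such as a Python list, whose elements are (partial derivative, variable) pairs; for the example above that is [(grad_w, w), (grad_b, b)]. Calling grads = tape.gradient(L, variables) gives the partial derivatives of the recorded loss L with respect to each variable in variables = [w, b], i.e. grads = [grad_w, grad_b]. Python's zip() function then pairs grads = [grad_w, grad_b] with variables = [w, b] to build the required argument.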
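A minimal sketch of the two equivalent ways to build grads_and_vars on a hypothetical toy loss: an explicit list of (gradient, variable) tuples, or zip(grads, variables). The helper one_step is illustrative only. Keep in mind that a zip object is an iterator that can be consumed only once, so create a fresh one for each apply_gradients() call.

```python
import tensorflow as tf

def one_step(build_pairs):
    """Hypothetical helper: one SGD step on a toy loss.

    build_pairs turns (grads, variables) into the grads_and_vars argument.
    """
    w = tf.Variable(2.0)
    b = tf.Variable(-1.0)
    variables = [w, b]
    optimizer = tf.keras.optimizers.SGD(learning_rate=0.25)
    with tf.GradientTape() as tape:
        loss = tf.square(w) + tf.square(b)  # gradients: 2w = 4, 2b = -2
    grads = tape.gradient(loss, variables)
    optimizer.apply_gradients(build_pairs(grads, variables))
    return w.numpy(), b.numpy()

# Explicit list: [(grad_w, w), (grad_b, b)]
explicit = one_step(lambda g, v: [(g[0], v[0]), (g[1], v[1])])
# Idiomatic form: zip pairs the i-th gradient with the i-th variable
zipped = one_step(lambda g, v: zip(g, v))

print(explicit, zipped)  # both give w = 1.0, b = -0.5
```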