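RNN stands for Recurrent Neural Network. A fully connected neural network consists of an input layer, hidden layers, and an output layer, and each neuron can be viewed as a linear map followed by an activation function. A fully connected network looks like this: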

[Figure: a fully connected network]

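For a single sample, $x_1$ and $x_2$ are scalars, and together they form the feature vector of that sample. Across all samples, $x_1$ and $x_2$ are vectors (one per feature dimension), and together they form the feature matrix of all samples.

Note: in this post a subscript, as in $x_1$, denotes a feature dimension, while a superscript, as in $x^1$, denotes the first sample.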
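As a minimal sketch of "neuron = linear + activation" applied to a feature matrix, the NumPy snippet below assumes 4 samples with the two features $x_1$ and $x_2$, a hidden layer of 3 neurons, and a sigmoid activation. The sizes, the random weights, and the helper name `dense` are illustrative assumptions, not taken from the original post.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dense(X, W, b):
    """One fully connected layer: a linear map followed by an activation."""
    return sigmoid(X @ W + b)

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 2))   # feature matrix: 4 samples (rows), features x_1 and x_2 (columns)
W = rng.normal(size=(2, 3))   # weights: 2 input features -> 3 hidden neurons
b = np.zeros(3)               # one bias per hidden neuron

H = dense(X, W, b)
print(H.shape)  # (4, 3)
```

Each row of `H` is the hidden-layer output for one sample; each column of `X` is one of the per-dimension vectors described above.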

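1. Forward propagation in a simple RNN

An RNN is called a recurrent neural network because an earlier output (not necessarily the final result) becomes part of a later input (not necessarily the original input); exactly which output is meant is explained below. Take a natural language processing task as an example: classifying each word of "arrive China on Monday", where China is a place, Monday is a time, and arrive belongs to others.

First, the four words of "arrive China on Monday" are turned into word vectors (for example with word2vec), denoted $x^1, x^2, x^3, x^4$. The RNN for each word vector also has an input layer, a hidden layer, and an output layer, as shown below (for simplicity the figure draws only the first three words and omits the fourth):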

[Figure: a simple RNN unrolled over the first three words]

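In the figure above, identical colors denote identical components (for example, shared weights). Here $h^2 = \sigma(h^1 W_1 + U x^2 + b)$, where $\sigma$ is the activation function; more generally, $h^t = \sigma(h^{t-1} W_1 + U x^t + b)$. This shows that the RNN remembers earlier outputs. $y^1, y^2, y^3$ are the probabilities of each word belonging to each class, for example $y^1 = \sigma(V h^1 + b)$. The structure is very close to a DNN: take only the first column, the one for $x^1$, and it is essentially a DNN. The input layer is $x^1$, with its dimensions collapsed into one small box; hidden layer 1 is also collapsed into one box, and its dimension is determined by $\sigma(Wx)$ (in effect, by the number of rows of $W$). The only difference is that the input is no longer just $x^1$: $h^0$, the hidden-layer output of the previous step, is taken into account as well. This is why PyTorch asks for the hidden-layer dimension in addition to the input dimension.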
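The recursion $h^t = \sigma(h^{t-1} W_1 + U x^t + b)$ can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: vectors are treated as rows (so the products are written `h @ W` and `x @ U`), tanh is used for the hidden activation, softmax for the output so that each $y^t$ is a probability vector, and the output layer gets its own bias `c`. The sizes and the function name `rnn_forward` are made up for the example.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def rnn_forward(xs, U, W, V, b, c):
    """Run a simple RNN over a sequence; returns all hidden states and outputs."""
    h = np.zeros(W.shape[0])                  # h^0: initial hidden state
    hs, ys = [], []
    for x in xs:                              # one word vector per time step
        h = np.tanh(h @ W + x @ U + b)        # h^t depends on h^{t-1} and x^t
        hs.append(h)
        ys.append(softmax(h @ V + c))         # y^t depends only on h^t
    return np.array(hs), np.array(ys)

rng = np.random.default_rng(0)
input_size, hidden_size, num_classes = 20, 32, 3   # assumed sizes
xs = rng.normal(size=(4, input_size))              # 4 word vectors, e.g. "arrive China on Monday"
U = rng.normal(scale=0.1, size=(input_size, hidden_size))
W = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
V = rng.normal(scale=0.1, size=(hidden_size, num_classes))
b = np.zeros(hidden_size)
c = np.zeros(num_classes)

hs, ys = rnn_forward(xs, U, W, V, b, c)
print(hs.shape, ys.shape)  # (4, 32) (4, 3)
```

The same `U`, `W`, and `V` are used at every step of the loop, which is the weight sharing indicated by the colors in the figure.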

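P.S. In the figure above the hidden layer and the output layer are drawn separately, but in PyTorch the output is simply the output of the last hidden layer.

The common RNN diagram can also be drawn as follows: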

[Figure: RNN model, folded (left) and unrolled over time (right)]

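The left part of the figure shows the RNN model without unrolling over time; unrolled along the time sequence, it becomes the right part. We focus on the right part.

The figure describes the RNN model around sequence index $t$, where:

$x^{(t)}$ is the input of the training sample at sequence index $t$. Likewise, $x^{(t-1)}$ and $x^{(t+1)}$ are the inputs at sequence indices $t-1$ and $t+1$.
$h^{(t)}$ is the hidden state of the model at sequence index $t$. $h^{(t)}$ is determined jointly by $x^{(t)}$ and $h^{(t-1)}$.
$o^{(t)}$ is the output of the model at sequence index $t$. $o^{(t)}$ is determined only by the current hidden state $h^{(t)}$.
$L^{(t)}$ is the loss of the model at sequence index $t$.
$y^{(t)}$ is the true output of the training sequence at sequence index $t$.
The three matrices $U$, $W$, $V$ are the linear parameters of the model. They are shared across the whole RNN, that is, across all time steps, which is very different from a DNN. It is precisely this sharing that embodies the "recurrent feedback" idea of the RNN model.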
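One way to see the parameter sharing is to inspect `torch.nn.RNN`: its parameter shapes depend only on `input_size` and `hidden_size`, and the same module handles sequences of any length. In this sketch, `weight_ih_l0` plays the role of $U$ and `weight_hh_l0` the role of $W$; the output matrix $V$ is not part of `torch.nn.RNN`. The sizes are arbitrary choices for illustration.

```python
import torch

torch.manual_seed(0)
rnn = torch.nn.RNN(input_size=20, hidden_size=32, num_layers=1)

# The parameter set is fixed by input_size and hidden_size, not by sequence length.
param_shapes = {name: tuple(p.shape) for name, p in rnn.named_parameters()}
for name, shape in param_shapes.items():
    print(name, shape)

# The same weights process a short and a long sequence.
short_out, _ = rnn(torch.randn(3, 1, 20))    # seq_len = 3, batch = 1
long_out, _ = rnn(torch.randn(10, 1, 20))    # seq_len = 10, batch = 1
print(short_out.shape, long_out.shape)
```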

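2. RNN extensions

2.1 Multi-layer RNN

The previous section covered only the simplest three-layer RNN (input, one hidden layer, output). An RNN can also contain several hidden layers, with the following structure: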

[Figure: an RNN with multiple hidden layers]

This is why PyTorch also asks for the number of hidden layers (num_layers).
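A short sketch of what `num_layers` changes in `torch.nn.RNN`: the output tensor still has `hidden_size` features (it comes from the top layer), while the final hidden state gains one slice per layer. The sizes below are arbitrary.

```python
import torch

torch.manual_seed(0)
rnn = torch.nn.RNN(input_size=20, hidden_size=32, num_layers=3)

x = torch.randn(5, 8, 20)      # (seq_len, batch, input_size)
output, h_n = rnn(x)

print(output.shape)  # torch.Size([5, 8, 32]): top layer, every time step
print(h_n.shape)     # torch.Size([3, 8, 32]): every layer, last time step only

# The last time step of the top layer appears in both tensors.
top_matches = torch.allclose(output[-1], h_n[-1])
print(top_matches)   # True
```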

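2.2 Jordan RNN

All the RNNs above are Elman Networks, in which the hidden state of the previous step is fed into the next step. There is also the Jordan Network, which instead feeds the final output of the previous step into the next step, as shown below: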

[Figure: Elman Network vs. Jordan Network]

PyTorch's torch.nn.RNN is an Elman RNN.
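The difference between the two variants is only in what is carried over from the previous step. The NumPy sketch below contrasts a single step of each; the function names, the tanh activation, the row-vector convention, and the extra matrix `W_y` (mapping the previous output back into the hidden layer) are assumptions made for illustration.

```python
import numpy as np

def elman_step(x, h_prev, U, W, b):
    """Elman: the previous HIDDEN state feeds the current hidden layer."""
    return np.tanh(x @ U + h_prev @ W + b)

def jordan_step(x, y_prev, U, W_y, b):
    """Jordan: the previous OUTPUT feeds the current hidden layer."""
    return np.tanh(x @ U + y_prev @ W_y + b)

rng = np.random.default_rng(0)
input_size, hidden_size, output_size = 4, 5, 3      # assumed sizes
U = rng.normal(size=(input_size, hidden_size))
W = rng.normal(size=(hidden_size, hidden_size))     # Elman: hidden -> hidden
W_y = rng.normal(size=(output_size, hidden_size))   # Jordan: output -> hidden
V = rng.normal(size=(hidden_size, output_size))     # hidden -> output
b = np.zeros(hidden_size)

x = rng.normal(size=input_size)
h_prev = np.zeros(hidden_size)   # initial hidden state
y_prev = np.zeros(output_size)   # initial output

h_elman = elman_step(x, h_prev, U, W, b)
h_jordan = jordan_step(x, y_prev, U, W_y, b)
y_jordan = h_jordan @ V          # this output is what the next Jordan step would receive
print(h_elman.shape, h_jordan.shape, y_jordan.shape)
```

Note that the recurrent matrix has a different shape in each case: hidden-to-hidden for Elman, output-to-hidden for Jordan.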

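2.3 Bidirectional RNN

In the RNNs so far, $x^1, x^2, x^3$ are fed in their natural order: $x^1$ is processed first, then $x^2$, and so on. The sequence can also be fed in reverse, processing $x^2$ before $x^1$. The final result is the sum of the outputs of the two paths (or some other combination of them). The structure is shown below: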

[Figure: bidirectional RNN]
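A bidirectional RNN can be sketched as two independent simple RNNs, one reading the sequence forward and one reading it reversed, whose per-step outputs are then combined. The NumPy code below is an illustration only: the helper `run_rnn`, the tanh activation, and all sizes are assumptions. It shows both the sum mentioned above and concatenation, which is another common way to combine the two directions.

```python
import numpy as np

def run_rnn(xs, U, W, b):
    """Run a simple RNN over xs and return the hidden state at every step."""
    h = np.zeros(W.shape[0])
    hs = []
    for x in xs:
        h = np.tanh(x @ U + h @ W + b)
        hs.append(h)
    return np.array(hs)

def make_params(rng, input_size, hidden_size):
    U = rng.normal(scale=0.1, size=(input_size, hidden_size))
    W = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
    b = np.zeros(hidden_size)
    return U, W, b

rng = np.random.default_rng(0)
seq_len, input_size, hidden_size = 3, 4, 5
xs = rng.normal(size=(seq_len, input_size))        # x^1, x^2, x^3

params_fw = make_params(rng, input_size, hidden_size)   # forward-direction weights
params_bw = make_params(rng, input_size, hidden_size)   # backward-direction weights

h_fw = run_rnn(xs, *params_fw)               # reads x^1, x^2, x^3
h_bw = run_rnn(xs[::-1], *params_bw)[::-1]   # reads x^3, x^2, x^1, then re-aligned to time order

h_sum = h_fw + h_bw                                # combine by summing
h_cat = np.concatenate([h_fw, h_bw], axis=1)       # combine by concatenating
print(h_sum.shape, h_cat.shape)  # (3, 5) (3, 10)
```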

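3. Summary

The RNN diagrams above actually describe a single sample. As with a DNN, a structure diagram is drawn for one sample only. This raises a question: if the diagram is one sample, what are $x^1, x^2, \dots, x^n$? An RNN is similar to a time series, and $n$ is the length of the time span. Take natural language processing as an example: predicting the sentiment class of a sentence. Suppose we have 100 sentences and want to predict the sentiment of each. Suppose one of them is 我愛中華 ("I love China"). It has four characters, which can be converted into four character vectors: $x^1$ is the vector for 我, $x^2$ the vector for 愛, $x^3$ the vector for 中, and $x^4$ the vector for 華. The same can be done with word vectors instead of character vectors.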
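To make the "one sentence is one sample, one character is one time step" idea concrete, the sketch below turns the four-character sentence into a tensor shaped for an RNN. The toy vocabulary, the embedding dimension of 20, and the use of a randomly initialised `torch.nn.Embedding` (rather than pretrained vectors) are assumptions for illustration.

```python
import torch

torch.manual_seed(0)
sentence = "我愛中華"

# Toy vocabulary built from this one sentence: character -> integer id
vocab = {ch: i for i, ch in enumerate(sorted(set(sentence)))}
embedding = torch.nn.Embedding(num_embeddings=len(vocab), embedding_dim=20)

indices = torch.tensor([vocab[ch] for ch in sentence])   # shape (4,): one id per character
x = embedding(indices)                                   # shape (4, 20): x^1 .. x^4
x = x.unsqueeze(1)                                       # shape (4, 1, 20): (seq_len, batch, input_size)
print(x.shape)  # torch.Size([4, 1, 20])
```

With 100 sentences of equal length, the batch dimension would be 100 instead of 1.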

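3. RNN example in PyTorch

Before the example, here are the parameters of torch.nn.RNN:

Parameter	Meaning
input_size	Dimension of the input layer (number of features per time step)
hidden_size	Dimension of the hidden layer
num_layers	Number of hidden layers
nonlinearity	Activation function; defaults to tanh, can also be set to relu
bias	Whether to use bias (intercept) terms; defaults to True
batch_first	If True, the input tensor has shape (batch, seq, feature), and so does the output
dropout	If non-zero, dropout is applied to the output of every layer except the last
bidirectional	If True, the RNN becomes bidirectional; defaults to False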

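RNN input
- The input to the RNN is (input, $h_0$).
- input has shape (seq_len, batch, input_size); with batch_first=True the shape is (batch, seq_len, input_size). For example, in an NLP task that predicts the sentiment of a sentence, a model usually has a batch_size, the number of samples per training step (say 50). Each step takes 50 sentences; if every sentence has 10 characters and every character is converted into a 20-dimensional vector, then with batch_first=True the shape is (50, 10, 20).
- $h_0$ has shape (num_layers * num_directions, batch, hidden_size). num_layers is the number of hidden layers; num_directions is 1 for a unidirectional RNN and 2 for a bidirectional one. If $h_0$ is omitted, it defaults to zeros. (For one sample, the h of a single time step is a vector; for a batch it is a matrix. Only the h of the last time step is kept for each sample. The first time step has no predecessor, so an initial value is needed, and it defaults to zeros when not given.)
RNN output
- (output_h, h_n)
- output_h has shape (seq_len, batch, hidden_size * num_directions); with batch_first=True, batch comes first.
- h_n has shape (num_layers * num_directions, batch, hidden_size) and stores the state at the last time step.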

[Figure: shapes of the RNN output and h_n]

The last dimension of output (index 2) equals hidden_size (multiplied by num_directions for a bidirectional RNN).
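The shape rules above can be checked directly. This sketch uses the sizes from the example in the text (50 sentences, 10 characters, 20-dimensional vectors) and, as additional assumptions, `hidden_size=32`, two layers, and a bidirectional RNN so that every factor in the shape formulas is visible.

```python
import torch

torch.manual_seed(0)
batch, seq_len, input_size = 50, 10, 20
hidden_size, num_layers, num_directions = 32, 2, 2

rnn = torch.nn.RNN(input_size=input_size, hidden_size=hidden_size,
                   num_layers=num_layers, batch_first=True, bidirectional=True)

x = torch.randn(batch, seq_len, input_size)                          # (batch, seq_len, input_size)
h0 = torch.zeros(num_layers * num_directions, batch, hidden_size)    # h0 is never batch-first

output, h_n = rnn(x, h0)
print(output.shape)  # torch.Size([50, 10, 64]): hidden_size * num_directions
print(h_n.shape)     # torch.Size([4, 50, 32]): num_layers * num_directions

# Omitting h0 is the same as passing zeros.
output_default, _ = rnn(x)
same_as_zero_h0 = torch.allclose(output, output_default)
print(same_as_zero_h0)  # True
```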

3.1 Unidirectional RNN code example

import torch

# Define the network: a thin wrapper around torch.nn.RNN
class Rnn(torch.nn.Module):
    def __init__(self, input_size, hidden_size, num_layers):
        super().__init__()
        self.rnn = torch.nn.RNN(input_size=input_size,
                                hidden_size=hidden_size,
                                num_layers=num_layers)

    def forward(self, x):
        # h_0 is omitted, so it defaults to zeros
        output, hn = self.rnn(x)
        return output, hn

input_size = 20
hidden_size = 32
num_layers = 1
LR = 0.02

model = Rnn(input_size, hidden_size, num_layers)
loss_func = torch.nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=LR)

for i in range(100):  # train for 100 iterations
    x = torch.randn(5, 8, input_size)   # input: seq_len = 5, batch = 8, input_size = 20
    y = torch.randn(5, 8, hidden_size)  # the prediction has shape (5, 8, 32), so the random target does too
    prediction, hn = model(x)
    loss = loss_func(prediction, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(prediction.shape)  # torch.Size([5, 8, 32])

3.2 Bidirectional RNN code example

import torch

# Define the network: torch.nn.RNN with an optional backward direction
class Rnn(torch.nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, bidirectional):
        super().__init__()
        self.rnn = torch.nn.RNN(input_size=input_size,
                                hidden_size=hidden_size,
                                num_layers=num_layers,
                                bidirectional=bidirectional)

    def forward(self, x):
        # h_0 is omitted, so it defaults to zeros
        output, hn = self.rnn(x)
        return output, hn

input_size = 20
hidden_size = 32
num_layers = 1
bidirectional = True
LR = 0.02

model = Rnn(input_size, hidden_size, num_layers, bidirectional)
loss_func = torch.nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=LR)

for i in range(100):  # train for 100 iterations
    x = torch.randn(5, 8, input_size)   # input: seq_len = 5, batch = 8, input_size = 20
    y = torch.randn(5, 8, hidden_size)  # target matches the summed prediction, shape (5, 8, 32)
    out, hn = model(x)                  # out has shape (5, 8, 64): forward and backward halves concatenated
    prediction = out[:, :, :hidden_size] + out[:, :, hidden_size:]  # add the forward and backward results
    loss = loss_func(prediction, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(prediction.shape)  # torch.Size([5, 8, 32])

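4. Further RNN variants and backpropagation

Building on the RNN idea, variants such as LSTM and GRU were developed later; they are all special kinds of RNN. The next post covers LSTM in detail. All of these models are optimized by backpropagation, which will be covered together with LSTM.

5. TensorFlow examples

5.1 Unidirectional, single hidden layer

The basic RNN code in tf is as follows: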

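# TensorFlow 1.x API. num_hidden, timesteps, X, weights and biases are defined in the full listing in 5.4.
def RNN(x, weights, biases):

    # static_rnn expects a list of 'timesteps' tensors of shape (batch_size, num_input)
    x = tf.unstack(x, timesteps, 1)

    # Define a rnn cell with tensorflow
    rnn_cell = tf.nn.rnn_cell.BasicRNNCell(num_hidden)

    # Get rnn cell output
    outputs, states = tf.nn.static_rnn(rnn_cell, x, dtype=tf.float32)

    # Linear activation, using rnn inner loop last output
    return tf.matmul(outputs[-1], weights['out']) + biases['out']

logits = RNN(X, weights, biases)
prediction = tf.nn.softmax(logits)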

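First, an RNN cell has to be defined. The older contrib usage is deprecated; the API in tf.nn.rnn_cell is used instead. In most cases BasicRNNCell is enough (RNNCell can be ignored for now). A BasicRNNCell represents a single hidden layer. Its most important parameter is num_units (the number of neurons), which plays the role of the dimension of the weight matrix W in a DNN, that is, the dimension of W in the formulas above. It determines the output dimension at each time step (for every time step of a sample, the output dimension equals num_units).

There are two ways to run the cell: a static one (tf.nn.static_rnn) and a dynamic one (tf.nn.dynamic_rnn). Put simply, the dynamic method usually runs faster and uses less memory. (For the detailed differences see the post at https://blog.csdn.net/qq_34430032/article/details/82840834; Appendix 1 below reproduces its content.)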

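Key parameters of tf.nn.dynamic_rnn / tf.nn.static_rnn:
- cell: an RNN cell instance, i.e. rnn_cell in the code above.
- inputs: the input data. For tf.nn.dynamic_rnn, if time_major == False the shape must be [batch_size, max_time, input_num], where max_time is the sequence length and input_num the feature dimension; if time_major == True the shape must be [max_time, batch_size, input_num]. time_major defaults to False. For tf.nn.static_rnn, inputs is instead a list of max_time tensors of shape [batch_size, input_num].
- initial_state: the initial state of the RNN, i.e. $h_0$, the h that precedes the first time step. Its shape is [batch_size, cell.state_size], where state_size is the dimension of h (for a BasicRNNCell it equals num_units). initial_state is usually set to zeros. With the dynamic method it can be left unset (the default is None), in which case dtype must be given.
- return: a pair (outputs, state):
  - outputs: if time_major == False (default), the shape is [batch_size, max_time, cell.output_size]; if time_major == True, the shape is [max_time, batch_size, cell.output_size]. Here output_size is num_units, the number of neurons. For example, with input data of shape [128, 15, 20] and num_units = 8, outputs has shape [128, 15, 8].
  - state: the final state. If cell.state_size is an int, it has shape [batch_size, cell.state_size]. If it is a TensorShape, the shape is [batch_size] + cell.state_size. If it is a (possibly nested) tuple of ints or TensorShapes, state is a tuple with the corresponding shapes. If the cells are LSTMCells, state is a tuple containing an LSTMStateTuple for each cell. Put simply, state is the output at the last time step of the sequence: for the input 我愛你 ("I love you"), state is the output corresponding to 你. In the example above its shape is [128, 8]. So for a BasicRNNCell the output at the last time step (outputs[:, -1, :] with dynamic_rnn, outputs[-1] with static_rnn) is the same as state.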
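The shape contract described in this list can be imitated without TensorFlow. The NumPy function below is not the TensorFlow implementation; it is a hypothetical stand-in named `simple_rnn` that unrolls a tanh RNN and returns `(outputs, state)` with the same shapes, using the numbers from the example (input [128, 15, 20], num_units = 8).

```python
import numpy as np

def simple_rnn(inputs, W_x, W_h, b, time_major=False):
    """Unroll a tanh RNN and return (outputs, state), mimicking the shapes described above."""
    if not time_major:
        inputs = np.transpose(inputs, (1, 0, 2))     # -> [max_time, batch_size, input_num]
    h = np.zeros((inputs.shape[1], W_h.shape[0]))    # zero initial state: [batch_size, num_units]
    outputs = []
    for x_t in inputs:                               # one time step at a time
        h = np.tanh(x_t @ W_x + h @ W_h + b)
        outputs.append(h)
    outputs = np.stack(outputs)                      # [max_time, batch_size, num_units]
    if not time_major:
        outputs = np.transpose(outputs, (1, 0, 2))   # back to [batch_size, max_time, num_units]
    return outputs, h

rng = np.random.default_rng(0)
batch_size, max_time, input_num, num_units = 128, 15, 20, 8
inputs = rng.normal(size=(batch_size, max_time, input_num))
W_x = rng.normal(scale=0.1, size=(input_num, num_units))
W_h = rng.normal(scale=0.1, size=(num_units, num_units))
b = np.zeros(num_units)

outputs, state = simple_rnn(inputs, W_x, W_h, b)
print(outputs.shape)  # (128, 15, 8)
print(state.shape)    # (128, 8)
```

In this sketch `outputs[:, -1, :]` is exactly `state`, which is the relationship between the last output and the final state noted above.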

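5.2 Unidirectional, multiple hidden layers (with dropout)

The code above has a single hidden layer. With several hidden layers it looks like this: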

def RNN(x, weights, biases):

    # static_rnn expects a list of 'timesteps' tensors of shape (batch_size, num_input)
    x = tf.unstack(x, timesteps, 1)

    def make_cell():
        # Define a rnn cell with tensorflow
        rnn_cell = tf.nn.rnn_cell.BasicRNNCell(num_hidden)
        # DropoutWrapper: keep every input, keep 80% of the outputs
        return tf.nn.rnn_cell.DropoutWrapper(rnn_cell, input_keep_prob=1.0, output_keep_prob=0.8)

    # multi_hidden_layer: MultiRNNCell takes a list with one cell object per layer
    multi_rnn = tf.nn.rnn_cell.MultiRNNCell([make_cell() for _ in range(5)])

    # Get rnn cell output
    outputs, states = tf.nn.static_rnn(multi_rnn, x, dtype=tf.float32)

    # Linear activation, using rnn inner loop last output
    return tf.matmul(outputs[-1], weights['out']) + biases['out']

logits = RNN(X, weights, biases)
prediction = tf.nn.softmax(logits)

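The main parameters of tf.nn.rnn_cell.DropoutWrapper are cell, input_keep_prob, and output_keep_prob. Multiple layers are built on top of the single-layer case with tf.nn.rnn_cell.MultiRNNCell, whose argument is a list of cells, one per layer (num_layers cells in total).

Besides repeating one cell definition num_layers times as above, you can also design each layer's structure separately (for example, with several different cell-building functions).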
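Why a list of separately constructed cells rather than one cell repeated with `*`? The plain-Python sketch below shows the underlying language behaviour: multiplying a list copies references, not objects. `Cell` is a hypothetical stand-in for an RNN cell object that owns its own weights; it is not a TensorFlow class.

```python
class Cell:
    """Hypothetical stand-in for an RNN cell object that owns its own weights."""
    def __init__(self, num_units):
        self.num_units = num_units

shared = [Cell(128)] * 5                      # five references to ONE object
separate = [Cell(128) for _ in range(5)]      # five distinct objects

print(len({id(c) for c in shared}))     # 1
print(len({id(c) for c in separate}))   # 5
```

With the first form, every "layer" would be the same cell object, so the layers would not get independent cells; the list comprehension gives each layer its own.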

5.3 Bidirectional RNN

def RNN(x, weights, biases):

    # static_bidirectional_rnn expects a list of 'timesteps' tensors of shape (batch_size, num_input)
    x = tf.unstack(x, timesteps, 1)

    def make_cell(output_keep_prob):
        # Define a rnn cell with tensorflow and wrap it with dropout
        rnn_cell = tf.nn.rnn_cell.BasicRNNCell(num_hidden)
        return tf.nn.rnn_cell.DropoutWrapper(rnn_cell, input_keep_prob=1.0, output_keep_prob=output_keep_prob)

    # multi_hidden_layer: 5 forward layers and 6 backward layers, one cell object per layer
    multi_rnn_forward = tf.nn.rnn_cell.MultiRNNCell([make_cell(0.8) for _ in range(5)])
    multi_rnn_backward = tf.nn.rnn_cell.MultiRNNCell([make_cell(0.7) for _ in range(6)])

    # Get rnn cell output; static_bidirectional_rnn returns three values
    outputs, state_fw, state_bw = tf.nn.static_bidirectional_rnn(
        multi_rnn_forward, multi_rnn_backward, x, dtype=tf.float32)

    # Linear activation, using rnn inner loop last output.
    # Each output concatenates the forward and backward outputs,
    # so weights['out'] must have shape [2 * num_hidden, num_classes] here.
    return tf.matmul(outputs[-1], weights['out']) + biases['out']

logits = RNN(X, weights, biases)
prediction = tf.nn.softmax(logits)

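The code above uses tf.nn.static_bidirectional_rnn. There is no tf.nn.dynamic_bidirectional_rnn in tf; the dynamic counterpart is named tf.nn.bidirectional_dynamic_rnn.

5.4 Complete code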

# Written for TensorFlow 1.x (tf.placeholder, tf.Session)
import tensorflow as tf

# Import MNIST data
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)


# Training Parameters
learning_rate = 0.001
training_steps = 10000
batch_size = 128
display_step = 200

# Network Parameters
num_input = 28 # MNIST data input (img shape: 28*28)
timesteps = 28 # timesteps
num_hidden = 128 # hidden layer num of features
num_classes = 10 # MNIST total classes (0-9 digits)

# tf Graph input
X = tf.placeholder("float", [None, timesteps, num_input])
Y = tf.placeholder("float", [None, num_classes])

# Define weights
weights = {
    'out': tf.Variable(tf.random_normal([num_hidden, num_classes]))
}
biases = {
    'out': tf.Variable(tf.random_normal([num_classes]))
}


def RNN(x, weights, biases):

    # Prepare data shape to match `rnn` function requirements
    # Current data input shape: (batch_size, timesteps, n_input)
    # Required shape: 'timesteps' tensors list of shape (batch_size, n_input)

    # Unstack to get a list of 'timesteps' tensors of shape (batch_size, n_input)
    x = tf.unstack(x, timesteps, 1)

    # Define a lstm cell with tensorflow
    lstm_cell = tf.nn.rnn_cell.BasicLSTMCell(num_hidden, forget_bias=1.0)

    # Get lstm cell output
    outputs, states = tf.nn.static_rnn(lstm_cell, x, dtype=tf.float32)

    # Linear activation, using rnn inner loop last output
    return tf.matmul(outputs[-1], weights['out']) + biases['out']

logits = RNN(X, weights, biases)
prediction = tf.nn.softmax(logits)

# Define loss and optimizer
loss_op = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=Y))
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
train_op = optimizer.minimize(loss_op)

# Evaluate model (accuracy of the arg-max prediction against the one-hot labels)
correct_pred = tf.equal(tf.argmax(prediction, 1), tf.argmax(Y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

# Initialize the variables (i.e. assign their default value)
init = tf.global_variables_initializer()

# Start training
with tf.Session() as sess:

    # Run the initializer
    sess.run(init)

    for step in range(1, training_steps+1):
        batch_x, batch_y = mnist.train.next_batch(batch_size)
        # Reshape data to get 28 seq of 28 elements
        batch_x = batch_x.reshape((batch_size, timesteps, num_input))
        # Run optimization op (backprop)
        sess.run(train_op, feed_dict={X: batch_x, Y: batch_y})
        if step % display_step == 0 or step == 1:
            # Calculate batch loss and accuracy
            loss, acc = sess.run([loss_op, accuracy], feed_dict={X: batch_x,
                                                                 Y: batch_y})
            print("Step " + str(step) + ", Minibatch Loss= " + \
                  "{:.4f}".format(loss) + ", Training Accuracy= " + \
                  "{:.3f}".format(acc))

    print("Optimization Finished!")

    # Calculate accuracy for 128 mnist test images
    test_len = 128
    test_data = mnist.test.images[:test_len].reshape((-1, timesteps, num_input))
    test_label = mnist.test.labels[:test_len]
    print("Testing Accuracy:", \
        sess.run(accuracy, feed_dict={X: test_data, Y: test_label}))
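The data preparation in the listing above treats each 28x28 MNIST image as a sequence of 28 rows with 28 features each. The NumPy sketch below reproduces just that reshaping on random stand-in data (no MNIST download); the list comprehension is meant as a NumPy equivalent of `tf.unstack(x, timesteps, 1)`.

```python
import numpy as np

batch_size, timesteps, num_input = 128, 28, 28
rng = np.random.default_rng(0)

# Stand-in for a batch from mnist.train.next_batch: flattened 784-pixel images
batch_x = rng.random(size=(batch_size, timesteps * num_input))

# Reshape to 28 time steps of 28 features each: (batch_size, timesteps, num_input)
batch_x = batch_x.reshape((batch_size, timesteps, num_input))

# Split along the time axis into a list of 'timesteps' arrays of shape (batch_size, num_input)
steps = [batch_x[:, t, :] for t in range(timesteps)]
print(len(steps), steps[0].shape)  # 28 (128, 28)
```

Each element of `steps` holds one image row for every sample in the batch, which is the per-time-step input the static RNN consumes.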

Appendix 1:

[Figures: content of the referenced post on tf.nn.static_rnn vs. tf.nn.dynamic_rnn]
