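Full code: click here to view
Personal blog version of this post: click here to view
For a simple TensorFlow example on a binary sequence, click here
For the basics of RNN and LSTM, see here
This post covers the following:
- Training an RNN model that generates text character by character (the last part)
- Using TensorFlow's scan function to get the same dynamic graph construction as dynamic_rnn
- Using MultiRNNCell to build a multi-layer RNN
- Implementing Dropout and Layer Normalization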

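I. Model description and data processing

1. Model description

We will use an RNN to learn a language model that generates character sequences.
Others have already published implementations on GitHub:
- Torch implementation: https://github.com/karpathy/char-rnn
- TensorFlow implementation: https://github.com/sherjilozair/char-rnn-tensorflow
Next, let's see how to implement it ourselves.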

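2. Data processing

The dataset is a collection of Shakespeare's text (click here to view); any other text corpus would work as well.
Upper-case and lower-case characters are treated as different characters.
Download and read the data: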

import os
import time
import urllib.request

import numpy as np
import tensorflow as tf   # TensorFlow 1.x API (the author used 1.2.0)

'''Download and read the data'''
file_url = 'https://raw.githubusercontent.com/jcjohnson/torch-rnn/master/data/tiny-shakespeare.txt'
file_name = 'tinyshakespeare.txt'
if not os.path.exists(file_name):
    urllib.request.urlretrieve(file_url, filename=file_name)
with open(file_name, 'r') as f:
    raw_data = f.read()
    print("Data length:", len(raw_data))

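Convert the characters to numbers:
- use a set to remove duplicates and obtain all unique characters
- map each character to an integer (using a dict)
- then iterate over the raw data to get the integer for every character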

'''Convert characters to integers'''
vocab = sorted(set(raw_data))            # set removes duplicate characters (case-sensitive); sorting keeps the order identical across runs
vocab_size = len(vocab)
idx_to_vocab = dict(enumerate(vocab))    # each character gets an integer 0, 1, 2, ..., vocab_size-1
vocab_to_idx = dict(zip(idx_to_vocab.values(), idx_to_vocab.keys()))  # invert the dict: (key, value) -> (value, key)

data = [vocab_to_idx[c] for c in raw_data]   # encode raw_data: look up the integer for each character
del raw_data

Generate batches
- Based on the PTB model in TensorFlow models: https://github.com/tensorflow/models/tree/master/tutorials/rnn/ptb

'''Hyperparameters'''
num_steps=             # number of unrolled time steps
batch_size=
state_size=            # size of the cell state
num_classes = vocab_size
learning_rate = 

def gen_epochs(num_epochs, num_steps, batch_size):
    # one pass over the data per epoch; ptb_iterator_oldversion is shown below
    for i in range(num_epochs):
        yield reader.ptb_iterator_oldversion(data, batch_size, num_steps)

- Implementation of the ptb_iterator function:
  - it returns X and Y, each with shape=[batch_size, num_steps]; Y is X shifted one position to the right

def ptb_iterator_oldversion(raw_data, batch_size, num_steps):
  """Iterate on the raw PTB data.
  This generates batch_size pointers into the raw PTB data, and allows
  minibatch iteration along these pointers.
  Args:
  raw_data: one of the raw data outputs from ptb_raw_data.
  batch_size: int, the batch size.
  num_steps: int, the number of unrolls.
  Yields:
  Pairs of the batched data, each a matrix of shape [batch_size, num_steps].
  The second element of the tuple is the same data time-shifted to the
  right by one.
  Raises:
  ValueError: if batch_size or num_steps are too high.
  """
  raw_data = np.array(raw_data, dtype=np.int32)

  data_len = len(raw_data)
  batch_len = data_len // batch_size
  data = np.zeros([batch_size, batch_len], dtype=np.int32)
  for i in range(batch_size):
    data[i] = raw_data[batch_len * i:batch_len * (i + 1)]

  epoch_size = (batch_len - 1) // num_steps

  if epoch_size == 0:
    raise ValueError("epoch_size == 0, decrease batch_size or num_steps")

  for i in range(epoch_size):
    x = data[:, i*num_steps:(i+1)*num_steps]
    y = data[:, i*num_steps+1:(i+1)*num_steps+1]
    yield (x, y)
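A quick way to check the batching logic is to run a NumPy re-statement of it on the integers 0..19. The name `toy_batches` is hypothetical; the body follows the iterator above, with the per-row copy loop replaced by an equivalent reshape.

```python
import numpy as np


def toy_batches(raw_data, batch_size, num_steps):
    """Hypothetical re-statement of ptb_iterator_oldversion for illustration."""
    raw_data = np.asarray(raw_data, dtype=np.int32)
    batch_len = len(raw_data) // batch_size
    # each row is a contiguous slice of the stream; leftover tokens are dropped
    data = raw_data[:batch_size * batch_len].reshape(batch_size, batch_len)
    epoch_size = (batch_len - 1) // num_steps   # -1 leaves room for the shifted targets
    for i in range(epoch_size):
        x = data[:, i * num_steps:(i + 1) * num_steps]
        y = data[:, i * num_steps + 1:(i + 1) * num_steps + 1]  # targets: x shifted by one
        yield x, y


batches = list(toy_batches(range(20), batch_size=2, num_steps=3))
x0, y0 = batches[0]
print(x0)
print(y0)
```

Each row of a batch continues where the same row of the previous batch stopped, which is why the training loop can feed the final state of one batch in as the initial state of the next.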

II. Using the `tf.scan` function and `dynamic_rnn`

1. Why use `tf.scan` and `dynamic_rnn`

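In our first example, the part that did not use dynamic_rnn split the 3-D input [batch_size, num_steps, state_size] along the num_steps dimension and stored the result of every step in a Python list, as in the figure below.

[Figure: the input split along num_steps, with each step stored as one list element]

- Building the graph this way is slow. It did not show in that small example, but when the number of steps to learn (num_steps, in other words the length of the dependencies to capture) is large, and especially when a deep RNN is used, this approach is no longer practical.
- To make it easy to compare running times with dynamic_rnn, the list-based version is still given below.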
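2. The `list` approach (`static_rnn`)

Build the computation graph
- My TensorFlow version here is 1.2.0, which differs slightly from 1.0
- It is similar to the earlier example, so I won't repeat the details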

'''Helper used by every graph-building function below (minimal version, assumed; not shown in the original post)'''
def reset_graph():
    tf.reset_default_graph()   # start each model from an empty default graph

'''The list approach'''
def build_basic_rnn_graph_with_list(
    state_size = state_size,
    num_classes = num_classes,
    batch_size = batch_size,
    num_steps = num_steps,
    num_layers = None,          # unused: this graph has a single cell (original default not preserved)
    learning_rate = learning_rate):

    reset_graph()

    x = tf.placeholder(tf.int32, [batch_size, num_steps], name='x')
    y = tf.placeholder(tf.int32, [batch_size, num_steps], name='y')

    x_one_hot = tf.one_hot(x, num_classes)   # (batch_size, num_steps, num_classes)
    '''Split along the 2nd dimension into num_steps tensors of shape (batch_size, num_classes)'''
    rnn_inputs = [tf.squeeze(i, axis=[1]) for i in tf.split(x_one_hot, num_steps, 1)]

    cell = tf.nn.rnn_cell.BasicRNNCell(state_size)
    init_state = cell.zero_state(batch_size, tf.float32)
    '''static_rnn: the graph is unrolled explicitly, one copy of the cell per step'''
    rnn_outputs, final_state = tf.contrib.rnn.static_rnn(cell=cell, inputs=rnn_inputs,
                                                         initial_state=init_state)
    #rnn_outputs, final_state = tf.nn.rnn(cell, rnn_inputs, initial_state=init_state)  # TensorFlow 1.0 style
    with tf.variable_scope('softmax'):
        W = tf.get_variable('W', [state_size, num_classes])
        b = tf.get_variable('b', [num_classes], initializer=tf.constant_initializer(0.0))
    logits = [tf.matmul(rnn_output, W) + b for rnn_output in rnn_outputs]

    y_as_list = [tf.squeeze(i, axis=[1]) for i in tf.split(y, num_steps, 1)]

    #loss_weights = [tf.ones([batch_size]) for i in range(num_steps)]
    losses = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y_as_list,
                                                            logits=logits)
    #losses = tf.nn.seq2seq.sequence_loss_by_example(logits, y_as_list, loss_weights)  # TensorFlow 1.0 style
    total_loss = tf.reduce_mean(losses)
    train_step = tf.train.AdamOptimizer(learning_rate).minimize(total_loss)

    return dict(
        x = x,
        y = y,
        init_state = init_state,
        final_state = final_state,
        total_loss = total_loss,
        train_step = train_step
    )

Training function
- Similar to the earlier example

'''Function that trains the RNN'''
def train_rnn(g, num_epochs, num_steps=num_steps, batch_size=batch_size, verbose=True, save=False):
    tf.set_random_seed(2345)   # seed value not preserved in the source; any fixed integer works
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())   # replaces the deprecated tf.initialize_all_variables()
        training_losses = []
        for idx, epoch in enumerate(gen_epochs(num_epochs, num_steps, batch_size)):
            training_loss = 0
            steps = 0
            training_state = None
            for X, Y in epoch:
                steps += 1
                feed_dict = {g['x']: X, g['y']: Y}
                if training_state is not None:
                    # carry the final state of the previous batch into this batch
                    feed_dict[g['init_state']] = training_state
                training_loss_, training_state, _ = sess.run([g['total_loss'],
                                                              g['final_state'],
                                                              g['train_step']],
                                                             feed_dict=feed_dict)
                training_loss += training_loss_
            if verbose:
                print('epoch {0} average loss: {1}'.format(idx, training_loss/steps))
            training_losses.append(training_loss/steps)

        if isinstance(save, str):
            g['saver'].save(sess, save)   # requires a graph that defines a saver, e.g. build_final_graph
    return training_losses

Run it:

start_time = time.time()
g = build_basic_rnn_graph_with_list()
print("Graph build time:", time.time()-start_time)
start_time = time.time()
train_rnn(g, 3)
print("Training time:", time.time()-start_time)

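Results
- Time to build the graph: 113.43532419204712 seconds
- Running time for 3 epochs:

epoch 0 average loss: …
epoch 1 average loss: …
epoch 2 average loss: …
Training time: …

Building the graph is clearly very slow, even though there is only a single layer of cells here.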

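3. Using `dynamic_rnn`

We already used it in our first example. Here we also use MultiRNNCell to build a multi-layer cell, which is explained in more detail below.
Building the model:
- tf.nn.embedding_lookup(params, ids) looks up the representation of ids in params, much like indexing a matrix with an array. Here a 2-D ids array is looked up in the 2-D embeddings: each number in a row of ids selects one row of embeddings, so the result has shape [batch_size, num_steps, state_size]. For the concrete output, see here.
- I think of this as a learned representation of each character; the static_rnn version above used a one-hot representation instead.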
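As a NumPy analogy (not TensorFlow itself), an embedding lookup with a 2-D ids array behaves like fancy indexing `embeddings[ids]`, and it gives the same numbers as multiplying the one-hot encoding by the embedding matrix. That equivalence is why the embedding can replace the one-hot input of the static_rnn version. The sizes below are arbitrary toy values.

```python
import numpy as np

num_classes, state_size = 5, 3          # toy vocabulary and embedding sizes
batch_size, num_steps = 2, 4

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(num_classes, state_size))          # one row per character
ids = rng.integers(0, num_classes, size=(batch_size, num_steps))

# Lookup: each id selects one row -> [batch_size, num_steps, state_size]
looked_up = embeddings[ids]

# Same result via one-hot encoding followed by a matrix product
one_hot = np.eye(num_classes)[ids]      # [batch_size, num_steps, num_classes]
via_one_hot = one_hot @ embeddings      # [batch_size, num_steps, state_size]

print(looked_up.shape)
```

The lookup avoids building the large, mostly-zero one-hot tensor, which matters once the vocabulary grows.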

'''The dynamic_rnn approach
   - our own cell implementation and the static_rnn example both kept every step's tensor in a list, which makes building the graph slow
   - dynamic_rnn runs the time loop inside the graph (a tf.while_loop), so the graph does not grow with num_steps
'''
def build_multilayer_lstm_graph_with_dynamic_rnn(
    state_size = state_size,
    num_classes = num_classes,
    batch_size = batch_size,
    num_steps = num_steps,
    num_layers = 3,
    learning_rate = learning_rate
    ):
    reset_graph()
    x = tf.placeholder(tf.int32, [batch_size, num_steps], name='x')
    y = tf.placeholder(tf.int32, [batch_size, num_steps], name='y')
    embeddings = tf.get_variable(name='embedding_matrix', shape=[num_classes, state_size])
    '''The input is 3-D: [batch_size, num_steps, state_size]
        - embedding_lookup(params, ids) looks up ids in params, much like indexing a matrix with an array;
          each number in a row of ids selects one row of embeddings, giving [batch_size, num_steps, state_size]
    '''
    rnn_inputs = tf.nn.embedding_lookup(params=embeddings, ids=x)
    # one LSTMCell per layer, so that each layer has its own weights
    cells = [tf.nn.rnn_cell.LSTMCell(num_units=state_size, state_is_tuple=True) for _ in range(num_layers)]
    cell = tf.nn.rnn_cell.MultiRNNCell(cells=cells, state_is_tuple=True)
    init_state = cell.zero_state(batch_size, dtype=tf.float32)
    '''dynamic_rnn'''
    rnn_outputs, final_state = tf.nn.dynamic_rnn(cell=cell, inputs=rnn_inputs,
                                                 initial_state=init_state)
    with tf.variable_scope('softmax'):
        W = tf.get_variable('W', [state_size, num_classes])
        b = tf.get_variable('b', [num_classes], initializer=tf.constant_initializer(0.0))

    rnn_outputs = tf.reshape(rnn_outputs, [-1, state_size])   # flatten to 2-D: [batch_size*num_steps, state_size]
    y_reshape = tf.reshape(y, [-1])
    logits = tf.matmul(rnn_outputs, W) + b                     # matrix multiplication
    total_loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits, labels=y_reshape))
    train_step = tf.train.AdamOptimizer(learning_rate).minimize(total_loss)

    return dict(x = x,
                y = y,
                init_state = init_state,
                final_state = final_state,
                total_loss = total_loss,
                train_step = train_step)

Then just call it:

start_time = time.time()
g = build_multilayer_lstm_graph_with_dynamic_rnn()
print("Graph build time:", time.time()-start_time)
start_time = time.time()
train_rnn(g, 3)
print("Training time:", time.time()-start_time)

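Results (note that this is a 3-layer LSTM):
- Time to build the graph: 7.616888523101807 seconds, much faster than the static_rnn version
- Training time (this is a 3-layer LSTM, so it is still fairly slow):

epoch 0 average loss: …
epoch 1 average loss: …
epoch 2 average loss: …
Training time: …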

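4. Implementation with `tf.scan`

If you are not familiar with tf.scan, I recommend reading the official API documentation; it is somewhat involved.
- Alternatively, there is an introduction on YouTube: click here to view
scan is a higher-order function. Its usual computation is: given a sequence $[x_0, x_1, \dots, x_n]$ and an initial state $y_{-1}$, compute $y_t = f(x_t, y_{t-1})$ step by step to obtain the sequence $[y_0, y_1, \dots, y_n]$. (In tf.scan the function receives the accumulator first, as fn(a, x).)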
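To make these semantics concrete, here is a minimal pure-Python model of scan. It is an illustration of the idea, not TensorFlow's implementation: tf.scan stacks the results into tensors, while this sketch returns a list. The toy cell `toy_cell` is hypothetical; its accumulator is an (output, state) tuple, the same shape as the `(tf.zeros(...), init_state)` initializer used in the graph below.

```python
def scan(fn, elems, initializer):
    """Pure-Python model of scan: y_t = fn(y_{t-1}, x_t), starting from y_{-1}."""
    acc = initializer          # plays the role of y_{-1}
    outputs = []
    for x in elems:            # iterate over the first dimension of elems
        acc = fn(acc, x)       # accumulator first, element second (same order as tf.scan)
        outputs.append(acc)
    return outputs             # [y_0, y_1, ..., y_n]; the initializer itself is not included


# Running sum: y_t = y_{t-1} + x_t
print(scan(lambda a, x: a + x, [1, 2, 3, 4], 0))


def toy_cell(a, x):
    """Hypothetical cell: the accumulator is (output, state); only the state is read."""
    _, state = a
    new_state = 0.5 * state + x
    return (new_state * 2, new_state)


outs = scan(toy_cell, [1.0, 1.0, 1.0], (0.0, 0.0))
final_state = outs[-1][1]      # only the last step's state is needed as final_state
```

Because every intermediate accumulator is returned, the last element has to be picked out explicitly to obtain the final state.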
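Build the computation graph
- tf.transpose(rnn_inputs, [1,0,2]) swaps the first and second dimensions of rnn_inputs, giving [num_steps, batch_size, state_size]. dynamic_rnn has a time_major parameter that specifies whether num_steps is the first dimension; it defaults to False, i.e. num_steps is not the first dimension.
- tf.scan splits elems along the first dimension, so each iteration receives the data for one step (similar to our static_rnn example).
- The argument a has the same structure as initializer, so a[1] is the corresponding state; the cell needs both x and state to compute the next step.
- At each iteration the cell returns an rnn_output with shape=(batch_size, state_size) and the corresponding state; after num_steps iterations rnn_outputs has shape (num_steps, batch_size, state_size), and the states are stacked the same way.
- Every input x produces a state (collected in final_states); we only need the last one, final_state.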
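The shape bookkeeping in these bullets can be checked with NumPy (an analogy; the array here stands in for the TensorFlow tensors). It also shows a pitfall relevant to slicing out the last step: squeezing without an axis removes every size-1 dimension, so with batch_size=1, as in text generation, the batch dimension would disappear as well.

```python
import numpy as np

batch_size, num_steps, state_size = 2, 4, 3
rnn_inputs = np.arange(batch_size * num_steps * state_size).reshape(batch_size, num_steps, state_size)

# Batch-major -> time-major, like tf.transpose(rnn_inputs, [1, 0, 2])
time_major = np.transpose(rnn_inputs, (1, 0, 2))     # [num_steps, batch_size, state_size]

# Iterating over axis 0 yields one time step for the whole batch, as scan does
steps = [step for step in time_major]

# Stand-in for states stacked along axis 0; the final state is the last slice
stacked_states = time_major.astype(float)
final_state = stacked_states[-1]                     # [batch_size, state_size]

# Pitfall: a [1, 1, state_size] slice (last step, batch_size=1)
one = np.zeros((1, 1, state_size))
print(np.squeeze(one).shape, np.squeeze(one, axis=0).shape)
```

Squeezing only axis 0 keeps the result at [batch_size, state_size] regardless of the batch size.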

'''Use scan to get the same effect as dynamic_rnn'''
def build_multilayer_lstm_graph_with_scan(
    state_size = state_size,
    num_classes = num_classes,
    batch_size = batch_size,
    num_steps = num_steps,
    num_layers = 3,
    learning_rate = learning_rate
    ):
    reset_graph()
    x = tf.placeholder(tf.int32, [batch_size, num_steps], name='x')
    y = tf.placeholder(tf.int32, [batch_size, num_steps], name='y')
    embeddings = tf.get_variable(name='embedding_matrix', shape=[num_classes, state_size])
    '''The input is 3-D: [batch_size, num_steps, state_size]'''
    rnn_inputs = tf.nn.embedding_lookup(params=embeddings, ids=x)
    '''Build a multi-layer cell: create one cell per layer, then combine them with MultiRNNCell'''
    cells = [tf.nn.rnn_cell.LSTMCell(num_units=state_size, state_is_tuple=True) for _ in range(num_layers)]
    cell = tf.nn.rnn_cell.MultiRNNCell(cells=cells, state_is_tuple=True)
    init_state = cell.zero_state(batch_size, dtype=tf.float32)
    '''tf.scan
       - tf.transpose(rnn_inputs, [1,0,2]) makes the input time-major: [num_steps, batch_size, state_size]
       - tf.scan splits elems along the first dimension, so each call sees one time step
       - a has the same structure as initializer, (output, state), so a[1] is the previous state
       - outputs are stacked to (num_steps, batch_size, state_size); the states are stacked the same way
    '''
    def testfn(a, x):
        return cell(x, a[1])   # returns (output, new_state)
    rnn_outputs, final_states = tf.scan(fn=testfn, elems=tf.transpose(rnn_inputs, [1,0,2]),
                                        initializer=(tf.zeros([batch_size, state_size]), init_state)
                                        )
    '''Or, with a lambda'''
    #rnn_outputs, final_states = tf.scan(lambda a,x: cell(x, a[1]), tf.transpose(rnn_inputs, [1,0,2]),
                                        #initializer=(tf.zeros([batch_size, state_size]),init_state))
    # keep only the last time step of each layer's c and h; squeeze axis 0 only, so batch_size=1 keeps its batch dimension
    final_state = tuple([tf.nn.rnn_cell.LSTMStateTuple(
        tf.squeeze(tf.slice(c, [num_steps-1,0,0], [1,batch_size,state_size]), axis=[0]),
        tf.squeeze(tf.slice(h, [num_steps-1,0,0], [1,batch_size,state_size]), axis=[0])) for c, h in final_states])

    with tf.variable_scope('softmax'):
        W = tf.get_variable('W', [state_size, num_classes])
        b = tf.get_variable('b', [num_classes], initializer=tf.constant_initializer(0.0))

    rnn_outputs = tf.reshape(rnn_outputs, [-1, state_size])
    y_reshape = tf.reshape(y, [-1])
    logits = tf.matmul(rnn_outputs, W) + b
    total_loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits, labels=y_reshape))
    train_step = tf.train.AdamOptimizer(learning_rate).minimize(total_loss)

    return dict(x = x,
                y = y,
                init_state = init_state,
                final_state = final_state,
                total_loss = total_loss,
                train_step = train_step)

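Results
- Time to build the graph: 8.685610055923462 seconds (slightly slower than dynamic_rnn)
- Training time (about the same as dynamic_rnn)
The scan version is only slightly slower than dynamic_rnn, but it is more flexible and makes each step of the execution explicit. It also makes the code easier to modify (for example, to skip a time step and go directly from the state at t-2 to t).

epoch 0 average loss: …
epoch 1 average loss: …
epoch 2 average loss: …
Training time: …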

III. About multi-layer `RNN`

1. Structure

An LSTM has two states: c, the memory cell, and h, the hidden state. TensorFlow keeps them together as a tuple, which is why the LSTMCell and MultiRNNCell calls above pass state_is_tuple=True; this form also runs faster.
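To show what the (c, h) tuple holds, here is a NumPy sketch of a single LSTM step. It follows, to my understanding, the i, j, f, o gate split and the default forget_bias of 1.0 used by TensorFlow's LSTMCell, but it is an illustration with random toy weights, not TensorFlow's code; the function name `lstm_step` is hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x, state, W, b, forget_bias=1.0):
    """One LSTM step; state is the tuple (c, h). Gate order: i, j, f, o."""
    c_prev, h_prev = state
    z = np.concatenate([x, h_prev], axis=1) @ W + b     # [batch, 4 * units]
    i, j, f, o = np.split(z, 4, axis=1)
    c = sigmoid(f + forget_bias) * c_prev + sigmoid(i) * np.tanh(j)   # memory cell
    h = sigmoid(o) * np.tanh(c)                                        # hidden state, also the output
    return h, (c, h)


rng = np.random.default_rng(0)
batch, input_dim, units = 2, 3, 4
W = rng.normal(scale=0.1, size=(input_dim + units, 4 * units))
b = np.zeros(4 * units)
state = (np.zeros((batch, units)), np.zeros((batch, units)))   # zero state, like cell.zero_state
x = rng.normal(size=(batch, input_dim))
output, (c, h) = lstm_step(x, state, W, b)
```

The step returns the output and the new (c, h) pair separately; with state_is_tuple=True TensorFlow likewise keeps c and h as two tensors instead of concatenating them.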
The multi-layer structure is shown below:

[Figure: multi-layer RNN structure]
We can wrap the layers so that, from the outside, they look like a single cell:

[Figure: the multi-layer RNN wrapped as a single cell]

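2. Code

In TensorFlow this is done with tf.nn.rnn_cell.MultiRNNCell:
- declare the cells
- pass a list of num_layers cells to MultiRNNCell. Writing [cell]*num_layers reuses one cell object; in recent TensorFlow 1.x versions that makes all layers share one set of weights (or raises a variable-reuse error), so create a separate cell for each layer
- note that for LSTM you set state_is_tuple=True

cells = [tf.nn.rnn_cell.LSTMCell(num_units=state_size, state_is_tuple=True) for _ in range(num_layers)]
cell = tf.nn.rnn_cell.MultiRNNCell(cells=cells, state_is_tuple=True)
init_state = cell.zero_state(batch_size, dtype=tf.float32)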
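The "wrap the stack so it looks like one cell" idea can be sketched in NumPy. The helpers `make_rnn_cell` and `multi_cell` are hypothetical illustrations, not TensorFlow API: each layer has its own weights, one layer's output is the next layer's input, and the combined state is a tuple with one entry per layer.

```python
import numpy as np


def make_rnn_cell(input_dim, units, rng):
    """Hypothetical toy cell with its own weights: h_t = tanh([x, h_{t-1}] @ W)."""
    W = rng.normal(scale=0.1, size=(input_dim + units, units))

    def cell(x, h):
        h_new = np.tanh(np.concatenate([x, h], axis=1) @ W)
        return h_new, h_new          # (output, new state)
    return cell


def multi_cell(cells):
    """Wrap a list of cells so they behave like one cell whose state is a tuple."""
    def cell(x, states):
        new_states = []
        for layer_cell, h in zip(cells, states):
            x, h_new = layer_cell(x, h)   # this layer's output is the next layer's input
            new_states.append(h_new)
        return x, tuple(new_states)
    return cell


rng = np.random.default_rng(0)
num_layers, batch, units = 3, 2, 4
cells = [make_rnn_cell(units, units, rng) for _ in range(num_layers)]   # one weight set per layer
stacked = multi_cell(cells)
states = tuple(np.zeros((batch, units)) for _ in range(num_layers))
output, states = stacked(rng.normal(size=(batch, units)), states)
```

From the caller's side, `stacked` has the same call signature as a single cell, which is what lets dynamic_rnn or scan drive a multi-layer network unchanged.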

IV. Dropout

Dropout is applied to the input and output of a cell layer, not to the recurrent connections.

1. A single-layer `cell`

In static_rnn:
- declare a placeholder: keep_prob = tf.placeholder(tf.float32, name='keep_prob')
- inputs: rnn_inputs = [tf.nn.dropout(rnn_input, keep_prob) for rnn_input in rnn_inputs]
- outputs: rnn_outputs = [tf.nn.dropout(rnn_output, keep_prob) for rnn_output in rnn_outputs]
- add it to feed_dict: feed_dict = {g['x']: X, g['y']: Y, g['keep_prob']: keep_prob}
In dynamic_rnn or scan:
- apply it directly to the tensors, the rest is the same: rnn_inputs = tf.nn.dropout(rnn_inputs, keep_prob), and likewise rnn_outputs = tf.nn.dropout(rnn_outputs, keep_prob)
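tf.nn.dropout scales the surviving units by 1/keep_prob, so the expected value of each activation is unchanged and nothing needs rescaling at generation time (when keep_prob is fed as 1.0). A NumPy sketch of this inverted dropout follows; the helper name `dropout` is hypothetical and only mimics the behaviour.

```python
import numpy as np


def dropout(x, keep_prob, rng):
    """Inverted dropout: zero each unit with prob 1 - keep_prob, scale survivors by 1 / keep_prob."""
    mask = rng.random(x.shape) < keep_prob
    return np.where(mask, x / keep_prob, 0.0)


rng = np.random.default_rng(0)
x = np.ones((1000, 100))
y = dropout(x, keep_prob=0.8, rng=rng)
kept_fraction = np.mean(y != 0)   # about 0.8 of the units survive
print(kept_fraction, y.mean())    # the mean stays close to the original value of 1.0
```

Each surviving value becomes 1 / 0.8 = 1.25, which balances out the 20% of units that were zeroed.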

2. Multi-layer `cell`

We said earlier that MultiRNNCell lets us treat several layers as one cell, so how do we apply dropout to each layer?
This can be done with tf.nn.rnn_cell.DropoutWrapper.
Option 1: cell = tf.nn.rnn_cell.DropoutWrapper(cell, input_keep_prob=input_keep_prob, output_keep_prob=output_keep_prob)
- If input_keep_prob and output_keep_prob are both 0.9, a value passed between two layers goes through the lower layer's output dropout and the upper layer's input dropout, so its keep probability is 0.9*0.9=0.81.
Option 2: wrap the basic cell with only input_keep_prob (or only output_keep_prob), and wrap the MultiRNNCell with a single input_keep_prob or output_keep_prob as well:

cells = [tf.nn.rnn_cell.DropoutWrapper(
             tf.nn.rnn_cell.LSTMCell(num_units=state_size, state_is_tuple=True),
             input_keep_prob=keep_prob)                  # input dropout on every layer
         for _ in range(num_layers)]
cell = tf.nn.rnn_cell.MultiRNNCell(cells=cells, state_is_tuple=True)
cell = tf.nn.rnn_cell.DropoutWrapper(cell, output_keep_prob=keep_prob)   # output dropout on the top layer only
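The 0.9 × 0.9 = 0.81 figure from Option 1 can be checked with a quick Monte Carlo simulation in NumPy. Only the two independent dropout masks are simulated; a unit crossing between layers survives only if both masks keep it.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
output_keep_prob = 0.9     # lower layer's output dropout
input_keep_prob = 0.9      # upper layer's input dropout

kept_by_output = rng.random(n) < output_keep_prob
kept_by_input = rng.random(n) < input_keep_prob
survive = np.mean(kept_by_output & kept_by_input)
print(survive)   # close to 0.9 * 0.9 = 0.81
```

This compounding is the reason Option 2 applies only one kind of dropout per layer boundary.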

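V. Layer normalization ( `Layer Normalization` )

1. Overview

Layer Normalization was inspired by Batch Normalization and is aimed at RNNs; see the related paper for details.
Batch Normalization mainly targets conventional deep neural networks and CNNs; for how Batch Normalization works and its derivation, see my earlier blog post.
Its benefits include faster training and better results.

2. Code

Copy the source code of LSTMCell and modify it.
The layer normalization function:
- the input tensor is 2-D; each row (one example) is normalized over its features, rather than over the batch as in batch normalization
- tf.nn.moments computes the mean and variance of the tensor
- the result is then scaled (scale) and shifted (shift)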

'''layer normalization'''
def ln(tensor, scope=None, epsilon=1e-5):   # epsilon: small constant for numerical stability (original value not preserved)
  """Layer-normalize a 2-D tensor [batch, features] over its feature axis."""
  assert(len(tensor.get_shape()) == 2)
  m, v = tf.nn.moments(tensor, [1], keep_dims=True)   # per-example mean and variance
  if not isinstance(scope, str):
    scope = ''
  with tf.variable_scope(scope+'layer_norm'):
    scale = tf.get_variable(name='scale',
                            shape=[tensor.get_shape()[1]],
                            initializer=tf.constant_initializer(1))
    shift = tf.get_variable('shift',
                            [tensor.get_shape()[1]],
                            initializer=tf.constant_initializer(0))
  LN_initial = (tensor - m) / tf.sqrt(v + epsilon)
  return LN_initial*scale + shift
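The same computation in NumPy makes it easy to check the key property: each row ends up with mean 0 and standard deviation close to 1 before scale and shift, and a row's result does not depend on the other rows in the batch (unlike batch normalization). `layer_norm` is a hypothetical helper mirroring `ln` above; like tf.nn.moments, it uses the biased variance.

```python
import numpy as np


def layer_norm(x, scale, shift, epsilon=1e-5):
    """Normalize each row (one example) over its features, then scale and shift."""
    mean = x.mean(axis=1, keepdims=True)     # per-example mean
    var = x.var(axis=1, keepdims=True)       # per-example (biased) variance
    return (x - mean) / np.sqrt(var + epsilon) * scale + shift


rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=5.0, size=(4, 8))     # batch of 4 examples, 8 features
out = layer_norm(x, scale=np.ones(8), shift=np.zeros(8))
print(out.mean(axis=1), out.std(axis=1))
```

Because the statistics are per example, the function behaves identically at training time and at generation time with batch_size=1.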

In the call method of LSTMCell, apply layer normalization to i, j, f, o:
- set bias=False in _linear, because layer normalization adds its own shift

# Fragment from inside LSTMCell.call(); bias=False because layer normalization adds a shift
lstm_matrix = _linear([inputs, m_prev], 4 * self._num_units, bias=False)
i, j, f, o = array_ops.split(
    value=lstm_matrix, num_or_size_splits=4, axis=1)
# apply layer normalization to each gate pre-activation, each with its own variables
i = ln(i, scope='i/')
j = ln(j, scope='j/')
f = ln(f, scope='f/')
o = ln(o, scope='o/')

Build the computation graph
- choose between a plain RNN, GRU, or LSTM
- Dropout
- Layer Normalization

'''The final combined model:
   - plain RNN, GRU, LSTM, or layer-normalized LSTM
   - dropout
   - layer normalization
'''
from LayerNormalizedLSTMCell import LayerNormalizedLSTMCell  # the modified LSTMCell with layer normalization (your own file)

def build_final_graph(
    cell_type = None,
    state_size = state_size,
    num_classes = num_classes,
    batch_size = batch_size,
    num_steps = num_steps,
    num_layers = None,          # not used in this single-layer version (original default not preserved)
    build_with_dropout = False,
    learning_rate = learning_rate):

    reset_graph()
    x = tf.placeholder(tf.int32, [batch_size, num_steps], name='x')
    y = tf.placeholder(tf.int32, [batch_size, num_steps], name='y')
    keep_prob = tf.placeholder(tf.float32, name='keep_prob')
    embeddings = tf.get_variable('embedding_matrix', [num_classes, state_size])
    rnn_inputs = tf.nn.embedding_lookup(embeddings, x)
    if cell_type == 'GRU':
        cell = tf.nn.rnn_cell.GRUCell(state_size)
    elif cell_type == 'LSTM':
        cell = tf.nn.rnn_cell.LSTMCell(state_size, state_is_tuple=True)
    elif cell_type == 'LN_LSTM':
        cell = LayerNormalizedLSTMCell(state_size)  # our modified cell, imported from its own file
    else:
        cell = tf.nn.rnn_cell.BasicRNNCell(state_size)
    if build_with_dropout:
        cell = tf.nn.rnn_cell.DropoutWrapper(cell, input_keep_prob=keep_prob)   # feed g['keep_prob'] during training

    init_state = cell.zero_state(batch_size, tf.float32)
    '''dynamic_rnn'''
    rnn_outputs, final_state = tf.nn.dynamic_rnn(cell, rnn_inputs, initial_state=init_state)
    with tf.variable_scope('softmax'):
        W = tf.get_variable('W', [state_size, num_classes])
        b = tf.get_variable('b', [num_classes], initializer=tf.constant_initializer(0.0))
    rnn_outputs = tf.reshape(rnn_outputs, [-1, state_size])
    y_reshaped = tf.reshape(y, [-1])
    logits = tf.matmul(rnn_outputs, W) + b

    predictions = tf.nn.softmax(logits)   # probabilities over the vocabulary, used for text generation

    total_loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits, labels=y_reshaped))
    train_step = tf.train.AdamOptimizer(learning_rate).minimize(total_loss)

    return dict(
        x = x,
        y = y,
        keep_prob = keep_prob,
        init_state = init_state,
        final_state = final_state,
        total_loss = total_loss,
        train_step = train_step,
        preds = predictions,
        saver = tf.train.Saver()
    )

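VI. Generating text

1. Overview

After training, save the trained variables to a checkpoint on local disk; later, rebuild the graph and restore them directly.
We give the first character, and the RNN then generates characters one at a time, each based on the previous character (plus the carried-over state).
- So num_steps=1 and batch_size=1: the prediction has shape (1, num_classes), and at each step we sample a single character from it.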

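2. Code

Build the graph (just pass the arguments): g = build_final_graph(cell_type='LN_LSTM', num_steps=1, batch_size=1)
Generate text:
- load the trained checkpoint
- get the integer for the given first character
- loop over the number of characters to generate, producing one character per iteration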

'''Generate text'''
def generate_characters(g, checkpoint, num_chars, prompt='A', pick_top_chars=None):
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        g['saver'].restore(sess, checkpoint)   # load the trained variables
        state = None
        current_char = vocab_to_idx[prompt]    # integer for the given first character
        chars = [current_char]
        for i in range(num_chars):             # number of characters to generate
            if state is not None:              # first step: state is None because the graph starts from the zero state
                feed_dict={g['x']: [[current_char]], g['init_state']: state}   # feed the current character and state
            else:
                feed_dict={g['x']: [[current_char]]}
            preds, state = sess.run([g['preds'],g['final_state']], feed_dict)   # predicted probabilities, shape (1, num_classes)
            if pick_top_chars is not None:              # keep only the pick_top_chars most likely characters
                p = np.squeeze(preds)
                p[np.argsort(p)[:-pick_top_chars]] = 0  # set the rest to 0
                p = p / np.sum(p)                       # np.random.choice requires p to sum to 1
                current_char = np.random.choice(vocab_size, 1, p=p)[0]    # sample one character by probability
            else:
                current_char = np.random.choice(vocab_size, 1, p=np.squeeze(preds))[0]

            chars.append(current_char)
    chars = map(lambda x: idx_to_vocab[x], chars)
    result = "".join(chars)
    print(result)
    return result
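The pick_top_chars branch is top-k sampling. Here it is isolated in a small NumPy sketch (the helper name `sample_top_k` and the toy distribution are hypothetical) so the effect can be checked: only the k most likely indices can ever be drawn.

```python
import numpy as np


def sample_top_k(probs, k, rng):
    """Keep the k largest probabilities, renormalize, and sample one index."""
    p = np.array(probs, dtype=np.float64)      # copy so the caller's array is untouched
    p[np.argsort(p)[:-k]] = 0.0                # zero everything except the top k
    p = p / np.sum(p)                          # probabilities must sum to 1 for choice()
    return rng.choice(len(p), p=p)


rng = np.random.default_rng(0)
probs = np.array([0.05, 0.40, 0.10, 0.30, 0.15])
samples = [sample_top_k(probs, k=2, rng=rng) for _ in range(1000)]
print(sorted(set(samples)))
```

A small k keeps the generated text closer to the most likely spellings, while sampling from the full distribution (pick_top_chars=None) produces more varied but noisier output.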

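Results
- Because training takes a long time, I trained an LSTM for 30 epochs here; the result is below
- You can adjust the parameters yourself and may get better results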

ANK
O: HFOFMFRone s the statlighte thithe thit.

BODEN --

I I's a tomir.
I't
shis and on ar tald the theand this he sile be cares hat s ond tho fo hour he singe sime shind and somante tat ond treang tatsing of the an the to to fook.. Ir ard the with ane she stale..
ANTE --

KINE
Show the ard and a beat the weringe be thing or.

Bo hith tho he melan to the mute steres.

The singer stis ard stis.

BACE CANKONS CORE
Sard the sids ing tho the the sackes tom the

IN
We stoe shit a dome thor

ate seomser hith.

That
thow ound


TANTONT. SEAT THONTITE SERTI                           
SHe the mathe a tomoner
ind is ingit ofres treacentit. Sher stard on this the tor an the candin he whor he sath heres and
stha dortour tit thas stand. I'd and or a

Reference

- https://r2rt.com/recurrent-neural-networks-in-tensorflow-ii.html
- https://karpathy.github.io/2015/05/21/rnn-effectiveness/
- http://jmlr.org/proceedings/papers/v37/ioffe15.pdf
- TensorFlow scan:
  - https://www.tensorflow.org/api_docs/python/tf/scan
  - https://www.youtube.com/watch?v=A6qJMB3stE4&t=621s
