chatbot1.2
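How do we handle embeddings for polysemous words?
- Use one vector per sense and superimpose them. Within a given facet (subspace), the word then lies close to the vectors that share that sense.
How do we identify phrases and learn vectors for them?
- Words that repeatedly occur together are treated as a phrase.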
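The co-occurrence idea above can be sketched roughly as follows. This is only an illustration, assuming a word2vec-style phrase score `(count(a, b) - delta) / (count(a) * count(b))`; the function name `find_phrases`, the toy corpus, and the `delta` and `threshold` values are all made up for the example.

```python
from collections import Counter

def find_phrases(sentences, delta=1, threshold=0.1):
    """Return bigrams whose co-occurrence score exceeds `threshold`.

    score(a, b) = (count(a, b) - delta) / (count(a) * count(b))
    `delta` discounts pairs seen only rarely; both defaults are illustrative.
    """
    unigrams, bigrams = Counter(), Counter()
    for sent in sentences:
        unigrams.update(sent)
        bigrams.update(zip(sent, sent[1:]))  # adjacent word pairs
    phrases = {}
    for (a, b), n_ab in bigrams.items():
        score = (n_ab - delta) / (unigrams[a] * unigrams[b])
        if score > threshold:
            phrases[(a, b)] = score
    return phrases

# Toy corpus, invented for this sketch.
corpus = [
    ["i", "love", "new", "york"],
    ["new", "york", "is", "big"],
    ["she", "moved", "to", "new", "york"],
    ["i", "love", "big", "cities"],
]
phrases = find_phrases(corpus)
```

On this toy corpus only the pairs that occur more than once, ("new", "york") and ("i", "love"), pass the threshold; a detected pair could then be merged into a single token and given its own vector.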
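How do we handle new words that have never been seen before?
- Average the vectors of the surrounding context, or guess the meaning from the context.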
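A minimal sketch of the context-averaging idea, assuming the unseen word is simply assigned the mean of the known vectors around it. The helper name `average_context_vector` and the 2-dimensional toy embeddings are hypothetical.

```python
def average_context_vector(context_words, embeddings):
    """Estimate a vector for an unseen word as the mean of its known context vectors."""
    known = [embeddings[w] for w in context_words if w in embeddings]
    if not known:
        return None  # no known context word, so there is nothing to average
    dim = len(known[0])
    return [sum(vec[d] for vec in known) / len(known) for d in range(dim)]

# Toy 2-d vectors, invented for illustration.
embeddings = {
    "drink": [1.0, 0.0],
    "cup": [0.0, 1.0],
    "hot": [1.0, 1.0],
}
# Context of some unseen word; "a" and "of" have no vector and are skipped.
vec = average_context_vector(["drink", "a", "hot", "cup", "of"], embeddings)
```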
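How much data is needed to train word vectors with unsupervised learning?
- GloVe
- Word2vec
How can the vectors be refined using existing data?
Retrofit
- The gray nodes are the vectors learned by unsupervised learning; the white nodes are the knowledge-base words and their relations. The white nodes are trained to stay as close as possible to the gray ones while strengthening or weakening the relations between the knowledge-base words. The result is a new set of vectors that combines the features of both, which gives the following:
- ConceptNet embedding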
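The retrofitting step can be sketched as an iterative averaging update. This is a simplified illustration, not a reference implementation: it assumes a weight `alpha = 1` on the original vector and an equal weight `1 / degree` on each knowledge-base neighbour, and the words, vectors, and relations are toy data.

```python
def retrofit(vectors, neighbors, iterations=10, alpha=1.0):
    """Pull each vector toward its knowledge-base neighbours while keeping it
    close to the original (unsupervised) vector.

    vectors:   dict word -> list[float], the original embeddings (gray nodes)
    neighbors: dict word -> list of related words from the knowledge base
    """
    new = {w: list(v) for w, v in vectors.items()}  # retrofitted copies (white nodes)
    for _ in range(iterations):
        for word, related in neighbors.items():
            related = [r for r in related if r in new]
            if word not in vectors or not related:
                continue
            beta = 1.0 / len(related)  # equal weight per neighbour (assumption)
            dim = len(vectors[word])
            new[word] = [
                (alpha * vectors[word][d] + beta * sum(new[r][d] for r in related))
                / (alpha + beta * len(related))
                for d in range(dim)
            ]
    return new

# Toy data: "happy" and "glad" are related in the knowledge base, "table" is not.
original = {"happy": [1.0, 0.0], "glad": [0.0, 1.0], "table": [-1.0, -1.0]}
kb = {"happy": ["glad"], "glad": ["happy"]}
retrofitted = retrofit(original, kb)
```

After the update, "happy" and "glad" have moved toward each other, while "table", which has no knowledge-base relation, keeps its original vector.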
RNN
https://r2rt.com/recurrent-neural-networks-in-tensorflow-i.html
demo1
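- For this task we can work out in theory what level of accuracy a roughly correct model and a very accurate model should reach, and compare the results against those theoretical figures.
- The input is a simple sequence of 0s and 1s. Although they are arranged as a sequence, the elements are independent of one another.
- The output sequence is not fully independent: part of its information depends on time. Every output node is related to the input sequence. The probability that a node is 0 or 1 is determined by a prior probability and by the input values at positions t-3 and t-8, as shown in the figure.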
import numpy as np
import tensorflow as tf  # the code below uses the TensorFlow 1.x graph API (tf.placeholder, tf.Session)
%matplotlib inline
import matplotlib.pyplot as plt
# Global config variables
num_steps = 5 # number of truncated backprop steps ('n' in the discussion above)
batch_size = 200
num_classes = 2
state_size = 4
learning_rate = 0.1
def gen_data(size=1000000):
    """
    Generate the synthetic sequence data according to the rules above.
    :param size: total length of the input and output sequences
    :return: X, Y: the input and output sequences, rank-1 numpy arrays (i.e. vectors)
    """
    X = np.array(np.random.choice(2, size=(size,)))
    Y = []
    for i in range(size):
        threshold = 0.5
        if X[i-3] == 1:
            threshold += 0.5
        if X[i-8] == 1:
            threshold -= 0.25
        if np.random.rand() > threshold:
            Y.append(0)
        else:
            Y.append(1)
    return X, np.array(Y)
# adapted from https://github.com/tensorflow/tensorflow/blob/master/tensorflow/models/rnn/ptb/reader.py
def gen_batch(raw_data, batch_size, num_steps):
    """
    Generate minibatches.
    :param raw_data: all the data, as an (input, output) tuple
    :param batch_size: number of samples in one minibatch; each sample is a sequence, so this is the number of sequences
    :param num_steps: length of each sample sequence
    :return: a generator that yields one minibatch at a time as an (input, output) tuple of sequences
    """
    raw_x, raw_y = raw_data
    data_length = len(raw_x)
    # partition raw data into batches and stack them vertically in a data matrix
    batch_partition_length = data_length // batch_size
    data_x = np.zeros([batch_size, batch_partition_length], dtype=np.int32)
    data_y = np.zeros([batch_size, batch_partition_length], dtype=np.int32)  # same shape as data_x
    for i in range(batch_size):
        data_x[i] = raw_x[batch_partition_length * i:batch_partition_length * (i + 1)]
        data_y[i] = raw_y[batch_partition_length * i:batch_partition_length * (i + 1)]
    # further divide batch partitions into num_steps for truncated backprop
    epoch_size = batch_partition_length // num_steps
    for i in range(epoch_size):
        x = data_x[:, i * num_steps:(i + 1) * num_steps]
        y = data_y[:, i * num_steps:(i + 1) * num_steps]
        yield (x, y)

def gen_epochs(n, num_steps):
    for i in range(n):
        yield gen_batch(gen_data(), batch_size, num_steps)
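To see what the partitioning in gen_batch does, here is a small pure-Python mirror of it on a toy sequence. The helper `toy_batches` is hypothetical and handles the inputs only; it is meant to show the layout, not to replace the numpy version.

```python
def toy_batches(raw, batch_size, num_steps):
    """Pure-Python mirror of the partitioning done by gen_batch (inputs only)."""
    part_len = len(raw) // batch_size
    # One contiguous slice of the raw data per row of the batch.
    rows = [raw[part_len * i: part_len * (i + 1)] for i in range(batch_size)]
    # Cut every row into windows of num_steps and yield them window by window.
    for i in range(part_len // num_steps):
        yield [row[i * num_steps:(i + 1) * num_steps] for row in rows]

batches = list(toy_batches(list(range(20)), batch_size=2, num_steps=5))
# batches[0] == [[0, 1, 2, 3, 4], [10, 11, 12, 13, 14]]
# batches[1] == [[5, 6, 7, 8, 9], [15, 16, 17, 18, 19]]
```

Each row of the next batch continues exactly where the same row of the previous batch stopped, which is why the final state of one batch can be fed in as the initial state of the next.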
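Calculate the perplexity
- In (3), the perfect model accounts for both the t-3 and the t-8 time dependencies.
- In (4), the imperfect model does not take time into account.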
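The theoretical values can be worked out directly from the rules in gen_data, where Y is 1 with probability 0.5, plus 0.5 if X[t-3] is 1, minus 0.25 if X[t-8] is 1. The sketch below computes the expected cross-entropy (in nats) for three cases: a model that learns no dependency, one that learns only t-3, and one that learns both. Perplexity is the exponential of the cross-entropy. The helper names `p_y1` and `entropy` are introduced just for this example.

```python
import math
from itertools import product

def p_y1(x3, x8):
    """P(Y_t = 1) given X_{t-3} and X_{t-8}, following the rules in gen_data."""
    return 0.5 + 0.5 * x3 - 0.25 * x8

def entropy(p):
    """Binary entropy in nats; 0 * log(0) is treated as 0."""
    return -sum(q * math.log(q) for q in (p, 1 - p) if q > 0)

cases = list(product([0, 1], repeat=2))  # (x3, x8), all four equally likely

# Model that ignores the inputs: it can only learn the marginal P(Y = 1).
marginal = sum(p_y1(x3, x8) for x3, x8 in cases) / 4
loss_no_dependency = entropy(marginal)

# Model that learns only the t-3 dependency: it averages over x8.
loss_t3_only = sum(
    entropy(sum(p_y1(x3, x8) for x8 in (0, 1)) / 2) for x3 in (0, 1)
) / 2

# Model that learns both dependencies: it knows the true conditional.
loss_both = sum(entropy(p_y1(x3, x8)) for x3, x8 in cases) / 4

perplexity_both = math.exp(loss_both)  # perplexity = exp(cross-entropy)

print(round(loss_no_dependency, 2), round(loss_t3_only, 2), round(loss_both, 2))
# → 0.66 0.52 0.45
```

A training loss that levels off near 0.52 therefore suggests the network has picked up only the t-3 dependency, while a loss near 0.45 suggests it has learned both.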
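A simple RNN model implementation
- Each cell passes its own state to the next cell as one input, and the next cell also takes its own observation as a second input. The two inputs are concatenated and mapped to a 4-dimensional vector, which is then mapped further to a 1-dimensional output. This is how an RNN model predicts an output from a given input.
- Default graph, variable_scope, name_scope
- Custom RNN cell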
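The data flow described above can be sketched in plain Python, without TensorFlow. This is an illustration under stated assumptions: the weights are random rather than trained, and the names `rnn_step`, `softmax`, `W_out`, and `b_out` are hypothetical. The sizes (2 classes, state size 4) match the config used in these notes, and the output layer here produces one probability per class.

```python
import math
import random

def rnn_step(x_one_hot, state, W, b):
    """One vanilla RNN step: new_state = tanh(concat(x, state) . W + b)."""
    joined = x_one_hot + state  # list concatenation, length num_classes + state_size
    return [
        math.tanh(sum(v * W[i][j] for i, v in enumerate(joined)) + b[j])
        for j in range(len(b))
    ]

def softmax(logits):
    """Turn logits into probabilities (shifted by the max for numerical stability)."""
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

num_classes, state_size = 2, 4
rng = random.Random(0)  # random, untrained weights: for shape illustration only
W = [[rng.uniform(-0.5, 0.5) for _ in range(state_size)]
     for _ in range(num_classes + state_size)]
b = [0.0] * state_size
W_out = [[rng.uniform(-0.5, 0.5) for _ in range(num_classes)]
         for _ in range(state_size)]
b_out = [0.0] * num_classes

state = [0.0] * state_size  # the first cell starts from an all-zero state
for bit in [1, 0, 1]:
    x = [1.0 if k == bit else 0.0 for k in range(num_classes)]  # one-hot encode the input
    state = rnn_step(x, state, W, b)  # same W and b are shared by every step
    logits = [sum(state[i] * W_out[i][j] for i in range(state_size)) + b_out[j]
              for j in range(num_classes)]
    probs = softmax(logits)  # predicted distribution over {0, 1} at this step
```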
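#### TensorFlow tip: tf.placeholder ####
- Adds a node to the computation graph that is used to hold data.
- A placeholder has no default data; its data is supplied entirely from outside. Note the differences between tf.Variable, tf.Tensor and tf.placeholder:
- Data fed in from outside: tf.placeholder
- Intermediate data: tf.Tensor, the output of a TensorFlow operation
- Parameters: tf.Variable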
"""
Placeholders
"""
x = tf.placeholder(tf.int32, [batch_size, num_steps], name='input_placeholder')
y = tf.placeholder(tf.int32, [batch_size, num_steps], name='labels_placeholder')
init_state = tf.zeros([batch_size, state_size])  # the current state, which is also one of the inputs to the next cell
"""
RNN Inputs
"""
# RNN Inputs
# Turn our placeholder into a list of one-hot tensors:
# rnn_inputs is a list of num_steps tensors with shape [batch_size, num_classes]
# Feed the placeholder defined above into the RNN cells:
# each 0/1 digit of the input sequence is turned into a 2-dimensional one-hot vector
x_one_hot = tf.one_hot(x, num_classes)
rnn_inputs = tf.unstack(x_one_hot, axis=1)
"""
Definition of rnn_cell
This is very similar to the __call__ method on Tensorflow's BasicRNNCell. See:
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/rnn/python/ops/core_rnn_cell_impl.py#L95
"""
# state_size is the dimension of the hidden state; we want to map num_classes + state_size
# onto the new RNN state vector of dimension state_size
# Only the variables are defined here
with tf.variable_scope('rnn_cell'):
    W = tf.get_variable('W', [num_classes + state_size, state_size])
    b = tf.get_variable('b', [state_size], initializer=tf.constant_initializer(0.0))
# The variables defined above are used here to build one RNN cell after another, all sharing the same parameters
def rnn_cell(rnn_input, state):
    with tf.variable_scope('rnn_cell', reuse=True):  # reuse=True: use the existing variables instead of creating new ones
        W = tf.get_variable('W', [num_classes + state_size, state_size])
        b = tf.get_variable('b', [state_size], initializer=tf.constant_initializer(0.0))
    return tf.tanh(tf.matmul(tf.concat([rnn_input, state], 1), W) + b)
init_state,state,final_state
"""
Adding rnn_cells to graph
This is a simplified version of the "static_rnn" function from Tensorflow's api. See:
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/rnn/python/ops/core_rnn.py#L41
Note: In practice, using "dynamic_rnn" is a better choice than the "static_rnn":
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/ops/rnn.py#L390
"""
state = init_state  # the first RNN cell also needs an initial state, usually all zeros
rnn_outputs = []
for rnn_input in rnn_inputs:
    state = rnn_cell(rnn_input, state)
    rnn_outputs.append(state)
final_state = rnn_outputs[-1]
"""
Predictions, loss, training step
Losses is similar to the "sequence_loss"
function from Tensorflow's API, except that here we are using a list of 2D tensors, instead of a 3D tensor. See:
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/seq2seq/python/ops/loss.py#L30
"""
# logits and predictions
with tf.variable_scope('softmax'):
    W = tf.get_variable('W', [state_size, num_classes])
    b = tf.get_variable('b', [num_classes], initializer=tf.constant_initializer(0.0))
logits = [tf.matmul(rnn_output, W) + b for rnn_output in rnn_outputs]
predictions = [tf.nn.softmax(logit) for logit in logits]
# Turn our y placeholder into a list of labels
y_as_list = tf.unstack(y, num=num_steps, axis=1)
# losses and train_step
# Compute the loss function and define the optimizer.
# The hidden state of each time frame is mapped
# to the final output (prediction) of that time frame,
# the same as the top layer of CBOW or skip-gram.
losses = [tf.nn.sparse_softmax_cross_entropy_with_logits(labels=label, logits=logit)
          for logit, label in zip(logits, y_as_list)]
total_loss = tf.reduce_mean(losses)
train_step = tf.train.AdagradOptimizer(learning_rate).minimize(total_loss)
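tf.variable_scope('softmax') and tf.variable_scope('rnn_cell', reuse=True) each hold their own pair of tf.Variables named W and b. Because they live in different variable scopes, they are different objects even though the names are the same.
## Print all variables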
all_vars = [node.name for node in tf.global_variables()]
for var in all_vars:
    print(var)
# rnn_cell/W:0
# rnn_cell/b:0
# softmax/W:0
# softmax/b:0
# rnn_cell/W/Adagrad:0  (created by the optimizer for its own computation)
# rnn_cell/b/Adagrad:0
# softmax/W/Adagrad:0
# softmax/b/Adagrad:0
all_node_names = [node for node in tf.get_default_graph().as_graph_def().node]
# or: tf.get_default_graph().get_operations()
all_node_values = [node.values() for node in tf.get_default_graph().get_operations()]
for i in range(0, len(all_node_values), 50):
    print("output and operation %d:" % i)
    print(all_node_values[i])
    print('---------------------------')
    print(all_node_names[i])
    print('\n')
    print('\n')
for i in range(len(all_node_values)):
    print('%d:%s' % (i, all_node_values[i]))
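Tensor naming rules
add_7:0 is output 0 (the first output tensor) of the operation named add_7. TensorFlow appends _1, _2, ... to repeated operation names, so add_7 is the eighth add operation created in the graph (the first is plain add).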
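A tensor name of this form can be split into the operation name and the output index with plain string handling. The helper `parse_tensor_name` is hypothetical and not part of TensorFlow; it only illustrates the `op_name:output_index` convention.

```python
def parse_tensor_name(name):
    """Split 'op_name:output_index' into (op_name, output_index)."""
    op_name, _, index = name.rpartition(":")
    return op_name, int(index)

op_name, output_index = parse_tensor_name("add_7:0")
scoped = parse_tensor_name("rnn_cell/W:0")  # scope prefixes stay part of the op name
```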
"""
Train the network
"""
def train_network(num_epochs, num_steps, state_size=4, verbose=True):
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        training_losses = []
        for idx, epoch in enumerate(gen_epochs(num_epochs, num_steps)):
            training_loss = 0
            # each epoch starts from an all-zero state
            training_state = np.zeros((batch_size, state_size))
            if verbose:
                print("\nEPOCH", idx)
            for step, (X, Y) in enumerate(epoch):
                # the final state of this batch is fed in as the initial state of the next
                tr_losses, training_loss_, training_state, _ = \
                    sess.run([losses,
                              total_loss,
                              final_state,
                              train_step],
                             feed_dict={x:X, y:Y, init_state:training_state})
                training_loss += training_loss_
                if step % 100 == 0 and step > 0:
                    if verbose:
                        print("Average loss at step", step,
                              "for last 100 steps:", training_loss/100)
                    training_losses.append(training_loss/100)
                    training_loss = 0
    return training_losses
training_losses = train_network(1,num_steps)
plt.plot(training_losses)