Time Series Prediction with an LSTM Model in Python

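Time series models

Time series prediction uses the features of an event over a past period to predict the features of that event over a future period. It is a relatively complex class of predictive modeling problem. Unlike prediction with regression models, a time series model depends on the order in which events occur: feeding the same values into the model in a different order produces a different result.

For example: predicting a stock's price movement over the coming week from its daily prices over the past two years, or predicting how many customers will visit a shop next week from its weekly customer counts over the past two years.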

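RNN and LSTM models

One of the most commonly used and most powerful tools for time series models is the recurrent neural network (RNN). In an ordinary neural network the individual computations are independent of one another; in an RNN, each hidden-layer result depends on both the current input and the previous hidden-layer result. This is what lets an RNN remember the results of the previous few steps.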

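A typical RNN structure is shown below:

The right-hand side shows the network unrolled, which makes it easier to understand. In short, x is the input layer, o is the output layer, s is the hidden layer, and t is the index of the computation step; V, W, and U are weights. The hidden state at step t is computed as s_t = f(U*x_t + W*s_{t-1}), which ties the current input to the previous computations. For a deeper look at RNNs, see the RNN tutorial listed in the references.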
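The recurrence can be made concrete with a small NumPy sketch. This is only an illustration of s_t = f(U*x_t + W*s_{t-1}) with f assumed to be tanh and an output o_t = V*s_t; the layer sizes, the random weights, and the function name rnn_forward are assumptions made for the example, not something taken from a particular library.

```python
import numpy as np

def rnn_forward(xs, U, W, V, s0):
    """Run s_t = tanh(U x_t + W s_{t-1}) and o_t = V s_t over a sequence."""
    s = s0
    states, outputs = [], []
    for x in xs:
        # The new hidden state mixes the current input with the previous state.
        s = np.tanh(U @ x + W @ s)
        states.append(s)
        outputs.append(V @ s)
    return np.array(states), np.array(outputs)

# Hypothetical sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
input_dim, hidden_dim, output_dim = 1, 4, 1
U = rng.normal(scale=0.5, size=(hidden_dim, input_dim))
W = rng.normal(scale=0.5, size=(hidden_dim, hidden_dim))
V = rng.normal(scale=0.5, size=(output_dim, hidden_dim))

xs = np.array([[0.0], [44.0], [18.0]]) / 100.0   # a short, scaled input sequence
s0 = np.zeros(hidden_dim)

states, outputs = rnn_forward(xs, U, W, V, s0)
reversed_states, _ = rnn_forward(xs[::-1], U, W, V, s0)

print(states.shape, outputs.shape)   # (3, 4) (3, 1)
# Same values in a different order lead to a different final hidden state.
print(np.allclose(states[-1], reversed_states[-1]))
```

Because each state is computed from the previous one, reversing the inputs changes the final hidden state, which is the order dependence described at the start of the article.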

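Limitations of RNNs:

If an RNN is to have long-term memory, the current hidden state has to be tied to the previous n computations, i.e. s_t = f(U*x_t + W_1*s_{t-1} + W_2*s_{t-2} + ... + W_n*s_{t-n}). The amount of computation then grows very quickly and training time increases sharply, so plain RNNs are generally not used directly for long-term memory.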

The LSTM model

LSTM (Long Short-Term Memory) is a variant of the RNN, first proposed by Sepp Hochreiter and Jürgen Schmidhuber in 1997. The classic LSTM structure is shown below:

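What distinguishes an LSTM is the gate nodes added to each layer on top of the RNN structure. There are three kinds of gate: the forget gate, the input gate, and the output gate. These gates can open or close, and they decide whether the network's memory state (the previous state of the network) is carried into the current layer's computation. As the figure shows, a gate node passes its input through a sigmoid function, which gives a value between 0 and 1. When that value is close to 1, the gate output is multiplied with the current layer's result and passed on as input to the next layer (PS: the multiplication here is element-wise multiplication of matrices); when it is close to 0, the result is forgotten. The weights of every layer, including the gate nodes, are updated on every backpropagation pass during training. The LSTM computation is shown in more detail in the figure below: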

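The LSTM's memory is implemented by these gate nodes. When a gate is open, earlier results feed into the current computation; when it is closed, earlier results no longer affect the current computation. By adjusting the gates we can therefore control how much the early part of a sequence influences the final result. And when you do not want earlier results to affect later ones, for example when starting a new paragraph or chapter in natural language processing, you simply close the gate. (For more detail on LSTMs, see the references.)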
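To see the open and closed gates in action, here is a minimal NumPy sketch of a single LSTM step. It follows the commonly used LSTM formulation (forget, input, and output gates plus a candidate memory), which is assumed here and may differ in detail from the figure in the original post. The helper make_params, the sizes, and the extreme bias values are hypothetical and chosen only to force the gates open or shut.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, p):
    """One LSTM time step; every gate sees the previous output and the current input."""
    z = np.concatenate([h_prev, x])
    f = sigmoid(p["Wf"] @ z + p["bf"])            # forget gate, values in (0, 1)
    i = sigmoid(p["Wi"] @ z + p["bi"])            # input gate
    o = sigmoid(p["Wo"] @ z + p["bo"])            # output gate
    c_candidate = np.tanh(p["Wc"] @ z + p["bc"])  # candidate memory
    c = f * c_prev + i * c_candidate              # element-wise products
    h = o * np.tanh(c)
    return h, c

def make_params(hidden, inputs, forget_bias, input_bias, seed=0):
    """Hypothetical helper: small random weights, biases chosen to steer the gates."""
    rng = np.random.default_rng(seed)
    shape = (hidden, hidden + inputs)
    p = {name: rng.normal(scale=0.1, size=shape) for name in ("Wf", "Wi", "Wo", "Wc")}
    p.update(bf=np.full(hidden, forget_bias), bi=np.full(hidden, input_bias),
             bo=np.zeros(hidden), bc=np.zeros(hidden))
    return p

x = np.array([0.44, 0.18])
h_prev = np.zeros(3)
c_prev = np.array([1.0, -2.0, 0.5])

# Forget gate saturated near 1, input gate near 0: the old memory passes through.
keep = make_params(3, 2, forget_bias=50.0, input_bias=-50.0)
_, c_keep = lstm_step(x, h_prev, c_prev, keep)

# Forget gate saturated near 0: the old memory no longer matters.
drop = make_params(3, 2, forget_bias=-50.0, input_bias=0.0)
_, c_drop = lstm_step(x, h_prev, c_prev, drop)
_, c_drop_other = lstm_step(x, h_prev, 10 * c_prev, drop)

print(np.allclose(c_keep, c_prev))        # True
print(np.allclose(c_drop, c_drop_other))  # True
```

In a trained network the gate values are learned and usually sit somewhere between 0 and 1, so memory is blended rather than switched on and off completely.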

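The figure below shows how the gates work: by controlling the gates, the input at step 1 of the sequence affects the computed results at steps 4 and 6.

A solid black circle means the node's result is output to the next layer or the next step; a hollow circle means the node's result is not passed into the network, or that no signal was received from the previous step.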

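Building an LSTM model in Python

Python has quite a few packages that can be used directly to build LSTM models, such as pybrain, keras, tensorflow, and scikit-neuralnetwork (see the references for more). Here we use keras. (PS: if your operating system is Linux or macOS, TensorFlow is strongly recommended!)

LSTM training can be tuned through many parameters, such as the activation function, the number of LSTM layers, and the input and output dimensions, and the tuning process is fairly involved. Here we only walk through the simplest possible example to describe how an LSTM is built.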

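Worked example

Predict when a given customer of a given shop will next visit, based on the times of that customer's past purchases. The data looks like this:

Purchase time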
2015-05-15 14:03:51
2015-05-15 15:32:46
2015-06-28 18:00:17
2015-07-16 21:27:18
2015-07-16 22:04:51
2015-09-08 14:59:56
..
..

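Steps:

1. Converting the raw data

First the timestamps need to be turned into numbers. A common approach is to convert the timestamps into the intervals between each pair of consecutive purchases by the customer, and then feed those intervals into the model for training. The converted data looks like this:

Purchase interval (days)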
0
44
18
0
54
..
..
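As a rough sketch of this conversion, the snippet below turns the sample timestamps into day intervals using only the standard library. It assumes the interval is the difference between calendar dates (time of day ignored), which is the reading that reproduces the sample values; the function name to_day_intervals is made up for the example.

```python
from datetime import datetime

timestamps = [
    "2015-05-15 14:03:51",
    "2015-05-15 15:32:46",
    "2015-06-28 18:00:17",
    "2015-07-16 21:27:18",
    "2015-07-16 22:04:51",
    "2015-09-08 14:59:56",
]

def to_day_intervals(stamps, fmt="%Y-%m-%d %H:%M:%S"):
    """Days between consecutive purchases, counted on calendar dates."""
    dates = [datetime.strptime(s, fmt).date() for s in stamps]
    return [(later - earlier).days for earlier, later in zip(dates, dates[1:])]

intervals = to_day_intervals(timestamps)
print(intervals)  # [0, 44, 18, 0, 54]
```

Counting elapsed 24-hour periods on the full timestamps instead would give 53 rather than 54 for the last pair, so the choice of definition is worth fixing up front.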

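2. Generating the training dataset (choosing the window length)

The window is the number of purchase intervals used to predict the next interval. Here we start with a window length of 3: the intervals at steps t-2, t-1, and t are used as model input, and the interval at step t+1 is used to check the result. The dataset has the following format, where X is the input data and Y is the target value.

PS: "choosing" is not quite the right word, because the window length has to be adjusted according to the model's validation results.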

X1    X2    X3    Y
0    44    18    0
44    18    0    54
..
..

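Note: predicting the value directly like this usually gives poor accuracy. It tends to work better to bin the target Y into a few classes by value and convert them to one-hot labels before training. For example, if Y is binned into five classes by range (1: 0-20, 2: 20-40, 3: 40-60, 4: 60-80, 5: 80-100), the table above becomes:

X1    X2    X3    Y
0    44    18    1
44    18    0    3
...

After converting Y to one-hot labels it becomes:

1    0    0    0    0
0    0    1    0    0
...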
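The binning and one-hot step could be sketched with NumPy as follows. The bin edges come from the ranges given above; treating each range as half-open (so 20 falls in class 2) and clipping out-of-range values into the first or last class are assumptions of this example, and to_one_hot_bins is a hypothetical helper name.

```python
import numpy as np

def to_one_hot_bins(values, edges):
    """Map each value to a class 1..n by range, then to a one-hot row."""
    values = np.asarray(values)
    n_classes = len(edges) - 1
    # np.digitize gives 1 for edges[0] <= v < edges[1], 2 for the next range, ...
    # Clipping keeps values outside the edges in the first or last class.
    classes = np.clip(np.digitize(values, edges), 1, n_classes)
    one_hot = np.eye(n_classes, dtype=int)[classes - 1]
    return classes, one_hot

edges = [0, 20, 40, 60, 80, 100]
classes, one_hot = to_one_hot_bins([0, 54], edges)
print(classes)  # [1 3]
print(one_hot)
# [[1 0 0 0 0]
#  [0 0 1 0 0]]
```

With these ranges an interval of 0 days lands in class 1 and an interval of 54 days in class 3.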

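3. Choosing and tuning the network structure

Here we use Python's keras library. (Java users can take a look at the deeplearning4j library.) Training the network involves tuning many parameters, for example:

the activation function of the LSTM module (the keras default is tanh);
the activation function of the fully-connected artificial neural network that receives the LSTM output (the keras default is linear);
the dropout rate of each layer's nodes (to prevent overfitting), which we set to 0.2 here;
how the error is computed; here we use mean squared error (the code further down defaults to categorical cross-entropy, which suits one-hot labels);
how the weights are updated iteratively; here we use the RMSprop algorithm, which is commonly used for RNNs;
the number of epochs and the batch size for training (see the references for an explanation of these two parameters).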

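Generally, the more LSTM layers there are (usually no more than 3; beyond that training has trouble converging), the better the network learns higher-level temporal representations. A regular neural network layer is also added at the end to reduce the output to the required dimension. A typical structure is shown below: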

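To train several sequences in the same model, each sequence can be fed into its own LSTM module, and the outputs are then merged and fed into a regular layer. The structure is as follows: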

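4. Training the model and predicting

Randomly split the dataset above into a training set and a validation set at a ratio of 4:1, which guards against overfitting, and train the model. Then feed the X columns into the model to get predictions, and compare them with the actual Y values to judge how good the model is.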
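One way to do the random 4:1 split is sketched below with NumPy. The function name split_train_validation, the fixed seed, and the toy arrays are assumptions made for the example; the only requirement taken from the text is that roughly one fifth of the rows is held out at random.

```python
import numpy as np

def split_train_validation(X, Y, validation_fraction=0.2, seed=0):
    """Shuffle the rows once, then hold out a fraction of them for validation."""
    X, Y = np.asarray(X), np.asarray(Y)
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    n_val = int(round(len(X) * validation_fraction))
    val_idx, train_idx = order[:n_val], order[n_val:]
    # The same indices are applied to X and Y so each row keeps its label.
    return X[train_idx], Y[train_idx], X[val_idx], Y[val_idx]

# Toy data: row k of X is [3k, 3k+1, 3k+2] and its label is k.
X = np.arange(30).reshape(10, 3)
Y = np.arange(10)

trainX, trainY, testX, testY = split_train_validation(X, Y)
print(trainX.shape, testX.shape)  # (8, 3) (2, 3)
```

Note that a random split mixes early and late windows; whether that is acceptable depends on how the model will be used.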

Implementation

1. Formatting the interval sequence into the required training set format

import pandas as pd
import numpy as np

def create_interval_dataset(dataset, look_back):
    """
    :param dataset: input array of time intervals
    :param look_back: each training set feature length
    :return: convert an array of values into a dataset matrix.
    """
    dataX, dataY = [], []
    for i in range(len(dataset) - look_back):
        dataX.append(dataset[i:i+look_back])
        dataY.append(dataset[i+look_back])
    return np.asarray(dataX), np.asarray(dataY)

df = pd.read_csv("path-to-your-time-interval-file")
dataset_init = np.asarray(df)    # if only 1 column
dataX, dataY = create_interval_dataset(dataset_init, look_back=3)    # look_back is the training set sequence length

The input here comes from a CSV file; input stored in a database would have to be loaded differently.

2. Building the LSTM network structure

import os
from keras.models import Sequential
from keras.layers import Dense, Input, LSTM

class NeuralNetwork():
    def __init__(self, **kwargs):
        """
        :param **kwargs:
            output_dim=8: output dimension of each LSTM layer
            activation_lstm='relu': activation function for LSTM layers
            activation_dense='relu': activation function for hidden Dense layers
            activation_last='softmax': activation function for the last layer
            dense_layer=2: number of Dense layers (at least 2)
            lstm_layer=2: number of LSTM layers
            drop_out=0.2: fraction of units to drop in the recurrent connections
            nb_epoch=10: number of epochs to train the model; one epoch is one
                forward pass and one backward pass over all the training examples
            batch_size=100: number of samples per gradient update; the higher the
                batch size, the more memory you'll need
            loss='categorical_crossentropy': loss function
            optimizer='rmsprop': optimizer
        """
        self.output_dim = kwargs.get('output_dim', 8)
        self.activation_lstm = kwargs.get('activation_lstm', 'relu')
        self.activation_dense = kwargs.get('activation_dense', 'relu')
        self.activation_last = kwargs.get('activation_last', 'softmax')    # softmax for multiple output
        self.dense_layer = kwargs.get('dense_layer', 2)     # at least 2 layers
        self.lstm_layer = kwargs.get('lstm_layer', 2)
        self.drop_out = kwargs.get('drop_out', 0.2)
        self.nb_epoch = kwargs.get('nb_epoch', 10)
        self.batch_size = kwargs.get('batch_size', 100)
        self.loss = kwargs.get('loss', 'categorical_crossentropy')
        self.optimizer = kwargs.get('optimizer', 'rmsprop')

    def NN_model(self, trainX, trainY, testX, testY, model_path=None, model_weight_path=None):
        """
        :param trainX: training data set, shape (samples, time steps, features)
        :param trainY: expected value of training data (one-hot labels)
        :param testX: test data set
        :param testY: expected value of test data
        :param model_path: optional path of the json file for the model structure
        :param model_weight_path: optional path of the hdf5 file for the weights
        :return: model after training
        """
        # Note: written against the Keras 2+ argument names (units,
        # recurrent_dropout, epochs) instead of the Keras 1 names.
        print("Training model is LSTM network!")
        time_steps, input_dim = trainX.shape[1], trainX.shape[2]
        output_dim = trainY.shape[1] # one-hot label
        model = Sequential()
        model.add(Input(shape=(time_steps, input_dim)))
        # every LSTM layer except the last returns the full sequence so that
        # the next LSTM layer receives 3D input; recurrent dropout helps avoid overfitting
        for i in range(self.lstm_layer - 1):
            model.add(LSTM(units=self.output_dim,
                           activation=self.activation_lstm,
                           recurrent_dropout=self.drop_out,
                           return_sequences=True))
        # return_sequences stays False in the last lstm layer to avoid input
        # dimension incompatibility with the dense layer
        model.add(LSTM(units=self.output_dim,
                       activation=self.activation_lstm,
                       recurrent_dropout=self.drop_out))
        for i in range(self.dense_layer - 1):
            model.add(Dense(units=self.output_dim,
                            activation=self.activation_dense))
        model.add(Dense(units=output_dim,
                        activation=self.activation_last))
        # configure the learning process
        model.compile(loss=self.loss, optimizer=self.optimizer, metrics=['accuracy'])
        # train the model with a fixed number of epochs
        model.fit(x=trainX, y=trainY, epochs=self.nb_epoch, batch_size=self.batch_size, validation_data=(testX, testY))
        # store model structure to json file
        if model_path:
            with open(model_path, "w") as json_file:
                json_file.write(model.to_json())
        # store model weights to hdf5 file
        if model_weight_path:
            if os.path.exists(model_weight_path):
                os.remove(model_weight_path)
            # eg: model_weight.h5 (Keras 3 expects a name ending in .weights.h5)
            model.save_weights(model_weight_path)
        return model

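The code here only covers building the LSTM network structure. How to shape the data into the structure the network needs, and how to compare and visualize predictions against actual values, has to be adapted to the situation at hand.

References: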
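For the comparison of predictions with actual values, a minimal NumPy sketch might look like this. It assumes the labels are one-hot and that the predictions are per-class probabilities, such as the rows a softmax output layer produces (for instance from model.predict on the validation inputs); the arrays below are made-up illustration data and evaluate_predictions is a hypothetical helper.

```python
import numpy as np

def evaluate_predictions(y_true_one_hot, y_pred_prob):
    """Return accuracy and a confusion matrix (rows: actual, columns: predicted)."""
    true_cls = np.argmax(y_true_one_hot, axis=1)
    pred_cls = np.argmax(y_pred_prob, axis=1)
    n_classes = y_true_one_hot.shape[1]
    confusion = np.zeros((n_classes, n_classes), dtype=int)
    # Count each (actual, predicted) pair, including repeated pairs.
    np.add.at(confusion, (true_cls, pred_cls), 1)
    accuracy = float(np.mean(true_cls == pred_cls))
    return accuracy, confusion

# Made-up labels and predicted probabilities, for illustration only.
y_true = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]])
y_prob = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.6, 0.3],
                   [0.2, 0.5, 0.3],
                   [0.3, 0.4, 0.3]])

accuracy, confusion = evaluate_predictions(y_true, y_prob)
print(accuracy)   # 0.75
print(confusion)
# [[1 0 0]
#  [0 2 0]
#  [0 1 0]]
```

The confusion matrix shows which interval classes get mistaken for which, and that is often more informative than the accuracy figure alone.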

[Highly recommended]: Understanding LSTMs

Keras Documentation
What is batch size in neural network?
Time Series Prediction with LSTM Recurrent Neural Networks in Python with Keras
Save Your Neural Network Model to JSON
RECURRENT NEURAL NETWORKS TUTORIAL, PART 1 – INTRODUCTION TO RNNS
A Beginner’s Guide to Recurrent Networks and LSTMs
Pybrain time series prediction using LSTM recurrent nets
PyBrain Document
Recurrent neural network for predicting next value in a sequence
What are some good Python libraries that implement LSTM networks?