LSTM: Computation Formulas and How to Understand Them

1. Getting Started

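Newcomers to deep learning and LSTM can start with the "Zero-Basics Introduction to Deep Learning" article series. The articles explain the fundamentals of deep learning in plain language, including the mathematical derivations of forward propagation and backpropagation, which makes them a good entry point for both deep learning and LSTM.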

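Zero-Basics Introduction to Deep Learning (1) - Perceptron

Zero-Basics Introduction to Deep Learning (2) - Linear Units and Gradient Descent

Zero-Basics Introduction to Deep Learning (3) - Neural Networks and the Backpropagation Algorithm

Zero-Basics Introduction to Deep Learning (4) - Convolutional Neural Networks

Zero-Basics Introduction to Deep Learning (5) - Recurrent Neural Networks

Zero-Basics Introduction to Deep Learning (6) - Long Short-Term Memory Networks (LSTM)

Zero-Basics Introduction to Deep Learning (7) - Recursive Neural Networks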

2. Computation Process and Formulas

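For a detailed walkthrough of the LSTM computation, see the article "LSTM and Its Three Gates: Forget Gate, Input Gate, Output Gate". It breaks the computation in the figure below into individual steps and explains each one.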

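The key to how LSTM overcomes the problem of forgetting information from distant time steps is the cell state, drawn as the horizontal line running across the top of the figure above. The line illustrates the idea: the previous cell state $c_{t-1}$ first forgets some unimportant information (how much is decided by $f_t$), and then takes in some information from the current input (how much is decided by $i_t$ and the candidate state $\tilde{c}_t$). The cell state works like a conveyor belt: it runs straight down the whole chain with only a few minor linear interactions, so information can easily flow along it unchanged.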
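The conveyor-belt behaviour can be illustrated with a minimal NumPy sketch. It assumes the usual elementwise update $c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$; the function name `update_cell_state` and the sample values are hypothetical, chosen only for illustration.

```python
import numpy as np

def update_cell_state(c_prev, f_t, i_t, c_tilde):
    """Elementwise update: keep f_t of the old state, add i_t of the candidate."""
    return f_t * c_prev + i_t * c_tilde

# Illustrative values (assumptions, not taken from any trained model).
c_prev = np.array([2.0, -1.0, 0.5])
c_tilde = np.array([0.3, 0.3, 0.3])

# Forget gate fully open, input gate closed: the state passes through unchanged.
kept = update_cell_state(c_prev, np.ones(3), np.zeros(3), c_tilde)

# Forget gate closed, input gate fully open: the state is replaced by the candidate.
replaced = update_cell_state(c_prev, np.zeros(3), np.ones(3), c_tilde)
```

In practice the gate values lie strictly between 0 and 1, so each component of the cell state is a blend of these two extremes.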

The major innovation of LSTM is its memory cell $c_t$ which essentially acts as an accumulator of the state information. The cell is accessed, written and cleared by several self-parameterized controlling gates. Every time a new input comes, its information will be accumulated to the cell if the input gate $i_t$ is activated. Also, the past cell status $c_{t-1}$ could be “forgotten” in this process if the forget gate $f_t$ is on. Whether the latest cell output $c_t$ will be propagated to the final state $h_t$ is further controlled by the output gate $o_t$. One advantage of using the memory cell and gates to control information flow is that the gradient will be trapped in the cell (also known as constant error carousels) and be prevented from vanishing too quickly, which is a critical problem for the vanilla RNN model.
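One way to see why the gradient is "trapped" in the cell is the following rough derivation. It assumes the usual additive update $c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$ and treats the gate values as constants, which is a simplification because the gates themselves depend on $h_{t-1}$. Along the direct path through the cell state:

$$\frac{\partial c_t}{\partial c_{t-k}} \approx \prod_{j=t-k+1}^{t} \mathrm{diag}(f_j)$$

When the forget gates stay close to 1, this product stays close to the identity, so the error signal survives many steps. For comparison, a vanilla RNN with $h_t = \tanh(W h_{t-1} + U x_t + b)$ (notation assumed here) has $\partial h_t / \partial h_{t-1} = \mathrm{diag}(1 - h_t^2)\, W$, and multiplying many such factors together tends to shrink the gradient quickly.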

The LSTM computation formulas:
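In the commonly used formulation, one time step can be written as follows. The weight names $W_*$, $U_*$, $b_*$ are notation assumed here, $\sigma$ is the logistic sigmoid, and $\odot$ is elementwise multiplication:

$$f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f)$$
$$i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i)$$
$$\tilde{c}_t = \tanh(W_c x_t + U_c h_{t-1} + b_c)$$
$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$$
$$o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o)$$
$$h_t = o_t \odot \tanh(c_t)$$

A minimal NumPy sketch of these six equations is given here. The function `lstm_step`, the `params` dictionary layout, and the random weights are hypothetical and serve only to show the data flow; this is not a trained or optimized implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step. params maps a gate name to (W, U, b) (assumed layout)."""
    W_f, U_f, b_f = params["f"]
    W_i, U_i, b_i = params["i"]
    W_c, U_c, b_c = params["c"]
    W_o, U_o, b_o = params["o"]

    f_t = sigmoid(W_f @ x_t + U_f @ h_prev + b_f)      # forget gate
    i_t = sigmoid(W_i @ x_t + U_i @ h_prev + b_i)      # input gate
    c_tilde = np.tanh(W_c @ x_t + U_c @ h_prev + b_c)  # candidate state
    o_t = sigmoid(W_o @ x_t + U_o @ h_prev + b_o)      # output gate

    c_t = f_t * c_prev + i_t * c_tilde                 # new cell state
    h_t = o_t * np.tanh(c_t)                           # new hidden state
    return h_t, c_t

# Run a short random sequence through the cell (sizes are arbitrary).
rng = np.random.default_rng(0)
n_in, n_hid = 4, 3
params = {
    g: (rng.normal(size=(n_hid, n_in)),
        rng.normal(size=(n_hid, n_hid)),
        np.zeros(n_hid))
    for g in "fico"
}
h, c = np.zeros(n_hid), np.zeros(n_hid)
for x_t in rng.normal(size=(5, n_in)):
    h, c = lstm_step(x_t, h, c, params)
```

Because $h_t$ is a sigmoid output multiplied by a tanh output, every component of `h` stays within $[-1, 1]$, while `c` is not bounded in this way.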

3. LSTM Variants

Peephole LSTM: the forget, input, and output gates also take the cell state into account when they are computed.
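In one common formulation the peephole connections enter the gates as shown here. This is a sketch: the peephole weight vectors $p_f$, $p_i$, $p_o$ and the names $W_*$, $U_*$, $b_*$ are notation assumed for illustration.

$$f_t = \sigma(W_f x_t + U_f h_{t-1} + p_f \odot c_{t-1} + b_f)$$
$$i_t = \sigma(W_i x_t + U_i h_{t-1} + p_i \odot c_{t-1} + b_i)$$
$$o_t = \sigma(W_o x_t + U_o h_{t-1} + p_o \odot c_t + b_o)$$

In this version the forget and input gates look at the previous cell state $c_{t-1}$, while the output gate looks at the freshly updated $c_t$. Some implementations add peepholes to only a subset of the gates.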

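Coupled forget and input gates: the forget rate and the input rate sum to 1, so new information is written only in proportion to the old information that is forgotten.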
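Under this coupling the input gate is no longer computed separately. Assuming the usual candidate state $\tilde{c}_t$, the cell update can be sketched as:

$$i_t = 1 - f_t, \qquad c_t = f_t \odot c_{t-1} + (1 - f_t) \odot \tilde{c}_t$$

Each component of $c_t$ is then a weighted average of the old state and the candidate.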

GRU

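GRU makes two major changes to LSTM:

It replaces the input, forget, and output gates with two gates: the update gate $z_t$ and the reset gate $r_t$.
It merges the cell state and the output into a single state, $h_t$.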
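The two changes above can be sketched in NumPy as a single GRU step. The function `gru_step`, the `params` layout, and the random weights are hypothetical. The sketch also assumes the convention $h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t$; some references swap the roles of $z_t$ and $1 - z_t$.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, params):
    """One GRU time step. params maps a name to (W, U, b) (assumed layout)."""
    W_z, U_z, b_z = params["z"]
    W_r, U_r, b_r = params["r"]
    W_h, U_h, b_h = params["h"]

    z_t = sigmoid(W_z @ x_t + U_z @ h_prev + b_z)  # update gate
    r_t = sigmoid(W_r @ x_t + U_r @ h_prev + b_r)  # reset gate
    # The candidate state sees the previous state only through the reset gate.
    h_tilde = np.tanh(W_h @ x_t + U_h @ (r_t * h_prev) + b_h)
    # A single state h_t plays the role of both cell state and output.
    return (1.0 - z_t) * h_prev + z_t * h_tilde

# Run a short random sequence through the cell (sizes are arbitrary).
rng = np.random.default_rng(0)
n_in, n_hid = 4, 3
params = {
    g: (rng.normal(size=(n_hid, n_in)),
        rng.normal(size=(n_hid, n_hid)),
        np.zeros(n_hid))
    for g in "zrh"
}
h = np.zeros(n_hid)
for x_t in rng.normal(size=(5, n_in)):
    h = gru_step(x_t, h, params)
```

Here the update gate $z_t$ does the work of LSTM's forget and input gates at once, much like the coupled variant, and there is no separate output gate.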
