# Long Short-Term Memory (LSTM)
:::warning
:notebook_with_decorative_cover: **Summary**
[toc]
:::
Throughout, the input sequence of length $T$ is written as
$$
\boldsymbol{x} = \{x_t : t = 1,\cdots,T\}
$$
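# LSTM Cell Structure
An LSTM cell is built mainly from the following five components:
- <font color = 'purple'>Cell state</font>: the LSTM's **internal cell state** ($c$), which stores memory from earlier time steps.
- <font color = 'brown'>Hidden state</font>: the LSTM's **external hidden state** ($h$), which is used to compute the prediction.
- <font color = 'red'>Input gate</font>: decides how much of the current input ($x_t$) is written into the <font color = 'purple'>current cell state ($c_t$)</font>.
    - The current input ($x_t$) is first transformed into a <font color = 'green'>candidate value ($\tilde{c_t}$)</font>.
- <font color = 'blue'>Forget gate</font>: decides how much of the <font color = 'purple'>previous cell state ($c_{t-1}$)</font> is carried over into the <font color = 'purple'>current cell state ($c_t$)</font>.
- <font color = 'darkorange'>Output gate</font>: decides how much of the <font color = 'purple'>current cell state ($c_t$)</font> is passed on to the <font color = 'brown'>current hidden state ($h_t$)</font>.

The corresponding equations are: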
$$
\color{red}{i_t} = \sigma(\color{red}{W_{ix}}x_t + \color{red}{W_{ih}}h_{t-1} + \color{red}{b_i})
$$
$$
\color{blue}{f_t} = \sigma(\color{blue}{W_{fx}}x_t + \color{blue}{W_{fh}}h_{t-1} + \color{blue}{b_f})
$$
$$
\color{darkorange}{o_t} = \sigma(\color{darkorange}{W_{ox}}x_t + \color{darkorange}{W_{oh}}h_{t-1} + \color{darkorange}{b_o})
$$
$$
\color{green}{\tilde{c_t}} = \tanh(\color{green}{W_{cx}}x_t + \color{green}{W_{ch}}h_{t-1} + \color{green}{b_c})
$$
$$
\sigma(x) = \dfrac{1}{1+e^{-x}}
$$
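$$
\color{purple}{c_t} = \color{blue}{f_t} \odot \color{purple}{c_{t-1}} + \color{red}{i_t} \odot \color{green}{\tilde{c_t}}
$$
$$
\color{brown}{h_t} = \color{darkorange}{o_t} \odot \tanh(\color{purple}{c_t})
$$
Here $\odot$ denotes the element-wise product: each gate value lies in $(0, 1)$ and scales the matching component of the vector it multiplies.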
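The cell equations above can be sketched as a single forward step in NumPy. This is only a minimal illustration: the function names `lstm_step`, `init_params` and `lstm_forward`, the dictionary layout of the weights, and the small random initialisation are all assumptions made for this example, not part of the original notes.

```python
import numpy as np

def sigmoid(z):
    # Logistic function: sigma(z) = 1 / (1 + exp(-z))
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step; returns (h_t, c_t)."""
    # Input, forget and output gates, each squashed into (0, 1)
    i_t = sigmoid(params["W_ix"] @ x_t + params["W_ih"] @ h_prev + params["b_i"])
    f_t = sigmoid(params["W_fx"] @ x_t + params["W_fh"] @ h_prev + params["b_f"])
    o_t = sigmoid(params["W_ox"] @ x_t + params["W_oh"] @ h_prev + params["b_o"])
    # Candidate value computed from the current input and previous hidden state
    c_tilde = np.tanh(params["W_cx"] @ x_t + params["W_ch"] @ h_prev + params["b_c"])
    # Element-wise products: keep part of the old memory, add part of the candidate
    c_t = f_t * c_prev + i_t * c_tilde
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t

def init_params(input_size, hidden_size, seed=0):
    # Hypothetical initialisation: small Gaussian weights, zero biases
    rng = np.random.default_rng(seed)
    params = {}
    for g in "ifoc":
        params[f"W_{g}x"] = rng.normal(0.0, 0.1, (hidden_size, input_size))
        params[f"W_{g}h"] = rng.normal(0.0, 0.1, (hidden_size, hidden_size))
        params[f"b_{g}"] = np.zeros(hidden_size)
    return params

def lstm_forward(xs, params):
    """Run the cell over a sequence xs of shape (T, input_size)."""
    hidden_size = params["b_i"].shape[0]
    h = np.zeros(hidden_size)
    c = np.zeros(hidden_size)
    hs = []
    for x_t in xs:
        h, c = lstm_step(x_t, h, c, params)
        hs.append(h)
    # Hidden states for every time step, plus the final cell state
    return np.stack(hs), c
```

For instance, `lstm_forward(np.ones((5, 3)), init_params(3, 4))` returns an array of five hidden states of size 4 together with the final cell state. Because $h_t$ is a gate value in $(0, 1)$ times a $\tanh$, every component of the hidden state stays strictly between $-1$ and $1$.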
The cell can be depicted graphically as follows:

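# Tips for Improving Performance
- Greedy sampling: at each time step, keep only the k most probable candidates and pick the next token from among them.
- Beam search: look ahead m time steps and compare whole candidate sequences, rather than committing to the best single next token.
- Bidirectional LSTM: read the sequence both forwards and backwards.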
- Peephole connection: peek at the previous few …
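The first two decoding tips in the list above can be sketched in plain Python. This is a rough illustration under stated assumptions: `top_k_sample`, `beam_search`, the `next_probs(seq)` model interface and the toy probability table are all hypothetical names invented for this example, and the beam search scores candidates by joint log-probability, which is one common choice rather than something the notes specify.

```python
import math
import random

def top_k_sample(probs, k, rng=random):
    """Sample an index from the k most probable entries of probs."""
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    weights = [probs[i] for i in top]
    # Draw one of the top-k indices, weighted by its probability
    return rng.choices(top, weights=weights, k=1)[0]

def beam_search(next_probs, start, beam_width, depth):
    """Extend `start` by `depth` tokens, keeping `beam_width` partial sequences.

    next_probs(seq) is a hypothetical model interface that returns a list of
    next-token probabilities. Returns (log_prob, sequence) for the best beam.
    """
    beams = [(0.0, list(start))]
    for _ in range(depth):
        candidates = []
        for score, seq in beams:
            for token, p in enumerate(next_probs(seq)):
                if p > 0.0:
                    # Joint log-probability of the extended sequence
                    candidates.append((score + math.log(p), seq + [token]))
        candidates.sort(key=lambda cand: cand[0], reverse=True)
        beams = candidates[:beam_width]
    return beams[0]

# Toy model (made up for illustration): the distribution depends only on the last token
TOY_TABLE = {
    0: [0.0, 0.6, 0.4],
    1: [0.34, 0.33, 0.33],
    2: [0.9, 0.05, 0.05],
}

def toy_next_probs(seq):
    return TOY_TABLE[seq[-1]]
```

With this toy table, `beam_search(toy_next_probs, [0], beam_width=1, depth=2)` commits to token 1 first and ends at `[0, 1, 0]` (joint probability 0.6 × 0.34), whereas `beam_width=2` finds `[0, 2, 0]` (joint probability 0.4 × 0.9). That is the case where looking ahead pays off. Calling `top_k_sample(probs, k=1)` always returns the single most probable index.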
###### tags: `DL`