# LSTM and GRU
## Brief Outline
### Problem
* The state of an RNN records information from all previous time steps.
* At each new timestep the old information gets morphed by the current input
* One could imagine that after many time steps the information stored at a later time step might get completely morphed so much that it would be impossible to extract the original information stored.
### Key Idea
To get more _flexibility_ and have a better _modelling choice:_
* **Selective read:** Selectively read required information
* **Selective write:** Selectively write information
* **Selective forget:** Selectively forget useless information.
### Recall
$$c_t = \sigma(W c_{t-1} + U x_{t} + b)$$
## Selective Write
* We don't want to write the whole to $s_{t-1}$, we just want to write selective portions of that into $c_t$.
* We introduce a vector $o_{t-1}$ which decides _what fraction should be passed._ ie. *Selective Write* = $c_{t-1} \cdot o_{t-1}$
* But how does the RNN know what should be the values of $o_{t-1}$? - We introduce parameters.
* We compute $o_{t-1}$ and $h_{t-1}$ as
* $o_{t-1} = \sigma(W_o h_{t-2} + U_o x_{t-1} + b_o)$
* $h_{t-1} = c_{t-1} \odot o_{t-1}$
* $o_t$ is known as the output gate.
## LSTM
### Selective Read
* Now that we have $h_{t-1}$, which contains only selectively written values from $c_{t-1}$.
* We may not want to pass this along with $x_t$ directly to $c_t$ as $x_t$ also may contain irrelevant information. Hence, we define an intermediate step $\tilde{s}_t$
* $$\tilde{c}_t = \sigma(W h_{t-1} + U x_{t} + b)$$
* Then *Selective Read* = $\tilde{s}_t \cdot i_t$ where $i_t$ is defined as
* $${i}_t = \sigma(W_i h_{t-1} + U_i x_{t} + b_i)$$
### Selective Forget
* We now have $c_{t-1}$ and $\tilde{c}_t \cdot i_t$, and have to combine these to get $c_t$, ie. the new hidden state.
* One simple way of doing this is adding both the above terms. However, we may want to forget some parts of $s_{t-1}$ instead of passing it directly. We introduce another gate
* $f_t = \sigma(W_f h_{t-1} + U_f x_{t} + b_f)$
### Update new state
* Finally after combing all the above three gates, we get the new state as
* $c_t = \tilde{c}_t \odot i_t$ + $c_{t-1} \odot f_t$
![](https://i.imgur.com/Z2V0Rb8.png)
## GRU