# LSTM and GRU ## Brief Outline ### Problem * The state of an RNN records information from all previous time steps. * At each new timestep the old information gets morphed by the current input * One could imagine that after many time steps the information stored at a later time step might get completely morphed so much that it would be impossible to extract the original information stored. ### Key Idea To get more _flexibility_ and have a better _modelling choice:_ * **Selective read:** Selectively read required information * **Selective write:** Selectively write information * **Selective forget:** Selectively forget useless information. ### Recall $$c_t = \sigma(W c_{t-1} + U x_{t} + b)$$ ## Selective Write * We don't want to write the whole to $s_{t-1}$, we just want to write selective portions of that into $c_t$. * We introduce a vector $o_{t-1}$ which decides _what fraction should be passed._ ie. *Selective Write* = $c_{t-1} \cdot o_{t-1}$ * But how does the RNN know what should be the values of $o_{t-1}$? - We introduce parameters. * We compute $o_{t-1}$ and $h_{t-1}$ as * $o_{t-1} = \sigma(W_o h_{t-2} + U_o x_{t-1} + b_o)$ * $h_{t-1} = c_{t-1} \odot o_{t-1}$ * $o_t$ is known as the output gate. ## LSTM ### Selective Read * Now that we have $h_{t-1}$, which contains only selectively written values from $c_{t-1}$. * We may not want to pass this along with $x_t$ directly to $c_t$ as $x_t$ also may contain irrelevant information. Hence, we define an intermediate step $\tilde{s}_t$ * $$\tilde{c}_t = \sigma(W h_{t-1} + U x_{t} + b)$$ * Then *Selective Read* = $\tilde{s}_t \cdot i_t$ where $i_t$ is defined as * $${i}_t = \sigma(W_i h_{t-1} + U_i x_{t} + b_i)$$ ### Selective Forget * We now have $c_{t-1}$ and $\tilde{c}_t \cdot i_t$, and have to combine these to get $c_t$, ie. the new hidden state. * One simple way of doing this is adding both the above terms. However, we may want to forget some parts of $s_{t-1}$ instead of passing it directly. We introduce another gate * $f_t = \sigma(W_f h_{t-1} + U_f x_{t} + b_f)$ ### Update new state * Finally after combing all the above three gates, we get the new state as * $c_t = \tilde{c}_t \odot i_t$ + $c_{t-1} \odot f_t$ ![](https://i.imgur.com/Z2V0Rb8.png) ## GRU