# Recurrent Neural Networks
## Architecture
![](https://i.imgur.com/wTJvxs2.png =400x400)
## Forward Propagation
* The hidden state $h_t$ depends on $h_{t-1}$ and $x_t$, i.e.,
$$h_t = \sigma(Wh_{t-1} + Ux_{t} + b)$$
* Let us say, at timestep $t$
* $x_t$ is the word embedding of size `[e,1]`
* $h_t$ is the RNN hidden state of size `[d,1]`
* Then the transition matrices,
* $U$ is of size `[d,e]`
* $W$ is of size `[d,d]`
* Clearly, $h_t$ will also be of size `[d,1]`.
* In other words, the update can be written as a function of the previous state and the current input:
$$h_t = \mathrm{RNN}(h_{t-1}, x_t)$$
Where
$$\mathrm{RNN}(h_{t-1}, x_t) = \sigma(Wh_{t-1} + Ux_{t} + b)$$
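The shapes above can be checked with a minimal NumPy sketch of one forward step (the sizes `e=4`, `d=3` and the choice of $\sigma = \tanh$ are illustrative assumptions, not from the text):

```python
import numpy as np

e, d = 4, 3  # embedding size and hidden size (assumed for illustration)

rng = np.random.default_rng(0)
U = rng.standard_normal((d, e))  # input-to-hidden, shape [d, e]
W = rng.standard_normal((d, d))  # hidden-to-hidden, shape [d, d]
b = np.zeros((d, 1))             # bias, shape [d, 1]

def rnn_step(h_prev, x_t):
    """One forward step: h_t = sigma(W h_{t-1} + U x_t + b), with sigma = tanh."""
    return np.tanh(W @ h_prev + U @ x_t + b)

h = np.zeros((d, 1))             # initial hidden state h_0
x = rng.standard_normal((e, 1))  # one word embedding, shape [e, 1]
h = rnn_step(h, x)
print(h.shape)  # (3, 1), i.e. [d, 1] as expected
```

Running the same `rnn_step` in a loop over the sequence, feeding each output back in as `h_prev`, gives the full forward propagation.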
* There is also an optional output layer at each time step, $y_t$, with its own bias $c$:
$$y_t = \mathrm{softmax}(Vh_t + c)$$
Where $V$ is of size `[|V|,d]` and $|V|$ is the vocabulary size, so $y_t$ is a probability distribution of size `[|V|,1]` over the vocabulary.
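The output layer can be sketched the same way; here $V$ maps the hidden size `d` to the vocabulary size (both sizes are assumptions for illustration), and softmax turns the scores into a probability distribution:

```python
import numpy as np

d, vocab = 3, 5  # hidden size and vocabulary size (assumed for illustration)

rng = np.random.default_rng(1)
V = rng.standard_normal((vocab, d))  # hidden-to-output, maps d -> vocab
c = np.zeros((vocab, 1))             # output bias

def softmax(z):
    z = z - z.max(axis=0, keepdims=True)  # shift for numerical stability
    ez = np.exp(z)
    return ez / ez.sum(axis=0, keepdims=True)

h_t = rng.standard_normal((d, 1))  # some hidden state
y_t = softmax(V @ h_t + c)         # distribution over the vocabulary
print(y_t.shape)                   # (5, 1)
print(round(float(y_t.sum()), 6))  # 1.0 -- probabilities sum to one
```

Training would compare $y_t$ against the true next word with a cross-entropy loss, but that is beyond this section.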