# Recurrent Neural Networks

## Architecture

![](https://i.imgur.com/wTJvxs2.png =400x400)

## Forward Propagation

* The hidden state $h_t$ depends on $h_{t-1}$ and $x_t$, i.e.
$$h_t = \sigma(Wh_{t-1} + Ux_{t} + b)$$
* Let us say, at timestep $t$,
    * $x_t$ is the word embedding of size `[e,1]`
    * $h_t$ is the RNN hidden state of size `[d,1]`
* Then the transition matrices are
    * $U$ of size `[d,e]`
    * $W$ of size `[d,d]`
* Clearly, $h_{t+1}$ will also be of size `[d,1]`.
* In other words,
$$h_t = RNN(h_{t-1}, x_t)$$
where
$$RNN(h_{t-1}, x_t) = \sigma(Wh_{t-1} + Ux_{t} + b)$$
* There is also an optional output layer at each time step, $y_t$, giving a distribution over the vocabulary (see the sketch after this list):
$$y_t = \text{softmax}(Vh_t + c)$$
where $V$ is of size `[|V|,d]` ($|V|$ being the vocabulary size, so that $Vh_t$ is `[|V|,1]`) and $c$ is a separate output bias of size `[|V|,1]`.
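
A minimal NumPy sketch of this forward pass, assuming $\sigma = \tanh$; the dimensions `e`, `d`, `vocab_size`, the sequence length `T`, and the random initialization are all illustrative choices, not prescribed by the notes above:

```python
import numpy as np

# Hypothetical sizes: e = embedding dim, d = hidden dim, |V| = vocabulary size
e, d, vocab_size = 50, 100, 10_000
rng = np.random.default_rng(0)

# Parameters from the equations above (randomly initialized for illustration)
W = rng.normal(0, 0.01, size=(d, d))           # hidden-to-hidden, [d, d]
U = rng.normal(0, 0.01, size=(d, e))           # input-to-hidden,  [d, e]
b = np.zeros((d, 1))                           # hidden bias,      [d, 1]
V = rng.normal(0, 0.01, size=(vocab_size, d))  # hidden-to-output, [|V|, d]
c = np.zeros((vocab_size, 1))                  # output bias,      [|V|, 1]

def rnn_step(h_prev, x_t):
    """One step of forward propagation: h_t = tanh(W h_{t-1} + U x_t + b)."""
    return np.tanh(W @ h_prev + U @ x_t + b)

def softmax(z):
    z = z - z.max(axis=0, keepdims=True)  # shift for numerical stability
    exp_z = np.exp(z)
    return exp_z / exp_z.sum(axis=0, keepdims=True)

# Unroll over a sequence of T word embeddings, each of shape [e, 1]
T = 5
xs = [rng.normal(size=(e, 1)) for _ in range(T)]
h = np.zeros((d, 1))  # initial hidden state h_0

for x_t in xs:
    h = rnn_step(h, x_t)       # h_t depends only on h_{t-1} and x_t
    y_t = softmax(V @ h + c)   # optional per-step output distribution
    assert h.shape == (d, 1) and y_t.shape == (vocab_size, 1)
```

Note that the same $W$, $U$, and $b$ are reused at every timestep; the loop simply threads the hidden state forward, which is why $h_t$ keeps the size `[d,1]` no matter how long the sequence is.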