# Recurrent Neural Networks

## Architecture

![](https://i.imgur.com/wTJvxs2.png =400x400)

## Forward Propagation

* The hidden state $h_t$ depends on $h_{t-1}$ and $x_t$, i.e.
$$h_t = \sigma(Wh_{t-1} + Ux_{t} + b)$$
* Let us say, at timestep $t$,
    * $x_t$ is the word embedding of size `[e,1]`
    * $h_t$ is the RNN hidden state of size `[d,1]`
* Then the transition matrices are
    * $U$ of size `[d,e]`
    * $W$ of size `[d,d]`
* Clearly, $h_{t+1}$ will also be of size `[d,1]`.
* In other words,
$$h_t = RNN(h_{t-1}, x_t)$$
where
$$RNN(h_{t-1}, x_t) = \sigma(Wh_{t-1} + Ux_{t} + b)$$
* There is also an optional output layer at each time step, $y_t$, giving a distribution over the vocabulary (see the sketch after this list):
$$y_t = \text{softmax}(Vh_t + c)$$
where $V$ is of size `[|V|,d]` ($|V|$ being the vocabulary size, so that $Vh_t$ is `[|V|,1]`) and $c$ is a separate output bias of size `[|V|,1]`.
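
A minimal NumPy sketch of this forward pass, assuming $\sigma = \tanh$; the dimensions `e`, `d`, `vocab_size`, the sequence length `T`, and the random initialization are all illustrative choices, not prescribed by the notes above:

```python
import numpy as np

# Hypothetical sizes: e = embedding dim, d = hidden dim, |V| = vocabulary size
e, d, vocab_size = 50, 100, 10_000
rng = np.random.default_rng(0)

# Parameters from the equations above (randomly initialized for illustration)
W = rng.normal(0, 0.01, size=(d, d))           # hidden-to-hidden, [d, d]
U = rng.normal(0, 0.01, size=(d, e))           # input-to-hidden,  [d, e]
b = np.zeros((d, 1))                           # hidden bias,      [d, 1]
V = rng.normal(0, 0.01, size=(vocab_size, d))  # hidden-to-output, [|V|, d]
c = np.zeros((vocab_size, 1))                  # output bias,      [|V|, 1]

def rnn_step(h_prev, x_t):
    """One step of forward propagation: h_t = tanh(W h_{t-1} + U x_t + b)."""
    return np.tanh(W @ h_prev + U @ x_t + b)

def softmax(z):
    z = z - z.max(axis=0, keepdims=True)  # shift for numerical stability
    exp_z = np.exp(z)
    return exp_z / exp_z.sum(axis=0, keepdims=True)

# Unroll over a sequence of T word embeddings, each of shape [e, 1]
T = 5
xs = [rng.normal(size=(e, 1)) for _ in range(T)]
h = np.zeros((d, 1))  # initial hidden state h_0

for x_t in xs:
    h = rnn_step(h, x_t)       # h_t depends only on h_{t-1} and x_t
    y_t = softmax(V @ h + c)   # optional per-step output distribution
    assert h.shape == (d, 1) and y_t.shape == (vocab_size, 1)
```

Note that the same $W$, $U$, and $b$ are reused at every timestep; the loop simply threads the hidden state forward, which is why $h_t$ keeps the size `[d,1]` no matter how long the sequence is.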