# Recurrent Networks continued
### Ferenc Huszár (fh277)
DeepNN Lecture 9
---
### RNN: Recap

---
### The state update rule: naive
$$
\mathbf{h}_{t+1} = \phi(W_h \mathbf{h}_t + W_x \mathbf{x}_t + \mathbf{b_h})
$$
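A minimal NumPy sketch of this update (taking $\phi = \tanh$ and toy shapes, both assumptions for illustration):
```
import numpy as np

def rnn_step(h, x, W_h, W_x, b_h):
    # h_{t+1} = phi(W_h h_t + W_x x_t + b_h), with phi = tanh here
    return np.tanh(W_h @ h + W_x @ x + b_h)

rng = np.random.default_rng(0)
W_h, W_x = rng.normal(size=(4, 4)), rng.normal(size=(4, 3))
b_h, h = np.zeros(4), np.zeros(4)
for x in rng.normal(size=(5, 3)):   # unroll over a length-5 input sequence
    h = rnn_step(h, x, W_h, W_x, b_h)
```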
---
### The state update rule: GRU
\begin{align}
\mathbf{h}_{t+1} &= \mathbf{z}_t \odot \mathbf{h}_t + (1 - \mathbf{z}_t) \odot \tilde{\mathbf{h}}_t \\
\tilde{\mathbf{h}}_t &= \phi\left(W\mathbf{x}_t + U(\mathbf{r}_t \odot \mathbf{h}_t)\right)\\
\mathbf{r}_t &= \sigma(W_r\mathbf{x}_t + U_r\mathbf{h}_t)\\
\mathbf{z}_t &= \sigma(W_z\mathbf{x}_t + U_z\mathbf{h}_t)\\
\end{align}
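The same equations written out in NumPy (a sketch; bias terms are omitted as in the formulas above, and $\sigma$ is the logistic sigmoid):
```
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(h, x, W, U, W_r, U_r, W_z, U_z):
    r = sigmoid(W_r @ x + U_r @ h)          # reset gate r_t
    z = sigmoid(W_z @ x + U_z @ h)          # update gate z_t
    h_tilde = np.tanh(W @ x + U @ (r * h))  # candidate state, phi = tanh
    return z * h + (1.0 - z) * h_tilde      # convex combination of old and new
```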
---
### implementing branching logic
...in code:
```
if r:
    return 5
else:
    return 3
```
...in algebra:
```
return r*5 + (1-r)*3
```
---
### The state update rule: GRU
\begin{align}
\mathbf{h}_{t+1} &= \mathbf{z}_t \odot \mathbf{h}_t + (1 - \mathbf{z}_t) \odot \tilde{\mathbf{h}}_t \\
\tilde{\mathbf{h}}_t &= \phi\left(W\mathbf{x}_t + U(\mathbf{r}_t \odot \mathbf{h}_t)\right)\\
\mathbf{r}_t &= \sigma(W_r\mathbf{x}_t + U_r\mathbf{h}_t)\\
\mathbf{z}_t &= \sigma(W_z\mathbf{x}_t + U_z\mathbf{h}_t)\\
\end{align}
---
### Side note: dealing with depth

---
### Side note: dealing with depth

---
### Very deep networks are hard to train
* exploding/vanishing gradients
* their performance degrades with depth
* VGG19: 19-layer ConvNet
---
### Deep Residual Networks (ResNets)

---
### Deep Residual Networks (ResNets)

---
### ResNets
* allow for much deeper networks (101, even 152 layers)
* performance increases with depth
* set new records on benchmarks (ImageNet, COCO)
* used almost everywhere now
---
### ResNets behave like ensembles

from ([Veit et al, 2016](https://arxiv.org/pdf/1605.06431.pdf))
---
### DenseNets

---
### Back to RNNs
* like ResNets, LSTMs create "shortcuts"
* allows information to skip processing
* data-dependent gating
* data-dependent shortcuts (sketched in code below)
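The analogy in code, as a rough sketch (`f` stands for whatever transformation a block or cell applies, and `gate` for a data-dependent gate as in the GRU equations; both are placeholders):
```
def residual_shortcut(h, f):
    # ResNet-style: information can bypass f unchanged
    return h + f(h)

def gated_shortcut(h, x, f, gate):
    # LSTM/GRU-style: a data-dependent gate decides, per unit, how much
    # of the old state to copy forward and how much to overwrite
    z = gate(h, x)                      # values in (0, 1), computed from data
    return z * h + (1.0 - z) * f(h, x)
```
With `z` close to 1 the state is copied forward untouched, which is what lets information and gradients flow across many time steps.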
---
### Visualising RNN behaviours
See this [distill post](https://distill.pub/2019/memorization-in-rnns/)
---
### RNN: different uses

figure from [Andrej Karpathy's blog post](https://karpathy.github.io/2015/05/21/rnn-effectiveness/)
---
### RNNs for images

([Ba et al, 2014](https://arxiv.org/abs/1412.7755))
---
### RNNs for images

([Gregor et al, 2015](https://arxiv.org/abs/1502.04623))
---
### RNNs for painting

([Mellor et al, 2019](https://learning-to-paint.github.io/))
---
### RNNs for painting

---
### Spatial LSTMs

([Theis et al, 2015](https://arxiv.org/pdf/1506.03478.pdf))
---
### Spatial LSTMs generating textures

---
### Seq2Seq: sequence-to-sequence

([Sutskever et al, 2014](https://arxiv.org/pdf/1409.3215.pdf))
---
### Seq2Seq: neural machine translation

---
### Show and Tell: "Image2Seq"

([Vinyals et al, 2015](https://arxiv.org/pdf/1411.4555.pdf))
---
### Show and Tell: "Image2Seq"

([Vinyals et al, 2015](https://arxiv.org/pdf/1411.4555.pdf))
---
### Sentence to Parsing tree "Seq2Tree"

([Vinyals et al, 2014](https://arxiv.org/abs/1412.7449))
---
### General algorithms as Seq2Seq
travelling salesman

([Vinyals et al, 2015](https://arxiv.org/abs/1506.03134))
---
### General algorithms as Seq2Seq
convex hull and triangulation

---
### Pointer networks

---
### Revisiting the basic idea

"Asking the network too much"
---
### Attention layer

---
### Attention layer
Attention weights:
$$
\alpha_{t,s} = \frac{e^{\mathbf{e}^T_t \mathbf{d}_s}}{\sum_u e^{\mathbf{e}^T_u \mathbf{d}_s}}
$$
Context vector:
$$
\mathbf{c}_s = \sum_{t=1}^T \alpha_{t,s} \mathbf{e}_t
$$
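A NumPy sketch of these two equations for a single decoder position $s$, with the encoder states $\mathbf{e}_1,\dots,\mathbf{e}_T$ stacked as rows of `E` (the max-subtraction is only for numerical stability):
```
import numpy as np

def attention(E, d_s):
    # E: (T, dim) encoder states, d_s: (dim,) decoder state
    scores = E @ d_s                    # e_t^T d_s for every t
    scores = scores - scores.max()      # numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum()   # softmax over t
    c_s = alpha @ E                     # sum_t alpha_{t,s} e_t
    return alpha, c_s
```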
---
### Attention layer visualised

---
### Language Transformers and Transfer Learning

---
### Zero-Shot Transfer
* train as a language model: predict the next token
* use prompts that encode the task (toy example below)
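A toy illustration of the idea (the prompt wording is a hypothetical example, not taken from a specific model):
```
# the task is encoded entirely in the input text; a model trained only on
# next-token prediction completes it
article_text = "..."  # some document to summarise (placeholder)

translation_prompt = "English: The cat sat on the mat.\nFrench:"
summary_prompt = article_text + "\nTL;DR:"
# sampling a continuation of either prompt performs the task
# without any task-specific fine-tuning
```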
---
### To engage with this material at home
Try the [char-RNN Exercise](https://github.com/udacity/deep-learning-v2-pytorch/blob/master/recurrent-neural-networks/char-rnn/Character_Level_RNN_Exercise.ipynb) from Udacity.
---
* neural machine translation (historical note)
* image captioning: encoder is a CNN, decoder is RNN
* forgetting problem revisited
* asking the network too much
* allowing the decoder to look back at encoder states
* pointer networks
{"metaMigratedAt":"2023-06-15T19:56:18.388Z","metaMigratedFrom":"YAML","title":"DeepNN Lecture 9 Slides","breaks":true,"description":"Lecture slides on recurrent neural networks, its variants like uRNNs, LSTMs. Touching on deep feed-forward networks like ResNets","contributors":"[{\"id\":\"e558be3b-4a2d-4524-8a66-38ec9fea8715\",\"add\":6459,\"del\":999}]"}