{%hackmd SybccZ6XD %}

###### tags: `paper`

# Spatial Transformer Networks

What is it?
> The spatial transformer module is a **dynamic mechanism** that can actively spatially transform an image

The architecture
> ![](https://i.imgur.com/HpZ0leh.png)
> U: input feature map
> V: output feature map

Localisation net
> Predicts the parameters $\theta$ of the transformation $T_\theta(G)$, where $G$ is the sampling grid
> ![](https://i.imgur.com/VfASWTp.png)

Examples of $T_\theta(G)$
> ![](https://i.imgur.com/uXzePQG.png)

Why it needs the sampler
> The output has a fixed size (here the same as the input), so each output pixel only needs to be sampled from a certain part of the input: the location that $T_\theta(G)$ assigns to it.
> ![](https://i.imgur.com/zZX38ZP.png)

## Appendix

MNIST addition task: the network must output the sum of the two digits given in the input.
> ![](https://i.imgur.com/PFHIxOs.png)

The classification task
> Find the informative part of the input and predict from it
> ![](https://i.imgur.com/eDz7Wqs.png)
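
Coming back to the grid generator and sampler from the main section, the two steps can be sketched in NumPy. This is only a minimal illustration, not the paper's implementation: it assumes a single-channel feature map, an affine $T_\theta$ given as a 2×3 matrix acting on coordinates normalised to $[-1, 1]$, bilinear interpolation, and zeros outside the input. The function names `affine_grid`, `bilinear_sample` and `spatial_transform` are hypothetical, and in a real network $\theta$ would come from the localisation net rather than being written by hand.

```python
import numpy as np

def affine_grid(theta, out_h, out_w):
    """Grid generator: map every output pixel to a source location in U.

    theta is a (2, 3) affine matrix on normalised coordinates in [-1, 1].
    Returns source xs, ys of shape (out_h, out_w), also normalised.
    """
    ys_t, xs_t = np.meshgrid(
        np.linspace(-1.0, 1.0, out_h),
        np.linspace(-1.0, 1.0, out_w),
        indexing="ij",
    )
    # Homogeneous target coordinates (x_t, y_t, 1), one column per output pixel.
    target = np.stack([xs_t.ravel(), ys_t.ravel(), np.ones(out_h * out_w)])
    source = theta @ target  # shape (2, out_h * out_w)
    return source[0].reshape(out_h, out_w), source[1].reshape(out_h, out_w)

def bilinear_sample(U, xs, ys):
    """Sampler: read U (H, W) at normalised locations (xs, ys)."""
    H, W = U.shape
    # Normalised [-1, 1] -> pixel coordinates [0, W - 1] and [0, H - 1].
    x = (xs + 1.0) * (W - 1) / 2.0
    y = (ys + 1.0) * (H - 1) / 2.0
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1, y1 = x0 + 1, y0 + 1
    # Weights of the four neighbouring pixels.
    wx1, wy1 = x - x0, y - y0
    wx0, wy0 = 1.0 - wx1, 1.0 - wy1

    def gather(yy, xx):
        # Assumption of this sketch: locations outside U contribute zero.
        valid = (xx >= 0) & (xx < W) & (yy >= 0) & (yy < H)
        vals = U[np.clip(yy, 0, H - 1), np.clip(xx, 0, W - 1)]
        return np.where(valid, vals, 0.0)

    return (wy0 * wx0 * gather(y0, x0) + wy0 * wx1 * gather(y0, x1)
            + wy1 * wx0 * gather(y1, x0) + wy1 * wx1 * gather(y1, x1))

def spatial_transform(U, theta, out_shape=None):
    """Produce V from U; V has the same size as U unless out_shape is given."""
    out_h, out_w = out_shape if out_shape is not None else U.shape
    xs, ys = affine_grid(theta, out_h, out_w)
    return bilinear_sample(U, xs, ys)

U = np.arange(25, dtype=float).reshape(5, 5)
identity = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
zoom = np.array([[0.5, 0.0, 0.0], [0.0, 0.5, 0.0]])  # samples the central half of U
V_same = spatial_transform(U, identity)
V_zoom = spatial_transform(U, zoom)
```

With the identity matrix the output reproduces the input. With the `zoom` matrix the output keeps the same 5×5 size but only covers the central region of `U`, which is the "sample a certain part" behaviour described above.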