# Reading Group
[Rota link](https://docs.google.com/spreadsheets/d/1nrfLCDUdd_le6yTrYCbDzi-lYXVD2vTXVD6Bl6G_70U/edit#gid=0)
## Next up
## Past meetings
1. *Auto-encoding variational Bayes.*
   **12 Oct 2020**, presented by Feri: [paper](https://arxiv.org/abs/1312.6114) and [notes](/o5ijDzi0SBGV5KUfYQP-6w)
1. *Auto-encoding variational Bayes. (cont'd)*
   **26 Oct 2020**, presented by Feri: [paper](https://arxiv.org/abs/1312.6114) and [notes](/o5ijDzi0SBGV5KUfYQP-6w)
1. *β-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework*
   **9 Nov 2020**, presented by Csabi: [paper](https://openreview.net/pdf?id=Sy2fzU9gl), [notes](/RLB69IecTiueh1gJrH0seg), and [colab notebook](https://colab.research.google.com/drive/1CFlAepkqNHaptWX1Iie0dGgfHercJzgT#scrollTo=2mg2ofl259R1)
1. *Learning Fair Representations*
   **23 Nov 2020**, presented by Mina: [paper](http://proceedings.mlr.press/v28/zemel13.html), [notes](/GZZGc1IRSTCsvzx6HYePvg), and [slides](https://docs.google.com/presentation/d/1H6Q8uW-aius2Hz23OIdrEuEzlE-HwMTBRn-XtKcl7iU/edit#slide=id.p)
1. *A Maximum-Likelihood Interpretation for Slow Feature Analysis*
   **7 Dec 2020**, presented by Patrik: [paper](http://learning.eng.cam.ac.uk/pub/Public/Turner/TurnerAndSahani2007a/turner-and-sahani-2007a.pdf), [notes](/qPv4uKr-S6eDOyS3oewojg), and [slides](https://drive.google.com/file/d/1mVXuopiIP58TEomEQGdXCJVf_T_O5J3s/view?usp=sharing)
1. *The Kalman Filter*
   **18 Jan 2021**, presented by Patrik: [paper](http://web.mit.edu/kirtley/kirtley/binlustuff/literature/control/Kalman%20filter.pdf) and [notes and slides](https://hackmd.io/Ahihs6CfQ-SQIxmvpNS21w?both)
1. *Equality of Opportunity in Supervised Learning*
   **1 Feb 2021**, presented by Emese: [paper](https://papers.neurips.cc/paper/2016/file/9d2682367c3935defcb1f9e247a97c0d-Paper.pdf) and [notes](https://drive.google.com/file/d/1gWHqJZEm00JnTyZIg1pjxay8xjSGfVpB/view?usp=sharing)
1. *Deep Residual Learning for Image Recognition*
   **15 Feb 2021**, presented by V Dóra: [paper](https://openaccess.thecvf.com/content_cvpr_2016/papers/He_Deep_Residual_Learning_CVPR_2016_paper.pdf) and [notes](https://hackmd.io/0UA4BTW3RqeILM8EBDSvDQ)
1. *Probabilistic PCA*
   **1 Mar 2021**, presented by J Dóri: [paper](https://www.robots.ox.ac.uk/~cvrg/hilary2006/ppca.pdf) and [notes]()
1. *Monte Carlo Gradient Estimation in Machine Learning*
   **15 Mar 2021**, presented by Bea: [paper](https://arxiv.org/abs/1906.10652), [notes](/X1jFHugtRiyQtjOJuDM2hA), and [slides](https://drive.google.com/file/d/19hKLELff55s9bHKNU0AtLLE5aXENlwml/view?usp=sharing)
1. *Towards Principled Methods for Training Generative Adversarial Networks*
   **22 Mar 2021**, presented by Martin Arjovsky (guest): [paper](https://arxiv.org/abs/1701.04862)
1. *Wasserstein GAN*
   **29 Mar 2021**, presented by Anna: [paper](https://arxiv.org/abs/1701.07875) and [notes](https://hackmd.io/@mljc/H1Biuwlru)
1. *Policy Gradient Methods for Reinforcement Learning with Function Approximation*
   **12 Apr 2021**, presented by S Attila: [paper](https://proceedings.neurips.cc/paper/1999/file/464d828b85b0bed98e80ade0a5c43b0f-Paper.pdf) and [notes]()
1. *Kernel-Predicting Convolutional Networks for Denoising Monte Carlo Renderings*
   **26 Apr 2021**, presented by Enci: [paper](https://studios.disneyresearch.com/wp-content/uploads/2019/03/Kernel-Predicting-Convolutional-Networks-for-Denoising-Monte-Carlo-Renderings-Paper33.pdf) and [notes](/Oq9GgjVdSBiLtUSqSVMurg)
1. *Independent Component Analysis*
   **10 May 2021**, presented by Patrik: [notes](https://hackmd.io/tFr-eBO5R7WLtEc1PHRDKw)
1. *Reformer: The Efficient Transformer*
   **24 May 2021**, presented by Bence: [paper](https://arxiv.org/pdf/2001.04451.pdf) and [notes](https://hackmd.io/37N3_YhqScK3crIMMvi9-g)
1. *Explainable ML Overview*
   **7 Jun 2021**, presented by Emese: [paper](https://arxiv.org/pdf/2102.13076.pdf) and [slides](https://drive.google.com/file/d/1euLaqHF5pJ_MM8DOYAV2T3H2M-5qSHii/view?usp=sharing)
1. *Guest Seminar: Vision Transformers and MLP mixer*
   **21 Jun 2021**, presented by Neil Houlsby: [ViT paper](https://arxiv.org/abs/2010.11929) and [MLP-Mixer paper](https://arxiv.org/abs/2105.01601v4)
1. *Guest Seminar: Deterministic Policy Gradients, RL for Continuous Control*
   **28 Jun 2021**, presented by Nicolas Heess: [DPG paper](http://proceedings.mlr.press/v32/silver14.pdf) and [DDPG paper](https://arxiv.org/pdf/1509.02971.pdf)
1. *Understanding deep learning requires rethinking generalization*
   **13 Sep 2021**, presented by Feri: [arXiv](https://arxiv.org/abs/1611.03530)
1. *Lottery Ticket Hypothesis*
   **28 Jun 2021**, presented by Mina: [arXiv](https://arxiv.org/abs/1803.03635)
1. *Score-Based Generative Modeling through Stochastic Differential Equations*
   **28 Jun 2021**, presented by Máté: [arXiv](https://arxiv.org/abs/2011.13456)
1. *Representation Learning with Contrastive Predictive Coding*
   **12 Nov 2021**, presented by Bea: [paper](https://arxiv.org/abs/1807.03748) and [follow-up paper](https://arxiv.org/abs/1905.09272)
1. *Guest Seminar: Data-Efficient Representation Learning and Contrastive Losses*
   **19 Nov 2021**, presented by Olivier Hénaff: [Divide and Contrast paper](https://arxiv.org/abs/2105.08054)
1. *SimCLR v1/v2 and Intriguing Properties of Contrastive Losses*
   **26 Nov 2021**, presented by Ting Chen: [SimCLR v1 paper](https://arxiv.org/abs/2002.05709) and [SimCLR v2 paper](https://arxiv.org/abs/2006.10029)
1. *Contrastive Learning Inverts the Data Generating Process*
   **10 Dec 2021**, presented by Eszter: [paper](https://arxiv.org/abs/2102.08850)
1. *Deep Q-learning*
   **7 Jan 2022**, presented by Attila: [paper](https://www.nature.com/articles/nature14236)
1. *TRPO*
   **4 Feb 2022**, presented by Attila: [paper](https://arxiv.org/abs/1502.05477)
1. *The Q-manifesto*
   **11 Mar 2022**, presented by Gergely Neu: [logistic Q-learning paper](https://arxiv.org/abs/2010.11151)
1. *Gauge Equivariant Convolutional Networks*
   **18 Mar 2022**, presented by Szilvi: [paper](https://arxiv.org/abs/1902.04615)
# Papers
## Generative Models
### VAE
- Diederik P Kingma, Max Welling (2013) **Auto-encoding variational Bayes.** ICLR [pdf](https://arxiv.org/abs/1312.6114)
- Irina Higgins, Loic Matthey, Arka Pal, Christopher Burgess, Xavier Glorot, Matthew Botvinick, Shakir Mohamed and Alexander Lerchner (2017) **$\beta$-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework** [web](http://www.matthey.me/publication/beta-vae/) [openreview](https://openreview.net/forum?id=Sy2fzU9gl)
- Shakir Mohamed, Mihaela Rosca, Michael Figurnov, Andriy Mnih (2019) **Monte Carlo Gradient Estimation in Machine Learning** [pdf](https://arxiv.org/abs/1906.10652)
- Milton Llera Montero, Casimir JH Ludwig, Rui Ponte Costa, Gaurav Malhotra, Jeffrey Bowers (2021) **The role of Disentanglement in Generalisation** [openreview](https://openreview.net/forum?id=qbH974jKUVy)
- Zhisheng Xiao, Karsten Kreis, Jan Kautz, Arash Vahdat (2021) **VAEBM: A Symbiosis between Variational Autoencoders and Energy-based Models** [openreview](https://openreview.net/forum?id=5m3SEczOV8L)
### GANs
* Martin Arjovsky and Léon Bottou (2017) **Towards Principled Methods for Training Generative Adversarial Networks** [arXiv](https://arxiv.org/abs/1701.04862)
* Martin Arjovsky, Soumith Chintala and Léon Bottou (2017) **Wasserstein GAN** [arXiv](https://arxiv.org/abs/1701.07875)
* Xi Chen, Yan Duan, Rein Houthooft, John Schulman, Ilya Sutskever, and Pieter Abbeel (2016) **InfoGAN: Interpretable representation learning by information maximizing generative adversarial nets.** NeurIPS [pdf](https://arxiv.org/abs/1606.03657), [inFERENCe](https://www.inference.vc/infogan-variational-bound-on-mutual-information-twice/)
* Tero Karras, Samuli Laine and Timo Aila (2019) **A Style-Based Generator Architecture for Generative Adversarial Networks** [web](https://openaccess.thecvf.com/content_CVPR_2019/html/Karras_A_Style-Based_Generator_Architecture_for_Generative_Adversarial_Networks_CVPR_2019_paper.html)
### Maximum Likelihood, linear-Gaussian, ICA
* Laurenz Wiskott and Terrence J. Sejnowski (2002) **Slow Feature Analysis: Unsupervised Learning of Invariances** [pdf](http://www.cnbc.cmu.edu/~tai/readings/learning/wiskott_sejnowski_2002.pdf)
* Mike Tipping and Chris Bishop (1999) **Probabilistic Principal Component Analysis** [pdf](https://www.microsoft.com/en-us/research/publication/probabilistic-principal-component-analysis/)
* James V. Stone **Independent Component Analysis: A Tutorial Introduction** [pdf](http://pzs.dstu.dp.ua/DataMining/ica/bibl/Stone.pdf)
* Andrew Ng's video lecture on ICA [video](https://www.youtube.com/watch?v=YQA9lLdLig8&t=1s)
### Misc
* Aapo Hyvärinen (2005) **Estimation of Non-Normalized Statistical Models by Score Matching** [pdf](https://jmlr.org/papers/volume6/hyvarinen05a/old.pdf)
* Geoffrey Hinton (2002) **Training Products of Experts by Minimizing Contrastive Divergence** [web](https://direct.mit.edu/neco/article/14/8/1771/6687/Training-Products-of-Experts-by-Minimizing)
* Yang Song, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, Ben Poole (2021) **Score-Based Generative Modeling through Stochastic Differential Equations** [arXiv](https://arxiv.org/abs/2011.13456)
## Architectures
- Kaiming He, Xiangyu Zhang, Shaoqing Ren and Jian Sun (2016) **Deep Residual Learning for Image Recognition** [pdf](https://openaccess.thecvf.com/content_cvpr_2016/papers/He_Deep_Residual_Learning_CVPR_2016_paper.pdf)
- Martin Arjovsky, Amar Shah, Yoshua Bengio (2015) **Unitary Evolution Recurrent Neural Networks** [pdf](https://arxiv.org/pdf/1511.06464.pdf)
## Generalization/theory
- Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, Oriol Vinyals (2016) **Understanding deep learning requires rethinking generalization** [arxiv](https://arxiv.org/abs/1611.03530)
## Fairness, privacy-preserving ML
- Rich Zemel, Yu Wu, Kevin Swersky, Toni Pitassi, Cynthia Dwork (2013) **Learning Fair Representations.** ICML [pdf](http://proceedings.mlr.press/v28/zemel13.html)
- Moritz Hardt, Eric Price, and Nati Srebro (2016) **Equality of opportunity in supervised learning.** NeurIPS [pdf](http://papers.nips.cc/paper/6373-equality-of-opportunity-in-supervised-learning)
- Luca Melis, Congzheng Song, Emiliano De Cristofaro, Vitaly Shmatikov (2018) **Exploiting Unintended Feature Leakage in Collaborative Learning** IEEE Symposium on Security and Privacy [pdf](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8835269)
## Reinforcement learning
- Richard S. Sutton, David McAllester, Satinder Singh and Yishay Mansour (1999) **Policy Gradient Methods for Reinforcement Learning with Function Approximation** [pdf](https://proceedings.neurips.cc/paper/1999/file/464d828b85b0bed98e80ade0a5c43b0f-Paper.pdf)
- John Schulman, Sergey Levine, Philipp Moritz, Michael I. Jordan and Pieter Abbeel (2015) **Trust Region Policy Optimization** [arxiv](https://arxiv.org/abs/1502.05477)
- David Silver, Guy Lever, Nicolas Heess, Thomas Degris, Daan Wierstra and Martin Riedmiller (2014) **Deterministic Policy Gradient Algorithms** [pdf](http://proceedings.mlr.press/v32/silver14.pdf)
- Nicolas Heess, Greg Wayne, David Silver, Timothy Lillicrap, Yuval Tassa and Tom Erez (2015) **Learning Continuous Control Policies by Stochastic Value Gradients** [arXiv](https://arxiv.org/abs/1510.09142)
- John Schulman, Nicolas Heess, Theophane Weber and Pieter Abbeel (2015) **Gradient Estimation Using Stochastic Computation Graphs** [arXiv](https://arxiv.org/abs/1506.05254)
- Théophane Weber, Nicolas Heess, Lars Buesing and David Silver (2019) **Credit Assignment Techniques in Stochastic Computation Graphs** [arXiv](https://arxiv.org/abs/1901.01761)
## NLP
- Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin (2017) **Attention Is All You Need.** [arxiv](https://arxiv.org/abs/1706.03762)
- Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut (2019) **ALBERT: A Lite BERT for Self-supervised Learning of Language Representations.** [arxiv](https://arxiv.org/abs/1909.11942)
- Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya (2020) **Reformer: The Efficient Transformer.** [arxiv](https://arxiv.org/abs/2001.04451)
- Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu (2020) **Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer.** [arxiv](https://arxiv.org/abs/1910.10683)
# Suggested Content
## Online lectures
- Philipp Hennig: Probabilistic Machine Learning [link](https://uni-tuebingen.de/en/180804)
## Miscellaneous (blogposts, visualizations, etc.)
- Andrew Miller **Monte Carlo Gradient Estimators and Variational Inference** [link](http://andymiller.github.io/2016/12/19/elbo-gradient-estimators.html)
- Yang Song **Generative Modeling by Estimating Gradients of the Data Distribution** [link](http://yang-song.github.io/blog/2021/score/)
## Textbooks
- Kevin Murphy: Machine Learning: A Probabilistic Perspective [web](https://www.cs.ubc.ca/~murphyk/MLbook/); that page may not link to a PDF, so see also this [pdf](https://doc.lagout.org/science/Artificial%20Intelligence/Machine%20learning/Machine%20Learning_%20A%20Probabilistic%20Perspective%20%5BMurphy%202012-08-24%5D.pdf)
- David MacKay: Information Theory, Inference and Learning Algorithms [pdf](http://www.inference.org.uk/mackay/itila/). See also David's [MLSS lectures](http://videolectures.net/mlss09uk_mackay_it/) and his famous [information theory lectures](http://videolectures.net/david_mackay/).
- Marc Deisenroth, Aldo Faisal and Cheng Soon Ong. Mathematics for Machine Learning [web](https://mml-book.github.io/), [pdf](https://mml-book.github.io/book/mml-book.pdf)
- Chris Bishop: Pattern Recognition and Machine Learning [pdf](http://users.isr.ist.utl.pt/~wurmd/Livros/school/Bishop%20-%20Pattern%20Recognition%20And%20Machine%20Learning%20-%20Springer%20%202006.pdf)
- Ian Goodfellow, Yoshua Bengio, Aaron Courville: The Deep Learning Book [pdf](https://github.com/janishar/mit-deep-learning-book-pdf/blob/master/complete-book-bookmarked-pdf/deeplearningbook.pdf)
- Francis Bach: Learning Theory from First Principles [pdf](https://www.di.ens.fr/~fbach/ltfp_book.pdf)
# Concepts to learn
Notes on which concepts we should eventually learn about from the reading list.
* independence, conditional independence, explaining away
* matrix decompositions: PCA, matrix factorization, eigendecomposition, slow feature analysis
* variational bound, Jensen's inequality, KL divergence, ELBO, EM algorithm
* exponential family distributions, normalizing constants, conjugate priors
* decision theory, Bayes-optimality, optimality under L2 vs L1 loss
* stochastic gradient descent, convergence basics, generalisation properties
* constrained optimisation: augmented Lagrangians, Lagrange multipliers
* convex optimization: duality, Newton's method
* natural gradients: trust region view, Fisher information matrix, K-FAC
### Useful taxonomy/terms/expressions
* Score (score function): the gradient of the log-likelihood with respect to the parameter vector, where $\mathcal{L}(\theta)$ denotes the likelihood [link](https://en.wikipedia.org/wiki/Score_(statistics))
$$ s\left(\theta\right) = \dfrac{\partial \log \mathcal{L}\left(\theta\right)}{\partial \theta} $$
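To make the score concrete, here is a minimal Python sketch for a model we chose purely for illustration (it is not taken from any paper above): i.i.d. observations $x_1, \dots, x_n \sim \mathcal{N}(\mu, \sigma^2)$ with unknown mean $\mu$ and known $\sigma$. Then $\log \mathcal{L}(\mu) = -\frac{n}{2}\log(2\pi\sigma^2) - \sum_i (x_i - \mu)^2 / (2\sigma^2)$, so the score is $s(\mu) = \sum_i (x_i - \mu)/\sigma^2$. The function names and the toy data are hypothetical; the finite-difference function only checks that the closed form matches the derivative of the log-likelihood.

```python
import math

# Illustrative sketch under an assumed model: i.i.d. Gaussian data, unknown mean, known sigma.


def gaussian_log_likelihood(mu, data, sigma=1.0):
    """Log-likelihood log L(mu) of i.i.d. data under N(mu, sigma^2)."""
    n = len(data)
    return (-0.5 * n * math.log(2 * math.pi * sigma ** 2)
            - sum((x - mu) ** 2 for x in data) / (2 * sigma ** 2))


def gaussian_score(mu, data, sigma=1.0):
    """Closed-form score s(mu) = d log L / d mu = sum_i (x_i - mu) / sigma^2."""
    return sum(x - mu for x in data) / sigma ** 2


def numerical_score(mu, data, sigma=1.0, h=1e-5):
    """Central finite-difference estimate of the same derivative, as a sanity check."""
    return (gaussian_log_likelihood(mu + h, data, sigma)
            - gaussian_log_likelihood(mu - h, data, sigma)) / (2 * h)


# Hypothetical toy data set, for illustration only.
data = [1.2, 0.7, 2.1, 1.5]

# Away from the optimum the score is non-zero and points towards higher likelihood.
print(gaussian_score(0.5, data), numerical_score(0.5, data))

# At the maximum-likelihood estimate (the sample mean) the score vanishes.
mle = sum(data) / len(data)
print(gaussian_score(mle, data))
```

At $\mu = 0.5$ both estimates should be approximately $3.5$ (the data sum $5.5$ minus $4 \times 0.5$), and at the sample mean the score is zero up to floating-point rounding, as expected at an interior maximum of the likelihood.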