# Reading Group

[Rota link](https://docs.google.com/spreadsheets/d/1nrfLCDUdd_le6yTrYCbDzi-lYXVD2vTXVD6Bl6G_70U/edit#gid=0)

## Next up

## Past meetings

1. *Auto-encoding variational Bayes.* 📅 **12 Oct 2020** 👤 Feri 📄 [paper](https://arxiv.org/abs/1312.6114) and 📝 [notes](/o5ijDzi0SBGV5KUfYQP-6w)
1. *Auto-encoding variational Bayes. (cont'd)* 📅 **26 Oct 2020** 👤 Feri 📄 [paper](https://arxiv.org/abs/1312.6114) and 📝 [notes](/o5ijDzi0SBGV5KUfYQP-6w)
1. *β-VAE: learning basic visual concepts with a constrained variational framework* 📅 **9 Nov 2020** 👤 Csabi 📄 [paper](https://openreview.net/pdf?id=Sy2fzU9gl), 📝 [notes](/RLB69IecTiueh1gJrH0seg), and [colab notebook](https://colab.research.google.com/drive/1CFlAepkqNHaptWX1Iie0dGgfHercJzgT#scrollTo=2mg2ofl259R1)
1. *Learning Fair Representations* 📅 **23 Nov 2020** 👤 Mina 📄 [paper](http://proceedings.mlr.press/v28/zemel13.html), 📝 [notes](/GZZGc1IRSTCsvzx6HYePvg), and [slides](https://docs.google.com/presentation/d/1H6Q8uW-aius2Hz23OIdrEuEzlE-HwMTBRn-XtKcl7iU/edit#slide=id.p)
1. *A Maximum-Likelihood Interpretation for Slow Feature Analysis* 📅 **7 Dec 2020** 👤 Patrik 📄 [paper](http://learning.eng.cam.ac.uk/pub/Public/Turner/TurnerAndSahani2007a/turner-and-sahani-2007a.pdf), 📝 [notes](/qPv4uKr-S6eDOyS3oewojg), and [slides](https://drive.google.com/file/d/1mVXuopiIP58TEomEQGdXCJVf_T_O5J3s/view?usp=sharing)
1. *The Kalman Filter* 📅 **18 Jan 2021** 👤 Patrik 📄 [paper](http://web.mit.edu/kirtley/kirtley/binlustuff/literature/control/Kalman%20filter.pdf), 📝 [notes and slides](https://hackmd.io/Ahihs6CfQ-SQIxmvpNS21w?both)
1. *Equality of Opportunity in Supervised Learning* 📅 **1 Feb 2021** 👤 Emese 📄 [paper](https://papers.neurips.cc/paper/2016/file/9d2682367c3935defcb1f9e247a97c0d-Paper.pdf) and 📝 [notes](https://drive.google.com/file/d/1gWHqJZEm00JnTyZIg1pjxay8xjSGfVpB/view?usp=sharing)
1. *Deep Residual Learning for Image Recognition* 📅 **15 Feb 2021** 👤 V Dóra 📄 [paper](https://openaccess.thecvf.com/content_cvpr_2016/papers/He_Deep_Residual_Learning_CVPR_2016_paper.pdf), 📝 [notes](https://hackmd.io/0UA4BTW3RqeILM8EBDSvDQ)
1. *Probabilistic PCA* 📅 **1 Mar 2021** 👤 J Dóri 📄 [paper](https://www.robots.ox.ac.uk/~cvrg/hilary2006/ppca.pdf), 📝 [notes]()
1. *Monte Carlo Gradient Estimation in Machine Learning* 📅 **15 Mar 2021** 👤 Bea 📄 [paper](https://arxiv.org/abs/1906.10652), 📝 [notes](/X1jFHugtRiyQtjOJuDM2hA), [slides](https://drive.google.com/file/d/19hKLELff55s9bHKNU0AtLLE5aXENlwml/view?usp=sharing)
1. *Towards Principled Methods for Training Generative Adversarial Networks* 📅 **22 Mar 2021** 👤 Martin Arjovsky (guest) 📄 [paper](https://arxiv.org/abs/1701.04862)
1. *Wasserstein GAN* 📅 **29 Mar 2021** 👤 Anna 📄 [paper](https://arxiv.org/abs/1701.07875), 📝 [notes](https://hackmd.io/@mljc/H1Biuwlru)
1. *Policy Gradient Methods for Reinforcement Learning with Function Approximation* 📅 **12 Apr 2021** 👤 S Attila 📄 [paper](https://proceedings.neurips.cc/paper/1999/file/464d828b85b0bed98e80ade0a5c43b0f-Paper.pdf), 📝 [notes]()
1. *Kernel-Predicting Convolutional Networks for Denoising Monte Carlo Renderings* 📅 **26 Apr 2021** 👤 Enci 📄 [paper](https://studios.disneyresearch.com/wp-content/uploads/2019/03/Kernel-Predicting-Convolutional-Networks-for-Denoising-Monte-Carlo-Renderings-Paper33.pdf), 📝 [notes](/Oq9GgjVdSBiLtUSqSVMurg)
1. *Independent Component Analysis* 📅 **10 May 2021** 👤 Patrik 📄 [notes](https://hackmd.io/tFr-eBO5R7WLtEc1PHRDKw)
1. *Reformer: The Efficient Transformer* 📅 **24 May 2021** 👤 Bence 📄 [paper](https://arxiv.org/pdf/2001.04451.pdf) and 📝 [notes](https://hackmd.io/37N3_YhqScK3crIMMvi9-g)
1. *Explainable ML Overview* 📅 **7 Jun 2021** 👤 Emese 📄 [paper](https://arxiv.org/pdf/2102.13076.pdf) and 📝 [slides](https://drive.google.com/file/d/1euLaqHF5pJ_MM8DOYAV2T3H2M-5qSHii/view?usp=sharing)
1. *Guest Seminar: Vision Transformers and MLP mixer* 📅 **21 Jun 2021** 👤 Neil Houlsby 📄 [ViT paper](https://arxiv.org/abs/2010.11929), [MLP mixer paper](https://arxiv.org/abs/2105.01601v4)
1. *Guest Seminar: Deterministic Policy Gradients, RL for Continuous Control* 📅 **28 Jun 2021** 👤 Nicolas Heess 📄 [DPG paper](http://proceedings.mlr.press/v32/silver14.pdf), [DDPG paper](https://arxiv.org/pdf/1509.02971.pdf)
1. *Understanding deep learning requires rethinking generalization* 📅 **13 Sep 2021** 👤 Feri 📄 [arXiv](https://arxiv.org/abs/1611.03530)
1. *Lottery Ticket Hypothesis* 📅 **28 Jun 2021** 👤 Mina 📄 [arXiv](https://arxiv.org/abs/1803.03635)
1. *Score-Based Generative Modeling through Stochastic Differential Equations* 📅 **28 Jun 2021** 👤 Máté 📄 [arXiv](https://arxiv.org/abs/2011.13456)
1. *Representation Learning with Contrastive Predictive Coding* 📅 **12 Nov 2021** 👤 Bea 📄 [paper](https://arxiv.org/abs/1807.03748) and [follow-up paper](https://arxiv.org/abs/1905.09272)
1. *Guest Seminar: Data-Efficient Representation Learning and Contrastive Losses* 📅 **19 Nov 2021** 👤 Olivier Hénaff 📄 [Divide and Contrast paper](https://arxiv.org/abs/2105.08054)
1. *SimCLR v1/v2 and Intriguing Properties of Contrastive Losses* 📅 **26 Nov 2021** 👤 Ting Chen 📄 [SimCLR v1 paper](https://arxiv.org/abs/2002.05709), [SimCLR v2 paper](https://arxiv.org/abs/2006.10029)
1. *Contrastive Learning Inverts the Data Generating Process* 📅 **10 Dec 2021** 👤 Eszter 📄 [paper](https://arxiv.org/abs/2102.08850)
1. *Deep Q-learning* 📅 **7 Jan 2022** 👤 Attila 📄 [paper](https://www.nature.com/articles/nature14236)
1. *TRPO* 📅 **4 Feb 2022** 👤 Attila 📄 [paper](https://arxiv.org/abs/1502.05477)
1. *The Q-manifesto* 📅 **11 Mar 2022** 👤 Gergely Neu 📄 [logistic Q-learning paper](https://arxiv.org/abs/2010.11151)
1. *Gauge Invariant Convolutional Networks.* 📅 **18 Mar 2022** 👤 Szilvi 📄 [paper](https://arxiv.org/abs/1902.04615)

# Papers

## Generative Models

### VAE

- Diederik P Kingma, Max Welling (2013) **Auto-encoding variational Bayes.** ICLR [pdf](https://arxiv.org/abs/1312.6114)
- Irina Higgins, Loic Matthey, Arka Pal, Christopher Burgess, Xavier Glorot, Matthew Botvinick, Shakir Mohamed and Alexander Lerchner (2017) **$\beta$-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework** [web](http://www.matthey.me/publication/beta-vae/) [openreview](https://openreview.net/forum?id=Sy2fzU9gl)
- Shakir Mohamed, Mihaela Rosca, Michael Figurnov, Andriy Mnih (2019) **Monte Carlo Gradient Estimation in Machine Learning** [pdf](https://arxiv.org/abs/1906.10652)
- Milton Llera Montero, Casimir JH Ludwig, Rui Ponte Costa, Gaurav Malhotra, Jeffrey Bowers (2021) **The role of Disentanglement in Generalisation** [openreview](https://openreview.net/forum?id=qbH974jKUVy)
- Zhisheng Xiao, Karsten Kreis, Jan Kautz, Arash Vahdat (2021) **VAEBM: A Symbiosis between Variational Autoencoders and Energy-based Models** [openreview](https://openreview.net/forum?id=5m3SEczOV8L)

### GANs

* Martin Arjovsky and Léon Bottou (2017) **Towards Principled Methods for Training Generative Adversarial Networks** [arXiv](https://arxiv.org/abs/1701.04862)
* Martin Arjovsky, Soumith Chintala and Léon Bottou (2017) **Wasserstein GAN** [arXiv](https://arxiv.org/abs/1701.07875)
* Xi Chen, Yan Duan, Rein Houthooft, John Schulman, Ilya Sutskever, and Pieter Abbeel (2016) **InfoGAN: Interpretable representation learning by information maximizing generative adversarial nets.** NeurIPS [pdf](https://arxiv.org/abs/1606.03657), [inFERENCe](https://www.inference.vc/infogan-variational-bound-on-mutual-information-twice/)
* Tero Karras, Samuli Laine and Timo Aila (2019) **A Style-Based Generator Architecture for Generative Adversarial Networks** [web](https://openaccess.thecvf.com/content_CVPR_2019/html/Karras_A_Style-Based_Generator_Architecture_for_Generative_Adversarial_Networks_CVPR_2019_paper.html)

### Maximum Likelihood, linear-Gaussian, ICA

* Laurenz Wiskott and Terrence J. Sejnowski (2002) **Slow Feature Analysis: Unsupervised Learning of Invariances** [pdf](http://www.cnbc.cmu.edu/~tai/readings/learning/wiskott_sejnowski_2002.pdf)
* Mike Tipping and Chris Bishop **Probabilistic Principal Component Analysis** [pdf](https://www.microsoft.com/en-us/research/publication/probabilistic-principal-component-analysis/)
* James V. Stone **Independent Component Analysis: A Tutorial Introduction** [pdf](http://pzs.dstu.dp.ua/DataMining/ica/bibl/Stone.pdf)
* Andrew Ng's video lecture on ICA [video](https://www.youtube.com/watch?v=YQA9lLdLig8&t=1s)

### Misc

* Aapo Hyvärinen **Estimation of Non-Normalized Statistical Models by Score Matching** [pdf](https://jmlr.org/papers/volume6/hyvarinen05a/old.pdf)
* Geoffrey Hinton (2002) **Training Products of Experts by Minimizing Contrastive Divergence** [web](https://direct.mit.edu/neco/article/14/8/1771/6687/Training-Products-of-Experts-by-Minimizing)
* Yang Song, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, Ben Poole (2021) **Score-Based Generative Modeling through Stochastic Differential Equations** [arXiv](https://arxiv.org/abs/2011.13456)

## Architectures

- Kaiming He, Xiangyu Zhang, Shaoqing Ren and Jian Sun (2016) **Deep Residual Learning for Image Recognition** [pdf](https://openaccess.thecvf.com/content_cvpr_2016/papers/He_Deep_Residual_Learning_CVPR_2016_paper.pdf)
- Martin Arjovsky, Amar Shah, Yoshua Bengio (2015) **Unitary Evolution Recurrent Neural Networks** [pdf](https://arxiv.org/pdf/1511.06464.pdf)

## Generalization/theory

- Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, Oriol Vinyals (2016) **Understanding deep learning requires rethinking generalization** [arXiv](https://arxiv.org/abs/1611.03530)

## Fairness, privacy-preserving ML

- Rich Zemel, Yu Wu, Kevin Swersky, Toni Pitassi, Cynthia Dwork (2013) **Learning Fair Representations.** ICML [pdf](http://proceedings.mlr.press/v28/zemel13.html)
- Moritz Hardt, Eric Price, and Nati Srebro (2016) **Equality of opportunity in supervised learning.** NeurIPS [pdf](http://papers.nips.cc/paper/6373-equality-of-opportunity-in-supervised-learning)
- Luca Melis, Congzheng Song, Emiliano DeCristofaro, Vitaly Shmatikov (2018) **Exploiting Unintended Feature Leakage in Collaborative Learning** IEEE Symposium on Security and Privacy [pdf](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8835269)

## Reinforcement learning

- Richard S. Sutton, David McAllester, Satinder Singh and Yishay Mansour (1999) **Policy Gradient Methods for Reinforcement Learning with Function Approximation** [pdf](https://proceedings.neurips.cc/paper/1999/file/464d828b85b0bed98e80ade0a5c43b0f-Paper.pdf)
- John Schulman, Sergey Levine, Philipp Moritz, Michael I. Jordan and Pieter Abbeel (2015) **Trust Region Policy Optimization** [arXiv](https://arxiv.org/abs/1502.05477)
- David Silver, Guy Lever, Nicolas Heess, Thomas Degris, Daan Wierstra and Martin Riedmiller (2014) **Deterministic Policy Gradient Algorithms** [pdf](http://proceedings.mlr.press/v32/silver14.pdf)
- Nicolas Heess, Greg Wayne, David Silver, Timothy Lillicrap, Yuval Tassa and Tom Erez (2015) **Learning Continuous Control Policies by Stochastic Value Gradients** [arXiv](https://arxiv.org/abs/1510.09142)
- John Schulman, Nicolas Heess, Theophane Weber and Pieter Abbeel (2015) **Gradient Estimation Using Stochastic Computation Graphs** [arXiv](https://arxiv.org/abs/1506.05254)
- Théophane Weber, Nicolas Heess, Lars Buesing and David Silver (2019) **Credit Assignment Techniques in Stochastic Computation Graphs** [arXiv](https://arxiv.org/abs/1901.01761)

## NLP

- Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, Illia Polosukhin (2017) **Attention Is All You Need.** [arXiv](https://arxiv.org/abs/1706.03762)
- Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut (2019) **ALBERT: A Lite BERT for Self-supervised Learning of Language Representations.** [arXiv](https://arxiv.org/abs/1909.11942)
- Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya (2020) **Reformer: The Efficient Transformer.** [arXiv](https://arxiv.org/abs/2001.04451)
- Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu (2020) **Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer.** [arXiv](https://arxiv.org/abs/1910.10683)

# Suggested Content

## Online lectures

- Philipp Hennig: Probabilistic Machine Learning [link](https://uni-tuebingen.de/en/180804)

## Miscellaneous (blogposts, visualizations, etc.)

- Andrew Miller **Monte Carlo Gradient Estimators and Variational Inference** [link](http://andymiller.github.io/2016/12/19/elbo-gradient-estimators.html)
- Yang Song **Generative Modeling by Estimating Gradients of the Data Distribution** [link](http://yang-song.github.io/blog/2021/score/)

## Textbooks

- Kevin Murphy: Machine Learning: A Probabilistic Perspective [pdf](https://www.cs.ubc.ca/~murphyk/MLbook/) (this first link may not lead to a PDF file; alternative: [pdf-2](https://doc.lagout.org/science/Artificial%20Intelligence/Machine%20learning/Machine%20Learning_%20A%20Probabilistic%20Perspective%20%5BMurphy%202012-08-24%5D.pdf))
- David MacKay: Information Theory, Inference and Learning Algorithms [pdf](http://www.inference.org.uk/mackay/itila/). See also David's [MLSS lectures](http://videolectures.net/mlss09uk_mackay_it/) and his famous [information theory lectures](http://videolectures.net/david_mackay/).
- Marc Deisenroth, Aldo Faisal and Cheng Soon Ong: Mathematics for Machine Learning [web](https://mml-book.github.io/), [pdf](https://mml-book.github.io/book/mml-book.pdf)
- Markus Svensen, Chris Bishop: Pattern Recognition and Machine Learning [pdf](http://users.isr.ist.utl.pt/~wurmd/Livros/school/Bishop%20-%20Pattern%20Recognition%20And%20Machine%20Learning%20-%20Springer%20%202006.pdf)
- Ian Goodfellow, Yoshua Bengio, Aaron Courville: The Deep Learning Book [pdf](https://github.com/janishar/mit-deep-learning-book-pdf/blob/master/complete-book-bookmarked-pdf/deeplearningbook.pdf)
- Francis Bach: Learning Theory from First Principles [pdf](https://www.di.ens.fr/~fbach/ltfp_book.pdf)

# Concepts to learn

Notes on which concepts we should eventually learn about from the reading list.

* independence, conditional independence, explaining away
* matrix decompositions: PCA, matrix factorization, eigendimensions, slow feature analysis
* variational bound, Jensen inequality, KL divergence, ELBO, EM algorithm
* exponential family distributions, normalizing constants, conjugate priors
* decision theory, Bayes-optimality, optimality under L2 vs L1 loss
* stochastic gradient descent, convergence basics, generalisation properties
* constrained optimisation: augmented Lagrangians, Lagrange multipliers
* convex optimization: duality, Newton's method
* natural gradients: trust region view, Fisher information matrix, K-FAC

### Useful taxonomy/terms/expressions

* Score(-function): gradient of the log-likelihood function $\mathcal{L}\left(\theta\right)$ w.r.t. the parameter vector $\theta$. [link](https://en.wikipedia.org/wiki/Score_(statistics))

$$
s\left(\theta\right) = \dfrac{\partial \mathcal{L}\left(\theta\right)}{\partial \theta}
$$
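
To make the score definition concrete, here is a minimal sketch under a stated assumption: the data are i.i.d. draws from a Gaussian with unknown mean $\mu$ and known standard deviation $\sigma$, so the score is $s(\mu) = \sum_i (x_i - \mu)/\sigma^2$. The function names and the sample values are hypothetical, chosen only for illustration. The sketch checks the analytic score against a finite-difference derivative of the log-likelihood and shows that the score vanishes at the maximum-likelihood estimate (the sample mean).

```python
import math


def gaussian_log_likelihood(mu, data, sigma=1.0):
    """Log-likelihood of i.i.d. samples under N(mu, sigma^2)."""
    n = len(data)
    const = -0.5 * n * math.log(2.0 * math.pi * sigma ** 2)
    return const - sum((x - mu) ** 2 for x in data) / (2.0 * sigma ** 2)


def gaussian_score(mu, data, sigma=1.0):
    """Analytic score: derivative of the log-likelihood w.r.t. mu."""
    return sum(x - mu for x in data) / sigma ** 2


def finite_difference_score(mu, data, sigma=1.0, eps=1e-5):
    """Central-difference approximation of the same derivative."""
    upper = gaussian_log_likelihood(mu + eps, data, sigma)
    lower = gaussian_log_likelihood(mu - eps, data, sigma)
    return (upper - lower) / (2.0 * eps)


# Made-up observations, used only to exercise the functions.
data = [0.8, 1.9, 3.1, 2.2]
mle = sum(data) / len(data)  # sample mean = maximum-likelihood estimate of mu

print("analytic score at mu=0:  ", gaussian_score(0.0, data))
print("numerical score at mu=0: ", finite_difference_score(0.0, data))
print("score at the MLE:        ", gaussian_score(mle, data))
```

A positive score at $\mu = 0$ says the log-likelihood increases as $\mu$ moves towards the data, and setting the score to zero recovers the sample mean.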