a roadmap for

# Principled Deep Learning

### Ferenc Huszár

Computer Lab, Cambridge
Gatsby Computational Neuroscience Unit, UCL

---

### ML@CL

![](https://i.imgur.com/4eME22N.jpg)

---

### early 2000s: some favourite papers

![](https://i.imgur.com/H8CYcFg.png)

---

### early 2000s: some favourite papers

![](https://i.imgur.com/0sFdOPj.png)

---

### early 2000s: some favourite papers

![](https://i.imgur.com/WqUKpCP.png)

---

### early 2000s

shift from bottom-up to top-down innovation

* make connections between different methods
* cast methods in a common framework
* abstract out key driving principles
* justify new methods from first principles

---

### 2010s: deep learning

![](https://i.imgur.com/MOeIywG.jpg)

---

### 2010s: deep learning

![](https://i.imgur.com/W2cBiJ6.jpg)

---

## deep learning challenged our existing principles and assumptions

---

### generalization

*"it's a property of the model class"*

### representation learning

*"maximum likelihood is all you need"*

### probabilistic foundations

*"Bayesian learning is the best kind of learning"*

### causal inference

*"goal of ML is to predict one thing from another"*

---

## Main themes in my research

---

### generalization and optimization

### representation learning

### probabilistic foundations

### causal inference

---

## Generalization

---

## Generalization

![](https://i.imgur.com/Tu5SHpr.png)

---

## Generalization

![](https://i.imgur.com/8bkhxAv.png)

---

## Generalization

![](https://i.imgur.com/YHedAr6.png)

---

## Generalization: deep nets

![](https://i.imgur.com/bfyRBsx.png)

---

## Generalization: deep nets

![](https://i.imgur.com/fzLYvHe.png)

---

## Generalization

* implicit regularization of the optimization method
* new tools:
  * neural tangent kernel [(Jacot et al, 2018)](https://arxiv.org/abs/1806.07572)
  * infinite-width neural networks
* new insights:
  * deep linear models [(e.g. Arora et al, 2019)](https://arxiv.org/abs/1905.13655)
* my particular focus:
  * natural gradient descent

---

## Representation learning

---

## Representation learning

* unsupervised learning
* learn an alternative representation of the data that is 'more useful'
* what does 'more useful' mean?

---

## goal 1: data-efficiency

![](https://i.imgur.com/R69OIFN.png)

---

## goal 2: linearity

![](https://i.imgur.com/Cqnw7f2.png)

---

## Latent variable modeling

* $x$: raw data
* $z$: latent representation/hidden variables
* $p_\theta(x, z)$: latent variable model
* $p_\theta(x) = \int p_\theta(x,z) dz$: marginal likelihood
* maximum likelihood: $\theta^\ast = \operatorname{argmax}_\theta \sum_{n=1}^N \log p_\theta(x_n)$
* $p_\theta(z \vert x) = \frac{p_\theta(x,z)}{p_\theta(x)}$: "inference" model
* $p_\theta(x,z) = p_\theta(z|x)p_\theta(x)$

---

#### Representation learning vs max likelihood

![](https://i.imgur.com/SPx9AoA.png)

---

#### Representation learning vs max likelihood

![](https://i.imgur.com/EqHhQVh.png)

---

#### Representation learning vs max likelihood

![](https://i.imgur.com/L0n5kSI.png)

---

#### Representation learning vs max likelihood

![](https://i.imgur.com/wuAdSbB.png)

---

#### Representation learning vs max likelihood

![](https://i.imgur.com/DwGlp8k.png)

---

#### Representation learning vs max likelihood

![](https://i.imgur.com/yuoEcbt.png)
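---

#### a toy sketch: marginal likelihood vs the variational bound

A minimal sketch of the evidence lower bound (ELBO) that variational methods maximise in place of $\log p_\theta(x)$, assuming a linear-Gaussian toy model so the exact marginal likelihood is available for comparison; the model and the helper names `log_marginal` and `elbo` are purely illustrative.

```python
import numpy as np

def log_marginal(x, w, b, sigma):
    """Exact log p(x) for the toy model
    z ~ N(0, 1),  x | z ~ N(w z + b, sigma^2)  =>  x ~ N(b, w^2 + sigma^2)."""
    var = w ** 2 + sigma ** 2
    return -0.5 * (np.log(2 * np.pi * var) + (x - b) ** 2 / var)

def elbo(x, w, b, sigma, m, s):
    """ELBO with a Gaussian approximate posterior q(z) = N(m, s^2):
    E_q[log p(x | z)] - KL(q(z) || p(z))  <=  log p(x)."""
    expected_sq_err = (x - w * m - b) ** 2 + (w * s) ** 2
    expected_loglik = -0.5 * (np.log(2 * np.pi * sigma ** 2)
                              + expected_sq_err / sigma ** 2)
    kl = 0.5 * (s ** 2 + m ** 2 - 1.0 - np.log(s ** 2))
    return expected_loglik - kl

x, w, b, sigma = 1.3, 2.0, 0.5, 0.7
# The true posterior p(z | x) is Gaussian here; using it as q makes the bound tight.
post_prec = 1.0 + w ** 2 / sigma ** 2
m_star = w * (x - b) / sigma ** 2 / post_prec
s_star = post_prec ** -0.5
print(log_marginal(x, w, b, sigma))          # exact log-likelihood
print(elbo(x, w, b, sigma, m_star, s_star))  # equal at the optimal q
print(elbo(x, w, b, sigma, 0.0, 1.0))        # any other q gives a strictly lower value
```

The gap between the two quantities is $\mathrm{KL}(q(z) \,\|\, p_\theta(z \vert x))$ — the term that separates variational training from pure maximum likelihood.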
---

### Open questions

* under what assumptions is maximum likelihood a good criterion?
* do variational methods provide useful implicit regularization?
* can we find principled motivations for self-supervised criteria?
  * provable self-supervised learning [(Lee et al, 2020)](https://arxiv.org/abs/2008.01064)
  * analysis of contrastive schemes [(Arora et al, 2019)](https://arxiv.org/pdf/1902.09229.pdf)

---

## Probabilistic foundations

---

Bayes posterior: $p(\theta\vert \mathcal{D}) \propto p(\mathcal{D}\vert \theta) p(\theta)$

"cold" posterior: $p(\theta\vert \mathcal{D}) \propto p(\mathcal{D}\vert \theta)^T p(\theta)$

![](https://i.imgur.com/hGMASKv.png)

(a toy sketch of likelihood tempering follows at the end of the deck)

---

![](https://i.imgur.com/Btd88VL.png)

---

## deep learning challenged our existing principles and assumptions

---

## 2020s - like the 2000s

shift from bottom-up to top-down innovation

* make connections between different methods
* cast methods in a common framework
* abstract out key driving principles
* justify new methods from first principles

---

## Main themes in my research

---

### generalization and optimization

### representation learning

### probabilistic foundations

### causal inference

---

### Thank you!

---

info for prospective PhD students: [inference.vc/phd](https://www.inference.vc/information-for-prospective-phd-students/)
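---

#### a toy sketch of likelihood tempering

A minimal sketch of what raising the likelihood to a power $T$ does, assuming a conjugate Gaussian model so the tempered posterior stays in closed form; the model and the helper name `tempered_posterior` are illustrative, not the neural-network setting of the cold-posterior results above.

```python
import numpy as np

def tempered_posterior(xs, sigma, T):
    """Posterior over theta for the conjugate toy model
        theta ~ N(0, 1),   x_i | theta ~ N(theta, sigma^2),
    with the likelihood raised to the power T:
        p_T(theta | D) ∝ p(D | theta)^T p(theta).
    Tempering a Gaussian likelihood is equivalent to counting every
    observation T times, so the update stays Gaussian."""
    n, total = len(xs), float(np.sum(xs))
    precision = 1.0 + T * n / sigma ** 2        # prior precision + tempered data precision
    mean = (T * total / sigma ** 2) / precision
    return mean, 1.0 / precision                # posterior mean and variance

rng = np.random.default_rng(0)
data = rng.normal(1.0, 2.0, size=20)
for T in (1.0, 2.0, 5.0):                       # T = 1 recovers the Bayes posterior
    mean, var = tempered_posterior(data, sigma=2.0, T=T)
    print(f"T = {T}: mean = {mean:.3f}, variance = {var:.4f}")
# Larger T concentrates the posterior, as if the data had been replicated T times.
```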