general
Question we are trying to answer
covariance structure
size of data with respect to width
MAP training (determinant)
claim that only a small number of bases fit the data?
Motivation
Current literature on NLM
Overarching research questions
What is a good basis?
BUT - this is dependent on where the data is
EX1. Consider any basis that is infinitely differentiable in a region: if we consider points close enough together, the basis is locally well approximated by an affine function of the inputs, so the model reduces to piecewise linear regression and we expect neither a good fit nor good uncertainties (we think)
If this is true, it tells us that we need to specify the in-distribution and out-of-distribution regions of the data in order to ask this question meaningfully (a small numerical sketch of the EX1 point follows)
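A minimal numerical sketch of the EX1 intuition, assuming an RBF basis as a stand-in for "any infinitely differentiable basis" (the centers and the interval are illustrative choices): on a tight cluster of inputs the feature map is nearly affine in x, so Bayesian linear regression on these features is locally just linear regression.

```python
import numpy as np

# Assumption: RBF features stand in for a smooth (learned) basis.
rng = np.random.default_rng(0)
centers = rng.uniform(-3, 3, size=20)                      # basis-function centers
phi = lambda x: np.exp(-0.5 * (x[:, None] - centers[None, :]) ** 2)

# Inputs clustered in a small region ("points close enough" to each other).
x = np.linspace(0.0, 0.05, 50)
Phi = phi(x)                                               # (50, 20) feature matrix

# Best affine approximation of every basis function on this region.
A = np.column_stack([np.ones_like(x), x])                  # intercept + slope design
coef, *_ = np.linalg.lstsq(A, Phi, rcond=None)
print("max deviation from an affine map:", np.abs(Phi - A @ coef).max())
# The deviation is tiny, so any linear model built on these features is
# (locally) just an affine function of x: the basis adds no flexibility here.
```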
This leads us to...
Main Ideas:
It is mysterious why NNs can generalize well with so many more parameters than data points
Effective dimensionality (ED) measures the dimensionality of the parameter space determined by the data
Relates ED to posterior contraction in Bayesian deep learning, model selection, width-depth tradeoffs, double descent, and functional diversity in loss surfaces.
ED compares favorably to alternative norm- and flatness-based generalization measures.
Effective Dimensionality (ED)
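A minimal sketch of the ED computation, using the formula from the paper, $N_{eff}(H, z) = \sum_i \lambda_i / (\lambda_i + z)$, where $\lambda_i$ are eigenvalues of the Hessian of the loss and $z > 0$ is a regularization constant (related to the prior precision); the toy eigenvalues below are assumed purely for illustration.

```python
import numpy as np

def effective_dimensionality(hessian_eigenvalues, z=1.0):
    """ED = sum_i lambda_i / (lambda_i + z).

    Eigenvalues well above z each contribute ~1 (a direction determined by the
    data); eigenvalues well below z contribute ~0 (a direction still governed
    by the prior), so ED counts the parameter directions the data has pinned down.
    """
    lam = np.asarray(hessian_eigenvalues, dtype=float)
    return float(np.sum(lam / (lam + z)))

# Toy spectrum (assumed): a few large eigenvalues and many near-zero ones.
eigs = np.concatenate([np.full(10, 1e3), np.full(990, 1e-3)])
print(effective_dimensionality(eigs, z=1.0))   # ~11, despite 1000 parameters
```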
Lectures
UC-Berkeley 20 min primer on Meta Learning
https://www.youtube.com/watch?v=h7qyQeXKxZE
Papers
Meta-Learning in Neural Networks: A Survey
Notes
Hypothesis: Good basis functions should transfer well to different tasks, but data-specific basis functions might not transfer well.
Deep Kernel Learning (DKL) combines the representational power of neural networks with the reliable uncertainty estimates of Gaussian processes by learning a flexible deep kernel function. The complexity is O(n) at train time and O(1) at test time, compared to O(n^3) at train time and O(n^2) at test time for standard Gaussian processes.
One of the central critiques of Gaussian process regression is that it does not actually learn representations of the data: the kernel function is specified in advance and is not flexible enough to do so. We can address this through deep kernel learning (DKL), which maps the inputs $x_n$ to intermediate values $v_n \in \mathbb{R}^Q$ through a neural network $g_\phi(\cdot)$ parameterized by weights and biases $\phi$. These intermediate values are then used as inputs to the standard kernel, resulting in the effective kernel $k_{DKL}(x, x') = k(g_\phi(x), g_\phi(x'))$.
NLM is essentially Deep Kernel Learning with a linear kernel
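A minimal sketch of the deep kernel, assuming a toy one-hidden-layer network as $g_\phi$ and an RBF base kernel (all sizes and weights here are illustrative); swapping in a linear base kernel recovers the NLM connection above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy feature network g_phi: x in R^d -> v in R^Q (assumed 1-hidden-layer MLP).
d, h, Q = 2, 16, 3
W1, b1 = rng.normal(size=(d, h)), np.zeros(h)
W2, b2 = rng.normal(size=(h, Q)), np.zeros(Q)
g_phi = lambda X: np.tanh(X @ W1 + b1) @ W2 + b2

def k_rbf(A, B, lengthscale=1.0):
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq / lengthscale**2)

def k_dkl(X, Xp):
    # Effective kernel: base kernel applied to learned features,
    # k_DKL(x, x') = k(g_phi(x), g_phi(x')).
    return k_rbf(g_phi(X), g_phi(Xp))

def k_nlm(X, Xp):
    # With a linear base kernel this is Bayesian linear regression on the
    # features, i.e. the neural linear model (NLM).
    return g_phi(X) @ g_phi(Xp).T

X = rng.normal(size=(5, d))
print(k_dkl(X, X).shape, k_nlm(X, X).shape)   # (5, 5) (5, 5)
```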
Paper main idea: new measurement of generalization; understand dropout’s effectiveness in improving generalization
New measurement proposed by the paper: weight expansion. With larger weight volume, one can achieve increased generalization in a PAC-Bayesian setting.
Application: Apply weight expansions to dropout. Theoretically and empirically examine that the application of dropout during training “expands” the weight volume.
Definition: weight volume is the normalized determinant of the weight covariance matrix
Intuitively, the more correlated the weights are, the smaller the weight volume and the worse the generalization ability
More orthogonal -> larger weight volume
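A minimal sketch of the weight-volume quantity, assuming "normalized determinant" means dividing out the per-dimension variances, i.e. taking the determinant of the weight correlation matrix (the paper's exact normalization may differ); this choice makes the correlation intuition above explicit.

```python
import numpy as np

def weight_volume(weight_samples):
    """Normalized determinant of the weight covariance matrix.

    Assumption: normalize by the per-weight variances, i.e. take the
    determinant of the correlation matrix. The value is 1 for uncorrelated
    weights and shrinks toward 0 as the weights become more correlated.
    """
    cov = np.cov(weight_samples, rowvar=False)
    std = np.sqrt(np.diag(cov))
    corr = cov / np.outer(std, std)
    return float(np.linalg.det(corr))

rng = np.random.default_rng(0)
n, d = 5000, 4
uncorrelated = rng.normal(size=(n, d))
correlated = uncorrelated @ np.linalg.cholesky(0.9 * np.ones((d, d)) + 0.1 * np.eye(d)).T
print(weight_volume(uncorrelated))   # close to 1 (larger volume)
print(weight_volume(correlated))     # much smaller (worse generalization, per the paper)
```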
1/31/22:
Todo: replicate experiments; sample from the prior and check that we get the same functions
Main Ideas:
NLMs are deep Bayesian models that produce predictive uncertainties by learning features with a neural network and then performing Bayesian linear regression over these features
Few works have methodically evaluated the predictive uncertainties of NLMs
Traditional training procedures for NLMs underestimate uncertainty on OOD data
Identify underlying reasons
Idea: Adding functional priors and using fVI (note: I saw fPOVI in another paper), we get BNN-like uncertainty using NLMs
Questions/Technicalities:
Prior choice is hard? Why use an uninformative prior when we have other methods
Brings complexity up: quadratic -> cubic in the worst case. Probably should not proceed with this approach.
GPs scale cubically with the number of observations, whereas NLMs scale linearly in the number of observations and cubically only in the basis-function dimensionality, making Bayesian optimization easier while maintaining flexibility and uncertainty (see the sketch below)
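To make the scaling concrete, here is a minimal sketch of the Bayesian linear regression step of an NLM over fixed features: the only matrix to factor is D x D (D = number of basis functions), so the cost is O(n D^2) for the feature products plus O(D^3) for the solve, never O(n^3) as in a GP. All names and sizes are illustrative, not from a specific paper.

```python
import numpy as np

def nlm_posterior(Phi, y, noise_var=0.1, prior_var=1.0):
    """Bayesian linear regression over fixed NN features Phi (n x D).

    Forming Phi.T @ Phi costs O(n D^2); inverting the D x D matrix costs
    O(D^3). Nothing scales cubically in the number of observations n.
    """
    n, D = Phi.shape
    A = Phi.T @ Phi / noise_var + np.eye(D) / prior_var   # D x D posterior precision
    Sigma = np.linalg.inv(A)                              # posterior covariance
    mu = Sigma @ Phi.T @ y / noise_var                    # posterior mean
    return mu, Sigma

def nlm_predict(phi_star, mu, Sigma, noise_var=0.1):
    mean = phi_star @ mu
    var = np.einsum('id,de,ie->i', phi_star, Sigma, phi_star) + noise_var
    return mean, var

rng = np.random.default_rng(0)
n, D = 10_000, 50                      # many observations, few basis functions
Phi = rng.normal(size=(n, D))
y = Phi @ rng.normal(size=D) + 0.3 * rng.normal(size=n)
mu, Sigma = nlm_posterior(Phi, y)
print(nlm_predict(Phi[:3], mu, Sigma))
```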
Related applications:
Applications in reinforcement learning (Riquelme et al., 2018, https://arxiv.org/abs/1802.09127; Azizzadenesheli and Anandkumar, 2019, https://arxiv.org/abs/1802.04412), active learning, and AutoML (Zhou and Precioso, 2019, https://arxiv.org/abs/1904.00577)
Todo (lucy): understand the math?
Paper main ideas:
Three variations of neural linear regression
MAP NL (first train the neural network using MAP estimation, then use the outputs of the last hidden layer as features for Bayesian linear regression; hyperparameters tuned with Bayesian optimization)
Uncertainty is only added at the last layer; MAP training does not learn the features with uncertainty quantification in mind
Regularized NL: learn the features by optimizing the tractable marginal likelihood with respect to the network weights (those prior to the output layer); a sketch of this objective follows the list
Bayesian noise NL
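A minimal sketch of the tractable log marginal likelihood that the Regularized NL variant maximizes with respect to the feature-network weights (the features Phi below stand in for the last-hidden-layer outputs; parameter names are assumptions).

```python
import numpy as np

def nlm_log_marginal_likelihood(Phi, y, noise_var=0.1, prior_var=1.0):
    """log p(y | X) for Bayesian linear regression on features Phi = Phi_theta(X):
    y ~ N(0, prior_var * Phi Phi^T + noise_var * I).

    In Regularized NL this is the training objective for the network weights
    theta that produce Phi. (The naive n x n version below can be rewritten
    via the matrix-inversion lemma to use only D x D factorizations.)
    """
    n = Phi.shape[0]
    K = prior_var * Phi @ Phi.T + noise_var * np.eye(n)
    _, logdet = np.linalg.slogdet(K)
    alpha = np.linalg.solve(K, y)
    return -0.5 * (y @ alpha + logdet + n * np.log(2 * np.pi))

rng = np.random.default_rng(0)
Phi = rng.normal(size=(100, 10))
y = Phi @ rng.normal(size=10) + 0.3 * rng.normal(size=100)
print(nlm_log_marginal_likelihood(Phi, y))
```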
MAIN IDEAS:
There is a limited understanding of the effects of depth and width on learned representations
How does varying depth and width affect model hidden representations?
Characteristic block structure in hidden representations of larger capacity models
Implies model capacity is large relative to the size of the training set
Reflects underlying layers preserving and propagating the dominant principal component of their representations
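The block structure in the paper is read off layer-by-layer representation similarity heatmaps; below is a minimal sketch of linear CKA, the similarity measure behind those heatmaps, with toy representations standing in for real layer activations.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between two representations of the same n examples.

    X: (n, d1), Y: (n, d2) activation matrices, centered per feature.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    hsic = np.linalg.norm(Y.T @ X, 'fro') ** 2
    return hsic / (np.linalg.norm(X.T @ X, 'fro') * np.linalg.norm(Y.T @ Y, 'fro'))

rng = np.random.default_rng(0)
n = 2000
layer_a = rng.normal(size=(n, 64))
Q, _ = np.linalg.qr(rng.normal(size=(64, 64)))          # rotation of the features
layer_b = layer_a @ Q + 0.1 * rng.normal(size=(n, 64))  # "next layer" mostly preserving layer_a
layer_c = rng.normal(size=(n, 64))                      # an unrelated representation
print(linear_cka(layer_a, layer_b))                     # close to 1
print(linear_cka(layer_a, layer_c))                     # close to 0
# A "block" in the heatmaps is a contiguous range of layers whose pairwise CKA
# values are all high, i.e. layers propagating essentially the same component.
```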
MAIN IDEAS:
Kernel machines and NNs possess universal function approximation properties
But their ways of choosing the appropriate function class differ
NNs learn representations by adapting their basis functions to the data
Kernel methods use a basis that is not adapted during training
Contrast the random features of approximated kernel machines with the learned features of NNs
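A minimal sketch of such a fixed, non-adapted basis: random Fourier features approximating an RBF kernel (Rahimi and Recht style). The frequencies are sampled once and never trained, in contrast to a network's learned basis; all sizes here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_fourier_features(X, num_features=500, lengthscale=1.0):
    """Random (not learned) basis approximating the RBF kernel:
    z(x) = sqrt(2/D) * cos(W^T x + b), W ~ N(0, 1/lengthscale^2), b ~ U(0, 2*pi).
    """
    d = X.shape[1]
    W = rng.normal(scale=1.0 / lengthscale, size=(d, num_features))
    b = rng.uniform(0, 2 * np.pi, size=num_features)
    return np.sqrt(2.0 / num_features) * np.cos(X @ W + b)

def rbf_kernel(A, B, lengthscale=1.0):
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq / lengthscale**2)

X = rng.normal(size=(6, 3))
Z = random_fourier_features(X)
print(np.abs(Z @ Z.T - rbf_kernel(X, X)).max())   # small; shrinks as num_features grows
# The basis (W, b) is fixed after sampling; a neural network would instead
# adapt these "frequencies" to the data during training.
```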
MAIN IDEAS:
Approximate inference techniques for weight-space priors of BNNs suffer from several drawbacks
The ‘Bayesian last layer’ (BLL) is an alternative BNN approach that learns the feature space for an exact Bayesian linear model with explicit predictive distributions.
Its predictions outside of the data distribution (OOD) are typically overconfident
Overcome this weakness by introducing a functional prior on the model’s derivatives
This method enhances the BLL to Gaussian process-like performance on tasks where calibrated uncertainty is critical: OOD regression, Bayesian optimization and active learning
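A minimal sketch of the overconfidence mechanism described above, assuming saturating tanh random features as a stand-in for a trained BLL/NLM feature map (everything here is illustrative, not the paper's construction): far from the data the features saturate, so the last layer's predictive standard deviation converges to a constant instead of growing with distance, which is the behaviour the functional prior on derivatives is meant to correct.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a trained feature map: saturating tanh random features.
W = rng.normal(size=(1, 50))
b = rng.normal(size=50)
features = lambda x: np.tanh(x[:, None] @ W + b)

# Fit the Bayesian last layer on in-distribution data x in [-1, 1].
x_train = np.linspace(-1, 1, 100)
y_train = np.sin(3 * x_train) + 0.1 * rng.normal(size=100)
Phi = features(x_train)
noise_var, prior_var = 0.01, 1.0
Sigma = np.linalg.inv(Phi.T @ Phi / noise_var + np.eye(50) / prior_var)
mu = Sigma @ Phi.T @ y_train / noise_var

def predictive_std(x):
    P = features(x)
    return np.sqrt(np.einsum('id,de,ie->i', P, Sigma, P) + noise_var)

print(np.round(predictive_std(np.array([0.0, 10.0, 100.0, 1000.0])), 3))
# Far from the data the tanh features saturate, so the predictive std converges
# to a constant rather than growing with distance from the training set.
```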