NLM-uncertainty-parametrization

@NLM

Research subgroup led by Weiwei working on neural linear models (NLMs), with a focus on uncertainty, parametrization, etc.


  • Important Links: Papers Drive. Resource links: http://cs229.stanford.edu/section/cs229-gaussian_processes.pdf, https://gregorygundersen.com/blog/2019/12/23/random-fourier-features/#a1-gaussian-kernel-derivation
  • General: questions we are trying to answer: covariance structure; size of the data with respect to width; MAP training (determinant); the claim that only a small number of bases fit the data? Motivation: current literature on NLMs.
  • Overarching research questions: What is a good basis? But this depends on where the data is. Example 1: consider any basis that is infinitely differentiable in a region; if we consider points close enough together, the basis behaves essentially piecewise linearly, and (we think) we will obtain neither good uncertainties nor a good fit. If this is true, it tells us that we need to know the in-distribution and out-of-distribution regions of the data in order to ask this question meaningfully. This leads us to...
  • Main ideas: it is mysterious why NNs can generalize well with many more parameters than data points. Effective dimensionality (ED) measures the dimensionality of the parameter space determined by the data. The paper relates ED to posterior contraction in Bayesian deep learning, model selection, width-depth tradeoffs, double descent, and functional diversity in loss surfaces, and shows that ED compares favorably to alternative norm- and flatness-based generalization measures. (A sketch of the ED computation follows below.)
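A minimal sketch of the effective dimensionality computation, assuming the usual definition $N_{\mathrm{eff}}(H, z) = \sum_i \lambda_i / (\lambda_i + z)$ over the eigenvalues $\lambda_i$ of the Hessian of the loss (or of a parameter covariance), with regularization constant $z$; the eigenvalues below are toy values for illustration:

```python
import numpy as np

def effective_dimensionality(eigenvalues, z=1.0):
    """Effective dimensionality N_eff = sum_i lambda_i / (lambda_i + z).

    eigenvalues: eigenvalues of the Hessian of the loss (or of the
                 parameter covariance), assumed non-negative.
    z:           regularization constant (e.g., the prior precision).
    """
    eigenvalues = np.asarray(eigenvalues, dtype=float)
    return float(np.sum(eigenvalues / (eigenvalues + z)))

# Toy example: a few large eigenvalues dominate, so the effective
# dimensionality is far smaller than the raw parameter count.
eigs = np.array([100.0, 50.0, 10.0] + [0.001] * 997)  # 1000 "parameters"
print(effective_dimensionality(eigs, z=1.0))  # about 3.9, not 1000
```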
  • Lectures: UC Berkeley 20-minute primer on meta-learning, https://www.youtube.com/watch?v=h7qyQeXKxZE. Papers: Meta-Learning in Neural Networks: A Survey. Notes: hypothesis: good basis functions should transfer well to different tasks, but data-specific basis functions might not transfer well.
  • Deep kernel learning (DKL) combines the representational power of neural networks with the reliable uncertainty estimates of Gaussian processes by learning a flexible deep kernel function. The complexity is O(n) at train time and O(1) at test time, compared to O(n^3) at train time and O(n^2) at test time for standard Gaussian processes. One of the central critiques of Gaussian process regression is that it does not actually learn representations of the data, because the kernel function is specified in advance and is not flexible enough to do so. DKL addresses this by mapping the inputs $x_n$ to intermediate values $v_n \in \mathbb{R}^Q$ through a neural network $g_\phi(\cdot)$ parameterized by weights and biases $\phi$; these intermediate values are then used as inputs to a standard kernel, giving the effective kernel $k_{\mathrm{DKL}}(x, x') = k(g_\phi(x), g_\phi(x'))$. An NLM is essentially deep kernel learning with a linear kernel (see the sketch below).
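A minimal sketch of the deep kernel construction, using an untrained random MLP as a stand-in for $g_\phi$ and an RBF base kernel (all sizes and parameters here are illustrative assumptions); the last function shows the NLM view as the same construction with a linear base kernel:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-layer feature extractor g_phi (random weights for illustration).
W1, b1 = rng.normal(size=(1, 32)), rng.normal(size=32)
W2, b2 = rng.normal(size=(32, 8)), rng.normal(size=8)

def g_phi(x):
    """Map inputs x of shape (n, 1) to features v of shape (n, 8)."""
    h = np.tanh(x @ W1 + b1)
    return h @ W2 + b2

def rbf(a, b, lengthscale=1.0):
    """Standard RBF base kernel evaluated on the learned feature space."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

def k_dkl(x, x2):
    """Deep kernel: k_DKL(x, x') = k(g_phi(x), g_phi(x'))."""
    return rbf(g_phi(x), g_phi(x2))

def k_nlm(x, x2):
    """NLM view: the same construction with a linear base kernel."""
    return g_phi(x) @ g_phi(x2).T

X = rng.normal(size=(5, 1))
print(k_dkl(X, X).shape, k_nlm(X, X).shape)  # (5, 5) (5, 5)
```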
  • Paper main idea: a new measure of generalization, used to understand dropout's effectiveness in improving generalization. The proposed measure is weight expansion: with a larger weight volume, one can achieve better generalization in a PAC-Bayesian setting. Application: apply weight expansion to dropout, and show theoretically and empirically that applying dropout during training "expands" the weight volume. Definition: weight volume is the normalized determinant of the weight covariance matrix. Intuitively, the more correlated the weights are, the smaller the weight volume and the worse the generalization ability; more orthogonal weights give a larger weight volume (see the sketch below).
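A minimal sketch of a weight-volume computation under one possible reading of the normalization, namely dividing out per-weight variances so that the quantity reduces to the determinant of the weight correlation matrix; this normalization is an assumption here, so the paper's exact definition should be checked:

```python
import numpy as np

def weight_volume(weight_samples):
    """Illustrative 'weight volume': normalized determinant of the weight
    covariance. The normalization is assumed to divide out per-weight
    variances, i.e. we take the determinant of the correlation matrix
    (an assumption; check the paper's exact definition).

    weight_samples: array of shape (n_samples, n_weights), e.g. weights
                    collected across dropout masks or training checkpoints.
    """
    cov = np.cov(weight_samples, rowvar=False)
    std = np.sqrt(np.diag(cov))
    corr = cov / np.outer(std, std)
    sign, logdet = np.linalg.slogdet(corr)  # log-det for numerical stability
    return sign * np.exp(logdet)

rng = np.random.default_rng(0)
uncorrelated = rng.normal(size=(2000, 5))
mixing = np.array([[1.0, 0.9, 0.0, 0.0, 0.0],
                   [0.0, 1.0, 0.9, 0.0, 0.0],
                   [0.0, 0.0, 1.0, 0.9, 0.0],
                   [0.0, 0.0, 0.0, 1.0, 0.9],
                   [0.0, 0.0, 0.0, 0.0, 1.0]])
correlated = uncorrelated @ mixing
print(weight_volume(uncorrelated))  # close to 1
print(weight_volume(correlated))    # noticeably smaller
```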
  • (Put terminology here and link sources!) Precision Matrix: Inverse of the covariance matrix
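A standard fact worth noting alongside the definition: for a Gaussian $x \sim \mathcal{N}(\mu, \Sigma)$, zeros in the precision matrix encode conditional independence between the corresponding coordinates given all the others:

```latex
\Lambda = \Sigma^{-1}, \qquad
\Lambda_{ij} = 0 \iff x_i \perp\!\!\!\perp x_j \mid x_{\setminus \{i, j\}}.
```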
  • 1/31/22: Todo: replicate experiments; sample from the prior and get the same functions. Main ideas: NLMs are deep Bayesian models that produce predictive uncertainties by learning neural network features and then performing Bayesian linear regression over these features. Few works have methodically evaluated the predictive uncertainties of NLMs. Traditional training procedures for NLMs underestimate uncertainty on OOD data; the paper identifies the underlying reasons. (A minimal NLM sketch follows below.)
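A minimal NLM sketch, using fixed random tanh features as a stand-in for a trained network's last hidden layer and the standard Bayesian linear regression formulas; the prior precision, noise level, and data below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Step 1: a feature map phi(x). In a real NLM this is the last hidden layer
# of a trained network; here we use fixed random tanh features as a stand-in.
D = 50                                     # number of basis functions
W = rng.normal(size=(1, D)); b = rng.normal(size=D)
def phi(x):                                # x: (n, 1) -> (n, D)
    return np.tanh(x @ W + b)

# Step 2: exact Bayesian linear regression on the features.
# Prior w ~ N(0, alpha^{-1} I), likelihood y | x, w ~ N(phi(x) w, sigma^2).
alpha, sigma2 = 1.0, 0.1
X = rng.uniform(-2, 2, size=(30, 1))
y = np.sin(2 * X[:, 0]) + rng.normal(scale=np.sqrt(sigma2), size=30)

Phi = phi(X)                                       # (n, D)
S_inv = alpha * np.eye(D) + Phi.T @ Phi / sigma2   # posterior precision (D, D)
S = np.linalg.inv(S_inv)                           # posterior covariance
m = S @ Phi.T @ y / sigma2                         # posterior mean

# Step 3: predictive mean and variance at test points (some are OOD).
X_test = np.linspace(-4, 4, 5).reshape(-1, 1)
Phi_t = phi(X_test)
pred_mean = Phi_t @ m
pred_var = sigma2 + np.einsum('nd,de,ne->n', Phi_t, S, Phi_t)
print(np.c_[X_test[:, 0], pred_mean, pred_var])
```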
  • Idea: by adding functional priors and using fVI (note: I saw fPOVI in another paper), we get BNN-like uncertainty using NLMs. Questions/technicalities: prior choice is hard? Why use an uninformative prior when we have other methods? This brings the complexity up, from quadratic to cubic in the worst case. We probably should not proceed with this approach.
  • GPs scale cubically with the number of observations, whereas an NLM scales linearly in the number of observations and cubically only in the basis function dimensionality, making Bayesian optimization easier while maintaining flexibility and uncertainty (see the cost comparison below). Related applications: reinforcement learning (Riquelme et al., 2018, https://arxiv.org/abs/1802.09127; Azizzadenesheli and Anandkumar, 2019, https://arxiv.org/abs/1802.04412), active learning, and AutoML (Zhou and Precioso, 2019, https://arxiv.org/abs/1904.00577). Todo (lucy): understand the math?
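A short worked comparison using the standard GP regression and Bayesian linear regression predictive formulas (notation matches the NLM sketch above): the GP predictive requires inverting an $n \times n$ kernel matrix, while the NLM posterior requires inverting a $D \times D$ matrix in the number of basis functions $D$:

```latex
% GP: invert K + sigma^2 I (n x n)  ->  O(n^3) train, O(n^2) per-test-point variance
\mu_{\mathrm{GP}}(x_*) = k_*^\top \left(K + \sigma^2 I\right)^{-1} y, \qquad K \in \mathbb{R}^{n \times n}

% NLM: invert the posterior precision (D x D)  ->  O(nD^2 + D^3) train, O(D^2) per test point
m = \tfrac{1}{\sigma^2}\, S\, \Phi^\top y, \qquad
S = \left(\alpha I + \tfrac{1}{\sigma^2}\, \Phi^\top \Phi\right)^{-1}, \qquad \Phi \in \mathbb{R}^{n \times D}
```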
  • Paper main ideas: three variations of neural linear regression. MAP NL: first train the neural network using MAP estimation, then use the outputs of the last hidden layer as features for Bayesian linear regression (hyperparameters tuned with Bayesian optimization); uncertainty is only added at the last layer, and MAP training does not learn with uncertainty quantification in mind. Regularized NL: learn the features by optimizing the tractable marginal likelihood with respect to the network weights (up to the output layer); see the expression below. Bayesian noise NL.
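For reference, the tractable marginal likelihood optimized in the Regularized NL variant is the standard Bayesian linear regression evidence, written here in the same notation as the sketches above (the paper's exact parameterization may differ):

```latex
\log p(y \mid X, \theta)
= \log \mathcal{N}\!\left(y \,\middle|\, 0,\; C\right)
= -\tfrac{1}{2}\, y^\top C^{-1} y \;-\; \tfrac{1}{2}\log\lvert C\rvert \;-\; \tfrac{n}{2}\log 2\pi,
\qquad C = \sigma^2 I + \alpha^{-1}\, \Phi_\theta \Phi_\theta^\top
```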
  • Main ideas: there is a limited understanding of the effects of depth and width on learned representations. How does varying depth and width affect a model's hidden representations? Larger-capacity models show a characteristic block structure in their hidden representations; it appears when model capacity is large relative to the size of the training set, and it corresponds to the underlying layers preserving and propagating the dominant principal component of their representations.
  • Main ideas: kernel machines and NNs both possess universal function approximation properties, but they differ in how the appropriate function class is chosen. NNs learn representations by adapting their basis functions to the data, whereas kernel methods use a basis that is not adapted during training. The paper contrasts the random features of approximated kernel machines with the learned features of NNs (a random-features sketch follows below).
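A minimal random Fourier features sketch (the standard Rahimi-Recht construction for the RBF kernel), included to make "a basis not adapted during training" concrete; the feature count, lengthscale, and data below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def rff_features(X, num_features=500, lengthscale=1.0, rng=rng):
    """Random Fourier features z(x) such that z(x) @ z(x') approximates
    the RBF kernel k(x, x') = exp(-||x - x'||^2 / (2 * lengthscale^2)).
    The basis is drawn once at random and never adapted to the data."""
    d = X.shape[1]
    Omega = rng.normal(scale=1.0 / lengthscale, size=(d, num_features))
    b = rng.uniform(0, 2 * np.pi, size=num_features)
    return np.sqrt(2.0 / num_features) * np.cos(X @ Omega + b)

X = rng.normal(size=(6, 3))
Z = rff_features(X)
K_approx = Z @ Z.T
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K_exact = np.exp(-0.5 * d2)              # RBF kernel with lengthscale 1
print(np.abs(K_approx - K_exact).max())  # small approximation error
```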
  • Main ideas: approximate inference techniques for weight-space priors in BNNs suffer from several drawbacks. The 'Bayesian last layer' (BLL) is an alternative BNN approach that learns the feature space for an exact Bayesian linear model with explicit predictive distributions, but its predictions outside of the data distribution (OOD) are typically overconfident. The paper overcomes this weakness by introducing a functional prior on the model's derivatives, which enhances the BLL to Gaussian-process-like performance on tasks where calibrated uncertainty is critical: OOD regression, Bayesian optimization, and active learning.
  • Bayesian Deep Learning and a Probabilistic Perspective of Generalization (tags: `papers`)
  • A Tutorial on Bayesian Optimization (tags: `papers`)
  • Hands-on Bayesian Neural Networks – A Tutorial for Deep Learning Users (tags: `papers`)
  • Why Do Better Loss Functions Lead to Less Transferable Features? (tags: `papers`)