Deep Kernel Learning

tags: papers

Deep Kernel Learning (DKL) combines the representational power of neural networks with the reliable uncertainty estimates of Gaussian processes by learning a flexible deep kernel function. The complexity is O(n) at train time and O(1) per test point, compared to O(n^3) at train time and O(n^2) per test point for standard Gaussian processes.

One of the central critiques of Gaussian process regression is that it does not actually learn representations of the data: the kernel function is specified in advance and is not flexible enough to do so. Deep kernel learning (DKL) addresses this by mapping the inputs $x_n$ to intermediate values $v_n \in \mathbb{R}^Q$ through a neural network $g_\phi(\cdot)$ parameterized by weights and biases $\phi$.

These intermediate values are then used as inputs to a standard kernel, resulting in the effective kernel

$$k_{\mathrm{DKL}}(x, x') = k(g_\phi(x), g_\phi(x')).$$
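A minimal sketch of this construction, assuming a small two-layer feature extractor and an RBF base kernel (the architecture, weight shapes, and hyperparameters below are illustrative, not taken from any particular paper):

```python
import numpy as np

def g_phi(X, W1, b1, W2, b2):
    """Hypothetical feature extractor g_phi: maps inputs in R^D to features in R^Q."""
    H = np.tanh(X @ W1 + b1)
    return H @ W2 + b2

def rbf_kernel(V, Vp, lengthscale=1.0, variance=1.0):
    """Standard RBF base kernel k(.,.) applied to the learned features."""
    sq_dists = (
        np.sum(V**2, axis=1)[:, None]
        + np.sum(Vp**2, axis=1)[None, :]
        - 2.0 * V @ Vp.T
    )
    return variance * np.exp(-0.5 * sq_dists / lengthscale**2)

def deep_kernel(X, Xp, params):
    """k_DKL(x, x') = k(g_phi(x), g_phi(x'))."""
    return rbf_kernel(g_phi(X, *params), g_phi(Xp, *params))

# Usage: random weights stand in for the learned phi.
rng = np.random.default_rng(0)
D, H_DIM, Q = 3, 16, 2
params = (rng.normal(size=(D, H_DIM)), np.zeros(H_DIM),
          rng.normal(size=(H_DIM, Q)), np.zeros(Q))
X = rng.normal(size=(5, D))
K = deep_kernel(X, X, params)   # 5 x 5 Gram matrix over the learned features
```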

The neural linear model (NLM) is essentially deep kernel learning with a linear base kernel.
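One way to see this (a sketch, assuming the NLM places a Gaussian prior only on the last-layer weights $w$, with an assumed prior variance $\sigma_w^2$): the last layer $f(x) = w^\top g_\phi(x)$ induces exactly the DKL covariance with a linear base kernel,

```latex
\begin{aligned}
f(x) &= w^\top g_\phi(x), \qquad w \sim \mathcal{N}(0, \sigma_w^2 I) \\
\operatorname{cov}\!\big(f(x), f(x')\big)
  &= g_\phi(x)^\top \, \mathbb{E}[w w^\top] \, g_\phi(x')
   = \sigma_w^2 \, g_\phi(x)^\top g_\phi(x') \\
&= k_{\mathrm{DKL}}(x, x') \quad \text{with base kernel } k(v, v') = \sigma_w^2 \, v^\top v'.
\end{aligned}
```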

Ober, Rasmussen, and van der Wilk, "The Promises and Pitfalls of Deep Kernel Learning" (UAI 2021): https://proceedings.mlr.press/v161/ober21a/ober21a.pdf

The paper makes the following claims:

  • Using the marginal likelihood can lead to overfitting for DKL models.
  • This overfitting can actually be worse than the overfitting observed using standard maximum likelihood approaches for neural networks.
  • The marginal likelihood overfits by overcorrelating the datapoints, as it tries to correlate all the data, not just the points that should be correlated.
  • Stochastic minibatching can mitigate this overfitting, and helps to explain the success of DKL models in practice (see the training sketch after this list).
  • A fully Bayesian treatment of deep kernel learning can avoid overfitting and obtain the benefits of both neural networks and Gaussian processes.
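A minimal sketch of the training objective these claims are about: the network weights $\phi$ and the kernel hyperparameters are fit jointly by maximizing the GP log marginal likelihood, optionally on stochastic minibatches. Everything below (the PyTorch architecture, RBF base kernel, learning rate, batch size) is illustrative and not the paper's exact setup:

```python
import torch

class DeepKernelGP(torch.nn.Module):
    """DKL regression model: neural features g_phi feeding an RBF base kernel."""

    def __init__(self, d_in, q=2):
        super().__init__()
        self.g_phi = torch.nn.Sequential(
            torch.nn.Linear(d_in, 32), torch.nn.Tanh(), torch.nn.Linear(32, q)
        )
        # Kernel and noise hyperparameters, log-parameterized for positivity.
        self.log_lengthscale = torch.nn.Parameter(torch.zeros(()))
        self.log_variance = torch.nn.Parameter(torch.zeros(()))
        self.log_noise = torch.nn.Parameter(torch.tensor(-2.0))

    def gram(self, X):
        V = self.g_phi(X)
        sq = torch.cdist(V, V).pow(2)
        K = self.log_variance.exp() * torch.exp(
            -0.5 * sq / self.log_lengthscale.exp() ** 2
        )
        return K + self.log_noise.exp() * torch.eye(len(X))

    def neg_log_marginal_likelihood(self, X, y):
        """-log p(y | X, phi, theta) for a zero-mean GP."""
        K = self.gram(X)
        L = torch.linalg.cholesky(K)
        alpha = torch.cholesky_solve(y.unsqueeze(-1), L)
        return (
            0.5 * (y @ alpha.squeeze(-1))
            + torch.log(torch.diagonal(L)).sum()
            + 0.5 * len(y) * torch.log(torch.tensor(2 * torch.pi))
        )

# Usage: stochastic minibatch gradient steps on the (negative) marginal likelihood.
torch.manual_seed(0)
X_train = torch.randn(256, 3)
y_train = torch.sin(X_train[:, 0]) + 0.1 * torch.randn(256)
model = DeepKernelGP(d_in=3)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for step in range(200):
    idx = torch.randperm(len(X_train))[:64]   # stochastic minibatch
    loss = model.neg_log_marginal_likelihood(X_train[idx], y_train[idx])
    opt.zero_grad()
    loss.backward()
    opt.step()
```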