# [1-Pager] Challenges of Type-II ML Estimator for Gaussian Processes Brainstorm - 4/2/2021

## Overview

Topics include:

* What is the model selection problem for GPs?
* Type-II ML Method and derivation
* Consistency in the Infinite-Data Regime
* Lack of Properties in the Finite-Data Regime
* Challenges of Type-II ML and Approaches To Solve Them

## Motivation and Problem

* Gaussian Processes (GPs) are a powerful non-parametric framework for performing Bayesian inference by leveraging canonical properties of the multivariate normal distribution.
* Before performing inference, practitioners need to specify the kernel (covariance function) and its corresponding hyperparameters.
* The choice of the kernel hyperparameters and the noise variance parameter directly influences the effectiveness of the noisy Gaussian Process inference model.
* Because the marginal likelihood has a closed, tractable form and maximizing it is a straightforward procedure, Type-II ML has become the classic approach for model selection.
* However, despite being widely adopted, Type-II ML has a handful of drawbacks.
* We categorize and review the various challenges of Type-II ML and the approaches that attempt to address them.

## Model Setup

Suppose we have data $\mathcal{D} := \{(x_i, y_i)\}_{i = 1}^n$, with $x_i \in \mathbb{R}^d$ and $y_i \in \mathbb{R}$ for all $i$. We are interested in estimating a regression function $\eta(x)$, which we model with a Gaussian Process prior distribution. This gives the following *noisy GP model*:

\begin{equation}
\begin{split}
y_i &= \eta(x_i) + \epsilon_i, \quad i = 1, \dots, n \\
\epsilon_i &\sim \mathcal{N}(0, \sigma_n^2) \\
\eta(\cdot) &\sim GP(\mu(\cdot), K_{\theta}(\cdot, \cdot))
\end{split}
\end{equation}

We assume that $\eta(\cdot)$ is independent of $\sigma_n^2$ and of $\epsilon_i$ for all $i$. Notice that $K_{\theta}$ is the covariance matrix constructed from a kernel function $k_{\theta}(x, x')$ that depends on hyperparameters $\theta$.
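
To make the model concrete, the following is a minimal sketch of drawing data from the noisy GP model. It rests on assumptions that are not part of the setup above: a zero prior mean $\mu(\cdot) = 0$, a squared-exponential kernel with $\theta$ = (`lengthscale`, `signal_var`), and illustrative parameter values. The function names `rbf_kernel` and `sample_noisy_gp` are hypothetical.

```python
import numpy as np


def rbf_kernel(X1, X2, lengthscale=1.0, signal_var=1.0):
    """Squared-exponential kernel k_theta(x, x') (an assumed kernel choice)."""
    sq_dists = (
        np.sum(X1**2, axis=1)[:, None]
        + np.sum(X2**2, axis=1)[None, :]
        - 2.0 * X1 @ X2.T
    )
    # Clip tiny negative values caused by floating-point rounding.
    return signal_var * np.exp(-0.5 * np.maximum(sq_dists, 0.0) / lengthscale**2)


def sample_noisy_gp(X, lengthscale=1.0, signal_var=1.0, noise_var=0.1, seed=0):
    """Draw (eta, y) with eta ~ GP(0, K_theta) and y_i = eta(x_i) + eps_i."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    K = rbf_kernel(X, X, lengthscale, signal_var)
    # Small jitter keeps the Cholesky factorization numerically stable.
    L = np.linalg.cholesky(K + 1e-6 * np.eye(n))
    eta = L @ rng.standard_normal(n)                       # latent function values
    y = eta + np.sqrt(noise_var) * rng.standard_normal(n)  # eps_i ~ N(0, sigma_n^2)
    return eta, y


# Illustrative inputs: 50 points in one dimension (d = 1).
X = np.linspace(0.0, 5.0, 50).reshape(-1, 1)
eta, y = sample_noisy_gp(X)
```

Changing `lengthscale`, `signal_var`, or `noise_var` changes how smooth and how noisy the sampled data look, which is why selecting $\theta$ and $\sigma_n^2$ matters for inference.
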
We can then perform inference by leveraging the conditioning properties of multivariate Gaussians. Our main objective is choosing $\theta$.

## Challenges of Type-II ML

In the finite-data regime, Type-II ML suffers from:

* Overfitting to the training data
* Overconfident uncertainty quantification
* Optimization difficulty
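
As a hedged illustration of the optimization concern, the sketch below fits $\theta$ and $\sigma_n^2$ by minimizing the negative log marginal likelihood from several random starting points. Comparing the objective values reached by each start is one simple way to probe whether the optimizer lands in different optima. The assumptions are a zero prior mean, a squared-exponential kernel, synthetic sine data, log-scale parameters with box bounds, and L-BFGS-B with numerical gradients. None of these choices is prescribed by this document, and `fit_type2_ml` is a hypothetical name.

```python
import numpy as np
from scipy.optimize import minimize


def rbf_kernel(X1, X2, lengthscale, signal_var):
    """Squared-exponential kernel (an assumed kernel choice)."""
    sq_dists = (
        np.sum(X1**2, axis=1)[:, None]
        + np.sum(X2**2, axis=1)[None, :]
        - 2.0 * X1 @ X2.T
    )
    return signal_var * np.exp(-0.5 * np.maximum(sq_dists, 0.0) / lengthscale**2)


def neg_log_marginal_likelihood(log_params, X, y):
    """-log p(y | X, theta, sigma_n^2) for a zero-mean GP with Gaussian noise."""
    lengthscale, signal_var, noise_var = np.exp(log_params)
    n = X.shape[0]
    K_y = rbf_kernel(X, X, lengthscale, signal_var) + noise_var * np.eye(n)
    L = np.linalg.cholesky(K_y)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # K_y^{-1} y
    # 0.5 * log|K_y| equals the sum of the log-diagonal of the Cholesky factor.
    return 0.5 * y @ alpha + np.sum(np.log(np.diag(L))) + 0.5 * n * np.log(2.0 * np.pi)


def fit_type2_ml(X, y, n_restarts=5, seed=0):
    """Run L-BFGS-B from several random initializations in log-parameter space."""
    rng = np.random.default_rng(seed)
    bounds = [(-5.0, 5.0)] * 3  # bounds on log(lengthscale, signal_var, noise_var)
    fits = []
    for _ in range(n_restarts):
        init = rng.uniform(-2.0, 2.0, size=3)
        fits.append(
            minimize(neg_log_marginal_likelihood, init, args=(X, y),
                     method="L-BFGS-B", bounds=bounds)
        )
    return fits


# Synthetic data for illustration only.
rng = np.random.default_rng(1)
X = np.linspace(0.0, 5.0, 30).reshape(-1, 1)
y = np.sin(X[:, 0]) + 0.2 * rng.standard_normal(30)

fits = fit_type2_ml(X, y)
best = min(fits, key=lambda r: r.fun)
for r in fits:
    # Each row: fitted (lengthscale, signal_var, noise_var) and its objective value.
    print(np.exp(r.x), r.fun)
```

If the restarts report noticeably different objective values or parameter estimates, the selected hyperparameters depend on initialization. Keeping the best of several restarts is a common heuristic, though it does not guarantee a global optimum.
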