# [1-Pager] Empirical Evaluation of Type-II MAP

---

Gaussian Processes (GPs) are a powerful non-parametric framework for performing Bayesian inference by leveraging canonical properties of the multivariate normal distribution. However, before performing inference, practitioners must specify the **kernel/covariance function** and its **hyperparameters $\theta$**; this is known as the **model selection problem** (Rasmussen and Williams, 2006).

## The Model

---

Suppose we have data $\mathcal{D} := \{\mathbf{x}_i, y_i\}_{i = 1}^n$, with $\mathbf{x}_i \in \mathbb{R}^d$ and $y_i \in \mathbb{R}$ for all $i$. We are interested in estimating a regression function $\eta(\mathbf{x})$, which we model with a Gaussian Process prior distribution. This gives the following noisy GP model:

$$
y_i = \eta(\mathbf{x}_i) + \epsilon_i, \quad i = 1, \dots, n, \\
\epsilon_i \sim \mathcal{N}(0, \sigma_n^2), \\
\eta(\cdot) \sim GP(\mu(\cdot), K_{\theta}(\cdot, \cdot)).
$$

We assume that $\eta(\cdot)$ is independent of $\sigma_n^2$ and of $\epsilon_i$ for all $i$. Here $K_{\theta}(\cdot, \cdot)$ is the covariance (kernel) function, which depends on the hyperparameters $\theta$.

## Type-II ML Method

---

Type-II maximum likelihood (ML) selects $\theta$ by maximizing the log marginal likelihood:

$$\theta_{MLE} = \arg \max_{\theta} \, l(\theta) = \arg \max_{\theta} \, \log p(y \mid x, \theta).$$

We can then run our favourite multivariate optimization algorithm to search for $\theta_{MLE}$.

## Challenges of Type-II ML Method

---

However, in the finite-data regime, Type-II ML suffers from optimization difficulty, overfitting to the training data, and overconfident uncertainty intervals. This is noted empirically by [Mohammed and Cawley, 2017] (overfitting) and [Warnes and Ripley, 1987] (optimization difficulty), and both theoretically and empirically by [Karvonen et al., 2020] (overconfident uncertainty quantification).
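The Type-II ML procedure above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes an RBF (squared-exponential) kernel, a zero prior mean, synthetic 1-D data, and optimization of the log-hyperparameters with L-BFGS-B; all function names and the jitter constant are illustrative choices.

```python
import numpy as np
from scipy.optimize import minimize


def rbf_kernel(X, lengthscale, variance):
    # Squared-exponential kernel: k(x, x') = variance * exp(-||x - x'||^2 / (2 * lengthscale^2)).
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return variance * np.exp(-0.5 * sq_dists / lengthscale**2)


def neg_log_marginal_likelihood(log_theta, X, y):
    # theta = (lengthscale, signal variance, noise variance sigma_n^2),
    # parameterized in log space so the optimizer keeps them positive.
    lengthscale, variance, noise = np.exp(log_theta)
    n = len(y)
    # Small jitter added for numerical stability of the Cholesky factorization.
    K = rbf_kernel(X, lengthscale, variance) + (noise + 1e-8) * np.eye(n)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    # -log p(y | x, theta) = 1/2 y^T K^{-1} y + 1/2 log|K| + n/2 log(2 pi).
    return 0.5 * y @ alpha + np.sum(np.log(np.diag(L))) + 0.5 * n * np.log(2 * np.pi)


# Synthetic data: noisy observations of a sine function.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(40, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(40)

# "Run our favourite multivariate optimization algorithm" -- here, L-BFGS-B.
res = minimize(neg_log_marginal_likelihood, x0=np.zeros(3), args=(X, y), method="L-BFGS-B")
lengthscale, variance, noise = np.exp(res.x)
```

The optimizer returns a single point estimate $\theta_{MLE}$; in practice one often restarts from several initial values, since the marginal likelihood surface can be multimodal (one facet of the optimization difficulty noted above).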
## Type-II MAP Method

---

Type-II MAP places a prior $p(\theta)$ on the hyperparameters and finds

$$\theta_{MAP} = \arg \max_{\theta} \, p(y \mid x, \theta) \, p(\theta).$$

## Fully Bayesian Approach

---

For a fully Bayesian approach, we use approximate posterior sampling schemes to sample $\theta$ from

$$p(\theta \mid x, y) \propto p(y \mid x, \theta) \, p(\theta).$$
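The MAP objective is the Type-II ML objective plus a log-prior term, and the same unnormalized log posterior can be reused by a sampler for the fully Bayesian approach. A minimal sketch, assuming (as illustrative choices, not the paper's setup) an RBF kernel, independent normal priors on the log-hyperparameters (i.e. log-normal priors on the hyperparameters), and a plain random-walk Metropolis sampler as the approximate sampling scheme:

```python
import numpy as np
from scipy.optimize import minimize


def rbf_kernel(X, lengthscale, variance):
    # Squared-exponential kernel.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return variance * np.exp(-0.5 * sq_dists / lengthscale**2)


def neg_log_marginal_likelihood(log_theta, X, y):
    # theta = (lengthscale, signal variance, noise variance), in log space.
    lengthscale, variance, noise = np.exp(log_theta)
    n = len(y)
    K = rbf_kernel(X, lengthscale, variance) + (noise + 1e-8) * np.eye(n)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return 0.5 * y @ alpha + np.sum(np.log(np.diag(L))) + 0.5 * n * np.log(2 * np.pi)


def neg_log_posterior(log_theta, X, y, prior_sd=1.0):
    # -log[p(y | x, theta) p(theta)] up to a constant, with independent
    # N(0, prior_sd^2) priors on the log-hyperparameters (an assumed choice).
    neg_log_prior = 0.5 * np.sum((log_theta / prior_sd) ** 2)
    return neg_log_marginal_likelihood(log_theta, X, y) + neg_log_prior


# Synthetic data, as in the Type-II ML sketch.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(40, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(40)

# Type-II MAP: maximize the unnormalized log posterior.
res = minimize(neg_log_posterior, x0=np.zeros(3), args=(X, y), method="L-BFGS-B")
theta_map = np.exp(res.x)

# Fully Bayesian: random-walk Metropolis on the same unnormalized log posterior.
def metropolis(log_post, x0, n_samples=200, step=0.1, seed=1):
    mh_rng = np.random.default_rng(seed)
    samples, x, lp = [], x0, log_post(x0)
    for _ in range(n_samples):
        prop = x + step * mh_rng.standard_normal(x.size)
        lp_prop = log_post(prop)
        if np.log(mh_rng.uniform()) < lp_prop - lp:  # accept/reject
            x, lp = prop, lp_prop
        samples.append(x)
    return np.array(samples)

samples = metropolis(lambda lt: -neg_log_posterior(lt, X, y), res.x)
```

Starting the chain at $\theta_{MAP}$ sidesteps burn-in for this toy example; a real analysis would use longer chains, tuned step sizes, and convergence diagnostics.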