# Related Works on Consistency of Type-II MLE

---

## [Ying 1991] Asymptotic Consistency Only for Ornstein-Uhlenbeck Processes

* Proves asymptotic consistency of the MLE only for Ornstein-Uhlenbeck processes.
* Ornstein-Uhlenbeck processes are stationary Gaussian processes that are also Markov processes, with the Ornstein-Uhlenbeck kernel (see the kernel sketch at the end of these notes):

$$
k_{OU}(x, x') = \sigma^2 \exp \bigg\{ \frac{-|x - x'|}{l} \bigg\}
$$

## [Teckentrup, 2020](https://arxiv.org/pdf/1909.00232.pdf) Convergence of GPR Posterior Consistency Agnostic to Hyperparameter Estimation Method

* "The definition of estimated parameter values $\hat\theta_N$ is open and analysis does not require any assumptions on how these estimates are computed."
* Mild assumption that $\theta$ is bounded away from $0$ and $\infty$.
* Anna's question: if this paper proves, under mild conditions, that any estimation method guarantees posterior consistency, is our original problem (or improving the MLE) still a problem?
* [41, 21, 35] highlight drawbacks of fully Bayesian inference and instead point to "approximate solutions to governing equations such as data likelihood".

## [Karvonen, 2020](https://arxiv.org/pdf/2001.10965.pdf) Asymptotic Behavior of a Single Signal Parameter of the Kernel Function

* Considers the kernel $K^{\sigma}(x, y) := \sigma^2 k(x, y)$ with $\sigma > 0$ in a noiseless GP model.
* Their analysis covers only this one parameter because it admits a nice closed-form expression (a numerical sketch appears at the end of these notes):

$$
\sigma_{ML} := \arg \max_{\sigma > 0} L(\sigma \mid D) = \sqrt{\frac{f_X^\top K_X^{-1} f_X}{N}}
$$

* Asymptotic overconfidence means that the width of the credible set decays faster than the true approximation error.
* Asymptotic underconfidence is the converse: the width decays more slowly than the true approximation error.
* Anna's complaint: they show asymptotic overconfidence and asymptotic underconfidence, NOT consistency, so why is this even helpful?

## [Knapik et al., 2016] General Consistency for Empirical Bayes Methods

Proves that likelihood-based empirical Bayes methods are consistent under the following conditions. Assume we observe a sequence of noisy coefficients $Y = (Y_1, Y_2, \ldots)$ satisfying

$$
Y_i = \kappa_i \mu_i + \frac{1}{\sqrt{n}} Z_i, \quad i = 1, 2, \ldots,
$$

where $Z_1, Z_2, \ldots \overset{\text{iid}}{\sim} \mathcal{N}(0, 1)$, $\mu = (\mu_1, \mu_2, \ldots) \in \ell^2$ is the infinite-dimensional parameter of interest, and $(\kappa_i)$ is a known sequence that may converge to $0$ as $i \rightarrow \infty$. The authors consider a family $(\Pi_{\alpha} : \alpha > 0)$ of Gaussian priors for the parameter $\mu$, with $\alpha$ treated as a hyperparameter. For tractability, the priors $\Pi_{\alpha}$ are assumed to place independent Gaussian prior weights on the coefficients $\mu_i$. Their work is confined to a single hyperparameter describing the regularity of the Gaussian prior. Theorems 1 and 2 give convergence rates of the empirical Bayes estimator of $\alpha$ in probability and use them to deduce that the empirical Bayes posterior contracts around the ground truth at an optimal rate. A simulation of the observation model appears at the end of these notes. Please see [Link Here](https://link.springer.com/content/pdf/10.1007/s00440-015-0619-7.pdf).
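---

## Code Sketches

A minimal NumPy sketch of the Ornstein-Uhlenbeck kernel from [Ying 1991]; `sigma` and `l` mirror the symbols in the formula above, while the function name and example inputs are our own illustrative choices.

```python
import numpy as np

def ou_kernel(x, x_prime, sigma=1.0, l=1.0):
    """k_OU(x, x') = sigma^2 * exp(-|x - x'| / l) for 1-D inputs."""
    x = np.asarray(x, dtype=float)
    x_prime = np.asarray(x_prime, dtype=float)
    # Pairwise absolute distances between the two input sets.
    dists = np.abs(x[:, None] - x_prime[None, :])
    return sigma**2 * np.exp(-dists / l)

# Example: Gram matrix of 5 equispaced points on [0, 1].
X = np.linspace(0.0, 1.0, 5)
K = ou_kernel(X, X, sigma=1.0, l=0.5)
print(K.shape)  # (5, 5)
```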
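A minimal numerical sketch of the closed-form scale MLE from [Karvonen, 2020], $\sigma_{ML} = \sqrt{f_X^\top K_X^{-1} f_X / N}$, in the noiseless model $K^{\sigma}(x, y) = \sigma^2 k(x, y)$. The target function and base kernel below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def sigma_ml(f_X, K_X):
    """Closed-form MLE of sigma given noiseless observations f_X and the
    Gram matrix K_X of the base (sigma = 1) kernel."""
    N = len(f_X)
    # Solve K_X alpha = f_X rather than forming K_X^{-1} explicitly.
    alpha = np.linalg.solve(K_X, f_X)
    return np.sqrt(f_X @ alpha / N)

# Illustrative target and OU base kernel (sigma = 1, lengthscale 0.5).
X = np.linspace(0.0, 1.0, 50)
f_X = np.sin(2 * np.pi * X)
dists = np.abs(X[:, None] - X[None, :])
K_X = np.exp(-dists / 0.5)
print(sigma_ml(f_X, K_X))
```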
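A minimal simulation of the sequence model in [Knapik et al., 2016], $Y_i = \kappa_i \mu_i + Z_i / \sqrt{n}$. The specific decay rates for $\kappa_i$ and $\mu_i$ and the truncation level are illustrative assumptions, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000                       # noise level is 1 / sqrt(n)
I = 500                        # truncate to the first I coefficients
i = np.arange(1, I + 1)

kappa = i ** -1.0              # known sequence decaying to 0
mu = i ** -1.5                 # true parameter; in l^2 since 2 * 1.5 > 1
Y = kappa * mu + rng.standard_normal(I) / np.sqrt(n)

# Naive plug-in estimate of mu, ignoring any prior.
mu_hat = Y / kappa
print(np.sum((mu_hat - mu) ** 2))  # squared l^2 error
```

The naive plug-in error blows up as $\kappa_i \to 0$, which is why the Gaussian prior, and hence its regularity hyperparameter $\alpha$, matters in this model.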