# [4/22/2022] Expert DAG Update
---
## Simple Experiment
We want to test how the marginal likelihood value trades off against the horseshoe prior value. We generate $N=100$ data points from $f \sim GP(w_1X + w_0, K_{\theta}(X, X))$. Our hypothesis is that fitting a GP with a fixed linear mean will achieve a higher posterior value than a GP with a zero mean.
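For reference, a minimal sketch of how such toy data could be generated. The squared-exponential form of $K_\theta$, the input range, and the random seed are assumptions; the mean weights are also placeholders (this experiment doesn't list its true $w_1, w_0$, so the values below borrow the second experiment's ground truth):

```python
import numpy as np

def rbf_kernel(X1, X2, sigma_f, lengthscale):
    # Squared-exponential kernel: K(x, x') = sigma_f^2 * exp(-(x - x')^2 / (2 l^2))
    sq_dists = (X1[:, None] - X2[None, :]) ** 2
    return sigma_f**2 * np.exp(-sq_dists / (2 * lengthscale**2))

rng = np.random.default_rng(0)
N = 100
X = np.linspace(-3, 3, N)          # assumed input range

tau, lam, lengthscale, noise_var = 1.0, 0.25, 1.0, 1e-4
w1, w0 = 2.0, 1.0                  # assumed linear mean weights

mean = w1 * X + w0
K = rbf_kernel(X, X, tau * lam, lengthscale) + noise_var * np.eye(N)
y = rng.multivariate_normal(mean, K)  # one draw from the GP prior
```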
### Toy Data

True Parameters: $\tau= 1.0, \lambda=0.25, l = 1.0, \sigma_n^2 = 0.0001$.
Ground Truth Type-II MLE Value: 285.0523
Ground Truth Prior Value: -2.894
### Optimizing Using MAP
We use the Adam optimizer for 20,000 iterations with learning rate $\gamma = 0.01$.
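Concretely, the MAP objective is the negative unnormalized log posterior: negative log marginal likelihood minus the log horseshoe prior. A PyTorch sketch under the same assumptions as above, reusing the toy data (as tensors); `log_horseshoe_prior` is a hypothetical helper standing in for the actual horseshoe log-density:

```python
import torch

X_t = torch.as_tensor(X)
y_t = torch.as_tensor(y)
mean_t = torch.as_tensor(mean)   # fixed linear mean (use zeros for the zero-mean GP)

def gp_log_marginal_likelihood(X, y, mean, sigma_f, lengthscale, noise_var):
    # log N(y | mean, K + noise_var * I) with an RBF kernel
    sq_dists = (X[:, None] - X[None, :]) ** 2
    K = sigma_f**2 * torch.exp(-sq_dists / (2 * lengthscale**2))
    K = K + noise_var * torch.eye(len(X), dtype=X.dtype)
    return torch.distributions.MultivariateNormal(mean, covariance_matrix=K).log_prob(y)

raw_sigma_f = torch.zeros((), dtype=torch.float64, requires_grad=True)      # log-scale for positivity
raw_lengthscale = torch.zeros((), dtype=torch.float64, requires_grad=True)
opt = torch.optim.Adam([raw_sigma_f, raw_lengthscale], lr=0.01)

for _ in range(20_000):
    opt.zero_grad()
    sigma_f, ell = raw_sigma_f.exp(), raw_lengthscale.exp()
    mll = gp_log_marginal_likelihood(X_t, y_t, mean_t, sigma_f, ell, noise_var)
    loss = -(mll + log_horseshoe_prior(sigma_f))  # hypothetical horseshoe log-density helper
    loss.backward()
    opt.step()
```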
| | Zero Mean GP | Linear Mean GP | Ground Truth |
| :--- | :----: | :----: | :----: |
|Parameter Values |$\sigma_f = 3.979,\ l = 1.559$ | $\sigma_f = 0.337,\ l = 1.097$| $\sigma_f = 0.25,\ l = 1.0$|
|Type-II MLE |272.257 | 285.530 | **285.052**|
|Horseshoe Prior Value|-8.279|-3.463| **-2.894**|
|Posterior Value|263.978|282.067| **282.158**|
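For clarity, the Posterior Value row is just the sum of the two rows above it, i.e. the unnormalized log posterior over the hyperparameters:

$$\log p(\theta \mid y) = \underbrace{\log p(y \mid \theta)}_{\text{Type-II MLE}} + \underbrace{\log p(\theta)}_{\text{horseshoe prior}} + \text{const.}, \qquad \text{e.g. } 285.530 + (-3.463) = 282.067.$$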
### Optimizing Using Horseshoe Prior and VI
With the mean function and the length-scale held fixed, we optimize only the $\sigma_f = \tau\lambda$ parameter.
We use the Adam optimizer for 20,000 iterations with learning rate $\gamma = 0.001$.
| |Zero Mean GP | Linear Mean GP | Ground Truth |
| :--- | :----: | :----: | :----: |
|Parameter Values |$\sigma_f = 2.528$ | $\sigma_f = 0.287$| $\sigma_f = 0.25$|
|Type-II MLE|264.237 | **285.244** | 285.052|
|Horseshoe Prior Value|-23.082 |**-0.755** |-2.894|
|Posterior Value| 241.155| **284.489** | 282.158|
| ELBO |284.816 | **300.207**| - |
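A minimal sketch of what the VI step could look like, assuming a log-normal variational posterior over $\sigma_f$ and a single-sample reparameterized ELBO; the actual variational family isn't stated here, so treat this as illustrative only. It reuses `gp_log_marginal_likelihood` and the hypothetical `log_horseshoe_prior` from the MAP sketch:

```python
# Variational posterior q(sigma_f): log sigma_f ~ N(mu, std^2)
mu = torch.zeros((), dtype=torch.float64, requires_grad=True)
log_std = torch.full((), -1.0, dtype=torch.float64, requires_grad=True)
opt = torch.optim.Adam([mu, log_std], lr=0.001)
ell = torch.tensor(1.0, dtype=torch.float64)   # length-scale held fixed at its true value

for _ in range(20_000):
    opt.zero_grad()
    eps = torch.randn((), dtype=torch.float64)  # reparameterization trick
    z = mu + log_std.exp() * eps                # z = log sigma_f
    sigma_f = z.exp()
    # log q(sigma_f): normal density of z minus the log-Jacobian of exp
    log_q = torch.distributions.Normal(mu, log_std.exp()).log_prob(z) - z
    mll = gp_log_marginal_likelihood(X_t, y_t, mean_t, sigma_f, ell, noise_var)
    elbo = mll + log_horseshoe_prior(sigma_f) - log_q  # single-sample ELBO estimate
    (-elbo).backward()
    opt.step()
```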
## Zeroing Out One Kernel Experiment
---
We generate $N=100$ data points from $f \sim GP(w_1X + w_0, K_{\theta}(X, X))$. We want to test whether, when our model is a GP built from a sum of kernels, i.e. $f = f_0 + f_1$ with $f_0 \sim GP(w_1X + w_0, K_{\theta_0}(X, X))$ and $f_1 \sim GP(0, K_{\theta_1}(X, X))$, the horseshoe prior zeroes out the superfluous kernel.
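A sketch of the corresponding covariance, under the same RBF assumption as before: since $f_0$ and $f_1$ are independent GPs, the covariance of their sum is $K_{\theta_0} + K_{\theta_1}$, with a separate amplitude per component for the horseshoe prior to shrink:

```python
def sum_kernel(X, sigma_f0, sigma_f1, l0, l1, noise_var):
    # Independent GPs add in covariance: K = K0 + K1 (+ observation noise)
    sq_dists = (X[:, None] - X[None, :]) ** 2
    K0 = sigma_f0**2 * torch.exp(-sq_dists / (2 * l0**2))
    K1 = sigma_f1**2 * torch.exp(-sq_dists / (2 * l1**2))
    return K0 + K1 + noise_var * torch.eye(len(X), dtype=X.dtype)
```

Driving $\sigma_f^1 \to 0$ makes $K_1$ vanish, which is exactly the zeroing-out behavior we are testing for.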
### Toy Data

True Parameters: $\tau= 1.0, \lambda=1.0, l = 1.0, \sigma_n^2 = 0.0001$.
### Optimizing Using Horseshoe Prior and VI
With the mean function and the length-scales held fixed, we optimize only the $\sigma_f = \tau\lambda$ amplitude of each kernel.
We use the Adam optimizer for 20,000 iterations with learning rate $\gamma = 0.001$.
| | GP Sum Model | Ground Truth |
| :--- | :----: | :----: |
|Parameter Values |$\sigma_f^{0} = 1.134,\ \sigma_f^{1} = 5.928 \times 10^{-4}$ | $\sigma_f^{0} = 1,\ \sigma_f^{1} = 0$|
|Type-II MLE|266.623 | 272.449 |
| ELBO |298.818 | - |
#### Yay, we recover the ground truth!
### Optimizing Mean Weights and Horseshoe Prior with VI
Along with the horseshoe parameters, we also learn the linear mean weights, keeping the length-scale fixed.
We use the Adam optimizer for 25,000 iterations with learning rate $\gamma = 0.001$.
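In the VI sketch above, this amounts to making the mean weights ordinary learnable parameters alongside the variational ones; a hypothetical extension, with one $(\mu, \log\sigma)$ pair per kernel amplitude:

```python
# One (mu, log_std) pair per kernel amplitude, plus learnable mean weights.
mu0 = torch.zeros((), dtype=torch.float64, requires_grad=True)
log_std0 = torch.full((), -1.0, dtype=torch.float64, requires_grad=True)
mu1 = torch.zeros((), dtype=torch.float64, requires_grad=True)
log_std1 = torch.full((), -1.0, dtype=torch.float64, requires_grad=True)
w1 = torch.zeros((), dtype=torch.float64, requires_grad=True)
w0 = torch.zeros((), dtype=torch.float64, requires_grad=True)
opt = torch.optim.Adam([mu0, log_std0, mu1, log_std1, w1, w0], lr=0.001)
# Inside the training loop the GP mean is now a function of learnable weights:
#     mean_t = w1 * X_t + w0
```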
| | GP Sum Model | Ground Truth |
| :--- | :----: | :----: |
|Parameter Values |$\sigma_f^{0} = 0.9824,\ \sigma_f^{1} = 0.2288,\ w_1 = 1.7497,\ w_0 = 0.4031$ | $\sigma_f^{0} = 1,\ \sigma_f^{1} = 0,\ w_1 = 2.0,\ w_0 = 1.0$|
|Type-II MLE| - | - |
| ELBO | 303.765 | - |
#### We did not recover the ground truth. This makes me think the issue lies in learning the mean weights jointly with running VI on the horseshoe parameters.