# [4/22/2022] Expert DAG Update

---

## Simple Experiment

We want to compare the likelihood value against the horseshoe prior value. We generate $N=100$ data points from $f \sim GP(w_1X + w_0, K_{\theta}(X, X))$ (a data-generation sketch appears at the end of this note). Our hypothesis is that fitting a GP with a fixed linear mean will achieve a higher posterior value than a GP with a zero mean.

### Toy Data

![](https://i.imgur.com/TZUZ9sz.png)

True Parameters: $\tau = 1.0$, $\lambda = 0.25$, $l = 1.0$, $\sigma_n^2 = 0.0001$.

Ground Truth Type-II MLE Value: 285.0523

Ground Truth Prior Value: -2.894

### Optimizing Using MAP

Using the Adam optimizer, 20,000 iterations with learning rate $\gamma = 0.01$ (see the MAP sketch at the end of this note).

| | Zero Mean GP | Linear Mean GP | Ground Truth |
| :--- | :----: | :----: | :----: |
| Parameter Values | $\sigma_f = 3.979$, $l = 1.559$ | $\sigma_f = 0.337$, $l = 1.097$ | $\sigma_f = 0.25$, $l = 1.0$ |
| Type-II MLE | 272.257 | 285.530 | **285.052** |
| Horseshoe Prior Value | -8.279 | -3.463 | **-2.894** |
| Posterior Value | 263.978 | 282.067 | **282.158** |

### Optimizing Using Horseshoe Prior and VI

With a fixed linear mean and a fixed length-scale, we optimized only the $\sigma_f = \tau\lambda$ parameter. Using the Adam optimizer, 20,000 iterations with learning rate $\gamma = 0.001$ (see the VI sketch at the end of this note).

| | Zero Mean GP | Linear Mean GP | Ground Truth |
| :--- | :----: | :----: | :----: |
| Parameter Values | $\sigma_f = 2.528$ | $\sigma_f = 0.287$ | $\sigma_f = 0.25$ |
| Type-II MLE | 264.237 | **285.244** | 285.052 |
| Horseshoe Prior Value | -23.082 | **-0.755** | -2.894 |
| Posterior Value | 241.155 | **284.489** | 282.158 |
| ELBO | 284.816 | **300.207** | - |

## Zeroing Out One Kernel Experiment

---

We generate $N=100$ data points from $f \sim GP(w_1X + w_0, K_{\theta}(X, X))$. We want to test the following: if our model is a sum of GPs, i.e. $f = f_0 + f_1 \sim GP(w_1X + w_0, K_{\theta_0}(X, X)) + GP(0, K_{\theta_1}(X, X))$, does the horseshoe prior zero out the superfluous kernel? (A sum-of-kernels sketch appears at the end of this note.)

### Toy Data

![](https://i.imgur.com/ATqT4zA.png)

True Parameters: $\tau = 1.0$, $\lambda = 1.0$, $l = 1.0$, $\sigma_n^2 = 0.0001$.

### Optimizing Using Horseshoe Prior and VI

With a fixed linear mean and a fixed length-scale, we optimized only the $\sigma_f = \tau\lambda$ parameter of each kernel. Using the Adam optimizer, 20,000 iterations with learning rate $\gamma = 0.001$.

| | GP Sum Model | Ground Truth |
| :--- | :----: | :----: |
| Parameter Values | $\sigma_f^{0} = 1.134$, $\sigma_f^{1} = 5.928 \times 10^{-4}$ | $\sigma_f^{0} = 1$, $\sigma_f^{1} = 0$ |
| Type-II MLE | 266.623 | 272.449 |
| ELBO | 298.818 | - |

#### Yay, we recover the ground truth!

### Optimizing Mean Weights and Horseshoe Prior with VI

Along with the horseshoe parameters, we also learn the linear mean weights, keeping the length-scale fixed. Using the Adam optimizer, 25,000 iterations with learning rate $\gamma = 0.001$.

| | GP Sum Model | Ground Truth |
| :--- | :----: | :----: |
| Parameter Values | $\sigma_f^{0} = 0.9824$, $\sigma_f^{1} = 0.2288$, $w_1 = 1.7497$, $w_0 = 0.4031$ | $\sigma_f^{0} = 1$, $\sigma_f^{1} = 0$, $w_1 = 2.0$, $w_0 = 1.0$ |
| Type-II MLE | - | - |
| ELBO | 303.765 | - |

#### We did not recover the ground truth.

This makes me think the problem lies in learning the mean weights jointly with doing VI on the horseshoe parameters.
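---

## Code Sketches

The sketches below are minimal reconstructions of how the experiments above could be set up, not the actual experiment code; all function and variable names, and the choice of libraries (NumPy, GPyTorch, Pyro), are assumptions.

First, the toy-data generation. This assumes a squared-exponential kernel for $K_{\theta}$, uses the first experiment's true hyperparameters, and borrows the mean weights $w_1 = 2.0$, $w_0 = 1.0$ from the second experiment's ground-truth column.

```python
# Hedged sketch of the toy-data generation; the kernel form, input range,
# and mean weights are assumptions, not taken from the experiment code.
import numpy as np

def rbf_kernel(X, sigma_f, lengthscale):
    # Squared-exponential kernel: sigma_f^2 * exp(-(x - x')^2 / (2 l^2)).
    sq_dists = (X[:, None] - X[None, :]) ** 2
    return sigma_f**2 * np.exp(-sq_dists / (2 * lengthscale**2))

rng = np.random.default_rng(0)
N = 100
X = np.sort(rng.uniform(-3.0, 3.0, size=N))

# First experiment's ground truth: sigma_f = tau * lambda = 1.0 * 0.25.
tau, lam, lengthscale, noise_var = 1.0, 0.25, 1.0, 1e-4
sigma_f = tau * lam

w1, w0 = 2.0, 1.0                        # linear mean weights (assumed)
mean = w1 * X + w0
K = rbf_kernel(X, sigma_f, lengthscale) + noise_var * np.eye(N)

# One draw of f ~ GP(w1*X + w0, K) at the training inputs.
f = rng.multivariate_normal(mean, K)
```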
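For the MAP runs, a minimal sketch assuming GPyTorch, reusing `X` and `f` from the sketch above. In GPyTorch, priors registered on hyperparameters are added into `ExactMarginalLogLikelihood`, so minimizing its negative performs MAP; note that `ScaleKernel`'s `outputscale` plays the role of $\sigma_f^2$ rather than $\sigma_f$, so the mapping to the parameterization above is approximate.

```python
# Hedged GPyTorch sketch of the MAP setup; class and variable names are mine.
import torch
import gpytorch

class LinearMeanGP(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood):
        super().__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.LinearMean(input_size=1)
        # Horseshoe prior on the output scale; scale=1.0 is an assumed setting.
        self.covar_module = gpytorch.kernels.ScaleKernel(
            gpytorch.kernels.RBFKernel(),
            outputscale_prior=gpytorch.priors.HorseshoePrior(scale=1.0),
        )

    def forward(self, x):
        return gpytorch.distributions.MultivariateNormal(
            self.mean_module(x), self.covar_module(x)
        )

train_x = torch.as_tensor(X, dtype=torch.float32).unsqueeze(-1)
train_y = torch.as_tensor(f, dtype=torch.float32)
likelihood = gpytorch.likelihoods.GaussianLikelihood()
model = LinearMeanGP(train_x, train_y, likelihood)

# Negative of (marginal log likelihood + registered log priors) = MAP objective.
mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

model.train()
likelihood.train()
for _ in range(20_000):
    optimizer.zero_grad()
    loss = -mll(model(train_x), train_y)
    loss.backward()
    optimizer.step()
```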
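For the VI runs, a sketch assuming Pyro, again reusing `X` and `f`. The horseshoe is written as $\sigma_f = \tau\lambda$ with $\tau, \lambda \sim \mathrm{HalfCauchy}(1)$, the linear mean and length-scale are held fixed, and a mean-field Gaussian guide approximates the posterior over $(\tau, \lambda)$.

```python
# Hedged Pyro sketch of VI over the horseshoe parameters; names are assumptions.
import torch
import pyro
import pyro.distributions as dist
from pyro.infer import SVI, Trace_ELBO
from pyro.infer.autoguide import AutoDiagonalNormal
from pyro.optim import Adam

def rbf(x, sigma_f, lengthscale):
    # Squared-exponential kernel on 1-D inputs.
    d2 = (x.unsqueeze(-1) - x.unsqueeze(-2)) ** 2
    return sigma_f**2 * torch.exp(-d2 / (2 * lengthscale**2))

def model(x, y):
    # Horseshoe prior: sigma_f = tau * lambda with half-Cauchy scales.
    tau = pyro.sample("tau", dist.HalfCauchy(torch.tensor(1.0)))
    lam = pyro.sample("lam", dist.HalfCauchy(torch.tensor(1.0)))
    sigma_f = tau * lam
    mean = 2.0 * x + 1.0                  # fixed linear mean (assumed weights)
    K = rbf(x, sigma_f, 1.0) + 1e-4 * torch.eye(x.shape[0])
    pyro.sample("y", dist.MultivariateNormal(mean, covariance_matrix=K), obs=y)

x_train = torch.as_tensor(X, dtype=torch.float32)
y_train = torch.as_tensor(f, dtype=torch.float32)

guide = AutoDiagonalNormal(model)                  # mean-field variational family
svi = SVI(model, guide, Adam({"lr": 0.001}), loss=Trace_ELBO())
for _ in range(20_000):
    svi.step(x_train, y_train)
```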
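Finally, the sum-of-kernels model only changes the covariance: two RBF components, each with its own horseshoe-scaled $\sigma_f$. This reuses `rbf` from the previous sketch; swapping the fixed mean weights for `pyro.param`s gives the joint mean-weight + VI setup from the last experiment.

```python
# Hedged Pyro sketch of the GP sum model; reuses rbf, x_train, y_train above.
import torch
import pyro
import pyro.distributions as dist

def sum_model(x, y):
    # One horseshoe-distributed output scale per kernel component.
    sigma = []
    for i in range(2):
        tau = pyro.sample(f"tau_{i}", dist.HalfCauchy(torch.tensor(1.0)))
        lam = pyro.sample(f"lam_{i}", dist.HalfCauchy(torch.tensor(1.0)))
        sigma.append(tau * lam)

    # Fixed linear mean; replacing 2.0 / 1.0 with pyro.param("w1", ...) and
    # pyro.param("w0", ...) reproduces the mean-weight-learning variant.
    mean = 2.0 * x + 1.0

    # Sum of two RBF kernels; the horseshoe should shrink one sigma toward 0.
    K = rbf(x, sigma[0], 1.0) + rbf(x, sigma[1], 1.0) + 1e-4 * torch.eye(x.shape[0])
    pyro.sample("y", dist.MultivariateNormal(mean, covariance_matrix=K), obs=y)
```

Training proceeds exactly as in the previous sketch, with `sum_model` in place of `model`.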