# [2/17/2022] Edge Correction in Bayesian Networks With An Expert Prior
## Anna Wants Feedback On
* **[Choice of Horseshoe Prior]** I am currently using a reparameterized version of the horseshoe prior based on InvGamma distributions (see the sketch after this list). Sometimes the prior isn't strong enough and the algorithm chooses the wrong structure rather than the data-generating structure.
* **[Objective Surface is multimodal]** Optimizing the objective is very difficult because the landscape is multimodal with many local maxima.
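For reference, one common InvGamma-based reparameterization of the horseshoe (an assumption about the exact form in use; the note does not spell it out) is
$$
\beta_j \mid \lambda_j, \tau \sim \mathcal{N}(0, \lambda_j^2\tau^2), \quad
\lambda_j^2 \mid \nu_j \sim \text{InvGamma}\big(\tfrac{1}{2}, \tfrac{1}{\nu_j}\big), \quad
\nu_j \sim \text{InvGamma}\big(\tfrac{1}{2}, 1\big),
$$
$$
\tau^2 \mid \xi \sim \text{InvGamma}\big(\tfrac{1}{2}, \tfrac{1}{\xi}\big), \quad
\xi \sim \text{InvGamma}\big(\tfrac{1}{2}, 1\big),
$$
which recovers half-Cauchy priors on $\lambda_j$ and $\tau$ once the auxiliary variables $\nu_j, \xi$ are marginalized out.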
## Definitions
A **Bayesian network** or **directed graphical model** (Pearl, 1988) is a probabilistic graphical model in which the directionality of edges encodes conditional independence between nodes. Each node specifies a **conditional probability distribution (CPD)** $p(x_i | \textbf{x}_{\pi(i)})$ that relates node $x_i$ to its parents $\pi(i)$.
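As a minimal illustration of this factorization (the function and variable names below are placeholders, not from the note), a linear-Gaussian CPD for a node with a single parent can be sampled by ancestral sampling along the DAG:

```python
import numpy as np

rng = np.random.default_rng(0)

def linear_gaussian_cpd(parent, w1, w0, noise_std=1.0):
    """Sample x_i | x_{pi(i)} ~ N(w1 * parent + w0, noise_std^2)."""
    return w1 * parent + w0 + noise_std * rng.normal(size=np.shape(parent))

# The joint factorizes as p(x) p(y | x): sample parents before children.
x = rng.normal(size=500)                      # root node
y = linear_gaussian_cpd(x, w1=2.0, w0=-1.0)   # child node
```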
## Problem Statement
Given the true graph topology and a distribution on each edge from a domain expert, we aim to:
**1) correctly identify misspecified edges** <mark> <- Anna is on this part of the research problem. </mark>
2) make the minimal number of changes to correct the graph
## A Simple Example
We have a Bayesian network whose true graph is a DAG. Each edge's CPD is a Normal distribution whose mean is a function of the parent with some fixed variance, and the expert assumes every such mean function is linear. We assume only $Y$ is unobserved and aim to learn the parameters of each CPD.
Suppose we have the following Markov chain as the ground-truth model:
$$
X \rightarrow Y \rightarrow Z
$$
and the following true CPD for each non-root node:
* $f_{Y} = w_1X + w_0 + \epsilon$
* $f_{Z} = w_3Y^2 + w_2 + \epsilon$ <mark> <- Expert misspecifies this edge as linear and we want to identify it. </mark>
where $\epsilon \sim \mathcal{N}(0, 1)$.
*The expert gives us the same graph but with $f_Z$ misspecified as linear.*
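A short sketch of how one might simulate this ground-truth chain (the weight values below are placeholders, not taken from the note):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500

# Ground-truth Markov chain X -> Y -> Z with eps ~ N(0, 1).
w1, w0 = 2.0, 0.5        # placeholder weights for f_Y
w3, w2 = 1.5, -1.0       # placeholder weights for f_Z

X = rng.normal(size=n)
Y = w1 * X + w0 + rng.normal(size=n)        # f_Y: linear edge (Y is latent)
Z = w3 * Y**2 + w2 + rng.normal(size=n)     # f_Z: quadratic edge the expert calls linear

# Only X and Z are observed; Y must be treated as a latent node.
data = {"X": X, "Z": Z}
```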
## Approach
For each edge $e \in E$ of the graph $G(V, E)$, we assign the following distribution:
$$
\begin{aligned}
y_e &:= x_e^T\beta + \epsilon_n + GP(0, K_{\theta}(\cdot, \cdot)) \\
\implies y_e &:= x_e^T\beta + GP(0, K_{\theta}(\cdot, \cdot) + \sigma_n^2I) \\
\implies y_e &:= GP(x_e^T\beta, K_{\theta}(\cdot, \cdot) + \sigma_n^2I) \\
\implies y_e &:= GP(0, K_{\theta}^{(LIN)}(\cdot, \cdot) + K_{\theta}^{(SE)}(\cdot, \cdot) + \sigma_n^2I) \\
\implies y_e &:= GP(0, K_{\theta}^{(LIN)}(x_e, x_e) + K_{\theta}^{(SE)}(x_e, x_e) + \sigma_n^2I)
\end{aligned}
$$
where $GP(0, K_{\theta}(\cdot, \cdot))$ denotes a zero-mean Gaussian process (GP) with kernel function $K$ and hyperparameters $\theta$, $\epsilon_n \sim \mathcal{N}(0, \sigma_n^2)$ is i.i.d. observation noise, and $x_e, y_e$ are the input and output of edge $e$.
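A minimal numpy sketch of this edge model, assuming a standard linear kernel $\sigma_{lin}^2\, x x'$ and a squared-exponential kernel (the kernel forms and names here are my assumptions):

```python
import numpy as np

def k_lin(x, xp, var_lin):
    """Linear kernel: equivalent to a Gaussian prior on the linear weights."""
    return var_lin * np.outer(x, xp)

def k_se(x, xp, var_se, lengthscale):
    """Squared-exponential kernel capturing smooth nonlinear residuals."""
    sq = (x[:, None] - xp[None, :]) ** 2
    return var_se * np.exp(-0.5 * sq / lengthscale**2)

def edge_log_marginal_likelihood(x_e, y_e, var_lin, var_se, lengthscale, noise_var):
    """log p(y_e | x_e, theta) under y_e ~ GP(0, K_LIN + K_SE + noise_var * I)."""
    n = len(x_e)
    K = k_lin(x_e, x_e, var_lin) + k_se(x_e, x_e, var_se, lengthscale) + noise_var * np.eye(n)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_e))
    return -0.5 * y_e @ alpha - np.log(np.diag(L)).sum() - 0.5 * n * np.log(2 * np.pi)
```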
### Main Intuition
The added GP (squared-exponential) term captures the residual structure that the linear component fails to explain; a misspecified edge can then be identified by checking the variance hyperparameter of that kernel, which remains small when the expert's linear form is adequate.
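Continuing the sketch above (it reuses `edge_log_marginal_likelihood` and the synthetic `Y`, `Z` from earlier), one way to operationalize this check is to fit the hyperparameters by maximizing the marginal likelihood and then inspect the learned SE variance; the threshold is an arbitrary placeholder:

```python
import numpy as np
from scipy.optimize import minimize

def fit_edge(x_e, y_e):
    """Fit (var_lin, var_se, lengthscale, noise_var) by maximizing the log marginal likelihood."""
    def neg_lml(log_theta):
        return -edge_log_marginal_likelihood(x_e, y_e, *np.exp(log_theta))
    res = minimize(neg_lml, x0=np.zeros(4), method="L-BFGS-B")
    return dict(zip(["var_lin", "var_se", "lengthscale", "noise_var"], np.exp(res.x)))

# For illustration only: pretend Y were observed and fit the Y -> Z edge directly.
theta = fit_edge(Y, Z)
if theta["var_se"] > 0.1:   # placeholder threshold, not from the note
    print("Large SE variance: the expert's linear form for this edge looks misspecified.")
```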
### Strategies For Optimizing with Latent Nodes
We use the reparameterization trick together with a reparameterized horseshoe prior that favors the expert graph.
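A sketch of what the reparameterization trick looks like for the latent node $Y$, assuming a Gaussian variational family (the variational family and names are my assumptions; the horseshoe scale would enter the prior term of the objective rather than this sampling step):

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_latent_y(mu, log_sigma, n_samples=16):
    """Reparameterization trick: Y = mu + sigma * eps with eps ~ N(0, I).
    Writing Y as a deterministic function of (mu, log_sigma) and noise is what
    lets gradient-based optimizers update the variational parameters."""
    eps = rng.normal(size=(n_samples, len(mu)))
    return mu + np.exp(log_sigma) * eps

# Variational parameters for the latent Y at each data point (placeholders).
mu, log_sigma = np.zeros(500), np.full(500, -1.0)
Y_samples = sample_latent_y(mu, log_sigma)   # plug into the edge marginal likelihoods
```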