---
title: Heteroscedastic Preferential BO
---
## Solving the non-identifiability problem
Current work suffers from the non-identifiability problem, i.e., more than one latent function satisfies the same preference relation. In our work, we assume the user can provide a set of anchors, and that these anchors are equipped with measurement results. We leverage the anchors for model (hyperparameter) selection. The goal is to match the magnitude of our GP to the anchor measurement values $Y_0 = \{y_0^{(\ell)}\}_{\ell = 1}^L$. The presence of $Y_0$ constrains the magnitude of our GP, hence resolving the non-identifiability problem. Mathematically, we can write the model selection criterion, which we call the anchor marginal likelihood, as follows
\begin{align}
\log p(Y_0 \vert \mathbf{X}_0, \theta) = \log \int p(Y_0 \vert \mathbf{X}_0, f) \, p(f \vert \theta) \, df
\end{align}
We then maximize this criterion w.r.t. the GP hyperparameters. Intuitively, the expert's initial latent function is guided by the anchor measurements. Combining this criterion with the conventional log marginal likelihood gives us
\begin{align}
\theta^\ast = \operatorname*{arg\,max}_{\theta \in \Theta} \; \beta \log p(\mathbf{v}_m < 0 \vert \theta) + (1 - \beta) \log p(Y_0 \vert \mathbf{X}_0, \theta)
\end{align}
with $\beta \in (0, 1)$ weighting the two terms.
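For concreteness, here is a minimal sketch of the combined criterion, assuming a zero-mean GP with i.i.d. Gaussian noise for the anchor term; the preference marginal likelihood is left as a user-supplied callable, and `pref_log_lik`, the hyperparameter packing in `theta`, and the `kernel` signature are all illustrative assumptions.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def anchor_log_marginal_likelihood(theta, X0, Y0, kernel):
    # log p(Y_0 | X_0, theta) for a zero-mean GP with i.i.d. Gaussian noise.
    # Assumption: kernel(X, X, theta) returns the covariance matrix and
    # theta[-1] is the log noise variance.
    K = kernel(X0, X0, theta) + np.exp(theta[-1]) * np.eye(len(X0))
    L, lower = cho_factor(K, lower=True)
    alpha = cho_solve((L, lower), Y0)
    return (-0.5 * Y0 @ alpha
            - np.sum(np.log(np.diag(L)))
            - 0.5 * len(Y0) * np.log(2.0 * np.pi))

def combined_objective(theta, X0, Y0, kernel, pref_log_lik, beta=0.5):
    # beta * log p(v_m < 0 | theta) + (1 - beta) * log p(Y_0 | X_0, theta).
    # pref_log_lik is a placeholder for the preference marginal likelihood.
    return (beta * pref_log_lik(theta)
            + (1.0 - beta) * anchor_log_marginal_likelihood(theta, X0, Y0, kernel))
```

The combined objective can then be maximized over $\theta$ with any off-the-shelf optimizer, e.g., by minimizing its negation with `scipy.optimize.minimize`.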
<!--Furthermore, we are interested when the expert's latent function is flexible enough. This is reasonable since human may develop an abstractive intuition after gaining some information (the measurements). Therefore, we proposed the invariance property on the function or the anchor's measurement values.-->
<!--Let us assume the evolved latent function can be constructed by a composition of $f$ and and and a group $g \in G$. Assumes that there exists a probability measure over $G$. -->
<!--Given a group $G$ and suppose there is a probability measure over $G$, we can rewrite the anchor's marginal likelihood as follows
**Model 1**
\begin{align}
\log p(Y_0 \vert X_0, \theta, M) = \log \int \prod_{\ell = 1}^L \int p(g \circ y_0^\ell \vert x_0^\ell, f) \, p(g) \, dg \, p(f \vert \theta, M) \, df
\end{align}
If we view the $g \in G$ as noise, then this is analogous to a noise-robust model.
**Model 2**
\begin{align}
\log p(Y_0 \vert X_0, \theta, M) = \log \int \int p(Y_0 \vert X_0, g \circ f) \, p(g) \, p(f \vert \theta, M) \, dg \, df
\end{align}
This model force GP to be invariant under $G$. -->
<!--Questions so far:
- should we combine the above criterion with the conventional log marginal likelihood which depends on the dataset, i.e., the preference observations?
- How do we handle multiple anchor measurements (multi-output)? -->
## Gaussian process model
Since we assume the anchors are equipped with measurements, we can now utilize $(X_0, Y_0)$ to build our noise estimator $g$. Recall that we follow the most likely heteroscedastic Gaussian process procedure (a minimal sketch follows the steps below):
- Given the anchors $(X_0, Y_0)$, we estimate a standard, homoscedastic GP $\hat{f}$ by maximizing the likelihood of $Y_0$ given $X_0$.
- Given $\hat{f}$, we estimate the empirical noise level at each anchor, i.e., $Z_0 = \{ z_\ell = \log \mathbb{V}[y_0^{(\ell)}, \hat{f}(x_0^{(\ell)}; X_0)] \}_{\ell \leq L}$, forming a new dataset $(X_0, Z_0)$.
- On $(X_0, Z_0)$, we estimate a second GP $g$ that models the log noise level.
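A minimal sketch of these three steps, assuming scikit-learn's `GaussianProcessRegressor`, synthetic placeholder anchors, and a single-sample squared residual as a crude stand-in for the empirical variance $\mathbb{V}$:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Hypothetical anchor data (X_0, Y_0); shapes and values are placeholders.
rng = np.random.default_rng(0)
X0 = rng.uniform(0.0, 1.0, size=(8, 1))
Y0 = np.sin(6.0 * X0).ravel() + 0.1 * rng.standard_normal(8)

# Step 1: homoscedastic GP f_hat fit on the anchors.
f_hat = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
f_hat.fit(X0, Y0)

# Step 2: empirical log noise levels Z_0 from the residuals of f_hat.
mu0 = f_hat.predict(X0)
Z0 = np.log((Y0 - mu0) ** 2 + 1e-8)  # squared residual as a crude variance estimate

# Step 3: second GP g on (X_0, Z_0) modelling the log noise level.
g = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
g.fit(X0, Z0)

# Predicted heteroscedastic noise variance at query points.
X_query = np.linspace(0.0, 1.0, 50).reshape(-1, 1)
noise_var = np.exp(g.predict(X_query))
```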
## The anchors
Storing a large number of anchors is not feasible due to the computational cost it would incur. This becomes particularly important in a scenario where the expert updates the anchors mid-iteration. To keep the computational load bounded, the expert must replace one of the existing anchors with the new candidate. Intuitively, the expert is willing to replace an anchor when the new candidate looks more promising, i.e., has a better chance of being the optimum of the objective function. If the expert keeps updating the anchors, the anchor set eventually becomes a set of candidate optima.
In bandit optimization there are three common methods for dealing with non-stationary environments (a sliding-window sketch follows the list):
- restarting
- sliding window
- weighted penalty
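As a sketch of the sliding-window option applied to anchor maintenance, assuming a fixed buffer size and a FIFO eviction rule (both illustrative choices, not part of the method above):

```python
from collections import deque

class AnchorBuffer:
    """Fixed-size anchor set: adding a new anchor evicts the oldest (sliding window)."""

    def __init__(self, max_anchors):
        self.anchors = deque(maxlen=max_anchors)  # stores (x, y) pairs

    def add(self, x, y):
        # deque with maxlen drops the oldest entry automatically once full.
        self.anchors.append((x, y))

    def as_lists(self):
        xs, ys = zip(*self.anchors)
        return list(xs), list(ys)

buffer = AnchorBuffer(max_anchors=5)
for t in range(8):
    buffer.add(x=t / 10.0, y=float(t) ** 2)  # placeholder measurements
X0, Y0 = buffer.as_lists()  # only the 5 most recent anchors remain
```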
## Acquisition function
The anchor points play a crucial role in shaping the behavior of the noise. My hypothesis is that we need to adjust the magnitude of the noise as the optimization progresses over many iterations. An interesting idea is to gradually reduce the noise as we grant the surrogate model more influence. Ultimately, our objective is for the predictive model to exhibit homoscedastic noise, with the informed anchor points having a diminishing impact on predictions in the long run.
#### Simple model
$\varepsilon(x) \sim \mathcal{N}(0, \phi(x)^2)$ with $\phi(x) = 1 + \sum_{\ell \leq L} (w_\ell - 1) \, k_\ell(d_\ell(x, x_0^{(\ell)}))$
\begin{align}
\mathrm{ANPEI}(x) = \rho \, \mathrm{EI}(x; \phi(x)^{\frac{\gamma}{t}}) - (1 - \rho) \, \phi(x)^{\frac{\gamma}{t}}
\end{align}
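A minimal sketch of this acquisition, assuming an analytic EI under a Gaussian posterior, squared-exponential kernels $k_\ell$ of the Euclidean distance, and that the tempered noise term inflates the predictive standard deviation inside EI (all illustrative assumptions, not prescribed above):

```python
import numpy as np
from scipy.stats import norm

def phi(x, anchors_x, weights, lengthscale=0.2):
    # phi(x) = 1 + sum_l (w_l - 1) * k_l(d_l(x, x0_l)); each k_l is assumed
    # to be a squared-exponential of the Euclidean distance to the anchor.
    dists = np.linalg.norm(np.atleast_2d(x) - np.atleast_2d(anchors_x), axis=-1)
    return 1.0 + np.sum((np.asarray(weights) - 1.0)
                        * np.exp(-0.5 * (dists / lengthscale) ** 2))

def expected_improvement(mu, sigma, best_y):
    # Analytic EI for minimization under a Gaussian posterior N(mu, sigma^2).
    z = (best_y - mu) / sigma
    return (best_y - mu) * norm.cdf(z) + sigma * norm.pdf(z)

def anpei(x, mu, sigma, best_y, anchors_x, weights, rho=0.5, gamma=1.0, t=1):
    # ANPEI(x) = rho * EI(x; phi(x)^(gamma/t)) - (1 - rho) * phi(x)^(gamma/t).
    # As t grows, phi^(gamma/t) -> 1, so the anchors' influence on the noise fades.
    noise = phi(x, anchors_x, weights) ** (gamma / t)
    # Assumption: the tempered noise inflates the predictive std inside EI.
    return (rho * expected_improvement(mu, np.sqrt(sigma ** 2 + noise), best_y)
            - (1.0 - rho) * noise)
```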