---
title: Regret bound heteroscedastic
---
### Background: GP-UCB
Theorem 1: Let $\delta \in (0, 1)$ and $\beta_t = 2 \log(\vert D \vert t^2 \pi^2 / (6 \delta))$. Running GP-UCB with $\beta_t$ for a sample $f$ of a GP with mean function zero and covariance function $k(x, x^\prime)$, we obtain a regret bound of $\mathcal{O}^\ast(\sqrt{T \gamma_T \log \vert D \vert})$ with high probability. Precisely,
\begin{align}
\mathrm{Pr}\big(R_T \leq \sqrt{C_1 T \beta_T \gamma_T} \;\, \forall T \geq 1\big) \geq 1 - \delta
\end{align}
where $C_1 = 8 / \log(1 + \sigma^{-2})$.
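A minimal sketch of the acquisition step implied by Theorem 1, assuming the posterior mean and standard deviation over the finite candidate set $D$ are already available from some GP implementation (function and variable names below are hypothetical placeholders):

```python
import numpy as np

def beta_t(t, n_candidates, delta):
    # Theorem 1 schedule: beta_t = 2 log(|D| t^2 pi^2 / (6 delta))
    return 2.0 * np.log(n_candidates * t**2 * np.pi**2 / (6.0 * delta))

def gp_ucb_select(mu, sigma, t, delta):
    # mu, sigma: posterior mean and standard deviation at each candidate in D,
    # produced by whatever GP model is used (placeholders here)
    b = beta_t(t, len(mu), delta)
    return int(np.argmax(mu + np.sqrt(b) * sigma))

# example with made-up posterior values over |D| = 4 candidates at round t = 3
print(gp_ucb_select(np.array([0.1, 0.4, 0.3, 0.2]),
                    np.array([0.5, 0.1, 0.3, 0.6]), t=3, delta=0.1))
```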
Proof sketch:
Lemma 5.1: Pick $\delta \in (0, 1)$ and set $\beta_t = 2 \log(\vert D \vert \pi_t / \delta)$, where $\sum_{t \geq 1} \pi_t^{- 1} = 1$, $\pi_t > 0$. Then $\vert f(x) - \mu_{t - 1}(x) \vert \leq \beta_t^{1/2} \sigma_{t - 1}(x)$ for all $x \in D$ and all $t \geq 1$ holds with probability $\geq 1 - \delta$. Theorem 1 follows by choosing $\pi_t = \pi^2 t^2 / 6$ (so that $\sum_{t \geq 1} \pi_t^{-1} = 1$), bounding the instantaneous regret by $r_t \leq 2 \beta_t^{1/2} \sigma_{t - 1}(x_t)$, and relating $\sum_{t = 1}^T \sigma_{t - 1}^2(x_t)$ to the information gain $\gamma_T$.
Theorem 2: Let $D \subset [0, r]^d$ be compact and convex, $d \in \mathbb{N}$, $r > 0$. Suppose that the kernel $k(x, x^\prime)$ satisfies the following high-probability bound on the derivatives of GP sample paths $f$: for some constants $a, b > 0$,
\begin{align}
\mathrm{Pr}\big(\sup_{x \in D} \vert \partial f / \partial x_j \vert > L\big) \leq a \exp(-(L/b)^2), \quad j = 1, \cdots, d
\end{align}
Pick $\delta \in (0, 1)$ and define
\begin{align}
\beta_t = 2 \log\big(2 t^2 \pi^2 / (3 \delta)\big) + 2 d \log\big(t^2 d b r \sqrt{\log(4 d a / \delta)}\big)
\end{align}
Running GP-UCB with $\beta_t$ for a sample $f$ of a GP with mean function zero and covariance function $k(x, x^\prime)$, we obtain a regret bound $\mathcal{O}^\ast(\sqrt{dT \gamma_T})$ with high probability. Precisely, with $C_1 = 8 / \log (1 + \sigma^{-2})$ we have
\begin{align}
\mathrm{Pr}\big(R_T \leq \sqrt{C_1 T \beta_T \gamma_T} + 2 \;\, \forall T \geq 1\big) \geq 1 - \delta
\end{align}
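A small helper evaluating the $\beta_t$ schedule of Theorem 2; the values of $a$, $b$, $r$, $d$, $\delta$ in the example are arbitrary assumptions, chosen only to illustrate the growth in $t$:

```python
import numpy as np

def beta_t_continuous(t, d, a, b, r, delta):
    # Theorem 2 schedule:
    # beta_t = 2 log(2 t^2 pi^2 / (3 delta)) + 2 d log(t^2 d b r sqrt(log(4 d a / delta)))
    return (2.0 * np.log(2.0 * t**2 * np.pi**2 / (3.0 * delta))
            + 2.0 * d * np.log(t**2 * d * b * r * np.sqrt(np.log(4.0 * d * a / delta))))

# illustrative values only (assumptions, not from the theorem)
print([round(beta_t_continuous(t, d=2, a=1.0, b=1.0, r=1.0, delta=0.1), 2)
       for t in (1, 10, 100)])
```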
### Background: IDS
Sub-Gaussian noise assumption: conditioned on the history $\mathcal{F}_{t - 1}$ and the evaluation point $x_t$, the noise $\epsilon_t$ is $\rho_t$-sub-Gaussian, i.e.
\begin{align}
\forall \lambda \in \mathbb{R}, \quad \mathbb{E}[\exp(\lambda \epsilon_t) \vert \mathcal{F}_{t - 1}, x_t] \leq \exp(\frac{\lambda^2 \rho_t^2}{2})
\end{align}
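As a sanity check (not from the source), Gaussian noise with standard deviation $\rho_t$ satisfies this bound with equality; a quick Monte Carlo illustration with arbitrary values of $\rho$ and $\lambda$:

```python
import numpy as np

rng = np.random.default_rng(0)
rho, lam = 0.5, 1.3                       # noise scale rho_t and test value of lambda
eps = rng.normal(0.0, rho, size=1_000_000)
empirical_mgf = np.exp(lam * eps).mean()  # estimate of E[exp(lambda * eps_t)]
bound = np.exp(lam**2 * rho**2 / 2.0)     # sub-Gaussian bound exp(lambda^2 rho_t^2 / 2)
print(empirical_mgf, bound)               # for Gaussian noise the two approximately agree
```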
Frequentist information gain: given a sequence $(I_t)_{t \geq 1}$ of non-negative, continuous functions $I_t: (\mathcal{X} \times \mathbb{R})^{t - 1} \times \mathcal{X} \rightarrow \mathbb{R}_{\geq 0}$, think of $I_t(x) = I_t(x_1, y_1, \cdots, x_{t - 1}, y_{t - 1}, x)$ as the information gain of evaluating $x \in \mathcal{X}$ at time $t$. For a given policy $\pi$, the maximum information gain at time $T$ is
\begin{align}
\gamma_T = \mathrm{ess\,sup}_{\mathcal{F}_T} \sum_{t = 1}^T I_t(x_t)
\end{align}
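In the GP/kernelized case a common concrete choice, assumed here rather than stated above, is $I_t(x) = \tfrac{1}{2}\log\big(1 + \sigma_{t-1}^2(x)/\rho_t^2\big)$; a short sketch accumulating the realized gain $\sum_{t=1}^T I_t(x_t)$ along one run, with placeholder values:

```python
import numpy as np

def gp_info_gain(sigma_prev, rho):
    # assumed GP-style gain I_t(x) = 1/2 log(1 + sigma_{t-1}^2(x) / rho_t^2)
    return 0.5 * np.log1p(sigma_prev**2 / rho**2)

# posterior stds sigma_{t-1}(x_t) and noise levels rho_t below are placeholders
posterior_stds = np.array([0.9, 0.7, 0.4, 0.3])
noise_levels   = np.array([0.2, 0.2, 0.5, 0.5])   # heteroscedastic noise
print(gp_info_gain(posterior_stds, noise_levels).sum())
```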
Regret-information ratio at time $t$:
\begin{align}
\Psi_t : \mathcal{P}(\mathcal{X}) \rightarrow \mathbb{R}_{\geq 0}, \quad \mu \mapsto \frac{\mathbb{E}_{\mu}[\Delta(x) \mid \mathcal{F}_{t - 1}]^2}{\mathbb{E}_{\mu}[I_t(x) \mid \mathcal{F}_{t - 1}]}
\end{align}
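A direct numerical reading of this definition for a finite action set; the distribution, gap, and information-gain values below are placeholders for illustration only:

```python
import numpy as np

def regret_info_ratio(mu, gaps, gains):
    # Psi_t(mu) = E_mu[Delta(x)]^2 / E_mu[I_t(x)] for a distribution mu over a finite set
    return float(np.dot(mu, gaps))**2 / float(np.dot(mu, gains))

mu    = np.array([0.6, 0.4])    # candidate sampling distribution over two actions
gaps  = np.array([0.3, 0.1])    # surrogate gap values Delta(x), placeholders
gains = np.array([0.05, 0.20])  # information gains I_t(x), placeholders
print(regret_info_ratio(mu, gaps, gains))
```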
Theorem 1 (Regret bound for randomized policies):
Let $S = \max_{x \in \mathcal{X}} \Delta(x)$ and let $\Psi_t$, $\gamma_T$ be as defined above. Then, for any policy, with probability at least $1 - \delta$, at any time $T \geq 1$ it holds that
\begin{align}
R_T \leq \frac{5}{4} \sqrt{\sum_{t = 1}^T \Psi_t \big(2 \gamma_T + 4 \gamma_T \log(2/\delta) + 8 \gamma_T \log(4 \gamma_T) + 1\big)} + 4 S \log\Big(\frac{8 \pi^2 T^2}{3 \delta} \big(\log(T) + 1\big)\Big)
\end{align}
If, in addition, $I_t(x_t) \leq 1$ holds for all $t \geq 1$, then with probability at least $1 - \delta$, at any time $T \geq 1$,
\begin{align}
R_T \leq \frac{5}{4} \sqrt{\sum_{t = 1}^T \Psi_t \big(2 \gamma_T + 4 \log(2/\delta) + 8 \log(4) + 1\big)} + 4 S \log\Big(\frac{8 \pi^2 T^2}{3 \delta} \big(\log(T) + 1\big)\Big)
\end{align}
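In practice the randomized policy chooses the sampling distribution $\mu$ minimizing $\Psi_t$; a common simplification (assumed here, not stated above) is to search only over distributions supported on at most two actions. A minimal grid-search sketch with placeholder gap and gain values:

```python
import numpy as np
from itertools import combinations

def ids_two_point(gaps, gains, n_grid=101):
    # minimize (q*D_i + (1-q)*D_j)^2 / (q*I_i + (1-q)*I_j) over action pairs (i, j)
    # and mixing weights q; point masses are covered by the endpoints q in {0, 1}
    best_ratio, best_pair, best_q = np.inf, None, None
    qs = np.linspace(0.0, 1.0, n_grid)
    for i, j in combinations(range(len(gaps)), 2):
        d = qs * gaps[i] + (1.0 - qs) * gaps[j]
        g = qs * gains[i] + (1.0 - qs) * gains[j]
        ratio = d**2 / np.maximum(g, 1e-12)
        k = int(np.argmin(ratio))
        if ratio[k] < best_ratio:
            best_ratio, best_pair, best_q = ratio[k], (i, j), qs[k]
    return best_pair, best_q, best_ratio

gaps  = np.array([0.05, 0.2, 0.4])   # surrogate gaps, placeholders
gains = np.array([0.01, 0.1, 0.3])   # information gains I_t(x), placeholders
print(ids_two_point(gaps, gains))
```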
Theorem 2 (Deterministic policies): Let the sequence $(x_t)_{t = 1}^T$ be generated by any deterministic policy. Then, at any time $T \geq 1$, the regret is bounded by $R_T \leq \sqrt{\gamma_T \sum_{t = 1}^T \Psi_t}$.
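Restricting the same minimization to point masses gives a deterministic rule of the form $x_t \in \arg\min_x \hat{\Delta}_t(x)^2 / I_t(x)$; a one-line illustration with the same placeholder values as above:

```python
import numpy as np

# point-mass restriction of the ratio minimization; gap and gain values are placeholders
gaps  = np.array([0.05, 0.2, 0.4])
gains = np.array([0.01, 0.1, 0.3])
print(int(np.argmin(gaps**2 / gains)))  # index of the selected action
```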
### Background: Risk-averse BO