# UAI Rebuttal: Heterogeneous Variance BAI
## Reviewer 8pPs
We thank you for appreciating our paper and for the detailed comments. Our rebuttal is below. Please let us know if you have any additional questions.
**K Q1: Second inequality below (5) [Done]**
The proof of Lemma 1 is by contradiction. Specifically, if the claim were not true, there would have to be an under-pulled arm $k$ and an over-pulled arm $i$, as defined in (5). The second inequality below (5) follows from the assumption, made for the sake of contradiction, that arm $i$ is over-pulled.
**K Q2: Complexity parameters $H$ and $H'$ in Section 5 [Done]**
For GapE, we use $H$ in Theorem 1 of Gabillon et al. (2011). For GapE-V, we use $H$ in Theorem 2 of Gabillon et al. (2011). The complexity term $H'$ is not used in our methods; it only appears in their bounds. In our discussion, we meant $H' = \sum_{j \in \mathcal{A}} \sigma_j^2 / \Delta_{\min}^2$. This would correspond to Theorem 3 if it were stated as $\exp[-n / H']$.
**K Q3: We replace their $H$ with $H c' / c$ [Done]**
This is indeed a typo. We replace their $H$ with $H c / c'$.
**A Q4 a): VBR performs poorly in Figure 5 [Done]**
VBR performs poorly because the budget $n$ is much smaller than in the synthetic experiments. Recall that VBR has $K$ stages and eliminates one arm per stage. Therefore, when $K = 32$ and $n \approx 100$, each arm in the first elimination stage has less than one observation on average. We report the missing VBR results for $n \in \{100, 200, 400, 600\}$ below:
```
Alg      | n = 100         | n = 200         | n = 400         | n = 600
Unif     | 0.314 +/- 0.007 | 0.156 +/- 0.005 | 0.059 +/- 0.003 | 0.028 +/- 0.002
SH       | 0.410 +/- 0.007 | 0.033 +/- 0.003 | 0.002 +/- 0.001 | 0.001 +/- 0.000
SHVar    | 0.443 +/- 0.007 | 0.044 +/- 0.003 | 0.005 +/- 0.001 | 0.000 +/- 0.000
SHAdaVar | 0.425 +/- 0.007 | 0.034 +/- 0.003 | 0.004 +/- 0.001 | 0.001 +/- 0.000
GapE     | 0.248 +/- 0.006 | 0.127 +/- 0.005 | 0.031 +/- 0.002 | 0.012 +/- 0.002
GapE-V   | 0.248 +/- 0.006 | 0.120 +/- 0.005 | 0.038 +/- 0.003 | 0.016 +/- 0.002
VBR      | 0.970 +/- 0.002 | 0.970 +/- 0.002 | 0.967 +/- 0.003 | 0.060 +/- 0.003
```
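The stage arithmetic behind this answer can be checked with a short back-of-the-envelope sketch. It assumes, hypothetically, that VBR splits its budget evenly across the $K$ elimination stages; the exact split in the algorithm may differ, but the conclusion is the same:

```python
# Back-of-the-envelope estimate of per-arm observations in VBR's first
# elimination stage, assuming (hypothetically) that the budget n is
# split evenly across the K stages and, within a stage, across the
# surviving arms. In stage 1, all K arms survive.
def first_stage_pulls_per_arm(n: int, K: int) -> float:
    budget_per_stage = n / K      # K elimination stages in total
    return budget_per_stage / K   # stage budget shared by all K arms

# With K = 32 arms and n = 100, each arm gets well under one observation.
print(first_stage_pulls_per_arm(100, 32))
```

This is why VBR essentially eliminates at random in the first stages of this experiment.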
**A Q4 b): SH is comparable to our methods in Figure 5 [Done]**
The budget $n$ in this experiment is too small, relative to the hardness of the problem, to observe the benefit of variance adaptation in SHVar and SHAdaVar. See the results below, in our answer to Q4 c), with higher budgets $n$, where SHVar clearly outperforms SH.
**A Q4 c): Experiments with less than 1% of the movies in the MovieLens dataset [Done]**
We reran the above experiment with $64$ and $128$ movies.
For $K = 64$ movies and budget $n \in \{1600, 2000, 2400, 2800\}$, the results are:
```
Alg      | n = 1600        | n = 2000        | n = 2400        | n = 2800
Unif     | 0.721 +/- 0.006 | 0.693 +/- 0.007 | 0.662 +/- 0.007 | 0.643 +/- 0.007
SH       | 0.522 +/- 0.007 | 0.465 +/- 0.007 | 0.407 +/- 0.007 | 0.349 +/- 0.007
SHVar    | 0.465 +/- 0.007 | 0.426 +/- 0.007 | 0.365 +/- 0.007 | 0.329 +/- 0.007
SHAdaVar | 0.523 +/- 0.007 | 0.472 +/- 0.007 | 0.420 +/- 0.007 | 0.379 +/- 0.007
GapE     | 0.666 +/- 0.007 | 0.637 +/- 0.007 | 0.599 +/- 0.007 | 0.562 +/- 0.007
GapE-V   | 0.675 +/- 0.007 | 0.623 +/- 0.007 | 0.602 +/- 0.007 | 0.556 +/- 0.007
VBR      | 0.495 +/- 0.007 | 0.466 +/- 0.007 | 0.443 +/- 0.007 | 0.422 +/- 0.007
```
For $K = 128$ movies and budget $n \in \{2400, 3200, 4000, 4800\}$, the results are:
```
Alg      | n = 2400        | n = 3200        | n = 4000        | n = 4800
Unif     | 0.851 +/- 0.005 | 0.823 +/- 0.005 | 0.805 +/- 0.006 | 0.775 +/- 0.006
SH       | 0.639 +/- 0.007 | 0.568 +/- 0.007 | 0.509 +/- 0.007 | 0.464 +/- 0.007
SHVar    | 0.616 +/- 0.007 | 0.541 +/- 0.007 | 0.484 +/- 0.007 | 0.417 +/- 0.007
SHAdaVar | 0.631 +/- 0.007 | 0.586 +/- 0.007 | 0.526 +/- 0.007 | 0.483 +/- 0.007
GapE     | 0.803 +/- 0.006 | 0.760 +/- 0.006 | 0.732 +/- 0.006 | 0.706 +/- 0.006
GapE-V   | 0.782 +/- 0.006 | 0.740 +/- 0.006 | 0.699 +/- 0.006 | 0.675 +/- 0.007
VBR      | 0.647 +/- 0.007 | 0.592 +/- 0.007 | 0.568 +/- 0.007 | 0.540 +/- 0.007
```
While the performance of VBR improved relative to all baselines, our methods outperform it as the budget increases.
**K Q5: SHVar and SHAdaVar require complexity parameter $H'$ [Done]**
See answer to Q2. The complexity parameter $H'$ is not an input to SHVar and SHAdaVar. They only need to know the budget $n$ and the number of arms $K$.
**B Q6: Strength of the baselines [Done]**
As observed in Karnin et al. (2013), and later confirmed in
* [Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization](https://jmlr.org/papers/v18/16-558.html)
SH is a superior algorithm in the fixed-budget setting because it aggressively eliminates half of the remaining arms in each stage. This is why it is the strongest baseline in our setting as well.
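To make the elimination schedule concrete, here is a minimal sketch of fixed-budget successive halving in the spirit of Karnin et al. (2013). It is a simplified illustration, not the exact pseudocode from our paper; the even per-stage budget split and the toy reward model are assumptions:

```python
import math
import random

def successive_halving(pull, K, n):
    """Minimal fixed-budget successive halving sketch.

    pull(i) returns a noisy reward for arm i. In each of roughly
    log2(K) stages, every surviving arm is pulled equally often and the
    worse half of the arms is eliminated. The budget accounting is
    deliberately loose (a sketch, not an exact implementation).
    """
    arms = list(range(K))
    stages = max(1, math.ceil(math.log2(K)))
    for _ in range(stages):
        pulls = max(1, int(n / (len(arms) * stages)))  # even split per stage
        means = {i: sum(pull(i) for _ in range(pulls)) / pulls for i in arms}
        # Keep the better half of the surviving arms.
        arms = sorted(arms, key=lambda i: means[i], reverse=True)[: max(1, len(arms) // 2)]
        if len(arms) == 1:
            break
    return arms[0]

# Toy usage: arm 0 has mean reward 1, all others have mean 0.
random.seed(0)
best = successive_halving(lambda i: random.gauss(1.0 if i == 0 else 0.0, 0.1), K=8, n=200)
print(best)  # arm 0 is identified at this noise level
```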
**Additional questions**
**K Is Lemma 1 stated for integer $\lambda_{s, i}$? [Done]**
Yes. We will improve the wording of the claim to make this clear.
**B Relation to the G-optimal design**
We will introduce the G-optimal design properly to make this clear. In short, an optimal solution to the G-optimal design in a $K$-armed bandit is an allocation of arm pulls such that the errors in all mean reward estimates are the same. This is what our algorithms attempt to do by the end of each stage.
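The equal-error idea can be illustrated numerically: pulling arm $i$ in proportion to $\sigma_i^2$ makes the variances $\sigma_i^2 / n_i$ of all mean estimates equal. The variances and budget below are hypothetical, chosen only for the illustration:

```python
# Sketch of the "equal estimation error" allocation behind a
# G-optimal-style design: n_i proportional to sigma_i^2 equalizes the
# variance sigma_i^2 / n_i of every mean reward estimate.
variances = [4.0, 1.0, 0.25]   # hypothetical sigma_i^2 for 3 arms
budget = 210

total = sum(variances)
pulls = [budget * v / total for v in variances]      # n_i ∝ sigma_i^2
errors = [v / n for v, n in zip(variances, pulls)]   # variance of each mean estimate

print(pulls)   # [160.0, 40.0, 10.0]
print(errors)  # all equal to 0.025
```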
**K Definition of $\Delta_i$ in (2) [Done]**
It is $\Delta_i = \mu_1 - \mu_i$, where $\mu_i$ is the mean reward of arm $i$. This was omitted by mistake.
**B Confidence intervals in GapE and GapE-V [Done]**
A smaller complexity parameter $H$ in GapE and GapE-V makes their confidence intervals tighter, but comes at the cost of a very loose bound on the probability of misidentifying the optimal arm. To make their bound comparable to SHVar and SHAdaVar, we run GapE and GapE-V with a comparable complexity parameter $H c / c'$.
## Reviewer 7tJN
We thank you for appreciating our paper and for the detailed comments. Our rebuttal is below. Please let us know if you have any additional questions.
**B Q1: Theoretical advantage over Faella et al. (2020) [Done]**
SH of Karnin et al. (2013) has a comparable error bound to successive rejects of Audibert et al. (2010). For this reason, our variance-adaptive SH algorithms have comparable error bounds to variance-adaptive successive rejects of Faella et al. (2020). Roughly speaking, all bounds can be stated as $\exp[- n / H]$, where $H$ is a complexity parameter that depends on the number of arms $K$, their variances, and their gaps.
The main advantage of our work is its practical performance. As observed in Karnin et al. (2013), and later confirmed in
* [Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization](https://jmlr.org/papers/v18/16-558.html)
SH is a superior algorithm in the fixed-budget setting because it aggressively eliminates half of the remaining arms in each stage. This is especially compelling when the budget is small.
**K Q2: Gaussian noise [Done, paper not changed]**
The Gaussian noise allows us to derive especially tight confidence intervals on the unknown reward variances (Lemma 2). To address the concern that our methods are limited to Gaussian noise, we conduct experiments with non-Gaussian noise in Section 5.2.
**B Comment 1: Comparison to Gabillon et al. (2011) and Faella et al. (2020) [Done]**
Gabillon et al. (2011) study both known and unknown variances. The main limitation of their analyses is that they are for algorithms that know the complexity parameter. Since the complexity parameter is rarely known in practice, these algorithms cannot be run as analyzed. They also propose approximate algorithms that estimate the complexity parameter; these do not come with any guarantees.
Faella et al. (2020) study unknown variances. Their algorithm is the first that can be implemented as analyzed.
**K Comment 2: Definition of $\Delta_i$ in (2) [Done]**
It is $\Delta_i = \mu_1 - \mu_i$, where $\mu_i$ is the mean reward of arm $i$. This was omitted by mistake.
**K Comment 3: Using a tighter upper bound in (6) [Done, paper not changed]**
We want to obtain a UCB on variance and thus use the first inequality in Lemma 2. The second inequality can only be used to derive an LCB. The reason is that the roles of $\sigma_i^2$ and $\hat{\sigma}_{s, t, i}^2$ in the probabilities are reversed.
**A Comment 4: Hyperparameter of VBR in experiments [Done]**
We set the hyperparameter as $\gamma = 1.96$. In this case, the probability that the mean arm reward lies between the upper and lower confidence bounds is $0.95$. Faella et al. (2020) showed that VBR performs well with Gaussian noise when $\gamma \approx 2$.
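The coverage arithmetic behind $\gamma = 1.96$ can be verified in a few lines, using only the standard normal CDF:

```python
import math

# Check that gamma = 1.96 corresponds to 95% two-sided Gaussian
# coverage: P(|Z| <= 1.96) = Phi(1.96) - Phi(-1.96) ~= 0.95.
def std_normal_cdf(x: float) -> float:
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

gamma = 1.96
coverage = std_normal_cdf(gamma) - std_normal_cdf(-gamma)
print(round(coverage, 3))  # 0.95
```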
**A Comment 5: "the algorithms" mean "GapE and GapE-V" [Done]**
Correct.
**A Comment 6: Experiment with actual ratings**
We reran the experiment in Figure 5 with actual ratings. The experiment uses movies with at least $50$ ratings. When a movie's arm is pulled, one of its ratings, chosen uniformly at random, is returned as the reward. The results for $n \in \{100, 200, 400, 600\}$ are:
```
Alg      | n = 100         | n = 200         | n = 400         | n = 600
Unif     | 0.468 +/- 0.007 | 0.351 +/- 0.007 | 0.272 +/- 0.006 | 0.218 +/- 0.006
SH       | 0.518 +/- 0.007 | 0.223 +/- 0.006 | 0.152 +/- 0.005 | 0.121 +/- 0.005
SHVar    | 0.517 +/- 0.007 | 0.234 +/- 0.006 | 0.147 +/- 0.005 | 0.117 +/- 0.005
SHAdaVar | 0.525 +/- 0.007 | 0.226 +/- 0.006 | 0.156 +/- 0.005 | 0.124 +/- 0.005
GapE     | 0.398 +/- 0.007 | 0.309 +/- 0.007 | 0.215 +/- 0.006 | 0.182 +/- 0.005
GapE-V   | 0.422 +/- 0.000 | 0.324 +/- 0.007 | 0.222 +/- 0.006 | 0.172 +/- 0.005
VBR      | 0.965 +/- 0.003 | 0.968 +/- 0.002 | 0.968 +/- 0.002 | 0.753 +/- 0.006
```
We observe similar relative performance of the methods to Figure 5.
**K Minor comments:** We will take all of these into account in the next version.
* Minor comment: At the beginning of Section 3, is it better to add ''with homogeneous reward variances'' after ''A near-optimal solution for fixed-budget BAI''? [Done]
* Minor comment: In the proof of Lemma 1, the second inequality right after eq. (5) could be explained more clearly. For example, the first of eq. (5) implies that , so . [Done]
* Minor comment: In Lemma 2, you may change the notation as that denotes the number of stages in the algorithms. [Done]
* Minor comment: Is it possible to remove Section 4.2 and its related proof? It looks to me that Theorems 3 and 5 are the main results. [Done, paper not changed]
* Minor comment: At the end of the first paragraph of Section 5, do you mean ''closely related works''? [Done]
* Minor comment: In the second equation in Appendix A, you may use and to avoid confusion with the fixed. [Done]
## Reviewer f5mi
We thank you for appreciating our paper and for the detailed comments. Our rebuttal is below. Please let us know if you have any additional questions.
**B Q1: Theoretical comparison with previous work [Done]**
SH of Karnin et al. (2013) has a comparable error bound to successive rejects of Audibert et al. (2010). For this reason, our variance-adaptive SH algorithms have comparable error bounds to variance-adaptive successive rejects of Faella et al. (2020). Roughly speaking, all bounds can be stated as $\exp[- n / H]$, where $H$ is a complexity parameter that depends on the number of arms $K$, their variances, and their gaps.
Our error bounds are also comparable to Gabillon et al. (2011). The main limitation of their analyses is that they are for algorithms that know the complexity parameter. Since the complexity parameter is rarely known in practice, these algorithms cannot be run as analyzed. They also propose approximate algorithms that estimate the complexity parameter; these do not come with any guarantees.
The main advantage of our work is its practical performance. As observed in Karnin et al. (2013), and later confirmed in
* [Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization](https://jmlr.org/papers/v18/16-558.html)
SH is a superior algorithm in the fixed-budget setting because it aggressively eliminates half of the remaining arms in each stage. This is especially compelling when the budget is small.
**B Q2: Reuse existing results on G-optimal designs**
We will add more discussion on this topic. In short, most works on optimal designs assume that the variance is known and thus do not apply to our setting.
**B Q3: Beyond Gaussian concentration [Done]**
This is a great point. For instance, a very general tool for concentration of random variables with an unknown variance is the empirical Bernstein bound
* [Empirical Bernstein Bounds and Sample Variance Penalization](https://arxiv.org/abs/0907.3740)
In fact, this approach was taken by Gabillon et al. (2011). Our initial experiments showed that the Gaussian assumption gives us quick concentration and is quite robust to misspecification. We will discuss this in more detail in the revised paper.
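As a hedged illustration, the empirical Bernstein confidence radius of Maurer and Pontil (2009) for rewards bounded in $[0, 1]$ can be sketched as follows; the exact constants below follow our reading of their bound and should be checked against the paper before use:

```python
import math

# Sketch of the empirical Bernstein confidence radius for rewards in
# [0, 1] (Maurer and Pontil, 2009): with probability at least 1 - delta,
#   mu <= mean + sqrt(2 * var * log(2/delta) / n) + 7 * log(2/delta) / (3 * (n - 1)),
# where var is the sample variance. The variance-dependent first term is
# what makes the bound adapt to low-variance arms.
def empirical_bernstein_radius(rewards, delta):
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((x - mean) ** 2 for x in rewards) / (n - 1)  # sample variance
    log_term = math.log(2.0 / delta)
    return math.sqrt(2.0 * var * log_term / n) + 7.0 * log_term / (3.0 * (n - 1))

# A zero-variance arm: the variance term vanishes and only the O(1/n)
# correction remains, giving a much tighter radius than the worst case.
radius = empirical_bernstein_radius([0.5] * 100, delta=0.05)
print(radius)
```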