# Concentration for sum of a random length sequence
:::info
## Exercise 7.1 from Bandit Algorithms [1]
Let $(X_t)_t$ be a sequence of i.i.d. Gaussian random variables with mean $\mu$ and unit variance defined on the probability space $(\Omega,\mathcal{F}, \mathbb{P})$. Let $T:\Omega\to \mathbb{N}=\{1,2,3,\ldots\}$ be a random variable and let $\hat{\mu} := \frac{1}{T} \sum_{t=1}^T X_t$.
1. Show that if $T$ and $(X_t)_t$ are independent, then for any $\delta \in (0,1)$,
$$ \mathbb{P} \left( \hat{\mu} -\mu \geq \sqrt{\frac{2\log \delta^{-1}}{T}} \right) \leq \delta
$$
2. Now drop the assumption that $T$ and $(X_t)_t$ are independent. Let $\mathcal{F}_t:= \sigma(X_1,\cdots, X_t)$. For each $\delta\in (0,1)$ find $T$ such that $\{T=t\}\in \mathcal{F}_t$ for all $t=1,2,\ldots$ and
$$ \mathbb{P} \left( \hat{\mu} -\mu\geq \sqrt{\frac{2\log \delta^{-1}}{T}} \right) =1
$$
3. Show that in general we have
$$ \mathbb{P} \left( \hat{\mu} -\mu\geq \sqrt{\frac{2\log (T(T+1)/\delta)}{T}} \right) \leq \delta
$$
:::
Without loss of generality assume $\mu=0$. Recall that if $Y\sim \mathcal{N}(0, \sigma^2)$ then for $y\geq 0$ we have
\begin{align*}
\mathbb{P}(Y\geq y)\leq \exp\left(-\frac{y^2}{2\sigma^2}\right).
\end{align*} Since the $X_i$ are i.i.d. standard normal, for any fixed $t$ the sample mean $\frac1t\sum_{i=1}^t X_i$ is distributed as $\mathcal{N}(0, 1/t)$. Analogous results can be obtained for subgaussian random variables by using Hoeffding's inequality in place of the Gaussian tail bound. We also remark that tighter bounds than the one in (3) are possible, but they require more sophisticated tools.
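For completeness, the tail bound above follows from the standard Chernoff argument: for any $\lambda>0$,
\begin{align*}
\mathbb{P}(Y\geq y) = \mathbb{P}\left(e^{\lambda Y}\geq e^{\lambda y}\right) \leq e^{-\lambda y}\,\mathbb{E}\left[e^{\lambda Y}\right] = \exp\left(\frac{\lambda^2\sigma^2}{2}-\lambda y\right),
\end{align*}
and optimizing over $\lambda$ (the minimizer is $\lambda = y/\sigma^2$) yields $\exp(-y^2/(2\sigma^2))$.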
## (1)
By independence we can write
\begin{align*}
\mathbb{P} \left( \hat{\mu} \geq \sqrt{\frac{2\log \delta^{-1}}{T}} \right) &= \sum_{t=1}^\infty \mathbb{P} \left( \hat{\mu} \geq \sqrt{\frac{2\log \delta^{-1}}{T}} \Bigg\vert T=t \right) \mathbb{P}(T=t) \\
& = \sum_{t=1}^\infty \mathbb{P} \left( \frac1t \sum_{i=1}^t X_i \geq \sqrt{\frac{2\log \delta^{-1}}{t}} \right) \mathbb{P}(T=t) \\
& \leq \sum_{t=1}^\infty \exp\left(-\dfrac{\frac2t \log \frac1\delta}{2\cdot\frac1t} \right)\mathbb{P}(T=t) = \delta.
\end{align*}
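This calculation is easy to sanity-check numerically. Below is a minimal sketch; the distribution of $T$ (uniform on $\{1,\ldots,50\}$), the value of `delta`, the trial count, and the seed are all arbitrary choices. With $T$ drawn independently of the $X_t$'s, the empirical frequency of the bad event should stay below $\delta$.

```python
import numpy as np

rng = np.random.default_rng(0)
delta = 0.05
n_trials = 100_000

failures = 0
for _ in range(n_trials):
    # T independent of the X's; uniform on {1,...,50} is an arbitrary choice
    T = rng.integers(1, 51)
    x = rng.standard_normal(T)  # mu = 0, unit variance
    if x.mean() >= np.sqrt(2 * np.log(1 / delta) / T):
        failures += 1

print(f"empirical failure rate: {failures / n_trials:.4f}  (bound: delta = {delta})")
```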
## (2)
For $\omega\in\Omega$ define
\begin{align}
T(\omega):= \inf \left\{ n\geq 1 : \frac1n \sum_{t=1}^n X_t(\omega) \geq \sqrt{\frac{2\log\delta^{-1}}{n} }\right\}.
\end{align} Then $\mathbb{I}\{T=t\}$ is a function of $X_1,\ldots, X_t$ and hence is $\mathcal{F}_t$-measurable. By the definition of $T$ we have $\hat{\mu}\geq \sqrt{2 (\log \delta^{-1})/T}$ on the event $\{T<\infty\}$. It thus remains to check that $T$ is well defined, namely that $T<\infty$ almost surely.
By the law of the iterated logarithm, for almost every $\omega\in\Omega$,
\begin{align}
\limsup_{n\to\infty} \frac{\sum_{t=1}^n X_t(\omega)}{\sqrt{2n\log\log n}}=1.
\end{align}
Thus for every $\epsilon \in (0,1)$ and $m\geq 1$ there exists $n\geq m$ such that
\begin{align}
\frac{\sum_{t=1}^n X_t(\omega)}{\sqrt{2n\log\log n}}\geq 1-\epsilon,
\end{align} which implies
\begin{align}
\frac{\frac1n \sum_{t=1}^n X_t(\omega)}{\sqrt{(2\log\delta^{-1})/n}}\geq (1-\epsilon)\sqrt{\frac{\log\log n}{\log \delta^{-1}}},
\end{align} Since $\log\log n\geq \log\log m$, the right-hand side exceeds $1$ once $m$ is large enough that $(1-\epsilon)\sqrt{(\log\log m)/\log \delta^{-1}}>1$. This means that for almost every $\omega$ the set in the definition of $T$ is a nonempty subset of $\mathbb{N}$, so $T<\infty$ almost surely.
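A simulation illustrates this failure mode. The sketch below truncates at an arbitrary `horizon` because, although $T<\infty$ almost surely, it can be extremely large on some paths; `delta`, `n_trials`, and the seed are likewise arbitrary. Every path that stops satisfies $\hat{\mu}\geq \sqrt{2(\log\delta^{-1})/T}$ by construction, so the nominal confidence bound fails with probability $1$ for this choice of $T$.

```python
import numpy as np

rng = np.random.default_rng(1)
delta = 0.1
horizon = 200_000  # truncation: T < infinity a.s., but it can be very large
n_trials = 100

# On the sum scale, T is the first n with S_n >= sqrt(2 n log(1/delta)).
thresholds = np.sqrt(2 * np.log(1 / delta) * np.arange(1, horizon + 1))

stopped = 0
for _ in range(n_trials):
    sums = rng.standard_normal(horizon).cumsum()  # S_1, ..., S_horizon with mu = 0
    if np.any(sums >= thresholds):
        stopped += 1

print(f"paths that stopped within the horizon: {stopped}/{n_trials}")
```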
## (3)
Since $\{T=t\}_{t\geq 1}$ partitions the sample space, we have
\begin{align*}
\mathbb{P} \left( \hat{\mu} \geq \sqrt{\frac{2\log (T(T+1)/\delta)}{T}} \right) & = \sum_{t=1}^\infty \mathbb{P} \left( \hat{\mu} \geq \sqrt{\frac{2\log (T(T+1)/\delta)}{T}} \text{ and } T=t \right) \\
& \leq \sum_{t=1}^\infty \mathbb{P} \left( \frac1t\sum_{i=1}^t X_i\geq \sqrt{\frac{2\log (t(t+1)/\delta)}{t}} \right) \\
& \leq \sum_{t=1}^\infty \exp\left(-\dfrac{\frac{2\log (t(t+1)/\delta)}{t}}{2\cdot \frac1t}\right) \\
& = \sum_{t=1}^\infty \frac{\delta}{t(t+1)}=\delta.
\end{align*} Here the first inequality simply drops the event $\{T=t\}$, which is why no assumption on $T$ (beyond measurability) is needed, and the last equality uses the telescoping sum $\sum_{t=1}^\infty \frac{1}{t(t+1)} = \sum_{t=1}^\infty \left(\frac1t - \frac1{t+1}\right) = 1$.
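Since the proof bounds the probability of the union over $t$ of the per-$t$ bad events, a numerical check of that union event covers every choice of $T$ at once. The sketch below can only track the union up to a finite `horizon`; that cutoff, `delta`, `n_trials`, and the seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(2)
delta = 0.05
horizon = 10_000
n_trials = 5_000

t = np.arange(1, horizon + 1)
radius = np.sqrt(2 * np.log(t * (t + 1) / delta) / t)  # time-dependent confidence width

failures = 0
for _ in range(n_trials):
    mu_hat = rng.standard_normal(horizon).cumsum() / t  # running averages, mu = 0
    if np.any(mu_hat >= radius):  # union of bad events over t <= horizon
        failures += 1

print(f"empirical failure rate: {failures / n_trials:.4f}  (bound: delta = {delta})")
```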
# References
[1] Lattimore, Tor, and Csaba Szepesvári. *Bandit Algorithms*. Cambridge University Press, 2020.