# Concentration for sum of a random length sequence
:::info
## Exercise 7.1 from Bandit Algorithms [1]
Let $(X_t)_t$ be a sequence of i.i.d. Gaussian random variables with mean $\mu$ and unit variance defined on the probability space $(\Omega,\mathcal{F}, \mathbb{P})$. Let $T:\Omega\to \mathbb{N}=\{1,2,3,\ldots\}$ be a random variable and let $\hat{\mu} := \frac{1}{T} \sum_{t=1}^T X_t$.
1. Show that if $T$ and $(X_t)_t$ are independent, then for any $\delta \in (0,1)$,
$$ \mathbb{P} \left( \hat{\mu} -\mu \geq \sqrt{\frac{2\log \delta^{-1}}{T}} \right) \leq \delta
$$
2. Now drop the assumption that $T$ and $(X_t)_t$ are independent. Let $\mathcal{F}_t:= \sigma(X_1,\cdots, X_t)$. For each $\delta\in (0,1)$ find $T$ such that $\{T=t\}\in \mathcal{F}_t$ for all $t=1,2,\ldots$ and
$$ \mathbb{P} \left( \hat{\mu} -\mu\geq \sqrt{\frac{2\log \delta^{-1}}{T}} \right) =1
$$
3. Show that in general we have
$$ \mathbb{P} \left( \hat{\mu} -\mu\geq \sqrt{\frac{2\log (T(T+1)/\delta)}{T}} \right) \leq \delta
$$
:::
Without loss of generality assume $\mu=0$. Recall that if $Y\sim \mathcal{N}(0, \sigma^2)$ then for $y\geq 0$ we have
\begin{align*}
\mathbb{P}(Y\geq y)\leq \exp\left(-\frac{y^2}{2\sigma^2}\right).
\end{align*} Since the $X_i$ are i.i.d. standard normal, for any fixed $t$ the sample mean $\frac1t\sum_{i=1}^t X_i$ is distributed as $\mathcal{N}(0, 1/t)$. Analogous results can be obtained for subgaussian random variables by using Hoeffding's inequality in place of the Gaussian tail bound. We also remark that tighter bounds than the one in (3) are possible, but they require more sophisticated tools.
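For completeness, the tail bound above follows from the standard Chernoff argument: for any $\lambda>0$,
\begin{align*}
\mathbb{P}(Y\geq y) = \mathbb{P}\left(e^{\lambda Y}\geq e^{\lambda y}\right) \leq e^{-\lambda y}\,\mathbb{E}\left[e^{\lambda Y}\right] = \exp\left(\frac{\lambda^2\sigma^2}{2}-\lambda y\right),
\end{align*}
and optimizing over $\lambda$ (the minimizer is $\lambda = y/\sigma^2$) yields $\exp(-y^2/(2\sigma^2))$.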
## (1)
By independence we can write
\begin{align*}
\mathbb{P} \left( \hat{\mu} \geq \sqrt{\frac{2\log \delta^{-1}}{T}} \right) &= \sum_{t=1}^\infty \mathbb{P} \left( \hat{\mu} \geq \sqrt{\frac{2\log \delta^{-1}}{T}} \Bigg\vert T=t \right) \mathbb{P}(T=t) \\
& = \sum_{t=1}^\infty \mathbb{P} \left( \frac1t \sum_{i=1}^t X_i \geq \sqrt{\frac{2\log \delta^{-1}}{t}} \right) \mathbb{P}(T=t) \\
& \leq \sum_{t=1}^\infty \exp\left(-\dfrac{\frac2t \log \frac1\delta}{2\cdot\frac1t} \right)\mathbb{P}(T=t) = \delta.
\end{align*}
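This calculation is easy to sanity-check numerically. Below is a minimal sketch; the distribution of $T$ (uniform on $\{1,\ldots,50\}$), the value of `delta`, the trial count, and the seed are all arbitrary choices. With $T$ drawn independently of the $X_t$'s, the empirical frequency of the bad event should stay below $\delta$.

```python
import numpy as np

rng = np.random.default_rng(0)
delta = 0.05
n_trials = 100_000

failures = 0
for _ in range(n_trials):
    # T independent of the X's; uniform on {1,...,50} is an arbitrary choice
    T = rng.integers(1, 51)
    x = rng.standard_normal(T)  # mu = 0, unit variance
    if x.mean() >= np.sqrt(2 * np.log(1 / delta) / T):
        failures += 1

print(f"empirical failure rate: {failures / n_trials:.4f}  (bound: delta = {delta})")
```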
## (2)
For $\omega\in\Omega$ define
\begin{align}
T(\omega):= \inf \left\{ n\geq 1 : \frac1n \sum_{t=1}^n X_t(\omega) \geq \sqrt{\frac{2\log\delta^{-1}}{n} }\right\}.
\end{align} Then $\mathbb{I}\{T=t\}$ is a function of $X_1,\ldots, X_t$ and hence is $\mathcal{F}_t$-measurable. By the definition of $T$ we have $\hat{\mu}\geq \sqrt{2 (\log \delta^{-1})/T}$ on the event $\{T<\infty\}$. It thus remains to check that $T$ is well defined, namely that $T<\infty$ almost surely.
By the law of the iterated logarithm, for almost every $\omega\in\Omega$,
\begin{align}
\limsup_{n\to\infty} \frac{\sum_{t=1}^n X_t(\omega)}{\sqrt{2n\log\log n}}=1.
\end{align}
Thus for every $\epsilon \in (0,1)$ and $m\geq 1$ there exists $n\geq m$ such that
\begin{align}
\frac{\sum_{t=1}^n X_t(\omega)}{\sqrt{2n\log\log n}}\geq 1-\epsilon,
\end{align} which implies
\begin{align}
\frac{\frac1n \sum_{t=1}^n X_t(\omega)}{\sqrt{(2\log\delta^{-1})/n}}\geq (1-\epsilon)\sqrt{\frac{\log\log n}{\log \delta^{-1}}},
\end{align} Since $\log\log n\geq \log\log m$, the right-hand side exceeds $1$ once $m$ is large enough that $(1-\epsilon)\sqrt{(\log\log m)/\log \delta^{-1}}>1$. This means that for almost every $\omega$ the set in the definition of $T$ is a nonempty subset of $\mathbb{N}$, so $T<\infty$ almost surely.
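A simulation illustrates this failure mode. The sketch below truncates at an arbitrary `horizon` because, although $T<\infty$ almost surely, it can be extremely large on some paths; `delta`, `n_trials`, and the seed are likewise arbitrary. Every path that stops satisfies $\hat{\mu}\geq \sqrt{2(\log\delta^{-1})/T}$ by construction, so the nominal confidence bound fails with probability $1$ for this choice of $T$.

```python
import numpy as np

rng = np.random.default_rng(1)
delta = 0.1
horizon = 200_000  # truncation: T < infinity a.s., but it can be very large
n_trials = 100

# On the sum scale, T is the first n with S_n >= sqrt(2 n log(1/delta)).
thresholds = np.sqrt(2 * np.log(1 / delta) * np.arange(1, horizon + 1))

stopped = 0
for _ in range(n_trials):
    sums = rng.standard_normal(horizon).cumsum()  # S_1, ..., S_horizon with mu = 0
    if np.any(sums >= thresholds):
        stopped += 1

print(f"paths that stopped within the horizon: {stopped}/{n_trials}")
```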
## (3)
Since $\{T=t\}_{t\geq 1}$ partitions the sample space, we have
\begin{align*}
\mathbb{P} \left( \hat{\mu} \geq \sqrt{\frac{2\log (T(T+1)/\delta)}{T}} \right) & = \sum_{t=1}^\infty \mathbb{P} \left( \hat{\mu} \geq \sqrt{\frac{2\log (T(T+1)/\delta)}{T}} \text{ and } T=t \right) \\
& \leq \sum_{t=1}^\infty \mathbb{P} \left( \frac1t\sum_{i=1}^t X_i\geq \sqrt{\frac{2\log (t(t+1)/\delta)}{t}} \right) \\
& \leq \sum_{t=1}^\infty \exp\left(-\dfrac{\frac{2\log (t(t+1)/\delta)}{t}}{2\cdot \frac1t}\right) \\
& = \sum_{t=1}^\infty \frac{\delta}{t(t+1)}=\delta.
\end{align*} Here the first inequality simply drops the event $\{T=t\}$, which is why no assumption on $T$ (beyond measurability) is needed, and the last equality uses the telescoping sum $\sum_{t=1}^\infty \frac{1}{t(t+1)} = \sum_{t=1}^\infty \left(\frac1t - \frac1{t+1}\right) = 1$.
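Since the proof bounds the probability of the union over $t$ of the per-$t$ bad events, a numerical check of that union event covers every choice of $T$ at once. The sketch below can only track the union up to a finite `horizon`; that cutoff, `delta`, `n_trials`, and the seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(2)
delta = 0.05
horizon = 10_000
n_trials = 5_000

t = np.arange(1, horizon + 1)
radius = np.sqrt(2 * np.log(t * (t + 1) / delta) / t)  # time-dependent confidence width

failures = 0
for _ in range(n_trials):
    mu_hat = rng.standard_normal(horizon).cumsum() / t  # running averages, mu = 0
    if np.any(mu_hat >= radius):  # union of bad events over t <= horizon
        failures += 1

print(f"empirical failure rate: {failures / n_trials:.4f}  (bound: delta = {delta})")
```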
# References
[1] Lattimore, Tor, and Csaba Szepesvári. *Bandit Algorithms*. Cambridge University Press, 2020.