{%hackmd sp5ar-O1ShKngS_QxPI1WQ %}
[toc]
# Basic background material
This section records the material that needs to be added to the article.
Goal: familiarize readers with the concepts and tools used in this article.
---
* Upper and lower deviations
* Basic definitions
* Relation to hypothesis testing
---
## Upper and Lower deviation
We assume that the distribution of the reward $\reward_{\action}$ of action $\action$ satisfies the following moment conditions. There exist \textit{CGF-like} functions $\phiAct^{+}$ and $\phiAct^{-}$ on $\mathbb{R}$ such that, for all $\lambda \geq 0$, the CGF of the \textit{upper deviation} satisfies
\begin{equation}
\ln \mathbb{E} e^{\lambda(\reward_{\action}-\mathbb{E}[\reward_{\action}])} \leq \phiAct^{+}(\lambda)
\end{equation}
and the CGF of the \textit{lower deviation} satisfies, for all $\lambda \geq 0$,
\begin{equation}
\ln \mathbb{E} e^{\lambda(\mathbb{E}[\reward_{\action}]-\reward_{\action})} \leq \phiAct^{-}(\lambda)
\end{equation}
For example, when $\reward_{\action} \in[0,1]$ one can take $\phiAct^{+}(\lambda)=\phiAct^{-}(\lambda)=\lambda^{2}/8$; this case is known as Hoeffding's lemma. As another case, $\phiAct(\lambda)=\frac{\lambda^2}{2(1-c\lambda)}$ for $\lambda \in [0, 1/c)$ corresponds to the Bernstein condition for sub-exponential distributions. See these examples in more detail in Section \ref{}.
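As a quick numerical sanity check (not part of the original text), the Hoeffding choice $\phiAct^{+}(\lambda)=\lambda^{2}/8$ can be compared against the exact centered CGF of a Bernoulli reward, which has a closed form; the function name `centered_cgf_bernoulli` is ours:

```python
import numpy as np

# Sanity check: for a [0,1]-valued reward, Hoeffding's lemma gives
# ln E exp(lam * (X - E[X])) <= lam^2 / 8 for all lam.
# We verify this on Bernoulli(p), whose centered CGF is closed-form.
def centered_cgf_bernoulli(p, lam):
    # ln E exp(lam * (Ber(p) - p))
    return np.log(p * np.exp(lam * (1 - p)) + (1 - p) * np.exp(-lam * p))

for p in (0.1, 0.5, 0.9):
    for lam in np.linspace(0.0, 5.0, 51):
        assert centered_cgf_bernoulli(p, lam) <= lam**2 / 8 + 1e-12
```

The bound is tightest near $p=1/2$, where the centered CGF behaves like $\lambda^{2}p(1-p)/2 = \lambda^{2}/8$ for small $\lambda$.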
### Connection to the standard MGF?
One writes $M_{X}(\lambda)\equiv E[\exp(\lambda X)]$ for the moment generating function of a centered random variable $X$ (i.e. $E[X] = 0$). Then, for $\lambda \ge 0$, we have $$\psi_{X}^{+}(\lambda)=\ln M_{X}(\lambda)~\text{ and }~\psi_{X}^{-}(\lambda)=\ln M_{X}(-\lambda),$$
that is, the MGF at positive arguments controls the \textit{upper concentration} of $X$, while the MGF at negative arguments controls the \textit{lower concentration} of $X$. In particular,
$$
P(X-E[X]>t)\le \exp(-\psi_{X}^{+,*}(t)) ~\text{ and }~
P(X-E[X]< -t)\le \exp(-\psi_{X}^{-,*}(t)),
$$
where $\psi_{X}^{+,*}(\cdot)$ and $\psi_{X}^{-,*}(\cdot)$ are the Legendre transforms of $\psi_{X}^{+}(\cdot)$ and $\psi_{X}^{-}(\cdot)$, respectively.
#### Normal random variable.
Let $X$ be a centered normal random variable with variance $\sigma^{2}$. Then
$$
\psi_{X}(\lambda)=\frac{\lambda^{2} \sigma^{2}}{2} \quad \text { and } \quad \lambda_{t}=\frac{t}{\sigma^{2}}
$$
and therefore, for every $t>0$
$$
\psi_{X}^{*}(t)=\frac{t^{2}}{2 \sigma^{2}}
$$
Hence, Chernoff's inequality implies, for all $t>0$,
$$
\boldsymbol{P}\{X \geq t\} \leq e^{-t^{2} /\left(2 \sigma^{2}\right)}
$$
Chernoff's inequality appears to be quite sharp in this case. In fact, one can show that it cannot be improved uniformly by more than a factor of $1 / 2$ (see Exercise 2.7).
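A small sketch (illustrative function names, not from the original text) comparing the exact Gaussian upper tail, computable via the complementary error function, with the Chernoff bound above:

```python
import math

# Compare the exact Gaussian upper tail,
# P(X >= t) = erfc(t / (sigma * sqrt(2))) / 2,
# with the Chernoff bound exp(-t^2 / (2 sigma^2)).
def gaussian_tail(t, sigma):
    return 0.5 * math.erfc(t / (sigma * math.sqrt(2.0)))

def chernoff_bound(t, sigma):
    return math.exp(-(t ** 2) / (2.0 * sigma ** 2))

sigma = 1.0
for t in (0.5, 1.0, 2.0, 3.0):
    assert gaussian_tail(t, sigma) <= chernoff_bound(t, sigma)
```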
#### Bernoulli
This part is tricky.
##### Example from the Concentration book
Here we mainly treat the upper deviation: $Z = \text{Ber}(p) - p$.
**Bernoulli random variables.** Let $Y$ be a Bernoulli random variable with probability of success $p$, that is, $P\{Y=1\}=1-P\{Y=0\}=p$. Denote by $Z=Y-p$ the centered version of $Y$. If $0<t<1-p$, we have
$$
\psi_{Z}(\lambda)=\log \left(p e^{\lambda}+1-p\right)-\lambda p \quad \text { and } \quad \lambda_{t}=\log\left(\frac{1-p}{p}\cdot\frac{p+t}{1-(p+t)}\right)
$$
and therefore, for every $t \in(0,1-p)$,
$$
\psi_{Z}^{*}(t)=(1-p-t) \log \frac{1-p-t}{1-p}+(p+t) \log \frac{p+t}{p}
= D(\text{Ber}(p+t)\|\text{Ber}(p))
$$
Equivalently, setting $a=t+p$, for every $a \in(p, 1)$,
$$
\psi_{Z}^{*}(t)=h_{p}(a) \stackrel{\text { def }}{=}(1-a) \log \frac{1-a}{1-p}+a \log \frac{a}{p}
$$
We note here that $h_{p}(a)$ is just the Kullback-Leibler divergence $D\left(P_{a} \| P_{p}\right)$ between a Bernoulli distribution $P_{a}$ of parameter $a$ and a Bernoulli distribution $P_{p}$ of parameter $p$ (see Chapter 4 for the definition).
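The identity $\psi_{Z}^{*}(t)=D(\text{Ber}(p+t)\|\text{Ber}(p))$ can be checked numerically by maximizing $\lambda t-\psi_{Z}(\lambda)$ over a fine grid of $\lambda$; a minimal sketch (function names are ours):

```python
import numpy as np

# Verify numerically that the Legendre transform of
# psi_Z(lam) = log(p e^lam + 1 - p) - lam p equals
# h_p(a) = KL(Ber(a) || Ber(p)) with a = p + t.
def psi_Z(p, lam):
    return np.log(p * np.exp(lam) + 1 - p) - lam * p

def h(p, a):  # Bernoulli KL divergence D(Ber(a) || Ber(p))
    return (1 - a) * np.log((1 - a) / (1 - p)) + a * np.log(a / p)

p, t = 0.3, 0.2                       # a = p + t = 0.5, with t in (0, 1-p)
lams = np.linspace(0.0, 20.0, 20001)  # grid containing the maximizer lambda_t
legendre = np.max(lams * t - psi_Z(p, lams))
assert abs(legendre - h(p, p + t)) < 1e-4
```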
> For $\text{Ber}(p)$, the overestimate $t$ ranges over $(0, 1-p)$, while the underestimate ranges over $(0, p)$.
---
- We also need the "lower deviation" $Z^{-} = p - \text{Ber}(p)$; we work it out below.
Set $X = p - \text{Ber}(p)$; then $E[X]=0$ and
\begin{align}
\exp(\lambda X) =& \exp(\lambda p)\exp(-\lambda \text{Ber}(p)) \\
E[\exp(\lambda X)]=&
\exp(\lambda p)[p\exp(-\lambda)+(1-p)]\\
\psi_{X}(\lambda)=&\lambda p + \ln(p\exp(-\lambda)+(1-p))
\end{align}
The result is:
$$
\psi_{Z}^{-}(\lambda)=\log \left(p e^{-\lambda}+1-p\right)+\lambda p \quad
\text { and } \quad
\lambda_{t}=\log(\frac{p}{1-p}\frac{1-(p-t)}{p-t})
$$
and therefore, for every $t \in(0, p)$,
$$
\psi_{Z}^{-,*}(t)=(1-(p-t)) \log \frac{1-(p-t)}{1-p}+(p-t) \log \frac{p-t}{p}
= D(\text{Ber}(p-t)\|\text{Ber}(p))
$$
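The same numerical check applies to the lower deviation: the supremum of $\lambda t - \psi_{Z}^{-}(\lambda)$ over $\lambda \geq 0$ should match $D(\text{Ber}(p-t)\|\text{Ber}(p))$. A sketch with illustrative names:

```python
import numpy as np

# Check the lower-deviation Legendre transform numerically:
# sup_{lam >= 0} lam*t - psi_minus(lam) should equal D(Ber(p - t) || Ber(p)).
def psi_minus(p, lam):
    return np.log(p * np.exp(-lam) + 1 - p) + lam * p

def kl_bern(q, p):
    return q * np.log(q / p) + (1 - q) * np.log((1 - q) / (1 - p))

p, t = 0.6, 0.2                       # p - t = 0.4, with t in (0, p)
lams = np.linspace(0.0, 20.0, 20001)  # grid containing the maximizer lambda_t
legendre = np.max(lams * t - psi_minus(p, lams))
assert abs(legendre - kl_bern(p - t, p)) < 1e-4
```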
---
##### Example from the Howard paper
Example 2 (Coin flipping).
Suppose $X_{i} \stackrel{\text { iid }}{\sim} \operatorname{Ber}(p),$ and let $S_{t}=\sum_{i=1}^{t}\left(X_{i}-p\right)$ denote the centered sum.
- The true mean is $p$.
The CGF of each increment of $S_{t}$, scaled by $1 /[p(1-p)]$, is $\psi_{B}(\lambda):=[p(1-p)]^{-1} \log \mathbb{E} \exp \left\{\lambda\left(X_{i}-p\right)\right\}=[p(1-p)]^{-1} \log \left(p e^{(1-p) \lambda}+(1-p) e^{-p \lambda}\right)$, so that $\lambda_{\max }=\infty$ and $\bar{b}=1 / p$.
- $(S_t)$ falls in the class of sub-Bernoulli martingales.
One may directly check the martingale property to confirm that $L_{t}(\lambda):=\exp \left\{\lambda S_{t}-\psi_{B}(\lambda) p(1-p) t\right\}$ is a martingale for any $\lambda$, so that $\left(S_{t}\right)$ is 1-sub-$\psi_{B}$ with $V_{t}=p(1-p) t$. Then, for any $t_{0} \in \mathbb{N}$ and $x \in\left(0,(1-p) t_{0}\right)$, setting $m=p(1-p) t_{0}$ in Theorem 1(b) yields
$$
\begin{aligned}
\mathbb{P}(\exists t&\left.\in \mathbb{N}: S_{t} \geq x+p(1-p) \mathfrak{s}_{B}\left(\frac{x}{p(1-p) t_{0}}\right) \cdot\left(t-t_{0}\right)\right) \\
& \leq \exp \left\{-t_{0} \mathrm{KL}\left(p+\frac{x}{t_{0}} \| p\right)\right\} \\
&=\left[\left(\frac{p}{p+x / t_{0}}\right)^{p+x / t_{0}}\left(\frac{1-p}{1-p-x / t_{0}}\right)^{1-p-x / t_{0}}\right]^{t_{0}}
\end{aligned}
$$
Here KL denotes the Bernoulli Kullback-Leibler divergence,
$$
\mathrm{KL}(q \| p)=q \log \left(\frac{q}{p}\right)+(1-q) \log \left(\frac{1-q}{1-p}\right)
$$
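The last equality in the display above (the exponential of $-t_0\,\mathrm{KL}$ versus the bracketed product raised to $t_0$) is a direct algebraic identity; a quick numerical check with illustrative values:

```python
import math

# The two displayed forms of the bound agree:
# exp(-t0 * KL(p + x/t0 || p)) equals the bracketed product raised to t0.
def kl_bern(q, p):
    return q * math.log(q / p) + (1 - q) * math.log((1 - q) / (1 - p))

p, t0, x = 0.4, 100, 20.0
q = p + x / t0                        # q = 0.6, inside (p, 1)
lhs = math.exp(-t0 * kl_bern(q, p))
rhs = ((p / q) ** q * ((1 - p) / (1 - q)) ** (1 - q)) ** t0
assert abs(lhs - rhs) < 1e-12
```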
---
### Summary
Under $H_0: p_a \le p$, assuming $Z \sim \text{Ber}(p)$, we obtain:
- The overestimate is at most of size $(0,1-p)$: for $t \in (0,1-p)$, $\psi_{Z}^{+,*}(t)
= D(\text{Ber}(p+t)\|\text{Ber}(p))$.
- The underestimate is at most of size $(0,p)$: for $t \in (0,p)$, $\psi_{Z}^{-,*}(t)
= D(\text{Ber}(p-t)\|\text{Ber}(p))$.
In other words, the $t$ here should be read as the GAP.
---
### Supplementary material: more conditions on the CGF.
## Nonparametric Hypothesis Testing
## Decision Rule
## Type I & II error
## Chernoff index of a decision rule