{%hackmd sp5ar-O1ShKngS_QxPI1WQ %}

[toc]

# Basic background material

Notes on content that still needs to be written up. Goal: familiarize the reader with the concepts and tools this article relies on.

---

* Upper and lower deviations
    * Basic definitions
    * Connection to hypothesis testing

---

## Upper and Lower deviation

We assume that the reward distribution $\reward_{\action}$ of action $\action$ satisfies the following moment conditions. There exist *CGF-like* functions $\phiAct^{+}$ and $\phiAct^{-}$ on $\mathbb{R}_{\ge 0}$ such that, for all $\lambda \geq 0$, the CGF of the *upper deviation* satisfies
\begin{equation} \ln \mathbb{E}\, e^{\lambda(\reward_{\action}-\mathbb{E}[\reward_{\action}])} \leq \phiAct^{+}(\lambda) \end{equation}
and the CGF of the *lower deviation* satisfies, for all $\lambda \geq 0$,
\begin{equation} \ln \mathbb{E}\, e^{\lambda(\mathbb{E}[\reward_{\action}]-\reward_{\action})} \leq \phiAct^{-}(\lambda) \end{equation}
For example, when $\reward_{\action} \in[0,1]$ one can take $\phiAct^{+}(\lambda)=\lambda^{2}/8$ and $\phiAct^{-}(\lambda)=\lambda^{2}/8$; this case is known as Hoeffding's lemma. As another example, $\psi_{\action}(\lambda)=\frac{v\lambda^{2}}{2(1-c\lambda)}$ for $0 \le \lambda < 1/c$ (with $v$ a variance parameter and $c>0$ a scale parameter) corresponds to Bernstein's inequality for sub-exponential distributions. See these examples in more detail in Section \ref{}.

### Connection to the standard MGF?

Write $M_{X}(\lambda)\equiv E[\exp(\lambda X)]$ for the moment generating function of a random variable $X$ with $E[X] = 0$. Then, for $\lambda \ge 0$, we have
$$\psi_{X}^{+}(\lambda)=\ln M_X(\lambda)~\text{ and }~\psi_{X}^{-}(\lambda)=\ln M_X(-\lambda),$$
that is, the MGF evaluated at positive arguments governs the *upper concentration* of $X$, while the MGF evaluated at negative arguments governs its *lower concentration*. In particular,
$$ P(X-E[X]>t)\le \exp(-\psi_{X}^{+,*}(t)) ~\text{ and }~ P(X-E[X]< -t)\le \exp(-\psi_{X}^{-,*}(t)), $$
where $\psi_{X}^{+,*}(\cdot)$ and $\psi_{X}^{-,*}(\cdot)$ are the Legendre transforms of $\psi_{X}^{+}(\cdot)$ and $\psi_{X}^{-}(\cdot)$.

#### Normal random variable.
Let $X$ be a centered normal random variable with variance $\sigma^{2}$. Then
$$ \psi_{X}(\lambda)=\frac{\lambda^{2} \sigma^{2}}{2} \quad \text { and } \quad \lambda_{t}=\frac{t}{\sigma^{2}}, $$
and therefore, for every $t>0$,
$$ \psi_{X}^{*}(t)=\frac{t^{2}}{2 \sigma^{2}}. $$
Hence Chernoff's inequality implies, for all $t>0$,
$$ \boldsymbol{P}\{X \geq t\} \leq e^{-t^{2} /\left(2 \sigma^{2}\right)}. $$
Chernoff's inequality appears to be quite sharp in this case. In fact, one can show that it cannot be improved uniformly by more than a factor of $1 / 2$ (see Exercise 2.7).

#### Bernoulli

This part is tricky.

##### Concentration: the example from the book

Here we mainly handle the upper deviation: $Z = \text{Ber}(p) - p$.

In our third principal example, let $Y$ be a Bernoulli random variable with probability of success $p$, that is, $P\{Y=1\}=1-P\{Y=0\}=p$. Denote by $Z=Y-p$ the centered version of $Y$. If $0<t<1-p$, we have
$$ \psi_{Z}(\lambda)=\log \left(p e^{\lambda}+1-p\right)-\lambda p \quad \text { and } \quad \lambda_{t}=\log \left(\frac{1-p}{p}\,\frac{p+t}{1-(p+t)}\right) $$
and therefore, for every $t \in(0,1-p)$,
$$ \psi_{Z}^{*}(t)=(1-p-t) \log \frac{1-p-t}{1-p}+(p+t) \log \frac{p+t}{p} = D(\text{Ber}(p+t)\,\|\,\text{Ber}(p)) $$
Equivalently, setting $a=t+p$, for every $a \in(p, 1)$,
$$ \psi_{Z}^{*}(t)=h_{p}(a) \stackrel{\text { def }}{=}(1-a) \log \frac{1-a}{1-p}+a \log \frac{a}{p} $$
We note here that $h_{p}(a)$ is just the Kullback-Leibler divergence $D\left(P_{a} \| P_{p}\right)$ between a Bernoulli distribution $P_{a}$ of parameter $a$ and a Bernoulli distribution $P_{p}$ of parameter $p$ (see Chapter 4 for the definition).

> For $\text{Ber}(p)$, the upper deviation ranges over at most $(0, 1-p)$, and the lower deviation over at most $(0, p)$.

---

- We also need the *lower deviation*: $Z^{-} = p - \text{Ber}(p)$.
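Before turning to the lower deviation, a quick numerical sanity check of the upper-deviation formula: the Legendre transform of $\psi_Z$ should coincide with the Bernoulli KL divergence $D(\text{Ber}(p+t)\,\|\,\text{Ber}(p))$. The concrete values of $p$ and $t$ below are arbitrary illustrative choices.

```python
import numpy as np

def psi_Z(lam, p):
    """CGF of the centered Bernoulli Z = Ber(p) - p."""
    return np.log(p * np.exp(lam) + 1 - p) - lam * p

def kl_bernoulli(q, p):
    """KL divergence D(Ber(q) || Ber(p))."""
    return q * np.log(q / p) + (1 - q) * np.log((1 - q) / (1 - p))

def legendre_psi_Z(t, p):
    """Crude grid search for sup_{lam >= 0} { lam * t - psi_Z(lam, p) }."""
    lam = np.linspace(0.0, 30.0, 300_001)
    return np.max(lam * t - psi_Z(lam, p))

p, t = 0.3, 0.2  # illustrative: any p in (0,1) and t in (0, 1-p) work
print(legendre_psi_Z(t, p))    # ≈ 0.0872
print(kl_bernoulli(p + t, p))  # ≈ 0.0872, i.e. D(Ber(p+t) || Ber(p))
```

The supremum is attained at $\lambda_t = \log\big(\frac{1-p}{p}\frac{p+t}{1-(p+t)}\big) \approx 0.85$ here, well inside the search grid.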
Worked out here as an exercise. Set $X = p - \text{Ber}(p)$; then $E[X]=0$ and
\begin{align} \exp(\lambda X) =& \exp(\lambda p)\exp(-\lambda \operatorname{Ber}(p)) \\ E[\exp(\lambda X)]=& \exp(\lambda p)[p\exp(-\lambda)+(1-p)]\\ \psi_{X}(\lambda)=&\lambda p + \ln(p\exp(-\lambda)+(1-p)) \end{align}
The result:
$$ \psi_{Z}^{-}(\lambda)=\log \left(p e^{-\lambda}+1-p\right)+\lambda p \quad \text { and } \quad \lambda_{t}=\log\left(\frac{p}{1-p}\,\frac{1-(p-t)}{p-t}\right) $$
and therefore, for every $t \in(0,p)$,
$$ \psi_{Z}^{-,*}(t)=(1-(p-t)) \log \frac{1-(p-t)}{1-p}+(p-t) \log \frac{p-t}{p} = D(\text{Ber}(p-t)\,\|\,\text{Ber}(p)) $$

---

##### The example from Howard's paper

Example 2 (Coin flipping). Suppose $X_{i} \stackrel{\text { iid }}{\sim} \operatorname{Ber}(p)$, and let $S_{t}=\sum_{i=1}^{t}\left(X_{i}-p\right)$ denote the centered sum.

- The true mean is $p$.

The CGF of each increment of $S_{t}$, scaled by $1/[p(1-p)]$, is
$$ \psi_{B}(\lambda):=[p(1-p)]^{-1} \log \mathbb{E} \exp \left\{\lambda\left(X_{i}-p\right)\right\}=[p(1-p)]^{-1} \log \left(p e^{(1-p) \lambda}+(1-p) e^{-p \lambda}\right), $$
so that $\lambda_{\max }=\infty$ and $\bar{b}=1 / p$.
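A minimal Monte Carlo sanity check of this CGF: for each fixed $t$, the process $\exp\{\lambda S_t - \psi_B(\lambda)\,p(1-p)\,t\}$ should have unit mean. The concrete $p$, $\lambda$, horizon, and sample size below are arbitrary illustrative choices, not values from the paper.

```python
import numpy as np

def psi_B(lam, p):
    """Scaled increment CGF: [p(1-p)]^{-1} * log E exp(lam * (X_i - p))."""
    return np.log(p * np.exp((1 - p) * lam)
                  + (1 - p) * np.exp(-p * lam)) / (p * (1 - p))

# Illustrative parameters: success probability, lambda, horizon, MC sample size.
p, lam, horizon, n_mc = 0.3, 0.3, 20, 200_000

rng = np.random.default_rng(0)
X = rng.binomial(1, p, size=(n_mc, horizon))  # i.i.d. Ber(p) draws
S = (X - p).sum(axis=1)                       # centered sums S_t at t = horizon
L = np.exp(lam * S - psi_B(lam, p) * p * (1 - p) * horizon)
print(L.mean())  # ≈ 1 up to Monte Carlo error
```

Since the increments are i.i.d., $\mathbb{E}\exp(\lambda S_t)=\exp\{\psi_B(\lambda)\,p(1-p)\,t\}$ exactly, so the empirical mean concentrates around 1.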
- This falls within the class of sub-Bernoulli martingales.

One may directly check the martingale property to confirm that $L_{t}(\lambda):=\exp \left\{\lambda S_{t}-\psi_{B}(\lambda) p(1-p) t\right\}$ is a martingale for any $\lambda$, so that $\left(S_{t}\right)$ is $1$-sub-$\psi_{B}$ with $V_{t}=p(1-p) t$. Then, for any $t_{0} \in \mathbb{N}$ and $x \in\left(0,(1-p) t_{0}\right)$, setting $m=p(1-p) t_{0}$ in Theorem 1(b) yields
$$ \begin{aligned} \mathbb{P}\left(\exists t \in \mathbb{N}: S_{t} \geq x+p(1-p)\, \mathfrak{s}_{B}\left(\frac{x}{p(1-p) t_{0}}\right) \cdot\left(t-t_{0}\right)\right) & \leq \exp \left\{-t_{0} \mathrm{KL}\left(p+\frac{x}{t_{0}} \,\Big\|\, p\right)\right\} \\ &=\left[\left(\frac{p}{p+x / t_{0}}\right)^{p+x / t_{0}}\left(\frac{1-p}{1-p-x / t_{0}}\right)^{1-p-x / t_{0}}\right]^{t_{0}} \end{aligned} $$
Here KL denotes the Bernoulli Kullback-Leibler divergence,
$$ \mathrm{KL}(q \| p)=q \log \left(\frac{q}{p}\right)+(1-q) \log \left(\frac{1-q}{1-p}\right) $$

---

### Summary

Under $H_0: p_a \le p$, taking $Z \sim \text{Ber}(p)$ gives:

- The overestimate stays within $(0, 1-p)$: for $t \in (0,1-p)$, $\psi_{Z}^{*,+}(t) = D(\text{Ber}(p+t)\,\|\,\text{Ber}(p))$.
- The underestimate stays within $(0, p)$: for $t \in (0,p)$, $\psi_{Z}^{*,-}(t) = D(\text{Ber}(p-t)\,\|\,\text{Ber}(p))$.

In other words, the $t$ here should be read as the GAP.

---

### Supplementary material: more conditions on the CGF

## Nonparametric Hypothesis Testing

## Decision Rule

## Type I & II error

## Chernoff index of a decision rule