# Probability Theory & Mathematical Statistics: Notes on Weak Spots
###### tags: `統研所考試`、`機率數統`
---
## Basic probability
### Permutations
>[!NOTE]
>Drawing m items from n distinct items and arranging them in order
$P^n_m=\frac{n!}{(n-m)!}$
### Distinct balls into distinct boxes
Placing m distinct balls into n distinct empty boxes, the number of arrangements with no empty box is (by inclusion–exclusion, signs alternating):
$n^m-C^n_1(n-1)^m+C^n_2(n-2)^m-\dots+(-1)^{n-1}C^n_{n-1}\times 1^m$
### Rearrangement (derangement) problem
Given n distinct items in a row, the number of rearrangements in which no item stays in its original position ($D_n$) is:
$D_n=n!\left[1-\frac{1}{1!}+\frac{1}{2!}-\frac{1}{3!}+\dots+(-1)^n\frac{1}{n!}\right]$
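A brute-force sanity check of the three counting formulas above (my own sketch, not from the source; the small values of n and m are arbitrary):

```python
from itertools import permutations, product
from math import comb, factorial

# P(n, m): ordered draws of m items from n distinct items
n, m = 5, 3
assert factorial(n) // factorial(n - m) == len(list(permutations(range(n), m)))

# Surjections: m distinct balls into n distinct boxes, no box empty
m_balls, n_boxes = 5, 3
by_formula = sum((-1) ** k * comb(n_boxes, k) * (n_boxes - k) ** m_balls
                 for k in range(n_boxes))
by_brute_force = sum(1 for a in product(range(n_boxes), repeat=m_balls)
                     if len(set(a)) == n_boxes)
assert by_formula == by_brute_force == 150

# Derangements D_5: permutations of 5 items with no fixed point
d5 = round(factorial(5) * sum((-1) ** k / factorial(k) for k in range(6)))
assert d5 == sum(1 for p in permutations(range(5))
                 if all(p[i] != i for i in range(5))) == 44
```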
## Calculus used here
### $(1-\frac{\lambda}{n})^n \approx e^{-\lambda}$ for large $n$
### Taylor expansion
#### source: 662
The nth Taylor polynomial at $x=a$ of $f(x)$ is:
$p_n(x)=f(a)+{f'(a)\over 1!}(x-a)+{f''(a)\over 2!}(x-a)^2+\dots+{f^{(n)}(a)\over n!}(x-a)^n$
- If $a=0$, it is called the Maclaurin polynomial, <u>which is also how the exponential function is defined.</u>
- $e^\lambda = \sum\limits _{x=0}^\infty \frac {\lambda ^x}{x!}$ (numeric check below)
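A quick numeric check (my own sketch) that the Maclaurin partial sums converge to $e^\lambda$, and that the limit fact above holds; $\lambda=2.5$ is an arbitrary choice:

```python
from math import exp, factorial

lam = 2.5
partial = sum(lam ** k / factorial(k) for k in range(30))
print(partial, exp(lam))          # both ~12.1825
assert abs(partial - exp(lam)) < 1e-12

n = 10 ** 6
print((1 - lam / n) ** n, exp(-lam))  # ~0.08208 for both, agreeing to ~5 decimals
```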
## Functions
### Gamma function
- Def: $Γ(α)=\int _0 ^\infty x^{α-1}e^{-x}dx$
- Properties:
    - $Γ(n)=(n-1)!$, if n is a positive integer.
    - $Γ(\frac {1}{2})=\sqrt{\pi}$
    - $Γ(α)=(α-1)Γ(α-1)$, or $Γ(α+1)=αΓ(α)$
    - $Γ(α+n)=Γ(α)\,α(α+1)(α+2)\cdots(α+n-1)$
### Beta function
- Def: $B(a,b)=\int _0^1 x^{a-1}(1-x)^{b-1}dx$
- Properties:
    - $B(a,b)= \frac {Γ(a)Γ(b)}{Γ(a+b)}$
    - $B(a,b)=B(b,a)$ (numeric check below)
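A numeric check (my own sketch) of the $B(a,b)=\frac{Γ(a)Γ(b)}{Γ(a+b)}$ identity and of $Γ(\frac12)=\sqrt\pi$; the values $a=2.5$, $b=4.0$ are arbitrary:

```python
from math import gamma, pi
from scipy.integrate import quad

a, b = 2.5, 4.0
beta_by_integral, _ = quad(lambda x: x ** (a - 1) * (1 - x) ** (b - 1), 0, 1)
beta_by_gamma = gamma(a) * gamma(b) / gamma(a + b)
assert abs(beta_by_integral - beta_by_gamma) < 1e-10   # the identity holds
assert abs(gamma(0.5) - pi ** 0.5) < 1e-12             # Gamma(1/2) = sqrt(pi)
```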
### Computing conditional expectations
#### source: intro.DP192
$E(Y|X=x)=\sum_y yf(y|x)$ , if X and Y are discrete.
$E(Y|X=x)=\int_{-\infty}^{\infty} yf(y|x)\,\mathrm{d} y$ , if X and Y are continuous.
### Other properties of E(X), Var(X), Cov(X,Y)
1. $E[E(Y|X)]=E(Y)$
2. $Var(Y|X=x)=E\{[Y-E(Y|x)]^2|x\}=E(Y^2|x)-[E(Y|x)]^2$
    - $Var(X)=E[(X-E(X))^2]=E(X^2)-[E(X)]^2$
3. $Cov(X,Y)=E[(X-\mu_x)(Y-\mu_y)]=E(XY)-E(X)E(Y)$
    - Sample version (with $\mu_x,\mu_y$ the sample means): $\sum (x_i-\mu_x)(y_i-\mu_y)=\sum x_iy_i-n\mu_x\mu_y$
:::danger
:face_with_head_bandage: I keep forgetting this one
4. $Var(X+Y)=Var(X)+Var(Y)+2Cov(X,Y)$
:::
:::info
:warning: Under independence, $Cov(X,Y)=0$
:::
5. $Var(X)=E[Var(X|Y)]+Var[E(X|Y)]$
    - Slogan: within-group variation + between-group variation = total variation (analogous to $SST=SSR+SSE$); simulation check below.
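A Monte Carlo check (my own sketch) of the law of total variance; the model $Y\sim Poisson(4)$, $X|Y\sim N(Y,1)$ is a hypothetical choice, under which $E(X|Y)=Y$ and $Var(X|Y)=1$, so the identity predicts $Var(X)=1+Var(Y)=5$:

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.poisson(4, size=1_000_000)
x = rng.normal(loc=y, scale=1.0)

print(x.var())        # ~5.0: total variation
print(1.0 + y.var())  # E[Var(X|Y)] + Var[E(X|Y)] = 1 + 4 = 5.0
```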
### Derivations related to $S^2$
#### Why it follows a chi-square distribution
Useful formula (simulation check below):
$\sum (X_i - \mu)^2 = \sum(X_i - \bar X)^2 + n(\bar X - \mu)^2$
$\frac{\sum (X_i - \mu)^2}{\sigma^2} = \frac {\sum(X_i - \bar X)^2}{\sigma^2} + \frac{n(\bar X - \mu)^2}{\sigma^2}$
Since $\frac {n(\bar X - \mu)^2}{\sigma^2}\sim\chi^2(1)$, we get:
- $\frac {\sum \limits_{i=1}^n( X_i-\mu)^2}{\sigma^2}\sim\chi^2(n)$
- $\frac{\sum \limits_{i=1}^n(X_i-\bar X)^2}{\sigma^2}\sim \chi^2(n-1)$
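A simulation check (my own sketch): $\sum(X_i-\bar X)^2/\sigma^2$ should match the moments of $\chi^2(n-1)$, i.e. mean $n-1$ and variance $2(n-1)$; the parameters $n=10$, $\mu=5$, $\sigma=2$ are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma = 10, 2.0
x = rng.normal(5.0, sigma, size=(200_000, n))
q = ((x - x.mean(axis=1, keepdims=True)) ** 2).sum(axis=1) / sigma ** 2
print(q.mean(), q.var())  # ~9 and ~18, matching chi^2(9)
```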
### Inequalities
#### Markov's inequality
Let $Y$ be a nonnegative random variable. Then:
$P(Y\ge a)\le \frac {E(Y)}{a}$ for any constant $a>0$
:::info
:paperclip: proof:
For $a>0$, let $I=1$ if $Y\ge a$, and $I=0$ otherwise.
Then $aI\le Y$ always holds (check both cases of $I$).
Taking expectations, $aE(I)\le E(Y)$, i.e.
$P(Y\ge a)=E(I)\le \frac{E(Y)}{a}$
:::
#### Chebyshev's inequality
This gets written in many different ways; collecting them here:
###### source: 提綱P154
Let $X$ be any random variable with expected value $\mu$ and variance $σ^2$. Then $\forall k>0$,
$P(|X-\mu|<kσ)\ge 1- \frac {1}{k^2}$ and $P(|X-\mu| \ge kσ)\le \frac {1}{k^2}$

###### source: 陳鄰安講義DP14
$P(|X_n-X|\ge ε)\le \frac {E[(X_n-X)^2]}{ε^2}$ or $P(|X_n - \mu|\ge kσ)\le \frac {1}{k^2}$
##### Two-sided Chebyshev's inequality
###### source: P322
Let $X$ be a random variable with expected value $\mu$ and standard deviation $σ$. Then:
$P(|X-\mu|\ge c) \le \frac {σ^2}{c^2}$ for any constant $c>0$
:::info
:paperclip: apply Markov's inequality to $(X-\mu)^2\ge 0$:
$P[(X-\mu)^2\ge k^2]\le \frac{E[(X-\mu)^2]}{k^2}$
Since $(X-\mu)^2\ge k^2 \iff |X-\mu| \ge k$,
$P(|X-\mu|\ge k) \le \frac{E[(X-\mu)^2]}{k^2}= \frac{\sigma^2}{k^2}$
:::
##### One-sided Chebyshev's inequality
###### source: P322
For any constant $c>0$,
$P(X \ge \mu +c)\le \frac {σ^2}{σ^2+c^2}$ and $P(X \le \mu -c)\le \frac {σ^2}{σ^2+c^2}$
:::info
In plain terms: the probability that a value differs from $E(X)$ by at least c.
:information_source: The one-sided and two-sided versions express the same idea; the two-sided form bounds the deviation on both sides of the mean at once, while the one-sided form bounds the deviation in a single direction (after shifting the mean).
:::
:::info
:paperclip: Let X be a random variable with mean 0 and $\sigma^2< \infty$. For $a>0$ and any $b>0$:
$X\ge a \iff X+b \ge a+b$
Hence,
$P(X \ge a)=P(X+b \ge a+b) \le P[(X +b)^2 \ge (a+b)^2]$
Apply Markov's inequality:
$P(X\ge a)\le \frac{E[(X+b)^2]}{(a+b)^2} = \frac {\sigma^2+b^2}{(a+b)^2}$
Minimizing over $b$ gives $b=\sigma^2/a$, so the bound becomes $\frac{\sigma^2+(\sigma^2/a)^2}{(a+ \sigma^2/a)^2}=\frac{\sigma^2}{\sigma^2+a^2}$
:::
#### Chernoff Bound (basic)
Let $X$ be a random variable whose mgf $M_X(t)$ exists.
$P(X\ge c)\le \min\limits_{t>0}[e^{-ct}M_X(t)]$
#### Unnamed (conditional expectation minimizes mean squared error)
$E[(Y-g(X))^2]\ge E[(Y-E(Y|X))^2]$ for any function $g$
#### Jensen's inequality
If $f(x)$ is a convex function, then
$E[f(X)] \ge f(E[X])$
:::info
:paperclip:
By Taylor expansion (for convex $f$, the tangent line lies below the curve):
$f(x) \ge f(c)+f'(c)(x-c)$, where c is a constant.
Take $c= E(X)$:
$f(x)\ge f[E(X)] + f'[E(X)][x-E(X)]$
Take expectations on both sides:
$E[f(X)] \ge f[E(X)] +0 = f[E(X)]$, $\because f[E(X)]$ is already a number.
:::
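A numeric illustration (my own sketch) of the inequalities above, using $X \sim Exp(1)$, for which $E(X)=Var(X)=1$, $M_X(t)=\frac{1}{1-t}$ for $t<1$, and $P(X\ge c)=e^{-c}$; the cutoff $c=5$ is an arbitrary choice:

```python
import numpy as np

c = 5.0
exact = np.exp(-c)
markov = 1.0 / c                                 # E(X)/c
chebyshev = 1.0 / (c - 1.0) ** 2                 # P(|X-1| >= c-1) <= Var(X)/(c-1)^2
ts = np.linspace(0.01, 0.99, 999)
chernoff = np.min(np.exp(-c * ts) / (1 - ts))    # min over t of e^{-ct} M_X(t)
print(exact, markov, chebyshev, chernoff)        # each bound is >= the exact 0.0067

# Jensen with the convex f(x) = x^2: E[X^2] >= (E[X])^2
x = np.random.default_rng(0).exponential(size=1_000_000)
print((x ** 2).mean(), x.mean() ** 2)            # ~2 vs ~1
```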
# Convergence concepts
:::info
:popcorn: The "having XX guarantees OO" food chain:
$a.s. \to p \to d \to$ bounded in prob.
:ballot_box_with_check: The continuous mapping theorem applies to all three of $a.s.$, conv. in prob., and conv. in dist.:
if $g:\mathbb{R} \to \mathbb{R}$ is a continuous function and $X_n \to X$, then $g(X_n)\to g(X)$ (in the same mode)
:::
## Central limit theorem (CLT)
### source: P184
If $X_1,X_2,\dots$ are *independent random variables*, each having *the same probability distribution function* with expected value $\mu$ and standard deviation $\sigma$, then
$\lim\limits_{n\to \infty}P\left({X_1+\dots+X_n-n\mu \over \sigma \sqrt n}\le x\right)=\Phi(x)$ for all x
Simply put, the CLT says that no matter the distribution, once the sample size is large enough, the standardized sum is approximately normal!
For example:
$X\sim BIN(n,p)$ is the sum of $n$ iid Bernoulli($p$) variables $X_i$, with $E(X)=np$ and $Var(X)=npq$, so by the CLT:
$X=\sum \limits_{i=1}^{n}X_i\,\overset{approx.}{\sim}\,N(np,npq)$
## Continuity correction
Why is it needed? Some distributions are discrete while the normal is continuous, so the approximation has an error at the integer boundaries; extending each end by 0.5 compensates.
For example:
$X\sim Bin(1000, 0.3)$: find $P(47\le X\le 52)\xrightarrow{\text{continuity correction}}P(46.5<X<52.5)$
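A quick check (my own sketch) that the correction improves the normal approximation; the hypothetical parameters $Bin(100, 0.5)$ are used instead of the example's numbers so the interval carries non-negligible probability:

```python
from scipy.stats import binom, norm

n, p = 100, 0.5
mu, sd = n * p, (n * p * (1 - p)) ** 0.5
lo, hi = 47, 52                                            # P(47 <= X <= 52)
exact = binom.cdf(hi, n, p) - binom.cdf(lo - 1, n, p)
plain = norm.cdf(hi, mu, sd) - norm.cdf(lo, mu, sd)
corrected = norm.cdf(hi + 0.5, mu, sd) - norm.cdf(lo - 0.5, mu, sd)
print(exact, plain, corrected)   # the corrected value is the closer of the two
```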
## Almost sure convergence (the convergence mode of the strong law of large numbers)
### source: stat.infer DP260
A sequence of random variables, $X_1, X_2, \dots$, converges almost surely to a random variable $X$ if, for every $ε>0$,
$P(\lim\limits_{n\to \infty} |X_n-X| < ε)=1$, i.e. $P(\{w:\lim\limits_{n\to \infty} X_n(w)=X(w)\})=1$
which means that <u>the functions $X_n(s)$ converge to $X(s)$ for all $s \in S$ (the sample space), except possibly on a set of probability 0; then $X_n$ converges to $X$</u>.
:::info
:information_source: X is a function from the sample space to the real line, $S\to\mathbb{R}$.
:information_desk_person: The strong law of large numbers implies:
1. Convergence must happen inside the space that S lives in.
2. $P([R_1,R_2] \in S)=1$, where $[R_1,R_2]$ is a range assumed to sit in the real line.
:point_up: Differences from convergence in probability:
- The judgment works through the whole sample-path sequence $X_n(w)$.
- The almost-sure condition is stricter.
:::
## Convergence in Probability
### source: stat.infer DP258
>In plain terms: the probability that $X_n$ deviates from $X$ by more than the tolerance goes to 0 as the sample size grows.
A sequence of random variables, $X_1, X_2, \dots$, *converges in probability* to a random variable $X$ if, for every $ε>0$,
$\lim\limits_{n\to \infty} P(|X_n-X| \ge ε)=0$ or, equivalently, $\lim\limits_{n\to \infty} P(|X_n-X| < ε)=1$
:::info
:information_desk_person: In other words, it suffices that
$\lim \limits _{n \to \infty}E(X_n)=\theta$ and $\lim \limits _{n \to \infty}Var(X_n)=0 \implies X_n \xrightarrow{P} \theta$
because convergence in probability can be established by checking whether the variability dies out (via Chebyshev's inequality).
:::
### Extra: facts that hold under either $a.s.$ or $p$ convergence
If $X_n \to X,\ Y_n \to Y,\ Z_n \to a,\ H_n \to b$, where a and b are constants, then:
1. $X_n+Y_n \to X+Y$
2. $X_nY_n \to XY$
3. $\frac{X_n}{Y_n} \to \frac{X}{Y}$ (provided $P(Y=0)=0$)
4. $aX_n + bY_n \to aX +bY$
## Weak law of large numbers (WLLN)
> In plain terms: given iid samples with finite $\mu$ and $\sigma^2$, $\bar X_n$ converges in probability to $\mu$.
### source: stat.infer DP258
Let $X_1, X_2, \dots$ be iid random variables with $EX_i=\mu$ and $VarX_i=σ^2<\infty$. Define $\bar X_n=\frac{1}{n} \sum\limits_{i = 1}^n{X_i}$. Then, for every $ε>0$,
$\lim\limits_{n\to \infty} P(|\bar X_n-\mu| < ε)=1$
which means that $\bar X_n$ converges in probability to $\mu$.
- Theorem: If $X_n$ converges in probability to a random variable $X$ and $h$ is a *continuous function*, then $h(X_n)$ also converges in probability to $h(X)$.
:::info
:information_desk_person: In other words:
as long as the sample is iid with finite $E(X)$ and $Var(X)$ and the estimator is the sample mean, we can say $\bar X_n \xrightarrow{P} \mu$, where $\mu$ is the population mean.
Notice this is much easier to check than the earlier criterion.
:::
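The WLLN in action (my own sketch): running means of iid $Exp(1)$ draws concentrate around $\mu=1$ as n grows; the tolerance $\varepsilon=0.05$ and the sample sizes are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(3)
eps = 0.05
for n in (10, 100, 1_000, 10_000):
    xbar = rng.exponential(size=(2_000, n)).mean(axis=1)
    print(n, np.mean(np.abs(xbar - 1.0) >= eps))  # P(|Xbar - mu| >= eps) -> 0
```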
## Convergence in Distribution
### source: stat.infer DP261
A sequence of random variables, $X_1, X_2, \dots$, converges in distribution to a random variable $X$ if
$\lim\limits_{n\to \infty}F_{X_n}(x)=F_{X}(x)$, at all points $x$ where $F_X(x)$ is continuous.
- If the sequence of random variables $X_1, X_2, \dots$ **converges in probability**, it **must** also **converge in distribution** (if $X_n \xrightarrow{P} X$, then necessarily $X_n \xrightarrow{D} X$); the converse does not hold in general, **but if $X_n \xrightarrow{D} c$ for a constant $c$, then $X_n \xrightarrow{P} c$.**
- Slutsky's Theorem (DP265): if $X_n \xrightarrow{D} X$ and $Y_n \xrightarrow{P} a$, where $a$ is a constant, then:
    1. $Y_nX_n \xrightarrow{D} aX$.
    2. $Y_n+X_n \xrightarrow{D} a+X$.
:::info
Application: finding limiting distributions
One approach:
1. From the information given in the problem, find $F_{X_n}(x)$
2. Compute $\lim\limits_{n\to \infty}F_{X_n}(x)$
3. Check whether $\lim\limits_{n\to \infty}F_{X_n}(x)=F_{X}(x)$ holds for some cdf $F_X$
4. If it does, the result is the limiting distribution.
:::
## Bounded in Probability
I don't really get this one :( For the record, Hogg's definition (my paraphrase): $\{X_n\}$ is bounded in probability if for every $ε>0$ there exist a constant $B_ε$ and an integer $N_ε$ such that $P(|X_n|\le B_ε)\ge 1-ε$ for all $n\ge N_ε$.
### Related theorems
1. If $X_n \xrightarrow{D} X$, then $\{X_n\}$ is bounded in probability, where $\{X_n\}$ is a sequence. The converse does not necessarily hold.
2. If $\{X_n\}$ is bounded in probability and $Y_n \xrightarrow{P} 0$, then $X_nY_n \xrightarrow{P} 0$
### Δ-method (delta method)
Following Hogg's notation.
Prerequisite: $g(x)$ is differentiable.
$g(y)=g(x)+g'(x)(y-x)+o(|y-x|)$, where the little-o comes in two flavors, stochastic and deterministic:
1. Stochastic version ($o_p$): $Y_n=o_p(X_n)$ iff $\frac{Y_n}{X_n} \xrightarrow{P} 0$, as $n\to \infty$.
2. Deterministic version (ordinary little o): $a=o(b)$ iff $\frac{a}{b} \to 0$, as $b\to 0$.
- A related theorem for $o_p$:
    - If $\{Y_n\}$ is bounded in probability and $X_n=o_p(Y_n)$, then $X_n \xrightarrow{P} 0$, as $n\to \infty$.
##### Delta method theorems
If $g'(\theta)\neq 0$:
if $\sqrt n (X_n-\theta)\xrightarrow{d} N(0,\sigma^2)$, then:
$\sqrt n (g(X_n)-g(\theta))\xrightarrow{d} N(0,\sigma^2[g'(\theta)]^2)$
If $g'(\theta)= 0$ but $g''(\theta)\neq 0$:
$n (g(X_n)-g(\theta))\xrightarrow{d} \sigma^2\frac{g''(\theta)}{2}\chi^2_1$
Both follow from a Taylor expansion around $\theta$ plus the $o_p$ convergence facts, as illustrated below.
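A Monte Carlo illustration (my own sketch) of the first-order delta method: with $X_i \sim Exp(1)$, $\sqrt n(\bar X_n - 1)\xrightarrow{d} N(0,1)$; taking the hypothetical $g(x)=x^2$ with $g'(1)=2$, $\sqrt n(g(\bar X_n)-g(1))$ should be approximately $N(0,4)$:

```python
import numpy as np

rng = np.random.default_rng(4)
n, reps = 500, 40_000
xbar = rng.exponential(size=(reps, n)).mean(axis=1)
z = np.sqrt(n) * (xbar ** 2 - 1.0)
print(z.mean(), z.var())  # ~0 and ~4 = sigma^2 * g'(theta)^2
```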
## Stronger form of the CLT
### source: stat.infer DP264
Let $X_1, X_2,\dots$ be a sequence of iid random variables with $EX_i=\mu$ and $0<Var X_i= σ^2<\infty$. Define $\bar X_n= \frac{1}{n} \sum\limits_{i = 1}^n{X_i}$. Let $G_n(x)$ denote the cdf of $\sqrt n(\bar X_n - \mu)/σ$. Then, for any $x, -\infty < x < \infty$,
$\lim\limits_{n\to \infty} G_n(x)=\int _{-\infty}^x \frac {1}{\sqrt {2\pi}} e^{-y^2/2}dy$
which means that $\sqrt n(\bar X_n - \mu)/σ$ has a limiting standard normal distribution.
- Stronger form of the CLT v.s. WLLN

| Comparison | Stronger form of the CLT | WLLN |
| -------- |:----------------------------------------:| ----------------------------------------- |
| General | the cdf of the standardized $\bar X_n$ converges to that of $N(0,1)$ | $\bar X_n$ converges in probability to $\mu$ |
| Condition on E(X) | $EX_i =\mu$ | (same) |
| Condition on Var | $0<Var X_i= σ^2<\infty$ | $VarX_i=σ^2<\infty$ |
| Conclusion | $\sqrt n(\bar X_n - \mu)/σ \sim N(0,1)$ asymptotically | $\bar X_n \xrightarrow{p} \mu$ |
### Multinomial joint distribution
$f(x_1,\dots,x_n)=\frac {m!}{x_1!\cdots x_n!}p_1^{x_1}\cdots p_n^{x_n}=m!\prod^n_{i=1}\frac {p_i^{x_i}}{x_i!}$
### Multinomial Theorem
> Used in both MLE and MLR problems
$(p_1+\dots+p_n)^m=\sum\limits_{x \in A}\frac {m!}{x_1!\cdots x_n!}p_1^{x_1}\cdots p_n^{x_n}$
#### EXAMPLE
Roll a die 10 times, where face $i$ has probability $\frac{i}{21}$. The probability of getting three 3s, five 4s, and two 6s plugs into the formula with (checked numerically below):
$m=10,\ p_3=\frac {3}{21},\ x_3=3;\ p_4=\frac{4}{21},\ x_4=5;\ p_6=\frac {6}{21},\ x_6=2$
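Checking the worked example against scipy (my own sketch); faces 1–6 carry $p_i=i/21$, with zero counts on the unobserved faces:

```python
from math import factorial, prod
from scipy.stats import multinomial

p = [i / 21 for i in range(1, 7)]
counts = [0, 0, 3, 5, 0, 2]        # three 3s, five 4s, two 6s
by_formula = (factorial(10) / prod(factorial(c) for c in counts)
              * prod(pi ** c for pi, c in zip(p, counts)))
print(by_formula, multinomial.pmf(counts, n=10, p=p))  # identical values
```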
### Extra: choosing the domain for joint distributions
(Noting this because I pick the wrong region every time.)

| Target | Domain choice | Example ($0<x<y<1$) |
| -------- | -------- | -------- |
| $E(X)$, $E(Y)$ | use the constant limits | $E(X)=\int_0^1 xf_X(x)dx$ |
| $E(X\mid Y)$, $E(Y\mid X)$ | use the limits involving x, y | $E(X\mid Y)=\int_0^y xf(x\mid y)dx$ |
### Extra: finding the domain of a joint distribution (under a change of variables)
**EX.** Originally $0<x<y<1$, with $u=\frac{x+y}{2}, v=\frac{x}{2}$
#### Steps
1. Split the region into inequalities centered on x and y; ideally one set that stays entangled after the transformation and one set that becomes free-standing:
$x<y<1,\ 0<x<1$
2. Substitute the new variables
$x=2v,\ y=2(u-v)$, which gives:
$2v<2(u-v)<1,\ 0<2v<1$
3. Simplify (Monte Carlo check below)
$2v<u<v+\frac{1}{2},\ 0<v<\frac{1}{2}$
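A Monte Carlo confirmation of the transformed region (my own sketch): sample $(x,y)$ uniformly from $0<x<y<1$, map to $u=\frac{x+y}{2}$, $v=\frac{x}{2}$, and verify every point lands inside the derived bounds:

```python
import numpy as np

rng = np.random.default_rng(5)
pts = rng.uniform(size=(1_000_000, 2))
x, y = pts.min(axis=1), pts.max(axis=1)      # enforce x < y
u, v = (x + y) / 2, x / 2
assert np.all((2 * v < u) & (u < v + 0.5) & (0 < v) & (v < 0.5))
```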
# Finding limiting distributions
## 1. By definition
$\lim\limits_{n\to \infty}F_{X_n}(x)$
:::info
:warning: Note that the cdf is originally written piecewise (at least 2 cases), and after taking the limit there may still be 2 or more cases; use the domain of x to decide which piece of the cdf applies.
:::
Possible outcomes (worked example below):
1. A constant $\to$ degenerate at that constant.
2. The $F(x)$ of some distribution.
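A standard worked example (my own, not from the source): let $X_i \overset{iid}{\sim} U(0,1)$ and $X_{(n)}=\max(X_1,\dots,X_n)$, so $F_{X_{(n)}}(x)=x^n$ for $0<x<1$. Then
$\lim\limits_{n\to \infty}F_{X_{(n)}}(x)=\begin{cases}0, & x<1\\ 1, & x\ge 1\end{cases}$
which is outcome 1: $X_{(n)}$ degenerates at the constant 1. (Rescaling as $n(1-X_{(n)})$ instead gives $P(n(1-X_{(n)})\le t)=1-(1-\frac{t}{n})^n\to 1-e^{-t}$, a non-degenerate $Exp(1)$ limit, i.e. outcome 2.)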
## 2. By mgf
> 1. Works when the mgf $M_X(t)$ of the original distribution is known, along with its form after the transformation.
> 2. Deriving the CLT this way uses a Taylor expansion, keeping terms up to about third order to approximate the normal mgf.
$\lim\limits_{n\to \infty}M_{X_n}(t)=M(t)$
Possible outcomes (worked example below):
1. A constant $\to$ degenerate at that constant.
2. The $M_X(t)$ of some other distribution.
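A standard worked example (my own, not from the source), tying in the $(1-\frac{\lambda}{n})^n\approx e^{-\lambda}$ fact from the calculus section: let $X_n\sim Bin(n,p_n)$ with $p_n=\frac{\lambda}{n}$. Then
$M_{X_n}(t)=(1-p_n+p_ne^t)^n=\left(1+\frac{\lambda(e^t-1)}{n}\right)^n\xrightarrow{n\to\infty}e^{\lambda(e^t-1)}$
which is the mgf of $Poisson(\lambda)$, so $X_n\xrightarrow{D}Poisson(\lambda)$ (outcome 2).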
# Point estimation
## Properties a good point estimator should have
##### Estimators are judged by the statistic's long-run behavior: we want it to fluctuate around the true value (unbiasedness) and to approach the true value under repeated sampling (consistency).
1. Unbiased: $E_\theta(\hat \theta(X_1,\dots,X_n))=\theta$
:::info
:information_source: If the problem hands you an estimator:
1. Eyeball the distribution and get $E(\hat\theta)$ (e.g. from the method-of-moments formulas); solve for c in
$E(\hat\theta)\times c = \theta$; then $c\,\hat\theta$ is the desired unbiased estimator.
2. Or plug the given estimator into $E(\hat\theta)=\theta$ and check.
:::
- Unbiasedness splits into exact and asymptotic (the weaker notion); asymptotically unbiased means:
$\lim\limits_{n\to \infty}E(\hat \theta - \theta)=0$
2. Consistent: $\hat \theta \xrightarrow{P} \theta$
    - Use the two-sided Chebyshev inequality to show it.
3. Efficiency: when there are several estimators, we want the most stable one (smallest variance).
## The best point estimator: UMVUE
### **UMVUE = uniformly minimum variance unbiased estimator**
##### Meaning: among all unbiased estimators (UE), find the one with the smallest variance; <u>unbiasedness must be established first</u>.
$Var_\theta\, \hat \theta^* \le Var_\theta\, \hat \theta$ for every unbiased $\hat\theta$
:::info
:information_desk_person: The $\hat\theta^*$ in question must satisfy:
complete + sufficient + based on T (a minimal sufficient statistic)
:::
### How to find it: CRLB
##### Def: under certain conditions, we can write down a lower bound on the variance of any unbiased estimator!
$Var_θ\, \hat \theta \ge \frac{[τ^{'}(θ)]^2}{nE[\frac{\partial}{\partial θ}\ln f(x;θ)]^2}$; the right-hand side is the CRLB
- $τ^{'}(θ)$: the first derivative of the target of unbiased estimation (not limited to plain E(X); sometimes the problem morphs into estimating, say, $E(\frac{1}{\theta})$)
    - If the target is $\theta$ itself, $τ^{'}(θ)=1$
- $E[\frac {\partial}{\partial θ} \ln f(x;θ)]^2=-E[\frac{\partial ^{2}}{\partial θ^2}\ln f(x;θ)]$. This is the Fisher information, written $I(\theta)$; <u>larger values mean higher attainable precision for $\hat \theta$.</u>
- Comes from the Cauchy–Schwarz inequality $[Cov(T,U)]^2\le Var(T)Var(U)$, where $U=\frac{\partial} {\partial θ}\ln f(x;θ)$; see intro DP317, stat.infer DP361.
#### Restrictions: regularity conditions
1. The parameter space $Θ$ is an open interval, e.g. $(a,\infty),\ (-\infty, b),\ (a,b)$, where a and b do not depend on θ.
2. The support $\{x:f(x;\theta)>0\}$ is independent of θ.
3. $\int \frac{\partial} {\partial θ}f(x;θ)\,dx =\frac{\partial} {\partial θ} \int f(x;θ)\,dx = 0$ (i.e. $E(U)=0$).
4. If $T = t(x_1,\dots,x_n)$ is an unbiased estimator, then
$\int t \frac{\partial} {\partial θ}f(x;θ)dx = \frac{\partial} {\partial θ} \int tf(x;θ)dx$
Conditions 3 and 4 show up in the CRLB derivation: they guarantee $E(U)=0$, so that $Cov(T, U) = E(TU)$ and $Var(U)= E(U^2)$.
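A numeric CRLB check (my own standard example, not from the source): for $X_i \overset{iid}{\sim} Poisson(\theta)$, the Fisher information is $I(\theta)=\frac{1}{\theta}$, so with $\tau(\theta)=\theta$ the CRLB is $\frac{\theta}{n}$, and $Var(\bar X)=\frac{\theta}{n}$ attains it exactly:

```python
import numpy as np

rng = np.random.default_rng(6)
theta, n = 3.0, 50
xbar = rng.poisson(theta, size=(200_000, n)).mean(axis=1)
print(xbar.var(), theta / n)  # both ~0.06: the CRLB is attained
```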
### CRLB with multiple parameters (Hogg 6.4)
Example: $X\sim N(\mu,\sigma^2)$ with both $\mu$ and $\sigma^2$ unknown.
Here $I(\theta)$ is an information matrix; continuing the example:
$I(\mu, \sigma)=\begin{bmatrix}
\frac{1}{\sigma^2} & 0 \\0 & \frac{2}{\sigma^2}
\end{bmatrix}$
How do we compute it? Via second derivatives (the Hessian of the log-density):
$I(\mu, \sigma)=-E\begin{bmatrix}
\frac{\partial^2 \ln f(x;\mu,\sigma)}{\partial\mu^2} & \frac{\partial^2 \ln f(x;\mu,\sigma)}{\partial\mu\partial\sigma} \\\frac{\partial^2 \ln f(x;\mu,\sigma)}{\partial\sigma\partial\mu} & \frac{\partial^2 \ln f(x;\mu,\sigma)}{\partial\sigma^2}
\end{bmatrix}$
And the MLE is asymptotically:
$\sqrt n (\hat \theta_n-\theta)\xrightarrow{D}N_p(0,I^{-1}(\theta))$
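A Monte Carlo check (my own sketch) of the $N(\mu,\sigma)$ information matrix via the score vector: the covariance matrix of $\left(\frac{\partial \ln f}{\partial\mu}, \frac{\partial \ln f}{\partial\sigma}\right)$ should reproduce $diag(\frac{1}{\sigma^2}, \frac{2}{\sigma^2})$; the parameters $\mu=1$, $\sigma=2$ are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(7)
mu, sigma = 1.0, 2.0
x = rng.normal(mu, sigma, size=2_000_000)
score_mu = (x - mu) / sigma ** 2                       # d ln f / d mu
score_sigma = ((x - mu) ** 2 - sigma ** 2) / sigma ** 3  # d ln f / d sigma
print(np.cov(score_mu, score_sigma))  # ~[[0.25, 0], [0, 0.5]] = [[1/s^2, 0], [0, 2/s^2]]
```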
### Efficiency
- If a CRLB is available, any estimator with variance above the CRLB is definitely not efficient.
- Conversely, if a variance below the CRLB shows up, the CRLB regularity conditions must be violated.
##### ARE (asymptotic relative efficiency)
Used when no CRLB is available; compares which of two variances is larger.
$e(\hat \theta_{1n}, \hat \theta_{2n})=\frac {\sigma^2_{\hat \theta_{2n}}}{\sigma^2_{\hat \theta_{1n}}}$
- if $e(\hat \theta_{1n}, \hat \theta_{2n})>1$ → the numerator (variance of $\hat\theta_{2n}$) is larger, so take $\hat \theta_{1n}$
- if $e(\hat \theta_{1n}, \hat \theta_{2n})<1$ → the denominator (variance of $\hat\theta_{1n}$) is larger, so take $\hat \theta_{2n}$
### Another way: the Lehmann–Scheffé method to find the MVUE
:::warning
Note! The following finds the MVUE, not the UMVUE.
:::
#### Steps
1. Set up the likelihood ratio
$\frac {L[(x_1,\dots,x_n)|\theta]}{L[(y_1,\dots,y_n)|\theta]}$
2. Find $g$ such that the ratio is free of $\theta$ exactly when $g(x_1,x_2,\dots,x_n)=g(y_1,y_2,\dots,y_n)$
3. $g(y_1,y_2,\dots,y_n)$ must be the minimal such function; if it is not, pick a smaller (coarser) one.
4. $g(y_1,y_2,\dots,y_n)$ is the MVUE (worked example below)
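A standard worked example of the likelihood-ratio step (my own, not from the source): for $X_i \overset{iid}{\sim} Poisson(\theta)$,
$\frac {L[(x_1,\dots,x_n)|\theta]}{L[(y_1,\dots,y_n)|\theta]}=\frac{e^{-n\theta}\,\theta^{\sum x_i}/\prod x_i!}{e^{-n\theta}\,\theta^{\sum y_i}/\prod y_i!}=\theta^{\sum x_i-\sum y_i}\cdot\frac{\prod y_i!}{\prod x_i!}$
which is free of $\theta$ exactly when $\sum x_i=\sum y_i$, so the steps above produce $g(x_1,\dots,x_n)=\sum x_i$.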
## Properties a good statistic should have
> - If the CRLB route is unavailable, start the search here.
> - Applies to all statistics.
Suppose the statistic is $T(X)$ for $θ$. A good $T(X)$ should be:
1. **Sufficient**: <u>contains all the information about $θ$</u>; that is, if x and y are two sample points s.t. $T(x)=T(y)$, then the inference about $θ$ is the same whether x or y was observed. Information about $θ$ reaches us only through $T(X)$, and the conditional distribution of the sample given $T(X)$ does not depend on $θ$.
- How to find one?
    - Based on the definition via conditional probability: $\frac {p(x|θ)}{q(T(x)|θ)}$ must be free of $θ$
:::warning
:warning: Note the denominator is the probability distribution of $T(X)$.
:::
    - Factorization Theorem: $f(x|θ)=g[T(x)|θ]h(x)$ (slogan: the part carrying θ, times a function of x only)
    - Exponential family
- Because sufficient statistics are so common to find, the coarsest one is called the *minimal sufficient statistic*, denoted $T^*(X)$.
    - Check it via: $\frac {f(x|\theta)}{f(y|\theta)}$ is free of $\theta$ iff $T(x)=T(y)$ (from stat.infer DP307).
> If no convenient sufficient statistic can be found, falling back on the order statistics always works! (Proof: https://stats.stackexchange.com/questions/144646/proof-that-n-order-statistics-are-sufficient-for-a-sample-of-size-n)
2. **Ancillary**: <u>the distribution of $T(X)$ does not depend on $θ$</u>
    - Distributions with location invariance or scale invariance may yield ancillary statistics, because the unknown parameter can cancel during the variable transformation.
    - ex. $\frac {cx_1}{cx_2}=\frac {x_1}{x_2}\to$ scale invariance
:::info
Counterexample: the distribution of $T(x)$ involves $\theta$ and $\theta$ is not a fixed value.
:::
3. **Complete**: if $T(X)=t(X_1,\dots,X_n)$ and, for any function $h(T)$, $E_θ(h(T))=0\ \forall\theta\in Θ$ implies $P_\theta(h(T)=0)=1$ for all $\theta \in Θ$
    - The hardest property to prove, <u>**but once established (together with sufficiency), independence from every ancillary statistic follows via Basu.**</u>
:::warning
:warning: Completeness can still come out differently for different $T(x)$, because the distribution of the statistic being tested need not be the same. See stat.infer exercise 6.21.
:::
:::info
:information_source: With sufficiency + completeness, and once it is confirmed to be the minimal statistic (the minimal function), Basu's theorem directly shows $T(X)$ is independent of every ancillary statistic.
:::
## Basu's Theorem
### source: stat.infer DP313
If $T(X)$ is a complete and minimal sufficient statistic, then $T(X)$ is independent of every ancillary statistic.
## Asymptotic Properties of MLEs
### source: [here](https://www.probabilitycourse.com/chapter8/8_2_4_asymptotic_probs_of_MLE.php#:~:text=8.2.4%20Asymptotic%20Properties%20of%20MLEs%201%20%CE%98%20%5E,random%20variable.%20More%20precisely%2C%20the%20random%20variable%20)
1. $\hat Θ_{MLE}$ is asymptotically unbiased
2. $\hat Θ_{MLE}$ is consistent
3. $\sqrt{nI(\theta)}\,(\hat Θ_{MLE}-\theta)$ converges in distribution to $N(0,1)$, as $n \to \infty$.
### Wald test
- An approximate test based on the MLE
- Comes from $\sqrt n\,(\hat \theta_{MLE} - \theta) \overset{approx.}{\sim} N(0, I(\theta)^{-1})$
Test statistic: $W=\frac{\sqrt n (\hat \theta -\theta_0)}{\sqrt {Var(\hat\theta_{MLE})}}$, where $\theta_0$ is the value of $\theta$ under $H_0$ and $Var(\hat\theta_{MLE})$ is the (estimated) asymptotic variance
**Rejection rule**
$W>N(0,1)_{1-\frac{\alpha}{2}}$
or
$W< N(0,1)_{\frac {\alpha}{2}}$
$\to$ **Reject $H_0$**
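A hedged worked sketch (my own, with hypothetical numbers): the Wald test for a Bernoulli proportion, $H_0: p=0.5$, using the MLE $\hat p=\bar x$ and its estimated asymptotic variance $\hat p(1-\hat p)$:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(8)
n, p0 = 400, 0.5
x = rng.binomial(1, 0.56, size=n)            # true p = 0.56 (hypothetical)
p_hat = x.mean()
w = np.sqrt(n) * (p_hat - p0) / np.sqrt(p_hat * (1 - p_hat))
alpha = 0.05
reject = abs(w) > norm.ppf(1 - alpha / 2)    # the two-sided rule above
print(p_hat, w, reject)
```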
## Computing $\alpha$, $\beta$, and power in hypothesis testing
$\alpha = P(\text{type I error}) = P(\text{reject } H_0 \mid H_0 \text{ is true})$
$\beta = P(\text{type II error}) = P(\text{fail to reject } H_0 \mid H_1 \text{ is true})$
Omitted (planned for [a hypothesis-testing intro for non-stats readers](https://hackmd.io/ZB2wHE9FQJuBQKPvFlxuNQ))
### More on the power function
#### source: stat.infer DP411
For $0\le \alpha \le 1$, a test with power function $\beta(\theta)$ is a *size* $\alpha$ test if $\sup\limits_{θ \in Θ_0}\beta(\theta)=\alpha$.
For $0\le \alpha \le 1$, a test with power function $\beta(\theta)$ is a *level* $\alpha$ test if $\sup\limits_{θ \in Θ_0}\beta(\theta)\le \alpha$.
### Monotone likelihood ratio (MLR)
Intuitively, it just means the ratio is an increasing/decreasing function; formally:
for $\theta_2>\theta_1$, $\frac{g(x|\theta_2)}{g(x|\theta_1)}$ is a monotone function of x.
:::warning
:warning: As the definition shows, you must <u>divide</u> the two functions to confirm MLR.
:::
Once MLR is confirmed, the following theorem establishes a UMP test:
### Karlin–Rubin
#### stat.infer DP417
Consider testing $H_0:\theta\le\theta_0$ v.s. $H_1:\theta>\theta_0$. Suppose that T is a sufficient statistic for $\theta$ and the family of pdfs or pmfs of T has an MLR. Then for any $t_0$, the test that rejects $H_0$ iff $T>t_0$ is a UMP level $\alpha$ test, where $\alpha=P_{\theta_0}(T>t_0)$
## MSE
Definition: $MSE(\hat \theta)=E_{\theta}[(\hat \theta-\theta)^2]$, i.e. the expected squared difference between the estimator and the true value.
For messier computations, use: $MSE(T(X))=Var(T(X))+[Bias(T(X))]^2$
where $Bias(T(X))=E_{\theta}[T(X)]-\theta$
### Least squares computation
1. Let $Q(\theta)=\sum \limits_{i=1}^n(X_i-\theta)^2$
2. Minimize: $\min Q(\theta)\to\frac{d}{d\theta}Q(\theta)=0$
3. Check that $\frac{d^2}{d\theta^2}Q(\theta)>0$, which means a minimum is attained. (Quick check below.)
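A tiny numeric check (my own sketch) that minimizing $Q(\theta)=\sum(X_i-\theta)^2$ gives $\hat\theta=\bar x$, since $\frac{d}{d\theta}Q(\theta)=-2\sum(X_i-\theta)=0$ solves to the sample mean; the data values are arbitrary:

```python
import numpy as np

x = np.array([2.0, 4.0, 9.0])
grid = np.linspace(0, 10, 100_001)
q = ((x[:, None] - grid[None, :]) ** 2).sum(axis=0)  # Q(theta) on a grid
print(grid[q.argmin()], x.mean())                    # both 5.0
```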
## Interval estimation
### Computing upper/lower bounds for continuous distributions (valid only for even-function, i.e. symmetric, distributions)
$\int_{-\infty}^t f_T(u|\theta_U(t))\,du=\frac{\alpha}{2}$, $\int_{t}^{\infty} f_T(u|\theta_L(t))\,du=\frac{\alpha}{2}$
Simply put: replace the parameter in the original density by $\theta_U(t)$/$\theta_L(t)$ (symbols for the upper/lower bounds), replace the x part by the statistic t, and then solve for $\theta_U(t)$/$\theta_L(t)$.
### Cochran's theorem
> Besides multiple regression, it also comes up in LRTs & building interval estimates
#### source: web
Let $X\sim N_n(\mu, I_n)$ and let $Q_i=X^TA_iX$ satisfy $Q_1+Q_2+\dots+Q_k=X^TX$, where each $A_i\in \mathbb{R}^{n \times n}$ is symmetric with $rank(A_i)=r_i$. Then the following are equivalent:
1. $Q_i\sim\chi^2(r_i, \lambda _i), \forall 1 \le i \le k$
2. $r_1+r_2+\dots+r_k=n$
3. $Q_i ╨ Q_j$, $i \neq j$
4. $\forall i,\ Q_i\sim\chi^2(r_i)$