---
# System prepended metadata

title: Probability & Mathematical Statistics Notes (Topics I Am Shaky On)
tags: [統研所考試, 機率數統]
---

# Probability & Mathematical Statistics Notes (Topics I Am Shaky On)
###### tags: `統研所考試`、`機率數統`

---
## Basic probability
### Permutations
>[!NOTE]
>Pick $m$ of $n$ distinct items and arrange them in order:
>$P^n_m=\frac{n!}{(n-m)!}$

### Distinct balls into distinct boxes
When $m$ distinct balls are placed into $n$ distinct empty boxes, the number of arrangements with no empty box ($D_n$) is:
$D_n=n^m-C^n_1(n-1)^m+C^n_2(n-2)^m-\dots+(-1)^{n-1}C^n_{n-1}\times 1^m$

### Rearranging distinct objects (derangements)
$n$ distinct objects stand in a row and are rearranged. The number of arrangements in which no object stays in its original position ($D_n$) is:
$D_n=n!\left[1-\frac{1}{1!}+\frac{1}{2!}-\frac{1}{3!}+\dots+(-1)^n\frac{1}{n!}\right]$

## Calculus used
### $\left(1-\frac{\lambda}{n}\right)^n \approx e^{-\lambda}$ for large $n$
### Taylor expansion
#### source: 662
The nth Taylor polynomial at $x=a$ of $f(x)$ is:
$p_n(x)=f(a)+\frac{f'(a)}{1!}(x-a)+\frac{f''(a)}{2!}(x-a)^2+\dots+\frac{f^{(n)}(a)}{n!}(x-a)^n$
- If $a=0$ it is called the Maclaurin polynomial; applied to $e^x$ it gives the series definition of the exponential function.
    - $e^\lambda = \sum\limits_{x=0}^\infty \frac{\lambda^x}{x!}$

## Functions
### Gamma function
- Def: $Γ(α)=\int_0^\infty x^{α-1}e^{-x}dx$
- Properties:
    - $Γ(n)=(n-1)!$, if $n$ is a positive integer.
    - $Γ(\frac{1}{2})=\sqrt{\pi}$
    - $Γ(α)=(α-1)Γ(α-1)$, or $Γ(α+1)=αΓ(α)$
    - $Γ(α+n)=Γ(α)\,α(α+1)(α+2)\cdots(α+n-1)$

### Beta function
- Def: $B(a,b)=\int_0^1 x^{a-1}(1-x)^{b-1}dx$
- Properties:
    - $B(a,b)= \frac{Γ(a)Γ(b)}{Γ(a+b)}$
    - $B(a,b)=B(b,a)$

### Computing conditional expectations
#### source: intro.DP192
$E(Y|X=x)=\sum_y yf(y|x)$, if X and Y are discrete.
$E(Y|X=x)=\int_{-\infty}^{\infty} yf(y|x)\,\mathrm{d}y$, if X and Y are continuous.

### Other properties of E(X), Var(X), Cov(X, Y)
1. $E[E(Y|X)]=E(Y)$
2. $Var(Y|X)=E\{[Y-E(Y|x)]^2|x\}=E(Y^2|x)-[E(Y|x)]^2$
    - $Var(X)=E[(X-E(X))^2]=E(X^2)-[E(X)]^2$
3. $Cov(X,Y)=E[(X-\mu_x)(Y-\mu_y)]=E(XY)-E(X)E(Y)$
    - For $n$ equally weighted data points, with $\mu_x$ and $\mu_y$ the means of those points: $Cov(X,Y)=\frac{1}{n}\sum (x_i-\mu_x)(y_i-\mu_y)=\frac{1}{n}\sum x_iy_i-\mu_x\mu_y$

:::danger
:face_with_head_bandage: Often forgotten
5. $Var(X+Y)=Var(X)+Var(Y)+2Cov(X,Y)$
:::
:::info
:warning: If $X$ and $Y$ are independent, $Cov(X,Y)=0$
:::
6.
$Var(X)=E[Var(X|Y)]+Var[E(X|Y)]$
    - Slogan: within-group variation + between-group variation = total variation (analogous to $SST=SSR+SSE$)

### Derivations related to $S^2$
#### Why it follows a chi-square distribution
Useful formula (for $X_1,\dots,X_n$ iid $N(\mu,\sigma^2)$):
$\sum (X_i - \mu)^2 = \sum(X_i - \bar X)^2 + n(\bar X - \mu)^2$
$\frac{\sum (X_i - \mu)^2}{\sigma^2} = \frac{\sum(X_i - \bar X)^2}{\sigma^2} + \frac{n(\bar X - \mu)^2}{\sigma^2}$
Also $\frac{n(\bar X - \mu)^2}{\sigma^2}\sim\chi^2(1)$, and it is independent of $\sum(X_i-\bar X)^2$, so:
- $\frac{\sum\limits_{i=1}^n (X_i-\mu)^2}{\sigma^2}\sim\chi^2(n)$
- $\frac{\sum\limits_{i=1}^n (X_i-\bar X)^2}{\sigma^2}\sim\chi^2(n-1)$

### Inequalities
#### Markov's inequality
Let $Y$ be a nonnegative random variable, then:
$P(Y\ge a)\le \frac{E(Y)}{a}$ for any constant $a>0$
:::info
:paperclip: proof:
for $a>0$, let $I=1$ if $Y\ge a$ and $I=0$ otherwise.
Since $Y\ge 0$, $I\le \frac{Y}{a}$.
Taking expectations, $P(Y\ge a)=E(I)\le \frac{E(Y)}{a}$
:::

#### Chebyshev's inequality
For some reason this is written in many different ways, so here is a summary:
###### source: 提綱 P154
Let $X$ be any random variable with expected value $\mu$ and variance $σ^2$. Then $\forall k>0$,
$P(|X-\mu|\ge k)\le \frac{σ^2}{k^2}$, or equivalently $P(|X-\mu|<k)\ge 1-\frac{σ^2}{k^2}$
:::info
:paperclip: apply Markov's inequality with $(X-\mu)^2\ge 0$:
$P[(X-\mu)^2\ge k^2]\le \frac{E[(X-\mu)^2]}{k^2}$
since $(X-\mu)^2\ge k^2 \iff |X-\mu| \ge k$,
$P(|X-\mu|\ge k) \le \frac{E[(X-\mu)^2]}{k^2}= \frac{\sigma^2}{k^2}$
:::

##### One-sided Chebyshev's inequality
###### source: P322
For any constant $c>0$,
$P(X \ge \mu +c)\le \frac{σ^2}{σ^2+c^2}$ and $P(X \le \mu -c)\le \frac{σ^2}{σ^2+c^2}$
:::info
In plain words: the probability that $X$ differs from $E(X)$ by at least $c$ in one direction.
:information_source: One-sided and two-sided both bound how far $X$ strays from its mean. The two-sided form bounds deviations in either direction ($|X-\mu|\ge c$); the one-sided form bounds the deviation in a single direction ($X\ge\mu+c$ or $X\le\mu-c$).
:::
:::info
:paperclip: Let $X$ be a random variable with mean 0 and $\sigma^2< \infty$. For any $a>0$ and $b>0$,
$X\ge a \iff X+b \ge a+b$
Hence, $P(X \ge a)=P(X+b \ge a+b) \le P[(X +b)^2 \ge (a+b)^2]$
Applying Markov's inequality,
$P(X\ge a)\le \frac{E[(X+b)^2]}{(a+b)^2} = \frac{\sigma^2+b^2}{(a+b)^2}$
The right side is smallest at $b=\sigma^2/a$, which gives $\frac{\sigma^2+\sigma^4/a^2}{(a+\sigma^2/a)^2}=\frac{\sigma^2}{\sigma^2+a^2}$
:::

#### Chernoff Bound (basic)
Let $X$ be a random variable whose mgf $M_X(t)$ exists.
$P(X\ge c)\le \min_{t>0}[e^{-ct}M_X(t)]$

4.
(Name unknown) $E[(Y-g(X))^2]\ge E[(Y-E(Y|X))^2]$ for any function $g$

#### Jensen's inequality
If $f(x)$ is a convex function, then
$E[f(X)] \ge f(E[X])$
:::info
:paperclip: by Taylor expansion (for a differentiable convex $f$ the tangent line lies below the graph):
$f(x) \ge f(c)+f'(c)(x-c)$, where $c$ is a constant.
Take $c= E(X)$:
$f(x)\ge f[E(X)] + f'[E(X)][x-E(X)]$
Taking expectations of both sides,
$E[f(X)] \ge f[E(X)] + f'[E(X)]\,E[X-E(X)] = f[E(X)]$, $\because f[E(X)]$ is already a number.
:::

# Convergence concepts
:::info
:popcorn: The "having XX guarantees OO" food chain:
$a.s. \to p \to d \to \text{bounded in prob.}$
:ballot_box_with_check: A continuous mapping theorem that works for all three of convergence a.s., in probability, and in distribution:
if $g:R \to R$ is a continuous function and $X_n \to X$, then $g(X_n)\to g(X)$
:::

## Central limit theorem (CLT)
### source: P184
If $X_1,X_2,\dots$ are *independent random variables*, each having *the same probability distribution function* with expected value $\mu$ and standard deviation $\sigma$, then
$\lim\limits_{n\to \infty}P\left(\frac{X_1+\dots+X_n-n\mu}{\sigma \sqrt n}\le x\right)=\Phi(x)$ for all $x$

Put simply, the CLT says that whatever the underlying distribution (as long as its variance is finite), the standardized sum is approximately normal once the sample size is large enough.

For example:
$X_i\overset{i.i.d.}{\sim}Bernoulli(p),\ i=1,2,\dots,n$
Each $X_i$ has $E(X_i)=p$ and $Var(X_i)=pq$, so $\sum X_i\sim Bin(n,p)$ and, by the CLT, approximately:
$\sum\limits_{i=1}^{n}X_i\sim N(np,npq)$

## continuity correction
Why is it needed? Some distributions are discrete while the normal distribution is continuous, so the conversion introduces an error, and each endpoint is shifted by 0.5 to compensate.
For example:
$X\sim Bin(1000, 0.3)$, find $P(470$,

$P(\lim\limits_{n\to \infty} |X_n-X| < ε)=1$, i.e. $P(\{w:\lim\limits_{n\to \infty} X_n(w)=X(w)\})=1$
which means that if the functions $X_n(s)$ converge to $X(s)$ for all $s \in S$ (the sample space), except possibly on a set of probability 0, then $X_n$ converges to $X$ almost surely.
:::info
:information_source: $X$ is a function from the sample space to the real numbers, $S\to R$.
:information_desk_person: The strong law of large numbers implies:
1. Convergence must hold within the space $S$.
2.
$P([R_1,R_2] \in S)=1$, where $[R_1,R_2]$ is the range to be set within the real numbers.

:point_up: Differences from convergence in probability
- It is judged mainly through the sequence $X$.
- The conditions of the strong law of large numbers are stricter.
:::

## Convergence in Probability
### source: stat.infer DP258
> In plain words: the probability that $X_n$ differs from $X$ by more than the tolerance goes to 0 once the sample size is large enough.

A sequence of random variables, $X_1, X_2, ...$, *converges in probability* to a random variable $X$ if, for every $ε>0$,
$\lim\limits_{n\to \infty} P(|X_n-X| \ge ε)=0$ or, equivalently, $\lim\limits_{n\to \infty} P(|X_n-X| < ε)=1$
:::info
:information_desk_person: In other words, it is enough that:
$\lim\limits_{n \to \infty}E(X_n)=\theta$ and $\lim\limits_{n \to \infty}Var(X_n)=0$; then $X_n \xrightarrow{P} \theta$ (by Chebyshev's inequality).
Convergence in probability is established here by checking that the variability around $\theta$ vanishes rather than diverges.
:::

### Supplement: facts that hold under either $a.s.$ or $p$ convergence
If $X_n \to X, Y_n \to Y, Z_n \to a, H_n \to b$, where $a$ and $b$ are constants, then:
1. $X_n+Y_n \to X+Y$
2. $X_nY_n \to XY$
3. $\frac{X_n}{Y_n} \to \frac{X}{Y}$, provided $Y\neq 0$
4. $Z_nX_n + H_nY_n \to aX +bY$

## Weak law of large numbers (WLLN)
> In plain words: given i.i.d. observations with mean $\mu$ and finite $\sigma^2$, $\bar X$ converges in probability to $\mu$.

### source: stat.infer DP258
Let $X_1, X_2, ...$ be iid random variables with $EX_i=\mu$ and $VarX_i=σ^2<\infty$. Define $\bar X_n=\frac{1}{n} \sum\limits_{i = 1}^n{X_i}$. Then, for every $ε>0$,
$\lim\limits_{n\to \infty} P(|\bar X_n-\mu| < ε)=1$
which means that $\bar X_n$ converges in probability to $\mu$.
- Theorem: If $X_1, X_2, ...$ converges in probability to a random variable $X$ and $h$ is a *continuous function*, then $h(X_1), h(X_2),...$ also converges in probability to $h(X)$.

:::info
:information_desk_person: In other words: as long as the $X_i$ are iid and $E(X)$, $Var(X)$ are well-defined finite constants, the sample mean satisfies $\bar X_n \xrightarrow{P} \mu$, where $\mu=E(X)$ is the population mean. Note that this check is much simpler than the previous one.
:::

## Convergence in Distribution
### source: stat.infer DP261
A sequence of random variables, $X_1, X_2, ...$, converges in distribution to a random variable $X$ if
$\lim\limits_{n\to \infty}F_{X_n}(x)=F_{X}(x)$, at all points $x$ where $F_X(x)$ is continuous.
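The definition can be checked numerically on a small case. The sketch below is only an illustration under assumed inputs: it takes $U_1,\dots,U_n$ iid Uniform(0, 1) and $Y_n=n(1-\max_i U_i)$, whose exact CDF is $1-(1-y/n)^n$, and compares it with the Exponential(1) CDF. The function names `cdf_scaled_gap`, `exp1_cdf` and `max_gap` are hypothetical helpers, not from the source.

```python
import math


def cdf_scaled_gap(y, n):
    """Exact CDF of Y_n = n * (1 - max(U_1..U_n)) for iid Uniform(0, 1) draws."""
    if y <= 0:
        return 0.0
    if y >= n:
        return 1.0
    return 1.0 - (1.0 - y / n) ** n


def exp1_cdf(y):
    """CDF of the candidate limit, Exponential(1)."""
    return 1.0 - math.exp(-y) if y > 0 else 0.0


def max_gap(n, points=(0.5, 1.0, 2.0, 4.0)):
    """Largest |F_{Y_n}(y) - F_Y(y)| over a few check points y."""
    return max(abs(cdf_scaled_gap(y, n) - exp1_cdf(y)) for y in points)


# The gap should shrink as n grows if Y_n converges in distribution.
gaps = [max_gap(n) for n in (5, 50, 500, 5000)]
for n, gap in zip((5, 50, 500, 5000), gaps):
    print(n, gap)
```

The shrinking gap is the same limit as $(1-\frac{\lambda}{n})^n\approx e^{-\lambda}$ from the calculus section, applied with $\lambda=y$.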
- If the sequence of random variables $X_1, X_2, ...$ **converges in probability**, it **must** also **converge in distribution** (if $X_n \xrightarrow{P} X$, then necessarily $X_n \xrightarrow{D} X$). The converse does not hold in general, **but if $X_n \xrightarrow{D} c$ (a constant), then $X_n \xrightarrow{P} c$.**
- Slutsky's Theorem (DP265): if $X_n \xrightarrow{D} X$ and $Y_n \xrightarrow{P} a$, where $a$ is a constant, then:
    1. $Y_nX_n \xrightarrow{D} aX$.
    2. $Y_n+X_n \xrightarrow{D} a+X$.

:::info
Application: one way to find a limiting distribution:
1. From the information given in the problem, find the candidate $F_X(x)$
2. Compute $\lim\limits_{n\to \infty}F_{X_n}(x)$
3. Check whether $\lim\limits_{n\to \infty}F_{X_n}(x)=F_{X}(x)$ holds
4. If it holds, the result is the limiting distribution.
:::

## Bounded in Probability
I don't get this :(((((
### Related theorems
1. If $X_n \xrightarrow{D} X$, then $\{X_n\}$ is bounded in probability, where $\{X_n\}$ is a sequence. The converse does not necessarily hold.
2. If $\{X_n\}$ is bounded in probability and $Y_n \xrightarrow{P} 0$, then $X_nY_n \xrightarrow{P} 0$

### $\triangle$-method (delta method)
Following Hogg's presentation.
Precondition: $g(x)$ is differentiable.
$g(y)=g(x)+g'(x)(y-x)+o(|y-x|)$. This $o$ comes in a random-variable version and a real-number version:
1. Random variables (little $o$ in probability, $o_p$): $Y_n=o_p(X_n)$ iff $\frac{Y_n}{X_n} \xrightarrow{P} 0$, as $n\to \infty$.
2. Real numbers (little $o$): $a=o(b)$ iff $\frac{a}{b} \to 0$, as $b\to 0$.
- A theorem about $o_p$:
    - If $\{Y_n\}$ is bounded in probability and $X_n=o_p(Y_n)$, then $X_n \xrightarrow{P} 0$, as $n\to \infty$.
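A sketch of how the expansion and the $o_p$ theorem above combine, assuming $g$ is differentiable at $\theta$ and $\sqrt n(X_n-\theta)\xrightarrow{D}N(0,\sigma^2)$. The remainder symbol $R_n$ is introduced here only for illustration:

```latex
\begin{align*}
g(X_n) &= g(\theta) + g'(\theta)(X_n-\theta) + R_n,
  \qquad R_n = o_p(|X_n-\theta|) \\
\sqrt{n}\,\bigl(g(X_n)-g(\theta)\bigr)
  &= g'(\theta)\,\sqrt{n}\,(X_n-\theta) + \sqrt{n}\,R_n \\
\sqrt{n}\,R_n &= o_p\bigl(\sqrt{n}\,|X_n-\theta|\bigr) \xrightarrow{P} 0
  \quad \text{since } \sqrt{n}\,(X_n-\theta) \text{ is bounded in probability} \\
\sqrt{n}\,\bigl(g(X_n)-g(\theta)\bigr)
  &\xrightarrow{D} N\bigl(0,\ \sigma^2\,(g'(\theta))^2\bigr)
  \quad \text{by Slutsky's theorem}
\end{align*}
```

The third line is where "converges in distribution implies bounded in probability" is used, which is why that theorem sits next to the delta method in these notes.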
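##### Theorems related to the delta method
If $g'(\theta)\neq 0$:
if $\sqrt n (X_n-\theta)\xrightarrow{D} N(0,\sigma^2)$, then:
$\sqrt n (g(X_n)-g(\theta))\xrightarrow{D} N(0,\sigma^2(g'(\theta))^2)$
If $g'(\theta)= 0$ but $g''(\theta)\neq 0$:
$n (g(X_n)-g(\theta))\xrightarrow{D} \sigma^2\frac{g''(\theta)}{2}\chi^2_1$
Both results can be proved by Taylor-expanding around $\theta$ and showing that the $o_p$ remainder converges.

## stronger form of the CLT
### source: stat.infer DP264
Let $X_1, X_2,...$ be a sequence of iid random variables with $EX_i=\mu$ and $0$

Used both when finding the MLE and when working with MLR:
$(p_1+...+p_n)^m=\sum\limits_{x \in A}\frac{m!}{x_1!...x_n!}p_1^{x_1}...p_n^{x_n}$
#### EXAMPLE
Roll a die 10 times, let $i$ be the face rolled, and let each face have probability $\frac{i}{21}$. To get three 3s, five 4s, and two 6s, plug into the formula:
$m=10$; $p_3=\frac{3}{21}$, $x_3=3$; $p_4=\frac{4}{21}$, $x_4=5$; $p_6=\frac{6}{21}$, $x_6=2$

### Supplement: choosing the domain of a joint distribution
Otherwise I get it wrong every time ==

| Target | Domain choice | Example: ($0$

> 1. Applicable when the MGF $M_x(t)$ of the original distribution is known and its form after the transformation is known.
> 2. The CLT argument uses a Taylor expansion, taken to about third order, to approximate the normal mgf.

$\lim\limits_{n\to \infty}M_{X_n}(t)=M(t)$
Possible outcomes:
1. A constant $\to$ degenerate at that constant.
2. The $M_x(t)$ of some other distribution.

# Point estimation
## Properties a good point estimator should have
##### An estimator is judged by the behavior of the statistic behind it, so we want the statistic to fluctuate around the true value (unbiasedness) and to approach the true value as the sample size grows (consistency).
1. unbiased: $E_\theta(\hat \theta(X_1,...,X_n))=\theta$

:::info
:information_source: If the problem hands you an estimator:
1. Read off the distribution and compute with MMR. Solve for $c$ from $E(X)\times c = \theta$; then $c\times X$ is the unbiased estimator sought.
2. Plug the given estimator into $E(\hat\theta)=\theta$ to check.
:::
   - This splits into exactly unbiased and asymptotically unbiased (weaker). Asymptotically unbiased means:
   $\lim\limits_{n\to \infty}E(\hat \theta - \theta)=0$
2. consistent: $\hat \theta \xrightarrow{P} \theta$
   - Use the two-sided Chebyshev's inequality to show it.
3.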
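Efficiency: if there are several estimators, look for the most stable one (smallest variance).

## The best point estimator: UMVUE
### **UMVUE = uniformly minimum variance unbiased estimator**
##### Meaning: among all unbiased estimators (UE), find the one with the smallest variance; unbiasedness has to be established first.
$Var_\theta \hat \theta^* \le Var_\theta \hat \theta$ for every $\theta$ and every unbiased $\hat\theta$
:::info
:information_desk_person: The $\hat\theta^*$ in question must satisfy:
Complete + sufficient + based on $T$ (the minimal statistic)
:::

### HOW to find: CRLB
##### Def: under certain conditions, we can directly find the smallest variance an unbiased estimator can have!
$Var_θ \hat \theta \ge \frac{[τ^{'}(θ)]^2}{nE\left[\left(\frac{\partial}{\partial θ}\ln f(x;θ)\right)^2\right]}$; the right-hand side is the CRLB
- $τ^{'}(θ)$: the first derivative of the target $τ(θ)=E(\hat\theta)$ being estimated without bias (not necessarily plain $E(X)$, because problems sometimes ask for something like $E(\frac{1}{\theta})$ instead)
    - If the target is $\theta$ itself, $τ^{'}(θ)=1$
- $E\left[\left(\frac{\partial}{\partial θ} \ln f(x;θ)\right)^2\right]=-E\left[\frac{\partial^{2}}{\partial θ^2}\ln f(x;θ)\right]$. This is the Fisher information, written $I(\theta)$; the larger it is, the more precise $\hat \theta$ can be.
- It comes from $[Cov(T,U)]^2\le Var(T)Var(U)$, where $U$ is $\frac{\partial}{\partial θ}\ln f(x;θ)$

For details see intro DP317 and stat.infer DP361.
#### Restrictions: regularity conditions
1. The parameter space $Θ$ is an open interval, e.g. $(a,\infty),\ (-\infty, b), \ (a,b)$, where $a$ and $b$ do not depend on the parameter $θ$.
2. The set $\{x:f(x;\theta)=0\}$ is independent of $θ$.
3. $\int \frac{\partial}{\partial θ}f(x;θ)\, dx =\frac{\partial}{\partial θ} \int f(x;θ)\,dx = 0$ (that is, $E(U)=0$).
4.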
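if $T = t(x_1,\dots,x_n)$ is an unbiased estimator, then $\int t \frac{\partial}{\partial θ}f(x;θ)dx = \frac{\partial}{\partial θ} \int tf(x;θ)dx$

Conditions 3 and 4 show up in the derivation of the CRLB: they give $E(U)=0$, so that $Cov(T, U) = E(TU)$ and $Var(U)= E(U^2)$.

### CRLB with several parameters (Hogg 6.4)
Example: $X\sim N(\mu,\sigma^2)$ with both $\mu$ and $\sigma^2$ unknown. Here $I(\theta)$ is an information matrix. Continuing the example:
$I(\mu, \sigma)=\begin{bmatrix} \frac{1}{\sigma^2} & 0 \\0 & \frac{2}{\sigma^2} \end{bmatrix}$
How is it obtained? Take second-order partial derivatives (the Hessian of the log density) and negate the expectation:
$I(\mu, \sigma)=-E\begin{bmatrix} \frac{\partial^2 \ln f(x;\mu,\sigma)}{\partial\mu^2} & \frac{\partial^2 \ln f(x;\mu,\sigma)}{\partial\mu\partial\sigma} \\\frac{\partial^2 \ln f(x;\mu,\sigma)}{\partial\mu\partial\sigma} & \frac{\partial^2 \ln f(x;\mu,\sigma)}{\partial\sigma^2} \end{bmatrix}$
The asymptotic distribution of the MLE is then:
$\sqrt n (\hat \theta_n-\theta)\xrightarrow{D}N_p(0,I^{-1}(\theta))$

### Efficiency
- If the CRLB can be found, an unbiased estimator whose variance exceeds the CRLB is certainly not efficient.
- Conversely, if a variance smaller than the CRLB shows up, the CRLB regularity conditions must be violated.

##### $ARE$ (asymptotic relative efficiency)
For cases where no CRLB is available: compare two variances to see which is larger.
$e(\hat \theta_{1n}, \hat \theta_{2n})=\frac{\sigma^2_{\hat \theta_{2n}}}{\sigma^2_{\hat \theta_{1n}}}$
- if $e(\hat \theta_{1n}, \hat \theta_{2n})>1$ -> the numerator is larger, so $\hat \theta_{2n}$ has the larger variance; choose $\hat \theta_{1n}$
- if $e(\hat \theta_{1n}, \hat \theta_{2n})<1$ -> the denominator is larger, so $\hat \theta_{1n}$ has the larger variance; choose $\hat \theta_{2n}$

### Another way: the Lehmann-Scheffe method to find the MVUE
:::warning
Note! What follows finds the MVUE, not the UMVUE
:::
#### Steps
1. Set up the likelihood ratio $\frac{L[(x_1...x_n)|\theta]}{L[(y_1...y_n)|\theta]}$
2. Find $g$ such that the ratio is free of $\theta$ exactly when $g(x_1,x_2...x_n)=g(y_1,y_2...y_n)$
3. $g(y_1,y_2...y_n)$ should be the smallest such function; if it is not, pick a smaller one.
4. $g(y_1,y_2...y_n)$ is then the minimal sufficient statistic; when it is also complete, an unbiased estimator that is a function of it is the MVUE.

## Properties a good statistic should have
> - If the CRLB cannot be used, start looking here.
> - Applies to all statistics.

Suppose the statistic used to estimate $θ$ is $T(X)$. A good $T(X)$ should be:
1. **Sufficient**: it contains all the information about $θ$, which means that if $x$ and $y$ are 2 sample points s.t. $T(x)=T(y)$, then the inference about $θ$ is the same whether $x$ or $y$ is observed. That is, $x$ carries information about $θ$ only through $T(X)$, and the conditional distribution of $X$ given $T(X)$ does not depend on $θ$.
   - How to find?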
      - Based on the definition (conditional probability): $\frac{p(x|θ)}{q(T(x)|θ)}$ is free of $θ$

:::warning
:warning: Note that the denominator is the probability distribution of $T(x)$.
:::
      - Factorization Theorem: $f(x|θ)=g[T(x)|θ]h(x)$ (slogan: a function with everything in it, times a function of $x$ only)
      - exponential family
   - Because sufficient statistics are so easy to come by, the smallest one is called the *minimal sufficient statistic*, denoted $T^*(X)$.
      - To check, see whether $\frac{f(x|\theta)}{f(y|\theta)}=\frac{h'(x)}{h'(y)}$ is free of $\theta$ exactly when $T(x)=T(y)$. (from stat.infer DP307)

> If no suitable sufficient statistic can be found, falling back on the order statistics is also an option! (For a proof, see [https://stats.stackexchange.com/questions/144646/proof-that-n-order-statistics-are-sufficient-for-a-sample-of-size-n](https://stats.stackexchange.com/questions/144646/proof-that-n-order-statistics-are-sufficient-for-a-sample-of-size-n))

2. **Ancillary**: the distribution of $T(X)$ does not depend on $θ$
   - Statistics of location-invariant or scale-invariant families may be ancillary, because the unknown parameter can cancel out under a change of variables.
      - ex. $\frac{cx_1}{cx_2}=\frac{x_1}{x_2}\to$ scale invariance

:::info
Counterexample: the distribution of $T(x)$ contains $\theta$ and the value of $\theta$ is not fixed.
:::
3. **Complete**: $T(X)=t(X_1,...,X_n)$ is complete if, for any function $h(T)$, $E_θ(h(T))=0\ \forall\theta\in Θ$ implies $P_\theta(h(T)=0)=1\ \forall \theta \in Θ$
   - The hardest property to prove, **but once it is proved (together with sufficiency), $T(X)$ is guaranteed to be independent of every ancillary statistic.**

:::warning
:warning: Completeness still depends on which $T(x)$ is used, because the distribution of the statistic being checked need not be the same. See stat.infer exercise 6.21
:::
:::info
:information_source: With sufficient + complete, and once it is confirmed to be the minimal statistic (the smallest function), Basu's theorem shows directly that $T(X)$ is independent of every ancillary statistic.
:::

## Basu's Theorem
### source: stat.infer DP313
If $T(X)$ is a complete and minimal sufficient statistic, then $T(X)$ is independent of every ancillary statistic.

## Asymptotic properties of MLEs
### source: [click here](https://www.probabilitycourse.com/chapter8/8_2_4_asymptotic_probs_of_MLE.php#:~:text=8.2.4%20Asymptotic%20Properties%20of%20MLEs%201%20%CE%98%20%5E,random%20variable.%20More%20precisely%2C%20the%20random%20variable%20)
1. $\hat Θ_{MLE}$ is asymptotically unbiased
2. $\hat Θ_{MLE}$ is asymptotically consistent
3. The standardized MLE $\frac{\hat Θ_{MLE}-\theta}{\sqrt{Var(\hat Θ_{MLE})}}$ converges in distribution to $N(0,1)$ as $n \to \infty$.
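The third property can be checked by simulation. The sketch below is an illustration under assumptions that are not in the source: the data are Exponential with rate $\lambda$, for which the MLE is $1/\bar X$ and $I(\lambda)=1/\lambda^2$, so $\sqrt n(\hat\lambda-\lambda)/\lambda$ should look roughly $N(0,1)$. The function names and the chosen `rate`, `n`, `replications`, and `seed` values are hypothetical.

```python
import math
import random
import statistics


def exponential_rate_mle(sample):
    """MLE of the rate of an Exponential(rate) sample: 1 / sample mean."""
    return len(sample) / sum(sample)


def standardized_mle_draws(rate, n, replications, seed=0):
    """Simulate sqrt(n * I(rate)) * (mle - rate), using I(rate) = 1 / rate**2."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    draws = []
    for _ in range(replications):
        sample = [rng.expovariate(rate) for _ in range(n)]
        mle = exponential_rate_mle(sample)
        draws.append(math.sqrt(n) * (mle - rate) / rate)
    return draws


draws = standardized_mle_draws(rate=2.0, n=400, replications=2000)
# If the asymptotic result holds, these should be close to 0 and 1.
print("mean:", round(statistics.mean(draws), 3))
print("std dev:", round(statistics.pstdev(draws), 3))
```

For finite $n$ the mean is slightly positive, because $1/\bar X$ is only asymptotically unbiased, which matches property 1.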
### Wald test
- An approximation based on the MLE
- Comes from $\sqrt n(\hat \theta_{MLE} - \theta_0) \sim N(0, I(\theta_0)^{-1})$ approximately

Statistic: $W=\frac{\hat \theta_{MLE} -\theta_0}{\sqrt{Var(\hat\theta_{MLE})}}\approx\sqrt{nI(\hat\theta_{MLE})}\,(\hat \theta_{MLE} -\theta_0)$, where $\theta_0$ is the value under $H_0$
**Reject rule**
$W>z_{1-\frac{\alpha}{2}}$ or $W< z_{\frac{\alpha}{2}}$ $\to$ **Reject $H_0$**, where $z_p$ is the $p$-quantile of $N(0,1)$

## Computing $\alpha$, $\beta$, power, and size in hypothesis testing
$\alpha = P(\text{type I error}) = P(\text{reject } H_0 \mid H_0 \text{ is true})$
$\beta=P(\text{type II error}) = P(\text{fail to reject } H_0 \mid H_1 \text{ is true})$
Omitted (planned for [文組能懂的假設檢定介紹](https://hackmd.io/ZB2wHE9FQJuBQKPvFlxuNQ), an introduction to hypothesis testing for humanities students)
### Supplement on the power function
#### source: stat.infer DP411
For $0\le \alpha \le 1$, a test with power function $\beta(\theta)$ is a *size* $\alpha$ test if $\sup_{θ \in Θ_0}\beta(\theta)=\alpha$.
For $0\le \alpha \le 1$, a test with power function $\beta(\theta)$ is a *level* $\alpha$ test if $\sup_{θ \in Θ_0}\beta(\theta)\le \alpha$.

### Monotone likelihood ratio (MLR)
Intuitively, it is an increasing/decreasing function. Formal definition:
for every $\theta_2>\theta_1$, $\frac{g(x|\theta_2)}{g(x|\theta_1)}$ is a monotone function of $x$.
:::warning
:warning: As the definition shows, two densities must be divided by each other before MLR can be confirmed.
:::
Once MLR is confirmed, the following theorem can be used to prove a UMP test:
### Karlin-Rubin
#### stat.infer DP417
Consider testing $H_0:\theta\le\theta_0$ vs. $H_1:\theta>\theta_0$. Suppose that $T$ is a sufficient statistic for $\theta$ and the family of pdfs or pmfs of $T$ has an MLR. Then for any $t_0$, the test that rejects $H_0$ iff $T>t_0$ is a UMP level $\alpha$ test, where $\alpha=P_{\theta_0}(T>t_0)$

## MSE
Definition: $MSE(\hat \theta)=E_{\theta}((\hat \theta-\theta)^2)$, the expected squared difference between the estimate and the true value.
### Basics
Let $T(X)$ be a statistic. For complicated calculations use:
$MSE(T(X))=Var(T(X))+[Bias(T(X))]^2$
where $Bias(T(X))=E_{\theta}[T(X)]-\theta$
### Computing least squares
1. Let $Q(\theta)=\sum\limits_{i=1}^n(X_i-\theta)^2$
2. To find $\min\{Q(\theta)\}$, solve $\frac{d}{d\theta}Q(\theta)=0$
3.
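Check whether $\frac{d^2}{d\theta^2}Q(\theta)>0$, which means the solution is a minimum.

## Interval estimation
### Computing upper and lower bounds for continuous distributions (valid only for distributions with an even density)
$\int_{-\infty}^tf_T(u|\theta_U(t))du=\frac{\alpha}{2}$, $\int_{t}^{\infty}f_T(u|\theta_L(t))du=\frac{\alpha}{2}$
Put simply: replace the parameter in the original density with $U(t)$/$L(t)$ (the labels for the upper and lower bounds), replace the original $x$ with the statistic $t$, then solve for $U(t)$/$L(t)$.

### Cochran's theorem
> Besides multiple regression, it is also used for the LRT and for building interval estimates

#### source: the internet
Let $X\sim N_n(\mu, I_n)$, and let $Q_i=X^TA_iX$ satisfy $Q_1+Q_2+...+Q_k=X^TX$, where each $A_i\in \mathbb{R}^{n \times n}$ is symmetric and $rank(A_i)=r_i$. Then the following conditions are equivalent:
1. $Q_i\sim\chi^2(r_i, \lambda_i),\ \forall\, 1 \le i \le k$
2. $r_1+r_2+...+r_k=n$
3. $Q_i \perp Q_j$, $i \neq j$
4. $\forall\, Q_i\sim\chi^2(r_i)$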
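A worked instance of the theorem, assuming $\mu=0$ (so $X\sim N_n(0,I_n)$) and $k=2$. The symbol $J$, the $n\times n$ all-ones matrix, is notation introduced here for illustration. It recovers the $\chi^2(n-1)$ result for $\sum(X_i-\bar X)^2$ from the $S^2$ section with $\sigma^2=1$:

```latex
\begin{align*}
A_1 &= \tfrac{1}{n}J, & Q_1 &= X^{T}A_1X = n\bar X^{2}, & r_1 &= 1 \\
A_2 &= I_n - \tfrac{1}{n}J, & Q_2 &= X^{T}A_2X = \sum_{i=1}^{n}(X_i-\bar X)^{2}, & r_2 &= n-1 \\
Q_1 + Q_2 &= X^{T}X, & r_1 + r_2 &= n
\end{align*}
```

Because the ranks add up to $n$, the theorem gives $Q_1\sim\chi^2(1)$, $Q_2\sim\chi^2(n-1)$, and $Q_1\perp Q_2$, which is the independence of $\bar X$ and $S^2$ for normal samples.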