---
tags: Python
---
# Statistics Notes
## Descriptive Statistics
### Moment Generating Function
| k-th central moment | $\mu_k=E[(x-\mu)^k]$ |
|-|-|
| Moment generating function (m.g.f.) | $M_x(t)=E(e^{tx})$ |
| Mean | $\bar{x}=\frac{\sum^n_{i=1} x_i}{n}$|
| Variance $V(X)=E(X^2)-[E(X)]^2$|$\sigma^2=\frac{\sum^n_{i=1}(x_i-\bar{x})^2}{n}=\frac{\sum^n_{i=1}x_i^2}{n}-\bar{x}^2$|
| Skewness<br>greater than 0: positive (right) skew; less than 0: negative (left) skew; equal to 0: no skew | $b_1=\cfrac{\mu_3}{(\sqrt{\mu_2})^3}=\cfrac{E[(x-\mu)^3]}{(\sqrt{E[(x-\mu)^2]})^3}$ |
| Kurtosis<br>greater than 3: leptokurtic (tall, narrow peak); less than 3: platykurtic (low, wide peak); equal to 3: mesokurtic (normal peak) | $b_2=\cfrac{\mu_4}{(\sqrt{\mu_2})^4}=\cfrac{E[(x-\mu)^4]}{(\sqrt{E[(x-\mu)^2]})^4}$ |
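
As a quick numerical check of the moment-based definitions above, the following sketch computes the mean, variance (both by definition and by the shortcut $E(X^2)-[E(X)]^2$), skewness, and kurtosis. The data set and the helper name `central_moment` are made up for illustration, and the divisor is $n$ throughout.

```python
import math

def central_moment(xs, k):
    """k-th central moment using divisor n (population convention)."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** k for x in xs) / n

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical sample

mean = sum(data) / len(data)
var_def = central_moment(data, 2)                               # definition
var_shortcut = sum(x * x for x in data) / len(data) - mean ** 2  # E(X^2) - mean^2
skewness = central_moment(data, 3) / math.sqrt(var_def) ** 3     # mu_3 / mu_2^(3/2)
kurtosis = central_moment(data, 4) / var_def ** 2                # mu_4 / mu_2^2

print(mean, var_def, var_shortcut, skewness, kurtosis)
```

For this sample the skewness is positive and the kurtosis is below 3, so by the table it is right-skewed and platykurtic.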
### Statistics (all parameters known)
| $Cov(X,Y)=E[(X-\mu_X)(Y-\mu_Y)]=E(XY)-E(X)E(Y)$ |
|-|
|$\rho_{X,Y}=E[\frac{(X-\mu_X)}{\sigma_X}\frac{(Y-\mu_Y)}{\sigma_Y}]$|
### Pivotal Quantities (involving unknown parameters)
| Expectation of the sample mean: $E(\bar{x})=\mu$ |
|-|
| Variance of the sample mean: $Var(\bar{x})=\frac{\sigma^2}{n}$ |
## Probability
### Joint and Conditional Probability
|probability mass function (pmf)| $\sum_{x}f(x)=1$ |
|-|-|
|probability density function (pdf)| $\int^{\infty}_{-\infty}f(x)dx=1$ |
|cumulative distribution function (cdf)| $F(x)=P(X\leq x)$ |
|prior (marginal) probability|$f(x)=\frac{dF(x)}{dx}=\int^\infty_{-\infty} f(x,y)dy$|
|posterior (conditional) probability| $f(x\|y)=\cfrac{f(x,y)}{f(y)}$ |
|Law of total variance (total variance = within-group variance + between-group variance) |$V(X)=E[V(X\|Y)]+V[E(X\|Y)]$|
|Law of iterated expectations|$E(XY)=E[E(XY\|X)]=E[XE(Y\|X)]$|
### Probability Bounds
| Markov's inequality ($x\geq0$, $c>0$) | $P(x\geq c)\leq\cfrac{E(x)}{c}$ |
|-|-|
| Chebyshev's inequality | $P(\|x-\mu\|\geq k\sigma)\leq\cfrac{1}{k^2}$ |
| One-sided Chebyshev (Cantelli) inequality ($k>0$) | $P(x-\mu\geq k)\leq\cfrac{\sigma^2}{\sigma^2+k^2}$ |
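
To see how loose these bounds can be, the sketch below compares each of them with an exact tail probability. The exponential distribution with rate 1 is an arbitrary choice made here because its tail $P(X\geq c)=e^{-c}$ is available in closed form; the constants `c` and `k` are also illustrative.

```python
import math

lam = 1.0                      # assumed rate of an exponential distribution
mu, sigma = 1 / lam, 1 / lam   # its mean and standard deviation

def tail(c):
    """Exact P(X >= c) for the exponential distribution, valid for c >= 0."""
    return math.exp(-lam * c)

c = 3.0
markov_bound = mu / c                      # Markov: P(X >= c) <= E(X)/c

k = 2.0
# mu - k*sigma is negative here, so only the upper tail contributes.
exact_two_sided = tail(mu + k * sigma)     # P(|X - mu| >= k*sigma)
chebyshev_bound = 1 / k ** 2

exact_one_sided = tail(mu + k)             # P(X - mu >= k)
cantelli_bound = sigma ** 2 / (sigma ** 2 + k ** 2)

print(tail(c), markov_bound)
print(exact_two_sided, chebyshev_bound)
print(exact_one_sided, cantelli_bound)
```

In each pair the exact probability sits well below the bound. That is expected: the inequalities hold for any distribution with the stated moments.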
### Bernoulli Distribution
*A binary trial performed once, e.g. the distribution of heads when a coin is tossed once*
|| $x \sim Ber(p)$|
|-|-|
| $f(x)$ | $p^x(1-p)^{1-x}$ |
| $E(X)$ | $p$ |
| $Var(x)$ | $pq$, where $q=1-p$ |
| $M_x(t)$ | $q+pe^t$ |
### Binomial Distribution
*A binary trial performed n times, e.g. the distribution of the number x of heads in n coin tosses*
|| $x \sim Bin(n,p)$|
|-|-|
| $f(x)$ | $C^n_xp^x(1-p)^{n-x}$ |
| $E(X)$ | $np$ |
| $Var(x)$ | $npq$ |
| $M_x(t)$ | $(q+pe^t)^n$ |
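
The table entries can be checked numerically. This sketch tabulates the binomial pmf for illustrative values of `n` and `p` and compares the total probability, mean, and variance with $1$, $np$, and $npq$.

```python
from math import comb

def binom_pmf(x, n, p):
    """P(X = x) for X ~ Bin(n, p)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 10, 0.3   # illustrative parameters
pmf = [binom_pmf(x, n, p) for x in range(n + 1)]

total = sum(pmf)                                           # should be 1
mean = sum(x * f for x, f in enumerate(pmf))               # should be n*p
var = sum((x - mean) ** 2 * f for x, f in enumerate(pmf))  # should be n*p*q

print(total, mean, var)
```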
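### Hypergeometric Distribution
*Distribution of the number x of target items obtained when n items are drawn without replacement from a population of N items containing K targets*
|| $x \sim Hyper(N,K,n)$|
|-|-|
| $f(x)$ | $\cfrac{\binom{K}{x}\binom{N-K}{n-x}}{\binom{N}{n}}$ |
| $E(X)$ | $\frac{nK}{N}$ |
| $Var(x)$ | $\frac{nK}{N}(1-\frac{K}{N})\frac{N-n}{N-1}$ |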
### Geometric Distribution
*Distribution of the number of trials x needed until the first success ($x=1,2,\dots$)*
|| $x \sim Geo(p)$|
|-|-|
| $f(x)$ | $(1-p)^{x-1}p$ |
| $E(X)$ | $\frac{1}{p}$ |
| $Var(x)$ | $\frac{q}{p^2}$ |
| $M_x(t)$ | $\frac{pe^t}{1-qe^t}$ |
|Memorylessness|$P(X>a+b\|X>a)=P(X>b)$|
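
The memoryless property can be verified directly from the pmf. This is a minimal sketch assuming the trial-counting convention ($x=1,2,\dots$), under which $P(X>m)=q^m$; the values of `p`, `a`, and `b` are arbitrary, and the infinite tail sum is truncated after a fixed number of terms.

```python
def geom_pmf(x, p):
    """P(X = x) when X counts trials up to and including the first success."""
    return (1 - p) ** (x - 1) * p

def geom_tail(m, p, terms=2000):
    """P(X > m), approximated by summing the pmf over a long finite range."""
    return sum(geom_pmf(x, p) for x in range(m + 1, m + 1 + terms))

p, a, b = 0.2, 3, 4   # illustrative values

conditional = geom_tail(a + b, p) / geom_tail(a, p)   # P(X > a+b | X > a)
unconditional = geom_tail(b, p)                       # P(X > b)

print(conditional, unconditional, (1 - p) ** b)
```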
### Negative Binomial Distribution
*Distribution of the number of trials x needed until the k-th success*
|| $x \sim NB(k,p)$|
|-|-|
| $f(x)$ | $\binom{x-1}{k-1}p^kq^{x-k}$ |
| $E(X)$ | $\frac{k}{p}$ |
| $Var(x)$ | $\frac{kq}{p^2}$ |
| $M_x(t)$ | $(\frac{pe^t}{1-qe^t})^k$ |
### Continuous Uniform Distribution
||$x\sim U(a,b)$|
|-|-|
| $f(x)$ | $\frac{1}{b-a}, a\leq x\leq b$ |
| $E(X)$ | $\frac{a+b}{2}$ |
| $Var(x)$ | $\frac{(b-a)^2}{12}$ |
| $M_x(t)$ | $\frac{e^{bt}-e^{at}}{(b-a)t}$ |
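### Poisson Distribution
*Distribution of the number x of arrivals in a given time interval at arrival rate $\lambda$*
*Equivalent to the limit, as $n\to\infty$, of a binomial with $p=\frac{\lambda t}{n}$ and $t=1$*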
|| $x \sim Poi(\lambda)$|
|-|-|
| $f(x)$ | $\cfrac{e^{-\lambda}\lambda^x}{x!}$ |
| $E(X)$ | $\lambda$ |
| $Var(x)$ | $\lambda$ |
| $M_x(t)$ | $e^{\lambda(e^t-1)}$ |
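
The binomial-to-Poisson connection can be illustrated numerically. This sketch evaluates $Bin(n, \lambda/n)$ at one point for increasing $n$ and compares it with the Poisson pmf; `lam`, `x`, and the list of `n` values are arbitrary choices.

```python
from math import comb, exp, factorial

lam, x = 2.0, 3   # illustrative rate and count

poisson = exp(-lam) * lam ** x / factorial(x)

def binom_at(n):
    """P(X = x) for X ~ Bin(n, lam/n)."""
    p = lam / n
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

sizes = (10, 100, 10000)
errors = [abs(binom_at(n) - poisson) for n in sizes]

for n, err in zip(sizes, errors):
    print(n, binom_at(n), err)
print("Poisson:", poisson)
```

The absolute error shrinks as $n$ grows, which is the sense in which the Poisson is a limiting case of the binomial.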
### Exponential Distribution
*Distribution of the waiting time x until one arrival at arrival rate $\lambda$*
|| $x \sim Exp(\lambda)$|
|-|-|
| $f(x)$ | $\lambda e^{-\lambda x}$ |
| $E(X)$ | $\frac{1}{\lambda}$ |
| $Var(x)$ | $\frac{1}{\lambda^2}$ |
| $M_x(t)$ | $\frac{\lambda}{\lambda-t};t<\lambda$ |
### Gamma Distribution
*Distribution of the waiting time x until $\alpha$ arrivals at arrival rate $\lambda$*
|| $x \sim Gamma(\alpha,\lambda)$|
|-|-|
| $f(x)$ | $\cfrac{\lambda^\alpha}{\Gamma(\alpha)}x^{\alpha-1}e^{-\lambda x}$ |
| $E(X)$ | $\frac{\alpha}{\lambda}$ |
| $Var(x)$ | $\frac{\alpha}{\lambda^2}$ |
| $M_x(t)$ | $(\frac{\lambda}{\lambda-t})^\alpha;t<\lambda$ |
*Integration by parts (tabular method)*
|Differentiate the left column|Alternate the sign|Integrate the right column|
|-|-|-|
|$x^2$|\ (+)|$e^{-x}$|
|$2x$|\ (-)|$-e^{-x}$|
|$2$|\ (+)|$e^{-x}$|
|$0$||$-e^{-x}$|
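
As a worked instance of the tabular scheme, take $\int x^2e^{-x}dx$: each derivative in the left column is multiplied by the integral one row further down in the right column, with alternating signs. The derivation below is a sketch of that bookkeeping and can be checked by differentiating the result.

$$
\begin{aligned}
\int x^2e^{-x}dx &= (+)\,x^2\,(-e^{-x})\;(-)\;2x\,(e^{-x})\;(+)\;2\,(-e^{-x})+C \\
&= -e^{-x}(x^2+2x+2)+C \\
\Gamma(3)=\int_0^\infty x^2e^{-x}dx &= \Big[-e^{-x}(x^2+2x+2)\Big]_0^\infty = 0-(-2) = 2 = 2!
\end{aligned}
$$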
### Beta Distribution
|| $x \sim Beta(a,b)$|
|-|-|
| $f(x)$ | $\cfrac{1}{B(a,b)}x^{a-1}(1-x)^{b-1}$ |
| $E(X)$ | $\frac{a}{a+b}$ |
| $Var(x)$ | $\frac{ab}{(a+b+1)(a+b)^2}$ |
### Normal (Gaussian) Distribution
|| $x \sim N(\mu,\sigma^2)$|
|-|-|
| $f(x)$ | $\frac{1}{\sqrt{2\pi}\sigma}e^{-\frac{1}{2}(\frac{x-\mu}{\sigma})^2}$ |
| $E(X)$ | $\mu$ |
| $Var(X)$ | $\sigma^2$ |
| $M_X(t)$ | $e^{\mu t+\frac{\sigma^2 t^2}{2}}$ |
| Standardization $Z=\frac{x-\mu}{\sigma}$| $Z\sim N(0,1)$|
| $M_Z(t)$ | $e^{\frac{t^2}{2}}$ |
### Chi-Square Distribution
|$f(x)$ | $\cfrac{x^{\frac{v}{2}-1}e^{\frac{-x}{2}}}{2^{\frac{v}{2}}\Gamma(\frac{v}{2})}$|
|-|-|
| Population $\chi^2$ | $\sum_{i=1}^n(\frac{x_i-\mu}{\sigma})^2\sim\chi^2_{(n)}$|
| Sample $\chi^2$ | $\frac{(n-1)S^2}{\sigma^2}\sim\chi^2_{(n-1)}$ |
| $Z^2 \sim \chi^2_{(1)}$ | $\chi^2_{(1)}\sim F(1,\infty)$|
|$E(X)$|$v$|
|$Var(X)$|$2v$|
|$M_X(t)$|$(1-2t)^{-\frac{v}{2}};t<\frac{1}{2}$|
### t Distribution
$t=\cfrac{Z}{\sqrt{\cfrac{\chi^2}{df}}}$
### F Distribution
| $F=\cfrac{\chi^2_{(n_1-1)}/(n_1-1)}{\chi^2_{(n_2-1)}/(n_2-1)}=\cfrac{\frac{(n_1-1)S^2_1}{\sigma^2_1}/(n_1-1)}{\frac{(n_2-1)S^2_2}{\sigma^2_2}/(n_2-1)}=\cfrac{S_1^2\sigma_2^2}{S_2^2\sigma_1^2}\sim F(n_1-1,n_2-1)$ |
|-|
| $F_{\alpha}(n_1,n_2) = \cfrac{1}{F_{1-\alpha}(n_2,n_1)}$ |
### Sampling Distributions
| Sample variance $S^2$ | $\sum^n_{i=1}\frac{(x_i-\bar{x})^2}{n-1}$ |
|-|-|
| $E(X)$| $\mu$|
| $E(S^2)$ | $\sigma^2$ |
| $E(\sum^n_{i=1}\frac{(x_i-\bar{x})^2}{n})$ | $\frac{n-1}{n}\sigma^2$ |
| $E(S)$ (not equal to $\sqrt{E(S^2)}=\sigma$) | $\frac{\sqrt{2}\sigma}{\sqrt{n-1}}\frac{\Gamma(\frac{n}{2})}{\Gamma(\frac{n-1}{2})}$|
|$S^2\sim Gamma(\alpha=\frac{n-1}{2}, \lambda=\frac{n-1}{2\sigma^2})$||
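
A small simulation makes the difference between the $n-1$ and $n$ divisors concrete. This is only a sketch: the normal population, its parameters, the sample size, and the number of replications are all assumptions chosen for illustration, and the averages are Monte Carlo estimates rather than exact values.

```python
import random

random.seed(0)                  # fixed seed so the run is repeatable
mu, sigma, n = 10.0, 2.0, 5     # hypothetical population and sample size
reps = 20000                    # number of simulated samples

sum_unbiased = 0.0
sum_biased = 0.0
for _ in range(reps):
    xs = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = sum(xs) / n
    ss = sum((x - xbar) ** 2 for x in xs)   # sum of squared deviations
    sum_unbiased += ss / (n - 1)            # S^2 with divisor n-1
    sum_biased += ss / n                    # divisor n

mean_s2 = sum_unbiased / reps      # estimate of E(S^2); theory: sigma^2
mean_biased = sum_biased / reps    # theory: (n-1)/n * sigma^2

print(mean_s2, sigma ** 2)
print(mean_biased, (n - 1) / n * sigma ** 2)
```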
## Point Estimation
### Properties
|Unbiasedness||
|-|-|
|Unbiased estimator| $E(\hat{\theta}_n)-\theta=0$|
|Biased estimator| $E(\hat{\theta}_n)-\theta\neq0$|
|Asymptotically unbiased estimator|$\lim_{n\to\infty}E(\hat{\theta}_n)-\theta=0$|

|Efficiency||
|-|-|
|Relative efficiency| the smaller $Var(\hat{\theta}_i)$, the better|
|Absolute efficiency| Minimum Variance Unbiased Estimation|
|CRLB (Cramer-Rao Lower Bound) | $\cfrac{1}{-nE(\frac{\partial^2\ln f(x;\theta)}{\partial\theta^2})}\leq Var(\hat{\theta})$|

|Sufficiency||
|-|-|
|Fisher-Neyman factorization|$f(x_1,..x_n;\theta)=g(\hat{\theta};\theta)h(x_1,...x_n)$|

|Consistency||
|-|-|
|Unbiased estimator|$\lim_{n\to\infty}Var(\hat{\theta}_n)=0$|
|Biased estimator|$\lim_{n\to\infty}MSE(\hat{\theta}_n)=0$|
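### Maximum Likelihood Estimator $\hat{\theta}_{MLE}$
|likelihood function|$L(\theta)=\Pi^n_{i=1}f(x_i;\theta)$|
|-|-|
|$L(\theta)$ is differentiable and concave| $\hat{\theta}_{MLE}$ is found where the first derivative of $L(\theta)$ equals zero and the second derivative is negative |
|$L(\theta)$ is discrete and not differentiable|Two-sided bounding method<br>$L(N)\geq L(N-1)$<br>$L(N)\geq L(N+1)$|
|Two-sided bounding for the hypergeometric distribution|$\hat{N}_{MLE}$ is the positive integer in $[\frac{nK}{x}-1,\frac{nK}{x}]$|
|$L(\theta)$ is strictly decreasing|$\hat{\theta}_{MLE}=max\lbrace x_1...x_n \rbrace$|
| Asymptotically, $\hat{\theta}_{MLE}\sim N(\theta,CRLB)$||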
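
The two-sided bounding result for the hypergeometric case can be checked by brute force. The sketch below uses made-up capture-recapture style numbers (`K` marked items, a sample of `n`, of which `x` are marked) and an arbitrary upper search limit of 2000 for $N$. It compares the likelihood maximizer with $\lfloor nK/x \rfloor$, the integer in $[\frac{nK}{x}-1,\frac{nK}{x}]$ when $\frac{nK}{x}$ is not itself an integer.

```python
from math import comb

def hyper_pmf(x, N, K, n):
    """P(X = x) for X ~ Hyper(N, K, n)."""
    return comb(K, x) * comb(N - K, n - x) / comb(N, n)

K, n, x = 50, 40, 7   # hypothetical data

# N must be at least K + (n - x): all marked items plus the unmarked ones seen.
candidates = range(K + n - x, 2000)
N_brute = max(candidates, key=lambda N: hyper_pmf(x, N, K, n))

N_formula = (n * K) // x   # floor of nK/x

print(N_brute, N_formula)
```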
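### Method of Moments Estimator $\hat{\theta}_{MME}$
| Population k-th raw moment | $\mu_k=E(X^k)$ |
|-|-|
| Sample k-th raw moment | $m_k=\frac{\sum^n_{i=1}x_i^k}{n}$ |
| Population first moment equals the sample mean|$E(x)=\bar{x}$ |
| Population second moment equals the sample variance (divisor $n$) plus the squared mean|$E(x^2)=\hat{s}^2+\bar{x}^2$|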
## Interval Estimation
### Notation
| $1-\alpha=P(\hat{x}-e<x<\hat{x}+e)$ |
|-|
|The $100(1-\alpha)\%$ confidence interval for $x$ is $(\hat{x}-e,\hat{x}+e)$|
### Two Independent Populations: $\mu_1-\mu_2$
|Scenario|Margin of error|
|-|-|
|Normal populations, $\sigma^2_1$ and $\sigma^2_2$ known => Z distribution|$Z_\frac{\alpha}{2}\sqrt{\frac{\sigma^2_1}{n_1}+\frac{\sigma^2_2}{n_2}}$|
|$\sigma^2_1$ and $\sigma^2_2$ unknown, $n_1\geq30$ and $n_2\geq30$<br>by the central limit theorem (C.L.T.)|$Z_\frac{\alpha}{2}\sqrt{\frac{S^2_1}{n_1}+\frac{S^2_2}{n_2}}$|
|Normal populations, variances unknown but $\sigma^2_1=\sigma^2_2$, $n_1<30$ and $n_2<30$<br>where $S^2_p=\frac{(n_1-1)S^2_1+(n_2-1)S^2_2}{n_1+n_2-2}$ and $\frac{(n_1-1)S^2_1+(n_2-1)S^2_2}{\sigma^2}\sim\chi^2_{(n_1+n_2-2)}$|$t_\frac{\alpha}{2}(n_1+n_2-2)\sqrt{\frac{S^2_p}{n_1}+\frac{S^2_p}{n_2}}$|
|Normal populations, variances unknown with $\sigma^2_1\neq\sigma^2_2$, $n_1<30$ and $n_2<30$ | $t_\frac{\alpha}{2}(df)\sqrt{\frac{S^2_1}{n_1}+\frac{S^2_2}{n_2}}$, $df=\cfrac{(\frac{S^2_1}{n_1}+\frac{S^2_2}{n_2})^2}{\frac{(\frac{S^2_1}{n_1})^2}{n_1-1}+\frac{(\frac{S^2_2}{n_2})^2}{n_2-1}}$|
|Non-normal populations and $n<30$|Nonparametric statistics|
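
The two small-sample rows differ only in how the standard error and degrees of freedom are formed. This sketch computes both from summary statistics; the sample variances and sizes are invented, and the function names are illustrative. The t quantile itself is left out because the Python standard library does not provide one.

```python
from math import sqrt

def pooled_variance(s1_sq, n1, s2_sq, n2):
    """S_p^2 for the equal-variance case."""
    return ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)

def welch_df(s1_sq, n1, s2_sq, n2):
    """Approximate degrees of freedom for the unequal-variance case."""
    a, b = s1_sq / n1, s2_sq / n2
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))

s1_sq, n1 = 4.0, 10   # hypothetical sample variance and size, group 1
s2_sq, n2 = 9.0, 12   # hypothetical sample variance and size, group 2

sp_sq = pooled_variance(s1_sq, n1, s2_sq, n2)
se_pooled = sqrt(sp_sq / n1 + sp_sq / n2)    # used with t(n1 + n2 - 2)
se_welch = sqrt(s1_sq / n1 + s2_sq / n2)     # used with t(df)
df = welch_df(s1_sq, n1, s2_sq, n2)

print(sp_sq, se_pooled, se_welch, df)
```

The approximate `df` always falls between $\min(n_1,n_2)-1$ and $n_1+n_2-2$; in practice it is usually rounded down before looking up the t table.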
### Two Dependent (Paired) Populations: Mean Difference $\mu_D$
|Variance ($D_i$ is the difference within pair $i$)|Margin of error|
|-|-|
|$S^2_D=\frac{1}{n-1}\sum^n_{i=1}(D_i-\bar{D})^2$|$t_{\frac{\alpha}{2}}(n-1)\cfrac{S_D}{\sqrt{n}}$|
### Two Independent Normal Populations: Variance Ratio $\cfrac{\sigma^2_1}{\sigma^2_2}$
| Transformation usable when reading the table | $F_\alpha(n_1-1,n_2-1)=\cfrac{1}{F_{1-\alpha}(n_2-1,n_1-1)}$ |
|-|-|
| Interval with confidence level $1-\alpha$ | $(\cfrac{S_1^2}{S_2^2}\cfrac{1}{F_{\frac{\alpha}{2}}(n_1-1,n_2-1)},\cfrac{S_1^2}{S_2^2}\cfrac{1}{F_{1-\frac{\alpha}{2}}(n_1-1,n_2-1)})$ |
### Two Populations: Difference in Proportions $p_1-p_2$
$Z_{\frac{\alpha}{2}}\sqrt{\frac{\hat{p_1}(1-\hat{p_1})}{n_1}+\frac{\hat{p_2}(1-\hat{p_2})}{n_2}}$
### Single Population: Proportion $p$
$Z_{\frac{\alpha}{2}}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
### Single Population: Variance $\sigma^2$
$(\cfrac{(n-1)S^2}{\chi^2_{\frac{\alpha}{2}}(n-1)},\cfrac{(n-1)S^2}{\chi^2_{1-\frac{\alpha}{2}}(n-1)})$
### Single Population: Prediction Interval
$t_{\frac{\alpha}{2}}(n-1)\sqrt{S^2(1+\frac{1}{n})}$
### Single Population: Sample Size
|Margin of error|Sample size|
|-|-|
|$E=Z_{\frac{\alpha}{2}}\cfrac{\sigma}{\sqrt{n}}$|$n=\cfrac{(Z_{\frac{\alpha}{2}})^2\sigma^2}{E^2}$|
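
The sample-size formula is easy to apply in code. This sketch uses `statistics.NormalDist` from the standard library for the normal quantile; the function name `sample_size` and the inputs (a known $\sigma$ of 15 and a target margin of 2) are made up for illustration. The result is rounded up because $n$ must be an integer.

```python
from math import ceil
from statistics import NormalDist

def sample_size(sigma, margin, confidence=0.95):
    """Smallest n with Z_{alpha/2} * sigma / sqrt(n) <= margin."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # upper alpha/2 quantile
    return ceil((z * sigma / margin) ** 2)

n_needed = sample_size(sigma=15, margin=2)   # hypothetical inputs
print(n_needed)
```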
## Hypothesis Testing
| Conclusion \ Truth | H~0~ is true | H~0~ is false |
| -------- | -------- | -------- |
| Reject H~0~ | $\alpha$ (Type I error) | $1-\beta$ (power)|
| Do not reject H~0~ | $1-\alpha$| $\beta$ (Type II error) |

$P\lbrace \text{reject } H_0 \mid H_0 \text{ is true}\rbrace=\alpha$
$P\lbrace \text{reject } H_0 \mid H_0 \text{ is false}\rbrace=1-\beta$
### Most Powerful Test and Sample Size
|Scenario|Critical value|Sample size|
|-|-|-|
|Right-tailed test|$k=\mu_0+Z_\alpha \frac{\sigma}{\sqrt{n}}=\mu_1-Z_\beta \frac{\sigma}{\sqrt{n}}$|$n=\frac{(Z_\alpha +Z_\beta)^2\sigma^2}{(\mu_1-\mu_0)^2}$|
|Left-tailed test|$k=\mu_0-Z_\alpha \frac{\sigma}{\sqrt{n}}=\mu_1+Z_\beta \frac{\sigma}{\sqrt{n}}$|$n=\frac{(Z_\alpha +Z_\beta)^2\sigma^2}{(\mu_1-\mu_0)^2}$|
|Two-tailed test|$k=\mu_0+Z_{\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}}=\mu_1-Z_\beta \frac{\sigma}{\sqrt{n}}$|$n=\frac{(Z_{\frac{\alpha}{2}} +Z_\beta)^2\sigma^2}{(\mu_1-\mu_0)^2}$|
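
For the right-tailed row, the following sketch computes the required sample size, the resulting critical value, and the power actually achieved once $n$ is rounded up. All numeric inputs are hypothetical, and the function name `right_tail_design` is illustrative.

```python
from math import ceil, sqrt
from statistics import NormalDist

std_normal = NormalDist()

def right_tail_design(mu0, mu1, sigma, alpha, beta):
    """Sample size, critical value and achieved power for H0: mu = mu0 vs mu = mu1 > mu0."""
    z_alpha = std_normal.inv_cdf(1 - alpha)
    z_beta = std_normal.inv_cdf(1 - beta)
    n = ceil((z_alpha + z_beta) ** 2 * sigma ** 2 / (mu1 - mu0) ** 2)
    se = sigma / sqrt(n)
    k = mu0 + z_alpha * se                      # reject H0 when the sample mean exceeds k
    power = 1 - std_normal.cdf((k - mu1) / se)  # P(sample mean > k | mu = mu1)
    return n, k, power

n_required, critical_value, achieved_power = right_tail_design(
    mu0=100, mu1=105, sigma=10, alpha=0.05, beta=0.10
)
print(n_required, critical_value, achieved_power)
```

Because $n$ is rounded up, the achieved power is slightly above the nominal $1-\beta$.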
## Analysis of Variance (ANOVA)
### One-Way
|$x_{ij}=\mu+\alpha_i+\epsilon_{ij}$, with $\alpha_i$ the between-group effect and $\epsilon_{ij}$ the within-group error|
|-|
|Total deviation: $(x_{ij}-\mu)=(x_i-\mu)+(x_{ij}-x_i)$, where $x_i$ is the mean of group $i$|
|$SST$ (total variation) $=SSTR$ (factor variation) $+SSE$ (random variation)|
|$SST=\sum^K_{i=1}\sum^{n_i}_{j=1}(x_{ij}-\bar{x}_{..})^2=\sum^K_{i=1}\sum^{n_i}_{j=1}x^2_{ij}-\cfrac{T_{..}^2}{N}$|
|$SSTR=\sum^K_{i=1}\sum^{n_i}_{j=1}(\bar{x}_{i.}-\bar{x}_{..})^2=\sum^K_{i=1}\cfrac{T^2_{i.}}{n_i}-\cfrac{T_{..}^2}{N}=\sum^K_{i=1}n_i(\bar{x}_{i.}-\bar{x}_{..})^2$|
|$SSE=\sum^K_{i=1}\sum^{n_i}_{j=1}(x_{ij}-\bar{x}_{i.})^2=\sum^K_{i=1}\sum^{n_i}_{j=1}x^2_{ij}-\sum^K_{i=1}\cfrac{T^2_{i.}}{n_i}=\sum^K_{i=1}(n_i-1)S^2_i$|

| Variance<br>Component | SS | df |MS|F|
| -------- | -------- | -------- |-|-|
|Between|SSTR|K-1|MSTR|$\frac{MSTR}{MSE}$|
|Within|SSE|N-K|MSE||
|Total|SST|N-1|||
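
The computational forms based on the totals $T_{i.}$ and $T_{..}$ translate directly into code. This sketch builds the one-way ANOVA quantities for three made-up groups; the data are purely illustrative.

```python
groups = [                      # hypothetical observations, one list per group
    [6, 8, 4, 5, 3, 4],
    [8, 12, 9, 11, 6, 8],
    [13, 9, 11, 8, 7, 12],
]

K = len(groups)
N = sum(len(g) for g in groups)
grand_total = sum(sum(g) for g in groups)            # T..
correction = grand_total ** 2 / N                    # T..^2 / N

sum_sq = sum(x * x for g in groups for x in g)
group_term = sum(sum(g) ** 2 / len(g) for g in groups)   # sum of T_i.^2 / n_i

SST = sum_sq - correction
SSTR = group_term - correction
SSE = sum_sq - group_term

MSTR = SSTR / (K - 1)
MSE = SSE / (N - K)
F = MSTR / MSE

print(SST, SSTR, SSE, F)
```

The statistic `F` is then compared with the upper-$\alpha$ critical value of $F(K-1, N-K)$ from a table.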
### Randomized Block Design
| Variance<br>Component | SS | df |MS|F|
| -------- | -------- | -------- |-|-|
|Between|SSR|c-1|MSR|$\frac{MSR}{MSE}$|
|Block|SSB|r-1|MSB|$\frac{MSB}{MSE}$|
|Within|SSE|(r-1)(c-1)|MSE||
|Total|SST|rc-1|||
### Two-Way without Replication
| Variance<br>Component | SS | df |MS|F|
| -------- | -------- | -------- |-|-|
|Row|SSR|r-1|MSR|$\frac{MSR}{MSE}$|
|Column|SSC|c-1|MSC|$\frac{MSC}{MSE}$|
|Within|SSE|(r-1)(c-1)|MSE||
|Total|SST|rc-1|||
### Two-Way with Replication
| Variance<br>Component | SS | df |MS|F|
| -------- | -------- | -------- |-|-|
|Row|SSR|r-1|MSR|$\frac{MSR}{MSE}$|
|Column|SSC|c-1|MSC|$\frac{MSC}{MSE}$|
|Interaction|SSI|(r-1)(c-1)|MSI|$\frac{MSI}{MSE}$|
|Within|SSE|rc(n-1)|MSE||
|Total|SST|rcn-1|||
### Homogeneity of Variance: Hartley's Test
|$H_0:\sigma^2_1=\sigma^2_2=...=\sigma^2_k=\sigma^2$|
|-|
|$H_1:$ the $\sigma^2_i$ are not all equal|
|$H=\cfrac{Max(S^2_i)}{Min(S^2_i)}$|
## Simple Regression
### Notation for Variation
|$SS_x=\sum(x_i-\bar{x})^2$|
|-|
|$SS_{xy}=\sum(x_i-\bar{x})(y_i-\bar{y})$|
|$S^2_x=\frac{1}{n-1}\sum(x_i-\bar{x})^2$|
|$S_{xy}=\frac{1}{n-1}\sum(x_i-\bar{x})(y_i-\bar{y})$|
### Regression Variation
|$\hat{y_i}=\hat{\alpha}+\hat{\beta}x_i$|
|-|
|$SST=\sum(y_i-\bar{y})^2=SS_y$|
|$SSR=\sum(\hat{y_i}-\bar{y})^2=\hat{\beta}^2SS_x$|
|$SSE=\sum(y_i-\hat{y_i})^2=\sum y^2_i-\hat{\alpha}\sum y_i-\hat{\beta}\sum x_iy_i$|
|$MSE=\frac{SSE}{n-2}$|

| Variance<br>Component | SS | df |MS|F|
| -------- | -------- | -------- |-|-|
|Regression|SSR|1|MSR|$\frac{MSR}{MSE}$|
|Error|SSE|n-2|MSE||
|Total|SST|n-1|||
### Solving for the Regression Coefficients
|$SSE=\sum(y_i-\hat{\alpha}-\hat{\beta}x_i)^2$|
|-|
|Solve the system $\frac{\partial{SSE}}{\partial{\hat{\alpha}}}=\frac{\partial{SSE}}{\partial{\hat{\beta}}}=0$|
|$\hat{\beta}=\cfrac{\sum x_iy_i-\frac{(\sum x_i)(\sum y_i)}{n}}{\sum x^2_i-\frac{(\sum x_i)^2}{n}}$|
|$\hat{\alpha}=\bar{y}-\hat{\beta}\bar{x}$|
|$E(MSE)=E(\frac{SSE}{n-2})=\sigma^2$|
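
The closed-form solution can be traced on a tiny data set. In this sketch the five $(x, y)$ pairs are invented; the slope and intercept follow the least-squares formulas, and $SSE$ is computed both from the residuals and from the shortcut $\sum y^2_i-\hat{\alpha}\sum y_i-\hat{\beta}\sum x_iy_i$ as a cross-check.

```python
xs = [1, 2, 3, 4, 5]   # hypothetical predictor values
ys = [2, 4, 5, 4, 5]   # hypothetical responses
n = len(xs)

sum_x, sum_y = sum(xs), sum(ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))
sum_x2 = sum(x * x for x in xs)
sum_y2 = sum(y * y for y in ys)

beta_hat = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
alpha_hat = sum_y / n - beta_hat * sum_x / n

# SSE two ways: directly from residuals, and via the shortcut formula.
sse_residuals = sum((y - (alpha_hat + beta_hat * x)) ** 2 for x, y in zip(xs, ys))
sse_shortcut = sum_y2 - alpha_hat * sum_y - beta_hat * sum_xy

mse = sse_residuals / (n - 2)   # estimate of sigma^2

print(alpha_hat, beta_hat, sse_residuals, sse_shortcut, mse)
```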
### Regression Model Tests
|Test whether the slope equals $\beta$|$t=\cfrac{\hat{\beta}-\beta}{\sqrt{\cfrac{MSE}{SS_x}}}$|
|-|-|
|Test whether the intercept equals $\alpha$|$t=\cfrac{\hat{\alpha}-\alpha}{\sqrt{\cfrac{MSE(\sum x^2_i)}{n SS_x}}}$|
|Interval for the mean (expected value) of y at a given x|$V(\hat{\mu}_{y\|x})=\sigma^2[\frac{1}{n}+\frac{(x-\bar{x})^2}{SS_x}]$ |
|Interval for an individual value of y at a given x|$V(\hat{y}_x)=\sigma^2[1+\frac{1}{n}+\frac{(x-\bar{x})^2}{SS_x}]$|
|Test whether the correlation coefficient $\rho$ equals zero|$t=\frac{r\sqrt{n-2}}{\sqrt{1-r^2}}\sim t_{(n-2)}$|
|Test whether the correlation coefficient $\rho$ equals $\rho_0$ | $Z_r=\frac{1}{2}\ln(\frac{1+r}{1-r})$<br>$Z_{\rho_0}=\frac{1}{2}\ln(\frac{1+\rho_0}{1-\rho_0})$<br>$Z=\frac{Z_r-Z_{\rho_0}}{\sqrt{\frac{1}{n-3}}}$|
### Pearson Correlation Coefficient
|Correlation coefficient|$r=\frac{S_{xy}}{S_xS_y}=\frac{SS_{xy}}{\sqrt{SS_xSS_y}}=\frac{\sum(x-\bar{x})(y-\bar{y})}{\sqrt{\sum(x-\bar{x})^2\sum(y-\bar{y})^2}}$|
|-|-|
|Coefficient of determination|$R^2=r^2=\frac{SSR}{SST}$|
### Multiple Regression
$S(\beta)=\epsilon'\epsilon=(y-x\beta)'(y-x\beta)=y'y-\beta'x'y-y'x\beta+\beta'x'x\beta=y'y-2\beta'x'y+\beta'x'x\beta$
$\frac{\partial S}{\partial \hat{\beta}} = -2x'y+2x'x\hat{\beta}=0 \Rightarrow x'x\hat{\beta}=x'y \Rightarrow \hat{\beta}=(x'x)^{-1}x'y$
$SSR=\hat{\beta}'x'y-\frac{(\Sigma y_i)^2}{n}$
$SSE=y'y-\hat{\beta}'x'y$
$SST=y'y-\frac{(\Sigma y_i)^2}{n}$
$Cov(\hat{\beta})=\sigma^2(x'x)^{-1} \quad Cov(\hat{\beta}_i,\hat{\beta}_j)=\sigma^2C_{ij}$, where $C_{ij}$ is the $(i,j)$ element of $(x'x)^{-1}$
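
The normal equations $x'x\hat{\beta}=x'y$ can be solved without any external library. The sketch below is one possible way to do it, using a small Gauss-Jordan solver written for this example; the helper names (`transpose`, `matmul`, `solve`) and the data are illustrative. The design matrix has an intercept column plus one predictor.

```python
def transpose(m):
    return [list(row) for row in zip(*m)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def solve(a, b):
    """Solve a @ x = b (b a vector) by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

X = [[1, 1], [1, 2], [1, 3], [1, 4], [1, 5]]   # hypothetical design matrix
y = [2, 4, 5, 4, 5]                            # hypothetical responses

Xt = transpose(X)
XtX = matmul(Xt, X)
Xty = [sum(a * b for a, b in zip(row, y)) for row in Xt]

beta_hat = solve(XtX, Xty)                     # solves x'x beta = x'y
yty = sum(v * v for v in y)
SSE = yty - sum(b * c for b, c in zip(beta_hat, Xty))   # y'y - beta' x'y

print(beta_hat, SSE)
```

Solving the linear system directly avoids forming $(x'x)^{-1}$ explicitly, which is generally the numerically safer route.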
### Residual Analysis
$e_i=y_i-\hat{y_i}$
$e=y-\hat{y}=y-X(X'X)^{-1}X'y=y-Hy=(I-H)y=(I-H)(X\beta +\epsilon)$
$V(e)=(I-H)V(\epsilon)(I-H)'=\sigma^2(I-H)(I-H)'=\sigma^2(I-H)$
1. Standardized Residuals
$d_i=\frac{e_i}{\sqrt{MSE}}$
2. Studentized Residuals
$r_i=\frac{e_i}{\sqrt{MSE(1-h_{ii})}}$
3. PRESS Residuals
$e_{(i)}=\frac{e_i}{1-h_{ii}}$, $PRESS=\sum^n_{i=1}e^2_{(i)}$
4. R-student
$S^2_{(i)}=\frac{ (n-p)MSE-\frac{e^2_i}{1-h_{ii}} } {n-p-1}$
$t_i=\frac{e_i}{\sqrt{S^2_{(i)}(1-h_{ii})}}$
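## Nonparametric Statistics
### Chi-Square Tests
* Goodness-of-fit test (tests whether the data follow a given distribution, e.g. Poisson, binomial, or normal)
|Expected count|$e_i=nP_i$ (number of trials $n$ times theoretical probability $P_i$)|
|-|-|
|Rejection region|$C=\lbrace \chi^2\|\chi^2>\chi^2_\alpha(k-1-m)\rbrace$, with $k$ categories and $m$ estimated parameters|
|Test statistic|$\chi^2=\sum^k_{i=1}\frac{(O_i-e_i)^2}{e_i}$|

| Count | 0 | 1 | 2 | ...|
| -------- | - | -|-|-|
| O~i~ observed| 30|27|10|3|
| e~i~ expected| 29.53|29.53|9.84|1.1|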
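
Using the observed and expected counts from the table above, the goodness-of-fit statistic is a one-line sum. This sketch only computes the statistic. The degrees of freedom $k-1-m$ depend on how many parameters were estimated, which the table does not state, so no critical value is looked up here.

```python
observed = [30, 27, 10, 3]             # O_i from the table
expected = [29.53, 29.53, 9.84, 1.1]   # e_i from the table

contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi_square = sum(contributions)

print(contributions)
print(chi_square)
```

Most of the statistic comes from the last category, whose expected count is small. A common rule of thumb is to merge categories with expected counts below 5 before testing.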
* Test of independence (tests whether two nominal variables are independent; also called a contingency-table test)
|Expected count|$e_{ij}=nP_{ij}=nP_iP_j=n\frac{T_i}{n}\frac{T_j}{n}=\frac{T_iT_j}{n}$|
|-|-|
|Rejection region|$C=\lbrace \chi^2\|\chi^2>\chi^2_\alpha((r-1)(c-1))\rbrace$|
|Test statistic|$\chi^2=\sum^r_{i=1}\sum^c_{j=1}\frac{(O_{ij}-e_{ij})^2}{e_{ij}}$|

| O~ij~ (e~ij~) | Yes | No | Total T~i~ |
| -------- | -|-|-|
| Item 1|$436(\frac{848*1691}{6913}=207)$|1255(1483)|1691|
| Item 2|208(292)|2174(2089)|2382|
| Item 3|204(348)|2636(2491)|2840|
| Total T~j~|848|6065|6913|
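
The expected counts and the test statistic for the contingency table can be derived from the observed cells alone. In this sketch the row and column totals are recomputed from the cell counts rather than typed in, so the expected counts always agree with the margins; the variable names are illustrative.

```python
observed = [        # rows: items 1-3; columns: yes, no (cell counts from the table)
    [436, 1255],
    [208, 2174],
    [204, 2636],
]

row_totals = [sum(row) for row in observed]         # T_i
col_totals = [sum(col) for col in zip(*observed)]   # T_j
n = sum(row_totals)

# e_ij = T_i * T_j / n under independence
expected = [[ti * tj / n for tj in col_totals] for ti in row_totals]

chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(row_totals) - 1) * (len(col_totals) - 1)

print(row_totals, col_totals, n)
print(expected)
print(chi_square, df)
```

The statistic is compared with $\chi^2_\alpha$ at `df` degrees of freedom; a value far above the critical value leads to rejecting independence.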