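Statistics Lab Notes

Variables

$\hat p$ = sample proportion
$\mu$ = population mean = measure of central tendency
$\sigma$ = population standard deviation = measure of dispersion ($\sigma^2$ is the population variance)
$p$ = population proportion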

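nominal scale: also called the categorical scale
- names, blood types…
ordinal scale: also called the rank scale
- rankings: 1, 2, 3
interval scale: equal-interval scale (distinct from the ratio scale, which also has a true zero)
- temperature: 10 -> 20 -> 30…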

https://www.myclass-lin.org/wordpress/archives/615

Qualitative Data: non-numeric (categorical) data
Quantitative Data: numeric data
- discrete random variables
- continuous random variables

Random Variable

Given a measurable space $(S, \mathbb{F})$, a real-valued function $X: S \to \mathbb{R}$ that is $\mathbb{F}$-measurable is called a (real-valued) random variable.

A random variable is a measurable function ${\displaystyle X\colon \Omega \to E}$ from a set of possible outcomes $\Omega$ to a measurable space $E$.

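Variance

Algebraic properties
$\sigma^2={1 \over N}\sum_{i=1}^N(X_i-\mu)^2$

Expanding the square and rearranging gives
$\sum X_i^2=N\sigma^2+N\mu^2$

Equivalently, "$\sigma^2=$ expectation of the square minus the square of the expectation":
$\sigma^2=\sum_x x^2 f(x)-\mu^2$

The sample variance behaves the same way:
$\sum x_i^2=(n-1)s^2+n \bar x^2$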
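Translation invariance
- Shifting every observation by a constant leaves the variance unchanged: $Var(X+c)=Var(X)$
- The derivation is short: the shift moves $\mu$ by the same $c$, so every deviation $X_i-\mu$ is unchanged
Quadratic scaling
- Scaling the data by $a$ scales the variance by $a^2$
- Original data: $X_1,X_2,X_3,\dots,X_N$
- Let $Y_i=aX_i$
- Then $\mu_Y=a\mu_X$
- Substitute $aX_i$ into the standard deviation formula for $Y$ and factor out $a$
- This gives $\sigma_Y=|a|\sigma_X$, so $\sigma_Y^2=a^2\sigma_X^2$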
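The variance identities in this section can be checked numerically. The sketch below uses a small made-up data set (the values are an assumption, not from the notes) and exact fractions, so the comparisons are not affected by rounding.

```python
from fractions import Fraction

def pop_mean(xs):
    return sum(xs) / len(xs)

def pop_var(xs):
    # Population variance: divide by N, not N - 1.
    m = pop_mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

xs = [Fraction(v) for v in (2, 4, 4, 5, 10)]  # hypothetical data
N, mu, var = len(xs), pop_mean(xs), pop_var(xs)

# Sum-of-squares identity: sum X_i^2 = N*sigma^2 + N*mu^2
identity_holds = sum(x * x for x in xs) == N * var + N * mu ** 2

# Shifting by a constant leaves the variance unchanged.
shift_invariant = pop_var([x + 7 for x in xs]) == var

# Scaling by a multiplies the variance by a^2 (here a = -3).
scale_rule = pop_var([-3 * x for x in xs]) == 9 * var

print(mu, var, identity_holds, shift_invariant, scale_rule)
```

Using a negative scale factor shows why the standard deviation scales by $|a|$ while the variance scales by $a^2$.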

Covariance

$\sigma_{XY}={Cov}(X,Y)$
$=\sum_y\sum_x(x-\mu_X)(y-\mu_Y)f(x,y)$
$={E}((X-\mu_X)(Y-\mu_Y))$　　(definition)
$={E}(XY-\mu_X \cdot Y-\mu_Y \cdot X+\mu_X\mu_Y)$
$=E(XY)-\mu_X \cdot E(Y) - \mu_Y \cdot E(X)+\mu_X \mu_Y$
$=E(XY)-\mu_X \mu_Y$
$=E(XY)-E(X)E(Y)$　　(computational form)

Left as an exercise to prove:

$Var(aX+bY)=a^2Var(X)+b^2Var(Y)+2ab \cdot Cov(X,Y)$

Properties:
- ${Cov}(X,a)=0$ for any constant $a$
- ${Cov}(X,Y)={Cov}(Y,X)$
- ${Cov}(X,X)=Var(X)={\sigma}^2_X$
- ${Cov}(X+d,Y+c)={Cov}(X,Y)$
- ${Cov}(aX,bY)=a \cdot b\cdot {Cov}(X,Y)$
  - E.g. ${Cov}(-2X-5,3Y-7)=(-2)(3){Cov}(X,Y)=-6{Cov}(X,Y)$
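A quick numerical check of the covariance properties, including the worked example and the $Var(aX+bY)$ expansion. The paired data are made up for illustration, and population (divide-by-$N$) moments are assumed throughout.

```python
from fractions import Fraction

def mean(v):
    return sum(v) / len(v)

def cov(xs, ys):
    # Population covariance: E[(X - mu_X)(Y - mu_Y)]
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

xs = [Fraction(v) for v in (1, 2, 4, 7)]  # hypothetical paired data
ys = [Fraction(v) for v in (3, 1, 6, 2)]

# Cov(-2X - 5, 3Y - 7) = -6 Cov(X, Y)
lhs = cov([-2 * x - 5 for x in xs], [3 * y - 7 for y in ys])
rhs = -6 * cov(xs, ys)

# Var(aX + bY) = a^2 Var(X) + b^2 Var(Y) + 2ab Cov(X, Y), with Var(W) = Cov(W, W)
a, b = 2, -3
combo = [a * x + b * y for x, y in zip(xs, ys)]
var_combo = cov(combo, combo)
expanded = a * a * cov(xs, xs) + b * b * cov(ys, ys) + 2 * a * b * cov(xs, ys)

print(lhs == rhs, var_combo == expanded)
```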

Chebyshev's Theorem

https://zh.wikipedia.org/wiki/切比雪夫不等式
$P( |X- \mu| \lt z \sigma) \ge 1 - {1 \over z^2}$

Proof

By Markov's inequality, for a nonnegative random variable $Y$ and $a \gt 0$ we have $P(Y \ge a) \le {E(Y) \over a}$. Take $Y = (X-\mu)^2$ with threshold $a^2$:
$\Rightarrow P((X-\mu)^2 \ge a^2) \le {E((X-\mu)^2) \over a^2}$
$\Rightarrow P(|X-\mu| \ge a) \le {Var(X) \over a^2} = {\sigma^2 \over a^2}$
Replacing $a$ by $a\sigma$:
$\Rightarrow P( |X- \mu| \ge a \sigma) \le {1 \over a^2}$
That is Chebyshev's Theorem!
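As an illustration, the bound can be checked on any data set, since it holds for the empirical distribution as long as $\sigma$ is the population (divide-by-$N$) standard deviation. The skewed exponential sample below is an arbitrary choice, not something taken from the notes.

```python
import random

random.seed(0)
data = [random.expovariate(1.0) for _ in range(10_000)]  # hypothetical skewed sample

n = len(data)
mu = sum(data) / n
sigma = (sum((x - mu) ** 2 for x in data) / n) ** 0.5

def tail_fraction(k):
    # Share of observations at least k standard deviations from the mean.
    return sum(abs(x - mu) >= k * sigma for x in data) / n

# Map each k to (observed tail fraction, Chebyshev bound 1/k^2).
results = {k: (tail_fraction(k), 1 / k ** 2) for k in (1.5, 2, 3)}
for k, (frac, bound) in results.items():
    print(k, frac, bound)
```

The observed fractions are typically far below the bound, which is expected: Chebyshev makes no assumption about the shape of the distribution.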

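Probability review

e.g.:

| | NTU | NSYSU | NCCU | (headcount) |
| --- | --- | --- | --- | --- |
| Male | 30 | 66 | 234 | 330 |
| Female | 18 | 42 | 210 | 270 |
| Total | 48 | 108 | 444 | 600 |

Contingency table (joint probabilities)

| | NTU | NSYSU | NCCU | Probability |
| --- | --- | --- | --- | --- |
| Male | 0.05 | 0.11 | 0.39 | 0.55 |
| Female | 0.03 | 0.07 | 0.35 | 0.45 |
| Probability | 0.08 | 0.18 | 0.74 | 1 |

Marginal probability: in a sample space with two or more events, the probability of one event occurring on its own, regardless of the others.
These are the rightmost column and the bottom row of the table.

Independent events: $P(A \cap B) = P(A)P(B)$; review on your own.

$P(A|M)$: read as "the probability of $A$ given $M$"

With discrete distributions, watch whether an inequality includes the endpoint ($\le$ versus $\lt$).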

Axioms:

$\sum_x P(x)=1$ (for a density, $\int f(x)\,dx=1$)
$0\le P(A) \le 1$, $\forall A \subset \Omega$
$P(\Omega)=1$
Let $A_{1},A_{2},\dots$ be events in the sample space $\Omega$ with $A_{i}\cap A_{j} = \emptyset$ for all $i \ne j$; then $P(\cup_{i=1}^{\infty}A_i)=\sum_{i=1}^{\infty}P(A_{i})$.

Bayes' theorem:
Let $A_1,A_2,\dots,A_n$ be a partition of $\Omega$ and $B$ any event in $\Omega$ with $P(B) \gt 0$; then $P(A_i|B)=\frac{P(B|A_i)P(A_i)}{\sum_{j=1}^{n}P(B|A_j)P(A_j)}$
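A small sketch of Bayes' theorem using the headcount table from the probability review: the partition is {male, female} and the event $B$ is "attends 台大 (NTU)". The dictionary and function names are illustrative only.

```python
from fractions import Fraction

# Priors P(A_i) and likelihoods P(B | A_i), read off the headcount table.
prior = {"male": Fraction(330, 600), "female": Fraction(270, 600)}
likelihood = {"male": Fraction(30, 330), "female": Fraction(18, 270)}

def posterior(prior, likelihood):
    # Denominator: total probability of B over the partition.
    evidence = sum(likelihood[a] * prior[a] for a in prior)
    return {a: likelihood[a] * prior[a] / evidence for a in prior}

post = posterior(prior, likelihood)
print(post["male"], post["female"])
```

The result for "male" matches the direct count 30/48 from the table, which is a useful sanity check on the formula.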

Expectation

$E(X)=\mu$
$Var(X)=\sigma^2 = E[(X-\mu)^2]$

Distributions

r.v. $X$, $X \sim B(n,p)$, where $\sim$ reads "follows"
$f_X(x)=\begin{cases} C^{n}_{x}p^x(1-p)^{n-x}, & x \in \{0,1,\dots,n\} \\ 0, & \text{otherwise} \end{cases}$
$p$: probability of success

Binomial distribution: with $n = 1$ it reduces to the Bernoulli distribution

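Probability function

For a discrete r.v. $X$, $f_X(x)=\begin{cases} P(X=x), & x\in R_X \\ 0, & x \notin R_X \end{cases}$, where $R_X$ is the range of $X$

$R_X=\{x \mid x = X(\omega), \omega\in \Omega\}$
$X:\Omega\to\mathbb{R}$

$f_{XY}(x,y)=\begin{cases} P(X=x,Y=y), & (x,y)\in R_{XY} \\ 0, & (x,y)\notin R_{XY} \end{cases}$

The instructor likes to put it this way: whatever goes inside $P(\cdot)$ must describe a complete event, e.g. $P(Z<z)$, whereas $f(\cdot)$ takes a value of the variable, e.g. $f(x)$
* class P(Event);
* class f(var);

$f(z) \ne P(Z<z)$
$f(z)$ is the probability density at a single point
$P(Z<z)$ is the probability of an event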

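Distribution

The Poisson and normal distributions are the ones noted here as closed under addition: a sum of independent variables from the family stays in the family

Discrete

Bernoulli distribution

\[P(x) = p^xq^{1-x}\]

Run a single success/failure trial and let $x$ be the number of successes
$R_x = \{0,1\}$
Parameter: $0 \le p \le 1$, with $q = 1-p$
$X \sim Ber(p)$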

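Binomial distribution

iid: independent and identically distributed

Definition
The discrete probability distribution of the number of successes in $n$ independent yes/no trials, each with success probability $p$, is the binomial distribution.

\[P(x) = {n\choose x} p^x q^{n-x}\]

Repeat a Bernoulli trial $n$ times
$R_x = \{0, 1, \dots, n\}$
Binomial additivity
- $X_1, \dots, X_n \sim^{iid} Ber(p) \Rightarrow \sum X_i \sim B(n,p)$
The binomial distribution is the discrete counterpart of the normal distribution
$E(x) = np$
$Var(x) = npq$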
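The binomial formulas can be verified directly from the probability function. This is a minimal sketch with arbitrarily chosen $n = 10$ and $p = 0.3$; `binom_pmf` is a helper name introduced here.

```python
from math import comb

def binom_pmf(x, n, p):
    # P(X = x) = C(n, x) p^x q^(n - x)
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 10, 0.3  # hypothetical parameters
pmf = [binom_pmf(x, n, p) for x in range(n + 1)]

total = sum(pmf)                                            # should be 1
mean = sum(x * f for x, f in enumerate(pmf))                # should be n*p
var = sum((x - mean) ** 2 * f for x, f in enumerate(pmf))   # should be n*p*q

print(total, mean, var)
```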

Poisson distribution

Closed under addition
\[P(x) ={e^{- \lambda } \lambda^x \over x!}\]

Definition
A discrete random variable X is said to have a Poisson distribution with parameter λ > 0, if, for x = 0, 1, 2, …, the probability mass function of X is given by:
\[P(x) ={e^{- \lambda } \lambda^x \over x!}\]

Counts events over a continuous unit of time, length, area, or volume: a Poisson process
- A Poisson process must be homogeneous and independent
$R_x = \mathbb{N} \cup \{0\}$
$\lambda$ is the expected number of occurrences of the event
$\lambda = E(X) = Var(X)$
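To illustrate both $\lambda = E(X) = Var(X)$ and the closure property, the sketch below sums the probability function over a truncated support and convolves two independent Poisson variables. The rates and the truncation points are arbitrary assumptions.

```python
from math import exp, factorial

def pois_pmf(x, lam):
    return exp(-lam) * lam ** x / factorial(x)

lam = 4.0
support = range(60)  # truncation; the tail beyond 60 is negligible for lam = 4

mean = sum(x * pois_pmf(x, lam) for x in support)
var = sum((x - mean) ** 2 * pois_pmf(x, lam) for x in support)

def conv_at(z, lam1, lam2):
    # P(X + Y = z) for independent X ~ Poisson(lam1), Y ~ Poisson(lam2)
    return sum(pois_pmf(x, lam1) * pois_pmf(z - x, lam2) for x in range(z + 1))

# Closure: Poisson(1.5) + Poisson(2.5) should match Poisson(4.0) pointwise.
closure_gap = max(abs(conv_at(z, 1.5, 2.5) - pois_pmf(z, 4.0)) for z in range(20))

print(mean, var, closure_gap)
```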

Hypergeometric

Definition

The result of each draw (the elements of the population being sampled) can be classified into one of two mutually exclusive categories.
The probability of a success changes on each draw, as each draw decreases the population.

\[P(x) = {{k \choose x}{N-k \choose n-x} \over {N \choose n}}\]

$E(x)=n{k \over N}$
Draw $n$ items without replacement from a population of $N$ that contains $k$ successes; $x$ is the number of successes drawn
$Var(x)=n({k \over N})(1-{k \over N})({N-n \over N-1})$
Correction factor: ${N-n \over N-1}$. Because the population is finite, each draw affects the next, which shrinks the variance; this is called the finite population correction factor.
$R_x = \{0, 1, 2, \dots, n\}$
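The mean and variance formulas, including the finite population correction $\frac{N-n}{N-1}$, can be confirmed exactly from the probability function. The values $N = 20$, $k = 7$, $n = 5$ are made up for this sketch.

```python
from fractions import Fraction
from math import comb

def hyper_pmf(x, N, k, n):
    # P(X = x): choose x of the k successes and n - x of the N - k failures.
    return Fraction(comb(k, x) * comb(N - k, n - x), comb(N, n))

N, k, n = 20, 7, 5  # hypothetical population, successes, draws
pmf = {x: hyper_pmf(x, N, k, n) for x in range(min(n, k) + 1)}

mean = sum(x * f for x, f in pmf.items())
var = sum((x - mean) ** 2 * f for x, f in pmf.items())

# Closed forms from the notes.
mean_formula = Fraction(n * k, N)
var_formula = n * Fraction(k, N) * (1 - Fraction(k, N)) * Fraction(N - n, N - 1)

print(mean, var)
```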

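Continuous

Normal

Closed under addition

\[f(x) = {1 \over \sqrt{2 \pi} \sigma} e^{{-1 \over 2}({x- \mu \over \sigma})^2}\]

Definition
Plotting the probability of the observed values of a continuous variable gives a distribution with these properties:
A unimodal, bell-shaped curve that is symmetric about the mean.
The observed values range from negative infinity to positive infinity.

$X \sim N(\mu, \sigma^2)$
The density has no closed-form integral, so probabilities are looked up in a table
- Every normal distribution has its own $\sigma, \mu$, so which table do we use?
  - Fix one standard normal distribution: $Z \sim N(0,1)$
  - Standard Normal Probability Distribution
  - $f(z) = {1 \over \sqrt{2 \pi}} e^{{-1 \over 2}z^2}$
Computing Probabilities for Any Normal Probability Distribution
- Standardization
- $X \sim N(\mu, \sigma^2)$, let $Z = {X-\mu \over \sigma} \sim N(0,1)$
A linear transformation of a normal variable is still normal
- For $aX+b$: the mean becomes $a\mu+b$, the standard deviation scales by $|a|$, and the variance by $a^2$
- $E(\bar x) = \mu$
- $Var(\bar x) = {\sigma^2 \over n}$
De-standardization
- $Z \sim N(0,1)$
- Let $X = \sigma Z + \mu$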
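Standardization and its inverse can be sketched with Python's `statistics.NormalDist`, which plays the role of the z table. The parameters $\mu = 100$, $\sigma = 15$ and the cutoff 120 are arbitrary illustrative values.

```python
from statistics import NormalDist

mu, sigma = 100.0, 15.0   # hypothetical normal population
X = NormalDist(mu, sigma)
Z = NormalDist()          # standard normal N(0, 1)

x = 120.0
z = (x - mu) / sigma      # standardize

p_direct = X.cdf(x)       # P(X < 120) computed on X itself
p_table = Z.cdf(z)        # same probability via the standard normal

# De-standardize: recover x from the z quantile.
x_back = sigma * Z.inv_cdf(p_table) + mu

print(z, p_direct, p_table, x_back)
```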

Normal Approximation of Binomial Probabilities

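Yates continuity correction
Enclose each discrete value with boundaries at ± 0.5, e.g. $P(X = k) \approx P(k - 0.5 \lt Y \lt k + 0.5)$ for the approximating normal $Y$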
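The effect of the ± 0.5 correction can be seen by comparing an exact binomial probability with its normal approximation, with and without the correction. The choice of $B(100, 0.5)$ and the range 45 to 55 is an assumption made for this sketch.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 100, 0.5
lo, hi = 45, 55  # we want P(45 <= X <= 55)

# Exact binomial probability.
exact = sum(comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(lo, hi + 1))

# Approximating normal with matching mean np and standard deviation sqrt(npq).
approx = NormalDist(n * p, sqrt(n * p * (1 - p)))
plain = approx.cdf(hi) - approx.cdf(lo)                  # no correction
corrected = approx.cdf(hi + 0.5) - approx.cdf(lo - 0.5)  # boundaries widened by 0.5

print(exact, plain, corrected)
```

Without the correction the approximation leaves out half a unit at each end, so it noticeably underestimates the exact probability.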

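Exponential probability distribution

\[f(x) = {1 \over \mu} e^{-x \over \mu}\]
https://zh.wikipedia.org/zh-tw/指数分布

Let τ be a random variable whose probability density satisfies

$f_τ(t):=λ e^{−λt}, if\ t \ge 0;$
$f_τ(t):=0, if\ t \lt 0$

where λ>0 is a constant. Then τ is said to follow an exponential distribution, or to be an exponential random variable.

$E(x) = {\int}^{\infty}_0 x{1 \over \mu} e^{-x \over \mu} dx =\mu$ (by integration by parts)
$Var(x) = \mu^2$

Formula: $P(x>x_0)=e^{-x_0 \over \mu}$
Proof: $P(x>x_0)={\int}^{\infty}_{x_0}{1 \over \mu} e^{-x \over \mu} dx = [-e^{-x \over \mu}]^{\infty}_{x_0} = e^{-x_0 \over \mu}$

A counting process is a Poisson process $\iff$ its inter-arrival times follow an exponential distribution
The exponential mean $\mu$ and the Poisson rate $\lambda$ are reciprocals: $\mu = {1 \over \lambda}$
Watch the units; sticking to one standard unit avoids mistakes

e.g.:
Poisson: ${e^{- \lambda } \lambda^x \over x!}$
$\iff$
Exponential: ${\lambda} e^{-y \lambda}$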
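One way to see the Poisson and exponential link: "no event in a window of length $t$" (Poisson count of zero with mean $\lambda t$) is the same event as "the waiting time exceeds $t$" (exponential tail). The rate of 3 events per hour and the half-hour window are made-up values, and both are expressed in hours to keep the units consistent.

```python
from math import exp, factorial

lam = 3.0        # hypothetical Poisson rate: events per hour
mu = 1 / lam     # exponential mean waiting time, in hours
t = 0.5          # window length, in hours

# Poisson: P(N = 0) when the expected count in the window is lam * t.
p_no_event = exp(-lam * t) * (lam * t) ** 0 / factorial(0)

# Exponential: P(T > t) = e^(-t / mu).
p_wait_longer = exp(-t / mu)

print(p_no_event, p_wait_longer)
```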

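Sampling and Sampling Distributions

definition

The distribution of a sample statistic is called a sampling distribution

sampling

Finite population
- hypergeometric, sampling w/o replacement, dependent
- items are not put back after being drawn
Infinite population
- binomial, sampling w/ replacement, independent

Statistical Inference

Estimation
Testing

We mainly want to estimate three things:
the mean, the standard deviation, and the proportion.
We call these population parameters.

e.g. $X_1, X_2, \dots, X_n$
$\bar{x} = {1 \over n} \Sigma X_i$
$Var(\bar x) = Var( {1 \over n} \Sigma X_i) = {\sigma^2 \over n}$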

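Point estimation

Key point: $\bar x$ is a handy estimator

$x_1, x_2, \dots, x_n \sim^{iid} f(x; \theta)$

Use $\hat \theta$ to infer the population parameter $\theta$

An estimate and an estimator are different: an estimator is a rule applied to the sample, an estimate is the value it produces, and there are infinitely many possible estimators
The hat marks an estimator

Unbiasedness

$Bias({\hat \theta}) = E({\hat \theta}) - \theta$

Overestimating estimator
$Bias(\hat \theta)>0 \iff E(\hat \theta)>\theta$
Unbiased estimator
$Bias(\hat \theta)=0 \iff E(\hat \theta)=\theta$
Underestimating estimator
$Bias(\hat \theta)<0 \iff E(\hat \theta)<\theta$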

Proof that $E(s^2)-\sigma^2 =0$

$E(s^2) = E({1 \over (n-1)} (\Sigma(x^2_i) - n{\bar x}^2))$
$= {1 \over (n-1)} (\Sigma(E(x^2_i))-nE({\bar x}^2))$
$= {1 \over (n-1)} (\Sigma(Var(x_i)+E^2(x_i))-nE({\bar x}^2))$
$= {1 \over (n-1)} (\Sigma(\sigma^2+\mu^2)-nE({\bar x}^2))$
$= {1 \over (n-1)} (\Sigma(\sigma^2+\mu^2)-n({\sigma^2 \over n}+\mu^2))$
$= {1 \over (n-1)} (n\sigma^2+n\mu^2-\sigma^2-n\mu^2)$
$= {1 \over (n-1)} (n\sigma^2-\sigma^2)$
$=\sigma^2$

The steps can also be written in reverse.
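The unbiasedness of $s^2$ can be confirmed exactly, without simulation, by enumerating every possible iid sample from a tiny population. The three-value population and sample size 3 below are assumptions chosen to keep the enumeration small.

```python
from fractions import Fraction
from itertools import product

values = [Fraction(v) for v in (1, 2, 6)]  # hypothetical population, each value equally likely
mu = sum(values) / len(values)
sigma2 = sum((v - mu) ** 2 for v in values) / len(values)

def sample_var(xs, ddof):
    # ddof=1 gives s^2 (divide by n - 1); ddof=0 divides by n.
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - ddof)

n = 3
samples = list(product(values, repeat=n))  # all 27 equally likely iid samples

e_s2 = sum(sample_var(s, 1) for s in samples) / len(samples)
e_biased = sum(sample_var(s, 0) for s in samples) / len(samples)

print(sigma2, e_s2, e_biased)
```

Dividing by $n$ instead of $n-1$ underestimates $\sigma^2$ by the factor $\frac{n-1}{n}$, which is what the second average shows.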

Efficiency

Efficiency is measured by an estimator's mean squared error; the smaller it is, the more efficient the estimator.

sum of least squares

Wiki

Consistency

As the sample size grows, the estimate converges to the true value of the population parameter.

A consistent estimator is one for which, when the estimate is considered as a random variable indexed by the number n of items in the data set, as n increases the estimates converge in probability to the value that the estimator is designed to estimate.

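Interval estimation

Confidence interval (C.I.)

$[L,U]$ estimates $\theta$ at the $(1-\alpha)100\%$ confidence level
A higher confidence level $(1-\alpha)100\%$ means a wider interval $[L, U]$ is needed to contain the true population parameter $\theta$

$(1-\alpha)$ is the area in the middle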

$1-\alpha = P(L \lt \theta \lt U)$

Pivotal Quantity

A pivotal quantity involves
- random variables
- the unknown parameter to be estimated

https://en.wikipedia.org/wiki/Pivotal_quantity

A pivotal quantity or pivot is a function of observations and unobservable parameters such that the function's probability distribution does not depend on the unknown parameters.

It is usually the t or z statistic built from the point estimator

A function of $x_1, x_2, \dots, x_n$ and $\theta$,
written $Q({\hat \theta}; \theta)$, whose probability distribution does not depend on any unknown parameter
(i.e. it is completely known)
$g(\hat \theta ,\theta) = \sqrt{n}\frac{\hat \theta - \theta}{s}$

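Finding the $(1-\alpha)100\%$ confidence interval for $\theta$

Find a suitable estimator
Find a suitable pivotal quantity and its probability distribution
- the distribution of the point estimator
$1-\alpha = P(L \lt \theta \lt U)$
- $1-\alpha = P(-k \lt g(\hat \theta ,\theta) \lt k) = P({\hat \theta}-k{s \over \sqrt n} \lt \theta \lt {\hat \theta}+k{s \over \sqrt n})$
- $k$ is looked up in a table

Margin of error: $E = {\sigma \over \sqrt n}{z_{\alpha \over 2}}$

Why does the t distribution have n-1 degrees of freedom?

Because $s$ is computed around $\bar x$, which estimates the one unknown parameter ($\mu$),
one degree of freedom is used up, leaving $n-1$

When $\sigma$ is known, the pivotal quantity is z

When the degrees of freedom are large, the z table can be used as an approximation to the t table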

Interval estimation of the variance

http://mail.tku.edu.tw/yinghaur/lee/stat-new/第十章補充–%E7%B5%B1%E8%A8%88%E4%BC%B0%E8%A8%88(%E6%AF%8D%E9%AB%94%E8%AE%8A%E7%95%B0%E6%95%B8%E4%B9%8B%E5%8D%80%E9%96%93%E4%BC%B0%E8%A8%88).pdf

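Meaning of a confidence interval

If the experiment is repeated $k$ times, on average a fraction $1-\alpha$ of the resulting intervals contain the unknown parameter.

Notation:
- $1-\alpha = P({\bar x}-{\sigma \over \sqrt n}z_{\alpha \over 2} \le \mu \le {\bar x}+{\sigma \over \sqrt n}z_{\alpha \over 2})$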
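A minimal sketch of the z interval for $\mu$ with known $\sigma$, using `statistics.NormalDist().inv_cdf` in place of the z table. The function name `z_interval` and the sample figures ($\bar x = 50$, $\sigma = 10$, $n = 25$) are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, alpha=0.05):
    # z_{alpha/2}: upper alpha/2 quantile of the standard normal.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z * sigma / sqrt(n)  # margin of error
    return xbar - margin, xbar + margin

low, high = z_interval(xbar=50.0, sigma=10.0, n=25)
print(low, high)
```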

Confidence interval for a sample proportion

Interval estimation of a single population proportion

$X_1, X_2, \dots, X_n \sim^{iid} Ber(p)$

Point estimate: $\hat p$ estimates $p$
$\hat p \sim^a_{CLT} N(p, \sigma^2_{\hat p})$
- $z = {{\hat p - p} \over \sqrt{\hat p (1- \hat p) \over n}}$
- $a$ means asymptotically
- asymptotically normal by the central limit theorem
$1-\alpha = P(|\hat p - p| \lt z_{\alpha \over 2}SE(\hat p))$
- SE = standard error

margin of error = $z_{\alpha \over 2}\sqrt{\hat p(1- \hat p) \over n}$
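The same idea for a proportion, as a hedged sketch: the counts (120 successes out of 300) are invented, and the large-sample normal approximation is assumed to apply.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(successes, n, alpha=0.05):
    p_hat = successes / n
    se = sqrt(p_hat * (1 - p_hat) / n)              # standard error of p-hat
    margin = NormalDist().inv_cdf(1 - alpha / 2) * se
    return p_hat, margin, (p_hat - margin, p_hat + margin)

p_hat, margin, interval = proportion_interval(120, 300)
print(p_hat, margin, interval)
```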

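Hypothesis testing

Let the sample data speak
Power: the power of a test measures how effective the test is:
- e.g. (figure not preserved):
  - left panel: high power; right panel: low power

| | Presumption of guilt | Presumption of innocence |
| --- | --- | --- |
| H0 | guilty | innocent |
| Ha | innocent (bears the burden of proof) | guilty |

| | H0 true | H0 false |
| --- | --- | --- |
| reject | $\alpha$ type one error | 1-$\beta$ |
| Do not reject | 1-$\alpha$ | $\beta$ type two error |

If the problem does not specify $\alpha$, 0.05 is the usual default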

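p-value

The tail probability of the observed sample value

A p-value is a probability that provides a measure of the evidence against the null hypothesis provided by the sample.
Smaller p-values indicate more evidence against $H_0$.

魏丞偉's approach removes the absolute value from the test statistic: if the test statistic is $x$, then $|x| > a \Rightarrow x > a$ or $x < -a$; look up the tail probabilities above $a$ and below $-a$, and their sum is the p-value.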
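The two-tail calculation described above can be sketched for a z statistic. The observed value 2.0 is made up, and a standard normal test statistic is assumed.

```python
from statistics import NormalDist

def two_tailed_p(z_obs):
    Z = NormalDist()
    a = abs(z_obs)
    upper = 1 - Z.cdf(a)   # P(Z > a)
    lower = Z.cdf(-a)      # P(Z < -a)
    return lower + upper

p_value = two_tailed_p(2.0)
reject = p_value <= 0.05   # compare with alpha = 0.05
print(p_value, reject)
```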

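Three equivalent approaches to hypothesis testing

Critical value approach
- Test statistic
p-value approach
- the tail probability of the observed sample value
  - for a two-tailed test, add the probabilities of both tails
Interval estimation approach
- starting from $\bar x$, compute the confidence interval

All three always reach the same conclusion

Population variance unknown

The sample variance has to be computed from the data, so the t distribution is used

Assume a normal population
1. State H0
2. Choose $\alpha$
3. Compute the test statistic
  - $T = {{\bar x - \mu_0} \over {s \over {\sqrt{n}}}} \sim t(n-1)$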

Definition of Student-T distribution

$T_\nu = {Z \over \sqrt{\chi^2_\nu \over \nu}} \sim t(\nu)$
$Z$ is a standard normal variable
$\nu$ is the degrees of freedom
$\chi^2_\nu$ is a chi-square variable with $\nu$ degrees of freedom, independent of $Z$

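Required sample size

One-tailed test
$\mu_0-{\sigma \over \sqrt{n}}z_\alpha = \mu_a+{\sigma \over \sqrt{n}}z_\beta$

Left and right tails are interchangeable, so the left-tailed test is shown; the calculation is the same.

Therefore, $n={\sigma^2(z_\alpha+z_\beta)^2 \over (\mu_0 - \mu_a)^2}$
Note that $\alpha$ may have to be divided by 2 here for a two-tailed test

Intuition: the cutoff computed from $\alpha$ must equal the cutoff computed from $\beta$, and this shared cutoff ties $\alpha$, $\beta$, and $n$ together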
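The sample size formula can be wrapped in a small helper. This is a sketch under the stated assumptions (known $\sigma$, normal test statistic); `required_n` and the example numbers are hypothetical, and the result is rounded up because $n$ must be an integer.

```python
from math import ceil
from statistics import NormalDist

def required_n(sigma, mu0, mua, alpha, beta, two_tailed=False):
    # For a two-tailed test, alpha is split between the two tails.
    a = alpha / 2 if two_tailed else alpha
    z_alpha = NormalDist().inv_cdf(1 - a)
    z_beta = NormalDist().inv_cdf(1 - beta)
    return ceil(sigma ** 2 * (z_alpha + z_beta) ** 2 / (mu0 - mua) ** 2)

n_one = required_n(sigma=10, mu0=100, mua=95, alpha=0.05, beta=0.10)
n_two = required_n(sigma=10, mu0=100, mua=95, alpha=0.05, beta=0.10, two_tailed=True)
print(n_one, n_two)
```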

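Tests for two independent populations

Case I: normal populations, $\sigma_1^2 , \sigma_2^2$ both known

Recall: $\bar x_1 - \bar x_2$ estimates $\mu_1 - \mu_2$
$a\bar x_1 - b\bar x_2 \sim N(a\mu_1 - b\mu_2, {(a\sigma_1)^2\over n_1} + {(b\sigma_2)^2\over n_2})$
This follows from $Var(aX+bY)=a^2Var(X)+b^2Var(Y)+2ab \cdot Cov(X,Y)$, with $Cov = 0$ for independent samples

Then proceed the same way as before and plug in the quantities

$\sigma = \sqrt{{(a\sigma_1)^2\over n_1} + {(b\sigma_2)^2\over n_2}}$, which I personally call coSigma

Tip

In a hypothesis test a constant should sit on the right-hand side (wording to be improved), so move the variables to the left before testing.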

$H_0: \mu_0 > \mu_1$
$\to \mu_0 - \mu_1 > 0$

Power:

$power = 1- \beta$

Case II: normal populations, variances both unknown

Use the t distribution

Equal variances (homogeneous)

Homogeneous variance assumption: $\sigma_1 = \sigma_2$

$S_p^2 = \hat\sigma^2 = {{(n_1 - 1)S_1^2+(n_2 - 1)S_2^2} \over {n_1 + n_2 - 2}}$
Substituting this in gives the

test statistic $TS = {{(\bar x_1 - \bar x_2)-(\mu_1 - \mu_2)} \over \sqrt{S^2_p({1 \over n_1}+{1 \over n_2})}}$

Degrees of freedom: $n_1 + n_2 - 2$
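A short sketch of the pooled-variance test statistic computed from summary statistics. The function name `pooled_t` and the sample summaries are invented for illustration; the hypothesized difference $\mu_1 - \mu_2$ defaults to 0.

```python
from math import sqrt

def pooled_t(x1, s1_sq, n1, x2, s2_sq, n2, delta0=0.0):
    # Pooled variance: weighted average of the two sample variances.
    sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
    se = sqrt(sp_sq * (1 / n1 + 1 / n2))
    t = ((x1 - x2) - delta0) / se
    df = n1 + n2 - 2
    return sp_sq, t, df

sp_sq, t, df = pooled_t(x1=20.0, s1_sq=4.0, n1=10, x2=18.0, s2_sq=6.0, n2=12)
print(sp_sq, t, df)
```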

Unequal variances

Test statistic $TS = {{(\bar x_1 - \bar x_2)-(\mu_1 - \mu_2)} \over \sqrt{{{s_1^2}\over n_1}+{{s_2^2}\over n_2}}}$

Degrees of freedom (rounded down to an integer):
$df = {({{s_1^2 \over n_1}+{s_2^2 \over n_2}})^2 \over {{1 \over n_1-1}({s_1^2 \over n_1})^2+{1 \over n_2-1}({s_2^2 \over n_2})^2}}$
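For the unequal-variance case, the sketch below uses the standard Welch-Satterthwaite form of the degrees of freedom, in which each $s_i^2/n_i$ term in the denominator is squared and there is no square root. The helper name `welch_t` and the summary statistics are made up.

```python
from math import floor, sqrt

def welch_t(x1, s1_sq, n1, x2, s2_sq, n2, delta0=0.0):
    v1, v2 = s1_sq / n1, s2_sq / n2
    t = ((x1 - x2) - delta0) / sqrt(v1 + v2)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

t, df = welch_t(x1=20.0, s1_sq=4.0, n1=10, x2=18.0, s2_sq=16.0, n2=8)
df_table = floor(df)   # round down before looking up the t table
print(t, df, df_table)
```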

Tests for two related normal populations

(Paired samples) dependent populations
Matched-pair samples

e.g. treatment group and control group

$Sample \ size: n$

$d_k = {x_1}_k - {x_2}_k$

${\Sigma d_k \over n}= \bar D$

$S_D^2 = {1 \over n-1}\Sigma(d_k- \bar D)^2$

$H_0: \mu_D = C$

Follows the t distribution

$T = {{\bar D - \mu_D} \over {S_D \over \sqrt{n}}} \sim t(n-1)$
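The paired test reduces to a one-sample t statistic on the differences. The sketch below assumes $S_D^2$ is the usual sample variance of the differences (divided by $n-1$); the before/after measurements are invented.

```python
from math import sqrt

before = [72, 75, 68, 80, 77]   # hypothetical paired measurements
after = [70, 71, 69, 75, 74]

d = [b - a for b, a in zip(before, after)]   # differences d_k
n = len(d)
d_bar = sum(d) / n
s_d_sq = sum((x - d_bar) ** 2 for x in d) / (n - 1)  # sample variance of differences

mu_d0 = 0.0   # H0: mu_D = 0
t = (d_bar - mu_d0) / sqrt(s_d_sq / n)
df = n - 1
print(d_bar, s_d_sq, t, df)
```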

Tests for two independent population proportions

$\bar p_1 - \bar p_2 \sim N(p_1-p_2, {p_1q_1 \over n_1}+{p_2q_2 \over n_2})$ (approximately, for large samples)

Since $p_1$ and $p_2$ are unknown, the variance uses ${\bar p_1}$ and ${\bar p_2}$ in their place

$if \ \ \ \ H_0:(p_1 = p_2 = p)$

$\bar p = {{n_1 \bar p_1 + n_2 \bar p_2} \over {n_1 + n_2}}$

$\sigma = \sqrt{\bar p \bar q({1 \over n_1}+{1 \over n_2})}$
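As an illustration of the pooled two-proportion statistic, the sketch below reuses counts from the headcount table in the probability review: 234 of 330 men and 210 of 270 women attend 政大 (NCCU). Treating these as two independent samples is an assumption made only for the example.

```python
from math import sqrt

def two_proportion_z(x1, n1, x2, n2):
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)   # pooled estimate under H0: p1 = p2
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    return p_pool, (p1 - p2) / se

p_pool, z = two_proportion_z(234, 330, 210, 270)
print(p_pool, z)
```

Note that the pooled standard error uses both sample sizes, $\frac{1}{n_1}+\frac{1}{n_2}$.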

Test for a population variance

Chi-Square symbol: ${\chi}^2$

Derivation:
$s^2 = {1 \over n-1}\Sigma(x_i- \bar x)^2$

$\Rightarrow (n-1) s^2 = \Sigma(x_i- \bar x)^2$

$\Rightarrow {(n-1) s^2 \over \sigma^2} = {\Sigma(x_i- \bar x)^2 \over \sigma^2} \sim {\chi}^2_{(n-1)}$ (a sum of squared standardized deviations; one degree of freedom is lost by estimating $\mu$ with $\bar x$)

The chi-square family is not closed under scaling:
$c \cdot {\chi}^2$ is not chi-square distributed for a constant $c \ne 1$

$E(\chi^2) = df$
The expectation of a chi-square variable equals its degrees of freedom
$Var(\chi^2) = 2df$
The variance of a chi-square variable is twice its degrees of freedom

Test statistic:
$TS = {(n-1)s^2 \over \sigma^2_0} \sim {\chi}^2_{(n-1)}$

With probability $1-\alpha$,
$\chi^2_{1-{\alpha \over 2}} \le {(n-1)s^2 \over \sigma^2} \le \chi^2_{\alpha \over 2}$

$\Rightarrow {(n-1)s^2 \over \chi^2_{\alpha \over 2}} \le \sigma^2 \le {(n-1)s^2 \over \chi^2_{1-{\alpha \over 2}}}$

(just rearranging terms)
Then we can say we are $(1-\alpha)100\%$ confident that $\sigma^2$ lies in this interval!

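Test for the variances of two independent populations

F-distribution

Required conditions:

independent
two Normal populations
equal variances

F distribution

$X \sim F({df}_1, {df}_2)$
${df}_1 = n_1 - 1$
${df}_2 = n_2 - 1$

An F-distributed random variable is the ratio of two chi-square variables, each divided by its degrees of freedom:
${U_1/d_1 \over U_2/d_2} = {U_1/U_2 \over d_1/d_2}$
where $U_1 \sim \chi^2_{d_1}$ and $U_2 \sim \chi^2_{d_2}$ are independent, with $d_1, d_2$ degrees of freedom

Test statistic:
$TS = {s^2_1 \over s^2_2}$

Put the sample with the larger standard deviation in the numerator

This guarantees that the resulting test statistic falls in the right tail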

Comparing proportions across several populations

Test of equality of several population proportions

Chi-square distribution

Test statistic:
$\chi^2 = \Sigma_i\Sigma_j{(f_{ij}-e_{ij})^2 \over e_{ij}} \sim \chi^2_{(r-1)(c-1)}$

$f_{ij}$ = observed frequency
$e_{ij}$ = expected frequency under $H_0$, with every $e_{ij} \ge 5$
$r$ = number of rows
$c$ = number of columns
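The statistic can be computed step by step from a table of counts. This sketch applies it to the 2 × 3 headcount table from the probability review (rows: male, female; columns: the three universities), purely as an illustration; expected counts are row total × column total / grand total.

```python
observed = [
    [30, 66, 234],   # male
    [18, 42, 210],   # female
]

row_tot = [sum(r) for r in observed]
col_tot = [sum(c) for c in zip(*observed)]
n = sum(row_tot)

# Expected counts under H0.
expected = [[r * c / n for c in col_tot] for r in row_tot]

chi2 = sum(
    (f - e) ** 2 / e
    for f_row, e_row in zip(observed, expected)
    for f, e in zip(f_row, e_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)
min_expected = min(min(row) for row in expected)   # should be at least 5

print(chi2, df, min_expected)
```

With $\alpha = 0.05$ and 2 degrees of freedom the table critical value is 5.991, so a statistic of about 3.67 would not lead to rejecting $H_0$.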

Reject rule

p-value approach: Reject $H_0$ if p-value $\le \alpha$
Critical value: Reject $H_0$ if $\chi^2 \ge \chi^2_\alpha$

Critical values for the Marascuilo pairwise comparison procedure for k population proportions

$CV_{ij} = \sqrt{\chi^2_{\alpha}}\sqrt{{\bar p_i \bar q_i \over n_i}+{\bar p_j \bar q_j \over n_j}}$

where
$\chi^2_\alpha$ is the chi-square critical value with a level of significance $\alpha$ and $k-1$ degrees of freedom
$\bar p_i$ and $\bar p_j$ are the sample proportions for populations $i$, $j$
$n_i$ and $n_j$ are the sample sizes for populations $i$, $j$

Reject or significant if:
$|{\bar p_i - \bar p_j}| \gt CV_{ij}$

Test of independence

Use the previous formula to judge whether $\chi^2$ is significant.

$H_0$: Assumes that there is no association between the two variables.
$H_a$: Assumes that there is an association between the two variables.

Goodness of Fit test

Test statistic:
$\chi^2_{(k-1)} = \Sigma^k_{i=1}{(f_i - e_i)^2 \over e_i}$

$f_i$ is the observed frequency
$e_i$ is the expected frequency, $\forall e_i \ge 5$
$k$ is the number of categories
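A minimal goodness-of-fit sketch: testing whether a die is fair from 60 made-up rolls, so each face has an expected count of 10.

```python
observed = [8, 12, 9, 11, 10, 10]   # hypothetical counts for faces 1..6
n = sum(observed)
k = len(observed)

expected = [n / k] * k              # H0: all faces equally likely

chi2 = sum((f - e) ** 2 / e for f, e in zip(observed, expected))
df = k - 1                          # no parameters estimated from the sample

print(chi2, df)
```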

Test for normality

Use the goodness of fit test to check whether the data follow a normal distribution.

Divide the $n$ observations into ${\lfloor}{n \over 5}{\rfloor}$ slices, so that

each slice has an expected count $e_i$ of at least 5.

Then test against $\chi^2_{({\lfloor}{n \over 5}{\rfloor} -3)}$

Why -3?

Because the degrees of freedom are $k - p -1$, where
$p$ is the number of parameters of the distribution estimated from the sample.

The normal distribution has 2 parameters.

Hence $k-p-1 = k-3$

