---
title: Stat09
tags: Stat
---

[Home](https://hackmd.io/y_1O1ws1TQe8VRI7bxQbeg)

# Chapter 9 Large-sample Tests of Hypotheses

## General principles

### Key ingredients
- Model specification: $F(\theta)$
- Find an **estimator** of the parameter.
- Standardize the estimator to obtain a **statistic** and its sampling distribution.
- Then, we have two statistical approaches:
    - Confidence interval :heavy_check_mark:
    - Hypothesis test :strawberry:

## 9.1 Hypothesis tests

### $H_0$ versus $H_a$
- Null hypothesis ($H_0$): a contradiction of the alternative hypothesis.
- Alternative hypothesis ($H_a$ or $H_1$): the hypothesis the researcher wants to support.

### Type I and Type II errors
- A Type I error is rejecting $H_0$ when it is in fact true.
- The significance level $\alpha$ is the probability of rejecting $H_0$ when it is in fact true, i.e., the probability of a Type I error.
- A Type II error is accepting (failing to reject) $H_0$ when it is in fact false.
- $\beta$ is the probability of accepting $H_0$ when it is in fact false, i.e., the probability of a Type II error.
- The power of the test is $1-\beta$, the probability of rejecting $H_0$ when it is false.

:::warning
### S.O.P. of a hypothesis test
The following procedure ensures that the probability of a Type I error does not exceed the predetermined $\alpha$.
1. Set up $H_0$ and $H_a$.
2. Decide the significance level $\alpha$ (usually $\alpha=0.05$, $0.1$, or $0.01$).
3. Decide the test statistic and its sampling distribution.
4. Calculate the realized test statistic from the data.
5. Find (a) the rejection region or (b) the $p$-value.
6. Conclude.

### (a) Rejection region approach (or critical value approach)
5. Find the rejection region.
6. Conclusion: If the realized statistic falls into the rejection region, reject $H_0$. Otherwise, do not reject $H_0$.

### (b) The $p$-value approach
5. Calculate the $p$-value: the probability, assuming $H_0$ is true, of observing a test statistic as extreme as or more extreme than the one observed.
6. Conclusion: If the $p$-value is smaller than $\alpha$, reject $H_0$. Otherwise, do not reject $H_0$.
:::

## 9.2 One population mean

### Two-sided, one-sided (left-sided and right-sided) tests

#### Two-sided
- $H_0:\mu = \mu_0$ versus $H_a: \mu\neq \mu_0$

#### Left-sided
- $H_0:\mu = \mu_0$ versus $H_a: \mu < \mu_0$
- $H_0:\mu \geq \mu_0$ versus $H_a: \mu < \mu_0$
- $H_0:\mu > \mu_0$ versus $H_a: \mu \leq \mu_0$

#### Right-sided
- $H_0:\mu = \mu_0$ versus $H_a: \mu > \mu_0$
- $H_0:\mu \leq \mu_0$ versus $H_a: \mu > \mu_0$
- $H_0:\mu < \mu_0$ versus $H_a: \mu \geq \mu_0$

### Key ingredients (A1)
- Model: $X_i\stackrel{i.i.d.}{\sim}F(\mu,\sigma^2)$ for $i=1,\ldots,n$.
- We are interested in the value of the parameter $\mu$.
- We use the sample mean $\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i$ to estimate the population mean $\mu$.
- With the CLT, we have $$\frac{\bar{X}_n-\mu}{\sigma/\sqrt{n}}\stackrel{d}{\rightarrow}Z.$$
- When $\sigma$ is unknown, define the sample variance by $$s^2=\frac{1}{n-1}\sum_{i=1}^n(X_i-\bar{X}_n)^2.$$ With the advanced CLT (Slutsky's theorem), we have $$\frac{\bar{X}_n-\mu}{s/\sqrt{n}}\stackrel{d}{\rightarrow}Z.$$

### $H_0:\mu = \mu_0$
For $H_0:\mu = \mu_0$, the test statistic and its sampling distribution are $$Z_{STAT}=\frac{\bar{X}_n-\mu_0}{s/\sqrt{n}}\stackrel{d}{\rightarrow}Z.$$

### The two-sided test
1. $H_0:\mu = \mu_0$ versus $H_a: \mu\neq \mu_0$.
2. Set up $\alpha$.
3. $Z_{STAT} = \frac{\bar{X}_n-\mu_0}{s/\sqrt{n}}\stackrel{d}{\rightarrow}Z$.
4. Calculate the realized statistic $Z^*$ from the data.
5. Find (a) the rejection region = $\{z: z<-z_{\alpha/2} \text{ or } z>z_{\alpha/2}\}$ or (b) the $p$-value = $2P(Z>|Z^*|)$.
6. Conclude.

### The left-sided test
1. One of the following:
    - $H_0:\mu = \mu_0$ versus $H_a: \mu < \mu_0$
    - $H_0:\mu \geq \mu_0$ versus $H_a: \mu < \mu_0$
    - $H_0:\mu > \mu_0$ versus $H_a: \mu \leq \mu_0$
2. Set up $\alpha$.
3. $Z_{STAT} = \frac{\bar{X}_n-\mu_0}{s/\sqrt{n}}\stackrel{d}{\rightarrow}Z$.
4. Calculate the realized statistic $Z^*$ from the data.
5. Find (a) the rejection region = $\{z: z< -z_{\alpha}\}$ or (b) the $p$-value = $P(Z<Z^*)$.
6. Conclude.

### The right-sided test
1. One of the following:
    - $H_0:\mu = \mu_0$ versus $H_a: \mu > \mu_0$
    - $H_0:\mu \leq \mu_0$ versus $H_a: \mu > \mu_0$
    - $H_0:\mu < \mu_0$ versus $H_a: \mu \geq \mu_0$
2. Set up $\alpha$.
3. $Z_{STAT} = \frac{\bar{X}_n-\mu_0}{s/\sqrt{n}}\stackrel{d}{\rightarrow}Z$.
4. Calculate the realized statistic $Z^*$ from the data.
5. Find (a) the rejection region = $\{z: z>z_{\alpha}\}$ or (b) the $p$-value = $P(Z>Z^*)$.
6. Conclude.

### Example:
Assume $X_i\sim F(\mu,\sigma^2)$ for $i=1,\ldots, 64$. $F$ is unknown.
1. $H_0:\mu=250,000$ vs $H_a: \mu>250,000$.
2. $\alpha=0.01$.
3. Test statistic: $$Z_{STAT}=\frac{\bar{X}-250000}{15000/\sqrt{64}}\stackrel{d}{\rightarrow} Z.$$
4. $Z^*=1.07$.
5. (a) Rejection region = $\{z: z>z_{0.01}\}=\{z: z>2.33\}$, or (b) $p$-value = $P(Z>1.07)\approx 0.142$.
6. Since $1.07<2.33$ (equivalently, $0.142>0.01$), do not reject $H_0$.

![](https://i.imgur.com/HKTxDTs.png)
![](https://i.imgur.com/Qms5u9O.png)
![](https://i.imgur.com/eGKybc3.png)

## 9.3 Two population means

### Key ingredients (B1)
- Model: $X_{1i}\stackrel{i.i.d.}{\sim} F(\mu_1,\sigma_1^2)$ for $i=1,\ldots,n_1$ and $X_{2j}\stackrel{i.i.d.}{\sim} F(\mu_2,\sigma_2^2)$ for $j=1,\ldots,n_2$.
- We are interested in understanding $(\mu_1-\mu_2)$.
- We use the difference between two sample means $(\bar{X}_1-\bar{X}_2)$ to estimate $(\mu_1-\mu_2)$, where $\bar{X}_1=\frac{1}{n_1}\sum_{i=1}^{n_1}X_{1i}$ and $\bar{X}_2=\frac{1}{n_2}\sum_{j=1}^{n_2}X_{2j}$.
- With the CLT, we have $$\frac{(\bar{X}_1-\bar{X}_2)-(\mu_1-\mu_2)}{\sqrt{ \frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}}}\stackrel{d}{\rightarrow}Z.$$
- When $\sigma_1$ and $\sigma_2$ are unknown, we use the sample variances, $$s_1^2 = \frac{1}{n_1-1}\sum_{i=1}^{n_1}(X_{1i}-\bar{X}_1)^2, \quad s_2^2 = \frac{1}{n_2-1}\sum_{j=1}^{n_2}(X_{2j}-\bar{X}_2)^2.$$
- With the advanced CLT (Slutsky's theorem), we have $$\frac{(\bar{X}_1-\bar{X}_2)-(\mu_1-\mu_2)}{\sqrt{ \frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}}\stackrel{d}{\rightarrow}Z.$$

### $H_0: \mu_1-\mu_2 = D_0$
The test statistic and its sampling distribution are $$Z_{STAT} = \frac{(\bar{X}_1-\bar{X}_2)-D_0}{\sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}}\stackrel{d}{\rightarrow}Z.$$

### Example:
$H_0$: $\mu_1-\mu_2=0$ vs $H_a$: $\mu_1-\mu_2\neq 0$.

![](https://i.imgur.com/MCxJg12.png)
![](https://i.imgur.com/KPu0hbe.png)

## 9.4 Population proportion

### Key ingredients (A2)
- Model: $X {\sim} Binomial(n,p)$.
- We use the sample proportion $\hat{p} = \frac{X}{n}$ to estimate the population proportion $p$.
- With the CLT, we have $$\frac{\hat{p}-p}{\sqrt{{p}(1-{p})/n}}\stackrel{d}{\rightarrow}Z.$$

### $H_0: p = p_0$
For $H_0: p = p_0$, the test statistic and its sampling distribution are $$Z_{STAT}=\frac{\hat{p}-p_0}{\sqrt{p_0(1-p_0)/{n}}}\stackrel{d}{\rightarrow}Z.$$

### The two-sided test
1. $H_0: p = p_0$ versus $H_a: p \neq p_0$.
2. Set up $\alpha$.
3. $Z_{STAT} = \frac{\hat{p}-p_0}{\sqrt{p_0(1-p_0)/n}}\stackrel{d}{\rightarrow}Z$.
4. Calculate the realized statistic $Z^*$ from the data.
5. Find (a) the rejection region = $\{z: z<-z_{\alpha/2} \text{ or } z>z_{\alpha/2}\}$ or (b) the $p$-value = $2P(Z>|Z^*|)$.
6. Conclude.

### The left-sided test
1. One of the following:
    - $H_0:p = p_0$ versus $H_a: p < p_0$
    - $H_0:p \geq p_0$ versus $H_a: p < p_0$
    - $H_0:p > p_0$ versus $H_a: p \leq p_0$
2. Set up $\alpha$.
3. $Z_{STAT} = \frac{\hat{p}-p_0}{\sqrt{p_0(1-p_0)/n}}\stackrel{d}{\rightarrow}Z$.
4. Calculate the realized statistic $Z^*$ from the data.
5. Find (a) the rejection region = $\{z: z<-z_{\alpha}\}$ or (b) the $p$-value = $P(Z<Z^*)$.
6. Conclude.

### The right-sided test
1. One of the following:
    - $H_0: p = p_0$ versus $H_a: p > p_0$
    - $H_0:p \leq p_0$ versus $H_a: p > p_0$
    - $H_0:p < p_0$ versus $H_a: p \geq p_0$
2. Set up $\alpha$.
3. $Z_{STAT} = \frac{\hat{p}-p_0}{\sqrt{p_0(1-p_0)/n}}\stackrel{d}{\rightarrow}Z$.
4. Calculate the realized statistic $Z^*$ from the data.
5. Find (a) the rejection region = $\{z: z>z_{\alpha}\}$ or (b) the $p$-value = $P(Z>Z^*)$.
6. Conclude.

### Example:
$H_0$: $p=0.2$ vs $H_a$: $p< 0.2$. Under $H_0$, each observation is Bernoulli($p=0.20$).

![](https://i.imgur.com/4J2OfwE.png)
![](https://i.imgur.com/N2JtaHc.png)

## 9.5 Two population proportions

### Key ingredients (B2)
- Model: $X_{1}{\sim} Binomial(n_1,p_1)$ and $X_{2}{\sim} Binomial(n_2, p_2)$.
- We are interested in understanding $(p_1-p_2)$.
- Define $$\hat{p}_1=\frac{X_1}{n_1}, \quad \hat{p}_2=\frac{X_2}{n_2}.$$ We use $(\hat{p}_1-\hat{p}_2)$ to estimate $(p_1-p_2)$.
- With the CLT, we have $$\frac{(\hat{p}_1-\hat{p}_2)-(p_1-p_2)}{\sqrt{ \frac{p_1(1-p_1)}{n_1}+\frac{p_2(1-p_2)}{n_2}}}\stackrel{d}{\rightarrow}Z.$$
- When $p_1$ and $p_2$ are unknown, assuming $H_0$ is true, i.e., $p_1=p_2=p$, we define $$\hat{p}=\frac{X_1+X_2}{n_1+n_2}$$ as a pooled estimate of $p$.
- Applying the advanced CLT (Slutsky's theorem), we have $$\frac{(\hat{p}_1-\hat{p}_2)-(p_1-p_2)}{\sqrt{\hat{p}(1-\hat{p})( \frac{1}{n_1}+\frac{1}{n_2})}}\stackrel{d}{\rightarrow}Z.$$

### $H_0: p_1-p_2 = 0$
For $H_0: p_1-p_2 = 0$, the test statistic and its sampling distribution are $$Z_{STAT}=\frac{(\hat{p}_1-\hat{p}_2)-0}{\sqrt{\hat{p}(1-\hat{p})( \frac{1}{n_1}+\frac{1}{n_2})}}\stackrel{d}{\rightarrow}Z,$$ where $\hat{p}_1=\frac{X_1}{n_1}$, $\hat{p}_2=\frac{X_2}{n_2}$, and $\hat{p}=\frac{X_1+X_2}{n_1+n_2}$.

### Example:
$H_0$: $p_1-p_2=0$ vs $H_a$: $p_1-p_2\neq 0$.

![](https://i.imgur.com/1jlfDZs.png)
![](https://i.imgur.com/45MFHqf.png)
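
The pooled two-proportion test of $H_0: p_1-p_2=0$ can be sketched in Python as follows. This is a minimal illustration, not part of the original notes: the function names `z_p_value` and `two_proportion_z_test`, the `alternative` labels, and the counts in the usage lines are all hypothetical, and the normal approximation is assumed to hold (large $n_1$ and $n_2$).

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal Z (mean 0, sd 1)


def z_p_value(z_star, alternative="two-sided"):
    """p-value of a realized z statistic, computed under H0.

    The labels "two-sided", "less", "greater" are illustrative names
    for the two-sided, left-sided and right-sided tests.
    """
    if alternative == "two-sided":
        return 2 * (1 - STD_NORMAL.cdf(abs(z_star)))  # 2 P(Z > |Z*|)
    if alternative == "less":
        return STD_NORMAL.cdf(z_star)                 # P(Z < Z*)
    if alternative == "greater":
        return 1 - STD_NORMAL.cdf(z_star)             # P(Z > Z*)
    raise ValueError(f"unknown alternative: {alternative!r}")


def two_proportion_z_test(x1, n1, x2, n2, alternative="two-sided"):
    """Large-sample test of H0: p1 - p2 = 0 using the pooled estimate."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)  # pooled estimate of the common p
    std_err = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z_star = (p1_hat - p2_hat) / std_err
    return z_star, z_p_value(z_star, alternative)


# Hypothetical data: 60 successes out of 200 versus 45 out of 200.
alpha = 0.05
z_star, p_value = two_proportion_z_test(x1=60, n1=200, x2=45, n2=200)
decision = "reject H0" if p_value < alpha else "do not reject H0"
print(f"Z* = {z_star:.3f}, p-value = {p_value:.4f}: {decision}")
```

The same `z_p_value` helper covers step 5(b) of the one-sample tests in 9.2 and 9.4, since every statistic in this chapter is compared against the standard normal distribution.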