---
title: Stat09
tags: Stat
---
[Home](https://hackmd.io/y_1O1ws1TQe8VRI7bxQbeg)
# Chapter 9 Large-sample Tests of Hypotheses
## General principles
### Key ingredients
- Model specification: $F(\theta)$
- Find an **estimator** of the parameter.
- Standardize the estimator to obtain a **statistic** and its sampling distribution
- Then, we have two statistical approaches
- Confidence Interval :heavy_check_mark:
- Hypothesis test :strawberry:
## 9.1 Hypothesis tests
### $H_0$ versus $H_a$
- Null hypothesis ($H_0$): a contradiction of the alternative hypothesis
- Alternative hypothesis ($H_a$ or $H_1$): the hypothesis the researcher wants to support.
### Type I and Type II errors
- A Type I error is rejecting $H_0$ when it is in fact true.
- The significance level $\alpha$ is the probability of rejecting $H_0$ when it is in fact true, i.e., the probability of a Type I error.
- A Type II error is accepting $H_0$ when it is in fact false.
- $\beta$ is the probability of accepting $H_0$ when it is in fact false, i.e., the probability of a Type II error.
- The power of the test is $(1-\beta)$, the probability of rejecting $H_0$ when it is false.
:::warning
### S.O.P. of a hypothesis test
The following procedure ensures that the probability of a Type I error is at most the predetermined $\alpha$.
1. Set up $H_0$ and $H_a$.
2. Decide the significance level $\alpha$ (usually $\alpha = 0.05$, $0.1$, or $0.01$).
3. Decide the test statistic and its sampling distribution.
4. Calculate the realized test statistic from the data.
5. Find (a) the rejection region or (b) the $p$-value.
6. Conclude.
### (a) Rejection region approach (or critical value approach)
5. Find the rejection region
6. Conclusion: If the realized statistic falls into the rejection region, reject $H_0$; otherwise, do not reject $H_0$ (equivalently, reject when the realized test statistic is more extreme than the critical value).
### (b) The $p$-value approach
5. Calculate the $p$-value: The $p$-value is the probability, computed assuming $H_0$ is true, of observing a test statistic as extreme as or more extreme than the one observed.
6. Conclusion: If $p$-value is smaller than $\alpha$, reject $H_0$. Otherwise, do not reject $H_0$.
:::
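Steps 5 and 6 above can be sketched as a small helper for any test whose statistic is approximately standard normal. This is a minimal sketch; the function name `z_test` and the use of `scipy` are illustrative assumptions, not part of the notes.

```python
from scipy.stats import norm

def z_test(z_star, alpha=0.05, side="two-sided"):
    """Steps 5-6 of the S.O.P. for an approximately standard-normal statistic.

    Returns (reject, p_value, critical_value)."""
    if side == "two-sided":
        crit = norm.ppf(1 - alpha / 2)     # z_{alpha/2}
        p = 2 * norm.sf(abs(z_star))       # 2 P(Z > |Z*|)
        reject = abs(z_star) > crit
    elif side == "left":
        crit = -norm.ppf(1 - alpha)        # -z_alpha
        p = norm.cdf(z_star)               # P(Z < Z*)
        reject = z_star < crit
    else:                                  # right-sided
        crit = norm.ppf(1 - alpha)         # z_alpha
        p = norm.sf(z_star)                # P(Z > Z*)
        reject = z_star > crit
    return reject, p, crit
```

The two approaches always agree: the statistic falls in the rejection region exactly when the $p$-value is below $\alpha$.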
## 9.2 One population mean
### Two-sided, one-sided (left-sided and right-sided) tests
#### Two-sided
- $H_0:\mu = \mu_0$ versus $H_a: \mu\neq \mu_0$.
#### Left-sided
- $H_0:\mu = \mu_0$ versus $H_a: \mu < \mu_0$
- $H_0:\mu \geq \mu_0$ versus $H_a: \mu < \mu_0$
- $H_0:\mu > \mu_0$ versus $H_a: \mu \leq \mu_0$
#### Right-sided
- $H_0:\mu = \mu_0$ versus $H_a: \mu > \mu_0$
- $H_0:\mu \leq \mu_0$ versus $H_a: \mu > \mu_0$
- $H_0:\mu < \mu_0$ versus $H_a: \mu \geq \mu_0$
### Key ingredients (A1)
- Model: $X_i\stackrel{i.i.d.}{\sim}F(\mu,\sigma^2)$ for $i=1,\ldots,n$.
- We are interested in the value of the parameter $\mu$.
- We use the sample mean $\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i$ to estimate the population mean $\mu$.
- With the CLT, we have
$$\frac{\bar{X}_n-\mu}{\sigma/\sqrt{n}}\stackrel{d}{\rightarrow}Z$$
- When $\sigma$ is unknown, define the sample variance by
$$s^2=\frac{1}{n-1}\sum_{i=1}^n(X_i-\bar{X}_n)^2.$$With the advanced CLT (Slutsky's theorem), we have
$$\frac{\bar{X}_n-\mu}{s/\sqrt{n}}\stackrel{d}{\rightarrow}Z$$
### $H_0:\mu = \mu_0$
For $H_0:\mu = \mu_0$, the test statistic and its sampling distribution is
$$Z_{STAT}=\frac{\bar{X}_n-\mu_0}{s/\sqrt{n}}\stackrel{d}{\rightarrow}Z.$$
### The two-sided test
1. $H_0:\mu = \mu_0$ versus $H_a: \mu\neq \mu_0$.
2. Set up $\alpha$
3. $Z_{STAT} = \frac{\bar{X}_n-\mu_0}{s/\sqrt{n}}\stackrel{d}{\rightarrow}Z$.
4. Calculate the realized statistic $Z^*$ from the data.
5. Find (a) the rejection region = $\{z: z<-z_{\alpha/2} \text{ or } z>z_{\alpha/2}\}$ or (b) the $p$-value = $2P(Z>|Z^*|)$.
6. Conclude.
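The two-sided procedure above can be sketched in Python. The data here are simulated and purely illustrative; the `numpy`/`scipy` usage is an assumption of this sketch.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical data: 100 draws from a distribution with true mean 101
rng = np.random.default_rng(0)
x = rng.normal(loc=101, scale=10, size=100)

mu0, alpha = 100, 0.05                        # H_0: mu = 100
xbar, s, n = x.mean(), x.std(ddof=1), len(x)  # ddof=1 gives the sample s.d.
z_stat = (xbar - mu0) / (s / np.sqrt(n))      # realized Z*
p_value = 2 * norm.sf(abs(z_stat))            # 2 P(Z > |Z*|)
reject = p_value < alpha                      # same as |Z*| > z_{alpha/2}
```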
### The left-sided test
1. One of the following:
- $H_0:\mu = \mu_0$ versus $H_a: \mu < \mu_0$
- $H_0:\mu \geq \mu_0$ versus $H_a: \mu < \mu_0$
- $H_0:\mu > \mu_0$ versus $H_a: \mu \leq \mu_0$
2. Set up $\alpha$
3. $Z_{STAT} = \frac{\bar{X}_n-\mu_0}{s/\sqrt{n}}\stackrel{d}{\rightarrow}Z$.
4. Calculate the realized statistic $Z^*$ from the data.
5. Find (a) the rejection region = $\{z: z< -z_{\alpha}\}$ or (b) the $p$-value = $P(Z<Z^*)$.
6. Conclude.
### The right-sided test
1. One of the following:
- $H_0:\mu = \mu_0$ versus $H_a: \mu > \mu_0$
- $H_0:\mu \leq \mu_0$ versus $H_a: \mu > \mu_0$
- $H_0:\mu < \mu_0$ versus $H_a: \mu \geq \mu_0$
2. Set up $\alpha$
3. $Z_{STAT} = \frac{\bar{X}_n-\mu_0}{s/\sqrt{n}}\stackrel{d}{\rightarrow}Z$.
4. Calculate the realized statistic $Z^*$ from the data.
5. Find (a) the rejection region = $\{z: z>z_{\alpha}\}$ or (b) the $p$-value = $P(Z>Z^*)$.
6. Conclude.
### Example:
Assume $X_i\sim F(\mu,\sigma^2)$ for $i=1,\ldots,64$, where $F$ is unknown.
1. $H_0:\mu=250,000$ vs $H_a: \mu>250,000$.
2. $\alpha=0.01$
3. Test statistic: $$Z_{STAT}=\frac{\bar{X}-250000}{15000/\sqrt{64}}\stackrel{d}{\rightarrow} Z.$$
4. $Z^*=1.07$.
5. (a) Rejection region = $\{z: z>z_{0.01}\}=\{z: z>2.33\}$; or (b) $p$-value $=P(Z>1.07)\approx 0.1423$.
6. Since $Z^*=1.07<2.33$ (equivalently, $p$-value $\approx 0.1423 > \alpha=0.01$), do not reject $H_0$.
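The numbers in this example can be checked directly (a sketch; `scipy` usage is an assumption of these notes' workflow, not stated in them):

```python
from scipy.stats import norm

alpha, z_star = 0.01, 1.07
z_alpha = norm.ppf(1 - alpha)   # critical value z_{0.01}, about 2.33
p_value = norm.sf(z_star)       # P(Z > 1.07), about 0.142
reject = z_star > z_alpha       # False: do not reject H_0
```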



## 9.3 Two population means
### Key ingredients (B1)
- Model: $X_{1i}\stackrel{i.i.d.}{\sim} F(\mu_1,\sigma_1^2)$ for $i=1,\ldots,n_1$ and
$X_{2j}\stackrel{i.i.d.}{\sim} F(\mu_2,\sigma_2^2)$ for $j=1,\ldots,n_2$.
- We are interested in understanding $(\mu_1-\mu_2)$.
- We use the difference between two sample means $(\bar{X}_1-\bar{X}_2)$ to estimate $(\mu_1-\mu_2)$, where
$\bar{X}_1=\frac{1}{n_1}\sum_{i=1}^{n_1}X_{1i}$ and $\bar{X}_2=\frac{1}{n_2}\sum_{j=1}^{n_2}X_{2j}$.
- With the CLT, we have $$\frac{(\bar{X}_1-\bar{X}_2)-(\mu_1-\mu_2)}{\sqrt{ \frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}}}\stackrel{d}{\rightarrow}Z$$
- When $\sigma_1$ and $\sigma_2$ are unknown, we use the sample variances,
$$s_1^2 = \frac{1}{n_1-1}\sum_{i=1}^{n_1}(X_{1i}-\bar{X}_1)^2, \quad s_2^2 = \frac{1}{n_2-1}\sum_{j=1}^{n_2}(X_{2j}-\bar{X}_2)^2. $$
- With the advanced CLT (Slutsky's theorem), we have $$\frac{(\bar{X}_1-\bar{X}_2)-(\mu_1-\mu_2)}{\sqrt{ \frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}}\stackrel{d}{\rightarrow}Z.$$
### $H_0: \mu_1-\mu_2 = D_0$
The test statistic and its sampling distribution is
$$Z_{STAT} = \frac{(\bar{X}_1-\bar{X}_2)-D_0}{\sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}}\stackrel{d}{\rightarrow}Z$$
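The two-sample statistic above can be computed as follows. Both samples are simulated and purely illustrative; the `numpy`/`scipy` usage is an assumption of this sketch.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical samples from the two populations
rng = np.random.default_rng(1)
x1 = rng.normal(loc=50, scale=8, size=120)
x2 = rng.normal(loc=48, scale=9, size=150)

d0, alpha = 0.0, 0.05                       # H_0: mu_1 - mu_2 = 0
se = np.sqrt(x1.var(ddof=1) / len(x1) + x2.var(ddof=1) / len(x2))
z_stat = (x1.mean() - x2.mean() - d0) / se  # realized Z*
p_value = 2 * norm.sf(abs(z_stat))          # two-sided p-value
```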
### Example: $H_0$: $\mu_1-\mu_2=0$ vs $H_a$: $\mu_1-\mu_2\neq 0$.


## 9.4 Population proportion
### Key ingredients (A2)
- Model: $X {\sim} Binomial(n,p)$
- We use sample proportion $\hat{p} = \frac{X}{n}$ to estimate the population proportion $p$.
- With the CLT, we have
$$\frac{\hat{p}-p}{\sqrt{{p}(1-{p})/n}}\stackrel{d}{\rightarrow}Z. $$
### $H_0: p = p_0$
For $H_0: p = p_0$, the test statistic and its sampling distribution is
$$Z_{STAT}=\frac{\hat{p}-p_0}{\sqrt{p_0(1-p_0)/{n}}}\stackrel{d}{\rightarrow}Z$$
### The two-sided test
1. $H_0: p = p_0$ versus $H_a: p \neq p_0$.
2. Set up $\alpha$
3. $Z_{STAT} = \frac{\hat{p}-p_0}{\sqrt{p_0(1-p_0)/n}}\stackrel{d}{\rightarrow}Z$.
4. Calculate the realized statistic $Z^*$ from the data.
5. Find (a) the rejection region = $\{z: z<-z_{\alpha/2} \text{ or } z>z_{\alpha/2}\}$ or (b) the $p$-value = $2P(Z>|Z^*|)$.
6. Conclude.
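A sketch of the two-sided proportion test with hypothetical counts (96 successes in 400 trials; the counts and `scipy` usage are assumptions of this sketch):

```python
from math import sqrt
from scipy.stats import norm

n, x = 400, 96                 # hypothetical data: 96 successes in 400 trials
p0, alpha = 0.20, 0.05         # H_0: p = 0.20
p_hat = x / n                                    # 0.24
z_stat = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)  # equals 2.0 here
p_value = 2 * norm.sf(abs(z_stat))               # about 0.0455
reject = p_value < alpha                         # True at alpha = 0.05
```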
### The left-sided test
1. One of the following:
- $H_0:p = p_0$ versus $H_a: p < p_0$
- $H_0:p \geq p_0$ versus $H_a: p < p_0$
- $H_0:p > p_0$ versus $H_a: p \leq p_0$
2. Set up $\alpha$.
3. $Z_{STAT} = \frac{\hat{p}-p_0}{\sqrt{p_0(1-p_0)/n}}\stackrel{d}{\rightarrow}Z$.
4. Calculate the realized statistic $Z^*$ from the data.
5. Find either (a) the rejection region = $\{z: z<-z_{\alpha}\}$ or (b) $p$-value = $P(Z<Z^*)$.
6. Conclude.
### The right-sided test
1. One of the following:
- $H_0: p = p_0$ versus $H_a: p > p_0$
- $H_0:p \leq p_0$ versus $H_a: p > p_0$
- $H_0:p < p_0$ versus $H_a: p \geq p_0$
2. Set up $\alpha$
3. $Z_{STAT} = \frac{\hat{p}-p_0}{\sqrt{p_0(1-p_0)/n}}\stackrel{d}{\rightarrow}Z$.
4. Calculate the realized statistic $Z^*$ from the data.
5. Find (a) the rejection region = $\{z: z>z_{\alpha}\}$ or (b) the $p$-value = $P(Z>Z^*)$.
6. Conclude.
### Example: $H_0$: $p=0.2$ vs $H_a$: $p< 0.2$.
Bernoulli(p=0.20)
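The notes do not give the data for this example; the following sketch uses hypothetical counts (30 successes in 200 trials) and an assumed $\alpha=0.05$:

```python
from math import sqrt
from scipy.stats import norm

n, x = 200, 30                 # hypothetical data: 30 successes in 200 trials
p0, alpha = 0.20, 0.05         # H_0: p = 0.20 vs H_a: p < 0.20 (assumed alpha)
p_hat = x / n                                    # 0.15
z_stat = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)  # about -1.77
p_value = norm.cdf(z_stat)                       # left-sided: P(Z < Z*)
reject = p_value < alpha                         # reject H_0 here
```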


## 9.5 Two population proportions
### Key ingredients (B2)
- Model: $X_{1}{\sim} Binomial(n_1,p_1)$ and $X_{2}{\sim} Binomial(n_2, p_2)$.
- We are interested in understanding $(p_1-p_2)$.
- Define
$$\hat{p}_1=\frac{X_1}{n_1}, \quad \hat{p}_2=\frac{X_2}{n_2}.$$We use $(\hat{p}_1-\hat{p}_2)$ to estimate $(p_1-p_2)$.
- With the CLT, we have $$\frac{(\hat{p}_1-\hat{p}_2)-(p_1-p_2)}{\sqrt{ \frac{p_1(1-p_1)}{n_1}+\frac{p_2(1-p_2)}{n_2}}}\stackrel{d}{\rightarrow}Z$$
- When $p_1$ and $p_2$ are unknown, assuming $H_0$ is true, i.e., $p_1=p_2=p$, we define $$\hat{p}=\frac{X_1+X_2}{n_1+n_2}$$ as a pooled estimate of $p$.
- Applying the advanced CLT (Slutsky's theorem), we have $$\frac{(\hat{p}_1-\hat{p}_2)-(p_1-p_2)}{\sqrt{\hat{p}(1-\hat{p})( \frac{1}{n_1}+\frac{1}{n_2})}}\stackrel{d}{\rightarrow}Z.$$
### $H_0: p_1-p_2 = 0$
For $H_0: p_1-p_2 = 0$, the test statistic and its sampling distribution is
$$Z_{STAT}=\frac{(\hat{p}_1-\hat{p}_2)-0}{\sqrt{\hat{p}(1-\hat{p})( \frac{1}{n_1}+\frac{1}{n_2})}}\stackrel{d}{\rightarrow}Z,$$
where $\hat{p}_1=\frac{X_1}{n_1}$, $\hat{p}_2=\frac{X_2}{n_2}$, and $\hat{p}=\frac{X_1+X_2}{n_1+n_2}$.
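The pooled two-proportion statistic can be sketched as follows; the counts are hypothetical and the `scipy` usage is an assumption of this sketch.

```python
from math import sqrt
from scipy.stats import norm

# Hypothetical data: 120 successes of 500 in group 1, 72 of 400 in group 2
n1, x1 = 500, 120
n2, x2 = 400, 72
p1_hat, p2_hat = x1 / n1, x2 / n2              # 0.24 and 0.18
p_pool = (x1 + x2) / (n1 + n2)                 # pooled estimate under H_0
se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z_stat = (p1_hat - p2_hat) / se                # realized Z*
p_value = 2 * norm.sf(abs(z_stat))             # two-sided p-value
```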
### Example: $H_0$: $p_1-p_2=0$ vs $H_a$: $p_1-p_2\neq 0$.

