---
title: Stat10
tags: Stat
---
## Chapter 10
[Stat202202](/q9HjXLBNSlaycgL5vCF-dQ)
## 2023/3/13
### General principles
## 2023/3/6 Distributions you need to know in this course
### Example
Model: Assume $X_i \sim N(\mu,\sigma^2)$.
A statistic (統計量): a function of the random sample
Examples
1. sample mean $\bar{X}=\frac{1}{n}\sum_{i=1}^n X_i$
2. sample variance $S^2=\frac{1}{n-1}\sum_{i=1}^n (X_i-\bar{X})^2$
Examples:
1. When $\sigma$ is known, $\frac{\bar{X}-\mu}{\sigma/\sqrt{n}}\sim N(0,1)$.
2. When $\sigma$ is unknown, $\frac{\bar{X}-\mu}{S/\sqrt{n}}\sim t_{(n-1)}$.
3. $\frac{(n-1)S^2}{\sigma^2}\sim \chi^2_{(n-1)}$.
4. When comparing two variances, $\frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2}\sim F_{\nu_1,\nu_2}$ with $\nu_1=n_1-1$, $\nu_2=n_2-1$.
### $\chi^2_\nu$?
Chi-squared distribution with degrees of freedom $\nu$.
Suppose $Z_i\stackrel{iid}{\sim} N(0,1)$ for $i=1,\ldots,\nu$
Then, we say $\sum_{i=1}^{\nu} Z_i^2 \sim \chi^2_\nu$
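This definition can be checked by simulation (a quick sketch with NumPy; the degrees of freedom, number of replicates, and seed are arbitrary choices):

```python
import numpy as np

# Monte Carlo check of the definition: a sum of nu squared iid N(0,1) draws
# follows chi^2_nu, whose mean is nu and whose variance is 2*nu.
rng = np.random.default_rng(0)
nu = 5
z = rng.standard_normal(size=(100_000, nu))  # 100k replicates of (Z_1, ..., Z_nu)
chi2_draws = (z ** 2).sum(axis=1)            # each row: sum of nu squared normals

print(chi2_draws.mean())  # close to nu = 5
print(chi2_draws.var())   # close to 2 * nu = 10
```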
An interesting example:
$$\frac{(n-1)s^2}{\sigma^2}=\frac{\sum(x_i-\bar{x})^2}{\sigma^2}=\sum \left(\frac{x_i-\bar{x}}{\sigma}\right)^2\approx \sum \left(\frac{x_i-\mu}{\sigma}\right)^2 \sim \chi^2_n$$
Heuristically, if $\bar{x}$ is replaced by $\mu$, the $n$ terms are i.i.d. squared standard normals, giving $\chi^2_n$; estimating $\mu$ with $\bar{x}$ costs one degree of freedom, which is why the exact distribution is $\chi^2_{(n-1)}$.
We are interested in:
1. $\mu$: The mean of a normal distribution
2. $\mu_1-\mu_2$: The difference between two population means
3. $\sigma^2$: The variance of a normal distribution
4. ${\sigma_1^2}/{\sigma_2^2}$: The ratio of two population variances
### General principles
For a dataset, we do the following:
1. Model specification: $N(\mu,\sigma^2)$
2. Point estimators: $\bar{X}$ for $\mu$ and $S^2$ for $\sigma^2$.
3. Identify the pivotal statistic and its sampling distribution (the form depends on which other parameters are known or unknown)
4. Confidence Interval
5. Hypothesis test
## 2023/02/13 Review on Probability
### Interpretation of probability
Relative frequency: run the experiment $n$ times, count the occurrences of $A$, and let $n\rightarrow \infty$:
$$P(A) = \lim_{n\rightarrow \infty}\frac{\# A}{n}$$
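The limiting ratio can be illustrated numerically (a sketch; the die-rolling event and the number of trials are made up for illustration):

```python
import numpy as np

# Relative frequency of an event stabilizes as n grows (law of large numbers).
# Event A: a fair six-sided die shows a 6, so P(A) = 1/6.
rng = np.random.default_rng(1)
rolls = rng.integers(1, 7, size=1_000_000)  # one million rolls
freq = (rolls == 6).mean()                  # (# A) / n
print(freq)                                 # close to 1/6 ≈ 0.1667
```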
### Discrete random variable?
$X$: a random variable
$x$: a possible realization
#### Ex: Bernoulli random variable
Bernoulli trial/experiment
The outcome could be 1 (success) or 0 (failure) only.
The probability of getting 1 is $p$, and the probability of getting 0 is $(1-p)$.
### Bernoulli random variable
$X \sim Bernoulli(p)$, where $p$ is the parameter.
$X$ has the probability mass function:
$P(X=1)=p$, $P(X=0)=1-p$, $P(X=x)=0$ if $x\neq 0$ or 1.
Or, equivalently,
$P(X=x)=(p)^x (1-p)^{1-x}$ for $x=0,1$
Example: $Y\sim Binomial(3,0.3)$
$P(Y=3)=C^3_3 0.3^3 0.7^0= 0.027=2.7\%$
:apple: Quiz: $Y\sim Binomial(100, 0.45)$. Find
$$P(Y\leq 10)=P(Y=0)+P(Y=1)+\cdots+P(Y=10) = C^{100}_0 0.45^0 0.55^{100}
+\cdots+ C^{100}_{10} 0.45^{10} 0.55^{90} = 3 \times 10^{-14}$$ by online calculator.
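Both the worked example and the quiz can be checked with `scipy.stats.binom` instead of an online calculator (a sketch; `pmf` gives $P(Y=y)$ and `cdf` gives $P(Y\leq y)$):

```python
from scipy.stats import binom

# Worked example: P(Y = 3) for Y ~ Binomial(3, 0.3).
p_example = binom.pmf(3, 3, 0.3)
print(p_example)                      # 0.027

# Quiz: P(Y <= 10) for Y ~ Binomial(100, 0.45).
p_quiz = binom.cdf(10, 100, 0.45)
print(p_quiz)                         # about 3e-14, matching the calculator
```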
### Binomial random variable
How is it constructed?
$X_i\sim Bernoulli(p)$, $X_i$'s are independent,
Then, let $Y=\sum_{i=1}^n{X_i}$; we write $Y\sim Binomial(n, p)$.
$Y$ has the probability mass function (pmf)
$P(Y=y)=C^n_y p^y(1-p)^{(n-y)}$, for $y=0,1,2,\ldots,n$.
### Continuous random variable?
:strawberry: Why do we need continuous random variables? They simplify the calculation.
![](https://i.imgur.com/UvbkjD7.png)
A continuous random variable $X$ has a probability density function (pdf), denoted $f(x)$, such that
$$P(a<X<b) = \int_{a}^b f(x) dx.$$
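The area-under-the-pdf definition can be verified numerically (a sketch using a standard normal pdf; the endpoints $a$ and $b$ are arbitrary):

```python
from scipy import integrate, stats

# P(a < X < b) as the area under the pdf: integrate a N(0,1) density
# numerically and compare with the closed-form cdf difference.
a, b = -1.0, 2.0
area, _ = integrate.quad(stats.norm.pdf, a, b)
exact = stats.norm.cdf(b) - stats.norm.cdf(a)
print(area, exact)  # both ≈ 0.8186
```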
## Go to keynote (20240304)
## 10.2 Small-sample inferences concerning a population mean
### Key ingredients
### 1. $\sigma$ is known
Suppose $X_i\stackrel{i.i.d.}{\sim}N(\mu,\sigma^2)$ for $i=1,\ldots,n$. We estimate $\mu$ with $\bar{X}$ assuming $\sigma^2$ is known. The standardized estimator is
$$\frac{\bar{X}-\mu}{\sigma/\sqrt{n}}\sim Z.$$
#### Confidence interval with known $\sigma$
When $\sigma$ is known, $\frac{\bar{X}-\mu}{\sigma/\sqrt{n}}\sim Z$. Then, we have
\begin{eqnarray*}
1-\alpha &=& P(-z_{\alpha/2} < \frac{\bar{X}-\mu}{\sigma/\sqrt{n}} < z_{\alpha/2}) \\
&=& P(-z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < {\bar{X}-\mu} < z_{\alpha/2} \frac{\sigma}{\sqrt{n}})\\
&=& P(-z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < {\mu-\bar{X}} < z_{\alpha/2} \frac{\sigma}{\sqrt{n}})\\
&=& P(\bar{X}-z_{\alpha/2}\frac{\sigma}{\sqrt{n}}< \mu < \bar{X}+z_{\alpha/2} \frac{\sigma}{\sqrt{n}})\\
\end{eqnarray*}
We define the $100(1-\alpha)\%$ confidence interval for $\mu$ as $$\bar{X}\pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}. $$
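The interval above can be computed directly (a sketch; the summary numbers $\bar{x}$, $\sigma$, and $n$ are made up for illustration):

```python
from math import sqrt
from scipy.stats import norm

# 100(1-alpha)% z-interval for mu with known sigma:
# xbar ± z_{alpha/2} * sigma / sqrt(n).
xbar, sigma, n, alpha = 105.0, 15.0, 36, 0.05
z = norm.ppf(1 - alpha / 2)        # z_{0.025} ≈ 1.96
half = z * sigma / sqrt(n)         # half-width of the interval
ci = (xbar - half, xbar + half)
print(ci)                          # ≈ (100.1, 109.9)
```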
#### Hypothesis test
1. $H_0: \mu = \mu_0$ versus $H_a: \mu \neq \mu_0$ (or > or <)
2. Set up $\alpha$.
3. The test statistic and its sampling distribution is
$$Z_{STAT} = \frac{\bar{X}-\mu_0}{\sigma/\sqrt{n}}\sim Z.$$
4. Calculate the realized statistic, $Z^{\ast}$.
5. Find the rejection region or $p$-value.
6. Conclude.
- Reject or do not reject $H_0$.
- Conclude in plain language and get back to the scenario.
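The six steps can be sketched in code (the data summary $\bar{x}$, $\mu_0$, $\sigma$, and $n$ are hypothetical, for illustration only):

```python
from math import sqrt
from scipy.stats import norm

# Two-sided z-test of H0: mu = mu0 with known sigma.
xbar, mu0, sigma, n, alpha = 52.0, 50.0, 6.0, 36, 0.05
z_stat = (xbar - mu0) / (sigma / sqrt(n))   # realized Z*
p_value = 2 * norm.sf(abs(z_stat))          # two-sided p-value
reject = p_value < alpha
print(z_stat, p_value, reject)              # 2.0, ≈ 0.0455, True
```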
(Move the following to keynote 20240304)
## 10.3 Small sample inference for the difference between two population means: independent random samples.
## 10.5 Inferences concerning a population variance
Sometimes, the primary parameter of interest is not the population mean $\mu$, but rather the population variance $\sigma^2$.
### 4. CI for $\sigma^2$
### 5. For hypothesis testing:
### Example
A cement manufacturer claims that his cement has a compressive strength with a standard deviation of 10 kg/cm$^2$ or less. A sample of $n=10$ measurements produced a mean and standard deviation of 312 and 13.96, respectively.
![](https://i.imgur.com/Ll19QyX.png)
### CI for $\sigma^2$
Find the 95\% CI for the variance of the compressive strength.
With $n=10$, $s^2 = 13.96^2$, $\chi^2_{(n-1),1-\alpha/2}=2.70$, $\chi^2_{(n-1),\alpha/2} =19.02$, the 95% CI for $\sigma^2$ is
$$\left[\frac{(n-1)s^2}{\chi_{(n-1),\alpha/2}^2}, \frac{(n-1)s^2}{\chi_{(n-1),1-\alpha/2}^2}\right]=\left[\frac{9(13.96)^2}{19.02},\; \frac{9(13.96)^2}{2.70}\right]=[92.22,\; 649.61].$$
The 95% CI for $\sigma$ is
$$\left[\sqrt{\frac{(n-1)s^2}{\chi_{(n-1),\alpha/2}^2}}, \sqrt{\frac{(n-1)s^2}{\chi_{(n-1),1-\alpha/2}^2}}\right]=[\sqrt{92.22},\; \sqrt{649.61}]\approx[9.60,\; 25.49].$$
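The interval can be checked with `scipy.stats.chi2` (a sketch using the cement numbers $n=10$, $s=13.96$; the quantiles are exact rather than tabled):

```python
from scipy.stats import chi2

# 95% CI for sigma^2: divide (n-1)s^2 by the upper and lower chi^2 quantiles.
n, s, alpha = 10, 13.96, 0.05
ss = (n - 1) * s**2                             # (n-1)s^2 = 9 * 13.96^2
lower = ss / chi2.ppf(1 - alpha / 2, n - 1)     # / chi^2_{9, 0.025} ≈ 19.02
upper = ss / chi2.ppf(alpha / 2, n - 1)         # / chi^2_{9, 0.975} ≈ 2.70
print(lower, upper)                             # ≈ (92.2, 649.5)
```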
### Hypothesis test
1. $H_0: \sigma^2 = 10^2$ versus $H_a:\sigma^2 > 10^2$.
2. Set $\alpha=0.05$.
3. Test statistic is
$$\chi^2_{STAT} = \frac{(n-1)s^2}{100}\sim \chi^2_{(n-1)},\quad n-1=9.$$
4. Calculate the realized statistic, $$\chi^{2*} =\frac{(n-1)s^2}{100} = \frac{9(13.96)^2}{10^2} = 17.5.$$
### The rejection region approach.
5. The rejection region approach.
The rejection region is $\{\chi^2: \chi^2>16.919\}$.
6. Because $\chi^{2*} = 17.5$ falls into the rejection region, we reject $H_0:\sigma^2 = 100$ (and accept $H_a:\sigma^2>100$). We conclude that the standard deviation of the cement's compressive strength likely exceeds 10 kg/cm$^2$.
### The $p$-value approach.
5. The $p$-value approach.
The approximate $p$-value is
$$0.025 < p\text{-value}=P(\chi^2_{9} > 17.5) < 0.05.$$
6. Because the $p$-value is less than $\alpha=0.05$, we reject $H_0:\sigma^2 = 100$ (and accept $H_a:\sigma^2>100$). We conclude that the standard deviation of the cement's compressive strength likely exceeds 10 kg/cm$^2$.
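Both the rejection region and the $p$-value can be computed with `scipy.stats.chi2` (a sketch with the cement numbers):

```python
from scipy.stats import chi2

# Upper-tail chi-squared test of H0: sigma^2 = 100 for the cement example.
n, s, sigma0_sq, alpha = 10, 13.96, 100, 0.05
stat = (n - 1) * s**2 / sigma0_sq     # realized chi^2* ≈ 17.54
crit = chi2.ppf(1 - alpha, n - 1)     # rejection threshold ≈ 16.919
p_value = chi2.sf(stat, n - 1)        # P(chi^2_9 > stat)
print(stat, crit, p_value)            # p lies between 0.025 and 0.05
```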
![](https://i.imgur.com/WaaBeC8.png)
![](https://i.imgur.com/MZJGjOf.png)
## 10.6 Comparing two population variances
Updated in keynotes
### 10.7 Revisiting the small-sample assumptions
### Case 1: One population
$X_i\stackrel{i.i.d.}{\sim}N(\mu,\sigma^2),i=1,\ldots, n$.
### (10.2) focuses on $\mu$
The test statistic:
$$\frac{\bar{X}-\mu}{s/\sqrt{n}}\sim t_{df=(n-1)}.$$
- The $100(1-\alpha)\%$ CI for $\mu$ is $$[\bar{X}-t_{\alpha/2}\frac{s}{\sqrt{n}},\bar{X}+t_{\alpha/2}\frac{s}{\sqrt{n}}]$$
- For $H_0: \mu=\mu_0$, the test statistic is
$$\frac{\bar{x}-\mu_0}{s/\sqrt{n}}\sim t_{(n-1)}.$$
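Both the CI and the test for this case can be sketched with `scipy.stats` (the data are simulated purely for illustration; `ttest_1samp` implements the $t$-test above):

```python
import numpy as np
from scipy import stats

# One-sample t procedures when sigma is unknown.
rng = np.random.default_rng(2)
x = rng.normal(loc=50, scale=6, size=12)         # simulated sample

t_stat, p_value = stats.ttest_1samp(x, popmean=50)   # H0: mu = 50, two-sided

# The matching 95% t-interval for mu.
n, xbar, s = len(x), x.mean(), x.std(ddof=1)
t_crit = stats.t.ppf(0.975, df=n - 1)
ci = (xbar - t_crit * s / np.sqrt(n), xbar + t_crit * s / np.sqrt(n))
print(t_stat, p_value, ci)
```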
### (10.5) focuses on $\sigma^2$
$$\frac{(n-1)s^2}{\sigma^2}\sim \chi^2_{(n-1)}.$$
- The $100(1-\alpha)\%$ CI for $\sigma^2$ is
$$\left[\frac{(n-1)s^2}{\chi_{\alpha/2}^2}, \frac{(n-1)s^2}{\chi_{1-\alpha/2}^2}\right]$$
- For $H_0: \sigma^2 = \sigma_0^2$, the test statistic is
$$\chi^2_{STAT} = \frac{(n-1)s^2}{\sigma^2_0}\sim \chi^2_{(n-1)}.$$
### Case 2: Two populations
$$X_{1i}\stackrel{i.i.d.}{\sim}N(\mu_1,\sigma_1^2),i=1,\ldots, n_1$$
$$X_{2j}\stackrel{i.i.d.}{\sim}N(\mu_2,\sigma^2_2),j=1,\ldots, n_2$$
### (10.3) case 1 focuses on $\mu_1-\mu_2$ assuming $\sigma_1^2=\sigma_2^2$
$$t_{STAT} = \frac{\bar{x}_1-\bar{x}_2-(\mu_1-\mu_2)}{\sqrt{s_p^2\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}}\sim t_{(n_1+n_2-2)},$$with
$$s_p^2 = \frac{(n_1-1)s_1^2+(n_2-1)s_2^2}{n_1+n_2-2}.$$
- CI: $(\bar{x}_1-\bar{x}_2)\pm t_{\alpha/2} \sqrt{s_p^2\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}$
- For $H_0: \mu_1-\mu_2 = \mu_0$, if $\sigma_1^2 = \sigma^2_2$,
the test statistic is $$t_{STAT} = \frac{\bar{x}_1-\bar{x}_2-\mu_0}{\sqrt{s_p^2\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}}\sim t_{(n_1+n_2-2)}.$$
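The pooled formula agrees with `scipy.stats.ttest_ind` when `equal_var=True` (a sketch; the two samples are simulated for illustration):

```python
import numpy as np
from scipy import stats

# Pooled two-sample t-test, assuming sigma1^2 = sigma2^2.
rng = np.random.default_rng(3)
x1 = rng.normal(loc=10.0, scale=2.0, size=8)
x2 = rng.normal(loc=10.0, scale=2.0, size=10)

t_stat, p_value = stats.ttest_ind(x1, x2, equal_var=True)

# Reproduce the statistic from the pooled-variance formula.
n1, n2 = len(x1), len(x2)
sp2 = ((n1 - 1) * x1.var(ddof=1) + (n2 - 1) * x2.var(ddof=1)) / (n1 + n2 - 2)
t_manual = (x1.mean() - x2.mean()) / np.sqrt(sp2 * (1 / n1 + 1 / n2))
print(t_stat, t_manual)  # identical
```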
### (10.3) case 2 focuses on $\mu_1-\mu_2$ assuming $\sigma_1^2\neq\sigma_2^2$
$$t_{STAT} = \frac{(\bar{x}_1-\bar{x}_2)-(\mu_1-\mu_2)}{\sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}}\approx t_{df},$$with
$df=\frac{\left( \frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1-1}+\frac{(s_2^2/n_2)^2}{n_2-1}}$
- CI: $(\bar{x}_1-\bar{x}_2)\pm t_{\alpha/2}\sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}$
- For $H_0: \mu_1-\mu_2 = \mu_0$, if $\sigma_1^2 \neq \sigma^2_2$.
$$t_{STAT} = \frac{(\bar{x}_1-\bar{x}_2)-\mu_0}{\sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}}\approx t_{df}.$$
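Welch's version, including the approximate $df$ formula above, matches `scipy.stats.ttest_ind` with `equal_var=False` (a sketch; the data are simulated for illustration):

```python
import numpy as np
from scipy import stats

# Welch's t-test, allowing sigma1^2 != sigma2^2.
rng = np.random.default_rng(4)
x1 = rng.normal(loc=10.0, scale=1.0, size=8)
x2 = rng.normal(loc=10.0, scale=3.0, size=15)

t_stat, p_value = stats.ttest_ind(x1, x2, equal_var=False)

# Reproduce the statistic and the Satterthwaite df by hand.
v1, v2 = x1.var(ddof=1) / len(x1), x2.var(ddof=1) / len(x2)
df = (v1 + v2) ** 2 / (v1**2 / (len(x1) - 1) + v2**2 / (len(x2) - 1))
t_manual = (x1.mean() - x2.mean()) / np.sqrt(v1 + v2)
print(t_stat, t_manual, df)
```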
### (10.6) focuses on $\sigma_1^2/\sigma_2^2$
The test statistic: $$\frac{s_1^2/\sigma_1^2}{s_2^2/\sigma_2^2}\sim F, \quad df_1=(n_1-1),\; df_2 = (n_2-1).$$
- The $100(1-\alpha)\%$ confidence interval is
$$\left[\frac{s_1^2}{s_2^2}\frac{1}{F_{\alpha/2,df_1,df_2}},\; \frac{s_1^2}{s_2^2}{F_{\alpha/2,df_2,df_1}}\right]$$
- For $H_0: \sigma_1^2 = \sigma_2^2$, the test statistic is
$$F_{STAT} = \frac{s_1^2}{s_2^2}\sim F,\;df_1 = (n_1-1),df_2=(n_2-1).$$
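The $F$ statistic and a two-sided $p$-value can be computed with `scipy.stats.f` (a sketch; the samples are simulated for illustration, and the two-sided $p$-value here doubles the smaller tail area):

```python
import numpy as np
from scipy import stats

# F-test for H0: sigma1^2 = sigma2^2.
rng = np.random.default_rng(5)
x1 = rng.normal(scale=2.0, size=12)
x2 = rng.normal(scale=2.0, size=10)

f_stat = x1.var(ddof=1) / x2.var(ddof=1)     # s1^2 / s2^2
df1, df2 = len(x1) - 1, len(x2) - 1
# Two-sided p-value: double the smaller of the two tail areas.
p_value = 2 * min(stats.f.sf(f_stat, df1, df2), stats.f.cdf(f_stat, df1, df2))
print(f_stat, p_value)
```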
### Case 3: Paired population difference
$$X_{1i}-X_{2i}\stackrel{i.i.d.}{\sim} N(\mu_D,\sigma_D^2)$$
or, equivalently,
$$D_i\sim N(\mu_D,\sigma_D^2),$$
with $D_{i}=X_{1i}-X_{2i}$.
### (10.4) focuses on $\mu_D$
$$\frac{\bar{D}-\mu_D}{s_D/\sqrt{n}}\sim t_{df=n-1}.$$
- The $100(1-\alpha)\%$ confidence interval for $\mu_D=\mu_1-\mu_2$ is
$$\bar{D}\pm t_{\alpha/2} \frac{s_D}{\sqrt{n}}.$$
- For $H_0: \mu_1-\mu_2 = \mu_{0}$, the test statistic is
$$t_{STAT}=\frac{\bar{D}-\mu_0}{s_D/\sqrt{n}}\sim t_{(n-1)}.$$
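The paired procedure reduces to a one-sample $t$-test on the differences, which is what `scipy.stats.ttest_rel` computes (a sketch; the paired measurements are simulated for illustration):

```python
import numpy as np
from scipy import stats

# Paired t-test: work with the differences D_i = X_{1i} - X_{2i}.
rng = np.random.default_rng(6)
x1 = rng.normal(loc=100, scale=10, size=9)
x2 = x1 + rng.normal(loc=-2, scale=3, size=9)   # correlated second measurement

t_stat, p_value = stats.ttest_rel(x1, x2)       # paired test, H0: mu_D = 0

# Reproduce the statistic from the one-sample formula on D.
d = x1 - x2
t_manual = d.mean() / (d.std(ddof=1) / np.sqrt(len(d)))
print(t_stat, t_manual)  # identical
```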