---
title: Stat10
tags: Stat
---

## Chapter 10

[Stat202202](/q9HjXLBNSlaycgL5vCF-dQ)

## 2023/3/13

### General principles

## 2023/3/6

Distributions you need to know in this course

### Example

Model: Assume $X_i \sim N(\mu,\sigma^2)$.

A statistic (統計量): a function of the random sample.

Examples
1. sample mean $\bar{X}=\frac{1}{n}\sum_{i=1}^n X_i$
2. sample variance $S^2=\frac{1}{n-1}\sum_{i=1}^n (X_i-\bar{X})^2$

Examples:
1. When $\sigma$ is known, $\frac{\bar{X}-\mu}{\sigma/\sqrt{n}}\sim N(0,1)$.
2. When $\sigma$ is unknown, $\frac{\bar{X}-\mu}{S/\sqrt{n}}\sim t_{(n-1)}$.
3. $\frac{(n-1)S^2}{\sigma^2}\sim \chi^2_{(n-1)}$.
4. For two independent normal samples, $\frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2}\sim F_{\nu_1,\nu_2}$ with $\nu_1=n_1-1$ and $\nu_2=n_2-1$.

### $\chi^2_\nu$?

Chi-squared distribution with $\nu$ degrees of freedom.

Suppose $Z_i\stackrel{iid}{\sim} N(0,1)$ for $i=1,\ldots,\nu$. Then we say $\sum_{i=1}^{\nu} Z_i^2 \sim \chi^2_\nu$.

An interesting example:
$$\frac{(n-1)s^2}{\sigma^2}=\frac{\sum(x_i-\bar{x})^2}{\sigma^2}=\sum \left(\frac{x_i-\bar{x}}{\sigma}\right)^2\approx \sum \left(\frac{x_i-\mu}{\sigma}\right)^2 \sim \chi^2_n$$
(Replacing $\bar{x}$ by $\mu$ here is only a heuristic; because $\bar{x}$ is estimated from the data, one degree of freedom is lost and the exact distribution is $\chi^2_{n-1}$.)

We are interested in:
1. $\mu$: the mean of a normal distribution
2. $\mu_1-\mu_2$: the difference between two population means
3. $\sigma^2$: the variance of a normal distribution
4. ${\sigma_1^2}/{\sigma_2^2}$: the ratio of two population variances

### General principles

For a dataset, we do the following:
1. Model specification: $N(\mu,\sigma^2)$
2. Point estimation: estimate $\mu$ and $\sigma^2$ (e.g., by $\bar{X}$ and $S^2$).
3. Identify the pivot statistic and its sampling distribution (assuming all other parameters are known/some parameters are unknown)
4. Confidence interval
5. Hypothesis test

## 2023/02/13

Review of Probability

### Interpretation of probability

1. Repeat the experiment $n$ times, with $n\rightarrow \infty$
2. Take the ratio (relative frequency)
$$P(A) = \lim_{n\rightarrow \infty}\frac{\# A}{n}$$

### Discrete random variable?

$X$: a random variable
$x$: a possible realization

#### Ex: Bernoulli random variable

Bernoulli trial/experiment

The outcome can only be 1 (success) or 0 (failure). The probability of getting 1 is $p$, and the probability of getting 0 is $(1-p)$.

### Bernoulli random variable

$X \sim Bernoulli(p)$, where $p$ is the parameter.

$X$ has the probability mass function: $P(X=1)=p$, $P(X=0)=1-p$, and $P(X=x)=0$ if $x\neq 0$ or $1$.

Or, equivalently, $P(X=x)=p^x (1-p)^{1-x}$ for $x=0,1$.

Example: $Y\sim Binomial(3,0.3)$

$P(Y=3)=C^3_3\, 0.3^3\, 0.7^0= 0.027=2.7\%$

:apple: Quiz: $Y\sim Binomial(100, 0.45)$. Find
$$P(Y\leq 10)=P(Y=0)+P(Y=1)+\cdots+P(Y=10) = C^{100}_0\, 0.45^0\, 0.55^{100} +\cdots+ C^{100}_{10}\, 0.45^{10}\, 0.55^{90} \approx 3 \times 10^{-14}$$
by online calculator.

### Binomial random variable

How is it constructed? Let $X_i\sim Bernoulli(p)$, where the $X_i$'s are independent. Then, with $Y=\sum_{i=1}^n{X_i}$, we write $Y\sim Binomial(n, p)$.

$Y$ has the probability mass function (pmf) $P(Y=y)=C^n_y\, p^y(1-p)^{n-y}$, for $y=0,1,2,\ldots,n$.
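The binomial probabilities above can also be checked without an online calculator. Below is a minimal sketch in Python (assuming `scipy` is available) that evaluates the pmf for the small example and the cdf for the quiz:

```python
# Minimal check of the Binomial examples above (requires scipy).
from scipy.stats import binom

# P(Y = 3) for Y ~ Binomial(3, 0.3)
print(binom.pmf(3, n=3, p=0.3))      # 0.027

# Quiz: P(Y <= 10) for Y ~ Binomial(100, 0.45)
print(binom.cdf(10, n=100, p=0.45))  # ~3e-14, matching the online-calculator value
```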
### Continuous random variable?

:strawberry: Why do we need a continuous random variable? We would like to simplify the calculation.

![](https://i.imgur.com/UvbkjD7.png)

A continuous random variable $X$ has a probability density function (pdf), denoted $f(x)$, so that
$$P(a<X<b) = \int_{a}^b f(x)\, dx.$$

## Go to keynote (20240304)

## 10.2 Small-sample inferences concerning a population mean

### Key ingredients

### 1. $\sigma$ is known

Suppose $X_i\stackrel{i.i.d.}{\sim}N(\mu,\sigma^2)$ for $i=1,\ldots,n$. We estimate $\mu$ with $\bar{X}$, assuming $\sigma^2$ is known.

The standardized estimator is
$$\frac{\bar{X}-\mu}{\sigma/\sqrt{n}}\sim N(0,1).$$

#### Confidence interval with known $\sigma$

When $\sigma$ is known, $\frac{\bar{X}-\mu}{\sigma/\sqrt{n}}\sim N(0,1)$. Then, we have
\begin{eqnarray*}
1-\alpha &=& P\left(-z_{\alpha/2} < \frac{\bar{X}-\mu}{\sigma/\sqrt{n}} < z_{\alpha/2}\right) \\
&=& P\left(-z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < {\bar{X}-\mu} < z_{\alpha/2} \frac{\sigma}{\sqrt{n}}\right)\\
&=& P\left(-z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < {\mu-\bar{X}} < z_{\alpha/2} \frac{\sigma}{\sqrt{n}}\right)\\
&=& P\left(\bar{X}-z_{\alpha/2}\frac{\sigma}{\sqrt{n}}< \mu < \bar{X}+z_{\alpha/2} \frac{\sigma}{\sqrt{n}}\right)
\end{eqnarray*}
We define the $100(1-\alpha)\%$ confidence interval for $\mu$ as
$$\bar{X}\pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}.$$

#### Hypothesis test

1. $H_0: \mu = \mu_0$ versus $H_a: \mu \neq \mu_0$ (or $>$ or $<$)
2. Set up $\alpha$.
3. The test statistic and its sampling distribution is $$Z_{STAT} = \frac{\bar{X}-\mu_0}{\sigma/\sqrt{n}}\sim N(0,1).$$
4. Calculate the realized statistic, $Z^{\ast}$.
5. Find the rejection region or $p$-value.
6. Conclude.
    - Reject or do not reject $H_0$.
    - Conclude in plain language and get back to the scenario.

(Move the following to keynote 20240304)

## 10.3 Small-sample inference for the difference between two population means: independent random samples

## 10.5 Inferences concerning a population variance

Sometimes, the primary parameter of interest is not the population mean $\mu$, but rather the population variance $\sigma^2$.

### 4. CI for $\sigma^2$

### 5. For hypothesis testing:

### Example

A cement manufacturer claims that his cement has a compressive strength with a standard deviation of 10 kg/cm$^2$ or less. A sample of $n=10$ measurements produced a mean and standard deviation of 312 and 13.96, respectively.

![](https://i.imgur.com/Ll19QyX.png)

### CI for $\sigma^2$

Find the 95% CI for the variance of the compressive strength.

With $n=10$, $s^2 = 13.96^2$, $\chi^2_{(n-1),1-\alpha/2}=2.70$, and $\chi^2_{(n-1),\alpha/2} =19.02$, the 95% CI for $\sigma^2$ is
$$\left[\frac{(n-1)s^2}{\chi_{(n-1),\alpha/2}^2}, \frac{(n-1)s^2}{\chi_{(n-1),1-\alpha/2}^2}\right]=\left[\frac{9(13.96)^2}{19.02},\; \frac{9(13.96)^2}{2.70}\right]=[92.23,\; 649.74].$$
The 95% CI for $\sigma$ is
$$\left[\sqrt{\frac{(n-1)s^2}{\chi_{(n-1),\alpha/2}^2}}, \sqrt{\frac{(n-1)s^2}{\chi_{(n-1),1-\alpha/2}^2}}\right]=[\sqrt{92.23},\; \sqrt{649.74}].$$

### Hypothesis test

1. $H_0: \sigma^2 = 10^2$ v.s. $H_1:\sigma^2 > 10^2$.
2. Set $\alpha=0.05$.
3. Test statistic is $$\chi^2_{STAT} = \frac{(n-1)s^2}{10^2}\sim \chi^2_{(n-1)=9}.$$
4. Calculate the realized statistic, $$\chi^{2*} =\frac{(n-1)s^2}{10^2} = \frac{9(13.96)^2}{10^2} = 17.5.$$

### The rejection region approach

5. The rejection region is $\{\chi^2: \chi^2>16.919\}$.
6. Because $\chi^{2*} = 17.5$ falls into the rejection region, we reject $H_0:\sigma^2 = 100$ (and accept $H_1:\sigma^2>100$). We conclude that the standard deviation of the cement strength is likely more than 10.

### The $p$-value approach

5. The approximate $p$-value is $$0.025 < p\mbox{-value}=P(\chi^2_{(9)} > 17.5) < 0.05.$$
6. Because the $p$-value is less than $\alpha=0.05$, we reject $H_0:\sigma^2 = 100$ (and accept $H_1:\sigma^2>100$). We conclude that the standard deviation of the cement strength is likely more than 10.

![](https://i.imgur.com/WaaBeC8.png)
![](https://i.imgur.com/MZJGjOf.png)
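The numbers in the cement example can be reproduced directly. Here is a minimal sketch in Python (assuming `scipy` is available) that computes the test statistic, the rejection cutoff, the $p$-value, and the 95% CI for $\sigma^2$:

```python
# Chi-squared inference for a variance: the cement example (requires scipy).
from scipy.stats import chi2

n, s, sigma0 = 10, 13.96, 10.0
df = n - 1

# Test statistic: (n-1) s^2 / sigma0^2
chi2_stat = df * s**2 / sigma0**2
print(chi2_stat)                    # ~17.54

# Rejection cutoff and p-value for H1: sigma^2 > 100
print(chi2.ppf(0.95, df))           # ~16.919; reject H0 since 17.54 > 16.919
print(chi2.sf(chi2_stat, df))       # p-value, between 0.025 and 0.05

# 95% CI for sigma^2: [(n-1)s^2 / chi2_{0.025}, (n-1)s^2 / chi2_{0.975}]
lower = df * s**2 / chi2.ppf(0.975, df)
upper = df * s**2 / chi2.ppf(0.025, df)
print(lower, upper)                 # ~[92.2, 649.6]; small differences from table rounding
```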
## 10.6 Comparing two population variances

Updated in keynotes

### 10.7 Revisiting the small-sample assumptions

### Case 1: One population

$X_i\stackrel{i.i.d.}{\sim}N(\mu,\sigma^2),\; i=1,\ldots, n$.

### (10.2) focuses on $\mu$

The pivot statistic:
$$\frac{\bar{X}-\mu}{s/\sqrt{n}}\sim t_{df=(n-1)}.$$
- The $100(1-\alpha)\%$ CI for $\mu$ is $$\left[\bar{X}-t_{\alpha/2}\frac{s}{\sqrt{n}},\;\bar{X}+t_{\alpha/2}\frac{s}{\sqrt{n}}\right]$$
- For $H_0: \mu=\mu_0$, the test statistic is $$\frac{\bar{x}-\mu_0}{s/\sqrt{n}}\sim t_{(n-1)}.$$

### (10.5) focuses on $\sigma^2$

$$\frac{(n-1)s^2}{\sigma^2}\sim \chi^2_{(n-1)}.$$
- The $100(1-\alpha)\%$ CI for $\sigma^2$ is $$\left[\frac{(n-1)s^2}{\chi_{\alpha/2}^2}, \frac{(n-1)s^2}{\chi_{1-\alpha/2}^2}\right]$$
- For $H_0: \sigma^2 = \sigma_0^2$, the test statistic is $$\chi^2_{STAT} = \frac{(n-1)s^2}{\sigma^2_0}\sim \chi^2_{(n-1)}.$$

### Case 2: Two populations

$$X_{1i}\stackrel{i.i.d.}{\sim}N(\mu_1,\sigma_1^2),\; i=1,\ldots, n_1$$
$$X_{2j}\stackrel{i.i.d.}{\sim}N(\mu_2,\sigma^2_2),\; j=1,\ldots, n_2$$

### (10.3) case 1 focuses on $\mu_1-\mu_2$ assuming $\sigma_1^2=\sigma_2^2$

$$t_{STAT} = \frac{\bar{x}_1-\bar{x}_2-(\mu_1-\mu_2)}{\sqrt{s_p^2\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}}\sim t_{(n_1+n_2-2)},$$
with
$$s_p^2 = \frac{(n_1-1)s_1^2+(n_2-1)s_2^2}{n_1+n_2-2}.$$
- CI: $(\bar{x}_1-\bar{x}_2)\pm t_{\alpha/2} \sqrt{s_p^2\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}$
- For $H_0: \mu_1-\mu_2 = \mu_0$, if $\sigma_1^2 = \sigma^2_2$, the test statistic is $$t_{STAT} = \frac{\bar{x}_1-\bar{x}_2-\mu_0}{\sqrt{s_p^2\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}}\sim t_{(n_1+n_2-2)}.$$

### (10.3) case 2 focuses on $\mu_1-\mu_2$ assuming $\sigma_1^2\neq\sigma_2^2$

$$t_{STAT} = \frac{(\bar{x}_1-\bar{x}_2)-(\mu_1-\mu_2)}{\sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}}\approx t_{df},$$
with
$$df=\frac{\left( \frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1-1}+\frac{(s_2^2/n_2)^2}{n_2-1}}$$
- CI: $(\bar{x}_1-\bar{x}_2)\pm t_{\alpha/2}\sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}$
- For $H_0: \mu_1-\mu_2 = \mu_0$, if $\sigma_1^2 \neq \sigma^2_2$, the test statistic is $$t_{STAT} = \frac{(\bar{x}_1-\bar{x}_2)-\mu_0}{\sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}}\approx t_{df}.$$

### (10.6) focuses on $\sigma_1^2/\sigma_2^2$

The pivot statistic:
$$\frac{s_1^2/\sigma_1^2}{s_2^2/\sigma_2^2}\sim F,\; df_1=(n_1-1),\; df_2 = (n_2-1).$$
- The $100(1-\alpha)\%$ confidence interval is $$\left[\frac{s_1^2}{s_2^2}\frac{1}{F_{\alpha/2,df_1,df_2}},\; \frac{s_1^2}{s_2^2}{F_{\alpha/2,df_2,df_1}}\right]$$
- For $H_0: \sigma_1^2 = \sigma_2^2$, the test statistic is $$F_{STAT} = \frac{s_1^2}{s_2^2}\sim F,\;df_1 = (n_1-1),\;df_2=(n_2-1).$$

### Case 3: Paired population difference

$$X_{1i}-X_{2i}\stackrel{i.i.d.}{\sim} N(\mu_D,\sigma_D^2)$$
or, equivalently,
$$D_i\sim N(\mu_D,\sigma_D^2),$$
with $D_{i}=X_{1i}-X_{2i}$.

### (10.4) focuses on $\mu_D$

$$\frac{\bar{D}-\mu_D}{s_D/\sqrt{n}}\sim t_{df=n-1}.$$
- The $100(1-\alpha)\%$ confidence interval for $\mu_D=\mu_1-\mu_2$ is $$\bar{D}\pm t_{\alpha/2} \frac{s_D}{\sqrt{n}}.$$
- For $H_0: \mu_D = \mu_{0}$, the test statistic is $$t_{STAT}=\frac{\bar{D}-\mu_0}{s_D/\sqrt{n}}\sim t_{(n-1)}.$$
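As a quick numeric illustration of the Case 3 formulas, here is a minimal sketch in Python (assuming `scipy` and `numpy` are available) on a small set of made-up paired measurements; it computes the CI for $\mu_D$ from the formula above and cross-checks the hand-computed $t$ statistic against `scipy.stats.ttest_rel`:

```python
# Paired-sample t inference (Case 3) on made-up data (requires numpy and scipy).
import numpy as np
from scipy.stats import t, ttest_rel

# Hypothetical paired measurements (e.g., before/after on the same units)
x1 = np.array([31.2, 29.8, 30.5, 32.1, 28.9, 30.0])
x2 = np.array([30.1, 29.0, 30.2, 31.0, 28.5, 29.1])
d = x1 - x2
n = len(d)

dbar = d.mean()
s_d = d.std(ddof=1)          # sample standard deviation of the differences

# 95% CI for mu_D: dbar +/- t_{alpha/2, n-1} * s_d / sqrt(n)
alpha = 0.05
t_crit = t.ppf(1 - alpha / 2, df=n - 1)
ci = (dbar - t_crit * s_d / np.sqrt(n), dbar + t_crit * s_d / np.sqrt(n))
print(ci)

# Test H0: mu_D = 0; the hand-computed statistic matches scipy's paired t-test
t_stat = dbar / (s_d / np.sqrt(n))
print(t_stat, ttest_rel(x1, x2))
```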