---
title: Stat07
tags: Stat
---

# 7 Sampling distributions

## Assess normality

Quantile-Quantile plot (QQplot)

![](https://i.imgur.com/S7BBVbs.png)
![](https://i.imgur.com/ahu7V7E.png)
![](https://i.imgur.com/Vj3pqDr.png)
![](https://i.imgur.com/VkGGo2D.png)

## Overview

Recall $X\sim F(\theta)$:

- Discrete distribution with pmf $p(x)$:
  $$E[X] = \sum_x x\, p(x)$$
  $$E[g(X)] = \sum_x g(x)\, p(x)$$
- Continuous distribution with pdf $f(x)$:
  $$E[X] = \int x\, f(x)\, dx$$
  $$E[g(X)] = \int g(x)\, f(x)\, dx$$
- In both cases:
  $$Var(X) = E[(X-E(X))^2]$$

### Example

$X\sim Bernoulli(p)$. Find its mean and variance.

$$E(X)= 1\times p+ 0 \times (1-p) = p.$$
$$Var(X) = (1-p)^2\times p + (0-p)^2\times (1-p)=p(1-p)=pq,$$
with $q = 1-p$.

| Index | Distribution | Notation $F(\theta)$ | Mean | Variance |
| --- | --- | --- | --- | --- |
| 1 | Normal distribution with mean $\mu$ and variance $\sigma^2$ | $N(\mu,\sigma^2)$ | $\mu$ | $\sigma^2$ |
| 2 | Bernoulli distribution with success probability $p$ | $Bernoulli(p)$ | $p$ | $pq$ |

### Population

Let $F(\theta)$ denote a distribution with parameter $\theta=(\theta_1,\theta_2,\ldots,\theta_p)$.

### Samples

Let $X_i\stackrel{i.i.d}{\sim} F(\theta)$ for $i=1,\ldots, n$. We call $X_1,\ldots,X_n$ a sample of size $n$ from $F(\theta)$.

### A statistic?

- One statistic: *a statistic* (countable noun, singular)
- Many statistics: *statistics* (countable noun, plural)
- The discipline of statistics: *statistics* (uncountable noun)

Mnemonic: the Chinese terms **統計** (statistics) and **統計量** (a statistic) can both be read as 統統拿來計算一下, "take everything and compute with it."

- Consider a sample of size $n$ from a distribution $F(\theta)$: let $X_i\stackrel{i.i.d}{\sim} F(\theta)$ for $i=1,\ldots, n$.
- A statistic has the form $g(X_1,\ldots,X_n)$ for some function $g(\cdot)$ that does not depend on the unknown parameter $\theta$.
- A statistic is used to estimate a parameter.

:::info
A statistic has the form
$$g(X_1,\ldots,X_n),$$
for some function $g(\cdot)$.
:::

### Examples of a statistic

#### General case

Consider a sample of size $n$ from a distribution $F(\theta)$ with population mean $\mu$ and population variance $\sigma^2$. Let $X_i\stackrel{i.i.d}{\sim} F(\theta)$ for $i=1,\ldots, n$.
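The following minimal Python sketch makes this setup concrete. It assumes a $Bernoulli(p)$ population as an illustrative choice of $F(\theta)$, and the helper names `draw_sample` and `realize` are hypothetical. The code draws one i.i.d. sample and evaluates two different functions $g$ on it. Each function is a valid statistic because it uses only the data.

```python
import random

def draw_sample(n, p, seed=None):
    """Draw X_1, ..., X_n i.i.d. from Bernoulli(p).

    Bernoulli(p) is an illustrative stand-in for F(theta).
    """
    rng = random.Random(seed)
    # rng.random() is uniform on [0, 1), so this gives 1 with probability p
    return [1 if rng.random() < p else 0 for _ in range(n)]

def realize(g, sample):
    """Plug an observed sample into a statistic g(X_1, ..., X_n)."""
    return g(sample)

sample = draw_sample(n=10, p=0.5, seed=1)
print(sample)
print(realize(sum, sample))  # g = sum: the number of 1s in the sample
print(realize(max, sample))  # g = max: another function of the data only
```

Each run with a different seed gives a different sample and therefore different realized values. This is why a statistic is a random variable before sampling.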
To estimate the population mean $\mu$, we use the *sample mean*
$$\bar{X}:=\bar{X}_n=\frac{1}{n}\sum_{i=1}^n X_i=\frac{X_1+X_2+\cdots+X_n}{n}=g_{m}(X_1,\ldots,X_n).$$
To estimate the population variance $\sigma^2$, we use the *sample variance*
$$s^2:=s^2_n=\frac{1}{n-1}\sum_{i=1}^n(X_i-\bar{X})^2=\frac{(X_1-\bar{X})^2+\cdots+(X_n-\bar{X})^2}{n-1}=g_{v}(X_1,\ldots,X_n).$$

#### $N(\mu,\sigma^2)$

If $F(\theta)$ is $N(\mu,\sigma^2)$, we estimate $\mu$ by $\bar{X}$ and $\sigma^2$ by $s^2$.

#### $Bernoulli(p)$

If $F(\theta)$ is $Bernoulli(p)$, we know $\mu = p$ and $\sigma^2 = pq$. We estimate $p$ by
$$\hat{p} \stackrel{\triangle}{=} \bar{X}=\frac{1}{n}\sum_{i=1}^n X_i.$$
We call $\hat{p}$ the *sample proportion*.

| Index | Notation $F(\theta)$ | Parameter | Statistic |
| --- | --- | --- | --- |
| Case 1 | $N(\mu,\sigma^2)$ | $\mu$ | $\bar{X}$ |
| | $N(\mu,\sigma^2)$ | $\sigma^2$ | $s^2$ |
| Case 2 | $Bernoulli(p)$ | $p$ | $\hat{p}$ |

### A statistic versus a realized statistic

- Before sampling: a statistic can be written as $g(X_1,\ldots,X_n)$ and hence is a random variable. :+1: The distribution of the **statistic** is called the **sampling distribution**.
- After sampling: after collecting data $x_1,\ldots,x_n$ and plugging them into the formula of the statistic, we obtain a *realized statistic* $g(x_1,\ldots,x_n)$, which is a fixed number.

## :apple: Show the following 6 examples

The sampling distributions below are obtained in three ways:

1. Calculate by hand.
2. Approximate using simulation (no worries).
3. Approximate by the Central Limit Theorem (CLT).

### Example 1 (slides page 12)

### Example 1a. Another example of sampling distribution

Toss a coin three times. We record $1$ if a head faces up and $0$ otherwise. Suppose the coin is fair, i.e., the probability of getting a head is $p=1/2$. Let $X_i$ denote the outcome of the $i$-th toss. Then $X_i\stackrel{i.i.d.}{\sim}Bernoulli(p=1/2)$.

1. Enumerate all possible outcomes of $(X_1,X_2,X_3)$ and their probabilities.
2. Find the sampling distribution of the sample proportion $\hat{p}=\frac{1}{3}(X_1+X_2+X_3)$.
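Steps 1 and 2 can also be checked by brute force. The short Python sketch below uses exact fractions from the standard library, and its variable names are illustrative only. It enumerates the $2^3=8$ outcomes, multiplies the Bernoulli probabilities for each outcome, and then groups the outcomes by their value of $\hat{p}$.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

p = Fraction(1, 2)  # fair coin: P(head) = 1/2

# Step 1: every outcome (X1, X2, X3) and its probability (independent tosses)
outcomes = {}
for x in product([1, 0], repeat=3):
    prob = Fraction(1)
    for xi in x:
        prob *= p if xi == 1 else 1 - p
    outcomes[x] = prob

# Step 2: add up the probabilities of outcomes sharing the same p_hat
sampling_dist = defaultdict(Fraction)  # Fraction() == 0 is the default
for x, prob in outcomes.items():
    p_hat = Fraction(sum(x), 3)
    sampling_dist[p_hat] += prob

for value in sorted(sampling_dist):
    print(value, sampling_dist[value])
```

Running it prints `0 1/8`, `1/3 3/8`, `2/3 3/8` and `1 1/8`. You can change `p` to see how the sampling distribution of $\hat{p}$ changes for a biased coin.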
#### Solution

| $(X_1,X_2,X_3)$ | $\hat{p}$ | Prob |
| --- | --- | --- |
| (1,1,1) | 1 | 1/8 |
| (1,1,0) | 2/3 | 1/8 |
| (1,0,1) | 2/3 | 1/8 |
| (1,0,0) | 1/3 | 1/8 |
| (0,1,1) | 2/3 | 1/8 |
| (0,1,0) | 1/3 | 1/8 |
| (0,0,1) | 1/3 | 1/8 |
| (0,0,0) | 0 | 1/8 |

We summarize the sampling distribution of $\hat{p}$ as follows.

| $\hat{p}$ | Prob |
| --- | --- |
| 0 | 1/8 |
| 1/3 | 3/8 |
| 2/3 | 3/8 |
| 1 | 1/8 |

### Example 2 (slides p. 14-16)

Approximate the sampling distribution by simulation.

## Central Limit Theorem (CLT)

:::info
Recall:
- Expectation (or mean): $E[\bar{X}] = \mu$
- Variance: $Var(\bar{X})=\sigma^2/n$
- Standard deviation: $SD(\bar{X}) =\sqrt{\frac{\sigma^2}{n}}=\sigma/\sqrt{n}$ (also called the standard error)
:::

### General case

- Consider a sample of size $n$ from a distribution $F(\theta)$ with population mean $\mu$ and finite population variance $\sigma^2$.
- Suppose $X_i\stackrel{i.i.d}{\sim} F(\theta)$ for $i=1,\ldots, n$.
- Let $\bar{X}$ denote the sample mean $\bar{X}=\frac{1}{n}\sum_{i=1}^n X_i$.

Then, as $n\to\infty$,
$$\frac{\bar{X}-\mu}{\sqrt{\sigma^2/n}}\stackrel{d}{\rightarrow}N(0,1).$$

:::success
Plain language: the standardized sample mean converges in distribution to the standard normal distribution.
:::

### Special case 1

- Consider a sample of size $n$ from a distribution $N(\mu,\sigma^2)$.
- Suppose $X_i\stackrel{i.i.d}{\sim} N(\mu,\sigma^2)$ for $i=1,\ldots, n$.

Then, for every $n$ (exactly, not only in the limit),
$$\frac{\bar{X}-\mu}{\sqrt{\sigma^2/n}} \sim N(0,1).$$

### Example 3 (slides p. 20)

### Example 4 (slides p. 21)

### Special case 2

- Consider a sample of size $n$ from a distribution $Bernoulli(p)$ (i.e., with population mean $\mu=p$ and population variance $\sigma^2=pq$).
- Suppose $X_i\stackrel{i.i.d}{\sim} Bernoulli(p)$ for $i=1,\ldots, n$. Then, with $\hat{p}=\bar{X}$,
  $$\frac{\hat{p}-p}{\sqrt{pq/n}} \stackrel{d.}{\rightarrow} N(0,1).$$
- An alternative expression in the textbook: let $X=X_1+\cdots+X_n\sim Binomial(n,p)$ be the number of successes, and denote the sample proportion by $\hat{p}=\frac{X}{n}$. Equivalently, we have
  $$\frac{\hat{p}-p}{\sqrt{pq/n}}=\frac{X-np}{\sqrt{npq}} \stackrel{d.}{\rightarrow} N(0,1).
  $$

### Example 5 (7.5.9)

Random samples of size $n=75$ were selected from a Binomial population with $p=0.4$. Use the CLT to approximate $P(\hat{p}\leq 0.43)$.

#### Solution

With $Z\sim N(0,1)$,
$$P(\hat{p}\leq 0.43)=P\left(\frac{\hat{p}-0.4}{\sqrt{0.4\times 0.6/75}}\leq \frac{0.43-0.4}{\sqrt{0.4\times 0.6/75}}\right)\approx P\left(Z\leq \frac{0.43-0.4}{\sqrt{0.4\times 0.6/75}}\right)=P(Z\leq 0.53)\approx 0.70.$$

### Example 6 (7.5.27)

A fair coin is tossed $n=80$ times. Let $\hat{p}$ be the sample proportion of heads. Find $P(0.44<\hat{p}<0.61)$.

#### Solution

$$
\begin{aligned}
P(0.44<\hat{p}<0.61) &= P\left(\frac{0.44-0.5}{\sqrt{0.5\times0.5/80}}<\frac{\hat{p}-0.5}{\sqrt{0.5\times0.5/80}}<\frac{0.61-0.5}{\sqrt{0.5\times0.5/80}}\right)\\
&\approx P\left(\frac{0.44-0.5}{\sqrt{0.5\times0.5/80}}<Z<\frac{0.61-0.5}{\sqrt{0.5\times0.5/80}}\right)\\
&\approx P(-1.073 <Z<1.968)\approx 0.8339.
\end{aligned}
$$
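The CLT calculations in Examples 5 and 6 can be reproduced numerically. This minimal Python sketch uses only the standard library, and the function names are hypothetical. The function `phi` computes the standard normal CDF using the identity $\Phi(z)=\tfrac{1}{2}\left(1+\operatorname{erf}(z/\sqrt{2})\right)$ with `math.erf`. For comparison with the CLT value in Example 6, a small Monte Carlo loop estimates the exact probability, following the simulation approach of Example 2.

```python
import math
import random

def phi(z):
    """Standard normal CDF: Phi(z) = (1 + erf(z / sqrt(2))) / 2."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def clt_cdf(c, n, p):
    """CLT approximation of P(p_hat <= c), with p_hat = X/n, X ~ Binomial(n, p)."""
    se = math.sqrt(p * (1 - p) / n)  # standard error sqrt(pq/n)
    return phi((c - p) / se)

def simulate_between(lo, hi, n, p, reps=20_000, seed=0):
    """Monte Carlo estimate of P(lo < p_hat < hi) from `reps` simulated samples."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        x = sum(rng.random() < p for _ in range(n))  # successes in one sample
        if lo < x / n < hi:
            hits += 1
    return hits / reps

# Example 5: n = 75, p = 0.4, P(p_hat <= 0.43)
ex5 = clt_cdf(0.43, n=75, p=0.4)

# Example 6: n = 80, p = 0.5, P(0.44 < p_hat < 0.61)
ex6 = clt_cdf(0.61, n=80, p=0.5) - clt_cdf(0.44, n=80, p=0.5)
ex6_sim = simulate_between(0.44, 0.61, n=80, p=0.5)

print(f"Example 5 (CLT):        {ex5:.4f}")
print(f"Example 6 (CLT):        {ex6:.4f}")
print(f"Example 6 (simulation): {ex6_sim:.4f}")
```

The two CLT values should match 0.70 and 0.8339 above, up to rounding. The simulated value estimates the exact probability $P(36\le X\le 48)$ for $X\sim Binomial(80,0.5)$ and comes out somewhat lower than the CLT value. The reason is that $\hat{p}$ only takes the values $k/80$, while the CLT calculation treats it as continuous and applies no continuity correction.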