---
title: Stat07
tags: Stat
---
# 7 Sampling distributions
## Overview
Recall $X\sim F(\theta)$:
- Discrete distribution with pmf $p(x)$:
$$E[X] = \sum x p(x)$$
$$E[g(X)] = \sum g(x) p(x)$$
- Continuous distribution with pdf $f(x)$:
$$E[X] = \int x f(x) dx$$
$$E[g(X)] = \int g(x) f(x) dx$$
- Variance (in either case):
$$Var(X) = E[(X-E[X])^2]$$
### Example
$X\sim Bernoulli(p)$. Find its mean and variance.
$$E(X)= 1\times p+ 0 \times (1-p) = p.$$
$$Var(X) = (1-p)^2\times p + (0-p)^2\times (1-p)=p(1-p)=pq,$$
with $q = (1-p)$.
| Index| Distribution | Notation $F(\theta)$ | Mean | Variance|
| --- |---|---|--- | ---|
| 1 | Normal distribution with mean $\mu$ and variance $\sigma^2$ | $N(\mu,\sigma^2)$ | $\mu$ | $\sigma^2$|
| 2 | Bernoulli distribution with probability $p$| $Bernoulli(p)$ | $p$ | $pq$|
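
The Bernoulli calculation above can be checked numerically by applying the definitions $E[X]=\sum x p(x)$ and $Var(X)=E[(X-E[X])^2]$ directly to a pmf. The following is a minimal sketch; the helper name `mean_and_variance` and the value $p=3/10$ are illustrative assumptions, not part of the notes.

```python
from fractions import Fraction


def mean_and_variance(pmf):
    """Return (E[X], Var(X)) for a discrete pmf given as {value: probability}."""
    mean = sum(x * prob for x, prob in pmf.items())
    variance = sum((x - mean) ** 2 * prob for x, prob in pmf.items())
    return mean, variance


# Hypothetical value of p, chosen only for illustration.
p = Fraction(3, 10)
bernoulli_pmf = {1: p, 0: 1 - p}

mean, variance = mean_and_variance(bernoulli_pmf)
print(mean, variance)        # compare with p and p * (1 - p)
print(p, p * (1 - p))
```

Using `Fraction` keeps the arithmetic exact, so the comparison with $p$ and $pq$ is not affected by rounding.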
### Population
Let $F(\theta)$ denote a distribution with parameter $\theta=(\theta_1,\theta_2,\ldots,\theta_p)$.
### Samples
Let $X_i\stackrel{i.i.d}{\sim} F(\theta)$ for $i=1,\ldots, n$. We call $X_1,\ldots,X_n$ a sample of size $n$ from $F(\theta)$.
### A statistic?
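One statistic: a statistic (countable noun, singular)
Several statistics: statistics (countable noun, plural)
The field of study: statistics (uncountable noun)
**Statistics** (統計): a mnemonic from the Chinese term, "gather everything and compute it".
**A statistic** (統計量): likewise, "gather everything and compute it".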
- Consider a sample of size $n$ following a distribution $F(\theta)$: Let $X_i\stackrel{i.i.d}{\sim} F(\theta)$ for $i=1,\ldots, n$.
- A statistic has the form $g(X_1,\ldots,X_n)$ for some function $g(\cdot)$.
- A statistic is used to estimate a parameter.
:::info
A statistic has the form
$$g(X_1,\ldots,X_n),$$
for some function $g(\cdot)$.
:::
### Examples of a statistic
#### General case
Consider a sample of size $n$ following a distribution $F(\theta)$. $F(\theta)$ has the population mean $\mu$ and population variance $\sigma^2$.
Let $X_i\stackrel{i.i.d}{\sim} F(\theta)$ for $i=1,\ldots, n$.
To estimate the population mean $\mu$, we use the *sample mean*
$$\bar{X}:=\bar{X}_n=\frac{1}{n}\sum_{i=1}^n X_i=\frac{X_1+X_2+\cdots+X_n}{n}=g_{m}(X_1,\ldots,X_n).$$
To estimate the population variance $\sigma^2$, we use the *sample variance*
$$s^2:=s^2_n=\frac{1}{(n-1)}\sum_{i=1}^n(X_i-\bar{X})^2=\frac{(X_1-\bar{X})^2+\cdots+(X_n-\bar{X})^2}{n-1}=g_{v}(X_1,\ldots,X_n)$$
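
As a small illustration of $g_m$ and $g_v$, the sketch below computes the sample mean and the sample variance (with the $n-1$ divisor) in plain Python and compares the latter with the standard library's `statistics.variance`, which uses the same divisor. The data values are hypothetical.

```python
import statistics


def sample_mean(xs):
    """g_m: the average of the observations."""
    return sum(xs) / len(xs)


def sample_variance(xs):
    """g_v: sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(xs)
    xbar = sample_mean(xs)
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)


# Hypothetical data, used only to illustrate the two formulas.
data = [2.0, 4.0, 4.0, 5.0, 7.0]

xbar = sample_mean(data)
s2 = sample_variance(data)
print(xbar, s2)
print(statistics.variance(data))  # library version, also divides by n - 1
```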
#### $N(\mu,\sigma^2)$
#### $Bernoulli(p)$
If $F(\theta)$ is $Bernoulli(p)$, we know $\mu = p$ and $\sigma^2 = pq$. We estimate $p$ by
$$\bar{X}=\frac{1}{n}\sum_{i=1}^n X_i \stackrel{\triangle}{=} \hat{p}.$$
We call $\hat{p}$ the *sample proportion*.
| Index| Notation $F(\theta)$ | Parameter | Statistic|
| --- |---|---|--- |
| Case 1| $N(\mu,\sigma^2)$ | $\mu$ | $\bar{X}$|
| | $N(\mu,\sigma^2)$ | $\sigma^2$ | $s^2$|
| Case 2 | $Bernoulli(p)$ | $p$ | $\hat{p}$|
### A statistic versus a realized statistic
- Before sampling: A statistic can be written as $g(X_1,\ldots,X_n)$ and hence is a random variable.
:+1: The distribution of the **statistic** is called the **sampling distribution**.
- After sampling: once the data are collected and plugged into the formula of the statistic, we obtain a number, called the *realized statistic*.
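
The distinction can be sketched in code: the statistic is the function itself, while a realized statistic is the number the function returns for one particular data set. In the sketch below, the function names, the seed, and the choices $n=10$, $p=0.5$ are assumptions made only for illustration.

```python
import random


def sample_proportion(xs):
    """The statistic g(X_1, ..., X_n): a rule, not yet a number."""
    return sum(xs) / len(xs)


def draw_bernoulli_sample(n, p, rng):
    """Draw n i.i.d. Bernoulli(p) observations (1 = success, 0 = failure)."""
    return [1 if rng.random() < p else 0 for _ in range(n)]


rng = random.Random(2024)  # fixed seed so the run is reproducible

# Each new sample gives one realized value of the same statistic.
realized = []
for _ in range(3):
    data = draw_bernoulli_sample(n=10, p=0.5, rng=rng)
    realized.append(sample_proportion(data))

print(realized)
```

The realized values typically change from sample to sample; that variability is what the sampling distribution describes.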
## :apple: Show the following 6 examples
1. Calculate by hand
2. Approximate using simulation (no worries)
3. Approximate by the Central Limit Theorem (CLT)
### Example 1 (slides page 12)
### Example 1a. Another example of sampling distribution
Toss a coin three times. We record $1$ if a head faces up and $0$ otherwise. Suppose that this coin is fair, i.e., the probability of getting a head is $p=1/2$. Let $X_i$ denote the outcome of the $i$-th toss. Then, $X_i\stackrel{i.i.d}{\sim}Bernoulli(p=1/2)$ for $i=1,2,3$.
1. Enumerate all possible outcomes of $(X_1,X_2,X_3)$ and their probabilities.
2. Find the sampling distribution of the sample proportion $\hat{p}=\frac{1}{3}(X_1+X_2+X_3)$.
| $(X_1,X_2,X_3)$ | $\hat{p}$ | Prob |
| --- | --- | --- |
| (1,1,1) | 1 | 1/8 |
| (1,1,0) | 2/3 | 1/8 |
| (1,0,1) | 2/3 | 1/8 |
| (1,0,0) | 1/3 | 1/8 |
| (0,1,1) | 2/3 | 1/8 |
| (0,1,0) | 1/3 | 1/8 |
| (0,0,1) | 1/3 | 1/8 |
| (0,0,0) | 0 | 1/8 |
We summarize the sampling distribution of $\hat{p}$ as follows.
|$\hat{p}$|Prob|
|--|---|
| 0 |1/8|
|1/3| 3/8|
|2/3| 3/8|
| 1 | 1/8|
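
The hand calculation can be reproduced by enumerating all $2^3$ outcomes and adding up the probabilities that share the same value of $\hat{p}$. This is a minimal sketch; the variable names are illustrative.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

p = Fraction(1, 2)  # fair coin, as in the example
n = 3               # number of tosses

# Map each possible value of p-hat to its total probability.
sampling_dist = defaultdict(Fraction)
for outcome in product([1, 0], repeat=n):
    # Independence: the probability of an outcome is the product over tosses.
    prob = Fraction(1)
    for x in outcome:
        prob *= p if x == 1 else 1 - p
    p_hat = Fraction(sum(outcome), n)
    sampling_dist[p_hat] += prob

for p_hat in sorted(sampling_dist):
    print(p_hat, sampling_dist[p_hat])
```

Changing `p` or `n` gives the exact sampling distribution for other coins or sample sizes, although the number of outcomes grows as $2^n$.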
### Example 2 (slides p 14-16)
Approximate the sampling distribution by simulation.
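
The slides are not reproduced here, so the following is only a sketch of the general idea, applied to the three-toss setting of Example 1a: draw many samples, compute $\hat{p}$ for each, and use the relative frequencies as an approximation of the sampling distribution. The function name, the seed, and the number of repetitions are assumptions.

```python
import random
from collections import Counter


def simulate_head_counts(n, p, reps, seed):
    """Return {number of heads: relative frequency} over `reps` simulated samples."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(reps):
        heads = sum(1 for _ in range(n) if rng.random() < p)
        counts[heads] += 1
    return {heads: count / reps for heads, count in sorted(counts.items())}


n = 3
approx = simulate_head_counts(n=n, p=0.5, reps=100_000, seed=7)

# p-hat = heads / n; compare the frequencies with 1/8, 3/8, 3/8, 1/8.
for heads, freq in approx.items():
    print(f"p_hat = {heads}/{n}: {freq:.4f}")
```

With more repetitions the relative frequencies tend to move closer to the exact probabilities obtained by enumeration.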
## Central Limit Theorem (CLT)
:::info
Recall
- Expectation (or mean): $E[\bar{X}] = \mu$
- Variance: $Var(\bar{X})=\sigma^2/n$.
- Standard deviation: $SD(\bar{X}) =\sqrt{\frac{\sigma^2}{n}}=\sigma/\sqrt{n}$ (also called standard error)
:::
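
For reference, both facts follow in a few lines from the i.i.d. assumption $X_i\stackrel{i.i.d}{\sim} F(\theta)$ with mean $\mu$ and variance $\sigma^2$. The mean uses linearity of expectation; the variance uses independence, so that the variance of the sum is the sum of the variances.
$$E[\bar{X}] = E\left[\frac{1}{n}\sum_{i=1}^n X_i\right] = \frac{1}{n}\sum_{i=1}^n E[X_i] = \frac{1}{n}\cdot n\mu = \mu$$
$$Var(\bar{X}) = Var\left(\frac{1}{n}\sum_{i=1}^n X_i\right) = \frac{1}{n^2}\sum_{i=1}^n Var(X_i) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}$$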
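### General case
- Consider a sample of size $n$ following a distribution $F(\theta)$ with population mean $\mu$ and finite population variance $\sigma^2$.
- Suppose $X_i\stackrel{i.i.d}{\sim} F(\theta)$ for $i=1,\ldots, n$. Then, for large $n$, the standardized sample mean is approximately standard normal:
$$\frac{\bar{X}-\mu}{\sqrt{\sigma^2/n}} \stackrel{approx.}{\sim} N(0,1). $$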
### Special case 1
- Consider a sample of size $n$ following a distribution $N(\mu,\sigma^2)$.
- Suppose $X_i\stackrel{i.i.d}{\sim} N(\mu,\sigma^2)$ for $i=1,\ldots, n$. Then, for any sample size $n$, the following holds exactly:
$$\frac{\bar{X}-\mu}{\sqrt{\sigma^2/n}} \sim N(0,1). $$
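
A typical use of this result is to turn a probability statement about $\bar{X}$ into one about a standard normal variable. The sketch below uses `statistics.NormalDist` from the Python standard library; the function name and the numbers ($\mu=100$, $\sigma=15$, $n=25$, threshold $106$) are hypothetical and are not taken from the slides.

```python
from math import sqrt
from statistics import NormalDist


def prob_sample_mean_exceeds(threshold, mu, sigma, n):
    """P(X-bar > threshold) when X_1, ..., X_n are i.i.d. N(mu, sigma^2)."""
    standard_error = sigma / sqrt(n)          # SD of the sample mean
    z = (threshold - mu) / standard_error     # standardize the threshold
    return 1 - NormalDist().cdf(z)            # upper tail of N(0, 1)


# Hypothetical numbers: here the standard error is 3 and z = 2.
prob = prob_sample_mean_exceeds(threshold=106, mu=100, sigma=15, n=25)
print(round(prob, 4))
```

When the population is not normal, the same calculation is only an approximation that relies on the CLT and a reasonably large $n$.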
### Example 3 (slides p.20)
### Example 4 (slides p.21)