# Introductory Statistics
## Probability Density Function
- closely related to the parent population
- probability that a measurement of $x$ falls in the range $x$ to $x+dx$ is $f(x)\,dx$
$$\left\{
\begin{array}{l}
1=\int_{-\infty}^{\infty} f(x)\, dx \\
\mathrm{prob}(a<x<b)=\int_{a}^{b} f(x)\, dx \\
f(x)\ge 0 \\
\end{array}
\right.$$
- In general, $f(x)$ is continuous.
- All statistical quantities are obtained from integrals of $f(x)$, as in the sketch below.
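A numerical check of the three defining properties, taking a standard Gaussian as the example $f(x)$ (a minimal `numpy` sketch; the grid and integration limits are illustrative):
```python
import numpy as np

# Standard normal as an example f(x); grid wide enough to hold ~all probability
dx = 1e-3
x = np.arange(-10.0, 10.0, dx)
f = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

print(f.sum() * dx)                       # total probability: ~1.0
a, b = -1.0, 1.0
print(f[(x >= a) & (x <= b)].sum() * dx)  # prob(a < x < b): ~0.683
print(bool((f >= 0).all()))               # f(x) >= 0 everywhere: True
```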
## Properties of Parent Population
### 1. Location
#### mean
$$\mu\equiv\int_{-\infty}^{\infty} xf(x)\, dx=\lim_{N \to \infty} \frac{1}{N}\sum_{i=1}^N x_i$$
#### median
$$\mu_\frac{1}{2}:\quad\int_{-\infty}^{\mu_\frac{1}{2}} f(x)\, dx=\frac{1}{2}\quad;\quad\sum_{x_i\le\mu_\frac{1}{2}}P(x_i)=\sum_{x_i\ge\mu_\frac{1}{2}}P(x_i)$$
#### mode
$$P(\mu_{max})\ge P(x)$$
$$f(\mu_{max})dx\ge f(x)dx$$
In practice in astronomy, for a symmetric $f(x)$, $\mu=\mu_{\frac{1}{2}}=\mu_{max}$.
$\mu_{\frac{1}{2}}$ is more robust against outliers, as the sketch below shows.
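With made-up numbers: one wild outlier drags the mean far from the bulk of the data, while the median barely moves.
```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(10.0, 1.0, size=99)       # well-behaved measurements near 10
x_bad = np.append(x, 1000.0)             # plus one catastrophic outlier

print(np.mean(x), np.median(x))          # both ~10
print(np.mean(x_bad), np.median(x_bad))  # mean jumps to ~20, median stays ~10
```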
### 2. Width
#### Variance
$$\sigma^2\equiv\int_{-\infty}^{\infty} (x-\mu)^2f(x)\, dx=\int_{-\infty}^{\infty} x^2f(x)\, dx-\mu^2 \quad$$ or
$$\sigma^2\equiv\lim_{N \to \infty} \frac{1}{N}\sum_{i=1}^N (x_i-\mu)^2=\lim_{N \to \infty} \frac{1}{N}\sum_{i=1}^N x_i^2-\mu^2$$
#### standard deviation
$$\sigma=\sqrt{\text{variance}}$$
#### moments
$$\mu_n\equiv\int_{-\infty}^{\infty} (x-\mu)^nf(x)\, dx=\lim_{N \to \infty} \frac{1}{N}\sum_{i=1}^N (x_i-\mu)^n$$
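The sample-sum form can be checked against known analytic values; for a Gaussian with $\mu=3$, $\sigma=2$ one expects $\mu_2=\sigma^2=4$, $\mu_3=0$, $\mu_4=3\sigma^4=48$ (a sketch; the sample size is illustrative):
```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, N = 3.0, 2.0, 1_000_000
x = rng.normal(mu, sigma, size=N)

for n in (2, 3, 4):
    # n-th central moment from the sample sum; expect ~4, ~0, ~48
    print(n, np.mean((x - mu) ** n))
```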
## To Estimate Properties of Parent Population from a Sample
### 1. Sample mean
$$\bar{x}=\frac{1}{N}\sum_{i=1}^N{x_i}$$
Is $\bar{x}$ a good estimate of $\mu$ ?
#### expectation value
\begin{eqnarray}
\bar{\bar{x}}&=&\int_{-\infty}^{\infty} \bar{x}f(x)\, dx=\int_{-\infty}^{\infty}\left(\frac{1}{N}\sum_{i=1}^N x_i\right)f(x)\,dx\\
&=&\frac{1}{N}\sum_{i=1}^N\int_{-\infty}^{\infty} {x_i}f(x)\, dx=\frac{1}{N}\sum_{i=1}^N\mu\\
&=&\mu
\end{eqnarray}
You can also show that $\sigma_{\bar{x}}^2=\frac{1}{N}\sigma^2$, i.e. $\sigma_{\bar{x}}=\frac{1}{\sqrt N}\sigma$.
The standard deviation of $\bar{x}$ decreases as $\frac{1}{\sqrt N}$, as the sketch below confirms.
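Draw many samples of size $N$ and compare the scatter of their means to $\sigma/\sqrt N$ (a Monte Carlo sketch with illustrative parameters):
```python
import numpy as np

rng = np.random.default_rng(2)
sigma, trials = 2.0, 10_000
for N in (4, 16, 64, 256):
    # scatter of the sample mean over many trials vs. the sigma/sqrt(N) prediction
    means = rng.normal(0.0, sigma, size=(trials, N)).mean(axis=1)
    print(N, means.std(), sigma / np.sqrt(N))
```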
### 2. Sample Variance
#### Sample Variance
\begin{eqnarray}
s^2&\equiv&\frac{1}{N-1}\sum_{i=1}^N(x_i-\bar x)^2\\
&=&\frac{1}{N-1}\sum_{i=1}^Nx_i^2-\frac{N}{N-1}\bar{x}^2
\end{eqnarray}
Why N-1?
#### Expectation value of $s^2$ ?
\begin{eqnarray}
\bar{s^2}&=&\frac{1}{N-1}\int\sum_i x_i^2f(x)\, dx-\frac{N}{N-1}\int\bar{x}^2f(x)\,dx\\
&=&\frac{1}{N-1}\sum_i\int x_i^2f(x)\, dx-\frac{N}{N-1}(\sigma_{\bar{x}}^2+\bar{\bar{x}}^2)\\
&=&\frac{N}{N-1}(\sigma^2+\mu^2)-\frac{N}{N-1}\left(\frac{\sigma^2}{N}+\mu^2\right)\\
&=&\sigma^2
\end{eqnarray}
using $\int x_i^2f(x)\,dx=\sigma^2+\mu^2$ and $\overline{\bar{x}^2}=\sigma_{\bar{x}}^2+\bar{\bar{x}}^2$ with $\bar{\bar{x}}=\mu$, $\sigma_{\bar{x}}^2=\frac{\sigma^2}{N}$.
#### variance of $s^2$ ?
$$\sigma_{s^2}^2=\frac{\mu_4-\mu_2^2}{N}+\frac{2}{N(N-1)}\mu_2^2\ ,\quad \mu_2=\sigma^2$$
- Sample mean is an unbiased estimate of the population mean.
- Sample variance is an unbiased estimate of the population variance.
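A sketch answering the "why $N-1$?" question above: averaged over many samples, the $1/(N-1)$ estimator recovers $\sigma^2$, while dividing by $N$ is biased low by the factor $(N-1)/N$ (parameters are illustrative).
```python
import numpy as np

rng = np.random.default_rng(3)
sigma, N, trials = 2.0, 5, 200_000
x = rng.normal(0.0, sigma, size=(trials, N))

s2_unbiased = x.var(axis=1, ddof=1)  # divide by N-1
s2_naive = x.var(axis=1, ddof=0)     # divide by N

print(s2_unbiased.mean())  # ~4.0 = sigma^2
print(s2_naive.mean())     # ~3.2 = sigma^2 * (N-1)/N
```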
## Errors
- measure $x$ $N$ times $\rightarrow$ estimate $\mu$ via $\bar{x}$, $\sigma$ via $s$
- No $x_i$ can be infinitely precise
- There is a spread in our measurements, producing a histogram of finite width.
- If each $x_i$ has an error $\sigma$, we know the true value lies within $x_i\pm\sigma$ with some confidence.
- To improve the error on the measurement of $\bar x$ $\rightarrow$ increase $N$;
stop increasing $N$ once the statistical error reaches the "systematic" error of the experiment.
<font color="FF6F61">$$\sigma_{\bar x} \propto \frac{1}{\sqrt N}$$</font>
### Propagation of Errors
$$a=5\pm1\quad b=7\pm2 \\ \Rightarrow a+b=12\pm\,?$$
#### error propagation equation
\begin{eqnarray}
f&=&f(u,v),\quad \bar f=f(\bar u,\bar v)\\
f-\bar f &\approx&\frac{\partial f}{\partial u}(u-\bar u)+\frac{\partial f}{\partial v}(v-\bar v)\\
\sigma_f^2&=&\lim_{N \to \infty} \frac{1}{N}\sum_{i} (f_i-\bar f)^2\\
&\approx &\lim_{N \to \infty} \frac{1}{N}\sum_{i} \left[\frac{\partial f}{\partial u}(u_i-\bar u)+\frac{\partial f}{\partial v}(v_i-\bar v)\right]^2\\
&=&\lim_{N \to \infty} \frac{1}{N}\sum_{i} \left[\left(\frac{\partial f}{\partial u}\right)^2(u_i-\bar u)^2+\left(\frac{\partial f}{\partial v}\right)^2(v_i-\bar v)^2+2\frac{\partial f}{\partial u}\frac{\partial f}{\partial v}(u_i-\bar u)(v_i-\bar v)\right]\\
&=&\left(\frac{\partial f}{\partial u}\right)^2\sigma_u^2+\left(\frac{\partial f}{\partial v}\right)^2\sigma_v^2+2\frac{\partial f}{\partial u}\frac{\partial f}{\partial v}\sigma_{uv}^2
\end{eqnarray}
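Applying the equation to the $a+b$ example above (independent errors, so $\sigma_{uv}^2=0$) gives $\sigma_{a+b}=\sqrt{1^2+2^2}=\sqrt{5}\approx 2.24$; a Monte Carlo sketch agrees (Gaussian errors assumed for illustration):
```python
import numpy as np

rng = np.random.default_rng(4)
a = rng.normal(5.0, 1.0, size=1_000_000)  # a = 5 +/- 1
b = rng.normal(7.0, 2.0, size=1_000_000)  # b = 7 +/- 2, independent of a

print((a + b).mean(), (a + b).std())      # ~12, ~2.236
print(np.sqrt(1.0**2 + 2.0**2))           # prediction: sqrt(5)
```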
#### Note
1. Valid only for small errors, i.e. $\frac{\sigma_u}{u}$, $\frac{\sigma_v}{v}$ $\lesssim$ 10%.
For larger errors, use $\sigma_f^2=[f(\bar u+\sigma_u,\bar v)-f]^2+[f(\bar u,\bar v+\sigma_v)-f]^2$
2. $f(u,v)$ generalizes to $f(u,v,w,x,y,\dots)$
3. "covariance" of u,v
$$\sigma_{uv}^2\equiv\lim_{N \to \infty}\frac{1}{N}\sum_i(u_i-\bar u)(v_i-\bar v)$$
$\sigma_{uv}^2 = 0\quad$ if $u$ and $v$ are independent
$\sigma_{uv}^2 \neq 0\quad$ if $u$ and $v$ are correlated; it can be negative
#### Special case
1. $f=au\pm bv \quad a,b>0,\quad\frac{\partial f}{\partial u}=a,\quad \frac{\partial f}{\partial v}=\pm b \\
\sigma_f^2=a^2\sigma_u^2+b^2\sigma_v^2\pm 2ab\sigma_{uv}^2$
2. $f=\pm auv \\
\sigma_f^2=(av\sigma_u)^2+(au\sigma_v)^2+2a^2uv\sigma_{uv}^2$
or
$(\frac{\sigma_f}{f})^2=(\frac{\sigma_u}{u})^2+(\frac{\sigma_v}{v})^2+2\frac{\sigma_{uv}^2}{uv}$
3. $f=\pm a\frac{u}{v}\\
(\frac{\sigma_f}{f})^2=(\frac{\sigma_u}{u})^2+(\frac{\sigma_v}{v})^2-2\frac{\sigma_{uv}^2}{uv}$
4. $f=au^{\pm b},\quad \frac{\sigma_f}{f}=\pm b \frac{\sigma_u}{u}$
5. $f=ae^{\pm bu},\quad \frac{\sigma_f}{f}=\pm b \sigma_u$
6. $f=a^{\pm bu},\quad \frac{\sigma_f}{f}=\pm (b\ln a){\sigma_u}$
7. $f=a\ln ({\pm bu}),\quad \sigma_f=a \frac{\sigma_u}{u}$
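A spot-check of special case 4 ($f=au^{b}$, so $\frac{\sigma_f}{f}=b\frac{\sigma_u}{u}$), with a small 1% error on $u$ as note 1 above requires (values are illustrative):
```python
import numpy as np

rng = np.random.default_rng(5)
u = rng.normal(10.0, 0.1, size=1_000_000)  # sigma_u / u = 1%
f = 2.0 * u**3                             # a = 2, b = 3

print(f.std() / f.mean())                  # ~0.03
print(3 * 0.1 / 10.0)                      # prediction: b * sigma_u/u = 0.03
```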
## Central Limit Theorem
- Even if $f(x)$ is not Gaussian, the distribution of $\bar x$ <font color="FF6F61">**IS Gaussian**</font> when $N \rightarrow \infty$
$$\bar{\bar x}=\mu,\quad \sigma_{\bar x}^2 = \frac{\sigma^2}{N}$$
- For a Gaussian distribution, the sample median is noisier than the sample mean: $\sigma_{\mu_\frac{1}{2}}^2=\frac{\pi}{2}\sigma_{\bar x}^2$
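A sketch with a deliberately non-Gaussian parent, the uniform distribution on $[0,1]$: the means of samples of size $N$ pile up around $\mu$ with width $\sigma/\sqrt N$, and their histogram looks Gaussian.
```python
import numpy as np

rng = np.random.default_rng(6)
N, trials = 50, 100_000
means = rng.uniform(0.0, 1.0, size=(trials, N)).mean(axis=1)

sigma = np.sqrt(1.0 / 12.0)       # parent sigma for U(0, 1)
print(means.mean(), means.std())  # ~0.5, ~0.041
print(sigma / np.sqrt(N))         # CLT prediction for the width
# a histogram of `means` (e.g. with matplotlib) is visibly Gaussian
```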
## Probability Distribution
### Uniform Distribution
$$P(x;a,b)=\left\{
\begin{array}{lr}
\frac{1}{b-a}, & \text{if}\ a \le x \le b\\
0, & \text{otherwise} \\
\end{array}
\right.$$
$$\mu=\frac{a+b}{2}\\
\sigma^2=\frac{(b-a)^2}{12}$$
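Checking both formulas by drawing from a uniform distribution on $[2,8]$ (illustrative endpoints):
```python
import numpy as np

rng = np.random.default_rng(7)
a, b = 2.0, 8.0
x = rng.uniform(a, b, size=1_000_000)

print(x.mean(), (a + b) / 2)     # ~5.0
print(x.var(), (b - a)**2 / 12)  # ~3.0
```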
### Binomial Distribution
### Poisson Distribution
### Gaussian (Normal) Distribution