# Introductory Statistics

## Probability Density Function

- closely related to the parent population
- probability that $x$ will be in the range $x$ to $x+dx$ is $f(x)\,dx$

$$\left\{
\begin{array}{l}
1=\int_{-\infty}^{\infty} f(x)\, dx \\
prob.(a<x<b)=\int_{a}^{b} f(x)\, dx \\
f(x)\ge 0 \\
\end{array}
\right.$$

- In general, $f(x)$ is continuous.
- All statistical quantities are obtained from its integrals.

## Properties of Parent Population

### 1. Location

#### mean

$$\mu\equiv\int_{-\infty}^{\infty} xf(x)\, dx=\lim_{N \to \infty} \frac{1}{N}\sum_{i=1}^{N} x_i$$

#### median

$$\int_{-\infty}^{\mu_{\frac{1}{2}}} f(x)\, dx=\frac{1}{2}\quad;\quad\sum_{x_i\le\mu_{\frac{1}{2}}}P(x_i)=\sum_{x_i\ge\mu_{\frac{1}{2}}}P(x_i)$$

#### mode

$$P(\mu_{max})\ge P(x)$$

$$f(\mu_{max})\,dx\ge f(x)\,dx$$

In practice in astronomy, for symmetric $f(x)$, $\mu=\mu_{\frac{1}{2}}=\mu_{max}$.

$\mu_{\frac{1}{2}}$ is more stable against the effect of outliers.

### 2. Width

#### Variance

$$\sigma^2\equiv\int_{-\infty}^{\infty} (x-\mu)^2f(x)\, dx=\int_{-\infty}^{\infty} x^2f(x)\, dx-\mu^2$$

or

$$\sigma^2=\lim_{N \to \infty} \frac{1}{N}\sum_{i=1}^{N} (x_i-\mu)^2=\lim_{N \to \infty} \frac{1}{N}\sum_{i=1}^{N} x_i^2-\mu^2$$

#### standard deviation

$$\sigma=\sqrt{\mathrm{Variance}}$$

#### moments

$$\mu_n\equiv\int_{-\infty}^{\infty} (x-\mu)^n f(x)\, dx=\lim_{N \to \infty} \frac{1}{N}\sum_{i=1}^{N} (x_i-\mu)^n$$

## To Estimate Properties of Parent Population from a Sample

### 1. Sample mean

$$\bar{x}=\frac{1}{N}\sum x_i$$

Is $\bar{x}$ a good estimate of $\mu$?

#### expectation value

\begin{eqnarray}
\bar{\bar{x}}&=&\int_{-\infty}^{\infty} \bar{x}f(x)\, dx=\int_{-\infty}^{\infty}\left(\frac{1}{N}\sum_{i=1}^{N} x_i\right)f(x)\,dx\\
&=&\frac{1}{N}\sum_i\int_{-\infty}^{\infty} x_i f(x)\, dx=\frac{1}{N}\sum_i\mu\\
&=&\mu
\end{eqnarray}

You can also show that $\sigma_{\bar{x}}^2=\frac{1}{N}\sigma^2$, or $\sigma_{\bar{x}}=\frac{1}{\sqrt N}\sigma$.

The standard deviation of $\bar{x}$ decreases as $\frac{1}{\sqrt N}$.

### 2. Sample Variance

#### Sample Variance

\begin{eqnarray}
s^2&\equiv&\frac{1}{N-1}\sum_{i=1}^N(x_i-\bar x)^2\\
&=&\frac{1}{N-1}\sum_{i=1}^{N}x_i^2-\frac{N}{N-1}\bar{x}^2
\end{eqnarray}

Why $N-1$?

#### Expectation value of $s^2$?

\begin{eqnarray}
\bar{s^2}&=&\frac{1}{N-1}\int\sum_i x_i^2 f(x)\, dx-\frac{N}{N-1}\int\bar{x}^2 f(x)\,dx\\
&=&\frac{1}{N-1}\sum_i\int x_i^2 f(x)\, dx-\frac{N}{N-1}\left(\sigma_{\bar{x}}^2+\bar{\bar{x}}^2\right)\\
&=&\frac{N}{N-1}(\sigma^2+\mu^2)-\frac{N}{N-1}\left(\frac{\sigma^2}{N}+\mu^2\right)\\
&=&\sigma^2
\end{eqnarray}

#### variance of $s^2$?

$$\sigma_{s^2}^2=\frac{\mu_4-\mu_2^2}{N}+\frac{2}{N(N-1)}\mu_2^2\quad(\mu_2=\sigma^2)$$

- Sample mean is an unbiased estimate of the population mean.
- Sample variance is an unbiased estimate of the population variance.

## Errors

- measure $x$ $N$ times $\rightarrow$ estimate $\mu$ via $\bar{x}$, $\sigma$ via $s$
- No $x_i$ can be infinitely precise.
- There is a spread in our measurements, producing a histogram of finite width.
- If $x_i$ has an error of $\sigma$, we know the true value is $x_i\pm\sigma$ with some confidence.
- To improve the error on the measurement of $\bar x$ $\rightarrow$ increase $N$; stop increasing $N$ when the error reaches the "systematic" error of the experiment. (See the numerical sketch below.)

<font color="#FF6F61">$$\sigma_{\bar x} \propto \frac{1}{\sqrt N}$$</font>
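Both estimator results can be checked numerically. Below is a minimal Monte Carlo sketch (assuming Python with NumPy; the exponential parent population, the seed, and the trial counts are arbitrary illustrative choices): it simulates many experiments of $N$ measurements each, then compares the scatter of $\bar x$ against $\sigma/\sqrt N$ and the average of $s^2$ against $\sigma^2$.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma2 = 1.0, 1.0     # mean and variance of the Exponential(1) parent population
n_trials = 20000          # number of simulated experiments

for N in (4, 16, 64):
    # each row is one experiment: N measurements drawn from the parent population
    x = rng.exponential(scale=1.0, size=(n_trials, N))
    xbar = x.mean(axis=1)          # sample mean of each experiment
    s2 = x.var(axis=1, ddof=1)     # sample variance with the N-1 factor
    print(f"N={N:3d}  std(xbar)={xbar.std():.4f}  "
          f"sigma/sqrt(N)={np.sqrt(sigma2 / N):.4f}  mean(s2)={s2.mean():.4f}")
```

The scatter of $\bar x$ falls as $1/\sqrt N$, while the average of $s^2$ stays near $\sigma^2$ at every $N$, which is the unbiasedness claimed above.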
### Propagation of Errors

$$a=5\pm1,\quad b=7\pm2 \\ \Rightarrow a+b=12\pm\,?$$

#### error prop. equation

\begin{eqnarray}
f&=&f(u,v),\quad \bar f=f(\bar u,\bar v)\\
f&-&\bar f \approx\frac{\partial f}{\partial u}(u-\bar u)+\frac{\partial f}{\partial v}(v-\bar v)\\
\sigma_f^2&=&\lim_{N \to \infty} \frac{1}{N}\sum_{i} (f_i-\bar f)^2\\
&\approx&\lim_{N \to \infty} \frac{1}{N}\sum_{i} \left[\frac{\partial f}{\partial u}(u_i-\bar u)+\frac{\partial f}{\partial v}(v_i-\bar v)\right]^2\\
&=&\lim_{N \to \infty} \frac{1}{N}\sum_{i} \left[\left(\frac{\partial f}{\partial u}\right)^2(u_i-\bar u)^2+\left(\frac{\partial f}{\partial v}\right)^2(v_i-\bar v)^2+2\frac{\partial f}{\partial u}\frac{\partial f}{\partial v}(u_i-\bar u)(v_i-\bar v)\right]\\
&=&\left(\frac{\partial f}{\partial u}\right)^2\sigma_u^2+\left(\frac{\partial f}{\partial v}\right)^2\sigma_v^2+2\frac{\partial f}{\partial u}\frac{\partial f}{\partial v}\sigma_{uv}^2
\end{eqnarray}

#### Note

1. Only for small errors, i.e. $\frac{\sigma_u}{u}$, $\frac{\sigma_v}{v}\lesssim 10\%$.
   For big errors, $\sigma_f^2=[f(\bar u+\sigma_u,\bar v)-f]^2+[f(\bar u,\bar v+\sigma_v)-f]^2$
2. $f(u,v)$ generalizes to $f(u,v,w,x,y,\ldots)$
3. "covariance" of $u$ and $v$:
   $$\sigma_{uv}^2\equiv\lim_{N \to \infty}\frac{1}{N}\sum_i(u_i-\bar u)(v_i-\bar v)$$
   $\sigma_{uv}^2 = 0\quad$ if $u$ and $v$ are independent
   $\sigma_{uv}^2 \neq 0\quad$ if $u$ and $v$ are correlated; it can be $<0$

#### Special cases

1. $f=au\pm bv,\quad a,b>0,\quad\frac{\partial f}{\partial u}=a,\quad \frac{\partial f}{\partial v}=\pm b \\ \sigma_f^2=a^2\sigma_u^2+b^2\sigma_v^2\pm 2ab\sigma_{uv}^2$
2. $f=\pm auv \\ \sigma_f^2=(av\sigma_u)^2+(au\sigma_v)^2+2a^2uv\sigma_{uv}^2$ or $\left(\frac{\sigma_f}{f}\right)^2=\left(\frac{\sigma_u}{u}\right)^2+\left(\frac{\sigma_v}{v}\right)^2+2\frac{\sigma_{uv}^2}{uv}$
3. $f=\pm a\frac{u}{v} \\ \left(\frac{\sigma_f}{f}\right)^2=\left(\frac{\sigma_u}{u}\right)^2+\left(\frac{\sigma_v}{v}\right)^2-2\frac{\sigma_{uv}^2}{uv}$
4. $f=au^{\pm b},\quad \frac{\sigma_f}{f}=\pm b \frac{\sigma_u}{u}$
5. $f=ae^{\pm bu},\quad \frac{\sigma_f}{f}=\pm b \sigma_u$
6. $f=a^{\pm bu},\quad \frac{\sigma_f}{f}=\pm (b\ln a)\sigma_u$
7. $f=a\ln(\pm bu),\quad \sigma_f=a \frac{\sigma_u}{u}$

## Central Limit Theorem

- Even if $f(x)$ is not Gaussian, the distribution of $\bar x$ <font color="#FF6F61">**IS Gaussian**</font> when $N \rightarrow \infty$:
$$\bar{\bar x}=\mu,\quad \mathrm{variance}=\frac{\sigma^2}{N}$$
- For a Gaussian distribution, $\sigma_{\mu_{\frac{1}{2}}}^2=\frac{\pi}{2}\sigma_{\bar x}^2$, i.e. the median is a noisier estimator of location than the mean.

## Probability Distribution

### Uniform Distribution

$$P(x;a,b)=\left\{
\begin{array}{ll}
\frac{1}{b-a}, & \text{if}\ a \le x \le b\\
0, & \text{otherwise} \\
\end{array}
\right.$$

$$\mu=\frac{a+b}{2},\quad \sigma^2=\frac{(b-a)^2}{12}$$

### Binomial Distribution

### Poisson Distribution

### Gaussian (Normal) Distribution
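Since $\mu$ and $\sigma^2$ of the uniform distribution are given in closed form above, it makes a convenient numerical test of the Central Limit Theorem. A minimal sketch (assuming Python with NumPy; $a$, $b$, $N$, the seed, and the trial count are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)
a, b, N, n_trials = 0.0, 1.0, 50, 20000
mu = (a + b) / 2                # parent mean of the uniform distribution
sigma2 = (b - a) ** 2 / 12      # parent variance

# n_trials experiments, each averaging N uniform draws
xbar = rng.uniform(a, b, size=(n_trials, N)).mean(axis=1)

print(f"mean of xbar: {xbar.mean():.4f}   (CLT predicts mu = {mu:.4f})")
print(f"var  of xbar: {xbar.var():.6f}   (CLT predicts sigma^2/N = {sigma2 / N:.6f})")
```

Even though the parent distribution is flat, a histogram of these $\bar x$ values is well described by a Gaussian with mean $\mu$ and variance $\sigma^2/N$.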