Statistics Mid-term Note

[toc]

## Ch 2

### Population

$\mu: \text{population mean}$

$\begin{align} \mu=\frac{\sum{x_i}}{N} \end{align}$

$\sigma^2: \text{population variance}$

$\begin{align} \sigma^2&=\frac{\sum(x_i-\mu)^2}{N}\\ &=\frac{\sum x_i^2}{N}-\mu^2 \end{align}$

### Sample

$\bar x:\text{sample mean}$

$\begin{align} \bar x =\frac{\sum{x_i}}{n} \end{align}$

$s^2: \text{sample variance}$

$\begin{align} s^2&=\frac{\sum(x_i-\bar x)^2}{n-1}\\ &=\frac{\sum x_i^2-\frac{(\sum x_i)^2}{n}}{n-1} \end{align}$

$s: \text{sample standard deviation}$

$\begin{align} s=\sqrt{s^2} \end{align}$

### Stem and Leaf Diagram

**e.g. 1**
![](https://i.imgur.com/SBJUz3i.png)

**e.g. 2**
![](https://i.imgur.com/QxLmcIk.png)

### Histograms

**e.g. 1**
![](https://i.imgur.com/87SyOXP.png)

**e.g. 2**
![](https://i.imgur.com/16ObA2a.png)

### Bar chart

**e.g. 1**
![](https://i.imgur.com/sRQWUap.png)

### Pareto chart

Basically a bar chart with the bars sorted by frequency, largest first.

**e.g. 1**
![](https://i.imgur.com/rtAU9UT.png)

### Box Plots

![](https://i.imgur.com/7oWujIw.png)

**whisker**
extends to the min/max data point within 1.5 IQR of the quartiles

**outliers**
points beyond the whiskers but within 3 IQR of the quartiles (points farther out are extreme outliers)

### Sample Correlation Coefficient

$\begin{align} S_{xy}&=\sum(x_i-\bar x)(y_i-\bar y)\\ &=\sum x_i y_i -\frac1n(\sum x_i)(\sum y_i) \end{align}$

$\begin{align} r=\frac{S_{xy}}{\sqrt{S_{xx}S_{yy}}} \end{align}$

where $S_{xx}=\sum(x_i-\bar x)^2$ and $S_{yy}=\sum(y_i-\bar y)^2$.

## Ch 3

### Random variable

A variable whose measured value can change from one run of a random experiment to the next.

**e.g. 1**
Toss several coins; the total number of heads observed is a random variable.

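
The coin example can be tried out numerically. The sketch below is only an illustration: the helper `count_heads`, the choice of 3 fair coins, the 20 repetitions, and the seed are all assumptions, not part of the note. Each entry of `xs` is one observed value of the random variable, and the sample mean and sample variance are computed with the Ch 2 shortcut formulas.

```python
import random

def count_heads(n_coins, rng):
    """Toss n_coins fair coins once and return the number of heads."""
    return sum(1 for _ in range(n_coins) if rng.random() < 0.5)

rng = random.Random(0)  # fixed seed (arbitrary) so the run is repeatable

# Repeat the experiment 20 times; every value is one realization of X.
xs = [count_heads(3, rng) for _ in range(20)]

n = len(xs)
x_bar = sum(xs) / n                                          # sample mean
s2 = (sum(x * x for x in xs) - sum(xs) ** 2 / n) / (n - 1)   # sample variance

print(xs)
print(x_bar, s2)
```

The value changes from run to run of the experiment, which is exactly what makes the head count a random variable rather than a constant.
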
### Discrete (Countable)

Takes countable values, finite or infinite.

### Continuous

Takes infinitely many, uncountable values; measurable.

### Probability Density Function (PDF)

$P(a<X<b)=\int^b_af(x)dx$

![](https://i.imgur.com/xT5SGc4.png)

If $X$ is a continuous random variable, $P(X=a)=0$.

### Cumulative Distribution Function (CDF)

$F(x)=P(X\leq x)=\int^x_{-\infty}f(u)du$

### Mean (Expected value)

$\mu=E(X)=\int^\infty_{-\infty}xf(x)dx$

### Variance

$\sigma^2=V(X)=\int^\infty_{-\infty}(x-\mu)^2f(x)dx=E(X^2)-\mu^2$

The **standard deviation** of $X$ is $\sigma$.

### Normal distribution (Gaussian distribution)

Bell-shaped and symmetric, denoted $X\sim N(\mu,\sigma^2)$.

### Standard normal random variable

$\begin{align} Z=\frac{X-\mu}{\sigma} \end{align}$

$P(X\leq x)=P(Z\leq z)$

where $Z$ is a standard normal random variable, and $z=(x-\mu)/\sigma$ is the z-value obtained by standardizing $x$.

### Normal Probability Plots

Used to determine whether sample data conform to a hypothesized distribution.

### Probability Mass Function

for **discrete** random variable

$f(x_i)=P(X=x_i)$

### Cumulative Distribution Function

for **discrete** random variable

$F(x)=P(X\leq x)=\sum_{x_i\leq x} f(x_i)$

### Mean and Variance

The same as in the continuous case, except the integral is replaced by a sum.

### Binomial Distribution

A trial with only **two possible outcomes** is called a **Bernoulli trial**.

Usually assumed:
1. The trials are **independent**.
2. The probability of a success on each trial is **constant**.

#### Mean & Variance

$\mu = E(X)=np$

$\sigma^2=V(X)=npq=np(1-p)$

### Poisson Process

Counts the number of events that occur in an interval.

### Poisson Distribution

#### Mean & Variance

$\mu=\sigma^2=\lambda=np$

$\begin{align} p(x)=\frac{\lambda^xe^{-\lambda}}{x!} \end{align}$

### Exponential Distribution

A continuous distribution.

**e.g.**
The length of time or the distance between occurrences of random events.

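
The link between the Poisson process and the exponential distribution can be checked by simulation. This is a rough sketch under assumptions that are not in the note: the helper `count_events`, the rate of 2 events per unit interval, the 5000 repetitions, and the seed are all made up for illustration. Gaps between events are drawn from an exponential distribution, and the number of events falling in one interval should then have mean and variance both close to $\lambda$.

```python
import random

def count_events(rate, interval, rng):
    """Count events in [0, interval] when the gaps between events
    are exponential with the given rate (events per unit interval)."""
    t = rng.expovariate(rate)  # time of the first event
    count = 0
    while t <= interval:
        count += 1
        t += rng.expovariate(rate)  # add the gap to the next event
    return count

rng = random.Random(42)  # fixed seed (arbitrary) for repeatability
rate = 2.0               # hypothetical lambda: 2 events per unit interval

counts = [count_events(rate, 1.0, rng) for _ in range(5000)]
avg = sum(counts) / len(counts)
var = sum((c - avg) ** 2 for c in counts) / (len(counts) - 1)

print(avg, var)  # both should land near lambda = 2
```
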
$\begin{align} \theta = \frac{1}{\lambda} \end{align}$

#### pdf

$f(x)=\lambda e^{-\lambda x}, \quad x\geq 0$

#### Mean & Variance

$\begin{align} E(X)=\frac{1}{\lambda} \end{align}$

$\begin{align} V(X)=\frac{1}{\lambda^2} \end{align}$

Mean = SD = $\theta$

### Normal Approximation to Binomial and Poisson Distributions

1. Requires a large sample size
2. Needs a continuity correction

For the binomial case: $np\geq 5$ and $nq\geq 5$

### formulas

![](https://i.imgur.com/tmkF1uA.png)

### Power

$1-\beta$, where $\beta$ is the probability of a type II error

The power of a statistical test is the probability of rejecting the null hypothesis $H_0$ when the alternative hypothesis is true.

### P-Values in Hypothesis Testing

The smallest level of significance that would lead to rejection of the null hypothesis $H_0$ with the given data.

### Large Sample Test

If $n\geq 30$, the sample variance $s^2$ will be close to $\sigma^2$ for most samples.

### Confidence Interval

At a 95% confidence level, about 95% of the intervals built from repeated samples will contain $\mu$ and about 5% will not.

#### confidence level (confidence coefficient)

$1-\alpha$, where $\alpha$ is the probability of a type I error
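
The repeated-sampling reading of a confidence interval can be illustrated with a small simulation. This is a sketch, not the course's method: it assumes the large-sample interval $\bar x \pm z_{\alpha/2}\, s/\sqrt{n}$, which the note does not write out, and the population ($\mu=50$, $\sigma=5$), the sample size $n=40$, the number of trials, and the function names are all hypothetical.

```python
import math
import random
from statistics import NormalDist, mean, stdev

def z_confidence_interval(sample, alpha=0.05):
    """Large-sample two-sided interval: x_bar +/- z_{alpha/2} * s / sqrt(n)."""
    n = len(sample)
    x_bar = mean(sample)
    s = stdev(sample)                        # sample SD (n - 1 denominator)
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = 0.05
    half_width = z * s / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width

def coverage(mu, sigma, n, trials, alpha, rng):
    """Fraction of simulated intervals that contain the true mean mu."""
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        low, high = z_confidence_interval(sample, alpha)
        if low <= mu <= high:
            hits += 1
    return hits / trials

rng = random.Random(1)  # fixed seed (arbitrary) for repeatability
covered = coverage(mu=50.0, sigma=5.0, n=40, trials=2000, alpha=0.05, rng=rng)
print(covered)  # expected to be close to 1 - alpha = 0.95
```

Any single interval either contains $\mu$ or it does not; the confidence level describes how often the procedure succeeds over many samples.
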