CS2008301 機率與統計 Probability and Statistics
===
Textbook: Montgomery, D. C., & Runger, G. C. *Applied Statistics and Probability for Engineers*. John Wiley & Sons, Inc.

Instructor: 沈上翔 Shan-Hsiang Shen

1 The Role of Statistics in Engineering
---
### 1.1 The Engineering Method and Statistical Thinking
Engineering method (scientific method)
1. Develop a clear description
2. Identify the important factors
3. Propose or refine a model
4. Conduct experiments
5. Manipulate the model
6. Confirm the solution
7. Conclusions and recommendations

```
several cycles or iterations of steps 2-4 may be required to obtain the final solution
```

Statistics
- **the science of data**
- deals with the collection, presentation, analysis, and use of data to make decisions, solve problems, and design products and processes

Variability
- successive observations of a system or phenomenon do not produce exactly the same result
- **statistical thinking** gives us a useful way to incorporate variability into decision making
- statistics provides a framework for describing this variability and for learning about which potential **sources of variability** are the most important
- **random variable:** a measurement that exhibits variability
  - $X=\mu+\epsilon$
    - $X$: random variable
    - $\mu$: constant - remains the same with every measurement
    - $\epsilon$: random disturbance - small changes in the environment, variance in test equipment, differences in the individual parts
- **dot diagram**
  - displays the data, for up to about 20 observations
  - makes two features of the data easy to see
    - **location** (middle)
    - **scatter** or **variability**

Reasoning
```mermaid
graph TD;
L[Physical laws];
D[Product designs];
L-->D;
```
```mermaid
graph BT;
Sample--Statistical inference-->Population;
```

### 1.2 Collecting Engineering Data
#### 1.2.2 Retrospective Study
Data collected in the past for other purposes

Hazards
- may contain relatively little useful **information** about the problem
- some of the relevant data may be missing
- transcription or recording errors may result in **outliers** (unusual values)
- other important factors may not have been collected and archived

#### 1.2.3 Observational Study
Data presently collected, by a passive observer
- conducted for a relatively short time period
- variables that are not routinely measured can be included

#### 1.2.4 Designed Experiments
- data collected in response to process input changes
- makes deliberate or purposeful changes in the controllable variables of the system or process
- randomization is needed to establish cause-and-effect relationships

comparative experiment
- compares alternatives to detect a difference
- hypothesis testing
  - tests some aspect of the system in which we are interested
  - single-sample hypothesis-testing problem
  - two-sample hypothesis-testing problem

factorial experiment
- the specified values of the factors used in the experiment are called **factor levels**
  - typically two or three for each factor
- can detect interactions between factors
- **fractional factorial experiment**
  - only a subset of the factor-level combinations is actually tested

#### 1.2.5 Observing Processes Over Time
- time series plot
- overcontrol / tampering
- control chart
  - center line
  - upper control limit, lower control limit
- enumerative study - collect data from a process to evaluate current production
- analytic study - use data from current production to evaluate future production
- statistical process control (SPC)

### 1.3 Mechanistic and Empirical Models
- mechanistic model
  - built from our underlying knowledge of the basic physical mechanism
  - may have factors which are not completely controlled
- empirical model
  - adds $\epsilon$ to the mechanistic model
  - regression model: $f(x)=\beta_0+\beta_1x_1+\beta_2x_2+\epsilon$
  - uses the least squares method to estimate the coefficients (a numerical sketch follows)
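A minimal numerical sketch of the least squares fit for the regression model above, assuming NumPy is available; the data, the seed, and the "true" coefficients are synthetic values invented purely for illustration.

```python
# Least-squares fit of the empirical model
#   f(x) = beta_0 + beta_1*x_1 + beta_2*x_2 + epsilon
# Synthetic data only: the betas and the noise level below are made up.
import numpy as np

rng = np.random.default_rng(0)
n = 50
x1 = rng.uniform(0, 10, n)              # first regressor
x2 = rng.uniform(0, 5, n)               # second regressor
eps = rng.normal(0, 1.0, n)             # random disturbance epsilon
y = 2.0 + 0.5 * x1 - 1.5 * x2 + eps     # hypothetical "true" model

# Design matrix with a leading column of ones for the intercept beta_0
X = np.column_stack([np.ones(n), x1, x2])

# Least squares: minimizes the sum of squared residuals ||y - X beta||^2
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print("estimated betas:", beta_hat)     # close to [2.0, 0.5, -1.5]
```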
### 1.4 Probability and Probability Models
- to quantify the risks involved in statistical inference

2 Probability
---
### 2.1 Sample Spaces and Events
- Random Experiments: an experiment that can result in different outcomes, even though it is repeated in the same manner every time
- Sample Spaces: the set of all possible outcomes of a random experiment, denoted as $S$
  - **Discrete Sample Spaces:** consist of a finite or countably infinite set of outcomes
  - **Continuous Sample Spaces:** contain an interval of real numbers
- tree diagrams
- Events: a subset of the sample space of a random experiment

### 2.3 Interpretations and Axioms of Probability
- the probability of an event $E$ is denoted as $P(E)$

#### Axioms of Probability
1. $P(S)=1$ where $S$ is the sample space
2. $0\leq P(E)\leq1$ for any event $E$
3. For two events $E_1$ and $E_2$ with $E_1\cap E_2=\emptyset$, $P(E_1\cup E_2)=P(E_1)+P(E_2)$

### 2.5 Conditional Probability
- *Random Samples*: at each step of the sampling, the items that remain in the batch are equally likely to be selected

### 2.6 Intersections of Events and Multiplication and Total Probability Rules
- **Multiplication Rule**: $P(A\cap B)=P(B|A)P(A)=P(A|B)P(B)$
- **Total Probability Rule**:
  - $P(B)=P(B\cap A)+P(B\cap A')=P(B|A)P(A)+P(B|A')P(A')$
  - for mutually exclusive and exhaustive events $E_1,E_2,...,E_k$: $P(B)=P(B\cap E_1)+P(B\cap E_2)+...+P(B\cap E_k)=P(B|E_1)P(E_1)+P(B|E_2)P(E_2)+...+P(B|E_k)P(E_k)$

### 2.9 Random Variables
- a function that assigns a real number to each outcome in the sample space of a random experiment
- denoted by an uppercase letter such as $X$
- the measured value of the random variable is denoted by a lowercase letter such as $x=$ 70 milliamperes
- **discrete random variable:** with a finite (or countably infinite) range
  - *example: number of scratches on a surface, proportion of defective parts among 1000 tested, number of transmitted bits received in error*
- **continuous random variable:** with an interval (either finite or infinite) of real numbers for its range
  - *example: electrical current, length, pressure, temperature, time, voltage, weight*

3 Discrete Random Variables and Probability Distributions
---
### 3.1 Probability Distributions and Probability Mass Functions
- probability distribution: a description of the probabilities associated with the possible values of $X$
- **probability mass function** (see the sketch below)
  1. $f(x_i)\geq0$
  2. $\sum_{i=1}^nf(x_i)=1$
  3. $f(x_i)=P(X=x_i)$
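A small sketch checking the three PMF properties; the support and probabilities below are hypothetical values chosen only for illustration.

```python
# Check the three PMF properties for a hypothetical discrete random variable.
# The support {0, 1, 2, 3} and its probabilities are made up.
pmf = {0: 0.1, 1: 0.3, 2: 0.4, 3: 0.2}        # f(x_i) = P(X = x_i)

assert all(p >= 0 for p in pmf.values())      # property 1: f(x_i) >= 0
assert abs(sum(pmf.values()) - 1.0) < 1e-12   # property 2: sum_i f(x_i) = 1
print("P(X = 2) =", pmf[2])                   # property 3: f(2) = P(X = 2) = 0.4
```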
### 3.2 Cumulative Distribution Functions
- an alternative method for describing a random variable's probability distribution; the PMF can be recovered from it
- $F(x)=P(X\leq x)=\sum_{x_i\leq x}f(x_i)$
- $0\leq F(x)\leq1$
- if $x\leq y$, then $F(x)\leq F(y)$

### 3.3 Mean and Variance of a Discrete Random Variable
- mean (expected value): $\mu=E(X)=\sum_xxf(x)$
  - a measure of the center or middle of the probability distribution
- variance: $\sigma^2=V(X)=E(X-\mu)^2=\sum_x(x-\mu)^2f(x)=\sum_xx^2f(x)-\mu^2$
  - $\sum_x(x-\mu)^2f(x)=\sum_xx^2f(x)-2\mu\sum_xxf(x)+\mu^2\sum_xf(x)=\sum_xx^2f(x)-2\mu^2+\mu^2=\sum_xx^2f(x)-\mu^2$
  - a measure of the dispersion
- standard deviation: $\sigma=\sqrt{\sigma^2}$
- expected value of a function of a discrete random variable:
  - $E[h(X)]=\sum_xh(x)f(x)$

### 3.4 Discrete Uniform Distribution
- a finite number of possible values, each with equal probability
- $f(x_i)=\dfrac1n$
- $\mu=\sum_{k=a}^bk\left(\dfrac1{b-a+1}\right)=\dfrac{b(b+1)-(a-1)a}2\cdot\dfrac1{b-a+1}=\dfrac{b+a}2$
- $\sigma^2=\dfrac{(b-a+1)^2-1}{12}$ (derivation omitted; to be filled in)

### 3.5 Binomial Distribution
- Bernoulli trials
  - each trial has only two possible outcomes
  - the trials are independent
  - the probability of a success in each trial is constant
- $f(x)=\dbinom nxp^x(1-p)^{n-x}$
- $\mu=E(X)=E(X_1)+E(X_2)+...+E(X_n)=np$
- $\sigma^2=V(X)=V(X_1)+V(X_2)+...+V(X_n)=np(1-p)$

### 3.6 Geometric and Negative Binomial Distributions
#### Geometric Distribution
- Bernoulli trials are conducted until a success is obtained
- $f(x)=(1-p)^{x-1}p$
- $\mu=p\sum_{k=1}^\infty kq^{k-1}=\dfrac\partial{\partial q}\left[\dfrac{pq}{1-q}\right]=\dfrac p{(1-q)^2}=\dfrac1p$, where $q=1-p$
- $\sigma^2=\dfrac{1-p}{p^2}$
  - from $V(X)=E(X^2)-(E(X))^2$

#### Lack of Memory Property
- the count of the number of trials until the next success can be started at any trial without changing the probability distribution of the random variable

#### Negative Binomial Distribution
- the number of Bernoulli trials required to obtain $r$ successes
- $f(x)=\dbinom {x-1}{r-1}(1-p)^{x-r}p^r$
- $\mu=\dfrac rp$
- $\sigma^2=\dfrac{r(1-p)}{p^2}$

### 3.7 Hypergeometric Distribution
- a sample of size $n$ is selected without replacement from a set of $N$ objects containing $K$ successes and $N-K$ failures
- $f(x)=\dfrac {\dbinom Kx\dbinom {N-K}{n-x}}{\dbinom Nn}$
- $p=K/N$
- $\mu=np$
- $\sigma^2=np(1-p)\left(\dfrac{N-n}{N-1}\right)$
  - finite population correction factor: $\dfrac{N-n}{N-1}$

### 3.8 Poisson Distribution
- the count of events that occur within an interval of length $T$
- **Poisson process**
  1. the probability of more than one event in a subinterval tends to zero
  2. the probability of one event in a subinterval tends to $\lambda\Delta t$
  3. the event in each subinterval is independent of other subintervals
- approximates independent Bernoulli trials with the number of trials equal to $n=T/\Delta t$ and success probability $p=\lambda T/n=\lambda\Delta t$
- $f(x)=\dfrac {e^{-\lambda T}(\lambda T)^x}{x!}$
  - $\sum_{x=0}^\infty\dfrac{(\lambda T)^x}{x!}$ is the Taylor expansion of $e^{\lambda T}$, so the probabilities sum to 1
- $\mu=\lambda T$
- $\sigma^2=\lambda T$

4 Continuous Random Variables and Probability Distributions
---
### 4.1 Probability Distributions and Probability Density Functions
- Probability Density Function (a numerical check follows this section)
  1. $f(x)\geq 0$
  2. $\int_{-\infty}^{\infty}f(x)dx=1$
  3. $P(a\leq X\leq b)=\int_{a}^{b}f(x)dx$
- histogram: an approximation to a probability density function
- $P(x_1\leq X\leq x_2)=P(x_1<X\leq x_2)=P(x_1\leq X<x_2)=P(x_1<X<x_2)$
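A numerical illustration of the PDF properties above, approximating the integrals with a plain midpoint Riemann sum; the density $f(x)=3x^2$ on $[0,1]$ is a hypothetical example, not one from the course.

```python
# Numerically check the PDF properties for a hypothetical density
# f(x) = 3x^2 on [0, 1] (zero elsewhere), using a midpoint Riemann sum.
def f(x):
    return 3 * x**2 if 0 <= x <= 1 else 0.0

def integrate(g, a, b, n=100_000):
    """Midpoint Riemann sum of g over [a, b]."""
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

print(integrate(f, 0, 1))      # property 2: total area, approximately 1.0
print(integrate(f, 0.2, 0.5))  # P(0.2 <= X <= 0.5) = 0.5^3 - 0.2^3 = 0.117
```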
### 4.2 Cumulative Distribution Functions
- $F(x)=P(X\leq x)=\int_{-\infty}^xf(u)du$
- $f(x)=\dfrac{dF(x)}{dx}$

### 4.3 Mean and Variance of a Continuous Random Variable
- $\mu=\int_{-\infty}^\infty xf(x)dx$
- $\sigma^2=\int_{-\infty}^\infty(x-\mu)^2f(x)dx=\int_{-\infty}^\infty x^2f(x)dx-\mu^2$
- also applies to functions of a continuous random variable

### 4.4 Continuous Uniform Distribution
- $f(x)=\dfrac1{b-a}$ for $a\leq x\leq b$
- $\mu=\dfrac{a+b}2$
- $\sigma^2=\dfrac{(b-a)^2}{12}$

### 4.5 Normal Distribution
- aka Gaussian distribution
- the most widely used model
- central limit theorem
- $f(x)=\dfrac1{\sqrt{2\pi}\sigma}e^{\frac{-(x-\mu)^2}{2\sigma^2}}$
- $N(\mu,\sigma^2)$
- $P(\mu-\sigma<X<\mu+\sigma)=0.6827$
- $P(\mu-2\sigma<X<\mu+2\sigma)=0.9545$
- $P(\mu-3\sigma<X<\mu+3\sigma)=0.9973$ ($6\sigma$ is often taken as the width of a normal distribution)
- standard normal random variable
  - $\mu=0$
  - $\sigma^2=1$
  - cumulative distribution: $\Phi(z)=P(Z\leq z)$
- standardizing
  - z-value: $Z=\dfrac{X-\mu}\sigma$

### 4.6 Normal Approximation to the Binomial and Poisson Distributions
- with a large number of trials
- continuity correction: a modified interval is used to better compensate for the difference between the continuous normal distribution and the discrete binomial distribution
- **Binomial Distribution**
  - $Z=\dfrac{X-np}{\sqrt{np(1-p)}}$
    - $X$: binomial random variable
    - $n$, $p$: parameters
  - $P(X\leq x)=P(X\leq x+0.5)\approx P\left(Z\leq\dfrac{x+0.5-np}{\sqrt{np(1-p)}}\right)$
  - $P(x\leq X)=P(x-0.5\leq X)\approx P\left(\dfrac{x-0.5-np}{\sqrt{np(1-p)}}\leq Z\right)$
  - for $np>5$ and $n(1-p)>5$
- **Poisson Distribution**
  - $Z=\dfrac{X-\lambda}{\sqrt \lambda}$
  - for $\lambda>5$

### 4.7 Exponential Distribution
- the distance between successive events of a Poisson process with mean number of events $\lambda>0$ per unit interval
- $f(x)=\lambda e^{-\lambda x}$
- $\mu=\dfrac1\lambda$
- $\sigma^2=\dfrac1{\lambda^2}$
- use consistent units to express intervals

#### Lack of Memory Property
- $P(X<t_1+t_2|X>t_1)=P(X<t_2)$

### 4.8 Erlang and Gamma Distributions
- the interval length until $r$ events occur in a Poisson process
- Erlang random variable: $f(x)=\dfrac{\lambda^rx^{r-1}e^{-\lambda x}}{(r-1)!}$
- Gamma function: $\Gamma(r)=\int_0^\infty x^{r-1}e^{-x}dx$ for $r>0$
  - $\Gamma(r)=(r-1)\Gamma(r-1)$
- **Gamma Distribution**
  - $f(x)=\dfrac{\lambda^rx^{r-1}e^{-\lambda x}}{\Gamma(r)}$
    - $\lambda$: scale
    - $r$: shape
  - $\mu=\dfrac r\lambda$
  - $\sigma^2=\dfrac r{\lambda^2}$
  - chi-square distribution: $\lambda=\frac 12$ and $r=\frac k2$ for a positive integer $k$ (the degrees of freedom)

## 5 Joint Probability Distributions
### 5.1 Joint Probability Distributions for Two Random Variables
* more than one random variable defined in a random experiment
* values are generated in specific regions of two-dimensional space
* **joint probability mass function** (bivariate distribution)
  * discrete random variables
  1. $f_{XY}(x,y)\geq0$
  2. $\sum_x\sum_yf_{XY}(x,y)=1$
  3. $f_{XY}(x,y)=P(X=x,Y=y)$
* **joint probability density function**
  * continuous random variables
  1. $f_{XY}(x,y)\geq0$
  2. $\int_{-\infty}^\infty\int_{-\infty}^\infty f_{XY}(x,y)dxdy=1$
  3. $P((X,Y)\in R)=\int\int_Rf_{XY}(x,y)dxdy$
* **marginal probability distribution** (a discrete sketch follows this section)
  * the individual probability distribution of a random variable
  * $f_X(x)=\int f_{XY}(x,y)dy$
  * $E(X)=\int_{-\infty}^\infty xf_X(x)dx=\int_{-\infty}^\infty\int_{-\infty}^\infty xf_{XY}(x,y)dydx$
  * $V(X)=\int_{-\infty}^\infty(x-\mu_X)^2f_X(x)dx=\int_{-\infty}^\infty\int_{-\infty}^\infty(x-\mu_X)^2f_{XY}(x,y)dydx$
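A discrete analogue of the marginal formulas above, assuming NumPy: the sketch sums a hypothetical joint PMF table over one variable to obtain the marginal of the other, then computes $E(X)$ and $V(X)$. The table entries are made up, chosen only to sum to 1.

```python
# Marginals from a joint PMF table (sums replace the integrals above).
import numpy as np

x_vals = np.array([0, 1, 2])
y_vals = np.array([0, 1])
# f_XY[i, j] = P(X = x_vals[i], Y = y_vals[j]); hypothetical values
f_XY = np.array([[0.10, 0.15],
                 [0.20, 0.25],
                 [0.05, 0.25]])

f_X = f_XY.sum(axis=1)                       # marginal of X: sum over y
f_Y = f_XY.sum(axis=0)                       # marginal of Y: sum over x

mu_X = (x_vals * f_X).sum()                  # E(X)
var_X = ((x_vals - mu_X) ** 2 * f_X).sum()   # V(X)
print(f_X, f_Y, mu_X, var_X)
```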
### 5.2 Conditional Probability Distributions and Independence
* conditional probability mass function
* **conditional probability density function**
  * $f_{Y|x}(y)=\dfrac{f_{XY}(x,y)}{f_X(x)}$
    * the conditional PDF of $Y$ given $X=x$, defined for $f_X(x)>0$
    * a conditional PDF satisfies the properties of a PDF
  * $E(Y|x)=\int_yyf_{Y|x}(y)dy$
  * $V(Y|x)=\int_y(y-\mu_{Y|x})^2f_{Y|x}(y)dy=\int_yy^2f_{Y|x}(y)dy-\mu_{Y|x}^2$
* **independence** (equivalent conditions)
  1. $f_{XY}(x,y)=f_X(x)f_Y(y)$
  2. $f_{Y|x}(y)=f_Y(y)$
  3. $f_{X|y}(x)=f_X(x)$
  4. $P(X\in A,Y\in B)=P(X\in A)P(Y\in B)$
* if the set of points in two-dimensional space that receives positive probability under $f_{XY}(x,y)$ is not rectangular, then $X$ and $Y$ are not independent

### 5.3 Joint Probability Distributions for More Than Two Random Variables
* **joint probability density function**
  1. $f_{X_1X_2...X_p}(x_1,x_2,...,x_p)\geq0$
  2. $\int_{-\infty}^\infty\int_{-\infty}^\infty...\int_{-\infty}^\infty f_{X_1X_2...X_p}(x_1,x_2,...,x_p)dx_1dx_2...dx_p=1$
  3. $P((X_1,X_2,...,X_p)\in B)=\int\int_Bf_{X_1X_2...X_p}(x_1,x_2,...,x_p)dx_1dx_2...dx_p$
  * $f_{X_1X_2...X_p}(x_1,x_2,...,x_p)=0$ wherever it is not otherwise specified
* **marginal probability density function**
  * $f_{X_i}(x_i)=\int\int...\int f_{X_1X_2...X_p}(x_1,x_2,...,x_p)dx_1dx_2...dx_{i-1}dx_{i+1}...dx_p$
  * $E(X_i)=\int_{-\infty}^\infty\int_{-\infty}^\infty...\int_{-\infty}^\infty x_if_{X_1X_2...X_p}(x_1,x_2,...,x_p)dx_1dx_2...dx_p=\int_{-\infty}^\infty x_if_{X_i}(x_i)dx_i$
  * $V(X_i)=\int_{-\infty}^\infty\int_{-\infty}^\infty...\int_{-\infty}^\infty(x_i-\mu_{X_i})^2f_{X_1X_2...X_p}(x_1,x_2,...,x_p)dx_1dx_2...dx_p=\int_{-\infty}^\infty(x_i-\mu_{X_i})^2f_{X_i}(x_i)dx_i$
* **distribution of a subset of random variables**
  * $f_{X_1X_2...X_k}(x_1,x_2,...,x_k)=\int\int...\int f_{X_1X_2...X_p}(x_1,x_2,...,x_p)dx_{k+1}dx_{k+2}...dx_p$
* conditional probability distribution
  * $f_{X_1X_2X_3|x_4x_5}(x_1,x_2,x_3)=\dfrac{f_{X_1X_2X_3X_4X_5}(x_1,x_2,x_3,x_4,x_5)}{f_{X_4X_5}(x_4,x_5)}$
* **independence**
  * $f_{X_1X_2...X_p}(x_1,x_2,...,x_p)=f_{X_1}(x_1)f_{X_2}(x_2)...f_{X_p}(x_p)$

### 5.4 Covariance and Correlation
* **covariance**
  * measures a *linear* relationship between two variables
  * the joint variation between the two random variables
  * $cov(X,Y)=\sigma_{XY}=E[(X-\mu_X)(Y-\mu_Y)]=E(XY)-\mu_X\mu_Y$
* **correlation**
  * also measures a *linear* relationship between two variables
  * allows comparing the linear relationships between pairs of variables in different units
  * $\rho_{XY}=\dfrac{cov(X,Y)}{\sqrt{V(X)V(Y)}}=\dfrac{\sigma_{XY}}{\sigma_X\sigma_Y}$
  * $-1\leq\rho_{XY}\leq+1$
* if $X$ and $Y$ are independent, $\sigma_{XY}=\rho_{XY}=0$
  * but if $\rho_{XY}=0$, we *cannot* conclude that $X$ and $Y$ are independent

### 5.5 Common Joint Distributions
#### 5.5.1 Multinomial Probability Distribution
1. The result of each trial is classified into one of $k$ classes
2. The probability of a trial generating a result in class 1, class 2, ..., class $k$ is $p_1$, $p_2$, ..., $p_k$, and these probabilities are constant over the trials
3. The trials are independent

* $P(X_1=x_1,X_2=x_2,...,X_k=x_k)=\dfrac{n!}{x_1!x_2!...x_k!}p_1^{x_1}p_2^{x_2}...p_k^{x_k}$ (see the sketch below)
* $E(X_i)=np_i$
* $V(X_i)=np_i(1-p_i)$
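A sketch evaluating the multinomial PMF above using only the standard library; the trial count $n=10$ and class probabilities $(0.5, 0.3, 0.2)$ are hypothetical.

```python
# Evaluate the multinomial PMF for hypothetical class probabilities.
from math import factorial, prod

def multinomial_pmf(counts, probs):
    """P(X_1 = x_1, ..., X_k = x_k) = n!/(x_1!...x_k!) * p_1^x_1 ... p_k^x_k."""
    n = sum(counts)
    coef = factorial(n) // prod(factorial(x) for x in counts)  # exact integer
    return coef * prod(p**x for p, x in zip(probs, counts))

# n = 10 trials, k = 3 classes with p = (0.5, 0.3, 0.2)
print(multinomial_pmf((5, 3, 2), (0.5, 0.3, 0.2)))
# E(X_1) = n*p_1 = 5,  V(X_1) = n*p_1*(1 - p_1) = 2.5
```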
#### 5.5.2 Bivariate Normal Distribution
* $f_{XY}(x,y;\sigma_X,\sigma_Y,\mu_X,\mu_Y,\rho)=\dfrac1{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}}\exp\left\{\dfrac{-1}{2(1-\rho^2)}\left[\dfrac{(x-\mu_X)^2}{\sigma_X^2}-\dfrac{2\rho(x-\mu_X)(y-\mu_Y)}{\sigma_X\sigma_Y}+\dfrac{(y-\mu_Y)^2}{\sigma_Y^2}\right]\right\}$
* **marginal distributions**
  * $N(\mu_X,\sigma_X^2)$ and $N(\mu_Y,\sigma_Y^2)$
* **conditional distribution**
  * $\mu_{Y|x}=\mu_Y+\rho\dfrac{\sigma_Y}{\sigma_X}(x-\mu_X)$
  * $\sigma_{Y|x}^2=\sigma_Y^2(1-\rho^2)$
* **correlation**: the parameter $\rho$ is the correlation between $X$ and $Y$
* **independence**
  * if $\rho=0$, $X$ and $Y$ are independent

### 5.6 Linear Functions of Random Variables
* **linear function**: $Y=c_0+c_1X_1+c_2X_2+...+c_pX_p$
* $E(Y)=c_0+c_1E(X_1)+c_2E(X_2)+...+c_pE(X_p)$
* $V(Y)=c_1^2V(X_1)+c_2^2V(X_2)+...+c_p^2V(X_p)+2\sum\sum_{i<j}c_ic_j\sigma_{X_iX_j}$
  * if $X_1,X_2,...,X_p$ are *independent*, $V(Y)=c_1^2V(X_1)+c_2^2V(X_2)+...+c_p^2V(X_p)$
* Mean and Variance of an Average
  * $E(\overline X)=\mu$ if $E(X_i)=\mu$ for $i=1,2,...,p$
  * $V(\overline X)=\dfrac{\sigma^2}p$ if $X_1,X_2,...,X_p$ are independent with $V(X_i)=\sigma^2$ for $i=1,2,...,p$

### 5.7 General Functions of Random Variables
* **general function of a discrete random variable**
  * for a one-to-one transformation $Y=h(X)$ with inverse $x=u(y)$: $f_Y(y)=f_X[u(y)]$
* **general function of a continuous random variable**
  * $f_Y(y)=f_X[u(y)]|J|$
  * $J=u'(y)$, the Jacobian

## 6 Descriptive Statistics
* organizing and summarizing the data in ways that facilitate its interpretation and subsequent analysis
* plotting the data
* numerical methods
* graphical techniques

### 6.1 Numerical Summaries of Data
* a **sample** of observations is selected from some larger **population** of observations
  * conceptual/hypothetical population
* **sample mean:** $\overline x=\dfrac{x_1+x_2+...+x_n}n=\dfrac{\sum_{i=1}^nx_i}n$
  * dot diagram
* **sample variance** / **sample standard deviation**: $s^2=\dfrac{\sum_{i=1}^n(x_i-\overline x)^2}{n-1}=\dfrac{\sum_{i=1}^nx_i^2-\dfrac{(\sum_{i=1}^nx_i)^2}{n}}{n-1}$
  * degrees of freedom: $n-1$
* population mean, population variance, population standard deviation
* **sample range:** $r=\max(x_i)-\min(x_i)$
* (a sketch computing these summaries appears at the end of this section)

### 6.2 Stem-and-Leaf Diagrams
### 6.3 Frequency Distributions and Histograms
### 6.4 Box Plots
### 6.5 Time Sequence Plots
### 6.7 Probability Plots
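A minimal sketch of the Section 6.1 numerical summaries, computed directly from the formulas above; the eight observations are made-up values, not course data.

```python
# Sample mean, variance (n - 1 denominator), standard deviation, and range,
# computed directly from the Section 6.1 formulas. The data are made up.
from math import sqrt

x = [12.6, 12.9, 13.4, 12.3, 13.6, 13.5, 12.6, 13.1]
n = len(x)

xbar = sum(x) / n                                   # sample mean
s2 = sum((xi - xbar) ** 2 for xi in x) / (n - 1)    # sample variance
s = sqrt(s2)                                        # sample standard deviation
r = max(x) - min(x)                                 # sample range

print(xbar, s2, s, r)
```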