CS2008301 機率與統計 Probability and Statistics
===
Textbook: Montgomery, D. C., & Runger, G. C. *Applied Statistics and Probability for Engineers*. John Wiley & Sons, Inc.
Instructor: 沈上翔 Shan-Hsiang Shen
1 The Role of Statistics in Engineering
---
### 1.1 The Engineering Method and Statistical Thinking
Engineering method (scientific method)
1. Develop a clear description
2. Identify the important factors
3. Propose or refine a model
4. Conduct experiments
5. Manipulate the model
6. Confirm the solution
7. Conclusions and recommendations
```
Several cycles or iterations of steps 2-4 may be required to obtain the final solution.
```
Statistics
- **the science of data**
- deals with the collection, presentation, analysis, and use of data to make decisions, solve problems, and design products and processes
Variability
- successive observations of a system or phenomenon do not produce exactly the same result
- **statistical thinking** can give us a useful way to incorporate it
- statistics provides a framework for describing this variability and for learning about which potential **sources of variability** are the most important
- **random variable:** a variable whose measured values exhibit variability
- $X=\mu+\epsilon$
- $X$ : random variable
- $\mu$ : constant
- remains the same with every measurement
- $\epsilon$ : random disturbance
- small changes in the environment, variance in test equipment, differences in the individual parts
- **dot diagram**
- displays data for up to about 20 observations
- easily see two features of the data
- **location** (middle)
- **scatter** or **variability**
Reasoning
```mermaid
graph TD;
L[Physical laws];
D[Product designs];
L-->D;
```
```mermaid
graph BT;
Sample-- Statistical inference -->Population;
```
### 1.2 Collecting Engineering Data
#### 1.2.2 Retrospective Study
Data collected in the past for other purposes
Hazards
- contain relatively little useful **information** about the problem
- some of the relevant data may be missing
- transcription or recording errors resulting in **outliers** (unusual values)
- other important factors may not have been collected and archived
#### 1.2.3 Observational Study
Data collected in the present by a passive observer who disturbs the process as little as possible
- conducted for a relatively short time period
- variables that are not routinely measured can be included
#### 1.2.4 Designed Experiments
- collected in response to process input changes
- makes deliberate or purposeful changes in the controllable variables of the system or process
- randomization is needed to establish cause-and-effect relationships
comparative experiment
- compares two or more conditions to determine whether they differ
- hypothesis testing
- test some aspect of the system in which we are interested
- single-sample hypothesis-testing problem
- two-sample hypothesis-testing problem
factorial experiment
- The specified values of the factors used in the experiment are called **factor levels**
- typically two or three for each factor
- detect the interaction
- **fractional factorial experiment**
- only a subset of the factor-level combinations is actually tested
#### 1.2.5 Observing Processes Over Time
- time series plot
- overcontrol / tampering
- control chart
- center line
- upper control limit, lower control limit
- enumerative study
- collect data from a process to evaluate current production
- analytic study
- use data from current production to evaluate future production
- statistical process control (SPC)
### 1.3 Mechanistic and Empirical Models
- mechanistic model
- built from our underlying knowledge of the basic physical mechanism
- may include factors that are not completely controlled
- empirical model
- add $\epsilon$ to the mechanistic model
- regression model
- $Y=\beta_0+\beta_1x_1+\beta_2x_2+\epsilon$
- use the least squares method to estimate the coefficients (see the sketch below)
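A minimal least-squares sketch in Python; the synthetic data and coefficient values are hypothetical, not from the text:
```python
import numpy as np

# Fit the empirical model y = b0 + b1*x1 + b2*x2 + eps by least squares
rng = np.random.default_rng(0)
x1 = rng.uniform(0, 10, 50)
x2 = rng.uniform(0, 5, 50)
y = 2.0 + 0.5 * x1 - 1.5 * x2 + rng.normal(0, 0.3, 50)  # true betas: 2.0, 0.5, -1.5

# Design matrix with a column of ones for the intercept b0
X = np.column_stack([np.ones_like(x1), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # minimizes ||X @ beta - y||^2
print(beta)  # estimates should land near [2.0, 0.5, -1.5]
```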
### 1.4 Probability and Probability Models
- to quantify the risks involved in statistical inference
2 Probability
---
### 2.1 Sample Spaces and Events
- Random Experiments: an experiment that can result in different outcomes, even though it is repeated in the same manner every time
- Sample Spaces: The set of all possible outcomes of a random experiment. Denoted as $S$.
- Discrete Sample Spaces: consist of a finite or countably infinite set of outcomes
- **Continuous Sample Spaces:** contain an interval (finite or infinite) of real numbers
- tree diagrams
- Events: a subset of the sample space of a random experiment
### 2.3 Interpretations and Axioms of Probability
- the probability of an event $E$, denoted as $P(E)$
#### Axioms of Probability
1. $P(S)=1$ where $S$ is the sample space
2. $0\leq P(E)\leq1$ for any event $E$
3. For two events $E_1$ and $E_2$ with $E_1\cap E_2=\emptyset$, $P(E_1\cup E_2)=P(E_1)+P(E_2)$
### 2.5 Conditional Probability
- *Random Samples*: at each step of the sampling, the items that remain in the batch are equally likely to be selected
### 2.6 Intersections of Events and Multiplication and Total Probability Rules
- **Multiplication Rule**: $P(A\cap B)=P(B|A)P(A)=P(A|B)P(B)$
- **Total Probability Rule**:
- $P(B)=P(B\cap A)+P(B\cap A')=P(B|A)P(A)+P(B|A')P(A')$
- $P(B)=P(B\cap E_1)+P(B\cap E_2)+...+P(B\cap E_k)=P(B|E_1)P(E_1)+P(B|E_2)P(E_2)+...+P(B|E_k)P(E_k)$ for mutually exclusive and exhaustive events $E_1,E_2,...,E_k$
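A quick numeric check of the total probability rule, with hypothetical numbers ($B$ = a part fails, $A$ = the part was exposed to contamination):
```python
# Hypothetical probabilities, chosen only to illustrate the rule
p_A = 0.2             # P(A)
p_B_given_A = 0.10    # P(B|A)
p_B_given_Ac = 0.005  # P(B|A')

# Total probability rule: P(B) = P(B|A)P(A) + P(B|A')P(A')
p_B = p_B_given_A * p_A + p_B_given_Ac * (1 - p_A)
print(p_B)  # 0.10*0.2 + 0.005*0.8 = 0.024
```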
### 2.9 Random Variables
- a function that assigns a real number to each outcome in the sample space of a random experiment
- denoted by an uppercase letter such as $X$
- the measured value of the random variable is denoted by a lowercase letter such as $x=$ 70 milliamperes
- **discrete random variable:** with a finite (or countably infinite) range
- *example: number of scratches on a surface, proportion of defective parts among 1000 tested, number of transmitted bits received in error*
- **continuous random variable:** with an interval (either finite or infinite) of real numbers for its range
- *example: electrical current, length, pressure, temperature, time, voltage, weight*
3 Discrete Random Variables and Probability Distributions
---
### 3.1 Probability Distributions and Probability Mass Functions
- probability distribution: a description of probabilities associated with the possible values of $X$
- **probability mass function**
1. $f(x_i)\geq0$
2. $\sum_{i=1}^nf(x_i)=1$
3. $f(x_i)=P(X=x_i)$
### 3.2 Cumulative Distribution Functions
- alternate method for describing a random variable's probability distribution
- the PMF can be recovered from the CDF: $f(x_i)=F(x_i)-F(x_{i-1})$
- $F(x)=P(X\leq x)=\sum_{x_i\leq x}f(x_i)$
- $0\leq F(x)\leq1$
- If $x\leq y$, then $F(x)\leq F(y)$
### 3.3 Mean and Variance of a Discrete Random Variable
- mean (expected value): $\mu=E(X)=\sum_xxf(x)$
- measure of the center or middle of the probability distribution
- variance: $\sigma^2=V(X)=E(X-\mu)^2=\sum_x(x-\mu)^2f(x)=\sum_xx^2f(x)-\mu^2$
- $\sum_x(x-\mu)^2f(x)=\sum_xx^2f(x)-2\mu\sum_xxf(x)+\mu^2\sum_xf(x)=\sum_xx^2f(x)-2\mu^2+\mu^2=\sum_xx^2f(x)-\mu^2$
- measure of the dispersion
- standard deviation: $\sigma=\sqrt{\sigma^2}$
- Expected Value of a Function of a Discrete Random Variable:
- $E[h(X)]=\sum_xh(x)f(x)$ (see the sketch below)
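A short sketch computing $\mu$, $\sigma^2$, and $E[h(X)]$ from a hypothetical PMF:
```python
# Hypothetical PMF over x = 0, 1, 2, 3 (probabilities must sum to 1)
x = [0, 1, 2, 3]
f = [0.1, 0.4, 0.3, 0.2]

mu = sum(xi * fi for xi, fi in zip(x, f))              # E(X) = 1.6
var = sum(xi**2 * fi for xi, fi in zip(x, f)) - mu**2  # E(X^2) - mu^2 = 0.84
e_h = sum(xi**3 * fi for xi, fi in zip(x, f))          # E[h(X)] with h(x) = x^3
print(mu, var, e_h)
```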
### 3.4 Discrete Uniform Distribution
- a finite number of possible values, each with equal probability
- $f(x_i)=\dfrac1n$
- $\mu=\sum_{k=a}^bk(\dfrac1{b-a+1})=\dfrac{b(b+1)-(a-1)a}2\dfrac1{b-a+1}=\dfrac{b+a}2$
- $\sigma^2=\dfrac{(b-a+1)^2-1}{12}$ (derivation below)
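Filling in the derivation noted above: shift to $W=X-a+1$, which is uniform on $\{1,2,...,n\}$ with $n=b-a+1$; shifting changes the mean but not the variance.
- $E(W)=\dfrac{n+1}2$, $E(W^2)=\dfrac{(n+1)(2n+1)}6$
- $\sigma^2=V(W)=\dfrac{(n+1)(2n+1)}6-\dfrac{(n+1)^2}4=\dfrac{(n+1)(n-1)}{12}=\dfrac{n^2-1}{12}=\dfrac{(b-a+1)^2-1}{12}$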
### 3.5 Binomial Distribution
- Bernoulli trial
- with only two possible outcomes
- independent
- probability of a success in each is constant
- $f(x)=\dbinom nxp^x(1-p)^{n-x}$
- $\mu=E(X)=E(X_1)+E(X_2)+...+E(X_n)=np$
- $\sigma^2=V(X)=V(X_1)+V(X_2)+...+V(X_n)=np(1-p)$
### 3.6 Geometric and Negative Binomial Distributions
#### Geometric Distribution
- Bernoulli trials are conducted until a success is obtained
- $f(x)=(1-p)^{x-1}p$
- $\mu=p\sum_{k=1}^\infty kq^{k-1}=\dfrac{\partial}{\partial q}\left[\dfrac{pq}{1-q}\right]=\dfrac{p}{(1-q)^2}=\dfrac1p$
- $q=1-p$
- $\sigma^2=\dfrac{1-p}{p^2}$
- $V(X)=E(X^2)-(E(X))^2$
#### Lack of Memory Property
- the count of the number of trials until the next success can be started at any trial without changing the probability distribution of the random variable
#### Negative Binomial Distribution
- the number of Bernoulli trials required to obtain $r$ successes
- $f(x)=\dbinom {x-1}{r-1}(1-p)^{x-r}p^r$
- $\mu=\dfrac rp$
- $\sigma^2=\dfrac{r(1-p)}{p^2}$
### 3.7 Hypergeometric Distribution
- a sample of size $n$ is selected without replacement from a set of $N$ objects containing $K$ successes and $N-K$ failures
- $f(x)=\dfrac{\dbinom Kx\dbinom {N-K}{n-x}}{\dbinom Nn}$
- $p=K/N$
- $\mu=np$
- $\sigma^2=np(1-p)(\dfrac{N-n}{N-1})$
- finite population correction factor: $\dfrac{N-n}{N-1}$
### 3.8 Poisson Distribution
- the count of events that occur within the interval $T$
- **Poisson process**
1. The probability of more than one event in a subinterval tends to zero
2. The probability of one event in a subinterval tends to $\lambda\Delta t$
3. The event in each subinterval is independent of other subintervals
- approximates independent Bernoulli trials with the number of trials $n=T/\Delta t$ and success probability $p=\lambda\Delta t=\lambda T/n$
- $f(x)=\dfrac {e^{-\lambda T}(\lambda T)^x}{x!}$
- $\sum_{x=0}^\infty\frac{(\lambda T)^x}{x!}$ is Taylor's expansion of $e^{\lambda T}$
- $\mu=\lambda T$
- $\sigma^2=\lambda T$
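A quick check of the Poisson PMF from the formula; the rate $\lambda T=2.5$ is a hypothetical value:
```python
import math

lam_T = 2.5  # hypothetical mean number of events in the interval T

def f(x):
    # Poisson pmf: e^(-lambda*T) * (lambda*T)^x / x!
    return math.exp(-lam_T) * lam_T**x / math.factorial(x)

print(sum(f(x) for x in range(100)))                    # ~1.0: pmf sums to 1
print(sum(x * f(x) for x in range(100)))                # ~2.5: mean = lambda*T
print(sum(x**2 * f(x) for x in range(100)) - lam_T**2)  # ~2.5: variance = lambda*T
```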
4 Continuous Random Variables and Probability Distributions
---
### 4.1 Probability Distributions and Probability Density Functions
- Probability Density Function
1. $f(x)\geq 0$
2. $\int_{-\infty}^{\infty}f(x)dx=1$
3. $P(a\leq X\leq b)=\int_{a}^{b}f(x)dx$
- histogram: an approximation to a probability density function
- $P(x_1\leq X\leq x_2)=P(x_1<X\leq x_2)=P(x_1\leq X<x_2)=P(x_1<X<x_2)$
### 4.2 Cumulative Distribution Functions
- $F(x)=P(X\leq x)=\int_{-\infty}^xf(u)du$
- $f(x)=\dfrac{dF(x)}{dx}$
### 4.3 Mean and Variance of a Continuous Random Variable
- $\mu=\int_{-\infty}^\infty xf(x)dx$
- $\sigma^2=\int_{-\infty}^\infty(x-\mu)^2f(x)dx=\int_{-\infty}^\infty x^2f(x)dx-\mu^2$
- these definitions also apply to functions of a continuous random variable
### 4.4 Continuous Uniform Distribution
- $f(x)=\dfrac1{b-a}$ for $a\leq x\leq b$
- $\mu=\dfrac{a+b}2$
- $\sigma^2=\dfrac{(b-a)^2}{12}$
### 4.5 Normal Distribution
- aka Gaussian distribution
- most widely used model
- central limit theorem
- $f(x)=\dfrac1{\sqrt{2\pi}\sigma}e^{\frac{-(x-\mu)^2}{2\sigma^2}}$
- $N(\mu,\sigma^2)$
- $P(\mu-\sigma<X<\mu+\sigma)=0.6827$
- $P(\mu-2\sigma<X<\mu+2\sigma)=0.9545$
- $P(\mu-3\sigma<X<\mu+3\sigma)=0.9973$ ($6\sigma$ is often referred to as the width of a normal distribution)
- standard normal random variable
- $\mu=0$
- $\sigma^2=1$
- (cumulative distribution) $\Phi(z)=P(Z\leq z)$
- standardizing
- z-value $Z=\dfrac{X-\mu}\sigma$
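A small standardization sketch using scipy; the parameters are hypothetical:
```python
from scipy.stats import norm

# Hypothetical: X ~ N(mu = 10, sigma^2 = 4); find P(X <= 13)
mu, sigma = 10, 2
z = (13 - mu) / sigma                     # z-value = 1.5
print(norm.cdf(z))                        # Phi(1.5) ~ 0.9332
print(norm.cdf(13, loc=mu, scale=sigma))  # same answer without standardizing
```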
### 4.6 Normal Approximation to the Binomial and Poisson Distributions
- with a large number of trials
- continuity correction: a modified interval is used to better compensate for the difference between the continuous normal distribution and the discrete binomial distribution
- **Binomial Distribution**
- $Z=\dfrac{X-np}{\sqrt{np(1-p)}}$
- $X$: binomial random variable
- $n$, $p$: parameters
- $P(X\leq x)=P(X\leq x+0.5)\approx P(Z\leq\dfrac{x+0.5-np}{\sqrt{np(1-p)}})$
- $P(x\leq X)=P(x-0.5\leq X)\approx P(\dfrac{x-0.5-np}{\sqrt{np(1-p)}}\leq Z)$
- for $np>5$ and $n(1-p)>5$
* **Poisson Distribution**
* $Z=\dfrac{X-\lambda}{\sqrt \lambda}$
* for $\lambda>5$
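A numerical sketch of the binomial case above, with the continuity correction; $n$ and $p$ are hypothetical:
```python
import math
from scipy.stats import binom, norm

n, p = 50, 0.3                              # np = 15 > 5 and n(1-p) = 35 > 5
mu, sigma = n * p, math.sqrt(n * p * (1 - p))

exact = binom.cdf(12, n, p)                 # exact P(X <= 12)
approx = norm.cdf((12 + 0.5 - mu) / sigma)  # continuity correction: x + 0.5
print(exact, approx)                        # the two values should be close
```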
### 4.7 Exponential Distribution
* the distance between successive events from a Poisson process with mean number of events $\lambda>0$ per unit interval
* $f(x)=\lambda e^{-\lambda x}$
* $\mu=\dfrac1\lambda$
* $\sigma^2=\dfrac1{\lambda^2}$
* use consistent units to express intervals
#### Lack of Memory Property
* $P(X<t_1+t_2|X>t_1)=P(X<t_2)$
### 4.8 Erlang and Gamma Distributions
* the length until $r$ events occur in a Poisson process
* Erlang random variable: $f(x)=\dfrac{\lambda^rx^{r-1}e^{-\lambda x}}{(r-1)!}$
* Gamma Function $\Gamma(r)=\int_0^\infty x^{r-1}e^{-x}dx$ for $r>0$
* $\Gamma(r)=(r-1)\Gamma(r-1)$
* **Gamma Distribution**
* $f(x)=\dfrac{\lambda^rx^{r-1}e^{-\lambda x}}{\Gamma(r)}$
* $\lambda$: scale
* $r$: shape
* $\mu=\dfrac r\lambda$
* $\sigma^2=\dfrac r{\lambda^2}$
* chi-square distribution: $\lambda=\frac 12$ and $r=\frac\nu2$, where the degrees of freedom $\nu=1,2,3,...$
5 Joint Probability Distributions
---
### 5.1 Joint Probability Distributions for Two Random Variables
* more than one random variable defined in a random experiment
* generate values in specific regions of two-dimensional space
* **joint probability mass function** (bivariate distribution)
* discrete random variables
1. $f_{XY}(x,y)\geq0$
2. $\sum_x\sum_yf_{XY}(x,y)=1$
3. $f_{XY}(x,y)=P(X=x,Y=y)$
* **joint probability density function**
* continuous random variables
1. $f_{XY}(x,y)\geq0$
2. $\int_{-\infty}^\infty\int_{-\infty}^\infty f_{XY}(x,y)dxdy=1$
3. $P((X,Y)\in R)=\int\int_Rf_{XY}(x,y)dxdy$
* **marginal probability distribution**
* individual probability distribution of a random variable
* $f_X(x)=\int f_{XY}(x,y)dy$
* $E(X)=\int_{-\infty}^\infty xf_X(x)dx=\int_{-\infty}^\infty\int_{-\infty}^\infty xf_{XY}(x,y)dydx$
* $V(X)=\int_{-\infty}^\infty(x-\mu_X)^2f_X(x)dx=\int_{-\infty}^\infty\int_{-\infty}^\infty(x-\mu_X)^2f_{XY}(x,y)dydx$
### 5.2 Conditional Probability Distributions and Independence
* conditional probability mass function
* **conditional probability density function**
* $f_{Y|x}(y)=\dfrac{f_{XY}(x,y)}{f_X(x)}$
* the conditional PDF of $Y$ given $X=x$, defined for $f_X(x)>0$
* a conditional PDF satisfies the properties of a PDF
* $E(Y|x)=\int_yyf_{Y|x}(y)dy$
* $V(Y|x)=\int_y(y-\mu_{Y|x})^2f_{Y|x}(y)dy=\int_yy^2f_{Y|x}(y)dy-\mu_{Y|x}^2$
* **independence**
1. $f_{XY}(x,y)=f_X(x)f_Y(y)$
2. $f_{Y|x}(y)=f_Y(y)$
3. $f_{X|y}(x)=f_X(x)$
4. $P(X\in A,Y\in B)=P(X\in A)P(Y\in B)$
* if the set of points in two-dimensional space that receive positive probability under $f_{XY}(x,y)$ is not rectangular, then $X$ and $Y$ are not independent
### 5.3 Joint Probability Distributions for More Than Two Random Variables
* **joint probability density function**
1. $f_{X_1X_2...X_p}(x_1,x_2,...,x_p)\geq0$
2. $\int_{-\infty}^\infty\int_{-\infty}^\infty...\int_{-\infty}^\infty f_{X_1X_2...X_p}(x_1,x_2,...,x_p)dx_1dx_2...dx_p=1$
3. $P((X_1,X_2,...,X_p)\in B)=\int\int_Bf_{X_1X_2...X_p}(x_1,x_2,...,x_p)dx_1dx_2...dx_p$
* $f_{X_1X_2...X_p}(x_1,x_2,...,x_p)=0$ at any point that is not otherwise specified
* **marginal probability density function**
* $f_{X_i}(x_i)=\int\int...\int f_{X_1X_2...X_p}(x_1,x_2,...,x_p)dx_1dx_2...dx_{i-1}dx_{i+1}...dx_p$
* $E(X_i)=\int_{-\infty}^\infty\int_{-\infty}^\infty...\int_{-\infty}^\infty x_if_{X_1X_2...X_p}(x_1,x_2,...,x_p)dx_1dx_2...dx_p=\int_{-\infty}^\infty x_if_{X_i}(x_i)dx_i$
* $V(X_i)=\int_{-\infty}^\infty\int_{-\infty}^\infty...\int_{-\infty}^\infty(x_i-\mu_{X_i})^2f_{X_1X_2...X_p}(x_1,x_2,...,x_p)dx_1dx_2...dx_p=\int_{-\infty}^\infty(x_i-\mu_{X_i})^2f_{X_i}(x_i)dx_i$
* **distribution of a subset of random variables**
* $f_{X_1X_2...X_k}(x_1,x_2,...x_k)=\int\int...\int f_{X_1X_2...X_p}(x_1,x_2,...,x_p)dx_{k+1}dx_{k+2}...dx_p$
* conditional probability distribution
* $f_{X_1X_2X_3|x_4x_5}(x_1,x_2,x_3)=\dfrac{f_{X_1X_2X_3X_4X_5}(x_1,x_2,x_3,x_4,x_5)}{f_{X_4X_5}(x_4,x_5)}$
* **independence**
* $f_{X_1X_2...X_p}(x_1,x_2,...,x_p)=f_{X_1}(x_1)f_{X_2}(x_2)...f_{X_p}(x_p)$
### 5.4 Covariance and Correlation
* **covariance**
* a measure of the *linear* relationship between two random variables
* describes how the two random variables vary together
* $cov(X,Y)=\sigma_{XY}=E[(X-\mu_X)(Y-\mu_Y)]=E(XY)-\mu_X\mu_Y$
* **correlation**
* a dimensionless measure of the *linear* relationship between two random variables
* can compare the strength of linear relationships between pairs of variables measured in different units
* $\rho_{XY}=\dfrac{cov(X,Y)}{\sqrt{V(X)V(Y)}}=\dfrac{\sigma_{XY}}{\sigma_X\sigma_Y}$
* $-1\leq\rho_{XY}\leq+1$
* If $X$ and $Y$ are independent, $\sigma_{XY}=\rho_{XY}=0$
* if $\rho_{XY}=0$, we *cannot* conclude $X$ and $Y$ are independent
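A sketch computing $\sigma_{XY}$ and $\rho_{XY}$ from a hypothetical joint PMF:
```python
import numpy as np

# Hypothetical joint pmf f_XY(x, y): rows index x in {0, 1}, columns index y in {0, 1, 2}
f = np.array([[0.1, 0.2, 0.1],
              [0.2, 0.1, 0.3]])
x, y = np.array([0, 1]), np.array([0, 1, 2])

fx, fy = f.sum(axis=1), f.sum(axis=0)  # marginal distributions of X and Y
mu_x, mu_y = x @ fx, y @ fy            # E(X), E(Y)
e_xy = x @ f @ y                       # E(XY) = sum_x sum_y x*y*f(x, y)
cov = e_xy - mu_x * mu_y               # sigma_XY = E(XY) - mu_X*mu_Y
rho = cov / np.sqrt((x**2 @ fx - mu_x**2) * (y**2 @ fy - mu_y**2))
print(cov, rho)
```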
### 5.5 Common Joint Distributions
#### 5.5.1 Multinomial Probability Distribution
1. The result of each trial is classified into one of $k$ classes
2. The probability of a trial generating a result in class 1, class 2, ..., class $k$ is $p_1$, $p_2$, ..., $p_k$, respectively, and these probabilities remain constant over the trials.
3. The trials are independent
* $P(X_1=x_1,X_2=x_2,...,X_k=x_k)=\dfrac{n!}{x_1!x_2!...x_k!}p_1^{x_1}p_2^{x_2}...p_k^{x_k}$
* $E(X_i)=np_i$
* $V(X_i)=np_i(1-p_i)$
#### 5.5.2 Bivariate Normal Distribution
* $f_{XY}(x,y;\sigma_X,\sigma_Y,\mu_X,\mu_Y,\rho)=\dfrac1{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}}\exp\{\dfrac{-1}{2(1-\rho^2)}[\dfrac{(x-\mu_X)^2}{\sigma_X^2}-\dfrac{2\rho(x-\mu_X)(y-\mu_Y)}{\sigma_X\sigma_Y}+\dfrac{(y-\mu_Y)^2}{\sigma_Y^2}]\}$
* **marginal distributions**
* $X\sim N(\mu_X,\sigma_X^2)$ and $Y\sim N(\mu_Y,\sigma_Y^2)$
* **conditional distribution**
* $\mu_{Y|x}=\mu_Y+\rho\dfrac{\sigma_Y}{\sigma_X}(x-\mu_X)$
* $\sigma_{Y|x}^2=\sigma_Y^2(1-\rho^2)$
* **correlation**: the parameter $\rho$ is the correlation $\rho_{XY}$ between $X$ and $Y$
* **independence**
* if $\rho=0$, $X$ and $Y$ are independent
### 5.6 Linear Functions of Random Variables
* **linear function**: $Y=c_0+c_1X_1+c_2X_2+...+c_pX_p$
* $E(Y)=c_0+c_1E(X_1)+c_2E(X_2)+...+c_pE(X_p)$
* $V(Y)=c_1^2V(X_1)+c_2^2V(X_2)+...+c_p^2V(X_p)+2\sum\sum_{i<j}c_ic_j\sigma_{X_iX_j}$
* if $X_1,X_2,...X_p$ are *independent*, $V(Y)=c_1^2V(X_1)+c_2^2V(X_2)+...+c_p^2V(X_p)$
* Mean and Variance of an Average
* $E(\overline X)=\mu$
* $E(X_i)=\mu$ for $i=1,2,...,p$
* $V(\overline X)=\dfrac{\sigma^2}p$
* $X_1,X_2,...,X_p$ are independent
* $V(X_i)=\sigma^2$ for $i=1,2,...,p$
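A Monte Carlo check of the mean and variance formulas above for independent random variables; the coefficients and distributions are hypothetical:
```python
import numpy as np

# Y = c0 + c1*X1 + c2*X2 with independent X1, X2
rng = np.random.default_rng(1)
c0, c1, c2 = 1.0, 2.0, -3.0
X1 = rng.normal(5, 2, 1_000_000)  # E(X1) = 5, V(X1) = 4
X2 = rng.normal(1, 3, 1_000_000)  # E(X2) = 1, V(X2) = 9

Y = c0 + c1 * X1 + c2 * X2
print(Y.mean())  # ~ 1 + 2*5 - 3*1 = 8
print(Y.var())   # ~ 2^2*4 + (-3)^2*9 = 97 (no covariance term: independent)
```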
### 5.7 General Functions of Random Variables
* **General Functions of a Discrete Random Variable**
* $f_Y(y)=f_X[u(y)]$
* $Y=h(X)$
* let $x=u(y)$
* **General Functions of a Continuous Random Variable**
* $f_Y(y)=f_X[u(y)]|J|$
* $J=u'(y)$, Jacobian
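A worked example of the continuous case (illustrative, not from the text): let $X$ be uniform on $(0,1)$, so $f_X(x)=1$, and let $Y=h(X)=X^2$.
* inverse: $x=u(y)=\sqrt y$, Jacobian: $J=u'(y)=\dfrac1{2\sqrt y}$
* $f_Y(y)=f_X(\sqrt y)|J|=\dfrac1{2\sqrt y}$ for $0<y<1$, which integrates to 1 as required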
6 Descriptive Statistics
---
* organizing and summarizing the data in ways that facilitate its interpretation and subsequent analysis
* plotting the data
* numerical methods
* graphical techniques
### 6.1 Numerical Summaries of Data
* **sample** of observations that have been selected from some larger **population** of observations
* conceptual/hypothetical population
* **sample mean:** $\overline x=\dfrac{x_1+x_2+...+x_n}n=\dfrac{\sum_{i=1}^nx_i}n$
* dot diagram
* **sample variance:** $s^2=\dfrac{\sum_{i=1}^n(x_i-\overline x)^2}{n-1}=\dfrac{\sum_{i=1}^nx_i^2-\dfrac{(\sum_{i=1}^nx_i)^2}{n}}{n-1}$
* **sample standard deviation:** $s=\sqrt{s^2}$
* degrees of freedom
* population mean, population variance, population standard deviation
* **sample range:** $r=\max(x_i)-\min(x_i)$
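A short sketch computing the summaries above for a hypothetical sample:
```python
import math

x = [12.6, 12.9, 13.4, 12.3, 13.6, 13.5, 12.6, 13.1]  # hypothetical data
n = len(x)

xbar = sum(x) / n                               # sample mean
s2 = sum((xi - xbar)**2 for xi in x) / (n - 1)  # sample variance (n - 1 degrees of freedom)
s = math.sqrt(s2)                               # sample standard deviation
r = max(x) - min(x)                             # sample range
print(xbar, s2, s, r)
```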
### 6.2 Stem-and-Leaf Diagrams
### 6.3 Frequency Distributions and Histograms
### 6.4 Box Plots
### 6.5 Time Sequence Plots
### 6.7 Probability Plots