---
tags: Probability
---
by AWfulsome
# Lecture 1: Introduction
### Probability Theory
* Developed to describe phenomena that cannot be predicted with certainty
    * Frequency of occurrences
    * Subjective beliefs
* A probability is a number between 0 and 1
### Role of Probability Theory
* A framework for analyzing phenomena with uncertain outcomes
* Rules for consistent reasoning
* Used to make predictions and decisions about the real world
> The Master said: "You! Shall I teach you what it is to know something? When you know a thing, to hold that you know it; and when you do not know a thing, to allow that you do not know it: this is knowledge." (Confucius, *Analects* 2.17)

---
# Sets and Probabilistic Models
## Sets
I believe it's already in your knowledge set.
Notations:
* $\Omega$ denotes the universal set.
* $S^c$ denotes the complement of $S$.
## Probabilistic Models
* A probabilistic model is a mathematical description of an uncertain situation.
* Elements of a probabilistic model
    * The **sample space**: The set of all possible outcomes of an experiment.
    * The **probability law**: Assigns to a set $A$ of possible outcomes (also called an **event**) a nonnegative number $P(A)$ (called the **probability** of $A$) that encodes our knowledge or belief about the collective "likelihood" of the elements of $A$.

## Sample Spaces and Events
Standard material; recall that the sample space is the set of all possible outcomes, and an event is any subset of the sample space.
## Sequential Probabilistic Models
Such models can be described by means of a *tree-based sequential description*

## Probability Laws
* Suppose the sample space $\Omega$ associated with an experiment has been settled on; to complete the probabilistic model, we introduce a probability law.
* It specifies the *likelihood* of any outcome, or of any set of possible outcomes (an event).
* More precisely, it assigns to every event $A$ a number $\mathbf{P}(A)$, called the **probability** of $A$, satisfying the following axioms:
    * **Nonnegativity**: $\mathbf{P}(A) \geq 0$ for every event $A$.
    * **Additivity**: If $A$ and $B$ are disjoint events, then $\mathbf{P}(A \cup B) = \mathbf{P}(A) + \mathbf{P}(B)$; the same holds for any sequence of disjoint events.
    * **Normalization**: $\mathbf{P}(\Omega) = 1$.

## Probability Laws for Discrete Models
* Discrete Probability Law
$\mathbf{P}(\{s_1, s_2, \dots, s_n\}) = \mathbf{P}(s_1) + \mathbf{P}(s_2) + \dots + \mathbf{P}(s_n)$
* Discrete Uniform Probability Law
$\mathbf{P}(A) = \dfrac{n(A)}{n(\Omega)}$
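As a quick numeric check of the discrete uniform law, here is a minimal Python sketch; the two-dice experiment and the event "the sum is 7" are my own illustrative choices, not from the notes.
```python
from fractions import Fraction
from itertools import product

# Sample space: all ordered outcomes of rolling two fair dice.
omega = list(product(range(1, 7), repeat=2))

# Event A: the sum of the two dice equals 7.
A = [w for w in omega if sum(w) == 7]

# Discrete uniform law: P(A) = n(A) / n(Omega).
print(Fraction(len(A), len(omega)))  # 1/6
```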
## Continuous Models
Probabilistic models with continuous sample spaces
* It is inappropriate to assign probability to each single-element event.
* Instead, it makes sense to assign probability to any interval (one-dimensional) or area (two-dimensional) of the sample space.
Example: a wheel of fortune
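A minimal Monte Carlo sketch of the wheel-of-fortune model, assuming the pointer angle is uniform on $[0, 360)$ (the interval $[90, 180)$ below is my own illustrative choice): each single angle has probability zero, but an interval gets probability proportional to its length.
```python
import random

# Wheel of fortune: pointer angle assumed uniform on [0, 360).
# Under this model, P(angle in [a, b)) = (b - a) / 360.
def estimate_interval_probability(a, b, trials=1_000_000):
    hits = sum(1 for _ in range(trials) if a <= random.uniform(0, 360) < b)
    return hits / trials

print(estimate_interval_probability(90, 180))  # simulation: close to 0.25
print((180 - 90) / 360)                        # model value: 0.25
```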

:::spoiler Example 1.5

:::
## Properties of Probability Laws
1. If $A \subseteq B$, then $\mathbf{P}(A) \leq \mathbf{P}(B)$.
2. $\mathbf{P}(A \cup B) = \mathbf{P}(A) + \mathbf{P}(B) - \mathbf{P}(A \cap B)$
3. $\mathbf{P}(A \cup B) \leq \mathbf{P}(A) + \mathbf{P}(B)$
4. $\mathbf{P}(A \cup B \cup C) = \mathbf{P}(A) + \mathbf{P}(A^c \cap B) + \mathbf{P}(A^c \cap B^c \cap C)$
Bonferroni Inequality
* $\mathbf{P}(A\cap B) \geq \mathbf{P}(A) + \mathbf{P}(B) - 1$
* $\mathbf{P}(A_1 \cap \cdots \cap A_n) \geq \mathbf{P}(A_1) + \dots + \mathbf{P}(A_n) - (n - 1)$
:::spoiler Proof
For two events, inclusion-exclusion gives $\mathbf{P}(A \cap B) = \mathbf{P}(A) + \mathbf{P}(B) - \mathbf{P}(A \cup B) \geq \mathbf{P}(A) + \mathbf{P}(B) - 1$, since $\mathbf{P}(A \cup B) \leq 1$.
The general case follows by induction: apply the two-event inequality with $A = A_1 \cap \cdots \cap A_{n-1}$ and $B = A_n$.
:::
Visualization and verification using Venn diagrams
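Besides Venn diagrams, the properties above can also be checked numerically on a small finite model; a sketch assuming two fair dice with equally likely outcomes (the events $A$ and $B$ are my own choices):
```python
from fractions import Fraction
from itertools import product

# Finite model: two fair dice with equally likely outcomes.
omega = list(product(range(1, 7), repeat=2))
P = lambda event: Fraction(len(event), len(omega))

A = {w for w in omega if sum(w) >= 9}   # the sum is at least 9
B = {w for w in omega if w[0] == 6}     # the first die shows 6

assert P(A | B) == P(A) + P(B) - P(A & B)   # inclusion-exclusion (property 2)
assert P(A | B) <= P(A) + P(B)              # union bound (property 3)
assert P(A & B) >= P(A) + P(B) - 1          # Bonferroni inequality
print(P(A), P(B), P(A & B), P(A | B))
```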

---
## Conditional Probability, Total Probability Theorem and Bayes' Rule
### Conditional Probability
Suppose we know that the outcome is within some given event $B$; we wish to quantify the likelihood that the outcome also belongs to some other given event $A$.
Definition: $P(A|B) = \dfrac{P(A \cap B)}{P(B)}$, defined only when $P(B) > 0$.
* When all outcomes of the experiment are equally likely, the conditional probability can also be computed as $P(A|B) = \dfrac{n(A \cap B)}{n(B)}$
Does it satisfy the three axioms?
* Nonnegative: $P(A|B) = \dfrac{P(A \cap B)}{P(B)} \geq 0$
* Normalization: $P(\Omega|B) = \dfrac{P(\Omega\cap B)}{P(B)} = \dfrac{P(B)}{P(B)} = 1$
* Additivity: for disjoint events $A_1$ and $A_2$ (so that $A_1 \cap B$ and $A_2 \cap B$ are also disjoint),

$$
\begin{align}
P(A_1 \cup A_2 | B) &= \frac{P((A_1 \cup A_2) \cap B)}{P(B)} \\
&= \frac{P((A_1 \cap B) \cup (A_2 \cap B))}{P(B)} \\
&= \frac{P(A_1 \cap B) + P(A_2 \cap B)}{P(B)} \\
&= P(A_1|B) + P(A_2|B)
\end{align}
$$
Does it satisfy general probability laws?

Example 1.6:

Example 1.7:

Example 1.8:

Using conditional probability for modeling
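A small numeric sketch of the conditional-probability definition on the two-dice model (the events below are my own illustrative choices): it computes $P(A|B)$ both from the definition and by counting, since all outcomes are equally likely.
```python
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))      # two fair dice, equally likely
P = lambda event: Fraction(len(event), len(omega))

B = {w for w in omega if w[0] == 3}               # given: the first die shows 3
A = {w for w in omega if sum(w) == 8}             # event of interest: the sum is 8

print(P(A & B) / P(B))                 # definition: P(A|B) = P(A ∩ B) / P(B) -> 1/6
print(Fraction(len(A & B), len(B)))    # counting:   n(A ∩ B) / n(B)         -> 1/6
```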

## Independence and Counting
* A special case arises when the occurrence of $B$ does not alter the probability that $A$ has occurred.
$$
\begin{align}
P(A|B) = P(A) & \implies \dfrac{P(A \cap B)}{P(B)} = P(A) \\
& \implies P(A \cap B) = P(A)P(B)
\end{align}
$$
We then say that $A$ is **independent** of $B$.
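A minimal check of this product condition, again on a small finite model (two fair coin flips, my own illustrative choice):
```python
from fractions import Fraction
from itertools import product

omega = list(product("HT", repeat=2))             # two fair coin flips
P = lambda event: Fraction(len(event), len(omega))

A = {w for w in omega if w[0] == "H"}             # first flip is heads
B = {w for w in omega if w[1] == "H"}             # second flip is heads
C = {w for w in omega if w.count("H") == 2}       # both flips are heads

print(P(A & B) == P(A) * P(B))   # True: A and B are independent
print(P(A & C) == P(A) * P(C))   # False: A and C are not independent
```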
## Discrete Random Variables
### Expectation
$$
E[X] = \sum\limits_xxp_X(x)
$$
#### Moments
The **n-th moment** of $X$ is the expectation of $X^n$:
$$
E[X^n] = \sum\limits_xx^np_X(x)
$$
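A small sketch computing the expectation and the second moment of a discrete random variable directly from its PMF; the fair-die PMF is my own illustrative choice.
```python
from fractions import Fraction

# PMF of a small discrete RV: X is the value of one fair die roll.
p_X = {x: Fraction(1, 6) for x in range(1, 7)}

# E[X^n] = sum_x x^n * p_X(x); n = 1 gives the ordinary expectation.
def moment(pmf, n=1):
    return sum(x**n * p for x, p in pmf.items())

print(moment(p_X))      # E[X]   = 7/2
print(moment(p_X, 2))   # E[X^2] = 91/6
```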
---
### Joint PMFs
Let $X$ and $Y$ be random variables **associated with the same experiment**. Their **joint PMF** is
$$
p_{X, Y}(x, y) = P(\{X = x\} \cap \{Y = y\}) = P(X = x, Y = y)
$$
If event $A$ is the set of all pairs $(x, y)$ that have a certain property, then the probability of $A$ can be calculated by
$$
P((X, Y) \in A) = \sum\limits_{(x, y) \in A} p_{X, Y}(x, y)
$$
#### Marginal PMFs of RV
$$
p_X(x) = \sum\limits_y p_{X, Y}(x, y) \\
p_Y(y) = \sum\limits_x p_{X, Y}(x, y)
$$
##### Tabular method:
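The tabular method lays the joint PMF out as a table with rows indexed by $x$ and columns by $y$; the marginals are then the row and column sums. A minimal numeric sketch (the probabilities below are my own illustrative choices):
```python
from fractions import Fraction

# Joint PMF of (X, Y) stored as a table; entries must sum to 1.
xs, ys = [0, 1], [0, 1, 2]
table = {
    (0, 0): Fraction(1, 6), (0, 1): Fraction(1, 6), (0, 2): Fraction(1, 6),
    (1, 0): Fraction(1, 6), (1, 1): Fraction(1, 12), (1, 2): Fraction(1, 4),
}
assert sum(table.values()) == 1

# Marginals: sum each row (over y) for p_X, each column (over x) for p_Y.
p_X = {x: sum(table[(x, y)] for y in ys) for x in xs}
p_Y = {y: sum(table[(x, y)] for x in xs) for y in ys}
print(p_X, p_Y)

# P((X, Y) in A) for A = {X + Y is even}: sum the table entries belonging to A.
print(sum(p for (x, y), p in table.items() if (x + y) % 2 == 0))
```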

#### Functions of Multiple RVs
If $Z = g(X, Y)$, then
$$
p_Z(z) = \sum\limits_{\{(x, y) | g(x, y) = z\}} p_{X, Y}(x, y)
$$
and the expectation
$$
E[Z] = E[g(X, Y)] = \sum\limits_x \sum\limits_y g(x, y)p_{X, Y}(x, y)
$$
If the function is linear and of the form $Z = g(X, Y) = aX + bY + c$, then the expectation is
$$
E[Z] = aE[X] + bE[Y] + c
$$
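A small sketch computing the PMF of $Z = X + Y$ from a joint PMF and checking the linearity of expectation; it reuses the illustrative joint table from the tabular-method sketch above.
```python
from fractions import Fraction
from collections import defaultdict

xs, ys = [0, 1], [0, 1, 2]
p_XY = {
    (0, 0): Fraction(1, 6), (0, 1): Fraction(1, 6), (0, 2): Fraction(1, 6),
    (1, 0): Fraction(1, 6), (1, 1): Fraction(1, 12), (1, 2): Fraction(1, 4),
}

# PMF of Z = g(X, Y) = X + Y: group joint probabilities by the value of g.
p_Z = defaultdict(Fraction)
for (x, y), p in p_XY.items():
    p_Z[x + y] += p

E = lambda pmf: sum(v * p for v, p in pmf.items())
p_X = {x: sum(p_XY[(x, y)] for y in ys) for x in xs}
p_Y = {y: sum(p_XY[(x, y)] for x in xs) for y in ys}

print(dict(p_Z))
print(E(p_Z) == E(p_X) + E(p_Y))   # True: E[X + Y] = E[X] + E[Y]
```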
#### More than Two RVs
Just apply the same definitions recursively; nothing is different.
### Conditioning
$$
p_{X|A}(x) = P(X = x | A) = \frac{P(\{X = x\} \cap A)}{P(A)}
$$
Normalization:
$$
P(A) = \sum\limits_x P(\{X = x\} \cap A) \\
\therefore \sum\limits_x p_{X|A}(x) = \sum\limits_x \frac{P(\{X = x\} \cap A)}{P(A)} = \frac{\sum\limits_x P(\{X = x\} \cap A)}{P(A)} = \frac{P(A)}{P(A)} = 1
$$
#### Total Probability Theorem
Let $A_1, A_2, \ldots, A_n$ be disjoint events that form a partition of the sample space. Then, we have
$$
p_X(x) = \sum\limits_{i = 1}^n P(A_i) \cdot p_{X|A_i}(x) \\
E[X] = \sum\limits_{i = 1}^n P(A_i) \cdot E[X|A_i]
$$
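A minimal numeric sketch of both identities, assuming a two-event partition with conditional PMFs of my own choosing:
```python
from fractions import Fraction

# Partition of the sample space (illustrative scenario):
# A1 with prob 1/3, X uniform on {1, 2};  A2 with prob 2/3, X uniform on {1, 2, 3, 4}.
P_A = {1: Fraction(1, 3), 2: Fraction(2, 3)}
p_X_given_A = {
    1: {x: Fraction(1, 2) for x in (1, 2)},
    2: {x: Fraction(1, 4) for x in (1, 2, 3, 4)},
}

# Total probability theorem: p_X(x) = sum_i P(A_i) * p_{X|A_i}(x)
support = {x for pmf in p_X_given_A.values() for x in pmf}
p_X = {x: sum(P_A[i] * p_X_given_A[i].get(x, 0) for i in P_A) for x in sorted(support)}
print(p_X)

# Total expectation theorem: E[X] = sum_i P(A_i) * E[X|A_i]
E = lambda pmf: sum(x * p for x, p in pmf.items())
print(E(p_X) == sum(P_A[i] * E(p_X_given_A[i]) for i in P_A))   # True
```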
#### Conditioning a RV on Another
$$
p_{X|Y}(x|y) = P(X = x | Y = y) = \frac{P(X = x, Y = y)}{P(Y = y)} = \frac{p_{X, Y}(x, y)}{p_Y(y)}
$$
* Normalization property:
$$
\sum\limits_x p_{X|Y}(x|y) = 1
$$
* The conditional PMF is often convenient for computing the joint PMF sequentially: $p_{X, Y}(x, y) = p_Y(y)\, p_{X|Y}(x|y)$
* The conditional PMF can also be used to calculate the marginal PMFs
$$
p_X(x) = \sum\limits_y p_{X, Y}(x, y) = \sum\limits_y p_Y(y)p_{X|Y}(x|y)
$$
* Visualization

:::spoiler Example 2.14

:::
### Independence
* $X$ and $Y$ are **independent** if $p_{X, Y}(x, y) = p_X(x)\, p_Y(y)$ for all $x$ and $y$; equivalently, $p_{X|Y}(x|y) = p_X(x)$ whenever $p_Y(y) > 0$.
## Continuous Random Variables: Basics
### Continuous RV
* A random variable $X$ is **continuous** if its probability law can be described by a nonnegative function $f_X$, called the **probability density function (PDF)** of $X$.
### Probability Density Function
* $P(X \in B) = \int_B f_X(x) \, dx$ for every subset $B$ of the real line; in particular, $P(a \leq X \leq b) = \int_a^b f_X(x) \, dx$.
* Normalization: $\int_{-\infty}^{\infty} f_X(x) \, dx = 1$.
### Interpretation of the PDF
* $f_X(x)$ is not the probability of any particular event; rather, $f_X(x) \cdot \delta$ can be interpreted as the probability of the small interval $[x, x + \delta]$.
### Continuous Uniform RV
* A random variable $X$ that takes values in an interval $[a, b]$, where all subintervals of the same length are equally likely; its PDF is $f_X(x) = \dfrac{1}{b - a}$ for $a \leq x \leq b$, and $0$ otherwise.
### Functions of a Continuous RV
$Y = g(X)$
* $Y$ could be a continuous variable, e.g.: $y = g(x) = x^2$
* $Y$ could be a discrete variable, e.g.: $y = g(x) = \begin{cases} 1, & \text{if } x > 0 \\ 0, & \text{otherwise} \end{cases}$
### Exponential RV
* An exponential random variable $X$ has a PDF of the form
$$
f_X(x) = \begin{cases}
\lambda e^{-\lambda x}, & \text{if }x \geq 0,\\
0, & \text{otherwise}
\end{cases}
$$
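A small simulation sketch of the exponential model using the standard library's `random.expovariate` (the rate $\lambda = 2$ and threshold $a = 0.5$ are my own illustrative choices); integrating the PDF gives $P(X \geq a) = e^{-\lambda a}$, which the empirical frequency should approximate.
```python
import math
import random

# Exponential RV via the standard library (expovariate takes the rate lambda).
lam, a = 2.0, 0.5
samples = [random.expovariate(lam) for _ in range(200_000)]

# Under the exponential model, P(X >= a) = exp(-lambda * a).
empirical = sum(1 for x in samples if x >= a) / len(samples)
print(empirical, math.exp(-lam * a))   # both close to ~0.368
```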
### Normal (or Gaussian) RV
* A continuous rv $X$ is said to be **normal** (or Gaussian) if it has a PDF of the form
$$
f_X(x) = \frac{1}{\sqrt{2\pi}\sigma} e^{-\frac{(x - \mu)^2}{2\sigma^2}}, \qquad -\infty < x < \infty
$$
* where $\mu$ and $\sigma$ are two scalar parameters, with $\sigma > 0$; $\mu$ turns out to be the mean and $\sigma^2$ the variance of $X$
* Normalization Property
$$
\int^{\infty}_{-\infty} f_X(x) \, dx = \int^{\infty}_{-\infty} \frac{1}{\sqrt{2\pi}\sigma} e^{-\frac{(x - \mu)^2}{2\sigma^2}} \, dx = 1
$$
### Normality is Preserved by Linear Transformations
* If $X$ is a normal random variable with mean $\mu$ and variance $\sigma^2$, and $a \neq 0$, $b$ are scalars, then $Y = aX + b$ is also normal, with mean $a\mu + b$ and variance $a^2\sigma^2$.
### Standard Normal RV
$$
f_Y(y) = \frac{1}{\sqrt{2\pi}} e ^{-\frac{y^2}{2}}
$$
### The PDF of a RV Can be Arbitrarily Large
### Expectation of a Continuous RV
$$
E[X] = \int^{\infty}_{-\infty} x \cdot f_X(x) \ dx
$$
$$
var(X) = E[(X - E[X])^2] = \int^{\infty}_{-\infty} (x - E[X])^2 f_X(x) \, dx
$$
$$
Y = aX+b \\
E[Y] = aE[X] + b \\
var(Y) = a^2 \, var(X)
$$
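A minimal numeric check of these formulas for a continuous uniform random variable on $[a, b]$ (my own illustrative choice), using a midpoint-rule approximation of the integrals; the closed forms $E[X] = (a+b)/2$ and $var(X) = (b-a)^2/12$ are standard.
```python
# Midpoint-rule approximation of E[X] and var(X) for X uniform on [a, b].
a, b, n = 2.0, 5.0, 100_000
f = lambda x: 1.0 / (b - a) if a <= x <= b else 0.0   # uniform PDF
dx = (b - a) / n
xs = [a + (k + 0.5) * dx for k in range(n)]           # midpoints of the grid

mean = sum(x * f(x) * dx for x in xs)
var = sum((x - mean) ** 2 * f(x) * dx for x in xs)
print(mean, (a + b) / 2)          # both ~3.5
print(var, (b - a) ** 2 / 12)     # both ~0.75
```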
### Illustrative Examples
---