# Preliminaries

## Probability Theory

### Basic Building Blocks

- $\Omega$ - sample space: the set of all outcomes of a random experiment.
- $\mathbb{P}(E)$ - probability measure of an event $E \subseteq \Omega$: a function $\mathbb{P}: 2^{\Omega} \rightarrow [0, 1]$, defined on events (subsets of $\Omega$), which satisfies the following three properties:
    - $0 \le \mathbb{P}(E) \le 1 \quad \forall E \subseteq \Omega$
    - $\mathbb{P}(\Omega)=1$
    - $\mathbb{P}(\cup_{i=1}^n E_i) = \sum_{i=1}^n \mathbb{P}(E_i) \;$ for pairwise disjoint events $\{E_1, ..., E_n\}$
- $\mathbb{P}(A, B)$ - joint probability: the probability that both $A$ and $B$ occur simultaneously.
- $\mathbb{P}(A | B)$ - conditional probability: the probability that $A$ occurs, given that $B$ has occurred.
- Product rule of probabilities:
    - general case: $$\mathbb{P}(A, B) = \mathbb{P}(A | B)\cdot \mathbb{P}(B) = \mathbb{P}(B | A) \cdot \mathbb{P}(A)$$
    - independent events: $$\mathbb{P}(A, B) = \mathbb{P}(A) \cdot \mathbb{P}(B)$$
- Sum rule of probabilities (marginalization): $$\mathbb{P}(A)=\sum_{B}\mathbb{P}(A, B)$$
- Bayes rule: solving the general case of the product rule for $\mathbb{P}(B|A)$, and expanding the evidence $\mathbb{P}(A)$ with the sum rule over a partition $\{B_1, ..., B_n\}$ of $\Omega$, results in (see the numeric sketch at the end of the next subsection): $$ \mathbb{P}(B|A) = \frac{\mathbb{P}(A|B) \mathbb{P}(B)}{\mathbb{P}(A)} = \frac{\mathbb{P}(A|B) \mathbb{P}(B)}{\sum_{i=1}^n \mathbb{P}(A|B_i)\mathbb{P}(B_i)}$$
    - $p(B|A)$: posterior
    - $p(A|B)$: likelihood
    - $p(B)$: prior
    - $p(A)$: evidence

### Random Variables and Their Properties

- Random variable (r.v.) $X$: a function $X:\Omega \rightarrow \mathbb{R}$. This is the formal way by which we move from abstract events to real-valued numbers. $X$ is essentially a variable that does not have a fixed value, but can take different values with certain probabilities.
- Continuous r.v.:
    - Cumulative distribution function (cdf) $F_X(x)$ - the probability that the r.v. $X$ is at most some value $x$: $$F_X(x) = \mathbb{P}(X\le x)$$
    - Probability density function (pdf) $p_X(x)$: $$p_X(x)=\frac{dF_X(x)}{dx}\ge 0 \;\text{ and } \; \int_{-\infty}^{+\infty}p_X(x) \, dx =1$$

<div style="text-align:center">
<img src="https://i.imgur.com/uHHQU4r.png" alt="drawing" width="400"/>
</div>

- Discrete r.v.:
    - Probability mass function (pmf) - the same as the pdf but for a discrete r.v. $X$; integrals become sums.
- $\mu = E[X]$ - mean value or expected value: $$E[X] = \int_{-\infty}^{+\infty}x \, p_X(x) \, dx$$
- $\sigma^2 = Var[X]$ - variance: $$Var[X] = E[(X-\mu)^2] = \int_{-\infty}^{+\infty}(x-\mu)^2 \, p_X(x) \, dx = E[X^2] - \mu^2$$
- $Cov[X,Y]=E[(X-\mu_X)(Y-\mu_Y)]$ - covariance of two r.v.s $X$ and $Y$.
- Change of variables - if $X \sim p_X$ and $Y=h(X)$ for an invertible, differentiable $h$, then the density of $Y$ becomes (numeric check below): $$p_Y(y)=p_X\left(h^{-1}(y)\right)\left|\frac{dh^{-1}(y)}{dy}\right| = \frac{p_X(h^{-1}(y))}{\left|h'(h^{-1}(y))\right|}$$
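As referenced above, here is a minimal numeric sketch of Bayes rule in Python. The scenario (a disease-screening test) and all numbers are hypothetical, chosen only for illustration; the evidence $\mathbb{P}(A)$ is expanded with the sum rule exactly as in the formula above.

```python
# Hypothetical screening example: B = "has disease", A = "test is positive".
# All numbers below are illustrative assumptions, not real data.
p_B = 0.01              # prior P(B)
p_A_given_B = 0.95      # likelihood P(A|B)
p_A_given_notB = 0.05   # false-positive rate P(A|~B)

# Evidence via the sum rule: P(A) = P(A|B) P(B) + P(A|~B) P(~B)
p_A = p_A_given_B * p_B + p_A_given_notB * (1 - p_B)

# Bayes rule: posterior P(B|A)
p_B_given_A = p_A_given_B * p_B / p_A
print(f"P(B|A) = {p_B_given_A:.3f}")  # ~0.161: still unlikely despite a positive test
```

And a small numerical check of the change-of-variables formula, assuming `numpy` and `scipy` are available: with $X \sim \mathcal{N}(0, 1)$ and $Y = h(X) = e^X$, the formula gives $p_Y(y) = p_X(\ln y)/y$, which is exactly the log-normal density.

```python
import numpy as np
from scipy import stats

# X ~ N(0, 1), Y = h(X) = exp(X), so h^{-1}(y) = log(y) and |dh^{-1}/dy| = 1/y.
y = 2.0
p_formula = stats.norm.pdf(np.log(y)) / y   # change-of-variables formula
p_scipy = stats.lognorm.pdf(y, s=1.0)       # scipy's log-normal density
print(p_formula, p_scipy)                   # both ~0.157

# Monte Carlo estimates of E[Y] and Var[Y]: E[e^X] = e^(1/2) ~ 1.649
rng = np.random.default_rng(0)
y_samples = np.exp(rng.standard_normal(1_000_000))
print(y_samples.mean(), y_samples.var())    # ~1.649 and ~(e^2 - e) ~ 4.671
```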
### Catalogue of important distributions

- Binomial, $X\in\{0,1,...,n\}$. Describes the probability of getting exactly $k$ successes out of $n$ independent trials, where the parameter $\lambda$ is the success probability of each trial: $$\mathbb{P}(X=k|\lambda)=\binom{n}{k}\lambda^k(1-\lambda)^{n-k}, \quad \text{ with } k\in\{0,1,..., n\}.$$
- Bernoulli - special case of the Binomial with $n=1$.
- Normal, $X \in \mathbb{R}$: $$p(x| \mu, \sigma)=\mathcal{N}(x|\mu, \sigma^2) = \frac{1}{\sqrt{2 \pi \sigma^2}}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$
- Multivariate Gaussian $\mathcal{N}(\mathbf{\mu}, \mathbf{\Sigma})$, with $\mathbf{X}\in \mathbb{R}^n$, mean vector $\mathbf{\mu}\in \mathbb{R}^n$, and covariance matrix $\mathbf{\Sigma}\in \mathbb{R}^{n\times n}$: $$p_X(\mathbf{x})= \frac{1}{(2\pi)^{n/2}\sqrt{\det (\mathbf{\Sigma})}} \exp \left(-\frac{1}{2}(\mathbf{x}-\mathbf{\mu})^{\top}\mathbf{\Sigma}^{-1}(\mathbf{x}-\mathbf{\mu})\right)$$
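To make the catalogue concrete, below is a short sketch using `scipy.stats` (all parameter values are arbitrary assumptions chosen for illustration): it evaluates the Binomial pmf and the Normal pdf, and samples from a 2-D multivariate Gaussian to check that the sample statistics approximately recover $\mathbf{\mu}$ and $\mathbf{\Sigma}$.

```python
import numpy as np
from scipy import stats

# Binomial: P(X = k | lambda) with n = 10 trials and success probability 0.3
print(stats.binom.pmf(k=3, n=10, p=0.3))        # ~0.267

# Normal: density of N(mu = 0, sigma^2 = 1) at x = 0
print(stats.norm.pdf(0.0, loc=0.0, scale=1.0))  # 1/sqrt(2*pi) ~ 0.399

# Multivariate Gaussian in R^2 with a non-diagonal covariance matrix
mu = np.array([0.0, 1.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
print(stats.multivariate_normal(mean=mu, cov=Sigma).pdf(mu))  # density at the mean

# Sample statistics should approximately recover mu and Sigma
rng = np.random.default_rng(42)
x = rng.multivariate_normal(mu, Sigma, size=100_000)
print(x.mean(axis=0))  # ~mu
print(np.cov(x.T))     # ~Sigma
```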