Basics of probability theory

# Basics of probability theory ## Axioms of probability The following axioms go back to [A. Kolmogorov](https://en.wikipedia.org/wiki/Andrey_Kolmogorov): :::info **Definition (Probability distribution).** A **probability distribution** $\mathbb P$ on a sample space $\Omega$ is a mapping from **events of $\Omega$** (certain subsets of $\Omega$) to the real numbers $\mathbb R$ satisfying the following **probability axioms**: 1. $\mathbb P(A) \geq 0$ for any event $A$ 2. $\mathbb P(\Omega) = 1$ 3. $\mathbb P(A \cup B) = \mathbb P(A) + \mathbb P(B)$ for events $A,B$ that are **mutually exclusiv**, i.e., they are disjoint as subsets: $A \cap B = \emptyset$ We call $\mathbb P(A)$ the **probability** of the event $A$. ::: This definition has many important consequences such as: * $\mathbb P(\emptyset) = 0$ * $A \subseteq B \implies \mathbb P(A) \leq \mathbb P(B)$ * If $(A_i)_{i \in I}$ is a family of (finite or countably infinite) disjoint events (i.e. $A_i \cap A_j = \emptyset$ for $i \neq j$), then $\mathbb P\left( \bigcup_{i \in I} A_i \right) = \sum_{i \in I} P(A_i)$ ## Discrete probability distributions A probability distribution $\mathbb P$ is **discrete** if the sample set $\Omega$ it is defined over is finite or countably infinite. Recall that a set is countably infinite if it has the same cardinality as the natural numbers $\mathbb N$. Then, for any event $A$, we get $$\mathbb P(A) = \sum_{a \in A} \mathbb P(\{a\}).$$ :::info **Definition (Random variable).** A **(discrete) random variable** $X$ is a function from a (finite or countably infinite) sample space $\Omega$ to the real numbers. ::: A random variable associates a real number with each possible outcome of an experiment, allowing us to work with the probability distribution induced on the resulting set of numbers. **Examples:** * **Uniform distribution:** The probability to pick $s \in S$ is $\mathbb P(X = s) = \frac{1}{|S|}$ (e.g. coin-tossing). * **Binomial distribution:** The probability of getting exactly $k$ successes in $n$ independent trials where each success occurs with the same probability $p$ is given by $\mathbb P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$. ## Expectation values In the probabilistic analysis of algorithms we will be interested in *average* outcomes, modeled by **expectation values**. :::info **Definition (Expectation value).** Let $X$ be a discrete random variable with outcomes $(x_i)_{i \in I}$. The **expected (or expectation or mean) value** of a discrete random variable is $$\mathbb E[X] := \mu_X := \sum_{i \in I} x_i \cdot \mathbb P(X = x_i).$$ ::: **Example.** Consider a game in which you flip two fair coins. You earn $3 for each heads, but lose $2 for each tails. What is the expected/mean outcome of your earnings? The answer is given by the expectation value: $$\begin{align} \mathbb E[X] & = 6 \cdot \mathbb P(\text{$2$ heads}) + 1 \cdot \mathbb P(\text{$1$ head, $1$ tail}) - 4 \cdot \mathbb P(\text{$2$ tails}) \\ & = 6 \cdot \frac{1}{4} + 1 \cdot \frac{1}{2} - 4 \cdot \frac{1}{4} \\ & = 1 \end{align} \\ $$ One can prove that he expectation value has many useful properties: * $\mathbb E[X + Y] = \mathbb E[X] + \mathbb E[Y]$ for random variables $X,Y$ * $E[g(X)] = \sum_{x} g(x) \cdot \mathbb P[X = x]$ for any random variable $X$ and any function $g$ such that the expectation value of $g(X)$ is defined * $\mathbb E[aX] = a \mathbb E[X]$ for any constant $a \in \mathbb R$ * $\mathbb E[XY] = \mathbb E[X] \cdot \mathbb E[Y]$ for independent random variables $X,Y$, where each of the expectations is defined We call two random variables $X$ and $Y$ **independent** if $\mathbb P(X = x, Y= y) = \mathbb P(X = x) \mathbb P(Y = y)$. ## Indicator random variables In general, for a sample space $\Omega$ and an event $A$ we define the **indicator random variable:** $$\mathbb I(A) := X_A := \begin{cases} 1 & \text{if $A$ occurs} \\ 0 & \text{if $A$ does not occur} \\ \end{cases}$$ One can show that for an event $A$, we have $\mathbb E[X_A] = \mathbb P(A)$.