# Basics of probability theory
## Axioms of probability
The following axioms go back to [A. Kolmogorov](https://en.wikipedia.org/wiki/Andrey_Kolmogorov):
:::info **Definition (Probability distribution).**
A **probability distribution** $\mathbb P$ on a sample space $\Omega$ is a mapping from **events of $\Omega$** (certain subsets of $\Omega$) to the real numbers $\mathbb R$ satisfying the following **probability axioms**:
1. $\mathbb P(A) \geq 0$ for any event $A$
2. $\mathbb P(\Omega) = 1$
3. $\mathbb P(A \cup B) = \mathbb P(A) + \mathbb P(B)$ for events $A,B$ that are **mutually exclusiv**, i.e., they are disjoint as subsets: $A \cap B = \emptyset$
We call $\mathbb P(A)$ the **probability** of the event $A$.
:::
This definition has many important consequences such as:
* $\mathbb P(\emptyset) = 0$
* $A \subseteq B \implies \mathbb P(A) \leq \mathbb P(B)$
* If $(A_i)_{i \in I}$ is a family of (finite or countably infinite) disjoint events (i.e. $A_i \cap A_j = \emptyset$ for $i \neq j$), then $\mathbb P\left( \bigcup_{i \in I} A_i \right) = \sum_{i \in I} P(A_i)$
## Discrete probability distributions
A probability distribution $\mathbb P$ is **discrete** if the sample set $\Omega$ it is defined over is finite or countably infinite. Recall that a set is countably infinite if it has the same cardinality as the natural numbers $\mathbb N$. Then, for any event $A$, we get
$$\mathbb P(A) = \sum_{a \in A} \mathbb P(\{a\}).$$
:::info **Definition (Random variable).**
A **(discrete) random variable** $X$ is a function from a (finite or countably infinite) sample space $\Omega$ to the real numbers.
:::
A random variable associates a real number with each possible outcome of an experiment, allowing us to work with the probability distribution induced on the resulting set of numbers.
**Examples:**
* **Uniform distribution:** The probability to pick $s \in S$ is $\mathbb P(X = s) = \frac{1}{|S|}$ (e.g. coin-tossing).
* **Binomial distribution:** The probability of getting exactly $k$ successes in $n$ independent trials where each success occurs with the same probability $p$ is given by $\mathbb P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$.
## Expectation values
In the probabilistic analysis of algorithms we will be interested in *average* outcomes, modeled by **expectation values**.
:::info **Definition (Expectation value).**
Let $X$ be a discrete random variable with outcomes $(x_i)_{i \in I}$. The **expected (or expectation or mean) value** of a discrete random variable is
$$\mathbb E[X] := \mu_X := \sum_{i \in I} x_i \cdot \mathbb P(X = x_i).$$
:::
**Example.** Consider a game in which you flip two fair coins. You earn $3 for each heads, but lose $2 for each tails. What is the expected/mean outcome of your earnings? The answer is given by the expectation value:
$$\begin{align}
\mathbb E[X] & = 6 \cdot \mathbb P(\text{$2$ heads}) + 1 \cdot \mathbb P(\text{$1$ head, $1$ tail}) - 4 \cdot \mathbb P(\text{$2$ tails}) \\
& = 6 \cdot \frac{1}{4} + 1 \cdot \frac{1}{2} - 4 \cdot \frac{1}{4} \\
& = 1
\end{align} \\
$$
One can prove that he expectation value has many useful properties:
* $\mathbb E[X + Y] = \mathbb E[X] + \mathbb E[Y]$ for random variables $X,Y$
* $E[g(X)] = \sum_{x} g(x) \cdot \mathbb P[X = x]$ for any random variable $X$ and any function $g$ such that the expectation value of $g(X)$ is defined
* $\mathbb E[aX] = a \mathbb E[X]$ for any constant $a \in \mathbb R$
* $\mathbb E[XY] = \mathbb E[X] \cdot \mathbb E[Y]$ for independent random variables $X,Y$, where each of the expectations is defined
We call two random variables $X$ and $Y$ **independent** if $\mathbb P(X = x, Y= y) = \mathbb P(X = x) \mathbb P(Y = y)$.
## Indicator random variables
In general, for a sample space $\Omega$ and an event $A$ we define the **indicator random variable:**
$$\mathbb I(A) := X_A := \begin{cases}
1 & \text{if $A$ occurs} \\
0 & \text{if $A$ does not occur} \\
\end{cases}$$
One can show that for an event $A$, we have $\mathbb E[X_A] = \mathbb P(A)$.