# 22 - The Probabilistic Method

## The probabilistic method

- For any event $A$, if $Pr(A) > 0$, then $|A| \geq 1$ (the event contains at least one outcome)
- For any random variable $X$, there exists at least one outcome $\omega$ such that $X(\omega) \geq E(X)$

### Example

<u>Theorem:</u> For any graph $G = (V,E)$ with $m$ edges, there exists a partition $(A,B)$ of $V$ such that at least $\frac{m}{2}$ edges have one endpoint in $A$ and one endpoint in $B$.

We want to design an experiment that creates a random partition of the graph and show that, on average, the partition has this property.

<u>Proof:</u> For each vertex (node) of $G$, toss a fair coin to decide if it goes into $A$ or $B$: if the coin comes up heads the vertex goes into $A$, otherwise it goes into $B$.

Let $X = \text{the number of edges that cross } (A,B)$.

Claim: $E(X) = \frac{m}{2}$. If this holds, then by the second fact above there exists at least one outcome, i.e. one partition, with at least $\frac{m}{2}$ crossing edges, which is what the theorem asks for.

Let $e_1,...,e_m$ be the edges and define indicator variables $X_i$:

- $X_i = 1$ if $e_i$ crosses $(A,B)$
- $X_i = 0$ otherwise

By linearity of expectation,

$E(X) = E(\sum_{i=1}^m X_i) = \sum_{i=1}^m E(X_i)$

$X_i = 1$ exactly when one endpoint of $e_i$ is in $A$ and the other is in $B$, so the two coin tosses for its endpoints came up $HT$ or $TH$, which happens with probability $\frac{1}{2}$. Therefore

$E(X_i) = Pr(X_i = 1) = \frac{1}{2}$

$E(X) = \sum_{i=1}^m \frac{1}{2} = \frac{m}{2}$
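As a quick sanity check, here is a minimal Python sketch of this coin-tossing experiment; the example graph, vertex count, and number of trials are arbitrary choices for illustration. It repeats the random partition many times and checks that the average number of crossing edges comes out near $\frac{m}{2}$.

```python
import random

def random_cut_crossings(n, edges):
    """Toss a fair coin for each vertex (True = side A, False = side B)
    and count the edges with one endpoint on each side."""
    side = [random.random() < 0.5 for _ in range(n)]
    return sum(1 for v, w in edges if side[v] != side[w])

# Illustrative graph on 5 vertices with m = 6 edges (made up for the demo).
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (1, 4)]
trials = 10_000
average = sum(random_cut_crossings(5, edges) for _ in range(trials)) / trials
print(average)  # should be close to E(X) = m/2 = 3
```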
### Example

Say we have a graph $G$ with $n$ vertices.

- $k$-clique: A set of $k$ vertices with all $\binom{k}{2}$ edges between them
- $k$-independent set: A set of $k$ vertices with no edges between them

We know that every graph on 6 vertices contains a 3-clique or a 3-independent set. What about bigger numbers?

<u>Theorem:</u> For every $n \geq 2$ and every $k \geq 2 \log n + 1$, there exists an $n$-vertex graph with no $k$-clique and no $k$-independent set.

**Proof:** Define a special graph $G_{n, \frac{1}{2}}$ (the Erdős-Rényi random graph):

- It has $n$ vertices
- For each pair of vertices $v,w$, toss a fair coin to decide if $vw$ is an edge or not

**Reminder:** $Pr(A \cup B) = Pr(A) + Pr(B) - Pr(A \cap B)$

Inclusion-exclusion gets unwieldy when there are more than 2 events, so we use the weaker bound

$Pr(A \cup B) \leq Pr(A) + Pr(B)$

which, for a list of events, becomes

$Pr(A_1 \cup A_2 \cup ... \cup A_n) \leq Pr(A_1) + Pr(A_2) + ... + Pr(A_n)$ (Boole's inequality)

**Back to the proof**

Procedure:

- Fix a set $S$ of $k$ vertices
- $Pr(S \text{ is a clique in } G_{n, \frac{1}{2}}) = \frac{1}{2^{\binom{k}{2}}}$ (a coin is tossed for each of the $\binom{k}{2}$ pairs of vertices in $S$, and each toss must come up heads, which happens with probability $\frac{1}{2}$)
- $Pr(S \text{ is an independent set}) = \frac{1}{2^{\binom{k}{2}}}$
- $Pr(\text{one or the other}) = \frac{2}{2^{\binom{k}{2}}}$ (the two events are mutually exclusive, so the sum rule applies)

Let's look at the $2^{\binom{k}{2}}$ term. Since $k \geq 2 \log n + 1$, i.e. $k - 1 \geq 2 \log n$,

$2^{\binom{k}{2}} = 2^{\frac{k(k-1)}{2}} \geq 2^{\frac{k \cdot 2 \log n}{2}} = 2^{k \log n} = n^k$

So:

- $Pr(\text{one or the other}) = \frac{2}{2^{\binom{k}{2}}} \leq \frac{2}{n^k}$

Now we have the probability that a fixed set of $k$ vertices forms a $k$-clique or a $k$-independent set. Now we just have to sum over all subsets of $k$ vertices in the graph and bound the probability that either of those events occurs for some subset. By Boole's inequality,

$Pr(\text{some } k\text{-subset is a clique or an independent set}) \leq \binom{n}{k} \cdot \frac{2}{n^k} \leq \frac{n^k}{k!} \cdot \frac{2}{n^k} = \frac{2}{k!} < 1$

(since $n \geq 2$ gives $k \geq 3$, so $k! > 2$). So with positive probability $G_{n, \frac{1}{2}}$ contains no $k$-clique and no $k$-independent set, and by the first fact above such a graph exists.

### Another example

For any two non-empty sets $X$ and $Y$, the Jaccard distance is

$d_j(X,Y) = 1 - \frac{|X \cap Y|}{|X \cup Y|}$

**Distance function and the triangle inequality**

A distance function defines the distance between two things; the distance from anything to itself should be 0.

The triangle inequality states that $|XY| \leq |XZ| + |ZY|$.

<u>Lemma:</u> For any three non-empty sets $X, Y, Z$:

$d_j(X,Z) \leq d_j(X,Y) + d_j(Y,Z)$

**Obvious facts**

- For any two events $E_1$ and $E_2$
    - "$E_1 \text{ implies } E_2$" means $E_1 \subseteq E_2$, so $Pr(E_1) \leq Pr(E_2)$
- For any three numbers $i, j, k$
    - $i \neq k$ implies that $i \neq j$ or $j \neq k$
    - $Pr(i \neq k) \leq Pr(i \neq j \text{ or } j \neq k) \leq Pr(i \neq j) + Pr(j \neq k)$ (Boole's inequality)

Back to the lemma:

- Let $x_1,...,x_n$ be a random permutation of $X \cup Y \cup Z$
- Let $i = \min\{a : x_a \in X\}$
- Let $j = \min\{a : x_a \in Y\}$
- Let $k = \min\{a : x_a \in Z\}$
- $Pr(i = j) = \frac{|X \cap Y|}{|X \cup Y|}$ (the first element of $X \cup Y$ in the permutation is equally likely to be any element of $X \cup Y$, and $i = j$ exactly when that element lies in $X \cap Y$)
- $Pr(i \neq j) = 1 - \frac{|X \cap Y|}{|X \cup Y|} = d_j(X,Y)$
- Similarly, $Pr(j \neq k) = d_j(Y,Z)$ and $Pr(i \neq k) = d_j(X,Z)$

From the obvious facts, substituting these probabilities gives $d_j(X,Z) \leq d_j(X,Y) + d_j(Y,Z)$.

###### tags: `COMP2804` `Probability`
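For completeness, a minimal Python sketch of the random-permutation argument from the lemma; the sets $X, Y, Z$ and the number of trials are arbitrary illustrative choices. It estimates $Pr(i \neq j)$ by repeatedly shuffling $X \cup Y \cup Z$ and compares the estimate to $d_j(X,Y)$.

```python
import random

def jaccard_distance(X, Y):
    """d_j(X, Y) = 1 - |X intersect Y| / |X union Y| for non-empty sets."""
    return 1 - len(X & Y) / len(X | Y)

def first_index(perm, S):
    """Position of the first element of the permutation that lies in S."""
    return min(a for a, x in enumerate(perm) if x in S)

# Illustrative sets (made up for the demo).
X, Y, Z = {1, 2, 3}, {2, 3, 4, 5}, {3, 5, 6}
universe = list(X | Y | Z)

trials = 20_000
mismatches = 0
for _ in range(trials):
    random.shuffle(universe)          # a random permutation of X, Y, Z combined
    if first_index(universe, X) != first_index(universe, Y):
        mismatches += 1               # the event "i != j"

print(mismatches / trials)     # estimate of Pr(i != j)
print(jaccard_distance(X, Y))  # 0.6; the estimate should be close to this
```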