# Lecture 3 - MST Algorithm and Lowest Common Ancestor ###### tags: `Algorithm` ## Minimum Spanning Tree (MST) + Given a **connected undirected** graph $G=(V,E)$ with real weight edges $w$ + $n=|V|$, $m=|E|$ + Spanning tree: a tree that connects all vertices + Minimum spanning tree: the spanning tree with minimum total weight ![](https://i.imgur.com/Oj71MUI.png) ### Motivate + First studied by *Czech* scientist *Otakar Borůvka* in 1926 (*Borůvka*'s algorithm) + Its purpose was an efficient electrical coverage of Moravia ### Cut (in undirected graphs) + In graph theory, a **cut** is a partition of the vertices of a graph into two disjoint subsets + For any $S⊆V$, $(S, V\setminus S)$ is a cut ![](https://i.imgur.com/5mph2VW.png) + The **cut-set** of a cut is the set of edges that have one endpoint in each subset of the partition + These edges are said to cross the cut + $C=\{(u,v)| u∈S, v∈V\setminus S \}$ + Sometimes we also call the cut-set a cut ![](https://i.imgur.com/V8WCpIF.png) ### Property of Spanning Tree + Put an edge $e$ not in the tree can form a cycle with tree edges + By the path on the tree connecting two endpoints of $e$ + We can replace any edge on the path with e to get another spanning tree ![](https://i.imgur.com/ry5Sjl7.png) ![](https://i.imgur.com/jfyhYZK.png) #### Cut Property: One of the lightest edge crossing any **cut** of G must be in the MST + Suppose the lightest edge $e$ crossing cut $S$ is not in the MST ![](https://i.imgur.com/un0xH5g.png) + Consider the **path** on the tree connecting two endpoints of $e$ + There must be an edge $f$ on the path crossing the cut which is larger than $e$ ![](https://i.imgur.com/pUrhBU5.png) + **Replace $f$ with $e$ to get smaller spanning tree** ![](https://i.imgur.com/WfyuCKF.png) ##### Corollary: Deleting any edge $e$ in the MST will give us a cut + By the cut property, $e$ is smallest one in the cut ![](https://i.imgur.com/KBImQsz.png) #### Cycle Property: One of the heaviest edge in any **cycle** in G cannot be in the MST + If it is in the MST, then consider the cut formed by deleting heaviset edge $e$ from the tree ![](https://i.imgur.com/MVcxzFj.png) + There must be another edge $f$ in the cycle crossing the cut + We can replace $e$ with $f$ to get a smaller tree ![](https://i.imgur.com/YLcNO6d.png) ## MST Algorithm ### Classic Algorithms #### *Borůvka*'s algorithm + For each vertex, add its **cheapest incident edge** to the tree $T$ + By the cut property, this edge must be in MST + But the smallest edges of two vertices may be the same ![](https://i.imgur.com/YajhY6A.png) + Contract each connected components ![](https://i.imgur.com/atTfE3m.png) + Recursively repeat this until the whole graph is contracted into a single vertex ![](https://i.imgur.com/lt9oPsX.png) + Running time: each iteration takes $O(m)$ time, and there are $O(\log n)$ iterations, so the time is $O(m\log n)$ + **Since every time the number of vertices shrink by at least one half** #### *Kruskal*'s algorithm + Sort all edges + Add edges to the empty graph in increasing order + If the edge connects two components, it is in $T$ ![](https://i.imgur.com/p98xwz2.png) + Thus, we need a **union-find** data structure + Sorting also takes $O(m\log n)$ time. #### *Prim*'s algorithm + Start from a single vertex $W=\{s\}$ + Add the **cheapest edge** with **one endpoint** in $W$ to $T$, and grows $W$ + By the cut property, the smallest edge of the cut $(W, V\setminus W)$ is in the MST ![](https://i.imgur.com/4r6eOae.png) + Running time: $O(m+n\log n)$ by a **Fibonacci heap** + Who discovered it? + *Robert Prim* published it in *Shortest connection networks And some generalizations* in *1957* + *Edsger Dijkstra* published it in *A note on two problems in connexion with graphs* in *1959* + Later, people found that *Vojtěch Jarník* has published it in *O jistém problému minimálním* (in *Czech*) in *1930* + So, sometimes called *Jarník*’s algorithm or *DJP* algorithm #### Better algorithm than $O(m\log n)$ + $O(m \log\log n)$ + *[Yao 1975]* and *[Cheriton, Tarjan 1976]* + *Yao* cited a unpublished result of *Tarjan* with $O(m\sqrt{\log n})$ time + $O(mβ(m,n))$ by **Fibonacci heap** + *[Fredman, Tarjan 1987]* + $β(m,n)=min\{i\in\Bbb N| \log^{(i)}n≤m/n\}$, where $\log^{(i)}n=\underbrace{\log \log ... \log}_{i\ times} n$ ### MST Algorithm with Fibonacci Heap (*Fredman & Tarjan*’s approach) + By Fibonacci heap, total time is: $O($(#edges)$+$(#vertices)$\log ($size of heap$))$ + We can bound the size of heap to reduce the running time to $O($#edges$)$ + It’s published in the same paper as Fibonacci heap + Run the *Prim*’s approach combining with *Borůvka*’s step + When the heap size reaches $k$, stop and start from another vertex not in current trees ![](https://i.imgur.com/biAJgcx.png) + The heap size is bounded by the number of edges going out of the current components ![](https://i.imgur.com/TpwvCzw.png) + Iteration, when the heap size reaches $k$, stop and start from another vertex not in current trees + Until every vertex is in a tree, contract all trees ![](https://i.imgur.com/aNw9YOa.png) + Every tree is associated with $≥k$ edges + So the number of vertices decreases to $n’=(2m)/k$ + Choose $k=2^{(2m)/n}$ ($(2m)/n’=2^{(2m)/n}$), so each iteration takes $O(m)$ time + Until $k≥n$ :::info + First iteration, $k=2^{(2m)/n}$ + Second iteration + $n'$ becomes $(2m)/k=(2m)/2^{(2m)/n}$ + $k'=2^{(2m)/n'}=2^{2^{(2m)/n}}$ + $......$ + In the $i$-th iteration, $k=2^{2^{2^{...2^{(2m)/n}}}}$ ::: + So the number of iterations is bounded by $O(\beta(m,n))$ where $\beta(m,n)=min\{i\in\Bbb N| \log^{(i)}n≤m/n\}$ + Running time $O(m\beta(m,n))$ + $\beta(m,n)=min\{i\in\Bbb N| \log^{(i)}n≤m/n\}$ + Only for very sparse graphs + If $m=n\log^{(k)}n$ for some constant $k$, then it is $O(m)$ ### Best Algorithms for MST + Algorithm + Randomized algorithm with linear time $O(m)$ [*Karger, Klein, Tarjan 1995*] + Deterministic algorithm with $O(m\alpha(m,n))$ time [*Chazelle 2000*] + Deterministic algorithm with proved optimal running time by decision trees [*Pettie, Ramachandran 2002*] + Verification + Verify a MST in linear time $O(m)$ [*Dixon, Rauch, Tarjan 1992*] ## Lowest Common Ancestor (LCA) ![](https://i.imgur.com/1rqV44w.png) + Given $x$, $y$ + The lowest (i.e. deepest) node that has both $x$ and $y$ as descendants + All green nodes: common ancestors + Dark green node: lowest common ancestors + Preprocessing a tree $T$ + Given a sequence of vertex pairs $P=\{(u_1,v_1), (u_2,v_2),…, (u_m,v_m)\}$ + Output the LCAs of every pair in $P$ ![](https://i.imgur.com/8GYFpVw.png) ### *Tarjar*'s off-line LCA algorithm ```python function LCA(u) MakeSet(u); u.ancestor := u; for each v in u.children do LCA(v); Union(u,v); Find(u).ancestor := u; u.color := black; for each v such that {u,v} in P do if v.color == black print "LCA of " + u + " and " + v + " is " + Find(v).ancestor; ``` ### Range Minimum Query (RMQ) + Input: an array $A[1...n]$ of integers + Query: Given $1≤i,j≤n$, find the minimum element in the subarray $A[i...j]$ + Example: $A[1...10]=(8, 7, 3, 20, 2, 17, 5, 21, 11, 12)$, $\min A[6...9]=5$ + $<f(n), g(n)>$-structure + Preprocessing time: $\leq f(n)$ + Query time: $\leq g(n)$ + Size of structure: $\leq f(n)$ #### $<O(n), O(\log n)>$ + Binary Tree + Preprocess: find the minimum element in every following range + $[(k-1)2^i+1,…,k2^i]$ of length $2^i$ ![](https://i.imgur.com/BXtpEPv.png) + Query: every range $A[i,...,j]$ can be partitioned into $O(\log n)$ such ranges ![](https://i.imgur.com/6hYPCEZ.png) #### $<O(n\log n), O(1)>$ + Store the min of every range of length $2^i$ ![](https://i.imgur.com/zRNlIZv.png) + Query $\min A[k,...,j]$: find $2^i≤j-k+1<2^{i+1}$ Then we find $\min A[k,...,k+2^i-1]$ and $\min A[j-2^i+1,...,j]$ ![](https://i.imgur.com/Q8NCHeh.png) + Thus, query time is $O(1)$ ### Reduction from LCA to RMQ + Euler tour of the tree T from the root ![](https://i.imgur.com/uTanRfZ.png) + $A,B,E,B,F,B,A,C,G,I,G,J,G,C,H,C,A,D,A$ ```python function ET-tree(u) Output u; for each v in u children do: ET-tree(v); Output u; ``` + How long is the ET tour of an $n$-node tree + $2n-1$, since every edge visited twice + The levels of vertices in this tour + $L[i]$ for $i$-th element ![](https://i.imgur.com/nh3g1xD.png) + Store first appearance of each node $x$ (representative $R(x)$) ![](https://i.imgur.com/tfxTCf4.png) + $LCA(x,y)=RMQ(L[R(x),...,R(y)])$ ![](https://i.imgur.com/FUt0uNn.png) ![](https://i.imgur.com/lvQNYqn.png) + We only need to construct these lists, which is still $O(n)$ size + Range Minimum Query + $<O(n), O(\log n)>$-algorithm + A binary tree + $<O(n\log n), O(1)>$-algorithm + Store the minimum element of every range of length $2^i$ + Thus, LCA also has $<O(n), O(\log n)>$ and $<O(n\log n), O(1)>$-algorithms ### An $<O(n),O(1)>$ Algorithm (*Bender & Farach-Colton 2000*) + Observation: in the RMQ for LCA problem, two adjacent numbers only differ by $1$ ![](https://i.imgur.com/GZpG3gn.png) + We call this problem $±1$RMQ + Thus, we only need to solve $±1$RMQ in $<O(n),O(1)>$ time + Use a table-lookup technique to precompute answers on small subarrays, thus removing the log factor from the preprocessing + Partition $A$ into $\frac{2n}{\log n}$ blocks of size $\frac{n}{\frac{2n}{\log n}}=\frac{\log n}{2}$ ![](https://i.imgur.com/gVgfPiS.png) + Array $A’[1,..,\frac{2n}{\log n}]$: $A’[i]$ is the minimum element in the $i$-th block of $A$ + Array $B[1,.., \frac{2n}{\log n}]$: $B[i]$ is the position (index) in which value $A’[i]$ occurs ![](https://i.imgur.com/C8zwvSK.png) + For example ![](https://i.imgur.com/F7biTzJ.png) #### Faster $±1$RMQ + Preprocess $A’$ for RMQ using $<O(n\log n), O(1)>$- algorithm. + $A’$s size: $\frac{2n}{\log n}$ + Thus, the preprocessing time on $A’$: $O(\frac{2n}{\log n}\log\frac{2n}{\log n})=O(n)$ + RMQ on $A’$: $<O(n), O(1)>$-time + Having preprocessed $A’$ for RMQ, how to answer $RMQ(i,j)$ queries on $A$? + $i$ and $j$ might be in the same block $\to$ preprocess every block + $i<j$ on different blocks, answer the query as follows + $1$. Compute minima from $i$ to end of its block + $2$. Compute minima of all blocks in between $i$’s and $j$’s blocks + $3$. Compute minima from the beginning of $j$’s block to $j$ + Return **the index of the minimum** of these $3$ values + $2$. takes $O(1)$ time by RMQ on $A’$ + $1$. & $3$. have to answer in-block RMQ queries ##### In-block query + Observation + Let two arrays $X$ and $Y$ such that $\forall X[i]=Y[i]+C$ Then $\forall i,j$ $RMQ_x(i,j)=RMQ_y(i,j)$ ![](https://i.imgur.com/w3VK5qU.png) + There are $O(\sqrt{n})$ normalized blocks + Preprocess + Create $O(\sqrt{n})$ tables of size $O(\log^2n)$ to answer all in block queries + Overall $O(\sqrt{n}\log^2n)=O(n)$ ![](https://i.imgur.com/yYr9k5F.png) #### Overall LCA Algorithm + Preprocess + Create $O(\sqrt{n})$ tables of size $O(\log^2n)$ to answer all in block queries Overall $O(\sqrt{n}\log^2n)=O(n)$ + For each block in $A$ compute which normalized block table it should use $O(n)$ + Preprocess $A’$: $O(n)$ + Query + Query on $A'$: $O(1)$ + Query on in-blocks (lookup in table): $O(1)$ + Overall ±1RMQ complexity: $<O(n), O(1)>$ + Since LCA can be reduced to $±1$RMQ + Thus, LCA has $<O(n), O(1)>$-algorithm ## Verification and Randomized Algorithm for MST ### Verification for MST + Given a tree $T$, check whether $T$ is a MST in $G$ + **Heavy edge** $e=(u,v)$: heavier than any edge in the path $u$ to $v$ in $T$ + By the cycle property, $e$ is not in the MST; otherwise, it is a light edge + Verification of $T$ be a MST of $G$ + All edges $e ∉T$ are heavy edges ![](https://i.imgur.com/3HRQZLt.png) #### Algorithm + Run the *Borůvka*'s algorithm **just for $T$** + We can get a rooted tree + Linear time since the number of edges also shrink by a half each time ![](https://i.imgur.com/6rqG5du.png) + Construct a tree by *Borůvka*'s algorithm ![](https://i.imgur.com/EVmMlVq.png) + Finding the heaviest edge on path $u$ to $v$ in $T$ + If $u$ and $v$ are in the same component in *Borůvka*'s algorithm + Then the heaviest edge must be the edge adjacent to $u$ or $v$ on the path ![](https://i.imgur.com/309btDF.png) + Finding the heaviest edge on path $u$ to $v$ in $T$$\iff$Finding the LCA in the rooted tree + Processes the tree in $O(n)$ time, answers each query in $O(1)$ time + Also preprocess all non-tree edges in $G$ + The heaviest edge is on the path between $u$, $v$ and the $LCA(u,v)$ ![](https://i.imgur.com/OBIKDva.png) ##### Reference + *A Simpler Minimum Spanning Tree Verification Algorithm* by *V. King* + *An Even Simpler Linear-Time Algorithm for Verifying Minimum Spanning Trees* by *T. Hagerup* ### Randomized MST (*Karger, Klein, Tarjan* 1995) #### Idea + Sample each edges with prob. $1/2$ + Recursively find MST + Put light edges back + Recursively find MST ##### *Borůvka*'s step + For each vertex, add its cheapest edge to the tree $T$ + Contract each connected components + **#vertices shrink by at least a half** ![](https://i.imgur.com/FYBBaPP.png) #### Algorithm + Create a contracted graph $G'$ by running two successive ***Borůvka* steps** on $G$ + Create a subgraph $H$ by selecting each edge in $G'$ **with probability $1/2$** **Recursively apply the algorithm** to $H$ to get its minimum spanning forest $F$ + Remove all **heavy edges** from $G’$ w.r.t. $F$ using the linear time minimum spanning tree verification algorithm + **Recursively apply the algorithm** to $G'$ to get its minimum spanning forest #### Lemma: Let $H$ be a subgraph of $G’$ formed by including each edge of $G’$ independently with probability $p$ and let $F$ be the minimum spanning forest of $H$, then the expected number of $F$-light edges in $G’$ is at most $n/p$ where $n$ is the number of vertices in $G’$ ($F$-light edges: not heavy w.r.t. $F$) + We can assume the $H$ is formed by select edges with prob. $p$ in increasing order $e_1, e_2, …, e_m$ + When we have reached $e_i$, consider the current connected components of $H$, let $E’$ be the edges among $e_{i+1},…, e_m$ connecting different components of $H$ + Edges among $e_{i+1},…, e_m$ not in $E’$ cannot be $F$-light ![](https://i.imgur.com/9CVBiNA.png) + Select $E’$-edges with prob. $p$ in $H$ by the order + In $E’$, the edges before the first edge selected in $H$ is $F$-light + Expectation of the number of such edges is $1/p$ since it is geometric distribution + So the expectation of total number of F-light edges is bounded by $n/p$ #### Analysis + Crucial + Step-$1$ decrease #vertices to $n/4$ + Expected #edges remaining at step-$4$ is about $2$ times #vertices by Lemma + $T(m,n)=T(m/4,n/2)+T(n/2,n/4)+O(m)\to T(m,n)=O(m)$