*This is a review note of a course I took back when I was a graduate student. Therefore, it only includes additional material and some implementations of algorithms in C++.*

> Reference \:
> * Introduction to Algorithms by *Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest and Clifford Stein*
> https://mitpress.mit.edu/9780262046305/introduction-to-algorithms/

# I\. Foundations

## Induction

> https://zh.wikipedia.org/zh-tw/%E5%BD%92%E7%BA%B3%E6%8E%A8%E7%90%86

### Example \(Weak Induction\)

**Problem** \: Prove 1 + 2 + 3 + ... + n = n\(n + 1\) \/ 2

**Basis** \: For n = 0, both sides equal 0, so the statement holds.

**Inductive hypothesis** \: Assume for an arbitrary n that 1 + 2 + 3 + ... + n = n\(n + 1\) \/ 2.

**Step** \: \(Show it holds for n + 1\) Start from 1 + 2 + 3 + ... + n + \(n + 1\), reduce it with the hypothesis, and reform it back into the given formula\:

\(1 + 2 + 3 + ... + n\) + \(n + 1\) = n\(n + 1\)\/2 + \(n + 1\) = \(n + 1\)\(n + 1 + 1\)\/2

:::info
:information_source: **Mathematical Induction \(數學歸納法\)**
https://aaphi.blogspot.com/2011/02/proof-by-induction.html
:::

## Asymptotic Performance

In my opinion, using big-O as a performance indicator for an algorithm is very rough and often not accurate. The performance of an algorithm in a software system should be compared by measuring it. The reason big-O can be inaccurate is its assumptions: it treats every instruction as taking the same amount of time and only takes the loops in a program into consideration.

## Recurrences

> An equation that describes a function in terms of its value on smaller inputs

### Notation Example \: Merge Sort

![image](https://hackmd.io/_uploads/BJaRD2roR.png =x100)

### The Master Theorem

:::info
:information_source: **Master Theorem**
https://mycollegenotebook.medium.com/%E6%99%82%E9%96%93%E8%A4%87%E9%9B%9C%E5%BA%A6-%E9%81%9E%E8%BF%B4-%E4%B8%8B-master-th-307ad4608ab6
:::

# II\. Order Statistics

With an array sorted in ascending order,
* The minimum is the *1*st element
* The maximum is the *n*th element
* A median is the *⌈n\/2⌉*th element; if *n* is even, there are 2 medians

## Find the Minimum and Maximum Elements

An algorithm using about 3n\/2 comparisons\:
* Process the elements in pairs
* For a pair A \& B\: compare A with B, then compare the larger one with the current maximum and the smaller one with the current minimum

## Find the *i*th Smallest Element of a Set

### Randomized Selection

```clike
RandomizedSelection(A, p, r, i)
    if (p == r) then
        return A[p]
    q = RandomizedPartition(A, p, r)
    k = q - p + 1
    if (i == k) then
        return A[q]
    if (i < k) then
        return RandomizedSelection(A, p, q-1, i)
    else
        return RandomizedSelection(A, q+1, r, i-k)
```

![image](https://hackmd.io/_uploads/rJzkz6BiA.png =x150)

Notice that, in the above pseudocode, if `i > k`, we are searching for the `i - k`th number, not the `i`th anymore.
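The pseudocode relies on a `RandomizedPartition` helper that is not spelled out above. Below is a minimal sketch of what it could look like, written in the style of the C++ partition code in the next subsection; it assumes the `lomutoPartition` function defined there, and the use of `rand()` is just an illustrative choice.

```cpp
#include <cstdlib>  // rand; a real implementation might prefer <random>
#include <utility>  // std::swap

// Implemented in the next subsection (Lomuto partition scheme).
int lomutoPartition(int* data, int lo, int hi);

// Hypothetical helper used by RandomizedSelection: pick a random pivot,
// move it to the end, then reuse the deterministic Lomuto partition.
int RandomizedPartition(int* data, int lo, int hi) {
    int pivotIdx = lo + std::rand() % (hi - lo + 1);  // uniform pick in [lo, hi]
    std::swap(data[pivotIdx], data[hi]);
    return lomutoPartition(data, lo, hi);
}
```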
### Implementation

> Reference \: https://www.shubo.io/quick-sort/

![image](https://hackmd.io/_uploads/BJ104kIo0.png =x80)

There are two quicksort partition schemes that can be used.

**Lomuto Partition Scheme**

1. Select the last element as the pivot.
2. Use an indicator `i` to denote the location to which the next element less than the pivot will be swapped.
3. Traverse from `lo` to `hi-1`. If an element is less than or equal to the pivot, swap it with the element at location `i` and increment `i`.
4. After traversing, swap the pivot with the element at location `i`.
5. `i` is the pivot index and the array is partitioned.

```cpp
#include <utility>  // std::swap

int lomutoPartition(int* data, int lo, int hi) {
    int pivot = data[hi];  // 1. the last element is the pivot
    int i = lo;            // 2. next location for an element <= pivot
    for (int j = lo; j < hi; ++j) {
        if (data[j] <= pivot) {
            // 3. move the smaller element into the left part
            //    (std::swap instead of the XOR trick, which breaks when i == j)
            std::swap(data[i], data[j]);
            i++;
        }
    }
    std::swap(data[i], data[hi]);  // 4. put the pivot into its final position
    return i;                      // 5. the pivot index
}
```

**Hoare Partition Scheme**

*TODO*

**Quick Select**

In the following code snippet, the implementation is *iterative* instead of *recursive*. We access index `k - 1` because array indices start from `0`, while `k` is an order statistic that starts from `1`.

```cpp
int QuickSelect(int* data, int l, int r, int k) {
    while (l < r) {
        int pivotIdx = lomutoPartition(data, l, r);
        if (k - 1 == pivotIdx)
            return data[k - 1];
        else if (k - 1 < pivotIdx)
            r = pivotIdx - 1;
        else
            l = pivotIdx + 1;
    }
    return data[k - 1];
}
```

:::info
:information_source: Quick Select in `std`
* `std::nth_element` in C++
https://www.geeksforgeeks.org/cpp/stdnth_element-in-cpp/
:::

## Find the Median of 5 Elements

> Reference \: https://stackoverflow.com/questions/480960/how-do-i-calculate-the-median-of-five-in-c

```cpp
#include <cstring>  // memcpy
#include <utility>  // std::swap

/*
 * Reference : https://stackoverflow.com/questions/480960/how-do-i-calculate-the-median-of-five-in-c
 * If the array is bigger than 5, only the first 5 elements are taken into consideration.
 */
int MedianOfFive(const int* data) {
    // a, b, c, d, e
    // 0, 1, 2, 3, 4
    int temp[5];
    memcpy(temp, data, sizeof(int) * 5);

    // make a < b
    if (temp[1] < temp[0]) {
        std::swap(temp[1], temp[0]);
    }
    // make c < d
    if (temp[3] < temp[2]) {
        std::swap(temp[3], temp[2]);
    }
    // eliminate the lowest
    // notice that some elements get eliminated along the way,
    // therefore not all of the original data is kept in the final temp
    if (temp[2] < temp[0]) {
        std::swap(temp[3], temp[1]);
        temp[2] = temp[0];
    }
    // take e into consideration
    // a = e
    temp[0] = temp[4];
    // make a < b
    if (temp[1] < temp[0]) {
        std::swap(temp[1], temp[0]);
    }
    // eliminate another lowest
    if (temp[0] < temp[2]) {
        std::swap(temp[1], temp[3]);
        temp[0] = temp[2];
    }

    // the median is the smaller of the two remaining candidates
    return (temp[3] < temp[0]) ? temp[3] : temp[0];
}
```

## Linear-Time Selection \: Finding a Good Pivot Point

### Steps

![image](https://hackmd.io/_uploads/HJUoFjX11g.png =x350)

1. Divide the input `vector` into `groups of 5`. The last group can have fewer than 5 elements.
2. For each `group`, find the median \(the previous ***Find the Median of 5 Elements*** algorithm can be used\) and collect all medians into a set `M`.
3. Find the median `p` of the set `M`.
4. Use `p` as the pivot point. \(A C++ sketch of these steps is given after the analysis below.\)

### Analysis

![image](https://hackmd.io/_uploads/ByMY2oX1Je.png)
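Below is a minimal C++ sketch of the pivot-selection steps above. The function name `MedianOfMedians`, the use of `std::nth_element` to obtain each group's median \(standing in for the hand-written `MedianOfFive`\), and the assumption of a non-empty input are illustrative choices, not the course's reference implementation.

```cpp
#include <algorithm>  // std::nth_element, std::min
#include <vector>

// Sketch of steps 1-4: split into groups of 5, take each group's median,
// then recursively take the median of those medians. Assumes data is non-empty.
int MedianOfMedians(std::vector<int> data) {
    if (data.size() <= 5) {
        // Base case: a single (possibly short) group, return its median directly.
        std::nth_element(data.begin(), data.begin() + data.size() / 2, data.end());
        return data[data.size() / 2];
    }
    std::vector<int> medians;  // the set M of group medians
    for (std::size_t i = 0; i < data.size(); i += 5) {
        std::size_t groupEnd = std::min(i + 5, data.size());
        std::size_t mid = i + (groupEnd - i) / 2;
        std::nth_element(data.begin() + i, data.begin() + mid, data.begin() + groupEnd);
        medians.push_back(data[mid]);  // median of this group of (at most) 5
    }
    return MedianOfMedians(medians);  // step 3: the pivot p
}
```

The returned value `p` can then be fed to the partition step of the selection algorithm; it is guaranteed to give a reasonably balanced split, which is what yields the linear worst-case time shown in the analysis.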
# III\. Dynamic Programming

It is used when the solution can be recursively described in terms of solutions to subproblems.

:::info
:information_source: **Algorithm Notes Series — Dynamic Programming \(動態規劃\)**
https://medium.com/技術筆記/演算法筆記系列-dynamic-programming-動態規劃-de980ca4a2d3
:::

## Longest Common Subsequence

![image](https://hackmd.io/_uploads/rJKdH36i0.png =x90)

A brute-force algorithm would compare each subsequence of X with the symbols in Y, which takes

![image](https://hackmd.io/_uploads/HyDfUnasR.png)

```clike
LCS-Length(X, Y)
    m = length(X)
    n = length(Y)
    for i = 1 to m
        c[i, 0] = 0
    for j = 1 to n
        c[0, j] = 0
    // Stage 1
    for i = 1 to m
        for j = 1 to n
            if (X[i] == Y[j])
                c[i, j] = c[i - 1, j - 1] + 1
            else
                c[i, j] = max(c[i - 1, j], c[i, j - 1])
    return c
```

### Initial Stage

In stage one, initialize the table and set row `0` and column `0` to `0`.

![image](https://hackmd.io/_uploads/Hk1CanpoA.png =x400)

### Loop Stage

We only follow two rows, shown in two different colors. The loop fills the table, and the bottom-right corner is the length of the LCS.

![image](https://hackmd.io/_uploads/By_uxp6sC.png =x400)

### Complexity \& Backtracking the String

![image](https://hackmd.io/_uploads/SJzrZT6s0.png =x40)

![image](https://hackmd.io/_uploads/B1hLMaTjA.png =x400)

# IV\. Greedy Algorithm

TODO \: List more examples

> When we have a choice to make, make the one that looks best right now. Expect a locally optimal choice to lead to a globally optimal solution.

Usually a problem that a greedy algorithm solves can also be solved with a dynamic programming algorithm.

# V\. Amortized Analysis

Reference \:
* A Brief Look at Amortized Analysis \(淺談均攤分析\)
https://s311354.github.io/Louis.github.io/2022/01/19/%E6%B7%BA%E8%AB%87%E5%9D%87%E6%94%A4%E5%88%86%E6%9E%90/
* Graph Theory Algorithms Ch. 17 — Amortized Analysis \(圖論演算法\)
https://cihcih.medium.com/圖論演算法ch17-amortized-analysis-f63f314851cd
* Why is push_back in C++ vectors constant amortized?
https://cs.stackexchange.com/questions/9380/why-is-push-back-in-c-vectors-constant-amortized

# VI\. Graph Algorithms

## Adjacency Matrix

> Reference \: Programming Massively Parallel Processors
> https://shop.elsevier.com/books/programming-massively-parallel-processors/hwu/978-0-323-91231-0

![image](https://hackmd.io/_uploads/SkAHLJW5R.png)

## Minimum Spanning Trees

Given a connected, undirected, weighted graph as input, a minimum spanning tree is a tree that connects all vertices with minimum total weight.

![image](https://hackmd.io/_uploads/BJPzulO30.png =x250)

A minimum spanning tree split by a cut can be separated into two minimum spanning trees of the two subgraphs. This is an optimal substructure.

:::info
:information_source: **Prim's Algorithm \& Kruskal's Algorithm Reference**
* Algorithms — Minimum Spanning Tree \(最小生成樹\)
https://ithelp.ithome.com.tw/articles/10338718
* Kruskal's Spanning Tree Algorithm
https://ithelp.ithome.com.tw/articles/10239656
:::

## Shortest Path Algorithm

A shortest path from `u` to `v` is a path of minimum weight from `u` to `v`.

* A negative-weight cycle will make some shortest paths not exist.
* A subpath of a shortest path is itself a shortest path \(of the corresponding subgraph\).
* A graph with weighted edges utilizes a `priority_queue`.
* A graph with unweighted edges utilizes a `FIFO queue`.

### Dijkstra's Algorithm

![image](https://hackmd.io/_uploads/ryU3ZnCnC.png)

![image](https://hackmd.io/_uploads/HJz0_2AnC.png)

:::info
:information_source: **\[Algorithms\] Study Notes — 14. Dijkstra Algorithm \(最短路徑演算法\)**
This article provides a step-by-step walkthrough of the algorithm, and a code example is provided as well.
https://medium.com/@amber.fragments/演算法-學習筆記-14-dijkstra-algorithm-最短路徑演算法-745983dd4332
:::
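To complement the figures and the reference above, here is a minimal C++ sketch of Dijkstra's algorithm using a min-`priority_queue`. The adjacency-list layout \(pairs of neighbor and weight\) and the function name are illustrative assumptions, and non-negative edge weights are required.

```cpp
#include <functional>  // std::greater
#include <limits>
#include <queue>
#include <utility>
#include <vector>

// Minimal Dijkstra sketch: adj[u] lists (neighbor, weight) pairs.
// Returns the shortest distance from source to every vertex.
std::vector<long long> Dijkstra(const std::vector<std::vector<std::pair<int, int>>>& adj,
                                int source) {
    const long long kInf = std::numeric_limits<long long>::max();
    std::vector<long long> dist(adj.size(), kInf);
    using Item = std::pair<long long, int>;  // (tentative distance, vertex)
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> pq;  // min-heap
    dist[source] = 0;
    pq.push({0, source});
    while (!pq.empty()) {
        auto [d, u] = pq.top();
        pq.pop();
        if (d > dist[u]) continue;  // stale entry: u was already settled with a shorter path
        for (auto [v, w] : adj[u]) {
            if (dist[u] + w < dist[v]) {  // relax edge u -> v
                dist[v] = dist[u] + w;
                pq.push({dist[v], v});
            }
        }
    }
    return dist;
}
```

With unit weights and a plain `FIFO queue` in place of the priority queue, the same skeleton degenerates into BFS, matching the weighted/unweighted bullet points above.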
:::info
:information_source: **Bellman-Ford Algorithm**
An algorithm that finds all shortest paths from a source, or determines that a negative-weight cycle exists.
> Reference \: https://medium.com/@yining1204/演算法圖鑑讀書筆記-第肆章-圖形搜尋-中-61e9190329e0
:::

:::info
:information_source: **Floyd–Warshall Algorithm**
An algorithm that can handle negative edge weights but is slow. The final result table holds the **shortest paths between all pairs of vertices**.
https://www.youtube.com/watch?v=4OQeCuLYj-4
:::

## Minimum Cut

For a graph with undirected and unweighted edges, a cut is a partition of the vertices into two nonempty parts. A minimum cut is a cut that has the fewest possible edges crossing it.

![image](https://hackmd.io/_uploads/BkuP1fyaR.png =x300)

### Karger's Algorithm

* A randomized algorithm
* Might be wrong

:::info
:information_source: **Randomized Algorithms**
* *Las Vegas randomized algorithm* \: QuickSort
    * It's always correct
    * It might be slow
* *Monte Carlo randomized algorithm* \: Karger's Algorithm
    * It's always fast
    * It might be wrong
:::

**Steps**

1. Randomly choose an edge.
2. Contract \(combine\) the two vertices connected by the chosen edge.
3. Keep tracking the edges between the combined vertices.
4. Loop from Step 1 until only *two vertices* are left.

![image](https://hackmd.io/_uploads/rJy1Tf16C.png =x300)

**How does it work?**

Intuitively, the bundles of parallel super-edges created by contraction have a higher chance of being chosen next, so densely connected regions get merged first. Therefore we are likely to be left with two big super-nodes joined by only a few edges, which \(if we are lucky\) are exactly the edges of a minimum cut.

**How to make the probability of being correct bigger**

* Fork the run at different stages
* Run it multiple times, at least ![image](https://hackmd.io/_uploads/rJPuZmypA.png =x40) times

## Maximum Flow

### Graph for Maximum Flow

![image](https://hackmd.io/_uploads/Byf6uGep0.png =x300)

An **s-t cut** is a cut which separates s from t. The cost of a cut is the sum of the capacities of the edges that cross it. Additionally, we only count the edges that go from the s side to the t side; if an edge goes from the t side to the s side, it is not counted.

### Flows

A flow can be considered as supplies that need to be transported. The flow on an edge cannot be greater than the capacity of that edge, and the amount of stuff coming out of s must equal the amount of stuff flowing into t.

:::success
:bulb: **The value of a maximum flow is the same as the capacity of the minimum cut of the graph**
:::

### Ford-Fulkerson Algorithm

**Residual Network**

![image](https://hackmd.io/_uploads/SyndpzxaR.png =x160)

*Intuitively, traveling along a green edge can be seen as subtracting capacity \(canceling flow\) on a certain edge, which may allow the flow on other edges to increase.*

**Example of traveling a green edge.**

![image](https://hackmd.io/_uploads/Hy5YRflaC.png)
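As a rough companion to the description above, here is a minimal C++ sketch of the Ford-Fulkerson method using BFS to find augmenting paths \(the Edmonds-Karp variant\). The adjacency-matrix residual representation, the function name, and the choice of BFS are illustrative assumptions rather than the course's reference code.

```cpp
#include <algorithm>  // std::min
#include <climits>    // INT_MAX
#include <queue>
#include <vector>

// Ford-Fulkerson with BFS augmenting paths (Edmonds-Karp).
// capacity[u][v] is the residual capacity of edge u -> v; s is the source, t the sink.
int MaxFlow(std::vector<std::vector<int>> capacity, int s, int t) {
    const int n = static_cast<int>(capacity.size());
    int maxFlow = 0;
    while (true) {
        // BFS on the residual graph to find an augmenting path from s to t.
        std::vector<int> parent(n, -1);
        parent[s] = s;
        std::queue<int> q;
        q.push(s);
        while (!q.empty() && parent[t] == -1) {
            int u = q.front();
            q.pop();
            for (int v = 0; v < n; ++v) {
                if (parent[v] == -1 && capacity[u][v] > 0) {
                    parent[v] = u;
                    q.push(v);
                }
            }
        }
        if (parent[t] == -1) break;  // no augmenting path left: the flow is maximum

        // Bottleneck: the smallest residual capacity along the path.
        int bottleneck = INT_MAX;
        for (int v = t; v != s; v = parent[v])
            bottleneck = std::min(bottleneck, capacity[parent[v]][v]);

        // Augment: forward edges lose residual capacity, reverse ("green") edges gain it.
        for (int v = t; v != s; v = parent[v]) {
            capacity[parent[v]][v] -= bottleneck;
            capacity[v][parent[v]] += bottleneck;
        }
        maxFlow += bottleneck;
    }
    return maxFlow;
}
```

Pushing flow back along an edge simply restores residual capacity in the reverse direction, which is exactly the effect of traveling a green edge in the figures above.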