# Algorithm [Note] - Midterm

> [name=D]

## Ch1 - Introduction to Algorithms

### Analysis of algorithms

- Measure the ==goodness of algorithms==
    - efficiency
    - upper bound (worst case)
        - the **maximum** time and number of steps needed when the input is at its worst
- Measure the ==difficulty of problems==
    - NP-complete
        - denotes **hard problems**, for which no efficient algorithm is known ("efficient" means the running time does not grow explosively as the problem size increases)
    - lower bound (best case)
        - the time and number of steps the problem requires **at the very least**, even under the best conditions
- Is the algorithm optimal?
    - using the two measures above, check whether the algorithm really is the best possible solution

### Problems

- `0/1 Knapsack problem` -> `NP-complete`
- `Traveling salesperson problem` -> `NP-complete`
- `Partition problem` -> `NP-complete`
- `Art gallery problem` -> `NP-complete`
- `Minimum spanning tree` -> `greedy`, `divide-and-conquer`
- `Convex hull` -> `divide-and-conquer`
- `One-center problem` -> `prune-and-search`

## Ch2 - The Complexity of Algorithms and the Lower Bounds of Problems

### The time complexity of an algorithm

- the complexity of an algorithm cannot be measured by running a program
    - the same program takes different times on different OSes
- `Big-O`: an upper bound on the algorithm's time function
    - if `|f(n)| <= c|g(n)|`, then `f(n) = O(g(n))`
    - even in the worst case, the running time does not exceed the Big-O bound
    - `O(1)` < `O(log n)` < `O(n)` < `O(n log n)` < O(n^2^) < O(n^3^) < O(n^k^) < O(2^n^) < O(3^n^) < O(n!)
    - ![image](https://hackmd.io/_uploads/HkKbOn9lC.png)
- `Omega(Ω)`: a lower bound on the algorithm's time function
    - if `|f(n)| >= c|g(n)|`, then `f(n) = Ω(g(n))`
    - even in the best case, the running time is never smaller than Omega (never faster than Omega)
- `O(1)`: the time needed is unrelated to the problem size; this does not mean the cost is a fixed constant, only that it never exceeds some constant
- Is an algorithm with the larger complexity O(n^2^) always slower than one with the smaller complexity `O(n)`?
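The point behind this question — that a larger Big-O class is not automatically slower on every input — can be checked numerically. A minimal sketch with made-up step-count functions and a hypothetical constant factor of 100:

```python
# Made-up step counts for two hypothetical algorithms:
# A runs in 100*n steps (O(n)), B runs in n*n steps (O(n^2)).
def steps_a(n):
    return 100 * n

def steps_b(n):
    return n * n

# For a small input, the O(n^2) algorithm does fewer steps...
print(steps_a(10), steps_b(10))      # 1000 100
# ...but once n outgrows the constant factor, O(n) wins for good.
print(steps_a(1000), steps_b(1000))  # 100000 1000000
```

The crossover point depends entirely on the hidden constants, which is exactly why Big-O comparisons are only meaningful for large inputs.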
- Not necessarily. `Big-O` describes the growth trend of the running time, i.e. the cost when the problem size is large; when the input is small, comparing `Big-O` alone is not accurate.

### The best-, average-, worst-case analysis of an algorithm

- difficulty of the three analyses: `best case` (easy) < `worst case` < `average case` (hard)
- examples of the three kinds of analysis
    - `straight insertion sort`
        - method: take one number from the unsorted part (the first number of the unsorted part), compare it with the numbers of the sorted part from the last sorted number back to the first, and insert it into its correct position in the sorted part
        - best case: already sorted
            $$x = 2(n-1) = O(n)$$
            - no movement is needed in the inner loop; only the outer loop runs (two operations per round)
        - worst case: reversely sorted
            $$x = \sum_{i=1}^{n-1}(2+d_i) = 2(n-1) + \frac{n(n-1)}{2} = O(n^2)$$
        - average case:
            $$x = \sum_{j=2}^{n}\frac{j+3}{2} = \frac{(n+8)(n-1)}{4} = O(n^2)$$
    - `binary search`
        - best case: the target is the middle element
            - 1 step (found on the first probe) = ==O(1)==
        - worst case: the target is the smallest / largest element
            - (log~2~n + 1) steps = ==O(log n)==
        - average case:
            - arrange the sequence in a binary tree whose root is the middle element; the depth of an element in this tree is its number of comparisons, e.g. for `1,2,3,4,5,6,7` the root is 4
            - the average must consider every case: there are 2^t-1^ elements at depth t, t = 1, 2, ..., log n + 1
            - a successful search at depth `i` costs `i` comparisons; an unsuccessful one costs `i+1`
            - with k = log n + 1:
            $$A(n) = \frac{1}{2n+1}\left(\sum_{i=1}^k i\times 2^{i-1}+k(n+1)\right) = O(\log n)$$
    - `straight selection sort`
        - method: in each round, find the smallest number in the unsorted part and insert it at the front of the unsorted part (alternatively, swap it with the first number of the unsorted part), so that the prefix of the sequence is always sorted
        - which operation (comparison or change of flag) should be counted in the analysis? Selection sort does little data movement; most of its work is comparing elements and updating the flag during the scan. Since the number of comparisons is fixed ( n(n-1)/2 ), ==count the number of changes of the flag==
        - best case: already sorted
            - no change of flag is needed in any round = ==O(1)==
        - worst case: reversely sorted
            - n rounds, each with n-1 flag changes = n(n-1) = ==O(n^2^)==
        - average case
            - ~~uh, as for how to prove this... I don't think it will be on the exam, and it looks really complicated anyway...dddd~~
            - ==O(n log n)==
    - `quick sort`
        - method: pick a pivot for each (sub)list and partition it into a left sublist (elements smaller than the pivot) and a right sublist (elements larger than the pivot)
        - procedure: choose a pivot and set a left and a right pointer; in each round the left [right] pointer moves right [left] and stops when it reaches an element larger [smaller] than the pivot; the two elements are swapped and the scan continues until the pointers meet; when the pointers cross, the right pointer's element is swapped with the pivot, producing two sub-lists
        - best case: the list is split into two sub-lists of almost equal size
            - n (comparisons per level) * log n (tree height) = ==O(n log n)==
        - worst case: in each round, the number used to split is either the smallest or the largest, so every split is maximally unbalanced
            - e.g. **1** 8 2 4 9 5 6
            - ==O(n^2^)==
        - average case:
            - different pivot choices give different splits
            - quick sort is optimal in the average case
            - ==O(n log n)==
    - `2-D ranking finding`
        - definition: given a set S of n points, find the rank of every point in S
        - the **rank** of a point x is the number of points dominated by x; e.g. `rank(x)=2` means x dominates 2 points of S
        - **dominate**
            - A dominates B
            - A(a~1~, a~2~), B(b~1~, b~2~)
            - a~1~ > b~1~ && a~2~ > b~2~
        - method 1 [straightforward strategy]
            - n (points) * n-1 (comparisons per point) = ==O(n^2^)==
        - method 2 [divide-and-conquer]: repeatedly split the point set in half with a vertical line until each side holds a single point; compute the ranks starting from the leftmost point (whose rank is 0); when merging two sides, compare the points' y-coordinates and update the rank of every point (the ranks of all points in the left half never change)
            - O(n * log n * log n) = ==O(n log^2^n)==
            - ![image](https://hackmd.io/_uploads/BkAnl-yg0.png)

### The lower bound of a problem

- new algorithms keep being invented, so a lower bound derived from an algorithm keeps changing and is not reliable; the lower bound must be found from the problem's point of view
- a lower bound of a problem is the least time complexity required for any algorithm which can be used to solve this problem
- Ω measures the difficulty of a problem: no matter which algorithm is used, at least this much time is needed; we look for the **highest** possible lower bound
- categories of lower bounds
    - worst-case lower bound (easy)
    - average-case lower bound
- when the problem's lower bound < the algorithm's upper bound, improvement is still possible
    - try to find a higher lower bound
    - try to find a better algorithm
    - both the lower bound and the algorithm may be improved
- when the problem's lower bound = the algorithm's upper bound, the method is optimal
    - the algorithm is **optimal**, and no better algorithm will ever be found

### The worst-case lower bound of sorting

- any sorting algorithm (`selection sort`, `insertion sort`, ...) whose basic operations are **compare** and **exchange** can be described by a ==binary decision tree==
    - different algorithms have different binary decision trees
- lower bound of sorting
    - find the smallest possible depth of the binary tree, i.e. a balanced tree (which has the minimum height) with n! leaves
    - ==Ω(n log n)==

### Heap sort

- optimal sorting in the worst case
- heap
    - a binary tree
    - completely balanced
        - if the height is h, the leaves are at level h or h-1
        - leaves at level h are packed to the left
    - parent >= children [max heap]
- construction takes `O(n)` and output takes `O(n log n)`; taking the larger of the two, heap sort is ==O(n log n)==
- [binary tree - construction]
    - 2(d-L) -> each level costs 2 comparisons (the two children are compared with each other, then the winner with the target node), over d-L levels (d = total tree height, L = the node's level)
    - 2^L^ -> there are 2^L^ nodes at level L
    - ==O(n)==
    $$\sum_{L=0}^{d-1}2(d-L)2^L = O(n)$$
- [binary tree - output]
    - log i -> tree height
    - each node needs two comparisons
    - ==O(n log n)==
    $$2\sum_{i=1}^{n-1}\log i = O(n\log n)$$
- [linear array]
    - O(n log n)

### The average-case lower bound of sorting

- the average time complexity of any sorting algorithm is ==Ω(n log n)==
    - quick sort is optimal in the average case
    - heap sort is optimal in both the average case and the worst case
- average number of comparisons = ==the external path length / n!==
    - `the external path length` -> the sum of the depths of all leaves
    - `n!` -> the number of leaves

![image](https://hackmd.io/_uploads/HkBr4kxl0.png)

:::info
### Summary
- sorting n **numbers** -> build a binary decision tree with n! **leaves**
- the **depth** of a leaf is the number of **comparisons** needed to sort that input
- ==worst case:==
    - for one sorting algorithm: the **largest** number of comparisons, reached on the hardest input
    - find the maximum leaf depth (the height) of its binary decision tree
    - lower bound:
        - over all sorting algorithms, the number of comparisons needed **at least** in the worst case
        - over all possible binary decision trees, find the **minimum** height
        - the height is minimal when the tree is **balanced**
- ==average case:==
    - for one sorting algorithm: the **average** number of comparisons over all inputs
    - find the **average depth of all leaves** of its binary decision tree
    - lower bound:
        - over all sorting algorithms, the average number of comparisons needed **at least**
        - over all possible binary decision trees, find the **minimum** average leaf depth
        - the average leaf depth is minimal when the tree is **balanced**
:::

### Improving a lower bound through oracles

- Problem P: merging two sorted sequences A and B with lengths m and n
    - [conventional 2-way merging]
        - complexity: at most m + n - 1 comparisons (m and n are the lengths of the two sequences)
- oracle
    - analyze under which conditions the problem is hardest / needs the most steps, count the steps of that case, and compare this count with the algorithm's complexity to see whether the algorithm is optimal
    - finding a lower bound with an oracle means deriving it from the **hardest** instances of the problem

### Finding the lower bound by problem transformation

- suppose we want an algorithm for a given problem (B) whose lower bound is hard to analyze; we can relate it to another problem (A) whose lower bound is known
- using the transformation relation between the two problems and the known lower bound of one, we obtain the unknown lower bound of the other
- problem A **reduces to** problem B
    - A can be solved by solving B
    - iff A can be solved by using any algorithm which solves B
    - if A α B, B is more difficult
- transform an instance of problem A into an instance of problem B, solve B, then transform the answer back to A
    - T(tr~1~) + T(tr~2~) < T(B)
    - T(A) <= T(tr~1~) + T(tr~2~) + T(B) = O(T(B))
    - the transformation time T(tr) is small

![image](https://hackmd.io/_uploads/rJMpEellC.png)

- example:
    - sorting α convex hull problem
        - Ω(convex hull) >= Ω(sorting) = Ω(n log n)
        - 3,2,1,4 -> (3,9), (2,4), (1,1), (4,16) -> (1,1), (2,4), (3,9), (4,16) -> 1,2,3,4
    - sorting α Euclidean MST
        - 3,2,1,4 -> (3,0), (2,0), (1,0), (4,0) -> (1,0), (2,0), (3,0), (4,0) -> 1,2,3,4

## Ch3 - Greedy

### When can we use the greedy method

- suppose that a problem can be solved by a **sequence** of decisions: the problem can be decomposed into staged, independent subproblems
- the greedy method makes each decision **locally optimal**: each subproblem has a best choice
- these locally optimal solutions finally add up to a **globally optimal** solution: combining the subproblems' best choices yields the best solution of the original problem
- not all optimization problems can be solved by the greedy method
- example:
    - `shortest paths on a special graph` -> solvable by greedy
        - ![image](https://hackmd.io/_uploads/SJhMWK9x0.png)
    - `shortest paths on a multi-stage graph` -> not solvable by greedy
        - ![image](https://hackmd.io/_uploads/r1O7-FcgC.png)

### Minimum spanning trees (MST)

- may be defined on points in Euclidean space (a plane figure) or on a graph
- a spanning tree (a tree whose edges connect all the vertices) with the smallest total weight
- ![image](https://hackmd.io/_uploads/Hy_cGYql0.png) ![image](https://hackmd.io/_uploads/HJxoGt5g0.png)
- algorithms
    - **Kruskal's algorithm** - time complexity `O(|E| log|E|)`
        - repeatedly pick the smallest remaining edge, checking that connecting its two endpoints does not create a cycle; if it does not, connect them
        - ![image](https://hackmd.io/_uploads/By5a7FqxC.png)
    - **Prim's algorithm** - time complexity O(|V|^2^)
        - in each step, add one new (not yet chosen) vertex, picking the minimum edge that connects it to the tree
        - ![image](https://hackmd.io/_uploads/H1Z7SYcx0.png)

### The single-source shortest path problem

![image](https://hackmd.io/_uploads/BkX0StcgA.png)
- algorithm
    - **Dijkstra's algorithm**
        - time complexity: O(n^2^)

### The 2-way merging problem

- merge two sorted lists into one
- the linear 2-way merge algorithm takes m~1~+m~2~-1 comparisons
![image](https://hackmd.io/_uploads/r16c6F5lA.png)
- always merging the two shortest lists first is the most efficient order
![image](https://hackmd.io/_uploads/Sy2ECt5g0.png)
- apply this characteristic to **Huffman codes**

### Huffman codes

- symbols that appear frequently -> encode with fewer bits, reducing the high cost of encoding and decoding
- build a merging tree according to the symbols' frequencies, then label left branches 0 and right branches 1
![image](https://hackmd.io/_uploads/Hywpkqqx0.png)

### The minimal cycle basis problem

- Def: a cycle basis of a graph is a set of cycles such that every cycle in the graph can be generated by applying XOR on some cycles of the basis
- minimal cycle basis: the cycle basis with the smallest total weight of all edges in its cycles
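Kruskal's algorithm described in the MST section above can be sketched in a few lines; this is a minimal illustration assuming an edge-list input, using a small union-find structure to detect cycles (the 4-vertex graph below is a made-up example, not one from the notes):

```python
def kruskal(n, edges):
    """Kruskal's MST: always take the cheapest edge that closes no cycle.
    n: number of vertices labeled 0..n-1; edges: list of (weight, u, v)."""
    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path compression.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    mst, total = [], 0
    for w, u, v in sorted(edges):      # smallest edge first
        ru, rv = find(u), find(v)
        if ru != rv:                   # different components: no cycle
            parent[ru] = rv
            mst.append((u, v, w))
            total += w
    return total, mst

# Made-up example: the MST keeps the edges of weight 1, 2, 3 (total 6).
total, tree = kruskal(4, [(1, 0, 1), (2, 1, 2), (3, 2, 3), (4, 0, 3), (5, 0, 2)])
print(total)  # 6
```

Sorting the edges dominates the running time, which matches the `O(|E| log|E|)` bound stated above.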
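The Huffman construction sketched above (repeatedly merge the two least-frequent subtrees, then label left branches 0 and right branches 1) can be written with a priority queue; the symbol frequencies below are invented for illustration:

```python
import heapq

def huffman_codes(freq):
    """Build Huffman codes from a {symbol: frequency} dict."""
    # Heap entries are (frequency, tie-breaker, tree); a tree is either a
    # symbol or a (left, right) pair. The tie-breaker keeps tuples comparable.
    heap = [(f, i, sym) for i, (sym, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)   # the two least-frequent trees
        f2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, count, (t1, t2)))
        count += 1
    codes = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):       # internal node: 0 left, 1 right
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:
            codes[tree] = prefix or "0"
    walk(heap[0][2], "")
    return codes

# Invented frequencies: the frequent symbol gets the shorter code.
codes = huffman_codes({"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5})
print(codes["a"], codes["f"])  # code for 'a' is shorter than for 'f'
```

This mirrors the 2-way-merging observation above: always merging the two smallest items first minimizes the total (weighted) cost.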
- `#` of cycles in a cycle basis
    - k (`#` of dotted lines)
    - = (`#` of all edges) - (`#` of edges of a spanning tree)
    - = |E| - (|V| - 1)
    - = |E| - |V| + 1

### The 2-terminal one-to-any special channel routing problem

- Def: given two sets of terminals on the upper and lower rows, respectively, we have to connect each upper terminal to the lower row in a one-to-one fashion. This connection requires that the `#` of tracks used is minimized.
- ![image](https://hackmd.io/_uploads/rJxgpKW5C.png)
    - the triangles are the terminals that must be connected (one to one)
    - each upper triangle must connect to a lower triangle
    - `#` of upper terminals (△) < `#` of lower-row terminals (△)
- rule: the wires must not overlap; no two wires may touch each other
- goal: minimize the density; the density is a lower bound on the `#` of tracks

### The knapsack problem

- different from the `0/1 knapsack problem`
    - `0/1 knapsack problem`: an object is either taken or not taken; an NP-hard problem
    - `the knapsack problem`: objects can be cut, so a fraction of an object may be taken — neither all (1) nor nothing (0) is required; an easy problem
- ![image](https://hackmd.io/_uploads/HysaFKys0.png)
- the greedy algorithm:
    - step 1: sort p~i~/w~i~ into nonincreasing order
    - step 2: put the objects into the knapsack according to the sorted sequence, as much as possible
- exercise: n = 3, M = 20, (p~1~, p~2~, p~3~) = (25, 24, 15), (w~1~, w~2~, w~3~) = (18, 15, 10)
    - analysis:
        - n = number of objects, M = capacity (weight) of the knapsack, p = profit, w = weight
    - steps:
        - compute the profit per unit weight of each object and sort the objects by it
            - p~1~ / w~1~ = 25/18 ≈ 1.39 (object 1: 1.39 profit per unit weight)
            - p~2~ / w~2~ = 24/15 = 1.6 (object 2: 1.6 profit per unit weight)
            - p~3~ / w~3~ = 15/10 = 1.5 (object 3: 1.5 profit per unit weight)
            - so the placing order is: object 2 -> object 3 -> object 1
        - place the objects according to the capacity M
            - place object 2 (w~2~ = 15): 20(M) - 15(w~2~) = 5 (5 units of capacity remain)
            - place object 3 (w~3~ = 10): 5 - 5 (1/2 w~3~) = 0 (only 1/2 of object 3 fits)
        - compute the total profit
            - p~2~ + 1/2 p~3~ = 24 + 1/2 * 15 = 31.5
    - optimal solution: x~1~ = 0, x~2~ = 1, x~3~ = 1/2

<!--
## HW
[演算法作業 HW1](/zdm6xV_QQt2kdy1MJDCjQA)
[演算法作業 HW2](/QH7skOopT0qUZh8M9Bbo1w)
[演算法作業 HW3](/uxqjpZJUTpKA2UZasW940A)
[演算法作業 HW4](/wM5T497GTM6Dxasku94GUQ)
[演算法作業 HW5](/UY_HvrlJR8yu7_S58SlTfQ)
[演算法作業 HW6](/1OiDV-UbRzaAGIGXL_axAw)
-->
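The worked knapsack exercise (n = 3, M = 20, p = (25, 24, 15), w = (18, 15, 10)) can be reproduced with a short greedy routine — a sketch of the fractional-knapsack greedy, not the course's own pseudocode:

```python
def fractional_knapsack(capacity, items):
    """Greedy for the (fractional) knapsack problem.
    items: list of (profit, weight); returns (total profit, fraction of each item)."""
    # Step 1: consider objects in nonincreasing order of profit per unit weight.
    order = sorted(range(len(items)),
                   key=lambda i: items[i][0] / items[i][1], reverse=True)
    fractions = [0.0] * len(items)
    total = 0.0
    # Step 2: take as much of each object as still fits.
    for i in order:
        p, w = items[i]
        take = min(w, capacity)        # whole object, or the part that fits
        fractions[i] = take / w
        total += p * take / w
        capacity -= take
        if capacity == 0:
            break
    return total, fractions

# The exercise from the notes: n = 3, M = 20, p = (25, 24, 15), w = (18, 15, 10).
total, x = fractional_knapsack(20, [(25, 18), (24, 15), (15, 10)])
print(total, x)  # 31.5 [0.0, 1.0, 0.5]
```

The output matches the hand computation above: all of object 2, half of object 3, none of object 1, for a total profit of 31.5.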