Greedy Algorithm

# Greedy Algorithm ## Concept - builds up a solution in small steps, **choosing a decision at each step myopically** to optimize some underlying criterion - Intuitive and fast - Usually not optimal - prove optimality 1. The greedy algorithm stays ahead. 2. An exchange argument. ## Interval Scheduling Problem - Given: Set of requests {1, 2, ..., n}, ith request corresponds to an interval with start time s(i) and finish time f(i) - interval i: $[s(i), f(i))$ - Find a compatible subset of requests of maximum size - compatible: 不重疊 - maximum: optimal - Repeat - Use a simple rule to select a first request i1 - Once i1 is selected, reject all requests incompatible with i1. - ![](https://i.imgur.com/MDwxz2i.png) - ![](https://i.imgur.com/MCIle6y.png) - A is a compatible set of requests because of line 5 - prove optimality - greedy algorithm **stays ahead** - 假設有最佳解O，證明|A| = |O| (不一定要完全相同，只要cardinality一樣就好) - First lemma: ![](https://i.imgur.com/6VTj6jl.png) - 還不知道k跟m誰大誰小 - proof by induciton: - Basis step: for r = 1, f(i1) ≤ f(j1) - 因為i1必是所有人中結束時間最早的，而j1只是從O這個subset排序出來結束時間最早的 - 假設r-1時成立: f(ir-1) ≤ f(jr-1) -- (1) - 因為O is compatible${\rightarrow}$f(jr-1) ≤ s(jr) -- (2) - (1) + (2) ${\rightarrow}$ f(ir-1) ≤ s(jr) ${\rightarrow}$ jr跟ir-1不重疊 ${\rightarrow}$ jr ${\in}$ R after ir-1 is selected in line 5 - According to line 3,f(ir) ≤ f(jr) - jr跟ir都在R裡(因為都跟ir-1 compatible)，但greedy最後選擇ir，所以f(ir)<=f(jr) - ![](https://i.imgur.com/wfVGeGt.png) - small conclusion: greedy結果的每個finish time都小於O的finish time - The greedy algorithm returns an optimal set A - Proof by contradiction - 假設A不是最佳解${\rightarrow}$|O|= m > k = |A| - 因為f(ik)≤f(jk)而且|O| > |A|${\rightarrow}$O裡至少有jk+1 - ![](https://i.imgur.com/0ffJ1JM.png) - f(ik) ≤ f(jk) ≤ f(jk+1)${\rightarrow}$jk+1跟ik compatible${\rightarrow}$jk+1${\in}$R - greedy只有在R為空時才停止，但是卻在ik就停止了，矛盾 - Running time - O(n2): - Initialization: - O(n log n): sort R in ascending order of f(i) (根據finish time排序) - line 5: - 第一次檢查(n-1)個點跟i是否compatible，第二次(n-2)....總共O(n2) - O(n log n): - Initialization: - O(n log n): sort R in ascending order of f(i) (根據finish time排序) - O(n): construct S, S[i] = s(i) - 記錄所有人的start time - Lines 3 and 5: - O(n): 只需掃過R一次 - 檢查compatible: 檢查f(i)是否小於s(j) - 只檢查下一個(j=i+1)是否跟i compatible，其餘的交給後面檢查 ## Interval partitioning - multiple resources - use as few resources as possible - a.k.a. the interval coloring problem: one resource = one color - Given: Set of requests {1, 2, ..., n}, ith request corresponds an interval with start time s(i) and finish time f(i) - interval i: $[s(i), f(i))$ - Goal: Partition these requests into a **minimum** number of **compatible** subsets, each corresponds to one resource - 用最少資源跑完所有工作 - depth: a set of intervals is the maximum number that pass over any single point on the time-line. - 任意時刻需要的最大資源數(定義上不等於總共所需資源數) - Lemma: 在interval partitioning的問題下，總共所需資源至少為depth (lower bound) - proof: - 在產生depth的那個時間點，有d個intervalI1, ..., Id都經過該時間點 - 這d個interval都需要resources，所以至少需要d個這d個interval都需要resources - Use only d resources (depth) to schedule all requests - prove the optimality 1. 找到一個所有解都必須符合的bound 2. 證明該演算法總是可以達到bound的結果 - Greedy Algorithm - ![](https://i.imgur.com/FeQjgeg.png) 1. 把interval根據starting time排序 2. 對1到n中第j個interval: 3. ${\ \ \ \ }$把所有跟j不compatible的label排除 4. ${\ \ \ \ }$如果能從{1, 2, ..., d}中找到沒被排除的label 5. ${\ \ \ \ \ \ \ \ }$把label assign給j 6. ${\ \ \ \ }$否則不label - Lines 3~5 implementation: 找到compatible with j的resource並label - 紀錄f(labeli): labeli的最後一個interval的finish time - check compatibility: f(labeli) ≤ s(Ij) - 找出depth: - ![](https://i.imgur.com/SAkUH3I.png) 1. 把finish time跟start time一起排序 - 若有finish time跟start time剛好切在同個時間點，則finish time優先 2. 從頭掃到尾，resource遇到start time表示使用資源就加1，遇到finish time表示釋放資源就減1，紀錄resource的max值即為depth - Optimality - The greedy algorithm assigns every interval a label, and no two overlapping intervals receive the same label. - proof: 1. 所有interval都會得到label(line 6不會發生) - 假設interval Ij跟前面的t個interval overlap${\rightarrow}$t+1個interval會重疊於s(j) - t+1 ≤ depth ${\rightarrow}$t ≤ depth - 1${\rightarrow}$至少還剩一個label可以assign給Ij 2. 兩個overlap的interval不會得到相同的label - 假設Ii跟Ij overlap，i < j - 當for迴圈執行到Ij時，Ii在line 3會被exclude(因為兩個overlap) - 所以不可能Ii跟Ij有相同label - 因為greedy只需要depth數量的label，符合lower bound${\rightarrow}$ Optimal! - Interval graph (Intersection graph) - 每個interval對應一個node，若兩個interval overlap，則用edge連接兩個node - 相鄰的兩個點表示其overlap，必須assign不同顏色(resource)，可轉換成coloring問題 ## Scheduling to Minimize Lateness - Given: A single resource is available starting at time s. A set of requests {1, 2, ..., n}, request i requires a contiguous interval of length ti and has a **deadline** di - 所需作業時間及期限 - Goal: Schedule all requests without overlapping so as to minimize the maximum lateness - Lateness: li = max{0, f(i) – di}. - 遲交時間最小化 - ![](https://i.imgur.com/PYEwHO9.png) 1. 所需時間短的不一定緊急 2. 餘裕最小(越緊急的越先做): 若所需時間長，可能使其他工作遲交的時間太長 - ![](https://i.imgur.com/8n10ogQ.png) - no idle time - ![](https://i.imgur.com/tn5JTX8.png) - optimal的解如果中間有idle time，則idle time可被壓縮掉，不影響optimality - **Exchange argument**: Gradually transform an optimal solution to the one found by the greedy algorithm without hurting its quality. - 對一個optimal solution，把其中不符合greedy rule的部分換成greedy的形式，仍能保持optimality - **inversion (倒轉)**: An inversion in schedule S is a pair of requests i and j such that s(i) < s(j) but dj < di - 比較早開始但是比較晚才要交 - Lemma 1: All schedules without inversions and without idle time have the same maximum lateness. - proof: - ![](https://i.imgur.com/Y95zvT7.png) - 如果兩種schedule沒有晚做但是要早交的情況，且中間沒有idle time，則這兩種schedule皆以deadline順序做排程，且只會在deadline相同的request上有不同的排序 - 因為deadline時間相同，且所有相同deadline的request所需時間總和相同，所以lateness不會因schedule的順序改變 - Lemma 2: There is an optimal schedule with no inversions and no idle time - ![](https://i.imgur.com/7yCltgT.png) 1. 如果有inversion發生在job i 跟 j，假設j被排在i之後，且dj < di 2. 把 i 跟 j 互換可以消除inversion 3. 因為dj < di，所以把j換到i前面不可能增加lateness (甚至可能減少lateness)，可維持optimality - 即使不相鄰也能有相同結論 - Theorem: The greedy schedule S is optimal. - proof by contradiction: - 假設O 為有inversion的optimal solution - 假設O沒有idle time (idle time可被壓縮) - 若O沒有inversion，則S = O (因為greedy的結果即為沒有inversion沒有idle time，又根據Lemma 1，可推得S 跟 O 有相同maximun lateness) - 若O有inversion，則當我們把inversion pair互換時，maximun lateness又可能變更小，與optimal的前提矛盾 ## Shortest Paths - Given: - Directed graph G = (V, E) - Length le = length of edge e = (u, v) ${\in}$ E - 成本(距離、時間) ≥ 0 - Source s - Goal: - 從s到V中其他點的最短路徑Pv - Length of path: path經過的length總和 - ![](https://i.imgur.com/i5X2TFf.png) - 檢查所有從S裡的node u到不在S裡的node v的最小距離d(u)+le，把距離最小的node v放進S，更新d(v) - Correctness - ![](https://i.imgur.com/cNUJVc9.png) - 一旦node u被放進S，d(u)即為最短距離，不會再因放入新的node改變 - Proof by induction on |S|: - Basis step: S只有s在裡面，d(s)=0 - Inductive step: - 假設|S| = k ≥ 1時敘述為真 - 令下一個要被放進S的node為v，(u, v)為從s到v的最短路徑Pv的最後一段edge，Pv = Pu + (u, v) - 根據假設，因u已經在S內，Pu為s到u的最短路徑 - 考慮其他從s到v路徑的可能路徑P，此路徑必會跨越S與不是S的點，另x為交界處S內的點，y為交界處S外的點 - 如果沒有y，這輪應該選擇從x出發而非u - 因為line 4會選擇d'(v) = (d(u)+le)最短的v點放入S，所以P不可能比Pv還短 - 若 - ![](https://i.imgur.com/vFbTzmz.png) - d'(v)就已經比d(y)小了，且le''=(y,v) ≥ 0 - 所以新的距離d(v)也是最短的距離 - Dijkstra重要設定: le ≥ 0 - ![](https://i.imgur.com/eV7RMAq.png) - Implementation - ![](https://i.imgur.com/9pdMY0g.png) - ![](https://i.imgur.com/kJfzdVL.png) - 所有node放在min priority queue裡，每輪更新看得到的node的cost，每次從min priority queue中deque最小的node放進S ## Recap Heaps: Priority Queues - Priority Queue - Each element has a priority (key) - Only the element with highest (or lowest) priority can be deleted - Max priority queue, or min priority queue - An element with arbitrary priority can be inserted into the queue at any time - ![](https://i.imgur.com/vvpoWA8.png) - Heap - A max heap is - 爸媽的key比小孩大的tree - root的key最大 – A complete binary tree - Implementation - ![](https://i.imgur.com/lOCJabD.png) - Insertion into a Max Heap - ![](https://i.imgur.com/eMWoKeF.png) - 把新node先放在尾端，如果新node的key比他爸媽的key大，把新node跟他爸媽交換(Bubble up) - Time complexity: O(lg n) - Deletion from a Max Heap - ![](https://i.imgur.com/VPt34b5.png) - 把root拿走(key最大)，把最尾端的node放到root，如果這個node比他的子孫小，把他跟他子孫交換(trickle down) - Time complexity: O(lg n) - Max Heapify - 從下面往上修正 - ![](https://i.imgur.com/hMWKfZ4.png) - 數學歸納法，假設小孩已經是heap - ![](https://i.imgur.com/igPy4oD.png) - Time complexity: O(*n*lg n) ## Minimum Spanning Trees - Given: Undirected graph G = (V, E) - cost ce for each edge e = (u, v) ${\in}$ V - Goal: - 找到一個edges的子集 T ⊆ E 使得: - 連結graph中所有點 - 總成本最小 - (V, T)會是什麼？ 1. (V, T)一定要連起來 2. (V, T)沒有cycle - 如果有cycle，拿掉cycle中某條edge，cycle中的點仍然保持連結，且cost減少 - (V, T)是tree - Greedy Algorithms - Kruskal's algorithm 1. T = {} 2. 把edge的cost從低到高排列 3. 依序把edge e放進T，但如果e放進T會使T出現cycle，則捨棄 - Prim's algorithm: (c.f. Dijkstra’s algorithm) 1. 從任意點出發(因為最後T一定會連接所有點) 2. 每次把從T看出去cost最小的edge放入T - Reverse-delete algorithm: (reverse of Kruskal’s algorithm) 1. T = E 2. 把edge的cost從高到低排列 3. 依序把edge e從T拿掉，但如果e從T拿掉會破壞連結性(出現孤島)，則不拿掉e - Cut Property - Optimality of **Kruskal’s** & **Prim’s algorithms** - 假設所有cost ce 都不一樣 - ![](https://i.imgur.com/ZNgGN9x.png) - 從所有點中劃分出一個subset，subset交界的edge中cost最小的edge e是safe edge，所有MST都應該包含e - Wrong Pf: Exchange argument - T為一個任意的spanning tree而且沒有包含e，希望證明T不是MST - T為spanning tree，T一定至少有一條edge通過subset交界 - 因為e是交界上cost最小的edge，ce < cf - 所以T – {f} + {e}有更小的cost - 但是T – {f} + {e}還是spanning tree嗎？ - Spanning tree: connected & acyclic - maybe not connected - 需要滿足spanning tree的條件 - Modified Pf: Exchange argument - ![](https://i.imgur.com/ob4y5Dg.png) - T為一個任意的spanning tree而且沒有包含e，希望證明T不是MST - T為spanning tree，T一定至少有一條edge e'通過subset交界，e'=(v',w')，v' ${\in}$ S and w' ${\in}$ V–S - T' = T – {e'} + {e}是spanning tree - (V, T') is connected - 因為T為spanning tree，所以S內所有點都相通，S外所有點也相通，交界處的e'被拿掉只有使內外斷絕連結，所以可以繞路改由e相通 - (V, T') must be acyclic - 因為T為spanning tree，所以S內S外都沒有cycle，只有可能在交界處增加e時會產生cycle，但是因為我們同時拿掉了e'，所以不會有cycle - ce < ce'，所以T'的cost比T小 - ![](https://i.imgur.com/uYMfOfh.png) - Cycle Property - ![](https://i.imgur.com/Q6qTTqC.png) - Pf: Exchange argument! (Similar to Cut Property) - ![](https://i.imgur.com/X3JjlP5.png) - Implementing MST Algorithms - Dijkstra’s Algorithm - ![](https://i.imgur.com/1sntc54.png) - Prim’s Algorithm - ![](https://i.imgur.com/E1KCtgl.png) - 只要看cost最小的tree edge就好，不用看從s出發的最小cost - Kruskal’s Algorithm - ![](https://i.imgur.com/EatJKbG.png) - 如果ei的兩個端點已經屬於同個tree，就捨棄ei - The Union-Find Data Structure - **Union-find** data structure represents **disjoint sets** - Disjoint sets: elements are disjoint - 每個set有一個代表 - Operations: - MakeUnionFind(S): 建立一個set - Find(u): 找u屬於哪個set，return該set的代表 - Union(A, B): 合併 A 跟 B，重新選擇新的代表 - Implementation: **disjoint-set forest** - 代表為root，連結由小孩指向父母 - Union: 把比較小的樹並到比較大的樹裡(union by rank) - Find: 找到之後直接把他直接指到root(path compression) - ![](https://i.imgur.com/1Po52A6.png) - Running time: union by rank + path compression - amortized running time per operation is O(α(n)), α(n) < 5 - amortized running time: 平均分攤的running time - Implementing Kruskal’s Algorithm - ![](https://i.imgur.com/L5pRoqw.png) - 每個subtree用disjoint set表示，一開始有n個set - Line 1: 排序(用heap) - O(*m*lg m) - Line 4: 只要find(u)不等於find(v)，就把ej放進T - 每條edge有兩端，find兩次 - 2m次 - Line 5: 把ej並到T裡 - tree最多n-1條edge - Comparison sort + simple disjoint set: O(m log n) - Linear sort + union-find: O(m α(m, n))

Read more

Untitled

Untitled

Graph