# Greedy Algorithm
## Concept
- builds up a solution in small steps, **choosing a decision at each step myopically** to optimize some underlying criterion
- Intuitive and fast
- Usually not optimal
- prove optimality
1. The greedy algorithm stays ahead.
2. An exchange argument.
## Interval Scheduling Problem
- Given: Set of requests {1, 2, ..., n}, ith request corresponds to an interval with start time s(i) and finish time f(i)
- interval i: $[s(i), f(i))$
- Find a compatible subset of requests of maximum size
- compatible: 不重疊
- maximum: optimal
- Repeat
- Use a simple rule to select a first request i1
- Once i<sub>1</sub> is selected, reject all requests incompatible with i<sub>1</sub>.
- ![](https://i.imgur.com/MDwxz2i.png)
- ![](https://i.imgur.com/MCIle6y.png)
- A is a compatible set of requests because of line 5
- prove optimality
- greedy algorithm **stays ahead**
- 假設有最佳解O,證明|A| = |O| (不一定要完全相同,只要cardinality一樣就好)
- First lemma: ![](https://i.imgur.com/6VTj6jl.png)
- 還不知道k跟m誰大誰小
- proof by induciton:
- Basis step: for r = 1, f(i<sub>1</sub>) ≤ f(j<sub>1</sub>)
- 因為i<sub>1</sub>必是所有人中結束時間最早的,而j<sub>1</sub>只是從O這個subset排序出來結束時間最早的
- 假設r-1時成立: f(i<sub>r-1</sub>) ≤ f(j<sub>r-1</sub>) -- (1)
- 因為O is compatible${\rightarrow}$f(j<sub>r-1</sub>) ≤ s(j<sub>r</sub>) -- (2)
- (1) + (2) ${\rightarrow}$ f(i<sub>r-1</sub>) ≤ s(j<sub>r</sub>) ${\rightarrow}$ j<sub>r</sub>跟i<sub>r-1</sub>不重疊 ${\rightarrow}$ j<sub>r</sub> ${\in}$ R after i<sub>r-1</sub> is selected in line 5
- According to line 3,f(i<sub>r</sub>) ≤ f(j<sub>r</sub>)
- j<sub>r</sub>跟i<sub>r</sub>都在R裡(因為都跟i<sub>r-1</sub> compatible),但greedy最後選擇i<sub>r</sub>,所以f(i<sub>r</sub>)<=f(j<sub>r</sub>)
- ![](https://i.imgur.com/wfVGeGt.png)
- small conclusion: greedy結果的每個finish time都小於O的finish time
- The greedy algorithm returns an optimal set A
- Proof by contradiction
- 假設A不是最佳解${\rightarrow}$|O|= m > k = |A|
- 因為f(i<sub>k</sub>)≤f(j<sub>k</sub>)而且|O| > |A|${\rightarrow}$O裡至少有j<sub>k+1</sub>
- ![](https://i.imgur.com/0ffJ1JM.png)
- f(i<sub>k</sub>) ≤ f(j<sub>k</sub>) ≤ f(j<sub>k+1</sub>)${\rightarrow}$j<sub>k+1</sub>跟i<sub>k</sub> compatible${\rightarrow}$j<sub>k+1</sub>${\in}$R
- greedy只有在R為空時才停止,但是卻在i<sub>k</sub>就停止了,矛盾
- Running time
- O(n<sup>2</sup>):
- Initialization:
- O(n log n): sort R in ascending order of f(i) (根據finish time排序)
- line 5:
- 第一次檢查(n-1)個點跟i是否compatible,第二次(n-2)....總共O(n<sup>2</sup>)
- O(n log n):
- Initialization:
- O(n log n): sort R in ascending order of f(i) (根據finish time排序)
- O(n): construct S, S[i] = s(i)
- 記錄所有人的start time
- Lines 3 and 5:
- O(n): 只需掃過R一次
- 檢查compatible: 檢查f(i)是否小於s(j)
- 只檢查下一個(j=i+1)是否跟i compatible,其餘的交給後面檢查
## Interval partitioning
- multiple resources
- use as few resources as possible
- a.k.a. the interval coloring problem: one resource = one color
- Given: Set of requests {1, 2, ..., n}, ith request corresponds an interval with start time s(i) and finish time f(i)
- interval i: $[s(i), f(i))$
- Goal: Partition these requests into a **minimum** number of **compatible** subsets, each corresponds to one resource
- 用最少資源跑完所有工作
- depth: a set of intervals is the maximum number that pass over any single point on the time-line.
- 任意時刻需要的最大資源數(定義上不等於總共所需資源數)
- Lemma: 在interval partitioning的問題下,總共所需資源至少為depth (lower bound)
- proof:
- 在產生depth的那個時間點,有d個intervalI<sub>1</sub>, ..., I<sub>d</sub>都經過該時間點
- 這d個interval都需要resources,所以至少需要d個這d個interval都需要resources
- Use only d resources (depth) to schedule all requests
- prove the optimality
1. 找到一個所有解都必須符合的bound
2. 證明該演算法總是可以達到bound的結果
- Greedy Algorithm
- ![](https://i.imgur.com/FeQjgeg.png)
1. 把interval根據starting time排序
2. 對1到n中第j個interval:
3. ${\ \ \ \ }$把所有跟j不compatible的label排除
4. ${\ \ \ \ }$如果能從{1, 2, ..., d}中找到沒被排除的label
5. ${\ \ \ \ \ \ \ \ }$把label assign給j
6. ${\ \ \ \ }$否則不label
- Lines 3~5 implementation: 找到compatible with j的resource並label
- 紀錄f(label<sub>i</sub>): label<sub>i</sub>的最後一個interval的finish time
- check compatibility: f(label<sub>i</sub>) ≤ s(I<sub>j</sub>)
- 找出depth:
- ![](https://i.imgur.com/SAkUH3I.png)
1. 把finish time跟start time一起排序
- 若有finish time跟start time剛好切在同個時間點,則finish time優先
2. 從頭掃到尾,resource遇到start time表示使用資源就加1,遇到finish time表示釋放資源就減1,紀錄resource的max值即為depth
- Optimality
- The greedy algorithm assigns every interval a label, and no two overlapping intervals receive the same label.
- proof:
1. 所有interval都會得到label(line 6不會發生)
- 假設interval I<sub>j</sub>跟前面的t個interval overlap${\rightarrow}$t+1個interval會重疊於s(j)
- t+1 ≤ depth ${\rightarrow}$t ≤ depth - 1${\rightarrow}$至少還剩一個label可以assign給I<sub>j</sub>
2. 兩個overlap的interval不會得到相同的label
- 假設I<sub>i</sub>跟I<sub>j</sub> overlap,i < j
- 當for迴圈執行到I<sub>j</sub>時,I<sub>i</sub>在line 3會被exclude(因為兩個overlap)
- 所以不可能I<sub>i</sub>跟I<sub>j</sub>有相同label
- 因為greedy只需要depth數量的label,符合lower bound${\rightarrow}$ Optimal!
- Interval graph (Intersection graph)
- 每個interval對應一個node,若兩個interval overlap,則用edge連接兩個node
- 相鄰的兩個點表示其overlap,必須assign不同顏色(resource),可轉換成coloring問題
## Scheduling to Minimize Lateness
- Given: A single resource is available starting at time s. A set of requests {1, 2, ..., n}, request i requires a contiguous interval of length t<sub>i</sub> and has a **deadline** d<sub>i</sub>
- 所需作業時間及期限
- Goal: Schedule all requests without overlapping so as to minimize the maximum lateness
- Lateness: l<sub>i</sub> = max{0, f(i) – d<sub>i</sub>}.
- 遲交時間最小化
- ![](https://i.imgur.com/PYEwHO9.png)
1. 所需時間短的不一定緊急
2. 餘裕最小(越緊急的越先做): 若所需時間長,可能使其他工作遲交的時間太長
- ![](https://i.imgur.com/8n10ogQ.png)
- no idle time
- ![](https://i.imgur.com/tn5JTX8.png)
- optimal的解如果中間有idle time,則idle time可被壓縮掉,不影響optimality
- **Exchange argument**: Gradually transform an optimal solution to the one found by the greedy algorithm without hurting its quality.
- 對一個optimal solution,把其中不符合greedy rule的部分換成greedy的形式,仍能保持optimality
- **inversion (倒轉)**: An inversion in schedule S is a pair of requests i and j such that s(i) < s(j) but d<sub>j</sub> < d<sub>i</sub>
- 比較早開始但是比較晚才要交
- Lemma 1: All schedules without inversions and without idle time have the same maximum lateness.
- proof:
- ![](https://i.imgur.com/Y95zvT7.png)
- 如果兩種schedule沒有晚做但是要早交的情況,且中間沒有idle time,則這兩種schedule皆以deadline順序做排程,且只會在deadline相同的request上有不同的排序
- 因為deadline時間相同,且所有相同deadline的request所需時間總和相同,所以lateness不會因schedule的順序改變
- Lemma 2: There is an optimal schedule with no inversions and no idle time
- ![](https://i.imgur.com/7yCltgT.png)
1. 如果有inversion發生在job i 跟 j,假設j被排在i之後,且d<sub>j</sub> < d<sub>i</sub>
2. 把 i 跟 j 互換可以消除inversion
3. 因為d<sub>j</sub> < d<sub>i</sub>,所以把j換到i前面不可能增加lateness (甚至可能減少lateness),可維持optimality
- 即使不相鄰也能有相同結論
- Theorem: The greedy schedule S is optimal.
- proof by contradiction:
- 假設O 為有inversion的optimal solution
- 假設O沒有idle time (idle time可被壓縮)
- 若O沒有inversion,則S = O (因為greedy的結果即為沒有inversion沒有idle time,又根據Lemma 1,可推得S 跟 O 有相同maximun lateness)
- 若O有inversion,則當我們把inversion pair互換時,maximun lateness又可能變更小,與optimal的前提矛盾
## Shortest Paths
- Given:
- Directed graph G = (V, E)
- Length l<sub>e</sub> = length of edge e = (u, v) ${\in}$ E
- 成本(距離、時間) ≥ 0
- Source s
- Goal:
- 從s到V中其他點的最短路徑P<sub>v</sub>
- Length of path: path經過的length總和
- ![](https://i.imgur.com/i5X2TFf.png)
- 檢查所有從S裡的node u到不在S裡的node v的最小距離d(u)+l<sub>e</sub>,把距離最小的node v放進S,更新d(v)
- Correctness
- ![](https://i.imgur.com/cNUJVc9.png)
- 一旦node u被放進S,d(u)即為最短距離,不會再因放入新的node改變
- Proof by induction on |S|:
- Basis step: S只有s在裡面,d(s)=0
- Inductive step:
- 假設|S| = k ≥ 1時敘述為真
- 令下一個要被放進S的node為v,(u, v)為從s到v的最短路徑P<sub>v</sub>的最後一段edge,P<sub>v</sub> = P<sub>u</sub> + (u, v)
- 根據假設,因u已經在S內,P<sub>u</sub>為s到u的最短路徑
- 考慮其他從s到v路徑的可能路徑P,此路徑必會跨越S與不是S的點,另x為交界處S內的點,y為交界處S外的點
- 如果沒有y,這輪應該選擇從x出發而非u
- 因為line 4會選擇d'(v) = (d(u)+l<sub>e</sub>)最短的v點放入S,所以P不可能比P<sub>v</sub>還短
- 若
- ![](https://i.imgur.com/vFbTzmz.png)
- d'(v)就已經比d(y)小了,且l<sub>e''=(y,v)</sub> ≥ 0
- 所以新的距離d(v)也是最短的距離
- Dijkstra重要設定: l<sub>e</sub> ≥ 0
- ![](https://i.imgur.com/eV7RMAq.png)
- Implementation
- ![](https://i.imgur.com/9pdMY0g.png)
- ![](https://i.imgur.com/kJfzdVL.png)
- 所有node放在min priority queue裡,每輪更新看得到的node的cost,每次從min priority queue中deque最小的node放進S
## Recap Heaps: Priority Queues
- Priority Queue
- Each element has a priority (key)
- Only the element with highest (or lowest) priority can be deleted
- Max priority queue, or min priority queue
- An element with arbitrary priority can be inserted into the queue at any time
- ![](https://i.imgur.com/vvpoWA8.png)
- Heap
- A max heap is
- 爸媽的key比小孩大的tree
- root的key最大
– A complete binary tree
- Implementation
- ![](https://i.imgur.com/lOCJabD.png)
- Insertion into a Max Heap
- ![](https://i.imgur.com/eMWoKeF.png)
- 把新node先放在尾端,如果新node的key比他爸媽的key大,把新node跟他爸媽交換(Bubble up)
- Time complexity: O(lg n)
- Deletion from a Max Heap
- ![](https://i.imgur.com/VPt34b5.png)
- 把root拿走(key最大),把最尾端的node放到root,如果這個node比他的子孫小,把他跟他子孫交換(trickle down)
- Time complexity: O(lg n)
- Max Heapify
- 從下面往上修正
- ![](https://i.imgur.com/hMWKfZ4.png)
- 數學歸納法,假設小孩已經是heap
- ![](https://i.imgur.com/igPy4oD.png)
- Time complexity: O(*n*lg n)
## Minimum Spanning Trees
- Given: Undirected graph G = (V, E)
- cost c<sub>e</sub> for each edge e = (u, v) ${\in}$ V
- Goal:
- 找到一個edges的子集 T ⊆ E 使得:
- 連結graph中所有點
- 總成本最小
- (V, T)會是什麼?
1. (V, T)一定要連起來
2. (V, T)沒有cycle
- 如果有cycle,拿掉cycle中某條edge,cycle中的點仍然保持連結,且cost減少
- (V, T)是tree
- Greedy Algorithms
- Kruskal's algorithm
1. T = {}
2. 把edge的cost從低到高排列
3. 依序把edge e放進T,但如果e放進T會使T出現cycle,則捨棄
- Prim's algorithm: (c.f. Dijkstra’s algorithm)
1. 從任意點出發(因為最後T一定會連接所有點)
2. 每次把從T看出去cost最小的edge放入T
- Reverse-delete algorithm: (reverse of Kruskal’s algorithm)
1. T = E
2. 把edge的cost從高到低排列
3. 依序把edge e從T拿掉,但如果e從T拿掉會破壞連結性(出現孤島),則不拿掉e
- Cut Property
- Optimality of **Kruskal’s** & **Prim’s algorithms**
- 假設所有cost c<sub>e</sub> 都不一樣
- ![](https://i.imgur.com/ZNgGN9x.png)
- 從所有點中劃分出一個subset,subset交界的edge中cost最小的edge e是safe edge,所有MST都應該包含e
- Wrong Pf: Exchange argument
- T為一個任意的spanning tree而且沒有包含e,希望證明T不是MST
- T為spanning tree,T一定至少有一條edge通過subset交界
- 因為e是交界上cost最小的edge,c<sub>e</sub> < c<sub>f</sub>
- 所以T – {f} + {e}有更小的cost
- 但是T – {f} + {e}還是spanning tree嗎?
- Spanning tree: connected & acyclic
- maybe not connected
- 需要滿足spanning tree的條件
- Modified Pf: Exchange argument
- ![](https://i.imgur.com/ob4y5Dg.png)
- T為一個任意的spanning tree而且沒有包含e,希望證明T不是MST
- T為spanning tree,T一定至少有一條edge e'通過subset交界,e'=(v',w'),v' ${\in}$ S and w' ${\in}$ V–S
- T' = T – {e'} + {e}是spanning tree
- (V, T') is connected
- 因為T為spanning tree,所以S內所有點都相通,S外所有點也相通,交界處的e'被拿掉只有使內外斷絕連結,所以可以繞路改由e相通
- (V, T') must be acyclic
- 因為T為spanning tree,所以S內S外都沒有cycle,只有可能在交界處增加e時會產生cycle,但是因為我們同時拿掉了e',所以不會有cycle
- c<sub>e</sub> < c<sub>e'</sub>,所以T'的cost比T小
- ![](https://i.imgur.com/uYMfOfh.png)
- Cycle Property
- ![](https://i.imgur.com/Q6qTTqC.png)
- Pf: Exchange argument! (Similar to Cut Property)
- ![](https://i.imgur.com/X3JjlP5.png)
- Implementing MST Algorithms
- Dijkstra’s Algorithm
- ![](https://i.imgur.com/1sntc54.png)
- Prim’s Algorithm
- ![](https://i.imgur.com/E1KCtgl.png)
- 只要看cost最小的tree edge就好,不用看從s出發的最小cost
- Kruskal’s Algorithm
- ![](https://i.imgur.com/EatJKbG.png)
- 如果e<sub>i</sub>的兩個端點已經屬於同個tree,就捨棄e<sub>i</sub>
- The Union-Find Data Structure
- **Union-find** data structure represents **disjoint sets**
- Disjoint sets: elements are disjoint
- 每個set有一個代表
- Operations:
- MakeUnionFind(S): 建立一個set
- Find(u): 找u屬於哪個set,return該set的代表
- Union(A, B): 合併 A 跟 B,重新選擇新的代表
- Implementation: **disjoint-set forest**
- 代表為root,連結由小孩指向父母
- Union: 把比較小的樹並到比較大的樹裡(union by rank)
- Find: 找到之後直接把他直接指到root(path compression)
- ![](https://i.imgur.com/1Po52A6.png)
- Running time: union by rank + path compression
- amortized running time per operation is O(α(n)), α(n) < 5
- amortized running time: 平均分攤的running time
- Implementing Kruskal’s Algorithm
- ![](https://i.imgur.com/L5pRoqw.png)
- 每個subtree用disjoint set表示,一開始有n個set
- Line 1: 排序(用heap)
- O(*m*lg m)
- Line 4: 只要find(u)不等於find(v),就把e<sub>j</sub>放進T
- 每條edge有兩端,find兩次
- 2m次
- Line 5: 把e<sub>j</sub>並到T裡
- tree最多n-1條edge
- Comparison sort + simple disjoint set: O(m log n)
- Linear sort + union-find: O(m α(m, n))