# Greedy Algorithms
## 3.1 Introduction to Greedy Algorithms
:::success
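- A greedy algorithm makes, at each step of solving a problem, the choice that looks best at that moment. In other words, it does not reason about the overall optimum (==global optimization==); each choice is only optimal in some local sense (==local optimization==). A greedy algorithm does not yield a globally optimal solution for every problem, but for a wide range of problems it produces either a globally optimal solution or a good approximation of one.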
:::
:::info
Some optimization problems can be solved by a very simple greedy algorithm, for example:
- Activity-selection problem: find a maximum independent set in an interval graph
- Huffman codes
- Minimum spanning tree problem: Kruskal's algorithm and Prim's algorithm (see Graph Algorithm)
- Fractional knapsack problem
:::
:::spoiler
:::success
The activity selection problem is a combinatorial ==optimization problem== concerning the ==selection of non-conflicting== activities to perform within a given time frame, given a set of activities each marked by a start time (s~i~) and finish time (f~i~). The problem is to select the maximum number of activities that can be performed by a single person or machine, assuming that a person can only work on a single activity at a time. The activity selection problem is also known as the ==Interval scheduling maximization problem== (ISMP), which is a special type of the more general ==Interval Scheduling== problem.
A classic application of this problem is in scheduling a room for multiple competing events, each having its own time requirements (start and end time), and many more arise within the framework of operations research.
:::info
**Formal definition**
Assume there exist n activities with each of them being represented by a start time s~i~ and finish time f~i~. Two activities i and j are said to be non-conflicting if s~i~ ≥ f~j~ or s~j~ ≥ f~i~. The activity selection problem consists in finding the maximal solution set (S) of non-conflicting activities, or more precisely there must exist no solution set S' such that |S'| > |S| in the case that multiple maximal solutions have equal sizes.
**Optimal solution**
The activity selection problem is notable in that using a greedy algorithm to find a solution will always result in an optimal solution. A pseudocode sketch of the iterative version of the algorithm and a proof of the optimality of its result are included below.
**Algorithm**
```
Greedy-Iterative-Activity-Selector(A, s, f):
    Sort A by finish times stored in f
    S = {A[1]}
    k = 1
    n = A.length
    for i = 2 to n:
        if s[i] ≥ f[k]:
            S = S U {A[i]}
            k = i
    return S
```
**Explanation**
`Greedy-Iterative-Activity-Selector(A, s, f)`: The algorithm has this name because it is first of all a greedy algorithm, and then it is iterative. There is also a recursive version of this greedy algorithm.
$A$ is an array containing the activities.
$s$ is an array containing the start times of the activities in $A$.
$f$ is an array containing the finish times of the activities in $A$.
Note that these arrays are indexed starting from 1 up to the length of the corresponding array.
`Sort A by finish times stored in f`: Sorts the array of activities $A$ in increasing order of finish time, using the finish times stored in the array $f$. This operation can be done in $O(n \cdot \log_2 n)$ time, using for example merge sort or heap sort (quicksort reaches this bound on average).
`S = {A[1]}`: Creates a set $S$ to store the selected activities, and initialises it with the activity $A[1]$, which has the earliest finish time.
`k = 1`: Creates a variable $k$ that keeps track of the index of the last selected activity.
`for i = 2 to n`: Iterates from the second element of the array $A$ up to its last element.
`if s[i] ≥ f[k]` and `S = S U {A[i]}`: If the start time $s[i]$ of the $i$-th activity ($A[i]$) is greater than or equal to the finish time $f[k]$ of the last selected activity ($A[k]$), then $A[i]$ is compatible with the activities already in the set $S$, and thus it can be added to $S$.
`k = i`: The index of the last selected activity is updated to the activity $A[i]$ that was just added.
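The iterative selector described above can be sketched in Python. This is a minimal sketch, not the note's own code: the function name `select_activities` and the representation of each activity as a `(start, finish)` tuple are assumptions made for illustration, and the sketch uses 0-based Python lists rather than the 1-based arrays of the pseudocode.

```python
from typing import List, Tuple

Activity = Tuple[int, int]  # (start, finish); a hypothetical representation


def select_activities(activities: List[Activity]) -> List[Activity]:
    """Return a maximum-size list of mutually compatible activities."""
    # Sort by finish time; this plays the role of sorting A by the array f.
    ordered = sorted(activities, key=lambda a: a[1])
    selected: List[Activity] = []
    last_finish = float("-inf")  # finish time of the last selected activity
    for start, finish in ordered:
        # Compatible if it starts no earlier than the last selected one ends.
        if start >= last_finish:
            selected.append((start, finish))
            last_finish = finish
    return selected


demo = [(1, 4), (3, 5), (0, 6), (5, 7), (3, 9), (5, 9),
        (6, 10), (8, 11), (8, 12), (2, 14), (12, 16)]
print(select_activities(demo))  # [(1, 4), (5, 7), (8, 11), (12, 16)]
```

Tracking only `last_finish` instead of the index `k` is a small simplification; the comparison performed is the same `s[i] ≥ f[k]` test.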
**Proof of optimality**
Let $S=\{1,2,\ldots,n\}$ be the set of activities ordered by finish time. Assume that $A\subseteq S$ is an optimal solution, also ordered by finish time, and that the index of the first activity in $A$ is $k\neq 1$, i.e., this optimal solution does not start with the greedy choice. We will show that $B=(A\setminus\{k\})\cup\{1\}$, which begins with the greedy choice (activity 1), is another optimal solution. Since $f_{1}\leq f_{k}$, and the activities in $A$ are disjoint by definition, the activities in $B$ are also disjoint. Since $B$ has the same number of activities as $A$, that is, $|A|=|B|$, $B$ is also optimal.
Once the greedy choice is made, the problem reduces to finding an optimal solution for the subproblem. If $A$ is an optimal solution to the original problem $S$ containing the greedy choice, then $A'=A\setminus\{1\}$ is an optimal solution to the activity-selection problem $S'=\{i\in S: s_{i}\geq f_{1}\}$.
Why? If this were not the case, pick a solution $B'$ to $S'$ with more activities than $A'$ containing the greedy choice for $S'$. Then, adding 1 to $B'$ would yield a feasible solution $B$ to $S$ with more activities than $A$, contradicting the optimality of $A$.
**Weighted activity selection problem**
The generalized version of the activity selection problem involves selecting an optimal set of non-overlapping activities such that the total weight is maximized. Unlike the unweighted version, there is no greedy solution to the weighted activity selection problem. However, a dynamic programming solution can readily be formed using the following approach:[1]
Consider an optimal solution containing activity $k$. We now have non-overlapping activities on the left and right of $k$. We can recursively find solutions for these two sets because of optimal sub-structure. As we don't know $k$, we can try each of the activities. This approach leads to an $O(n^{3})$ solution. This can be optimized further considering that for each set of activities in $(i,j)$, we can find the optimal solution if we had known the solution for $(i,t)$, where $t$ is the last non-overlapping interval with $j$ in $(i,j)$. This yields an $O(n^{2})$ solution. This can be further optimized considering the fact that we do not need to consider all ranges $(i,j)$ but instead just $(1,j)$. The following algorithm thus yields an $O(n\log n)$ solution:

    Weighted-Activity-Selection(S):  // S = list of activities
        sort S by finish time
        opt[0] = 0  // opt[j] represents optimal solution (sum of weights of selected activities) for S[1,2..,j]
        for i = 1 to n:
            t = binary search to find activity with finish time <= start time for i
                // if there is more than one such activity, choose the one with the last finish time
                // (t = 0 if there is none)
            opt[i] = MAX(opt[i-1], opt[t] + w(i))
        return opt[n]
:::
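The weighted dynamic program above can be sketched in Python. The function name `max_weight_schedule` and the `(start, finish, weight)` tuple layout are assumptions chosen for this example; the binary search uses `bisect_right` from the standard library on the sorted finish times.

```python
from bisect import bisect_right
from typing import List, Tuple

WeightedActivity = Tuple[int, int, int]  # (start, finish, weight); hypothetical layout


def max_weight_schedule(activities: List[WeightedActivity]) -> int:
    """Return the maximum total weight of mutually compatible activities."""
    ordered = sorted(activities, key=lambda a: a[1])  # sort by finish time
    finishes = [finish for _, finish, _ in ordered]
    # opt[j] = best total weight using only the first j activities.
    opt = [0] * (len(ordered) + 1)
    for i, (start, _finish, weight) in enumerate(ordered, start=1):
        # t = how many of the first i-1 activities finish no later than `start`.
        t = bisect_right(finishes, start, 0, i - 1)
        # Either skip activity i, or take it together with the best of the first t.
        opt[i] = max(opt[i - 1], opt[t] + weight)
    return opt[-1]


demo = [(1, 3, 5), (2, 5, 6), (4, 6, 5), (6, 7, 4), (5, 8, 11), (7, 9, 2)]
print(max_weight_schedule(demo))  # 17, from (2, 5, 6) and (5, 8, 11)
```

Sorting and the `n` binary searches each cost $O(n\log n)$, which matches the bound stated for the pseudocode.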
:::info
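For some problems a greedy algorithm is not guaranteed to find the global optimum, yet it efficiently finds a reasonably good approximate solution (it is often used as a heuristic to approximate NP-hard problems):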
- greedy set-covering algorithm (heuristic)
- Approximate-Subset-Sum problem (Knapsack-problem)
:::
:::danger
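[Supplement] For a hereditary family of subsets (E, U), the greedy algorithm returns a maximum-weight member of U for every non-negative weight function on E if and only if (E, U) has the mathematical structure called a matroid. In that sense, much of the correctness of greedy algorithms comes from matroids, although some greedy-solvable problems (such as activity selection and Huffman coding) are not matroid problems.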
:::
:::warning
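A matroid on a finite set E is a pair (E, U), where U is a family of subsets of E that satisfies the following conditions:
- $\emptyset \in U$
- $A \subseteq B,\ B \in U \Rightarrow A \in U$
- $A, B \in U$ and $|A| < |B| \Rightarrow \exists x \in B \setminus A$ such that $A \cup \{x\} \in U$

Here $|A|$ is the cardinality of the set $A$. For example, if $A = \{a, b, c\}$, then $|A| = 3$.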
:::
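To make the three conditions concrete, here is a brute-force check in Python for tiny examples. It is only an illustrative sketch: the function name `is_matroid` and the encoding of U as a set of `frozenset` objects are assumptions, and the running time is exponential, so it is not meant for real inputs.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, Set


def is_matroid(ground: Iterable[int], family: Set[FrozenSet[int]]) -> bool:
    """Check the three matroid conditions for the pair (ground, family)."""
    ground_set = frozenset(ground)
    # Every member of the family must be a subset of the ground set.
    if any(not member <= ground_set for member in family):
        return False
    # Condition 1: the empty set belongs to the family.
    if frozenset() not in family:
        return False
    # Condition 2: every proper subset of a member is also a member.
    for b in family:
        for r in range(len(b)):
            for sub in combinations(b, r):
                if frozenset(sub) not in family:
                    return False
    # Condition 3: a smaller member can always be extended from a larger one.
    for a in family:
        for b in family:
            if len(a) < len(b) and not any(a | {x} in family for x in b - a):
                return False
    return True


# All subsets of {1, 2, 3} with at most 2 elements (a uniform matroid).
uniform = {frozenset(c) for r in range(3) for c in combinations([1, 2, 3], r)}
print(is_matroid([1, 2, 3], uniform))  # True

# Hereditary, but {3} cannot be extended using an element of {1, 2}.
broken = {frozenset(), frozenset({1}), frozenset({2}), frozenset({3}), frozenset({1, 2})}
print(is_matroid([1, 2, 3], broken))  # False
```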
## 3.2 Classic Greedy Problems
:::success
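The activity-selection problem
Suppose n activities request the same resource, and the resource can serve at most one activity at any point in time. The task is to choose, from these n activities, a largest possible set of activities that can all be carried out using the resource.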
:::
:::info
- Formal problem statement: We are given n activities a~1~,...,a~n~, where the i-th activity a~i~=[s~i~,f~i~) starts at time s~i~ and finishes at time f~i~. They require the same resource (for example, one lecture hall). Two activities a~i~ and a~j~ are compatible if [s~i~,f~i~)∩[s~j~,f~j~)=∅. The activity-selection problem (ASP) is to select a maximum-size subset of mutually compatible activities.
:::
:::warning
- Solution: Greedy choice. Select an activity with minimum f~i~. Remove it and all activities incompatible with it. Continue until no activities are left.
- Complexity: O(n log n), dominated by the sorting step.
:::info
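Background
Representing the conflicting (overlapping) activities as a graph gives an `Interval Graph`, which has very special mathematical properties. The scheduling problem is equivalent to finding a maximum independent set in the interval graph.
Finding a maximum independent set in a general graph is an NP-hard problem.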
:::
:::success
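Huffman coding (Huffman code)
Huffman coding is an entropy-coding (weighted-coding) algorithm used for lossless data compression. Letters that occur with high probability get shorter codewords, and letters that occur with low probability get longer ones. This lowers the expected (average) length of the encoded string, which is how lossless compression is achieved.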
:::info
Formal problem statement: Optimal code problem. Given an alphabet C of n characters and the frequency of each character, find an optimal prefix code so that the compressed file has minimum length. The Huffman algorithm constructs the optimal tree T; the characters are the leaves of T.
:::
:::warning
- Solution: repeatedly select the two vertices x and y of lowest frequency and replace them by a vertex z, so that x and y are the children of z and the frequency of z is the sum of the frequencies of x and y.
- Complexity: use a binary min-heap (for an introduction to heaps, see the data-structures notes)
  - Building the heap takes O(n) time
  - Each ExtractMin() and Insert() takes O(log n) time, and this step is repeated n - 1 times
  - The whole algorithm therefore takes O(n log n) time

:::
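The merge step and the min-heap analysis above can be sketched with Python's `heapq` module. This is a minimal sketch under stated assumptions: the function name `huffman_codes`, the use of nested tuples as tree nodes, and the `itertools.count` tie-breaker are illustrative choices, not part of the note.

```python
import heapq
from itertools import count
from typing import Dict


def huffman_codes(freq: Dict[str, int]) -> Dict[str, str]:
    """Build a prefix code (symbol -> bit string) with Huffman's greedy rule."""
    if not freq:
        return {}
    tie = count()  # tie-breaker so the heap never has to compare tree nodes
    # Heap entries are (frequency, tie-breaker, tree); a tree is either a
    # symbol (leaf) or a (left, right) tuple (internal vertex z).
    heap = [(f, next(tie), symbol) for symbol, f in freq.items()]
    heapq.heapify(heap)  # O(n)
    while len(heap) > 1:
        fx, _, x = heapq.heappop(heap)  # lowest frequency
        fy, _, y = heapq.heappop(heap)  # second lowest frequency
        heapq.heappush(heap, (fx + fy, next(tie), (x, y)))  # merged vertex z
    codes: Dict[str, str] = {}

    def walk(node, prefix: str) -> None:
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            codes[node] = prefix or "0"  # alphabet with a single symbol

    walk(heap[0][2], "")
    return codes


freq = {"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}
codes = huffman_codes(freq)
print(sum(freq[s] * len(codes[s]) for s in freq))  # 224
```

The exact bit strings depend on how ties and left/right children are ordered, so the example prints only the total encoded length, which is the quantity the algorithm minimizes.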
## 3.3 Minimum Spanning Trees
### Spanning Tree
:::success
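- Definition
  - Let G = (V, E). S = (V, F) is a spanning tree of G if S satisfies:
    - Adding any edge of F′ (the edges of G that are not in F) to S necessarily forms a cycle
    - Between every pair of vertices in S there is exactly one simple path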
:::info
- Properties
  - If G is not connected, G has no spanning tree
  - In general, G has more than one spanning tree
  - Two different spanning trees of the same G need not have any edge in common
:::
### Minimum Spanning Tree
- Definition
  - Among all spanning trees of G, the one with the smallest total edge cost.

The following algorithms compute an MST.
#### Prim’s Algorithm:
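- Start from a single vertex and maintain the cut (U, V-U), much like Dijkstra: among all edges that join a vertex already in the tree to a vertex outside it, pick the one with minimum weight, never allowing a cycle. A simple array implementation takes $O(n^2)$; using a heap improves this to $O(E \lg V)$.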
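A heap-based version of Prim's algorithm might look like the following Python sketch. The adjacency-list type `Graph`, the function name `prim_mst`, and the "lazy deletion" of stale heap entries are assumptions made for this example; the sketch also assumes an undirected graph in which every edge is listed in both directions.

```python
import heapq
from typing import Dict, List, Tuple

Graph = Dict[str, List[Tuple[str, int]]]  # vertex -> [(neighbor, weight)]; hypothetical type


def prim_mst(graph: Graph, root: str) -> Tuple[int, List[Tuple[str, str, int]]]:
    """Return (total weight, tree edges) of the MST of root's component."""
    visited = {root}  # the set U of vertices already in the tree
    heap = [(w, root, v) for v, w in graph[root]]
    heapq.heapify(heap)
    total = 0
    tree: List[Tuple[str, str, int]] = []
    while heap and len(visited) < len(graph):
        w, u, v = heapq.heappop(heap)  # lightest edge leaving U seen so far
        if v in visited:
            continue  # stale entry: this edge would close a cycle
        visited.add(v)
        total += w
        tree.append((u, v, w))
        for nxt, nw in graph[v]:
            if nxt not in visited:
                heapq.heappush(heap, (nw, v, nxt))
    return total, tree


demo: Graph = {
    "a": [("b", 1), ("c", 3)],
    "b": [("a", 1), ("c", 2), ("d", 5)],
    "c": [("a", 3), ("b", 2), ("d", 4)],
    "d": [("c", 4), ("b", 5)],
}
print(prim_mst(demo, "a"))  # (7, [('a', 'b', 1), ('b', 'c', 2), ('c', 'd', 4)])
```

Each edge is pushed at most twice, so the heap work is $O(E \log E) = O(E \log V)$.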
#### Sollin’s Algorithm:
- Implemented as a sequence of merges: trees in the forest are merged until only one tree remains
**All of these are greedy algorithms.**
#### Kruskal’s Algorithm
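- Concept
  - Kruskal's algorithm finds a minimum spanning tree and is an application of the greedy method. It works by adding edges: first sort all edges by weight, then add edges in order of increasing weight, discarding any edge that would create a cycle, until n - 1 edges have been added (for a graph with n vertices). Kruskal's algorithm also works when the graph contains edges of equal weight.
- Steps
  - Selection: among all edges of G not yet considered, pick the one with minimum weight **(no cycle allowed)**.
  - Repeat until either of the following holds:
    - (n-1) edges have been picked // n is the number of vertices
    - no edge is left to pick; if |F| < n-1, there is no spanning tree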
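- Complexity
  - Building a heap of the edge weights: O(E)
  - Extracting and deleting the minimum-weight edge (u,v): O(log E)
  - Each extracted edge must be tested for forming a cycle, using disjoint-set operations: vertices in the same tree belong to the same set
    - union(u,v): O(1)
    - Find(u): O(α(m,n)), which is nearly O(1)
  - Each round costs at most O(log E)
  - Hence the complexity of Kruskal's algorithm is O(E) + O(E log E) = O(E log E)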
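- Proof
  - The technique is cut-and-paste (an exchange argument)
  - Suppose the minimum-weight edge e is not in a minimum spanning tree T
  - Adding e to T creates a cycle
  - Removing from that cycle an edge heavier than e yields a spanning tree T' of smaller weight, a contradiction. (If every other edge on the cycle has the same weight as e, the swap gives another minimum spanning tree that contains e.)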
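Kruskal's algorithm with disjoint sets can be sketched in Python as follows. This is a sketch under stated assumptions: vertices are numbered `0..n-1`, edges are `(weight, u, v)` tuples, and the function name `kruskal_mst` is hypothetical. It sorts the edge list instead of using the heap from the complexity analysis, which gives the same O(E log E) bound.

```python
from typing import List, Tuple

Edge = Tuple[int, int, int]  # (weight, u, v); hypothetical layout


def kruskal_mst(n: int, edges: List[Edge]) -> Tuple[int, List[Edge]]:
    """Return (total weight, chosen edges); fewer than n-1 edges means no spanning tree."""
    parent = list(range(n))  # disjoint-set forest
    rank = [0] * n

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a: int, b: int) -> bool:
        ra, rb = find(a), find(b)
        if ra == rb:
            return False  # same set: the edge would form a cycle
        if rank[ra] < rank[rb]:
            ra, rb = rb, ra
        parent[rb] = ra  # attach the shallower tree under the deeper one
        if rank[ra] == rank[rb]:
            rank[ra] += 1
        return True

    total = 0
    tree: List[Edge] = []
    for w, u, v in sorted(edges):  # increasing weight
        if union(u, v):
            total += w
            tree.append((w, u, v))
            if len(tree) == n - 1:
                break  # n - 1 edges picked
    return total, tree


demo = [(1, 0, 1), (2, 1, 2), (3, 0, 2), (4, 2, 3), (5, 1, 3)]
print(kruskal_mst(4, demo))  # (7, [(1, 0, 1), (2, 1, 2), (4, 2, 3)])
```

If the returned list holds fewer than `n - 1` edges, the graph is disconnected, which corresponds to the "|F| < n-1" stopping case in the steps.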