# Algorithm [Note] - Midterm
> [name=D]
## Ch1 - Introduction to Algorithm
### Analysis of algorithms
- Measure the ==goodness of algorithms==
- efficiency
- upper bound (worst case)
- In the worst case, the **maximum** time / number of steps the problem requires
- Measure the ==difficulty of problems==
- NP-complete
- Denotes **hard problems** for which no efficient solution is currently known ("efficient" meaning the running time does not grow explosively as the problem size increases)
- lower bound
- The **minimum** time / number of steps that solving the problem requires, no matter which algorithm is used
- Is the algorithm optimal ?
Based on the two measures above, can we confirm that the algorithm is truly the best, fully optimized solution?
### Problems
- `0/1 Knapsack problem` -> `NP-complete`
- `Traveling salesperson problem` -> `NP-complete`
- `Partition problem` -> `NP-complete`
- `Art gallery problem` -> `NP-complete`
- `Minimum spanning tree` -> `greedy`, `divide-and-conquer`
- `Convex hull` -> `divide-and-conquer`
- `One-center problem` -> `prune-and-search`
## Ch2 - The Complexity of Algorithms and the Lower Bounds of Problems
### The time complexity of an Algo
- An algorithm's complexity cannot be determined by timing a program
- The same program takes different amounts of time under different OSes / machines
- `Big-O` : the upper bound of the algorithm's time function
- `|f(n)| <= c|g(n)|` for all `n >= n0`, then `f(n) = O(g(n))`
- Even in the worst case, the running time does not exceed the Big-O bound
- `O(1)` < `O(log n)` < `O(n)` < `O(n log n)`
- `O(n`^2^`)` < `O(n`^3^`)` < `O(n`^k^`)`
- `O(2`^n^`)` < `O(3`^n^`)` < `O(n!)`
- `Omega(Ω)` : the lower bound of the algorithm's time function
- `|f(n)| >= c|g(n)|` for all `n >= n0`, then `f(n) = Ω(g(n))`
- Even in the best case, the running time is not below the Omega bound (never faster than Omega)
- `O(1)` -> the time required is unrelated to the problem size; this does not mean the cost is a fixed constant, only that it never exceeds some constant
- Is an algorithm with larger complexity `O(n`^2^`)` always slower than one with smaller complexity `O(n)`?
- Not necessarily: `Big-O` describes the growth trend of the running time, i.e. the cost when the problem size is large
- So when the problem size is small, judging by `Big-O` alone is not accurate
### the best-, average-, worst-case analysis of Algo
- Relative difficulty of the `best case`, `average case`, and `worst case` analyses
- `best case`(easy) < `worst case` < `average case`(hard)
- The examples concerning the three cases analysis
- `straight insertion sort`
- Method: take a number from the unsorted part (its first element), compare it with the numbers in the sorted part, scanning from the last sorted element back to the first, and insert it into its correct position in the sorted part
- best case : already sorted
$$x = 2(n-1) = O(n)$$
- No inner-loop movement is needed; only the outer loop runs (two steps per iteration)
- worst case : reversely sorted
$$x = \sum_{i=1}^{n-1}(2+d_i) = 2(n-1) + \frac{n(n-1)}{2} = O(n^2)$$
- average case :
$$x = \sum_{j=2}^{n}\frac{j+3}{2} = \frac{(n+8)(n-1)}{4} = O(n^2)$$
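The case analysis above can be checked with a short Python sketch (my illustration, not from the course notes); the `moves` counter records the inner-loop data movement that separates the best case from the worst:

```python
def insertion_sort(a):
    """Straight insertion sort; returns (sorted list, number of element moves)."""
    a = list(a)
    moves = 0
    for i in range(1, len(a)):          # outer loop: grow the sorted prefix
        key = a[i]                      # first element of the unsorted part
        j = i - 1
        while j >= 0 and a[j] > key:    # scan the sorted part from its end
            a[j + 1] = a[j]             # shift larger elements one slot right
            moves += 1
            j -= 1
        a[j + 1] = key                  # insert key at its correct position
    return a, moves
```

An already-sorted input reports 0 moves (best case), while a reversed input reports n(n-1)/2 moves (worst case).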
- `binary search` 二分搜
- best case : the target is the middle element
- 1 step (found on the first comparison) = ==O(1)==
- worst case : the target is the smallest / largest element
- (log~2~n + 1) steps = ==O(log n)==
- average case :
- Put the sequence into a binary tree (root = the middle element); an element's depth in this tree is its number of comparisons, e.g. for `1,2,3,4,5,6,7` the root is 4
- Averaging must consider every case: there are 2^t-1^ elements at depth t, for t = 1, 2, ..., log n + 1
- a successful search ending at depth `i` costs `i` comparisons; an unsuccessful one costs `i+1`
- k = log n + 1
$$A(n) = \frac{1}{2n+1}\left(\sum_{i=1}^k i\times 2^{i-1}+k(n+1)\right) = O(\log n)$$
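A minimal Python sketch of the search being analysed (the `probes` counter is my addition, used to expose the best and worst cases):

```python
def binary_search(a, target):
    """Binary search on a sorted list; returns (index or -1, number of probes)."""
    lo, hi, probes = 0, len(a) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2          # compare against the middle element
        probes += 1
        if a[mid] == target:
            return mid, probes        # successful search
        if a[mid] < target:
            lo = mid + 1              # continue in the right half
        else:
            hi = mid - 1              # continue in the left half
    return -1, probes                 # unsuccessful search
```

On `1..7`, the middle value 4 is found in 1 probe (best case), while 1 needs log~2~7 + 1 = 3 probes (worst case).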
- `straight selection sort`
- Method: in each round, pick the smallest number in the unsorted part and move it to the front of the unsorted part (equivalently, swap it with the first unsorted element), so that the prefix of the list is always sorted
- Which operation (comparison vs. change of flag) should be counted to analyse this algorithm? Selection sort does little data movement; most of the work is comparing during the scan and updating the flag that marks the current minimum. Since the number of comparisons is fixed ( n(n-1)/2 ), we ==analyse the number of changes of the flag==
- best case : already sorted
- No change of flag is needed in any round = ==O(1)==
- worst case : reversely sorted
- There are n rounds, and each round's scan can change the flag up to n-1 times = n(n-1) = ==O(n^2^)==
- average case
- ~~Uh, as for how to prove this... I... I don't think it will be on the exam, and it looks really complicated...~~
- ==O(n log n)==
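A Python sketch of the flag-counting view (my illustration): the comparison count is fixed, so only the updates of the current-minimum flag are recorded:

```python
def selection_sort(a):
    """Straight selection sort; returns (sorted list, number of flag changes)."""
    a = list(a)
    flag_changes = 0
    for i in range(len(a) - 1):
        m = i                          # flag: index of the current minimum
        for j in range(i + 1, len(a)):
            if a[j] < a[m]:            # comparison count is fixed: n(n-1)/2
                m = j                  # change of flag
                flag_changes += 1
        a[i], a[m] = a[m], a[i]        # move the minimum to the front
    return a, flag_changes
```

A sorted input yields 0 flag changes (best case); a reversed input forces changes throughout the scans.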
- `quick sort` 快速排序
- Method: for each (sub-)list, pick a pivot and partition the other elements into a left part (smaller than the pivot) and a right part (larger than the pivot)
- Procedure: choose a pivot and set up a left pointer and a right pointer; in each round the left [right] pointer moves right [left] and stops at an element larger [smaller] than the pivot, then the two elements are swapped; this repeats until the pointers meet, and once they cross, the element at the right pointer is swapped with the pivot, producing two sub-lists
- best case : a list is split into two sub-lists with almost equal size
- n (comparisons per level) * log n (tree height) = ==O(n log n)==
- worst case : in each round, the number used to split is either the smallest or the largest, so every split is maximally uneven
- e.g. **1** 8 2 4 9 5 6
- ==O(n^2^)==
- average case :
- Each choice of pivot splits the list differently
- quick sort is optimal in the average case
- ==O(n log n)==
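The pivot-and-pointers procedure above can be sketched as follows (a minimal version, assuming the first element of each range is the pivot):

```python
def quick_sort(a, lo=0, hi=None):
    """In-place quicksort; the first element of each range is the pivot."""
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return a
    pivot = a[lo]
    i, j = lo + 1, hi
    while True:
        while i <= hi and a[i] < pivot:   # left pointer moves right
            i += 1
        while a[j] > pivot:               # right pointer moves left
            j -= 1
        if i >= j:                        # pointers have met or crossed
            break
        a[i], a[j] = a[j], a[i]           # swap the stopped elements
        i += 1
        j -= 1
    a[lo], a[j] = a[j], a[lo]             # put the pivot between the sub-lists
    quick_sort(a, lo, j - 1)
    quick_sort(a, j + 1, hi)
    return a
```

An already-sorted input makes every split maximally uneven, reproducing the worst case described above.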
- `2-D ranking finding`
- Definition: given a set S of n points, find the rank of every point in S.
- `rank(x)=2` : the **rank** of a point x is the number of points dominated by x; here x dominates 2 points of S.
- **dominate**
- A dominates B
- A(a~1~, a~2~), B(b~1~, b~2~)
- a~1~ > b~1~ && a~2~ > b~2~
- Method 1 [straightforward strategy]
- n (points) * n-1 (comparisons per point) = ==O(n^2^)==
- Method 2 [divide-and-conquer]: repeatedly split the point set in half with a vertical line until each side holds a single point; find the ranks on each side, starting from the left (the leftmost point has rank 0); when the two sides of a line are merged, the ranks are updated by comparing y-coordinates: every point in the left half keeps its rank, while each point in the right half adds the number of left-half points whose y-coordinate is smaller
- O(n * log n * log n) = ==O(n log^2^n)==
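Method 1, the straightforward O(n^2) strategy, is only a few lines of Python (my sketch):

```python
def ranks(points):
    """rank(x) = number of points that x dominates (strictly larger in both
    coordinates).  Straightforward strategy: compare every pair, O(n^2)."""
    return [sum(1 for (bx, by) in points if ax > bx and ay > by)
            for (ax, ay) in points]
```

Note that two points can be incomparable: neither (2,3) nor (3,2) dominates the other.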
### The lower bound of a problem
- New ways to solve a problem keep being invented, so a bound derived from any particular algorithm keeps moving; an algorithm's lower bound is therefore unreliable, and the lower bound must be sought from the problem's point of view
- a lower bound of a problem is the least time complexity required for any algo which can be used to solve this problem
- Ω measures the difficulty of a problem: the minimum time that must be spent no matter which algorithm is used; we look for the **highest** such lower bound
- categories of lower bound
- worst-case lower bound (easy)
- average-case lower bound
- When the problem's lower bound < the algorithm's upper bound, a better solution may still exist
- try to find a higher lower bound
- try to find a better algo
- both the lower bound and the algo may be improved
- When the problem's lower bound = the algorithm's upper bound, the method is optimal
- the algo is **optimal**, and no better algo will ever be found
### The worst-case lower bound of sorting
- any sorting algo( `selection sorting`, `insertion sorting`, ... ) whose basic operation is **compare** and **exchange** operation can be described by a ==binary decision tree==.
- different algos have different binary decision trees
- lower bound of sorting
- Find the smallest possible depth of the binary decision tree, i.e. a balanced tree (which has the minimum height), with n! leaves in total
- ==Ω(n log n)==
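The ==Ω(n log n)== figure comes from the leaf count: any binary tree with n! leaves has depth at least ⌈log~2~(n!)⌉, and
$$\log_2 n! = \sum_{i=1}^{n}\log_2 i \ge \frac{n}{2}\log_2\frac{n}{2} = \Omega(n\log n)$$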
### Heap Sort - optimal sorting in worst cases
- heap
- a binary tree
- completely balanced
- If the height is h, every leaf lies on level h or h-1
- Leaves on level h are packed to the left
- parent >= children [max heap]
- Construction takes `O(n)` and output takes `O(n log n)`; taking the larger of the two gives ==O(n log n)== overall
- [binary tree - construction]
- 2(d-L) -> each level needs 2 comparisons (the children are compared with each other, then the winner with the target node), over the d-L levels below the node (d = tree height, L = the node's level)
- 2^L^ -> level L contains 2^L^ nodes
- ==O(n)==
$$\sum_{L=0}^{d-1}2(d-L)2^L = O(n)$$
- [binary tree - output]
- log i -> the tree height
- the factor 2 -> two comparisons per level for every node sifted down
- ==O(n log n)==
$$2\sum_{i=1}^{n-1}\log i = O(n\log n)$$
- [linear array]
- O(n log n)
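Both phases fit naturally on a linear array (my Python sketch; `sift_down` performs the two comparisons per level counted above):

```python
def heap_sort(a):
    """Heap sort on a linear array: O(n) construction + O(n log n) output."""
    a = list(a)
    n = len(a)

    def sift_down(i, size):
        while True:
            left, largest = 2 * i + 1, i
            if left < size and a[left] > a[largest]:       # compare children...
                largest = left
            if left + 1 < size and a[left + 1] > a[largest]:
                largest = left + 1
            if largest == i:                               # ...then the parent
                return
            a[i], a[largest] = a[largest], a[i]
            i = largest

    for i in range(n // 2 - 1, -1, -1):   # bottom-up construction: O(n)
        sift_down(i, n)
    for end in range(n - 1, 0, -1):       # output phase: O(n log n)
        a[0], a[end] = a[end], a[0]       # move the current max to the end
        sift_down(0, end)
    return a
```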
### the average-case lower bound of sorting
- the average time complexity of a sorting algo
- ==Ω(n log n)==
- quick sort is optimal in the average case
- heap sort is optimal in the average case and worst case
- ==the external path length / n!==
- `the external path length` -> the sum of the depths of all leaves
- `n!` -> there are n! leaves in total

:::info
### Summary
- sorting n **numbers** -> build a binary decision tree with n! **leaves**
- the **depth** of a leaf is the **number of comparisons** needed to sort that input
- ==worst case :==
- for one sorting algo: the **largest** number of comparisons, needed for the hardest inputs
- find the maximum leaf depth (the height) of its binary decision tree
- lower bound :
- over all sorting algos, the **minimum** worst-case number of comparisons
- over all possible binary decision trees, find the **smallest** height
- the height is smallest when the tree is **balanced**
- ==average case:==
- for one sorting algo: the **average** number of comparisons over all inputs
- find the **average depth of all leaves** of its binary decision tree
- lower bound :
- over all sorting algos, the **minimum** average number of comparisons
- over all possible binary decision trees, find the **smallest** average leaf depth
- the average leaf depth is **smallest** when the tree is **balanced**
:::
### improving a lower bound through oracles
- Problem P : merging two sorted sequences A and B with lengths m and n
- [conventional 2-way merging]
- complexity : at most m (length of the first list) + n (length of the second list) - 1 comparisons
- oracle
- Analyse the situation in which the problem is hardest / needs the most steps to handle, count the steps that case requires, and compare this number with the algorithm's complexity; this tells us whether the algorithm is optimal
- Finding a lower bound with an oracle means deriving it from the **hardest-to-solve** instances of the problem
### finding the lower bound by problem transformation
- Suppose we want an algorithm for a given problem whose complexity (B) is hard to analyse; we can transform it into another problem (A) with a known lower bound
- Through the transformation between the two problems, the known lower bound of one yields the unknown lower bound of the other
- problem A **reduces to (化簡)** problem B
- A can be solved by solving B
- iff A can be solved by using any algo which solves B
- If A α B, B is at least as difficult as A
- Transform an instance of A into an instance of B, solve B, then transform B's answer back into an answer for A
- T(tr~1~) + T(tr~2~) < T(B)
- T(A) <= T(tr~1~) + T(tr~2~) + T(B) = O(T(B))
- assuming the transformation times T(tr) are small

- example :
- sorting α convex hull problem
- Ω(convex) >= Ω(sorting) - Ω(α)
- Ω(n log n)
- 3,2,1,4 -> (3,9), (2,4), (1,1), (4,16) -> (1,1), (2,4), (3,9), (4,16) -> 1,2,3,4
- sorting α Euclidean MST
- 3,2,1,4 -> (3,0), (2,0), (1,0), (4,0) -> (1,0), (2,0), (3,0), (4,0) -> 1,2,3,4
## Ch3 - Greedy
### When can we use the greedy method
- suppose that a problem can be solved by a **sequence** of decisions (the problem splits into independent, stage-by-stage sub-problems)
- in the greedy method, each decision is **locally optimal** (each sub-problem has its own best choice)
- these locally optimal solutions finally add up to a **globally optimal** solution (the sub-problems' best choices combine into the best solution of the whole problem)
- not all optimization problems can be solved by the greedy method
- example :
- `shortest paths on a special graph` -> greedy works
- `shortest paths on a multi-stage graph` -> greedy cannot solve it
### Minimum spanning trees (MST)
- it may be defined on points in Euclidean space [the plane] or on a graph
- a spanning tree [a tree whose edges connect all the vertices] with the smallest total weight
- algo
- **Kruskal's algo**
- time complexity `O(|E| log|E|)`
- Each round, pick the smallest remaining edge, and connect its two endpoints only if doing so does not form a cycle
- **Prim's algo**
- time complexity `O(|V|`^2^`)`
- Each round, add one new (not yet selected) vertex, chosen via the minimum-weight edge that connects it to the tree
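Kruskal's rule above can be sketched with a union-find structure to detect cycles (my illustration; the edge list in the usage note is a made-up example):

```python
def kruskal(n, edges):
    """Kruskal's MST: scan edges by increasing weight; add an edge unless it
    closes a cycle.  n = number of vertices (0..n-1); edges = [(w, u, v), ...]."""
    parent = list(range(n))

    def find(x):                      # union-find with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    mst, total = [], 0
    for w, u, v in sorted(edges):     # sorting dominates: O(|E| log |E|)
        ru, rv = find(u), find(v)
        if ru != rv:                  # different components: no cycle formed
            parent[ru] = rv
            mst.append((u, v, w))
            total += w
    return mst, total
```

For example, `kruskal(4, [(1, 0, 1), (2, 1, 2), (3, 2, 3), (4, 0, 3), (5, 0, 2)])` keeps the three cheapest edges and skips the two that would close cycles.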
### The single-source shortest path problem

- algo
- **Dijkstra's algorithm**
- time complexity : O(n^2^)
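The O(n^2^) bound comes from scanning all vertices n times to pick the next closest one; a minimal adjacency-matrix sketch (my illustration):

```python
def dijkstra(adj, src):
    """Dijkstra's algorithm on an adjacency matrix (None = no edge).
    Each of the n rounds scans all vertices, hence O(|V|^2)."""
    INF = float('inf')
    n = len(adj)
    dist = [INF] * n
    dist[src] = 0
    done = [False] * n
    for _ in range(n):
        u, best = -1, INF
        for v in range(n):            # linear scan for the closest open vertex
            if not done[v] and dist[v] < best:
                u, best = v, dist[v]
        if u == -1:                   # remaining vertices are unreachable
            break
        done[u] = True
        for v in range(n):            # relax the edges leaving u
            if adj[u][v] is not None and dist[u] + adj[u][v] < dist[v]:
                dist[v] = dist[u] + adj[u][v]
    return dist
```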
### The 2-way merging problem
- Merge two sorted sequences into a single sorted one
- the linear 2-way merge algorithm needs at most m~1~+m~2~-1 comparisons

- When many lists must be merged, always merging the two shortest ones first is the most efficient order

- apply this characteristic to **Huffman code**
### Huffman codes
- Symbols with high frequency -> encode with fewer bits, lowering the cost of encoding and decoding
- Build a merging tree according to the symbols' frequencies, then label left branches 0 and right branches 1
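The merging-tree construction can be sketched with a heap (my illustration; tie-breaking may change individual codes, but never the total code length of the optimal tree):

```python
import heapq

def huffman_codes(freq):
    """Build Huffman codes: repeatedly merge the two lowest-frequency trees;
    label left branches '0' and right branches '1'.  freq: {symbol: count}."""
    # (frequency, tiebreak counter, subtree); leaves are symbols, internal
    # nodes are (left, right) tuples.  The counter keeps comparisons on numbers.
    heap = [(f, i, sym) for i, (sym, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    nxt = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)    # two lowest frequencies
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, nxt, (left, right)))
        nxt += 1
    codes = {}

    def walk(node, prefix):
        if isinstance(node, tuple):
            walk(node[0], prefix + '0')      # left branch gets 0
            walk(node[1], prefix + '1')      # right branch gets 1
        else:
            codes[node] = prefix or '0'      # single-symbol edge case
    walk(heap[0][2], '')
    return codes
```

The returned codes are prefix-free, so a bit stream decodes unambiguously.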

### the minimal cycle basis problem
- Def : A cycle basis of a graph is a set of cycles such that every cycle in the graph can be generated by applying XOR on some cycles of the basis
- Minimal cycle basis : the cycle basis whose cycles have the smallest total edge weight.
- `#` of cycles in a cycle basis
- k (`#` of dotted lines, i.e. edges outside the spanning tree)
- (`#` of all edges) - (`#` of edges in a spanning tree)
- |E| - (|V| - 1)
- |E| - |V| + 1
### the 2-terminal one to any special channel routing problem
- Def : Given two sets of terminals on the upper and lower rows, respectively, we have to connect each upper terminal to the lower row in a one-to-one fashion, while minimizing the `#` of tracks used.
- The triangles are the terminals to be connected (one to one)
- each upper triangle must connect to a lower triangle
- `#` of upper terminals △ < `#` of lower-row terminals △
- Constraint : no two wires may overlap or touch each other
- to minimize the density; the density is a lower bound on the `#` of tracks.
### the knapsack problem
- Not the same as the `0/1 knapsack problem`
- `0/1 knapsack problem` : an object is either taken or not, an NP-hard problem
- `the knapsack problem` : objects are divisible, so a fraction may be taken (anywhere between not taken (0) and fully taken (1)), an easy problem
- the greedy algorithm :
- step 1 : sort p~i~/w~i~ into nonincreasing order
- step 2 : put the objects into the knapsack according to the sorted sequence, taking as much of each as possible.
- Problem : n = 3, M = 20, (p~1~, p~2~, p~3~) = (25, 24, 15), (w~1~, w~2~, w~3~) = (18, 15, 10)
- Setup :
- n = number of objects, M = capacity of the knapsack, p = profit, w = weight
- Steps :
- Compute each object's profit per unit weight, and sort by it to decide the packing order
- p~1~ / w~1~ = 25/18 ≈ 1.39 (object 1 : 1.39 profit per unit weight)
- p~2~ / w~2~ = 24/15 = 1.6 (object 2 : 1.6 profit per unit weight)
- p~3~ / w~3~ = 15/10 = 1.5 (object 3 : 1.5 profit per unit weight)
- So by profit density the packing order is : object 2 -> object 3 -> object 1
- Pack objects while the capacity (M) allows
- Put in object 2 (w~2~ = 15) : 20(M) - 15(w~2~) = 5 (the knapsack can still carry 5 units of weight)
- Put in object 3 (w~3~ = 10) : 5 - 5(1/2 w~3~) = 0 (only half of object 3 fits)
- Compute the total profit
- p~2~ + 1/2 p~3~ = 24 + 1/2 * 15 = 31.5
- optimal solution : x~1~ = 0, x~2~ = 1, x~3~ = 1/2
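The worked example above can be verified with a short greedy sketch (my illustration):

```python
def fractional_knapsack(profits, weights, capacity):
    """Greedy fractional knapsack: sort by profit/weight (nonincreasing),
    then take as much of each object as still fits."""
    order = sorted(range(len(profits)),
                   key=lambda i: profits[i] / weights[i], reverse=True)
    fractions = [0.0] * len(profits)
    total = 0.0
    for i in order:
        take = min(weights[i], capacity)   # whole object, or the part that fits
        fractions[i] = take / weights[i]
        total += profits[i] * fractions[i]
        capacity -= take
        if capacity == 0:
            break
    return fractions, total
```

For the instance above it returns the fractions (x~1~, x~2~, x~3~) = (0, 1, 1/2) and total profit 31.5, matching the hand computation.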
<!-- ## HW
[演算法作業 HW1](/zdm6xV_QQt2kdy1MJDCjQA)
[演算法作業 HW2](/QH7skOopT0qUZh8M9Bbo1w)
[演算法作業 HW3](/uxqjpZJUTpKA2UZasW940A)
[演算法作業 HW4](/wM5T497GTM6Dxasku94GUQ)
[演算法作業 HW5](/UY_HvrlJR8yu7_S58SlTfQ)
[演算法作業 HW6](/1OiDV-UbRzaAGIGXL_axAw) -->