Lecture 06: Dynamic programming

# Lecture 06: Dynamic programming > 課程內容 : 交通大學開放式課程交大電機江蕙如教授 > 參考書目 : Algorithm Design(2006), J. Kleinberg and E. Tardos ## Content * [Basic](#Basic) * [Weighted interval scheduling: a recursive procedure](#Weighted-interval-scheduling-a-recursive-procedure) * [1. Design a recursive algorithm](#1-Design-a-recursive-algorithm) * [1.1 Induction on time](#11-Induction-on-time) * [1.2 Induction on interval index](#12-Induction-on-interval-index) * [1.3 Implement of index-based induction](#13-Implement-of-index-based-induction) * [2. Memoization: top-down](#2-Memoization-top-down) * [3. Iteration: bottom-up](#3-Iteration-bottom-up) * [4. Summary: memoization vs. iteration](#4-Summary-memoization-vs-iteration) * [4.1 Keys for DP](#41-Keys-for-DP) * [4.2 Algorithm paradigms](#4-2-Algorithm-paradigms) * [Appendix](#Appendix) * [1. Finonacci sequence](#1-Finonacci-sequence) * [Subset sums and Knapsacks: adding a variable](#Subset-sums-and-Knapsacks-adding-a-variable) * [1. Subset sum](#1-Subset-sum) * [1.1 DP: false start](#11-DP-false-start) * [1.2 Improvement: add a new variable](#12-Improvement-add-a-new-variable) * [2. Knapsack problem](#2-Knapsack-problem) * [Shortest paths in a grpah](#Shortest-paths-in-a-grpah) * [1. Review: Dijkstra's algorithm](#1-Review-Dijkstras-algorithm) * [2. Bellman-Ford algorithm](#2-Bellman-Ford-algorithm) * [3. Implement: iteration](#3-Implement-iteration) * [4. Negative cycles detection](#4-Negative-cycles-detection) ## Basic Dynamic Programming(DP) 並非 coding 的方式，而是一種在數學上最佳化的機制，由 Richard E. Bellman 在 1953 年發明。 DP 的核心想法是藉由以下方式隱晦地(implicitly)探索所有可能的解 : * 將大問題分解成一系列的小問題(subprpblems) * 從小問題開始逐層建立正確答案與 divide and conquer 不同的是 divide and conquer 是將明確地定義小問題再解決，但 DP 解決 subproblem 前不會明確的分析它。 > [!Important] **Core idea** > Dynamic Programming 的核心想法就是數學歸納法(mathmatical induction) ## Weighted interval scheduling: a recursive procedure > **Example :** > 假設有 n 個給定開始(start)/結束(finish)時間的 interval 與對應的 weight 需要執行 > * Interval $i$ : $[s_i, f_i),\ 1 \le i \le n$ > * Weight : $v_i,\ 1 \le i \le n$ > > 需要找到一個由不重疊的 interval 組成的子集 $S$ 使得他有最大的 weight sum。 > ![image](https://hackmd.io/_uploads/rJi1SD_c1g.png) 以 greedy algorithm 的角度來說並無法得到一個最佳解: * 將完成時間遞增排序 * 每次依序挑選一個 interval，若重疊則放棄以 greedy algorithm 進行(下圖為例)會得到的解 $= \{I_1, I_3\}$ 且 weight sum $=v_1 + v_3 = 2$。但實際上的最佳解應該 $= \{ I_2\}$，weight sum = $v_2 = 3$。 ![image](https://hackmd.io/_uploads/SkUGvv_91g.png) ### 1. Design a recursive algorithm 一開始有提過 DP 的核心概念就是 mathmatical induction，因此在進行 DP 找最佳解之前該考慮的是要依據何種 base condition 進行歸納。 #### 1.1 Induction on time 以下圖簡化後的 Example 為例，假設先對時間做歸納：考慮小一點的時間 $t'$ 的解，再加下一段時間差的貢獻。此時要考慮的問題是「時間範圍應該要怎麼選 or 怎麼切割」 (1) 考慮最後一個 $I_6 = 1$ (左圖) 若最後一個 $I_6 = 1$ 有貢獻，則前一刻的時間 $t'$ 應該選在 $I_3 = 4$ 之後與 $I_4 = 6$ 之前 (2) 不考慮最後一個 $I_6 = 1$ (右圖) 若最後一個 $I_6 = 1$ 沒有貢獻，則前一個的時間 $t'$ 應該選在 $I_5 = 2$ 之後與 $I_6 = 1$ 之前 ![image](https://hackmd.io/_uploads/HJ3y6P_cyl.png) 結論：以時間做 induction 不是很好的做法，因為時間範圍 $\Delta t$ 不好做切割。 #### 1.2 Induction on interval index 第二種方式以 interval 的編號做歸納： * 先以完成時間對 interval 做排序(DP 的小技巧，主要用來簡化問題結構) * 令 $p(j) = i$ 表示最大的 index $i$ 使得 $I_i$ 與 $I_j$ 互斥，其中 $i < j$ * 若沒有 $I_i$ 與 $I_j$ 互斥，則 $p(j) = 0$ 與 time-based 的 induction 相同，觀察 $n$ 個 interval 的最佳解與 $n-1$ 個 interval 的最佳解的關係，也就是考慮第 $n$ 個 interval 的貢獻。 (1) 考慮最後一個 $I_6 = 1$ (左圖) 若最後一個 $I_6 = 1$ 有貢獻，則前一個 index 應該取到 $I_3$ (2) 不考慮最後一個 $I_6 = 1$ (右圖) 若最後一個 $I_6 = 1$ 沒有貢獻，則前一個 index 應該取到 $I_5$ ![image](https://hackmd.io/_uploads/rJvHQOdcyg.png) 若以符號表示： * 令 $O_j$ 表示前 $j$ 個 interval 的最佳解的集合 * 令 $OPT_j$ 表示前 $j$ 個 interval 的最佳解的值以上述圖片為例，若有 6 個 interval $I_1, ..., I_6$，則此 6 個 interval 的解集合為 $O_6$： * 考慮是否包含最後一個 $I_6$ 的貢獻 * 包含 $I_6$，則 $O_6 = \{I_6, O_3\}$ * 不包含 $I_6$，則 $O_6 = O_5$ * 最佳解為 $OPT(6) = \max\{\{v_6 + OPT(3)\},\ OPT(5)\}$ 因此前 $j$ 個 interval 的最佳解的通式為 $$ OPT(j) = \max\{\{v_j + OPT(p(j))\},\ OPT(j-1)\} $$ #### 1.3 Implement of index-based induction 參考前面所推得的結論： $$ OPT(j) = \max\{\{v_j + OPT(p(j))\},\ OPT(j-1)\} $$ > [!Tip] **Algorithm: index-based induction** > ![image](https://hackmd.io/_uploads/rymyPuu9kg.png) 將上述的 pseudo code 展開後會得到一個樹狀的遞迴結構 <img src="https://hackmd.io/_uploads/Hyb_vdu9Jx.png" width=300> ### 2. Memoization: top-down 從上面展開後的樹狀遞迴結構可以發現會有非常多重複計算的項目(下圖紅色區塊)，因此可以把已經計算過的值儲存下來(`M[j]`)，之後遇到重複項直接使用，實際上只有計算左側的 $OPT(6)$ 到 $OPT(1)$。這種方法稱為 memoization，是一種 top-down 結構的 dynamic programming。 <img src="https://hackmd.io/_uploads/ryeKF_uc1e.png" width=300> 修改過後的演算法如下： > [!Tip] **Algorithm: memoization** > ![image](https://hackmd.io/_uploads/rJ-G5uOq1x.png) ### 3. Iteration: bottom-up 另一種使用迭代方式進行，從底層的 `M[0]` 開始計算至 `M[n]`，是一種 bottom-up 結構的 dynamic programming。 > [!Tip]**Algorithm: iteration** > ![image](https://hackmd.io/_uploads/BJ3Q6aO5Jx.png) Bottom-up 的迭代類似查表格的動作，如下圖所示。每次計算/填寫 `M[i]` 時會需要往前做查找，但需要確保前面的值已經計算過，若沒有計算則代表 DP 是錯的，可能是 induction 的條件不正確。 ![image](https://hackmd.io/_uploads/S1CnTTdqJg.png) ### 4. Summary: memoization vs. iteration | Memoization | Iteration | | :-: | :-: | | Top-down | Bottom-up | | Recursive algorithm | Iteration algorithm | | Compute what we need only | Construct solution from smallest subproblem to the largest one | #### 4.1 Keys for DP * DP 通常用在最佳化問題(但有少數例外) * DP 在有線性順序且無改變順序的問題上有不錯的表現 * DP 的組成 * Optimal substructure: 最佳解來自 subproblem 的最佳解 * Overlapping subproblem: subproblem 會不斷的重複出現，且 subproblem 的數量是多項式等級的 #### 4.2 Algorithm paradigms * Brute-force: 明確檢查所有可能解的集合 * Greedy: 每個步驟選擇當下的最佳解，且選擇以後不會再反悔。每個步驟的最佳解都會隊最後的最佳解有貢獻 * Divide and conquer: 將大問題拆解成小問題，但問題的本質不變，最後再將小問題的解做合併(最難) * Dynamic programming: 將大問題拆解成一系列重複的小問題，透過重複的小問題建構出更大的小問題的解 ## Appendix ### 1. Finonacci sequence 費氏數列： $$ F_n = F_{n-1} + F_{n-2},\ \text{where}\ F_0=0\ \text{and}\ F_1=1 $$ 本身也是一種具有遞迴關係的問題 > [!Tip] **Algorithm: finonacci sequence** > ![image](https://hackmd.io/_uploads/ry-0URtcJg.png) 將費氏數列的遞迴結構展開後如下圖所示，會發現到有很多項目被重複計算。將問題簡化以後同樣可以用 DP 的兩個方法來求費氏數列的解。 > [!Warning] > 但實際上費氏數列並不算是正規的 DP 問題，因為只是使用 DP 的技巧計算數列的任意項，但費氏數列本身沒有牽涉到最佳化的問題。 (1) Memoization * 將計算過的值儲 * 每次遞迴之前先檢查該數值是否已經被計算過 > [!Tip] **Algorithm: finonacci sequence(memoization ver.)** > * 陣列 `f[n]` 儲存已經算過的函數值 $f(n)$，初始化 $f(i) = -1,\ i = 0, ..., n$ > * 設定初始值 `f[0] = 0` 與 `f[1] = 1` > ![image](https://hackmd.io/_uploads/BkOJP0tqyg.png) (2) Iteration * 從最簡單的 subproblem 開始逐步建構完整問題的解 > [!Tip] **Algorithm: finonacci sequence(iteration ver.)** > * 陣列 `f[n]` 儲存已經算過的函數值 $f(n)$，初始化 $f(i) = -1,\ i = 0, ..., n$ > * 設定初始值 `f[0] = 0` 與 `f[1] = 1` > ![image](https://hackmd.io/_uploads/HJBxP0Yqkl.png) ## Subset sums and Knapsacks: adding a variable ### 1. Subset sum > Question description: > * Given > * 有 $n$ 個物品的集合與一個背包(knapsack) > * 物品 $i$ 的重量為 $w_i > 0$ > * 背包限重 $W$ > * Goal > * 將背包填滿使得總重最大 > * $i.e., \max \{\Sigma_{i \in S}\ w_i\}$ 且 $\Sigma_{i \in S}\ w_i < W$ 以下方表格為例(W = 11)： | Item | Weight | | :-: | :-: | | 1 | 1 | | 2 | 2 | | 3 | 5 | | 4 | 6 | | 5 | 7 | 若使用 greedy algorithm 找每個步驟的最佳解： $$ w_5 + w_2 + w_1 = 7 + 2 + 1 = 10 $$ 但可以明顯發現 10 並不是最佳解，最佳解應該是 $w_3 + w_4 = 5 + 6 = 11$。 #### 1.1 DP: false start 以 dynamic programming 的方法找最佳解，首先先列出最佳化問題的格式 $$ \begin{align} & \max\ \Sigma_{i \in S}\ w_i\\ & \text{s.t.}\ \Sigma_{i \in S\ }w_i < W,S \subseteq\{1, 2, ..., n\} \end{align} $$ Subproblem 的形式為: $OPT(i) = \max_S\Sigma_{j \in S}\ w_j,S \subseteq \{1, 2, ..., i\}$，表示物品 $1$ 到物品 $i$ 的總重。設最佳解: $OPT(n)$ 表示物品 $1$ 到物品 $n$ 的總重，且解集合為 $O$。依照 mathmatical induction 的概念討論 2 種情況: (1) $w_n$ 沒貢獻 $(i.e., n \notin O)$ 此時最佳解 $OPT(n) = OPT(n-1)$ (2) $w_n$ 有貢獻 $(i.e., n \in O)$ 此時最佳解 $OPT(n) = w_n + OPT(n-1)$ 但這個 indution 的推論是錯誤的，why ? 因為在第二段的討論中，當 $w_n$ 有貢獻添加到解集合 $O$ 之中時，可能會造成 $OPT(n)$ 計算出的背包總重超出 $W$ 的限制。因此第二階段的討論中，若要考慮 $w_n$ 的貢獻，則需要在前一階段 $OPT(n-1)$ 中加入總重的限制必須 $< W - w_n$ #### 1.2 Improvement: add a new variable 同樣的最佳化問題形式 $$ \begin{aligned} & \max\ \Sigma_{i \in S}\ w_i\\ & \text{s.t.}\ \Sigma_{i \in S\ }w_i < W,S \subseteq\{1, 2, ..., n\} \end{aligned} $$ 此時 subproblem 需要多一個限制 $$ OPT(i, W) = \max_S \Sigma_{j \in S}\ w_j, S \subseteq \{1, 2, ..., i\}, \Sigma_{j \in S}\ w_j < W $$ 此時的最佳解 $OPT(n)$ 同樣討論 2 種狀況: (1) $w_n$ 沒貢獻 $(i.e., n \notin O)$ 與前面相同，此時最佳解 $OPT(n) = OPT(n-1, W)$ (2) $w_n$ 有貢獻 $(i.e., n \in O)$ 此時最佳解 $OPT(n, W) = w_n + OPT(n-1, W-w_n)$ 最後遞迴關係式如下: $$ OPT(i, W) = \begin{cases} &0 &\text{, if } i \text{ or } w = 0\\ &OPT(i-1, W) &\text{, if } w_i > W\\ &\max \{OPT(i-1, W),\ w_i + OPT(i-1, W-w_i)\} &\text{, otherwise} \end{cases} $$ 其中第 1 式表示物品為 $0$ 或重量限制為 $0$ 的邊界條件。第 2 式表示下一個物品重量 $w_i$ 超過重量限制 $W$，所以最佳解為前一步的最佳解。 > [!Tip] **Algorithm: DP for subset sum(iteration ver.)** > * 以陣列 `M[i, w]` 表示在物品數量為 $i$ 且重量限制為 $W$ 的情況下的最佳解 > ![image](https://hackmd.io/_uploads/SysMPCF51e.png) 建構出來的表格下圖所示 ![image](https://hackmd.io/_uploads/SJm4PAFqJl.png) ### 2. Knapsack problem > Question description: > * Given > * 有 $n$ 個物品的集合與一個背包(knapsack) > * 物品 $i$ 的重量為 $w_i > 0$ 且價值為 $v_i > 0$ > * 背包限重 $W$ > * Goal > * 將背包填滿使得總價值最大 > * $i.e., \max \Sigma_{i \in S}\ v_i$ 最佳化問題形式為 $$ \begin{align} &\max \Sigma_{i \in S}\ v_i\\ &\text{s.t. } \Sigma_{i \in S}\ w_i < W, S \subseteq\{1, 2, ..., n\} \end{align} $$ 以下方表格為例(W = 11)： | Item | Value | Weight | | :-: | :-: | :-: | | 1 | 1 | 1 | | 2 | 6 | 2 | | 3 | 18 | 5 | | 4 | 22 | 6 | | 5 | 28 | 7 | 若使用 greedy algorithm 找每個步驟的最佳解： $$ v_5 + v_2 + v_1 = 28 + 6 + 1 = 35 $$ 但可以明顯發現 35 並不是最佳解，最佳解應該是 $v_3 + v_4 = 18 + 22 = 40$。 Knapsack problem 與 subset sum problem 幾乎一樣，但前者是考慮物品的總價值最大，後者是考慮總重量最大，而條件限制一樣是重量須 $< W$。因此只要將 subset sum problem 中第 $i$ 個物品的重量 $w_i$ 項改為價值 $v_i$ 項即可 $$ OPT(i, W) = \begin{cases} &0 &\text{, if } i \text{ or } w = 0\\ &OPT(i-1, W) &\text{, if } w_i > W\\ &\max \{OPT(i-1, W),\ v_i + OPT(i-1, W-w_i)\} &\text{, otherwise} \end{cases} $$ ## Shortest paths in a grpah ### 1. Review: Dijkstra's algorithm 考慮一個 $G = (V, E)$ 以及起點 $s$ 和終點 $t$。且任意兩節點之間的 edge $(u, v) \in E$ 的權重為 $c_{uv} \ge 0$。要找一個從 $s$ 到 $t$ 的路徑 $P$ 使得 $c(P) = \Sigma_{(u, v) \in P}\ c_{uv}$ 因為 Dijkstra's algorithm 只能用在非負 edge weight 的情況，當考慮到 edge weight 有負數時(以下圖為例): ![image](https://hackmd.io/_uploads/HJ1LDRFcyl.png) Dijkstra's algorithm 的最短路徑走法為 s -> t: 1 但實際上的最短路徑為 s -> a -> b -> t: -1 就算把負數 edge weight 的情況做等值放大後 Dijkstra's algorithm 找到的最短路徑依然不會等價於原始情況，因為最佳解 s -> a -> b -> t 會經過 3 個 edges，每個 edges 都會 +6，共加了 3 $\cdot$ 6 = 18。 ![image](https://hackmd.io/_uploads/ry5Dw0Kqyg.png) 但 Dijkstra's algorithm 的路徑只走了一個 edges，只加了一個 6。不同的縮放比例導致 Dijkstra's algorith 的解不會等價於最佳解。 ### 2. Bellman-Ford algorithm Bellman-Ford algorithm 是一種使用 DP 找兩節點之間最短路徑的方法，且適用在 edge weight 有負數時。可以使用節點或邊做 induction 的推論。 > [!Note] **Lemma** > If $G = (V, E)$ has no negative cycles, then there is a shortest path from $s$ to $t$ that is simple (i.e. not repeat nodes) and hence has at most $n-1$ edges, where $n = |V|$. Negative cycle 如下右圖所示，表示迴圈中的 weight sum < 0 $(i.e., c(C) < 0)$。若出現 negative cycle，則路徑選擇時會不斷進入這個迴圈中，每走一圈 weight sum 就不斷下降，最後完全走不出來。 ![image](https://hackmd.io/_uploads/S1eRDCtcJe.png) 假設從起點 $s$ 到終點 $t$ 並以 edges 做歸納且固定終點 $t$，則最佳解 $OPT(n-1, s)$ 表示從節點 $S$ 到終點 $t$ 之間最多經過 $n-1$ 個 edges 的最短距離路徑。因此 subproblem 的最佳解形式為 $OPT(i, v)$，表示從節點 $v$ 到終點 $t$ 之間的路徑 $P$ 最多經過 $i$ 個 edges 的最短距離。同樣討論 2 個狀況: (1) 前一個 edge 沒有貢獻(下圖左) 前一個 edge 沒有貢獻(節點 v -> 節點 w)，表示路徑 $P$ 最多使用後面 $i-1$ 個 edges，因此 $OPT(i, v) = OPT(i-1, v)$。 (2) 第一個 edge 有貢獻(下圖右) 前一個 edge 有貢獻(節點 v -> 節點 w)，表示路徑 $P$ 最多使用 $i$ 個 edges，因此 $OPT(i, v) = c_{vw} + OPT(i-1, w)$。 ![image](https://hackmd.io/_uploads/SyY7d0Yqyl.png) > [!Important] > 這邊的做法是固定終點位置從終點位置往回推，與一開始 DP 從頭開始歸納的推論方向不同，所以會考慮前一個 edge 有無做出貢獻而不是最後一個 edge 有沒有貢獻。綜合兩個討論，遞迴關係式為: $$ OPT(i, v) = \begin{cases} &0 &,\text{if } i=0 \text{ and } v = t\\ &\infty &,\text{if } i=0 \text{ and } v \neq t\\ &\min\{OPT(i-1, v),\ \min_{(v, w) \in E} \{c_{vw} + OPT(i-1, w)\}\} &,\text{otherwise} \end{cases} $$ (其中 i=0 表示沒有經過 edges，v=t 表示自己到自己最短距離) > [!Important] > Bellman-Ford algorithm 透過 DP 找到的最短路徑，可以使用預備定理證明此最短路徑必為最多使用 $n-1$ 個 edges 的簡單路徑。 ### 3. Implement: iteration Bottom-up 的迭代版本實作如下 > [!Tip] **Algorithm: Bellman-Ford algorithm(iteration ver.)** > * 以陣列 `M[n-1, v]` 表示每個 subproblem 的最佳解，例如 `M[i, v]` 表示從節點 $v$ 到終點 $t$ 且最多使用 $i$ 個 edges 的最短距離 > * `M[0, t] = 0` 表示終點 $t$ 到終點 $t$ 的邊界條件 > * `M[0, v] = inf` 表示從節點 $v$ 到終點 $t$ 且不使用 edge 的邊界條件 > > ![3200A66B-F0B7-47B9-8589-CD6F30DE22BB](https://hackmd.io/_uploads/BJ6_d0tcJg.jpg) 依照上述的演算法所建構出的表格如下，實際手動操作會發現填表格時是以 column by column 的方式依序填寫。 ![image](https://hackmd.io/_uploads/H1WAuRY5kl.png) ### 4. Negative cycles detection Bellman-Ford algorithm 的使用前提是圖上不能包含 negeative cycles，但反過來說它也可以用來檢查圖中是否存在 negative cycle。 > [!Note] **Thm.** > If $OPT(n, v) = opt(n-1, v)$ for all $v$, then no negative cycles Bellman-Ford algorithm 保證了經過 $n-1$ 次的迭代後可以找到最短路徑(穩定狀態)，此時繼續做迭代也不可能找到更短路徑。這表示圖中不可能存在 negative cycles，因為如果存在 negative cycle，則繼續迭代理論上會得到更小的路徑長。 > [!Note] **Thm.** > If $OPT(n, v) < OPT(n-1, v)$ for some $v$，then shortest path contains a negetive cycle 換句話說，如果經過 $n$ 次迭代的最短距離可以推翻 $n-1$ 次迭代的最短距離，表示此圖/路徑會存在 negative cycle，因為繼續做迭代可以得到更短的距離。