---
tags: INFO2022
---

# 2022-09-20 Class Discussion Notes

- [ ] video: [Calling foodpanda for "delivery~"](https://youtu.be/f0Nr0XywxnA)
    * Although this is a short commercial (this course accepts no sponsored content), it depicts workplace reality: compromising and pleading will not make your manager value you; they would rather spend more money hiring someone else
    * If you boast about your own cleverness in a résumé, others may read it as [petty cleverness](https://dict.revised.moe.edu.tw/dictView.jsp?ID=106818)
- [ ] Nic: [Are data structures and algorithms actually useful at work?](https://youtu.be/-Y_4rOXeqHQ)
    * [Lenz's law](https://zh.m.wikipedia.org/wiki/%E6%A5%9E%E6%AC%A1%E5%AE%9A%E5%BE%8B)
    * When would you ever need a linked list merge?
        + [你所不知道的 C 語言: linked list 和非連續記憶體](https://hackmd.io/@sysprog/c-linked-list)
        + memory allocator!
    * A professor at NCKU: "Much knowledge and theory feels useless to you simply because you have not yet reached the level where you need it"
- [ ] [For Taiwanese engineers going abroad directly for software jobs, the countries with the lowest opportunity cost are probably Japan and Singapore](https://www.facebook.com/permalink.php?story_fbid=pfbid0aU7pvpFzS5GJtaaDwTE3LvW7wiwhmBdMqXt5HsV36JiPjLhdT18FaVCnigPJG5QXl&id=100000201416442)

## Course Q&A

Q1: A real-world case of Merge Two Sorted Lists

`linux/list.h` is the practical doubly linked list wrapper used throughout the Linux kernel. By embedding a `struct list_head` in your own structure, you gain access to the whole family of list APIs the kernel provides, such as `list_add`, `list_add_tail`, and `list_empty`.

In the kernel source, `lib/list_sort.c` contains the following **merge** implementation, whose job is to combine two already-sorted linked lists:

:::spoiler merge()
```c
/*
 * Returns a list organized in an intermediate format suited
 * to chaining of merge() calls: null-terminated, no reserved or
 * sentinel head node, "prev" links not maintained.
 */
__attribute__((nonnull(2,3,4)))
static struct list_head *merge(void *priv, list_cmp_func_t cmp,
                               struct list_head *a, struct list_head *b)
{
        struct list_head *head, **tail = &head;

        for (;;) {
                /* if equal, take 'a' -- important for sort stability */
                if (cmp(priv, a, b) <= 0) {
                        *tail = a;
                        tail = &a->next;
                        a = a->next;
                        if (!a) {
                                *tail = b;
                                break;
                        }
                } else {
                        *tail = b;
                        tail = &b->next;
                        b = b->next;
                        if (!b) {
                                *tail = a;
                                break;
                        }
                }
        }
        return head;
}
```
:::

This function is used inside **list_sort**, implementing an efficient merge sort over linked lists:

:::spoiler list_sort
```c
__attribute__((nonnull(2,3)))
void list_sort(void *priv, struct list_head *head, list_cmp_func_t cmp)
{
        struct list_head *list = head->next, *pending = NULL;
        size_t count = 0;       /* Count of pending */

        if (list == head->prev) /* Zero or one elements */
                return;

        /* Convert to a null-terminated singly-linked list. */
        head->prev->next = NULL;

        /*
         * Data structure invariants:
         * - All lists are singly linked and null-terminated; prev
         *   pointers are not maintained.
         * - pending is a prev-linked "list of lists" of sorted
         *   sublists awaiting further merging.
         * - Each of the sorted sublists is power-of-two in size.
         * - Sublists are sorted by size and age, smallest & newest at front.
         * - There are zero to two sublists of each size.
         * - A pair of pending sublists are merged as soon as the number
         *   of following pending elements equals their size (i.e.
         *   each time count reaches an odd multiple of that size).
         *   That ensures each later final merge will be at worst 2:1.
         * - Each round consists of:
         *   - Merging the two sublists selected by the highest bit
         *     which flips when count is incremented, and
         *   - Adding an element from the input as a size-1 sublist.
         */
        do {
                size_t bits;
                struct list_head **tail = &pending;

                /* Find the least-significant clear bit in count */
                for (bits = count; bits & 1; bits >>= 1)
                        tail = &(*tail)->prev;
                /* Do the indicated merge */
                if (likely(bits)) {
                        struct list_head *a = *tail, *b = a->prev;

                        a = merge(priv, cmp, b, a);
                        /* Install the merged result in place of the inputs */
                        a->prev = b->prev;
                        *tail = a;
                }

                /* Move one element from input list to pending */
                list->prev = pending;
                pending = list;
                list = list->next;
                pending->next = NULL;
                count++;
        } while (list);

        /* End of input; merge together all the pending lists. */
        list = pending;
        pending = pending->prev;
        for (;;) {
                struct list_head *next = pending->prev;

                if (!next)
                        break;
                list = merge(priv, cmp, pending, list);
                pending = next;
        }
        /* The final merge, rebuilding prev links */
        merge_final(priv, cmp, head, pending, list);
}
```
:::

According to the commit of May 15, 2019, this efficient merge sort design traces back to the paper [Queue-mergesort](https://www.sciencedirect.com/science/article/pii/002001909390088Q):

:::spoiler commit message
> lib/list_sort: optimize number of calls to comparison function
>
> CONFIG_RETPOLINE has severely degraded indirect function call
> performance, so it's worth putting some effort into reducing the number
> of times cmp() is called.
>
> This patch avoids badly unbalanced merges on unlucky input sizes. It
> slightly increases the code size, but saves an average of 0.2*n calls to
> cmp().
>
> x86-64 code size 739 -> 803 bytes (+64)
>
> Unfortunately, there's not a lot of low-hanging fruit in a merge sort;
> it already performs only n*log2(n) - K*n + O(1) compares. The leading
> coefficient is already at the theoretical limit (log2(n!) corresponds to
> K=1.4427), so we're fighting over the linear term, and the best
> mergesort can do is K=1.2645, achieved when n is a power of 2.
>
> The differences between mergesort variants appear when n is *not* a
> power of 2; K is a function of the fractional part of log2(n). Top-down
> mergesort does best of all, achieving a minimum K=1.2408, and an average
> (over all sizes) K=1.248. However, that requires knowing the number of
> entries to be sorted ahead of time, and making a full pass over the
> input to count it conflicts with a second performance goal, which is
> cache blocking.
>
> Obviously, we have to read the entire list into L1 cache at some point,
> and performance is best if it fits. But if it doesn't fit, each full
> pass over the input causes a cache miss per element, which is
> undesirable.
>
> While textbooks explain bottom-up mergesort as a succession of merging
> passes, practical implementations do merging in depth-first order: as
> soon as two lists of the same size are available, they are merged. This
> allows as many merge passes as possible to fit into L1; only the final
> few merges force cache misses.
>
> This cache-friendly depth-first merge order depends on us merging the
> beginning of the input as much as possible before we've even seen the
> end of the input (and thus know its size).
>
> The simple eager merge pattern causes bad performance when n is just
> over a power of 2. If n=1028, the final merge is between 1024- and
> 4-element lists, which is wasteful of comparisons. (This is actually
> worse on average than n=1025, because a 1024:1 merge will, on average,
> end after 512 compares, while 1024:4 will walk 4/5 of the list.)
>
> Because of this, bottom-up mergesort achieves K < 0.5 for such sizes,
> and has an average (over all sizes) K of around 1. (My experiments show
> K=1.01, while theory predicts K=0.965.)
>
> There are "worst-case optimal" variants of bottom-up mergesort which
> avoid this bad performance, but the algorithms given in the literature,
> such as queue-mergesort and boustrodephonic mergesort, depend on the
> breadth-first multi-pass structure that we are trying to avoid.
>
> This implementation is as eager as possible while ensuring that all
> merge passes are at worst 1:2 unbalanced. This achieves the same
> average K=1.207 as queue-mergesort, which is 0.2*n better then
> bottom-up, and only 0.04*n behind top-down mergesort.
>
> Specifically, defers merging two lists of size 2^k until it is known
> that there are 2^k additional inputs following. This ensures that the
> final uneven merges triggered by reaching the end of the input will be
> at worst 2:1. This will avoid cache misses as long as 3*2^k elements
> fit into the cache.
>
> (I confess to being more than a little bit proud of how clean this code
> turned out. It took a lot of thinking, but the resultant inner loop is
> very simple and efficient.)
>
> Refs:
> Bottom-up Mergesort: A Detailed Analysis
> Wolfgang Panny, Helmut Prodinger
> Algorithmica 14(4):340--354, October 1995
> https://doi.org/10.1007/BF01294131
> https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.6.5260
>
> The cost distribution of queue-mergesort, optimal mergesorts, and
> power-of-two rules
> Wei-Mei Chen, Hsien-Kuei Hwang, Gen-Huey Chen
> Journal of Algorithms 30(2); Pages 423--448, February 1999
> https://doi.org/10.1006/jagm.1998.0986
> https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.4.5380
>
> Queue-Mergesort
> Mordecai J. Golin, Robert Sedgewick
> Information Processing Letters, 48(5):253--259, 10 December 1993
> https://doi.org/10.1016/0020-0190(93)90088-q
> https://sci-hub.tw/10.1016/0020-0190(93)90088-Q
>
> Feedback from Rasmus Villemoes <linux@rasmusvillemoes.dk>.
>
> Link: http://lkml.kernel.org/r/fd560853cc4dca0d0f02184ffa888b4c1be89abc.1552704200.git.lkml@sdf.org
> Signed-off-by: George Spelvin <lkml@sdf.org>
> Acked-by: Andrey Abramov <st5pub@yandex.ru>
> Acked-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> Cc: Daniel Wagner <daniel.wagner@siemens.com>
> Cc: Dave Chinner <dchinner@redhat.com>
> Cc: Don Mullis <don.mullis@gmail.com>
> Cc: Geert Uytterhoeven <geert@linux-m68k.org>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
>
> George Spelvin authored and torvalds committed on 15 May 2019
:::

The figure from the **Queue-Mergesort** paper by Mordecai J. Golin and Robert Sedgewick illustrates the method: treat the linked list as a queue. Each merge takes the front two nodes of the queue; the merged result is appended at the tail and treated as a single node. Repeat until everything has been merged.

> ![](https://i.imgur.com/8M9b1DX.png)

Therefore, wherever `list_sort` is used, the concept of merging two sorted lists is at work behind the scenes.

Q2: Shortcomings of Example 1 (part 1)

Ans: When the problem was presented, the input data type was never mentioned, nor was the kind of binary tree (the constraint), e.g. whether it is a complete binary tree or a full binary tree. Finally, the type of each node was not specified.

Q3 - 1: Shortcomings of Example 1 (part 2)

Ans:
* 🐶 interviewer:
    * During the problem statement, it is better to start with the picture from example 1 and explain up front that the problem gives a binary tree whose nodes hold integer values. A path runs from the root node to a leaf node, e.g. "5-4-11-2" is one root-to-leaf path, and summing the node values along a root-to-leaf path gives **targetSum**; the path is one-way, with no backtracking. <br>![](https://i.imgur.com/zrkvA8U.jpg =50%x)<br>Then state the requirement: the interviewee must write a function that decides whether the given targetSum exists in the binary tree.<br>Also make it explicit that **the first argument is the root of the binary tree, and the second argument is the targetSum to check**.
    * Lecture notes: [1:27](https://youtu.be/vcccBHvvoJw?t=87) the example should be walked through by the interviewer, not the interviewee.
    * [3:13](https://youtu.be/vcccBHvvoJw?t=193) From what I have observed, a Google coding interview usually does not let the interviewee jump straight into code or pseudocode; the interviewer first has to understand the interviewee's approach ([Google Coding Interview: 15:20-15:41](https://youtu.be/rw4s4M3hFfs?t=920)) before the interviewee writes pseudocode ([Google Coding Interview: 16:54-16:57](https://youtu.be/rw4s4M3hFfs?t=1004)). Hence [3:16-3:23](https://youtu.be/vcccBHvvoJw?t=196) should have been the interviewer's lines.
* 🐼 interviewee:
    * [2:23](https://youtu.be/vcccBHvvoJw?t=123) You **can simply reuse the picture from example 1 when asking clarifying questions**; it saves time and makes the questions more concrete.
    * [3:28-3:42](https://youtu.be/vcccBHvvoJw?t=208) Explaining the method before coding is correct, and **it helps to name the algorithm you intend to use up front (e.g. method 1 here uses BFS)**. The role of the queue was not explained clearly enough: even though the video says "what the queue stores should be the node together with the value left after subtracting from targetSum", it would be clearer to say "**I plan to use BFS; while visiting the nodes I will use a queue to hold values for each visited node, and rather than storing the node value directly, the trick is to store the difference between targetSum and the node value**". With the queue's purpose stated clearly, the interviewer can more easily follow when and why you pop() or push() the queue during the later `Approach` explanation and while writing code at [6:37](https://youtu.be/vcccBHvvoJw?t=397).
    * [3:55](https://youtu.be/vcccBHvvoJw?t=235) Here [Panda in the video](https://hackmd.io/@sysprog/BkeIYF8-j#112-Path-Sum) apparently wanted to apply `REACTO`, illustrating the `Approach` with an `Example`, but a purely verbal description of an algorithm is hard for the listener (the interviewer) to visualize.<br>While talking, it helps to type a diagram of the queue into Google Docs, like the text sketch below, and update it whenever you pop or push.

    ```c
    // queue
    // -------------------------
    // (17-8), (17-4), (22-5)
    // -------------------------
    ```
    * When explaining the `Approach`, it helps to **first outline the steps and strategy roughly as code comments, confirm the interviewer is following, and only then write each step's detailed pseudocode or actual implementation** (e.g. [Google Coding Interview With A High School Student: 17:08-32:33](https://youtu.be/qz9tKlF431k?t=1028)).<br>Writing the algorithm steps first means the later code can follow each step, and it keeps the interviewer in sync with the interviewee's pace.

Q3 - 2: Other approaches to Example 1

Ans: I tried a third method using Python's [dictionary](https://docs.python.org/3/tutorial/datastructures.html#dictionaries), as hinted by the instructor in class. The following starts from the `Approach` step of `REACTO`, using example 1 as the `Example` while explaining the `Approach`:

* Interviewee: "I think we can visit every node of the binary tree with **recursive DFS** (see the comment box below; write **`DFS`** in the docs as we go), propagating each path's current summation back up from the leaf nodes at the bottom of the tree. I will pair this with a **dictionary that records the current summation of every path**: the dictionary key records the current node value summation, and the dictionary value records how many times that summation occurs (write `dict of (current sum, counts)` in the docs). After traversing the whole tree we have the node value summations of all paths; if any path matches targetSum we return True, otherwise False."

![](https://i.imgur.com/zrkvA8U.jpg =45%x)

* Interviewee: "(Using the example 1 figure, path 5-4-11-7.) The **base case** of the recursion is when both children are None, meaning we have reached a
leaf node. The recursion then wraps value 7 from ++node 7++ into a dictionary and returns it to ++node 11++; visiting ++node 2++ likewise returns its node value wrapped in a dictionary. At that point ++node 11++ can merge the two dictionaries returned by its children into one, computing ++node 11++'s current summation at the same time."
* Interviewee: "++node 11++ adds two new current summations to the dictionary, the keys being node.value combined with the node values returned from the left and right children (write `recursive step: ...` in the docs). Returning toward the root, the right subtree is traversed first in the same way; once traversal is back at the root, the dictionary records the summations of all four paths."
* Next, write out the algorithm steps (`#1 ~ #5` below):

```
# Main Approach: DFS, dict of (current sum, counts)
# recursive step: HT[node.val - sub_leftnode.val], HT[node.val - sub_rightnode.val]
# 1) Use DFS to traverse this tree
# 2) When reaching a leaf node, make a dict: map_key is node.val, map_value is a count
# 3) Get the current sum by merging the dicts from the left and right subtrees
# 4) At the end of the DFS, we have the sums and counts of all paths
# 5) if (HT[targetSum] > 0) return True else False
```

* `Code` reference implementation

```python=
class Solution:
    # T_whole(n) = O(1) + T_DFS(n) + O(1) = O(n * log(n))
    def hasPathSum(self, root: Optional[TreeNode], targetSum: int) -> bool:
        # O(1): an empty tree has no path
        if not root:
            return False
        # run DFS to get a dict of (path sum, count) covering all root-to-leaf paths
        HT_full = self.DFS(root)
        # dict.get falls back to 0 when targetSum is not a recorded path sum
        return HT_full.get(targetSum, 0) > 0

    # T_DFS(n) = base cases + recursive steps
    #          = O(count of NULL nodes) + O(k * merge time)
    #          = O(n) + O(k * 2^{k-1})        # k = log(n+1)
    #          = O(n) + O(log(n) * n)
    #          = O(n * log(n))
    def DFS(self, node):
        '''Base case: reached a NULL child node'''
        if not node:
            return {}
        '''Recursive steps'''
        # leaf node: return a dict whose map_key is node.val, map_value its count (initially 1)
        if (not node.left) and (not node.right):
            return {node.val: 1}
        HT1 = self.DFS(node.left)
        HT2 = self.DFS(node.right)
        # shift both children's path sums by node.val and merge them
        return self.MergeHT(node.val, HT1, HT2)

    # T_MergeHT(n) = O(count of keys in the input hash tables)
    #              = O(count of paths) = O(count of leaf nodes)
    #              = O(2^{k-1}) = O(n/2)      # k = log(n+1)
    def MergeHT(self, current_val, HT1, HT2):
        merged = {}
        for HT in (HT1, HT2):
            for k in HT:
                # dict.get falls back to 0 when the shifted key is absent
                merged.update({k + current_val: merged.get(k + current_val, 0) + HT[k]})
        return merged
```

* Key points in the code
    * Lines 3~10: `hasPathSum()` is the main function
    * Lines 17~28 implement the DFS
    * Lines 33~39 implement the dictionary merge described in the `# recursive step` note
    * Lines 10 and 38: [dict.get()](https://docs.python.org/3/library/stdtypes.html#dict.get) looks up the map_value for a map_key, falling back to the given default 0 when the key is absent
    * Line 38: [dict.update()](https://docs.python.org/3/library/stdtypes.html#dict.update) is how a Python dict inserts a key/value pair
* `Test`
    * Assume a full binary tree: the input tree has $n$ nodes and height $k$
    * The time complexity of `hasPathSum()` is dominated by `DFS()`
    * The cost of `DFS()` is the number of base cases plus the cost of the recursive steps
    * The base case fires once per NULL child visited, i.e. $O(2^{k})$ times
    * The recursive steps "recurse into the left and right subtrees" and "run `MergeHT()`", so they cost $k$ times the running time of `MergeHT()`
    * Assuming every dictionary lookup is $O(1)$, `MergeHT()` costs the total number of keys in the two hash tables; in the end the key count equals the number of paths, i.e. the number of leaf nodes, which is $2^{k-1}$
    * Putting it together, $T_{recursive\;steps} = O(k \cdot 2^{k-1})$, with $k = log_2(n+1)$
    * Hence the time complexity of `hasPathSum()` is $O(n) + O(n \cdot log_2(n)) = O(n \cdot log_2(n))$
* `Optimization`: ==As an extension: the approach I originally mentioned in class — a **DFS traversal** with a **global variable tracking the current summation** — happens to serve well as the `Optimization` step.==
    * Interviewee: "The original method has to traverse every path, but this problem only needs one path matching targetSum before it can return True, so I can arrange to stop the traversal early once a correct path is found. Also, the traversal does not really need to record that much information: a single global variable holding the current summation of the path being visited is enough."
    * The `code` is below (both a ++revised version of the original Python code++ and a ++C version that can be compared with Panda's solution++)
    * Interviewee: "(The following can be narrated while editing a copy of the original code.) **First I remove the dict and `MergeHT()`, because I am switching to a single global variable for the current summation.** The usual approach passes the current summation as a recursion parameter, but the side effect is that every recursive call stores another copy of it on the stack; managed as a global it needs only 4 bytes at run time (assuming C with an integer). In a Python class, `self.current_sum` plays the role of that shared variable. Next, **DFS's return type becomes boolean; the recursive step becomes `self.current_sum += node.val` on every visited node; and after visiting each of the left and right children we check for early termination, i.e. `if result == True: return True`.**"
    * `Test` Interviewee: "For the revised code, assume $n$ nodes in total and tree height $k$ (root at height 1). The worst case is $O(2^{k}-1)$, since every node may need to be visited once; the best case, where the first path already matches, is $\Omega(k)$. As for the height $k$: a fully skewed tree has $k = n$, equivalent to walking a linked list; a full binary tree has $n = 2^{k}-1$ nodes, hence $k = log_{2}(n+1)$."

:::spoiler {state="open"} (expandable) **(Optimization) Revised Python code**
```python
# Optimization reference: DFS + global current_sum
class Solution:
    def __init__(self):
        self.current_sum = 0
        self.TargetSum = 0

    def hasPathSum(self, root: Optional[TreeNode], targetSum: int) -> bool:
        self.TargetSum = targetSum
        return self.DFS(root)

    # run DFS and return a boolean result
    def DFS(self, node):
        '''Base case: reached a NULL child node'''
        if not node:
            return False

        '''Recursive step'''
        self.current_sum += node.val
        # reached a leaf node
        if (not node.left) and (not node.right):
            if self.current_sum == self.TargetSum:
                return True
        result = self.DFS(node.left)
        if result:
            return True    # early stop
        result = self.DFS(node.right)
        if result:
            return True    # early stop
        # result == False: give up this node, remove its value from current_sum
        self.current_sum -= node.val
        return False
```
:::

:::spoiler {state="close"} (expandable) **(Optimization) C implementation for reference**
```c
/* Optimization reference: DFS + global current_sum */
int current_sum = 0;

bool DFS(struct TreeNode *node, int targetSum);

bool hasPathSum(struct TreeNode *root, int targetSum)
{
    current_sum = 0;
    return DFS(root, targetSum);
}

bool DFS(struct TreeNode *node, int targetSum)
{
    /* base case */
    if (!node)
        return false;

    /* recursive step */
    current_sum += node->val;

    /* leaf node check */
    if (!(node->left) && !(node->right))
        if (current_sum == targetSum)
            return true;

    bool result;
    result = DFS(node->left, targetSum);
    if (result)
        return result;
    result = DFS(node->right, targetSum);
    if (result)
        return result;

    /* result == false: give up this node, remove its value from current_sum */
    current_sum -= node->val;
    return false;
}
```
:::

<br>

Q4: How DFS and BFS implementations differ for "hasPathSum"

Ans: BFS needs $O(B^M)$ space ($B$ the branching factor, $M$ the depth), far more than DFS's $O(BM)$. Since "hasPathSum" must, and only needs to, find a single leaf satisfying the condition, BFS's strength of "searching breadth-first toward the best solution" cannot be exploited here; instead it spends a large amount of space holding nodes awaiting inspection, and copying nodes costs considerable time as well.
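
As a coda to Q1, the intermediate format of the kernel's `merge()` can be reproduced outside the kernel. The sketch below is a minimal userspace version under assumed simplifications: a hypothetical `struct node` with an `int` payload stands in for the intrusive `struct list_head`, and the comparison is inlined instead of going through a `list_cmp_func_t` callback; the `<=` test that preserves stability is kept from the original.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified stand-in for struct list_head: singly linked,
 * null-terminated, as in merge()'s intermediate format. */
struct node {
    int val;
    struct node *next;
};

/* Merge two sorted, null-terminated lists. On ties take 'a' first,
 * which is what keeps the overall sort stable, mirroring the
 * cmp(priv, a, b) <= 0 test in lib/list_sort.c. */
static struct node *merge(struct node *a, struct node *b)
{
    struct node *head, **tail = &head;

    while (a && b) {
        if (a->val <= b->val) {   /* <=, not <: stability */
            *tail = a;
            tail = &a->next;
            a = a->next;
        } else {
            *tail = b;
            tail = &b->next;
            b = b->next;
        }
    }
    *tail = a ? a : b;   /* append whichever list remains */
    return head;
}

/* Helper for demos: build a null-terminated list from an array. */
static struct node *from_array(const int *v, int n)
{
    struct node *head = NULL, **tail = &head;
    for (int i = 0; i < n; i++) {
        struct node *p = malloc(sizeof(*p));
        p->val = v[i];
        p->next = NULL;
        *tail = p;
        tail = &p->next;
    }
    return head;
}
```

Unlike the kernel version, the `**tail` indirect pointer here also removes the need for a special case on the first appended node, which is the same trick that makes the kernel's inner loop so compact.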
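
Relating back to the Q3 - 1 suggestion of storing the difference between targetSum and the values seen so far, here is a minimal BFS sketch of `hasPathSum` in C. The array-backed queue with a fixed `QCAP` capacity and the `struct TreeNode` layout are assumptions for illustration (LeetCode style), not how Panda's solution was written.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* LeetCode-style binary tree node, as in the problem statement */
struct TreeNode {
    int val;
    struct TreeNode *left;
    struct TreeNode *right;
};

/* Fixed queue capacity: an assumption for this sketch; the queue
 * holds at most one entry per tree node, so trees up to QCAP nodes fit. */
#define QCAP 1024

/* BFS variant of hasPathSum: the queue stores, for each node, the
 * *remaining* amount (targetSum minus the values on the path so far)
 * rather than the running sum itself. A leaf is a match when its own
 * value uses up exactly the remaining amount. */
bool hasPathSum(struct TreeNode *root, int targetSum)
{
    struct TreeNode *nodes[QCAP];
    int remain[QCAP];
    int head = 0, tail = 0;

    if (!root)
        return false;

    nodes[tail] = root;
    remain[tail++] = targetSum;   /* nothing subtracted yet */

    while (head < tail) {
        struct TreeNode *n = nodes[head];
        int r = remain[head++] - n->val;   /* subtract this node's value */

        if (!n->left && !n->right && r == 0)
            return true;   /* leaf that completes the target sum */
        if (n->left) {
            nodes[tail] = n->left;
            remain[tail++] = r;
        }
        if (n->right) {
            nodes[tail] = n->right;
            remain[tail++] = r;
        }
    }
    return false;
}
```

Note how this makes the Q4 point concrete: the queue must retain every frontier node and its remaining amount, whereas the DFS version above carries only one running sum.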