Nic: Is learning data structures and algorithms actually useful at work?
Q1: A real-world case of Merge Two Sorted Lists
linux/list.h is the Linux kernel's practical doubly linked list wrapper: embed a struct list_head in your own structure and you can use the whole family of APIs the kernel provides, such as list_add, list_del, list_empty, and so on.
In the Linux source tree, lib/list_sort.c contains the following merge implementation, whose job is to combine two already-sorted linked lists:
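The kernel's merge() is written in C over struct list_head; below is a minimal Python sketch of the same tail-pointer merge logic (ListNode, from_list, and to_list are illustrative names of mine, not kernel APIs):

```python
class ListNode:
    def __init__(self, val, next=None):
        self.val, self.next = val, next

def merge(a, b):
    """Merge two sorted singly linked lists, kernel-merge style:
    a tail pointer always appends the smaller of the two heads,
    and the leftover list is spliced on at the end."""
    dummy = tail = ListNode(0)   # dummy head simplifies the tail logic
    while a and b:
        if a.val <= b.val:       # <= keeps equal elements stable
            tail.next, a = a, a.next
        else:
            tail.next, b = b, b.next
        tail = tail.next
    tail.next = a or b           # splice the remaining nodes
    return dummy.next

def from_list(vals):
    """Build a linked list from a Python list (helper for the demo)."""
    head = None
    for v in reversed(vals):
        head = ListNode(v, head)
    return head

def to_list(head):
    """Collect a linked list back into a Python list (helper)."""
    out = []
    while head:
        out.append(head.val)
        head = head.next
    return out
```

The kernel version does the same thing with a pointer-to-pointer tail (`struct list_head **tail`) so no dummy node is needed.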
This function is used by list_sort to implement an efficient merge sort for linked lists.
According to the commit of May 15, 2019, the efficient merge sort concept comes from the Queue-mergesort paper:
lib/list_sort: optimize number of calls to comparison function
CONFIG_RETPOLINE has severely degraded indirect function call
performance, so it's worth putting some effort into reducing the number
of times cmp() is called.

This patch avoids badly unbalanced merges on unlucky input sizes. It
slightly increases the code size, but saves an average of 0.2*n calls to
cmp().

x86-64 code size 739 -> 803 bytes (+64)
Unfortunately, there's not a lot of low-hanging fruit in a merge sort;
it already performs only nlog2(n) - Kn + O(1) compares. The leading
coefficient is already at the theoretical limit (log2(n!) corresponds to
K=1.4427), so we're fighting over the linear term, and the best
mergesort can do is K=1.2645, achieved when n is a power of 2.

The differences between mergesort variants appear when n is not a
power of 2; K is a function of the fractional part of log2(n). Top-down
mergesort does best of all, achieving a minimum K=1.2408, and an average
(over all sizes) K=1.248. However, that requires knowing the number of
entries to be sorted ahead of time, and making a full pass over the
input to count it conflicts with a second performance goal, which is
cache blocking.

Obviously, we have to read the entire list into L1 cache at some point,
and performance is best if it fits. But if it doesn't fit, each full
pass over the input causes a cache miss per element, which is
undesirable.

While textbooks explain bottom-up mergesort as a succession of merging
passes, practical implementations do merging in depth-first order: as
soon as two lists of the same size are available, they are merged. This
allows as many merge passes as possible to fit into L1; only the final
few merges force cache misses.

This cache-friendly depth-first merge order depends on us merging the
beginning of the input as much as possible before we've even seen the
end of the input (and thus know its size).

The simple eager merge pattern causes bad performance when n is just
over a power of 2. If n=1028, the final merge is between 1024- and
4-element lists, which is wasteful of comparisons. (This is actually
worse on average than n=1025, because a 1024:1 merge will, on average,
end after 512 compares, while 1024:4 will walk 4/5 of the list.)

Because of this, bottom-up mergesort achieves K < 0.5 for such sizes,
and has an average (over all sizes) K of around 1. (My experiments show
K=1.01, while theory predicts K=0.965.)

There are "worst-case optimal" variants of bottom-up mergesort which
avoid this bad performance, but the algorithms given in the literature,
such as queue-mergesort and boustrodephonic mergesort, depend on the
breadth-first multi-pass structure that we are trying to avoid.

This implementation is as eager as possible while ensuring that all
merge passes are at worst 1:2 unbalanced. This achieves the same
average K=1.207 as queue-mergesort, which is 0.2*n better than
bottom-up, and only 0.04*n behind top-down mergesort.

Specifically, it defers merging two lists of size 2^k until it is known
that there are 2^k additional inputs following. This ensures that the
final uneven merges triggered by reaching the end of the input will be
at worst 2:1. This will avoid cache misses as long as 3*2^k elements
fit into the cache.

(I confess to being more than a little bit proud of how clean this code
turned out. It took a lot of thinking, but the resultant inner loop is
very simple and efficient.)

Refs:
Bottom-up Mergesort: A Detailed Analysis
Wolfgang Panny, Helmut Prodinger
Algorithmica 14(4):340–354, October 1995
https://doi.org/10.1007/BF01294131
https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.6.5260

The cost distribution of queue-mergesort, optimal mergesorts, and
power-of-two rules
Wei-Mei Chen, Hsien-Kuei Hwang, Gen-Huey Chen
Journal of Algorithms 30(2); Pages 423–448, February 1999
https://doi.org/10.1006/jagm.1998.0986
https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.4.5380

Queue-Mergesort
Mordecai J. Golin, Robert Sedgewick
Information Processing Letters, 48(5):253–259, 10 December 1993
https://doi.org/10.1016/0020-0190(93)90088-q
https://sci-hub.tw/10.1016/0020-0190(93)90088-Q

Feedback from Rasmus Villemoes linux@rasmusvillemoes.dk.
Link: http://lkml.kernel.org/r/fd560853cc4dca0d0f02184ffa888b4c1be89abc.1552704200.git.lkml@sdf.org
Signed-off-by: George Spelvin lkml@sdf.org
Acked-by: Andrey Abramov st5pub@yandex.ru
Acked-by: Rasmus Villemoes linux@rasmusvillemoes.dk
Reviewed-by: Andy Shevchenko andriy.shevchenko@linux.intel.com
Cc: Daniel Wagner daniel.wagner@siemens.com
Cc: Dave Chinner dchinner@redhat.com
Cc: Don Mullis don.mullis@gmail.com
Cc: Geert Uytterhoeven geert@linux-m68k.org
Signed-off-by: Andrew Morton akpm@linux-foundation.org
Signed-off-by: Linus Torvalds torvalds@linux-foundation.org
George Spelvin authored and torvalds committed on 15 May 2019
The diagram in Mordecai J. Golin and Robert Sedgewick's Queue-Mergesort paper illustrates the method: treat the linked list as a queue of sorted sublists. Each merge takes the front two nodes of the queue, merges them, and appends the result to the tail as a single node; repeat until everything is merged into one list.
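The merge-front-two-and-append-to-tail procedure can be sketched like this, using Python lists in place of linked lists (queue_mergesort and merge are my names, not the paper's):

```python
from collections import deque

def merge(a, b):
    """Merge two sorted Python lists into one sorted list."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i]); i += 1
        else:
            out.append(b[j]); j += 1
    return out + a[i:] + b[j:]

def queue_mergesort(items):
    """Queue-mergesort sketch: start with one single-element sorted list
    per input item, repeatedly merge the two lists at the front of the
    queue and append the result to the tail, until one list remains."""
    if not items:
        return []
    q = deque([x] for x in items)
    while len(q) > 1:
        a, b = q.popleft(), q.popleft()
        q.append(merge(a, b))
    return q[0]
```

Note this is the breadth-first, multi-pass structure that the kernel commit above deliberately avoids for cache reasons; the comparison count, however, is the optimal average the commit cites (K=1.207).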
So wherever list_sort is used, the concept of merging two sorted lists is at work behind the scenes.
Q2: What Example 1 was missing (1)
Ans: When presenting the problem, the input's data type was never stated, nor were the constraints on the binary tree, e.g. whether it is a complete binary tree or a full binary tree. Finally, the type of each node was not specified.
Q3 - 1: What Example 1 was missing (2)
Ans:
When explaining the Approach, and again at 6:37 while writing the code, walking through an Example (as in REACTO) alongside the Approach makes it much easier for the interviewer to see when and why you pop() from or push() onto the queue; explaining the algorithm purely verbally is hard for the listener (the interviewer) to visualize. During the Approach step, it also helps to sketch the broad steps and strategy as code comments first, confirm the interviewer is following, and only then write the detailed pseudocode or implementation for each step (for example: Google Coding Interview With A High School Student, 17:08-32:33).

Q3 - 2: Another solution to Example 1
Ans:
I tried implementing a third approach using a Python dictionary, as the instructor hinted in the course. Below I start from the Approach step of REACTO, explaining the Approach while using example 1 as the Example:

Interviewee: "Traverse the tree depth-first (DFS), and from the bottom of the tree, the leaf nodes, pass each path's current summation back up. A dictionary records the current summation of every path: the dictionary key is the current node value summation and the dictionary value is how many times that summation occurs (noted in the docs as dict of (current sum, counts)). After traversing the whole tree we have the node value summations of all paths; if any path matches TargetSum, return True, otherwise return False."

(recursive step: ...) "When the recursion returns to the root, it then visits the right subtree; once that traversal comes back to the root, the dictionary records the summations of all four paths." (#1 ~ #5)

Code
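The original post embeds the full implementation; here is a minimal sketch of the described dictionary approach (the names hasPathSum, DFS, and MergeHT follow the discussion; the rest of the scaffolding is mine):

```python
class TreeNode:
    def __init__(self, val=0, left=None, right=None):
        self.val, self.left, self.right = val, left, right

def MergeHT(a, b):
    """Merge two hashtables of (current sum -> count) into one."""
    for s, c in b.items():
        a[s] = a.get(s, 0) + c
    return a

def DFS(node):
    """Return a dict mapping (sum of values on each leaf-to-here path,
    including this node) to the number of such paths."""
    if node.left is None and node.right is None:   # base case: leaf
        return {node.val: 1}
    sums = {}
    for child in (node.left, node.right):
        if child is not None:
            # recursive step: lift each child path sum by this node's value
            lifted = {s + node.val: c for s, c in DFS(child).items()}
            sums = MergeHT(sums, lifted)
    return sums

def hasPathSum(root, targetSum):
    """True iff some root-to-leaf path sums to targetSum."""
    if root is None:
        return False
    return targetSum in DFS(root)
```

For the example-1 tree the root-level dictionary ends up holding the four path sums, and membership in it answers the query.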
hasPathSum() is the main function, and the dictionary merge mentioned in #recursive step is done by MergeHT().

Test
The time complexity of hasPathSum() is governed by DFS(). DFS() costs (number of base cases + number of recursive steps), and each recursive step calls MergeHT(), so the total is O(n) multiplied by the running time of MergeHT(). A MergeHT() call takes time proportional to the sum of the key counts of the two hashtables; the final number of keys equals the number of paths, i.e. the number of leaf nodes, which can be on the order of n/2, so the overall time complexity of hasPathSum() is O(n^2) in the worst case.

Optimization: As an extension, the method I first brought up in class, DFS traversal with a global variable recording the current summation, happens to serve as an Optimization.
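A sketch of that optimized variant, assuming a LeetCode-style Solution class (the subtraction on the way back up is my addition for correctness when backtracking; it is implied but not spelled out in the discussion):

```python
class TreeNode:
    def __init__(self, val=0, left=None, right=None):
        self.val, self.left, self.right = val, left, right

class Solution:
    """Optimized variant: one running sum kept in self.current_sum
    (a class-wide variable) instead of merging per-path dictionaries,
    plus early exit as soon as one path matches."""

    def hasPathSum(self, root, targetSum):
        if root is None:
            return False
        self.current_sum = 0
        self.target = targetSum
        return self.dfs(root)

    def dfs(self, node):
        self.current_sum += node.val                  # visit this node
        if node.left is None and node.right is None:
            found = self.current_sum == self.target   # leaf: test the path
        else:
            found = node.left is not None and self.dfs(node.left)
            if not found and node.right is not None:  # early-termination check
                found = self.dfs(node.right)
        self.current_sum -= node.val                  # backtrack before returning
        return found
```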
The code is below (I provide both an extended version of the original Python code and a C version, which can be compared with classmate panda's).

Interviewee: "First, drop MergeHT(), because we switch to a single global variable to record the current summation. The usual approach passes the current summation as a recursion parameter, but the side effect is that every recursive call must save it on the memory stack; managed through one global variable it needs only 4 bytes at run time (assuming C with an integer type). In Python, self.current_sum declares such a class-wide variable. Next, DFS's return value becomes a boolean, its recursive step does self.current_sum += node.val at every visited node, and after visiting the left and right children we each time check for early termination, i.e. if result == True: return True."

Test
Interviewee: "For the modified code, suppose there are n nodes in total and the tree height is k (with the root at height 1). The worst case is O(n), because every node must be visited once; the best case, when the very first path already yields the answer, is O(k). As for the height k: in a completely skewed tree, k = n, which is equivalent to walking a linked list; in a full binary tree the total number of nodes is n = 2^k - 1, so k = log2(n + 1), i.e. O(log n)."

Q4: How DFS and BFS (as implementations) differ for "hasPathSum"
Ans: BFS's space complexity is O(n), since the widest level of the tree can hold about n/2 nodes, which is much larger than DFS's O(k), where k is the tree height. "hasPathSum" must find one, and needs only one, leaf node satisfying the condition, so BFS's advantage of searching breadth-first for the best answer never kicks in; instead it spends a lot of space holding the nodes still waiting to be checked, and copying nodes costs a fair amount of time as well.
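To make the space difference concrete, here is a BFS sketch of the same problem (has_path_sum_bfs and the TreeNode scaffolding are my names): the queue carries (node, sum-so-far) pairs, so in the worst case an entire tree level, about n/2 nodes, sits in memory at once.

```python
from collections import deque

class TreeNode:
    def __init__(self, val=0, left=None, right=None):
        self.val, self.left, self.right = val, left, right

def has_path_sum_bfs(root, target):
    """BFS variant: the queue stores (node, path sum so far); the queue
    can grow to the width of the widest tree level, i.e. O(n) space."""
    if root is None:
        return False
    q = deque([(root, root.val)])
    while q:
        node, s = q.popleft()
        if node.left is None and node.right is None and s == target:
            return True                       # found a matching leaf
        if node.left:
            q.append((node.left, s + node.left.val))
        if node.right:
            q.append((node.right, s + node.right.val))
    return False
```

Since the answer lives at a leaf anyway, BFS pays the O(n) queue without any compensating early advantage, which is the trade-off the answer above describes.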