Nic: Is learning data structures and algorithms actually useful at work?
Q1: A real-world case of Merge Two Sorted Lists
linux/list.h is the Linux kernel's practical doubly linked list wrapper: embed a struct list_head in your own structure and you can use the whole family of APIs the kernel provides, such as list_add, list_del, list_empty, and so on.
In the Linux source tree, lib/list_sort.c contains the following merge implementation, whose job is to combine two already-sorted linked lists:
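The kernel's merge() is written in C over struct list_head; below is a minimal Python sketch of the same tail-pointer merge logic (ListNode, from_list, and to_list are illustrative names of mine, not kernel APIs):

```python
class ListNode:
    def __init__(self, val, next=None):
        self.val, self.next = val, next

def merge(a, b):
    """Merge two sorted singly linked lists, kernel-merge style:
    a tail pointer always appends the smaller of the two heads,
    and the leftover list is spliced on at the end."""
    dummy = tail = ListNode(0)   # dummy head simplifies the tail logic
    while a and b:
        if a.val <= b.val:       # <= keeps equal elements stable
            tail.next, a = a, a.next
        else:
            tail.next, b = b, b.next
        tail = tail.next
    tail.next = a or b           # splice the remaining nodes
    return dummy.next

def from_list(vals):
    """Build a linked list from a Python list (helper for the demo)."""
    head = None
    for v in reversed(vals):
        head = ListNode(v, head)
    return head

def to_list(head):
    """Collect a linked list back into a Python list (helper)."""
    out = []
    while head:
        out.append(head.val)
        head = head.next
    return out
```

The kernel version does the same thing with a pointer-to-pointer tail (`struct list_head **tail`) so no dummy node is needed.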
This function is used by list_sort to implement an efficient merge sort for linked lists.
According to the commit of May 15, 2019, the efficient merge sort concept comes from the Queue-mergesort paper:
lib/list_sort: optimize number of calls to comparison function
CONFIG_RETPOLINE has severely degraded indirect function call
performance, so it's worth putting some effort into reducing the number
of times cmp() is called.

This patch avoids badly unbalanced merges on unlucky input sizes. It
slightly increases the code size, but saves an average of 0.2*n calls to
cmp().

x86-64 code size 739 -> 803 bytes (+64)
Unfortunately, there's not a lot of low-hanging fruit in a merge sort;
it already performs only nlog2(n) - Kn + O(1) compares. The leading
coefficient is already at the theoretical limit (log2(n!) corresponds to
K=1.4427), so we're fighting over the linear term, and the best
mergesort can do is K=1.2645, achieved when n is a power of 2.

The differences between mergesort variants appear when n is not a
power of 2; K is a function of the fractional part of log2(n). Top-down
mergesort does best of all, achieving a minimum K=1.2408, and an average
(over all sizes) K=1.248. However, that requires knowing the number of
entries to be sorted ahead of time, and making a full pass over the
input to count it conflicts with a second performance goal, which is
cache blocking.

Obviously, we have to read the entire list into L1 cache at some point,
and performance is best if it fits. But if it doesn't fit, each full
pass over the input causes a cache miss per element, which is
undesirable.

While textbooks explain bottom-up mergesort as a succession of merging
passes, practical implementations do merging in depth-first order: as
soon as two lists of the same size are available, they are merged. This
allows as many merge passes as possible to fit into L1; only the final
few merges force cache misses.

This cache-friendly depth-first merge order depends on us merging the
beginning of the input as much as possible before we've even seen the
end of the input (and thus know its size).

The simple eager merge pattern causes bad performance when n is just
over a power of 2. If n=1028, the final merge is between 1024- and
4-element lists, which is wasteful of comparisons. (This is actually
worse on average than n=1025, because a 1024:1 merge will, on average,
end after 512 compares, while 1024:4 will walk 4/5 of the list.)

Because of this, bottom-up mergesort achieves K < 0.5 for such sizes,
and has an average (over all sizes) K of around 1. (My experiments show
K=1.01, while theory predicts K=0.965.)

There are "worst-case optimal" variants of bottom-up mergesort which
avoid this bad performance, but the algorithms given in the literature,
such as queue-mergesort and boustrodephonic mergesort, depend on the
breadth-first multi-pass structure that we are trying to avoid.

This implementation is as eager as possible while ensuring that all
merge passes are at worst 1:2 unbalanced. This achieves the same
average K=1.207 as queue-mergesort, which is 0.2*n better than
bottom-up, and only 0.04*n behind top-down mergesort.

Specifically, it defers merging two lists of size 2^k until it is known
that there are 2^k additional inputs following. This ensures that the
final uneven merges triggered by reaching the end of the input will be
at worst 2:1. This will avoid cache misses as long as 3*2^k elements
fit into the cache.

(I confess to being more than a little bit proud of how clean this code
turned out. It took a lot of thinking, but the resultant inner loop is
very simple and efficient.)

Refs:
Bottom-up Mergesort: A Detailed Analysis
Wolfgang Panny, Helmut Prodinger
Algorithmica 14(4):340–354, October 1995
https://doi.org/10.1007/BF01294131
https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.6.5260

The cost distribution of queue-mergesort, optimal mergesorts, and
power-of-two rules
Wei-Mei Chen, Hsien-Kuei Hwang, Gen-Huey Chen
Journal of Algorithms 30(2); Pages 423–448, February 1999
https://doi.org/10.1006/jagm.1998.0986
https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.4.5380

Queue-Mergesort
Mordecai J. Golin, Robert Sedgewick
Information Processing Letters, 48(5):253–259, 10 December 1993
https://doi.org/10.1016/0020-0190(93)90088-q
https://sci-hub.tw/10.1016/0020-0190(93)90088-Q

Feedback from Rasmus Villemoes linux@rasmusvillemoes.dk.
Link: http://lkml.kernel.org/r/fd560853cc4dca0d0f02184ffa888b4c1be89abc.1552704200.git.lkml@sdf.org
Signed-off-by: George Spelvin lkml@sdf.org
Acked-by: Andrey Abramov st5pub@yandex.ru
Acked-by: Rasmus Villemoes linux@rasmusvillemoes.dk
Reviewed-by: Andy Shevchenko andriy.shevchenko@linux.intel.com
Cc: Daniel Wagner daniel.wagner@siemens.com
Cc: Dave Chinner dchinner@redhat.com
Cc: Don Mullis don.mullis@gmail.com
Cc: Geert Uytterhoeven geert@linux-m68k.org
Signed-off-by: Andrew Morton akpm@linux-foundation.org
Signed-off-by: Linus Torvalds torvalds@linux-foundation.org
George Spelvin authored and torvalds committed on 15 May 2019
The diagram in Mordecai J. Golin and Robert Sedgewick's Queue-Mergesort paper illustrates the method: treat the linked list as a queue of sorted sublists. Each merge takes the front two nodes of the queue, merges them, and appends the result to the tail as a single node; repeat until everything is merged into one list.
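The merge-front-two-and-append-to-tail procedure can be sketched like this, using Python lists in place of linked lists (queue_mergesort and merge are my names, not the paper's):

```python
from collections import deque

def merge(a, b):
    """Merge two sorted Python lists into one sorted list."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i]); i += 1
        else:
            out.append(b[j]); j += 1
    return out + a[i:] + b[j:]

def queue_mergesort(items):
    """Queue-mergesort sketch: start with one single-element sorted list
    per input item, repeatedly merge the two lists at the front of the
    queue and append the result to the tail, until one list remains."""
    if not items:
        return []
    q = deque([x] for x in items)
    while len(q) > 1:
        a, b = q.popleft(), q.popleft()
        q.append(merge(a, b))
    return q[0]
```

Note this is the breadth-first, multi-pass structure that the kernel commit above deliberately avoids for cache reasons; the comparison count, however, is the optimal average the commit cites (K=1.207).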
So wherever list_sort is used, the concept of merging two sorted lists is at work behind the scenes.
Q2: What Example 1 was missing (1)
Ans: When presenting the problem, the input's data type was never stated, nor were the constraints on the binary tree, e.g. whether it is a complete binary tree or a full binary tree. Finally, the type of each node was not specified.
Q3 - 1: What Example 1 was missing (2)
Ans:
When explaining the Approach, and again at 6:37 while writing the code, walking through an Example (as in REACTO) alongside the Approach makes it much easier for the interviewer to see when and why you pop() from or push() onto the queue; explaining the algorithm purely verbally is hard for the listener (the interviewer) to visualize. During the Approach step, it also helps to sketch the broad steps and strategy as code comments first, confirm the interviewer is following, and only then write the detailed pseudocode or implementation for each step (for example: Google Coding Interview With A High School Student, 17:08-32:33).

Q3 - 2: Another solution to Example 1
Ans:
I tried implementing a third approach using a Python dictionary, as the instructor hinted in the course. Below I start from the Approach step of REACTO, explaining the Approach while using example 1 as the Example:

Interviewee: "Traverse the tree depth-first (DFS), and from the bottom of the tree, the leaf nodes, pass each path's current summation back up. A dictionary records the current summation of every path: the dictionary key is the current node value summation and the dictionary value is how many times that summation occurs (noted in the docs as dict of (current sum, counts)). After traversing the whole tree we have the node value summations of all paths; if any path matches TargetSum, return True, otherwise return False."

(recursive step: ...) "When the recursion returns to the root, it then visits the right subtree; once that traversal comes back to the root, the dictionary records the summations of all four paths." (#1 ~ #5)

Code
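The original post embeds the full implementation; here is a minimal sketch of the described dictionary approach (the names hasPathSum, DFS, and MergeHT follow the discussion; the rest of the scaffolding is mine):

```python
class TreeNode:
    def __init__(self, val=0, left=None, right=None):
        self.val, self.left, self.right = val, left, right

def MergeHT(a, b):
    """Merge two hashtables of (current sum -> count) into one."""
    for s, c in b.items():
        a[s] = a.get(s, 0) + c
    return a

def DFS(node):
    """Return a dict mapping (sum of values on each leaf-to-here path,
    including this node) to the number of such paths."""
    if node.left is None and node.right is None:   # base case: leaf
        return {node.val: 1}
    sums = {}
    for child in (node.left, node.right):
        if child is not None:
            # recursive step: lift each child path sum by this node's value
            lifted = {s + node.val: c for s, c in DFS(child).items()}
            sums = MergeHT(sums, lifted)
    return sums

def hasPathSum(root, targetSum):
    """True iff some root-to-leaf path sums to targetSum."""
    if root is None:
        return False
    return targetSum in DFS(root)
```

For the example-1 tree the root-level dictionary ends up holding the four path sums, and membership in it answers the query.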
hasPathSum() is the main function, and the dictionary merge mentioned in #recursive step is done by MergeHT().

Test
The time complexity of hasPathSum() is governed by DFS(). DFS() costs (number of base cases + number of recursive steps), and each recursive step calls MergeHT(), so the total is O(n) multiplied by the running time of MergeHT(). A MergeHT() call takes time proportional to the sum of the key counts of the two hashtables; the final number of keys equals the number of paths, i.e. the number of leaf nodes, which can be on the order of n/2, so the overall time complexity of hasPathSum() is O(n^2) in the worst case.

Optimization: As an extension, the method I first brought up in class, DFS traversal with a global variable recording the current summation, happens to serve as an Optimization.
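A sketch of that optimized variant, assuming a LeetCode-style Solution class (the subtraction on the way back up is my addition for correctness when backtracking; it is implied but not spelled out in the discussion):

```python
class TreeNode:
    def __init__(self, val=0, left=None, right=None):
        self.val, self.left, self.right = val, left, right

class Solution:
    """Optimized variant: one running sum kept in self.current_sum
    (a class-wide variable) instead of merging per-path dictionaries,
    plus early exit as soon as one path matches."""

    def hasPathSum(self, root, targetSum):
        if root is None:
            return False
        self.current_sum = 0
        self.target = targetSum
        return self.dfs(root)

    def dfs(self, node):
        self.current_sum += node.val                  # visit this node
        if node.left is None and node.right is None:
            found = self.current_sum == self.target   # leaf: test the path
        else:
            found = node.left is not None and self.dfs(node.left)
            if not found and node.right is not None:  # early-termination check
                found = self.dfs(node.right)
        self.current_sum -= node.val                  # backtrack before returning
        return found
```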
The code is below (I provide both an extended version of the original Python code and a C version, which can be compared with classmate panda's).

Interviewee: "First, drop MergeHT(), because we switch to a single global variable to record the current summation. The usual approach passes the current summation as a recursion parameter, but the side effect is that every recursive call must save it on the memory stack; managed through one global variable it needs only 4 bytes at run time (assuming C with an integer type). In Python, self.current_sum declares such a class-wide variable. Next, DFS's return value becomes a boolean, its recursive step does self.current_sum += node.val at every visited node, and after visiting the left and right children we each time check for early termination, i.e. if result == True: return True."

Test
Interviewee: "For the modified code, suppose there are n nodes in total and the tree height is k (with the root at height 1). The worst case is O(n), because every node must be visited once; the best case, when the very first path already yields the answer, is O(k). As for the height k: in a completely skewed tree, k = n, which is equivalent to walking a linked list; in a full binary tree the total number of nodes is n = 2^k - 1, so k = log2(n + 1), i.e. O(log n)."

Q4: How DFS and BFS (as implementations) differ for "hasPathSum"
Ans: BFS's space complexity is O(n), since the widest level of the tree can hold about n/2 nodes, which is much larger than DFS's O(k), where k is the tree height. "hasPathSum" must find one, and needs only one, leaf node satisfying the condition, so BFS's advantage of searching breadth-first for the best answer never kicks in; instead it spends a lot of space holding the nodes still waiting to be checked, and copying nodes costs a fair amount of time as well.
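To make the space difference concrete, here is a BFS sketch of the same problem (has_path_sum_bfs and the TreeNode scaffolding are my names): the queue carries (node, sum-so-far) pairs, so in the worst case an entire tree level, about n/2 nodes, sits in memory at once.

```python
from collections import deque

class TreeNode:
    def __init__(self, val=0, left=None, right=None):
        self.val, self.left, self.right = val, left, right

def has_path_sum_bfs(root, target):
    """BFS variant: the queue stores (node, path sum so far); the queue
    can grow to the width of the widest tree level, i.e. O(n) space."""
    if root is None:
        return False
    q = deque([(root, root.val)])
    while q:
        node, s = q.popleft()
        if node.left is None and node.right is None and s == target:
            return True                       # found a matching leaf
        if node.left:
            q.append((node.left, s + node.left.val))
        if node.right:
            q.append((node.right, s + node.right.val))
    return False
```

Since the answer lives at a leaf anyway, BFS pays the O(n) queue without any compensating early advantage, which is the trade-off the answer above describes.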