2025q1 Homework2 (quiz1+2)

# 2025q1 Homework2 (quiz1+2) contributed by < `LinMarc1210` > {%hackmd NrmQUGbRQWemgwPfhzXj6g %} ## 開發環境 ```shell $ gcc --version gcc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0 $ lscpu Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Address sizes: 39 bits physical, 48 bits virtual Byte Order: Little Endian CPU(s): 16 On-line CPU(s) list: 0-15 Vendor ID: GenuineIntel Model name: 12th Gen Intel(R) Core(TM) i7-1260P CPU family: 6 Model: 154 Thread(s) per core: 2 Core(s) per socket: 12 Socket(s): 1 Stepping: 3 CPU(s) scaling MHz: 21% CPU max MHz: 4700.0000 CPU min MHz: 400.0000 BogoMIPS: 4992.00 Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge m ca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 s s ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nons top_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma c x16 xtpr pdcm sse4_1 sse4_2 x2apic movbe popcnt tsc_de adline_timer aes xsave avx f16c rdrand lahf_lm abm 3dn owprefetch cpuid_fault epb ssbd ibrs ibpb stibp ibrs_e nhanced tpr_shadow flexpriority ept vpid ept_ad fsgsba se tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt clwb intel_pt sha_ni xsaveopt xsav ec xgetbv1 xsaves split_lock_detect user_shstk avx_vnn i dtherm ida arat pln pts hwp hwp_notify hwp_act_windo w hwp_epp hwp_pkg_req hfi vnmi umip pku ospke waitpkg gfni vaes vpclmulqdq rdpid movdiri movdir64b fsrm md_c lear serialize arch_lbr ibt flush_l1d arch_capabilitie s Virtualization features: Virtualization: VT-x Caches (sum of all): L1d: 448 KiB (12 instances) L1i: 640 KiB (12 instances) L2: 9 MiB (6 instances) L3: 18 MiB (1 instance) NUMA: NUMA node(s): 1 NUMA node0 CPU(s): 0-15 Vulnerabilities: Gather data sampling: Not affected Itlb multihit: Not affected L1tf: Not affected Mds: Not affected Meltdown: Not affected Mmio stale data: Not affected Reg file data sampling: Mitigation; Clear Register File Retbleed: Not affected Spec rstack overflow: Not affected Spec store bypass: Mitigation; Speculative Store Bypass disabled via prct l Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointe r sanitization Spectre v2: Mitigation; Enhanced / Automatic IBRS; IBPB conditiona l; RSB filling; PBRSB-eIBRS SW sequence; BHI BHI_DIS_S Srbds: Not affected Tsx async abort: Not affected ``` ## 第一週 - 測驗 1 > Commit [84f412b](https://github.com/LinMarc1210/linux-hw2/commit/84f412bf3607032909a96148a501512257a5e69f) > Commit [2a6ac64](https://github.com/LinMarc1210/linux-hw2/commit/2a6ac64bce80b97834a0ec6645266f0f71c40535) 觀察 `list_insert_before`，此函式的目的在於透過 **指標的指標** 走訪所有節點並插入。參考 [你所不知道的 C 語言: linked list 和非連續記憶體](https://hackmd.io/@sysprog/c-linked-list#circular-linked-list)，若只用 `list_item_t` 的指標，會遇到特例，我嘗試用指標寫法做和 `list_insert_before` 一樣的事情，但是當我們的 `before == l->head` 的情況，就必須另外用 if 分支處理這個情況，並且將插入的 `item` 改為新的 `head`，後續才可照正常的迴圈走訪，讓 `pos` 指向 `before` 的前一個節點，並且更改 `item` 和 `pos` 的 `next` ，才能插入完成。 ```c void list_insert_before_naive(list_t *l, list_item_t *before, list_item_t *item) { list_item_t *pos = l->head; if (before == l->head) { item->next = before; l->head = item; return; } while (pos && pos->next != before) { pos = pos->next; } item->next = before; pos->next = item; } ``` 但是指標的指標不會有特例，並且程式碼更精簡，如考試的範例： `*p = item` 可以直接更改原本在 `before` 的前一個節點的 `next` 指向至新節點 `item` ，因為 `p` 是指標的指標，我們更改 `*p` 實際上直接更改了 `next` 指標的指向。這樣的方式可以避免我們直接 dereference NULL pointer 而產生錯誤。 ```c void list_insert_before(list_t *l, list_item_t *before, list_item_t *item){ list_item_t **p; for (p = &(l)->head; *p != before; p = &(*p)->next) ; *p = item; item->next = before; } ``` ```graphviz digraph linked_list { node [shape=record]; rankdir=LR; head [label="head"]; item1 [label="<val>1 | <next>"]; item2 [label="<val>2 | <next>"]; item3 [label="<val>3 | <next>"]; newItem [label="<val>99 | <next>", color=blue, style=filled, fillcolor=lightblue]; // Linked list connections head -> item1; item1:next -> item2; item2:next -> item3; item3:next -> null [label="NULL"]; null [shape=point]; // Virtual node to simulate pointing to the arrow arrow_mid [label="", shape=point, width=0.00]; item1:next -> arrow_mid [style=invis]; arrow_mid -> item2 [label="*p", style=invis]; // Pointer p p [label="p (pointer to pointer)", shape=plaintext]; p -> arrow_mid [style=dashed, color=green, label="point to *p"]; // before pointer before [label="before", shape=plaintext]; before -> item2 [style=dashed, color=blue]; // Overall explanation label="Linked List: insert '99' before '2'"; labelloc="t"; } ``` ```graphviz digraph linked_list { node [shape=record]; rankdir=LR head [label="head"]; item1 [label="<val>1 | <next>"]; item2 [label="<val>2 | <next>"]; item3 [label="<val>3 | <next>"]; newItem [label="<val>99 | <next>", color=blue, style=filled, fillcolor=lightblue]; // Linked list connections head -> item1; item2:next -> item3; item3:next -> null [label="NULL"]; null [shape=point]; // Virtual node to simulate pointing to the arrow arrow_mid [label="", shape=point, width=0.00]; item1:next -> arrow_mid [style=invis]; arrow_mid -> newItem [label="*p", style=invis]; p [label="p (pointer to pointer)", shape=plaintext]; p -> arrow_mid [style=dashed, color=green, label="point to *p"]; // Insertion illustration item1:next -> newItem [style=dashed, color=red]; newItem:next -> item2 [style=dashed, color=red]; // before pointer before [label="before", shape=plaintext]; before -> item2 [style=dashed, color=blue]; // Overall explanation label="Linked List: insert '99' before '2'"; labelloc="t"; } ``` 在測試程式方面則是定義兩個巨集：`my_assert` 用來檢查條件，在這裡都是檢查 `list_size` ，而 `my_run_test` 是用來測試指定的函式，透過傳遞 function pointer 測試此 function。要是 `message` 不為 `NULL`，代表 function 執行有問題，並回傳錯誤訊息。 ### 撰寫合併排序操作 ## 第一週 - 測驗 2 > Commit [6c6a52f](https://github.com/LinMarc1210/linux-hw2/commit/6c6a52fb56aae995a2695e9bb9efb20f97c61347) ### 解釋並補完測試程式碼首先需要先了解 `block_t` 的結構，`block_t` 是記憶體管理樹的節點結構，而透過 `block_t` 構成的樹狀結構是用來維護 free blocks 的，因此可以說 `block_t` 就是一小塊還未被使用的記憶體。串成的樹結構如下，其中 `root` 是指標的指標， `*root` 是一個指向根節點的指標，而 `root` 是指向這個指標的指標。 ```graphviz digraph BlockTree { node [shape=record, style=plaintext]; // 指標 root (block_t **) root_ptr_ptr [label="root", shape=plaintext]; root_ptr [label="*root", shape=plaintext]; // 範例樹節點 node1 [label="<l> | <data> size=16 | <r>"]; node2 [label="<l> | <data> size=8 | <r>"]; node3 [label="<l> | <data> size=24 | <r>"]; node4 [label="<l> | <data> size=4 | <r>"]; node5 [label="<l> | <data> size=12 | <r>"]; node6 [label="<l> | <data> size=20 | <r>"]; node7 [label="<l> | <data> size=28 | <r>"]; // 節點之間的連結（l 表示 left child, r 表示 right child） node1:l -> node2:data; node1:r -> node3:data; node2:l -> node4:data; node2:r -> node5:data; node3:l -> node6:data; node3:r -> node7:data; root_ptr_ptr->root_ptr [style=dashed]; root_ptr->node1 [style=dashed]; } ``` 此結構是為了實作動態記憶體分配器，也就是 heap，在這邊使用二元搜尋樹結構，因為這樣可以確保每個元素在時間複雜度 $O(\log n)$ 之內一定可以被找到。而我們需要快速找到適合被分配的 free block，因此需要 `find_free_tree` 和 `find_predecessor_free_tree` 兩個函式協助實作，需要分配記憶體的時候就要用 `remove_free_tree` 從樹中移除一個 free block 來分配。接下來是 `remove` 的邏輯，首先要用 `find_free_tree` 找到目標節點 `target` 並回傳一個指標的指標。在這裡使用指標的指標是因為我們可以直接修改 `*node_ptr` 來改變樹狀結構，但如果 `node_ptr` 只是一個指向 `block_t` 的指標，則我們只是在改變這個指標本身，而未改變樹狀結構中節點的 `*l`, `*r` 指向。 ```diff + *node_ptr = child // for (block_t **)node_ptr - node_ptr = child // for (block *)node_ptr ``` 刪除可以分成 `target` 有 2, 1, 或 0 個子節點的情況來處理，其中 1 個或 0 個子節點的情況都較簡單，只需要直接刪除並且把 `*node_ptr` 變成其子節點就好，而在有 2 個節點的情況就需要決定用 predecessor 還是 successor 來補位，以維持 BST 的特性。小考的實作是以 predecessor 進行補位，即**左子樹的最右節點**，因此用 if 條件式確認 predecessor 是正好位於 `target` 的左子節點，還是在左子樹更深的地方。找 predecessor 的程式碼： ```c block_t **pred_ptr = &(*node_ptr)->l; // 從左子節點開始 while ((*pred_ptr)->r) pred_ptr = &(*pred_ptr)->r; ``` 取得 `pred_ptr` 後，會跟 `find_predecessor_free_tree` 比較，確保我們所找到的 predecessor 是正確的。接下來則判斷 predecessor 是否為 `target` 的左子節點，若是，則左子節點直接移到 target 的位置，並且原本 `target` 的右子節點變成了剛補上來的 `*pred_ptr` 的右子節點。 ```c if (*pred_ptr == (*node_ptr)->l) { block_t *old_right = (*node_ptr)->r; *node_ptr = *pred_ptr; /* Replace target with its left child. */ (*node_ptr)->r = old_right; /* Attach the original right subtree. */ // ...assert } ``` *pred_ptr` 的左、右子節點。要特別注意的是移除 predecessor 時，會遞迴呼叫 `remove_free_tree` ，這是為了避免 predecessor 也有自己的子節點，所以遞迴呼叫會變成子節點數量為 0 或 1 的 case（不可能還有 2 個子節點，否則不會是 rightmost ）。 ```c else { /* The predecessor is deeper in the left subtree. */ block_t *old_left = (*node_ptr)->l; block_t *old_right = (*node_ptr)->r; block_t *pred_node = *pred_ptr; /* Remove the predecessor from its original location. */ remove_free_tree(&old_left, *pred_ptr); /* Replace the target node with the predecessor. */ *node_ptr = pred_node; (*node_ptr)->l = old_left; (*node_ptr)->r = old_right; ``` - `remove_free_tree` 的示意圖： ```graphviz digraph RemoveNode { node [shape=record, style=plaintext]; // Initial Tree State subgraph cluster_0 { label="Step 1: Initial Tree"; node1 [label="<l> | <data> size=16 | <r>"]; node2 [label="<l> | <data> size=8 | <r>"]; node3 [label="<l> | <data> size=24 | <r>"]; node4 [label="<l> | <data> size=4 | <r>"]; node5 [label="<l> | <data> size=12 | <r>"]; node6 [label="<l> | <data> size=20 | <r>"]; node7 [label="<l> | <data> size=28 | <r>"]; node1:l -> node2:data; node1:r -> node3:data; node2:l -> node4:data; node2:r -> node5:data; node3:l -> node6:data; node3:r -> node7:data; } // Predecessor Removal subgraph cluster_1 { label="Step 2: Removing Predecessor (16)"; // Main nodes node1b [label="<l> | | <r>"]; node2b [label="<l> | <data> size=8 | <r>"]; node3b [label="<l> | <data> size=24 | <r>"]; node4b [label="<l> | <data> size=4 | <r>"]; node5b [label="<l> | <data> size=12 | <r>"]; node6b [label="<l> | <data> size=20 | <r>"]; node7b [label="<l> | <data> size=28 | <r>"]; // Predecessor pointers pred_ptr_ptr [label="pred_ptr", shape=plaintext, color=red]; pred_ptr [label="*pred_ptr", shape=plaintext, color=red]; // Define tree structure node1b:l -> node2b:data; node1b:r -> node3b:data; node2b:l -> node4b:data; node2b:r -> node5b:data; node3b:l -> node6b:data; node3b:r -> node7b:data; // Highlight the predecessor pointer pred_ptr_ptr -> pred_ptr [style=dashed, color=red]; pred_ptr -> node5b:data [style=dashed, color=red]; // Ensure left and right nodes are at the same rank for proper positioning {rank=same; node4b; node5b;} {rank=same; node6b; node7b;} } // Tree after Replacement subgraph cluster_2 { label="Step 3: Predecessor (12) Replaces Target (16)"; node1c [label="<l> | <data> size=12 | <r>"]; node2c [label="<l> | <data> size=8 | <r>"]; node3c [label="<l> | <data> size=24 | <r>"]; node4c [label="<l> | <data> size=4 | <r>"]; node6c [label="<l> | <data> size=20 | <r>"]; node7c [label="<l> | <data> size=28 | <r>"]; pred_ptr_ptr2 [label="pred_ptr", shape=plaintext, color=red]; pred_ptr2 [label="*pred_ptr", shape=plaintext, color=red]; pred_ptr_ptr2 -> pred_ptr2 [style=dashed, color=red]; pred_ptr2 -> node1c:data [style=dashed, color=red]; node1c:l -> node2c:data; node1c:r -> node3c:data; node2c:l -> node4c:data; node3c:l -> node6c:data; node3c:r -> node7c:data; } } ``` 接下來則是補完程式碼，讓其可以運作，我補完了 `find_free_tree` 和 `find_predecessor_free_tree` ，其中我使用遞迴 DFS 的方式尋找目標節點，讓程式碼保持簡潔。在尋找 predecessor 的時候則要先判斷 target 有沒有左子節點，若有則直接找左子樹的 rightmost node，若無同時紀錄目前節點和前一個節點，直到 pos 走到 target 之後，pred 就會紀錄到 target 的 predecessor。 - 找 target ```c block_t **left_result = find_free_tree(&(*root)->l, target); if (left_result) return left_result; return find_free_tree(&(*root)->r, target); ``` - 找 target 的 predecessor（沒有 leftchild 的情況） ```c while (pos) { if (pos->size < node->size) { pred = pos; pos = pos->r; } else { pos = pos->l; } } ``` 完整程式碼包含測試程式，見 Commit [6c6a52f](https://github.com/LinMarc1210/linux-hw2/commit/6c6a52fb56aae995a2695e9bb9efb20f97c61347) ## 第一週 - 測驗 3 ### 解釋程式碼運作原理小考示範使用 `begin` 和 `end` 兩個指標陣列來模擬堆疊，並透過索引值進行遞迴排序。其中 `L` 和 `R` 會取得該子陣列的頭和尾，並且判斷 `L` 是否等於 `R` ，其用意在於判斷子陣列是否僅剩下一個元素，與遞迴終止條件相同。當子陣列只剩一個元素的時候則將元素加到 `result` ，並且每次都加到串列的最前端，因為我們是從右邊開始進行排序的，這樣做才能保持元素的排序是正確的。而在需要我們填空的那段程式碼與前面用來講解的，最大的不同處在於底下這段不用另外開一個 `end` 陣列去紀錄子陣列的尾，而是從結構上直接斷開鏈結，因此我們只須紀錄頭，並且每次使用 `list_tail` 取得其最後一個元素即可。過程保持單向鏈結串列的結構走訪，並且在最後呼叫 `rebuild_list_link` 重建雙向環狀鏈結串列。而在迴圈內我們用指標 `p` 走訪節點並逐一和 `pivot` 比較 `value` ，比 `pivot` 小則用 `left` 去加進左子串列，比 `pivot` 大則用 `right` 去加進右子串列，如下： - 原本的串列 ```graphviz digraph graphname{ node[shape=box] rankdir = LR A[label=4] B[label=1] C[label=3] D[label=5] E[label=2] F[label=7] G[label=8] NULL[label=NULL,shape=plaintext] A->B->C->D->E->F->G->NULL { rank="same"; L[label="L",shape=plaintext,fontcolor=red] pivot[label="pivot",shape=plaintext,fontcolor=blue] pivot->L[color=blue] L->A[color=red]; } { rank="same"; R[label="R",shape=plaintext,fontcolor=red] R->G[color=red]; } { rank="same"; p[label="p",shape=plaintext,fontcolor=green] p->B[color=green]; } } ``` - 加入子串列的過程： ```graphviz digraph graphname{ node[shape=box] A[label=1] NULL[label=NULL,shape=plaintext] NULL { rank="same"; left[label="left",shape=plaintext,fontcolor=green] left->NULL[color=green]; } { rank="same"; n[label="n",shape=plaintext,fontcolor=green] n->A[color=green]; } } ``` ```graphviz digraph graphname{ node[shape=box] rankdir = LR B[label=1] NULL[label=NULL,shape=plaintext] B->NULL { rank="same"; p[label="left",shape=plaintext,fontcolor=green] p->B[color=green]; } } ``` - 完成 partition 後： ```graphviz digraph{ ## rankdir=TD rankdir=LR node [shape=box,color=black] //tp [shape=none,label=pivot,fontcolor=red] A [shape=none,label=left] B [shape=none,label=right] C [shape=none,label=pivot,fontcolor=blue] C->4 B->8->7->5 A->2->3->1 } ``` 這樣就完成了將走訪到的節點 1 加進左子串列，並且在 p 走完所有節點後，left 和 right 會分別停在左子串列和右子串列的頭部，因此就可以拿來紀錄 `begin[i]`, `begin[i+2]`，之後只要用 `list_tail` 就可以得到子串列的尾端。為了模擬遞迴呼叫，每次根據 pivot 完成 partition 之後，會將索引值 `i += 2` ，每次都從右半部開始進行排序，直到只剩一個元素則加進 `result` ，這裡必須要進行 `i--` ，讓索引值可以往左半部，並對左半部的串列進行排序。 #### swap 為主體的方法根據 [quiz1 - 測驗 3](https://hackmd.io/@sysprog/linux2025-quiz1#%E6%B8%AC%E9%A9%97-3) 對於非遞迴的快速排序解釋如下，但我覺得這邊解釋的不清楚： > 假定下方是起始的陣列內容。採取 head 作為 pivot，並且 L 會從左邊開始掃，紀錄有多少值小於 pivot，R 從右邊開始，紀錄有多少值大於 pivot。(R 一開始會定義為陣列的 length，以減法的方式紀錄) 但我參考 [Optimized QuickSort — C Implementation (Non-Recursive)](https://alienryderflex.com/quicksort/) 的解釋，L 和 R 應該是從左邊界和右邊界開始走訪，只要 L 漸增的過程中遇到有節點比 pivot 大，就把目前的元素複製到 R 的位置，並且換 R 漸減，若遇到有節點比 pivot 小，就複製到 L 的位置，並且換成 L 漸減，直到 `L >= R` 停止，然後把 `pivot` 放到 L 的位置，這樣才能解釋為什麼要 swap，因為比較小的元素被換到 pivot 左邊，比較大的元素被換到 pivot 右邊，並且繼續下一輪的排序。 #### 測試程式解釋 ```c shuffle(test_arr, count); while (count--) list_construct(list, test_arr[count]); quick_sort(list); assert(list_is_ordered(list)); ``` 這題在 `main()` 裡面先將 `test_arr` 所儲存的值洗牌，並且用來建構串列，然後再呼叫 `quick_sort` 並且用 `list_is_ordered()` 檢查元素是否排序完成。 ### 研讀〈A comparative study of linked list sorting algorithms〉 ## 第二週 - 測驗 1 ### 解釋程式碼與探討 stable sort 先來補充一下一般遞迴版本 Quick Sort 的 Pseudo Code ： ``` QUICKSORT(A, p, r) { if p < r q = PARTITION(A, p, r) QUICKSORT(A, p, q-1) QUICKSORT(A, q+1, r) } ``` 流程是先用 `PARTITION` 選定一個 pivot，並且將陣列中分成比 pivot 小的和比 pivot 大的兩部份，最後分別遞迴呼叫左半部和右半部，再次進行一樣的動作，直到子陣列只剩一個元素。注意這邊遞迴呼叫 `QUICKSORT` 的時候，是不應該將 pivot 再放到任意子陣列的，否則到後面切分永遠都會回傳一樣的子陣列，因為子陣列所有元素都比 pivot 小（或大），造成無限遞迴而導致記憶體 stack 爆掉。小考使用 Linux 核心風格 List API 的簡化標頭檔 `"list.h"` 進行開發，其中最特別的點是，一般的遞迴版本並不會保持 stable sort ，而小考的版本則會，根據《 Introduction to Algorithm 》，書內的快速排序在做 partition 是透過交換來決定 pivot 的位置，但交換時並不會特別考量相同元素的相對順序，原版的 `PARTITION` 程式碼如下： ``` PARTITION(A, p, r) { x = A[r]; i = p - 1; for j = p to r - 1 do if A[j] <= x then i = i + 1 exchange A[i] , A[j] exchange A[i+1], A[r] return i + 1 } ``` 其無法保持 stable sort 的例子如下，在 pivot 放到最後位置的時候並不會考慮 5(red) 和 5(blue) 的相對順序，導致在交換之後相對順序亂掉。 ```graphviz digraph LomutoPartition { rankdir=TB; node [shape=box, style=filled, fontname="Arial"]; subgraph cluster_start { label = "Start"; style = dashed; start_3 [label="3 (pivot)", fillcolor="lightgrey"]; start_5b [label="5", fillcolor="lightcoral"]; start_1 [label="1", fillcolor="white"]; start_5a [label="5", fillcolor="lightblue"]; } subgraph cluster_step1 { label = "Step 1: 5 > 3 → No Swap"; style = dashed; step1_3 [label="3 (pivot)", fillcolor="lightgrey"]; step1_5b [label="5", fillcolor="lightcoral"]; step1_1 [label="1", fillcolor="white"]; step1_5a [label="5", fillcolor="lightblue"]; } subgraph cluster_step2 { label = "Step 2: 1 <= 3 → Swap 5 and 1"; style = dashed; step2_3 [label="3 (pivot)", fillcolor="lightgrey"]; step2_5b [label="5", fillcolor="lightcoral"]; step2_5a [label="5", fillcolor="lightblue"]; step2_1 [label="1", fillcolor="white"]; } subgraph cluster_step3 { label = "Step 3: 5 > 3 → No Swap"; style = dashed; step3_3 [label="3 (pivot)", fillcolor="lightgrey"]; step3_5b [label="5", fillcolor="lightcoral"]; step3_5a [label="5", fillcolor="lightblue"]; step3_1 [label="1", fillcolor="white"]; } subgraph cluster_step4 { label = "Step 4: Swap Pivot 3 with 5a"; style = dashed; step4_5a [label="5", fillcolor="lightblue"]; step4_5b [label="5", fillcolor="lightcoral"]; step4_3 [label="3 (pivot)", fillcolor="lightgrey"]; step4_1 [label="1", fillcolor="white"]; } start_3 -> step1_3 [style=invis]; start_5b -> step1_5b [style=invis]; start_1 -> step1_1 [style=invis]; start_5a -> step1_5a [style=invis]; step1_3 -> step2_3 [style=invis]; step1_5b -> step2_5b [style=invis]; step1_1 -> step2_5a [style=invis]; step1_5a -> step2_1 [style=invis]; step2_3 -> step3_3 [style=invis]; step2_5b -> step3_5b [style=invis]; step2_5a -> step3_5a [style=invis]; step2_1 -> step3_1 [style=invis]; step3_3 -> step4_5a [style=invis]; step3_5b -> step4_5b [style=invis]; step3_5a -> step4_3 [style=invis]; step3_1 -> step4_1 [style=invis]; } ``` 接下來分析小考 `list_quicksort` 的實作，首先為了將串列分割成比 pivot 小和比 pivot 大的子串列，會使用兩個 dummy node，分別是 `list_less` 和 `list_greater` 作為子串列的頭。這裡不用 `list_head *` 而是用 `list_head` 是因為我們需要真實的記憶體區段，才有辦法使用 `head->next` ，否則會在使用 `->` 的時候發生 dereference a NULL pointer 的錯誤。 ```c struct list_head list_less, list_greater; ... INIT_LIST_HEAD(&list_less); INIT_LIST_HEAD(&list_greater); ``` 回想最基本版的快速排序 pseudo code，我們得知 pivot 是不應該被算進去字串列的，因此在進行分割串列之前，得先將 pivot 從串列中移除，所以使用 `list_first_entry` 直接取得該鏈結串列的第一個元素。 ```c pivot = list_first_entry(head, struct listitem, list); list_del(&pivot->list); ``` 再來，我們希望將小的元素放至 `list_less` 的串列，而大的放去 `list_greater` 的串列，因此使用 `list_move_tail`。在這裡就是保持穩定排序的關鍵，首先是判斷式，當元素值與 pivot 相等時，`cmpint` 會回傳 0，此時程式會進到 else 分支，而 else 分支會將元素加進 `list_greater` 子串列當中，而因為我們的 **pivot** 選定的是串列的**第一個元素**，這代表即使串列當中有和 pivot 相同的元素，也應該在排序後保持在 pivot 的右邊（即 `list_greater` 子陣列）才可以維持 pivot 本來就比較前面的相對順序。又因 `list_for_each_entry_safe` 這個巨集是從左走訪至右，因此有相同的元素時，先被走到的應該仍然要在被加到同個串列時保持在左邊，這也使得我們要使用 `list_move_tail` ，將元素移至串列的尾端，如果用的是 `list_move` 則會讓原本的相對順序反過來了。 ```c list_for_each_entry_safe (item, is, head, list) { if (cmpint(&item->i, &pivot->i) < 0) list_move_tail(&item->list, &list_less); else list_move_tail(&item->list, &list_greater); } ``` 最後執行遞迴呼叫，將左半和右半都排序完成。`list_add` 和 `list_splice` 的差別只在於前者是只能從 `head` 之後插入一個節點（新的頭元素），而後者是插入一整段子串列至 `head` 後面。`list_splice_tail` 則是插入一整段子串列至串列尾端。這樣子執行，正好可以在 partition 排序完成後，依照 less->pivot->greater 的升序組合回原本的串列。 ```c list_quicksort(&list_less); list_quicksort(&list_greater); list_add(&pivot->list, head); list_splice(&list_less, head); list_splice_tail(&list_greater, head); ``` ### 改進 random_shuffle_array ## 第二週 - 測驗 2 ### 解釋程式碼針對 leading zero 的個數要怎麼算，這邊使用遞迴，每次將位元數切半，只要算一半位元的 leading zero 即可，並且當 upper 為 0，就代表 upper 全部都是 leading zero，只要計算 lower 還有幾個就好 ; 反之若 upper 不為 0 ，則 lower 都不需要再計算，因為 upper 一定還有位元為 1，使得 lower 的位元不可能是 leading zero。根據這樣的邏輯，我們可以實作 `clz2` ，並且針對 32 位元的整數操作。小考題目有說到僅剩 2 位元時，就需要檢查 leading zero 了，因此我們可以判斷從 32 位元的數字要切半 4 次才有辦法將 x 切到僅剩 2 位元，因此我們除了傳入要計算 leading zero 的數字 x 以外，還要傳入現在是第幾刀，以 `c = 0, 1, 2, 3` 表示切 4 次，這樣的用意是為了要知道我們要使用 x 的幾個最低位元。程式使用 `mask[c]` 來取得 lower 的位元，因此我們要決定第幾刀的時候，要將多少位元清空和留下哪些位元。對於 `uint32_t lower` 來說，最高 16 位元是不需要的，我們只關注最低 16 位元，所以用一個 16 位元的 mask `0xFFFF` 來保留最低 16 位。 4 次切半 lower 分別需要保留 16, 8, 4, 2 個最低位元，因此 `0xFFFF` 應該要往右分別 shift 0, 8, 12, 14。而當 `c == 3` 也就是切到剩 2 個位元時，必須要開始計算 leading zero 了。 2 個位元可以表示成四種數值，換成十進制會正好是 0, 1, 2, 3 也就是 c 的數值。我們用表格說明遞迴終止的情況，就可以得知 `magic[upper]` 其實就是表示 upper 此時的 leading zero 有幾個，lower 也一樣。 | upper (lower) 二進位 | upper (lower) 十進位 | leading zero | | -------- | -------- | -------- | | `00` | 0 | **2** | | `01` | 1 | **1** | | `10` | 2 | **0** | | `11` | 3 | **0** | 因此程式碼在遞迴時都是判斷 upper 是否為 0 ，若 upper 不為 0 則遞迴呼叫 `clz2` 計算 upper ，為 0 則將 upper 現在的位元數 `16 >> c` 加上遞迴呼叫計算 `lower` 得到的結果，並且每次遞迴呼叫 c 要加 1，表示要進行新一次的切半了。 ```c static const int mask[] = {0, 8, 12, 14}; static const int magic[] = {2, 1, 0, 0}; unsigned clz2(uint32_t x, int c) { if (!x && !c) return 32; uint32_t upper = (x >> (16 >> c)); uint32_t lower = (x & (0xFFFF >> mask[c])); if (c == 3) return upper ? magic[upper] : 2 + magic[lower]; return upper ? clz2(upper, c + 1) : (16 >> (c)) + clz2(lower, c + 1); } ``` 而 64 位元的 `clz64` 只需要再多增一層遞迴就可以達到一樣的效果：切半後呼叫 `clz32` 。 ```c static inline int clz64(uint64_t x) { /* If the high 32 bits are nonzero, count within them. * Otherwise, count in the low 32 bits and add 32. */ return (x >> 32) ? clz32((uint32_t) (x >> 32)) : clz32((uint32_t) x) + 32; } ``` 在求整數平方根時， ```c uint64_t sqrti(uint64_t x) { uint64_t m, y = 0; if (x <= 1) return x; int total_bits = 64; /* clz64(x) returns the count of leading zeros in x. * (total_bits - 1 - clz64(x)) gives the index of the highest set bit. * Rounding that index down to an even number ensures our starting m is a * power of 4. */ int shift = (total_bits - 1 - clz64(x)) & MMMM; m = 1ULL << shift; while (m) { uint64_t b = y + m; y >>= 1; if (x >= b) { x -= b; y += m; } m >>= 2; } return y; } ``` ## 第二週 - 測驗 3 ### 解釋程式碼透過 [Linux 核心的 hash table 實作](https://hackmd.io/@sysprog/linux-hashtable) 得知 Linux 核心中使用 `hlist` 的結構來處理 hash collision，並且用 `**pprev` 取代 `*prev` ，改成存「指向自己的指標」，結構如下： - 使用 `prev` 時：指向前個節點 ```graphviz digraph G { rankdir = LR; splines = false; node[shape = "record"] list_head[label = "<m>list_head | <n>first"] node_1[label = "<m>dll_node_1 | {prev | <n>next}", group = list]; node_2[label = "<m>dll_node_2 | {prev | <n>next}", group = list]; node_3[label = "<m>dll_node_3 | {prev | <n>next}", group = list]; NULL_1, NULL_2[shape = plaintext, label = "NULL", group = list] {rank = "min" list_head NULL_1} list_head -> node_1:m; node_1:p -> NULL_1 node_1:n -> node_2:m; node_2:p -> node_1:m; node_2:n -> node_3:m; node_3:p -> node_2:m; node_3:n -> NULL_2; } ``` - 使用 `pprev` 時：指向前一個節點的成員指標 `next` ```graphviz digraph G { rankdir = LR; splines = false; node[shape = "record"] list_head[label = "<m>list_head | <n>first"] node_1[label = "<m>hlist_node_1 | {pprev | <n>next}", group = list]; node_2[label = "<m>hlist_node_2 | {pprev | <n>next}", group = list]; node_3[label = "<m>hlist_node_3 | {pprev | <n>next}", group = list]; NULL[shape = plaintext, label = "NULL", group = list] {rank = "min" list_head} list_head -> node_1 -> node_2 -> node_3[ weight = 10, style = invis ] list_head:n -> node_1:m; node_1:p -> list_head:n; node_1:n -> node_2:m; node_2:p -> node_1:n; node_2:n -> node_3:m; node_3:p -> node_2:n; node_3:n -> NULL; } ``` 也就是 `**pprev` 其實會指到自己的節點，因此在做刪除時，我們只需要傳入要刪除的節點，不需要再傳入 `head`，我們也可以在教材中看到對於 `hlist` 的小結。 > 至此，我們理解到，`hlist` (即 hash list) 是針對雜湊表特製的鏈結串列，儘管尋常的雙向鏈結串列亦可實作雜湊表，但由於後者開頭和結尾需要各用 2 個指標，當 bucket 很大時，會增加記憶體開銷。回到小考題目，我們要用 hash table (以下簡稱 `HT` )讓 [Two Sum](https://leetcode.com/problems/two-sum/description/) 的時間複雜度降為 $O(n)$，因此可用 `HT` 紀錄缺少的值：`target - nums[i]` ，並且讓 `HT[target - nums[i]] = i`，因此若現在的 `nums[i]` 是一個與前面某個值相加可以成為 `target` 的值，則會回傳 `[i, HT[target - i]]`。接下來觀察結構 `map_t` ，裡面內嵌一個 `hlist_head` 的結構，代表整個 `hlist` 的起始端，`bits` 則是 hash table 的大小，用來實作在教材中提及雜湊函數的 Multiplication Method，本來的雜湊函數 $h(K) = \lfloor m \times (KA - \lfloor KA \rfloor) \rfloor$ 可以由 $h(K) = K \times 2^w \times A >> (w - p)$ 實作，適用於我們不知道 key 的範圍與分佈情形。`hash_key` 則是內嵌 `hlist_node`，代表這是元素經過 hash function 計算後分配的節點。原本的 key 以 `key` 存，原本的值則用 `value` 存，並且透過內嵌的 `hlist_node` 處理碰撞時需要用的鏈結串列。用來檢索 key 的 `find_key` 如下，我們需要傳入 `map_t` ，也就是表示整個 hash table 的結構，還有我們要尋找的 `key`。其中我們使用 `hash(key, map->bits)` 呼叫雜湊函數計算對應的 key 值，並且在 `map->ht` 中尋找對應 bucket 的 `hlist_head` ，最後還需要使用 `&` 取址才能丟給 `head`。後續再從對應的 `hlist` 依照 `key` 找正確的節點，並回傳整個節點結構 `hash_key`，最後再讓 `map_get` 呼叫 `find_key` 並取出 `data` 來完成 key-value 的檢索。 ```c static struct hash_key *find_key(map_t *map, int key) { struct hlist_head *head = &(map->ht)[hash(key, map->bits)]; for (struct hlist_node *p = head->first; p; p = p->next) { struct hash_key *kn = container_of(p, struct hash_key, node); if (kn->key == key) return kn; } return NULL; } ``` 新增一個節點則需要考慮是否已存在一樣的 key，若不存在才會繼續進行新增，並且透過 `hash` 決定要安插在哪個 bucket。插入時如果該 bucket 已經有東西了，則需要接在該鏈結串列的最前面。 ``` void map_add(map_t *map, int key, void *data) { struct hash_key *kn = find_key(map, key); ... struct hlist_head *h = &map->ht[hash(key, map->bits)]; struct hlist_node *n = &kn->node, *first = h->first; n->next = first; if (first) first->pprev = &n->next; h->first = n; n->pprev = &h->first; } ``` `map_deinit` 釋放記憶體，依序將 hash table 的每個 bucket 裡面的 `hlist` 依序釋放，並且透過 `if(!n->pprev)` 檢查 n 是否有插入至 hash table，但我目前還未釐清為什麼是用 `!n->pprev` 作為條件來判斷是否插入節點，還有什麼情況會發生這樣的問題。 ```c void map_deinit(map_t *map) { ... for (int i = 0; i < MAP_HASH_SIZE(map->bits); i++) { ... for (struct hlist_node *p = head->first; p;) { ... if (!n->pprev) /* unhashed */ goto bail; struct hlist_node *next = n->next, **pprev = n->pprev; *pprev = next; if (next) next->pprev = pprev; n->next = NULL, n->pprev = NULL; bail: free(kn->data); free(kn); } } free(map); } ``` 最後，就可以只用一層迴圈，每次將走訪到的元素放進 hash table，同時確認是否有對應的 key，以及其互補值在 `nums` 的 index 為多少。