2024q1 Homework2 (quiz1+2)

# 2024q1 Homework2 (quiz1+2) > contributed by < vestata > # 第一周測驗 ## 測驗一 Optimized QuickSort ### 解釋上述程式碼的運作原理 > 解釋上述程式碼的運作，撰寫完整的測試程式，指出其中可改進之處並實作 #### 理解演算法詳細閱讀 [Optimized QuickSort — C Implementation (Non-Recursive)](https://alienryderflex.com/quicksort/) 它帶來以下優點： 1. 避免 FUNCTION CALLS > Calling (and returning from) a function is time-consuming 2. 避免使用 STACK 3. SWAP VARIABLE：每一輪要用 swap 函式存取 swap 的暫存空間數次，作者的方法只需要一次，在 swap pivot 的時候 4. 降低不必要的移動 5. SELF-SWAPS: Wikipedia version 有 50% 的機會自我交換，造成額外開銷 - Non-recursive quicksort 相比 WIKI 版本需要更多的比較次數。 - 每一輪示範： 1. Starting condition. Pivot 複製第一個節點內容。 L,R 分別在頭尾 ```graphviz digraph S{ rankdir=LR node[shape=box] head[color=white, fontcolor=red] pivot[color=white, fontcolor=red] L[color=white, fontcolor=green] R[color=white, fontcolor=green] tmp[label=4] pivot->tmp null[label="NULL", color=white] 4->1->3->5->2->7->null head->4{rank=same;head;4} L->4{rank=same;L;4} R->7{rank=same;R;7} } ``` 2. R 往回尋找小於 pivot 值的位置，並複製內容到 L 的位置 ```graphviz digraph S{ rankdir=LR node[shape=box] head[color=white, fontcolor=red] pivot[color=white, fontcolor=red] L[color=white, fontcolor=green] R[color=white, fontcolor=green] tmp[label=4] node1[label=2, fontcolor=green] node2[label=1] node3[label=3] node4[label=5] node5[label=2] node6[label=7] pivot->tmp null[label="NULL", color=white] node1->node2->node3->node4->node5->node6->null head->node1{rank=same;head;node1} L->node1{rank=same;L;node1} R->node5{rank=same;R;node5} } ``` 3. L 往前尋找大於 pivot 值得位置，並複製內容到 R 的位置 ```graphviz digraph S{ rankdir=LR node[shape=box] head[color=white, fontcolor=red] pivot[color=white, fontcolor=red] L[color=white, fontcolor=green] R[color=white, fontcolor=green] tmp[label=4] node1[label=2, fontcolor=green] node2[label=1] node3[label=3] node4[label=5] node5[label=5, fontcolor=green] node6[label=7] pivot->tmp null[label="NULL", color=white] node1->node2->node3->node4->node5->node6->null head->node1{rank=same;head;node1} L->node4{rank=same;L;node4} R->node5{rank=same;R;node5} } ``` 4. 一樣動作， R 往回尋找小於 pivot 值的位置，並複製內容到 L 的位置，遇到 L>=R 的情形，便複製 pivot 內容到 L 。 ```graphviz digraph S{ rankdir=LR node[shape=box] head[color=white, fontcolor=red] pivot[color=white, fontcolor=red] L[color=white, fontcolor=green] R[color=white, fontcolor=green] tmp[label=4] node1[label=2, fontcolor=green] node2[label=1] node3[label=3] node4[label=4, fontcolor=red] node5[label=5, fontcolor=green] node6[label=7] pivot->tmp null[label="NULL", color=white] node1->node2->node3->node4->node5->node6->null head->node1{rank=same;head;node1} L->node4{rank=same;L;node4} R->node4{rank=same;R;node4} } ``` 在 [測驗一](https://hackmd.io/@sysprog/linux2024-quiz1) 當時來不及用懂的內容，現在理解原理了。 #### 理解實作 - 鏈結串列結構體 - 它定義 `*left, *right;` 但是實際上沒有運用到 doubly-linked list 的性質 - `node_t *list_tail(node_t **left)` - 直接使用 while 迴圈遍歷至鏈結串列的最後一個節點（此操作類似於單向鏈結串列的處理方式）。 - `int list_length(node_t **left)` - 一樣類似於單向鏈結串列的處理方式計算節點個數 - quicksort 本體 #### 改進方案 1. 在 [Optimized QuickSort — C Implementation (Non-Recursive)](https://alienryderflex.com/quicksort/) 中提到 MAX_LEVELS 的限制，可以透過先執行較小的分區。以下是實作演算法。 ```c if (end[i]-beg[i]>end[i-1]-beg[i-1]) { swap=beg[i]; beg[i]=beg[i-1]; beg[i-1]=swap; swap=end[i]; end[i]=end[i-1]; end[i-1]=swap; } ``` > This might be slightly slower than the first one, but it will never fail because it always performs the smaller partition first. Hence, the only way it could run into the limit is if the to-be-sorted array was at least $2^{MAX_LEVELS}$ elements in size. Since $2^{300}$ is greater than $10^{90}$, and there are only about $10^{80}$ fundamental particles in this universe from which a human-made computer can be built, no larger limit will ever be needed. 所以將 MAX_LEVELS 設為 300。 **實作手法整合於下方使用 Linux 核心風格的 List API 改寫上述程式碼** 2. `int n` 跟 `node_t *n`容易混淆 `int n` 改為 `int size`; ### 使用 Linux 核心風格的 List API 改寫上述程式碼使用 Linux 核心風格的 List API 改寫上述程式碼，並針對鏈結串列，提出可避免最差狀況的快速排序實作，應設計效能評比的測試程式。 1. 結構體設計：原有的 *left 、 *right *next 指標現在可以被省略，直接採用 LIST API 進行替代。 ```diff typedef struct __node { - struct __node *left, *right; - struct __node *next; + struct list_head list; long value; } node_t; ``` 2. 修改 `list_add` 功能：此功能在串列的第一個節點前新增節點，現在的改進是直接對 `struct list_head` 進行操作。此外，由於其輸入參數的順序與 LIST API 完全相反，因此將其更名為 `list_add_` 以便於區分。 ```diff void list_add(struct list_head *list, struct list_head *node_t) { - node_t->next = *list; - *list = node_t; + list_add(node_t, list); } ``` - - - `list_add` 函數的使用是目前實作中遭遇的最大困擾。本來計劃為了減少對 `quicksort` 主函式的修改，因此沒有像 LIST API 那樣將 head 節點獨立出來。但這種結構設計在後續將 quicksort 與之結合時仍然面臨問題，目前尚待解決。 ```c void list_add_(struct list_head **head, struct list_head *node_t) { if (!*head) INIT_LIST_HEAD(node_t); else if (list_is_singular(*head)){ list_add(*head, node_t); } else list_add(node_t, list_tail(*head)); *head = node_t; } ``` - - - 3. 由於結構調整，`list_tail` 現在可以直接通過 `prev` 取得最後一個節點。 ```diff +struct list_head *list_tail(struct list_head *head) { - while ((*left) && (*left)->next) - left = &((*left)->next); - return *left; + return head->prev; } ``` 4. 調用 LIST API 以簡潔的程式遍歷串列（此函式不會使用到） ```diff -int list_length(node_t **left) +int list_length(struct list_head **left) { int n = 0; - while (*left) { - ++n; - left = &((*left)->next); - } + struct list_head *tmp, *safe; + list_for_each_safe(tmp, safe, *left) + n++; return n; } ``` 5. quicksort() 這個版本的實作未能成功。我在使用 begin 和 end 陣列時移除了 head 節點，原意是為了減少對程式碼的修改量。但現在，在最終添加結果時遇到了問題：當左側（L）或右側（R）僅包含 head 節點時，這個 head 會被錯誤地加入到結果中。 > 此內容合併到下方一起整理參考了 [vax-r](https://github.com/vax-r/Sorting-Lab/commit/a6851c4afc288d85f54ee5a06a36c8a4811809ee) 同學的實作方法後，我決定將 `end[i]` 的部分在我的實作中直接省略。原先沒有考慮到一開始就將 `begin[max_level]` 所有元素初始化的做法，我本以為這會分配過多的空間，但實際上這樣可以節省許多程式碼修改的時間。至於是否能進一步減少空間使用量，還有待進一步討論。 ```diff void quick_sort(struct list_head **list) { int value; int i = 0; int max_level = 300; struct list_head *begin[max_level], *end[max_level]; - struct list_head *result = malloc(sizeof(struct list_head)); - struct list_head *left = malloc(sizeof(struct list_head)); - struct list_head *right = malloc(sizeof(struct list_head)); - INIT_LIST_HEAD(result); - INIT_LIST_HEAD(left); - INIT_LIST_HEAD(right); + for (int i = 0; i < max_level; i++){ + begin[i] = list_new(); + end[i] = list_new(); + } + struct list_head *result = list_new(), *left = list_new(), *right = list_new(); - begin[0] = (*list)->next; - end[0] = list_tail(*list); + begin[0] = *list; while (i >= 0) { - struct list_head *L = begin[i]->next, *R = end[i]->next; + struct list_head *L = begin[i]->next, *R = begin[i]->prev; if (L != R) { struct list_head *pivot = L; node_t *tmp = list_entry(pivot, node_t, list); struct list_head *p = pivot->next; list_del_init(pivot); - while (p != L) { + while (p != begin[i]) { struct list_head *n = p; tmp = list_entry(n, node_t, list); p = p->next; list_add_(tmp->value > value ? right : left, n); } - begin[i] = left->next; - end[i] = list_tail(left); - begin[i + 1] = pivot; - end[i + 1] = pivot; - begin[i + 2] = right->next; - end[i + 2] = list_tail(right); - - struct list_head *ptr; - list_for_each(ptr, left) - list_del(ptr); - list_for_each(ptr, right) - list_del(ptr); + INIT_LIST_HEAD(begin[i]); + INIT_LIST_HEAD(begin[i + 1]); + INIT_LIST_HEAD(begin[i + 2]); + + list_splice_init(left, begin[i]); + list_add(pivot, begin[i + 1]); + list_splice_init(right, begin[i + 2]); + i += 2; } else { - if (!list_empty(L)) - list_add_(result, L); + if (list_is_singular(begin[i])) + list_splice_init(begin[i], result); i--; } } } ``` 在每一輪中，將 left 和 right 透過 `list_splice_init` 添加回 `begin[i]` 時，出現了節點重複的情況。因此，在每一輪中，我重新初始化 `begin[i]`、`begin[i + 1]` 和 `begin[i + 2]`，以確保之後的 `list_splice_init` 能夠正常運作。 - 圖解說明： ```graphviz ``` 6. list_is_ordered():使用 list_for_each_entry_safe 在最後一個節點可能會出錯，加上修正 ```diff bool list_is_ordered(struct list_head *head){ node_t *tmp, *safe; list_for_each_entry_safe(tmp, safe, head, list) { + if (&safe->list == head) + break; if (tmp->value > safe->value) return false; } return true; } ``` - - - ## 測驗二 Timsort ### 解釋上述程式碼的運作原理解釋上述程式碼的運作，撰寫完整的測試程式，指出其中可改進之處並實作 - 對於隨機資料，最佳的排序效率上限被約束在 $O(nlogn)$，但即便是在 $O(nlogn)$ 等級的排序演算法中，其效能與理論上的極限值 $O(nlogn)$ 之間仍存在差距。Timsort 在處理隨機資料時，表現略遜一籌，但對於部分已排序的資料則有出色的表現。 - 參考 [youtube](https://www.youtube.com/watch?v=_dlzWEJoU7I&ab_channel=THKazi) 解釋，它可以理解為 binary insertion sort 加上 merge sort 的合併。其中 minrun 的大小設定會是 timsort 的關鍵。 - minrun 的設計目的是讓剩餘資料在分割為 minrun 時，能夠**盡可能接近 2 的冪**，從而==使得後續的合併過程更高效== - 當 $N$ 為 2112 時，若 minrun 設為 32，則會切分成 66 個 run，合併時會產生 2048 + 64 的不平衡；但如果 minrun 設為 33，則會被切分成 64 個 run，其分割結果為 2 的冪，合併時達到完美平衡。 #### Collapse - Timsort 採用一組堆疊 (stack) 來暫存 run，此舉是為了避免在一開始就掃描整個資料範圍來產生 run，從而降低記憶體開銷。過程中，Timsort 運用 merge_collapse 函式來確保堆疊上的 run 長度保持平衡。該函式主要檢查堆疊頂端的 3 個 run 是否滿足以下原則： 1. A 的長度要大於 B 和 C 的長度總和。 2. B 的長度要大於 C 的長度。 - 當不符合這些條件時，函式會決定是否進行合併。 - 不僅確保堆疊中 run 的長度平衡，且因合併只能在相鄰的兩個 run 間進行，以確保排序的穩定性，因此合併操作僅可能採取 A+(B+C) 或 (A+B)+C 的形式進行。此舉讓 Timsort 在執行時兼顧高效又穩定。 - `merge_collapse` 函數的合併策略是較為寬鬆的 - 如果當前堆疊大小 n 大於等於3,並且前兩個 run 的長度之和小於等於後兩個 run 的長度之和,那麼就合併前兩個run。 - 如果當前堆疊大小 n 大於等於4,並且前三個 run 的長度之和小於等於後三個 run 的長度之和,那麼就合併前三個run。 - 如果前一個 run 的長度小於等於當前 run 的長度,那麼就合併前一個 run 和當前 run 。 - `merge_force_collapse` ：只要堆疊大小大於等於3,就一定要進行合併操作。 ```c do { /* Find next run */ struct pair result = find_run(priv, list, cmp); result.head->prev = tp; tp = result.head; list = result.next; stk_size++; tp = merge_collapse(priv, cmp, tp); } while (list); /* End of input; merge together all the runs. */ tp = merge_force_collapse(priv, cmp, tp); ``` 提到這兩者的差距，體現出它不一樣的試用場合， merge_collapse 試圖在減少合併次數和保留適當大小的 run 之間取得平衡。而 merge_force_collapse 在最後執行，採取了一種更加強制和嚴格的合併策略,目的是儘快將所有 run 合併成較大的 run 。最後呼叫完 merge_force_collapse 會剩下 1~2 個 run 再呼叫 merge_final 已完成最後的合併。 #### Galloping mode - 處理兩個相鄰子數列的合併過程中，直接在記憶體中操作會遇到一定的挑戰，因為大量的記憶體操作不僅實作困難，且效率也未必理想。因此，Timsort 採取了使用一塊臨時記憶區的策略，其大小設定為兩個子數列（A、B）中較短者的長度。 - 然而，考慮到 A 和 B 都是已排序好的數列，我們可持續改進：首先尋找 B 的首個元素（即 $B_0$）在 A 中的排序位置，從而確定 A 中有一段是小於 $B_0$ 者，可將這部分元素放回 A。接著，尋找剩餘 A 的首個元素在 B 中的位置，如此反覆進行，直到完成排序。這種方法被稱為 Galloping mode。 ```graphviz digraph S{ rankdir=LR node[shape=box] a1[label=1] a2[label=2] a3[label=3] a4[label=7] a5[label=8] a6[label=9] b1[label=4] b2[label=5] b3[label=6] b4[label=8] b5[label=9] b6[label=10] b7[label=11] A->a1->a2->a3->a4->a5->a6 B->b1->b2->b3->b4->b5->b6->b7 tmp } ``` 顯然 A 較短，因此先將 A 放入暫存記憶區 temp （只有第一次要做此動作，方便最小值移到 A ，最後直接回傳 A ） ```graphviz digraph S{ rankdir=LR node[shape=box] a1[label=1] a2[label=2] a3[label=3] a4[label=7, fontcolor=red] a5[label=8] a6[label=9] b1[label=4, fontcolor=red] b2[label=5] b3[label=6] b4[label=8] b5[label=9] b6[label=10] b7[label=11] tmp->a1->a2->a3->a4->a5->a6 B->b1->b2->b3->b4->b5->b6->b7 A } ``` 接著找到 $B_0$ 在 A 中的位置，因 $B_0 > A_2$ ，所以 $A_0$ 到 $A_2$ 可以直接放回 A 數列。 ```graphviz digraph S{ rankdir=LR node[shape=box] a1[label=1, fontcolor=blue] a2[label=2, fontcolor=blue] a3[label=3, fontcolor=blue] a4[label=7, fontcolor=red] a5[label=8] a6[label=9] b1[label=4, fontcolor=red] b2[label=5] b3[label=6] b4[label=8] b5[label=9] b6[label=10] b7[label=11] tmp->a4->a5->a6 B->b1->b2->b3->b4->b5->b6->b7 A->a1->a2->a3 } ``` $tmp_0$ 去比對 $B$ 有多少小於 $tmp_0$ 並合併到 $A_{end}$ ```graphviz digraph S{ rankdir=LR node[shape=box] a1[label=1] a2[label=2] a3[label=3] a4[label=7, fontcolor=red] a5[label=8] a6[label=9] b1[label=4] b2[label=5] b3[label=6] b4[label=8, fontcolor=red] b5[label=9] b6[label=10] b7[label=11] tmp->a4->a5->a6 B->b1->b2->b3->b4->b5->b6->b7 A->a1->a2->a3 } ``` 反覆去進行以上操作...實作上針對相同大小元素要特別處理 ```graphviz digraph S{ rankdir=LR node[shape=box] a1[label=1] a2[label=2] a3[label=3] a4[label=7, fontcolor=red] a5[label=8] a6[label=9] b1[label=4, fontcolor=blue] b2[label=5, fontcolor=blue] b3[label=6, fontcolor=blue] b4[label=8, fontcolor=red] b5[label=9] b6[label=10] b7[label=11] tmp->a4->a5->a6 B->b4->b5->b6->b7 A->a1->a2->a3->b1->b2->b3 } ``` ```graphviz digraph S{ rankdir=LR node[shape=box] a1[label=1] a2[label=2] a3[label=3] a4[label=7, fontcolor=blue] a5[label=8] a6[label=9] b1[label=4] b2[label=5] b3[label=6] b4[label=8, fontcolor=red] b5[label=9] b6[label=10] b7[label=11] tmp->a5->a6 B->b4->b5->b6->b7 A->a1->a2->a3->b1->b2->b3->a4 } ``` ```graphviz digraph S{ rankdir=LR node[shape=box] a1[label=1] a2[label=2] a3[label=3] a4[label=7] a5[label=8, fontcolor=blue] a6[label=9, fontcolor=blue] b1[label=4] b2[label=5] b3[label=6] b4[label=8, fontcolor=blue] b5[label=9, fontcolor=blue] b6[label=10] b7[label=11] tmp B->b6->b7 A->a1->a2->a3->b1->b2->b3->a4->b4->b5->a5->a6 } ``` ```graphviz digraph S{ rankdir=LR node[shape=box] a1[label=1] a2[label=2] a3[label=3] a4[label=7] a5[label=8] a6[label=9] b1[label=4] b2[label=5] b3[label=6] b4[label=8] b5[label=9] b6[label=10, fontcolor=blue] b7[label=11, fontcolor=blue] tmp B A->a1->a2->a3->b1->b2->b3->a4->b4->b5->a5->a6->b6->b7 } ``` 將 temp 的首個元素放到 $B$ 中適當的位置，如此反覆進行。這種改進方案在多數情況效果顯著，但對於隨機資料序列而言，可能會比逐對合併更慢。故在 Timsort 會有一個機制決定是否要啟動 Galloping mode 。 ### 實作解析 #### main.c 針對我沒有見過得部份: ```c /* Assume ASLR */ srand((uintptr_t) &main); ``` 這一行是在設置隨機數種子( seed )。srand() 函數用於初始化隨機數生成器。它的參數是一個整數值,作為隨機數序列的起點。 (uintptr_t) &main 這個表達式是將 main 函數的地址轉換為一個整數值。這種做法是基於一種假設,即由於開啟了地址空間隨機載入(Address Space Layout Randomization, ASLR)的緣故, main 函數的地址在每次執行時都會發生變化,從而可以提供一個"足夠隨機"的種子。 ```c test_t tests[] = { {.name = "timesort", .impl = timsort}, {NULL, NULL}, }; test_t *test = tests; ``` 這部分定義了一個 test_t 結構體數組,數組中只有一個元素,其 name 成員為 "timesort", impl 成員為 timsort 函數的地址。最後一個元素是 {NULL, NULL} ,作為數組終止標記。然後定義了一個指向 test_t 結構體的指標 test ,並將它初始化為指向 tests 數組的首元素。 ```c while (test->impl) ``` while (test->impl)的意思是:只要 test->impl 不為 NULL (即不為空指針),就持續執行循環體內的代碼。在 tests 數組的最後一個元素 {NULL, NULL} 中, impl 成員就是 NULL 。當循環執行到這個元素時,條件 test->impl 就為假,循環自然結束。所以這個 while 循環的作用是遍歷 tests 數組中所有非空的 impl 成員(即所有要測試的排序算法實現),並依次執行循環體中的測試代碼。這樣的設計允許靈活地添加或刪除要測試的算法實現。 **tests 中只有 timsort 要去測試，但是這個方法可以方便引入其他種類的 sort ，非常有趣的作法！** ### 改進方案 :::warning 其中會有 `check_list` 去檢測是否正常排序，也有檢驗 stable 的功能，但是我檢查程式碼它沒有給予 seq 的數值，全部預設為 0 ，這樣此程式沒有比對 stable 的功能，需要加入以下： ```diff static void create_sample(struct list_head *head, element_t *space, int samples) { printf("Creating sample\n"); for (int i = 0; i < samples; i++) { element_t *elem = space + i; elem->val = rand(); + elem->seq = i; list_add_tail(&elem->list, head); } } ``` ::: 觀察程式碼中沒有實作 galloping mode 的程式，此部份實作待完成 :::info TODO　將 galloping mode 加入 timsort.h 的 `merge` github 上找到 [patperry/timsort](https://github.com/patperry/timsort/blob/master/timsort-impl.h) 翻閱 `timsort-impl.h` 去思考怎麼實作 ::: ### Timsort 實作整合進 sysprog21/lab0-c :::info TODO ::: - - - # 第二周測驗 ## 測驗一 LeetCode 105 ### 解釋上述程式碼的運作解釋上述程式碼的運作，撰寫完整的測試程式，指出其中可改進之處並實作 >給定由二個由整數構成陣列的前序 (preorder) 和中序 (inorder) 形式，分別是對同一棵二元樹的前序及中序走訪結果，藉此重建此二元樹。 #### `container_of` 巨集 ```c #define container_of(ptr, type, member) \ ((type *) ((char *) (ptr) - (size_t) & (((type *) 0)->member))) ``` container_of 巨集在 lab0 變大量使用到，查閱 [Linux 核心原始程式碼巨集: container_of](https://hackmd.io/@sysprog/linux-macro-containerof) ，更釐清其中內容。 > container_of 巨集則逆轉上述流程，特別在 C 語言程式設計中，我們通常會定義一系列公開介面 (interface)，從而區隔各式實作 (implementation) 可以避免原本思維上必須知道每個結構體的起始才能操作，用 container_of 可以活用「模板」。 #### `hash table` 定義 - ```hlist_for_each(pos, head)``` :traverse 節點直到 NULL - ```struct hlist_node```: 定義他的結構體。 - 這邊使用的是 ```struct hlist_node *next, **pprev;``` 在 [Linux 核心的 hash table 實作](https://hackmd.io/@sysprog/linux-hashtable) 有探討其原因。 - 原本==不使用== `pointer to pointer` 在第一個節點 `prev` 會指向 `NULL` ```graphviz digraph G { rankdir = LR; splines = false; node[shape = "record"] list_head[label = "<m>list_head | <n>first"] node_1[label = "<m>dll_node_1 | {prev | <n>next}", group = list]; node_2[label = "<m>dll_node_2 | {prev | <n>next}", group = list]; node_3[label = "<m>dll_node_3 | {prev | <n>next}", group = list]; NULL_1, NULL_2[shape = plaintext, label = "NULL", group = list] {rank = "min" list_head NULL_1} list_head -> node_1:m; node_1:p -> NULL_1 node_1:n -> node_2:m; node_2:p -> node_1:m; node_2:n -> node_3:m; node_3:p -> node_2:m; node_3:n -> NULL_2; } ``` 問題會出在刪除第一個節點時： > 必須做出額外的操作來更新 list_head 指向的節點，於是除了移除的目標之外，還必須傳入 list_head。 - 若是 `prev` 使用 `pointer to pointer`，結構會變成： ```graphviz digraph G { rankdir = LR; splines = false; node[shape = "record"] list_head[label = "<m>list_head | <n>first"] node_1[label = "<m>hlist_node_1 | {pprev | <n>next}", group = list]; node_2[label = "<m>hlist_node_2 | {pprev | <n>next}", group = list]; node_3[label = "<m>hlist_node_3 | {pprev | <n>next}", group = list]; NULL[shape = plaintext, label = "NULL", group = list] {rank = "min" list_head} list_head -> node_1 -> node_2 -> node_3[ weight = 10, style = invis ] list_head:n -> node_1:m; node_1:p -> list_head:n; node_1:n -> node_2:m; node_2:p -> node_1:n; node_2:n -> node_3:m; node_3:p -> node_2:n; node_3:n -> NULL; } ``` > 跟典型 doubly-linked list 的實作做比較，由於該實作的 prev 指標必須指向 struct dll_node 型別，導致第一個節點會出現指向 NULL 的狀況；而將其替換成指標的指標後，便可順利消除這個特例。 - `struct hlist_head`:定義 `hlist_head` 結構體 - 在這個使用情境 (doubly-linked list) 由於最後一個節點指 `NULL` 故多定義一個只有一個的 `hlist_head` 結構，來節省一個指標記憶體開銷。 > 我們理解到，hlist (即 hash list) 是針對雜湊表特製的鏈結串列，儘管雙向鏈結串列亦可實作雜湊表，但由於後者開頭和結尾需要各用 2 個指標，當 bucket 很大時，會增加記憶體開銷。 - `static inline void INIT_HLIST_HEAD(struct hlist_head *h)`：初始 `hlist_head` - `static inline void hlist_add_head(struct hlist_node *n, struct hlist_head *h)` - `static int find(int num, int size, const struct hlist_head *heads) ```c static int find(int num, int size, const struct hlist_head *heads) { struct hlist_node *p; int hash = (num < 0 ? -num : num) % size; hlist_for_each (p, &heads[hash]) { struct order_node *on = list_entry(p, struct order_node, node); if (num == on->val) return on->idx; } return -1; } ``` - `static inline void node_add(int val, int idx, int size, struct hlist_head *heads)` ```c static inline void node_add(int val, int idx, int size, struct hlist_head *heads) { struct order_node *on = malloc(sizeof(*on)); on->val = val; on->idx = idx; int hash = (val < 0 ? -val : val) % size; hlist_add_head(&on->node, &heads[hash]); } ``` 在這邊讓我卡住的是 `&heads[hash]` 的使用。順便測試新的 LLMS Claude 他的回答： > in_heads 是一個陣列，in_heads[hash] 取得的是陣列中索引為 hash 的元素，它的型別是 struct hlist_head，但不是指標。因此，如果直接傳入 in_heads[hash]，編譯器會做隱式的型別轉換，將它視為 struct hlist_head 的位址。 > > 取址運算子 & 可以確保傳入正確的型別，也就是 struct hlist_head *。&in_heads[hash] 的意思是取得 in_heads 陣列中索引為 hash 的元素的位址，這個位址的型別正是 struct hlist_head *，符合 hlist_add_head 函數的參數需求。有幫助我釐清這邊的觀念。 - `static struct TreeNode *buildTree(int *preorder, int preorderSize, int *inorder, int inorderSize)` - 第一步，觀察到由於它 input 是 `preorder = [3,9,20,15,7], inorder = [9,3,15,20,7]` - 將 inorder 依照 val 放進雜湊表中，這樣可以快速找到其相對應 index. ```c for (int i = 0; i < inorderSize; i++) node_add(inorder[i], i, inorderSize, in_heads); ``` 下一步便是使用 dfs 去 traverse 二元樹。 - `static struct TreeNode *dfs(int *preorder, int pre_low, int pre_high, int *inorder, int in_low, int in_high, struct hlist_head *in_heads, int size)` - `if (in_low > in_high || pre_low > pre_high) return NULL`; 這個條件是遞迴結束的條件。當中序遍歷的起始索引大於結束索引或者前序遍歷的起始索引大於結束索引時，代表已經沒有節點需要建構，因此返回 NULL，並同時將沒有 val 的節點指向 NULL 。 - 取 `preorder[pre_low];` 的值並用雜湊表找其 index - 以下便是 dfs 經典格式去遍歷並生成唯一二元樹（ NULL 補空缺的節點） ### 在 Linux 核心的 cgroups 相關原始程式碼，找出 pre-order walk 程式碼並探討在 [linux/kernel/cgroup/cgroup.c](https://github.com/torvalds/linux/blob/5847c9777c303a792202c609bd761dceb60f4eed/kernel/cgroup/cgroup.c#L4573) 並搜尋 `css_next_descendant_pre` 程式碼觀察 pre-order walk ```c #define css_for_each_child(pos, parent) \ for ((pos) = css_next_child(NULL, (parent)); (pos); \ (pos) = css_next_child((pos), (parent))) ``` ```c struct cgroup_subsys_state * css_next_descendant_pre(struct cgroup_subsys_state *pos, struct cgroup_subsys_state *root) ``` - - - ## 測驗二 LeetCode 146 ### 解釋上述程式碼的運作解釋上述程式碼的運作，撰寫完整的測試程式，指出其中可改進之處並實作大量調用 LIST API 去實作，結合 hash table ，將最近有在使用的節點插入在雜湊表之中 1. 宣告 LRUCache 結構 - capacity 為雜湊表大小 - count: 計算雜湊表是否滿出 - dhead: 這個雙向鏈結串列的用途就是維護"最近使用情況"的節點順序,把最近常用的節點串在一起。這是實現LRU置換策略的關鍵數據結構。 - hhead: 一連續記憶體，其大小會是 capacity ，用來做為雜湊表，存放 Node 。 ```c typedef struct { int capacity; int count; struct list_head dhead; struct hlist_head hhead[]; } LRUCache; ``` ```graphviz digraph S{ rankdir=LR node[shape=record] subgraph cluster{ label=LRUCache cache[label="<a1> capacity | <a2> count | <a3> dhead | {<a4> | <a5> ...| <a6>}"] } hhead[label="hhead \n work as hash table \n", color=white] hhead->cache:a4 tmp[label="works as LRU Cache", color=white] tmp->cache:a3 LRUN[label=LRUNode1] LRUN->cache:a3 cache:a3->LRUN NOD[label=node1] cache:a6->NOD NOD->cache:a6 } ``` 2. 宣告 LRUNode 結構 - key:用於雜湊的 ID - value:值 - struct hlist_node node - struct list_head link:用 LIST API 來連結節點 ```c typedef struct { int key; int value; struct hlist_node node; struct list_head link; } LRUNode; ``` ```graphviz digraph S{ rankdir=LR node[shape=record] cache[label="<a1> key | <a2> value | <a3> node | <a4> link"] a[color=white, label="LRUNode"] } ``` 3. `LRUCacheGet()` :走訪過雜湊表的 bucket 如果有符合的節點便將他移到該 bucket 的第一個節點。將選到的節點連接到 LRUCahce 成為 dhead 的第一個節點。 4. `lRUCachePut(LRUCache *obj, int key, int value)` : ```c LRUNode *cache = NULL; int hash = key % obj->capacity; struct hlist_node *pos; hlist_for_each (pos, &obj->hhead[hash]) { LRUNode *c = list_entry(pos, LRUNode, node); if (c->key == key) { list_move(&c->link, &obj->dhead); cache = c; } } ``` - 這個迴圈會遍歷雜湊桶中的鏈結串列,尋找與給定 key 相符的節點。如果找到相符的節點,它會將該節點移動到雙向鏈結串列的最前端(使其成為最近使用過的節點),並將該節點的指標指派給 cache。 ```c if (!cache) { if (obj->count == obj->capacity) { cache = list_last_entry(&obj->dhead, LRUNode, link); list_move(&cache->link, &obj->dhead);// 將最舊節點移至鏈表頭(作為新的最近使用節點) hlist_del(&cache->node); // 從雜湊表中刪除該節點 hlist_add_head(&cache->node, &obj->hhead[hash]);// 將該節點重新插入新的雜湊桶 } else { cache = malloc(sizeof(LRUNode)); hlist_add_head(&cache->node, &obj->hhead[hash]); list_add(&cache->link, &obj->dhead); obj->count++; } cache->key = key; } cache->value = value; ``` - 如果沒有在 hlist 找到相對應的 key: - 如果快取已達容量上限,它會從雙向鏈結串列的末端移除最近最少使用的節點,並將其加入到對應雜湊桶的鏈結串列的最前端。這樣做的目的是為了重用該節點來存儲新的 key。（已達上限便不用 malloc 直接拿最舊的更改裡面的值。） - 如果快取尚未達容量上限,它會動態分配一個新的 LRUNode,將其加入到對應雜湊桶的鏈結串列及雙向鏈結串列的最前端,並增加計數器。(每次新增都會新增在 dhead 作為 cache, 跟在雜湊表之中。) ### 在 Linux 核心找出 LRU 相關程式碼並探討 - - - ## 測驗三 find_nth_bit - `#define BITMAP_LAST_WORD_MASK(nbits) (~0UL >> (-(nbits) & (BITS_PER_LONG - 1)))` - ~0UL - 這是一個無符號長整型變量,所有位元都被設置為 1。在 64 位系統上,它的二進制表示為 `0xFFFFFFFFFFFFFFFF`。 - 確保 (nbits) 為非負數 - & (BITS_PER_LONG - 1) - 這個結果就是 **nbits 除以 BITS_PER_LONG 的餘數**。 - 將其右移餘數位 - ex: BITMAP_LAST_WORD_MASK(67) - 0xffffffff >> (67 & (64 - 1)) - 0xffffffff >> 3 - 0x1fffffff - `#define BIT_MASK(nr) (1UL << ((nr) % BITS_PER_LONG))` - `BIT_MASK(nr)` 生成一個掩碼,該 mask 只有第 nr 位為 1,其餘位元為 0。 - `#define __const_hweight8(w)` - 這個 macro 用於計算一個 8 位元的值中有多少位元被設置為 1。它通過檢查每一位是否為 1,然後將結果相加來實現。`!!` 運算符將非零值轉換為 1,0 則保持不變。 - 基於這個結構，尚有 `__const_hweight16(w)`, `__const_hweight32(w)`, `__const_hweight64(w)` - `#define FIND_NTH_BIT(FETCH, size, num)` - 對應到**一個記憶體區域中查找第 n 個設置為 1 的 bit** - 接受三個參數: 1. FETCH: 一個表達式,用於獲取指定記憶體位置的值。 2. size: 記憶體區域的大小(以位元為單位)。 3. num: 要查找的第幾個設置為 1 的位元,從 0 開始計數。 - 五個局部變量: 1. sz: 記憶體區域的剩餘大小,初始為 size。 2. nr: 要查找的第幾個設置為 1 的位元,初始為 num。 3. idx: 用於遍歷記憶體區域的索引。 4. w: 暫存 hweight_long 函數的返回值,表示一個 unsigned long 型變量中設置為 1 的位元數量。 5. tmp: 暫存從記憶體中獲取的值。 - ==`static inline unsigned long __ffs(unsigned long word)`== - 一開始的操作有如 binary search ，快速排掉全0的 bits - 它首先檢查 word 的高 32 位是否全為 0。如果是,則將 num 增加 32,因為第一個設置為 1 的位元肯定在低 32 位中,然後將 word 右移 32 位,丟棄高 32 位。 - 需要注意的是,如果 word 為 0,則這個函數的行為是未定義的,因此在調用它之前應該先檢查 word 是否為 0。 - ==`static inline unsigned long fns(unsigned long word, unsigned int n)`== - 對應到**單個無符號長整數中查找第 n 個設置為 1 的 bit** - 這個函數的目的是在一個無符號長整數 word 中找到第 n 個設置為 1 的位元的位置索引。 - 用一個 while loop 使用 ffs 找到第 n 個設置唯一 bit ```c static inline unsigned long fns(unsigned long word, unsigned int n) { while (word) { unsigned int bit = __ffs(word); if (n-- == 0) return bit; __clear_bit(bit, &word); } return BITS_PER_LONG; } ``` ### 解釋上述程式碼的運作原理 ### 在 Linux 核心原始程式碼找出 find_nth_bit 的應用案例在 Linux 核心原始程式碼找出 find_nth_bit 的應用案例，並予以解說和分析，需要涵蓋 CPU affinity 相關的原始程式碼。