2024q1 Homework2 (quiz1+2)

# 2024q1 Homework2 (quiz1+2) contributed by < `lintin528` > ## 2024q1 第 1 週測驗一 Quick sort ([連結](https://hackmd.io/@sysprog/linux2024-quiz1)) >測驗一實作非遞迴 (non-recursive; iterative) 的快速排序法 ### 基本函式功能分析 #### list_add 與在 [lab0-c](https://github.com/lintin528/lab0-c) 之實作不同，此處使用間接指標的方式完成鏈結串列的節點移動，是因為這邊需要對節點的 `head(即 *list)` 做修改，因此不可使用 pass by value 的參數傳遞方式。 ```c void list_add(node_t **list, node_t *node_t) { node_t->next = *list; *list = node_t; } ``` ```graphviz digraph G { label=origin; labelloc=top; rankdir=LR; // size="10" node [shape = square, width=1, height=1]; S1 [label = a]; S2 [label = b]; S3 [label = c]; node [shape = none]; // {rank=same; S1; S2; S3;} P [label = "*list"]; S1 -> S2 -> S3; P -> S1; } ``` ```graphviz digraph G { label=<after>; labelloc=top; rankdir=LR; // size="10" node [shape = square, width=1, height=1]; S1 [label = a]; S2 [label = b]; S3 [label = c]; S4 [label = node_t]; node [shape = none]; // {rank=same; S1; S2; S3;} P [label = "*list"]; P2 [label = "node_t"]; P3 [label = "NULL"]; S4 -> S1 [label = "origin *list"]; S1 -> S2 -> S3; S3 -> P3 P -> S4; P2 -> S4; } ``` >這邊遇到問題，若使用 subgraph 將導致無法套用 `rankdir=LR;` ，尚不知道原因，先以多段 code 分割圖片。 #### list_tail 根據名稱以及後半段 `quick_sort` 的使用情形，可以推斷該函式功能為回傳鏈結串列最後的節點指標，以 `left` ，即指向第一個節點的指標的指標為輸入參數，向鏈結串列尾端遍歷，因此可得知 `AAAA` 為 `(*left)->next` 。 ```c node_t *list_tail(node_t **left) { while ((*left) && (*left)->next) left = &(AAAA); return *left; } ``` #### list_length 類似於 `list_tail` ，但新增計數器並將節點個數回傳，因此可以判斷 `BBBB` 為遍歷過程中間接指標向下一個節點更新，同樣為 `(*left)->next` 。 ```c int list_length(node_t **left) { int n = 0; while (*left) { ++n; left = &(BBBB); } return n; } ``` ### quick_sort 分析分段拆解 `quick_sort` 的功能與流程：首先，先理解到這邊並不是以遞迴呼叫的方式逐段排序，而是透過 `stack` 將各次迭代的起點、終點存儲在陣列後，透過改變 `i` 去實作原本的遞迴功能。其中， `begin[], end[]` 可視為該次迭代的作用範圍，一開始為整個鏈結串列，在第一輪時分別為該鏈結串列的第一個節點與最後一個節點，第一輪結束後分別儲存 "`pivot` 左側所有節點的第一個與最後一個節點"、"`pivot` 節點"、"`pivot` 右側所有節點的第一個與最後一個節點"，因此 `DDDD` 與 `EEEE` 分別為 `list_tail(&left), list_tail(&right)`。進入 `while (i >= 0)` 迴圈後，主要的切分過程透過以下程式碼來實現： ```c while (p) { node_t *n = p; p = CCCC; list_add(n->value > value ? &right : &left, n); } begin[i] = left; end[i] = DDDD; begin[i + 1] = pivot; end[i + 1] = pivot; begin[i + 2] = right; end[i + 2] = EEEE; ``` 觀察 while 迴圈，可以發現它的功能是遍歷該次 `begin` 到 `end` 節點，並將開頭節點設定為 `pivot` ，將該範圍中所有比 `pivot` 大的節點放入 `right` 鏈結串列，反之(含等於)則放入 `left`，因此這裡可以知道 `CCCC` 應為 `p->next` 。在最後將 `left = right = NULL;` 是為了在下一次迭代中儲存該次的 `begin, end` ，一開始理解這個內部的中止條件 `p` 時產生了一個誤解，以為會需要使用 `end` 與當下指針的判斷去決定迴圈是否結束，但後來發現 `end` 僅用以判斷該區間是否為一個節點的區間，在切分 `left, right` 時其實是新增兩個鏈結串列，因此當 `p == NULL` 時，其實就完成該區間的遍歷了。原本的鏈結串列： ```graphviz digraph G { label="origin"; labelloc=top; rankdir=LR; node [shape = square]; s1 [label = 4]; s2 [label = 1]; s3 [label = 3]; s4 [label = 5]; s5 [label = 2]; s6 [label = 7]; pivot [label = 4]; node [shape = none]; n1 [label = head]; n2 [label = NULL]; n3 [label = pivot]; n1 -> s1 -> s2 -> s3 -> s4 -> s5 -> s6 -> n2 n3 -> pivot } ``` 通過 `i = 0` 第一次迭代後： ```graphviz digraph G { label="first iteration"; labelloc=top; // rankdir=LR; node [shape = square]; s2 [label = 1]; s3 [label = 3]; s4 [label = 5]; s5 [label = 2]; s6 [label = 7]; pivot [label = 4]; node [shape = none]; // n1 [label = head]; ppivot [label = pivot]; left [label = left]; right [label = right]; null1 [label = NULL]; null2 [label = NULL]; null3 [label = NULL]; right -> s4 -> s6 -> null1; ppivot -> pivot -> null3; left -> s2 -> s3 -> s5 -> null2; s5 -> pivot [color=transparent]; {rank=same; s2; s3; s5; pivot; s4; s6} } ``` 透過上圖，可以看出原本的鏈結串列被分成三個部分，且在最後以 `begin, end` 儲存分段後的鏈結串列之起點、終點，可由此推斷出 `DDDD, EEEE` 即為 `list_tail(left), list_tail(right)` 。最後，原本的一個鏈結串列應該會變成 n 個僅有一個節點的鏈結串列，就是由上面的 `left, right` 切分出來的，在這個遞迴過程中即為終止條件 `(L == R)` 的狀況，考慮到 `i` 的變化情形，將會從最右側節點開始觸發這個終止條件，當我們進行 n 次的 `list_add(&result, L);` 即完成該鏈結串列的排序。 ```c else { if (L) list_add(&result, L); i--; } ``` 第一次迭代 (`i = 0`)： ```graphviz digraph G { label="first iteration"; labelloc=top; // rankdir=LR; node [shape = square]; s2 [label = 1]; s3 [label = 3]; s4 [label = 5]; s5 [label = 2]; s6 [label = 7]; pivot [label = 4]; node [shape = none]; // n1 [label = head]; ppivot [label = pivot]; left [label = left]; right [label = right]; null1 [label = NULL]; null2 [label = NULL]; null3 [label = NULL]; right -> s6 -> null1; ppivot -> pivot -> null2; left -> s2 -> s3 -> s5 -> null3; s5 -> pivot [color=transparent]; {rank=same; s2; s3; s5; pivot; s4; s6} } ``` 第二次迭代 (`i = 2`)： ```graphviz digraph G { label="second iteration"; labelloc=top; node [shape = square]; s2 [label = 1]; s3 [label = 3]; s4 [label = 4]; s5 [label = 2]; s6 [label = 7]; pivot [label = 5]; node [shape = none]; // n1 [label = head]; ppivot [label = pivot]; left [label = left]; right [label = right]; null1 [label = NULL]; null2 [label = NULL]; null3 [label = NULL]; s2 -> s3 -> s5 left -> null1; ppivot -> pivot -> null2; right -> s6 -> null3; s5 -> s4 -> null1 -> pivot -> s6 [color=transparent]; {rank=same; s2; s3; s5; pivot; s4; s6; null1} } ``` 第三次迭代 (`i = 3`) 結束後的結果鏈結串列： ```graphviz digraph G { label="result after 3rd iteration"; labelloc=top; rankdir=LR; node [shape = square]; s7 [label = 7]; node [shape = none]; // n1 [label = head]; result [label = result]; NULL [label = NULL]; result -> s7 -> NULL; } ``` 觀察第二次迭代可以發現 `left = NULL` ，且 `list_length(node_t **right) = 1`，將結果儲存在 `begin, end` 時，目前應為： ```c begin[2] = NULL begin[3] = "pointer to 5" begin[4] = "pointer to 7" end[2] = NULL end[3] = "pointer to 5" end[4] = "pointer to 7" ``` 再進入下次迭代 (`i = 4` ) ，這次將不會通過 `(L != R)` 之判斷，因此會進行 `list_add(&result, L);` 也就是將儲存 value = 7 的節點加入結果的鏈結串列內，結束後 `i--` ，然後 (`i = 3` ) 時做同樣的動作將 value = 5 的節點存入結果， (`i = 2` ) 時，因為 `begin[2] = end[2] = NULL` ，不會通過 if(L) 的判斷式，跳過， (`i = 1` ) 時將 pivot 存入結果，最終回到 `begin[0]` 並重複以上過程，直到最後一部 `begin[0]` 將等於 `end[0]` ，將所有節點加入結果的串列後， `i == -1` ，結束所有迴圈，排序結束。 >可改善：終止條件 L or R 為NULL時 ## 2024q1 第 1 週測驗二 TimSort ### 實作簡述對於排序的效能分析而言，最常被使用到的有 quick sort 與 merge sort ，Timsort 這個演算法考慮到一般資料其實並不是完全隨機的，相比起以往的 merge sort ， Timsort 在處理已經部分排序的資料時效能會更佳，一部分是因為減少了多餘的 devide 與 compare 次數，一部分是考慮到 Cache 的 Spacial locality。 Timsort 的步驟可以大約拆成兩個部分，第一步是從當下的起點與下個起點做比較，決定這個 `run` 為升序或降序，直到串列不符合升序、降序，並將這個 `run` 分隔出來，即 `find_run`，`stack` 個數加一，第二步是在每次的 `find_run` 結束後，檢查 `runs` 的分布情形是否符合條件： A 的長度要大於 B 和 C 的長度總和。 B 的長度要大於 C 的長度。若不符合，則有相對應的 A+(B+C) 或 (A+B)+C 的合併操作，以保持 Timsort 的效能穩定，這就是 Timsort 一邊進行 `run` 的劃分一邊進行 `merge` 的設計方式。 ### 基本函式分析這邊僅討論在 Timsort 操作過程中使用到的函式，其餘即為初始化產生隨機鏈接串列以及 Timsort 之後的檢驗。 #### compare 比較 `a,b` 兩個鏈結串列中的第一個元素大小，回傳值為 `a 第一個元素大小 - b 第一個元素大小`，後面被使用到的情形是在 `merge` 以及 `find_run` 的過程，僅考慮回傳值 `res` 的正負號。 ```c if (a == b) return 0; int res = list_entry(a, element_t, list)->val - list_entry(b, element_t, list)->val; if (priv) *((int *) priv) += 1; return res; ``` #### run_size 這裡會牽涉到 `find_run` 中，對於每個 `run` 會將長度存放至 "第二個元素的 prev" 以 `list_head *` 型態儲存。 ```c static inline size_t run_size(struct list_head *head) { if (!head) return 0; if (!head->next) return 1; return (size_t) (head->next->prev); } ``` #### find_run 從起點 `list` 開始，在第一次比較第一個即第二個元素的時候決定為升序或降序，且降序的情況需要在 `run` 的建立過程中將鏈結串列反轉且在這裡每個 `run` 的儲存將變為單向的鏈結串列，最後將回傳一個 `pair` 結構體，`result`，`result.head` 將指向該 `run` 的第一個節點，而 `result.next` 指向原本鏈結串列中，該 `run` 區間後的一個節點。以題目範例 `123432478` 為例，執行結束後(未合併)示意圖如下： ```graphviz digraph hlist_add_head { rankdir = TB; node [shape = record]; run1 [label = "{<h> 1 | 2 | 3 | 4}", width=3]; run2 [label = "{<h> 2 | 3}", width=1.5]; run3 [label = "{<h> 4 | 7 | 8}", width=2.25]; node [shape = none]; head1 [label = "result.head(tp->prev->prev)"] head2 [label = "result.head(tp->prev)"] head3 [label = "result.head(tp)"] run1 -> run2 -> run3 -> NULL[label = "result.next"]; head1 -> run1:h head2 -> run2:h head3 -> run3:h subgraph cluster_1 { label = "runA"; run1 }; subgraph cluster_2 { label = "runB"; run2 }; subgraph cluster_3 { label = "runC"; run3 }; {rank = same;head3 -> head2 -> head1 [label = "prev"]} } ``` #### merge 將兩個 `run` 區間 `a,b` 合併，將會在比較完 `a,b` 鏈結串列的第一個元素後，將較小的節點放置於 `head` 的最後一位，相等則取 `a` 以確保 `sort stability` ，因此這裡使用間接指標的方式儲存，因此 `AAAA` 即為指向起始位置的指標的指標 `&head` ，而在後續的合併過程，同樣透過改變 `*tail` 的方式間接更新鏈結串列中最後一位的下一個節點，因此 `BBBB` 與 `CCCC` 分別為更新完後指向下個元素指標的指標， `&a->next, &b->naxt`。部分更新過程： ```c if (cmp(priv, a, b) <= 0) { *tail = a; tail = BBBB; a = a->next; if (!a) { *tail = b; break; } } ``` #### merge_collapse 這裡以較常見的情況分析，此處可以概括為違反這兩種情況：情況 1：A 的長度要大於 B 和 C 的長度總和。情況 2：B 的長度要大於 C 的長度。另外解析一下 `merge_at` 即為將目前所有的 `runs` 中最後一個 `run(tp)` ，與 `tp->prev` 合併並更新 `tp` 指標；當不滿足情況一時若 A<C 則將 A,B 合併，反之則將 B,C 合併。 ```c if ((n >= 3 && run_size(tp->prev->prev) <= run_size(tp->prev) + run_size(tp)) if (run_size(tp->prev->prev) < run_size(tp)) { tp->prev = merge_at(priv, cmp, tp->prev); } else { tp = merge_at(priv, cmp, tp); if (run_size(tp->prev) <= run_size(tp)) tp = merge_at(priv, cmp, tp); ``` #### merge_force_collapse 在整個鏈結串列成功建立完所有的 `runs` 後，因為途中的 `merge_collapse` 執行後有可能部分的 `runs` 都還是滿足 Timsort 的主要條件，因此為了進行最後的 `merge_final` ，需要透過 `merge_force_collapse` 將存在的 `runs` 減少為一至兩個，在這裡 `tp` 一開始指向結構中的最後一個 `run` ，透過比較 A,C (C 為最後一個 run ) 的長度決定該如何合併，直至剩下兩個以下的 `runs`。 ```c static struct list_head *merge_force_collapse(void *priv, list_cmp_func_t cmp, struct list_head *tp) { while (stk_size >= 3) { if (run_size(tp->prev->prev) < run_size(tp)) { tp->prev = merge_at(priv, cmp, tp->prev); } else { tp = merge_at(priv, cmp, tp); } } return tp; } ``` #### build_prev_link 雙向鏈結串列的結構在建立 `runs` 時已被破壞，並會透過 `timsort` 最後的 `merge_final` 以及 `build_prev_link` 進行重建，本函式用意是在最後合併完成後未在結果內的鏈結串列拼接上結果串列的尾端，因此可以判斷 `DDDD` 和 `EEEE` 分別為 `tail->next` 與 `head->prev`。另外在 `timsort` 內當滿足 `if (stk_size <= FFFF)` 判斷式時，其實代表在 `merge_force_collapse` 就已經完全合併完成，只差在需要將整個單向串列 `stk0` 拼接到結果 `head` 之後，並完成雙向指標的修復，因此 `FFFF` 即為 `1` ，不需再作最後的 `merge_final`。 ```c static void build_prev_link(struct list_head *head, struct list_head *tail, struct list_head *list) { tail->next = list; do { list->prev = tail; tail = list; list = list->next; } while (list); /* The final links to make a circular doubly-linked list */ DDDD = head; EEEE = tail; } ``` #### merge_final `timsort` 過程的最後，只剩下兩個 `runs` 時，重複比對兩個串列的頭個節點，並將較小的節點插入 `head` 串列的尾端，以下為 `a` 較 `b` 小(含)的情況，將 `a` 的第一個節點移動後，若 `a` 已無節點，則將整個剩餘的 `b` 拼接至 `head` 的尾端，完成排序。，並在最後 ```c if (cmp(priv, a, b) <= 0) { tail->next = a; a->prev = tail; tail = a; a = a->next; if (!a) break; } ... build_prev_link(head, tail, b); ``` ## 2024q1 第 2 週測驗一 ### 儲存結構分析以題目的 `inorder = [9,3,15,20,7]` 為例，可繪製出： ```graphviz digraph hlist_add_head { rankdir = LR; node [shape = record]; label = "storage structure"; labelloc=top; head4 [label = "in_head[4] | <f> first"]; head3 [label = "in_head[3] | <f> first"]; head2 [label = "in_head[2] | <f> first"]; head1 [label = "in_head[1] | <f> first"]; head0 [label = "in_head[0] | <f> first"]; node3 [label = "<h> hlist_node | { pprev | <x> next}"]; value3 [label = "value = 3"]; index3 [label = "idx = 1"]; subgraph cluster_3 { label = "order_node "; node3; value3; index3; }; node9 [label = "<h> hlist_node | { pprev | <x> next}"]; value9 [label = "value = 9"]; index9 [label = "idx = 0"]; subgraph cluster_9 { label = "order_node "; node9; value9; index9; }; node15 [label = "<h> hlist_node | { pprev | <x> next}"]; value15 [label = "value = 15"]; index15 [label = "idx = 2"]; subgraph cluster_15 { label = "order_node "; node15; value15; index15; }; node20 [label = "<h> hlist_node | { pprev | <x> next}"]; value20 [label = "value = 20"]; index20 [label = "idx = 3"]; subgraph cluster_20 { label = "order_node "; node20; value20; index20; }; node7 [label = "<h> hlist_node | { pprev | <x> next}"]; value7 [label = "value = 7"]; index7 [label = "idx = 4"]; subgraph cluster_7 { label = "order_node "; node7; value7; index7; }; node [shape = none]; null0 [shape = none, label = "NULL"]; null1 [shape = none, label = "NULL"]; null2 [shape = none, label = "NULL"]; null3 [shape = none, label = "NULL"]; null4 [shape = none, label = "NULL"]; head3:f -> node3:h; node3:p -> head3; node3:x -> null3; head4:f -> node9:h; node9:p -> head4; node9:x -> null4; head0:f -> node15:h; node15:p -> head0; node15:x -> node20:h; node20:p -> node15:x; node20:x -> null0; head2:f -> node7:h; node7:p -> head2; node7:x -> null2; head1 -> null1; } ``` ### 分析基本函式與巨集 #### hlist_add_head 透過函式名稱可以得知該函式的功用為在 `h` 指向的鏈結串列中新加入一個節點至 `h->first` 指向的位置，首先若該鏈結串列中有節點的話，更新原本第一個節點的 `pprev` 使其儲存新加入節點之 "指向下一個節點的指標的指標" ，將 `n->next` 設定為原本節點的指標，因此 `AAAA` 即為原本的第一個節點 `h->first` ，再來就是將 `h->first` 指向新加入節點，並更新 `n->pprev` 。 ```c static inline void hlist_add_head(struct hlist_node *n, struct hlist_head *h) { if (h->first) h->first->pprev = &n->next; n->next = AAAA; n->pprev = &h->first; h->first = n; } ``` 示意圖： ```graphviz digraph hlist_add_head { rankdir = LR; node [shape = record]; label = "before add"; labelloc=top; head0 [label = "in_head[0] | <f> first"]; node15 [label = "<h> old node | { pprev | <x> next}"]; node [shape = none]; null0 [shape = none, label = "NULL"]; head0:f -> node15:h; node15:p -> head0; node15:x -> null0; } ``` ```graphviz digraph hlist_add_head { rankdir = LR; node [shape = record]; label = "after add"; labelloc=top; head0 [label = "in_head[0] | <f> first"]; node15 [label = "<h> new node | { pprev | <x> next}"]; node20 [label = "<h> old node | { pprev | <x> next}"]; node [shape = none]; null0 [shape = none, label = "NULL"]; head0:f -> node15:h; node15:p -> head0; node15:x -> node20:h; node20:p -> node15; node20:x -> null0; } ``` #### find 根據 hash table 建立的規則，先對輸入的 `num` 做一次 hash function ，決定該在`in_heads` 中的哪一個鏈結串列做搜尋，接下來透過 `hlist_for_each` 遍歷該鏈結串列，因此 `BBBB` 應為該鏈結串列的 `hlist_head` 之位址，即為 `&head[hash]`；再來為了獲得該鏈結串列中節點的值，必須先從 `hlist_node` 找出外部結構 `order_node` ，再去搜尋該結構中的值數值 `on->val` ，因此 `CCCC` 應該為透過 `hlist_node` 找出外部結構 `order_node` 之巨集，即為 `list_entry` ，這裡值得一提的是，不同以往透過遍歷鏈結串列搜索 O(n)，透過 hash table 的方式去做搜尋可以減少搜尋的時間到常數時間 O(1)。 ```c static int find(int num, int size, const struct hlist_head *heads) { struct hlist_node *p; int hash = (num < 0 ? -num : num) % size; hlist_for_each (p, BBBB) { struct order_node *on = CCCC(p, struct order_node, node); if (num == on->val) return on->idx; } return -1; } ``` #### node_add 此函式的功能為將節點加入 hash table 中對應值的鏈結串列的頭部，因此 `DDDD` 應為 `&heads[hash]` ```c static inline void node_add(int val, int idx, int size, struct hlist_head *heads) { struct order_node *on = malloc(sizeof(*on)); on->val = val; on->idx = idx; int hash = (val < 0 ? -val : val) % size; hlist_add_head(&on->node, DDDD); } ``` ### TreeNode 分析可以看到他有六個參數，每個的功用為： ```c int *preorder 存取前序的陣列第一個位址 int pre_low 該次遞迴所設定的前序陣列下限 int pre_high 該次遞迴所設定的前序陣列上限 int *inorder 存取中序的陣列第一個位址 int in_low 該次遞迴所設定的中序陣列下限 int in_high 該次遞迴所設定的中序陣列上限 struct hlist_head *in_heads 儲存 hash table 結構體的位址 int size 為 hash table 之欄位總數 ``` #### `TreeNode` 主要的的執行過程：第一步，配置一個新的 `struct TreeNode` 記憶體空間，並設定該 TreeNode 之數值為該次遞迴中 `*preorder` 中的第 `pre_low` 個元素，找到這個數值對應到 `*inorder` 中的 index ，並利用這個值決定進入左右子樹的遞迴。 ```c struct TreeNode *tn = malloc(sizeof(*tn)); tn->val = preorder[pre_low]; int idx = find(preorder[pre_low], size, in_heads); ``` 第二步，左子樹遞迴：以第一次執行為例，這次的遞迴會將整個左子樹作為參數傳入，在 root 的左子樹中， `*preorder` 之範圍改變為 `pre_low + 1, pre_low + (idx - in_low)` ，按照前序 "中左右" 的搜尋順序，將一開始作為 root 的第一個元素移除，因此 `pre_low` 傳入的是 `pre_low + 1` ，而 `pre_low + (idx - in_low)` 的部分可以以下圖的方式理解： ```graphviz digraph hlist_add_head { rankdir = LR; node [shape = record]; in [label = "{9 | '3' | 15 | 20 | 7}", width=3]; pre [label = "{'3' | 9 | 20 | 15 | 7}", width=3]; node [shape = none]; preorder -> pre; inorder -> in } ``` 這裡選取了 3 ，因此左子樹可以判斷出在 inorder 的左半邊，透過 `(idx - in_low)` 計算出個數後，將反映到 `pre_high` 上，即為 `pre_low + (idx - in_low)` ，也就是說在第一次的左子樹遞迴中， `preorder` 的範圍就被界定為 `(1, 1)` ，而 `inorder` 的範圍被界定為 `(0, 0)`。第三步，右子樹遞迴則類似於上面的判斷方式，透過 `inorder` 中的位置與 `low_high` 判斷右子樹的元素個數，並取出對應的 `preorder` 與 `inorder` 範圍作為參數傳入 `dfs` ，在這裡 `preorder` 的範圍被界定為 `(2, 4)` ，而 `inorder` 的範圍被界定為 `(2, 4)`。 ```c tn->left = dfs(preorder, pre_low + 1, pre_low + (idx - in_low), inorder, in_low, idx - 1, in_heads, size); ``` 最後觸發中止條件 `in_low > in_high || pre_low > pre_high` 時，回傳 `TreeNode` ，這個中止條件將在只有一個元素的時候進行 `dfs` 後的左右子樹遞迴觸發。 ## 2024q1 第 2 週測驗二對於可快速存取但容量有限的 Cache 而言，通常會以 Spatial Locality 或 Temporal Locality 為考量選擇不同的排班策略，LRU 就是其中考慮 Temporal Locality 較多的一種，題目中以 `dhead` 這個鏈結串列儲存 Cache 中的使用情形，當在 Cache 已滿的時候發生 Cache miss ，則會將此鏈結串列的最後一個節點移出，即為最久沒被使用到的節點。 ### 基本結構分析 #### hlist_del 將目前指向的 hlist_node 刪除，首先用 `next` 儲存目前節點指向下個節點的指標， `pprev` 儲存指向目前這個節點的指標，透過間接指標 `*pprev = next;` 的方式將上一個節點的 `next` 指標指向下一個節點，最後則是更新 `next` 節點的 `pprev` ，因此 `EEEE` 即為 `next->pprev`。 ```c void hlist_del(struct hlist_node *n) { struct hlist_node *next = n->next, **pprev = n->pprev; *pprev = next; if (next) EEEE = pprev; } ``` #### list_add & __list_add list_add 的功用即為透過呼叫 `__list_add` ，將 `_new` 節點加入到 `head` 的後面，但觀察起來我認為不太需要 `__list_add` 這個函式，可以簡單透過像是 lab0-c 中的 "list.h" 中的方式，直接在 `list_add` 中宣告一個新的 `struct list_head *next = head->next;` 就可以完成一樣的操作了。 ```c void __list_add(struct list_head *new, struct list_head *prev, struct list_head *next) { next->prev = new; new->next = next; new->prev = prev; prev->next = new; } void list_add(struct list_head *_new, struct list_head *head) { __list_add(_new, head, head->next); } ``` #### struct LRUCache and LRUNode 在看到這兩個結構體後我一開始想的是為何會需要兩個鏈結串列去儲存 Cache 中的資料，其實僅透過 `dhead` 就可以完整的搜尋整個 Cache 的使用情形了，但後來想起上一題的 hash table 其實就是為了縮短搜尋的時間，在 Cache 中 key-value pair 的搜尋時間優化也是相對重要的。 LRUCache 中四個參數分別代表： ```c int capacity; 整個 Cache 可容納 LRUNode 的最大總數 int count; 目前 Cache 中存在使用中的 LRUNode 總數 struct list_head dhead; 追蹤使用狀態的雙向鏈結串列，最後使用的 LRUNode 將在最後面 struct hlist_head hhead[]; 存放 Cache 中各個 hash table 欄位的存放情形 ``` LRUNode 中四個參數分別代表： ```c int key; 對應到 hash table 的其中一個欄位，並比對欄位中是否為同個節點 int value; 儲存該 LRUNode 的數值 struct hlist_node node; LRUNode 在 hash table 中對應欄位所指向的節點 struct list_head link; LRUNode 在使用順序的鏈結串列中的節點 ``` 對於 LRUNode 而言， `node` 與 `link` 作為 `entry` 來進行管理。 #### lRUCacheCreate 創立一個新的 Cache 結構體，主要在執行記憶體配置、設定初始參數，對 `dhead` 初始化、初始化 `capacity` 個 hash table 欄位的 `first` 指標。 ```c LRUCache *lRUCacheCreate(int capacity) { LRUCache *cache = malloc(2 * sizeof(int) + sizeof(struct list_head) + capacity * sizeof(struct list_head)); cache->capacity = capacity; cache->count = 0; INIT_LIST_HEAD(&cache->dhead); for (int i = 0; i < capacity; i++) INIT_HLIST_HEAD(&cache->hhead[i]); return cache; } ``` #### lRUCacheFree 釋放整個 Cache 的記憶體空間，利用 `list_for_each_safe` 在遍歷每個節點的同時，可以對當下的 `pos` 進行釋放，因為有 safe node 'n' 保存著下個節點的位址，再來使用 `list_entry` 透過 `list_head *pos` 找到上層 `LRUNode` 結構，而 `pos` 在結構內的變數名稱為 `link` ，因此 `FFFF` 即為 `link` ，之後對 `list_head` 進行釋放，因此 `GGGG` 應為 `pos` 。另外這邊我想到一個問題是雖然在 Cache 結構體內 hash table 的儲存型態 `hhead[]` 不為指標，不需要 free ，但在 `obj` 指標被釋放後期時喪失了 `hhead[hash]` 的造訪方式，那他內部所儲存的 `first` 及後面的 `hlist_node` 鏈結串列與指標所耗費的記憶體空間沒有被釋放出來，又或者說在執行 `lRUCacheFree` 之前， hash table 相關的釋放會先被執行。 ```c void lRUCacheFree(LRUCache *obj) { struct list_head *pos, *n; list_for_each_safe (pos, n, &obj->dhead) { LRUNode *cache = list_entry(pos, LRUNode, FFFF); list_del(GGGG); free(cache); } free(obj); } ``` #### lRUCacheGet 接下來的 Get 與 Put 操作可以對應到 Cache 中的 read 與 write 操作，首先是 `lRUCacheGet` ，該函式是透過 `hlist_for_each` 遍歷該 key 值對應到的 hash table 欄位中的 `hlist_node` ，在 `obj` 這個 Cache 中查找有無 `key` 對應到的 `LRUNode` ，若有則將該節點的 `value` 讀出，若無則回傳 `-1`，遍歷的過程中先使用 `list_entry` 透過目前的 `hlist_node *pos` 查找上層結構，而 `pos` 在 `LRUNode` 中為 `node` 成員，因此 `HHHH` 應為 `node` ，另外，因為我們使用 LRU 排程的關係，在我們讀取或寫入該 `LRUNode` 後，應該要將他在 `dhead` 鏈結串列中移至最前面，代表最近一次使用，並且 `LRUNode` 儲存在 `dhead` 鏈結串列中的成員為 `list_head link` ，因此此處的 `IIII` 應為 `&cache->link`。 ```c int lRUCacheGet(LRUCache *obj, int key) { int hash = key % obj->capacity; struct hlist_node *pos; hlist_for_each (pos, &obj->hhead[hash]) { LRUNode *cache = list_entry(pos, LRUNode, HHHH); if (cache->key == key) { list_move(IIII, &obj->dhead); return cache->value; } } return -1; } ``` #### lRUCachePut 分析時，將這個函式大致拆解為兩個部分： 1. 在 Cache 中查找是否有該 key 對應的 `LRUNode` 存在，若有則儲存為變數 `cache`：同樣是透過 `hlist_for_each ` 遍歷 key 對應之 hash table 欄位的鏈結串列，並做 `list_entry` 取出上層結構後查找是否有成員 key 相同者，若有則在使用順序鏈結串列 `dhead` 中放入最前方，因此可看出 `KKKK` 即為 `&c->link`。 ```c LRUNode *cache = NULL; int hash = key % obj->capacity; struct hlist_node *pos; hlist_for_each (pos, &obj->hhead[hash]) { LRUNode *c = list_entry(pos, LRUNode, JJJJ); if (c->key == key) { list_move(KKKK, &obj->dhead); cache = c; } } ``` 2. 若有找到則直接存入新的 value，若無則考慮兩種狀況： a : Cache 已滿，即 `obj->count == obj->capacity` ，則將位於 `dhead` 最後一位之 `LRUNode->link` 取出後放至第一位，另外透過 `hlist_del` 與 `hlist_add_head` 在該 key 對應到的 hash table 鏈結串列中將此 `LRUNode->node` 移至第一位，這裡目前不知道改變節點在 hash table 欄位中的順序會影響什麼，推測是因為注重 Temporal Locality 的方面，考慮到多次使用同個 `LRUNode` ，這麼做可以減少在 hash table 中搜尋的時間。 b：在 Cache 未滿的狀況下需要新配置一個 `LRUNode` 記憶體空間並直接將該節點中的 `node, link` 成員分別放置於 `&obj->hhead[hash], &obj->dhead` 指向的兩個鏈結串列之開頭，並讓 `count` 計數器加一，紀錄目前 Cache 中被使用的欄位個數 ```c if (!cache) { if (obj->count == obj->capacity) { cache = list_last_entry(&obj->dhead, LRUNode, link); list_move(&cache->link, &obj->dhead); hlist_del(&cache->node); hlist_add_head(&cache->node, &obj->hhead[hash]); } else { cache = malloc(sizeof(LRUNode)); hlist_add_head(&cache->node, &obj->hhead[hash]); list_add(&cache->link, &obj->dhead); obj->count++; } cache->key = key; } cache->value = value; ``` 最後在觀察這整個結構之後，發現他應該是一個 fully associative cache ，之前的認知是在一個容量較小的 Cache 中，可透過 `capacity` 個比較器直接查找 key 對應的 block 是否存在於 Cache 中，但在這個實作中是使用 hash table 查找，發現自己對於在硬體層面與軟體層面之間關聯性的理解不太明確。 ## 2024q1 第 2 週測驗三 ### 函式與巨集分析 #### __ffs ffs 即為 find first set ，從最低位向高位搜尋第一個 1 ，由此可以分析這些 if else 的功效，第一步則是判斷從 0-31 bit 是否有設置 1，若無，則將 word 向右移 32 bit ，實際上就是讓下次的比對從第 32 bit 的位置開始，因此 `AAAA` 應為檢查 word 的 0-31 bit 是否存在 1，可得 `AAAA` = `0xffffffff`，這個函式相當於每次將 word 切分使用二分法的方式去搜尋。 ```c int num = 0; #if BITS_PER_LONG == 64 if ((word & AAAA) == 0) { num += 32; word >>= 32; } #endif if ((word & 0xffff) == 0) { num += 16; word >>= 16; } if ((word & 0xff) == 0) { num += 8; word >>= 8; } if ((word & 0xf) == 0) { num += 4; word >>= 4; } if ((word & 0x3) == 0) { num += 2; word >>= 2; } if ((word & 0x1) == 0) num += 1; return num; ``` #### __clear_bit 這邊得先提到 `__clear_bit` 為在進行 `fns` 時，是透過進行 `n` 次的 `ffs` ，並在每次搜尋到第一個 1 後，將其從 word 中去除所使用的函式，因此它利用 `BIT_MASK` 產出第 `nr` 位為 1 ，其餘為 0 的 mask，並做一次反向並透過 & 運算將當下的 1 過濾掉，因此 `BBBB` 應為 `~mask`，另外這裡的 `BIT_WORD` 則是為了若 `nr > 64` 也就是一個 unsigned long 長度時，需要做記憶體位址的變更以查找更高位的位元在做運算。 ```c static inline void __clear_bit(unsigned long nr, volatile unsigned long *addr) { unsigned long mask = BIT_MASK(nr); unsigned long *p = ((unsigned long *) addr) + BIT_WORD(nr); *p &= BBBB; } ``` 示意圖： ```graphviz digraph hlist_add_head { label = "storage structure"; labelloc=top; // rankdir = LR; node [shape = record]; A [label = "<h> 1 | 1 | ...| 1 | 1 |<n> 0 | 1", width=6]; B [label = "<h> ... | ... | 1 | 0 | 0 |<n> 1 | 0", width=6]; node [shape = none]; mask [label = "~mask"]; nbit [label = "nth bit"] nbit2 [label = "nth bit"] NULL [label = ""] p -> B:h; mask -> A:h; nbit -> B:n; nbit2 -> A:n; A -> NULL -> B [color=transparent] } ``` #### fns 即為在 `word` 所在的記憶體空間中，第 n 個 1 的位置，若找不到則回傳 `BITS_PER_LONG` ，即 64 ，可以看到操作過程就是 `找到第一位 -> 篩選掉 -> 重複 n 次` ，值得一提的是在呼叫 fns 的時候，只會限制在一個 unsigned long 記憶體空間內，因此不理解為何在 `__clear_bit` 會需要執行 `unsigned long *p = ((unsigned long *) addr) + BIT_WORD(nr)` 這樣的動作，在巨集 `FIND_NTH_BIT` 中的呼叫，也是先將 `fns(tmp, nr)` 中的 `tmp` 設定為 `addr[idx]` ，我想不出何種情況 `__clear_bit` 的 input `bit` 會大於 64 ，因此我認為在這個實作中他的 `BIT_WORD(nr)` 將總是為 0，可以刪除。 ```c while (word) { unsigned int bit = __ffs(word); if (n-- == 0) return bit; __clear_bit(bit, &word); } return BITS_PER_LONG; ``` #### find_nth_bit 這個函式將在 `size` 個位元中尋找第 n 個 1 ，分成三種情況，第一種若是要尋找的位元比 `size` 還要或是一樣大，則必定回傳 `size`，第二種情形為當 `size` 不超過一個 unsigned long 記憶體大小時，可直接呼叫 `fns` 而不用先透過巨集 `FIND_NTH_BIT` 分段查找多個 unsigned long 記憶體空間，最後則是超過一個 unsigned long 記憶體大小時，會利用 `addr[idx]` 分段造訪。 ```c if (n >= size) return size; if (small_const_nbits(size)) { unsigned long val = *addr & GENMASK(size - 1, 0); return val ? fns(val, n) : size; } return FIND_NTH_BIT(addr[idx], size, n); ``` #### FIND_NTH_BIT 在這個 for 迴圈，就是在造訪多個 unsigned long 記憶體空間， `tmp` 將隨著 idx 改變為 `FETCH` ，也就是 `addr[idx]` ，在每個迭代中，會先透過 `hweight_long` 檢查該次的記憶體空間內有幾個 1 ，若 `w > nr` 即表示最終的第 n 個 1 位於此記憶體空間內，並且在最後一次的搜尋中，`size` 不為 64 的倍數，最高位的記憶體空間會先經過 `if (sz CCCC BITS_PER_LONG)` 中的過濾去將 `size` 最高位以後的位元篩選掉，因此判斷式中的 `CCCC` 應為 `%`，在不整除的情況下才會進行。 ```c for (idx = 0; (idx + 1) * BITS_PER_LONG <= sz; idx++) { \ if (idx * BITS_PER_LONG + nr >= sz) \ goto out; \ \ tmp = (FETCH); \ w = hweight_long(tmp); \ if (w > nr) \ goto found; \ \ nr -= w; \ } \ \ if (sz CCCC BITS_PER_LONG) \ tmp = (FETCH) & BITMAP_LAST_WORD_MASK(sz); ``` 另外 `BITMAP_LAST_WORD_MASK` 巧妙的運用二補數去將超出 `size` 最後一塊記憶體空間的位元刪除，因為一次作業的單位為 64 bit ，所以進行 `& (BITS_PER_LONG - 1)`，可以得到最高位元含右側全為 1 左側全為 0 的 64 bit 二進位數，基本上與 `GENMASK`，`l = 0` 時功用相同，示意圖如下：其中 `nth bit = size % BITS_PER_LONG` ```graphviz digraph hlist_add_head { // rankdir = LR; label = "mask"; labelloc=top; node [shape = record]; B [label = " 0 | ... | ... | 0 | <n> 1 | ... | ... | 1", width=6]; node [shape = none]; nbit [label = "nth bit"] nbit -> B:n; } ```