2024q1 Homework2 (quiz1+2)

# 2024q1 Homework2 (quiz1+2) contributed by < `SH` > ## 第一週測驗題 > [完整題目](https://hackmd.io/@sysprog/linux2024-quiz1) ### 測驗 1 >延伸問題： >- [x] 解釋上述程式碼的運作原理，提出改進方案並予以實作。 >- [ ] 使用 Linux 核心風格的 List API 改寫上述程式碼，並針對鏈結串列，提出可避免最差狀況的快速排序實作，應設計效能評比的測試程式。 ### 非遞迴 (non-recursive; iterative) 的快速排序法理解在傳統的遞迴快速排序演算法中，一旦完成了節點與基準值(Pivot)的比較後，演算法會將鏈結串列分割成兩個子鏈結串列。接著，透過遞迴呼叫對這些更小的子串列進行相同的操作，選擇新的基準值，再次分割，如此反覆，直到每個子鏈結串列達到有序狀態。而本程式碼所採用的方法是將遞迴呼叫的過程轉換為使用堆疊來實現。在這種實現方式中，程式並不是透過直接的遞迴呼叫來分割鏈結串列，而是將每次分割後得到的子鏈結串列的起始和終點節點存儲在堆疊中。觀察題目中對於 `void quick_sort(node_t **list)` 的實現，程式先將 `max_level` 設為 2\*n，這是為了確保，在最差的分割情況下，堆疊有足夠的空間來存儲所有可能的子鏈結串列的開始和結束指標。在快速排序法中，最壞的情況是每次選擇的 Pivot 都是目前子鏈結串列中的最小或最大元素。這樣會導致分割非常不均勻，一邊是空的，另一邊則包含剩餘的所有元素。在這種情況下，如果有 n 個節點，每次分割會產生兩個新的子範圍，所以需要 2 個空間來存儲每層的範圍（begin 和 end）。因此，max_level = 2 * n 確保了即使在最壞的情況下，也有足夠的空間來存儲所有的子範圍。 ```c int max_level = 2 * n; node_t *begin[max_level], *end[max_level]; node_t *result = NULL, *left = NULL, *right = NULL; begin[0] = *list; end[0] = list_tail(list); ``` 接著演算法首先將堆疊 `begin[0]` 以及 `end[0]` 的位置放入鏈結串列的開頭結尾，作為最開始的開頭與結尾。 ```c while (i >= 0) { node_t *L = begin[i], *R = end[i]; if (L != R) { node_t *pivot = L; value = pivot->value; node_t *p = pivot->next; pivot->next = NULL; while (p) { node_t *n = p; p = CCCC; list_add(n->value > value ? &right : &left, n); } begin[i] = left; end[i] = DDDD; begin[i + 1] = pivot; end[i + 1] = pivot; begin[i + 2] = right; end[i + 2] = EEEE; left = right = NULL; i += 2; } else { if (L) list_add(&result, L); i--; } } *list = result; ``` 排序的過程: * 迴圈 `while (i >= 0)` 用於走訪和排序鏈結串列的不同部分。 * 在迴圈內，選擇一個節點作為 pivot，這裡選擇的是每個子串列的開始節點 L。 * 走訪 pivot 後的所有節點，根據它們的值相對於 pivot 的大小，將它們分到 left 或 right 子串列。 * 更新 begin 和 end 數組以反映新的子列表，其中包括左子串列、pivot、和右子串列。 * 如果達到一個只有一個節點的子列表，將該節點添加到最終結果 result 中。圖解: * 選擇一個節點作為 pivot ![1](https://hackmd.io/_uploads/H1228pSTa.png) * 走訪 pivot 後的所有節點 ![lklis](https://hackmd.io/_uploads/BywaL6Haa.png) * 比較它們的值相對於 pivot 的大小 ![3](https://hackmd.io/_uploads/Sy3TIarTp.png) * 根據它們的值相對於 pivot 的大小，將它們分到 left 或 right 子串列。 ![4](https://hackmd.io/_uploads/rJZ4D6BT6.png) * 第一輪結果 ![5](https://hackmd.io/_uploads/rko4_TSap.png) * 更新 begin 和 end 堆疊，其中包括左子串列、pivot、和右子串列。 ![next](https://hackmd.io/_uploads/SJeRopHpa.png) * 演算法會先針對右子串列進行走訪，接著才是左子串列。 ![final](https://hackmd.io/_uploads/ByiJp6Saa.png) * 當 begin 與 end 相等 (亦即 `L == R`) ，演算法由右至左將節點加入 result 串列。 1. ![end](https://hackmd.io/_uploads/BJppyCSpT.png) 2. ![7](https://hackmd.io/_uploads/Hy5hgCH6p.png) 3. ![last](https://hackmd.io/_uploads/S1FXbAraa.png) * 重複直到所有節點排序完成。 ---------------------------------------------------------------- ```c while (p) { node_t *n = p; p = CCCC; list_add(n->value > value ? &right : &left, n); } ``` 值得注意的是，以上程式碼為快速排序法中的一個關鍵步驟，此段程式碼將走訪從 pivot 之後開始的所有鏈結串列節點，每個節點依據其值與 pivot 值比較的結果，既然要走訪整個鏈結串列，因此我認為 `p = CCCC` 實際上應該是 `p = p->next;` ，利用`p` 指標走訪整個鏈結串列，比較走訪過的節點其值與 pivot 值的大小關係。 -------------------------------------------- ```c begin[i] = left; end[i] = DDDD; begin[i + 1] = pivot; end[i + 1] = pivot; begin[i + 2] = right; end[i + 2] = EEEE; left = right = NULL; i += 2; ``` 上述程式碼作用是將左右子串列的開頭與結尾放入堆疊，放入的順序為左子串列， Pivot ，右子串列。因此 `end[i] = DDDD;` 應該放入左子串列的結尾 `end[i] = list_tail(&left);`，而 `end[i + 2] = EEEE;` 應該放入右子串列的結尾 `end[i] = list_tail(&right);`。 ### 改進方案並予以實作在閱讀完並理解非遞迴的快速排序演算法後，我認為 `end` 這個堆疊不需要存在的，程式中已經有 `list_tail()` 可以用來存取練可以用來存取鏈結串列的結尾，我認為可以將 `end` 堆疊刪除以節省不必要的宣告。修改的程式碼 : ```diff int value; int i = 0; int max_level = 2 * n; ! node_t *begin[max_level]; node_t *result = NULL, *left = NULL, *right = NULL; begin[0] = *list; while (i >= 0) { ! node_t *L = begin[i], *R = list_tail(&begin[i]); if (L != R) { node_t *pivot = L; value = pivot->value; ``` ```diff begin[i] = left; - end[i] = list_tail(&left); begin[i + 1] = pivot; - end[i + 1] = pivot; begin[i + 2] = right; - end[i + 2] = list_tail(&right); ``` 實驗 : ![image](https://hackmd.io/_uploads/H1WYVZU66.png) 實驗比較修改後的快速排序法和原始的快速排序法效能差異，發現花費的 Cycles 數量相當，雖然沒有在效能上有明顯進步，但能夠減少一個堆疊的宣告。 ### 測驗 2 >延伸問題： >- [x] 解釋上述程式碼的運作原理，提出改進方案並予以實作。 >- [ ] 將改進過的 Timsort 實作整合進 sysprog21/lab0-c，作為另一個有效的排序命令，應設計效能評比的測試程式，確保測試案例符合 Timsort 設想的資料分布樣式。 ### Timsort 排序演算法理解 Timsort 對 merge sort 的不足進行了改進，並針對實真實世界中的數據中**已排序部分的特性**進行了演算法上的優化，從而在記憶體使用和速度方面實現了進步。 Timsort 具備了以下特性: * 提高時間效率：在一個一個待排序的序列中，裡面可能有部份元素已經排序好。 Timsort 利用這個特性，在尋找 runs 時，會把**已經排序好的元素**加入同一個 runs ，**避免需要額外的時間排序**。 * 降低記憶體開銷 : 只會對 2 個 runs 中，其**範圍重疊的部份進行合併**，額外空間只需要配置二個要合併的部份中跟較小的那一部份元素數量的記憶體。 * 減少 Cache misses : Timesort 儘量讓要排序的元素剛才存取過，還在記憶體階層上層時，即可進行合併。接著我將針對 Timsort 的程式碼進行探討。 ```c static struct list_head *merge(void *priv, list_cmp_func_t cmp, struct list_head *a, struct list_head *b) { struct list_head *head; struct list_head **tail = AAAA; for (;;) { /* if equal, take 'a' -- important for sort stability */ if (cmp(priv, a, b) <= 0) { *tail = a; tail = BBBB; a = a->next; if (!a) { *tail = b; break; } } else { *tail = b; tail = CCCC; b = b->next; if (!b) { *tail = a; break; } } } return head; } ``` 這段程式碼實現了經典的合併操作，用於合併兩個已排序的鏈結串列。特別值得注意的是使用了 `struct list_head **tail = AAAA;` 這樣的雙重指標，他用於定位合併過程中即將添加節點的位置。圖解 : * 初始化 `tail` : 首先將 `tail` 雙重指標使其指向合併後的鏈結串列開頭指標。 ![01](https://hackmd.io/_uploads/ByIqgp8aT.png) * 比較並插入節點 : 對 `a` `b` 兩個開頭節點值進行比較，將值較小的節點放入合併後鏈結串列中，值得注意的是，若 `a` 與 `b` 開頭節點值相同，將 `a`　放入合併後鏈結串列以確保 stable。 ![02](https://hackmd.io/_uploads/HJ6cinLpa.png) * 更新 `tail` : 在節點新增到合併後的鏈結串列後，更新 `tail` 指標，使其指向合併後的鏈結串列最後一個節點的 `next` 指標位置，使 `tail` 始終指向下一個插入節點的位置。 ![03](https://hackmd.io/_uploads/S10Coh8pp.png) * 重複直到合併完成。 ![04](https://hackmd.io/_uploads/rJfMJ68Ta.png) 由以上的概念可以得知 `struct list_head **tail = AAAA;` 應改為 `struct list_head **tail = &head;` ， `tail` 雙重指標使其指向合併後的鏈結串列開頭指標。 `tail = BBBB;` 應改為 `&a->next` ，更新 `tail` 指標，使其指向合併後的鏈結串列最後一個節點的 `next` 指標位置，也就是剛插入的節點的下一個指標位置。 `tail = CCCC;` 應改為 `&b->next`，更新 `tail` 指標，使其指向合併後的鏈結串列最後一個節點的 `next` 指標位置，也就是剛插入的節點的下一個指標位置。 ---------------------- ```c static void build_prev_link(struct list_head *head, struct list_head *tail, struct list_head *list) { tail->next = list; do { list->prev = tail; tail = list; list = list->next; } while (list); /* The final links to make a circular doubly-linked list */ DDDD = head; EEEE = tail; } ``` `Timsort` 有一個特色在對鏈結串列進行排序時，會先將鏈結串列改為單向鏈結串列，而在排序結束後需要再轉換為雙向鏈結串列。以上程式碼就是一個將單向鏈結串列改為雙向鏈結串列的實現，仔細閱讀程式碼後了解到主要的幾個步驟為： * 將 `tail->next` 設定為 `list` 以便接下來走訪。 * 接著走訪整個鏈結串列，將每個節點的 `prev` 指標指向前一個節點。 * 重複直到最後一個節點。 * 最後需要把頭尾連接完成雙向鏈結串列。由以上步驟可以了解到 `DDDD = head;` 應改為 `tail->next = head;` ，而將`EEEE = tail;` 改為 `head->prev = tail;` 完成雙向的鏈結串列編輯。 -------------------- ```c void timsort(void *priv, struct list_head *head, list_cmp_func_t cmp) { stk_size = 0; struct list_head *list = head->next, *tp = NULL; if (head == head->prev) return; /* Convert to a null-terminated singly-linked list. */ head->prev->next = NULL; do { /* Find next run */ struct pair result = find_run(priv, list, cmp); result.head->prev = tp; tp = result.head; list = result.next; stk_size++; tp = merge_collapse(priv, cmp, tp); } while (list); /* End of input; merge together all the runs. */ tp = merge_force_collapse(priv, cmp, tp); /* The final merge; rebuild prev links */ struct list_head *stk0 = tp, *stk1 = stk0->prev; while (stk1 && stk1->prev) stk0 = stk0->prev, stk1 = stk1->prev; if (stk_size <= FFFF) { build_prev_link(head, head, stk0); return; } merge_final(priv, cmp, head, stk1, stk0); } ``` 接著我針對以上 `timsort` 程式碼的持續探討，我注意到它首先將鏈結串列從雙向鏈結串列轉換為單向鏈結串列。這一轉換是透過設置鏈結串列最後一個節點的 `next` 指針為 `NULL` 來實現的，具體操作為 `head->prev->next = NULL;`。這一步驟的目的是確保鏈結串列的最後一個節點的 `next` 指針指向 `NULL`，這樣，在後續的走訪中，當走訪到 `NULL` 指針時，就可以識別出鏈結串列的末尾，並停止走訪。接著，使用了 `find_run` 來找到串列中已經排序好的部分。 ```c static struct pair find_run(void *priv, struct list_head *list, list_cmp_func_t cmp) { size_t len = 1; struct list_head *next = list->next, *head = list; struct pair result; if (!next) { result.head = head, result.next = next; return result; } if (cmp(priv, list, next) > 0) { /* decending run, also reverse the list */ struct list_head *prev = NULL; do { len++; list->next = prev; prev = list; list = next; next = list->next; head = list; } while (next && cmp(priv, list, next) > 0); list->next = prev; } else { do { len++; list = next; next = list->next; } while (next && cmp(priv, list, next) <= 0); list->next = NULL; } head->prev = NULL; head->next->prev = (struct list_head *) len; result.head = head, result.next = next; return result; } ``` 以下使用圖解來說明 `find_run` 的部分 : * 第一種情況 : 升序排列的節點 * 首先將 `head` 指向鏈結串列的開頭， `next` 指向 `list` 的下一個節點。 ![001](https://hackmd.io/_uploads/HJStmPv66.png) * 若 `next` 指標中節點的值大於(或等於) `list` 指標中節點的值，則繼續走訪，直到找到非升序的節點為止。 ![002](https://hackmd.io/_uploads/SkgRVwvTa.png)\ * 離開迴圈後回傳串列的開頭以及 `next` 指標，使得下一輪能夠繼續走訪並找到下一組 `run`。 ![003](https://hackmd.io/_uploads/BkNiSDwTp.png) * 第二種情況 : 降序排列的節點 * 同樣將 `head` 指向鏈結串列的開頭， `next` 指向 `list` 的下一個節點。值得注意的是會新增一個 `prev` 指標方便接下來做反轉串列。 ![001](https://hackmd.io/_uploads/BJCQ1_PTT.png) * 接著將 `list` 與 `next` 做反轉，第一輪操作結果如下圖 : ![5](https://hackmd.io/_uploads/r13oHODap.png) * 持續走訪直到 `next` 節點的值大於(或等於) `list` 節點的值 ![5](https://hackmd.io/_uploads/S1inU_w6T.png) * 最終離開迴圈後回傳串列的開頭以及 `next` 指標，使得下一輪能夠繼續走訪並找到下一組 `run`，可以發現走訪過的節點皆被反轉。 ![003](https://hackmd.io/_uploads/BkNiSDwTp.png) 接著，timsort 將檢視已找到的 runs 中，是否有存在可以合併的 runs，這也是 timsort 一個顯著特點，它會在分割過程中進行合併，以維持合併的平衡。為了保持 run 長度的平衡，該函數主要檢查堆棧頂端的三個 run 是否符合以下原則： * A 的長度需大於 B 與 C 的長度總和。 * B 的長度需大於 C 的長度。接著，在迴圈之後，將剩餘的 runs 進行合併，直到只剩下兩組 runs。最後，透過 merge_final 函數將這兩組 runs 合併，完成排序演算法。值得注意的是，如果最終只剩下一組 run，則會透過 `if (stk_size <= FFFF)` 這個條件判斷來進行判定，並將鏈表轉換為雙向鏈表後返回。因此，FFFF 應填入 1，以便於只剩一組 run 時進行正確的判斷。 ### 提出改進方案並予以實作再閱讀並理解作業中的程式碼後，我發現作業中的 `timsort` 並沒有針對選擇 `minrun` 的策略進行實作，因此我決定將程式碼加入這個部份並進行實驗。首先，我需要先取得 `minrun` 的值，而作業中有提到: >其中一個實務上的方法是，取資料長度的前 6 位，若剩餘位中任何一位被設置，則加 1。這主要是為了在處理隨機數列時，使得能夠被 minrun 切分成 2 的冪的 run，以便在合併時達到最佳平衡。我嘗試根據這個方法實作出一個名為 `int compute_min_run(int size)` 的函式。 ```c int compute_min_run(int size) { if (size >= 0) { int r = 0; while (size >= 32) { r |= (size & 1); size >>= 1; } return (size + r); } } ``` 這個 `compute_min_run` 函式的主要步驟是反覆檢查目前前的 `size` 是否大於等於 32。選用 32 的原因是因為 32 的二進制表示剛好是 100000，採用這樣的比較方式可以找到資料長度的前 6 位數值。通過 `r |= (size & 1);` 這行程式碼觀察在不斷除以 2 的過程中是否有任何一位餘數，最終將計算結果回傳。 `minrun` 的設計目的是讓剩餘資料在分割為 `minrun` 時，能夠盡可能接近 2 的冪，從而使得後續的合併過程更高效；實際操作中，會不斷將 n 除以 2，直到 n 小於一個預設的最小合併長度（MIN_MERGE），過程中只要有任何一次不能整除 2，最終結果就加一，這種方法確保分組接近 2 的冪。 ----------------------------------------------------- 我為了處理當 `run` 長度不滿足 `minrun` 時的情況，需要在尋找 `run` 時使用插入排序將 `next` 中的資料插入。因此，我需要針對 `find_run` 中的程式碼做調整。 ```diff static struct pair find_run(void *priv, struct list_head *list, list_cmp_func_t cmp, + int min_run) { size_t len = 1; struct list_head *next = list->next, *head = list; struct pair result; if (!next) { result.head = head, result.next = next; return result; } if (cmp(priv, list, next) > 0) { /* decending run, also reverse the list */ struct list_head *prev = NULL; do { len++; list->next = prev; prev = list; list = next; next = list->next; head = list; } while (next && cmp(priv, list, next) > 0); list->next = prev; } else { do { len++; list = next; next = list->next; } while (next && cmp(priv, list, next) <= 0); list->next = NULL; } + while (next && len < min_run) { + struct list_head *next_tmp = next->next; + struct list_head *prevr = NULL, *curr = head; + while (curr && cmp(priv, next, curr) > 0) { + prevr = curr; + curr = curr->next; + } + if (prevr) { + next->next = curr; + prevr->next = next; + } else { + next->next = head; + head = next; + } + len++; + next = next_tmp; + } head->prev = NULL; head->next->prev = (struct list_head *) len; result.head = head, result.next = next; return result; } ``` 以上程式碼修改 `find_run()` 是我針對當 `run` 節點數量小於 `minrun` 時使用插入排序的實作部份。首先，你在尋找 run 的迴圈之後，加入了一個新的 while 迴圈。這個迴圈的條件是 next 不為空且目前 run 的長度 len 小於最小 run 長度 min_run。在這個新增的迴圈中，事先儲存了 `next` 節點的下一個節點 `next_tmp` ，因為在插入排序的過程中， `next` 將被改變。接著，使用了一個標準的插入排序法，從 `head` 開始，找到 `next` 節點應該插入的正確位置。我使用了一個臨時節點 `curr` 來走訪鏈結串列，並用 `prevr` 記錄應該插入的位置前一個節點。一旦找到正確的插入位置，就更新鏈結關係，將 `next` 插入到正確的位置。如果插入位置是鏈結串列的開頭，就更新 `head` 指標。最後，我增加 `len` 計數器，並將 `next` 指向之前儲存的 `next_tmp` ，準備處理下一個節點的插入排序。透過這個修改，實現了當 `run` 長度小於 `min_run` 時，使用插入排序法將 `next` 中的資料插入的功能，確保排序的正確性。 -------------------------------------------------------- 經過以上實作後，我稍微做了實驗，實驗結果發現，執行我自己實作改進的 `timsort` 方法後，產生了較多的比較次數，然而若比較兩種方法的 `clock cycles` 數量，會發現我的方法使用的 `clock cycles` 數量，或許是因為使用了插入排序法增加了更多的比較次數，但是使用 `min_run` 讓整體合併效率提昇，進而減少 `clock cycles` 的花費數量。原始 `timsort` 結果： ``` Creating sample ==== Testing timesort ==== Comparisons: 9441 List is sorted ``` ``` Performance counter stats for './main': 2,746,517 cycles ``` 我所改進的 `timsort` 結果： ``` Creating sample ==== Testing timesort ==== Comparisons: 13694 List is sorted ``` ``` Performance counter stats for './main': 2,621,688 cycles ``` 以上的步驟只是初步的實驗，若要檢驗此改進的效果，需要再進一步作更多的實驗佐證。 ---------------------------------------- 我實作了一個程式，用於產生混合了部分排序和部分亂數的資料樣本，以便後續進行實驗。 ```c void create_sample() { printf("Creating sample\n"); srand(time(NULL)); int samples = 100000; FILE* output_file = fopen("random_data.txt", "w"); if (output_file == NULL) { printf("Failed to open output file.\n"); return 1; } for (int i = 0; i < samples; i++) { if (rand() % 2) { int random_num = rand(); fprintf(output_file, "%d\n", random_num); } else { int start_val = rand(); int remaining = samples - i - 1; remaining = remaining / 3; int count; if (!remaining) count = remaining; else count = rand() % (remaining) + 1; if (count > remaining) { count = remaining; } for (int j = 0; j < count; j++) { int val = start_val + j; fprintf(output_file, "%d\n", val); } i += count - 1; } } fclose(output_file); return 0; } ``` 該程式的主要實作概念是透過隨機的方式產生部分已經排序的資料，剩餘的部分則以亂數插入資料。整體上，該程式能夠生成符合 TimSort 適用場景的測試資料，用於後續的性能測試和優化。接著我產生樣本存入檔案中，兩種排序法讀入相同的資料作實驗比較。比較次數 (Comparisons) 實驗: |節點數量|Timsort|加入 `minrun` 設計| |-|-|-| |10000|44531|45948| |100000|444470|446720| |1000000|4060103|4063341| 經過更詳細的實驗發現，此改進方式並沒有達到更好的效果，比較次數也沒有明顯降低，甚至還些微提昇，我認為很可能是因為 `timsort` 使用二分插入排序法，而我的實作使用一般的插入排序法效率不高，反而增加了更多的比較次數。 ### 將改進過的 Timsort 實作整合進 sysprog21/lab0-c 我將 `timsort` 嘗試整合進 `lab0-c` 時，不知道為何排序一直出錯，使用測驗提供的 `main` 函式測試時都可以正確排序，但整合進 `sysprog21/lab0-c` ，就無法正常排序。 ## 第二週測驗題 > [完整題目](https://hackmd.io/@sysprog/linux2024-quiz2) ### 測驗 1 >延伸問題： >- [x] 解釋上述程式碼的運作，撰寫完整的測試程式，指出其中可改進之處並實作。 >- [x] 在 Linux 核心的 cgroups 相關原始程式碼，找出 pre-order walk 程式碼並探討。 ### 運用 Linux 核心的 hash table 實作建立二元樹理解 ```c static struct TreeNode *buildTree(int *preorder, int preorderSize, int *inorder, int inorderSize) { struct hlist_head *in_heads = malloc(inorderSize * sizeof(*in_heads)); for (int i = 0; i < inorderSize; i++) INIT_HLIST_HEAD(&in_heads[i]); for (int i = 0; i < inorderSize; i++) node_add(inorder[i], i, inorderSize, in_heads); return dfs(preorder, 0, preorderSize - 1, inorder, 0, inorderSize - 1, in_heads, inorderSize); } ``` 從以上程式碼中可以看到，在 `buildTree` 函式中,它先建立一個 `hlist_head` 陣列`in_heads` 大小為 `inorderSize`。每個元素代表一個雜湊鏈結串列的開頭。 ```graphviz digraph so { rankdir=TB ; Array [ shape = record, label = "{ 0 | 1 | 2 | 3 |4 }"] ; in_heads [shape= "none"] in_heads -> Array [style=invis] shape = "none" inorder [label="Inorder:";shape="none"] {rank=same inorder 9 3 15 20 7} 9 [shape = "none"] 3 [shape = "none"] 15 [shape = "none"] 20 [shape = "none"] 7 [shape = "none"] inorder -> 9 -> 3 -> 15 -> 20 -> 7 [style=invis] color = white; } ``` ```c static inline void node_add(int val, int idx, int size, struct hlist_head *heads) { struct order_node *on = malloc(sizeof(*on)); on->val = val; on->idx = idx; int hash = (val < 0 ? -val : val) % size; hlist_add_head(&on->node, DDDD); } ``` 接著它走訪`inorder` 串列，對於每個值，計算該值對 `inorderSize` 取餘數作為該值的雜湊索引，然後呼叫 `node_add` 函式來新增一個 `order_node `到對應的雜湊鏈結串列中，值得注意的是 `order_node` 除了會除存本身節點內的值以外還會儲存在 `inorder` 串列的索引值 `idx`。 ```graphviz digraph so { rankdir=LR ; {rank=same inh inh2 inh3 inh4 } inh [label="in_heads[0]"; shape="none"] inh1 [label="in_heads[1]"; shape="none"] inh2 [label="in_heads[2]"; shape="none"] inh3 [label="in_heads[3]"; shape="none"] inh4 [label="in_heads[4]"; shape="none"] 15 [label="{val : 15 | Idx : 2}", shape="record"]; 20 [label="{val : 20 | Idx : 3}", shape="record"]; 7 [label="{val : 7 | Idx : 4}", shape="record"]; 9 [label="{val : 9 | Idx : 0}", shape="record"]; 3 [label="{val : 3 | Idx : 1}", shape="record"]; inh -> 20 -> 15 inh2 -> 7 inh3 -> 3 inh4 -> 9 } ``` 而程式碼 `hlist_add_head(&on->node, DDDD);` 的部分應該為將節點加入特定的雜湊串列中，因此 `DDDD` 應該傳入特定雜湊鏈結串列的開頭，而 `&heads[hash]` 即為此節點需要放入的雜湊串列開頭。接著，繼續追蹤程式碼 : ```c static inline void hlist_add_head(struct hlist_node *n, struct hlist_head *h) { if (h->first) h->first->pprev = &n->next; n->next = AAAA; n->pprev = &h->first; h->first = n; } ``` `hlist_add_head` 傳入兩個參數:一個指向 `hlist_node` 結構體的指針 `n` 和一個指向 `hlist_head` 結構體的指針 `h` 後，首先判斷這個雜湊鏈結串列是否本身已儲存節點，若存在節點，應事先將第一個節點的 `pprev` 指標指向 `&n->next` ，值得注意的是 `pprev` 是一個指標的指標用於存儲指向該節點的上一個節點的 next 指針的地址，以方便對鏈結進行操作。接著 `n->next = AAAA;` 此程式碼中，新節點的 `next` 指標應該指向原來的開頭節點。因此 `AAAA` 應該替換為 `h->first`。圖解: ```graphviz digraph so { rankdir=LR ; inh [label="in_heads[0]"; shape="none"] 15 [label="{val : 15 | Idx : 2}", shape="record"]; 20 [label="{val : 20 | Idx : 3 }", shape="record"]; inh -> 15 [label="pprev"]; } ``` ---------------------------- ```graphviz digraph so { rankdir=LR ; inh [label="in_heads[0]"; shape="none"] nul [label=""; shape="none"] 15 [label="{val : 15 | Idx : 2}", shape="record"]; 20 [label="{val : 20 | Idx : 3 }", shape="record"]; inh -> 20 -> 15 [label="pprev"] } ``` ------------------ 將所有 `inorder` 中的節點正確放入雜湊串列後，執行 `dfs` 來建構二元樹。 ```c static struct TreeNode *dfs(int *preorder, int pre_low, int pre_high, int *inorder, int in_low, int in_high, struct hlist_head *in_heads, int size) { if (in_low > in_high || pre_low > pre_high) return NULL; struct TreeNode *tn = malloc(sizeof(*tn)); tn->val = preorder[pre_low]; int idx = find(preorder[pre_low], size, in_heads); tn->left = dfs(preorder, pre_low + 1, pre_low + (idx - in_low), inorder, in_low, idx - 1, in_heads, size); tn->right = dfs(preorder, pre_high - (in_high - idx - 1), pre_high, inorder, idx + 1, in_high, in_heads, size); return tn; } ``` 首先看到 `dfs` 接受以下參數: * `preorder`、`pre_low`、`pre_high`：代表先序走訪的陣列及其目前處理範圍。 * `inorder`、`in_low`、`in_high`：代表中序走訪陣列及其目前處理範圍。 * `in_heads`：一個雜湊鏈結串列，用於快速找到中序走訪中節點的索引。 * `size`：中序走訪陣列的大小，也是雜湊的大小。 ------ 檢查遞迴範圍是否有效: ```c if (in_low > in_high || pre_low > pre_high) return NULL; ``` ------- 接著程式碼先建立一個新的樹節點 `tn`，並將它的值 `tn->val` 初始化為先序陣列在目前處理範圍 `pre_low` 的元素值 `preorder[pre_low]`。這麼做的原因是:根據先序走訪的規則，第一個元素就是樹的根節點。先序走訪是按照「根->左子樹->右子樹」的順序訪問樹的各個節點，因此先序遍歷數組的第一個元素自然就是樹的根節點。所以這段程式碼先新建一個樹節點 `tn`，並把它的值 `tn->val` 初始化為先序遍歷數組最前面的那個元素值 `preorder[pre_low]`，這個值就是樹的根節點值。 ```c struct TreeNode *tn = malloc(sizeof(*tn)); tn->val = preorder[pre_low]; ``` ------- 最後透過 `find` 找到雜湊表中 `preorder[pre_low]` 在 `inorder` 陣列中的位置。 ```c int idx = find(preorder[pre_low], size, in_heads); ``` 根據要查找的值計算出一個雜湊值， `find` 函數會根據這個雜湊值從事先建立的雜湊表陣列(`heads`)中取出對應的雜湊串列(`hlist_head`)，使用 `hlist_for_each` 走訪這個雜湊串列。而 `hlist_for_each` 需傳入的是雜湊串列的開頭，因此 `BBBB` 應填入 `&heads[hash]` 。而在走訪的過程中會建立一個存放 `order_node` 的指標，因此 `CCCC` 應填入 `list_entry` 使用類似 linux 核心中 `container_of` 巨集效果，找到該 `order_node` 對應的位置，最後找到正確節點後回傳索引值。 ```c static int find(int num, int size, const struct hlist_head *heads) { struct hlist_node *p; int hash = (num < 0 ? -num : num) % size; hlist_for_each (p, BBBB) { struct order_node *on = CCCC(p, struct order_node, node); if (num == on->val) return on->idx; } return -1; } ``` --- 找到 `idx` 後即可透過遞迴呼叫的方式建構整個二元樹 ```c int idx = find(preorder[pre_low], size, in_heads); tn->left = dfs(preorder, pre_low + 1, pre_low + (idx - in_low), inorder, in_low, idx - 1, in_heads, size); tn->right = dfs(preorder, pre_high - (in_high - idx - 1), pre_high, inorder, idx + 1, in_high, in_heads, size); ``` ------- ```graphviz digraph so { rankdir=TB ; shape = "none" inorder [label="Inorder:";shape="none"] {rank=same inorder 9 3 15 20 7} 9 [shape = "none"] 3 [color=blue;shape = ""] 15 [shape = "none"] 20 [shape = "none"] 7 [shape = "none"] inorder -> 9 -> 3 -> 15 -> 20 -> 7 [style=invis] 91 [label="9";shape = "none"] 31 [color=red;label="3";shape = "";] 151 [label="15";shape = "none"] 201 [label="20";shape = "none"] 71 [label="7";shape = "none"] preorder [label="Preorder:";shape="none"] {rank=same preorder 31 91 201 151 71} preorder -> 31 -> 91 -> 201 -> 151 -> 71[style=invis] 31 -> 9 [style=invis] } ``` ----------------- ```graphviz digraph so { rankdir=TB ; shape = "none" inorder [label="Inorder:";shape="none"] {rank=same inorder 9 3 } 9 [color=red;shape = ""] 3 [color=red;label="15 20 7"shape = ""] inorder 31 [color=blue;label="3";shape = "";] 31 -> 9 31 -> 3 } ``` ------ ### 指出其中可改進之處並實作在閱讀並理解上述程式碼後，我認為可以將程式嘗試改為非遞迴版本，在作業一中使用推疊非遞迴快速排序法，因此我想要嘗試以這個方式實現。實作成非遞迴的過程比我想像的複雜一些，總之我使用了推疊，將原本需要遞迴的部份全部都使用堆疊作儲存，希望以這個方式作實作可以改善記憶體的使用。 >程式碼： [Iterative approach to construct a tree](https://github.com/SHChang-Anderson/lab1/blob/main/nonrecurtree.c) 我參考了 @csotaku0926 的檢測程式碼 [github](https://github.com/csotaku0926/Linux_Hw2_Lab/tree/main/2-1) 建立不同深度的 binary tree ，用於實驗我的非遞迴程式記憶體使用量。實驗：建立深度為 10 的 binary tree: 遞迴的實作中總堆積消耗了 79.9 Kib ![2024-03-11 14-55-18](https://hackmd.io/_uploads/ByZlj73aT.png) 我實作的非遞迴版本總堆積消耗了 71.9 Kib ![2024-03-11 14-55-02](https://hackmd.io/_uploads/HJ1-i7nTT.png) 建立深度為 15 的 binary tree: 遞迴的實作中總堆積消耗了 2.5 Mib ![2024-03-11 14-57-20](https://hackmd.io/_uploads/r17wj72aa.png) 非遞迴的實作中總堆積消耗了 2.2 Mib ![螢幕快照 2024-03-11 15-00-13](https://hackmd.io/_uploads/ry0WhmhaT.png) 實驗結果驗證了使用非遞迴減少記憶體使用量的有效性。 ### 在 Linux 核心的 cgroups 相關原始程式碼，找出 pre-order walk 程式碼並探討找到位於 [/kernel/cgroup/cgroup.c](https://github.com/torvalds/linux/blob/master/kernel/cgroup/cgroup.c) 中進行 pre-order traversal 的程式碼。 ```c struct cgroup_subsys_state * css_next_descendant_pre(struct cgroup_subsys_state *pos, struct cgroup_subsys_state *root) { struct cgroup_subsys_state *next; cgroup_assert_mutex_or_rcu_locked(); /* if first iteration, visit @root */ if (!pos) return root; /* visit the first child if exists */ next = css_next_child(NULL, pos); if (next) return next; /* no child, visit my or the closest ancestor's next sibling */ while (pos != root) { next = css_next_child(pos, pos->parent); if (next) return next; pos = pos->parent; } return NULL; } ``` 這個函式接受了 `pos` 以及 `root` 兩個節點分別代表了目前要訪問的節點以及需要走訪後代的根節點。接著針對 `pos` 作條件判斷，若 `pos == NULL` 說明是第一次走訪，則直接訪問其後代根節點。若 `pos` 存在會先嘗試走訪子節點，若子節點存在，則回傳子節點，反之會沿著父節點的方向向上尋找，嘗試找到目前節點或最近 `parent` 的下一個兄弟節點( `sibling` )。如果找到了，就返回該兄弟節點。最終回到節點完成走訪實現了 `pre-order traversal` 。 ### 測驗 2 >延伸問題: >- [ ] 解釋上述程式碼的運作，撰寫完整的測試程式，指出其中可改進之處並實作 >- [ ] 在 Linux 核心找出 LRU 相關程式碼並探討 ### Least Recently Used (LRU) C 程式實作理解針對以下四個函式進行了解: * `lRUCacheCreate(int capacity)` : 建立一個新的 LRU 快取，同時初始化了: * 雙向鏈結串列 `dhead`，用於儲存 LRU 快取中的節點，並按最近使用的順序排列。 * 一個堆疊 `hhead`，用於快速查找給定鍵對應的節點。 `2 * sizeof(int)` 的原因是為了在 LRUCache 結構體中預留空間存放兩個整數成員變數: * capacity - 快取的最大容量 * count - 當前快取中已存放的節點數量 ```graphviz digraph so { rankdir=TB ; Array [ shape = record, label = "{ 0 | 1 | 2 | 3 |4 }"] ; hheads [shape= "none"] hheads -> Array dhead [label="dhead";shape="none"] 01 [label="";shape= "s"] 11 [label="";shape= "s"] 21 [label="";shape= "s"] 31 [label="";shape= "s"] 41 [label="";shape= "s"] dhead -> 01 -> 11 -> 21 -> 31 -> 41 41 -> 31 -> 21 -> 11 -> 01 } ``` * `lRUCacheFree(LRUCache *obj)` : 釋放之前分配的記憶體並清除 LRU 快取。 ```c void lRUCacheFree(LRUCache *obj) { struct list_head *pos, *n; list_for_each_safe (pos, n, &obj->dhead) { LRUNode *cache = list_entry(pos, LRUNode, FFFF); list_del(GGGG); free(cache); } free(obj); } ``` 可以觀察到程式碼使用使用 `list_for_each_safe` 巨集走訪雙向鏈結串列 obj->dhead，並逐一地將其節點刪除，需要注意的是 `LRUNode *cache = list_entry(pos, LRUNode, FFFF);` 使用 `list_entry` 找到 `LRUNode` 結構體的位置並進行刪除，而 `FFFF` 應該替換為 `LRUNode` 結構體中用於鏈結串列的成員變數名稱，因此 `FFFF` 應填入 `link` 。而 `list_del(GGGG)` 用於從雙向鏈結串列中刪除節點。`GGGG` 應該替換為 `LRUNode` 結構體中用於鏈結串列的成員變數名稱 `&cache->link` 。 * `lRUCacheGet(LRUCache *obj, int key)` ```c int lRUCacheGet(LRUCache *obj, int key) { int hash = key % obj->capacity; struct hlist_node *pos; hlist_for_each (pos, &obj->hhead[hash]) { LRUNode *cache = list_entry(pos, LRUNode, HHHH); if (cache->key == key) { list_move(IIII, &obj->dhead); return cache->value; } } return -1; } ``` 在 `lRUCacheGet(LRUCache *obj, int key)` 中，在 `cache` 找到特定鍵值節點並回傳 `value` ，`LRUNode *cache = list_entry(pos, LRUNode, HHHH);` ，目的是找到走訪的節點指標入口， `HHHH` 應替換為 `node` 。 `list_move(IIII, &obj->dhead);` 將節點移動到 `dhead` 開頭 (表示最近使用)，`IIII` 應替換為 `&cache->link`。 * `lRUCachePut(LRUCache *obj, int key, int value)` ```c void lRUCachePut(LRUCache *obj, int key, int value) { LRUNode *cache = NULL; int hash = key % obj->capacity; struct hlist_node *pos; hlist_for_each (pos, &obj->hhead[hash]) { LRUNode *c = list_entry(pos, LRUNode, JJJJ); if (c->key == key) { list_move(KKKK, &obj->dhead); cache = c; } } if (!cache) { if (obj->count == obj->capacity) { cache = list_last_entry(&obj->dhead, LRUNode, link); list_move(&cache->link, &obj->dhead); hlist_del(&cache->node); hlist_add_head(&cache->node, &obj->hhead[hash]); } else { cache = malloc(sizeof(LRUNode)); hlist_add_head(&cache->node, &obj->hhead[hash]); list_add(&cache->link, &obj->dhead); obj->count++; } cache->key = key; } cache->value = value; } ``` 觀察以上程式碼可以看出將節點放入快取分成三種情況作探討: * 快取已存在相同鍵的節點 : * 首先走訪堆疊中對應鍵值的鏈結串列，查找是否已存在相同鍵的節點。 * 如果找到了相同鍵的節點 c ，將其從雙向鏈結串列 dhead 中移動到開頭。 * 將找到的節點 c 賦值給 cache，準備更新其值。 * 快取已滿，需要先移除最久未使用的節點 : * 取得雙向鏈結串列 `dhead` 尾部的節點 (最久未使用的節點)。 * 將最久未使用的節點從 `dhead` 移動到開頭。 * 從`hhead` 堆疊中刪除該最久未使用節點的串列節點。 * 將當前要插入的新節點 `cache` 的節點插入到 `hhead` 串列堆疊的對應位置。 * 快取未滿，直接插入新節點 : * 將新節點插入到的 `hhead` 堆疊的對應串列位置。 * 將新節點插入到雙向鏈結串列 `dhead` 的開頭 (表示最近使用)。 * 增加 `obj->count`。 * 更新新節點的 `key`。 `LRUNode *c = list_entry(pos, LRUNode, JJJJ);` 找到 hhead 對應鍵值的鏈結串列， `JJJJ` 應替換為 `node` 才能將節點位置放入 `*c`。 `list_move(KKKK, &obj->dhead);` 程式碼實現了如果找到了相同鍵的節點 c ，將其從雙向鏈結串列 dhead 中移動到開頭，因此 `KKKK` 應填入 `&c->link` 。圖解 : **Case 1: 快取已存在相同鍵的節點** * 從雙向鏈結串列 dhead 中移動到開頭 ```graphviz digraph so { rankdir=TB; 5 [shape="s"] {rank=same key 5 } key -> 5 [style=invis] hheads112 [label="hheads[0]";shape= "none"] hheads11 [label="hheads[1]";shape= "none"] hheads211 [label="hheads[2]";shape= "none"] hheads311 [label="hheads[3]";shape= "none"] 55 [label="5";shape="s";color=red] 1010 [label="10";shape="S";color=red] hheads112 -> 55 -> 1010 11111 [label="1";shape="s";color=blue] hheads11 -> 11111 121212 [label="12";shape="s";color=green] hheads211 -> 121212 131313 [label="3";shape="s"] hheads311 -> 131313 key [label="key:";shape="none"] } ``` ```graphviz digraph { hheads [label="hheads[1]";shape= "none"] hheads1 [label="hheads[0]";shape= "none"] hheads21 [label="hheads[3]";shape= "none"] hheads31 [label="hheads[2]";shape= "none"] hheads1 -> 11 hheads -> 01 hheads21 -> 21 hheads31 -> 41 {rank=same dhead 11 21 31 41 01 } dhead [label="dhead";shape="none"] 01 [label="1";shape= "s";color=blue] 11 [label="5";shape= "s";color=red] 21 [label="3";shape= "s"] 31 [label="10";shape= "s";color=red] 41 [label="12";shape= "s";color=green] dhead -> 01 -> 11 -> 21 -> 31 -> 41 41 -> 31 -> 21 -> 11 -> 01 } ``` ------ ```graphviz digraph so { rankdir=TB; key [label="key:";shape="none"] 5 [shape="s";color=red] {rank=same key 5 } key -> 5 [style=invis] hheads112 [label="hheads[0]";shape= "none"] hheads11 [label="hheads[1]";shape= "none"] hheads211 [label="hheads[2]";shape= "none"] hheads311 [label="hheads[3]";shape= "none"] 55 [label="5";shape="s";color=red] 1010 [label="10";shape="S";color=red] hheads112 -> 55 -> 1010 11111 [label="1";shape="s"] hheads11 -> 11111 121212 [label="12";shape="s"] hheads211 -> 121212 131313 [label="3";shape="s"] hheads311 -> 131313 } ``` ```graphviz digraph so { rankdir=TB; hheads [label="hheads[1]";shape= "none"] hheads1 [label="hheads[0]";shape= "none"] hheads21 [label="hheads[3]";shape= "none"] hheads31 [label="hheads[2]";shape= "none"] hheads1 -> 11 hheads -> 01 hheads21 -> 21 hheads31 -> 41 {rank=same dhead 11 21 31 41 01 } dhead [label="dhead";shape="none"] 01 [label="5";shape= "s";color=red] 11 [label="1";shape= "s"] 21 [label="3";shape= "s"] 31 [label="10";shape= "s";color=red] 41 [label="12";shape= "s"] dhead -> 01 -> 11 -> 21 -> 31 -> 41 41 -> 31 -> 21 -> 11 -> 01 } ``` ----- **Case 2: 快取已滿，移除最久未使用的節點** ```graphviz digraph so { rankdir=TB; 4 [shape="s"] {rank=same key 4 } key -> 4 [style=invis] hheads112 [label="hheads[0]";shape= "none"] hheads11 [label="hheads[1]";shape= "none"] hheads211 [label="hheads[2]";shape= "none"] hheads311 [label="hheads[3]";shape= "none"] 55 [label="5";shape="s";color=red] 1010 [label="10";shape="S";color=red] hheads112 -> 55 -> 1010 11111 [label="1";shape="s";color=blue] hheads11 -> 11111 121212 [label="12";shape="s";color=green] hheads211 -> 121212 131313 [label="3";shape="s"] hheads311 -> 131313 key [label="key:";shape="none"] } ``` ```graphviz digraph so { rankdir=TB; hheads [label="hheads[1]";shape= "none"] hheads1 [label="hheads[0]";shape= "none"] hheads21 [label="hheads[3]";shape= "none"] hheads31 [label="hheads[2]";shape= "none"] hheads1 -> 11 hheads -> 01 hheads21 -> 21 hheads31 -> 41 {rank=same dhead 11 21 31 41 01 } dhead [label="dhead";shape="none"] 01 [label="1";shape= "s";color=blue] 11 [label="5";shape= "s";color=red] 21 [label="3";shape= "s"] 31 [label="10";shape= "s";color=red] 41 [label="12";shape= "s";color=green] dhead -> 01 -> 11 -> 21 -> 31 -> 41 41 -> 31 -> 21 -> 11 -> 01 } ``` ----- ```graphviz digraph so { rankdir=TB; 4 [shape="s"] {rank=same key 4 } key -> 4 [style=invis] hheads112 [label="hheads[0]";shape= "none"] hheads11 [label="hheads[1]";shape= "none"] hheads211 [label="hheads[2]";shape= "none"] hheads311 [label="hheads[3]";shape= "none"] 55 [label="5";shape="s"] 1010 [label="10";shape="S"] hheads112 -> 55 -> 1010 11111 [label="1";shape="s"] hheads11 -> 11111 121212 [label="12";shape="s";color=red] hheads211 -> 121212 131313 [label="3";shape="s"] hheads311 -> 131313 key [label="key:";shape="none"] } ``` ```graphviz digraph so { rankdir=TB; hheads [label="hheads[2]";shape= "none"] hheads1 [label="hheads[1]";shape= "none"] hheads21 [label="hheads[0]";shape= "none"] hheads31 [label="hheads[3]";shape= "none"] hheads1 -> 11 hheads -> 01 hheads21 -> 21 hheads31 -> 31 {rank=same dhead 11 21 31 41 01 } dhead [label="dhead";shape="none"] 01 [label="12";shape= "s";color=red] 11 [label="1";shape= "s"] 21 [label="5";shape= "s"] 31 [label="3";shape= "s"] 41 [label="10";shape= "s"] dhead -> 01 -> 11 -> 21 -> 31 -> 41 41 -> 31 -> 21 -> 11 -> 01 } ``` ------ ```graphviz digraph so { rankdir=TB; key [label="key:";shape="none"] 4 [shape="s";color=red] {rank=same key 4 } key -> 4 [style=invis] hheads112 [label="hheads[0]";shape= "none"] hheads11 [label="hheads[1]";shape= "none"] hheads211 [label="hheads[2]";shape= "none"] hheads311 [label="hheads[3]";shape= "none"] hheads411 [label="hheads[4]";shape= "none"] 55 [label="5";shape="s"] 1010 [label="10";shape="S"] hheads112 -> 55 -> 1010 11111 [label="1";shape="s"] hheads11 -> 11111 121212 [label="";shape="s";color=red] hheads211 -> 121212 131313 [label="3";shape="s"] hheads311 -> 131313 141414 [label="4";shape="s";color=red] hheads411 -> 141414 } ``` ```graphviz digraph so { rankdir=TB; {rank=same hheads } hheads [label="hheads[4]";shape= "none"] hheads1 [label="hheads[1]";shape= "none"] hheads21 [label="hheads[0]";shape= "none"] hheads31 [label="hheads[3]";shape= "none"] hheads1 -> 11 hheads -> 01 hheads21 -> 21 hheads31 -> 31 {rank=same dhead 11 21 31 41 01 } dhead [label="dhead";shape="none"] 01 [label="4";shape= "s";color=red] 11 [label="1";shape= "s"] 21 [label="5";shape= "s"] 31 [label="3";shape= "s"] 41 [label="10";shape= "s"] dhead -> 01 -> 11 -> 21 -> 31 -> 41 41 -> 31 -> 21 -> 11 -> 01 } ``` ------------------ **Case 3: 快取未滿，直接插入新節點** ```graphviz digraph so { rankdir=TB; key [label="key:";shape="none"] 4 [shape="s"] {rank=same key 4 } key -> 4 [style=invis] hheads112 [label="hheads[0]";shape= "none"] hheads11 [label="hheads[1]";shape= "none"] hheads311 [label="hheads[3]";shape= "none"] 55 [label="5";shape="s";color=red] 1010 [label="10";shape="S";color=red] hheads112 -> 55 -> 1010 11111 [label="1";shape="s";color=blue] hheads11 -> 11111 131313 [label="3";shape="s"] hheads311 -> 131313 key [label="key:";shape="none"] 4 [shape="s"] {rank=same key 4 } key -> 4 [style=invis] } ``` ```graphviz digraph so { rankdir=TB; hheads [label="hheads[1]";shape= "none"] hheads1 [label="hheads[0]";shape= "none"] hheads21 [label="hheads[3]";shape= "none"] hheads1 -> 11 hheads -> 01 hheads21 -> 21 {rank=same dhead 11 21 31 41 01 } dhead [label="dhead";shape="none"] 01 [label="1";shape= "s";color=blue] 11 [label="5";shape= "s";color=red] 21 [label="3";shape= "s"] 31 [label="10";shape= "s";color=red] 41 [label="";shape= "s"] dhead -> 01 -> 11 -> 21 -> 31 -> 41 41 -> 31 -> 21 -> 11 -> 01 } ``` ------- ```graphviz digraph so { rankdir=TB; key [label="key:";shape="none"] 4 [shape="s";color=red] {rank=same key 4 } key -> 4 [style=invis] hheads112 [label="hheads[0]";shape= "none"] hheads11 [label="hheads[1]";shape= "none"] hheads311 [label="hheads[3]";shape= "none"] hheads314 [label="hheads[4]";shape= "none"] 554 [label="4";shape="s";color=red] hheads314 -> 554 55 [label="5";shape="s"] 1010 [label="10";shape="S"] hheads112 -> 55 -> 1010 11111 [label="1";shape="s"] hheads11 -> 11111 131313 [label="3";shape="s"] hheads311 -> 131313 } ``` ```graphviz digraph so { rankdir=TB; hheads [label="hheads[4]";shape= "none"] hheads1 [label="hheads[0]";shape= "none"] hheads21 [label="hheads[3]";shape= "none"] hheads01101 [label="hheads[1]";shape= "none"] hheads1 -> 21 hheads01101 -> 11 hheads -> 01 hheads21 -> 31 {rank=same dhead 11 21 31 41 01 } dhead [label="dhead";shape="none"] 01 [label="4";shape= "s";color=red] 11 [label="1";shape= "s"] 21 [label="5";shape= "s"] 31 [label="3";shape= "s"] 41 [label="10";shape= "s"] dhead -> 01 -> 11 -> 21 -> 31 -> 41 41 -> 31 -> 21 -> 11 -> 01 } ``` ------- ### 指出其中可改進之處並實作在現有的實現中，當查找到一個已存在的鍵值節點時，該節點會被移動到雙向鏈結串列 `dhead` 的開頭，以表示它是最近使用過的節點。然而，在 `lRUCacheGet` 函式中，查找操作是從 `hhead` 對應鍵值的串列中進行的，而沒有利用到將節點移動到 dhead 開頭的優勢。因此，我認為在將節點移動到 `dhead` 開頭的同時，也將該節點移動到散列表 `hhead` 對應鏈表的開頭位置。這樣做可能會帶來更好的效果，因為不僅維護了最近使用順序，同時也可以加快後續對同一鍵值的查找速度。 ### 在 Linux 核心找出 LRU 相關程式碼並探討 ### 測驗 3 >延伸問題 >- [ ] 解釋上述程式碼的運作原理 >- [ ] 在 Linux 核心原始程式碼找出 find_nth_bit 的應用案例，並予以解說和分析，需要涵蓋 CPU affinity 相關的原始程式碼。 ### `find_nth_bit` 程式碼理解 ```c static inline unsigned long find_nth_bit(const unsigned long *addr, unsigned long size, unsigned long n) { if (n >= size) return size; if (small_const_nbits(size)) { unsigned long val = *addr & GENMASK(size - 1, 0); return val ? fns(val, n) : size; } return FIND_NTH_BIT(addr[idx], size, n); } ``` 首先觀察 `find_nth_bit` 所傳入的參數 : * `unsigned long *addr` 指向位元組的指標。 * `unsigned long size` 最大需要被搜尋的位元數。 * `unsigned long n` 找到第 n 個被設為 1 的位數。從傳入的參數大致可以猜想此函式的目的式找到第 n 個為 1 位數的引數 (index)。首先 n 是否超過位元組的大小範圍。接下來，深入了解 `small_const_nbits` 的作用。這個巨集用來確認一個給定的位數 `nbits` 是否在編譯時就已經確定，並且這個位數要小於等於 `BITS_PER_LONG` 同時大於 0。 ```c #define small_const_nbits(nbits) \ (__builtin_constant_p(nbits) && (nbits) <= BITS_PER_LONG && (nbits) > 0) ``` 以上巨集用來判斷 **nbits 是否在 `BITS_PER_LONG` 給定的位元長度的範圍內** (沒有跨越第一個字節的範圍)。另外我參考了 [linux2021-quiz2#測驗-3](https://hackmd.io/@sysprog/linux2021-quiz2#%E6%B8%AC%E9%A9%97-3) 對於 `__builtin_constant_p` 的解釋 : >__builtin_constant_p() 編譯器優化: 此 GCC 內建函式判斷其參數值是否在編譯時期就已經知道。如果是，則回傳 1 (代表編譯器可做 constant folding)。因此我了解到，如果編譯器能夠在編譯時就知道某個參數的值，則 `__builtin_constant_p()` 回傳 1 若位元組的大小是在給定的位元長度的範圍內，則利用 `GENMASK` 來生成一個 `mask` ，以便保留低位的位元值。以上這個條件判斷的主要目的是如果 `size` 足夠小，那麼可以使用更簡單、更快速的位元操作來找到第 n 個 1 位元，而不需要調用更複雜的 `FIND_NTH_BIT` 巨集。 `GENMASK` : ```c #define GENMASK(h, l) \ (((~0UL) - (1UL << (l)) + 1) & (~0UL >> (BITS_PER_LONG - 1 - (h)))) ``` `GENMASK` 是一個產生 `bitmask` 的巨集，具體實現步驟如下 : 假設 `GENMASK(h, l)` 傳入參數 `h=13` `l=3` `GENMASK(13, 3)`，而 ` BITS_PER_LONG` 設為 32。 `~0UL` 產生一個所有位元都是 1 的數字。 `~0UL` : ![image](https://hackmd.io/_uploads/Hyw48sa6a.png) 而 `(1UL << (l))` 將 1 向左移動 l 位數，我們假設傳入的 `l = 3` 。 `(1UL << (l))` : ![image](https://hackmd.io/_uploads/S19a8jpTT.png) 若將 `((~0UL)` - `(1UL << (l)) ` + `1` 會產生一個從第 l 位置最高位皆為 1 的數字。 `((~0UL) - (1UL << (l)) + 1` : ![image](https://hackmd.io/_uploads/ByCf_i66T.png) ![image](https://hackmd.io/_uploads/HyeUujT6T.png) 接著再將 `~0UL` 向右移 `(BITS_PER_LONG - 1 - (h))` 個位數，在與上述計算出來的數字做 `&` 即可得到一個從 3 至 13 位的 `bitmask` 。 ![image](https://hackmd.io/_uploads/BkxRKoa66.png) 將 `*addr` 與 `mask` 進行位元 AND 運算後，會保留 addr 指向值的低 size 位，而高於 size 的位元會被清為零。最後，使用 fns 函數來尋找第 n 個設為 1 的位元的位置。再來我們繼續探究 `fns` 這個函式。 ```c static inline unsigned long fns(unsigned long word, unsigned int n) { while (word) { unsigned int bit = __ffs(word); if (n-- == 0) return bit; __clear_bit(bit, &word); } return BITS_PER_LONG; } ``` `fns` 主要目標為找到第 n 個被設為 1 的位置，在程式碼中使用了迴圈每一次迴圈不斷地減少 n 值，直到找到第 n 個 1 為止，值得注意的是，找 1 位置的方法使用了 `__ffs` 這樣的函式 (關於 `ffs` 與 `__ffs` 的差異可以參考 ( [ffs 及 __ffs 加雙底線與否的不同](https://hackmd.io/@sysprog/ByS9w_lHh#ffs-%E5%8F%8A-__ffs-%E5%8A%A0%E9%9B%99%E5%BA%95%E7%B7%9A%E8%88%87%E5%90%A6%E7%9A%84%E4%B8%8D%E5%90%8C) ) ，而 `__ffs` 函式使用一個類似二分搜尋的方式高效率的定位到 `word` 中的最低位 1 的位置，使得不需要逐位逐一檢查。 ```c static inline unsigned long __ffs(unsigned long word) { int num = 0; #if BITS_PER_LONG == 64 if ((word & AAAA) == 0) { num += 32; word >>= 32; } #endif if ((word & 0xffff) == 0) { num += 16; word >>= 16; } if ((word & 0xff) == 0) { num += 8; word >>= 8; } if ((word & 0xf) == 0) { num += 4; word >>= 4; } if ((word & 0x3) == 0) { num += 2; word >>= 2; } if ((word & 0x1) == 0) num += 1; return num; } ``` 這段程式碼中，每一個 `if` 條件判斷式會檢查目前 `word` 一半的位元，逐步的縮小找到最低位 1 的範圍。在第一次的條件判斷中 `if ((word & AAAA) == 0)` 需要檢查最低 32 位元的值是否皆為零，因此 `AAAA` 應該替換為 `0xffffffff` 。可以特別觀察的是在 `fns` 迴圈中每一次找到最低位為 1 的位元位置後會呼叫 ` __clear_bit(bit, &word);` 因此我們可以特別探究這個函式。 ```c static inline void __clear_bit(unsigned long nr, volatile unsigned long *addr) { unsigned long mask = BIT_MASK(nr); unsigned long *p = ((unsigned long *) addr) + BIT_WORD(nr); *p &= BBBB; } ``` `__clear_bit` 主要功能是用於清除一個特定位元的值，主要的操作如下: * 首先透過 `BIT_MASK` 巨集產生一個只有在第 `nr` 位是 1 其餘都是 0 的 `mask` 。 * 再使用 `BIT_WORD(nr)` 找到 `nr` 所在的 `unsigned long` 變數位置索引。 * 最後將關鍵的位元做清除，主要的清楚方式是透過與 `BIT_MASK` 所產生的數字做 `AND` 然而這樣的結果是保留 `nr` 這個位元，因此需要做 ~mask 使得 `nr` 得以被刪除保留其他位元。而也因為這個操作可以了解到 `*p &= BBBB;` 應改為 `*p &= ~mask;` ![image](https://hackmd.io/_uploads/HJO6MAaaa.png) ![image](https://hackmd.io/_uploads/rJb-Q0aaa.png) 最後我們看到 `FIND_NTH_BIT` 這個巨集 ```c #define FIND_NTH_BIT(FETCH, size, num) \ ({ \ unsigned long sz = (size), nr = (num), idx, w, tmp; \ \ for (idx = 0; (idx + 1) * BITS_PER_LONG <= sz; idx++) { \ if (idx * BITS_PER_LONG + nr >= sz) \ goto out; \ \ tmp = (FETCH); \ w = hweight_long(tmp); \ if (w > nr) \ goto found; \ \ nr -= w; \ } \ \ if (sz CCCC BITS_PER_LONG) \ tmp = (FETCH) & BITMAP_LAST_WORD_MASK(sz); \ found: \ sz = min(idx * BITS_PER_LONG + fns(tmp, nr), sz); \ out: \ sz; \ }) ``` 在這個巨集中 `addr[idx]` 被替換為 `FETCH` ，巨集使用 `idx` 值得變化來持續走訪並找到特定的位元。 `FIND_NTH_BIT` 能夠搜尋超出單個 `BITS_PER_LONG` 長度的範圍。如果要查找的位元不在目前處理的字節中，迴圈將繼續在下一個 `BITS_PER_LONG` 大小的區塊中查找，直到找到該位為止。值得注意的是 `tmp = (FETCH) & BITMAP_LAST_WORD_MASK(sz);` 當搜尋到最後一個字節，且 `size` 不被 `BITS_PER_LONG` 整除時，應避免找到超出 `size` 範圍的字元，因此使用**取餘數**的方式找到最後一個字節所需要搜尋的字元數量，因此 `CCCC` 應替換為 `%` 。