linux2024-homework2

# 2024q1 Homework2 (quiz1+2) contributed by < `millaker` > ## 第一週測驗題 ### 測驗一 #### 程式運作原理題目中的 `quicksort()` 參考 [Optimized QuickSort — C Implementation (Non-Recursive)](https://alienryderflex.com/quicksort/) 文章內的方法，運用堆疊避免遞迴函式呼叫，但兩者交換元素的方法不同。文章內的做法只要找到一組比 `pivot` 大和比 `pivot` 小的元素就會將兩者交換並進行下一輪比較直到 `L >= R`。題目中則是依序走訪每一個節點並和 `pivot` 比較，如果大於就加入 `left` 鏈結串列，反之則加入 `right`。以下為一個簡單的範例: 將 `pivot` 設為第一個節點，並從 `pivot` 下一個節點開始走訪和比較。將 `left` 和 `right` 重設為空的鏈結串列。 ```graphviz digraph G { node[shape=plaintext]; pivot; left; right; p; node[shape=circle]; ele1[label=3]; ele2[label=2]; ele3[label=5]; ele4[label=1]; ele5[label=4]; { rank="same" ele1->ele2->ele3->ele4->ele5; } pivot->ele1; p->ele2; } ``` 因為 `p` 指向的值 2 小於 `pivot` 的 3，所以將 `p` 指向的節點加入 `left` 接著 `p` 指向下一個節點。 ```graphviz digraph G { node[shape=plaintext]; pivot; left; right; p; node[shape=circle]; ele1[label=3]; ele2[label=2]; ele3[label=5]; ele4[label=1]; ele5[label=4]; { rank="same" ele1->ele3->ele4->ele5; } pivot->ele1; p->ele3; left->ele2; } ``` 因為 `p` 指向的值 5 大於 `pivot` 的 3，所以將 `p` 指向的節點加入 `right` 接著 `p` 指向下一個節點。持續此操作直到整個串列走訪完的結果會像下圖 ```graphviz digraph G { node[shape=plaintext]; pivot; left; right; node[shape=circle]; ele1[label=3]; ele2[label=2]; ele3[label=5]; ele4[label=1]; ele5[label=4]; { rank="same" } pivot->ele1; left->ele2->ele4; right->ele3->ele5; } ``` 接著要更新堆疊讓程式可以繼續排列 `left` 子串列和 `right` 子串列。 ```c begin[i] = left; end[i] = list_tail(&left); begin[i + 1] = pivot; end[i + 1] = pivot; begin[i + 2] = right; end[i + 2] = list_tail(&right); ``` 利用上方程式碼，依序將 `left`, `pivot`, `right` 的頭尾指派給 `begin` 和 `end`，也就是下一個循環要排序的串列頭尾。為什麼需要依照這個順序加入堆疊? 因為操作是從堆疊的最上方開始往下做排序，並在 `L==R` 時將當節點用 `list_add()` (將節點加入串列前端) 加入 `result` 串列，如此才能保持 `left` < `pivot` < `right` 的順序。 #### 改進實作 > commit [`a89c12e`](https://github.com/millaker/linux2024-quiz/commit/a89c12ef7095ae368f0fa6c3ce14268d74d0b601) 原本參考文章內是對陣列進行排序才需要 `begin` 和 `end` 標示要被排序的陣列頭尾。題目中程式碼是對鏈結串列進行排序，串列的尾端不需要特別標示就是 `NULL`，因此可以將 `end[]` 刪除省去一半額外記憶體空間使用。另外，原來為了找出串列尾端並指派給 `end` 的 `list_tail()` 呼叫可以統一在迴圈開始時找出尾端指派給 `R` 即可。 ```diff int max_level = 2 * n; - node_t *begin[max_level], *end[max_level]; + node_t *begin[max_level]; ... //while loop - node_t *L = begin[i], *R = end[i]; + node_t *L = begin[i], *R = list_tail(&L); ... //after comparing + pivot->next = NULL; begin[i] = left; - end[i] = list_tail(&left); begin[i + 1] = pivot; - end[i + 1] = pivot; begin[i + 2] = right; - end[i + 2] = list_tail(&right); ``` 利用 `callgrind` 比較分析使用的指令數: ``` $ valgrind --tool=callgrind --dump-instr=yes ./a.out ``` ``` //original method 44,007,794 (28.92%) 1.c:quicksort [/home/jacob/Documents/linux2024/quiz/1_1/a.out] 31,063,965 (20.41%) 1.c:list_add [/home/jacob/Documents/linux2024/quiz/1_1/a.out] 27,853,352 (18.30%) 1.c:list_tail [/home/jacob/Documents/linux2024/quiz/1_1/a.out] ``` ``` //Modified method 42,674,080 (28.07%) 1.c:quicksort_mod [/home/jacob/Documents/linux2024/quiz/1_1/a.out] 31,063,965 (20.43%) 1.c:list_add [/home/jacob/Documents/linux2024/quiz/1_1/a.out] 29,051,900 (19.11%) 1.c:list_tail [/home/jacob/Documents/linux2024/quiz/1_1/a.out] ``` 利用 `cachegrind` 比較分析較早呼叫 `list_tail()` 對快取記憶體的影響: ``` $ valgrind --tool=cachegrind ``` ``` //Original method ==4078140== ==4078140== I refs: 152,164,356 ==4078140== I1 misses: 1,414 ==4078140== LLi misses: 1,392 ==4078140== I1 miss rate: 0.00% ==4078140== LLi miss rate: 0.00% ==4078140== ==4078140== D refs: 90,476,943 (63,525,439 rd + 26,951,504 wr) ==4078140== D1 misses: 2,617,443 ( 2,531,954 rd + 85,489 wr) ==4078140== LLd misses: 84,023 ( 2,162 rd + 81,861 wr) ==4078140== D1 miss rate: 2.9% ( 4.0% + 0.3% ) ==4078140== LLd miss rate: 0.1% ( 0.0% + 0.3% ) ==4078140== ==4078140== LL refs: 2,618,857 ( 2,533,368 rd + 85,489 wr) ==4078140== LL misses: 85,415 ( 3,554 rd + 81,861 wr) ==4078140== LL miss rate: 0.0% ( 0.0% + 0.3% ) ``` ``` //Modified version Time: 201335 ==4078118== ==4078118== I refs: 152,029,191 ==4078118== I1 misses: 1,410 ==4078118== LLi misses: 1,387 ==4078118== I1 miss rate: 0.00% ==4078118== LLi miss rate: 0.00% ==4078118== ==4078118== D refs: 90,143,609 (63,125,527 rd + 27,018,082 wr) ==4078118== D1 misses: 2,566,097 ( 2,481,643 rd + 84,454 wr) ==4078118== LLd misses: 83,617 ( 1,763 rd + 81,854 wr) ==4078118== D1 miss rate: 2.8% ( 3.9% + 0.3% ) ==4078118== LLd miss rate: 0.1% ( 0.0% + 0.3% ) ==4078118== ==4078118== LL refs: 2,567,507 ( 2,483,053 rd + 84,454 wr) ==4078118== LL misses: 85,004 ( 3,150 rd + 81,854 wr) ==4078118== LL miss rate: 0.0% ( 0.0% + 0.3% ) ``` 從上方數據可以看到，修改過後的版本 `D1 misses` 較少，這也在預期中，因為不再需要存取 `end` 陣列。而比較次數因為沒有更動到 `pivot` 的選擇和其他算法所以不會有改變。 #### 以 Linux list API 改寫 > [commit 981f5f7](https://github.com/millaker/linux2024-quiz/commit/981f5f7de3398b46b41bcd372b1247acef058f2c) `quicksort_it_tail()` 為實作的第一版，參照 Linux `list_sort()` 將雙向環狀串列轉成單向方便操作，比較方法和堆疊操作都和上方 `quicksort()` 一樣。 `quicksort()` 後來參考 [fennecj](https://hackmd.io/@fennecJ/linux2024-q1q2) 和串列 Timsort 當中使用閒置的 `prev` 指標紀錄當前串列的尾端，可以省去 `list_tail()` 每次走訪整條串列的操作，粗略使用 `clock()` 估計運算時間大約可以加速 30%。 ```shell New: 12780 Old: 18481 Improve: 0.308479 ``` #### 將 `quicksort()` 整合進 lab0-c > 見 [homework1 - Port quicksort from quiz1](https://hackmd.io/KbgAPOOvT-uqnX9IbsA15g?both#Port-quicksort-from-quiz1) ### 測驗二題目提供 Timsort 的實作和 Linux `list_sort()` 一樣先將環狀雙向鏈結串列轉成沒有 header 的單向鏈結串列方便後續操作。接下來持續使用 `find_run()` 找出下一個符合規則的 run。 `find_run()` 會找出兩種情況 1. 正常的遞增串列 2. 相反的遞減串列遞減的串列會在尋找的過程中做串列反轉的操作以符合 `merge` 遞增的規則。回傳的 run 會被存在 `pair` 結構體中： ```c struct pair { struct list_head *head, *next; }; ``` `head` 會指向當前 run 的串列， `next` 會指向下一個 run 的第一個節點。因為 `prev` 指標已經沒有用處，所以將當前 run 長度存在 `head->next->prev` 中供後續使用，比較有疑問的是為什麼不是存在 `head->prev` ，需要閱讀更多程式碼才能解答。 ```c do { /* Find next run */ struct pair result = find_run(priv, list, cmp); result.head->prev = tp; tp = result.head; list = result.next; stk_size++; tp = merge_collapse(priv, cmp, tp); } while (list); ``` 上方程式碼為 Timsort 主要循環，每個循環利用 `find_run()` 在剩餘的串列中找出下一個 run並加入 `tp` 所指向串列的頭，到這邊就可以解答為什麼 `len` 需要使用 `head->next->prev` 來儲存，因為這裡和 Linux `list_sort()` 一樣使用 `prev` 鏈結串連現有的 run，畫成示意圖如下: ```graphviz digraph G { R1 -> R1_1 -> R1_2; R2 -> R2_1 -> R2_2 -> R2_3; R3 -> R3_1 -> R3_2 -> R3_3 -> R3_4; R1 -> R2[label = "prev"]; R2 -> R3[label = "prev"]; node[shape=none] Null; R3 -> Null[label="prev"]; {rank = same; R1; R2; R3; Null} } ``` 每一個循環會呼叫一次 `merge_collapse()` 利用 Timsort 的規則: 1. A + B <= C 2. A <= B 若不符合上述兩個規則，就會將 run 合併直到完全符合為止。其中合併的方法採用 Linux `list_sort()` 中使用的 `merge()` 也就是逐一比較合併串列，不是 Timsort 原先使用找地方插入點的方法。結束循環後使用 `merge_force_collapse()` 將堆疊中剩餘的 run 合併到數量只剩 2 個以下的情況，最後合併最後兩個 run 並重建 `prev` 鏈結完成排序。 ## 第二週測驗題 ### 測驗一本題使用 Linux 核心處理雜湊碰撞的 `hlist_node` ，以下是他的結構體 ```c struct hlist_node { struct hlist_node *next, **pprev; }; ``` 下圖參考 [Linux 核心的 Hash table 實作](https://hackmd.io/rpIi8FeqQ9eBJ-UTFgb--A?view) ```graphviz digraph G { rankdir=LR; node[shape=record]; map [label="hlist_head.first |<ht0> |<ht1> |<ht2> |<ht3> |<ht4> |<ht5> |<ht7> |<ht8> "]; node[shape=none] null1 [label=NULL] null2 [label=NULL] subgraph cluster_1 { style=filled; color=lightgrey; node [shape=record]; hn1 [label="hlist_node | {<prev>pprev | <next>next}"]; label="hash_key 1" } subgraph cluster_2 { style=filled; color=lightgrey; node [shape=record]; hn2 [label="hlist_node | {<prev>pprev | <next>next}"]; label="hash_key 2" } subgraph cluster_3 { style=filled; color=lightgrey; node [shape=record]; hn3 [label="hlist_node | {<prev>pprev | <next>next}"]; label="hash_key 3" } map:ht1 -> hn1 hn1:next -> hn2 hn2:next -> null1 hn2:prev:s -> hn1:next:s map:ht5 -> hn3 hn3:next -> null2 hn1:prev:s -> map:ht1 hn3:prev:s -> map:ht5 } ``` 其中 `next` 指向串列的下一個節點，而 `pprev` 為了避免特別情況使用指標的指標指向前一個節點的 `next` 指標。 `hlist_node` 的用法和 Linux 核心 `list` 一樣嵌入自定義的 `struct` 並用 `container_of` 巨集找到物件地址。回到題目，根據題目提示，`preorder` 陣列越靠前的元素就會越接近整個二元樹的頂端，而要畫出唯一的二元樹，需要 `inorder` 判斷元素間的相對位置。以 Leetcode 中的測試一舉例: ``` preorder = [3,9,20,15,7], inorder = [9,3,15,20,7] ``` ```graphviz digraph G { node[shape=none]; preorder; current_root; node[shape=box]; p1[label=3]; p2[label=9]; p3[label=20]; p4[label=15]; p5[label=7]; current_root->p1; { rank=same; preorder->p1->p2->p3->p4->p5; } } ``` ```graphviz digraph G { node[shape=none]; inorder; found[label="index found = 1"]; node[shape=box]; i1[label=9]; i2[label=3]; i3[label=15]; i4[label=20]; i5[label=7]; found->i2; { rank=same; inorder->i1->i2->i3->i4->i5; } } ``` 從 `preorder` 第一個元素開始建構二元樹，在 `inorder` 內找到該元素並分成左右繼續遞迴操作，而左右分別就是當前元素的左右子二元樹。此一做法需要知道 `preorder` 每個元素在 `inorder` 中的位置，題目中的程式便是使用雜湊表來儲存每個元素的位置，並在後續直接查詢即可。 ```c= static struct TreeNode *buildTree(int *preorder, int preorderSize, int *inorder, int inorderSize) { struct hlist_head *in_heads = malloc(inorderSize * sizeof(*in_heads)); for (int i = 0; i < inorderSize; i++) INIT_HLIST_HEAD(&in_heads[i]); for (int i = 0; i < inorderSize; i++) node_add(inorder[i], i, inorderSize, in_heads); return dfs(preorder, 0, preorderSize - 1, inorder, 0, inorderSize - 1, in_heads, inorderSize); } ``` `buildTree()` 第6行到第10行就是在建構雜湊表，並用 `dfs()` 進行建構二元數的操作。 ```c= static struct TreeNode *dfs(int *preorder, int pre_low, int pre_high, int *inorder, int in_low, int in_high, struct hlist_head *in_heads, int size) { if (in_low > in_high || pre_low > pre_high) return NULL; struct TreeNode *tn = malloc(sizeof(*tn)); tn->val = preorder[pre_low]; int idx = find(preorder[pre_low], size, in_heads); tn->left = dfs(preorder, pre_low + 1, pre_low + (idx - in_low), inorder, in_low, idx - 1, in_heads, size); tn->right = dfs(preorder, pre_high - (in_high - idx - 1), pre_high, inorder, idx + 1, in_high, in_heads, size); return tn; } ``` `dfs()` 內第15行從雜湊表獲取在 `inorder` 的位置，第16行和第18行進行遞迴呼叫，其中有兩段計算 `pre_high` 部分對我來說不太直觀， `left` 的`pre_high` 是 `pre_low + 1, pre_low + (idx - in_low)` ， `idx - in_low` 是在計算目標左邊有幾個元素，因此傳進遞迴呼叫的 `dfs` 便只需要傳數量相同的未處理元素即可。而 `right` 也是類似的概念， `pre_high - (in_high - idx - 1)` 計算目標的右邊有幾個元素，只需要傳相同數量的元素進行遞迴呼叫即可。 ```c static int find(int num, int size, const struct hlist_head *heads) { struct hlist_node *p; int hash = (num < 0 ? -num : num) % size; hlist_for_each (p, &heads[hash]) { struct order_node *on = list_entry(p, struct order_node, node); if (num == on->val) return on->idx; } return -1; } ``` `find()` 會算出 `num` 的雜湊值加入雜湊表對應位置，使用的雜湊算法是 `% size`。 #### 測試程式 > commit [aa436d9](https://github.com/millaker/linux2024-quiz/commit/aa436d9fcae320f74d3290ed90232f32c963203c) 我想到的測試方法是將建構的二元樹以前序和中序走訪分別和原來傳入的 `preorder` 和 `inorder` 做比較。 ```c static int curr = 0; void inorder(struct TreeNode *root, int *ans) { if (!root) return; inorder(root->left, ans); assert(ans[curr++] == root->val); inorder(root->right, ans); } void preorder(struct TreeNode *root, int *ans) { if (!root) return; assert(ans[curr++] == root->val); preorder(root->left, ans); preorder(root->right, ans); } ``` 以上方兩個函式驗證，使用 `static int curr` 紀錄當前比較的索引，驗證前需要重置為零。測試的項目為: 1. 題目提供的範例測資 2. 相同前序走訪，不同中序走訪 ``` A: B: ‾‾ ‾‾ 1 1 / / \ 2 2 3 / 3 ``` 兩者前序走訪都為 [1, 2, 3]，中序走訪為 [3, 2, 1] 和 [2, 1, 3] 3. 相同中序走訪，不同前序走訪 ``` A: B: ‾‾ ‾‾ 3 1 / / \ 1 2 3 / 2 ``` 兩者中序走訪都為 [2, 1, 3]，前序走訪為 [3, 1, 2] 和 [1, 2, 3] 如此可以在輸入 `inorder` 和 `preorder` 都正確的情況下，測試 `buildTree` 是否構建出唯一的二元樹。 #### 改進實作 > [commit 9d25ef9](https://github.com/millaker/linux2024-quiz/commit/9d25ef9e615b50d6bc362491af15f3c8e24f2b97) 要改進原來提供的 `buildTree()` 我第一個想到的是利用 [Optimized QuickSort — C Implementation (Non-Recursive)](https://alienryderflex.com/quicksort/) 的手法，將遞迴呼叫改用堆疊實作避免函式呼叫的成本以及可能溢出的系統堆疊。原實作方式每次呼叫 `dfs()` 都需要傳8個變數值，代價非常高。先定義堆疊儲存的內容，使用以下結構體: ```c struct info { int pbegin; int pend; int ibegin; int iend; struct TreeNode **parent; }; ``` 其中四個 `int` 就是原本遞迴呼叫的參數， `parent` 則是指標的指標，讓子串列知道需要把當前節點指派給誰。 ```c while (i > 0) { // if not legal, return NULL to parent if (stk[i].pbegin > stk[i].pend) { *(stk[i].parent) = NULL; i--; continue; } int val = preorder[stk[i].pbegin]; int idx = find(val, size, in_heads); struct TreeNode *temp = malloc(sizeof(*temp)); temp->val = val; *(stk[i].parent) = temp; // push left subtree stk[i].pbegin = stk[i].pbegin + 1; stk[i].pend = stk[i].pbegin + (idx - stk[i].ibegin); stk[i].iend = idx - 1; stk[i].parent = &temp->left; // push right subtree stk[i + 1].pbegin = stk[i].pend - (stk[i].iend - idx - 1); stk[i + 1].pend = stk[i].pend; stk[i + 1].ibegin = idx + 1; stk[i + 1].iend = stk[i].iend; stk[i + 1].parent = &temp->right; i += 1; } ``` 主要的操作迴圈如上方所示，依照四個步驟 1. 檢查是否還有 `preorder` 元素尚未被處理，如果沒有則指派 `NULL` 給母節點並移除堆疊頂端資料。 2. 如果當前 `preorder` 元素需要被處理，則配置一個新的節點並指派 `val`。 3. 呼叫 `find()` 找出該元素在 `inorder` 的位置並將兩個子樹加入堆疊上方。 4. 循環持續做到堆疊是空的為止這裡我的 `pbegin` `pend` `ibegin` `iend` 的計算方法，是參考原題目內遞回呼叫的方法，是否需要這麼多變數還需要額外思考。因為使用指標的指標，因此不需要檢查樹根節點的特殊情況，在進入操作迴圈前堆疊的首個元素把 `parent` 指向要回傳的 `res` 指標即可。 ```c struct TreeNode *res = NULL; stk[0].pbegin = 0; stk[0].pend = size - 1; stk[0].ibegin = 0; stk[0].iend = size - 1; stk[0].parent = &res; ... // process loop ... return res; ``` 使用原本相同的測試程式測試改進後的實作，同樣通過所有測試。 #### 釋放 `hlist` 記憶體 > [commit c9e9f2d](https://github.com/millaker/linux2024-quiz/commit/c9e9f2dba013cf2b5731e4024bc41a987042f446) 題目提供的程式碼並未在運算完成後釋放記憶體，因此我從 Linux `list.h` 移植巨集 `hlist_for_each_safe()` 並在完成運算後釋放所有雜湊表配置的記憶體。使用 Valgrind memcheck 檢查結果如下所示: ```shell ==4118846== Memcheck, a memory error detector ==4118846== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al. ==4118846== Using Valgrind-3.18.1 and LibVEX; rerun with -h for copyright info ==4118846== Command: ./a.out ==4118846== All test case passed ==4118846== ==4118846== HEAP SUMMARY: ==4118846== in use at exit: 0 bytes in 0 blocks ==4118846== total heap usage: 28 allocs, 28 frees, 2,384 bytes allocated ==4118846== ==4118846== All heap blocks were freed -- no leaks are possible ==4118846== ==4118846== For lists of detected and suppressed errors, rerun with: -s ==4118846== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0) ``` ### 測驗二從 [Leetcode 146. LRU Cache](https://leetcode.com/problems/lru-cache/description/) 說明得知要實作的 LRUCache 需要提供三個 API 1. `CreateCache(int size)` : 創建一個大小為 size 的快取 2. `GetCache(int key)` : 如果 key 存在則回傳對應的值，若無則回傳 -1 3. `PutCache(int key, int value)` : 如果 key 已存在則更新 (key, value) 的值，若沒有則加進快取中，若快取記憶體已滿，則移除最近最少使用的 key 其中 `GetCache()` 和 `PutCache()` 都需要 O(1) 時間複雜度先觀察題目中使用的快取結構體: ```c typedef struct { int capacity; int count; struct list_head dhead; struct hlist_head hhead[]; } LRUCache; typedef struct { int key; int value; struct hlist_node node; struct list_head link; } LRUNode; ``` `LRUCache` 前三個結構體成員很容易理解，雖然還不能從 `dhead` 名字推測出它的作用，最後一個成員被定義成沒有宣告大小的陣列，在翻閱 C99 規格書關於 `type specifier` `struct` 部分在第 16 點看到以下說明: > C99 standard 6.7.2.1 16: > As a special case, the last element of a structure with more than one named member may have an incomplete array type; this is called a flexible array member. With two exceptions, the flexible array member is ignored. First, the size of the structure shall be equal to the offset of the last element of an otherwise identical structure that replaces the flexible array member with an array of unspecified length.106) Second, when a . (or ->) operator has a left operand that is (a pointer to) a structure with a flexible array member and the right operand names that member, it behaves as if that member were replaced with the longest array (with the same element type) that would not make the structure larger than the object being accessed; the offset of the array shall remain that of the flexible array member, even if this would differ from that of the replacement array. If this array would have no elements, it behaves as if it had one element but the behavior is undefined if any attempt is made to access that element or to generate a pointer one past it 當對含有 flexible array member 的結構做 `sizeof` ，結果會是將 flexible array member 忽視的大小。當使用 `.(or ->)` 存取 flexible array member，他的行為會像是這個成員被一個陣列取代 (陣列大小為不讓 `struct` 大小超過當前物件大小的最大值)。 ```c= LRUCache *lRUCacheCreate(int capacity) { LRUCache *cache = malloc(2 * sizeof(int) + sizeof(struct list_head) + capacity * sizeof(struct list_head)); cache->capacity = capacity; cache->count = 0; INIT_LIST_HEAD(&cache->dhead); for (int i = 0; i < capacity; i++) INIT_HLIST_HEAD(&cache->hhead[i]); return cache; } ``` 上方程式碼是建構一個快取記憶體，從 5 ~ 10 行可以推敲出 `hhead` 如同第二週測驗一一樣使用 `hlist` 處理雜湊碰撞的問題。回到第 3 ~ 4 行配置 `LRUCache` 記憶體的空間的部分，我覺得計算大小算法有誤，前兩項是配置 `int capacity` `int count` 和 `struct list_head dhead` 的空間，最後一項應更改為 `capacity * sizeof(struct hlist_head)`。另外從前述提到 C99 規格書內容可知， `sizeof(struct LRUCache)` 的值會是忽略 `hhead` 的大小，因此可以以更簡短且不容易出錯的方法計算大小 ```c LRUCache *cache = malloc(sizeof(*cache) + capacity * sizeof(struct hlist_head)); ``` 至此，可以推測出 `dhead` 這個串列是用來儲存快取記憶體存取的紀錄，越靠串列前端的節點就是越近期有被使用過的，若要移除快取節點從尾端移除一個節點即可，`list_head` 為雙向環狀鏈結串列，因此要找到最後一個節點非常容易。 ```c int lRUCacheGet(LRUCache *obj, int key) { int hash = key % obj->capacity; struct hlist_node *pos; hlist_for_each(pos, &obj->hhead[hash]) { LRUNode *cache = list_entry(pos, LRUNode, node); if (cache->key == key) { list_move(&cache->link, &obj->dhead); return cache->value; } } return -1; } ``` `CacheGet()` 的實作很好理解，算出雜湊值後存取對應的 `hhead` 搜尋是否存在 `key` 並將該節點利用 `list_move` 移動到 `dhead` 的首位。 `CachePut()` 的前半段和 `CacheGet()` 很像，算出雜湊值後找尋現有節點。後半段則是在處理 `key` 不在快取中的情況，若快取已滿便從 `dhead` 尾端移除一個節點。 #### 測試程式之前在實作硬體快取控制器 RTL 時就有遇過驗證的問題，當時的置換規則是選擇隨機置換，因為資料很難手動產生只能仰賴大量隨機資料觸發錯誤，所以驗證難度很高。這題使用最近最少使用，在驗證上可能較簡單。我的想法是，針對以下情況測試 1. 寫入資料後，未被置換的情況讀回來的資料要是正確的 2. 寫入資料後，被置換後讀取資料要回傳 -1 3. 寫入資料後，被置換的節點必須是最近最少使用的 4. 讀取未寫入資料必定回傳 -1 > [commit 7b284d0](https://github.com/millaker/linux2024-quiz/commit/7b284d0b940b1317f6326ee0ce0a8339e50edff1) 這些測試資料用手動產生很容易，使用以下結構體當作命令單元 ```c struct TestCase { enum { GET, PUT } command; int key; int value; int expect; }; ``` 由於 `CachePut` 無回傳任何資料，也沒有任何資訊可以得知被置換的節點是哪一個，所以只能使用 `CacheGet` 的結果得知節點是否被置換。將上方四點寫成　TestCase 使用我寫好的 `test()` 測試。如果是 PUT，則不管結果，如果是 GET 使用 `assert` 比較回傳值與預期結果。 ```c // Test 1 // Cache size = 4, put 4 and get 4, no blocks are evicted, should get // original value struct TestCase t1[8] = {{PUT, 0, 0, 0}, {PUT, 1, 1, 0}, {PUT, 2, 2, 0}, {PUT, 3, 3, 0}, {GET, 0, 0, 0}, {GET, 1, 1, 1}, {GET, 2, 2, 2}, {GET, 3, 3, 3}}; // Test 2 // Cache size = 2, put 3 and get the evicted block, should get -1 struct TestCase t2[4] = { {PUT, 0, 0, 0}, {PUT, 1, 1, 0}, {PUT, 2, 2, 0}, {GET, 0, 0, -1}}; // Test 3 // Cache size = 2, put 4 and the first two blocks should be evicted struct TestCase t3[6] = {{PUT, 0, 0, 0}, {PUT, 1, 1, 0}, {PUT, 2, 2, 0}, {PUT, 3, 3, 0}, {GET, 0, 0, -1}, {PUT, 1, 1, -1}}; // Test 4 // Cache size = 4, get 4 non written key struct TestCase t4[4] = { {GET, 0, 0, -1}, {GET, 1, 1, -1}, {GET, 2, 2, -1}, {GET, 3, 3, -1}}; ``` ```c int test(struct TestCase *t, int tsize, int csize) { LRUCache *cache = lRUCacheCreate(csize); if (!cache) return 0; for (int i = 0; i < tsize; i++) { struct TestCase c = t[i]; if (c.command == GET) { assert(t[i].expect == lRUCacheGet(cache, t[i].key)); } else { lRUCachePut(cache, t[i].key, t[i].value); } } lRUCacheFree(cache); return 1; } ``` #### 記憶體流失測試使用 `valgrind` `memcheck` 測試沒有記憶體流失的情況發生。 ``` $ valgrind --tool=memcheck ./a.out ==4121248== Memcheck, a memory error detector ==4121248== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al. ==4121248== Using Valgrind-3.18.1 and LibVEX; rerun with -h for copyright info ==4121248== Command: ./a.out ==4121248== All test cases passed ==4121248== ==4121248== HEAP SUMMARY: ==4121248== in use at exit: 0 bytes in 0 blocks ==4121248== total heap usage: 13 allocs, 13 frees, 1,632 bytes allocated ==4121248== ==4121248== All heap blocks were freed -- no leaks are possible ==4121248== ==4121248== For lists of detected and suppressed errors, rerun with: -s ==4121248== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0) ``` ### 測驗三從提供的程式碼註解可以知道 `find_nth_bit` 的功能是在給定的記憶體區段內找出第 `n` 個設定為 1 的位元。 ```c= static inline unsigned long find_nth_bit(const unsigned long *addr, unsigned long size, unsigned long n) { if (n >= size) return size; if (small_const_nbits(size)) { unsigned long val = *addr & GENMASK(size - 1, 0); return val ? fns(val, n) : size; } return FIND_NTH_BIT(addr[idx], size, n); } ``` `find_nth_bit` 第 8 ~ 12 行，在 `small_const_nbits()` 為真的情況下，和其他找尋方法不同，找出 `small_const_nbits()` 的定義: ```c #define BITS_PER_LONG 64 #define small_const_nbits(nbits) \ (__builtin_constant_p(nbits) && (nbits) <= BITS_PER_LONG && (nbits) > 0) ``` `BITS_PER_LONG` 定義為 `long` 在當前系統有多少位元。而 `__builtin_constant_p()` 是一個 GNU 擴充，翻閱 [GNU gcc manual](https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html#index-_005f_005fbuiltin_005fconstant_005fp) 可以看到說明: > You can use the built-in function __builtin_constant_p to determine if the expression exp is known to be constant at compile time and hence that GCC can perform constant-folding on expressions involving that value. The function returns the integer 1 if the argument is known to be a compile-time constant and 0 if it is not known to be a compile-time constant. 由此可以知道若 `exp` 在編譯時期是常數 `__builtin_constant_p(exp)` 會回傳 1，否則 0。因此 `find_nth_bit` 8 ~ 12 行是在判斷若輸入的 `size` 確定小於一個 `long` 的長度，便可以直接計算。先用 `addr` 位址和 `GENMASK` 巨集產生一個數值， `GENMASK` 定義如下: ```c #define GENMASK(h, l) \ (((~0UL) - (1UL << (l)) + 1) & (~0UL >> (BITS_PER_LONG - 1 - (h)))) ``` 可以拆成兩段來看 1. `((~0UL) - (1UL << (l)) + 1)` 會產生 `[MSB:l]` 位元的 1 ``` ## Take 8 bit number as example, l = 3 ~0 = 11111111 -) 1 << (l) = 00001000 ------------------------ 11110111 +) 1 ------------------------ 11111000 ``` 2. `(~0UL >> (BITS_PER_LONG - 1 - (h)))` 會產生 `[h:LSB]` 位元的 1 ``` ## Take 8 bit number as example, h = 7 ~0 = 11111111 >>) 8 - 1 - 7 = 00000000 ------------------------ 11111111 ## h = 3 ~0 = 11111111 >>) 8 - 1 - 3 = 00000100 ------------------------ 00001111 ``` 得到 `[MSB:l]` 和 `[h:LSB]` 的 1 後做 `&` 操作可以得到 `[h:l]` 的位元遮罩。回到 `find_nth_bit` `val` 的計算，透過 `GENMASK(size - 1, 0)` 產生 `[63:0]` 的遮罩並對記憶體位置 `*addr` bitwise and，如此可以得到 `addr` 指向的前 64位元的值。使用 `fns()` 找出第 `n` 個 1。 ```c static inline unsigned long fns(unsigned long word, unsigned int n) { while (word) { unsigned int bit = __ffs(word); if (n-- == 0) return bit; __clear_bit(bit, &word); } return BITS_PER_LONG; } ``` `fns()` (find nth set bit in word) 的實作方法比較直觀，使用 `__ffs` (find first set bit) 找尋 `word` (這裡 `word` 定義為 64位元) 第一個被設定的位元，若還不是要找尋的第 `n` 個位元就清除當前位元並繼續尋找。其中 `__ffs()` 使用二分搜尋法，每次搜尋最高位的 1 是在前半後半。至此，我清楚知道當輸入的 `size` 小於一個 `word` 的大小時，可以用這些位元操作得到第 `n` 個設定的位元。 `FIND_NTH_BIT(FETCH, size, num)` 為一個巨集，其定義如下 ```c= #define FIND_NTH_BIT(FETCH, size, num) \ ({ \ unsigned long sz = (size), nr = (num), idx, w, tmp; \ \ for (idx = 0; (idx + 1) * BITS_PER_LONG <= sz; idx++) { \ if (idx * BITS_PER_LONG + nr >= sz) \ goto out; \ \ tmp = (FETCH); \ w = hweight_long(tmp); \ if (w > nr) \ goto found; \ \ nr -= w; \ } \ \ if (sz % BITS_PER_LONG) \ tmp = (FETCH) &BITMAP_LAST_WORD_MASK(sz); \ found: \ sz = min(idx * BITS_PER_LONG + fns(tmp, nr), sz); \ out: \ sz; \ }) ``` `FETCH` 在 `find_nth_bit` 函式中為 `addr[idx]` 所以在每次 `for` 循環，都會對當前 64 位元的值指派給 `tmp` 進行檢查。檢查的方法使用一系列 `hweight_*()` 巨集， ```c #define __const_hweight8(w) \ ((unsigned int) ((!!((w) & (1ULL << 0))) + (!!((w) & (1ULL << 1))) + \ (!!((w) & (1ULL << 2))) + (!!((w) & (1ULL << 3))) + \ (!!((w) & (1ULL << 4))) + (!!((w) & (1ULL << 5))) + \ (!!((w) & (1ULL << 6))) + (!!((w) & (1ULL << 7))))) ``` 從上方程式碼可以看出 `__const_hweight8` 對八個位元分別檢查是否設置，利用 `!!` 將值轉為 1 並相加，可以得到八個位元中有幾個被設置。其他相似但對不同長度進行操作的巨集 `__const_hweight16`, `__const_hweight32` ...，都是將 `w` 拆成上下兩部分進行檢查。因此，第 10 行找出 `unsigned long w` 中有幾個 1 後並和 `nr` (剩餘 n 個位元要尋找) 比較。第 11~12 行為已經找到第 n 個被設置的位元，第 6~7 行為不可能在 `size` 範圍內找到第 `n` 個被設置的位元進行提早跳出。第 20 行需要進行最後檢查，找到第 n 個被設置位元的確切位置，並和 `sz` 比較，若 `sz` 較小代表第 n 個被設置位元在範圍外，按照定義需回傳 `size`。第 17~18 行則是處理最後一個 `word` 範圍大於 `sz` 的情況，將當前的 `word` 和 `BITMAP_LAST_WORD_MASK` 產生的遮罩 bitwise and，將範圍外的位元都消除再進行 `fns` 尋找。在看到 `FIND_NTH_BIT` 巨集時，我就有疑問為什麼 `{}` 可以當作 `rvalue` 使用，翻閱 C99 規格書找到 `6.8.2 Compound Statement` 也沒有相關說明，後來才找到是 [GNU GCC](https://gcc.gnu.org/onlinedocs/gcc/Statement-Exprs.html) 擴充的 > GCC manual 6.1 Statements and Declarations in Expressions: > A compound statement enclosed in parentheses may appear as an expression in GNU C. This allows you to use loops, switches, and local variables within an expression. 所以這裡使用 `({ ... })` 的用法完全符合 GNU GCC 的規範，而最後 rvalue 會等於第 22 行的 `sz`。

Read more

linux2024-homework5

linux2024-homework4

Unified model Simulation

linux2024-homework1