2024q1 Homework1 (lab0)

# 2024q1 Homework1 (lab0) contributed by < `kevinzxc1217` > ### Reviewed by `Kuanch` * 未完成 `lib/list_sort.c` 部分 * 未研讀論文〈Dude, is my code constant time?〉 * Git 部分皆在`master`上開發，建議每一功能開分支開發，並練習 `git rebase`；commit [e333ba0](https://github.com/kevinzxc1217/lab0-c/commit/e333ba01d72ffc9d10202fe8d1b375edb0646c0b) 沒有說明為什麼僅將 `q_merge_2list` 重新命名為 `q_merge_2_list`，若是非常無關緊要，commit message 仍可以「說明原因，並註明無須在意，請見 commit ...」，會讓其他人更了解 ### Reviewed by `Lccgth` 貼上程式碼時應只列出關鍵程式碼進行討論，例如 q_delete_mid 可列出快慢指標與刪除節點部分即可。 ```c while(fast -> next != head && fast -> next -> next != head){ slow = slow -> next; fast = fast -> next -> next; } list_del_init(slow); q_release_element(list_entry(slow, element_t, list)); ``` 函式實作應貼上 git commit 紀錄，日後函式優化時可以根據歷史 commit 紀錄進行比較。 q_sort 中有個子標題打錯了 (以第一個節點來切割 -> 以中點切割) ### Reviewed by `kkkkk1109` * 建議可於所有說明中附上 `commmit` 連結，以便後續和他人討論 ### Reviewed by `Leo5307` * 建議可多利用 LIST API 中的```list_for_each_safe```等功能來減少自己設定```while```終止條件的次數。 ## 開發環境 ```shell $ gcc --version gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0 $ lscpu Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Byte Order: Little Endian Address sizes: 39 bits physical, 48 bits virtual CPU(s): 8 On-line CPU(s) list: 0-7 Thread(s) per core: 2 Core(s) per socket: 4 Socket(s): 1 NUMA node(s): 1 Vendor ID: GenuineIntel CPU family: 6 Model: 126 Model name: Intel(R) Core(TM) i5-1035G1 CPU @ 1.00GHz Stepping: 5 CPU MHz: 1200.000 CPU max MHz: 3600.0000 CPU min MHz: 400.0000 BogoMIPS: 2380.80 Virtualization: VT-x L1d cache: 192 KiB L1i cache: 128 KiB L2 cache: 2 MiB L3 cache: 6 MiB NUMA node0 CPU(s): 0-7 ``` ## 指定的佇列開發 ### q_delete_mid ```c bool q_delete_mid(struct list_head *head) { // https://leetcode.com/problems/delete-the-middle-node-of-a-linked-list/ struct list_head *fast = head -> next; struct list_head *slow = head -> next; if(head == NULL) return false; while(fast -> next != head && fast -> next -> next != head){ slow = slow -> next; fast = fast -> next -> next; } list_del_init(slow); q_release_element(list_entry(slow, element_t, list)); return true; } ``` :::danger 1. 避免過多的中英文混用，已有明確翻譯詞彙者，例如「鏈結串列」(linked list) 和「佇列」(queue)，就使用該中文詞彙，英文則留給變數名稱、人名，或者缺乏通用翻譯詞彙的場景。 2. 鏈結串列的英語是 linked list，而非 link-list，沒有連字號 ::: 這裡使用 `fast` ( 每次移動兩格 ) 和 `slow` ( 每次移動一格) 的方式去尋找該鏈結串列的中心點，找到之後透過 `list_del_init` 將該中心點前後兩個 node 的 pointer 重新指向，再透過 `q_release_element` 將該中心點存放 `value` 的空間釋出。 ### q_delete_dup ```c bool q_delete_dup(struct list_head *head) { if(!head) return false; struct list_head *list_cur = head -> next; struct list_head *list_next = list_cur -> next; while(list_next != head){ if(strcmp(list_entry(list_cur, element_t, list) -> value, list_entry(list_next, element_t, list) -> value) == 0){ while(list_next != head && strcmp(list_entry(list_cur, element_t, list) -> value, list_entry(list_next, element_t, list) -> value) == 0){ list_del_init(list_cur); q_release_element(list_entry(list_cur, element_t, list)); list_cur = list_next; list_next = list_next -> next; } list_del_init(list_cur); q_release_element(list_entry(list_cur, element_t, list)); } if(list_next != head){ list_cur = list_next; list_next = list_next -> next; } } return true; } ``` 這裡主要需要注意 `while` 迴圈內刪除完重複值之後，仍會剩餘一項重複值需做刪除，所以 `while` 迴圈外還有包一個 `if` ，為了處理剩餘未刪除的重複值。 ### q_reverseK ```c void q_reverseK(struct list_head *head, int k) { struct list_head *list_cur = head; struct list_head *tmp_end = head; struct list_head *tmp_next = head; struct list_head tmp_beg; INIT_LIST_HEAD(&tmp_beg); int cnt = 0; while(tmp_end -> next != head){ if(cnt == k){ tmp_next = tmp_end -> next; list_cut_position(&tmp_beg, list_cur, tmp_end); q_reverse(&tmp_beg); list_splice_init(&tmp_beg, list_cur); list_cur = tmp_next -> prev; tmp_end = list_cur; cnt = 0; } else{ tmp_end = tmp_end -> next; cnt = cnt + 1; } } return; } ``` 利用 `cnt` 計數走了幾格，走到第 `k` 格之後使用到了 `list_cut_position` 的 function 來切割這段鏈結串列，之後使用 `q_reverse` 來將切割出來的鏈結串列做 reverse ，最後再將切割出來的鏈結串列合併回去原先鏈結串列的前端，重複以上步驟。其中要注意的是，當後面剩餘的格數不足 `k` 時，則不用做 reverse 。 ### q_sort :::danger 你如何確保目前的測試資料/方式已涵蓋排序演算法的最差狀況？ ::: ```c void q_sort(struct list_head *head, bool descend) { if(!head || list_is_singular(head) || list_empty(head)) return ; struct list_head *list_cur = head; struct list_head *fast = head -> next; struct list_head *slow = head -> next; struct list_head list_new; INIT_LIST_HEAD(&list_new); while(fast -> next != head && fast -> next -> next != head){ fast = fast -> next -> next; slow = slow -> next; } list_cut_position(&list_new, list_cur, slow); q_sort(&list_new, false); q_sort(head, false); q_merge_2list(head, &list_new); return ; } ``` 這裡使用到遞迴的方法去<s>實現</s> merge sort 。 :::danger 留意[資訊科技詞彙翻譯](https://hackmd.io/@sysprog/it-vocabulary) ::: 至於為什麼做 merge sort 時，我們通常都是以中間的節點做分割，將串列切成一半呢? 比較看看不同的切割位置有什麼差異： #### 以第一個節點來切割 ```graphviz digraph BST { node [fontname="Arial" ]; l1 [ label = "１, 2, 3, 4, 5, 6, 7, 8" ]; l21 [ label = "１" ]; l22 [ label = "2, 3, 4, 5, 6, 7, 8" ]; l31 [ label = "2" ]; l32 [ label = "3, 4, 5, 6, 7, 8" ]; l41 [ label = "3" ]; l42 [ label = "4, 5, 6, 7, 8" ]; l51 [ label = "4" ]; l52 [ label = "5, 6, 7, 8" ]; l61 [ label = "5" ]; l62 [ label = "6, 7, 8" ]; l71 [ label = "6" ]; l72 [ label = "7, 8" ]; l81 [ label = "7" ]; l82 [ label = "8" ]; l1 ->{l21 l22} l22 ->{l31 l32} l32 ->{l41 l42} l42 ->{l51 l52} l52 ->{l61 l62} l62 ->{l71 l72} l72 ->{l81 l82} } ``` 從上面的圖可以觀察，這裡把 8 個元素的陣列做切分，每次都以第一個元素做切分，切到每個小陣列都只有 1 個數字時，需要做 7 次切分，當有 n 個元素時，共需要切割 $n-1$ 次。 ```graphviz digraph BST { node [fontname="Arial" ]; l1 [ label = "１, 2, 3, 4, 5, 6, 7, 8" ]; l21 [ label = "１" ]; l22 [ label = "2, 3, 4, 5, 6, 7, 8" ]; l31 [ label = "2" ]; l32 [ label = "3, 4, 5, 6, 7, 8" ]; l41 [ label = "3" ]; l42 [ label = "4, 5, 6, 7, 8" ]; l51 [ label = "4" ]; l52 [ label = "5, 6, 7, 8" ]; l61 [ label = "5" ]; l62 [ label = "6, 7, 8" ]; l71 [ label = "6" ]; l72 [ label = "7, 8" ]; l81 [ label = "7" ]; l82 [ label = "8" ]; {l21 l22} ->l1 {l31 l32} ->l22 {l41 l42} ->l32 {l51 l52} ->l42 {l61 l62} ->l52 {l71 l72} ->l62 {l81 l82} ->l72 } ``` 第一層，將兩個元素合併，最多比較 1 次，合併成一個串列，共需比較 1 次；第二層，將三個元素合併，最多比較 2 次，合併成一個串列，共需比較 2 次；第三層，將四個元素合併，最多比較 3 次，合併成一個串列，共需比較 3 次； ... 第八層，將八個元素合併，最多比較 7 次，合併成一個串列，共需比較 7 次；總共需要的比較次數為 28 次，以此類推，當總共有 n 個數字時，總共需要比較 $1 + 2 + ... + (n-1)$ 次，即 $(1-n)*n/2$ 次。 #### 以中間節點來切割 ```graphviz digraph BST { node [fontname="Arial" ]; l1 [ label = "１, 2, 3, 4, 5, 6, 7, 8" ]; l21 [ label = "１, 2, 3, 4" ]; l22 [ label = "5, 6, 7, 8" ]; l31 [ label = "1, 2" ]; l32 [ label = "3, 4" ]; l33 [ label = "5, 6" ]; l34 [ label = "7, 8" ]; l41 [ label = "1" ]; l42 [ label = "2" ]; l43 [ label = "3" ]; l44 [ label = "4" ]; l45 [ label = "5" ]; l46 [ label = "6" ]; l47 [ label = "7" ]; l48 [ label = "8" ]; l1 -> {l21 l22} l21 ->{l31 l32} l22 ->{l33 l34} l31 ->{l41 l42} l32 ->{l43 l44} l33 ->{l45 l46} l34 ->{l47 l48} } ``` 從上面的圖可以觀察，這裡把 8 個元素的陣列做切分，切到每個小陣列都只有 1 個數字時，需要做 7 次切分，即 $n-1$ 次。 ```graphviz digraph BST { node [fontname="Arial" ]; l1 [ label = "１, 2, 3, 4, 5, 6, 7, 8" ]; l21 [ label = "１, 2, 3, 4" ]; l22 [ label = "5, 6, 7, 8" ]; l31 [ label = "1, 2" ]; l32 [ label = "3, 4" ]; l33 [ label = "5, 6" ]; l34 [ label = "7, 8" ]; l41 [ label = "1" ]; l42 [ label = "2" ]; l43 [ label = "3" ]; l44 [ label = "4" ]; l45 [ label = "5" ]; l46 [ label = "6" ]; l47 [ label = "7" ]; l48 [ label = "8" ]; {l21 l22} ->l1 {l31 l32} ->l21 {l33 l34} ->l22 {l41 l42} ->l31 {l43 l44} ->l32 {l45 l46} ->l33 {l47 l48} -> l34 } ``` 第一層，將兩個元素合併，最多比較 1 次，合併成四個串列，共需比較 4 次；第二層，將四個元素合併，最多比較 3 次，合併成兩個串列，共需比較 6 次；第三層，將八個元素合併，最多比較 7 次，合併成一個串列，共需比較 7 次。總共需要的比較次數為 17 次，以此類推，當總共有 n 個數字，且 n 為 $2^k$ 時，總共需要比較 $(2^{1}-1) *2^{k-1} + (2^{2}-2) *2^{k-2} + ... +(2^{k}-1) *2^{0}$ 次。 | 節點總數 | 以中點切割的比較次數 | 以第一個節點的切割比較次數 | |:-------------------------:|:------------------------:|:---------------------:| | 2 | $1*1=1$ | $1$ | | 3 | $1*1+2*1=3$ | $1+2=3$ | | 4 | $1*2+3*1=5$ | $1+2+3=6$ | | 5 | $1*2+3*1+4*1=9$ | $1+2+3+4=10$ | | 6 | $1*3+3*1+5*1=11$ | $1+2+3+4+5=15$ | | 7 | $1*3+3*1+2*1+6*1=14$ | $1+2+3+4+5+6=21$ | | 8 | $1*4+3*2+7*1=17$ | $1+2+3+4+5+6+7=28$ | | 16 | $1*8+3*4+7*2+15*1=49$ | $1+2+3+...+15=120$ | 比較上述兩個方法，雖然分割時兩個方法所需要的分割數皆相同，但做合併時，以中心點做切割所需的比較次數較低。 ### q_merge_2list ```c void q_merge_2list(struct list_head *list_1, struct list_head *list_2){ struct list_head list_result; INIT_LIST_HEAD(&list_result); while(!list_empty(list_1) && !list_empty(list_2)){ element_t *elem_1 = list_entry(list_1->next, element_t, list); element_t *elem_2 = list_entry(list_2->next, element_t, list); if(strcmp(elem_1 -> value, elem_2 -> value) < 0) list_move_tail(list_1 -> next, &list_result); else list_move_tail(list_2 -> next, &list_result); } if(!list_empty(list_1)) list_splice_tail_init(list_1, &list_result); else if(!list_empty(list_2)) list_splice_tail_init(list_2, &list_result); list_splice_tail_init(&list_result, list_1); } ``` 這裡的想法是預設 `list_1` 和 `list_2` 都是升序排列好的鏈結串列，這樣就可以用兩個指標分別去比較兩者鏈結串列內各個值的大小，再使用 `list_move_tail` 將較小的值放在新創的 `list_result` 中。要注意的是最後要移除暫時建立的 `list_result` 。 :::danger HackMD 不是讓你張貼完整程式碼的地方，GitHub 才是！因此你在開發紀錄只該列出關鍵程式碼 (善用 diff 標示)，可附上對應 GitHub commit 的超連結，列出程式碼是為了「檢討」和「便於他人參與討論」，不是用來「假裝自己有付出」 ::: --- ## 研讀論文〈Dude, is my code constant time?〉 ### 研讀論文《Dude, is my code constant time?》這篇論文提出了一個重要工具 — dudect ，它利用定時攻擊的概念以及洩漏檢測技術來評估程式碼在特定平台上是否以常數時間運行。過去的工具檢測可變時間程式碼存在缺陷，因為需要對硬體進行模擬，而 CPU 的訊息並非完全公開，這增加了實現的難度。相比之下， dudect 基於 leakage 偵測，使其成為一種簡潔、易於使用和易於維護的工具。它與先前的解決方案有所不同，不需要建立硬體行為模型。 :::danger 注意書寫規範，空白字元。 ::: ### simulation 模式 ```c static bool queue_insert(position_t pos, int argc, char *argv[]) { if (simulation) { if (argc != 1) { report(1, "%s does not need arguments in simulation mode", argv[0]); return false; } bool ok = pos == POS_TAIL ? is_insert_tail_const() : is_insert_head_const(); if (!ok) { report(1, "ERROR: Probably not constant time or wrong implementation"); return false; } report(1, "Probably constant time"); return ok; } ``` 首先到 `qtest.c` 內尋找 `simulation` ，若為 `1` 的話會去執行 `is_insert_tail_const` 或 `is_remove_head_const` 看是否該函式會在常數時間內完成。 ```c #ifndef DUDECT_FIXTURE_H #define DUDECT_FIXTURE_H #include <stdbool.h> #include "constant.h" /* Interface to test if function is constant */ #define _(x) bool is_##x##_const(void); DUT_FUNCS #undef _ #endif ``` 後續就是不斷的追程式碼，尋找與常數有關的函數，沿著 `is_insert_tail_const` 函式，會找到 `fixture.h` 。 ```c #define DUT_FUNC_IMPL(op) \ bool is_##op##_const(void) { return test_const(#op, DUT(op)); } ``` 這裡我們會想了解 `test_const` 內在做什麼，於是會找到 `fixture.c` ```c static bool test_const(char *text, int mode) { bool result = false; t = malloc(sizeof(t_context_t)); for (int cnt = 0; cnt < TEST_TRIES; ++cnt) { printf("Testing %s...(%d/%d)\n\n", text, cnt, TEST_TRIES); init_once(); for (int i = 0; i < ENOUGH_MEASURE / (N_MEASURES - DROP_SIZE * 2) + 1; ++i) result = doit(mode); printf("\033[A\033[2K\033[A\033[2K"); if (result) break; } free(t); return result; } ``` 最後這裡就會呼叫 `doit` 函式，即我們主要需要修改的地方。 ```diff static bool doit(int mode) { int64_t *before_ticks = calloc(N_MEASURES + 1, sizeof(int64_t)); @@ -133,8 +155,17 @@ static bool doit(int mode) bool ret = measure(before_ticks, after_ticks, input_data, mode); differentiate(exec_times, before_ticks, after_ticks); - update_statistics(exec_times, classes); - ret &= report(); + int64_t *percentiles = (int64_t*) calloc(DUDECT_NUMBER_PERCENTILES, sizeof(int64_t)); + bool first_time = percentiles[DUDECT_NUMBER_PERCENTILES - 1] == 0; + + if (first_time) { + // throw away the first batch of measurements. + // this helps warming things up. + prepare_percentiles(exec_times, percentiles); + } else { + update_statistics(exec_times, classes); + ret &= report(); + } ``` 比較 [oreparaz/dudect.h](https://github.com/oreparaz/dudect/blob/master/src/dudect.h#L217) 與 [lab0-c/dudect/fixture.c](https://github.com/sysprog21/lab0-c/blob/master/dudect/fixture.c) 的程式碼，會發現 `fixture.c` 內的 `doit` 可以對應到 `dudect.h` 內的 `dudect_main` 函式，在更改程式的時候要注意本身的程式內無 `ctx` 結構，需要找到本身程式碼內對應 `dudect.h` 中的變數名稱，比如說 `dudect.h` 內的 `(ctx->config->number_measurements-1)` 對應本身程式的 `N_MEASURES` 。其餘有使用到但未宣告的函式也需要補上。其中 `dudect.h` 內的作者測試時會先跑 `prepare_percentiles` 進行準備，且丟棄第一筆測資。 ```diff static void update_statistics(const int64_t *exec_times, uint8_t *classes) { - for (size_t i = 0; i < N_MEASURES; i++) { + for (size_t i = 10; i < N_MEASURES; i++) { int64_t difference = exec_times[i]; /* CPU cycle counter overflowed or dropped measurement */ if (difference <= 0) @@ -116,6 +116,28 @@ static bool report(void) return true; } ``` 這裡主要的差別在於 `dudect.h` 內的作者是在第 10 次才開始進行統計，於是進行更改。 ``` +++ TESTING trace trace-17-complexity: # Test if time complexity of q_insert_tail, q_insert_head, q_remove_tail, and q_remove_head is constant Probably constant time Probably constant time Probably constant time Probably constant time --- trace-17-complexity 5/5 --- TOTAL 100/100 ``` 更改完後 `trace-17-complexity` 的測試即可通關。 --- ## 確認分析 Linux 核心的 lib/list_sort.c 實作並評估其效益 ```c __attribute__((nonnull(2,3,4))) static struct list_head *merge(void *priv, list_cmp_func_t cmp, struct list_head *a, struct list_head *b) { struct list_head *head, **tail = &head; for (;;) { /* if equal, take 'a' -- important for sort stability */ if (cmp(priv, a, b) <= 0) { *tail = a; tail = &a->next; a = a->next; if (!a) { *tail = b; break; } } else { *tail = b; tail = &b->next; b = b->next; if (!b) { *tail = a; break; } } } return head; } ``` 這段程式碼使用了 pointer to pointer 的技巧來進行合併，比較特別的是，透過更改 `*tail` 指定的目標，改變原先節點 `next` 所指向的位置，以此來進行合併動作。這裡還有一點要注意，傳入的 `a` 和 `b` 並非代表指向兩個鏈結串列，而是指向同個鏈結串列上的兩處地方。其中 `__attribute__((nonnull(2,3,4)))` 代表第2, 3, 4個節點不為NUll。 ```c __attribute__((nonnull(2,3,4,5))) static void merge_final(void *priv, list_cmp_func_t cmp, struct list_head *head, struct list_head *a, struct list_head *b) { struct list_head *tail = head; u8 count = 0; for (;;) { /* if equal, take 'a' -- important for sort stability */ if (cmp(priv, a, b) <= 0) { tail->next = a; a->prev = tail; tail = a; a = a->next; if (!a) break; } else { tail->next = b; b->prev = tail; tail = b; b = b->next; if (!b) { b = a; break; } } } ... ``` 這段主要是將單向的鏈結串列，一邊透過 `cmp` 比較大小後進行合併，一邊改成雙向的鏈結串列，如 `tail->next` 和 `a->prev` 。這裡要注意的是傳入的 `head` 是指向一個空的 prev|next 的節點當作 head ，而 `a` 和 `b` 是指向 `head` 連結串列上的某個節點。 ```c __attribute__((nonnull(2,3))) void list_sort(void *priv, struct list_head *head, list_cmp_func_t cmp) { ... } ``` 該函式主要是當 `pending` 所指的串列內有 $3 \times 2^k$ 個節點，即 3 個子串列時，會對前 2 個子串列進行合併，即前 $2^k$ 個節點進行合併，使合併後的子串列變成 $2^{k+1}$ 和 $2^k$ 組成。維持 2:1 的形式可以放進 L1 cache 。 ```c /* Find the least-significant clear bit in count */ for (bits = count; bits & 1; bits >>= 1) tail = &(*tail)->prev; ``` 其中 `count + 1` 為 2 的冪時，不會進行合併， `count` 不為 2 的冪時，會進行合併。這行程式碼 `for (bits = count; bits & 1; bits >>= 1)` 會控制 `bits` 的位移，當 `bits` 內的位元皆為 1 時， `for` 迴圈內的 `bits & 1` 會持續成立，直到 `bits` 完全變成 0 為止。 ```c if (likely(bits)) { struct list_head *a = *tail, *b = a->prev; a = merge(priv, cmp, b, a); /* Install the merged result in place of the inputs */ a->prev = b->prev; *tail = a; } ``` 這樣跑到 `if (likely(bits)) ` 這行程式時便不會成立，即不回進行 `merge` ，反之亦然。 ```c /* Move one element from input list to pending */ list->prev = pending; pending = list; list = list->next; pending->next = NULL; count++; ``` 最後這段主要是進行 `list` 串列上的分割，將欲分割的節點給 `pending` 指向的鏈結串列。 ### 嘗試 lib/list_sort.c 引入到 lab0-c 專案 >[commit - 0dd0836](https://github.com/kevinzxc1217/lab0-c/commit/0dd08366904f603abf6a72bbac8104cd1da64427) ```diff // SPDX-License-Identifier: GPL-2.0 - #include <linux/bug.h> - #include <linux/compiler.h> - #include <linux/export.h> - #include <linux/kernel.h> - #include <linux/list_sort.h> - #include <linux/string.h> + #include "list.h" + #include "list_sort.h" + typedef unsigned char u8; + #define likely(x) __builtin_expect(!!(x), 1) + #define unlikely(x) __builtin_expect(!!(x), 0) ... - EXPORT_SYMBOL(list_sort); ``` 首先在 `lab0-c` 內新增 `list_sort.c` ，裡面參考 [lib/list_sort](https://github.com/torvalds/linux/blob/master/lib/list_sort.c) 的程式碼，其中要注意的是原始程式開頭 `include` 的文件， `lab0-c` 內並無這些檔案。新增 `list.h` 讀之前做過 `list_head` 相關的 API ，以及新增 `list_sort.h` 宣告函式。後面就是編譯時會出現未定義的錯誤，需要定義一些 `list_sort.c` 會用到的型態。 ```diff +/* SPDX-License-Identifier: GPL-2.0 */ + #ifndef _LINUX_LIST_SORT_H + #define _LINUX_LIST_SORT_H + + #include "list.h" + + struct list_head; + + typedef int __attribute__((nonnull(2,3))) (*list_cmp_func_t)(void *, + const struct list_head *, const struct list_head *); + + __attribute__((nonnull(2,3))) + void list_sort(void *priv, struct list_head *head, list_cmp_func_t cmp); + #endif ``` 參考 [linux/list_sort.h](https://github.com/torvalds/linux/blob/master/include/linux/list_sort.h) 新增會需要用到的函式。 ```diff bool do_list_sort(int argc, char *argv[]) { ... - q_sort(current->q, descend); + list_sort(NULL, current->q, &cmp); ... } ``` 後續這裡是在 `qtest.c` 內新增 `do_list_sort` 函式，用來測試 `list_sort` 有沒有問題，這裡主要參考了 `qtest.c` 內的 `do_sort` ，其中主要是把呼叫的 `q_sort` 改成 `list_sort` 即可。 ```diff + const int cmp(void *priv, + const struct list_head *a, + const struct list_head *b) + { + return strcmp(list_entry(a, element_t, list)->value, + list_entry(b, element_t, list)->value); + } ``` 由於 `list_sort` 需要傳入用來比較大小的函式，於是也在 `qtest.c` 內新增了 `cmp` 的函式。 ```diff static void console_init() { + ADD_COMMAND(list_sort, "Use Linux kernel sorting algorithm", ""); ... } ``` 在 `console_init` 新增 `list_sort` 的 `ADD_COMMAND` 。 ```diff OBJS := qtest.o report.o console.o harness.o queue.o \ random.o dudect/constant.o dudect/fixture.o dudect/ttest.o \ shannon_entropy.o \ - linenoise.o web.o + linenoise.o web.o list_sort.o ``` 最後更改 `Makefile` 內的 `OBJS` 。 ```diff + option fail 0 + option malloc 0 + new + ih RAND 100000 + time list_sort ``` 完成以上後，在 `traces` 內新增 `sort_test.cmd` ，使用 `perf stat --repeat 5 -e instructions,cycles ./qtest -f traces/sort_test.cmd ` 命令來比較 `list_sort` 和 `sort` 的效能。 ```cmd Performance counter stats for './qtest -f traces/sort_test.cmd' (5 runs): 44,899,408 instructions # 1.67 insn per cycle ( +- 3.05% ) 26,821,247 cycles ( +- 6.70% ) 0.02159 +- 0.00131 seconds time elapsed ( +- 6.08% ) ``` ### 嘗試 tim_sort 引入到 lab0-c 專案 >[commit - 68489e3](https://github.com/kevinzxc1217/lab0-c/commit/68489e307b35d972bfa3c0aca3bd86f1d31d355d) 將測驗2內的 [tim_sort](https://hackmd.io/@sysprog/linux2024-quiz1#%E6%B8%AC%E9%A9%97-2) 加到 `lab0-c` 內的 `tim_sort.c` 和 `tim_sort.h` ，其餘測試的方法與 `list_sort` 類似，詳細方法可以參考 commit 內容。 | | **instructions** | **cycles** | |:-------------------------:|:-----:|:---------------:| | q_sort | 47,132,358 | 30,516,990 | | tim_sort | 46,485,028 | 29,192,272 | | list_sort | 44,899,408 | 26,821,247 | 最後從上方表格看出 `list_sort` 執行的 `instructions` 和 `cycles` 數都較 `q_sort` 和 `tim_sort` 少， `list_sort` 的表現較為優異。 --- ## 在 qtest 提供新的命令 shuffle > [commit - 782692b](https://github.com/kevinzxc1217/lab0-c/commit/782692b8fde8a53d074685551d67dc85df992c7d) ```c #include <stdio.h> #include <stdlib.h> #include <time.h> int main() { int len=5; int *arr = malloc(sizeof(int)*len); for (int i = 0; i < len; i++) { arr[i] = i; } for (int i = len - 1; i > 0; i--) { int index = rand() % (i + 1); int tmp = arr[i]; arr[i] = arr[index]; arr[index] = tmp; } for (int i = 0; i < len; i++) { printf("%d",arr[i]); } free(arr); } ``` 這裡實作亂數主要是參考了 Fisher–Yates shuffle 演算法，其核心概念是在長度為 N 的陣列中，經過洗牌後，達到 $N!$ 種不同的可能性，所有出現的排列結果機率相等。 | 迭帶次數 | i | index範圍 | index | 原始陣列 | 交換後陣列 | |:-------------------------:|:-----:|:---------------:|:------------:|:---------:|:-----------:| | 0 | 4 | 0-4 | 3 |1 2 3 **4** **5** | 1 2 3 5 4 | | 1 | 3 | 0-3 | 1 |1 **2** 3 **5** 4 | 1 5 3 2 4 | | 2 | 2 | 0-2 | 2 |1 5 **3** 2 4 | 1 5 3 2 4 | | 3 | 1 | 0-1 | 0 |**1** **5** 3 2 4 | 5 1 3 2 4 | | 4 | 0 | 0 | 0 |**5** 1 3 2 4 | 5 1 3 2 4 | 這裡假設一開始的陣列為 `1 2 3 4 5` ，每次取 `index` 都是隨機的，總共的可能性為 $5 \times 4 \times 3 \times 2 \times 1 = 5!$ 。 ```c void q_shuffle(struct list_head *head) { if(!head) return; int len = q_size(head); int i = len - 1; int tmp_value; struct list_head *cur = head -> next; struct list_head *tmp = head -> prev; while(i > 0){ int index = rand() % (i+1); while(index){ cur = cur -> next; index--; } element_t *ele_cur = list_entry(cur, element_t, list); element_t *ele_tmp = list_entry(tmp, element_t, list); tmp_value = *(ele_tmp -> value); *(ele_tmp -> value) = *(ele_cur -> value); *(ele_cur -> value) = tmp_value; tmp = tmp -> prev; cur = head -> next; i--; } } ``` 同樣參考 Fisher–Yates shuffle 的演算法，不過這裡使用鏈結串列來實作亂數，主要使用了 `*tmp` 和 `*cur` 兩個指標指向該次迭代所需要交換的節點，再去交換節點彼此 `element_t` 內的 `value` 。 ``` Expectation: 41666 Observation: {'1234': 41689, '1243': 41541, '1324': 41678, '1342': 41587, '1423': 42108, '1432': 41901, '2134': 41836, '2143': 41800, '2314': 41245, '2341': 41456, '2413': 41408, '2431': 41867, '3124': 41557, '3142': 41617, '3214': 41500, '3241': 41629, '3412': 41929, '3421': 41903, '4123': 41767, '4132': 41455, '4213': 41930, '4231': 41809, '4312': 41598, '4321': 41190} chi square sum: 28.630106081697306 ``` ![shuffle](https://hackmd.io/_uploads/Sk5N0IVRT.png) 這裡使用了作業說明內的[測試程式](https://hackmd.io/@sysprog/linux2024-lab0/%2F%40sysprog%2Flinux2024-lab0-d#%E6%B8%AC%E8%A9%A6%E7%A8%8B%E5%BC%8F)，總共測試了 [1, 2, 3, 4] 數組 1,000,000 次的 `shuffle` 函式，由直方圖可見輸出呈現均勻分佈。 :::danger 翻閱統計學教科書，探討什麼量級方可代表樣本特徵，搭配對應的數學分析來解讀上述實驗。 ::: ## tic-tac-toe ### 將 [jserv/ttt](https://github.com/jserv/ttt) 專案的程式碼部分整合進 [lab0-c](https://github.com/kevinzxc1217/lab0-c) > [commit - b420aed](https://github.com/kevinzxc1217/lab0-c/commit/b420aedbd330136ec1cee11554cf0bd85ab89ed4) > 主要需要改進的地方為 `qtest.c` 和 `Makefile` 。將 [jserv/ttt](https://github.com/jserv/ttt) 內的 `main.c` 整合到 `qtest.c` 內，其中 `main.c` 內的 `int main()` 對應到 `qtest.c` 新增的 `do_ttt`，以即額外再新增 `ttt` 命令。 ```diff OBJS := qtest.o report.o console.o harness.o queue.o \ random.o dudect/constant.o dudect/fixture.o dudect/ttest.o \ shannon_entropy.o \ - linenoise.o web.o list_sort.o tim_sort.o + linenoise.o web.o list_sort.o tim_sort.o \ + game.o \ + mt19937-64.o \ + zobrist.o \ + agents/negamax.o \ + agents/mcts.o ... check: qtest ./$< -v 3 -f traces/trace-eg.cmd + ttt: qtest + ./$< -v 3 -f traces/ttt.cmd ``` 我們在 `Makeflie` 內的 `OBJS` 新增整合進來的會編譯到地檔案，以及參考 `check` 命令的語法，新增 `ttt` 命令，該命令主要負責去執行 `qtest.c` 內的 `do_ttt` 函式。 ```c $ make ttt CC qtest.o CC console.o CC harness.o CC game.o CC zobrist.o CC agents/negamax.o CC agents/mcts.o LD qtest ./qtest -v 3 -f traces/ttt.cmd 1 | 2 | 3 | 4 | ---+------------- A B C D ``` 我們可以透過 `make ttt` 命令去嘗試看看程式是否能夠順利執行。 ### 電腦 vs 電腦 ```diff static bool do_ttt() { ... if (turn == ai) { int move = negamax_predict(table, ai).move; if (move != -1) { table[move] = ai; record_move(move); } } else { + if(ai_vs_ai){ + if(first_move){ + move = rand() % (BOARD_SIZE*BOARD_SIZE); + first_move = 0; + } + else + move = negamax_predict(table, turn).move; + + if (move != -1) { + table[move] = turn; + record_move(move); + } + } ``` 新增 `AI_vs_AI` 功能，參考 `turn == ai` 的程式，當作第二個 AI 執行時的步驟，其中額外針對 AI 下的第一步棋的位置進行亂數處理，避免每次都是重複結果。 ```diff + add_param("AI_vs_AI",&ai_vs_ai,"AI vs AI play Tic-Tac-Toe", NULL); add_param("length", &string_length, "Maximum length of displayed string", NULL); ``` 額外新增參數 `AI_vs_AI` ，允許使用者切換「人 vs.電腦」或「電腦 vs. 電腦」的對弈，其中 `AI_vs_AI` 預設為 0 ，表示為「人 vs.電腦」，若要切換模式為「電腦 vs. 電腦」，將 `AI_vs_AI` 設置為 1 即可。 ```c $ ./qtest cmd> option AI_vs_AI 1 cmd> ttt ``` 照以上步驟及可切換模式到「電腦 vs. 電腦」。 ```diff turn = turn == 'X' ? 'O' : 'X'; + if (ai_vs_ai) { + draw_board(table); + time_t rawtime; + struct tm *timeinfo; + time(&rawtime); + timeinfo = localtime(&rawtime); + char buffer[80]; + strftime(buffer, sizeof(buffer), "%Y-%m-%d %H:%M:%S", timeinfo); + printf("%s\n",buffer); } ``` 最後，新增下每步棋的時間顯示功能。 #### Monte Carlo Tree Search > 參考 [Monte Carlo Tree Search 解說｜AlphaGo的核心演算法](https://www.youtube.com/watch?v=J3I3WaJei_E&ab_channel=%E8%B5%B0%E6%AD%AA%E7%9A%84%E5%B7%A5%E7%A8%8B%E5%B8%ABJames) - Select Leaf Node：若遇到兩個節點以上的選擇時，需透過 UCB(Upper Confidence Bound) 的公式去做計算，如下所示： $\frac{w_i}{n_i}+c\sqrt{\frac{ln\ N_i}{n_i}}$ $w_i$：從 $i$ 節點出發的獲勝次數 $n_i$：為從 $i$ 節點出發的模擬次數 $N_i$：從根結點出發的模擬次數 $c$ ：為常數，通常為 $\sqrt{2}$ 其中，公式左半部為紀錄走該路徑獲勝的機率；右半部為節點被訪問的次數，該節點訪問越多次，值越小，從而導向去走訪那些尚未探索的節點。可以有效的將未探索與已探索的節點做平衡，從而幫助找出最佳答案。 - Rollout or Expand： - Rollout：如果尋找到的節點為尚未更新的節點，則執行 Rollout ，以該節點為起點，模擬完整場遊戲直到結束，並獲得比賽結果。獲勝則對該節點所記錄的 $w_i$ 以及 $n_i$ 加 1 ，失敗則僅對 $n_i$ 加 1。 - Expand：若該節點已更新過，則執行 Expand，往下生成兩個新節點，並且回到 Select Leaf Node 階段。 - Backpropagate：Rollout 結束後，向上更新所有相關節點的 $w_i$ 和 $n_i$ ，每個節點的 $w_i$ 和 $n_i$ 為其所延伸子節點的 $w_i$ 和 $n_i$ 總和。 ### 用定點數取代浮點數運算 > [commit - 02ef4d7](https://github.com/kevinzxc1217/lab0-c/commit/02ef4d754d954097977e7facfbeb8c12ef112e9d) #### 定點數定點數是一種表示實數的數字系統，其中小數點的位置是固定的，不會隨著數值的大小而改變。在定點數表示中，通常會將一個整數分成兩部分：整數部分和小數部分。小數點固定在某個位置（通常是在最低位數的右側），因此該位置之前的數字是整數部分，該位置之後的數字是小數部分。 - 特點 - 小數點位置固定，不會隨數值的變化而改變 - 可以是有符號或無符號的 - 固定的小數點位置決定了精度和範圍 - 算術運算速度快，適合嵌入式系統和即時系統 #### 浮點數浮點數是一種表示實數的數字系統，其中小數點的位置可以根據數值的大小而移動。在浮點數表示中，一個數值通常表示為一個尾數（mantissa）乘以某個基數的冪（指數），其中基數通常為2。這種表示方式允許表示非常大或非常小的數，同時保持良好的精度。 - 特點 - 小數點的位置可以浮動，允許表示非常大或非常小的數 - 通常使用 IEEE 754 標準來定義浮點數的表示 - 可以是單精度（32位元）或雙精度（64位元）浮點數 - 精度和範圍通常比定點數更高將原本程式中使用到 `double` 的型態，改成 `uint64_t` 的型態。 ```diff uint64_t calculate_win_value(char win, char player) { if (win == player) - return 1.0; + return 1ULL << FIXED_SCALING_BITS; if (win == (player ^ 'O' ^ 'X')) - return 0.0; + return 0ULL; - return 0.5; + return 1ULL << (FIXED_SCALING_BITS - 1); } ``` 將程式中小數點部分改成用定點數型態，其中 0.5 為 1 除以 2 ，等同將 1 向右位移一步。 ```c static inline unsigned long uct_score(int n_total, int n_visits, unsigned long score) { if (n_visits == 0) return ULONG_MAX; unsigned long ret = score / n_visits + ((fixed_sqrt(2 << FRACTIONAL_BITS) * fixed_sqrt(fixed_log(n_total << FRACTIONAL_BITS) / n_visits)) >> FRACTIONAL_BITS); return ret; } ``` 這裡主要是將原本的 `log` 以及 `sqrt` 改成自訂義的定點數函式 `fixed_log` 和 `fixed_sqrt` ，其中原程式的 `EXPLORATION_FACTOR`　為定義在 `mcts.h` 內的 `sqrt(2)` ，需為轉定點數運算，且最後乘法計算時，因兩個定點數做乘法，所以要 `>> FRACTIONAL_BITS` ，先前的 `score / n_visits` 因一個定點數除以整數，所以計算完成後不用再做位移，仍是定點數。 ```c double fixed_to_float(unsigned long fixed_point) { return (double)(fixed_point) / (1 << FRACTIONAL_BITS); } unsigned long sqrt = fixed_sqrt(16<<FRACTIONAL_BITS); printf("fixed_sqrt: %lu\n",sqrt); printf("float_sqrt: %f\n",fixed_to_float(sqrt)); ``` 寫一個函式將定點數轉浮點數，測試 `fixed_log` 和 `fixed_sqrt` 定點數運算是否正確。 :::danger 探討精確度，針對 MCTS 場景去分析 ::: ### 引入若干 coroutine > [commit - 7d4b778](https://github.com/kevinzxc1217/lab0-c/commit/7d4b77858931faf1f74f897d9347c414eaf87eaa) ```c int coro(void) { move_count = 0; first_move = 1; enableRawMode(); void (*registered_task[])(void *) = {task1, task2}; struct arg arg0 = {.n = 70, .i = 0, .task_name = "Task 1"}; struct arg arg1 = {.n = 70, .i = 1, .task_name = "Task 2"}; struct arg registered_arg[] = {arg0, arg1}; tasks = registered_task; args = registered_arg; ntasks = ARRAY_SIZE(registered_task); memset(table, ' ', N_GRIDS); schedule(); return 0; } ``` 這裡分成兩個 task 進行排程，分別是 task1 和 task2，在對弈時， task1 是讓 ai1 進行 mcts 的演算法，而 task2 是讓 ai2 進行 negamax 的演算法。參考 [Build Your Own Text Editor](https://viewsourcecode.org/snaptoken/kilo/) 加入鍵盤事件處理，按下鍵盤的 'P' 鍵可以暫停或繼續對奕，若要退出對奕可以按下 'Q' 鍵。