2024q1 Homework1 (lab0)

# 2024q1 Homework1 (lab0) contributed by < `LindaTing0106` > ### Reviewed by `lintin528` 在有多個節點被移動的狀況，像是 `q_reverseK` ，我的做法通常會是直接在原本的鏈結串列中使用 `list_move` 去做移動，這樣可以避免 `cut` 跟 `splice` 的使用，還有一個額外的 `list_head` 結構 `buf` ，這樣的話除了交換的動作外會多花費一些計算量，但這麼做的缺點就是程式的可讀性不如你現在的做法，我認為這邊可以做一下取捨。 ## 開發環境 ```shell $ gcc --version gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 $ lscpu Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Byte Order: Little Endian Address sizes: 46 bits physical, 48 bits virtual CPU(s): 36 On-line CPU(s) list: 0-35 Thread(s) per core: 2 Core(s) per socket: 18 Socket(s): 1 NUMA node(s): 1 Vendor ID: GenuineIntel CPU family: 6 Model: 85 Model name: Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz Stepping: 7 CPU max MHz: 4800.0000 CPU min MHz: 1200.0000 BogoMIPS: 6000.00 Virtualization: VT-x L1d cache: 576 KiB L1i cache: 576 KiB L2 cache: 18 MiB L3 cache: 24.8 MiB NUMA node0 CPU(s): 0-35 ``` ## 指定的佇列操作 ### `q_new` 在使用 `malloc` 配置空間後，檢查是否有配置成功，如配置失敗則回傳 `NULL` ，如配置成功則讓頭尾指標都指向自己，並回傳自身。 ### `q_free` 先檢查 `head` 是否為 NULL，利用 `list.h` 的巨集 `list_for_each_entry_safe` ，將鏈結內每個節點所配置的空間釋放掉。 ```c void q_free(struct list_head *l) { if (!l) return; element_t *entry, *safe; list_for_each_entry_safe (entry, safe, l, list){ free(entry->value); free(entry); } free(l); } ``` ### `q_insert_head` 利用 `strcpy` 將來源字串複製給要新增的節點。 ```c bool q_insert_head(struct list_head *head, char *s) { if (!head) return false; element_t *newnode = malloc(sizeof(element_t)); if (!newnode) return false; newnode -> value = malloc((strlen(s) + 1) *sizeof(char)); if (!newnode->value){ free(newnode); return false; } strcpy(newnode -> value,s); list_add(&newnode->list,head); return true; } ``` ### `q_remove_head` 利用 `list_first_entry` 找出節點對應在鏈結中的位置，再用 `bufsize` 和節點字串的長度，控制複製幾個字元。最後使用 `list_del` 將此節點移除。 ```c element_t *q_remove_head(struct list_head *head, char *sp, size_t bufsize) { if (!head || q_size(head) == 0) return NULL; element_t *remove = list_first_entry(head, element_t, list); if (strlen (remove->value)>bufsize){ strncpy(sp, remove->value,bufsize); *(sp+bufsize-1) = '\0'; } else strncpy(sp, remove->value,strlen (remove->value)+1); list_del(&(remove->list)); return remove; } ``` ### `q_delete_mid` 使用計數的方式尋找鏈結中的中心點，將其移除後，再進行節點配置空間的釋放。但因為用到 `q_size` 導致時間複雜度為 O(n) 。 ```diff bool q_delete_mid(struct list_head *head) { - if (!head || q_size(head) == 0) + if (!head || list_empty(head)) return false; int num = q_size(head); num = num/2 +1; int count = 0; struct list_head *del = head; while(count != num) { del = del->next; count++; } element_t *remove = list_entry(del, element_t, list); list_del(del); free(remove->value); free(remove); return true; } ``` :::danger 使用 `diff` 或 `git diff` 命令產生程式碼變更，而非憑感覺標示。 ::: :::warning 這邊我認為可以使用快慢指標實作，他在執行時會有較好的時間局部性，因相鄰的記憶體較容易被連續造訪。 :notes: lintin528 ::: > 以改用快慢指標實作，我知道的是原本的做法會導致 `O(n^2)` ，改用快慢指標則可以降至`O(n)` ，但我的疑惑是這與時間局部性有關係嗎。 ### `q_delete_dup` 使用 `find` 來定位要檢查的字串節點，然後通過 `findsame` 遍歷鏈結。如果找到節點字串相同，則將 `findsame` 找到的節點刪除並將 `del` 設為 1 ，表示隨後需要將 `find` 也刪除。 ### `q_swap` 使用 `list_move` 將節點兩兩交換。 ### `q_reverseK` :::danger 留意[資訊科技詞彙翻譯](https://hackmd.io/@sysprog/it-vocabulary) ::: 用迴圈的方式去搜索要反向的範圍到哪，呼叫 `list_cut_position` 先把要反向的部分移到 `buf` 中在呼叫寫好的函式<s>實現</s> 實作部分反向的功能。利用 `seark` 保存下次開始反向的起點。 ```c void q_reverseK(struct list_head *head, int k) { struct list_head *cur = head; struct list_head *seark = cur->next; struct list_head buf; INIT_LIST_HEAD(&buf); int count = 1; while (seark!=head){ while (count!=k){ if (seark == head) break; seark = seark->next; count ++; } if (seark == head) break; list_cut_position(&buf, cur, seark); q_reverse(&buf); seark = buf.prev; list_splice(&buf,cur); INIT_LIST_HEAD(&buf); cur = seark; seark = seark->next; count = 1; } } ``` ### `q_sort` 最開始使用氣泡排序法來處理排序問題。 ```c void q_sort(struct list_head *head, bool descend) { int times = q_size(head); int num = 0; struct list_head *cur,*nex; while(num != times){ cur = head->next; nex = cur->next; while (nex != head && cur != head){ element_t *curele = list_entry(cur, element_t, list); element_t *nexele = list_entry(nex, element_t, list); if (strcmp(curele->value, nexele->value)>=0){ list_move(cur, cur->next); nex = cur->next; } else{ cur = cur->next; nex = nex->next; } } num = num+1; } } ``` 但因為時間複雜度太高，而後改為合併排序法。找到串列的中間節點後，利用迭回方式不斷進行拆分，直到串列中只有一個節點，再對其進行合併。 ```c void q_sort(struct list_head *head, bool descend) { if (!head || list_empty(head) || list_is_singular(head)) return; struct list_head *mid, *left, *right; left = head->next; right = head->prev; while (left != right && left->next != right){ left = left->next; right = right->prev; } mid = left; struct list_head leftside; list_cut_position(&leftside, head, mid); q_sort(&leftside, descend); q_sort(head, descend); q_merge_two(head, &leftside, descend); } ``` ### `q_descend` 用兩個指標比較左側的值是否小於右側，如果成立的話則將左邊節點刪除。為了避免指標需要重新指到 `head->next` ，將指標指向頭，並且用 `next` 的值去進行比較及刪除。 ```c int q_descend(struct list_head *head) { struct list_head *one = head; bool del = 0; while (one->next != head){ struct list_head *com = one->next; element_t *oneele = list_entry(one->next, element_t, list); while(com->next != head){ element_t *comele = list_entry(com->next, element_t, list); if (strcmp(oneele->value, comele->value)<0){ list_del(one->next); free(oneele->value); free(oneele); del = 1; break; } com = com->next; } if (!del){ one = one->next; } else del = 0; } return q_size(head); } ``` :::danger 改進漢語表達。 ::: ### `q_merge_two` 合併兩個鏈結串列，建立一個 `head` 當作鏈結串列暫時的頭，比較兩個鍵結串列大小後，將比較完的結果移動到 `head` 的尾端，直到其中一個鍵結串列為空後，將另一個鏈結串列剩餘的節點都移動到 `head` 的尾端。最終再將整個鏈結串列(除了 `head` )接回第一個鏈結串列。 ### `q_merge` 因為這裡的 `head` 是佇列的鏈結串列的頭，故我們可以使用這個指標去取得每個佇列，再透過呼叫 `q_merge_two()` 將兩兩佇列合併，一直合併到只剩一個佇列就可以回傳總共的節點數量了。 ```c int q_merge(struct list_head *head, bool descend) { queue_contex_t *queue = list_first_entry (head, queue_contex_t, chain);// find the first queue int num = q_size(queue->q); if (list_is_singular(head)) return num; struct list_head * movelist = head->next->next; queue_contex_t *moveque = list_entry (movelist , queue_contex_t, chain); while(movelist != head){ num = num + q_size(moveque->q); q_merge_two(queue->q, moveque->q, descend); movelist = movelist->next; moveque = list_entry (movelist , queue_contex_t, chain); } return num; } ``` :::danger HackMD 不是讓你張貼完整程式碼的地方，GitHub 才是！因此你在開發紀錄只該列出關鍵程式碼 (善用 diff 標示)，可附上對應 GitHub commit 的超連結，列出程式碼是為了「檢討」和「便於他人參與討論」，不是用來「假裝自己有付出」 ::: :::warning 若 `q_merge` 中，已經得到兩個鏈結串列的頭且經過排序了，或許可以嘗試用 `q_merge_two` 直接將兩者合併。 :notes: lintin528 ::: > 我在上面的程式中，原本就有使用 q_merge_two 將佇列合併了，不太清楚同學的意思。 ## 研讀論文〈[Dude, is my code constant time?](https://eprint.iacr.org/2016/1123.pdf)〉此論文在介紹時提到了，要去評估實作是否為常數時間複雜度，是一件困難的事，但這就會造成偵測有無 time leaks 的困難度也導致安全性低下(由於 Timing attacks )。已經有多個可以檢測可變時間程式碼的工具，但多受制於需要去建模硬體的行為。因此作者的貢獻為提出了 dudect ，可以在給予的平台上檢測程式碼是否為常數時間複雜度。 - dudect 利用測試執行時間上有無 Timging leakage ，其中輸入的資料分為固定與隨機，經過後處理後，透過統計測試來嘗試證明 "兩個時序分佈相等" 為假。 - 測試完後需要再接數據進行後處理，方便於後續的統計分析。 - 再經過統計分析試圖反駁 " 兩個時間分佈相等 " ，使用 Welch’s t-test ，測試兩個時序的平均值是否相等，如果沒有通過此測試的話，表示其有時序洩漏的問題。測試 memcmp 的時間變化性，用此函式去比較一個固定的字串以及輸入字串，使用時間分布圖呈現時，可以發現兩者在圖上的分布明顯不同，固有時序洩漏存在。 ### simulation 模式在 `qtest.c` 中 `insert` 和 `remove` 的區塊中會去檢查 `simulation` 是否為 1 ，如果在 `insert` 中 `simulation` 為 1 時，會使用 `is_insert_tail_const` 和 `is_insert_head_const` 去檢查函式是否是常數時間內可以處理完的。找尋 `is_insert_tail_const` 函式時，我使用 `vscode` 找查其定義，一到搜尋到 `constant.c` 中。 - 在 `measure` 中，可以看到其會將隨機生成的測資與固定測資放入串列中，並且去儲存他這兩個操作做完後對應的時間。由於 `fixture.c` 有引用 `constant.c` ，在此接著看程式碼。 - 在 `doit` 中開始執行，一開始會先生成輸入，接著執行 `measure` ，執行完後會將兩個時間相減，之後將這筆資料放入統計中，檢查統計的數量是否已經足夠。比較 `dudect.h` 與 `fixture.c` 的程式碼後，發現其中 `dudect_main` 對應於 `doit` ，其中差別在於 `dudect_main` 中並沒有使用 `differentiate` ，而是將算執行時間的功能直接整合於 `measure` 中，另一點差別在於 `doit` 中沒有去判斷第一筆測試，在 `dudect` 中作者將第一批次的測試丟棄，並執行 `prepare_percentiles` 作為暖身。並且在於作者的 `update_statistics` 中，統計的動作也是在第十筆後才開始，故更改程式碼。 ```diff + for (size_t i = 10; i < N_MEASURES - 1; i++) { - for (size_t i = 0; i < N_MEASURES; i++) { ``` ```diff + bool first_time = percentiles[DUDECT_NUMBER_PERCENTILES - 1] == 0; + + if (first_time) { + // throw away the first batch of measurements. + // this helps warming things up. + prepare_percentiles(exec_times, percentiles); + } else { + update_statistics(exec_times, classes, percentiles); + ret &= report(); + } ``` 並且加入在 fixture.c 中缺少的程式片段。其中註解提到此處是為了設置不同的閾值來裁剪測試值，但還不是很理解為何需要這樣做。 ```c static int cmp(const int64_t *a, const int64_t *b) { return (int)(*a - *b); } static int64_t percentile(int64_t *a_sorted, double which, size_t size) { size_t array_position = (size_t)((double)size * (double)which); assert(array_position < size); return a_sorted[array_position]; } static void prepare_percentiles(int64_t *exec_times, int64_t *percentiles) { qsort(exec_times, N_MEASURES, sizeof(int64_t), (int (*)(const void *, const void *))cmp); for (size_t i = 0; i < DUDECT_NUMBER_PERCENTILES; i++) { percentiles[i] = percentile( exec_times, 1 - (pow(0.5, 10 * (double)(i + 1) / DUDECT_NUMBER_PERCENTILES)), N_MEASURES); } } ``` 將程式碼更動完後， trace-17-complexity 的 insert 部分就通過了，但 remove 的部分出現非法寫入，一開始以為是使用 strncpy 時的參數設置有誤，後來使用 make valgrind。 ``` ==930078== Invalid write of size 1 ==930078== at 0x484F010: strncpy (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so) ==930078== by 0x10F903: strncpy (string_fortified.h:95) ==930078== by 0x10F903: q_remove_head (queue.c:82) ==930078== by 0x110563: measure (constant.c:121) ==930078== by 0x110B95: doit (fixture.c:166) ==930078== by 0x110B95: test_const (fixture.c:207) ==930078== by 0x110CCE: is_remove_head_const (fixture.c:220) ==930078== by 0x10C4D2: queue_remove (qtest.c:302) ==930078== by 0x10C91C: do_rh (qtest.c:410) ==930078== by 0x10E031: interpret_cmda (console.c:181) ==930078== by 0x10E5E6: interpret_cmd (console.c:201) ==930078== by 0x10E9E7: cmd_select (console.c:610) ==930078== by 0x10F2D3: run_console (console.c:705) ==930078== by 0x10D423: main (qtest.c:1258) ==930078== Address 0x0 is not stack'd, malloc'd or (recently) free'd ``` 看到 Address 0x0 發現應該是對 NULL 進行到寫入的動作，所以在將字串複製到 sp 前，需要對 sp 先做檢查。 ## 確認分析 Linux 核心的 [lib/list_sort.c](https://github.com/torvalds/linux/blob/master/lib/list_sort.c) 實作並評估其效益 `void list_sort` 中使用合併排序法來做排序，其中它會讓要合併的 list 保持在 2 : 1 ，如果有兩個已經排序好的子串列，他們的節點數量都為 $2^k$ ，在當我們有另一個數量為 $2^k$ 的元素後，才會再將剛剛的兩個子串列合併起來，其大小變為 $2^{k+1}$ ，使最終合併 : 未合併的比例維持在 2 : 1 。在合併排序中使用此種方式，就可以將 $3 \times 2^k$ 個元素放入快取中，避免 Cache Thrashing 。 :::warning 為何此舉可以「避免 Cache Thrashing」？ ::: :::info 想法為：在 count： 5 -> 6 的階段初， pending list 應為： 1-1-2-2 ，為了保持合併 : 未合併的元素為 2 : 1 的狀態，應該要將兩個 $2^2$ 合併為 $2^3$ ，如此就可以維持此比例。但若不維持此規則，我們先將後面兩個 $2^0$ 進行合併，如此一來在 count： 6 -> 7 時，才會合併兩個 $2^2$ ，從而導致在後面的回合中，若有元素要進行合併，在其快取失誤時，會先取代掉剛才兩個 $2^0$ 在快取中的位置，而在之後若又需要與其合併時，又需要再將其呼叫至快取處，從而導致 Cache Thrashing 發生。未保持 2 : 1： ```graphviz digraph count56 { #rankdir = LR; node[shape=box]; {rank="same"; 4; 1 ;3 ;5 ; 2 ; 7;}; 1->4; 3->5; 2->7; node[shape="record"]; node1 [label= "<0>1|<1>4|<2>3|<3>5|<4>2|<5>7"]; } ``` ```graphviz digraph count78 { #rankdir = LR; node[shape=box]; {rank="same"; 4; 1 ;3 ;5 ; 2 ; 7; 9; 6;}; 1->3->4->5; 2->7; 6->9; node[shape="record"]; node1 [label= "<0>1|<1>4|<2>3|<3>5|<4>6|<5>9"]; } ``` ```graphviz digraph count89 { #rankdir = LR; node[shape=box]; {rank="same"; 4; 1 ;3 ;5 ; 2 ; 7; 9; 6; 8;}; 1->3->4->5; 2->6->7->9; node[shape="record"]; node1 [label= "<0>2|<1>7|<2>3|<3>5|<4>6|<5>9"]; } ``` ::: 其中合併的動作是由 count 所控制的， count 是已經排好的子串列中節點的個數。每次當 count 增加，我們會將 bit k 設為 1 ，並將其其他位數字設為 0 。也就是說，每次合併只會在 count 為 $n \times 2^k$ ($n$ 為奇數) 時發生。在程式碼中， `void list_sort` 會先將要進行排序的雙向鏈結串列變為單向，一開始 `*pending` 指向 `NULL` ，當 `list` 不為 `NULL` 時進到迴圈，迴圈內會檢查是否要做合併，並且每次將 `list` 的一個元素給到 `pending` 中， `pending` 中是用 prev 來串接節點。最剛開始時，此時 `pending` 中沒有元素， `bit` 為 0 。 ```graphviz digraph all_list { #rankdir = LR; node[shape=box]; {rank="same"; 4; 1 ;3 ;5 ; 2 ; 7;} 4 -> 1 -> 3 -> 5 -> 2 -> 7; node[shape=none]; {rank="same"; 4; 1 ;3 ;5 ; 2 ; 7;NULL} 7->NULL; pending->NULL; list->4; } ``` 將 `list` 的元素給予 `pending` ， `count` 為 1 ，所以還是不會做合併的動作。 ```graphviz digraph all_list { rankdir = LR; node[shape=box]; 4; 1 -> 3 -> 5 -> 2 -> 7; node[shape=none]; //{rank="same"; 4; 1 ;3 ;5 ; 2 ; 7;NULL} 7->NULL; pending->4; list->1; 4->NULL; } ``` 此時 `count` 為 2'b10 了，故進到可以合併的條件式中。 ```graphviz digraph all_list { rankdir = LR; node[shape=box]; 4,1; 3 -> 5 -> 2 -> 7; node[shape=none]; //{rank="same"; 4; 1 ;3 ;5 ; 2 ; 7;NULL} 7->NULL; pending->1; list->3; 1->4; 4->NULL; a->1;b->4; } ``` 開始做合併後，就用 `next` 來連接節點了，原本在 `void list_sort` 中的指標 `a` 和 `b` 會變成 `b` 和 `a` 的順序傳入 `struct list_head *merge` 中。排序後變為圖中樣子 (箭頭表示 next )。 ```graphviz digraph all_list { rankdir = LR; node [shape= "record"]; node[shape=box]; 4,1; node[shape=none]; //{rank="same"; 4; 1 ;3 head ->1; 1->4; 4->NULL; } ``` 則利用以下表格即可推算出當 `count` 數值為何時，會不會做合併、要合併的串列，與做完合併後串列的模樣。 | 開始迭代時的 Count | 是否進行合併 | 開始迭代時的 pending list | 合併操作後 pending list | |:-------------------------:|:-----:|:--------------------------------:|:------------------------------:| | 0 | No | NULL | X | | 1 | No | 1 | X | | 2 | Yes | 1-1 | 2 | | 3 | No | 1-2 | X | | 4 | Yes | 1-1-2 | 2-2 | | 5 | Yes | 1-2-2 | 1-4 | | 6 | Yes | 1-1-4 | 2-4 | | 7 | No | 1-2-4 | X | | 8 | Yes | 1-1-2-4 | 2-2-4 | | 9 | Yes | 1-2-2-4 | 1-4-4 | | 10 | Yes | 1-1-4-4 | 2-4-4 | | 11 | Yes | 1-2-4-4 | 1-2-8 | | 12 | Yes | 1-1-2-8 | 2-2-8 | | 13 | Yes | 1-2-2-8 | 1-4-8 | | 14 | Yes | 1-1-4-8 | 2-4-8 | | 15 | No | 1-2-4-8 | X | | 16 | Yes | 1-1-2-4-8 | 2-2-4-8 | > 取自 [DokiDokiPB](https://hackmd.io/@DokiDokiPB/SJfES914O) ### 嘗試 lib/list_sort.c 引入到 lab0-c 專案將 Linux 的 [list_sort.c](https://github.com/torvalds/linux/blob/master/lib/list_sort.c) 引入 qtest.c 中，途中會遇到很多衝突，以下是我有遇到且解決掉的方式。 - `int count` 的加入：在 `list_sort.c` 中有出現很多傳遞 `priv` 的地方，觀察 timsort 程式碼發現其 priv 是用來計算比較次數的，故加入此變數。 - `u8` 改為 `uint8_t`：編譯程式時，總會出現這個未定義過的型態，故我將其改為了型態相同的 `uint8_t` 。 - `likely` 、 `unlikely` 的定義：由於此函式是為了告訴編譯器，哪個分支更有可能發生或不太可能發生，可以幫助編譯器，但由於無法直接引入 <linux/compiler.h> ，故需要自行定義其行為。 - 加入 `int compare` 函式。使用 `perf stat --repeat 5 -e instructions,cycles ./qtest -f traces/trace-sort.cmd` 比較在 `queue.c` 中實作的 `q_sort` 和 `list_sort` 間的效能差異。可以看出 `list_sort` 的 `cycles` 數和 `instructions` 數都少於我們實作的 `q_sort` 。 | | instructions | cycles | |:---------:|:------------:|:----------:| | q_sort | 47,802,626 | 30,013,600 | | list_sort | 46,498,077 | 29,716,110 | ### 嘗試 timsort 引入到 lab0-c 專案將 timsort 放入 lab0-c 並加以測試其效能，在我測試出來的結果可以看到在指令數量上跟 q_sort 差不多，但其 CYCLE 明顯少於 q_sort 。 | | instructions | cycles | |:---------:|:------------:|:----------:| | q_sort | 47,802,626 | 30,013,600 | | timsort | 47,693,459 | 29,732,870 | | list_sort | 46,498,077 | 29,716,110 | > 參考 [vax-r](https://hackmd.io/@vax-r/linux2024-homework1#%E6%AF%94%E8%BC%83-timsort-%E8%88%87-list_sort) 對於兩種排序的比較方式再利用作業二提供的程式來比較 list_sort 和 timsort 之間的比較次數。一開始先利用原本程式中亂數產生的方式，來看通常情況下他們兩者之間的比較次數。 | 亂數串列 | timsort | list_sort | |:--------------:|:-------:|:---------:| | 第一次比較次數 | 9480 | 8709 | | 第二次比較次數 | 9409 | 8728 | | 第三次比較次數 | 9603 | 8707 | | 第四次比較次數 | 9570 | 8700 | | 第五次比較次數 | 9444 | 8723 | 如果在串列中的元素都為`單調遞增`時，資料會呈現。 | 單調遞增串列 | timsort | list_sort | |:--------------:|:-------:|:---------:| | 第一次比較次數 | 999 | 5036 | | 第二次比較次數 | 999 | 5036 | | 第三次比較次數 | 999 | 5036 | | 第四次比較次數 | 999 | 5036 | | 第五次比較次數 | 999 | 5036 | 如果在串列中的元素都為`單調遞減`時，資料會呈現。 | 單調遞減串列 | timsort | list_sort | |:--------------:|:-------:|:---------:| | 第一次比較次數 | 999 | 4940 | | 第二次比較次數 | 999 | 4940 | | 第三次比較次數 | 999 | 4940 | | 第四次比較次數 | 999 | 4940 | | 第五次比較次數 | 999 | 4940 | 可以看觀察出，如果在一般串列內容為亂數的情況下， list_sort 的表現通常是優於 timsort 的，但如果在串列有排序好的情況下， timsort 的表現會優於 list_sort 。 ## 在 qtest 提供新的命令 shuffle 一開始會想要在 `queue.c` 中新增洗牌指令，但由於 `qtest.c` 中使用的是 `queue.h` 的函式庫，如果直接將其寫入 `queue.c` 的話需要在 `qtest.c` 中也引用 `queue.c` ，或是將指令宣告寫入 `queue.h` ，但我之前在寫作業時有更動過 `queue.h` 檔，導致其在 github 時會執行失敗，所以我便直接在 `qtest.c` 新增 `q_shuffle` 的函式。實作細節參考 Fisher–Yates shuffle 演算法。比較特別的地方在於使用測試程式時，一開始會找不到引用的函式庫，根據 `driver.py` 判斷是由於缺少了 `#!/usr/bin/env python3` 環境的設置。補上後又跑出了 `usr/bin/env: 'python3\r': No such file or directory`的錯誤，此錯誤是由於 Linux 會吃到此行指令的回車鍵，解決方法為在 vim 中輸入 `:set ff=unix` 後保存並且退出。可以看到各個排列的分布都差不多。 ![shuffle](https://hackmd.io/_uploads/Hy8Ee6BCa.png) ``` Expectation: 41666 Observation: {'1234': 41727, '1243': 41735, '1324': 41928, '1342': 41563, '1423': 41972, '1432': 41933, '2134': 41232, '2143': 41571, '2314': 41819, '2341': 41784, '2413': 41516, '2431': 41691, '3124': 41746, '3142': 41903, '3214': 41407, '3241': 41363, '3412': 41763, '3421': 41531, '4123': 41590, '4132': 41537, '4213': 41755, '4231': 41353, '4312': 41756, '4321': 41825} chi square sum: 22.111073777180437 ``` ## tic-tac-toe 將 [jserv/ttt](https://github.com/jserv/ttt) 專案的程式碼部分整合進 lab0-c ，故將需要編譯的檔案額外寫入 Makefile 中。並在 qtest 中額外加入 do_ttt 指令。 ### 將 mcts 演算法改為定點數計算使條件式判斷使用的演算法為 mcts ，故在程式碼中添加 `#define USE_MCTS` 。 ```c #ifdef USE_RL #include "agents/reinforcement_learning.h" #elif defined(USE_MCTS) #include "agents/mcts.h" #else #include "agents/negamax.h" #endif ``` ```diff @@ -8,6 +8,8 @@ #include <string.h> #include <time.h> +#define USE_MCTS ``` 並且將程式碼中使用 `double` 型態的變數改為 `unsigned long`。這裡我設的 `FRACTIONAL_BITS` 大小為 `20` 。由於在原本的 `uct_score` 中使用到了 `double` 型態的 `sqrt` 和 `log` ，在這邊我創了兩個 `unsigned long` 的函式 `fixed_sqrt` 和 `fixed_taylor_series_log` 來取代浮點數運算的函式。 fixed_sqrt 中使用逼近的方式找尋某數的平方根。 > log(1 + x) = x - x^2/2 + x^3/3 - x^4/4 + x^5/5 - ... fixed_taylor_series_log 則是使用泰勒展開式求得某數的對數。 :::warning 但實作下來造成運算變慢，還需要釐清是哪個函式導致。 ::: > [commit 7bca141](https://github.com/LindaTing0106/lab0-c/commit/7bca1410c0bb410abba49b847fa004c93da9c0aa) ### 引入「電腦 vs. 電腦」的對弈模式在閱讀 [並行程式設計: 排程器原理](https://hackmd.io/@sysprog/concurrency/%2F%40sysprog%2Fconcurrency-sched) 後，因為 [coro](https://github.com/sysprog21/concurrent-programs/tree/master/coro) 對我來說比較容易理解，所以我選擇使用他作為引入程式的協同式多工。一開始我先將需要被共享的資源宣告為全域變數，使的每個 task 都可以對其直接進行存取。 ```c static char table[N_GRIDS]; static int AI1_w = 0; static int AI2_w = 0; ``` 在進入排程指令前，先將 `table` 初始化。之後程式就會依序進到 "AI_1 -> 鍵盤與印出事件處理 -> AI_2 -> 鍵盤與印出事件處理" 此流程，但這邊會有個問題是因為使用協同式分工，每個工作一定都是等上一個工作完成後，才會開始去做，那這樣就導致如果 AI_1 處理太久，使用者就能感覺到有延遲的問題發生。這邊還在想是不是只能使用搶占式分工去避免。而鍵盤事件處理是參考[〈Build Your Own Text Editor〉](https://viewsourcecode.org/snaptoken/kilo/) ，實作出 CTRL + q 會跳出 ttt ，而 CTRL + p 可以暫停 / 開始執行程式。中間遇到的問題為，跳出 ttt 後，在進入會發生 segmentation fault ，經檢查發現問題在排程時的 `static int i` ，將其改為 `int i` 之後即可排除錯誤。 > [commit 88cc8b0](https://github.com/LindaTing0106/lab0-c/commit/88cc8b027b30d2855fdb1d312dd7167e9a77b4e6)