2024q1 Homework1 (lab0)

# 2024q1 Homework1 (lab0) contributed by < `dockyu` > #### Reviewed by `gawei1206` 在 `q_ascend`, `q_descend`, `q_merge` 這三個函式中的回傳值都有要求，請你看一下 `queue.h` 中的敘述 #### Reviewed by `fatcatorange` 部份開發流程可以更多的描述作法，或附上對應的 commit 連結 #### Reviewed by `eleanorLYJ` 是否考慮用亂數以外不同的資料分布的資料作測試？ ## 開發環境 ```bash $ gcc --version gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 Copyright (C) 2021 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. $ lscpu Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Address sizes: 46 bits physical, 48 bits virtual Byte Order: Little Endian CPU(s): 20 On-line CPU(s) list: 0-19 Vendor ID: GenuineIntel Model name: 13th Gen Intel(R) Core(TM) i5-13500 CPU family: 6 Model: 191 Thread(s) per core: 2 Core(s) per socket: 14 Socket(s): 1 Stepping: 2 CPU max MHz: 4800.0000 CPU min MHz: 800.0000 BogoMIPS: 4992.00 ``` ## 針對佇列操作的程式碼實作 ### `q_new` :::danger 避免過多的中英文混用，已有明確翻譯詞彙者，例如「鏈結串列」(linked list) 和「佇列」(queue)，就使用該中文詞彙，英文則留給變數名稱、人名，或者缺乏通用翻譯詞彙的場景。 ::: 不能在 function 裡使用 `LIST_HEAD(head)` ，會在返回時被釋放掉 >能否說明上述場景發生在什麼地方，並更詳細的在 commit message 或筆記中描述作法 >[name=fatcatorange] ### `q_free` 不能直接釋放掉 object element_t ，因為 element_t 裡有一個 attribute 是指標，指到的 object 不會被釋放呼叫`queue.h` 的 `q_release_element`，`q_release_element`被用來釋放掉**整個** element :::danger 避免過多的中英文混用，已有明確翻譯詞彙者，例如「鏈結串列」(linked list) 和「佇列」(queue)，就使用該中文詞彙，英文則留給變數名稱、人名，或者缺乏通用翻譯詞彙的場景。 > 收到 ::: ```c static inline void q_release_element(element_t *e) { test_free(e->value); test_free(e); } ``` ### q_insert_head 如果直接將 parameter s assign 給新的 element_t 的 value 會出錯 ```c element_t *elem = malloc(sizeof(element_t)); elem->value = s; ``` ``` cmd> new l = [] cmd> ih hello ERROR: Need to allocate and copy string for new queue element l = [hello] cmd> ih fuc ERROR: Need to allocate and copy string for new queue element l = [fuc �c�\] cmd> ih ddd ERROR: Need to allocate and copy string for new queue element l = [ddd �c�\ ddd] cmd> ih sss ERROR: Need to allocate and copy string for new queue element l = [sss �c�\ sss �c�\] cmd> ih jfjd ERROR: Need to allocate and copy string for new queue element l = [jfjd �c�\ jfjd �c�\ jfjd] Error limit exceeded. Stopping command execution ``` 改為使用`strdup`，會自動複製字串到新的記憶體空間(保證足夠？) ```c char *s_cpy = strdup(s); if (!s_cpy) { free(elem); return false; /* copy s fault */ } elem->value = s_cpy; ``` ### q_swap 兩兩交換可以選擇 1. 和後一個 swap 2. 和前一個 swap 有兩個地方會遇到問題 1. 在第1個向前swap 2. 在最後一個向後交換接著我又發現 1. 在第1個向前 swap -> 在奇數個 node 向前交換 2. 在最後一個向後交換 -> 只會在奇數個 node 的 queue 出現，也就是在奇數個 node 向後交換所以問題永遠都出在奇數，剛好這個問題必須間隔一個再做交換，所以選擇在偶數向前交換可以讓會出問題的 case 不存在 ```c struct list_head *li, *safe; bool even = false; struct list_head *swap, *left, *right; list_for_each_safe (li, safe, head) { if (even) { swap = li->prev; left = swap->prev; right = li->next; /* 交換 li 以及 swap */ } even = !even; } ``` ### q_sort split & compare 參考 [25077667](https://github.com/25077667/lab0-c/blob/dev/queue.c) 一開始想用 indirect pointer 指到 merge 的 list 最尾端，再把下一個接在尾部為了方便 split 我先將 list 從環狀變成一條，每次 split 都將 mid 的 next 設為 NUL ，但是這樣就不能用 list.h 提供的 function 後來參考[25077667](https://github.com/25077667/lab0-c/blob/dev/queue.c)的發現只要將新建的鏈結串列 merge 到原本的鏈結串列上就可以了，但是必須嚴格限制哪一個是新的哪一個是舊的，以免新的鏈結串列的 head object 在 function return 被釋放 merge 永遠比較**新建的鏈結串列的第一個**以及走在舊的 list 上的 li ，直到其中一個佇列走完 ```c while (!list_empty(head_from) && li != head_dest) { struct list_head *first = head_from->next; if (comp_func(li, first)) ``` 只有兩種情況 1. 新的佇列走完 2. 舊的佇列走完 -> 新的佇列現存的都比舊的佇列還大不管是哪種情況，都可以直接將新的接在舊的後面，完成這一次的 merge :::danger 改進你的漢語表達。 > yes teacher ::: >應為 NULL 而不是 NUL >[name=fatcatorange] ### 新增 shuffle 命令在 `qtest.c` 中新增命令 ```c ADD_COMMAND(shuffle, "Shuffle the nodes of the queue by Fisher–Yates shuffle algorithm", ""); ``` 編譯並執行，可以看到多出了一個命令 ```bash shuffle | Shuffle the nodes of the queue by Fisher–Yates shuffle algorithm ``` 因為 shuffle 對佇列的處理和 descend 幾乎一樣，都不接受參數並且只對兩個元素以上的佇列有效，所以我直接複製 do_descend 函式的內容到 do_shuffle 函式，並刪除判斷結果的正確性程式碼，留下 q_show 因為我希望 shuffle 結束後可以顯示佇列 ```diff - bool ok = true; - - cnt = current->size; - if (current->size) { - for (struct list_head *cur_l = current->q->next; - cur_l != current->q && --cnt; cur_l = cur_l->next) { - element_t *item, *next_item; - item = list_entry(cur_l, element_t, list); - next_item = list_entry(cur_l->next, element_t, list); - if (strcmp(item->value, next_item->value) < 0) { - report(1, - "ERROR: At least one node violated the ordering rule"); - ok = false; - break; - } - } - } q_show(3); - return ok && !error_check(); + return !error_check(); ``` 在 `queue.h` 中宣告函式原型 ```c int q_shuffle(struct list_head *head); ``` 在 `queue.c` 中實作該函式 ```c int q_shuffle(struct list_head *head) { return 0; } ``` 在 `qtest.c` 中將原本複製過來的函式換為 q_shuffle ```diff - current->size = q_descend(current->q); + q_shuffle(current->q); ``` 此時編譯後就可以使用 shuffle 命令了 ```bash cmd> shuffle Warning: Calling ascend on null queue cmd> new l = [] cmd> shuffle Warning: Calling ascend on empty queue l = [] cmd> ih hello l = [hello] cmd> shuffle Warning: Calling ascend on single node ``` 可以正常執行不會報錯 ```bash cmd> show Current queue ID: 0 l = [dd t h] cmd> shuffle l = [dd t h] ``` --- ##### 在 git commit 時遇到問題 ```bash /usr/bin/sha1sum: WARNING: 1 computed checksum did NOT match [!] You are not allowed to change the header file queue.h or list.h ``` 但是如果把這行刪掉編譯就會出錯 ```bash $ make test CC qtest.o qtest.c: In function ‘do_shuffle’: qtest.c:900:9: error: implicit declaration of function ‘q_shuffle’; did you mean ‘do_shuffle’? [-Werror=implicit-function-declaration] 900 | q_shuffle(current->q); | ^~~~~~~~~ | do_shuffle cc1: all warnings being treated as errors make: *** [Makefile:54: qtest.o] Error 1 ``` <s>參考[這位同學的作業](https://github.com/wilicw/lab0-c/blob/260ecfe7eff6e31ebb0c23d6dbd89cae245f4832/random.h#L26)，發現 `qtest.c` 有引入 `random.h` 而裡面有一個 random_shuffle function </s>，跟我的問題不相關參考 [millaker](https://github.com/millaker/lab0-c)，發現它沒有在 `queue.h` 宣告函式原型，而是直接在 `qtest.c` 宣告函式原型，並在 `queue.c` 實作，我還不知道這是怎麼運作的，但是確實是可以編譯成功， shuffle 命令也可以使用 :::danger 注意看作業說明，學員不該變更 `queue.h`。 ::: ###### qtest.c ```diff #include "queue.h" + void q_shuffle(struct list_head *); ``` ###### queue.c ```c void q_shuffle(struct list_head *head) { return; } ``` 使用 `random.c` 中的 randombytes function 產生隨機數 k ```c randombytes((uint8_t *)&rand_val, sizeof(rand_val)); ``` 將 k 個移至佇列尾部 ```c k = get_random_range_1_to_k(size--); list_for_each(li, head) if (--k) break; list_move_tail(li, head); ``` #### 執行結果 ```bash cmd> show Current queue ID: 0 l = [lk sss nnn dd jjj ccc] cmd> shuffle l = [dd lk nnn jjj ccc sss] ``` --- #### shuffle 分析測試程式發現只有不到10個結果 ```bash python3 ./scripts/shuffle_test.py Expectation: 41666 Observation: {'1234': 0, '1243': 0, '1324': 0, '1342': 0, '1423': 0, '1432': 0, '2134': 2, '2143': 0, '2314': 0, '2341': 0, '2413': 0, '2431': 0, '3124': 0, '3142': 1, '3214': 0, '3241': 0, '3412': 0, '3421': 0, '4123': 0, '4132': 1, '4213': 0, '4231': 0, '4312': 1, '4321': 0} chi square sum: 999974.0001680028 ``` 直接手動洗牌也只能做5次，<s>猜測試洗牌過程中有產生垃圾，被檢測到就不給做了</s> ```bash cmd> ih RAND 5 l = [riurm pjaksxwxh egydoac kctcx yhsrqxjo] cmd> shuffle l = [kctcx egydoac riurm yhsrqxjo pjaksxwxh] cmd> shuffle l = [yhsrqxjo egydoac riurm pjaksxwxh kctcx] cmd> shuffle l = [egydoac riurm pjaksxwxh kctcx yhsrqxjo] cmd> shuffle l = [riurm pjaksxwxh kctcx yhsrqxjo egydoac] cmd> shuffle l = [egydoac pjaksxwxh yhsrqxjo kctcx riurm] Error limit exceeded. Stopping command execution ``` 發現問題出在 do_shuffle 的最後的 error_check ，因為 error_check 回傳的是有沒有錯誤發生 ```diff - return error_check(); + return !error_check(); ``` ![Figure_1](https://hackmd.io/_uploads/SycN9ZMaT.png) ```text Expectation: 41666 Observation: {'1234': 41968, '1243': 41450, '1324': 41543, '1342': 41899, '1423': 41534, '1432': 41809, '2134': 41628, '2143': 41982, '2314': 41785, '2341': 41732, '2413': 41121, '2431': 41815, '3124': 41598, '3142': 41673, '3214': 41664, '3241': 41704, '3412': 41508, '3421': 41849, '4123': 41445, '4132': 41683, '4213': 42126, '4231': 41409, '4312': 41605, '4321': 41470} chi square sum: 26.824845197523164 ``` 所有可能發生的次數差不多 ### Valgrind 分析記憶體問題會出現很多下面的 Invalid read ```bash ==156179== Invalid read of size 8 ``` 發現是下面這行引起的 ```c memcpy(sp, del_elem->value, bufsize); ``` ### 效能分析 :::danger 使用 perf 時一直跳出下面這個錯誤 ```bash ┌─Error:─────────────────────────────┐ │The cycles:P event is not supported.│ │ │ │ │ │Press any key... │ └────────────────────────────────────┘ ``` 參考 [ perf top error: "The cycles:P event is not supported" on 13th gen / raptor lake ](https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2043320) ，發現太新的 CPU perf 沒辦法正確執行 > 太好了，你找到可以貢獻程式碼到 Linux 核心的機會。 :notes: jserv ::: --- 撰寫 shell script 流程可以參考下圖，因為 qtest 可以用 `-f` 參數指定一組命令作為輸入，所以可以將自己寫的 sort 以及 linux 核心的 list_sort 命令分別寫在不同的 cmd 檔案，用 shell script 使用 perf 測量效能，我寫了多個 cmd 檔案，並且迭代地用 perf 測量效能並產生對應的報告最後得到的結果如下當隨機產生 1000000 個 node 時，效能並沒有差很多 :::warning 使用清晰的標註方式 ::: - [ ] <s>我的</s> sort ``` Performance counter stats for './qtest -f ./perf_test/test-sort-1000000.cmd' (5 runs): 1,1153,7322 cache-misses # 52.34% of all cache refs ( +- 2.05% ) 7,3736,1882 branches ( +- 0.03% ) 49,1900,3385 instructions # 0.74 insn per cycle ( +- 0.02% ) 3 context-switches ( +- 19.44% ) 2,1309,0573 cache-references ( +- 0.42% ) 66,7067,1349 cycles ( +- 1.00% ) 1.4348 +- 0.0183 seconds time elapsed ( +- 1.28% ) ``` - [ ] list_sort ``` Performance counter stats for './qtest -f ./perf_test/test-linux_sort-1000000.cmd' (5 runs): 9417,4523 cache-misses # 55.08% of all cache refs ( +- 1.44% ) 6,9026,4605 branches ( +- 0.01% ) 47,5297,0306 instructions # 0.79 insn per cycle ( +- 0.01% ) 4 context-switches ( +- 15.81% ) 1,7098,2842 cache-references ( +- 0.14% ) 60,2361,4972 cycles ( +- 0.83% ) 1.2957 +- 0.0135 seconds time elapsed ( +- 1.05% ) ``` ```graphviz digraph G { // 設定圖形的整體方向為從左到右 rankdir=LR; // 定義節點形狀 node [shape=ellipse]; sh [label="sh"]; perf [label="perf"]; qtest [label="qtest"]; // 特別為qtest設定長方形形狀 report [shape=box, label="report_file"]; // 特別為report設定長方形形狀 // 定義上方的cmd節點 cmd1 [shape=box,label="cmd"]; cmd2 [shape=box,label="cmd"]; // 建立節點間的連接 sh -> perf -> qtest -> report; // cmd節點指向qtest cmd1 -> qtest; cmd2 -> qtest; // 確保sh, perf, qtest, 和 report水平排列 { rank=same; sh; perf; qtest; report; } } ``` --- ## ttt ### hlist 程式碼閱讀 [linux/list.h L992](https://github.com/torvalds/linux/blob/480e035fc4c714fb5536e64ab9db04fedc89e910/include/linux/list.h#L932)開始是與 hlist 相關的程式碼 ```c /* * Double linked lists with a single pointer list head. * Mostly useful for hash tables where the two pointer list head is * too wasteful. * You lose the ability to access the tail in O(1). */ ``` :::danger 不理解就說不理解，沒有「不太理解」這回事。不要將自己的專業成長交給大語言模型。為何你不做實驗來確認呢？ ::: 最一開始就有一段註解，說明在使用雜湊表時，兩個指標的 list head 太過浪費，所以 hlist 的 head 只有一個指標，也就是單鏈，起初我<s>不太理解</s> 一個鏈結串列多一個指標為什麼會被認為是浪費，閱讀[Linux 核心的 hash table 實作](https://hackmd.io/@sysprog/linux-hashtable)並且<s>詢問 ChatGPT</s> 後得出結論， hlist 的意思是 hash list ，也就是雜湊表的一個 bucket 對應到一個 hlist ，而雜湊表預期 key 很多，但是每一個 bucket 裡存放的節點很少(才可以在接近常數時間被搜尋到)，所以為了一個 bucket 多使用一個指標是"**浪費**"的 h_list 的結構定義在[linux/types.h L197](https://github.com/torvalds/linux/blob/480e035fc4c714fb5536e64ab9db04fedc89e910/include/linux/types.h#L197) ```c struct hlist_head { struct hlist_node *first; }; struct hlist_node { struct hlist_node *next, **pprev; }; ``` pprev 使用間接指標讓整個鏈結串列不存在空指標，所以不用考慮特殊情況，讓程式碼更加簡潔單獨的一個 hlist 鏈結串列長相如下(參考[Linux 核心的 hash table 實作](https://hackmd.io/@sysprog/linux-hashtable)) ```graphviz digraph G { rankdir = LR; splines = false; node[shape = "record"] list_head[label = "<m>list_head | <n>first"] node_1[label = "<m>hlist_node_1 | {<p>pprev | <n>next}", group = list]; node_2[label = "<m>hlist_node_2 | {<p>pprev | <n>next}", group = list]; node_3[label = "<m>hlist_node_3 | {<p>pprev | <n>next}", group = list]; NULL[shape = plaintext, label = "NULL", group = list] {rank = "min" list_head} list_head -> node_1 -> node_2 -> node_3[ weight = 10, style = invis ] list_head:n -> node_1:m; node_1:p -> list_head:n; node_1:n -> node_2:m; node_2:p -> node_1:n; node_2:n -> node_3:m; node_3:p -> node_2:n; node_3:n -> NULL; } ``` --- ## 整合 hlist 相關功能 1. 建立 hlist.h 2. 因為我的 [lab0-c](https://github.com/dockyu/lab0-c) 專案沒有 types.h ，所以要手動將 hlist 的資料結構放進 hlist.h [commit](https://github.com/dockyu/lab0-c/commit/74cbaa8d799e3873f708d852acf7161ef2e7b926) 3. 將 [linux/list.h L992](https://github.com/torvalds/linux/blob/480e035fc4c714fb5536e64ab9db04fedc89e910/include/linux/list.h#L932)開始是與 hlist 相關的程式碼搬進 hlist.h [commit](https://github.com/dockyu/lab0-c/commit/81e058ac9f5b632a1f8983e6c7e9f2d9a5ab17bd) --- ##### git hook 不認識 hlist git commit 時因為有提到 hlist 這個字，被 git hook 擋住了 ```bash $ git commit Following files need to be cleaned up: hlist.h Relocate hlist settings to hlist.h [line 1] - Possible misspelled word(s): hlist How to Write a Git Commit Message: https://chris.beams.io/posts/git-commit/ Proceed with commit? [e/n/?] ``` 發現錯誤來自 commit-msg.hook ```bash <lab0-c path>/scripts$ grep -r "Possible misspelled word(s):" commit-msg.hook: add_warning LINE_NUMBER "Possible misspelled word(s): $MISSPELLED_WORDS" ``` 再進一步找到是因為字典檔 `aspell-pws` 沒有 hlist 這個字 ``` ASPELL=$(which aspell) if [ $? -ne 0 ]; then echo "Aspell not installed - unable to check spelling" else LINE_NUMBER=1 MISSPELLED_WORDS=$(echo "$COMMIT_MSG_LINES[LINE_NUMBER]" | $ASPELL --lang=en --list --home-dir=scripts --personal=aspell-pws) if [ -n "$MISSPELLED_WORDS" ]; then add_warning LINE_NUMBER "Possible misspelled word(s): $MISSPELLED_WORDS" fi fi ``` 將 hlist 加入字典檔 ```bash $ echo "hlist" >> aspell-pws ``` :::danger 你本來就不該在標題中書寫 `hlist`，後者只是輔助 hash table 的操作，是 List API 的一部分。 > 但是在 linux kernel 中也有一些 commit 會在標題提及 hlist ，例如 [dentry: switch the lists of children to hlist](https://github.com/torvalds/linux/commit/da549bdd15c295c24b2ee7ffe7ad0f3877fa8a87) ，如果我只是單純的建立一個 hlist.h 檔案，並在裡面定義了 hlist_head hlist_node 型別，該怎麼撰寫標題比較合適？ > > 可書寫 "Add header in preparation for hash table"，本次作業的重點是強化學員的英語書寫，包含用精簡的詞彙來表達你的動機和舉措。Linux 核心的開發者可能會更動到 List API 的實作，因此提及其內部行為是合理的，但以本系列作業來說，要求學員撰寫「符合情境」且「簡單明確」(例如該避免一堆 "Update queue.c" 的標題) 是當初設定的重要練習。 :notes: jserv ::: --- ### 測試 hlist.h 是否可用為了確定 hlist.h 不會在編譯時產生錯誤，我建了一個 hlist.c 引入 hlist.h，並嘗試編譯遇到了一些錯誤 ##### 無法識別 bool ```bash hlist.h: At top level: hlist.h:133:15: error: unknown type name ‘bool’ 133 | static inline bool hlist_fake(struct hlist_node *h) | ^~~~ hlist.h:146:15: error: unknown type name ‘bool’ 146 | static inline bool | ^~~~ ``` 這個錯誤是因為無法識別 bool 型別，引入 <stdbool.h> 就可以了 --- ##### `WRITE_ONCE` `READ_ONCE` 未定義 ```bash hlist.h: In function ‘hlist_unhashed_lockless’: hlist.h:89:17: error: implicit declaration of function ‘READ_ONCE’ [-Werror=implicit-function-declaration] 89 | return !READ_ONCE(h->pprev); | ^~~~~~~~~ hlist.h: In function ‘__hlist_del’: hlist.h:106:9: error: implicit declaration of function ‘WRITE_ONCE’ [-Werror=implicit-function-declaration] 106 | WRITE_ONCE(*pprev, next); | ^~~~~~~~~~ ``` 在 Linux Kernel 中，READ_ONCE 和 WRITE_ONCE 是用於防止編譯器優化可能會改變變數訪問順序的巨集，我在 [linux/tools/virtio/linux/compiler.h](https://github.com/torvalds/linux/blob/480e035fc4c714fb5536e64ab9db04fedc89e910/tools/virtio/linux/compiler.h) 中找到這兩個巨集的定義，將這個整個檔案加入 hlist.h 就可以解決這個錯誤 --- ##### `LIST_POISON1` 以及 `LIST_POISON2` 未定義 ```bash hlist.h: In function ‘hlist_del’: hlist.h:132:19: error: ‘LIST_POISON1’ undeclared (first use in this function) 132 | n->next = LIST_POISON1; | ^~~~~~~~~~~~ hlist.h:132:19: note: each undeclared identifier is reported only once for each function it appears in hlist.h:133:20: error: ‘LIST_POISON2’ undeclared (first use in this function) 133 | n->pprev = LIST_POISON2; | ``` <s>LIST_POISON1 以及 LIST_POISON2 是為了避免 dangling pointer 所以要將指標指到一個明現不合法的地址，參考 [linux/scripts/mod/list.h L25](https://github.com/torvalds/linux/blob/480e035fc4c714fb5536e64ab9db04fedc89e910/scripts/mod/list.h#L25) 的作法就可以解決問題雖然目前專案沒有其他地方用到 LIST_POISON ，但是為了避免之後重複定義，我在 LIST_POISON 的定義前後加上 Include Guards ```c #ifndef HLIST_H #define HLIST_H #define LIST_POISON1 ((void *) 0x100) #define LIST_POISON2 ((void *) 0x122) #endif ``` </s> :::danger `LIST_POISON1` 和 `LIST_POISON2` 的數值很關鍵，這是 Linux 核心特別保留給例外處理的地址，你移植到使用者空間時，需要改用其他手法。 $\to$ [課堂問答](https://hackmd.io/@sysprog/H1QORp-h6) > 改為特定地址 ```c #ifndef LIST_POISONING #define LIST_POISONING #define LIST_POISON1 ((void *) 0x00100100) #define LIST_POISON2 ((void *) 0x00200200) #endif ``` ::: --- zobrist.h 裡有一行 ```c #include "list.h" ``` 因為我已經將 linux 裡的 list.h 拆成 lab0-c/list.h 和 hlist.h 所以我在下面有新增一行 ```diff + #include "hlist.h" ``` 編譯 zobrist.c 時遇到問題 ```bash zobrist.c: In function ‘zobrist_get’: zobrist.c:38:42: error: macro "hlist_for_each_entry" passed 4 arguments, but takes just 3 38 | zobrist_entry_t) | ^ In file included from zobrist.h:7, from zobrist.c:5: hlist.h:303: note: macro "hlist_for_each_entry" defined here 303 | #define hlist_for_each_entry(pos, head, member) \ | zobrist.c:37:5: error: ‘hlist_for_each_entry’ undeclared (first use in this function) 37 | hlist_for_each_entry (entry, &hash_table[hash_key], ht_list, | ^~~~~~~~~~~~~~~~~~~~ zobrist.c:37:5: note: each undeclared identifier is reported only once for each function it appears in zobrist.c:37:25: error: expected ‘;’ before ‘{’ token 37 | hlist_for_each_entry (entry, &hash_table[hash_key], ht_list, | ^ | ; ...... 40 | { | ~ zobrist.c:33:22: error: unused variable ‘entry’ [-Werror=unused-variable] 33 | zobrist_entry_t *entry = NULL; | ^~~~~ zobrist.c:45:1: error: control reaches end of non-void function [-Werror=return-type] 45 | } | ^ cc1: all warnings being treated as errors make: *** [Makefile:60: zobrist.o] Error 1 ``` 查看 ttt/list.h 後修改 `hlist_entry_safe` 和 `hlist_for_each_entry` ```c #ifdef __HAVE_TYPEOF #define hlist_entry_safe(ptr, type, member) \ ({ \ typeof(ptr) ____ptr = (ptr); \ ____ptr ? hlist_entry(____ptr, type, member) : NULL; \ }) #else #define hlist_entry_safe(ptr, type, member) \ (ptr) ? hlist_entry(ptr, type, member) : NULL #endif #ifdef __HAVE_TYPEOF #define hlist_for_each_entry(pos, head, member) \ for (pos = hlist_entry_safe((head)->first, typeof(*(pos)), member); pos; \ pos = hlist_entry_safe((pos)->member.next, typeof(*(pos)), member)) #else #define hlist_for_each_entry(pos, head, member, type) \ for (pos = hlist_entry_safe((head)->first, type, member); pos; \ pos = hlist_entry_safe((pos)->member.next, type, member)) #endif ``` 並且根本不需要 lab0-c/list.h 因為 zobrist.c 只有用到 hlist_head --- 我為 ttt 的功能建立一個 TTT_OBJS 方便管理 ```makefile TTT_OBJS := game.o mt19937-64.o zobrist.o agents/negamax.o OBJS := qtest.o report.o console.o harness.o queue.o list_sort.o \ random.o dudect/constant.o dudect/fixture.o dudect/ttest.o \ shannon_entropy.o \ linenoise.o web.o \ $(TTT_OBJS) ``` 但是在 make 的時候一直<s>出錯</s> 顯示找不到依賴檔 `.agents/negamax.o.d` :::danger 是誰「出錯」？主詞講清楚。 > 已修改 ::: ```bash agents/negamax.c:101:1: fatal error: opening dependency file .agents/negamax.o.d: No such file or directory 101 | } | ^ ``` :::danger directory 的翻譯是「目錄」，而非「資料夾」(folder) > 已修改 ::: 這是因為 Makefile 的規則直接在依賴名稱前面加上 `.` 想要建立一個隱藏檔案，但是我的 negamax.c 是放在 agents <s>資料夾</s> 目錄裡的，直接加上的方法太過暴力，所以我改成判斷檔案的目錄，並在同一個目錄裡建立依賴檔，當然 clean 的規則也要一起修改 ```diff - deps := $(OBJS:%.o=.%.o.d) + deps := $(OBJS:%.o=$(dir %).$(notdir %).d) %.o: %.c - @mkdir -p .$(DUT_DIR) - $(VECHO) " CC\t$@\n" - $(Q)$(CC) -o $@ $(CFLAGS) -c -MMD -MF .$@.d $< + $(VECHO) " CC\t$@\n" + @mkdir -p $(@D) + $(Q)$(CC) $(CFLAGS) -c $< -o $@ -MMD -MF $(@D)/.$(@F:.o=.d) clean: rm -f $(OBJS) $(deps) *~ qtest /tmp/qtest.* rm -rf .$(DUT_DIR) rm -rf *.dSYM + find . -type f -name ".*.d" -delete (cd traces; rm -f *~) ``` ttt 命令已經可以在 qtest 中使用 ```bash $ ./qtest cmd> ttt 1 | 2 | 3 | 4 | ---+------------- A B C D X> a2 1 | 2 | × 3 | ○ 4 | ---+------------- A B C D ``` --- git commit 時被 git hook 抓到不符合風格的程式碼，還有一個語法錯誤 ```bash $git commit hlist.h:121:13: error: Syntax Error: AST broken, 'h' doesn't have a parent. [internalAstError] return !READ_ONCE(h->pprev); ^ game.c:74:28: style: Parameter 'table' can be declared with const [constParameter] int *available_moves(char *table) ^ mt19937-64.c:83:9: style: The scope of the variable 'i' can be reduced. [variableScope] int i; ^ mt19937-64.c:85:21: style: The scope of the variable 'mag01' can be reduced. [variableScope] static uint64_t mag01[2] = {0ULL, MATRIX_A}; ^ mt19937-64.c:85:21: style: Variable 'mag01' can be declared with const [constVariable] static uint64_t mag01[2] = {0ULL, MATRIX_A}; ^ Fail to pass static analysis. ``` 我不知道為什麼會引起 AST broken ，我會去找出原因如果將 table 的型別改成 `const char *` 確實可以通過檢查，但是編譯卻會失敗 mt19937-64.c 內的程式碼是我覺得不應該更動 <s>所以我決定先用 `--no-verify` flag 不檢查就提交 ```bash $ git commit --no-verify ``` </s> 改為用 suppress 繞過特定檢查 ```hook --suppress=internalAstError:hlist.h \ --suppress=constParameter:game.c \ --suppress=variableScope:mt19937-64.c \ --suppress=constVariable:mt19937-64.c" ``` --- #### ttt 電腦對電腦因為目前只有兩個模式 1. 人對電腦 2. 電腦對電腦，我覺得用枚舉來表示能夠增加可讀性 ```c enum game_mode { PVE, EVE }; ``` 並且增加一個 ai_2 ，在 EVE 的情況下用 ai 和 ai_2 對戰 ```diff char turn = 'X'; char ai = 'O'; + char ai_2 = 'X'; ``` ### 定點數參考 [Fixedpoint versions of ln, pow, exp, sqrt and division](https://gist.github.com/Madsy/1088393) 的寫法參考 [chmike/fpsqrt](https://github.com/chmike/fpsqrt/blob/master/fpsqrt.c) 的 sqrt 一開始改成用定點數之後 AI 直接變成智障，一直下最前面的格子嘗試在正常遊戲過程中將每次計算 uct 時的 n_total n_visits score 紀錄在 log 檔裡，用這些資料測試定點數版本的 uct 和原版差多少，然後在慢慢修正錯誤的地方發現第一個錯誤是 sqrt 的部份，接受整數型別的參數但是我卻勿把定點數傳入造成最後結果差了 190 倍現在的 uct 看起來比較正常，但還是有明顯誤差，**甚至有些大小關係跟浮點數版本的相反** ###### 擷取其中幾筆 ```bash n_total: 308, n_visits: 11, score: 5.000000000000000 fp n_total: 20185088, fp n_visits: 720896, fp score: 327680 UCT score: 1.475249292187409 fp UCT score: 1.996109008789062 n_total: 308, n_visits: 21, score: 15.000000000000000 fp n_total: 20185088, fp n_visits: 1376256, fp score: 983040 UCT score: 1.453016916317026 fp UCT score: 1.725845336914062 n_total: 308, n_visits: 13, score: 7.000000000000000 fp n_total: 20185088, fp n_visits: 851968, fp score: 458752 UCT score: 1.477372510154364 fp UCT score: 1.918090820312500 ``` 找到第二個錯誤，原本的 uct 計算中使用到 EXPLORATION_FACTOR ，這是一個浮點數 $\sqrt[]{2}$ ，我直接將它傳入定點數的 sqrt function 但是它接受的是定點數所以會出錯我將 uct 的部分從專案抽離，建立一個專門用來測試定點數計算的專案 [fixed-point-test](https://github.com/dockyu/fixed-point-test) ，下面的數據是在此 [commit](https://github.com/dockyu/fixed-point-test/commit/1f374ce80caee131151c13eb208213acb0c79269) 測試的結果，和浮點數運算結果的差距已經非常小了，但是目前我的定點樹相關的程式碼極度雜亂，所以下一步就是要修改程式碼，等到程式碼變整齊之後再整合進 ttt ```bash n_total: 308, n_visits: 13, score: 7.000000000000000 fp n_total: 20185088, fp n_visits: 851968, fp score: 458752 score / n_visits Float: 0.538461538461538 Fixed Point: 0.538452148437500 log(n_total) Float: 5.73010 Fixed Point: 5.73010 log(n_total) / n_visits Float: 0.44078 Fixed Point: 0.44077 sqrt(log(n_total) / n_visits) Float: 0.66391 Fixed Point: 0.66389 EXPLORATION_FACTOR Float: 1.41421 Fixed Point: 1.41420 EXPLORATION_FACTOR * sqrt(log(n_total) / n_visits) Float: 0.93891 Fixed Point: 0.93887 score / n_visits + EXPLORATION_FACTOR * sqrt(log(n_total) / n_visits) Float: 1.47737 Fixed Point: 1.47733 ``` 此 [commit](https://github.com/dockyu/lab0-c/commit/3f51788aea7d00092cd66676e795a4158a7dc332) 中，井字遊戲的 AI 已經可以用定點數取代浮點數運算，並且他的行為已經"不笨"，可以阻止對手獲勝，並在我故意放水時讓自己獲勝 :::info 我發現兩個 AI 對戰一定會下成這樣，難道是所謂的必勝法? ```bash A B C D 1 | 2 | × 3 | ○ × 4 | × ○ × ○ ---+------------- A B C D X won! Moves: A4 -> D4 -> A2 -> A3 -> C4 -> B4 -> B3 ``` ::: ### git rebase ```bash $ git remote add upstream https://github.com/sysprog21/lab0-c.git $ git fetch upstream $ git merge upstream/master Auto-merging scripts/pre-commit.hook CONFLICT (content): Merge conflict in scripts/pre-commit.hook Automatic merge failed; fix conflicts and then commit the result. ``` ###### pre-commit.hook ``` <<<<<<< HEAD --suppress=internalAstError:hlist.h \ --suppress=constParameter:game.c \ --suppress=nullPointer:agents/mcts.c \ --suppress=variableScope:agents/mcts.c \ --suppress=variableScope:mt19937-64.c \ --suppress=constVariable:mt19937-64.c" ======= --suppress=checkLevelNormal:log2_lshift16.h" >>>>>>> upstream/master ``` 修改為 ``` --suppress=constParameterPointer:console.c \ --suppress=internalAstError:hlist.h \ --suppress=constParameter:game.c \ --suppress=nullPointer:agents/mcts.c \ --suppress=variableScope:agents/mcts.c \ --suppress=variableScope:mt19937-64.c \ --suppress=constVariable:mt19937-64.c \ --suppress=checkLevelNormal:log2_lshift16.h" CPPCHECK_OPTS="-I. --enable=all --error-exitcode=1 --force $CPPCHECK_suppresses $CPPCHECK_unmatched ." ``` ### 使用 coroutine 因為原本的 lab0 太多東西了，我將最新版本的 linenoise 以及 lab0 的命令界面實做的部份抽出，做了一個較輕量的命令行程式 [cli](https://github.com/dockyu/cli) (刪除檢查的部份)用來測試 coroutine 的功能閱讀 [coro.c](https://github.com/sysprog21/concurrent-programs/tree/master/coro) 之後我總結出要用 coroutine 的方式撰寫程式需要做到以下幾點 1. 一個 task 對應到一個 function 2. function 開頭是設定任務的 context，並回到 scheduler 3. scheduler 負責初始化 task ，也就是執行所有 task 的 function 4. scheduler 初始化完成後就可以開始 coroutine 5. task switch 時 task 會暫時從排程中移出，直到做完一次再決定要不要加入排程(由 task 自己判斷) 我將 ttt 劃分為多個任務 1. ai1 下一次 ai 2. 判斷勝利1 (輸出遊戲過程) 3. ai2 下一次 4. 判斷勝利2 (輸出遊戲過程) 5. 畫棋盤狀態 6. 監聽鍵盤事件不知道為什麼出錯了， task->name 是錯的，而且會一直變，最後變成 EVE ，但是 EVE 甚至不是一個字串 ```c enum game_mode { PVE, EVE }; ``` ```bash cmd> ttt EVE 5 Initial task1 Initial task2 Initial task3 Initial task4 Initial task5 Initial task6 ai_1_task resume, task name: H��]�] check_win_task resume, task name: EVE ai_2_task resume, task name: EVE ``` 如果有 printf 就可以正常執行，沒有 printf 就會出現下面的錯誤 ```c if (!list_empty(&tasklist)) { printf("task_switch 1\n"); struct task *t = list_first_entry(&tasklist, struct task, list); // printf("taskname=%s\n", t->task_name); list_del(&t->list); cur_task = t; longjmp(t->env, 1); } ``` ```bash *** longjmp causes uninitialized stack frame ***: terminated Aborted (core dumped) ``` 考關閉編譯器優化解決 ```diff - CFLAGS = -O1 -g -Wall -I. + CFLAGS = -O0 -g -Wall -I. ``` 第一次 coroutine ttt 會成功，但是當執行完一次 coro_ttt 之後，在執行第二次會 Segmentation fault commit : [d8209d](https://github.com/dockyu/cli/commit/d8209d1aa9f1c91d8e3571e30cf3b2daabd637c3) ```bash cmd> coro_ttt Segmentation fault (core dumped) ``` 找到問題，以下程式碼的 i 若是超出 tasks 的範圍會出現 segmentation fault ，起因是當我執行下一次程式時 i 不會被歸零 ```c tasks[i++](&arg); ``` ```diff +static int initial_task_index; void schedule(void) { - static int i; - + int i = initial_task_index; setjmp(sched); ``` #### 監聽鍵盤事件 ###### 修改終端機設定 1. c_lflag 中的 ECHO (預設是啟用)控制鍵盤的輸入要不要顯示在終端機，我希望 ctrl-P 不要顯示，所以要禁用 2. c_lflag 中的 ICANON (預設是啟用)稱為 canonical mode ，表示緩衝區要一行一行 (有換行)才被傳給程式，我希望按下 ctrl-P 馬上就有反應，不希望還要按 Enter ，所以要禁用 3. c_lflag 中的 IXON (預設是啟用)對 ctrl-Q 做暫停以及其他組合鍵，不會被傳給程式，因為我需要對 ctrl-P ctrl-Q 做自定義處理一定要傳給程式，所以要禁用 ```c raw.c_lflag &= ~(ECHO | ICANON); raw.c_iflag &= ~(IXON); ``` 設定 Non-Blocking Mode 在讀取時不會因為沒有數據可以讀而卡在那邊等待，當我沒有按任何組合鍵時， keyboard_listen 不應該卡住而是直接 task_switch ```c fcntl(STDIN_FILENO, F_SETFL, fcntl(STDIN_FILENO, F_GETFL) | O_NONBLOCK); ``` ```c /* 目前設定 */ fcntl(STDIN_FILENO, F_GETFL) /* 目前設定 + Non-Blocking */ fcntl(STDIN_FILENO, F_GETFL) | O_NONBLOCK /* 儲存設定 x */ fcntl(STDIN_FILENO, F_SETFL, <setting>); ```