2025q1 Homework1 (lab0)

# 2025q1 Homework1 (lab0) contributed by < [`EricccTaiwan`](https://github.com/EricccTaiwan) > {%hackmd NrmQUGbRQWemgwPfhzXj6g %} ## 開發環境 ```shell $ gcc --version gcc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0 Copyright (C) 2023 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. $ lscpu Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Address sizes: 39 bits physical, 48 bits virtual Byte Order: Little Endian CPU(s): 8 On-line CPU(s) list: 0-7 Vendor ID: GenuineIntel Model name: Intel(R) Core(TM) i5-8265U CPU @ 1.60GHz CPU family: 6 Model: 142 Thread(s) per core: 2 Core(s) per socket: 4 Socket(s): 1 Stepping: 11 CPU(s) scaling MHz: 92% CPU max MHz: 3900.0000 CPU min MHz: 400.0000 BogoMIPS: 3600.00 Virtualization: VT-x L1d cache: 128 KiB L1i cache: 128 KiB L2 cache: 1 MiB L3 cache: 6 MiB NUMA node0 CPU(s): 0-7 ``` 在作業開發中，我使用 [oh-my-zsh](https://ohmyz.sh/) 做為主要 Shell 環境，有許多實用的功能(e.g. `zsh-autosuggestions`)和美觀的 UI 界面！ ## 實作 queue.c ### `q_new` > Commit [7446f52](https://github.com/EricccTaiwan/lab0-c/commit/7446f527cfa29d62dda76eb48c87cd888d042314) 宣告一個指標 `new_qhead`，其型別為 `struct list_head *`，並使用 `malloc` 分配 `sizeof(struct list_head)` 大小的記憶體空間。使用 `INIT_LIST_HEAD` 初始化新分配的記憶體，確保其被正確設置為一個空的鏈結串列。 ### `q_free` > Commit [0e45a1f](https://github.com/EricccTaiwan/lab0-c/commit/0e45a1f0ba34deae2c81360b4c784ad70cf0f816) 用 `list_for_each_safe` 走訪鏈結串列，並依序從鏈結串列中移除和釋放記憶體空間。起初沒有 `list_del` 下，`make test`是沒有問題的 (現在也是沒有問題)，當下沒有多想，越往下寫越對於 `free` 感到疑惑，`free` 可以在 `harness.h`中看到被巨集定義成 `test_free` ，點進去看能發現 `test_free` 其實是在釋放 `block_element_t` ( `malloc` 也是分配 `block_element_t` )，可以看到下方程式碼， `test_free` 是將欲釋放的 `block_element_t` 移出其所在的鏈結串列。但我這邊的疑惑是，`free` 並沒有將 `list_head` 節點移出其所在的佇列中，為什麼能還順利的釋放？ TBD... ```c /* Unlink from list */ block_element_t *bn = b->next; block_element_t *bp = b->prev; if (bp) bp->next = bn; else allocated = bn; if (bn) bn->prev = bp; free(b); ``` ```diff void q_free(struct list_head *head) { if (!head) return; struct list_head *cur, *tmp; list_for_each_safe (cur, tmp, head) { + list_del(cur); free(list_entry(cur, element_t, list)->value); free(list_entry(cur, element_t, list)); } free(head); } ``` ### `q_insert_head` & `q_remove_head` > Commit [a4cdde9 ](https://github.com/EricccTaiwan/lab0-c/commit/a4cdde975857b9bd544924fd61ad90beb8dd6542) 使用 `malloc` 分配大小為 `element_t` 的記憶體空間，並利用 C 標準函式庫的 `strdup` 複製字串，將其存入 `element_t` 的 `value` 成員中。接著，透過 `INIT_LIST_HEAD` 初始化鏈結節點，確保其鏈結狀態正確，最後使用 `list_add` 或 `list_add_tail`，分別將新建立的 `element` 插入至佇列的首端或尾端。 ### `q_remove_head` & `q_remove_tail` > Commit [2f3213a](https://github.com/EricccTaiwan/lab0-c/commit/2f3213a3c912f2a8bca86720ee1d1bf9e6c769f2) 使用 C 標準函式庫的 `snprintf` 來安全地將刪除的 `element_t->value` 複製到 `sp` ，避免緩衝區溢位 (buffer overflow)，並確保字串正確終止 (`\0`)。 ### `q_size` > Commit [4769b2c](https://github.com/EricccTaiwan/lab0-c/commit/4769b2c07d3070e3804a128ca0e61535de5ebb18) 利用 `list_for_each` 走訪鏈結串列並計算節點個數。 ### `q_delete_mid` > Commit [551ce22](https://github.com/EricccTaiwan/lab0-c/commit/551ce2285c5c050abbbf71774660d231877f2dd7) 利用快慢指針找出中間的節點，從其鏈結串列中移除和釋放記憶體空間。 ### `q_delete_dup` > Commit [66e7e15](https://github.com/EricccTaiwan/lab0-c/commit/66e7e15f9bf9268ff0c432e40d413959679912a0) > 使用 `list_for_each_safe` 來走訪整個鏈結串列，確保在刪除節點時不會影響迴圈的執行。如果當前節點的`value`與下一個節點的`value`相同，則刪除當前節點，並標記 `dup=true` 。在下一次迴圈中，續檢查並刪除所有與該值相同的節點，直到所有相同的節點都被移除。 ### `q_swap` > Commit [bbf456c](https://github.com/EricccTaiwan/lab0-c/commit/bbf456c5782da516954e72178d92a46ffd9eb5c7) 使用 `list_for_each_safe` 來確保在修改結構時的安全性，並對鏈結串列中的相鄰節點進行兩兩交換。調整指標以維持正確的鏈結結構，確保遍歷時不影響整體鏈結的完整性。 ```c void q_swap(struct list_head *head) { if (!head || list_empty(head)) return; struct list_head *cur, *tmp; list_for_each_safe (cur, tmp, head) { // 略: 兩兩節點 next, prev指向反轉 } return; } ``` 在與[森哥]討論後，我們發現 `q_swap` 本質上與 `q_reverseK(head, 2)` 是相同的，當 `k=2` 時， `q_reverseK` 也會實現相同的兩兩節點交換效果。然而，目前尚未深入分析直接實作兩兩交換與呼叫 `q_reverseK` 來達成相同效果之間的優缺點，這部分仍待進一步探討。 TBD... ```c void q_swap(struct list_head *head) { if (!head || list_empty(head)) return; q_reverseK(head, 2); return; } ``` ### `q_reverse` > Commit [12dc9a5](https://github.com/EricccTaiwan/lab0-c/commit/12dc9a509a7d256de1b10c38986e7589b9611d6d) 使用 `list_for_each_safe` 來確保在修改結構時的安全性，並將當前節點的指針 `next` 和 `prev`變換方向，當走訪結束時，再將 `head` 的 `next` 和 `prev`依序變換方向即可。同`q_swap`，`q_reverse` 本質上與 `q_reverseK(head, q_size(head))` 無異，優缺點仍待進一步討論。 ```c void q_reverse(struct list_head *head) { if (!head || list_empty(head)) return; q_reverseK(head, q_size(head)); return; } ``` ### `q_reverseK` > Commit [9cc973e](https://github.com/EricccTaiwan/lab0-c/commit/9cc973edbdecd5b6c4d78ac5f81554ea426bdd3f) 透過一個 `for` 迴圈，程式會每次取出 `k` 個節點進行處理，包括將這些節點從原鏈結串列中切割出來、反轉 (`q_reverse`)，然後將結果串接至新的頭節點 `new_head`。迴圈中止條件是當迴圈變數 `i` 超過 `size - k` 時停止執行，確保每次處理的區間包含 `k` 節點。當剩餘節點不足 `k` 個時，不會再執行反轉操作，迴圈就會結束。最終，已反轉的節點會被拼接回原始鏈結串列，完成整個處理流程。 ```diff - int time = size / k; - for(int i = 0;i < k ; i++){ + for (int i = 0; i <= size - k; i += k) { int j = 0; list_for_each (tail, head) { if (j >= k) break; j++; } list_cut_position(&tmp, head, tail->prev); q_reverse(&tmp); list_splice_tail_init(&tmp, &new_head); } list_splice_init(&new_head, head); ``` ### `q_sort`, `merge_sort` & `merge_two_list` > Commit [855dffa](https://github.com/EricccTaiwan/lab0-c/commit/855dffa4d54a12dc9b4595a9069eeabd533f6f21) 起初用quicksort實做，但在進行`make test`中發現，`q_sort`要求必須為**stable sort**，因此採用merge sort來實踐，`q_sort`會叫輔助函數`static void merge_sort()`和`static void merge_two_list()`，使用`static`是因為這兩個函數僅供`q_sort`，為了清楚理解`static`的定義，翻閱C99規格書， #### C99 (6.2.1.4) > ... If the declarator or type specifier that declares the identifier appears outside of any block or list of parameters, the identifier has file scope, which terminates at the end of the translation unit. ... #### C99 (6.2.2.3) > If the declaration of a file scope identifier for an object or a function contains the storage-class specifier static, the identifier has internal linkage. 由此可知，static的作用域為文件作用域 (file scope) ，即該標識符 (identifier) 的作用範圍僅限於當前翻譯單元 (translation unit) ，並且**內部連結 (internal linkage) 確保該標識符無法被其他翻譯單元引用**。在實做`merge_sort`中，透過快慢指針找出中間的節點，使用linux API進行list的切割成，但卻在靜態分析中，遇到了下方的警告，因為對於 `const` 也是一知半解，所以一樣去翻閱C99規格書 ```diff struct list_head *mid = head; // Variable 'fast' can be declared as pointer to const [constVariablePointer] - struct list_head *fast = head->next; - for (; fast != head && fast->next != head; fast = fast->next->next) - mid = mid->next; + const struct list_head *fast = (const struct list_head *) head->next; + while (fast != (const struct list_head *) head && + fast->next != (const struct list_head *) head) { + mid = mid->next; + fast = (const struct list_head *) fast->next->next; + } ``` #### C99 (6.7.3.5) > If an attempt is made to modify an object defined with a const-qualified type through use of an lvalue with non-const-qualified type, the behavior is undefined. 這段標準表示，當某個變數被 `const` 修飾後，若透過非 `const` 型別的左值 (lvalue) 來修改該變數，則行為是未定義的 (undefined behavior)。也就是說，被 `const` 修飾的變數，只能用來讀取，不能透過非 `const` 型別的指標來修改其內容。在這段程式碼中，`fast`只是用來走訪整個雙向鏈結串列，不需要修改任何`list_head`的內容，用`const`限制不能修改其值，且因為`head->next`的型別為`struct list_head *`，因此需要強制轉型。 ### `q_ascend` & `q_descend` > Commit [56cbacb](https://github.com/EricccTaiwan/lab0-c/commit/56cbacbc26dcb9cf30399bb5e726fd681c4fdd78) 實作升冪/降冪排序的想法是，透過從鏈結串列的尾端 `head->prev` 開始往 `head` 的方向移動，對於升冪排序，用 `minVal` 去維持當前的最小值，若當前節點大於 `minVal` ，將此節點移出鏈結串列並釋放其空間，反之，保留節點並更新 `minVal` ; 對於降冪排序也是一樣的邏輯，用 `maxVal` 去維持當前最大值。但卻在靜態分析中，遇到了下方的警告，因為對於 `const` 也是一知半解，所以一樣去翻閱C99規格書 ```diff // Variable 'elem' can be declared as pointer to const [constVariablePointer] - element_t *elem = list_entry(cur, element_t, list); + const element_t *elem = list_entry(cur, element_t, list); // Variable 'minVal' can be declared as pointer to const [constVariablePointer] - char *minVal = elem->value; + const char *minVal = elem->value; ``` #### C99 (6.7.3.5) > If an attempt is made to modify an object defined with a const-qualified type through use of an lvalue with non-const-qualified type, the behavior is undefined. 這段標準表示，當某個變數被 `const` 修飾後，若透過非 `const` 型別的左值 (lvalue) 來修改該變數，則行為是未定義的 (undefined behavior)。也就是說，被 `const` 修飾的變數，只能用來讀取，不能透過非 `const` 型別的指標來修改其內容。因此，在這段程式碼中，指針 `elem` 和 `minVal` 都不應該被修改，而僅用來讀取鏈結串列的內容，所以它們可以安全地被宣告為 `const`，以強化程式的安全性與可讀性。 ### `q_merge` > Commit [6268965](https://github.com/EricccTaiwan/lab0-c/commit/6268965d75ad1ee02bd751748f312e4f9c8d0ddd) 因為需要兩兩合併要求出 `ceil(size/2))` 的值，為了避免使用除法運算，因此下方的程式碼透過位元操作去計算向上取整。 ```diff - int m = size % 2 ? size/2+1 : size/2 ; + int m = (size >> 1) + (size & 1); ``` 透過 merge sort 的概念，逐步合併兩個佇列的方式，不斷減少佇列的數量，直到只剩下最終合併完成的佇列，最後若處理是否需要反轉 ```diff static void merge_sort(struct list_head *head) { ... for (int i = 0; i < m; i++) { queue_contex_t *first = list_first_entry(head, queue_contex_t, chain); queue_contex_t *second = list_entry(first->chain.next, queue_contex_t, chain); while (!list_empty(first->q) && !list_empty(second->q)) { - merge_two_list(head, first->q, second->q); + merge_two_list(first->q, first->q, second->q); list_move_tail(&second->chain, head); first = list_entry(first->chain.next, queue_contex_t, chain); second = list_entry(first->chain.next, queue_contex_t, chain); } ... } ``` 為了避免新增一個函數僅供`q_merge`使用，因此調用了先前的 `merge_two_list` ，但在呼叫時出現了以下的問題， ```c static void merge_two_list(struct list_head *head, struct list_head *left, struct list_head *right){略}; static void merge_sort(struct list_head *head) { // 找出中間節點，分割鏈結串列 (略) merge_sort(&left); merge_sort(&right); merge_two_list(head, &left, &right); } void q_sort(struct list_head *head, bool descend) { if (!head || list_empty(head)) return; merge_sort(head); if (descend) q_reverse(head); return; } ``` 先回顧`q_sort`的程式碼，在 `merge_sort` 中調用的 `merge_two_list` 是將 `left` 和 `right` 的**已排序後的**鏈結串列合併到 `head` 後方，最終 `head` 就是指向排序後的整個鏈結串列。由此可知，`merge_two_list` 的第一個參數應該是目標位置，也就是合併後的結果會存放在哪個鏈結串列裡。因此，在`merge_sort`的 `while` 迴圈中，目標是要把 `second->q` 的節點合併到 `first->q` 裡，第一個參數就應該是 `first->q`，而不是 `head`。與 [charliechiou](https://hackmd.io/@charliechiu/linux2025-homework1) 討論後，這邊用5組鏈結串列 a, b, c, d, e 作為例子，為了方便解釋，預設其大小已按照升冪排序，`( )`表示排序後的鏈結串列， ``` i=0 : 第一次 while : (a,b), c, d, e 第二次 while : (a,b), (c, d), e -> 跳出 while 迴圈 i=1 : 第一次 while : (a, b, c, d), e -> 跳出 while 迴圈 i=2 : 第一次 while : (a, b, c, d, e) -> 跳出 while 迴圈 i=3 : 跳出 for 迴圈 ``` ## Valgrind + 自動化測試程式 > Commit [f4f99dc](https://github.com/EricccTaiwan/lab0-c/commit/f4f99dca9f69e43649e3efebb17ae1307c1cea57) > 運用 Valgrind 排除 qtest 實作的記憶體錯誤，並透過 Massif 視覺化 "simulation" 過程中的記憶體使用量，需要設計對應的實驗 ```Shell $ # install massif-visualizer $ sudo apt install massif-visualizer -y ``` 可以新增下方的程式到 `./Makefile` 中，若想測試其他的 trace ，把 ` traces/trace-massif.cmd` 改掉就好，最後會用 massif-visualizer 輸出 `.massif.out` 。 ```diff +massif: qtest + valgrind --tool=massif --massif-out-file=.massif.out ./$< -v 3 -f traces/trace-massif.cmd + massif-visualizer .massif.out + ms_print .massif.out ``` 新增 `traces/trace-massif.cmd`，因為作業要求提及到「視覺化 "simulation" 過程中的記憶體使用量」，因此內容直接複製 `trace-17-complexity` 來改就好，以下以 `q_insert_tail` 為例進行分析， ```shell # Test if time complexity of 'q_insert_tail' is constant option simulation 1 it option simulation 0 ``` 開啟終端機輸入， ```Shell $ make massif ``` 輸出如下， ![image](https://hackmd.io/_uploads/B12ATvViyg.png) ```shell 87.51% (1,131,446B) (heap allocation functions) malloc/new/new[], --alloc-fns, etc. ->86.37% (1,116,641B) 0x10F96B: alloc (harness.c:146) | ->86.37% (1,116,641B) 0x10FABE: test_malloc (harness.c:176) | ->49.39% (638,592B) 0x1100F8: q_insert_head (queue.c:37) | | ->49.39% (638,592B) 0x110C3B: measure (constant.c:100) | | ->49.39% (638,592B) 0x1111E3: doit (fixture.c:172) | | ->49.39% (638,592B) 0x1111E3: test_const (fixture.c:203) | | ->49.39% (638,592B) 0x111320: is_insert_tail_const (fixture.c:216) | | ->49.39% (638,592B) 0x10CCD8: queue_insert (qtest.c:192) | | ->49.39% (638,592B) 0x10D0F0: do_it (qtest.c:288) | | ->49.39% (638,592B) 0x10E7AD: interpret_cmda (console.c:181) | | ->49.39% (638,592B) 0x10ED72: interpret_cmd (console.c:201) | | ->49.39% (638,592B) 0x10F001: cmd_select (console.c:593) | | ->49.39% (638,592B) 0x10F8DF: run_console (console.c:673) | | ->49.39% (638,592B) 0x10DB9C: main (qtest.c:1459) | | | ->36.97% (477,929B) 0x10FC7F: test_strdup (harness.c:231) | | ->36.96% (477,881B) 0x110114: q_insert_head (queue.c:41) | | | ->36.96% (477,881B) 0x110C3B: measure (constant.c:100) | | | ->36.96% (477,881B) 0x1111E3: doit (fixture.c:172) | | | ->36.96% (477,881B) 0x1111E3: test_const (fixture.c:203) | | | ->36.96% (477,881B) 0x111320: is_insert_tail_const (fixture.c:216) | | | ->36.96% (477,881B) 0x10CCD8: queue_insert (qtest.c:192) | | | ->36.96% (477,881B) 0x10D0F0: do_it (qtest.c:288) | | | ->36.96% (477,881B) 0x10E7AD: interpret_cmda (console.c:181) | | | ->36.96% (477,881B) 0x10ED72: interpret_cmd (console.c:201) | | | ->36.96% (477,881B) 0x10F001: cmd_select (console.c:593) | | | ->36.96% (477,881B) 0x10F8DF: run_console (console.c:673) | | | ->36.96% (477,881B) 0x10DB9C: main (qtest.c:1459) | | | | | ->00.00% (48B) in 1+ places, all below ms_print's threshold (01.00%) | | | ->00.01% (120B) in 1+ places, all below ms_print's threshold (01.00%) | ->01.15% (14,805B) in 13 places, all below massif's threshold (1.00%) ``` 可以看到其中 heap 的使用資源佔用了 86.37 % ，而 `test_strdup` 佔了其中的 36.97%。這次把 `test_malloc` 和 `test_free` 註解掉，重新測試， ![image](https://hackmd.io/_uploads/rk31W_Niye.png) ```shell 51.01% (334,262B) (heap allocation functions) malloc/new/new[], --alloc-fns, etc. ->36.61% (239,856B) 0x1100F8: q_insert_head (queue.c:37) | ->36.61% (239,856B) 0x110C2B: measure (constant.c:100) | ->36.61% (239,856B) 0x1111D3: doit (fixture.c:172) | ->36.61% (239,856B) 0x1111D3: test_const (fixture.c:203) | ->36.61% (239,856B) 0x111310: is_insert_tail_const (fixture.c:216) | ->36.61% (239,856B) 0x10CCD8: queue_insert (qtest.c:192) | ->36.61% (239,856B) 0x10D0F0: do_it (qtest.c:288) | ->36.61% (239,856B) 0x10E7AD: interpret_cmda (console.c:181) | ->36.61% (239,856B) 0x10ED72: interpret_cmd (console.c:201) | ->36.61% (239,856B) 0x10F001: cmd_select (console.c:593) | ->36.61% (239,856B) 0x10F8DF: run_console (console.c:673) | ->36.61% (239,856B) 0x10DB9C: main (qtest.c:1459) | ->12.14% (79,561B) 0x4A0335E: strdup (strdup.c:42) | ->12.14% (79,553B) 0x11010C: q_insert_head (queue.c:41) | | ->12.14% (79,553B) 0x110C2B: measure (constant.c:100) | | ->12.14% (79,553B) 0x1111D3: doit (fixture.c:172) | | ->12.14% (79,553B) 0x1111D3: test_const (fixture.c:203) | | ->12.14% (79,553B) 0x111310: is_insert_tail_const (fixture.c:216) | | ->12.14% (79,553B) 0x10CCD8: queue_insert (qtest.c:192) | | ->12.14% (79,553B) 0x10D0F0: do_it (qtest.c:288) | | ->12.14% (79,553B) 0x10E7AD: interpret_cmda (console.c:181) | | ->12.14% (79,553B) 0x10ED72: interpret_cmd (console.c:201) | | ->12.14% (79,553B) 0x10F001: cmd_select (console.c:593) | | ->12.14% (79,553B) 0x10F8DF: run_console (console.c:673) | | ->12.14% (79,553B) 0x10DB9C: main (qtest.c:1459) | | | ->00.00% (8B) in 1+ places, all below ms_print's threshold (01.00%) | ->01.47% (9,656B) 0x10E345: malloc_or_fail (report.c:215) | ->01.25% (8,216B) 0x10EA4E: push_file (console.c:460) | | ->01.25% (8,216B) 0x10F7B9: run_console (console.c:651) | | ->01.25% (8,216B) 0x10DB9C: main (qtest.c:1459) | | | ->00.22% (1,440B) in 1+ places, all below ms_print's threshold (01.00%) | ->00.79% (5,189B) in 1+ places, all below ms_print's threshold (01.00%) ``` 可以看到 heap allocation 的使用量從 87.51% (1,131,446B) 降至 51.01% (334,262B) ，且使用 `strdup` 僅佔用了 12.14% (79,561B) ，相比 `test_strdup` 佔用的 36.97% (477,929B) 少了很多，但也可以發現有 `malloc_or_fail` 產生，至於原因為何，TBD。 ## 整合網頁伺服器 > 參考 [charliechiou](https://hackmd.io/@charliechiu/linux2025-homework1) 在 console.c 中 `do_web()` 是一個開啟 Web 伺服器的函式，參考作業說明 [lab0 \(C)](https://hackmd.io/@sysprog/linux2025-lab0/%2F%40sysprog%2Flinux2025-lab0-c) 的操作，開啟兩個終端機 ```shell $ ./qtest # Terminal 1 cmd> web listen on port 9999, fd is 3 ``` 在終端機 2 中輸入 `curl http://localhost:9999/new` ，可以預期在終端機 1 呼叫 `q_new()`，此時打開瀏覽器輸入 `http://localhost:9999/new` ，卻在終端機 1 中看到下方的錯誤訊息， ```shell cmd> Unknown command 'favicon.ico' cmd> l = [] ``` ### 解決 `favicon.ico` > Commit [f2e261d](https://github.com/EricccTaiwan/lab0-c/commit/f2e261d8ece7186c157684bc49f04f90f3fe7d35) 為了解決 `favicon.ico` 所產生的問題，根據老師的步驟修改，同時我加上了 `"<h1>Good Job, Eric!</h1>"`，可以預期成功用瀏覽器 request 後，應該要出現 Good Job, Eric! 的字眼。 ```diff char *p = web_recv(web_connfd, &clientaddr); - char *buffer = "HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\n\r\n"; + char *buffer = + "HTTP/1.1 200 OK\r\n%s%s%s%s%s%s" + "Content-Type: text/html\r\n\r\n" "<html><head><style>" + "body{font-family: monospace; font-size: 13px;}" + "td {padding: 1.5px 6px;}" + "</style><link rel=\"shortcut icon\" href=\"#\">" + "<h1>Good Job, Eric!</h1>" + "</head><body><table>\n"; web_send(web_connfd, buffer); ``` 修正後，重新在瀏覽器輸入 `curl http://localhost:9999/new`，便能成功透過瀏覽器發出 request 和見到下方的字樣。仔細觀察終端機的輸出，Chrome 發送了兩次 request ，因此可以看到終端機輸出了兩次的`l = []`，至於如何解決此問題， TBD 。 ![image](https://hackmd.io/_uploads/BkaP36fikx.png) ### 判斷 request 來源 > Commit [40d8775](https://github.com/EricccTaiwan/lab0-c/commit/40d8775b3bb4cb2db26d15a0bbce3ac5638f1499) 此時，在先前的終端機 2 中輸入 `curl http://localhost:9999/new` ，會顯示一長串的 html 檔案的內容。 ```shell $ curl http://localhost:9999/new # Terminal 2 <html><head><style>body{font-family: monospace; font-size: 13px;}td {padding: 1.5px 6px;}</style><link rel="shortcut icon" href="#"><h1>Good Job, Eric!</h1></head><body><table> ``` 可以由此判斷，網頁伺服器無法判斷 request 的來源，可以在 `parse_request` 中，把透過接收的 request 用終端機輸出，如下， ```shell $ # Request from curl User-Agent: curl/8.5.0 $ # Request from Chrome broswer User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/133.0.0.0 Safari/537.36 ``` 透過 request 的資訊可以看出 User-Agent 的差異，我們能以此判斷 request 是否來自 curl ，可以注意到我只有使用 Chrome 發出 request ，但回傳了一長串的 `User-Agent` ，其中的關鍵字包含 Mozilla 和 Safari ，即是 [User agent spoofing](https://en.wikipedia.org/wiki/User-Agent_header#User_agent_spoofing) ，簡言之 Chrome 的 `User-Agent` 包含了 Mozilla 和 Safari 是歷史相容性需求，同時瀏覽器可以透過 `User-Agent` 欺騙老舊的網站，讓老舊網站認為 Chrome 是支援的瀏覽器; 但好像沒法判別是 Mozilla 或是 Chrome 所發出的，益或是也無此必要性，TBD... 在 `parse_request` 儲存將 User-Agent 的資訊， ```diff typedef struct { char filename[512]; off_t offset; /* for support Range */ size_t end; + char user_agent[256]; /* User-Agent */ } http_request_t; ``` ```diff while (buf[0] != '\n' && buf[1] != '\n') { /* \n || \r\n */ ... + if (strncmp(buf, "User-Agent:", 11) == 0) { + strncpy(req->user_agent, buf + 12, sizeof(req->user_agent) - 1); + req->user_agent[sizeof(req->user_agent) - 1] = '\0'; + } } ``` 並在 `web_recv` 中判斷 user_agent 中的資訊是否包含 `"curl"`， ```diff - char *web_recv(int fd, struct sockaddr_in *clientaddr) + char *web_recv(int fd, struct sockaddr_in *clientaddr, int *is_curl) { http_request_t req; parse_request(fd, &req); + *is_curl = 0; + if (strstr(req.user_agent, "curl") != NULL) + *is_curl = 1; char *p = req.filename; ... } ``` 最後於 `web_eventmux` 中對於判斷 request 來源，進行了下方的修改， ```diff int web_eventmux(char *buf) { fd_set listenset; + int is_curl = 0; ... if (server_fd > 0 && FD_ISSET(server_fd, &listenset)) { ... - char *p = web_recv(web_connfd, &clientaddr); + char *p = web_recv(web_connfd, &clientaddr, &is_curl); - char *buffer = - "HTTP/1.1 200 OK\r\n%s%s%s%s%s%s" - "Content-Type: text/html\r\n\r\n" - "<html><head><style>" - "body{font-family: monospace; font-size: 13px;}" - "td {padding: 1.5px 6px;}" - "</style><link rel=\"shortcut icon\" href=\"#\">" - "<h1>Good Job, Eric!</h1>" - "</head><body><table>\n"; + char *buffer = NULL; + if (is_curl) { + buffer = + "HTTP/1.1 200 OK\r\n" + "Content-Type: text/plain\r\n\r\n" + "Good Job, Eric! Request from curl.\n"; + } else { + buffer = + "HTTP/1.1 200 OK\r\n%s%s%s%s%s%s" + "Content-Type: text/html\r\n\r\n" + "<html><head><style>" + "body{font-family: monospace; font-size: 13px;}" + "td {padding: 1.5px 6px;}" + "</style><link rel=\"shortcut icon\" href=\"#\">" + "<h1>Good Job, Eric! Request from browser.</h1>" + "</head><body><table>\n"; + } web_send(web_connfd, buffer); ... } ``` 最後用終端機來驗證，如下圖呈現，但對於瀏覽器為何會發出兩次的 reqeust ， TBD 。 * Request from curl ![image](https://hackmd.io/_uploads/ByFiTRfs1g.png) * Request from browser ![image](https://hackmd.io/_uploads/HkUyRCMoke.png) [User-Agent Switcher and Manager](https://chromewebstore.google.com/detail/user-agent-switcher-and-m/bhchdcejhohfmigjafbampogmaanbfkg?hl=zh_tw&pli=1) 的擴充功能，可以任意的切換 User-Agent 內容，因此我在 Chrome 中，輸入 curl 的 User-Agent string，或是在原本的 User-Agent 後方加上 "Curl" ，都可以看到下方的結果。 ![image](https://hackmd.io/_uploads/HJ3MZ-Nokl.png) User-Agent spoofing 的優點已在上述提及，至於會有什麼樣的問題? TBD 。 ### 研讀 `select()` > 解釋 select 系統呼叫在本程式的使用方式，並分析 console.c 的實作，說明其中運用 CS:APP RIO 套件的原理和考量點。可對照參考 CS:APP 第 10 章重點提示上面的 code 改完後，才發現並沒有詳讀 `select` ，參考 [Integrate linenoise with web server #162](https://github.com/sysprog21/lab0-c/pull/162) ，看到下方的程式碼片段，可以理解 `web_eventmux()` 在處理來自伺服器和終端機的 mutex 問題，因此查閱了 [select(2)](https://man7.org/linux/man-pages/man2/select.2.html)，先去理解各個 interface 的用意。 #### `select()` > allows a program to monitor multiple file descriptors, waiting until one or more of the file descriptors become "ready" for some class of I/O operation (e.g., input possible). A file descriptor is considered ready if it is possible to perform a corresponding I/O operation (e.g., read(2), or a sufficiently small write(2)) without blocking. `select()`允許程式同時監聽多個檔案描述符（file descriptors, FDs），並在其中至少有一個變為「可讀」、「可寫」或發生異常時返回，避免程式進入不必要的阻塞狀態。 #### `fd_set` > A structure type that can represent a set of file descriptors. According to POSIX, the maximum number of file descriptors in an fd_set structure is the value of the macro FD_SETSIZE. `fd_set` 是用來存放文件描述 (fd, file descriptors)集合的結構體，通常用在 `select()`中，來監聽多個 fd 的狀態。 line 3 即透過 `fd_set` 宣告 `listenset`。 #### `FD_ZERO()` > This macro clears (removes all file descriptors from) set. It should be employed as the first step in initializing a file descriptor set. `FD_ZERO()` 是一個用來初始化 `fd_set` 的巨集，作用是清空 `fd_set`， line 4 便是初始化 `listenset`。 #### `FD_SET()` > This macro adds the file descriptor fd to set. Adding a file descriptor that is already present in the set is a no-op, and does not produce an error. `FD_SET()` 是用來將指定的FD加入 `fd_set` 的巨集，如果 line 5 就是將標準輸入（`STDIN`）加入 `fd_set`，讓 `select()` 監聽它是否可讀; 如果 `server_fd` 存在，江 `server_fd` 加入 `fd_set`。 `max_fd` 則是計算所有 FD 的最大值。 #### `FD_ISSET()` > `select()` modifies the contents of the sets according to the rules described below. After calling select(), the `FD_ISSET()` macro can be used to test if a file descriptor is still present in a set. `FD_ISSET()` returns nonzero if the file descriptor fd is present in set, and zero if it is not. 用來檢查 `fd_set` 內某個 FD 是否還在 `select()` 返回的結果中。 #### `FD_CLR` > This macro removes the file descriptor fd from set. Removing a file descriptor that is not present in the set is a no-op, and does not produce an error. `FD_SET()` 是從 `fd_set` 來移除指定的FD的巨集，讓 `select()` 不再監聽。 line 17 `if (server_fd > 0 && FD_ISSET(server_fd, &listenset))` 檢查 `server_fd` 是否有新的連線，如果有，用 `FD_CLR` 先移除，避免影響下次的 `select()` ，接著就是使用上面新增的 `is_curl` 來判斷此連線的來源。 ```c= int web_eventmux(char *buf) { fd_set listenset; FD_ZERO(&listenset); FD_SET(STDIN_FILENO, &listenset); int max_fd = STDIN_FILENO; if (server_fd > 0) { FD_SET(server_fd, &listenset); max_fd = max_fd > server_fd ? max_fd : server_fd; } int result = select(max_fd + 1, &listenset, NULL, NULL, NULL); if (result < 0) return -1; int is_curl = 0; if (server_fd > 0 && FD_ISSET(server_fd, &listenset)) { FD_CLR(server_fd, &listenset); struct sockaddr_in clientaddr; socklen_t clientlen = sizeof(clientaddr); ... ``` ### [console.c](https://github.com/sysprog21/lab0-c/blob/master/console.c) 實做 & [RIO 套件](https://csapp.cs.cmu.edu/2e/ch10-preview.pdf) > 可對照參考 [CS:APP 第 10 章重點提示](https://hackmd.io/@sysprog/CSAPP-ch10) `console.c` 的目的為實做 command line interface (CLI) ，允許從標準輸入 `stdin` 或網路 request 接收指令。其中 `select()` 負責同時監聽標準輸入 (`STDIN_FILENO`) 和網路請求 (`web_fd`)，以及當其中一個 FD 可讀時，執行對應的處理。 #### `cmd_select()` 在 `cmd_select()` 中，當 `stdin` 有輸入時，系統會執行 `readline()` 來讀取輸入內容，然後交由 `interpret_cmd()` 解析並執行相應的命令。然而，傳統 `readline()` 每次讀取輸入時可能會頻繁呼叫 `read()`，這會導致效能低下，特別是當輸入來源是檔案或 socket 時，每個字元都可能觸發 `read()` 系統呼叫，使 I/O 成本大幅增加。因此，為了提升效能並減少 `read()` 次數，`console.c` 引入 RIO（Robust I/O）緩衝機制來優化 `readline()`，透過 `rio_read()` 來管理讀取流程。 #### `readline()` 當 `buf_stack->count == 0` 時，`readline()` 才會執行 `read()`： * 一次性讀取 8192 bytes (`RIO_BUFSIZE`)。 * 之後的讀取直接來自 buffer，不再執行 `read()`，直到 buffer 耗盡。這樣避免了每讀取一個字元就調用 read()，提升效能。 ```c if (buf_stack->count <= 0) { buf_stack->count = read(buf_stack->fd, buf_stack->buf, RIO_BUFSIZE); buf_stack->bufptr = buf_stack->buf; } ``` 如果 `readline()` 讀到 EOF，它會從 `buf_stack` 內彈出 (pop) 上一層 buffer，這讓 `readline()` 可以讀取不同來源的輸入，而不會頻繁執行 `read()` 。 ```c if (buf_stack->count <= 0) { pop_file(); } ``` ## 亂數 ### Fisher-Yates shuffle > Commit[02d154d](https://github.com/EricccTaiwan/lab0-c/commit/02d154db08bcffb8b3ae1831edc1a1240671bdee) > 參考 [willwillhi](https://hackmd.io/@willwillhi/lab0-2023#%E5%9C%A8-qtest-%E6%96%B0%E5%A2%9E-shuffle-%E5%91%BD%E4%BB%A4) 參考作業說明 [lab0 (D)](https://hackmd.io/@sysprog/linux2025-lab0/%2F%40sysprog%2Flinux2025-lab0-d#%E5%9C%A8-qtest-%E6%8F%90%E4%BE%9B%E6%96%B0%E7%9A%84%E5%91%BD%E4%BB%A4-shuffle) 實做 `q_shuffle` 和 [willwillhi](https://hackmd.io/@willwillhi/lab0-2023#%E5%9C%A8-qtest-%E6%96%B0%E5%A2%9E-shuffle-%E5%91%BD%E4%BB%A4) 對於 `queue.h` 和 `qtest.c` 的修改，因為 `queue.h` 是不能修改，因此在下方做上註記。 #### queue.h ```diff int q_merge(struct list_head *head, bool descend); + void q_shuffle(struct list_head *head); #endif /* LAB0_QUEUE_H */ ``` ### 測試腳本 > Commit [8484015](https://github.com/EricccTaiwan/lab0-c/commit/8484015be726de844ef1ccfcfd1c1bd2d8afc055) 參考作業說明 [lab0 (D)](https://hackmd.io/@sysprog/linux2025-lab0/%2F%40sysprog%2Flinux2025-lab0-d#%E6%B8%AC%E8%A9%A6%E7%A8%8B%E5%BC%8F) 新增測試腳本 `test_shuffle.py`。 ### 統計方法驗證參考[lab0 (D)](https://hackmd.io/@sysprog/linux2025-lab0/%2F%40sysprog%2Flinux2025-lab0-d#%E7%B5%B1%E8%A8%88%E6%96%B9%E6%B3%95%E9%A9%97%E8%AD%89-shuffle) > 補充: `test_shuffle.py` 會跑有點久，平均一次 13 分鐘左右 ```Shell $ python3 test_shuffle.py Expectation: 41666 Observation: {'1234': 41665, '1243': 41891, '1324': 41722, '1342': 41489, '1423': 41649, '1432': 41608, '2134': 41424, '2143': 41662, '2314': 41396, '2341': 41575, '2413': 41655, '2431': 41365, '3124': 41986, '3142': 41774, '3214': 41848, '3241': 41730, '3412': 42060, '3421': 41784, '4123': 41765, '4132': 41240, '4213': 41498, '4231': 41732, '4312': 41900, '4321': 41582} chi square sum: 22.20851533624538 ``` | | 觀察到的頻率 | 期待的頻率 | ${(O_i - E_i)^2 \over E_i}$ | | --------- | ------------ | ---------- | --------------------------- | | [1,2,3,4] | 41665 | 41666 | 0.0000240004 | | [1,2,4,3] | 41891 | 41666 | 1.2150194400 | | [1,3,2,4] | 41722 | 41666 | 0.0752652042 | | [1,3,4,2] | 41489 | 41666 | 0.7519080305 | | [1,4,2,3] | 41649 | 41666 | 0.0069361110 | | [1,4,3,2] | 41608 | 41666 | 0.0807372918 | | [2,1,3,4] | 41424 | 41666 | 1.4055584890 | | [2,1,4,3] | 41662 | 41666 | 0.0003840061 | | [2,3,1,4] | 41396 | 41666 | 1.7496279940 | | [2,3,4,1] | 41575 | 41666 | 0.1987471800 | | [2,4,1,3] | 41655 | 41666 | 0.0029040465 | | [2,4,3,1] | 41365 | 41666 | 2.1744587910 | | [3,1,2,4] | 41986 | 41666 | 2.4576393220 | | [3,1,4,2] | 41774 | 41666 | 0.2799404790 | | [3,2,1,4] | 41848 | 41666 | 0.7949887198 | | [3,2,4,1] | 41730 | 41666 | 0.0983055729 | | [3,4,1,2] | 42060 | 41666 | 3.7257236120 | | [3,4,2,1] | 41784 | 41666 | 0.3341813469 | | [4,1,2,3] | 41765 | 41666 | 0.2352277636 | | [4,1,3,2] | 41240 | 41666 | 4.3554936880 | | [4,2,1,3] | 41498 | 41666 | 0.6773868382 | | [4,2,3,1] | 41732 | 41666 | 0.1045456727 | | [4,3,1,2] | 41900 | 41666 | 1.3141650270 | | [4,3,2,1] | 41582 | 41666 | 0.1693467095 | | Total | | | 22.20851533624538 | $X^2$ = 22.20851533624538 本次實驗的排列組合有 $4!$ = 24種，所以自由度是 23，透過查尋[卡方分布表](https://www.obhrm.net/index.php/%E5%8D%A1%E6%96%B9%E5%88%86%E5%B8%83%E8%A1%A8_Chi-Square_Probabilities)，可以查到 P value 位於 0.1 ~ 0.9 的區間，因為 P value(0.1 ~ 0.9) > alpha (0.05) ，統計檢定的結果不拒絕虛無假說 ($H_0$) 也就是樣本資料不拒絕 shuffle 的結果遵循 Uniform distribution，因為沒有足夠證據推翻，可以看到下方的圖片， shuffle 的頻率是大致符合 Uniform distribution 的。 ![image](https://hackmd.io/_uploads/ryrN5VQjkg.png) ### 亂度 1. What is random? > 包含以下性質 : Unpredictability, Uniformity, Independence, 和 High Entropy 1. How to measure the randomness? > 用 $Entropy$ #### What is random 隨機（randomness）是一種無法預測的現象，通常具有以下關鍵特性： * 不可預測性（Unpredictability） * 如果一個事件是隨機的，那麼在沒有任何額外資訊的情況下，無法準確預測它的下一個結果。 * 例如，擲骰子時，理論上無法預測下一次擲出的數字。 * 均勻性（Uniformity） * 在理想的隨機情況下，每個可能結果的發生機率應該相等，例如均勻分佈（Uniform Distribution）。 * 以擲骰子為例，$1$ 到 $6$ 各個數字應該有相等的 $\frac{1}{6}$ 機率發生。 * 獨立性（Independence） * 事件之間應該是獨立的，即前一個事件的結果不應影響下一個事件的機率分佈。 * 例如，擲硬幣時，不論上一次結果是正面還是反面，下一次擲出的機率仍然是 50%。 * 高資訊熵（High Entropy） * 亂數應該具有最大的資訊熵（Entropy），因為這表示我們能從過去的數據中獲取的資訊最少（即最難預測）。 * 當所有事件的機率都相等時，Entropy 取最大值，即代表最隨機的狀態；若某些結果比其他結果更容易發生，則 Entropy 會降低 #### How to measure the randomness 透過 $Entropy$ 來測亂度。 1. Self-information * 每個隨機事件發生時所傳遞的資訊量 $I(x_i)$ 與該事件的機率 $P(x_i)$ 成反比，定義如下： $I(x_i)=log_b (\frac{1}{P(x_i)}) = -\log_b P(x_i)$ * 當事件發生的機率 $P(x_i)$ 越低時，其資訊量 $I(x_i)$ 就越高，因為這類事件更罕見，因此提供更多的資訊。 2. Entropy * 資訊熵 $S$ 是隨機變數所有可能結果的**平均資訊量**，定義為： $S = - \sum^{n}_{i=1} P(x_i) log_b P(x_i)=\sum^{n}_{i=1} P(x_i)I(x_i)$ * Entropy 衡量了隨機變數的不確定性，當所有可能事件的機率均等時，Entropy 取最大值，表示最難以預測的情況。 ### 實做 - ent 亂度安裝 ent，此工具 `ent` 可計算一輸入字串的Entropy。 ```shell $ sudo apt-get install ent ``` 根據作業說明，嘗試計算 linux 內建的 PRNG `/dev/random` ```shell $ head -c 10M /dev/random | ent Entropy = 7.999982 bits per byte. Optimum compression would reduce the size of this 10485760 byte file by 0 percent. Chi square distribution for 10485760 samples is 268.02, and randomly would exceed this value 27.54 percent of the times. Arithmetic mean value of data bytes is 127.5092 (127.5 = random). Monte Carlo value for Pi is 3.143029458 (error 0.05 percent). Serial correlation coefficient is 0.000093 (totally uncorrelated = 0.0). ``` 可以看到，因為 `/dev/random` 的內容隨機，因此每一個位元的資料量較大，同時 1 byte char 的大小為 $2^8$ ，所以最大的 entropy 是 $S = log_2(2^8) = 8$ ，對照上方的 `Entropy = 7.999982 bits per byte.`，與理論值無異。 ### 實做 - qtest 亂度在 ./qtest 中執行 `option entropy 1 `並搭配 `it RAND 10` 來計算每個字串的 shannon entropy ，其輸出結果如下 ```shell $ ./qtest cmd> new l = [] cmd> option entropy 1 cmd> it RAND 10 l = [wewgqrne(29.69%) akrfem(29.69%) ljwxio(29.69%) quvmmsno(32.81%) tnftbit(25.00%) wccwf(17.19%) zewuflwh(32.81%) msbyjtb(29.69%) pbrjq(26.56%) eckrxnmys(37.50%)] ``` 這些字串的熵是透過 `shannon_entropy.c` 所計算的，因此查看程式碼， ```c ... for (uint32_t i = 0; i < BUCKET_SIZE; i++) { if (bucket[i]) { uint64_t p = bucket[i]; p *= LOG2_ARG_SHIFT / count; entropy_sum += -p * log2_lshift16(p); } } entropy_sum /= LOG2_ARG_SHIFT; return entropy_sum * 100.0 / entropy_max; ``` 這邊的計算遵循著先前計算 $S$ 的公式，並正規化輸出。針對第一筆 `wewgqrne(29.69%)` 進行分析， | char | times | p(x) | $log_2\ p(x)$ | $-p(x)log_2\ p(x)$ | | ---- | ----- | ----------- | ------------- | ------------------ | | w | 2 | 2/8 = 0.25 | -2 | 0.5 | | e | 2 | 2/8 = 0.25 | -2 | 0.5 | | g | 1 | 1/8 = 0.125 | -3 | 0.375 | | q | 1 | 1/8 = 0.125 | -3 | 0.375 | | r | 1 | 1/8 = 0.125 | -3 | 0.375 | | n | 1 | 1/8 = 0.125 | -3 | 0.375 | | $S_{sum}$ | | | | 2.5 | $S_{sum} = 2.5$ ，計算 1byte 編碼下的 entropy = $\frac{2.5}{8} \times 100\% = 31.25\%$，實際值只有 $5\%$ 的誤差。 ## 論文閱讀〈Dude, is my code constant time?〉 > [論文](https://eprint.iacr.org/2016/1123.pdf) > [oreparaz/dudect](https://github.com/oreparaz/dudect) > 參考 [I-HSIN Cheng](https://hackmd.io/@vax-r/linux2024-homework1#q_delete_mid)， [Yiwei Lin](https://hackmd.io/@RinHizakura/S1PfuxJkv) 作者提出 **dudect** 工具，用於檢測程式碼的時間複雜度是否為 constant time 。論文中 **Step3: Apply statistical test** 是本次實做的關鍵，透過量測程式在兩筆不同輸入（如 fix-vs-random ）時的執行時間分佈，並應用 [Welch’s t-test](https://en.wikipedia.org/wiki/Welch%27s_t-test)來判斷分佈差異，試圖推翻「兩個時間分佈相等」的虛無假設 ([null hypothesis, $H_0$](https://zh.wikipedia.org/zh-tw/%E9%9B%B6%E5%81%87%E8%AE%BE))。 **虛無假設 ([null hypothesis, $H_0$](https://zh.wikipedia.org/zh-tw/%E9%9B%B6%E5%81%87%E8%AE%BE))** 論文將此解釋為 "the two timing distributions are equal" ，若兩組測資的執行時間分佈是相等的，更精準的說若兩個樣本的變異數相等，代表程式碼的時間複雜度即是 constant-time。 Student’s t-test 假設兩個樣本均值來自正態分佈（normally distributed）的母體，並且母體的變異數相等。而 Welch’s t-test 是 Student’s t-test 的變體，適用於變異數不等的情況，但仍然保持正態分佈的假設。因此，本作業測試 constant time implementation 時，使用的是 [Welch's t-test](https://en.wikipedia.org/wiki/Welch%27s_t-test)，而非 Student’s t-test。 ### 解釋本程式的 "simulation" 及程式實作的原理。 > Commit [709516f](https://github.com/EricccTaiwan/lab0-c/commit/709516f9c0d5fe21ca0fde57c3e87566bb9397f4) > 可以在 `./qtest.c` 中找到 simulation ，當值為 1 時會檢查 `queue_insert` 和 `queue_remove` 是否符合 constant time 。先從 `queue_insert` 看起，可以發現 `is_insert_tail_const()` 是判斷是否符合 const time 的函式，用 ctrl + 滑鼠左鍵此函式後，會跳到 `./dudect/fixture.h`，但卻找不到 `is_insert_tail_const()` 的介面，觀察到了下方的註解， `/* Interface to test if function is constant */` ，可以預期是透過 `#define _(x) bool is_##x##_const(void);` 去定義剛剛找不到的介面。 ```c ... /* Interface to test if function is constant */ #define _(x) bool is_##x##_const(void); DUT_FUNCS #undef _ #endif ``` 再點開上方的`DUT_FUNCS`會跳到`./dudect/constant.h`， ```c ... #define DUT_FUNCS \ _(insert_head) \ _(insert_tail) \ _(remove_head) \ _(remove_tail) #define DUT(x) DUT_##x enum { #define _(x) DUT(x), DUT_FUNCS #undef _ }; ... ``` 首先，因為 `fixture.h` 中有 `#include "constant.h"` ，所以預處理器 (preprocessor) 會先展開 `constant.h` 的巨集，接著才是 `fixture.h` ，到這邊開始看不懂了，為了看懂需要釐清兩個觀念：巨集的展開，和 `##` 的用意。 #### 巨集展開是時候來查閱規格書了，首先我看不懂 `#define _(x) bool is_##x##_const(void);`的用意？ #### C99 (6.10.3.10) > A preprocessing directive of the form \# define identifier lparen identifier-listopt ) replacement-list new-line \# define identifier lparen ... ) replacement-list new-line \# define identifier lparen identifier-list , ... ) replacement-list new-line defines a function-like macro with parameters, whose use is similar syntactically to a function call. 由規格書可以理解這條巨集的格式如下， ```c //# define identifier ( identifier-listopt ) replacement-list new-line # define _ ( x ) bool is_##x##_const(void); ``` 其實這條巨集和 `#define ADD(a,b) a+b` 沒有差別，只是第一次看到 `identifier` 是 `_` (底線)，原地嚇到。 #### The `##` operator 參考 [Linux 核心原始程式碼巨集: max, min](https://hackmd.io/@sysprog/linux-macro-minmax#%E9%81%BF%E5%85%8D%E5%91%BD%E5%90%8D%E8%A1%9D%E7%AA%81) ，`##` 在 C 語言前置處理器的作用是 concatenation (即連結、接續的意涵)。 #### C99 (6.10.3.3.2) > If, in the replacement list of a function-like macro, a parameter is immediately preceded or followed by a `##` preprocessing token, the parameter is replaced by the corresponding argument’s preprocessing token sequence; #### GNU [3.5 Concatenation](https://gcc.gnu.org/onlinedocs/cpp/Concatenation.html) > This is called token pasting or token concatenation. The `##` preprocessing operator performs token pasting. When a macro is expanded, the two tokens on either side of each `##` operator are combined into a single token, which then replaces the ‘##’ and the two original tokens in the macro expansion. 根據 C99 規範，當 function-like macro 的參數前後緊鄰 `##` 運算子時，該參數會被對應的引數替換，並與 `##` 兩側的標識符連結。因此，在 `#define _(x) bool is_##x##_const(void);` 中 `x` 會被 arugment 取代，而 `is_` 和 `_const(void);` 會分別連結到其前後，最終形成有效的函式宣告。例如，當傳入 `insert_head` 時，展開結果將為 `bool is_insert_head_const(void);`。 #### Preprocessing ```c= // ./dudect/constant.h #define DUT_FUNCS \ _(insert_head) \ _(insert_tail) \ _(remove_head) \ _(remove_tail) #define DUT(x) DUT_##x enum { #define _(x) DUT(x), DUT_FUNCS #undef _ }; ``` 首先在 line 1 的 `DUT_FUNCS` 會被展開成 ```c _(insert_head) _(insert_tail) _(remove_head) _(remove_tail) ``` 展開 line 8 的 `DUT(x)` 巨集，到此 line 為止還沒有使用到此巨集。展開 line 11 的 `#define _(x) DUT(x),` 巨集，會變成 ```c // _(x) => DUT(x) // _(insert_head) => DUT(insert_head) DUT(insert_head), DUT(insert_tail), DUT(remove_head), DUT(remove_tail), ``` line 11 的 `DUT(x)` 又會被 line 8 的巨集 `#define DUT(x) DUT_##x` 展開， ```c // DUT(x) => DUT_##x // DUT(insert_head) => DUT_insert_head DUT_insert_head, DUT_insert_tail, DUT_remove_head, DUT_remove_tail, ``` line 12 取消 `_` 的定義。了解至此，把上述程式碼複製到 `test.c` 對於巨集依序進行測試，並輸入 `$ gcc -E -P test.c -o output.c`，便可以在 `output.c` 中看到預處理後的結果，最後 `enum` 的成員如下， ```c enum { DUT_insert_head, DUT_insert_tail, DUT_remove_head, DUT_remove_tail, }; ``` 現在回到 `dudect/fixture.h` 中， ```c /* Interface to test if function is constant */ #define _(x) bool is_##x##_const(void); DUT_FUNCS #undef _ ``` `#define _(x) bool is_##x##_const(void);` 實際上定義了四個函式原型 ```c bool is_insert_head_const(void); bool is_insert_tail_const(void); bool is_remove_head_const(void); bool is_remove_tail_const(void); ``` 在 `dudect/fixture.c` 中， ```c #define DUT_FUNC_IMPL(op) \ bool is_##op##_const(void) { return test_const(#op, DUT(op)); } #define _(x) DUT_FUNC_IMPL(x) DUT_FUNCS #undef _ ``` 則定義了函式的實作 ```c bool is_insert_head_const(void) { return test_const("insert_head", DUT(insert_head)); } bool is_insert_tail_const(void) { return test_const("insert_tail", DUT(insert_tail)); } bool is_remove_head_const(void) { return test_const("remove_head", DUT(remove_head)); } bool is_remove_tail_const(void) { return test_const("remove_tail", DUT(remove_tail)); } ``` 參考作者在 `update_statistics`中 [discard the first few measurements](https://github.com/oreparaz/dudect/blob/master/src/dudect.h#L303) ，因此做了以下更動 ```diff - for (size_t i = 0 ; i < N_MEASURES; i++) { + for (size_t i = 10 ; i < N_MEASURES; i++) { ``` 觀察 `dudect_main` 中，在呼叫 `update_statics` 前會先呼叫 `prepare_percentile` ，先對測量到的執行時間數據進行預處理，以確保統計分析的準確性。我認為這樣的處理就是論文中的 **Step2. Apply post-processing** ，此步驟實做去掉某些極端值 (Cropping) 和 High-order preprocessing。因此將採用作者的原始碼，新增以下段落，並針對 `fixture.c` 進行了對應的修改。 ```c /* * Reference : https://github.com/oreparaz/dudect/blob/master/src/dudect.h#L203 */ static int cmp(const int64_t *a, const int64_t *b) { if (*a == *b) return 0; return (*a > *b) ? 1 : -1; } /* * Reference : https://github.com/oreparaz/dudect/blob/master/src/dudect.h#L209 */ static int64_t percentile(int64_t *a_sorted, double which, size_t size) { size_t array_position = (size_t) ((double) size * (double) which); assert(array_position < size); return a_sorted[array_position]; } /* * set different thresholds for cropping measurements. * the exponential tendency is meant to approximately match * the measurements distribution, but there's not more science * than that. * Reference : https://github.com/oreparaz/dudect/blob/master/src/dudect.h#L221 */ static void prepare_percentiles(int64_t *exec_times) { qsort(exec_times, N_MEASURES, sizeof(int64_t), (int (*)(const void *, const void *)) cmp); for (size_t i = 0; i < N_MEASURES; i++) { exec_times[i] = percentile( exec_times, 1 - (pow(0.5, 10 * (double) (i + 1) / N_MEASURES)), N_MEASURES); } } ``` 做完改動後，就可以在 action 首次看到卡比了。 ![image](https://hackmd.io/_uploads/BkqBIGIoyx.png) ### 解釋 [Student's t-distribution](https://en.wikipedia.org/wiki/Student%27s_t-distribution) TBD... ## Linux核心 lib/list_sort.c ```c= void list_sort(void *priv, struct list_head *head, list_cmp_func_t cmp) { ... do { size_t bits; struct list_head **tail = &pending; /* Find the least-significant clear bit in count */ for (bits = count; bits & 1; bits >>= 1) tail = &(*tail)->prev; /* Do the indicated merge */ if (likely(bits)) { struct list_head *a = *tail, *b = a->prev; a = merge(priv, cmp, b, a); /* Install the merged result in place of the inputs */ a->prev = b->prev; *tail = a; } /* Move one element from input list to pending */ list->prev = pending; pending = list; list = list->next; pending->next = NULL; count++; } while (list); ... // continue ``` 用表格紀錄我最一開始的理解，下方的表格對應 `list_sort` 程式碼中的 do-while 迴圈 (line 4~27) ， line 9 判斷當前的 `count` 的 MSB 和 LSB 中，只要有 1 個 0-bit， line 12 就會對於 `pending` 進行 merge, line 21 - 26 再把節點從 `list` 拉一個到 `pending` 上，直到 `list` 上沒有節點，同時也結束 `合併:不合併 = 2:1` 的原則。表格為了輸出美觀，因此省了一些字 * 表格左上的 `#` 代表 `節點總數`; * `count 二進位`欄位中，**pending上** 有 $3 \times 2^k$ 個節點，省略了 `pending 上` * `[ ]` 代表欲合併的數個節點 * `//` 代表合併後，pending 新增的節點， | # | count 變化 | count 二進位 | merge | pending 上的節點 | | --- | ---------- | ----------------------------------------------------------------------------------------------- | ----------- |:---------------------------------------- | | 1 | 0 $\to$ 1 | 0000 $\to$ 0001 | no（$2^0$） | 1 | | 2 | 1 $\to$ 2 | 0001 $\to$ 0010 | no（$2^1$） | 1 $\gets$ 1 | | 2+1 | (過程) | 有 $3 \times 2^0$ 個節點 <br> 合併前 $2 \times 2^0$ 個節點 | | [1* $\gets$ 1*] // $\gets$ 1* | | 3 | 2 $\to$ 3 | 0010 $\to$ 001==1== | yes | (2) $\gets$ 1 | | 4 | 3 $\to$ 4 | 0011 $\to$ 0100 | no（$2^2$） | 2 $\gets$ 1 $\gets$ 1 | | 4+1 | (過程) | 有 $3 \times 2^0$ 個節點 <br> 合併前 $2 \times 2^0$ | | 2 $\gets$ [1* $\gets$ 1*] // $\gets$ 1* | | 5 | 4 $\to$ 5 | 0100 $\to$ 010==1== | yes | 2 $\gets$ (2) $\gets$ 1 | | 5+1 | (過程) | 有 $3 \times 2^1$ 個節點 <br> (一個 2*, 一個2*, 一個(1',1')) <br> 合併前 $2 \times 2^1$ 個節點 | | [2* $\gets$ 2*] $\gets$ 1' // $\gets$ 1' | | 6 | 5 $\to$ 6 | 0101 $\to$ 01==1==0 | yes | (4) $\gets$ 1 $\gets$ 1 | | 6+1 | (過程) | 有 $3 \times 2^0$ 個節點 <br> 合併前 $2 \times 2^0$ 個節點 | | 4 $\gets$ [1* $\gets$ 1*] // $\gets$ 1* | | 7 | 6 $\to$ 7 | 0110 $\to$ 011==1== | yes | 4 $\gets$ (2) $\gets$ 1 | 截至目前，對於這種合併方式有個疑問，以7個節點為例，操作完 do-loop 後， `pending` 明顯沒合併完成阿，再跟 [charliechiou](https://hackmd.io/@charliechiu/linux2025-homework1) 討論並詳閱 `list_sort` 後，才發現 line 30 - 39 就是從尾端把部份合併完成的 pending 上的每個節點，兩兩進行合併，直到全數合併完， line 41 再建構回原本的雙向鏈結串列。 ```c=28 ... /* End of input; merge together all the pending lists. */ list = pending; pending = pending->prev; for (;;) { struct list_head *next = pending->prev; if (!next) break; list = merge(priv, cmp, pending, list); pending = next; } /* The final merge, rebuilding prev links */ merge_final(priv, cmp, head, pending, list); } // end of list_sort EXPORT_SYMBOL(list_sort); ``` 至此，也終於是稿懂了 `list_sort` 的程式碼了，因此就將 `list_sort`, `merge` 和 `merge_final` 做了簡單的修改，補進去 `queue.c` 中，就能執行了。 ## Pull Request ### [Ensure repetitions input is at least 1 #244](https://github.com/sysprog21/lab0-c/pull/244) > 雖然只是很小的 debug ，但很開心能貢獻程式碼。測試時發現 `it test -10` 竟然沒有報錯，原來在 `qtest.c` 中，只會判斷 reps 是否為整數，所以負數和 0 都能通過檢查，雖然不影響程式運行，但對於使用者不太直觀。 ```shell $ ./qtest cmd> new l = [] cmd> it test -10 l = [] cmd> ``` 因此補進了負數和 0 的除錯判斷後， ```diff -if (!get_int(argv[2], &reps)) { +if (!get_int(argv[2], &reps) || reps < 1) { report(1, "Invalid number of insertions '%s'", argv[2]); return false; } ``` 再次進行測試，便能正常報錯了，提交的 PR 已被 merge ! ```shell $ ./qtest cmd> new l = [] cmd> it test -10 Invalid number of insertions '-10' cmd> ``` ### [Fix console hang after exceeding input limit #248](https://github.com/sysprog21/lab0-c/pull/248) > 這次的 patch 與 [charliechiou](https://github.com/charliechiou) 一同協作這邊做個實驗，開啟 `./qtest` 後連續輸入 5 次的 unkown command ， ```shell $ ./qtest cmd> ak # 任意錯誤指令 Unknown command 'ak' cmd> bk Unknown command 'bk' cmd> ak Unknown command 'ak' cmd> ak Unknown command 'ak' cmd> ak Unknown command 'ak' Error limit exceeded. Stopping command execution. cmd> quit # 沒用 cmd> # 只能使用 `Ctrl` + c 跳出 ``` 這次的 bug 是接續上面的測試後，意外發現的，在原本的 `console.c` 中，當輸入的錯誤超過了錯誤上限 `err_limit` ，會讓設 `quit_flag = true` ，這會導致 `interpret_cmd` 因為判斷到 `quit_flag == True` 直接返回 `flase`。 ```c static void record_error() { err_cnt++; if (err_cnt >= err_limit) { report(1, "Error limit exceeded. Stopping command execution"); quit_flag = true; } } ... static bool interpret_cmd(char *cmdline) { if (quit_flag) return false; ... } ``` 簡單來說，一旦錯誤輸入達到了上限，任何 command 都不再允許，包括 `quit` ，只能輸入 `CTRL + c` 強制離開，這會導致可能的記憶體洩漏，起初我們做了以下的改動，預先宣告 `do_quit` 的函式，就可以再不更動程式的結構下，讓 `record_error` 呼叫，一旦輸入錯誤超過錯誤上限，便強制 `quit` 。 ```diff +static bool do_quit(int argc, char *argv[]); static void record_error() { err_cnt++; if (err_cnt >= err_limit) { report( 1, - "Error limit exceeded. Stopping command execution"); + "Error limit exceeded. Stopping command execution, and quitting"); + do_quit(0, NULL); } } ``` 後來便收到了 [jserv 老師的回覆](https://github.com/sysprog21/lab0-c/pull/248#discussion_r1986271372)，根據此回覆我們引入了一個 helper function `force_quit` 給 `do_quit` 和 `record_error` 使用， ```diff +/* Handles forced console termination for record_error and do_quit */ +static bool force_quit(int argc, char *argv[]) +{ + cmd_element_t *c = cmd_list; + bool ok = true; + while (c) { + cmd_element_t *ele = c; + c = c->next; + free_block(ele, sizeof(cmd_element_t)); + } + + param_element_t *p = param_list; + while (p) { + param_element_t *ele = p; + p = p->next; + free_block(ele, sizeof(param_element_t)); + } + + while (buf_stack) + pop_file(); + + for (int i = 0; i < quit_helper_cnt; i++) { + ok = ok && quit_helpers[i](argc, argv); + } + + quit_flag = true; + return ok; +} + static void record_error() { err_cnt++; report( 1, "Error limit exceeded. Stopping command execution, and quitting"); - do_quit(0, NULL); + force_quit(0, NULL); } } /* Built-in commands */ static bool do_quit(int argc, char *argv[]) { - cmd_element_t *c = cmd_list; - bool ok = true; - while (c) { - cmd_element_t *ele = c; - c = c->next; - free_block(ele, sizeof(cmd_element_t)); - } - - param_element_t *p = param_list; - while (p) { - param_element_t *ele = p; - p = p->next; - free_block(ele, sizeof(param_element_t)); - } - - while (buf_stack) - pop_file(); - - for (int i = 0; i < quit_helper_cnt; i++) { - ok = ok && quit_helpers[i](argc, argv); - } - - quit_flag = true; - return ok; + return force_quit(argc, argv); } ``` 經過這樣的更動，當使用者輸入的錯誤指令超越 `err_limit` 時, `record_error()` 會直接call `force_quit` 強制終止，此 PR 也已經被 merge ! ```shell $ ./qtest cmd> ak # 任意錯誤指令 Unknown command 'ak' cmd> ak Unknown command 'ak' cmd> ak Unknown command 'ak' cmd> ak Unknown command 'ak' cmd> ak Unknown command 'ak' Error limit exceeded. Stopping command execution, and quitting Freeing queue $ ``` ### [Improve log command feedback #252](https://github.com/sysprog21/lab0-c/pull/252) Before: Ambiguous log command feedback ```shell $ ./qtest cmd> log No log file given cmd> log temp.txt cmd> new l = [] cmd> quit Freeing queue $ ``` After: Clear and informative log command messages ```shell $ ./qtest cmd> log No log file given. Use 'log <file>' cmd> log temp.txt Logging enabled: temp.txt cmd> new l = [] cmd> quit Freeing queue $ cat temp.txt l = [] Freeing queue $ ``` ### [Improve source command feedback #253](https://github.com/sysprog21/lab0-c/pull/253) ```diff static bool do_source(int argc, char *argv[]) { if (argc < 2) { - report(1, "No source file given"); + report(1, "No source file given. Use 'source <file>'."); return false; } if (!push_file(argv[1])) { report(1, "Could not open source file '%s'", argv[1]); return false; } return true; } ``` Before: Ambiguous log command feedback ```shell $ cat src.txt new free quit $./qtest cmd> source No source file given cmd> source src.txt cmd> new l = [] cmd> free l = NULL cmd> quit Freeing queue $ ``` After: Clear and informative log command messages ```shell $ ./qtest cmd> source No source file given. Use 'source <file>'. cmd> source src.txt cmd> new l = [] cmd> free l = NULL cmd> quit Freeing queue $ ``` ### Not yet ```test Test trailer test trailer Co-authored-by: Eric Chou <mail> Acked-by:Charlie<mail> Signed-off-by: jserv <mail> ``` 更改前 ```shell $ git add . $ git commit Running static analysis... [fix-format 7fc257f] Test trailer 1 file changed, 1 insertion(+), 1 deletion(-) $ git log -1 | cat commit 7fc257fe67ef4db95967d9794ac1d19acdb5a51a Author: Eric Chou <yphbchou0911@gmail.com> Date: Mon Mar 10 20:23:49 2025 +0800 Test trailer test trailer Co-authored-by: Eric Chou <mail> Acked-by:Charlie<mail> Signed-off-by: jserv <mail> Change-Id: I9662301f035e952ed09a7bdc174d723bcf638436 $ ``` 更改後 ```shell $ git add . $ git commit Running static analysis... [fix-format c27cbf2] Test trailer 1 file changed, 52 insertions(+), 8 deletions(-) $ git log -1 | cat commit c27cbf2546b74daa4e1d906dbea403bfdea73b42 Author: Eric Chou <yphbchou0911@gmail.com> Date: Mon Mar 10 20:27:32 2025 +0800 Test trailer test trailer Co-authored-by: Eric Chou <mail> Acked-by: Charlie <mail> Signed-off-by: jserv <mail> Change-Id: I59822e10fe0527329e9fbaff7cf24354cdfb4a13 $ ```