2025q1 Homework1 (lab0)

# 2025q1 Homework1 (lab0) contributed by < `cmosinverter` > {%hackmd NrmQUGbRQWemgwPfhzXj6g %} ## 開發環境 ```shell $ gcc --version gcc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0 $ lscpu Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Address sizes: 39 bits physical, 48 bits virtual Byte Order: Little Endian CPU(s): 4 On-line CPU(s) list: 0-3 Vendor ID: GenuineIntel Model name: Intel(R) N100 CPU family: 6 Model: 190 Thread(s) per core: 1 Core(s) per socket: 4 Socket(s): 1 Stepping: 0 CPU(s) scaling MHz: 88% CPU max MHz: 3400.0000 CPU min MHz: 700.0000 BogoMIPS: 1612.80 Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 cl flush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_ts c cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popc nt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l2 cdp_l2 ssbd ibrs ibpb stibp ibrs_enhanced tpr_shado w flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erm s invpcid rdt_a rdseed adx smap clflushopt clwb intel_pt sha_ni xsaveopt x savec xgetbv1 xsaves split_lock_detect user_shstk avx_vnni dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp hwp_pkg_req vnmi umip pku o spke waitpkg gfni vaes vpclmulqdq rdpid movdiri movdir64b fsrm md_clear se rialize arch_lbr ibt flush_l1d arch_capabilities Virtualization features: Virtualization: VT-x Caches (sum of all): L1d: 128 KiB (4 instances) L1i: 256 KiB (4 instances) L2: 2 MiB (1 instance) L3: 6 MiB (1 instance) NUMA: NUMA node(s): 1 NUMA node0 CPU(s): 0-3 Vulnerabilities: Gather data sampling: Not affected Itlb multihit: Not affected L1tf: Not affected Mds: Not affected Meltdown: Not affected Mmio stale data: Not affected Reg file data sampling: Mitigation; Clear Register File Retbleed: Not affected Spec rstack overflow: Not affected Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization Spectre v2: Mitigation; Enhanced / Automatic IBRS; IBPB conditional; RSB filling; PBRS B-eIBRS Not affected; BHI BHI_DIS_S Srbds: Not affected Tsx async abort: Not affected ``` ## 前置作業在準備實作 `queue.[ch]` 裡面關於佇列的各種操作之前，觀察到 `queue.h` 引入了 `harness.h` 以及 `list.h` 兩個標頭檔，其中 `list.h` 提供了一系列的 Linux 風格有關鏈結操作的實作，必須深入研究並運用在這次作業中。 ### list.h 在大部分的佇列的實作中，可以看到 `function definition` 前面都被加上了 `static inline` 兩個英文單字，例如以下關於初始化鏈結的函式: ```C= static inline void INIT_LIST_HEAD(struct list_head *head) { head->next = head; head->prev = head; } ``` 根據 [C99 規格書](https://www.dii.uchile.cl/~daespino/files/Iso_C_1999_definition.pdf) 的描述，可以得知 `inline` 可以讓函式呼叫的速度加快，但不是絕對，詳細探討撰寫於[本文](https://hackmd.io/A9ZqNFEdS9ySaBgs8BWyXg)。在理解 `static` 的功能之前，我將 [C99 規格書](https://www.dii.uchile.cl/~daespino/files/Iso_C_1999_definition.pdf) 上傳至 ChatGPT 的專案功能資料夾內，以便快速理解第一手資料，並嘗試詢問 ChatGPT 以下內容: > static inline void INIT_LIST_HEAD(struct list_head *head) > { > head->next = head; > head->prev = head; > } > > 根據C99規格書，這裡的 static 的用處是什麼 ? ChatGPT 告知在 [C99 規格書](https://www.dii.uchile.cl/~daespino/files/Iso_C_1999_definition.pdf) 中第 $6.2.2.3 節 Linkage of identifiers 有提到: > If the declaration of a file scope identifier for an object or a function contains the storageclass specifier static, the identifier has internal linkage. 也就是說，當我們把包含 `static` 的函式宣告進 `queue.c` 時，他只會在 `queue.c` 的翻譯單元裡內部可見。總而言之，之所以使用 `static inline` 不單單是為了要加速函式呼叫的速度，還可以避免只單獨使用 `inline` 被編譯器設定為 `external linkage` 時造成 `multiple definition` 的情況。 ### harness.h 待整理 ## 實作 queue.[ch] ### q_new > commit [3e0749e](https://github.com/sysprog21/lab0-c/commit/3e0749e2ab45603af1dfcee255f0462eca2a40a8) * 閱讀 `queue.h` 中對於 `q_new` 的敘述，可以得知要在 allocation failed 的時候返回 `NULL` 值 * 搭配 Linux-style API 對 `head` 進行初始化操作 ### q_free > commit [76bf842](https://github.com/sysprog21/lab0-c/commit/76bf842b4fc07aec280dc6d3598bfc17a444e869) * 檢查 `head` 是否為 NULL pointer * `list_for_each_entry_safe` 遍歷每個 entry * `entry` 用來當作當前指標指向 `element_t` * `safe` 是指向前一個 entry 的指標，用來被釋放掉 * 使用 `q_release_element` 來釋放 * 最後用 `q_free` 釋放掉 `head` ### q_insert_head > commit [615b4e1](https://github.com/sysprog21/lab0-c/commit/615b4e1a35e9ee985ba973d8e7bf72b6cbfd204b) * 檢查 `head` 是否為 NULL pointer * 為一個名為 `entry` 的 `element_t` 型別動態配置記憶體 * 宣告一個 `char *cp` 暫存 `char *s` 裡面的值 **(使用strcpy)** * 將 `char *cp` 指派給 `entry->value` 嘗試提交 commit 後發現被 pre-commit hook 擋下，發現是一系列有關**字串**會導致記憶體錯誤的 built-in，於是打開 `.git/hooks/pre-commit` 裡面查看詳細內容。 ```git # Prevent unsafe functions root=$(git rev-parse --show-toplevel) banned="([^f]gets\()|(sprintf\()|(strcpy\()" status=0 for file in $C_FILES; do filepath="${root}/${file}" output=$(grep -nrE "${banned}" "${filepath}") if [ ! -z "${output}" ]; then echo "Dangerous function detected in ${filepath}" echo "${output}" echo echo "Read 'Common vulnerabilities guide for C programmers' carefully." echo " https://security.web.cern.ch/security/recommendations/en/codetools/c.shtml" RETURN=1 fi done ``` 這個 pre-commit hook 一共擋下了三種 built-in，分別是 `gets` 、 `sprintf` 、 `strcpy` ，閱讀[教材](https://security.web.cern.ch/recommendations/en/codetools/c.shtml)後發現這三個函式都沒有檢查字串的範圍，導致原本不該被覆蓋的記憶體區段有可能會被覆蓋。因為原本的程式碼即使有檢查 `entry->value` 是否有成功分配，但沒有失敗分配的情況下釋放 `entry` 的記憶體導致錯誤。 **解決辦法** * 改用 `strdup` 處理字串操作，成功通過 pre-commit hook **結果** 無法通過 `trace-11-malloc`: ```Shell ERROR: Freed queue, but 4 blocks are still allocated ``` 所以我打開 `trace-11-malloc`，發現裡面 `option malloc` 的值是 25，也就是說，有 25% 的機率會 `malloc` 失敗， `20*0.25=5`，所以上述 error 的值 4 並不會是固定的，但有高機率會是 5 左右的值。原本程式碼雖然有檢查 `entry->value` 沒有被成功被分配記憶體的情況，但是後續沒有將記憶體釋放掉導致錯誤發生。 **解決辦法** 在第2行的地方加入 `free(entry)`，通過測試。 > commit [1be2247 ](https://github.com/sysprog21/lab0-c/commit/1be2247d8d0b40d9e668031703b702dbe5993e8b) ```C= if (!entry->value) { free(entry); return false; } ``` ### q_insert_tail > commit [1be2247 ](https://github.com/sysprog21/lab0-c/commit/1be2247d8d0b40d9e668031703b702dbe5993e8b) * 思路與 `q_insert_head` 相同 * 差別在於最後將節點加入佇列時，使用 `list_add_tail` ### q_remove_head > commit [47fa65d](https://github.com/sysprog21/lab0-c/commit/47fa65d2047cdbc31b56e9522a8c84eafb338057) **第一次嘗試** 一開始以為是要自己分配記憶體並把這段記憶體的指標指派給 `sp` ，所以使用了 `strdup` 這個方法來複製一份字串: ```C sp = strdup(entry->value); ``` 但是在測試過程中會出現以下錯誤: ```shell cmd> new l = [] cmd> ih gerbil l = [gerbil] cmd> ih bear l = [bear gerbil] cmd> rh bear ERROR: Failed to store removed value ``` 仔細閱讀了一下 `queue.h` ，理解到 `sp` 跟 `bufsize` 合起來是 caller 預先準備好的一段記憶體 buffer 資訊，讓我們將字串的值搬進去，而不是用自己分配新的記憶體空間。更新後的實作如下: ```C if (sp && bufsize > 0) { strncpy(sp, entry->value, bufsize); sp[bufsize-1] = "\0"; } ``` 編譯後成功通過 `rh` 指令的測試。 ### q_remove_tail > commit [47fa65d](https://github.com/sysprog21/lab0-c/commit/47fa65d2047cdbc31b56e9522a8c84eafb338057) * 思路與 `q_remove_head` 相同 * 差別在於一開始要獲得 tail entry 時採用 `list_last_entry` ### q_size > commit [3e0749e](https://github.com/sysprog21/lab0-c/commit/3e0749e2ab45603af1dfcee255f0462eca2a40a8) * 檢查 `head` 是否為 `NULL` 或佇列是否為空 * 用 `list_for_each` 迭代器配合計數器計算佇列長度 ### q_delete_mid > commit [cecdb75](https://github.com/sysprog21/lab0-c/commit/cecdb75bd20b5c69c84f2cff635e07692991dc74) * 利用兩個指標，`front` 指向 `head`，`tail` 指向 `tail` * 每一次 `front` 前進一步，`tail` 後退一步 * 達到終止條件時 `tail` 會落在中間的位置 * 用 `list_del` 取消鏈結 * 用 `q_release_element` 釋放記憶體空間 ### q_delete_dup > commit [afcc984](https://github.com/sysprog21/lab0-c/commit/afcc9848cc500bcaaebb13da6b0f3a74ff5c95e5) **第一次嘗試** * 單純使用 `list_for_each_entry_safe` 然後刪除 entry ```C list_for_each_entry_safe(entry, safe, head, list) { if (!strcmp(entry->value, safe->value)) { list_del(&entry->list); q_release_element(entry); } } ``` 出現以下錯誤: ```shell l = [c b b a a] cmd> dedup Segmentation fault occurred. You dereferenced a NULL or invalid pointerAborted (core dumped) ``` 點開 `list.h` ，發現 `entry` 在 `tail` 時，此時 `safe` 指向 `head`，而 `head` 並沒有被包進 `element_t` 因此 `safe->value` 無法被 dereference 而報錯。 **第二次嘗試** * 加入 `&safe->list != head` (第2行) ```C= list_for_each_entry_safe(entry, safe, head, list) { if ((&safe->list != head) && !strcmp(entry->value, safe->value)) { list_del(&entry->list); q_release_element(entry); } } ``` 出現以下錯誤: ```shell l = [c b b a a] cmd> dedup ERROR: Duplicate strings are in queue or distinct strings are not in queue l = [c ... ] ERROR: Queue has more than 1 elements ``` 這邊會錯題目意思，以為是要把重複的刪掉剩下一個，比如說: ```python input = [c, b, b, b, a] ``` 以為輸出是: ```python input = [c, b, a] ``` 正確輸出: ```python input = [c, a] ``` 因此須加入判斷是否移除當下 `entry`: * 如果偵測到 `entry` 字串與 `head` 相同，則設 `is_dup` 為真 * 即使 `entry` 字串與 `head` 不相同，若 `is_dup` 為真，也要移除當下 `entry` 目前版本: ```C= bool is_dup = false; list_for_each_entry_safe(entry, safe, head, list) { if ((&safe->list != head) && !strcmp(entry->value, safe->value)) { list_del(&entry->list); q_release_element(entry); is_dup = true; } else if (is_dup) { list_del(&entry->list); q_release_element(entry); is_dup = false; } } ``` ### q_swap > commit [a95515c](https://github.com/sysprog21/lab0-c/commit/a95515cbfaa74f58cd9ef7d0d4086987b31b089d) * `q_swap` 是 `q_reverseK` 當 `k=2` 時的特例 * 呼叫 `q_reverseK(head, 2)` ### q_reverse > commit [ef49fdd](https://github.com/sysprog21/lab0-c/commit/ef49fdde27f9e953eb681d86bce450cabd964bae) * 使用 `list_for_each_safe` 對佇列進行迭代 * 每次都把當前節點移動到 `head` 之後，結束後佇列即被反轉 * 利用 Linux-kernel style API 來移動節點 ### q_reverseK > commit [a95515c](https://github.com/sysprog21/lab0-c/commit/a95515cbfaa74f58cd9ef7d0d4086987b31b089d) * 計算 `group_left` ，還剩下幾個群組要反轉 * 保留每一個群組的 `head`，這樣就可以使用 `list_move` 依序把群組內的每一個節點移到 `head` 之後 * 當 group_left = 0 時，即完成所有群組的反轉 ### q_sort > commit [613775d](https://github.com/sysprog21/lab0-c/commit/613775da40c9393c567897e065037e04fc9a6d43) 在實作 `q_sort` 之前，為了要完成 `merge sort`，我先實作了一個叫做 `q_merge_two` 的函式，此函式可以將兩個排序好的佇列合併成一個新的排序佇列。這樣每次在合併階段就可以直接使用此函式來完成兩個已排序區段的整合，保持程式碼的易讀性。 `q_merge_two` 的實作中，我先確認兩個佇列 `h1` 與 `h2` 不為 NULL，接著使用兩個指標 `curr1` 和 `curr2` 分別指向兩個佇列的開頭，然後逐一比較它們目前指向的節點，較小者會被移動到結果佇列 `result` 的尾端。當其中一個佇列逐一走訪完畢之後，另一個佇列剩下的節點也會被一併移動到結果佇列中。最後使用 `list_splice` 將結果佇列併入 `h1`，並將 `h2` 清空。因為 `merge sort` 的核心就在於不斷遞迴切割和合併，所以 `q_merge_two` 在之後的 `q_sort` 中兩兩合併時會被重複使用。 ### q_ascend > commit [a86abe6](https://github.com/sysprog21/lab0-c/commit/a86abe6174a8e96043f8eec7a4c0d6ca7745c409) **題目重點** 1. 只要當前節點右邊有任何節點比當前的**小**，就把當前的移除 2. 用 `strcmp` 比較字串大小 **解題方法** * 用兩個指標，一個是 `curr` 當前指標，一個 `iter` 是用來走訪當前指標之後的指標 * 用一個 `safe` 指標儲存當前指標的下一個指標 * 只要當前節點右邊有任何節點比當前的**小**，就把當前的移除 * 只要 `iter` 到達終點或是找到比當前節點小的，就跳出內層迴圈 * 把 `safe` 指派給 `curr` ### q_descend > commit [a86abe6](https://github.com/sysprog21/lab0-c/commit/a86abe6174a8e96043f8eec7a4c0d6ca7745c409) 與 `q_ascend` 邏輯相同，差別在於 `strcmp` 比較時的大小。 ### q_merge **第一次實作** > commit [a86abe6](https://github.com/sysprog21/lab0-c/commit/a86abe6174a8e96043f8eec7a4c0d6ca7745c409) **O(nlogn):** 把全部的佇列都串到第一個柱列上，然後對第一個佇列排序。 **第二次實作** > commit [a3fabfe](https://github.com/sysprog21/lab0-c/commit/a3fabfee778775b23dca42e59d08d7c3dab1658c) **O(n):** 先實作一個 `q_merge_two` 的函式，之後在 `q_merge` 裡面頭尾兩兩合併。 ## 開啟 Address Sanitizer > 開啟 Address Sanitizer，修正 qtest 執行過程中的錯誤 ## Valgrind 排除記憶體錯誤 > 運用 Valgrind 排除 qtest 實作的記憶體錯誤 ## Massif 視覺化記憶體使用量 > 透過 Massif 視覺化 "simulation" 過程中的記憶體使用量，需要設計對應的實驗 **實驗一: 觀察不同操作，使用記憶體的情況** **實驗條件** 1. 先透過 `ih RAND 10000` 隨機加入字串，比較 `q_reverse` 跟 `q_sort` 2. 使用 `--time-unit i` 來當作X軸，`i` 代表 `instruction` 佇列反轉 `q_reverse` ![reverse](https://hackmd.io/_uploads/SyWrsRQCyx.png) 排序 `q_sort` (自己實作的 merge sort) ![sort-custom](https://hackmd.io/_uploads/rylSxk4Akl.png) 從上面兩張圖可以看到，q_sort 在執行期間使用了很多 stack memory，因為我的 merge sort 是用遞迴的方式實作，需要 call stack 把遞迴呼叫的函示推入 stack-frame，所以可以看到在執行過程中會很明顯的看到 stack 的使用。另外值得注意的是，從圖上面來，q_reverse 使用到的 stack 非常少，甚至從圖上都很難看出來，但其實實際上都還有 2KB 左右的 stack 被使用。 **實驗二: 透過設定時間軸為 B (Bytes) 觀察 memory allocation fail 的情況** **實驗條件** 1. 先透過 `ih RAND 10000` 隨機加入字串，調整 malloc 失敗的機率為，觀察不同機率是否有不同記憶體行為。 2. 使用 `--time-unit B` 來當作X軸，`B` 代表 `byte`，當作 X 軸的意思就是總共配置了多少記憶體的意思。 * 猜想失敗率越高的，增長幅度會比較慢，因為有高機率會配置失敗，導致 entry 原先配置的記憶體空間會被釋放掉。 `option malloc 5`: ``` MB 1.163^ : | @@:#: | @::@ :#: | @@@@: @ :#:: | :@::@@@@: @ :#:: | @@::@: @@@@: @ :#::: | @:@@ ::@: @@@@: @ :#::: | @::@:@@ ::@: @@@@: @ :#:::: | @:@@: @:@@ ::@: @@@@: @ :#:::: | :@@@:@@: @:@@ ::@: @@@@: @ :#:::: | :::@ @:@@: @:@@ ::@: @@@@: @ :#:::: | @::@: :@ @:@@: @:@@ ::@: @@@@: @ :#::::: | @:@: @: :@ @:@@: @:@@ ::@: @@@@: @ :#::::: | :::@:@: @: :@ @:@@: @:@@ ::@: @@@@: @ :#:::::: | @@@:: @:@: @: :@ @:@@: @:@@ ::@: @@@@: @ :#:::::: | @@@@@ :: @:@: @: :@ @:@@: @:@@ ::@: @@@@: @ :#::::::: | @:@:@@ @@ :: @:@: @: :@ @:@@: @:@@ ::@: @@@@: @ :#::::::: | @@@@:@:@@ @@ :: @:@: @: :@ @:@@: @:@@ ::@: @@@@: @ :#:::::::@ | @::@@ @:@:@@ @@ :: @:@: @: :@ @:@@: @:@@ ::@: @@@@: @ :#:::::::@ | @@:@: @@ @:@:@@ @@ :: @:@: @: :@ @:@@: @:@@ ::@: @@@@: @ :#:::::::@ 0 +----------------------------------------------------------------------->MB 0 47.26 ``` `option malloc 25`: ``` KB 749.0^ # | @@@# | @@@@@@# | @@@@@@@@@@# | :@@@@@@@@@@@#: | @@@@@:@@@@@@@@@@@#: | @@@@@@ :@@@@@@@@@@@#: | ::::@ @@@@ :@@@@@@@@@@@#: | ::@::: :@ @@@@ :@@@@@@@@@@@#: | @:::: @::: :@ @@@@ :@@@@@@@@@@@#: | :::@:: : @::: :@ @@@@ :@@@@@@@@@@@#: | @:@@:: :@:: : @::: :@ @@@@ :@@@@@@@@@@@#:: | ::@:@ :: :@:: : @::: :@ @@@@ :@@@@@@@@@@@#:: | ::@::@:@ :: :@:: : @::: :@ @@@@ :@@@@@@@@@@@#:: | @@@@::@::@:@ :: :@:: : @::: :@ @@@@ :@@@@@@@@@@@#:: | @@@@@@ ::@::@:@ :: :@:: : @::: :@ @@@@ :@@@@@@@@@@@#:: | @@@@:@@@@@@ ::@::@:@ :: :@:: : @::: :@ @@@@ :@@@@@@@@@@@#:: | :::@@ @:@@@@@@ ::@::@:@ :: :@:: : @::: :@ @@@@ :@@@@@@@@@@@#:: | @@@:: @@ @:@@@@@@ ::@::@:@ :: :@:: : @::: :@ @@@@ :@@@@@@@@@@@#:: | @:::@@@:: @@ @:@@@@@@ ::@::@:@ :: :@:: : @::: :@ @@@@ :@@@@@@@@@@@#:: 0 +----------------------------------------------------------------------->MB 0 128.3 ``` `option malloc 75`: ``` KB 18.44^# |# |# |# |# |# |# |# |# |# : : : :: :: :: : : @ @ @ : : : : :: : |# :::: :: : : : @: :: :: : @ @ : :@ ::: : ::@:: : : : |#::::::: :::: : :: : @: :: @: : @ @ : :@: ::::: ::@:::::: :: |#:::::: :::: : :: : @: :: @: : @ @ : :@: ::::: ::@:::::: :: |#:::::: :::: : :: : @: :: @: : @ @ : :@: ::::: ::@:::::: :: |#:::::: :::: : :: : @: :: @: : @ @ : :@: ::::: ::@:::::: :: |#:::::: @:::::::::::: ::: ::::@: ::::@::::@:::@::::@::::@:::::@::::::@:: |#:::::: @::::: :: ::: ::: :: :@: ::::@::::@:::@::::@::::@:::::@::::::@:: |#:::::: @::::: :: ::: ::: :: :@: ::::@::::@:::@::::@::::@:::::@::::::@:: |#:::::: @::::: :: ::: ::: :: :@: ::::@::::@:::@::::@::::@:::::@::::::@:: |#:::::: @::::: :: ::: ::: :: :@: ::::@::::@:::@::::@::::@:::::@::::::@:: 0 +----------------------------------------------------------------------->MB 0 165.0 ``` 可以觀察到以下兩點: 1. 只要是動態記憶體配置的失敗率越高，X軸總配置的記憶體就越少。 | Fail rate = 5% | Fail rate = 25% | Fail rate = 75% | | -------- | -------- | -------- | | 47.3 MB | 128.3 MB | 165.0 MB | 2. 執行期間的記憶體使用量也會隨著失敗率增加而減少。 ## 解釋 select 系統 > 解釋 select 系統呼叫在本程式的使用方式 ## 分析 console.c 的實作 > 並分析 console.c 的實作，說明其中運用 CS:APP RIO 套件的原理和考量點。可對照參考 CS:APP 第 10 章重點提示 ## 研讀 lib/list_sort.c > 研讀 Linux 核心原始程式碼的 lib/list_sort.c 並嘗試引入到 lab0-c 專案，比較你自己實作的 merge sort 和 Linux 核心程式碼之間效能落差，也該嘗試改進針對鏈結串列排序實作的效能 ## 提供新的命令 shuffle > 在 qtest 提供新的命令 shuffle，允許藉由 Fisher–Yates shuffle 演算法，對佇列中所有節點進行洗牌 (shuffle) 操作，需要以統計的原理來分析你的實作，探究洗牌的「亂度」 > Commit [e472664](https://github.com/sysprog21/lab0-c/commit/e472664c4d713d71016bd3c6e8fc862804c3ea6a) **事前準備:** 在加入新的命令之前，必須要知道我們是怎麼跟 queue.c 互動的。首先看到 console.c 的 console_init，裡面記載了所有對佇列操作的指令，因此我加了一條: ```C ADD_COMMAND(shuffle, "Shuffle the queue", ""); ``` 然後上面還要有對應的 do_shuffle 實作，觀察了一下其他的實作，發現所有的操作都有相對應的錯誤檢查機制，比如說 q_size 就必須檢查當前的佇列是否為 NULL，於是思考有哪些是必須要檢查的項目。與q_shuffle 最接近的操作應該是 q_swap，因為本次要實作的演算法也是要一直不斷地交換兩個節點的位置。原本想要用移動鏈結的方式來達成，但是後來覺得為何不如直接交換兩個節點的字串直，於是就有了以下操作: ```C char *temp = list_entry(tail, element_t, list)->value; list_entry(tail, element_t, list)->value = list_entry(chosen, element_t, list)->value; list_entry(chosen, element_t, list)->value = temp; ``` 使用一個暫時的指標儲存位於 `tail` 的字串，將另一個字串存入 `temp` 指標，然後將 `chosen` 指派給 `tail`，最後再將 `temp` 儲存的地址放入 `chosen`。 **測試程式** 利用課程講義提供的[測試腳本](https://hackmd.io/@sysprog/linux2025-lab0/%2F%40sysprog%2Flinux2025-lab0-d%3Fstext%3D3241%253A4%253A0%253A1744290609%253ALUfztH)，生成一個包含 1, 2, 3, 4 的佇列，然後隨機洗牌 500000，算出卡方測試值如下: ``` chi square sum: 38.497575961215375 ``` 由於四個不同的數字代表有 `4! = 24` 種可能，根據卡方分布表，可以得知 p-value 大約介於 0.95 ~ 0.975 之間，大於一般設定顯著水準 0.05，所以沒有證據可以推翻虛無假設，而虛無假設為: 洗牌的結果遵循均勻分布。 ![image](https://hackmd.io/_uploads/BJiHrBrRJl.png) 從執行結果分布圖來看，因為總執行次數為 `500000` 次，所以根據虛無假設，理論上每一次的結果都要接近 `500000/24 = 20833.33` 次。 ## 資訊與熵互補，資訊就是負熵 > 在 qtest 中執行 option entropy 1 並搭配 ih RAND 10 一類的命令，觀察亂數字串的分布，並運用「資訊與熵互補，資訊就是負熵」的觀念，設計一組新的命令或選項，得以在 qtest 切換不同的 PRNG 的實作 (可選定 Xorshift)，搭配統計分析工具，比較亂數產生器 (如 Xorshift vs. 內建) 的品質 ## Dude, is my code constant time? > 研讀論文〈Dude, is my code constant time?〉，解釋本程式的 "simulation" 模式是如何透過以實驗而非理論分析，達到驗證時間複雜度，需要解釋 Student's t-distribution 及程式實作的原理在 simulation 的時候，在 `queue_insert` 或是 `queue_remove` 有 `is_##x##_const` 的 function，相關的實作在 `dudect/fixture.[ch]` 裡面。因為有多重個需要被測試的函式，因此在這裡用了一個 macro 去產生所有需要的 function signature: ```C=7 /* Interface to test if function is constant */ #define _(x) bool is_##x##_const(void); DUT_FUNCS #undef _ ``` 然後在 `fixture.c` 裡面再用一層 macro 去封裝剛剛在 header 宣告的函式: ```C=250 #define DUT_FUNC_IMPL(op) \ bool is_##op##_const(void) \ { \ return test_const(#op, DUT(op)); \ } #define _(x) DUT_FUNC_IMPL(x) DUT_FUNCS #undef _ ``` 最後詳細的 simulation 實作則是在 `test_const` 裡面。根據論文，我們需要準備兩個 class ，然後對這兩個 class 做 t-test (Welch’s t-test) 來確是否有 leakage。看到程式碼，class 1 會初始一個具有 0 個 element 的 queue，class 2 會初始一個具有 0~9999 個 element 的 queue，以下是初始化每一次測量的佇列的程式碼片段: ```C dut_insert_head( get_random_string(), *(uint16_t *) (input_data + i * CHUNK_SIZE) % 10000); ``` 由於 `prepare_inputs` 在初始化 `input_data` 時已經將每個 chunk 根據 class 的不同設為 0 (class 1) 或是隨機的值 (class 2)，每個 chunk 又有兩個 bytes，所以當我們將這兩個 bytes 的指標轉型成 uint16_t 之後，他會落在一個 0~65,535 中間的值，此時對他取 mod 100000 的話就會落在 0~9999 之間了。看完了 class 1/2 分配的方法後，接著分析 `doit` 函式裡面的 `measure`，假設現在要分析的是 `list_insert_head`: ```C= case DUT(insert_head): for (size_t i = DROP_SIZE; i < N_MEASURES - DROP_SIZE; i++) { char *s = get_random_string(); dut_new(); dut_insert_head( get_random_string(), *(uint16_t *) (input_data + i * CHUNK_SIZE) % 10000); int before_size = q_size(l); before_ticks[i] = cpucycles(); dut_insert_head(s, 1); after_ticks[i] = cpucycles(); int after_size = q_size(l); dut_free(); if (before_size != after_size - 1) return false; } ``` 看到第二行，迴圈並沒有執行 `N_MEASURES` 裡面頭尾兩個 DROP_SIZE 的部分，這樣我覺得有些困惑，因為這樣會產生有 40 組 execution time 為 0 的數據。於是繼續往下 trace code，終於在 `update_statistics` 的第 7 行找到比較合理的敘述: * CPU cycle counter overflowed or dropped measurement ```C int64_t difference = exec_times[i]; /* CPU cycle counter overflowed or dropped measurement */ if (difference <= 0) continue; ``` 不過這邊還是產生了兩個疑問: 1. 為什麼沒有被初始化的 `before tick` 跟 `after tick` 相減之後會是一個 `<= 0` 的值 ? 2. 假設 dropped measurement 相減確實會小於等於 0，那為什麼還需要 dropped measurement ? 反正在 `update_statistics` 的階段也會被跳過。為了回答第一個問題，於是看了一下 `before_ticks` 跟`after_ticks` 分配記憶體的地方: ```C int64_t *before_ticks = calloc(N_MEASURES + 1, sizeof(int64_t)); int64_t *after_ticks = calloc(N_MEASURES + 1, sizeof(int64_t)); ``` 他們都是用 `calloc` 所分配記憶體的而不是 `malloc`，所以一開始的值的確有被初始化，而且為 0。至於第二個問題，從程式碼裡面還是看不太出來 dropped measurement 之所以存在的用途，於是做了以下推測 (待實驗): > 為了保持量測的一致性，排除除了演算法本身時間複雜度以外之因素所造成的影響，而去掉程式頭尾兩個 DROP_SIZE 的量測。接下來是本次實驗最重點的地方，也就是每做一次 `doit` 就更新一次一個型別為 `t_context_t` 的指標 `ctxs`: ```C typedef struct { double mean[2]; double m2[2]; double n[2]; } t_context_t; ``` * **mean:** 平均值 * **m2**: 累計平方差 * **n**: 該 class 的有效樣本數在 `lab0-c` 裡面，計算這三個變數的方式是每完成一次 `doit` 就會把該次 `doit` 的有效樣本一個一個更新 `(t_push)` 至維護的資料結構 `t_context_t` 裡面，直到總處理樣本數達到所規定的上限，也就是 `ENOUGH_MEASURE` 次。詳細的 metric 更新方式可以參考 [dennis40816](https://hackmd.io/@dennis40816/linux2025-homework1#Code-tracing) 的數學推導。在論文中，作者有提到一種 **post-processing** 的技巧，以下是擷取自論文的片段: > Cropping: Typical timing distributions are positively skewed towards larger execution times. This may be caused by measurement artifacts, the main process being interrupted by the OS or other extraneous activities. In this case, it may be convenient to crop the measurements (or discard measurements that are larger than a fixed, class-independent threshold). 在 `lab0-c` 也的確有相對應的實作在 `update_statistics` 跟 `prepare_percentiles`，在每輪的 `doit` 中，都會計算一次 `percentiles` ，然後對每一輪的測量進行裁切: ```C /* t-test on cropped execution times, for several cropping thresholds. */ for (size_t j = 0; j < NUM_PERCENTILES; j++) { if (difference < percentiles[j]) { t_push(ctxs[j + 1], difference, classes[i]); } } ``` 這邊有一個疑問產生: * `percentile` 並不是在測量完 `ENOUGH_MEASURE` 次才進行計算，而是完成一次 `doit` 就算一次，也就是說，單一次計算出的 `percentile` 並不代表整理測量的 `percentile`，是否會發生單次 `doit` 有整體時間偏高的出現而影響整體的結果，還有待確認。測量完所有的樣本之後，可以開始計算 t-test 的 t 了，以下是 Welch’s t-test t 的公式: $$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$ 其中，S1 和 S2 代表 class 1 和 class 2 的變異數: $$s^2 = \frac{\sum (x_i - \bar{x}_1)^2}{n_1 - 1} = \frac{m2_1}{n_1 - 1}$$ $$s_2^2 = \frac{\sum (x_i - \bar{x}_2)^2}{n_2 - 1} = \frac{m2_2}{n_2 - 1}$$ 然而已經有 `max_t` 為什麼還要做 `max_t / sqrt(n)` ? * 這是為了消除樣本數的影響，得到一個能「跨樣本數比較」的指標。回憶一下 t 檢定公式： $$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$ 如果樣本數越大，t 的值就容易越大，且變大的趨勢跟樣本數的平方根呈現正相關。因此才會有以下公式: $$\text{max}\,\tau = \frac{\text{max}\,t}{\sqrt{n}}$$ 最後，在 report 中，設定了兩個 threshold，分別是 `t_threshold_bananas` 跟 `t_threshold_moderate`，用來判定是否有證據否定虛無假設。 ```C /* Definitely not constant time */ if (max_t > t_threshold_bananas) return false; /* Probably not constant time. */ if (max_t > t_threshold_moderate) return false; ```