2024q1 Homework1 (lab0)

# 2024q1 Homework1 (lab0) contributed by < `yan112388` > ### Reviewed by `yeh-sudo` #### Git commit messages Git commit message 內文要清楚地描述什麼東西改變以及為什麼，以 commit [c3f821b](https://github.com/yan112388/lab0-c/commit/c3f821b2e9f047aa745b2e861275a8793c48a262) 為例，沒有內文，可以使用 `git rebase -i` 來作修改，撰寫 commit message 的部份可以參考〈[How to Write a Git Commit Message](https://cbea.ms/git-commit/)〉。 #### `q_sort` `q_sort` 中有錯誤，沒辦法以降序的方式排列，每次排序的結果都是升序。因為在 `q_merge` 中也有呼叫 `q_sort` ，經過測試， `q_merge` 函式也有錯誤。 - [ ] 錯誤範例 ```shell cmd> option descend 1 cmd> new l = [] cmd> it RAND 1000 l = [iedeqfvty ezlvxu fbgtpdt uppiv kpdxsdw iklmb eqkmfuxv cfwnfp xqxcf mdvpihpe fynexlkml cmhip spwdvtngp jxbjvrxy obnusreva ntiafklch hhaunnf jisfojyyn xvhssb gjgflfx fekrzfj zryjbxnf sfflyzffl bjrka iedxtof lsnusw dyvnpb ahwke foaljfdl ndokbcs ... ] cmd> sort ERROR: Not sorted in descending order l = [aazyc adckjz adntgwao aduqlnmk adzvmi afbum ahdvbzk ahqnz ahwke ailbtdv aiwztbgvz ajdnh ajjkg akozp aktjuj almfintvt almywslbb amqlul amzck ancaohqpr anfpl anhvhc aniyi anvizfbcz aodzl apawwff apiwy aqjaloqnl aqodkw arixelv ... ] cmd> free l = NULL cmd> quit Freeing queue ``` #### 共筆避免不必要的縮排，例如在 `q_sort` 中的敘述。 > 考量 Merge Sort 的時間複雜度為 $O(n\log{n})$ ，因此採用此法實作 `q_sort` 會呼叫 `q_mergeSort` > 已更正 ## 開發環境 ```shell $ gcc --version gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 $ lscpu Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Address sizes: 39 bits physical, 48 bits virtual Byte Order: Little Endian CPU(s): 20 On-line CPU(s) list: 0-19 Vendor ID: GenuineIntel Model name: 13th Gen Intel(R) Core(TM) i7-13700H CPU family: 6 Model: 186 Thread(s) per core: 2 Core(s) per socket: 14 Socket(s): 1 Stepping: 2 CPU max MHz: 5000.0000 CPU min MHz: 400.0000 BogoMIPS: 5836.80 ``` :::warning 在終端機執行 `export LC_ALL=C` 之後再執行上述命令，可得到英語訊息。上方翻譯詞彙不精確。 > 已更正 ::: ## 針對指定佇列操作的實作 :::danger 改進漢語表達。 ::: 在進行過程中，參考 `list.h` 與 `queue.h` 中的巨集與函式，以簡化程式碼 ### q_new 使用 malloc 配置空間給 list_head ，並透過 `INIT_LIST_HEAD` 來初始化節點 ### q_free 透過 `list_for_each_entry_safe` 走訪節點並進行釋放節點的動作 ### q_insert_head * 原先的撰寫如下： ```c bool q_insert_head(struct list_head *head, char *s) { if (!head) return false; element_t *newh = malloc(sizeof(element_t)); if (!newh) return false; newh->value = strdup(s); if (!newh->value) return false; list_add(&newh->list, head); return true; } ``` 在 commit 時出現錯誤訊息： > error: Memory leak: newh [memleak] Fail to pass static analysis. 判斷應為記憶體管理失當解決方法：加入 free 來釋放 newh 所佔用的空間 :::danger 避免非必要的項目縮排 (即 `* ` 和 `- `)，以清晰、明確，且流暢的漢語書寫。 ::: ```diff bool q_insert_head(struct list_head *head, char *s) { if (!head) return false; element_t *newh = malloc(sizeof(element_t)); if (!newh) return false; newh->value = strdup(s); if (!newh->value) { + free(newh); return false; } list_add(&newh->list, head); return true; } ``` :::warning 使用 `git diff` 或 `diff` 工具產生程式碼差異列表，注意變更的位移量 ::: ### q_insert_tail 與 `q_insert_head` 類似，但是透過 `list_add_tail` 添加節點到尾端。 ### q_remove_head 使用 `list_first_entry` 取得開頭節點的記憶體位置，接著將節點內容複製到 sp ，並在 sp 末端加入<s>結束符</s> 字串結束字元，最後再使用 `list_del` 刪除該節點 ### q_remove_tail 類似於 `q_remove_head`，但是使用 `list_tail_entry` 取得尾節點的記憶體位 ### q_delete_mid 參考[你所不知道的 C 語言: linked list 和非連續記憶體](https://hackmd.io/@sysprog/c-linked-list#%E6%A1%88%E4%BE%8B%E6%8E%A2%E8%A8%8E-Leetcode-2095-Delete-the-Middle-Node-of-a-Linked-List) 使用快慢指標的技巧（`fast`、`slow`）找到中間節點 ### q_delete_dup 透過 ```list_for_each_entry_safe``` 走訪鏈結串列中的所有節點 1. 如果當前節點的值與下一個節點的值相同，則認定存在重複 2. 如果上一次循環中存在重複，則刪除當前節點 >以上兩項 q_delete，皆包含以下操作： * 以 `list_del_init` 刪除節點 * 以 `q_release_element` 釋放節點佔用的資源 ### q_swap 透過循環，交換相鄰的節點 ```c void q_swap(struct list_head *head) { if (!head || list_empty(head)) return; struct list_head *first = head->next; while (first != head && first->next != head) { struct list_head *second = first->next; list_move(first, second); first = first->next; } } ``` 其中，`list_move` 包含了 `list_del` 與 `list_add`，藉此使程式碼更加精簡 ### q_reverse 採用最基礎的作法，使用 `pre` 、 `curr` 、 `nex` 三個指標紀錄節點，完成反向鏈結串列的操作 ``` c void q_reverse(struct list_head *head) { if (!head || list_empty(head)) return; struct list_head *pre = head->prev, *curr = head, *nex = NULL; while (nex != head) { nex = curr->next; curr->next = pre; curr->prev = nex; pre = curr; curr = nex; } } ``` ### q_reverseK :::danger access 的翻譯是「存取」，而非「訪問」(visit) ::: 透過 `count` 計算當前<s>拜訪</s> 存取過的節點數是否等於 `k` ，若等於，則表示需要將最近存取的這 k 個節點做反向的操作。完成反向操作後，會將 `k` 歸零並繼續存取節點，再並繼續依序存取後續的節點，重複進行上述的判斷與反向操作，直到所有節點都處理完畢。 :::danger 段落中的程式碼標注，使用 1 個 [backtick](https://en.wikipedia.org/wiki/Backtick)，而非 3 個。 > 已更正 ::: ```c void q_reverseK(struct list_head *head, int k) { if (!head || list_empty(head) || k <= 1) return; int count = 0; struct list_head *cur, *nex, target, *tmp_head = head; INIT_LIST_HEAD(&target); list_for_each_safe (cur, nex, head) { count++; if (count == k) { list_cut_position(&target, tmp_head, cur); q_reverse(&target); list_splice_init(&target, tmp_head); count = 0; tmp_head = nex->prev; } } return; } ``` 透過 `list_cut_position` 與 `list_splice_init` 切割與拼接鏈結串列 ### q_sort 考量 Merge Sort 的時間複雜度為 $O(n\log{n})$ ，因此採用此法實作`q_sort` 會呼叫 `q_mergeSort` :::danger 注意詞彙！ $\to$ [資訊科技詞彙翻譯](https://hackmd.io/@sysprog/it-vocabulary) ::: `q_mergeSort`：使用快慢<s>指針</s> 指標找到鏈結串列的中間節點，將鏈結串列分成兩半，遞迴地對左右兩半進行排序，再使用 `mergeTwo` <s>函數</s> 函式將排序後的左右兩半合併為一個排序後的鏈結串列 ```c struct list_head *q_mergeSort(struct list_head *head) { ... struct list_head *left = q_mergeSort(head), *right = q_mergeSort(mid); return mergeTwo(left, right); } ``` `mergeTwo`：該函式的輸入為兩個已排序的鏈結串列 `left` 和 `right`，兩者會合併為一個排序過的鏈結串列。其中，使用`strcmp` 比較 `left` 和 `right` 兩節點值，將值較小的節點插入合併後的鏈結串列中經 `yeh-sudo` 提醒，發現我原先撰寫的程式碼有誤，僅能以升序的方式排列，沒辦法以降序的方式排列。因此，加入判斷式，將需要降序排列的鏈結串列以 `q_reverse` 反向操作。 ```diff + if (descend) + q_reverse(head->next); ``` 此作法為較直覺的簡易作法，缺點為排序完的結果會是 unstable，待改進。 ### q_ascend ### q_descend 以上兩個函式的目的為從鏈結串列中刪除節點，使得鏈結串列保持遞減（descend）與遞增（ascend）順序。因此，兩者操作大致相同，差別僅在於節點值的比較操作。從尾部開始存取鏈結串列，比較相鄰節點的值，並刪除不符合遞增/遞減順序的節點，最終使得鏈結串列保持遞增/遞減順序。 ### q_merge 首先，會取出鏈結串列中的第一個節點，作為輸出合併鏈結串列的頭節點 `qhead`，透過 `list_for_each_entry` 來存取鏈結串列中的每個節點 `cur`。當前節點 `cur` 所指向的<s>隊列</s> 佇列會合併到 `qhead` 所指向的<s>隊列</s> 中，並同時更新合併後的<s>隊列</s> 佇列大小 `qhead->size` ，直到整個鏈結串列存取完成。最終，將合併後的 `qhead` 節點重新插入到原始鏈結串列的頭部，並以 `q_sort`進行排序 :::danger 改進你的漢語表達。 ::: ```c int q_merge(struct list_head *head, bool descend) { if (!head || list_empty(head)) return 0; queue_contex_t *qhead = list_first_entry(head, queue_contex_t, chain); list_del_init(&qhead->chain); queue_contex_t *cur = NULL; list_for_each_entry (cur, head, chain) { list_splice_init(cur->q, qhead->q); qhead->size += cur->size; cur->size = 0; } list_add(&qhead->chain, head); q_sort(qhead->q, descend); return qhead->size; } ``` ## 引入 lib/list_sort.c 到 lab0-c 中 ### 引入步驟先將 [lib/list_sort.c](https://github.com/torvalds/linux/blob/master/lib/list_sort.c) 與 [linux/list_sort.h](https://github.com/torvalds/linux/blob/master/include/linux/list_sort.h) 放入 lab-0 的目錄下 **q_test.c** 引入`#include "list_sort.h"` 在 `console_init()` 函式中新增命令 ```c ADD_COMMAND(listsort, "Sort queue using list sort function from the Linux kernel.", ""); ``` 參考 `do_sort` ，新增 `do_listsort` 來呼叫 `list_sort` 新增 cmp 函式 ```c static int element_cmp(void *priv, const struct list_head *a, const struct list_head *b) { element_t *ela = list_entry(a, element_t, list); element_t *elb = list_entry(b, element_t, list); return strcmp(ela->value, elb->value); } ``` 修改 `Makefile` 以確保 `list_sort.c` 被編譯並連結到最終的可執行文件中 OBJS 變數：用於列出構建目標程式所需的所有目標文件 ```diff OBJS := qtest.o report.o console.o harness.o queue.o \ random.o dudect/constant.o dudect/fixture.o dudect/ttest.o \ shannon_entropy.o \ - linenoise.o web.o + linenoise.o web.o \ + list_sort.o ``` **list_sort.h** 定義 `likely(x)` 與 `unlikely(x)` 巨集，並加上 `#include <stdint.h>` ```diff +#define likely(x) __builtin_expect(!!(x), 1) +#define unlikely(x) __builtin_expect(!!(x), 0) +#include <stdint.h> ``` 發生錯誤 ``` error: data definition has no type or storage class [-Werror] 251 | EXPORT_SYMBOL(list_sort); | ^~~~~~~~~~~~~ ``` > [EXPORT_SYMBOL](https://lkw.readthedocs.io/en/latest/doc/04_exporting_symbols.html) ：在 Linux 核心中，EXPORT_SYMBOL() 巨集是用來將函式或變數符號匯出給其他核心模組使用的在此不需要使用到，因此註解掉，錯誤不再產生最後在 `git commit` 時，發生錯誤 ``` list_sort.c:17:20: style: Variable 'head' is not assigned a value. [unassignedVariable] struct list_head *head, **tail = &head; ``` 參照 [chiangkd](https://hackmd.io/@chiangkd/2023spring-lab0-c#%E7%A0%94%E8%AE%80-list_sortc-%E4%B8%A6%E5%BC%95%E5%85%A5%E5%B0%88%E6%A1%88) 的方法，用 cppcheck-suppress 跨過去 ```diff + // cppcheck-suppress unassignedVariable struct list_head *head, **tail = &head; ``` ### 比較自己的 sort 和 Linux 核心程式碼之間效能參閱 [GNU / Linux 開發工具](https://hackmd.io/@sysprog/gnu-linux-dev/https%3A%2F%2Fhackmd.io%2Fs%2FB11109rdg#%E5%AE%89%E8%A3%9D)，使用 perf 分析工具新增 `trace_100.cmd`、`trace_10000.cmd`、`trace_1000000.cmd` 測試不同筆數資料情況 ```cmd # Test of the performance: sorting 10000 RAND strings option fail 0 option malloc 0 new ih RAND 10000 time sort time free ``` 以下列命令將不同資料數的測試分別進行 5 次 ```shell $ sudo perf stat --repeat 5 ./qtest -f traces_perf/trace_10000.cmd ``` 以下為分析結果所繪製而成的比較表格 | 測試資料筆數 | 指標 | sort | list_sort | |----------|-------------------------|---------------|---------------| | 1000000 | task-clock | 1,527.38 msec | 1,471.19 msec | | | cpu_core/cycles/ | 4,942,712,038 | 4,710,039,345 | | | cpu_core/instructions/ | 4,616,911,710 | 4,695,221,888 | | | IPC | 0.93 | 1.00 | | 10000 | task-clock | 10.89 msec | 11.15 msec | | | cpu_core/cycles/ | 25,118,920 | 24,439,350 | | | cpu_core/instructions/ | 45,358,560 | 45,850,365 | | | IPC | 1.81 | 1.88 | | 100 | task-clock | 2.85 msec | 2.85 msec | | | cpu_core/cycles/ | 1,691,474 | 1,691,474 | | | cpu_core/instructions/ | 2,351,987 | 2,351,987 | | | IPC | 1.39 | 1.39 | 在大規模測試（1000000 和 10000）中，`list_sort` 在大多數指標上都優於 `sort`，尤其是在指令數和每周期指令（IPC），`list_sort` 能夠在更短的 CPU 週期內執行更多的指令。但在小規模測試（100）中，`sort` 的執行時間略短於 `list_sort`。這可能是因為在資料量較小時，排序函式本身的實作細節和常數成本會對效能產生更大的影響。 ## 修改 lab0-c 中 dudect 實作觀察 [oreparaz/dudect/src/dudect.h ](https://github.com/oreparaz/dudect/blob/master/src/dudect.h) ```c if (first_time) { // throw away the first batch of measurements. // this helps warming things up. prepare_percentiles(ctx); } else { update_statistics(ctx); ret = report(ctx); } ``` > 論文有提到： > Pre-processing: We pre-process measurements prior to statistical processing. We discard measurements that are larger than the p percentile, for various2 values of p. **The rationale here is to test a restricted distribution range, especially the lower cycle count tail. The upper tail may be more influenced by data-independent noise.** This (heuristic) process gives good empirical results (makes detection faster); nevertheless we also process measurements without pre-processing. >這段話待詳細理解，為什麼上尾部可能更容易受到與數據無關的雜訊影響？在 `update_statistics` 前，先執行 `prepare_percentiles`，於是參照該檔案修改 `lab0-c/dudect/fixture.c` * 加入 `prepare_percentiles` 與 `prepare_percentiles` * 把 `prepare_percentiles` 新增到 `doit` ```c static int64_t percentile(const int64_t *a_sorted, double which, size_t size) { size_t array_position = (size_t)((double)size * (double)which); assert(array_position < size); return a_sorted[array_position]; } static void prepare_percentiles(int64_t *exec_times) { qsort(exec_times, N_MEASURES, sizeof(int64_t), compare_int64); for (size_t i = 0; i < N_MEASURES; i++) { exec_times[i] = percentile( exec_times, 1 - (pow(0.5, 10 * (double) (i + 1) / (double) N_MEASURES)), N_MEASURES); } } ``` 修改過後，能減少極端值（分佈的上尾部）對執行時間的影響 ## 整合 tic-tac-toe