2024q1 Homework1 (lab0)

# 2024q1 Homework1 (lab0) contributed by < [`weihsinyeh`](https://github.com/weihsinyeh/lab0-c) > ## 開發環境 ```shell $ gcc --version gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0 Copyright (C) 2019 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. $ lscpu Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Byte Order: Little Endian Address sizes: 39 bits physical, 48 bits virtual CPU(s): 8 On-line CPU(s) list: 0-7 Thread(s) per core: 2 Core(s) per socket: 4 Socket(s): 1 NUMA node(s): 1 Vendor ID: GenuineIntel CPU family: 6 Model: 142 Model name: Intel(R) Core(TM) i5-10210U CPU @ 1.60GHz Stepping: 12 CPU MHz: 2100.000 CPU max MHz: 4200.0000 CPU min MHz: 400.0000 BogoMIPS: 4199.88 Virtualization: VT-x L1d cache: 128 KiB L1i cache: 128 KiB L2 cache: 1 MiB L3 cache: 6 MiB NUMA node0 CPU(s): 0-7 ``` ## 原始檔個別作用開始作業的項目整合網頁伺服器與亂數+論文閱讀，應先清楚閱讀作業要求並理解整個專案架構。 * `queue.[ch]` : Queue Implementation * `list.h` : Linux-like doubly-linked list * `console[ch]` : Implement the command line interface * `qtest.c` : Implement the shuffle command and test the functionality of the queue by running the function in console.h. * `Makefile` : Modify it to execute Valgrind or test/debug a single file * `./trace` : In this folder, 17 test `.cmd` file can be feeded to qtest * `harness.c` : 其中有 `find_header` 與 `find_footer` 函式用來追蹤記憶體尾端 magic_number 是否有改變。在程式碼裡面可以看到 `test_calloc` 跟 `test_malloc` 的差別在 calloc 有再把 malloc 的記憶體 memset 為 0 。而檢查時間的 `time_limit = 1` * `./dudect`: 實作論文 "Dude, is my code constant time?" ## 可用的工具 ### Valgrind ``` make valgrind ``` 舉例 : 透過將測試 14 複製到 traces/trace-massif.cmd 然後在 Makefile 新增命令。 ```diff +check-massif: qtest + valgrind --tool=massif ./$< -v 3 -f traces/trace-massif.cmd ``` 動態分析記憶體後，透過命令視覺化記憶體用量，並用 Massif_Visualizer 將檔案用圖片顯。 ``` ms_print massif.out.12498 ``` ### git #### 當原先 fork 前的版本有更新 ``` git remote add upstream git@github.com:sysprog21/lab0-c.git git fetch upstream git rebase upstream/master git add file # 因為有檔案衝突所以必須先手動加入衝突的檔案 ``` 再來手動修正錯誤的檔案。檔案中會有標示衝突的地方。 ``` <<<<<<< HEAD >>>>>>> ``` ``` git rebase --continue git branch # 檢查 branch 名稱 git push origin master -- force ``` #### 當要修改過去的 commit message ``` git log --oneline # 透過將 commit 的紀錄顯示出來，尋找要修正的 commit git rebase -i <commit id> ``` 要改的 commit 將前面的 pick 改為 reword 當離開後則會進入 vim 修改 commit message。 #### 回復個別檔案的版本 ```shell git checkout [commit ID] -- path/to/file ``` 知道 `header file` 不能夠被修改。發現原來這也有其原因。由於 Makefile 的設計是標頭檔的內容被更改了，那麼所有跟此標頭檔相依的檔案都會被重新編譯。 ## 實作指定佇列操作 :::warning 注意書名號 (即 `〈` 和 `〉`) 的使用，不該用大於 (即 `>`) 和小於 (即 `<`) 取代，不僅排版效果不同，也會造成 HackMD 解析的問題。 ::: ### `create_new_element` 將反覆出現在 `q_insert_head` 與 `q_insert_tail` 的部份都用 `create_new_element` 函式呼叫。而從寫 shell 的經驗得知，在終端機輸入命令會紀錄在字元陣列中，再執行對應的操作，問題常常會出現在 PIPELINE 的過程，此時需將字元放到另外一個字元陣列中，再執行對應的操作。否則，當 PIPELINE 其中一個行程完成工作釋放記憶體，另外一個命令就會被影響。 :::danger command 是「命令」，而非「指令」(instruction) ::: 透過 `strcpy()` 將字串複製，然而 commit 時出現 vulnerable 的問題，參閱規格書後改為 `strncpy()` 。 > **C99/C11 Standard (§ 7.21.2.4)** The strncpy function > The strncpy function copies not more than n characters (characters that follow a null character are not copied) from the array pointed to by s2 to the array pointed to by s1. If copying takes place between objects that overlap, the behavior is undefined. > **參考〈 [Why should you use strncpy instead of strcpy?](https://stackoverflow.com/questions/1258550/why-should-you-use-strncpy-instead-of-strcpy)〉** ```diff - strncpy(new_element->value ,s); + size_t dest_size = strlen(new_element->value); + if (dest_size > 0) { + *(new_element->value) = '\0'; + strncat(new_element->value, s, dest_size - 1); + } ``` 配置記憶體後由於 overcommit 的問題，因此要先檢查是否為指向 NULL 的指標。**C99/C11 Standard (§ 7.20.3 (28))** 發現檢查為指向 NULL 的指標後並釋放是冗餘的動作。 > The free function causes the space pointed to by ptr to be deallocated, that is, made available for further allocation. If ptr is a null pointer, no action occurs. ```diff if (new_element == NULL) { + free(new_element); - free(new_element); ``` ```diff - size_t dest_size = strlen(new_element->value); - if (dest_size > 0) { - *(new_element->value) = '\0'; - strncat(new_element->value, s, dest_size - 1); - } + new_element->value = strdup(s); ``` 修正字串問題仍有不穩定的問題。同樣的程式碼有時可以通過測試，有時失敗。 ```shell # Test of malloc failure on insert_tail ERROR: Need to allocate separate string for each queue element ``` 因此修改 `malloc` 與 `free` 為 `test_malloc` 與 `test_free`。後來改為直接在 queue.c 引入 harness.h 讓 `malloc` 與 `free` 可以透過 function hooking 的技巧被替換掉。 > [commit f6c4877](https://github.com/weihsinyeh/lab0-c/commit/f6c487742dbf17002f48a35242e90d73729b9872) ### `remove_element` 將 `q_remove_head` 以及 `q_remove_tail` 會共同使用到的程式碼寫在 `remove_element` 中。`remove_element` 需要複製要被移除的字串放照到指標 `sp` 指向的位置中。 > [Wikipedia: Null character](https://en.wikipedia.org/wiki/Null_character) > null character (also the null terminator) is a control character with the value zero. >[Feis Studio C](https://youtu.be/kRFmcG_YZEk?list=PLY_qIufNHc293YnIjVeEwNDuqGo8y2Emx&t=522) 影片中的用來儲存字串的字元的陣列最後一個元素為字元 `'\0'` 。 > [What is `\0` (null byte) in C? Explained with examples](https://codedamn.com/news/c/what-is-0-null-byte-in-c) > In the C language, `\0` represents the null character. Not to be mistaken for the digit '0', it's a character with an ASCII value of zero. When you see `\0` in code, you're looking at a single character that represents the number 0 in the ASCII table. It's often utilized as a marker or an endpoint, especially in strings. > **ASCII Table** > ASCII code 00 = NULL (Null character) > ASCII code 48 = 0 (number zero) ```c sp[bufsize - 1] = '\0'; ``` 移除開頭端的節點時，回傳節點裡面的值。此想法可應用在透過節點的值確認移除是否成功。 :::warning 改進漢語表達。 ::: 實作完 `q_remove_head` 後，理解 `the head of the queue` 是一個能存取佇列的結構體，而非佇列的第一個元素。這是命名函式為 `q_insert_head` 以及 `q_remove_head` 的原因，以描述會從 `the head of the queue` 那一端插入或刪除節點的功能。 :::warning 留意[資訊科技詞彙翻譯](https://hackmd.io/@sysprog/it-vocabulary) ::: 透過 Valgrind 分析發現每當建立節點時都會有 8 byte 累加，遇到這種情況通常源自於忘記檢查緩衝 buffer 的大小就去使用。 ```shell +++ TESTING trace trace-01-ops: # Test of insert_head and remove_head ==6700== Invalid read of size 8 ==6700== at 0x4842C30: memmove (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==6700== by 0x10F84C: memcpy (string_fortified.h:34) ==6700== by 0x10F84C: remove_element (queue.c:37) ==6700== by 0x10F9BA: q_remove_head (queue.c:114) ... ==6700== by 0x10D3C2: main (qtest.c:1258) ==6700== Address 0x4ba8448 is 8 bytes before a block of size 2,049 alloc'd ``` 於是先將節點的值所需要的記憶體配置後才去存取。 ```diff + new_element->value = test_malloc(sizeof(char) * (strlen(s) + 1)); if (new_element->value == NULL) { test_free(new_element); return NULL; } ``` 更正後發現在 `memcpy` 時多出 3 bytes 。而 `test_malloc` 則多出 8 bytes。 ```shell +++ TESTING trace trace-01-ops: # Test of insert_head and remove_head ==12324== Invalid read of size 8 ==12324== at 0x4842A8F: memmove (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==12324== by 0x10F888: memcpy (string_fortified.h:34) ==12324== by 0x10F888: remove_element (queue.c:37) ==12324== by 0x10FA03: q_remove_head (queue.c:119) ... ==12324== by 0x10D3E2: main (qtest.c:1258) ==12324== Address 0x4ba80a0 is 3 bytes after a block of size 45 alloc'd ==12324== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==12324== by 0x10F3E3: test_malloc (harness.c:133) ==12324== by 0x10F81A: create_new_element (queue.c:23) ==12324== by 0x10F977: q_insert_head (queue.c:86) ... ==12324== by 0x10D3E2: main (qtest.c:1258) ==12324== Address 0x4ba80a8 is 11 bytes after a block of size 45 alloc'd ``` > **queue.h** > if sp is non-NULL and an element is removed, copy the removed string to *sp (up to a maximum of bufsize-1 characters, plus a null terminator.) **C99 Standard (§ 7.21.2.1) The memcpy function** > The memcpy function copies n characters from the object pointed to by s2 into the object pointed to by s1. If copying takes place between objects that overlap, the behavior is undefined. 閱讀規格書上的描述，加入判斷式檢測 overlap，同時也把緩衝的大小依據要複製的字串長度調整。將 `remove_element` 。 ```diff if (sp != NULL) { + size_t len; + if (strlen(element->value) < bufsize - 1) + len = strlen(element->value); + else + len = bufsize - 1; memcpy(sp, element->value, len); sp[len] = '\0'; } ``` ### `q_new` 由於 `malloc` 有 overcommit 的問題。當 `malloc` 回傳 NULL 代表配置不成功，然而有時即便配置成功也並不保證能夠使用該位址。因此透過立即使用 `INIT_LIST_HEAD` 初始化了該 `malloc` 分配的記憶體。不太可能觸發 overcommit 的問題，因為已經開始使用指標 `head` 指向的被配置的記憶體了。補充:`malloc(0)` 要看實作定義。 > **C99/C11 Standard (§ 7.20.3 (28))** > If the space cannot be allocated, a null pointer is returned. If the size of the space requested is zero, the behavior is implementation defined: either a null pointer is returned, or the behavior is as if the size were some nonzero value, except that the returned pointer shall not be used to access an object. ```c struct list_head *head = malloc(sizeof(struct list_head)); if(!head) return NULL; INIT_LIST_HEAD(head); return head; ``` ### `q_free` > 〈[你所不知道的 C 語言：記憶體管理、對齊及硬體特性](https://hackmd.io/@sysprog/c-memory)〉 > `free` 函式釋放的是指標指向位於 heap 的連續記憶體，而非指標本身佔有的記憶體 (`*ptr`)。因此在釋放記憶體時，不能只是將指標 `head` 指向的記憶體 (`*head`) 釋放，應逐一將 `head` 指向的佇列中每個元素的記憶體都釋放。思考每個前置處理器的使用情境，決定用 `list_for_each_entry_safe` 。由於參數只有結構體 (`list_head`) 的指標，指標指向的地方紀錄佇列存放的地址，可用來刪除佇列放的每個元素。而`list_for_each_safe` 使用的情境當 `safe` 是 `list_head` 的指標。 > q_free() - Free all storage used by queue, no effect if header is NULL ```c /* * @node: list_head pointer used as iterator * @safe: list_head pointer used to store info for next entry in list * @head: pointer to the head of the list * / ``` 當只能得到結構體 (`list`) 的位址的話，則無法刪掉結構體 (`element_t`) 裡面的字串位址，因為兩個都為結構體 (`element_t`) 的成員。 `container_of` 巨集逆轉上述流程的意思是能夠透過結構體 (`element_t`) 的成員 `list` 的位址知結構體 (`element_t`) 的位址。想明白後，開始對此作業感到很有趣，過去常常會有題目要求有節點有編號與值，然後依照值的大小去排列回傳編號。我都會額外再開記憶體去紀錄他們的編號，但現在卻可以只知道值的位置，便可知道值從而進行排序，然後在重值的位置得知編號回傳。 > TODO: 到此又開始想要推翻先前的想法，因為只要存取到結構體的其中一個成員都可以透過 `container_of` 將之刪掉。 ```c #ifdef __LIST_HAVE_TYPEOF #define list_for_each_entry(entry, head, member) \ for (entry = list_entry((head)->next, __typeof__(*entry), member); \ &entry->member != (head); \ entry = list_entry(entry->member.next, __typeof__(*entry), member)) #endif ``` ```diff if(l == NULL ) return; + element_t * entry = NULL, *safe = NULL; + list_for_each_entry_safe(entry, safe, l, list) { + free(entry->value); + free(entry); + } free(l); return; ``` ### `q_insert_head` `q_insert_tail` 原先有存取結構體 (`new_element`) 裡的成員要用哪種運算子的問題。閱讀規格書後了解到，存取結構體 (`new_element`) 裡的成員 `list` 的成員 `next` 是存取一個 pointer to qualified structure 的 qualified structure 的 pointer 。因此第一個 operator 為 `->` 第二個為 `.` 。 > **C99 Standard (§ 6.2.3) `.` operator and `->` operator** > * the members of structures or unions; each structure or union has a separate name space for its members (disambiguated by the type of the expression used to access the member via the `.` or `->` operator > * The first operand of the . operator shall have a qualified or unqualified structure or union type, and the second operand shall name a member of that type. > * The first operand of the `->` operator shall have type "pointer to qualified or unqualified structure" or "pointer to qualified or unqualified union", and the second operand shall name a member of the type pointed to. ```c element * new_element = create_new_element(s); new_element->list.next = head->next; ``` 修正：再將新增的結構體 (`element_t`) 的成員 `list` 存入存取佇列的 `head` 指向的第一個 `list_head`　。 ```c list_add(&(new_element->list), head); ``` 修正：原先使用 `list_add_tail` 並未仔細看其中的內容，誤將參數設為 `head->prev`。正確寫法可以有兩種如下。 ```diff - list_add_tail(&(new_element->list), head->prev); +　　//list_add_tail(&(new_element->list), head); + list_add(&(new_element->list), head->prev); ``` 在解決這個問題以前，在測試 2 的時候會有回傳的字串不斷的重複，讓我不斷思考移除結構體 (`element_t`) 以及有重複 `value` 成員的 `element_t` 是否有錯誤。甚至最後還去懷疑 `create_new_element` 函式的正確性。這讓我在錯誤的方向徘徊很久 ### `q_delete_dup` `q_delete_mid` 從 Valgrind 分析測試 6 發現有問題的函式為 `q_delete_dup` 。 ```shell +++ TESTING trace trace-06-ops: # Test of insert_head, insert_tail, delete duplicate, sort, descend and reverseK ==4540== Invalid read of size 8 ==4540== at 0x10FB57: q_delete_dup (queue.c:202) ==4540== by 0x10C033: do_dedup (qtest.c:465) ... ==4540== Address 0x4ba8350 is 48 bytes inside a block of size 64 free'd ``` 但 `make test` 仍然有`Segmentation fault occurred. You dereferenced a NULL or invalid pointer` 的問題。於是我先將 `q_delete_dup` 中跟記憶體配置有關的函式 `test_free()` 拿掉以分析是哪裡出錯。結果如下 ```shell ERROR: Calling delete duplicate on null queue ERROR: Attempted to free unallocated block. Address = 0x5555555555555555 ``` 這是由於沒有釋放記憶體，而造成使用過多記憶體。在持續分析 Valgrind 沒有進展後，改回從測資找到導致 core dumped 的原因。 ```shell cmd> new l = [] cmd> ih RAND 4 l = [qifel qrqzcbu nolhooqq wbjyhgmi] cmd> ih gerbil 3 l = [gerbil gerbil gerbil qifel qrqzcbu nolhooqq wbjyhgmi] cmd> it lion 2 l = [gerbil gerbil gerbil qifel qrqzcbu nolhooqq wbjyhgmi lion lion] cmd> it zebra 2 l = [gerbil gerbil gerbil qifel qrqzcbu nolhooqq wbjyhgmi lion lion zebra zebra] cmd> dedup l = [qifel qrqzcbu nolhooqq wbjyhgmi] cmd> ``` ```shell cmd> sort l = [arlpbjsgb gerbil gerbil gerbil glerdlhpb jjrds lion lion ptraskonp zebra zebra] cmd> dedup Bus error (core dumped) ``` 發現如果前一步有 `sort` 就會有 core dumped，而問題在於 `q_sort` 函式沒有將 `prev` 的指標重新建立。更正後仍有 ```cmd Segmentation fault occurred. You dereferenced a NULL or invalid pointer ``` 所以我就在每次釋放前，都會先檢查結構體（`element_t`）的指標 `value` 是否非 NULL 才可以刪除。因為 `q_delete_dup` 函式跟 `q_delete_mid` 都會刪除節點，所以也修改 `q_delete_mid` 。 ```diff list_del_init(posptr); - test_free(middle->value); + if (!middle->value) + test_free(middle->value); test_free(middle); ``` 然而這樣卻讓原本能夠過的測試 2 會有 2 個 blockes 仍然被留下來。 ```shell +++ TESTING trace trace-02-ops: # Test of insert_head, insert_tail, remove_head, remove_tail, and delete_mid ERROR: Freed queue, but 2 blocks are still allocated --- trace-02-ops 0/6 ``` 由於`q_delete_mid` 剛好只有 2 個，是 `middle->value` 的記憶體未被釋放的原因。因此調整成直接刪除就成功了，因為當釋放的是 NULL 釋放不會起任何功用。 ```diff list_del_init(posptr); - if (!middle->value) - test_free(middle->value); + test_free(middle->value); ``` :::danger command 是「命令」，而非「指令」(instruction) ::: 因為剛剛修改 `q_sort` 的問題沒有原先 (core dumped) 的問題後，就變成還有一些 blocks 還有下，是釋放記憶體的問題。 ```shell +++ TESTING trace trace-06-ops: # Test of insert_head, insert_tail, delete duplicate, sort, descend and reverseK ERROR: There is no queue, but 7 blocks are still allocated ERROR: There is no queue, but 7 blocks are still allocated ERROR: Freed queue, but 7 blocks are still allocated --- trace-06-ops 0/6 ``` 於是改為釋放 `element_t` 的則不用去檢查先檢查結構體（`element_t`）的指標 `value` 是否非 NULL 才可以刪除，而是直接刪掉，就全部都過了，甚至連原先測試 14 都會出現剩下 blocks 如下的問題也隨之解決。 ```shell ERROR: There is no queue, but 88448 blocks are still allocated ERROR: Time limit exceeded. Either you are in an infinite loop, or your code is too inefficient ERROR: Not sorted in ascending order ERROR: There is no queue, but 206500 blocks are still allocated Error limit exceeded. Stopping command execution ``` ### `q_swap` 方式：兩兩為一個單位，所以迭代器 iterator 一次移動二個。 ```c for (node = (head)->next, safe = node->next; node != (head) && safe != (head); node = node->next,safe = node->next) { node->prev->next = safe; safe->next->prev = node; safe->prev = node->prev; node->next = safe->next; safe->next = node; node->prev = safe; } ``` ### `q_reverse` 把每個節點的兩個指標 `prev` 跟 `next` 作交換。以前都要寫成如下： ```c struct list_head *node, *tmp; for (...) { tmp = node->next; node->next = node->prev; node->prev = tmp; } ``` 修正：而現在則是用 `list_for_each_safe` 取代。 ```c list_for_each_safe (node, tmp, head) { node->next = node->prev; node->prev = tmp; } ``` ### `q_reverseK` 原先採用的方式為透過一個迭代器走訪整個鏈結串列，當迭代器經過一個節點時透過第二層迴圈再去繼續拜訪 K 個節點，當經過 K 個節點後，將這 K 個節點形成一個環形雙向鏈串串列。再用 `q_reverse`，並修正頭尾節點的指標將此鏈結串列加入。參考 [vax-r](https://hackmd.io/@vax-r/linux2024-homework1) 後，用 `list.h` 裡的函式利於開發與維護。這也是 Linux 核心最基本的想法，要完成的任務都可以透過很小的命令 (command) 組合，而非開發大型原始碼。此外也發現 `LIST_HEAD(tmp)` 的用法。然而他的想法跟原先一次移動 K 個不一樣，所以改成能不使用除法的方式來計算，每次增加 K 個。 ```diff - int times = q_size(head) / k; + int times = q_size(head); struct list_head *tail; LIST_HEAD(tmp); LIST_HEAD(new_head); - for (int i=0; i < times; i++ ) { + for (int i=0; i < times; i+=k) { int j = 0; list_for_each(tail, head) { - if (j >= k) + if (j == k) break; j++; } ... } ``` ### `q_sort` 先將環形雙向連鏈結串列建為雙向鏈結串列，再用合併排序。參考〈 [linkedListSort](https://npes87184.github.io/2015-09-12-linkedListSort/) 〉重先建立回環形雙向連鏈結串列。由於合併排序實作方式採用升序，因此如果為降序，則直接反轉 (reverse) 即可。雖然合併排序中只修改每個節點的指標 `next` 即可，但需留意在合併排序完後，應依序拜訪排序後的鏈結串列的節點，建立每個件的的成員結構體（`list_head`）的指標 `prev`。 ```diff struct list_head *iter = head->next; + struct list_head *pre = head; while (iter->next) { + iter->prev = pre; iter = iter->next; + pre = pre->next; } + iter->prev = pre; iter->next = head; head->prev = iter; ``` 參考 [yeh-sudo](https://github.com/yeh-sudo/lab0-c/tree/master) 將遞迴的方式改為迴圈的寫法。[commit 0d1ae11](https://github.com/weihsinyeh/lab0-c/commit/0d1ae114a5faddf10b8a937e5987cdf53937c872) 用 top-down 的方式直接比較兩個鏈結串列的第一個節點，然後加到一個鏈結串列後。留意 `!head` 要先檢查，否則先檢查 `list_empty(head)` 會直接存取不存在的結構體 (`list_head`) 的成員 `next` 。`list_is_singular()` 中雖有檢查 `list_empty(head)` ，但卻不行簡化移除判斷式中檢查 `list_empty(head)` 的部分。因為 `list_is_singular(head)` 對 `list_empty()` 的檢查是先假設 `head` 指向的不為空的佇列，所以如果為空反而就不是只含單一節點所以反而會回傳`false`。 ```c if (!head || list_empty(head) || list_is_singular(head)) ``` ```c static inline int list_is_singular(const struct list_head *head) { return (!list_empty(head) && head->prev == head->next); } ``` 同時將合併的寫法運用運算子 `xor` 反轉位元的特性簡化程式。下方註解為真值表。 > [commit 68fa622](https://github.com/weihsinyeh/lab0-c/commit/68fa6229bf9f101f48464ff06f508c4406a34e85) 並簡化為三元運算子。 ```c if (descend == false) { if (strcmp(list_entry(l1, element_t, list)->value, list_entry(l2, element_t, list)->value) < 0) { l1->next = merge(l1->next, l2, descend); return l1; // descend 小於 ==> return l1 } else { l2->next = merge(l1, l2->next, descend); return l2; // descend 大於 ==> return l2 } } else { if (strcmp(list_entry(l1, element_t, list)->value, list_entry(l2, element_t, list)->value) > 0) { l1->next = merge(l1->next, l2, descend); return l1; // ascend 大於 ==> return l1 } else { l2->next = merge(l1, l2->next, descend); return l2; // ascend 小於 ==> return l2 } } ``` ```c struct list_head *node = (descend ^ (strcmp(e1->value, e2->value) < 0)) ? L1->next : L2->next; ``` 原本作法為不斷拆解到最小遞迴。因為在〈 [你所不知道的 C 語言：遞迴呼叫篇](https://hackmd.io/@unknowntpo/B1WzhkvKL/%2F%40sysprog%2Fc-recursion) 〉中知道所有的遞迴都可以轉成迴圈，且得益於現代的編譯器最佳化技術，遞迴的程式不見得會比迭代的版本來得慢。所以，我想知道合併排序遞迴與迴圈的作法在記憶體使用量的差別。於是用 Massif 實驗比較兩種作法。使用遞迴的方式在動態記憶體分析 Valgrind 中出現問題 `Stack overflow in thread #1: can't grow stack to 0x1ffe801000`。透過 `ms_print massif.out.12498` 視覺化記憶體用量。 ![massif12498](https://hackmd.io/_uploads/H1TJ0DdRp.png) > **〈 [Valgrind](https://valgrind.org/docs/manual/ms-manual.html) 〉** Finally, there is at most one peak snapshot. The peak snapshot is a detailed snapshot, and records the point where memory consumption was greatest. The peak snapshot is represented in the graph by a bar consisting of '#' characters. The text at the bottom shows that snapshot 14 was the peak. >peak snapshots are only ever taken after a deallocation happens. This avoids lots of unnecessary peak snapshot recordings (imagine what happens if your program allocates a lot of heap blocks in succession, hitting a new peak every time). But it means that if your program never deallocates any blocks, no peak will be recorded. It also means that if your program does deallocate blocks but later allocates to a higher peak without subsequently deallocating, the reported peak will be too low. 遞迴視覺化的結果如同描述，每一刻都是新的峰值。而改用迴圈實作合併排序演算法。[commit b208c81](https://github.com/weihsinyeh/lab0-c/commit/b208c81a6e3243abfadc2dcbecf54f5271f848c0) 修正完這個問題就通過測試 14 了。然而遞迴的結果會這樣可能源自於記憶體未釋放，待確認。並不代表不能用遞迴。但在 Linux 核心原始程式碼中是用迴圈的作法。 ![massif.out.15262](https://hackmd.io/_uploads/r1S_hv_R6.png) 在測試 17 中透過 Massif 分析發現有跟上方同樣的問題，觀察其中有錯誤的函式為 `test malloc` 與 `q_insert_head` 與 `q_insert_tail` 。 ![massif15654](https://hackmd.io/_uploads/SynX6vuAT.png) :::warning 使用圖片展現 massif 結果。是否定位出記憶體問題？ ::: ### `q_ascend` `q_descend` 原先只是比較兩個存放字元指標的位置，閱讀規格書後應該用 `strcmp`。 > **C99 Standard (§ 7.21.4.2) The strcmp function** > The strcmp function returns an integer greater than, equal to, or less than zero, accordingly as the string pointed to by s1 is greater than, equal to, or less than the string pointed to by s2. ```diff - if (entry->value > safe->value) + if (strcmp(entry->value,safe->value) > 0) ``` 更正對於描述的理解，queue `[3 2]` 應該為 `[2]`。 > **queue.h 中 q_ascend q_descend 的定義** > * q_descend() - Remove every node which has a node with a strictly greater value anywhere to the right side of it. > * q_ascend() - Remove every node which has a node with a strictly less value anywhere to the right side of it. 然而當輸入 `[5 4 3 2 2]` 原先的佇列變為 `[5 2]` ，依舊不符合規則，重新審視函式的定義，改為兩層迴圈。外迴圈判斷此次遍歷所有佇列的節點是否還有不符合規則的，如果還有則持續更新直到最後的佇列也符合嚴格單調遞減的規則。注意定義中 strictly less 。將 `>` 改為`>=`。 ```c=1 while(1){ bool nothandle = true; //traverse all element_t in the queue while (entry != head && safe != head){ // check if it meet the rule if (strcmp(first->value, second->value) >= 0) { ... nothandle = false; } entry = safe; safe = entry->next; } if(nothandle == true) break; } ``` 原本直接使用 `queue.h` 中 `list_for_each_entry_safe(entry, safe, head, member)` 函式將不符合規則的`element_t` 移除。然而這樣 `list_for_each_entry_safe` 迭代過程中會將指標 `entry` 指向指標 `safe` 指向的 `element_t` 的位址。如果直接釋放指標 `safe` 會有錯誤。因為`@safe` 是用來存資訊的並非迭代器 iterator 是用作被刪除的。 > **list.h** > @safe: @type pointer used to store info for next entry in list ```diff - list_del_init(&(safe->list)); - free(safe->value); - free(safe); + list_del_init(&(first->list)); + free(first->value); + free(first); ``` ### `q_merge` 這裡的作法我是參照 [vax-r](https://hackmd.io/@vax-r/linux2024-homework1) 。原本花時間思考的問題是 `list_entry` 也就是用 `container_of` 實作的巨集，其中由於結構體 `queue_contex_t` 有兩個成員分別為指標`struct list_head *` 名稱為 `q` 與結構體（`struct list_head`）名稱為`chain`。原本認為是用成員 `q`。 > queue.h 中的定義 > * @q: pointer to the head of the queue > * @chain: used by chaining the heads of queues ```diff - queue_contex_t *headList = list_entry(head,queue_contex_t,q); + queue_contex_t *headList = list_entry(head,queue_contex_t,chain); ``` 但從`container_of` 的操作方式去想後便非常清楚。因為其中一個為指標 `struct list_head *` 另一個為結構體（`struct list_head`）。依據 `container_of` 的操作方式思考，要取得的記憶體所佔據的空間包含一個存放結構體的空間，與一個存放結構體的指標的空間。如果是 q 的話那傳進去的那node則應該是 pointer to list node的位置而非名字，所以要用的話也應該是 `＆head` 。但在 `q_merge` 函式中的意義就是傳入結構體 chain 的起始位置。 >list_entry() - Get the entry for this node > * @node: pointer to list node > * @type: type of the entry containing the list node > * @member: name of the list_head member variable in struct @type > * Return: @type pointer of entry containing node --- ## 研讀論文〈 [Dude, is my code constant time?](https://eprint.iacr.org/2016/1123.pdf) 〉 #### Introduction 時序攻擊法 (Timing attack) 是旁路攻擊 (side channel) 常見的攻擊法。會利用測量實作的時間或是監聽程式運行時的聲音訊號來破解密碼，參見〈 [解讀計算機編碼](https://hackmd.io/@sysprog/binary-representation) 〉。與其他旁路攻擊不同之處在於，「透過偵測目標程式運行的時間變化」來進行時序攻擊的程式只能以少量的與目標程式互動且以遠端的形式。評估的程式要在常數時間並非難事，可以透過檢查是否有分支的程式，但缺點是要建立在硬體上面，由於不同的硬體有不同的硬體特性，為了確保在各種硬體下都能運行才是比較困難的。由平常寫程式的方式偵測時間的例子思考，每次測出的時間次都不固定。原因如下： 1. CPU並沒有在執行完目標程式後才去做其他的事情，而是同時在運行多個行程中快速切換，因此執行的時間會包含其他的行程運行的時間。 2. 以「比對每個密碼的字元」的程式為例，如果密碼很長大於一個快取的容量，那透過分析現在是到第幾個字元才回報密碼不相符就會有將解決 cache miss 的時間計算進去。這篇論文所提出的方法旨在消除因硬體差異而引起的「偵測目標程式運行時間變化」的問題，使其不再需要被考慮。 #### Approach 用於檢測 Timing Leakage 的步驟如下： 1. Measure execution time 前處理 *a) Class definiton : 兩種輸入 fix-vs-random。* 設計採用這兩種輸入的測試方法的原因：可以同時考慮特定條件下的固定情況，也考慮了現實中的隨機變化。論文中的 low weight 是指：因為常態分佈下如果只用隨機的輸入，那邊界情況由於發生的機率很小，有導致會有些情況未從被測試到。在步驟一所需要的技巧不是 Cycle counters 與 Environmental conditions，而是輸入 fix 要如何去設計。如何設計輸入資料不只是在這篇論文，也在其他實驗都很重要。我在code review [vestata](https://hackmd.io/@tony268596/linux2024-homework1) 的筆記時對設計輸入資料認為很有趣，好奇其設計想法。有沒有一個方式去評估一件事要如何證明能夠解決的問題覆蓋率有多大，還是只能侷限在每個人的能夠想到哪裡，就做到哪裡。這樣一點也不科學。 *c) Environmental conditions：* To minimize the effect of environmental changes in the results, each measurement corresponds to a randomly chosen class. 在[原始程式碼](https://github.com/oreparaz/dudect/blob/master/src/dudect.h)dudect_main中 prepare_inputs 函式在 measure 函式前。 2. Apply post-processing *a) Cropping* 通常時間分佈會朝較長的執行時間份佈，而呈現 positively skewed 。原因來自於 OS 的中斷或是很多行程的 Logical control flows 交錯...與測量目標無關的事件。在這種情況下，作者的解決方法為不考慮比 fixed, class-independent threshold 的測量時間。但令人困惑的是 threshold 要設為多少？ *b) Higher-order* 高階差分電流分析 Differential Power Analysis ，因為缺乏背景知識，看了相關的說明，仍無法從字面上知道只透過運行的功率變化即可以知道密碼的原因。 3. Apply statistical test Welch's t-test 透過判斷 null hypothesis :「兩種母體的平均是一樣的」是否是對的。 #### Results *B. Memory comparison* *a) Variable-time implementation* 論文中實驗的兩種input class分別為 1. Our first test harness is as follows. The first class for the leakage detection test considers uniformly distributed random 16-byte strings (“random class”); 2. The second class fixes the input to s (“fix class”). (This assumes a white-box evaluator that has access to implementation internals.) > 詳見〈 [White box testing](https://en.wikipedia.org/wiki/White-box_testing) 〉 > The tester chooses inputs to exercise paths through the code and determine the expected outputs. > Though this method of test design can uncover many errors or problems, it has the potential to miss unimplemented parts of the specification or missing requirements. * timing leakage about the input data : 發生在當實驗即便只執行函式少於 100 次，然而兩種輸入資料卻呈現不同的分佈 t-value 超過 4.5 則視為不同的分佈。 * the asymptotic behavior of the t statistic: it grows as $\sqrt{N}$ where ${N}$ is the number of samples. > 詳見〈 [Black box testing](https://en.wikipedia.org/wiki/Black-box_testing) 〉 > Specific knowledge of the application's code, internal structure and programming knowledge in general is not required. The tester is aware of what the software is supposed to do but is not aware of how it does it. For instance, the tester is aware that a particular input returns a certain, invariable output but is not aware of how the software produces the output in the first place. 作者將第二種 `fix class` 固定在 `s` 為 $0$ 時，兩組輸入資料即便統計的 Timing Distribution 已經很相似了，但仍有些微差異 "reject the null hypothesis that both distributions are the same." *C. AES* 作者還分析了兩種測試資料在進階加密標準上 AES ，由實驗結果可以看出，兩種 Timing distribution 不同，也因此便可以透過時間的變化差異做時序攻擊。 *D. Curve25519 on an ARM7TDMI* 在 Curve25519 on an ARM7TDMI 將 dudect 應用在嵌入式平台上。輸入資料如同 Introduction 所說分為兩組。第一組為均勻分佈的隨機資料，第二組則全為 $0$ ，大部分的輸入資料都在目標處理器上測試，只有少部份以及統計是在主機上測試。而測試出來發現兩組輸入資料的 Timing distribution 不同。也因為 "reject the null hypothesis that both distributions are the same."，知道實作為 variable-time 。作者進而分析 Timing variability 的原因是來自乘法的指令。由於 ARM7TDMI 的其中一個特色是 variable time multiplier 。 > **ARM7TDMI Technical Reference Manual (§ 6.20 ) Instruction speed summary** > Due to the pipelined architecture of the CPU, instructions overlap considerably. In a typical cycle, one instruction can be using the data path while the next is being decoded and the one after that is being fetched. > Unexecuted instructions take one cycle. 因為 ARM7TDMI Technical Reference Manual 提到 N ， S ， I ， C cycle。 * 常見的週期 Cycle > N cycle (Normal cycle): 傳輸資料的週期。 > S cycle (Special cycle): 如果有些指令不能滿足條件的特殊操作。 > I cycle (Interrupt cycle): 處理中斷的週期。 > C cycle (Cache cycle): 從快取中讀資料的週期。到這裡可以回應之前讀 Introduction 產生的問題：「要如何設計輸入資料fix？」。在此作者示範利用 ARM7TDMI 的特性設計出特別的輸入資料 $0$ ，因此當乘法器知道多個 bits 為 $0$ ，會運算較快而提早結束指令。然而同樣的輸入資料卻在 x86 架構上符合虛無假設 "the null hypothesis that both distributions are the same."，因此為 constant time 。因此要說一個原始程式碼為 constant-time 或 variable-time，需要強調是在哪一種處理器上執行的。 #### Discussion *a)* Dudect 是一個只需要 C 編譯器與測量時間的方法的工具。 *b)* 計算成本大部份都是計算要被測試的加密演算法，統計的成本相當少。而記憶體成本則是 $O(n)$ *c)* 作者說明 Dudect 的優點是可以不用去將考慮為個別的硬體作 Hardware modeling，但前提：「 `MUL` 與 conditional move 像是`CMOV` 這些指令都是常數時間內可以執行結束。」 *d)* How many measurements to taks? 看到這段感到很新奇，從作者對先前的實驗結果的解釋來說，如果有 $N$ 個樣本不能滿足虛無假說，則該原始程式碼被視為 variable-time 。然而為何當 $N$ 個樣本滿足虛無假說，就可以視其為 consant time 。是否有些樣本使得 Timing distribution 不一樣但卻未被觀察到？所以作者說一個原始程式碼為 constant-time 或 variable-time，是在多少 $N$ 的樣本下，且不能進一步推論說 $N+1$ 的樣本數量下也會有一樣的結果。再來作者用 DPA 攻擊來說明， DPA 攻擊是需要至少 $N$ 個樣本數量才能運作。這個意思是指它需要觀察到 $N$ 個電流變化才能破解。也因此針對這樣的例子，設計者就需要設定一個 security level 來說明在 security level (特定數量的 measurements)內，原始程式碼會為 constant-time 或 variable-time。而這表示當有更多的證據，則推論是可以被改變的。 *e)* 而 Dudect 的 leakeag detection 還可以輸出 leakage 的量，由於有些原始程式碼需要億級的 measurements 數量級的才能被發現有 Timing leaks 。而這種情況就很難運用在密碼學的計算(需要在有限的操作次數下完成)。所以如果能夠輸出 leakage 的量會對這樣的情況有用。 #### Conclusion 這篇論文就是介紹了一個可以透過觀察兩組輸入 fix-vs-random 的 Timing distribution 是否符合虛無假說，來檢測原始程式碼是否為 constant-time 亦或是 variable-time。 #### [原始程式碼](https://github.com/oreparaz/dudect/blob/master/src/dudect.h) [constant.c](https://github.com/sysprog21/lab0-c/blob/master/dudect/constant.c) 中跑測量是否為常數時間的函式最外面都會加一層 `do{...}while(0)` [Linux kernel coding style](https://www.kernel.org/doc/html/v4.10/process/coding-style.html#macros-enums-and-rtl)。會用在 `constant.c` 的 `measure` 函式測量執行時間並紀錄，在 `differentiate` 計算兩者的差別。接下來 `prepare_percentiles` 先將執行時間全部都 `qsort` 一次，目的是不考慮左邊括號內的超過 threshold 的 measurements。 [commit 9ee34db](https://github.com/weihsinyeh/lab0-c/commit/9ee34dbb1a5daeb98205ca5b100ab9f785a48b41) $1 - \left(0.5^{10 \times \frac{i + 1}{N\_PERCENTILES}}\right)$　。後來我對其中 `t_push` 的操作無法理解，所以看了其參考書籍《[The Art of Computer Programming](https://www.haio.ir/app/uploads/2022/01/The-art-of-computer-programming.-Vol.2.-Seminumerical-algorithms-by-Knuth-Donald-E-z-lib.org_.pdf)》解釋程式計算 $s^2 = \frac{\sum_{i=1}^{N} x_i^2 - N\bar{x}^2}{N-1}$ 不能直接寫分子上面 sigma 累加後相減的原因：相減兩個運算子過於接近，會有 [Catastrophic Cancellation](https://en.wikipedia.org/wiki/Catastrophic_cancellation) ，所以這也是要一直買更好的 GPU 的原因，兩數值在人類視角差異很小的值，一旦有高低之分，無論是小數點後第幾位，他們在電腦中就是差很大的數值（況且小與大都只是人類的想法，一釐米對微生物也算很大）。像是經過 AI 加權總和後都會變很大的。接著我很期待的繼續看 Welford 的方法怎樣解決這個問題。 $(N-1)s^2_N - (N-2)s^2_{N-1} =(x_N - \overline{x}_N)(x_N - \overline{x}_{N-1})$ 看到數學式知道它可以從第一個 sample 迭代去跟下一個 sample 計算。但知道這個還是想不明白[演算法](https://jonisalonen.com/2013/deriving-welfords-method-for-computing-variance/)可以避免 Catastrophic Cancellation 的原因，因為也有減法。所以重點是要能夠確定減法的兩個運算子的關係。想了一下才說服自己為何 $(X-Y)(X+Y)$ 可以避免　$X^2-Y^2$ 。糾正剛剛的說法 $X$ 跟 $Y$ 相減要產生 Catastrophic Cancellation 前提是 $X$ 跟 $Y$ 不一樣，但我讓卻讓它相減變成一樣，那這樣的情況會發生在哪裡？因為照理來說精度可以用來表示他們的不同之處，同理也可以表示相減的不同之處。因此就先對 $X$ 跟 $Y$ 的精度做手腳，參考〈 [你所不知道的 C 語言: 浮點數運算](https://hackmd.io/@sysprog/c-floating-point#Floating-Point-Determinism) 〉的 Rounding: "If digits do not fit, they get chopped (rounded)" 所以 X 跟 Y 差距很小，各自平方後，他們的小差距被捨棄，導致 $X^2 = Y^2$ 。而$(X-Y)(X+Y)$ 中 $X+Y$ 與 $X-Y$ 原先都可以被表示不會有捨棄的問題，相加相減亦然，且不會有捨棄發生，避免 Catastrophic Cancellation 。這就是 `t_push` 的由來。此都不考慮 overflow 與 underflow，因為無論計算如何處理，計算結果大於可表示範圍都會遇到，只是看處理的方式。睡到一半突然想到，論文說要做 crop percentile，這件事情沒有科學依據，只是因為作者認為執行時間較長的是在做無關的任務，同時去掉執行效率變高。而我的方式不是常數時間，因爲做了去除 percentile 的操作通過測試。但我的實作不是 constant time 本來就不應該通過測試才是合理的，說以這個論文有兩個疑慮要考慮一、cropping 的存在意義。二、可能真的需要 cropping 但 threshold 要怎麼設定以解決執行時間包含無關的任務執行的時間。 ## 實作 shuffle 命令 #### q_shuffle 原先 [commit ffb47f5](https://github.com/weihsinyeh/lab0-c/commit/ffb47f5e60d57e7a2e4e3c7cc38e3e4a51adcfc2) ，透過先建立與原先佇列有相同節點數量的佇列 `used_head` 。每個節點的字串被作為有沒有被用過的識別用途。然而下命令`shuffle` 卻遇到 `malloc disallowed`。看到 `harness.c` 中有 `noallocate_mode` 才知道有限定使用時機。因此一定有其他寫法。看《[The Art of Computer Programming](https://www.haio.ir/app/uploads/2022/01/The-art-of-computer-programming.-Vol.2.-Seminumerical-algorithms-by-Knuth-Donald-E-z-lib.org_.pdf)》提及 shuffle 過程的四步驟 : 1. Initialize : 是將序列的數量存給 `j` 。 2. Generate $U$ : 產生一個介於 0~1 的數字。 3. Exchange : Set $k$ <- $\lfloor{j\times U}\rfloor+1$. Exchange $X_j$ <-> $X_k$ 。 4. Decrease $j$ : if $j>1$ 回到第二步。看完後才發現原來是我誤會以為一定要從未曾被當作 random 的元素尋找`old`。我的想法原先是這樣 1 為被隨機選過的`0 0 1 0 0 1` 然後我要從 0 中選出`old` 與目前隨機選的值交換。然而有效率的演算法，可以不需要透過去紀錄哪些沒有被選過去找`old`。想法是每次沒被選過的都是 1~j，因為每次都會選一個節點交換，而選的那個節點不需要是隨機的，只要交換一個節點為隨機的便可以達到 shuffle 的功能，因此才將另一個節點都是 $j$。因此我將演算法修正用迴圈每回合減少 j，且每回合選一個值跟 j 交換。而交換的演算法我使用 [Linux list.h 的 swap 函式](https://github.com/torvalds/linux/blob/master/include/linux/list.h)。 > [comit 88ae9fd](https://github.com/weihsinyeh/lab0-c/commit/88ae9fd29865adffcf63a51e5d8a00454b163cf7) 我閱讀時看到討論：「取隨機數應該要用產生一個介於 0~1 的數字再做乘法還是用取餘數的方式?」 > Treating U as an integer and computing its remainder mod k to get a random integer between 0 and k − 1, instead of multiplying as suggested in the text. Thus `LDA U; MUL K;` would be changed to `ENTA 0; LDX U; DIV K;` 但我在想 `int random = rand() % (len--);` 可以直接用 `rand` 取 0~RAND_MAX 的值。那要如何才能避免產生更多運算，來取 0~1 的浮點數?思考這件事的過程又讓我想到取隨機數的範圍都只有 0~RAND_MAX。在 C99 standard 中說明 "RAND_MAX assumed to be 32767" ，這代表隨機數的範圍會被限制。 ```c int rand(void) // RAND_MAX assumed to be 32767 { next = next * 1103515245 + 12345; return (unsigned int)(next/65536) % 32768; } ``` > **C99/C11 Standard (§ 7.23.2.4) The time function - Returns** > The time function returns the implementation’s best approximation to the current calendar time. The value (time_t)(-1) is returned if the calendar time is not available. If timer is not a null pointer, the return value is also assigned to the object it points to. `srand(time(NULL))` 將目前的時間的放進 `next`。傳遞參數 `NULL` 代表沒有要將值存到一個指標。但這沒有解決隨機數的範圍會被限制的問題。同時，終於找到如何取 0~1 的浮點數。首先發現這裡對 RAND_MAX 取餘數，所以只要再除 RAND_MAX 的大小就可以得到 0~1 間的小數。而 $2^{16} = 65536$ 、 $2^{15}=32768$ 這二個數字一定跟 `next` 的型態有關連。去查 `unsigned long int` 的 minimum size (bits) 是 32 ，果然真的有關連。因為 $0$~$2^{32}$ 除 $2^{16}$ 得到的範圍在 $0$~$2^{16}$ 之間。範圍只取　$2^{15}$ 則是因為可以轉型到 `unsigned int`。因為理解這件事後，我才剛好想到得到 0~1 浮點數的方法就是直接`double result = rand()/(RAND_MAX+0.0);` 即可。 ```c srand(time(NULL)); for(; len != 0 ;safe = safe->prev) { int random = rand() % (len--); ``` `srand(time(NULL))` 只初始化以取得當下時間一次，但每次卻都可以拿到隨機的數字。為了知道原因，就寫程式把規格書裡面 `rand` 與 `srand` 的定義寫出來，不使用函式庫，使我可以印出其中變數的值。發現其實初始化 `next` 這個變數後，就不再跟時間有關，不會再去呼叫 `time()` 拿取當下的時間。會動的只有 `next` ，而變化的關係是乘 1103515245 再加 12345。因此，這個亂數是有規律的一點都不亂。我看到這裡覺的超級有趣的。也就說所謂的 random seed 一點也不是產生亂數的由來，這個名字誤導我很久。它其實是找一個獨一無二的數字，也就是當下的時間作為一個起點。讓下次如果要再 `srand(time(NULL))` ，會產生像隨機的序例，因為他們的起點不一樣。 > The same initializing value of seed will always return the same random sequence. 接下來是乘 1103515245 與加 12345 的由來：這源自於 [Linear congruential generator](https://en.wikipedia.org/wiki/Linear_congruential_generator) 。$X_{n+1} = (aX_{n} + c) \bmod m$。從上面的例子明顯看得出來這是有規律的，這個規律會發生重複，即便在最好的情況也會在 $m$ 次後就再次重複，而重複的週期取決於 $m$、$a$、$c$ 的值，一個數值改變，週期就會變。因此為了產生看起來像是亂數的序列，就要去選擇適當的 $m$、$a$、$c$ 讓重複的頻率將低，也就是要增加週期的長度 period length 。參考《[Numerical Recipes in C](https://www.grad.hr/nastava/gs/prg/NumericalRecipesinC.pdf)》的解釋 $c$ 是 $2^{15}$ 的原因： > arithmetic done on unsigned long quantities is guaranteed return the correct low-order bits 選擇的理論來自《[The Art of Computer Programming](https://www.haio.ir/app/uploads/2022/01/The-art-of-computer-programming.-Vol.2.-Seminumerical-algorithms-by-Knuth-Donald-E-z-lib.org_.pdf)》3.2.1 The linear congruential sequence > $m$、$a$、$c$ and $X_0$ has period length m if and only if *i)* c is relatively prime to m; *ii)* b = a − 1 is a multiple of p, for every prime p dividing m; *iii)* b is a multiple of 4, if m is a multiple of 4. 但即便這樣也只能知道這樣選有理論根據，還是不知道 ANSI C 這樣選的原因。雖然還是沒找到答案，但知道不能如同《The Art of Computer Programming》所言用 `LDA U; MUL K;` 。因為產生一個介於 0~1 的數字，如果是透過 `double result = rand()/(RAND_MAX+0.0);` ，則要先用 LCG 的方式變成一個整數亂數，放到暫存器後再除。因此同樣的問題 "`LDX U; DIV K;` with the result appearing in register X. Is this a good method?" 會再度發生。 #### 命令在〈[亂數 + 論文閱讀](https://hackmd.io/@sysprog/linux2024-lab0/%2F%40sysprog%2Flinux2024-lab0-d)〉提供的程式碼中，看到需要透過 `qtest.c` 來執行 `suffle` 的命令。首先在 `qtest.c` 準備 `do_shuffle` 函式，接著加入至 `console_init()` 中。 ```c ADD_COMMAND(shuffle, "Shuffle the nodes of the queue", ""); ``` 實作 `do_shuffle` 的作法參考 `do_swap` 只將其中 `q_swap` 換成 `q_shuffle`。 1. shuffle 不需要任何參數。 2. 確認有佇列才做 shuffle 3. q_shuffle 4. 將 q_show 出來。 5. 在 queue.h 加入 q_shuffle 的定義。由於 `console.c` 透過 `#include <queue.h>` 去使用 `queue.c` 的函式。而標頭檔不能被修改，以避免跟此標頭檔相依的檔案都會被重新編譯。因此提交時會刪掉。 ```c=252 void q_shuffle(struct list_head *head); ``` #### 統計方法驗證 shuffle 樣本數量為 4 ，自由度為 4!= 24 - 1 。 | p-value (probability) | 0.995 | 0.99 | 0.975 | 0.95 |0.90 | 0.10 | 0.05 | 0.025 | 0.01 | 0.005 | | -------- | -------- | -------- |-------- |-------- |-------- |-------- |-------- |-------- |-------- |-------- | | freedom : 23 | 9.260 | 10.196 |11.689 | 13.091 | 14.848 | 32.007 | 35.172 | 38.076 | 41.638 | 44.181 | ##### 實驗ㄧ、這是我用統計方法的驗證，很明顯完全不符合 Uniform Distribution 。![Exoeriment1](https://hackmd.io/_uploads/Skb0ck0AT.png) chi square sum: 36487.09585753372 其對應的 p value < 0.05 。不支持 Uniform distribution 。 ##### 實驗二、這是我修正程式碼用統計方法的驗證，很明顯完全跟作業說明一樣符合 Uniform Distribution 。但我做的唯一修正就是把原先在上方圖中的 `srand(time(NULL));` 這行移掉。它的功能是每次 shuffle 都會將 Linear congruential generator 的 seed 初始化為當下的時間。但這樣其實每次shuffle 都是一樣的操作，根本沒有shuffle 到，因為每次 seed 都被 `static unsigned long int next = 1;` 這樣初始化。這樣的結論讓我對這件事說明是 Uniform distribution 的論點好奇。因為其實 shuffle 是有規律的，只是不知道它從週期的哪個起點開始。那有規律不就不是獨立事件。![right](https://hackmd.io/_uploads/SkZuos6Ap.png) chi square sum: 24.600921614745832 其對應的 p value > 0.05 。支持 Uniform distribution 。但再做一次實驗發現他們的數值又不一樣同時也沒有規律，但每次都確實是用 `next = 1`，重新開始。所以我又寫一個程式去測試最基本的東西，我發現不加初始化的時候，用一個迴圈去跑 `rand` 。每次重新執行程式的確跑出來的是會一樣，這與上面的理解一樣。所以代表 shuffle 的程式碼有一個會變化的東西。我發現原來是在 `q_test.c` 裡面有這 `srand(os_random(getpid() ^ getppid()));` 所以才讓每次都會初始化。但用這種方式 `os_random` 初始化，跟用時間去初始化竟然效果差如此多。但要確認兩件事，到底是每次都在 `q_shuffle` 裡面做 `srand(time(NULL));` 才導致上面的第一張圖沒有 Uniform Distribution 。還是因為改成 `srand(os_random(getpid() ^ getppid()));` 不用時間初始化的原因。所以就做下面實驗三與實驗四。 :::warning 很好，你發現這細節，注意這涉及程式碼的安全議題。在 [CVE](https://cve.mitre.org/) 找 "randomness" 可見幾項資訊安全的通報。 :notes: jserv ::: 但由於我認為程式執行一次期間每次 `q_shuffle` 的這件事情應該需要跟上一次沒有關係。如果每次都沒有在 `q_shuffle` 初始化 `seed` ，其實從程式執行一開始，就可以知道 `shuffle` 完的所有結果，因為我只有一開始去變化 `seed` ，後面一切操作都是有跡可尋的，就只是按照一定的規則在交換 queue 裡面元素的位置而已。 ##### 實驗三、是將 `q_test.c` 裡面的初始化改為 `srand(time(NULL));` 。![Experiment3](https://hackmd.io/_uploads/Skdk0TaCa.png) chi square sum: 12.213699419190709 其對應的 p value > 0.05 。支持 Uniform distribution 。在 `q_test.c` 的 `main` 函式裡初始化只是讓每次程式重跑一次會得到不一樣的結果而已。所以用 `srand(os_random(getpid() ^ getppid()));` 還是 `srand(time(NULL));` 都可以得到 Uniform Distribution 。 ##### 實驗四、是在每次呼叫 `q_shuffle` 函式前先有初始化 seed 的步驟，實作方式是在 `do_shuffle` 裡面增加 `srand(os_random(getpid() ^ getppid()));` ![Experiment4](https://hackmd.io/_uploads/ryQIcC6R6.png) chi square sum: 18.36418182690923 其對應的 p value > 0.05 。支持 Uniform distribution 。原先還沒做實驗前我以為是 `q_shuffle` 裡面「有無」加入初始化 seed 的步驟影響很大。但做完實驗才發現是`srand(os_random(getpid() ^ getppid()));` 與`srand(time(NULL));`差別。因此我先前是不需要去懷疑這個實作方式能夠說明是 Uniform distribution 的，是我自己沒有用對的方式實作，而第二個作法即便最後呈現的是 Uniform distribution 的樣子，但它仍然是錯的，因為不是 q_suffle 這個演算法做的正確才導致呈現成 Uniform distribution ，而是 Linear congruential generator 本身在週期內不重複的特性才得以使第二個作法呈現 Uniform distribution 。此外由於每一次 `rand()` 都是的 `next` 都跟前一次 `next` 有關係，所以每次 `shuffle` 都不是獨立事件，這次的 `shuffle` 影響下一次 `shuffle` ，因此根本不符合 Uniform distribution 。第四個作法就解決此問題，每次 `shuffle` 前都透過`srand(os_random(getpid() ^ getppid()));` 初始化。讓 `shuffle` 不影響下次 `shuffle` 為獨立事件，符合 Uniform distribution 。所以重點是 random seed 的作法。接下來看 [commit f17a9a8](https://github.com/weihsinyeh/lab0-c/commit/f17a9a82514beb2986ed27315b258d92bb889172) 變動 seed 的方法。步驟一、先用 `os_random` 函式的位址對傳進去的　`seed = (getpid() ^ getppid())` 做 xor 。而這個操作完，不只用每一次執行一次程式就透過 `pid` 與 `ppid` 得到，函式的位址也是 Address space layout randomization 隨機化過得。但想了一下 `os_random` 的函式其中接下來的步驟二、是取得目前的高解析度時間 ([high-resolution time](https://www.w3.org/TR/hr-time-3/))，然後步驟三、再做 `random_shuffle` 。因為 ASLR 在程式碼一執行後函式的位址就不再變動，所以他的功能是防止 exploitation of memory corruption vulnerabilities。要解決的問題是一個原先 Linux 2.6.12 版本以前都是將行程 1 與行程 2 配置的記憶體起始位址都固定。因此駭客的攻擊方式藉由先知道如何會讓行程一導致其 stack overflow ，而同樣的方式也可以應用在行程二導致其stack overflow。然而這個 ASLR 為什麼不能直接當亂數種子的原因是他的 entropy 也不夠，因為想要系統要真正達到 entropy 很高可以 TRNG ，但缺點就是為了做到 TRNG 是加入很多 noise 這些可以來源自溫度、電壓等真實世界的參數。然而不這麼做原因是這個產生真的亂數的成本太高了。系統製作亂數要很快，效率與亂度二者平衡下才設計了 PRNG ，因為 PRNG 預先算好( 預先的想法就像 Linear congruential generator，不需要用到當下真實世界的參數)產生一組亂數。這也是在第二步驟使用時間(真實世界的參數)的原因。同時也說明 ASLR 不夠亂的原因，因為發明 ASLR 的最初目的是在載入行程至記憶體的時機使用的，而這個階段要快。那讓 `shuffle` 變成獨立事件的就是後面步驟二與步驟三。所以我就先不做步驟一看結果會不會影響到亂數這件事情，也就是把程式碼的 `x = seed` 直接做後面步驟。實驗完發現不影響仍然是 Uniform distribution，所以步驟一的確主要的功能是避免漏洞。算 random_shuffle 的時候，參考 [Prospecting for Hash Functions](https://nullprogram.com/blog/2018/07/31/)，讓操作不可逆。作者 Chris Wellons 發現達成不可逆的兩個性質。作者尋找適當的 hash function 是透過程式 prospect 一直去算看是否有盡可能滿足兩個特性 High avalanche effect 與 Low bias。而 prospect 自己算出一個很好的 pattern : > The pattern is XOR-right-shift, multiply, XOR-right-shift, multiply, XOR-right-shift.這個操作因為常數時間不常被使用在密碼學。雖然我還是不清楚這三個運算子到底如何變成隨機數，因為前面作者有說 `XOR with a constant is trivially reversible`。同時用這三個運算子後的結果仍需是 1-to-1 mapping 的函數，否則又會有分佈不均勻的問題。補充 `XOR` 這個運算子的特性:二個運算元經過 `XOR` 運算後 0 跟 1 的機率變為一半一半，其同樣源自Uniform distribution帶來的 entroypy 增加，因此新的值要被預測成功的難度又提升了。這件事情呼應 >〈[你所不知道的 C 語言：數值系統](https://hackmd.io/@sysprog/c-numerics#%E5%8A%A0%E8%A7%A3%E5%AF%86%E7%9A%84%E6%87%89%E7%94%A8)〉 >得知其他運算子無法達到 0 或 1 的機率都是 50%。若是用 XOR 的話，每個 bit 變成 0 或 1 的機率都是 50%，所以圖片就會變成看不出東西的雜訊。作者對這三個運算子的功能描述很有趣。 > The XOR-right-shift diffuses bits rightward and the multiply diffuses bits leftward. I like to think it’s “sloshing” the bits right, left, right, left. 想了一下這個操作的概念：比如原先是` ABCD` $\oplus$ `CDAB` ，操作後得到的資訊是（A 跟 C）與（B 跟 D）兩組是同或異。XOR 的結果會導致 higher bytes 與 lower bytes 是一樣的，這對隨機這件事情沒有幫助。接下去看 [multiply xor shift](https://www.pcg-random.org/posts/developing-a-seed_seq-alternative.html#multiplyxorshift) 裡面解釋 shift 將高位元移到低位元的原因：高位元會因低位元乘法進位導致變化大。而這個操作要做至少兩次，因為要讓 shift 到低位元「原先的高位元」也影響高位元「原先的低位元」（再乘法進位一次）。如此一來才能達成讓高低位元變化量平衡的目的。之所以要做步驟三是因為又將取得的當下時間變成一個 > It’s statistically indistinguishable from a random permutation of all 32-bit integers.　而這個目的是為了讓隨機數不可逆。接下來把步驟三拿掉看結果。發現移掉也是 Uniform distribution 。所以第三步驟也不是讓每次執行都變獨立事件的關鍵。那就是第二步驟了。所以現在就要實驗第二步驟改成 `time(NULL)` 其他不變的結果。做完實驗後發現真的是第二步驟讓整建事情變成獨立事件。 ![Experiment7](https://hackmd.io/_uploads/By4hIM0C6.png) 從來沒有想過取得目前時間的方法會讓結果差很大。在([high-resolution time](https://www.w3.org/TR/hr-time-3/))裡面知道原先用 time(NULL) 初始化 seed 是用 wall clock's unsafe current time ，改成 monotonic clock's unsafe current time 。就像他們的命名方式，monotonic clock 只存在一個 process 可以看到。這讓我想起以前第一次看到 tv_secv 是在寫 context switch的時候，所以不像 wall clock 是可以被多個 process 共同使用的。也因此獨立事件這個說法有問題，使用 monotonic clock 是讓取時間這個操作變得更均勻分佈，因爲每次 shuffle 的因為迴圈會跑 sample 數量的次數，每輪 `rand` 再交換，操作所花費為 constant time。所以第二步驟中每次會依時間變化的變數更均勻的分佈。 > attacker may be able to obtain high-resolution estimates through repeat execution and statistical analysis. 然而 monotonic clock 得到更精準的時間，讓變數變化均勻分佈。原先我都認為要得到亂數，不是越不規律越好，像前面提到的 LCG 就是因為有一個規律，所以才要用 random seed 。讀 [Prospecting for Hash Functions](https://nullprogram.com/blog/2018/07/31/) 中這段話讓我對亂數這件事有新的心得。 > One of the important properties of an integer hash function is that it maps its inputs to outputs 1:1. In other words, there are no collisions. If there’s a collision, then some outputs aren’t possible, and the function isn’t making efficient use of its entropy. > 雖然越不規律較好，但不規律的前提就是要變數均勻分布，否則如果一個樣本出現次數過高，那就不能叫做亂數。所以一開始才用 [Pearson's chi-squared test](https://en.wikipedia.org/wiki/Pearson%27s_chi-squared_test) 檢驗虛無假說（Null hypothesis），用這個統計方法驗證是否 shuffle，此正好解釋為什麼用 monotonic clock 才能產生不拒絕虛無假設的 Uniform distribution。再繼續思考獨立事件的意義。如果是用 `srand(time(NULL))` 代表亂數種子初始化是用 low-resolution time。如此一來，就會有「前一次初始化亂數種子得到的時間」與「此次初始化亂數種子得到的時間」相同的疑慮。如此一來兩者事件相依。無法達到 Uniform Distribution。可以把它想像成程式碼執行的耗時小於奈秒，因為依據〈 [Linux 核心設計: Timer 及其管理機制](https://hackmd.io/@sysprog/linux-timer#clock-and-timer) 〉中使用者得到的 Wall time 如果小於 nsec 。那兩個事件取得的時間就會一樣(不獨立)。且駭客也可以從第一個執行時取得的時間推算第二個執行時取得的時間(不安全)。因此要做的事情是先透過 high-resolution time 達成(獨立事件)，降低「前次執行時取得的時間」與「此次執行時取得的時間」會相等的機率。而這樣的結果是「前次執行時取得的時間」+「程式執行的時間」=「此次執行時取得的時間」。如果「程式執行的時間」是固定的，那麼每次「執行時取得的時間」就為周期。然而如果有週期的話就代表有規律，駭客則依然可以預測得出來(不安全)。因此時間精準度提高仍未解決 timing side-channel attacks 的問題。因此從此也知曉做第三步驟的原因，為了避免攻擊者透過推算出時間做攻擊，因此要讓第二步驟的變數不可逆推。這整件事情來看每步驟的功能。第一步驟是為了避免漏洞，所以搭配 ASLR。第二步驟為了讓每次操作的變數符合虛無假說，所有樣本出現的機率相同使用高精度。第三步驟的意義在於，因為一些系統上面 [ASLR](https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2013-2556) (補充:ASLR有分等級的) 也無法解絕漏洞被攻擊，同時還有第二步驟會導致 side-channel attack 發生，所以還要第三步驟的 random-suffle 使變數不可被攻擊者逆推。最後再次說明對因為「亂(entropy)而有隨機」的心得，平常都直覺認為預測不出來就是很亂(entropy)，則一定是隨機的，然而這卻是從一維的角度思考。統計的意義使分析可以提升維度，因此一次預測不出來確實是亂，但並不代表預測很多次就找不到其中的規則。就像擲硬幣，確實預測不出來這一次會出現正或反面，則預測一次的時候是亂度高，因為無法從一次獲得資訊量。然而擲很多次後就發現為 50%-50% 是 Uniform Distribution。如果擲很多次後發現是 90%-10%，此亂度就不高，因為從中可以發現「每次預測正面會提高對的機率」，是有規則(不亂)，也就是90%-10%是提供資訊量使人能進而預測為正面。總結來說，要亂度高也就是真的很隨機，就不能只看一次，因此隨機需要到思考到更高的維度，需要統計達到 Uniform Distribution。當我第一次聽老師說統計是提高分析資料的維度，我只是覺得這個思維很新奇。現在親身體會這句話，覺得這句話背後的數學意義很令人難以置信的美好。 [commit 7a51365](https://github.com/weihsinyeh/lab0-c/commit/7a51365cfc0ba3815143a5c8fbe4bed5de846014) [參考 mimalloc](https://github.com/microsoft/mimalloc) 這裡就是一秒 ```c static int time_limit = 1; ``` ```c // 避免碰到 stack 在 sigsegv_handler ，搭配工具把 stack(back) trace印出來。 abort() ``` --- ## 研讀 Linux 核心原始程式碼的 [lib/list_sort.c](https://github.com/torvalds/linux/blob/master/lib/list_sort.c) ### 原理起初想了解 `3*2^k elements can fit into the cache.` 的原因，我就上網去查了關於 L1 cache 的機制，嘗試尋找 `3*2^k`。但我都找不到，後來回去看〈 [研讀 Linux 核心原始程式碼的 lib/list_sort.c](https://hackmd.io/@sysprog/linux2024-lab0/%2F%40sysprog%2Flinux2024-lab0-e) 〉看到老師提供的表格，我真的覺的超級激動的。首先，這句話就是二進位的機制： > 合併前兩個 $2^k$ 變成 $2^{k + 1}$，並留下一個 $2^k$ 不動。透過這種二補數的進位機制，發現原來二進位的機制剛好符合合併排序演算法的 buttom up 。也是這樣的 buttom up 的方式可以避免 cache trashing。這三件事情環環相扣。從count變化的表格可以看到一件事情，黃色螢光筆的位元為第 k 個bit。而當 k 改變，就表示 2 個 $2^k$ 的鏈結串列可以合併。此原理起初不明白但是觀察後發現以 count 變化 `14` $\to$ `15` 來說明： `14` $\to$ `15` 二進位變化為 `1 1 1 0` $\to$ `1 1 1 1`。接下來如何從 `1 1 1 1` 知道合併排序是從哪兩個鏈結串列要做合併呢？ `1 1 1 1` 對應為所代表的十進制是 `8 4 2 1`。那想像有四個鏈結串列，這分別有8個節點、4個節點、2個節點、1個節點。目前此步四個鏈結串列的表示 8 + 4 + 2 + 1 那合併兩個鏈節而來的話上一步可能有（已知 pending 會是哪四個鏈結串列所表示）。 case1: `1 1 1 0` $\to$ 14 ，8 + 4 + (1 + 1) + 1 合併兩個鏈結串列其中有 $2^0$ 個節點 case2: `1 1 0 1` $\to$ 13 ，8 + (2 + 2) + 2 + 1 合併兩個鏈結串列其中有 $2^1$ 個節點 case3: `1 0 1 1` $\to$ 11 ，　(4 + 4) + 4 + 2 + 1 合併兩個鏈結串列其中有 $2^2$ 個節點 case1: 8 + 4 + (1 + 1) + 1 $\to$ 8 + 4 + 2 + 1 是合併二個鏈結串列都含 $2^0$ 個節點 case2: 8 + (2 + 2) + 2 + 1 $\to$ 8 + 4 + 2 + 1 是合併二個鏈結串列都含 $2^1$ 個節點 case3: (4 + 4) + 4 + 2 + 1 $\to$ 8 + 4 + 2 + 1 是合併二個鏈結串列都含 $2^2$ 個節點而很顯然看 count 變化的表格可以知道都是 `1` $\to$ `2` $\to$ `3` $\to$ ... $\to$ `14` $\to$ `15`。轉成二進位來說就是每次都在第 0 個 bit 加 1 個節點。也就是直覺的從 `14` $\to$ `15`。而 pending 上的節點表示也是由 8 + 4 + (2) + 1 表示為併兩個鏈結串列都含 $2^0$ 個節點。但這就是很神奇的地方。首先對應到一件事情看到「[圖解排序過程](https://hackmd.io/@sysprog/linux2024-lab0/%2F%40sysprog%2Flinux2024-lab0-e#%E5%9C%96%E8%A7%A3%E6%8E%92%E5%BA%8F%E9%81%8E%E7%A8%8B)」，指標 `list` 逐一走訪每個節點，並每每一走訪過一個節點就立刻將含有該節點的鏈結串列加入至pending list 中。而非一次先走訪多個節點而形成的鏈結串列，再加入至 pending list 。第一個方式明顯實作比第二個方式容易。在將它第一個方式連結到合併排序，這就剛好是 buttom up 。由很小的單元（一個只含有一個節點的鏈結串列）逐一兩兩合併變成很大的。所以在 source code 看到註解 `Find the least-significant clear bit in count` ```c for (bits = count; bits & 1; bits >>= 1) tail = &(*tail)->prev; ``` 再將這件事情繼續連結到剛剛的二進位的概念中。當有 count 變化 5 $\to$ 6 時，表示加入一個含有一個節點的鏈結串列。相當於最後一個位元加 `1` 此是由原先的某個二進位表示改變成 `1 1 0`　。可能情況如下。 `1 0 0` $\to$ 4 為合併兩個含有兩個節點的鏈結串列。因為第 1 個 bit 做了更動。因此 2 2 1 加入一個含有一個節點的鏈結串列要合併的不是兩個含有1個節點的鏈結串列變成　2 2 (2)　，而是　(4) 1 1 。當加入一個 bit 造成哪個 bit 做更動，就代表它讓 2 個一樣節點數量的鏈結串列合併了。以這個例子而言 `1 1 0` 會變動 1 個 bit 那就是 `1 0 0` ，變動第 1 個 bit ，因此而我合併的就是兩個有 $2^1$ 個節點數量的鏈結串列。因此從這裡就可以看的出來每增加一個節點，會是哪兩個有相同節點數量的鏈結串列需要合併。間接證明了增加一個節點，一定會有二個有相同節點數量的鏈結串列需要合併。 :::warning power of 2 的翻譯是「2 的冪」 ::: 而如果增加一個節點使總節點數量變成 2 的冪不能的原因，也便得很直觀因為 2 的冪的二進位表示為 1 後面更低位元都為 0。所以如果只要變動 1 個 bit 那相當於原先是 0 加兩個有相同節點數量的鏈結串列。此不可能是增加一個節點就可以做到。也因為有上述的思考，發現了二進位的機制剛好符合合併排序演算法的 buttom up　，甚至還利於鏈結串列的實作。而 buttom up 的方式可以避免 cache trashing　這三者巧妙的關係讓我很興奮。還有能夠將這樣的巧妙的關係用數學證明出來，超級厲害的。 ### 原始程式碼在〈 [list_sort.c](https://github.com/torvalds/linux/blob/master/lib/list_sort.c) 〉在 `merge` 其實可以看到也是透過迴圈將兩個鏈結串列相合併，同時它不需要透過指標 `previous` 去維護，而我的方式因為是雙向鏈結串列，所以在 `list_move_tail` 中的 `list_del` 可以只要知道當下的節點就可以改變 `previous` 節點的指標。這是不同的地方。 ```c if (unlikely(!++count)) ``` unlikeyly 是讓編譯器知道 !++count 不太可能發生，所以編譯器可不用對這段程式碼優化。 > 這說法不精確，請看 GCC 手冊和對照 branch prediction 的行為。 > :notes: jserv [Linux compiler](https://github.com/torvalds/linux/blob/master/include/linux/compiler.h) ### 分析 Linux 核心的 lib/list_sort.c 實作並評估其效益針對 Linux 核心風格的鏈結串列開發 [Timsort 排序程式](https://hackmd.io/@sysprog/linux2024-quiz1)，該設計對應的排序測試實驗方案。 --- ## 教材回顧 ### 〈 [你所不知道的 C 語言：指標篇](https://hackmd.io/@sysprog/c-prog/%2Fs%2FHyBPr9WGl) 〉不知道 void 存在的意義: 以前寫程式都會覺得不回傳的函式就會寫一個 void ，延伸 void 就是說不回傳的概念。看教材困惑在於 returns a pointer to a pointer to void。如果 void 是不回傳那上面英文句子不會成立。 > **C99 Standard (§ 6.2.5) Types** > The void type comprises an empty **set** of **values**; it is an incomplete type that cannot be completed. void type 是一個空集合的概念。但好奇，void 為 set 卻不能說 void 為一個 empty value。因為函式每次只會回傳一個 value。再往下看其他定義。 > **C99 Standard (§ 6.2.5) Types** > * array type : describes a contiguously allocated nonempty set of objects with a particular member object type, called the element type. > * structure type : describes a sequentially allocated nonempty set of member objects (and, in certain circumstances, an incomplete array) > * union type : describes an overlapping nonempty set of member objects, each of which has an optionally specified name and possibly distinct type. 閱讀完規格書 void 的解釋，因為 function type 可以回傳 array type 也可以回傳 value，如果要讓可以更 general 的表示所以用 set。 pointer to pointer 使用情境：修改指標所指向位址的值。 pointer to pointer to pointer 使用情境：會用到 array of pointer 的地方是像是 CPU 在換行程 (process) 的時候，需要有 array of pointer ，每個指標都會指標指向一個 process context 裡面存 program counter...等。所以就可以用 pointer to pointer 去改 program counter。 ### 〈 [你所不知道的 C 語言： Stream I/O, EOF 和例外處理](https://hackmd.io/@sysprog/c-stream-io) 〉 > 若程式都要週期性更新，這樣不足以反映操作，於是可藉由 SIGWINCH (Window size change) signal 及其註冊的 handler 來實作更新。應用程式透過異常控制流 (Exceptional Control Flow)與作業系統互動。<s>異常處理通過 indirect procedure call 把返回的地址存入stack中。而返回的地址可能為目前或下一個指令。</s> procedure 的形式包涵 (function, method, subroutine, handler) > [indirect procedure call](https://www.gnu.org/software/gawk/manual/html_node/Indirect-Calls.html)： By using indirect function calls, you can specify the name of the function to call as a string variable, and then call the function. 這個例子就好比在執行 cmp 時，透過傳遞函式的名稱（以字串表示）來明確指定所需的排序規則，而這些排序規則則是在函式內部被定義的。我看完還不懂 indirect 的意思，所以我就讀〈 [Reducing Indirect Function Call Overhead In C++ Programs](https://dl.acm.org/doi/pdf/10.1145/174675.177973) 〉裡面的定義 > the address of the call target is loaded from memory. 看完的理解是如果函式是需要透過位址以 exception 的處理而言是 exception table 存放的位址，或是以指標像是 cmp 用字串來呼叫。而 direct procedure 是直接透過函式的名稱來呼叫，所以編譯的時候就可以直接知道會執行到哪個函式。 :::danger 這段話不精準。 ::: 返回地址為目前指令或下一個指令取決於異常的類別(interrupt, trap, fault, abort )。有機會返回目前指令的為 fault ，不會返回的是 abort。只有 interrupt 為非同步的訊號處理。 > 同步與非同步的由來 **CASPP (§ 8.1.2 Classes of Exceptions - Interrupt)** > Interrupts occur asynchronously as a result of signals from I/O devices that are external to the processor. Hardware interrupts are asynchronous in the sense that they are not caused by the execution of any particular instruction. 從這句話推得來自 processor 外的訊號是非同步的。從電路的角度解釋同步與非同步：同步是因為 clock 之間有因果關係。非同步是 clock 之間沒有因果關係。所以，電路設計分成同步電路與非同步電路設計，是因為同步電路利用時脈讓多個子系統間可以同步運作，以完成有因果關係的功能。從 interrupt 是 Signal from I/O device 。所以這個訊號不是 processor 運算得來的，因此對於 processor 而言，此訊號與其他系統沒有因果關係，因此才用非同步訊號。 ### Completely Fair Scheduler 紅黑樹 EEVDF RCU-safe lock lock 之間不會相互干擾 radix tree 為什麼 list 的 API 裡面不提供 swap ？因為可以用 splice 操作達到 [Maple Tree](https://docs.kernel.org/core-api/maple_tree.html) RCU lock free `__typeof__` 底線 `if __name__ == '__main__'` : 向下相容 `__const` 只能傳常數進去 tty ## 整合 tic-tac-toe 遊戲功能 #### 加入　`ttt`　命令我原本以為是要在 `qtest.c` 準備 `do_ttt` 函式，接著加入至 `console_init()` 中。但後來我發現有 `qtest.c` 裡面有 `#include "console.h"`且在主函式有呼叫 `console.c` 的`init_cmd()`，所以可以加在裡面。 ```c ADD_COMMAND(ttt,"Play tic tac toe", ""); ``` #### 實作 do_ttt() 為了能夠跑 ttt 專案的程式碼，先把需要的程式碼移進去。將 ttt 專案的 `main.c` 函式裡面的程式放在 do_ttt()中，接著相關程式碼( `agents/mcts.[ch]`, `agents/negamax.[ch]`, `agents/util.h`, `game.[ch]`, `mt19937-64.[ch]`, `zobrish.[ch]` )移入。之後在 `console.c` 引入標頭檔。原本修改 `Makefile` 的時候只有在 OBJS 加目標檔，但是因為 `deps := $(OBJS:%.o=.%.o.d)` 的 `.%.o.d` 最前面有這一個`.`，直接`-include $(deps)` 的時候找不到 `.agents/negamax.o.d` 。所以先建立 `.agents` 的資料夾透過命令 `mkdir -p .$(AGENTS_DIR)`。在commit 的時候把變數的範圍縮小，然而仍然會有這個錯誤:"Following files need to be cleaned up:"所以我就先把 pre-commit.hook 的 `FILES` 註解掉，讓 git 可以追蹤後再解註解回去原本的。 [commit 9da01fb](https://github.com/weihsinyeh/lab0-c/commit/9da01fb1f5b13d113006731d916c4c27f27a5419) ##### negamax 演算法看到negamax 演算法，先想到之前的資料結構 Min-Max Heap ，但因為 negamax 節點的更新會從 child 選，所以不用用到這個資料結構來做插入跟刪除。 ##### 使用 Monte Carlo tree search (MCTS) 演算法將 `agents/mcts.[ch]` 中的 double 改為定點數的 type unsigned long ## 檢查項目 ### 〈[Teach Yourself Programming in Ten Years](https://norvig.com/21-days.html)〉由於同學分享原先找不到的論文〈The Cost Distribution of Queue-Mergesort, Optimal Mergesorts, and Power-of-2 Rules〉。看到論文時發現作者還是統計所。學到現在對大學有很多與高中不一樣的感覺。我原先都把各科目當作獨立的學問。像是學人工智慧好像新的東西，很多人就算不懂電腦科學背後很多事情，也能大談闊論人工智慧。在直接跳級從 AI 開始學，連線性代數、演算法、C都沒學過的人面前，我覺得自己沒有甚麼價值，因為他們早起步就表現的比我好。但我現在知道兩個原因: 一、我讀過兩年仍只是學習現象，二、探究的仍還只是 AI 的皮毛。學習現象需要大量時間，同時也很重要。而當學習現象後去思考原理，發現他們不再只是單一學科，而是很多學科的知識。我爸說要學，才不會閉門造車，從輪做起。要思，才能納為己有，而不只是抄襲。如此一來，才能站在巨人的肩膀上，看得更遠。這件事情剛好呼應這篇文章的下面這段話。 > Get interested in programming, and do some because it is fun. 學習新知，起初會感到新鮮有趣。然而持續思考原理卻需要付出努力，這並非總是有趣的。然而，如果僅止步於此階段，就不會理解背後的原理，體會到更大的有趣。所以也呼應下句話持續讓這個過程有趣。 > Make sure that it keeps being enough fun so that you will be willing to put in your ten years/10,000 hours. 再來是這句話，我之前在讀很多論文的時候，我都覺得我讀懂了，但當開始實作時，才發現自己連一段程式碼都看不懂，但一直不知道要怎麼解決「總是花很長的時間去研究程式到底怎麼對應到論文的說法」。上課聽到「缺乏的是 domain knowledge」後，我才知道到底出了甚麼問題。因為以我擁有的知識，可能只能理解到某個地步，然而這與作者的思維與知識其實是有落差的。也感謝老師上課說他每年都只要求學生讀一篇論文然後把他們實作出來。聽完這句話後，我現在再也不會羨慕別人讀書都讀的很快，因為知道自己的資質就是只能慢慢地讀，讀太快反而會變成甚麼都學不會。呼應文章中這段話。 > The best kind of learning is learning by doing.