Linux 核心專題: 重做 lab0

# Linux 核心專題: 重做 lab0 > 執行人: bclegend > [專題解說影片](https://www.youtube.com/watch?v=FopiIjY-zgo) ### Reviewed by `LindaTing0106` > 參照 dudect 中函式 update_statistics() 會放棄前 10 個量測值，而不是從頭開始計算。因此在 lab0-c 中也做了對應的更改，即在進行統計分析時，同樣放棄前 10 個量測值。 dudect 中捨棄前 10 個量測值的理由為何？ ### Reviewed by `Booker-Chen` 在 dudect 原始碼中的 `prepare_percentiles` 函式是將百分位數存進 `ctx->percentiles` 這個陣列，但你是把百分位數存進 `ctx->exec_times` 這個陣列，可以解釋為甚麼做這樣的改動嗎 ? ### Reviewed by `SuNsHiNe-75` > [commit ebc7886](https://github.com/sysprog21/lab0-c/commit/ebc78862d7bc8b3b680eccbd3d3439b53b15f84f) [commit 805c765](https://github.com/sysprog21/lab0-c/commit/805c765d730d66726b5058401d4b6e4b88270312) 你的 Commit message 可以寫得更好，參見 [How to Write a Git Commit Message](https://cbea.ms/git-commit/)，大原則如「內文每行最多 72 字」等需特別注意。 > 收到，已更正 [Comparing changes](https://github.com/sysprog21/lab0-c/compare/master...bclegend:lab0-c:master) ### Reviewed by `lintin528` 在佇列操作的程式碼實作中，可以將 `commit` 的重點部分放到簡記裡，比對文字說明的效果會更好。 ## TODO: 重做 lab0 並彙整學員成果 > 務必依循作業規範，執行所有要求之任務 > 善用 git rebase 操作，在最新的 [sysprog21/lab0-c](https://github.com/sysprog21/lab0-c) 開發 ### 佇列操作的程式碼實作 #### q_new > [commit 1195f8a](https://github.com/sysprog21/lab0-c/commit/1195f8aa7fe77169ea438ba2d5441f843269f7e4) :::danger $\to$ [commit 6bb8bbf](https://github.com/bclegend/lab0-c/commit/6bb8bbfadb3e79d7139cf8dfb690f6891b52fd78): > Implementation of q_new > In q_new, we have to check the memory whether has been allocated correctly after we called malloc. 不是清晰的訊息，因為: 1. 標題的 `q_new` 多餘，本修改著重在鏈結串列的建立，沒必要寫出函式名稱，應描述其英語的操作 2. 內文也不用提 `q_new`，git 總是會追蹤變更的檔案內容，你該強調配置記憶體要注意什麼、在什麼時候可釋放已配置的空間，和定義「建立新的佇列」和「建立新的節點」有什麼差別 3. 之所以要寫好 git commit message，就是讓你的想法和相關舉措得以清晰地記錄下來，從而便於日後的協作。並非機械性地對程式碼變更進行逐行解釋 > <s> 收到，已改進 [commit 2cf0d43](https://github.com/sysprog21/lab0-c/commit/2cf0d439b3fd5a7a0bc61ea2d2e4df86781ecfe7) </s> > :warning: 你沒有改進！務必依循 [How to Write a Git Commit Message](https://chris.beams.io/posts/git-commit/) 的規範，再次 rebase! ::: > 已修正 [commit 1195f8a](https://github.com/sysprog21/lab0-c/commit/1195f8aa7fe77169ea438ba2d5441f843269f7e4) 宣告一個新的佇列並將 `list_head` 大小的空間分配給新的佇列。由於 `malloc` 依據 C99 規格書 7.20.3 中所敘述，可能會配置失敗。 > If the space cannot be allocated, a null pointer is returned. 因此利用 `if(!q_empty)` 來判斷該佇列是否成功分配。將佇列結構中指向下一個以及前一個節點的指標 `prev` 和 `next` 宣告在該新佇列之中。 :::danger Peter Hutterer 在 [On commit messages](https://who-t.blogspot.com/2009/12/on-commit-messages.html) 說得很精闢： > 「重新了解一段程式碼更動的脈絡很浪費腦力。雖然這件事情沒辦法完全避免，但是我們可以盡量[降低](https://www.osnews.com/story/19266/wtfsm/)這件事情的複雜度。Commit messages 正可以做到這點，而**我們可以從 commit message 看出一個開發者是否為一位好的合作對象**。」一個專案是否能長期且成功地運作 (撇除其他影響的因素)，取決於它的可維護性，而在這件事上沒有其他工具比專案本身的開發軌跡更為強大。因此花時間學習如何撰寫與維護專案的開發紀錄，是件值得投資的事。一開始你可能會覺得麻煩，但它很快就會變成一種習慣，甚至能成為你之所以感到自豪及具備生產力的因素。 Chris Beams 在 [How to Write a Git Commit Message](https://chris.beams.io/posts/git-commit/) 一文 (繁體中文翻譯: [如何寫一個 Git Commit Message](https://blog.louie.lu/2017/03/21/%E5%A6%82%E4%BD%95%E5%AF%AB%E4%B8%80%E5%80%8B-git-commit-message/)) 提出，好的 Git commit message 應符合七條規則: 1. 用一行空白行分隔標題與內容 2. 限制標題最多只有 50 字元 3. 標題開頭要大寫 4. 標題不以句點結尾 5. 以祈使句撰寫標題 6. 內文每行最多 72 字 7. 用內文解釋 what 以及 why vs. how ::: #### q_free > [commit d3d5e80](https://github.com/sysprog21/lab0-c/commit/d3d5e808697f7b744cc332d91b7ef017e3fc05e0) :::danger 改進你的漢語表達。 ::: 首先需要確認佇列是否存在。接著宣告指標 `entry` 用於指向當前的節點和指標 `safe` 用於指向下一個節點。利用 `list_for_each_entry_safe` 遍歷佇列中的每個節點，使用 `list_del` 刪除節點與佇列的連結，並使用 `q_release_element` 釋放該節點的記憶體。最後，在刪除完所有節點後，釋放 head 節點的記憶體。 #### q_insert_head / q_insert_tail > [commit 8c8f841](https://github.com/sysprog21/lab0-c/commit/8c8f84102ac928583f9a50639c248f9afc40617a) 為了符合 linux 的編寫風格，在佇列的頭部或尾端插入新節點可以使用 `list_add` 和 `list_add_tail` 函式。首先需要確認佇列是否存在。接著宣告並分配一個新的 `element`，並確認分配是否成功。然後使用 strdup 將目標字元指標 s 複製到新的 `element->value` 中，並確認是否成功複製。最後，根據需要使用 `list_add` 將新節點插入到佇列的頭部，或者使用 `list_add_tail` 將新節點插入到佇列的尾端。 ```shell $ man strdup ``` > The `strdup()` function returns a pointer to a new string which is a duplicate of the string s. Memory for the new string is obtained with `malloc(3)`, and can be freed with `free(3)`. 利用 `list_add` 或是 `list_add_tail` 將 `element` 加入到佇列之中，並回傳 `true` 回報 `element` 已經加入佇列之中。 #### q_remove_head / q_remove_tail > [commit dc52bfc](https://github.com/sysprog21/lab0-c/commit/dc52bfcccb5cee04c29f384b8ed27c11988f7e08) :::danger 改進你的漢語表達。 ::: 首先，需要確認佇列是否存在且不為空。接著宣告一個新的指標 `node`，將 `head->next` 或 `head->prev` 指派給 `node`。`q_remove_head` 函式使用 `head->next` 來指定目標節點，而 `q_remove_tail` 函式使用 `head->prev` 來指定目標節點。然後宣告一個 `element_t` 的指標 `element`，並將 `node` 的進入點指派給 `element`。接著，將該節點中的字串根據設定的長度 `bufsize` 使用 strncpy 複製到字元指標 `sp` 中，並在字串的最後加入 `\0`。最後，移除並釋放該節點。 ```shell $ man strdup ``` > The strndup() function is similar (with `strdup`) , but copies at most n bytes. If s is longer than n, only n bytes are copied, and a terminating null byte ('\0') is added. 使用函式 `list_del` 刪除該節點在佇列中的連結。 #### q_size > [commit 53e8039](https://github.com/sysprog21/lab0-c/commit/53e803927f6f7fbf88e348887756f378ab908593) :::danger 注意用語： * access 是「存取」，而非「訪問」 ::: 首先需要確認佇列是否存在。然後宣告一個整數變數 `count` 並將其初始化為 0。接著使用 `list_for_each` 函式遍歷每一個節點，每遍歷一個節點就將 count 加 1。這樣可以計算出在佇列中的節點數量。 #### q_delete_mid > [commit 05f80bf](https://github.com/sysprog21/lab0-c/commit/05f80bf8926419aecdff3c7c6f3573c5a2c97be4) :::danger 改進你的漢語表達。 ::: 首先確認佇列是否存在。接著宣告兩個指標 `move_head` 和 `move_tail`，將 `head` 指派給這兩個指標。在迴圈中，同時將 `move_head` 移動到下一個節點 `next` 以及將 `move_tail` 移動到前一個節點 `prev`。當 `move_head->next` 和 `move_tail->prev` 指向同一個位置，或者 `move_head->next` 指向 `move_tail` 時，離開迴圈。接下來，將 move_head 移動到下一個節點，然後刪除並釋放該節點。這樣就能達成我們所期望的效果，即刪除佇列的中間節點。 #### q_delete_dup > [commit eecb514](https://github.com/sysprog21/lab0-c/commit/eecb5141f9f01b978457671874f9b6a1f115a5c0) :::danger 改進你的漢語表達。 ::: 根據 [wanghanchi](https://hackmd.io/@wanghanchi/linux2023-lab0#q_merge) 的筆記中提到的，`q_delete_dup` 函式是針對已經排序過的佇列進行刪除重複字串的節點。因此，在實作時，我們需要比較每個節點與下一個節點的字串，如果發現重複則刪除該節點。用函式 `list_for_each_entry_safe` 可以組成巢狀迴圈來比對每個節點中的字串值是否一致，若一致則刪除該節點。 #### q_swap > [commit 00b33dd](https://github.com/sysprog21/lab0-c/commit/00b33dd3c5bc7729dcae8a365e85e02656790855) :::danger 改進你的漢語表達。 ::: 要將佇列中的節點從頭到尾進行兩兩交換，首先需要確認佇列是否存在。然後宣告一個新的指標 `node`，並利用函式 `list_for_each` 讓該指標經過佇列中的每個節點。在經過每個節點時，將該節點與其所指向的下一節點進行交換。為了避免將佇列的頭部和尾端節點進行交換，我們需要在程式碼中加入以下判斷式：如果 `node` 或 `node->next` 為空，則跳過交換操作。這樣可以確保只進行預期的節點交換。 ```c if (node->next == head) break; ``` #### q_reverse/q_reverseK > [commit fad95c1](https://github.com/sysprog21/lab0-c/commit/fad95c14869db6eb52d091b39733d904f1dc8789) 將佇列中的所有節點進行反向操作時，使用 `list_for_each_safe` 函式會將指標 `safe` 指向 `node->next`。由於 `list_for_each_safe` 函式支持在迭代過程中移除節點的操作，因此我們選擇此函式來實現反向操作。首先，我們需要確認佇列是否存在。然後宣告兩個指標 `node` 和 `safe`，在調用 `list_for_each_safe` 函式時，將 `node` 指向 `head->next`，並將 safe 指向 `node->next`。接著，在每次迭代中，將 `safe` 指向 `node->next`，然後對調 `node` 中前後節點的指向。由於迭代結束後 `head` 節點沒有被修改，因此最後需要將 `head` 的前後指標進行對調。 #### q_descend / q_ascend > [commit 69ca1d1](https://github.com/sysprog21/lab0-c/commit/69ca1d1ab418e0629097af663c82184d63c11f04) ![git commit](https://hackmd.io/_uploads/SyuB2G5LC.png) :::danger 務必使用 git rebase，而非 git merge! 一旦正確使用 git rebase，就不會有 "Merge branch 'sysprog21:master' into master" 的訊息。唯有掌握細節，方可挑戰更大的目標。你若無法從錯誤中學習，憑感覺持續盲目投入，只是徒勞。 > 收到，已更正 > ![git rebase](https://hackmd.io/_uploads/Sk-ux8nU0.png) ::: :::danger 詳閱〈[Git 教學和 GitHub 設定指引](https://hackmd.io/@sysprog/git-with-github)〉(含解說錄影)，使用 git 命令的 `fetch` 操作取得最新的 [sysprog21/lab0-c](https://github.com/sysprog21/lab0-c) 變更，並用 `rebase` 操作讓自己提交的 commit 得以在 [sysprog21/lab0-c](https://github.com/sysprog21/lab0-c) 最新的變革之上。注意：你可能會遇到若干衝突，你需要自行排除。過程中，確保符合〈[How to Write a Git Commit Message](https://chris.beams.io/posts/git-commit/)〉規範，你或許會需要用 `git rebase -i` 命令來改進，最後用 `git push --force` (注意: 操作務必謹慎) 公開發佈於 GitHub。 > `git rebase` 示意: ![image](https://hackmd.io/_uploads/HkkxHWE6T.png =80%x) ::: ### 研讀 [lib/list_sort.c](https://github.com/torvalds/linux/blob/master/lib/list_sort.c) 並完成 q_sort :::danger 改進你的漢語表達。 ::: list_sort.c 是 Linux 核心中實作的 merge sort 檔案。在執行 `list_sort` 函式時，會先將環狀雙向鏈結串列轉換為 null-terminated 的單向鏈結串列，並在排序結束後使用 `merge_final` 函式將其轉換回環狀雙向鏈結串列。 ```c __attribute__((nonnull(2,3,4))) ``` > **5.25 Declaring Attributes of Functions** > The keyword `__attribute__` allows you to specify special attributes when making a declaration. 根據 [A GNU Manual](https://gcc.gnu.org/onlinedocs/gcc-3.4.3/gcc/Function-Attributes.html) 中所敘述在函式宣告時，檢查在位置 `(2,3,4)` 輸入的參數是否為 `non-null` 的指標。 ### 閱讀論文〈[ Dude, is my code constant time? ](https://eprint.iacr.org/2016/1123.pdf)〉並改進 dudect 論文中提到，評估一個實作是否以恆定時間運行並非易事，即使是在理論時間複雜度為 O(1) 的實作中也是可能存在 timing leakage 。現有的幾種檢測變時性代碼的工具大多需要某種方式的硬件建模，然而這非常困難，因為 CPU 製造商對 CPU 內部運作原理的 <s>信息</s> 公開很少，且這種行為可能會因微碼更新而改變。因此論文中介紹了一種名為 dudect 的工具，用於檢測某段代碼是否在給定平台上以恆定時間運行。論文中解釋了 time leakage 檢測測試的實作，首先測量兩個不同輸入數據類別的執行時間，然後檢查這兩個時間分佈是否在統計上存在顯著差異。在進行執行時間的量測時，leakage 檢測技術針對兩個不同輸入數據類別進行測量。其中，一種被廣泛應用的測試方法是 fix-vs-random 測試。在這種測試中，第一類輸入數據被固定為一個常數值，而第二類輸入數據則每次測量時隨機選擇。而根據論文中所描述，以上兩組測試數據所得到的時間分佈應該是相等的。 :::danger 探討更多該論文的議題，不要只做摘要！ ::: #### 實驗方法量測方法：使用 fix-vs-random 的方法進行量測。具體來說，將第一組輸入設定為固定值，而第二組輸入則會根據每次的量測隨機設定。後處理：在進行統計分析之前，需要對每次的量測結果進行後處理。由於大部分的量測值集中在較小的執行時間，但也有部分量測值出現特別高的執行時間，這可能導致時間分佈呈現正偏態。因此，需要對結果進行篩選或取捨，以確保統計分析的準確性。 :::danger 一旦進行後處理，是否會影響統計模型？ ::: >會，根據論文〈[ Dude, is my code constant time? ](https://eprint.iacr.org/2016/1123.pdf)〉中所描述，若少了對 t-test 資料的**前**處理，則 t-test 將只能檢測 first-order timing information leakage > >A simple statistical test to use and implement is Welch’s t-test. This statistic tests for equivalence of means, and hence failure of this test trivially implies that there is a ==first-order timing information leakage==. Note that when a t-test is coupled with cropping pre-processing, one is no longer testing only for equality of means ==but also for higher order== statistical moments since the cropping is a non-linear transform. >〈[ Dude, is my code constant time? ](https://eprint.iacr.org/2016/1123.pdf)〉 :::danger 翻閱統計學教科書，確認上述說法。 ::: 統計測試：使用 [Welch's t-test](https://en.wikipedia.org/wiki/Welch%27s_t-test) 來測試兩組數據的時間分佈是否相等（例如，平均數是否相同）。如果檢驗結果顯示不相等，則可能意味著存在時間洩漏。 :::danger 先做數學分析，不要急著寫程式。 ::: #### 數學分析 #### 實作 >[commit f5c709b](https://github.com/bclegend/lab0-c/commit/f5c709b5f6698d313a15a2a3960bb294982edf3b) 在 lab0 的實作中加入 leakage detection test 來測試程式是否為 constant time 。閱讀 [dudect](https://github.com/oreparaz/dudect/blob/master/src/dudect.h) 程式碼可以發現，函式 `dudect_main` 與 `lab0-c/dudect/fixture.c` 中的函式 `doit` 相對應，因此在函式 `doit` 中加入函式 `prepare_percentiles` ```diff bool ret = measure(before_ticks, after_ticks, input_data, mode); differentiate(exec_times, before_ticks, after_ticks); + prepare_percentiles(exec_times); update_statistics(exec_times, classes); ret &= report(); ``` :::danger 區分「函數」和「函式」，見: https://hackmd.io/@sysprog/it-vocabulary ::: 函式 `prepare_percentiles` 主要操作的是使用函式 `qsort` <s>函數</s> 對 `exec_times `進行排序，迭代每個測量點，計算其對應的百分位數，並將結果儲存 <s>存儲</s> 回 `exec_times` <s>數組</s> 陣列中。 :::danger 注意用語！ ::: ```c static int cmp(const int64_t *a, const int64_t *b) { if (*a == *b) return 0; return (*a > *b) ? 1 : -1; } static int64_t percentile(int64_t *a_sorted, double which, size_t size) { size_t array_position = (size_t) ((double) size * (double) which); assert(array_position < size); return a_sorted[array_position]; } /* * set different thresholds for cropping measurements. * the exponential tendency is meant to approximately match * the measurements distribution, but there's not more science * than that. * Reference : https://github.com/oreparaz/dudect/blob/master/src/dudect.h */ static void prepare_percentiles(int64_t *exec_times) { qsort(exec_times, N_MEASURES, sizeof(int64_t), (int (*)(const void *, const void *)) cmp); for (size_t i = 0; i < N_MEASURES; i++) { exec_times[i] = percentile( exec_times, 1 - (pow(0.5, 10 * (double) (i + 1) / N_MEASURES)), N_MEASURES); } } ``` 參照 `dudect` 中函式 `update_statistics()` 會放棄前 10 個量測值，而不是從頭開始計算。因此在 lab0-c 中也做了對應的更改，即在進行統計分析時，同樣放棄前 10 個量測值。 :::danger 探討此操作在統計的意義。 ::: ```diff static void update_statistics(const int64_t *exec_times, uint8_t *classes) { - for (size_t i = 0; i < N_MEASURES; i++) { + for (size_t i = 10; i < N_MEASURES; i++) { ``` ### 利用 valgrind 分析記憶體使用情況 ### 整合網頁伺服器 :::danger 務必依循作業規範，執行所有要求之任務。注意細節！整個學期過去，我看不到你的成長，為何如此？說好的翻身呢？ :::