2024q1 Homework1 (lab0)

# 2024q1 Homework1 (lab0) contributed by < `stevendd543` > ### Reviewed by `SuNsHiNe-75` - 你的「開發環境」沒有呈現出，作業有規定。 - 請把完整程式碼移除，如要討論才將要討論之部分「重點列出」。 - 數字或英文等字元，兩邊應要有空格來隔開。 - 注意標點符號的使用，有些地方都沒有「句號」。 - 字如「在/再」、「做/作」應分清楚。 > 你的洞見呢？與其談「心法」，不如設想「如果我是你」，我會怎麼做，示範一次給同學看，而且檢討自己哪裡沒做好，從而呼應原本學員的不足處。 > 清楚標示學員的 git commits 和敘述的不足處，予以檢討。 > [軟體工程師要學會說故事](https://ruddyblog.wordpress.com/2016/06/18/)，本作業要求學員從良性詳盡的批評開始。繼續檢討！ > :notes: jserv ### 開發環境 ```shell $ gcc --version gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 Copyright (C) 2021 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. $ lscpu Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Address sizes: 39 bits physical, 48 bits virtual Byte Order: Little Endian CPU(s): 8 On-line CPU(s) list: 0-7 Vendor ID: GenuineIntel Model name: Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz CPU family: 6 Model: 158 Thread(s) per core: 2 Core(s) per socket: 4 Socket(s): 1 Stepping: 9 CPU max MHz: 3800.0000 CPU min MHz: 800.0000 BogoMIPS: 5599.85 ``` ## 指定的佇列操作 #### Free and Malloc ```c static inline void q_release_element(element *e) { test_free(e->value); test_free(e); } ``` 在 queue.h 中我發現了一個函式 `q_release_elements` 裡面使用了 `test_free` 而非 `free` ，一開始還有個疑問是為何 `q_release_element` 要先將 value 釋放掉記憶體在釋放掉 list ，後來仔細看 `element` 這個結構後才發現原來當初 value 的型態是 *char**，因此在**動態**建立 `element` 這個結構時，也同時動態配置了 value 空間，因此在釋放空間的時候，如果沒有將 value 釋放將會產生錯誤。 ```c struct list_head *q_new() { struct list_head *q_head=malloc(sizeof(struct list_head)); /*return queue head address if successfully allocate memory else return NULL*/ if(q_head){ INIT_LIST_HEAD(q_head); return q_head; } return NULL; } ``` :::danger function call 是「函式呼叫」，其權限沒有大到可以「調用」你。 ::: 在 `INIT_LIST_HEAD` 註解中提到，一個沒被初始化的節點，有可能會被其他函式呼叫，並使用 `list_del` 將其刪除，這樣的結果是不安全，雖然配置記憶體空間給了`q_head`，但是結構中的鏈結串列沒有使用 `INIT_LIST_HEAD` 初始化的話會造成[解引用空指標](https://learn.snyk.io/lesson/null-dereference/)。 :::danger 1. 留意[資訊科技詞彙翻譯](https://hackmd.io/@sysprog/it-vocabulary) 2. 改進漢語表達 :::  #### strcpy and strncpy :::danger store 的翻譯是「儲存」，而非「存儲」，本處針對者是存取記憶體的邊界議題。 ::: strcpy() 與 memcpy() 皆有可能會造成儲存溢位要小心使用 ! 兩者差異在於複製的型態限制不一樣，但不像 `strdup` 在配置空間時會返還配置失敗的訊息，而會直接覆蓋原始資料。 ```c element_t *q_remove_head(struct list_head *head, char *sp, size_t bufsize) { if(!head || list_empty(head)) return NULL; element_t *e = container_of(head->next,element_t,list); \ // `head->prev` for q_remove_tail list_del_init(head->next); if(sp) strncpy(sp,e->value,bufsize-1); strcpy(sp+bufsize-1,"\0"); return e; } ``` 這裡嘗試了使用`strncpy`來複製字串，透過 bufsize-1 限制了複製位元組的次數，可以避免掉存儲溢位，但要注意要預留最後給空白字元進行字串中止 `strcpy(sp+bufsize-1,"\0")` 其餘操作像是 q_delete_dup、q_swap、q_reverseK、q_ascend、 q_descend 等等程式碼詳見 [Github: lab0-c/queue.c](https://github.com/stevendd543/lab0-c)  [likely](https://meetonfriday.com/posts/cecba4ef/) ### list_sort ==名詞解釋== * count : 透過二進位記錄 pending 的節點數量來控制合併操作 * pending list : 儲存等待合併的串列 **合併過程** ``` Each time we increment "count", we set one bit (bit k) and clear bits k-1 .. 0. Each time this happens (except the very first time for each bit, when count increments to 2^k), we merge two lists of size 2^k into one list of size 2^(k+1). ``` 此處擷取 list_sort.c 說明了 merge 兩個 list 的時機，每當我們增加 count 時，發生第 k-bit 且 0~k-1-bit 都設為0時進行 merge **合併條件** ``` This merge happens exactly when the count reaches an odd multiple of 2^k, which is when we have 2^k elements pending in smaller lists, so it's safe to merge away two lists of size 2^k. ``` 裡面提到了當 count 也就是紀錄 pending list 的節點數量，剛好是 $2^k$ 的奇數倍，就進行合併，但我覺得這裡一開始看相當不值觀，因為 $2^k$ 的奇數倍換句話說就是 $2^k$ 的偶數倍 1、2、4、8... 不進行合併，那再搭配表看看 |count decimal|count binary|merge| | -------- | -------- | -------- | 0|0000 |No| 1|0001 |No| 2|0010 |Yes| 3|0011 |No| 4|0100 |Yes| 5|0101 |Yes| 明顯的搭不起來。 :::danger 無論標題和內文中，中文和英文字元之間要有空白字元 (對排版和文字搜尋有利)； :::   #### 以圖來解釋 list_sort 程式碼 :::danger 改進你的漢語表達。 ::: 後來我直接去了解搭不起來的原因，主要是 count 的變化時間點與剛剛的程式碼註解有稍微一點落差，在理解上比較容易搞混，因此我按照原本程式碼圖解了 pending list 、 merge 、 bits 的個別用意，以下為擷取部分[lib/list_sort.c 程式碼](https://github.com/torvalds/linux/blob/master/lib/list_sort.c)。 ```c for (bits = count; bits & 1; bits >>= 1) tail = &(*tail)->prev; /* Do the indicated merge */ if (likely(bits)) { struct list_head *a = *tail, *b = a->prev; a = merge(priv, cmp, b, a); /* Install the merged result in place of the inputs */ a->prev = b->prev; *tail = a; } ``` 假設目前 `bits = count` 計算到 0100~2~，且目前 pending list 如下圖，表示剛剛加入一個長度為 1 的元素後，就會有兩個一樣大小的要準備合併，現在 pendlist 狀況長這樣。 ```graphviz digraph{ rankdir=LR "pending list"[shape=box] "a 長度 2"[shape=box] "b 長度 1"[shape=box] "c 長度 1"[shape=box] "a 長度 2"->"pending list" "b 長度 1"->"a 長度 2" "c 長度 1"->"b 長度 1" [color =blue] } ``` 合併後再加入一個長度為 1。然後這個時候 count 變成 0101~2~，在上面有提到說除了第一次 flip外，如果第 k 個 bit 被轉成 1 且 0~k-1 bits 都因為進位變 0代表有長度 2^k^ 的 list 要做合併，那此處就是 a 與 b進行合併，且 count + 1 變成 0110~2~。 ```graphviz digraph{ rankdir=LR "pending list"[shape=box] "a 長度 2"[shape=box][color =red] "b 長度 2"[shape=box] [color =red] "c 長度 1"[shape=box] "a 長度 2"->"pending list" "b 長度 2"->"a 長度 2" "c 長度 1"->"b 長度 2" [color =blue] } ``` 合併後再加一個進來變成下圖。那這邊 count + 1 後變成 0111~2~ 此處同上第 0 bit 進位了，所以 2^0^ 長度要進行合併。 ```graphviz digraph{ rankdir=LR "pending list"[shape=box] "a 長度 4"[shape=box] "b 長度 1"[shape=box] [color =red] "c 長度 1"[shape=box][color =red] "a 長度 4"->"pending list" "b 長度 1"->"a 長度 4" "c 長度 1"->"b 長度 1" [color =blue] } ``` 合併且加入元素變成，可以注意到 0111~2~ 中每個 1都代表長度為 2^k^ 各有一個。 ```graphviz digraph{ rankdir=LR "pending list"[shape=box] "a 長度 4"[shape=box] "b 長度 2"[shape=box] "c 長度 1"[shape=box] "a 長度 4"->"pending list" "b 長度 2"->"a 長度 4" "c 長度 1"->"b 長度 2" [color =blue] } ``` 同理可推 1000~2~ 因為是第一次轉成 1 所以不做任何合併。 #### 總整理所有規則可以簡單分成以下三點 : * 每次 +1 必定會有一個節點加入 pending list 等待被合併。 * 當第 k 個 bit 由 0 變成 1 且不是第一次發生，代表有 2^k^ 長度的 list 在 pending list 中要被合併。 * count = 2^k^-1 時代表有 k 個不同長度的 list 在 pending list 中。 :::danger 明確標示 GitHub 帳號名稱及出處。 ::: 有了上面的圖在看整理圖表([來源 : kdnvt](https://hackmd.io/byNYeoGvQv-a9TNDc2CKFQ?view))就可以理解整個設計了。 |count decimal|count binary|merge|before merge|after merge | -------- | -------- | -------- |---|---| 0|0000 |No| NULL| X 1|0001 |No| 1 |X 2|0010 |Yes| 1-1 |2 3|0011 |No| 1-2 |X 4|0100 |Yes| 1-1-2 |2-2 5|0101 |Yes| 1-2-2 |1-4 6|0110 |Yes| 1-1-4 |2-4 7|0111 |No| 1-2-4 |X 8|1000 |Yes| 1-1-2-4 |2-2-4 9|1001 |Yes| 1-2-2-4 |1-4-4 10|1010 |Yes | 1-1-4-4| 2-4-4 11|1011 |Yes | 1-2-4-4| 1-2-8 12|1100 |Yes| 1-1-2-8| 2-2-8 13|1101 |Yes | 1-2-2-8 |1-4-8 14|1110|Yes | 1-1-4-8 |2-4-8 15|1111 |No | 1-2-4-8 |X 16|10000 |Yes| 1-1-2-4-8| 2-2-4-8 :::warning 回到第一手材料，閱讀作業說明所及的三篇論文。 ::: --- ### 將 list_sort 引入 lab0-c 專案參考 [chiangkd](https://hackmd.io/duPg38XOTL-XKa7_h0Ielw?view#%E7%A0%94%E8%AE%80%E8%AB%96%E6%96%87%E4%B8%A6%E6%94%B9%E9%80%B2-dudect) - 複製 list_sort.[ch] 到專案底下 - 新增 cmp 函式在 list_sort.c - 修改 Makefile 指令新增 list_sort 目標檔 - 根據 cppcheck 報錯修改程式碼 perf 效能測試，參見 [perf 原理和實務](https://hackmd.io/@sysprog/gnu-linux-dev/https%3A%2F%2Fhackmd.io%2Fs%2FB11109rdg) 1. 用此命令檢查權限，若非 -1 ``` $ cat /proc/sys/kernel/perf_event_paranoid ``` 2. 透過以下命令設定權限 ``` $ sudo sh -c "echo -1 > /proc/sys/kernel/perf_event_paranoid" ``` 3. (optional)perf 指令加入 Makefile ``` export cycle ?= 10 perf: qtest perf stat --repeat $(cycle) -e cache-misses,cache-references,instructions,cycles ./qtest -f traces/trace-sort.cmd ``` --- #### Instructions and cycles 執行結果 : Delta time | 資料量 |q_sort | list_sort | |------|------|--------| |500,000|0.45|0.406| |1,000,000|0.987|0.859| #### cache 執行結果 : Delta time | 資料量 |q_sort | list_sort |difference| |------|------|--------|-------| |cache-misses|126,801,501|107,195,973|19,605,528| |cache-references|210,171,035|171,331,504|38,839,531| 可以從結果看出 list_sort 對 cache 相當友善! :::success TODO : 改進 q_sort ::: --- ### Valgrind + 自動測試程式 **Valgrind 簡介** valgrind 是個在使用者層級 (user space) 對程式在執行時期的行為提供動態分析的系統軟體框架，具備多種工具，可以用來追蹤及分析程式效能。 * [memcheck](https://valgrind.org/docs/manual/mc-manual.html) : 用於偵測記憶體錯誤，例如存取非法記憶體位置、使用未初始化的值等等。 * [massif](https://valgrind.org/docs/manual/ms-manual.html) : 他是一個 head profiler，用來量測 heap 可使用空間。 * [chachegrind](https://valgrind.org/docs/manual/cg-manual.html) : 能夠精確地量測程式執行的指令數。 <詳細功能可參閱超連結> valgrind 會自行維護一份暫存器與記憶體的副本來安全地使用並測試，因為他屬於 dynamic Binary Instrumentation (DBI) 框架，因此會操作機械碼來進行分析，但這過程會經過轉譯成 VEX 中間表示法 (intermediate representation，簡稱 IR，是種虛擬的指令集架構)，並插入分析工具來進行檢測。 --- ### 研讀論文〈Dude, is my code constant time?〉論文網址: [<Dude, is my code constant time?>](https://eprint.iacr.org/2016/1123.pdf) #### Introduction ##### 歷史回顧 Timing attacks 是一種 side-channel 實體攻擊手法，攻擊者主要是基於不同的 logical operation，來量測每個 opertaion 執行時間推敲出輸入資料的資訊為了解決人工解查耗時的問題，現今研究提出了由**動態分析工具**`Valgrind`延伸而來的`ctgrind`去追蹤 secret-dependant paths 或者 memory accesses。也有人利用**靜態指令分析**去分析部分程式碼是否為常數執行時間嵌入硬體過程繁雜，難以建模實現。 :::danger 避免非必要的項目縮排 (即 `* ` 和 `- `)，以清晰、明確，且流暢的漢語書寫。 ::: ##### 本篇論文提出貢獻 * dudect 偵測程式碼執行時間是否為常數執行時間 * 專注在執行時間量測的統計分析，非靜態程式碼分析，可規避硬體建模上的困難 What is timing variability ? --- #### 流程 **step 1. Measure execution time** 根據兩組不同資料，執行 leakage detection test 去統計執行時間，因為有多種 leakage detection test 常見的手法有兩種 **fix** 與 **random** * Classes definition - fix : 輸入資料固定為 constant value，fixed value 有可能可以用來觸發極端案例 - random : 輸入資料隨機產生 * Cycle counters - CPUs 提供精確的執行時間量測 * Environmental conditions - 為了減少環境差異影響，每次量測前都會對應到一個隨機的類別 **step 2. Apply post-processing** 在做統計分析前會先做以下一些處理 * Cropping : 典型的時間分布屬於正偏斜，也就是大部分時間分布都落在較短時間，這是因為許多程序會被作業系統或者其他事件給中斷，因此很容易可以根據設定閾值去切割量測的範圍 ![image alt](https://hackmd.io/_uploads/rykpk1an6.png =250x250) Right skewed = Postively skewed * Higher-order preprocessing : 根據統計分析結果可以應用 higher-order 的預處理，像是 centered product 模仿了 higher-order DPA 攻擊 **Step 3. Apply statistical test** 去驗證「*兩個時間分布是相等*」的虛無假設 * T-test * Non-parametric test #### dudect 程式碼 :::danger 指出論文和程式碼之間的差異。 ::: 從[dudect/src/dudect.h](https://github.com/oreparaz/dudect/blob/master/src/dudect.h#L419)開始分析 ```c static int64_t percentile(int64_t *a_sorted, double which, size_t size) ``` 從參數名稱`int64_t *a_sorted`知道輸入資料必須是經過排序，輸入資料要從另一個函式`prepare_percentiles`取得 ```c static void prepare_percentiles(dudect_ctx_t *ctx) { qsort(ctx->exec_times, ctx->config->number_measurements, sizeof(int64_t), (int (*)(const void *, const void *))cmp); for (size_t i = 0; i < DUDECT_NUMBER_PERCENTILES; i++) { ctx->percentiles[i] = percentile( ctx->exec_times, 1 - (pow(0.5, 10 * (double)(i + 1) / DUDECT_NUMBER_PERCENTILES)), ctx->config->number_measurements); } } ``` 這兩個函式在 $1-0.5^{10*(i+1)/100}$ 控制抽樣的比例 * 論文提到根據經驗較低的 cycle count tail 比較不會受資料相依(**noise 中文待查**)的影響去切割量測(measurements)，當大於某個 percentile-p 時將其去除，此操作可以提升偵測速度 --- #### Simlation 如何透過以實驗而非理論分析來驗證時間複雜度解釋 Student's t-test 以及實作 :::danger 避免非必要的項目縮排 (即 `* ` 和 `- `)，以清晰、明確，且流暢的漢語書寫。 ::: Student's t-test : 先介紹什麼是**假說檢定**，主要功能是透過現有的數據去支持特定假設的方法，因此會有兩個假說 *Null hypothesis* $H0$ 以及對立的假說 *Alternative hypothesis* $H1$，根據結果來推論真正的母體。再來就是 Student's t-test，它是用來檢定兩組資料的母體平均數是否有顯著差異，但再進行檢定前必須滿足三個前提假設，[獨立性、常態性、變異數同質性](https://highscope.ch.ntu.edu.tw/wordpress/?p=69817)，才會去進行 Student's t-test 倘若兩組資料變異數不具同質性，才會使用 Welch’s t-test 來檢定會更具可靠性來自 [lab0-c/dudect/ttest.c](https://github.com/sysprog21/lab0-c/blob/master/dudect/ttest.c) 實作 Welch’s t-test 分成三組函數 * `void t_push(t_context_t *ctx, double x, uint8_t class)` : 裡面有計算變異數 Welford’s 方法 * `double t_compute(t_context_t *ctx)` : 計算在變異數不同下的 t 統計量 * `void t_init(t_context_t *ctx)` : 初始化所有統計變數將 [dudect/src/dudect.h](https://github.com/oreparaz/dudect/blob/master/src/dudect.h#L205) 中的 percentile 引入 lab0-c，新增在[commit : da59019](https://github.com/sysprog21/lab0-c/commit/da59019b59072549540adf9352fcf78c0f1a13c0) ```c ctx->percentiles[i] = percentile( exec_times, 1 - (pow(0.5, 10 * (double))(i + 1) /DUDECT_NUMBER_PERCENTILES)), N_MEASURES); ``` 可以知道 `percentiles[i]` 由小至大由 $1-0.5^{10*(i+1)/100}$ (下圖為變化曲線) 加權設置不同的閾值來限制 measurements ，但仔細看曲線可以發現所有的`percentiles`並不全然相異，等於從時間統計中抽樣出來的值會重複在 larger time ![image alt](https://hackmd.io/_uploads/Hynm0gzaT.png =400x300) * [traces/trace-17-complexity.cmd](https://github.com/sysprog21/lab0-c/blob/master/traces/trace-17-complexity.cmd) : 提供模擬參數 * [dudect/fixture.c](https://github.com/sysprog21/lab0-c/blob/master/dudect/fixture.c) : 多次給定執行函式測量執行時間，並且執行 Welch's t-test 來檢定是否為 constant time， --- ## Code review > 錄影 : [Amazing Code Reviews](https://www.youtube.com/watch?v=ly86Wq_E18o) 有時候我們會像單細胞生物一樣 PR 與 merge，但事實上這樣很容易會有程式問題，因此作者強調著重在 code review 的工程上會讓團隊生產更有效率 * Build the right thing : 越早 PR 越有它的價值，==can show direction==、==minimize re-work==，因此要保持開放的心態去接受批評。 * Build the thing right : 每次 PR 盡可能控制程式碼行數 200-300 行，使得別人在 code review 時不會看得眼花撩亂，但也不要太小無法了解整體的想法。1 PR == 1 Concern 一個 PR 只關注一件事情，萬一程式碼出錯不必逐一偵錯。 * Build it fast : 篩選 PR，避免在 review 下面過多的討論 * Build it toghether --- ## Tic-tac-toe **前提** : 在 monte-calro 中大量使用 *浮點數* 計算，當我們無法接受浮點數帶來的計算負擔時，就需要使 *定點數* 來解決。其實一開始直觀的想法，浮點數與定點數都屬於小數，為什麼定點數就能比浮點數還低的計算開銷，參考了[作業講解](https://hackmd.io/@sysprog/linux2024-ttt#-Linux-%E6%A0%B8%E5%BF%83%E7%9A%84%E6%B5%AE%E9%BB%9E%E6%95%B8%E9%81%8B%E7%AE%9)中提到，kernel 會多了 trap 以及 initiate 這兩步驟，那定點數呢? > When using floating-point instructions kernel normally catches a trap and then initiates the transition from integer to floating point mode. 定點數雖然是小數，但他不屬於 IEEE 754 的浮點數規範，不必處裡浮動的小數點，在某些精度要求不高的地方，定點數可以比浮點數更有效率，也要特別注意浮點數與定點數的表示方式是有差異的，在這裡很容易搞混。以下簡單的整理作業提到的定點數運算範例 ```c static unsigned long fixed_power_int(unsigned long x, unsigned int frac_bits, unsigned int n) { unsigned long result = 1UL << frac_bits; if (n) { for (;;) { if (n & 1) { result *= x; result += 1UL << (frac_bits - 1); result >>= frac_bits; } n >>= 1; if (!n) break; x *= x; x += 1UL << (frac_bits - 1); x >>= frac_bits; } } return result; } ``` 本範例的精確度取決於 `frac_bits` ，也就是當你固定小數位置，後面表示小數的位數有多少個 bits。 `result = 1UL << frac_bits` 這段程式碼的用意主要是在於，將其結果初始化成定點數的形式，因為日後的計算都是以定點數做操作。再來迴圈內的所有乘法都屬於定點數乘法，因此經過乘法後小數點偏移，因此要做一次反操作成規定的定點數位置。 ### 整合網頁伺服器 line editing library 是在 CLI 中提供給使用者功能的 library，linenoise 就是其中一種輕型的應用。以下來自[官方 github](https://github.com/antirez/linenoise) 提供的用途 * Single and multi line editing mode with the usual key bindings implemented. * History handling. * Completion. * Hints (suggestions at the right of the prompt as you type). * Multiplexing mode, with prompt hiding/restoring for asynchronous output. 我從 `do_web` 中發現當執行 web 命令後 use_lineboise 會被設為 false 也就是會將 linenoise 關閉，那這個會如何影響 web 呢? 先回去看一下 linenoise。 #### linenoise.c `linenoise.c/linenoise` 這個函式主要用來檢查終端的基本能力。首先會透過 `isatty` 檢查是否連上 terminal ，因為每個 terminal 都會有一個唯一的 device file 儲存在 dev/ 下，也可透過 [`tty`](https://support.microsoft.com/zh-tw/office/%E4%BB%80%E9%BA%BC%E6%98%AF-tty-2c46b431-6681-43e6-b8a5-cf3886e3a53d) 命令將當前終端的名稱顯示出來。如果 terminal 支援但無法解析輸入命令就會執行 `line_raw` 切換到 *raw mode* 讀取命令，使用者不管有沒有 enter 命令都會被完整 `read`。 >[Terminal mode](https://en.wikipedia.org/wiki/Terminal_mode) 假設使用者在 terminal 中輸入 “ABC<Backspace>D” >1. cooked mode ：在資料傳進程式之前會先做預處理，變成 ABD >2. raw mode ：不做任何的特殊字轉換就將其送入程式，變成 ABC<Backspace>D 在 `console.c` 中如果 `linenoise` 被關閉後，只會執行 `cmd_select` 而不會去透過 `linenoise` 處理終端輸入的命令。不過目前還是沒辦法理解作業中提到的「當 read 阻塞時，便無法接收 web 傳來的資訊」，我有去參考 [vax-r](https://hackmd.io/@vax-r/linux2024-homework1#%E6%95%B4%E5%90%88%E7%B6%B2%E9%A0%81%E4%BC%BA%E6%9C%8D%E5%99%A8) 將 `cmd_select` 中的 `char *cmdline = readline()` 將其改成 `linenoise(prompt)` 去處理輸入的命令，但會發現只能在 web 成功完成命令，在終端卻無法同時使用，這裡給的結論是 **read 被阻塞**，不過我覺得不夠清楚理解(TODO:解釋為何無法同時操作)。