2020q1 Homework1 (lab0)

# 2020q1 Homework1 (lab0) contributed by < `fdgkhdkgh` > > [作業要求](https://hackmd.io/@sysprog/linux2020-lab0#-%E4%BD%9C%E6%A5%AD%E8%A6%81%E6%B1%82) ## 開發紀錄 ### queue.h 新增了 tail 以及 q_size 欄位 ```cpp /* Queue structure */ typedef struct { list_ele_t *head; /* Linked list of elements */ list_ele_t *tail; /* the tail of the queue */ int q_size; } queue_t; ``` ### queue.c #### q_new 給 q 一塊記憶體，假如 malloc 失敗就回傳 NULL ， malloc 成功就初始化 q 的結構 ```cpp /* * Create empty queue. * Return NULL if could not allocate space. */ queue_t *q_new() { queue_t *q = malloc(sizeof(queue_t)); if (q == NULL) { return NULL; } q->head = NULL; q->tail = NULL; q->q_size = 0; return q; } ``` #### q_free 假如 q 本身沒東西，就直接跳出 function，有東西的話，就把整個 queue 的 value 以及 list_ele_t 給 free 掉。 ```cpp /* Free all storage used by queue */ void q_free(queue_t *q) { if (q == NULL) { return; } list_ele_t *now = q->head; if (now == NULL) { } else { while (now) { list_ele_t *next = now->next; free(now->value); free(now); now = next; } } /* Free queue structure */ free(q); } ``` #### q_insert_head 假如 q 本身是 NULL ，回傳 false 。不是 NULL 的話，就 malloc 出一個 list_ele_t 接在這個 linked list 的開端。假如是第一個 list_ele_t 的話，需要同時初始化 q->head 以及 q->tail 每次 malloc 都要檢查 malloc 有沒有發生錯誤。假如 malloc 發生錯誤，要把這個 function 內已經 malloc 出來的記憶體給 free 掉，不然會 memory leak。每當 list_ele_t 有變動時，需要維護 q->q_size 每當頭端有變動時，要維護 q->head ```cpp /* * Attempt to insert element at head of queue. * Return true if successful. * Return false if q is NULL or could not allocate space. * Argument s points to the string to be stored. * The function must explicitly allocate space and copy the string into it. */ bool q_insert_head(queue_t *q, char *s) { list_ele_t *newh; if (q == NULL) { return false; } newh = malloc(sizeof(list_ele_t)); /* Don't forget to allocate space for the string and copy it */ /* What if either call to malloc returns NULL? */ if (newh == NULL) { return false; } newh->value = (char *) malloc(sizeof(char) * strlen(s) + 1); if (newh->value == NULL) { free(newh); return false; } newh->next = NULL; strncpy(newh->value, s, strlen(s)); newh->value[strlen(s)] = 0; if (q->head == NULL) { q->tail = newh; } newh->next = q->head; q->head = newh; q->q_size++; return true; } ``` #### q_insert_tail 假如 q 本身是 NULL ，回傳 false 。不是 NULL 的話，就 malloc 出一個 list_ele_t 接在這個 linked list 的尾端。假如是第一個 list_ele_t 的話，需要同時初始化 q->head 以及 q->tail 每次 malloc 都要檢查 malloc 有沒有發生錯誤。假如 malloc 發生錯誤，要把這個 function 內已經 malloc 出來的記憶體給 free 掉，不然會 memory leak 。每當 list_ele_t 有變動時，需要維護 q->q_size 每當尾端有變動時，要維護 q->tail ```cpp /* * Attempt to insert element at tail of queue. * Return true if successful. * Return false if q is NULL or could not allocate space. * Argument s points to the string to be stored. * The function must explicitly allocate space and copy the string into it. */ bool q_insert_tail(queue_t *q, char *s) { list_ele_t *newt; int charlen = strlen(s); if (q == NULL) { return false; } newt = malloc(sizeof(list_ele_t)); /* Don't forget to allocate space for the string and copy it */ /* What if either call to malloc returns NULL? */ if (newt == NULL) { return false; } newt->value = (char *) malloc(sizeof(char) * charlen + 1); if (newt->value == NULL) { free(newt); return false; } newt->next = NULL; strncpy(newt->value, s, charlen); newt->value[charlen] = 0; if (q->tail == NULL) { q->head = newt; q->tail = newt; } else { q->tail->next = newt; q->tail = newt; } q->q_size++; return true; } ``` #### q_remove_head 假如 q 本身是 NULL ，回傳 false 。不是 NULL 的話，就拿掉頭端的一個 list_ele_t 結構。假如 bufsize != 0 的話，需要把刪除的 value ，複製到 sp 裡。每當 list_ele_t 有變動時，需要維護 q->q_size, q->tail, q->head ```cpp /* * Attempt to remove element from head of queue. * Return true if successful. * Return false if queue is NULL or empty. * If sp is non-NULL and an element is removed, copy the removed string to *sp * (up to a maximum of bufsize-1 characters, plus a null terminator.) * The space used by the list element and the string should be freed. */ bool q_remove_head(queue_t *q, char *sp, size_t bufsize) { if (q == NULL || q->head == NULL) { return false; } if (sp != NULL && bufsize != 0) { strncpy(sp, q->head->value, bufsize - 1); sp[bufsize - 1] = 0; } list_ele_t *deleted; deleted = q->head; q->head = q->head->next; free(deleted->value); free(deleted); if (q->head == NULL) { q->tail = NULL; } q->q_size--; return true; } ``` #### q_size 直接回傳 q->size 。問題是，這一段常常被 dudect 視為非常數時間的實作。 ```cpp /* * Return number of elements in queue. * Return 0 if q is NULL or empty */ int q_size(queue_t *q) { if (q == NULL || q->head == NULL) { return 0; } return q->q_size; } ``` #### q_reverse 用三個指標遍尋一次 linked list ，就能夠將 linked list 倒轉記得要維護 q->head, q->tail ```cpp /* * Reverse elements in queue * No effect if q is NULL or empty * This function should not allocate or free any list elements * (e.g., by calling q_insert_head, q_insert_tail, or q_remove_head). * It should rearrange the existing ones. */ void q_reverse(queue_t *q) { if (q == NULL || q->head == NULL || q->head->next == NULL) { return; } list_ele_t *last = NULL; list_ele_t *now = q->head; list_ele_t *next = now->next; q->tail = q->head; while (now) { now->next = last; last = now; now = next; if (next != NULL) { next = next->next; } } q->head = last; } ``` #### q_sort 一開始使用了簡單的 insertion sort ，但後來發覺會超出規定的時間（因為時間複雜度為 $O(n^2)$）於是實作簡單的 merge sort ，使用一個陣列來存取所有 `list_ele_t *` (是存取指標，而非整個結構體)，成功降低時間複雜度，通過了 "trace-16-perf" ，變成沒辦法通過大測資 "trace-15-perf" 最後省略了陣列，成功通過所有測資。但不得不說，這個程式碼有待改善。 ```cpp /* * Sort elements of queue in ascending order * No effect if q is NULL or empty. In addition, if q has only one * element, do nothing. */ list_ele_t *merge(list_ele_t *lefthead, list_ele_t *righthead, int leftsize, int rightsize) { int headsize = 0; list_ele_t *temphead = NULL; list_ele_t *now = NULL; while (leftsize != 0 && rightsize != 0) { if (strcmp(lefthead->value, righthead->value) <= 0) { if (headsize == 0) { temphead = lefthead; lefthead = lefthead->next; temphead->next = NULL; now = temphead; } else { now->next = lefthead; lefthead = lefthead->next; now = now->next; now->next = NULL; } leftsize--; } else { if (headsize == 0) { temphead = righthead; righthead = righthead->next; temphead->next = NULL; now = temphead; } else { now->next = righthead; righthead = righthead->next; now = now->next; now->next = NULL; } rightsize--; } headsize++; } while (leftsize != 0) { if (headsize == 0) { temphead = lefthead; lefthead = lefthead->next; temphead->next = NULL; now = temphead; } else { now->next = lefthead; lefthead = lefthead->next; now = now->next; now->next = NULL; } leftsize--; headsize++; } while (rightsize != 0) { if (headsize == 0) { temphead = righthead; righthead = righthead->next; temphead->next = NULL; now = temphead; } else { now->next = righthead; righthead = righthead->next; now = now->next; now->next = NULL; } rightsize--; headsize++; } return temphead; } list_ele_t *mergeSort(list_ele_t *lefthead, list_ele_t *righthead, int leftsize, int rightsize) { if (leftsize == 0 || rightsize == 0) { return NULL; } int i; int lhalf_qsize = leftsize / 2; list_ele_t *llhead = lefthead; list_ele_t *lrhead = lefthead; int llsize = lhalf_qsize; int lrsize = leftsize - llsize; for (i = 0; i < lhalf_qsize; i++) { lrhead = lrhead->next; } llhead = mergeSort(llhead, lrhead, llsize, lrsize); if (llsize == 0 && lrsize == 1) { llhead = lrhead; llhead->next = NULL; } int rhalf_qsize = rightsize / 2; list_ele_t *rlhead = righthead; list_ele_t *rrhead = righthead; int rlsize = rhalf_qsize; int rrsize = rightsize - rhalf_qsize; for (i = 0; i < rhalf_qsize; i++) { rrhead = rrhead->next; } rlhead = mergeSort(rlhead, rrhead, rlsize, rrsize); if (rlsize == 0 && rrsize == 1) { rlhead = rrhead; rlhead->next = NULL; } return merge(llhead, rlhead, leftsize, rightsize); } void q_sort(queue_t *q) { if (q == NULL || q->head == NULL || q->head->next == NULL) { return; } int half_qsize = q->q_size / 2; int i = 0; list_ele_t *lefthead = q->head; list_ele_t *righthead = q->head; int leftsize = q->q_size / 2; int rightsize = q->q_size - leftsize; for (i = 0; i < half_qsize; i++) { righthead = righthead->next; } q->head = mergeSort(lefthead, righthead, leftsize, rightsize); } ``` ### 結果 ``` +++ TESTING trace trace-17-complexity: # Test if q_insert_tail and q_size is constant time complexity Probably constant time Probably constant time --- trace-17-complexity 5/5 --- TOTAL 100/100 ``` :::danger 文字訊息不要用圖片去展現 :notes: jserv ::: ### 待解疑問 Q:在 q_insert_tail 會用到的字串複製 function ，算不算是時間複雜度 O(1) ？不是的話，為什麼 dudect 仍然認為是時間複雜度 O(1) 的 function 呢？ Guess : 我猜是因為測資使用的字串長度變化不大 [1] Q:dudect 檢查出來的結果不穩定，有時候會 pass 有時候不會，為什麼？有方法可以讓它更穩定嗎？ Guess : 目前我讓它更穩定的方法是把電腦其他的 process 砍一砍，會變得比較穩定一點。或是我本身 function 就寫錯了（可能實作出來的 function 時間複雜度不是 O(1) ）。 ### 使用 Valgrind 及 Massif 檢查記憶體錯誤 #### valgrind ``` make valgrind ``` 沒發生記憶體錯誤 #### valgrind & massif ##### 安裝 massif-visualizer [massif-visualizer的github](https://github.com/KDE/massif-visualizer) ```shell= git clone https://github.com/KDE/massif-visualizer.git cd massif-visualizer mkdir build cd build cmake .. make sudo make install ``` 我所使用的作業系統是 ubuntu 18.04 ，假如 `cmake` 遇到錯誤的話，例如： ```shell -- Could NOT find KF5Archive (missing: KF5Archive_DIR) -- Could NOT find KF5Archive: found neither KF5ArchiveConfig.cmake nor kf5archive-config.cmake -- Could NOT find KF5Config (missing: KF5Config_DIR) -- Could NOT find KF5Config: found neither KF5ConfigConfig.cmake nor kf5config-config.cmake ``` 可以使用 apt-file 去尋找遺失的檔案在哪個套件，像是上面的錯誤就可以 ```shell $ apt-file search KF5ArchiveConfig.cmake ``` 找出這個檔案需要安裝套件 ```shell libkf5archive-dev: /usr/lib/x86_64-linux-gnu/cmake/KF5Archive/KF5ArchiveConfig.cmake ``` 最後用 ```shell $ sudo apt install libkf5archive-dev ``` 就可以解決問題。假如遺失的檔案很多，就反覆做幾次上面的動作就可以了。最後能得到執行時期，記憶體使用情況的精美圖表。 ```shell $ massif-visualizer ./massif.out.4880 ``` ![](https://i.imgur.com/BJr1BOs.png) ### 使用 Address Sanitizer 檢查記憶體錯誤 ```shell $ make clean $ make SANITIZER=1 $ make test ``` 經過幾個小時的奮戰，終於 debug 完了(暈)，目前利用 Address Sanitizer 抓到了兩個記憶體溢出。 1. simulation 的 type 2. const size_t number_measurements --- 在執行 trace-17-complexity 時得到了一個 report : ``` ================================================================= ==27107==ERROR: AddressSanitizer: global-buffer-overflow on address 0x55cbc265c9c0 at pc 0x55cbc244d81c bp 0x7ffc54aaa4b0 sp 0x7ffc54aaa4a0 READ of size 4 at 0x55cbc265c9c0 thread T0 #0 0x55cbc244d81b in do_option_cmd /home/tommy/linux_internal/lab0-c/console.c:368 #1 0x55cbc244c438 in interpret_cmda /home/tommy/linux_internal/lab0-c/console.c:220 #2 0x55cbc244ce3c in interpret_cmd /home/tommy/linux_internal/lab0-c/console.c:243 #3 0x55cbc244db25 in cmd_select /home/tommy/linux_internal/lab0-c/console.c:571 #4 0x55cbc244df42 in run_console /home/tommy/linux_internal/lab0-c/console.c:630 #5 0x55cbc244b05b in main /home/tommy/linux_internal/lab0-c/qtest.c:770 #6 0x7f0c5c113b96 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21b96) #7 0x55cbc24487c9 in _start (/home/tommy/linux_internal/lab0-c/qtest+0x67c9) 0x55cbc265c9c1 is located 0 bytes to the right of global variable 'simulation' defined in 'console.c:20:6' (0x55cbc265c9c0) of size 1 'simulation' is ascii string '' SUMMARY: AddressSanitizer: global-buffer-overflow /home/tommy/linux_internal/lab0-c/console.c:368 in do_option_cmd Shadow bytes around the buggy address: 0x0ab9f84c38e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x0ab9f84c38f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x0ab9f84c3900: 00 00 00 00 00 00 00 00 00 00 00 00 f9 f9 f9 f9 0x0ab9f84c3910: 00 f9 f9 f9 f9 f9 f9 f9 00 f9 f9 f9 f9 f9 f9 f9 0x0ab9f84c3920: 00 f9 f9 f9 f9 f9 f9 f9 00 f9 f9 f9 f9 f9 f9 f9 =>0x0ab9f84c3930: 00 f9 f9 f9 f9 f9 f9 f9[01]f9 f9 f9 f9 f9 f9 f9 0x0ab9f84c3940: 00 00 00 00 01 f9 f9 f9 f9 f9 f9 f9 04 f9 f9 f9 0x0ab9f84c3950: f9 f9 f9 f9 00 00 00 00 00 00 00 00 00 00 00 00 0x0ab9f84c3960: 00 00 00 00 00 00 00 00 00 00 00 00 00 f9 f9 f9 0x0ab9f84c3970: f9 f9 f9 f9 01 f9 f9 f9 f9 f9 f9 f9 01 f9 f9 f9 0x0ab9f84c3980: f9 f9 f9 f9 04 f9 f9 f9 f9 f9 f9 f9 00 f9 f9 f9 Shadow byte legend (one shadow byte represents 8 application bytes): Addressable: 00 Partially addressable: 01 02 03 04 05 06 07 Heap left redzone: fa Freed heap region: fd Stack left redzone: f1 Stack mid redzone: f2 Stack right redzone: f3 Stack after return: f5 Stack use after scope: f8 Global redzone: f9 Global init order: f6 Poisoned by user: f7 Container overflow: fc Array cookie: ac Intra object redzone: bb ASan internal: fe Left alloca redzone: ca Right alloca redzone: cb ``` 經過 report 給予的線索，我開始看 simulation 以及 console.c:368 附近的程式碼。可以發現，雖然全域變數 simulation 在宣告的時候，型態是 bool ，可是在使用的時候卻會被當作 int 來使用。 ```cpp add_param("simulation", (int *) &simulation, "Start/Stop simulation mode", NULL); ``` 於是當我們輸入命令： ``` cmd> option simulation 1111 ``` 的時候，就有機會覆蓋到鄰近 simulation 的全域變數。因為型態 bool 的大小為 1byte ，可是 int 的大小為 4bytes 。 ![](https://i.imgur.com/EDhnF45.png) 情況可能會像附圖這樣，發生記憶體溢出。但是通常會因為 memory 對齊的關係，雖然 simulation 是 bool ，但也會分配 4bytes 給它，這時候就不會有資訊安全的疑慮。例如我們可以 ```cpp $ objdump -D -M intel ./qtest > obj ``` 去觀察 simulation 被編譯器分配到的記憶體大小。 ``` 0000000000209500 <simulation>: 209500: 00 00 add BYTE PTR [rax],al ... 0000000000209504 <quit_helper_cnt>: ``` 從這裡可以看出 simulation 離下一個全域變數的位移為4bytes。也可以使用 gdb 來看 simulation 附近有哪些全域變數： ```shell $ gdb ./qtest gdb-peda$ b main gdb-peda$ r gdb-peda$ p &simulation $1 = (int *) 0x55555575d500 <simulation> gdb-peda$ x/wx 0x55555575d500 0x55555575d500 <simulation>: 0x00000000 0x55555575d504 <quit_helper_cnt>: 0x00000000 ``` 解決方法：將 simulation 的型態改為 int --- 修改好 simulation 的問題後，再執行一次 ```./qtest -f ./traces/trace-17-complexity.cmd``` ，又得到了一個 report 。 ``` ================================================================= ==32532==ERROR: AddressSanitizer: global-buffer-overflow on address 0x557218d6f060 at pc 0x7f08f4e4866e bp 0x7ffc4dd23300 sp 0x7ffc4dd22aa8 READ of size 1 at 0x557218d6f060 thread T0 #0 0x7f08f4e4866d (/usr/lib/x86_64-linux-gnu/libasan.so.4+0x5166d) #1 0x557218b60778 in q_insert_head /home/tommy/linux_internal/lab0-c/queue.c:93 #2 0x557218b6169b in measure dudect/constant.c:75 #3 0x557218b619e4 in doit dudect/fixture.c:136 #4 0x557218b61d1f in is_insert_tail_const dudect/fixture.c:168 #5 0x557218b5bde5 in do_insert_tail /home/tommy/linux_internal/lab0-c/qtest.c:259 #6 0x557218b5e467 in interpret_cmda /home/tommy/linux_internal/lab0-c/console.c:220 #7 0x557218b5ee6b in interpret_cmd /home/tommy/linux_internal/lab0-c/console.c:243 #8 0x557218b5fb54 in cmd_select /home/tommy/linux_internal/lab0-c/console.c:571 #9 0x557218b5ff71 in run_console /home/tommy/linux_internal/lab0-c/console.c:630 #10 0x557218b5d08a in main /home/tommy/linux_internal/lab0-c/qtest.c:770 #11 0x7f08f4689b96 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21b96) #12 0x557218b5a7c9 in _start (/home/tommy/linux_internal/lab0-c/qtest+0x67c9) 0x557218d6f060 is located 32 bytes to the left of global variable 'q' defined in 'dudect/constant.c:22:17' (0x557218d6f080) of size 8 0x557218d6f060 is located 0 bytes to the right of global variable 'random_string' defined in 'dudect/constant.c:23:13' (0x557218d6ed40) of size 800 SUMMARY: AddressSanitizer: global-buffer-overflow (/usr/lib/x86_64-linux-gnu/libasan.so.4+0x5166d) Shadow bytes around the buggy address: 0x0aaec31a5db0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x0aaec31a5dc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x0aaec31a5dd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x0aaec31a5de0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x0aaec31a5df0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 =>0x0aaec31a5e00: 00 00 00 00 00 00 00 00 00 00 00 00[f9]f9 f9 f9 0x0aaec31a5e10: 00 f9 f9 f9 f9 f9 f9 f9 00 00 00 00 00 f9 f9 f9 0x0aaec31a5e20: f9 f9 f9 f9 00 00 00 00 00 00 00 00 00 00 00 00 0x0aaec31a5e30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x0aaec31a5e40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x0aaec31a5e50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 Shadow byte legend (one shadow byte represents 8 application bytes): Addressable: 00 Partially addressable: 01 02 03 04 05 06 07 Heap left redzone: fa Freed heap region: fd Stack left redzone: f1 Stack mid redzone: f2 Stack right redzone: f3 Stack after return: f5 Stack use after scope: f8 Global redzone: f9 Global init order: f6 Poisoned by user: f7 Container overflow: fc Array cookie: ac Intra object redzone: bb ASan internal: fe Left alloca redzone: ca Right alloca redzone: cb ==32532==ABORTING ``` 這個問題我找了很久，最後發現是 ```dudect/constant.c``` 裡的常數 number_measurements 發生問題。在 get_random_string 裡，會取用 random_string 這個全域變數。而 random_string 是在 prepare_inputs 裡先準備好的字串陣列。每當我們呼叫 get_random_string 一次，就會拿出 random_string 這個字串陣列裡的下一個字串。 ``` static char random_string[100][8]; ``` 從變數的宣告可以看出，這個字串陣列總共可以儲存 100 個字串。然而 number_measurements 則被設定為 150 。此時在 get_random_string 裡 ``` random_string_iter = (random_string_iter + 1) % number_measurements; ``` 只要 random_string_iter 這個 int 大於 99 ，就會發生越界存取。例如當 random_string_iter 等於 100 時，就會嘗試存取 random_string_iter[100] ，也就是存取第 101 個字串，發生越界存取。解決方法：將 random_string[100][8] 改為 random_string[150][8]，或將 number_measurements 改為 100。 ## 論文〈dude, is my code constant time?〉研讀 ### Introduction ### APPROACH : TIMING LEAKAGE DETECTION #### A. Step 1 : Measure execution time ##### a)Classes definition Leakage detection 技術是利用對兩種不同類別的輸入進行比較。其中一種知名的方式就是 fixed-vs-random tests ("A testing methodology for side channel resistance validation.", "Test Vector LeakageAssessment (TVLA) methodology in practice.")， fix 類別的輸入是一個常數值，而 random 類別的輸入在每次的測量（ measurement ）都會隨機改變。 ##### b) Cycle counters 現代的 CPU 通常會提供 cycle counters ##### c) Environmental conditions 為了不影響時間的計算， fix 以及 random 類別的輸入都是在 measurement 前就要先準備好的。 #### B. step 2: Apply post-processing 實際上，在真正執行統計分析( statistical analysis )前會需要對每一個 measurement 做一些後處理。 ##### a) Cropping 去掉不合理的數值。因為 OS 可能會對測試程式執行時把它 interrupt 掉，導致執行時間異常地長。 :::warning 列出可能的因素: (後續作業也有探討) 1. 多處理器的排程; (注意 fibdrv 作業的 `taskset` 使用) 2. 在 userspace 可做到 preemptive scheduling，但 kernel 模式中，不見得有 preemption; 3. 中斷處理會有必要的開銷和影響; 請繼續探討。 :notes: jserv ::: ##### b) Higher-order preprocessing 不懂意義，可能需要參考論文 Towards sound approaches to counteract power-analysisattacks.(DPA), Leakage assessment methodology. 後會更加理解 #### C. step 3: Apply statistical test 開始對進行後處理( post-processing )過的資料進行假說檢定。零假說為" fix 輸入集以及 random 輸入集兩者的時間分佈是相同的"。 ##### a) t-test 學生 T 檢驗中的「雙樣本檢驗」，因為母體變異數不同，稱為 Welch 檢驗[1]。此外，又因為我們是希望比較兩個種類資料的趨勢是否相同，所以會是使用雙尾檢定[6] ##### b) Non-parametric tests 不懂，可能需要看 Test as a Competitorto Mutual Information Analysis. 才會比較懂。 ### RESULTS #### 實作得出的結論相同的程式碼，可能會因為不同的開發工具 (toolchain)、處理器、處理器架構、以及其他參數而有了不同的行為。例如相同的程式碼，在 A 處理器可能是常數執行時間，在 B 處理器可能變成線性執行時間（進而產生side-channel attack 的疑慮）。 --- ### 一些名詞： ### statistical test 應是假設檢定 (statistical hypothesis testing) 的意思。假設檢定有以下步驟（大致上）： 1. 提出零假設( null hypothesis )，以及對立假設( alternative hypothesis ) 2. 決定哪個檢測是合適的，並選擇檢驗統計量( Test Statistic ) T ，常用的檢定統計量有 Z（使用常態分配），以及這篇論文所使用的 t （使用 T 分配）。母體變異數已知：使用常態分配母體變異數未知，樣本數夠大時，使用變化後的常態分配[2] 母體變異數未知，樣本數不大時，使用 T 分配 3. 選擇一個顯著水平 a ，若低於這個機率閥值，就會拒絕零假設（最常用 5% 或是 1%） 4. 根據在零假設成立時的檢驗統計量，找到數值最接近對立假設，且機率為顯著性水平 ( α )的區域，此區域稱為「拒絕域」，意思是在零假設成立的前提下，落在拒絕域的機率只有 α 5. 針對檢驗統計量 T ，根據收集到的樣本（在這篇論文的樣本就是程式執行時間，屬於”連續型隨機變數“ ）計算其估計值 $t_{obs}$。 6. 若估計值 $t_{obs}$ 在拒絕域以外，則暫時“不拒絕”零假設，若估計值落在拒絕域，則拒絕零假設。我的看法： null hypothesis 並不是通過一次假設檢定（其中一次試驗中計算出的估計量）就能夠“確定的”，而是有可能在一次的試驗中”不被拒絕“。在下一次搜集更多樣本，並重新計算估計量後，可能就能夠”拒絕“零假設。 --- 完整假說檢定應用的簡單範例可參考[2] ### null hypothesis（零假設）這裡的零假設為 "the two timing distributions are equal"。 ### leakage detection paper : Statistics and secret leakage. Differential power analysis(DPA) ### student's t-distribution （學生 t 分佈）在不知道母體平均數時，利用抽樣的資訊來推論出近似常態分配的分佈。自由度會影響學生 t 分佈的圖型。自由度無限大的學生 t 分佈便是常態分佈。 ### student's t-test （學生t檢驗）利用學生 t 分佈進行假設檢定。 ### Welch's t-test (Welch's 檢驗) 兩組資料變異數不同時的 t-test 。 ### test vector 測試向量，作為程式的輸入。 ### Welford's online algorithm[9] --- ### Simulation是如何實作 dudect 的概念很感謝 colinyoyo26 同學的共筆。 qtest.c ```cpp static bool do_insert_tail(int argc, char *argv[]) { if (simulation) { if (argc != 1) { report(1, "%s does not need arguments in simulation mode", argv[0]); return false; } bool ok = is_insert_tail_const(); if (!ok) { report(1, "ERROR: Probably not constant time"); return false; } report(1, "Probably constant time"); return ok; } ``` qtest.c ```cpp static bool do_size(int argc, char *argv[]) { if (simulation) { if (argc != 1) { report(1, "%s does not need arguments in simulation mode", argv[0]); return false; } bool ok = is_size_const(); if (!ok) { report(1, "ERROR: Probably not constant time"); return false; } report(1, "Probably constant time"); return ok; } ``` 當變數 Simulation 不為 0 時，就會開始進行 dudect 的測試 (`is_size_const`, `is_insert_tail_const`)，來試驗這個 function 是否為常數執行時間。 dudect/fixture.c: ```cpp bool is_size_const(void) { bool result = false; t = malloc(sizeof(t_ctx)); for (int cnt = 0; cnt < test_tries; ++cnt) { printf("Testing size...(%d/%d)\n\n", cnt, test_tries); //初始化測試會用到的結構 queue_t *q 以及 t_ctx *t init_once(); for (int i = 0; i < enough_measurements / (number_measurements - drop_size * 2) + 1; ++i) //正式開始測試 result = doit(1); printf("\033[A\033[2K\033[A\033[2K"); if (result == true) break; } free(t); return result; } ``` 這裡給了我們 10 次機會證明自己是常數執行時間 (test_tries == 10) dudect/fixture.c:doit ```cpp static bool doit(int mode) { int64_t *before_ticks = calloc(number_measurements + 1, sizeof(int64_t)); int64_t *after_ticks = calloc(number_measurements + 1, sizeof(int64_t)); int64_t *exec_times = calloc(number_measurements, sizeof(int64_t)); uint8_t *classes = calloc(number_measurements, sizeof(uint8_t)); uint8_t *input_data = calloc(number_measurements * chunk_size, sizeof(uint8_t)); //沒有成功分配記憶體 if (!before_ticks || !after_ticks || !exec_times || !classes || !input_data) { die(); } prepare_inputs(input_data, classes); measure(before_ticks, after_ticks, input_data, mode); differentiate(exec_times, before_ticks, after_ticks); update_statistics(exec_times, classes); bool ret = report(); ``` prepare_input: 在這裡準備幾個值```input_data, classes, random_string``` input_data 決定了在執行 q_new 以及 q_insert_tail 前要插入幾個 queue_t。 classes 便是決定是fix 或是 random ，假如是 fix(class == 0) 的話，該次的input_data = 0x00(在執行 q_new 或 q_insert_tail 前不會插入任何的queue_t 結構) measure : 實際執行帶測程式，記錄下執行時間 differentiate : 算出每一次執行的時間 update_statistics : 更新 t_ctx 結構 ( Welford's online algorithm[9] ) report : 算出檢驗統計量t[1]，並查看是否落在拒絕域( t > t_threshold_bananas 就是太誇張了，直接踢掉， t > t_threshold_moderate 就是落在臨界點附近的拒絕域) 假如落在拒絕域，則拒絕零假設，表示 fix class 的執行時間分布與 random class 的執行時間分布不同，也就表示執行時間並非常數。假若沒有落在拒絕域，則暫時不拒絕零假設，表示暫時不拒絕 fix class 的時間分布與 random class 相同，也就表示執行時間為常數。 --- ### 待解問題 Q: 沒看到此次假說檢定所使用的顯著性水平 ( α )是使用什麼值？ GUESS: 論文上有提到，在[這裏](https://github.com/oreparaz/dudect/blob/master/src/fixture.c#L57)，但我看不懂。 Q: 為什麼會是 t-value 超過什麼值就算拒絕，而 t-value 沒有小於什麼值呢？這樣就不會是雙尾檢定，而是單尾檢定了？ Q: 沒有看到計算臨界點的地方，我猜論文使用 4.5~5 是為了節省這方面的運算，求一個大致上的值？ P.S. 臨界點是 t_threshold_bananas 以及 t_threshold_moderate Q: 為什麼要測試10( test_tries )次？ :::warning `test_tries` 是個妥協，但不見得合理，你應該繼續探討。 :notes: jserv ::: Q: `enough_measurements / (number_measurements - drop_size * 2) + 1` 為什麼要執行這個次數？ ## select 系統呼叫在這裡我會整理一些我閱讀的資源，以及在閱讀過程中所碰到的問題。 ### RIO（Robust I/O）套件是為了能夠 "thread safe"，以及提升效能，也解決了“不足值”的問題（這讓 rio_readn 以及 rio_writen 可以交替使用），而在原生的 Unix I/O function 外多加了一層包裝。 RIO 提供了兩種類型的 function : 1. Unbuffered input and output functions 相較於原本的 read, write 的優點：現在假想一個情況，自己已經開了一塊 1MB 的記憶體，希望可以從硬碟中利用檔案系統讀取 1MB 的檔案。但是系統核心的緩衝區沒有那個大，所以一定會需要很多次的 read(int fd, char *buf, size_t len) ，並在每次 read 後自行計算還不足多少，還需要讀多少才夠，也就是需要自行計算不足值。而 rio_readn 則幫我們處理了這個問題。此外 rio_readn 也支援被訊號中斷後的處理 (EINTR 附近的程式碼)。 2. Buffered input functions 相較於原本的 read, write 的優點：解決了 thread safe 的問題。 #### Unbuffered input and output functions ```cpp ssize_t rio_readn(int fd, void *userbuf, size_t n); ssize_t rio_writen(int fd, void *userbuf, size_t n); ``` #### buffered input and output functions ```cpp void rio_readinitb(rio_t *rp, int fd); ssize_t rio_readlineb(rio_t *rp, void *userbuf, size_t maxlen); ssize_t rio_readnb(rio_t *rp, void *userbuf, size_t n); ssize_t rio_read(rio_t *rp, char *userbuf, size_t n); ``` rio_readnb 以及 rio_readlineb 都是基於 rio_read ， rio_read 會在每次讀取時，檢查自己的 buffer 是否還有未讀取的字元。當自己的 buffer 沒有剩餘的字元後 (rp->rio_cnt <= 0) 才會使用系統呼叫。這樣的做法可以”節省系統呼叫“（每次系統呼叫都會試著拿滿 rp->rio_buf 大小的字串）。 ### 分析 console.c 的實作很有趣的地方是，會把所有輸入的檔案丟進一個 stack 裡，並且在想要取用的時候再用 pop_file 取出。然而這裡只會使用到最多一個 file ，所以我猜後續會需要修改成能夠輸入多個 file 。 [2] -- 目前進度：現在真的會跑到 select 那一行嗎？ add_cmd 的方式？ ### 疑問 Q: 什麼是 thread safe ? 要怎麼做到 thread safe ? G: 可以參考 CS:APP(Section 12.7.1) Q: 為什麼 rio 套件可以幫助達到thread safe ? G: 我猜是因為每個 thread 都會維護自己的一份 rio_t 結構？ [3] :::warning 猜測後要驗證，實驗呢？ :notes: jserv ::: 需要做的事： - [ ] 分析 console.c的實作 [進行中] - [x] 閱讀[RIO套件](http://csapp.cs.cmu.edu/2e/ch10-preview.pdf) [完成] - [x] 閱讀CS:APP 第10章 [完成] - [ ] 比照 qtest 實作，撰寫獨立的終端機命令解譯器 - [ ] [CS:APP 第 10 章重點提示](https://hackmd.io/@sysprog/H1TtmVTTz) [進行中] - [x] 實驗[1] dudect[完成] - [x] 實驗[2] qtest[完成] - [x] 實驗[3] rio-package and thread-safe[完成] - [ ] 繼續探討會被 dudect cropping 掉的可能原因 - [ ] 繼續探討 test_tries 為什麼為 10 次的原因 ## 相關實驗 ### 實驗[1] 實驗目標：查看字串長度的變化，會不會對 lab0-c 中與 dudect 相關的實驗帶來影響。實驗設計：讓字串長度在 1, 999 間變動，查看經過 dudect 的驗證後，是否依舊是常數時間。更改的程式碼只在 dudect/constant.c:prepare_input ```cpp void prepare_inputs(uint8_t *input_data, uint8_t *classes) { srand(time(NULL)); randombytes(input_data, number_measurements * chunk_size); for (size_t i = 0; i < number_measurements; i++) { classes[i] = randombit(); if (classes[i] == 0) *(uint16_t *) (input_data + i * chunk_size) = 0x00; } for (size_t i = 0; i < 100; ++i) { /* Generate random string */ randombytes((uint8_t *) random_string[i], 1000-1); //因為 randombytes 假如 random 到 0 的話也會把字串截斷 // 而我們現在希望可以控制字串長度，所以把 0 用可視字元取代 for(int j = 0;j < 1000;j++) { if (random_string[i][j] == 0) { random_string[i][j] = rand()%95+32; } } //隨機挑選長度 0 或是 999 的字串 if (i % 2 == 0) { random_string[i][1] = 0; } else { random_string[i][1000-1] = 0; } } } ``` 結果： dudect 的測試中，對於 q_insert_tail 的測試結果從 constant time 變得不再是 constant time ### 實驗[2] 實驗目標：檢驗 push_file, pop_file 的設計是否是能輕易支援多檔案輸入。實驗設計：將原先 console.c:run_console 只能 push 一個檔案，改為會 push 多個檔案，查看需不需要更改其他程式碼，以完成能輸入多個檔案的功能。更改的程式碼只在 console.c:run_console ```cpp bool run_console(char *infile_name) { if (!push_file(infile_name)) { report(1, "ERROR: Could not open source file '%s'", infile_name); return false; } //把原本的檔案再 push 一次 if (!push_file(infile_name)) { report(1, "ERROR: Could not open source file '%s'", infile_name); return false; } while (!cmd_done()) cmd_select(0, NULL, NULL, NULL, NULL); return err_cnt == 0; } ``` 結果：在不用更改其他程式碼的情況下，同一個檔案的指令被執行了兩次，可知道這樣可以很容易地擴充成輸入多個檔案。 ### 實驗[3] 實驗目的：探討 RIO 為什麼是 thread-safe 實驗設計：嘗試舉一個例子，使得使用 unix I/O 會導致 thread-unsafe ，但使用 rio 能夠 thread-safe 由 CS:APP 3rd edition section 10.5 尾端的討論 "Origins of the RIO package" ，可以得知比較的對象為 W.Richard Stevens 在 Unix network programming 裡提出的 readline, readn, writen (詳情可見 Unix network programming, Volume 1, 3rd, section 3.9)。事實上在 Unix network programming, Volume 1, 3rd, section 11.18 以及 section 26.5 也有提出 thread safe 的實作。只不過 rio library 是透過每一個 rio_t 結構都有自己的 buffer 來達到 thread safe。而 UNP 則是利用 TLS (Thread-Local Storage) 來達到 Thread safe 以及 re-entrant 。 rio library 除了解決了 thread-safe 的問題，也解決了原本 UNP 的 readn, 以及 readline 沒辦法對同一個 file descriptor 混用的問題（因為 readn 沒有 buffer，而 readline 有，混用的話會導致讀取的資料不連續）。 rio library 的 readnb 以及 readlineb 都有 buffer ，且使用的是同一個 buffer (包含在 rio_t 結構裡)，使得即使混用 readnb 以及 readlineb，所得出的結果也都是連續的。知道比較的對象後，也比較容易舉出範例了。假設檔案 read 的內容全為 'a' ，檔案 read2 全為 'b'。 ```cpp #include <stdio.h> #include <stdlib.h> #include <unistd.h> //Header file for sleep(). man 3 sleep for details. #include <pthread.h> #include <sys/types.h> #include <sys/stat.h> #include <fcntl.h> char shared_buffer[51]; void *myThreadFun(void *vargp) { int fd = open("read2", O_RDONLY); for(int i = 0;i < 100000;i++) { read(fd, shared_buffer, 50); shared_buffer[50] = 0; printf("printf shared_buf in thread : %s\n", shared_buffer); } close(fd); return NULL; } int main() { int fd = open("read", O_RDONLY); pthread_t thread_id; printf("Before Thread\n"); pthread_create(&thread_id, NULL, myThreadFun, NULL); for(int i = 0;i < 100000;i++) { read(fd, shared_buffer, 50); shared_buffer[50] = 0; printf("print shared_buf in main : %s\n", shared_buffer); } close(fd); exit(0); } ``` 向這樣用一個全域的字串去存，兩個 thread 間當然會互相干擾。結果會如下： ``` printf shared_buf in thread : aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa print shared_buf in main : aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa ``` 假如沒有 race condition 的話，thread 跟 main 印出的資料應該要不同才對，而上面則是印出相同的資料。那使用 rio 套件呢？ ```=cpp void *myThreadFun(void *vargp) { int fd = open("read2", O_RDONLY); char buf[51]; rio_t *rioptr = (rio_t *)malloc(sizeof(rio_t)); rioptr->rio_fd = fd; rioptr->rio_cnt = 0; rioptr->rio_bufptr = rioptr->rio_buf; for(int i = 0;i < 100000;i++) { rio_readnb(rioptr, buf, 50); buf[50] = 0; printf("printf shared_buf in thread : %s\n", buf); } free(rioptr); close(fd); return NULL; } int main() { int fd = open("read", O_RDONLY); char buf[51]; pthread_t thread_id; printf("Before Thread\n"); pthread_create(&thread_id, NULL, myThreadFun, NULL); rio_t *rioptr = (rio_t *)malloc(sizeof(rio_t)); rioptr->rio_fd = fd; rioptr->rio_cnt = 0; rioptr->rio_bufptr = rioptr->rio_buf; for(int i = 0;i < 100000;i++) { rio_readnb(rioptr, buf, 50); buf[50] = 0; printf("print shared_buf in main : %s\n", buf); } free(rioptr); close(fd); exit(0); } ``` 也就不會出現這樣的問題了，因為每個 thread 不會使用共用的字串，也就不會相互干擾了。 ## CS:APP 第十章自我檢查清單 ### tty 是什麼？緣起為何？對 UNIX 的影響又是？ ### 為何 socket 也是一種 file type 呢？具體使用方式為何？ ### RIO package 提供 I/O 封裝的目的為何？對應到真實世界的應用為何？ ### Buffered I/O 的動機和效益為何？ ### 在 fork 系統呼叫後，child process 和 parent process 的檔案分享狀況為何？(提示：閱讀 man-pages) ## Reference :::warning 電腦科學相關的詞目應避免參照中文 Wikipedia，改以英文 Wikipedia 為主，避免前者資訊落後於最新進展 :notes: jserv ::: [1][假設檢定，維基百科](https://zh.wikipedia.org/wiki/%E5%81%87%E8%A8%AD%E6%AA%A2%E5%AE%9A) [2][假設檢定，投影片](http://web.ntpu.edu.tw/~wtp/statpdf/Ch_10.pdf) [3][假設檢定，blog](https://mropengate.blogspot.com/2015/03/hypothesis-testing-p-value.html) [4][學生t分配，維基百科](https://zh.wikipedia.org/wiki/%E5%AD%A6%E7%94%9Ft-%E5%88%86%E5%B8%83) [5][學生t檢定，維基百科](https://zh.wikipedia.org/wiki/%E5%AD%B8%E7%94%9Ft%E6%AA%A2%E9%A9%97) [6][Welch's t-test](https://en.wikipedia.org/wiki/Welch's_t-test?fbclid=IwAR2bf8PIpTYhoYcC_rJLeLdko-y7HR0h_GZm2a802EaN6_z8CIVGul4VzTE) [7][Dude, is my code constant time?](https://eprint.iacr.org/2016/1123.pdf) [8][colinyoyo26同學的共筆](https://hackmd.io/@colinyoyo26/2020q1lab0?fbclid=IwAR2ArsIu7MHRB0cGHrg87Pl3OFyeVLzs_sDwSyfu5eJ7J0SFO4Ym7wsQCd4) [9][Algorithms_for_calculating_variance](https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance) [10][統計學:大家都喜歡問的系列-p值是什麼](https://medium.com/@chih.sheng.huang821/%E7%B5%B1%E8%A8%88%E5%AD%B8-%E5%A4%A7%E5%AE%B6%E9%83%BD%E5%96%9C%E6%AD%A1%E5%95%8F%E7%9A%84%E7%B3%BB%E5%88%97-p%E5%80%BC%E6%98%AF%E4%BB%80%E9%BA%BC-2c03dbe8fddf) [11][基礎統計](http://demo1.nkuht.edu.tw/~tient/bs/bsbook.pdf) [12]CS:APP 3rd edition 第十章 [13][CS:APP筆記](https://xxg1413.gitbooks.io/csapp/content/Chapter10/10.4.html) [14][CS:APP CMU 投影片](http://www.cs.cmu.edu/afs/cs/academic/class/15213-f19/www/schedule.html) [15][CS:APP 課程影片](https://www.youtube.com/watch?v=G4z6h_DcV4c&list=PLbY-cFJNzq7z_tQGq-rxtq_n2QQDf5vnM&index=16) [16][CS:APP 第十章重點提示](https://hackmd.io/@sysprog/H1TtmVTTz) [17][CS:APP 第十章筆記](https://blog.csdn.net/zy691357966/article/details/51519608) [18][Wikpedia: Thread safety](https://en.wikipedia.org/wiki/Thread_safety) [19][UNIX NETWORK PROGRAMMING, Volume 1, Third Edition Source code](https://github.com/unpbook/unpv13e) [20]UNIX NETWORK PROGRAMMING, Volume 1, Third Edition, section 3.9, 11.18, 26.15 ###### tags: `linux2020`