2021q1 Homework1 (lab0)

--- tags: linux2021-hw --- # 2021q1 Homework1 (lab0) contributed by < `YoLinTsai` > ## :penguin: 作業要求 - [x] 在 GitHub 上 fork [lab0-c](https://github.com/sysprog21/lab0-c) - [x] 參閱 [Git 教學和 GitHub 設定指引](https://hackmd.io/@sysprog/git-with-github) ^附教學影片^ - [x] ==詳細閱讀 [C Programming Lab](https://www.cs.cmu.edu/afs/cs/academic/class/15213-s20/www/labs/cprogramminglab.pdf)== ，依據指示著手修改 `queue.[ch]` 和連帶的檔案，測試後用 Git 管理各項修改，記得也該實作 `q_sort` 函式。 - [x] 在提交程式變更前，務必詳閱 [如何寫好 Git Commit Message](https://blog.louie.lu/2017/03/21/%E5%A6%82%E4%BD%95%E5%AF%AB%E4%B8%80%E5%80%8B-git-commit-message/) - [x] 不用理會 [Autolab](http://www.autolabproject.com/) 和檔案下載的描述，這兩者都是 CMU 專屬 - [x] 除了修改程式，也要編輯「[作業區](https://hackmd.io/@sysprog/linux2021-homework1)」，增添開發紀錄和 GitHub 連結，除了提及你如何逐步達到自動評分程式的要求外，共筆也要涵蓋以下: - [x] 開啟 [Address Sanitizer](https://github.com/google/sanitizers/wiki/AddressSanitizer)，修正 `qtest` 執行過程中的錯誤 - [x] 先執行 `qtest` 再於命令提示列輸入 `help` 命令，會使開啟 [Address Sanitizer](https://github.com/google/sanitizers/wiki/AddressSanitizer) 觸發錯誤，應予以排除 - [ ] 運用 Valgrind 排除 `qtest` 實作的記憶體錯誤，並透過 Massif 視覺化 "simulation" 過程中的記憶體使用量，需要設計對應的實驗 - [ ] 在 `qtest` 中實作 [coroutine](https://en.wikipedia.org/wiki/Coroutine)，並提供新的命令 `web`，提供 web 伺服器功能，注意: web 伺服器運作過程中，`qtest` 仍可接受其他命令 - [ ] 可嘗試整合 [tiny-web-server](https://github.com/7890/tiny-web-server)，將 `FORK_COUNT` 變更為 `0`，並以 [coroutine](https://en.wikipedia.org/wiki/Coroutine) 取代原本的 fork 系統呼叫 - [ ] 解釋 [select](http://man7.org/linux/man-pages/man2/select.2.html) 系統呼叫在本程式的使用方式，並分析 [console.c](https://github.com/sysprog21/lab0-c/blob/master/console.c) 的實作，說明其中運用 CS:APP [RIO 套件](http://csapp.cs.cmu.edu/2e/ch10-preview.pdf) 的原理和考量點。可對照參考 [CS:APP 第 10 章重點提示](https://hackmd.io/@sysprog/H1TtmVTTz) :arrow_forward: 為避免「舉燭」，請比照 `qtest` 實作，撰寫獨立的終端機命令解譯器 (可建立新的 GitHub repository) - [ ] 說明 [antirez/linenoise](https://github.com/antirez/linenoise) 的運作原理，注意到 [termios](http://man7.org/linux/man-pages/man3/termios.3.html) 的運用 - [ ] 研讀論文 [Dude, is my code constant time?](https://eprint.iacr.org/2016/1123.pdf)，解釋本程式的 "simulation" 模式是如何透過以實驗而非理論分析，達到驗證時間複雜度，需要解釋 [Student's t-distribution](https://en.wikipedia.org/wiki/Student%27s_t-distribution) 及程式實作的原理。注意：現有實作存在若干致命缺陷，請討論並提出解決方案; - [ ] 指出現有程式的缺陷 (提示: 和 [RIO 套件](http://csapp.cs.cmu.edu/2e/ch10-preview.pdf) 有關)，嘗試強化並提交 pull request * 截止日期： * Mar 7, 2021 (含) 之前 * 越早在 GitHub 上有動態、越早接受 code review，評分越高 ## 修改 queue.c * **q_new** * 回傳q之前先檢查malloc的回傳值 ```cpp queue_t *q_new() { queue_t *q = malloc(sizeof(queue_t)); if (!q) { return NULL; } q->head = NULL; q->tail = NULL; q->size = 0; return q; } ``` * **q_free** * 先檢查q是否合法，依序 free linked list 中的 elements ```cpp= void q_free(queue_t *q) { if (q) { list_ele_t *newh; while (q->head) { newh = q->head->next; free(q->head->value); free(q->head); q->head = newh; } free(q); } return; } ``` * **q_insert_head** * malloc 一個新的 element 放到 q 的最前面，即: * ```new_e->next = q->head;``` * 把 q 中的變數妥當 maintain (size, head, tail) ```cpp= bool q_insert_head(queue_t *q, char *s) { list_ele_t *newh; if (!q) { return false; } newh = malloc(sizeof(list_ele_t)); if (!newh) { return false; } newh->value = malloc((strlen(s) + 1) * sizeof(char)); if (!newh->value) { free(newh); return false; } memset(newh->value, 0, strlen(s) + 1); snprintf(newh->value, strlen(s) + 1, "%s", s); newh->next = q->head; q->head = newh; q->size++; if (q->size == 1) { q->tail = newh; } return true; } ``` * **q_insert_tail** * 和 inset_head 概念大同小異，在 q 最後面再加一個element: * ```q->tail->next = new_e;``` * 特別處理 ```q->size = 0``` 的 case ```cpp= bool q_insert_tail(queue_t *q, char *s) { if (!q) { return false; } list_ele_t *newt; newt = malloc(sizeof(list_ele_t)); if (!newt) { return false; } newt->value = malloc((strlen(s) + 1) * sizeof(char)); if (!newt->value) { free(newt); return false; } memset(newt->value, 0, strlen(s) + 1); snprintf(newt->value, strlen(s) + 1, "%s", s); newt->next = NULL; q->size++; if (q->size == 1) { q->tail = newt; q->head = q->tail; } else { q->tail->next = newt; q->tail = newt; } return true; } ``` * **q_remove_head** * 先判斷 q 是否可以再拔 element (```q->size```是否為0) * strncpy? ```cpp= bool q_remove_head(queue_t *q, char *sp, size_t bufsize) { if (!q || q->size == 0) { return false; } list_ele_t *rmh = q->head; q->head = q->head->next; bool ret = false; if (sp) { strncpy(sp, rmh->value, bufsize - 1); sp[bufsize - 1] = '\0'; ret = true; } free(rmh->value); free(rmh); q->size--; return ret; } ``` * **q_size** * 檢查 q 是否為 NULL 後回傳 ```q->size``` ```cpp= int q_size(queue_t *q) { if (!q) { return 0; } return q->size; } ``` * **q_reverse** * 當 q 中的 element 數量小於等於1時不用反轉後沒有差別 * 我的想法是把每個 element 的 next 指向前一個 element，總共使用三個變數 cur_e、 next_e 和 tmp <br> * 這是一開始的狀態 (將 tail->next 指到 head) ```graphviz digraph linked_list{ rankdir=LR; node [shape=record]; a [label="{ <data> a | <ref> }"]; b [label="{ <data> b | <ref> }"]; c [label="{ <data> c | <ref> }"]; a:ref:c -> b:data [arrowtail=dot, dir=both, tailclip=false]; b:ref:c -> c:data [arrowtail=dot, dir=both, tailclip=false]; head -> a; tail -> c; } ``` * reverse a->next ```graphviz digraph linked_list{ rankdir=LR; node [shape=record]; a [label="{ <data> a | <ref> }"]; b [label="{ <data> b | <ref> }"]; c [label="{ <data> c | <ref> }"]; cur [label="{ <data> cur}"]; NULL [label="{ <data> NULL }"]; a:ref:c -> b:data [arrowtail=dot, dir=both, tailclip=false, color=grey]; b:ref:c -> c:data [arrowtail=dot, dir=both, tailclip=false]; head -> a; tail -> c; cur->a; a:ref:c -> NULL:data [arrowtail=dot, dir=both, tailclip=false, color=red]; } ``` * reverse b->next ```graphviz digraph linked_list{ rankdir=LR; node [shape=record]; a [label="{ <data> a | <ref> }"]; b [label="{ <data> b | <ref> }"]; c [label="{ <data> c | <ref> }"]; cur [label="{ <data> cur}"]; NULL [label="{ <data> NULL }"]; head -> a; tail -> c; cur->b; a:ref:c -> NULL:data [arrowtail=dot, dir=both, tailclip=false]; b:ref:c -> c:data [arrowtail=dot, dir=both, tailclip=false, color=grey]; b:ref:c -> a:data [arrowtail=dot, dir=both, tailclip=false, color=red]; } ``` * reverse c->next ```graphviz digraph linked_list{ rankdir=LR; node [shape=record]; a [label="{ <data> a | <ref> }"]; b [label="{ <data> b | <ref> }"]; c [label="{ <data> c | <ref> }"]; cur [label="{ <data> cur}"]; NULL [label="{ <data> NULL }"]; head -> a; tail -> c; cur->c; a:ref:c -> NULL:data [arrowtail=dot, dir=both, tailclip=false]; b:ref:c -> a:data [arrowtail=dot, dir=both, tailclip=false]; c:ref:c -> b:data [arrowtail=dot, dir=both, tailclip=false, color=red]; } ``` * maintain attribute (head, tail) ```graphviz digraph linked_list{ rankdir=LR; node [shape=record]; c [label="{ <data> c | <ref> }"]; b [label="{ <data> b | <ref> }"]; a [label="{ <data> a | <ref> }"]; NULL [label="{ <data> NULL }"]; a:ref:c -> NULL:data [arrowtail=dot, dir=both, tailclip=false]; b:ref:c -> a:data [arrowtail=dot, dir=both, tailclip=false]; c:ref:c -> b:data [arrowtail=dot, dir=both, tailclip=false]; head -> c[color=red]; tail -> a[color=red]; } ``` ```cpp= void q_reverse(queue_t *q) { if (!q || !q->head || q->size <= 1 ) { return; } list_ele_t *tmp = q->head->next; q->tail->next = q->head; while (tmp != q->tail) { q->head->next = tmp->next; tmp->next = q->tail->next; q->tail->next = tmp; tmp = q->head->next; } q->tail = q->head; q->tail->next = NULL; q->head = tmp; return; } ``` * **q_sort** ```cpp= list_ele_t *merge(list_ele_t *l1, list_ele_t *l2) { if (!l1) { return l2; } if (!l2) { return l1; } list_ele_t *head = NULL; list_ele_t *tmp = NULL; if (strcmp(l1->value, l2->value) <= 0) { head = l1; l1 = l1->next; } else { head = l2; l2 = l2->next; } tmp = head; while (l1 && l2) { if (strcmp(l1->value, l2->value) <= 0) { tmp->next = l1; l1 = l1->next; } else { tmp->next = l2; l2 = l2->next; } tmp = tmp->next; } while (l1 || l2) { if (l1) { tmp->next = l1; l1 = l1->next; } else { tmp->next = l2; l2 = l2->next; } tmp = tmp->next; } return head; } list_ele_t *merge_sort(list_ele_t *head) { if (!head || !head->next) { return head; } list_ele_t *fast = head->next; list_ele_t *slow = head; while (fast && fast->next) { fast = fast->next->next; slow = slow->next; } fast = slow->next; slow->next = NULL; list_ele_t *l1 = merge_sort(fast); list_ele_t *l2 = merge_sort(head); return merge(l1, l2); } void q_sort(queue_t *q) { if (!q || !q->head || !q->head->next) { return; } q->head = merge_sort(q->head); while (q->tail->next) { q->tail = q->tail->next; } } ``` ## Address Sanitizer lab0-c 支援透過 ```SANITIZER=1``` 開起 Address Sanitizer 強化記憶體檢測， command 如下: ```shell= $ make SANITIZER=1 $ make test ``` 實際執行後會產生這個錯誤 messenge: ``` ================================================================= ==7886==ERROR: AddressSanitizer: global-buffer-overflow on address 0x558be8287620 at pc 0x558be826fae8 bp 0x7ffe9a15fbd0 sp 0x7ffe9a15fbc0 READ of size 4 at 0x558be8287620 thread T0 #0 0x558be826fae7 in do_option_cmd /home/alan/linux2021/lab0-c/console.c:369 #1 0x558be826e8d0 in interpret_cmda /home/alan/linux2021/lab0-c/console.c:221 #2 0x558be826f0b5 in interpret_cmd /home/alan/linux2021/lab0-c/console.c:244 #3 0x558be82702e1 in cmd_select /home/alan/linux2021/lab0-c/console.c:594 #4 0x558be827085b in run_console /home/alan/linux2021/lab0-c/console.c:667 #5 0x558be826d3e5 in main /home/alan/linux2021/lab0-c/qtest.c:780 #6 0x7f21aaac50b2 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x270b2) #7 0x558be826ab8d in _start (/home/alan/linux2021/lab0-c/qtest+0x8b8d) 0x558be8287621 is located 0 bytes to the right of global variable 'simulation' defined in 'console.c:21:6' (0x558be8287620) of size 1 'simulation' is ascii string '' ``` trace 一下會發現錯誤發生在 ```console.c``` 中的 369 行: ```cpp=367 while (!found && plist) { if (strcmp(plist->name, name) == 0) { int oldval = *plist->valp; *plist->valp = value; if (plist->setter) plist->setter(oldval); found = true; } else plist = plist->next; } ``` 369行乍看沒有錯，但 ```plist->valp``` 指向的位置是```bool```，換句話說我們希望用```int```取值時會超過原本```bool```規劃的範圍，而取到**違法**的記憶體區塊。 ```cpp= //global variable simulation int simulation = false; //init_cmd() call add_param() using simulation add_param("simulation", (int *) &simulation, "Start/Stop simulation mode", NULL); //in the function we treat vlap as a pointer to int void add_param(char *name, int *valp, char *documentation, setter_function setter) { ... ele->valp = valp; ... return; } ``` 這個問題也會發生在執行 ```qtest``` 的指令 ```help``` 的時候，錯誤的原因和上述相同，這個問題我認為有兩個解決方式： 1. 暴力的把所有 ```vlap``` 統一用 ```int``` 去宣告 2. 用 ```void*``` 去 handle 各種不同的變數型態目前我粗糙的用第一種方式讓 Address Sanitizer 能通過，第二種方法還會牽涉到變數長度問題。 ## Valgrind ## Dudect [Dude, is my code constant time?](https://eprint.iacr.org/2016/1123.pdf) 是一篇發表於2016年的論文，根據我閱讀此論文後的理解如下。本篇論文的目的是要確認程式的某些運行時間是否是常數時間，他採用了統計的理論去**估計是否運行時間為常數的假設為真**，他的步驟如下： 1. Measure execution time: * 用兩個類別 ( fix vs. random ) 當作 testing input data 2. Apply post-processing: * 論文提到 Cropping 和 High-order preprocessing 的方法，後者尚未細讀過，目前的理解是本篇的實驗是建立在 Cropping 上討論 3. Apply statistical test: * 用 Welch’s t-test 來檢驗「兩組數據的運行時間分佈相同」這個假說 (hypothesis) **第一步**我認為不難理解，意義就是做實驗組跟對照組看看時間分佈是否相同。 **第二步**意外的容易影響實驗結果，論文中的論點是因為程式運行的當下可能被 OS 打斷等等因素，導致一些運行時間讓分佈失真，論文中的實驗結果也可以看出**不同的 cropping 參數確實會影響到結果的判讀**。 ### Welch’s t-test 今天我們要判斷兩個分佈是否相同， Welch’s t-test 用 **兩個分佈平均是否相同** 當作依據去判斷，也就是： $$ H_0 : \bar{X_0} - \bar{X_1} = 0$$ 所以我是這樣看待 t value 的：他和平均相同這個假設到底差多少 $$ t = \frac{(\bar{X_0} - \bar{X_1})-0}{\sqrt{\frac{s^2_1}{N_1} + \frac{s^2_2}{N_2}}} $$