2021q1 Homework1 (lab0)

# 2021q1 Homework1 (lab0) contributed by < `Tzuying` > ###### tags: `linux2021` > [J01: lab0](https://hackmd.io/@sysprog/linux2021-lab0) ## Roadmap 1. 實做, 本程式內建自動評分程式 * :heavy_check_mark: 補完並逐步精進實做 * `q_new` / `q_free` / `q_insert_head` / `q_insert_tail` / `q_size` / `q_remove_head` / `q_reverse` / `q_sort` * :heavy_check_mark: `q_insert_tail` 和 `q_size` 實作改寫為 $O(1)$ 時間複雜度 * :ferris_wheel: 實做 Coroutine * :ferris_wheel: 解釋 select * :ferris_wheel: antirez/linenoise 的運作原理等等 [作業要求](https://hackmd.io/7GxMpQwbSGuOl93CDEAr2g?view#-%E4%BD%9C%E6%A5%AD%E8%A6%81%E6%B1%82) 2. 開放原始碼專案的關鍵基本素養 (前期訓練) :::success * [ ] 動態程式分析工具 [Valgrind](https://valgrind.org/) * [ ] 思考高效能的實作 (參照 [dudect](https://github.com/oreparaz/dudect) 工具及對應的論文 [Dude, is my code constant time?](https://eprint.iacr.org/2016/1123.pdf)) * [ ] 執行時期的記憶體檢測 [Address Sanitizer](https://github.com/google/sanitizers/wiki/AddressSanitizer) * [ ] 程式風格檢查 [clang-format](https://clang.llvm.org/docs/ClangFormat.html) * [ ] 拼字檢查 [GNU Aspell](http://aspell.net/) * [ ] 靜態程式碼檢查 [Cppcheck](http://cppcheck.sourceforge.net/) * [ ] 移除不安全函式 [Common vulnerabilities guide for C programmers](https://security.web.cern.ch/recommendations/en/codetools/c.shtml) * [ ] 可讀好維護：改善 Makefile，使其更加模組化; 縮減程式碼，移除不使用的部分; * [ ] [git-good-commit](https://github.com/tommarshall/git-good-commit) * [ ] 應用 [Git pre-push hook](https://github.com/tommarshall/git-good-commit) / [Git pre-commit hook](https://git-scm.com/book/en/v2/Customizing-Git-Git-Hooks) 至 [quiz1](https://hackmd.io/@X3iwPYFoQeS00wezEjCf3w/HkGpIgiMu) ::: 4. 研讀 * [ ] [C Programming Lab](https://www.cs.cmu.edu/afs/cs/academic/class/15213-s20/www/labs/cprogramminglab.pdf) * [ ] [CS:APP 第 10 章重點提示](https://hackmd.io/@sysprog/H1TtmVTTz) ## Preview (開始 coding 前的紀錄) 1. 瀏覽作業說明，決定 Roadmap and mindset ### Address Sanitizer * [Address/Thread/Memory Sanitizer](https://www.slideshare.net/sermp/sanitizer-cppcon-russia) * [A look into the sanitizer family (ASAN & UBSAN)](https://www.slideshare.net/CysinfoCommunity/a-look-into-the-sanitizer-family-asan-ubsan-by-akul-pillai) ### Valgrind * 閱讀 [以 Valgrind 分析記憶體問題](https://hackmd.io/JSFDnHn0Rpe0V7d-AqLAXA?view#%E4%BB%A5-Valgrind-%E5%88%86%E6%9E%90%E8%A8%98%E6%86%B6%E9%AB%94%E5%95%8F%E9%A1%8C) * Valgrind is an undefined behavior checking tool first, a function and memory profiler second, a data-race detection tool third, and a leak checking tool last. [ref](http://maintainablecode.logdown.com/posts/245425-valgrind-is-not-a-leak-checker) * [你所不知道的 C 語言：函式呼叫篇](https://hackmd.io/@sysprog/c-function) * [lab0-c](https://github.com/sysprog21/lab0-c) 已整合 [Valgrind](https://valgrind.org/) 來協助學員細緻地找出記憶體相關的問題 ```shell $ make valgrind ``` * [Valgrind User Manual](https://valgrind.org/docs/manual/manual.html) ### 自動測試程式 * 偵測問題發生 * [ ] [你所不知道的C語言: Stream I/O, EOF 和例外處理](https://hackmd.io/@sysprog/c-stream-io) * 追蹤記憶體 * 以 `block_ele_t` 紀錄所有程式中 malloc/free 的情況，其中成員 `payload[0]` 與 `magic_header` 是具功能性的變數設計 ```cpp typedef struct BELE { struct BELE *next; struct BELE *prev; size_t payload_size; size_t magic_header; /* Marker to see if block seems legitimate */ unsigned char payload[0]; /* Also place magic number at tail of every block */ } block_ele_t; ``` * `payload[0]` * 以 instance 記憶體角度來看，`payload` 是 append 在 `block_ele_t` 後的一塊記憶體 * 當然要做到這件事，該 instance 記憶體給定要用：`malloc(sizeof(block_ele_t) + payload 大小)` * 以 lab0 的使用情境，可解讀為每份記憶體追蹤紀錄，直接包含被紀錄的記憶體本身，開頭在 `payload[0]` ，大小在 `payload_size`。 * 好處1 => 最後一個成員大小可任意指定 * 好處2 => 這樣的作法暗示這兩個(記憶體追蹤紀錄與被紀錄的記憶體本身)的 malloca 是連動的，不用分開做兩次 * 這樣的設計(結構體中的最後一個成員是大小為 0 的陣列)在 Linux kernel 常見，也曾在 windows SDK 看過，是種約定俗成的作法。反過來說，若不知道這些約定俗成的作法，在多人合作的情況(kernel)，從解讀到使用就有各種掙扎與出錯的可能 * [ ] GCC 中 [Array of Length Zero](https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html), [中文解說](http://frankchang0125.blogspot.com/2013/01/c-struct-hack-structure-with-variable.html) * `magic_header` * 以 instance 記憶體角度來看，"被紀錄的記憶體"(`payload`) 會用兩個整數包著，裡面各自紀錄著兩個 Magic Number，作為驗證這塊記憶體是否是由 test_malloc 分配的依據，以及作為記憶體是否有產生越界等不佳使用的依據 * 要做到這件事，該 instance 記憶體給定要再變成：`malloc(sizeof(block_ele_t) + payload_size + sizeof(size_t)) * 首 magic number = 0xdeadbeef MAGICHEADER * 尾 magic number = 0xbeefdead MAGICFOOTER * FILLCHAR (即 0x55) * [hexspeak](https://en.wikipedia.org/wiki/Hexspeak) * ![](https://i.imgur.com/Ovsk2Qy.png) * Signal * 像是中斷的作用，基本使用是處理特殊的系統事件(SIGSEGV, SIGALRM, SIGWINCH)，使用者可以更改 signal 對應的動作, [SIGNAL(7) ](https://man7.org/linux/man-pages/man7/signal.7.html) * `signal(SIGSEGV, sigsegvhandler)` => overwrite default handler * SIGSEGV = 無效的記憶體參照 * 原本這個事件的預設處理是終止行程的運作，修改後就會跑去 `sigsegvhandler` 不會 * 同一程式任意位置的 goto 可以用 `setjmp` / `longjmp` * 用 setjmp 指定之後想跳回的目的地，用 longjmp 跳回去之前 setjmp 位置 * 所以**呼叫 setjmp 位置** (the target of the goto) 會重複跑到，自然設定與跳回時的處理會不同，這時就可以依據 setjmp 的回傳值分辨，用跳的會回傳非零值 * The stack context will be invalidated if the function which called setjmp() returns. => 理解： setjmp 被呼叫的位置決定了 longjmp's caller 的範圍，使用前先確定其 setjmp's caller 還在 function stack 裡 * 不同執行緒呢？各自一份？但後面 signal 版本是 system-wide ? * 保留 signal 狀態版本：`sigsetjmp` / `siglongjmp` * 選有 sig 的還是沒有 sig 的 jmp 好呢？ * 在 signal handler 內用 sig 那組： (1) 可以指定該 handler 不被哪些特定 signal 中斷執行 act.sa_mask (2) 同一個 signal 如果持續發生，handler 不會重複執行，因為自動加到 mask 裡了 (3) 務必成對使用，例如如果 sigsetjmp + longjmp，就沒辦法預估 signal mask 變成怎樣 * 因為 signal 可能發生在任意時候，無法保證發生時 setjmp 設定好了沒 => employ a guard variable, jmp_ready, to indicate whether the env buffer has been initialized. * overwrite signal hander 本來就有改變原有執行流程的效果，為什麼還要在引用 sigsetjmp / siglongjmp ? 模組化？ ### [Coroutine](https://hackmd.io/7GxMpQwbSGuOl93CDEAr2g?view#Coroutine) * 多任務切換技巧, since 1960 * coroutine 的 “yield” vs multi-threading 的 “yield” * 簡單來說，該如何一方面允許多 thread 並行，又能適當規劃個別單元對系統資源的使用。 * 參考實作 * [Coroutines in less than 20 lines of standard C](https://fanf.livejournal.com/105413.html) * [Coroutines in one page of C](https://hackmd.io/7GxMpQwbSGuOl93CDEAr2g?view) ### "量測程式執行時間是否能在常數時間" 的背景理論 * 很多數字與方程式 => 從數學上理解效能問題設計與感受該有的模樣讓開發階段就有這樣的期待 * Welch’s t-test * 好習慣：有思考效能的開發，有客觀評斷的設計 ## 解題 ### 產出第一個版本 #### q_new : 觀察 malloc 所有可能回傳值 1. 是自定義的 : test_malloc => 引入設定失敗率的設計 / 記憶體追蹤管理 (參考 "追蹤記憶體" 了解設計原理) 2. return value : NULL or a pointer #### q_free : 觀察所有以 malloc 生成的記憶體 1. 新節點如何變新增的？ do_insert_head => q_insert_head => malloc node and string 2. string 長度不定，怎麼知道要 free 多少？ 1. test_free 裡有相關紀錄 (b->payload_size)，所以知道一開始是 malloc 多大 2. 若是一般 maollc/free，應該有類似的機制可以查找到一開始動態設定多少，所以也是直接 free 就可以了 [C Strings: malloc & free (6)](http://www.se.rit.edu/~swen-250/activities/MicroActivities/C/mu_string_malloc/distrib/index.html) 3. 作法：抽出第一個 element (重設 q->head)，依序 free 該 element 的 string 與本身，直到 q->head 為空 1. 過去處理類似動作時，想像的都是**怎麼刪**。但若改成**怎麼抽離**(要刪的元素)是更安全的處理角度 2. 雖然只是暫態，但還是在過程中處理 q->size，不確定是否是有意義的習慣？ #### q_insert_head 1. string 為使用者輸入 => 空字串的可能？ None 2. q 的各種情況 => (1) q == NULL (2) q->head == NULL 3. 新建一個 element 並處理各種無法順利完成的情況 #### q_insert_tail 1. add *tail in queue struct ? 1. tail 首次給值 1. 當 size=1 時，會剛好 head 等於 tail 皆指在同一個 element 上 2. 可能發生在 q_insert_head 或 q_insert_tail，兩個地方都要考慮 head / tail 的維護 ``` // q_insert_head: do it after inserting (new head is ready) if (q->size == 1) q->tail = q->head; // q_insert_tail: do it befor inserting (dereference tail) if (q->tail == NULL) q->tail = q->head = newh; ``` 3. 更新 => 在 q_insert_tail 的最後，利用把新 element 串進 tail 4. 除了最後一步，其他步驟與需要注意的事跟 q_insert_head 一樣 5. 測試 1. `int len = sizeof(*s);` ==> `int len = strlen(s);` 2. forget to initiate `q->size` at q_new #### q_remove_head(queue_t *q, char *sp, size_t bufsize) 1. 觀察參數 sp 與 bufsize 如何作用 (do_remove_head) 1. sp(也就是 removes in do_remove_head)預計存放被移除的字串 2. 為了檢查是否 overflow (?)，removes 分為兩個部份 (1) 字串部份 len: string_length=1024 (2) padding 部份 len: STRINGPAD=1024 3. do_remove_head 有兩個地方還不是很理解/想像 1. `strncpy(checks, argv[1], string_length + 1);` argv[1] 不會有自己的結尾字元嗎為什麼需要複製到 1024+1 這麼長？ 2. 檢查 overflow 機制 1. 從 `string_length+1 ~ STRINGPAD` 通通要檢查？ => 因為記憶體操作可能亂跳，所以要檢查遠一點比較保險？ 2. `string_length+1` 之內不用檢查？沒有看到 string_length(預設值為 1024) 有在其他地方改值，這樣只是稍微超出一點的那種 overflow 不就檢查不出來？ ``` int i = string_length + 1; while ((i < string_length + STRINGPAD) && (removes[i] == 'X')) i++; if (i != string_length + STRINGPAD) { report(1, "ERROR: copying of string in remove_head overflowed " "destination buffer."); ok = false; } ``` #### q_reverse 1. 如果有雙向連結就可以 O(1) 了 2. 反轉的話 1. tail 是原本的 head => 一開始可以先做，但 next 要等反轉完 2. head 是原本的 tail => 一定會走一遍 queue，到結尾再將 head 給值 (或是直接用 head 走，但這樣(對我來說)會降低可讀性但無法得知一般而言) 3. 對每個 element 來說，就是把 next 從 right element 改成 left element 1. 如果不紀錄 left，就無法給定新的 next 值 2. 如果不紀錄 right，設定完新的 next 後就走不到原本的下一個 element 4. 網路上有看到另一種作法：把原本的 right 改為現在的 head，當前的 element->next 指到下下個 element，若僅以 BigO 來看還是 O(n) 5. 應該也有遞迴的寫法。遞迴與非遞迴，使用上的選擇？ #### q_sort 1. 避免使用遞迴呼叫 [Optimized QuickSort — C Implementation (Non-Recursive)](https://alienryderflex.com/quicksort/) 2. 遞迴與非遞迴比較 1. FUNCTION CALLS => function call 有其成本，進出函式使用的 processor time 1. 非遞迴可避免，這是單純的成本 3. STACK => 遞迴是將儲存成本轉嫁在 stack 內，所以 code 本身會變得比較簡潔 1. real array 其實速度比較快 2. stack 也有大小限制，stack overflow 處理不直覺或容易直接沒做 3. SWAP VARIABLE 1. 遞迴 = pass items through a swap variable many times in any one pivot round. 2. 非遞迴 = passes only one item (the pivot) through a swap variable, and only once per round. 3. 遞迴是 divide and conquer，核心操作簡單，但因為每次只做局部，就可能做很多次？ 4. MULTIPLE MOVES 5. UNNECESSARY MOVES 6. SELF-SWAPS 8. COMPARISONS => 非遞迴方式減少其實不必要的動作，是因為多做判斷後才作動 9. 文中提供的作法 1. quick sort 概念：每 round 選擇一個 pivot，對剩餘的 element 比大小，大的放右邊，小的放左邊 (這裡的大小左右處理是基於遞增 sort 來說) 2. 作者的非遞迴作法： 1. inside the round : pivot 最後一定在 list 中間（指數值排序而非空間上的位置），**右左各別**找出位置還不對的 element 交換位置，重複多次直到 `L<R` 不成立，最後留下的位置就會是數值中間位置，也就是 pivot 正確的位置 1. 停止條件：L R 相等或交錯 => 該區間已經都掃過一次，沒有其他錯位的元素 `while (L<R)` 2. ![](https://i.imgur.com/tTbCimZ.png) 4. outside the round : 1. 每次做完一個 round 代表一個 pivot 位置已經確認，並且增加一個之後要回頭看的區間(左半部)，與接下來要處理的區間(右半部) 1. 接下來要處理的區間 = pivot 位置+1 ~ 數列結尾 3. 之後要回頭看的區間 = 當前 round 的開頭 ~ pivot 位置-1 3. sort or back 條件：當這個 round 的元素個數已經不到 2 個，就會跳過處理上一層 `if (L<R)` 5. 所以 level 也可以說是剩下要處理的區間數 ![](https://i.imgur.com/jhqa2pq.png) 1. 綠色底色 = 目前處理的區間 2. 紅色底色 = 尚未處理的區間，以黑框分別 3. 紅字 = pivot 4. 橘字 = 已經排好的 ``` i = 0 ==> arr (4, 5, 3, 6, 2, 8, 1, 7, ) beg (0, 0, 0, 0, 0, 0, 0, 0, ) end (8, 0, 0, 0, 0, 0, 0, 0, ) piv = 4, [0 ~ 7] beg (0, 4, 0, 0, 0, 0, 0, 0, ) end (3, 8, 0, 0, 0, 0, 0, 0, ) i = 1 ==> arr (1, 2, 3, 4, 6, 8, 5, 7, ) beg (0, 4, 0, 0, 0, 0, 0, 0, ) end (3, 8, 0, 0, 0, 0, 0, 0, ) piv = 6, [4 ~ 7] beg (0, 4, 6, 0, 0, 0, 0, 0, ) end (3, 5, 8, 0, 0, 0, 0, 0, ) i = 2 ==> arr (1, 2, 3, 4, 5, 6, 8, 7, ) beg (0, 4, 6, 0, 0, 0, 0, 0, ) end (3, 5, 8, 0, 0, 0, 0, 0, ) piv = 8, [6 ~ 7] beg (0, 4, 6, 8, 0, 0, 0, 0, ) end (3, 5, 7, 8, 0, 0, 0, 0, ) i = 3 ==> arr (1, 2, 3, 4, 5, 6, 7, 8, ) beg (0, 4, 6, 8, 0, 0, 0, 0, ) end (3, 5, 7, 8, 0, 0, 0, 0, ) [8 ~ 7] back!! i = 2 ==> arr (1, 2, 3, 4, 5, 6, 7, 8, ) beg (0, 4, 6, 8, 0, 0, 0, 0, ) end (3, 5, 7, 8, 0, 0, 0, 0, ) [6 ~ 6] back!! i = 1 ==> arr (1, 2, 3, 4, 5, 6, 7, 8, ) beg (0, 4, 6, 8, 0, 0, 0, 0, ) end (3, 5, 7, 8, 0, 0, 0, 0, ) [4 ~ 4] back!! i = 0 ==> arr (1, 2, 3, 4, 5, 6, 7, 8, ) beg (0, 4, 6, 8, 0, 0, 0, 0, ) end (3, 5, 7, 8, 0, 0, 0, 0, ) piv = 1, [0 ~ 2] beg (0, 1, 6, 8, 0, 0, 0, 0, ) end (0, 3, 7, 8, 0, 0, 0, 0, ) i = 1 ==> arr (1, 2, 3, 4, 5, 6, 7, 8, ) beg (0, 1, 6, 8, 0, 0, 0, 0, ) end (0, 3, 7, 8, 0, 0, 0, 0, ) piv = 2, [1 ~ 2] beg (0, 1, 2, 8, 0, 0, 0, 0, ) end (0, 1, 3, 8, 0, 0, 0, 0, ) i = 2 ==> arr (1, 2, 3, 4, 5, 6, 7, 8, ) beg (0, 1, 2, 8, 0, 0, 0, 0, ) end (0, 1, 3, 8, 0, 0, 0, 0, ) [2 ~ 2] back!! i = 1 ==> arr (1, 2, 3, 4, 5, 6, 7, 8, ) beg (0, 1, 2, 8, 0, 0, 0, 0, ) end (0, 1, 3, 8, 0, 0, 0, 0, ) [1 ~ 0] back!! i = 0 ==> arr (1, 2, 3, 4, 5, 6, 7, 8, ) beg (0, 1, 2, 8, 0, 0, 0, 0, ) end (0, 1, 3, 8, 0, 0, 0, 0, ) [0 ~ -1] back!! ``` 6. 程式中的 `beg` 和 `end` 表示每一層的開始與結束位置 1. 假設當前 level 為 2，那表示現在處理的是 `beg[2]` ~ `end[2]-1` 區段，之後還要回頭處理 `beg[1]` ~ `end[1]-1` 與 `beg[0]` ~ `end[0]-1` 2. `beg` 放的是開頭的位置，但 `end` 放的是結束位置+1，所以使用時都會減 1 8. 版本 1 什麼時候會 fail ? 版本 2 怎麼改善的？ 1. 因為無法評估 array 需要多少 level 才能完成 sort ，所以若一開始設定的 `beg` 跟 `end` 不夠大，就會回傳失敗 2. 作者預期 array 越長，需要的 level 就越大，因此每次在決定接下來要處理的區間時，不再直接選右邊區間，而是看兩邊元素個數決定(選元素數量少的) 3. 以上圖同一個例子來看，使用版本 2，LEVEL 數可以從 3 降到 2 ![](https://i.imgur.com/J8l8vuq.png) ``` i = 0 ==> arr (4, 5, 3, 6, 2, 8, 1, 7, ) beg (0, 0, 0, 0, 0, 0, 0, 0, ) end (8, 0, 0, 0, 0, 0, 0, 0, ) piv = 4, [0 ~ 7] beg (0, 4, 0, 0, 0, 0, 0, 0, ) end (3, 8, 0, 0, 0, 0, 0, 0, ) swap!! i = 1 ==> arr (1, 2, 3, 4, 6, 8, 5, 7, ) beg (4, 0, 0, 0, 0, 0, 0, 0, ) end (8, 3, 0, 0, 0, 0, 0, 0, ) piv = 1, [0 ~ 2] beg (4, 0, 1, 0, 0, 0, 0, 0, ) end (8, 0, 3, 0, 0, 0, 0, 0, ) swap!! i = 2 ==> arr (1, 2, 3, 4, 6, 8, 5, 7, ) beg (4, 1, 0, 0, 0, 0, 0, 0, ) end (8, 3, 0, 0, 0, 0, 0, 0, ) [0 ~ -1] back!! i = 1 ==> arr (1, 2, 3, 4, 6, 8, 5, 7, ) beg (4, 1, 0, 0, 0, 0, 0, 0, ) end (8, 3, 0, 0, 0, 0, 0, 0, ) piv = 2, [1 ~ 2] beg (4, 1, 2, 0, 0, 0, 0, 0, ) end (8, 1, 3, 0, 0, 0, 0, 0, ) swap!! i = 2 ==> arr (1, 2, 3, 4, 6, 8, 5, 7, ) beg (4, 2, 1, 0, 0, 0, 0, 0, ) end (8, 3, 1, 0, 0, 0, 0, 0, ) [1 ~ 0] back!! i = 1 ==> arr (1, 2, 3, 4, 6, 8, 5, 7, ) beg (4, 2, 1, 0, 0, 0, 0, 0, ) end (8, 3, 1, 0, 0, 0, 0, 0, ) [2 ~ 2] back!! i = 0 ==> arr (1, 2, 3, 4, 6, 8, 5, 7, ) beg (4, 2, 1, 0, 0, 0, 0, 0, ) end (8, 3, 1, 0, 0, 0, 0, 0, ) piv = 6, [4 ~ 7] beg (4, 6, 1, 0, 0, 0, 0, 0, ) end (5, 8, 1, 0, 0, 0, 0, 0, ) swap!! i = 1 ==> arr (1, 2, 3, 4, 5, 6, 8, 7, ) beg (6, 4, 1, 0, 0, 0, 0, 0, ) end (8, 5, 1, 0, 0, 0, 0, 0, ) [4 ~ 4] back!! i = 0 ==> arr (1, 2, 3, 4, 5, 6, 8, 7, ) beg (6, 4, 1, 0, 0, 0, 0, 0, ) end (8, 5, 1, 0, 0, 0, 0, 0, ) piv = 8, [6 ~ 7] beg (6, 8, 1, 0, 0, 0, 0, 0, ) end (7, 8, 1, 0, 0, 0, 0, 0, ) i = 1 ==> arr (1, 2, 3, 4, 5, 6, 7, 8, ) beg (6, 8, 1, 0, 0, 0, 0, 0, ) end (7, 8, 1, 0, 0, 0, 0, 0, ) [8 ~ 7] back!! i = 0 ==> arr (1, 2, 3, 4, 5, 6, 7, 8, ) beg (6, 8, 1, 0, 0, 0, 0, 0, ) end (7, 8, 1, 0, 0, 0, 0, 0, ) [6 ~ 6] back!! ``` 10. MAX_LEVELS 與 array size 的評估方法? 8. 如何改寫成 linked list 適用？ 1. 要能夠交換兩個 element in O(1) ? 2. 要能夠知道每個 element 目前的 index in O(1) => 增加 index 為 member ？ 10. 其他 1. 好的圖結構可以簡單展現出完整的概念與差異 3. 要到達目標，該做的事不會減少，但可以用先釐清關鍵，設計過程讓作法簡潔 #### 不管了先 test & commit 1. 53/100 .... => 應該編寫邊跑作業裡的測試早點發現問題的 ``` --- trace-07-robust 0/6 +++ TESTING trace trace-08-robust: # Test operations on empty queue --- trace-11-malloc 0/6 +++ TESTING trace trace-12-malloc: # Test of malloc failure on insert_tail ERROR: Freed queue, but 4 blocks are still allocated --- trace-12-malloc 0/6 +++ TESTING trace trace-13-perf: # Test performance of insert_tail --- trace-16-perf 0/6 --- TOTAL 53/100 ``` 2. 65/100 => fix : 未處理 newh malloc 成功，但 newh->value 失敗的情況 3. :white_check_mark: 其他測試掃過一遍 4. Sanitizer 開起來 1. `make clean; make test SANITIZER=1` 2. ![](https://i.imgur.com/UwDiWix.png) 1. AddressSanitizer: global-buffer-overflow ? [wiki 簡單的例子](https://en.wikipedia.org/wiki/AddressSanitizer#Global-buffer-overflow) 3. 整理如下 1. 問題發生位置 0x559e5e022680 (data 已初始化) 4. PC(program counter) = 0x559e5de0ceea (text) 5. the stack call (base ~ top) = 0x7ffd89b4c210 ~ 0x7ffd89b4c200 1. bp 0x7ffd89b4c210 高記憶體位置 2. sp 0x7ffd89b4c200 低記憶體位置 6. ![](https://i.imgur.com/VJdzLei.png) 7. 出問題時是想讀取位於 0x559e5e022680 的 size **4** 資料，但 0x559e5e022680 位置上是定義大小為 **1** byte 的變數 `simulation` (回到程式可確認其為 bool 型態)，所以造成讀取範圍超過變數定義的大小 4. 到這裡其實就可以回到程式裡找出 where & why 變數 `simulation` 會放在 int* valp 指標變數裡，但因為不熟悉 Sanitizer 且 `param_list` / `simulation` 位置相近，所以一開始以為 overflow 的是 `param_list` 1. 要相信 Sanitizer 說的 2. `plist->` 這一步就跳離原本 plist 記憶體位置了 (BTW, `->` 優先權大於 `*`) 5. 回到原問題： `param_list` 是裝有各種 `Integer-valued parameters` 的容器，其中有幾個參數型態是 bool 而非 int，像是 `simulation`，設定記憶體位置時沒有影響(因為只需要知道開頭)，**但取值時 bool 型態的參數就會多讀 3 bytes，若其中有不為零的數值，就會出錯** ``` /* init_cmd : 初始化各種 cmd */ add_param("simulation", (int *) &simulation, "Start/Stop simulation mode", NULL); add_param("verbose", &verblevel, "Verbosity level", NULL); add_param("error", &err_limit, "Number of errors until exit", NULL); /* do_option_cmd : 取出特定 option cmd 並複寫相對應的變數值 */ param_ptr plist = param_list; int oldval = *plist->valp; *plist->valp = value; ``` 6. 修正：最簡單的作法就是把要放進 `param_list` 的參數都改為 int。若考慮減少記憶體使用，就要設計讓讀取時也知道其型態的方法 10. [複習] 一些名詞 1. [pc, program counter](https://en.wikipedia.org/wiki/Program_counter) = a processor register that indicates where a computer is in its program sequence 2. sp, stack pointer = the top of the stack 3. bp(ebp) base pointer [1998](https://www.cs.miami.edu/home/burt/journal/NT/basepointer.html), [call stack](https://en.wikipedia.org/wiki/Call_stack) 11. [複習] text / data / bss / stack / heap 記憶體管理的分類 * 靜態結構與變數 * 結構程式碼 text * 已初始化 data => 已初始化的全域或 static 變數 * 未初始化 bss => 未初始化的全域或 static 變數 * 區域/動態(memory)變數 * stack => 函數的區域變數，以及各種函數呼叫時需要儲存的資訊 * heap => 動態配置的變數 (by malloc( c++ ), new( c ))動態宣告 * ![](https://i.imgur.com/1PFPhVE.png) 12. Shadow ![](https://i.imgur.com/QgbbWOS.png) 1. 標示出問題的記憶體位置附近的可存取情況 2. 能理解的部份：[01] => 只有 1 byte 是可讀取的，由附近的 f9 可知道是 global 記憶體問題 3. 還不理解的 1. :white_check_mark: 看了相關投影片大概知道是透過 shadow & poison 偵測記憶體是否出問題，但無法理解確切作法？ [AddressSanitizer: A Fast Address Sanity Checker](https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/37752.pdf) 1. 8 bytes 壓縮以 shadow 的 1 byte 表示，兩者的記憶體位置對應就會是 `A(Shadow) = RealAddr >> 3` 3. 附近還有 3 個 01, 2 個 04 都是有問題的嗎？ 6. 寫的好慢阿 ... QQ 摸到作業的核心之前先不要停下來研究細節？ #### back to sort ### 解釋 select 系統呼叫在本程式的使用方式 ### 實做 Coroutine ### antirez/linenoise 的運作原理 ### 現有效能評估實作存在若干致命缺陷，請討論並提出解決方案 ### 指出現有程式的缺陷 ## 好奇的書籍 :books: * 《The Art of Computer Programming》 * 《The Linux Programming Interface》