2021q1 Homework1 (lab0)

# 2021q1 Homework1 (lab0) contributed by < `RainbowEye0486` > ## TODO 因為是第二次修課，所以已經實做過的內容僅放在[舊的共筆](https://hackmd.io/ifqanrYJST-RmTXNHnn78A)中 :::spoiler TODO list - [x] 重新 fork 檔案，更新至最新版本 - [x] 完整 `queue.c` 的實做 - [x] q_new: 建立新的「空」佇列 - [x] q_free: 釋放佇列所佔用的記憶體 - [x] q_insert_head: 在佇列開頭 (head) 加入 (insert) 給定的新元素 (以 LIFO 準則) - [x] q_insert_tail: 在佇列尾端 (tail) 加入 (insert) 給定的新元素 (以 FIFO 準則) - [x] q_remove_head: 在佇列開頭 (head) 移去 (remove) 給定的元素 - [x] q_size: 計算佇列中的元素數量 - [x] q_reverse: 以反向順序重新排列鏈結串列，該函式不該配置或釋放任何鏈結串列元素，換言之，它只能重排已經存在的元素 - [x] q_sort: 「Linux 核心設計」課程所新增，以遞增順序來排序鏈結串列的元素，可參閱 Linked List Sort 得知實作手法 - [x] 支援 Address Sanitizer，強化執行時期的記憶體檢測 - [x] 先執行 qtest 再於命令提示列輸入 help 命令，會使開啟 Address Sanitizer 觸發錯誤，應予以排除 - [ ] 運用 Valgrind 排除 qtest 實作的記憶體錯誤，並透過 Massif 視覺化 “simulation” 過程中的記憶體使用量，需要設計對應的實驗 - [ ] 可嘗試整合 tiny-web-server，將 FORK_COUNT 變更為 0，並以 coroutine 取代原本的 fork 系統呼叫 - [ ] 依據 [CERN Common vulnerabilities guide for C programmers](https://security.web.cern.ch/recommendations/en/codetools/c.shtml) 建議而移除 gets / sprintf / strcpy 等不安全的函式 - [ ] 解釋 select 系統呼叫在本程式的使用方式，並分析 console.c 的實作，說明其中運用 CS:APP RIO 套件的原理和考量點。可對照參考 CS:APP 第 10 章重點提示 :arrow_forward: 為避免「舉燭」，請比照 qtest 實作，撰寫獨立的終端機命令解譯器 (可建立新的 GitHub repository) - [ ] 說明 antirez/linenoise 的運作原理，注意到 termios 的運用 - [ ] 研讀論文 Dude, is my code constant time?，解釋本程式的 “simulation” 模式是如何透過以實驗而非理論分析，達到驗證時間複雜度，需要解釋 Student’s t-distribution 及程式實作的原理。注意：現有實作存在若干致命缺陷，請討論並提出解決方案 - [ ] 指出現有程式的缺陷 (提示: 和 RIO 套件有關)，嘗試強化並提交 pull request ::: ## 支援 [Address Sanitizer](https://github.com/google/sanitizers/wiki/AddressSanitizer)，強化執行時期的記憶體檢測在原本的版本中直接下命令: ```shell $ make ``` 並且測試後可以得到結果 ```shell +++ TESTING trace trace-17-complexity: # Test if q_insert_tail and q_size is constant time complexity Probably constant time Probably constant time --- trace-17-complexity 5/5 --- TOTAL 100/100 ``` 跟我們預期的一樣，保留了去年的實做內容，但是去年沒有針對記憶體進行檢測，所以今年從這邊開始，下命令: ```shell $ make SANITIZER=1 ``` 我們得到以下結果 ```shell +++ TESTING trace trace-17-complexity: # Test if q_insert_tail and q_size is constant time complexity ================================================================= ==6633==ERROR: AddressSanitizer: global-buffer-overflow on address 0x558fdbe4a640 at pc 0x558fdbc33e76 bp 0x7ffe771817f0 sp 0x7ffe771817e0 READ of size 4 at 0x558fdbe4a640 thread T0 #0 0x558fdbc33e75 in do_option_cmd /home/hastur/linux2021/lab0-c/console.c:369 #1 0x558fdbc32cf4 in interpret_cmda /home/hastur/linux2021/lab0-c/console.c:221 #2 0x558fdbc33496 in interpret_cmd /home/hastur/linux2021/lab0-c/console.c:244 #3 0x558fdbc3470e in cmd_select /home/hastur/linux2021/lab0-c/console.c:594 #4 0x558fdbc34c0b in run_console /home/hastur/linux2021/lab0-c/console.c:667 #5 0x558fdbc3197a in main /home/hastur/linux2021/lab0-c/qtest.c:787 #6 0x7f889b922bf6 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21bf6) #7 0x558fdbc2ef59 in _start (/home/hastur/linux2021/lab0-c/qtest+0x7f59) 0x558fdbe4a641 is located 0 bytes to the right of global variable 'simulation' defined in 'console.c:21:6' (0x558fdbe4a640) of size 1 'simulation' is ascii string '' SUMMARY: AddressSanitizer: global-buffer-overflow /home/hastur/linux2021/lab0-c/console.c:369 in do_option_cmd Shadow bytes around the buggy address: 0x0ab27b7c1470: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x0ab27b7c1480: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x0ab27b7c1490: 00 00 00 00 00 00 00 00 00 00 00 00 f9 f9 f9 f9 0x0ab27b7c14a0: 00 f9 f9 f9 f9 f9 f9 f9 00 f9 f9 f9 f9 f9 f9 f9 0x0ab27b7c14b0: 00 f9 f9 f9 f9 f9 f9 f9 00 f9 f9 f9 f9 f9 f9 f9 =>0x0ab27b7c14c0: 00 f9 f9 f9 f9 f9 f9 f9[01]f9 f9 f9 f9 f9 f9 f9 0x0ab27b7c14d0: 00 00 00 00 01 f9 f9 f9 f9 f9 f9 f9 04 f9 f9 f9 0x0ab27b7c14e0: f9 f9 f9 f9 00 00 00 00 00 00 00 00 00 00 00 00 0x0ab27b7c14f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 f9 f9 f9 0x0ab27b7c1500: f9 f9 f9 f9 01 f9 f9 f9 f9 f9 f9 f9 01 f9 f9 f9 0x0ab27b7c1510: f9 f9 f9 f9 04 f9 f9 f9 f9 f9 f9 f9 00 f9 f9 f9 Shadow byte legend (one shadow byte represents 8 application bytes): Addressable: 00 Partially addressable: 01 02 03 04 05 06 07 Heap left redzone: fa Freed heap region: fd Stack left redzone: f1 Stack mid redzone: f2 Stack right redzone: f3 Stack after return: f5 Stack use after scope: f8 Global redzone: f9 Global init order: f6 Poisoned by user: f7 Container overflow: fc Array cookie: ac Intra object redzone: bb ASan internal: fe Left alloca redzone: ca Right alloca redzone: cb ==6633==ABORTING --- trace-17-complexity 0/5 --- TOTAL 95/100 ``` 想要理解輸出訊息代表的是什麼意思，閱讀[Address/Thread/Memory Sanitizer](https://www.slideshare.net/sermp/sanitizer-cppcon-russia) 、 [A look into the sanitizer family (ASAN & UBSAN)](https://www.slideshare.net/sermp/sanitizer-cppcon-russia) 兩份簡報之後，知道 Address Sanitizer 可以幫我們做到以下： 1. Buffer Overflow( stack, heap, ==globals==) 2. Signed interger overflow 3. heap-use-after-free 4. stack-use-after-return 5. Compiler module(clang, gcc): 編譯器會自動加入程式當中並在程式進行時動態檢測 6. Run-time library: malloc replacement 7. Uninitialized memory reads :::warning **Shadow Memory:** 包含一些 metadata 的想法在裡面，拿來描述一段正常的記憶體可以被 addressing (尋址)的範圍有多少，雖然本身也是 menory 中的一塊，但是主要功能不是用來儲存資料，而是描述其他正常記憶體空間的配置情形。在[你所部知道的C語言:記憶體管理](https://hackmd.io/@sysprog/c-memory?type=view)中提到 `malloc` 配置多數系統都是以 8 byte 對齊，因此對於一段 8 byte 大小的地址必定會落入九種狀態之一 ![](https://i.imgur.com/jV8hPTW.png) 最前面的 N byte 是可以被 addressing 的，但是後面的 8-N 個 bit 是不能被 addressing 的，使用一個 byte 來紀錄非常足夠(因為可以紀錄$2^8=256$)種狀態 ![](https://i.imgur.com/TQPlEL7.png) ::: 然後我遇到的情形就是上面的第一項中的 `global-buffer-overflow` ，發現可能是全域變數 `simulation` 出現問題，看看下面那一堆記憶體配置的情形: `=>0x0ab27b7c14c0: 00 f9 f9 f9 f9 f9 f9 f9[01]f9 f9 f9 f9 f9 f9 f9` 中 `[01]` 出現問題，表示在這 8 byte 當中只有第一個 byte 是能夠使用的，其他部份有可能是 `global redzone` (如果看前後配置 f9 的話)。只配置 1 個 byte 的原因是因為當初宣告 `simulation` 這個變數的時候型態定為 `bool` ，所以記憶體只分配了 1 個 byte 給它，但是當我們尋找所有對 `simulation` 的操作的時候，我們發現在初始化 cmd 的時候，將 `simulation` 做了一次強制轉型: ```cpp=104 add_param("simulation", (int *) &simulation, "Start/Stopsimulation mode",NULL); ``` 在 [chenishi 的筆記](https://hackmd.io/@chenishi/rJJfgHEa7)當中，我們可以知道強制轉型的底層實做是把原本只佔 1 byte 的 bool 形態用int (4 byte) 的方法取出，讀取的時候其他 3 byte 其實就是非法空間了修改 `sumulation` 以及有相同問題的 `echo` ，將一開始的宣告型態從 `bool` 改成 `int` 就行了。 ## 運用 [Valgrind](https://valgrind.org/) 排除 qtest 實作的記憶體錯誤，並透過 [Massif]() 視覺化 “simulation” 過程中的記憶體使用量，需要設計對應的實驗執行以下命令： ```shell $ make valgrind ``` 後得到結果 ```shell ... --- trace-17-complexity 5/5 --- TOTAL 100/100 Test with specific case by running command: scripts/driver.py -p /tmp/qtest.XqplQV --valgrind -t <tid> ``` 通過測試，接著透過 **Massif** 工具，設計一個實驗來判斷記憶體的使用情形。執行： ```shell $ valgrind -q --tool=massif --stacks=yes --time-unit=i --massif-out-file=massif.out ./qtest ``` 實驗一測試 ```shell cmd=new cmd=it RAND 10000 cmd=reverse cmd=sort cmd=free cmd=quit ``` 得到輸出 `massif.out` ，經由 visualizer 視覺化呈現之後 ![](https://i.imgur.com/1zz4Pid.png) 一開始我們隨機塞入 10000 個節點， ## 依據 [CERN Common vulnerabilities guide for C programmers](https://security.web.cern.ch/recommendations/en/codetools/c.shtml) 建議而移除 gets / sprintf / strcpy 等不安全的函式 - **`gets`** 不安全的原因是它從不檢查寫入 buffer 的長度，如果輸入的字串長度大於程式預先劃分的記憶體空間，便會產生 Segmentation fault ，作者提出的辦法是用 `fgets` 作為替代，對輸入長度、 `\n` 進行判別，並在長度不足的時候動態分配記憶體 - **`strcpy`** 不安全的地方也是不會檢查寫入 buffer 長度，更不會去防止輸入字串寫入鄰近記憶體空間，如果遇到寫入非法記憶體的時候甚至不會給出 error。解決方案是用 `strncpy` 代替，如果遇到過長的字串，在 buffer 的階段就會用尾端塞入 `\0` 用於截斷過長的字串，在真正讀取這個 buffer 的時候便只會讀到 `\0` 之前的字串並且停止，然後只比較前 n 個字元 - **`sprintf`** 一樣也是不檢查 buffer 長度的問題，解決方式是 ### 移除 lab0 中不安全的函式 ###### tags: `linux2021`