2021 年春季 Linux 核心設計課程作業 —— lab0

--- title: 2021 年春季 Linux 核心設計課程作業 —— lab0 image: https://i.imgur.com/robmeiQ.png description: 改寫自 CMU Introduction to Computer Systems 課程第一份作業，擴充 GNU/Linux 開發工具的使用並強化自動分析機制。 --- # J01: lab0 ###### tags: `linux2021` > 主講人: [jserv](http://wiki.csie.ncku.edu.tw/User/jserv) / 課程討論區: [2021 年系統軟體課程](https://www.facebook.com/groups/system.software2021/) :mega: 返回「[Linux 核心設計](http://wiki.csie.ncku.edu.tw/linux/schedule)」課程進度表 ==[作業解說錄影](https://youtu.be/a8GwXtrUc6Q)== / ==[Code Review 解說錄影](https://youtu.be/3k_tJa-f_4M)== ## :memo: 預期目標 * [C 語言程式設計](https://hackmd.io/@sysprog/c-programming) 議題，如[不定個數參數的處理](https://en.wikibooks.org/wiki/C_Programming/stdarg.h), [signal](https://en.wikibooks.org/wiki/C_Programming/signal.h), [setjmp/longjmp](https://en.wikibooks.org/wiki/C_Programming/setjmp.h) 和記憶體管理; * 學習 [GNU/Linux 開發工具](https://hackmd.io/@sysprog/gnu-linux-dev) - [Cppcheck](http://cppcheck.sourceforge.net/): 靜態程式碼分析工具 - [Valgrind](https://valgrind.org/): 動態程式分析工具 * 學習使用 Git 與 GitHub * 樹立一致且易於協作的程式開發規範 * 研究自動測試機制 * 接觸 [Linux Programming Interface](http://man7.org/tlpi/) * 學習 [coroutine](https://en.wikipedia.org/wiki/Coroutine)^New^ ### :rocket: 改寫自 CMU 計算機概論的作業 [CMU](https://www.cmu.edu/) (位於美國賓州匹茲堡的卡內基美隆大學) 電腦科學系的 [Introduction to Computer Systems (ICS)](http://www.cs.cmu.edu/~213/index.html) 課程準備了 [C Programming Lab](https://www.cs.cmu.edu/afs/cs/academic/class/15213-s20/www/labs/cprogramminglab.pdf) 作為檢閱學生 C 語言程式設計的認知: * 記憶體管理; * 建立並管理需要透過指標操作的資料結構; * 字串操作; * 改善特定關鍵操作的效能; * 提高程式碼的穩固程度，例如能夠處理無效的參數，包含 NULL 指標; 在給定的程式碼中，[queue.h](https://github.com/sysprog21/lab0-c/blob/master/queue.h) 包含以下鏈結串列 ([linked list](https://en.wikipedia.org/wiki/Linked_list)) 結構定義: ```cpp /* Linked list element */ typedef struct ELE { char *value; struct ELE *next; } list_ele_t; ``` 接著用鏈結串列來實作佇列 ([queue](https://en.wikipedia.org/wiki/Queue_(abstract_data_type))) ```cpp /* Queue structure */ typedef struct { list_ele_t *head; /* Linked list of elements */ /* 其他自行定義的結構成員 */ } queue_t; ``` 這過程可用下圖來表示：佇列的上層建構於 `queue_t` 類型的物件。一開始，此資料結構僅包含單一成員 `head`，之後陸續以鏈結串列增添給定的字串，如 `"cab"`, `"bead"`, `"cab"` 等等。佇列內的每個元素由單向的鏈結串列構成，每個元素由 `list_ele_t` 類型的物件來表示，後者包含真正保存字串的 `value` 和指向下個鏈結串列元素開頭地址的 `next`。 ![圖 1](https://i.imgur.com/uZqLc9b.png) > 圖 1: 透過鏈結串列實作的佇列。每個鏈結串列元素都有個 `value` 指標，指向一段連續的記憶體 (即 C-style string，以 NULL 字元結尾的字串)，字元依據 ASCII 編碼 (以十六進位顯示)。 C 語言中，字串以連續的`char` 型態物件的排列來表示，對多數的硬體來說，`char` 型態表示為 1 個 byte。而為了儲存長度 `l` 的字串，需要至少 `l + 1` 個 `char` 元素，並在最後設定為 `\0` 字元。上圖的 `[cab, bead, cab]`，其中字元 `a` 和 `e` 以十六進位表示為 ASCII 編碼的 `61` 和 `65`。觀察上圖，字串 `"cab"` 和 `"bead"` 在執行時期可能對應到兩段不相鄰的記憶體。在給定的 [C Programming Lab](http://www.cs.cmu.edu/~213/labs/cprogramminglab.pdf) 程式碼中，佇列透過 `queue_t *` 指標型態來操作，程式碼需要考慮以下二種特別狀況，作出對應的處置: 1. `NULL` 佇列：是指標值為 `NULL`; 2. 「空」(empty) 的佇列是指向有效結構，但開頭 (`head`) 的值為 `NULL`; :::info 另一種解讀方式: 若你有空的紅包袋，裡面的東西是 empty; 若你連紅包袋都沒有，就可以說 NULL; ::: [queue.c](https://github.com/sysprog21/lab0-c/blob/master/queue.c) 僅提供介面 ([interface](https://en.wikipedia.org/wiki/Interface-based_programming)) 但尚未有完整程式實作，需要由學員補完並逐步精進，包含以下: * `q_new`: 建立新的「空」佇列; * `q_free`: 釋放佇列所佔用的記憶體; * `q_insert_head`: 在佇列開頭 (head) 加入 (insert) 給定的新元素 (以 LIFO 準則); * `q_insert_tail`: 在佇列尾端 (tail) 加入 (insert) 給定的新元素 (以 FIFO 準則); * `q_remove_head`: 在佇列開頭 (head) 移去 (remove) 給定的元素。 * `q_size`: 計算佇列中的元素數量。 * `q_reverse`: 以反向順序重新排列鏈結串列，該函式不該配置或釋放任何鏈結串列元素，換言之，它只能重排已經存在的元素; * ==`q_sort`==: 「[Linux 核心設計](http://wiki.csie.ncku.edu.tw/linux/schedule)」課程所新增，以==遞增順序==來排序鏈結串列的元素，可參閱 [Linked List Sort](https://npes87184.github.io/2015-09-12-linkedListSort/) 得知實作手法; :::info :information_source: "delete" 和 "remove" 看似都是「刪去某物」，在 Linux 核心的 [include/list.h](https://github.com/torvalds/linux/blob/master/include/linux/list.h) 中，這二個動作並存，但實際行為卻不同。[Difference between "delete" and "remove"](https://english.stackexchange.com/questions/52508/difference-between-delete-and-remove) 其中的解釋可知： * "remove" 這動作對於 linked list 來說，是斷開節點和節點的關聯，也就是說原本若是 A $\to$ B $\to$ C，在 `remove(B)` 操作完成後，就會變成 A $\to$ C 的節點關係。特別注意到 B 這個節點其實還存在於記憶體中，只是我們無法從節點間的關係看出來。於是 "remove" 可解讀為 "take away" (將某物帶走) * "delete" 則有 "erase" 的涵義，若用於 linked list，則指某個節點將被「抹除」，對應的記憶體就會有變化 ::: 在檔案 [queue.c](https://github.com/sysprog21/lab0-c/blob/master/queue.c) 的程式碼註解可見到更詳盡的資訊，包括如何處理無效的操作 (如，從「空」佇列或 `NULL` 佇列中刪除)，及應具有的副作用和返回值。其中 `q_insert_tail` 和 `q_size` 需要將原本 $O(n)$ 時間複雜度的實作改寫為 $O(1)$ 時間複雜度，亦即無論佇列空間為何，該操作僅需要固定數量的步驟。你的程式碼將透過 `1,000,000` 個以上元素的佇列進行測試。「[Linux 核心設計](http://wiki.csie.ncku.edu.tw/linux/schedule)」課程延伸 [C Programming Lab](http://www.cs.cmu.edu/~213/labs/cprogramminglab.pdf) 並追加以下機制: 1. 探索 [GNU/Linux 開發環境](https://hackmd.io/@sysprog/gnu-linux-dev)並善用 [Git](https://git-scm.com/) 進行版本控制 * 整合 [Valgrind](https://valgrind.org/) 動態程式分析工具，作為精準的記憶體分析和除錯 3. 除了原本對佇列的插入、刪除、反轉等操作外，額外要求針對佇列和鏈結串列撰寫排序程式，並思考高效能的實作; * 原本作業註解提示 `It should operate in O(1) time` 卻未有自動評分，本課程參照 [dudect](https://github.com/oreparaz/dudect) 工具及對應的論文 [Dude, is my code constant time?](https://eprint.iacr.org/2016/1123.pdf)，提出以實驗而非理論驗證時間複雜度的方法 5. 支援 [Address Sanitizer](https://github.com/google/sanitizers/wiki/AddressSanitizer)，強化執行時期的記憶體檢測; 6. 引入 [Git pre-commit hook](https://git-scm.com/book/en/v2/Customizing-Git-Git-Hooks) * 引入 [git-good-commit](https://github.com/tommarshall/git-good-commit)，以互動操作來提示學員撰寫[良好的 Git commit message](https://chris.beams.io/posts/git-commit/) * 透過 [clang-format](https://clang.llvm.org/docs/ClangFormat.html) 確保一致的程式風格 * 透過 [GNU Aspell](http://aspell.net/) 進行拼字檢查; * 開啟更多 [Cppcheck](http://cppcheck.sourceforge.net/) 靜態程式碼檢查功能; * 依據 CERN [Common vulnerabilities guide for C programmers ](https://security.web.cern.ch/security/recommendations/en/codetools/c.shtml) 建議而移除 gets / sprintf / strcpy 等不安全的函式; 4. 改善 Makefile，使其更加模組化; 5. 縮減程式碼，移除不使用的部分; 6. 引入 [Git pre-push hook](https://git-scm.com/book/en/v2/Customizing-Git-Git-Hooks) * 檢查是否自給定的 GitHub repository 進行 fork; * 當提交 master 分支到遠端時確認程式碼可通過編譯; * 提示學員 [Why is Git always asking for my password?](https://help.github.com/en/github/using-git/why-is-git-always-asking-for-my-password) 上述的素養對於接觸有一定規模的開放原始碼專案 (特別是 Linux 核心)，是非常關鍵的前期訓練。給定的程式碼在編譯後，會得到名為 `qtest` 的執行檔，後者提供許多測試這次作業可用的工具，執行 `qtest` 後，輸入 `help` 即有各命令的解說。參考執行過程: (以 `$ ` 開頭的字串表示應當在命令列執行的命令) ```shell $ ./qtest cmd> help Commands: # ... | Display comment free | Delete queue help | Show documentation ih v [n] | Insert v at head of queue n times (default: n == 1) it v [n] | Insert v at tail of queue n times (default: n == 1) log file | Copy output to file new | Create new queue option [name val] | Display or set options quit | Exit program reverse | Reverse queue rh [v] | Remove from head of queue. Optionally compare to expected value v rhq [v] | Remove from head of queue without reporting value show | Show queue contents size [n] | Compute queue size n times (default: n == 1) sort | Sort queue in ascending order source file | Read commands from source file time cmd arg ... | Time command execution Options: echo 1 Do/don't echo commands error 5 Number of errors until exit fail 30 Number of times allow queue operations to return false malloc 0 Malloc failure probability percent verbose 4 Verbosity level ``` :::info :notebook: "command" 一詞的翻譯是「命令」，常見於終端機裡頭執行的 [shell](https://en.wikipedia.org/wiki/Unix_shell)，而 "instruction" 的翻譯是「指令」，特別指在 CPU (central processing unit，中央處理單元) 所執行的 instruction。儘管在漢語中，這兩個詞彙很容易相互混淆，但「[Linux 核心設計](http://wiki.csie.ncku.edu.tw/linux/schedule)」課程特別強調兩者差異，並期望學員務必使用精準的用語。 ::: 本程式內建自動評分程式，以下是參考執行輸出。一開始什麼都沒做，當然是零分: ``` Test performance of insert_tail, size, and reverse ERROR: Need to allocate separate string for each list element ERROR: Insertion of gerbil failed (1 failures total) ERROR: Computed queue size as 0, but correct value is 2 ERROR: Computed queue size as 0, but correct value is 2 --- trace-15-perf 0/7 --- TOTAL 0/100 ``` ## :package: 開發環境設定 :::warning :warning: 由於 [lab0-c](https://github.com/sysprog21/lab0-c) 進行前一節所提的客製化，對 GNU/Linux 和開發工具有高度依賴，所以學員務必依據下方描述，事先做好開發環境設定，再修改給定程式碼 ::: 1. 依據 [GNU/Linux 開發工具共筆](https://hackmd.io/@sysprog/gnu-linux-dev) 的指引，設定好 Ubuntu Linux (在本例可透過虛擬機器安裝), git, [GitHub](https://github.com/) 帳號, vim (或其他你偏好的編輯器) gnuplot 2. 安裝必要的開發工具套件 ```shell $ sudo apt install build-essential git-core valgrind $ sudo apt install cppcheck clang-format aspell colordiff ``` 3. 如果你之前尚未設定或使用過 GitHub 服務，請依循 [GitHub 設定指引](http://wiki.csie.ncku.edu.tw/github)。當然，你需要學習 git，可參考線上中文教材 ==《[為你自己學 Git](https://gitbook.tw/)》== 4. 學員需要知曉 GNU/Linux 命令列的操作，可參考成功大學校友[鳥哥](http://linux.vbird.org/vbird/) (蔡德明博士) 整理的 ==[鳥哥的 Linux 私房菜](http://linux.vbird.org/)== 進行線上學習。相關的章節: * [Linux 的檔案權限與目錄配置](http://linux.vbird.org/linux_basic/0210filepermission.php) * [Linux 檔案與目錄管理](http://linux.vbird.org/linux_basic/0220filemanager.php) * [檔案與檔案系統的壓縮、打包與備份](http://linux.vbird.org/linux_basic/0240tarcompress.php) 5. 在 Ubuntu Linux 18.04-LTS 的 [Cppcheck](http://cppcheck.sourceforge.net/) 套件版本較舊，可能會使後文給定的程式碼進行靜態分析時，遇到 [false positive](https://en.wikipedia.org/wiki/False_positives_and_false_negatives)，因此最好升級到 Cppcheck `v1.90`，可用 `$ cppcheck --version` 檢查版本，若小於 `1.90` 則需要額外準備: * 以 [snap](https://snapcraft.io/) 安裝 Cppcheck，此法單純且直接: ```shell $ sudo snap install cppcheck $ export PATH=/snap/bin:$PATH ``` 至於 `$PATH` 環境變數的更新，也可寫在 `~/.bashrc` 檔案最底端，下次重開終端機就生效; 或是`$ source ~/.bashrc`也可 * 另外，你也可從 [Cppcheck 原始程式碼](https://github.com/danmar/cppcheck/) 開始編譯，這樣可取得最新發展的 Cppcheck，有助於揭露和修正問題。首先移除現有 Cppcheck 套件: `$ sudo apt remove cppcheck`，再依據 [Cppcheck Developer Info](http://cppcheck.sourceforge.net/devinfo/) 網頁指示，編譯並安裝 Cppcheck; :::success "If I had eight hours to chop down a tree, I’d spend six hours sharpening my axe." -- Abraham Lincoln > 「如果我有 8 小時可以砍 1 棵樹，我會花 6 小時把斧頭磨利。」 -- 亞伯拉罕．林肯 > (類似漢語「工欲善其事，必先利其器」) ::: ## :dart: 取得程式碼並進行開發 :::warning :warning: 如果你在 2021 年 2 月 23 日 15:00 之前，已從 GitHub [sysprog21/lab0-c](https://github.com/sysprog21/lab0-c) 進行 fork，請依據 ==[Alternatives to forking into the same account](https://github.community/t5/Support-Protips/Alternatives-to-forking-into-the-same-account/ba-p/7428)== 一文，對舊的 repository 做對應處置，然後重新 fork ::: 首先，在 Ubuntu Linux 環境建立適合開發的目錄 (directory)，例如 `/home/ubuntu/linux2020`: ```shell $ cd ~ $ mkdir -p linux2020 ``` :::info :warning: 避免在 Ubuntu Linux 的「桌面」建立開發用的目錄，因為一旦路徑中出現漢字，在後續開發可能會遭遇困難，而且命令列輸入也不便。 :warning: "directory" 一詞的翻譯是「目錄」，不要跟 folder (檔案夾) 搞混了，後者是 Microsoft Windows 提出的用語。"directory" 這詞彙已行之有年，我們會說 UNIX directory，而非 folder。 ::: 在剛才建立的新目錄，自 GitHub 取得 [lab0-c](https://github.com/sysprog21/lab0-c) 程式碼: ```shell $ git clone https://github.com/sysprog21/lab0-c ``` :::success 一旦你 fork 後，可將上面 `sysprog21` 換為你的 GitHub 帳號，對應的命令列會是: ```shell $ git clone git@github.com:你的帳號名稱/lab0-c ``` ::: 切換到 `lab0-c` 目錄並嘗試編譯程式碼: ```shell $ cd lab0-c $ make ``` 預期會看到以下輸出: ``` CC qtest.o CC report.o CC console.o CC harness.o CC queue.o LD qtest ``` 也可清除編譯輸出的檔案: ```shell $ make clean ``` 若你想知道編譯過程的細節，可執行以下命令: ```shell $ make VERBOSE=1 ``` 預期輸出: ``` gcc -o qtest.o -O1 -g -Wall -Werror -c -MMD -MF .qtest.o.d qtest.c gcc -o report.o -O1 -g -Wall -Werror -c -MMD -MF .report.o.d report.c gcc -o console.o -O1 -g -Wall -Werror -c -MMD -MF .console.o.d console.c gcc -o harness.o -O1 -g -Wall -Werror -c -MMD -MF .harness.o.d harness.c gcc -o queue.o -O1 -g -Wall -Werror -c -MMD -MF .queue.o.d queue.c gcc -o qtest qtest.o report.o console.o harness.o queue.o ``` 不難發現，名為 `qtest` 的執行檔正是編譯過程最終的輸出。請不要被上述一堆看似嚇人的參數給迷惑了。UNIX 風格 (當然 GNU/Linux 也繼承這優良傳統) 是儘量揭露系統資訊給使用者，這也是「[Linux 核心設計](http://wiki.csie.ncku.edu.tw/linux/schedule)」課程特別想讓學員體會到的精神。 :::info :+1: 關於上述 make 的行為，可參見 ==[Makefile 語法和示範](https://hackmd.io/@sysprog/SySTMXPvl)== :+1: 上述 gcc 和 `-o` 一類的參數使用，可參見 ==[你所不知道的 C 語言：編譯器和最佳化原理篇](https://hackmd.io/@sysprog/c-compiler-optimization)== ::: 參閱 [lab0-c](https://github.com/sysprog21/lab0-c) 的文件 (即 `README.md` 檔案) 後，我們得知能用以下命令測試 `qtest`: ```shell $ make check ``` 對應的程式輸出如下: ``` ./qtest -v 3 -f traces/trace-eg.cmd cmd> cmd> # Demonstration of queue testing framework cmd> # Use help command to see list of commands and options cmd> # Initial queue is NULL. cmd> show q = NULL cmd> # Create empty queue cmd> new q = [] cmd> # Fill it with some values. First at the head cmd> ih dolphin ``` 參閱 `qtest` 的 `help` 命令輸出，我們可得知以下: |qtest comand | function | |---|---| | new | Create new queue | | free | Delete queue | | ih str [n] | Insert string str at head of queue n times (default: n == 1) | | it str [n] | Insert string str at tail of queue n times (default: n == 1) | 也就是說，`$ make check` 將 [traces/trace-eg.cmd](https://github.com/sysprog21/lab0-c/blob/master/traces/trace-eg.cmd) 檔案內容指派給 `qtest`，進行以下操作: (僅列出部分) 1. ==new==: 建立新的佇列; 2. ==ih==: 從佇列開頭 (head) 插入一個字串，內容是 "dolphin"; 3. ==ih==: 繼續從佇列開頭插入一個字串，內容是 "bear"; 4. ==ih==: 繼續從佇列開頭插入一個字串，內容是 "gerbil" ([沙鼠](https://zh.wikipedia.org/wiki/%E6%B2%99%E9%BC%A0)); 5. ==it==: 從佇列尾端 (tail) 插入一個字串，內容是 "meerkat" ([狐獴](https://zh.wikipedia.org/wiki/%E7%8B%90%E7%8D%B4)); 6. ==it==: 繼續從佇列尾端 (tail) 插入一個字串，內容是 "bear"; 7. ==reverse==: 將佇列內部的鏈結串列順序予以反向; :::info :notebook: 你注意到檔名叫做 `trace-eg` 嗎？顯然這是個示範的檔案，但為何叫作 `eg` 呢？為何不用 "example" 開頭兩個字母來簡稱 "ex" 呢？這可有學問！"e.g." 是拉丁文 "exempli gratia" 的縮寫，意思是「例如」、「舉例來說」，即英文的 "for example" 或 "for instance"。 :notebook: 需要特別注意的是，==沒有 e.x. 這種縮寫== 而且 ==ex.== 的意思是「範例或練習」，也就是英文的 "exercise"。別再把 `ex.` 放到句子裡當「例如」的用途！ ::: 由於 [lab0-c](https://github.com/sysprog21/lab0-c) 已整合 [linenoise](https://github.com/antirez/linenoise)，我們可善用其中的自動補齊 (auto-complete) 功能，假如目前已輸入 `he`: ``` cmd> he |___ 游標在此 ``` 按下 tab 鍵，則完整的命令 `help` 會自動補齊，同樣地，若我們已輸入 `op` 再按下 tab 鍵，則會得到 `option`。此外，在 `cmd> ` 命令提示列中，我們可按「上」鍵和「下」鍵來切換歷史紀錄的命令。初步理解 `qtest` 運作機制後，就可前進到 [lab0-c](https://github.com/sysprog21/lab0-c) 的自動評分系統。操作命令如下: ```shell $ make test ``` 實際就是執行 [scripts/driver.py](https://github.com/sysprog21/lab0-c/blob/master/scripts/driver.py) 這個 Python 程式。同樣地，請不要因為自己沒學過 Python 程式語言就退堂 (或「退選」)，你可以快速瀏覽，應能發現，這個自動評分程式就是將 traces/trace-XX-CAT.cmd 這樣形式的命令序列提供給 `qtest` 內建的命令直譯器，然後依據給定的檢驗標準，逐一判定該給測試者多少分數。或許你會糾結於 `driver.py` 這個檔名，在你的印象中，"driver" 的翻譯是「驅動程式」，不就是該用低階程式語言撰寫嗎？怎麼會跟 Python 和自動評分系統有關？查閱 [dictionary.com](https://www.dictionary.com/) 的 "[driver](https://www.dictionary.com/browse/driver)" 詞目，我們會發現: > Computers. software or hardware that controls the interface between a computer and a peripheral device. > Electronics. a circuit whose output provides the input of another circuit 在電子電機電腦科學 (即 [EECS](https://en.wikipedia.org/wiki/Computer_Science_and_Engineering)) 領域中，"driver" 可以是軟體，也能是硬體，顯然上述的 "driver.py" 是軟體，其作用是控制電腦程式 (這裡指給定的命令序列，如 `ih` 及 `it`) 和特定的單元，後者就是由 `qtest` 所提供的命令直譯器。平常我們看到「驅動程式」的翻譯方式，其實是 device driver 的簡稱，在程式設計中，我們可見到類似的 "driver" 案例，如: * database driver: [JDBC driver](https://en.wikipedia.org/wiki/JDBC_driver) * compiler driver: gcc 本身就是 * test driver: 在 [Unit testing](http://softwaretestingfundamentals.com/unit-testing/) 常見; 另外，[lab0-c](https://github.com/sysprog21/lab0-c) 專案支援透過 `$ make SANITIZER=1` 開啟 [Address Sanitizer](https://github.com/google/sanitizers/wiki/AddressSanitizer) 從而強化執行時期的記憶體檢測，隨後 `$ make test` 執行過程中會見到類似以下輸出: ``` ==8522==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000008 (pc 0x55ea517092cb bp 0x7ffe778b4900 sp 0x7ffe778b4900 T0) ==8522==The signal is caused by a READ memory access. ==8522==Hint: address points to the zero page. #0 0x55ea517092ca in q_remove_head lab0-c/queue.c:74 #1 0x55ea51704880 in do_remove_head lab0-c/qtest.c:311 #2 0x55ea51707054 in interpret_cmda lab0-c/console.c:217 #3 0x55ea51707a58 in interpret_cmd lab0-c/console.c:240 #4 0x55ea51708725 in cmd_select lab0-c/console.c:568 #5 0x55ea51708b42 in run_console lab0-c/console.c:627 #6 0x55ea51705c7d in main lab0-c/qtest.c:700 #7 0x7facce0d8b96 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21b96) #8 0x55ea51703819 in _start (lab0-c/qtest+0x5819) ``` 這是**符合預期**的結果，請不要慌張。你可依循上述訊息，追查程式碼檔名及行號，推測為何會觸發 SIGSEGV (在 UNIX 及其相容系統上，意味著 [Segmentation fault](https://en.wikipedia.org/wiki/Segmentation_fault))。請搭配閱讀以下簡報: * ==[Address/Thread/Memory Sanitizer](https://www.slideshare.net/sermp/sanitizer-cppcon-russia)== * ==[A look into the sanitizer family (ASAN & UBSAN)](https://www.slideshare.net/CysinfoCommunity/a-look-into-the-sanitizer-family-asan-ubsan-by-akul-pillai)== ### clang-format 工具和一致的程式撰寫風格使用一致的 [programming style](https://en.wikipedia.org/wiki/Programming_style) 很重要，我們可透過 [clang-format](https://clang.llvm.org/docs/ClangFormat.html) 這個工具來調整作業程式要求的風格，使用方式如下: ```shell $ clang-format -i *.[ch] ``` 課程要求的 C 程式撰寫風格簡述: * 使用 ==4 個空白字元==來進行縮排，不用 Tab; - 為何不比照 Linux 核心都用 tab 呢？ - 首先是為了 code review 的便利，4 個空白字元可避免程式碼過寬，從而易於課堂討論，且和共筆系統 (這裡指 [HackMD](https://hackmd.io/)) 預設的縮排方式相符。 - 再者是考慮到不同的編輯器 (editor) 對於 tab 行為有落差，而且 [Google C++ Style Guide](https://google.github.io/styleguide/cppguide.html) 也建議使用空白字元而非 tab，這樣的風格被許多開放原始碼專案所採納; * 將單一行程式碼寬度控制在 80 個字元內 - 任何一行超過 80 列寬度的敘述都該拆分成多個行 * switch 敘述的縮排方式是讓 case 與 switch 對齊，範例: ```cpp switch (c) { case 'h': usage(argv[0]); break; ``` * 延續 [K&R C](https://en.wikipedia.org/wiki/The_C_Programming_Language) 風格去處理大括號 (`{ }` 英語: curly brackets，又稱花括號) 與空格 - Kernighan 和 Ritchie 在撰寫《[The C Programming Language](https://en.wikipedia.org/wiki/The_C_Programming_Language)》一書所用的程式範例，是把左大括號 `{` 放在行末，把右大括號 `}` 放在行首，如: ```cpp if (x == true) { do_y(); } else { do_z(); } do { /* body of do-loop */ } while (condition); ``` - 然而有個特殊的例子，就是函式：函式的左大括號應該放在行首，如： ```cpp int function(int x) { body of function } ``` * 在不降低可讀性的前提下，儘可能減少空行的數量。無用的換行一旦減少，那就有更多的空行來寫註解。 * Linux 核心風格對於空格，體現於一些關鍵字的運用，即在關鍵字之後增添一個空格。值得關注的例外是一些長得像函式的關鍵字，如: `sizeof`, `typeof`, `alignof`, `__attribute__` (註: 這是 [gcc 延展的功能](https://gcc.gnu.org/onlinedocs/gcc/Attribute-Syntax.html))，在 Linux 核心中，這些關鍵字的使用都會帶上一對括號，儘管在 C 語言的使用上並不需帶上括號 * `sizeof` 不是函式，而是 operator! * 在這些關鍵字之後新增一個空格: `if`, `switch`, `case`, `for`, `do`, `while` - 但不要新增在 `sizeof`, `typeof`, `alignof`, `__attribute__` 之後 - 期望見到的用法是 ```cpp s = sizeof(struct file); ``` * 不要在括號周圍多此一舉的新增空格 - 下面這個例子糟透了 ```cpp s = sizeof( struct file ); ``` * 在宣告指標或者返回值為指標的函式時，指標 (即 `*` 符號) 的位置應該緊靠著變數名稱或函式名稱，而非型別名稱，例如： ```cpp char *linux_banner; unsigned long long memparse(char *ptr, char **retptr); char *match_strdup(substring_t *s); ``` * 在二元操作符和三元操作符周圍新增一個空格 - 例如: `= + - < > * / % | & ^ <= >= == != ? :` - 但不要在一元操作符之後新增空格: `&*+-~!` `sizeof` `typeof` `alignof` `__attribute__` `defined` - 不要在後綴的 `++` `--` 運算子之前新增空格 (即 `i++`) - 不要在開頭的 `++` `--` 之後新增空格 (即 `++i`) - 不要在結構體成員運算子 (即 `.` 和 `->`) 周圍新增空格 * 不要在行末新增多餘的空格 - 某些編輯器的「智慧」縮排會幫你在行首新增一些空格，好讓你在下一行可以立即撰寫程式碼。但某些編輯器不會自動刪去多餘的空格，儘管你已經寫完了一行程式碼。比如你只想留一行空行，但是編輯器卻「好心」地幫你填上了一些空格。這樣一來，你就在行末添加多餘的空格。 * 變數和函式命名力求簡潔且精準 - C 是種==簡潔粗獷==的語言，因此命名也該簡潔 - C 程式設計師不會像 Pascal 程式設計師那樣使用 `ThisVariableIsATemporaryCounter` 這種「可愛」的名字，相反地，一個 C 程式設計師會把這種變數命名為 `tmp`，如此簡潔易寫。 - 全域變數 (只有當你真正需要的時候才用它) 和全域函式 (也就是沒有用 `static` 宣告的函式) 需要使用描述性的名稱。若你有個計算活躍使用者數量的函式，你應該用 count_active_users() 一類的名稱，避免用 `cntusr()` 這樣不易望文生義的名稱。 - 起一個包含函式型別的名字([匈牙利命名法](https://en.wikipedia.org/wiki/Hungarian_notation))是摧殘大腦的行為，編譯器知道函式的型別並且會檢查型別，這樣的名字不會起到任何幫助，它僅僅會迷惑程式設計師。 - 區域變數名應該簡短，若你需要寫一個迴圈，定義一個計數器，在不產生歧義的情況下，你大可命名為 `i`。反過來說，命名為 `loop_counter` 是生產力很低的行為。同樣地，`tmp` 可以是任何型別的區域變數。 :::info :-1: 你可以想像 Apple 和 Google 的工程師隨便安置程式碼，然後不用管合作的議題嗎？ :balloon: 「[Linux 核心設計](http://wiki.csie.ncku.edu.tw/linux/schedule)」課程希望引導學員最終能夠欣然面對 Linux 核心或者有一定規模的軟體專案，不再只是「[舉燭](http://dict.revised.moe.edu.tw/cgi-bin/cbdic/gsweb.cgi?ccd=UOmTQ4&o=e0&sec1=1&op=sid=%22Z00000158282%22.&v=-2)」，而是真正和世界各地的高手協作，上述看似繁雜的程式開發風格就會是相當基本且該貫徹的基本素養。 :+1: 或許你會反問：「只是一個作業，有必要這樣自虐嗎？」不妨這樣想：即便一個人寫作業，其實是三人的參與 —— 過去的你、現在的你，以及未來的你 ::: ### [Git Hooks](https://www.atlassian.com/git/tutorials/git-hooks) 進行自動程式碼排版檢查首次執行 `make` 後，Git pre-commit / pre-push hook 將自動安裝到現行的工作區 (workspace)，之後每次執行 `git commit` 時，Git hook 會檢查 C/C++ 原始程式碼的風格是否一致，並透過 [Cppcheck](http://cppcheck.sourceforge.net/) 進行靜態程式碼檢查。 :::warning :warning: 任何人都可以寫出機器看得懂的程式碼 (在 Windows 檔案總管裡面，隨便選一個 EXE 檔，按右鍵複製，隨後再貼上即可)，但我們之所以到資訊工程系接受訓練，為了寫出人看得懂、可持續維護和改進的程式 ::: 下圖展示 Git pre-commit hook 偵測到開發者的修改並未遵守一致的 coding style，主動回報並提醒開發者: ![](https://i.imgur.com/g7DQtYF.png) 紅色標注的二行程式碼 (即 `int member1;` 和 `int member2;`) 不符合指定的 4 個空白縮排方式，在 git pre-commit 階段就成功阻擋這樣風格不一致的程式碼變更。 ### 撰寫 Git Commit Message 和自動檢查機制 ![](https://i.imgur.com/stK5oBN.png) Git commit message 是什麼呢？在取得 [lab0-c](https://github.com/sysprog21/lab0-c) 程式碼後，執行 `$ git log` 的輸出就是了。你或許會納悶，commit message 又不是程式碼，充其量只能算是「程式開發的軌跡」，為何要特別探討？ :::info :notebook: 可安裝 `tig` 套件，更便利地瀏覽 git repository 資訊。安裝方式: `$ sudo apt install tig` 參考執行畫面如下: ![](https://i.imgur.com/pMRybxY.png) ::: Peter Hutterer 在 [On commit messages](https://who-t.blogspot.com/2009/12/on-commit-messages.html) 說得很精闢： > 「重新了解一段程式碼更動的脈絡很浪費腦力。雖然這件事情沒辦法完全避免，但是我們可以盡量[降低](https://www.osnews.com/story/19266/wtfsm/)這件事情的複雜度。Commit messages 正可以做到這點，而**我們可以從 commit message 看出一個開發者是否為一位好的合作對象**。」一個專案是否能長期且成功地運作 (撇除其他影響的因素)，取決於它的可維護性，而在這件事上沒有其他工具比專案本身的開發軌跡更為強大。因此花時間學習如何撰寫與維護專案的開發紀錄，是件值得投資的事。一開始你可能會覺得麻煩，但它很快就會變成一種習慣，甚至能成為你之所以感到自豪及具備生產力的因素。 Chris Beams 在 [How to Write a Git Commit Message](https://chris.beams.io/posts/git-commit/) 一文 (繁體中文翻譯: [如何寫一個 Git Commit Message](https://blog.louie.lu/2017/03/21/%E5%A6%82%E4%BD%95%E5%AF%AB%E4%B8%80%E5%80%8B-git-commit-message/)) 提出，好的 Git commit message 應符合七條規則: 1. 用一行空白行分隔標題與內容 2. 限制標題最多只有 50 字元 3. 標題開頭要大寫 4. 標題不以句點結尾 5. 以祈使句撰寫標題 6. 內文每行最多 72 字 7. 用內文解釋 what 以及 why vs. how [lab0-c](https://github.com/sysprog21/lab0-c) 內建 Git pre-commit hook 就包含依循上述七條規則的自動檢查，可阻擋不符合規則的 git commit message，此外還透過 [GNU Aspell](http://aspell.net/) 對 commit message 的標題進行拼字檢查。 :::info :warning: 既然我們著眼於 Linux 核心，當然要用英語撰寫 commit message，英語不該只是大學畢業門檻，更該要活用，「[Linux 核心設計](http://wiki.csie.ncku.edu.tw/linux/schedule)」課程還兼英語寫作的功效，真划算！ ::: 以下實際操作 Git。事先編輯檔案 `queue.h` 後，執行 `$ git commit` 會發現: ```shell $ git commit On branch master Your branch is up-to-date with 'origin/master'. Changes not staged for commit: modified: queue.h no changes added to commit ``` 原來要指定變更的檔案，用命令 `$ git add`: ```shell $ git add queue.h ``` :::success :notebook: 可改用 `$ git commit -a` 這道命令，讓 Git 自動追蹤變更的檔案並提交 ::: 重新 `git commit`，這時可發現安裝的 Git hooks 發揮作用: ```shell= $ git commit -m "add fields to point to the tail" add fields to point to the tail [line 1] - Capitalize the subject line Proceed with commit? [e/n/?] e add fields to point to the tail [line 1] - Capitalize the subject line Proceed with commit? [e/n/?] y How to Write a Git Commit Message: https://chris.beams.io/posts/git-commit/ e - edit commit message n - abort commit ? - print help Proceed with commit? [e/n/?] ? How to Write a Git Commit Message: https://chris.beams.io/posts/git-commit/ e - edit commit message n - abort commit ? - print help Proceed with commit? [e/n/?] e [master 23f7113] Add fields to point to the tail ``` Git pre-commit hook 提示我們: * 第 4 行輸入 `e`，可編輯 git commit message; * 第 7 行輸入 `y`，因為 y 不是有效選項, 所以出現 help; * 第 17 行輸入 `e`, 再次編輯訊息，因訊息有限制標題開頭要大寫; 注意：請避免用 `$ git commit -m`，而是透過編輯器調整 git commit message。許多網路教學為了行文便利，用 `$ git commit -m` 示範，但這樣很容易讓人留下語焉不詳的訊息，未能提升為 [好的 Git Commit Message](https://blog.louie.lu/2017/03/21/%E5%A6%82%E4%BD%95%E5%AF%AB%E4%B8%80%E5%80%8B-git-commit-message/)。因此，從今以後，不要用 `git commit -m`, 改用 `git commit -a` (或其他參數) 並詳細查驗變更的檔案。在 Ubuntu Linux 預設組態中，執行 `$ git commit -a` 會呼叫 [GNU nano](https://www.nano-editor.org/) 來編輯訊息，可透過變更 `EDITOR` 環境變數，讓 git 找到你偏好的編輯器，例如指定 vim 就可以這樣做: ```shell export EDITOR=vim ``` 你也可把上述變更加到 bash 設定檔案中，例如: ```shell $ echo "export EDITOR=vim" >> ~/.bashrc ``` 或者透過 git config 命令來將偏好的編輯器寫入 git 的組態，例如： ```shell $ git config --global core.editor vim ``` 參考連結：[Customizing Git - Git Configuration](https://git-scm.com/book/en/v2/Customizing-Git-Git-Configuration) 一番折騰後，終於順利提交 (commit) 我們的變更: ``` shell $ git commit -a On branch master Your branch is ahead of 'origin/master' by 1 commit. (use "git push" to publish your local commits) nothing to commit, working tree clean ``` 需要注意的是，目前提交的變更**僅存在於這台電腦中**，還未發佈到 GitHub 上，於是我們需要執行 `git push` > 下方的 `jserv` 是示範帳號，請換為你自己的 GitHub 帳號名稱 ```shell $ git push Username for 'https://github.com': jserv Password for 'https://jserv@github.com': Hint: You might want to know why Git is always asking for my password. https://help.github.com/en/github/using-git/why-is-git-always-asking-for-my-password Running pre push to master check... Trying to build tests project... Pre-push check passed! Counting objects: 4, done. Delta compression using up to 64 threads. Compressing objects: 100% (4/4), done. Writing objects: 100% (4/4), 709 bytes | 709.00 KiB/s, done. Total 4 (delta 3), reused 0 (delta 0) remote: Resolving deltas: 100% (3/3), completed with 3 local objects. To https://github.com/sysprog21/lab0-c d2d7711..b42797a master -> master ``` 不難發現上面訊息提到 [Why is Git always asking for my password?](https://help.github.com/en/github/using-git/why-is-git-always-asking-for-my-password) 這個超連結，如果你厭倦每次 `$ git push` 都需要輸入帳號及密碼，可參考超連結內容去變更設定。又因 pre-push hook 使然，每次發布變更到 GitHub 服務時，只要是 `master` branch (主要分支) 就會觸發編譯要求，只有通過編譯的程式碼才能發布，但 `master` 以外的 branch 不受此限，換言之，我們鼓勵學員運用 `git branch` 進行各項實驗，請參照線上中文教材 ==《[為你自己學 Git](https://gitbook.tw/)》==。 ### 牛刀小試若你很有耐心地讀到這裡，恭喜你，終於可著手修改 [lab0-c](https://github.com/sysprog21/lab0-c) 程式碼。這裡我們嘗試實作函式 `q_size`，首先找出 [`queue.c`](https://github.com/sysprog21/lab0-c/blob/master/queue.c) 檔案中對應的註解: > Return number of elements in queue. > Return 0 if q is NULL or empty > Remember: It should operate in **$O(1)$** time `q_size(queue_t *)` 由於要在 $O(1)$ 的常數時間內執行完成，所以也不可能每次都走訪整個鏈結串列，以取得佇列的容積。又在 [`queue.h`](https://github.com/sysprog21/lab0-c/blob/master/queue.h) 的註解，我們得知: > You will need to add more fields to this structure to efficiently implement `q_size` and `q_insert_tail` 因此，我們嘗試新增 **int size** 這個新成員到 `struct queue_t`: ```cpp typedef struct { list_ele_t *head; /* Linked list of elements */ int size; } queue_t; ``` 接著修改 `q_size` 的傳回值，改成傳回 `q->size`。一切就緒後，提交修改: ```cpp $ git commit -m "Change q_size return value to q->size" --- modified queue.c +++ expected coding style @@ -70,7 +70,7 @@ bool q_insert_tail(queue_t *q, char *s) { /* You need to write the complete code for this function */ /* Remember: It should operate in O(1) time */ - list_ele_t *newt; // newt means new tail + list_ele_t *newt; // newt means new tail newt = malloc(sizeof(list_ele_t)); q->tail = newt; return true; [!] queue.c does not follow the consistent coding style. Make sure you indent as the following: clang-format -i queue.c ``` Git pre-commit hook 偵測到程式縮排不符合前述的風格，因而擋住我們的提交，我們要執行 `$ clang-format -i queue.c` 去調整程式縮排，才得以繼續提交。 ### 以 [Valgrind](https://valgrind.org/) 分析記憶體問題 [Valgrind](https://valgrind.org/) 是個在使用者層級 (user space) 對程式在執行時期的行為提供動態分析的系統軟體框架，具備多種工具，可以用來追蹤及分析程式效能，最為人們所熟知的特性就是幫助使用者偵測記憶體錯誤，諸如使用未初始化的記憶體、不當的記憶體配置、或取消配置記憶體，並加以分析。所有工具都包含在 valgrind 套件中，可以透過以下命令執行： ```shell $ valgrind --tool=<toolname> <program> ``` 以記憶體追蹤工具來說，上述 toolname 可以是 memcheck, massif, 或 cachegrind。請注意，使用 Valgrind 時，會讓程式執行速度比平常更慢。依據 [Valgrind is *NOT* a leak checker](http://maintainablecode.logdown.com/posts/245425-valgrind-is-not-a-leak-checker) 的描述: > Valgrind is an undefined behavior checking tool first, a function and memory profiler second, a data-race detection tool third, and a leak checking tool last. dynamic Binary Instrumentation (DBI) 著重於二進位執行檔的追蹤與資訊彙整，而 dynamic Binary Analysis (DBA) 則強調對收集資訊的分析。上述 [Valgrind](https://valgrind.org/) 是個 DBI 系統框架，可建構一系列 DBA 工具，藉由 shadow values 技術來實作，後者要求對所有的暫存器和使用到的主記憶體做 shadow (即自行維護的副本)，這也使得 [Valgrind](https://valgrind.org/) 相較其他分析方法會較慢。為了實作 shadow values，需要以下 4 個部分： * Shadow State * 提供 shadow registers (例如 integer, floating point, SIMD 等暫存器) * 提供 shadow memory * 讀寫操作 9 個具體功能: - instrument read/write instructions - instrument read/write system calls - Allocation and deallocation operations - instrument start-up allocations - instrument system call (de)allocations - instrument stack (de)allocations - instrument heap (de)allocations - Transparent execution, but with extra output - extra output 也就是說，[Valgrind](https://valgrind.org/) 主要的手法是將暫存器和主記憶體的內容自行維護副本，並在任何情況下都可以安全正確地使用，同時記錄程式的所有操作，在不影響程式執行結果前提下，輸出有用的資訊。為了實作功能，[Valgrind](https://valgrind.org/) 利用 [dynamic binary re-compilation](https://en.wikipedia.org/wiki/Dynamic_recompilation) 把測試程式 (稱為 client 程式）的機械碼解析到 VEX 中間表示法 (intermediate representation，簡稱 IR，是種虛擬的指令集架構，規範在原始程式碼 [VEX/pub/libvex_ir.h](https://sourceware.org/git/?p=valgrind.git;a=blob;f=VEX/pub/libvex_ir.h))。VEX IR 在 [Valgrind](https://valgrind.org/) 採用執行導向的方式，以 just-in-time (JIT) 編譯技術動態地把機械碼轉為 IR，倘若觸發特定工具感興趣的事件 (如記憶體配置)，就會跳躍到對應的處理工具，後者會插入一些分析程式碼，再把這些程式碼轉換為機械碼，儲存到 code cache 中，以利後續需要時執行。整個流程如下圖: ``` Machine Code --> IR --> IR --> Machine Code ^ ^ ^ | | | translate | | | | instrument | | translate ``` [Valgrind](https://valgrind.org/) 啟動後，會開始轉換 client 程式，[Valgrind](https://valgrind.org/) 執行的都是加工後的 client 程式。運作細節可參見: * [Valgrind: A Framework for Heavyweight Dynamic Binary Instrumentation](https://valgrind.org/docs/valgrind2007.pdf): 2007 年的論文; * dv 針對上面論文和相關原始程式碼，做出的[繁體中文版本論文導讀](https://wdv4758h-notes.readthedocs.io/zh_TW/latest/valgrind/dynamic-binary-instrumentation.html); ### [Valgrind](https://valgrind.org/) 使用案例由於 [Valgrind](https://valgrind.org/) 是動態追蹤，會從給定的程式一路追蹤到 [glibc](https://www.gnu.org/software/libc/) (GNU C library)，為了更好的分析效果，需要安裝對應包含除錯訊息的套件: ```shell $ sudo apt install libc6-dbg ``` - [ ] 案例 1: Memory Leak 常見的記憶體錯誤，就是配置的記憶體 (一般透過呼叫 [malloc](https://linux.die.net/man/3/malloc) 所得) 在一系列操作後，忘記釋放 (即呼叫 [free](https://linux.die.net/man/3/free))，且已無任何指標指著該記憶體空間。我們舉個簡單的例子，並使用 [Valgrind](https://valgrind.org/) 來協助偵測： (檔名 `case1.c`) ```cpp= #include <stdlib.h> void func(void) { char *buff = malloc(10); } int main(void) { func(); return 0; } ``` 編譯: ```shell $ gcc -o case1 -g case1.c ``` Valgrind 使用方法很簡單，如果有參數就加在後面，程式執行結束就會產生報告： ```shell $ valgrind -q --leak-check=full ./case1 ``` 預期輸出： ``` ==107295== 10 bytes in 1 blocks are definitely lost in loss record 1 of 1 ==107295== at 0x4C2FB0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==107295== by 0x10865B: func (case1.c:3) ==107295== by 0x10866B: main (case1.c:6) ==107295== ``` 報告中指明有一個 10 bytes 的 block 被偵測為 "definitely lost"，位置在 case1.c 的第 3 行: ```cpp=3 char *buff = malloc(10); ``` 可看到 memory lost 分成幾種類型： * definitely lost: 真的 memory leak * indirectly lost: 間接的 memory leak，structure 本身發生 memory leak，而內部的 member 如果是 allocate 的出來的，一樣會 memory leak，但是只要修好前面的問題，後面的問題也會跟著修復。 * possibly lost: allocate 一塊記憶體，並且放到指標 ptr，但事後又改變 ptr 指到這塊記憶體的中間 * still reachable: 程式結束時有未釋放的記憶體，不過卻還有指標指著，通常會發生在 global 變數即便是在所用的函式庫裡頭配置的記憶體，也可偵測到，這也是動態分析手法的優勢。 - [ ] 案例 2: Invalid Memory Access Invalid memory access 有時不會立即造成 segmentation fault，所以不會有 core dump 可分析，於是需要借助像 [Valgrind](https://valgrind.org/) 這類的工具來偵測。一般情況可能是在配置記憶體合法範圍之外的地方進行記憶體操作，或用了已釋放的記憶體 ([use-after-free](https://cwe.mitre.org/data/definitions/416.html))。以下是範例程式： (檔名: `case2.c`) ```cpp= #include <stdlib.h> #include <stdio.h> #include <string.h> int main(void) { /* 1. Invalid write */ char *str = malloc(4); strcpy(str, "Brian"); free(str); /* 2. Invalid read */ int *arr = malloc(3); printf("%d", arr[4]); free(arr); /* 3. Invalid read */ printf("%d", arr[0]); /* 4. Invalid free */ free(arr); return 0; } ``` 比照前一個案例編譯並啟動 [Valgrind](https://valgrind.org/): ```shell $ gcc -o case2 -g case2.c $ valgrind -q --leak-check=full ./case2 ``` 預期可見以下報告輸出: ``` ==13415== Invalid write of size 2 ==13415== at 0x1086FA: main (case2.c:7) ==13415== Address 0x522d044 is 0 bytes after a block of size 4 alloc'd ==13415== at 0x4C2FB0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==13415== by 0x1086EB: main (case2.c:6) ==13415== ==13415== Invalid write of size 1 ==13415== at 0x108700: main (case2.c:7) ==13415== Address 0x522d046 is 2 bytes after a block of size 4 alloc'd ==13415== at 0x4C2FB0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==13415== by 0x1086EB: main (case2.c:6) ==13415== ==13415== Invalid read of size 4 ==13415== at 0x108726: main (case2.c:12) ==13415== Address 0x522d0a0 is 13 bytes after a block of size 3 alloc'd ==13415== at 0x4C2FB0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==13415== by 0x108719: main (case2.c:11) ==13415== ==13415== Invalid read of size 4 ==13415== at 0x10874B: main (case2.c:16) ==13415== Address 0x522d090 is 0 bytes inside a block of size 3 free'd ==13415== at 0x4C30D3B: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==13415== by 0x108746: main (case2.c:13) ==13415== Block was alloc'd at ==13415== at 0x4C2FB0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==13415== by 0x108719: main (case2.c:11) ==13415== ==13415== Invalid free() / delete / delete[] / realloc() ==13415== at 0x4C30D3B: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==13415== by 0x10876B: main (case2.c:19) ==13415== Address 0x522d090 is 0 bytes inside a block of size 3 free'd ==13415== at 0x4C30D3B: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==13415== by 0x108746: main (case2.c:13) ==13415== Block was alloc'd at ==13415== at 0x4C2FB0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==13415== by 0x108719: main (case2.c:11) ``` 不要被報告的篇幅嚇到，我們可歸納錯誤如下: 1. `Invalid write of size 2` 和 `Invalid write of size 1`: 試圖寫入一個非法的區域，[Valgrind](https://valgrind.org/) 精準地提示問題在 `case2.c` 這檔案的第 6 行所配置記憶體 4 bytes。通常遇到這種情況都是忘記檢查 buffer 的 size 就去用。 ```cpp=6 char *str = malloc(4); strcpy(str, "Bryant"); ``` 2. (第一次的) `Invalid read of size 4`: 試圖讀取一個非法的區域。 ```cpp=11 int *arr = malloc(3); printf("%d", arr[4]); free(arr); ``` 3. (第二次的) `Invalid read of size 4`: 讀取的區域已被釋放，該釋放的操作也被 [Valgrind](https://valgrind.org/) 指出，在 `case2.c` 檔案的第 13 行 ```cpp=13 free(arr); /* 3. Invalid read */ printf("%d", arr[0]); ``` 4. `Invalid free`: 即釋放一個不存在的區域，或雙重釋放 (double free) - [ ] 其他案例若遇到 `Conditional jump or move depends on uninitialised value(s)` 的錯誤，可能是用到沒有結束字元 (null-terminated string) 的字串。另外，若函式 A 用到函式 B 配置出來的記憶體空間 (特別在 [stack 空間](https://www.geeksforgeeks.org/stack-vs-heap-memory-allocation/) 中)，但因而造成程式出現非預期的行為。下圖為典型的 C 語言程式在執行時的記憶體配置圖，記憶體的使用可分為 text, data, bss, stack, heap 這幾個主要部分: ![](https://imgur.com/OhqUECc.png) 堆疊區段 (stack segment) 用以儲存函式的區域變數，及各種函式呼叫時需要儲存的資訊 (例如函式返回的記憶體位址，和呼叫者函式的狀態一類資訊)，每次的函式呼叫就會在堆疊區段建立一個 stack frame，儲存該次呼叫的所有變數與狀態，這樣一來，同一個函式重複被呼叫時就會有不同的 stack frame，不會互相干擾，遞迴函式就是透過這樣的機制執行。 :::info :cake: 詳細資訊請參照 ==[你所不知道的 C 語言](https://hackmd.io/@sysprog/c-programming)== 線上講座的這兩個議題: * :+1: [函式呼叫篇](https://hackmd.io/@sysprog/c-function) * :+1: [遞迴呼叫篇](https://hackmd.io/@sysprog/c-recursion) ::: 對於區域變數的存取若超過範圍，可能造成 stack corrupt，致使出現以下錯誤訊息： ``` *** stack smashing detected ***: ./mem_test terminated ======= Backtrace: ========= /lib/x86_64-linux-gnu/libc.so.6(__fortify_fail+0x37)[0x64f8f47] /lib/x86_64-linux-gnu/libc.so.6(__fortify_fail+0x0)[0x64f8f10] ./mem_test(func+0x2db)[0x407f22] ======= Memory map: ======== 00400000-00420000 r-xp 00000000 08:01 280804 /tmp/mem_test 0061f000-00620000 r--p 0001f000 08:01 280804 /tmp/mem_test 00620000-00622000 rw-p 00020000 08:01 280804 /tmp/mem_test 00622000-00623000 rw-p 00000000 00:00 0 04000000-04022000 r-xp 00000000 08:01 160861 /lib/x86_64-linux-gnu/ld-2.15.so ``` 這訊息是由 gcc 內建的 [Stack Smashing Protector](https://wiki.osdev.org/Stack_Smashing_Protector) (ssp) 所產生。之前提到 [Valgrind](https://valgrind.org/) 可支援多種工具，其中分析記憶體使用狀況的工具叫做 [Massif](https://valgrind.org/docs/manual/ms-manual.html) 可分析以下: * Heap blocks; * Heap administration blocks; * Stack sizes. 可搭配視覺化工具展現給定程式在執行時期的記憶體行為，類似以下: ![](https://imgur.com/F7hbcsN.png) 若不想看到 `possibly lost`，可追加 `--show-possibly-lost=no` 參數到 [Valgrind](https://valgrind.org/) 的命令列。若 file descriptor 開了沒關也可偵測，只要加上 `--track-fds=yes` 看到這裡，相信你感受到 [Valgrind](https://valgrind.org/) 的強大，趕快從第一手材料汲取更多資訊: [Valgrind User Manual](https://valgrind.org/docs/manual/manual.html) [lab0-c](https://github.com/sysprog21/lab0-c) 已整合 [Valgrind](https://valgrind.org/) 來協助學員細緻地找出記憶體相關的問題，諸如 [memory leak](https://en.wikipedia.org/wiki/Memory_leak), [buffer overflow](https://en.wikipedia.org/wiki/Buffer_overflow), [Dangling pointer](https://en.wikipedia.org/wiki/Dangling_pointer) 等等。使用方式如下: ```shell $ make valgrind ``` 參考的程式輸出: ``` --- Trace Points +++ TESTING trace trace-01-ops: # Test of insert_head and remove_head ERROR: Failed to store removed value ERROR: Failed to store removed value ERROR: Failed to store removed value ==105595== 32 bytes in 1 blocks are still reachable in loss record 1 of 25 ==105595== at 0x4C2FB0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==105595== by 0x10AFFE: malloc_or_fail (report.c:189) ==105595== by 0x10BB0B: add_cmd (console.c:120) ==105595== by 0x10BBEC: init_cmd (console.c:92) ==105595== by 0x10A81A: main (qtest.c:687) ... Test with specific case by running command: scripts/driver.py -p /tmp/qtest.cjLJHC --valgrind -t <tid> ``` 在最後一行訊息，我們見到類似以下: ```shell scripts/driver.py -p /tmp/qtest.cjLJHC --valgrind -t <tid> ``` 其中 `/tmp/qtest.cjLJHC` 是隨機產生的檔名，請依據實際程式輸出調整，而 `<tid>` 則是 `traces` 目錄裡頭個別 trace 檔案的編號，例如 `01`, `02`, `03`, ..., `15`。 ## :microscope: 自動測試程式自動「測試」的議題會比自動「評分」多，至少需要涵蓋以下: 1. 需要偵測程式非預期的終止 (如 [Segmentation fault](https://en.wikipedia.org/wiki/Segmentation_fault)); 2. 需要在程式執行過程設定計時器 (timer) 或設定鬧鐘 (alarm)，在特定時間點觸發動作，這樣才可知道程式是否執行超時; 3. 需要得知記憶體配置和使用的狀況，可用 [Valgrind](https://valgrind.org/)，但更輕量的解決方案也該考慮; 4. 前述提過 `qtest` 本身就是個命令直譯器，該如何設計可擴充的架構呢？ 5. 特定的條件限制，如 $O(1)$ 時間複雜度，該如何在執行時期確保？前兩者涉及 Linux 的 [signal](http://man7.org/linux/man-pages/man7/signal.7.html)，後兩者則需要額外的程式設計技巧。探討細部議題之前，請先參閱 ==[你所不知道的C語言: Stream I/O, EOF 和例外處理](https://hackmd.io/@sysprog/c-stream-io)==。 ### 追蹤記憶體配置和釋放的狀況在 [harness.h](https://github.com/sysprog21/lab0-c/blob/master/harness.h) 和 [harness.c](https://github.com/sysprog21/lab0-c/blob/master/harness.c) 檔案中，我們可見對記憶體管理進行相關的調整。說明如下： 1. 將原先的 `malloc` 與 `free` 使用 `test_malloc` 與 `test_free` 取代。該部份可見 [harness.h](https://github.com/sysprog21/lab0-c/blob/master/harness.h) 檔案內容最後方： ```cpp /* Tested program use our versions of malloc and free */ #define malloc test_malloc #define free test_free ``` 因此，一旦像是 `queue.c` 這樣的 C 程式引入 (即透過 `#include` 前置處理) [harness.h](https://github.com/sysprog21/lab0-c/blob/master/harness.h) 後，原本呼叫 `malloc` 與 `free` 就被替換為在 [harness.c](https://github.com/sysprog21/lab0-c/blob/master/harness.c) 內所實作的 `test_malloc` 與 `test_free` 這兩個函式。為了區隔，以下用 `test_malloc` 與 `test_free` 稱呼該二函式，而非 `malloc` 與 `free`。這樣的程式開發技巧稱為 [function hooking](https://en.wikipedia.org/wiki/Hooking)。至於為何這檔名叫做 `harness` 呢？這是有學問的，請見 [Test harness](https://en.wikipedia.org/wiki/Test_harness)。 2. 所有由於呼叫 `test_malloc` 得來的記憶體，紀錄於一個名為 `allocated` 的一個雙向鏈結串列中，後者對應的結構為 `block_ele_t`，定義如下： ```cpp typedef struct BELE { struct BELE *next; struct BELE *prev; size_t payload_size; size_t magic_header; /* Marker to see if block seems legitimate */ unsigned char payload[0]; /* Also place magic number at tail of every block */ } block_ele_t; ``` 本結構體的最後一個成員 `payload[0]` 是實際呼叫者使用的記憶體開頭。該成員使用到的 GCC 中 [Array of Length Zero](https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html) 特徵，後者給予開發的便利：若結構體中的最後一個成員是大小為 0 的陣列，那可透過 `malloc(sizeof(block_ele_t) + 希望 payload 後面接著多少記憶體)` 來達成「最後一個成員大小可任意指定」的效果（另外一個好處是 `malloc` 跟 `free` 一次就可以完成，不用替 `payload` 開頭的記憶體再 `malloc` 一次）。圖解上述結構體在記憶體中的抽象呈現: ![](https://i.imgur.com/j1fRN0B.png) `test_malloc` 函式實作如下: ```cpp void *test_malloc(size_t size) { if (noallocate_mode) { report_event(MSG_FATAL, "Calls to malloc disallowed"); return NULL; } if (fail_allocation()) { report_event(MSG_WARN, "Malloc returning NULL"); return NULL; } block_ele_t *new_block = malloc(size + sizeof(block_ele_t) + sizeof(size_t)); ... ``` 這技巧體現於程式碼的 `malloc(size + sizeof(block_ele_t) + sizeof(size_t))` 當中。除了 `sizeof(block_ele_t)` 與 `size`，還多了 `sizeof(size_t)` 的空間。這是因為使用者實際可使用的記憶體，前後被兩個 `size_t` 整數包著，裡面各自紀錄著兩個 Magic Number，作為驗證這塊記憶體是否是由 `test_malloc` 分配的依據，以及作為記憶體是否有產生越界等不佳使用的依據。 `test_malloc` 配置的空間，示意如下圖: ![block from test_malloc](https://i.imgur.com/nkCgAZL.png) :::info gcc 的擴充功能 "Arrays of Length Zero" 在 Linux 核心大量使用，可參照中文解說 [C Struct Hack - Structure with variable length array](http://frankchang0125.blogspot.com/2013/01/c-struct-hack-structure-with-variable.html) ::: 3. 位於配置的記憶區塊之前的 magic number，實際上就是結構體中 `magic_header` 這個成員，其值會在 `test_malloc` 中被指定為 `MAGICHEADER`，即 `0xdeadbeef`; 在記憶體尾端的 `size_t` 整數，數值必定為 `MAGICFOOTER`，也就是 `0xbeefdead`。 :::info 像 `0xdeadbeef` 這樣本身是十六進位數值但卻有英語上的意義 (dead beef)，稱為 [hexspeak](https://en.wikipedia.org/wiki/Hexspeak)，也大量存在於 Linux 核心原始程式碼中。 ::: 4. 因為要回傳給使用者的指標實際上是 `payload` 的位置，所以程式中另外有 `find_footer` 跟 `find_header` 這兩個函式作為輔助。前者傳入 `block_ele_t*` ，回傳位於 `payload` 尾端 magic number 的位址。如 `test_malloc` 中有段指定尾端 magic number 的程式： ```cpp *find_footer(new_block) = MAGICFOOTER; ``` 而後者則是傳入使用者呼叫後得到的記憶體開頭，反推該記憶體所屬的 `block_ele_t` 的開頭位置。這兩個函式除了用在 `test_malloc` 當中，也會用在`test_free` 當中。在呼叫 `test_free` 時，使用者傳入的，實際上是 `payload` 的位置，但釋放記憶體時，除了該記憶體之外，該記憶體所屬的結構體，也要一併釋放。因此需要有一個尋找該記憶體所屬開頭的方法。這兩個函式的原理不難理解：給定 `block_ele_t`，尾端的 magic number，由 `payload_size` 進行推算即可; 對於反推記憶體所屬的結構體，則是利用 `sizeof(block_ele_t)` 反推，並且尋找反推過後的記憶體是否在 `allocated` 的清單中。若沒有，則推論為錯誤的配置。而在 `find_header` 中有檢測傳入指標是否為空的設定。 5. `test_free` 中用的原理類似：首先用 `find_header` 找出使用者準備釋放的記憶體，其所屬的結構體有沒有再 `allocated` 裡面。若傳入的指標為空指標，或是該記憶體不屬於任何一個節點，`find_header` 內部會分別傳出「試圖釋放空指標」或「記憶體損壞」等警告。 6. 若順利找到該記憶體所屬的節點，接著檢驗尾端的 magic number 是否仍為 `MAGICFOOTER`，作為記憶體有沒有被越界存取等狀況的判斷依據。若比對結果發現不合，也會發出該區段記憶體損壞的警告。 7. 若確認結果無誤，則將記憶體區段前後的 magic number 都設成 `MAGICFREE`。並且將記憶體內容都用 `FILLCHAR` 填充。最後移出 `allocated` 這個雙向鏈結串列。 ```cpp new_block->next = allocated; new_block->prev = NULL; if (allocated) allocated->prev = new_block; allocated = new_block; allocated_count++; ``` 不難發現，串連方式其實就是佇列的 `insert_head`，只是 `block_ele_t` 為雙向鏈結串列。配置空間的主要區塊，實際拿來存放資料的空間，也會被填上 `FILLCHAR` (即 `0x55`) 做初始化： ```cpp void *p = (void *) &new_block->payload; memset(p, FILLCHAR, size); ``` ![](https://i.imgur.com/Ovsk2Qy.png) 8. 在分配記憶體時，每呼叫一次 `test_malloc` 函式，`allocate_count` 就會遞增；而每呼叫一次 `test_free` 函式，該數值就會遞減。因此，只要在程式即將結束時，判斷該數值，即可判斷是否妥當地釋放記憶體。 ### `qtest` 命令直譯器的實作在 `qtest` 執行後，會見到 `cmd> ` 命令提示訊息，當輸入一則命令後，會經過以下函式呼叫，一路到 `parse_arg`，後者解析使用者的輸入，呼叫流程是: > main → run_console → cmd_select → interpret_cmd → parse_args 之後，`interpret_cmda` 會在 `cmd_list` 這個 `struct CELE` 為元素的單向鏈結串列中找對應的命令。`cmd_list` 會在 `main` 裡面的 `cmd_init` 跟 `console_init` 呼叫時初始化。找到後，`struct CELE` 這個結構體中有個 `operation`成員，儲存每個命令的結構體中的 `operation` 包裝著待測試的函式: ```cpp typedef struct CELE cmd_ele, *cmd_ptr; struct CELE { char *name; cmd_function operation; char *documentation; cmd_ptr next; } ``` 程式執行到此，會將之前解析的輸入傳遞過去。以 `it` 命令為例： ```shell cmd> it LookAtMe ``` `parse_arg` 執行完畢，將跳躍到對應的測試程式，其呼叫堆疊如下: ```cpp [#0] 0x5555555596bc → q_insert_tail(q=0x555555563b50, s=0x555555563cc0 "LookAtMe") [#1] 0x555555555ae6 → do_insert_tail(argc=0x2, argv=0x555555563d60) [#2] 0x555555557e93 → interpret_cmda(argc=0x2, argv=0x555555563d60) [#3] 0x555555557f36 → interpret_cmd(cmdline=0x55555555e6a0 <linebuf> "it LookAtMe\n") [#4] 0x555555558dd8 → cmd_select(nfds=0x1, readfds=0x7fffffffdaf0, writefds=0x0, exceptfds=0x0, timeout=0x0) [#5] 0x555555558f42 → run_console(infile_name=0x0) [#6] 0x5555555569c2 → main(argc=0x1, argv=0x7fffffffded8) ``` 接著我們可推知，一旦事先提供 `do_*` 函式，並在 `console_init` 中呼叫 `add_cmd("<instruction>", <do_*>, "<documentation>")`，即可增加新命令。以下示範在`console.c` 檔案新增名為 `hello` 的命令。先準備 `do_hello` 函式: ```cpp bool do_hello(int argc, char *argv[]) { return (bool) printf("Hello, World\n"); } ``` 並在 `console_init` 中新增以下: ```cpp static void console_init() { add_cmd("new", do_new, " | Create new queue"); add_cmd("free", do_free, " | Delete queue"); ... add_cmd("hello", do_hello, " | Print hello message"); ... } ``` 重新編譯後，在 `qtest` 的命令列中輸入 `hello`，可發現該命令已可被命令直譯器所識別: ```shell $ ./qtest cmd> hello Hello, World ``` 再輸入 `help`，則會發現對應的說明也新增: ```shell cmd>help Commands: # ... | Display comment free | Delete queue hello | Print hello message help | Show documentation ih str [n] | Insert string str at head o ``` ### Signal 處理和應用我們在 `queue_init` 函式的實作發現一些特別的東西: ```cpp static void queue_init() { fail_count = 0; q = NULL; signal(SIGSEGV, sigsegvhandler); signal(SIGALRM, sigalrmhandler); } ``` 注意到 [signal](http://man7.org/linux/man-pages/man7/signal.7.html) 的使用，註冊了 `SIGSEGV` 和 `SIGALRM` 對應的處理常式 (handler)。在 UNIX 或者符合 [POSIX](https://en.wikipedia.org/wiki/POSIX) 規範的作業系統 (例如 GNU/Linux) 中具備 signal 的機制，可傳送 signal (中文翻譯為「訊號」，不過為了行文彰顯該術語，後面一律用原文 "signal") 給其他程式。當我們在終端機中，透過鍵盤按下 Ctrl+C 組合鍵，作業系統會傳一個 [SIGINT](https://en.wikipedia.org/wiki/Signal_(IPC)#SIGINT) 給給定的行程 (process，可理解為「執行中的程式」)，而行程收到 signal 後就一定要放下當下的工作，優先處理 signal。至於如何「處理」，程式會有預設行為，根據不同 signal 往往有有不同處理行為，比如： |Signal | 描述 | 預設的動作 (action) | |-|-|-| |SIGSEGV | 無效的記憶體參照 | 終止行程的運作 | |SIGALRM | 由 [alarm](http://man7.org/linux/man-pages/man2/alarm.2.html) 傳遞來的 Timer signal | 終止行程的運作 | |SIGWINCH| 終端機視窗寬度或高度變更 | 忽略該 signal | 這裡需要特別提 [alarm](http://man7.org/linux/man-pages/man2/alarm.2.html)，後續會運用到：當呼叫 alarm(n) 後，等待 n 秒後，會觸發 `SIGALRM` 這個 signal，若 alarm 的參數為 `0`，則之前設置的 Timer 會被取消，並將剩下的時間返回。不過使用者可更改 signal 對應的動作，而且在許多程式會做這樣的操作，例如收到 SIGINT 後，將目前行程的資訊記錄於某個日誌檔案 (軟體的「黑盒子」)，待善後舉措完畢後再真正終止該行程。 `SIGSEV` ([segmentation fault](https://en.wikipedia.org/wiki/Segmentation_fault) 會發出的 signal）的處理常式如下： ```cpp void sigsegvhandler(int sig) { report(1, "Segmentation fault occurred. You dereferenced a NULL or invalid " "pointer"); /* Raising a SIGABRT signal to produce a core dump for debugging. */ abort(); } ``` 另一個 `SIGALARM` 的處理函式是： ```cpp void sigalrmhandler(int sig) { trigger_exception( "Time limit exceeded. Either you are in an infinite loop, or your " "code is too inefficient"); } ``` 其中 `sigalrmhandler` 使用 `trigger_exception`，後者的實作如下: ```cpp void trigger_exception(char *msg) { error_occurred = true; error_message = msg; if (jmp_ready) siglongjmp(env, 1); else exit(1); } ``` 除了設定相關的變數、全域變數 `error_message` 指向給錯誤訊息外，還有一個 `siglongjmp`。這個 `siglongjmp` 會跳到哪裡去呢？觀察 [lab0-c](https://github.com/sysprog21/lab0-c) 裡頭用到 `sigsetjmp` 的程式碼，會發現在 `exception_setup` 函式裡頭: ```cpp= bool exception_setup(bool limit_time) { if (sigsetjmp(env, 1)) { /* Got here from longjmp */ jmp_ready = false; if (time_limited) { alarm(0); time_limited = false; } if (error_message) { report_event(MSG_ERROR, error_message); } error_message = ""; return false; } else { /* Got here from initial call */ jmp_ready = true; if (limit_time) { alarm(time_limit); time_limited = true; } return true; } } ``` 再根據手冊對 [sigsetjmp](https://linux.die.net/man/3/sigsetjmp) 的描述: > [setjmp()](https://linux.die.net/man/3/setjmp) and [sigsetjmp()](https://linux.die.net/man/3/sigsetjmp) return 0 if returning directly, and nonzero when returning from [longjmp(3)](https://linux.die.net/man/3/longjmp) or [siglongjmp(3)](https://linux.die.net/man/3/siglongjmp) using the saved context. 亦即，上述 `exception_setup` 函式列表的第 3 行會是關鍵 —— 此部分將被執行 2 次: * 第一次是在初始設定，回傳值 `0`，接著跳到第 15 行執行 `else{ ... }`; * 第二次是 [setlongjmp()](https://linux.die.net/man/3/siglongjmp) 跳躍回來，回傳值 `1`，繼續執行第 5 行之後的 `if{ ... }`; 我們也可發現除了暫停 `SIGALARM` 的計時外，若從 setlongjmp() 返回，會呼叫 `report_event()` 把錯誤訊息輸出於終端機，但因傳入的參數只是 `MSG_ERROR`，所以不會終止程式（要是 `MSG_FATAL` 才會終止)。`report_event` 函式的實作程式如下: ```cpp void report_event(message_t msg, char *fmt, ...) { va_list ap; bool fatal = msg == MSG_FATAL; char *msg_name = msg == MSG_WARN ? "WARNING" : msg == MSG_ERROR ? "ERROR" : "FATAL ERROR"; int level = msg == MSG_WARN ? 2 : msg == MSG_ERROR ? 1 : 0; if (verblevel < level) return; if (!errfile) init_files(stdout, stdout); va_start(ap, fmt); fprintf(errfile, "%s: ", msg_name); vfprintf(errfile, fmt, ap); fprintf(errfile, "\n"); fflush(errfile); va_end(ap); ... if (fatal) { if (fatal_fun) fatal_fun(); exit(1); } ``` 會發現這是一個接受[不定個數參數](https://en.wikibooks.org/wiki/C_Programming/stdarg.h)的函式，主要的功能是輸出錯誤訊息。這也說明為什麼 `qtest` 收到 SIGSEGV 後，還會繼續執行。翻閱《[The Linux Programming Interface](http://man7.org/tlpi/)》，得知 signal 運用上的議題: 1. `sigsetjmp` 跟 `siglongjmp` 是能在 signal handler 內部使用的非區域跳躍。不使用 `setjmp` 或`longjmp` 的考量點是： > The sa_mask field allows us to specify a set of signals that aren’t permitted to interrupt execution of this handler. In addition, the signal that caused the handler to be invoked is automatically added to the process signal mask. This means that a signal handler won’t recursively interrupt itself if a second instance of the same signal arrives while the handler is executing. 以及： > However, there is a problem with using the standard longjmp() function to exit from a signal handler. We noted earlier that, upon entry to the signal handler, the kernel automatically adds the invoking signal, as well as any signals specified in the act.sa_mask field, to the process signal mask, and then removes these signals from the mask when the handler does a normal return. > > What happens to the signal mask if we exit the signal handler using longjmp()? The answer depends on the genealogy of the particular UNIX implementation. 簡言之，當某個 signal handler 被觸發時，該 signal 會在執行 signal handler 時會被遮罩住，並在 signal handler 回傳時恢復。而，在裡面使用 `longjmp` 時，解除訊號遮罩的行為有可能不會發生(是否解除則依照實作決定)。為了保證在非區域跳躍後能夠恢復，所以 POSIX 另行規範得以在 signal handler 中呼叫的 `sigsetjmp` 跟 `siglongjmp`。 2. `jmp_ready` 的技巧： > Because a signal can be generated at any time, it may actually occur before the target of the goto has been set up by sigsetjmp() (or setjmp()). To prevent this possibility (which would cause the handler to perform a nonlocal goto using an uninitialized env buffer), we employ a guard variable, canJump, to indicate whether the env buffer has been initialized. If canJump is false, then instead of doing a nonlocal goto, the handler simply returns. 因為 signal 有可能在任何時候發生，包含 `sigsetjmp` 處理中，但未處理完 `sigjmp_buf` 之際。倘若這時 signal handler 又進行 `siglongjmp`，那麼將產生錯誤。改進方式是引入一個 `jmp_ready` 變數，表示「`sigjmp_buf` 是否準備好」，並在可能進行 `siglongjmp` 的 signal handler 中先檢查這個變數，以確保 `sigjmp_buf` 有正確地初始化。 ### 命令直譯器的初始化準備觀察以下程式碼: ```cpp void init_cmd() { cmd_list = NULL; param_list = NULL; err_cnt = 0; quit_flag = false; add_cmd("help", do_help_cmd, " | Show documentation"); add_cmd("option", do_option_cmd, " [name val] | Display or set options"); add_cmd("quit", do_quit_cmd, " | Exit program"); add_cmd("source", do_source_cmd, " file | Read commands from source file"); add_cmd("log", do_log_cmd, " file | Copy output to file"); add_cmd("time", do_time_cmd, " cmd arg ... | Time command execution"); add_cmd("#", do_comment_cmd, " ... | Display comment"); add_param("verbose", &verblevel, "Verbosity level", NULL); add_param("error", &err_limit, "Number of errors until exit", NULL); add_param("echo", &echo, "Do/don't echo commands", NULL); init_in(); init_time(&last_time); first_time = last_time; } ``` 前面是在初始化系列全域變數，姑且先略過這些變數代表的功能，但後面的 `add_cmd` 顯然是新增命令，其對應實作如下: ```cpp void add_cmd(char *name, cmd_function operation, char *documentation) { cmd_ptr next_cmd = cmd_list; cmd_ptr *last_loc = &cmd_list; while (next_cmd && strcmp(name, next_cmd->name) > 0) { last_loc = &next_cmd->next; next_cmd = next_cmd->next; } cmd_ptr ele = (cmd_ptr) malloc_or_fail(sizeof(cmd_ele), "add_cmd"); ele->name = name; ele->operation = operation; ele->documentation = documentation; ele->next = next_cmd; *last_loc = ele; } ``` 再參照 [console.h](https://github.com/sysprog21/lab0-c/blob/master/console.h) 中的結構體： ```cpp typedef bool (*cmd_function)(int argc, char *argv[]); /* Information about each command */ /* Organized as linked list in alphabetical order */ typedef struct CELE cmd_ele, *cmd_ptr; struct CELE { char *name; cmd_function operation; char *documentation; cmd_ptr next; }; ``` 於是可發現，在 `qtest` 命令直譯器所支援的命令是透過一個單向鏈結串列來儲存，而每個結構是用一個名稱 (`name`)、一個指向對應功能的函式的函式指標，及一個字串的說明 (`documentation`) 所構成。而插入的手法類似 [insertion sort](https://en.wikipedia.org/wiki/Insertion_sort)，將所有命令以字典順序排列。另外，`add_param` 運作方式也類似: ```cpp /* Add a new parameter */ void add_param(char *name, int *valp, char *documentation, setter_function setter) { param_ptr next_param = param_list; param_ptr *last_loc = &param_list; while (next_param && strcmp(name, next_param->name) > 0) { last_loc = &next_param->next; next_param = next_param->next; } param_ptr ele = (param_ptr) malloc_or_fail(sizeof(param_ele), "add_param"); ele->name = name; ele->valp = valp; ele->documentation = documentation; ele->setter = setter; ele->next = next_param; *last_loc = ele; } ``` 對應的結構體為 `param_ele` 和 `param_ptr`: ```cpp typedef struct PELE param_ele, *param_ptr; struct PELE { char *name; int *valp; char *documentation; /* Function that gets called whenever parameter changes */ setter_function setter; param_ptr next; }; ``` 在 `interpret_cmda` 中，如果有找到命令，會呼叫對應的 `struct CELE` 中的 `operation` 成員。`operation` 每個函式的型別都是 `bool (*do_something) (int, char**)`，對應的命名就是 `do_<operation name>` 的那些函式。呼叫 `operation` 時，採用以下手法: ```cpp static bool interpret_cmda(int argc, char *argv[]) { ... if (next_cmd) { ok = next_cmd->operation(argc, argv); if (!ok) record_error(); } else { report(1, "Unknown command '%s'", argv[0]); record_error(); ok = false; } } return ok; } ``` ### 檢測非預期的記憶體操作或程式執行逾時如果收到 `SIGSEGV` 或 `SIGALRM`，那會跳進 signal handler 裡面，而 signal handler 會執行 `siglongjmp`，至於接著會跳躍到何處呢？我們繼續分析可發現，每個形如 `do_<operation_name>` 的函式，都有類似下面這段程式： ```cpp if (exception_setup(true)) { for (r = 0; ok && r < reps; r++) { bool rval = q_insert_tail(q, inserts); if (rval) { qcnt++; if (!q->head->value) { report(1, "ERROR: Failed to save copy of string in list"); ok = false; } } else { fail_count++; if (fail_count < fail_limit) report(2, "Insertion of %s failed", inserts); else { report(1, "ERROR: Insertion of %s failed (%d failures total)", inserts, fail_count); ok = false; } ok = ok && !error_check(); } } exception_cancel(); ``` 當執行流程第一次 (或說「不因 signal handler 而跳來這邊」) 時，會呼叫 `exception_setup`，實作如下: ```cpp bool exception_setup(bool limit_time) { if (sigsetjmp(env, 1)) { /* Got here from longjmp */ jmp_ready = false; if (time_limited) { alarm(0); time_limited = false; } if (error_message) { report_event(MSG_ERROR, error_message); } error_message = ""; return false; } else { /* Got here from initial call */ jmp_ready = true; if (limit_time) { alarm(time_limit); time_limited = true; } return true; } } ``` 於是可推知，首次呼叫上述函式時，`exception_setup` 的功能是建立 `sigjmp_buf`，並回傳 `true`。回到稍早提及的 `if (exception_setup(true))` 敘述中，若是第一次回傳，那麼會開始測試函式。若測試函式的過程中，發生任何錯誤 (亦即觸發 `SIGSEGV` 或 `SIGALRM` 一類的 signal)，就會立刻跳回 signal handler。signal handler 會印出錯誤訊息，並進行 `siglongjmp`。由 `exception_setup` 的程式可以知道又是跳到 `exception_setup(true)` 裡面，但這時會回傳 `false`，因而跳過測試函式，直接結束測試並回傳 `ok` 內含值。換言之，`exception_cancel()` 後就算再發生 `SIGALRM` 或 `SIGSEGV`，也不會再有機會回到 `exception_setup()` 裡面。將上述討論繪製為流程圖: ```mermaid graph TB init[init] save_state((save_state)) exception{exception} restore_state((restore_state)) queue_op exception_occur{exception_occur} exit init --> save_state save_state--exception = false-->state state -->exception exception--false-->queue_op exception--true-->exit queue_op-->exception_occur exception_occur--true-->restore_state restore_state--exception = true-->state exception_occur--false-->exit ``` 綜合上述，`bool exception_setup(bool limit_time)` 用來判別程式於執行時期是否超過指定時間範圍 (此處 `time_limit` 設定為有效)，探討以下兩種狀況: * 如果超過時間 `jmp_ready == true` → 呼叫 `trigger_exception` 保存錯誤訊息 (即 `error_message`) 並將 `error_occurred` 設成 true → 由 `siglongjmp` 返回 `exception_setup` 進入 if 迴圈，重新計時並顯示錯誤訊息，最後回傳 `false`; * 如果沒有超過時間 `time_limited == true` → 呼叫 `exception_cancel` 重新計時並將 `jmp_ready` 設回 `false`，此時 `error_message` 也會設定成空白字元，使其不顯示; 若在執行作業要求的函式中，沒有發出任何訊號，就接著檢測個別位於 queue.c 實作的函式回傳結果是否正確。以流程圖總結例外處理和驗證機制: ```mermaid graph TB init[init] set_error_occurred_false(set error_occurred = false) set_error_occurred_true(set error_occurred = true) error(signal not <br>from queue_op) queue_op exception_setup error_occurred{error_occurred} sigsetjmp stack{stack} set_jmp_ready_true(set jmp_ready =true) set_jmp_ready_false(set jmp_ready =false) signal_received{signal received} siglongjmp trigger_exception exit report_error[report error message] init --> set_error_occurred_false set_error_occurred_false-->exception_setup exception_setup -->set_jmp_ready_true set_jmp_ready_true -->sigsetjmp sigsetjmp--save stack context-->stack stack--from initial call-->error_occurred stack--from siglongjmp-->report_error report_error--set false-->set_jmp_ready_false set_jmp_ready_false-->error_occurred error_occurred--false-->queue_op error_occurred--true-->exit queue_op -->signal_received signal_received --true-->trigger_exception signal_received --false-->exit error--jmp_ready should be false-->trigger_exception trigger_exception-->set_error_occurred_true set_error_occurred_true --jmp_ready == true-->siglongjmp set_error_occurred_true --jmp_ready == false-->exit siglongjmp-->stack ``` ## [Coroutine](https://en.wikipedia.org/wiki/Coroutine) Donald E. Knuth 的經典著作《The Art of Computer Programming》提到 [coroutine](https://en.wikipedia.org/wiki/Coroutine) 這個自 1960 年代即現身的多任務切換技巧。coroutine 的 "yield" 是屬於程式語言層面，透過特定技巧或機制，讓原本循序執行的陳述指令，得以做出交錯執行的結果，而 multi-threading 的 "yield" 則偏重於系統層面 (如 CPU resource)，簡單來說，該如何一方面允許多 thread 並行，又能適當規劃個別單元對系統資源的使用。參考實作: * [Coroutines in less than 20 lines of standard C](https://fanf.livejournal.com/105413.html) * [Coroutines in one page of C](https://yosefk.com/blog/coroutines-in-one-page-of-c.html) 延伸閱讀: [你所不知道的C語言: goto 和流程控制篇](https://hackmd.io/@sysprog/c-control-flow?type=view) ## :computer: 論文〈[Dude, is my code constant time?](https://eprint.iacr.org/2016/1123.pdf)〉重點提示此論文提出一套量測方法與對應原始碼 [dudect](https://github.com/oreparaz/dudect)，用以量測程式執行時間是否能在常數時間 $O(1)$ 完成，其特點如下 - 不侷限某特定 CPU 硬體平台，此特點相當重要。該論文也提及其它相關研究，往往只能在某特定 CPU 架構才能量測 (因為需要更新 CPU 的 [microcode](https://en.wikipedia.org/wiki/Microcode))。 - 該論文使用 [Student's t-distribution](https://en.wikipedia.org/wiki/Student%27s_t-distribution) (使用情境: 取樣大小不大且[標準差](https://en.wikipedia.org/wiki/Standard_deviation)無法得知的情況下) 之 [Student's t-test](https://en.wikipedia.org/wiki/Student%27s_t-test) 方法利用 leakage detection test 測試程式是否是 constant time ，優點是在這個方法下不需要硬體模型，分為以下三個步驟進行 1. Repeatedly measure exeution time 分別用兩類(class) input data ，論文中提到的是 fix-vs-random ， fixed class 是為了 corner-case 2. Apply post-processing - Cropping: 去掉某些極端值 - High-order preprocessing 3. Apply statistical test - null hypothesis: "the two timing distributions are equal" - if statistical test reject the null hypothesis, then we know it's not run in constant time - [Welch's t-test](https://en.wikipedia.org/wiki/Welch's_t-test): 本專案使用的測試 ### Welch’s t-test 先簡介 Student’s t-distribution 就是抽樣下的常態分佈，抽樣越多就會越接近標準常態分佈(因為越接近母體)，圖中的 one-tailed test 和 two-tailed test 和 alternative hypothesis 有關，我們只需要看兩組數據是否不一樣，所以用 two-tailed test > alternative hypothesis: "the two timing distributions are not equal" > ![](https://i.imgur.com/jeWKqt1.gif) 接著是 Welch’s t-test - 先計算出兩個 class (class 0, class 1) 的平均值和變異數 - Welch's t-test defines the statistic t by the following formula: - $t = \frac{\bar{X_0} - \bar{X_1}}{\sqrt{\frac{Var_0}{N_0} + \frac{Var_1}{N_1}}}$ - $t$ 越接近 0 代表兩組數據的差異越小 - lab0 實作上只有到這裡，然後直接檢查 $t$ 是否大於 `t_threshold_moderate` - approximate df (degree of freedom) using the [Welch–Satterthwaite equation](https://en.wikipedia.org/wiki/Welch%E2%80%93Satterthwaite_equation) - 由 df 我們可以得到相應的 t-distribution 如上圖的鐘型曲線 - 圖上的藍白交界處分別為 +-t - 把藍色部份積分起來可以得到 p value 若 p 小於特定值 (通常 0.05 這部份沒有深入理解) 我們可以說兩組數據有統計上的顯著差異 (statistically significant difference) ### 實作細節 `lab0-c` 的 Welch's test 測試項目有兩個: * `is_insert_tail_const()`: 用來測試將字串加到 Queue 尾端的執行時間是否能在常數時間完成。 * `is_size_const()`: 用來測試取得 Queue 大小時間是否能在常數時間完成。每個測項又細分兩個階段: * Pre-processing: 主要測量待測物 (給定的程式碼) 的執行時間 (CPU cycle)，其步驟如下: * 兩種產生 Queue 方式 (即對照 Welch's Test 所需兩組取樣資料): * Queue 大小 = 0 $\to$ 論文提及的 `fix class` * Queue 大小 = n (隨機產生)，並插入 n 個相同的隨機字串至 Queue 開頭 (字串長度=7bytes, 加上結束字元` '\0'` 則為 8bytes) $\to$ 論文提及的`random class` * 執行待測物 (`is_insert_tail_const()` 或 `is_size_const()`)，並記錄執行時間 (CPU cycles) * Statistical test: 將所記錄的CPU執行時間 (CPU cycles) 套用如下 [Welch's test](https://en.wikipedia.org/wiki/Welch%27s_t-test) 公式，用以計算t value。可對照 `lab0-c` 原始碼的 [t_push()](https://github.com/AdrianHuang/lab0-c/blob/master/dudect/ttest.c#L19) 與 [t_compute()](https://github.com/AdrianHuang/lab0-c/blob/master/dudect/ttest.c#L31)。其所計算得出 t value 就可用來判斷測試項目是否在常數時間完成。 $$ t = \frac{\bar{X_0} - \bar{X_1}}{\sqrt{\frac{s^2_1}{N_1} + \frac{s^2_2}{N_2}}} $$ > $t$ 越接近 0 代表兩組數據的差異越小 ## :penguin: 作業要求 * 在 GitHub 上 fork [lab0-c](https://github.com/sysprog21/lab0-c) * 參閱 [Git 教學和 GitHub 設定指引](https://hackmd.io/@sysprog/git-with-github) ^附教學影片^ * ==詳細閱讀 [C Programming Lab](https://www.cs.cmu.edu/afs/cs/academic/class/15213-s20/www/labs/cprogramminglab.pdf)== ，依據指示著手修改 `queue.[ch]` 和連帶的檔案，測試後用 Git 管理各項修改，記得也該實作 `q_sort` 函式。 * 在提交程式變更前，務必詳閱 [如何寫好 Git Commit Message](https://blog.louie.lu/2017/03/21/%E5%A6%82%E4%BD%95%E5%AF%AB%E4%B8%80%E5%80%8B-git-commit-message/) * 不用理會 [Autolab](http://www.autolabproject.com/) 和檔案下載的描述，這兩者都是 CMU 專屬 * 除了修改程式，也要編輯「[作業區](https://hackmd.io/@sysprog/linux2021-homework1)」，增添開發紀錄和 GitHub 連結，除了提及你如何逐步達到自動評分程式的要求外，共筆也要涵蓋以下: * 開啟 [Address Sanitizer](https://github.com/google/sanitizers/wiki/AddressSanitizer)，修正 `qtest` 執行過程中的錯誤 * 先執行 `qtest` 再於命令提示列輸入 `help` 命令，會使開啟 [Address Sanitizer](https://github.com/google/sanitizers/wiki/AddressSanitizer) 觸發錯誤，應予以排除 * 運用 Valgrind 排除 `qtest` 實作的記憶體錯誤，並透過 Massif 視覺化 "simulation" 過程中的記憶體使用量，需要設計對應的實驗 * 在 `qtest` 中實作 [coroutine](https://en.wikipedia.org/wiki/Coroutine)，並提供新的命令 `web`，提供 web 伺服器功能，注意: web 伺服器運作過程中，`qtest` 仍可接受其他命令 * 可嘗試整合 [tiny-web-server](https://github.com/7890/tiny-web-server)，將 `FORK_COUNT` 變更為 `0`，並以 [coroutine](https://en.wikipedia.org/wiki/Coroutine) 取代原本的 fork 系統呼叫 * 解釋 [select](http://man7.org/linux/man-pages/man2/select.2.html) 系統呼叫在本程式的使用方式，並分析 [console.c](https://github.com/sysprog21/lab0-c/blob/master/console.c) 的實作，說明其中運用 CS:APP [RIO 套件](http://csapp.cs.cmu.edu/2e/ch10-preview.pdf) 的原理和考量點。可對照參考 [CS:APP 第 10 章重點提示](https://hackmd.io/@sysprog/H1TtmVTTz) :arrow_forward: 為避免「舉燭」，請比照 `qtest` 實作，撰寫獨立的終端機命令解譯器 (可建立新的 GitHub repository) * 說明 [antirez/linenoise](https://github.com/antirez/linenoise) 的運作原理，注意到 [termios](http://man7.org/linux/man-pages/man3/termios.3.html) 的運用 * 研讀論文 [Dude, is my code constant time?](https://eprint.iacr.org/2016/1123.pdf)，解釋本程式的 "simulation" 模式是如何透過以實驗而非理論分析，達到驗證時間複雜度，需要解釋 [Student's t-distribution](https://en.wikipedia.org/wiki/Student%27s_t-distribution) 及程式實作的原理。注意：現有實作存在若干致命缺陷，請討論並提出解決方案; * 指出現有程式的缺陷 (提示: 和 [RIO 套件](http://csapp.cs.cmu.edu/2e/ch10-preview.pdf) 有關)，嘗試強化並提交 pull request * 截止日期： * Mar 7, 2021 (含) 之前 * 越早在 GitHub 上有動態、越早接受 code review，評分越高 ## 作業觀摩 * [2020 年秋季班](https://hackmd.io/@sysprog/2020-homework1) * [2020 年春季班](https://hackmd.io/@sysprog/linux2020-homework1) * [2019 年春季班](https://hackmd.io/@sysprog/SJ4kPZYS4) * [2018 年秋季班](https://hackmd.io/@sysprog/rJwiDHGKQ)