2019q3 Homework5 (quiz7)

# 2019q3 Homework5 (quiz7) contributed by < `colinyoyo26` > ### 測驗 `3` [CS:APP 第 3 版](https://hackmd.io/@sysprog/S1vGugaDQ) 一書的第 3 章家庭作業 `3.70` (Page 224) 題目: > 考慮到以下 union 宣告 ```cpp union ele { struct { long *p, y; } e1; struct { long x; union ele *next; } e2; }; ``` > 編譯器產生以下組合語言程式碼: ```cpp proc: movq 8(%rdi), %rdx movq (%rdx), %rax movq (%rax), %rax subq 8(%rdx), %rax movq %rax, (%rdi) ret ``` 對應的 C 語言程式如下: ```cpp void proc(union ele *ptr) { ptr->AAA.x = *(ptr->BBB.CCC->DDD.p) - (ptr->EEE.FFF->GGG.HHH); } ``` 提示: 可用 `gcc -Os -S` 輸出來檢驗請補完程式碼。 ==作答區== AAA = ? * `(a)` e1 * `(b)` e2 * `(c)` p * `(d)` y * `(e)` x * `(f)` next BBB = ? * `(a)` e1 * `(b)` e2 * `(c)` p * `(d)` y * `(e)` x * `(f)` next CCC = ? * `(a)` e1 * `(b)` e2 * `(c)` p * `(d)` y * `(e)` x * `(f)` next next DDD = ? * `(a)` e1 * `(b)` e2 * `(c)` p * `(d)` y * `(e)` x * `(f)` next EEE = ? * `(a)` e1 * `(b)` e2 * `(c)` p * `(d)` y * `(e)` x * `(f)` next FFF = ? * `(a)` e1 * `(b)` e2 * `(c)` p * `(d)` y * `(e)` x * `(f)` next GGG = ? * `(a)` e1 * `(b)` e2 * `(c)` p * `(d)` y * `(e)` x * `(f)` next HHH = ? * `(a)` e1 * `(b)` e2 * `(c)` p * `(d)` y * `(e)` x * `(f)` next :::success 延伸問題: 1. 解釋你反組譯的過程中，是如何答題 2. 在 `gcc -Os -S` 的輸出中，可見 `.cfi_startproc` 和 `.cfi_endproc` 這兩道假指令 ([pseudo instruction](https://nasm.us/doc/nasmdoc3.html))，其作用為何？gcc 程式碼產生器透過這樣的假指令要達到什麼效果？ 3. 在 GitHub 的 [組合語言版本 linked list 實作](https://github.com/search?l=assembly&p=2&q=linked+list&type=Repositories) 中挑一個為基礎，改寫為有效的 x86_64 組合語言程式碼，並確保其具備針對 singly-linked 的 list push, pop, print 等功能 ::: #### 如何答題可以觀察出 - ptr in `%rdi` 從第四行 `sub` 可以先看出 - `*(ptr->BBB.CCC->DDD.p)` 的結果會先放在 `%rax` - `(ptr->EEE.FFF->GGG.HHH)` 的結果在 `8(%rdx)` 其實這裡不從組語也很容易看出 `CCC` `FFF` 都是 `pointer` 而且明顯是 `next` ，再從 `movq 8(%rdi), %rdx` 可以看出 `%rdi + 8` 為 `next` 的擺放位址 - 所以 `BBB` `EEE` 為 `e2` 再來 `8(%rdx)` 從剛剛推論出來的記憶體位置判斷 `HHH` 不是 `y` 就是 `next` 再從型態判斷一下明顯是 `y` ``` movq 8(%rdi), %rdx movq (%rdx), %rax movq (%rax), %rax subq 8(%rdx), %rax ``` #### .cfi_startproc 和 .cfi_endproc 這兩道假指令 (pseudo instruction)作用 Stack unwinding - The process of removing function entries from function call stack at run time is called Stack Unwinding. Stack Unwinding is generally related to Exception Handling. 查閱 [the x86-64 Linux “ABI” - Intel® Developer Zone](https://software.intel.com/sites/default/files/article/402129/mpx-linux64-abi.pdf) 得到以下資訊 6.3 Unwinding Through Assembler Code For successful unwinding on AMD64 every function must provide a valid debug information in the DWARF Debugging Information Format. In high level languages (e.g. C/C++, Fortran, Ada, ...) this information is generated by the compiler itself. However for hand-written assembly routines the debug info must be provided by the author of the code. To ease this task some new assembler directives are added: > cfi (call frame information) compiler 會自動加入，用於程式遇錯誤時紀錄 Stack Trace > 另外 .debug_frame 要加入 -g 編譯才會有，除錯資訊量更大 .cfi_startproc - is used at the beginning of each function that should have an entry in .eh_frame. It initializes some internal data structures and emits architecture dependent initial CFI instructions. Each .cfi_startproc directive has to be closed by.cfi_endproc > 在有 .eh_frame 內有 entry 的 function 需要加入這個，用於初始化內部資料結構 .cfi_endproc - is used at the end of a function where it closes its unwind entry previously opened by.cfi_startprocand emits it to.eh_frame > 和 .cfi_startproc 搭配使用 4.2.3 Special SectionsTable - .got This section holds the global offset table. - .plt This section holds the procedure linkage table. - .eh_frame section holds the unwind function table. 4.2.4 EH_FRAME sections The call frame information needed for unwinding the stack is output into one or more ELF sections of type SHT_X86_64_UNWIND. > eh_frame 用於支援 stack unwinding #### x86_64 組語實作 singly-linked list 我把實作放在 [Github](https://github.com/colinyoyo26/linked-list-x86-64) 裡面有兩個版本，一個是以下 c program ，另一個是使用者可以下 command 利用 `switch case` 對 list 操作，寫組語時真的有很多小細節，像是 - `ret` 前要先 `leave` 把 `%rsp` `%rbp` 指回 callee stack 的末端和起點 - 因為 `%rbp` 在固定 function 內不會變，很適合用來 access data in stack 相反的隨著 stack 往下長 `%rsp` 會一直變 - caller 自己要保存 `%rax` `%rbx` 等暫存器 - call `printf` 如果沒有先 `movl $0, %eax` 會 segmentation fault 這點我還沒釐清 - 還有像是以下等等給 linker 的訊息還沒完全理解 .text .globl create_list .type create_list, @function ```cpp #include <stdio.h> #include <stdlib.h> typedef struct List list; typedef struct Node node; struct Node{ int key; node *next; }; struct List{ node *head; }; list *create_list(){ list *l = malloc(sizeof(list)); l->head = NULL; return l; } void push(list *l, int val){ node *cur = malloc(sizeof(node)); cur->key = val; cur->next = l->head; l->head = cur; } node *pop(list *l){ node *cur = l->head; l->head = cur->next; return cur; } void print(list *l){ node *cur = l->head; while(cur){ printf("%d ", cur->key); cur = cur->next; } printf("\n"); } int main(void){ list *l = create_list(); for(int i = 0; i < 10; i++) push(l, i); for(int i = 0; i < 10; i++){ print(l); pop(l); } } ``` 參照 CSAPP 一書提到的 jump table 實做出 `switch case` 的 multiway branch - 因為總共有 `a p c q` 四個 command ，所以 table 的大小開 `'q' - 'a' = 16` - 最後一個 `.EXIT` 是 `case q` ``` .JTABLE: .quad .CASEA .quad .DEFAULT .quad .CASEC .quad .DEFAULT .quad .DEFAULT .quad .DEFAULT .quad .DEFAULT .quad .DEFAULT .quad .DEFAULT .quad .DEFAULT .quad .DEFAULT .quad .DEFAULT .quad .DEFAULT .quad .DEFAULT .quad .DEFAULT .quad .CASEP .quad .Exit ``` 以下是 `switch case` 轉跳的地方 - 再來檢查是否範圍則直接跳到 `DEFAULT` print error message - `*` 是間接尋址，會從地址 `$.JTABLE + 8*%eax` 的地方取值 - 每個 Label 都是一個位址 ``` movzbl -16(%rbp), %eax subl $97, %eax cmpl $3, %eax ja .DEFAULT jmp *.JTABLE(,%eax, 8) ``` 執行畫面 - 推測 add key 時 `\n` 會被當下次 `fgets` 的輸入所以會直接跳到 `default` 還沒修正 ``` Commands: c create list a add key to top of list p pop list from top q quit choice: c list: Commands: c create list a add key to top of list p pop list from top q quit choice: a enter key: 5 list: 5 Commands: c create list a add key to top of list p pop list from top q quit choice: Inlavid command Commands: c create list a add key to top of list p pop list from top q quit choice: ``` --- ### 測驗 `1` 考慮貌似簡單卻蘊含實作深度的 [FizzBuzz](https://en.wikipedia.org/wiki/Fizz_buzz) 題目: * 從 1 數到 n，如果是 3的倍數，印出 "Fizz" * 如果是 5 的倍數，印出 "Buzz" * 如果是 15 的倍數，印出 "FizzBuzz" * 如果都不是，就印出數字本身直覺的實作程式碼如下: (`naive.c`) ```cpp #include <stdio.h> int main() { for (unsigned int i = 1; i < 100; i++) { if (i % 3 == 0) printf("Fizz"); if (i % 5 == 0) printf("Buzz"); if (i % 15 == 0) printf("FizzBuzz"); if ((i % 3) && (i % 5)) printf("%u", i); printf("\n"); } return 0; } ``` 觀察 `printf` 的(格式)字串，可分類為以下三種: 1. 整數格式字串 `"%d"` : 長度為 2 B 2. "Fizz" 或 "Buzz" : 長度為 4 B 3. "FizzBuzz" : 長度為 8 B 考慮下方程式碼: ```cpp #define MSG_LEN 8 char fmt[MSG_LEN + 1]; strncpy(fmt, &"FizzBuzz%u"[start], length); fmt[length] = '\0'; printf(fmt, i); printf("\n"); ``` 我們若能精準從給定輸入 `i` 的規律去控制 `start` 及 `length`，即可符合 FizzBuzz 題意: ```cpp string literal: "fizzbuzz%u" offset: 0 4 8 ``` 以下是利用 bit-wise 和上述技巧實作的 FizzBuzz 程式碼: (`bitwise.c`) ```cpp #define MSG_LEN 8 static inline bool is_divisible(uint32_t n, uint64_t M) { return n * M <= M - 1; } static uint64_t M3 = UINT64_C(0xFFFFFFFFFFFFFFFF) / 3 + 1; static uint64_t M5 = UINT64_C(0xFFFFFFFFFFFFFFFF) / 5 + 1; int main(int argc, char **argv) { for (size_t i = 1; i <= 100; i++) { uint8_t div3 = is_divisible(i, M3); uint8_t div5 = is_divisible(i, M5); unsigned int length = (2 << div3) << div5; char fmt[MSG_LEN + 1]; strncpy(fmt, &"FizzBuzz%u"[(MSG_LEN >> KK1) >> (KK2 << KK3)], length); fmt[length] = '\0'; printf(fmt, i); printf("\n"); } return 0; } ``` 其中 `is_divisible` 函式技巧來自 [Faster remainders when the divisor is a constant: beating compilers and libdivide](https://lemire.me/blog/2019/02/08/faster-remainders-when-the-divisor-is-a-constant-beating-compilers-and-libdivide/)，甚至 gcc-9 還內建了 [FizzBuzz optimization](https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82853) (Bug 82853 - Optimize x % 3 == 0 without modulo)。請補完。 ==作答區== KK1 = ? * `(a)` 0 * `(b)` 1 * `(c)` 2 * `(d)` div3 * `(e)` div5 KK2 = ? * `(a)` 0 * `(b)` 1 * `(c)` 2 * `(d)` div3 * `(e)` div5 KK3 = ? * `(a)` 0 * `(b)` 1 * `(c)` 2 * `(d)` div3 * `(e)` div5 :::success 延伸問題: 1. 解釋上述程式運作原理並評估 `naive.c` 和 `bitwise.c` 效能落差 * 避免 stream I/O 帶來的影響，可將 `printf` 更換為 `sprintf` 2. 分析 [Faster remainders when the divisor is a constant: beating compilers and libdivide](https://lemire.me/blog/2019/02/08/faster-remainders-when-the-divisor-is-a-constant-beating-compilers-and-libdivide/) 一文的想法 (可參照同一篇網誌下方的評論)，並設計另一種 bitmask，如「可被 3 整除則末位設為 1」「可被 5 整除則倒數第二位設定為 1」，然後再改寫 `bitwise.c` 程式碼，試圖運用更少的指令來實作出 branchless; * 參照 [fastmod: A header file for fast 32-bit division remainders on 64-bit hardware](https://github.com/lemire/fastmod) 3. `printf` 本身就是個直譯器，甚至符合 [Turing complete](https://en.wikipedia.org/wiki/Turing_completeness)，這意味著你可以用 `printf` 的字串處理來實作出各式程式，包含實作出另一個 [Turing complete](https://en.wikipedia.org/wiki/Turing_completeness) 的程式語言，例如 Brainf*ck: [printbf -- Brainfuck interpreter in printf](https://github.com/HexHive/printbf) * 閱讀簡報 [Memory corruption: why we can't have nice things](https://github.com/HexHive/printbf/blob/master/BalCCon-presentation.pdf) * 對照 [CS:APP 第 3 版](https://hackmd.io/@sysprog/S1vGugaDQ) 一書的第 3 章研讀 * 記錄你的認知並重現實驗 ::: ==回答== `unsigned int length = (2 << div3) << div5` - 輸出的長度已經判斷好了，下面只要判斷起始位址就好 `strncpy(fmt, &"FizzBuzz%u"[(MSG_LEN >> KK1) >> (KK2 << KK3)], length)` `MSG_LEN` 初值為 8 所以不位移的狀況下 `&"FizzBuzz%u"[(MSG_LEN >> KK1) >> (KK2 << KK3)]` 為 `"%u"` - 因為 $5|i$ 時要輸出 `Buzz` 先得出 `KK1` = `div5` - $3|i$ 時要指向起始位址，所以要把 `MSG_LEN` 歸零，要位移 4 格得 `(KK2 << KK3)` 為 `(div3 << 2)` #### --- ## 參考資料 [分析.cpp文件編譯生成文件語句的作用](https://www.cnblogs.com/justinyo/archive/2013/03/08/2950718.html) [the x86-64 Linux “ABI” - Intel® Developer Zone](https://software.intel.com/sites/default/files/article/402129/mpx-linux64-abi.pdf) [在沒有 Frame Pointer 的情況下進行 Stack Unwind](https://www.facebook.com/notes/scott-tsai/%E5%9C%A8%E6%B2%92%E6%9C%89-frame-pointer-%E7%9A%84%E6%83%85%E6%B3%81%E4%B8%8B%E9%80%B2%E8%A1%8C-stack-unwind/784226238316104/) [Stack Unwinding in C++](https://www.geeksforgeeks.org/stack-unwinding-in-c/) [Assembly x86 - “leave” Instruction](https://stackoverflow.com/questions/29790175/assembly-x86-leave-instruction/29790275)