自我檢查事項(隔離執行環境的建構)

# 自我檢查事項(隔離執行環境的建構) contributed by <`ian910297`> - [x] 閱讀[隔離執行環境的建構與應用](https://hackmd.io/s/Hk9HLRbkf) ## Trusted Execution Environment (TEE) 的應用為何？騰訊所開發的 [soter](https://github.com/Tencent/soter) ，讓 app 開發者可以快速實現安全的指紋驗證。實現安全性的原因 * 所有關鍵數據操作都依賴 TEE * 廠商設備在出廠前安全環境會專門生成 TENCENT SOTER 設備密鑰 * 生物授權表示密鑰簽名， TEE 級別保證『無授權，不簽名』架構 ![架構](https://github.com/WeMobileDev/article/raw/master/assets/soter/SoterFramework.png) TENCENT SOTER 中，有三個級別的密鑰：ATTK, App Secure Key(ASK) 以及 AuthKey。這些都是 RSA-2048 的非對稱密鑰。所有的密鑰都由 TEE（一個獨立於Android系統的安全環境，這也是 TENCENT SOTER 能解決 root 下手機認證的關鍵所在）保護安全保存。這也就意味著除了數據所有者之外，沒有人可以使用私鑰。更重要的是，如果密鑰是在 TEE 中生成的，所有人 - 包括微信，廠商等 - 都無法得到密鑰私鑰。一句話總結，TENCENT SOTER 中所有非對稱密鑰均是在 TEE 內部生成的，即使設備被根，私鑰也不會洩露。除了密鑰之外，TENCENT SOTER 所有關鍵流程，如簽名等，均在 TEE 中進行，這也保證了核心流程不會被惡意應用篡改。更詳細的流程流程可以參閱 [Github Wiki - TENCENT SOTER原理](https://github.com/Tencent/soter/wiki/%E5%8E%9F%E7%90%86) ## Moxie 組合語言寫出迴圈版本的 Fibonacci 數列程式詳閱 [Moxie](http://moxielogic.org/blog/) 處理器架構的 [Architecture](http://moxielogic.org/blog/pages/architecture.html) 文件，了解指令架構 ```clike= ``` ## binutils, gcc, glibc, qemu 到 [libffi](https://sourceware.org/libffi/) 這些專案的作用為何？在看專案前，要先了解一下 libffi 概念：假設我們有一個 pointer 指向我們想使用的 function ，還需要知道輸入、輸出的數值跟型態。我們不會直接接觸到那個 function ，都透過 ffi 所提供的介面來操控。 `ffi_cif` 代表 Call InterFace ，用來符合 [function signature](https://en.wikipedia.org/wiki/Type_signature) `ffi_prep_cif` 初始化，在準備用來呼叫的物件 `ffi_call` 搭配初始化過後的 ffi_cif 來做使用，即可成功 call function ```clike= #include <stdio.h> #include <ffi.h> int main() { ffi_cif cif; ffi_type *args[1]; void *values[1]; char *s; ffi_arg rc; /* Initialize the argument info vectors */ args[0] = &ffi_type_pointer; values[0] = &s; /* Initialize the cif */ if (ffi_prep_cif(&cif, FFI_DEFAULT_ABI, 1, &ffi_type_sint, args) == FFI_OK) { s = "Hello World!"; ffi_call(&cif, puts, &rc, values); /* rc now holds the result of the call to puts */ /* values holds a pointer to the function's arg, so to call puts() again all we need to do is change the value of s */ s = "This is cool!"; ffi_call(&cif, puts, &rc, values); } return 0; } ``` * binutils 待補 * gcc gcc 裡面的 gcj 有使用到 ffi 這個 library ，GCJ 是在 interpreted 和 natively compiled 程式碼之間溝通做使用，不過 gcj 在 gcc 7 時已經被從專案移除掉，所以選擇 `gcc-6.4.0` 來做研究 `libjava/link.cc` 運用到 closure 包裝 ```clike=1072 void * _Jv_Linker::create_error_method (_Jv_Utf8Const *class_name, jclass klass) { void *code; method_closure *closure = (method_closure *)ffi_closure_alloc (sizeof (method_closure), &code); closure->arg_types[0] = &ffi_type_void; // Initializes the cif and the closure. If that worked the closure // is returned and can be used as a function pointer in a class' // atable. if ( ffi_prep_cif (&closure->cif, FFI_DEFAULT_ABI, 1, &ffi_type_void, closure->arg_types) == FFI_OK && ffi_prep_closure_loc (&closure->closure, &closure->cif, _Jv_ThrowNoClassDefFoundErrorTrampoline, class_name, code) == FFI_OK) { closure->list.registerClosure (klass, closure); return code; ``` * glibc 待補 * qemu 待補參考資料 * [Using libffi](http://www.chiark.greenend.org.uk/doc/libffi-dev/html/Using-libffi.html#Using-libffi) * [GCC Wiki - GCJ](https://gcc.gnu.org/wiki/GCJ) ## Intel 的 [Software Guard Extensions](https://software.intel.com/en-us/sgx) 關鍵特性為何？並舉出共筆以外的應用案例 * 關鍵特性基於 Intel CPU Instruction 去實作的 TEE 讓開發者可以將 App 切為兩個部分，可以把關鍵私密的資料都儲存在 Trusted Part ，且無法從指定的 call function 之外的地方對 Trusted Part 內容進行存取 ![run time](https://software.intel.com/sites/default/files/managed/6f/ab/runtime-execution.png) * 應用案例 - Azure confidential computing 使用兩套 TEE: VSM (software-based), Intel SGX (hardware-based) ![](https://azurecomcdn.azureedge.net/mediahandler/acomblog/media/Default/blog/91f67ece-811d-46de-8add-ab64d4fd050f.png) 參考資料 * [Software Guard Extensions - details](https://software.intel.com/en-us/sgx/details) * [Microsoft blog - Introducing Azure confidential computing](https://azure.microsoft.com/en-us/blog/introducing-azure-confidential-computing/?_ga=2.22608115.993430301.1510843037-2034239733.1510843036) ## [moxiebox](https://github.com/sysprog21/moxiebox) 打造出隔離執行 (sandbox) 的運作環境該如何與外界溝通？列出對應的程式碼並解讀 ## 法國的加密貨幣硬體錢包公司 [Ledger](https://www.ledgerwallet.com/) 的 BOLOS 運作原理詳閱文件 [sandbox execution environment](https://github.com/sysprog21/moxiebox/blob/master/sandbox-design.md) ## 何謂 cross compiler 呢？我們為何需要？待補開發環境未必會等於使用環境，使開發出的程式可以在特定環境底下執行 ## ELF 執行檔格式包含哪些 section 呢？又在哪裡可見到詳細描述？以 `moxiebox/tests/basic` 為例使用 `moxie-none-moxiebox-readelf -a` 來讀取 ELF 檔案，會得到 `ELF Header`, `Section Header`, `Program Header` 以及 `Symbol table` ```clike= ELF Header: Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 Class: ELF32 Data: 2's complement, little endian Version: 1 (current) OS/ABI: UNIX - System V ABI Version: 0 Type: EXEC (Executable file) Machine: Moxie Version: 0x1 Entry point address: 0x1000 Start of program headers: 52 (bytes into file) Start of section headers: 28200 (bytes into file) Flags: 0x0 Size of this header: 52 (bytes) Size of program headers: 32 (bytes) Number of program headers: 2 Size of section headers: 40 (bytes) Number of section headers: 22 Section header string table index: 21 ``` 除了使用 `moxie-none-moxiebox-readelf` 也可以藉由 `moxie-none-moxiebox-objdump` 來列出 ELF 執行擋有哪些 section，列出的項目會不太一樣 ```clike= Section Headers: [Nr] Name Type Addr Off Size ES Flg Lk Inf Al [ 0] NULL 00000000 000000 000000 00 0 0 0 [ 1] .text PROGBITS 00001000 000074 000452 00 AX 0 0 2 [ 2] .init PROGBITS 00001452 0004c6 00000e 00 AX 0 0 2 [ 3] .fini PROGBITS 00001460 0004d4 000008 00 AX 0 0 2 [ 4] .rodata PROGBITS 00001468 0004dc 000004 00 A 0 0 4 [ 5] .data PROGBITS 0000156c 0004e0 000430 00 WA 0 0 4 [ 6] .eh_frame PROGBITS 0000199c 000910 000004 00 WA 0 0 4 [ 7] .ctors PROGBITS 000019a0 000914 000008 00 WA 0 0 4 [ 8] .dtors PROGBITS 000019a8 00091c 000008 00 WA 0 0 4 [ 9] .bss NOBITS 000019b0 000924 000028 00 WA 0 0 4 [10] .comment PROGBITS 00000000 000924 00003c 01 MS 0 0 1 [11] .debug_aranges PROGBITS 00000000 000960 0000f0 00 0 0 1 [12] .debug_info PROGBITS 00000000 000a50 0031bb 00 0 0 1 [13] .debug_abbrev PROGBITS 00000000 003c0b 000cbe 00 0 0 1 [14] .debug_line PROGBITS 00000000 0048c9 000dd1 00 0 0 1 [15] .debug_frame PROGBITS 00000000 00569c 000178 00 0 0 4 [16] .debug_str PROGBITS 00000000 005814 000a1b 01 MS 0 0 1 [17] .debug_loc PROGBITS 00000000 00622f 000415 00 0 0 1 [18] .debug_ranges PROGBITS 00000000 006644 000038 00 0 0 1 [19] .symtab SYMTAB 00000000 00667c 0004d0 10 20 49 4 [20] .strtab STRTAB 00000000 006b4c 000214 00 0 0 1 [21] .shstrtab STRTAB 00000000 006d60 0000c7 00 0 0 1 ``` * `.text` 執行的程式碼 * `.init` 在進入 main 之前所執行的程式碼 * `.fini` 程式正常結束後所需執行的程式碼 * `.rodata` 唯獨的資料，不可寫入 * `.data` 有初始化的資料 * `.ctors` constructor 相關資訊 * `.dtors` deconstructor 相關資訊 * `.bss` 未初始化的資料，這邊的資料沒有檔案空間，所以 section type 是 `NOBIT` * `.comment` 版本控制相關資訊 * `.debug_XXX` debug 相關的資訊 * `.symtab` 控制 symbol table * `.strtab` 控制跟 symbol table 有關的字串 * `.shstrtab` 控制 section name ```clike Program Headers: Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align LOAD 0x000000 0x00000f8c 0x00000f8c 0x004e0 0x004e0 R E 0x1 LOAD 0x0004e0 0x0000156c 0x0000156c 0x00444 0x0046c RW 0x1 Section to Segment mapping: Segment Sections... 00 .text .init .fini .rodata 01 .data .eh_frame .ctors .dtors .bss ``` ```clike= Symbol table '.symtab' contains 77 entries: Num: Value Size Type Bind Vis Ndx Name 0: 00000000 0 NOTYPE LOCAL DEFAULT UND 1: 00001000 0 SECTION LOCAL DEFAULT 1 2: 00001452 0 SECTION LOCAL DEFAULT 2 3: 00001460 0 SECTION LOCAL DEFAULT 3 4: 00001468 0 SECTION LOCAL DEFAULT 4 5: 0000156c 0 SECTION LOCAL DEFAULT 5 6: 0000199c 0 SECTION LOCAL DEFAULT 6 7: 000019a0 0 SECTION LOCAL DEFAULT 7 ``` 參考資料 * [Tool Interface Standard (TIS) - ELF](http://refspecs.linuxbase.org/elf/elf.pdf) * [Stackoverflow - __do_global_dtors_aux and __do_global_ctors_aux ](https://stackoverflow.com/a/6477522) ## 是否已掌握 GNU gprof 的使用？運作原理為何？待補參考資料 * [Microsoft Docs - 認識程式碼剖析方法](https://docs.microsoft.com/zh-tw/visualstudio/profiling/understanding-performance-collection-methods) ## 如何在執行時期檢驗載入的 ELF 執行檔裡頭 `.text `和 `.data` section 內容呢？以 `moxiebox/tests/basic` 為例，使用 `moxie-none-moxiebox-gdb` 進行除錯使用 `info files` 取得 section address ```clike= (gdb) info files Symbols from "/home/demonic/sandbox/moxiebox/tests/basic". Local exec file: `/home/demonic/sandbox/moxiebox/tests/basic', file type elf32-littlemoxie. Entry point: 0x1000 0x00001000 - 0x00001452 is .text 0x00001452 - 0x00001460 is .init 0x00001460 - 0x00001468 is .fini 0x00001468 - 0x0000146c is .rodata 0x0000156c - 0x0000199c is .data 0x0000199c - 0x000019a0 is .eh_frame 0x000019a0 - 0x000019a8 is .ctors 0x000019a8 - 0x000019b0 is .dtors 0x000019b0 - 0x000019d8 is .bss ``` 使用 `x` (for "examine memory") ```clike= (gdb) x/5i 0x1000 0x1000 <__start>: xor $fp, $fp 0x1002 <__start+2>: gsr $sp, 0x7 0x1004 <__start+4>: ldi.l $r0, 0x19b0 0x100a <__start+10>: xor $r1, $r1 0x100c <__start+12>: ldi.l $r2, 0x19d8 (gdb) x/5i 0x156c 0x156c <_impure_ptr>: mul.x $r5, $fp 0x156e <_impure_ptr+2>: bad 0x1570 <impure_data>: bad 0x1572 <impure_data+2>: bad 0x1574 <impure_data+4>: bad ``` 參考資料 * [Debugging with GDB - 8.5 Examining memory](http://www.delorie.com/gnu/docs/gdb/gdb_56.html) ## 是否詳讀文件 [遠端除錯](http://www.study-area.org/cyril/opentools/opentools/x1265.html) 並記錄心得呢？是否在 GNU/Linux 實際照著操作？又，遇到什麼問題呢？ * Multi-Thread 除錯在運行第二個 gdb 時，發現不能 attach to process ```clike= ~> ps -aef | grep a.out demonic 9201 8370 3 19:28 pts/0 00:00:00 gdb a.out demonic 9203 9201 0 19:28 pts/0 00:00:00 /home/demonic/test/a.out demonic 9207 9203 0 19:28 pts/0 00:00:00 /home/demonic/test/a.out demonic 9210 9090 0 19:28 pts/9 00:00:00 grep --color=auto a.out ~> gdb -q a.out (gdb) attach 9207 Attaching to program: /home/demonic/test/a.out, process 9207 Could not attach to process. If your uid matches the uid of the target process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try again as the root user. For more details, see /etc/sysctl.d/10-ptrace.conf ptrace: Operation not permitted. ``` 解決辦法：編輯 `/etc/sysctl.d/10-ptrace.conf` 將其數值改為 0，並且使用 `sysctl -p /etc/sysctl.d/10-ptrace.conf` 更新 `/proc/sys/kernel/yama/ptrace_scope` 的數值，如此即可繼續練習使用 gdb 。也可以只更動 proc 底下的數值，但重新開機就會需要重新設定 * 遠端除錯 target remote 部份，已使用 `moxie-none-moxiebox-gdb` 進行練習 target cisco 部份，沒有機器可以試用 * core dump 除錯為了產生 coredump 檔案，須先執行 `ulimited -c unlimited` 參考資料 * [Stackoverflow - after upgrade gdb won't attach to process](https://askubuntu.com/a/41656) * [[Linux] 開啟 core dump 與設定產生 core dump 的程式](https://ephrain.net/linux-%E9%96%8B%E5%95%9F-core-dump-%E8%88%87%E8%A8%AD%E5%AE%9A%E7%94%A2%E7%94%9F-core-dump-%E7%9A%84%E7%A8%8B%E5%BC%8F/) ## GDB 命令如 step 是如何透過 GDB stub 傳遞到 moxiebox 裡頭呢？兩邊的通訊協定又為何？ * moxiebox 啟動 `sandbox` ```shell= ~> ../src/sandbox -e basic -g 9999 ``` 使用 `netstat` 進行觀察，發現在 `0.0.0.0:9999` 的位置上，已經開啟一個 tcp 連線 ```shell= ~> netstat -tcpl Active Internet connections (only servers) Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name tcp 0 0 0.0.0.0:902 0.0.0.0:* LISTEN - tcp 0 0 localhost:mysql 0.0.0.0:* LISTEN - tcp 0 0 0.0.0.0:hostmon 0.0.0.0:* LISTEN - tcp 0 0 0.0.0.0:9999 0.0.0.0:* LISTEN 11680/../src/sandbo ``` 實作程式碼在 `src/sandbox.cc` ```clike=366 static int gdb_main_loop(uint32_t &gdbPort, machine &mach) { int sockfd, newsockfd, on; struct sockaddr_in serv_addr, cli_addr; socklen_t clilen; sockfd = socket(AF_INET, SOCK_STREAM, 0); if (sockfd < 0) { perror("ERROR opening socket"); return EXIT_FAILURE; } on = 1; setsockopt(sockfd, SOL_SOCKET, SO_REUSEADDR, (char *) &on, sizeof(on)); on = 1; setsockopt(sockfd, SOL_SOCKET, SO_KEEPALIVE, (char *) &on, sizeof(on)); serv_addr.sin_family = AF_INET; serv_addr.sin_addr.s_addr = htonl(INADDR_ANY); serv_addr.sin_port = htons(gdbPort); if (bind(sockfd, (struct sockaddr *) &serv_addr, sizeof(serv_addr)) < 0) { close(sockfd); perror("ERROR on binding"); return EXIT_FAILURE; } listen(sockfd, 1); clilen = sizeof(cli_addr); newsockfd = accept(sockfd, (struct sockaddr *) &cli_addr, &clilen); ``` 老師在這部分還預留了兩個 bug 以及一個 TODO 待補 * GDB stub 啟動 `moxie-none-moxiebox-gdb` ```shell= ~> moxie-none-moxiebox-gdb -q basic (gdb) target remote :9999 ``` 待補 * GDB stub 跟 moxie 之間的溝通待補參考資料 * [GDB Remote Serial Protocol](https://sourceware.org/gdb/onlinedocs/gdb/Remote-Protocol.html) ## 是否理解 GDB Macro 和 Command Files 呢？能否透過 moxiebox 進行練習呢？ # 自我檢查事項(Model Checking) contributed by <`st9007a`> ## cbmc 的作用為何？在什麼領域會需要呢？ cmbc 是一個用於 C/C++ 的 Bounded Model Checker，主要來檢測程式碼中是否存在潛在不合法的陣列或指標使用，更詳細的資訊可以使用指令 `cbmc --help` 查看： ``` $ cbmc --help Program instrumentation options: --bounds-check enable array bounds checks --pointer-check enable pointer checks (always enabled for Java) --memory-leak-check enable memory leak checks --div-by-zero-check enable division by zero checks --signed-overflow-check enable signed arithmetic over- and underflow checks --unsigned-overflow-check enable arithmetic over- and underflow checks --pointer-overflow-check enable pointer arithmetic over- and underflow checks --conversion-check check whether values can be represented after type cast --undefined-shift-check check shift greater than bit-width --float-overflow-check check floating-point for +/-Inf --nan-check check floating-point for NaN --no-built-in-assertions ignore assertions in built-in library --no-assertions ignore user assertions --no-assumptions ignore user assumptions --error-label label check that label is unreachable --cover CC create test-suite with coverage criterion CC --mm MM memory consistency model for concurrent programs ``` 其主要檢測項目有陣列 , 指標 , 記憶體跟 overflow 等等參考 [Applications of CBMC](http://www.cprover.org/cbmc/applications/)，cbmc 在這裡主要用於除錯或偵測潛在的錯誤，再參考 [使用事件自動機規約的C语言有界模型檢測](http://www.jos.org.cn/html/2014/11/4609.htm)，在一些複雜且不容出錯或者難以除錯的環境中，如交通跟醫療，可以透過形式化驗證來檢測系統 ## 能否簡介自動機理論 (automata theory) 呢？ ### 簡介有限自動機一個有限自動機中有以下特徵： - 有限數量的狀態 - 一個起始狀態 - 可以接受的狀態 - 固定的輸入 - 轉移函數 ![](https://i.imgur.com/qPTBy2h.png) 以上圖來舉例，A, B, C 就是所有的狀態，0, 1 則是輸入，箭頭代表輸入後狀態的轉變，其中： - A 為起始狀態 - B, C 為可以接受的狀態轉移函數則可以用表格來表示： | | 0 | 1 | | - | - | - | | A | A | B | | B | A | C | | C | C | C | 最左邊的 column 代表現在的狀態，對應到給定的輸入後轉變的狀態 ## model checking 中局部狀態表示每個主題執行通訊協定的過程，而所有局部狀態構成全域狀態，每個局部狀態的改變由主體的訊息接收、發送來觸發，能否舉出實際案例說明？待補 ## 是否已編譯 cbmc 並用來分析 C 程式呢？若是，舉出詳細的操作和你的解讀跟著 [CBMC: Bounded Model Checking for C/C++ and Java](http://www.cprover.org/cprover-manual/cbmc.shtml) 操作，首先對以下程式執行指令 `cbmc test1.c --bounds-check --pointer-check` ```clike int puts(const char *s) { } int main(int argc, char **argv) { puts(argv[2]); } ``` 可以看到以下輸出 ``` ** Results: [main.pointer_dereference.1] dereference failure: pointer NULL in argv[(signed long int)2]: SUCCESS [main.pointer_dereference.2] dereference failure: pointer invalid in argv[(signed long int)2]: SUCCESS [main.pointer_dereference.3] dereference failure: deallocated dynamic object in argv[(signed long int)2]: SUCCESS [main.pointer_dereference.4] dereference failure: dead object in argv[(signed long int)2]: SUCCESS [main.pointer_dereference.5] dereference failure: pointer outside dynamic object bounds in argv[(signed long int)2]: SUCCESS [main.pointer_dereference.6] dereference failure: pointer outside object bounds in argv[(signed long int)2]: FAILURE ** 1 of 6 failed (2 iterations) VERIFICATION FAILED ``` 從輸出訊息中注意到 `pointer outside object bounds in argv[(signed long int)2]: FAILURE`，接著利用 `--show-properties` 可以查看每個要驗證的 property： ``` Property main.pointer_dereference.6: file test1.c line 4 function main dereference failure: pointer outside object bounds in argv[(signed long int)2] 16l + POINTER_OFFSET(argv) >= 0l && OBJECT_SIZE(argv) >= 24ul + (unsigned long int)POINTER_OFFSET(argv) || DYNAMIC_OBJECT(argv) ``` 上面為 `--show-properties` 的其中一個 property，而這個 property 的 rule 就是最後一行的 `16l + POINTER_OFFSET(argv) >= 0l && OBJECT_SIZE(argv) >= 24ul + (unsigned long int)POINTER_OFFSET(argv) || DYNAMIC_OBJECT(argv)` 而這個 rule 來自於 cbmc 自己的規定，再來可以用 `--trace` 來查看 fail 發生的原因，它會在 fail 的 property 中 trace 失敗發生時的前幾個狀態 ``` Trace for main.pointer_dereference.6: State 18 thread 0 ---------------------------------------------------- INPUT argc: 1 (00000000000000000000000000000001) State 19 thread 0 ---------------------------------------------------- argv'[1]=((char *)NULL) (0000000000000000000000000000000000000000000000000000000000000000) State 22 file test1.c line 3 thread 0 ---------------------------------------------------- argc=1 (00000000000000000000000000000001) State 23 file test1.c line 3 thread 0 ---------------------------------------------------- argv=argv' (0000001000000000000000000000000000000000000000000000000000000000) Violated property: file test1.c line 4 function main dereference failure: pointer outside object bounds in argv[(signed long int)2] 16l + POINTER_OFFSET(argv) >= 0l && OBJECT_SIZE(argv) >= 24ul + (unsigned long int)POINTER_OFFSET(argv) || DYNAMIC_OBJECT(argv) ``` 從 trace 的結果可以大概知道當 argc = 1 時，在第 4 行程式碼會發生違反 property 的狀況，而第 4 行剛好是 ```clike puts(argv[2]); ``` 也就是 argc = 1 時會發生存取到超過 argv 的有效範圍的位置 ## 以 moxiebox 為例，能否使用 cbmc 來找出程式碼實作的缺陷 (或者自己創造) 呢？進入 `moxiebox/runtime` 中，可以發現裡面有幾個跟記憶體操作有關的程式碼，打開其中的 `memchr.c` 它實際上執行的程式碼如下： ```clike void *memchr(const void *src_void, int c, size_t length) { const unsigned char *src = (const unsigned char *) src_void; unsigned char d = c; while (length--) { if (*src == d) return (void *) src; src++; } return NULL; } ``` 它會從 `src_void` 為起點，往後 `length` 的長度中找到符合 `c` 的 character 針對這個 function 做 unsigned integer overflow 的檢查，會出現驗證失敗的狀況： ``` $ cbmc memchr.c --function memchr --unwind 10 --unsigned-overflow-check ... ** 1 of 1 failed (1 iteration) VERIFICATION FAILED ``` 在 cbmc 中加入 `--trace` 查看詳細的原因： ``` $ cbmc memchr.c --function memchr --unwind 10 --unsigned-overflow-check --trace ... State 105 file memchr.c line 120 function memchr thread 0 ---------------------------------------------------- length=0ul (0000000000000000000000000000000000000000000000000000000000000000) State 108 file memchr.c line 123 function memchr thread 0 ---------------------------------------------------- src=&tmp$1!0 (0000001000000000000000000000000000000000000000000000000000001001) Violated property: file memchr.c line 120 function memchr arithmetic overflow on unsigned - in length - 1ul !overflow("-", size_t, length, 1ul) ** 1 of 1 failed (1 iteration) VERIFICATION FAILED ``` 研究了一下，其原因出在當 length = 0 時，除了跳出 while 迴圈以外還會將 length 再減一次 1，由於 length 型態為 unsigned int 因此這裡就出現 unsigned integer overflow，以這個案例來說，這樣的實作方式其實不會造成什麼危害，要注意到的是 cbmc 找到的不一定是 bug，也有可能是潛在的漏洞修正方法： ```clike while (length) { length--; if (*src == d) return (void *) src; src++; } return NULL; ``` 為防止發生 0ul - 1 的狀況，在迴圈中才做 `length--` 的操作，利用 cbmc 再次確認： ``` $ cbmc memchr.c --function memchr --unwind 10 --unsigned-overflow-check ... VERIFICATION SUCCESSFUL ``` ## 在 GitHub 專案中找出使用 model checking 的案例，並且具體說明其作用 (當然要親自實驗) 待補 ## 參考資料 [Applications of CBMC](http://www.cprover.org/cbmc/applications/) [使用事件自動機規約的C语言有界模型檢測](http://www.jos.org.cn/html/2014/11/4609.htm) [[Coursera][Automata] 自動機理論-Automata筆記-第一週Finite Automata](http://www.evanlin.com/moocs-coursera-automata-note1/) [自動機理論](http://prompt.nou.edu.tw/web/sno204/content/w01/01-02-01t.htm) [What is model checking](http://www.cse.chalmers.se/~gersch/model-checking/What-is-Model-Checking.html) # 自我檢查事項(第 6 週作業回顧) contributed by <`st9007a`> ## 是否已知曉搭配 computed goto 實作更高效的 opcode dispatch 呢？舉例說明並且附上效能分析 ### 說明 full-stack-hello 這個 repo 中的 virtual machine 使用 direct threading 的方法來實作 opcode dispatch，direct threading 是在一個 function 內設置多個 label，透過一個 direct threading table 來決定每個 opcode 所要前往的 label，在這個 repo 中 `vm_run` 一開始使用 `BEGIN_OPCODES` 所產生的 label list 就是用來扮演 direct threading table 的腳色 ### 效能分析這邊將比較兩種 opcode dispatch 方式的效能，這兩種方式分別是 computed goto 跟 switch 首先在 full-stack-hello 開另一個 branch 實作 switch statement 版本的 virtual machine： ```clike void vm_run(vm_env *env) { while (1) { switch (OPCODE.opcode) { case OP(ADD): { VM_CALL_HANDLER(); break; } case OP(SUB): { VM_CALL_HANDLER(); break; } ...中間略... case OP(CLZ): { VM_CALL_HANDLER(); break; } case OP(JLT): { VM_JLT(); break; } case OP(JLE): { VM_JLE(); break; } case OP(JGT): { VM_JGT(); break; } case OP(JGE): { VM_JGE(); break; } case OP(JZ): { VM_JZ(); break; } case OP(JNZ): { VM_JNZ(); break; } case OP(JMP): { profiling(); GOTO(OPCODE.op1.value.id); break; } case OP(CALL): { VM_CALL(OPCODE.op1.value.id); break; } case OP(RET): { VM_RET(); break; } case OP(HALT): { profiling(); END_OPCODES; #if defined(profile) show_profiling_info(); #endif return; } } } } ``` macro 的部分也要做一點修改，主要是改掉 goto 的部分，詳細實作可以看 [github](https://github.com/st9007a/full-stack-hello/commit/c4a67aa748cd7166f4f0a52b76afa822fd195bf5) 接著效能比較使用 perf 對 recursive 版本的 Fibonacci 來做比較： ``` /* computed goto */ $ perf stat --repeat 100 -e instructions,cycles ./as_exec --input 34 tests/fib-recursive.s Performance counter stats for './as_exec --input 34 tests/fib-recursive.s' (100 runs): 186,0646,9616 instructions # 2.68 insns per cycle ( +- 0.00% ) 69,3560,8922 cycles ( +- 0.07% ) 19,9244,3216 branches ( +- 0.00% ) 1862,3270 branch-misses # 0.93% of all branches ( +- 0.00% ) 1.934058912 seconds time elapsed ( +- 0.08% ) /* switch */ $ perf stat --repeat 100 -e instructions,cycles ./as_exec --input 34 tests/fib-recursive.s Performance counter stats for './as_exec --input 34 tests/fib-recursive.s' (100 runs): 190,7057,9491 instructions # 2.69 insns per cycle ( +- 0.00% ) 70,9904,7359 cycles ( +- 0.25% ) 23,0182,8702 branches ( +- 0.00% ) 2256,1088 branch-misses # 0.98% of all branches ( +- 0.06% ) 1.979487518 seconds time elapsed ( +- 0.25% ) ``` 這邊選擇觀察 instruction 跟 cycle 數量以及 branch 跟 branch miss，在執行速度上差距比較不明顯，computed goto 大約比 switch 快 2%，而 branch 上 computed goto 都比 switch 少了 10% 以上 ( 其中 branches 減少 13%，branch miss 減少 17% ) ## 何謂 tail recursion 呢？舉出具體而微的案例並用 perf 解釋細部差異 (特別是 cycle count 的落差) 一般的 recursion 會 call function 至最後一個 function return 之後再開始做計算，而 tail recursion 會在 call function 之前先做計算，拿 Fibonacci 做舉例 ```clike int fib_recursive(int n) { if (n == 0) return 0; if (n == 1) return 1; return fib_recursive(n - 1) + fib_recursive(n - 2); } int fib_tail_recursion(int n, int a, int b) { if (n == 0) return a; return fib_tail_recursion(n - 1, b, a + b); } ``` recursive 版本的 Fibonacci 實作最後會回傳 fib(n - 1) + fib(n - 2)，而這兩個 function 在最後也會同樣繼續呼叫 fib，直到 fib 的輸入值為 0 或 1 時才會開始回傳數值，並且將回傳的數值相加 tail recursion 在最後 call function 之前已經先做完相加了，即最後一個參數的位置 `a + b`，相加完才傳給下一個 function，由於計算已經做完了，所以之後的 call function 就可以省去重複的計算利用 perf 來比較兩者在 n = 10 的 cycles 跟 instructions： ``` Performance counter stats for './main' (10 runs): 59,3426 cycles ( +- 0.50% ) 51,2857 instructions # 0.86 insns per cycle ( +- 0.33% ) 0.001705757 seconds time elapsed ( +- 14.87% ) Performance counter stats for './main' (10 runs): 59,0573 cycles ( +- 0.40% ) 50,8912 instructions # 0.86 insns per cycle ( +- 0.21% ) 0.001778989 seconds time elapsed ( +- 18.14% ) ``` 上面為 tail recursion，下面為 recursive，可以注意到在 n = 10 時，兩者所花費的時間, cycles, instructions 相近，接著比較 n = 40 時的狀況： ``` Performance counter stats for './main' (10 runs): 59,3367 cycles ( +- 1.23% ) 51,1270 instructions # 0.86 insns per cycle ( +- 0.68% ) 0.001777684 seconds time elapsed ( +- 18.38% ) Performance counter stats for './main' (10 runs): 26,9610,2973 cycles ( +- 0.00% ) 61,6681,1237 instructions # 2.29 insns per cycle ( +- 0.00% ) 0.752938374 seconds time elapsed ( +- 0.10% ) ``` 上面為 tail recursion，下面為 recursive，可以發現到 n = 40 時，recursive 所花費的時間, cycles, instructions 有顯著的增加，這是由於過多的重複運算造成 ## 是否理解 Fibonacci 數列求值的多種實作方式？若是，詳述之 ### Iterative 透過迴圈從 f0 跟 f1 不斷相加直到要求的 index，程式碼如下： ```clike int fib_iterative(int n) { int f0 = 0; int f1 = 1; int sum = n == 1 ? f1 : f0; for (int i = 1; i < n; i++) { sum = f0 + f1; f0 = f1; f1 = sum; } return sum; } ``` ### Recursive 透過不斷呼叫 `fib(n - 1) + fib(n - 2)` 的方式，讓程式一直往下呼叫，最後再將所有結果相加，程式碼如下： ```clike int fib_recursive(int n) { if (n == 0) return 0; if (n == 1) return 1; return fib_recursive(n - 1) + fib_recursive(n - 2); } ``` ### Tail Recursion 同樣是透過不斷往下呼叫 function，但是在呼叫前會先做計算，再把計算後的值當作參數傳入 function 中，程式碼如下，其中 `a + b` 就是預先計算的值： ```clike int fib_tail_recursion(int n, int a, int b) { if (n == 0) return a; return fib_tail_recursion(n - 1, b, a + b); } ``` ### Q-Matrix 參考 [你所不知道的 C 語言 - 遞迴呼叫篇](https://hackmd.io/s/rJ8BOjGGl)，Fibonacci 可透過矩陣轉換來計算： $A = \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix} = \begin{bmatrix} F_2 & F_1 \\ F_1 & F_0 \end{bmatrix}$ $A ^ {n+1} = A A ^ n = \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} F_{n+1} & F_n \\ F_n & F_{n-1} \end{bmatrix} = \begin{bmatrix} F_{n+1} + F_n & F_n + F_{n-1} \\ F_{n+1} & F_n \end{bmatrix} = \begin{bmatrix} F_{n+2} & F_{n+1} \\ F_{n+1} & F_n \end{bmatrix}$ 到這邊為止，就可以透過 iterative 或 tail recursion 來計算 $A ^ n$ 進而求得 fib(n)，透過以下矩陣加速計算的方法來提升計算速度 $A ^ {2n+1} = A ^ {n + 1} A ^ n$ ### Fast Doubling 根據 Fibonacci Q-Matrix 的特性可以推導出以下公式： $F(2k) = F(k) \times (2F(k+1) - F(k))$ $F(2k+1) = F(k + 1) ^ 2 + F(k) ^ 2$ 透過這兩條公式，可以在時間複雜度為 $O(log_2N)$ 的狀況下求出 fib(n)，另外 Q-Matrix 的計算時間複雜度也是 $O(log_2N)$ 以下為程式碼： ```clike int fib_fast_doubling (int n) { if (n == 0) return 0; if (n == 1) return 1; int cbs = 30 - __builtin_clz(n); int f1 = 1; int f2 = 1; int sum, sum_1; while (cbs >= 0) { sum = (f2 * 2 - f1) * f1; sum_1 = f1 * f1 + f2 * f2; f2 = sum_1; f1 = sum; if ((n >> cbs) & 1) { f1 = f2; f2 = sum_1 + sum; } cbs--; } return f1; } ``` 程式主要是從輸入值的最高有效位開始往前檢查每個 bit，如果 bit = 0 就計算 fib(2k) 及 fib(2k + 1)，如果 bit = 1 除了計算 fib(2k), fib(2k + 1) 以外，再根據 fib(2k), fib(2k + 1) 計算 fib(2k + 2) ## GCC 如何從 3-address code 轉換，又如何轉換出給定 ISA 的組合語言程式？待補 ## 目前採用的 LP64 資料模型下，int 實質為 32-bit，而 fib(46) 尚在 32-bit 能表示的數值範圍內，但 fib(47) 該如何處理？需做 integer overflow 的檢查，必須將相加及相乘的運算用能夠處理 overflow 的 function 來取代 ```clike uint32_t add(uint32_t a, uint32_t b, int *carry) { *carry = (a >> 1) + (b >> 1) + (a & b & 1) >= (uint32_t)(1 << 31); return a + b; } void mul(uint32_t a, uint32_t b, uint32_t *hi, uint32_t *lo) { uint32_t alo = a & 0xFFFF; uint32_t blo = b & 0xFFFF; uint32_t ahi = a >> 16; uint32_t bhi = b >> 16; int carry = 0; *hi = (ahi * blo >> 16) + (alo * bhi >> 16); *hi += ahi * bhi; *lo = add((ahi * blo) & 0xFFFF << 16, (alo * bhi) & 0xFFFF << 16, &carry); *hi += carry; *lo = add(*lo, alo * blo, &carry); *hi += carry; } ``` ## Fibonacci sequence 在真實世界中究竟有何應用？ ### Fibonacci Search 用於在一個排序過的陣列中尋找 element，類似於 binary search，首先選擇一個 index 當作 entry，在 binary search 中為 `array.length / 2`，fibonacci search 則為 `n , where min(f(n)) >= array.length` 接著如果沒找到就根據搜尋目標與現在索引的值的大小往左或往右移動 f(n - 1)，若沒找到就繼續移動 f(n - 2) , f(n - 3) ... 以此類推與 binary search 比較： - binary search 每次的移動使用除法，而 fibonacci search 使用加法或減法，在 cpu 上除法的開銷相對比較大 - 在數據極大的狀況下，fibonacci search 每次移動的距離相對較近，在 cache 上有優勢 ## Fibonacci Q-Matrix 如何運作？由數學歸納法來證明： 1. 有一矩陣 A ， n = 1 時如下： $A ^ 1 = \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix} = \begin{bmatrix} F_2 & F_1 \\ F_1 & F_0 \end{bmatrix}$ 2. 假設 $n = m$ 時，符合條件： $A ^ m = \begin{bmatrix} F_{m+1} & F_m \\ F_m & F_{m-1} \end{bmatrix}$ 3. 證明 $n = m + 1$，條件依舊成立： $A ^ {m+1} = A A ^ m = \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} F_{m+1} & F_m \\ F_m & F_{m-1} \end{bmatrix}$ $= \begin{bmatrix} F_{m+1} + F_m & F_m + F_{m-1} \\ F_{m+1} & F_m \end{bmatrix} = \begin{bmatrix} F_{(m+1) + 1} & F_{m+1} \\ F_{m+1} & F_{(m+1) - 1} \end{bmatrix}$ 故由數學歸納法原理得證 ## 如何設計 CPU profiler 呢？參考 [github - simple.vm](https://github.com/skx/simple.vm)　的設計方式，利用 `#if defined(profile)` 將 profiler 相關的程式碼寫在其中，如蒐集 instruction, cycles 的數量等，如果 profile mode 就透過 gcc 的 -D 選項來定義 profile，normal mode 就不定義，因此在正常模式中就不會執行跟 profiler 相關的程式碼相關程式碼放在 [github](https://github.com/st9007a/full-stack-hello/commit/e63412bfe8a94e97cf8ce138890a99033ad87e9d) ## 如果我們也想對 full-stack-hello 提出 GDB 支援，該如何實作 GDB stub 呢？ ## 參考資料 [3 Dispatch Techniques](http://www.cs.toronto.edu/~matz/dissertation/matzDissertation-latex2html/node6.html) [你所不知道的 C 語言 - 遞迴呼叫篇](https://hackmd.io/s/rJ8BOjGGl) [github - simple.vm](https://github.com/skx/simple.vm) [fibonacci search](http://www.geeksforgeeks.org/fibonacci-search/)