Function Call and Stack Frame

# Function Call and Stack Frame This article records content related to my learning of function call and stack frame, including: * Compilation of C code and instruction explain * How stack frames grow and reclaim * Basic gdb operations If there are any questions or problems about the article, please feel free to leave the comment. --- Here is our example code, compile command, and my machine information ## Example code ```clike= // stack.c int FooB(int val) { return val + 1; } int FooA(int val) { return FooB(val + 1) } int main(void) { int val = FooA(0); return 0; } ``` ## Compiler command ```bash # compile with (-no-pie for the constant virtual address) $ gcc -no-pie -g -o stack stack.c ``` ## CPU info. ```bash! $ lscpu Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Byte Order: Little Endian ... ``` Here is the assembler code from my machine using `objdump -M intel -d stack` ``` assembly Disassembly of section .text: 0000000000400497 <FooB>: 400497: 55 push rbp 400498: 48 89 e5 mov rbp,rsp 40049b: 89 7d fc mov DWORD PTR [rbp-0x4],edi 40049e: 8b 45 fc mov eax,DWORD PTR [rbp-0x4] 4004a1: 83 c0 01 add eax,0x1 4004a4: 5d pop rbp 4004a5: c3 ret 00000000004004a6 <FooA>: 4004a6: 55 push rbp 4004a7: 48 89 e5 mov rbp,rsp 4004aa: 48 83 ec 08 sub rsp,0x8 4004ae: 89 7d fc mov DWORD PTR [rbp-0x4],edi 4004b1: 8b 45 fc mov eax,DWORD PTR [rbp-0x4] 4004b4: 83 c0 01 add eax,0x1 4004b7: 89 c7 mov edi,eax 4004b9: e8 d9 ff ff ff call 400497 <FooB> 4004be: c9 leave 4004bf: c3 ret 00000000004004c0 <main>: 4004c0: 55 push rbp 4004c1: 48 89 e5 mov rbp,rsp 4004c4: 48 83 ec 10 sub rsp,0x10 4004c8: bf 01 00 00 00 mov edi,0x1 4004cd: e8 d4 ff ff ff call 4004a6 <FooA> 4004d2: 89 45 fc mov DWORD PTR [rbp-0x4],eax 4004d5: b8 00 00 00 00 mov eax,0x0 4004da: c9 leave 4004db: c3 ret 4004dc: 0f 1f 40 00 nop DWORD PTR [rax+0x0] ``` Let's start the experiment! We will divide into 6 parts 1. [Before main function](#Before-main-Function) 2. [Before FooA function call](#Before-FooA-Function-Call) 3. [Before FooB function call](#Before-FooB-Function-Call) 4. [Function FooB return](#Function-FooB-Return) 5. [Function FooA return](#Function-FooA-return) 6. [Function main return](#Function-main-return) --- ## Before main Function **(Optional) GDB Environment Setup** ```bash # (optional) setup assembly language syntax # (gdb) set disassembly-flavor intel ``` **Step1. Set the breakpoint at the first instruction of the main function** ```bash! # exec gdb gdb -q ./stack # We can use `objdump -M intel -d stack` or # (gdb) disassemble main` to found out the first instruction address (gdb) b *0x00000000004004c0 (gdb) run # We can setup all interesting breakpoints or step one instruction by si (stepi) # We can print the stack (rsp) or base (rbp) by (gdb) x /2x $rsp 0x7fffffffddf8: 0xf7a05b97 0x00007fff ^ pointer address ^ value in the little endian form ``` Before main function, the stack looks like ![image](https://hackmd.io/_uploads/S16F3sKJR.png) ## Before FooA Function Call **Step1. Push the last base pointer to the stack / Fig.1 → Fig. 2-a** ```asm! 0x00000000004004c0 <+0>: push rbp ``` The direction and the word size are depended on the machine. For example, in my machine the stack will subtract by 8 bytes. **Step2. Copy the stack pointer to the base pointer / Fig. 2-a → Fig. 2-b** ```asm! 0x00000000004004c1 <+1>: mov rbp,rsp ``` **Step3. Reserve space for the `main` function / Fig. 2-b → Fig. 2-c** ```asm! 0x00000000004004c4 <+4>: sub rsp,0x10 ``` Because in my machine, the stack is growing downward. From the assembly code, we can learn that the stack will reserve 16 bytes for the main function. **Step4. Call the function `FooA` / Fig. 2-c → Fig. 2-d** ```asm! 0x00000000004004cd <+13>: call 0x4004a6 <FooA> ``` The `CALL` instruction will push the instruction where execution of the calling precedure should resume. In our example, it should resume to `0x4004d2`. We can check the result with ```bash! # After we step one instruction with si (gdb) print $pc $3 = (void (*)()) 0x4004a6 <FooA> (gdb) x /2x $rsp 0x7fffffffddd8: 0x004004d2 0x00000000 ``` ![image](https://hackmd.io/_uploads/B1Vfvotk0.png) ## Before FooB Function Call Basically, in terms of the stack frame perspective, it's the same as the main function. * **Fig. 2-d → Fig. 3-a** - Push the last base pointer (main function) to the stack * **Fig. 3-a → Fig. 3-b** - Copy the stack pointer to the base pointer * **Fig. 3-b → Fig. 3-c** - Reserve space (8bytes) for the `FooA` function * **Fig. 3-c → Fig. 3-d** - Call the function `FooB` ![image](https://hackmd.io/_uploads/rybzHiY10.png) ## Function FooB Return We can set a breakpoint at `0x4004a4`, which program starts returning `FooB` stack frame. Here is the stack before `FooB` return. ![image](https://hackmd.io/_uploads/rylyiiF1R.png) **Step1. Loads the top of the stack to the base pointer / Fig. 4 → Fig. 5-a** Because`FooB` stack don't have extra space, the stack pointer has already pointed at the memory which stores the last base pointer address. The next step is to restore the base pointer to the last stack frame. ```asm! 0x00000000004004a4 <+13>: pop rbp ``` The `POP` instruction will load the top of the stack (0x7fffffffddd0) to the operand (rbp). The stack pointer will decrease by a word size (8bytes) in the same time. ```bash (gdb) p $rbp $1 = (void *) 0x7fffffffddd0 (gdb) p $rsp $2 = (void *) 0x7fffffffddc0 ``` **Step2. Transfers program control to FooA / Fig. 5-a → Fig. 5-b** After all `FooB` works are done, the program can resume to the called procedure (which is the next instruction after `FooA`\'s `call`). ```asm! 0x00000000004004a5 <+14>: ret ``` The `RET` instruction will resume the program to the top of the stack (0x4004be) (which is pushed by `call`). Fig. 5-b is the stack after `FooB` return. ![image](https://hackmd.io/_uploads/By3zqxj10.png) ## Function FooA Return Just as constructing the stack, there isn't much difference between `FooA` and `FooB` when it comes to the returned function. The only different between them is `FooA` having extra space for its operations. If the function has reserved stack space for its functionality, it will include a `LEAVE` instruction before `RET`. Otherwise, `pop rbp` will suffice. **Step1. Release the stack frame / Fig. 5-b → Fig. 6-a** The `LEAVE` instruction copies the frame pointer to the stack pointer, which releases the stack space. The old frame pointer is then popped from the stack into the frame pointer. ```asm! 0x00000000004004be <+24>: leave ``` **Step2. Transfers program control to main / Fig. 6-a → Fig. 6-b** Return program to main function (0x4004d2) ```asm! 0x00000000004004bf <+25>: ret ``` ![image](https://hackmd.io/_uploads/B1LD2YakR.png) ## Function main Return After main function work is done, the program will return to the libc start by the `LEAVE` and `RET` instruction. The example program is ended! # Recursion vs Iteration After we understand the overhead of the function call, we can discuss the battle between recursion and iteration. In the iterative method, the task is accomplished by looping until certain conditions are met. This method maintains only one stack frame throughout its execution. On the contrary, in the recursive method, the program generates stack frames recursively until specific conditions are fulfilled. However, the frequent creation of stack frames may slow the program down. So, can we just simply jump to this conclusion? ![iteration_recursion](https://hackmd.io/_uploads/HkuhpWteC.jpg =400x300) No! By the power of the modern compiler, the recursive method may not necessary be slower than the iterative one. Let's consider the binary search algorithm ```clike! int RecursiveBinarySearch(int *arr, int low, int high, int key) { if (low > high) return -1; int mid = low + (high - low) / 2; if (arr[mid] == key) return mid; else if (arr[mid] > key) return RecursiveBinarySearch(arr, low, mid - 1, key); else return RecursiveBinarySearch(arr, mid + 1, high, key); } int IterativeBinarySearch(int *arr, int low, int high, int key) { while (low <= high) { int mid = low + (high - low) / 2; if (arr[mid] == key) return mid; else if (arr[mid] > key) high = mid - 1; else low = mid + 1; } return -1; } ``` Here are the [compilation results](https://godbolt.org/z/rYafovn1z) by `gcc -O3` It's not hard to find that the assembly results are the same. But if we change to `-O0` there is still a `call` instruction in the recursive function. > Fedor G. Pikus states, "Never guess about performance" (The Art of Writing Efficient Programs) We can compare the runtime performance between two methonds by micro benchmark. We will use the google benchmark as an example. **`-O0` result** ![image](https://hackmd.io/_uploads/Syw2Uv6lR.png) **`-O3` result** ![image](https://hackmd.io/_uploads/ByC4DwTeC.png) Columns explaination * Benchmark: `BM_IterativeBinarySearch` is the function name and the number after the slash is the array size (`arr`). * Time: the execution real time * CPU: the time cost in CPU * Iterations: how many iteration does function be called It's evdient that there isn't much different in the `-O3` build. But in the `-O0` build, the performance of the recursive method is 30% slower compared the iterative one. # Summary Function using iterative is often harder to understand in many cases, so there’s a direct intellectual cost to writing and maintaining the code, and the additional complexity can also lead to more bugs. I think every design is a trade-off. If the project has the concern about runtime performance, we should use profiling tools such as linux perf or google benchmark. If the function isn't the bottleneck of the program, we may keep the better readability one.