Any URL in the form //#SOMETHING
will crash the web server. We can use the timing of the crash to leak the flag.
The state of the program might be messed up if one of the leaks is mistimed. Also, the instancer seemed to be doing some kind of exponential backoff with the restart which is why part of the program is dedicating to automatically restarting the instance it every now and then
Source:
We are also given encoded.txt
, which contains points
. The x
values range from 0
to 1769610
with seemingly random y
values. So the total length of the matrix is n = 1769611
. The source offers a "solution" which would theoretically give us the flag if it were not far too slow for an by matrix. The source solves the system of equations below.
np.linalg.solve()
in the script given simply uses gaussian elimination which would solve the system in . Unfortunately that is much too slow for n = 1769611
. After googling we found the matrix to be a Vandermonde matrix which is related to Lagrange interpolation. This makes sense with the names x
, y
, and point
in the code.
There were several algorithms listed on Wikipedia, but we thought they would be too slow. So we found a sub-quadratic Lagrange interpolation algorithm: https://codeforces.com/blog/entry/94143 (thanks grhkm!). So we simply implemented idea 2 on the blog in sage. The solution took 5 hours, but I think we could have optimized it by computing and and storing them in a tree when computing the initial . But it worked anyway so I didn't care, maybe I'll optimize it later.
According to the image that we got from this the intended solution was apparently supposed to be FFT
Unpack the binary with the UPX CLI tool, patch out the while (true)
loop and use ScyllaHide to bypass the other protections.
"Diary" refers to the ext4 journal. Find all ext4 journal entries that refer to innocuous-file.txt
, because something being labelled as innocuous is usually a sign that you should investigate it.
Going through these entries, we find that starting at entry #2039 there are fragments of the flag stored in each entry:
Putting these fragments together will give us the flag.
Use $_
to refer to the last argument of the previous command, and extract characters from there
Source code of the program:
This looks like the setup for a typical CTF heap challenge. But the first thing that we notice is that there's no way to free any of the chunks we allocate, which eliminates a lot of heap exploits as malloc()
-only exploits have a limited range when it comes to corrupting the heap (most exploits depend on freeing chunks to get them into bins or otherwise modify the heap's state). Also, since gets()
adds a null byte to the end of user input, we can't leak any data directly using PKT_OPT_ECHO
either because the null byte will prevent us from reading past the end of our input. But we do have a heap overflow via gets()
, and an allocation of controlled size.
The first thing we need to figure out is how we are going to get a libc leak. Even if we find some way to get an arbitrary write, it's going to be hard to get RCE if we don't know the addresses of functions in libc (like system()
). Usually in heap challenges the strategy is to allocate a chunk that is too big for tcache at the start and then to free it, which would give us a libc leak in the fd/bk pointers (which would point inside the main_arena struct, assuming that there are no other chunks in the same bin that they could point to).
Thankfully there is a technique called the House of Orange that lets us get a free chunk without actually having access to free()
: https://github.com/shellphish/how2heap/blob/master/glibc_2.23/house_of_orange.c. The basic idea is to corrupt the size of the top chunk in the heap. The original FSOP technique used to gain RCE in House of Orange has been patched but the heap exploitation part still works. Since we have an overflow that can extend into the top chunk, we can use this to get a free chunk and get pointers to libc in the heap. Note that it won't work with small sizes like 0x11
and you might have to adjust allocation sizes if it gets too small.
However, after we get the libc pointers in the heap, we still need some way to output them or we won't be able to leak them. We can't use PKT_OPT_ECHO
to leak memory so I researched leakless heap exploitation techniques and found House of Rust, which references this technique that uses FSOP to leak libc addresses: https://vigneshsrao.github.io/posts/babytcache/. The basic idea is to corrupt the stdout
file struct so that it leaks memory in libc. However, the last byte of _IO_write_base
was set to 0x03 with the provided libc, which was not enough to leak a libc address. I ended up having to add another null byte to expand the leaked range. The exact size and contents of the leak would change based on ASLR, but I was able to find a specific libc address to leak (specifically _IO_2_1_stdin_
) by indexing backwards instead of forwards.
We now need some primitive to get arbitrary write so we can corrupt stdout
. This is where I used the hint to allocate an mmap()
ed chunk. Through some research, I found this document: https://github.com/Naetw/CTF-pwn-tips, which had this useful part about the TLS section
, which just happened to lie after mmap()
ed chunks:
Secret of a mysterious section - .tls
constraints:
- Need
malloc
function and you can malloc with arbitrary size- Arbitrary address leaking
We make
malloc
usemmap
to allocate memory(size 0x21000 is enough). In general, these pages will be placed at the address just before.tls
section.There is some useful information on
.tls
, such as the address ofmain_arena
,canary
(value of stack guard), and a strangestack address
which points to somewhere on the stack but with a fixed offset.Before calling mmap:
After call mmap:
After some more research I found out that TLS was actually Thread Local Storage. I did a search for variables with the __thread
indicator in the glibc 2.35 source to find anything that I could overwrite, and sure enough, I found out that the pointer to each thread's tcache_perthread_struct
was stored in the TLS:
For whatever strange reason the TLS would be placed before and after libc very consistently depending on the environment. On spencerpogo's Ubuntu docker container (where the host was NixOS) it would change depending on what folder the binary was placed in (and it didn't seem to be linked to the length of the file path either). Thankfully though the TLS was placed before libc on the challenge remote.
Using GDB, I found out that the default tcache_perthread_struct
was stored as a chunk at the beginning of the heap. So we could partially overwrite the tcache
variable to point somewhere else in the heap and hopefully get an arbitrary allocation. With some more experimentation with GDB I found out that the size of thecounts
array in tcache_perthread_struct
was 0x80
. If the pointer inside main_arena
that we got earlier was stored at address x
, then setting tcache
to x-0x80
would make the first tcache_entry
point inside libc.
This is the only part of the exploit that depends on being lucky because of the partial overwrite. Unfortunately, because of null bytes, the best probability we can get is . It's impossible to overwrite with only the trailing null byte because we would need an underflow to write to the resulting address. We also can't overwrite with one arbitrary byte and one null byte because the 5th nibble from the right will be different i.e. if our fake tcache_perthread_struct
is at an address that looks like 0x20000
, then the original tcache_perthread_struct
's address would start around 0x20000 - 0x1000 = 0x1f000
, which has a different 5th nibble. However, two arbitrary bytes and a null byte will work as long as the 4th nibble from the right is non-zero (so that the aforementioned problem doesn't occur).
Finally, after we get a leak, we just need to get another arbitrary write to get RCE. There are many ways you could proceed here but the simplest way is to just use Sammy Hajhamid's setcontext32, which just needs a single arbitrary write and a libc leak. Instead of finding a new write primitive, we can just reuse the one we used earlier by allocating another mmap()
ed chunk and overflowing into the TLS. Except this time we know the address of libc so we don't have to guess. Since fd and bk both point inside main_arena
we can actually get another allocation inside libc by allocating a chunk that has a size that corresponds to the entry index 1. As we know the address of libc, we can just forge another tcache_perthread_struct
here.
A problem with this is that we've corrupted the state of main_arena
so much that it'll crash before it gets to the mmap
logic. To fix this I just restored as much of the original state as possible (where the pointer at addresses x
, x+0x8
point to x-0x10
where x & 0xf
is 0). After doing this we won't have enough space for a complete tcache_perthread_struct
as we'll start corrupting some IO structures but we really only need enough space for the counts
array and our pointer to the libc GOT that we use for setcontext32.
Finally, we just need to get an allocation to the libc GOT and write our setcontext32 payload there. Unfortunately, the program will write our allocation size to the first 8 bytes of our allocation, but we need the value 0x0
in the first 8 bytes. We can bypass this by abusing an implementation quirk in libc: malloc(0x0)
will go through the typical allocation logic (and in this case end up using the tcache as it's in the range of sizes). After this, setcontext
will eventually get triggered by a function and we'll get a shell.
The hardest part of solving this using this method was letting the script until we hit the magical 1/4096 chance. It took some time but it eventually got us the flag. Sometimes tcache
would end up pointing somewhere readable/writable but not to our forged tcache_perthread_struct
which would cause the program to continue normally but without any leaks. We let it run for a few hours (restarting it if it froze/the instance ended) and Mathnetic eventually got it on the 74th try on one of his (multiple) runs.
Here's our solve script (hopefully cleaned up enough):