Alpacahack Round 4 (Rev) - Writeups (all solved)

# Alpacahack Round 4 (Rev) - Writeups (all solved) ## 1. Simple Flag Checker ### 1) Challenge Summary The binary reads input with: ```c printf("flag? "); fgets(buf, 0x32, stdin); ``` It then initializes a 16‑byte state to zero and, for **each index** `i = 0..0x30` (i.e., **49 bytes**), it does: 1. `update(&state, buf[i])` 2. `memcmp(state, table + i*0x10, 0x10)` If *all* comparisons match, the flag is accepted. **Key point:** - `table[i]` stores the **expected state after processing character `i`**. - This is an **incremental (prefix) checker**: the first wrong character immediately causes a mismatch at that iteration. --- ### 2) Why We Don’t Reverse `update()` `update()` looks intentionally nasty in decompilation: - lots of bit rotations/shifts and XOR mixing, - SIMD operations like `pmaddwd`, - multi‑round mixing loops, - modifies the 16‑byte state in non‑trivial ways. Reversing/inverting it from decompiled code is time‑consuming. But we don’t need to, because the program leaks *prefix correctness* via repeated comparisons. --- ### 3) Turn the Binary into a Prefix Oracle Because the checker does: ```c memcmp(&state, table + i*0x10, 0x10); ``` if we can learn **which iteration first fails**, we can recover the flag one byte at a time: - If the first mismatch is at index `k`, then bytes `0..k-1` are correct. - By testing candidates for the next byte and picking the one that pushes the mismatch further, we can build the flag incrementally. The program only prints `Wrong...` / `Correct...`, so we need visibility into `memcmp()`. Solution for this challenge: **hook `memcmp()` with `LD_PRELOAD`** and record the first mismatching call. --- ### 4) Hooking `memcmp()` with LD_PRELOAD Create `hookmemcmp.c`: ```c #define _GNU_SOURCE #include <dlfcn.h> #include <stdio.h> #include <string.h> static int (*real_memcmp)(const void*, const void*, size_t) = NULL; static long call_count = 0; static long first_mismatch = -1; __attribute__((constructor)) static void init_hook(void){ real_memcmp = dlsym(RTLD_NEXT, "memcmp"); call_count = 0; first_mismatch = -1; } int memcmp(const void* s1, const void* s2, size_t n){ if(!real_memcmp) real_memcmp = dlsym(RTLD_NEXT, "memcmp"); int r = real_memcmp(s1, s2, n); // The checker uses memcmp(..., 16) each iteration if(n == 16){ long idx = call_count++; if(first_mismatch < 0 && r != 0){ first_mismatch = idx; } } return r; } __attribute__((destructor)) static void fini_hook(void){ // Print first mismatch index; -1 means all comparisons matched if(first_mismatch < 0) fprintf(stderr, "MISMATCH -1\n"); else fprintf(stderr, "MISMATCH %ld\n", first_mismatch); } ``` Compile: ```bash gcc -shared -fPIC -O2 -o libhookmemcmp.so hookmemcmp.c -ldl ``` Run with preload: ```bash LD_PRELOAD=./libhookmemcmp.so ./checker ``` The hook prints something like: ``` MISMATCH 12 ``` Meaning: the first failing comparison occurred at iteration 12. --- ### 5) Python Solver (Byte‑by‑Byte Brute Force) Approach: - Build the flag incrementally. - For each position `i`, try a character set (e.g., printable ASCII). - Choose the character that makes the first mismatch index as large as possible. - If mismatch becomes `-1`, the entire flag matches. Solver example: ```python import subprocess, os BIN = "./checker" PRE = "./libhookmemcmp.so" def mismatch_index(inp49: bytes) -> int: p = subprocess.run( [BIN], input=inp49 + b"\n", stdout=subprocess.DEVNULL, stderr=subprocess.PIPE, env={**os.environ, "LD_PRELOAD": PRE}, ) line = p.stderr.decode().strip().splitlines()[-1] return int(line.split()[-1]) # "MISMATCH N" flag = bytearray() TOTAL = 49 # 0x31, as in the loop for i in range(TOTAL): best = None best_score = -10**9 for b in range(32, 127): # printable ASCII test = flag + bytes([b]) + b"A" * (TOTAL - len(flag) - 1) idx = mismatch_index(test) score = 999 if idx == -1 else idx if score > best_score: best_score = score best = b flag.append(best) print(f"[{i:02d}] now = {flag.decode(errors='ignore')!r} best_mismatch={best_score}") print("\nFLAG =", flag.decode(errors="ignore")) ``` --- ### 6) Final Flag ``` Alpaca{h4sh_4lgor1thm_1s_b4s3d_0n_MD5_4nd_keccak} ``` ## 2. Everything Silver This challenge hides the real flag checker behind a tiny “VM”: the code you *think* the child executes is just a page full of `int3` (`0xCC`). The **parent** process uses `ptrace()` to patch the *real* instructions into the child at runtime. --- ## 1) Recon: what happens when you run it? Running the binary prints a prompt: ``` flag? ``` If the input is wrong, you get “Wrong…”, otherwise “Correct!”. The interesting part is visible immediately in `main`: - The program does `fork()`. - The **child** maps RWX memory (`mmap`) and fills it with `0xCC`. - The child enables tracing via `ptrace(PTRACE_TRACEME)` and then calls into the RWX page. - The **parent** waits for `SIGTRAP` stops and repeatedly calls: - `PTRACE_GETREGS` - `PTRACE_POKETEXT` - `PTRACE_SETREGS` - `PTRACE_CONT` That is the classic pattern: *int3 sled + parent patching = self-modifying VM*. --- ## 2) Static analysis: child vs parent roles ### 2.1 Child path (simplified) The child does: 1. `page = mmap(NULL, 0x1000, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANON, -1, 0)` 2. `memset(page, 0xCC, 0x1000)` 3. `ptrace(PTRACE_TRACEME, 0, 0, 0)` 4. prints `flag?` and reads up to `0x29` bytes 5. requires **≥ 0x28 bytes (40 bytes)** 6. writes a **sentinel 0 byte** right after the 40 bytes (`buf[40] = 0`) 7. calls `((void(*)(char*)) (page + 0x100))(buf)` Since the code page is all `0xCC`, executing it causes `SIGTRAP` constantly. ### 2.2 Parent path (simplified) The parent runs a loop: - `waitpid(child, &status, 0)` - Verify the child stopped because of `SIGTRAP` (expected) - `ptrace(PTRACE_GETREGS, child, 0, &regs)` - Patch one 8-byte word of “real” code using `PTRACE_POKETEXT` - Possibly mutate an internal table depending on `RIP & 0xff` - `ptrace(PTRACE_SETREGS, child, 0, &regs)` - `ptrace(PTRACE_CONT, child, 0, 0)` The parent also performs a validation check at a specific point and advances a pointer through a 10-element array. --- ## 3) Extracting the targets (`dat2`) from `.data` In the parent, there is a pointer initialized to `dat2` (seen as `0x40e0` in the disassembly). That array contains **10 × 32-bit values** that the parent compares against the child’s computed values during execution. You can extract it directly from the file (example using `od`): ```bash # .data starts at VA 0x4000, file offset 0x3000 # dat2 is at VA 0x40e0 → file offset = 0x3000 + (0x40e0 - 0x4000) = 0x30e0 od -Ax -tx4 -N 40 -j $((0x30e0)) silver ``` Which yields: ```text 66993576 3bd23991 6000c8f0 06f05a64 74843975 c77bd448 ef9ba544 6244ed09 83124ea1 4da78b03 ``` So the 10 targets are: ```text TARGET[0..9] = 0x66993576, 0x3BD23991, 0x6000C8F0, 0x06F05A64, 0x74843975, 0xC77BD448, 0xEF9BA544, 0x6244ED09, 0x83124EA1, 0x4DA78B03 ``` --- ## 4) How we recover the real checker logic Because the child’s “code page” is all `0xCC`, static disassembly of the child’s executed region is useless. The standard move is: **log what the parent writes** with `PTRACE_POKETEXT`. Two common approaches: ### Option A: GDB — break on `ptrace` and log POKETEXT 1. Tell GDB to follow the **parent**: ```gdb set follow-fork-mode parent run ``` 2. Break on `ptrace` and print arguments when `request == PTRACE_POKETEXT`. ### Option B: `LD_PRELOAD` wrapper Write a tiny shared library that wraps `ptrace()` and prints `(addr, data)` whenever `request == PTRACE_POKETEXT`. Once you dump all patched words and reconstruct the executed byte stream, the hidden “VM program” becomes normal x86-64 code again. --- ## 5) Recovered validation: two stages The reconstructed logic has two clean stages: 1) **Stage A:** a 40-byte byte-mixing transform 2) **Stage B:** 10 independent 32-bit modular equations checked against `dat2` --- ## 6) Stage A : 40-byte byte-mixing Let `in[0..39]` be the original input bytes. The program also sets a sentinel byte: - `in[40] = 0` Then Stage A transforms the buffer in place: $$ out[i] = \big(in[i] \oplus ((mul_i \cdot in[i+1]) \bmod 256) + 0x35\big)\bmod 256 \qquad \text{for } i=0..39 $$ **Multiplier schedule:** `mul_i` changes each iteration and follows: $$ mul_i = (0x47 + 3i)\bmod 256 $$ > Plain-text equivalent (if math rendering is disabled): > `out[i] = (in[i] XOR ((mul_i * in[i+1]) & 0xff) + 0x35) & 0xff` ### Inverting Stage A (important!) Because `out[i]` depends on `in[i+1]`, we invert **from the end**: - Set `in[40] = 0` - For `i = 39 .. 0`: $$ in[i] = \big(out[i]-0x35 \bmod 256\big)\ \oplus\ \big((mul_i \cdot in[i+1])\bmod 256\big) $$ This gives a unique recovery of the original input once we know `out[0..39]`. --- ## 7) Stage B : 10× 32-bit multiplicative checks After Stage A, the 40 bytes are interpreted as 10 little-endian 32-bit integers: ```text dword_i = u32le(out[4*i : 4*i+4]) for i = 0..9 ``` For each `i`, the checker enforces: $$ (dword_i \cdot IMM_i)\bmod 2^{32} = TARGET_i $$ Where `TARGET_i = dat2[i]` from the extracted array. **Immediate schedule:** the multiplier changes per iteration: - `IMM_0 = 0xDEADBEEF` - `IMM_i = (IMM_0 + i * 0x00C0FFEE) mod 2^32` Because every `IMM_i` is **odd**, it is invertible modulo `2^32`. Therefore we can solve each equation independently: $$ dword_i = (TARGET_i \cdot IMM_i^{-1})\bmod 2^{32} $$ > Plain-text equivalent (if math rendering is disabled): > - `(dword_i * IMM_i) % 2**32 = TARGET_i` > - `dword_i = (TARGET_i * inv_mod_2_32(IMM_i)) % 2**32` Once we recover all 10 `dword_i`, we pack them back to 40 bytes → that is `out[0..39]`, then invert Stage A to recover the original input. --- ## 8) Final solve pipeline : 1. **Extract** `TARGET[0..9]` from `dat2` in `.data`. 2. For each `i=0..9`: - `IMM_i = (0xDEADBEEF + i * 0x00C0FFEE) & 0xffffffff` - `dword_i = TARGET_i * inv32(IMM_i) mod 2^32` 3. Pack the 10 dwords into `out[0..39]` (little-endian). 4. Invert Stage A from `i=39..0` to recover `in[0..39]`. 5. Send `in` to the binary → prints `Correct!`. --- ## 9) Reference solver (Python) Below is a clean reference implementation of the core math. (The attached `solver_silver.py` also includes a `--check` mode to run the binary.) ```python #!/usr/bin/env python3 import struct TARGET = [ 0x66993576, 0x3BD23991, 0x6000C8F0, 0x06F05A64, 0x74843975, 0xC77BD448, 0xEF9BA544, 0x6244ED09, 0x83124EA1, 0x4DA78B03, ] def egcd(a, b): if b == 0: return (a, 1, 0) g, x, y = egcd(b, a % b) return (g, y, x - (a // b) * y) def inv_mod_2_32(x): # inverse modulo 2^32 exists iff x is odd MOD = 1 << 32 g, a, _ = egcd(x, MOD) if g != 1: raise ValueError("no inverse (x must be odd)") return a % MOD def recover_out_bytes(): out = bytearray() for i, t in enumerate(TARGET): imm = (0xDEADBEEF + i * 0x00C0FFEE) & 0xffffffff inv = inv_mod_2_32(imm) dw = (t * inv) & 0xffffffff out += struct.pack("<I", dw) return out # 40 bytes def invert_stage_a(out40): # out40 = out[0..39], with in[40]=0 sentinel out40 = list(out40) inp = [0] * 41 inp[40] = 0 for i in range(39, -1, -1): mul = (0x47 + 3*i) & 0xff rhs = (mul * inp[i+1]) & 0xff inp[i] = ((out40[i] - 0x35) & 0xff) ^ rhs return bytes(inp[:40]) def main(): out40 = recover_out_bytes() flag = invert_stage_a(out40) print(flag.decode("ascii")) if __name__ == "__main__": main() ``` --- ## Flag ``` Alpaca{Ev3ry7h1ng_3ve2y7h1n9_a11_cccccc} ``` --- ## 3. Pytecode --- ### 1) High-level overview `chall.py` does: 1. Enforces **Python 3.11**. 2. Reads `flag`, checks: - length is **64 bytes** - bytes are **ASCII (< 0x80)** 3. Converts a long hex string into bytes and runs `pickle.loads(...)`. 4. If the pickle result is `True`, the flag is correct. So the “real” checker is hidden inside a pickle payload. ### 2) Why Python 3.11 matters The payload reconstructs **bytecode** at runtime by creating `types.CodeType` and `types.FunctionType`. Bytecode / `CodeType` layouts change across versions, so Python 3.12 often crashes or throws low-level errors. That’s why the distfile includes a Docker image pinned to Python 3.11. ### 3) What the pickle does (conceptually) Disassembling the pickle (e.g. `pickletools.dis`) shows it: - loads globals like `types.FunctionType` / `types.CodeType` - uses `REDUCE` to build functions and execute them Conceptually, the payload implements a verification pipeline: 1. Parse the 64-byte input into **8 big-endian 64-bit words** 2. Apply several reversible bitwise/swap steps 3. XOR the list with a PRNG keystream 4. Compare against a fixed target list `EXPECTED` ### 4) The transform pipeline Let `L` be 8 unsigned 64-bit words: 1. `L[0] ^= 1384820248286204727` 2. swap `L[2] <-> L[7]` 3. `L[4] = abs(L[4] - 9259542123273814144)` 4. `L[1] = ROR64(L[1], 13)` 5. `L[1] ^= L[6]` 6. `L[5] ^= (L[5] >> 12)` 7. swap `L[3] <-> L[6]` 8. Stream XOR (PRNG): - initial `s = 1384596539316466481` - for i=0..7: - `L[i] ^= s` - `s = (s*42 + 1337) mod 2^64` 9. Check `L == EXPECTED` Target list: ``` [4714351799183481704, 9920649113211881281, 1577726041120304458, 11479115687019652446, 3159180328194279316, 7893169807958830647, 9845433486767146114, 7349253534905816330] ``` ### 5) Solving approach: invert everything All steps are reversible (or nearly): - Stream XOR is involutive: XOR the same keystream again to undo - Swaps undo themselves - XORs undo themselves - Rotate-right undoes with rotate-left - `x ^= x >> k` is inverted via a standard xorshift “unmix” routine - `abs(x - C)` has two possible preimages (`C + y` or `C - y`), try both and keep the one that yields valid ASCII and passes a forward re-check. Pack the recovered words back into 64 bytes (big-endian) and decode ASCII. ### 6) Flag Recovered flag: `Alpaca{not_only_the_cake,_varnames,_filenames_and_etc._are_lies}` ### 7) Solver ```python #!/usr/bin/env python3 # solve.py — static solver (doesn't execute the pickle). Works on Python 3.x. MASK64 = (1 << 64) - 1 # Extracted constants from chall's pickle pipeline EXPECTED = [ 4714351799183481704, 9920649113211881281, 1577726041120304458, 11479115687019652446, 3159180328194279316, 7893169807958830647, 9845433486767146114, 7349253534905816330, ] XOR0_CONST = 1384820248286204727 SWAP1 = (2, 7) ABSDIFF_IDX = 4 ABSDIFF_C = 9259542123273814144 ROR_IDX = 1 ROR_SHIFT = 13 XORCOMB_I = 1 XORCOMB_J = 6 XORSHIFT_IDX = 5 XORSHIFT_S = 12 SWAP2 = (3, 6) # Keystream step: s = (s*42 + 1337) mod 2^64; L[i] ^= s for i=0..7 STREAM_SEED = 1384596539316466481 STREAM_MUL = 42 STREAM_ADD = 1337 def rol64(x: int, r: int) -> int: r %= 64 return ((x << r) & MASK64) | (x >> (64 - r)) def ror64(x: int, r: int) -> int: r %= 64 return ((x >> r) | ((x << (64 - r)) & MASK64)) & MASK64 def inv_rshift_xor(y: int, s: int) -> int: # Invert y = x ^ (x >> s) for 64-bit x. x = y & MASK64 k = s while k < 64: x ^= x >> k k *= 2 return x & MASK64 def stream_xor(seed: int, L: list[int]) -> list[int]: out = L[:] s = seed & MASK64 for i in range(8): out[i] = (out[i] ^ s) & MASK64 s = (s * STREAM_MUL + STREAM_ADD) & MASK64 return out def words_to_bytes(words: list[int]) -> bytes: return b"".join((w & MASK64).to_bytes(8, "big") for w in words) def forward(words: list[int]) -> list[int]: # Forward transform, for sanity-check. L = words[:] L[0] ^= XOR0_CONST a, b = SWAP1 L[a], L[b] = L[b], L[a] L[ABSDIFF_IDX] = abs(L[ABSDIFF_IDX] - ABSDIFF_C) L[ROR_IDX] = ror64(L[ROR_IDX], ROR_SHIFT) L[XORCOMB_I] ^= L[XORCOMB_J] L[XORSHIFT_IDX] ^= (L[XORSHIFT_IDX] >> XORSHIFT_S) a, b = SWAP2 L[a], L[b] = L[b], L[a] L = stream_xor(STREAM_SEED, L) return L def solve() -> str: # Start from EXPECTED and invert the pipeline L = EXPECTED[:] # inv stream (XOR twice cancels) L = stream_xor(STREAM_SEED, L) # inv swap2 a, b = SWAP2 L[a], L[b] = L[b], L[a] # inv xorshift L[XORSHIFT_IDX] = inv_rshift_xor(L[XORSHIFT_IDX], XORSHIFT_S) # inv xorcombine L[XORCOMB_I] ^= L[XORCOMB_J] # inv rotate L[ROR_IDX] = rol64(L[ROR_IDX], ROR_SHIFT) # inv absdiff (two branches) y = L[ABSDIFF_IDX] for x in (ABSDIFF_C + y, ABSDIFF_C - y): if not (0 <= x < (1 << 64)): continue L2 = L[:] L2[ABSDIFF_IDX] = x # inv swap1 a, b = SWAP1 L2[a], L2[b] = L2[b], L2[a] # inv xor0 L2[0] ^= XOR0_CONST raw = words_to_bytes(L2) if len(raw) == 64 and all(c < 0x80 for c in raw) and forward(L2) == EXPECTED: return raw.decode("ascii") raise RuntimeError("No valid candidate found") if __name__ == "__main__": print(solve()) ``` ## 4. I2C Device --- ### Challenge overview The dist contains a Linux kernel + initrd (rootfs) booted under QEMU, plus a custom device: ```sh -device i2c-alpaca,address=0x41 ``` So the core idea is: **a userspace program in the guest talks to an I2C device**, and that device performs a crypto-based check on the submitted “flag”. --- ### Recon: `run.sh` and the rootfs `run.sh` boots `bzImage` with `rootfs.cpio`. Inside the initrd, `/bin/checkflag` is the interactive checker. --- ### Reversing `checkflag`: it speaks to `/dev/i2c-0` From `strings`/`objdump` you can see: - opens `"/dev/i2c-0"` - selects I2C slave address `0x41` - embeds the 16-byte key string `AlpacaHackRound4` - contains a 64-byte reference table (“answer”) in `.rodata` The high-level logic is: ```c read user_input[0..63]; // scanf("%63s") open("/dev/i2c-0"); ioctl(fd, I2C_SLAVE, 0x41); send(0xff, 0x00); // reset device for i in 0..15: iv[i] = recv(i); // read initial 16-byte IV for i in 0..15: send(i, key[i]); // write the 16-byte key prev = iv; for block = 0..3: for i in 0..15: tmp[i] = user_input[block*16+i] ^ prev[i]; send(0x10+i, tmp[i]); // write block to regs 0x10..0x1f send(0xfe, 0x00); // trigger crypto for i in 0..15: out[i] = recv(i); // read regs 0..15 ok &= (out == answer_block[block]); // compare to embedded table prev = answer_block[block]; // chaining ``` Because the input buffer is zeroed before reading, the “padding” for shorter strings is just trailing NUL bytes. --- ### Reversing the QEMU device: it uses AES **decrypt** The provided QEMU binary is not stripped. The device implementation is in functions like: - `hyena_state_reset` - `hyena_send` - `hyena_recv` From disassembly: **Register map:** - `0x00..0x0f`: 16-byte **key** - `0x10..0x1f`: 16-byte **input block** - `0xff`: reset (restores the initial IV) - `0xfe`: perform crypto On `0xfe`, the device calls: - `QEMU_AES_set_decrypt_key(key)` - `QEMU_AES_decrypt(in_block, out_buf, keyctx)` So the device outputs **AES-128-ECB decryption** of the submitted 16-byte block. On reset, the device seeds the output buffer with a constant IV: ``` 79 68 57 46 35 24 13 02 f1 e0 df ce bd ac 9b 8a ``` --- ### What’s the math? The checker verifies: - `tmp = P XOR prev` - `C = AES_DEC(tmp)` ← returned by the device - `prev = C` This is CBC-like chaining, but using **AES decrypt as the forward primitive**. To recover plaintext: - `tmp = AES_ENC(C)` (inverse of AES_DEC) - `P = tmp XOR prev` With: - `key = "AlpacaHackRound4"` - `prev0 = IV` - `C` = the 64-byte “answer” table in `checkflag` --- ### Offline recovery Take the 4 ciphertext blocks from `.rodata` and do: 1. `tmp = AES_ECB_ENCRYPT(Ci, key)` 2. `Pi = tmp XOR prev` 3. `prev = Ci` The result is the flag string plus a few trailing `\x00` bytes. --- ## Solver (Python) ```python #!/usr/bin/env python3 from Crypto.Cipher import AES KEY = b"AlpacaHackRound4" IV = bytes.fromhex("7968574635241302f1e0dfcebdac9b8a") CIPHERTEXT = bytes.fromhex( "6e45ebb3f3e912104f6dd579e85338ec" "7bb8d0f07c85ae56b4a470ea90d96ded" "6e2f3f63b0bbfa44137165a219bdce0f" "0709213ea1f83907fe1f9e3c3fec49e8" ) def main(): aes = AES.new(KEY, AES.MODE_ECB) prev = IV pt = b"" for off in range(0, len(CIPHERTEXT), 16): c = CIPHERTEXT[off:off+16] tmp = aes.encrypt(c) # inverse of AES.decrypt p = bytes([tmp[i] ^ prev[i] for i in range(16)]) pt += p prev = c # strip trailing NUL padding (because input was memset(0)) print(pt.rstrip(b"\x00").decode()) if __name__ == "__main__": main() ``` --- ## Flag > **Flag:** `Alpaca{SMbus_f0r_t3mpeR4tUr3_f4n_v0lTag3_LED_4nd_th3_fl4g}` # Notes Thanks for reading! If you like Reverse Engineering writeups like this, consider following for more CTF notes, tooling tips, and reversing tricks.