# Alpacahack Round 4 (Rev) - Writeups (all solved)
## 1. Simple Flag Checker
### 1) Challenge Summary
The binary reads input with:
```c
printf("flag? ");
fgets(buf, 0x32, stdin);
```
It then initializes a 16‑byte state to zero and, for **each index** `i = 0..0x30` (i.e., **49 bytes**), it does:
1. `update(&state, buf[i])`
2. `memcmp(state, table + i*0x10, 0x10)`
If *all* comparisons match, the flag is accepted.
**Key point:**
- `table[i]` stores the **expected state after processing character `i`**.
- This is an **incremental (prefix) checker**: the first wrong character immediately causes a mismatch at that iteration.
---
### 2) Why We Don’t Reverse `update()`
`update()` looks intentionally nasty in decompilation:
- lots of bit rotations/shifts and XOR mixing,
- SIMD operations like `pmaddwd`,
- multi‑round mixing loops,
- modifies the 16‑byte state in non‑trivial ways.
Reversing/inverting it from decompiled code is time‑consuming.
But we don’t need to, because the program leaks *prefix correctness* via repeated comparisons.
---
### 3) Turn the Binary into a Prefix Oracle
Because the checker does:
```c
memcmp(&state, table + i*0x10, 0x10);
```
if we can learn **which iteration first fails**, we can recover the flag one byte at a time:
- If the first mismatch is at index `k`, then bytes `0..k-1` are correct.
- By testing candidates for the next byte and picking the one that pushes the mismatch further, we can build the flag incrementally.
The program only prints `Wrong...` / `Correct...`, so we need visibility into `memcmp()`.
Solution for this challenge:
**hook `memcmp()` with `LD_PRELOAD`** and record the first mismatching call.
---
### 4) Hooking `memcmp()` with LD_PRELOAD
Create `hookmemcmp.c`:
```c
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdio.h>
#include <string.h>
static int (*real_memcmp)(const void*, const void*, size_t) = NULL;
static long call_count = 0;
static long first_mismatch = -1;
__attribute__((constructor))
static void init_hook(void){
real_memcmp = dlsym(RTLD_NEXT, "memcmp");
call_count = 0;
first_mismatch = -1;
}
int memcmp(const void* s1, const void* s2, size_t n){
if(!real_memcmp) real_memcmp = dlsym(RTLD_NEXT, "memcmp");
int r = real_memcmp(s1, s2, n);
// The checker uses memcmp(..., 16) each iteration
if(n == 16){
long idx = call_count++;
if(first_mismatch < 0 && r != 0){
first_mismatch = idx;
}
}
return r;
}
__attribute__((destructor))
static void fini_hook(void){
// Print first mismatch index; -1 means all comparisons matched
if(first_mismatch < 0) fprintf(stderr, "MISMATCH -1\n");
else fprintf(stderr, "MISMATCH %ld\n", first_mismatch);
}
```
Compile:
```bash
gcc -shared -fPIC -O2 -o libhookmemcmp.so hookmemcmp.c -ldl
```
Run with preload:
```bash
LD_PRELOAD=./libhookmemcmp.so ./checker
```
The hook prints something like:
```
MISMATCH 12
```
Meaning: the first failing comparison occurred at iteration 12.
---
### 5) Python Solver (Byte‑by‑Byte Brute Force)
Approach:
- Build the flag incrementally.
- For each position `i`, try a character set (e.g., printable ASCII).
- Choose the character that makes the first mismatch index as large as possible.
- If mismatch becomes `-1`, the entire flag matches.
Solver example:
```python
import subprocess, os
BIN = "./checker"
PRE = "./libhookmemcmp.so"
def mismatch_index(inp49: bytes) -> int:
p = subprocess.run(
[BIN],
input=inp49 + b"\n",
stdout=subprocess.DEVNULL,
stderr=subprocess.PIPE,
env={**os.environ, "LD_PRELOAD": PRE},
)
line = p.stderr.decode().strip().splitlines()[-1]
return int(line.split()[-1]) # "MISMATCH N"
flag = bytearray()
TOTAL = 49 # 0x31, as in the loop
for i in range(TOTAL):
best = None
best_score = -10**9
for b in range(32, 127): # printable ASCII
test = flag + bytes([b]) + b"A" * (TOTAL - len(flag) - 1)
idx = mismatch_index(test)
score = 999 if idx == -1 else idx
if score > best_score:
best_score = score
best = b
flag.append(best)
print(f"[{i:02d}] now = {flag.decode(errors='ignore')!r} best_mismatch={best_score}")
print("\nFLAG =", flag.decode(errors="ignore"))
```
---
### 6) Final Flag
```
Alpaca{h4sh_4lgor1thm_1s_b4s3d_0n_MD5_4nd_keccak}
```
## 2. Everything Silver
This challenge hides the real flag checker behind a tiny “VM”: the code you *think* the child executes is just a page full of `int3` (`0xCC`). The **parent** process uses `ptrace()` to patch the *real* instructions into the child at runtime.
---
## 1) Recon: what happens when you run it?
Running the binary prints a prompt:
```
flag?
```
If the input is wrong, you get “Wrong…”, otherwise “Correct!”.
The interesting part is visible immediately in `main`:
- The program does `fork()`.
- The **child** maps RWX memory (`mmap`) and fills it with `0xCC`.
- The child enables tracing via `ptrace(PTRACE_TRACEME)` and then calls into the RWX page.
- The **parent** waits for `SIGTRAP` stops and repeatedly calls:
- `PTRACE_GETREGS`
- `PTRACE_POKETEXT`
- `PTRACE_SETREGS`
- `PTRACE_CONT`
That is the classic pattern: *int3 sled + parent patching = self-modifying VM*.
---
## 2) Static analysis: child vs parent roles
### 2.1 Child path (simplified)
The child does:
1. `page = mmap(NULL, 0x1000, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANON, -1, 0)`
2. `memset(page, 0xCC, 0x1000)`
3. `ptrace(PTRACE_TRACEME, 0, 0, 0)`
4. prints `flag?` and reads up to `0x29` bytes
5. requires **≥ 0x28 bytes (40 bytes)**
6. writes a **sentinel 0 byte** right after the 40 bytes (`buf[40] = 0`)
7. calls `((void(*)(char*)) (page + 0x100))(buf)`
Since the code page is all `0xCC`, executing it causes `SIGTRAP` constantly.
### 2.2 Parent path (simplified)
The parent runs a loop:
- `waitpid(child, &status, 0)`
- Verify the child stopped because of `SIGTRAP` (expected)
- `ptrace(PTRACE_GETREGS, child, 0, ®s)`
- Patch one 8-byte word of “real” code using `PTRACE_POKETEXT`
- Possibly mutate an internal table depending on `RIP & 0xff`
- `ptrace(PTRACE_SETREGS, child, 0, ®s)`
- `ptrace(PTRACE_CONT, child, 0, 0)`
The parent also performs a validation check at a specific point and advances a pointer through a 10-element array.
---
## 3) Extracting the targets (`dat2`) from `.data`
In the parent, there is a pointer initialized to `dat2` (seen as `0x40e0` in the disassembly). That array contains **10 × 32-bit values** that the parent compares against the child’s computed values during execution.
You can extract it directly from the file (example using `od`):
```bash
# .data starts at VA 0x4000, file offset 0x3000
# dat2 is at VA 0x40e0 → file offset = 0x3000 + (0x40e0 - 0x4000) = 0x30e0
od -Ax -tx4 -N 40 -j $((0x30e0)) silver
```
Which yields:
```text
66993576 3bd23991 6000c8f0 06f05a64
74843975 c77bd448 ef9ba544 6244ed09
83124ea1 4da78b03
```
So the 10 targets are:
```text
TARGET[0..9] =
0x66993576, 0x3BD23991, 0x6000C8F0, 0x06F05A64, 0x74843975,
0xC77BD448, 0xEF9BA544, 0x6244ED09, 0x83124EA1, 0x4DA78B03
```
---
## 4) How we recover the real checker logic
Because the child’s “code page” is all `0xCC`, static disassembly of the child’s executed region is useless.
The standard move is: **log what the parent writes** with `PTRACE_POKETEXT`.
Two common approaches:
### Option A: GDB — break on `ptrace` and log POKETEXT
1. Tell GDB to follow the **parent**:
```gdb
set follow-fork-mode parent
run
```
2. Break on `ptrace` and print arguments when `request == PTRACE_POKETEXT`.
### Option B: `LD_PRELOAD` wrapper
Write a tiny shared library that wraps `ptrace()` and prints `(addr, data)` whenever `request == PTRACE_POKETEXT`.
Once you dump all patched words and reconstruct the executed byte stream, the hidden “VM program” becomes normal x86-64 code again.
---
## 5) Recovered validation: two stages
The reconstructed logic has two clean stages:
1) **Stage A:** a 40-byte byte-mixing transform
2) **Stage B:** 10 independent 32-bit modular equations checked against `dat2`
---
## 6) Stage A : 40-byte byte-mixing
Let `in[0..39]` be the original input bytes.
The program also sets a sentinel byte:
- `in[40] = 0`
Then Stage A transforms the buffer in place:
$$
out[i] = \big(in[i] \oplus ((mul_i \cdot in[i+1]) \bmod 256) + 0x35\big)\bmod 256
\qquad \text{for } i=0..39
$$
**Multiplier schedule:** `mul_i` changes each iteration and follows:
$$
mul_i = (0x47 + 3i)\bmod 256
$$
> Plain-text equivalent (if math rendering is disabled):
> `out[i] = (in[i] XOR ((mul_i * in[i+1]) & 0xff) + 0x35) & 0xff`
### Inverting Stage A (important!)
Because `out[i]` depends on `in[i+1]`, we invert **from the end**:
- Set `in[40] = 0`
- For `i = 39 .. 0`:
$$
in[i] = \big(out[i]-0x35 \bmod 256\big)\ \oplus\ \big((mul_i \cdot in[i+1])\bmod 256\big)
$$
This gives a unique recovery of the original input once we know `out[0..39]`.
---
## 7) Stage B : 10× 32-bit multiplicative checks
After Stage A, the 40 bytes are interpreted as 10 little-endian 32-bit integers:
```text
dword_i = u32le(out[4*i : 4*i+4]) for i = 0..9
```
For each `i`, the checker enforces:
$$
(dword_i \cdot IMM_i)\bmod 2^{32} = TARGET_i
$$
Where `TARGET_i = dat2[i]` from the extracted array.
**Immediate schedule:** the multiplier changes per iteration:
- `IMM_0 = 0xDEADBEEF`
- `IMM_i = (IMM_0 + i * 0x00C0FFEE) mod 2^32`
Because every `IMM_i` is **odd**, it is invertible modulo `2^32`. Therefore we can solve each equation independently:
$$
dword_i = (TARGET_i \cdot IMM_i^{-1})\bmod 2^{32}
$$
> Plain-text equivalent (if math rendering is disabled):
> - `(dword_i * IMM_i) % 2**32 = TARGET_i`
> - `dword_i = (TARGET_i * inv_mod_2_32(IMM_i)) % 2**32`
Once we recover all 10 `dword_i`, we pack them back to 40 bytes → that is `out[0..39]`, then invert Stage A to recover the original input.
---
## 8) Final solve pipeline :
1. **Extract** `TARGET[0..9]` from `dat2` in `.data`.
2. For each `i=0..9`:
- `IMM_i = (0xDEADBEEF + i * 0x00C0FFEE) & 0xffffffff`
- `dword_i = TARGET_i * inv32(IMM_i) mod 2^32`
3. Pack the 10 dwords into `out[0..39]` (little-endian).
4. Invert Stage A from `i=39..0` to recover `in[0..39]`.
5. Send `in` to the binary → prints `Correct!`.
---
## 9) Reference solver (Python)
Below is a clean reference implementation of the core math. (The attached `solver_silver.py` also includes a `--check` mode to run the binary.)
```python
#!/usr/bin/env python3
import struct
TARGET = [
0x66993576, 0x3BD23991, 0x6000C8F0, 0x06F05A64, 0x74843975,
0xC77BD448, 0xEF9BA544, 0x6244ED09, 0x83124EA1, 0x4DA78B03,
]
def egcd(a, b):
if b == 0:
return (a, 1, 0)
g, x, y = egcd(b, a % b)
return (g, y, x - (a // b) * y)
def inv_mod_2_32(x):
# inverse modulo 2^32 exists iff x is odd
MOD = 1 << 32
g, a, _ = egcd(x, MOD)
if g != 1:
raise ValueError("no inverse (x must be odd)")
return a % MOD
def recover_out_bytes():
out = bytearray()
for i, t in enumerate(TARGET):
imm = (0xDEADBEEF + i * 0x00C0FFEE) & 0xffffffff
inv = inv_mod_2_32(imm)
dw = (t * inv) & 0xffffffff
out += struct.pack("<I", dw)
return out # 40 bytes
def invert_stage_a(out40):
# out40 = out[0..39], with in[40]=0 sentinel
out40 = list(out40)
inp = [0] * 41
inp[40] = 0
for i in range(39, -1, -1):
mul = (0x47 + 3*i) & 0xff
rhs = (mul * inp[i+1]) & 0xff
inp[i] = ((out40[i] - 0x35) & 0xff) ^ rhs
return bytes(inp[:40])
def main():
out40 = recover_out_bytes()
flag = invert_stage_a(out40)
print(flag.decode("ascii"))
if __name__ == "__main__":
main()
```
---
## Flag
```
Alpaca{Ev3ry7h1ng_3ve2y7h1n9_a11_cccccc}
```
---
## 3. Pytecode
---
### 1) High-level overview
`chall.py` does:
1. Enforces **Python 3.11**.
2. Reads `flag`, checks:
- length is **64 bytes**
- bytes are **ASCII (< 0x80)**
3. Converts a long hex string into bytes and runs `pickle.loads(...)`.
4. If the pickle result is `True`, the flag is correct.
So the “real” checker is hidden inside a pickle payload.
### 2) Why Python 3.11 matters
The payload reconstructs **bytecode** at runtime by creating `types.CodeType` and `types.FunctionType`.
Bytecode / `CodeType` layouts change across versions, so Python 3.12 often crashes or throws low-level errors.
That’s why the distfile includes a Docker image pinned to Python 3.11.
### 3) What the pickle does (conceptually)
Disassembling the pickle (e.g. `pickletools.dis`) shows it:
- loads globals like `types.FunctionType` / `types.CodeType`
- uses `REDUCE` to build functions and execute them
Conceptually, the payload implements a verification pipeline:
1. Parse the 64-byte input into **8 big-endian 64-bit words**
2. Apply several reversible bitwise/swap steps
3. XOR the list with a PRNG keystream
4. Compare against a fixed target list `EXPECTED`
### 4) The transform pipeline
Let `L` be 8 unsigned 64-bit words:
1. `L[0] ^= 1384820248286204727`
2. swap `L[2] <-> L[7]`
3. `L[4] = abs(L[4] - 9259542123273814144)`
4. `L[1] = ROR64(L[1], 13)`
5. `L[1] ^= L[6]`
6. `L[5] ^= (L[5] >> 12)`
7. swap `L[3] <-> L[6]`
8. Stream XOR (PRNG):
- initial `s = 1384596539316466481`
- for i=0..7:
- `L[i] ^= s`
- `s = (s*42 + 1337) mod 2^64`
9. Check `L == EXPECTED`
Target list:
```
[4714351799183481704,
9920649113211881281,
1577726041120304458,
11479115687019652446,
3159180328194279316,
7893169807958830647,
9845433486767146114,
7349253534905816330]
```
### 5) Solving approach: invert everything
All steps are reversible (or nearly):
- Stream XOR is involutive: XOR the same keystream again to undo
- Swaps undo themselves
- XORs undo themselves
- Rotate-right undoes with rotate-left
- `x ^= x >> k` is inverted via a standard xorshift “unmix” routine
- `abs(x - C)` has two possible preimages (`C + y` or `C - y`), try both and keep the one that yields valid ASCII and passes a forward re-check.
Pack the recovered words back into 64 bytes (big-endian) and decode ASCII.
### 6) Flag
Recovered flag:
`Alpaca{not_only_the_cake,_varnames,_filenames_and_etc._are_lies}`
### 7) Solver
```python
#!/usr/bin/env python3
# solve.py — static solver (doesn't execute the pickle). Works on Python 3.x.
MASK64 = (1 << 64) - 1
# Extracted constants from chall's pickle pipeline
EXPECTED = [
4714351799183481704,
9920649113211881281,
1577726041120304458,
11479115687019652446,
3159180328194279316,
7893169807958830647,
9845433486767146114,
7349253534905816330,
]
XOR0_CONST = 1384820248286204727
SWAP1 = (2, 7)
ABSDIFF_IDX = 4
ABSDIFF_C = 9259542123273814144
ROR_IDX = 1
ROR_SHIFT = 13
XORCOMB_I = 1
XORCOMB_J = 6
XORSHIFT_IDX = 5
XORSHIFT_S = 12
SWAP2 = (3, 6)
# Keystream step: s = (s*42 + 1337) mod 2^64; L[i] ^= s for i=0..7
STREAM_SEED = 1384596539316466481
STREAM_MUL = 42
STREAM_ADD = 1337
def rol64(x: int, r: int) -> int:
r %= 64
return ((x << r) & MASK64) | (x >> (64 - r))
def ror64(x: int, r: int) -> int:
r %= 64
return ((x >> r) | ((x << (64 - r)) & MASK64)) & MASK64
def inv_rshift_xor(y: int, s: int) -> int:
# Invert y = x ^ (x >> s) for 64-bit x.
x = y & MASK64
k = s
while k < 64:
x ^= x >> k
k *= 2
return x & MASK64
def stream_xor(seed: int, L: list[int]) -> list[int]:
out = L[:]
s = seed & MASK64
for i in range(8):
out[i] = (out[i] ^ s) & MASK64
s = (s * STREAM_MUL + STREAM_ADD) & MASK64
return out
def words_to_bytes(words: list[int]) -> bytes:
return b"".join((w & MASK64).to_bytes(8, "big") for w in words)
def forward(words: list[int]) -> list[int]:
# Forward transform, for sanity-check.
L = words[:]
L[0] ^= XOR0_CONST
a, b = SWAP1
L[a], L[b] = L[b], L[a]
L[ABSDIFF_IDX] = abs(L[ABSDIFF_IDX] - ABSDIFF_C)
L[ROR_IDX] = ror64(L[ROR_IDX], ROR_SHIFT)
L[XORCOMB_I] ^= L[XORCOMB_J]
L[XORSHIFT_IDX] ^= (L[XORSHIFT_IDX] >> XORSHIFT_S)
a, b = SWAP2
L[a], L[b] = L[b], L[a]
L = stream_xor(STREAM_SEED, L)
return L
def solve() -> str:
# Start from EXPECTED and invert the pipeline
L = EXPECTED[:]
# inv stream (XOR twice cancels)
L = stream_xor(STREAM_SEED, L)
# inv swap2
a, b = SWAP2
L[a], L[b] = L[b], L[a]
# inv xorshift
L[XORSHIFT_IDX] = inv_rshift_xor(L[XORSHIFT_IDX], XORSHIFT_S)
# inv xorcombine
L[XORCOMB_I] ^= L[XORCOMB_J]
# inv rotate
L[ROR_IDX] = rol64(L[ROR_IDX], ROR_SHIFT)
# inv absdiff (two branches)
y = L[ABSDIFF_IDX]
for x in (ABSDIFF_C + y, ABSDIFF_C - y):
if not (0 <= x < (1 << 64)):
continue
L2 = L[:]
L2[ABSDIFF_IDX] = x
# inv swap1
a, b = SWAP1
L2[a], L2[b] = L2[b], L2[a]
# inv xor0
L2[0] ^= XOR0_CONST
raw = words_to_bytes(L2)
if len(raw) == 64 and all(c < 0x80 for c in raw) and forward(L2) == EXPECTED:
return raw.decode("ascii")
raise RuntimeError("No valid candidate found")
if __name__ == "__main__":
print(solve())
```
## 4. I2C Device
---
### Challenge overview
The dist contains a Linux kernel + initrd (rootfs) booted under QEMU, plus a custom device:
```sh
-device i2c-alpaca,address=0x41
```
So the core idea is: **a userspace program in the guest talks to an I2C device**, and that device performs a crypto-based check on the submitted “flag”.
---
### Recon: `run.sh` and the rootfs
`run.sh` boots `bzImage` with `rootfs.cpio`. Inside the initrd, `/bin/checkflag` is the interactive checker.
---
### Reversing `checkflag`: it speaks to `/dev/i2c-0`
From `strings`/`objdump` you can see:
- opens `"/dev/i2c-0"`
- selects I2C slave address `0x41`
- embeds the 16-byte key string `AlpacaHackRound4`
- contains a 64-byte reference table (“answer”) in `.rodata`
The high-level logic is:
```c
read user_input[0..63]; // scanf("%63s")
open("/dev/i2c-0");
ioctl(fd, I2C_SLAVE, 0x41);
send(0xff, 0x00); // reset device
for i in 0..15: iv[i] = recv(i); // read initial 16-byte IV
for i in 0..15: send(i, key[i]); // write the 16-byte key
prev = iv;
for block = 0..3:
for i in 0..15:
tmp[i] = user_input[block*16+i] ^ prev[i];
send(0x10+i, tmp[i]); // write block to regs 0x10..0x1f
send(0xfe, 0x00); // trigger crypto
for i in 0..15: out[i] = recv(i); // read regs 0..15
ok &= (out == answer_block[block]); // compare to embedded table
prev = answer_block[block]; // chaining
```
Because the input buffer is zeroed before reading, the “padding” for shorter strings is just trailing NUL bytes.
---
### Reversing the QEMU device: it uses AES **decrypt**
The provided QEMU binary is not stripped. The device implementation is in functions like:
- `hyena_state_reset`
- `hyena_send`
- `hyena_recv`
From disassembly:
**Register map:**
- `0x00..0x0f`: 16-byte **key**
- `0x10..0x1f`: 16-byte **input block**
- `0xff`: reset (restores the initial IV)
- `0xfe`: perform crypto
On `0xfe`, the device calls:
- `QEMU_AES_set_decrypt_key(key)`
- `QEMU_AES_decrypt(in_block, out_buf, keyctx)`
So the device outputs **AES-128-ECB decryption** of the submitted 16-byte block.
On reset, the device seeds the output buffer with a constant IV:
```
79 68 57 46 35 24 13 02 f1 e0 df ce bd ac 9b 8a
```
---
### What’s the math?
The checker verifies:
- `tmp = P XOR prev`
- `C = AES_DEC(tmp)` ← returned by the device
- `prev = C`
This is CBC-like chaining, but using **AES decrypt as the forward primitive**.
To recover plaintext:
- `tmp = AES_ENC(C)` (inverse of AES_DEC)
- `P = tmp XOR prev`
With:
- `key = "AlpacaHackRound4"`
- `prev0 = IV`
- `C` = the 64-byte “answer” table in `checkflag`
---
### Offline recovery
Take the 4 ciphertext blocks from `.rodata` and do:
1. `tmp = AES_ECB_ENCRYPT(Ci, key)`
2. `Pi = tmp XOR prev`
3. `prev = Ci`
The result is the flag string plus a few trailing `\x00` bytes.
---
## Solver (Python)
```python
#!/usr/bin/env python3
from Crypto.Cipher import AES
KEY = b"AlpacaHackRound4"
IV = bytes.fromhex("7968574635241302f1e0dfcebdac9b8a")
CIPHERTEXT = bytes.fromhex(
"6e45ebb3f3e912104f6dd579e85338ec"
"7bb8d0f07c85ae56b4a470ea90d96ded"
"6e2f3f63b0bbfa44137165a219bdce0f"
"0709213ea1f83907fe1f9e3c3fec49e8"
)
def main():
aes = AES.new(KEY, AES.MODE_ECB)
prev = IV
pt = b""
for off in range(0, len(CIPHERTEXT), 16):
c = CIPHERTEXT[off:off+16]
tmp = aes.encrypt(c) # inverse of AES.decrypt
p = bytes([tmp[i] ^ prev[i] for i in range(16)])
pt += p
prev = c
# strip trailing NUL padding (because input was memset(0))
print(pt.rstrip(b"\x00").decode())
if __name__ == "__main__":
main()
```
---
## Flag
> **Flag:** `Alpaca{SMbus_f0r_t3mpeR4tUr3_f4n_v0lTag3_LED_4nd_th3_fl4g}`
# Notes
Thanks for reading! If you like Reverse Engineering writeups like this, consider following for more CTF notes, tooling tips, and reversing tricks.