Google CTF 2020: Oracle
===
I was teamed-up with Black Bauhinia on Google CTF this time. I have solved 7 challenges alone and 3 challenges with my teammates.
In particular, _Oracle_ is a crypto challenge with 13 solves. It has got me spending 12 hours. All in all, it was a great experience in terms of learning, but my liver hurts. This piece of writeup may be _very_ computation intensive, just because I would like to make everything clear.
## Challenge Summary
There are two parts of the challenges. In the first part, we are required to recover an internal state for AEGIS-128L given the encryption oracle. For the second part, we are required to forge a ciphertext given an error oracle from decryption.
## Solution
### Part I: A brief summary for the state in AEGIS-128L
AEGIS-128L has an internal state that is initially computed solely by the _key_ and the _IV_. It is of 128 bytes, broken into eight 16-byte blocks. Let's $S_i$ is updated to $S_{i+1}$ given 32-byte payload $M$. Let's define $S_i = (s_{i, 0}, s_{i, 1}, ..., s_{i, 7})$ and $M = (m_0, m_1)$. We have:
* $s_{i+1, 0} \leftarrow \text{AESEnc}(s_{i, 7}, s_{i, 0}) \oplus m_0$,
* $s_{i+1, 4} \leftarrow \text{AESEnc}(s_{i, 3}, s_{i, 4}) \oplus m_1$, and
* $s_{i+1, j} \leftarrow \text{AESEnc}(s_{i, j-1}, s_{i, j})$ for $j = 1, 2, 3, 5, 6, 7$.
But what is `AESEnc`? Let's see the implementation.
```python
def aes_enc(s: block, t: block) -> block:
"""Performs the AESENC operation with tables."""
t0 = (te0[s[0]] ^ te1[s[5]] ^ te2[s[10]] ^ te3[s[15]])
t1 = (te0[s[4]] ^ te1[s[9]] ^ te2[s[14]] ^ te3[s[3]])
t2 = (te0[s[8]] ^ te1[s[13]] ^ te2[s[2]] ^ te3[s[7]])
t3 = (te0[s[12]] ^ te1[s[1]] ^ te2[s[6]] ^ te3[s[11]])
s = _block_from_ints([t0, t1, t2, t3])
return _xor(s, t)
```
Well... we will go through this later. Let's introduce how keystreams are generated from the state. It is (relatively) simple. The keystream $(k_{i, 0}, k_{i, 1})$ for the $i$-th round is given by:
$$k_{i, 0} = (s_{i, 2} \wedge s_{i, 3}) \oplus s_{i, 1} \oplus s_{i, 6},\\k_{i, 1} = (s_{i, 6} \wedge s_{i, 7}) \oplus s_{i, 5} \oplus s_{i, 2}.$$
### Part II: Recovering part of the state
Now we are given that key and IV are unchanged. This implies that the initial state, i.e., $s_{00}, s_{01}, ..., s_{09}$ are constants too.
Suppose that we have two 96-byte messages $M^{(1)}$ and $M^{(2)}$ with only the first two blocks are different (Formally, if $M^{(k)} := (m^{(k)}_{00}, m^{(k)}_{01}, ..., m^{(k)}_{21}$), then $m^{(1)}_{ij} = m^{(2)}_{ij}$ if and only if $i \neq 0$).
The following table summarizes which of the $s_{ij}$'s that would be different (marked by an `!`), when encrypting $M^{(1)}$ and $M^{(2)}$ respectively.
| $i$ \ $j$ | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|-----------|---|---|---|---|---|---|---|---|
| 0 | | | | | | | | |
| 1 | ! | | | | ! | | | |
| 2 | ! | ! | | | ! | ! | | |
What does this imply? Knowing that $s^{(1)}_{2,j} = s^{(2)}_{2,j}$ for $j = 2, 3, 6, 7$. Let's look closely on the last 32 bytes of the keystream:
$$
\begin{align}
k^{(1)}_{20} \oplus k^{(2)}_{20}
&= m^{(1)}_{20} \oplus c^{(1)}_{20} \oplus m^{(2)}_{20} \oplus c^{(2)}_{20} \\
&= \left[ (s^{(1)}_{22} \wedge s^{(1)}_{23}) \oplus s^{(1)}_{21} \oplus s^{(1)}_{26} \right] \oplus \left[ (s^{(2)}_{22} \wedge s^{(2)}_{23}) \oplus s^{(2)}_{21} \oplus s^{(2)}_{26} \right] \\
&= s^{(1)}_{21} \oplus s^{(2)}_{21}.
\end{align}
$$
And similarly $k^{(1)}_{21} \oplus k^{(2)}_{21} = s^{(1)}_{25} \oplus s^{(2)}_{25}$.
Why is it useful? Let's define a new function, `p`:
```python
def p(s: block) -> block:
t0 = (te0[s[0]] ^ te1[s[5]] ^ te2[s[10]] ^ te3[s[15]])
t1 = (te0[s[4]] ^ te1[s[9]] ^ te2[s[14]] ^ te3[s[3]])
t2 = (te0[s[8]] ^ te1[s[13]] ^ te2[s[2]] ^ te3[s[7]])
t3 = (te0[s[12]] ^ te1[s[1]] ^ te2[s[6]] ^ te3[s[11]])
return _block_from_ints([t0, t1, t2, t3])
```
Déjà vu? It is more or less the same with `AESEnc`. We can state that `AESEnc(s, t) == p(s) ^ t` too. Looking even more closely, one could observe that the first four bytes from `p` solely depends on bytes 0, 5, 10 and 15 from `s`.
Knowing this, we can further expand $k^{(1)}_{20} \oplus k^{(2)}_{20}$:
$$\begin{align}
k^{(1)}_{20} \oplus k^{(2)}_{20}
&= s^{(1)}_{21} \oplus s^{(2)}_{21} \\
&= \text{AESEnc}(s^{(1)}_{10}, s^{(1)}_{11}) \oplus \text{AESEnc}(s^{(2)}_{10}, s^{(2)}_{11}) \\
&= p(s^{(1)}_{10}) \oplus s^{(1)}_{11} \oplus p(s^{(2)}_{10}) \oplus s^{(2)}_{11} \\
&= p(s^{(1)}_{10}) \oplus p(s^{(2)}_{10}) \\
&= p\left(\text{AESEnc}(s^{(1)}_{07}, s^{(1)}_{00}) \oplus m^{(1)}_{00}\right) \oplus p\left(\text{AESEnc}(s^{(2)}_{07}, s^{(2)}_{00}) \oplus m^{(2)}_{00}\right) \\
&= p(x \oplus m^{(1)}_{00}) \oplus p(x \oplus m^{(2)}_{00}).
\end{align}$$
(We define $x := \text{AESEnc}(s_{07}, s_{00}) = s_{10} \oplus m^{(1)}_{00}$ for ease of reading.)
And now the _only_ unknown is $x$. Can we solve it easily? Yes indeed: we can compute bytes 0, 5, 10, 15 of $x$ from the first four bytes of $k^{(1)}_{20} \oplus k^{(2)}_{20}$. Along with three more equalities from `p`, we are able to recover $x$ completely. I used an meet-in-the-middle approach to solve for $x$ in five seconds.
![](https://i.imgur.com/sdSHCtb.png)
But wait. There is a problem: I am able to find 65536 candidates (or even more) instead of 1, but I am _unable_ to eliminate the rest. The possible number of states will be growing exponentally! What can I do? The solution is actually simple: Just send $M^{(3)}$ and compute another solution set of $x$. After all, it is very likely that $x$ is the only element in the intersection of the two sets. With $x$, we are able to compute $s_{10}$ (respectively $s_{14}$).
### Part III: Finishing the first part of the challenge
We can extend the above idea to leak more. By sending two 128-byte messages with blocks 3 and 4 being different, we are able to recover $s_{20}$ and $s_{24}$. We are able to leak $s_{30}$ and $s_{34}$ with the same idea.
Two more questions remain: How is it made possible in seven queries? And more importantly, how can we recover $s_{ij}$ for all $j$, for some $i$ (preferably $i = 0\ \text{or}\ 1$)?
**Challenge 1.** Recover the above states in 7 queries.
In short, we are encrypting these seven plaintexts (each `0` represents 16 `\x00`'s, etc):
1. `0000000000`
2. `0000110000`
3. `0000220000` - Derive $s_{10}$ and $s_{14}$ uniquely with (1) and (2)
4. `0000001100`
5. `0000002200` - Derive $s_{20}$ and $s_{24}$ uniquely with (1) and (4)
6. `0000000011`
7. `0000000022` - Derive $s_{30}$ and $s_{34}$ uniquely with (1) and (6)
**Challenge 2.** Recover $s_{1, j}$ for all $j$.
From above, we are able to derive $s_{i, 0}$ and $s_{i, 4}$ for $i = 1, 2, 3$ with $m_{ij} = 0$. Hence, the state transition would be $s_{i+1, j} \leftarrow p(s_{i, j-1}) \oplus s_{ij}$ for all $i, j$. Equivalently $s_{i, j-1} = p^{-1}(s_{i+1, j} \oplus s_{ij})$.
We are able to compute inverses of $p^{-1}$ easily. Solving system of linear equations would be all good, but I'm doing it with meet-in-the-middle. Code reuse for the win! For now, let's visualize how $s_{1, j}$'s can be derived.
```graphviz
digraph {
rankdir=BT
s₁₀[fillcolor=yellow,style=filled]
s₁₄[fillcolor=yellow,style=filled]
s₂₀[fillcolor=yellow,style=filled]
s₂₄[fillcolor=yellow,style=filled]
s₃₀[fillcolor=yellow,style=filled]
s₃₄[fillcolor=yellow,style=filled]
s₂₀ -> s₂₇
s₃₀ -> s₂₇
s₂₄ -> s₂₃
s₃₄ -> s₂₃
s₁₄ -> s₁₃
s₂₄ -> s₁₃
s₁₃ -> s₁₂
s₂₃ -> s₁₂
s₁₀ -> s₁₇
s₂₀ -> s₁₇
s₁₇ -> s₁₆
s₂₇ -> s₁₆
s₁₂ -> s₁₁
s₁₃ -> s₁₁
s₁₆ -> s₁₁
s₁₆ -> s₁₅
s₁₇ -> s₁₅
s₁₂ -> s₁₅
}
```
After all, the first part of the challenge is done.
### Part IV: AEGIS-128 vs AEGIS-128L
For the second part, AEGIS-128 is used. The state is now 80 bytes (five 16-byte blocks). The payload size has been reduced to one block (let's denote it by $m$). This is how the state transited:
* $s_{i+1, 0} \leftarrow p(s_{i, 4}) \oplus s_{i, 0} \oplus m$, and
* $s_{i+1, j} \leftarrow p(s_{i, j-1}) \oplus s_{i, j}$ for $1 \leq j \leq 4$.
Moreover, the keystream $k_i$ for the $i$-th round is also altered: $k_i = (s_{i, 2} \wedge s_{i, 3}) \oplus s_{i, 1} \oplus s_{i, 4}$.
### Part V: Exploring the challenge
I have no idea what's going on, so I decided to recover the printable `secret_plaintext` first.
It is pretty easy, and is made possible because we are able to receive the error from the oracle. In particular, from `pt.decode("ascii")`.
We are able to recover the plaintext with bit-flipping. To begin with, we can flip the whole ciphertext by `\x80`. The first 32 bytes for the plaintext would be flipped by `\x80` as well. If we send the flipped ciphertext (denote by $c_?$) to the oracle, we will obtain:
```
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe7 in position 0: ordinal not in range(128)
```
This means that the first byte of the _flipped_ plaintext would be `\xe7`. Hence, the first byte of the plaintext is `\x67` (`g`). We then flip the first byte of $c_?$ by `\x80` and send it to the oracle, we will be receiving another error:
```
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc6 in position 1: ordinal not in range(128)
```
This recovers the second byte - `x46` (`F`). Since the secret plaintext is 96-byte long, we can recover it with 96 oracle calls.
**REMAINING ORACLE CALLS: 231 - 96 = 135.**
With a plaintext recovered, it is time for us to try to recover the internal state. Can we devise a similar strategy that is similar to the first part of the challenge? Formally, what will happen if we have two 48-byte messages $M^{(1)} := (m^{(1)}_0, m^{(1)}_1, m^{(1)}_2)$ and $M^{(2)} := (m^{(2)}_0, m^{(2)}_1, m^{(2)}_2)$ with only the first block being different. Then the last 16 bytes in the keystream will be:
$$
\begin{align}
k^{(1)}_2 \oplus k^{(2)}_2
&= \left[ (s^{(1)}_{22} \wedge s^{(1)}_{23}) \oplus s^{(1)}_{21} \oplus s^{(1)}_{24} \right] \oplus \left[ (s^{(2)}_{22} \wedge s^{(2)}_{23}) \oplus s^{(2)}_{21} \oplus s^{(2)}_{24} \right] \\
&= s^{(1)}_{21} \oplus s^{(2)}_{21} \\
&= p(s^{(1)}_{10}) \oplus s^{(1)}_{11} \oplus p(s^{(2)}_{10}) \oplus s^{(2)}_{11} \\
&= p(s^{(1)}_{10}) \oplus p(s^{(2)}_{10}) \\
&= p\left(\text{AESEnc}(s^{(1)}_{04}, s^{(1)}_{00}) \oplus m^{(1)}_1\right) \oplus p\left(\text{AESEnc}(s^{(2)}_{04}, s^{(2)}_{00}) \oplus m^{(2)}_1\right) \\
&= p(x \oplus m^{(1)}_0) \oplus p(x \oplus m^{(2)}_0).
\end{align}
$$
Hereby denote $x := \text{AESEnc}(s_{04}, s_{00}) = s_{10} \oplus m^{(1)}_0$. Simply put, if we have the ciphertexts for $M^{(1)}$ and $M^{(2)}$ (denote it as $C^{(k)} = (c^{(k)}_0, c^{(k)}_1, c^{(k)}_2)$), we are able to recover one-fifths of the state if this happens.
How are we able to do it? Well actually, we have recovered the secret plaintext above. We can flip the first block of the ciphertext arbitrarily (to $C_?$).
However, since $k^{(2)}_2$ is altered, the third block of the message would be updated. Luckily we are able to recover the message in 17 oracle calls. Here's how:
1. Sends $C_?$. We will obtain something like this:
```
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe8 in position 34...
```
2. Flips the 35th byte by `\xe8` in $C_?$. Sends the patched $C_?$:
```
UnicodeDecodeError: 'ascii' codec can't decode byte 0xcb in position 35...
```
3. Flips the 36th byte by `\xcb` in $C_?$. Repeat the process until we receive **OK**, meaning that the plaintext is now ASCII-encoded.
4. For now, we have recovered a subset of message bytes. We then flip the unknown bytes by `\x80` (for example, bytes 33 and 34) to throw errors from the oracle.
5. Repeat step 1 until all unknown bytes are recovered.
In short, we spent 16 oracle calls to recover the message, and one oracle call to indicate us to flip all the bytes those were originally printable. We are then able to recover a possible set of $s_{10}$ with 65536 entries (or more). We can spend another 17 queries to find the actual $s_{10}$, however.
**REMAINING ORACLE CALLS: 135 - 17×2 = 101.**
With the same idea, we can recover $s_{20}, s_{30}, s_{40}$ with 17×6 queries. This would allow us to recover $s_{10}, s_{11}, ..., s_{14}$ and hence forging arbitrary messages (along with a slightly longer AD).
**REMAINING ORACLE CALLS: 101 - 17×6 = -1.**
Shoot - we are one query short. Since we are able to recover one byte of the plaintext in each of the queries, so it doesn't hurt to sacrifice one oracle calls by guessing one byte. So... in theory, we are able to finish the challenge with once every 256 times.
Luckily, if we are given a incorrect plaintext (actually keystream), we are unable to recover a single $s_*$. That's pretty good, we are able to solve the challenge every time.
**REMAINING ORACLE CALLS: -1 + 1 = :tada:.**
With the exploit script written, I am able to reach the very end locally. Congratulations to me!
### Part IV: Wait... Aren't we done?
No... When I am interacting to the server, I am _always_ disconnected while sending one of the 231 oracle calls. Asking the organizers in IRC, knowing that there was an 1-minute timeout - it was later increased to 10 minutes. Unfortunately, my solution runs for around 5 minutes. I have two choices:
1. Wait until the challenge has a 10-minute timeout, or
2. Optimize the script and have it completed in one minute.
Seeing that there are already few teams solving the challenge, I think (2) would be fun.
#### 6.1. Reducing online complexity
For inputs that does not require immediate feedbacks, we can send them at the same time. This is an example when I am recovering `secret_plaintext` in the second part.
```python
# Before optimization
test_ciphertext = bytes([c^0x80 for c in ciphertext])
m0 = b''
for i in range(96):
r.sendline(base64.b64encode(test_ciphertext))
test_ciphertext = cxor(test_ciphertext, i, 0x80)
p, mc = try_decrypt_read(r)
assert p == i
m0 += bytes([mc^0x80])
# After optimization
test_ciphertext = bytes([c^0x80 for c in ciphertext])
m0 = b''
for i in range(96):
r.sendline(base64.b64encode(test_ciphertext))
test_ciphertext = cxor(test_ciphertext, i, 0x80)
for i in range(96):
p, mc = try_decrypt_read(r)
assert p == i
m0 += bytes([mc^0x80])
```
#### 6.2. Reducing offline complexity
For example, this is the method I implemented to solve for $x$ from $p(x \oplus a) \oplus p(x \oplus b) = c$ - it takes one second each time:
```python
def px_subsolve(a_sub, b_sub, c_sub):
# Given a_sub, b_sub, c_sub (4 bytes), find x_sub such that
# te0[(x_sub ^ a_sub)[0]] ^ te1[(x_sub ^ a_sub)[1]] ^ te2[(x_sub ^ a_sub)[2]] ^ te3[(x_sub ^ a_sub)[3]]
# ^ te0[(x_sub ^ a_sub)[0]] ^ te1[(x_sub ^ a_sub)[1]] ^ te2[(x_sub ^ a_sub)[2]] ^ te3[(x_sub ^ a_sub)[3]]
# = c_sub
# Reformulating:
# te0[(x_sub ^ a_sub)[0]] ^ te1[(x_sub ^ a_sub)[1]] ^ te0[(x_sub ^ a_sub)[0]] ^ te1[(x_sub ^ a_sub)[1]] ^ c_sub
# = te2[(x_sub ^ a_sub)[2]] ^ te3[(x_sub ^ a_sub)[3]] ^ te2[(x_sub ^ a_sub)[2]] ^ te3[(x_sub ^ a_sub)[3]]
lhss = {}
for x0, x1 in itertools.product(range(256), repeat=2):
# LHS
xs = [be0[x0^a_sub[0]], be0[x0^b_sub[0]], be1[x1^a_sub[1]], be1[x1^b_sub[1]], c_sub]
y = reduce(_xor, xs)
lhss[y] = lhss.get(y, []) + [(x0, x1)]
solns = []
for x2, x3 in itertools.product(range(256), repeat=2):
# RHS
xs = [be2[x2^a_sub[2]], be2[x2^b_sub[2]], be3[x3^a_sub[3]], be3[x3^b_sub[3]]]
y = reduce(_xor, xs)
for x0, x1 in lhss.get(y, []):
solns.append(bytes([x0, x1, x2, x3]))
return solns
```
However, if we force `a_sub == b'\0'*4` and `b_sub == b'\1'*4 or b_sub == b'\2'*4`, the right hand side can be precomputed. We are able to solve for $x$ once every 0.2 second.
At last - we are able to get the flag in 30 seconds locally and around 55 seconds online! :tada:
---
## Credits
* Thanks @harrier who noticed that my _lever_ did not hurt. Playing Minecraft too much, I misspelt liver.
* Thanks @hellman pointing that we are able to bruteforce byte bybyte instead of bruteforcing columns, since that we can apply `MixColumns` inverse ([the corresponding tweet](https://twitter.com/hellman1908/status/1298187833065312256)).
###### tags: `CTF`