bi0sCTF 2025: avernos [rev] writeup

# bi0sCTF 2025: avernos [rev] writeup Writeup author: fsharp ## Introduction I had a little free time during the weekend and played bi0sCTF 2025 with [`DeadSec`](https://deadsec.team). Out of all challenges, I managed to solve two reverse engineering challenges and one cryptography challenge. The first one I solved is `avernos`, which I also got first blood on! ## Problem description Author: the.m3chanic An ancient engine stirs in the dark. It speaks no language you know. (File provided: avernos.exe) ![bi0sctf_2025_avernos_first_blood](https://hackmd.io/_uploads/HJNrSM37xe.png) ## First look of the program Running `avernos.exe` in a terminal and entering input that doesn't follow the flag format causes the program to exit without printing anything. Providing random inputs that follow the flag format causes it to either hang or print `Wrong` and exit. ``` C:\Users\fsharp\Desktop\bi0s>avernos.exe random input bla bla C:\Users\fsharp\Desktop\bi0s>avernos.exe bi0s{abcdefg} ^C C:\Users\fsharp\Desktop\bi0s>avernos.exe bi0s{this_is_a_really_long_string_i_dunno_what_the_correct_flag_is} Wrong ``` What kind of program is this? [Detect It Easy](https://github.com/horsicq/Detect-It-Easy) indicates this is a .NET program, so it can be analyzed in a .NET decompiler. ![die_results](https://hackmd.io/_uploads/ryFf5f27ge.png) Let's open it in [dnSpyEx](https://github.com/dnSpyEx/dnSpy) then... ![dnspyex_info](https://hackmd.io/_uploads/S1GPD7h7lg.png) Right off the bat, I see this program is a bit unconventional as the decompiler says it contains [unmanaged code](https://learn.microsoft.com/en-us/dotnet/framework/interop/). This means the program contains two kinds of code: managed code is executed under the .NET runtime (which can be decompiled with programs like dnSpyEx and [ILSpy](https://github.com/icsharpcode/ILSpy)), while unmanaged code is executed without the runtime, which can include native code. So, to reverse-engineer this program, I need to analyze it with both a native code decompiler (e.g. [IDA](https://hex-rays.com/), [Ghidra](https://github.com/NationalSecurityAgency/ghidra), [Binary Ninja](https://binary.ninja/)) and a .NET decompiler. ## Program structure Let's start with `avernos.exe`'s entry point, which is `mainCRTStartup`. It is unmanaged code, so I opened the program in [IDA Free](https://hex-rays.com/ida-free): ![ida_start_and_function_list](https://hackmd.io/_uploads/HkirB44Vex.png) The `start()` function just calls `CorExeMain()` and many function names are stripped. Which one's the entry point and what do these functions do? Luckily, the program contains [metadata](https://learn.microsoft.com/en-us/dotnet/standard/metadata-and-self-describing-components) that lets us identify the functions. dnSpyEx indicates that `mainCRTStartup()` is at [RVA (relative virtual address)](https://learn.microsoft.com/en-us/windows/win32/debug/pe-format#general-concepts) `0x4464` and file offset `0x3864`. ![dnspyex_maincrtstartup_metadata](https://hackmd.io/_uploads/H1B7jQ27le.png) This gives away the location of that function and means I can rename `sub_140004464()` in IDA to `mainCRTStartup()`! I can do the same thing for other stripped functions and variables to annotate them. This leads me to the main function (`sub_140003BA0()`): ![ida_main](https://hackmd.io/_uploads/BJjOpE2Xgl.png) The program attempts to open some file. A variable (`v4`) is set to `1` if the file exists and [to `0` if it doesn't](https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/fopen-wfopen?view=msvc-170#return-value). It is then passed into `func2()`, which is managed code. Let's see its decompilation in dnSpyEx: ![dnspyex_func2](https://hackmd.io/_uploads/SkDDF4nQll.png) Two numbers are calculated and a congratulatory message gets printed. The second number calculated (`num6`) involves dividing the first calculated number (`num5`) by the variable passed into it (`value`). Uh... that's it? What about the program printing `Wrong` on some inputs as mentioned earlier? Why is the congratulatory message not printed? Where does any of that come from??? If I switch to looking at the disassembly of the main function, I can see what's actually going on: ![ida_try_except_main](https://hackmd.io/_uploads/S1PvqV2Qxg.png) There's a try-except block containing instructions that don't show up in the pseudocode! The try block (instructions highlighted in yellow) catches errors thrown from `func2()`, and the instructions after that are executed if an error was caught. In this case, if the value passed to `func2()` is `0` (i.e. the file the program tries to open doesn't exist), a division-by-zero error occurs, the congratulatory message doesn't get printed, and code absent from the pseudocode is run. It's annoying for IDA to not show such important information. I patched the instructions within the highlighted `try` block with NOPs (go to Edit > Patch program > Change byte... and insert a bunch of hexadecimal `90`s) to get the instructions inside the `except` block to show up in the pseudocode. This is what it looks like after I do that: ![ida_main_hidden_code](https://hackmd.io/_uploads/HJDQAN27xe.png) Doing this a few times, I can see an anti-debug function (`sub_14000188C()`) and `try` blocks that intentionally throw an error (division by zero) and cause `except` blocks to execute. Eventually, I reach `sub_140003DA4()`, which calls `func6()`. Let's look at that in dnSpyEx: ![dnspyex_func6](https://hackmd.io/_uploads/BkjvyS3Xxx.png) The program reads a line of user input. If the input doesn't match the regex `^bi0s{([a-zA-Z0-9_]+)}$`, the program just exits. Otherwise: 1. The text between `bi0s{` and `}` is extracted. 2. `global_flag` is set (`func11()`). 3. The extracted text is copied into `user_input`, `injectedInput` is set to point to the start of `user_input` and `injectedIndex` is set to `0` (`func8()`). 4. `func1()` is called. ![dnspyex_func1](https://hackmd.io/_uploads/HJMPbH2Qxe.png) Inside `func1()`: 1. The program exits if `global_flag` isn't set (which should never happen). 2. `func27()` decrypts `embedded_instructions`. 3. `func5()` copies the decrypted instructions into `vm_instructions` and sets `vm_instruction_count` to the number of instructions there are (`1127`). 4. `func23()` (i.e. `sub_14000161C()`) does a few things. It `memset`s an area of `0x1178` bytes (`unk_140011030`) to zero (`sub_140001530()`) and contains a `while` loop that does something with `vm_halt` and `global_instruction_tracker`. 5. A number is calculated from `func18()` (it should always be the same), and one of two strings is decrypted and printed depending on the result of an `if` statement. This is probably where the `Wrong` string comes from. ![ida_func5](https://hackmd.io/_uploads/S1bws4EEee.png) ![ida_func23](https://hackmd.io/_uploads/rkWypN44ll.png) The logic for `func5()` and `func23()` tells me that the flag checker is implemented in a [custom virtual machine (VM)](https://www.0ffset.net/reverse-engineering/solving-a-vm-based-crackme/)! `vm_instructions` should contain the VM bytecode executed by the flag checker and `func23()` should call a function that interprets that bytecode. `unk_140011030` should be the stack. The goal now is to understand what the bytecode does. ## VM bytecode structure Let's go back to analyzing the functions used by `func23()`. ![ida_sub_140001570](https://hackmd.io/_uploads/SJ4s6VNEel.png) ![ida_sub_140001540](https://hackmd.io/_uploads/Bka-CENVex.png) `sub_140001570()` copies a VM instruction into the buffer pointed by the first argument if the value returned by `sub_140001540()` on that instruction is equal to `global_instruction_tracker`. This means `global_instruction_tracker` is probably the address of the instruction to be executed, and `sub_140001570()` returns the instruction located at that address. The instruction is eventually passed into `sub_140001EAC()`, which is quite big and does a lot with it. Here's a peek at the start of the function's pseudocode: ![ida_sub_140001EAC_start](https://hackmd.io/_uploads/B1hoxBN4el.png) This function is probably the VM bytecode interpreter! The value of `v68` determines which `if` or switch-case block should be executed, so it likely represents the type of bytecode being interpreted. `vm_inst4` and `vm_inst5` are used in some of these code blocks and they might represent bytecode operands. Going back to the function that calculated `v68`... ![ida_sub_140001CF0](https://hackmd.io/_uploads/SkR1mB4Nxe.png) ![ida_sub_140007030](https://hackmd.io/_uploads/HJhsmS4Exl.png) ![ida_qword_140010F10](https://hackmd.io/_uploads/r1JNEr4Vge.png) The second and fourth bytes of the VM instruction are passed into `sub_140007030()`, which are then passed into the function pointer located at `qword_140010F10`. Before the program's initialized, it has a value of `0x6000003`. This corresponds to the [metadata token](https://learn.microsoft.com/en-us/dotnet/standard/metadata-and-self-describing-components#metadata-tokens) of `decrypt_block_cipher()`: ![dnspyex_decrypt_block_cipher](https://hackmd.io/_uploads/B1Lb8rEEgg.png) So, those two VM instruction bytes are passed into `decrypt_block_cipher()` to calculate the bytecode type. To sum up what I've found about VM instructions so far: 1. Every VM instruction (`vm_inst`) is 7 bytes long. 2. Instruction address = `sub_150001540(vm_inst)` = `(vm_inst[0] << 8) | (vm_inst[2])` 3. Opcode (Bytecode type) = `sub_140001CF0(vm_inst)` = `decrypt_block_cipher(vm_inst[1], vm_inst[3])` 4. Operand 1 (Argument 1) = `vm_inst[4]` 5. Operand 2 (Argument 2) = `vm_inst[5]` 6. `vm_inst[6]` is unused. ## Disassembling the flag checker VM I can now work on extracting information from the VM instructions. First, I need to get the decrypted instructions. I can do that by dumping them when the program begins copying them into `vm_instructions`, which is the start of `func5()` (RVA `0x16a8`, file offset `0xaa8`). I load the program into the [x64dbg debugger](https://x64dbg.com/), set a breakpoint at `func5()`, let the division-by-zero errors happen, and provide input that passes the flag format regex. When the breakpoint's hit: 1. Right-click the RCX register (the first argument passed into the function, i.e. the memory address pointing at the start of the decrypted instructions). 2. Click 'Follow in Dump'. 3. Select 7889 (`0x1ED1`) bytes from the start of the dump. 4. Right-click them and go to Binary > Save To a File. Save the instructions into `embedded_instructions.bin`. Here's a screenshot of what x64dbg would show when the breakpoint's hit and the decrypted instructions are selected in the dump: ![x64dbg_dumping_vm_instructions](https://hackmd.io/_uploads/SyCOMI4Nxe.png) Next, I need to get the addresses, opcodes and arguments of the dumped instructions. Here's a cleaned up Python script I used for doing that: ```python def func7(x, k): return (x + k) % 16 def decrypt_block_cipher(key, ct): b = ct >> 4 return ((func7(b, key & 15) << 4) ^ ((ct << 4) & 240)) | b def get_instruction_address(vm_inst): return (vm_inst[0] << 8) | (vm_inst[2]) def get_opcode(vm_inst): return decrypt_block_cipher(vm_inst[1], vm_inst[3]) vm_insts = open("embedded_instructions.bin", "rb").read() vm_insts = [vm_insts[i : i + 7] for i in range(0, len(vm_insts), 7)] info = [] for vm_inst in vm_insts: address, opcode = get_instruction_address(vm_inst), get_opcode(vm_inst) arg1, arg2 = vm_inst[4], vm_inst[5] info.append((address, opcode, arg1, arg2)) # Sort VM instruction infos by ascending addresses info = sorted(info) ``` The bytecode interpreter supports a TON of opcodes. Do I really have to understand them all? Let's see which opcodes the flag checker uses: ```python used_opcodes = set([opcode for (address, opcode, arg1, arg2) in info]) print(len(used_opcodes)) # 32 print(sorted(used_opcodes)) # [0, 90, 91, 92, 94, 106, 107, 122, 124, 138, 139, 140, 145, # 148, 149, 150, 151, 152, 153, 154, 162, 172, 174, 177, 192, # 193, 194, 195, 197, 198, 248, 254] ``` Luckily, there's no need for me to understand opcodes other than 32 of them! Now let's head back to the VM bytecode interpreter (`sub_140001EAC()`) and have a look at the functions used by those 32 opcodes. The first function is `sub_1400010B0()`. It takes a pointer to the start of the stack and a number as arguments: ![ida_sub_1400010B0](https://hackmd.io/_uploads/H14vZlrVlx.png) Depending on the value of the number, either a byte from the stack or `0` is returned. The next function is `sub_1400011F0()`. It takes the same kinds of arguments as the previous function I covered: ![ida_sub_1400011F0](https://hackmd.io/_uploads/rknbfeBVle.png) The function's logic is similar to the previous one, but instead of returning bytes from the stack, it returns 64-bit numbers (qwords) instead. So, it looks like there are functions for retrieving numbers of various sizes from the stack. Let's check the third function (`sub_140001370()`): ![ida_sub_140001370](https://hackmd.io/_uploads/rJklQeHVgg.png) This time, instead of getting stack numbers, it sets them. The function sets a particular number in the stack to a qword. Moving on, I see functions that are 'stack getters' and others that are 'stack setters.' There are two kinds of getters and setters: One that uses specific parts of the stack, and one that uses an offset from the start of the stack, such as `sub_1400013B0()`. ![ida_sub_1400013B0](https://hackmd.io/_uploads/BkSANlrVel.png) All of the offset-based functions calculate the offset by adding `82` to the argument. It looks like each part of the stack serves a different purpose. I noticed this for the logic of opcodes `0xc0` and `0xc1`: ![ida_sub_140001EAC_comparisons](https://hackmd.io/_uploads/HyLhIlHNgx.png) The difference of two equally sized stack numbers is calculated, then two bits of `stack[80]` are set depending on whether the difference is `0` and whether it's smaller than `0`. `stack[80]` likely represents a flags register, with the least significant bit being the zero flag (ZF) and the other bit being the sign flag (SF). There are a few other functions that are relevant: 1. `sub_1400013C0()` swaps two bytes of a 16-bit number (ushort) located at a stack offset. 2. `sub_140001550(vm_inst)` returns `(vm_inst[4] << 8) | (vm_inst[5])`. This is the same as `(arg1 << 8) | (arg2)`. 3. `func3()`'s logic looks a little intimidating, but all it does is retrieve the next character of the text extracted from the entered flag (`injectedInput`). If there aren't any characters left, the program reads another line of input. If the line contains characters, the program sets `injectedInput` to it and continues reading characters off of that. Otherwise, a null byte is returned. This is why the program hangs for entered flags that are too short! ![dnspyex_func3](https://hackmd.io/_uploads/rkg0FlSVle.png) After going through the functions used by the 32 opcodes the flag checker VM contains, I got a clear picture of what the functions do and how the stack's structured. The first 80 bytes of the stack store 8-bit (***b***yte), 16-bit (***u***short), 32-bit (***d***word) and 64-bit (***q***word) registers, none of which overlap one another in memory. The 81st byte of the stack is an 8-bit flags register, while everything from the 82nd byte onwards are used to store offset-based numbers of various sizes. Knowing all of this, I can now disassemble the opcodes with a massively cleaned up Python script: ```python # Notation: # b, u, d and q represent a number that has the size of a byte / ushort / dword / qword # (b,u,d,q){number} represents a register # stk_(b,u,d,q)[offset] represents an offset-based number of a particular size # input_char represents the next character of the extracted text def b(n): if 1 <= n <= 5: return n - 1 if 21 <= n <= 25: return n - 16 # Technically incorrect, but this doesn't really affect # understanding the disassembly that much return 0 def u(n): if 6 <= n <= 10: return n - 6 # Ditto return 0 def d(n): if 11 <= n <= 15: return n - 11 # Ditto return 0 def q(n): if 16 <= n <= 20: return n - 16 # Ditto return 0 for (address, opcode, arg1, arg2) in info: print(str(address).zfill(4) + ": ", end = '') if opcode == 0: print("nop") elif opcode == 0x5a: print(f"mov b{b(arg1)}, b{b(arg2)}") elif opcode == 0x5b: print(f"mov b{b(arg1)}, stk_b[{arg2}]") elif opcode == 0x5c: print(f"mov stk_b[{arg1}], b{b(arg2)}") elif opcode == 0x5e: print(f"mov b{b(arg1)}, {arg2}") elif opcode == 0x6a: print(f"add b{b(arg1)}, b{b(arg2)}") elif opcode == 0x6b: print(f"sub b{b(arg1)}, b{b(arg2)}") elif opcode == 0x7a: print(f"and b{b(arg1)}, b{b(arg2)}") elif opcode == 0x7c: print(f"xor b{b(arg1)}, b{b(arg2)}") elif opcode == 0x8a: print(f"jmp {str(arg1 * 256 + arg2).zfill(4)}") elif opcode == 0x8b: print(f"je {str(arg1 * 256 + arg2).zfill(4)}") elif opcode == 0x8c: print(f"jne {str(arg1 * 256 + arg2).zfill(4)}") elif opcode == 0x91: print(f"mov b{b(arg1)}, input_char") elif opcode == 0x94: print(f"mov d{d(arg1)}, d{d(arg2)}") elif opcode == 0x95: print(f"mov q{q(arg1)}, q{q(arg2)}") elif opcode == 0x96: print(f"mov u{u(arg1)}, swapped(stk_u[{arg2}])") elif opcode == 0x97: print(f"mov stk_u[{arg1}], u{u(arg2)}") elif opcode == 0x98: print(f"mov d{d(arg1)}, stk_d[{arg2}]") elif opcode == 0x99: print(f"mov stk_d[{arg1}], d{d(arg2)}") elif opcode == 0x9a: print(f"mov q{q(arg1)}, stk_q[{arg2}]") elif opcode == 0xa2: print(f"xor u{u(arg1)}, u{u(arg2)}") elif opcode == 0xac: print(f"and d{d(arg1)}, d{d(arg2)}") elif opcode == 0xae: print(f"xor d{d(arg1)}, d{d(arg2)}") elif opcode == 0xb1: print(f"shr d{d(arg1)}, {arg2}") elif opcode == 0xc0: print(f"cmp u{u(arg1)}, u{u(arg2)}") elif opcode == 0xc1: print(f"cmp d{d(arg1)}, d{d(arg2)}") elif opcode == 0xc2: print(f"cmp q{q(arg1)}, q{q(arg2)}") elif opcode == 0xc3: print(f"mov b{b(arg1)}, stk_b[b{b(arg2)}]") elif opcode == 0xc5: print(f"mov stk_b[b{b(arg1)}], b{b(arg2)}") elif opcode == 0xc6: print(f"inc b{b(arg1)}") elif opcode == 0xf8: print("exit") elif opcode == 0xfe: print(f"cmp b{b(arg1)}, b{b(arg2)}") ``` ## Reversing the flag checker VM The start of my flag checker VM disassembly looks like this: ``` 0000: mov b0, 0 0001: mov b1, 1 0002: mov b2, 2 0003: mov b3, 3 0004: mov stk_b[0], b0 0005: mov stk_b[1], b1 0006: mov stk_b[2], b2 0007: mov stk_b[3], b3 0008: mov b0, 4 0009: mov b1, 5 0010: mov b2, 6 0011: mov b3, 7 0012: mov stk_b[4], b0 ... ``` Skimming through it, I see a block of instructions that appears multiple times throughout the disassembly: ``` 0657: mov d0, stk_d[50] 0658: mov d1, stk_d[2] 0659: cmp d0, d1 0660: jne 0662 0661: jmp 0666 0662: mov b0, stk_b[237] 0663: mov b1, 0 0664: and b0, b1 0665: mov stk_b[237], b0 ; Actually 'mov stk_b[237], 0', but this makes sense too lol 0666: nop ``` Two numbers are compared. If they're not equal, `stk_b[237]` is set to `0`. What's the initial value of `stk_b[237]` then? ``` 0480: mov b0, 236 0481: mov b1, 1 0482: mov b2, 238 0483: mov b3, 239 0484: mov stk_b[236], b0 0485: mov stk_b[237], b1 0486: mov stk_b[238], b2 0487: mov stk_b[239], b3 ``` This is likely the number used in the second `if` statement of `func1()` for decrypting and printing one of the two messages! The goal is probably to have `stk_b[237]` remain `1` after the flag is checked. There are four parts to the flag checker, each checking 8 characters of the flag in a different way. I'll go from what I consider the easiest part to the hardest part. ### Part 4: String equality ``` 1084: mov b0, 100 1085: mov b1, 101 1086: mov b2, 95 1087: mov b3, 115 1088: mov b4, 55 1089: mov b5, 117 1090: mov b6, 102 1091: mov b7, 102 1092: mov stk_b[100], b0 1093: mov stk_b[101], b1 1094: mov stk_b[102], b2 1095: mov stk_b[103], b3 1096: mov stk_b[104], b4 1097: mov stk_b[105], b5 1098: mov stk_b[106], b6 1099: mov stk_b[107], b7 1100: mov b0, input_char 1101: mov b1, input_char 1102: mov b2, input_char 1103: mov b3, input_char 1104: mov b4, input_char 1105: mov b5, input_char 1106: mov b6, input_char 1107: mov b7, input_char 1108: mov stk_b[70], b0 1109: mov stk_b[71], b1 1110: mov stk_b[72], b2 1111: mov stk_b[73], b3 1112: mov stk_b[74], b4 1113: mov stk_b[75], b5 1114: mov stk_b[76], b6 1115: mov stk_b[77], b7 1116: mov q0, stk_q[100] 1117: mov q1, stk_q[70] 1118: cmp q0, q1 1119: jne 1121 1120: jmp 1125 1121: mov b0, stk_b[237] 1122: mov b1, 0 1123: and b0, b1 1124: mov stk_b[237], b0 1125: nop 1126: exit ``` 8 bytes are loaded into `q0` and the last 8 input characters are loaded into `q1`. `q0` should be equal to `q1`. This is essentially: ```python q0 = int.from_bytes([100, 101, 95, 115, 55, 117, 102, 102], "little") q1 = int.from_bytes(input_chars[-8:], "little") if (q0 != q1): stk[237] = 0 ``` So the last 8 correct input characters are: ```python part4 = bytes([100, 101, 95, 115, 55, 117, 102, 102]).decode() print(part4) # "de_s7uff" ``` ### Part 3: XOR ``` 1009: mov b0, 67 1010: mov b1, 92 1011: mov stk_b[48], b0 1012: mov stk_b[49], b1 1013: mov u0, swapped(stk_u[48]) 1014: mov b0, input_char 1015: mov stk_b[50], b0 1016: mov b1, input_char 1017: mov stk_b[51], b1 1018: mov b0, input_char 1019: mov stk_b[52], b0 1020: mov b1, input_char 1021: mov stk_b[53], b1 1022: mov b2, input_char 1023: mov stk_b[54], b2 1024: mov b0, input_char 1025: mov stk_b[55], b0 1026: mov b0, input_char 1027: mov stk_b[56], b0 1028: mov b0, input_char 1029: mov stk_b[57], b0 1030: mov u1, swapped(stk_u[50]) 1031: xor u1, u0 1032: cmp u1, u0 1033: mov stk_u[50], u1 1034: mov u1, swapped(stk_u[51]) 1035: xor u1, u0 1036: cmp u1, u0 1037: mov stk_u[51], u1 1038: mov u1, swapped(stk_u[52]) 1039: xor u1, u0 1040: cmp u1, u0 1041: mov stk_u[52], u1 1042: mov u1, swapped(stk_u[53]) 1043: xor u1, u0 1044: cmp u1, u0 1045: mov stk_u[53], u1 1046: mov u1, swapped(stk_u[54]) 1047: xor u1, u0 1048: cmp u1, u0 1049: mov stk_u[54], u1 1050: mov u1, swapped(stk_u[55]) 1051: xor u1, u0 1052: cmp u1, u0 1053: mov stk_u[55], u1 1054: mov u1, swapped(stk_u[56]) 1055: xor u1, u0 1056: cmp u1, u0 1057: mov stk_u[56], u1 1058: mov b0, 119 1059: mov b1, 102 1060: mov b2, 64 1061: mov b3, 107 1062: mov b4, 112 1063: mov b5, 64 1064: mov b6, 119 1065: mov b7, 109 1066: mov stk_b[60], b0 1067: mov stk_b[61], b1 1068: mov stk_b[62], b2 1069: mov stk_b[63], b3 1070: mov stk_b[64], b4 1071: mov stk_b[65], b5 1072: mov stk_b[66], b6 1073: mov stk_b[67], b7 1074: mov q0, stk_q[50] 1075: mov q1, stk_q[60] 1076: cmp q0, q1 1077: jne 1079 1078: jmp 1083 1079: mov b0, stk_b[237] 1080: mov b1, 0 1081: and b0, b1 1082: mov stk_b[237], b0 1083: nop ``` `u0` is set to a constant number, 8 input characters are put into `stk_b[50 : 58]`, every two characters there are XOR'd by `u0`, and they're checked against 8 bytes for equality. So, this part basically does: ```python from pwn import xor q0 = bytearray(input_chars[16 : 24].encode()) u0 = bytes([67, 92]) for i in range(len(q0) - 1): q0[i : i + 2] = xor(q0[i : i + 2], u0) q1 = bytes([119, 102, 64, 107, 112, 64, 119, 109]) if (q0 != q1): stk[237] = 0 ``` The correct input characters can be recovered by applying the XOR encryption on `q1` in the same way: ```python from pwn import xor part3 = bytearray([119, 102, 64, 107, 112, 64, 119, 109]) u0 = bytes([67, 92]) for i in range(len(part3) - 1): part3[i : i + 2] = xor(part3[i : i + 2], u0) part3 = part3.decode() print(part3) # "4y_to_h1" ``` ### Part 2: Custom RC4 The disassembly for this part and the first one will be a bit long, so I'll split them up into smaller sections. ``` 0721: mov b0, 0 0722: mov stk_b[0], b0 0723: mov b0, 1 0724: mov stk_b[1], b0 0725: mov b0, 2 0726: mov stk_b[2], b0 0727: mov b0, 3 0728: mov stk_b[3], b0 0729: mov b0, 4 0730: mov stk_b[4], b0 0731: mov b0, 5 0732: mov stk_b[5], b0 0733: mov b0, 6 0734: mov stk_b[6], b0 0735: mov b0, 7 0736: mov stk_b[7], b0 0737: mov b0, 8 0738: mov stk_b[8], b0 0739: mov b0, 9 0740: mov stk_b[9], b0 0741: mov b0, 10 0742: mov stk_b[10], b0 0743: mov b0, 11 0744: mov stk_b[11], b0 0745: mov b0, 12 0746: mov stk_b[12], b0 0747: mov b0, 13 0748: mov stk_b[13], b0 0749: mov b0, 14 0750: mov stk_b[14], b0 0751: mov b0, 15 0752: mov stk_b[15], b0 0753: mov b0, 222 0754: mov b1, 173 0755: mov b2, 190 0756: mov b3, 239 0757: mov stk_b[50], b0 0758: mov stk_b[51], b1 0759: mov stk_b[52], b2 0760: mov stk_b[53], b3 0761: mov b0, 202 0762: mov b1, 254 0763: mov b2, 186 0764: mov b3, 190 0765: mov stk_b[54], b0 0766: mov stk_b[55], b1 0767: mov stk_b[56], b2 0768: mov stk_b[57], b3 ``` This is initializing part of the stack: ```python stk = list(range(16)) + [0 for i in range(256 - 16)] stk[50 : 58] = (222, 173, 190, 239, 202, 254, 186, 190) ``` Moving on... ``` 0769: mov b0, 0 0770: mov b1, stk_b[0] 0771: add b0, b1 0772: mov b2, stk_b[50] 0773: add b0, b2 0774: mov b4, 15 0775: and b0, b4 0776: mov b4, b0 0777: mov b2, stk_b[0] 0778: mov b3, stk_b[b0] 0779: mov stk_b[b0], b2 0780: mov stk_b[0], b3 0781: mov b0, b4 0782: mov b1, stk_b[1] 0783: add b0, b1 0784: mov b2, stk_b[51] 0785: add b0, b2 0786: mov b4, 15 0787: and b0, b4 0788: mov b4, b0 0789: mov b2, stk_b[1] 0790: mov b3, stk_b[b0] 0791: mov stk_b[b0], b2 0792: mov stk_b[1], b3 <repeat several times, snip...> ``` This is doing: ```python b1_indices = list(range(15)) b2_indices = [50, 51, 52, 53, 54, 55, 56, 57, 50, 51, 52, 53, 54, 55, 56] b4 = 0 for (b1_index, b2_index) in zip(b1_indices, b2_indices): b0 = b4 b1 = stk[b1_index] b0 = (b0 + b1) & 0xff b2 = stk[b2_index] b0 = (b0 + b2) & 0xff b4 = 15 b0 &= b4 b4 = b0 b2 = stk[b1_index] b3 = stk[b0] stk[b0] = b2 stk[b1_index] = b3 ``` This looks messy! Looking closer, it's implementing the key-scheduling algorithm of the [RC4 stream cipher](https://en.wikipedia.org/wiki/RC4) with a permutation box of just 16 bytes instead of the usual 256 bytes, so this is what it's really doing: ```python S = [i for i in range(16)] key = [222, 173, 190, 239, 202, 254, 186, 190] j = 0 for i in range(15): j = (j + S[i] + key[i % len(key)]) % len(S) S[i], S[j] = S[j], S[i] ``` After that... ``` 0949: mov b0, input_char 0950: mov b1, input_char 0951: mov b2, input_char 0952: mov b3, input_char 0953: mov b4, input_char 0954: mov b5, input_char 0955: mov b6, input_char 0956: mov b7, input_char 0957: mov stk_b[48], b0 0958: mov stk_b[49], b1 0959: mov stk_b[50], b2 0960: mov stk_b[51], b3 0961: mov stk_b[52], b4 0962: mov stk_b[53], b5 0963: mov stk_b[54], b6 0964: mov stk_b[55], b7 ``` 8 input chars are put into `stk_b[48 : 56]`: ```python stk[48 : 56] = list(input_chars[8 : 16].encode()) ``` And after that... ``` 0965: mov b0, 0 0966: mov b1, 16 0967: mov b3, 1 0968: mov b0, 112 0969: mov b1, 164 0970: mov d0, stk_d[112] 0971: mov d1, stk_d[164] 0972: mov stk_d[112], d0 0973: mov stk_d[116], d1 0974: mov b0, 0 0975: mov b1, 8 0976: mov b2, 48 0977: mov b3, 0 0978: mov b4, 0 0979: mov b5, 15 0980: mov q0, stk_q[112] 0981: mov q0, q0 0982: inc b3 0983: and b3, b5 0984: mov b6, stk_b[b3] 0985: add b4, b6 0986: and b4, b5 0987: mov b7, stk_b[b4] 0988: mov stk_b[b4], b6 0989: mov stk_b[b3], b7 0990: add b6, b7 0991: and b6, b5 0992: mov b7, stk_b[b6] 0993: mov b6, stk_b[b2] 0994: xor b6, b7 0995: mov stk_b[b2], b6 0996: inc b0 0997: inc b2 0998: cmp b0, b1 0999: jne 0982 1000: mov q1, stk_q[48] 1001: cmp q0, q1 1002: je 1008 1003: jmp 1004 1004: mov b0, stk_b[237] 1005: mov b1, 0 1006: and b0, b1 1007: mov stk_b[237], b0 1008: nop ``` Two dwords are loaded into `stk_d[112]` (`stk_b[112 : 116]`) and `stk_d[116]` (`stk_b[116 : 120]`). `q0` is set to the concatenation of those two dwords. The 8 input characters are XOR'd using numbers calculated in a `while` loop and are then compared against `q0` for equality: ```python b0 = 0 b1 = 8 b2 = 48 b3 = 0 b4 = 0 b5 = 15 q0 = stk[112 : 120] while (b0 != b1): b3 = (b3 + 1) & b5 b6 = stk[b3] b4 = (b4 + b6) & b5 b7 = stk[b4] stk[b4] = b6 stk[b3] = b7 b6 = (b6 + b7) & b5 b7 = stk[b6] b6 = stk[b2] b6 ^= b7 stk[b2] = b6 b0 += 1 b2 += 1 q1 = stk[48 : 56] if (q0 != q1): stk[237] = 0 ``` This is the pseudorandom generation algorithm component of RC4: ```python i = 0 j = 0 ct = list(input_chars[8 : 16].encode()) for k in range(len(ct)): i = (i + 1) % len(S) j = (j + S[i]) % len(S) S[i], S[j] = S[j], S[i] t = (S[i] + S[j]) % len(S) xor_byte = S[t] ct[k] ^= xor_byte if (ct != q1): stk[237] = 0 ``` All of this means the 8 input characters are encrypted using a custom RC4 cipher and compared against a ciphertext. I just need to RC4-encrypt the correct ciphertext to recover the second part of the flag. But... where's the correct ciphertext? The VM instructions located at `0970` and `0971` tell me that they should be located at `stk_d[112]` and `stk_d[164]`. Looking at the beginning of the flag checker VM disassembly, I see this: ``` 0216: mov b0, 108 0217: mov b1, 109 0218: mov b2, 110 0219: mov b3, 111 0220: mov stk_b[108], b0 0221: mov stk_b[109], b1 0222: mov stk_b[110], b2 0223: mov stk_b[111], b3 0224: mov b0, 112 ; Bogus 0225: mov b1, 113 0226: mov b2, 114 0227: mov b3, 115 0228: mov b0, 89 ; Real 0229: mov b1, 50 0230: mov b2, 92 0231: mov b3, 100 0232: mov stk_b[112], b0 0233: mov stk_b[113], b1 0234: mov stk_b[114], b2 0235: mov stk_b[115], b3 ... 0324: mov b0, 160 0325: mov b1, 161 0326: mov b2, 162 0327: mov b3, 163 0328: mov stk_b[160], b0 0329: mov stk_b[161], b1 0330: mov stk_b[162], b2 0331: mov stk_b[163], b3 0332: mov b0, 164 ; Bogus 0333: mov b1, 165 0334: mov b2, 166 0335: mov b3, 167 0336: mov b0, 126 ; Real 0337: mov b1, 109 0338: mov b2, 90 0339: mov b3, 113 0340: mov stk_b[164], b0 0341: mov stk_b[165], b1 0342: mov stk_b[166], b2 0343: mov stk_b[167], b3 ``` So that's where the ciphertext is! Let's decrypt it: ```python S = [i for i in range(16)] key = [222, 173, 190, 239, 202, 254, 186, 190] j = 0 for i in range(15): j = (j + S[i] + key[i % len(key)]) % len(S) S[i], S[j] = S[j], S[i] i = 0 j = 0 part2 = bytearray([89, 50, 92, 100, 126, 109, 90, 113]) for k in range(len(part2)): i = (i + 1) % len(S) j = (j + S[i]) % len(S) S[i], S[j] = S[j], S[i] t = (S[i] + S[j]) % len(S) xor_byte = S[t] part2[k] ^= xor_byte part2 = part2.decode() print(part2) # "_4_fun_w" ``` ### Part 1: Custom hashing ``` 0000: mov b0, 0 0001: mov b1, 1 0002: mov b2, 2 0003: mov b3, 3 0004: mov stk_b[0], b0 0005: mov stk_b[1], b1 0006: mov stk_b[2], b2 0007: mov stk_b[3], b3 <repeat many times, snip...> ``` There are a lot of instructions for setting the first 256 bytes of `stk_b`, but they're irrelevant to this part. ``` 0520: mov b0, input_char 0521: mov b1, input_char 0522: mov stk_b[50], b0 0523: mov stk_b[51], b1 0524: mov u0, swapped(stk_u[50]) 0525: mov b0, 217 0526: mov b1, 11 0527: mov b2, 131 0528: mov b3, 32 0529: mov stk_b[69], b0 0530: mov stk_b[70], b1 0531: mov stk_b[71], b2 0532: mov stk_b[72], b3 0533: mov d0, stk_d[69] 0534: mov b0, 0 0535: mov stk_b[48], b0 0536: mov stk_b[49], b0 0537: mov d1, stk_d[48] 0538: mov b0, 32 0539: mov b1, 0 0540: mov stk_b[230], b1 0541: mov stk_b[231], b1 0542: mov stk_b[232], b1 0543: mov b1, 1 0544: mov stk_b[233], b1 0545: mov d2, stk_d[230] 0546: mov d3, d1 0547: sub b1, b1 0548: mov b2, 1 0549: mov d4, d1 0550: and d4, d2 0551: cmp d4, d2 0552: jne 0558 0553: shr d1, 1 0554: sub b0, b2 0555: cmp b0, b1 0556: je 0564 0557: jmp 0549 0558: shr d1, 1 0559: xor d1, d0 0560: sub b0, b2 0561: cmp b0, b1 0562: je 0564 0563: jmp 0549 0564: mov stk_d[2], d1 ... ``` There are four blocks of instructions that look more or less like this. Each stores two input characters into `stk_b[50 : 52]`. `d0` is set to a constant dword and is used to XOR `d1` a total of at most 32 times inside a loop. `d1` is initialized to a dword where the two most significant bytes are `0` and the two other bytes are the two input characters. For each iteration of the loop, `d1` is shifted to the right by one bit, and if that shifted bit is unset, `d1` is XOR'd by `d0`. Finally, `d1` is stored into a `stk_d` position. This looks like a custom hashing algorithm applied on every two input characters: ```python stk_d = [] d0 = int.from_bytes(bytes([32, 131, 11, 217]), "little") for i in range(4): d1 = int.from_bytes(input_chars[2 * i : 2 * (i + 1)].encode(), "big") for j in range(32): if d1 & 1 != 0: d1 >>= 1 else: d1 >>= 1 d1 ^= d0 stk_d.append(d1) ``` What comes after that? ``` 0649: mov b0, 204 0650: mov b1, 157 0651: mov b2, 8 0652: mov b3, 203 0653: mov stk_b[50], b0 0654: mov stk_b[51], b1 0655: mov stk_b[52], b2 0656: mov stk_b[53], b3 0657: mov d0, stk_d[50] 0658: mov d1, stk_d[2] 0659: cmp d0, d1 0660: jne 0662 0661: jmp 0666 0662: mov b0, stk_b[237] 0663: mov b1, 0 0664: and b0, b1 0665: mov stk_b[237], b0 0666: nop 0667: mov b0, 246 0668: mov b1, 162 0669: mov b2, 151 0670: mov b3, 149 0671: mov stk_b[50], b0 0672: mov stk_b[51], b1 0673: mov stk_b[52], b2 0674: mov stk_b[53], b3 0675: mov d0, stk_d[50] 0676: mov d1, stk_d[6] 0677: cmp d0, d1 0678: jne 0680 0679: jmp 0684 0680: mov b0, stk_b[237] 0681: mov b1, 0 0682: and b0, b1 0683: mov stk_b[237], b0 0684: nop <two more blocks of this, snip...> ``` The four hashes calculated (`d1`) are checked against four constant dwords (`d0`) for equality. Those are the correct hashes. Knowing how the hashing algorithm works, I can now bruteforce the two correct input characters for each of the four correct hashes: ```python d0 = int.from_bytes(bytes([32, 131, 11, 217]), "little") target_hashes = [bytes([203, 8, 157, 204]), bytes([149, 151, 162, 246]), bytes([84, 215, 18, 92]), bytes([249, 60, 86, 33])] target_hashes = [int.from_bytes(target_hash, "little") for target_hash in target_hashes] alphabet = b"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_" part1 = "" for target_hash in target_hashes: for c1 in alphabet: for c2 in alphabet: h = (c1 << 8) + c2 for i in range(32): if h & 1 != 0: h >>= 1 else: h >>= 1 h ^= d0 if h == target_hash: part1 += chr(c1) + chr(c2) print(part1) # "m1x_m0d3" ``` ## Solving the challenge With all four parts of the flag recovered, I can now check if I've got the correct flag! ``` C:\Users\fsharp\Desktop\bi0s>avernos.exe bi0s{m1x_m0d3_4_fun_w4y_to_h1de_s7uff} Correct ``` I felt incredibly lucky to have first blooded this challenge. This was fun to solve! ![bi0sctf_2025_avernos_first_blood_discord](https://hackmd.io/_uploads/BJ5uPvS4lx.png)