# ImaginaryCTF 2024 Printf Reversing Writeup Printf is one of the reverse engineering challenges in ICTF 2024. It had 14 solves by the end of the CTF, and I got the second blood. Unfortunately, I wasn't fast enough to get the first blood, but it's my loss. ## Introduction As the name of the challenge suggests, it involves something related to printf. You're right; it was a VM-based challenge with format specifiers. Given two files, the binary and libc, it's important to use the correct libc because different versions can cause different values in the stack. The first thing you want to do is clear: patch it. You could do it with `patchelf` or `pwninit`, it doesn't really matter as long as the binary runs and you don't [segfault](https://en.wikipedia.org/wiki/Segmentation_fault). Before reversing the VM, you have to know that printf needs something called format specifiers to tell whether you're printing this variable as an integer or any other type. It is basic C knowledge after all. But what happens if variables provided in the printf argument is less than the specifier? The answer is it would use stack values. This describes exactly what's a [format string vulnerability](https://owasp.org/www-community/attacks/Format_string_attack). It is a vulnerability that occurs when you print user controlled buffer without a format specifier. ## Get back to work Reversing a VM requires time. We need time to understand what the VM actually does, what algorithm it is trying to mimic, or even to understand what the opcodes are. ![image](https://hackmd.io/_uploads/HkDUJs2dC.png) So in our case, the opcode was format specifier and we need to understand it, but what if i told you we don't need to do that, understanding the format specifier given in the binary was a complicated thing to do, it will require us more time and it's not guarantee that we will understand the whole VM. let's take a look at the `format`. ![image](https://hackmd.io/_uploads/HkK3JihdR.png) Understanding the format specifiers it self is not necessary, but knowing what it does was the important thing here. So there's printf with lot of specifier there must be another function to support this vm, i mean format specifier it self won't write a shellcode, it just change the flow of execution. ![image](https://hackmd.io/_uploads/By0D-ih_0.png) we shouldn't ignore this function, let's take a look at one that actually do `stuff`. ![image](https://hackmd.io/_uploads/rJSibi2OA.png) `sub_11A8` is function to get user input, and put it somewhere on the memory, to make it short there's function to add, multiply, mod, etc reverse it yourself for proper understanding. We haven't try to run the binary, so let's run it. ![image](https://hackmd.io/_uploads/H1ciMi2uC.png) A flag checker, to understand the vm we'll writing `gdb script` to know the order of execution, from there we'll try to understand what the VM does. using `gdb.execute` and `gdb.parse_and_eval` will do all job we need. If you want. here's the script i'm using ```python! import gdb import os FILE = 'vm_opcode.py' if os.path.exists(FILE): os.remove(FILE) def bp(addr: int): gdb.execute("b *"+str(addr)) # gdb.execute("file printf_patched") gdb.execute("file printfff") func = [ 0x00005555555551A8, 0x00005555555551E5, 0x0000555555555212, 0x000055555555525B, 0x00005555555552CE, 0x000055555555533A, 0x00005555555553BE, 0x0000555555555446, 0x00005555555554CB, 0x0000555555555555, 0x00005555555555DF, 0x000055555555567E, # 0x0000555555555663, 0x0000555555555694, 0x00005555555551DE, 0x000055555555520a, 0x0000555555555250, 0x00005555555552C3, 0x000055555555532F, 0x000055555555539D, 0x00005555555553B3, 0x0000555555555422, 0x000055555555543B, 0x00005555555554A9, 0x00005555555554C0, 0x0000555555555531, 0x000055555555554A, 0x00005555555555bb, 0x00005555555555D4, 0x0000555555555642, 0x0000555555555658, 0x00005555555552AD ] func_name = { func[0] : 'getchar', func[1] : 'putchar', func[2] : 'mov1', func[3] : 'mov2', func[4] : 'mov3', func[5] : 'add', func[6] : 'sub', func[7] : 'mul', func[8] : 'div', func[9] : 'mod', func[10] : 'xor', func[11] : 'check', func[12] : 'setup' } for _ in func: bp(_) gdb.execute("r <<< " + "a"*(100)) myfile = open(FILE, 'w') myfile.write('''def putchar(c): print(c, end='') def getchar(): print('input 1 char here') glob_var = [0]*2000 other_var = [0]*2000 ''') while True: try: rip = int(gdb.parse_and_eval("$rip")) arg1 = hex(gdb.parse_and_eval("$rdi")) arg2 = hex(gdb.parse_and_eval("$rsi")) if rip in func_name: name = func_name[rip] if name == 'getchar': myfile.writelines('getchar()\n') gdb.execute('c') addr = eval(gdb.execute('p $rcx+$rdx', to_string=True).split(' ')[-1]) val = hex(gdb.parse_and_eval("$rax")) myfile.writelines(f"glob_var[{(addr-0x55555556e040)//8}]={eval(val)} ") myfile.writelines(f"# mov {hex(addr)}, {val}, val is from our input char\n") if name == 'putchar': gdb.execute('c') arg1 = int(gdb.parse_and_eval("$rdi")) myfile.writelines(f'putchar({repr(chr(arg1 & 0xff))})\n') if name == 'mov1': gdb.execute('c') addr = eval(gdb.execute('p $rcx+$rdx', to_string=True).split(' ')[-1]) val = hex(gdb.parse_and_eval("$rax")) myfile.writelines(f"glob_var[{(addr-0x55555556e040)//8}]={eval(val)} ") myfile.writelines(f"# mov {hex(addr)}, {val}\n") if name == 'mov2': gdb.execute('c') idx = (eval(gdb.execute('p $rcx+$rax', to_string=True).split(' ')[-1]) - 0x555555570060) // 8 gdb.execute('c') addr = eval(gdb.execute('p $rcx+$rdx', to_string=True).split(' ')[-1]) val = hex(gdb.parse_and_eval("$rax")) myfile.writelines(f"# from other_var[{idx}]\n") myfile.writelines(f"glob_var[{(addr-0x55555556e040)//8}]={eval(val)} ") myfile.writelines(f"# mov {hex(addr)}, {val}\n") if name == 'mov3': gdb.execute('c') addr = eval(gdb.execute('p $rcx+$rdx', to_string=True).split(' ')[-1]) val = hex(gdb.parse_and_eval("$rax")) myfile.writelines(f"other_var[{(addr-0x555555570060)//8}]={eval(val)} ") myfile.writelines(f"# mov {hex(addr)}, {val}\n") if name == 'add': gdb.execute('c') arg1 = hex(gdb.parse_and_eval("$rcx")) arg2 = hex(gdb.parse_and_eval("$rax")) idx = eval(gdb.execute('x/gx 0x555555570040', to_string=True).split('\t')[-1]) myfile.writelines(f"# {arg1} + {arg2}, idx at {idx}\n") gdb.execute('c') addr = eval(gdb.execute('p $rax+$rdx', to_string=True).split(' ')[-1]) val = hex(gdb.parse_and_eval("$rcx")) myfile.writelines(f"other_var[{(addr-0x555555570060)//8}]={eval(val)} ") myfile.writelines(f"# add {hex(addr)}, {val}\n") if name == 'sub': gdb.execute('c') arg1 = hex(gdb.parse_and_eval("$rax")) arg2 = hex(gdb.parse_and_eval("$rdx")) idx = eval(gdb.execute('x/gx 0x555555570040', to_string=True).split('\t')[-1]) myfile.writelines(f"# {arg1} - {arg2}, idx at {idx}\n") gdb.execute('c') addr = eval(gdb.execute('p $rcx+$rdx', to_string=True).split(' ')[-1]) val = hex(gdb.parse_and_eval("$rax")) myfile.writelines(f"other_var[{(addr-0x555555570060)//8}]={eval(val)} ") myfile.writelines(f"# sub {hex(addr)}, {val}\n") if name == 'mul': gdb.execute('c') arg1 = hex(gdb.parse_and_eval("$rax")) arg2 = hex(gdb.parse_and_eval("$rdx")) idx = eval(gdb.execute('x/gx 0x555555570040', to_string=True).split('\t')[-1]) myfile.writelines(f"# {arg1} * {arg2}, idx at {idx}\n") gdb.execute('c') addr = eval(gdb.execute('p $rcx+$rdx', to_string=True).split(' ')[-1]) val = hex(gdb.parse_and_eval("$rax")) myfile.writelines(f"other_var[{(addr-0x555555570060)//8}]={eval(val)} ") myfile.writelines(f"# mul {hex(addr)}, {val}\n") if name == 'div': gdb.execute('c') arg1 = hex(gdb.parse_and_eval("$rax")) arg2 = hex(gdb.parse_and_eval("$rsi")) idx = eval(gdb.execute('x/gx 0x555555570040', to_string=True).split('\t')[-1]) myfile.writelines(f"# {arg1} / {arg2}, idx at {idx}\n") gdb.execute('c') addr = eval(gdb.execute('p $rax+$rdx', to_string=True).split(' ')[-1]) val = hex(gdb.parse_and_eval("$rcx")) myfile.writelines(f"other_var[{(addr-0x555555570060)//8}]={eval(val)} ") myfile.writelines(f"# div {hex(addr)}, {val}\n") if name == 'mod': gdb.execute('c') arg1 = hex(gdb.parse_and_eval("$rax")) arg2 = hex(gdb.parse_and_eval("$rcx")) idx = eval(gdb.execute('x/gx 0x555555570040', to_string=True).split('\t')[-1]) myfile.writelines(f"# {arg1} % {arg2}, idx at {idx}\n") gdb.execute('c') addr = eval(gdb.execute('p $rcx+$rdx', to_string=True).split(' ')[-1]) val = hex(gdb.parse_and_eval("$rax")) myfile.writelines(f"other_var[{(addr-0x555555570060)//8}]={eval(val)} ") myfile.writelines(f"# mod {hex(addr)}, {val}\n") if name == 'xor': gdb.execute('c') arg1 = hex(gdb.parse_and_eval("$rcx")) arg2 = hex(gdb.parse_and_eval("$rax")) idx = eval(gdb.execute('x/gx 0x555555570040', to_string=True).split('\t')[-1]) myfile.writelines(f"# {arg1} ^ {arg2}, idx at {idx}\n") gdb.execute('c') addr = eval(gdb.execute('p $rax+$rdx', to_string=True).split(' ')[-1]) val = hex(gdb.parse_and_eval("$rcx")) myfile.writelines(f"other_var[{(addr-0x555555570060)//8}]={eval(val)} ") myfile.writelines(f"# xor {hex(addr)}, {val}\n") if name == 'check': rax = int(gdb.parse_and_eval("$rax")) myfile.writelines(f"print('win' if ({rax} == 0) else 'lose')\n") print("done should be here") if name == 'setup': myfile.writelines("# set up some variable check ida for more\n") else: print("error fuck") gdb.execute("c") except: myfile.close() gdb.execute('q') ``` basically we just hook each function and extract the value and put it in other file, maybe there's a better tools for this, but i haven't learned something except this python gdb script (sorry). And to run it do `gdb -x script.py` But there's a catch the binary dies for the wrong input, in my solution i need to patch one function, the check function (just nop it). ![image](https://hackmd.io/_uploads/r1ntro3_0.png) So now there wont be any exit and the binary will accepts all of our input. now let's inspect our `vm_opcode.py` generated by gdb script. ![image](https://hackmd.io/_uploads/SJANIi3d0.png) there're a lot more down there, let search the check. ![image](https://hackmd.io/_uploads/BkC_8jhdC.png) btw this is the file i generated with correct input so that's why it'll print `win`. after reading it carefully i get the equation. ``` eq1 = input1 * 0x342e70ab89920dd9 eq2 = input2 * -0x796b3031dc23f20f eq3 = input3 * 0x5e14dede8db175aa eq4 = input4 * 0x488ede6e062613be eq5 = eq1 + eq2 + eq3 + eq4 eq5 ^ -0x74926eb3301c1b34 ^ -0x62d634b8a1be3471 == 0 ``` so it'll check our input 4 by 4, so we need to get 4 char correct to get the next block checked, that's why i patch the check exit. each block will have different constant value, but the equations still the same, i was not expecting that the flag would be 72 chars, so i extract value from my `vm_opcode.py` manually one by one and throw it in z3 one by one, smh i'm so stupid for doing this. here's my z3 script: ```python! from z3 import * # Define the constants c1 = -0x66dc43851a6bdfb c2 = -0x68460b0b3ee4ec12 c3 = 0x2452efbf17fce49 c4 = 0x7b75687e9288270a # XOR constants xor1 = 0x54ccfb02994249cf xor2 = -0x305197660dbf0cf6 # Create Z3 variables input1 = BitVec('input1', 64) input2 = BitVec('input2', 64) input3 = BitVec('input3', 64) input4 = BitVec('input4', 64) # Define the equations eq1 = input1 * c1 eq2 = input2 * c2 eq3 = input3 * c3 eq4 = input4 * c4 eq5 = eq1 + eq2 + eq3 + eq4 # Define the equation to solve solver = Solver() solver.add(eq5 ^ xor1 ^ xor2 == 0) solver.add(And(input1 >= 32, input1 <= 126)) # Printable ASCII range for input1 solver.add(And(input2 >= 32, input2 <= 126)) # Printable ASCII range for input2 solver.add(And(input3 >= 32, input3 <= 126)) # Printable ASCII range for input3 solver.add(And(input4 >= 32, input4 <= 126)) # Printable ASCII range for input4 # Check satisfiability if solver.check() == sat: model = solver.model() print(f"Solution found:") print(f"{chr(model[input1].as_long())}", end='') print(f"{chr(model[input2].as_long())}", end='') print(f"{chr(model[input3].as_long())}", end='') print(f"{chr(model[input4].as_long())}", end='') else: print("No solution exists") ``` and that's it, by very patiently extracting and running z3 we get our flag `ictf{n3v3r_too_m4ny_form4t_sp3cifi3rs_9a7837294d1633140433f51d13a033736}` It was really fun to do this challenge, next time i need to be smarter on how to approach VM challenge, thanks.