Try   HackMD

ImaginaryCTF 2024 Printf Reversing Writeup

Printf is one of the reverse engineering challenges in ICTF 2024. It had 14 solves by the end of the CTF, and I got the second blood. Unfortunately, I wasn't fast enough to get the first blood, but it's my loss.

Introduction

As the name of the challenge suggests, it involves something related to printf. You're right; it was a VM-based challenge with format specifiers.

Given two files, the binary and libc, it's important to use the correct libc because different versions can cause different values in the stack. The first thing you want to do is clear: patch it.

You could do it with patchelf or pwninit, it doesn't really matter as long as the binary runs and you don't segfault.

Before reversing the VM, you have to know that printf needs something called format specifiers to tell whether you're printing this variable as an integer or any other type. It is basic C knowledge after all. But what happens if variables provided in the printf argument is less than the specifier? The answer is it would use stack values.

This describes exactly what's a format string vulnerability. It is a vulnerability that occurs when you print user controlled buffer without a format specifier.

Get back to work

Reversing a VM requires time. We need time to understand what the VM actually does, what algorithm it is trying to mimic, or even to understand what the opcodes are.

Image Not Showing Possible Reasons
  • The image was uploaded to a note which you don't have access to
  • The note which the image was originally uploaded to has been deleted
Learn More →

So in our case, the opcode was format specifier and we need to understand it, but what if i told you we don't need to do that, understanding the format specifier given in the binary was a complicated thing to do, it will require us more time and it's not guarantee that we will understand the whole VM.

let's take a look at the format.

Image Not Showing Possible Reasons
  • The image was uploaded to a note which you don't have access to
  • The note which the image was originally uploaded to has been deleted
Learn More →

Understanding the format specifiers it self is not necessary, but knowing what it does was the important thing here. So there's printf with lot of specifier there must be another function to support this vm, i mean format specifier it self won't write a shellcode, it just change the flow of execution.

Image Not Showing Possible Reasons
  • The image was uploaded to a note which you don't have access to
  • The note which the image was originally uploaded to has been deleted
Learn More →

we shouldn't ignore this function, let's take a look at one that actually do stuff.

Image Not Showing Possible Reasons
  • The image was uploaded to a note which you don't have access to
  • The note which the image was originally uploaded to has been deleted
Learn More →

sub_11A8 is function to get user input, and put it somewhere on the memory, to make it short there's function to add, multiply, mod, etc reverse it yourself for proper understanding.

We haven't try to run the binary, so let's run it.

Image Not Showing Possible Reasons
  • The image was uploaded to a note which you don't have access to
  • The note which the image was originally uploaded to has been deleted
Learn More →

A flag checker, to understand the vm we'll writing gdb script to know the order of execution, from there we'll try to understand what the VM does.

using gdb.execute and gdb.parse_and_eval will do all job we need. If you want.

here's the script i'm using

import gdb
import os

FILE = 'vm_opcode.py'

if os.path.exists(FILE):
    os.remove(FILE)

def bp(addr: int):
    gdb.execute("b *"+str(addr))


# gdb.execute("file printf_patched")
gdb.execute("file printfff")
func = [
0x00005555555551A8,
0x00005555555551E5,
0x0000555555555212,
0x000055555555525B,
0x00005555555552CE,
0x000055555555533A,
0x00005555555553BE,
0x0000555555555446,
0x00005555555554CB,
0x0000555555555555,
0x00005555555555DF,
0x000055555555567E, # 0x0000555555555663,
0x0000555555555694,
0x00005555555551DE,
0x000055555555520a,
0x0000555555555250,
0x00005555555552C3,
0x000055555555532F,
0x000055555555539D,
0x00005555555553B3,
0x0000555555555422,
0x000055555555543B,
0x00005555555554A9,
0x00005555555554C0,
0x0000555555555531,
0x000055555555554A,
0x00005555555555bb,
0x00005555555555D4,
0x0000555555555642,
0x0000555555555658,
0x00005555555552AD
]

func_name = {
    func[0] : 'getchar',
    func[1] : 'putchar',
    func[2] : 'mov1',
    func[3] : 'mov2',
    func[4] : 'mov3',
    func[5] : 'add',
    func[6] : 'sub',
    func[7] : 'mul',
    func[8] : 'div',
    func[9] : 'mod',
    func[10] : 'xor',
    func[11] : 'check',
    func[12] : 'setup'
}

for _ in func:
    bp(_)

gdb.execute("r <<< " + "a"*(100))

myfile = open(FILE, 'w')

myfile.write('''def putchar(c): print(c, end='')
def getchar(): print('input 1 char here')
glob_var = [0]*2000      
other_var = [0]*2000      
''')

while True:
    try:
        rip = int(gdb.parse_and_eval("$rip"))
        arg1 = hex(gdb.parse_and_eval("$rdi"))
        arg2 = hex(gdb.parse_and_eval("$rsi"))
        if rip in func_name:
            name = func_name[rip]
            if name == 'getchar':
                myfile.writelines('getchar()\n')
                gdb.execute('c')
                addr = eval(gdb.execute('p $rcx+$rdx', to_string=True).split(' ')[-1])
                val = hex(gdb.parse_and_eval("$rax"))
                myfile.writelines(f"glob_var[{(addr-0x55555556e040)//8}]={eval(val)} ")
                myfile.writelines(f"# mov {hex(addr)}, {val}, val is from our input char\n")
            if name == 'putchar':
                gdb.execute('c')
                arg1 = int(gdb.parse_and_eval("$rdi"))
                myfile.writelines(f'putchar({repr(chr(arg1 & 0xff))})\n')
            if name == 'mov1':
                gdb.execute('c')
                addr = eval(gdb.execute('p $rcx+$rdx', to_string=True).split(' ')[-1])
                val = hex(gdb.parse_and_eval("$rax"))
                myfile.writelines(f"glob_var[{(addr-0x55555556e040)//8}]={eval(val)} ")
                myfile.writelines(f"# mov {hex(addr)}, {val}\n")
            if name == 'mov2':
                gdb.execute('c')
                idx = (eval(gdb.execute('p $rcx+$rax', to_string=True).split(' ')[-1]) - 0x555555570060) // 8
                gdb.execute('c')
                addr = eval(gdb.execute('p $rcx+$rdx', to_string=True).split(' ')[-1])
                val = hex(gdb.parse_and_eval("$rax"))
                myfile.writelines(f"# from other_var[{idx}]\n")
                myfile.writelines(f"glob_var[{(addr-0x55555556e040)//8}]={eval(val)} ")
                myfile.writelines(f"# mov {hex(addr)}, {val}\n")
            if name == 'mov3':
                gdb.execute('c')
                addr = eval(gdb.execute('p $rcx+$rdx', to_string=True).split(' ')[-1])
                val = hex(gdb.parse_and_eval("$rax"))
                myfile.writelines(f"other_var[{(addr-0x555555570060)//8}]={eval(val)} ")
                myfile.writelines(f"# mov {hex(addr)}, {val}\n")
            if name == 'add':
                gdb.execute('c')
                arg1 = hex(gdb.parse_and_eval("$rcx"))
                arg2 = hex(gdb.parse_and_eval("$rax"))
                idx = eval(gdb.execute('x/gx 0x555555570040', to_string=True).split('\t')[-1])
                myfile.writelines(f"# {arg1} + {arg2}, idx at {idx}\n")
                gdb.execute('c')
                addr = eval(gdb.execute('p $rax+$rdx', to_string=True).split(' ')[-1])
                val = hex(gdb.parse_and_eval("$rcx"))
                myfile.writelines(f"other_var[{(addr-0x555555570060)//8}]={eval(val)} ")
                myfile.writelines(f"# add {hex(addr)}, {val}\n")
            if name == 'sub':
                gdb.execute('c')
                arg1 = hex(gdb.parse_and_eval("$rax"))
                arg2 = hex(gdb.parse_and_eval("$rdx"))
                idx = eval(gdb.execute('x/gx 0x555555570040', to_string=True).split('\t')[-1])
                myfile.writelines(f"# {arg1} - {arg2}, idx at {idx}\n")
                gdb.execute('c')
                addr = eval(gdb.execute('p $rcx+$rdx', to_string=True).split(' ')[-1])
                val = hex(gdb.parse_and_eval("$rax"))
                myfile.writelines(f"other_var[{(addr-0x555555570060)//8}]={eval(val)} ")
                myfile.writelines(f"# sub {hex(addr)}, {val}\n")
            if name == 'mul':
                gdb.execute('c')
                arg1 = hex(gdb.parse_and_eval("$rax"))
                arg2 = hex(gdb.parse_and_eval("$rdx"))
                idx = eval(gdb.execute('x/gx 0x555555570040', to_string=True).split('\t')[-1])
                myfile.writelines(f"# {arg1} * {arg2}, idx at {idx}\n")
                gdb.execute('c')
                addr = eval(gdb.execute('p $rcx+$rdx', to_string=True).split(' ')[-1])
                val = hex(gdb.parse_and_eval("$rax"))
                myfile.writelines(f"other_var[{(addr-0x555555570060)//8}]={eval(val)} ")
                myfile.writelines(f"# mul {hex(addr)}, {val}\n")
            if name == 'div':
                gdb.execute('c')
                arg1 = hex(gdb.parse_and_eval("$rax"))
                arg2 = hex(gdb.parse_and_eval("$rsi"))
                idx = eval(gdb.execute('x/gx 0x555555570040', to_string=True).split('\t')[-1])
                myfile.writelines(f"# {arg1} / {arg2}, idx at {idx}\n")
                gdb.execute('c')
                addr = eval(gdb.execute('p $rax+$rdx', to_string=True).split(' ')[-1])
                val = hex(gdb.parse_and_eval("$rcx"))
                myfile.writelines(f"other_var[{(addr-0x555555570060)//8}]={eval(val)} ")
                myfile.writelines(f"# div {hex(addr)}, {val}\n")
            if name == 'mod':
                gdb.execute('c')
                arg1 = hex(gdb.parse_and_eval("$rax"))
                arg2 = hex(gdb.parse_and_eval("$rcx"))
                idx = eval(gdb.execute('x/gx 0x555555570040', to_string=True).split('\t')[-1])
                myfile.writelines(f"# {arg1} % {arg2}, idx at {idx}\n")
                gdb.execute('c')
                addr = eval(gdb.execute('p $rcx+$rdx', to_string=True).split(' ')[-1])
                val = hex(gdb.parse_and_eval("$rax"))
                myfile.writelines(f"other_var[{(addr-0x555555570060)//8}]={eval(val)} ")
                myfile.writelines(f"# mod {hex(addr)}, {val}\n")
            if name == 'xor':
                gdb.execute('c')
                arg1 = hex(gdb.parse_and_eval("$rcx"))
                arg2 = hex(gdb.parse_and_eval("$rax"))
                idx = eval(gdb.execute('x/gx 0x555555570040', to_string=True).split('\t')[-1])
                myfile.writelines(f"# {arg1} ^ {arg2}, idx at {idx}\n")
                gdb.execute('c')
                addr = eval(gdb.execute('p $rax+$rdx', to_string=True).split(' ')[-1])
                val = hex(gdb.parse_and_eval("$rcx"))
                myfile.writelines(f"other_var[{(addr-0x555555570060)//8}]={eval(val)} ")
                myfile.writelines(f"# xor {hex(addr)}, {val}\n")
            if name == 'check':
                rax = int(gdb.parse_and_eval("$rax"))
                myfile.writelines(f"print('win' if ({rax} == 0) else 'lose')\n")
                print("done should be here")
            if name == 'setup':
                myfile.writelines("# set up some variable check ida for more\n")
        else: 
            print("error fuck")
        gdb.execute("c")
    except:
        myfile.close()
        gdb.execute('q')

basically we just hook each function and extract the value and put it in other file, maybe there's a better tools for this, but i haven't learned something except this python gdb script (sorry). And to run it do gdb -x script.py

But there's a catch the binary dies for the wrong input, in my solution i need to patch one function, the check function (just nop it).

image

So now there wont be any exit and the binary will accepts all of our input.

now let's inspect our vm_opcode.py generated by gdb script.

image

there're a lot more down there, let search the check.

image

btw this is the file i generated with correct input so that's why it'll print win. after reading it carefully i get the equation.

eq1 = input1 * 0x342e70ab89920dd9
eq2 = input2 * -0x796b3031dc23f20f
eq3 = input3 * 0x5e14dede8db175aa
eq4 = input4 * 0x488ede6e062613be

eq5 = eq1 + eq2 + eq3 + eq4

eq5 ^ -0x74926eb3301c1b34 ^ -0x62d634b8a1be3471 == 0

so it'll check our input 4 by 4, so we need to get 4 char correct to get the next block checked, that's why i patch the check exit.

each block will have different constant value, but the equations still the same, i was not expecting that the flag would be 72 chars, so i extract value from my vm_opcode.py manually one by one and throw it in z3 one by one, smh i'm so stupid for doing this.

here's my z3 script:

from z3 import *

# Define the constants
c1 = -0x66dc43851a6bdfb
c2 = -0x68460b0b3ee4ec12
c3 = 0x2452efbf17fce49
c4 = 0x7b75687e9288270a

# XOR constants
xor1 = 0x54ccfb02994249cf
xor2 = -0x305197660dbf0cf6

# Create Z3 variables
input1 = BitVec('input1', 64)
input2 = BitVec('input2', 64)
input3 = BitVec('input3', 64)
input4 = BitVec('input4', 64)

# Define the equations
eq1 = input1 * c1
eq2 = input2 * c2
eq3 = input3 * c3
eq4 = input4 * c4
eq5 = eq1 + eq2 + eq3 + eq4

# Define the equation to solve
solver = Solver()
solver.add(eq5 ^ xor1 ^ xor2 == 0)
solver.add(And(input1 >= 32, input1 <= 126))  # Printable ASCII range for input1
solver.add(And(input2 >= 32, input2 <= 126))  # Printable ASCII range for input2
solver.add(And(input3 >= 32, input3 <= 126))  # Printable ASCII range for input3
solver.add(And(input4 >= 32, input4 <= 126))  # Printable ASCII range for input4

# Check satisfiability
if solver.check() == sat:
    model = solver.model()
    print(f"Solution found:")
    print(f"{chr(model[input1].as_long())}", end='')
    print(f"{chr(model[input2].as_long())}", end='')
    print(f"{chr(model[input3].as_long())}", end='')
    print(f"{chr(model[input4].as_long())}", end='')
else:
    print("No solution exists")

and that's it, by very patiently extracting and running z3 we get our flag

ictf{n3v3r_too_m4ny_form4t_sp3cifi3rs_9a7837294d1633140433f51d13a033736}

It was really fun to do this challenge, next time i need to be smarter on how to approach VM challenge, thanks.