Description:
True story: I had planned on learning to write LLVM passes and use them for obfuscation for this year's gCTF challenge.However, when I compiled this code with optimizations enabled, I decided I didn't need to write an obfuscator.
It's a crackme, you know the drill. GL HF!
After a quick check, I found the code flow is effortless:
First, it reads 32 chars from the console, then converts it from hex bytes. It saves these bytes as a rc4 key. So we know that it takes the 16-byte key from our input.
That key is just used for rc4 to decrypt a given buffer, a.k.a flag.
After that, it converts 16 bytes to an array, also a 4x4 matrix (the matrix is that thing I don't think about when doing the challenge, so I got stuck while analyzing and guessing that thing).
So, here is the code flow of the challenge:
To be honest, I didn't solve the challenge but after a quick check of the src, I can't believe that the challenge flow is not hard at all, it's about the math problem. So I decided to test it all.
Okay, just think, if we know that a 16-word array is a matrix, so do_something
is probably an operation of the matrix, like add, multiply, inverse,…vv.
So I decided to test it, this one is the input matrix.
Here's result in python:
result in IDA:
So, do_something
seems like a matrix add
operation. But when I take another add
, it seems incorrect:
Result:
Result in IDA:
Even add
operation in a special way, I will explain in rns:
So the correct add
is:
Then we know:
I found another matrix that I didn't know while doing this challenge (it looked like an unknown xmmword, so I formatted and renamed it):
Do you think a whole bunch of code means just a simple operation? Yes, that's it, the matrix multiplied with the other given matrix I found.
Testing
Result in IDA:
It's not working, am I missing something?
After a quick check, I found that another_matrix
is used for later, and I see that just a simple xor operation but in the form of substitution.
I know it's xor after checking the src code before it xor before it's pack and decode the matrix. (matrix using a custom base in range 260, instead of 256)
Keep in mind this, the matrix uses rns
, which is like another base.
Residue number system
A residue numeral system (RNS) is a numeral system representing integers by their values modulo several pairwise coprime integers called the moduli. This representation is allowed by the Chinese remainder theorem, which asserts that, if M is the product of the moduli, there is, in an interval of length M, exactly one integer having any given set of modular values. The arithmetic of a residue numeral system is also called multi-modular arithmetic.
https://en.wikipedia.org/wiki/Residue_number_system
https://personal.utdallas.edu/~ivor/ce6305/m5p.pdf
which means:
That's the reason why I got stuck for a long time…
So, we know the property of xor, then just xor again and we get the original buffer.
Just get the xor_table from binary, and then I use this function to operate xor.
I've tried this one but it didn't work when I do it again, even if I patch these words into IDA and let it run, still got the wrong answer.
So I write another function to xor but just using brute foce, and it work:
Now, we have completed step 1, and here's the flow we got:
and we have: matrix_1
, matrix_2
, another_matrix
, and buf
The feeling…
RULE 35: It just a matrix operation
Before we reverse, I figure out how it is possible. I mean, How can a simple operation be a bunch of code?
Here's a simple matrix multiply from programiz
but I'm using built-in type: long double
instead of int
:
So, I compiled with Clang++ and see the different between without optimize and optimize flag enabled:
We easily see that optimization makes the code harder. Even if it is just a built-in type, the challenge is using a custom one, which is almost impossible.
It's just a point of view from a reverser, here's more about clang optimize:
https://www.incredibuild.com/blog/compiling-with-clang-optimization-flags
https://news.ycombinator.com/item?id=28207207
https://www.reddit.com/r/C_Programming/comments/conavx/clangs_optimizer_is_ridiculously_smart_like/
In the challenge, we see it as SIMD:
https://ftp.cvut.cz/kernel/people/geoff/cell/ps3-linux-docs/CellProgrammingTutorial/BasicsOfSIMDProgramming.html
https://www.codeproject.com/Articles/5298048/Using-SIMD-to-Optimize-x86-Assembly-Code-in-Array
We know that the code just doing some simple operations but is optimized.
After a day I figure out what is it. I finally found it just matrix multiply (as I guess) but it was difficult to find the multiplicand (matrix A)
So, to save time I decided to get it from src code (will be updated later).
Here is the complete flow:
And here's the way that I operate xor to get the previous buffer before compare (expected_to_cmp):
I tried both ways: brute and get xor table, which got me 2 different matrices.
I've reimplement the rns matrix in python, just need the matrix invert operation.
Here invert from @d4rk9n1ght in sagemath.
output
Here's script that recover the input:
output
So, I finally found the correct input but still the wrong key for decryption. It is probably the problem from 2 matrices that I found above.
Also, I tried to brute force the byte in the matrix and got the result:
output
Finally got flag: ctf{I_pr0mize_its_jUsT_mAtriCeS}
In my point, it's the easiest challenge, and not interested at all, but at least, I know that I need to learn more, more, and more math. The new meta of Reverse. Also, I will update the remaining challenge later, pls enjoy.