Network Analysis
pyc
powershell
AES
My fiancée, Mizuhara Chizuru, lent her laptop to one of her friends. Her friend downloaded files using a peer-to-peer protocol on Windows. However, not long after, one of Chizuru's important documents went missing. There's always something with my fiancée :D
Attachment : 1 pcap file
From the given pcap, we can see two interesting kinds of traffic, TCP and ICMP, which account for most of the packets.
We can't do much with the ICMP traffic at the moment, because the payload data (signature) is not recognizable. On the other hand, since no TLS is recorded in the capture, it's reasonable to assume that the TCP packets are not encrypted, so we can perhaps get recognizable data from the TCP streams.
There are 4 TCP streams in total, Stream 0 to 3. In all of the streams we can see readable data that looks like some bizarre communication protocol is being used.
Data from Stream 0
Turns out it is the Direct Connect protocol (NMDC). Direct Connect is a peer-to-peer file-sharing protocol in which clients connect to a central hub and then transfer files directly between each other over TCP. Moving on to stream 1, we can see a lot more data than before.
stream 1 (truncated)
There are some interesting keywords here, namely $ADCGET and $ADCSND.
Referring to the popular NMDC docs, we can see that these commands are used for file transfer between connected hosts.
From this we can assume that we have to reconstruct the data sent over the DC protocol in all streams. To check whether we're heading in the right direction, we can try to reconstruct the smaller file first, namely files.xml.bz2.
To extract the appropriate data, we can just parse any data after the "ZL1|" string in the $ADCSND block (refer to the documentation) and decompress it with zlib (ZL1 is the flag that marks the payload as zlib-compressed). I wasn't able to extract the complete file from Stream 1, but somehow Stream 3 works fine. After decompressing files.xml.bz2, we can see that files.xml contains the list of files in a certain directory.
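The extraction step can be sketched roughly like this. It is a minimal sketch, not the original script: it assumes the raw bytes of one direction of a TCP stream are already in memory, and the function name `extract_zl1_blocks` is made up for illustration.

```python
import zlib


def extract_zl1_blocks(stream: bytes) -> list[bytes]:
    """Return the decompressed payload of every '$ADCSND ... ZL1|' block."""
    blocks = []
    pos = 0
    while True:
        start = stream.find(b"$ADCSND ", pos)
        if start == -1:
            break
        end = stream.find(b"|", start)  # NMDC commands are terminated by '|'
        if end == -1:
            break
        header = stream[start:end]
        pos = end + 1
        if not header.endswith(b" ZL1"):
            continue  # uncompressed transfer: not handled in this sketch
        inflater = zlib.decompressobj()
        data = inflater.decompress(stream[pos:])
        if not inflater.eof:
            raise ValueError("truncated zlib stream after " + header.decode("latin-1"))
        blocks.append(data)
        # Resume right after the bytes zlib actually consumed.
        pos = len(stream) - len(inflater.unused_data)
    return blocks
```

For files.xml.bz2, the joined blocks would still need to go through `bz2.decompress` to get files.xml. A `zlib.error` raised here is a hint that the stream bytes themselves are damaged or out of order.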
There are 2 interesting files above, contract.doc.exe and secret.pdf.wkwk. At this point, I'm assuming (again) that this is some kind of ransomware scenario. To my surprise, both files are referenced in our TCP streams, not by their filenames, but by their corresponding TTH (Tiger Tree Hash) in files.xml. We now know that stream 1 is the traffic for the contract.doc.exe file transfer, whereas stream 3 is the traffic for the secret.pdf.wkwk file transfer.
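Looking up which file a TTH belongs to can be automated. The sketch below assumes files.xml follows the usual DC++ file list layout, where each `File` element carries `Name`, `Size` and `TTH` attributes. The sample listing and its TTH values are placeholders, not the ones from the challenge.

```python
import xml.etree.ElementTree as ET


def index_by_tth(xml_bytes: bytes) -> dict[str, tuple[str, int]]:
    """Map each TTH in a file list to (file name, size in bytes)."""
    root = ET.fromstring(xml_bytes)
    return {
        f.get("TTH"): (f.get("Name"), int(f.get("Size")))
        for f in root.iter("File")
    }


# Placeholder listing for illustration only.
SAMPLE = b"""<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<FileListing Version="1" Base="/">
  <Directory Name="share">
    <File Name="contract.doc.exe" Size="1000" TTH="PLACEHOLDERTTH1"/>
    <File Name="secret.pdf.wkwk" Size="2000" TTH="PLACEHOLDERTTH2"/>
  </Directory>
</FileListing>"""

index = index_by_tth(SAMPLE)
print(index["PLACEHOLDERTTH1"])
```

The `Size` value from the same index is also what a reconstructed file can later be compared against.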
Initially, when I tried to parse and decompress the data in stream 1, I always got a zlib error at some point, resulting in corrupted final data. Long story short, I realized that there are some TCP problems in stream 1 (only in stream 1 though), namely "TCP retransmission" and "TCP out-of-order".
TCP retransmission and out-of-order
That means we have to re-sort the parsed data by tcp.seq value before decompressing it. Below are my scripts for these parts.
Parse all $ADCSND blocks and concatenate them into a single file as binary data.
Parse the data from each $ADCSND block, decompress it, append the bytes, and write them to a file as binary data.
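The re-sorting idea itself can be sketched as follows. This is not the original script: it assumes the segments of one direction of the stream have already been exported as `(tcp.seq, payload)` pairs (for example with tshark), and it ignores sequence-number wraparound.

```python
def reassemble(segments: list[tuple[int, bytes]]) -> bytes:
    """Rebuild a byte stream from (seq, payload) pairs.

    Segments are sorted by sequence number; bytes that were already
    emitted (retransmissions, overlaps) are dropped.
    """
    out = bytearray()
    next_seq = None
    for seq, payload in sorted(segments, key=lambda s: s[0]):
        if not payload:
            continue
        if next_seq is None:
            next_seq = seq
        if seq + len(payload) <= next_seq:
            continue  # pure retransmission: nothing new
        if seq < next_seq:
            payload = payload[next_seq - seq:]  # keep only the new tail
            seq = next_seq
        if seq > next_seq:
            raise ValueError(f"gap in stream: expected seq {next_seq}, got {seq}")
        out += payload
        next_seq = seq + len(payload)
    return bytes(out)
```

Raising on a gap is a deliberate choice here: silently skipping missing bytes would produce exactly the kind of zlib error described above.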
Using these scripts we were able to reconstruct contract.doc.exe.
To check that the reconstructed contract.doc.exe is not corrupted, compare the actual file size with the size listed in files.xml. They should be the same.
After getting contract.doc.exe, we can see that it's a PE32 executable, and after some analysis here and there, we found that the executable contains many references to Python-related libs, DLLs and files.
output of strings command
We assume that the executable was built from Python source code using a packager such as PyInstaller. To unpack it and (perhaps) get more readable source code, we used pyinstxtractor.
The tool "unpacked" our exe and output a folder containing Python bytecode, DLLs, etc. There's 1 file that piqued our interest, server.pyc, because it's the only pyc file without a default or lib-ish name lol. Using pycdc to "decompile" server.pyc, we got the following result.
This is confusing, I know, especially the variable naming. But let's focus on the last part first. The line sniff(ChizuruMizuhara, 'icmp', yellowww, **('iface', 'filter', 'prn')) sniffs (listens for) incoming ICMP packets on the ChizuruMizuhara network interface and passes each sniffed packet to the yellow function. The odd **(...) tuple is just how pycdc renders keyword arguments, so the call most likely reads sniff(iface=ChizuruMizuhara, filter='icmp', prn=yellowww). OK, then what does the yellow function do?
Before diving deeper into the yellow function, let's rename/remap some functions first.
Now we can get a better picture of what the yellow function does. It handles every ICMP request received, but only if the ICMP raw data length is < 1024. It then decrypts and decompresses the data and passes the result as the argument to kazuyabjirrrrrrrrrr. Unfortunately, we don't know the details of kazuyabjirrrrrrrrrr, since pycdc does not support one of the Python opcodes it uses, so let's skip it for now.
The return value of kazuyabjirrrrrrrrrr is then compressed and encrypted before being sent as an ICMP reply (via the yoasobi function). The gabut function is actually similar to yellow and yoasobi combined, but with the order of the compression-encryption and decompression-decryption operations reversed. To be honest, I don't know when it is used, so let's just leave it for now.
Using our knowledge so far, we can try to parse the data from ICMP and decrypt-decompress it. As a check that we're on the right track, we can see that in each corresponding ICMP request-reply pair, the ICMP data begins with the same bytes.
ICMP request
ICMP reply
These first 16 bytes are the IV used in the AES encryption (refer back to the decompiled pyc: the IV is prepended to the encryption result). So we have the cipher and the IV, but what about the key?
Again, referring back to the decompiled server.pyc, we can see that the key used is the WKWKWKWKKW... variable. So we only need to reverse engineer the kawokaowkaowk.... function, as it is used as the WKWKWKWK.. validator. Here's our script to do so.
Using the information gathered so far, we can extract all the data from the ICMP packets.
After experimenting, a pattern emerges: each pair of ICMP request and reply packets holds exactly the same data, and there's an additional ICMP reply packet with a different IV. It turns out the paired packets decode to a base64-encoded command, and the additional ICMP packet decodes to the command output.
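That request/echo/output pattern can be sketched like this. It is a sketch under assumptions: the ICMP payloads are already extracted in capture order as `(icmp_type, payload)` pairs (type 8 is an echo request, type 0 an echo reply), the names are made up, and the AES decryption and decompression steps are left out.

```python
IV_LEN = 16  # the IV is prepended to the ciphertext, as seen in the decompiled pyc

Blob = tuple[bytes, bytes]  # (iv, ciphertext)


def split_iv(payload: bytes) -> Blob:
    """Split an ICMP payload into (iv, ciphertext)."""
    return payload[:IV_LEN], payload[IV_LEN:]


def classify_icmp(packets: list[tuple[int, bytes]]) -> tuple[list[Blob], list[Blob]]:
    """Separate command payloads from command-output payloads."""
    commands, outputs = [], []
    last_request = None
    for icmp_type, payload in packets:
        if icmp_type == 8:                # echo request carries the command
            commands.append(split_iv(payload))
            last_request = payload
        elif icmp_type == 0:
            if payload == last_request:   # plain echo of the request: skip
                continue
            outputs.append(split_iv(payload))  # extra reply carries the output
    return commands, outputs
```

Keeping commands and outputs in two ordered lists makes it easy to write them out as two separate files afterwards.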
The ICMP data we extracted resulted in 2 different files, req.txt and rep.txt. req.txt contains a base64-encoded string with tons of special characters. To decode it we need to use CyberChef and remove the null bytes. The decoded string is an obfuscated PowerShell script (on god) with 10000++ lines. Below is the formatted and truncated script.
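The same decoding can be done outside CyberChef. The sketch below assumes the null bytes come from UTF-16LE text, which is the encoding PowerShell uses for encoded commands; that is an assumption on my side, since the capture only shows that null bytes had to be removed. The function name is made up.

```python
import base64


def decode_command(b64_text: str) -> str:
    """Decode a base64 string that wraps UTF-16LE text."""
    raw = base64.b64decode(b64_text)
    try:
        return raw.decode("utf-16-le")
    except UnicodeDecodeError:
        # Fallback that mirrors the CyberChef recipe: just drop the null bytes.
        return raw.replace(b"\x00", b"").decode("latin-1")


# Round trip with a made-up command, for illustration only.
sample = base64.b64encode("Write-Host hi".encode("utf-16-le")).decode("ascii")
print(decode_command(sample))
```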
formatted, truncated, and modified a bit
And near the end, we can see a readable PowerShell function.
In PowerShell, a variable can be declared using ${}, and with that syntax the variable name can include almost any characters, special or alphanumeric. Before analyzing deeper, let's just see what each block (up to iex) does. My intuition told me that the line before the iex statement is constructing a string, so we can try evaluating that variable on its own first. Doing so on the first block gave me this.
output of the first iex block
Hoho, we are on the right track here. I tried this for each of the subsequent blocks (fewer than 10) and eventually found some that output something interesting.
The first picture is self-explanatory: it creates a global variable $key. But the second picture? Let's just leave it at that for now. The next step is to get all the command strings generated by the obfuscated script. Unfortunately, we can't evaluate (execute) them all in the same PowerShell session, as that breaks the whole command. Luckily, we were able to come up with an automation script, shown below.
This script evaluates every string generated by the obfuscated script and writes it to a file. The content should look like this.
To understand the purpose of the commands in the previous section, we have to check the output of each command in rep.txt. Below is the truncated rep.txt.
So the script is actually telling us the correct byte of the key at certain indexes through these operations. If that's the case, then what we're interested in are the command-output pairs that are either an eq operation returning True or a not-eq operation returning False, since those tell us the correct value of the key at that index. The next part is parsing (again :sob: ) into whatever format you like; in this case, we decided to parse it into Python syntax, e.g. using Not(). We used z3 as the equality solver, since evaluating 200++ statements manually is cumbersome.
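The filtering rule can also be illustrated without a solver. The sketch below swaps z3 for a plain dictionary and only handles the two informative cases described above. The `(index, op, value, result)` tuple format is hypothetical, since the real commands and outputs first have to be parsed out of the two text files.

```python
def recover_key_bytes(observations: list[tuple[int, str, int, bool]]) -> dict[int, int]:
    """Collect key bytes pinned down by eq/True or ne/False observations.

    Each observation is (index, op, value, result), with op being "eq"
    or "ne" and result being the boolean printed by the command.
    """
    known: dict[int, int] = {}
    for index, op, value, result in observations:
        pins_value = (op == "eq" and result) or (op == "ne" and not result)
        if not pins_value:
            continue  # only tells us what the byte is NOT; ignored here
        if known.get(index, value) != value:
            raise ValueError(f"conflicting values for key index {index}")
        known[index] = value
    return known


# Made-up observations, for illustration only.
demo = [
    (0, "eq", 100, True),   # key[0] == 100
    (1, "ne", 40, False),   # key[1] != 40 is False, so key[1] == 40
    (2, "eq", 7, False),    # uninformative
    (2, "ne", 9, True),     # uninformative
]
print(recover_key_bytes(demo))
```

A solver like z3 becomes the more convenient tool once the constraints are mixed or a byte is only narrowed down by several negative statements.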
This script prints the correct value for each key index, and after some post-processing we get the key, which is base64 encoded: $key = ZCiksSGQXP+8ofYJqfphfdWD+orfiGW/EUuwtexXuNaV3w==
We already have the key and secret.pdf.wkwk, but what about the IV? That's what we asked ourselves too. After some time tinkering, we decided to run the obfuscated PowerShell block that comes after the declaration of Encrypt-File, which gives the result shown below.
iex ${#%*}
It turned out the script does output the IV as a base64-encoded string, and we found it deep down in rep.txt.
Now that we have all the necessary information, we can decrypt secret.pdf.wkwk.
Somehow our solver in Python didn't do it correctly, so we had to use PowerShell, weird.
TCP1P{It_tOOk_4_lOt_Of_s4cr1f1ces_fOr_Ch1zUrU_hUh?_It's_4ctU4lly_nOth1ng_fOr_me _4s_her_f14ncé.}