Slide (Kyungchan): https://docs.google.com/presentation/d/1tYCSaAzWR5N-LkEfZIM6bJPa4PGf9JRGqlY5ATJjbZY/edit#slide=id.p
Slide (Jiacheng): https://docs.google.com/presentation/d/1EbF2e9GXvZ7yfpAk-5vK-2_an9f8t__UIUDnUU2_dnI/edit#slide=id.g322a75e83c2_2_78
Slide (Arjun): https://docs.google.com/presentation/d/1Sozz06YI7Gb-Wj_07Caj0NTNq6u3otaHS9o-hiAvDh8/edit?pli=1#slide=id.p
### 12/18/24 Meeting Summary:
1. Dead-code elimination removed conditionals (if/loop conditions) from the compiled binary.
2. Checked the assembly code (.S) to confirm that the conditional statements were eliminated.
3. Adding a cast (char --> unsigned char) solved the problem.
Problematic code: if ((choice + 0x90U) < 4) break; --> if (3 < (unsigned char)(choice + 0x90U))
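A minimal sketch of why the cast matters, assuming `char` is a signed 8-bit type and `unsigned int` is 32 bits wide (both are assumptions about the target, and the helper names are hypothetical). It only models the `< 4` form of the check: without the cast the sum is evaluated at 32-bit width and can never be below 4 for any `char` value, which would explain why the compiler treats the branch as dead code.

```python
def to_signed_char(byte):
    """Interpret an 8-bit pattern as a signed char (two's complement)."""
    return byte - 256 if byte >= 128 else byte


def without_cast(choice):
    # C promotes `choice` to int, then converts to unsigned int for `+ 0x90U`,
    # so the sum wraps at 32 bits (assumed width of unsigned int).
    return ((choice + 0x90) % 2**32) < 4


def with_cast(choice):
    # The (unsigned char) cast truncates the sum to 8 bits before comparing.
    return ((choice + 0x90) % 2**8) < 4


signed_chars = [to_signed_char(b) for b in range(256)]
hits_without_cast = [c for c in signed_chars if without_cast(c)]
hits_with_cast = [chr(c) for c in signed_chars if with_cast(c)]

print(hits_without_cast)  # no char value satisfies the uncast condition
print(hits_with_cast)     # the cast version matches four characters
```

Under these assumptions the uncast condition is unsatisfiable, while the cast version matches the characters 'p' through 's'.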
### Takeaway:
1. Inspect the generated instructions when performing perturbation, since dead-code elimination may remove code.
2. Run the compiled code and check that it behaves as expected.
3. Use rand() as an input to keep the compiler from optimizing the code away.
### 1/2/25 Meeting Summary:
1. Map the source code to instructions to see where each instruction came from.
* Use Ghidra or the .S file (basicgame.S, produced with the debug option) to map the source code.
2. Modify the source code line by line to see how the instructions change.
@Kyungchan:
1. Map instructions to source code (the odd for loop in the example).
2. Apply the pattern found in the swap function so the transformation runs automatically.
@Jiacheng:
1. Map instructions to source code at a finer granularity (instead of keeping while, if, and variable together).
2. Find patterns in the examples (e.g., FD_ZERO, FD_SET) and apply them in the transform later.
3. When analyzing the instruction differences, convert them back to the corresponding small code pieces.
4. One transform pattern: (CONSTANT >> str) & 1 != 0 performs a char comparison; the constant can be decoded into ASCII character codes.
5. Macro transform: when a macro expands to C code, find that C code and do the normal compare and transform. When a macro expands to assembly code, we can use the assembly code directly as a replacement pattern.
6. Unify constants and registers of the same type in instructions, so that the similarity score focuses on instruction differences.
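The bitmask pattern in item 4 can be decoded with a few lines of Python. This is a sketch under the assumption that bit `n` of the constant stands for the character with ASCII code `n`; the function name and the example mask are hypothetical, not taken from a sample.

```python
def decode_char_mask(constant, width=64):
    """Return the characters whose bit is set in `constant`.

    Assumes bit n corresponds to ASCII code n, as in (CONSTANT >> c) & 1.
    """
    return [chr(bit) for bit in range(width) if (constant >> bit) & 1]


# Hypothetical mask testing for space, tab, and newline.
mask = (1 << ord(" ")) | (1 << ord("\t")) | (1 << ord("\n"))
print(hex(mask), decode_char_mask(mask))
```

If the compiler subtracts a base value from the character before shifting, that offset would need to be added back to each decoded bit position.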
@Arjun:
1. Map instructions to source code using Ghidra or the .S file to see where the differing instructions came from.
### 1/8/25 Meeting Summary:
1. Investigate the unused for loop further using GDB.
--> Check whether it is a Ghidra error. Given the condition and constraint, we can claim that it is.
2. Start perturbation with blocks. Avoid starting from a copy of the original code; use the instructions to decide where to perturb.
3. If a pattern is found in the decompiled code, write a simple Python script to apply the transformation.
@Kyungchan:
1. Check unused for loop using GDB.
2. Automate pattern transformation (swap function).
@Jiacheng:
1. Consider priority of transformation, start with high priority.
2. Automate pattern transformation (multiple nested if).
3. Start perturbation with blocks of instructions. Try not to copy from the original code.
### 1/16/25 Meeting Summary:
@Kyungchan:
1. Updated preprocessing. It needs more testing; if any bug is found, update the code and share it on Discord.
2. Found more patterns --> A for loop expands into individual blocks.
3. To start perturbation, instead of looking at the original code, check the blocks whose instructions differ the most.
4. Checking the empty for loop, using GDB to see where execution goes within the loop.
@Jiacheng:
1. Updated patterns using regex.
@Arjun:
1. When a struct contains an array, it is expanded into arrow-operator accesses when decompiled.
2. When a for loop is decompiled as a do-while loop, a separate if statement gets merged into the loop. --> Check the goto statements and move blocks accordingly.
Please share finished/planned samples here. I will make a schedule of what has been done and what still needs to be done for each of us.
Also, please update HackMD if you find any new preprocessing or transformation pattern.
### 1/22/25 Meeting Summary:
@Kyungchan:
1. Check the empty for loop using GDB to see where it executes.
2. Found more patterns (loop unrolling).
Todo:
1. Check new sample codes and assign functions to everyone.
2. Check finished functions.
@Jiacheng:
1. Updated patterns using regex.
2. Converted offsets to CONST and generalized registers (e.g., xmm1, xmm2, xmm3 to xmm).
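The normalization in item 2 might look roughly like the sketch below. It assumes AT&T-syntax operands as found in a GCC .S file; the regexes and the `normalize` name are illustrative and are not the project's actual preprocessing code.

```python
import re

# Numbered vector registers, e.g. %xmm1 -> %xmm (ymm/zmm added as an assumption).
VECTOR_REG = re.compile(r"\b(xmm|ymm|zmm)\d+\b")
# Memory displacements directly in front of "(", e.g. -0x18(%rbp) -> CONST(%rbp).
OFFSET = re.compile(r"(?<![\w.])-?(?:0x[0-9a-fA-F]+|\d+)(?=\()")


def normalize(instruction):
    """Replace displacements with CONST and drop vector register numbers."""
    instruction = OFFSET.sub("CONST", instruction)
    return VECTOR_REG.sub(r"\1", instruction)


print(normalize("movss -0x18(%rbp), %xmm1"))
print(normalize("movups 16(%rdi,%rcx,8), %xmm3"))
```

Only the displacement in front of the parenthesis is rewritten; the scale factor inside the parentheses is left alone so that addressing-mode differences remain visible to the similarity score.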
@Arjun:
1. Found a preprocessing bug: when the decompiled code contains a line break, the conversion is not applied.
### 1/29/25 Meeting Summary:
@Kyungchan:
1. Organized samples in spreadsheet:
https://docs.google.com/spreadsheets/d/1D224FGo0AW86-CpqDcgNttYmO57LSae5SKzHrWHS6M8/edit?usp=sharing
* Added status
* Reviewing finished samples to extract patterns or interesting results
@Jiacheng:
1. Worked on more samples
* When decompiled, __sync_sub_and_fetch() function transforms to multiple lines of code
* Decompiled code only had casting difference: size_t --> unsigned int
@Jiacheng: when you find interesting patterns like these, can you dig a little deeper to find out what they are and why they may be decompiled that way? (e.g., __sync_sub_and_fetch, size_t)
@Arjun:
1. No update due to exams.
#### Todo:
@Kyungchan:
1. Review finished samples to extract patterns or interesting results.
@Jiacheng @Arjun:
1. Continue to work on samples
### 2/5/25 Meeting Summary:
@Kyungchan:
1. Updating pattern transformation (in progress)
* Loop Unrolling (Temporary variable swapping)
* Updated
* hospital/sortByBeds
* hospital/sortByName
* hospital/sortByPrice
* In progress
* basicgame/loseItem (original code has similar logic as hospital/sortByBeds but decompiled code is different)
* Merging other transform codes (from Jiacheng) into one transform.py
2. Organize (extract) patterns from finished samples
* Checking finished samples and extracting patterns
slide: https://docs.google.com/presentation/d/1tYCSaAzWR5N-LkEfZIM6bJPa4PGf9JRGqlY5ATJjbZY/edit#slide=id.g32d5d27ebd9_0_0
@Jiacheng:
1. Redundant variable assignments change the instruction positions
```(i.e., a = b; then c = a;)```
2. goto -> return
* case 1: Simply convert goto to return
* Look for other cases
3. Code looks different but similarity is high (0.93)
* Need to look deeper into what the differences are
4. Switch converted to multiple if statements
* Pattern: ```if (a == 1){} if (b == 2) {} ...```
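For case 1 of the goto -> return item above, a regex-based rewrite could be sketched as follows. It assumes Ghidra-style `LAB_<hex>` labels and only handles a label that is immediately followed by a single return statement; the function name and the sample snippet are hypothetical.

```python
import re

# A label line directly followed by one return statement (assumed label style).
LABEL_RETURN = re.compile(
    r"^\s*(LAB_[0-9a-fA-F]+):\s*\n\s*(return\b[^;]*;)", re.MULTILINE
)
GOTO = re.compile(r"goto\s+(LAB_[0-9a-fA-F]+)\s*;")


def goto_to_return(code):
    """Replace `goto LAB;` with the return statement that follows LAB."""
    targets = {m.group(1): m.group(2) for m in LABEL_RETURN.finditer(code)}
    # Leave the goto untouched when its label is not a plain return target.
    return GOTO.sub(lambda m: targets.get(m.group(1), m.group(0)), code)


sample = "if (p == NULL) goto LAB_0010;\nx = 1;\nLAB_0010:\n  return x;\n"
print(goto_to_return(sample))
```

The label itself is kept in place; removing labels that are no longer referenced would be a separate cleanup step.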
@Arjun:
1. Keep the logic of the original code when perturbing
### 2/12/25 Meeting Summary:
@Kyungchan:
1. Reviewed found patterns (3/7 patterns applied as transformation so far).
2. Make regex for found pattern.
* Ghidra error
* Expanded loop (hospital/function/printHospitalsInCity.c)
* goto
* Casting automation
3. New Patterns
* Struct expansion:
* A struct can be expanded into multiple variable assignments --> the instructions are similar, so this may not qualify as a pattern.
* Loop Unrolling instruction pattern:
* movdqu, movups, movslq are only used in loop unrolling assembly codes --> Apply batch transformation when those instructions are detected.
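Detecting those instructions in a .S file could be sketched as below. The mnemonic list comes from the note above and reflects what was observed in the samples so far, not a general rule; the function name and the sample assembly are hypothetical.

```python
import re

# Mnemonics observed only in loop-unrolled code in the samples (see note above).
UNROLL_HINTS = {"movdqu", "movups", "movslq"}
# First word on each line; directives starting with "." are not matched.
MNEMONIC = re.compile(r"^\s*(\w+)", re.MULTILINE)


def unroll_hints(asm_text):
    """Return the loop-unrolling hint mnemonics present in the assembly text."""
    found = {m.group(1) for m in MNEMONIC.finditer(asm_text)}
    return sorted(found & UNROLL_HINTS)


asm = "\tmovslq\t%eax, %rdx\n\tmovups\t(%rdi), %xmm0\n\taddl\t$1, %eax\n"
print(unroll_hints(asm))
```

A non-empty result would be the trigger for applying the batch transformation to that function.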
@Jiacheng:
1. Ghidra produced a for loop that can be mapped back to memset as a pattern (found 2 identical cases).
2. Different compiler/OS versions can produce different assembly code after compilation/decompilation --> Use the same GCC/OS version throughout a given sample.
* We will need to review and re-check the samples with the same compiler version once we have found enough patterns.
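The for loop to memset pattern from item 1 might be rewritten with a regex like the one below. The loop shape (`i = i + 1`, a single zeroing assignment in the body) is an assumption about typical Ghidra output rather than the two cases that were found, and `loop_to_memset` is a hypothetical name.

```python
import re

# for (i = 0; i < N; i = i + 1) { buf[i] = 0; }   (assumed decompiler shape)
ZERO_LOOP = re.compile(
    r"for\s*\(\s*(\w+)\s*=\s*0\s*;\s*\1\s*<\s*(\w+)\s*;"
    r"\s*\1\s*=\s*\1\s*\+\s*1\s*\)\s*"
    r"\{\s*(\w+)\[\1\]\s*=\s*0\s*;\s*\}"
)


def loop_to_memset(code):
    """Rewrite a simple zeroing loop as a memset call."""
    # sizeof(*buf) keeps the byte count right for non-char element types.
    return ZERO_LOOP.sub(r"memset(\3, 0, \2 * sizeof(*\3));", code)


loop = "for (i = 0; i < 0x40; i = i + 1) {\n  buf[i] = 0;\n}"
print(loop_to_memset(loop))
```

Loops with extra statements in the body or a different increment form are left unchanged by this sketch.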
@Arjun:
1. Skip mei-amt-check sample --> it requires Intel-based machine
2. Updated preprocessing --> cover __vfprintf_chk(_stderr,2,msg,ap)
3. isprint() is decompiled into a different condition inside an if statement. --> This can be considered a pattern.
@Arjun: "isprint()" can be applied as a pattern. If you can, please write a simple regex to apply the transformation.
### 2/19/25 Meeting Summary:
@Kyungchan:
1. Make regex for the found pattern.
* Ghidra error
* Casting automation
2. Going over samples and eliminating simple ones
3. New patterns
* Removed else
* Resulted in different instructions
* Ghidra removed code
* Ghidra removed one line of the if statement
* Ghidra created a new function
* Original code used "error("text")" but Ghidra added another function and then called the function in place of perror()
@Jiacheng:
1. New patterns
* Return early in the if statement
* Original code returns inside of the if statement, but Ghidra removed them
* assert() negated: assert_fail() can be changed to assert() with prior condition
* A pointer dereference can be changed to pointer array indexing <-- This could be similar to struct expansion
@Arjun:
1. Updated preprocessing
2. New patterns
* strcmp pattern found
* Ghidra added while loop
* It may come from strcat() in the original code. Compare the strcat() instructions with the added while loop to map them.