---
tags: comp-decomp
title: Overview
---

# Impact of slicing Code object and decompiling

We slice a code object into permutations, i.e. from a given code object we generate all possible sub-code-objects. Each of these is then fed to the decompiler.

![](https://i.imgur.com/vWl0rft.png)

- Take a single code object
- Create permutations of the code object (all possible sub-code-objects)
- Run the decompiler on each chunk
- Collect all the decompiled source strings
- Split each one into lines (on the `'\n'` character)
- Create a set of all the lines collected

All slicing experiments are continued [**here**](https://hackmd.io/leaart_aT3qfitcuvvyt5A?view#Overview).

:::info
Note that the bigger the code, the larger the set of permutations. This needs to be optimized to prune out garbage code.
Example: if a slice starts with `STORE_FAST`, it never decompiles, so all such slices can be pruned.
:::

## Calculating similarity b/w code objects

The similarity between two code objects is computed by matching the opcodes of the two code objects as a common subsequence (longest-common-subsequence style, not a contiguous substring):

- Each matching opcode scores 1
- The ordinal positions of the opcodes are respected, i.e. matches must occur in the same relative order in both code objects

Recursive solution for the score. Here `c1` and `c2` are taken to be raw bytecode (`co_code`), where each instruction occupies 2 bytes (opcode, argument): slicing `[2:]` skips one whole instruction, and only the opcode byte `[0]` is compared.

```python
from functools import lru_cache


@lru_cache(maxsize=None)  # memoize: without it the recursion is exponential
def score(c1: bytes, c2: bytes) -> int:
    """Count opcodes of c1 and c2 that match in the same relative order."""
    # Recursion depth grows with len(c1) + len(c2); very large code objects
    # may need an iterative version or a higher recursion limit.
    if len(c1) == 0 or len(c2) == 0:
        return 0
    if c1[0] == c2[0]:
        # Opcodes match: count it and skip one 2-byte instruction in both
        return max(
            1 + score(c1[2:], c2[2:]),
            score(c1[2:], c2),
            score(c1, c2[2:]),
        )
    # No match: skip one instruction in either code object
    return max(
        score(c1[2:], c2),
        score(c1, c2[2:]),
    )
```

Final result: `score(c1, c2) / (max(len(c1), len(c2)) / 2)`, i.e. the number of matched opcodes divided by the larger of the two opcode counts (`len / 2`, since each instruction is 2 bytes).

:::info
Note that we divide by the *larger* of the two opcode counts to avoid false alarms, i.e. cases where one code object is a subset of the other (dividing by the smaller count would give a subset a perfect score of 1).
Example [here](https://hackmd.io/leaart_aT3qfitcuvvyt5A?view#Example-4).
In that example, the decompiled code contains an additional return instruction that changes the logic. Catching this would require either CFG analysis or, more trivially, a check for subsets.
![](https://i.imgur.com/SHwAOr9.png)
(Left: Decompiled code.
Right: Original code.)
:::

## Optimization:

The following instructions cause an infinite loop when they appear at the start of a slice:
- `STORE_FAST`
- `JUMP_ABSOLUTE`

## New instructions discovered?

- Aggregated `if` conditions in loops are broken down, as shown [**here**](https://hackmd.io/leaart_aT3qfitcuvvyt5A?view#Example-3)
-

## Failed decompilation. Can they be fixed?

# Lit-review:

- [Dr. Eric Schulte](https://eschulte.github.io/):
    - [Evolving exact binaries](https://eschulte.github.io/data/bed.pdf)
    - [AST manipulation](https://grammatech.github.io/prj/sel/)
-
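The slicing steps listed at the top of this note can be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not the actual experiment code: "permutations" is taken to mean every contiguous run of whole instructions; slices are cut at instruction offsets from `dis.get_instructions` so that inline cache entries (Python 3.11+) stay attached to their instruction; `CodeType.replace` needs Python 3.8+; and `decompile` is a hypothetical hook standing in for the real decompiler (it returns source text, or `None` on failure). Slices starting with `STORE_FAST` or `JUMP_ABSOLUTE` are pruned, following the *Optimization* section.

```python
import dis
import types
from typing import Callable, Iterator, Optional, Set

# Opcodes that are pruned when they start a slice (see "Optimization").
# Matching on opname also works on Python 3.12+, where JUMP_ABSOLUTE is gone.
PRUNE_AT_START = {"STORE_FAST", "JUMP_ABSOLUTE"}


def sub_code_objects(code: types.CodeType) -> Iterator[types.CodeType]:
    """Yield a code object for every contiguous run of whole instructions."""
    raw = code.co_code
    instrs = list(dis.get_instructions(code))
    # Cut points: start offset of each instruction, plus the end of the bytecode.
    cuts = [ins.offset for ins in instrs] + [len(raw)]
    for i, ins in enumerate(instrs):
        if ins.opname in PRUNE_AT_START:
            continue  # garbage: skip every slice that starts here
        for j in range(i + 1, len(instrs) + 1):
            yield code.replace(co_code=raw[cuts[i]:cuts[j]])


def collect_lines(
    code: types.CodeType,
    decompile: Callable[[types.CodeType], Optional[str]],  # hypothetical hook
) -> Set[str]:
    """Decompile every slice, split each output into lines, keep the distinct ones."""
    lines: Set[str] = set()
    for sub in sub_code_objects(code):
        try:
            source = decompile(sub)
        except Exception:
            continue  # count a crashing decompile as a failed slice
        if source:
            lines.update(source.split("\n"))
    return lines
```

Because some slices send the decompiler into an infinite loop, a real run would also need a per-slice timeout, which this sketch omits. With `n` instructions there are `n * (n + 1) / 2` contiguous slices before pruning, which is why discarding whole families of slices by their first opcode pays off.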