---
tags: decompiler
---

# Understanding bytecode

## How does `EXTENDED_ARGS` work

The overall argument value can be chained through extended args. For example:

```
EXTENDED_ARGS A
EXTENDED_ARGS B
JUMP_ABSOLUTE C
```

This jumps to the absolute value ABC, where A, B and C are one byte each. The final value is `((A << 8) | B) << 8 | C`, where `8` is the number of bits to shift, which is equivalent to one byte.

## Types of jump instructions

There are two types of jump instructions in Python.

- Absolute jump: jumps to the offset passed as an argument
- Relative jump: jumps ahead by the number of bytes passed as an argument

In the Python `opcode` module, the two are kept in separate lists (`hasjabs` and `hasjrel`) and can be looked up.

## What are the code object variables:

### When adding a local variable:

1. Add to `co_varnames` at a certain index
2. Update the total in `co_nlocals`

## Conditionals and jumps

- 'JUMP_FORWARD', 110
- 'JUMP_IF_FALSE_OR_POP', 111
- 'JUMP_IF_TRUE_OR_POP', 112
- 'JUMP_ABSOLUTE', 113
- 'POP_JUMP_IF_FALSE', 114
- 'POP_JUMP_IF_TRUE', 115

## Struggles of instrumenting instruction:

Below is a list of properties that need to be resolved after instrumenting an instruction in bytecode:

- Code object `co_lnotab` property, which is the mapping of bytecodes to line numbers in the source file. *We ignore this because the bytecode still remains runnable if not adjusted and it is not used by the decompiler.*
- Adding local variables means adjusting `co_varnames` and `co_nlocals`; adding global variables means adjusting `co_names`.
- Adjusting jump offsets of relative jumps (like `JUMP_FORWARD`, which takes the number of bytes to skip) and absolute jumps (like `POP_JUMP_IF_FALSE`, which jumps to an absolute offset) in the bytecode to reflect the adjustment.
- Sometimes, the adjusted args of jump instructions exceed the one-byte maximum of 255 and require instrumenting `EXTENDED_ARGS` to hold the larger values.
- Optimised jump instructions like `SETUP_LOOP` may not be as trivial to update. They store the number of bytes to jump, and that jump is taken after the instruction `POP_BLOCK` is executed. The count is usually the size of the loop's chunk of bytecode, but if `POP_BLOCK` is followed by an instruction like `JUMP_FORWARD`, the number of bytes that instruction would jump is usually added to it in order to avoid invoking `JUMP_FORWARD`. We need to cater for this optimization if the instrumented instruction is between `POP_BLOCK` and `JUMP_FORWARD`, to avoid skipping the instrumented instruction.

# Patterns and solutions

## Some opcode patterns:

- **`POP_JUMP_IF_TRUE`** : Completes an `or` operation to go to an `and` operation. The operation jumps to the next expression in the `and` chain.
- **`JUMP_IF_FALSE_OR_POP`** : Connects all `and` expressions together. The argument of the instruction targets the instruction that loads the overall boolean chain value. All `JUMP_IF_FALSE_OR_POP` will have the same jump target.
- **`JUMP_IF_TRUE_OR_POP`** : Same as the previous one, except that this instruction is for `or` chains.

## Transformation rules:

### 1. Breaking `(a or b) and c`

Example: 0.pyc

Pattern: `^([POP_JUMP_IF_TRUE].*)+([JUMP_IF_FALSE_OR_POP].*)+[exit_set]$`

The above pattern is for a series of `(a1 or a2 or a3 or .. aN) and b1 and b2 and ... and bN` where `aN` is *always* before the first occurrence of `JUMP_IF_FALSE_OR_POP` and `bN` will always be before the jump target of all `JUMP_IF_FALSE_OR_POP`.
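As a rough illustration of the pattern above, the check can be sketched as a scan over opcode names. The single-letter encoding, the helper names `encode` and `matches_or_then_and`, and the stricter character classes are assumptions made for this sketch, not the tool's actual matcher. The exit opcodes are the ones named in this rule's `exit_set`.

```python
import re

# Opcode names taken from this rule's exit_set.
EXIT_OPS = {"STORE_NAME", "STORE_ATTR", "STORE_GLOBAL",
            "STORE_FAST", "RETURN_VALUE", "YIELD_VALUE"}


def encode(opnames):
    """Map each opcode name to one letter so a regex can scan the sequence."""
    letters = []
    for name in opnames:
        if name == "POP_JUMP_IF_TRUE":
            letters.append("T")
        elif name == "JUMP_IF_FALSE_OR_POP":
            letters.append("F")
        elif name in EXIT_OPS:
            letters.append("E")
        else:
            letters.append("x")
    return "".join(letters)


# T... (the `or` part), then F... (the `and` part), then one exit opcode.
# Assumption of this sketch: no exit opcode appears before the end and no
# POP_JUMP_IF_TRUE appears after the first JUMP_IF_FALSE_OR_POP.
OR_THEN_AND = re.compile(r"T[^FE]*F[^TE]*E")


def matches_or_then_and(opnames):
    """Return True if the window looks like `(a1 or ...) and b1 and ...`."""
    return OR_THEN_AND.fullmatch(encode(opnames)) is not None


# Hand-written window of the shape this rule describes for
# `x = (a or b) and c`, starting at the first POP_JUMP_IF_TRUE.
window = ["POP_JUMP_IF_TRUE", "LOAD_NAME", "JUMP_IF_FALSE_OR_POP",
          "LOAD_NAME", "STORE_NAME"]
print(matches_or_then_and(window))
```

Working on names rather than raw opcode numbers keeps the sketch independent of the interpreter version; a real matcher would also have to compare the jump targets, which this one ignores.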
The `exit_set`, which marks the end of the pattern, is as follows:

```
exit_set = [
    90,   # name_op('STORE_NAME', 90)
    95,   # name_op('STORE_ATTR', 95)    # Index in name list
    97,   # name_op('STORE_GLOBAL', 97)  # ""
    125,  # def_op('STORE_FAST', 125)    # Local variable number
    83,   # def_op('RETURN_VALUE', 83)
    86,   # def_op('YIELD_VALUE', 86)
]
```

#### The transformation:

1. Add `STORE_FAST` and `LOAD_FAST` to introduce a new local variable to store the boolean result and then load it immediately for the next boolean expression. (This will be added before the first `JUMP_IF_FALSE_OR_POP` - `and`)
2. Convert `POP_JUMP_IF_TRUE` to `JUMP_IF_TRUE_OR_POP`. This preserves the boolean value for our newly introduced temporary variable.
3. Assign the jump arg of `JUMP_IF_TRUE_OR_POP` to our newly instrumented instructions.

Results:

```
(a or b) and c

Changes to =>

t1 = (a or b)
t1 and c
```

### 2. Breaking `a and b and c`

Example: 10.pyc

Pattern: `^([JUMP_IF_FALSE_OR_POP].*)+[exit_set]$`

The above pattern is for a series of `b1 and b2 and ... and bN` where `bN` will always be before the jump target of all `JUMP_IF_FALSE_OR_POP`.

The `exit_set`, which marks the end of the pattern, is as follows:

```
exit_set = [
    90,   # name_op('STORE_NAME', 90)
    95,   # name_op('STORE_ATTR', 95)    # Index in name list
    97,   # name_op('STORE_GLOBAL', 97)  # ""
    125,  # def_op('STORE_FAST', 125)    # Local variable number
    83,   # def_op('RETURN_VALUE', 83)
    86,   # def_op('YIELD_VALUE', 86)
]
```

#### The transformation:

1. Add `STORE_FAST` and `LOAD_FAST` to introduce a new local variable to store the boolean result and then load it immediately for the next boolean expression. (This will be added before the second `JUMP_IF_FALSE_OR_POP` - `and`)
2. Keep the first `JUMP_IF_FALSE_OR_POP` as is. This preserves the boolean value for our newly introduced temporary variable.
3. Assign the jump arg of `JUMP_IF_FALSE_OR_POP` to our newly instrumented instructions.
4. Give the number of extractions to set how many booleans are extracted.

Results:

```
a and b and c

Changes to =>

t1 = (a and b)
t1 and c
```

> NOTE: There may be other jump instructions like "POP_JUMP_IF_TRUE"

### 3. Breaking conditionals `if a and/or b and/or c:`

Example: 16.pyc

Pattern: `^([POP_JUMP_IF_FALSE, POP_JUMP_IF_TRUE].*)+[exit_set]$`

- Where each `POP_JUMP_IF_FALSE` corresponds to an `and` and each `POP_JUMP_IF_TRUE` corresponds to an `or`.
- For each of these, the jump target is the same.

The above pattern is for a series of `b1 and/or b2 and/or ... and/or bN` where `bN` will always be before the jump target of all `POP_JUMP_IF_FALSE`/`POP_JUMP_IF_TRUE`.

The `exit_set`, which marks the end of the pattern, is as follows:

```
exit_set = [
    90,   # name_op('STORE_NAME', 90)
    95,   # name_op('STORE_ATTR', 95)    # Index in name list
    97,   # name_op('STORE_GLOBAL', 97)  # ""
    125,  # def_op('STORE_FAST', 125)    # Local variable number
    83,   # def_op('RETURN_VALUE', 83)
    86,   # def_op('YIELD_VALUE', 86)
]
```

#### The transformation:

1. Add `STORE_FAST` and `LOAD_FAST` to introduce a new local variable to store the boolean result and then load it immediately for the last jump. (This will be added before the last `POP_JUMP_IF_FALSE`/`POP_JUMP_IF_TRUE` - `and`/`or`)
2. Convert all `POP_JUMP_IF_FALSE`/`POP_JUMP_IF_TRUE` before the instrumented instruction to `JUMP_IF_FALSE_OR_POP`/`JUMP_IF_TRUE_OR_POP` respectively.
3. Assign the jump arg of all `JUMP_IF_FALSE_OR_POP`/`JUMP_IF_TRUE_OR_POP` to our newly instrumented instructions.

Results:

```
if a and/or b and/or c:

Changes to =>

t1 = a and/or b and/or c
if t1:
```

> NOTE: There may be other jump instructions from before, like "JUMP_IF_TRUE_OR_POP". Fix their args as well if need be, but most probably they need not be changed.

### 4. Breaking elif into `else and nested if`

Example: 19.pyc

Pattern: `^[POP_JUMP_IF_FALSE].*[JUMP_FORWARD]$`

- Where each `POP_JUMP_IF_FALSE` corresponds to the end of the last if condition.
- For this instruction we make sure the jump target is greater than the offset of `JUMP_FORWARD`.

The above pattern consists of a single `POP_JUMP_IF_FALSE` and `JUMP_FORWARD`. All other jump instructions between these two instructions either return or jump within them.

#### The transformation:

1. Add `STORE_FAST` and `LOAD_FAST` to introduce a new local variable to store the boolean result and then load it immediately for the next boolean expression. (This will be added before the `POP_JUMP_IF_FALSE`)
2. Immediately after `JUMP_FORWARD`, load the boolean variable via `LOAD_FAST`.
3. After the load, add `POP_JUMP_IF_TRUE` to jump to the target where `JUMP_FORWARD` jumps.

Results:

```
if a and b:
    a = 1 + 2
elif c:
    c = 1 + 2

Changes to =>

t1 = (a and b)
if t1:
    a = 1 + 2
else:
    if not t1 and c:
        c = 1 + 2
```

### 5. Adding NOP instr at error inducing spot

Add pattern for optimised spots???

### 6. Extract boolean from while loops

Example: 158.pyc

Pattern: `[SETUP_LOOP]([^FOR_ITER]*[POP_JUMP_IF_FALSE,POP_JUMP_IF_TRUE])+.*[JUMP_ABSOLUTE][POP_BLOCK]`

- Where `SETUP_LOOP` marks the start of the loop and `[JUMP_ABSOLUTE][POP_BLOCK]` mark the end of it.
- `[POP_JUMP_IF_FALSE,POP_JUMP_IF_TRUE]` will mark the end of the condition, where the arg should be the offset of `POP_BLOCK`.

#### The transformation:

1. Remove the `SETUP_LOOP` instruction.
2. Add `STORE_FAST`, `SETUP_LOOP` and `LOAD_FAST` to introduce a new local variable to store the boolean result, start the loop and then load it immediately for the next boolean expression. (This will be added before the second `POP_JUMP_IF_FALSE`/`POP_JUMP_IF_TRUE`)
3. Convert all `POP_JUMP_IF_FALSE`/`POP_JUMP_IF_TRUE` before the instrumented instruction to `JUMP_IF_FALSE_OR_POP`/`JUMP_IF_TRUE_OR_POP` respectively.
4. Assign the jump arg of all `JUMP_IF_FALSE_OR_POP`/`JUMP_IF_TRUE_OR_POP` to our newly instrumented instructions.
5. Update the args of `JUMP_ABSOLUTE` to the new position of `SETUP_LOOP`.
6. For FET, we copy the entire boolean expression (from `SETUP_LOOP` to `POP_JUMP_IF_FALSE`/`POP_JUMP_IF_TRUE`, excluding the two instructions themselves), follow it with a `STORE_FAST`, and prepend it before the `JUMP_ABSOLUTE` to update the boolean value in our newly instrumented variable.

Results:

```
while a and/or b and/or c:
    <do something>

Changes to =>

t1 = (a and/or b and/or c)
while t1:
    <do something>
    t1 = (a and/or b and/or c)  # this is for FET
```

### 7. Removing deadcode after RETURN_VALUE instruction

### 8. Remove `JUMP_FORWARD` when not needed

There are instances where `JUMP_FORWARD` jumps to the immediately next instruction. Here we remove this instruction.

### 9. Instrument after the last instruction of loop if it is `break`

Example: Mal `102168.pyc` - `pretty_flags`

Example: 158.pyc

Pattern: `[POP_JUMP_IF_FALSE,POP_JUMP_IF_TRUE].*[BREAK_LOOP][JUMP_ABSOLUTE][POP_BLOCK]`

- Where `[POP_JUMP_IF_FALSE,POP_JUMP_IF_TRUE]` marks the conditional including the `[BREAK_LOOP]` instruction.
- `[JUMP_ABSOLUTE][POP_BLOCK]` marks the end of the loop. We instrument our code here.
- The `[POP_JUMP_IF_FALSE,POP_JUMP_IF_TRUE]` must have the same target offset as `[JUMP_ABSOLUTE]`.

#### The transformation:

1. Insert our NOP instructions before `[JUMP_ABSOLUTE][POP_BLOCK]`.
2. Redirect the `[POP_JUMP_IF_FALSE,POP_JUMP_IF_TRUE]` to our new instruction.
3. We should be done with this then.

### 10. Instrument before `continue` in loops

Example: Mal `116051.pyc`

Pattern: `[POP_JUMP_IF_FALSE,POP_JUMP_IF_TRUE][JUMP_ABSOLUTE][POP_TOP][JUMP_ABSOLUTE][POP_BLOCK]`

- Where `[POP_JUMP_IF_FALSE,POP_JUMP_IF_TRUE]` marks the conditional including the `[BREAK_LOOP]` instruction.
- `[JUMP_ABSOLUTE][POP_TOP]` is an optimization for `continue`. We instrument our code here.
- The `[JUMP_ABSOLUTE]` is the actual `continue` instruction.
- We must confirm that both `[JUMP_ABSOLUTE]` jump to the same offset.
- `[JUMP_ABSOLUTE][POP_BLOCK]` marks the end of the loop.
  We instrument our code here.

The transformation:

1. Insert our NOP instructions before the last `[JUMP_ABSOLUTE]`.
2. Convert the first `[JUMP_ABSOLUTE]` to `[JUMP_FORWARD]` and have it jump 2 bytes, that is, to our newly instrumented instruction.

### 11. Instrument at the end of loops:

Example: Mal `102150.pyc`

Pattern: `[POP_TOP][JUMP_ABSOLUTE]`

- `[POP_TOP][JUMP_ABSOLUTE]` marks the end of the loop

The transformation:

- We instrument before `POP_TOP`.
- We make sure all the jumps to `POP_TOP` in the bytecode jump to the newly instrumented code.

### 12. Instrument at the end of loops and fix JUMP_ABSOLUTE:

Example: Mal `121042.pyc`

Pattern: `[POP_TOP][JUMP_ABSOLUTE]`

- `[POP_TOP][JUMP_ABSOLUTE]` marks the end of the loop

The transformation:

- We instrument before `POP_TOP`.
- We make sure all the jumps to `POP_TOP` in the bytecode jump to the newly instrumented code.
- Convert all `JUMP_ABSOLUTE` going back to `JUMP_FORWARD` to our new instruction (**NEW**)

### 13. Catering to chained compare operators e.g. `<` (not implemented)

Example: Mal `102355.py`

Pattern: `^([POP_JUMP_IF_TRUE,POP_JUMP_IF_FALSE].*)+[COMPARE_OP][POP_JUMP_IF_TRUE,POP_JUMP_IF_FALSE][JUMP_FORWARD][POP_TOP]+$`

The above pattern is for a series of `b1 and/or b2 and/or ... and/or bN` where `[COMPARE_OP][POP_JUMP_IF_TRUE,POP_JUMP_IF_FALSE][JUMP_FORWARD][POP_TOP]` will be the ending of the boolean expression.

- Do note that `[COMPARE_OP][POP_JUMP_IF_TRUE,POP_JUMP_IF_FALSE]` is a general comparison for any comparison operator like `<`.
- The boolean expression will end with `[COMPARE_OP][POP_JUMP_IF_TRUE,POP_JUMP_IF_FALSE][JUMP_FORWARD][POP_TOP]` if the expression ends with a comparison operator.

Transformation:

1. Add `STORE_FAST` and `LOAD_FAST` to introduce a new local variable to store the boolean result.
2. Instrument at the end of `POP_TOP` and also add a `POP_JUMP_IF_FALSE` that jumps to the target of the previous last `[POP_JUMP_IF_TRUE,POP_JUMP_IF_FALSE]`.
3. Remove the previous last `[POP_JUMP_IF_TRUE,POP_JUMP_IF_FALSE]`.
4. Add `ROT_TWO` before `POP_TOP`.
5. If the previous last `POP_JUMP_IF_X` is `POP_JUMP_IF_TRUE`, then add `UNARY_NOT` before `STORE_FAST`.
6. Redirect all jumps to `POP_TOP` to `ROT_TWO`.
7. Convert all `POP_JUMP_IF_FALSE`/`POP_JUMP_IF_TRUE` before the instrumented instruction to `JUMP_IF_FALSE_OR_POP`/`JUMP_IF_TRUE_OR_POP` respectively.
8. Assign the jump arg of all `JUMP_IF_FALSE_OR_POP`/`JUMP_IF_TRUE_OR_POP` to `UNARY_NOT` or, if it does not exist, to our newly instrumented instructions.

![](https://i.imgur.com/CT45Ci8.png)

conversion:

```
if not (nonce and 7 <= len(nonce) <= 13):

Changes to ==>

tmp = not (nonce and 7 <= len(nonce) <= 13)
if tmp:
```

### 14. Converting `if` and `elif` chain to `if` and `if and not` (not implemented - original sample has extended args so will visit later) - not logically equivalent

Example: Mal `102496.py`

Pattern: `^[POP_JUMP_IF_TRUE,POP_JUMP_IF_FALSE].+[POP_JUMP_IF_TRUE,POP_JUMP_IF_FALSE]+$`

Where the jump target of the first `[POP_JUMP_IF_TRUE,POP_JUMP_IF_FALSE]` leads to the boolean expression of the second `[POP_JUMP_IF_TRUE,POP_JUMP_IF_FALSE]`.

Transformation:

1. Add `STORE_FAST` and `LOAD_FAST` to introduce a new local variable to store the boolean result.
2. Apply rule 3 for the first boolean.
3. Remove all `JUMP_FORWARD` between the first `[POP_JUMP_IF_TRUE,POP_JUMP_IF_FALSE]_1` and the target `[POP_JUMP_IF_TRUE,POP_JUMP_IF_FALSE]_2`.
4. Add `LOAD_FAST` after the second `[POP_JUMP_IF_TRUE,POP_JUMP_IF_FALSE]_2` and add `POP_JUMP_IF_TRUE` that jumps to the same target as `[POP_JUMP_IF_TRUE,POP_JUMP_IF_FALSE]_2`.

conversion:

```
if a:
    ...
elif b:
    ...

Changes to ==>

tmp = a
if tmp:
    ...
if b and not tmp:
    ...
```

### 15. Converting `if` and `else` chain to `if` and `if and not` (not implemented - original sample has extended args so will visit later)

Example: Mal `102496.py`

Pattern: `^[POP_JUMP_IF_TRUE,POP_JUMP_IF_FALSE].+[JUMP_FORWARD]+$`

Where the jump target of the first `[POP_JUMP_IF_TRUE,POP_JUMP_IF_FALSE]` leads to an else, that is, `[JUMP_FORWARD]`.
Transformation:

1. Add `STORE_FAST` and `LOAD_FAST` to introduce a new local variable to store the boolean result.
2. Apply rule 3 for the first boolean.
3. Remove all `JUMP_FORWARD` between the first `[POP_JUMP_IF_TRUE,POP_JUMP_IF_FALSE]_1` and the target `[JUMP_FORWARD]_2`.
4. Add `LOAD_FAST` after the `[JUMP_FORWARD]_2` and replace it with `POP_JUMP_IF_TRUE` that jumps to the same target as `[JUMP_FORWARD]_2`.

conversion:

```
if a:
    ...
else:
    ...

Changes to ==>

tmp = a
if tmp:
    ...
if not tmp:
    ...
```

### 16. Instrument after loops (outside): (not implemented)

Example: Mal `37997.pyc` - close

Pattern: `[POP_TOP][JUMP_ABSOLUTE]`

- `[POP_TOP][JUMP_ABSOLUTE][POP_BLOCK]` marks the end of the loop

The transformation:

- We instrument after `POP_BLOCK`.
- We make sure the `SETUP_LOOP` now jumps to this instruction and does not optimize.

### 17. Instrument store instead of return in `try` block

Example: Mal `12489.pyc` - from_param

Pattern: `[SETUP_FINALLY].+[POP_BLOCK][RETURN_VALUE].+[END_FINALLY]`

- The above pattern is the bytecode structure of a `try` block with a return statement

The transformation:

- Instrument `STORE_FAST` to store the value in our variable before `POP_BLOCK`.
- Replace `RETURN_VALUE` with `JUMP_FORWARD` and update the jump target to `END_FINALLY`.

### 18. Instrument store instead of return in `except` block

Example: Mal `121817.pyc` - from_param

Pattern: `([ROT_FOUR][POP_EXCEPT])+[RETURN_VALUE]`

- The above pattern shows a value being returned in an exception block

Transformation:

- Instrument `STORE_FAST` to store the value in our variable before `ROT_FOUR`.
- Remove all pairs of `ROT_FOUR` and `POP_EXCEPT`.
- If the instruction after `RETURN_VALUE` is `END_FINALLY`, then replace `RETURN_VALUE` with `POP_EXCEPT` and `JUMP_FORWARD` and jump to the instruction after.
- If the instruction after `RETURN_VALUE` is **not** `END_FINALLY`, then remove all instructions up until and including `RETURN_VALUE`.

### 19. Remove `continue` from `except` block and replace with pass

Example: Mal `12481.pyc` - \_check_ctypeslib_typecodes

Pattern: `([POP_EXCEPT][JUMP_ABSOLUTE])+[END_FINALLY][JUMP_ABSOLUTE]`

- There are multiple `[POP_EXCEPT][JUMP_ABSOLUTE]` entries, where each is for a `continue` and the last one is for the end of the except at the end of the loop
- All `JUMP_ABSOLUTE` in `[POP_EXCEPT][JUMP_ABSOLUTE]` will jump to the same target

Transformation:

- Remove copies of `[POP_EXCEPT][JUMP_ABSOLUTE]` except one

```
for tp in set(np.sctypeDict.values()):
    try:
        ctype_for = as_ctypes_type(tp)
        ctypes_to_dtypes[ctype_for] = tp
    except NotImplementedError:
        continue

Change to =>

for tp in set(np.sctypeDict.values()):
    try:
        ctype_for = as_ctypes_type(tp)
        ctypes_to_dtypes[ctype_for] = tp
    except NotImplementedError:
        pass
```

### 20. Move `return` statement outside `with` block

Example: Mal `132693.pyc` - load_library

Pattern: `[POP_BLOCK][ROT_TWO][BEGIN_FINALLY][WITH_CLEANUP_START][WITH_CLEANUP_FINISH][POP_FINALLY][RETURN_VALUE][WITH_CLEANUP_START][WITH_CLEANUP_FINISH][END_FINALLY][LOAD_CONST][RETURN_VALUE]`

- where the first part `[POP_BLOCK][ROT_TWO][BEGIN_FINALLY][WITH_CLEANUP_START][WITH_CLEANUP_FINISH][POP_FINALLY][RETURN_VALUE]` cleans up the `with` block and returns the value
- The second part `[WITH_CLEANUP_START][WITH_CLEANUP_FINISH][END_FINALLY][LOAD_CONST][RETURN_VALUE]` ensures that immediately outside the `with` block there is nothing else being returned and marks the end of the function

Transformation:

1. Add `STORE_FAST` to introduce a new local variable to store the returned value before `POP_BLOCK`.
2. Remove `ROT_TWO`.
3. Remove the `[WITH_CLEANUP_START][WITH_CLEANUP_FINISH][POP_FINALLY][RETURN_VALUE]` that cleans up and returns in the `with` block.
4. Replace `LOAD_CONST` with `LOAD_FAST` to load our stored value and change its argument too.

```
with self.ffi._lock:
    return self._load_library

change to:

with self.ffi._lock:
    temp = self._load_library
return temp
```

### 21. Convert continue in try/except and move in conditional block (not implemented)

Example: Mal `18250.pyc` - `__init__`

Pattern: `[FOR_ITER][STORE_FAST].+[POP_EXCEPT][JUMP_ABSOLUTE][POP_EXCEPT][JUMP_FORWARD][END_FINALLY].+[JUMP_ABSOLUTE]`

- Where `[FOR_ITER][STORE_FAST]` marks the start of the loop
- `[POP_EXCEPT][JUMP_ABSOLUTE][POP_EXCEPT][JUMP_FORWARD][END_FINALLY]` marks the `continue` in the exception block
- the last `[JUMP_ABSOLUTE]` marks the end of the loop

Transformation:

- Instrument `LOAD_CONST` and `STORE_FAST` after `[FOR_ITER][STORE_FAST]` and store the `False` value.
- Replace `[POP_EXCEPT][JUMP_ABSOLUTE]` with `[LOAD_CONST][STORE_FAST]` to store the `True` value to our instrumented variable.
- Immediately after `[POP_EXCEPT][JUMP_FORWARD][END_FINALLY]`, instrument `[LOAD_FAST][LOAD_CONST][COMPARE_OP][POP_JUMP_IF_FALSE][JUMP_ABSOLUTE]` to compare the instrumented variable value and invoke `continue` accordingly.

```
for i in range(len(path) - 1, -1, -1):
    if path[i] == '?':
        try:
            offset = int(path[i + 1:])
        except ValueError:
            # Just ignore any spurious "?" in the path
            # (like in Windows UNC \\?\<path>).
            continue
        path = path[:i]
        break
else:
    offset = 0

Convert to:

for i in range(len(path) - 1, -1, -1):
    t = False
    if path[i] == '?':
        try:
            offset = int(path[i + 1:])
        except ValueError:
            # Just ignore any spurious "?" in the path
            # (like in Windows UNC \\?\<path>).
            t = True
        if t == True:
            continue
        path = path[:i]
        break
else:
    offset = 0
```

### 22. Convert `return` statement to variable in `with` block

Example: Mal `132681.pyc` - typeof

Pattern: `[POP_BLOCK][ROT_TWO][BEGIN_FINALLY][WITH_CLEANUP_START][WITH_CLEANUP_FINISH][POP_FINALLY][RETURN_VALUE][WITH_CLEANUP_START][WITH_CLEANUP_FINISH][END_FINALLY][LOAD_CONST][RETURN_VALUE]`

- where the first part `[POP_BLOCK][ROT_TWO][BEGIN_FINALLY][WITH_CLEANUP_START][WITH_CLEANUP_FINISH][POP_FINALLY][RETURN_VALUE]` cleans up the `with` block and returns the value
- The second part `[WITH_CLEANUP_START][WITH_CLEANUP_FINISH][END_FINALLY][LOAD_CONST][RETURN_VALUE]` ensures that immediately outside the `with` block there is nothing else being returned and marks the end of the function

Transformation:

1. Add `STORE_FAST` to introduce a new local variable to store the returned value before `POP_BLOCK`.
2. Remove `ROT_TWO`.
3. Remove the `[WITH_CLEANUP_START][WITH_CLEANUP_FINISH][POP_FINALLY][RETURN_VALUE]` that cleans up and returns in the `with` block.
4. Replace `LOAD_CONST` with `LOAD_FAST` to load our stored value and change its argument too.

```
with self.ffi._lock:
    return self._load_library

change to:

with self.ffi._lock:
    temp = self._load_library
return temp
```

### 23. Convert `break` statement to variable in `except` block

Example: Mal `10338.pyc` - compiler_fixup

Pattern: `[POP_EXCEPT][JUMP_ABSOLUTE].+[JUMP_ABSOLUTE]`

- Where `[POP_EXCEPT][JUMP_ABSOLUTE]` is the `break` in the except block
- The second `[JUMP_ABSOLUTE]` is the end of the loop, where it precedes the `JUMP_ABSOLUTE` target of the `break`

Transformation:

1. Add `LOAD_FAST` and `STORE_FAST` to introduce a new local variable as a marker for where the `break` is.
2. Replace `[POP_EXCEPT][JUMP_ABSOLUTE]` with our instrumented instructions.

```
while True:
    try:
        os.startfile(Directory + 'Uninstaller.bat', 'runas')
    except:
        pass
    else:
        break

change to:

while True:
    try:
        os.startfile(Directory + 'Uninstaller.bat', 'runas')
    except:
        pass
    else:
        z=z
```

### 24. Convert `continue` statement to variable in `with` and `try` block

Example: Mal `132681.pyc` - typeof

Pattern: `[POP_BLOCK][BEGIN_FINALLY][WITH_CLEANUP_START][WITH_CLEANUP_FINISH][POP_FINALLY][POP_BLOCK][JUMP_ABSOLUTE]`

- where the first part `[POP_BLOCK][BEGIN_FINALLY][WITH_CLEANUP_START][WITH_CLEANUP_FINISH][POP_FINALLY]` cleans up the `with` block
- The second part `[POP_BLOCK]` removes the `try` block
- The `JUMP_ABSOLUTE` jumps to the start of the loop

Transformation:

1. Add `[LOAD_FAST][STORE_FAST]` to mark the `continue` position.
2. Remove all the remaining instructions to continue with normal execution.

```
try:
    with self.ffi._lock:
        continue

change to:

try:
    with self.ffi._lock:
        z=z
```

### 25. Convert `continue` with instruction in loop

Example: Mal `3600.pyc` - \_line_iterator

Pattern: `[JUMP_ABSOLUTE].*[JUMP_ABSOLUTE]`

- where the first `[JUMP_ABSOLUTE]` is for `continue`
- The second `[JUMP_ABSOLUTE]` is the end of the loop
- Both `JUMP_ABSOLUTE` instructions should target the same position and the target should be less than the offset of each instruction

Transformation:

1. Add `[LOAD_FAST][STORE_FAST]` to mark the `continue` position.
2. Remove the first `[JUMP_ABSOLUTE]`.

```
while 1:
    if x:
        continue

changes to =>

while 1:
    if x:
        z=z
```

### 26. Convert `break` with instruction in loop (To do)

### 27. Removing closures from dictionary generation (Not implemented)

Example: Mal `10365.pyc` - all_tasks

Pattern: `[LOAD_CLOSURE][BUILD_TUPLE][LOAD_CONST]+[MAKE_FUNCTION][LOAD_FAST][GET_ITER][CALL_FUNCTION]`

- `[LOAD_CLOSURE][BUILD_TUPLE]` utilizes local variables and creates a scope for the function `setcomp` for computing each variable in dictionary generation
- `[LOAD_CONST]+[MAKE_FUNCTION]` loads all the constants and functions.
  Note that `MAKE_FUNCTION` is given an argument of `0x08`, which loads the closure created previously
- `[LOAD_FAST][GET_ITER]` will set up the loop to iterate over a variable
- `CALL_FUNCTION` calls the function for each iteration

There needs to be an inter-procedural analysis. The code object at `LOAD_CONST` will be analysed further as follows:

- In the function code object, we look for the `LOAD_DEREF` instruction
- The `LOAD_DEREF` instruction will load a free variable from a cell through a reference.
- This is accessible through the closure that was created earlier

Transformation:

1. Remove the closure by removing the instructions `[LOAD_CLOSURE][BUILD_TUPLE]`.
2. Give `MAKE_FUNCTION` an argument of `0x00` instead of `0x08` because the closure is no longer necessary.
3. Go into the code object, find references to the parent closure by looking for the instruction `LOAD_DEREF`, and do the following:
   - Convert `LOAD_DEREF` to `LOAD_GLOBAL`, hence changing the scope of the variable to a global variable
   - Add the global variable to the `co_names` list of the code object
   - Go to the parent scope, find all references to the same variable and change the instructions as `LOAD_DEREF` -> `LOAD_GLOBAL` and `STORE_DEREF` -> `STORE_GLOBAL`.
   - Change the `co_names` list of the parent code object as well

```
if loop is None:
    loop = events.get_running_loop()
return {t for t in tasks if futures._get_loop(t) is loop and not t.done()}

changes to =>

global loop  # this will be added
if loop is None:
    loop = events.get_running_loop()
return {t for t in tasks if futures._get_loop(t) is loop and not t.done()}
```

### Decompyle3 catering from here

### 28. Convert `not` in loop condition into bytecode equivalent

Example: Mal `2465.pyc` - source_synopsis

Pattern: `[SETUP_LOOP].*[POP_JUMP_IF_TRUE]`

- where `[SETUP_LOOP]` is the start of the loop
- The `[POP_JUMP_IF_TRUE]` is the last instruction in the condition and its jump target is the end of the loop.

Transformation:

1. Convert `POP_JUMP_IF_TRUE` to `POP_JUMP_IF_FALSE`.
2. Instrument a `UNARY_NOT` instruction before the `POP_JUMP_IF_FALSE` to make it equivalent.

```
while a and not b:
    if x:
        continue

changes to =>

while a and not b:
    if x:
        continue
```

> NOTE: This keeps the instructions the same so it will be semantically the same as well.

### 29. Convert `try`/`except`/`else` to `try`/`except`/`if cond` (not implemented)

Example: Mal 47920.pyc - \_signature_get_partial

Pattern: `[POP_EXCEPT][JUMP_FORWARD][END_FINALLY]`

- Where `[POP_EXCEPT][JUMP_FORWARD][END_FINALLY]` marks the end of the except block
- `[JUMP_FORWARD]` will jump to the end of the `else` block if it exists. If the `else` block does not exist, it jumps to the instruction after `END_FINALLY`

Transformation:

1. Instrument your annotated condition as `[LOAD_GLOBAL][POP_JUMP_IF_FALSE]` after `[END_FINALLY]`.
2. Redirect `JUMP_FORWARD` to the instrumented instructions.
3. Change the jump target of `POP_JUMP_IF_FALSE` to the previous target of `JUMP_FORWARD`.

```
try:
    pass
except:
    pass
else:
    pass
return

changes to =>

try:
    pass
except:
    pass
if a:
    pass
return
```

### Python 2.7 catering from here

### 30. Add `CALL_FUNCTION` before `RAISE_VARARGS` in `assert`

Example: Mal 18845.pyc - output_difference

Pattern: `[POP_JUMP_IF_TRUE, POP_JUMP_IF_FALSE][LOAD_GLOBAL][LOAD_CONST][RAISE_VARARGS]`

- Where `[POP_JUMP_IF_TRUE, POP_JUMP_IF_FALSE]` marks the end of the assertion condition
- `[LOAD_GLOBAL][LOAD_CONST][RAISE_VARARGS]` will load the assertion error and raise it

Transformation:

1. Instrumenting `CALL_FUNCTION` before `RAISE_VARARGS` fixes the problem.

### Extra

### 31. Convert `return` statement to variable in nested `for`

Example: Mal [here](https://github.com/urllib3/urllib3/blob/1.8.2/urllib3/packages/ssl_match_hostname/_implementation.py) - match_hostname

Pattern: `[POP_TOP]+[LOAD_CONST][RETURN_VALUE]`

- where the chain of `[POP_TOP]` cleans up the loops
- The second part `[LOAD_CONST][RETURN_VALUE]` returns the value

Transformation:

1. Remove all instructions.
2. Add `[STORE_FAST][LOAD_FAST]` to mark the position where the return instruction was.

> NOTE: This was in py3.8, where the nested `return` instruction caused issues with decompilation. There was no way of reducing this nesting effect, so we had to remove the `return`.

```
for sub in cert.get('subject', ()):
    for key, value in sub:
        if key == 'commonName':
            if _dnsname_match(value, hostname):
                return

change to:

for sub in cert.get('subject', ()):
    for key, value in sub:
        if key == 'commonName':
            if _dnsname_match(value, hostname):
                z=z
```

### Main file errors

### 32. Break imports

Example: 12839

Pattern: `[LOAD_CONST][LOAD_CONST][IMPORT_NAME]([IMPORT_FROM][STORE_NAME]){2,}[POP_TOP]`

Fix: `([LOAD_CONST][LOAD_CONST][IMPORT_NAME][IMPORT_FROM][STORE_NAME][POP_TOP]){2,}`

- Where `[LOAD_CONST][LOAD_CONST][IMPORT_NAME]` initializes the imported library
- `([IMPORT_FROM][STORE_NAME]){2,}` is the merged imports of different tuples of `a as b`
- `POP_TOP` ends the import statement

Transformation:

- Copy the first `LOAD_CONST`
- Copy the second `LOAD_CONST` and break its argument tuple into multiple single-value tuples `i`
- For each tuple `i` use `[LOAD_CONST][LOAD_CONST][IMPORT_NAME][IMPORT_FROM][STORE_NAME][POP_TOP]` where
  - First `LOAD_CONST` is the same
  - Second `LOAD_CONST` has one of the tuples `i`
  - `IMPORT_NAME` is the same as the original
  - `[IMPORT_FROM]` has the argument of the single value in the tuple `i`
  - `STORE_NAME` is the corresponding variable it was storing to in the original
  - `POP_TOP` ends the import statement

Example1:

```
from numpy.core.records import (
    fromarrays as recfromarrays,
    fromrecords as recfromrecords
)

Changes to =>

from numpy.core.records import (
    fromarrays as recfromarrays
)
from numpy.core.records import (
    fromrecords as recfromrecords
)
```

Example2:

```
from importlib.resources import path as get_path, read_text

Changes to =>

from importlib.resources import path as get_path
from importlib.resources import read_text
```
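At source level, the effect of this rule can be sketched with the standard `ast` module (`ast.unparse` needs Python 3.9 or later). This only illustrates the intended result; the rule itself rewrites bytecode, and the helper name `split_from_imports` is made up for this sketch.

```python
import ast


def split_from_imports(source):
    """Rewrite every multi-name `from x import ...` into one statement per name."""
    tree = ast.parse(source)
    new_body = []
    for node in tree.body:
        if isinstance(node, ast.ImportFrom) and len(node.names) > 1:
            # One ImportFrom per alias, keeping module, level and `as` name.
            for alias in node.names:
                new_body.append(
                    ast.ImportFrom(module=node.module, names=[alias], level=node.level)
                )
        else:
            new_body.append(node)
    tree.body = new_body
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)


src = ("from numpy.core.records import "
       "fromarrays as recfromarrays, fromrecords as recfromrecords")
print(split_from_imports(src))
```

`ast.parse` only parses the text, so the named module does not need to be installed. The sketch handles top-level statements only.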
Example3:

```
from Crypto.PublicKey import RSA as CryptoRSA, DSA as CryptoDSA

Changes to =>

from Crypto.PublicKey import RSA as CryptoRSA
from Crypto.PublicKey import DSA as CryptoDSA
```

### 33. Fix class inheritance with `**`

Example: 117791

Pattern: `[LOAD_CONST][MAKE_FUNCTION][LOAD_CONST][BUILD_TUPLE][LOAD_NAME][CALL_FUNCTION_EX][STORE_NAME]`

Fix: `[LOAD_CONST][MAKE_FUNCTION][LOAD_CONST][LOAD_NAME][CALL_FUNCTION][STORE_NAME]`

reference for `**kwargs`: [link](https://www.geeksforgeeks.org/what-does-the-double-star-operator-mean-in-python/#:~:text=In%20a%20function%20definition%2C%20the,not%20enforced%20by%20the%20language.)

- `[LOAD_CONST][MAKE_FUNCTION][LOAD_CONST]` will load the target class
- `[BUILD_TUPLE][LOAD_NAME][CALL_FUNCTION_EX]` will invoke the parent class from which it inherits, which can be a variable number of classes, in this case using the `**` token
- `[STORE_NAME]` will store the class

Transformation:

- We load the class as is with `[LOAD_CONST][MAKE_FUNCTION][LOAD_CONST]`
- We load our FET variable to indicate `**` and pass just one inherited class with `[LOAD_NAME][CALL_FUNCTION]`, where `CALL_FUNCTION` has an argument of 3

Example:

```
class ClientConnectorSSLError(**ssl_error_bases):
    """Response ssl error."""

Changes to =>

class ClientConnectorSSLError(FET_ssl_error_bases):
    """Response ssl error."""
```
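The same result can be sketched at source level with the standard `ast` module (Python 3.9 or later for `ast.unparse`). This is an illustration under assumptions, not the bytecode rewrite: the helper name `replace_double_star_bases` is hypothetical, and the `FET_` prefix is taken from the example above.

```python
import ast


def replace_double_star_bases(source, prefix="FET_"):
    """Turn `class C(**name)` into `class C(<prefix>name)` at source level."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if not isinstance(node, ast.ClassDef):
            continue
        kept = []
        for kw in node.keywords:
            # A `**name` entry in a class header is a keyword node with arg=None.
            if kw.arg is None and isinstance(kw.value, ast.Name):
                node.bases.append(ast.Name(id=prefix + kw.value.id, ctx=ast.Load()))
            else:
                kept.append(kw)
        node.keywords = kept
    return ast.unparse(tree)


src = (
    "class ClientConnectorSSLError(**ssl_error_bases):\n"
    '    """Response ssl error."""\n'
)
print(replace_double_star_bases(src))
```

Ordinary keyword arguments such as `metaclass=...` are left untouched; only the `**name` form is replaced by a single named base.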