---
tags: decompiler
title: How does PyFET bypass decompiler error
---

# Summary

PyFET transforms the binary so that the decompiler uses *different, more robust parsing rules*. In Uncompyle6, the rule for `and` (`and ::= expr JUMP_IF_FALSE_OR_POP expr COME_FROM`) expects only one `COME_FROM` but gets a second one from the `or` expression. We transform by moving the `or` expression to a separate statement (`FET_cond = c1 or c2`). In doing so, the rule for `and` is satisfied because only one `COME_FROM` remains.

<!--
In Uncompyle6, alternate rules for `or` are used (`or ::= expr_jitop expr COME_FROM` and `expr_jitop ::= expr JUMP_IF_TRUE_OR_POP` because `JUMP_IF_TRUE_OR_POP` is used instead of `POP_JUMP_IF_TRUE` when expression results are assigned to `FET_cond`).
-->

PyFET's transformation causes different parsing code to be executed. In Unpyc37, our transformation makes the decompiler *execute a different code segment*. `POP_JUMP_IF_FALSE` becomes `JUMP_IF_FALSE_OR_POP` when the boolean expression is assigned to a variable, so a different handler runs (`JUMP_IF_FALSE_OR_POP` on line 2421 instead of `POP_JUMP_IF` on line 2508).

PyFET's transformations also let the decompiler execute instructions that incorrect parsing logic would otherwise skip. For Decompyle++, the decompiler now skips the `LOAD_NAME` of the instrumented instruction, which lets it reach the previously skipped `SETUP_EXCEPT` parsing logic (line 1980 in `ASTree.cpp`).

For Decompyle++, PyFET's transformation *bypasses erroneous instances* in the parsing rules. Adding `FET_null` after `RETURN_VALUE` prevents essential instructions (here, the `SETUP_EXCEPT` that initializes the `try` block) from being skipped by the `bc_next()` call on line 1894.

```
or ::= expr_jitop expr COME_FROM
expr_jitop ::= expr JUMP_IF_TRUE_OR_POP
```

# Overview:

This document gives an overview of how PyFET uses its transformations to bypass decompiler errors.
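The hoisting idea from the summary can be sketched at the source level. This is only an illustration under stated assumptions: PyFET itself works on the binary, and the function name `hoist_leading_or` is hypothetical. The sketch handles only `return (a or b) and ...`, where moving the leading `or` into its own statement cannot change evaluation order.

```python
import ast  # ast.unparse requires Python 3.9 or newer


def hoist_leading_or(source):
    """Rewrite `return (a or b) and rest` as `FET_cond = a or b; return FET_cond and rest`."""
    tree = ast.parse(source)
    for func in ast.walk(tree):
        if not isinstance(func, ast.FunctionDef):
            continue
        new_body = []
        for stmt in func.body:
            value = getattr(stmt, "value", None)
            if (isinstance(stmt, ast.Return)
                    and isinstance(value, ast.BoolOp)
                    and isinstance(value.op, ast.And)
                    and isinstance(value.values[0], ast.BoolOp)
                    and isinstance(value.values[0].op, ast.Or)):
                # Move the leading `or` into its own assignment statement.
                new_body.append(ast.Assign(
                    targets=[ast.Name(id="FET_cond", ctx=ast.Store())],
                    value=value.values[0]))
                # The `and` now starts from a plain variable load.
                value.values[0] = ast.Name(id="FET_cond", ctx=ast.Load())
            new_body.append(stmt)
        func.body = new_body
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)


example = "def is_at_turn(x, y, z):\n    return (x or z) and not y\n"
print(hoist_leading_or(example))
```

Only the first operand of the `and` is hoisted, because it is always evaluated; hoisting a later operand would defeat short-circuiting and could change behavior.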
## Uncompyle6

Error code:
```python
def is_at_turn():
    return (x or z) and not y
```

PyFET fix:
```python
def is_at_turn():
    temp = (x or z)
    return temp and not y
```

### Rules used for the fix

The fix produces two statements, each parsed with a different set of rules than the original.

#### Statement 1.
```python
temp = (x or z)
```

Corresponding rules:
```
stmts ::= sstmt+
sstmt ::= assign
assign ::= expr store
store ::= STORE_FAST
expr ::= or
or ::= expr_jitop expr COME_FROM
expr_jitop ::= expr JUMP_IF_TRUE_OR_POP
expr ::= LOAD_GLOBAL
```

Tree:
```
0. sstmt
    assign (2)
        0. expr
            or (3)
                0. expr_jitop (2)
                    0. expr
                        L. 2   0 LOAD_GLOBAL x
                    1.         2 JUMP_IF_TRUE_OR_POP 6 'to 6'
                1. expr
                               4 LOAD_GLOBAL z
                2.           6_0 COME_FROM 2 '2'
        1. store
                               6 STORE_FAST 'temp'
```

#### Statement 2.
```python
return temp and not y
```

Corresponding rules:
```
stmts ::= sstmt+
sstmt ::= sstmt RETURN_LAST
sstmt ::= return
return ::= ret_expr RETURN_VALUE
ret_expr ::= expr
expr ::= and
and ::= expr JUMP_IF_FALSE_OR_POP expr COME_FROM
expr ::= LOAD_FAST
expr ::= unary_not
unary_not ::= expr UNARY_NOT
expr ::= LOAD_GLOBAL
```

Tree:
```
1. sstmt (2)
    0. sstmt
        return (2)
            0. ret_expr
                expr
                    and (4)
                        0. expr
                            L. 3   8 LOAD_FAST 'temp'
                        1.        10 JUMP_IF_FALSE_OR_POP 16 'to 16'
                        2. expr
                            unary_not (2)
                                0. expr
                                  12 LOAD_GLOBAL y
                                1. 14 UNARY_NOT
                        3.      16_0 COME_FROM 10 '10'
            1.                    16 RETURN_VALUE
    1.                            -1 RETURN_LAST
```

### Rules for the fix and their difference from the original

Error: Uncompyle6 fails to reduce the following into an `expr` for the `and` expression.
```
x or z
    LOAD_GLOBAL (x)
  2 POP_JUMP_IF_TRUE (to 8)
  4 LOAD_GLOBAL
```

The fix uses the following rules instead. They parse the same `and` expression, except that the first `expr` now comes from the previous statement (a `LOAD_FAST` of the temporary):
```
expr ::= LOAD_FAST
and ::= expr JUMP_IF_FALSE_OR_POP expr COME_FROM
```

## Unpyc37

Fix:
```python
def __new__(cls, year, month=None, day=None):
    c = a and c1 <= d <= c3
    if c:
        pass
```

Disassembly:
```
2:       0 LOAD_GLOBAL           (a)
         2 JUMP_IF_FALSE_OR_POP  (to 26)
         4 LOAD_GLOBAL           (c1)
         6 LOAD_GLOBAL           (d)
         8 DUP_TOP
        10 ROT_THREE
        12 COMPARE_OP            (<=)
        14 JUMP_IF_FALSE_OR_POP  (to 22)
        16 LOAD_GLOBAL           (c3)
        18 COMPARE_OP            (<=)
        20 JUMP_FORWARD          (to 26)
    >>  22 ROT_TWO
        24 POP_TOP
    >>  26 STORE_FAST            (c)
3:      28 LOAD_FAST             (c)
        30 POP_JUMP_IF_FALSE     (to 32)
4:  >>  32 LOAD_CONST            (None)
```

diff:
![](https://i.imgur.com/rkaWDjq.png)

### Rules executed instead (Unpyc37):

The handler for `JUMP_IF_FALSE_OR_POP` is used instead of the one for `POP_JUMP_IF_FALSE` for the chained `<=` operators.

**Code with error** executed [before](https://github.com/andrew-tavera/unpyc37/blob/d7dc609e8c63086dc58fc749835f7aed2482543f/unpyc3.py#L2508) in function `def POP_JUMP_IF(`:
```python=2508
if addr[-3] and \
        addr[-1].opcode == COMPARE_OP and \
        addr[-2].opcode == ROT_THREE and \
        addr[-3].opcode == DUP_TOP:
    if self.popjump_stack:
        c = self.pop_popjump()
        c = c.chain(cond)
        self.push_popjump(not truthiness, jump_addr, c, addr)
    else:
        self.push_popjump(not truthiness, jump_addr, cond, addr)
    return

is_chained = isinstance(cond, PyCompare) and addr.seek_back(ROT_THREE, addr.seek_back(stmt_opcodes))
if is_chained and self.popjump_stack:
    pj = self.pop_popjump()
    if isinstance(pj, PyCompare):
        cond = pj.chain(cond)
```

The issue with the code above is that it pops twice; the second pop fails because the stack holds only one element.
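Which of the two handlers runs depends on the jump opcode the compiler emits for the expression. A minimal sketch for checking this locally with the standard `dis` module follows; the helper name `jump_opnames` is hypothetical. Opcode names depend on the CPython version: the disassembly in this document has `JUMP_IF_FALSE_OR_POP` for the assigned form, but that opcode was removed in CPython 3.12, so newer interpreters will print different names.

```python
import dis


def jump_opnames(source):
    """Return the names of the jump opcodes CPython emits for `source`."""
    code = compile(source, "<sketch>", "exec")
    return [ins.opname for ins in dis.get_instructions(code)
            if "JUMP" in ins.opname]


# The same boolean expression, once as a branch condition and once assigned.
as_condition = "if a and b:\n    c = 1\n"
as_assignment = "c = a and b\n"

print("condition: ", jump_opnames(as_condition))
print("assignment:", jump_opnames(as_assignment))
```

Running this under the interpreter version that produced the target binary shows which jump opcodes, and therefore which decompiler handlers, each form exercises.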
**Code without error** executed instead [here](https://github.com/andrew-tavera/unpyc37/blob/d7dc609e8c63086dc58fc749835f7aed2482543f/unpyc3.py#L2421) in function `def JUMP_IF_FALSE_OR_POP(`:
```python=2421
def JUMP_IF_FALSE_OR_POP(self, addr: Address, target):
    end_addr = addr.jump()
    truthiness = not addr.seek_back_statement(POP_JUMP_IF_TRUE)
    self.push_popjump(truthiness, end_addr, self.stack.pop(), addr)
    left = self.pop_popjump()
    if end_addr.opcode == ROT_TWO:
        opc, arg = end_addr[-1]
        if opc == JUMP_FORWARD and arg == 2:
            end_addr = end_addr[2]
        elif opc == RETURN_VALUE or opc == JUMP_FORWARD:
            end_addr = end_addr[-1]
        d = SuiteDecompiler(addr[1], end_addr, self.stack)
        d.run()
        right = self.stack.pop()
        if isinstance(right, PyCompare) and right.extends(left):
            py_and = left.chain(right)
        else:
            py_and = PyBooleanAnd(left, right)
        self.stack.push(py_and)
        return end_addr[3]
    d = SuiteDecompiler(addr[1], end_addr, self.stack)
    d.run()
    # if end_addr.opcode == RETURN_VALUE:
    #     return end_addr[2]
    right = self.stack.pop()
    if isinstance(right, PyCompare) and right.extends(left):
        py_and = left.chain(right)
    else:
        py_and = PyBooleanAnd(left, right)
    self.stack.push(py_and)
    return end_addr
```

## Decompyle++

Fix: insert a dummy call:
```
FET_null()
```

Corresponding instructions:
```
LOAD_NAME
CALL_FUNCTION
```

Final code:
```python
if cmd == '':
    return self.default(line)
FET_null()
try:
    func = getattr(self, 'do_' + cmd)
except AttributeError:
    return self.default(line)
```

Decompile output:
```python
if cmd == '':
    return self.default(line)
None()
try:
    func = getattr(self, 'do_' + cmd)
except AttributeError:
    return self.default(line)
```

### Rules executed instead (pycdc):

The same code is executed, in the `case Pyc::RETURN_VALUE:` branch of the switch [here](https://github.com/zrax/pycdc/blob/1b59ea5cd8d875a7696c29bc593b5f6ea55fe6a8/ASTree.cpp#L1876) (function `BuildFromCode`):
```cpp=1876
case Pyc::RETURN_VALUE:
    {
        PycRef<ASTNode> value = stack.top();
        stack.pop();
        curblock->append(new ASTReturn(value));

        if ((curblock->blktype() == ASTBlock::BLK_IF
                || curblock->blktype() == ASTBlock::BLK_ELSE)
                && stack_hist.size()
                && (mod->verCompare(2, 6) >= 0)) {
            stack = stack_hist.top();
            stack_hist.pop();

            PycRef<ASTBlock> prev = curblock;
            blocks.pop();
            curblock = blocks.top();
            curblock->append(prev.cast<ASTNode>());

            bc_next(source, mod, opcode, operand, pos);
        }
    }
    break;
```

Now consider the following instruction sequence:
```
RETURN_VALUE <== previous
LOAD_NAME
CALL_FUNCTION
SETUP_EXCEPT
```

The extra `bc_next()` call consumes `LOAD_NAME` instead of `SETUP_EXCEPT`, so `SETUP_EXCEPT` is still parsed and no error is thrown.
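The effect of the extra `bc_next()` call can be illustrated with a toy dispatcher. This is a simplified model, not pycdc's logic: the function name `handled_opcodes` is hypothetical, and the sketch skips one opcode after every `RETURN_VALUE`, whereas the real handler does so only inside `if`/`else` blocks.

```python
def handled_opcodes(opcodes):
    """Return the opcodes a toy dispatcher actually handles."""
    handled = []
    i = 0
    while i < len(opcodes):
        op = opcodes[i]
        handled.append(op)
        i += 1
        if op == "RETURN_VALUE":
            # Toy stand-in for the extra bc_next(): the next opcode is
            # consumed without ever reaching its handler.
            i += 1
    return handled


original = ["RETURN_VALUE", "SETUP_EXCEPT"]
patched = ["RETURN_VALUE", "LOAD_NAME", "CALL_FUNCTION", "SETUP_EXCEPT"]

print(handled_opcodes(original))
print(handled_opcodes(patched))
```

In the patched sequence the dummy `LOAD_NAME` absorbs the skip, so `SETUP_EXCEPT` reaches its handler. The orphaned `CALL_FUNCTION` is consistent with the `None()` that appears in the decompiled output.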