---
tags: decompiler
title: fixing error u6
---

# Setup time. 1 hr.

# Finding MWE. 45min.

44 + 7 + 1,237 + 1,509 + 2,695 + 499 + 580 = 6,571

# Inspecting uncompyle6

## Find entry point

First look at the directory listing and find the entry point of the main function. Of the 113 files, the likely entry points are `main.py` and `__init__.py`. The actual entry point turns out to be `uncompyle.py` (266 SLOC) in the `bin` directory.

## Find decompilation function

### `main.py` (499 SLOC)

Trace the `pyc_paths` variable to find where decompilation is invoked. It leads to the function `main`, which jumps to the file `main.py` (499 SLOC). Then find the function that leads to `decompile_file`, which decompiles the file after processing. This in turn leads to the `decompile` function, which also runs once per file.

-> This function also prints the header of the file.

After 5 iterations of tracing the code, the deparsing functions found were `deparse_code_with_map`, `code_deparse_fragments`, and `code_deparse`. Of these, `code_deparse` was the one invoked to decompile a file.

### `pysource.py` (2695 SLOC)

`code_deparse` (108 SLOC) takes a code object and a `SourceWalker`. We need to understand the structure of both the code object variable (`co`) and the `SourceWalker` (`walker`).

The decompiler first gets the variable `scanner` from `get_scanner`.

```
# store final output stream for case of error
scanner = get_scanner(version, is_pypy=is_pypy)
```

The comment misled me into thinking there might be a double meaning for `scanner`. The next step was inspecting the `get_scanner` function (570 SLOC), which lives in a new file. The object we received was `uncompyle6.scanners.scanner37.Scanner37`. Understanding how the scanner class works essentially requires inspecting and traversing each module in its 3-class hierarchy.

The `scanner` object uses the `ingest` function, which also needs to be inspected (it is inherited from parent classes and has 200 SLOC).
`ingest` performs custom tokenization that essentially returns renamed instructions. Note that this only gives first-level tokens. The `ingest` function also stores the instructions in `self.code`, which makes it stateful (I had to traverse the classes to find this).

```
The transformations are made to assist the deparsing grammar.
Specifically:
   - various types of LOAD_CONST's are categorized in terms of what they load
   - COME_FROM instructions are added to assist parsing control structures
   - MAKE_FUNCTION and FUNCTION_CALLS append the number of positional arguments
   - some EXTENDED_ARGS instructions are removed
```

```
linestarts = dict(scanner.opc.findlinestarts(co))
```

#### `scanner.py` (580 SLOC)

This imports a scanner object from the `scanners` directory. The directory has 34 files (3,786 SLOC), each of them used to build a scanner for a specific Python version.

## Decompilation.

The grammar is parsed through the `SourceWalker` class. It takes the ingested bytecode/instructions from the `Scanner` class and parses them using the parser returned by `get_python_parser` in `parser.py` (893 SLOC).

## Tracing rules.

Python 3.7. We have to look into the hierarchy of files, i.e., the files used for parsing. The rules are not straightforward to locate: some live in functions such as `custom_classfunc_rule` and are initialized based on the instructions parsed in the `scanner`.
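Before tracing grammar rules, it can help to look at the raw material the scanner starts from. The sketch below is only a stand-in under stated assumptions: it uses the standard library `dis` module instead of uncompyle6's own `scanner.opc`, and it compiles the MWE with whatever interpreter runs it, so the opcode names match the Python 3.7 listings in these notes only when run under 3.7. The names `SRC`, `namespace`, and `instructions` are illustrative and not taken from uncompyle6.

```python
import dis

# Assumption: the same MWE as in these notes, compiled by the *running* interpreter.
SRC = "def is_at_turn():\n    return (x or z) and not y\n"

namespace = {}
exec(compile(SRC, "<mwe>", "exec"), namespace)
co = namespace["is_at_turn"].__code__  # the code object (`co`)

# Rough stdlib counterpart of `dict(scanner.opc.findlinestarts(co))`:
# maps a bytecode offset to the source line that starts at that offset.
linestarts = dict(dis.findlinestarts(co))

# Raw instructions, before the scanner renames tokens or adds COME_FROM.
instructions = [
    (ins.offset, ins.opname, ins.argrepr) for ins in dis.get_instructions(co)
]

if __name__ == "__main__":
    print("names:", co.co_names)
    print("linestarts:", linestarts)
    for offset, opname, argrepr in instructions:
        print(f"{offset:>4}  {opname:<24} {argrepr}")
```

Comparing this raw listing with the scanner's token stream shows which tokens (for example `COME_FROM`) were added by `ingest` rather than emitted by the compiler.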
#### `SourceWalker` (2443 SLOC)

# Python 3.7, uncompyle6

MWE:

```
def is_at_turn():
    return (x or z) and not y
```

```
   0  LOAD_GLOBAL              x
   2  POP_JUMP_IF_TRUE      8  'to 8'
   4  LOAD_GLOBAL              z
   6  JUMP_IF_FALSE_OR_POP 12  'to 12'
 8_0  COME_FROM             2  '2'
   8  LOAD_GLOBAL              y
  10  UNARY_NOT
12_0  COME_FROM             6  '6'
  12  RETURN_VALUE
  -1  RETURN_LAST
```

Grammar pointed out:

```
expr ::= LOAD_GLOBAL
expr ::= unary_not
unary_not ::= expr UNARY_NOT
ret_expr_or_cond ::= ret_expr
ret_and ::= expr JUMP_IF_FALSE_OR_POP ret_expr_or_cond COME_FROM
ret_or ::= expr JUMP_IF_TRUE_OR_POP ret_expr_or_cond COME_FROM
ret_expr ::= expr
ret_expr ::= ret_and
ret_expr ::= ret_or
return ::= ret_expr RETURN_VALUE
sstmt ::= return RETURN_LAST
```

```
sstmt:
  (0) return
    (0) return expr
      (0) expr
        (0) and
    (1) RETURN_VALUE
  (1) RETURN_LAST
```

## Rule updates tried:

Rule #1: make `ret_expr_or_cond` optional, since it is not used in the above pattern. Failed.

I wanted to target the `and`, and thought the parser might match `ret_and` if `ret_expr_or_cond` were made optional.

```
ret_and ::= expr JUMP_IF_FALSE_OR_POP ret_expr_or_cond? COME_FROM
```

Crafted from the Python 3.6 migration.

```
sstmt ::= sstmt RETURN_LAST ❤️
return ::= ret_expr RETURN_VALUE ❤️
ret_expr ::= expr ❤️
expr ::= and ❤️
and ::= expr JUMP_IF_FALSE_OR_POP expr COME_FROM ❤️
expr ::= or ❤️
expr ::= unary_not ❤️
unary_not ::= expr UNARY_NOT ❤️
or ::= expr_jt expr COME_FROM ❤️
expr_jt ::= expr jmp_true ❤️
jmp_true ::= POP_JUMP_IF_TRUE ❤️
```

Rule #2: Applying the following rule to `p_stmt` fixes the parse error.

```
ret_expr ::= expr POP_JUMP_IF_TRUE expr JUMP_IF_FALSE_OR_POP COME_FROM unary_not COME_FROM
```

However, the output I get is:

```
def is_at_turn():
    return x
```

So one general rule is not enough: each token must be placed in its respective block.
Rule #3: Applied the following rule to `p_stmt`:

```
ret_expr ::= ret_and_a
```

And the following rule to the `p_jump3` function:

```
ret_and_a ::= expr POP_JUMP_IF_TRUE expr JUMP_IF_FALSE_OR_POP COME_FROM expr COME_FROM
```

This gives the following output:

```
def is_at_turn():
    return xz(not y)
```

The `or` and `and` operators are missing and need to be added, so the rule in `p_jump3` has to be broken down.

Rule #4: I broke the rule in `p_jump3` down as follows.

```
or ::= expr POP_JUMP_IF_TRUE expr JUMP_IF_FALSE_OR_POP COME_FROM
ret_and_a ::= or expr COME_FROM
```

The following is the result:

```
def is_at_turn():
    return x or (not y) # Identifies `or` but no `and`
```

Rule #5: The following pattern

```
or ::= expr POP_JUMP_IF_TRUE expr JUMP_IF_FALSE_OR_POP COME_FROM
and ::= or expr COME_FROM
ret_and_a ::= and
```

gives the following:

```
def is_at_turn():
    return x or and
```

Rule #6: The following breakdown

```
or ::= expr POP_JUMP_IF_TRUE expr
ret_and_a ::= or JUMP_IF_FALSE_OR_POP COME_FROM expr COME_FROM
```

gives the following:

```
def is_at_turn():
    return x or z(not y)
```

Rule #7: Finally, the following fixes the error.
```
def p_expr(self, args):
    """
    ret_expr ::= ret_and_a
    """

def p_jump3(self, args):
    """
    or ::= expr POP_JUMP_IF_TRUE expr
    and ::= or JUMP_IF_FALSE_OR_POP COME_FROM expr
    ret_and_a ::= and COME_FROM
    """
```

Giving the output:

```
def is_at_turn():
    return (x or z) and (not y)
```

Time: 10 hrs (on top of the earlier codebase analysis)

Implicit errors:

```
def is_at_turn():
    return (a or x or z) and not y
```

decompiles to the following, silently dropping `a`:

```
def is_at_turn():
    return (x or z) and (not y)
```

## Grammar rule used

```
def is_at_turn():
    temp = (x or z)
    return temp and not y
```

```
temp = (x or z)
```

```
stmts ::= sstmt+
sstmt ::= assign
assign ::= expr store
store ::= STORE_FAST
expr ::= or
or ::= expr_jitop expr COME_FROM
expr_jitop ::= expr JUMP_IF_TRUE_OR_POP
expr ::= LOAD_GLOBAL
```

```
return temp and not y
```

```
stmts ::= sstmt+
sstmt ::= sstmt RETURN_LAST
sstmt ::= return
return ::= ret_expr RETURN_VALUE
ret_expr ::= expr
expr ::= and
and ::= expr JUMP_IF_FALSE_OR_POP expr COME_FROM
expr ::= LOAD_FAST
expr ::= unary_not
unary_not ::= expr UNARY_NOT
expr ::= LOAD_GLOBAL
```

Tree:

```
stmts (2)
    0. sstmt
        assign (2)
            0. expr
                or (3)
                    0. expr_jitop (2)
                        0. expr
                            L.   2     0  LOAD_GLOBAL              x
                        1.             2  JUMP_IF_TRUE_OR_POP   6  'to 6'
                    1. expr
                                       4  LOAD_GLOBAL              z
                    2.               6_0  COME_FROM             2  '2'
            1. store
                                       6  STORE_FAST               'temp'
    1. sstmt (2)
        0. sstmt
            return (2)
                0. ret_expr
                    expr
                        and (4)
                            0. expr
                                L.   3     8  LOAD_FAST                'temp'
                            1.            10  JUMP_IF_FALSE_OR_POP 16  'to 16'
                            2. expr
                                unary_not (2)
                                    0. expr
                                          12  LOAD_GLOBAL              y
                                    1.    14  UNARY_NOT
                            3.          16_0  COME_FROM            10  '10'
                1.                        16  RETURN_VALUE
        1.                                -1  RETURN_LAST
```

Error: The parser fails to reduce the following into an `expr` that can act as an operand of the `and` expression.

```
x or z

  LOAD_GLOBAL (x)
2 POP_JUMP_IF_TRUE (to 8)
4 LOAD_GLOBAL
```

Use this instead:

```
and ::= expr JUMP_IF_FALSE_OR_POP expr COME_FROM
```
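The implicit error above parses cleanly but changes the meaning, so it is easy to miss by eye. One way to catch it is to compare the original and decompiled expressions over every truth assignment. This is a minimal sketch, not part of uncompyle6: the helper name `equivalent` is hypothetical, and it only tries `True`/`False` inputs, so it says nothing about non-boolean operands (where `and`/`or` return the operand itself).

```python
from itertools import product


def equivalent(expr_a, expr_b, names):
    """Return True if both expressions agree for every True/False assignment."""
    code_a = compile(expr_a, "<original>", "eval")
    code_b = compile(expr_b, "<decompiled>", "eval")
    for values in product([False, True], repeat=len(names)):
        env = dict(zip(names, values))
        # Evaluate both expressions with the same variable bindings.
        if eval(code_a, {}, env) != eval(code_b, {}, env):
            return False
    return True


if __name__ == "__main__":
    # Rule #7 output for the MWE.
    print(equivalent("(x or z) and not y", "(x or z) and (not y)", ["x", "y", "z"]))
    # Rule #4 output for the MWE.
    print(equivalent("(x or z) and not y", "x or (not y)", ["x", "y", "z"]))
    # The implicit error: `a` is dropped by the decompiler.
    print(equivalent("(a or x or z) and not y", "(x or z) and (not y)",
                     ["a", "x", "y", "z"]))
```

Running a check like this on each rule attempt would flag the outputs of Rule #4 and of the three-operand case as non-equivalent, even though both are syntactically valid Python.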