---
tags: decompiler
title: fixing error u6
---
# Setup time.
1 hr.
# Finding MWE.
45min.
44 + 7 + 1,237 + 1,509 + 2,695 + 499 + 580 = 6,571
# Inspecting uncompyle6
## Find entry point
First, look at the directory layout and find the entry point of the main function. There are 113 files, of which the potential entry points are `main.py` and `__init__.py`.
The entry point turns out to be in the `bin` directory, in a file named `uncompyle.py` (266 SLOC).
## Find decompilation function
### `main.py` (499 SLOC)
Tracing the `pyc_paths` variable to find where decompilation is called leads to the function `main` in `main.py` (499 SLOC).
Next, find the function that leads to `decompile_file`, which decompiles the file after processing.
This leads to the `decompile` function, which also runs on each file.
-> This function also prints the header of the file.
After 5 iterations of tracing the code, the deparsing functions found were `deparse_code_with_map`, `code_deparse_fragments`, and `code_deparse`. Of these, `code_deparse` is the function invoked to decompile a file.
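Before any of these deparsing functions run, the `.pyc` file has to be turned back into a code object. uncompyle6 relies on the xdis library for that step, which also handles bytecode from other Python versions. The stdlib-only sketch below is not uncompyle6's code: it assumes the `.pyc` was written by the running interpreter (Python 3.7+), so the header is the 16-byte PEP 552 layout, and `load_pyc` is a hypothetical helper name.

```python
import importlib.util
import marshal
import pathlib
import py_compile
import tempfile
import types


def load_pyc(path: str) -> types.CodeType:
    """Return the top-level code object stored in a .pyc file.

    Assumes a .pyc from the running interpreter (3.7+): a 16-byte header
    (magic number, flags word, then 8 bytes of mtime + size or a source hash)
    followed by the marshalled module code object.
    """
    data = pathlib.Path(path).read_bytes()
    if data[:4] != importlib.util.MAGIC_NUMBER:
        raise ValueError("pyc was written by a different Python version")
    return marshal.loads(data[16:])


with tempfile.TemporaryDirectory() as tmp:
    src = pathlib.Path(tmp, "mwe.py")
    src.write_text("def is_at_turn():\n    return (x or z) and not y\n")
    pyc_path = py_compile.compile(
        str(src), cfile=str(pathlib.Path(tmp, "mwe.pyc")), doraise=True
    )
    module_code = load_pyc(pyc_path)

# The function body is a nested code object among the module's constants.
func_code = next(
    c for c in module_code.co_consts
    if isinstance(c, types.CodeType) and c.co_name == "is_at_turn"
)
print(module_code.co_name, func_code.co_name)  # <module> is_at_turn
```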
### `pysource.py` (2695 SLOC)
`code_deparse` (108 SLOC) takes a code object and a `SourceWalker`.
We need to understand the structure of both the code object (`co`) and the `SourceWalker` (`walker`).
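For orientation, the attributes of a code object can be inspected directly in the running interpreter. This sketch uses only the stdlib `dis` module on the MWE function from this note. The instruction listing it prints depends on the interpreter version and will generally differ from the Python 3.7 bytecode that uncompyle6 is decompiling here.

```python
import dis


def is_at_turn():
    # x, y, z are undefined globals; the function is only inspected, never called.
    return (x or z) and not y


co = is_at_turn.__code__
print(co.co_name)           # is_at_turn
print(sorted(co.co_names))  # ['x', 'y', 'z']
print(co.co_argcount)       # 0

# The raw bytecode lives in co.co_code; dis decodes it into Instruction tuples.
for ins in dis.get_instructions(co):
    print(ins.offset, ins.opname, ins.argrepr)
```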
The decompiler first gets the variable `scanner` from `get_scanner`.
```
# store final output stream for case of error
scanner = get_scanner(version, is_pypy=is_pypy)
```
The comment misled me into thinking `scanner` might have a double meaning. Next, I inspected the `get_scanner` function (570 SLOC), which is defined in a different file.
The object returned was `uncompyle6.scanners.scanner37.Scanner37`. Understanding how the scanner class works means inspecting each module and following its three-level class hierarchy.
The `scanner` object uses the `ingest` function, which needs to be inspected (it is inherited from the parent classes and is 200 SLOC). It performs custom tokenization that essentially returns renamed instructions. Note that this only gives first-level tokens.
The `ingest` function also stores the instructions in `self.code`, which makes it stateful (I had to traverse the classes to find this). Its docstring summarizes the transformations:
```
The transformations are made to assist the deparsing grammar.
Specificially:
- various types of LOAD_CONST's are categorized in terms of what they load
- COME_FROM instructions are added to assist parsing control structures
- MAKE_FUNCTION and FUNCTION_CALLS append the number of positional arguments
- some EXTENDED_ARGS instructions are removed
```
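To make the COME_FROM bullet concrete, here is a toy version of just that one transformation. It is not uncompyle6's `ingest`. It assumes a simplified set of jump opcodes whose argument is the target offset, and `add_come_froms` is a hypothetical name. The input is the Python 3.7 MWE bytecode, hand-copied from the disassembly in this note with the COME_FROM lines removed. The output reproduces the `8_0` and `12_0` pseudo-tokens shown there.

```python
from collections import defaultdict
from typing import Dict, List, Tuple, Union

Arg = Union[str, int, None]
Instr = Tuple[int, str, Arg]  # (offset, opname, argument or jump target)
Token = Tuple[str, str, Arg]  # (offset label, token name, argument)

# Simplified set of jump opcodes whose argument is a target offset.
JUMP_OPS = {
    "POP_JUMP_IF_TRUE", "POP_JUMP_IF_FALSE",
    "JUMP_IF_TRUE_OR_POP", "JUMP_IF_FALSE_OR_POP",
    "JUMP_FORWARD", "JUMP_ABSOLUTE",
}


def add_come_froms(instrs: List[Instr]) -> List[Token]:
    """Insert a COME_FROM pseudo-token before every jump target."""
    sources: Dict[Arg, List[int]] = defaultdict(list)
    for offset, opname, arg in instrs:
        if opname in JUMP_OPS:
            sources[arg].append(offset)  # remember who jumps to `arg`
    tokens: List[Token] = []
    for offset, opname, arg in instrs:
        for i, src in enumerate(sources.get(offset, [])):
            tokens.append((f"{offset}_{i}", "COME_FROM", src))
        tokens.append((str(offset), opname, arg))
    return tokens


# Python 3.7 bytecode of the MWE (COME_FROM lines removed).
mwe_37 = [
    (0, "LOAD_GLOBAL", "x"),
    (2, "POP_JUMP_IF_TRUE", 8),
    (4, "LOAD_GLOBAL", "z"),
    (6, "JUMP_IF_FALSE_OR_POP", 12),
    (8, "LOAD_GLOBAL", "y"),
    (10, "UNARY_NOT", None),
    (12, "RETURN_VALUE", None),
]

for label, name, arg in add_come_froms(mwe_37):
    print(label, name, "" if arg is None else arg)
```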
```
linestarts = dict(scanner.opc.findlinestarts(co))
```
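`linestarts` maps bytecode offsets to source line numbers. As far as I can tell, `scanner.opc` is the xdis opcode module for the target bytecode version. The stdlib `dis.findlinestarts` does the same job for the running interpreter's bytecode, as in this sketch. The exact mapping is version-dependent: newer interpreters also report the `def` line for the initial `RESUME` instruction.

```python
import dis

src = "def is_at_turn():\n    return (x or z) and not y\n"
namespace = {}
exec(compile(src, "<mwe>", "exec"), namespace)
co = namespace["is_at_turn"].__code__

# Offsets that start a new source line -> that line number.
linestarts = dict(dis.findlinestarts(co))
print(linestarts)
```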
#### `scanner.py` (580 SLOC)
This file imports a scanner object from the `scanners` directory. The directory has 34 files (3,786 SLOC), each used to build a scanner for a specific Python version.
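The version dispatch and the three-level hierarchy can be pictured with a toy registry. Every class and function here (`ToyScanner`, `ToyScanner37Base`, `ToyScanner37`, `ToyScanner38`, `get_toy_scanner`) is hypothetical and only mirrors the shape described above: shared `ingest` logic in a base class, per-version subclasses, and a factory keyed on the bytecode version. The `CALL_FUNCTION_<n>` renaming follows the docstring bullet about appending the number of positional arguments.

```python
from typing import Dict, List, Optional, Tuple, Type

Instr = Tuple[str, Optional[int]]  # (opname, argument)


class ToyScanner:
    """Hypothetical base class holding version-independent ingest logic."""

    version: Tuple[int, int] = (0, 0)

    def ingest(self, instrs: List[Instr]) -> List[str]:
        self.code = list(instrs)  # keeps the raw instructions: stateful
        return [self.token_name(op, arg) for op, arg in instrs]

    def token_name(self, opname: str, arg: Optional[int]) -> str:
        return opname


class ToyScanner37Base(ToyScanner):
    """Hypothetical middle layer shared by a family of versions."""

    def token_name(self, opname: str, arg: Optional[int]) -> str:
        # Append the positional-argument count, e.g. CALL_FUNCTION_2.
        if opname == "CALL_FUNCTION" and arg is not None:
            return f"{opname}_{arg}"
        return super().token_name(opname, arg)


class ToyScanner37(ToyScanner37Base):
    version = (3, 7)


class ToyScanner38(ToyScanner37):
    version = (3, 8)


REGISTRY: Dict[Tuple[int, int], Type[ToyScanner]] = {
    cls.version: cls for cls in (ToyScanner37, ToyScanner38)
}


def get_toy_scanner(version: Tuple[int, int]) -> ToyScanner:
    """Pick the scanner class for a bytecode version and instantiate it."""
    if version not in REGISTRY:
        raise ValueError(f"no scanner for Python {version[0]}.{version[1]}")
    return REGISTRY[version]()


scanner = get_toy_scanner((3, 7))
print(type(scanner).__name__)  # ToyScanner37
print(scanner.ingest([("LOAD_GLOBAL", 0), ("CALL_FUNCTION", 2)]))
# ['LOAD_GLOBAL', 'CALL_FUNCTION_2']
```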
## Decompilation.
Parsing is driven by the `SourceWalker` class. It takes the tokens ingested by the `Scanner` class and parses them with the parser returned by `get_python_parser` in `parser.py` (893 SLOC).
## Tracing rules.
For Python 3.7, the parser is defined across a hierarchy of files, so the rules have to be traced through each of them. The rules are not easy to find: some are created in functions such as `custom_classfunc_rule`, which add rules based on the instructions produced by the `scanner`.
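The idea of rules that exist only when a matching token shows up can be sketched as follows. This is not `custom_classfunc_rule`. The function name `custom_rules_for` and the rule shape `call ::= expr expr... CALL_FUNCTION_<n>` are hypothetical simplifications: one rule per distinct call arity seen in the token stream, so the grammar never has to enumerate every possible argument count up front.

```python
import re
from typing import Iterable, List

CALL_TOKEN = re.compile(r"CALL_FUNCTION_(\d+)")


def custom_rules_for(tokens: Iterable[str]) -> List[str]:
    """Return one 'call' rule per distinct CALL_FUNCTION_<n> token, in first-seen order."""
    rules: List[str] = []
    seen = set()
    for tok in tokens:
        match = CALL_TOKEN.fullmatch(tok)
        if match is None or tok in seen:
            continue
        seen.add(tok)
        nargs = int(match.group(1))
        # callee expression, one expr per positional argument, then the call token
        rules.append("call ::= expr " + "expr " * nargs + tok)
    return rules


print(custom_rules_for(["LOAD_GLOBAL", "LOAD_CONST", "CALL_FUNCTION_1", "CALL_FUNCTION_1"]))
# ['call ::= expr expr CALL_FUNCTION_1']
```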
#### `SourceWalker` (2443 SLOC)
# Python 3.7, uncompyle6
MWE:
```
def is_at_turn():
    return (x or z) and not y
```
```
0 LOAD_GLOBAL x
2 POP_JUMP_IF_TRUE 8 'to 8'
4 LOAD_GLOBAL z
6 JUMP_IF_FALSE_OR_POP 12 'to 12'
8_0 COME_FROM 2 '2'
8 LOAD_GLOBAL y
10 UNARY_NOT
12_0 COME_FROM 6 '6'
12 RETURN_VALUE
-1 RETURN_LAST
```
Grammar pointed out:
```
expr ::= LOAD_GLOBAL
expr ::= unary_not
unary_not ::= expr UNARY_NOT
ret_expr_or_cond ::= ret_expr
ret_and ::= expr JUMP_IF_FALSE_OR_POP ret_expr_or_cond COME_FROM
ret_or ::= expr JUMP_IF_TRUE_OR_POP ret_expr_or_cond COME_FROM
ret_expr ::= expr
ret_expr ::= ret_and
ret_expr ::= ret_or
return ::= ret_expr RETURN_VALUE
sstmt ::= return RETURN_LAST
```
```
sstmt:
(0) return
(0) return expr
(0) expr
(0) and
(1) RETURN_VALUE
(1)RETURN_LAST
```
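A quick sanity check on the rule excerpt above: collect every token name that appears on the right-hand side of a listed rule and compare it with the MWE's tokens. Any token that no rule mentions can never be consumed by a parse that uses only these rules. The full 3.7 grammar has many more rules than this excerpt, so the check only explains the excerpt. Treating upper-case symbols as terminals is an assumption that matches the naming used here.

```python
# Rules copied from the excerpt above.
GRAMMAR_37 = """
expr ::= LOAD_GLOBAL
expr ::= unary_not
unary_not ::= expr UNARY_NOT
ret_expr_or_cond ::= ret_expr
ret_and ::= expr JUMP_IF_FALSE_OR_POP ret_expr_or_cond COME_FROM
ret_or ::= expr JUMP_IF_TRUE_OR_POP ret_expr_or_cond COME_FROM
ret_expr ::= expr
ret_expr ::= ret_and
ret_expr ::= ret_or
return ::= ret_expr RETURN_VALUE
sstmt ::= return RETURN_LAST
"""

# Token names from the MWE disassembly (offsets dropped).
MWE_TOKENS = [
    "LOAD_GLOBAL", "POP_JUMP_IF_TRUE", "LOAD_GLOBAL", "JUMP_IF_FALSE_OR_POP",
    "COME_FROM", "LOAD_GLOBAL", "UNARY_NOT", "COME_FROM", "RETURN_VALUE",
    "RETURN_LAST",
]


def terminals(grammar: str) -> set:
    """Collect upper-case symbols (token names) from every rule's right-hand side."""
    found = set()
    for line in grammar.strip().splitlines():
        _, rhs = line.split("::=")
        found.update(sym for sym in rhs.split() if sym.isupper())
    return found


uncovered = set(MWE_TOKENS) - terminals(GRAMMAR_37)
print(uncovered)  # {'POP_JUMP_IF_TRUE'}
```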
## Rule updates tried:
Rule #1:
Make `ret_expr_or_cond` optional, since it is not used in the above pattern. I wanted to target the `and`, and noticed that `ret_and` might capture it if `ret_expr_or_cond` were optional. This failed.
```
ret_and ::= expr JUMP_IF_FALSE_OR_POP ret_expr_or_cond? COME_FROM
```
Crafted from the Python 3.6 migration:
```
sstmt ::= sstmt RETURN_LAST ❤️
return ::= ret_expr RETURN_VALUE ❤️
ret_expr ::= expr ❤️
expr ::= and ❤️
and ::= expr JUMP_IF_FALSE_OR_POP expr COME_FROM ❤️
expr ::= or ❤️
expr ::= unary_not ❤️
unary_not ::= expr UNARY_NOT ❤️
or ::= expr_jt expr COME_FROM ❤️
expr_jt ::= expr jmp_true ❤️
jmp_true ::= POP_JUMP_IF_TRUE ❤️
```
Rule #2:
Applying the following rule to `p_stmt` fixes the error:
```
ret_expr ::= expr POP_JUMP_IF_TRUE expr JUMP_IF_FALSE_OR_POP COME_FROM unary_not COME_FROM
```
However, the output I get is:
```
def is_at_turn():
    return x
```
So a single general rule is not enough: each token must be placed in its respective block.
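One way to picture why grouping matters: a table-driven tree walker renders each nonterminal from a template that picks particular children. The template table, `render`, and the rule that parenthesizes compound `and`/`or` operands are hypothetical. They are loosely in the spirit of uncompyle6's template tables, not its actual entries. The nested tree follows the productions eventually adopted in Rule #7. The flat node follows the single Rule #2 production, and a pass-through template only reaches its first child. That happens to match the `return x` output, although the real walker's behavior on that node was not traced.

```python
from typing import List, Tuple, Union

# A leaf is (token_name, text); an interior node is (nonterminal, [children]).
Tree = Tuple[str, Union[str, List["Tree"]]]

# Hypothetical templates: nonterminal -> (format string, child indexes to print).
TEMPLATES = {
    "expr": ("{}", (0,)),
    "or": ("{} or {}", (0, 2)),       # or ::= expr POP_JUMP_IF_TRUE expr
    "and": ("{} and {}", (0, 3)),     # and ::= or JUMP_IF_FALSE_OR_POP COME_FROM expr
    "unary_not": ("not {}", (0,)),    # unary_not ::= expr UNARY_NOT
    "ret_and_a": ("{}", (0,)),        # ret_and_a ::= and COME_FROM
    "ret_expr": ("{}", (0,)),         # pass-through, also applied to the flat node
}
BOOL_OPS = {"and", "or"}


def leaf(name: str, text: str = "") -> Tree:
    return (name, text)


def node(name: str, *children: Tree) -> Tree:
    return (name, list(children))


def glob(name: str) -> Tree:
    return node("expr", leaf("LOAD_GLOBAL", name))


def render(tree: Tree) -> str:
    label, payload = tree
    if isinstance(payload, str):  # leaf: its text is the source fragment
        return payload
    fmt, picks = TEMPLATES[label]
    parts = []
    for i in picks:
        text = render(payload[i])
        # Parenthesize compound operands of and/or so grouping stays explicit.
        if label in BOOL_OPS and " " in text:
            text = f"({text})"
        parts.append(text)
    return fmt.format(*parts)


nested = node(
    "ret_and_a",
    node(
        "and",
        node("or", glob("x"), leaf("POP_JUMP_IF_TRUE"), glob("z")),
        leaf("JUMP_IF_FALSE_OR_POP"),
        leaf("COME_FROM"),
        node("expr", node("unary_not", glob("y"), leaf("UNARY_NOT"))),
    ),
    leaf("COME_FROM"),
)

flat = node(
    "ret_expr",
    glob("x"), leaf("POP_JUMP_IF_TRUE"), glob("z"), leaf("JUMP_IF_FALSE_OR_POP"),
    leaf("COME_FROM"), node("unary_not", glob("y"), leaf("UNARY_NOT")), leaf("COME_FROM"),
)

print(render(nested))  # (x or z) and (not y)
print(render(flat))    # x
```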
Rule #3:
Applied the following rule to `p_stmt`:
```
ret_expr ::= ret_and_a
```
And the following rule to the `p_jump3` function:
```
ret_and_a ::= expr POP_JUMP_IF_TRUE expr JUMP_IF_FALSE_OR_POP COME_FROM expr COME_FROM
```
And so we get the following output:
```
def is_at_turn():
    return xz(not y)
```
Both `or` and `and` are missing and need to be added, so the rule in `p_jump3` has to be broken down.
Rule #4:
I broke down the rule in `p_jump3` as follows:
```
or ::= expr POP_JUMP_IF_TRUE expr JUMP_IF_FALSE_OR_POP COME_FROM
ret_and_a ::= or expr COME_FROM
```
The following is the result:
```
def is_at_turn():
    return x or (not y)
# Identifies `or` but not `and` (and `z` is dropped)
```
Rule #5:
The following pattern:
```
or ::= expr POP_JUMP_IF_TRUE expr JUMP_IF_FALSE_OR_POP COME_FROM
and ::= or expr COME_FROM
ret_and_a ::= and
```
gives the following:
```
def is_at_turn():
    return x or and
```
Rule #6:
The following breakdown was attempted:
```
or ::= expr POP_JUMP_IF_TRUE expr
ret_and_a ::= or JUMP_IF_FALSE_OR_POP COME_FROM expr COME_FROM
```
gives the following:
```
def is_at_turn():
    return x or z(not y)
```
Rule #7:
Finally, the following fixes the error:
```
def p_expr(self, args):
    """
    ret_expr ::= ret_and_a
    """

def p_jump3(self, args):
    """
    or ::= expr POP_JUMP_IF_TRUE expr
    and ::= or JUMP_IF_FALSE_OR_POP COME_FROM expr
    ret_and_a ::= and COME_FROM
    """
```
Giving the output:
```
def is_at_turn():
    return (x or z) and (not y)
```
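As a sanity check that the Rule #7 productions cover the MWE token sequence while the stock excerpt does not, here is a toy recognizer. It does a breadth-first search over bottom-up reductions, which is exponential in general but fine for nine tokens. This is not how uncompyle6 parses; it uses an Earley parser from the `spark_parser` package. `STOCK_37` is the rule excerpt listed under "Grammar pointed out" (without the `sstmt` rule), and `FIXED` keeps its `expr`/`unary_not`/`return` rules plus the Rule #7 additions. `RETURN_LAST` is left out and `return` is used as the goal symbol.

```python
from collections import deque
from typing import List, Tuple

Rule = Tuple[str, Tuple[str, ...]]

STOCK_37 = """
expr ::= LOAD_GLOBAL
expr ::= unary_not
unary_not ::= expr UNARY_NOT
ret_expr_or_cond ::= ret_expr
ret_and ::= expr JUMP_IF_FALSE_OR_POP ret_expr_or_cond COME_FROM
ret_or ::= expr JUMP_IF_TRUE_OR_POP ret_expr_or_cond COME_FROM
ret_expr ::= expr
ret_expr ::= ret_and
ret_expr ::= ret_or
return ::= ret_expr RETURN_VALUE
"""

FIXED = """
expr ::= LOAD_GLOBAL
expr ::= unary_not
unary_not ::= expr UNARY_NOT
return ::= ret_expr RETURN_VALUE
ret_expr ::= ret_and_a
or ::= expr POP_JUMP_IF_TRUE expr
and ::= or JUMP_IF_FALSE_OR_POP COME_FROM expr
ret_and_a ::= and COME_FROM
"""

MWE = ["LOAD_GLOBAL", "POP_JUMP_IF_TRUE", "LOAD_GLOBAL", "JUMP_IF_FALSE_OR_POP",
       "COME_FROM", "LOAD_GLOBAL", "UNARY_NOT", "COME_FROM", "RETURN_VALUE"]


def parse_rules(text: str) -> List[Rule]:
    rules = []
    for line in text.strip().splitlines():
        lhs, rhs = line.split("::=")
        rules.append((lhs.strip(), tuple(rhs.split())))
    return rules


def derives(tokens: List[str], rules: List[Rule], goal: str) -> bool:
    """True if repeatedly replacing a rule's RHS by its LHS can reduce tokens to goal."""
    start = tuple(tokens)
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state == (goal,):
            return True
        for lhs, rhs in rules:
            n = len(rhs)
            for i in range(len(state) - n + 1):
                if state[i:i + n] == rhs:
                    nxt = state[:i] + (lhs,) + state[i + n:]
                    if nxt not in seen:  # states never grow, so the search ends
                        seen.add(nxt)
                        queue.append(nxt)
    return False


print(derives(MWE, parse_rules(STOCK_37), "return"))  # False
print(derives(MWE, parse_rules(FIXED), "return"))     # True
```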
Time: 10 hrs (after the earlier analysis of the codebase)
Implicit errors: the fix does not generalize. For example,
```
def is_at_turn():
    return (a or x or z) and not y
```
decompiles to:
```
def is_at_turn():
    return (x or z) and (not y)
```
## Grammar rule used
```
def is_at_turn():
    temp = (x or z)
    return temp and not y
```
```
temp = (x or z)
```
```
stmts ::= sstmt+
sstmt ::= assign
assign ::= expr store
store ::= STORE_FAST
expr ::= or
or ::= expr_jitop expr COME_FROM
expr_jitop ::= expr JUMP_IF_TRUE_OR_POP
expr ::= LOAD_GLOBAL
```
```
return temp and not y
```
```
stmts ::= sstmt+
sstmt ::= sstmt RETURN_LAST
sstmt ::= return
return ::= ret_expr RETURN_VALUE
ret_expr ::= expr
expr ::= and
and ::= expr JUMP_IF_FALSE_OR_POP expr COME_FROM
expr ::= LOAD_FAST
expr ::= unary_not
unary_not ::= expr UNARY_NOT
expr ::= LOAD_GLOBAL
```
Tree:
```
stmts (2)
  0. sstmt
    assign (2)
      0. expr
        or (3)
          0. expr_jitop (2)
            0. expr
              L. 2 0 LOAD_GLOBAL x
            1. 2 JUMP_IF_TRUE_OR_POP 6 'to 6'
          1. expr
            4 LOAD_GLOBAL z
          2. 6_0 COME_FROM 2 '2'
      1. store
        6 STORE_FAST 'temp'
  1. sstmt (2)
    0. sstmt
      return (2)
        0. ret_expr
          expr
            and (4)
              0. expr
                L. 3 8 LOAD_FAST 'temp'
              1. 10 JUMP_IF_FALSE_OR_POP 16 'to 16'
              2. expr
                unary_not (2)
                  0. expr
                    12 LOAD_GLOBAL y
                  1. 14 UNARY_NOT
              3. 16_0 COME_FROM 10 '10'
        1. 16 RETURN_VALUE
    1. -1 RETURN_LAST
```
Error:
The parser fails to reduce the following into an `expr` for the `and` expression.
```
x or z
0 LOAD_GLOBAL (x)
2 POP_JUMP_IF_TRUE (to 8)
4 LOAD_GLOBAL (z)
```
Use this instead:
```
and ::= expr JUMP_IF_FALSE_OR_POP expr COME_FROM
```