---
tags: decompiler
title: Other Errors
---
# Other errors
| Other Error | Count | Status |
| ------- | ---- | ---- |
| Reading file error (throw) | 3 | :heavy_check_mark: |
| Corrupted files (throw) | 209 | :heavy_check_mark: |
| Tokenization errors (throw) | 129 | :warning: |
| Parse errors w/o .py (explicit) | 267 | :heavy_check_mark: |
| Internal grammar rule-bug (explicit) | 844 | :heavy_check_mark: |
| Python 3.9 | 21050 - (3045) - 840 | :warning: |
Breakdown of other errors:
1) **Reading file error** (3): Trivial; the files were deleted by antivirus on my end (my bad).
2) **Bytecode is corrupted** (209): These are genuinely corrupted files, confirmed via `dis`.
3) **Conversion to the decompiler's own tokens** (129): These files disassemble successfully with `dis`, but the decompiler fails because patterns it expects are missing. -> Theoretically fixable, but this will take a lot more time since we would need to instrument the decompiler to find the failing locations. The positions in the code where it fails vary, but they are all in the same tokenization stage.
4) **Parse errors without .py output** (267): These are genuine parse errors that produce no `.py` output, but the filenames are printed. Relatively easier to fix than the previous category since the filenames let us look up the source files.
5) **Internal grammar rule-bug** (844): These have `.py` output but fail while finalizing the final Python code. They effectively still don't point to where they fail and would be handled similarly to implicit errors.
6) **Parse errors that we focus on** (11029): These are the ones we are working on; this is the final level of error.
7) **Python 3.9 code** (21050): These are also already being covered.
## Corrupted files
All files were run under dis to confirm whether they were corrupted or not.
Furthermore, the headers were rewritten for different Python versions and `dis` was retried to check whether the failures were header-only, but without success.
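A minimal sketch of this check (assuming the CPython 3.7+ `.pyc` layout with a 16-byte header; the helper name is ours, and older versions use 8- or 12-byte headers):
```python
import dis
import marshal

def try_disassemble(pyc_path, header_size=16):
    """Try to unmarshal and disassemble a .pyc; a failure suggests corruption."""
    with open(pyc_path, "rb") as f:
        data = f.read()
    try:
        # Skip the header (magic, bit field, mtime/hash, source size for 3.7+),
        # then unmarshal the code object and disassemble it.
        code = marshal.loads(data[header_size:])
        dis.dis(code)
        return True
    except Exception as exc:
        print(f"{pyc_path}: disassembly failed ({exc})")
        return False
```
Note that `marshal.loads` should be run under the same interpreter version that produced the bytecode, otherwise valid files can also fail to load.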
## `Tokenization errors`
### Key error
- Out of 126:
  - 123 (all Python 3.8): `jump_back_index = self.offset2tok_index[jump_target] - 1` - [ref](https://github.com/rocky/python-uncompyle6/blob/c7ebdb344be0ceb938b764b6b81a6a3af9913f27/uncompyle6/scanners/scanner38.py#L108)
  - 6 (all Python 2.7): `j = self.offset2inst_index[offset]` - [ref](https://github.com/rocky/python-uncompyle6/blob/f6f0e344d02925630e4b5c78a36ef8144dd78938/uncompyle6/scanners/scanner2.py#L400)
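For context, `offset2tok_index` maps bytecode offsets to the scanner's token indices, so the `KeyError` is raised when a jump target was never recorded as a token offset. A hypothetical illustration (the offsets below are made up):
```python
# Made-up offsets purely to show the failure mode; the real map is built
# during the scanner's pass over the bytecode.
offset2tok_index = {0: 0, 2: 1, 6: 2}

jump_target = 4  # target offset that was never recorded as a token
jump_back_index = offset2tok_index[jump_target] - 1  # raises KeyError: 4
```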
### Other tokenization errors
- `AssertionError: at ifpoplaststmtl[2], expected 'c_stmts' node; got 'pass'`: 21
- `IndexError: list index out of range`: 11
- `AssertionError: set_comp_func (8)`: 2
- `IndexError: pop from empty list`: 2
All of these are errors inside the decompiler's own code.
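To pin down where each of these files fails, one option is to run the decompiler per file and keep the traceback. A rough sketch, assuming uncompyle6's `decompile_file(filename, outstream)` entry point (check the installed version's actual signature):
```python
import io
import traceback

def locate_failure(pyc_path):
    """Decompile one file and return the traceback if it raises, else None."""
    from uncompyle6.main import decompile_file  # assumed entry point
    out = io.StringIO()
    try:
        decompile_file(pyc_path, out)
        return None
    except Exception:
        # The innermost frames point at the scanner/parser line
        # (e.g. scanner38.py:108) that raised for this file.
        return traceback.format_exc()
```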
## `Internal grammar error`
### Common trends
- All involve errors on a `break` statement
- All occur in Python 3.8 bytecode
- Total of 844
- All have `.py` output (the affected files can be collected by scanning for the decompiler's error note; see the sketch below)
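The outputs in the examples below all carry the decompiler's `# NOTE: have internal decompilation grammar errors.` comment, so one way to collect these files is to scan the decompiled output for that marker. A minimal sketch (the output-directory layout is an assumption):
```python
import pathlib

MARKER = "have internal decompilation grammar errors"

def find_grammar_error_outputs(out_dir):
    """Return decompiled .py files whose output carries the grammar-error note."""
    hits = []
    for path in pathlib.Path(out_dir).rglob("*.py"):
        try:
            text = path.read_text(errors="replace")
        except OSError:
            continue
        if MARKER in text:
            hits.append(path)
    return hits
```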
### Summary trends
1. `while` loop nested in an `elif` block, containing a `break`
2. `while` loop with `1` or any other constant value as its condition, containing a `break`
3. Multiple conditions and nested `for` loops, with a `break` in the outer loop - does not lose any loops
4. `for` loop nested in an `elif` block, containing a `break`
5. Large codebase with multiple loops, where one `pass` in an `elif` block causes the decompiler to add a `break`
6. `for` loop with 2 `continue` statements and one `break` in the `else` block of a `try/except/else`
7. Implicit error that leads to pattern 1
8. Just multiple `for` loops in an `elif`
### Examples
Example 1:
```python
def _task_get_stack(task, limit):
if a:
z=z#frames.reverse()
elif b:
while tb is not None:
if limit <= 0:
break
```
output:
```python
def _task_get_stack(task, limit):
if a:
z = z
else:
if b:
if tb is not None:
if limit <= 0:
break
# NOTE: have internal decompilation grammar errors.
# Use -t option to show full context.
# not in loop:
# break
# L. 10 30 BREAK_LOOP 34 'to 34'
```
Solution:
Use a transformation keyword (`FET_break()` below) in place of the bare `break`:
```python
def _task_get_stack(task, limit):
if a:
z=z#frames.reverse()
elif b:
while tb is not None:
if limit <= 0:
FET_break()
```
or break up the `elif` so the `while` loop is no longer nested in it:
```python=
def _task_get_stack(task, limit):
if a:
z=z#frames.reverse()
if b and not a:
while tb is not None:
if limit <= 0:
break
```
Example 2:
```python
def _fix_exception_context(new_exc, old_exc):
# Context may not be correct, so find the end of the chain
while 1:
exc_context = new_exc.__context__
if exc_context is old_exc:
# Context is already set correctly (see issue 20317)
return
if exc_context is None or exc_context is frame_exc:
break
new_exc = exc_context
# Change the end of the chain to point to the exception
# we expect it to reference
new_exc.__context__ = old_exc
```
output:
```python
def _fix_exception_context(new_exc, old_exc):
exc_context = new_exc.__context__
if exc_context is old_exc:
return
else:
if not exc_context is None:
if exc_context is frame_exc:
break
new_exc = exc_context
new_exc.__context__ = old_exc
# NOTE: have internal decompilation grammar errors.
# Use -t option to show full context.
# not in loop:
# break
# L. 9 34 BREAK_LOOP 42 'to 42'
```
Solution:
```python
def _fix_exception_context(new_exc, old_exc):
# Context may not be correct, so find the end of the chain
tmp = 1
while tmp:
exc_context = new_exc.__context__
if exc_context is old_exc:
# Context is already set correctly (see issue 20317)
return
if exc_context is None or exc_context is frame_exc:
break
new_exc = exc_context
# Change the end of the chain to point to the exception
# we expect it to reference
new_exc.__context__ = old_exc
```
> Note: This leads to another implicit error, but we will count that as a separate error.
Example 3:
```python=
def parse_parts(self, parts):
parsed = []
sep = self.sep
altsep = self.altsep
drv = root = ''
it = reversed(parts)
for part in it:
if not part:
continue
if altsep:
part = part.replace(altsep, sep)
drv, root, rel = self.splitroot(part)
if sep in rel:
for x in reversed(rel.split(sep)):
if x and x != '.':
parsed.append(sys.intern(x))
else:
if rel and rel != '.':
parsed.append(sys.intern(rel))
if drv or root:
if not drv:
for part in it:
if not part:
continue
if altsep:
part = part.replace(altsep, sep)
drv = self.splitroot(part)[0]
if drv:
break
break
if drv or root:
parsed.append(drv + root)
parsed.reverse()
return drv, root,
```
Example 4:
```python=
def find_module(self, fullname, path=None):
if fullname in self.toc:
z=z
elif path is not None:
z=z
for p in path:
if not p.startswith(SYS_PREFIX):
continue
p = p[SYS_PREFIXLEN:]
parts = p.split(pyi_os_path.os_sep)
if not parts:
continue
if entry_name in self.toc:
break
return module_loader
```
output:
```python
def find_module(self, fullname, path=None):
if fullname in self.toc:
z = z
else:
if path is not None:
z = z
for p in path:
if not p.startswith(SYS_PREFIX):
pass
else:
p = p[SYS_PREFIXLEN:]
parts = p.split(pyi_os_path.os_sep)
if not parts:
pass
elif entry_name in self.toc:
break
return module_loader
```
Solution:
```python
def find_module(self, fullname, path=None):
if fullname in self.toc:
z=z
if path is not None and not fullname in self.toc:
z=z
for p in path:
if not p.startswith(SYS_PREFIX):
continue
p = p[SYS_PREFIXLEN:]
parts = p.split(pyi_os_path.os_sep)
if not parts:
continue
if entry_name in self.toc:
break
return module_loader
```
Example 5:
[Link](https://github.com/numpy/numpy/blob/main/numpy/lib/arraypad.py#L806)
Solution: No solution
Example 6:
```python
def process_listeners(self, listener_type, argument, result):
removed = []
for i, listener in enumerate(self._listeners):
if listener.type != listener_type:
continue
future = listener.future
if future.cancelled():
removed.append(i)
continue
try:
passed = listener.predicate(argument)
except Exception as exc:
future.set_exception(exc)
removed.append(i)
else:
if passed:
future.set_result(result)
removed.append(i)
if listener.type == ListenerType.chunk:
break
```
Output => `XXXX`
Any workaround here loses information in the decompiled result. Example:
```python
def process_listeners(self, listener_type, argument, result):
removed = []
for i, listener in enumerate(self._listeners):
if listener.type != listener_type:
continue
future = listener.future
if future.cancelled():
removed.append(i)
continue
try:
passed = listener.predicate(argument)
except Exception as exc:
future.set_exception(exc)
removed.append(i)
else:
if passed:
future.set_result(result)
removed.append(i)
if listener.type == ListenerType.chunk:
tmp = 'break'
if tmp=='break':
break
```
Example 7:
```python
def determineEncoding(self, chardet=True):
# "likely" encoding
charEncoding = lookupEncoding(self.likely_encoding), "tentative"
if charEncoding[0] is not None:
return charEncoding
# Guess with chardet, if available
if chardet:
try:
from chardet.universaldetector import UniversalDetector
except ImportError:
pass
else:
buffers = []
detector = UniversalDetector()
while not detector.done:
buffer = self.rawStream.read(self.numBytesChardet)
assert isinstance(buffer, bytes)
if not buffer:
break
buffers.append(buffer)
detector.feed(buffer)
detector.close()
encoding = lookupEncoding(detector.result['encoding'])
self.rawStream.seek(0)
if encoding is not None:
return encoding, "tentative"
# Try the default encoding
```
converts to =>
```python
def determineEncoding(self, chardet=True):
charEncoding = (
lookupEncoding(self.likely_encoding), 'tentative')
if charEncoding[0] is not None:
return charEncoding
elif chardet:
try:
from chardet.universaldetector import UniversalDetector
except ImportError:
pass
else:
buffers = []
detector = UniversalDetector()
if not detector.done:
buffer = self.rawStream.read(self.numBytesChardet)
assert isinstance(buffer, bytes)
if not buffer:
break
buffers.append(buffer)
detector.feed(buffer)
else:
detector.close()
encoding = lookupEncoding(detector.result['encoding'])
self.rawStream.seek(0)
if encoding is not None:
return (
encoding, 'tentative')
```
The `elif chardet:` turns this into pattern 1, which causes the same issue.
Example 8:
```python
def tokens(self, event, next):
kind, data, _ = event
if kind == START:
tag, attribs = data
name = tag.localname
namespace = tag.namespace
converted_attribs = {}
for k, v in attribs:
if isinstance(k, QName):
converted_attribs[(k.namespace, k.localname)] = v
else:
converted_attribs[(None, k)] = v
if namespace == namespaces["html"] and name in voidElements:
for token in self.emptyTag(namespace, name, converted_attribs,
not next or next[0] != END or
next[1] != tag):
yield token
else:
yield self.startTag(namespace, name, converted_attribs)
elif kind == END:
name = data.localname
namespace = data.namespace
if namespace != namespaces["html"] or name not in voidElements:
yield self.endTag(namespace, name)
elif kind == COMMENT:
yield self.comment(data)
elif kind == TEXT:
for token in self.text(data):
yield token
elif kind == DOCTYPE:
yield self.doctype(*data)
elif kind in (XML_NAMESPACE, DOCTYPE, START_NS, END_NS,
START_CDATA, END_CDATA, PI):
pass
else:
yield self.unknown(kind)
```
Causes:
```python
def tokens(self, event, next):
kind, data, _ = event
if kind == START:
tag, attribs = data
name = tag.localname
namespace = tag.namespace
converted_attribs = {}
for k, v in attribs:
if isinstance(k, QName):
converted_attribs[(k.namespace, k.localname)] = v
else:
converted_attribs[(None, k)] = v
else:
if namespace == namespaces['html'] and name in voidElements:
for token in self.emptyTag(namespace, name, converted_attribs, not next or next[0] != END or next[1] != tag):
yield token
else:
yield self.startTag(namespace, name, converted_attribs)
else:
if kind == END:
name = data.localname
namespace = data.namespace
if namespace != namespaces['html'] or name not in voidElements:
yield self.endTag(namespace, name)
else:
if kind == COMMENT:
yield self.comment(data)
else:
if kind == TEXT:
for token in self.text(data):
yield token
else:
if kind == DOCTYPE:
yield (self.doctype)(*data)
else:
if kind in (XML_NAMESPACE, DOCTYPE, START_NS, END_NS,
START_CDATA, END_CDATA, PI):
break
else:
yield self.unknown(kind)
# NOTE: have internal decompilation grammar errors.
# Use -t option to show full context.
# not in loop:
# break
# L. 40 354 BREAK_LOOP 368 'to 368'
```