# Simpler decompiler
This lays out a scheme for a simpler, faster *Decompiler*.
Since writing the original *Decompiler* the requirements and available infrastructure have changed:
- We turn error code checking into `try`/`catch` in most backends.
- [orelse markers](/zIUkiDbUQ9mFSPENBoSXGg) aims to provide a way to preserve information necessary for this
- *MaximalPath* provides an efficient, well-tested basic-block decomposition of a CFG.
- Placing declarations well has proven to be a significant duty of *TmpLControlFlow*
- The decompiler is used by *PseudoCode* to arrange *BlockTree* children so is on the debugging path.
## Current decompiler
The current decompiler produced `if`/`while` and nested, labeled blocks. Labeled breaks allow forward jumping to the end of a containing block.
It is followed by a pass that simplifies nested statement structures to eliminate unnecessary labels.
It is complex: the beginning and end of a loop and an if have to correspond to nodes in a CFG; this involves:
- converting the input CFG into an internal CFG
- inserting nodes into that CFG so that each node serves one purpose. *ensureSingularityOfPurpose*
- walking the CFG and inferring cuts where `if` and `while`s end. That same logic is used to infer where `try` and `catch` boundaries should go. That inference is not robust in the case where an external-to-Temper function lies about whether it can throw.
## Redesign
The redesign builds on *MaximalPath*'s basic block destructuring.
*MaximalPath* provides several useful elements.
- A list of paths that collect all runs of statements that must follow together into one nice bundle. Each path corresponds to a *basic block*.
- Edges between those paths with conditions filtered for reachable edges and for statically known predicates.
- Explicit notation of edges that go back to the beginning of a loop: `continue` edges.
- An order over the paths such that, ignoring loop `continue` edges, each path appears before the ones that it flows into.
The redesigned decompiler outputs:
- Nested `if`/`while`/`try` statements and blocks containing expression statements corresponding to *Tree*s.
- Instead of labels it uses numbered `goto`/`comefrom` pairs.
- All `goto`s are to `comefrom`s that lexically appear afterwards and to a `comefrom` in a block that contains the matching `goto`s. This means that `goto`/`comefrom` can be translated into labeled `break`s & labeled blocks.
### Rule 1 of decompiling: joining
If a basic block has multiple preceders joining at it, end each preceder with a new `goto` to a `comefrom` in the common ancestor.
```ts=
if (b) {
if (c) {
f();
// JOINING FROM HERE
} else {
while (true) {
g();
if (done) {
// AND JOINING FROM HERE
}
}
}
// CURRENT END OF BLOCK AFTER COMMON ANCESTOR
} else {
}
```
After joining we have `goto`/`comeFrom` like the below:
```diff=
if (b) {
if (c) {
f();
+ goto 1;
} else {
while (true) {
g();
if (done) {
+ goto 1;
}
}
}
+ goto 2;
+ {
+ comeFrom 1;
+ // PLACE TO INSERT INSTRUCTIONS AFTER JOIN.
+ }
+ comeFrom 2;
} else {
}
```
Rules of `comeFrom`/`goto` simplification:
- If there is one `goto x` for a `comeFrom x`, any run of statements after the `comeFrom x` can move before the `goto x` as long as doing so would not move a nested `comeFrom y` before any intervening `goto y`.
`goto 1; ...; comeFrom 1; f(); g()` ⇒
`f(); g(); goto 1; ...; comeFrom 1`.
- If a `comeFrom x` is the next statement in flow order from a `goto x` then the `goto x` can be eliminated.
`if (x) { f(); goto 1 }; comeFrom 1` ⇒
`if (x) { f() }; comeFrom 1`
- If a `comeFrom x` has no corresponding `goto x` it can be replaced with a no-op.
`if (x) { f() }; comeFrom 1` ⇒
`if (x) { f() }`
### Rule 2 of decompiling: loopbacks
Until indicated otherwise, the *containing* block is the block which joins the incoming edges. (There is a pre-allocated *root block* which is the containing block for the entry edge and which is the output of the whole decompilation process.)
If a basic block has an incoming `continue` edge, insert a `while (true) {}` into the tentative *containing block* and make the loop body the *containing block*.
Later on, we can recognize patterns to turn `while (true)` into a more idiomatic loop kind:
- `while (true) { ... if (!cond) break; }` ⇒
`do { ... } while (cond);`
- `while (true) { if (!cond) break; ... }` ⇒
`while (cond) { ... }`
### Rule 3 of decompiling: output statements
If a basic block contains statements, insert them at the end of the containing block.
### Rule 4 of decompiling: branching
Let the continuing block be the containing block.
For each following edge:
- If it is conditional:
1. insert an `if` statement with the conditional into the continuing block.
2. Store the consequent block as the continuer for that edge.
3. Set the alternate block as the new continuing block
- Else, set the continuing block as the continuer for that block.
### Status
Mostly implemented building in a side branch.
`goto`/`comeFrom` simplification not implemented.
Using try/catch markers not implemented as those markers are not present yet.
Some tests via a new *PseudoCodeDetail* bit, *useDecompiler2* which can be used in *TypeStageTest* cases by passing an explicit *PseudoCodeDetail* to *assertModuleAtStage*.