# Simpler decompiler This lays out a scheme for a simpler, faster *Decompiler*. Since writing the original *Decompiler* the requirements and available infrastructure have changed: - We turn error code checking into `try`/`catch` in most backends. - [orelse markers](/zIUkiDbUQ9mFSPENBoSXGg) aims to provide a way to preserve information necessary for this - *MaximalPath* provides an efficient, well-tested basic-block decomposition of a CFG. - Placing declarations well has proven to be a significant duty of *TmpLControlFlow* - The decompiler is used by *PseudoCode* to arrange *BlockTree* children so is on the debugging path. ## Current decompiler The current decompiler produced `if`/`while` and nested, labeled blocks. Labeled breaks allow forward jumping to the end of a containing block. It is followed by a pass that simplifies nested statement structures to eliminate unnecessary labels. It is complex: the beginning and end of a loop and an if have to correspond to nodes in a CFG; this involves: - converting the input CFG into an internal CFG - inserting nodes into that CFG so that each node serves one purpose. *ensureSingularityOfPurpose* - walking the CFG and inferring cuts where `if` and `while`s end. That same logic is used to infer where `try` and `catch` boundaries should go. That inference is not robust in the case where an external-to-Temper function lies about whether it can throw. ## Redesign The redesign builds on *MaximalPath*'s basic block destructuring. *MaximalPath* provides several useful elements. - A list of paths that collect all runs of statements that must follow together into one nice bundle. Each path corresponds to a *basic block*. - Edges between those paths with conditions filtered for reachable edges and for statically known predicates. - Explicit notation of edges that go back to the beginning of a loop: `continue` edges. - An order over the paths such that, ignoring loop `continue` edges, each path appears before the ones that it flows into. The redesigned decompiler outputs: - Nested `if`/`while`/`try` statements and blocks containing expression statements corresponding to *Tree*s. - Instead of labels it uses numbered `goto`/`comefrom` pairs. - All `goto`s are to `comefrom`s that lexically appear afterwards and to a `comefrom` in a block that contains the matching `goto`s. This means that `goto`/`comefrom` can be translated into labeled `break`s & labeled blocks. ### Rule 1 of decompiling: joining If a basic block has multiple preceders joining at it, end each preceder with a new `goto` to a `comefrom` in the common ancestor. ```ts= if (b) { if (c) { f(); // JOINING FROM HERE } else { while (true) { g(); if (done) { // AND JOINING FROM HERE } } } // CURRENT END OF BLOCK AFTER COMMON ANCESTOR } else { } ``` After joining we have `goto`/`comeFrom` like the below: ```diff= if (b) { if (c) { f(); + goto 1; } else { while (true) { g(); if (done) { + goto 1; } } } + goto 2; + { + comeFrom 1; + // PLACE TO INSERT INSTRUCTIONS AFTER JOIN. + } + comeFrom 2; } else { } ``` Rules of `comeFrom`/`goto` simplification: - If there is one `goto x` for a `comeFrom x`, any run of statements after the `comeFrom x` can move before the `goto x` as long as doing so would not move a nested `comeFrom y` before any intervening `goto y`. `goto 1; ...; comeFrom 1; f(); g()` ⇒ `f(); g(); goto 1; ...; comeFrom 1`. - If a `comeFrom x` is the next statement in flow order from a `goto x` then the `goto x` can be eliminated. `if (x) { f(); goto 1 }; comeFrom 1` ⇒ `if (x) { f() }; comeFrom 1` - If a `comeFrom x` has no corresponding `goto x` it can be replaced with a no-op. `if (x) { f() }; comeFrom 1` ⇒ `if (x) { f() }` ### Rule 2 of decompiling: loopbacks Until indicated otherwise, the *containing* block is the block which joins the incoming edges. (There is a pre-allocated *root block* which is the containing block for the entry edge and which is the output of the whole decompilation process.) If a basic block has an incoming `continue` edge, insert a `while (true) {}` into the tentative *containing block* and make the loop body the *containing block*. Later on, we can recognize patterns to turn `while (true)` into a more idiomatic loop kind: - `while (true) { ... if (!cond) break; }` ⇒ `do { ... } while (cond);` - `while (true) { if (!cond) break; ... }` ⇒ `while (cond) { ... }` ### Rule 3 of decompiling: output statements If a basic block contains statements, insert them at the end of the containing block. ### Rule 4 of decompiling: branching Let the continuing block be the containing block. For each following edge: - If it is conditional: 1. insert an `if` statement with the conditional into the continuing block. 2. Store the consequent block as the continuer for that edge. 3. Set the alternate block as the new continuing block - Else, set the continuing block as the continuer for that block. ### Status Mostly implemented building in a side branch. `goto`/`comeFrom` simplification not implemented. Using try/catch markers not implemented as those markers are not present yet. Some tests via a new *PseudoCodeDetail* bit, *useDecompiler2* which can be used in *TypeStageTest* cases by passing an explicit *PseudoCodeDetail* to *assertModuleAtStage*.