Optimizing the interpretation loop

# Optimizing the interpretation loop Period: <2020-10-24> -- <2020-11-10> After the optimizations related to gas update and exhaustion checking (documented here: https://hackmd.io/Q4ws_SZZTfu8aU3ESof4QA?edit), we now tackle the improvements of the interpretation loops. At this point, the evaluation of `fact 100` takes ~75μs. ## Preliminary experiments The current interpreter is written in the Lwt monad because some Michelson primitives require the Lwt monad. Even the evaluation of the instructions that do not require the monad are written in monadic style and as a consequence, there are a lot of Lwt.return and Lwt.bind where a direct computation is possible. There are many proposals from Michelson developers to optimize this: some advocate for using a FreeMonad because it offers good separation between pure and impure computations ; others suggest to implement the interpreter using a low-level language and by translating Michelson into a different untyped bytecode or assembly-like language before execution ; others claim that a small-step interpreter would better handle the alternation between pure and impure computations, etc... We decided to explore these proposals on a simplified Michelson, i.e., a subset of the current instruction set, realistic enough to expose the issues related to the primitives implemented in the Lwt monad. So we implemented most of these proposal (except the untyped one) and we run a microbenchmark on a Michelson script that computes `fact 100` and does an Lwt I/O at each step of this computation (100 Lwt I/Os). The resulting experiment is described here: https://gitlab.com/-/snippets/2034385 ## Rewriting the interpreter in the DCPS_ASTACK style The rewriting of the Michelson interpreter into DCPS_ASTACK style was done with no significant issues: https://gitlab.com/yrg/tezos/-/tree/yrg@optimize-interpreter For the moment, the typechecker still produces instructions in the original datatype. A translation from this datatype to the new one is available. It simply lifts the stack types to stack types with a unit element at the bottom: this ensures the decomposability of the stack into an accumulator and an extra stack. The translation is well-typed and justified with GADTs: this complexity will vanish when we will make the typechecker produce instructions in the new datatype. ## Next actions - Migrate the PayGas insertion to the new instruction datatype. - Integrate the gas-counter-as-step-argument optimization. - Rewrite the parser/typechecker. - Perform statistical analysis on execution traces to find superinstructions.