# LeanVM Compiler
> Inspired by the analysis documentation from Morgan Thomas: https://docs.google.com/document/d/1wh2WoMo3r4fSJWHr8OzS6ogJlLKQ6Ai_SIPFIh5eeTE/edit?tab=t.0
>
## Why do we need a compiler for leanVM?
Before diving into the compiler, let's understand its target. Unlike other zkVMs targeting real-time proving on the L1, the leanVM is not a general-purpose machine. It's designed with a single, crucial goal in mind: post-quantum signature aggregation on Ethereum.
* **The Task**: To prove the aggregation and recursive merging of many XMSS hash-based signatures. This boils down to proving a massive number of Poseidon2 hashes glued together with specific logic.
* **The Target ISA**: The compiler generates bytecode for the leanVM's Instruction Set Architecture (ISA), which is tailored for this task. Key features include:
* **Field**: The small and efficient KoalaBear prime field ($p = 2^{31} - 2^{24} + 1$, i.e. `0x7f000001`).
* **Memory**: A simple read-only memory model.
* **Registers**: A minimalist set with just a program counter (`pc`) and a frame pointer (`fp`).
* **Instructions**: A few core opcodes (`ADD`, `MUL`, `DEREF`, `JUMP`) and some precompiles for hashing (`POSEIDON`), cryptography (`DOT_PRODUCT`), and proof system internals (`MULTILINEAR_EVAL`).
The compiler's essential role is to translate human-readable logic into the VM's constrained, ZK-friendly bytecode. For a program of this scale, writing bytecode manually is impractically complex and prone to error, making a dedicated compiler the most viable approach. Crucially, the compiler is a distinct component that can be designed and developed in parallel with the core proof system.
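To make the shape of this ISA concrete, here is a minimal Rust sketch of how the instruction set might be modeled. The variant names follow the opcodes listed above, but the operand layout is an assumption made for illustration, not the actual leanVM encoding.

```rust
/// Illustrative model of the leanVM ISA. Variant names follow the
/// opcodes above; the operand layout is hypothetical.
/// Operands are memory slots addressed relative to the frame pointer `fp`.
#[allow(dead_code)]
enum Instruction {
    /// mem[fp + dst] = mem[fp + a] + mem[fp + b]
    Add { dst: u32, a: u32, b: u32 },
    /// mem[fp + dst] = mem[fp + a] * mem[fp + b]
    Mul { dst: u32, a: u32, b: u32 },
    /// mem[fp + dst] = mem[mem[fp + ptr] + offset]
    Deref { dst: u32, ptr: u32, offset: u32 },
    /// pc = dest
    Jump { dest: u32 },
    /// Precompile: Poseidon2 permutation over KoalaBear elements.
    Poseidon { input: u32, output: u32 },
    /// Precompile: dot product of two vectors of length `len`.
    DotProduct { dst: u32, a: u32, b: u32, len: u32 },
    /// Precompile: evaluate a multilinear polynomial at a point.
    MultilinearEval { dst: u32, poly: u32, point: u32, n_vars: u32 },
}
```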
## Writing Code in the leanSig DSL
The high-level language for the leanVM is the leanSig DSL. It's designed to provide familiar programming constructs while mapping cleanly to the underlying ZK constraints.
* **Syntax & Semantics**: It has a Rust-like syntax and C-like semantics.
* **Core Type**: The only first-class type is a scalar, representing a base field element.
* **Features**:
* Basic arithmetic (`+`, `*`, `/`, `-`).
* Control flow: `if`, `match`, `for` loops, and function calls.
* Memory management via `malloc` and `malloc_vec` for dynamic allocation.
* Direct access to VM features through keywords like `poseidon16`, `dot_product`, and `decompose_bits`.
* **Optimization Hints**: The programmer can guide the compiler with annotations like `#[inline]`, `#[unroll]`, and `const` function arguments.
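Putting these features together, a leanSig program might look something like this. The snippet below is a hypothetical sketch based only on the description above (Rust-like syntax, scalar-only values, precompile keywords); the real grammar may well differ.

```
// Hypothetical leanSig-style code; the exact syntax is illustrative.
// `n` is a `const` argument, so the compiler specializes this function
// for each distinct value of `n` at its call sites.
#[inline]
fn hash_chain(start, const n) {
    let acc = start;
    #[unroll]
    for i in 0..n {
        // Direct access to the Poseidon2 precompile via a keyword.
        acc = poseidon16(acc, i);
    }
    return acc;
}
```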
## Four-Stage Compilation Pipeline
The leanSig compiler transforms DSL code into executable bytecode through a four-stage process:
1. **Parsing**: The source code is parsed into an Abstract Syntax Tree (AST). This stage performs basic validation, such as checking that referenced functions are actually defined.
2. **Simplification**: The AST is transformed into a simpler, more canonical form. This is where high-level language features are "desugared":
* Functions marked `#[inline]` are inlined.
* Functions with `const` arguments are specialized for each call site.
* Constant expressions are evaluated at compile time (constant folding).
* `for` loops are rewritten into recursive functions (a ZK-friendly pattern; see the sketch after this list).
3. **Intermediate Code Generation**: The simplified AST is compiled into an intermediate bytecode. This stage is important for memory layout, as it's where stack frames are planned and high-level constructs are translated into a sequence of low-level operations.
4. **Final Code Generation**: The intermediate representation is converted into the final, low-level bytecode. This stage lays out the instructions in memory, materializes constants, and inserts the necessary entry and finalization code.
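As an example of the loop desugaring performed in the simplification stage, here is a before/after sketch written in plain Rust (standing in for leanSig, whose scalar-only types would make the details slightly different):

```rust
// Before: a `for` loop whose bound is not known at compile time.
fn sum(values: &[u64]) -> u64 {
    let mut acc = 0;
    for i in 0..values.len() {
        acc += values[i];
    }
    acc
}

// After: the loop body becomes a tail-recursive helper. Each iteration
// runs in a fresh stack frame, so no memory cell is ever overwritten,
// and control flow reduces to ordinary calls and returns.
fn sum_desugared(values: &[u64]) -> u64 {
    sum_rec(values, 0, 0)
}

fn sum_rec(values: &[u64], i: usize, acc: u64) -> u64 {
    if i == values.len() {
        return acc;
    }
    sum_rec(values, i + 1, acc + values[i])
}
```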
## Compiler Challenges & ZK-Friendly Strategies
Compiling for a zkVM isn't like compiling for a normal CPU. Every choice has consequences for the proving cost. Let's discuss some of the interesting strategies used in the leanVM compiler.
### Handling Control Flow
* **Loops**: How do you handle a `for` loop when the machine can't make arbitrary jumps? The leanSig compiler uses a standard ZK trick:
* If the number of iterations is small and known at compile time, the loop is unrolled.
* Otherwise, the loop is automatically rewritten as a recursive function. This replaces dynamic jumps with function calls, which have a predictable memory and instruction layout.
* **Switch Statements**: To implement a `match` or `switch` statement, the compiler uses an arithmetic trick instead of a jump table. It places the code for each branch at regular intervals in memory and computes the jump destination directly: `dest = base_address + branch_index * block_size`. (Note: for soundness, this requires the program to assert `branch_index < total_number_of_branches`; see the sketch after this list.)
* **Functions & The Stack**: With a read-only memory model, the stack can't be reused in the traditional sense. When a function is called:
* A fresh memory region (the new stack frame) is allocated.
* The old `pc` and `fp` are saved to the top of the new frame.
* The `fp` register is updated to point to this new frame.
* Interestingly, the allocation pointer (`ap`) is a Prover-only concept used during execution to find free memory. It doesn't need to be part of the VM state or the proof, simplifying the AIR.
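Below is a minimal sketch of the arithmetic `match` dispatch described above, written as plain Rust over integers to show the address computation and the soundness check; the constants and layout are illustrative.

```rust
/// Compute the jump destination for a `match` whose branch bodies are
/// laid out at regular `block_size` intervals starting at `base_address`.
fn switch_dest(base_address: u32, branch_index: u32, block_size: u32, n_branches: u32) -> u32 {
    // Soundness: without this range check, an out-of-range index would
    // let execution jump into arbitrary code.
    assert!(branch_index < n_branches);
    base_address + branch_index * block_size
}

fn main() {
    // Example: 4 branches of 8 instructions each, starting at address 100.
    // branch_index = 2 lands at 100 + 2 * 8 = 116.
    assert_eq!(switch_dest(100, 2, 8, 4), 116);
}
```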
### Memory Management & Hints
* **`malloc`**: How does `malloc` work with read-only memory and no OS?
* The compiler distinguishes between compile-time known sizes (stack allocated) and dynamic sizes (heap allocated).
* For dynamic sizes, it emits a `RequestMemory` hint. This hint is not an instruction and doesn't go into the trace. Instead, it tells the *execution engine* (the Prover) to allocate memory from a heap pointer.
* The execution engine runs the program twice: a first pass computes the total required heap size, and a second pass performs the real execution with the memory correctly laid out.
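A minimal sketch of this two-pass scheme, assuming the execution engine serves `RequestMemory` hints from a simple bump allocator (the engine's real interface is not described here, so the names are hypothetical):

```rust
/// Hypothetical bump allocator the execution engine could use to serve
/// `RequestMemory` hints in write-once memory.
struct Heap {
    next_free: usize,
    limit: usize,
}

impl Heap {
    /// Serve a RequestMemory hint: hand out `size` fresh cells.
    fn request_memory(&mut self, size: usize) -> usize {
        let ptr = self.next_free;
        self.next_free += size;
        assert!(self.next_free <= self.limit, "heap exhausted");
        ptr
    }
}

fn main() {
    // Pass 1: run with an effectively unbounded heap, only to measure
    // how much memory the program requests in total.
    let mut sizing = Heap { next_free: 0, limit: usize::MAX };
    sizing.request_memory(64);
    sizing.request_memory(128);
    let required = sizing.next_free; // 192 cells in this example

    // Pass 2: re-run with the heap sized exactly from the measurement,
    // so the final memory layout is fixed before proving.
    let mut heap = Heap { next_free: 0, limit: required };
    assert_eq!(heap.request_memory(64), 0);
    assert_eq!(heap.request_memory(128), 64);
}
```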
### Open Questions for Discussion
* The "unroll or recurse" strategy for loops is common. What are the performance trade-offs? When does recursion become cheaper than unrolling a large loop?
* The two-pass execution for `malloc` is an interesting solution. Are there alternative ways to handle dynamic memory allocation in a zkVM? What are their pros and cons?
## Correctness and Performance
The technical analysis of the leanVM compiler highlights two critical, long-term goals.
### Proving Compiler Correctness
How do we trust that the compiler doesn't introduce bugs? A full formal verification of the compiler would give the strongest guarantee, but a more pragmatic approach is to build a **verifying compiler**.
* **Verified Compiler**: Comes with a proof that compilation is correct for *all possible* source programs.
* **Verifying Compiler**: For *each specific program it compiles*, it produces a proof of correctness alongside the bytecode.
A possible recommendation for the leanVM is to pursue the verifying-compiler path, likely by porting the compiler to a proof assistant like Coq or Lean. This requires creating formal mathematical definitions of the DSL and ISA semantics.
### Optimizing for Proving Cost
Compiler optimizations are usually about execution speed. Here, the goal is to minimize the cost of generating a proof. This requires a different mindset.
* **Cost Model**: Since generating a proof is expensive, the optimizer can't just try a compilation and measure the result. It needs a cheap cost model: a function that *estimates* the proving cost based on metrics like bytecode size, cycle count, memory footprint, and opcode frequency (see the sketch after this list).
* **Combinatorial Optimization**: Choosing whether to inline a function or unroll a loop is a complex decision. The best choice depends on the context of the entire program. The analysis suggests treating the DSL annotations (`#[inline]`, `#[unroll]`) as *suggestions* for a combinatorial optimization search that tries to find the best global configuration for minimizing the cost model.
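To make the cost-model idea concrete, here is a deliberately naive linear estimator in Rust. The metrics are the ones named above; the weights are placeholders that would have to be fitted against real proving benchmarks, not measured leanVM constants.

```rust
/// Cheap, statically computable metrics for a compiled program.
struct ProgramMetrics {
    bytecode_size: u64,
    cycle_count: u64,
    memory_footprint: u64,
    poseidon_calls: u64, // opcode frequency for the dominant precompile
}

/// Naive linear cost model: estimated proving cost as a weighted sum.
/// The weights are placeholders, not fitted leanVM constants.
fn estimated_proving_cost(m: &ProgramMetrics) -> f64 {
    1.0 * m.cycle_count as f64
        + 0.2 * m.bytecode_size as f64
        + 0.1 * m.memory_footprint as f64
        + 5.0 * m.poseidon_calls as f64
}

fn main() {
    // The optimizer can compare candidate configurations (e.g. inlined
    // vs. not) by evaluating the estimate instead of generating proofs.
    let inlined = ProgramMetrics {
        bytecode_size: 900, cycle_count: 4_000,
        memory_footprint: 2_000, poseidon_calls: 64,
    };
    let outlined = ProgramMetrics {
        bytecode_size: 600, cycle_count: 4_600,
        memory_footprint: 2_400, poseidon_calls: 64,
    };
    println!("inlined:  {}", estimated_proving_cost(&inlined));
    println!("outlined: {}", estimated_proving_cost(&outlined));
}
```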
### Open Questions for Discussion
* What are the most important factors for a ZK proving cost model? Is it trace length, constraint degree, or something else?
* How can we balance the complexity of advanced compiler optimizations against the need for the compiler itself to be simple and verifiable?
* The leanVM ISA is minimal. What single instruction, if added, would provide the biggest performance improvement for the XMSS signature program? (e.g. bitwise operations?)