# LeanVM Compiler
> Inspired by the analysis documentation from Morgan Thomas: https://docs.google.com/document/d/1wh2WoMo3r4fSJWHr8OzS6ogJlLKQ6Ai_SIPFIh5eeTE/edit?tab=t.0
>
## Why do we need a compiler for leanVM?
Before diving into the compiler, let's understand its target. Unlike other zkVMs targeting real-time proving on the L1, the leanVM is not a general-purpose machine. It's designed with a single, crucial goal in mind: post-quantum signature aggregation on Ethereum.
* **The Task**: To prove the aggregation and recursive merging of many XMSS hash-based signatures. This boils down to proving a massive number of Poseidon2 hashes glued together with specific logic.
* **The Target ISA**: The compiler generates bytecode for the leanVM's Instruction Set Architecture (ISA), which is tailored for this task. Key features include:
* **Field**: The small and efficient KoalaBear prime field ($p = 2^{31} - 2^{24} + 1$, i.e. `0x7f000001`).
* **Memory**: A simple read-only memory model.
* **Registers**: A minimalist set with just a program counter (`pc`) and a frame pointer (`fp`).
* **Instructions**: A few core opcodes (`ADD`, `MUL`, `DEREF`, `JUMP`) and some precompiles for hashing (`POSEIDON`), cryptography (`DOT_PRODUCT`), and proof system internals (`MULTILINEAR_EVAL`).
The compiler's essential role is to translate human-readable logic into the VM's constrained, ZK-friendly bytecode. For a program of this scale, writing bytecode manually is impractically complex and prone to error, making a dedicated compiler the most viable approach. Crucially, the compiler is a distinct component that can be designed and developed in parallel with the core proof system.
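To make the shape of this ISA concrete, here is a minimal Rust sketch of how the instruction set might be modeled. The variant names follow the opcodes listed above, but the operand layout is an assumption made for illustration, not the actual leanVM encoding.

```rust
/// Illustrative model of the leanVM ISA. Variant names follow the
/// opcodes above; the operand layout is hypothetical.
/// Operands are memory slots addressed relative to the frame pointer `fp`.
#[allow(dead_code)]
enum Instruction {
    /// mem[fp + dst] = mem[fp + a] + mem[fp + b]
    Add { dst: u32, a: u32, b: u32 },
    /// mem[fp + dst] = mem[fp + a] * mem[fp + b]
    Mul { dst: u32, a: u32, b: u32 },
    /// mem[fp + dst] = mem[mem[fp + ptr] + offset]
    Deref { dst: u32, ptr: u32, offset: u32 },
    /// pc = dest
    Jump { dest: u32 },
    /// Precompile: Poseidon2 permutation over KoalaBear elements.
    Poseidon { input: u32, output: u32 },
    /// Precompile: dot product of two vectors of length `len`.
    DotProduct { dst: u32, a: u32, b: u32, len: u32 },
    /// Precompile: evaluate a multilinear polynomial at a point.
    MultilinearEval { dst: u32, poly: u32, point: u32, n_vars: u32 },
}
```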
## Writing Code in the leanSig DSL
The high-level language for the leanVM is the leanSig DSL. It's designed to provide familiar programming constructs while mapping cleanly to the underlying ZK constraints.
* **Syntax & Semantics**: It has a Rust-like syntax and C-like semantics.
* **Core Type**: The only first-class type is a scalar, representing a base field element.
* **Features**:
* Basic arithmetic (`+`, `*`, `/`, `-`).
* Control flow: `if`, `match`, `for` loops, and function calls.
* Memory management via `malloc` and `malloc_vec` for dynamic allocation.
* Direct access to VM features through keywords like `poseidon16`, `dot_product`, and `decompose_bits`.
* **Optimization Hints**: The programmer can guide the compiler with annotations like `#[inline]`, `#[unroll]`, and `const` function arguments.
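Putting these features together, a leanSig program might look something like this. The snippet below is a hypothetical sketch based only on the description above (Rust-like syntax, scalar-only values, precompile keywords); the real grammar may well differ.

```
// Hypothetical leanSig-style code; the exact syntax is illustrative.
// `n` is a `const` argument, so the compiler specializes this function
// for each distinct value of `n` at its call sites.
#[inline]
fn hash_chain(start, const n) {
    let acc = start;
    #[unroll]
    for i in 0..n {
        // Direct access to the Poseidon2 precompile via a keyword.
        acc = poseidon16(acc, i);
    }
    return acc;
}
```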
## Four-Stage Compilation Pipeline
The leanSig compiler transforms DSL code into executable bytecode through a four-stage process:
1. **Parsing**: The source code is parsed into an Abstract Syntax Tree (AST). This stage performs basic validation, such as checking that referenced functions are actually defined.
2. **Simplification**: The AST is transformed into a simpler, more canonical form. This is where high-level language features are "desugared":
* Functions marked `#[inline]` are inlined.
* Functions with `const` arguments are specialized for each call site.
* Constant expressions are evaluated at compile time (constant folding).
* `for` loops are rewritten into recursive functions (a ZK-friendly pattern; see the sketch after this list).
3. **Intermediate Code Generation**: The simplified AST is compiled into an intermediate bytecode. This stage is important for memory layout, as it's where stack frames are planned and high-level constructs are translated into a sequence of low-level operations.
4. **Final Code Generation**: The intermediate representation is converted into the final, low-level bytecode. This stage lays out the instructions in memory, materializes constants, and inserts the necessary entry and finalization code.
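As an example of the loop desugaring performed in the simplification stage, here is a before/after sketch written in plain Rust (standing in for leanSig, whose scalar-only types would make the details slightly different):

```rust
// Before: a `for` loop whose bound is not known at compile time.
fn sum(values: &[u64]) -> u64 {
    let mut acc = 0;
    for i in 0..values.len() {
        acc += values[i];
    }
    acc
}

// After: the loop body becomes a tail-recursive helper. Each iteration
// runs in a fresh stack frame, so no memory cell is ever overwritten,
// and control flow reduces to ordinary calls and returns.
fn sum_desugared(values: &[u64]) -> u64 {
    sum_rec(values, 0, 0)
}

fn sum_rec(values: &[u64], i: usize, acc: u64) -> u64 {
    if i == values.len() {
        return acc;
    }
    sum_rec(values, i + 1, acc + values[i])
}
```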
## Compiler Challenges & ZK-Friendly Strategies
Compiling for a zkVM isn't like compiling for a normal CPU. Every choice has consequences for the proving cost. Let's discuss some of the interesting strategies used in the leanVM compiler.
### Handling Control Flow
* **Loops**: How do you handle a `for` loop when the machine can't make arbitrary jumps? The leanSig compiler uses a standard ZK trick:
* If the number of iterations is small and known at compile time, the loop is unrolled.
* Otherwise, the loop is automatically rewritten as a recursive function. This replaces dynamic jumps with function calls, which have a predictable memory and instruction layout.
* **Switch Statements**: To implement a `match` or `switch` statement, the compiler uses an arithmetic trick instead of a jump table. It places the code for each branch at regular intervals in memory and computes the jump destination directly: `dest = base_address + branch_index * block_size`. (Note: for soundness, this requires the program to assert `branch_index < total_number_of_branches`; see the sketch after this list.)
* **Functions & The Stack**: With a read-only memory model, the stack can't be reused in the traditional sense. When a function is called:
* A fresh memory region (the new stack frame) is allocated.
* The old `pc` and `fp` are saved to the top of the new frame.
* The `fp` register is updated to point to this new frame.
* Interestingly, the allocation pointer (`ap`) is a Prover-only concept used during execution to find free memory. It doesn't need to be part of the VM state or the proof, simplifying the AIR.
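Below is a minimal sketch of the arithmetic `match` dispatch described above, written as plain Rust over integers to show the address computation and the soundness check; the constants and layout are illustrative.

```rust
/// Compute the jump destination for a `match` whose branch bodies are
/// laid out at regular `block_size` intervals starting at `base_address`.
fn switch_dest(base_address: u32, branch_index: u32, block_size: u32, n_branches: u32) -> u32 {
    // Soundness: without this range check, an out-of-range index would
    // let execution jump into arbitrary code.
    assert!(branch_index < n_branches);
    base_address + branch_index * block_size
}

fn main() {
    // Example: 4 branches of 8 instructions each, starting at address 100.
    // branch_index = 2 lands at 100 + 2 * 8 = 116.
    assert_eq!(switch_dest(100, 2, 8, 4), 116);
}
```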
### Memory Management & Hints
* **`malloc`**: How does `malloc` work with read-only memory and no OS?
* The compiler distinguishes between compile-time known sizes (stack allocated) and dynamic sizes (heap allocated).
* For dynamic sizes, it emits a `RequestMemory` hint. This hint is not an instruction and doesn't go into the trace. Instead, it tells the *execution engine* (the Prover) to allocate memory from a heap pointer.
* The execution engine runs the program twice: a first pass computes the total required heap size, and a second pass performs the real execution with the memory correctly laid out.
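A minimal sketch of this two-pass scheme, assuming the execution engine serves `RequestMemory` hints from a simple bump allocator (the engine's real interface is not described here, so the names are hypothetical):

```rust
/// Hypothetical bump allocator the execution engine could use to serve
/// `RequestMemory` hints in write-once memory.
struct Heap {
    next_free: usize,
    limit: usize,
}

impl Heap {
    /// Serve a RequestMemory hint: hand out `size` fresh cells.
    fn request_memory(&mut self, size: usize) -> usize {
        let ptr = self.next_free;
        self.next_free += size;
        assert!(self.next_free <= self.limit, "heap exhausted");
        ptr
    }
}

fn main() {
    // Pass 1: run with an effectively unbounded heap, only to measure
    // how much memory the program requests in total.
    let mut sizing = Heap { next_free: 0, limit: usize::MAX };
    sizing.request_memory(64);
    sizing.request_memory(128);
    let required = sizing.next_free; // 192 cells in this example

    // Pass 2: re-run with the heap sized exactly from the measurement,
    // so the final memory layout is fixed before proving.
    let mut heap = Heap { next_free: 0, limit: required };
    assert_eq!(heap.request_memory(64), 0);
    assert_eq!(heap.request_memory(128), 64);
}
```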
### Open Questions for Discussion
* The "unroll or recurse" strategy for loops is common. What are the performance trade-offs? When does recursion become cheaper than unrolling a large loop?
* The two-pass execution for `malloc` is an interesting solution. Are there alternative ways to handle dynamic memory allocation in a zkVM? What are their pros and cons?
## Correctness and Performance
The technical analysis of the leanVM compiler highlights two critical, long-term goals.
### Proving Compiler Correctness
How do we trust that the compiler doesn't introduce bugs? A full formal verification of the compiler would give the strongest guarantee, but a more pragmatic approach is to build a **verifying compiler**.
* **Verified Compiler**: Comes with a proof that compilation is correct for *all possible* source programs.
* **Verifying Compiler**: For *each specific program it compiles*, it produces a proof of correctness alongside the bytecode.
A possible recommendation for the leanVM is to pursue the verifying-compiler path, likely by porting the compiler to a proof assistant like Coq or Lean. This requires creating formal mathematical definitions of the DSL and ISA semantics.
### Optimizing for Proving Cost
Compiler optimizations are usually about execution speed. Here, the goal is to minimize the cost of generating a proof. This requires a different mindset.
* **Cost Model**: Since generating a proof is expensive, the optimizer can't just try a compilation and measure the result. It needs a cheap cost model: a function that *estimates* the proving cost based on metrics like bytecode size, cycle count, memory footprint, and opcode frequency (see the sketch after this list).
* **Combinatorial Optimization**: Choosing whether to inline a function or unroll a loop is a complex decision. The best choice depends on the context of the entire program. The analysis suggests treating the DSL annotations (`#[inline]`, `#[unroll]`) as *suggestions* for a combinatorial optimization search that tries to find the best global configuration for minimizing the cost model.
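To make the cost-model idea concrete, here is a deliberately naive linear estimator in Rust. The metrics are the ones named above; the weights are placeholders that would have to be fitted against real proving benchmarks, not measured leanVM constants.

```rust
/// Cheap, statically computable metrics for a compiled program.
struct ProgramMetrics {
    bytecode_size: u64,
    cycle_count: u64,
    memory_footprint: u64,
    poseidon_calls: u64, // opcode frequency for the dominant precompile
}

/// Naive linear cost model: estimated proving cost as a weighted sum.
/// The weights are placeholders, not fitted leanVM constants.
fn estimated_proving_cost(m: &ProgramMetrics) -> f64 {
    1.0 * m.cycle_count as f64
        + 0.2 * m.bytecode_size as f64
        + 0.1 * m.memory_footprint as f64
        + 5.0 * m.poseidon_calls as f64
}

fn main() {
    // The optimizer can compare candidate configurations (e.g. inlined
    // vs. not) by evaluating the estimate instead of generating proofs.
    let inlined = ProgramMetrics {
        bytecode_size: 900, cycle_count: 4_000,
        memory_footprint: 2_000, poseidon_calls: 64,
    };
    let outlined = ProgramMetrics {
        bytecode_size: 600, cycle_count: 4_600,
        memory_footprint: 2_400, poseidon_calls: 64,
    };
    println!("inlined:  {}", estimated_proving_cost(&inlined));
    println!("outlined: {}", estimated_proving_cost(&outlined));
}
```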
### Open Questions for Discussion
* What are the most important factors for a ZK proving cost model? Is it trace length, constraint degree, or something else?
* How can we balance the complexity of advanced compiler optimizations against the need for the compiler itself to be simple and verifiable?
* The leanVM ISA is minimal. What single instruction, if added, would provide the biggest performance improvement for the XMSS signature program? (e.g. bitwise operations?)