SP1 Precompiles 101

Special thanks to [John Guibas (Co-Founder, Succinct Labs)](https://x.com/jtguibas) for explaining the Multi-table FRI technique to us.

## 1. Precompile Execution

Precompiles are invoked via the RISC‑V `ecall` opcode, which acts as a syscall instruction. When a precompile is called, a dedicated custom circuit is executed, and its result is stored in fixed registers/memory. Subsequent execution can access these registers/memory to retrieve the precompile's result.

![image](https://hackmd.io/_uploads/rkHyuXocJg.png)

**Detailed Implementation:**

1. **`execute()`** ([Source](https://github.com/succinctlabs/sp1/blob/c97ccf2eb7d14bd3559e41c9717657491b0d1296/crates/core/executor/src/executor.rs#L1931))
2. **`execute_cycle()`** ([Source](https://github.com/succinctlabs/sp1/blob/c97ccf2eb7d14bd3559e41c9717657491b0d1296/crates/core/executor/src/executor.rs#L1634))
3. **`execute_instruction()`** ([Source](https://github.com/succinctlabs/sp1/blob/c97ccf2eb7d14bd3559e41c9717657491b0d1296/crates/core/executor/src/executor.rs#L1291))

   ```rust
   // Excerpt from execute_instruction(); other branches are elided.
   if instruction.is_alu_instruction() {
       (a, b, c) = self.execute_alu(instruction);
   }
   // ...
   else if instruction.is_ecall_instruction() {
       // Precompiles are dispatched here.
       (a, b, c, clk, next_pc, syscall, exit_code) = self.execute_ecall()?;
   }
   // ...
   ```

4. **`execute_ecall()`** ([Source](https://github.com/succinctlabs/sp1/blob/c97ccf2eb7d14bd3559e41c9717657491b0d1296/crates/core/executor/src/executor.rs#L1538))

The process within `execute_ecall()` is as follows:

- **Read registers:** Retrieve the syscall ID from register x5 and the arguments from x10 and x11.
- **Mode & validation:** Check the execution mode, update reports, and validate the call.
- **Invoke syscall:** Obtain and execute the corresponding syscall implementation:

  ```rust
  syscall_impl.execute(&mut precompile_rt, syscall, b, c);
  ```

- **Post-call register re-read:** For specific syscalls (e.g., `EXIT_UNCONSTRAINED`), re-read register values.
- **Write-back & clock update:** Write the execution result back to registers and add extra cycles to the clock.
- **Return tuple:** The returned tuple contains:
  - `a`: Syscall execution result (the value to write back)
  - `b`: (Possibly re-read) value of x10
  - `c`: (Possibly re-read) value of x11
  - `clk`: Clock value prior to syscall execution
  - `precompile_next_pc`: The program counter for the next instruction
  - `syscall`: The code of the executed syscall
  - `returned_exit_code`: The exit code set within the syscall

**CPU Cycle Calculation:**

The cost for each precompile call consists mainly of:

- The cost of executing the `ecall` instruction itself (equivalent to 1 cycle).
- Memory access and register setup/cleanup for preparing and processing the data (ranging from a few cycles to several hundred cycles).
- The execution cost of the dedicated precompile custom circuit.

SP1 also uses a Cycle Tracker to measure how many cycles each part of the program consumes. For details, see the [Cycle Tracking documentation](https://docs.succinct.xyz/docs/sp1/writing-programs/cycle-tracking#:~:text=When%20writing%20a%20program%2C%20it,a%20portion%20of%20the%20program) on Succinct Docs.

The timing for calculating clock cycles in SP1 is summarized in the table below. In total, the cost is computed as

```
4 * (num_instructions) + syscall_cycles
```

where `syscall_cycles` represents the number of cycles required for the precompile computation.

| Function              | Cycles added                                              |
| --------------------- | --------------------------------------------------------- |
| `execute_instruction` | 4                                                         |
| `execute_ecall`       | `precompile_cycles` (`syscall_impl.num_extra_cycles()`)   |

For comparison, the cycle count table for RISC Zero can be found [here](https://dev.risczero.com/api/zkvm/optimization#rv32im-operations-with-cycle-counts).

## 2. Proving Precompiles

SP1 uses a vRAM-style approach of grouping the traces into chips for each opcode type.
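As a rough illustration of this per-type grouping, the sketch below buckets a flat list of events into one table per event kind. It is a toy model, not SP1's code: `EventKind`, `Event`, and `group_into_tables` are hypothetical names, and a real trace row carries many columns rather than a single clock value.

```rust
use std::collections::BTreeMap;

/// Hypothetical event kinds; SP1's real ExecutionRecord tracks many more.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum EventKind {
    Add,
    Memory,
    Syscall,
    ShaExtend,
}

/// A toy event: its kind plus the clock value at which it occurred.
#[derive(Debug, Clone, Copy)]
struct Event {
    kind: EventKind,
    clk: u32,
}

/// Group events by kind, producing one list of rows (a "table") per kind.
/// Kinds that never occur in the execution get no table at all.
fn group_into_tables(events: &[Event]) -> BTreeMap<EventKind, Vec<u32>> {
    let mut tables: BTreeMap<EventKind, Vec<u32>> = BTreeMap::new();
    for e in events {
        tables.entry(e.kind).or_default().push(e.clk);
    }
    tables
}
```

In this model, an execution that never calls a given precompile simply has no table for it, which is the intuition behind paying for a chip only when it is used.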
The first step is to construct chips for each type of event recorded in the [ExecutionRecord](https://github.com/succinctlabs/sp1/blob/595cf0ea29b515bdf2e471a4afdcecafcfbc033f/crates/core/executor/src/record.rs#L29). These chips (e.g. AddChip, MemoryChip, SyscallChip, etc.) are built by grouping events by type, and each chip defines its own set of constraints and columns that serve as the basic unit for generating the AIR circuit. For example, an AddEvent corresponds to an AddChip.

The overall proof cost in SP1 is calculated on a per-chip basis, as described in [crates/core/machine/src/riscv/mod.rs](https://github.com/succinctlabs/sp1/blob/595cf0ea29b515bdf2e471a4afdcecafcfbc033f/crates/core/machine/src/riscv/mod.rs#L194). For each chip, the proof proceeds as follows:

1. `generate_dependencies()`: Generates an ExecutionRecord that indicates additional dependencies.
2. Observe the preprocessed commitment.
3. For each record:
   - `generate_trace()`
   - `commit()`
   - `open()`

If `generate_dependencies()` returns a new ExecutionRecord with additional dependencies, a new chip is constructed based on that record, and the proving process repeats.

![image](https://hackmd.io/_uploads/S1Hdww39Jg.png)

---

**How Precompile Costs Are Paid**

The `ecall` instruction that invokes a precompile corresponds to a SyscallEvent and a `SyscallChip` (e.g., `ShaExtendChip`). In addition, two extra events are generated:

1. **CpuEvent**: This event is generated for each instruction cycle (see [here](https://github.com/succinctlabs/sp1/blob/595cf0ea29b515bdf2e471a4afdcecafcfbc033f/crates/core/executor/src/executor.rs#L1015)). The corresponding `CpuChip` proves the consistency of each operand, clock, and shard information in the execution trace.
2. **GlobalEvent**: This event is produced within the SyscallChip's `generate_dependencies()` (see [here](https://github.com/succinctlabs/sp1/blob/595cf0ea29b515bdf2e471a4afdcecafcfbc033f/crates/core/machine/src/syscall/chip.rs#L84)).
   The corresponding [GlobalChip](https://github.com/succinctlabs/sp1/blob/595cf0ea29b515bdf2e471a4afdcecafcfbc033f/crates/core/machine/src/global/mod.rs) proves that the interactions between instructions are correctly recorded in order and that the cumulative calculations are executed properly.

Because these dependencies are generated via `generate_dependencies()`, the cost of executing a precompile is incurred only when the precompile is actually invoked.

---

**Multi-table FRI: Avoiding Per-Cycle Commitment for Precompile I/O**

SP1 uses a technique called "Multi-table FRI" to avoid committing precompile I/O at every cycle. Each chip maintains a Boolean flag, `is_real`, which is set to true only when the chip is actively used. The state of this flag is enforced by a table circuit known as the [MultiBuilder](https://github.com/succinctlabs/sp1/blob/595cf0ea29b515bdf2e471a4afdcecafcfbc033f/crates/recursion/core/src/air/multi_builder.rs#L20). This proof runs as part of the recursion process, ensuring that I/O commitment occurs only for chips that are in use.

Furthermore, because table sizes are controlled per chip, each chip's table can have a variable length. Additionally, by using the [Plonky3 Merkle Tree](https://github.com/Plonky3/Plonky3/blob/main/merkle-tree/src/merkle_tree.rs#L20-L22), multiple commitments can be generated simultaneously.

---

**How to make PrecompileChip efficient?**

Rather than expressing precompiles via the general-purpose CPU circuit, SP1 offloads them to dedicated STARK subcircuits. Key optimizations include:

- **Low-Degree Constraints:** Bitwise operations (XOR, AND, NOT) are expressed using constraints of as low a degree as possible.
- **64-bit Integer Handling:** 64-bit integers are decomposed into four elements (each less than 31 bits) to prevent overflow and to allow for parallelized, simplified arithmetic.
- **Local State Management:** During the precompile computation, no memory accesses or register writes occur; the entire intermediate state is maintained within the circuit's columns.

---

**Examples of Implementation:**

- **Keccak256:** [Source Repository](https://github.com/succinctlabs/sp1/tree/dev/crates/core/machine/src/syscall/precompiles/keccak256)
- **SHA2Extend:** [Source Repository](https://github.com/succinctlabs/sp1/tree/dev/crates/core/machine/src/syscall/precompiles/sha256/extend)

---

## 3. Specifications and Implementation

**Overview of the Official Specification:**

Overall code flow:

![image](https://hackmd.io/_uploads/Bkg6qf65kl.png)

Internally, precompiles are implemented as custom STARK tables that are optimized for specific computations. For a concise overview, see the [Precompiles documentation](https://docs.succinct.xyz/docs/sp1/writing-programs/precompiles#:~:text=Precompiles%20are%20built%20into%20the,a%20few%20orders%20of%20magnitude).

SP1's modular structure makes it easy to add new precompiles. Within the zkVM, they are exposed as syscalls via the `ecall` instruction with unique syscall numbers and dedicated computation interfaces. On GitHub, you can find:

- Event definitions for precompiles under `crates/core/executor/src/events/precompiles/`
- Proof logic (AIR/chip implementations) for each precompile under `crates/core/machine/src/syscall/precompiles/`

**Example – SHA-256 Extend Implementation:**

- **Event Definition:** The `ShaExtendEvent` structure captures input message blocks (512 bits), intermediate values, and memory access information during precompile invocation.
- **Chip Implementation:** The corresponding `ShaExtendChip` defines the algebraic constraints for the SHA-256 message schedule, e.g., verifying that the following holds:

  ```
  w_i = w_{i-16} + sigma0(w_{i-15}) + w_{i-7} + sigma1(w_{i-2})
  ```
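For reference, the recurrence above can be written as plain Rust. This is a minimal sketch of the standard SHA-256 message-schedule expansion (per FIPS 180-4), i.e. the computation the chip constrains; it is not SP1's chip or syscall code, and the function name `sha256_extend` is chosen here for illustration. Additions are modulo 2^32.

```rust
/// SHA-256 "small sigma 0": rotr(x, 7) ^ rotr(x, 18) ^ (x >> 3).
fn sigma0(x: u32) -> u32 {
    x.rotate_right(7) ^ x.rotate_right(18) ^ (x >> 3)
}

/// SHA-256 "small sigma 1": rotr(x, 17) ^ rotr(x, 19) ^ (x >> 10).
fn sigma1(x: u32) -> u32 {
    x.rotate_right(17) ^ x.rotate_right(19) ^ (x >> 10)
}

/// Expand one 512-bit block (16 big-endian words) into the 64-word
/// message schedule. All additions wrap modulo 2^32.
fn sha256_extend(block: &[u32; 16]) -> [u32; 64] {
    let mut w = [0u32; 64];
    w[..16].copy_from_slice(block);
    for i in 16..64 {
        w[i] = w[i - 16]
            .wrapping_add(sigma0(w[i - 15]))
            .wrapping_add(w[i - 7])
            .wrapping_add(sigma1(w[i - 2]));
    }
    w
}
```

Each of the 48 loop iterations corresponds to one instance of the recurrence, which is why a dedicated table with a fixed row layout suits this computation well.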
The prover uses this chip to efficiently verify the computation, ensuring that the overall VM behavior is equivalent to having executed a correct SHA-256 calculation.

When an `Opcode::ECALL` is encountered in `execute_instruction()`, a **PrecompileEvent** is recorded via `self.add_precompile_event(...)`, and the precompile's dedicated chip validates the computation. From the user's perspective, calling a function like `syscall_sha256_extend` automatically performs part of the SHA-256 calculation and makes its result available for subsequent operations.

**Additional Utilities:**

- **Unconstrained Mode:** Functions like `syscall_enter_unconstrained` and `syscall_exit_unconstrained` define an "unconstrained mode" where computations are not strictly constrained by the proof system. This mode is useful for expensive operations (e.g., computing inverses or square roots) where candidate values are tried and then verified, thus reducing the overall computational cost.
- **I/O Operations:** Syscalls such as `read_vec_raw()` and `sys_alloc_aligned` are provided for host–VM data exchange and environment setup. While not part of the core computation, they facilitate necessary peripheral operations.

**Positioning in the Codebase:**

Precompiles in SP1 serve as an extension to the ISA. Although not standard RISC‑V instructions, they are available as "pseudo‑instructions" via syscalls. Internally, each precompile is implemented as an independent STARK table (AIR) and is connected to the main CPU table via cross-table lookups. This architecture allows heavy computations (like hash functions) to be proved efficiently on dedicated circuits while the CPU table only retains minimal information (results or references). This approach effectively removes the VM overhead on critical parts while still enjoying the flexibility of compiling general-purpose RISC‑V code.
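The statement that a cross-table lookup enforces can be modeled as a multiset equality: every tuple the CPU table "sends" must be "received" exactly once by the precompile table. The sketch below checks only that statement with a plain hash map; it is not the cryptographic lookup argument itself, and the `Interaction` tuple layout and the function name are hypothetical choices for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical interaction tuple: (clk, syscall_id, arg1, arg2).
type Interaction = (u32, u32, u32, u32);

/// Returns true when every tuple sent by one table is received by the
/// other table with the same multiplicity (multiset equality).
fn interactions_balance(sends: &[Interaction], receives: &[Interaction]) -> bool {
    let mut balance: HashMap<Interaction, i64> = HashMap::new();
    // Each send contributes +1 to its tuple's multiplicity.
    for s in sends {
        *balance.entry(*s).or_insert(0) += 1;
    }
    // Each receive contributes -1.
    for r in receives {
        *balance.entry(*r).or_insert(0) -= 1;
    }
    // The tables are consistent only if every multiplicity cancels out.
    balance.values().all(|&m| m == 0)
}
```

Under this model, a precompile table that processes a call with different arguments than the CPU table recorded, or that skips a call entirely, leaves a non-zero multiplicity and the check fails.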
**Comparison with Other zkVMs:**

Other RISC‑V zkVMs, such as RISC Zero and Jolt (by a16z), also employ precompile-like extensions. However, SP1 has focused on precompile extensions from the beginning. For example, RISC Zero (v1.2 and later) has added precompile features for SHA‑256 and 256‑bit multiplication, reporting efficiency improvements of up to 10×. In contrast, SP1 offers a broader range of optimized precompiles (e.g., SHA‑256, Keccak, RSA, pairings) while maintaining open implementations. According to comparisons by Symbolic Capital, SP1's GPU support and extensive precompiles make it both cheaper and faster on realistic workloads than RISC Zero, albeit with a slightly larger proof size.

**References:**

- [Precompiles Overview | Succinct Docs](https://docs.succinct.xyz/docs/sp1/writing-programs/precompiles#:~:text=Precompiles%20are%20built%20into%20the,a%20few%20orders%20of%20magnitude)
- [SP1 GitHub repository](https://github.com/succinctlabs/sp1)
- [Succinct Ships: Optimized Precompiles in SP1](https://blog.succinct.xyz/succinctshipsprecompiles/#:~:text=In%20traditional%20CPU%20execution%20environments%2C,the%20value%20inside%20of%20SP1)
- [The zkVM Wars – Precompiles Comparison](https://www.symbolic.capital/writing/the-zkvm-wars#:~:text=precompiles%20that%20accelerate%20hash%20functions,Client%20proving%20or%20Reth%20proving)