# EPF Final Report: Ziren zkVM Integration with Grandine Consensus Client

## Summary

This report documents the successful integration of Ziren zkVM (formerly zkMIPS) into the Grandine Ethereum consensus client, enabling zero-knowledge proof generation for beacon chain state transitions. The project covered the zkVM integration itself, cryptographic precompiles to improve performance, and benchmarking to demonstrate the feasibility of proving large-scale Ethereum consensus operations.

[PR#385 - Ziren zkVM](https://github.com/grandinetech/grandine/pull/385)

## Key Achievements

- Integrated Ziren zkVM into Grandine's zkVM backend architecture
- Developed cryptographic precompiles (`sha256` and `bls12_381`); the `sha256` precompile alone cut the cycle count by 67.6% on the benchmarked test case
- Generated proofs for beacon chain state transitions, including epoch transitions
- Established CI/CD infrastructure for automated zkVM build testing
- Contributed to the open-source ecosystem with merged (`sha256` and `bls12_381` precompiles) and open (Ziren backend) PRs

## Contents

- [Summary](#summary)
- [Key Achievements](#key-achievements)
- [Project Background](#project-background)
  - [Motivation](#motivation)
  - [Selection Criteria](#selection-criteria)
  - [Technical Architecture](#technical-architecture)
  - [Initial Integration](#initial-integration)
  - [First Successful Execution](#first-successful-execution)
  - [Memory Allocation Issues](#memory-allocation-issues)
  - [Optimization Through Precompiles](#optimization-through-precompiles)
  - [`sha256` Precompile Development](#sha256-precompile-development)
  - [`bls12_381` Precompile Development](#bls12_381-precompile-development)
  - [Proof Generation and Benchmarking](#proof-generation-and-benchmarking-weeks-12-13)
  - [CI/CD and Production Readiness](#cicd-and-production-readiness)
  - [Future Work](#future-work)
    - [Short-term Priorities](#short-term-priorities)
    - [Medium-term Enhancements](#medium-term-enhancements)
- [Acknowledgments](#acknowledgments)
- [Conclusion](#conclusion)
- [References](#references)

## Project Background

### Motivation

Integrating zkVMs into Ethereum consensus clients enables trustless verification of beacon chain state transitions without re-executing the entire computation. This capability is crucial for light client implementations that require succinct proofs and trustless validator set tracking.

### Selection Criteria

As defined by Saulius Grigaitis (Grandine team), a suitable zkVM must meet three critical requirements:

- **Scale:** able to prove beacon chain state transition functions for at least 100,000 validators
- **Continuation:** supports proving large programs by splitting execution into segments
- **Optimization:** provides precompiles or accelerations for `sha256` and `bls12_381` operations

Based on these criteria, Ziren zkVM was selected as a primary integration candidate, owing to its MIPS architecture, which offers smaller circuit constraints, and its proven performance on large-scale computations.

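
The continuation requirement is easiest to see with concrete numbers. Below is a minimal sketch that only computes segment boundaries. `segment_ranges` is a hypothetical helper, and the 2^22-cycle segment size is an illustrative assumption rather than Ziren's actual setting. A real continuation also carries memory and register state across segment boundaries and aggregates the per-segment proofs, which this sketch leaves out.

```rust
/// Splits an execution of `total_cycles` cycles into consecutive segments of at most
/// `segment_cycles` cycles, returned as half-open ranges [start, end).
/// (Hypothetical helper for illustration; not part of Ziren or Grandine.)
fn segment_ranges(total_cycles: u64, segment_cycles: u64) -> Vec<(u64, u64)> {
    assert!(segment_cycles > 0, "segment size must be non-zero");
    let mut ranges = Vec::new();
    let mut start = 0;
    while start < total_cycles {
        // The final segment may be shorter than `segment_cycles`.
        let end = total_cycles.min(start + segment_cycles);
        ranges.push((start, end));
        start = end;
    }
    ranges
}

fn main() {
    // Cycle count reported for `pectra-devnet-6 with epoch transition`, before precompiles.
    let total_cycles: u64 = 7_514_771_682;
    // Assumed segment size for illustration only (2^22 cycles).
    let segment_cycles: u64 = 1 << 22;
    let ranges = segment_ranges(total_cycles, segment_cycles);
    // Prints: 1792 segments; last covers cycles 7511998464..7514771682
    if let Some(&(start, end)) = ranges.last() {
        println!("{} segments; last covers cycles {}..{}", ranges.len(), start, end);
    }
}
```

Applied to the 7,514,771,682 cycles reported for `pectra-devnet-6 with epoch transition`, this assumed segment size yields 1,792 segments. Each segment could then be proven independently.
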
### Technical Architecture

The integration follows Grandine's established zkVM backend pattern, implementing three core traits: `VmBackend`, `ReportTrait`, and `ProofTrait`. Every backend exposes the same entry points. Each takes three SSZ-encoded inputs: `state_ssz`, `block_ssz`, and `cache`.

```rust
pub trait VmBackend {
    /// Runs the guest program without proving; returns its public output and an execution report.
    fn execute(&self, state_ssz: &[u8], block_ssz: &[u8], cache: &[u8]) -> Result<(Vec<u8>, Report)>;
    /// Runs the guest program and generates a proof of the state transition.
    fn prove(&self, state_ssz: &[u8], block_ssz: &[u8], cache: &[u8]) -> Result<Proof>;
}
```

The guest program executing within the zkVM performs the following operations:

- **Input Deserialization:** reads the SSZ-encoded inputs:
  - `state_ssz`: the current beacon chain state
  - `block_ssz`: the signed beacon block to process
  - `cache_ssz`: precomputed validator public keys
- **State Transition Execution:** applies the Ethereum consensus rules
- **Commitment Generation:** commits the result as public output, which the generated proof attests to

#### Compilation Target

The Rust codebase compiles to the MIPS ELF format using:

```bash
cargo +nightly-2025-06-30 build --release --target mipsel-zkm-zkvm-elf
```

The `mipsel-zkm-zkvm-elf` target is a 32-bit little-endian MIPS architecture optimized for zero-knowledge circuit constraints.

### Initial Integration

With the research phase complete, I created a [private fork of the Grandine repository](https://github.com/Dyslex7c/grandine-zk) and started implementing the Ziren backend module. The implementation follows the pattern set by the existing RISC Zero and SP1 backends. This keeps the codebase consistent and makes the code easier for other developers to understand and maintain. The Ziren module implements the same `VmBackend`, `ReportTrait`, and `ProofTrait` interfaces using ZKM SDK components.

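
The host/guest contract behind `execute` can be illustrated without the ZKM SDK. In this minimal sketch, `InputStream`, `toy_state_transition`, `guest_main`, and the free function `execute` are hypothetical stand-ins, not Grandine or ZKM APIs. The host writes `state_ssz`, `block_ssz`, and `cache` in order. The guest reads them back in the same order, and whatever the guest commits becomes the public output.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for the zkVM input stream (the role `ZKMStdin` plays on the host).
/// Buffers are consumed first-in, first-out, so the guest must read them in write order.
#[derive(Default)]
struct InputStream {
    buffers: VecDeque<Vec<u8>>,
}

impl InputStream {
    fn write_vec(&mut self, bytes: &[u8]) {
        self.buffers.push_back(bytes.to_vec());
    }

    fn read_vec(&mut self) -> Vec<u8> {
        self.buffers
            .pop_front()
            .expect("guest read more inputs than the host wrote")
    }
}

/// Placeholder for the real transition (SSZ decoding plus consensus rules).
/// It concatenates its inputs so that the read order is visible in the output.
fn toy_state_transition(state: &[u8], block: &[u8], cache: &[u8]) -> Vec<u8> {
    [state, block, cache].concat()
}

/// Guest side: read the three inputs, run the transition, commit the result as public output.
fn guest_main(input: &mut InputStream, public_output: &mut Vec<u8>) {
    let state_ssz = input.read_vec();
    let block_ssz = input.read_vec();
    let cache_ssz = input.read_vec();
    let result = toy_state_transition(&state_ssz, &block_ssz, &cache_ssz);
    public_output.extend_from_slice(&result);
}

/// Host side, shaped like `VmBackend::execute`: write inputs in the order the guest reads them.
fn execute(state_ssz: &[u8], block_ssz: &[u8], cache: &[u8]) -> Vec<u8> {
    let mut input = InputStream::default();
    input.write_vec(state_ssz);
    input.write_vec(block_ssz);
    input.write_vec(cache);
    let mut public_output = Vec::new();
    guest_main(&mut input, &mut public_output);
    public_output
}

fn main() {
    let out = execute(b"state|", b"block|", b"cache");
    // Prints: state|block|cache
    println!("{}", String::from_utf8_lossy(&out));
}
```

The main design constraint is ordering. In this sketch the stream carries no field names. A mismatch between the host's write order and the guest's read order would therefore silently feed the wrong bytes to the deserializer.
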
The integration involves several key components:

- `ZKMStdin`: serializes input data and passes it to the guest program
- `ZKMProofWithPublicValues`: holds the generated proof and its associated public outputs
- `ZKMVerifyingKey`: the verification key used to validate proofs

The implementation required careful handling of the setup phase, in which the proving and verifying keys are generated or loaded. These keys are specific to the guest program. They must be managed so that third parties can verify proofs without access to the prover's system.

The first technical challenge surfaced during the initial compilation attempts. The `mipsel-zkm-zkvm-elf` target is a 32-bit little-endian MIPS architecture, so it lacks native support for 64-bit atomic operations. However, the SSZ serialization library used throughout Ethereum's consensus layer relies on 64-bit atomics for thread-safe operations. This caused compilation errors that prevented the guest program from building. After consulting with Artiom from the Grandine team, I decided to compile this part conditionally. The deterministic, single-threaded nature of the beacon chain state transition inside the guest justifies this: without concurrent access, the thread-safety guarantees of atomics are not needed.

### First Successful Execution

After resolving this blocker, I successfully executed the first test case, `pectra-devnet-6 with epoch transition`. This test represents a realistic, complex scenario involving multiple validators, attestations, and computationally expensive epoch boundary processing. The results were:

```
Execution time: Approximately 875 seconds (14.6 minutes)
Cycle count: 7,514,771,682 cycles
```

The correct state root was computed. This validated that the integration worked and that the zkVM was executing the Ethereum consensus logic properly.
However, the execution time and cycle count also revealed significant room for optimization, which I later addressed by introducing precompiles.

### Memory Allocation Issues

Before turning to precompile development, I hit a blocker when testing larger workloads. The test case `mainnet without epoch transition` crashed silently, producing neither output nor error messages. Initial debugging revealed that the zkVM was returning an empty `output_bytes` array, indicating that the guest program had failed before completing execution. Deeper investigation traced the issue to the following line in the host code:

```rust
let (output, report) = client.execute(STATE_TRANSITION_ELF, stdin).run()?;
```

The zkVM was terminating prematurely. Analysis pointed to two potential root causes: memory exhaustion, or a cycle count exceeding the zkVM's limits. The guest code was exiting during the deserialization phase, before the state transition computation even began. This suggests that the large mainnet state was causing excessive memory usage.

I experimented with an alternative embedded allocator, which implements a more sophisticated heap-like allocation pattern that supports freeing and reusing memory. However, this allocator introduced its own challenges. Its metadata overhead and potential fragmentation actually increased memory usage in some cases, so mainnet-scale state data still overflowed memory.

The Grandine team had encountered a similar issue during the SP1 integration, and an issue was opened in the Ziren repository as well. This remains an ongoing area of collaboration. Potential solutions include higher memory limits in the zkVM, more efficient serialization formats, or segmentation strategies that process state data in chunks.

### Optimization Through Precompiles

Once basic execution worked, I turned my attention to optimization.
Profiling revealed that cryptographic operations dominated execution, consuming the majority of cycles during the state transition. This matched expectations, since beacon chain processing relies heavily on `sha256` hashing and `bls12_381` signature verification.

### `sha256` Precompile Development

With the working test cases in place, the first optimization target was the `sha256` compression function. Beacon chain operations use it to compute block roots, state roots, and Merkle proofs. I developed a `sha256` precompile for Ziren that uses system calls provided by the zkVM: `syscall_sha256_extend` and `syscall_sha256_compress`. These are optimized circuit implementations of the SHA-256 message schedule and compression steps. They replace the standard software implementation of `sha256` with specialized zkVM instructions that can be proven more efficiently.

The implementation uses compile-time detection to ensure that the precompile is only used when running within the Ziren zkVM:

```rust
#[cfg(all(target_os = "zkvm", target_vendor = "zkm", target_arch = "mips"))]
```

This conditional compilation lets the same codebase run natively for testing and within the zkVM for proving, with no runtime detection overhead.

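
The dispatch pattern can be sketched as a `backend` module selected by that same `cfg` predicate, with one call site shared by both targets. This is a minimal, hypothetical sketch rather than the universal-precompiles code. The module layout, `compress_block`, and the `extern` declarations are assumptions. The syscall signatures are modeled on SP1's, which take pointers to the 64-word message schedule and the 8-word state; the real declarations live in Ziren's guest crate. On a native build only the software path is compiled, and `main` hashes the single padded block for `"abc"`.

```rust
// Native builds: plain-Rust SHA-256 message schedule and compression (FIPS 180-4).
#[cfg(not(all(target_os = "zkvm", target_vendor = "zkm", target_arch = "mips")))]
mod backend {
    const K: [u32; 64] = [
        0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
        0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
        0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
        0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
        0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
        0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
        0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
        0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2,
    ];

    /// Expands w[0..16] into the full 64-word message schedule.
    pub fn extend(w: &mut [u32; 64]) {
        for i in 16..64 {
            let s0 = w[i - 15].rotate_right(7) ^ w[i - 15].rotate_right(18) ^ (w[i - 15] >> 3);
            let s1 = w[i - 2].rotate_right(17) ^ w[i - 2].rotate_right(19) ^ (w[i - 2] >> 10);
            w[i] = w[i - 16].wrapping_add(s0).wrapping_add(w[i - 7]).wrapping_add(s1);
        }
    }

    /// Runs the 64 compression rounds and adds the result into `state`.
    pub fn compress(w: &mut [u32; 64], state: &mut [u32; 8]) {
        let mut v = *state; // working variables a..h
        for i in 0..64 {
            let (a, e) = (v[0], v[4]);
            let big_s1 = e.rotate_right(6) ^ e.rotate_right(11) ^ e.rotate_right(25);
            let ch = (e & v[5]) ^ (!e & v[6]);
            let t1 = v[7].wrapping_add(big_s1).wrapping_add(ch).wrapping_add(K[i]).wrapping_add(w[i]);
            let big_s0 = a.rotate_right(2) ^ a.rotate_right(13) ^ a.rotate_right(22);
            let maj = (a & v[1]) ^ (a & v[2]) ^ (v[1] & v[2]);
            let t2 = big_s0.wrapping_add(maj);
            // New (a, b, c, d, e, f, g, h) after one round.
            v = [t1.wrapping_add(t2), a, v[1], v[2], v[3].wrapping_add(t1), e, v[5], v[6]];
        }
        for (s, x) in state.iter_mut().zip(v.iter()) {
            *s = s.wrapping_add(*x);
        }
    }
}

// zkVM builds: delegate to the SHA-256 syscalls (signatures assumed, modeled on SP1's).
#[cfg(all(target_os = "zkvm", target_vendor = "zkm", target_arch = "mips"))]
mod backend {
    extern "C" {
        fn syscall_sha256_extend(w: *mut [u32; 64]);
        fn syscall_sha256_compress(w: *mut [u32; 64], state: *mut [u32; 8]);
    }

    pub fn extend(w: &mut [u32; 64]) {
        unsafe { syscall_sha256_extend(w) }
    }

    pub fn compress(w: &mut [u32; 64], state: &mut [u32; 8]) {
        unsafe { syscall_sha256_compress(w, state) }
    }
}

/// Compresses one 64-byte block (16 big-endian words) into `state`; same call site on both targets.
fn compress_block(state: &mut [u32; 8], block: &[u32; 16]) {
    let mut w = [0u32; 64];
    w[..16].copy_from_slice(block);
    backend::extend(&mut w);
    backend::compress(&mut w, state);
}

fn main() {
    // SHA-256 initial hash value H(0).
    let mut state: [u32; 8] = [
        0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a, 0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19,
    ];
    // "abc" padded to one 512-bit block: bytes 61 62 63, then 0x80, zeros,
    // and finally the 64-bit message length in bits (24).
    let mut block = [0u32; 16];
    block[0] = 0x6162_6380;
    block[15] = 24;
    compress_block(&mut state, &block);
    let hex: String = state.iter().map(|x| format!("{:08x}", x)).collect();
    // Prints SHA-256("abc"): ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
    println!("{}", hex);
}
```

Both modules expose identical `extend` and `compress` signatures. As a result, `compress_block` and anything built on it never need to know which target they were compiled for.
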
The following table shows the impact of the `sha256` precompile:

| **Metric**     | **Without Precompiles** | **With Precompiles** | **Improvement**                      |
| -------------- | ----------------------- | -------------------- | ------------------------------------ |
| Execution Time | ~192 seconds            | ~110 seconds         | **42.7% less time** (~1.75× faster)  |
| Cycle Count    | ~4.82 billion           | ~1.56 billion        | **67.6% reduction**                  |

[Merged PR#1 - sha256 precompile (hash compression function)](https://github.com/grandinetech/universal-precompiles/pull/1)

### `bls12_381` Precompile Development

After the `sha256` compression function, I turned to `bls12_381` elliptic curve operations, which are even more computationally intensive. BLS signatures are central to Ethereum's consensus mechanism. They are used for validator signatures, attestation aggregation, and sync committee operations.

The `bls12_381` precompile is considerably more complex than `sha256`, as it spans several layers of mathematical structures.

**Base Field Operations (Fp):** The foundation is arithmetic in the base field GF(p), where p is a 381-bit prime. Operations include addition, subtraction, multiplication, and inversion modulo p. They are implemented with syscall-based modular arithmetic:

- `syscall_addmod`: modular addition
- `syscall_submod`: modular subtraction
- `syscall_mulmod`: modular multiplication

**Montgomery Form Handling:** `bls12_381` implementations typically use Montgomery form for efficient modular arithmetic. An element a is stored as aR mod p for a fixed power of two R > p. This lets modular multiplication replace division by p with cheap shifts and masks. The precompile includes methods for converting to and from Montgomery form, following the same pattern as the SP1 implementation for consistency.

**Extension Fields:** `bls12_381` uses a tower of extensions over the base field: Fp2 (quadratic extension of Fp), Fp6 (cubic extension of Fp2), and Fp12 (quadratic extension of Fp6).
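
The layering of Fp operations underneath an extension field can be sketched on a toy prime. This is a hypothetical illustration, not the precompile's code. `add_mod`, `sub_mod`, and `mul_mod` are plain-Rust stand-ins for the `syscall_addmod`, `syscall_submod`, and `syscall_mulmod` calls. The 381-bit modulus is replaced by `P = 10007`, and Montgomery form is omitted for readability. The sketch does use the same construction BLS12-381 uses for Fp2, namely Fp[u]/(u^2 + 1), which is a field whenever p ≡ 3 (mod 4).

```rust
/// Toy base-field modulus standing in for the 381-bit prime. Like it, 10007 ≡ 3 (mod 4),
/// so u^2 + 1 has no root in Fp and Fp2 = Fp[u]/(u^2 + 1) is a field.
const P: u64 = 10_007;

// Hypothetical software stand-ins for the modular-arithmetic syscalls (inputs assumed < P).
fn add_mod(a: u64, b: u64) -> u64 { (a + b) % P }
fn sub_mod(a: u64, b: u64) -> u64 { (a + P - b) % P }
fn mul_mod(a: u64, b: u64) -> u64 { (a * b) % P }

/// Square-and-multiply exponentiation in Fp.
fn pow_mod(mut base: u64, mut exp: u64) -> u64 {
    let mut acc = 1;
    while exp > 0 {
        if exp & 1 == 1 {
            acc = mul_mod(acc, base);
        }
        base = mul_mod(base, base);
        exp >>= 1;
    }
    acc
}

/// Inverse in Fp via Fermat's little theorem: a^(p-2) = a^(-1) for a != 0.
fn inv_mod(a: u64) -> u64 { pow_mod(a, P - 2) }

/// Element c0 + c1*u of Fp2, with u^2 = -1.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Fp2 { c0: u64, c1: u64 }

impl Fp2 {
    fn add(self, o: Fp2) -> Fp2 {
        Fp2 { c0: add_mod(self.c0, o.c0), c1: add_mod(self.c1, o.c1) }
    }

    /// (a0 + a1 u)(b0 + b1 u) = (a0 b0 - a1 b1) + (a0 b1 + a1 b0) u, using u^2 = -1.
    fn mul(self, o: Fp2) -> Fp2 {
        Fp2 {
            c0: sub_mod(mul_mod(self.c0, o.c0), mul_mod(self.c1, o.c1)),
            c1: add_mod(mul_mod(self.c0, o.c1), mul_mod(self.c1, o.c0)),
        }
    }

    /// 1 / (a0 + a1 u) = (a0 - a1 u) / (a0^2 + a1^2): a single Fp inversion of the norm.
    fn inv(self) -> Fp2 {
        let norm_inv = inv_mod(add_mod(mul_mod(self.c0, self.c0), mul_mod(self.c1, self.c1)));
        Fp2 { c0: mul_mod(self.c0, norm_inv), c1: mul_mod(sub_mod(0, self.c1), norm_inv) }
    }
}

fn main() {
    let u = Fp2 { c0: 0, c1: 1 };
    let x = Fp2 { c0: 1234, c1: 5678 };
    // Prints Fp2 { c0: 10006, c1: 0 }, i.e. u^2 = -1
    println!("{:?}", u.mul(u));
    // Prints Fp2 { c0: 1, c1: 0 }
    println!("{:?}", x.mul(x.inv()));
}
```

Fp6 and Fp12 repeat the same pattern one level up. Each multiplication there expands into several multiplications in the field below, which is one reason a fast base-field `mulmod` matters so much for pairing performance.
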
Each level of this tower adds complexity, but all of them are needed for the complete pairing computation.

**Elliptic Curve Groups:** The implementation covers both G1 (points on the base curve over Fp) and G2 (points on a twisted curve over Fp2). Key operations include:

- Projective point doubling for efficient scalar multiplication
- Affine point arithmetic for memory-efficient representations
- Scalar multiplication using optimized double-and-add algorithms
- Point decompression for compact point representations

**Pairing Engine:** The pairing maps pairs of points from G1 and G2 into Fp12. It includes:

- Miller loop computation, the core of the pairing algorithm
- Final exponentiation in the cyclotomic subgroup
- Frobenius map implementations for extension field operations

The implementation follows the structure of existing `bls12_381` libraries (particularly zkcrypto and blst).

[Merged PR#2 - bls12_381 precompile](https://github.com/grandinetech/universal-precompiles/pull/2)

### Proof Generation and Benchmarking (Weeks 12-13)

Next, I moved from execution testing to actual proof generation. Execution within the zkVM demonstrates that the computation can be performed correctly. Proof generation, however, produces the cryptographic artifact that enables trustless verification.

The first proof generation attempt targeted the smallest available test case, `consensus spec tests mainnet electra empty block transition`. This test applies a minimal state transition with an empty block, making it the least computationally demanding scenario. Even this simple case required substantial computational resources:

- Total time: approximately 1.2 hours (4,309 seconds)
- GPU clock cycles: ~1.47 billion cycles for verification
- Clock speed: 34.11 kHz (~34,110 cycles/second)

The relatively slow clock speed of 34.11 kHz is characteristic of zkVM proof generation.
This value doesn't represent the speed of the underlying hardware. Rather, it is the effective throughput of the zkVM once constraint generation, witness computation, and cryptographic operations are accounted for. Each "cycle" in this context is one execution step of the emulated MIPS program, together with all of its associated proof generation overhead.

Proof generation also highlighted a key characteristic of Ziren: the system emulates a MIPS32 processor, and the emulation overhead is substantial.

**Test Environment:**

- Device: MacBook Air with Apple M3 chip
- CPU: 8 cores (4 performance + 4 efficiency cores)
- Memory: 16 GB unified memory
- OS: macOS Sequoia 15.6.1
- Kernel: Version 24.6.0

**Execute Operation Results:** the execute operation runs the guest program within the zkVM without generating a proof:

![Execute operation results](https://hackmd.io/_uploads/Sk-Phrqg-g.png)

**Prove Operation Results:**

![Prove operation results](https://hackmd.io/_uploads/HksI0rqx-g.png)

### CI/CD and Production Readiness

#### Continuous Integration Workflow

I developed a GitHub Actions workflow for automated zkVM build testing. It covers three areas:

- **Environment Setup**
  - Disk space optimization
  - Nim compiler installation for the Constantine backend
  - System dependencies: GMP, LLVM
- **Toolchain Caching**
  - ZKM toolchain cached for faster builds
  - Nightly Rust 2025-06-30 installation
  - Build artifact caching
- **Build and Test**
  - Guest program compilation to `mipsel-zkm-zkvm-elf`
  - Automated test execution with a matrix strategy
  - Test case parameterization

### Future Work

#### Short-term Priorities

- **Complete 64-bit Atomics Implementation**
  - Enables full SSZ library compatibility
  - Removes temporary workarounds
- **Resolve Memory Allocation Issues**
  - Collaboration with the Ziren team is ongoing
  - Alternative: custom allocator implementation
  - Blocker for mainnet-scale operations
- **Fix SSZ Deserialization Bug**
  - Trace maximum variable corruption
  - Implement proper boundary checking

#### Medium-term Enhancements

- **GPU Proving Service Integration**
  - Access to Ziren's cloud proving infrastructure
  - Benchmark larger test cases
  - Evaluate production deployment feasibility
- **Additional Precompile Development**
  - Merkle proof verification acceleration
  - SSZ serialization/deserialization optimization
  - Attestation aggregation precompiles

## Acknowledgments

This project would not have been possible without the guidance and support of Saulius Grigaitis and Artiom (Grandine team), Mário and Josh, the Ziren team, and the EPF program itself.

I am also grateful to the fellows I collaborated with on this project:

- Jimmy Chu
- Aman

Special thanks to the Ethereum Protocol Fellowship for providing the platform to contribute meaningfully to Ethereum's infrastructure development.

## Conclusion

In this EPF project, I integrated Ziren zkVM into the Grandine Ethereum consensus client to generate zero-knowledge proofs of beacon chain state transitions. Systematic research, careful implementation, and targeted optimization made this possible.
The journey from initial research through implementation and optimization has been both challenging and rewarding. As Ethereum continues evolving toward more scalable and trustless verification mechanisms, zkVM integration into consensus clients is a crucial stepping stone toward the protocol's long-term vision.

## References

### Repositories

- [Grandine main repository](https://github.com/grandinetech/grandine)
- [Universal precompiles](https://github.com/grandinetech/universal-precompiles)

### Pull Requests

- [PR#1 - sha256 precompile (merged)](https://github.com/grandinetech/universal-precompiles/pull/1)
- [PR#2 - bls12_381 precompile (merged)](https://github.com/grandinetech/universal-precompiles/pull/2)
- [PR#385 - introduce ziren zkVM to Grandine (open)](https://github.com/grandinetech/grandine/pull/385)