# Report on proof systems for signature aggregation

## Introduction

This report focuses on the evaluation of proof systems suitable for signature aggregation, based on our current assumptions and early design explorations.

At the moment, we are working with the signature scheme proposed in [Hash-Based Multi-Signatures for Post-Quantum Ethereum](https://eprint.iacr.org/2025/055). While alternative schemes (e.g., lattice-based signatures) are being discussed both internally and externally, we use this hash-based scheme as a reference point to guide our initial evaluations of proof systems. A change in the underlying signature scheme could impact the proof system design, but the current setup offers a concrete baseline for early investigation.

From the start of the project, the "minimal VM" approach has been presented as the preferred path forward. The rationale for this direction includes:

- Encapsulating the proving logic within the VM to reduce the cryptographic complexity for client teams.
- Providing a clean and centralized specification at the VM level for the beam chain.
- Improving security by reducing the need for teams to write and verify their own circuits.
- Lowering the barrier for client adoption, especially for teams without in-house cryptography expertise (still common at this stage).

However, there are some trade-offs and risks associated with this VM-centric approach:

- If the signature scheme changes over the course of our work, the VM design might require significant rework, as it is closely tied to the specific structure and efficiency characteristics of the scheme.
- Even a minimal or specialized VM introduces some performance overhead compared to circuit-based approaches, which could be a limiting factor.

With this context in mind, we proceed to examine available zkVMs and proof systems to evaluate their fitness for our specific use case.

## Hash function considerations

A key early decision in our investigation is the choice of hash function. Given ongoing efforts both within the Ethereum Foundation and the broader ecosystem, Poseidon2 currently appears to be the strongest candidate. It offers excellent performance in SNARK settings and has been the focus of significant optimization work.

That said, there are still important considerations:

- Some community feedback has expressed reservations about Poseidon's structure, favoring more traditional hash functions. See for example [this discussion](https://x.com/zac_aztec/status/1894811836773462343?s=46&t=Wvdt5z9cJIQ0FpcP8DswUg).
- We continue to monitor and evaluate other well-established and widely audited hash functions, including Keccak, Groestl, and Ajtai-based constructions. While Poseidon2 is currently our working baseline, no option is excluded at this stage.

The choice of hash function plays a central role in determining the structure and performance of the proof system. After the signature scheme itself, it is likely the most critical building block. For most of our experiments and benchmarks, we have used Poseidon2 as the default, both due to its practical performance and its likely alignment with future Ethereum design directions.
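Verification in hash-based schemes of this kind consists mostly of iterated hashing (chain walks plus Merkle authentication paths), which is why the number of hashes proved per second is the metric that matters most to us. As a rough illustration of the operation an aggregation proof must repeat per signature, the sketch below shows a Winternitz-style chain verification. It is a minimal sketch only: `toy_hash` is a stand-in for Poseidon2, and `CHAIN_LEN` and the digit encoding are hypothetical placeholders rather than the paper's actual parameters.

```rust
// Toy sketch of Winternitz-style hash-chain verification, the kind of
// operation an aggregation circuit must prove for every signature.
// `toy_hash` is a stand-in for Poseidon2; CHAIN_LEN and the digit
// encoding are hypothetical, not the paper's parameters.

const CHAIN_LEN: usize = 8; // hypothetical chain length

fn toy_hash(input: &[u8; 32]) -> [u8; 32] {
    // Placeholder mixing function; a real instantiation would use Poseidon2.
    let mut out = *input;
    for i in 0..32 {
        out[i] = out[i].rotate_left(3) ^ out[(i + 1) % 32] ^ (i as u8);
    }
    out
}

/// Walk the hash chain from an intermediate node for `steps` iterations.
fn walk_chain(mut node: [u8; 32], steps: usize) -> [u8; 32] {
    for _ in 0..steps {
        node = toy_hash(&node);
    }
    node
}

/// Verify one chain: the signature exposes the node at position `digit`,
/// and hashing it the remaining steps must reproduce the public chain end.
fn verify_chain(sig_node: [u8; 32], digit: usize, chain_end: [u8; 32]) -> bool {
    walk_chain(sig_node, CHAIN_LEN - 1 - digit) == chain_end
}

fn main() {
    let secret = [7u8; 32];
    let chain_end = walk_chain(secret, CHAIN_LEN - 1);
    let digit = 3;
    let sig_node = walk_chain(secret, digit); // signer reveals this node
    assert!(verify_chain(sig_node, digit, chain_end));
    println!("chain verifies; cost = {} hashes", CHAIN_LEN - 1 - digit);
}
```

In the aggregation setting, the circuit repeats this pattern once per signature, so the total proving cost scales with the number of signatures times the hashes per verification.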
## Evaluation of zkVMs

With the background established, we turned to existing zkVMs to assess whether any could meet our specific requirements in terms of:

- Proving throughput
- Proof size

Late last year, Han ran initial tests on two of the fastest zkVMs currently available:

- [OpenVM](https://github.com/han0110/hash-sig-agg/tree/main/zkvm/openvm)
- [SP1](https://github.com/han0110/hash-sig-agg/tree/main/zkvm/sp1)

The outcomes of these tests led to several observations:

- These general-purpose zkVMs are not suitable for our use case as-is. Their performance falls well short of our target for signature aggregation specifically. While they may be appropriate for broader use cases like proving L1 blocks, our requirements are more specialized and focused, which calls for a different performance profile.
- Large, monolithic VMs such as SP1/RISC0/Cairo/Miden come with significant complexity and large codebases. Integrating or adapting them would likely be time-consuming and misaligned with our goal of a minimal and purpose-built system.
- The zkVM landscape is broad and diverse, making it impractical to explore every option in depth. Some degree of prioritization is necessary.
- More specialized zkVMs such as Zisk, Jolt, or the upcoming Binius VM appear to be better aligned with our needs and may offer a more appropriate foundation.

Given these constraints, we arrived at the following working plan:

1. **Prototype with hand-written circuits**
   Since existing zkVMs fall short in performance, we should first focus on designing a customized proof system that meets our baseline requirements. To simplify the scope, this initial phase assumes only a single aggregation layer, delaying any peer-to-peer or multi-recursion design.
2. **VM abstraction and specification**
   Once a suitable proving scheme is in place, we can explore how to:
   - Encapsulate it within a minimal VM architecture.
   - Abstract circuit construction, if no existing VM fits well (a middle ground between the circuit-based and VM-based approaches).
   - Draft a formal specification to gather feedback from the wider community.
   - Define a clear API for client teams.
3. **Recursive design**
   Longer term, we may need to handle multiple layers of recursion. Folding schemes may help reduce peer-to-peer communication costs. We are currently in discussion with Srinath about [Neo](https://eprint.iacr.org/2025/294), which explores lattice-based folding in a post-quantum setting.
4. **Testnet validation**
   Eventually, we aim to deploy a testnet to validate the full system architecture and assess its real-world properties.

## Hash performance benchmarks in SNARKs

A central performance metric for our proof system is the number of hashes we can prove per second. This has consistently served as our primary filter before engaging more deeply with any given scheme:

- If a system is significantly below our target (pessimistically 100k hashes/sec, optimistically 500k hashes/sec on modern CPUs), we typically rule it out early.
- If a system meets or exceeds this threshold, it warrants further investigation.

Benchmarking work was conducted by Han toward the end of last year and can be found [here](https://hackmd.io/@han/bench-hash-in-snark).
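Before turning to the per-scheme results, it helps to relate these throughput targets to aggregation batch sizes. The back-of-envelope sketch below does so with purely hypothetical parameters: the hashes-per-verification figure is a placeholder, not a value from the signature scheme.

```rust
// Back-of-envelope relation between hash throughput and aggregation
// capacity. The per-signature hash count is a hypothetical placeholder.

fn main() {
    let hashes_per_verification: u64 = 1_500; // hypothetical per-signature cost
    let slot_seconds: u64 = 12; // Ethereum slot time

    for (label, hashes_per_sec) in [
        ("pessimistic target (100k hashes/sec)", 100_000u64),
        ("optimistic target (500k hashes/sec)", 500_000u64),
    ] {
        let sigs_per_slot = hashes_per_sec * slot_seconds / hashes_per_verification;
        println!("{label}: ~{sigs_per_slot} signatures aggregated per slot");
    }
}
```

Under these placeholder numbers, the pessimistic and optimistic targets correspond to roughly 800 and 4,000 signatures aggregated per slot, respectively; the real figures depend on the final scheme parameters.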
Below is a summary of the results and observations for each approach.

### Plonky3

- Currently the fastest setup tested, reaching almost 1M hashes/sec on a modern CPU without any GPU acceleration.
- Its modular architecture allows for rapid integration of components like new PCS systems (e.g., WHIR), typically requiring only a few days to weeks of work.
- Widely adopted across the ecosystem, with multiple provers either using it directly or building on forks.
- GPU acceleration efforts are already underway with [ICICLE](https://www.ingonyama.com/blog/air-icicle-plonky3-on-icicle-part-1), making it a strong practical candidate.

### Stwo

- Also shows promising results, slightly below Plonky3 in performance (approximately 500k hashes/sec).
- StarkWare shared new benchmark results last week:

  > Stwo achieves 3.7 million Poseidon2 hashes/sec on a 4090 GPU with `N_LOG_INSTANCES_PER_ROW = 0` and `log_n = 22`; under a more practical configuration (`N_LOG_INSTANCES_PER_ROW = 3`), it reaches 3.0 million hashes/sec.

- While impressive, adopting Stwo would come with additional complexity due to its use of circle STARKs, which could increase implementation and specification overhead.
- We also considered writing our aggregation logic directly in Cairo for execution in the Cairo VM, but the continued reliance on Felt252 fields (despite Mersenne primes in Stwo) results in high overhead, making this path impractical for our use case.

### Binius

- This was the most unexpected result. Poseidon is not yet available in Binius (though it may be soon, pending work such as Dmitry's recent analysis). Currently, only Keccak and Vision are implemented, with Vision being a binary-field-specific hash.
- Performance with Vision is currently limited to 4.63k hashes/sec, well below our requirements.
- A recent [Groestl integration PR](https://github.com/IrreducibleOSS/binius/pull/139) might improve results, though it is unlikely to close the 100x–200x performance gap with Plonky3.
- That said, binary fields remain promising from a long-term perspective due to their natural alignment with bit-based logic and avoidance of bit-to-prime overhead (see the toy sketch after this list).
- We've seen increasing interest in binary field techniques, both in industry and academia:
  - [Jolt](https://a16zcrypto.com/posts/article/understanding-jolt-clarifications-and-reflections/) has expressed plans to integrate Binius:
    > The Binius commitment and Jolt go together like peanut butter and jelly because Jolt is the only zkVM today that is exclusively based on the sum-check protocol. Today, Jolt uses commitment schemes based on elliptic curve cryptography, but incorporating the Binius commitment into Jolt is our highest priority.
  - Zisk and [Risc0](https://www.irreducible.com/posts/irreducible-x-risc-zero) have also shown interest in Binius.
    ![image](https://hackmd.io/_uploads/HJK3oxzake.png)
  - A dedicated Binius-based VM is in development ([link](https://www.irreducible.com/posts/irreducible-x-polygon-labs-collaboration)).
  - Academic momentum is growing around binary techniques, e.g., [sumcheck over binary fields](https://eprint.iacr.org/2024/1046) and grant proposals submitted to the EF.
- Given this context, we may revisit Binius with a Poseidon arithmetization effort in their M3 framework, similar to recent work with Groestl.
- Discussions are ongoing around exploring the Ajtai hash in binary settings, though its performance and security characteristics remain to be evaluated.
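To make the bit-alignment point concrete, the toy sketch below multiplies elements of a small binary field: addition is XOR and multiplication is shifts plus conditional XORs, so bit-oriented hash logic maps onto the field without any bit-to-prime-field conversion. The AES field GF(2^8) is used purely for illustration; Binius itself works over towers of binary fields, not this field.

```rust
// Toy multiplication in GF(2^8) (AES polynomial x^8 + x^4 + x^3 + x + 1).
// Illustrative only: Binius uses towers of binary fields, but the point
// stands in any binary field: addition is XOR and multiplication is
// shift-and-XOR, so bit-level hash logic incurs no bit-to-prime overhead.

fn gf256_mul(mut a: u8, mut b: u8) -> u8 {
    let mut acc = 0u8;
    while b != 0 {
        if b & 1 != 0 {
            acc ^= a; // field addition is XOR
        }
        let carry = a & 0x80;
        a <<= 1;
        if carry != 0 {
            a ^= 0x1B; // reduce modulo the AES polynomial
        }
        b >>= 1;
    }
    acc
}

fn main() {
    // Sanity checks using known AES-field identities.
    assert_eq!(gf256_mul(0x53, 0xCA), 0x01); // 0x53 and 0xCA are inverses
    assert_eq!(gf256_mul(0x02, 0x80), 0x1B); // x * x^7 reduces to the polynomial tail
    println!("GF(2^8) multiplication checks pass");
}
```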
### Hashcaster

- A positive surprise: the initial Keccak-based implementation reached up to 36k hashes/sec, currently the fastest Keccak-based approach tested.
- In recent [grant work](https://github.com/tcoratger/hashcaster-exploration), we observed ~10% speed improvements even on modest hardware (MacBook M1), suggesting potential gains on modern CPUs.
- Further optimization opportunities remain:
  - Bit/byte slicing
  - GPU acceleration
- Despite promising early results, performance is still below our baseline targets, so we have not prioritized further integration for now.

### Expander and GKR techniques

- Based largely on sumcheck methods, this direction opens the door to Jolt-like designs relying only on lookup and sumcheck operations.
- Could potentially offer advantages in formal verification, though the cost of formally verifying custom gates versus Plonky3 AIRs, for example, remains to be evaluated.
- At the time of initial testing, the PCS component was still in development, but the team recently shared updated benchmark results:

```
benchmarking poseidon with GKR^2 over m31ext3
field: m31ext3
#threads: 4
#bench repeats: 4
PCS: Orion
Throughput results:
- 0: 507,237 hashes/sec
- 1: 509,550 hashes/sec
- 2: 509,681 hashes/sec
- 3: 510,227 hashes/sec
Proof size: 1.7 MB (1.5 MB from PCS)
```

- These results meet our performance criteria. Replacing the Orion PCS with WHIR could reduce proof size further.
- Potential challenges include:
  - Reliance on a domain-specific language and compiler infrastructure.
  - Auditability concerns similar to those with AIR-based systems, especially around GKR circuit structures.

## Current work

### Circuit-based approach

Following the earlier benchmarks and analysis, Plonky3 has emerged as the most promising candidate. As a result, we have moved from initial evaluation to hands-on experimentation with a circuit-based approach tailored to our signature aggregation use case.

Key components of the current work:

- The reference implementation of the signature scheme is available [here](https://github.com/b-wagn/hash-sig).
- An initial circuit implementation can be found [here](https://github.com/han0110/hash-sig-agg/tree/main/circuit/openvm).
- Circuit details and supporting gadgets are documented [here](https://hackmd.io/@han/hash-sig-agg) and [here](https://hackmd.io/@tcoratger/SyWbmVPckx).

Han has also started a [Plonky3 playground](https://github.com/han0110/p3-playground) to experiment with prover customization. Early results show reasonable proof generation speed (roughly 3.5k signatures per second, as far as I know), though proof size remains an area for improvement. To address this, we are working on integrating the WHIR polynomial commitment scheme with Plonky3.

The WHIR integration process involves three stages:

1. Understanding the WHIR protocol.
2. Improving the base implementation, building on the original repository: [WizardOfMenlo/whir](https://github.com/WizardOfMenlo/whir).
3. Porting to Plonky3, making WHIR compatible with Plonky3 via [this repository](https://github.com/tcoratger/whir-p3), which is nearing completion (pending final debugging).

Once this integration is complete, we will be able to test WHIR within our Plonky3 playground and evaluate improvements in proof size and overall performance.
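For orientation on what this integration ultimately provides: a multilinear PCS such as WHIR lets a verifier check evaluation claims of the form f(r) = v against a committed table of hypercube evaluations. The sketch below computes such a multilinear evaluation directly, with no commitment or proof; it is illustrative only and does not reflect the whir-p3 API. The BabyBear modulus is used simply as a convenient prime.

```rust
// The claim a multilinear PCS ultimately attests to: a committed table of
// hypercube evaluations, viewed as a multilinear polynomial f, satisfies
// f(point) = value. Toy field arithmetic mod the BabyBear prime, used here
// as a plain placeholder; this is not the whir-p3 API.

const P: u64 = 2_013_265_921; // BabyBear modulus

fn add(a: u64, b: u64) -> u64 { (a + b) % P }
fn mul(a: u64, b: u64) -> u64 { (a as u128 * b as u128 % P as u128) as u64 }

/// Evaluate the multilinear extension of `evals` (length 2^n, indexed by
/// hypercube points with the first variable as the low bit) at point `r`.
fn multilinear_eval(evals: &[u64], r: &[u64]) -> u64 {
    let mut layer = evals.to_vec();
    for &ri in r {
        let half = layer.len() / 2;
        let mut next = Vec::with_capacity(half);
        for i in 0..half {
            // Linear interpolation: (1 - ri) * layer[2i] + ri * layer[2i+1].
            let diff = add(layer[2 * i + 1], P - layer[2 * i]);
            next.push(add(layer[2 * i], mul(ri, diff)));
        }
        layer = next;
    }
    layer[0]
}

fn main() {
    // f over {0,1}^2 with f(0,0)=3, f(1,0)=5, f(0,1)=7, f(1,1)=11.
    let evals: [u64; 4] = [3, 5, 7, 11];
    // On hypercube points the extension must reproduce the table itself.
    assert_eq!(multilinear_eval(&evals, &[0, 0]), 3);
    assert_eq!(multilinear_eval(&evals, &[1, 1]), 11);
    println!("f(2, 3) = {}", multilinear_eval(&evals, &[2, 3]));
}
```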
## Additional considerations

As discussed during the recent ARG x Cryptography team call, the benchmark data has been instrumental in shaping our preliminary roadmap. However, there are additional dimensions to consider when making final decisions. This section highlights a few noteworthy directions and tools that may influence our long-term architecture.

### Jolt

Jolt VM has been of interest from the beginning due to its simplicity and clear structure, even though it currently relies on elliptic curve cryptography and does not yet meet post-quantum requirements.

Key aspects:

- Jolt is built entirely on sumcheck and lookup arguments, making it one of the simplest zkVMs in terms of structure and potentially the easiest to formally verify.
- The codebase is compact (~25,000 lines), which is significantly smaller than other RISC-V-based zkVMs. This simplicity also aids in auditability and ease of specification:

  > The lookup-only approach also results in a simpler and more auditable implementation. These benefits are difficult to quantify, and take time to recognize and appreciate. But Jolt does well on crude proxies such as lines of code (the Jolt codebase is about 25,000 lines of code, which is 2x to 4x fewer than prior RISC-V zkVMs) and development time. Such improvements are much harder to obtain than performance ones: While I expect zkVM provers to be nearly 100x faster in the coming months than they were in August 2023, it's hard to imagine a zkVM with even 10x fewer lines of code

- The project is also closely aligned with academic developments, such as [Twist and Shout](https://a16zcrypto.com/posts/article/introducing-twist-and-shout/), indicating a strong innovation pipeline with simplicity as its north star.

### Zisk

Zisk, developed by the Polygon team, has recently shown promising results, especially in terms of GPU acceleration and low-latency performance:

![Zisk Benchmark](https://hackmd.io/_uploads/rksufZG6Jl.png)

Its compact and focused codebase is also a positive factor. We are currently evaluating it further: [Zisk repository](https://github.com/0xPolygonHermez/zisk).

### Neo: toward post-quantum folding

Looking ahead, we anticipate the need for a second layer of recursion, which could incur substantial peer-to-peer communication costs. Folding schemes may offer a more efficient alternative by reducing these overheads.

In this context, we have opened a discussion with Srinath Setty regarding [Neo](https://eprint.iacr.org/2025/294), a recent proposal introducing a lattice-based folding scheme over small prime fields. Neo appears to be the first construction of its kind with a post-quantum orientation.

Notable characteristics:

- Folding scheme based on CCS
- Operates over small prime fields like Goldilocks
- Builds on Ajtai commitments
- Commitment cost is proportional to the number of non-zero bits
- Represents an evolution over prior work such as LatticeFold

Relevant commentary from Justin Thaler:

> Regarding post-quantum versions of Jolt, there's a new possible path towards that that doesn't go through Binius. Srinath Setty and Wilson Nguyen are releasing a new e-print soon (called Neo) that gives a lattice-based commitment scheme (based on Ajtai commitments) that a) avoids a lot of the overheads of prior lattice-based commitments and b) retains nice properties of elliptic curves that we exploit heavily in Twist+Shout, specifically 0s are "free" to commit to. Unfortunately, Neo's evaluation proofs are still very big (Srinath and William's new paper is focused on folding schemes so they don't really care about evaluation proofs). If the evaluation proof size comes down in the future, we could slot this right into Jolt and get a plausibly post-quantum proof system that's even more performative and way easier to build than Jolt+Binius (since it avoids switching over to binary fields).
In the future, evaluation proofs in Neo could potentially be improved using schemes like WHIR or FRI, which may reduce proof size and make the system more practical.
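To illustrate the "zeros are free" property mentioned above, here is a toy sketch of an Ajtai-style commitment com = A·m mod q: the commitment only accumulates columns of A at non-zero coordinates of m, so sparse witnesses are cheap to commit to. The parameters are tiny placeholders with no security meaning, and this is not Neo's actual construction.

```rust
// Toy illustration of why zero entries are "free" in an Ajtai-style
// commitment com = A * m (mod q): only non-zero message coordinates touch
// columns of A. Placeholder parameters with no security meaning; Neo's
// actual construction differs substantially.

const Q: u64 = 3329; // small placeholder modulus
const N_ROWS: usize = 4;

/// Commit to `m` by accumulating A's columns at non-zero positions only.
fn ajtai_commit(a: &[Vec<u64>], m: &[u64]) -> Vec<u64> {
    let mut com = vec![0u64; N_ROWS];
    let mut column_ops = 0;
    for (j, &mj) in m.iter().enumerate() {
        if mj == 0 {
            continue; // zero coordinates cost nothing
        }
        column_ops += 1;
        for i in 0..N_ROWS {
            com[i] = (com[i] + a[i][j] * mj) % Q;
        }
    }
    println!("columns touched: {column_ops} of {}", m.len());
    com
}

fn main() {
    // Deterministic toy matrix A (a real scheme samples it uniformly).
    let a: Vec<Vec<u64>> = (0..N_ROWS)
        .map(|i| (0..8).map(|j| ((i as u64 + 1) * 97 + j as u64 * 31) % Q).collect())
        .collect();
    let sparse_msg: [u64; 8] = [0, 0, 1, 0, 0, 0, 1, 0]; // mostly zeros => mostly free
    let com = ajtai_commit(&a, &sparse_msg);
    println!("commitment: {com:?}");
}
```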