# The Definitive CSP: Towards a Post-Quantum, On-Chain-Verifiable ZK Proving System for Client-Side Applications
> ***Living research document**.
This document records design constraints, evaluated alternatives, empirical findings, and the current implementation plan. It is expected to evolve as benchmarks, parameters, and implementation details are refined.*
## Abstract
Current on-chain ZKP verification is dominated by Groth16 and PLONK-ish systems that rely on elliptic-curve pairings and will be broken once a sufficiently powerful quantum computer arrives. We aim to design and implement a zero-knowledge proving system that is simultaneously: (i) **client-side feasible** (targeting mobile constraints on prover time, RAM and bandwidth), (ii) **post-quantum sound** (hash/code-based or lattice-based), (iii) **transparent** (eliminating setup security assumptions and distribution costs) and (iv) **directly verifiable on-chain** on an EVM L1 without relying on a SNARK wrapper.
---
## 1. Problem Statement And Success Criteria
### 1.1 Goal
Provide an answer to: **"What is the best client-side proving ZKP system that is PQ-sound and verifiable on-chain?"**
"Best" is defined by a three-way objective:
- **Client-side proving**: feasible on commodity phones (~4GB RAM target[^hwsurvey]) under realistic latency budgets (comparable to or better than ProveKit in CSP benchmarks[^ethproofs] for reasons defined in [section 5.1](#51-Baseline-And-Performance-Expectations)).
- **Post-quantum soundness**: avoid pairing-based assumptions; prefer hash-/code-based soundness.
- **On-chain verification under 1 MGas**: low enough for practical L1 settlement (see section [2](#2-On-Chain-Verification-Constraints)).
[^hwsurvey]: https://hackmd.io/@clientsideproving/AvgMobileHardware
[^ethproofs]: https://ethproofs.org/csp-benchmarks
### 1.2 Success Metrics
We will track:
- **Prover time**, **peak RAM**, **proof size**, and preprocessing sizes for workloads equivalent to the available client-side proving benchmarks[^ethproofs]:
    - proving the Keccak-256 hash of messages ranging in size from 128 B to 2 kB
    - proving the SHA-256 hash of messages ranging in size from 128 B to 2 kB
- **Verifier gas** (broken down into calldata + hashing + field arithmetic + control flow).
- **Security level** (targeting ≥128-bit where feasible; otherwise explicit roadmap to achieve it).
## 2. On-Chain Verification Constraints
### 2.1 Current Baseline: Groth16/PLONK-ish Systems
Groth16 is widely used because it is cheap and constant-size on-chain, but it is not post-quantum sound.
PLONK-ish verifiers can be substantially more expensive in gas. For example, Base reported **~2,396,575 gas** for Barretenberg verification in a passkey/ECDSA benchmark suite, vs ~347k gas for Groth16 implementations in that comparison[^bbgas].
[^bbgas]: https://blog.base.dev/benchmarking-zkp-systems
### 2.2 On-Chain Verification of Hash-Based Proof Systems
STARK verification on Ethereum has historically been considered expensive; an Ethereum Research post (May 26, 2021) cites **~5M gas** for StarkNet proof verification at that time[^starkgas]. This cost profile is acceptable for rollups but not for client-side dApps, where the user is expected to pay the verification gas cost for every transaction.
[^starkgas]: https://ethresear.ch/t/checkpoints-for-faster-finality-in-starknet/9633
### 2.3 WHIR As A Promising Direction
Recent work explores making WHIR/FRI-style verification significantly cheaper. A WHIR Solidity verifier PoC reported **~1.9M gas** (although with parameters not necessarily at 128-bit security)[^solwhir]. This is already competitive with, and by these figures cheaper than, some PLONK-ish on-chain verifiers (e.g., the Barretenberg figure above).
### 2.4 State-of-the-Art On-Chain Verification Costs
|ZKP system|Verification gas cost|
|-|-|
|Groth16|348K[^bbgas]|
|***Target***|***<1M***|
|***WHIR (SoTA)***|***1.9M***[^solwhir]|
|Barretenberg|2.4M[^bbgas]|
[^solwhir]: https://ethresear.ch/t/on-the-gas-efficiency-of-the-whir-polynomial-commitment-scheme/21301
## 3. Design Constraints
### 3.1 PQ SNARK Families
Two main PQ directions are commonly considered:
1. **Hash-/code-based proof systems** (often based on FRI/Reed–Solomon proximity testing), colloquially called "STARKs", though not all of them are STARKs: for example, Ligero and Binius use RS codes but are structurally distinct.
2. **Lattice-based approaches**, which can benefit from NTT-heavy arithmetic.
### 3.2 Lattice Verification And The NTT Precompile Line Of Work
ZKNOX reports practical results for on-chain lattice verification and proposes acceleration via NTT precompiles[^zknox]. This is relevant because NTT is also a general acceleration primitive for ZK systems and STARKs in particular[^pqeth]. However, this direction does not currently translate into actionable design choices for our project. Existing STARK-style verifiers deployed on Ethereum are fundamentally Merkle-based: on-chain verification is dominated by Merkle path checking and hash computation and does not include NTTs. Leveraging an NTT precompile would therefore require a **substantial restructuring of the proof system itself**, rather than a localized verifier optimization.
Moreover, practical use of such a precompile would require successful standardization and deployment of the precompile at the protocol level. At present, neither condition is guaranteed, and progress on one without the other does not yield a deployable system. Accordingly, we treat this line of work as **informative but out-of-scope** and focus instead on constructions that can be realized using today’s EVM primitives.
[^zknox]: https://zknox.eth.limo/posts/2025/02/24/ETHEREUM_for_PQ_era_250224.html
[^pqeth]: https://ethresear.ch/t/tasklist-for-post-quantum-eth/21296
### 3.3 Scope: WHIR-Based Constructions from Existing Implementations
We prioritize a path that is safe engineering-wise: **build only from existing components**, minimizing speculative cryptographic redesign. Concretely, WHIR has already been integrated as a polynomial commitment scheme (PCS) in multiple projects:
- ProveKit (Spartan + WHIR): [https://github.com/worldfnd/ProveKit](https://github.com/worldfnd/ProveKit)
- Whirlaway (WHIR-based): [https://github.com/TomWambsgans/Whirlaway](https://github.com/TomWambsgans/Whirlaway)
- leanMultisig (continuation of Whirlaway work): [https://github.com/leanEthereum/leanMultisig](https://github.com/leanEthereum/leanMultisig)
- Ceno zkVM: [https://github.com/scroll-tech/ceno/](https://github.com/scroll-tech/ceno/)
- HyperPlonk + WHIR (p3-playground): [https://github.com/han0110/p3-playground](https://github.com/han0110/p3-playground)
This ecosystem gives us multiple candidates and implementation reference points, even though they currently do not implement an EVM verifier.
## 4. Small-Field WHIR-Based SNARKs
A repeated pattern in recent "STARK-ish/IOP-ish" engineering is a move toward **small fields** (e.g., KoalaBear/BabyBear/Goldilocks) to reduce prover overhead and improve hardware efficiency.
In the WHIR Solidity verifier analysis, calldata is a major contributor to total gas, alongside hashing (Merkle) and modular arithmetic[^solwhir]. Thus, field element encoding directly affects the resulting gas cost.
We adopt the working hypothesis that moving WHIR verification from BN254 field elements (ProveKit, sol-whir) to 31-bit field elements should reduce calldata costs substantially, because calldata gas is charged per byte and therefore scales with the encoded proof size.
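As a rough illustration of this hypothesis, the sketch below prices a single field element as calldata. It assumes the EIP-2028 schedule (4 gas per zero byte, 16 gas per non-zero byte) and ignores any other pricing rules; the sample values and the helper names `calldata_gas` and `encode` are hypothetical and not taken from any of the referenced implementations.

```python
# Calldata pricing per EIP-2028 (assumed here): 4 gas per zero byte,
# 16 gas per non-zero byte.
ZERO_BYTE_GAS = 4
NONZERO_BYTE_GAS = 16


def calldata_gas(data: bytes) -> int:
    """Gas charged for passing `data` as transaction calldata."""
    zeros = data.count(0)
    return zeros * ZERO_BYTE_GAS + (len(data) - zeros) * NONZERO_BYTE_GAS


def encode(value: int, width: int) -> bytes:
    """Big-endian encoding of a field element into `width` bytes."""
    return value.to_bytes(width, "big")


# Hypothetical worst-case samples: every byte is non-zero.
bn254_value = int.from_bytes(b"\x11" * 32, "big")  # fits below the BN254 modulus
small_value = 0x7EABCDEF                           # fits in a 31-bit field

gas_bn254 = calldata_gas(encode(bn254_value, 32))         # full 32-byte word
gas_small_packed = calldata_gas(encode(small_value, 4))   # tightly packed, 4 bytes
gas_small_padded = calldata_gas(encode(small_value, 32))  # padded to a 32-byte word

print(gas_bn254, gas_small_packed, gas_small_padded)
```

Under these assumptions the saving depends on the encoding: a 31-bit element only reaches the 4-byte cost if the proof is tightly packed rather than padded to 32-byte words, which in turn requires the verifier to unpack elements itself.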
## 5. Benchmarking Plan
### 5.1 Baseline And Performance Expectations
ProveKit (Spartan + WHIR) is adopted as the **baseline system** for client-side proving performance throughout this work. This choice is motivated by the fact that ProveKit:
- already integrates WHIR in a production-oriented prover stack,
- supports zero-knowledge,
- targets developers,
- and represents a realistic upper bound on engineering maturity among WHIR-based systems today.
We aim to match or outperform ProveKit’s prover performance by choosing a system with a different field and a different ZKP protocol, while removing constraints that make ProveKit unsuitable for direct post-quantum on-chain verification (e.g., reliance on BN254 and proof wrapping).
The central hypothesis is that moving from a large prime field (BN254) to small fields (e.g., KoalaBear) provides:
- lower arithmetic cost per operation,
- better cache locality and SIMD/vectorization friendliness[^simd],
- reduced witness and transcript size,
which together allow a WHIR-based system to achieve ProveKit-comparable or better prover performance.
[^simd]: https://github.com/tcoratger/whir-p3/pull/383
### 5.2 Preliminary Results
We compare five WHIR-based candidates with the explicit aim of choosing _one_ base system:
- **ProveKit** (Spartan + WHIR; big-field leaning; includes ZK features but also design choices aimed at SNARK wrapping)
- **HyperPlonk + WHIR** (p3-playground)
- **Whirlaway**
- **[Spartan-WHIR](https://github.com/alxkzmn/spartan-whir)**
- **Ceno** (GKR + WHIR)
#### 5.2.1 Client-Side Proving Performance
The [latest benchmarks](https://github.com/alxkzmn/csp-benchmarks/tree/the-definitive-csp) were run on an M4 Pro laptop.
The benchmarks were run using equivalent Keccak-256 circuits that perform full hashing for variable input sizes. The Keccak-256 circuit was chosen for the following reasons:
- A readily available [Keccak-256 AIR in the Plonky3 library](https://github.com/succinctlabs/plonky3/tree/main/keccak-air), which significantly reduces the engineering effort;
- Sufficient circuit complexity to assess the systems' properties (i.e., not as simple as a Poseidon AIR).
**Ceno** was discarded early during the test runs for the following reasons:
- No full WHIR support yet (only BaseFold)
- ~6 s proving time and a 5+ GB RAM footprint for the smallest SHA-256 input size (128 B) in the CSP benchmarking suite. This RAM footprint is beyond our feasibility cutoff for client-side proving. The long proving time for such a small circuit (compared to other systems[^ethproofs]) also suggests that the system is not optimized for client-side devices, which may be out of scope for this zkVM.
The benchmarks were therefore run for **HyperPlonk-WHIR** and **Whirlaway** against the **ProveKit** baseline.





:::success
The results demonstrate that both HyperPlonk and Whirlaway have enough margin for parameter optimization to trade prover performance for reductions in proof size and verifier time. The systems' proving performance (time and RAM footprint) is well within the bounds of client-side proving[^hwsurvey][^ethproofs].
:::
#### 5.2.2 Proof Size Benchmarks (WIP)
Preliminary comparative results were obtained by running the systems with the following parameters:
- polynomial degree k=16 (for HyperPlonk and Whirlaway, achieved by adjusting the circuit input size)
- security bits=80
- soundness=CapacityBound (ConjectureList in old naming)
- pow_bits=30
- folding_factor=4
- starting_log_inv_rate=6
- rs_domain_initial_reduction_factor=1 if available
- quartic extension (for Plonky3-based systems)
:::success
The proof sizes (in bytes) are given in the table below.
| System | WHIR proof | PIOP proof | Total |
| ------------ | ---------- | ---------- | ------ |
| **WHIR (SoTA)** | **30724** | - | 30724 |
| **[Spartan-WHIR](https://github.com/alxkzmn/spartan-whir)** | **11368** | 2627 | 13995 |
| Whirlaway | 19735 | 94728 | 114463 |
| HyperPlonk | 19509 | 125224 | 144789 |
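As seen from the current results, the Plonky3-based implementation of Spartan-WHIR **improves on the SoTA WHIR proof size by almost 3x** (11368 B vs. 30724 B, about 2.7x), and by about 2.2x on total proof size (13995 B vs. 30724 B).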
:::
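To relate these proof sizes to the 1M gas verification target, the sketch below computes a pessimistic calldata bound for each total in the table. It assumes every proof byte is non-zero and priced at 16 gas (the EIP-2028 rate) and that the whole proof is passed as calldata; hashing, field arithmetic and control flow are not included, and the names used are hypothetical.

```python
# Pessimistic bound: every proof byte is non-zero calldata at 16 gas per byte
# (EIP-2028 rate, assumed). Execution costs are deliberately left out.
NONZERO_BYTE_GAS = 16
GAS_TARGET = 1_000_000

# "Total" column of the proof size table, in bytes.
total_proof_bytes = {
    "WHIR (SoTA)": 30724,
    "Spartan-WHIR": 13995,
    "Whirlaway": 114463,
    "HyperPlonk": 144789,
}


def calldata_upper_bound(n_bytes: int) -> int:
    """Upper bound on calldata gas for a proof of `n_bytes` bytes."""
    return n_bytes * NONZERO_BYTE_GAS


for name, size in total_proof_bytes.items():
    gas = calldata_upper_bound(size)
    share = gas / GAS_TARGET
    print(f"{name}: at most {gas} gas of calldata ({share:.0%} of the target)")
```

Under this bound, only proofs in the range of the first two rows leave room for execution costs within a 1M gas budget, which is why proof size is treated as the first selection criterion.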
:::warning
TODO: the discrepancy may be this large because Spartan-WHIR uses R1CS; investigate.
:::
We use known calldata size optimization techniques in all three implementations:
- "multiproof" batched Merkle openings[^mmp], where the openings for multiple leaves are provided in a single data structure with deduplicated nodes, saving around 10-15% of the total proof size and around 5% of verification time;
- "masking", which is widely used in production deployments of STARK verifiers[^starkmasking] and consists of zeroing out the least significant bits (LSBs) of Merkle hash digests, yielding another 15-18% reduction in calldata size.
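Both techniques can be sketched in a few lines. The code below is a minimal illustration for a binary Merkle tree, not the code used in the benchmarked implementations: the function names, the query set, the 20-byte kept width and the use of SHA-256 as a stand-in digest are all assumptions made for the example.

```python
import hashlib


def multiproof_sibling_count(leaf_indices, depth):
    """Sibling nodes a batched opening must supply in a binary Merkle tree."""
    known = set(leaf_indices)
    supplied = 0
    for _ in range(depth):
        # A sibling is supplied only if the verifier cannot recompute it
        # from nodes it already knows at this level.
        supplied += sum(1 for i in known if (i ^ 1) not in known)
        known = {i >> 1 for i in known}  # move up to the parent level
    return supplied


def separate_paths_sibling_count(leaf_indices, depth):
    """Sibling nodes needed when every leaf carries its own full path."""
    return len(set(leaf_indices)) * depth


def mask_digest(digest: bytes, kept_bytes: int) -> bytes:
    """Zero out the trailing (least significant) bytes of a digest."""
    return digest[:kept_bytes] + bytes(len(digest) - kept_bytes)


queries = [0, 1, 6, 12]  # hypothetical query positions in a 16-leaf tree
depth = 4
batched = multiproof_sibling_count(queries, depth)
separate = separate_paths_sibling_count(queries, depth)

digest = hashlib.sha256(b"node").digest()  # stand-in for a Keccak digest
masked = mask_digest(digest, 20)           # 20 kept bytes is an assumption

print(batched, separate, masked.hex())
```

The toy tree exaggerates the multiproof saving relative to the percentages quoted above, because real proofs also contain non-Merkle data and the queries are spread over much larger trees. Masking does not shorten the digest; it saves gas because zero calldata bytes are priced lower than non-zero ones.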
[^mmp]: https://ethresear.ch/t/optimizing-merkle-tree-multi-queries/4912/3
[^starkmasking]: https://raw.githubusercontent.com/starkware-libs/starkex-contracts/aecf37f2278b2df233edd13b686d0aa9462ada02/evm-verifier/solidity/contracts/MerkleVerifier.sol
[^whirp3]: https://github.com/tcoratger/whir-p3/
## 6. On-Chain Verifier Design
### 6.1 Reference Implementations
We build from the existing Solidity verifier PoC and accompanying EVM-oriented prover modifications:
- Solidity verifier PoC: https://github.com/privacy-ethereum/sol-whir
- EVM-verifier-oriented prover branch: https://github.com/dmpierre/whir/tree/feat/evm-verifier
- Gas-efficiency writeup and breakdown[^solwhir]
### 6.2 Expected Effects Of Small Field
We expect improvements mainly from:
- **Calldata shrinkage**: smaller field elements → fewer bytes.
- **Byte-wise Keccak input shrinkage**: modest hashing savings.
- **Arithmetic batching**: for 31-bit field elements, many additions/multiplications can be accumulated in 256-bit lanes before a single modular reduction step, potentially reducing `addmod/mulmod` usage.
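The arithmetic batching idea can be checked with a small model. The sketch below uses Python integers to emulate 256-bit EVM words and compares an inner product that reduces after every operation with one that reduces once at the end; it assumes the KoalaBear prime $2^{31} - 2^{24} + 1$, and the function names are hypothetical.

```python
KOALABEAR_P = 2**31 - 2**24 + 1  # 2130706433
WORD_MAX = 2**256 - 1            # largest value of a 256-bit EVM word


def inner_product_eager(a, b, p=KOALABEAR_P):
    """Reduce after every step: one mulmod and one addmod per term."""
    acc = 0
    for x, y in zip(a, b):
        acc = (acc + (x * y) % p) % p
    return acc


def inner_product_lazy(a, b, p=KOALABEAR_P):
    """Accumulate unreduced products and reduce once at the end."""
    acc = 0
    for x, y in zip(a, b):
        acc += x * y            # each product is below 2**62
        assert acc <= WORD_MAX  # an EVM word would wrap around past this point
    return acc % p


def max_terms_before_overflow(p=KOALABEAR_P):
    """Number of worst-case products that fit in one 256-bit accumulator."""
    return WORD_MAX // ((p - 1) ** 2)


a = [KOALABEAR_P - 1] * 1000  # worst-case operands
b = [KOALABEAR_P - 1] * 1000
print(inner_product_eager(a, b), inner_product_lazy(a, b))
print(max_terms_before_overflow() > 2**190)
```

Since more than $2^{190}$ worst-case products fit into a single word, overflow is not a practical concern for sums of this shape. On the EVM, `add` and `mul` cost 3 and 5 gas while `addmod` and `mulmod` cost 8 gas each, so the per-term saving is modest and the actual benefit has to be confirmed by gas profiling.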
### 6.3 Merkle Commitment Parameterization
We will incorporate "increase arity closer to the root" style optimizations wherever they reduce gas, even marginally, as already explored in prior analysis[^merkleopti]. The gain is expected to be incremental, not foundational.
[^merkleopti]: https://hackmd.io/@clientsideproving/whir-fri-verifier-opt
### 6.4 Verification Cost Targets
A pragmatic target is **< 1M gas** for verification on Ethereum mainnet (accepting that this will not beat Groth16’s ~300–400k class costs, but provides PQ soundness + transparency).
## 7. Adding Zero-Knowledge (ZK) To A Non-ZK Base Protocol
The chosen base (e.g., HyperPlonk/Whirlaway/Ceno) may not be ZK by default. We plan to add ZK using established techniques for making sumcheck/IOP components zero-knowledge[^zksumcheck]. We will also reuse implementation patterns from multilinear and WHIR-based systems that already ship ZK in practice (e.g., Spartan2 and ProveKit) to reduce engineering risk.
[^zksumcheck]: https://eprint.iacr.org/2017/305
## 8. Frontend/Developer Experience Plan
### 8.1 Why "Not Noir"
Noir is widely adopted, but its target field and IR assumptions do not match our target stack. Our candidate systems imply:
- Different field (not BN254, but a small field like KoalaBear)
- Potentially different IR (not ACIR, but AIR with Plonky3 API)
This makes direct reuse of Noir libraries and tooling substantially harder, and may require re-implementing "precompile-like" gadgets (e.g., range checks), as observed in practice in ProveKit.
### 8.2 Cairo-Based Frontend Option
Cairo already targets small fields and AIR, matching the representation used by Plonky3-based systems that we are considering. This reduces the potential amount of work on an adapter from Cairo to Plonky3 AIR.
## 9. Implementation Milestones
**M0 - Reproducible baseline** ✅
- Pin versions, unify hash/transcript primitives (Keccak), and produce a reproducible benchmark harness.
**M1 - Updated benchmark round** ✅
- Re-run ProveKit vs HyperPlonk+WHIR vs Whirlaway vs Ceno benchmark.
- Select winner using: proof size → verifier proxy cost → prover RAM/time.
**M2 - Solidity verifier for the chosen small-field WHIR system**
- Port the existing BN254 verifier to the chosen small field (or build from scratch while closely referencing the existing implementation).
- Implement all potential enhancements related to small field (e.g., batch addmod/mulmod).
- Profile gas cost and apply further optimizations to reach target range (ideally <1M).
**M3 - Add ZK**
- Apply zero-knowledge sumcheck/IOP techniques; validate no leakage and measure overhead.
**M5 - Frontend prototype**
- Cairo-to-(target AIR) compilation path for at least one canonical application (e.g., account abstraction gadget or private transfer toy app).
**M6 (Optional) - Low-memory proving option**
- Integrate streaming sumcheck if needed for the "4GB RAM" goal.