
SNARKifying the Ethereum Protocol [WIP]

1. Ethereum Roadmap

Ethereum wants to have its cake and eat it too: achieve scalability while maintaining or even increasing decentralization.

By reducing the tension between the two we can have more of both simultaneously, which in turn leads to more overall security in various ways: more usage and usecases, network value, robustness against attack, and democratization of access and validation.

A lot has been achieved since 2020: uniquely distributed liveness-favoring consensus complemented with a robust slashing and recovery mechanism, enshrined ETH with baseline staking utility, trust-minimized light validation, and scalable and proliferating layer-2s.

Work continues to further harden security, neutrality, and chain economics both in-protocol (censorship resistance, protocol simplification, decentralization) and extra-protocol.

There still are essential things we need and want, and better things we would like:

  1. Fast finality and non-reorgability
  2. Post-quantum security
  3. Accessible trustless validation:
    • Stateful validation: the hardware requirement for running a solo staker should be accessible to average consumer hardware
    • Stateless validation: verifying whole blocks and/or selective parts of the state should be cheap enough to be possible on low-power devices
  4. Scalable validation where all 1M+ validators attest to a block.
  5. Scalable L1 execution ensuring the viability of L1 as a deployment target for a wide range of usecases beyond bridges and DA, thereby achieving a healthy symbiotic balance between L1 and L2s.

SNARKs are the key to achieving all of the above.

2. Why build an L1-specific zkVM substrate

There are many existing zkVM projects in the industry, so why can't we just build our L1 SNARKs of interest on one of them?

  • L1 requirements for prover costs are quite extreme: proving in seconds and on consumer-level hardware, to accommodate solo stakers for example (particularly for signature aggregation). It is unlikely that a generic zkVM can meet that soon enough:
    • "soon enough" is somewhat subjective: it depends on the perceived urgency of L1 scaling to 1M+ validators, <10s SSF time, and a much higher block-size limit than exists today (such that L1 keeps its share of execution vis-a-vis dApps)
  • Auditing should be integrated in the process of building the substrate from early stages. This can be taken further by baking formal proving of self-contained components into the CI.
  • Elimination of cruft and reduction of bug surface
  • Building a substrate as close to the (arithmetization) metal as possible is advantageous for L1 but disadvantageous for businesses whose goal is to go to market as quickly as possible.
  • The last-mile misalignment of incentives: once the proving costs of a zkVM project are economically sustainable enough to start onboarding dApp devs into its prover services, the incentive shifts toward devrel, bd, etc. There is less incentive to keep dedicating resources to grinding prover costs down further. In fact, there could be an incentive not to do that, or to do it but not open-source it, because an open-sourced cheap prover removes the moat: proving would move client-side, or other services would start competing using it:
    • The industry wants server-side proving while Ethereum needs client-side proving

Via Ming:

  1. Auditability: the architecture should be straightforward, ideally explainable in layman’s terms, and highly auditable. I would even say it’s beneficial if it didn’t rely on hand-rolled precompile circuits, allowing for a static circuit design that remains adaptable even through future L1 main-chain upgrades. We even tried to think about exploring a one-instruction-set computer as a zkVM candidate.
  2. Accessibility + Decentralization: The zkVM prover should be optimized for accessibility on consumer-grade hardware:
    • No GPU requirement.
    • Workload should be min-overhead parallelizable by design to run efficiently on low-end hardware. The aggregation of workload threads should remain simple, with minimal inter-thread data exchange requirements.
  3. Optionality on recursion: while other zkVM solutions focus on STARKs combined with SNARKs for on-chain verification, a zkVM on Layer 1 is in a unique position in that the SNARKs are baked in at the client level, so gas-cost savings are not a primary concern. This setup makes recursive SNARKs optional rather than essential.
  4. In the End Game zkVM, a modular zkVM prover framework can be advantageous (a topic we rarely touched): define modular prover framework interfaces for different zkVM solutions on L1, enabling plug-and-play. It’s also essential to design economic incentives to reward participation and ensure alignment with network goals.

3. Requirements for L1-specific SNARKs:

SNARK R&D has matured over recent years and can now plausibly meet the core L1 requirement of keeping hardware requirements low enough for consumer-grade hardware:

  • GKR-Sumcheck zkVM: 2^20 riscv add opcodes in 0.65s => 1.613 MHz on a single node; 7x and 17x faster than SP1 and Jolt, respectively.
  • Plonky3 2M/s Poseidon hashes
  • Risc0: Ethereum block proven in 90s, sha256 of 1MB per second
  • SP1: 15 minutes and 38GB RAM for 1 BLS12 pairing (arkworks) with SP1 on an M3 with 12 cores
  • Benchmarks by vac.dev
  • Computing power: low verifier cost for light clients and low verifier+prover costs for validators
  • Bandwidth: small proof sizes that can be propagated across the network within one slot. Bandwidth as a limitation is somewhat improved because the time currently reserved for nodes to re-execute blocks is time we can use to propagate larger blocks.
  • Storage: there is "cold" data that Ethereum receives from rollups, whether as calldata or the much cheaper blobs. It's "cold" because L1 nodes store it but don't re-execute the transactions in its belly (hence scaling). Then there is hot data: the state. Processing transactions involves reading and writing to the state, which is structured as a Merklized database, which involves a lot of hashing.
    PeerDAS scales cold data (blobs), while SNARKs scale the hot data: nodes update the state in place without re-executing transactions and re-computing root hashes; they just verify the proof attached to a new block. In the End Game where both CL and EL are validity-proven, nodes don't even need to store the state at all. The state becomes a private input to the proof provider, which is the proposer or the builder on its behalf.

4. L1 SNARKification Strategy:

There are two questions to consider:

  1. What is the best overall approach for introducing SNARKs to the base protocol?
  2. What proof system ingredients are best suited for L1 needs?


VM + Bottom-up has the better risk:reward overall: oil the auditing machinery over many iterations on the same core, trial-run in high-stakes yet non-consensus-breaking contexts, and simplify SSF and potentially reach the End Game faster overall.
In Top-down, the Ethereum CL/EL becomes zk{EL,CL} in one shot. This is sometimes referred to as zkEVM, but that is somewhat reductive because a lot more goes into the Ethereum state-transition function than just EVM opcodes: blocks, headers, signatures, ..

Top-down circuit

Circuits are created for every aspect of the protocol: EVM opcodes, storage reads/writes, signature verification, hashing, .. all of which are consolidated in meta-circuits to prove entire blocks, and finally the state-transition function after that (eg checking this proven block's header to constrain the root hash of the previous block).

The advantage of this is that every circuit can be hand-optimized for maximum efficiency (=least number of constraints), but there are serious disadvantages:

  • Any change in the Ethereum protocol means the circuits have to be changed, which means they have to be re-audited. Their integration has to be re-audited too, which effectively means auditing the whole thing again.
  • Meta-circuits (gluing, say, hashing+storage into the state meta circuit to prove a tx footprint on the state was correctly applied) are incredibly complex and extremely hard to reason about or audit in a reliable way.
Top-down VM

There are still circuits in this route, but instead of constraining the ever-mutating protocol components, we constrain a stable and simple ISA such as RISCV (or MIPS, WASM, ..). Then, Rust- or Go-based Ethereum nodes can be compiled to RISCV, and the proving goes from there. The circuits in this case have no direct exposure to the Ethereum protocol, a separation which is clearly advantageous for lower complexity and better auditability.

If Ethereum nodes are upgraded, we simply re-compile them to RISCV and the proving proceeds exactly as before.

Auditability is also easier. RISCV is simply primitive opcodes and register and memory access.
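
To make concrete how small that audit surface is, here is a minimal Rust sketch of RV32I register semantics (illustrative only, not tied to any particular zkVM, and operating on pre-decoded instructions rather than raw encodings); the constraint system ultimately has to encode little more than rules of this shape, plus memory access and the program counter:

```rust
/// A toy subset of RV32I, pre-decoded for clarity.
#[derive(Clone, Copy)]
enum Instr {
    Add { rd: usize, rs1: usize, rs2: usize },  // rd <- rs1 + rs2 (wrapping)
    Xori { rd: usize, rs1: usize, imm: u32 },   // rd <- rs1 ^ imm
    Srli { rd: usize, rs1: usize, shamt: u32 }, // rd <- rs1 >> shamt
}

struct Cpu {
    regs: [u32; 32], // x0..x31; x0 is hard-wired to zero
    pc: u32,
}

impl Cpu {
    fn step(&mut self, instr: Instr) {
        match instr {
            Instr::Add { rd, rs1, rs2 } => {
                let v = self.regs[rs1].wrapping_add(self.regs[rs2]);
                self.write(rd, v);
            }
            Instr::Xori { rd, rs1, imm } => {
                let v = self.regs[rs1] ^ imm;
                self.write(rd, v);
            }
            Instr::Srli { rd, rs1, shamt } => {
                let v = self.regs[rs1] >> (shamt & 31);
                self.write(rd, v);
            }
        }
        self.pc = self.pc.wrapping_add(4); // each base instruction is 4 bytes
    }

    fn write(&mut self, rd: usize, v: u32) {
        if rd != 0 {
            self.regs[rd] = v; // writes to x0 are discarded
        }
    }
}

fn main() {
    let mut cpu = Cpu { regs: [0; 32], pc: 0 };
    cpu.regs[1] = 7;
    cpu.regs[2] = 35;
    cpu.step(Instr::Add { rd: 3, rs1: 1, rs2: 2 });
    assert_eq!(cpu.regs[3], 42);
    assert_eq!(cpu.pc, 4);
}
```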

This abstraction sandwich:

  • Bread slices:
    • Ethereum protocol (not just EVM opcodes, common misconception)
    • Constraints
  • Filling: RISCV instruction set

comes with an overhead, but that can be minimized with optimizations: carve out highly-repeated operations and hand-constrain them with 'precompiles'.

Hashing is an example where, instead of implementing keccak in Rust, compiling that to riscv, and proving whatever binary the riscv compiler spits out for us, we instead:

  • Create a set of constraints by hand for keccak, OR
  • Compose the constraints from low-level opcodes (booleans and shifts etc) to simulate keccak, effectively creating a 'virtual opcode', THEN
  • Inject this 'precompile' set of constraints during proof generation. This injection is through a connective-tissue Rust library that end users simply import and invoke precompiles through.
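
As a concrete (hypothetical) illustration of that developer experience: `zkvm_precompiles` and its `keccak256` below are stand-ins for whatever connective-tissue library a real zkVM SDK would ship, and the feature flag is an assumption; the point is only that the business logic stays the same whether or not the precompile is injected:

```rust
// Hypothetical guest-side sketch: the precompile-backed path vs the plain-Rust
// fallback. Inside the zkVM, the fallback would be compiled to riscv and proven
// opcode by opcode, which is exactly the overhead the precompile avoids.

#[cfg(feature = "zkvm")]
use zkvm_precompiles::keccak256; // hypothetical: proven via hand-written keccak constraints

#[cfg(not(feature = "zkvm"))]
fn keccak256(data: &[u8]) -> [u8; 32] {
    // Plain-Rust keccak (here via the `tiny-keccak` crate) for native execution.
    use tiny_keccak::{Hasher, Keccak};
    let mut hasher = Keccak::v256();
    let mut out = [0u8; 32];
    hasher.update(data);
    hasher.finalize(&mut out);
    out
}

/// Example caller: derive an (illustrative) storage key from an address and slot.
/// The call site is identical in both worlds; only the proving cost changes.
fn storage_key(address: &[u8; 20], slot: &[u8; 32]) -> [u8; 32] {
    let mut preimage = Vec::with_capacity(52);
    preimage.extend_from_slice(address);
    preimage.extend_from_slice(slot);
    keccak256(&preimage)
}
```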

Deploying a zkVM to wrap the EL, for example, goes roughly as follows:
A generic zkVM is created and the entire Ethereum node is compiled to it. For the EL, a block producer or its proxy (builder) would create a block within said zkVM by compiling the production node to the zkVM bytecode. The trace of the compiled binary is what gets proven. The new block and the proof are propagated to the network, and nodes simply update the state to reflect changes from this new block, but without re-executing txs or re-hashing tries: they simply verify the proof and apply the changes. In a post-End Game world, "hot" state data itself could be PeerDASed, giving nodes, light or otherwise, the choice not to store everything. They simply verify and follow a chain of blocks (no state attached to them), or even just the chain of headers plus the state of relevance to them, which they can validate against the root of the tip of the chain: "there does exist a state trie whose root hash is x, and I have a Merkle proof against the same x proving that my ETH balance, the leaf, is part of that distributed state".
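
A minimal sketch of that node-side flow (all types and the `verify` call are placeholders, no particular proof system implied); what matters is what is absent: no transaction re-execution and no trie re-hashing on the import path:

```rust
use std::collections::HashMap;

struct Block {
    parent_state_root: [u8; 32],
    post_state_root: [u8; 32],
    state_diff: Vec<(Vec<u8>, Vec<u8>)>, // (key, new value) pairs touched by the block
}

struct Proof(Vec<u8>);

/// Placeholder for the zkVM verifier: checks that some execution of the compiled
/// node binary maps `parent_state_root` to `post_state_root` with exactly this diff.
fn verify(_block: &Block, _proof: &Proof) -> bool {
    true // stub
}

fn import_block(
    state: &mut HashMap<Vec<u8>, Vec<u8>>,
    head_root: &mut [u8; 32],
    block: Block,
    proof: Proof,
) -> Result<(), &'static str> {
    if block.parent_state_root != *head_root {
        return Err("block does not extend our head");
    }
    if !verify(&block, &proof) {
        return Err("invalid validity proof");
    }
    // Apply the diff in place: no EVM, no keccak, no trie walk.
    for (key, value) in block.state_diff {
        state.insert(key, value);
    }
    *head_root = block.post_state_root;
    Ok(())
}
```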

Bottom-up circuit

Circuits are built, but for narrow-scope usecases, at some lower level such as the Halo2 API or PIL. Depending on the SNARK being built, development time can vary. It can be an optimization candy store for system developers because of building so close to the metal. The scope must, however, remain small before issues of complexity and unauditability start appearing.

> TODO: find benchmarks of keccak in Halo2 vs in zkVMs or compare/contrasts impl in both

Bottom-up VM

The zkVM is built in exactly the same way as described above in the "Top-Down VM" route, but not deployed just yet. Instead, it is used as a substrate to build and deploy narrow-scope SNARKs targeting specific self-contained components of the protocol. These SNARKs are developed mostly in vanilla Rust, with precompiles for high-frequency and/or proving-expensive components imported as libs.

Narrow-scope SNARKs, such as hash-based-SNARK signature aggregation to replace BLS, can help achieve some urgent roadmap goals faster: SSF, quantum resistance, and fully validating light clients. Meanwhile, the core zkVM design and implementation, including proof system backend, can continue to be iterated upon independently and without disruption.

Because of the decoupling between the proof system and the programming environment (recall: the riscv binary separates the two nicely), development on the core zkVM can proceed in parallel with the development of the narrow-scope SNARKs of interest, themselves being parallelizable.

There exists a decoupling between all layers, and so changes in one do not affect the others (unless some protocol change just happens to touch the exact part of the protocol the SNARK is targeting):

  1. Ethereum protocol changes
  2. SNARK business logic
  3. Upgrades to the core proof system
  4. Amendments or additions of zkVM precompiles

5. Bottom-up VM for Ethereum L1

Any self-contained component of the Ethereum protocol could in theory be wrapped in a SNARK. A component is an ideal target for SNARKification if it maximally meets the following criteria:

  1. Urgently needed for security, scalability, and/or decentralization (emphasis on the latter because, arguably, it's lagging behind the former two).
  2. It unlocks optionality or simplification in other parts of the roadmap
  3. Low risk: it doesn't break consensus or it is easy to excise off in case of emergency

Today we could have users attach a SNARK to their transaction proving its validity (well-formedness, signature, balance, nonce, ..) wrt the latest finalized block, which could be verified before adding the transaction to the mempool or an inclusion list, for example. However, expending the effort to take this to production is somewhat unjustified: (1) it doesn't solve an urgent need, and (2) it doesn't unlock optionality or simplification in other parts of the protocol.

|                                            | Urgency | Unlocking potential | Risk |
| ------------------------------------------ | ------- | ------------------- | ---- |
| SNARKified ECDSA Sigs                      | low     | low                 | low  |
| hash-based-SNARKified Sig Aggr.            | high    | high                | high |
| Consensus-Committee-free light validation  | low     | medium              | low  |

Possible strategy: start with #3 because it's less risky but sufficiently useful, then #2 because of its massive unlocking potential towards simple SSF, which in turn focuses the work on other things. The work on #3 and #2 can happen in parallel, but deployment of #2 should take longer given the higher risk (dominated by audit time and the EIP process).

5.1 SNARK #1: eliminate the need for consensus committee

Right now a daily committee of 512 validators is selected to inform light clients about the tip of the chain. Eliminating the need for this committee is sufficiently urgent to warrant the effort, can unlock optionality at the app and wallet level (removed or minimized trust vis-a-vis centralized providers of node RPCs) in the short term, and can be a kernel for SNARKifying all of the EL. It is also low-risk enough to serve as a trial run of the SNARKification stack.

What to prove:

  • At the EL:
    • This block is valid: proving must not be so expensive that centralized RPC providers decline to support it
    • This block is in the chain of headers.
  • At the CL (though there is a case for first reforming the CL to be SNARK-friendly):
    • We could follow a storage-based proving strategy to avoid pairings
      • Challenge: validator state is sha256-hashed
        • It is likely sha256 will be refactored-out and replaced with SNARK-friendly Poseidon
        • Is there a storage slot we can prove against the root+headers that says "this block has this total validator weight behind it", and avoid proving hashing? If not, this is probably something that can ~easily be added to the CL (see the Merkle-branch sketch after this list).
    • Chain of headers since the merge
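
The "this leaf is part of that state" checks above all bottom out in an ordinary Merkle-branch verification. A minimal sketch, modeled on the consensus-spec style of generalized-index branches (SHA-256 here stands in for whichever hash the tree actually uses; inside a SNARK this same check runs over private inputs):

```rust
use sha2::{Digest, Sha256};

fn hash_pair(a: &[u8; 32], b: &[u8; 32]) -> [u8; 32] {
    let mut h = Sha256::new();
    h.update(a);
    h.update(b);
    h.finalize().into()
}

/// Walk from the leaf to the root, picking left/right order from the index bits.
fn is_valid_merkle_branch(leaf: [u8; 32], branch: &[[u8; 32]], index: u64, root: [u8; 32]) -> bool {
    let mut node = leaf;
    for (depth, sibling) in branch.iter().enumerate() {
        if (index >> depth) & 1 == 1 {
            node = hash_pair(sibling, &node); // we are the right child at this depth
        } else {
            node = hash_pair(&node, sibling); // we are the left child at this depth
        }
    }
    node == root
}
```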

5.2 SNARK #2: Hash-based recursive SNARK to replace BLS Sig Aggregation

Replacing BLS aggregation with a hash-based SNARK, where aggregation is accomplished through recursive proving, can achieve many benefits and unlock optionality wrt SSF and state-structure refactoring:

  • Post-quantum-secure signature aggregation
  • More efficient aggregation: recursively SNARK-aggregating two batches with overlapping bitfields is a non-issue, unlike in BLS aggregation where we have to perform an optimization heuristic and some signatures can go to waste (see the merge sketch after this list).
  • Optionality on SSF: if each node can recursively prove and forward in <=1s, brute-forcing single-slot finality becomes feasible. Bonus: simultaneously and progressively reduce the staking amount from 32 ETH down to 1 ETH, which may be possible if p2p comms can bear flooding-style broadcast and/or more aggregation/collation hierarchy. With fast-enough aggregation SNARKs, we avoid consensus changes or even changes to the current aggregation structure (except maybe increasing the number of participants at each layer of aggregation).
  • Optionality on validator set size: choose un- or less-structured broadcasting of signatures, because what's being communicated in aggregation subnets is (a) a bitfield and (b) a proof; the list of (batches of) signatures is private input that doesn't get communicated. This reduces the issue of supporting >1M validators to a question of bandwidth.
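
A sketch of why overlap stops mattering: an aggregation node only tracks a bitfield of covered validators plus a proof that every set bit is backed by a verified signature; merging two aggregates is a bitwise OR and one recursive proving call (the `prove_union` call is a placeholder, no particular proof system implied):

```rust
#[derive(Clone)]
struct Aggregate {
    bitfield: Vec<bool>, // one bit per validator in the set
    proof: Vec<u8>,      // SNARK attesting: "every set bit has a valid signature"
}

/// Placeholder for the recursive prover: verifies both child proofs inside the
/// circuit and emits one proof covering the union of the two bitfields.
fn prove_union(_a: &Aggregate, _b: &Aggregate) -> Vec<u8> {
    vec![] // stub
}

fn merge(a: Aggregate, b: Aggregate) -> Aggregate {
    assert_eq!(a.bitfield.len(), b.bitfield.len());
    let bitfield = a
        .bitfield
        .iter()
        .zip(&b.bitfield)
        .map(|(x, y)| *x || *y) // overlap is just an OR; nothing goes to waste
        .collect();
    let proof = prove_union(&a, &b);
    Aggregate { bitfield, proof }
}
```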

Because hashing and recursion are core to this SNARK, there is indirect unlocking potential at the EL level, provided the zkVM substrate exhibits blazing-fast hash and whole-block proving (i.e. the success of this SNARK as a trial may justify expanding and parallelizing the team to move faster on partial and full EL SNARKification):

  • Optionality on state tree:
    • Skip quantum-unsafe Verkle trees
      AND
      (keep the existing hextree OR upgrade to binary trees).
    • Keep hextrees: binary (less disk IO because less sparse, smaller proofs) vs hex (less hashing, longer proofs) becomes less of a factor, because SNARK proofs to the effect "this piece of data is part of the finalized state" have the Merkle paths as private input, and therefore those paths need not be communicated anyway. The question is then reduced to only: can we improve disk IO with software engineering and just keep the hextree as is?
    • Keep keccak: it is very likely that hand-optimized circuits and zkVM precompiles for proving keccak will reach 200k+ keccak/second in the next 6-12 months, which unlocks the option of keeping hextrees and avoiding the tree-transition complexity.
  • A gas limit as high as PeerDAS can safely bear: in the last milestone, where the entire EL/CL is validity-proven and nodes are completely stateless except for holding partial data in PeerDAS, the block gas limit can be extremely large, as the labor of producing and proving blocks is left to specialized builders. The distinction between staking nodes and light clients is blurred.

6. Implementation approach

There are two fundamental aspects to seriously consider in building SNARKs for Ethereum:

  • Performance: there should be a plausible path towards (a) an acceptable hash rate, 200k hashes/s, and (b) an acceptable proving time for an entire Ethereum block
  • Security: [TODO: acceptable soundness security levels]

6.1 Ingredients

Credit to Ceno contributors for their design on which this wishlist is based

We can list some of the design choices that are both desirable given the state of the Ethereum roadmap, and feasible given the state of ZK R&D:

  • Quantum resistance
  • Small fields
  • Small commitment sizes during proof generation
  • Efficient aggregation
    • Right now: Recursion > Folding
    • TODO: concisely describe issues in folding
  • No FFT: we can't have O(N) provers with FFTs; they bump prover complexity to at least O(N log N)
  • Contained arithmetization: constrain a small set of forever-unchanging opcodes, re-use them to constrain higher-level higher-frequency components as 'virtual opcodes' or 'precompiles'.
  • Parallelization: opcodes can be run in parallel and aggregated in no particular order.
  • Arithmetization TLDR: the rigidity of R1CS results in bloat, the flexibility of Plonkish results in high complexity. Balance: GKR+sumcheck (see the sumcheck sketch after this list).
  • TODO: Basefold backend + recent developments: STIR, WHIR, ARC
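
For reference, the sumcheck relation that GKR-style provers reduce round by round (standard statement, included only to ground the "No FFT" and O(N)-prover points above): the prover claims a sum of a low-degree polynomial g over the Boolean cube, and each round replaces one Boolean variable with a random field element.

```latex
% Claim and per-round prover messages of the sumcheck protocol:
H = \sum_{x \in \{0,1\}^n} g(x_1, \dots, x_n), \qquad
g_i(X) = \sum_{x_{i+1}, \dots, x_n \in \{0,1\}} g(r_1, \dots, r_{i-1},\, X,\, x_{i+1}, \dots, x_n)

% Verifier checks, using O(1) field operations per round (no FFTs anywhere):
g_1(0) + g_1(1) = H, \qquad g_i(0) + g_i(1) = g_{i-1}(r_{i-1}) \quad (i > 1)
```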

6.2 Formal verification

Formal verification is undecidable by Rice's theorem, so we can't build general-purpose formal verifiers and just pass systems through them. This is fundamentally why formal verification is tedious and hard, and why its feasibility and usefulness are inversely proportional to how large and complex the codebase is. FV shines when the codebase is small and all execution paths can be exhaustively verified.

There are two general approaches:

  1. Transpile the software program into an equivalent representation amenable to mathematical quantifiers, and prove facts (theorems) about that code
  2. Define properties the system shouldn't have, aggregate such properties in a giant SAT formula, and then use an SMT solver to prove it is unsatisfiable (a satisfying assignment is a concrete counterexample, i.e. a bug).

For large systems, formal verification is very complex and the spec itself could be buggy leading to under- or mis- or non-reporting of bugs.

Unique challenges to a RISCV-based zkVM:

  • We have to audit GCC and LLVM too, or use an older commit and freeze to it (newly introduced optimizations in both are a constant source of bugs). In Ethereum client development, we rely on client diversity to shield against compiler bugs.
  • Subtleties when it comes to the soundness of the backend in theory vs in implementations

Adaptations:

  • Diversify:
    • The Backend (proof system implementations): this is the backend which takes the riscv binary and generates a proof of its execution. Rust is dominant. Garbage-collected languages may not be suitable because zero-cost abstractions and lean parallelization are key to prover time. Anecdotally, parallelization in Go may not be efficient because of the garbage collector.
    • The backend of The Backend: PCS/IOP backends
    • Frontend (SDK languages): this is the programming environment for compiling business logic (narrow-scope SNARKs, or entire Ethereum nodes) to riscv. The viability of a language depends on the compactness of the resulting riscv binary. Creating the connective-tissue libs for precompiles may also be challenging (?)
    • RISCV compilation stack
  • Deploy in increments, starting with consensus-non-breaking SNARKs.
  • Formally verify self-contained components in isolation (eg the zkvm_add opcode is exactly riscv_add), then (a) fuzz inter-component interactions (see the differential-fuzzing sketch after this list) and (b) formally verify inter-component integration treating each component as a valid blackbox (eg treat a formally-verified zkvm_add just as a normal add).
    • What's wrong with formally verifying the whole thing without blackboxing? It increases the complexity of the spec, which leads to bugs in the spec itself, leading to mis- or non-detection of code bugs.
  • Continuous FV: bake FV into the CI of the software development, such that if a component changes it gets auto re-verified.
  • Diversify the formal verification vertically (abstracting or zooming in) and horizontally (use different formal-proof systems), and complement it with thorough fuzzing.
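
A minimal sketch of the fuzzing half of that strategy, using the proptest crate as the harness; `zkvm_add` here is a stand-in for the zkVM's actual witness generator for the opcode under test (this complements formal verification, it does not replace it):

```rust
use proptest::prelude::*;

/// Reference semantics of RV32I ADD: 32-bit wrapping addition.
fn riscv_add(a: u32, b: u32) -> u32 {
    a.wrapping_add(b)
}

/// Stand-in for the zkVM implementation of the same opcode; a real harness
/// would call into the prover crate under test instead.
fn zkvm_add(a: u32, b: u32) -> u32 {
    a.wrapping_add(b)
}

proptest! {
    // Differential check: the zkVM opcode must agree with the RISC-V reference
    // semantics on randomly sampled inputs.
    #[test]
    fn add_matches_reference(a: u32, b: u32) {
        prop_assert_eq!(zkvm_add(a, b), riscv_add(a, b));
    }
}
```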

7. Anchoring the SNARKification discussion at SSF

There are many considerations to weigh when discussing Ethereum upgrades in terms of assigning priorities and making design and security assumptions. Some aspects can be quite subjective, such as: what is the current risk of imbalance between L1 and L2 in share of execution (low/med/high/critical)? Another less subjective but still not fully objective question: when do we consider Poseidon to have passed battle-testing? And so on.

One way to organize the discussion is to put single-slot finality (SSF) at the center and weigh every consideration against it. SSF affects and is affected by many other components of the roadmap, and so it serves as a good anchor for assigning weight to different considerations.

8. Discussion and further digging:

  • Is SSF indeed the best anchor of discussing roadmap priorities?
  • Can we even entertain the possibility of <= 5s proving of 50k+/s sha256 (CL) and 200k+/s keccak (EL)? That would simplify a ton of things and we could actually reach the End Game a lot sooner.
  • Can quantum unsafe KZG in PeerDAS be replaced with quantum-safe coding-based PCS used in zkVM? If so, that would mean: 1 audit, 2 deployments.
  • Can the aggregation hierarchy be eliminated altogether, pushing the task of aggregating all validator signatures to the proposer, who in turn outsources to the builder? The slot time then becomes dominated by:
    1. Block proving by the builder
    2. Block verification time by the proposer and rest of validators (block = public input)
    3. Signature batching by the builder: validators send their attestation to the block to the builder directly.
      Only (2) needs to be extremely fast on consumer hardware.
      Caveat: centralization risks.
  • Do we actually need to worry about the size of state Merkle proofs if we have hash-based SNARK proofs of state? A hash-based SNARK proof should have the Merkle proof as private input, and so a user or light client, for example, receives: this leaf has this root (proof 1), and the root has x validator weight behind it (proof 2, whose public input = block headers since genesis)
  • Concrete benchmarks of hash-based SNARKs-based signature aggregation we have to meet in order to achieve SSF the easy way (brute-force)?
    • proved hashes/s
    • proofs broadcast (bandwidth) per <= 1/2 slot-time, which is 2.5-5 seconds in an SSF world
  • Can less- or unstructured signature aggregation hierarchies result in faster overall aggregation time?
  • TODO: elaborate on SNARKifying riscv vs a custom-built 'zk-friendly ISA'

9. Supplementary material

State of ZK Proof Systems

The seeds for general compute in zero knowledge were planted over the decades of the 90s and 00s under the academic umbrella of interactive-proofs research generally and PCPs specifically. But these noble seeds lacked sufficient conditions to grow: prohibitive proof sizes and prover/verifier times.

This began to change in the early 2010s with innovations in cryptographic primitives and the arithmetization of sufficiently generic computations. More breakthroughs followed and shoots began sprouting, leading to the first non-trivial implementation of a zkSNARK proof system in 2016: Groth16.

More, more, and more shoots sprouted in the years that followed, but not without hitting fundamental obstacles: (a) the tooling was still too close to the metal for the average developer to build zkDapps, or for the average systems developer to build large systems with baked-in ZK, and (b) the inefficiency and rigidity of R1CS arithmetization imposed a ceiling on what could practically be built in terms of size and performance.

More innovation followed. PLONK (2019) brought flexibility and a universal, updatable trusted setup for zkSNARKs. Meanwhile, older ideas began to be realized in live hash-based SNARK production systems. The years 2021-22 witnessed an increase in large zkS{N,T}ARK systems being developed, chief among which are zkRollups tethered to Ethereum. There have also been many advances in primitives such as lookups, commitment schemes, and incrementally verifiable computation (IVC).

  • TODO:

    • key developments in 2023-2024:
      • Blaz, Basefold, STIR, Arc2
      • WHIR "serves as a direct replacement for protocols like FRI, STIR, BaseFold, and others"
      • Binary fields (binius)
      • Rediscovery of gkr+sumcheck
      • Folding iterations
      • Learnings of flexibility<>complexity in arithmetizations
Arithmetization

A look back at arithmetizations from an implementer's perspective

R1CS arithmetization simulates arbitrary NP deciders using two simple primitives: arithmetic addition and multiplication. This in practice leads to a lot of overhead as routine primitives such as logical operators have to be simulated using a lot of addition and multiplication.
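
For concreteness, the standard R1CS shape and the classic booleanity constraint (textbook material, included only to ground the overhead claim): every constraint is one multiplication of two affine combinations of the witness vector, so even checking that a value is a single bit costs a row, and XOR/AND/shifts decompose into one such row per bit plus recombination.

```latex
% One R1CS constraint per row i, over witness vector z with z_0 = 1:
(A_i \cdot z) \times (B_i \cdot z) = (C_i \cdot z)

% Example: booleanity of x needs its own constraint,
x \cdot (1 - x) = 0 \iff x \in \{0, 1\}
```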

Plonkish arithmetization is a generalization of R1CS and provides the flexibility to construct more custom gates, reducing the overall number of constraints. This is done using selector columns (and therefore selector polynomials). With great flexibility, however, comes great complexity when large circuits are integrated to simulate an overall VM intent. The layout of the overall supercircuit integrating many other circuits becomes too inefficient (lots of unused areas in the final table) and too complex to audit.
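
The (vanilla) Plonk gate equation shows both sides of that trade-off; the q's are fixed selector columns chosen per row (standard formulation, shown only for illustration):

```latex
% One row constrains wire values a, b, c via selector columns:
q_L \cdot a + q_R \cdot b + q_M \cdot a b + q_O \cdot c + q_C = 0

% (q_L, q_R, q_M, q_O, q_C) = (1, 1, 0, -1, 0)  gives an addition gate  a + b = c
% (q_L, q_R, q_M, q_O, q_C) = (0, 0, 1, -1, 0)  gives a multiplication gate  a * b = c
```

Custom gates add further selectors and columns: more power for the circuit writer, and more surface for the auditor.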

[TODO: AIR arith]

  • [TODO: more on padding in Plonkish see eg "Mapping Rows to the Evaluation Domain" in this]

  • [TODO: performance, complexity, and auditability. TLDR: circuits for small purpose-specific things, VM for larger things.]