# Post-Quantum Signature Aggregation: Brute-Force Recursion vs. Folding
## Introduction
As Ethereum moves toward a post-quantum future, one of the most critical challenges is replacing its core signature scheme, BLS. While elegant and efficient, BLS relies on pairings over elliptic curves, which will be vulnerable to quantum attacks. The leading candidates for a quantum-resistant replacement are hash-based signatures like XMSS. They offer robust security but come with a price: they are significantly larger and far more computationally expensive to verify on-chain. Verifying tens of thousands of individual XMSS signatures every block is simply not feasible.
This is where signature aggregation becomes essential. The goal is to take a massive set of individual signatures and compress them into a single, constant-size, and computationally cheap proof that can be efficiently verified on-chain. This proof must irrefutably attest that every single signature in the original set was valid.
The architecture of this cryptographic aggregator will have implications for the performance, security, and scalability of Ethereum's consensus layer for decades to come.
In this post, we'll explore two distinct philosophies for building this aggregator. The core difference lies in *how* we handle recursion. The first path uses **brute-force recursion**, where we verify a full SNARK inside the circuit of another, a task so complex it often requires a **zkVM** abstraction. The second path uses newer, more specialized primitives called **folding/accumulation schemes**, which offer a cryptographic shortcut to achieve recursion more directly and efficiently.
## The Aggregation Architecture: From a Million Signatures to One Proof
The core challenge is one of scale. With a potential validator set of up to a million, individually processing each post-quantum signature on-chain is impossible: the gas costs would be astronomical. Therefore, we need a robust system to compress the proof of participation for the entire validator set into a single, compact object.
This system is built on two concepts: **bitfields** to track participation and a recursive **`Aggregate`/`Merge`** process to scale the cryptography.
### The Blueprint: Bitfields and Subnets
A **bitfield** is a binary vector where a `1` at index `i` signifies that validator `i` has signed. This is crucial for Ethereum's consensus, which needs to know who participated to assign rewards and penalties.
To handle the scale, we partition the validator set into smaller, manageable **subnets**. For instance, we can divide 1,048,576 ($2^{20}$) validators into 1024 ($2^{10}$) subnets, each containing 1024 ($2^{10}$) validators. This strategy allows for massive parallelization.
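To make the arithmetic concrete, here is a minimal sketch of this partitioning, assuming a toy representation where a bitfield is a Python integer. Only the sizes ($2^{20}$ validators, $2^{10}$ subnets of $2^{10}$) come from the text; the helper names are illustrative, not from any spec.

```python
# Sketch of the subnet partitioning described above (sizes from the text).
NUM_VALIDATORS = 2**20  # 1,048,576 validators
SUBNET_BITS = 10        # 2^10 = 1024 subnets, each with 1024 validators
SUBNET_SIZE = 1 << SUBNET_BITS

def locate(validator_index: int) -> tuple[int, int]:
    """Map a global validator index to (subnet, local index in that subnet)."""
    return validator_index >> SUBNET_BITS, validator_index & (SUBNET_SIZE - 1)

def mark_signed(bitfield: int, local_index: int) -> int:
    """Set bit `local_index` in a subnet-local participation bitfield."""
    return bitfield | (1 << local_index)

subnet, local = locate(123_456)  # -> (120, 576)
field = mark_signed(0, local)    # validator 123,456 recorded as a signer
```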
### The Multi-Layered Aggregation Flow
The aggregation process unfolds at two scales: within each subnet and then between them. The core operations, **`Aggregate`** (processing raw signatures) and **`Merge`** (combining existing proofs), happen throughout this flow in a flexible, and potentially unstructured, P2P topology.
#### **Layer 1: Intra-Subnet Aggregation**
The process begins in parallel within each subnet, where the workload is shared in a decentralized, graph-like manner.
1. **Initial `Aggregate`:** Aggregator nodes grab small batches of raw XMSS signatures, verify them, and create "seed proofs." The subnet quickly becomes populated with many small, valid proofs.
2. **Local `Merge`:** Other nodes in the same subnet listen for these seed proofs, combine any two (or more) they find, and broadcast a new, more comprehensive merged proof back to the subnet.
This continues until one final proof emerges for the entire subnet, represented by a pair: (`proof_subnet_N`, `bitfield_subnet_N`).
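To ground the two operations, here is a minimal sketch of this intra-subnet flow. It assumes a toy representation where a proof is an opaque byte string; `xmss_verify`, the proof contents, and the plain bitwise-OR handling of bitfields are illustrative assumptions of this sketch, not protocol rules.

```python
# Sketch of the intra-subnet flow: Aggregate turns a batch of raw signatures
# into a "seed proof"; Merge combines two proofs and ORs their bitfields.
from dataclasses import dataclass

@dataclass
class AggregateProof:
    proof: bytes   # opaque cryptographic proof (placeholder)
    bitfield: int  # bit i set <=> validator i of this subnet signed

def xmss_verify(pubkey: bytes, msg: bytes, sig: bytes) -> bool:
    return True  # stand-in for real XMSS verification

def aggregate(batch: list[tuple[int, bytes, bytes]], msg: bytes) -> AggregateProof:
    """Aggregate: verify a batch of (local_index, pubkey, sig), emit a seed proof."""
    bits = 0
    for local_index, pubkey, sig in batch:
        assert xmss_verify(pubkey, msg, sig)
        bits |= 1 << local_index
    return AggregateProof(proof=b"seed", bitfield=bits)  # proof is a stub

def merge(a: AggregateProof, b: AggregateProof) -> AggregateProof:
    """Merge: combine two proofs. Bitfields may overlap; the claim "every set
    bit corresponds to a verified signature" survives the union."""
    return AggregateProof(proof=a.proof + b.proof,       # stand-in for the
                          bitfield=a.bitfield | b.bitfield)  # recursive step
```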
#### **Layer 2: Inter-Subnet Merging**
Once each subnet has its final proof, the second layer of aggregation begins. Now, every operation is a **`Merge`**.
1. **Global Collection & Merging:** Nodes listen for the final proofs from each subnet and repeat the same unstructured, gossip-based merging process as before.
This continues until one node produces the final, globally aggregated (`final_proof`, `final_bitfield`), which is the only object submitted on-chain.
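Viewed this way, Layer 2 is just repeated application of the same `merge`. A short sketch, reusing the helpers above; the round-based pairing here is illustrative, since in the real network pairing is driven by gossip, not a local list.

```python
def merge_round(pool: list[AggregateProof]) -> list[AggregateProof]:
    """One gossip round: pair up whatever proofs a node has heard and merge."""
    merged = [merge(a, b) for a, b in zip(pool[0::2], pool[1::2])]
    if len(pool) % 2:
        merged.append(pool[-1])  # odd one out waits for the next round
    return merged

def aggregate_globally(pool: list[AggregateProof]) -> AggregateProof:
    while len(pool) > 1:
        pool = merge_round(pool)
    return pool[0]  # (final_proof, final_bitfield), the only object on-chain
```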
## The Core Cryptographic Choice: How to Merge Proofs?
The critical difference between the two architectural paths lies in how the recursive `Merge` step is cryptographically implemented.
### Path A: Brute-Force Recursion (and the need for a zkVM)
The most direct way to merge two proofs is to simply prove that you correctly verified them. This is "brute-force" recursion.
#### **How It Works: Verifying a Proof in a Circuit**
The `Merge` operation involves a prover generating a SNARK of a circuit that contains a **full SNARK verifier**. The statement being proven is: "I have two valid proofs (`proof_A`, `proof_B`), and I have correctly verified both of them."
However, the circuit for a SNARK verifier, especially a post-quantum one like WHIR, is massive and extraordinarily complex. Building and formally verifying such a circuit by hand is practically infeasible.
This is where the **zkVM** comes in. It's not a new route in itself, but rather a powerful abstraction for managing the immense complexity of this one. An engineer writes a simple program with a `snark_verify` instruction, and the VM infrastructure handles the colossal task of correctly generating the underlying constraints.
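Here is a sketch of what such a guest program might look like. Everything in it is hypothetical: `snark_verify` stands in for the VM's built-in verifier instruction mentioned above, and the bitfield handling mirrors the `Merge` semantics sketched earlier.

```python
# Hypothetical zkVM "guest program" for the Merge step.
def snark_verify(proof: bytes, public_bitfield: int) -> bool:
    # Placeholder: provided by the VM. In a real system this runs a full
    # (e.g. WHIR-based) SNARK verifier, the heavy operation this path pays for.
    return True

def merge_program(proof_a: bytes, bits_a: int,
                  proof_b: bytes, bits_b: int) -> int:
    # The statement the zkVM proves is the text's claim verbatim:
    # "I have two valid proofs, and I have correctly verified both of them."
    assert snark_verify(proof_a, bits_a)
    assert snark_verify(proof_b, bits_b)
    return bits_a | bits_b  # public output: the merged participation bitfield
```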
#### **Advantages**
* **Powerful Abstraction:** The zkVM solves the complexity problem of brute-force recursion. It provides a clean separation of concerns, making the system easier to audit and maintain. Auditors focus on a minimal ISA, not a sea of polynomial constraints.
* **Future-Proof Flexibility:** The zkVM is a general-purpose tool. The same infrastructure could be reused to prove other computations for Ethereum, making it a valuable long-term investment.
#### **Drawbacks**
* **High Recursive Overhead:** The elegance of the VM hides a high computational cost. Executing a full SNARK verifier is a heavy operation, making the `Merge` step for this path computationally intensive and potentially high-latency.
* **VM Infrastructure Complexity:** While the program logic is simple, the zkVM itself—with its compiler, memory models, and lookup arguments—is a massive and complex piece of software to build, maintain, and secure.
### Path B: Folding & Accumulation Schemes
This second path avoids the complexity of brute-force recursion by using a more advanced cryptographic tool: **folding**.
#### **How It Works: A Cryptographic Shortcut**
Instead of verifying a full proof, a folding scheme combines the mathematical statements of two proofs directly. This is possible because all proofs, whether they are for raw signatures or for other folded proofs, are just structured mathematical objects (called **instance-witness pairs**) of the same type.
Let's walk through an example: Node A aggregates 100 signatures, Node B aggregates another 100, and Node C merges their work.
1. **The Universal Circuit:** We first define a single, universal circuit, $\mathcal{F}_{step}$. This circuit understands two things:
* How to verify a raw XMSS signature.
* The rules of the lightweight **"folding verifier"**—the mathematical steps required to combine two proofs.
2. **The `Aggregate` Step (Nodes A & B):**
* **Node A** takes 100 signatures. It runs the signature verification logic of $\mathcal{F}_{step}$ and produces its output: an instance-witness pair `(instance_A, witness_A)`. This pair is a cryptographic proof representing the claim "I hold a valid witness for 100 signatures."
* In parallel, **Node B** does the same for its 100 signatures, producing its own proof, `(instance_B, witness_B)`.
3. **The `Merge` Step (Node C):**
* **Node C** receives both `(instance_A, witness_A)` and `(instance_B, witness_B)`.
* Here's the important step: Instead of running a heavy `snark_verify` function, Node C uses these two pairs as inputs to the "folding verifier" part of the circuit $\mathcal{F}_{step}$.
* It runs the **folding algorithm**. This lightweight process takes `instance_A` and `instance_B` and cryptographically combines them into a single new `instance_C`. It does the same for the witnesses, producing `witness_C`.
* The output is a single, new folded proof: `(instance_C, witness_C)`. This new proof represents the combined claim of the previous two, without ever having to fully verify them in the traditional sense.
4. **Finalization:** This efficient folding continues across the network. Another node, D, could now fold the proof from C with another proof. Only the final node in the chain generates a compressed SNARK from its final accumulated proof.
The key is that the recursive step is so cryptographically simple that the circuit for it can be built directly, **potentially making a complex zkVM unnecessary**. You simply replace the "SNARK verifier" in the `Merge` logic with the much simpler "folding verifier."
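To see why the recursive step is so cheap, here is a heavily simplified sketch of the folding verifier's core move, in the style of Nova's random linear combination: the merge of two instance-witness pairs is just $C = A + r \cdot B$ componentwise, with the challenge $r$ derived by Fiat-Shamir. Real schemes (Nova, WARP, Neo) also fold commitments, cross terms, and error vectors, all of which this toy omits; the field and names are illustrative assumptions.

```python
import hashlib

P = 2**61 - 1  # toy prime modulus (illustrative; real schemes fix the field)

def challenge(instance_a: list[int], instance_b: list[int]) -> int:
    """Fiat-Shamir: derive the folding challenge r from both instances."""
    digest = hashlib.sha256(repr((instance_a, instance_b)).encode()).digest()
    return int.from_bytes(digest, "big") % P

def fold(instance_a: list[int], witness_a: list[int],
         instance_b: list[int], witness_b: list[int]):
    r = challenge(instance_a, instance_b)
    # The entire Merge is these linear combinations; no SNARK verifier runs.
    instance_c = [(x + r * y) % P for x, y in zip(instance_a, instance_b)]
    witness_c = [(x + r * y) % P for x, y in zip(witness_a, witness_b)]
    return instance_c, witness_c
```

Contrast this with Path A's `merge_program` above: the heavy `snark_verify` calls are replaced by a hash and two vector operations, which is the entire source of the latency advantage.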
#### **Advantages**
* **Extremely Efficient Recursion:** The `Merge` step is incredibly fast. By replacing a heavy SNARK verifier with a lightweight folding algorithm, merger nodes can operate with much lower latency.
* **Optimal Prover Performance:** Hash-based schemes like **WARP** are designed with **linear-time provers**, meaning the prover's work grows only linearly with the size of the statement being proven, with no superlinear overhead.
* **Specialized Optimizations:** Lattice-based systems like **Neo** offer **"pay-per-bit" commitments**. For bit-heavy tasks like signature verification, this can make proving orders of magnitude cheaper.
#### **Drawbacks**
* **Witness Management Complexity:** A naive implementation of folding would require transmitting large witnesses (megabytes) between nodes. The main challenge of this route is designing and implementing clever **witness management techniques** to keep this data small. This adds significant complexity to the core protocol, trading computational savings for engineering and protocol complexity.
* **Lower-Level Abstraction:** Without a VM, engineers work directly with low-level cryptographic primitives. Implementing the aggregation logic requires deep expertise to translate it into polynomial constraints, making it harder to build and audit.
* **Maturity and Complexity:** This is the frontier of cryptographic research. Adapting these schemes to complex P2P networks (proof-carrying data, or PCD) and integrating features like lookups are active areas of development, making them less "off-the-shelf" than more established paradigms.
## Conclusion: The Path Forward
The choice for Ethereum's post-quantum signature aggregation scheme is not simply about which proof system is fastest, but about which architectural philosophy offers the best long-term trade-offs.
* **Path A (Brute-Force)** reduces the immense complexity of its recursive step using the powerful abstraction of a **zkVM**. The cost is the overhead of the VM itself and the high computational cost of full recursion.
* **Path B (Folding)** leverages a more efficient cryptographic primitive for recursion, making the process faster and potentially simple enough to **dispense with the VM**. The cost is the increased complexity of the core protocol, particularly around witness management.
Ultimately, the decision rests on a fundamental trade-off: do we manage complexity at the **infrastructure level (the VM)** or at the **protocol level (the folding scheme)**? The exploration of both paths will be crucial in finding the right balance of performance, security, and simplicity for Ethereum's future.