# BLS Signature Verification and Aggregation In Nova
**Pathway toward efficient proofs of finality for Ethereum**
# Introduction
This work presents research towards a more complete, scalable, low latency, SNARK-based proof of finality for Ethereum using IVC schemes. All code created to achieve these results is available [on my github](https://github.com/Nramsrud), with explorations in parallelized Nova requiring [customized implementations of the Nova library](https://github.com/Nramsrud/Parallel-Nova) and [Nova-Scotia tooling](https://github.com/Nramsrud/Nova-Scotia) also available.
## Motivation
With the progression in developer tooling, as well as academic and industry research and implementations of SNARK schemes, the applications of succinctly verifiable computation have extended into a myriad of blockchain applications. These tools enable compressing the execution of any "external" computation into something that is succinctly verifiable in practice for onchain applications. These schemes have many different kinds of practical applications such as verifying the result of a consensus mechanism for use in compressing state history (e.g. [Mina](https://minaprotocol.com/) and similar ZK-based blockchains), verifying the result of some arbitrary external execution environment (e.g. ZK validity rollups and zkEVMs), as well as providing proof of consensus for verifiable data exchange across blockchains (e.g. for use in ZK powered bridges such as [Telepathy](https://www.telepathy.xyz/)).
Several projects have explored such use of SNARKs to create proofs that verify consensus, reducing the trust required when verifying external events for generalized message passing between domains. The ultimate goal of consensus-proof-based interoperability is to fully define and verify the rules of the consensus mechanism, creating a proof of the validity of finalized state with the full security of the source consensus mechanism and enabling the use of this finalized state without additional trust assumptions.
Succinct Labs was, to our knowledge, the pioneer in working on a public implementation of cross-chain message passing via a proof of consensus SNARK for Ethereum. Their implementation makes use of the Sync Committee protocol, designed for use with Ethereum light clients. Sync committees are chosen as a verifiably random sub-selection of the validator set, rotated at lower frequency, with additional responsibilities aside from consensus progression. Sync committees are required to produce additional signatures for every finalized slot, which enable light nodes to verify finality with a lighter computational burden than verifying all committee attestations. For a given sync committee, a SNARK can be created validating the sync committee's attestation (i.e. their signatures) for a given slot. This proof can then be used as the foundation of an onchain Ethereum light client that uses smart contracts to verify the SNARK proof. From this, Merkle inclusion proofs enable verifiable data exchange and event broadcasting. This is usually enough security: the sync committee is large enough, and its shuffling verifiably random, that it is unrealistic for malicious validators to take over the committee, making the cost of attacking the light client protocol similar to the cost of attacking the entire validator set. However, relying on the sync committee rather than direct and complete verification of the entire set of validator attestations is suboptimal even if it is not insecure. Sync committees do not have the same requirements and slashing conditions as validators attesting and voting, so the concept of building a full client rather than a light client still has its own merits, even more so if such a solution can have comparable performance costs, made possible by the succinctness of SNARK verifiers.
Our motivation is to develop a more complete proof of consensus that verifies committee majority vote for a given slot and true majority for a given checkpoint, by looking at the whole validator set and not just the sync committee, with the goal of achieving proof of finality with no additional trust assumptions. This proof of consensus would enable cross-chain interoperability with the full economic guarantees of the source blockchain through trust-minimized onchain smart contract clients to enable cross-chain communication. Optimally, this solution should also be scalable to a large number of signers and the performance target should be comparable to the network progression such that proof generation does not fall behind or produce undesired latency.
## Goals
This work makes use of and builds upon previous work done by 0xPARC's [circom-pairing libraries](https://github.com/yi-sun/circom-pairing) and Succinct Labs' [Ethereum consensus proofs](https://github.com/succinctlabs/eth-proof-of-consensus/), as well as Rust-based implementations of [Nova IVC](https://crates.io/crates/nova-snark) from Srinath Setty, forks for [parallelizing Nova from the PSE team](https://github.com/privacy-scaling-explorations/Nova/blob/parallel_prover_bench) and [middleware tooling](https://crates.io/crates/nova-scotia) for R1CS digestion and witness generation by Nalin Bhardwaj.
Nova is a folding scheme designed to achieve Incrementally Verifiable Computation (IVC) by processing linear combinations of satisfying instances for statements of the same structure. We make use of Nova by first using Circom to create R1CS circuits that verify the correctness of a BLS aggregate signature over the BLS12-381 curve given a list of public keys and a participation bitmap. The example circuit used in our benchmarking is simplified and does not check for the validity of the participating set (active validators or proper committee/slot assignment). Such implementations have been made for sync committee election (by Succinct Labs) but do not apply to committee assignments via shuffling for slot-based aggregation of attestations and votes. This set selection and membership check implementation is required and will be included in future work, but for validating the performance viability of the system, it is not considered at this time. Our assumption is that BLS aggregate signature verification represents the majority of the overhead and computational work required to achieve our goals.
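For intuition, the folding step can be sketched as follows, following the relaxed R1CS formulation from the Nova paper (a simplified view; `∘` denotes the Hadamard product):

```latex
% Relaxed R1CS: an instance (E, u, x) with witness W satisfies
(A z) \circ (B z) = u \cdot (C z) + E, \qquad z = (W, x, u)

% Folding two instances with verifier challenge r absorbs the cross
% term T into the error vector E:
T = A z_1 \circ B z_2 + A z_2 \circ B z_1 - u_1 \cdot (C z_2) - u_2 \cdot (C z_1)

E \gets E_1 + r \cdot T + r^2 \cdot E_2, \qquad u \gets u_1 + r \cdot u_2

W \gets W_1 + r \cdot W_2, \qquad x \gets x_1 + r \cdot x_2
```

The prover's per-fold work is thus dominated by computing and committing to the cross term `T`, which is linear in the circuit size.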
As a baseline, we used set sizes per committee of 512 public keys. The circuit aggregates the public key list and participation bitmap into an aggregate public key via G1 point addition, maps the hash of the message to the G2 group, and conducts a two-pairing check of the aggregate signature and public key. This results in a primary circuit of 22.5M constraints. This tests a committee size larger than the current committee size of ~340 validators; our estimates show that matching the current size would decrease the circuit size by ~3.5% and the witness generation time by ~12%. In a production circuit, however, configuring the step circuit for the maximal set size would enable forward applicability without the need to perform additional circuit setups and upgrades when the set size grows.
Our results show that while Nova enables operating a running SNARK prover instance in parallel to the consensus mechanism, building attestation proofs as the consensus machine progresses towards finality, the computational overhead is too large to generate proofs in comparable times to network progression for every single slot. Each step of Nova recursion, representing the verification of a single committee vote for a given checkpoint over one slot, requires roughly 20 seconds of proving after 180 seconds of witness generation time. The result is a single-chain Nova proof for 64 committee attestations requiring 1,456 seconds (1,636 seconds with parallelized witness generation), representing the verification and aggregation of attestations from 32,768 validators and over 1.4 billion constraints. Importantly, these early results demonstrate the efficiency of folding: for any given number of folds, the total proof time was roughly half of the time required to generate the same number of individual proofs.
We also explore additional layers of parallelization by extending the one dimensional Nova proof into two dimensions as a folding tree where multiple running instances can be parallelized and folded together to achieve one final proof. This concept could also be extended into a third dimension where slot proofs can be processed in parallel and folded together across the length of the epoch as it progresses resulting in a final proof over all slots and committees for a given epoch.
What we found was that for higher numbers of instances, the gains from parallel processing of folds are lost in the overhead required to clone, copy and duplicate large datasets (primary circuit and CRS duplication) in memory, while also reducing the effectiveness of the multithreaded computation that the base Nova implementation makes use of. For small numbers of instances, this structure provides an increasing efficiency gain, from ~20% to ~40%, as the number of instances increases. These results show promising possibilities in parallelized proof generation over many running instances.
Lastly, we discuss possible optimizations to reduce the cost of proving statements about BLS aggregate signatures and multivariate folding scheme implementations aiming at achieving optimal prover work.
## Overview of Gasper Consensus
### At a high level
Gasper consensus is a dynamic availability consensus machine that prioritizes liveness and high replication (decentralization) over optimistic responsiveness (fast finality). As validator numbers increase, the time required for information to fully propagate, and thus for consensus to be achieved, increases. This requires the core of the consensus machine to operate around a fork choice rule (LMD-GHOST) to enable the network to progress in uncertain conditions. Time is divided into epochs, with each epoch containing 32 slots (blocks). To enable the network to progress at a relatively fast pace given the validator set size, validators are randomly shuffled with a Verifiable Random Function (VRF) into committees, each committee is assigned to a slot, and all are reshuffled every epoch. This minimizes network communication overhead as each slot only involves 1/32 of the validator set.
Because there is not a full consensus round performed at any given slot, a secondary finality mechanism must be employed to ensure progress under a common fork. Casper FFG requires that in each slot, committee members must attest to a proposed block, a target checkpoint (which is the first slot in an epoch) as well as a source checkpoint (a previously justified checkpoint). This means that at the end of a 32 slot epoch, the entirety of the validator set has attested to a common target checkpoint, justifying (soft finality) the previous epoch and the first slot of the current epoch, and finalizing the source checkpoint which was previously justified. This justified height is then used to inform the LMD-GHOST fork choice rule, becoming roughly "follow the chain of the greatest height that is built upon the latest justified height." This is the optimistic case for finality progression. In general, checkpoints are finalized when the majority of the network signals support for a given justified checkpoint. This means that different target votes and low validator participation can delay justification and finalization through several epochs.
Casper FFG provides classical finality in the case of a 2/3 honest majority and economic finality (accountable safety) even if greater than 1/3 of the validator set is faulty or adversarial (the finality conditions are described in greater detail in [Ben Edgington's Eth2book](https://eth2book.info/capella/part2/consensus/casper_ffg)). The power of Casper FFG's approach to safety and liveness was demonstrated by how the Ethereum network handled the [recent execution client finality glitch](https://medium.com/offchainlabs/post-mortem-report-ethereum-mainnet-finality-05-11-2023-95e271dfd8b2).
### BLS Signature Aggregation and Attestation Rounds
Shuffling, aggregator selections, sync committee election and randomness are all assumed to be familiar to the reader. These protocols are not covered in the benchmark implementation, but will be required for a complete proof implementation.
Consensus in Gasper is achieved via BLS signature aggregation over the BLS12-381 curve. In a nutshell:
- private keys are 32 byte scalars;
- public keys are `sk*g1` and thus members of the G1 group (48 bytes);
- messages to be signed are mapped to the G2 group (96 bytes);
- signatures are the secret key multiplied by the mapped message and are members of the G2 group (96 bytes);
- aggregating signatures is the product of signatures, resulting in a point in the G2 group (96 bytes);
- verifying aggregate signatures requires an aggregate public key, which is the product of the participating public keys and thus also a member of the G1 group (48 bytes);
- signature verification consists of a two-pairing check of the aggregate signature and G1 generator point against the mapped message and aggregate public key.
Aggregating signatures and public keys is a non-interactive and incremental process: each signature aggregation computation (multiplication of G2 points) takes 1,350ns, each public key aggregation computation (multiplication of G1 points) takes 4,500ns, and each verification computation (two pairing checks) takes around 5,400,000ns.
In order to reduce the overhead of passing public keys alongside signatures in the global topic, a participation bitmap of size equal (in bits) to committee size is added to the signature, mapping participation to the validator public key tree. This is updated alongside signature aggregation.
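The bookkeeping can be sketched as follows. This is a toy model: modular addition stands in for the G1/G2 group operations, so it only illustrates the incremental aggregation and bitmap update, not real curve arithmetic (a production implementation would use a pairing library).

```python
# Schematic sketch of incremental BLS aggregation with a participation
# bitmap. A toy additive group stands in for G1/G2; real BLS uses curve
# points, so this shows only the bookkeeping.
P = 2**31 - 1  # toy group order (stand-in, not a real curve parameter)

def aggregate(committee_pubkeys, signatures):
    """Fold signatures into one aggregate while recording participation."""
    agg_sig = 0
    bitmap = 0
    for idx, sig in signatures:           # (validator index, signature)
        agg_sig = (agg_sig + sig) % P     # incremental signature aggregation
        bitmap |= 1 << idx                # mark validator as participating
    # The verifier rebuilds the aggregate public key from the bitmap,
    # so only (agg_sig, bitmap) need to travel on the global topic.
    agg_pk = 0
    for idx, pk in enumerate(committee_pubkeys):
        if bitmap >> idx & 1:
            agg_pk = (agg_pk + pk) % P
    return agg_sig, bitmap, agg_pk
```

Note that the bitmap is sized in bits to the committee, which is where the 44 byte bitmap for a ~350 member committee in the next section comes from.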
This signature scheme creates constant-size signatures and public keys (ignoring the participation bitmap), with verification cost a constant two pairing checks regardless of set size. For a committee size of ~350 (approximately the current Ethereum validator committee size), aggregation of the entire committee at full participation by the aggregator requires 172 bytes per attestation (including a 44 byte bitmap, 32 byte message and 96 byte signature), equalling 60KB of subnetwork overhead per committee during a slot. Aggregating signatures would cost 0.5ms, aggregating public keys would cost 1.6ms, and verification would cost 5.4ms. The resultant aggregate attestation would be the same size as an individual attestation at less than 200 bytes. This same process is conducted across all 64 participating committees per slot. This is remarkably efficient, in both size and time.
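As a sanity check, the per-slot figures above follow directly from the per-operation timings quoted earlier:

```python
# Back-of-envelope check of the per-slot aggregation costs for a ~350
# member committee, using the per-operation timings quoted above.
committee = 350
sig_mul_ns, pk_mul_ns, verify_ns = 1_350, 4_500, 5_400_000

agg_sig_ms = committee * sig_mul_ns / 1e6   # ~0.47ms, quoted as 0.5ms
agg_pk_ms = committee * pk_mul_ns / 1e6     # ~1.6ms
verify_ms = verify_ns / 1e6                 # constant two-pairing check, 5.4ms

att_bytes = 44 + 32 + 96                    # bitmap + message + signature
subnet_kb = committee * att_bytes / 1000    # per-committee gossip overhead, ~60KB
```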
Unfortunately, verifying BLS signatures over BLS12-381 in particular is very inefficient in Groth16's base curve of BN254 (for which there is an EVM precompile, making it usable in Ethereum). Doing so requires emulating non-native field arithmetic over the larger 381 bit field along with operations over field extensions, which requires a lot of constraints and results in very large circuits. As such, creating SNARKs of valid attestations with regard to BLS signature aggregation will require some clever abstraction and optimization in order to be practical.
# Benchmarking Setup
The overall design of the proving system mirrors a less dynamic consensus progression in which each slot is accepted by the network with a 2/3 majority committee attestation linking to a previously justified checkpoint. This is an optimistic configuration: in practice, the network can continue to progress without finalization (no 2/3 majority vote toward a justified checkpoint), meaning valid slots will be accepted with linkage to a justified checkpoint that may be one or more epochs deep.
Using IVC proof schemes, accumulators and steps can mirror the consensus protocol (which can be reduced to verifying a statement with repeated structure) through a running IVC instance, not unlike how zkEVMs process and prove state transitions. This enables the proving system to run in parallel to the consensus progression, instead of sequentially (i.e. proof generation beginning only upon finality), with the aim of reducing overall latency between the network achieving finality and the final proof being generated.
The predicate proven per step was: given a set of public keys, participation bits, signature and hash of message, verify the signature is valid. This represents a valid committee attestation.
An aggregate proof of 64 valid committee attestations for a given checkpoint, slot number and epoch number represents a valid slot. An aggregate proof of 32 slot proofs for a given epoch represents a proof of a justified checkpoint (for a given target checkpoint) or a proof of finality (for a given source checkpoint), enabling the verification of all slots and block contents for the epoch prior to the checkpoint slot and everything else before it, via SSZ inclusion proofs.
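The hierarchy above can be sketched schematically. Here a hash chain stands in for Nova's folding operator, purely to show the shape of the accumulation (64 committee steps per slot, 32 slots per epoch, so 2,048 committee steps per epoch proof):

```python
import hashlib

# Schematic view of the proof hierarchy: 64 committee attestation steps
# fold into a slot proof, and 32 slot proofs fold into an epoch
# (checkpoint) proof. A hash chain stands in for Nova's folding operator.
def fold(acc: bytes, step: bytes) -> bytes:
    return hashlib.sha256(acc + step).digest()

def slot_proof(slot: int) -> bytes:
    acc = b"slot-genesis"
    for committee in range(64):             # one fold per committee attestation
        acc = fold(acc, f"attestation:{slot}:{committee}".encode())
    return acc

def epoch_proof() -> bytes:
    acc = b"epoch-genesis"
    for slot in range(32):                  # one fold per slot proof
        acc = fold(acc, slot_proof(slot))
    return acc
```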
This benchmarking circuit does not include set membership checks for validating that the participating public keys are valid committee members in the correct slot or that they are members of the active validator set. Implementing these checks would require circuits for the "swap-or-not" shuffling algorithm. These checks would be required for a complete proof of finality implementation. For the purposes of these benchmarks, it is assumed that signature verification represents the largest portion of overhead.
Our circuit construction was built using Circom with the [0xPARC Circom pairing libraries and example circuits](https://github.com/yi-sun/circom-pairing) as well as example implementations from the [Ethereum Proof of Consensus](https://github.com/succinctlabs/eth-proof-of-consensus/) work by Succinct Labs for verifying a BLS aggregate signature using Groth16 and R1CS. Witness generation was performed with the C++ witness generator. [Rapidsnark](https://github.com/iden3/rapidsnark) was used for generating the proof and [Snarkjs](https://github.com/iden3/snarkjs) for verifying the proof for the Groth16 implementation of the step circuit. Benchmark circuits for both 32 and 512 signers were used to demonstrate the overhead growth of increasing the signature set. The benchmark circuit was used as the step circuit in the recursive proof over 2, 4, 8, 16, 32 and 64 iterations, demonstrating the overhead of recursively proving valid committee attestations over an increasing number of committees.
Creating recursive proofs with Nova utilizes cycle curves to verify step proofs which require compiling the circuits over a pairing friendly curve. The cycle used was the Pasta curve cycle which required a [customized implementation of Circom by Nalin Bhardwaj](https://github.com/nalinbhardwaj/circom/tree/pasta) to be used. The proving pipeline outside of circuit compiler and witness generator was done in Rust, using the [Nova libraries](https://crates.io/crates/nova-snark) by Srinath Setty and [Nova Scotia middleware tooling](https://crates.io/crates/nova-scotia) by Nalin Bhardwaj. Explorations in parallelized Nova schemes used a [fork of the Nova libraries](https://github.com/privacy-scaling-explorations/Nova/blob/parallel_prover_bench) by the Privacy and Scaling Explorations team as well as my own patches to the Nova Scotia tooling for Circuit and witness processing.
Compiling and testing these large circuits requires large computational resources. Many Circom operations are single threaded and require all data to remain in RAM. As the SRS for the step circuits is large (requiring a powers-of-tau `2^27` at 145GB), RAM usage peaked at 450GB. On the other hand, many of the Rust-based Nova computations were multithreaded and required less RAM for generating the CRS (~80GB) and benefited from many cores. All tests were performed on an AWS R5.8XL 32 vCPU 256GB RAM machine with 200GB swap (increased to 400GB swap to test parallelization).
# Results
## The Step Circuit
### Verifying a single signature from 512 signers
Example circuit [here](https://github.com/Nramsrud/Circom-eth-proof-of-consensus/blob/Nova-BLS-circuits/circuits/circuits/novaaggregate_bls_verify.circom)
Increasing the number of signers per committee changes the input data as well as the circuit size. Increasing the committee size from 32 to 512 (16x) only increases the total constraints by 10.5%, with witness generation time (outside of the recursive proof) increasing by 47%. As was inferred earlier, it follows that increasing the set size is more performant than increasing the number of proof iterations in the recursive circuit.
<img style="display: block; margin-left: auto; margin-right: auto;" alt="Table 1: Groth16 implementation of BLS aggregate signature verification for a set size of 32 and 512 public keys." src="https://hackmd.io/_uploads/HkFDjG_3n.png">
It is already clear that at this stage, witness generation will be a large barrier to practicality, being single threaded and requiring over 10x the slot time to generate for a given committee attestation.
## Single-chain Nova
### Verifying a chain of signatures, each of 512 signers
Implementation [here](https://github.com/Nramsrud/Nova-Single-Chain-BLS)
<img style="display: block; margin-left: auto; margin-right: auto;" alt="Representation of Nova sequentially folding witnesses." src="https://hackmd.io/_uploads/B1H4bNd32.png">
Our initial implementation of Nova is a single-chain proof pipeline where each committee signature is verified in a linear, sequential proof. It is assumed that witness generation would be processed in parallel regardless of the number of steps and is therefore not included in prover time.
<img style="display: block; margin-left: auto; margin-right: auto;" alt="Table 2: BLS aggregate signature verification in a single-layer Nova proof for up to 64 committees, each of 512 validators." src="https://hackmd.io/_uploads/r1pWBHtn2.png">
The resulting proving time can be seen to grow linearly in the number of steps. Both the folding overhead and the verification overhead are linear in the size of the circuit. This benchmark already delivers on the efficiency promises of Nova, in that folding inputs to achieve IVC is more efficient and parallelizable than monolithic SNARK proof generation with post-proof accumulation schemes. Our results show that proof generation time for a single instance in a Groth16 proof is comparable to a Nova proof containing 2 instances (with prover time roughly 50% of the time it would take to generate the same number of Groth16 proofs).
With slot durations of 12 seconds, single chain Nova slot proofs are unable to keep up with the progress of the network by two orders of magnitude.
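Concretely, using the single-chain benchmark numbers above:

```python
# The gap between proving latency and network progression: a 64-committee
# slot proof takes ~1,456s (single-chain benchmark) against a 12s slot.
slot_seconds = 12
proof_seconds = 1_456
ratio = proof_seconds / slot_seconds   # ~121x, i.e. two orders of magnitude
```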
To achieve full succinctness for onchain use, a final compression SNARK will be needed to reduce verification times to subsecond territory (and in a scheme that Ethereum supports). The Nova library supports the use of Spartan to generate a compression SNARK. Unfortunately, Spartan proof and verifier sizes range from `O(log^2(n))` to `O(sqrt(n))` and thus are not compressing for a statement of this size (table below). Nonetheless, as a compression SNARK will be required for onchain use, we kept the results for purposes of illustration.
<img style="display: block; margin-left: auto; margin-right: auto;" alt="Table 3: Spartan proof times for compressing Nova proofs for up to 64 committees, each of 512 validators." src="https://hackmd.io/_uploads/HkWJxVY22.png">
## Parallelized Nova
### Verifying a group of signatures in parallel
Implementation [here](https://github.com/Nramsrud/Parallelized-Nova-BLS)
As an extension of this concept, we also explored additional layers of parallelization by extending the one dimensional Nova proof into two dimensions as a folding tree where multiple running instances can be parallelized and folded together to achieve one final proof. This concept can also be further extended into a third dimension where slot proofs can be processed in parallel and folded together across the length of the epoch as it progresses resulting in a final proof over all slots and committees for a given epoch.
The initial implementation of this design was as a binary tree, where for `N` steps, `N/2` instantiations of Nova are processed in parallel, with each layer folding the resulting Nova proofs along with an additional step, resulting in `log_2(N)` total sequential fold layers required to prove `N` steps.
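The schedule of this binary tree can be sketched as follows. This counts only the live instances per sequential layer (ignoring the extra step folded at each node), to show where the `log_2(N)` depth comes from:

```python
# Sketch of the binary folding tree's schedule: for N leaf steps, N/2
# Nova instances run in parallel at the bottom layer, and each subsequent
# layer halves the number of live instances, giving log2(N) sequential
# layers in total.
def fold_schedule(n_steps: int) -> list[int]:
    assert n_steps > 1 and n_steps & (n_steps - 1) == 0, "power of two"
    layers = []
    live = n_steps // 2        # parallel instances at the bottom layer
    while live >= 1:
        layers.append(live)
        live //= 2
    return layers              # instances folded per sequential layer
```

For 64 committee steps this gives six sequential layers of 32, 16, 8, 4, 2 and 1 parallel folds.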
<img style="display: block; margin-left: auto; margin-right: auto;" alt="Representation of Nova folding witnesses in parallel using a tree structure." src="https://hackmd.io/_uploads/S14EI4On2.png">
What we found was that for higher numbers of instances, the gains from parallel processing of folds are lost in the overhead required to clone, copy and duplicate large datasets (primary circuit and CRS duplication) in memory, while also reducing the effectiveness of the multithreaded computation that the base Nova implementation makes use of. As shown in the table below, for small numbers of instances, this structure provides an increasing efficiency gain, from ~20% to ~40%, as the number of instances increases. This gain is completely overtaken by memory inefficiencies for the 32 instance test, with prover times being twice the single-chain equivalent cost. Shown also is the total time spent duplicating data during the proving steps. Assuming this is waste to be optimized out of a production implementation, the efficiency gains approach 50% for the 16 committee test, and break even for the 32 committee test. You will notice that we do not show a 64 committee test: we ran out of system memory and could not produce a result (with 256GB RAM and 400GB of swap; did I mention memory problems?).
<img style="display: block; margin-left: auto; margin-right: auto;" alt="Table 4: BLS aggregate signature verification in a parallel processed Nova proof tree for up to 32 committees, each of 512 validators." src="https://hackmd.io/_uploads/ryYVavt33.png">
Taking note of the efficiency gain curve under parallelization, we theorize that there exists a "Goldilocks zone" of parallelization design which allows multiple single-chain Nova instances to run in parallel, with a final fold step/tree merging each single-chain instance (shown below). This design would balance the overhead of data duplication against parallelized processing, with the optimal number of prongs depending on circuit size and resource availability. While some experimentation was done on this design, it was abandoned for now to explore other avenues of optimization.
<img style="display: block; margin-left: auto; margin-right: auto;" alt="Representation of Nova folding witnesses in parallel using a shallower tree structure." src="https://hackmd.io/_uploads/r15XqNu23.png">
This implementation resulted in some customizations to the PSE Nova fork for higher resolution benchmarking and experimentation, as well as a custom fork of Nova-Scotia to align dependencies and enable R1CS file digestion and witness generation for parallel Nova, both of which are available [here](https://github.com/Nramsrud/Parallelized-Nova-BLS/).
# Where Do We Go From Here?
Our end takeaways are that while work can be done to make recursion more efficient, even a 50% reduction in proof generation time would still be 10x longer than what is needed for a practical implementation that can keep pace with the network for every slot (even ignoring further proofs for set membership and majority checks). While some of this cost comes from the inefficiencies and overhead of the folding scheme, the primary driver affecting total cost is the complexity of validating BLS aggregate signatures in R1CS. Non-native field arithmetic and bigint representation in a quadratic constraint system lead to a very large number of constraints, requiring a very large setup and circuit, which creates a computationally intensive witness generation process, resulting in large witnesses and long prover times, all increasing the resources required for proof construction and reducing the opportunities for parallelization. This also carries through to the efficiency of the folding schemes, which is primarily a function of circuit size.
Thus we have two levers to pull: continue down the pathway of optimizing a monolithic arithmetization of BLS aggregate signature verification with customized constraints, wider and custom gates, and other novel techniques to compress circuit size, together with multivariate folding schemes that trade off prover work for verifier work; and/or explore more efficient ways to prove statements about BLS aggregate signatures to serve as the foundation of a more efficient but equivalent scheme.
Since both of those paths require a divergence away from the Circom implementation and R1CS, the more bang-for-the-buck exploration is in looking towards other, more efficient NARKs for BLS aggregate signatures. Regardless, below are some of the realizations and paths to be taken in future implementations and experiments.
## Hardware Acceleration
Aside from witness generation, much of the folding overhead is dominated by parallelizable Multi-Scalar Multiplications (MSMs). These computations could be accelerated via GPUs to produce an order of magnitude or more reduction in fold time. This is an area of very active research, with a [recent ZPrize competition](https://assets.website-files.com/625a083eef681031e135cc99/6305a4064bf4b8421497cf7a_gpu-fpga-msm.pdf) producing some [impressive results](https://blog.lambdaclass.com/eyes-on-the-prize/) that could make Nova for large circuits practical.
While Nova supports acceleration via CUDA, we did not explore it for now.
## Multivariate Folding Schemes
Multivariate poly-IOPs enable linear combinations of instances which generate no error terms or additional computational overhead. Additionally, the flexibility of the constraint systems enables the creation of customizable gates (no longer limited to just multiplication and addition) to reduce the overall statement size. While multivariate poly-IOPs create optimal accumulation and prover work, the trade-off is in the increased verifier work with multivariate PCS being `O(log(n))`. As even with Nova, a final compression proof is required to make it fully succinct for onchain use, this will likely be a net positive trade-off for prover efficiency.
## Inner Product Arguments and Proofs
BLS signatures have interesting algebraic properties that can be taken advantage of to transform the proof pipeline from constraining the steps for verifying a BLS signature to making statements about [inner products in a pairing based language](https://eprint.iacr.org/2019/1177.pdf). This proof construction removes witness generation and is far more memory efficient than the R1CS equivalent. This will be the focus of our next exploration toward practical and scalable attestation proofs.
# Notes and Open Questions
## Set Membership Checks
The final proof of valid attestation is only as secure as the constraints upon it. As is implemented in the benchmark, any valid aggregate signature over the same message from 64 committees of 512 participants would produce a valid proof. The final implementation for proof of finality will need to include set membership checks that constrain the signing set to the active validator set and perhaps even slot and committee assignments.
Implementation of the shuffle algorithm for committee assignment would link the random seed, slot number, committee number and public key set to the valid committee attestation check. The [specific implementation](https://github.com/ethereum/consensus-specs/blob/v1.3.0/specs/phase0/beacon-chain.md#compute_shuffled_index) operates hashing and basic mathematics over 32 and 64 byte integers and should therefore not pose a large impact on total per step constraints.
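A direct Python transcription of the linked spec function might look like the following. This is illustrative rather than consensus-grade: the round count and byte layouts follow the phase0 spec as linked above, but any production use should validate against the spec's test vectors.

```python
import hashlib

# Sketch of the spec's swap-or-not `compute_shuffled_index` (see the
# linked phase0 beacon chain spec). SHUFFLE_ROUND_COUNT is 90 in the
# mainnet preset.
SHUFFLE_ROUND_COUNT = 90

def compute_shuffled_index(index: int, index_count: int, seed: bytes) -> int:
    assert index < index_count
    for current_round in range(SHUFFLE_ROUND_COUNT):
        round_bytes = current_round.to_bytes(1, "little")
        # Pivot selects the reflection point for this round.
        pivot = int.from_bytes(
            hashlib.sha256(seed + round_bytes).digest()[:8], "little"
        ) % index_count
        flip = (pivot + index_count - index) % index_count
        position = max(index, flip)
        # One hash covers 256 positions; pick the bit for this position.
        source = hashlib.sha256(
            seed + round_bytes + (position // 256).to_bytes(4, "little")
        ).digest()
        byte = source[(position % 256) // 8]
        bit = (byte >> (position % 8)) % 2
        index = flip if bit else index
    return index
```

As the loop body is only hashing and integer arithmetic over small words, this supports the claim above that the shuffle should not add many constraints per step relative to the pairing checks.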
There are several implementation paths available, depending largely on the overhead of verifying a set of public keys through the generated proof. If this check proves lightweight enough (unlikely), it could be done per committee alongside the committee signature check, yielding a complete check of valid committee attestations. However, since slot/committee assignments are available at the beginning of an epoch, a single execution output for the entire validator set that populates a shared lookup table may offer the least overhead for set membership checks.
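As one possible shape for the shared-lookup-table approach (a minimal sketch with hypothetical helper names, using SHA-256 where a circuit would use a SNARK-friendly hash): commit to the epoch's public key set once as a Merkle root, then have each per-committee step verify membership paths against that single root:

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    """Root over a power-of-two list of leaves (e.g. serialized pubkeys)."""
    layer = [h(leaf) for leaf in leaves]
    while len(layer) > 1:
        layer = [h(layer[i] + layer[i + 1]) for i in range(0, len(layer), 2)]
    return layer[0]

def merkle_proof(leaves, index):
    """Sibling hashes from leaf to root for the leaf at `index`."""
    layer = [h(leaf) for leaf in leaves]
    path = []
    while len(layer) > 1:
        path.append(layer[index ^ 1])  # sibling at this level
        layer = [h(layer[i] + layer[i + 1]) for i in range(0, len(layer), 2)]
        index //= 2
    return path

def verify_membership(root, leaf, index, path) -> bool:
    """Recompute the root from a leaf and its sibling path."""
    node = h(leaf)
    for sibling in path:
        node = h(sibling + node) if index % 2 else h(node + sibling)
        index //= 2
    return node == root
```

The commitment is produced once per epoch; inside the circuit only `verify_membership` (a hash chain of depth `log2(n)`) is constrained per key.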
This will be part of later work.
## Further Signature Aggregation
As verifying a single aggregate signature is the dominating factor in overall circuit size and overhead, it makes sense to aggregate committee attestations across all 64 committees in a slot before verifying in the SNARK. This reduces 64 signature verifications to a single verification, which can then be folded via some folding scheme across each slot in the epoch. While this reduces the work to verify attestations, it moves the set membership check one step away from the protocol layer. There may be an opportunity to perform a pre-fold aggregation while maintaining transparency of the public key set, which could provide another avenue for optimization.
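The reason an extra aggregation layer is "free" cryptographically is pure linearity. A toy model over integers mod `q` (standing in for the BLS groups, with hypothetical names; scalar arithmetic replaces curve and pairing operations) shows that a sum of per-committee aggregate signatures still verifies against the sum of the corresponding aggregate public keys:

```python
import random

q = 2**255 - 19  # toy modulus standing in for the group order

def hash_to_scalar(message: bytes) -> int:
    # Toy stand-in for hash-to-curve H(m).
    return int.from_bytes(message.ljust(32, b"\0"), "little") % q

def sign(sk: int, message: bytes) -> int:
    # Toy analogue of sigma = sk * H(m); scalar mult replaces the curve op.
    return (sk * hash_to_scalar(message)) % q

def verify(pk: int, message: bytes, sig: int) -> bool:
    # Toy analogue of the pairing check e(g1, sigma) == e(pk, H(m)).
    return sig == (pk * hash_to_scalar(message)) % q

# 64 committees of toy validators, all attesting to the same message.
message = b"checkpoint root"
committee_sigs, committee_pks = [], []
for _ in range(64):
    sks = [random.randrange(1, q) for _ in range(8)]
    committee_pks.append(sum(sks) % q)                               # aggregate pubkey
    committee_sigs.append(sum(sign(sk, message) for sk in sks) % q)  # aggregate sig

# Second aggregation layer: 64 verifications collapse into one.
slot_sig = sum(committee_sigs) % q
slot_pk = sum(committee_pks) % q
assert verify(slot_pk, message, slot_sig)
```

The same identity is what makes the check deferrable; the cost is that the verifier no longer sees the individual committee public key sets directly, which is the transparency trade-off noted above.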
## Non-homogeneous Views
In cases where the network is not progressing as a single chain (for example, when a network partition leaves validators with differing views), this proof construction would fail to produce a valid proof. It is possible to relax the committee count allowance to enable multi-epoch attestations on a source checkpoint, verifying only the total validator participation count over aggregate proofs and finalizing upon a majority count. These considerations are taken to be out of scope currently, but should not be insurmountable.
## Double Counting Risk
Another layer that was not taken into consideration was ensuring that no committee is double counted and that no individual committee member's participation is counted more than once. While this is handled well at the protocol layer, making the resultant proof fully trustless would require similar checks at the circuit or smart contract level to ensure these invariants are upheld.
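Outside the circuit, the invariant is straightforward to state. A minimal sketch (hypothetical structure: each attestation is a `(committee_index, aggregation_bits)` pair) that rejects repeated committees and counts each participant at most once:

```python
def count_participation(attestations):
    """Count unique (committee, member) participants, rejecting any
    committee that appears more than once."""
    seen_committees = set()
    participants = set()
    for committee_index, aggregation_bits in attestations:
        if committee_index in seen_committees:
            raise ValueError(f"committee {committee_index} counted twice")
        seen_committees.add(committee_index)
        for member, bit in enumerate(aggregation_bits):
            if bit:
                participants.add((committee_index, member))
    return len(participants)
```

In-circuit, the same invariant would amount to constraining committee indices to be distinct and counting set bits per committee, since a member cannot appear in two positions of one bitfield.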
## How Much Finality Is Enough Finality?
Finality in the context of interoperability is user-side friction, and how much of it is required is application dependent. Interoperability networks should be seen as settlement layers for crosschain data and should therefore relay only finalized events. However, this presents interesting design problems in creating desirable application UX.
Even in the context of proving finality in a SNARK, implementing full consensus verification along with the FFG logic would be impractical. For most uses, relying on the economic finality of justified checkpoints, since reverting one would require at least 1/3 of the signing set to equivocate and therefore be slashed, provides ample security while massively simplifying the logic.
---
A heartfelt thanks to Gustavo Spelzon, Stefano Tessaro, and Nirvan Tyagi for help, insights, and refining our strategy; [Srinath Setty](https://twitter.com/srinathtv) for pointed support on optimizations, the proof scheme, and a truly stellar implementation base; [Nalin Bhardwaj](https://twitter.com/nibnalin) for the tooling that made this much simpler; and [Carlos Perez](https://twitter.com/CPerezz19), [Aleph](https://twitter.com/alpeh_v), and the PSE team for exceptional example code and help in benchmarking the parallel Nova construction.
---
# Refs
- More on BLS aggregate signature implementation in Ethereum: https://ethresear.ch/t/pragmatic-signature-aggregation-with-bls/2105
- BLS12-381 for the rest of us: https://hackmd.io/@benjaminion/bls12-381
- A great in-depth overview of Ethereum consensus: https://eth2book.info/capella/part2/building_blocks/
- The Eth2 book in general: https://eth2book.info/capella/
- 0xPARC Circom pairing blog: https://0xparc.org/blog/zk-pairing-1