# BLS Signature Verification Proofs As Inner Product Arguments
**Pathway toward efficient proofs of finality for Ethereum**
# Introduction
This is a continuation of our research and development towards a scalable and efficient proof of finality for Ethereum. The initial discussion outlining design goals and motivation can be found [here](https://hackmd.io/@wrinqle/rye7Dju92), along with benchmarking results using R1CS BLS signature verification circuits and Nova recursive schemes.
For more on the problem statement, motivation, and earlier work, please refer to the previous HackMD above.
# Key Learnings
Our previous benchmarking results showed that while parallelizing folding instances for step circuits of single BLS aggregate signature verifications yielded promising performance, we would still need 1-2 orders of magnitude of improvement to achieve our practicality goal of $\frac{\text{Proof Time}}{\text{Consensus Time}} \le 1$.

This barrier to practicality is largely driven by the size of the statement being folded, i.e. the complexity of generating a proof of a BLS aggregate signature over the BLS12-381 curve in R1CS embedded in 255-bit curves. This leaves two possible paths for optimization: optimize the constraint-based arithmetization of signature verification and the folding schemes (via custom gates, field reductions, alternative constraint systems, and folding schemes, all of which is heavily manual work over large codebases), or investigate other proving systems that sidestep the most costly aspect of these proofs: the non-native field and curve operations.
The first option will always be an uphill battle, as it requires emulating non-native field and curve operations in a much smaller field (which is a lot like doing math with really big numbers on a narrow sheet of paper).
The second option introduces the concept of signature-scheme-specific proof systems over the native curve: a prover-efficient precomputation proof that can be efficiently folded and compressed for onchain use. While these proofs cannot be directly verified onchain (as BN254 is the most commonly supported curve for onchain SNARK verification), they can reduce the number of non-native field and curve operations required to prove the validity of a set of signatures.
# BLS Signatures as Inner Products
So let's see if we can cheat a bit.
Recall that BLS signature verification is the two-pairing check
$$e(pk, H(m)) = e(g_{1}, \sigma)$$
over the bilinear map
$$e: \mathbb{G}_{1} \times \mathbb{G}_{2} \to \mathbb{G}_{T}$$
where the signature scheme is defined over the elliptic curve as group elements
$$g_{1} \in \mathbb{G}_{1}, pk \in \mathbb{G}_{1}, H(m) \in \mathbb{G}_{2}, \sigma \in \mathbb{G}_{2}$$
---
Verifying a list of signatures
$$
\begin{aligned}
e(pk_{1}, H(m_1)) &= e(g_{1}, \sigma_{1}) \\
e(pk_{2}, H(m_2)) &= e(g_{1}, \sigma_{2}) \\
e(pk_{3}, H(m_3)) &= e(g_{1}, \sigma_{3}) \\
&\ \ \vdots \\
e(pk_{n}, H(m_n)) &= e(g_{1}, \sigma_{n})
\end{aligned}
$$
begins to look a bit like inner products over vectors of group elements
$$
\begin{bmatrix} pk_1 \\ pk_2 \\ pk_3 \\ \vdots \\ pk_n \end{bmatrix} \cdot \begin{bmatrix} H(m_1) \\ H(m_2) \\ H(m_3) \\ \vdots \\ H(m_n) \end{bmatrix} = \begin{bmatrix} g_1 \\ g_1 \\ g_1 \\ \vdots \\ g_1 \end{bmatrix} \cdot \begin{bmatrix} \sigma_{1} \\ \sigma_{2} \\ \sigma_{3} \\ \vdots \\ \sigma_{n} \end{bmatrix}$$
And if all of the messages being signed are the same, we can aggregate the signatures into a single element (via point addition)
$$\begin{bmatrix} pk_1 \\ pk_2 \\ pk_3 \\ \vdots \\ pk_n \end{bmatrix} \cdot \begin{bmatrix} H(m) \\ H(m) \\ H(m) \\ \vdots \\ H(m) \end{bmatrix}= e(g_1, \sigma_{agg})$$
This equation set (along with checking message correctness and public key aggregation) is what is verified for each committee per slot (64 committees) and for each slot per epoch (32 slots per epoch) to verify finality, totaling 2,048 aggregate signature verifications, or 4,096 pairings.
But we can take this further and move $e(g_1,\sigma_{agg})$ to the left-hand side to give this constraining equation for valid signatures:
$$\begin{bmatrix} pk_1 \\ pk_2 \\ pk_3 \\ \vdots \\ pk_n \\ g_{1}^{-1}\end{bmatrix} \cdot \begin{bmatrix} H(m) \\ H(m) \\ H(m) \\ \vdots \\ H(m) \\ \sigma_{agg} \end{bmatrix} =1$$
Which is just an inner product argument in a pairing-based language!
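As a quick sanity check, the rearranged relation can be exercised in a toy exponent model, with scalars standing in for group elements and the bilinear map modeled as multiplication of exponents. This is a didactic sketch with illustrative values, not real BLS12-381 arithmetic:

```python
# Toy check of the padded inner pairing product (exponent model, NOT BLS12-381):
# with bilinear e(a, b) = a*b mod Q, appending g1^{-1} and sigma_agg to the
# vectors makes the product collapse to the G_T identity (0 in this model)
# exactly when the aggregate signature is valid. All values are illustrative.
Q = 2**61 - 1              # stand-in for the scalar field order
g1, h_m = 5, 7             # toy generator and message hash H(m), as exponents
sks = [11, 13, 17]         # toy secret keys
pks = [sk * g1 % Q for sk in sks]     # pk_i = g1^{sk_i}
sigs = [sk * h_m % Q for sk in sks]   # sigma_i = H(m)^{sk_i}
sigma_agg = sum(sigs) % Q             # point addition -> addition of exponents

# prod_i e(pk_i, H(m)) * e(g1^{-1}, sigma_agg) -> a sum of exponent products
lhs = (sum(pk * h_m for pk in pks) + (Q - g1) * sigma_agg) % Q
assert lhs == 0  # identity in G_T: the constraining equation holds
```

Tampering with any $\sigma_i$ or $pk_i$ breaks the cancellation, so the padded product equals the identity only for a valid aggregate.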
This construction can also be extended over multiple messages allowing us to use it as a proof construction over each slot and epoch with the constraint still holding (or even over multiple signing sets, e.g. different blockchains using the same signature scheme):
| For each committee | For each slot | For each epoch |
|:---:|:---:|:---:|
| $$\begin{bmatrix} pk_1 \\ pk_2 \\ pk_3 \\ \vdots \\ pk_n \\ g_{1}^{-1}\end{bmatrix} \cdot \begin{bmatrix} H(m) \\ H(m) \\ H(m) \\ \vdots \\ H(m) \\ \sigma_{agg} \end{bmatrix}=1$$ | $$\begin{bmatrix} \vec{pk_{c1}} \\ \vec{pk_{c2}} \\ \vec{pk_{c3}} \\ \vdots \\ \vec{pk_{c64}} \\ g_{1}^{-1}\\ g_{1}^{-1}\\ g_{1}^{-1}\\ \vdots \\ g_{1}^{-1} \end{bmatrix} \cdot \begin{bmatrix} H(m_{c1}) \\ H(m_{c2}) \\ H(m_{c3}) \\ \vdots \\ H(m_{c64}) \\ \sigma_{agg_{c1}} \\ \sigma_{agg_{c2}} \\ \sigma_{agg_{c3}} \\ \vdots \\ \sigma_{agg_{c64}} \end{bmatrix}=1$$ | $$\begin{bmatrix} \vec{pk_{s1}} \\ \vec{pk_{s2}} \\ \vec{pk_{s3}} \\ \vdots \\ \vec{pk_{s32}} \\ g_{1}^{-1}\\ g_{1}^{-1}\\ g_{1}^{-1}\\ \vdots \\ g_{1}^{-1} \end{bmatrix} \cdot \begin{bmatrix} \vec{H(m_{s1})} \\ \vec{H(m_{s2})} \\ \vec{H(m_{s3})} \\ \vdots \\ \vec{H(m_{s32})} \\ \vec{\sigma_{agg_{s1}}} \\ \vec{\sigma_{agg_{s2}}} \\ \vec{\sigma_{agg_{s3}}} \\ \vdots \\ \vec{\sigma_{agg_{s32}}} \end{bmatrix}=1$$ |
# Verifiable Computation of Inner Pairing Products
Armed with this realization, we can look to construct an inner pairing product argument over the BLS12-381 curve using Generalized Inner Product Arguments ([GIPA](https://eprint.iacr.org/2019/1177)).
What this proof construction enables is reducing the problem of directly verifying the 2,048 aggregate BLS signatures in circuit to verifying the correct computation (and constraint) of the product of pairings of two public vectors comprised of all participating public keys (or aggregates), hashes of messages, and aggregate signatures.
This method of verifying a set of BLS signatures was first proposed by Benedikt Bünz, Mary Maller et al. in "[Proofs for Inner Pairing Products and Applications](https://eprint.iacr.org/2019/1177)" in appendix 3 as an implementation of SIPP.
## Protocol
Given:
$$
\mathbf{A} \in \mathbb{G}_1^m, \mathbf{B} \in \mathbb{G}_2^m, Z \in \mathbb{G}_T, \mathbf{r} \in \mathbb{F}^m
$$
Prove:
$$
\prod^{m-1}_{i=0} e(A^{r_i}_i, B_i) = Z
$$
Protocol:
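One halving round can be sketched as follows (following SIPP from the paper above; notation ours, with the $r_i$ rescaling assumed folded into $\mathbf{A}$ upfront). With $m' = m/2$, the prover sends the cross terms

$$Z_L = \prod_{i=0}^{m'-1} e(A_{m'+i}, B_i), \qquad Z_R = \prod_{i=0}^{m'-1} e(A_i, B_{m'+i})$$

the verifier replies with a random challenge $x$, and both parties fold

$$A'_i = A_i \cdot A_{m'+i}^{x}, \qquad B'_i = B_i \cdot B_{m'+i}^{x^{-1}}, \qquad Z' = Z_L^{x} \cdot Z \cdot Z_R^{x^{-1}}$$

After $\log_2(m)$ rounds the vectors have length one and the verifier performs the single check $e(A, B) = Z$.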

The prover computes a total of $2m$ pairings, $m$ exponentiations in each source group ($\mathbb{G}_1$ and $\mathbb{G}_2$), and $m$ hashes to generate the Fiat-Shamir random challenges. The verifier computes $m$ exponentiations in each source group ($\mathbb{G}_1$ and $\mathbb{G}_2$), $2\log(m)$ exponentiations in the target group ($\mathbb{G}_T$), $m$ hashes, and a single pairing.
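The recursion can be made concrete with a minimal, runnable sketch. To stay stdlib-only, the bilinear map is again modeled in the exponent as $e(a,b) = ab \bmod Q$ rather than over BLS12-381, and the $r_i$ rescaling is assumed folded into $\mathbf{A}$ upfront; all names and values are illustrative.

```python
# Toy SIPP recursion: prover sends cross terms per halving round; verifier
# folds the public vectors and Z, then does one final pairing check.
import hashlib

Q = 2**61 - 1  # Mersenne prime standing in for the scalar field order

def e(a, b):
    """Toy bilinear map: product of exponents mod Q."""
    return (a * b) % Q

def ip(A, B):
    """Inner pairing product prod_i e(A_i, B_i); a sum in the exponent model."""
    return sum(e(a, b) for a, b in zip(A, B)) % Q

def challenge(transcript):
    """Fiat-Shamir: derive a nonzero challenge from the transcript so far."""
    digest = hashlib.sha256(repr(transcript).encode()).digest()
    return int.from_bytes(digest, "big") % (Q - 1) + 1

def sipp_prove(A, B):
    """Each round halves the vectors; the proof is just the cross terms."""
    proof, transcript = [], [ip(A, B)]
    while len(A) > 1:
        h = len(A) // 2
        Z_L = ip(A[h:], B[:h])  # right half of A against left half of B
        Z_R = ip(A[:h], B[h:])  # left half of A against right half of B
        proof.append((Z_L, Z_R))
        transcript.append((Z_L, Z_R))
        x = challenge(transcript)
        x_inv = pow(x, -1, Q)
        A = [(A[i] + x * A[h + i]) % Q for i in range(h)]      # A' = A_lo + x*A_hi
        B = [(B[i] + x_inv * B[h + i]) % Q for i in range(h)]  # B' = B_lo + x^-1*B_hi
    return proof

def sipp_verify(A, B, Z, proof):
    """Fold the public vectors and Z with the cross terms; one final pairing."""
    transcript = [Z]
    for Z_L, Z_R in proof:
        h = len(A) // 2
        transcript.append((Z_L, Z_R))
        x = challenge(transcript)
        x_inv = pow(x, -1, Q)
        Z = (Z + x * Z_L + x_inv * Z_R) % Q  # Z' = Z_L^x * Z * Z_R^(1/x)
        A = [(A[i] + x * A[h + i]) % Q for i in range(h)]
        B = [(B[i] + x_inv * B[h + i]) % Q for i in range(h)]
    return len(A) == 1 and Z == e(A[0], B[0])
```

For power-of-two $m$ the proof is $\log_2(m)$ pairs of target-group elements, and the verifier's vector folding is exactly the $m$ source-group exponentiations counted above.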
## Implementation And Results
What do we gain by using this as our proving system for verifying a batch of aggregate BLS signatures? Over the set of 2,048 aggregate signatures of ~1M public keys, this proof reduces the 4,096 pairings to a single pairing check, with no witness generation. Because the protocol relies only on random linear combinations, it requires no trusted setup and is completely trustless; it is also memory efficient. And because it requires no FFTs, it is fully parallelizable (more compute means faster proving and verification).

Our implementation results show that for both the slot proof and the epoch proof, our practicality target is achieved, with prover times being half the consensus time for each of the given tests. As the proof construction is compute-bound, additional benchmarks were run on a 32-core baremetal machine and achieved a further 2x reduction in proving time.
**That means an epoch proof of >1 million validators in ~1 minute!**
## Making A Succinct Verifier For Onchain Use
As the verifier is neither succinct (9s to verify!) nor defined over the BN254 curve, we must wrap the verifier in a Groth16 proof to use this proof onchain. Unfortunately, even though we reduced the verifier work to $\log(m)$ for $m$-sized vectors, we still run into the issue of costly non-native group and field operations.
The single pairing check costs ~11M constraints (doable!), but the large number of group and field operations required to verify the prover's work adds an additional ~565M constraints.
While our Circom verifier implementation is likely underoptimized, this is not immediately practical for compressing the IPPA proof.
Regardless, this construction effectively reduced the total constraints required to prove the validity of an epoch of validator attestations from ~46 billion constraints to ~575 million constraints (an 80x reduction!).
## Optimizations And Open Problems
We are currently investigating pathways for optimizing the verifier work in order to make the compression wrapper practical. None of these are mutually exclusive, and they may be combined to achieve optimal results.
### Aggregating Public Keys
The above proof construction was intended to be a "maximum configuration" proof, as we are still determining how to link the set membership proofs to the batch signature proof. As such, the proof contains each participating public key individually: for each committee there are $n$ elements of participating public keys over the same message, where $n$ can be on the order of 350 validators or more. Over 64 committees per slot and 32 slots per epoch, this constitutes the bulk of the elements in each vector.
Optimizing for the instances where the message is the same is therefore the lowest-hanging fruit.
If an efficient set membership scheme can be built which outputs a commitment to an aggregate public key per committee, we could then reduce the batch signature check to just one aggregate public key per message and one aggregate signature, cutting the total vector lengths from ~1M to 4,096 (2 elements per committee per epoch). Unfortunately, as the verifier work is $\log(m)$, this only cuts the verifier-side non-native field and group operations roughly in half, which would still require optimizations to make practical.
### Helpers
Our primary area of investigation is ways of introducing a helper construct (which could be the prover) which can precompute the $\mathbf{A}', \mathbf{B}'$ and $Z'$ values, then pass a commitment to the verifier, which can trustlessly check for correctness by opening the commitment at an appropriate point. This might require a structured setup and protocol changes, but could reduce the verifier work from $m$ exponentiations in each source group ($\mathbb{G}_1$ and $\mathbb{G}_2$) and $2\log(m)$ exponentiations in the target group ($\mathbb{G}_T$) to 3 commitment openings (the cost of which depends on the commitment scheme).
A similar concept, called "Nested Amortization", is used in Halo2, as well as in the proving system [Testudo](https://eprint.iacr.org/2023/961), specifically in its commitment scheme "Testudo-Comm". These potential paths for optimization were mentioned in conversations with Mara Mihali from Aztec (who is also a coauthor of Testudo).
### Instruction Machine Lookups
Another interesting path is to append all group and field operations to an "instruction machine" lookup table, the verification of which can be deferred to a powerful verifier. This would enable the IPPA verifier to assume correctness of the given values and verify a single pairing in circuit, the acceptance of which is predicated upon an additional proof of instruction machine lookup table correctness for the rest of the verifier computation.
This would give us the ability to prove the validity of the set of computations in a proving system that is prover efficient as a precomputation to the final compression wrapper which would verify the output of this proof.
This concept was taken from Aztec's proving system implementation and refined in conversations with Innokentii Sennovskii from Aztec.
While this may seem like overkill, this has certain desirable properties in terms of verifier scalability if we look to extend the goal of creating a finality proof of one blockchain to proving other blockchains in parallel and compressing them into a single proof, which is the ultimate design goal.
### Proof Verifier "VM"
Another possible pathway to parallelize proof compression at scale (with the added complexity of multiple finality proofs being generated in parallel) is that it may be more efficient, at some complexity threshold, to run the verifiers in a provable VM (like [RISC Zero](https://www.risczero.com/) or [Lurk](https://lurk-lang.org/), or a specialized VM defined over a much smaller and simpler instruction set), outputting an execution trace. Whether this execution trace is recursively folded to achieve succinct proofs or committed to via FRI, the output should be more efficient to verify in Groth16 than direct verification, despite the overhead.
This proof verifier VM design has interesting applications outside of our use in scaling proof systems toward prover efficient schemes that forego succinctness.
In fact, Aztec's private function proving system design could lend itself to serve as this proof verifier layer, outputting a succinct proof. To achieve private execution of smart contracts, the computational burden of proving valid execution of smart contract functions falls on the RPC node and not on the network layer (as with classical smart contract platforms). This gives a three-fold benefit: first, the user data remains private and is not exposed to the network; second, the computational cost is detached from the network (meaning no gas, and essentially free); and most interestingly, this allows for arbitrarily complex private smart contracts with constant onchain cost and acceleratable proof generation performance. Meaning you could make Aztec your verifier VM, utilizing its private functions as a form of scalable verifiable compute.
We are currently exploring this idea as we port our verification contracts to Aztec's [Noir](https://github.com/noir-lang/noir/) language and [Sandbox](https://aztec.network/sandbox/).
### GKR Parallelization
Along the same lines as the proof verifier VM, where the goal is essentially to throw computational resources at the problem of creating a succinct merged output proof, we can parallelize the verifier proof or Instruction Machine table proof using the [GKR](https://www.microsoft.com/en-us/research/wp-content/uploads/2016/12/2008-DelegatingComputation.pdf) protocol. GKR enables any layered circuit with parallelizable steps to be proven piecewise as a cascade of sum-checks and accumulated together.
As the costly parts of verifier compression are the large number of group and field operations, these could be proven independently and folded together either serially or in a tree.
# Where do we go from here?
Alongside work toward a succinct recursive verifier for the batch signature proof, we are working on the other parts of what constitutes a complete finality proof: verifying the invariants checked as part of the finality mechanism. For Ethereum this includes set membership checks for active validator status and proper committee/slot assignment along with a majority proof.
The batch signature verification proof is, by far, the heavier of the proofs required for a complete proof of finality, so these set membership and majority proofs (while no less exciting) are not anticipated to create any barriers for practicality.
The structure of these proofs will be like those of the earlier per-committee Nova scheme, with each committee proof being a set membership lookup argument verifying active validator status, a bit vector sum-check proof to prove no multiplicity in the participation bits, and a majority sum-check proof to prove the signature corresponds to a majority attestation. An output of these proofs is a commitment to the committee's public key vector which is used to link a given committee proof to the batch signature verification proof.
Each of these proofs can be generated in parallel and folded together into one set membership and majority proof to accompany the batch signature proof.
-----
Thank you to [Innokentii Sennovskii](https://twitter.com/rumata888), [Mara Mihali](https://twitter.com/MihaliMara), and [Adrian Hamelink](https://twitter.com/adr1anh) for graciously putting up with my attempts to nerd snipe them and for sharing their brilliance, which will influence our paths forward.
-----