# Efficient Set Membership Check For Recursive SNARKs Over Many Steps With A "Small" Table
This note looks for a protocol for set membership checks over many recursive steps against a shared, unstructured table, for use in validator status checks, which will be core protocol logic for all PoS light client work. The particular statement type and intended uses differ enough from generalized lookup arguments to justify a standalone investigation.
The reason I am calling this set membership instead of a lookup argument is the intended use: we wish to check that some value (or vector) is a member of an unstructured list, rather than the common lookup uses (range checks, bitwise operations, or other structured or continuous tables). These values are often inputs to the circuits and completely predetermined before runtime. This gives ample opportunity to optimize and to omit work throughout the proving system, yielding an overall simpler protocol.
More formally:
Given sequences $W_{i}=[w_0,w_1,...,w_i]$ and $T_j=[t_0,t_1,...,t_j]$ in $\mathbb{F}$, prove $W_{i} \subseteq T_j$ for all $i$, where $W_i$ is a subset of the witness values. In the case of a recursive SNARK, the values in $W_i$ are spread across $s$ steps.
For the purposes of this discussion, $|T|$ is in $[0,2^{20}]$ (for Ethereum, the largest validator set we envision). The number of lookups $i$ is usually on the order of $|T|$ over the total statement, but for any given step is on the order of tens to hundreds, which implies multiplicities always in $[0,1]$, although this constraint doesn't change the protocol.
Our key realization is that for small table sizes, the linear work of directly calculating the set membership argument in circuit has lower overall cost and complexity than implementing a full lookup proof inside/alongside (Super)Nova. The former is implemented directly alongside the step circuit logic with little additional verifier work; the latter requires multiple polynomial commitments to the table plus secondary verifier logic.
Our resulting protocol implements the LogUp-style multiset argument as a running sum per step, realizable as three distinct protocols: a per-step complete set membership check; a per-step aggregated subset sum with deferred table sum and equality check; and an incremental per-step complete set membership check for semi-structured tables. Prover work is linear in both the size of the set, $W$, and the size of the table, $T$, with each protocol optimizing for a different tradeoff. Importantly, each lookup costs only 1 multiplication gate, and each table entry access costs 2 multiplication gates.
## Offline Memory Checking Via Permutation Arguments
Building off of {https://hackmd.io/C6XlfTRFQ_KoRy1Eo-4Pfw?view#Memory-checking-via-permutation-checking-without-re-ordering}, which itself builds on techniques from Jolt/Lasso, this approach implements a foldable lookup argument that proves a permutation between a read table and a write table, running a verifier at each step.
This defers the proving overhead to the last step at the cost of an additional $m$ field operations ($m = i$, the number of lookups) and an additional running-hash Fiat-Shamir value (or two? Can $\alpha = \gamma$?).
Taken from the above HackMD:
_______
Take grand product expression from Jolt

> Expression (12) RHS $T[i]$ should be $T'[i]$, representing the final state of the tables. $T'[i]$ and $c_{m+i}$ are non-deterministic inputs from the prover.
$m$ represents the number of read/write accesses and $N$ the overall memory space. $b_i$ is the memory address or table index, $a_i$ the value, and $c_i$ the counter. In other words, $T[b_i] = a_i$ with counter $c_i$.
> Proof of completeness & knowledge soundness analysis
For Equation (12), define
$LHS = \prod_{i=0}^{m-1} (\alpha - (b_{i} + \gamma \times a_{i} + \gamma^2 \times c_i))$
For Equation (13), define
$RHS = \prod_{i=0}^{m-1} (\alpha - (b_{i} + \gamma \times a_{i} + \gamma^2 \times (c_i + 1)))$
And $LHS, RHS \in \mathbb{F}$.
Both the LHS and RHS exhibit a pattern that can be folded using purely $\mathbb{F}$-arithmetic. Given $k$ rounds of folding, each round contributes $m_1, m_2, \ldots$ lookups satisfying $m = m_1 + m_2 + \ldots + m_k$.
$m_i$ can be 0 if there is no lookup operation in that round.
| Prover work | NIFS Verifier work |
| -------- | -------- |
| $\Theta(m_i) \times \mathbb{F}$, where $m_i$ is the number of lookups in folding round $i$ | 1 random oracle hash to compute the challenge |
______
It can be seen that with this protocol, at our given bound of lookups (~$2^{20}$ in our case), the grand product argument requires $m \approx 2^{20}$ terms. Using Quarks as proposed requires $4*|T|$ (4M in our case) committed elements, making this seem a bit heavy.
This also requires the prover to know all values in advance to calculate $\alpha$ and $\gamma$ over the witness. This isn't a problem for light client work, as we are targeting IVC batch signature proofs that can be generated incrementally, but it might be a problem in other signature schemes.
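As a sanity check on the multiset identity behind this permutation argument, the fingerprints can be sketched in a few lines over a toy field. The field, the challenges, and the tiny access trace below are illustrative stand-ins, not the real protocol's values; the invariant checked is the standard offline memory checking relation: init multiset combined with writes equals reads combined with the final state.

```python
# Toy sketch of the read/write multiset fingerprints behind the permutation
# argument. The field P, challenges alpha/gamma, and the access trace are
# illustrative stand-ins, not the real protocol's values.
P = 2**61 - 1                        # toy prime field
alpha, gamma = 987654321, 123456789  # stand-ins for Fiat-Shamir challenges

def fingerprint(tuples):
    """Product of (alpha - (b + gamma*a + gamma^2*c)) over (b, a, c) tuples."""
    acc = 1
    for b, a, c in tuples:
        acc = acc * (alpha - (b + gamma * a + gamma * gamma * c)) % P
    return acc

init   = [(0, 5, 0), (1, 9, 0)]                  # initial table: T[0]=5, T[1]=9
reads  = [(0, 5, 0), (1, 9, 0), (0, 5, 1)]       # (b_i, a_i, c_i) per access
writes = [(b, a, c + 1) for (b, a, c) in reads]  # written back with counter + 1
final  = [(0, 5, 2), (1, 9, 1)]                  # table state after all accesses

# Offline memory checking invariant: init + writes == reads + final as
# multisets, checked via equality of their grand-product fingerprints.
assert fingerprint(init) * fingerprint(writes) % P == \
       fingerprint(reads) * fingerprint(final) % P
```

Because each term is a plain field multiplication, partial products like these fold naturally across rounds, which is exactly the property the table above is pricing.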
## Simple Incremental Set Membership With LogUp-style Relation in Nova
Recall that LogUp uses logarithmic derivatives to turn the multiset equality permutation argument from products of polynomials:
$$
\prod_i(X-w_i)=\prod_j(X-t_j)^{m_{j}}
$$
to sums of polynomials:
$$
\sum_i{1\over{X-w_i}} = \sum_j{m_j\over{X-t_j}}
$$
for lookup values $w_i$ over table $t_j$ with multiplicities $m_j$.
This can then be reduced to an equality of scalars by evaluating at a random challenge $\alpha \leftarrow \mathbb{F}$:
$$
\sum_i{1\over{\alpha-w_i}} = \sum_j{m_j\over{\alpha-t_j}}
$$
which over a batch of lookups $w_{i,s}$ becomes the sum:
$$
\sum_s\sum_i{1\over{\alpha_s-w_{i,s}}} = \sum_s\sum_j{m_{j,s}\over{\alpha_s-t_j}}
$$
for $s$ steps.
If this equality holds for the given multiplicities, the values are a valid subset.
It is at this point in most lookup arguments that the multiset identity becomes too expensive to calculate/arithmetize directly, and some secondary IOP mechanism is employed to reduce the linear work to sublinear.
For the purposes of proving subset equality for validator-signer relationships, it is cheap enough to do this arithmetic directly in circuit, without the IOPs and the additional polynomial commitments that come with them, and in a way that maintains the IVC nature of our batch signature proof.
### Per Step Complete Set Membership Check
In the simplest case, this multiset equality check can be performed inside each step by calculating:
$$
\sum_i{1\over{\alpha-w_{i}}} = \sum_j{m_{j}\over{\alpha-t_j}}
$$
for subset $w_i$, multiplicities $m_j$, over table $t_j$, and challenge $\alpha$.
This requires a transcript verifier per step (unless done in circuit via a running hash transcript at increased cost $O(|W|)$) and offloads all set membership verifier work to the prover.
This adds per-step overhead and complexity to the batch signature proof, but makes each step logically complete and incremental: each step now proves not only the correctness of an aggregate public key but also that it is composed only of elements of some table, while removing the secondary proof and its verifier overhead.
While the per-step complexity is $O(|W|+|T|)$, if this check is replicated in each of $s$ steps over different values $w_s$, the total cost increases to $O(|W|+s*|T|)$, which is only practical for very small $|T|$. This cost-benefit relationship is well-defined and could be modeled for each application.
The in-circuit arithmetic does not need to be emulated in the SNARK to check equality, making it all native multiplications. This adds $i*k$ multiplications (with $k$ limbs per element) for set $w_i$ to calculate the LHS of the equation and $2*j*k$ multiplications (one for constraining the inverse, another for multiplying by $m_{j}$) to calculate the RHS. Note that the entirety of the table must be input into each step circuit.
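The multiplication counts above can be tabulated with a small helper (`per_step_cost` is a hypothetical name of ours, not from any library), using the $i*k$ LHS and $2*j*k$ RHS terms just described:

```python
# Back-of-envelope constraint counts for the per-step complete check, using
# the i*k (LHS) + 2*j*k (RHS) multiplication counts described above.
# `per_step_cost` is a hypothetical helper for illustration only.
def per_step_cost(i_lookups: int, j_table: int, k_limbs: int) -> int:
    lhs = i_lookups * k_limbs       # one constrained inverse per lookup
    rhs = 2 * j_table * k_limbs     # inverse constraint + multiply by m_j
    return lhs + rhs

# Ethereum-like parameters: 512 lookups, 2^20-entry table, 6 limbs/element.
# The table term dominates: roughly 12.6M constraints per step.
assert per_step_cost(512, 2**20, 6) == 512 * 6 + 2 * 2**20 * 6
```

This makes the cost-benefit relationship mentioned above concrete: the $2*j*k$ table term is what each of the later variants tries to amortize or defer.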
#### For Small $|T|$
In the Aptos case, or other small validator sets where each block has majority signatures, this multiset equality can likely be arithmetized in its entirety per step with acceptable overhead. For Aptos, at 125 validators, the sums alone would add ~1,000 constraints per block.
This does not include the majority check, as this is a weight-based BFT.
#### For Large-*er* (but still not big) $|T|$
In the Ethereum case, each set per step is ~512 validators (one committee), with 64 committees per slot and 32 slots per epoch, over a table of ~$2^{20}$ entries with elements represented by 6 limbs. This gives a per-step cost of $512*6 = 3{,}072$ constraints for the LHS sum and $2*2^{20}*6 \approx 12M$ constraints for the RHS sum.
This is impractical; the cost is dominated by the sum over the table elements.
### Per Step Set Membership With Deferred Table Sum
If the table is shared among all steps, the left-hand-side running sum can be calculated separately from the right-hand side of the argument, with each step computing:
$$
S_{s-1} + \sum_i{1\over{\alpha_s-w_{i,s}}}=S_s
$$
for step $s$ in circuit, passing the running sum to step $s+1$. With $S_{total}$ being the final sum for all used values in $w_i$:
$$
S_{total}=S_s
$$
at $s_{final}$.
The result is an incrementally verifiable lookup against the sum:
$$
S_{total} = \sum_s\sum_j{m_{j,s}\over{\alpha_s-t_j}}
$$
where $\alpha_s$ is the Fiat-Shamir randomness used in step $s$ and $m_{j,s}$ are the participation bits of the validator set in step $s$. This holds iff the values used in each step of the batch signature proof, $w_{i,s}$, are active validators in table $T$.
This enables each step's set membership "work" to be calculated independently of the previous steps and independently of the RHS table sum.
To avoid passing the entire table as input to each step, and to reduce prover overhead, the right-hand-side sum is deferred to a standalone proof which can be generated in parallel (only somewhat, as it needs the full multiplicities/participation set) and shown to be equal to the running sum from the steps. This pushes set validity all the way to the end of the proof, meaning the proof would fail only at the end if there were a discrepancy in the participation list. I don't think this is much of an issue, though.
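The split between the in-circuit running sum and the deferred table summation can be sketched end to end. The field, per-step challenges, table, and per-step lookups below are illustrative stand-ins:

```python
# Sketch of the deferred-table-sum protocol: each step folds its lookups into
# a running sum S, and a standalone "table summation proof" recomputes the
# RHS. Field, challenges, table, and per-step witnesses are illustrative.
P = 2**61 - 1
inv = lambda x: pow(x, P - 2, P)

table  = [3, 7, 11, 19, 23]       # shared table T, never passed to a step
steps  = [[7, 3], [19, 7], [23]]  # w_{i,s}: lookups made in each step s
alphas = [111, 222, 333]          # stand-ins for per-step challenges alpha_s

# In-circuit side: S_s = S_{s-1} + sum_i 1/(alpha_s - w_{i,s}).
S = 0
for w_s, a in zip(steps, alphas):
    S = (S + sum(inv((a - w) % P) for w in w_s)) % P

# Deferred proof: sum_s sum_j m_{j,s} / (alpha_s - t_j), skipping m == 0.
rhs = 0
for w_s, a in zip(steps, alphas):
    for t in table:
        m = w_s.count(t)          # participation bits / multiplicities
        if m:
            rhs = (rhs + m * inv((a - t) % P)) % P

assert S == rhs  # S_total equals the table sum iff all w_{i,s} are in T
```

Note that the deferred loop needs every step's $\alpha_s$ and multiplicity vector, which is why it can only be partially parallelized with the step proofs.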
(I also think that, for soundness between the proofs, the multiplicity vector (participation bits) needs to be passed and concatenated per step for an equality check in the final table summation proof?)
#### Table Summation Proof Cost
The Table Summation proof would calculate:
$$
\sum_s\sum_j{m_{j,s}\over{\alpha_s-t_j}}
$$
This requires $2*j*k$ native multiplications.
The final table summation proof will also need to verify that each $\alpha_s$ is correctly formed. Doing this via a running hash transcript over the lookup elements is a bit heavy for larger sizes.
There may be some way to piggyback on the randomness commit/check per step in the Nova protocol to only use one set of randomness. This is future work.
Proving LHS-RHS equivalence can be done in the table summation proof, but would require serial generation after the batch signature proof to obtain the output of the LHS sums. Adding the equality check to the verifier would be trivial: constrain that LHS == RHS as outputs.
So this protocol is linear cost, $O(|W|+|T|)$, but the $O(|W|)$ factor is amortizable across folding steps. While not practical for large lookups, it has better concrete cost at small $|W|+|T|$ than standalone lookup proofs, which add secondary verification and increased per-step overhead.
#### For Small $|T|$
For small tables, the added complexity of carrying randomness per step and the deferred verifier logic is not worth the cost.
#### For Large-*er* $|T|$
In the Ethereum case, this work over a table of $2^{20}$ entries is ~12M multiplication gates. This is large, but much better than paying it per step; it is in the realm of practical as a coprocessor to our overall batch signature statement, and likely cheaper than wrapping a verifier for a lookup argument. Also, since $m_j$ has some sparsity, the $0$s could be skipped, reducing the overall constraint count.
The running transcript hash (~800k in-circuit hashes to verify) is another large cost. It would be best if we could reduce that to a single hash per step, perhaps derived from the final aggregate public key of each point-add step, which is determined before the proof. That reduces the verifier to $s$ hashes, which is $2{,}048$ in the Ethereum case: not great, but better, and it would need some work to be proven sound.
### Complete, Iterative Per Step Set Membership For Semi-structured $|T|$
Having completeness per step is desirable but impractical as $|T|$ diverges from $|W|$. In the context of PoS validator tables, we assume there is a secondary, parallel "stateful" proof that establishes the validity of the table of validators, with active status and weights (if applicable). This proof is assumed to output some checkable commitment to the table that links it to any lookups against that table.
Depending on how the table is constructed, and how indices are handled, there might be a creative way to do the full LHS-RHS summation per step but only calculate the RHS over the required indices, enabling a similar running incremental sum. This would be ideal as it would internalize the verification of set membership per step, not requiring a secondary proof, while fully amortizing the table summation cost through the steps.
In Ethereum's case, the final slot of every epoch establishes a new "active" validator set in the beacon block that shuffles and allocates validators to their respective slot/committee assignment over the next 2 epochs. These slot/committee assignment values are used in the committee subnets to relate a participation index to a public key in the validator tree contained in the beacon block. While we are not concerned with validating proper participation of validators in their respective committee assignments, we are interested in validating their active status in the validator tree. This inherently means a ~512 value lookup in the ~$2^{20}$ value validator tree per committee with the strategy of deferring the summation over the table to a later proof.
If the stateful proof that establishes the validator table for a given epoch organizes the elements as a list in the index order by which they are assigned to slots and committees (or, more importantly, the order by which they are iterated through in the parallelized Nova step proofs), we could calculate the running hash of the table slice used in each committee proof ($T_s$) such that:
$$
com(T) = com(T_1) + com(T_2) + \dots + com(T_s)
$$
over $s$ steps, with each step committing to the table slice for that step.
This protocol would cost $i*k + 2*j_s*k$ constraints per step, or $512*6 + 2*512*6 = 9{,}216$ (~9k) constraints, assuming full participation.
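Reading the "+" in the commitment equation as sequential combination, the per-step slice commitments can be sketched as a running hash chain. SHA-256 and the toy table below are purely illustrative; a real instantiation would use a circuit-friendly hash or an additively combinable commitment:

```python
import hashlib

# Sketch of committing to an ordered validator table by absorbing one slice
# per step into a running hash chain. SHA-256 and the toy 16-entry table are
# illustrative; a circuit-friendly hash would be used in practice.
def absorb(state: bytes, elems) -> bytes:
    for e in elems:
        state = hashlib.sha256(state + e.to_bytes(8, "big")).digest()
    return state

table  = list(range(16))                            # T, in assignment order
slices = [table[i:i + 4] for i in range(0, 16, 4)]  # T_s: one slice per step

# Each step s absorbs only its own slice, carrying the commitment forward.
state = b"\x00" * 32
for T_s in slices:
    state = absorb(state, T_s)

# The stateful table proof computes com(T) over the full ordered table;
# the two must agree for the per-step slicing to be consistent.
assert state == absorb(b"\x00" * 32, table)
```

The key requirement is only that the stateful proof emits the table in the same order the step proofs consume it, so the chained slice commitments reproduce the full-table commitment.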