The fancy world of Accumulation Schemes

# The fancy world of Accumulation Schemes ## PCD Proof-carrying data (**PCD**) is a powerful cryptographic primitive that enables mutually distrustful parties to perform distributed computations that run indefinitely. PCD constructions are based on: * succinct non-interactive arguments of knowledge (SNARKs) that have * succinct verifier [BCCT13] * or a succinct accumulation scheme [BCMS20] * or a NARK with split accumulation scheme [BCLMS21] ## Accumulation Schemes ### Atomic AS Given a (language) predicate $\Phi: X \to \{0,1\}$, an accumulation scheme for $\Phi$ is a tuple (P,V,D), that enable proving/verifying (and later deciding) statements of the form $\Phi(q_1) \land \Phi(q_2) \land \dots \Phi(q_t)$ more efficiently than naively running $\Phi$ on each input $q_i$. #### The role of the Prover At step i of the computation, the prover P accumulates the accumulator $acc_i$ and the predicate input $q_i$[^1] into a new accumulator $acc_{i+1}$. Intuitevely, the role of the prover is to "assemble", i.e. "accumulate" $q_i$ into an object called "accumulator" in a fast and efficient way. #### The role of the Verifier The accumulation Verifier is responsible for checking that a given predicate input $q_i$ and a previous accumulator $acc_i$ have been correctly accumulated into a new accumulator $acc_{i+1}$. This operation should be fast as well (this should be highly optimized since PCD constructions are based on recursion of this V) #### The role of the Decider The Decider at any step $i$ can take in input an accumulator $acc_i$ and outputs a validation bit $b$. If the Decider accepts an accumulator, it must be the case that all previously accumulated q_is satisfy $\Phi$ #### Cost of an atomic AS After t steps, rather than requiring $t\cdot|\Phi|$, this approach requires $t\cdot|V|+|D|$. In practice, this is convenient when $|V| << |D|$. This is possible if and only if $|acc| << |\Phi|$, since the verifier takes in input acc! #### Some caveat The prover generates a proof $pf^*$ together with the new accumulator; this is done to enable Zero-Knowledge Accumulation Schemes. P, V and D additionally require some key, respectively apk, avk, and dk, that are associated with the representation of $\Phi$. These keys are generated by an Indexer algorithm I ### Split AS Atomic AS for Predicates can be generalized to the notion of Split AS for Relations. Given a (relation) predicate $\Phi: X\times W \to \{0,1\}$, a split accumulation scheme for $\Phi$ is a tuple (P,V,D), that enables proving/verifying (and later deciding) statements of the form $\Phi(qx_1, qw_1) \land \Phi(qx_2, qw_2) \land \dots \Phi(qx_t, qw_t)$ more efficiently than naively running $\Phi$ on each input $q_i:=(qx_i, qw_i)$. Here are the major differences w.r.t. the atomic case: * $q_i$ is now a pair, whose instance part $qx_i$ is short, and whose witness part $qw_i$ is potentially long * the accumulator $acc_i$ can be split into a (short) instance and a (long) witness part as well * the accumulation verifier requires only the instance part of the predicate and of the accumulator! The last point is crucial. Even if the total cost of verifying a t-step computation seems exactly the same as before, i.e. $t\cdot|V|+|D|$, now it is not required that $|acc| << |\Phi|$ but only that the instance part of acc is succinct! It is also evident that an atomic AS is a split AS with empty witnesses. ### Knowledge Soundness of AS Knowledge Soundness requires that for every adversary $\mathbb P$ running in expected (non-uniform) polynomial time and auxiliary input distribution $D$, and that produces a "valid" (according to the decider) accumulator, $n$ instance predicates and $m$ instance accumulators, there exists an extractor $\mathbb E$ with oracle access to $\mathbb P$ running in expected polynomial time that produces $n$ wtiness predicates and $m$ witness accumulators that are valid witnesses for the corresponding instances. #### Special case: what if accumulators and predicate inputs have the same form? In this case the definitions don't make any difference between accumulators nad predicate inputs; the prover (resp. the verifier) gives in output $n$ instance (resp. witnesses) predicates: they can either refer to input predicates or accumulators! #### A relaxed definition of knowledge soundness This is motivated by the fact that current PCD construction can be instantiated on a weak form of "multi-instance" extraction. The difference, at high level, is that now the prover produces $l$ different tuples of the previous form (i.e. "$l$ times" $n$ instance predicates and $m$ accumulators), and the extractor needs to extract them all. In the classical setting, this notion is implied by the previous one: just run $l$ times the extractor with oracle access to the prover, but only considering one of the $l$ output tuples. The random oracle is programmed by the reduction, and is "consistent" across the different invokations of the prover. This however makes some difference in the quantum setting, where the extractor cannot simply observe the queries to the real oracle. ## Recursion in PCD (from atomic/split accumulation) ![](https://i.imgur.com/UsnqPSO.png) ▲ **Comparison of circuits used to realize recursion with different techniques** When using the accumulation, the "recursive" circuit $R$ is the same with a small (yet meaningful) difference: we pass the entire proof and accumulators or just the $\mathbb x$ part. This difference has the following implications: * the proofs are **larger**: a (split accumulation-based) PCD proof has size that is linear in a node's computation, while PCD from accumulation schemes have proof whose size is sublinear in the node's computation. * verifying a node proof could require linear time for the latter scheme as well! * yet, it is independent of the number of nodes! That's why it is a PCD. * the NARK verifier takes in input *only* an accumulator instance and a predicate instance, and not the corresponding *large* witnesses. That's the reason why we don't need succinctness at all! ## Some building blocks ### NARK with split accumulation based on DL (and RO) [BCLMS21] show a (zk) NARK for R1CS that has a (zk) split AS, whose accumulation verifier has constant size. The starting point is a quasi-naive sigma protocol for R1CS, where the prover Pedersen-commits to $Az, Bz, Cz$ and sends these 3 values together with the witness $w$, and the verifier checks that the commitments are (i) correctly computed and (ii) the third commitment is consistent with the circuit equation. They additionally show how to make the sigma protocol zero-knowledge (as well as its associated split AS); at high level, the Prover produces randomized (rather than deterministic) Pedersen commitments, and rerandomizes the witness, and additionally commits to all the randomness used; the Verifier provides some random value to compute a linear combination of the witness and the randomness committed by the prover. ![](https://i.imgur.com/zQzJEUQ.png) ▲ **The sigma protocol(s) for R1CS** The next step is to show how to turn these Sigma Protocols into NARKs thanks to Fiat-Shamir approach :rocket: #### Split AS For simplicity we consider one predicate (i.e. for IVC setting), and not the zero-knowledge. The predicate $\Phi(q_x=(x, C_A, C_B, C_C), q_w=w)$ runs the NARK verifier for R1CS of circuit A, B, C, with instance x and proof $\pi=(C_A, C_B, C_C, w)$ The accumulation instance is made of the 4 Pedersen commitments and the instance vector x, while the accumulation witness is the witness itself. The Prover accumulates the previous accumulator and the current predicate input generating a random linear combination of the two (this requires access to a random oracle). This also forces the prover to compute some cross terms (due to the rerandomization of the inputs): the prover then returns commitments to the cross-terms, to be later used by the verifier, who simply checks the "consistency" of the commitments. ![](https://i.imgur.com/osLYort.png) ▲ **Split Accumulation Prover and Verifier for the NARK for R1CS** The Decider corresponds (more or less) to the NARK verifier, i.e. checks that the commitments are consistent and correct. In the actual scheme, they rely on an AS for the Hadamard Product, used as a black-box scheme. The knowledge soundness of both the NARKs and the ASs (both for Hadamard and R1CS) is proved by using an expected-time forking lemma. ## Implementation Two main Rust frameworks: one for accumulation, and one for accumulation-based PCD. The former provides an interface (trait) of a generic accumulation scheme. Currently implemented schemes are: * AS for polynomial commitment KZG [BCMS20] * AS for polynomial commitment IPA [BCMS20] * split AS for Hadamard * split AS for R1CS * ... The latter defines an interface to instantiate a PCD given: * a NARK * an AS for that NARK * constraints for the AS verifier ## Resources * [Proof-Carrying Data from Accumulation Schemes](https://eprint.iacr.org/2020/499.pdf), [[YouTube presentation](https://www.youtube.com/watch?v=axypTmc74Bs)], [[YouTube extended presentation](https://www.youtube.com/watch?v=UNwlBq1FQ3E)] * [Proof-Carrying Data without Succinctness](https://eprint.iacr.org/2020/1618.pdf) * [Halo Infinite: Recursive zk-SNARKs from any Additive Polynomial Commitment Scheme](https://eprint.iacr.org/2020/1536.pdf) ## Soundness of a PCD A transcript $T$ is a directed acyclic graph where each vertex $u$ is labeled by local data $z^{(u)}_{loc}$ and each edge $e$ is labeled by a message $z(e) \ne \bot$. The output of a transcript T is denoted $o(T)$. A vertex is $\phi$-compliant for a given predicate $\phi$ if for all its outgoin edges $e=(u, v)$, either: * (base case, no incoming edges) we have $\phi(z^{(e)}, z^{(u)}_{loc}, \bot, \dots, \bot) = 1$, or * (recursive case, for incoming edges $e_1, \dots, e_m$) we have $\phi(z^{(e)}, z^{(u)}_{loc}, z^{(e_1)}, \dots, z^{(e_m)}) = 1$ T is $\phi$-compliant if all its vertices are $\phi$-compliant. ### Knowledge Soundness For every expected polynomial-time adversary $\mathbb P$, there is an expected polynomial-time extractor $\mathbb E$ that whenever $\mathbb P$ returns a valid (i.e. the Verifier returns 1) output-proof pair $(o, \pi)$, $\mathbb E$ extracts a $\phi$-compliant transcript T. The definition takes into account auxiliary input (and output) distributions.[^2] ### Simulation Extractability We would like to introduce the notion of simulation extractability for PCD. This would allow to use PCD as a building block in schemes where an adversary may have oracle access to it (e.g. in a RCCA setting). So, knowledge soundness must hold even if an adversary has access to the oracle $O(\phi, o) \rightarrow \pi$ such that $V(ivk, o, \pi) = 1$ PCD has simulation extractability soundness with respect to auxiliary input distribution $D$ if for every expected polynomial-time adversary $\mathbb P$ there exists an expected polynomial-time extractor $\mathbb E$ such that, for every set $Z$: $$Pr\left[ \begin{array}{cc|cc} \phi \in F &&& pp \leftarrow G(1^\lambda) \\ \land \; (pp, ai, \phi, o, ao) \in Z &&& ai \leftarrow D(pp) \\ \land \;T \mbox{ is }\phi\mbox{-compliant} &&& (\phi, o, \pi, ao) \leftarrow \mathbb P^O(pp, ai) \end{array} \right]$$ is "computationally close" to $$Pr\left[ \begin{array}{cc|cc} \phi \in F &&& pp \leftarrow G(1^\lambda) \\ \land \; (pp, ai, \phi, o(T), ao) \in Z &&& ai \leftarrow D(pp) \\ \land \;T \mbox{ is }\phi\mbox{-compliant} &&& (\phi, T, ao) \leftarrow \mathbb E^O(pp, ai) \end{array} \right]$$ where the oracle $O$ is defined above. A weak form of simulation extractability is also possible: the adversary has now oracle access to $O(\phi, [z_i, \pi_i]_{i=_1}^m, o)$, but this definition doesn't seem to capture the "essence" of simulation, since the adversary is giving out the information about incoming edges (even though is not saying anything about the local input $z$). ## Typos Page 45 of [BCLMS21]: the definition of b is wrong, it should be $b := \sum_{i=1}^n \nu^{ n−i}b_i$ ## References [^1]: This can be extended to the case of multiple accumulators and multiple predicate inputs, and is crucial to build PCD (rather than only IVC) [^2]: For SNARKs, extraction in the presence of arbitrary auxiliary input can be problematic, if not implausible. When assuming indistinguishability obfuscation, there do not exist extractable one-way functions (and thus SNARKs) with respect to arbitrary auxiliary input of unbounded polynomial length. Other results appear in the literature. This should be investigated and carefully compared for the PCD case.