# NeutronStar: IVC Based Infinite PLONK Proof Aggregation
$\newcommand{\plonk}{\mathcal{P}\mathfrak{lon}\mathcal{K}}
\newcommand{\neustar}{\texttt{NeutronStar}}
\newcommand{\verify}{\text{verify}}
\newcommand{\bverify}{\text{batched verify}}
\newcommand{\pairing}{\text{pairing}}
\newcommand{\prepare}{\text{prepare}}
\newcommand{\rlc}{\text{RLC}}
\newcommand{\ivc}{\text{IVC}}
\newcommand{\z}{\mathfrak{z}}
\newcommand{\A}{\mathcal{A}}
\newcommand{\insert}{\text{insert}}
\newcommand{\prove}{\text{prove}}
\newcommand{\output}{\text{output}}
\newcommand{\absorb}{\text{absorb}}$
## Abstract
## Background
### $\plonk$ Proof System
> Note: in this paper we focus on $\plonk$ with KZG as the polynomial commitment scheme (PCS)
Let $C = (n, \ell, (q_{L_i}, q_{R_i}, q_{M_i}, q_{O_i}, q_{C_i})_{i=1}^n, \sigma)$ be a $\plonk$ circuit,
where
- $n = 2^k$ is the number of gates,
- $\ell \leq n$ is the number of public inputs,
- $(q_{L_i}, q_{R_i}, q_{M_i}, q_{O_i}, q_{C_i})$ are the selector values for gate $i$,
- $\sigma$ is the permutation encoding the copy constraints.
We denote a $\plonk$ proof by $\pi$. It consists of the following elements:
$$\begin{align*}
\pi = \big( &[a]_1, [b]_1, [c]_1, [z]_1, [t_{lo}]_1, [t_{mid}]_1, [t_{hi}]_1, [W_\xi]_1, [W_{\xi\omega}]_1, \\
&\bar{a}, \bar{b}, \bar{c}, \bar{s}_{\sigma1}, \bar{s}_{\sigma2}, \bar{z}_\omega \big),
\end{align*}$$
where the first line consists of group elements and the second line of field elements.
The Plonk verifier's work (GWC19 Section 8.3) consists of two stages: Preparation and Pairing. We introduce some helpful notation:
Let $\Phi_{\verify}: (\pi, vk) \mapsto b$ denote the Plonk verifier, which takes as input a proof $\pi$ of the above form and verifying key $vk$ for the circuit and produces a boolean value $b$. We will write $\Phi_{\verify} = \Phi_\pairing \circ \Phi_{\prepare}$.
Here $\Phi_\prepare$ performs some field arithmetic and elliptic curve operations on the components of $\pi$ to prepare them for the pairing check. We write
$$\Phi_\prepare(\pi,vk) = (\ell, r) $$
where (in the notation of GWC19)
$$\begin{align*}
\ell &= [W_\xi]_1 + u \cdot [W_{\xi\omega}]_1 \\
r &= \z \cdot [W_\xi]_1 + u \z \omega \cdot [W_{\xi\omega}]_1 + F - E \\
\tilde r &= \z \cdot [W_\xi]_1 + u \z \omega \cdot [W_{\xi\omega}]_1 + (F - [PI]) - E
\end{align*}$$
(The variant $\tilde r$, which excludes the public-input contribution $[PI]$, is not used in the main construction; it reappears in the PCD draft at the end of this note.)
That is, $\Phi_\prepare$ performs Steps 1-11 of GWC19. Then $\Phi_{\pairing}(\ell,r)$ returns the truth value of the final pairing check:
$$e(\ell, [x]_2) \stackrel{?}{=} e(r, [1]_2)$$
That is, $\Phi_{\pairing}$ performs Step 12 of GWC19.
The decomposition $\Phi_{\verify}$ as $\Phi_{\pairing} \circ \Phi_{\prepare}$ is useful for thinking about Plonk proof aggregation because it separates the verifier work into a stage which can take advantage of standard KZG batching tricks ($\Phi_{\pairing}$) and one which must be handled separately ($\Phi_{\prepare}$).
The high-level idea of $\neustar$ is to use folding-based IVC (*e.g.* Nova, Sangria) to aggregate the computation $\Phi_{\prepare}(\pi_i, vk)$ for a batch $\{ \pi_i \}_{i=1}^N$ of Plonk proofs and verify the batch with a single invocation of $\Phi_{\pairing}$.
### Batched Pairings
We recall a well-known method for batching pairing checks. Given pairs $\{ (\ell_i, r_i) \}_{i=1}^N$, a verifier can confirm that $\Phi_\pairing(\ell_i, r_i) = \texttt{true}$ for all $i$ by randomly sampling scalars $\{ \gamma_i \}_{i=1}^N$ and checking that the random linear combination satisfies the pairing check:
$$\Phi_\pairing \left( \sum_{i=1}^N \gamma_i \ell_i \,\,, \sum_{i=1}^N \gamma_i r_i\right) = \texttt{true}$$
A standard argument shows that the soundness error introduced is at most $N/|\mathbb{F}|$; one may take $\gamma_1 = 1$ without any loss of soundness.
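To see why, write $\delta_i = e(\ell_i, [x]_2) \cdot e(r_i, [1]_2)^{-1} \in \mathbb{G}_T$ for the defect of the $i$-th check. By bilinearity, the batched check passes if and only if
$$\prod_{i=1}^N \delta_i^{\gamma_i} = 1_{\mathbb{G}_T}.$$
If some $\delta_j \neq 1$ then, for independent uniform $\gamma_i$, at most one value of $\gamma_j$ (given any fixing of the other scalars) satisfies this equation; if instead the $\gamma_i$ are taken as powers $\gamma^{i-1}$ of a single challenge, the exponent is a nonzero polynomial of degree at most $N-1$ in $\gamma$ and Schwartz–Zippel applies. Either way the soundness error is at most $N/|\mathbb{F}|$.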
The verifier performs only one pairing check instead of $N$, at the cost of two multi-scalar multiplications of size $N-1$ (one over the $\ell_i$ and one over the $r_i$).
We note that this argument is only helpful in the case that all individual pairing checks pass; if $\Phi_\pairing(\ell_i, r_i) = \texttt{false}$ for some $i$ then the batched pairing check will fail without telling us which points $(\ell_i, r_i)$ are responsible for the failure.
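As a concrete illustration, here is a minimal sketch of the batched check using the py_ecc library (BN254, called `bn128` in py_ecc). The toy setup secret `tau` and all helper names are ours; a real verifier never knows `tau` and reads $[x]_2$ from the SRS. The pairs $(\ell_i, r_i)$ are constructed so that each individual check passes by design.

```python
import random
from py_ecc.bn128 import G1, G2, add, multiply, pairing, curve_order

# Toy setup: [x]_2 = tau * [1]_2 for a known tau (illustration only).
tau = 123456789
x2 = multiply(G2, tau)

# Build N pairs (l_i, r_i) with r_i = tau * l_i, so that
# e(l_i, [x]_2) == e(r_i, [1]_2) holds for every i by construction.
N = 4
ls = [multiply(G1, random.randrange(1, curve_order)) for _ in range(N)]
rs = [multiply(l, tau) for l in ls]

def batched_pairing_check(ls, rs, x2):
    # Sample random scalars; gamma_1 = 1 loses no soundness.
    gammas = [1] + [random.randrange(1, curve_order) for _ in range(len(ls) - 1)]
    L = R = None  # None is py_ecc's point at infinity
    for g, l, r in zip(gammas, ls, rs):
        L = add(L, multiply(l, g))
        R = add(R, multiply(r, g))
    # One pairing equation instead of N: e(L, [x]_2) == e(R, [1]_2).
    return pairing(x2, L) == pairing(G2, R)

assert batched_pairing_check(ls, rs, x2)
```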
### Aggregating $\plonk$ Proofs
The batched pairing check is helpful when we wish to verify a batch $\{ \pi_1, \ldots, \pi_N \}$ of Plonk proofs. The naive verifier computes $\Phi_\pairing \circ \Phi_\prepare(\pi_i, vk)$ for each $i$. Instead one can prepare each proof
$$ (\ell_i, r_i) = \Phi_\prepare(\pi_i, vk) $$
and then compute $\Phi_\pairing$ on a random linear combination. We denote by $\Phi_\rlc$ the (non-deterministic) function that computes a random linear combination:
$$\Phi_\rlc \left( \{\ell_i, r_i \}_{i=1}^N \right) = \left( \sum_{i=1}^N \gamma_i \ell_i, \sum_{i=1}^N \gamma_i r_i \right)$$
where the scalars $\gamma_i$ are randomly sampled. Then the batched $\plonk$ verifier can be written as
$$\Phi_\bverify(\pi_1, \ldots, \pi_N; vk) = \Phi_\pairing \circ \Phi_\rlc \circ \left( \bigoplus_{i=1}^N \Phi_\prepare(\pi_i, vk) \right)$$
In other words, the batched verification algorithm is
1. Individually prepare each $\pi_i$ for pairing using $\Phi_\prepare$
2. Combine the results using $\Phi_\rlc$
3. Perform a single pairing check $\Phi_\pairing$
The batched verifier performs 1 pairing check, independent of $N$. This is a significant savings. However, the first two steps are still quite costly: Step 1 requires 16 elliptic curve scalar multiplications per proof and Step 2 requires two multi-scalar multiplications of size $N-1$. Thus the verifier's work is still linear in the number of proofs aggregated.
To achieve a verifier which is sub-linear in $N$, one may introduce an untrusted Aggregator who will compute a SNARK of Steps 1 and 2. The practicality of this approach depends on the Aggregator's computational costs, which can be high, as these steps require roughly $18N$ scalar multiplications (as well as $O(N)$ hashing and finite field arithmetic).
The current approach [[1](#references)] is to write a single, high fan-in aggregator circuit that takes all $N$ proofs as input and produces a SNARK of Steps 1 and 2. This results in $O(N \log N)$ prover time for the Aggregator and high memory overhead.
Rather than writing a high fan-in aggregator circuit, we propose to take advantage of the incremental nature of $\Phi_\rlc$ and use recent advances in folding schemes [[2](#references)] to reduce the Aggregator's cost. As we will see below, this approach enables "streaming aggregation" with $O(N)$ prover time and memory overhead independent of $N$.
### IVC and Folding Schemes
Incrementally-verifiable computation (IVC) is a cryptographic framework that produces proofs of correct execution of “long-running, repetitive” computations, such that a verifier can efficiently verify the correct execution of any prefix of the computation.

Informally, IVC works as follows: for a fixed function $F$,
- The computation starts at some initial state $s_0$
- Each state $s_i$ is computed from the previous state $s_{i-1}$ and some private witness $\omega_i$ as $s_i = F(s_{i-1}, \omega_i)$
- Each step is accompanied by a proof $\pi_i$ that the computation is correct up to that point.
More precisely, for $i = 1, \ldots, n$: at step $i$ the prover outputs $s_i$ together with a proof $\pi_i$ attesting that the prover knows a witness $(s_{i-1}, \omega_i, \pi_{i-1})$ such that $F(s_{i-1}, \omega_i) = s_i \land V(vp, (i-1, s_0, s_{i-1}), \pi_{i-1}) = \texttt{true}$
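To make the recurrence concrete, here is a toy illustration in Python (proofs elided, names ours):

```python
# Toy illustration of the IVC recurrence s_i = F(s_{i-1}, omega_i).
# Real IVC additionally attaches a proof pi_i to every state s_i.
def ivc_trace(F, s0, witnesses):
    states = [s0]
    for w in witnesses:
        states.append(F(states[-1], w))
    return states

# Example: a running sum, F(s, w) = s + w.
assert ivc_trace(lambda s, w: s + w, 0, [1, 2, 3]) == [0, 1, 3, 6]
```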
IVC can be made particularly efficient using folding schemes. Intuitively, a **folding scheme** for a relation $\mathcal{R}$ of pairs $(x, w)$ (where $x$ is the public input and $w$ is the witness) is a protocol that reduces the task of checking two instances in $\mathcal{R}$ to the task of checking a single instance in $\mathcal{R}$. For example, Nova [[2](#references)] and Sangria [[3](#references)] are folding schemes designed for relaxed R1CS and relaxed $\plonk$ish arithmetization, respectively. Here, we use the folding scheme as a black box to obtain a highly efficient IVC scheme; please refer to Nova [[2](#references)] for formal definitions.
## $\neustar$: Infinite $\plonk$ Proof Aggregation using Folding-based IVC
Above we proposed a batched verification algorithm consisting of three basic steps:
1. Individually prepare each $\pi_i$ for pairing using $\Phi_\prepare$
2. Combine these results using $\Phi_\rlc$
3. Perform a single pairing check $\Phi_\pairing$
$\neustar$ replaces Steps 1 and 2 by an incremental computation $F$ that computes $(\ell_i, r_i) = \Phi_\prepare(\pi_i, vk)$, randomly samples $\gamma_i$, and adds $(\gamma_i\ell_i, \gamma_i r_i)$ to an accumulated random linear combination. Folding-based IVC offers the Aggregator two major advantages. First, the Aggregator's memory requirements are only as large as a single step of the computation, and therefore do not grow with the number $N$ of proofs it aggregates. Second, the computation time of a single step can be made independent of $N$, resulting in an Aggregator with $O(N)$ asymptotic proving time.
Note that treating Steps 1 and 2 as an incremental computation removes the need to know in advance the number of proofs being aggregated. The Aggregator can simply repeat the incremental computation until it is told to stop. In this way we achieve "infinite" $\plonk$ proof aggregation!
### $\neustar$'s Incremental Step
We describe here in detail the incremental step that carries us from state $s_{i-1}$ to $s_i = F(s_{i-1}, \omega_i)$. That is, we define the function $F$.
The running state $s_i$ is a tuple $(L_i, R_i, H_i)$, where $L_i$ and $R_i$ are $\mathbb{G}_1$ points and $H_i$ is the state of some permutation sponge (*e.g.* Poseidon sponge). The state starts with $L_0 = R_0 =0$ and $H_0$ the sponge's default state.
The state is updated as follows:
Given previous state $s_{i-1} = (L_{i-1}, R_{i-1}, H_{i-1})$ and auxiliary data $\omega_i = (\pi_i, x_i)$, where $\pi_i$ is a Plonk proof corresponding to instance $(C, x_i)$, $F$ does the following:
1. $(\ell_i, r_i) \leftarrow \Phi_\prepare(\pi_i, vk)$
2. $\gamma_i \leftarrow_r \mathbb{F}$ (in practice $\gamma_i$ may be the hash of some protocol transcript)
3. $(L_i, R_i) \leftarrow (L_{i-1} + \gamma_i \ell_i \,, R_{i-1} + \gamma_i r_i)$
4. $H_i \leftarrow H_{i-1}.\absorb(x_i)$
In other words, the running state $s_i$ contains the pair $(L_i, R_i)$, where
$$\begin{align*}
L_i &= \sum_{j=1}^i \gamma_j \ell_j \\
R_i &= \sum_{j=1}^i \gamma_j r_j
\end{align*}$$
along with a commitment $H_i$ to public inputs up to that point.
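As a concrete (and heavily simplified) sketch of $F$, again over py_ecc's `bn128`: here $\Phi_\prepare$ is left as a stub (it would implement GWC19 Steps 1-11), SHA-256 stands in for the Poseidon sponge, and all names are ours. The `aggregate_stream` loop illustrates the "streaming" flavor of the computation.

```python
import hashlib
import random
from py_ecc.bn128 import Z1, add, multiply, curve_order

def phi_prepare(pi, vk):
    """Stub for GWC19 Steps 1-11: would return the pairing inputs (l_i, r_i)."""
    raise NotImplementedError

def absorb(h_state, x):
    # Hash-chain stand-in for a Poseidon sponge absorbing a public input x.
    return hashlib.sha256(h_state + str(x).encode()).digest()

def step(state, pi, x, vk):
    """One incremental step: s_i = F(s_{i-1}, omega_i) with omega_i = (pi_i, x_i)."""
    L, R, H = state
    l, r = phi_prepare(pi, vk)                # 1. prepare the proof
    gamma = random.randrange(1, curve_order)  # 2. challenge (Fiat-Shamir in practice)
    L = add(L, multiply(l, gamma))            # 3. fold into the running RLC
    R = add(R, multiply(r, gamma))
    H = absorb(H, x)                          # 4. absorb the public input
    return (L, R, H)

# s_0: identity points and the sponge's default state.
s0 = (Z1, Z1, b"\x00" * 32)

def aggregate_stream(proofs_and_inputs, vk, state=s0):
    # N need not be known in advance: fold until told to stop.
    for pi, x in proofs_and_inputs:
        state = step(state, pi, x, vk)
    return state
```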
### $\neustar$ scheme
The overall scheme allows a verifier to perform a batched verification of $N$ Plonk proofs $\pi_1, \ldots, \pi_N$ for instances $x_1, \ldots, x_N$ with the help of an untrusted Aggregator as follows:
1. Aggregator performs folding-based IVC to produce $\Pi = (s_N, \pi_\ivc)$
2. Verifier checks $\pi_\ivc$
3. Verifier checks that $H_N$ is the sponge output obtained by absorbing $x_1, \ldots, x_N$
4. Verifier computes $\Phi_\pairing(L_N, R_N)$
We note that the public check in Step 3 is $O(N)$, meaning that we have not achieved a sub-linear Verifier. In fact, the question of what it means to verify the public inputs to an aggregated proof is quite delicate, as we will now explain.
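For completeness, a sketch of the Verifier's side (Steps 2-4), reusing `absorb` and `s0` from the sketch above; `ivc_verify` is an assumed black-box IVC verifier and `x2` is $[x]_2$ from the SRS:

```python
from py_ecc.bn128 import G2, pairing

def verify_aggregate(Pi, xs, x2, ivc_verify):
    """Batched verification given the aggregate Pi = (s_N, pi_ivc)."""
    (L, R, H), pi_ivc = Pi
    if not ivc_verify(pi_ivc):        # Step 2: check the IVC proof
        return False
    H_check = s0[2]                   # Step 3: O(N) re-absorption of inputs
    for x in xs:
        H_check = absorb(H_check, x)
    if H_check != H:
        return False
    return pairing(x2, L) == pairing(G2, R)  # Step 4: single pairing check
```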
### Verification Modes
What does it mean to verify the public inputs to an aggregated proof? That depends on the Verifier's intention and how the verification result will be consumed. We sketch here two "verification modes" that illustrate the problem.
**Mode 1: Correctness of Public Input List**
- Aggregated proof is verified by [??? single end consumer]
- Verification should imply correctness of the list of public inputs
- Consumption of result happens all at once
- Verifier needs to perform linear work to verify the public inputs, but only once
This mode is of interest in a scenario such as the final tally of some ZK voting scheme, where the proofs that accompany votes need to be checked all at once. In this scenario the "end consumer" cannot avoid doing work proportional to the number of aggregated proofs, so we simply aim to minimize this linear work.
The above construction that aggregates public inputs using a hash sponge is a good candidate for this mode of verification.
**Mode 2: Correctness of Public Input Accumulator**
- Aggregated proof is verified by a smart contract
- Public input accumulator goes on chain
- Verification should imply correctness of the public input accumulator
- "Consumer" of verification result responsible for checking a given public input belongs to the accumulator
- Consumption of result happens at unpredictable times, maybe even never for some of the inputs
- On-chain Verifier skips Step 3 (or perhaps needs to replace it with a constant-size step?)
In this case, the Verifier of the aggregated proof should do only what is necessary to establish the correctness of the public input accumulator; it is not the Verifier's responsibility to demonstrate that any particular $x_i$ belongs to this accumulator.
In this scenario we wish to minimize the work required to later verify that any given public input belongs to the accumulator. We illustrate below how parallel folding techniques can achieve this.
### $\neustar$ Aggregator Cost
A quick estimate of the Aggregator's cost can be inferred from the Nova prover's cost and the Plonk verifier's cost.
To compute each incremental step of the computation, the Aggregator computes the augmented function $F'$, which performs $F$ and verifies the correctness of a folding step. We have $|F'| \approx |F| + 2 \cdot \mathbb{G}$, where $\mathbb{G}$ denotes the cost of a single scalar multiplication in the group used for vector commitments.
Note that because we chose above to accumulate the public inputs with a hash sponge, $|F|$ is independent of $N$. The cost to produce the final IVC proof depends on the size of $|F|$ and choice of proving scheme, and is therefore also independent of $N$. Thus our aggregator incurs $O(N)$ cost to perform IVC.
In more detail, the size $|F|$ is dominated by:
1. 16 scalar multiplications + 5 hashes to compute $\Phi_{\prepare}(\pi_i, vk)$
2. A hash to compute the challenge scalar $\gamma_i$ (Fiat–Shamir heuristic)
3. 2 scalar multiplications by $\gamma_i$
4. A hash to absorb $x_i$ into public input sponge
(We have ignored the modular arithmetic involved in $\Phi_\prepare$.)
Overall this means the cost of $F'$ is about 20 scalar multiplications and 8 hashes ($F'$ has one more hash than $F$, coming from the folding).
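Explicitly, combining this tally with $|F'| \approx |F| + 2 \cdot \mathbb{G}$:
$$\underbrace{16}_{\Phi_\prepare} + \underbrace{2}_{\gamma_i\text{-muls}} + \underbrace{2}_{\text{folding}} = 20 \text{ scalar mults}, \qquad \underbrace{5}_{\Phi_\prepare} + \underbrace{1}_{\gamma_i} + \underbrace{1}_{x_i} + \underbrace{1}_{\text{folding}} = 8 \text{ hashes}.$$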
The Aggregator will incur this cost $N$ times and then must compute a ZKP for the final instance of $F'$. The cost to compute the final proof depends on the choice of NIZK proving scheme, in particular on how it represents elliptic curve arithmetic. For example, assuming $\pi_i$ is a proof for KZG-Plonk over BN254, $F'$ will perform arithmetic in BN254's $\mathbb{G}_1$ curve. Something like a Spartan prover over the Grumpkin curve would minimize the cost of proving the final instance of $F'$.
We emphasize that although the cost of computing the final proof is incurred only once and is therefore independent of $N$, practically speaking the size of witnesses to $F'$ is quite important since the Prover must generate and fold these in each step. Therefore optimizations that reduce the number of constraints needed to represent $F'$ (such as avoiding wrong-field arithmetic) do improve the Prover's asymptotic performance.
### Parallel Folding
### Cost Analysis and Comparison
> Todd: A thought on nonnative arithmetic:
> The work in $\Phi_\prepare$ is dominated by 16 scalar multiplications in BN254's $\mathbb{G}_1$; this will cost fewer constraints if we work over Grumpkin, whose scalar field is $\mathbb{G}_1$'s base field. Grumpkin is not a pairing curve, so you could not use Plonk-KZG, but you don't have to use Plonk/Sangria for the folding. For example, we could represent $\Phi_\prepare$ as an R1CS circuit over Grumpkin and use Nova instead. Note that $\Phi_\prepare$ does involve some arithmetic in $\mathbb{F}_r$, $\mathbb{G}_1$'s scalar field, so there would still be some non-native arithmetic, but it is field arithmetic rather than EC arithmetic.
Now we compare the cost of aggregating $N$ proofs across schemes:
> Shumo: the table below is not accurate, just an expression of intent
| Aggregation Scheme | Aggregation Prover Cost | Verifier Cost | Proof Size | Streaming Aggregation |
| ------------------ | ----------------------- | ------------- | ---------- | --------------------- |
| Random linear combination + direct checking | $O(N)$ | $3N \cdot \textsf{MSM}^{\mathbb{G}_1} + 1P$ (Shumo: not sure if this is right) | | yes |
| Scroll Halo2 aggregation [1] | $O(N \log N)$ | $\plonk$.verify $+ 1P$ | | no |
| $\neustar$ | $O(N)$ | $\plonk$.verify $+ 3\,\textsf{mul} + 1P$ | | yes |
# Draft: PCD (BCLMS20)
$\newcommand{\PI}{\mathsf{PI}}\newcommand{\inn}{\text{inner}}\newcommand{\acc}{\text{acc}}$
Split out the instance data $\PI$ from the rest of the "witness", which in this case means the inner proof $\pi_{\inn}$.
So $\Phi_\prepare(vk, \PI, \pi) = (\ell, r, [\PI])$, where we now intentionally exclude the public inputs from the $\ell, r$ points. Then (notation conflict: $r$ here is *not* the linearization polynomial)
$$\begin{align*}
\ell &= [W_\z]_1 + u \cdot [W_{\z\omega}]_1 \\
r &= \z \cdot [W_\z]_1 + u \z \omega \cdot [W_{\z\omega}]_1 + F - E \\
E &= -r_0 + v \bar a + v^2 \bar b + \ldots \\
r_0 &= -\alpha^2 L_1(\z) + \ldots
\end{align*}$$
That is, we omit the public inputs from $r_0$, as in UniPlonk. Unlike UniPlonk, we *do not* include the public input commitment $[\PI]$ in $D$. We instead say that
$$\Phi_{\pairing}(\ell, r, [\PI]): \quad e(\ell, [x]_2) \stackrel{?}{=} e(r + [\PI], [1]_2)$$
In order for this to be correct you need to make the same transformation to a Plonk proof as we do in UniPlonk, so I guess implicitly that's been done on $[W_\z]$.
The point is that the pairing check can be batched over multiple instances with a random linear combination, as before. But now we accumulate the instance data separately:
$$L = \sum_{i=1}^N \gamma^{i-1} \ell_i, \quad R = \sum_{i=1}^N \gamma^{i-1} r_i, \quad \PI = \sum_{i=1}^N \gamma^{i-1} \PI_i$$
Can we treat this as a split accumulator, where $\acc.x = \PI$ and $\acc.w = (L, R)$?
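A minimal sketch of the accumulation step itself (py_ecc as before, all names ours); whether this satisfies the definition of a split accumulator is exactly the open question:

```python
from py_ecc.bn128 import add, multiply, curve_order

def split_accumulate(ls, rs, pis, gamma):
    # Accumulate with powers gamma^{i-1} of a single challenge:
    # acc.x = PI (instance part), acc.w = (L, R) (witness part).
    L = R = PI = None                  # None is py_ecc's identity point
    coeff = 1
    for l, r, p in zip(ls, rs, pis):
        L = add(L, multiply(l, coeff))
        R = add(R, multiply(r, coeff))
        PI = add(PI, multiply(p, coeff))
        coeff = coeff * gamma % curve_order
    return PI, (L, R)
```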
The main thing I don't understand is whether this eliminates the need to prove that you correctly computed $\ell, r$ from $\pi_{\inn}$. That's a non-trivial amount of scalar multiplication, so it would still be somewhat expensive.
## References
1. Scroll's Halo2 Aggregation Circuit: https://github.com/scroll-tech/halo2-snark-aggregator
2. Nova: Recursive Zero-Knowledge Arguments from Folding Schemes. [paper](https://eprint.iacr.org/2021/370.pdf)
3. Sangria: A Folding Scheme for PLONK. [paper](https://github.com/geometryresearch/technical_notes/blob/main/sangria_folding_plonk.pdf)
<!--
The purpose of the accumulator $\A$ is to allow the state to carry a constant-size commitment to a collection $\{x_i \}$ of public inputs specifying various instances of the circuit $C$. The optimal choice of accumulator scheme is application-dependent, so we do not attempt to specify it here. We assume only that $\A$ provides the following functions:
- $\A.\insert(x)$ updates the accumulator's state with item $x$ and returns the updated accumulator.
- $\A.\prove(x)$ produces a proof that $x$ was previously inserted, if possible.
- verify function
Accumulation schemes often have a tradeoff between the cost of $\A.\insert$ and $\A.\prove,$ wherein efficiency of one comes at the cost of the other. For example, using a simple hash sponge (as is done in Nova) results in constant-time insertion but proof sizes proportional to the number of accumulated items. On the other hand, Merkle tree commitments have insertion time and proof sizes logarithmic in the number of accumulated items. -->