# Hash-based PCD from the BOIL accumulator

In this article, we present a batching technique for Reed--Solomon codewords. Using this batching protocol, we propose a construction of a practically efficient accumulation scheme, which we call BOIL. Our accumulation scheme can be instantiated with an arbitrary hash-based SNARK, leading to a new class of PCD constructions.

This note's results were originally presented at [zkSummit12](https://www.youtube.com/watch?v=on7Rvty00VM) and later published as a [research paper](https://eprint.iacr.org/2024/1993.pdf).

## Introduction

Proof-carrying data (PCD) is a cryptographic primitive that enables the dynamic compilation of a distributed computation, where each message is augmented with a short proof attesting that some local condition is met. An ideal PCD construction achieves this goal with minimal additional computation and communication overhead. A specific instance of PCD is incrementally verifiable computation (IVC), a cryptographic primitive that allows one to produce a proof of the correctness of an iterative computation in an incremental fashion.

**PCD via Recursive Composition** Early constructions of IVC and PCD relied on recursive composition, where each prover attaches a proof to each outgoing message, attesting that all previous local conditions have been met. To achieve this, these constructions included a circuit representation of the verifier algorithm within a recursive statement. As a result, PCD was primarily limited to SNARKs -- proof systems where the verifier's description is asymptotically smaller than the size of the statement being verified.

**PCD via Accumulation** Later works showed that it is often possible to avoid full recursive proof verification at each iteration. Instead, the most expensive part of the verification can be accumulated outside the recursive statement and checked only once, at the very end of the distributed computation. Verifying that the accumulation was performed correctly is usually much simpler than a full proof verification.

The folding approach can be considered an extreme version of accumulation schemes for constructing IVC. While folding-based constructions are distinguished by a very small recursive overhead, the accumulation approach allows a broader class of NARKs to serve as the basic building block of a PCD scheme. However, accumulation and folding schemes usually require additively homomorphic commitment schemes. This entire line of results is therefore incompatible with (S)NARKs relying on code-based polynomial commitments, which are not homomorphic. Yet even with a full in-circuit verifier for recursion, such SNARKs perform well and are widely used in practice.

The work of Bünz et al. [[2]](#2) made the first attempt to close this gap, showing how to build an accumulation scheme by postponing the code proximity test of such non-homomorphic constructions to the end. However, their work has significant shortcomings, discussed below in more detail, which our new construction addresses.

In this work, we consider SNARKs built by compiling a Polynomial-IOP with a proximity proof for a linear code. This approach is widely adopted in the industry. Our reasoning:

- No need for a trusted ceremony and trapdoors to generate parameters.
- Such systems do not require public-key assumptions.
- The system parameters can be adjusted to alter the efficiency trade-off between the prover and the verifier.
- Protocols for code proximity tests, for example FRI, keep all arithmetic in the same field for both prover and verifier, so no field switching is necessary. In particular, this means that recursive composition of proofs does not require cycles of elliptic curves.

First, we present the Oracle batching protocol, which efficiently reduces claims about the proximity of multiple functions to a Reed--Solomon code to a similar claim about a single function. The new oracle batching protocol opens the door to more efficient aggregation and recursive composition of proofs. Informally speaking, instead of performing a low-degree proximity test at each iteration of the recursion, we can use the oracle batching protocol, thereby deferring this expensive test to the very end of the computation.

## Oracle Batching Protocol

Informally, the primary goal of FRI (or any other proximity test) is to distinguish, by querying a function $f: D \rightarrow \mathbb{F}$ at a few locations, whether $f$ coincides with the evaluation of some polynomial of degree less than $d < |D|$ on the domain $D$, or whether it is far in relative Hamming distance from the evaluation of any low-degree polynomial. The batched version of the protocol allows one to perform a proximity test for several functions $f_1, \dots, f_n$ at once. However, to build more efficient IVC/PCD schemes, we need a reduction for multiple functions that does not require a full-fledged proximity test. Informally, this allows the incremental aggregation of new functions in batches during a long-running computation.

The construction of Bünz et al. [[2]](#2) achieves the goal above by combining $f_1, \dots, f_n : D \to \mathbb{F}$ into a single function $f_\sf{new} : D \to \mathbb{F}$ by means of a random linear combination, with randomness supplied by the verifier. Informally, if an honestly computed $f_\sf{new}$ is $\delta$-close to $\sf{RS}[\mathbb{F}, D, d]$, then with high probability so are $f_1, \dots, f_n$. The prover sends an oracle for the purported $f_\sf{new}$. The verifier ensures consistency between $f_1, \dots, f_n$ and $f_\sf{new}$ by means of spot checks at random points in $D$. This process can be iterated, and at the end the verifier simply checks proximity of the final aggregated function to $\sf{RS}[\mathbb{F}, D, d]$.

However, this approach has two significant limitations. The first is that it requires working within the unique decoding radius $(1-\rho)/2$ of the Reed--Solomon code, where $\rho$ is the code rate, to prevent the ambiguous decoding that would thwart the security proof of the accumulator. This leads to parameters that are inefficient in practice: the smaller the decoding radius, the more queries are necessary to guarantee the same level of soundness in the spot-check phase.

The second issue is that the distance guarantee degrades with subsequent iterations of this procedure. Informally, what happens is that we can only guarantee that

- $f_\sf{new}$ is $\delta$-close to $\sf{RS}[\mathbb{F}, D, d]$, due to the final proximity test,
- the honest folding of $f_1, \dots, f_n$ is $\delta$-close to $f_\sf{new}$, due to the spot checks.

Overall, we can only guarantee that $f_1, \dots, f_n$ are $2\delta$-close to the code. More generally, over $k$ recursion steps, we can only guarantee $k\delta$-closeness to the code. From a theoretical point of view, this means that we can only build bounded-depth accumulation from this technique (although [[2]](#2) shows that this is enough to build PCD). From a practical point of view, it forces even more expensive choices of the code's proximity parameter, namely $(1-\rho)/(2k)$.
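To make the aggregation step concrete, here is a minimal Python sketch of this random-linear-combination batching with spot checks. The names (`batch_rlc`, `spot_check`), the toy prime field, and the value-table representation of the oracles are our own illustrative choices, not the interfaces of [[2]](#2).

```python
import random

P = 2**31 - 1  # toy prime modulus standing in for the field F

def batch_rlc(fs, alphas):
    """Honest prover: fold the value tables of f_1..f_n over D into
    f_new = sum_i alpha_i * f_i, evaluated pointwise."""
    return [sum(a * f[x] for a, f in zip(alphas, fs)) % P
            for x in range(len(fs[0]))]

def spot_check(fs, f_new, alphas, num_queries):
    """Verifier: at random points of D, check that the claimed f_new
    agrees with the alpha-combination of the oracles f_1..f_n."""
    for _ in range(num_queries):
        x = random.randrange(len(f_new))
        rhs = sum(a * f[x] for a, f in zip(alphas, fs)) % P
        if f_new[x] != rhs:
            return False
    return True

# After iterating this aggregation, the verifier runs a single proximity
# test (e.g. FRI) on the final f_new only.  The catch: if f_new is
# delta-close to the code and the spot checks pass, each f_i is only
# guaranteed to be 2*delta-close, decaying to k*delta over k steps.
```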
Our first contribution is an oracle batching protocol heavily inspired by the recent STIR protocol [[1]](#1). Suppose that we have $n$ functions $f_1, \dots, f_n : D \to \mathbb{F}$, for some $D \subset \mathbb{F}$. The verifier has oracle access to them, and the prover claims that $f_i$ is $\delta$-close to a low-degree polynomial for all $i=1, \dots, n$. The protocol produces a function $f_\sf{new} : D' \to \mathbb{F}$, for some other $D' \subset \mathbb{F}$, and reduces the $n$ claims above to the single claim that $f_\sf{new}$ is $\delta$-close to a low-degree polynomial.

![1](https://hackmd.io/_uploads/BJFM28sNyx.png)

Both problems discussed above are solved by using a round of the STIR protocol. Namely, it adds an out-of-domain check and quotienting to disambiguate the decoding of $f_\sf{new}$. This is very reminiscent of the technique often used to turn FRI into a polynomial commitment scheme. Informally, it ensures that, with high probability, there is only one valid choice of codeword, even in the list decoding regime. Moreover, it also removes the distance decay issue: we can directly prove that if a (possibly dishonest) $f_\sf{new}$ is $\delta$-close to the code, then $f_1, \dots, f_n$ are $\delta$-close to the code. This allows us to work with the much better proximity parameter $1-\sqrt{\rho}$ (or $1 - \rho$, under the usual Reed--Solomon decoding conjectures). We provide a more formal description of the protocol below.

![2](https://hackmd.io/_uploads/H15T_zdVJe.png)

This protocol does not require a proximity test, so it is much more efficient than batched FRI. We believe it can serve as a building block for various designs, such as split accumulation or a linear combination scheme for a PCS.
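To illustrate the disambiguation mechanism, the following Python sketch shows the quotienting step applied after an out-of-domain check. The function names and the toy field are our own, and the real protocol additionally handles the domain change $D \to D'$, degree correction, and the batching randomness, all omitted here.

```python
P = 2**31 - 1  # toy prime modulus standing in for the field F

def inv(a):
    """Field inverse via Fermat's little theorem."""
    return pow(a, P - 2, P)

def quotient_oracle(f_vals, domain, z, y):
    """Given an oracle f over `domain` and an out-of-domain reply
    (z, y) -- the prover's claim that the nearby polynomial p has
    p(z) = y -- return the quotient q(x) = (f(x) - y) / (x - z).
    If f is close to a codeword p with p(z) = y, then q is close to a
    codeword of lower degree; for any other codeword in the decoding
    list, q is far from the code.  Hence, with high probability, at
    most one codeword survives, even in the list decoding regime."""
    assert all(x % P != z % P for x in domain)  # z lies outside the domain
    return [(v - y) * inv(x - z) % P for v, x in zip(f_vals, domain)]
```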
## PCD via BOIL Accumulator

In the context of IVC/PCD constructions, an important performance metric is the recursive overhead. In the case of SNARKs based on the low-degree proximity test for Reed--Solomon codes, the recursive statement usually includes many hash invocations. The paper [[4]](#4) formally shows how such SNARKs can be used to construct IVC/PCD. The described approach requires representing the verifier as a circuit, in which the lion's share is occupied by hash operations, even when using SNARK-friendly hash functions such as Poseidon. The proximity test makes the largest contribution to this size:

$$ O(\lambda \cdot c_{\delta} \log^2 d) \text{ hash invocations,} $$

where $\lambda$ is a security parameter, $d$ is the corresponding polynomial degree and $c_{\delta}$ depends on the proximity parameter $\delta$.

**Accumulation without Homomorphism** The work [[2]](#2) proposes an alternative IVC/PCD construction based on a linearity test and exclusively symmetric assumptions. Their linearity test has a better asymptotic estimate of $O(\lambda \cdot c_{\delta} \log d)$ hash invocations. However, its use entails the problem of distance decay: with each iteration, the provable distance increases, so a much smaller proximity parameter $\delta/T$ must be used, where $T$ is the number of iterations. For practically significant parameters, this negates the advantage, since $O(\lambda \cdot c_{\delta/T} \log d)$ can be comparable to or even larger than $O(\lambda \cdot c_{\delta} \log^2 d)$.

Our technique, which we call BOIL (**B**atching **O**racles for **I**OPP from **L**inearity), allows us to get the best of both worlds -- logarithmic complexity and the absence of the distance decay problem. It is a theoretical and practical improvement over previous results.

Our starting point is the framework of $\delta$-correlated IOPs from [[3]](#3). These are IOPs equipped with an oracle that checks for $\delta$-correlated agreement. Informally, $n$ functions $f_1, \dots, f_n : D \to \mathbb{F}$ are in $\delta$-correlated agreement if each of them agrees with some codeword in $\sf{RS}[\mathbb{F}, D, d]$ on at least a $(1-\delta)$ fraction of the points in $D$, and the agreement subset of $D$ is the same for all $f_i$. This model captures many protocols (Plonky2, RISC Zero, Redshift, Placeholder) and gives us the flexibility we need.

Roughly speaking, in such systems, the Prover and the Verifier execute the IOP protocol, and at the very end, to make a final decision, the Verifier checks the correlated agreement for the functions $f_1, \dots, f_n$ (**note** that we can use the Oracle batching protocol to reduce this test to a proximity test for a single function $f_\sf{new}$). In this case, the Verifier can be represented by two algorithms:

- $\sf{Verifier_{PolyID}}$ checks the correctness of $\pi_{\sf{PolyID}}$, the short part corresponding to the execution of the IOP,
- $\sf{Verifier_{PRX}}$ checks the correctness of the proximity proof for the function $f_\sf{new}$.

As discussed above, running a proximity test for the Reed--Solomon code $\sf{RS}[\mathbb{F}, D, d]$ within a circuit is expensive, due to the large number of hashes. Thus, we want to defer as much as we can to the end of the recursive computation, to minimize the size of the in-circuit verifier in the final IVC/PCD construction. To achieve this, we recursively aggregate all these Reed--Solomon proximity checks into a single one, performed only once at the end -- hence our interest in an oracle batching technique. The result is a construction whose accumulator verifier does not need to run a full proximity test at each accumulation step, so we avoid the issue of running a full proximity test inside a circuit.

![3](https://hackmd.io/_uploads/H1NPrSdNke.png)

**Proof size** The size of the PCD proof for many practical accumulation-based constructions is comparable to the size of the recursive circuit. Generally, this requires provers to send significant amounts of data and complicates parallelization. Because we use the Oracle batching protocol twice -- in the accumulation scheme and in the proof system -- we obtain a much smaller PCD proof. In particular, for a Plonkish circuit represented by a matrix of $N \times M$ elements, the proof size will be proportional to one column, i.e., $N$ elements. In practice, this results in a proof that is smaller by several orders of magnitude.

**Comparison with Folding-based Constructions** Nova-like folding schemes allow efficient IVC constructions but are based on the use of R1CS or CCS arithmetization. Our construction allows the use of the Plonkish arithmetization, which is very expressive and widely used in practice. Another important difference is that EC-based IVC/PCD designs use cycles of curves. This makes formalization somewhat more complicated. Moreover, it requires the use of non-native arithmetic, which significantly increases the recursive overhead. By avoiding cycles of curves, we get a conceptually more straightforward construction.
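The control flow of the resulting scheme can be summarized in a short Python skeleton. All names here (`make_pcd_chain`, `verifier_poly_id`, `oracle_batch`, `proximity_test`) are illustrative placeholders for the roles described above, not the actual interfaces of BOIL.

```python
# Schematic control flow of a BOIL-style accumulation chain.
# Every identifier is an illustrative placeholder.

def make_pcd_chain(steps, verifier_poly_id, oracle_batch, proximity_test):
    """Per step, the in-circuit work is only (1) the short IOP check
    Verifier_PolyID and (2) one oracle-batching round folding the fresh
    oracles into the running accumulator.  The expensive Reed--Solomon
    proximity test runs exactly once, in the final decider."""
    acc = None
    for pi_poly_id, oracles in steps:
        assert verifier_poly_id(pi_poly_id)  # cheap, done at every step
        acc = oracle_batch(acc, oracles)     # defer the proximity check
    return proximity_test(acc)               # single test at the very end

# Trivial stubs, just to exercise the control flow:
ok = make_pcd_chain(
    steps=[("pi_0", ["f1", "f2"]), ("pi_1", ["f3"])],
    verifier_poly_id=lambda pi: True,
    oracle_batch=lambda acc, fresh: (acc or []) + list(fresh),
    proximity_test=lambda acc: len(acc) > 0,
)
assert ok
```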
**A note on concurrent work.** Our results were first presented at [zkSummit12](https://www.youtube.com/watch?v=on7Rvty00VM). Between this event and the publication of this text, two independent papers [[5]](#5) and [[6]](#6) appeared that also use the idea of a STIR-based oracle batching protocol and provide a rigorous formalization. A detailed comparison with these papers is beyond the scope of this note, but our work uses different building blocks, and our final design includes some explicit practical optimizations of independent interest.

## References

<a id="1">[1]</a> Arnon, G. et al. (2024) ‘STIR: Reed–Solomon Proximity Testing with Fewer Queries’. Available at: https://eprint.iacr.org/2024/390

<a id="2">[2]</a> Bünz, B. et al. (2024) ‘Accumulation without Homomorphism’. Available at: https://eprint.iacr.org/2024/474

<a id="3">[3]</a> Block, A.R. et al. (2023) ‘Fiat-Shamir Security of FRI and Related SNARKs’. Available at: https://eprint.iacr.org/2023/1071

<a id="4">[4]</a> Chiesa, A., Ojha, D. and Spooner, N. (2019) ‘Fractal: Post-Quantum and Transparent Recursive Proofs from Holography’. Available at: https://eprint.iacr.org/2019/1076

<a id="5">[5]</a> Bünz, B. et al. (2024) ‘Arc: Accumulation for Reed--Solomon Codes’. Available at: https://eprint.iacr.org/2024/1731

<a id="6">[6]</a> Szepieniec, A. (2024) ‘DEEP Commitments and Their Applications’. Available at: https://eprint.iacr.org/2024/1752