On Distributed FRI-based Proof Generation

# On Distributed FRI-based Proof Generation In this note we discuss distributed proof generation using $\mathtt{FRI}$-based SNARKs. We show that despite many limitations, this is quite possible in practice and allows for significant efficiency gains in proof generation. ## What makes FRI attractive for practical use? $\mathtt{FRI}$-based proof systems have a number of attractive features. - Transparent Setup. No need for a trusted ceremony and trapdoors to generate parameters. - PQ-security. Such systems are based on symmetric cryptography, which is resistant to quantum computer attacks (technically symmetric cryptography is susceptible to quantum attacks, although the impact is much smaller than in DLog-based or Factorization-based schemes). - Prover/Verifier Tradeoffs. The system parameters can be adjusted to alter the efficiency trade-off between prover and verifier. - Small fields. The system allows the use of small finite fields (as small as 32 bits or even less), unlike DLog-based schemes that rely on elliptic curves and require much larger fields. This leads to greater efficiency, as arithmetic operations in smaller fields are less computationally expensive compared to those in larger fields. - No cycles of curves. $\mathtt{FRI}$ keeps all arithmetic in the same field for prover and verifier, so no field switching is necessary. This means, in particular, that the recursive composition of proofs does not require the use of cycles of elliptic curves. Moreover, real-world systems show excellent performance. It is not surprising that they are widespread in practice [[6]](#6) [[7]](#7) [[8].](#8) ## Why do we care about proof generation scaling? Modern SNARKs have proven their effectiveness in a variety of practical applications. They are increasingly used to provide guarantees of the correctness of complex and even distributed computations. This has become possible due to the development of advanced techniques such as recursive proof aggregation and distributed proof computation. Let's consider, for example, a multi-chain system (an architecture where multiple independent blockchains operate in parallel). Let's assume for simplicity that blocks are produced uniformly over time. Our task is to obtain state transition proof, i.e., proof that the new state is correct *if the previous state was correct*. Then, such a proof should also guarantee the correctness of the state transition for each individual chain. There are several known approaches to solving this problem. ![zkevm-Page-11.drawio (7)](https://hackmd.io/_uploads/BkqMXAcR0.png) - We can send all the necessary data to a dedicated prover machine, the purpose of which is to produce a SNARK proof for the state transition of the entire system. Such an approach would be extremely inefficient and time-consuming. Moreover, for really large computations this is simply not possible due to the high memory requirements of $\mathtt{FRI}$-based SNARKS. - On the other hand, we can produce proofs for individual chains in parallel (independently). Thus, the total proof length will grow linearly with the number of individual chains. Of course, all these proofs can be combined using recursive composition, but this entails additional overhead. - Finally, we can take into account the fact that at the time of proof generation, all the necessary data is already available. Then we can use aggregation methods or distributed proof generation. In this note we are taking a closer look at the latter approach. Let the computation to be proved be represented as a circuit $\mathcal{C}$. It is assumed that this computation can be split between $M$ provers (for example for the case when $\mathcal{C}$ can be divided into data-parallel sub-circuits), where the prover $P_i$ deals with circuit $\mathcal{C}_i$, and $|\mathcal{C}_i| = |\mathcal{C}|/M$. ![zkevm-Page-11.drawio (8)](https://hackmd.io/_uploads/SJFLQ09RA.png) In the described context, modern distributed proof generation protocols allow obtaining a single proof with a small overhead in the complexity of individual provers and in the size of the final proof. However, there is an additional communication cost. Surprisingly, the amount of data transferred can also be made negligibly small using additively homomorhic commitment schemes such as KZG (see for example Pianist [[4]](#4)). Unfortunately, $\mathtt{FRI}$-based systems lack this capability, so the corresponding distributed systems have communication complexity linear in the circuit size (see for example deVirgo [[3]](#3)). Next, we show how the communication complexity can be reduced for the case of distributed computation of the $\mathtt{FRI}$-based proof, while maintaining the prover efficiency and only slightly increasing the proof size. ## Distributed FRI $\mathtt{FRI}$ (Fast Reed-Solomon IOPP) is an IOP of Proximity for Reed-Solomon codes. The primary goal of $\mathtt{FRI}$ is to distinguish, by querying function $f$ at few locations, whether $f$ is a codeword $\left( f \in \mathsf{RS} [\mathbb{F}_p, {D}, d] \right)$ or $f$ is far in relative Hamming distance from all codewords in $\mathsf{RS} [\mathbb{F}_p, {D}, d]$. The protocol works by recursively reducing the problem of checking proximity to a code $\mathsf{RS} [\mathbb{F}_p, {D}, d]$ to the one of checking proximity to a code $\mathsf{RS} [\mathbb{F}_p, {D'}, d']$ with smaller parameters $D’ \ll D, d' \ll d$. $\mathtt{FRI}$ protocol consists of a Commit phase and a Query phase. The Commit phase is executed in $r$ rounds. In round $i < r$, the $\mathtt{Verifier}$ samples a random value from $\mathbb{F}_p$. If $i < r$, the $\mathtt{Prover}$ responds with an oracle to a specially constructed polynomial $f_{i+1}^{\mathtt{FRI}}$ defined on the domain $D_{i+1}$, such that $|D_{i+1}| = |D_i|/2^{s_i} = |D|/2^{s_1 + \dots+s_i}$. If $i = r$, $\mathtt{Prover}$ responds with the coefficients of a polynomial of degree $\left( d/2^{\sum_{i=1}^{r}s_i}\right)$. The Query phase is executed in $\lambda$ rounds. In each round, the $\mathtt{Verifier}$ queries oracles and checks the consistency of the answers, which reduces the probability of a dishonest $\mathtt{Prover}$ to a negligible value. To check the consistency, the $\mathtt{Verifier}$ asks for the values of the polynomial $f_i^{\mathtt{FRI}}$ at $2^{s_{i+1}}$ points. We intentionally omit the details of the $\mathtt{FRI}$ protocol, you can find a full description here [[1]](#1). The __batched__ version of the $\mathtt{FRI}$ protocol allows one to estimate the closeness to the $\mathsf{RS}$ code of each of the functions $f_1, \dots, f_L$. To do this, $\mathtt{Verifier}$ samples and sends a random $\theta \in \mathbb{F}_p$ to $\mathtt{Prover}$. The latter calculates a linear combination $$ F = \theta^1 \cdot f_1 + \theta^2 \cdot f_2 + \dots + \theta^L \cdot f_L $$ Then the $\mathtt{Prover}$ and $\mathtt{Verifier}$ execute the regular version of the $\mathtt{FRI}$ protocol for testing $F$. The only difference is that each time $F$ is queried at point $x$, $\mathtt{Verifier}$ also performs a consistency check: $$ F(x) = \theta^1 \cdot f_1(x) + \theta^2 \cdot f_2(x) + \dots + \theta^L \cdot f_L(x). $$ If $\mathtt{Verifier}$ accepted in the end of the protocol, then all $f_i$ are close to $\mathsf{RS}$ (see details, for example, in [[2]](#2)). Let us now consider a distributed setting in which $n=L \cdot M$ polynomials of degree at most $d$ are divided among $M$ $\mathtt{Provers}$. The output of the protocol should be a proof that all the polynomials $f_1, \dots, f_n$ are close enough to the $\mathsf{RS}$ code. A naive approach would be to send all polynomials in plaintext to one of the provers, who in turn would execute the batched $\mathtt{FRI}$ protocol. Let us consider how this problem can be solved more efficiently. $\mathtt{Provers}$ generate $\mathsf{Merkle~ Tree}$ commitments to their polynomials and send them to the $\mathtt{Master~Prover}$ (this function can be performed by one of the provers, for simplicity we will assume that this is a separate entity). The $\mathtt{Master~Prover}$ gets a random challenge $\theta$ from the $\mathtt{Verifier}$ and broadcasts it among all $\mathtt{Provers}$. Now each $\mathtt{Prover}$ $P_i$, knowing its number $i$, can generate its "part of the linear combination" and send it to the $\mathtt{Master~Prover}$. $$ F_i = \sum_{j=1}^{L}\theta^{(i-1) \cdot L + j}f_{(i-1) \cdot L + j}. $$ $\mathtt{Master~Prover}$ runs a regular version of $\mathtt{FRI}$. However, it cannot provide polynomial evaluations and Merkle auth paths for consistency proofs in the query phase of the protocol for individual polynomials, so it asks the corresponding $\mathtt{Prover}$ for each of them. Below we provide a more formal description of the protocol, in which interaction with the $\mathtt{Verifier}$ is eliminated using a Fiat-Shamir transformation. --- ### $\mathtt{Prover}~ P_i$ __Input:__ $f_{i_0 + 1}, \dots, f_{i_0 + L}$, where $i_0 = (i-1) \cdot L$ __Output__ -- 1. Commit to the polynomials $$ c_{i_0+j} = \mathsf{MT.Commit}(f_{i_0+j}), j \in [L]$$ 1. Send commitments $[c_{i_0+j}]_{j=1}^{L}$ to the $\mathtt{Master~prover}$ 1. Get a random challenge $\theta$ from the $\mathtt{Master~prover}$ 1. Compute a partial linear combination $$F_i = \sum_{j=1}^{L}\theta^{i_0 + j}f_{i_0 + j}$$ 1. Send polynomial $F_i$ to the $\mathtt{Master~prover}$ 1. Get random challenges $x_1, \dots, x_{\lambda}$ from the $\mathtt{Master~prover}$ 1. Send corresponding evaluations and Merkle proofs $$f_{i_0 + j}(x_k), \pi_{i_0 + j, k} = \mathsf{MT.Open}(c_{i_0+j}, f_{i_0 + j}(x_k)), \text{ for } j \in [L], k \in [\lambda]$$ --- ### $\mathtt{Master~Prover}$ __Input:__ -- __Output:__ $[c_i]_{i=1}^{n}, \text{transcript of } \mathtt{FRI}$ protocol, $[\pi_{i,k}]_{i=1}^{n}$ for all $k \in [\lambda]$ 1. Collect all the commitments $[c_i]_{i=1}^{n}$ 1. Derive challenge $\theta$ and broadcast it to the $\mathtt{Provers}$ 1. Collect all the polynomials $[F_i]_{i=1}^{M}$ 1. Compute a linear combination $$ F = \sum_{i=1}^{M}F_i $$ 1. Execute batched $\mathtt{FRI}$ with polynomial $F$. For every query $x_k, k \in [\lambda]$ collect all the Merkle proofs $[\pi_{i,k}]_{i=1}^{n}$. 1. (Optional) Check the honesty of each $P_j$. To do this, check the correspondence of the Merkle proofs $[\pi_{i,k}]_{i=(j-1)\cdot L+1}^{j\cdot L}$ and the value of $F_j(x_k)$ for every $k \in [\lambda]$). --- The resulting protocol is robust in the sense that the $\mathtt{Master~prover}$ can easily detect malicious behavior of individual $\mathtt{Provers}$. This is achieved due to the fact that the partial linear combinations $F_i$ declared by the $\mathtt{Provers}$ belong to $\mathsf{RS}$ code by definition, so they are low-degree polynomials. Then the forgery with high probability is detected at step 6. This property is especially useful in a distributed SNARK generation process, as it allows for the implementation of economic measures to penalize participants for misbehaving. It is easy to see that the time complexity of the $\mathtt{Provers}$ is $O(d\log d)$. The communication cost (this is communication between provers and master prover) is dominated by sending a partial linear combination, whose size is $O(d)$ elements from $\mathbb{F}_p$. Moreover, due to the single execution of the $\mathtt{FRI}$ protocol, the length of the proof is asymptotically smaller than the sum of the lengths of independent proofs. | Approach | $\mathtt{Prover}$'s time | Comm. |Proof Length | | -------- | ------------------------ | ----- | --- | | Independent Proofs | $O(\|D\|\log\|D\|)$ | $O(\log^2\|D\|)$ | $O(M(\log^2 \|D\| + \log \|D\|))$ | Single Prover | $O(\|D\|\log\|D\|)$ | $O(L\cdot \|D\|)$ | $O(\log^2 \|D\| + M \cdot \log \|D\|))$ | Distributed $\mathtt{FRI}$ | $O(\|D\|\log\|D\|)$ | $O( \|D\|)$ | $O(\log^2 \|D\| + M \cdot \log \|D\|))$ Let us estimate the size of the resulting proof more precisely. __Practical Optimisation.__ The elements that we request from the oracle to the function $f_i^{\mathtt{FRI}}$ (coset elements) can be stored in one leaf of the Merkle Tree as a result of a hash chain computation, then at each round of the Query phase only one request to each function $f_i^{\mathtt{FRI}}$ will be executed. Given the above optimisations, let $d = 2^k, k \in \mathbb{Z}$ and each leaf contains $2^s, s \ll k$ elements, then the height of the Tree is equal to $k-s$ and the inclusion proof will consist of $\left(2^s + k -s\right)$ hash codes (where the term $2^s$ appears from the need to calculate the value of a tree leaf). Then the total number of hash codes that need to be verified in one round of the Query phase is equal to $$ \mathtt{C}_{\mathtt{query}} = \sum_{i=1}^{r}\left(2^{s_i}+\log_2{|D|}-\sum_{j=1}^{i}s_j\right). $$ __Practical optimisation.__ During the execution of the protocol, the same evaluation points are used for all the functions. Therefore, combining several functions into one Merkle tree (a batch) significantly reduces the number of authentication paths. For example, if a batch combines $L$ functions defined over domain $|D|$ and the size of the coset is $2^s$, then to verify the authentication path it will be necessary to calculate $2^s \cdot L + (\log_2|D|-s)$ hash codes. Let functions $f_1, \dots, f_L$ be split into $B$ batches: $$ \mathcal{B}_i = \langle f_1, \dots, f_{b_i}\rangle, $$ where $b_i = |\mathcal{B}_i |.$ Then each consistency check requires calculation of the following number of hash codes $$\mathtt{C}_{\mathtt{cons}} = B\cdot (\log_2|D|-s_1)+2^{s_1}\cdot L$$ So the total number of hash code calculations is \begin{align*} \mathtt{C}_{\mathtt{FRI}} &= \lambda \cdot(\mathtt{C}_{\mathtt{query}} + \mathtt{C}_{\mathtt{cons}}) = \\ &= \lambda \cdot \left(\sum_{i=1}^{r}\left(2^{s_i}+\log_2{|D|}-\sum_{j=1}^{i}s_j\right) + B\cdot (\log_2|D|-s_1)+2^{s_1}\cdot L \right). \end{align*} Let's assume for simplicity that $s_1 = \dots = s_r = 1, d=2^{k}, \rho=1/2^R, |D| = 2^{k+R}, r=k$. Then $$ \mathtt{C}_{\mathtt{FRI}} = \lambda \cdot \left(\frac{1}{2}k(k+3)+kR + B\cdot (k+R-1)+2 L \right). $$ Now we can see that the total length (in hash codes) of $M$ independent $\mathtt{FRI}$-proofs is $$ M \cdot \lambda \cdot (\mathtt{C}_{\mathtt{query}} + \mathtt{C}_{\mathtt{cons}}). $$ While the distributed version has a length $$ \lambda \cdot \mathtt{C}_{\mathtt{query}} + M \cdot \lambda \cdot \mathtt{C}_{\mathtt{cons}}. $$ These results show that the reduction in the number of hash codes is significant not only asymptotically but also for practically used parameters. For example, for $M = 10, R = 2, L = 15, B = 1, \lambda = 80$ we present a plot of the dependence of the proof size for different circuit sizes. ![zkevm-Page-11.drawio](https://hackmd.io/_uploads/rJ40avekkx.png) __Remark.__ We should note that the protocol described in this section has much in common with the approach described in [[5]](#5). The authors also rely on the observation that batched $\mathtt{FRI}$ can be used to aggregate proofs for different circuits. However, their main goal is to minimize the size of the proof - for this they propose a technique for splitting a single circuit and creating a single commitment for all polynomials (the latter is impossible in our case). The main difference is that we consider this technique in a distributed setting, which introduces additional restrictions and requires optimizing the communication cost. ### FRI-based SNARKs The protocol described above can be applied to any $\mathtt{FRI}$-based SNARK. In particular, to any SNARK obtained from the compilation of Polynomial IOP and $\mathsf{RS}$ proximity test. Let the computation be represented by a rectangular matrix $(d+1) \times L$ (we can think of it as an execution trace). The correctness of such a computation is guaranteed by a set of constraints for the matrix cells. By assigning to the $i$-th column of the matrix a unique polynomial $f_i$ of degree at most $d$, checking such constraints can be reduced to checking the polynomial equality $$ \mathcal{P}(f_1(X),\dots,f_L(X)) = 0, $$ for some multivariate polynomial $\mathcal{P}$. The last equality is checked using a randomly chosen evaluation point $\zeta$: if $f_1(\zeta)=v_1,\dots,f_L(\zeta)=v_L$ and $\mathcal{P}(v_1,\dots,v_L)=0$ then it is satisfied with high probability. Finally, to check the evaluations $v_1, \dots, v_L$ claimed by the $\mathtt{Prover}$, a proximity test to the $\mathsf{RS}$ is performed for the functions $$(f_i(X)-v_i)/(X-\zeta),~ i = 1, \dots, L.$$ The final proof $\pi$ can be roughly represented as a concatenation of the proof $\pi_{\mathtt{PIOP}}$ for the Polynomial IOP and the proof for the proximity test $\pi_\mathtt{FRI}$. Moreover, the latter has a predominant size (70-90 percent of the total size $|\pi|$). Thus, the simplest use case for the described protocol is to replace the conventional $\mathtt{FRI}$ execution with its distributed version. This allows us to reduce the size of the proof by up to 50 percent in our experiments. It is extremely important to emphasize that the amount of data transferred remains quite acceptable for distributed computing. For example, for $k=23$ and $|\mathbb{F}_p|\approx 2^{32}$, the partial linear combination $F_i$ takes only 32 MB of memory, which can be easily transferred between geographically distributed $\mathtt{Provers}$. Merkle tree proofs for consistency checks in such a setting take up only a few dozen kilobytes. ![zkevm-Page-11.drawio (10)](https://hackmd.io/_uploads/SkJhNCqRR.png) ## Conclusion Most protocols for distributed SNARK generation rely on the use of homomorphic cryptographic mechanisms, which allows them to achieve low communication costs. The $\mathtt{FRI}$ protocol is based on the use of hash functions and does not allow homomorphic operations on data. It is not surprising that the widespread opinion is that distributed generation of $\mathtt{FRI}$-based proof requires the transfer of data comparable to the size of the entire circuit. In this note, we considered a specific technique that allows using a network of distributed $\mathtt{Provers}$ to jointly generate $\mathtt{FRI}$-based proof. At the same time, the amount of transmitted data for practically significant parameters remains several orders of magnitude smaller than the size of the entire circuit. | Approach | $\mathtt{Prover}$'s time | Comm. | Proof Length | Transp. | | -------- | ------------------------ | ----- | ------------ | --- | | deVirgo | $\widetilde{O}(\|\mathcal{C}\|/M)$ | $O(\|\mathcal{C}\|)$ | $O(\log^2\|\mathcal{C}\|)$ |Yes | |Pianist | $\widetilde{O}(\|\mathcal{C}\|/M)$ | $O(M)$ | $O(1)$ | No |Distributed $\mathtt{FRI}$-based SNARK | $\widetilde{O}(\|\mathcal{C}\|/M)$ | $O(\|\mathcal{C}\|/L)$ | $O(\log^2\|\mathcal{C}\|)$ | Yes ## References <a id="1">[1]</a> Eli Ben-Sasson et al. “Fast Reed-Solomon Interactive Oracle Proofs of Proximity”. In: (2018). In collab. with Michael Wagner. Artwork Size: 17 pages Medium: application/pdf Publisher: Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik GmbH, Wadern/Saarbruecken, Germany, 17 pages. doi: 10.4230/LIPICS.ICALP.2018.14. url: http://drops.dagstuhl.de/opus/volltexte/2018/9018/ (visited on 02/16/2023). <a id="2">[2]</a> Eli Ben-Sasson et al. Proximity Gaps for Reed-Solomon Codes. Publication info: Published elsewhere. Minor revision. FOCS 2020. 2020. url: https://eprint.iacr.org/2020/654 (visited on 09/04/2024). <a id="3">[3]</a> Tiancheng Xie et al. zkBridge: Trustless Cross-chain Bridges Made Practical. Oct. 1, 2022. arXiv: 2210 . 00264[cs]. url:http://arxiv.org/abs/2210.00264 (visited on 07/07/2024). <a id="4">[4]</a> Tianyi Liu et al. Pianist: Scalable zkRollups via Fully Distributed Zero-Knowledge Proofs. Publication info: Published elsewhere. Minor revision. S&P 2024. 2023. url: https://eprint.iacr.org/2023/1271 (visited on 06/24/2024). <a id="5">[5]</a> Albert Garreta et al. On amortization techniques for FRI-based SNARKs. Publication info: Preprint. 2024. url: https://eprint.iacr.org/2024/661 (visited on 09/04/2024). <a id="6">[6]</a> Polygon. url: https://polygon.technology. <a id="7">[7]</a> Risc0. url: https://risc0.com. <a id="8">[8]</a> StarkNet. url: https://www.starknet.io.