## Polynomial products in Plonkish circuits
Steve Thakur and Kapil Shenvi Pause, *Mozak*
<br/>
We describe a protocol for multiple univariate polynomial products of varying degrees in a circuit. Each product $$f_{1,j}(X)\cdot f_{2,j}(X) = f_{1,2,j}(X)$$ entails $\deg(f_{1,2,j})+1$ gates in a 2-input 1-output Plonkish circuit. The equations of the form $$ f_{1,j}(X)\cdot f_{2,j}(X) \;=\; f_{1,2,j}(X) \mod{q_{j}(X)} $$ with $q_{j}(X)$ preprocessed and $f_{1,j}(X)$, $f_{2,j}(X),$ $f_{1,2,j}(X)$ of degree $\leq \deg(q_{j})-1$ (which are more relevant to our use cases), entail $\deg(q_{j})$ gates.
This note is a brief summary of a soon-to-be-released paper which will include security proofs and more details. The two primary use cases are:
1. **Field extension arithmetic in a circuit.** The protocol allows us to multiply two elements of the degree-$k$ extension $\mathbb{F}_{p^k}$ with $k$ gates.
2. **Non-native arithmetic.** The protocol makes everything except the range checks (handled by a lookup argument) relatively cheap. It also allows working with limbs small enough to be subjected to range checks with lookups, without further decomposition.
<br/>
### The polynomial product protocol
Consider a system of polynomial equations \begin{equation} \tag{1} a_j(X)\cdot b_j(X) = c_j(X) + q_j(X)\cdot e_j(X)\;\;\;,\;\;j\in\mathcal{J} \end{equation} in the circuit for some index set $\mathcal{J},$ with the polynomials $q_j(X)$ preprocessed and $\deg(a_j),\deg(b_j),\deg(c_j)\;\leq\; \deg(q_j)-1$. Our proposed approach is to defer all of these product checks and prove them alongside the Snark rather than encoding them within the circuit. This sidesteps the need for generating randomness within the circuit, which is a bit expensive.
***
#### Lagrange and monomial bases
We have used the monomial basis as in [Th23a] for "field-agnosticism", i.e. we would prefer for these techniques to work over all sufficiently large primes rather than just FFT-friendly ones. In particular, we would like the techniques to be compatible with the base fields of the widely used curves Ed25519, secp256k1, BN254 and BLS12-381. The polynomial products - the only superlinear part of the proof generation - are computed via multimodular FFTs when the 2-adicity of the scalar field is low. But everything in this note can be efficiently translated to a Lagrange basis when the scalar field has high enough 2-adicity. In fact, Remco Bloemen's blogpost ([Bl22]) describes an efficient protocol to succinctly link the monomial and Lagrange bases *in FFT-friendly fields*.
***
For simplicity, we assume the circuit is a 2-input 1-output circuit with left/right/output polynomials $L(X)$, $R(X)$, $O(X).$ In the monomial basis, the wire polynomials are given by \begin{align} L(X)=\sum\limits_{k=0}^{n-1} L_k\cdot X^k\;\;,\;\; R(X)=\sum\limits_{k=0}^{n-1} R_k\cdot X^k\;\;,\;\;O(X)=\sum\limits_{k=0}^{n-1} O_k\cdot X^k \end{align} where the $L_k,R_k,O_k$ are the wire values and $n$ is the circuit size (i.e. the number of gates). We also commit to the polynomial
\begin{align} E(X)=\sum\limits_{t=0}^{n-1} E_t\cdot X^t \end{align}
consisting of the coefficients of the polynomials $e_j(X)$ (occurring at the appropriate indices) described in Equation (1). The polynomial $Q(X)$ corresponding to the polynomials $q_j(X)$ can be preprocessed.
With the *appropriate choices of index sets and maps*, we can interpret the system of equations (1) as:
\begin{equation} \tag{1}
\Big(\sum\limits_{i\in {{\boldsymbol{\rho}}}^{-1}(j)} {\tt{Coef}}(L\;,\;i)\cdot X^{{{\boldsymbol{\tau}}}(i)} \Big)\cdot \Big(\sum\limits_{i\in {{\boldsymbol{\rho}}}^{-1}(j)} {\tt{Coef}}(R\;,\;i)\cdot X^{{{\boldsymbol{\tau}}}(i)} \Big) = \sum\limits_{i\in {{\boldsymbol{\rho}}}^{-1}(j)} {\tt{Coef}}(O\;,\;i)\cdot X^{{{\boldsymbol{\tau}}}(i)} + \Big(\sum\limits_{i\in {{\boldsymbol{\rho}}}^{-1}(j)} {\tt{Coef}}(E\;,\;i)\cdot X^{{{\boldsymbol{\tau}}}(i)} \Big)\cdot \Big(\sum\limits_{i\in {{\boldsymbol{\rho}}}^{-1}(j)} {\tt{Coef}}(Q\;,\;i)\cdot X^{{{\boldsymbol{\tau}}}(i)}\Big) \;\;\;\forall\; j\in\;\mathcal{J}
\end{equation}
where $\mathcal{I}$, $\mathcal{J}\;\subseteq [0,\;n-1]$ are preprocessed index sets and
$${{\boldsymbol{\rho}}}:\mathcal{I}\longrightarrow \mathcal{J}\;\;\;,\;\;\;{{\boldsymbol{\tau}}}:\mathcal{I}\longrightarrow \mathcal{K}:=[0,\;\max\limits_{j\in\mathcal{J}}\deg(q_j)] $$ are preprocessed maps. The protocol we now describe allows a Prover to succinctly show that all $|\mathcal{J}|$ of these polynomial product equations hold.
In the monomial basis as in [Th23a], we commit to the index sets via their *indicator polynomials* (analogous to Plonk's selector polynomials) \begin{align} \chi_{_{\mathcal{I}}}(X):= \sum\limits_{i\in \mathcal{I}} X^i\;\;,\;\; \chi_{_{\mathcal{J}}}(X):= \sum\limits_{j\in \mathcal{J}} X^j \end{align} and to the maps via the preprocessed polynomials \begin{align} C_{\boldsymbol{\rho}}(X):= \sum\limits_{i\in \mathcal{I}} \boldsymbol{\rho}(i)\cdot X^i\;\;,\;\;C_{\boldsymbol{\tau}}(X):= \sum\limits_{i\in \mathcal{I}} \boldsymbol{\tau}(i)\cdot X^i \end{align}
For a polynomial $f(X)$ and an element $\gamma\in\mathbb{F}_p,$ we define the "flattened" polynomial $$ {f}^{{\boldsymbol{\rho}},{\boldsymbol{\tau}},\gamma}(X):= \sum\limits_{j\in \mathcal{J}}
\Big[\sum\limits_{i\in {{\boldsymbol{\rho}}}^{-1}(j)} {\tt{Coef}}(f\;,\;i)\cdot \gamma^{{{\boldsymbol{\tau}}}(i)}\Big]\cdot X^j \;=\; \sum\limits_{i\in \mathcal{I}} {\tt{Coef}}(f\;,\;i)\cdot \gamma^{{{\boldsymbol{\tau}}}(i)}\cdot X^{{{\boldsymbol{\rho}}}(i)}. $$ The key part of the protocol is verifiably sending a commitment to this polynomial.
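As a concrete reference, here is a minimal Python sketch of this flattening map over a toy prime field (no commitments; the names `flatten`, `rho`, `tau` are ours):

```python
# Toy sketch of the flattening map f |-> f^{rho,tau,gamma} over F_p.
# Illustrative only: p, rho, tau and the helper name are our choices.
p = 2**61 - 1  # a prime standing in for the scalar field

def flatten(coeffs, rho, tau, gamma):
    """Coefficients of f^{rho,tau,gamma}(X) =
    sum_{i in I} Coef(f, i) * gamma^{tau(i)} * X^{rho(i)}  (mod p)."""
    out = [0] * (max(rho.values()) + 1)
    for i, j in rho.items():
        out[j] = (out[j] + coeffs[i] * pow(gamma, tau[i], p)) % p
    return out

# Example: one product with deg <= 2, stored at circuit indices 0..2.
# rho sends all three indices to j = 0; tau records the monomial
# exponent carried by each coefficient.
rho, tau = {0: 0, 1: 0, 2: 0}, {0: 0, 1: 1, 2: 2}
L = [5, 7, 11]                  # coefficients of a(X) = 5 + 7X + 11X^2
gamma = 1234567                 # stands in for a Fiat-Shamir challenge
print(flatten(L, rho, tau, gamma))  # [a(gamma)]: a single coefficient
```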
<br/>
The Schwartz-Zippel lemma implies that the system of equations (1) reduces to showing that for a challenge $\gamma$ generated after the commitments to the wire polynomials and the polynomial $E(X)$ have been sent, the equations
\begin{equation} \tag{2}
\Big(\sum\limits_{i\in {{\boldsymbol{\rho}}}^{-1}(j)} {\tt{Coef}}(L\;,\;i)\cdot \gamma^{{{\boldsymbol{\tau}}}(i)} \Big)\cdot \Big(\sum\limits_{i\in {{\boldsymbol{\rho}}}^{-1}(j)} {\tt{Coef}}(R\;,\;i)\cdot \gamma^{{{\boldsymbol{\tau}}}(i)} \Big)
\;=\; \sum\limits_{i\in {{\boldsymbol{\rho}}}^{-1}(j)} {\tt{Coef}}(O\;,\;i)\cdot \gamma^{{{\boldsymbol{\tau}}}(i)} + \Big(\sum\limits_{i\in {{\boldsymbol{\rho}}}^{-1}(j)} {\tt{Coef}}(E\;,\;i)\cdot \gamma^{{{\boldsymbol{\tau}}}(i)} \Big)\cdot \Big(\sum\limits_{i\in {{\boldsymbol{\rho}}}^{-1}(j)} {\tt{Coef}}(Q\;,\;i)\cdot \gamma^{{{\boldsymbol{\tau}}}(i)}\Big) \;\;\;\forall\; j\in\;\mathcal{J} \end{equation}
hold.
We construct and commit to the flattened polynomials \begin{align} {L}^{{\boldsymbol{\rho}},{\boldsymbol{\tau}},\gamma}(X):= \sum\limits_{j\in \mathcal{J}}
\Big[\sum\limits_{i\in {{\boldsymbol{\rho}}}^{-1}(j)} {\tt{Coef}}(L\;,\;i)\cdot \gamma^{{{\boldsymbol{\tau}}}(i)}\Big]\cdot X^j \;=\; \sum\limits_{i\in \mathcal{I}}
{\tt{Coef}}(L\;,\;i)\cdot \gamma^{{{\boldsymbol{\tau}}}(i)}\cdot X^{{{\boldsymbol{\rho}}}(i)}\end{align} and ${R}^{{\boldsymbol{\rho}},{\boldsymbol{\tau}},\gamma}(X)$, $\;{O}^{{\boldsymbol{\rho}},{\boldsymbol{\tau}},\gamma}(X)$, $\; {Q}^{{\boldsymbol{\rho}},{\boldsymbol{\tau}},\gamma}(X)$, $\;{E}^{{\boldsymbol{\rho}},{\boldsymbol{\tau}},\gamma}(X).$ With the KZG polynomial commitment scheme, these commitments entail MSMs of length $|\mathcal{J}|,$ which - in practice - will be substantially smaller than the circuit size. Our task boils down to:
- **The tricky part:** *verifiably* sending commitments to the flattened polynomials ${L}^{{\boldsymbol{\rho}},{\boldsymbol{\tau}},\gamma}(X)$, $\;{R}^{{\boldsymbol{\rho}},{\boldsymbol{\tau}},\gamma}(X),$ $\;{O}^{{\boldsymbol{\rho}},{\boldsymbol{\tau}},\gamma}(X),$ $\;{Q}^{{\boldsymbol{\rho}},{\boldsymbol{\tau}},\gamma}(X)$ and ${E}^{{\boldsymbol{\rho}},{\boldsymbol{\tau}},\gamma}(X);$
- **The easy part:** proving the Hadamard product equation \begin{equation}L^{{\boldsymbol{\rho}},{\boldsymbol{\tau}},\gamma}(X)\odot R^{{\boldsymbol{\rho}},{\boldsymbol{\tau}},\gamma}(X) = O^{{\boldsymbol{\rho}},{\boldsymbol{\tau}},\gamma}(X) + Q^{{\boldsymbol{\rho}},{\boldsymbol{\tau}},\gamma}(X)\odot E^{{\boldsymbol{\rho}},{\boldsymbol{\tau}},\gamma}(X) \end{equation} on these committed polynomials.
<br/>
We now describe the mechanism whereby, for a committed polynomial $f(X),$ the purported commitment to ${f}^{{\boldsymbol{\rho}},{\boldsymbol{\tau}},\gamma}(X)$ can be proven to be well-formed. We first note that the map $$ \mathbb{F}_p[X]\;\longrightarrow\;\mathbb{F}_p[X]\;\;\;,\;\;\; f\mapsto f^{{{\boldsymbol{\rho}},{\boldsymbol{\tau}},\gamma}}$$ is linear, i.e. for polynomials $f_t(X)$ and an element $\lambda\in\mathbb{F}_p,$ $$\big[\sum\limits_{t}\lambda^t\cdot f_t(X) \big]^{{\boldsymbol{\rho}},{\boldsymbol{\tau}},\gamma} \;=\;\sum\limits_{t} \lambda^t\cdot f_t^{{\boldsymbol{\rho}},{\boldsymbol{\tau}},\gamma}(X) $$ and hence, the proofs of well-formedness of ${L}^{{\boldsymbol{\rho}},{\boldsymbol{\tau}},\gamma}(X)$, $\;{R}^{{\boldsymbol{\rho}},{\boldsymbol{\tau}},\gamma}(X),$ $\;{O}^{{\boldsymbol{\rho}},{\boldsymbol{\tau}},\gamma}(X),$ $\;{Q}^{{\boldsymbol{\rho}},{\boldsymbol{\tau}},\gamma}(X)$ and ${E}^{{\boldsymbol{\rho}},{\boldsymbol{\tau}},\gamma}(X)$ can be batched.
By Schwartz-Zippel, the claim \begin{align} {f}^{{\boldsymbol{\rho}},{\boldsymbol{\tau}},\gamma}(X) = \sum\limits_{j\in \mathcal{J}}
\Big[\sum\limits_{i\in {{\boldsymbol{\rho}}}^{-1}(j)} {\tt{Coef}}(f\;,\;i)\cdot \gamma^{{{\boldsymbol{\tau}}}(i)}\Big]\cdot X^j \;\;\in\;\;\mathbb{F}_p[X] \end{align} boils down to showing that \begin{equation} \tag{3} {f}^{{\boldsymbol{\rho}},{\boldsymbol{\tau}},\gamma}(\zeta)= \sum\limits_{j\in \mathcal{J}}
\Big[\sum\limits_{i\in {{\boldsymbol{\rho}}}^{-1}(j)} {\tt{Coef}}(f\;,\;i)\cdot \gamma^{{{\boldsymbol{\tau}}}(i)}\Big]\cdot \zeta^j\;\;\in\;\; \mathbb{F}_p \end{equation} for a challenge $\zeta$ generated randomly and uniformly after the commitment to ${f}^{{\boldsymbol{\rho}},{\boldsymbol{\tau}},\gamma}(X)$ has been sent.
The left hand side of equation (3) is merely the evaluation of ${f}^{{\boldsymbol{\rho}},{\boldsymbol{\tau}},\gamma}(X)$ at $X = \zeta.$ To lucidly describe the right hand side of (3), we note that
\begin{align} \sum\limits_{j\in \mathcal{J}}
\Big[\sum\limits_{i\in {{\boldsymbol{\rho}}}^{-1}(j)} {\tt{Coef}}(f\;,\;i)\cdot \gamma^{{{\boldsymbol{\tau}}}(i)}\Big]\cdot \zeta^j \;=\; \sum\limits_{i\in\mathcal{I}} {\tt{Coef}}(f\;,\;i)\cdot \gamma^{{\boldsymbol{\tau}}(i)}\cdot \zeta^{{\boldsymbol{\rho}}(i)}. \end{align}
We define the polynomials \begin{align} S_{{{\boldsymbol{\tau}}},\gamma}(X):= \sum\limits_{i\in \mathcal{I}} \gamma^{{{\boldsymbol{\tau}}}(i)}\cdot X^i \;\;,\;\;S_{{\boldsymbol{\rho}},\zeta}(X):= \sum\limits_{i\in \mathcal{I}} \zeta^{{\boldsymbol{\rho}}(i)}\cdot X^i. \end{align} These polynomials are obtained by applying the maps ${\boldsymbol{\tau}}$ and ${\boldsymbol{\rho}}$ to the polynomials $\chi_{_{\mathcal{I}}}(\gamma\cdot X)$ and $\chi_{_{\mathcal{I}}}(\zeta\cdot X)$ respectively. The polynomials $\chi_{_{\mathcal{I}}}(\gamma\cdot X)$ and $\chi_{_{\mathcal{I}}}(\zeta\cdot X)$, in turn, are merely twists of the preprocessed indicator polynomial $\chi_{_{\mathcal{I}}}(X)$ by $\gamma$ and $\zeta$ respectively. Thus, the commitments to these polynomials $S_{{{\boldsymbol{\tau}}},\gamma}(X),$ $S_{{{\boldsymbol{\rho}}},\zeta}(X)$ can be verifiably sent using the "generalized permutation" aka self-map argument in [Th23b], which we summarize below. This is but a minor generalization of Plonk's permutation argument that exploits the *underlying sets* rather than the multisets of committed vectors.
Thus, equation (3) now boils down to succinctly showing that \begin{align} {f}^{{\boldsymbol{\rho}},{\boldsymbol{\tau}},\gamma}(\zeta) \;=\; f(X)\circ \big[ S_{{{\boldsymbol{\rho}}},\zeta}(X) \odot S_{{{\boldsymbol{\tau}}},\gamma}(X) \big], \end{align} where $\circ$ and $\odot$ denote the dot product and the Hadamard product respectively (which we already know how to deal with).
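The following sketch checks this identity numerically on toy data: the evaluation of the flattened polynomial at $\zeta$ equals the dot product of $f$ with $S_{{\boldsymbol{\rho}},\zeta}\odot S_{{\boldsymbol{\tau}},\gamma}$ (all names are ours; no commitments are involved):

```python
# Numeric check of equation (3): f^{rho,tau,gamma}(zeta) equals the dot
# product of f with S_{rho,zeta} (.) S_{tau,gamma}.  Toy sketch.
import random
p = 2**61 - 1
rho = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1}   # two products: j = 0 and j = 1
tau = {0: 0, 1: 1, 2: 2, 3: 0, 4: 1}
f = [random.randrange(p) for _ in range(5)]
gamma, zeta = random.randrange(p), random.randrange(p)

# Left-hand side: evaluate the flattened polynomial at zeta.
flat = [0, 0]
for i, j in rho.items():
    flat[j] = (flat[j] + f[i] * pow(gamma, tau[i], p)) % p
lhs = sum(c * pow(zeta, j, p) for j, c in enumerate(flat)) % p

# Right-hand side: dot product of f with the Hadamard product of
# S_{tau,gamma}(X) = sum_i gamma^{tau(i)} X^i and
# S_{rho,zeta}(X)  = sum_i zeta^{rho(i)}  X^i.
S = {i: pow(gamma, tau[i], p) * pow(zeta, rho[i], p) % p for i in rho}
rhs = sum(f[i] * S[i] for i in rho) % p
assert lhs == rhs
```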
<br/>
***
#### Summary of added costs to the Prover
As always, the Hadamard product proofs are batched. The additional Prover work entailed by this protocol is dominated by the two length-$|\mathcal{I}|$ MSMs required to commit to the polynomials $$ \sum\limits_{i\in\mathcal{I}} [\alpha+\zeta^{\boldsymbol{\rho}(i)}+\delta\cdot i]^{-1}\cdot X^i \;\;\;,\;\;\; \sum\limits_{i\in\mathcal{I}} [\alpha+\gamma^{\boldsymbol{\tau}(i)}+\delta\cdot i]^{-1}\cdot X^i $$ (for challenges $\alpha,\delta$) arising from the proofs of well-formedness of $S_{\boldsymbol{\rho},\zeta}(X)$ and $S_{\boldsymbol{\tau},\gamma}(X).$ The Prover also needs to commit to $$ \sum\limits_{j\in\mathcal{J}} [\alpha+\zeta^{j}+\delta\cdot j]^{-1}\cdot X^j \;\;\;,\;\;\; \sum\limits_{k\in\mathcal{K}} [\alpha+\gamma^k+\delta\cdot k]^{-1}\cdot X^k $$ but these entail short MSMs of length $|\mathcal{J}|$ and $|\mathcal{K}|$ respectively. The commitment to $S_{{{\boldsymbol{\rho}}},\zeta}\odot S_{{{\boldsymbol{\tau}}},\gamma}(X)$ can be computed in runtime $O\big(\frac{|\mathcal{J}|}{\log|\mathcal{J}|}+\frac{|\mathcal{K}|}{\log|\mathcal{K}|}\big)$ because of the structure of this polynomial (Pippenger).
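In practice, all of these inverse coefficients can be produced with a single field inversion via Montgomery's batch-inversion trick; a minimal sketch with toy parameters (the helper name is ours):

```python
# Coefficients [alpha + zeta^{rho(i)} + delta*i]^{-1} for i in I, computed
# with one modular inversion (Montgomery's trick).  Illustrative only;
# the dominant cost in the protocol is the length-|I| MSM itself.
p = 2**61 - 1

def batch_inverse(xs, p):
    """Invert every element of xs mod p using a single exponentiation."""
    prefix, acc = [], 1
    for x in xs:
        prefix.append(acc)          # prefix[k] = x_0 * ... * x_{k-1}
        acc = acc * x % p
    inv = pow(acc, p - 2, p)        # inverse of the full product
    out = [0] * len(xs)
    for k in range(len(xs) - 1, -1, -1):
        out[k] = prefix[k] * inv % p
        inv = inv * xs[k] % p
    return out

rho = {0: 0, 1: 0, 2: 1, 3: 1}
alpha, delta, zeta = 3, 5, 7        # stand-ins for the challenges
terms = [(alpha + pow(zeta, rho[i], p) + delta * i) % p for i in sorted(rho)]
coeffs = batch_inverse(terms, p)
assert all(c * t % p == 1 for c, t in zip(coeffs, terms))
```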
The protocol also adds two (multimodular) FFTs of length $2\cdot |\mathcal{I}|$ and $2$-$4$ of length $2\cdot |\mathcal{J}|$ (depending on the precise use case) to the proof generation. In the Lagrange basis - when available (i.e. in a setting with high enough 2-adicity) - the costs are a bit lower.
The Verifier needs to compute a few more scalar multiplications but no additional pairings.
<br />
***
### Optimizations in case of repeated moduli polynomials
For some use cases, the polynomials $q_j(X)$ will usually be the same for several indices $j\in \mathcal{J}.$ We define a map $\boldsymbol{\eta}:\mathcal{J}\longrightarrow \mathcal{R}$ and label the distinct $q_j(X)$ by $q_r(X)\;\;$ ($r\in\mathcal{R}$). Furthermore, we define a fourth map $\boldsymbol{\mu}:\mathcal{J}\longrightarrow \mathcal{K}.$
Rather than proving each equation $$a_j(X)\cdot b_j(X)\equiv c_j(X)\mod{q_j(X)}\;\;,\;\;j\in \mathcal{J}\;,$$ it suffices to show that the $|\mathcal{R}|$ equations $$ \sum\limits_{j\in \boldsymbol{\eta}^{-1}(r)}\lambda^j\cdot [a_j(X)\cdot b_j(X)-c_j(X)]\equiv 0\mod{q_r(X)} \;\;,\;\;r\in\mathcal{R}$$ hold. We note that this technique is easily exploitable in the case of *native* field extension arithmetic in the circuit, where the individual polynomials $$q_j(X)^{-1}\cdot [a_j(X)\cdot b_j(X)-c_j(X)]$$ need not have their coefficients subjected to range checks. The technique does not seem as useful when dealing with non-native arithmetic, where we do need range checks on the coefficients.
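The sketch below verifies this batching numerically for a single modulus $q_r$: the combination $\sum_j \lambda^j\cdot[a_j(X)\cdot b_j(X)-c_j(X)]$ is exactly divisible by $q_r(X)$ because each summand is (toy parameters; helper names ours):

```python
# Batched divisibility check for one modulus q_r: divide
# sum_j lambda^j * (a_j(X)b_j(X) - c_j(X)) by q_r(X) and confirm the
# remainder vanishes.  Toy sketch over F_p; no commitments.
import random
p = 2**61 - 1

def pmul(f, g):
    out = [0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            out[i + j] = (out[i + j] + fi * gj) % p
    return out

def pdivmod(f, q):
    """Quotient and remainder of f by q over F_p (coefficients low-to-high)."""
    f, quo = list(f), [0] * max(1, len(f) - len(q) + 1)
    lead_inv = pow(q[-1], p - 2, p)
    for k in range(len(f) - len(q), -1, -1):
        t = f[k + len(q) - 1] * lead_inv % p
        quo[k] = t
        for s, qs in enumerate(q):
            f[k + s] = (f[k + s] - t * qs) % p
    return quo, f[:len(q) - 1]

q_r = [1, 0, 0, 1]                 # q_r(X) = X^3 + 1, preprocessed
lam = random.randrange(p)
batch = [0] * 5                    # deg <= 2*(deg q_r - 1) = 4
for j in (4, 7):                   # toy indices j in eta^{-1}(r)
    a = [random.randrange(p) for _ in range(3)]
    b = [random.randrange(p) for _ in range(3)]
    ab = pmul(a, b)
    _, c = pdivmod(ab, q_r)        # c_j := a_j * b_j mod q_r
    diff = [(x - y) % p for x, y in zip(ab, c + [0, 0])]
    batch = [(u + pow(lam, j, p) * v) % p for u, v in zip(batch, diff)]
e_r, rem = pdivmod(batch, q_r)     # e_{r,lambda}(X) and the remainder
assert all(x == 0 for x in rem)    # the division by q_r is exact
```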
For a challenge $\lambda,$ we compute the quotients $$ e_{r,\lambda}(X):= q_r(X)^{-1}\cdot \Big[\sum\limits_{j\in \boldsymbol{\eta}^{-1}(r)}\lambda^j\cdot [a_j(X)\cdot b_j(X)-c_j(X)]\Big]\;\;,\;\;r\in\mathcal{R}$$ and commit to the corresponding polynomial $E_{\lambda}(X).$ For a second challenge $\gamma$, we construct and commit to the polynomials
\begin{align} {L}^{{\boldsymbol{\rho}},{\boldsymbol{\tau}},\gamma}(X):= \sum\limits_{j\in \mathcal{J}}
\Big[\sum\limits_{i\in {{\boldsymbol{\rho}}}^{-1}(j)} {\tt{Coef}}(L\;,\;i)\cdot \gamma^{{{\boldsymbol{\tau}}}(i)}\Big]\cdot X^j \;=\; \sum\limits_{i\in \mathcal{I}}
{\tt{Coef}}(L\;,\;i)\cdot \gamma^{{{\boldsymbol{\tau}}}(i)}\cdot X^{{{\boldsymbol{\rho}}}(i)}\end{align}
\begin{align} {R}^{{\boldsymbol{\rho}},{\boldsymbol{\tau}},\gamma}(X):= \sum\limits_{j\in \mathcal{J}}
\Big[\sum\limits_{i\in {{\boldsymbol{\rho}}}^{-1}(j)} {\tt{Coef}}(R\;,\;i)\cdot \gamma^{{{\boldsymbol{\tau}}}(i)}\Big]\cdot X^j \;=\; \sum\limits_{i\in \mathcal{I}}
{\tt{Coef}}(R\;,\;i)\cdot \gamma^{{{\boldsymbol{\tau}}}(i)}\cdot X^{{{\boldsymbol{\rho}}}(i)} \end{align}
\begin{align} {O}^{{\boldsymbol{\rho}},{\boldsymbol{\tau}},\gamma}(X):= \sum\limits_{j\in \mathcal{J}}
\Big[\sum\limits_{i\in {{\boldsymbol{\rho}}}^{-1}(j)} {\tt{Coef}}(O\;,\;i)\cdot \gamma^{{{\boldsymbol{\tau}}}(i)}\Big]\cdot X^j \;=\; \sum\limits_{i\in \mathcal{I}}
{\tt{Coef}}(O\;,\;i)\cdot \gamma^{{{\boldsymbol{\tau}}}(i)}\cdot X^{{{\boldsymbol{\rho}}}(i)} \end{align}
\begin{align} h_{\lambda}(X):= {L}^{{\boldsymbol{\rho}},{\boldsymbol{\tau}},\gamma}(\lambda\cdot X)\odot {R}^{{\boldsymbol{\rho}},{\boldsymbol{\tau}},\gamma}(X)-{O}^{{\boldsymbol{\rho}},{\boldsymbol{\tau}},\gamma}(\lambda\cdot X) \end{align}
\begin{align} {Q}^{{\boldsymbol{\eta}},{\boldsymbol{\mu}},\gamma}(X):= \sum\limits_{r\in \mathcal{R}}
\Big[\sum\limits_{j\in {{\boldsymbol{\eta}}}^{-1}(r)} {\tt{Coef}}(Q\;,\;j)\cdot \gamma^{{{\boldsymbol{\mu}}}(j)}\Big]\cdot X^r \;=\; \sum\limits_{j\in \mathcal{J}}
{\tt{Coef}}(Q\;,\;j)\cdot \gamma^{{{\boldsymbol{\mu}}}(j)}\cdot X^{{{\boldsymbol{\eta}}}(j)} \end{align}
\begin{align} {E}_{\lambda}^{{\boldsymbol{\eta}},{\boldsymbol{\mu}},\gamma}(X):= \sum\limits_{r\in \mathcal{R}}
\Big[\sum\limits_{j\in {{\boldsymbol{\eta}}}^{-1}(r)} {\tt{Coef}}({E}_{\lambda} \;,\;j)\cdot \gamma^{{{\boldsymbol{\mu}}}(j)}\Big]\cdot X^r \;=\; \sum\limits_{j\in \mathcal{J}}
{\tt{Coef}}({E}_{\lambda} \;,\;j)\cdot \gamma^{{{\boldsymbol{\mu}}}(j)}\cdot X^{{{\boldsymbol{\eta}}}(j)} \end{align}
The first 4 entail MSMs of length $|\mathcal{J}|$, while the last two entail MSMs of length $|\mathcal{R}|$.
It now boils down to showing that the polynomial \begin{align} h_{\lambda}^{{\boldsymbol{\eta}},{\boldsymbol{\mu}},1}(X):= \sum\limits_{j\in\mathcal{J}} {\tt{Coef}}(h_{\lambda}\;,\;j)\cdot X^{\boldsymbol{\eta}(j)} \end{align} coincides with $$ {E}_{\lambda}^{{\boldsymbol{\eta}},{\boldsymbol{\mu}},\gamma}(X)\odot {Q}^{{\boldsymbol{\eta}},{\boldsymbol{\mu}},\gamma}(X). $$
<br/>
***
### The self-map aka "generalized permutation" argument
The protocol described above hinges on the *self-map* argument from [Th23b]. This is a minor generalization of Plonk's permutation argument, which replaces permutations with possibly non-injective maps. We briefly describe it here to make this document a bit more self-contained. It rests on the following lemma:
<br/>
**Lemma:**
***
Let $\mathcal{I}$, $\mathcal{J}$ be subsets of the interval $[0,\;N-1]$ and let ${\boldsymbol{\rho}}:\;\mathcal{I}\;\longrightarrow\;\mathcal{J}$ be a map with domain $\mathcal{I}$ and image $\mathcal{J}.$ For vectors $\mathbb{V}$, $\widetilde{\mathbb{V}}$ of $\mathbb{F}_p$-elements, the following statements are equivalent with overwhelming probability:
(i). $\widetilde{\mathbb{V}}[{\boldsymbol{\rho}}(i)]\;=\; \mathbb{V}[i] \;\;\;\;\;\;\forall\; i\in \mathcal{I}.$
(ii). For a randomly generated element $\delta\in \mathbb{F}_p$, the vectors $$ \big[ \mathbb{V}[i] + \delta\cdot {\boldsymbol{\rho}}(i)\;:\;i\in \mathcal{I} \big] \;\;\;\text{and}\;\;\;\big [\widetilde{\mathbb{V}}[j] + \delta\cdot j\;:\;j\in \mathcal{J} \big] $$ have the same underlying set.
(iii). There exists a sequence $m_j\in \mathbb{F}_p$ $(j\in\mathcal{J})$ such that for a randomly generated element $\delta\in \mathbb{F}_p$, the equation \begin{equation} \sum\limits_{i\in\mathcal{I}}\; \big( X + \mathbb{V}[i] + \delta\cdot {\boldsymbol{\rho}}(i) \big)^{-1}\;\;=\;\; \sum\limits_{j\in\mathcal{J}} \;m_j\cdot \big( X + \widetilde{\mathbb{V}}[j] + \delta\cdot j\big) ^{-1} \end{equation} of rational functions holds.
(iv). There exists a sequence $m_j\in \mathbb{F}_p$ $(j\in\mathcal{J})$ such that for randomly generated elements $\delta,\alpha\;\in \;\mathbb{F}_p$, the equation \begin{equation} \tag{4}\sum\limits_{i\in\mathcal{I}}\; \big( \alpha + \mathbb{V}[i] + \delta\cdot {\boldsymbol{\rho}}(i) \big) ^{-1}\;\;=\;\; \sum\limits_{j\in\mathcal{J}}\; m_j\cdot \big( \alpha + \widetilde{\mathbb{V}}[j] + \delta\cdot j\big) ^{-1}\;\;\in\;\;\mathbb{F}_p \end{equation} holds.
***
<br />
Equation (4) can be efficiently demonstrated using the Hadamard and dot product protocols in the monomial basis as in [Th23a]. The polynomial $\sum\limits_{j\in\mathcal{J}} m_j\cdot X^j$ is preprocessed.
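A quick numeric check of the equivalence between items (i) and (iv), with the multiplicities $m_j = |{\boldsymbol{\rho}}^{-1}(j)|$ (a toy sketch; all names ours):

```python
# Sanity check of the lemma: if V~[rho(i)] = V[i] for all i in I, the two
# inverse sums in item (iv) agree, with m_j = |rho^{-1}(j)|.
import random
from collections import Counter
p = 2**61 - 1

rho = {0: 2, 1: 2, 3: 5, 4: 5, 6: 7}          # I = keys, J = values
Vt = {j: random.randrange(p) for j in set(rho.values())}
V = {i: Vt[rho[i]] for i in rho}              # condition (i) by construction
m = Counter(rho.values())                     # multiplicities m_j

alpha, delta = random.randrange(p), random.randrange(p)
lhs = sum(pow(alpha + V[i] + delta * rho[i], p - 2, p) for i in rho) % p
rhs = sum(m[j] * pow(alpha + Vt[j] + delta * j, p - 2, p) for j in m) % p
assert lhs == rhs                             # equation (4)
```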
<br />
<br />
<br />
***
### Applications
***
### Field extensions
A field extension $\mathbb{F}_{p^k}$ is identified with the quotient $\mathbb{F}_{p}[X]/(\mathrm{irr}_k(X))$ for some irreducible polynomial $\mathrm{irr}_k(X)\in \mathbb{F}_{p}[X]$ of degree $k.$ For efficiency, this $\mathrm{irr}_k(X)$ is usually chosen to be of the form $X^k - u$ for some $u\in \mathbb{F}_{p}^*$ that is not a $k_0$-th power in $\mathbb{F}_{p}^{*}$ for any divisor $k_0$ of $k$ larger than $1$.
A product $a\cdot b = c$ of $\mathbb{F}_{p^k}$ elements can be rephrased as \begin{align*} a(X)\cdot b(X) = c(X)+\mathrm{irr}_k(X)\cdot e(X) \end{align*} with $\deg(a),\deg(b),\deg(c)\leq k-1$ and $\deg(e)\leq k-2$. Thus, the protocol described above lets us encode an $\mathbb{F}_{p^k}$ multiplication with $k$ gates in an $\mathbb{F}_p$-circuit.
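For concreteness, the following sketch performs one multiplication in $\mathbb{F}_{p^k} = \mathbb{F}_p[X]/(X^k - u)$ and checks the relation coefficient by coefficient (the toy parameters $p$, $k$, $u$ are our choices; the polynomial identity below holds whether or not $X^k-u$ is actually irreducible):

```python
# One product in F_p[X]/(X^k - u), checked against the circuit relation
# a(X) * b(X) = c(X) + (X^k - u) * e(X) with deg(e) <= k - 2.
import random
p, k, u = 2**61 - 1, 4, 7       # toy choices; u should make X^k - u irreducible

a = [random.randrange(p) for _ in range(k)]
b = [random.randrange(p) for _ in range(k)]

# Schoolbook product of degree <= 2k - 2.
prod = [0] * (2 * k - 1)
for i in range(k):
    for j in range(k):
        prod[i + j] = (prod[i + j] + a[i] * b[j]) % p

# Reduce mod X^k - u: since X^{k+t} == u * X^t, fold the high half down.
e = prod[k:]                    # the quotient e(X), degree <= k - 2
c = list(prod[:k])
for t, et in enumerate(e):
    c[t] = (c[t] + u * et) % p  # c(X) = a(X) * b(X) mod (X^k - u)

# Check prod = c + (X^k - u) * e: the low k coefficients of the right-hand
# side are c[t] - u * e[t]; the high k - 1 coefficients are e itself.
recon = [(c[t] - u * (e[t] if t < len(e) else 0)) % p for t in range(k)] + e
assert recon == prod
```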
<br/>
***
### Non-native arithmetic
The primary motivation for this protocol was to find ways to deal with non-native arithmetic (in particular, non-native products) in a Plonkish circuit. It is an attempt to adapt the ideas from [KPS18], [Kub22], [Kub23] etc. to a Plonkish setting so that:
- it is compatible with custom gates;
- it can be merged with **cq** ([EFG22]) and very large lookup tables when used with KZG-based Plonk in an FFT-friendly field;
- it allows us to use the distributed Plonk proof system *Pianist* ([LXZ+23]) with a large cluster if need be;
- it can be used in Plonky2 Goldilocks circuits in conjunction with a lookup protocol such as LogUp ([Hab22]).
<br/>
Let $p$ denote the native modulus and let $q$ ($\neq p$) denote the non-native modulus. We need to emulate $\mathbb{F}_q$ operations in $\mathbb{F}_p.$
Fix a base $2^B$. We will need lookups for range checks w.r.t. the interval $[0,\;2^{B}-1]$ for the limbs. The decomposition of elements with respect to the base $2^{B}$ boils down to weighted sums and can be handled as in [Th23b] (or by other means).
Consider a *non-native* product decomposition $a\cdot b = c + q\cdot k$ in a circuit defined over the native field $\mathbb{F}_p.$
Let $d$ be a small integer denoting the number of limbs the non-native elements will be decomposed into. *ModExp* ([Kub23]) and [Kub22] use $d = 4.$ This entails $64$-bit limbs, which must be further decomposed for range checks. The flip side is that this requires working with small polynomials (degree $\leq 6$ for the product) and hence, the evaluations of the polynomials in the circuit are relatively cheap. Since our approach does not require polynomial evaluation *within the circuit*, there does not appear to be a downside to using a larger $d$. In fact, we could choose $d$ large enough that the resulting limbs are smaller than the size of the lookup tables used for the range checks.
Suppose we have the decompositions $a = \sum_{i=0}^{d-1} a_i\cdot 2^{B\cdot i}$, $\;\;b = \sum_{i=0}^{d-1} b_i\cdot 2^{B\cdot i}$ with the limbs $a_i,b_i\;\in\;[0,\;2^{B}-1].$ These can be associated with the univariate polynomials \begin{align} a(X) = \sum_{i=0}^{d-1} {a_i}\cdot X^i\;\;,\;\;b(X) = \sum_{i=0}^{d-1} {b_i}\cdot X^i \end{align} of degree $\leq d-1.$ For $\mathbb{F}_q$-multiplication, we want to compute $c(X)$ given by $$c(X) = \sum\limits_{k=0}^{2(d-1)} {c_k}\cdot X^k\;\; , \;\;c_k:= \sum\limits_{i+i'=k} a_i\cdot b_{i'}$$ followed by reduction modulo $q.$
<br/>
Adopting the approach of *ModExp* ([Kub23]), we rephrase the non-native products $a_j\cdot b_j \equiv c_j\pmod{q}$ in $\mathbb{F}_p$ as polynomial equations of the form \begin{equation} a_j(X)\cdot b_j(X) = c_j(X) + q(X)\cdot k_j(X) + (X-2^{B})\cdot e_j(X) \end{equation} indexed by the elements $j$ of an index set $\mathcal{J}.$ Evaluating at $X = 2^{B}$ recovers the integer equation $a_j\cdot b_j = c_j + q\cdot k_j,$ so this allows us to perform the multiplication and the modular reduction in a single step.
The coefficients of $a_j(X)$, $b_j(X)$, $c_j(X)$, $e_j(X)$ and $k_j(X)$ will need range checks via an appropriate lookup protocol (the options for which depend on the polynomial commitment scheme and the $2$-adicity of $\mathbb{F}_p$).
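To make the relation concrete, here is a sketch over the integers for a single product (in the protocol, the identity is checked over $\mathbb{F}_p$ and all limbs are range-checked; the parameters $B$, $d$ and the modulus $q$ below are our choices for illustration):

```python
# One non-native product a*b = c (mod q) as the polynomial relation
#   a(X)*b(X) = c(X) + q(X)*k(X) + (X - 2^B)*e(X),
# with base-2^B limb polynomials; e(X) absorbs the carries.
import random
B, d = 16, 16                     # 16-bit limbs, d limbs (B*d = 256 bits)
q = (1 << 255) - 19               # example non-native modulus
base = 1 << B

def limbs(n, count):
    return [(n >> (B * t)) & (base - 1) for t in range(count)]

a, b = random.randrange(q), random.randrange(q)
c, kq = a * b % q, a * b // q     # a*b = c + q*kq over the integers
aX, bX, cX, kX, qX = (limbs(x, d) for x in (a, b, c, kq, q))

# r(X) := a(X)b(X) - c(X) - q(X)k(X), over the integers.
r = [0] * (2 * d - 1)
for i in range(d):
    for j in range(d):
        r[i + j] += aX[i] * bX[j] - qX[i] * kX[j]
for t in range(d):
    r[t] -= cX[t]

# r(2^B) = a*b - c - q*kq = 0, so (X - 2^B) divides r(X).  Synthetic
# division: the running Horner values (high to low) are the quotient's
# coefficients, and the final value is the remainder r(2^B).
e, acc = [], 0
for coeff in reversed(r):
    acc = acc * base + coeff
    e.append(acc)
assert e.pop() == 0               # zero remainder: the relation holds
e.reverse()                       # e(X), degree <= 2d - 3
```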
***
<br />
<br />
**Acknowledgements**
We thank our colleague Matthias Görgens for helpful feedback and, in particular, for noting that the protocol could be useful for field extension arithmetic in a circuit.
<br />
<br />
***
**References**
[Bl22] R. Bloemen, *NTT transform argument*, https://xn--2-umb.com/22/ntt-argument/
[EFG22] L. Eagen, D. Fiore, and A. Gabizon. ***cq**: Cached quotients for fast lookups*, https://eprint.iacr.org/2022/1763
[GWC19] A. Gabizon, Z. Williamson, O. Ciobotaru, *PLONK: Permutations over Lagrange-bases for Oecumenical Non-interactive arguments of Knowledge*, https://eprint.iacr.org/2019/953
[Hab22] U. Haböck, *Multivariate lookups based on logarithmic derivatives*, https://eprint.iacr.org/2022/1530
[KPS18] A. Kosba, C. Papamanthou, E. Shi, *xJsnark: A Framework for Efficient Verifiable Computation*, https://akosba.github.io/papers/xjsnark.pdf
[Kub22] Ivo Kubjas, *Notes about optimizing emulated pairing*, https://hackmd.io/@ivokub/SyJRV7ye2
[Kub23] ____, *ModExp*, https://hackmd.io/3cpvkONzQl-TF5Oj8VXEHQ
[LXZ+23] Liu et al., *Pianist: Scalable zkRollups via Fully Distributed Zero-Knowledge Proofs*, https://eprint.iacr.org/2023/1271
[Th23a] S. Thakur, *A flexible Snark via the monomial basis*, https://eprint.iacr.org/2023/1255
[Th23b] ____, *An optimization of the addition gate count in Plonkish circuits*, https://eprint.iacr.org/2023/1264
<br/>
<br/>
***
### Multimodular FFTs
A consequence of using the monomial basis is that the only superlinear computations the Prover performs are products of polynomials in $\mathbb{F}_p[X].$ More precisely, the only superlinear computation is the sum of twisted polynomial products arising from the batched Hadamard product equation. The simplest and the most efficient way to do so is to use multimodular FFTs (terminology as in the NTL library).
We fix highly $2$-adic primes $p_1$, $p_2$ such that $p< \min(p_1,p_2)$ and the product $p_1\cdot p_2$ is larger than $p^2\cdot M$, where $M$ is an upper bound on the degrees of the polynomials to be multiplied. Given polynomials $f_1(X),\; f_2(X)\;\in\;\mathbb{F}_p[X]$, the Prover computes the product $f_1(X)\cdot f_2(X)$ as follows.
1. Lift the polynomials $f_1(X),\; f_2(X)$ to characteristic zero. Denote these polynomials by $\widetilde{f}_1(X),\;\widetilde{f}_2(X)\;\in\; \mathbb{Z}^{\geq 0}[X].$
2. Compute the polynomials $\widetilde{f}_1(X)\cdot \widetilde{f}_2(X)\mod{p_i\cdot \mathbb{Z}[X]}$ using FFTs in $\mathbb{F}_{p_i}$ for $i=1,\;2$.
3. Use the Chinese remainder theorem on the coefficients of these polynomials to recover the polynomial $\widetilde{f}_{1,2}(X):= \widetilde{f}_1(X)\cdot \widetilde{f}_2(X)\in \mathbb{Z}^{\geq 0}[X].$
4. Reduce the coefficients of $\widetilde{f}_{1,2}(X)$ modulo $p$ to retrieve the $\mathbb{F}_p$-polynomial $f_1(X)\cdot f_2(X).$
Computing a sum $\sum_{j=1}^k f_{j,1}(X)\cdot f_{j,2}(X)$ entails $k$ FFTs and a single inverse FFT *per prime modulus used* (in this case 2) in addition to the Chinese remainder theorem and reduction of the coefficients modulo $p$.
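A minimal Python sketch of these four steps for a single product (the toy field $p=65537$ keeps the coefficients of the integer product below $p_1\cdot p_2$; the NTT-friendly primes and all helper names are our choices):

```python
# Multimodular product of f1, f2 in F_p[X]: lift to Z[X], multiply modulo
# two NTT-friendly primes, recombine by CRT, reduce mod p.
import random
p = 65537                                     # toy "scalar field"
PRIMES = [(998244353, 3), (167772161, 3)]     # (NTT prime, primitive root)

def ntt(a, q, g, invert=False):
    """Radix-2 Cooley-Tukey NTT of a (len(a) a power of two dividing q-1)."""
    n = len(a)
    if n == 1:
        return a[:]
    w = pow(g, (q - 1) // n, q)
    if invert:
        w = pow(w, q - 2, q)
    even, odd = ntt(a[0::2], q, g, invert), ntt(a[1::2], q, g, invert)
    out, t = [0] * n, 1
    for k in range(n // 2):
        u, v = even[k], t * odd[k] % q
        out[k], out[k + n // 2] = (u + v) % q, (u - v) % q
        t = t * w % q
    return out

def mul_mod(f, g, q, gen):
    """f * g in F_q[X] via NTT, truncated to the true product length."""
    n = 1
    while n < len(f) + len(g) - 1:
        n *= 2
    fa = ntt(f + [0] * (n - len(f)), q, gen)
    gb = ntt(g + [0] * (n - len(g)), q, gen)
    out = ntt([x * y % q for x, y in zip(fa, gb)], q, gen, invert=True)
    ninv = pow(n, q - 2, q)
    return [x * ninv % q for x in out[:len(f) + len(g) - 1]]

f1 = [random.randrange(p) for _ in range(100)]  # step 1: the lift is the
f2 = [random.randrange(p) for _ in range(100)]  # identity on [0, p)
(q1, g1), (q2, g2) = PRIMES
r1 = mul_mod(f1, f2, q1, g1)                    # step 2: product mod p_1
r2 = mul_mod(f1, f2, q2, g2)                    #         product mod p_2
q1_inv = pow(q1, q2 - 2, q2)
lifted = [a + q1 * ((b - a) * q1_inv % q2)      # step 3: CRT back to Z
          for a, b in zip(r1, r2)]
result = [x % p for x in lifted]                # step 4: reduce mod p

naive = [0] * (len(f1) + len(f2) - 1)           # schoolbook cross-check
for i, x in enumerate(f1):
    for j, y in enumerate(f2):
        naive[i + j] = (naive[i + j] + x * y) % p
assert result == naive
```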