Plonk and PLookup

# Plonk and PLookup ## 1. Framework and Compilers ### 1.1 Polynomial Protocol Framework Here is a common framework used by Plonk and Plookup. We prove statements about secret polynomials $f_1,\ldots,f_t$ of degree $<d$. The statements are polynomial identities over some domain $H$ which involve public polynomials $g_1,\ldots,g_l$ of degree $<d$ and $G$: $$ F(X)\overset{def}{=}G(X,h_1(v_1(X)),\ldots,h_M(v_M(X)))\equiv 0 $$ where $F$ has degree $<D$ and $h_i$ is some $f_{i_j}$ or $g_{i_j}$. We start with an interactive proof protocol: 1. Prover $\mathbf{P}$ sends $f_1,\ldots,f_t$ to Checker $\mathbf{I}$. 2. Verifier $\mathbf{V}$ sends $\{F_i\}$ to $\mathbf{I}$. 3. $\mathbf{I}$ verifies identities and tells $\mathbf{V}$. The first step in converting it into a practical protocol is to change the domain to the full space $\mathbb{F}$: 1. $\mathbf{P}$ sends $f_1,\ldots,f_t$ to $\mathbf{I}$. 2. $\mathbf{V}$ sends $a_1,\ldots,a_k$ to $\mathbf{P}$. 3. $\mathbf{P}$ computes the numerator $Z(X) = \prod_{a\in H}(X-a)$ and *quotient polynomial* $$ T = \frac{\sum_i a_i F_i}{Z} $$ and sends it to $\mathbf{I}$. 4. $\mathbf{I}$ verifies identity $$ \sum_i a_i F_i(X) \equiv T \cdot Z $$ and tells $\mathbf{V}$. ### 1.2 Two-party Compiler The second step is to compile the polynomial protocol into a protocol with just two parties $\mathbf{P}$ and $\mathbf{V}$. This is done using polynomial commitments. We denote a commitment of $f$ by $\mathbf{cm}(f)$ and the opening at point $x$ protocol by $\mathbf{open}(f(x),x,\mathbf{cm}(f))$. The compiler works as follows: 1. Replace sending $f_i$ to $\mathbf{I}$ with sending $\mathbf{cm}(f_i)$ to $\mathbf{V}$. 2. To verify identity $F$, $\mathbf{V}$ selects random $x$ and sends it to $\mathbf{P}$. 3. $\mathbf{P}$ replies with $\{s_i=h_i(v_i(x))\}$ and engages in $\mathbf{open}$ for $s_i$. 4. $\mathbf{V}$ verifies that $G(x,s_1,\ldots,s_M)=0$. The prover complexity consists of two parts: * Computing $t$ commitments. * Open $t$ polynomials in $t^*\leq M$ distinct points $v_i(x)$ and prove openings. A version of pairing-based (groups $\mathbb{G}_1,\mathbb{G}_2,\mathbb{G}_T$), trusted setup [KZG10] commitment scheme gives the complexity * $\sum_{i\leq t} deg(f_i)$ exponentiations in $\mathbb{G}_1$ to compute commitments; * $\sum_{i\leq t^*} d_i$ exponentiations to open and prove, where $d_i$ is the maximum degree of a function opened at $i$-th point. * $t+2t^*$ elements of $\mathbb{G}_1$ are transferred as commitments and openings. Another optimization is to set $r<t^*$ points and openings so that $G$ reduces to a multivariate (of remaining points) polynomial $G'$ of degree 1. Then $\mathbf{V}$ can compute $\mathbf{cm}(G')$ using $s_i$ and $\mathbf{cm}(f_i)$ and then engage in another opening protocol to ensure it opens to 0 at $x$. ### 1.3 Non-interactive protocol To compile into a non-interactive protocol, we note that all Verifier messages are public coins, and then replace them with Fiat-Shamir calls over the transcript of the protocol. ### 1.4 Zero Knowledge ## 2. Circuits and Identities ### 2.1 Circuits with Additions and Multiplications Suppose $\mathbb{F}$ has a multiplicative subgroup $H$ of order $n-1$. Consider an arithmetic circuit of $n$ gates representable in the following form ($i\in [n]$): $$ (\mathbf{q}_{\mathbf{L}})_i\cdot \mathrm{x}_{\mathbf{a}_i} + (\mathbf{q}_{\mathbf{R}})_i\cdot \mathrm{x}_{\mathbf{b}_i} + (\mathbf{q}_{\mathbf{O}})_i\cdot \mathrm{x}_{\mathbf{c }_i} + (\mathbf{q}_{\mathbf{M}})_i\cdot \mathrm{x}_{\mathbf{a}_i} \cdot \mathrm{x}_{\mathbf{b}_i}+\mathbf{q}_{\mathbf{C}_i} $$ where $\mathbf{a},\mathbf{b},\mathbf{c}\in[m]^n$ (wire assignment vectors and $\mathbf{q}_{\mathbf{L}},\mathbf{q}_{\mathbf{R}},\mathbf{q}_{\mathbf{O}},\mathbf{q}_{\mathbf{M}},\mathbf{q}_{\mathbf{C}}$ are public vectors of coefficients (selector vectors). Let $\mathfrak{S}$ be partition of $[3n]$ according to $\mathbf{a},\mathbf{b},\mathbf{c}$ (i.e. $m$ sets). Let $\sigma$ be a permutation on $[3n]$ such that it consists of $m$ cycles going over the elements of $\mathfrak{S}$. Then we prove the following identities: 1. Circuit correctness: $$ \mathbf{q}_Lf_L + \mathbf{q}_Rf_R + \mathbf{q}_Of_O+ \mathbf{q}_Mf_Lf_R +\mathbf{q}_C +PI =0 $$ where $\mathbf{q}_L,\mathbf{q}_R,\mathbf{q}_O,\mathbf{q}_M,\mathbf{q}_C$ interpolate the selector vectors and $f_L,f_R,f_O$ interpolate $\mathrm{x}_{\mathbf{a}},\mathrm{x}_{\mathbf{b}},\mathrm{x}_{\mathbf{c}}$. For this identity the committed polynomials are $f_L,f_R,f_O$ of degree $n-1$. 2. Permutation check: $$ \sigma(f_L,f_R,f_O)=(f_L,f_R,f_O) $$ For this identity the committed polynomial is $Z$ of degree $n$. 3. Quotient polynomial We compute and commit a polynomial $T$ of degree $3n-1$. 4. Total prover complexity is the sum of commitment degrees +1 ($7n+1$) plus sum of maximal degrees at evaluation points $x$ and $gx$, i.e. $3n+n+1 = 4n+1$, in total $11n+2$. ### 2.2 Identity for Permutation check We check that $$ \sigma(f_1,f_2,\ldots,f_k) \overset{?}{=} (g_1,g_2,\ldots,g_k)$$ 1. Define $$ f_j' = f_j + \beta\cdot\underbrace{(j-1)n+\log_{\mathbf{g}} x}_{S_{ID_j}} + \gamma $$ and $$ g_j' = g_j + \beta\cdot\underbrace{\sigma((j-1)n+\log_{\mathbf{g}} x)}_{S_{\sigma_j}} + \gamma $$ 2. Define multiproduct $$ f' = \prod f_j';\quad g'= \prod g_j'. $$ 3. Define incremental product polynomial: \begin{align*} Z(\mathbf{g}^i) = f'(\mathbf{g})\cdot f'(\mathbf{g}^2)\cdots f'(\mathbf{g}^{i-1})/\left( g'(\mathbf{g})\cdot g'(\mathbf{g}^2)\cdots g'(\mathbf{g}^{i-1})\right).\\ \end{align*} 4. Prove identities: \begin{align*} [a=\mathbf{g}](Z(a)-1) &=0; \\ Z(a) f'(a) &=g'(a) Z(a\mathbf{g}). \end{align*} The correctness as follows. Let $\sigma(i)\neq i$ for some $i$. Then an elementary proof implies that $f'\neq g'$, which means that the second equation can not hold with the other. ### 2.3 Table Identity Let $d\leq n$. We can prove that $\mathrm{x}_{i\in[n]}\subset T=\{t_i\}_{i\in[d]}$ for all $i$. Interpret $T$ as vector $\mathbf{t}$ and let $\mathbf{s}$ be the vector $\mathbf{x}||\mathbf{t}$ sorted by $\mathbf{t}$ (i.e. the values appear in the same order as in $\mathbf{t}$). Let $f$ interpolate $\mathrm{x}$, $h_1$ interpolate $\mathbf{s}$ on first $n+1$ points, and $h_2$ interpolate $s$ on last $n+1$ points. Then $\mathbf{P}$: 1. Sends polynomials $f,h_1,h_2$ to $\mathbf{I}$; 2. Gets $\beta,\gamma$ from $\mathbf{V}$. 3. Computes polynomial $Z'$ of degree $n$ using $\mathbf{s},\mathbf{x},\mathbf{t}$: $$ Z(g)=Z(g^{n+1})=1;\quad Z(g^i) = \frac{(1+\beta)^{i-1}\prod_{j<i}(\gamma+f_j)\prod_{1\leq j<i}(\gamma(1+ \beta)+t_j+\beta t_{j+1})}{\prod_{1\leq j<i}(\gamma(1+\beta)+s_j+\beta s_{j+1})(\gamma(1+\beta)+s_{n+j}+\beta s_{n+j+1})} $$ 4. Sends $Z'$ to $\mathbf{I}$. Then $\mathbf{V}$ asks $\mathbf{I}$ to verify 4 identities on $f,h_1,h_2,Z$, the highest degree is $3n+2$.  ### 2.4 Vector Tables We prove that $(\mathrm{x}_{a_i},\mathrm{x}_{b_i},\mathrm{x}_{c_i})\in T$ for $i \in [n]$ and $[a_i],[b_i],[c_i]$ being some vectors of length $n$. 1. Prover sends polynomials $f_{a},f_{b},f_{c}$ to $\mathbf{I}$. 2. $\mathbf{V}$ sends challenge $\delta$ to $\mathbf{P}$. 3. $\mathbf{P}$ computes table $T'=\{(T[i][1]+\delta T[i][2]+\delta^2 T[i][3])\}$. 4. $\mathbf{P}$ computes $Z'$ using $x=x_a+\delta x_b+\delta^2 x_c$ and sends to $\mathbf{I}$. 5. $\mathbf{V}$ asks the same 4 identities as for regular tables using $f=f_a+\delta f_b+\delta^2 f_c$. ### 2.5 Multiple Tables Let us have $s$ tables $T_1,T_2,\ldots,T_s$. We prove that $(\mathrm{x}_{a_i},\mathrm{x}_{b_i},\mathrm{x}_{c_i})\in T_{y_i}$ for $i \in [n]$ and $y_i\in [s]$. Note that this happens if and only if for any $\delta_1,\delta_2,\delta_3$ it holds that $$ x_i = \mathrm{x}_{a_i}+\delta_1\mathrm{x}_{b_i}+\delta_2\mathrm{x}_{c_i}+\delta_3 y_i\in T' $$ where $$ T'=\{T_{j}[k][1]+\delta_1T_{j}[k][2]+\delta_2T_{j}[k][3]+\delta_3j\}_{ j\in[s],\;k\text{ goes for all rows of }T_y} $$ Let $f_t$ be a degree-$n$ interpolant of $[y_0,y_1,\ldots,y_{n-1}]$: $f_t(g^i)= y_i$. 1. Prover sends polynomials $f_{a},f_{b},f_{c}$ to $\mathbf{I}$. 2. $\mathbf{V}$ sends challenges $\delta_1,\delta_2,\delta_3$ to $\mathbf{P}$. 3. $\mathbf{P}$ computes table $T'=\{T_{j}[k][1]+\delta_1T_{j}[k][2]+\delta_2T_{j}[k][3]+\delta_3j\}$. 4. $\mathbf{P}$ computes $Z'$ using $x=x_a+\delta_1 x_b+\delta_2 x_c+ \delta_3 y$ and sends to $\mathbf{I}$. 5. $\mathbf{V}$ asks the same 4 identities as for regular tables using $f=f_a+\delta_1 f_b+\delta_2 f_c+\delta_3 f_t$. ## 3.Poseidon ### 3.1 Specification For Poseidon of width $3$ (processing $(2$ elements per call). 1. $\mathbf{X}=(X_1,X_2,X_3)$ is state. 2. For $1\leq i \leq 4$: 2.1 $X_j \leftarrow (X_j)^5$; 2.2 $\mathbf{X}\leftarrow A\cdot \mathbf{X} +\mathbf{C}_i$ where $A$ is matrix, $\mathbf{C}_i$ are round constants. 3. For $5\leq i \leq 61$: 3.1 Only the last element is changed: $X_3 \leftarrow (X_3)^5$; 3.2 $\mathbf{X}\leftarrow A\cdot \mathbf{X} +\mathbf{C}_i$. 4. For $62\leq i \leq 65$: 4.1 $X_j \leftarrow (X_j)^5$; 4.2 $\mathbf{X}\leftarrow A\cdot \mathbf{X} +\mathbf{C}_i$ ### 3.2 Regular Plonk prover cost We need $24+57\cdot 3$ multiplications, and for each round we need 6 fan-in-2 additions in the linear phase. Thus in total we need $195 + 65 \cdot 6 = 585$ gates. Thus the prover cost is $11\cdot 585+2 = 6437$ exponentiations. ### Optimized Plonk implementation We designate one polynomial per state variable $X_i$. Each round is described as an equation of degree 5 over $f_1,f_2,f_3$. For $R$ rounds we have a round equation of degree $6R$, which is the only polynomial identity we need. Thus $T$ is of degree $5R$. Overall we spend $8R$ exponentiations for commitments and further $6R$ for openings. In total the cost is $910$, i.e. 7 times smaller than for regular Plonk. ## 4.SHA-256 ### Method 1 We implement SHA-256 with lookups and with 4 state polynomials $x_L,x_R,x_S,x_T$. We have the only type of constraints: * Table: $$ (\mathrm{x}_{\mathbf{L}_i},\mathrm{x}_{\mathbf{R}_i},\mathrm{x}_{\mathbf{S}_i},\mathrm{x}_{\mathbf{T}_i})\in T $$  We aim to fit $2^{12}$ gates so consider tables of size $2^{12}$ and less. We split each 32-bit of a state register into 6-bit chunks $A^1,A^2,A^3,A^4,A^5,A^6$. All rounds are implemented in the same way, we assume that the input is already split into 48 chunks of the state and 6 chunks of the message. * $\Sigma_0,\Sigma_1$ with 21 lookups each: * We split the input of $\Sigma_i$ into three chunks of 5 bits and three chunks of 6 bits each and represent the function as a 3-XOR of 11-to-32 bit mappings, where each mapping consists of 3 2-to-2 lookups, 9 lookups in total:$$\Sigma_{i,j}(A^{2k},A^{2k+1})=(\widehat{A}^{2k,j},\widehat{A}^{2k+1,j});$$ * We use XOR tables to compute 6-bit 2-XOR. We need 6 6-bit 3-XORs, or 12 2-XORs, so 12 more lookups. * Maj function: for each 5/6-bit chunk compute 3 ANDs and 2 2-XORs, thus 30 lookups. * Ch function: for each 5/6-bit chunk compute AND, NAND and 2-XOR, thus 18 lookups. * Addition: we need 6-addition to compute the new E value as $E_{new} = D+H'$ where $H'=\Sigma_1+W+Ch+K+H$ * We implement addition with carry: $$ ADD(x,y) = (x+y\bmod{2^6},(x+y)/2^6). $$ * We need 4 such additions to compute one chunk of $H'$, one more addition for $E_{new}$ and then we aggregate carries with fan-3 addition lookup, thus spending 2 lookups. In total we need $7\cdot 6=42$ lookups for $E_{new}$. * We compute $A_{new}=H'+Maj+\Sigma_0$ spending 2 more additions and one more carry aggregator. In total we need $18$ lookups for $A_{new}$. * In total 60 lookups for modular addition. * Therefore we compute one round with 150 lookups. * A message schedule round can be implemented the same way as $\Sigma_0$. This costs 21 lookups. In total we spend $64\cdot 150 + 21\cdot 48 = 10608$ gates. #### Potential improvements * We can drop permutation polynomials since all rounds are the same but we will have to open commitments in significantly more points. * We can use field additions and then carry extraction instead of lookup addtions. This might save 10-12 gates per round (<10%). ### Method 2 We implement SHA-256 with lookups and with 4 state polynomials $x_L,x_R,x_S,x_T,x_U$. We have the only type of constraints: * Table: $$ (\mathrm{x}_{\mathbf{L}_i},\mathrm{x}_{\mathbf{R}_i},\mathrm{x}_{\mathbf{S}_i},\mathrm{x}_{\mathbf{T}_i},\mathrm{x}_{\mathbf{U}_i})\in T $$  We aim to fit $2^{12}$ gates so consider tables of size $2^{12}$ and less. We split each 32-bit of a state register into 4-bit chunks $A^1,A^2,\ldots,A^8$. All rounds are implemented in the same way, we assume that the input is already split into 64 chunks of the state and 8 chunks of the message. * $\Sigma_0,\Sigma_1$ with 20 lookups each: * We group the input of $\Sigma_i$ into four blocks of two chunks and represent the function as a 4-XOR of 8-to-32 bit mappings, where each mapping consists of 4 2-to-2 lookups, 12 lookups in total:$$\Sigma_{i,j}(A^{2k},A^{2k+1} )=(\widehat{A}^{2k,j},\widehat{A}^{2k+1,j});$$ * We use 3-XOR tables to compute 4-bit 3-XOR. We need 8 more lookups. * Maj function: 8 3-to-1 4-bit lookups. * Ch function: 8 3-to-1 4-bit lookups. * Addition: we need 6-addition to compute the new E value as $E_{new} = D+H'$ where $H'=\Sigma_1+W+Ch+K+H$ * We implement 3-addition with carry: $$ ADD(x,y,z) = (x+y+z\bmod{2^4},(x+y+z)/2^4). $$ * We need 2 such additions to compute $E_{new}$ and then we aggregate carries with fan-4 addition lookup, thus spending 2 lookups. In total we need $4\cdot 8=32$ lookups for $E_{new}$. * We compute $A_{new}=H'+Maj+\Sigma_0$ spending 2 more additions and one more carry aggregator. In total we need $24$ lookups for $A_{new}$. * In total 56 lookups for modular addition. * Therefore we compute one round with 112 lookups. * A message schedule round can be implemented the same way as $\Sigma_0$. This costs 20 lookups. In total we spend $64\cdot 112 + 20\cdot 48 = 8128$ gates. However, we use a wider state (5 polys vs 4). We also use several tables of size $2^{12}$, though $2^{11}$ should also suffice for $\Sigma_i$. ## AES We can implement AES with S-box lookup tables. We consider the following equations: $$ A\cdot S(\mathbf{x}_r) = \mathbf{x}_{r+1} +\mathbf{k}_{r+1}. $$ ### Method 1 In this method we proceed as follows: * Split MC-SR-SB-box outputs of form $a_iS(x_j)$ into 4-bit nibbles $E_{a_i,L},E_{a_i,R}$. * Convert each nibble bit into base-6 representation, so that the expanded nibble $L_{a_i,\cdot}$ is in $\mathbb{Z}_6^4$. * Compute each AK output expanded nibble as integer sum of four expanded nibbles plus key nibble. * Collapse expanded nibbles back into nibbles by evaluating each 3 bits modulo 2. * Compute S-box value as lookup over three nibbles. We prepare the following tables $(x,y\in Z_{16})$: * $T_1:\; (x,y,L_{1,L}(x,y),L_{1,R}(x,y))$ -- for S-box multiplied by 1; * $T_2:\; (x,y,L_{2,L}(x,y),L_{2,R}(x,y))$ -- for S-box multiplied by 2 in $GF(256)$; * $T_3:\; (x,y,L_{3,L}(x,y),L_{3,R}(x,y))$ -- for S-box multiplied by 3 in $GF(256)$; * $XOR:\; (x_1||x_2||x_3||x_4, (x_1\bmod{2})||(x_2\bmod{2})||(x_3\bmod{2})||(x_4\bmod{2}))$ with $x_i\in Z_6$. * The table entries will occupy $\mathrm{x}_{\mathbf{a}_i},\mathrm{x}_{\mathbf{b}_i},\mathrm{x}_{\mathbf{l}_i},\mathrm{x}_{\mathbf{s}_i}$, whereas the table number will be chosen by $\mathrm{q}_{\mathbf{c}_i}$ Our common equation form: $$ (\mathbf{q}_{\mathbf{a}})_i\cdot \mathrm{x}_{\mathbf{a}_i} + (\mathbf{q}_{\mathbf{b}})_i\cdot \mathrm{x}_{\mathbf{b}_i} + (\mathbf{q}_{\mathbf{c}})_i\cdot \mathrm{x}_{\mathbf{c }_i} + (\mathbf{q}_{\mathbf{S}})_i\cdot \mathrm{x}_{\mathbf{s}_i} +(\mathbf{q}_{\mathbf{L}})_i\cdot \mathrm{x}_{\mathbf{l }_i} =0 $$ Then we have the following equations in each round: * 32 $T_1$ entries: \begin{align}(\mathbf{q}_{\mathbf{a}})_i=(\mathbf{q}_{\mathbf{b}})_i&= (\mathbf{q}_{\mathbf{s}})_i=(\mathbf{q}_{\mathbf{l}})_i=0, (\mathbf{q}_{\mathbf{c}})_i=1\\ (\mathrm{x}_{\mathbf{a}_i},\mathrm{x}_{\mathbf{b}_i},\mathrm{x}_{\mathbf{s}_i},\mathrm{x}_{\mathbf{l}_i},\mathrm{x}_{\mathbf{c}_i})&\in T\end{align} * 16 $T_2$ entries: $(\mathbf{q}_{\mathbf{a}})_i=(\mathbf{q}_{\mathbf{b}})_i= (\mathbf{q}_{\mathbf{s}})_i=(\mathbf{q}_{\mathbf{l}})_i=0$, $(\mathbf{q}_{\mathbf{c}})_i=2$; * 16 $T_3$ entries: $(\mathbf{q}_{\mathbf{a}})_i=(\mathbf{q}_{\mathbf{b}})_i= (\mathbf{q}_{\mathbf{s}})_i=(\mathbf{q}_{\mathbf{l}})_i=0$, $(\mathbf{q}_{\mathbf{c}})_i=3$; * 4 sums to get a nibble after AK, per nibble (i.e. 128 equations) $(\mathbf{q}_{\mathbf{a}})_i=(\mathbf{q}_{\mathbf{b}})_i=(\mathbf{q}_{\mathbf{c}})_i=1$, $(\mathbf{q}_{\mathbf{s}})_i=(\mathbf{q}_{\mathbf{l}})_i=0$; * 32 XOR entries, one per nibble. $(\mathbf{q}_{\mathbf{a}})_i=(\mathbf{q}_{\mathbf{b}})_i= (\mathbf{q}_{\mathbf{s}})_i=(\mathbf{q}_{\mathbf{l}})_i=0$, $(\mathbf{q}_{\mathbf{c}})_i=4$ In total we have 960 lookup gates and 1280 linear equations, the state has 5 state polynomials. The table size is $3\cdot 256 + 6^4 = 2064$. The total complexity is $17\cdot 2240=38080$ exponentiations without key schedule. ### Method 2 Don't split into nibbles, work with entire bytes but use XOR [more frequently](https://hackmd.io/m0fnJ_lPTPahWAhfaiQA7Q#With-smaller-38-sized-tables) due to base-3 representation. Then you use the same number of lookups, but tables have size $3^8=6561$. $$ \textsf{LASKA}^{[love\text{ in Czech}]}:\quad \text{Lookup Arguments Suitable for Kingly Applications} $$