# Plonk and PLookup
## 1. Framework and Compilers
### 1.1 Polynomial Protocol Framework
Here is a common framework used by Plonk and Plookup. We prove statements about secret polynomials $f_1,\ldots,f_t$ of degree $<d$. The statements are polynomial identities over some domain $H$ which involve public polynomials $g_1,\ldots,g_l$ of degree $<d$ and $G$:
$$
F(X)\overset{def}{=}G(X,h_1(v_1(X)),\ldots,h_M(v_M(X)))\equiv 0
$$
where $F$ has degree $<D$ and $h_i$ is some $f_{i_j}$ or $g_{i_j}$.
We start with an interactive proof protocol:
1. Prover $\mathbf{P}$ sends $f_1,\ldots,f_t$ to Checker $\mathbf{I}$.
2. Verifier $\mathbf{V}$ sends $\{F_i\}$ to $\mathbf{I}$.
3. $\mathbf{I}$ verifies identities and tells $\mathbf{V}$.
The first step in converting it into a practical protocol is to change the domain to the full space $\mathbb{F}$:
1. $\mathbf{P}$ sends $f_1,\ldots,f_t$ to $\mathbf{I}$.
2. $\mathbf{V}$ sends $a_1,\ldots,a_k$ to $\mathbf{P}$.
3. $\mathbf{P}$ computes the numerator $Z(X) = \prod_{a\in H}(X-a)$ and *quotient polynomial*
$$
T = \frac{\sum_i a_i F_i}{Z}
$$
and sends it to $\mathbf{I}$.
4. $\mathbf{I}$ verifies identity
$$
\sum_i a_i F_i(X) \equiv T \cdot Z
$$
and tells $\mathbf{V}$.
### 1.2 Two-party Compiler
The second step is to compile the polynomial protocol into a protocol with just two parties $\mathbf{P}$ and $\mathbf{V}$. This is done using polynomial commitments. We denote a commitment of $f$ by $\mathbf{cm}(f)$ and the opening at point $x$ protocol by $\mathbf{open}(f(x),x,\mathbf{cm}(f))$. The compiler works as follows:
1. Replace sending $f_i$ to $\mathbf{I}$ with sending $\mathbf{cm}(f_i)$ to $\mathbf{V}$.
2. To verify identity $F$, $\mathbf{V}$ selects random $x$ and sends it to $\mathbf{P}$.
3. $\mathbf{P}$ replies with $\{s_i=h_i(v_i(x))\}$ and engages in $\mathbf{open}$ for $s_i$.
4. $\mathbf{V}$ verifies that $G(x,s_1,\ldots,s_M)=0$.
The prover complexity consists of two parts:
* Computing $t$ commitments.
* Open $t$ polynomials in $t^*\leq M$ distinct points $v_i(x)$ and prove openings.
A version of pairing-based (groups $\mathbb{G}_1,\mathbb{G}_2,\mathbb{G}_T$), trusted setup [KZG10] commitment scheme gives the complexity
* $\sum_{i\leq t} deg(f_i)$ exponentiations in $\mathbb{G}_1$ to compute commitments;
* $\sum_{i\leq t^*} d_i$ exponentiations to open and prove, where $d_i$ is the maximum degree of a function opened at $i$-th point.
* $t+2t^*$ elements of $\mathbb{G}_1$ are transferred as commitments and openings.
Another optimization is to set $r<t^*$ points and openings so that $G$ reduces to a multivariate (of remaining points) polynomial $G'$ of degree 1. Then $\mathbf{V}$ can compute $\mathbf{cm}(G')$ using $s_i$ and $\mathbf{cm}(f_i)$ and then engage in another opening protocol to ensure it opens to 0 at $x$.
### 1.3 Non-interactive protocol
To compile into a non-interactive protocol, we note that all Verifier messages are public coins, and then replace them with Fiat-Shamir calls over the transcript of the protocol.
### 1.4 Zero Knowledge
## 2. Circuits and Identities
### 2.1 Circuits with Additions and Multiplications
Suppose $\mathbb{F}$ has a multiplicative subgroup $H$ of order $n-1$.
Consider an arithmetic circuit of $n$ gates representable in the following form ($i\in [n]$):
$$
(\mathbf{q}_{\mathbf{L}})_i\cdot \mathrm{x}_{\mathbf{a}_i} + (\mathbf{q}_{\mathbf{R}})_i\cdot \mathrm{x}_{\mathbf{b}_i} + (\mathbf{q}_{\mathbf{O}})_i\cdot \mathrm{x}_{\mathbf{c
}_i} + (\mathbf{q}_{\mathbf{M}})_i\cdot \mathrm{x}_{\mathbf{a}_i} \cdot \mathrm{x}_{\mathbf{b}_i}+\mathbf{q}_{\mathbf{C}_i}
$$
where $\mathbf{a},\mathbf{b},\mathbf{c}\in[m]^n$ (wire assignment vectors and $\mathbf{q}_{\mathbf{L}},\mathbf{q}_{\mathbf{R}},\mathbf{q}_{\mathbf{O}},\mathbf{q}_{\mathbf{M}},\mathbf{q}_{\mathbf{C}}$ are public vectors of coefficients (selector vectors).
Let $\mathfrak{S}$ be partition of $[3n]$ according to $\mathbf{a},\mathbf{b},\mathbf{c}$ (i.e. $m$ sets). Let $\sigma$ be a permutation on $[3n]$ such that it consists of $m$ cycles going over the elements of $\mathfrak{S}$.
Then we prove the following identities:
1. Circuit correctness:
$$
\mathbf{q}_Lf_L + \mathbf{q}_Rf_R + \mathbf{q}_Of_O+
\mathbf{q}_Mf_Lf_R +\mathbf{q}_C +PI =0
$$
where $\mathbf{q}_L,\mathbf{q}_R,\mathbf{q}_O,\mathbf{q}_M,\mathbf{q}_C$ interpolate the selector vectors and $f_L,f_R,f_O$ interpolate $\mathrm{x}_{\mathbf{a}},\mathrm{x}_{\mathbf{b}},\mathrm{x}_{\mathbf{c}}$.
For this identity the committed polynomials are $f_L,f_R,f_O$ of degree $n-1$.
2. Permutation check:
$$
\sigma(f_L,f_R,f_O)=(f_L,f_R,f_O)
$$
For this identity the committed polynomial is $Z$ of degree $n$.
3. Quotient polynomial
We compute and commit a polynomial $T$ of degree $3n-1$.
4. Total prover complexity is the sum of commitment degrees +1 ($7n+1$) plus sum of maximal degrees at evaluation points $x$ and $gx$, i.e. $3n+n+1 = 4n+1$, in total $11n+2$.
### 2.2 Identity for Permutation check
We check that
$$ \sigma(f_1,f_2,\ldots,f_k) \overset{?}{=} (g_1,g_2,\ldots,g_k)$$
1. Define $$
f_j' = f_j + \beta\cdot\underbrace{(j-1)n+\log_{\mathbf{g}} x}_{S_{ID_j}} + \gamma
$$ and
$$
g_j' = g_j + \beta\cdot\underbrace{\sigma((j-1)n+\log_{\mathbf{g}} x)}_{S_{\sigma_j}} + \gamma
$$
2. Define multiproduct $$
f' = \prod f_j';\quad g'= \prod g_j'.
$$
3. Define incremental product polynomial:
\begin{align*}
Z(\mathbf{g}^i) = f'(\mathbf{g})\cdot f'(\mathbf{g}^2)\cdots f'(\mathbf{g}^{i-1})/\left( g'(\mathbf{g})\cdot g'(\mathbf{g}^2)\cdots g'(\mathbf{g}^{i-1})\right).\\
\end{align*}
4. Prove identities:
\begin{align*}
[a=\mathbf{g}](Z(a)-1) &=0; \\
Z(a) f'(a) &=g'(a) Z(a\mathbf{g}).
\end{align*}
The correctness as follows. Let $\sigma(i)\neq i$ for some $i$. Then an elementary proof implies that $f'\neq g'$, which means that the second equation can not hold with the other.
### 2.3 Table Identity
Let $d\leq n$. We can prove that $\mathrm{x}_{i\in[n]}\subset T=\{t_i\}_{i\in[d]}$ for all $i$. Interpret $T$ as vector $\mathbf{t}$ and let $\mathbf{s}$ be the vector $\mathbf{x}||\mathbf{t}$ sorted by $\mathbf{t}$ (i.e. the values appear in the same order as in $\mathbf{t}$). Let $f$ interpolate $\mathrm{x}$, $h_1$ interpolate $\mathbf{s}$ on first $n+1$ points, and $h_2$ interpolate $s$ on last $n+1$ points. Then $\mathbf{P}$:
1. Sends polynomials $f,h_1,h_2$ to $\mathbf{I}$;
2. Gets $\beta,\gamma$ from $\mathbf{V}$.
3. Computes polynomial $Z'$ of degree $n$ using $\mathbf{s},\mathbf{x},\mathbf{t}$:
$$
Z(g)=Z(g^{n+1})=1;\quad Z(g^i) = \frac{(1+\beta)^{i-1}\prod_{j<i}(\gamma+f_j)\prod_{1\leq j<i}(\gamma(1+
\beta)+t_j+\beta t_{j+1})}{\prod_{1\leq j<i}(\gamma(1+\beta)+s_j+\beta s_{j+1})(\gamma(1+\beta)+s_{n+j}+\beta s_{n+j+1})}
$$
4. Sends $Z'$ to $\mathbf{I}$.
Then $\mathbf{V}$ asks $\mathbf{I}$ to verify 4 identities on $f,h_1,h_2,Z$, the highest degree is $3n+2$.
<!-- The complexity is determined by the identity check. We have the quotient polynomial of degree $3n+1-(n+1)=2n$. The commitment cost is $(n+1)+(n+1)+(n+1)+(2n+1)=5n+4$. Then $e_1=2n$ and $e_2=n+1$. In total we have complexity $8n+5$. -->
### 2.4 Vector Tables
We prove that $(\mathrm{x}_{a_i},\mathrm{x}_{b_i},\mathrm{x}_{c_i})\in T$ for $i \in [n]$ and $[a_i],[b_i],[c_i]$ being some vectors of length $n$.
1. Prover sends polynomials $f_{a},f_{b},f_{c}$ to $\mathbf{I}$.
2. $\mathbf{V}$ sends challenge $\delta$ to $\mathbf{P}$.
3. $\mathbf{P}$ computes table $T'=\{(T[i][1]+\delta T[i][2]+\delta^2 T[i][3])\}$.
4. $\mathbf{P}$ computes $Z'$ using $x=x_a+\delta x_b+\delta^2 x_c$ and sends to $\mathbf{I}$.
5. $\mathbf{V}$ asks the same 4 identities as for regular tables using $f=f_a+\delta f_b+\delta^2 f_c$.
### 2.5 Multiple Tables
Let us have $s$ tables $T_1,T_2,\ldots,T_s$.
We prove that $(\mathrm{x}_{a_i},\mathrm{x}_{b_i},\mathrm{x}_{c_i})\in T_{y_i}$ for $i \in [n]$ and $y_i\in [s]$. Note that this happens if and only if for any $\delta_1,\delta_2,\delta_3$ it holds that
$$
x_i = \mathrm{x}_{a_i}+\delta_1\mathrm{x}_{b_i}+\delta_2\mathrm{x}_{c_i}+\delta_3 y_i\in T'
$$
where
$$
T'=\{T_{j}[k][1]+\delta_1T_{j}[k][2]+\delta_2T_{j}[k][3]+\delta_3j\}_{ j\in[s],\;k\text{ goes for all rows of }T_y}
$$
Let $f_t$ be a degree-$n$ interpolant of $[y_0,y_1,\ldots,y_{n-1}]$: $f_t(g^i)= y_i$.
1. Prover sends polynomials $f_{a},f_{b},f_{c}$ to $\mathbf{I}$.
2. $\mathbf{V}$ sends challenges $\delta_1,\delta_2,\delta_3$ to $\mathbf{P}$.
3. $\mathbf{P}$ computes table $T'=\{T_{j}[k][1]+\delta_1T_{j}[k][2]+\delta_2T_{j}[k][3]+\delta_3j\}$.
4. $\mathbf{P}$ computes $Z'$ using $x=x_a+\delta_1 x_b+\delta_2 x_c+ \delta_3 y$ and sends to $\mathbf{I}$.
5. $\mathbf{V}$ asks the same 4 identities as for regular tables using $f=f_a+\delta_1 f_b+\delta_2 f_c+\delta_3 f_t$.
## 3.Poseidon
### 3.1 Specification
For Poseidon of width $3$ (processing $(2$ elements per call).
1. $\mathbf{X}=(X_1,X_2,X_3)$ is state.
2. For $1\leq i \leq 4$:
2.1 $X_j \leftarrow (X_j)^5$;
2.2 $\mathbf{X}\leftarrow A\cdot \mathbf{X} +\mathbf{C}_i$ where $A$ is matrix, $\mathbf{C}_i$ are round constants.
3. For $5\leq i \leq 61$:
3.1 Only the last element is changed: $X_3 \leftarrow (X_3)^5$;
3.2 $\mathbf{X}\leftarrow A\cdot \mathbf{X} +\mathbf{C}_i$.
4. For $62\leq i \leq 65$:
4.1 $X_j \leftarrow (X_j)^5$;
4.2 $\mathbf{X}\leftarrow A\cdot \mathbf{X} +\mathbf{C}_i$
### 3.2 Regular Plonk prover cost
We need $24+57\cdot 3$ multiplications, and for each round we need 6 fan-in-2 additions in the linear phase. Thus in total we need $195 + 65
\cdot 6 = 585$ gates.
Thus the prover cost is $11\cdot 585+2 = 6437$ exponentiations.
### Optimized Plonk implementation
We designate one polynomial per state variable $X_i$. Each round is described as an equation of degree 5 over $f_1,f_2,f_3$. For $R$ rounds we have a round equation of degree $6R$, which is the only polynomial identity we need. Thus $T$ is of degree $5R$.
Overall we spend $8R$ exponentiations for commitments and further $6R$ for openings. In total the cost is $910$, i.e. 7 times smaller than for regular Plonk.
## 4.SHA-256
### Method 1
We implement SHA-256 with lookups and with 4 state polynomials $x_L,x_R,x_S,x_T$.
We have the only type of constraints:
* Table:
$$
(\mathrm{x}_{\mathbf{L}_i},\mathrm{x}_{\mathbf{R}_i},\mathrm{x}_{\mathbf{S}_i},\mathrm{x}_{\mathbf{T}_i})\in T
$$
<!-- Note that we can satisfy both at the same time by setting $(\mathbf{q}_{\mathbf{L}})_i=(\mathbf{q}_{\mathbf{R}})_i=(\mathbf{q}_{\mathbf{S}})_i=(\mathbf{q}_{\mathbf{T}})_i=0$ for table gates and
-->
We aim to fit $2^{12}$ gates so consider tables of size $2^{12}$ and less. We split each 32-bit of a state register into 6-bit chunks $A^1,A^2,A^3,A^4,A^5,A^6$.
All rounds are implemented in the same way, we assume that the input is already split into 48 chunks of the state and 6 chunks of the message.
* $\Sigma_0,\Sigma_1$ with 21 lookups each:
* We split the input of $\Sigma_i$ into three chunks of 5 bits and three chunks of 6 bits each and represent the function as a 3-XOR of 11-to-32 bit mappings, where each mapping consists of 3 2-to-2 lookups, 9 lookups in total:$$\Sigma_{i,j}(A^{2k},A^{2k+1})=(\widehat{A}^{2k,j},\widehat{A}^{2k+1,j});$$
* We use XOR tables to compute 6-bit 2-XOR. We need 6 6-bit 3-XORs, or 12 2-XORs, so 12 more lookups.
* Maj function: for each 5/6-bit chunk compute 3 ANDs and 2 2-XORs, thus 30 lookups.
* Ch function: for each 5/6-bit chunk compute AND, NAND and 2-XOR, thus 18 lookups.
* Addition: we need 6-addition to compute the new E value as $E_{new} = D+H'$ where $H'=\Sigma_1+W+Ch+K+H$
* We implement addition with carry:
$$
ADD(x,y) = (x+y\bmod{2^6},(x+y)/2^6).
$$
* We need 4 such additions to compute one chunk of $H'$, one more addition for $E_{new}$ and then we aggregate carries with fan-3 addition lookup, thus spending 2 lookups. In total we need $7\cdot 6=42$ lookups for $E_{new}$.
* We compute $A_{new}=H'+Maj+\Sigma_0$ spending 2 more additions and one more carry aggregator. In total we need $18$ lookups for $A_{new}$.
* In total 60 lookups for modular addition.
* Therefore we compute one round with 150 lookups.
* A message schedule round can be implemented the same way as $\Sigma_0$. This costs 21 lookups.
In total we spend $64\cdot 150 + 21\cdot 48 = 10608$ gates.
#### Potential improvements
* We can drop permutation polynomials since all rounds are the same but we will have to open commitments in significantly more points.
* We can use field additions and then carry extraction instead of lookup addtions. This might save 10-12 gates per round (<10%).
### Method 2
We implement SHA-256 with lookups and with 4 state polynomials $x_L,x_R,x_S,x_T,x_U$.
We have the only type of constraints:
* Table:
$$
(\mathrm{x}_{\mathbf{L}_i},\mathrm{x}_{\mathbf{R}_i},\mathrm{x}_{\mathbf{S}_i},\mathrm{x}_{\mathbf{T}_i},\mathrm{x}_{\mathbf{U}_i})\in T
$$
<!-- Note that we can satisfy both at the same time by setting $(\mathbf{q}_{\mathbf{L}})_i=(\mathbf{q}_{\mathbf{R}})_i=(\mathbf{q}_{\mathbf{S}})_i=(\mathbf{q}_{\mathbf{T}})_i=0$ for table gates and
-->
We aim to fit $2^{12}$ gates so consider tables of size $2^{12}$ and less. We split each 32-bit of a state register into 4-bit chunks $A^1,A^2,\ldots,A^8$.
All rounds are implemented in the same way, we assume that the input is already split into 64 chunks of the state and 8 chunks of the message.
* $\Sigma_0,\Sigma_1$ with 20 lookups each:
* We group the input of $\Sigma_i$ into four blocks of two chunks and represent the function as a 4-XOR of 8-to-32 bit mappings, where each mapping consists of 4 2-to-2 lookups, 12 lookups in total:$$\Sigma_{i,j}(A^{2k},A^{2k+1} )=(\widehat{A}^{2k,j},\widehat{A}^{2k+1,j});$$
* We use 3-XOR tables to compute 4-bit 3-XOR. We need 8 more lookups.
* Maj function: 8 3-to-1 4-bit lookups.
* Ch function: 8 3-to-1 4-bit lookups.
* Addition: we need 6-addition to compute the new E value as $E_{new} = D+H'$ where $H'=\Sigma_1+W+Ch+K+H$
* We implement 3-addition with carry:
$$
ADD(x,y,z) = (x+y+z\bmod{2^4},(x+y+z)/2^4).
$$
* We need 2 such additions to compute $E_{new}$ and then we aggregate carries with fan-4 addition lookup, thus spending 2 lookups. In total we need $4\cdot 8=32$ lookups for $E_{new}$.
* We compute $A_{new}=H'+Maj+\Sigma_0$ spending 2 more additions and one more carry aggregator. In total we need $24$ lookups for $A_{new}$.
* In total 56 lookups for modular addition.
* Therefore we compute one round with 112 lookups.
* A message schedule round can be implemented the same way as $\Sigma_0$. This costs 20 lookups.
In total we spend $64\cdot 112 + 20\cdot 48 = 8128$ gates. However, we use a wider state (5 polys vs 4). We also use several tables of size $2^{12}$, though $2^{11}$ should also suffice for $\Sigma_i$.
## AES
We can implement AES with S-box lookup tables. We consider the following equations:
$$
A\cdot S(\mathbf{x}_r) = \mathbf{x}_{r+1} +\mathbf{k}_{r+1}.
$$
### Method 1
In this method we proceed as follows:
* Split MC-SR-SB-box outputs of form $a_iS(x_j)$ into 4-bit nibbles $E_{a_i,L},E_{a_i,R}$.
* Convert each nibble bit into base-6 representation, so that the expanded nibble $L_{a_i,\cdot}$ is in $\mathbb{Z}_6^4$.
* Compute each AK output expanded nibble as integer sum of four expanded nibbles plus key nibble.
* Collapse expanded nibbles back into nibbles by evaluating each 3 bits modulo 2.
* Compute S-box value as lookup over three nibbles.
We prepare the following tables $(x,y\in Z_{16})$:
* $T_1:\; (x,y,L_{1,L}(x,y),L_{1,R}(x,y))$ -- for S-box multiplied by 1;
* $T_2:\; (x,y,L_{2,L}(x,y),L_{2,R}(x,y))$ -- for S-box multiplied by 2 in $GF(256)$;
* $T_3:\; (x,y,L_{3,L}(x,y),L_{3,R}(x,y))$ -- for S-box multiplied by 3 in $GF(256)$;
* $XOR:\; (x_1||x_2||x_3||x_4, (x_1\bmod{2})||(x_2\bmod{2})||(x_3\bmod{2})||(x_4\bmod{2}))$ with $x_i\in Z_6$.
* The table entries will occupy $\mathrm{x}_{\mathbf{a}_i},\mathrm{x}_{\mathbf{b}_i},\mathrm{x}_{\mathbf{l}_i},\mathrm{x}_{\mathbf{s}_i}$, whereas the table number will be chosen by $\mathrm{q}_{\mathbf{c}_i}$
Our common equation form:
$$
(\mathbf{q}_{\mathbf{a}})_i\cdot \mathrm{x}_{\mathbf{a}_i} + (\mathbf{q}_{\mathbf{b}})_i\cdot \mathrm{x}_{\mathbf{b}_i} + (\mathbf{q}_{\mathbf{c}})_i\cdot \mathrm{x}_{\mathbf{c
}_i} + (\mathbf{q}_{\mathbf{S}})_i\cdot \mathrm{x}_{\mathbf{s}_i} +(\mathbf{q}_{\mathbf{L}})_i\cdot \mathrm{x}_{\mathbf{l
}_i} =0
$$
Then we have the following equations in each round:
* 32 $T_1$ entries: \begin{align}(\mathbf{q}_{\mathbf{a}})_i=(\mathbf{q}_{\mathbf{b}})_i&= (\mathbf{q}_{\mathbf{s}})_i=(\mathbf{q}_{\mathbf{l}})_i=0, (\mathbf{q}_{\mathbf{c}})_i=1\\ (\mathrm{x}_{\mathbf{a}_i},\mathrm{x}_{\mathbf{b}_i},\mathrm{x}_{\mathbf{s}_i},\mathrm{x}_{\mathbf{l}_i},\mathrm{x}_{\mathbf{c}_i})&\in T\end{align}
* 16 $T_2$ entries: $(\mathbf{q}_{\mathbf{a}})_i=(\mathbf{q}_{\mathbf{b}})_i= (\mathbf{q}_{\mathbf{s}})_i=(\mathbf{q}_{\mathbf{l}})_i=0$, $(\mathbf{q}_{\mathbf{c}})_i=2$;
* 16 $T_3$ entries: $(\mathbf{q}_{\mathbf{a}})_i=(\mathbf{q}_{\mathbf{b}})_i= (\mathbf{q}_{\mathbf{s}})_i=(\mathbf{q}_{\mathbf{l}})_i=0$, $(\mathbf{q}_{\mathbf{c}})_i=3$;
* 4 sums to get a nibble after AK, per nibble (i.e. 128 equations) $(\mathbf{q}_{\mathbf{a}})_i=(\mathbf{q}_{\mathbf{b}})_i=(\mathbf{q}_{\mathbf{c}})_i=1$, $(\mathbf{q}_{\mathbf{s}})_i=(\mathbf{q}_{\mathbf{l}})_i=0$;
* 32 XOR entries, one per nibble. $(\mathbf{q}_{\mathbf{a}})_i=(\mathbf{q}_{\mathbf{b}})_i= (\mathbf{q}_{\mathbf{s}})_i=(\mathbf{q}_{\mathbf{l}})_i=0$, $(\mathbf{q}_{\mathbf{c}})_i=4$
In total we have 960 lookup gates and 1280 linear equations, the state has 5 state polynomials. The table size is $3\cdot 256 + 6^4 = 2064$. The total complexity is $17\cdot 2240=38080$ exponentiations without key schedule.
### Method 2
Don't split into nibbles, work with entire bytes but use XOR [more frequently](https://hackmd.io/m0fnJ_lPTPahWAhfaiQA7Q#With-smaller-38-sized-tables) due to base-3 representation. Then you use the same number of lookups, but tables have size $3^8=6561$.
$$
\textsf{LASKA}^{[love\text{ in Czech}]}:\quad \text{Lookup Arguments Suitable for Kingly Applications}
$$