# Sin7Y Tech Review (31): Looking into Lookup Arguments ![](https://hackmd.io/_uploads/rJPqao0_o.png) ## TL;DR As mentioned in the previous article [Hello, OlaVM!](https://hackmd.io/@sin7y/H1yPj_J8i), [OlaVM](https://olavm.org/)’s vision is to build a high-performance ZKVM, and this article will focus on one of the tools that make OlaVM high-performance, namely, Lookup Arguments. Lookup Arguments play an important role in reducing the size of the circuit, thereby improving Zero Knowledge efficiency, and it's widely used in the circuit design of ZKVMs. Throughout this article you'll learn more about the following: 1. What role do Lookup Arguments play in ZKVM? 2. [Plookup protocol](https://eprint.iacr.org/2020/315.pdf) principles 3. [Lookup Argument protocol](https://zcash.github.io/halo2/design/proving-system/lookup.html) principle of Halo 2 4. The connection between the two Lookup Argument algorithms ## The roles of a ZKVM The ZKVM utilizes Zero Knowledge to constrain all the execution processes of the VM, and the execution process of the VM can generally be divided into: instruction execution, memory access and built-in function execution. It is somewhat impractical to execute constraints on these operations in one trace. First of all, in a trace, a row represents an operation type, and one operation type corresponds to multiple constraints, and different constraints correspond to different numbers of columns, resulting in different widths. If one of the rows is too wide due to one constraint corresponding to too many columns, then the width of the entire trace is affected, becoming too large. Resource usage of this design is wasteful when the rows corresponding to the remaining constraints do not require so many columns. Then, if there are too many different operation types in a single trace, more selectors will be introduced, increasing not only the number of polynomials, but also the order of the constraint. Finally, due to the order limitation of the group, the number of rows of the trace itself cannot exceed the order of the group, so the number of trace rows occupied by a certain type of operation should be minimized. So, for simplicity, we need to: 1. Split different operation types into multiple sub-traces and prove them separately, with data consistency between the main trace and the sub-traces to be ensured by Lookup argument. 2. For some ZK-unfriendly computations, we can reduce the size of the trace by Lookup Argument techniques, such as bitwise operations. Of course, there are other technical means to reduce the size of the trace, and that will be explained further down in this article. ### Lookup between trace tables All the execution processes of the VM will form a complete trace, called "the main trace". "Complete" refers to that it contains all the states of the VM execution, but not auxiliary states, such as some extended information that is convenient for ZK verification. As mentioned earlier, including this auxiliary information in the main trace will make the main trace complex and difficult to constrain. Therefore, for the convenience of constraints, some sub-traces are usually established, and then constrained for these sub-traces respectively, and the main traces are mainly used to execute the correct program constraints and Context constraints. ![](https://i.imgur.com/RSCkz9G.png) By creating different sub-traces, we divide the different operations performed by the VM and ensure that the data in the sub-trace is derived from the main trace by using Lookup Argument techniques. For the data validity proof in the sub-trace, you need to generate different traces according to a specific operation type, and then use the corresponding constraints to prove the validity of these traces. In particular, for zk-unfriendly operations such as Bitwise and rangecheck. ### Lookup for ZK-unfriendly operations As mentioned earlier, the proofs of each sub-trace are independent, so getting the smallest possible trace will improve the efficiency of prover. Taking Bitwise as an example, Bitwise operations include AND, XOR, and NOT. If you simply want to implement the constraints of Bitwise operations through circuits, you may need to split each OP into multiple binary limbs, and if these OPs are 32 bits wide, they will be split into 32 limbs. Then, you need to constrain that: 1. Sumcheck: All bits combined together equals the original OP, $a = \sum2^ilimb\_a_{i}$ 2. Bitwise check per bit:$limb\_a_{i} * limb\_b_{i} = limb\_c_{i}$ A total of 2+32*3=99 trace cells, and the number of constraints is 3 * sumcheck +33* bitwise=35. If there are some truth tables at this time, for AND, XOR, and NOT calculations, you can define three tables, which contain data for bitwise calculations that refer to the op with a specified bit width, such as 8-bit. For 32-bit ops, you only need to split them into four 8-bit limbs, and then the bitwise relationship between these OP limbs does not need to be implemented with corresponding constraints, only needs to execute Lookup on the fixed table. At this time, a total of 3+4*3=15 trace cells are taken up with 3 constraints sumcheck and one Lookup argument (Batch Lookup is supported). ![](https://i.imgur.com/25tRV13.png) Utilizing Lookup Arguments is a huge boost not only for proofs of bitwise operations, but also for rangecheck operations. For a 32-bit OP, you only need to split it into two 16-bit limbs. There are two good designs here. One is that it allows rangecheck to take up fewer trace cells, and the other is that our customized ADD-MUL constraint can be reused for rangecheck’s sum constraint. For different calculation types, being able to reuse the same constraint is of great help to the overall efficiency improvement. As shown in the figure above, the customized ASS-MUL gate can support the constraint reuse of five calculation types: ADD, MUL, ADD-MUL, EQ, and RANGECHECK. ## Plookup Protocol ### Introduction Plookup is a protocol used to check that a polynomial $$f \in F_{<n}[x]$$ of the order less than n, the value on the multiplicative group $H$ of n order is contained in a d-dimensional table $F^d$. A typical scenario is to do a rangecheck in a zk-snark, verifying that the values of all polynomials over $H$ are on [0, m]. ### Symbol Description - $H = \{g,...,g^{n+1}\}$ is a multiplicative group of order n+1 on $F$, and for $f \in F[x]$, if $f_i = f(g^i), i \in [n+1]$ , then $f_i$ can be expressed as $f(g^i)$, where $[n+1]$ represents $\{1,2,3...,n+1\}$ - The vector $f \in F^n$ can also be expressed by the polynomial $f \in F_{<n}[X]$ , where $f(g^i) = f_i$ . - Given two vectors $f \in F^n$ , $t \in F^d$ , $f \subset t$ represents $\{f_i\}_{i\in[n]} \subset \{t_i\}_{i\in[d]}$. ### Pretreatment 1. Suppose $d = n+1$ - If $d\le n$ , repeat the last element $n-d+1$ times. - $d$ is at most $n+1$, because the group has only $n+1$ order. 2. The polynomial $t \in F_{<n+1}[X]$ represents the value of the lookup - $t$ is $n+1$ sets of finite field elements, $t.degree()<n+1$. 3. The input polynomial is represented by $f \in F_{<n}[X]$. ### Protocol Process 1. $s \in F^{2n+1} represents the combined vector of $(f,t)$, and then two polynomial $h_1、h_2$ on $F_{<n+1}[X]$ to represent $s$: $$h_1(g^i) = s_i, i \in [n+1]$$ $$h_2(g^i) = s_{n+i}, i \in [n+1]$$ - There are $2n+1$ elements in $s$. $h_1$ represents $\{s_1, s_2,...,s_{n+1}\},h_2$ represents $\{s_{n+1}, s_{n+2},....,s_{2n+1}\}$ which means: $$h_1(g^{n+1}) = h_2(g) = s_{n+1}$$ 2. $P$ calculates $h_1、h_2$ and sends it to a trusted third party $I$ 3. $V$ randomly selects random numbers $\beta、\gamma \in F$ in two finite fields $F$, and sends them to $P$ 4. $P$ uses $\beta and \gamma$ to calculate a polynomial $Z \in F_{<n+1}[X]$ , defined as follows: - $Z(g)=1$ - For :$i \in [2, n]$ $Z(g^i)=\frac{(1+\beta)^{i-1}\Pi_{j<i}(\gamma+f_j)\Pi_{1\le j<i}(\gamma(1+\beta)+t_j+\beta t_{j+1})}{\Pi_{1\le j<i}(\gamma(1+\beta)+s_j+\beta s_{j+1})(\gamma(1+\beta)+s_{n+j}+\beta s_{n+j+1})}$ - $Z(g^{n+1}) = 1$ 5. $P$ sends $Z$ to $I$ 6. $V$ checks the correctness of $Z$, for each $x \in H$ : - $L_1(x)(Z(x)-1) = 0$ - $(x-g^{n+1})Z(x)(1+\beta)(\gamma+f(x))(\gamma(1+\beta)+t(x)+\beta t(gx)) = (x-g^{n+1})Z(gx)(\gamma(1+\beta)+h_1(x)+\beta h_1(gx))(\gamma(1+\beta)+h_2(x)+\beta h_2(gx))$ - $L_{n+1}(x)(h_1(x) - h_2(gx)) = 0$ - $L_{n+1}(x)(Z(x) - 1) = 0$ 7. $V$ outputs acc if all the equations in step 6 are satisfied. ### Protocol Understanding 1. Define two polynomials $F$ and $G$: $$F(\beta,\gamma) = (1+\beta)^n\Pi_{i \in [n]}(\gamma+f_i)\Pi_{i \in [d-1]}(\gamma(1+\beta)+t_i+\beta t_{i+1})$$ $$G(\beta,\gamma) = \Pi_{i \in [n+d-1]}(\gamma(1+\beta)+s_i+\beta s_{i+1})$$ where $F \equiv G,d = n+1$,if and only if: - $f \subset t$ - $s$ is the ordering of $(f,t)$ with respect to $t$ This transfers the proof $f \subset t$ to the proof $F \equiv G$ , which is to prove that $s$ is a permutation of$(f,t)$. Prove that $a、b$ is the prerequisite for $F \equiv G$ to be true: $$F(\beta,\gamma)=(1+\beta)^{n+d-1}\Pi_{i \in [n]}(\gamma+f_i)\Pi_{i\in [d-1]}(\gamma + \frac{t_i+\beta t_{i+1}}{1+\beta})$$ $$G(\beta,\gamma) = (1+\beta)^{n+d-1}\Pi_{i \in [n+d-1]}(\gamma+\frac{s_i+\beta s_{i+1})}{1+\beta}$$ For any$j \in [d-1]$,there is $i \in [n+d-1]$that makes $(t_j,t_{j+1})=(s_i, s_{i+1})$, For (n*i) subscripts that are not listed above, there are always $s_i=s_{i+1}$and 2 sets of$\{s_i\} and\{f_i\}$ are equal,so $$\gamma+\frac{s_i+\beta s_{i+1}}{1+\beta}=\gamma+s_i=\gamma+f_{i'}。$$ the prerequisite is true. Please refer to the paper for proof of sufficient conditions. 2. What is $Z$ of $P$ in the calculation - Defined $Z(g^i)$ when $i \in [2, n-1]$: $$Z(g^{i+1}) = Z(g^i)\times \frac{(1+\beta)(\gamma+f_i)(\gamma(1+\beta)+t_i + \beta t_{i+1})}{(\gamma(1+\beta)+s_i+\beta s_{i+1})(\gamma(1+\beta)+s_{n+i}+\beta s_{n+i+1})}$$ - When $i=1$ , $Z(g)=1$, and when $i=2$, $$Z(g^2) = \frac{(1+\beta)(\gamma+f_1)(\gamma(1+\beta)+t_1 + \beta t_{2})}{(\gamma(1+\beta)+s_1+\beta s_2)(\gamma(1+\beta)+s_{n+1}+\beta s_{n+2})}$$ So the relations $Z(g^{i+1}) and Z(g^i)$ are satisfied for $i \in [1, n-1]$ . When $i=n$ , $$Z(g^{n})=\frac{(1+\beta)^{n-1}(\gamma+f_1)..(\gamma+f_{n-1})\times(\gamma(1+\beta)+t_1+t_2)..(\gamma(1+\beta)+t_{n-1}+t_{n})}{(\gamma(1+\beta)+s_1+\beta s_2)..(\gamma(1+\beta)+s_{n-1}+\beta s_n)\times (\gamma(1+\beta)+s_{n+1}+\beta s_{n+2})..(\gamma(1+\beta)+s_{2n-1}+\beta s_{2n})}$$ $$Z(g^{n+1}) = Z(g^n)\times \frac{(1+\beta)(\gamma+f_n)(\gamma(1+\beta)+t_n + \beta t_{n+1})}{(\gamma(1+\beta)+s_n+\beta s_{n+1})(\gamma(1+\beta)+s_{2n}+\beta s_{2n+1})}=\frac{F}{G}=1$$ So the relations between$Z(g^{i+1}) and Z(g^i)$ are satisfied for $i \in [1, n]$. 3. What $V$ is verifying: - $V$ checks a and d to verify that $Z(g)=1$ and $Z(g^{n+1})=1$ - $V$ checks c to verify that $h_1(g^{n+1}) = h_2(g)$. - $V$ checks b to verify that $$Z(g^{i+1})=Z(g^i)\times \frac{(1+\beta)(\gamma+f_i)(\gamma(1+\beta)+t_i+\beta t_{i+1})}{(\gamma(1+\beta)+s_i+\beta s_{i+1})(\gamma(1+\beta)+s_{n+i}+\beta s_{n+i+1})}$$ is satisfied for $i \in [1, n]$. Further clarification of $Z$, $F$ and $G$: $$F_i=(1+\beta)(\gamma+f_i)(\gamma(1+\beta)+t_i+\beta t_{i+1})$$ $$G_i=(\gamma(1+\beta)+s_i+\beta s_{i+1})(\gamma(1+\beta)+s_{n+i}+\beta s_{n+i+1})$$ Prove: $\Pi_{i \in [1, n]}\frac{F_i}{G_i}=1$,let $U_i=\frac{F_i}{G_i}$,that is to prove $\Pi_{i \in [1,n]}U_i=U_1U_2...U_n=1$ let $Z(g)=1$, $Z(g^2)=U_1$, $Z(g^3)=U_1U_2$, ... $Z(g^n)=U_1U_2...U_{n-1}$ $Z(g^{n+1})=1$ And there is $Z(g^{n+1})=Z(g^n)\times U_n$,so $U_1U_2...U_n=1$ ## Halo2 Lookup Protocol ### Introduction The lookup protocol function of Halo2 is to, given two columns of data $A$ and $S$ with a length of $2^k$, proving that every cell in $A$ is in $S$, and some cells in $S$ can not be in $A$, and - $S$ can be fixed or variable. - The case where $S$ is variable is the inclusion of columns in trace, and the values are not fixed. - $A$ and $S$ can contain duplicate data, and if the size of $A$ and $S$ does not reach $2^k$, they need to be completed to $2^k$. - Fill $A$ with some data from $S$. - Complete $S$ with the last data. ### Protocol Process 1. Let $H=\{1,w,w^2,...,w^{2k-1}\}$ 2. $P$ calculates the permutations $A'$ and $S'$ of $A$ and $S$ - $A'$ is an ordering of $A$ that places the same values on adjacent cells and randomly ranks between different values - $S'$ is an ordering of $S$, where the first cell of the different values in $A'$ has to be the same as the value in the corresponding row in $S'$, and the other rows are random - $A'$ and S'$ satisfy $A'_i=S'_i$ or $A'_i=A'_{i-1}$ 3. $P$ calculates the polynomial $Z$: - $Z_0=Z_{2^k}=1$ - $Z_{i+1}=Z_i\times \frac{(A_i+\beta)(S_i+\gamma)}{(A'_i+\beta)(S'_i+\gamma)},i \in [0, 2^k)$ 4. $V$ verifies $A'$ and $S'$, and for each $x$ on $H$, it satisfies: - $(A'(X)-S'(X))(A'(X)-A'(w^{-1}X))=0$ - $L_0(X)(A'(X)-S'(X))=0$ 5. $V$ verifies $Z$, and for each $x$ on $H$, it satisfies : - $Z(wX)(A'(X)+\beta)(S'(X)+\gamma) - Z(X)(A(X)+\beta)(S(X)+\gamma)=0$ - $L_0(X)(1-Z(X))=0$ 6. $V$ outputs acc if both the equations in 4 and 5 are satisfied. **The protocol only checks against the expanded relationship between $A$ and $S$. How is it guaranteed that the expansion is valid for $S$ ?** Suppose A = {1,2} and S = {3,4}, both of which do not satisfy the subset argument, are expanded to become A = {1,2,3,4} and S = {1,2,3,4}, which can be justified by the subset argument. **It is unreasonable, but how can it be proved to pass?** ### Support ZK - Finally, add $t$ rows of random numbers, and only use the first $0~u=2^k-t-1$ rows to store the data - Put $Z(x)$ in row $u$ - Polynomials $q_{blind}$ set the last $t$ row to 1 and the others to 0 - Polynomials $q_{last}$ set the row $u$ to 1 and everything else to 0, which is the last row of data cell - Change the $V$ validation to - $(1-(q_{last}(X)+q_{blind}(X)))(Z(wX)(A'(X)+\beta)(S'(X)+\gamma) - Z(X)(A(X)+\beta)(S(X)+\gamma))=0$ - $(1-(q_{last}(X)+q_{blind}(X)))(A'(X)-S'(X))(A'(X)-A'(w^{-1}X))=0$ - Why add $q_{last}(X)$ here? The row u stores $Z(wx)$ and it does not need to participate in validation - $L_0(X)(1-Z(X))=0$ - $L_0(X)(A'(X)-S'(X))=0$ - $q_{last}(X)(Z(X)^2-Z(X))=0$ - $Z(w^u)$ can be 1or 0, as there may be $A_i+\beta=0$ or $S_i+\gamma=0$ ## Extend - 1 : Vector Lookup Suppose P has $\omega$ polynomials $f_1,...,f_{\omega} \in F_{<n}[X]$ and a data table $t^* \in (F^{\omega})^d$ of $d$ rows and $\omega$ columns, and to prove that $\forall j \in [n], (f_1(g^j),...,f_{\omega}(g^j)) \in t^*$, you need to first convert and $\{f_i\}_{i\in[\omega]}$ to $\{t_i\}_{i\in[\omega]}$ the previous column of data. 1. Calculate $\omega$ polynomials for the column $\omega$ of data in $t^*$, and the column $i$ of polynomials $t_i \in F_{<d}[X],t_i(g^j) = t^*_{i,j}, j \in [d]$ 2. $V$ selects a random number $\alpha$ to send to $P$ 3. $P$ recalculates the polynomial and data that require lookup - $f := \Sigma_{i \in \omega} \alpha^if_i$ - $t := \Sigma_{i \in \omega} \alpha^it_i$ 4. $P$ and $A$ use $f$ and $t$ and for the previous lookup process ## Extend - 2 : Multi-tables Further, suppose $P$ has multiple data tables $t^*_1,...,t^*_l$, where $t^*_i \in (F^{\omega})^{d/l}$ for each $i \in [n]$ we need to prove that $j = j(i) \in [l], (f_1(g^j),...,f_{\omega}(g^j)) \in t^*_j$ . For example, prove that every row of data in the witness is in some data table. Like a vector lookup, you want to transfer this into a case with only one polynomial and one column of data. 1. Form $t^*_1,...,t^*_l$ tables into $t^* \in (F^{\omega+1})^d$ , and add a column to each subtable - For each $j \in [l],i \in [d/l]$ , there is a row of elements $(j, (t^*_j)_i)$ in $t^*$ 2. Calculate the polynomial $q \in F_{<n}[X]$, $q(g^i) = j(i)$ 3. Use Vector lookup to prove that $(q(g^i), f_1(g^i),...,f_{\omega}(g^i)) \in t^*, i \in [n]$ ## Comparison between Plookup and Lookup Both the Plookup protocol and Halo2’s Lookup protocol can prove that $f \subset t$, but the ideas behind the two protocols are different and their differences are as follows: - Plookup needs $f$ and $t$ to build a new sequence $s$, where both the elements in $f$ and $t$ appear at least once in $s$. It then proves $s \subset t$ by comparing the non-zero distance sets of elements in $s$ are equal, and finally $f \subset s \subset t \to f \subset t$. - Halo2’s lookup proves $f \subset t$ directly, without the need to construct a new sequence, and is more concise than Plookup. - Both Plookup and Halo2 lookup require sorting and completing the set. Once completed, Plookup is $|t| = |f| + 1$, and Halo2 Lookup is $|t| = |f| = 2^k$.