MV Lookup - HackMD

--- tags: zk-prover --- # MV Lookup Implementation: - https://github.com/scroll-tech/halo2/pull/49 - https://github.com/scroll-tech/halo2/pull/71 References: - multivariate lookup argument: https://eprint.iacr.org/2022/1530 - cq: https://eprint.iacr.org/2022/1763 ## Motivating Idea Let circuit cells carry elements in a finite field $\mathbb{F}$ and let $\{\omega^i\}_{i=0}^{n-1}$ be a one-parameter cyclic subgroup of $\mathbb{F}$ with size $n$. Input column $\mathcal{F}=\{f_i\}_{i\in [n]}$, table column $\mathcal{T}=\{t_i\}_{i\in [n]}$ and $m_i, i\in [n]$ is the number of times that table element $t_i$ appear in $\mathcal{F}$. The following two specifications happen in the computation of $m_i$: - (in https://github.com/scroll-tech/halo2/pull/49) if a table element $t$ appears in $\mathcal{T}$ multiple times as $t_{i_1}=t_{i_2}=...=t_{i_k}=t$, $i_1<...<i_k$, then only $m_{i_k}$ counts the number of times $t$ appear in $\mathcal{F}$ and all other $m_{i_1}=...=m_{i_{k-1}}=0$. - (in https://github.com/scroll-tech/halo2/pull/71) if a table element $t$ appears in $\mathcal{T}$ multiple times as $t_{i_1}=t_{i_2}=...=t_{i_k}=t$, $i_1<...<i_k$, then $m_{i_1}+...+m_{i_k}$ counts the number of times $t$ appear in $\mathcal{F}$. Then $\mathcal{F}\subseteq \mathcal{T}$ can be justified with overwelming probability if for a random challenge $\beta\in \mathbb{F}$ we have $$ \sum\limits_{i\in [n]} \dfrac{1}{\beta + f_i}=\sum\limits_{i\in [n]} \dfrac{m_i}{\beta + t_i} \ . $$ Let polynomial $f(X)$ be such that $f(\omega^i)=f_i$ for $i\in [n]$; polynomials $t(x), m(x)$ be such that $t(\omega^i)=t_i$ and $m(\omega^i)=m_i$ for $i\in [n]$. Then the above equality becomes $$\begin{array}{lr} \sum\limits_{i\in [n]} \dfrac{1}{\beta + f(\omega^i)}=\sum\limits_{i\in [n]} \dfrac{m(\omega^i)}{\beta + t(\omega^i)} \ . & \qquad \verb"(1)" \end{array}$$ In the circuit, input is the polynomial $f(X)$ and table is the polynomial $t(X)$. These polynomials may be expressions (via arithmetic operations such as addition, multiplication etc.) of "assignment polynomials", each of which is a polynomial corresponding a circuit column. When we refer to the degree of any of the input or table polynomials $f(X)$, $t(X)$ etc., it stands for the degree with respect to these "assignment polynomials". ## Constraint System Using the above rationale, the implementation works with batched inputs charactered as polynomials $f^1(X),...,f^K(X)$ and table polynomial $t(X)$. The multiplicity polynomial $m(X)$ is then constructed with $m(\omega^i)=m_i$, which is the number of times that table element $t_i$ appear in all of $\{f^k(\omega^i)\}_{i\in [n], k=1,...,K}$ (if a table element $t$ appears in $\mathcal{T}$ multiple times, we use the same logic as we mentioned in the previous section, so that only the last corresponding $m$ counts). This means that we need to justify a batched version of $\verb"(1)"$: $$ \sum\limits_{i\in [n]}\sum\limits_{k=1}^K \dfrac{1}{\beta + f^k(\omega^i)}=\sum\limits_{i\in [n]} \dfrac{m(\omega^i)}{\beta + t(\omega^i)} \ , $$ which is equivalent to $$\begin{array}{lr} \sum\limits_{i\in [n]}\left( \sum\limits_{k=1}^K \dfrac{1}{\beta + f^k(\omega^i)}-\dfrac{m(\omega^i)}{\beta + t(\omega^i)}\right)=0 \ . & \qquad \verb"(2)" \end{array}$$ The implementation uses a <i>running sum argument</i> for $\verb"(2)"$, meaning that we introduce a polynomial $\phi(X)$ interpolating at $\{\omega^j\}_{j=0}^{n-1}$ with values $$\begin{array}{lr} \phi(\omega^j)=\sum\limits_{i=0}^j\left(\sum\limits_{k=1}^K \dfrac{1}{\beta + f^k(\omega^i)}-\dfrac{m(\omega^i)}{\beta + t(\omega^i)}\right) \ , & \qquad \verb"(3)" \end{array}$$ which is the same as $$\phi(1)=0 \ , \ \phi(\omega^{i+1})-\phi(\omega^i)=\sum\limits_{k=1}^K \dfrac{1}{\beta + f^k(\omega^i)}-\dfrac{m(\omega^i)}{\beta + t(\omega^i)} \ , \ i=0,1,...,n-1 \ .$$ With this running sum polynomial, $\verb"(2)"$ is the same as $\phi(1)=\phi(\omega^n)=0$. It is easy to see that the so-constructed polynomial $\phi(X)$ satisfies the following polynomial identity: $$\begin{array}{lr} \begin{array}{ll} & (\beta+t(X))\prod\limits_{k=1}^K (\beta+f^k(X))\left(\phi(\omega X)-\phi(X)\right) \\ = & (\beta+t(X))\prod\limits_{k=1}^K (\beta+f^k(X))\left(\sum\limits_{k=1}^K \dfrac{1}{\beta + f^k(X)}-\dfrac{m(X)}{\beta + t(X)}\right) \ . \end{array} & \verb"(4)" \end{array}$$ The equation $\verb"(4)"$ is the main constraint used in this implementation. The Verifier can select a random value of $X=x\in \mathbb{F}$ to check the validity of $\verb"(4)"$ with overwhelming probability. ## The Protocol The details of the protocol such as polynomial commitments, multi-opening schemes and pairing checks are the same as in the original Halo2. We only sketch major steps as follows: - Step 0. Prover knows batched lookup input expressions $f^1(X),...,f^K(X)$ and table expression $t(X)$. He computes their commitments $\boldsymbol{f}^1,...,\boldsymbol{f}^K, \boldsymbol{t}$; - Step 1. Prover computes $m(X)$ based on occurances of $\left\{f^k(\omega_i)\right\}_{i\in [n], k=1,...,K}$ in the table. After that he computes the commitment $\boldsymbol{m}$ of $m(X)$ and send it to Verifier; - Step 2. Verifier randomly samples challenege $\beta\in \mathbb{F}$ and send it to Prover. Prover uses the value of $\beta$ to compute $\phi(X)$ based on $\verb"(3)"$. Then Prover computes the commitment $\boldsymbol{\phi}$ of $\phi(X)$ and send it to Verifier; - Step 3. Verifier sends a random challenge value $x\in \mathbb{F}$ to Prover. Prover uses the value $x$ to obtain evaluations $m(x)$ and $\phi(x)$, $\phi(\omega x)$. Prover sends these evaluations to Verifier; - Step 4. Verifier checks that $\verb"(4)"$ is valid when $X=x$ using the evaluations $f^1(x),...,f^K(x)$, $t(x)$, and the evaluations $m(x)$, $\phi(x)$, $\phi(\omega x)$ provided by the Prover and the random challenege $\beta$; - Step 5. Verifier checks $\phi(0)=\phi(\omega^n)=0$ (<i>This holds implicitly</i>); - Step 6. Prover and Verifier checks the consistencies of the commitments $\boldsymbol{f}^1,...,\boldsymbol{f}^K$, $\boldsymbol{t}$, $\boldsymbol{m}$ and $\boldsymbol{\phi}$ with the evaluations $f^1(x),...,f^K(x)$, $t(x)$, $m(x)$ and $\phi(x)$ via standard multi-opening schemes and pairing checks. ## Blinding Factors and Zero-Knowledge The above description does not enable zero-knowledge in the proof. Instead, the actual implementation enforces zero-knowledge by adding blinding rows in the circuit columns, which is the same method used in the original Halo2. This leads to modification of the constraint $\verb"(4)"$ as well as the protocol. To be more precise, we set $n=2^k$ and the last $t$ rows will be blinding rows that are assigned with random numbers. So the number of usable rows is $u=2^k-t-1$. Then we set two selector columns (1) $q_{\text{blind}}$ is set to $1$ on the last $t$ rows and $0$ elsewhere; (2) $q_{\text{last}}$ is set to $1$ only on row $u$ and $0$ elsewhere. These two selector columns will be interpolated to two polynomials $q_{\text{blind}}(X)$ and $q_{\text{last}}(X)$. The blinding rows affect all inputs polynomials $f^1(X),...,f^K(X)$ and table polynomials $t(X)$, as well as Prover constructed polynomials $m(X)$ and $\phi(X)$. The constraint $\verb"(4)"$ must be modified to $$\begin{array}{lr} \begin{array}{ll} & (1-q_{\text{last}}(X)-q_{\text{blind}}(X))(\beta+t(X))\prod\limits_{k=1}^K (\beta+f^k(X))\left(\phi(\omega X)-\phi(X)\right) \\ = & (1-q_{\text{last}}(X)-q_{\text{blind}}(X))(\beta+t(X))\prod\limits_{k=1}^K (\beta+f^k(X))\left(\sum\limits_{k=1}^K \dfrac{1}{\beta + f^k(X)}-\dfrac{m(X)}{\beta + t(X)}\right) \ , \end{array} & \verb"(5)" \end{array} $$ and with $\phi(1)=\phi(\omega^u)=0$. ## Optimizing input batch size according to degree constraints In the implementation, the batch input size $K$ is chosen via a greedy algorithm that complies with the circuit's maximal degree restriction. For each lookup with input $f(X)$ and table $t(X)$, its required degree $$\verb"LookupDeg"(f(X), t(X))$$ in constraint $\verb"(5)"$ is obtained from its LHS expression $$(1-q_{\text{last}}(X)-q_{\text{blind}}(X))(\beta+t(X))(\beta+f(X))\left(\phi(\omega X)-\phi(X)\right) \ ,$$ which is calculated by the formula $$\verb"LookupDeg"(f(X), t(X)) = \max\left(3, 2+\text{deg}(f(X))\right)+\text{deg}(t(X)) \ ,$$ where $\text{deg}(t(X))$ is the table degree and $\text{deg}(f(X))$ is the input degree, both counted in terms of "assignment polynomials". The $2$ comes from adding the degrees of $1-q_{\text{last}}(X)-q_{\text{blind}}(X)$ and $\phi(\omega X)-\phi(X)$ (their degrees with respect to "assignment polynomials" are both equal to $1$). Let the list of all inputs that lookup into the same table $t(X)$ be $f^1(X), ..., f^M(X)$. The implementation first scans through all these inputs and obtains $$\verb"MaxLookupDeg"=\max\limits_{i=1,...,M} \verb"LookupDeg"(f^i(X), t(X)) \ .$$ Let the maximal degree of all the circuit's custom gates be $\verb"MaxGateDeg"$. Then the minimal required degree of the circuit is given by $$\begin{array}{lr} \verb"required_degree"=\min\left\{2^k: k\geq 1, 2^k\geq \max(\verb"MaxLookupDeg", \verb"MaxGateDeg")\right\} . & \qquad \verb"(6)" \end{array}$$ Notice that for a batch of inputs $f^1(X),...,f^K(X)$ that lookup into the table expression $t(X)$, the required lookup degree in $\verb"(5)"$ is given by $$\begin{array}{lr} \max\left(3+K \ , \ \sum\limits_{k=1}^K \text{deg}(f^k(X))+\text{deg}(t(X))+2\right) \ . & \qquad \verb"(7)" \end{array}$$ So inserting a new input will increase the above quantity by at least the new input's degree. Using $\verb"required_degree"$ obtained in $\verb"(6)"$, the implementation splits all inputs $\{f^1(X)$ $, ...,$ $f^M(X)\}$ into batches in a greedy manner: it inserts as many inputs as possible into one batch, increasing the required lookup degree in $\verb"(5)"$ via $\verb"(7)"$ until the next insertion will have the value of $\verb"(7)"$ exceed $\verb"required_degree"$ in $\verb"(5)"$. Then it moves to the next batch and so on. This optimization of input batch size can yield nearly balanced degrees for constraint $\verb"(5)"$ for each batch of inputs. ## Comparison with the original Halo2's lookup argument Here we compare mv lookup with Halo2's original lookup argument. We assume there is lookup expression and one table expression and no add zero-knowlegde is added. ![](https://hackmd.io/_uploads/ryutiRHlT.png) - Prover's pre-processing data operations - Halo2 lookup: arrange $A(X), S(X)$ into $A'(X), S'(X)$ - mv lookup: compute the multiplicity $m(X)$ - Prover's Polynomial Commitments - Halo2 lookup: $A(X), S(X), A'(X), S'(X), Z(X)$ - mv lookup: $f(X), t(X), m(X), \phi(X)$ - Constraints - Halo2 lookup: $$\begin{array}{rcl} Z(\omega X)(A'(X)+\beta)(S'(X)+\gamma) - Z(X)(A(X)+\beta)(S(X)+\gamma) & = & 0 \\ (A'(X)-S'(X))(A'(X)-A'(\omega^{-1}X)) & = & 0 \\ \ell_0(X)(1-Z(X)) & = & 0 \\ \ell_0(X)(A'(X)-S'(X)) & = & 0 \end{array}$$ - mv lookup: $$\begin{array}{rcl} (\beta+t(X)) (\beta+f(X))\left(\phi(\omega X)-\phi(X)\right) -(\beta+t(X))(\beta+f(X))\left(\dfrac{1}{\beta + f(X)}-\dfrac{m(X)}{\beta + t(X)}\right) & = & 0 \\ \ell_0(X)\phi(X) & = & 0 \end{array} $$ - challenges sampled by verifier - Halo2 lookup: $\beta, \gamma$ - mv lookup: $\beta$ - Constraint order - Halo2 lookup: 3 - mv lookup: 3 - Scale with batch inputs - Halo2 lookup: Each lookup needs a separate constraint system - mv lookup: can put various lookups in one constraint at the cost of increasing the constraint order