to GKR or not - thoughts on GKR

The Lasso use case

We analyze the advantage of GKR when computing a randomized grand product - as comes up for mutliset equality checks.
tl;dr - seems GKR is significantly cheaper - unless we use sumcheck and assume our input vector is split to 7-8 columns.
An optimized WIP version of GKR might make GKR significantly cheaper also in this case.

The setting: we have a commitment to a vector \(f=(f_1,\ldots,f_n)\),
a verifier challenge \(\gamma\),
and wish to prove the correctness of the value
\(c=\prod_{i\in [n]}(f_i+\gamma)\).

"standard approach":

The plonk approach is to commit
to the vector \(Z\)
with \(Z_i=\prod_{j\leq i} (f_i+\gamma)\).
i.e. a size \(n\) MSM with full scalars.
So the cost is an \(n\) size MSM with 256 bit scalars. According to this that costs us roughly \(256\cdot n/\log (n)\) group adds, which according to this
costs roughly \(6\cdot 256 n/\log n= 1536 n/ \log n\) field mults.

In the univariate setting, we have to add the cost of field operations to compute the
quotient \(Q\) showing \(Z\) is well-formed.
Say \(H\) is a subgroup of size \(n\) with lagrange base \(L_1(X),\ldots,L_{n}(X)\) and generator \(\omega\).
\(Q\) will look something like (see sec 5 and 8.3 of plonk paper for similar computations)
\[Q(X)=\frac{L_1(X)(Z(X)-1)+\alpha\cdot \prod_{i\in [n]}[Z(\omega X)-Z(X)f(X)]}{Z_{H}(X)} \]
for a verifier challenge \(\alpha\).
To interpolate \(Q\) we need to evaluate it at \(n\) points.
For this we first must evaluate \(Z\) at a coset of \(H\)
requiring \(2n\log n\) mults in IFFT and FFT after \(n\) mults for evaluating \(Z\) on \(H\). Each evaluation then requires \(3\) mutliplications
and one inversion.
Using batch inversion, that is roughly \(6n\) mults in total.
Giving us \(1536 n/ \log n + 7n + 2n\log n\) in total.
In a mutlivariate sumcheck context, the FFT for \(Z\) is removed
so this is "just" \(1536 n/ \log n +7n\)

If we take e.g. \(n=2^{20}\) this is \(123.8 n;83.8n\).
Or for \(n=2^{22}\), \(118.8 n;84.8n\).

Trying to improve the bound:

We could instead try for parameter \(t\), to have a \(Z\) of size \(n/t\) that accumulates
\(t\) values of \(f\) in each new value.
So that \(Z_i\) is the product of the first \(ti\) values of \(f\).
Setting \(n'=n/t\), the MSM cost for \(Z\) will be \(1536 n'/ \log n'\)

Now the quotient \(Q\) for checking well formedness will have degree \(tn\)
and the form
\[Q(X)=\]
\[\frac{L'_1(X)(Z(X)-1)+\alpha\cdot \prod_{i\in [n']}Z(\tau X)-Z(X)\prod_{j\in [t]}Z(X)f(\omega^{j-1}X)}{Z_{H'}(X)}, \]
where \(H'\) is a subgroup of size \(n'\) contained in \(H\) with generator \(\tau\).

An evaluation of \(T\) now requires \(t+2\) multiplications, and one inversion and we need \(tn\) evals.
Ignoring FFT costs,using batch inversion, and slightly simplfiying this means we need in total roughly
\(1536 n/(t \log n) + (t+5)tn\) multiplications.
Evaluating for various \(t\) we say that
e.g.
when \(n=20\), this is minimized for \(t=3\) and gives 49.6n - or 89.6n with FFT's
when \(n=22\), this is minimized for \(t=3\) and gives 47.3n - or 91.3n with FFT's

A more favorable setting for partial products

Say we alter the problem definition - and assume \(f\) is given in advance as \(t\) commitments to segments of \(n/t\) of its values.
This is sometimes the case (e.g. in the plonk paper where we have \(f_L,f_R,f_O\)) but not entirely in our control.

\(T\)'s degree goes down below \(n\)
and our operation count goes down to \(1536 n/(t \log n) + (t+5)n\)

e.g.
when \(n=20\), this is minimized for \(t=9\) and gives 22.5n - or 62.5n with FFT's
when \(n=22\), this is minimized for \(t=8\) and gives 21.7n - or 65.7n with FFT's

It's worth noting that in the univariate case, we can use the fflonk commitment scheme to commit to the columns with one commitment and still open the individual column polynomials getting the reduced quotient degree giving these reduced numbers

GKR approach:

Thaler sec 5 gives an optimized version of GKR for this case. Namely, when our layered circuit is a binary tree of multiplications with \(n\) leaves. Thaler's result in sec 5.4 is quoted as \(O(n)\) prover field ops - an improvement over the CMT result of \(O(n\log n)\) for general GKR of a fan-in two layered circuit.
We'll try to give a more concrete estimate for the number of mults[1].

polys coming up in Thaler13 for tree of mults

Denoting by \(V_i\) the mle for the \(i\)'th layer at the circuit (thinking of the \(\log n\)'th layer as the input).
Thaler observes
that for the tree of mults circuit, for any \(z\):
\[V_i(z)=\sum_{p\in \{0,1\}^{s_i}}g^i_z(p)\]
where

\[g^i_z(p) := eq(z,p)V_{i+1}(p,0)V_{i+1}(p,1)\]
Most of the work in sec 5.4 is about computing all the required \(eq\) and \(V_{i+1}\) values in \(O(n)\) time via dynamic programming.
after which the \(g^i_z(p)\) values can be computed in two multiplications each:
image

Calculation shows there are in total \(3n\) \(g^i_z\) values that need to be computed (throughout all layers and rounds of each sumcheck, and using the fact that the univariates are linearly constrained and thus need three instead of four values to interpolate ): Hence requiring in total \(6n\) mults after required \(eq\) and \(V_{i+1}\) vals are given.

Now we need to estimate the constant in the dynamic programming for all the \(eq\) values. Looking at top of pg 21, we see the eq values are computed for the \(\log n - i\)'th layer in a tree with \(n/2^i\) leaves on the bottom, where leaves involve two multiplications and other nodes involve three mults.

This gives in total \(n+3n/2 + n/2 + 3n/4 + n/4 + 3n/8 .. =2n + 3n =5n\) mults

adding so far to \(11n\).

Finally, for each of the \(3n\) needed evaluations (again going over all rounds and layers)
of the form \(g^i_z(a)\) for some \(i,z,a\) we need \(V_{i+1}(a,0), V_{i+1}(a,1)\).
In Thaler13 these values are computed via a similar dynamic program for a more complex example than the product circuit.
Assuming one mult is needed per eval (TODO:check this) we're up to \(11n+6n=17n\).

Amortization:
Thaler mentions that when running several GKR protocols in parallel we can compute the same eq values for all of them, bringing the number down to \(12n\) per protocol execution.

Lasso paper thoughts -

Lasso paper says "Moreover, evaluation proofs exhibit excellent batching propertiesFor all of the above reasons, our accounting of prover cost in this work generally ignores the cost of polynomial
evaluation proofs."
It would be nice to have an estimate including opening costs: For GKR it might be batching different mutlilinear openings - different polys at different points. Hyperplonk sec 3.8 gives such batching at a cost of \(O(k n)\) field ops
where \(k\) is number of evaluation points and \(n\) is size of hypercube.
Although if all GKR's are run in parallel in terms of challenge generation for their \(i\)'th rounds, we might be able to manage same evaluation point for all. (We just need the bottom layer to be run in parallel as that's the one where we actually do a pcs query)

thx to Justin Thaler and Benedikt Bünz for comments. thx to Axiom for supporting this work


  1. We follow Thaler13 for the constant estimation here. Thaler reports a WIP with better constants. ↩︎

Select a repo