Cryptographic work, such as commitments, is often the primary performance bottleneck in SNARKs. This cost is particularly pronounced when committed values are random and necessarily large, as is the case with PLONKish systems.
In recent years, a lot of innovation have been aimed at making commitments as cheap as possible. Circle STARKs and Binius, in particular, are hash-based and perfectly sound over small fields, M31 and GF\[2\] respectively. This means lower prover overhead and better hardware compatibility.
However, it should be noted that FRI, the scheme behind these improvements, involves superlinear-time procedures like FFTs, which become a new bottleneck when applied directly to large computations without recursion.
In this note, I will explore GKR, an interactive proof (IP) scheme that addresses cryptographic overhead differently—by nearly avoiding commitments in the first place.
See full article at https://taueflambda.dev/posts/gkr.
## Background
[GKR08](https://www.microsoft.com/en-us/research/wp-content/uploads/2008/01/GoldwasserKR08a.pdf) is a relatively old scheme that is based on multivariate polynomials and the sum-check protocol. A technology that was largely ignored in favor of simpler designs featuring univariate polynomials and [divisibility check](https://nmohnblatt.github.io/zk-jargon-decoder/definitions/vanishing_polynomial.html). Lately, however, it has seen renewed interest as projects like [Jolt](https://github.com/a16z/jolt), [Expander](https://expander.polyhedra.network/), and [Modulus](https://github.com/Modulus-Labs/Papers/blob/master/remainder-paper.pdf) have demonstrated not only its viability but also impressive performance results.
Notably, modern GKR-based systems demonstrate $O(n)$ prover complexity, which is linear in the size of the computation and with a constant factor overhead. For some applications, like matrix multiplication, the prover is less than 10x slower than a C++ program that simply evaluates the circuit, [Tha13](https://eprint.iacr.org/2013/351.pdf).
Furthermore, the aforementioned Binius is multivariate and it too involves sumcheck. Even StarkWare's Circle-Stark-based proof system called [Stwo](https://github.com/starkware-libs/stwo/tree/dev/crates/prover/src/core/lookups) used GKR for LogUp lookups. I think it is safe to say that there is currently a broad consensus on the relevance of GKR.
### Multilinear functions & MLEs
**Multilinear polynomial** is a multivariate polynomial that is linear in each variable, meaning it has degree at most one in each variable. When multilinear polynomials defined over the boolean domain, they have a much lower degree compared to the univariate case for the same domain size, [T23a](https://people.cs.georgetown.edu/jthaler/ProofsArgsAndZK.pdf#page=29).
For any multilinear polynomial $P\left(x_1, x_2, \ldots, x_v\right)$ over the boolean domain $\{0,1\}^v$ (the boolean hypercube), polynomial operations such as addition, multiplication, and evaluation can be performed using only the evaluations of $P$ on the boolean hypercube. This eliminates the need to explicitly reconstruct the polynomial from its evaluations, so there's no FFTs.
**Multilinear extension (MLE)** is used to translate functions into polynomials over the boolean domain $\{0,1\}^v$. Every function $f$ and vector $\vec{a}$ mapping from $\{0,1\}^v \rightarrow \mathbb{F}$ has exactly one extension polynomial that is multilinear.
The multilinear extension is a multivariate analog of the low-degree extensions (LDE) commonly present in STARKs. One thinks of $\operatorname{MLE}(f)$ as an "extension" of $a$, as $\operatorname{MLE}(a)$ "begins" with $a$ itself but includes a large number of additional entries. This distance-amplifying nature of MLE, combined with the [Schwartz-Zippel lemma](https://nmohnblatt.github.io/zk-jargon-decoder/definitions/schwartz_zippel.html), forms the first basic building block of multivariate interactive proofs and GKR in particular.
### Sumcheck
The Sumcheck protocol allows a **prover** $\mathbf{P}$ to convince a **verifier** $\mathbf{V}$ that the sum of a multivariate polynomial over boolean inputs $[x_i \in \{0, 1\}]$ is computed correctly.
$$
H := \sum_{(x_1,\cdots,x_v) \in \{0,1\}} g(x_1, x_2, \cdots, x_v)
$$
Sumcheck does not require $\mathbf{V}$ to know anything about the polynomial to which it is being applied. It is only until the final check in the protocol that, depending on the performance/succinctness tradeoff, $\mathbf{V}$ can choose to either request the polynomial and evaluate it directly or perform an oracle query, i.e., outsource the evaluation to $\mathbf{P}$ via a polynomial commitment scheme (PCS).
This is an interactive protocol: If $\ell$ is the number of variables in $g$, then $\ell$ rounds are required to complete the protocol. By applying the Fiat-Shamir transform, we can render the sumcheck non-interactive.
This post doesn't aim to explain the general workings of sumcheck. For that, I recommend [T23a](https://people.cs.georgetown.edu/jthaler/ProofsArgsAndZK.pdf#page=35) (section 4.1) or [this blog post](https://semiotic.ai/articles/sumcheck-tutorial/).
## GKR
In GKR, we work with layered circuits. A circuit is layered if it can be partitioned into layers such that every wire in the circuit is between adjacent layers, [L23](https://eprint.iacr.org/2023/1066.pdf#page=6). The number of layers is the depth of the circuit, denoted as $d$. Note that many of today's variations of GKR allow for more arbitrary topologies just as well.
The arithmetic circuit $\mathcal{C}$ encodes logic with addition and multiplication gates to combine values on the incoming wires. Accordingly, functions $\operatorname{add}_i$ and $\operatorname{mul}_i$ are gate selectors that together constitute the wiring predicate of layer $i$.
We encode wire values on a boolean hypercube $\{0,1\}^n$, creating multi-linear extensions $\widetilde{W}_i(x)$ for each layer $i$. The output is in $\widetilde{W}_0(x)$, and inputs will be encoded in $\widetilde{W}_d(x)$.
Gate selectors depend only on the wiring pattern of the circuit, not on the values, so they can be evaluated by the verifier locally, [XZZ19](https://eprint.iacr.org/2019/317.pdf#page=10). Each gate $a$ at layer $i$ has two unique in-neighbors, namely $\operatorname{in}_1(a)$ and $\operatorname{in}_2(a)$.
$$
\operatorname{add}_i(a, b, c)= \begin{cases}1 & \text { if }(b, c)=\left(\operatorname{in}_1(a), \operatorname{in}_2(a)\right) \\ 0 & \text { otherwise }\end{cases}
$$
and $\operatorname{mul}_i(a, b, c)=0$ for all $b, c \in\{0,1\}^{\ell_{i+1}}$ (the case where gate $a$ is a multiplication gate is similar), [T23a](https://people.cs.georgetown.edu/jthaler/ProofsArgsAndZK.pdf#page=63). Selector MLEs are sparse, with at most $2^{2\ell}$ non-zero elements out of ${0,1}^{3\ell}$.
$$
\begin{aligned}
\operatorname{add}_i(x, y, z) & = \begin{cases}1 & \widetilde{W}_i(x)=\widetilde{W}_{i+1}(y)+\widetilde{W}_{i+1}(z) \\
0 & \text { otherwise }\end{cases} \\
\operatorname{mul}_i(x, y, z) & = \begin{cases}1 & \widetilde{W}_i(x)=\widetilde{W}_{i+1}(y) \cdot \widetilde{W}_{i+1}(z) \\
0 & \text { otherwise }\end{cases}
\end{aligned}
$$
> Note, the above definition does not reflect how selector MLEs are computed in practice, it is an effective method for illustrating the relationship between selectors and wire values.
The GKR prover starts with the output claim and iteratively applies the sumcheck protocol to reduce it from one layer to the next until it arrives at the input. The values on the layers are related thusly:
$$
\widetilde{W}_i(x)=\sum_{y, z \in\{0,1\}^{\ell_{i+1}}}\operatorname{add}_i(x, y, z) \cdot(\widetilde{W}_{i+1}(y)+\widetilde{W}_{i+1}(z))+\operatorname{mul}_i(x, y, z) \cdot(\widetilde{W}_{i+1}(y) \cdot \widetilde{W}_{i+1}(z))
\tag{1}
$$
A few things to note here: First, notice how gate selectors are indexed—this aligns with the convention that selectors at layer $i$ determine how values from the next layer $i+1$ are combined. Notice also that the $i$-layer sumcheck is over boolean inputs of size $\ell_{i+1}$, i.e., the number of gates in the next layer.
## Protocol
1. $\mathbf{P}$ sends the output vector $\vec{\omega}$ and claims that $\tilde{\omega} = \widetilde{W}_0$.
2. $\mathbf{V}$ sends random $r_0 \in \mathbb{E}$ and computes $m_0 := \tilde{\omega}(r_0)$.
3. $\mathbf{P}$ and $\mathbf{V}$ apply sumcheck on the relation between $W_0$ and $W_1$ (using $f_{r_0}$ for the summand)
$$
\sum_{y, z \in\{0,1\}^{\ell_1}} f_{r_0}(y, z) \stackrel{?}{=} m_0
$$
4. $\mathbf{P}$ and $\mathbf{V}$ reduce two claims $W_{i+1}(b)$ and $W_{i+1}(c)$ to a single random evaluation/combination $m_i$.
5. $\mathbf{P}$ and $\mathbf{V}$ apply sumcheck on the reduced relation $m_i$, alternating steps 4-5 for $d-1$ more times.
6. $\mathbf{V}$ checks that $\widetilde{W}_d$ is consistent with the inputs vector $\vec{x}$.
#### 1. $\mathbf{P}$ sends the output vector $\vec{\omega}$ and claims that $\tilde{\omega} = \widetilde{W}_0$
First, the $\mathbf{P}$ has to evaluate the circuit at the given input vector $\vec{x}$. This is why the prover is always at least linear to the size of the computation (circuit).
A common notation would be to write the output vector $\vec{\omega}$ as a function $D: \{0,1\}^{\ell_0} \rightarrow \mathbb{F}$ mapping output gate labels to output values, $2^{\ell_0}$ of them in total. The gate labels are the vertices of the boolean hypercube; for example, for $\ell_0=2$, the 4 output labels are $00$, $01$, $10$, $11$.
In practice, though, $\mathbf{P}$ and $\mathbf{V}$ work with polynomials, not functions, so they have to compute multilinear extension $\tilde{\omega}$ from the vector $\vec{\omega}$. Same goes for the inputs $\vec{x} \rightarrow \widetilde{W}_d$. However, since computation on MLEs can be performed over evaluations, in practice no additional computation, like interpolation/IFFT, is even required!
#### 2. $\mathbf{V}$ sends random $r_0 \in \mathbb{E}$ and computes $m_0 := \tilde{\omega}(r_0)$
$\mathbf{V}$ picks a random challenge $r_0 \in \mathbb{E}$. Crucially, the soundness error of the sumcheck protocol is inversely proportional to the size of the field from which challenges $r_i$ are drawn (due to the Schwartz-Zippel lemma), [T23a](https://people.cs.georgetown.edu/jthaler/ProofsArgsAndZK.pdf#page=66). That's why in practice for challenge values, we use an extension field $\mathbb{E} := \mathbb{F}_{p^k}$ where $k$ is the extension degree.
For example, say we opt for M31 as a base field $\mathbb{F}_p$ whose order is $|\mathbb{F}_p| = p=2^{31}-1$ elements. Then, the probability of soundness error in sumcheck for an $\ell$-variate summand polynomial is $\frac{\ell}{2^{31}-1}$, which is too large, certainly for the non-interactive setting. Instead, we choose challenges from QM31—the extension field of M31 with $k=4$. Now, for any reasonable $\ell$, the soundness error can be considered negligible.
This soundness characteristic is a notable drawback of sumcheck-based systems, as it requires the GKR prover to work over extension fields after the second round, thus making field work the main contributor to the proving overhead. In Part 2, I'll cover the recent work from [Bagad et al](https://eprint.iacr.org/2024/1046.pdf) describing an algorithm that reduces the number of extension field operations by multiple orders of magnitude.
Also note that $\mathbf{V}$ cannot yet trust $\tilde{\omega}$ correctly to encode the outputs. The remainder of the protocol is devoted to confirming that the output claim is consistent with the rest of the circuit and its inputs.
#### 3. $\mathbf{P}$ and $\mathbf{V}$ apply sumcheck on the relation between $W_0$ and $W_1$
For the first layer, $\mathbf{P}$ and $\mathbf{V}$ run a sumcheck on Equation $(1)$ with $i=0$, between $W_0$ and $W_1$, with $x$ fixed to $r_0$.
$$
\sum_{y, z \in\{0,1\}^{\ell_1}} \operatorname{add}_{r_0}(y, z) \cdot\left(\widetilde{W}_1(y)+\widetilde{W}_1(z)\right)+\operatorname{mul}_{r_0}(y, z) \cdot\left(\widetilde{W}_1(y) \cdot \widetilde{W}_1(z)\right) \stackrel{?}{=} m_0
$$
In the first round of sumcheck, $\mathbf{P}$ uses $r_0$ to fix the variable $x$. In the remaining two rounds, $\mathbf{V}$ picks $b, c \in \mathbb{E}$ randomly to fix the variables $y$ and $z$, respectively. At the end of the sumcheck, from $\mathbf{V}$'s point of view, the relation on the sum (Equation $1$) is reduced to a simple check that the summand $f_{r_0}(b, c)$ evaluates to $m_0$.
To compute $f_{r_0}(b, c)$, $\mathbf{V}$ must compute $\operatorname{add}_{r_0}(b, c)$ and $\operatorname{mul}_{r_0}(b, c)$ locally. Remember that these depend only on the circuit's wiring pattern, not on the values. Since $f_{r_0}$ is recursive, $\mathbf{V}$ also asks $\mathbf{P}$ for the $\widetilde{W}_1(b)$ and $\widetilde{W}_1(c)$ values and computes $f_{r_0}(u, v)$ to complete the sumcheck protocol.
In this way, $\mathbf{P}$ and $\mathbf{V}$ reduce a claim about the output to two claims about values in layer 1. While $\mathbf{P}$ and $\mathbf{V}$ could recursively invoke two sumcheck protocols on $\widetilde{W}_1(b)$ and $\widetilde{W}_1(c)$ for the layers above, the number of claims and sumcheck protocols would grow exponentially in $d$. [XZZ19](https://eprint.iacr.org/2019/317.pdf#page=10)
#### 4. $\mathbf{P}$ and $\mathbf{V}$ reduce two claims $W_{i+1}(b)$ and $W_{i+1}(c)$ to a single random evaluation/combination $m_i$
To avoid an exponential blow-up in complexity, $\mathbf{P}$ and $\mathbf{V}$ apply a "rand-eval" reduction subroutine.
Here I will describe a method from [CFS17](https://eprint.iacr.org/2017/305.pdf) based on random linear combination (RLC), as it is more commonly found in modern implementations. Later, for completeness, I will also include the method based on line restriction, as described in the original paper [GKR08](https://www.microsoft.com/en-us/research/wp-content/uploads/2008/01/GoldwasserKR08a.pdf).
Given two claims $\widetilde{W}_1(b)$ and $\widetilde{W}_1(c)$, $\mathbf{V}$ picks random weights $\alpha_i, \beta_i \in \mathbb{E}$ and computes the RLC as
$$
m_i = \alpha_i \widetilde{W}_1(b)+\beta_i \widetilde{W}_1(c)
$$
In the next step, $\mathbf{V}$ would use $m_i$ as the claim for the $i$-th layer sumcheck.
#### 5. $\mathbf{P}$ and $\mathbf{V}$ apply sumcheck on the reduced relation $m_i$, alternating steps 4-5 for $d-1$ more times
For the layers $i=1, \ldots, d-1$, $\mathbf{P}$ and $\mathbf{V}$ would execute the sumcheck protocol on Equation $(2)$ instead of Equation $(1)$
$$
\begin{align}
&\alpha_i \widetilde{W}_i(b)+\beta_i \widetilde{W}_i(c) = \\
&\sum_{y, z \in\{0,1\}^n}\binom{(\alpha_i\cdot\operatorname{add}_i(b, y, z)+\beta_i\cdot\operatorname{add}_i(c, y, z)) \cdot(\widetilde{W}_{i+1}(y)+\widetilde{W}_{i+1}(z))}{+(\alpha_i\cdot\operatorname{mul}_i(b, y, z) +\beta_i\cdot\operatorname{mul}_i(c, y, z)) \cdot(\widetilde{W}_{i+1}(y) \cdot \widetilde{W}_{i+1}(z))}
\end{align}\tag{2}
$$
At the end of each layer's sumcheck protocol, $\mathbf{V}$ still receives two claims about $\widetilde{W}_{i+1}$, computes their random linear combination, and proceeds to the next layer above recursively until the input layer. [XZZ19](https://eprint.iacr.org/2019/317.pdf#page=10)
#### 6. $\mathbf{V}$ checks that $\widetilde{W}_d$ is consistent with the inputs vector $\vec{x}$
At the input layer $d$, $\mathbf{V}$ receives two claims $\widetilde{W}_d(b_{i=d})$ and $\widetilde{W}_d(c_{i=d})$ from $\mathbf{P}$. Recall that $\widetilde{W}_d$ is claimed to be a multilinear extension of the input vector $\vec{x}$.
If $\mathbf{V}$ knows all the inputs in clear, they can compute $\widetilde{W}_d$ and evaluate it at $b_{i=d}$ and $c_{i=d}$ themselves.
Alternatively, if $\mathbf{V}$ doesn't know all inputs and is instead given an input commitment $[w_d]$ for succinctness or zero-knowledge reasons, then $\mathbf{V}$ queries the oracle for evaluations of $\widetilde{W}_d$ at $b_{i=d}$ and $c_{i=d}$.
Ultimately, $\mathbf{V}$ outputs $\mathbf{accept}$ if the evaluated values are the same as the two claims; otherwise, they output $\mathbf{reject}$. [XZZ19](https://eprint.iacr.org/2019/317.pdf#page=11)
## Analysis
Let $\mathcal{C}: \mathbb{F} \rightarrow \mathbb{E}$ be a layered circuit of depth $d$ with $n := |\vec{x}|$ input variables, having $S_i$ gates in layer $i$. Naturally, $|\mathcal{C}| := \sum^d_{i=0} S_i$ is the total number of gates in the circuit, i.e., the circuit size.
- **Rounds:** $O(d\cdot \log |\mathcal{C}|)$.
- **$\mathbf{P}$ time:** $O(|\mathcal{C}|\cdot\log |\mathcal{C}|)$
- **$\mathbf{V}$ work:** $O(n+d\cdot\log |\mathcal{C}|+t+S_0) \approx O(n+d\cdot\log |\mathcal{C}|)$ for 1 output.
- **Communication:** $O(S_0+d\cdot\log |\mathcal{C}|)$ field elements.
- **Soundness error:** $O(\frac{d\cdot\log |\mathcal{C}|}{|\mathbb{E}|})$
The GKR protocol consists of one execution of the sumcheck protocol per layer. Therefore, the total communication cost (proof size) is $O(d \log |\mathcal{C}|)$ field elements. The accumulated soundness error is $O\left(\frac{d \cdot \log |\mathcal{C}|}{|\mathbb{E}|}\right)$ due to the Schwartz-Zippel lemma. [T23a](https://people.cs.georgetown.edu/jthaler/ProofsArgsAndZK.pdf#page=66)
The prover must first evaluate the circuit, which takes time $|\mathcal{C}|$. It must also compute $\operatorname{add}_i$ and $\operatorname{mul}_i$ at each layer, which, if done trivially, induces logarithmic overhead $\log |\mathcal{C}|$. The resulting time for the prover is $O(|\mathcal{C}| \cdot \log |\mathcal{C}|)$. [XZZ19](https://eprint.iacr.org/2019/317.pdf#page=10)
The verifier's work is $O(n + d \log |\mathcal{C}| + t + S_0)$, where $S_0$ is the number of outputs of the circuit, $t$ denotes the optimal time to evaluate all $\operatorname{add}_i$ and $\operatorname{mul}_i$, and the $n$ term is due to the time needed to evaluate $\widetilde{W}_d$. [XZZ19](https://eprint.iacr.org/2019/317.pdf#page=10)
Part 2 will cover modifications and techniques that result in $O(|\mathcal{C})$ prover.
## Zero Knowledge
To make the sumcheck protocol zero-knowledge, the polynomial in the sumcheck protocol is masked by a random polynomial.
To prove the sumcheck for the summand polynomial $f$ from Equation $(1)$ and a claim $W$ in zero-knowledge, $\mathbf{P}$ generates a random polynomial $\gamma$ with the same variables and individual degrees as $f$, commits to $\gamma$, and sends $\mathbf{V}$ a claim
$$\Gamma = \sum_{x_1, \ldots, x_{\ell} \in \{0,1\}^{\ell}} \gamma(x_1, \ldots, x_{\ell})$$
$\mathbf{V}$ picks a random number $\rho$, and together with $\mathbf{P}$ executes the sumcheck protocol on
$$
W+\rho\cdot\Gamma=\sum_{x_1, \ldots, x_{n} \in\{0,1\}^{\ell}} f\left(x_1, \ldots, x_{n}\right)+\rho\cdot\gamma\left(x_1, \ldots, x_{n}\right)
$$
In the last round of this sumcheck, $\mathbf{P}$ opens the commitment to $\gamma$ at $\gamma(r_1, \ldots, r_{\ell})$, and the verifier computes $f(r_1, \ldots, r_{\ell})$ by subtracting $\rho \cdot \gamma(r_1, \ldots, r_{\ell})$ from the last message, and compares it with $\mathbf{P}$'s original claim.
[Chiesa et al.](https://eprint.iacr.org/2017/305.pdf#page=11) showed that as long as the commitment and opening of $\gamma$ are zero-knowledge, the protocol is zero-knowledge.
## Original method to reduce two claims to one
Let $\widetilde{W}$ be a multilinear polynomial over $\mathbb{F}$ with $\log n$ variables. The following is the description of a simple one-round subroutine from [GKR08](https://www.microsoft.com/en-us/research/wp-content/uploads/2008/01/GoldwasserKR08a.pdf) with communication cost $O(\log n)$ that reduces the evaluation of $\widetilde{W}(b)$ and $\widetilde{W}(c)$ to the evaluation of $\widetilde{W}(r)$ for a single point $r \in \mathbb{E}$.
$\mathbf{P}$ interpolates a unique line $\ell$ passing through $b$ and $c$, such that $\ell(0)=b$ and $\ell(1)=c$. The line can be formally defined as $\ell(t) = b + t \cdot (c - b)$ using the point-slope form. The points $b$ and $c$ are tuples with $v$ elements for a $v$-variate polynomial $\widetilde{W}$.
By substituting $b$ and $c - b$ into $\ell(t)$, we get the tuple $\ell(t) = (\ell_0(t), \cdots, \ell_{v}(t))$ defined by $v$ linear polynomials over $t$. $\mathbf{P}$ sends a univariate polynomial $q$ of degree at most $k_{i+1}$ that is claimed to be $\widetilde{W}_{i+1} \circ \ell$, the restriction of $\widetilde{W}_{i+1}$ to the unique line $\ell$.
$$
q(t) := (\widetilde{W} \circ \ell)(t)=\widetilde{W}(\ell_0(t),\cdots,\ell_{\log n}(t)))
$$
$\mathbf{V}$ interprets $q(0)$ and $q(1)$ as $\mathbf{P}$'s claims for the values of $\widetilde{W}(b)$ and $\widetilde{W}(c)$. $\mathbf{V}$ also picks a random point $r^* \in \mathbb{F}$, sets $r=\ell(r^*)$, and interprets $q(r^*)$ as $\mathbf{P}$'s claim for the value of $\widetilde{W}(r)$. See the picture and an example from [T23a](https://people.cs.georgetown.edu/jthaler/ProofsArgsAndZK.pdf#page=54) (Section 4.5.2).
## References
- [\[Tha13\]](https://eprint.iacr.org/2013/351.pdf) Justin Thaler (2013). "Time-Optimal Interactive Proofs for Circuit Evaluation"
- [\[T23a\]](https://people.cs.georgetown.edu/jthaler/ProofsArgsAndZK.pdf) Justin Thaler (2023). "Proofs, Arguments, and Zero-Knowledge". See also [lecture notes](https://people.cs.georgetown.edu/jthaler/COSC544.html).
- [\[L23\]](https://eprint.iacr.org/2023/1066.pdf) Jieyi Long (2023). "Efficient Arguments and Proofs for Batch Arithmetic Circuit Satisfiability"
- [\[XZZ+19\]](https://eprint.iacr.org/2019/317.pdf): Tiancheng Xie et al. (2019). "Libra: Succinct Zero-Knowledge Proofs with Optimal Prover Computation"
- [\[G24\]](https://github.com/arielgabizon/Lectures/blob/master/zkWarsawJan24.pdf) Ariel Gabizon (2024). [zkWarsaw talk](https://youtu.be/tSpvk3EGRSM?si=ggekZA2JGKjbBx4z) "The GKR method".