# Optimizing the Sumcheck Protocol
## What is the Sumcheck Protocol?
The Sumcheck protocol is an interactive proof system in which a **Prover** $P$ convinces a **Verifier** $V$ of a specific claim without the Verifier redoing all the work.
* **The Claim**: The Prover asserts that the sum of a known multivariate polynomial $g(X_1, \dots, X_\ell)$ over every point in the Boolean hypercube $\{0,1\}^\ell$ equals a certain value $H$.
$$
H = \sum_{x \in \{0,1\}^\ell} g(x)
$$
* **The Goal**: The Verifier needs to check this claim efficiently. A naive check would require $2^\ell$ evaluations of $g$, which is computationally infeasible for the Verifier.
* **The Trick**: The sumcheck protocol reduces the Verifier's work to just **a single evaluation** of the polynomial $g$ at a random point, plus some very cheap consistency checks.
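To make the round-by-round flow concrete, here is a minimal sketch for a *multilinear* $g$ over a toy prime field (the modulus and table size are illustrative assumptions; in the multilinear case each round message $s_i$ has degree at most 1, so two evaluations determine it):

```python
import random

P = 2**31 - 1  # toy prime modulus (assumption, for illustration only)

def fold(table, r):
    """Fix the first variable of a multilinear table to r:
    g(r, x') = (1 - r) * g(0, x') + r * g(1, x')."""
    half = len(table) // 2
    return [(table[i] * (1 - r) + table[half + i] * r) % P for i in range(half)]

def sumcheck(table):
    """Run the interaction on g given by its evaluations over {0,1}^ell."""
    ell = len(table).bit_length() - 1
    claim = sum(table) % P                    # H, the claimed sum
    for _ in range(ell):
        half = len(table) // 2
        s0 = sum(table[:half]) % P            # s_i(0)
        s1 = sum(table[half:]) % P            # s_i(1)
        assert (s0 + s1) % P == claim         # Verifier's consistency check
        r = random.randrange(P)               # Verifier's random challenge
        claim = (s0 * (1 - r) + s1 * r) % P   # new claim: s_i(r)
        table = fold(table, r)
    assert table[0] == claim  # one evaluation of g at (r_1, ..., r_ell)
    return True

# Example: a random multilinear polynomial in ell = 3 variables.
assert sumcheck([random.randrange(P) for _ in range(8)])
```

With a degree-$d$ product of multilinears, as in the setting below, the only change is that the Prover sends $d+1$ evaluations of $s_i$ per round instead of two.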
## The Prover's Bottleneck & Optimization Landscape
While the Verifier's job is easy, the Prover must compute a new polynomial in each round, which can be computationally intensive. The optimizations we'll discuss are particularly relevant for a common scenario in modern multilinear SNARKs like the one we have in the leanVM prover.
### The Setting
1. **Polynomial Structure**: The polynomial $g$ is often a product of $d$ simpler, multilinear polynomials (degree at most 1 in each variable).
$$
g(X) = \prod_{k=1}^{d} p_k(X)
$$
2. **Base and Extension Field**: The actual computations involve "small" values (e.g., from a base field $\mathbb{F}_p$), but for security, the protocol's challenges are drawn from a much larger extension field $\mathbb{F}_{p^k}$.
This creates a significant performance gap between different types of multiplication:
* `ss` (small-small): Base field $\times$ Base field. **Very fast.**
* `sl` (small-large): Base field $\times$ Extension field. **Fast.**
* `ll` (large-large): Extension field $\times$ Extension field. **Very slow.**
Since the Verifier's random challenges $r_i$ are in the large field, the Prover's work from round 2 onwards is dominated by slow `ll` multiplications. The key to optimization is to rearrange the computation to minimize these `ll` operations, often by doing more work upfront in the cheaper base field.
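As a rough illustration of the gap, consider the quadratic extension $\mathbb{F}_{p^2} = \mathbb{F}_p[i]/(i^2+1)$ over the Mersenne prime $p = 2^{31}-1$ (a valid field since $p \equiv 3 \pmod 4$; this specific choice is an assumption for the sketch, and real provers typically need degree-4 or higher extensions, which widens the gap further):

```python
P = 2**31 - 1  # Mersenne prime; -1 is a non-residue here, so i^2 = -1 works

def ss(a, b):
    """base x base: 1 base-field multiplication."""
    return a * b % P

def sl(a, x):
    """base x extension: k = 2 base-field multiplications."""
    return (a * x[0] % P, a * x[1] % P)

def ll(x, y):
    """extension x extension: k^2 = 4 base-field multiplications
    (schoolbook); in general ll costs ~k^2 versus k for sl and 1 for ss."""
    return ((x[0] * y[0] - x[1] * y[1]) % P,
            (x[0] * y[1] + x[1] * y[0]) % P)
```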
## Optimization 1: Small-Value Sumcheck via Interpolation
This technique, based on work by Bagad, Dao, Domb, and Thaler, decouples the Prover's work into offline and online phases.
> LambdaClass has hands-on experience implementing this, so their insights will be highly valuable here! This section will focus on the core concepts, with the deeper dive happening during our discussion.
> The equations below are collected here as a basis for that discussion.
### The Core Idea: Decouple and Pre-compute
We split the prover's work into two phases:
* **Offline (Preprocessing)**: Perform as much work as possible using only "small" values before interacting with the Verifier.
* **Online**: Perform the remaining work that requires the "large" random challenges from the Verifier.
The main tool for this is Lagrange Interpolation. Any polynomial $F(Y)$ can be expressed as a weighted sum of basis polynomials. Instead of evaluating $F(r)$ directly at a random challenge $r$, we can write:
$$F(r) = \sum_{j} \underbrace{F(v_j)}_{\text{small coefficient}} \cdot \underbrace{L_j(r)}_{\text{basis eval}}$$
Here, the $v_j$ are pre-chosen "small" points. The terms $F(v_j)$ can be pre-computed offline.
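A minimal univariate sketch of the split (the toy modulus, the example $F$, and the points $v_j$ are all illustrative assumptions):

```python
P = 2**31 - 1  # toy prime modulus (assumption)

def lagrange_basis_at(points, r):
    """L_j(r) = prod_{m != j} (r - v_m) / (v_j - v_m) for every j."""
    evals = []
    for j, vj in enumerate(points):
        num = den = 1
        for m, vm in enumerate(points):
            if m != j:
                num = num * (r - vm) % P
                den = den * (vj - vm) % P
        evals.append(num * pow(den, P - 2, P) % P)  # Fermat inverse
    return evals

# F(Y) = 3Y^2 + 5Y + 1: degree 2, so 3 interpolation points suffice.
F = lambda y: (3 * y * y + 5 * y + 1) % P
points = [0, 1, 2]                   # pre-chosen "small" points v_j

offline = [F(v) for v in points]     # pre-computed before any challenge

r = 123456789                        # stands in for a "large" challenge
online = sum(f * l for f, l in zip(offline, lagrange_basis_at(points, r))) % P
assert online == F(r)                # interpolation recovers F(r) exactly
```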
### Applying it to Sumcheck
In round $i$, the Prover needs to compute $s_i(u) = \sum_{x'} \prod_{k=1}^{d} p_k(r_1, \dots, r_{i-1}, u, x')$. Applying the interpolation identity to the challenge prefix $(r_1, \dots, r_{i-1})$, with $v$ ranging over a grid $G_d$ of small points, and swapping the order of summation, we get:
$$s_i(u) = \sum_{v \in G_d} \left( \sum_{x'} \prod_{k=1}^{d} p_k(v, u, x') \right) \cdot L_v(r_1, \dots, r_{i-1})$$
The term in the parentheses is what we call an accumulator:
$$A_i(v, u) = \sum_{x' \in \{0,1\}^{\ell-i}} \prod_{k=1}^{d} p_k(v, u, x')$$
These accumulators depend only on small, fixed points, so they can be pre-computed entirely offline using fast `ss` multiplications. The online work is then reduced to a single linear combination involving the `ll`-heavy evaluations of $L_v(r_1, \dots, r_{i-1})$.
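The following toy sketch walks through round $i = 2$ for $d = 2$ multilinear factors (the modulus, the grid $G = \{0,1,2\}$, and all names are assumptions). Offline, every multiplication in $A_2(v, u)$ involves only small values; online, once $r_1$ arrives, each $s_2(u)$ is a three-term linear combination:

```python
import random

P = 2**31 - 1  # toy prime modulus (assumption)

def fold(table, r):
    """Fix the first variable of a multilinear table to r."""
    h = len(table) // 2
    return [(table[i] * (1 - r) + table[h + i] * r) % P for i in range(h)]

def lagrange(points, r):
    """L_v(r) for every v in points."""
    out = []
    for j, vj in enumerate(points):
        num = den = 1
        for m, vm in enumerate(points):
            if m != j:
                num, den = num * (r - vm) % P, den * (vj - vm) % P
        out.append(num * pow(den, P - 2, P) % P)
    return out

ell, G = 3, [0, 1, 2]  # d + 1 = 3 grid points: prod p_k has degree 2 in v
p1 = [random.randrange(P) for _ in range(2**ell)]
p2 = [random.randrange(P) for _ in range(2**ell)]

def A2(v, u):
    """Accumulator A_2(v, u): every multiplication stays small offline."""
    a, b = fold(fold(p1, v), u), fold(fold(p2, v), u)
    return sum(x * y for x, y in zip(a, b)) % P

acc = {(v, u): A2(v, u) for v in G for u in G}  # offline table

r1 = random.randrange(P)        # round-1 challenge arrives
Lv = lagrange(G, r1)            # the only ll-heavy step
for u in G:                     # online: cheap linear combinations
    s2_u = sum(acc[(v, u)] * lv for v, lv in zip(G, Lv)) % P
    # sanity check against the direct (ll-heavy) computation of s_2(u)
    a, b = fold(fold(p1, r1), u), fold(fold(p2, r1), u)
    assert s2_u == sum(x * y for x, y in zip(a, b)) % P
```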
### The Hybrid Strategy
This method is most effective for the first few rounds. The optimal strategy is **hybrid**:
1. Use the accumulator method for the first $\ell_0$ rounds.
2. Switch back to the standard prover algorithm for the remaining rounds.
This approach dramatically reduces the total number of expensive `ll` multiplications.
### Open Questions for Discussion
* What is the optimal switch-over point $\ell_0$ in practice?
* Is this technique equally beneficial for the PCS side (like WHIR) and the PIOP, or is it better suited for one over the other?
## Optimization 2: Univariate Skip
The univariate skip is another technique for the initial rounds of sumcheck, designed to maximize work in the cheap base field.
### The Motivation
Can we do even more work in the base field to reduce the number of expensive extension-field rounds? Instead of eliminating one variable at a time, can we eliminate $k$ variables in a single, super-sized first round? A naive attempt, where the first message is a $k$-variate polynomial, is too expensive: that polynomial requires $(d+1)^k$ evaluations to define.
### The Solution: Reorganize the Domain
The key is to avoid creating a multivariate polynomial by changing the evaluation domain's structure.
1. **Restructure**: Replace the first $k$ Boolean variables (domain $\{0,1\}^k$) with a single variable that ranges over a multiplicative subgroup $D$ of the base field, where $|D| = 2^k$.
2. **Reinterpret**: The polynomial to be summed, $g(X_1, \dots, X_\ell)$, is reinterpreted as a function $g'(X, x_{k+1}, \dots, x_\ell)$, where $X \in D$.
3. **Compute**: The prover computes a single univariate polynomial $v(X)$ by summing over the remaining $\ell-k$ Boolean variables.
$$
v(X) = \sum_{(x_{k+1}, \dots, x_\ell) \in \{0,1\}^{\ell-k}} g'(X, x_{k+1}, \dots, x_\ell)
$$
Because $v(X)$ is univariate, it can be defined by a manageable number of evaluations: its degree is at most $d \cdot (2^k - 1)$, so roughly $d \cdot 2^k$ evaluations suffice, linear in $2^k$ rather than exponential like the $(d+1)^k$ of the naive attempt. The prover sends $v(X)$ to the verifier, effectively completing $k$ rounds of Sumcheck in one go, all within the fast base field. This is often implemented using Low-Degree Extensions (LDEs) via FFTs.
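Below is a toy end-to-end sketch for the simplest case $d = 1$ (a single multilinear $g$), where $v(X)$ has degree below $|D|$ and is fully determined by its values on $D$. The BabyBear prime is used because its multiplicative group has high 2-adicity; the generator 31 matches Plonky3's parameters but is stated as an assumption, and naive interpolation stands in for the FFT-based LDE:

```python
import random

P = 2**31 - 2**27 + 1              # BabyBear prime (2-adicity 27)
k, ell = 2, 4                      # skip the first k of ell variables
N = 2**k
w = pow(31, (P - 1) // N, P)       # assumption: 31 generates F_P^*
assert pow(w, N // 2, P) != 1      # sanity: w has exact order N = |D|
D = [pow(w, j, P) for j in range(N)]

def lagrange(points, r):
    """L_j(r) over the domain `points`."""
    out = []
    for j, vj in enumerate(points):
        num = den = 1
        for m, vm in enumerate(points):
            if m != j:
                num, den = num * (r - vm) % P, den * (vj - vm) % P
        out.append(num * pow(den, P - 2, P) % P)
    return out

g = [random.randrange(P) for _ in range(2**ell)]  # evals of g on {0,1}^ell
H = sum(g) % P                                    # the claimed sum
stride = 2**(ell - k)

# Reassign {0,1}^k to D: row j of the table holds g(b_j, x') for all x'.
# On the domain itself the LDE agrees with g, so v(D[j]) is just a row sum.
v_on_D = [sum(g[j * stride + xp] for xp in range(stride)) % P for j in range(N)]
assert sum(v_on_D) % P == H        # Verifier's check covering k rounds at once

# One challenge r now replaces k challenges: v(r) comes from interpolating
# over D, and folding the table with the same Lagrange weights carries on.
r = random.randrange(P)
L = lagrange(D, r)
v_r = sum(vj * lj for vj, lj in zip(v_on_D, L)) % P
g_r = [sum(g[j * stride + xp] * L[j] for j in range(N)) % P
       for xp in range(stride)]
assert sum(g_r) % P == v_r         # claim for the remaining ell - k rounds
```

For a product of $d > 1$ multilinears, each factor's LDE is extended (via FFT) to a domain of size at least $d \cdot |D|$ so that the degree-$d(|D|-1)$ product $v(X)$ can be represented in evaluation form.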
### Open Questions for Discussion
* WHIR: The univariate skip modifies the challenger state and has downstream effects on the entire proof system, including STIR query management and verifier logic in WHIR. How can we best manage this complexity?
* Initial benchmarks suggest this might be slower than the traditional approach for WHIR (while showing much better results for the PIOP), possibly due to overhead from handling a large `eq` polynomial table. What are the bottlenecks, and how can they be addressed?
## Broader Implementation Questions
Let's brainstorm some higher-level implementation challenges and goals.
* How can we design a single, generic Sumcheck implementation that is efficient for both WHIR and SuperSpartan-like PIOPs?
* How do we integrate packing strategies into the Sumcheck used by WHIR?
* What would an ideal, unified implementation for both Sumcheck and Zerocheck look like, potentially for upstreaming into a library like Plonky3?
* How do we implement the univariate skip properly?