# Optimizing Sumcheck in WHIR: Deferring Constraint Combination
## The bottleneck
In WHIR, a significant performance bottleneck sits in the initial round, where we combine multiple evaluation claims (e.g., $p(z_i) = s_i$) into a single, unified claim before running the first sumcheck.
Currently, this is handled by the `statement.combine` function, which builds one large weight polynomial $W(b)$ out of all the individual equality polynomials $\text{eq}(z_i, b)$. This step is slow relative to the rest of the proving work.
```rust
pub fn from_base_evals<Challenger>(
    evals: &EvaluationsList<F>,
    statement: &Statement<EF>,
    combination_randomness: EF,
    prover_state: &mut ProverState<F, EF, Challenger>,
    folding_factor: usize,
    pow_bits: usize,
) -> (Self, MultilinearPoint<EF>)
where
    F: TwoAdicField,
    EF: TwoAdicField,
    Challenger: FieldChallenger<F> + GrindingChallenger<Witness = F>,
{
    assert_ne!(folding_factor, 0);
    let mut res = Vec::with_capacity(folding_factor);
    // The expensive step: materialize the combined weight table W(b) and its claimed sum.
    let (mut weights, mut sum) = statement.combine::<F>(combination_randomness);
    // In the first round, base field evaluations are folded into extension field elements.
    let (r, mut evals) = initial_round(prover_state, evals, &mut weights, &mut sum, pow_bits);
    res.push(r);
    // Apply the rest of the sumcheck rounds.
    res.extend(
        (1..folding_factor)
            .map(|_| round(prover_state, &mut evals, &mut weights, &mut sum, pow_bits)),
    );
    // Reverse challenges to maintain order from X₀ to Xₙ.
    res.reverse();
    let sumcheck = Self {
        evals,
        weights,
        sum,
        phantom: std::marker::PhantomData,
    };
    (sumcheck, MultilinearPoint::new(res))
}
```
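Conceptually, `statement.combine` expands each equality polynomial $\text{eq}(z_i, \cdot)$ into a dense table over $\{0,1\}^n$ and accumulates the tables. As a rough illustration of the per-claim expansion (a toy prime field and hypothetical helper names, not the real code, which is generic over `TwoAdicField`), the standard doubling construction looks like this:

```rust
// Toy prime field for illustration only; the real code is generic over `TwoAdicField`.
const P: u64 = 2013265921; // the BabyBear prime, 15 * 2^27 + 1
fn fadd(a: u64, b: u64) -> u64 { (a + b) % P }
fn fsub(a: u64, b: u64) -> u64 { (a + P - b) % P }
fn fmul(a: u64, b: u64) -> u64 { ((a as u128 * b as u128) % (P as u128)) as u64 }

/// Evaluations of eq(z, ·) over {0,1}^n via the standard doubling expansion:
/// start from [1]; for each coordinate z_j, replace every entry e by the pair
/// (e * (1 - z_j), e * z_j). This costs O(2^n) field multiplications.
fn eq_evals(z: &[u64]) -> Vec<u64> {
    let mut evals = vec![1u64];
    for &zj in z {
        let mut next = Vec::with_capacity(evals.len() * 2);
        for &e in &evals {
            next.push(fmul(e, fsub(1, zj))); // branch b_j = 0
            next.push(fmul(e, zj));          // branch b_j = 1
        }
        evals = next;
    }
    evals
}
```

One such table of size $2^n$ is needed per claim, so combining $k$ claims costs on the order of $k \cdot 2^n$ field operations before the first sumcheck round even starts.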
## Our Current Approach: Combine First → Sumcheck Once
The current strategy is as follows:
1. **Start with `k` claims:** We have claims $p(z_i) = s_i$ for $i=1, \dots, k$.
2. **Combine:** We use a random challenge $\gamma$ to combine them into a single claim. This involves computing the evaluations of a dense weight polynomial over the entire hypercube:
$$
W(b) = \sum_{i=1}^k \gamma^i \cdot \text{eq}(z_i, b)
$$
3. **Sumcheck:** We then run a single, large sumcheck protocol on the claim:
$$
\sum_{b \in \{0,1\}^n} p(b) \cdot W(b) = \sum_{i=1}^k \gamma^i \cdot s_i
$$
The main issue is Step 2: materializing the full evaluation table for $W(b)$ is computationally expensive and hard to parallelize efficiently.
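Continuing the toy sketch from above (same hypothetical helpers, not the real API), the dense combination amounts to one $\gamma$-scaled accumulation pass over a table of size $2^n$ per claim:

```rust
/// Dense combination W(b) = Σ_{i=1..k} γ^i · eq(z_i, b), materialized over the whole
/// hypercube — a simplified stand-in for what the combine step has to produce.
fn combine_dense(points: &[Vec<u64>], gamma: u64) -> Vec<u64> {
    let n = points[0].len();
    let mut w = vec![0u64; 1 << n];
    let mut gamma_pow = gamma; // γ^1 for the first claim, matching the formula above
    for z in points {
        for (w_b, eq_b) in w.iter_mut().zip(eq_evals(z)) {
            *w_b = fadd(*w_b, fmul(gamma_pow, eq_b));
        }
        gamma_pow = fmul(gamma_pow, gamma);
    }
    w
}
```

The entire table `w` must be written out before the first sumcheck round can begin, which is exactly the work the optimization below tries to avoid.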
## Optimization Idea: Sumcheck in Parallel → Combine Later
The trick is to leverage the linearity of summation to re-order the operations.
$$\sum_{b} p(b) \cdot \left( \sum_{i=1}^k \gamma^i \cdot \text{eq}(z_i, b) \right) = \sum_{i=1}^k \gamma^i \cdot \left( \sum_{b} p(b) \cdot \text{eq}(z_i, b) \right)$$
This small algebraic change gives us a new computational strategy:
1. **Decompose:** Instead of one big sumcheck, we can perform `k` independent sumchecks. The $i$-th sumcheck is on the much simpler product $p(b) \cdot \text{eq}(z_i, b)$.
2. **Execute in Parallel:** We run these `k` sumchecks concurrently. In each sumcheck round, this produces `k` small univariate polynomials: $h_1(X), h_2(X), \dots, h_k(X)$.
3. **Combine the Results:** We combine these *small output polynomials* only at the last moment, at the end of each round (a sketch follows this list). The polynomial actually sent to the verifier for the round is:
$$
h_{\text{final}}(X) = \sum_{i=1}^k \gamma^i \cdot h_i(X)
$$
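Concretely, if each per-claim round polynomial $h_i$ is stored as its evaluations at a few points (three points suffice for the degree-2 product $p \cdot \text{eq}$), the combination is just a $\gamma$-weighted pointwise sum. A minimal sketch, reusing the toy field helpers from the earlier snippets (hypothetical names, not the real API):

```rust
/// Combine the k per-claim round polynomials h_1..h_k into the single polynomial
/// h_final(X) = Σ_i γ^i · h_i(X) sent to the verifier for this round.
/// Each h_i is stored as its evaluations at X = 0, 1, 2.
fn combine_round_polys(round_polys: &[[u64; 3]], gamma: u64) -> [u64; 3] {
    let mut combined = [0u64; 3];
    let mut gamma_pow = gamma; // γ^1 for the first claim
    for h in round_polys {
        for (c, &h_x) in combined.iter_mut().zip(h.iter()) {
            *c = fadd(*c, fmul(gamma_pow, h_x));
        }
        gamma_pow = fmul(gamma_pow, gamma);
    }
    combined
}
```

Since the verifier only ever sees $h_{\text{final}}$, a single challenge is drawn per round and used to fold all $k$ instances, keeping them in lockstep.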
### Why it could be better
* **Parallelism**: Within each round, the `k` per-claim computations are independent and can be dispatched across `k` cores (see the sketch after this list).
* **Avoids the Bottleneck**: It completely sidesteps the expensive, monolithic step of creating the large, dense `W(b)` evaluation table.
* **Another idea (not sure)**: Each parallel sumcheck pairs $p$ with a single $\text{eq}(z_i, b)$, which is highly structured: it factors into one term per variable, so it never needs to be handled as a generic, dense weight polynomial. That structure should leave room for further per-claim optimizations in the sumcheck logic itself.
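To make the per-claim work concrete, here is a sketch of one round's computation for a single claim in the straightforward variant that still keeps a per-claim $\text{eq}(z_i, \cdot)$ table (so it realizes the first two bullets; the structured-eq idea in the last bullet would go further). It reuses the toy helpers from the earlier snippets and evaluates the degree-2 round polynomial of $\sum_b p(b) \cdot \text{eq}(z_i, b)$ at $X = 0, 1, 2$ while folding the first variable:

```rust
/// Round polynomial h_i(X) = Σ_{b'} p(X, b') · eq_z(X, b') for one claim, where the
/// variable being bound this round is the most significant index bit of both tables.
/// Both tables have length 2^m; the result is h_i evaluated at X = 0, 1, 2.
fn round_poly_for_claim(p: &[u64], eq_z: &[u64]) -> [u64; 3] {
    let half = p.len() / 2;
    let mut h = [0u64; 3];
    for r in 0..half {
        let (p0, p1) = (p[r], p[half + r]);       // p(0, b'), p(1, b')
        let (e0, e1) = (eq_z[r], eq_z[half + r]); // eq_z(0, b'), eq_z(1, b')
        // The restrictions are linear in X, so f(2) = 2·f(1) - f(0).
        let p2 = fsub(fadd(p1, p1), p0);
        let e2 = fsub(fadd(e1, e1), e0);
        h[0] = fadd(h[0], fmul(p0, e0));
        h[1] = fadd(h[1], fmul(p1, e1));
        h[2] = fadd(h[2], fmul(p2, e2));
    }
    h
}
```

The $k$ calls to this function within a round are independent, so they can be dispatched across cores (e.g. with a parallel iterator) and their outputs merged by `combine_round_polys` above.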