# Optimizing Sumcheck in WHIR: Deferring Constraint Combination
## The bottleneck
In WHIR, a significant performance bottleneck sits in the initial round, where we combine multiple evaluation claims (e.g., $p(z_i) = s_i$) into a single, unified claim before running the first sumcheck.
Currently, this is handled by the `statement.combine` function, which builds one large weight polynomial $W(b)$ out of all the individual equality polynomials $\text{eq}(z_i, b)$. This step is slow relative to the rest of the proving work.
```rust
pub fn from_base_evals<Challenger>(
    evals: &EvaluationsList<F>,
    statement: &Statement<EF>,
    combination_randomness: EF,
    prover_state: &mut ProverState<F, EF, Challenger>,
    folding_factor: usize,
    pow_bits: usize,
) -> (Self, MultilinearPoint<EF>)
where
    F: TwoAdicField,
    EF: TwoAdicField,
    Challenger: FieldChallenger<F> + GrindingChallenger<Witness = F>,
{
    assert_ne!(folding_factor, 0);
    let mut res = Vec::with_capacity(folding_factor);
    // The expensive step: materialize the combined weight table W(b) and its claimed sum.
    let (mut weights, mut sum) = statement.combine::<F>(combination_randomness);
    // In the first round, base field evaluations are folded into extension field elements.
    let (r, mut evals) = initial_round(prover_state, evals, &mut weights, &mut sum, pow_bits);
    res.push(r);
    // Apply the rest of the sumcheck rounds.
    res.extend(
        (1..folding_factor)
            .map(|_| round(prover_state, &mut evals, &mut weights, &mut sum, pow_bits)),
    );
    // Reverse challenges to maintain order from X₀ to Xₙ.
    res.reverse();
    let sumcheck = Self {
        evals,
        weights,
        sum,
        phantom: std::marker::PhantomData,
    };
    (sumcheck, MultilinearPoint::new(res))
}
```
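Conceptually, `statement.combine` expands each equality polynomial $\text{eq}(z_i, \cdot)$ into a dense table over $\{0,1\}^n$ and accumulates the tables. As a rough illustration of the per-claim expansion (a toy prime field and hypothetical helper names, not the real code, which is generic over `TwoAdicField`), the standard doubling construction looks like this:

```rust
// Toy prime field for illustration only; the real code is generic over `TwoAdicField`.
const P: u64 = 2013265921; // the BabyBear prime, 15 * 2^27 + 1
fn fadd(a: u64, b: u64) -> u64 { (a + b) % P }
fn fsub(a: u64, b: u64) -> u64 { (a + P - b) % P }
fn fmul(a: u64, b: u64) -> u64 { ((a as u128 * b as u128) % (P as u128)) as u64 }

/// Evaluations of eq(z, ·) over {0,1}^n via the standard doubling expansion:
/// start from [1]; for each coordinate z_j, replace every entry e by the pair
/// (e * (1 - z_j), e * z_j). This costs O(2^n) field multiplications.
fn eq_evals(z: &[u64]) -> Vec<u64> {
    let mut evals = vec![1u64];
    for &zj in z {
        let mut next = Vec::with_capacity(evals.len() * 2);
        for &e in &evals {
            next.push(fmul(e, fsub(1, zj))); // branch b_j = 0
            next.push(fmul(e, zj));          // branch b_j = 1
        }
        evals = next;
    }
    evals
}
```

One such table of size $2^n$ is needed per claim, so combining $k$ claims costs on the order of $k \cdot 2^n$ field operations before the first sumcheck round even starts.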
## Our Current Approach: Combine First → Sumcheck Once
The current strategy is as follows:
1. **Start with `k` claims:** We have claims $p(z_i) = s_i$ for $i=1, \dots, k$.
2. **Combine:** We use a random challenge $\gamma$ to combine them into a single claim. This involves computing the evaluations of a dense weight polynomial over the entire hypercube:
$$
W(b) = \sum_{i=1}^k \gamma^i \cdot \text{eq}(z_i, b)
$$
3. **Sumcheck:** We then run a single, large sumcheck protocol on the claim:
$$
\sum_{b \in \{0,1\}^n} p(b) \cdot W(b) = \sum_{i=1}^k \gamma^i \cdot s_i
$$
The main issue is Step 2: materializing the full evaluation table for $W(b)$ is computationally expensive and hard to parallelize efficiently.
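Continuing the toy sketch from above (same hypothetical helpers, not the real API), the dense combination amounts to one $\gamma$-scaled accumulation pass over a table of size $2^n$ per claim:

```rust
/// Dense combination W(b) = Σ_{i=1..k} γ^i · eq(z_i, b), materialized over the whole
/// hypercube — a simplified stand-in for what the combine step has to produce.
fn combine_dense(points: &[Vec<u64>], gamma: u64) -> Vec<u64> {
    let n = points[0].len();
    let mut w = vec![0u64; 1 << n];
    let mut gamma_pow = gamma; // γ^1 for the first claim, matching the formula above
    for z in points {
        for (w_b, eq_b) in w.iter_mut().zip(eq_evals(z)) {
            *w_b = fadd(*w_b, fmul(gamma_pow, eq_b));
        }
        gamma_pow = fmul(gamma_pow, gamma);
    }
    w
}
```

The entire table `w` must be written out before the first sumcheck round can begin, which is exactly the work the optimization below tries to avoid.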
## Optimization Idea: Sumcheck in Parallel → Combine Later
The trick is to leverage the linearity of summation to re-order the operations.
$$\sum_{b} p(b) \cdot \left( \sum_{i=1}^k \gamma^i \cdot \text{eq}(z_i, b) \right) = \sum_{i=1}^k \gamma^i \cdot \left( \sum_{b} p(b) \cdot \text{eq}(z_i, b) \right)$$
This small algebraic change gives us a new computational strategy:
1. **Decompose:** Instead of one big sumcheck, we can perform `k` independent sumchecks. The $i$-th sumcheck is on the much simpler product $p(b) \cdot \text{eq}(z_i, b)$.
2. **Execute in Parallel:** We run these `k` sumchecks concurrently. In each sumcheck round, this produces `k` small univariate polynomials: $h_1(X), h_2(X), \dots, h_k(X)$.
3. **Combine the Results:** We combine these *small output polynomials* only at the last moment, at the end of each round (a sketch follows this list). The polynomial actually sent to the verifier for the round is:
$$
h_{\text{final}}(X) = \sum_{i=1}^k \gamma^i \cdot h_i(X)
$$
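Concretely, if each per-claim round polynomial $h_i$ is stored as its evaluations at a few points (three points suffice for the degree-2 product $p \cdot \text{eq}$), the combination is just a $\gamma$-weighted pointwise sum. A minimal sketch, reusing the toy field helpers from the earlier snippets (hypothetical names, not the real API):

```rust
/// Combine the k per-claim round polynomials h_1..h_k into the single polynomial
/// h_final(X) = Σ_i γ^i · h_i(X) sent to the verifier for this round.
/// Each h_i is stored as its evaluations at X = 0, 1, 2.
fn combine_round_polys(round_polys: &[[u64; 3]], gamma: u64) -> [u64; 3] {
    let mut combined = [0u64; 3];
    let mut gamma_pow = gamma; // γ^1 for the first claim
    for h in round_polys {
        for (c, &h_x) in combined.iter_mut().zip(h.iter()) {
            *c = fadd(*c, fmul(gamma_pow, h_x));
        }
        gamma_pow = fmul(gamma_pow, gamma);
    }
    combined
}
```

Since the verifier only ever sees $h_{\text{final}}$, a single challenge is drawn per round and used to fold all $k$ instances, keeping them in lockstep.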
### Why it could be better
* **Parallelism**: Within each round, the `k` per-claim computations are independent and can be dispatched across `k` cores (see the sketch after this list).
* **Avoids the Bottleneck**: It completely sidesteps the expensive, monolithic step of creating the large, dense `W(b)` evaluation table.
* **Another idea (not sure)**: Each parallel sumcheck pairs $p$ with a single $\text{eq}(z_i, b)$, which is highly structured: it factors into one term per variable, so it never needs to be handled as a generic, dense weight polynomial. That structure should leave room for further per-claim optimizations in the sumcheck logic itself.
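To make the per-claim work concrete, here is a sketch of one round's computation for a single claim in the straightforward variant that still keeps a per-claim $\text{eq}(z_i, \cdot)$ table (so it realizes the first two bullets; the structured-eq idea in the last bullet would go further). It reuses the toy helpers from the earlier snippets and evaluates the degree-2 round polynomial of $\sum_b p(b) \cdot \text{eq}(z_i, b)$ at $X = 0, 1, 2$ while folding the first variable:

```rust
/// Round polynomial h_i(X) = Σ_{b'} p(X, b') · eq_z(X, b') for one claim, where the
/// variable being bound this round is the most significant index bit of both tables.
/// Both tables have length 2^m; the result is h_i evaluated at X = 0, 1, 2.
fn round_poly_for_claim(p: &[u64], eq_z: &[u64]) -> [u64; 3] {
    let half = p.len() / 2;
    let mut h = [0u64; 3];
    for r in 0..half {
        let (p0, p1) = (p[r], p[half + r]);       // p(0, b'), p(1, b')
        let (e0, e1) = (eq_z[r], eq_z[half + r]); // eq_z(0, b'), eq_z(1, b')
        // The restrictions are linear in X, so f(2) = 2·f(1) - f(0).
        let p2 = fsub(fadd(p1, p1), p0);
        let e2 = fsub(fadd(e1, e1), e0);
        h[0] = fadd(h[0], fmul(p0, e0));
        h[1] = fadd(h[1], fmul(p1, e1));
        h[2] = fadd(h[2], fmul(p2, e2));
    }
    h
}
```

The $k$ calls to this function within a round are independent, so they can be dispatched across cores (e.g. with a parallel iterator) and their outputs merged by `combine_round_polys` above.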