# Sound Challenge Derivation for Logup in ProveKit
## Background
### ProveKit Architecture
ProveKit is a proving system for R1CS satisfiability using:
- **WHIR** for multilinear polynomial commitments with zero-knowledge
- **Sumcheck protocol** to verify constraint satisfaction
- **Spark** for matrix evaluation verification
- **Log-derivative (logup)** for efficient range checks and memory lookups
### Logup in ProveKit
ProveKit uses logup to prove multiset equality for lookups and range checks efficiently.
**Given:**
- Table $T = [t_0, t_1, ..., t_{N-1}]$
- Lookup vector $L = [l_0, l_1, ..., l_{m-1}]$ where each $l_j$ should be in $T$
**Prove:** $L \subseteq T$ (with multiplicity)
**Logup protocol:**
1. Compute multiplicities: $m_i = |\{ j : l_j = t_i \}|$ (counts of each table value in $L$)
2. Sample random challenge $\alpha$ via Fiat-Shamir
3. Verify log-derivative equality:
$$\sum_i \frac{m_i}{\alpha - t_i} = \sum_j \frac{1}{\alpha - l_j}$$
This creates **two witness phases** in ProveKit's R1CS:
- $W_1$: Multiplicities $\{m_i\}$ - deterministic given $L$
- $W_2$ : Challenge-dependent values:
- Denominators: $\frac{1}{\alpha - t_i}$, $\frac{1}{\alpha - l_j}$
- Products: $m_i \cdot \frac{1}{\alpha - t_i}$
- Sums for equality verification
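To see the identity concretely, here is a minimal numeric sketch over a toy prime field ($p = 2^{31}-1$, not ProveKit's actual field). `logup_sides` is an illustrative helper, not a ProveKit API:

```rust
const P: u64 = (1 << 31) - 1; // toy prime field for illustration

fn add(a: u64, b: u64) -> u64 { (a + b) % P }
fn sub(a: u64, b: u64) -> u64 { (a + P - b) % P }
fn mul(a: u64, b: u64) -> u64 { a * b % P }

/// Fermat inverse: a^(p-2) mod p.
fn inv(a: u64) -> u64 {
    let (mut base, mut e, mut acc) = (a, P - 2, 1u64);
    while e > 0 {
        if e & 1 == 1 { acc = mul(acc, base); }
        base = mul(base, base);
        e >>= 1;
    }
    acc
}

/// Returns (Σ_i m_i/(α - t_i), Σ_j 1/(α - l_j)),
/// with m_i computed honestly from the lookups.
fn logup_sides(table: &[u64], lookups: &[u64], alpha: u64) -> (u64, u64) {
    let lhs = table.iter().map(|&t| {
        let m = lookups.iter().filter(|&&l| l == t).count() as u64;
        mul(m % P, inv(sub(alpha, t)))
    }).fold(0, add);
    let rhs = lookups.iter().map(|&l| inv(sub(alpha, l))).fold(0, add);
    (lhs, rhs)
}
```

For in-table lookups the two sides agree for every $\alpha$; a single out-of-table lookup adds a term $\frac{1}{\alpha - l}$ to only one side, so a random $\alpha$ detects it with overwhelming probability.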
### ROM Checking in ProveKit
ProveKit uses logup to verify read-only memory (ROM) integrity.
**Given:**
- ROM block with initial values $[v_0, v_1, ..., v_{N-1}]$
- Read operations at addresses $[a_0, a_1, ..., a_{m-1}]$
**Prove:** All reads are consistent with initial memory
**ROM checking protocol:**
1. Compute access counts: $c_i = |\{ j : a_j = i \}|$ (multiplicity of address $i$)
2. Sample random challenges $\tau_1, \tau_2$ via Fiat-Shamir
3. Verify indexed logup equality:
$$\sum_j \frac{1}{\tau_1 \cdot a_j + \tau_2 \cdot v_{a_j} + 1} = \sum_i \frac{c_i}{\tau_1 \cdot i + \tau_2 \cdot v_i + 1}$$
This creates **two witness phases**:
- $W_1^{\text{ROM}}$: Access counts $\{c_i\}$ - deterministic given read addresses
- $W_2^{\text{ROM}}$: Challenge-dependent denominators and products (depends on $\tau_1, \tau_2$)
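The indexed identity can be checked numerically the same way. A sketch over the same toy field ($p = 2^{31}-1$); `rom_sides` is illustrative, and the read side uses the value actually stored at each read address:

```rust
const P: u64 = (1 << 31) - 1; // toy prime field for illustration

fn add(a: u64, b: u64) -> u64 { (a + b) % P }
fn mul(a: u64, b: u64) -> u64 { a * b % P }

/// Fermat inverse: a^(p-2) mod p.
fn inv(a: u64) -> u64 {
    let (mut base, mut e, mut acc) = (a, P - 2, 1u64);
    while e > 0 {
        if e & 1 == 1 { acc = mul(acc, base); }
        base = mul(base, base);
        e >>= 1;
    }
    acc
}

/// Returns (read side, table side) of the indexed logup identity.
fn rom_sides(values: &[u64], reads: &[usize], tau1: u64, tau2: u64) -> (u64, u64) {
    // Read side: one term per read, using the value v_{a_j} at the read address.
    let read_side = reads.iter().map(|&a| {
        inv(add(add(mul(tau1, a as u64), mul(tau2, values[a])), 1))
    }).fold(0, add);
    // Table side: one term per address, weighted by its access count c_i.
    let table_side = values.iter().enumerate().map(|(i, &v)| {
        let c = reads.iter().filter(|&&a| a == i).count() as u64;
        mul(c % P, inv(add(add(mul(tau1, i as u64), mul(tau2, v)), 1)))
    }).fold(0, add);
    (read_side, table_side)
}
```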
### RAM Checking in ProveKit
ProveKit uses the Spice protocol for read-write memory (RAM) integrity.
**Given:**
- RAM block with initial values and read/write operations
**Prove:** All reads return the last written value at each address
**Spice protocol:**
1. Compute deterministic witness values:
- Read timestamps $rt_j$ for each operation
- Old values before writes
- Final memory state
2. Sample random challenges $\tau_1, \tau_2$ via Fiat-Shamir
3. Build read-set and write-set product hashes using tuples $(addr, value, timestamp)$
4. Verify: $\prod_{\text{reads}} (\tau_1^2 \cdot addr + \tau_1 \cdot value + \tau_2 \cdot timestamp + 1) = \prod_{\text{writes}} (\ldots)$
This creates **two witness phases**:
- $W_1^{\text{RAM}}$: All Spice witnesses (read timestamps, old values, final state) - deterministic
- $W_2^{\text{RAM}}$: Challenge-dependent factors and products (depends on $\tau_1, \tau_2$)
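The product-hash invariant behind Spice can be illustrated with a toy offline memory checker: every operation reads the current (addr, value, timestamp) tuple into the read set and writes back an updated tuple into the write set, and for a consistent trace the multisets init ∪ WS and RS ∪ final coincide. This is a simplified sketch over a toy field ($p = 2^{31}-1$); `tuple_hash` and `run_trace` are illustrative names, not ProveKit APIs:

```rust
const P: u64 = (1 << 31) - 1; // toy prime field for illustration

fn add(a: u64, b: u64) -> u64 { (a + b) % P }
fn mul(a: u64, b: u64) -> u64 { a * b % P }

/// Product-hash factor for one (addr, value, timestamp) tuple:
/// τ1²·addr + τ1·value + τ2·ts + 1, as in the product check above.
fn tuple_hash(addr: u64, value: u64, ts: u64, tau1: u64, tau2: u64) -> u64 {
    add(add(add(mul(mul(tau1, tau1), addr), mul(tau1, value)), mul(tau2, ts)), 1)
}

/// Runs a trace and returns (init·WS, RS·final) product hashes.
/// ops: (address, None) = read, (address, Some(v)) = write of v.
fn run_trace(init: &[u64], ops: &[(usize, Option<u64>)], tau1: u64, tau2: u64) -> (u64, u64) {
    let mut mem: Vec<(u64, u64)> = init.iter().map(|&v| (v, 0)).collect();
    let mut init_p = 1u64;
    for (i, &v) in init.iter().enumerate() {
        init_p = mul(init_p, tuple_hash(i as u64, v, 0, tau1, tau2));
    }
    let (mut rs, mut ws) = (1u64, 1u64);
    for (step, &(addr, write)) in ops.iter().enumerate() {
        let (v_old, t_old) = mem[addr];
        // Each op reads the current tuple, then writes it back with a fresh
        // timestamp (a read writes back the same value, as in Spice).
        rs = mul(rs, tuple_hash(addr as u64, v_old, t_old, tau1, tau2));
        let v_new = write.unwrap_or(v_old);
        let t_new = (step + 1) as u64;
        ws = mul(ws, tuple_hash(addr as u64, v_new, t_new, tau1, tau2));
        mem[addr] = (v_new, t_new);
    }
    let mut final_p = 1u64;
    for (i, &(v, t)) in mem.iter().enumerate() {
        final_p = mul(final_p, tuple_hash(i as u64, v, t, tau1, tau2));
    }
    (mul(init_p, ws), mul(rs, final_p))
}
```

For any consistent trace the two returned products are equal; a read that returns anything other than the last written value perturbs RS without a matching WS tuple, so the products differ except with negligible probability over $\tau_1, \tau_2$.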
**Vulnerability:** Like range checks, both ROM and RAM checking sample $\tau_1, \tau_2$ from prover-controlled `witness_merlin` before committing deterministic witnesses, enabling adaptive attacks.
## The Vulnerability
**Current ProveKit implementation:**
```rust
// In witness generation (provekit/prover/src/noir_proof_scheme.rs)
let mut witness_merlin = witness_io.to_prover_state();
self.seed_witness_merlin(&mut witness_merlin, &acir_witness_idx_to_value_map)?;
let partial_witness = self.r1cs.solve_witness_vec(
&self.witness_builders,
&acir_witness_idx_to_value_map,
&mut witness_merlin, // Prover-controlled RNG
);
```
**Problem:** `witness_merlin` is seeded with public inputs only. The prover controls the internal state when `WitnessBuilder::Challenge` samples $\alpha$.
**Attack scenario (Range Checks):**
1. Prover picks malicious multiplicities $\tilde m_i$
2. Runs witness generation, samples $\alpha$ from controlled `witness_merlin`
3. For this $\alpha$, computes denominators $\frac{1}{\alpha - t_i}$, $\frac{1}{\alpha - l_j}$
4. By construction, $\sum_i \frac{m_i}{\alpha - t_i} = \sum_j \frac{1}{\alpha - l_j}$ passes
5. Even though in fact $L \not\subseteq T$, the check succeeds
**Attack extends to ROM/RAM:** Similarly, prover can pick malicious access counts (ROM) or timestamps (RAM), then sample $\tau_1, \tau_2$ adaptively to satisfy the multiset checks.
**Root cause:** Challenges ($\alpha$ for range checks, $\tau_1, \tau_2$ for ROM/RAM) are sampled from prover-controlled state before deterministic witnesses are committed, enabling adaptive attacks.
**Requirement:** All challenges must be cryptographically derived from a binding commitment to their respective deterministic witnesses.
## Multilinear Polynomials in ProveKit
### Encoding and Evaluation
ProveKit encodes witness vectors as multilinear polynomial evaluations over the boolean hypercube.
A vector $v = [v_0, \ldots, v_{2^m-1}]$ of length $2^m$ defines a multilinear polynomial $P(x_1, \ldots, x_m)$ where:
$$P(b_1, \ldots, b_m) = v_i \text{ where } b_j \in \{0,1\} \text{ and } i = \sum_{j=1}^{m} b_j \cdot 2^{j-1}$$
At random point $r \in \mathbb{F}^m$:
$$P(r) = \sum_i v_i \cdot \text{eq}(r, i)$$
This is computed via `EvaluationsList::evaluate()` in ProveKit.
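As a toy reimplementation of this evaluation (not ProveKit's `EvaluationsList::evaluate()` itself), over a small prime field with the low bit of $i$ matching the first variable, per the index formula above:

```rust
const P: u64 = (1 << 31) - 1; // toy prime field for illustration

fn add(a: u64, b: u64) -> u64 { (a + b) % P }
fn sub(a: u64, b: u64) -> u64 { (a + P - b) % P }
fn mul(a: u64, b: u64) -> u64 { a * b % P }

/// eq(r, i) = Π_j (r_j if bit j of i is 1, else 1 - r_j),
/// with bit 0 of i matching the first variable (i = Σ b_j·2^(j-1)).
fn eq(r: &[u64], i: usize) -> u64 {
    r.iter().enumerate().fold(1, |acc, (j, &rj)| {
        mul(acc, if (i >> j) & 1 == 1 { rj } else { sub(1, rj) })
    })
}

/// P(r) = Σ_i v_i · eq(r, i), the multilinear extension of v at point r.
fn evaluate(v: &[u64], r: &[u64]) -> u64 {
    v.iter().enumerate().map(|(i, &vi)| mul(vi, eq(r, i))).fold(0, add)
}
```

At a boolean point this returns the corresponding vector entry; at a random field point it returns the multilinear extension's value.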
### ZK-WHIR Commitment
ProveKit commits to witnesses using WHIR with masking for zero-knowledge:
```rust
// From provekit/prover/src/whir_r1cs.rs
pub fn batch_commit_to_polynomial(
m: usize,
whir_config: &WhirConfig,
witness: &EvaluationsList<FieldElement>,
merlin: &mut ProverState<SkyscraperSponge, FieldElement>,
) -> (
Witness<FieldElement, SkyscraperMerkleConfig>,
EvaluationsList<FieldElement>,
EvaluationsList<FieldElement>,
) {
let mask = generate_random_multilinear_polynomial(witness.num_variables());
let masked_polynomial = create_masked_polynomial(witness, &mask);
let random_polynomial_eval = EvaluationsList::new(
generate_random_multilinear_polynomial(m)
);
let committer = CommitmentWriter::new(whir_config.clone());
let witness_new = committer
.commit_batch(merlin, &[
// coefficient forms of masked_polynomial and random_polynomial_eval
// (the evaluation-to-coefficient conversion is elided in this excerpt)
masked_polynomial_coeff.clone(),
random_polynomial_coeff.clone(),
])
.expect("WHIR prover failed to commit");
(witness_new, masked_polynomial, random_polynomial_eval)
}
```
**Masking mechanism:**
```rust
// From provekit/common/src/utils/zk_utils.rs
pub fn create_masked_polynomial(
original: &EvaluationsList<FieldElement>,
mask: &[FieldElement],
) -> EvaluationsList<FieldElement> {
let mut combined = Vec::with_capacity(original.num_evals() * 2);
combined.extend_from_slice(original.evals());
combined.extend_from_slice(mask);
EvaluationsList::new(combined)
}
```
This creates the $(m+1)$-variate polynomial $\hat W(x, y) = (1-y) \cdot W(x) + y \cdot \text{mask}(x)$ where:
- $\hat W(x, 0) = W(x)$ (first half via multilinear interpolation)
- $\hat W(x, 1) = mask(x)$ (second half)
**Opening at y=0:**
```rust
// From provekit/prover/src/whir_r1cs.rs lines 420-422
let mut expanded_alphas = pad_to_power_of_two(alpha.clone());
expanded_alphas.resize(expanded_alphas.len() * 2, FieldElement::zero());
// Zero-padding selects y=0, giving W(r)
```
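The zero-padding trick can be sanity-checked with a direct multilinear evaluation. A sketch over a toy field (the `eq`/`evaluate` helpers are illustrative, not ProveKit APIs): appending the mask and evaluating the $(m+1)$-variate polynomial at $(r, 0)$ recovers $W(r)$, because $y$ is the top (last) variable and $y=0$ zeroes out every mask term.

```rust
const P: u64 = (1 << 31) - 1; // toy prime field for illustration

fn add(a: u64, b: u64) -> u64 { (a + b) % P }
fn sub(a: u64, b: u64) -> u64 { (a + P - b) % P }
fn mul(a: u64, b: u64) -> u64 { a * b % P }

/// eq(r, i) with bit 0 of i matching the first variable.
fn eq(r: &[u64], i: usize) -> u64 {
    r.iter().enumerate().fold(1, |acc, (j, &rj)| {
        mul(acc, if (i >> j) & 1 == 1 { rj } else { sub(1, rj) })
    })
}

/// Multilinear extension of v evaluated at r.
fn evaluate(v: &[u64], r: &[u64]) -> u64 {
    v.iter().enumerate().map(|(i, &vi)| mul(vi, eq(r, i))).fold(0, add)
}

/// Returns (Ŵ(r, 0), W(r)); the two must be equal.
fn masked_eval_at_y0(w: &[u64], mask: &[u64], r: &[u64]) -> (u64, u64) {
    // Ŵ = [w ‖ mask] is (m+1)-variate; y is the last variable.
    let combined: Vec<u64> = w.iter().chain(mask.iter()).copied().collect();
    let mut point = r.to_vec();
    point.push(0); // zero-pad selects y = 0
    (evaluate(&combined, &point), evaluate(w, r))
}
```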
## The Solution: Two-Phase Witness Commitment
### Protocol Overview
**Phase 1: Commit to Deterministic Witnesses**
```rust
// New: Before challenge extraction
// w1 contains all deterministic witnesses:
// - Range check multiplicities
// - ROM access counts
// - RAM Spice witnesses (timestamps, old values, final state)
let w1 = solve_phase1_witnesses(...);
let k = w1.len(); // Actual size of deterministic witnesses
let k_prime = k.next_power_of_two();
let m1 = k_prime.ilog2();
let (commitment_to_w1, masked_w1, random_w1) =
batch_commit_to_polynomial(m1, &whir_config, &w1_evals, &mut merlin);
```
**Phase 2: Derive Challenges**
```rust
// Extract all challenges from transcript containing C1
// α for range checks, τ₁ and τ₂ for ROM/RAM
let alpha = merlin.challenge_scalar()?;
let tau1 = merlin.challenge_scalar()?;
let tau2 = merlin.challenge_scalar()?;
```
**Phase 3: Build Full Witness**
```rust
// w2 contains all challenge-dependent witnesses:
// - Range check denominators and products (using α)
// - ROM denominators and products (using τ₁, τ₂)
// - RAM multiset factors and products (using τ₁, τ₂)
let w2 = solve_phase2_witnesses(..., alpha, tau1, tau2);
let w = [w1, w2].concat();
let n = w.len();
let n_prime = n.next_power_of_two();
let m = n_prime.ilog2();
let (commitment_to_witness, masked_polynomial, random_polynomial) =
batch_commit_to_polynomial(m, &whir_config, &w_evals, &mut merlin);
```
**Phase 4: Prove R1CS (unchanged)**
```rust
// Existing ProveKit R1CS proof from provekit/prover/src/whir_r1cs.rs
let (mut merlin, alpha_rx) = run_zk_sumcheck_prover(
r1cs, &witness, merlin, self.m_0, &self.whir_for_hiding_spartan
);
let alphas = calculate_external_row_of_r1cs_matrices(&alpha_rx, r1cs);
let (statement, f_sums, g_sums) = create_combined_statement_over_two_polynomials::<3>(
self.m, &commitment_to_witness, &masked_polynomial, &random_polynomial, &alphas
);
let (merlin, whir_randomness, deferred_evaluations) =
run_zk_whir_pcs_prover(commitment_to_witness, statement, &self.whir_witness, merlin);
```
**Note on R1CS verification:** The WHIR statement includes linear constraints that verify $\langle A(r_x, \cdot), W(\cdot) \rangle$ at random point $r_y$. The evaluation $W(r_y)$ is handled internally by WHIR's opening proof and never revealed directly to the verifier. The verifier only receives the public matrix evaluations $A(r_x, r_y)$, $B(r_x, r_y)$, $C(r_x, r_y)$ in `deferred_evaluations`.
**Phase 5: Consistency Proof**
Prove first $k$ entries of $W$ match $W_1$ using a separate sumcheck.
### Consistency Check Detail
**Virtual polynomial:**
We define a virtual polynomial that captures the difference between $W$ and $W_1$ in the first $k$ positions:
$$V(y) = W(y, 0, \ldots, 0) - W_1(y) \quad \text{for } y \in \{0,1\}^{m_1}$$
where $W$ is evaluated at $y$ padded with $(m - m_1)$ zeros, and $W_1$ is evaluated directly at $y$ in its native $m_1$-dimensional space.
**Selector polynomial:**
Since $W_1$ is padded to size $2^{m_1}$ but only the first $k$ entries are meaningful, we define:
$$\text{selector}(y) = \sum_{i=0}^{k-1} \text{eq}(y, i)$$
This is the multilinear extension of the indicator vector $[1, 1, \ldots, 1, 0, \ldots, 0]$ with $k$ ones. The verifier can compute $\text{selector}(r_z)$ directly without requiring a commitment to the selector polynomial.
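The verifier's computation of $\text{selector}(r_z)$ is just a sum of $k$ eq terms. A toy sketch (same illustrative field helpers as above, not ProveKit code):

```rust
const P: u64 = (1 << 31) - 1; // toy prime field for illustration

fn add(a: u64, b: u64) -> u64 { (a + b) % P }
fn sub(a: u64, b: u64) -> u64 { (a + P - b) % P }
fn mul(a: u64, b: u64) -> u64 { a * b % P }

/// eq(r, i) with bit 0 of i matching the first variable.
fn eq(r: &[u64], i: usize) -> u64 {
    r.iter().enumerate().fold(1, |acc, (j, &rj)| {
        mul(acc, if (i >> j) & 1 == 1 { rj } else { sub(1, rj) })
    })
}

/// selector(r) = Σ_{i<k} eq(r, i): the multilinear extension of the
/// indicator vector [1, …, 1, 0, …, 0] with k ones.
fn selector(r: &[u64], k: usize) -> u64 {
    (0..k).map(|i| eq(r, i)).fold(0, add)
}
```

At a boolean point the selector is 1 exactly on the first $k$ hypercube positions; the naive sum is $O(k \cdot m_1)$ field operations, which the verifier can afford since $k$ is public.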
**Zero-Knowledge Sumcheck:**
- **Claim:** $\displaystyle\sum_{y \in \{0,1\}^{m_1}} V(y) \cdot \text{selector}(y) = 0$
This holds if and only if the first $k$ entries of $W$ match $W_1$.
- **Blinding Protocol:**
1. Define random multilinear $G: \{0,1\}^{m_1} \to \mathbb{F}$
2. Commit $C_G \leftarrow \text{Commit}(G)$ and compute $S_G = \displaystyle\sum_{y \in \{0,1\}^{m_1}} G(y)$
3. Add $S_G$ to transcript and sample $\rho \xleftarrow{\$} \mathbb{F}$
4. Run sumcheck proving:
$$\sum_{y \in \{0,1\}^{m_1}} \left[ V(y) \cdot \text{selector}(y) + \rho \cdot G(y) \right] = \rho \cdot S_G$$
5. Extract evaluation point $r_z \in \mathbb{F}^{m_1}$ from $m_1$ sumcheck rounds
6. Open $G(r_z)$ against $C_G$
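The blinding step preserves the claim: the blinded sum equals $\rho \cdot S_G$ exactly when $\sum_y V(y) \cdot \text{selector}(y) = 0$. A toy check over evaluation vectors (illustrative helper over a small field, not ProveKit code):

```rust
const P: u64 = (1 << 31) - 1; // toy prime field for illustration

fn add(a: u64, b: u64) -> u64 { (a + b) % P }
fn mul(a: u64, b: u64) -> u64 { a * b % P }

/// Σ_y [v(y)·sel(y) + ρ·g(y)] over the hypercube, given evaluation vectors.
fn blinded_sum(v: &[u64], sel: &[u64], g: &[u64], rho: u64) -> u64 {
    v.iter().zip(sel).zip(g)
        .map(|((&vy, &sy), &gy)| add(mul(vy, sy), mul(rho, gy)))
        .fold(0, add)
}
```

With $V \equiv 0$ on the selected slots the blinded sum is $\rho \cdot S_G$; any nonzero $V(y) \cdot \text{selector}(y)$ contribution shifts it away from that target.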
**Opening W and W₁ at r_z:**
After sumcheck reduces to point $r_z$, prover opens both commitments using WHIR's `prove()` function with evaluation constraints. Note that WHIR does not expose a separate `open()` API; instead, evaluation constraints within `prove()` serve as point openings.
1. **Open W at padded point:**
Evaluate $W$ at $(r_z, 0, \ldots, 0) \in \mathbb{F}^m$ to get $w_{\text{eval}}$
Prove via WHIR evaluation constraint:
$$\text{Weights::evaluation}(r_z \,||\, \mathbf{0}_{m-m_1}) \implies W(r_z, 0, \ldots, 0) = w_{\text{eval}}$$
WHIR internally verifies this evaluation and returns empty deferred evaluations.
2. **Open W₁ at r_z:**
Evaluate $W_1$ at $r_z \in \mathbb{F}^{m_1}$ to get $w_{1,\text{eval}}$
Prove via WHIR evaluation constraint:
$$\text{Weights::evaluation}(r_z) \implies W_1(r_z) = w_{1,\text{eval}}$$
Similarly, WHIR verifies this evaluation internally.
**Verifier Check:**
Given the opened values and the sumcheck verification, verifier computes:
$$\text{selector}(r_z) \quad \text{(deterministic from public } k\text{)}$$
and checks:
$$(w_{\text{eval}} - w_{1,\text{eval}}) \cdot \text{selector}(r_z) = 0$$
Combined with the sumcheck guarantee that $V(r_z) \cdot \text{selector}(r_z) + \rho \cdot G(r_z) = [\text{final fold value}]$, this proves the first $k$ entries of $W$ match $W_1$ with overwhelming probability.
## Security Analysis
### Binding
**Theorem:** Challenge $\alpha$ is cryptographically bound to multiplicities committed in $C_1$.
**Proof:**
- $C_1$ computationally binding by WHIR security
- $\alpha$ = Hash(transcript including $C_1$) via Fiat-Shamir
- Changing multiplicities requires breaking commitment binding or finding hash collision
**Theorem:** Consistency check ensures $W$ contains committed $W_1$.
**Proof:**
- If $W[0..k) \neq W_1$, then $V$ is non-zero polynomial of degree $\leq m_1$
- By Schwartz-Zippel, $\Pr[V(r_z) = 0] \leq \frac{m_1}{|\mathbb{F}|}$
- Openings bind claimed values to commitments
- Verifier detects inconsistency except with negligible probability
**Corollary:** ROM and RAM challenges are bound to their respective deterministic witnesses.
**Proof:**
- ROM access counts and RAM Spice witnesses committed in $C_1$
- Challenges $\tau_1, \tau_2$ = Hash(transcript including $C_1$) via Fiat-Shamir
- Changing committed witnesses requires breaking commitment binding or finding hash collision
- Same argument as range check binding applies to all logup-based checks
### Zero-Knowledge
**Theorem:** Protocol is zero-knowledge against poly-time verifiers in the bounded-query random oracle model.
**Proof sketch:**
- All commitments use ZK-WHIR masking ($y$-variable technique with random polynomial)
- Openings reveal only $P(r)$ for random $r$, indistinguishable from uniform
- Sumcheck round polynomials blinded by independent random polynomials $G$
- Simulator: program random oracle, sample commitments/openings uniformly
- Hybrid argument shows real and simulated distributions computationally indistinguishable
### Soundness
**Theorem:** Soundness error bounded by $\frac{m_0 + m + m_1}{|\mathbb{F}|} + \varepsilon_{\text{WHIR}}$.
**Proof:**
- Phase 1+2: Challenges bound to $W_1$ (binding + collision resistance)
- Phase 5: Consistency check via Schwartz-Zippel (error $\leq \frac{m_1}{|\mathbb{F}|}$)
- Phase 4: R1CS sumcheck soundness (error $\leq \frac{m_0 + m}{|\mathbb{F}|}$)
- WHIR PCS soundness: $\varepsilon_{\text{WHIR}}$ depends on FRI parameters (query count, folding factors, PoW bits)
- Union bound gives total error
**Note:** WHIR soundness $\varepsilon_{\text{WHIR}}$ is configurable via query repetitions and proof-of-work, typically set to $2^{-128}$ or smaller for cryptographic security.
## Cost Analysis
### Overhead Quantification
**For deterministic (phase-1) witness size k and full witness size n:**
```
m₁ = ceil(log₂ k)
m = ceil(log₂ n)
```
**Additional costs:**
- Commitment: 2^m₁ vs 2^m (when k << n, this is ~k vs n elements)
- Sumcheck: m₁ rounds vs m rounds
- Openings: 2 WHIR proofs (one m-variate, one m₁-variate)
**Concrete example (ProveKit usage):**
```
Range check with 2^16 table:
k = 2^16 multiplicities
n = 2^20 full witness
m₁ = 16
m = 20
C₁ commitment: 65,536 vs 1,048,576 elements (16x reduction)
Consistency sumcheck: 16 vs 20 rounds
```
## Implementation in ProveKit
### Witness Builder Modification
Tag builders in `provekit/common/src/witness/witness_builder.rs`:
```rust
impl WitnessBuilder {
pub fn is_phase1(&self) -> bool {
matches!(self,
WitnessBuilder::Constant(_) |
WitnessBuilder::Acir(_, _) |
WitnessBuilder::MultiplicitiesForRange(_, _, _) |
WitnessBuilder::MultiplicitiesForBinOp(_, _) |
WitnessBuilder::SpiceWitnesses(_) // RAM checking witnesses
)
}
pub fn is_phase2(&self) -> bool {
matches!(self,
WitnessBuilder::Challenge(_) | // All challenges: α, τ₁, τ₂, ...
WitnessBuilder::LogUpDenominator(_, _, _) | // Range checks
WitnessBuilder::IndexedLogUpDenominator(_, _, _, _, _) | // ROM checking
WitnessBuilder::BinOpLookupDenominator(_, _, _, _, _, _, _) |
// Any builder that depends on Challenge values
...
)
}
}
```
### Two-Pass Witness Solving
In `provekit/prover/src/noir_proof_scheme.rs`:
```rust
// Pass 1: Solve phase 1 witnesses
let mut witness_merlin_phase1 = witness_io.to_prover_state();
self.seed_witness_merlin(&mut witness_merlin_phase1, &acir_witness_idx_to_value_map)?;
let phase1_builders = self.witness_builders.iter()
.filter(|wb| wb.is_phase1())
.collect();
let partial_witness_phase1 = self.r1cs.solve_witness_vec(
&phase1_builders,
&acir_witness_idx_to_value_map,
&mut witness_merlin_phase1,
);
// Commit C1 and extract α
let (commitment_to_w1, masked_w1, random_w1) =
batch_commit_to_polynomial(..., &partial_witness_phase1, ...);
let alpha = witness_merlin_phase1.challenge_scalar()?;
// Pass 2: Solve phase 2 witnesses using α
let full_witness = self.r1cs.solve_witness_vec(
&self.witness_builders,
&acir_witness_idx_to_value_map,
&mut witness_merlin_phase1, // Contains α now
);
```
### Integration Points
1. **`provekit/prover/src/noir_proof_scheme.rs`**: Two-pass witness solving
2. **`provekit/prover/src/whir_r1cs.rs`**: Add consistency sumcheck after main R1CS proof
3. **`provekit/r1cs-compiler/src/witness_generator.rs`**: Tag builders during compilation
4. **`provekit/verifier/src/whir_r1cs.rs`**: Add consistency verification
## Conclusion
The two-phase commitment protocol fixes the adaptive witness attack in ProveKit's logup-based checks (range checks, ROM, RAM) while maintaining:
- **Soundness:** All challenges ($\alpha$, $\tau_1$, $\tau_2$) cryptographically bound to committed deterministic witnesses
- **Efficiency:** $O(k)$ overhead when $k \ll n$ (typical case), where $k$ is the size of deterministic witnesses
- **Zero-knowledge:** All openings use ZK-WHIR masking in the bounded-query model
- **Compatibility:** Integrates with existing ProveKit sumcheck and WHIR infrastructure
- **Generality:** Applies uniformly to all log-derivative based checks in the system
The protocol is provably secure with explicit soundness bounds $\frac{m_0 + m + m_1}{|\mathbb{F}|} + \varepsilon_{\text{WHIR}}$ and adds minimal overhead proportional to the size of deterministic witnesses, not the full witness.
## Solution 2: Split Witness with Multi-Tree Batched Opening
This solution combines the 2D witness encoding approach with WHIR's multi-commitment batching capability to achieve optimal efficiency.
### Overview
**Key idea:** Commit to w₁ and w₂ separately, use 2D witness encoding during R1CS sumcheck, then leverage WHIR's ability to batch openings across multiple Merkle trees. This eliminates both the consistency proof and the additional matrix batching sumcheck.
**Benefits:**
- No consistency proof needed (different commitments, no duplication)
- No additional sumcheck rounds (single batched opening produces single evaluation point)
- Minimal proof overhead compared to current implementation
- Soundness preserved: challenges extracted after w₁ commitment
### Protocol Specification
#### Phase 1: Witness Splitting and Commitment
```rust
// 1. Solve w₁ (deterministic witness: multiplicities, ACIR, constants, etc.)
let w1 = r1cs.solve_witness_vec(
&split_builders.w1_layers,
&acir_witness_map,
None, // No transcript access
split_builders.w1_size
);
// 2. Commit to w₁
let z1 = pad_to_power_of_two(w1); // Size: 2^m
let (commitment_w1, masked_poly_w1, random_poly_w1) =
whir_for_w1.commit(&mut merlin, &z1)?;
// 3. Extract challenges from transcript (now bound to w₁ commitment)
// Challenge extraction happens during w2 witness building
// 4. Solve w₂ (challenge-dependent witness)
let w2 = r1cs.solve_witness_vec(
&split_builders.w2_layers,
&acir_witness_map,
Some(&mut merlin), // Challenges extracted here
split_builders.w2_size
);
// 5. Commit to w₂
let z2 = pad_to_power_of_two(w2); // Size: 2^m (same as w₁)
let (commitment_w2, masked_poly_w2, random_poly_w2) =
whir_for_w2.commit(&mut merlin, &z2)?;
```
**Requirements:**
- `w1.len() = w2.len() = 2^m` (equal power-of-2 sizes for batching)
- Commitments stored separately: `root₁` and `root₂`
- No concatenation or re-commitment needed
#### Phase 2: R1CS Sumcheck with 2D Witness Encoding
**Virtual witness structure:** Treat witness as 2D multilinear polynomial:
```
w(x, y) where:
x ∈ {0, 1}: selects w₁ (x=0) or w₂ (x=1)
y ∈ {0,1}^m: position within selected part
w(0, y) = w₁(y)
w(1, y) = w₂(y)
```
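The 2D encoding is just one extra multilinear variable: evaluating the concatenation $w_1 \| w_2$ with $x$ as the top variable interpolates between the two halves, $w(x, y) = (1-x) \cdot w_1(y) + x \cdot w_2(y)$. A sketch over a toy field (the `eq`/`evaluate`/`eval_2d` helpers are illustrative, not ProveKit APIs):

```rust
const P: u64 = (1 << 31) - 1; // toy prime field for illustration

fn add(a: u64, b: u64) -> u64 { (a + b) % P }
fn sub(a: u64, b: u64) -> u64 { (a + P - b) % P }
fn mul(a: u64, b: u64) -> u64 { a * b % P }

/// eq(r, i) with bit 0 of i matching the first variable.
fn eq(r: &[u64], i: usize) -> u64 {
    r.iter().enumerate().fold(1, |acc, (j, &rj)| {
        mul(acc, if (i >> j) & 1 == 1 { rj } else { sub(1, rj) })
    })
}

/// Multilinear extension of v evaluated at r.
fn evaluate(v: &[u64], r: &[u64]) -> u64 {
    v.iter().enumerate().map(|(i, &vi)| mul(vi, eq(r, i))).fold(0, add)
}

/// w(x, y) realized by evaluating [w₁ ‖ w₂] with x as the top variable;
/// equals (1-x)·w₁(y) + x·w₂(y) by multilinearity.
fn eval_2d(w1: &[u64], w2: &[u64], ry: &[u64], x: u64) -> u64 {
    let concat: Vec<u64> = w1.iter().chain(w2.iter()).copied().collect();
    let mut point = ry.to_vec();
    point.push(x);
    evaluate(&concat, &point)
}
```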
**Matrix encoding:** R1CS matrices A, B, C extended to 3D:
```
A(constraint_idx, x_witness, y_witness) where:
constraint_idx ∈ {0,1}^m₀: constraint index
x_witness ∈ {0, 1}: first/second half selector
y_witness ∈ {0,1}^m: position within half
```
**R1CS sumcheck (standard):**
```rust
// Verify: Σᵢ eq(i,r) · (Aᵢw · Bᵢw - Cᵢw) = 0
run_zk_sumcheck_prover(
m_0, // Constraint variables
[a, b, c, eq],
|[a, b, c, eq]| {
let f0 = eq.0 * (a.0 * b.0 - c.0);
let f_em1 = (eq.0 + eq.0 - eq.1) *
((a.0 + a.0 - a.1) * (b.0 + b.0 - b.1) -
(c.0 + c.0 - c.1));
let f_inf = (eq.1 - eq.0) * (a.1 - a.0) * (b.1 - b.0);
[f0, f_em1, f_inf]
},
&whir_for_hiding_spartan,
&mut merlin
);
```
**Result:** Random constraint point `r`, claims on `a(r), b(r), c(r)` where:
```
a(r) = Σⱼ A(r,j)·w(j)
```
#### Phase 3: Multi-Tree Batched Opening
**Step 1: Rewrite evaluation claim using 2D encoding**
For matrix A (same for B, C):
```
a(r) = Σⱼ A(r,j)·w(j)

Reindex j → (x,y):

a(r) = Σₓ Σᵧ A(r,x,y)·w(x,y)
     = Σᵧ A(r,0,y)·w₁(y)  +  Σᵧ A(r,1,y)·w₂(y)
       (weighted sum on w₁)   (weighted sum on w₂)
```
**Step 2: Batch commitments and weights**
```rust
// Extract batching randomness
let β = merlin.challenge_scalar()?;
// Batched claim for matrices A, B, C with additional randomness
let (β_A, β_B, β_C) = extract_batching_randomness(&mut merlin);
// Combined weighted claim:
// β_A·a(r) + β_B·b(r) + β_C·c(r)
// = β_A·[Σᵧ A(r,0,y)·w₁(y) + Σᵧ A(r,1,y)·w₂(y)]
// + β_B·[...] + β_C·[...]
// = Σᵧ [β_A·A(r,0,y) + β_B·B(r,0,y) + β_C·C(r,0,y)]·w₁(y)
// + β·Σᵧ [β_A·A(r,1,y) + β_B·B(r,1,y) + β_C·C(r,1,y)]·w₂(y)
//
// = Σᵧ weights₁(y)·w₁(y) + β·Σᵧ weights₂(y)·w₂(y)
let weights_1 = compute_combined_weights(A, B, C, r, 0, β_A, β_B, β_C);
let weights_2 = compute_combined_weights(A, B, C, r, 1, β_A, β_B, β_C);
```
**Step 3: Multi-tree WHIR opening (MODIFIED WHIR)**
Run single WHIR sumcheck over the batched polynomial `w₁ + β·w₂`:
```rust
// Conceptually: prove Σᵧ weights₁(y)·w₁(y) + β·Σᵧ weights₂(y)·w₂(y) = v
run_multi_tree_whir_opening(
commitments: [commitment_w1, commitment_w2],
polynomials: [w1, w2],
weights: [weights_1, weights_2],
batching_coeff: β,
merlin: &mut merlin
);
```
**WHIR sumcheck internals (m rounds):**
Each round i:
1. Prover computes univariate: `g_i(X) = weights₁·w₁ + β·weights₂·w₂` evaluated along dimension i
2. Verifier checks consistency, samples random challenge `r_i`
3. Both fold to next round
**Result after m rounds:** Single evaluation point `q ∈ {0,1}^m`
**Final claims:**
```
weights₁(q)·w₁(q) + β·weights₂(q)·w₂(q) = final_value
Requires proving:
- w₁(q): Open Merkle tree at position q in commitment_w1
- w₂(q): Open Merkle tree at position q in commitment_w2
- weights₁(q), weights₂(q): Evaluate from sparse matrices (Spark protocol)
```
#### Phase 4: Matrix Evaluation (Spark)
Since both matrix evaluations are at the **same point** `(r, 0, q)` and `(r, 1, q)`, they can be evaluated directly:
```rust
// For weights₁(q) = β_A·A(r,0,q) + β_B·B(r,0,q) + β_C·C(r,0,q)
// For weights₂(q) = β_A·A(r,1,q) + β_B·B(r,1,q) + β_C·C(r,1,q)
// Batch both evaluations with random γ
let γ = merlin.challenge_scalar()?;
// Prove: weights₁(q) + γ·weights₂(q)
// = Σ_{entries} value·eq(entry.row, r)·
// [eq(entry.col, (0,q)) + γ·eq(entry.col, (1,q))]
run_spark_matrix_evaluation(
matrices: [A, B, C],
eval_points: [(r, 0, q), (r, 1, q)],
batching_coeffs: [β_A, β_B, β_C, γ],
&mut merlin
);
```
**No additional sumcheck needed** - this is the standard Spark matrix evaluation.
### Implementation Requirements
#### WHIR Library Modifications
The current WHIR implementation needs extension to support multi-tree batching:
**1. Modify `Witness` struct to hold multiple trees:**
```rust
pub struct MultiTreeWitness<F, MerkleConfig> {
polynomials: Vec<CoefficientList<F>>,
merkle_trees: Vec<MerkleTree<MerkleConfig>>,
merkle_leaves: Vec<Vec<F>>,
ood_points: Vec<F>,
per_tree_ood_answers: Vec<Vec<F>>,
batching_randomness: F, // Extracted at opening time
}
```
**2. Modify opening protocol to delay RLC:**
Currently (single tree):
```rust
// In commit_batch: RLC happens immediately
let batching_randomness = prover_state.challenge_scalar();
batched_poly = poly₀ + β·poly₁ + ...;
```
Required (multi-tree):
```rust
// In commit_batch: Store polynomials separately
return MultiTreeWitness {
polynomials: vec![poly₀, poly₁],
merkle_trees: vec![tree₀, tree₁],
// No RLC yet
};
// In open_batch: RLC happens here
let β = prover_state.challenge_scalar();
let batched_poly = poly₀ + β·poly₁;
// Run sumcheck on batched_poly
```
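Delaying the RLC is sound because random linear combination commutes with multilinear evaluation: folding $p_0 + \beta \cdot p_1$ and then evaluating at $q$ gives $p_0(q) + \beta \cdot p_1(q)$, so per-tree openings at the single point $q$ suffice. A toy check (illustrative helpers over a small field, not the WHIR API):

```rust
const P: u64 = (1 << 31) - 1; // toy prime field for illustration

fn add(a: u64, b: u64) -> u64 { (a + b) % P }
fn sub(a: u64, b: u64) -> u64 { (a + P - b) % P }
fn mul(a: u64, b: u64) -> u64 { a * b % P }

/// eq(r, i) with bit 0 of i matching the first variable.
fn eq(r: &[u64], i: usize) -> u64 {
    r.iter().enumerate().fold(1, |acc, (j, &rj)| {
        mul(acc, if (i >> j) & 1 == 1 { rj } else { sub(1, rj) })
    })
}

/// Multilinear extension of v evaluated at r.
fn evaluate(v: &[u64], r: &[u64]) -> u64 {
    v.iter().enumerate().map(|(i, &vi)| mul(vi, eq(r, i))).fold(0, add)
}

/// Pointwise RLC of two evaluation tables: p₀ + β·p₁ over the hypercube.
fn rlc(p0: &[u64], p1: &[u64], beta: u64) -> Vec<u64> {
    p0.iter().zip(p1).map(|(&a, &b)| add(a, mul(beta, b))).collect()
}
```

Because evaluation is linear in the table, the folded polynomial's value at any point equals the β-combination of the individual values, which is exactly the final claim the multi-tree opening must discharge.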
**3. Modify opening proof to include multiple Merkle paths:**
```rust
pub struct MultiTreeOpeningProof {
sumcheck_rounds: Vec<UnivariatePolynomial>, // m rounds
merkle_proofs: Vec<MerklePath>, // One per tree
final_evaluations: Vec<F>, // w₁(q), w₂(q), ...
}
```
### Proof Size Analysis
**Current implementation (with consistency proof):**
| Component | Rounds | Field Elements | Cost |
|-----------|--------|----------------|------|
| R1CS sumcheck | m₀ | ~3m₀ | Standard |
| Full witness commitment | - | O(λ) | Merkle root |
| Full witness opening | m+1 | ~4(m+1) | WHIR rounds |
| w₁ commitment | - | O(λ) | Merkle root |
| Consistency sumcheck | m₁ | ~3m₁ | Additional |
| Consistency opening (w₁) | m₁ | ~4m₁ | WHIR on w₁ |
| Consistency opening (full) | m+1 | ~4(m+1) | WHIR on full |
| **Total opening cost** | | **~11m + 7m₁ + 8** | |
**Proposed implementation (multi-tree batched):**
| Component | Rounds | Field Elements | Cost |
|-----------|--------|----------------|------|
| R1CS sumcheck | m₀ | ~3m₀ | Standard |
| w₁ commitment | - | O(λ) | Merkle root |
| w₂ commitment | - | O(λ) | Merkle root |
| Multi-tree batched opening | m | ~4m | WHIR rounds |
| 2 Merkle paths | - | ~2 log(2^m) | Authentication |
| Spark matrix evaluation | ~m+1 | ~3(m+1) | Standard |
| **Total opening cost** | | **~7m + 3** | |
**Savings:** ~4m + 7m₁ + 5 field elements, roughly a **55-60% reduction** in total opening cost for typical m=20, m₁=15 (190 of ~333 field elements by the totals above)
### Security Considerations
**Soundness:** The protocol maintains soundness through:
1. w₁ committed before challenge extraction (prevents adaptive attacks)
2. Fiat-Shamir for all randomness (β, γ from transcript)
3. WHIR batching soundness (Schwartz-Zippel lemma)
4. No duplication between commitments (no consistency needed)
**Zero-knowledge:** Preserved through:
1. WHIR's existing ZK mechanisms (masking polynomials)
2. Independent commitments with independent randomness
3. Batching happens post-commitment (doesn't leak information)
### Migration Path
**Phase 1:** Implement basic split witness (current work)
- Use separate commitments for w₁ and w₂
- Keep consistency proof temporarily
- Validate correctness with existing tests
**Phase 2:** Add multi-tree batching to WHIR
- Extend `Witness` struct for multiple trees
- Modify opening protocol to delay RLC
- Add multi-tree test cases
**Phase 3:** Integrate multi-tree opening
- Replace consistency proof with batched opening
- Update ProveKit's `WhirR1CSProver` trait
- Benchmark proof size reduction
**Phase 4:** Optimize for production
- Implement Spark matrix evaluation optimization
- Fine-tune batching parameters
- Add comprehensive test coverage