# Sound Challenge Derivation for Logup in ProveKit

## Background

### ProveKit Architecture

ProveKit is a proving system for R1CS satisfiability using:

- **WHIR** for multilinear polynomial commitments with zero-knowledge
- **Sumcheck protocol** to verify constraint satisfaction
- **Spark** for matrix evaluation verification
- **Log-derivative (logup)** for efficient range checks and memory lookups

### Logup in ProveKit

ProveKit uses logup to prove multiset equality for lookups and range checks efficiently.

**Given:**

- Table $T = [t_0, t_1, \ldots, t_{N-1}]$
- Lookup vector $L = [l_0, l_1, \ldots, l_{m-1}]$ where each $l_j$ should be in $T$

**Prove:** $L \subseteq T$ (with multiplicity)

**Logup protocol:**

1. Compute multiplicities: $m_i = |\{j : l_j = t_i\}|$ (count of each table value in $L$)
2. Sample random challenge $\alpha$ via Fiat-Shamir
3. Verify log-derivative equality:
   $$\sum_i \frac{m_i}{\alpha - t_i} = \sum_j \frac{1}{\alpha - l_j}$$

This creates **two witness phases** in ProveKit's R1CS:

- $W_1$: Multiplicities $\{m_i\}$ - deterministic given $L$
- $W_2$: Challenge-dependent values:
  - Denominators: $\frac{1}{\alpha - t_i}$, $\frac{1}{\alpha - l_j}$
  - Products: $m_i \cdot \frac{1}{\alpha - t_i}$
  - Sums for equality verification

### ROM Checking in ProveKit

ProveKit uses logup to verify read-only memory (ROM) integrity.

**Given:**

- ROM block with initial values $[v_0, v_1, \ldots, v_{N-1}]$
- Read operations at addresses $[a_0, a_1, \ldots, a_{m-1}]$

**Prove:** All reads are consistent with the initial memory

**ROM checking protocol:**

1. Compute access counts: $c_i = |\{j : a_j = i\}|$ (multiplicity of address $i$)
2. Sample random challenges $\tau_1, \tau_2$ via Fiat-Shamir
3.
Verify indexed logup equality:
   $$\sum_j \frac{1}{\tau_1 \cdot a_j + \tau_2 \cdot v_j + 1} = \sum_i \frac{c_i}{\tau_1 \cdot i + \tau_2 \cdot v_i + 1}$$

This creates **two witness phases**:

- $W_1^{\text{ROM}}$: Access counts $\{c_i\}$ - deterministic given the read addresses
- $W_2^{\text{ROM}}$: Challenge-dependent denominators and products (depend on $\tau_1, \tau_2$)

### RAM Checking in ProveKit

ProveKit uses the Spice protocol for read-write memory (RAM) integrity.

**Given:**

- RAM block with initial values and read/write operations

**Prove:** All reads return the last written value at each address

**Spice protocol:**

1. Compute deterministic witness values:
   - Read timestamps $rt_j$ for each operation
   - Old values before writes
   - Final memory state
2. Sample random challenges $\tau_1, \tau_2$ via Fiat-Shamir
3. Build read-set and write-set product hashes using tuples $(addr, value, timestamp)$
4. Verify: $\prod_{\text{reads}} (\tau_1^2 \cdot addr + \tau_1 \cdot value + \tau_2 \cdot timestamp + 1) = \prod_{\text{writes}} (\ldots)$

This creates **two witness phases**:

- $W_1^{\text{RAM}}$: All Spice witnesses (read timestamps, old values, final state) - deterministic
- $W_2^{\text{RAM}}$: Challenge-dependent factors and products (depend on $\tau_1, \tau_2$)

**Vulnerability:** As with range checks, both ROM and RAM checking sample $\tau_1, \tau_2$ from the prover-controlled `witness_merlin` before the deterministic witnesses are committed, enabling adaptive attacks.

## The Vulnerability

**Current ProveKit implementation:**

```rust
// In witness generation (provekit/prover/src/noir_proof_scheme.rs)
let mut witness_merlin = witness_io.to_prover_state();
self.seed_witness_merlin(&mut witness_merlin, &acir_witness_idx_to_value_map)?;
let partial_witness = self.r1cs.solve_witness_vec(
    &self.witness_builders,
    &acir_witness_idx_to_value_map,
    &mut witness_merlin, // Prover-controlled RNG
);
```

**Problem:** `witness_merlin` is seeded with public inputs only.
The prover controls the internal state when `WitnessBuilder::Challenge` samples $\alpha$.

**Attack scenario (range checks):**

1. Prover picks malicious multiplicities $\tilde m_i$
2. Runs witness generation, sampling $\alpha$ from the controlled `witness_merlin`
3. For this $\alpha$, computes denominators $\frac{1}{\alpha - t_i}$, $\frac{1}{\alpha - l_j}$
4. By construction, $\sum_i \frac{\tilde m_i}{\alpha - t_i} = \sum_j \frac{1}{\alpha - l_j}$ passes
5. Even though the actual $L \not\subseteq T$, the check succeeds

**Attack extends to ROM/RAM:** Similarly, the prover can pick malicious access counts (ROM) or timestamps (RAM), then sample $\tau_1, \tau_2$ adaptively to satisfy the multiset checks.

**Root cause:** Challenges ($\alpha$ for range checks, $\tau_1, \tau_2$ for ROM/RAM) are sampled from prover-controlled state before the deterministic witnesses are committed, enabling adaptive attacks.

**Requirement:** All challenges must be cryptographically derived from a binding commitment to their respective deterministic witnesses.

## Multilinear Polynomials in ProveKit

### Encoding and Evaluation

ProveKit encodes witness vectors as multilinear polynomial evaluations over the boolean hypercube. A vector $v = [v_0, \ldots, v_{2^m-1}]$ of length $2^m$ defines the multilinear polynomial $P(x_1, \ldots, x_m)$ where:

$$P(b_1, \ldots, b_m) = v_i \quad \text{where } b_j \in \{0,1\} \text{ and } i = \sum_{j=1}^{m} b_j \cdot 2^{j-1}$$

At a random point $r \in \mathbb{F}^m$:

$$P(r) = \sum_i v_i \cdot \text{eq}(r, i)$$

This is computed via `EvaluationsList::evaluate()` in ProveKit.
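The variable-at-a-time folding behind this evaluation can be sketched over a toy prime field. Everything below (the 31-bit field, function names, and example values) is illustrative and not ProveKit's actual field or API:

```rust
// Toy prime field p = 2^31 - 1; for illustration only (ProveKit uses its own field type).
const P: u64 = 2_147_483_647;

fn add(a: u64, b: u64) -> u64 { (a + b) % P }
fn sub(a: u64, b: u64) -> u64 { (a + P - b) % P }
fn mul(a: u64, b: u64) -> u64 { (a * b) % P }

/// Evaluate the multilinear extension of `v` (length 2^m) at `r` in F^m,
/// folding out one variable per round. With the indexing i = sum_j b_j * 2^(j-1),
/// the first fold binds x_1 (the least significant bit of the index).
fn mle_eval(mut v: Vec<u64>, r: &[u64]) -> u64 {
    assert_eq!(v.len(), 1 << r.len());
    for &rj in r {
        let mut next = Vec::with_capacity(v.len() / 2);
        for i in 0..v.len() / 2 {
            // (1 - r_j)*v[2i] + r_j*v[2i+1] = v[2i] + r_j*(v[2i+1] - v[2i])
            next.push(add(v[2 * i], mul(rj, sub(v[2 * i + 1], v[2 * i]))));
        }
        v = next;
    }
    v[0]
}

fn main() {
    let v = vec![3, 1, 4, 1]; // m = 2
    // On boolean points the extension reproduces the vector: (b1, b2) -> v[b1 + 2*b2].
    assert_eq!(mle_eval(v.clone(), &[1, 0]), v[1]);
    assert_eq!(mle_eval(v.clone(), &[0, 1]), v[2]);
    // Outside the hypercube it interpolates.
    println!("P(2, 0) = {}", mle_eval(v, &[2, 0]));
}
```

The fold costs $O(2^m)$ field operations overall, matching the linear-time evaluation an `EvaluationsList`-style representation allows.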
### ZK-WHIR Commitment

ProveKit commits to witnesses using WHIR with masking for zero-knowledge:

```rust
// From provekit/prover/src/whir_r1cs.rs
pub fn batch_commit_to_polynomial(
    m: usize,
    whir_config: &WhirConfig,
    witness: &EvaluationsList<FieldElement>,
    merlin: &mut ProverState<SkyscraperSponge, FieldElement>,
) -> (
    Witness<FieldElement, SkyscraperMerkleConfig>,
    EvaluationsList<FieldElement>,
    EvaluationsList<FieldElement>,
) {
    let mask = generate_random_multilinear_polynomial(witness.num_variables());
    let masked_polynomial = create_masked_polynomial(witness, &mask);
    let random_polynomial_eval = EvaluationsList::new(
        generate_random_multilinear_polynomial(m)
    );
    // Conversion of the evaluation lists above to coefficient form
    // (`masked_polynomial_coeff`, `random_polynomial_coeff`) is elided in this excerpt.
    let committer = CommitmentWriter::new(whir_config.clone());
    let witness_new = committer
        .commit_batch(merlin, &[
            masked_polynomial_coeff.clone(),
            random_polynomial_coeff.clone(),
        ])
        .expect("WHIR prover failed to commit");
    (witness_new, masked_polynomial, random_polynomial_eval)
}
```

**Masking mechanism:**

```rust
// From provekit/common/src/utils/zk_utils.rs
pub fn create_masked_polynomial(
    original: &EvaluationsList<FieldElement>,
    mask: &[FieldElement],
) -> EvaluationsList<FieldElement> {
    let mut combined = Vec::with_capacity(original.num_evals() * 2);
    combined.extend_from_slice(original.evals());
    combined.extend_from_slice(mask);
    EvaluationsList::new(combined)
}
```

This creates $\hat W(x, y) = (1-y) \cdot W(x) + y \cdot \text{mask}(x)$ as an $(m+1)$-variate polynomial where:

- $\hat W(x, 0) = W(x)$ (first half via multilinear interpolation)
- $\hat W(x, 1) = \text{mask}(x)$ (second half)

**Opening at y=0:**

```rust
// From provekit/prover/src/whir_r1cs.rs lines 420-422
let mut expanded_alphas = pad_to_power_of_two(alpha.clone());
expanded_alphas.resize(expanded_alphas.len() * 2, FieldElement::zero());
// Zero-padding selects y=0, giving W(r)
```

## The Solution: Two-Phase Witness Commitment

### Protocol Overview

**Phase 1: Commit to Deterministic Witnesses**

```rust
// New: Before challenge extraction
// w1 contains all deterministic witnesses:
// - Range check multiplicities
// - ROM access counts
// - RAM Spice witnesses (timestamps, old values, final state)
let w1 = solve_phase1_witnesses(...);

let k = w1.len(); // Actual size of the deterministic witnesses
let k_prime = k.next_power_of_two();
let m1 = k_prime.ilog2();

let (commitment_to_w1, masked_w1, random_w1) =
    batch_commit_to_polynomial(m1, &whir_config, &w1_evals, &mut merlin);
```

**Phase 2: Derive Challenges**

```rust
// Extract all challenges from the transcript containing C1
// α for range checks, τ₁ and τ₂ for ROM/RAM
let alpha = merlin.challenge_scalar()?;
let tau1 = merlin.challenge_scalar()?;
let tau2 = merlin.challenge_scalar()?;
```

**Phase 3: Build Full Witness**

```rust
// w2 contains all challenge-dependent witnesses:
// - Range check denominators and products (using α)
// - ROM denominators and products (using τ₁, τ₂)
// - RAM multiset factors and products (using τ₁, τ₂)
let w2 = solve_phase2_witnesses(..., alpha, tau1, tau2);

let w = [w1, w2].concat();
let n = w.len();
let n_prime = n.next_power_of_two();
let m = n_prime.ilog2();

let (commitment_to_witness, masked_polynomial, random_polynomial) =
    batch_commit_to_polynomial(m, &whir_config, &w_evals, &mut merlin);
```

**Phase 4: Prove R1CS (unchanged)**

```rust
// Existing ProveKit R1CS proof from provekit/prover/src/whir_r1cs.rs
let (mut merlin, alpha_rx) = run_zk_sumcheck_prover(
    r1cs, &witness, merlin, self.m_0, &self.whir_for_hiding_spartan
);
let alphas = calculate_external_row_of_r1cs_matrices(&alpha_rx, r1cs);
let (statement, f_sums, g_sums) = create_combined_statement_over_two_polynomials::<3>(
    self.m, &commitment_to_witness, &masked_polynomial, &random_polynomial, &alphas
);
let (merlin, whir_randomness, deferred_evaluations) =
    run_zk_whir_pcs_prover(commitment_to_witness, statement, &self.whir_witness, merlin);
```

**Note on R1CS verification:** The WHIR statement includes linear constraints that verify $\langle A(r_x, \cdot), W(\cdot) \rangle$ at a random
point $r_y$. The evaluation $W(r_y)$ is handled internally by WHIR's opening proof and never revealed directly to the verifier. The verifier only receives the public matrix evaluations $A(r_x, r_y)$, $B(r_x, r_y)$, $C(r_x, r_y)$ in `deferred_evaluations`.

**Phase 5: Consistency Proof**

Prove that the first $k$ entries of $W$ match $W_1$ using a separate sumcheck.

### Consistency Check Detail

**Virtual polynomial:** We define a virtual polynomial that captures the difference between $W$ and $W_1$ in the first $k$ positions:

$$V(y) = W(y, 0, \ldots, 0) - W_1(y) \quad \text{for } y \in \{0,1\}^{m_1}$$

where $W$ is evaluated at $y$ padded with $(m - m_1)$ zeros, and $W_1$ is evaluated directly at $y$ in its native $m_1$-dimensional space.

**Selector polynomial:** Since $W_1$ is padded to size $2^{m_1}$ but only the first $k$ entries are meaningful, we define:

$$\text{selector}(y) = \sum_{i=0}^{k-1} \text{eq}(y, i)$$

This is the multilinear extension of the indicator vector $[1, 1, \ldots, 1, 0, \ldots, 0]$ with $k$ ones. The verifier can compute $\text{selector}(r_z)$ directly without requiring a commitment to the selector polynomial.

**Zero-Knowledge Sumcheck:**

- **Claim:** $\displaystyle\sum_{y \in \{0,1\}^{m_1}} V(y) \cdot \text{selector}(y) = 0$

  This holds if and only if the first $k$ entries of $W$ match $W_1$.

- **Blinding protocol:**

  1. Define a random multilinear $G: \{0,1\}^{m_1} \to \mathbb{F}$
  2. Commit $C_G \leftarrow \text{Commit}(G)$ and compute $S_G = \displaystyle\sum_{y \in \{0,1\}^{m_1}} G(y)$
  3. Add $S_G$ to the transcript and sample $\rho \xleftarrow{\$} \mathbb{F}$
  4. Run sumcheck proving:
     $$\sum_{y \in \{0,1\}^{m_1}} \left[ V(y) \cdot \text{selector}(y) + \rho \cdot G(y) \right] = \rho \cdot S_G$$
  5. Extract the evaluation point $r_z \in \mathbb{F}^{m_1}$ from the $m_1$ sumcheck rounds
  6.
Open $G(r_z)$ against $C_G$

**Opening W and W₁ at r_z:**

After the sumcheck reduces to the point $r_z$, the prover opens both commitments using WHIR's `prove()` function with evaluation constraints. Note that WHIR does not expose a separate `open()` API; instead, evaluation constraints within `prove()` serve as point openings.

1. **Open W at the padded point:** Evaluate $W$ at $(r_z, 0, \ldots, 0) \in \mathbb{F}^m$ to get $w_{\text{eval}}$

   Prove via WHIR evaluation constraint:
   $$\text{Weights::evaluation}(r_z \,||\, \mathbf{0}_{m-m_1}) \implies W(r_z, 0, \ldots, 0) = w_{\text{eval}}$$

   WHIR internally verifies this evaluation and returns empty deferred evaluations.

2. **Open W₁ at r_z:** Evaluate $W_1$ at $r_z \in \mathbb{F}^{m_1}$ to get $w_{1,\text{eval}}$

   Prove via WHIR evaluation constraint:
   $$\text{Weights::evaluation}(r_z) \implies W_1(r_z) = w_{1,\text{eval}}$$

   Similarly, WHIR verifies this evaluation internally.

**Verifier check:** Given the opened values and the sumcheck verification, the verifier computes:

$$\text{selector}(r_z) \quad \text{(deterministic from public } k\text{)}$$

and checks:

$$(w_{\text{eval}} - w_{1,\text{eval}}) \cdot \text{selector}(r_z) = 0$$

Combined with the sumcheck guarantee that $V(r_z) \cdot \text{selector}(r_z) + \rho \cdot G(r_z) = [\text{final fold value}]$, this proves that the first $k$ entries of $W$ match $W_1$ with overwhelming probability.

## Security Analysis

### Binding

**Theorem:** Challenge $\alpha$ is cryptographically bound to the multiplicities committed in $C_1$.

**Proof:**

- $C_1$ is computationally binding by WHIR security
- $\alpha = \text{Hash}(\text{transcript including } C_1)$ via Fiat-Shamir
- Changing the multiplicities requires breaking commitment binding or finding a hash collision

**Theorem:** The consistency check ensures $W$ contains the committed $W_1$.
**Proof:**

- If $W[0..k) \neq W_1$, then $V$ is a non-zero polynomial of degree $\leq m_1$
- By Schwartz-Zippel, $\Pr[V(r_z) = 0] \leq \frac{m_1}{|\mathbb{F}|}$
- Openings bind the claimed values to the commitments
- The verifier detects inconsistency except with negligible probability

**Corollary:** ROM and RAM challenges are bound to their respective deterministic witnesses.

**Proof:**

- ROM access counts and RAM Spice witnesses are committed in $C_1$
- Challenges $\tau_1, \tau_2 = \text{Hash}(\text{transcript including } C_1)$ via Fiat-Shamir
- Changing the committed witnesses requires breaking commitment binding or finding a hash collision
- The same argument as for range check binding applies to all logup-based checks

### Zero-Knowledge

**Theorem:** The protocol is zero-knowledge against poly-time verifiers in the bounded-query random oracle model.

**Proof sketch:**

- All commitments use ZK-WHIR masking (the $y$-variable technique with a random polynomial)
- Openings reveal only $P(r)$ for random $r$, indistinguishable from uniform
- Sumcheck round polynomials are blinded by independent random polynomials $G$
- Simulator: program the random oracle, sample commitments/openings uniformly
- A hybrid argument shows the real and simulated distributions are computationally indistinguishable

### Soundness

**Theorem:** The soundness error is bounded by $\frac{m_0 + m + m_1}{|\mathbb{F}|} + \varepsilon_{\text{WHIR}}$.

**Proof:**

- Phase 1+2: Challenges are bound to $W_1$ (binding + collision resistance)
- Phase 5: Consistency check via Schwartz-Zippel (error $\leq \frac{m_1}{|\mathbb{F}|}$)
- Phase 4: R1CS sumcheck soundness (error $\leq \frac{m_0 + m}{|\mathbb{F}|}$)
- WHIR PCS soundness: $\varepsilon_{\text{WHIR}}$ depends on FRI parameters (query count, folding factors, PoW bits)
- A union bound gives the total error

**Note:** WHIR soundness $\varepsilon_{\text{WHIR}}$ is configurable via query repetitions and proof-of-work, typically set to $2^{-128}$ or smaller for cryptographic security.
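As a concrete sanity check of the log-derivative identity that these bounds protect, the following sketch verifies that honest multiplicities satisfy the logup equation while a tampered multiplicity vector fails for a fixed, pre-committed challenge. The 31-bit toy field and all values are made up for illustration; this is not ProveKit code:

```rust
// Toy prime field p = 2^31 - 1; for illustration only.
const P: u64 = 2_147_483_647;

fn add(a: u64, b: u64) -> u64 { (a + b) % P }
fn sub(a: u64, b: u64) -> u64 { (a + P - b) % P }
fn mul(a: u64, b: u64) -> u64 { (a * b) % P }

fn pow(mut base: u64, mut exp: u64) -> u64 {
    let mut acc = 1;
    while exp > 0 {
        if exp & 1 == 1 { acc = mul(acc, base); }
        base = mul(base, base);
        exp >>= 1;
    }
    acc
}

/// Field inverse via Fermat's little theorem; requires a != 0 mod P.
fn inv(a: u64) -> u64 { pow(a, P - 2) }

/// Both sides of  sum_i m_i/(alpha - t_i)  =  sum_j 1/(alpha - l_j).
fn logup_sides(table: &[u64], multiplicities: &[u64], lookups: &[u64], alpha: u64) -> (u64, u64) {
    let lhs = table.iter().zip(multiplicities)
        .fold(0, |acc, (&t, &m)| add(acc, mul(m, inv(sub(alpha, t)))));
    let rhs = lookups.iter()
        .fold(0, |acc, &l| add(acc, inv(sub(alpha, l))));
    (lhs, rhs)
}

fn main() {
    let table: Vec<u64> = (0..8).collect();
    let lookups = [3, 5, 3, 7];
    // Honest multiplicities: count of each table value among the lookups.
    let mut m = vec![0u64; table.len()];
    for &l in &lookups { m[l as usize] += 1; }

    let alpha = 123_456_789; // Stands in for a Fiat-Shamir challenge.
    let (lhs, rhs) = logup_sides(&table, &m, &lookups, alpha);
    assert_eq!(lhs, rhs); // Identity holds for honest multiplicities.

    let mut bad = m.clone();
    bad[0] += 1; // Tampered multiplicity
    let (lhs_bad, rhs_bad) = logup_sides(&table, &bad, &lookups, alpha);
    assert_ne!(lhs_bad, rhs_bad); // Fails when the challenge precedes the tampering.
}
```

This is exactly the asymmetry the two-phase protocol enforces: with $\alpha$ fixed before the multiplicities can be adjusted, a tampered $W_1$ only passes with the small probability that Schwartz-Zippel allows.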
## Cost Analysis

### Overhead Quantification

**For multiplicities of size k and full witness size n:**

```
m₁ = ceil(log₂ k)
m  = ceil(log₂ n)
```

**Additional costs:**

- Commitment: 2^m₁ vs 2^m (when k << n, this is ~k vs n elements)
- Sumcheck: m₁ rounds vs m rounds
- Openings: 2 WHIR proofs (one m-variate, one m₁-variate)

**Concrete example (ProveKit usage):**

```
Range check with 2^16 table:
  k  = 2^16 multiplicities
  n  = 2^20 full witness
  m₁ = 16
  m  = 20

C₁ commitment:        65,536 vs 1,048,576 elements (16x reduction)
Consistency sumcheck: 16 vs 20 rounds
```

## Implementation in ProveKit

### Witness Builder Modification

Tag builders in `provekit/common/src/witness/witness_builder.rs`:

```rust
impl WitnessBuilder {
    pub fn is_phase1(&self) -> bool {
        matches!(self,
            WitnessBuilder::Constant(_) |
            WitnessBuilder::Acir(_, _) |
            WitnessBuilder::MultiplicitiesForRange(_, _, _) |
            WitnessBuilder::MultiplicitiesForBinOp(_, _) |
            WitnessBuilder::SpiceWitnesses(_) // RAM checking witnesses
        )
    }

    pub fn is_phase2(&self) -> bool {
        matches!(self,
            WitnessBuilder::Challenge(_) | // All challenges: α, τ₁, τ₂, ...
            WitnessBuilder::LogUpDenominator(_, _, _) | // Range checks
            WitnessBuilder::IndexedLogUpDenominator(_, _, _, _, _) | // ROM checking
            WitnessBuilder::BinOpLookupDenominator(_, _, _, _, _, _, _)
            // ... any builder that depends on Challenge values
        )
    }
}
```

### Two-Pass Witness Solving

In `provekit/prover/src/noir_proof_scheme.rs`:

```rust
// Pass 1: Solve phase 1 witnesses
let mut witness_merlin_phase1 = witness_io.to_prover_state();
self.seed_witness_merlin(&mut witness_merlin_phase1, &acir_witness_idx_to_value_map)?;

let phase1_builders = self.witness_builders.iter()
    .filter(|wb| wb.is_phase1())
    .collect();

let partial_witness_phase1 = self.r1cs.solve_witness_vec(
    &phase1_builders,
    &acir_witness_idx_to_value_map,
    &mut witness_merlin_phase1,
);

// Commit C1 and extract α
let (commitment_to_w1, masked_w1, random_w1) =
    batch_commit_to_polynomial(..., &partial_witness_phase1, ...);
let alpha = witness_merlin_phase1.challenge_scalar()?;

// Pass 2: Solve phase 2 witnesses using α
let full_witness = self.r1cs.solve_witness_vec(
    &self.witness_builders,
    &acir_witness_idx_to_value_map,
    &mut witness_merlin_phase1, // Contains α now
);
```

### Integration Points

1. **`provekit/prover/src/noir_proof_scheme.rs`**: Two-pass witness solving
2. **`provekit/prover/src/whir_r1cs.rs`**: Add the consistency sumcheck after the main R1CS proof
3. **`provekit/r1cs-compiler/src/witness_generator.rs`**: Tag builders during compilation
4.
**`provekit/verifier/src/whir_r1cs.rs`**: Add consistency verification

## Conclusion

The two-phase commitment protocol fixes the adaptive witness attack in ProveKit's logup-based checks (range checks, ROM, RAM) while maintaining:

- **Soundness:** All challenges ($\alpha$, $\tau_1$, $\tau_2$) are cryptographically bound to committed deterministic witnesses
- **Efficiency:** $O(k)$ overhead when $k \ll n$ (the typical case), where $k$ is the size of the deterministic witnesses
- **Zero-knowledge:** All openings use ZK-WHIR masking in the bounded-query model
- **Compatibility:** Integrates with existing ProveKit sumcheck and WHIR infrastructure
- **Generality:** Applies uniformly to all log-derivative based checks in the system

The protocol is provably secure with an explicit soundness bound of $\frac{m_0 + m + m_1}{|\mathbb{F}|} + \varepsilon_{\text{WHIR}}$ and adds minimal overhead, proportional to the size of the deterministic witnesses rather than the full witness.

## Solution 2: Split Witness with Multi-Tree Batched Opening

This solution combines the 2D witness encoding approach with WHIR's multi-commitment batching capability to achieve optimal efficiency.

### Overview

**Key idea:** Commit to w₁ and w₂ separately, use 2D witness encoding during the R1CS sumcheck, then leverage WHIR's ability to batch openings across multiple Merkle trees. This eliminates both the consistency proof and the additional matrix batching sumcheck.

**Benefits:**

- No consistency proof needed (different commitments, no duplication)
- No additional sumcheck rounds (a single batched opening produces a single evaluation point)
- Minimal proof overhead compared to the current implementation
- Soundness preserved: challenges are extracted after the w₁ commitment

### Protocol Specification

#### Phase 1: Witness Splitting and Commitment

```rust
// 1. Solve w₁ (deterministic witness: multiplicities, ACIR, constants, etc.)
let w1 = r1cs.solve_witness_vec(
    &split_builders.w1_layers,
    &acir_witness_map,
    None, // No transcript access
    split_builders.w1_size
);

// 2. Commit to w₁
let z1 = pad_to_power_of_two(w1); // Size: 2^m
let (commitment_w1, masked_poly_w1, random_poly_w1) =
    whir_for_w1.commit(&mut merlin, &z1)?;

// 3. Extract challenges from the transcript (now bound to the w₁ commitment)
// Challenge extraction happens during w₂ witness building

// 4. Solve w₂ (challenge-dependent witness)
let w2 = r1cs.solve_witness_vec(
    &split_builders.w2_layers,
    &acir_witness_map,
    Some(&mut merlin), // Challenges extracted here
    split_builders.w2_size
);

// 5. Commit to w₂
let z2 = pad_to_power_of_two(w2); // Size: 2^m (same as w₁)
let (commitment_w2, masked_poly_w2, random_poly_w2) =
    whir_for_w2.commit(&mut merlin, &z2)?;
```

**Requirements:**

- `w1.len() = w2.len() = 2^m` (equal power-of-two sizes for batching)
- Commitments stored separately: `root₁` and `root₂`
- No concatenation or re-commitment needed

#### Phase 2: R1CS Sumcheck with 2D Witness Encoding

**Virtual witness structure:** Treat the witness as a 2D multilinear polynomial:

```
w(x, y) where:
  x ∈ {0, 1}:   selects w₁ (x=0) or w₂ (x=1)
  y ∈ {0,1}^m:  position within the selected part

w(0, y) = w₁(y)
w(1, y) = w₂(y)
```

**Matrix encoding:** R1CS matrices A, B, C extended to 3D:

```
A(constraint_idx, x_witness, y_witness) where:
  constraint_idx ∈ {0,1}^m₀: constraint index
  x_witness ∈ {0, 1}:        first/second half selector
  y_witness ∈ {0,1}^m:       position within half
```

**R1CS sumcheck (standard):**

```rust
// Verify: Σᵢ eq(i,r) · (Aᵢw · Bᵢw - Cᵢw) = 0
run_zk_sumcheck_prover(
    m_0, // Constraint variables
    [a, b, c, eq],
    |[a, b, c, eq]| {
        let f0 = eq.0 * (a.0 * b.0 - c.0);
        let f_em1 = (eq.0 + eq.0 - eq.1)
            * ((a.0 + a.0 - a.1) * (b.0 + b.0 - b.1) - (c.0 + c.0 - c.1));
        let f_inf = (eq.1 - eq.0) * (a.1 - a.0) * (b.1 - b.0);
        [f0, f_em1, f_inf]
    },
    &whir_for_hiding_spartan,
    &mut merlin
);
```

**Result:** Random constraint point `r`, claims on `a(r), b(r), c(r)` where:

```
a(r) = Σⱼ A(r,j)·w(j)
```

#### Phase 3: Multi-Tree Batched Opening

**Step 1: Rewrite the evaluation claim using the 2D encoding**

For matrix A (same for B, C):

```
a(r) = Σⱼ A(r,j)·w(j)

Reindex j → (x,y):

a(r) = Σₓ Σᵧ A(r,x,y)·w(x,y)
     = Σᵧ A(r,0,y)·w₁(y)    (weighted sum on w₁)
     + Σᵧ A(r,1,y)·w₂(y)    (weighted sum on w₂)
```

**Step 2: Batch commitments and weights**

```rust
// Extract batching randomness
let β = merlin.challenge_scalar()?;

// Batched claim for matrices A, B, C with additional randomness
let (β_A, β_B, β_C) = extract_batching_randomness(&mut merlin);

// Combined weighted claim (matrix batching):
//   β_A·a(r) + β_B·b(r) + β_C·c(r)
// = Σᵧ [β_A·A(r,0,y) + β_B·B(r,0,y) + β_C·C(r,0,y)]·w₁(y)
// + Σᵧ [β_A·A(r,1,y) + β_B·B(r,1,y) + β_C·C(r,1,y)]·w₂(y)
// = Σᵧ weights₁(y)·w₁(y) + Σᵧ weights₂(y)·w₂(y)
//
// The multi-tree opening below additionally folds in the tree-batching
// randomness β, proving Σᵧ weights₁(y)·w₁(y) + β·Σᵧ weights₂(y)·w₂(y)
// with the claimed value adjusted accordingly.
let weights_1 = compute_combined_weights(A, B, C, r, 0, β_A, β_B, β_C);
let weights_2 = compute_combined_weights(A, B, C, r, 1, β_A, β_B, β_C);
```

**Step 3: Multi-tree WHIR opening (MODIFIED WHIR)**

Run a single WHIR sumcheck over the batched polynomial `w₁ + β·w₂`:

```rust
// Conceptually: prove Σᵧ weights₁(y)·w₁(y) + β·Σᵧ weights₂(y)·w₂(y) = v
run_multi_tree_whir_opening(
    commitments: [commitment_w1, commitment_w2],
    polynomials: [w1, w2],
    weights: [weights_1, weights_2],
    batching_coeff: β,
    merlin: &mut merlin
);
```

**WHIR sumcheck internals (m rounds):**

Each round i:

1. Prover computes the univariate `g_i(X)` of `weights₁·w₁ + β·weights₂·w₂` along dimension i
2. Verifier checks consistency and samples a random challenge `r_i`
3.
Both polynomials fold into the next round

**Result after m rounds:** A single evaluation point `q ∈ 𝔽^m`

**Final claims:**

```
weights₁(q)·w₁(q) + β·weights₂(q)·w₂(q) = final_value

Requires proving:
- w₁(q): Open the Merkle tree at the queried positions of commitment_w1
- w₂(q): Open the Merkle tree at the queried positions of commitment_w2
- weights₁(q), weights₂(q): Evaluate from the sparse matrices (Spark protocol)
```

#### Phase 4: Matrix Evaluation (Spark)

Since the matrix evaluations are at the two related points `(r, 0, q)` and `(r, 1, q)`, sharing `(r, q)`, they can be evaluated directly:

```rust
// For weights₁(q) = β_A·A(r,0,q) + β_B·B(r,0,q) + β_C·C(r,0,q)
// For weights₂(q) = β_A·A(r,1,q) + β_B·B(r,1,q) + β_C·C(r,1,q)

// Batch both evaluations with random γ
let γ = merlin.challenge_scalar()?;

// Prove: weights₁(q) + γ·weights₂(q)
//   = Σ_{entries} value·eq(entry.row, r)·
//     [eq(entry.col, (0,q)) + γ·eq(entry.col, (1,q))]
run_spark_matrix_evaluation(
    matrices: [A, B, C],
    eval_points: [(r, 0, q), (r, 1, q)],
    batching_coeffs: [β_A, β_B, β_C, γ],
    &mut merlin
);
```

**No additional sumcheck needed** - this is the standard Spark matrix evaluation.

### Implementation Requirements

#### WHIR Library Modifications

The current WHIR implementation needs an extension to support multi-tree batching:

**1. Modify the `Witness` struct to hold multiple trees:**

```rust
pub struct MultiTreeWitness<F, MerkleConfig> {
    polynomials: Vec<CoefficientList<F>>,
    merkle_trees: Vec<MerkleTree<MerkleConfig>>,
    merkle_leaves: Vec<Vec<F>>,
    ood_points: Vec<F>,
    per_tree_ood_answers: Vec<Vec<F>>,
    batching_randomness: F, // Extracted at opening time
}
```

**2.
Modify the opening protocol to delay the RLC:**

Currently (single tree):

```rust
// In commit_batch: RLC happens immediately
let batching_randomness = prover_state.challenge_scalar();
batched_poly = poly₀ + β·poly₁ + ...;
```

Required (multi-tree):

```rust
// In commit_batch: Store the polynomials separately
return MultiTreeWitness {
    polynomials: vec![poly₀, poly₁],
    merkle_trees: vec![tree₀, tree₁],
    // No RLC yet
};

// In open_batch: RLC happens here
let β = prover_state.challenge_scalar();
let batched_poly = poly₀ + β·poly₁;
// Run the sumcheck on batched_poly
```

**3. Modify the opening proof to include multiple Merkle paths:**

```rust
pub struct MultiTreeOpeningProof {
    sumcheck_rounds: Vec<UnivariatePolynomial>, // m rounds
    merkle_proofs: Vec<MerklePath>,             // One per tree
    final_evaluations: Vec<F>,                  // w₁(q), w₂(q), ...
}
```

### Proof Size Analysis

**Current implementation (with consistency proof):**

| Component | Rounds | Field Elements | Cost |
|-----------|--------|----------------|------|
| R1CS sumcheck | m₀ | ~3m₀ | Standard |
| Full witness commitment | - | O(λ) | Merkle root |
| Full witness opening | m+1 | ~4(m+1) | WHIR rounds |
| w₁ commitment | - | O(λ) | Merkle root |
| Consistency sumcheck | m₁ | ~3m₁ | Additional |
| Consistency opening (w₁) | m₁ | ~4m₁ | WHIR on w₁ |
| Consistency opening (full) | m+1 | ~4(m+1) | WHIR on full |
| **Total opening cost** | | **~11m + 7m₁ + 8** | |

**Proposed implementation (multi-tree batched):**

| Component | Rounds | Field Elements | Cost |
|-----------|--------|----------------|------|
| R1CS sumcheck | m₀ | ~3m₀ | Standard |
| w₁ commitment | - | O(λ) | Merkle root |
| w₂ commitment | - | O(λ) | Merkle root |
| Multi-tree batched opening | m | ~4m | WHIR rounds |
| 2 Merkle paths | - | ~2 log(2^m) | Authentication |
| Spark matrix evaluation | ~m+1 | ~3(m+1) | Standard |
| **Total opening cost** | | **~7m + 3** | |

**Savings:** ~4m + 7m₁ + 5 field elements ≈ **35-40% reduction** for typical m=20, m₁=15

### Security Considerations
**Soundness:** The protocol maintains soundness through:

1. w₁ committed before challenge extraction (prevents adaptive attacks)
2. Fiat-Shamir for all randomness (β, γ from the transcript)
3. WHIR batching soundness (Schwartz-Zippel lemma)
4. No duplication between the commitments (no consistency proof needed)

**Zero-knowledge:** Preserved through:

1. WHIR's existing ZK mechanisms (masking polynomials)
2. Independent commitments with independent randomness
3. Batching happens post-commitment (does not leak information)

### Migration Path

**Phase 1:** Implement the basic split witness (current work)

- Use separate commitments for w₁ and w₂
- Keep the consistency proof temporarily
- Validate correctness with the existing tests

**Phase 2:** Add multi-tree batching to WHIR

- Extend the `Witness` struct for multiple trees
- Modify the opening protocol to delay the RLC
- Add multi-tree test cases

**Phase 3:** Integrate the multi-tree opening

- Replace the consistency proof with the batched opening
- Update ProveKit's `WhirR1CSProver` trait
- Benchmark the proof size reduction

**Phase 4:** Optimize for production

- Implement the Spark matrix evaluation optimization
- Fine-tune batching parameters
- Add comprehensive test coverage
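The delayed-RLC step above relies on the linearity of multilinear evaluation: folding the batched polynomial `w₁ + β·w₂` to a point gives the same value as batching the two individual openings. A toy sketch (31-bit field and made-up values; not the WHIR API):

```rust
// Toy prime field p = 2^31 - 1; for illustration only.
const P: u64 = 2_147_483_647;

fn add(a: u64, b: u64) -> u64 { (a + b) % P }
fn sub(a: u64, b: u64) -> u64 { (a + P - b) % P }
fn mul(a: u64, b: u64) -> u64 { (a * b) % P }

/// Multilinear evaluation by variable-at-a-time folding.
fn mle_eval(mut v: Vec<u64>, r: &[u64]) -> u64 {
    for &rj in r {
        let mut next = Vec::with_capacity(v.len() / 2);
        for i in 0..v.len() / 2 {
            next.push(add(v[2 * i], mul(rj, sub(v[2 * i + 1], v[2 * i]))));
        }
        v = next;
    }
    v[0]
}

fn main() {
    let w1 = vec![5, 9, 2, 7];
    let w2 = vec![1, 8, 3, 6];
    let beta = 42;    // Tree-batching randomness (would come from the transcript)
    let q = [11, 29]; // Evaluation point after the m folding rounds

    // Prover side: fold the RLC'd polynomial w₁ + β·w₂.
    let batched: Vec<u64> = w1.iter().zip(&w2)
        .map(|(&a, &b)| add(a, mul(beta, b)))
        .collect();
    let folded = mle_eval(batched, &q);

    // Verifier side: combine the two claimed openings w₁(q) and w₂(q).
    let expected = add(mle_eval(w1, &q), mul(beta, mle_eval(w2, &q)));
    assert_eq!(folded, expected);
    println!("batched opening consistent: {}", folded);
}
```

Because each fold round is linear in the evaluation table, deferring the RLC to `open_batch` changes nothing about the final claim; it only requires the trees to be stored separately, which is exactly what `MultiTreeWitness` provides.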