Redefining the `combine` Function for the Univariate Skip Optimization

# Redefining the `combine` Function for the Univariate Skip Optimization This note details the necessary modifications to the `combine` function, originally designed for the sumcheck protocol over a Boolean hypercube, to support the univariate skip optimization. This optimization requires handling polynomials over a mixed domain $D \times H^j$, where $D$ is a multiplicative subgroup of a finite field. ## How the Original `combine` Function Works and Its Limitations for the Univariate Skip The existing `combine` function in `statement.rs` is purpose-built for multilinear polynomials evaluated over the input domain of the Boolean hypercube $\{0,1\}^m$. Its core purpose is to compute the batched evaluation polynomial $W(X)$ defined as: $$ W(X) = \sum_{i=0}^{t-1} \gamma^i \cdot \text{eq}_{z_i}(X) $$ where: * $t$ is the number of constraints. * $\gamma$ is a random challenge. * $z_i \in \mathbb{F}^m$ are the constraint points, whose coordinates are elements of the field $\mathbb{F}$. * $\text{eq}_{z_i}(X)$ is the equality polynomial, which evaluates to $1$ if $X = z_i$ and $0$ for all other $X \in \{0,1\}^m$. The implementation computes the evaluations of the batched polynomial $W(X)$ for all $X \in \{0,1\}^m$ by leveraging an iterative process. This method is possible because, for the Boolean hypercube domain, the equality polynomial has a special structure. For a given constraint point $z = (z_1, \dots, z_m) \in \mathbb{F}^m$, this polynomial is defined as a product of simple, independent terms for each variable: $$ \text{eq}_z(X_1, \dots, X_m) = \prod_{j=1}^{m} \underbrace{\left(X_j z_j + (1-X_j)(1-z_j)\right)}_{\text{Term for variable } X_j \text{ and coordinate } z_j} $$ The algorithm materializes the full table of $2^m$ evaluations by starting with the initial weights (powers of $\gamma$) and then, for each dimension $j=1, \dots, m$, updating the entire table. At each step $j$, the table is split into two halves corresponding to points where $X_j = 0$ and $X_j = 1$. The update rules are derived directly from the formula above: * For the half of the table where $X_j = 0$, all values are multiplied by the factor $(1 - z_j)$. * For the half of the table where $X_j = 1$, all values are multiplied by the factor $z_j$. This recursive update is efficient because it processes one dimension at a time. The entire method is fundamentally tied to the input domain being the Boolean hypercube. The decomposition of the equality polynomial into a product of independent, linear terms is a unique property of this domain. For any other input domain, such as a multiplicative subgroup, the equality polynomial becomes a non-decomposable Lagrange basis polynomial, which does not permit this simple, dimension-by-dimension construction. The univariate skip optimization fundamentally changes this **input domain structure**, rendering the existing logic inapplicable. ## The New Domain Structure: $D \times H^{n-k}$ The univariate skip optimization partitions the $n$ variables into two sets: 1. The first $k$ variables are mapped to a single variable $X$ that lives in a multiplicative subgroup $D \subset \mathbb{F}$ of size $|D| = 2^k$. 2. The remaining $j = n-k$ variables remain defined over the Boolean hypercube $H^{n-k} = \{0,1\}^{n-k}$. Consequently, our evaluation domain is no longer $\{0,1\}^n$ but the mixed domain $D \times H^{n-k}$. A point in this new domain is represented as $(x, \mathbf{y})$, where $x \in D$ and $\mathbf{y} \in H^{n-k}$. This change requires us to redefine the equality polynomial $\text{eq}_z$ for a constraint point $z = (y, \mathbf{b})$ where $y \in D$ and $\mathbf{b} \in H^{n-k}$. The new equality polynomial is a product of two independent equality polynomials: $$ \text{eq}_{(y, \mathbf{b})}(X, \mathbf{Y}) = \underbrace{\text{eq}_D(X, y)}_{\text{Part 1: Subgroup}} \cdot \underbrace{\text{eq}_{H^{n-k}}(\mathbf{Y}, \mathbf{b})}_{\text{Part 2: Hypercube}} $$ The hypercube part, $\text{eq}_{H^{n-k}}$, can still be handled by the original logic. The novel part is defining and computing $\text{eq}_D(X, y)$. ## The Equality Polynomial for a Multiplicative Subgroup $D$ The equality polynomial $\text{eq}_D(X, y)$ is the Lagrange basis polynomial $L_y(X)$ for the point $y \in D$ over the domain $D$. For a cyclic group $D$ of order $|D|$, this polynomial has a standard, compact form: \begin{aligned} \text{eq}_D(X, y) = L_y(X) &= \frac{1}{|D|} \sum_{i=0}^{|D|-1} \left(\frac{X}{y}\right)^i \\ &= \frac{1}{|D|} \sum_{i=0}^{|D|-1} (Xy^{-1})^i \end{aligned} This formula ensures that $\text{eq}_D(y, y) = 1$ and $\text{eq}_D(x, y) = 0$ for any other $x \in D, x \neq y$. :::info ### Examples Let's expand this formula for the specific cases shown on the whiteboard to build intuition. **Case 1: $D = \{\pm 1\}$** Here, $|D| = 2$. For any $y \in D$, we have $y^{-1} = y$. The formula becomes: $$ \text{eq}_D(X, y) = \frac{1}{2} \sum_{i=0}^{1} (Xy)^i = \frac{1}{2} (1 + Xy) $$ **Case 2: $D = \{\pm 1, \pm i\}$ (the 4th roots of unity)** Here, $|D| = 4$. For any $y \in D$, we have $y^{-1} = y^3$. The formula becomes (for any element $y \in D$, we have $y^4 = 1$): $$ \begin{align*} \text{eq}_D(X, y) &= \frac{1}{4} \sum_{i=0}^{3} (Xy^{-1})^i = \frac{1}{4} \sum_{i=0}^{3} (Xy^3)^i \\ &= \frac{1}{4} \left( 1 + (Xy^3) + (Xy^3)^2 + (Xy^3)^3 \right) \\ &= \frac{1}{4} \left( 1 + Xy^3 + X^2y^6 + X^3y^9 \right) \\ &= \frac{1}{4} (1 + Xy^3 + X^2\underbrace{y^2}_{\substack{ y^6 = y^{4} \cdot y^2 = 1 \cdot y^2 = y^2}} + X^3\underbrace{y}_{\substack{y^9 = (y^4)^2 \cdot y = 1^2 \cdot y = y}}) \end{align*} $$ ::: :::success ### Python script The following Python script provides an empirical test, confirming that for any multiplicative subgroup for a given modulus, the formula correctly evaluates to 1 when its inputs are identical and 0 otherwise, thereby validating our required conditions. ```python import math def get_prime_factors(n): """Finds the unique prime factors of a number n.""" factors = set() # Handle factor 2 while n % 2 == 0: factors.add(2) n = n // 2 # Handle odd factors for i in range(3, int(math.sqrt(n)) + 1, 2): while n % i == 0: factors.add(i) n = n // i # If n is a prime number greater than 2 if n > 2: factors.add(n) return list(factors) def find_generator(p): """Finds a generator for the multiplicative group of integers modulo p.""" if not isinstance(p, int) or p <= 1: raise ValueError("p must be a prime number.") # The order of the multiplicative group F_p* is p-1. order = p - 1 # Find the prime factors of the order. prime_factors = get_prime_factors(order) # Iterate through potential generators from 2 to p-1. for g in range(2, p): is_generator = True # A number g is a generator if for all prime factors q of (p-1), # g^((p-1)/q) is not congruent to 1 mod p. for q in prime_factors: if pow(g, order // q, p) == 1: is_generator = False break # If it passes all checks, it's a generator. if is_generator: return g return None # Should not happen for a prime p def find_subgroups(p, max_order): """ Finds all multiplicative subgroups of F_p* up to a given max_order. """ # Order of the full multiplicative group F_p*. group_order = p - 1 # Find a generator for the full group. g = find_generator(p) subgroups = [] # The order of any subgroup must divide the order of the group. # Find all divisors of the group's order. for d in range(1, group_order + 1): if group_order % d == 0 and d <= max_order: # For each divisor d, there is a unique subgroup of that order. # Its generator 'h' can be found by g^((p-1)/d). h = pow(g, group_order // d, p) # Generate the subgroup elements by taking powers of its generator h. subgroup = sorted([pow(h, i, p) for i in range(d)]) subgroups.append(subgroup) return subgroups def eq_D(X, y, D, p): """ Calculates the Lagrange basis polynomial L_y(X) over a subgroup D. The formula is: (1/|D|) * Σ_{i=0}^{|D|-1} (X * y^{-1})^i mod p """ # Get the order of the subgroup, |D|. order = len(D) # Calculate the modular multiplicative inverse of the order, (1/|D|) mod p. order_inv = pow(order, -1, p) # Calculate the modular multiplicative inverse of y, (y^{-1}) mod p. y_inv = pow(y, -1, p) # Calculate the base of the power in the summation, (X * y^{-1}) mod p. base = (X * y_inv) % p # Compute the sum of the geometric series Σ base^i for i from 0 to |D|-1. total_sum = sum(pow(base, i, p) for i in range(order)) % p # Multiply the final sum by the inverse of the order to get the result. result = (total_sum * order_inv) % p return result def verify_subgroup(D, p): """ Verifies that eq_D(X, y) is (for all X, y in D): - 1 if X == y - 0 if X != y. """ print(f"\n--- Verifying Subgroup D = {D} (mod {p}) ---") for y in D: for X in D: value = eq_D(X, y, D, p) if X == y: assert value == 1, ( f"PANIC: eq_D(X={X}, y={y}) should be 1, but was {value}" ) else: assert value == 0, ( f"PANIC: eq_D(X={X}, y={y}) should be 0, but was {value}" ) print(f"✅ All checks passed for subgroup D = {D}.") # Define the prime modulus for our finite field F_p. p = 17 # Define the maximum order of the subgroups you want to verify. max_order_to_check = 16 # Automatically find all valid subgroups. all_subgroups = find_subgroups(p, max_order_to_check) # Loop through the found subgroups and verify each one. for subgroup in all_subgroups: verify_subgroup(subgroup, p) print( f"\nAll subgroups of F_{p} up to order {max_order_to_check} verified successfully." ) ``` ::: ## The New `combine` Logic - NO NEED TO READ THESE ARE SOME IDEAS ABOUT IMPLEMENTATION Our goal is to compute the evaluations of the batched polynomial $W(X, \mathbf{Y})$ over its new mixed domain, which is the Cartesian product of a multiplicative subgroup $D$ and a Boolean hypercube $H^{n-k}$. The polynomial is defined as a sum over all $t$ constraints. For each point $(X, \mathbf{Y})$ in the domain, its value is: $$ W(X, \mathbf{Y}) = \sum_{i=0}^{t-1} \underbrace{\gamma^i}_{\text{Challenge}} \cdot \underbrace{\text{eq}_{D}(X, y_i)}_{\text{Subgroup Part}} \cdot \underbrace{\text{eq}_{H^{n-k}}(\mathbf{Y}, \mathbf{b}_i)}_{\text{Hypercube Part}} $$Here, each constraint point $z_i$ has been split into its two components: $y_i \in D$ and $\mathbf{b}_i \in H^{n-k}$. To implement this, we'll use a direct and simple algorithm. The core idea is to think of the final evaluation table as a canvas. We will compute the contribution of each constraint individually and "add" it to this canvas, one layer at a time, until the final result is complete.