# Hashcaster - Part 3: boolcheck
*Special thanks to Lev Soukhanov for his great help all along the project and for the super interesting theoretical discussions we had.*
## Introduction
The boolcheck protocol in Hashcaster introduces a novel approach for efficiently verifying quadratic Boolean formulas using polynomial coordinates. In this context, boolean variables are represented as elements of a finite field (for example $\mathbb{F}_{2^{128}}$), where logical operations are naturally embedded within algebraic expressions.
The main goal of boolcheck is to verify the correctness of boolean formulas while minimizing the computational burden on both the prover and verifier. To illustrate this, we will focus on the andcheck protocol—a fundamental case that demonstrates the core techniques. However, this framework extends seamlessly to any homogeneous quadratic Boolean formula, and with slight modifications, can also accommodate non-homogeneous formulas.
To understand why boolcheck leverages Frobenius theory and the 4 Russians method, we will first explore the underlying mathematical foundations. By diving into Frobenius automorphisms, their impact on polynomial representations, and how they enable efficient extraction of packed field elements, we will see how they naturally fit into the construction of boolcheck. Additionally, we will introduce the 4 Russians method, a powerful optimization technique that significantly accelerates matrix-vector computations, making it a key component of our protocol.
This post will take a structured approach:
1. **Frobenius theory** – Understanding the role of Frobenius maps, traces, and their application to encoding Boolean logic efficiently.
2. **The 4 Russians method** – Exploring a classical optimization technique that speeds up structured matrix operations.
3. **Boolcheck protocol** – Implementing these mathematical tools to efficiently verify quadratic Boolean formulas.
4. **Multiopening optimization** – Using the 4 Russians method to reduce complexity in multi-opening verifications.
By the end of this post, the reader will have a clear understanding of how boolcheck transforms Boolean logic verification into an algebraic proof system optimized for computational efficiency.
## Frobenius Theory
### Frobenius map and orbit
Let’s start with a fundamental concept in finite fields. Suppose we are working with a base field $\mathbb{F}$ and an extension field of degree $d$, which, viewed as a vector space over $\mathbb{F}$, is isomorphic to $\mathbb{F}^d$; we will use $\mathbb{F}^d$ to denote this extension. We define a basis for this extension as:
\begin{equation}
b_0, b_1, \dots, b_{d-1} \in \mathbb{F}^d
\end{equation}
These basis elements have a special property: any element of the extension field $\mathbb{F}^d$ can be uniquely represented as a linear combination of them using coefficients from the base field $\mathbb{F}$. In other words, given any element in $\mathbb{F}^d$, we can always find some values $p_0, p_1, \dots, p_{d-1}$ in $\mathbb{F}$ such that:
\begin{equation} x = p_0 b_0 + p_1 b_1 + \dots + p_{d-1} b_{d-1} \end{equation}
Next, with this structure in mind, let’s consider values $p_0, p_1, \dots, p_{d-1}$ from $\mathbb{F}$. Using these values, we define a function known as the packing map:
\begin{equation}
\text{pack}(p_0, \dots, p_{d-1}) = \sum_{i=0}^{d-1} b_i p_i
\end{equation}
This function encodes multiple field elements into a single field element. However, our goal is to reverse this process: given a packed value $p$, we want to recover the original values $p_0, p_1, \dots, p_{d-1}$ using algebraic methods. To achieve this, we introduce the Frobenius morphism, which will allow us to systematically extract these components.
#### Frobenius map
The Frobenius map is a fundamental automorphism in finite field arithmetic. Given a field $\mathbb{F}_{p^m}$ with characteristic $p$, the Frobenius map $\operatorname{Fr}$ is defined as:
\begin{equation}
\operatorname{Fr}(x) = x^p
\end{equation}
This function preserves the algebraic structure of the field, meaning that addition and multiplication behave predictably under its application.
In fields where the characteristic is $2$ (such as $\mathbb{F}_{2^m}$), the Frobenius map simplifies further:
\begin{equation}
\operatorname{Fr}(x) = x^2
\end{equation}
This property makes computations particularly efficient when working in binary fields.
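To make this concrete, here is a minimal Rust sketch of arithmetic in the toy field $\mathbb{F}_{2^3} = \mathbb{F}_2[\alpha]/(\alpha^3 + \alpha + 1)$, where the Frobenius map is plain squaring. The representation and names are illustrative (this is not Hashcaster's actual code); later examples in this post reuse these helpers.
```rust
/// Multiplication in F_{2^3} = F_2[a]/(a^3 + a + 1).
/// Elements are 3-bit masks: bit i is the coefficient of a^i.
fn mul(x: u8, y: u8) -> u8 {
    let mut r: u8 = 0;
    // Schoolbook carry-less multiplication: XOR shifted copies of x.
    for i in 0..3 {
        if (y >> i) & 1 == 1 {
            r ^= x << i;
        }
    }
    // Reduce modulo a^3 + a + 1, using a^4 = a^2 + a and a^3 = a + 1.
    if (r >> 4) & 1 == 1 { r ^= 0b10000 ^ 0b00110; }
    if (r >> 3) & 1 == 1 { r ^= 0b01000 ^ 0b00011; }
    r
}

/// Frobenius map in characteristic 2: squaring.
fn frob(x: u8) -> u8 {
    mul(x, x)
}

fn main() {
    let (a, b) = (0b010, 0b110); // a = α, b = α^2 + α
    // Fr preserves addition (the "freshman's dream") and multiplication.
    assert_eq!(frob(a ^ b), frob(a) ^ frob(b));
    assert_eq!(frob(mul(a, b)), mul(frob(a), frob(b)));
    // Periodicity: Fr^3 is the identity on F_{2^3}.
    assert_eq!(frob(frob(frob(a))), a);
}
```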
### Properties of the Frobenius map
The Frobenius map is a fundamental automorphism in finite field arithmetic, meaning it preserves the structure of the field while transforming its elements in a predictable way. It possesses two key properties that make it particularly useful in algebraic computations:
#### Preservation of addition and multiplication
One of the most important features of the Frobenius map is that it preserves both addition and multiplication in the field. That is, for any elements $a, b \in \mathbb{F}$:
\begin{equation}
\mathrm{Fr}(a + b) = \mathrm{Fr}(a) + \mathrm{Fr}(b), \quad \mathrm{Fr}(a \cdot b) = \mathrm{Fr}(a) \cdot \mathrm{Fr}(b).
\end{equation}
While the preservation of multiplication follows directly from exponentiation, the preservation of addition requires a deeper explanation. This property is a direct consequence of the Frobenius identity, which states that in a field of characteristic $p$, for any $a, b \in \mathbb{F}$:
\begin{equation}
(a + b)^p = a^p + b^p.
\end{equation}
This result follows from the binomial theorem, which expands the power $(a + b)^p$ as:
\begin{equation}
(a + b)^p = a^p + \binom{p}{1} a^{p-1} b + \binom{p}{2} a^{p-2} b^2 + \dots + \binom{p}{p-1} a b^{p-1} + b^p.
\end{equation}
Here, the binomial coefficients are given by:
\begin{equation}
\binom{p}{k} = \frac{p!}{k! \, (p-k)!}.
\end{equation}
For all $1 \leq k \leq p-1$, these coefficients are multiples of $p$, meaning that in a field of characteristic $p$, they vanish:
\begin{equation}
\binom{p}{k} \equiv 0 \mod p.
\end{equation}
This simplifies the expansion to:
\begin{equation}
(a + b)^p = a^p + b^p.
\end{equation}
Thus, applying the Frobenius map distributes over both addition and multiplication, allowing it to be extended to polynomial expressions. For example, given any polynomial in the field:
\begin{equation}
\mathrm{Fr}(a^3 + b^2 c) = \mathrm{Fr}(a)^3 + \mathrm{Fr}(b)^2 \mathrm{Fr}(c).
\end{equation}
#### Periodicity of the Frobenius map
Another fundamental property of the Frobenius map is that it is periodic in finite fields. Specifically, in an extension field $\mathbb{F}_{p^m}$, applying the Frobenius map $m$ times results in the identity map:
\begin{equation}
\mathrm{Fr}^m(x) = x, \quad \text{for all } x \in \mathbb{F}_{p^m}.
\end{equation}
This follows from the fact that every element of $\mathbb{F}_{p^m}$ satisfies:
\begin{equation}
x^{p^m} = x.
\end{equation}
This periodic behavior is a direct consequence of the structure of finite fields and plays an essential role in extracting coefficients when working with the Frobenius map in algebraic constructions.
#### Frobenius orbit
Applying the Frobenius map multiple times to an element $r$ generates a sequence known as its *Frobenius orbit*:
\begin{equation}
r, \mathrm{Fr}(r), \mathrm{Fr}^2(r), \dots
\end{equation}
Each step in this sequence represents the application of the Frobenius map to the previous element. Since the Frobenius map is a field automorphism, it systematically transforms $r$ while preserving algebraic structure.
In many algebraic protocols, we also need to reverse this process. The inverse Frobenius orbit traces this sequence backward, effectively undoing the effect of the Frobenius map at each step. This allows us to reconstruct previous values from later ones, which is particularly useful when recovering packed coefficients in extension fields.
### Frobenius action on polynomials
The Frobenius automorphism plays a crucial role in algebraic structures, and we naturally want to extend its action to polynomials. However, a direct application of the Frobenius map by exponentiation does not preserve the degree of a polynomial. This creates a challenge when working with polynomial representations in finite fields.
To resolve this, we define the Frobenius action on polynomials in a coefficient-wise manner: applying $\operatorname{Fr}(P)$ means applying the Frobenius map to each coefficient of $P$. This leads to a key identity:
\begin{equation}
\operatorname{Fr}(P)(x) = \operatorname{Fr}\big(P(\operatorname{Fr}^{-1}(x))\big).
\end{equation}
This equality follows directly from the automorphism properties of the Frobenius map: since $\operatorname{Fr}$ preserves addition and multiplication, $\operatorname{Fr}\big(\sum_i a_i x^i\big) = \sum_i \operatorname{Fr}(a_i) \operatorname{Fr}(x)^i$, which gives:
\begin{equation}
\operatorname{Fr}(P(x)) = \operatorname{Fr}(P)\big(\operatorname{Fr}(x)\big).
\end{equation}
Expanding $P(x)$ as a polynomial,
\begin{equation}
P(x) = \sum_i a_i x^i,
\end{equation}
the coefficient-wise definition of the Frobenius action gives:
\begin{equation}
\operatorname{Fr}(P)(x) = \sum_i \operatorname{Fr} (a_i) x^i.
\end{equation}
Using this, we can now verify our key identity:
\begin{equation}
\begin{aligned}
\operatorname{Fr}\left(P(\operatorname{Fr}^{-1}(x))\right) &= \operatorname{Fr}\left(P\right) \left ( \underbrace{\operatorname{Fr}\left(\operatorname{Fr}^{-1}(x)\right)}_{x} \right ) \\
&= \operatorname{Fr}(P)(x).
\end{aligned}
\end{equation}
This confirms that the Frobenius transformation of a polynomial can be expressed in terms of the inverse Frobenius applied to its argument, making it a powerful tool in algebraic constructions.
### Definition of the trace
In a finite field extension $\mathbb{F}_{q^n}$, the trace function provides a way to aggregate information across the Frobenius orbit of an element. The trace of an element $x \in \mathbb{F}_{q^n}$ is defined as:
\begin{equation}
\operatorname{tr}(x) = \sum_{i=0}^{n-1} \operatorname{Fr}^i(x),
\end{equation}
where:
- $\operatorname{Fr}$ is the Frobenius automorphism, given by $\operatorname{Fr}(y) = y^q$,
- $\operatorname{Fr}^i$ denotes the $i$-th application of the Frobenius map.
This function computes a summation over the entire Frobenius orbit of $x$, effectively collapsing information from all conjugates of $x$ in the field extension. A key property of the trace is its invariance under the Frobenius map: since $\operatorname{Fr}^n$ is the identity, applying $\operatorname{Fr}$ to $\operatorname{tr}(x)$ merely shifts the summands $\operatorname{Fr}^i(x)$ cyclically, so $\operatorname{Fr}(\operatorname{tr}(x)) = \operatorname{tr}(x)$.
An important observation is that the only elements left unchanged by the Frobenius morphism (that is, those satisfying $\operatorname{Fr}(x) = x$) are precisely those in the base field $\mathbb{F}_q$. As a result, the trace function inherently maps elements from the extension field $\mathbb{F}_{q^n}$ back to $\mathbb{F}_q$: summing over all Frobenius conjugates eliminates any dependency on the extension structure. Highlighting this property is essential, as it clarifies why $\operatorname{tr}(x)$ always resides in $\mathbb{F}_q$ rather than some larger field.
:::info
**Example: Trace in a degree-3 extension**
Consider a finite field extension $\mathbb{F}_{q^3}$, where $n = 3$. The Frobenius orbit of an element $x \in \mathbb{F}_{q^3}$ consists of:
\begin{equation}
x, \quad x^q, \quad x^{q^2}.
\end{equation}
Applying the definition, the trace is:
\begin{equation}
\operatorname{tr}(x) = x + x^q + x^{q^2}.
\end{equation}
This sum remains in the base field $\mathbb{F}_q$ and uniquely encodes information about $x$ in a way that is independent of its specific representation in the extension field.
:::
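Using the `mul` and `frob` helpers from the earlier sketch, the trace $\operatorname{tr}(x) = x + x^2 + x^4$ of $\mathbb{F}_{2^3}$ over $\mathbb{F}_2$ can be computed, and its two defining features checked:
```rust
/// Trace of F_{2^3} over F_2: tr(x) = x + x^2 + x^4.
fn trace(x: u8) -> u8 {
    x ^ frob(x) ^ frob(frob(x))
}

fn main() {
    for x in 0..8u8 {
        // The trace always lands in the base field F_2 = {0, 1}.
        assert!(trace(x) <= 1);
        // Frobenius invariance: Fr only cyclically shifts the summands
        // x, x^2, x^4, so tr(Fr(x)) = tr(x).
        assert_eq!(trace(frob(x)), trace(x));
    }
}
```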
### Key properties of the trace
The trace function possesses two fundamental algebraic properties:
- ### Linearity of the trace function
The trace function is a linear map over the base field $\mathbb{F}_q$, meaning that for any $a, b \in \mathbb{F}_q$ and any $x, y \in \mathbb{F}_{q^n}$:
\begin{equation}
\operatorname{tr}(ax + by) = a \operatorname{tr}(x) + b \operatorname{tr}(y).
\end{equation}
This follows from the fact that the Frobenius map is additive and fixes every element of $\mathbb{F}_q$, hence is $\mathbb{F}_q$-linear; applying it iteratively and summing over the orbit preserves this linearity.
- ### Non-degeneracy of the bilinear pairing
In algebra and field theory, a bilinear pairing is said to be non-degenerate if it does not collapse information, meaning that for any nonzero element $x$, there exists some $y$ such that the pairing evaluates to a nonzero value. In our case, the bilinear pairing is given by:
\begin{equation}
(x, y) \mapsto \operatorname{tr}(xy).
\end{equation}
Non-degeneracy ensures that if $\operatorname{tr}(xy) = 0$ for all $y \in \mathbb{F}_{q^n}$, then necessarily $x = 0$. This property is crucial because it guarantees that the trace function retains enough information to distinguish elements in $\mathbb{F}_{q^n}$ and does not annihilate entire subspaces.
**Why is the pairing non-degenerate?**
The non-degeneracy of this bilinear pairing follows from two key observations:
1. **The trace function is surjective onto $\mathbb{F}_q$.**
This means that for any nonzero element in the base field $\mathbb{F}_q$, there exists at least one element in $\mathbb{F}_{q^n}$ whose trace evaluates to that value. In particular, there exist elements $s \in \mathbb{F}_{q^n}$ for which $\operatorname{tr}(s) \neq 0$.
2. **Multiplication by a nonzero element $x$ defines a linear transformation.**
Consider the map:
\begin{equation}
y \mapsto \operatorname{tr}(xy).
\end{equation}
This transformation is linear over $\mathbb{F}_q$. Suppose it were identically zero for some nonzero $x$, i.e., $\operatorname{tr}(xy) = 0$ for all $y$. Since $y \mapsto xy$ ranges over all of $\mathbb{F}_{q^n}$ when $x \neq 0$, the trace would then vanish on the entire field, contradicting its surjectivity. Hence the map $y \mapsto \operatorname{tr}(xy)$ can only be identically zero when $x = 0$.
#### Explicit proof of non-degeneracy
To see this concretely, take any **nonzero** $x \in \mathbb{F}_{q^n}$. Because $\operatorname{tr}$ is surjective, we can always find some element $s \in \mathbb{F}_{q^n}$ such that:
\begin{equation}
\operatorname{tr}(s) \neq 0.
\end{equation}
Now, define:
\begin{equation}
y = s x^{-1}.
\end{equation}
Substituting this into the pairing:
\begin{equation}
\operatorname{tr}(xy) = \operatorname{tr}(x \cdot s x^{-1}).
\end{equation}
Since multiplication by $x$ and its inverse cancel out:
\begin{equation}
\operatorname{tr}(xy) = \operatorname{tr}(s).
\end{equation}
By construction, $\operatorname{tr}(s) \neq 0$, meaning that for this choice of $y$, the pairing is nonzero. This confirms that for every nonzero $x$, we can always find a $y$ such that $\operatorname{tr}(xy) \neq 0$, proving that the pairing is non-degenerate.
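The argument can be replayed mechanically in the toy field, reusing `mul` and `trace` from the sketches above. Since $x^7 = 1$ for every nonzero $x \in \mathbb{F}_{2^3}$, the inverse is $x^{-1} = x^6$, and $s = 1$ is a convenient element of nonzero trace:
```rust
/// Inverse in F_{2^3}^*: x^7 = 1, hence x^{-1} = x^6.
fn inv(x: u8) -> u8 {
    let x2 = mul(x, x);
    let x4 = mul(x2, x2);
    mul(x4, x2)
}

fn main() {
    let s = 0b001; // s = 1, and tr(1) = 1 + 1 + 1 = 1 ≠ 0
    for x in 1..8u8 {
        let y = mul(s, inv(x)); // the witness y = s·x^{-1} from the proof
        assert_eq!(trace(mul(x, y)), 1); // the pairing tr(xy) is nonzero
    }
}
```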
### Recovering coordinates using the trace and dual basis
In a finite field extension, elements can be expressed in terms of a basis. However, given an element $p$, we need a way to extract its coordinates relative to this basis. This is achieved using the trace function and a special dual basis, leveraging the non-degeneracy of the bilinear pairing.
### Definition of the dual basis
Let $\{b_0, \dots, b_{d-1}\}$ be a basis of the field extension $\mathbb{F}_{q^d}$ over $\mathbb{F}_q$. A dual basis $\{u_0, \dots, u_{d-1}\}$ is defined by the condition:
\begin{equation}
\operatorname{tr}(b_k u_j) = \delta_{kj},
\end{equation}
where $\delta_{kj}$ is the Kronecker delta:
\begin{equation}
\delta_{kj} =
\begin{cases}
1, \quad \text{if } k = j, \\
0, \quad \text{if } k \neq j.
\end{cases}
\end{equation}
This means that each dual basis element $u_j$ is chosen so that it "isolates" a single coordinate when paired with the corresponding basis element $b_k$ under the trace function.
### Claim: coordinate recovery formula
Any element $p \in \mathbb{F}_{q^d}$ can be written as a linear combination of the basis elements:
\begin{equation}
p = \sum_{i=0}^{d-1} b_i p_i.
\end{equation}
Our goal is to recover the coefficients $p_i$. Using the dual basis, we claim that:
\begin{equation}
p_i = \operatorname{tr}(u_i p).
\end{equation}
This equation provides an explicit method to compute the coordinate $p_i$ of $p$ with respect to the basis $\{b_0, \dots, b_{d-1}\}$.
### Proof of the coordinate recovery formula
To verify this formula, we proceed as follows:
1. **Substituting the expansion of $p$ into the trace function:**
Applying $\operatorname{tr}(u_j p)$ to both sides of $p = \sum_{i=0}^{d-1} b_i p_i$, we get:
\begin{equation}
\operatorname{tr}(u_j p) = \operatorname{tr} \left( u_j \sum_{i=0}^{d-1} b_i p_i \right).
\end{equation}
2. **Using linearity of the trace function:**
Since the trace function is linear, it distributes over summation and scalar multiplication:
\begin{equation}
\operatorname{tr}(u_j p) = \sum_{i=0}^{d-1} p_i \operatorname{tr}(u_j b_i).
\end{equation}
3. **Applying the duality condition:**
From the definition of the dual basis, we know:
\begin{equation}
\operatorname{tr}(u_j b_i) = \delta_{ij}.
\end{equation}
This means that for each term in the sum, only the term where $i = j$ survives, while all others vanish:
\begin{equation}
\operatorname{tr}(u_j p) = p_j.
\end{equation}
4. **Conclusion:**
Since this holds for all indices $j$, we conclude that:
\begin{equation}
p_j = \operatorname{tr}(u_j p),
\end{equation}
which proves our claim.
### Intuition behind the dual basis
The dual basis $\{u_0, \dots, u_{d-1}\}$ acts as a "coordinate extraction tool". When an element $p$ is expressed in the basis $\{b_0, \dots, b_{d-1}\}$, the trace pairing ensures that applying $\operatorname{tr}(u_j p)$ isolates the $j$-th coordinate $p_j$.
This construction is particularly useful in finite field arithmetic, error-correcting codes, and cryptographic protocols, where efficient coordinate extraction is necessary.
:::info
### **Example: Coordinate extraction in $\mathbb{F}_{2^3}$**
Consider the field $\mathbb{F}_{2^3}$, where $q = 2$ and $d = 3$. Suppose we have a basis:
\begin{equation}
b_0 = 1, \quad b_1 = \alpha, \quad b_2 = \alpha^2,
\end{equation}
where $\alpha$ is a primitive element satisfying $\alpha^3 + \alpha + 1 = 0$.
Now, assume the element $p$ is given by:
\begin{equation}
p = b_0 + b_1 + b_2.
\end{equation}
We want to extract the coordinates $p_0, p_1, p_2$ using the dual basis $\{u_0, u_1, u_2\}$. For this particular field and basis, a direct check of the condition $\operatorname{tr}(b_k u_j) = \delta_{kj}$ (using $\operatorname{tr}(x) = x + x^2 + x^4$) shows that the dual basis is $u_0 = 1$, $u_1 = \alpha^2$, $u_2 = \alpha$. Applying the coordinate recovery formula:
\begin{equation}
p_i = \operatorname{tr}(u_i p),
\end{equation}
we compute, for instance, $p_1 = \operatorname{tr}(\alpha^2 p) = \operatorname{tr}(\alpha^2) + \operatorname{tr}(\alpha^3) + \operatorname{tr}(\alpha^4) = 0 + 1 + 0 = 1$. In the same way we recover $p_0 = 1$ and $p_2 = 1$, confirming the decomposition of $p$ in the given basis.
:::
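The whole example can be verified with the helpers above. The sketch below asserts the duality condition for the pair of bases and then recovers the coordinates of $p$:
```rust
fn main() {
    let b = [0b001u8, 0b010, 0b100]; // basis {1, α, α^2}
    let u = [0b001u8, 0b100, 0b010]; // its dual basis {1, α^2, α}

    // Duality condition: tr(b_k · u_j) = δ_{kj}.
    for k in 0..3 {
        for j in 0..3 {
            let delta = if k == j { 1 } else { 0 };
            assert_eq!(trace(mul(b[k], u[j])), delta);
        }
    }

    // Coordinate recovery: p = 1 + α + α^2 decomposes as (1, 1, 1).
    let p = 0b111;
    for j in 0..3 {
        assert_eq!(trace(mul(u[j], p)), 1); // p_j = tr(u_j · p)
    }
}
```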
### Unpacking polynomials using the Frobenius orbit
When working in a field extension, polynomials can be packed together in a structured way. Given a polynomial $P$—which could be univariate or multivariate—it may be expressed as a sum of smaller polynomials $\{P_0, \dots, P_{d-1}\}$ defined over the base field $\mathbb{F}$. The packing is done using a basis $\{b_0, \dots, b_{d-1}\}$:
\begin{equation}
P = \sum_{i=0}^{d-1} b_i P_i.
\end{equation}
Our goal is to recover the original polynomials $\{P_0, \dots, P_{d-1}\}$ from $P$ efficiently. To achieve this, we leverage the trace function and the dual basis $\{u_0, \dots, u_{d-1}\}$, which allow us to extract the individual components. Specifically, the $i$-th polynomial $P_i(x)$ can be recovered using:
\begin{equation}
P_i(x) = \operatorname{tr}(u_i P).
\end{equation}
### Expanding the expression
To further understand this formula, we expand the trace function:
\begin{equation}
P_i(x) = \sum_{j=0}^{n-1} \operatorname{Fr}^j(u_i) \big(\operatorname{Fr}^j P\big)(x),
\end{equation}
where $n$ is the degree of the field extension. Since the Frobenius map acts as an automorphism on the field, we can rewrite this expression using the inverse Frobenius shift:
\begin{equation}
P_i(x) = \sum_{j=0}^{n-1} \operatorname{Fr}^j(u_i) \operatorname{Fr}^j\big(P(\operatorname{Fr}^{-j} x)\big).
\end{equation}
This equation is fundamental to our approach because it allows us to compute each $P_i(x)$ entirely in terms of $P$, without directly evaluating $P_i$.
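Continuing in the toy field (and reusing `mul`, `frob`, and the basis/dual-basis pair from the previous sketches), this identity can be checked numerically for a packed linear polynomial $P(t) = C_0 + C_1 t$; the packing closure and `frob_pow` are illustrative helpers:
```rust
/// Apply the Frobenius map k times.
fn frob_pow(x: u8, k: usize) -> u8 {
    (0..k).fold(x, |v, _| frob(v))
}

fn main() {
    let b = [0b001u8, 0b010, 0b100]; // basis {1, α, α^2}
    let u = [0b001u8, 0b100, 0b010]; // dual basis {1, α^2, α}

    // Pack three F_2 coefficients into one field element: Σ_i b_i · c_i.
    let pack = |c: [u8; 3]| (0..3).fold(0u8, |acc, i| acc ^ mul(b[i], c[i]));

    // P(t) = C0 + C1·t packs the coordinate polynomials P_i(t) = c0[i] + c1[i]·t.
    let (c0, c1) = ([1u8, 0, 1], [0u8, 1, 1]);
    let (cap0, cap1) = (pack(c0), pack(c1));
    let p = |t: u8| cap0 ^ mul(cap1, t);

    let x = 0b110; // an arbitrary evaluation point in F_{2^3}
    for i in 0..3 {
        // Direct evaluation of the coordinate polynomial P_i at x.
        let lhs = c0[i] ^ mul(c1[i], x);
        // Reconstruction through the packed P along the inverse Frobenius
        // orbit of x: Σ_j Fr^j(u_i) · Fr^j(P(Fr^{-j}(x))), with Fr^{-j} = Fr^{3-j}.
        let rhs = (0..3).fold(0u8, |acc, j| {
            acc ^ mul(frob_pow(u[i], j), frob_pow(p(frob_pow(x, (3 - j) % 3)), j))
        });
        assert_eq!(lhs, rhs);
    }
}
```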
### Key insights from this expression
#### **1. Frobenius orbit decomposition**
This result shows that $P_i(x)$ is reconstructed by evaluating $P$ along the inverse Frobenius orbit of $x$, which consists of the values:
\begin{equation}
\{x, \operatorname{Fr}^{-1}(x), \operatorname{Fr}^{-2}(x), \dots, \operatorname{Fr}^{-(n-1)}(x)\}.
\end{equation}
Instead of computing $P_i(x)$ directly, we express it in terms of $P$, the packed polynomial, by summing evaluations of $P$ at different points in the Frobenius orbit. This is powerful because:
- It allows structured polynomial unpacking without needing explicit knowledge of the individual $P_i$ polynomials.
- The computation naturally aligns with the Frobenius action, making it efficient in field operations.
#### **2. Claim reduction for sumcheck protocols**
One of the most significant consequences of this result is that it reduces the problem of evaluating $P_i(x)$ to multiple evaluations of $P$, which simplifies computations in proof systems. This has direct implications for interactive proof protocols such as sumcheck, where:
- Instead of handling separate polynomials $P_i$, we work only with the packed polynomial $P$.
- The verification process becomes simpler, as the verifier can check claims about $P$ rather than needing access to individual components.
## The Four Russians method for efficient matrix-vector multiplication
The Four Russians method is an optimization technique designed to speed up matrix operations by leveraging precomputed partial results. In the context of the boolcheck algorithm, we use this method to accelerate matrix-vector multiplication, rather than general Boolean matrix multiplication.
### Key idea: precomputing matrix application to a vector
Instead of performing direct multiplication of a matrix with a vector, we precompute the effect of applying the matrix to all possible small input vectors. This precomputation enables fast lookup-based computation, significantly reducing redundant calculations.
### How it works
1. Split the matrix into columns
- We treat the matrix as a set of column vectors rather than working with full row-by-row multiplication.
- Each column represents how a single input bit affects the output.
2. Precompute partial sums of matrix columns
- Since matrix-vector multiplication involves summing specific column combinations based on the input vector, we precompute all possible XOR sums of groups of 8 columns at a time.
- There are $2^8 = 256$ possible subsets of the 8 columns in a chunk, so we store 256 precomputed results per chunk.
3. Efficiently apply the matrix to the vector
- Instead of computing each matrix-vector product from scratch, we split the input vector into bytes.
- Each byte of the input is treated as an index into our precomputed table, allowing us to fetch the corresponding result instantly.
- This avoids unnecessary calculations, replacing multiplications with simple lookups.
### Step-by-step algorithm
#### 1. Precompute XOR sums for column subsets
- Divide the $128 \times 128$ matrix into 16 chunks of 8 columns each.
- For each 8-column chunk, compute and store the XOR sum of each of the 256 possible column subsets.
- This results in a precomputed table with $256 \times 16$ entries.
#### 2. Apply the precomputed matrix to the input vector
- Convert the input vector into a 16-byte representation (since 128 bits = 16 bytes).
- Each byte of the input is used as an index into the precomputed lookup table.
- The results for each chunk are summed together to produce the final output.
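A minimal Rust sketch of these two steps, assuming the matrix is stored as 128 column bitmasks of type `u128` (all names illustrative):
```rust
/// Step 1: for each chunk of 8 columns, precompute the XOR of every one of
/// the 256 possible column subsets. 16 tables cover a 128×128 matrix.
fn build_tables(cols: &[u128; 128]) -> Vec<[u128; 256]> {
    cols.chunks(8)
        .map(|chunk| {
            let mut table = [0u128; 256];
            for idx in 1usize..256 {
                // Each subset is a previously computed subset plus one column:
                // strip the lowest set bit of the index and XOR that column in.
                let low = idx & idx.wrapping_neg();
                table[idx] = table[idx ^ low] ^ chunk[low.trailing_zeros() as usize];
            }
            table
        })
        .collect()
}

/// Step 2: apply the matrix with one lookup and one XOR per input byte.
fn apply(tables: &[[u128; 256]], x: u128) -> u128 {
    tables
        .iter()
        .enumerate()
        .fold(0, |acc, (i, t)| acc ^ t[((x >> (8 * i)) & 0xff) as usize])
}
```
Note that each table entry is derived from an earlier one with a single XOR, so the precomputation itself costs only $16 \times 256$ XORs.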
### Computational efficiency
- Precomputing the table requires $O(2^8 \times 16)$ operations (each entry is obtained from a previous one with a single XOR), which is negligible.
- Applying the matrix to a vector is reduced from $O(n^2)$ bit operations to just 16 table lookups and XOR additions.
- This method drastically reduces the runtime from $O(n^2)$ bit operations to $O(n)$ word operations, making it well-suited for large-scale computations.
### Intuition: why this works
Instead of multiplying the full matrix with the vector, we precompute how groups of 8 columns interact with any 8-bit input chunk. Since matrix-vector multiplication over $\mathbb{F}_2$ amounts to XORing the columns selected by the input bits, all possible results can be stored in advance and retrieved instantly.
This allows us to avoid redundant computations, replacing them with fast memory lookups.
### Practical application in boolcheck
In the boolcheck protocol, we frequently need to apply a binary matrix transformation to a vector during various steps of the protocol. The Four Russians method allows us to speed up these transformations significantly by leveraging precomputed lookup tables.
Instead of recomputing matrix-vector products every time, we reuse previously computed results, making verification much more efficient.
### Minimal example: Applying the Four Russians method
To make the Four Russians method more concrete, let’s go through a simple example. Consider a small $4 \times 4$ binary matrix $A$ and a 4-bit input vector $x$. This toy example differs slightly from the actual implementation (which works on $128 \times 128$ matrices with 8-bit chunks), but it illustrates the method end to end.
#### Matrix and input vector
Let’s define a boolean matrix $A$:
\begin{equation}
A =
\begin{bmatrix}
1 & 0 & 1 & 1 \\
0 & 1 & 0 & 1 \\
1 & 1 & 1 & 0 \\
0 & 0 & 1 & 1
\end{bmatrix}
\end{equation}
And an input vector:
\begin{equation}
x =
\begin{bmatrix}
1 \\
0 \\
1 \\
1
\end{bmatrix}
\end{equation}
The standard boolean matrix-vector multiplication rule states that the result vector $y = A \cdot x$ is computed as follows:
\begin{equation}
y_i = \bigoplus_{j=1}^{4} (A_{ij} \land x_j)
\end{equation}
where $\oplus$ denotes XOR. Computing this row-by-row gives:
\begin{aligned}
y &= \begin{bmatrix}
1 & 0 & 1 & 1 \\
0 & 1 & 0 & 1 \\
1 & 1 & 1 & 0 \\
0 & 0 & 1 & 1
\end{bmatrix} \cdot
\begin{bmatrix}
1 \\
0 \\
1 \\
1
\end{bmatrix} \\
&= \begin{bmatrix}
(1 \land 1) \oplus (0 \land 0) \oplus (1 \land 1) \oplus (1 \land 1) \\
(0 \land 1) \oplus (1 \land 0) \oplus (0 \land 1) \oplus (1 \land 1) \\
(1 \land 1) \oplus (1 \land 0) \oplus (1 \land 1) \oplus (0 \land 1) \\
(0 \land 1) \oplus (0 \land 0) \oplus (1 \land 1) \oplus (1 \land 1)
\end{bmatrix} \\
&= \begin{bmatrix}
1 \oplus 0 \oplus 1 \oplus 1 \\
0 \oplus 0 \oplus 0 \oplus 1 \\
1 \oplus 0 \oplus 1 \oplus 0 \\
0 \oplus 0 \oplus 1 \oplus 1
\end{bmatrix}
\end{aligned}
So the output vector is:
\begin{equation}
y =
\begin{bmatrix}
1 \\
1 \\
0 \\
0
\end{bmatrix}
\end{equation}
#### Using the Four Russians method
Instead of performing these calculations row-by-row, we precompute all possible XOR combinations for small chunks of columns.
**Step 1: Split the matrix into 2-column blocks**
We divide $A$ into chunks of 2 columns each:
\begin{equation}
A_1 =
\begin{bmatrix}
1 & 0 \\
0 & 1 \\
1 & 1 \\
0 & 0
\end{bmatrix}, \quad
A_2 =
\begin{bmatrix}
1 & 1 \\
0 & 1 \\
1 & 0 \\
1 & 1
\end{bmatrix}
\end{equation}
**Step 2: Precompute XORs for each 2-column block**
For each possible 2-bit input, we compute the corresponding XOR sum for each block:
| Input bits | XOR sum for $A_1$ | XOR sum for $A_2$ |
|-------------------|--------------|--------------|
| $(0,0)$ | $(0,0,0,0)$ | $(0,0,0,0)$ |
| $(0,1)$ | $(0,1,1,0)$ | $(1,1,0,1)$ |
| $(1,0)$ | $(1,0,1,0)$ | $(1,0,1,1)$ |
| $(1,1)$ | $(1,1,0,0)$ | $(0,1,1,0)$ |
**Step 3: Use the input vector to look up values**
Now, we split $x$ into two-bit chunks:
\begin{equation}
(x_1, x_2) = (1, 0), \quad (x_3, x_4) = (1, 1)
\end{equation}
Using the precomputed table, we retrieve:
- For $(1,0)$ from $A_1$: $(1,0,1,0)$
- For $(1,1)$ from $A_2$: $(0,1,1,0)$
We compute the final result by XORing these values together:
\begin{aligned}
(1,0,1,0) \oplus (0,1,1,0) &= (1 \oplus 0, 0 \oplus 1, 1 \oplus 1, 0 \oplus 0) \\
&= (1,1,0,0)
\end{aligned}
which matches our direct computation.
#### Why this is faster
Instead of computing the product row-by-row, we simply:
1. Precompute all possible results for small column chunks (done once).
2. Split the input vector into chunk-sized pieces (bytes, in the $128 \times 128$ setting) and use them as indices to retrieve precomputed values.
3. Combine results using XOR, which is very efficient.
For large matrices, this reduces complexity from $O(n^2)$ bit operations to one table lookup and one XOR per chunk, i.e., roughly $O(n)$ word operations.
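For completeness, the toy example above can be checked in a few lines of Rust (columns and vectors encoded as bitmasks, with bit $k$ standing for row/position $k$):
```rust
fn main() {
    // Columns of A as 4-bit masks: col_j collects (A_{0j}, ..., A_{3j}).
    let cols: [u8; 4] = [0b0101, 0b0110, 0b1101, 0b1011];
    let x: u8 = 0b1101; // x = (1, 0, 1, 1), bit j = x_j

    // Precompute the 4-entry subset-XOR table for each 2-column block.
    let table = |c: &[u8]| [0, c[0], c[1], c[0] ^ c[1]];
    let (t1, t2) = (table(&cols[..2]), table(&cols[2..]));

    // Look up each 2-bit chunk of x and XOR the partial results.
    let y = t1[(x & 0b11) as usize] ^ t2[((x >> 2) & 0b11) as usize];
    assert_eq!(y, 0b0011); // y = (1, 1, 0, 0), matching the direct computation
}
```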
## Boolcheck protocol
The boolcheck protocol is a verification mechanism designed for quadratic boolean formulas. To illustrate the core idea, we will focus on andcheck, a specific case of boolcheck. However, the same construction extends naturally to any homogeneous quadratic Boolean formula. Additionally, with some modifications, non-homogeneous quadratic formulas can also be accommodated.
For our setting, we assume the base field is $\mathbb{F} = \mathbb{F}_2$.
### Packed representation of polynomials
We consider two packed polynomials:
\begin{equation}
P = \sum_i b_i P_i, \quad Q = \sum_i b_i Q_i,
\end{equation}
where:
- $\{P_i\}$ and $\{Q_i\}$ are coordinate polynomials defined over $\mathbb{F}_2$.
- $\{b_i\}$ are basis elements over $\mathbb{F}_2$.
The goal is to efficiently verify a quadratic boolean formula using the sumcheck protocol, which involves the following sum:
\begin{equation}
\sum_x \left( \sum_i \big(b_i P_i(x) Q_i(x)\big) \cdot \text{eq}(x, y) \right),
\end{equation}
where $\text{eq}(x, y)$ is an equality test function, enforcing constraints between variables $x$ and $y$.
A direct computation of this sum can be expensive, so we use an optimized approach by precomputing the (multiquadratic) polynomial:
\begin{equation}
(P \land Q)(x) = \sum_i \big(b_i P_i(x) Q_i(x)\big).
\end{equation}
This polynomial represents the bitwise AND operation applied to $P$ and $Q$ at each evaluation point $x$.
### Evaluation set and extension to $(0,1,\infty)^n$
To evaluate polynomials efficiently, we define our evaluation domain as:
\begin{equation}
(0,1,\infty)^n.
\end{equation}
The special element $\infty$ is handled using a rule similar to Karatsuba multiplication:
- The value of a polynomial at $\infty$ is interpreted as its highest-degree coefficient.
Using this rule, for any polynomial of degree at most one in the selected variable (with the other variables fixed), we obtain the fundamental identity:
\begin{equation}
P(\dots, 0, \dots) + P(\dots, 1, \dots) + P(\dots, \infty, \dots) = 0.
\end{equation}
This identity allows us to efficiently extend a $2^n$-sized evaluation table of $P$ into a $3^n$-sized table over $(0,1,\infty)^n$. Finally, we compute:
\begin{equation}
(P \land Q)(x),
\end{equation}
using pointwise AND operations on the extended tables of $P$ and $Q$. Since this extension takes place in $\mathbb{F}_2$, it commutes with packing operations, preserving efficiency.
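Here is a sketch of this extension step, assuming (as in Hashcaster's packed representation) that each table entry is a `u128` whose bit $i$ holds the coordinate value $P_i(x)$, and that the table is ordered so the top variable selects the halves:
```rust
/// Extend a multilinear evaluation table over (0,1)^n (2^n entries) to
/// (0,1,∞)^n (3^n entries) using P(…,∞,…) = P(…,0,…) + P(…,1,…), which in
/// characteristic 2 is a plain XOR. Bit i of each word is the coordinate P_i.
fn extend(table: &[u128]) -> Vec<u128> {
    if table.len() == 1 {
        return table.to_vec();
    }
    let half = table.len() / 2;
    let lo = extend(&table[..half]); // top variable = 0
    let hi = extend(&table[half..]); // top variable = 1
    let inf: Vec<u128> = lo.iter().zip(&hi).map(|(a, b)| a ^ b).collect(); // = ∞
    [lo, hi, inf].concat()
}

/// (P ∧ Q) on the extended domain: since bit i of each word carries the pair
/// (P_i(x), Q_i(x)), the pointwise AND is a single bitwise & per entry.
fn and_table(p: &[u128], q: &[u128]) -> Vec<u128> {
    extend(p).into_iter().zip(extend(q)).map(|(a, b)| a & b).collect()
}
```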
### Complexity considerations
While efficient for small sizes, this method suffers from an asymptotic complexity of $O(N^{\log_2 3})$ in the size $N = 2^n$ of the original evaluation table, which arises because:
- We start with a $2^n$-sized evaluation table of $P$.
- We extend it to a $3^n = (2^n)^{\log_2 3}$-sized evaluation table over $(0,1,\infty)^n$.
:::info
### Why does the equality hold?
For any multilinear polynomial, we have:
\begin{equation}
P(\dots, 0, \dots) + P(\dots, 1, \dots) + P(\dots, \infty, \dots) = 0.
\end{equation}
**Step-by-step explanation**
The identity concerns one variable at a time: all other variables are held fixed, and we view $P$ as a polynomial in the selected variable $x_j$. Since $P$ is multilinear, it has degree at most one in $x_j$, so we can write:
\begin{equation}
P(\dots, x_j, \dots) = A + B \, x_j,
\end{equation}
where $A$ and $B$ depend only on the remaining (fixed) variables.
**Evaluations at key points**
We evaluate $P$ at the three special values of $x_j$:
- **At $0$:**
\begin{equation}
P(\dots, 0, \dots) = A.
\end{equation}
- **At $1$:**
\begin{equation}
P(\dots, 1, \dots) = A + B.
\end{equation}
- **At $\infty$:**
By the Karatsuba-style convention, evaluation at $\infty$ extracts the highest-degree coefficient in $x_j$:
\begin{equation}
P(\dots, \infty, \dots) = B.
\end{equation}
**Summing the evaluations:**
\begin{equation}
P(\dots, 0, \dots) + P(\dots, 1, \dots) + P(\dots, \infty, \dots) = A + (A + B) + B = 2A + 2B.
\end{equation}
Since we work in characteristic $2$, where addition satisfies $x + x = 0$, the sum vanishes:
\begin{equation}
P(\dots, 0, \dots) + P(\dots, 1, \dots) + P(\dots, \infty, \dots) = 0.
\end{equation}
:::
### Andcheck: combining Frobenius orbit calculations and the extension method
To efficiently evaluate andcheck, we use a hybrid approach inspired by the [Gruen univariate skip](https://eprint.iacr.org/2024/108). The idea is to leverage the extension method as a "skip" for the first $c$ rounds, postponing the full unpacking of polynomials until later. After this initial phase, we transition to the naive algorithm to complete the computation.
This method significantly reduces the complexity of the sumcheck protocol by avoiding unnecessary operations on unpacked coordinate polynomials early in the process. Below, we detail the full approach step by step.
1. ### Extending $P$ and $Q$
We start by extending the packed polynomials $P$ and $Q$ into evaluation tables of size:
\begin{equation}
3^{c+1} \cdot 2^{n-c-1}.
\end{equation}
This step ensures that the polynomials are evaluated over an extended domain:
\begin{equation}
(0, 1, \infty)^{c+1} \times (0, 1)^{n-c-1}.
\end{equation}
Here:
- The first $c+1$ variables take values in $\{0, 1, \infty\}$, allowing us to incorporate an additional layer of structure.
- The remaining $n-c-1$ variables remain binary, taking values in $\{0,1\}$.
This domain extension is crucial because it allows us to later apply the Frobenius orbit technique for more efficient polynomial evaluations.
2. ### Sumcheck with packed representation
Instead of working directly with the unpacked coordinate polynomials $P_i$ and $Q_i$, we perform the first $c$ rounds of sumcheck using the packed representation:
\begin{equation}
(P \land Q)(x) = \sum_i b_i P_i(x) Q_i(x).
\end{equation}
Since the basis elements $b_i$ remain fixed, this formulation lets us compute sums over multiple coordinate polynomials simultaneously, avoiding the computational cost of handling each coordinate separately. This step significantly reduces complexity in the early rounds.
3. ### Restricting $P$ and $Q$ to challenge points
After the first $c$ rounds, the verifier provides challenge points $r_0, \dots, r_{c-1}$. These values allow us to fix the first $c$ variables in $P$ and $Q$, reducing them to new polynomials of the form:
\begin{equation}
P_i(r_0, \dots, r_{c-1}, x_c, \dots, x_{n-1}).
\end{equation}
At this stage:
- The first $c$ variables $r_0, \dots, r_{c-1}$ are now constants rather than variables.
- The polynomials are now functions of only $x_c, \dots, x_{n-1}$, simplifying subsequent evaluations.
However, computing this restricted form efficiently requires optimizing the restriction step.
4. ### Efficient restriction using the 4 Russians method
A naive approach to restricting $P$ and $Q$ would require evaluating $d$ polynomials across $N$ points, leading to a complexity of approximately $d \cdot N$ basefield-by-extension-field multiplications. Instead, we improve performance using the 4 Russians method, which speeds up table-based polynomial evaluations.
The method proceeds as follows:
1. **Precompute an evaluation table**
We construct the table:
\begin{equation}
S(x_0, \dots, x_{c-1}) = \text{eq}(r_0, \dots, r_{c-1}; x_0, \dots, x_{c-1}),
\end{equation}
where $\text{eq}$ is an equality indicator function that ensures consistency between the challenge points and the polynomial's domain.
2. **Chunking and precomputing XOR values**
- Split the table $S(x_0, \dots, x_{c-1})$ into chunks of size 8.
- For each chunk, precompute all $256$ possible XOR values.
3. **Transposing and using AVX2 optimizations**
- Process the evaluation table of $P$ by transposing each chunk, so that the bits of a given coordinate polynomial line up into bytes that can serve as lookup indices.
- Fetch and accumulate the precomputed XOR entries using 256-bit AVX2 vector instructions.
By structuring computations in this way, we avoid redundant multiplications and minimize memory access costs, making the restriction step much more efficient.
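A sketch of the lookup core of this restriction step, leaving out the transposition and AVX2 details (names illustrative). The key point is that the precomputed table entries are XOR combinations of eq-values, so the restriction consumes the bits of the coordinate polynomials byte by byte without any field multiplications:
```rust
/// Subset-XOR table for one chunk of at most 8 eq-values (extension-field
/// elements represented as u128 words; field addition is XOR).
fn subset_xor_table(chunk: &[u128]) -> [u128; 256] {
    let mut table = [0u128; 256];
    for idx in 1usize..(1 << chunk.len()) {
        let low = idx & idx.wrapping_neg();
        table[idx] = table[idx ^ low] ^ chunk[low.trailing_zeros() as usize];
    }
    table
}

/// Restrict one coordinate polynomial at a fixed suffix y: computes
/// Σ_v eq(r; v) · P_i(v, y), where bits[k] packs the F_2 values P_i(v, y)
/// for the k-th chunk of 8 boolean prefixes v.
fn restrict_at(tables: &[[u128; 256]], bits: &[u8]) -> u128 {
    tables.iter().zip(bits).fold(0, |acc, (t, &b)| acc ^ t[b as usize])
}
```
The tables depend only on the challenges $r_0, \dots, r_{c-1}$, so they are built once and reused for every coordinate polynomial and every suffix.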
5. ### Completing the sumcheck protocol
Once we have restricted $P$ and $Q$, we continue with the remaining rounds of sumcheck. At this stage, we switch to the standard algebraic representation:
\begin{equation}
(P \land Q)(x) = \sum_i b_i P_i(x) Q_i(x).
\end{equation}
Since the first $c$ variables are already fixed, we now operate directly on the unpacked coordinate polynomials $P_i$ and $Q_i$. This allows us to perform final evaluations efficiently.
6. ### Recasting openings in the Frobenius orbit
The final step is to recast the openings of the coordinate polynomials in terms of the Frobenius orbit. Given a challenge point:
\begin{equation}
r = (r_0, \dots, r_{n-1}),
\end{equation}
we reinterpret these evaluations in terms of openings of the packed polynomials $P$ and $Q$, evaluated over the Frobenius orbit of $r$.
This transformation is particularly useful because:
- The Frobenius orbit structure ensures that polynomial evaluations remain well-structured, reducing overhead.
- This recasting simplifies verification since many terms naturally align in computations, avoiding unnecessary recomputations.
## Multiopening using the 4 Russians method
As we have seen, the 4 Russians method is a highly efficient optimization technique, and it is particularly useful here for the multiopening argument. After performing boolcheck, we are left with $d$ individual claims for a polynomial $P$, each of the form:
\begin{equation}
P(\operatorname{Fr}^i(r)) = s_i,
\end{equation}
where:
- $\operatorname{Fr}^i(r)$ is the $i$-th Frobenius twist of the challenge $r$.
- $s_i$ is the corresponding evaluation result at that twisted point.
Since verifying each of these $d$ claims individually would be computationally expensive, we compress them into a single claim using random coefficients $\gamma_i$. This results in the combined multiopening claim:
\begin{equation}
\sum_i \gamma_i P(\operatorname{Fr}^i(r)) = \sum_i \gamma_i s_i.
\end{equation}
This transformation allows us to verify all $d$ claims using a single sumcheck protocol, significantly improving efficiency.
### Verifying the combined claim
To check the validity of the combined claim, we run a sumcheck protocol on the expression:
\begin{equation}
P(x) \cdot \left( \sum_i \gamma_i \cdot \text{eq}(\operatorname{Fr}^i(r), x) \right),
\end{equation}
where:
- $\text{eq}(\operatorname{Fr}^i(r), x)$ is the multilinear equality polynomial, i.e., the multilinear extension of the indicator that equals $1$ exactly when its two arguments coincide.
- The sum $\sum_i \gamma_i \cdot \text{eq}(\operatorname{Fr}^i(r), x)$ aggregates the weighted equality checks across all $d$ Frobenius-twisted points.
The main computational challenge lies in efficiently evaluating this summation across all $d$ Frobenius twists.
### Rewriting the equality sum as a matrix-vector product
To optimize the computation of:
\begin{equation}
\sum_i \gamma_i \cdot \text{eq}(\operatorname{Fr}^i(r), x),
\end{equation}
we use the fact that every Boolean coordinate $x_j \in \{0,1\}$ is fixed by the Frobenius map, so that for Boolean $x$:
\begin{equation}
\text{eq}(\operatorname{Fr}^i(r), x) = \operatorname{Fr}^i\big(\text{eq}(r, x)\big).
\end{equation}
The sum is therefore an $\mathbb{F}_2$-linear function of $\text{eq}(r, x)$, and we can express it as a matrix-vector multiplication:
\begin{equation}
L \cdot \text{eq}(r, x),
\end{equation}
where:
- $L$ is the $d \times d$ Boolean matrix representing the $\mathbb{F}_2$-linear map
\begin{equation}
L = \sum_i \gamma_i \operatorname{Fr}^i.
\end{equation}
- The vector $\text{eq}(r, x)$ is the coordinate representation of the field element $\text{eq}(r, x)$ over $\mathbb{F}_2$.
By transforming the problem into a structured matrix operation, we can apply the 4 Russians method to compute it efficiently.
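Both multiplication by a constant $\gamma_i$ and $\operatorname{Fr}^i$ are $\mathbb{F}_2$-linear, so $L$ can be materialized column by column by applying the map to the basis vectors. A sketch, assuming hypothetical $\mathbb{F}_{2^{128}}$ primitives `gf128_mul` and `frobenius` (squaring) that are not shown here:
```rust
/// Columns of L = Σ_i γ_i · Fr^i: column j is the image of the basis vector
/// e_j under the F_2-linear map. `gf128_mul` and `frobenius` are assumed
/// field primitives for F_{2^128} (not shown).
fn l_columns(gammas: &[u128]) -> [u128; 128] {
    core::array::from_fn(|j| {
        let mut x = 1u128 << j; // basis vector e_j, read as a field element
        let mut acc = 0u128;
        for &gamma in gammas {
            acc ^= gf128_mul(gamma, x); // accumulate γ_i · Fr^i(e_j)
            x = frobenius(x);           // advance to the next Frobenius power
        }
        acc
    })
}
```
The resulting columns feed directly into the subset-XOR tables of the Four Russians sketch from earlier, giving the byte-lookup application described below.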
### Efficient computation using the 4 Russians method
The 4 Russians method accelerates the computation of $L \cdot \text{eq}(r, x)$ using precomputations and chunk-based processing. The steps are as follows:
1. **Precompute XOR values:**
- Divide the matrix $L$ into chunks of 8 columns.
- Precompute all $256$ possible XOR combinations of the columns in each chunk.
2. **Apply the matrix to the vector efficiently:**
- Split the input vector $\text{eq}(r, x)$ into bytes (chunks of size 8 bits).
- Use each byte to index into the precomputed XOR tables and retrieve the corresponding results instantly.
This optimization is based on the insight that boolean matrix operations, particularly XOR operations, can be transformed into simple table lookups. By working with 8-bit chunks, we limit the number of precomputations to $256$ combinations per chunk, keeping the process both efficient and scalable.
### Computational complexity and efficiency
Using this method, the total runtime for computing $L \cdot \text{eq}(r, x)$ is reduced to approximately $2d$ multiplications, making it highly efficient.
The combination of:
- Precomputed XOR tables
- Chunked processing
- Optimized Boolean matrix-vector multiplication
ensures that the multiopening verification is performed in near-optimal time, even for large values of $d$.
### Impact on multiopening arguments
By applying the 4 Russians method, we significantly reduce the computational complexity of verifying multiopening claims. This optimization enhances the efficiency of our boolcheck protocol, especially when dealing with high-dimensional Frobenius twists in interactive proofs.
The method allows us to:
- Compress multiple claims into a single sumcheck.
- Optimize the key matrix-vector multiplication step.
- Achieve significant speed improvements over naive verification methods.
## Conclusion
The boolcheck protocol demonstrates how carefully designed algebraic structures can significantly optimize proof systems for quadratic Boolean formulas. By embedding Boolean operations within a finite field and combining finite field algebra, the sumcheck protocol, and matrix optimization techniques, boolcheck achieves a scalable and computationally efficient verification mechanism. This structured approach to arithmetizing Boolean logic keeps the workload of both the prover and the verifier low, making boolcheck a practical and scalable solution for proof systems in cryptographic applications.