# Hashcaster - Part 3: boolcheck
*Special thanks to Lev Soukhanov for his great help all along the project and for the super interesting theoretical discussions we had.*
## Introduction
The boolcheck protocol in Hashcaster introduces a novel approach for efficiently verifying quadratic Boolean formulas using polynomial coordinates. In this context, boolean variables are represented as elements of a finite field (for example $\mathbb{F}_{2^{128}}$), where logical operations are naturally embedded within algebraic expressions.
The main goal of boolcheck is to verify the correctness of boolean formulas while minimizing the computational burden on both the prover and verifier. To illustrate this, we will focus on the andcheck protocol—a fundamental case that demonstrates the core techniques. However, this framework extends seamlessly to any homogeneous quadratic Boolean formula, and with slight modifications, can also accommodate non-homogeneous formulas.
To understand why boolcheck leverages Frobenius theory and the 4 Russians method, we will first explore the underlying mathematical foundations. By diving into Frobenius automorphisms, their impact on polynomial representations, and how they enable efficient extraction of packed field elements, we will see how they naturally fit into the construction of boolcheck. Additionally, we will introduce the 4 Russians method, a powerful optimization technique that significantly accelerates matrix-vector computations, making it a key component of our protocol.
This post will take a structured approach:
1. **Frobenius theory** – Understanding the role of Frobenius maps, traces, and their application to encoding Boolean logic efficiently.
2. **The 4 Russians method** – Exploring a classical optimization technique that speeds up structured matrix operations.
3. **Boolcheck protocol** – Implementing these mathematical tools to efficiently verify quadratic Boolean formulas.
4. **Multiopening optimization** – Using the 4 Russians method to reduce complexity in multi-opening verifications.
By the end of this post, the reader will have a clear understanding of how boolcheck transforms Boolean logic verification into an algebraic proof system optimized for computational efficiency.
## Frobenius Theory
### Frobenius map and orbit
Let’s start with a fundamental concept in finite fields. Suppose we are working with a base field $\mathbb{F}$ and an extension field of degree $d$, which, viewed as a vector space over $\mathbb{F}$, is isomorphic to $\mathbb{F}^d$; we will use $\mathbb{F}^d$ to denote this extension. We define a basis for this extension as:
\begin{equation}
b_0, b_1, \dots, b_{d-1} \in \mathbb{F}^d
\end{equation}
These basis elements have a special property: any element of the extension field $\mathbb{F}^d$ can be uniquely represented as a linear combination of them using coefficients from the base field $\mathbb{F}$. In other words, given any element in $\mathbb{F}^d$, we can always find some values $p_0, p_1, \dots, p_{d-1}$ in $\mathbb{F}$ such that:
\begin{equation} x = p_0 b_0 + p_1 b_1 + \dots + p_{d-1} b_{d-1} \end{equation}
Next, with this structure in mind, let’s consider values $p_0, p_1, \dots, p_{d-1}$ from $\mathbb{F}$. Using these values, we define a function known as the packing map:
\begin{equation}
\text{pack}(p_0, \dots, p_{d-1}) = \sum_{i=0}^{d-1} b_i p_i
\end{equation}
This function encodes multiple field elements into a single field element. However, our goal is to reverse this process: given a packed value $p$, we want to recover the original values $p_0, p_1, \dots, p_{d-1}$ using algebraic methods. To achieve this, we introduce the Frobenius morphism, which will allow us to systematically extract these components.
#### Frobenius map
The Frobenius map is a fundamental automorphism in finite field arithmetic. Given a field $\mathbb{F}_{p^m}$ with characteristic $p$, the Frobenius map $\operatorname{Fr}$ is defined as:
\begin{equation}
\operatorname{Fr}(x) = x^p
\end{equation}
This function preserves the algebraic structure of the field, meaning that addition and multiplication behave predictably under its application.
In fields where the characteristic is $2$ (such as $\mathbb{F}_{2^m}$), the Frobenius map simplifies further:
\begin{equation}
\operatorname{Fr}(x) = x^2
\end{equation}
This property makes computations particularly efficient when working in binary fields.
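To make this concrete, here is a minimal Rust sketch of arithmetic in the toy field $\mathbb{F}_{2^3} = \mathbb{F}_2[\alpha]/(\alpha^3 + \alpha + 1)$, where the Frobenius map is plain squaring. The representation and names are illustrative (this is not Hashcaster's actual code); later examples in this post reuse these helpers.
```rust
/// Multiplication in F_{2^3} = F_2[a]/(a^3 + a + 1).
/// Elements are 3-bit masks: bit i is the coefficient of a^i.
fn mul(x: u8, y: u8) -> u8 {
    let mut r: u8 = 0;
    // Schoolbook carry-less multiplication: XOR shifted copies of x.
    for i in 0..3 {
        if (y >> i) & 1 == 1 {
            r ^= x << i;
        }
    }
    // Reduce modulo a^3 + a + 1, using a^4 = a^2 + a and a^3 = a + 1.
    if (r >> 4) & 1 == 1 { r ^= 0b10000 ^ 0b00110; }
    if (r >> 3) & 1 == 1 { r ^= 0b01000 ^ 0b00011; }
    r
}

/// Frobenius map in characteristic 2: squaring.
fn frob(x: u8) -> u8 {
    mul(x, x)
}

fn main() {
    let (a, b) = (0b010, 0b110); // a = α, b = α^2 + α
    // Fr preserves addition (the "freshman's dream") and multiplication.
    assert_eq!(frob(a ^ b), frob(a) ^ frob(b));
    assert_eq!(frob(mul(a, b)), mul(frob(a), frob(b)));
    // Periodicity: Fr^3 is the identity on F_{2^3}.
    assert_eq!(frob(frob(frob(a))), a);
}
```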
### Properties of the Frobenius map
The Frobenius map is a fundamental automorphism in finite field arithmetic, meaning it preserves the structure of the field while transforming its elements in a predictable way. It possesses two key properties that make it particularly useful in algebraic computations:
#### Preservation of addition and multiplication
One of the most important features of the Frobenius map is that it preserves both addition and multiplication in the field. That is, for any elements $a, b \in \mathbb{F}$:
\begin{equation}
\mathrm{Fr}(a + b) = \mathrm{Fr}(a) + \mathrm{Fr}(b), \quad \mathrm{Fr}(a \cdot b) = \mathrm{Fr}(a) \cdot \mathrm{Fr}(b).
\end{equation}
While the preservation of multiplication follows directly from exponentiation, the preservation of addition requires a deeper explanation. This property is a direct consequence of the Frobenius identity, which states that in a field of characteristic $p$, for any $a, b \in \mathbb{F}$:
\begin{equation}
(a + b)^p = a^p + b^p.
\end{equation}
This result follows from the binomial theorem, which expands the power $(a + b)^p$ as:
\begin{equation}
(a + b)^p = a^p + \binom{p}{1} a^{p-1} b + \binom{p}{2} a^{p-2} b^2 + \dots + \binom{p}{p-1} a b^{p-1} + b^p.
\end{equation}
Here, the binomial coefficients are given by:
\begin{equation}
\binom{p}{k} = \frac{p!}{k! \, (p-k)!}.
\end{equation}
For all $1 \leq k \leq p-1$, these coefficients are multiples of $p$, meaning that in a field of characteristic $p$, they vanish:
\begin{equation}
\binom{p}{k} \equiv 0 \mod p.
\end{equation}
This simplifies the expansion to:
\begin{equation}
(a + b)^p = a^p + b^p.
\end{equation}
Thus, applying the Frobenius map distributes over both addition and multiplication, allowing it to be extended to polynomial expressions. For example, given any polynomial in the field:
\begin{equation}
\mathrm{Fr}(a^3 + b^2 c) = \mathrm{Fr}(a)^3 + \mathrm{Fr}(b)^2 \mathrm{Fr}(c).
\end{equation}
#### Periodicity of the Frobenius map
Another fundamental property of the Frobenius map is that it is periodic in finite fields. Specifically, in an extension field $\mathbb{F}_{p^m}$, applying the Frobenius map $m$ times results in the identity map:
\begin{equation}
\mathrm{Fr}^m(x) = x, \quad \text{for all } x \in \mathbb{F}_{p^m}.
\end{equation}
This follows from the fact that every element of $\mathbb{F}_{p^m}$ satisfies:
\begin{equation}
x^{p^m} = x.
\end{equation}
This periodic behavior is a direct consequence of the structure of finite fields and plays an essential role in extracting coefficients when working with the Frobenius map in algebraic constructions.
#### Frobenius orbit
Applying the Frobenius map multiple times to an element $r$ generates a sequence known as its *Frobenius orbit*:
\begin{equation}
r, \mathrm{Fr}(r), \mathrm{Fr}^2(r), \dots
\end{equation}
Each step in this sequence represents the application of the Frobenius map to the previous element. Since the Frobenius map is a field automorphism, it systematically transforms $r$ while preserving algebraic structure.
In many algebraic protocols, we also need to reverse this process. The inverse Frobenius orbit traces this sequence backward, effectively undoing the effect of the Frobenius map at each step. This allows us to reconstruct previous values from later ones, which is particularly useful when recovering packed coefficients in extension fields.
### Frobenius action on polynomials
The Frobenius automorphism plays a crucial role in algebraic structures, and we naturally want to extend its action to polynomials. However, a direct application of the Frobenius map by exponentiation does not preserve the degree of a polynomial. This creates a challenge when working with polynomial representations in finite fields.
To resolve this, we define the Frobenius action on polynomials in a coefficient-wise manner: applying $\operatorname{Fr}(P)$ means applying the Frobenius map to each coefficient of $P$. This leads to a key identity:
\begin{equation}
\operatorname{Fr}(P)(x) = \operatorname{Fr}\big(P(\operatorname{Fr}^{-1}(x))\big).
\end{equation}
This equality follows directly from the automorphism properties of the Frobenius map: since $\operatorname{Fr}$ preserves addition and multiplication, $\operatorname{Fr}\big(\sum_i a_i x^i\big) = \sum_i \operatorname{Fr}(a_i) \operatorname{Fr}(x)^i$, which gives:
\begin{equation}
\operatorname{Fr}(P(x)) = \operatorname{Fr}(P)\big(\operatorname{Fr}(x)\big).
\end{equation}
Expanding $P(x)$ as a polynomial,
\begin{equation}
P(x) = \sum_i a_i x^i,
\end{equation}
the coefficient-wise definition of the Frobenius action gives:
\begin{equation}
\operatorname{Fr}(P)(x) = \sum_i \operatorname{Fr} (a_i) x^i.
\end{equation}
Using this, we can now verify our key identity:
\begin{equation}
\begin{aligned}
\operatorname{Fr}\left(P(\operatorname{Fr}^{-1}(x))\right) &= \operatorname{Fr}\left(P\right) \left ( \underbrace{\operatorname{Fr}\left(\operatorname{Fr}^{-1}(x)\right)}_{x} \right ) \\
&= \operatorname{Fr}(P)(x).
\end{aligned}
\end{equation}
This confirms that the Frobenius transformation of a polynomial can be expressed in terms of the inverse Frobenius applied to its argument, making it a powerful tool in algebraic constructions.
### Definition of the trace
In a finite field extension $\mathbb{F}_{q^n}$, the trace function provides a way to aggregate information across the Frobenius orbit of an element. The trace of an element $x \in \mathbb{F}_{q^n}$ is defined as:
\begin{equation}
\operatorname{tr}(x) = \sum_{i=0}^{n-1} \operatorname{Fr}^i(x),
\end{equation}
where:
- $\operatorname{Fr}$ is the Frobenius automorphism, given by $\operatorname{Fr}(y) = y^q$,
- $\operatorname{Fr}^i$ denotes the $i$-th application of the Frobenius map.
This function computes a summation over the entire Frobenius orbit of $x$, effectively collapsing information from all conjugates of $x$ in the field extension. A key property of the trace is its invariance under the Frobenius map: since $\operatorname{Fr}^n$ is the identity, applying $\operatorname{Fr}$ to $\operatorname{tr}(x)$ merely shifts the summands $\operatorname{Fr}^i(x)$ cyclically, so $\operatorname{Fr}(\operatorname{tr}(x)) = \operatorname{tr}(x)$.
An important observation is that the only elements left unchanged by the Frobenius morphism (that is, those satisfying $\operatorname{Fr}(x) = x$) are precisely those in the base field $\mathbb{F}_q$. As a result, the trace function inherently maps elements from the extension field $\mathbb{F}_{q^n}$ back to $\mathbb{F}_q$: summing over all Frobenius conjugates eliminates any dependency on the extension structure. Highlighting this property is essential, as it clarifies why $\operatorname{tr}(x)$ always resides in $\mathbb{F}_q$ rather than some larger field.
:::info
**Example: Trace in a degree-3 extension**
Consider a finite field extension $\mathbb{F}_{q^3}$, where $n = 3$. The Frobenius orbit of an element $x \in \mathbb{F}_{q^3}$ consists of:
\begin{equation}
x, \quad x^q, \quad x^{q^2}.
\end{equation}
Applying the definition, the trace is:
\begin{equation}
\operatorname{tr}(x) = x + x^q + x^{q^2}.
\end{equation}
This sum remains in the base field $\mathbb{F}_q$ and uniquely encodes information about $x$ in a way that is independent of its specific representation in the extension field.
:::
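Using the `mul` and `frob` helpers from the earlier sketch, the trace $\operatorname{tr}(x) = x + x^2 + x^4$ of $\mathbb{F}_{2^3}$ over $\mathbb{F}_2$ can be computed, and its two defining features checked:
```rust
/// Trace of F_{2^3} over F_2: tr(x) = x + x^2 + x^4.
fn trace(x: u8) -> u8 {
    x ^ frob(x) ^ frob(frob(x))
}

fn main() {
    for x in 0..8u8 {
        // The trace always lands in the base field F_2 = {0, 1}.
        assert!(trace(x) <= 1);
        // Frobenius invariance: Fr only cyclically shifts the summands
        // x, x^2, x^4, so tr(Fr(x)) = tr(x).
        assert_eq!(trace(frob(x)), trace(x));
    }
}
```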
### Key properties of the trace
The trace function possesses two fundamental algebraic properties:
- ### Linearity of the trace function
The trace function is a linear map over the base field $\mathbb{F}_q$, meaning that for any $a, b \in \mathbb{F}_q$ and any $x, y \in \mathbb{F}_{q^n}$:
\begin{equation}
\operatorname{tr}(ax + by) = a \operatorname{tr}(x) + b \operatorname{tr}(y).
\end{equation}
This follows from the fact that the Frobenius map is additive and fixes every element of $\mathbb{F}_q$, hence is $\mathbb{F}_q$-linear; applying it iteratively and summing over the orbit preserves this linearity.
- ### Non-degeneracy of the bilinear pairing
In algebra and field theory, a bilinear pairing is said to be non-degenerate if it does not collapse information, meaning that for any nonzero element $x$, there exists some $y$ such that the pairing evaluates to a nonzero value. In our case, the bilinear pairing is given by:
\begin{equation}
(x, y) \mapsto \operatorname{tr}(xy).
\end{equation}
Non-degeneracy ensures that if $\operatorname{tr}(xy) = 0$ for all $y \in \mathbb{F}_{q^n}$, then necessarily $x = 0$. This property is crucial because it guarantees that the trace function retains enough information to distinguish elements in $\mathbb{F}_{q^n}$ and does not annihilate entire subspaces.
**Why is the pairing non-degenerate?**
The non-degeneracy of this bilinear pairing follows from two key observations:
1. **The trace function is surjective onto $\mathbb{F}_q$.**
This means that for any nonzero element in the base field $\mathbb{F}_q$, there exists at least one element in $\mathbb{F}_{q^n}$ whose trace evaluates to that value. In particular, there exist elements $s \in \mathbb{F}_{q^n}$ for which $\operatorname{tr}(s) \neq 0$.
2. **Multiplication by a nonzero element $x$ defines a linear transformation.**
Consider the map:
\begin{equation}
y \mapsto \operatorname{tr}(xy).
\end{equation}
This transformation is linear over $\mathbb{F}_q$. Suppose it were identically zero for some nonzero $x$, i.e., $\operatorname{tr}(xy) = 0$ for all $y$. Since $y \mapsto xy$ ranges over all of $\mathbb{F}_{q^n}$ when $x \neq 0$, the trace would then vanish on the entire field, contradicting its surjectivity. Hence the map $y \mapsto \operatorname{tr}(xy)$ can only be identically zero when $x = 0$.
#### Explicit proof of non-degeneracy
To see this concretely, take any **nonzero** $x \in \mathbb{F}_{q^n}$. Because $\operatorname{tr}$ is surjective, we can always find some element $s \in \mathbb{F}_{q^n}$ such that:
\begin{equation}
\operatorname{tr}(s) \neq 0.
\end{equation}
Now, define:
\begin{equation}
y = s x^{-1}.
\end{equation}
Substituting this into the pairing:
\begin{equation}
\operatorname{tr}(xy) = \operatorname{tr}(x \cdot s x^{-1}).
\end{equation}
Since multiplication by $x$ and its inverse cancel out:
\begin{equation}
\operatorname{tr}(xy) = \operatorname{tr}(s).
\end{equation}
By construction, $\operatorname{tr}(s) \neq 0$, meaning that for this choice of $y$, the pairing is nonzero. This confirms that for every nonzero $x$, we can always find a $y$ such that $\operatorname{tr}(xy) \neq 0$, proving that the pairing is non-degenerate.
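The argument can be replayed mechanically in the toy field, reusing `mul` and `trace` from the sketches above. Since $x^7 = 1$ for every nonzero $x \in \mathbb{F}_{2^3}$, the inverse is $x^{-1} = x^6$, and $s = 1$ is a convenient element of nonzero trace:
```rust
/// Inverse in F_{2^3}^*: x^7 = 1, hence x^{-1} = x^6.
fn inv(x: u8) -> u8 {
    let x2 = mul(x, x);
    let x4 = mul(x2, x2);
    mul(x4, x2)
}

fn main() {
    let s = 0b001; // s = 1, and tr(1) = 1 + 1 + 1 = 1 ≠ 0
    for x in 1..8u8 {
        let y = mul(s, inv(x)); // the witness y = s·x^{-1} from the proof
        assert_eq!(trace(mul(x, y)), 1); // the pairing tr(xy) is nonzero
    }
}
```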
### Recovering coordinates using the trace and dual basis
In a finite field extension, elements can be expressed in terms of a basis. However, given an element $p$, we need a way to extract its coordinates relative to this basis. This is achieved using the trace function and a special dual basis, leveraging the non-degeneracy of the bilinear pairing.
### Definition of the dual basis
Let $\{b_0, \dots, b_{d-1}\}$ be a basis of the field extension $\mathbb{F}_{q^d}$ over $\mathbb{F}_q$. A dual basis $\{u_0, \dots, u_{d-1}\}$ is defined by the condition:
\begin{equation}
\operatorname{tr}(b_k u_j) = \delta_{kj},
\end{equation}
where $\delta_{kj}$ is the Kronecker delta:
\begin{equation}
\delta_{kj} =
\begin{cases}
1, \quad \text{if } k = j, \\
0, \quad \text{if } k \neq j.
\end{cases}
\end{equation}
This means that each dual basis element $u_j$ is chosen so that it "isolates" a single coordinate when paired with the corresponding basis element $b_k$ under the trace function.
### Claim: coordinate recovery formula
Any element $p \in \mathbb{F}_{q^d}$ can be written as a linear combination of the basis elements:
\begin{equation}
p = \sum_{i=0}^{d-1} b_i p_i.
\end{equation}
Our goal is to recover the coefficients $p_i$. Using the dual basis, we claim that:
\begin{equation}
p_i = \operatorname{tr}(u_i p).
\end{equation}
This equation provides an explicit method to compute the coordinate $p_i$ of $p$ with respect to the basis $\{b_0, \dots, b_{d-1}\}$.
### Proof of the coordinate recovery formula
To verify this formula, we proceed as follows:
1. **Substituting the expansion of $p$ into the trace function:**
Applying $\operatorname{tr}(u_j p)$ to both sides of $p = \sum_{i=0}^{d-1} b_i p_i$, we get:
\begin{equation}
\operatorname{tr}(u_j p) = \operatorname{tr} \left( u_j \sum_{i=0}^{d-1} b_i p_i \right).
\end{equation}
2. **Using linearity of the trace function:**
Since the trace function is linear, it distributes over summation and scalar multiplication:
\begin{equation}
\operatorname{tr}(u_j p) = \sum_{i=0}^{d-1} p_i \operatorname{tr}(u_j b_i).
\end{equation}
3. **Applying the duality condition:**
From the definition of the dual basis, we know:
\begin{equation}
\operatorname{tr}(u_j b_i) = \delta_{ij}.
\end{equation}
This means that for each term in the sum, only the term where $i = j$ survives, while all others vanish:
\begin{equation}
\operatorname{tr}(u_j p) = p_j.
\end{equation}
4. **Conclusion:**
Since this holds for all indices $j$, we conclude that:
\begin{equation}
p_j = \operatorname{tr}(u_j p),
\end{equation}
which proves our claim.
### Intuition behind the dual basis
The dual basis $\{u_0, \dots, u_{d-1}\}$ acts as a "coordinate extraction tool". When an element $p$ is expressed in the basis $\{b_0, \dots, b_{d-1}\}$, the trace pairing ensures that applying $\operatorname{tr}(u_j p)$ isolates the $j$-th coordinate $p_j$.
This construction is particularly useful in finite field arithmetic, error-correcting codes, and cryptographic protocols, where efficient coordinate extraction is necessary.
:::info
### **Example: Coordinate extraction in $\mathbb{F}_{2^3}$**
Consider the field $\mathbb{F}_{2^3}$, where $q = 2$ and $d = 3$. Suppose we have a basis:
\begin{equation}
b_0 = 1, \quad b_1 = \alpha, \quad b_2 = \alpha^2,
\end{equation}
where $\alpha$ is a primitive element satisfying $\alpha^3 + \alpha + 1 = 0$.
Now, assume the element $p$ is given by:
\begin{equation}
p = b_0 + b_1 + b_2.
\end{equation}
We want to extract the coordinates $p_0, p_1, p_2$ using the dual basis $\{u_0, u_1, u_2\}$. For this particular field and basis, a direct check of the condition $\operatorname{tr}(b_k u_j) = \delta_{kj}$ (using $\operatorname{tr}(x) = x + x^2 + x^4$) shows that the dual basis is $u_0 = 1$, $u_1 = \alpha^2$, $u_2 = \alpha$. Applying the coordinate recovery formula:
\begin{equation}
p_i = \operatorname{tr}(u_i p),
\end{equation}
we compute, for instance, $p_1 = \operatorname{tr}(\alpha^2 p) = \operatorname{tr}(\alpha^2) + \operatorname{tr}(\alpha^3) + \operatorname{tr}(\alpha^4) = 0 + 1 + 0 = 1$. In the same way we recover $p_0 = 1$ and $p_2 = 1$, confirming the decomposition of $p$ in the given basis.
:::
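The whole example can be verified with the helpers above. The sketch below asserts the duality condition for the pair of bases and then recovers the coordinates of $p$:
```rust
fn main() {
    let b = [0b001u8, 0b010, 0b100]; // basis {1, α, α^2}
    let u = [0b001u8, 0b100, 0b010]; // its dual basis {1, α^2, α}

    // Duality condition: tr(b_k · u_j) = δ_{kj}.
    for k in 0..3 {
        for j in 0..3 {
            let delta = if k == j { 1 } else { 0 };
            assert_eq!(trace(mul(b[k], u[j])), delta);
        }
    }

    // Coordinate recovery: p = 1 + α + α^2 decomposes as (1, 1, 1).
    let p = 0b111;
    for j in 0..3 {
        assert_eq!(trace(mul(u[j], p)), 1); // p_j = tr(u_j · p)
    }
}
```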
### Unpacking polynomials using the Frobenius orbit
When working in a field extension, polynomials can be packed together in a structured way. Given a polynomial $P$—which could be univariate or multivariate—it may be expressed as a sum of smaller polynomials $\{P_0, \dots, P_{d-1}\}$ defined over the base field $\mathbb{F}$. The packing is done using a basis $\{b_0, \dots, b_{d-1}\}$:
\begin{equation}
P = \sum_{i=0}^{d-1} b_i P_i.
\end{equation}
Our goal is to recover the original polynomials $\{P_0, \dots, P_{d-1}\}$ from $P$ efficiently. To achieve this, we leverage the trace function and the dual basis $\{u_0, \dots, u_{d-1}\}$, which allow us to extract the individual components. Specifically, the $i$-th polynomial $P_i(x)$ can be recovered using:
\begin{equation}
P_i(x) = \operatorname{tr}(u_i P).
\end{equation}
### Expanding the expression
To further understand this formula, we expand the trace function:
\begin{equation}
P_i(x) = \sum_{j=0}^{n-1} \operatorname{Fr}^j(u_i) \big(\operatorname{Fr}^j P\big)(x),
\end{equation}
where $n$ is the degree of the field extension. Since the Frobenius map acts as an automorphism on the field, we can rewrite this expression using the inverse Frobenius shift:
\begin{equation}
P_i(x) = \sum_{j=0}^{n-1} \operatorname{Fr}^j(u_i) \operatorname{Fr}^j\big(P(\operatorname{Fr}^{-j} x)\big).
\end{equation}
This equation is fundamental to our approach because it allows us to compute each $P_i(x)$ entirely in terms of $P$, without directly evaluating $P_i$.
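Continuing in the toy field (and reusing `mul`, `frob`, and the basis/dual-basis pair from the previous sketches), this identity can be checked numerically for a packed linear polynomial $P(t) = C_0 + C_1 t$; the packing closure and `frob_pow` are illustrative helpers:
```rust
/// Apply the Frobenius map k times.
fn frob_pow(x: u8, k: usize) -> u8 {
    (0..k).fold(x, |v, _| frob(v))
}

fn main() {
    let b = [0b001u8, 0b010, 0b100]; // basis {1, α, α^2}
    let u = [0b001u8, 0b100, 0b010]; // dual basis {1, α^2, α}

    // Pack three F_2 coefficients into one field element: Σ_i b_i · c_i.
    let pack = |c: [u8; 3]| (0..3).fold(0u8, |acc, i| acc ^ mul(b[i], c[i]));

    // P(t) = C0 + C1·t packs the coordinate polynomials P_i(t) = c0[i] + c1[i]·t.
    let (c0, c1) = ([1u8, 0, 1], [0u8, 1, 1]);
    let (cap0, cap1) = (pack(c0), pack(c1));
    let p = |t: u8| cap0 ^ mul(cap1, t);

    let x = 0b110; // an arbitrary evaluation point in F_{2^3}
    for i in 0..3 {
        // Direct evaluation of the coordinate polynomial P_i at x.
        let lhs = c0[i] ^ mul(c1[i], x);
        // Reconstruction through the packed P along the inverse Frobenius
        // orbit of x: Σ_j Fr^j(u_i) · Fr^j(P(Fr^{-j}(x))), with Fr^{-j} = Fr^{3-j}.
        let rhs = (0..3).fold(0u8, |acc, j| {
            acc ^ mul(frob_pow(u[i], j), frob_pow(p(frob_pow(x, (3 - j) % 3)), j))
        });
        assert_eq!(lhs, rhs);
    }
}
```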
### Key insights from this expression
#### **1. Frobenius orbit decomposition**
This result shows that $P_i(x)$ is reconstructed by evaluating $P$ along the inverse Frobenius orbit of $x$, which consists of the values:
\begin{equation}
\{x, \operatorname{Fr}^{-1}(x), \operatorname{Fr}^{-2}(x), \dots, \operatorname{Fr}^{-(n-1)}(x)\}.
\end{equation}
Instead of computing $P_i(x)$ directly, we express it in terms of $P$, the packed polynomial, by summing evaluations of $P$ at different points in the Frobenius orbit. This is powerful because:
- It allows structured polynomial unpacking without needing explicit knowledge of the individual $P_i$ polynomials.
- The computation naturally aligns with the Frobenius action, making it efficient in field operations.
#### **2. Claim reduction for sumcheck protocols**
One of the most significant consequences of this result is that it reduces the problem of evaluating $P_i(x)$ to multiple evaluations of $P$, which simplifies computations in proof systems. This has direct implications for interactive proof protocols such as sumcheck, where:
- Instead of handling separate polynomials $P_i$, we work only with the packed polynomial $P$.
- The verification process becomes simpler, as the verifier can check claims about $P$ rather than needing access to individual components.
## The Four Russians method for efficient matrix-vector multiplication
The Four Russians method is an optimization technique designed to speed up matrix operations by leveraging precomputed partial results. In the context of the boolcheck algorithm, we use this method to accelerate matrix-vector multiplication, rather than general Boolean matrix multiplication.
### Key idea: precomputing matrix application to a vector
Instead of performing direct multiplication of a matrix with a vector, we precompute the effect of applying the matrix to all possible small input vectors. This precomputation enables fast lookup-based computation, significantly reducing redundant calculations.
### How it works
1. Split the matrix into columns
- We treat the matrix as a set of column vectors rather than working with full row-by-row multiplication.
- Each column represents how a single input bit affects the output.
2. Precompute partial sums of matrix columns
- Since matrix-vector multiplication involves summing specific column combinations based on the input vector, we precompute all possible XOR sums of groups of 8 columns at a time.
- There are $2^8 = 256$ possible subsets of the 8 columns in a chunk, so we store 256 precomputed results per chunk.
3. Efficiently apply the matrix to the vector
- Instead of computing each matrix-vector product from scratch, we split the input vector into bytes.
- Each byte of the input is treated as an index into our precomputed table, allowing us to fetch the corresponding result instantly.
- This avoids unnecessary calculations, replacing multiplications with simple lookups.
### Step-by-step algorithm
#### 1. Precompute XOR sums for column subsets
- Divide the $128 \times 128$ matrix into 16 chunks of 8 columns each.
- For each 8-column chunk, compute and store the XOR sum of each of the 256 possible column subsets.
- This results in a precomputed table with $256 \times 16$ entries.
#### 2. Apply the precomputed matrix to the input vector
- Convert the input vector into a 16-byte representation (since 128 bits = 16 bytes).
- Each byte of the input is used as an index into the precomputed lookup table.
- The results for each chunk are summed together to produce the final output.
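A minimal Rust sketch of these two steps, assuming the matrix is stored as 128 column bitmasks of type `u128` (all names illustrative):
```rust
/// Step 1: for each chunk of 8 columns, precompute the XOR of every one of
/// the 256 possible column subsets. 16 tables cover a 128×128 matrix.
fn build_tables(cols: &[u128; 128]) -> Vec<[u128; 256]> {
    cols.chunks(8)
        .map(|chunk| {
            let mut table = [0u128; 256];
            for idx in 1usize..256 {
                // Each subset is a previously computed subset plus one column:
                // strip the lowest set bit of the index and XOR that column in.
                let low = idx & idx.wrapping_neg();
                table[idx] = table[idx ^ low] ^ chunk[low.trailing_zeros() as usize];
            }
            table
        })
        .collect()
}

/// Step 2: apply the matrix with one lookup and one XOR per input byte.
fn apply(tables: &[[u128; 256]], x: u128) -> u128 {
    tables
        .iter()
        .enumerate()
        .fold(0, |acc, (i, t)| acc ^ t[((x >> (8 * i)) & 0xff) as usize])
}
```
Note that each table entry is derived from an earlier one with a single XOR, so the precomputation itself costs only $16 \times 256$ XORs.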
### Computational efficiency
- Precomputing the table requires $O(2^8 \times 16)$ operations (each entry is obtained from a previous one with a single XOR), which is negligible.
- Applying the matrix to a vector is reduced from $O(n^2)$ bit operations to just 16 table lookups and XOR additions.
- This method drastically reduces the runtime from $O(n^2)$ bit operations to $O(n)$ word operations, making it well-suited for large-scale computations.
### Intuition: why this works
Instead of multiplying the full matrix with the vector, we precompute how groups of 8 columns interact with any 8-bit input chunk. Since matrix-vector multiplication over $\mathbb{F}_2$ amounts to XORing the columns selected by the input bits, all possible results can be stored in advance and retrieved instantly.
This allows us to avoid redundant computations, replacing them with fast memory lookups.
### Practical application in boolcheck
In the boolcheck protocol, we frequently need to apply a binary matrix transformation to a vector during various steps of the protocol. The Four Russians method allows us to speed up these transformations significantly by leveraging precomputed lookup tables.
Instead of recomputing matrix-vector products every time, we reuse previously computed results, making verification much more efficient.
### Minimal example: Applying the Four Russians method
To make the Four Russians method more concrete, let’s go through a simple example. Consider a small $4 \times 4$ binary matrix $A$ and a 4-bit input vector $x$. This toy example differs slightly from the actual implementation (which works on $128 \times 128$ matrices with 8-bit chunks), but it illustrates the method end to end.
#### Matrix and input vector
Let’s define a boolean matrix $A$:
\begin{equation}
A =
\begin{bmatrix}
1 & 0 & 1 & 1 \\
0 & 1 & 0 & 1 \\
1 & 1 & 1 & 0 \\
0 & 0 & 1 & 1
\end{bmatrix}
\end{equation}
And an input vector:
\begin{equation}
x =
\begin{bmatrix}
1 \\
0 \\
1 \\
1
\end{bmatrix}
\end{equation}
The standard boolean matrix-vector multiplication rule states that the result vector $y = A \cdot x$ is computed as follows:
\begin{equation}
y_i = \bigoplus_{j=1}^{4} (A_{ij} \land x_j)
\end{equation}
where $\oplus$ denotes XOR. Computing this row-by-row gives:
\begin{aligned}
y &= \begin{bmatrix}
1 & 0 & 1 & 1 \\
0 & 1 & 0 & 1 \\
1 & 1 & 1 & 0 \\
0 & 0 & 1 & 1
\end{bmatrix} \cdot
\begin{bmatrix}
1 \\
0 \\
1 \\
1
\end{bmatrix} \\
&= \begin{bmatrix}
(1 \land 1) \oplus (0 \land 0) \oplus (1 \land 1) \oplus (1 \land 1) \\
(0 \land 1) \oplus (1 \land 0) \oplus (0 \land 1) \oplus (1 \land 1) \\
(1 \land 1) \oplus (1 \land 0) \oplus (1 \land 1) \oplus (0 \land 1) \\
(0 \land 1) \oplus (0 \land 0) \oplus (1 \land 1) \oplus (1 \land 1)
\end{bmatrix} \\
&= \begin{bmatrix}
1 \oplus 0 \oplus 1 \oplus 1 \\
0 \oplus 0 \oplus 0 \oplus 1 \\
1 \oplus 0 \oplus 1 \oplus 0 \\
0 \oplus 0 \oplus 1 \oplus 1
\end{bmatrix}
\end{aligned}
So the output vector is:
\begin{equation}
y =
\begin{bmatrix}
1 \\
1 \\
0 \\
0
\end{bmatrix}
\end{equation}
#### Using the Four Russians method
Instead of performing these calculations row-by-row, we precompute all possible XOR combinations for small chunks of columns.
**Step 1: Split the matrix into 2-column blocks**
We divide $A$ into chunks of 2 columns each:
\begin{equation}
A_1 =
\begin{bmatrix}
1 & 0 \\
0 & 1 \\
1 & 1 \\
0 & 0
\end{bmatrix}, \quad
A_2 =
\begin{bmatrix}
1 & 1 \\
0 & 1 \\
1 & 0 \\
1 & 1
\end{bmatrix}
\end{equation}
**Step 2: Precompute XORs for each 2-column block**
For each possible 2-bit input, we compute the corresponding XOR sum for each block:
| Input bits | XOR sum for $A_1$ | XOR sum for $A_2$ |
|-------------------|--------------|--------------|
| $(0,0)$ | $(0,0,0,0)$ | $(0,0,0,0)$ |
| $(0,1)$ | $(0,1,1,0)$ | $(1,1,0,1)$ |
| $(1,0)$ | $(1,0,1,0)$ | $(1,0,1,1)$ |
| $(1,1)$ | $(1,1,0,0)$ | $(0,1,1,0)$ |
**Step 3: Use the input vector to look up values**
Now, we split $x$ into two-bit chunks:
\begin{equation}
(x_1, x_2) = (1, 0), \quad (x_3, x_4) = (1, 1)
\end{equation}
Using the precomputed table, we retrieve:
- For $(1,0)$ from $A_1$: $(1,0,1,0)$
- For $(1,1)$ from $A_2$: $(0,1,1,0)$
We compute the final result by XORing these values together:
\begin{aligned}
(1,0,1,0) \oplus (0,1,1,0) &= (1 \oplus 0, 0 \oplus 1, 1 \oplus 1, 0 \oplus 0) \\
&= (1,1,0,0)
\end{aligned}
which matches our direct computation.
#### Why this is faster
Instead of computing the product row-by-row, we simply:
1. Precompute all possible results for small column chunks (done once).
2. Split the input vector into chunk-sized pieces (bytes, in the $128 \times 128$ setting) and use them as indices to retrieve precomputed values.
3. Combine results using XOR, which is very efficient.
For large matrices, this reduces complexity from $O(n^2)$ bit operations to one table lookup and one XOR per chunk, i.e., roughly $O(n)$ word operations.
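For completeness, the toy example above can be checked in a few lines of Rust (columns and vectors encoded as bitmasks, with bit $k$ standing for row/position $k$):
```rust
fn main() {
    // Columns of A as 4-bit masks: col_j collects (A_{0j}, ..., A_{3j}).
    let cols: [u8; 4] = [0b0101, 0b0110, 0b1101, 0b1011];
    let x: u8 = 0b1101; // x = (1, 0, 1, 1), bit j = x_j

    // Precompute the 4-entry subset-XOR table for each 2-column block.
    let table = |c: &[u8]| [0, c[0], c[1], c[0] ^ c[1]];
    let (t1, t2) = (table(&cols[..2]), table(&cols[2..]));

    // Look up each 2-bit chunk of x and XOR the partial results.
    let y = t1[(x & 0b11) as usize] ^ t2[((x >> 2) & 0b11) as usize];
    assert_eq!(y, 0b0011); // y = (1, 1, 0, 0), matching the direct computation
}
```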
## Boolcheck protocol
The boolcheck protocol is a verification mechanism designed for quadratic boolean formulas. To illustrate the core idea, we will focus on andcheck, a specific case of boolcheck. However, the same construction extends naturally to any homogeneous quadratic Boolean formula. Additionally, with some modifications, non-homogeneous quadratic formulas can also be accommodated.
For our setting, we assume the base field is $\mathbb{F} = \mathbb{F}_2$.
### Packed representation of polynomials
We consider two packed polynomials:
\begin{equation}
P = \sum_i b_i P_i, \quad Q = \sum_i b_i Q_i,
\end{equation}
where:
- $\{P_i\}$ and $\{Q_i\}$ are coordinate polynomials defined over $\mathbb{F}_2$.
- $\{b_i\}$ are basis elements over $\mathbb{F}_2$.
The goal is to efficiently verify a quadratic boolean formula using the sumcheck protocol, which involves the following sum:
\begin{equation}
\sum_x \left( \sum_i \big(b_i P_i(x) Q_i(x)\big) \cdot \text{eq}(x, y) \right),
\end{equation}
where $\text{eq}(x, y)$ is an equality test function, enforcing constraints between variables $x$ and $y$.
A direct computation of this sum can be expensive, so we use an optimized approach by precomputing the (multiquadratic) polynomial:
\begin{equation}
(P \land Q)(x) = \sum_i \big(b_i P_i(x) Q_i(x)\big).
\end{equation}
This polynomial represents the bitwise AND operation applied to $P$ and $Q$ at each evaluation point $x$.
### Evaluation set and extension to $(0,1,\infty)^n$
To evaluate polynomials efficiently, we define our evaluation domain as:
\begin{equation}
(0,1,\infty)^n.
\end{equation}
The special element $\infty$ is handled using a rule similar to Karatsuba multiplication:
- The value of a polynomial at $\infty$ is interpreted as its highest-degree coefficient.
Using this rule, for any polynomial of degree at most one in the selected variable (with the other variables fixed), we obtain the fundamental identity:
\begin{equation}
P(\dots, 0, \dots) + P(\dots, 1, \dots) + P(\dots, \infty, \dots) = 0.
\end{equation}
This identity allows us to efficiently extend a $2^n$-sized evaluation table of $P$ into a $3^n$-sized table over $(0,1,\infty)^n$. Finally, we compute:
\begin{equation}
(P \land Q)(x),
\end{equation}
using pointwise AND operations on the extended tables of $P$ and $Q$. Since this extension takes place in $\mathbb{F}_2$, it commutes with packing operations, preserving efficiency.
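Here is a sketch of this extension step, assuming (as in Hashcaster's packed representation) that each table entry is a `u128` whose bit $i$ holds the coordinate value $P_i(x)$, and that the table is ordered so the top variable selects the halves:
```rust
/// Extend a multilinear evaluation table over (0,1)^n (2^n entries) to
/// (0,1,∞)^n (3^n entries) using P(…,∞,…) = P(…,0,…) + P(…,1,…), which in
/// characteristic 2 is a plain XOR. Bit i of each word is the coordinate P_i.
fn extend(table: &[u128]) -> Vec<u128> {
    if table.len() == 1 {
        return table.to_vec();
    }
    let half = table.len() / 2;
    let lo = extend(&table[..half]); // top variable = 0
    let hi = extend(&table[half..]); // top variable = 1
    let inf: Vec<u128> = lo.iter().zip(&hi).map(|(a, b)| a ^ b).collect(); // = ∞
    [lo, hi, inf].concat()
}

/// (P ∧ Q) on the extended domain: since bit i of each word carries the pair
/// (P_i(x), Q_i(x)), the pointwise AND is a single bitwise & per entry.
fn and_table(p: &[u128], q: &[u128]) -> Vec<u128> {
    extend(p).into_iter().zip(extend(q)).map(|(a, b)| a & b).collect()
}
```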
### Complexity considerations
While efficient for small sizes, this method suffers from an asymptotic complexity of $O(N^{\log_2 3})$ in the size $N = 2^n$ of the original evaluation table, which arises because:
- We start with a $2^n$-sized evaluation table of $P$.
- We extend it to a $3^n = (2^n)^{\log_2 3}$-sized evaluation table over $(0,1,\infty)^n$.
:::info
### Why does the equality hold?
For any multilinear polynomial, we have:
\begin{equation}
P(\dots, 0, \dots) + P(\dots, 1, \dots) + P(\dots, \infty, \dots) = 0.
\end{equation}
**Step-by-step explanation**
The identity concerns one variable at a time: all other variables are held fixed, and we view $P$ as a polynomial in the selected variable $x_j$. Since $P$ is multilinear, it has degree at most one in $x_j$, so we can write:
\begin{equation}
P(\dots, x_j, \dots) = A + B \, x_j,
\end{equation}
where $A$ and $B$ depend only on the remaining (fixed) variables.
**Evaluations at key points**
We evaluate $P$ at the three special values of $x_j$:
- **At $0$:**
\begin{equation}
P(\dots, 0, \dots) = A.
\end{equation}
- **At $1$:**
\begin{equation}
P(\dots, 1, \dots) = A + B.
\end{equation}
- **At $\infty$:**
By the Karatsuba-style convention, evaluation at $\infty$ extracts the highest-degree coefficient in $x_j$:
\begin{equation}
P(\dots, \infty, \dots) = B.
\end{equation}
**Summing the evaluations:**
\begin{equation}
P(\dots, 0, \dots) + P(\dots, 1, \dots) + P(\dots, \infty, \dots) = A + (A + B) + B = 2A + 2B.
\end{equation}
Since we work in characteristic $2$, where addition satisfies $x + x = 0$, the sum vanishes:
\begin{equation}
P(\dots, 0, \dots) + P(\dots, 1, \dots) + P(\dots, \infty, \dots) = 0.
\end{equation}
:::
### Andcheck: combining Frobenius orbit calculations and the extension method
To efficiently evaluate andcheck, we use a hybrid approach inspired by the [Gruen univariate skip](https://eprint.iacr.org/2024/108). The idea is to leverage the extension method as a "skip" for the first $c$ rounds, postponing the full unpacking of polynomials until later. After this initial phase, we transition to the naive algorithm to complete the computation.
This method significantly reduces the complexity of the sumcheck protocol by avoiding unnecessary operations on unpacked coordinate polynomials early in the process. Below, we detail the full approach step by step.
1. ### Extending $P$ and $Q$
We start by extending the packed polynomials $P$ and $Q$ into evaluation tables of size:
\begin{equation}
3^{c+1} \cdot 2^{n-c-1}.
\end{equation}
This step ensures that the polynomials are evaluated over an extended domain:
\begin{equation}
(0, 1, \infty)^{c+1} \times (0, 1)^{n-c-1}.
\end{equation}
Here:
- The first $c+1$ variables take values in $\{0, 1, \infty\}$, allowing us to incorporate an additional layer of structure.
- The remaining $n-c-1$ variables remain binary, taking values in $\{0,1\}$.
This domain extension is crucial because it allows us to later apply the Frobenius orbit technique for more efficient polynomial evaluations.
2. ### Sumcheck with packed representation
Instead of working directly with the unpacked coordinate polynomials $P_i$ and $Q_i$, we perform the first $c$ rounds of sumcheck using the packed representation:
\begin{equation}
(P \land Q)(x) = \sum_i b_i P_i(x) Q_i(x).
\end{equation}
Since the basis elements $b_i$ remain fixed, this formulation lets us compute sums over multiple coordinate polynomials simultaneously, avoiding the computational cost of handling each coordinate separately. This step significantly reduces complexity in the early rounds.
3. ### Restricting $P$ and $Q$ to challenge points
After the first $c$ rounds, the verifier provides challenge points $r_0, \dots, r_{c-1}$. These values allow us to fix the first $c$ variables in $P$ and $Q$, reducing them to new polynomials of the form:
\begin{equation}
P_i(r_0, \dots, r_{c-1}, x_c, \dots, x_{n-1}).
\end{equation}
At this stage:
- The first $c$ variables $r_0, \dots, r_{c-1}$ are now constants rather than variables.
- The polynomials are now functions of only $x_c, \dots, x_{n-1}$, simplifying subsequent evaluations.
However, computing this restricted form efficiently requires optimizing the restriction step.
4. ### Efficient restriction using the 4 Russians method
A naive approach to restricting $P$ and $Q$ would require evaluating $d$ polynomials across $N$ points, leading to a complexity of approximately $d \cdot N$ basefield-by-extension-field multiplications. Instead, we improve performance using the 4 Russians method, which speeds up table-based polynomial evaluations.
The method proceeds as follows:
1. **Precompute an evaluation table**
We construct the table:
\begin{equation}
S(x_0, \dots, x_{c-1}) = \text{eq}(r_0, \dots, r_{c-1}; x_0, \dots, x_{c-1}),
\end{equation}
where $\text{eq}$ is an equality indicator function that ensures consistency between the challenge points and the polynomial's domain.
2. **Chunking and precomputing XOR values**
- Split the table $S(x_0, \dots, x_{c-1})$ into chunks of size 8.
- For each chunk, precompute all $256$ possible XOR values.
3. **Transposing and using AVX2 optimizations**
- Process the evaluation table of $P$ by transposing each chunk, so that the bits of a given coordinate polynomial line up into bytes that can serve as lookup indices.
- Fetch and accumulate the precomputed XOR entries using 256-bit AVX2 vector instructions.
By structuring computations in this way, we avoid redundant multiplications and minimize memory access costs, making the restriction step much more efficient.
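A sketch of the lookup core of this restriction step, leaving out the transposition and AVX2 details (names illustrative). The key point is that the precomputed table entries are XOR combinations of eq-values, so the restriction consumes the bits of the coordinate polynomials byte by byte without any field multiplications:
```rust
/// Subset-XOR table for one chunk of at most 8 eq-values (extension-field
/// elements represented as u128 words; field addition is XOR).
fn subset_xor_table(chunk: &[u128]) -> [u128; 256] {
    let mut table = [0u128; 256];
    for idx in 1usize..(1 << chunk.len()) {
        let low = idx & idx.wrapping_neg();
        table[idx] = table[idx ^ low] ^ chunk[low.trailing_zeros() as usize];
    }
    table
}

/// Restrict one coordinate polynomial at a fixed suffix y: computes
/// Σ_v eq(r; v) · P_i(v, y), where bits[k] packs the F_2 values P_i(v, y)
/// for the k-th chunk of 8 boolean prefixes v.
fn restrict_at(tables: &[[u128; 256]], bits: &[u8]) -> u128 {
    tables.iter().zip(bits).fold(0, |acc, (t, &b)| acc ^ t[b as usize])
}
```
The tables depend only on the challenges $r_0, \dots, r_{c-1}$, so they are built once and reused for every coordinate polynomial and every suffix.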
5. ### Completing the sumcheck protocol
Once we have restricted $P$ and $Q$, we continue with the remaining rounds of sumcheck. At this stage, we switch to the standard algebraic representation:
\begin{equation}
(P \land Q)(x) = \sum_i b_i P_i(x) Q_i(x).
\end{equation}
Since the first $c$ variables are already fixed, we now operate directly on the unpacked coordinate polynomials $P_i$ and $Q_i$. This allows us to perform final evaluations efficiently.
6. ### Recasting openings in the Frobenius orbit
The final step is to recast the openings of the coordinate polynomials in terms of the Frobenius orbit. Given a challenge point:
\begin{equation}
r = (r_0, \dots, r_{n-1}),
\end{equation}
we reinterpret these evaluations in terms of openings of the packed polynomials $P$ and $Q$, evaluated over the Frobenius orbit of $r$.
This transformation is particularly useful because:
- The Frobenius orbit structure ensures that polynomial evaluations remain well-structured, reducing overhead.
- This recasting simplifies verification since many terms naturally align in computations, avoiding unnecessary recomputations.
## Multiopening using the 4 Russians method
As we have seen, the 4 Russians method is a highly efficient optimization technique, and it is particularly useful here for the multiopening argument. After performing boolcheck, we are left with $d$ individual claims for a polynomial $P$, each of the form:
\begin{equation}
P(\operatorname{Fr}^i(r)) = s_i,
\end{equation}
where:
- $\operatorname{Fr}^i(r)$ is the $i$-th Frobenius twist of the challenge $r$.
- $s_i$ is the corresponding evaluation result at that twisted point.
Since verifying each of these $d$ claims individually would be computationally expensive, we compress them into a single claim using random coefficients $\gamma_i$. This results in the combined multiopening claim:
\begin{equation}
\sum_i \gamma_i P(\operatorname{Fr}^i(r)) = \sum_i \gamma_i s_i.
\end{equation}
This transformation allows us to verify all $d$ claims using a single sumcheck protocol, significantly improving efficiency.
### Verifying the combined claim
To check the validity of the combined claim, we run a sumcheck protocol on the expression:
\begin{equation}
P(x) \cdot \left( \sum_i \gamma_i \cdot \text{eq}(\operatorname{Fr}^i(r), x) \right),
\end{equation}
where:
- $\text{eq}(\operatorname{Fr}^i(r), x)$ is the multilinear equality polynomial, i.e., the multilinear extension of the indicator that equals $1$ exactly when its two arguments coincide.
- The sum $\sum_i \gamma_i \cdot \text{eq}(\operatorname{Fr}^i(r), x)$ aggregates the weighted equality checks across all $d$ Frobenius-twisted points.
The main computational challenge lies in efficiently evaluating this summation across all $d$ Frobenius twists.
### Rewriting the equality sum as a matrix-vector product
To optimize the computation of:
\begin{equation}
\sum_i \gamma_i \cdot \text{eq}(\operatorname{Fr}^i(r), x),
\end{equation}
we use the fact that every Boolean coordinate $x_j \in \{0,1\}$ is fixed by the Frobenius map, so that for Boolean $x$:
\begin{equation}
\text{eq}(\operatorname{Fr}^i(r), x) = \operatorname{Fr}^i\big(\text{eq}(r, x)\big).
\end{equation}
The sum is therefore an $\mathbb{F}_2$-linear function of $\text{eq}(r, x)$, and we can express it as a matrix-vector multiplication:
\begin{equation}
L \cdot \text{eq}(r, x),
\end{equation}
where:
- $L$ is the $d \times d$ Boolean matrix representing the $\mathbb{F}_2$-linear map
\begin{equation}
L = \sum_i \gamma_i \operatorname{Fr}^i.
\end{equation}
- The vector $\text{eq}(r, x)$ is the coordinate representation of the field element $\text{eq}(r, x)$ over $\mathbb{F}_2$.
By transforming the problem into a structured matrix operation, we can apply the 4 Russians method to compute it efficiently.
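Both multiplication by a constant $\gamma_i$ and $\operatorname{Fr}^i$ are $\mathbb{F}_2$-linear, so $L$ can be materialized column by column by applying the map to the basis vectors. A sketch, assuming hypothetical $\mathbb{F}_{2^{128}}$ primitives `gf128_mul` and `frobenius` (squaring) that are not shown here:
```rust
/// Columns of L = Σ_i γ_i · Fr^i: column j is the image of the basis vector
/// e_j under the F_2-linear map. `gf128_mul` and `frobenius` are assumed
/// field primitives for F_{2^128} (not shown).
fn l_columns(gammas: &[u128]) -> [u128; 128] {
    core::array::from_fn(|j| {
        let mut x = 1u128 << j; // basis vector e_j, read as a field element
        let mut acc = 0u128;
        for &gamma in gammas {
            acc ^= gf128_mul(gamma, x); // accumulate γ_i · Fr^i(e_j)
            x = frobenius(x);           // advance to the next Frobenius power
        }
        acc
    })
}
```
The resulting columns feed directly into the subset-XOR tables of the Four Russians sketch from earlier, giving the byte-lookup application described below.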
### Efficient computation using the 4 Russians method
The 4 Russians method accelerates the computation of $L \cdot \text{eq}(r, x)$ using precomputations and chunk-based processing. The steps are as follows:
1. **Precompute XOR values:**
- Divide the matrix $L$ into chunks of 8 columns.
- Precompute all $256$ possible XOR combinations of the columns in each chunk.
2. **Apply the matrix to the vector efficiently:**
- Split the input vector $\text{eq}(r, x)$ into bytes (chunks of size 8 bits).
- Use each byte to index into the precomputed XOR tables and retrieve the corresponding results instantly.
This optimization is based on the insight that boolean matrix operations, particularly XOR operations, can be transformed into simple table lookups. By working with 8-bit chunks, we limit the number of precomputations to $256$ combinations per chunk, keeping the process both efficient and scalable.
### Computational complexity and efficiency
Using this method, the total runtime for computing $L \cdot \text{eq}(r, x)$ is reduced to approximately $2d$ multiplications, making it highly efficient.
The combination of:
- Precomputed XOR tables
- Chunked processing
- Optimized Boolean matrix-vector multiplication
ensures that the multiopening verification is performed in near-optimal time, even for large values of $d$.
### Impact on multiopening arguments
By applying the 4 Russians method, we significantly reduce the computational complexity of verifying multiopening claims. This optimization enhances the efficiency of our boolcheck protocol, especially when dealing with high-dimensional Frobenius twists in interactive proofs.
The method allows us to:
- Compress multiple claims into a single sumcheck.
- Optimize the key matrix-vector multiplication step.
- Achieve significant speed improvements over naive verification methods.
## Conclusion
The boolcheck protocol demonstrates how carefully designed algebraic structures can significantly optimize proof systems for quadratic Boolean formulas. By embedding Boolean operations within a finite field and combining finite field algebra, the sumcheck protocol, and matrix optimization techniques, boolcheck achieves a scalable and computationally efficient verification mechanism. This structured approach to arithmetizing Boolean logic keeps the workload of both the prover and the verifier low, making boolcheck a practical and scalable solution for proof systems in cryptographic applications.