Try   HackMD

Hashcaster - Part 3: boolcheck

Special thanks to Lev Soukhanov for his great help all along the project and for the super interesting theoretical discussions we had.

Introduction

The boolcheck protocol in Hashcaster introduces a novel approach for efficiently verifying quadratic Boolean formulas using polynomial coordinates. In this context, boolean variables are represented as elements of a finite field (for example

F2128), where logical operations are naturally embedded within algebraic expressions.

The main goal of boolcheck is to verify the correctness of boolean formulas while minimizing the computational burden on both the prover and verifier. To illustrate this, we will focus on the andcheck protocol—a fundamental case that demonstrates the core techniques. However, this framework extends seamlessly to any homogeneous quadratic Boolean formula, and with slight modifications, can also accommodate non-homogeneous formulas.

To understand why boolcheck leverages Frobenius theory and the 4 Russians method, we will first explore the underlying mathematical foundations. By diving into Frobenius automorphisms, their impact on polynomial representations, and how they enable efficient extraction of packed field elements, we will see how they naturally fit into the construction of boolcheck. Additionally, we will introduce the 4 Russians method, a powerful optimization technique that significantly accelerates matrix-vector computations, making it a key component of our protocol.

This post will take a structured approach:

  1. Frobenius theory – Understanding the role of Frobenius maps, traces, and their application to encoding Boolean logic efficiently.
  2. The 4 Russians method – Exploring a classical optimization technique that speeds up structured matrix operations.
  3. Boolcheck protocol – Implementing these mathematical tools to efficiently verify quadratic Boolean formulas.
  4. Multiopening optimization – Using the 4 Russians method to reduce complexity in multi-opening verifications.

By the end of this post, the reader will have a clear understanding of how boolcheck transforms Boolean logic verification into an algebraic proof system optimized for computational efficiency.

Frobenius Theory

Frobenius map and orbit

Let’s start with a fundamental concept in finite fields. Suppose we are working with a base field

F and an extension field
Fd
. We define a basis for this extension as:

b0,b1,,bd1Fd

These basis elements have a special property: any element of the extension field

Fd can be uniquely represented as a linear combination of them using coefficients from the base field
F
. In other words, given any element in
Fd
, we can always find some values
p0,p1,,pd1
in
F
such that:

x=p0b0+p1b1++pd1bd1

Next, with this structure in mind, let’s consider values

p0,p1,,pd1 from
F
. Using these values, we define a function known as the packing map:

pack(p0,,pd1)=i=0d1bipi

This function encodes multiple field elements into a single field element. However, our goal is to reverse this process: given a packed value

p, we want to recover the original values
p0,p1,,pd1
using algebraic methods. To achieve this, we introduce the Frobenius morphism, which will allow us to systematically extract these components.

Frobenius map

The Frobenius map is a fundamental automorphism in finite field arithmetic. Given a field

Fpm with characteristic
p
, the Frobenius map
Fr
is defined as:

Fr(x)=xp

This function preserves the algebraic structure of the field, meaning that addition and multiplication behave predictably under its application.

In fields where the characteristic is

2 (such as
F2m
), the Frobenius map simplifies further:

Fr(x)=x2

This property makes computations particularly efficient when working in binary fields.

Properties of the Frobenius map

The Frobenius map is a fundamental automorphism in finite field arithmetic, meaning it preserves the structure of the field while transforming its elements in a predictable way. It possesses two key properties that make it particularly useful in algebraic computations:

Preservation of addition and multiplication

One of the most important features of the Frobenius map is that it preserves both addition and multiplication in the field. That is, for any elements

a,bF:

Fr(a+b)=Fr(a)+Fr(b),Fr(ab)=Fr(a)Fr(b).

While the preservation of multiplication follows directly from exponentiation, the preservation of addition requires a deeper explanation. This property is a direct consequence of the Frobenius identity, which states that in a field of characteristic

p, for any
a,bF
:

(a+b)p=ap+bp.

This result follows from the binomial theorem, which expands the power

(a+b)p as:

(a+b)p=ap+(p1)ap1b+(p2)ap2b2++(pp1)abp1+bp.

Here, the binomial coefficients are given by:

(pk)=p!k!(pk)!.

For all

1kp1, these coefficients are multiples of
p
, meaning that in a field of characteristic
p
, they vanish:

(pk)0modp.

This simplifies the expansion to:

(a+b)p=ap+bp.

Thus, applying the Frobenius map distributes over both addition and multiplication, allowing it to be extended to polynomial expressions. For example, given any polynomial in the field:

Fr(a3+b2c)=Fr(a)3+Fr(b)2Fr(c).

Periodicity of the Frobenius map

Another fundamental property of the Frobenius map is that it is periodic in finite fields. Specifically, in an extension field

Fpm, applying the Frobenius map
m
times results in the identity map:

Frm(x)=x,for all xFpm.

This follows from the fact that every element of

Fpm satisfies:

xpm=x.

This periodic behavior is a direct consequence of the structure of finite fields and plays an essential role in extracting coefficients when working with the Frobenius map in algebraic constructions.

Frobenius orbit

Applying the Frobenius map multiple times to an element

r generates a sequence known as its Frobenius orbit:

r,Fr(r),Fr2(r),

Each step in this sequence represents the application of the Frobenius map to the previous element. Since the Frobenius map is a field automorphism, it systematically transforms

r while preserving algebraic structure.

In many algebraic protocols, we also need to reverse this process. The inverse Frobenius orbit traces this sequence backward, effectively undoing the effect of the Frobenius map at each step. This allows us to reconstruct previous values from later ones, which is particularly useful when recovering packed coefficients in extension fields.

Frobenius action on polynomials

The Frobenius automorphism plays a crucial role in algebraic structures, and we naturally want to extend its action to polynomials. However, a direct application of the Frobenius map by exponentiation does not preserve the degree of a polynomial. This creates a challenge when working with polynomial representations in finite fields.

To resolve this, we define the Frobenius action on polynomials in a coefficient-wise manner: applying

Fr(P) means applying the Frobenius map to each coefficient of
P
. This leads to a key identity:

Fr(P)(x)=Fr(P(Fr1(x))).

This equality follows directly from the automorphism properties of the Frobenius map, particularly the preservation of multiplication:

Fr(P(x))=Fr(P)(Fr(x)).

Expanding

P(x) as a polynomial,

P(x)=iaixi,

the coefficient-wise application of Frobenius gives:

Fr(P(x))=iFr(ai)xi.

Using this, we can now verify our key identity:

Fr(P(Fr1(x)))=Fr(P)(Fr(Fr1(x))x)=Fr(P)(x).

This confirms that the Frobenius transformation of a polynomial can be expressed in terms of the inverse Frobenius applied to its argument, making it a powerful tool in algebraic constructions.

Definition of the trace

In a finite field extension

Fqn, the trace function provides a way to aggregate information across the Frobenius orbit of an element. The trace of an element
xFqn
is defined as:

tr(x)=i=0n1Fri(x),

where:

  • Fr
    is the Frobenius automorphism, given by
    Fr(y)=yq
    ,
  • Fri
    denotes the
    i
    -th application of the Frobenius map.

This function computes a summation over the entire Frobenius orbit of

x, effectively collapsing information from all conjugates of
x
in the field extension. A key property of the trace is its invariance under Frobenius permutation: since the Frobenius map preserves the field structure, it permutes the conjugates of
x
symmetrically, ensuring that the trace does not depend on the order in which the terms are summed.

An important observation is that the only elements left unchanged by the Frobenius morphism—that is, those satisfying

Fr(x)=x—are precisely those in the base field
F
. As a result, the trace function inherently maps elements from the extension field
Fqn
back to
Fq
. This works because summing over all Frobenius conjugates effectively eliminates any dependency on the extension structure. Highlighting this property is essential, as it clarifies why
tr(x)
always resides in
F
rather than some larger field.

Example: Trace in a degree-3 extension

Consider a finite field extension

Fq3, where
n=3
. The Frobenius orbit of an element
xFq3
consists of:

x,xq,xq2.

Applying the definition, the trace is:

tr(x)=x+xq+xq2.

This sum remains in the base field

Fq and uniquely encodes information about
x
in a way that is independent of its specific representation in the extension field.

Key properties of the trace

The trace function possesses two fundamental algebraic properties:

  • Linearity of the trace function

The trace function is a linear map over the base field

Fq, meaning that for any
a,bFq
and any
x,yFqn
:

tr(ax+by)=atr(x)+btr(y).

This follows from the fact that the Frobenius map is linear over

Fq. Since
Fr(y)=yq
preserves addition over the base field, applying it iteratively ensures that summing traces maintains linearity.

  • Non-degeneracy of the bilinear pairing

In algebra and field theory, a bilinear pairing is said to be non-degenerate if it does not collapse information, meaning that for any nonzero element

x, there exists some
y
such that the pairing evaluates to a nonzero value. In our case, the bilinear pairing is given by:

(x,y)tr(xy).

Non-degeneracy ensures that if

tr(xy)=0 for all
yFqn
, then necessarily
x=0
. This property is crucial because it guarantees that the trace function retains enough information to distinguish elements in
Fqn
and does not annihilate entire subspaces.

Why is the pairing non-degenerate?

The non-degeneracy of this bilinear pairing follows from two key observations:

  1. The trace function is surjective onto

    Fq.
    This means that for any nonzero element in the base field
    Fq
    , there exists at least one element in
    Fqn
    whose trace evaluates to that value. In particular, there exist elements
    sFqn
    for which
    tr(s)0
    .

  2. Multiplication by a nonzero element

    x defines a linear transformation.
    Consider the map:

    ytr(xy).

    This transformation is linear over

    Fq. If this transformation had a nontrivial kernel (i.e., if
    tr(xy)=0
    for all
    y
    and some nonzero
    x
    ), then multiplication by
    x
    would collapse all values in
    Fqn
    to zero under the trace function, contradicting the fact that the trace is surjective. The only way for this to hold is if
    x=0
    .

Explicit proof of non-degeneracy

To see this concretely, take any nonzero

xFqn. Because
tr
is surjective, we can always find some element
sFqn
such that:

tr(s)0.

Now, define:

y=sx1.

Substituting this into the pairing:

tr(xy)=tr(xsx1).

Since multiplication by

x and its inverse cancel out:

tr(xy)=tr(s).

By construction,

tr(s)0, meaning that for this choice of
y
, the pairing is nonzero. This confirms that for every nonzero
x
, we can always find a
y
such that
tr(xy)0
, proving that the pairing is non-degenerate.

Recovering coordinates using the trace and dual basis

In a finite field extension, elements can be expressed in terms of a basis. However, given an element

p, we need a way to extract its coordinates relative to this basis. This is achieved using the trace function and a special dual basis, leveraging the non-degeneracy of the bilinear pairing.

Definition of the dual basis

Let

{b0,,bd1} be a basis of the field extension
Fqd
over
Fq
. A dual basis
{u0,,ud1}
is defined by the condition:

tr(bkuj)=δkj,

where

δkj is the Kronecker delta:

δkj={1,if k=j,0,if kj.

This means that each dual basis element

uj is chosen so that it "isolates" a single coordinate when paired with the corresponding basis element
bk
under the trace function.

Claim: coordinate recovery formula

Any element

pFqd can be written as a linear combination of the basis elements:

p=i=0d1bipi.

Our goal is to recover the coefficients

pi. Using the dual basis, we claim that:

pi=tr(uip).

This equation provides an explicit method to compute the coordinate

pi of
p
with respect to the basis
{b0,,bd1}
.

Proof of the coordinate recovery formula

To verify this formula, we proceed as follows:

  1. Substituting the expansion of

    p into the trace function:

    Applying

    tr(ujp) to both sides of
    p=i=0d1bipi
    , we get:

    tr(ujp)=tr(uji=0d1bipi).

  2. Using linearity of the trace function:

    Since the trace function is linear, it distributes over summation and scalar multiplication:

    tr(ujp)=i=0d1pitr(ujbi).

  3. Applying the duality condition:

    From the definition of the dual basis, we know:

    tr(ujbi)=δij.

    This means that for each term in the sum, only the term where

    i=j survives, while all others vanish:

    tr(ujp)=pj.

  4. Conclusion:

    Since this holds for all indices

    j, we conclude that:

    pj=tr(ujp),

    which proves our claim.

Intuition behind the dual basis

The dual basis

{u0,,ud1} acts as a "coordinate extraction tool". When an element
p
is expressed in the basis
{b0,,bd1}
, the trace pairing ensures that applying
tr(ujp)
isolates the
j
-th coordinate
pj
.

This construction is particularly useful in finite field arithmetic, error-correcting codes, and cryptographic protocols, where efficient coordinate extraction is necessary.

Example: Coordinate extraction in
F23

Consider the field

F23, where
q=2
and
d=3
. Suppose we have a basis:

b0=1,b1=α,b2=α2,

where

α is a primitive element satisfying
α3+α+1=0
.

Now, assume the element

p is given by:

p=b0+b1+b2.

We want to extract the coordinates

p0,p1,p2 using the dual basis
{u0,u1,u2}
. Applying the coordinate recovery formula:

pi=tr(uip),

we compute the trace function for each dual basis element. This allows us to recover

p0=1,p1=1,p2=1, confirming the decomposition of
p
in the given basis.

Unpacking polynomials using the Frobenius orbit

When working in a field extension, polynomials can be packed together in a structured way. Given a polynomial

P—which could be univariate or multivariate—it may be expressed as a sum of smaller polynomials
{P0,,Pd1}
defined over the base field
F
. The packing is done using a basis
{b0,,bd1}
:

P=i=0d1biPi.

Our goal is to recover the original polynomials

{P0,,Pd1} from
P
efficiently. To achieve this, we leverage the trace function and the dual basis
{u0,,ud1}
, which allow us to extract the individual components. Specifically, the
i
-th polynomial
Pi(x)
can be recovered using:

Pi(x)=tr(uiP).

Expanding the expression

To further understand this formula, we expand the trace function:

Pi(x)=j=0n1Frj(ui)(FrjP)(x),

where

n is the degree of the field extension. Since the Frobenius map acts as an automorphism on the field, we can rewrite this expression using the inverse Frobenius shift:

Pi(x)=j=0n1Frj(ui)Frj(P(Frjx)).

This equation is fundamental to our approach because it allows us to compute each

Pi(x) entirely in terms of
P
, without directly evaluating
Pi
.

Key insights from this expression

1. Frobenius orbit decomposition

This result shows that

Pi(x) is reconstructed by evaluating
P
along the inverse Frobenius orbit of
x
, which consists of the values:

{x,Fr1(x),Fr2(x),,Fr(n1)(x)}.

Instead of computing

Pi(x) directly, we express it in terms of
P
, the packed polynomial, by summing evaluations of
P
at different points in the Frobenius orbit. This is powerful because:

  • It allows structured polynomial unpacking without needing explicit knowledge of the individual
    Pi
    polynomials.
  • The computation naturally aligns with the Frobenius action, making it efficient in field operations.

2. Claim reduction for sumcheck protocols

One of the most significant consequences of this result is that it reduces the problem of evaluating

Pi(x) to multiple evaluations of
P
, which simplifies computations in proof systems. This has direct implications for our interactive proof protocol, such as the sumcheck protocol, where:

  • Instead of handling separate polynomials
    Pi
    , we work only with the packed polynomial
    P
    .
  • The verification process becomes simpler, as the verifier can check claims about
    P
    rather than needing access to individual components.

The Four Russians method for efficient matrix-vector multiplication

The Four Russians method is an optimization technique designed to speed up matrix operations by leveraging precomputed partial results. In the context of the boolcheck algorithm, we use this method to accelerate matrix-vector multiplication, rather than general Boolean matrix multiplication.

Key idea: precomputing matrix application to a vector

Instead of performing direct multiplication of a matrix with a vector, we precompute the effect of applying the matrix to all possible small input vectors. This precomputation enables fast lookup-based computation, significantly reducing redundant calculations.

How it works

  1. Split the matrix into columns

    • We treat the matrix as a set of column vectors rather than working with full row-by-row multiplication.
    • Each column represents how a single input bit affects the output.
  2. Precompute partial sums of matrix columns

    • Since matrix-vector multiplication involves summing specific column combinations based on the input vector, we precompute all possible XOR sums of groups of 8 columns at a time.
    • There are
      28=256
      possible subsets of 8 columns, so we store 256 possible results per chunk of 8 columns.
  3. Efficiently apply the matrix to the vector

    • Instead of computing each matrix-vector product from scratch, we split the input vector into bytes.
    • Each byte of the input is treated as an index into our precomputed table, allowing us to fetch the corresponding result instantly.
    • This avoids unnecessary calculations, replacing multiplications with simple lookups.

Step-by-step algorithm

1. Precompute XOR sums for column subsets

  • Divide the
    128×128
    matrix into 16 chunks of 8 columns each.
  • For each 8-column chunk, compute and store the XOR sum of all 256 possible row combinations.
  • This results in a precomputed table with
    256×16
    entries.

2. Apply the precomputed matrix to the input vector

  • Convert the input vector into a 16-byte representation (since 128 bits = 16 bytes).
  • Each byte of the input is used as an index into the precomputed lookup table.
  • The results for each chunk are summed together to produce the final output.

Computational efficiency

  • Precomputing the table requires
    O(28×16)
    operations, which is feasible.
  • Applying the matrix to a vector is reduced from
    O(n2)
    to just 16 table lookups and XOR additions.
  • This method drastically reduces runtime from
    O(n2)
    to approximately
    O(n)
    , making it well-suited for large-scale computations.

Intuition: why this works

Instead of multiplying the full matrix with the vector, we precompute how groups of 8 columns interact with any 8-bit input vector. Since the boolean matrix multiplication only involves XOR operations, all possible results can be stored in advance and retrieved instantly.

This allows us to avoid redundant computations, replacing them with fast memory lookups.

Practical application in boolcheck

In the boolcheck protocol, we frequently need to apply a binary matrix transformation to a vector during various steps of the protocol. The Four Russians method allows us to speed up these transformations significantly by leveraging precomputed lookup tables.

Instead of recomputing matrix-vector products every time, we reuse previously computed results, making verification much more efficient.

Minimal example: Applying the Four Russians method

To make the Four Russians method more concrete, let’s go through a simple example. Consider a small

4×4 binary matrix
A
and a 4-bit input vector
x
. Even if the technique explained in this example may differ slightly from the actual implementation, it will still allow you to have an overview of the method and understand it better after all these theoretical explanations.

Matrix and input vector

Let’s define a boolean matrix

A:

A=[1011010111100011]

And an input vector:

x=[1011]

The standard boolean matrix-vector multiplication rule states that the result vector

y=Ax is computed as follows:

yi=j=14(Aijxj)

where

denotes XOR. Computing this row-by-row gives:

y=[1011010111100011][1011]=[(11)(00)(11)(11)(01)(10)(01)(11)(11)(10)(11)(01)(01)(00)(11)(11)]=[1011000110100011]

So the output vector is:

y=[1100]

Using the Four Russians method

Instead of performing these calculations row-by-row, we precompute all possible XOR combinations for small chunks of columns.

Step 1: Split the matrix into 2-column blocks

We divide

A into chunks of 2 columns each:

A1=[10011100],A2=[11011011]

Step 2: Precompute XORs for each 2-column block

For each possible 2-bit input, we compute the corresponding XOR sum for each block:

Input bits
(x1,x2)
XOR sum for
A1
XOR sum for
A2
(0,0)
(0,0,0,0)
(0,0,0,0)
(0,1)
(0,1,1,0)
(1,1,0,1)
(1,0)
(1,0,1,0)
(1,0,1,1)
(1,1)
(1,1,0,0)
(0,1,1,0)

Step 3: Use input vector to lookup values

Now, we split

x into two-bit chunks:

x1=(1,0),x2=(1,1)

Using the precomputed table, we retrieve:

  • For
    (1,0)
    from
    A1
    :
    (1,0,1,0)
  • For
    (1,1)
    from
    A2
    :
    (0,1,1,0)

We compute the final result by XORing these values together:

(1,0,1,0)(0,1,1,0)=(10,01,11,00)=(1,1,0,0)

which matches our direct computation.

Why this is faster

Instead of computing the product row-by-row, we simply:

  1. Precompute all possible results for small column chunks (done once).
  2. Split the input vector into byte-sized chunks and use them as indices to retrieve precomputed values.
  3. Combine results using XOR, which is very efficient.

For large matrices, this reduces complexity from

O(n2) to
O(n)
, as only a small table lookup and a few XOR operations are needed per chunk.

Boolcheck protocol

The boolcheck protocol is a verification mechanism designed for quadratic boolean formulas. To illustrate the core idea, we will focus on andcheck, a specific case of boolcheck. However, the same construction extends naturally to any homogeneous quadratic Boolean formula. Additionally, with some modifications, non-homogeneous quadratic formulas can also be accommodated.

For our setting, we assume the base field is

F=F2.

Packed representation of polynomials

We consider two packed polynomials:

P=ibiPi,Q=ibiQi,

where:

  • {Pi}
    and
    {Qi}
    are coordinate polynomials defined over
    F2
    .
  • {bi}
    are basis elements over
    F2
    .

The goal is to efficiently verify a quadratic boolean formula using the sumcheck protocol, which involves the following sum:

x(i(biPi(x)Qi(x))eq(x,y)),

where

eq(x,y) is an equality test function, enforcing constraints between variables
x
and
y
.

A direct computation of this sum can be expensive, so we use an optimized approach by precomputing the (multiquadratic) polynomial:

(PQ)(x)=i(biPi(x)Qi(x)).

This polynomial represents the bitwise AND operation applied to

P and
Q
at each evaluation point
x
.

Evaluation set and extension to
(0,1,)n

To evaluate polynomials efficiently, we define our evaluation domain as:

(0,1,)n.

The special element

is handled using a rule similar to Karatsuba multiplication:

  • The value of a polynomial at
    is interpreted as its highest-degree coefficient.

Using this rule, for any linear polynomial, we obtain the fundamental identity:

P(,0,)+P(,1,)+P(,,)=0.

This identity allows us to efficiently extend a

2n-sized evaluation table of
P
into a
3n
-sized table over
(0,1,)n
. Finally, we compute:

(PQ)(x),

using pointwise AND operations on the extended tables of

P and
Q
. Since this extension takes place in
F2
, it commutes with packing operations, preserving efficiency.

Complexity considerations

While efficient for small sizes, this method suffers from an asymptotic complexity of

O(nlog2(3)), which arises because:

  • We start with a
    2n
    -sized evaluation table of
    P
    .
  • We extend it to a
    3n
    -sized evaluation table over
    (0,1,)n
    .

Why does the equality hold?

For any multilinear polynomial, we have:

P(,0,)+P(,1,)+P(,,)=0.

Step-by-step explanation

A multilinear polynomial over

F2 is defined as:

P(x1,x2,,xn)=S[n]cSiSxi,

where

cSF2 are coefficients, and each term is a monomial of degree at most
n
. Since we are in
F2
, addition follows the rule:

x+x=0,

which means addition is equivalent to XOR.

Evaluations at key points

We evaluate

P at three special points:

  • At
    0
    :

    Setting
    x1=x2==xn=0
    , we get:

P(,0,)=0.

  • At
    1
    :

    Setting
    x1=x2==xn=1
    , we compute:

P(,1,)=S[n]cS1=S[n]cS.

  • At
    :

    The evaluation at
    corresponds to extracting only the terms of highest degree (i.e., monomials of degree exactly
    n
    ):

P(,,)=|S|=ncS.

Summing the evaluations:

P(,0,)+P(,1,)+P(,,).

Substituting the values:

0+S[n]cS+|S|=ncS.

Since every monomial appears twice except those of highest degree (once in

P(1) and once in
P()
), and addition in
F2
follows
x+x=0
, we get:

S[n]cS+|S|=ncS=0.

Thus:

P(,0,)+P(,1,)+P(,,)=0.

Andcheck: combining Frobenius orbit calculations and the extension method

To efficiently evaluate andcheck, we use a hybrid approach inspired by the Gruen univariate skip. The idea is to leverage the extension method as a "skip" for the first

c rounds, postponing the full unpacking of polynomials until later. After this initial phase, we transition to the naive algorithm to complete the computation.

This method significantly reduces the complexity of the sumcheck protocol by avoiding unnecessary operations on unpacked coordinate polynomials early in the process. Below, we detail the full approach step by step.

  1. Extending
    P
    and
    Q

We start by extending the packed polynomials

P and
Q
into evaluation tables of size:

3c+12nc1.

This step ensures that the polynomials are evaluated over an extended domain:

(0,1,)c+1×(0,1)nc1.

Here:

  • The first
    c+1
    variables take values in
    {0,1,}
    , allowing us to incorporate an additional layer of structure.
  • The remaining
    nc1
    variables remain binary, taking values in
    {0,1}
    .

This domain extension is crucial because it allows us to later apply the Frobenius orbit technique for more efficient polynomial evaluations.

  1. Sumcheck with packed representation

Instead of working directly with the unpacked coordinate polynomials

Pi and
Qi
, we perform the first
c
rounds of sumcheck using the packed representation:

(PQ)(x)=ibiPi(x)Qi(x).

Since the basis elements

bi remain fixed, this formulation lets us compute sums over multiple coordinate polynomials simultaneously, avoiding the computational cost of handling each coordinate separately. This step significantly reduces complexity in the early rounds.

  1. Restricting
    P
    and
    Q
    to challenge points

After the first

c rounds, the verifier provides challenge points
r0,,rc1
. These values allow us to fix the first
c
variables in
P
and
Q
, reducing them to new polynomials of the form:

Pi(r0,,rc1,xc,,xn1).

At this stage:

  • The first
    c
    variables
    r0,,rc1
    are now constants rather than variables.
  • The polynomials are now functions of only
    xc,,xn1
    , simplifying subsequent evaluations.

However, computing this restricted form efficiently requires optimizing the restriction step.

  1. Efficient restriction using the 4 Russians method

A naive approach to restricting

P and
Q
would require evaluating
d
polynomials across
N
points, leading to a complexity of approximately
dN
basefield-by-extension-field multiplications. Instead, we improve performance using the 4 Russians method, which speeds up table-based polynomial evaluations.

The method proceeds as follows:

  1. Precompute an evaluation table

    We construct the table:

    S(x0,,xc1)=eq(r0,,rc1;x0,,xc1),

    where

    eq is an equality indicator function that ensures consistency between the challenge points and the polynomial's domain.

  2. Chunking and precomputing XOR values

    • Split the table
      S(x0,,xc1)
      into chunks of size 8.
    • For each chunk, precompute all
      256
      possible XOR values.
  3. Transposing and using AVX-256 optimizations

    • Process the evaluation table of
      P
      by transposing each chunk (to allow for efficient vectorized operations).
    • Fetch results efficiently using the precomputed XOR AVX-256 instructions

By structuring computations in this way, we avoid redundant multiplications and minimize memory access costs, making the restriction step much more efficient.

  1. Completing the sumcheck protocol

Once we have restricted

P and
Q
, we continue with the remaining rounds of sumcheck. At this stage, we switch to the standard algebraic representation:

(PQ)(x)=ibiPi(x)Qi(x).

Since the first

c variables are already fixed, we now operate directly on the unpacked coordinate polynomials
Pi
and
Qi
. This allows us to perform final evaluations efficiently.

  1. Recasting openings in the Frobenius orbit

The final step is to recast the openings of the coordinate polynomials in terms of the Frobenius orbit. Given a challenge point:

r=(r0,,rn1),

we reinterpret these evaluations in terms of openings of the packed polynomials

P and
Q
, evaluated over the Frobenius orbit of
r
.

This transformation is particularly useful because:

  • The Frobenius orbit structure ensures that polynomial evaluations remain well-structured, reducing overhead.
  • This recasting simplifies verification since many terms naturally align in computations, avoiding unnecessary recomputations.

Multiopening using the 4 Russians method

As you must have understood by now, the 4 Russians method is a highly efficient optimization technique, particularly useful in our situation for the multiopening argument. After performing boolcheck, we are left with

d individual claims for a polynomial
P
, each of the form:

P(Fri(r))=si,

where:

  • Fri(r)
    is the
    i
    -th Frobenius twist of the challenge
    r
    .
  • si
    is the corresponding evaluation result at that twisted point.

Since verifying each of these

d claims individually would be computationally expensive, we compress them into a single claim using random coefficients
γi
. This results in the combined multiopening claim:

iγiP(Fri(r))=iγisi.

This transformation allows us to verify all

d claims using a single sumcheck protocol, significantly improving efficiency.

Verifying the combined claim

To check the validity of the combined claim, we run a sumcheck protocol on the expression:

P(x)(iγieq(Fri(r),x)),

where:

  • eq(Fri(r),x)
    is an equality indicator function that returns 1 if
    x=Fri(r)
    and 0 otherwise.
  • The sum
    iγieq(Fri(r),x)
    aggregates the weighted equality checks across all
    d
    Frobenius-twisted points.

The main computational challenge lies in efficiently evaluating this summation across all

d Frobenius twists.

Rewriting the equality sum as a matrix-vector product

To optimize the computation of:

iγieq(Fri(r),x),

we express it as a matrix-vector multiplication:

Leq(r,x),

where:

  • L
    is a
    d×d
    Boolean matrix defined as:
    L=iγiFri.
  • The vector
    eq(r,x)
    encodes the equality checks for all values of
    x
    .

By transforming the problem into a structured matrix operation, we can apply the 4 Russians method to compute it efficiently.

Efficient computation using the 4 Russians method

The 4 Russians method accelerates the computation of

Leq(r,x) using precomputations and chunk-based processing. The steps are as follows:

  1. Precompute XOR values:

    • Divide the matrix
      L
      into chunks of 8 columns.
    • Precompute all
      256
      possible XOR combinations of entries in each chunk.
  2. Apply the matrix to the vector efficiently:

    • Split the input vector
      eq(r,x)
      into bytes (chunks of size 8 bits).
    • Use each byte to index into the precomputed XOR tables and retrieve the corresponding results instantly.

This optimization is based on the insight that boolean matrix operations, particularly XOR operations, can be transformed into simple table lookups. By working with 8-bit chunks, we limit the number of precomputations to

256 combinations per chunk, keeping the process both efficient and scalable.

Computational complexity and efficiency

Using this method, the total runtime for computing

Leq(r,x) is reduced to approximately
2d
multiplications, making it highly efficient.

The combination of:

  • Precomputed XOR tables
  • Chunked processing
  • Optimized Boolean matrix-vector multiplication

ensures that the multiopening verification is performed in near-optimal time, even for large values of

d.

Impact on multiopening arguments

By applying the 4 Russians method, we significantly reduce the computational complexity of verifying multiopening claims. This optimization enhances the efficiency of our boolcheck protocol, especially when dealing with high-dimensional Frobenius twists in interactive proofs.

The method allows us to:

  • Compress multiple claims into a single sumcheck.
  • Optimize the key matrix-vector multiplication step.
  • Achieve significant speed improvements over naive verification methods.

Conclusion

The boolcheck protocol demonstrates how carefully designed algebraic structures can significantly optimize proof systems for quadratic Boolean formulas. By embedding Boolean operations within a finite field and leveraging key mathematical tools, boolcheck achieves a scalable and computationally efficient verification process.

By combining finite field algebra, sumcheck protocol, and matrix optimization techniques, boolcheck achieves a highly efficient Boolean formula verification mechanism. This structured approach to arithmetizing Boolean logic ensures that both provers and verifiers can operate within optimal constraints, making boolcheck a practical and scalable solution for proof systems in cryptographic applications.