# Hashcaster - Part 2: The challenges of sumcheck in binary fields
## Binary fields
### From small fields to binary fields in SNARKs
A widely accepted approach in modern SNARK design is to work over a "small" finite field $\mathbb{F}$ to maximize performance. The rationale is simple: a fixed number of field operations executes significantly faster over a 32-bit field than over a 256-bit field.
Computers inherently represent numbers as sequences of bits (zeros and ones) and use digital circuits to perform arithmetic operations like addition and multiplication. Modern hardware is highly optimized for computations with standard integer sizes such as 16, 32, or 64 bits. As a result, certain moduli—such as $2^{64} - 2^{32} + 1$ and $2^{31} - 1$—are chosen not only to fit within these bit-size constraints but also to align efficiently with hardware capabilities. For instance, modular multiplication with $2^{64} - 2^{32} + 1$ can be executed efficiently using 32-bit multiplications combined with bitwise shifts and copies, reducing computational overhead.
Taking this idea further, one can consider performing arithmetic directly in binary. In such a setting, addition simplifies to a straightforward XOR operation, eliminating the need to manage carry propagation when summing $1 + 1$ across bit positions. Similarly, multiplication becomes more parallelizable, further enhancing efficiency. This approach not only streamlines arithmetic but also aligns seamlessly with boolean logic, where computations naturally map to binary operations.
Building on this principle, the Binius team proposed a proof system that operates over the smallest possible field: the binary field $\mathbb{F}_2$, which consists of just two elements, $0$ and $1$. By leveraging arithmetic modulo 2, they exploit the inherent advantages of binary computation, making proof generation and verification more efficient in hardware-friendly environments.
### Extension Fields
The field $\mathbb{F}_2$ is fundamentally simple but also quite limited in its applications. To enable more powerful cryptographic constructions and enhance security, we require larger fields that preserve the efficiency of binary arithmetic while enriching its structure. An *extension field* is a finite field that extends a base field by introducing new elements, allowing for a richer algebraic structure. Iterating this construction starting from $\mathbb{F}_2$ yields the *tower of binary fields*, a concept initially introduced by Wiedemann.
Starting from a prime field $\mathbb{F}_p$, an extension field $\mathbb{F}_{p^m}$ is built by representing elements as polynomials of degree less than $m$ with coefficients in $\mathbb{F}_p$, with arithmetic performed modulo a degree-$m$ irreducible polynomial. This expansion can be visualized as a skyscraper, where each floor represents a progressively larger binary field. The "ground floor" is $\mathbb{F}_2$, the simplest field containing only $\{0,1\}$, and each new floor extends the structure by adding more elements and increasing computational possibilities.
Binary fields differ in their arithmetic compared to other finite fields, so it's essential to understand how addition and multiplication work in $\mathbb{F}_2$ before exploring extensions.
#### Addition in $\mathbb{F}_2$
Addition in $\mathbb{F}_2$ follows XOR logic, where $1+1$ results in $0$ (without carry).
| $+$ | 0 | 1 |
|----|---|---|
| 0 | 0 | 1 |
| 1 | 1 | 0 |
#### Multiplication in $\mathbb{F}_2$
Multiplication behaves as expected, following the rules of Boolean AND.
| $*$ | 0 | 1 |
|----|---|---|
| 0 | 0 | 0 |
| 1 | 0 | 1 |
### Constructing the first extension: $\mathbb{F}_{2^2}$
To construct a larger field, we introduce a new element $x$ that satisfies the equation:
\begin{equation}
x^2 = x + 1.
\end{equation}
This polynomial, $P(x) = x^2 + x + 1$, is irreducible over $\mathbb{F}_2$, meaning it cannot be factored into lower-degree polynomials within the field. We verify its irreducibility by evaluating it at $0$ and $1$:
\begin{equation}
P(0) = 0^2 + 0 + 1 = 1, \quad P(1) = 1^2 + 1 + 1 = 1.
\end{equation}
Since $P(x)$ has no roots in $\mathbb{F}_2$, it forms the basis for constructing $\mathbb{F}_{2^2}$. The elements of this extended field are all polynomials of degree less than 2 with coefficients in $\mathbb{F}_2$:
\begin{equation}
\mathbb{F}_{2^2} = \{0, 1, x, x+1\}.
\end{equation}
### Arithmetic in $\mathbb{F}_{2^2}$
Addition and multiplication in $\mathbb{F}_{2^2}$ follow specific rules based on polynomial arithmetic modulo $P(x)$. The resulting operation tables are:
#### Addition Table for $\mathbb{F}_{2^2}$
| $+$ | 0 | 1 | $x$ | $x+1$ |
|-------|-----|-----|------|------|
| 0 | 0 | 1 | $x$ | $x+1$ |
| 1 | 1 | 0 | $x+1$ | $x$ |
| $x$ | $x$ | $x+1$ | 0 | 1 |
| $x+1$ | $x+1$ | $x$ | 1 | 0 |
#### Multiplication Table for $\mathbb{F}_{2^2}$
| $*$ | 0 | 1 | $x$ | $x+1$ |
|-------|-----|-----|------|------|
| 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 1 | $x$ | $x+1$ |
| $x$ | 0 | $x$ | $x+1$ | 1 |
| $x+1$ | 0 | $x+1$ | 1 | $x$ |
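These tables can be reproduced mechanically. Below is a minimal sketch in Python (the encoding of elements as 2-bit integers, with bit 0 the constant term and bit 1 the coefficient of $x$, is our own choice for illustration), which multiplies polynomials over $\mathbb{F}_2$ and reduces with $x^2 = x + 1$:

```python
# Elements of F_{2^2} as 2-bit integers: 0 -> 0, 1 -> 1, 2 -> x, 3 -> x + 1.

def f4_add(a: int, b: int) -> int:
    return a ^ b  # coefficient-wise addition mod 2 is XOR

def f4_mul(a: int, b: int) -> int:
    a0, a1 = a & 1, a >> 1            # a = a1 * x + a0
    b0, b1 = b & 1, b >> 1
    c0 = a0 & b0                      # constant term
    c1 = (a0 & b1) ^ (a1 & b0)        # coefficient of x
    c2 = a1 & b1                      # coefficient of x^2
    return (c0 ^ c2) | ((c1 ^ c2) << 1)  # reduce with x^2 = x + 1

names = ["0", "1", "x", "x+1"]
print("add:", [[names[f4_add(a, b)] for b in range(4)] for a in range(4)])
print("mul:", [[names[f4_mul(a, b)] for b in range(4)] for a in range(4)])
```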
This construction lays the foundation for further field extensions, following the tower approach, to create even larger fields such as $\mathbb{F}_{2^4}$, $\mathbb{F}_{2^8}$, and beyond.
### Tower construction
Each subsequent field is constructed by adjoining a new element $X_i$, whose defining polynomial involves the generator adjoined at the previous level. The complete tower follows:
\begin{equation}
\begin{aligned}
\tau_0 &= \mathbb{F}_2, \\
\tau_1 &= \tau_0[X_0] / (X_0^2 + X_0 + 1) \cong \mathbb{F}_{2^2}, \\
\tau_2 &= \tau_1[X_1] / (X_1^2 + X_1 \cdot X_0 + 1) \cong \mathbb{F}_{2^4}, \\
\tau_3 &= \tau_2[X_2] / (X_2^2 + X_1 \cdot X_2 + 1) \cong \mathbb{F}_{2^8}, \\
\tau_4 &= \tau_3[X_3] / (X_3^2 + X_2 \cdot X_3 + 1) \cong \mathbb{F}_{2^{16}}, \\
\tau_5 &= \tau_4[X_4] / (X_4^2 + X_3 \cdot X_4 + 1) \cong \mathbb{F}_{2^{32}}, \\
\tau_6 &= \tau_5[X_5] / (X_5^2 + X_4 \cdot X_5 + 1) \cong \mathbb{F}_{2^{64}}, \\
\tau_7 &= \tau_6[X_6] / (X_6^2 + X_5 \cdot X_6 + 1) \cong \mathbb{F}_{2^{128}}.
\end{aligned}
\end{equation}
A 128-bit string can be interpreted in several equivalent ways within this tower. It can be treated as a single element of the 128-bit binary field, or decomposed into smaller units—two 64-bit tower field elements, four 32-bit elements, sixteen 8-bit elements, or even 128 individual elements of $\mathbb{F}_2$. This flexibility comes at no computational cost, as it simply involves reinterpreting the bit string rather than performing any actual transformation. This property is particularly valuable, as it allows smaller field elements to be efficiently packed into larger ones without incurring additional overhead.
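To see how cheap this reinterpretation is, here is a small sketch (plain bit-slicing in Python; the LSB-first limb layout is an assumption made for illustration): the same 128-bit value is viewed as limbs of different widths without any arithmetic.

```python
def limbs(value: int, width: int, total: int = 128) -> list[int]:
    """Split a `total`-bit value into (total // width) limbs of `width` bits,
    low limb first. No field arithmetic happens: it is pure reinterpretation."""
    mask = (1 << width) - 1
    return [(value >> (i * width)) & mask for i in range(total // width)]

v = 0x0123456789ABCDEF_FEDCBA9876543210
assert limbs(v, 64) == [0xFEDCBA9876543210, 0x0123456789ABCDEF]
assert len(limbs(v, 8)) == 16 and len(limbs(v, 1)) == 128
```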

## Multiplications in binary fields
### How things work
In binary fields, addition and subtraction are equivalent and can be performed using the XOR operation ($\oplus$):
\begin{equation}
x + y = x \oplus y.
\end{equation}
However, multiplication in binary fields is more complex and involves recursive algorithms that decompose elements into lower-order components. Consider two elements $x$ and $y$ of a tower field, written in terms of the generator $X_k$ as:
\begin{equation}
x = x_0 + x_1 \cdot X_k, \quad y = y_0 + y_1 \cdot X_k,
\end{equation}
where:
- $x_0$ and $y_0$ are the low parts, the components without $X_k$;
- $x_1$ and $y_1$ are the high parts, the coefficients of $X_k$.
To compute $x \cdot y$, we expand the product:
\begin{equation}
x \cdot y = x_0 y_0 + \left( x_0 y_1 + x_1 y_0 \right) \cdot X_k + x_1 y_1 \cdot X_k^2.
\end{equation}
Let us now state the simplification rules available in binary fields.
1. **Eliminating coefficients of 2:** In binary fields, $1 + 1 = 0$, so any term with a coefficient of $2$ vanishes.
2. **Handling $X_k^2$:** The term $X_k^2$ is reduced using the field's defining polynomial. In the tower above, $X_k^2 = X_{k-1} \cdot X_k + 1$, which we substitute back into the expanded expression.
After applying the above rules, the product simplifies to:
\begin{equation}
x \cdot y = x_0 y_0 + \left( x_0 y_1 + x_1 y_0 \right) \cdot X_k + x_1 y_1 \cdot (X_{k-1} \cdot X_k + 1).
\end{equation}
Breaking this down:
- $x_0 y_0$: The product of the low parts, independent of $X_k$.
- $\left( x_0 y_1 + x_1 y_0 \right) \cdot X_k$: The cross-term mixing the high and low parts.
- $x_1 y_1 \cdot X_k^2$: The product of the high parts, reduced using the defining polynomial, as implemented in the sketch below.
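This recursion translates directly into code. Below is a minimal sketch in Python (the bit layout, storing the $2^k$ $\mathbb{F}_2$-coefficients of a $\tau_k$ element LSB-first in an integer, is our own choice for illustration and not any particular library's encoding), applying exactly the decomposition and reduction rule above, with a Karatsuba-style cross term:

```python
def tower_mul(a: int, b: int, k: int) -> int:
    """Multiply a and b in tau_k, the tower field with 2^(2^k) elements."""
    if k == 0:
        return a & b                      # F_2: multiplication is AND
    half = 1 << (k - 1)                   # bit-width of a tau_{k-1} element
    mask = (1 << half) - 1
    a0, a1 = a & mask, a >> half          # a = a0 + a1 * X_{k-1}
    b0, b1 = b & mask, b >> half
    lo = tower_mul(a0, b0, k - 1)         # a0 * b0
    hi = tower_mul(a1, b1, k - 1)         # a1 * b1
    # Karatsuba: a0*b1 + a1*b0 = (a0 + a1)(b0 + b1) - a0*b0 - a1*b1 (+/- are XOR)
    cross = tower_mul(a0 ^ a1, b0 ^ b1, k - 1) ^ lo ^ hi
    # Reduce X_{k-1}^2 = X_{k-2} * X_{k-1} + 1 (with X_{-1} := 1 when k = 1):
    x_prev = 1 << (half >> 1)             # the element X_{k-2} inside tau_{k-1}
    hi_x = tower_mul(hi, x_prev, k - 1)   # (a1 * b1) * X_{k-2}
    return ((cross ^ hi_x) << half) | (lo ^ hi)

# Sanity checks: X_0 * X_0 = X_0 + 1 in tau_1, and X_1 * X_1 = X_0 * X_1 + 1 in tau_2.
assert tower_mul(0b10, 0b10, 1) == 0b11
assert tower_mul(0b0100, 0b0100, 2) == 0b1001
```

Note the three full recursive multiplications per level (`lo`, `hi`, `cross`), plus a cheaper product for the reduction: this recursion is precisely where the extension-field overhead quantified next comes from.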
### Multiplication costs
The recursive nature of multiplication in binary fields introduces additional overhead, particularly in extension fields. These costs can be categorized as follows (using the terminology of the related [paper](https://eprint.iacr.org/2024/1046.pdf) by S. Bagad, Y. Domb, and J. Thaler):
- $\mathbf{t_{bb}}$: The cost of multiplying two base field elements.
- $\mathbf{t_{ee}}$: The cost of multiplying two extension field elements.
With this notation in mind, the authors of the paper experimentally measured the overhead for varying extension degrees:
| **Base field** | **Extension field** | **Degree of extension** | **$t_{ee} / t_{bb}$** |
|---------------------|------------------------|------------------------|-----------------------|
| $\mathbb{F}_{2}$ | $\mathbb{F}_{2^{128}}$ | 128 | $145 \times 10^2$ |
| $\mathbb{F}_{2^2}$ | $\mathbb{F}_{2^{128}}$ | 64 | $2.5 \times 10^2$ |
| $\mathbb{F}_{2^4}$ | $\mathbb{F}_{2^{128}}$ | 32 | $0.5 \times 10^2$ |
## Multiplications in the sumcheck protocol
In the Binius protocol, advancements in polynomial commitment schemes and proving infrastructure have significantly accelerated computations using binary fields. However, this progress has introduced a new challenge: the sumcheck protocol has become the primary computational bottleneck (as discussed in detail in this [video](https://www.youtube.com/watch?v=9lldbNTqbYQ)). As a result, optimizing sumcheck is now a critical step toward improving prover efficiency.
At its core, the sumcheck protocol ensures **soundness** through the random field elements $r_1, \dots, r_\ell$ that the verifier samples, one per round. To maintain security, these elements should come from a large enough field, typically of size at least $2^{128}$. When the polynomial $g$ being summed is defined over a small base field, these random elements must instead be sampled from an extension field. This distinction is crucial because multiplications in an extension field are significantly more expensive than those in the base field, making them a major cost driver in the protocol.
This section focuses on pinpointing where these extension-field multiplications occur and analyzing their impact on the prover’s overall computational cost. By carefully breaking down the algorithm’s structure and its associated costs, we aim to build an intuitive understanding of why sumcheck is computationally intensive.
One key aspect of this analysis is the *round polynomial*, a fundamental component computed at each stage of sumcheck. Understanding its structure and how extension-field multiplications affect its evaluation is essential for designing a more efficient prover.
### Round polynomial representation
At round $i$ of the sumcheck protocol, the round polynomial $s_i(c)$ is defined as:
\begin{equation}
s_i(c) := \sum_{x \in \{0,1\}^{\ell - i}} g(r_1, r_2, \dots, r_{i-1}, c, x),
\end{equation}
where $g$ is the $\ell$-variate polynomial being summed, with its first $i-1$ variables fixed to the verifier's challenges and the remaining ones summed over the Boolean hypercube.
Typically, $g$ itself is expressed as a product of $d$ multilinear polynomials $p_1, \dots, p_d$:
\begin{equation}
g(r_1, r_2, \dots, r_{i-1}, c, x) = \underbrace{\prod_{j=1}^d p_j(r_1, r_2, \dots, r_{i-1}, c, x)}_{\text{Product of extension field elements}}.
\end{equation}
This formulation highlights a critical computational challenge in the sumcheck protocol: **the overhead introduced by extension-field multiplications**.
### Example: Evaluating $g$ at round 2
To illustrate the mechanics of the sumcheck protocol, let’s examine how the polynomial $g$ is evaluated at **round 2**. This serves as a simple example to help understand the recursive nature of sumcheck before generalizing to later rounds.
At round 2, an arbitrary polynomial $p$ among the $d$ multilinear polynomials $p_1, \dots, p_d$ is computed as:
\begin{equation}
p(r_1, c, x) = (1 - r_1) \cdot \color{red}{p(0, c, x)} + r_1 \cdot \color{green}{p(1, c, x)},
\end{equation}
where:
- $\color{red}{p(0, c, x)}$ and $\color{green}{p(1, c, x)}$ are evaluations of $p$ with its first variable fixed to $0$ and $1$, available from the previous round.
- $r_1$ is a randomly sampled challenge that ensures the soundness of the protocol.
This recursive decomposition is fundamental to sumcheck, as each polynomial evaluation in a given round is derived from its values in the previous round.
It is worth convincing ourselves that this expression is true, so let us make a small optional digression to justify the decomposition used to compute $p(r_1, c, x)$.
:::info
### Multilinear polynomial decomposition
A multilinear polynomial $p: \mathbb{F}^\ell \to \mathbb{F}$ is linear in each variable when the others are fixed. This property allows us to express $p$ in a structured form that simplifies computations.
Focusing on a single variable, $r_1$, while keeping the others fixed as $x'$, the polynomial $p(r_1, x')$ behaves as a degree-1 polynomial in $r_1$:
$$
p(r_1, x') = a \cdot r_1 + b,
$$
where $a$ and $b$ depend on $x'$.
1. **Defining boundary conditions**
The values of $p(r_1, x')$ at specific points determine $a$ and $b$:
- **When $r_1 = 1$:**
$$ p(1, x') = a + b. $$
- **When $r_1 = 0$:**
$$ p(0, x') = b. $$
2. **Solving for $a$ and $b$**
- By subtracting the second equation from the first:
$$ a = p(1, x') - p(0, x'). $$
- Directly substituting $b$:
$$ b = p(0, x'). $$
3. **Reconstructing the polynomial**
Substituting these values back into $p(r_1, x') = a \cdot r_1 + b$, we get:
$$
p(r_1, x') = r_1 \cdot p(1, x') + (1 - r_1) \cdot p(0, x').
$$
:::
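To complement the derivation, here is a quick numeric sanity check in Python (over a toy prime modulus chosen only for readability; the polynomial and evaluation point are random): a multilinear polynomial in three variables satisfies the decomposition in its first variable.

```python
import random

P = 2**31 - 1  # toy prime modulus, standing in for whatever field is in use

# A random multilinear polynomial in 3 variables, stored by its coefficients
# on the monomials x1^e1 * x2^e2 * x3^e3 with each e_i in {0, 1}.
coeffs = {(e1, e2, e3): random.randrange(P)
          for e1 in (0, 1) for e2 in (0, 1) for e3 in (0, 1)}

def p(x1, x2, x3):
    return sum(c * x1**e1 * x2**e2 * x3**e3
               for (e1, e2, e3), c in coeffs.items()) % P

r1, x2, x3 = (random.randrange(P) for _ in range(3))
assert p(r1, x2, x3) == ((1 - r1) * p(0, x2, x3) + r1 * p(1, x2, x3)) % P
```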
### Expanding Multiplications in Sumcheck
Now, let’s consider a function $g$ as the product of two polynomials $p(r_1, c, x)$ and $q(r_1, c, x)$:
\begin{equation}
\begin{aligned}
p(r_1, c, x) &= (1 - r_1) \cdot \color{red}{p(0, c, x)} + r_1 \cdot \color{green}{p(1, c, x)}, \\
q(r_1, c, x) &= (1 - r_1) \cdot \color{red}{q(0, c, x)} + r_1 \cdot \color{green}{q(1, c, x)}.
\end{aligned}
\end{equation}
Expanding the product $p(r_1, c, x) \cdot q(r_1, c, x)$ results in:
\begin{equation}
\begin{aligned}
g(r_1, c, x) &= p(r_1, c, x) \cdot q(r_1, c, x) \\
&= (1 - r_1)^2 \cdot \color{red}{p(0, c, x) \, q(0, c, x)} \\
&\quad+ (1 - r_1) \, r_1 \cdot \color{red}{p(0, c, x)} \, \color{green}{q(1, c, x)} \\
&\quad+ r_1 (1 - r_1) \cdot \color{green}{p(1, c, x)} \, \color{red}{q(0, c, x)} \\
&\quad+ r_1^2 \cdot \color{green}{p(1, c, x) \, q(1, c, x)}.
\end{aligned}
\end{equation}
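The four-term expansion is a plain algebraic identity, which a one-line check makes tangible (Python; the values stand in for the evaluations $p(0, c, x)$, $p(1, c, x)$, $q(0, c, x)$, $q(1, c, x)$ and the challenge $r_1$, and the modulus is a toy choice):

```python
P = 2**31 - 1                        # toy modulus
p0, p1, q0, q1, r1 = 3, 7, 5, 11, 9  # placeholder evaluations and challenge

lhs = ((1 - r1) * p0 + r1 * p1) * ((1 - r1) * q0 + r1 * q1) % P
rhs = ((1 - r1)**2 * p0 * q0
       + (1 - r1) * r1 * p0 * q1
       + r1 * (1 - r1) * p1 * q0
       + r1**2 * p1 * q1) % P
assert lhs == rhs
```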
This example provides a concrete view of how sumcheck recursively expands polynomials while maintaining soundness through random challenges. In the next section, we generalize this expansion to an arbitrary round $i$ in the protocol.
### Decomposing polynomial terms in sumcheck
As the sumcheck protocol progresses, the round polynomial expands into a structured decomposition involving **challenge terms** and **witness terms**. Specifically, each polynomial term takes the form:
\begin{equation}
p_j(r_1, \dots, r_{i-1}, c, x) = \sum_{b \in \{0,1\}^{i-1}} \underbrace{\prod_{k=1}^{i-1} \left( (1 - r_k)^{1 - b_k} \, r_k^{\, b_k} \right)}_{\text{challenge term}} \cdot \underbrace{p_j(b_1, \dots, b_{i-1}, c, x)}_{\text{witness term}},
\end{equation}
a sum over all Boolean assignments $b$ of the first $i-1$ variables, where each challenge term picks the factor $(1 - r_k)$ when $b_k = 0$ and $r_k$ when $b_k = 1$.
Written out term by term, with each challenge term paired with its witness term, this reads:
\begin{equation}
\begin{aligned}
p_j(r_1, \dots, r_{i-1}, c, x) ={} &(1 - r_1) \cdots (1 - r_{i-2})(1 - r_{i-1}) \cdot p_j(0, \dots, 0, 0, c, x) \\
&+ (1 - r_1) \cdots (1 - r_{i-2}) \, r_{i-1} \cdot p_j(0, \dots, 0, 1, c, x) \\
&+ (1 - r_1) \cdots r_{i-2} \, (1 - r_{i-1}) \cdot p_j(0, \dots, 1, 0, c, x) \\
&+ (1 - r_1) \cdots r_{i-2} \, r_{i-1} \cdot p_j(0, \dots, 1, 1, c, x) \\
&\;\;\vdots \\
&+ r_1 \cdots r_{i-2} \, r_{i-1} \cdot p_j(1, \dots, 1, 1, c, x).
\end{aligned}
\end{equation}
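The $2^{i-1}$ challenge terms need not be recomputed from scratch for each $b$: extending one challenge at a time produces all of them with $O(2^{i-1})$ multiplications in total. A minimal sketch in Python (the function name, ordering, and toy prime modulus are our own choices for illustration):

```python
P = 2**31 - 1  # toy prime modulus standing in for the extension field

def challenge_terms(rs: list[int], p: int = P) -> list[int]:
    """All products prod_k ((1 - r_k) if b_k == 0 else r_k) over b in {0,1}^len(rs).
    Entry j corresponds to the assignment where bit (k-1) of j equals b_k."""
    terms = [1]
    for r in rs:
        terms = [t * (1 - r) % p for t in terms] + [t * r % p for t in terms]
    return terms

# With two challenges, the four terms pair with the witness evaluations at
# b = (0,0), (1,0), (0,1), (1,1) respectively, under the bit ordering above.
print(challenge_terms([5, 7]))
```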
### Computational impact on the prover
These computations introduce a significant multiplication overhead in extension fields, which becomes a bottleneck on the prover's side of the sumcheck protocol. This inefficiency is particularly pronounced in the early rounds, where the number of terms is still large.
### Linear-time sumcheck prover
The classical linear-time sumcheck prover is designed to compute the sumcheck for the expression:
\begin{equation}
\sum_{x \in \{0,1\}^\ell} g(x),
\end{equation}
where $g(x)$ is a product of polynomials:
\begin{equation}
g(x) = \prod_{i=1}^d p_i(x).
\end{equation}
For simplicity, let's consider the case where $d = 2$, meaning:
\begin{equation}
g(x) = p(x) \cdot q(x).
\end{equation}
At a high level, the prover's goal is to progressively reduce the complexity of this computation by iteratively simplifying the problem over **$\ell$ rounds**. For this, the prover organizes evaluations of $p$ and $q$ into structured arrays and updates them in a way that allows computations to be performed in **linear time** relative to the input size.
### Maintaining arrays
The prover maintains two arrays, $A$ and $B$, which initially store **all** evaluations of $p(x)$ and $q(x)$ over the entire hypercube $\{0,1\}^\ell$. These arrays are indexed by $x$, meaning that each entry corresponds to a specific assignment of binary values to the input variables.
At initialization, the prover sets:
\begin{equation}
A[x] = p(x), \quad B[x] = q(x), \quad \forall x \in \{0,1\}^\ell.
\end{equation}
At this stage, both $A$ and $B$ contain **$2^\ell$ elements**. However, in each round of sumcheck, the prover **halves the size** of these arrays by updating their values based on random challenges provided by the verifier. This halving process is a crucial reason why the **early rounds** of sumcheck are the most computationally expensive: the arrays are at their largest, and the cost of processing them is highest.
After the first round, the prover updates $A$ and $B$ as follows:
\begin{equation}
\begin{aligned}
A[x] &\gets A[0, x] + r_1 \left( A[1, x] - A[0, x] \right), \\
B[x] &\gets B[0, x] + r_1 \left( B[1, x] - B[0, x] \right),
\end{aligned}
\end{equation}
where $r_1$ is a random element chosen by the verifier from an extension field.
This transformation ensures that each update step preserves correctness while reducing the number of values stored. This process continues for $\ell$ rounds, progressively reducing the arrays until only a single value remains, which serves as the final output of the sumcheck protocol.
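Putting the pieces together, here is a compact sketch of the whole loop for $d = 2$ (Python; a toy 31-bit prime field replaces the binary tower purely for readability, and the snippet plays both prover and verifier, so the challenge sampling stands in for real interaction):

```python
import random

P = 2**31 - 1  # toy prime field standing in for base and extension fields

def fold(A, r):
    """One round's update: A[x] <- A[0, x] + r * (A[1, x] - A[0, x]).
    The variable being fixed is the high-order bit of the index."""
    half = len(A) // 2
    return [(A[x] + r * (A[half + x] - A[x])) % P for x in range(half)]

def round_poly(A, B):
    """Evaluations s_i(0), s_i(1), s_i(2) of the degree-2 round polynomial."""
    half = len(A) // 2
    s0 = s1 = s2 = 0
    for x in range(half):
        a0, a1, b0, b1 = A[x], A[half + x], B[x], B[half + x]
        s0 = (s0 + a0 * b0) % P                      # c = 0
        s1 = (s1 + a1 * b1) % P                      # c = 1
        s2 = (s2 + (2*a1 - a0) * (2*b1 - b0)) % P    # c = 2, by linearity
    return s0, s1, s2

def eval_at(s, r):
    """Evaluate the quadratic interpolating (0, s[0]), (1, s[1]), (2, s[2]) at r."""
    s0, s1, s2 = s
    inv2 = pow(2, P - 2, P)
    return (s0 * (r - 1) * (r - 2) % P * inv2
            + s1 * r * (2 - r)
            + s2 * r * (r - 1) % P * inv2) % P

l = 3
A = [random.randrange(P) for _ in range(1 << l)]  # A[x] = p(x) on the hypercube
B = [random.randrange(P) for _ in range(1 << l)]  # B[x] = q(x) on the hypercube
claim = sum(a * b for a, b in zip(A, B)) % P      # claimed sum of p(x) * q(x)

for _ in range(l):
    s = round_poly(A, B)
    assert (s[0] + s[1]) % P == claim             # verifier's round check
    r = random.randrange(P)                       # verifier's challenge
    claim = eval_at(s, r)                         # next claim: s_i(r)
    A, B = fold(A, r), fold(B, r)                 # arrays halve in size

assert (A[0] * B[0]) % P == claim                 # final check: g(r_1, ..., r_l)
```

Each iteration touches every surviving array entry a constant number of times, which is what makes the prover linear-time overall, and also why the first folds, over the full arrays, dominate the cost.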
### Computational cost of the sumcheck prover
For a polynomial $g(x)$ of degree **$d = 2$**, the total computational cost for the prover, over all **$\ell$ rounds**, is given by:
\begin{equation}
\text{Total Cost} = \left( \frac{3n}{2} \cdot \mathbf{t_{bb}} + n \cdot \mathbf{t_{be}} \right) + \sum_{i=2}^\ell \frac{4n}{2^i} \cdot \mathbf{t_{ee}},
\end{equation}
where:
- **$n$**: The size of the hypercube, given by $n = 2^\ell$.
- **$\mathbf{t_{bb}}$**: The cost of a base field $\times$ base field multiplication.
- **$\mathbf{t_{be}}$**: The cost of a base field $\times$ extension field multiplication.
- **$\mathbf{t_{ee}}$**: The cost of an extension field $\times$ extension field multiplication.
This formula breaks down into three main components, each representing different aspects of the prover’s workload.
#### Breaking down the cost
1. **First term -** $\dfrac{3n}{2} \cdot \mathbf{t_{bb}}$: This accounts for the computations in **round 1**, specifically for constructing the first sumcheck messages: $s_1(0)$, $s_1(1)$, and $s_1(2)$.
2. **Second term -** $n \cdot \mathbf{t_{be}}$: This term corresponds to the first update of the arrays after the verifier sends $r_1$: each of the $n/2$ entries of each of the two arrays requires one base-field $\times$ extension-field multiplication.
3. **Summation term -** $\sum_{i=2}^\ell \dfrac{4n}{2^i} \cdot \mathbf{t_{ee}}$: This captures the **iterative updates** for subsequent rounds ($i \geq 2$), where both operands now live in the extension field and the array size is progressively halved (see the short calculation below).
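Since the summation term is a geometric series, its total can be made explicit (recalling that $n = 2^\ell$):
\begin{equation}
\sum_{i=2}^\ell \frac{4n}{2^i} \cdot \mathbf{t_{ee}} = 4n \left( \frac{1}{2} - \frac{1}{2^\ell} \right) \cdot \mathbf{t_{ee}} = (2n - 4) \cdot \mathbf{t_{ee}},
\end{equation}
of which the $i = 2$ term alone contributes $n \cdot \mathbf{t_{ee}}$. Roughly half of all extension-field work therefore happens in round 2, confirming that the early rounds dominate the prover's cost.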
## Conclusion
Through this analysis, we have explored the sumcheck protocol in binary fields and identified extension-field multiplications as a major computational bottleneck. While working over binary fields provides significant efficiency gains in polynomial commitment schemes and SNARKs, the need for random sampling from large extension fields introduces expensive multiplications that dominate the prover's workload.
By breaking down the round polynomial computations and the linear-time sumcheck prover, we have pinpointed where these costly multiplications occur and how they scale with the number of rounds. Specifically, in early rounds, when arrays are largest, the prover incurs the highest overhead, making optimization in these stages crucial.
Understanding the structure of binary fields, extension towers, and the arithmetic involved allows us to better design and optimize proof systems that leverage the speed of binary computation while mitigating the costs of working in extension fields. Future work will focus on reducing these overheads, improving the efficiency of sumcheck, and exploring alternative constructions that minimize the impact of extension-field multiplications in the prover’s computation.