
Hashcaster - Part 2: The challenges of sumcheck in binary fields

Binary fields

From small fields to binary fields in SNARKs

A widely accepted approach in modern SNARK design is to work over a "small" finite field $\mathbb{F}$ to maximize performance. The rationale behind this is simple: a fixed number $m$ of field operations executes significantly faster in a 32-bit field than in a 256-bit field.

Computers inherently represent numbers as sequences of bits (zeros and ones) and use digital circuits to perform arithmetic operations like addition and multiplication. Modern hardware is highly optimized for computations with standard integer sizes such as 16, 32, or 64 bits. As a result, certain moduli, such as $2^{64} - 2^{32} + 1$ and $2^{31} - 1$, are chosen not only to fit within these bit-size constraints but also to align efficiently with hardware capabilities. For instance, modular multiplication with $2^{64} - 2^{32} + 1$ can be executed efficiently using 32-bit multiplications combined with bitwise shifts and copies, reducing computational overhead.

Taking this idea further, one can consider performing arithmetic directly in binary. In such a setting, addition simplifies to a straightforward XOR operation, eliminating the need to manage carry propagation when summing $1 + 1$ across bit positions. Similarly, multiplication becomes more parallelizable, further enhancing efficiency. This approach not only streamlines arithmetic but also aligns seamlessly with boolean logic, where computations naturally map to binary operations.

Building on this principle, the Binius team proposed a proof system that operates over the smallest possible field: the binary field $\mathbb{F}_2$, which consists of just two elements, $0$ and $1$. By leveraging arithmetic modulo 2, they exploit the inherent advantages of binary computation, making proof generation and verification more efficient in hardware-friendly environments.

Extension Fields

The field $\mathbb{F}_2$ is fundamentally simple but also quite limited in its applications. To enable more powerful cryptographic constructions and enhance security, we require larger fields that preserve the efficiency of binary arithmetic while expanding its structure. An extension field is a finite field that extends a base field by introducing new elements, allowing for a richer algebraic structure. Repeatedly constructing larger binary fields from smaller ones in this way yields the tower of binary fields, a concept initially introduced by Wiedemann.

Starting from a prime field $\mathbb{F}_p$, an extension field $\mathbb{F}_{p^m}$ is built by incorporating polynomials of degree less than $m$ with coefficients in $\mathbb{F}_p$. This expansion can be visualized as a skyscraper, where each floor represents a progressively larger binary field. The "ground floor" is $\mathbb{F}_2$, the simplest field containing only $\{0, 1\}$, and each new floor extends the structure by adding more elements and increasing computational possibilities.

Binary fields differ in their arithmetic compared to other finite fields, so it's essential to understand how addition and multiplication work in $\mathbb{F}_2$ before exploring extensions.

Addition in $\mathbb{F}_2$

Addition in $\mathbb{F}_2$ follows XOR logic, where $1 + 1$ results in $0$ (without carry).

| $+$ | $0$ | $1$ |
|---|---|---|
| $0$ | $0$ | $1$ |
| $1$ | $1$ | $0$ |

Multiplication in $\mathbb{F}_2$

Multiplication behaves as expected, following the rules of Boolean AND.

| $\times$ | $0$ | $1$ |
|---|---|---|
| $0$ | $0$ | $0$ |
| $1$ | $0$ | $1$ |
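These two tables translate directly into bit operations. The sketch below is purely illustrative (the names `f2_add` and `f2_mul` are our own): addition in $\mathbb{F}_2$ is XOR, and multiplication is AND.

```python
# Arithmetic in F_2: addition is XOR, multiplication is AND.
def f2_add(a: int, b: int) -> int:
    return a ^ b

def f2_mul(a: int, b: int) -> int:
    return a & b

# Reproduce the two tables above.
add_table = [[f2_add(a, b) for b in (0, 1)] for a in (0, 1)]
mul_table = [[f2_mul(a, b) for b in (0, 1)] for a in (0, 1)]
```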

Constructing the first extension: $\mathbb{F}_{2^2}$

To construct a larger field, we introduce a new element $x$ that satisfies the equation:

$$
x^2 = x + 1.
$$

This polynomial, $P(x) = x^2 + x + 1$, is irreducible over $\mathbb{F}_2$, meaning it cannot be factored into lower-degree polynomials within the field. We verify its irreducibility by evaluating it at $0$ and $1$:

$$
P(0) = 0^2 + 0 + 1 = 1, \qquad P(1) = 1^2 + 1 + 1 = 1.
$$

Since $P(x)$ has no roots in $\mathbb{F}_2$, it forms the basis for constructing $\mathbb{F}_{2^2}$. The elements of this extended field are all polynomials of degree less than 2 with coefficients in $\mathbb{F}_2$:

$$
\mathbb{F}_{2^2} = \{0,\ 1,\ x,\ x + 1\}.
$$

Arithmetic in $\mathbb{F}_{2^2}$

Addition and multiplication in $\mathbb{F}_{2^2}$ follow specific rules based on polynomial arithmetic modulo $P(x)$. The resulting operation tables are:

Addition Table for $\mathbb{F}_{2^2}$

| $+$ | $0$ | $1$ | $x$ | $x+1$ |
|---|---|---|---|---|
| $0$ | $0$ | $1$ | $x$ | $x+1$ |
| $1$ | $1$ | $0$ | $x+1$ | $x$ |
| $x$ | $x$ | $x+1$ | $0$ | $1$ |
| $x+1$ | $x+1$ | $x$ | $1$ | $0$ |

Multiplication Table for $\mathbb{F}_{2^2}$

| $\times$ | $0$ | $1$ | $x$ | $x+1$ |
|---|---|---|---|---|
| $0$ | $0$ | $0$ | $0$ | $0$ |
| $1$ | $0$ | $1$ | $x$ | $x+1$ |
| $x$ | $0$ | $x$ | $x+1$ | $1$ |
| $x+1$ | $0$ | $x+1$ | $1$ | $x$ |
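As a sanity check, both tables can be reproduced in code by encoding each element of $\mathbb{F}_{2^2}$ as a 2-bit integer ($0 \to 0$, $1 \to 1$, $x \to 2$, $x+1 \to 3$) and reducing products modulo $P(x) = x^2 + x + 1$. This is an illustrative sketch with names of our own, not a production implementation:

```python
# Elements of F_{2^2} encoded as 2-bit integers:
# 0 -> 0, 1 -> 1, x -> 2, x + 1 -> 3.
def f4_add(a: int, b: int) -> int:
    # addition of polynomials over F_2 is coefficient-wise XOR
    return a ^ b

def f4_mul(a: int, b: int) -> int:
    # carry-less ("XOR") multiplication of the two 2-bit polynomials
    prod = 0
    for i in range(2):
        if (b >> i) & 1:
            prod ^= a << i
    # reduce modulo P(x) = x^2 + x + 1, i.e. substitute x^2 = x + 1
    if prod & 0b100:
        prod ^= 0b100 ^ 0b011
    return prod

# rebuild the two tables above
add_table = [[f4_add(a, b) for b in range(4)] for a in range(4)]
mul_table = [[f4_mul(a, b) for b in range(4)] for a in range(4)]
```

For instance, `f4_mul(2, 2)` returns `3`, matching the table entry $x \cdot x = x + 1$.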

This construction lays the foundation for further field extensions, following the tower approach, to create even larger fields such as $\mathbb{F}_{2^4}$, $\mathbb{F}_{2^8}$, and beyond.

Tower construction

Each subsequent field is constructed by adding a new element $X_i$, where the defining polynomial maintains a structured dependency on the previous field elements. The complete tower follows:

$$
\begin{aligned}
\tau_0 &= \mathbb{F}_2,\\
\tau_1 &= \tau_0[X_0]/(X_0^2 + X_0 + 1) \cong \mathbb{F}_{2^2},\\
\tau_2 &= \tau_1[X_1]/(X_1^2 + X_1 X_0 + 1) \cong \mathbb{F}_{2^4},\\
\tau_3 &= \tau_2[X_2]/(X_2^2 + X_1 X_2 + 1) \cong \mathbb{F}_{2^8},\\
\tau_4 &= \tau_3[X_3]/(X_3^2 + X_2 X_3 + 1) \cong \mathbb{F}_{2^{16}},\\
\tau_5 &= \tau_4[X_4]/(X_4^2 + X_3 X_4 + 1) \cong \mathbb{F}_{2^{32}},\\
\tau_6 &= \tau_5[X_5]/(X_5^2 + X_4 X_5 + 1) \cong \mathbb{F}_{2^{64}},\\
\tau_7 &= \tau_6[X_6]/(X_6^2 + X_5 X_6 + 1) \cong \mathbb{F}_{2^{128}}.
\end{aligned}
$$

As illustrated in the figure, a 128-bit string can be interpreted in various ways within a binary field. It can be treated as a single element in a 128-bit binary field, or it can be decomposed into smaller units, such as two 64-bit tower field elements, four 32-bit elements, sixteen 8-bit elements, or even 128 individual elements of $\mathbb{F}_2$. This flexibility comes at no computational cost, as it simply involves reinterpretation of the bit string rather than any actual transformation. This property is particularly valuable, as it allows smaller field elements to be efficiently packed into larger ones without incurring additional overhead.

*(Figure: the same 128-bit string reinterpreted as tower field elements of different sizes.)*
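The zero-cost reinterpretation can be made concrete: slicing the bit string at different granularities requires only shifts and masks, no field arithmetic at all. The helper `split` below is a hypothetical name of our own, used purely for illustration:

```python
def split(value: int, total_bits: int, chunk_bits: int) -> list:
    """Slice a total_bits-wide integer into chunks of chunk_bits bits,
    least-significant chunk first. No arithmetic, just shifts and masks."""
    mask = (1 << chunk_bits) - 1
    return [(value >> (i * chunk_bits)) & mask
            for i in range(total_bits // chunk_bits)]

v = (1 << 127) | 0b1011       # a 128-bit string
as_u64 = split(v, 128, 64)    # two 64-bit tower field elements
as_u8 = split(v, 128, 8)      # sixteen 8-bit elements
as_bits = split(v, 128, 1)    # 128 individual elements of F_2
```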

Multiplications in binary fields

How things work

In binary fields, addition and subtraction are equivalent and can be performed using the XOR operation ($\oplus$):

$$
x + y = x \oplus y.
$$

However, multiplication in binary fields is more complex and involves recursive algorithms that decompose elements into lower-order components. Consider two elements $x$ and $y$ in a binary field, represented as:

$$
x = x_1 + x_0 \cdot x_k, \qquad y = y_1 + y_0 \cdot x_k,
$$

where:

- $x_1$ and $y_1$ are the low parts, without $x_k$.
- $x_0$ and $y_0$ are the high parts, scaled by $x_k$.

To compute $x \cdot y$, we expand the product:

$$
x \cdot y = x_1 y_1 + (x_1 y_0 + x_0 y_1)\, x_k + x_0 y_0\, x_k^2.
$$

Let us now express some simplification rules in binary fields.

1. Eliminating coefficients of 2: in binary fields, $1 + 1 = 0$, so terms involving a coefficient of $2$ vanish.
2. Handling $x_k^2$: the term $x_k^2$ is reduced using the field's defining polynomial. For instance, if $x_k^2 = x_{k-1}\, x_k + 1$, substitute this back into the expanded expression.

After applying the above rules, the product simplifies to:

$$
x \cdot y = x_1 y_1 + (x_1 y_0 + x_0 y_1)\, x_k + x_0 y_0\, (x_{k-1}\, x_k + 1).
$$

Breaking this down:

- $x_1 y_1$: the product of the low parts, independent of $x_k$.
- $(x_1 y_0 + x_0 y_1)\, x_k$: the cross-term involving the high and low parts.
- $x_0 y_0\, x_k^2$: the product of the high parts, reduced using the field polynomial.
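Putting the recursion and the reduction rule together, a full tower multiplication can be sketched as below. This is an illustrative Python implementation of our own (the article does not give one): elements of $\tau_k$ are encoded as $2^k$-bit integers with the low part $x_1$ in the low bits and the high part $x_0$ in the high bits, and the cross-term is computed with the Karatsuba trick $(x_1 + x_0)(y_1 + y_0) - x_1 y_1 - x_0 y_0$, an optimization not discussed in the text:

```python
def tower_mul(x: int, y: int, bits: int) -> int:
    """Multiply two elements of the binary tower field with `bits` bits
    (bits a power of two), splitting x = x1 + x0 * x_k as in the text."""
    if bits == 1:
        return x & y                        # multiplication in F_2 is AND
    half = bits // 2
    mask = (1 << half) - 1
    x1, x0 = x & mask, x >> half            # low part, high part
    y1, y0 = y & mask, y >> half
    low = tower_mul(x1, y1, half)           # x1 * y1
    high = tower_mul(x0, y0, half)          # x0 * y0
    # cross-term x1*y0 + x0*y1 via Karatsuba: one recursive call, not two
    cross = tower_mul(x1 ^ x0, y1 ^ y0, half) ^ low ^ high
    # reduce x_k^2 = x_{k-1} * x_k + 1, where x_{k-1} is the top generator
    # of the half-size field (equal to 1 at the bottom of the tower)
    gen = 1 if half == 1 else 1 << (half // 2)
    new_low = low ^ high                    # the "+ 1" part of x0*y0*(x_{k-1}x_k + 1)
    new_high = cross ^ tower_mul(high, gen, half)
    return (new_high << half) | new_low
```

With `bits = 2` this reproduces the $\mathbb{F}_{2^2}$ table above, e.g. `tower_mul(2, 2, 2) == 3` (that is, $x \cdot x = x + 1$), and with `bits = 4` it gives $X_1 \cdot X_1 = X_0 X_1 + 1$ as required by the tower relation.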

Multiplication costs

The recursive nature of multiplication in binary fields introduces additional overhead, particularly in extension fields. These costs can be categorized as follows (using the terminology of the related paper from S. Bagad, Y. Domb and J. Thaler):

- $t_{bb}$: the cost of multiplying two base field elements.
- $t_{ee}$: the cost of multiplying two extension field elements.

With this notation in mind, via experiment, the authors of the paper demonstrate the overhead for varying extension degrees:

| Base field | Extension field | Degree of extension | $t_{ee}/t_{bb}$ |
|---|---|---|---|
| $\mathbb{F}_2$ | $\mathbb{F}_{2^{128}}$ | $128$ | $145 \times 10^2$ |
| $\mathbb{F}_{2^2}$ | $\mathbb{F}_{2^{128}}$ | $64$ | $2.5 \times 10^2$ |
| $\mathbb{F}_{2^4}$ | $\mathbb{F}_{2^{128}}$ | $32$ | $0.5 \times 10^2$ |

Multiplications in the sumcheck protocol

In the Binius protocol, advancements in polynomial commitment schemes and proving infrastructure have significantly accelerated computations using binary fields. However, this progress has introduced a new challenge: the sumcheck protocol has become the primary computational bottleneck (as discussed in detail in this video). As a result, optimizing sumcheck is now a critical step toward improving prover efficiency.

At its core, the sumcheck protocol ensures soundness through random field elements $r_1, \ldots, r_{\ell}$, one sampled per round. To maintain security, these elements should ideally come from a large enough field, typically of size at least $2^{128}$. When the polynomial $g$ being summed is defined over a small base field, these random elements must instead be sampled from an extension field. This distinction is crucial because multiplications in an extension field are significantly more expensive than those in the base field, making them a major cost driver in the protocol.

This section focuses on pinpointing where these extension-field multiplications occur and analyzing their impact on the prover’s overall computational cost. By carefully breaking down the algorithm’s structure and its associated costs, we aim to build an intuitive understanding of why sumcheck is computationally intensive.

One key aspect of this analysis is the round polynomial, a fundamental component computed at each stage of sumcheck. Understanding its structure and how extension-field multiplications affect its evaluation is essential for designing a more efficient prover.

Round polynomial representation

At round $i$ of the sumcheck protocol, the round polynomial $s_i(c)$ is defined as:

$$
s_i(c) := \sum_{x \in \{0,1\}^{\ell - i}} g(r_1, r_2, \ldots, r_{i-1}, c, x),
$$

where $g$ is a recursively defined multilinear polynomial representing the function being summed over a subset of binary variables.

Typically, $g$ itself is expressed as a product of $d$ multilinear polynomials $p_1, \ldots, p_d$:

$$
g(r_1, r_2, \ldots, r_{i-1}, c, x) = \underbrace{\prod_{j=1}^{d} p_j(r_1, r_2, \ldots, r_{i-1}, c, x)}_{\text{Product of extension field elements}}.
$$

This formulation highlights a critical computational challenge in the sumcheck protocol: the overhead introduced by extension-field multiplications.
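To make the definition of $s_i(c)$ concrete, here is a naive reference implementation for $g = p \cdot q$. The function names are our own; plain integer arithmetic stands in for field arithmetic (in the real protocol the challenges live in an extension field), and the exponential-time loops are for illustration only, not the linear-time prover discussed later:

```python
from itertools import product

def mle_eval(table, point):
    """Evaluate the multilinear extension of `table` (evaluations over
    {0,1}^l, with index bit i holding variable i) at `point`."""
    acc = 0
    for idx, v in enumerate(table):
        term = v
        for i, r in enumerate(point):
            term *= r if (idx >> i) & 1 else (1 - r)
        acc += term
    return acc

def round_poly(p_tab, q_tab, challenges, c):
    """Naive s_i(c): sum g(r_1, ..., r_{i-1}, c, x) over the remaining
    hypercube, for g = p * q."""
    l = len(p_tab).bit_length() - 1          # table size is 2^l
    free = l - len(challenges) - 1           # variables still summed over
    total = 0
    for x in product((0, 1), repeat=free):
        pt = list(challenges) + [c] + list(x)
        total += mle_eval(p_tab, pt) * mle_eval(q_tab, pt)
    return total

p_tab = [1, 2, 3, 4]   # evaluations of p over {0,1}^2
q_tab = [1, 1, 1, 1]   # q == 1, so sums of g reduce to sums of p
s1_0 = round_poly(p_tab, q_tab, [], 0)   # s_1(0)
s1_1 = round_poly(p_tab, q_tab, [], 1)   # s_1(1)
assert s1_0 + s1_1 == sum(p_tab)         # round-1 check against the claimed sum
```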

Example: Evaluating $g$ at round 2

To illustrate the mechanics of the sumcheck protocol, let's examine how the polynomial $g$ is evaluated at round 2. This serves as a simple example to help understand the recursive nature of sumcheck before generalizing to later rounds.

At round 2, an arbitrary polynomial $p$ among the $d$ multilinear polynomials $p_1, \ldots, p_d$ is computed as:

$$
p(r_1, c, x) = (1 - r_1)\, {\color{red} p(0, c, x)} + r_1\, {\color{green} p(1, c, x)},
$$

where:

- ${\color{red} p(0, c, x)}$ and ${\color{green} p(1, c, x)}$ are precomputed evaluations of $p$ at specific points.
- $r_1$ is a randomly sampled challenge that ensures the soundness of the protocol.

This recursive decomposition is fundamental to sumcheck, as each polynomial evaluation in a given round is derived from its values in the previous round.

I feel like it is important to convince the reader why this expression is true. So let us make a small optional digression to justify this decomposition used to compute $p(r_1, c, x)$.

Multilinear polynomial decomposition

A multilinear polynomial $p : \mathbb{F}^{\ell} \to \mathbb{F}$ is linear in each variable when the others are fixed. This property allows us to express $p$ in a structured form that simplifies computations.

Focusing on a single variable, $r_1$, while keeping the others fixed as $x$, the polynomial $p(r_1, x)$ behaves as a degree-1 polynomial in $r_1$:

$$
p(r_1, x) = a \cdot r_1 + b,
$$

where $a$ and $b$ depend on $x$.

1. Defining boundary conditions
   The values of $p(r_1, x)$ at specific points determine $a$ and $b$:
   - When $r_1 = 1$: $p(1, x) = a + b.$
   - When $r_1 = 0$: $p(0, x) = b.$
2. Solving for $a$ and $b$
   - By subtracting the second equation from the first: $a = p(1, x) - p(0, x).$
   - Directly substituting: $b = p(0, x).$
3. Reconstructing the polynomial
   Substituting these values back into $p(r_1, x) = a \cdot r_1 + b$, we get:
   $$p(r_1, x) = r_1\, p(1, x) + (1 - r_1)\, p(0, x).$$
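The identity just derived is easy to test numerically. The sketch below picks an arbitrary polynomial that is linear in $r_1$ and checks the decomposition at a few points, using `Fraction` as a stand-in for field elements (everything here is illustrative, not part of any protocol implementation):

```python
from fractions import Fraction

def p(r1, x1):
    # an arbitrary polynomial that is linear in each variable
    return 3 * r1 * x1 + 2 * r1 + x1 + 5

r1 = Fraction(7, 3)            # stand-in for a random challenge
for x1 in (0, 1, Fraction(4, 5)):
    # the decomposition p(r1, x) = r1*p(1, x) + (1 - r1)*p(0, x)
    assert p(r1, x1) == r1 * p(1, x1) + (1 - r1) * p(0, x1)
```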

Expanding Multiplications in Sumcheck

Now, let's consider a function $g$ as the product of two polynomials $p(r_1, c, x)$ and $q(r_1, c, x)$:

$$
\begin{aligned}
p(r_1, c, x) &= (1 - r_1)\, {\color{red} p(0, c, x)} + r_1\, {\color{green} p(1, c, x)},\\
q(r_1, c, x) &= (1 - r_1)\, {\color{red} q(0, c, x)} + r_1\, {\color{green} q(1, c, x)}.
\end{aligned}
$$

Expanding the product $p(r_1, c, x) \cdot q(r_1, c, x)$ results in:

$$
\begin{aligned}
g(r_1, c, x) = p(r_1, c, x)\, q(r_1, c, x)
&= (1 - r_1)(1 - r_1)\, {\color{red} p(0, c, x)}\, q(0, c, x)\\
&+ (1 - r_1)\, r_1\, {\color{red} p(0, c, x)}\, q(1, c, x)\\
&+ r_1 (1 - r_1)\, {\color{green} p(1, c, x)}\, q(0, c, x)\\
&+ r_1\, r_1\, {\color{green} p(1, c, x)}\, q(1, c, x).
\end{aligned}
$$

This example provides a concrete view of how sumcheck recursively expands polynomials while maintaining soundness through random challenges. In the next section, we generalize this expansion to an arbitrary round

i in the protocol.

Decomposing polynomial terms in sumcheck

As the sumcheck protocol progresses, the round polynomial expands into a structured decomposition involving challenge terms and witness terms. Specifically, each polynomial term takes the form:

$$
p_j(r_1, \ldots, r_{i-1}, c, x) = \sum_{b \in \{0,1\}^{i-1}} \underbrace{\prod_{k=1}^{i-1} \big((1 - r_k) \text{ or } r_k\big)}_{\text{challenge terms}} \cdot \underbrace{p_j(b_1, \ldots, b_{i-1}, c, x)}_{\text{witness terms}},
$$

where the $k$-th factor is $r_k$ if $b_k = 1$ and $(1 - r_k)$ if $b_k = 0$.

Expanding further, this results in:

$$
p_j(r_1, \ldots, r_{i-1}, c, x) =
\underbrace{\begin{array}{l}
(1 - r_1) \cdots (1 - r_{i-2})(1 - r_{i-1})\\
(1 - r_1) \cdots (1 - r_{i-2})\, r_{i-1}\\
(1 - r_1) \cdots r_{i-2}\, (1 - r_{i-1})\\
(1 - r_1) \cdots r_{i-2}\, r_{i-1}\\
\qquad\vdots\\
r_1 \cdots r_{i-2}\, r_{i-1}
\end{array}}_{\text{challenge terms}}
\;\cdot\;
\underbrace{\begin{array}{l}
p_j(0, \ldots, 0, 0, c, x)\ +\\
p_j(0, \ldots, 0, 1, c, x)\ +\\
p_j(0, \ldots, 1, 0, c, x)\ +\\
p_j(0, \ldots, 1, 1, c, x)\ +\\
\qquad\vdots\\
p_j(1, \ldots, 1, 1, c, x)
\end{array}}_{\text{witness terms}}
$$

Computational impact on the prover

These computations introduce a significant multiplication overhead in extension fields, which becomes a bottleneck on the prover's side of the sumcheck protocol. This inefficiency is particularly pronounced in the early rounds, where the number of terms is still large.

Linear-time sumcheck prover

The classical linear-time sumcheck prover is designed to compute the sumcheck for the expression:

$$
\sum_{x \in \{0,1\}^{\ell}} g(x),
$$

where $g(x)$ is a product of polynomials:

$$
g(x) = \prod_{i=1}^{d} p_i(x).
$$

For simplicity, let's consider the case where $d = 2$, meaning:

$$
g(x) = p(x)\, q(x).
$$

At a high level, the prover's goal is to progressively reduce the complexity of this computation by iteratively simplifying the problem over $\ell$ rounds. For this, the prover organizes evaluations of $p$ and $q$ into structured arrays and updates them in a way that allows computations to be performed in linear time relative to the input size.

Maintaining arrays

The prover maintains two arrays, $A$ and $B$, which initially store all evaluations of $p(x)$ and $q(x)$ over the entire hypercube $\{0,1\}^{\ell}$. These arrays are indexed by $x$, meaning that each entry corresponds to a specific assignment of binary values to the input variables.

At initialization, the prover sets:

$$
A[x] = p(x), \qquad B[x] = q(x), \qquad \forall x \in \{0,1\}^{\ell}.
$$

At this stage, both $A$ and $B$ contain $2^{\ell}$ elements. However, in each round of sumcheck, the prover halves the size of these arrays by updating their values based on random challenges provided by the verifier. This halving process is a crucial reason why the early rounds of sumcheck are the most computationally expensive: the arrays are at their largest, and the cost of processing them is highest.

After the first round, the prover updates $A$ and $B$ as follows:

$$
A[x] \leftarrow A[0, x] + r_1 \big(A[1, x] - A[0, x]\big), \qquad B[x] \leftarrow B[0, x] + r_1 \big(B[1, x] - B[0, x]\big),
$$

where $r_1$ is a random element chosen by the verifier from an extension field.

This transformation ensures that each update step preserves correctness while reducing the number of values stored. This process continues for $\ell$ rounds, progressively reducing the arrays until only a single value remains, which serves as the final output of the sumcheck protocol.
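The array-halving scheme can be sketched in a few lines. In this illustrative Python version (the names `fold` and `round_messages` are our own), the first half of each array holds the slice where the folded variable is $0$ and the second half the slice where it is $1$; plain integer arithmetic stands in for field arithmetic:

```python
def fold(arr, r):
    """One sumcheck round: arr[x] <- arr[0,x] + r * (arr[1,x] - arr[0,x])."""
    h = len(arr) // 2
    return [arr[x] + r * (arr[h + x] - arr[x]) for x in range(h)]

def round_messages(A, B):
    """Evaluations s_i(0), s_i(1), s_i(2) of the degree-2 round polynomial
    for g = p * q, read off the current arrays."""
    h = len(A) // 2
    s0 = sum(A[x] * B[x] for x in range(h))
    s1 = sum(A[h + x] * B[h + x] for x in range(h))
    # by multilinearity, p(2, x) = 2 * p(1, x) - p(0, x)
    s2 = sum((2 * A[h + x] - A[x]) * (2 * B[h + x] - B[x]) for x in range(h))
    return s0, s1, s2

# toy run on a 2-variable example
A = [1, 2, 3, 4]          # evaluations of p over {0,1}^2
B = [5, 6, 7, 8]          # evaluations of q over {0,1}^2
claim = sum(a * b for a, b in zip(A, B))
s0, s1, s2 = round_messages(A, B)
assert s0 + s1 == claim   # the verifier's first-round consistency check
r1 = 10                   # stand-in for the verifier's random challenge
A, B = fold(A, r1), fold(B, r1)
```

After the fold, `sum(a * b for a, b in zip(A, B))` equals $s_1(r_1)$, which is exactly the claim carried into round 2.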

Computational cost of the sumcheck prover

For a polynomial $g(x)$ of degree $d = 2$, the total computational cost for the prover, over all $\ell$ rounds, is given by:

$$
\text{Total Cost} = \left(\frac{3n}{2}\, t_{bb} + n\, t_{be}\right) + \sum_{i=2}^{\ell} \frac{4n}{2^{i}}\, t_{ee},
$$

where:

- $n$: the size of the hypercube, given by $n = 2^{\ell}$.
- $t_{bb}$: the cost of a base field $\times$ base field multiplication.
- $t_{be}$: the cost of a base field $\times$ extension field multiplication.
- $t_{ee}$: the cost of an extension field $\times$ extension field multiplication.

This formula breaks down into three main components, each representing different aspects of the prover’s workload.

Breaking down the cost

1. First term, $\frac{3n}{2}\, t_{bb}$: this accounts for the computations in round 1, specifically for constructing the first sumcheck messages $s_1(0)$, $s_1(1)$, and $s_1(2)$.
2. Second term, $n\, t_{be}$: this corresponds to the cost of batch evaluations in the first round, which involves computing polynomial values at multiple points simultaneously.
3. Summation term, $\sum_{i=2}^{\ell} \frac{4n}{2^{i}}\, t_{ee}$: this captures the iterative updates for subsequent rounds ($i \geq 2$), where the array size is progressively halved.
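The formula is straightforward to evaluate numerically. The sketch below (a helper of our own; the cost weights passed in are illustrative, not measurements from the paper) makes the structure visible, including the fact that the later-round term sums to $2n - 4$ multiplications:

```python
def total_cost(ell, t_bb, t_be, t_ee):
    """Prover cost for d = 2 over ell rounds:
    (3n/2) t_bb + n t_be + sum_{i=2..ell} (4n / 2^i) t_ee, with n = 2^ell."""
    n = 2 ** ell
    first_round = (3 * n // 2) * t_bb + n * t_be
    later_rounds = sum((4 * n // 2 ** i) * t_ee for i in range(2, ell + 1))
    return first_round + later_rounds

# the geometric sum of the later rounds collapses to 2n - 4 multiplications
later_only = total_cost(10, 0, 0, 1)
assert later_only == 2 * 2 ** 10 - 4
```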

Conclusion

Through this analysis, we have explored the sumcheck protocol in binary fields and identified extension-field multiplications as a major computational bottleneck. While working over binary fields provides significant efficiency gains in polynomial commitment schemes and SNARKs, the need for random sampling from large extension fields introduces expensive multiplications that dominate the prover's workload.

By breaking down the round polynomial computations and the linear-time sumcheck prover, we have pinpointed where these costly multiplications occur and how they scale with the number of rounds. Specifically, in early rounds, when arrays are largest, the prover incurs the highest overhead, making optimization in these stages crucial.

Understanding the structure of binary fields, extension towers, and the arithmetic involved allows us to better design and optimize proof systems that leverage the speed of binary computation while mitigating the costs of working in extension fields. Future work will focus on reducing these overheads, improving the efficiency of sumcheck, and exploring alternative constructions that minimize the impact of extension-field multiplications in the prover’s computation.