# Hashcaster - Part 2: The challenges of sumcheck in binary fields
## Binary fields
### From small fields to binary fields in SNARKs
A widely accepted approach in modern SNARK design is to work over a "small" finite field $\mathbb{F}$ to maximize performance. The rationale is simple: a fixed number of field operations executes significantly faster over a 32-bit field than over a 256-bit field.
Computers inherently represent numbers as sequences of bits (zeros and ones) and use digital circuits to perform arithmetic operations like addition and multiplication. Modern hardware is highly optimized for computations with standard integer sizes such as 16, 32, or 64 bits. As a result, certain moduli—such as $2^{64} - 2^{32} + 1$ and $2^{31} - 1$—are chosen not only to fit within these bit-size constraints but also to align efficiently with hardware capabilities. For instance, modular multiplication with $2^{64} - 2^{32} + 1$ can be executed efficiently using 32-bit multiplications combined with bitwise shifts and copies, reducing computational overhead.
Taking this idea further, one can consider performing arithmetic directly in binary. In such a setting, addition simplifies to a straightforward XOR operation, eliminating the need to manage carry propagation when summing $1 + 1$ across bit positions. Similarly, multiplication becomes more parallelizable, further enhancing efficiency. This approach not only streamlines arithmetic but also aligns seamlessly with boolean logic, where computations naturally map to binary operations.
Building on this principle, the Binius team proposed a proof system that operates over the smallest possible field: the binary field $\mathbb{F}_2$, which consists of just two elements, $0$ and $1$. By leveraging arithmetic modulo 2, they exploit the inherent advantages of binary computation, making proof generation and verification more efficient in hardware-friendly environments.
### Extension Fields
The field $\mathbb{F}_2$ is fundamentally simple but also quite limited in its applications. To enable more powerful cryptographic constructions and enhance security, we require larger fields that preserve the efficiency of binary arithmetic while enriching its structure. An *extension field* is a finite field that extends a base field by introducing new elements, allowing for a richer algebraic structure. Iterating this construction starting from $\mathbb{F}_2$ yields the *tower of binary fields*, a concept initially introduced by Wiedemann.
Starting from a prime field $\mathbb{F}_p$, an extension field $\mathbb{F}_{p^m}$ is built by representing elements as polynomials of degree less than $m$ with coefficients in $\mathbb{F}_p$, with arithmetic performed modulo a degree-$m$ irreducible polynomial. This expansion can be visualized as a skyscraper, where each floor represents a progressively larger binary field. The "ground floor" is $\mathbb{F}_2$, the simplest field containing only $\{0,1\}$, and each new floor extends the structure by adding more elements and increasing computational possibilities.
Binary fields differ in their arithmetic compared to other finite fields, so it's essential to understand how addition and multiplication work in $\mathbb{F}_2$ before exploring extensions.
#### Addition in $\mathbb{F}_2$
Addition in $\mathbb{F}_2$ follows XOR logic, where $1+1$ results in $0$ (without carry).
| $+$ | 0 | 1 |
|----|---|---|
| 0 | 0 | 1 |
| 1 | 1 | 0 |
#### Multiplication in $\mathbb{F}_2$
Multiplication behaves as expected, following the rules of Boolean AND.
| $*$ | 0 | 1 |
|----|---|---|
| 0 | 0 | 0 |
| 1 | 0 | 1 |
### Constructing the first extension: $\mathbb{F}_{2^2}$
To construct a larger field, we introduce a new element $x$ that satisfies the equation:
\begin{equation}
x^2 = x + 1.
\end{equation}
This polynomial, $P(x) = x^2 + x + 1$, is irreducible over $\mathbb{F}_2$, meaning it cannot be factored into lower-degree polynomials within the field. We verify its irreducibility by evaluating it at $0$ and $1$:
\begin{equation}
P(0) = 0^2 + 0 + 1 = 1, \quad P(1) = 1^2 + 1 + 1 = 1.
\end{equation}
Since $P(x)$ has no roots in $\mathbb{F}_2$, it forms the basis for constructing $\mathbb{F}_{2^2}$. The elements of this extended field are all polynomials of degree less than 2 with coefficients in $\mathbb{F}_2$:
\begin{equation}
\mathbb{F}_{2^2} = \{0, 1, x, x+1\}.
\end{equation}
### Arithmetic in $\mathbb{F}_{2^2}$
Addition and multiplication in $\mathbb{F}_{2^2}$ follow specific rules based on polynomial arithmetic modulo $P(x)$. The resulting operation tables are:
#### Addition Table for $\mathbb{F}_{2^2}$
| $+$ | 0 | 1 | $x$ | $x+1$ |
|-------|-----|-----|------|------|
| 0 | 0 | 1 | $x$ | $x+1$ |
| 1 | 1 | 0 | $x+1$ | $x$ |
| $x$ | $x$ | $x+1$ | 0 | 1 |
| $x+1$ | $x+1$ | $x$ | 1 | 0 |
#### Multiplication Table for $\mathbb{F}_{2^2}$
| $*$ | 0 | 1 | $x$ | $x+1$ |
|-------|-----|-----|------|------|
| 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 1 | $x$ | $x+1$ |
| $x$ | 0 | $x$ | $x+1$ | 1 |
| $x+1$ | 0 | $x+1$ | 1 | $x$ |
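These tables can be reproduced mechanically. Below is a minimal sketch in Python (the encoding of elements as 2-bit integers, with bit 0 the constant term and bit 1 the coefficient of $x$, is our own choice for illustration), which multiplies polynomials over $\mathbb{F}_2$ and reduces with $x^2 = x + 1$:

```python
# Elements of F_{2^2} as 2-bit integers: 0 -> 0, 1 -> 1, 2 -> x, 3 -> x + 1.

def f4_add(a: int, b: int) -> int:
    return a ^ b  # coefficient-wise addition mod 2 is XOR

def f4_mul(a: int, b: int) -> int:
    a0, a1 = a & 1, a >> 1            # a = a1 * x + a0
    b0, b1 = b & 1, b >> 1
    c0 = a0 & b0                      # constant term
    c1 = (a0 & b1) ^ (a1 & b0)        # coefficient of x
    c2 = a1 & b1                      # coefficient of x^2
    return (c0 ^ c2) | ((c1 ^ c2) << 1)  # reduce with x^2 = x + 1

names = ["0", "1", "x", "x+1"]
print("add:", [[names[f4_add(a, b)] for b in range(4)] for a in range(4)])
print("mul:", [[names[f4_mul(a, b)] for b in range(4)] for a in range(4)])
```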
This construction lays the foundation for further field extensions, following the tower approach, to create even larger fields such as $\mathbb{F}_{2^4}$, $\mathbb{F}_{2^8}$, and beyond.
### Tower construction
Each subsequent field is constructed by adjoining a new element $X_i$, whose defining polynomial involves the generator adjoined at the previous level. The complete tower follows:
\begin{equation}
\begin{aligned}
\tau_0 &= \mathbb{F}_2, \\
\tau_1 &= \tau_0[X_0] / (X_0^2 + X_0 + 1) \cong \mathbb{F}_{2^2}, \\
\tau_2 &= \tau_1[X_1] / (X_1^2 + X_1 \cdot X_0 + 1) \cong \mathbb{F}_{2^4}, \\
\tau_3 &= \tau_2[X_2] / (X_2^2 + X_1 \cdot X_2 + 1) \cong \mathbb{F}_{2^8}, \\
\tau_4 &= \tau_3[X_3] / (X_3^2 + X_2 \cdot X_3 + 1) \cong \mathbb{F}_{2^{16}}, \\
\tau_5 &= \tau_4[X_4] / (X_4^2 + X_3 \cdot X_4 + 1) \cong \mathbb{F}_{2^{32}}, \\
\tau_6 &= \tau_5[X_5] / (X_5^2 + X_4 \cdot X_5 + 1) \cong \mathbb{F}_{2^{64}}, \\
\tau_7 &= \tau_6[X_6] / (X_6^2 + X_5 \cdot X_6 + 1) \cong \mathbb{F}_{2^{128}}.
\end{aligned}
\end{equation}
A 128-bit string can be interpreted in several equivalent ways within this tower. It can be treated as a single element of the 128-bit binary field, or decomposed into smaller units—two 64-bit tower field elements, four 32-bit elements, sixteen 8-bit elements, or even 128 individual elements of $\mathbb{F}_2$. This flexibility comes at no computational cost, as it simply involves reinterpreting the bit string rather than performing any actual transformation. This property is particularly valuable, as it allows smaller field elements to be efficiently packed into larger ones without incurring additional overhead.
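To see how cheap this reinterpretation is, here is a small sketch (plain bit-slicing in Python; the LSB-first limb layout is an assumption made for illustration): the same 128-bit value is viewed as limbs of different widths without any arithmetic.

```python
def limbs(value: int, width: int, total: int = 128) -> list[int]:
    """Split a `total`-bit value into (total // width) limbs of `width` bits,
    low limb first. No field arithmetic happens: it is pure reinterpretation."""
    mask = (1 << width) - 1
    return [(value >> (i * width)) & mask for i in range(total // width)]

v = 0x0123456789ABCDEF_FEDCBA9876543210
assert limbs(v, 64) == [0xFEDCBA9876543210, 0x0123456789ABCDEF]
assert len(limbs(v, 8)) == 16 and len(limbs(v, 1)) == 128
```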

## Multiplications in binary fields
### How things work
In binary fields, addition and subtraction are equivalent and can be performed using the XOR operation ($\oplus$):
\begin{equation}
x + y = x \oplus y.
\end{equation}
However, multiplication in binary fields is more complex and involves recursive algorithms that decompose elements into lower-order components. Consider two elements $x$ and $y$ of a tower field, written in terms of the generator $X_k$ as:
\begin{equation}
x = x_0 + x_1 \cdot X_k, \quad y = y_0 + y_1 \cdot X_k,
\end{equation}
where:
- $x_0$ and $y_0$ are the low parts, the components without $X_k$;
- $x_1$ and $y_1$ are the high parts, the coefficients of $X_k$.
To compute $x \cdot y$, we expand the product:
\begin{equation}
x \cdot y = x_0 y_0 + \left( x_0 y_1 + x_1 y_0 \right) \cdot X_k + x_1 y_1 \cdot X_k^2.
\end{equation}
Let us now state the simplification rules available in binary fields.
1. **Eliminating coefficients of 2:** In binary fields, $1 + 1 = 0$, so any term with a coefficient of $2$ vanishes.
2. **Handling $X_k^2$:** The term $X_k^2$ is reduced using the field's defining polynomial. In the tower above, $X_k^2 = X_{k-1} \cdot X_k + 1$, which we substitute back into the expanded expression.
After applying the above rules, the product simplifies to:
\begin{equation}
x \cdot y = x_0 y_0 + \left( x_0 y_1 + x_1 y_0 \right) \cdot X_k + x_1 y_1 \cdot (X_{k-1} \cdot X_k + 1).
\end{equation}
Breaking this down:
- $x_0 y_0$: The product of the low parts, independent of $X_k$.
- $\left( x_0 y_1 + x_1 y_0 \right) \cdot X_k$: The cross-term mixing the high and low parts.
- $x_1 y_1 \cdot X_k^2$: The product of the high parts, reduced using the defining polynomial, as implemented in the sketch below.
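This recursion translates directly into code. Below is a minimal sketch in Python (the bit layout, storing the $2^k$ $\mathbb{F}_2$-coefficients of a $\tau_k$ element LSB-first in an integer, is our own choice for illustration and not any particular library's encoding), applying exactly the decomposition and reduction rule above, with a Karatsuba-style cross term:

```python
def tower_mul(a: int, b: int, k: int) -> int:
    """Multiply a and b in tau_k, the tower field with 2^(2^k) elements."""
    if k == 0:
        return a & b                      # F_2: multiplication is AND
    half = 1 << (k - 1)                   # bit-width of a tau_{k-1} element
    mask = (1 << half) - 1
    a0, a1 = a & mask, a >> half          # a = a0 + a1 * X_{k-1}
    b0, b1 = b & mask, b >> half
    lo = tower_mul(a0, b0, k - 1)         # a0 * b0
    hi = tower_mul(a1, b1, k - 1)         # a1 * b1
    # Karatsuba: a0*b1 + a1*b0 = (a0 + a1)(b0 + b1) - a0*b0 - a1*b1 (+/- are XOR)
    cross = tower_mul(a0 ^ a1, b0 ^ b1, k - 1) ^ lo ^ hi
    # Reduce X_{k-1}^2 = X_{k-2} * X_{k-1} + 1 (with X_{-1} := 1 when k = 1):
    x_prev = 1 << (half >> 1)             # the element X_{k-2} inside tau_{k-1}
    hi_x = tower_mul(hi, x_prev, k - 1)   # (a1 * b1) * X_{k-2}
    return ((cross ^ hi_x) << half) | (lo ^ hi)

# Sanity checks: X_0 * X_0 = X_0 + 1 in tau_1, and X_1 * X_1 = X_0 * X_1 + 1 in tau_2.
assert tower_mul(0b10, 0b10, 1) == 0b11
assert tower_mul(0b0100, 0b0100, 2) == 0b1001
```

Note the three full recursive multiplications per level (`lo`, `hi`, `cross`), plus a cheaper product for the reduction: this recursion is precisely where the extension-field overhead quantified next comes from.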
### Multiplication costs
The recursive nature of multiplication in binary fields introduces additional overhead, particularly in extension fields. These costs can be categorized as follows (using the terminology of the related [paper](https://eprint.iacr.org/2024/1046.pdf) by S. Bagad, Y. Domb, and J. Thaler):
- $\mathbf{t_{bb}}$: The cost of multiplying two base field elements.
- $\mathbf{t_{ee}}$: The cost of multiplying two extension field elements.
With this notation in mind, the authors of the paper experimentally measured the overhead for varying extension degrees:
| **Base field** | **Extension field** | **Degree of extension** | **$t_{ee} / t_{bb}$** |
|---------------------|------------------------|------------------------|-----------------------|
| $\mathbb{F}_{2}$ | $\mathbb{F}_{2^{128}}$ | 128 | $145 \times 10^2$ |
| $\mathbb{F}_{2^2}$ | $\mathbb{F}_{2^{128}}$ | 64 | $2.5 \times 10^2$ |
| $\mathbb{F}_{2^4}$ | $\mathbb{F}_{2^{128}}$ | 32 | $0.5 \times 10^2$ |
## Multiplications in the sumcheck protocol
In the Binius protocol, advancements in polynomial commitment schemes and proving infrastructure have significantly accelerated computations using binary fields. However, this progress has introduced a new challenge: the sumcheck protocol has become the primary computational bottleneck (as discussed in detail in this [video](https://www.youtube.com/watch?v=9lldbNTqbYQ)). As a result, optimizing sumcheck is now a critical step toward improving prover efficiency.
At its core, the sumcheck protocol ensures **soundness** through the random field elements $r_1, \dots, r_\ell$ that the verifier samples, one per round. To maintain security, these elements should come from a large enough field, typically of size at least $2^{128}$. When the polynomial $g$ being summed is defined over a small base field, these random elements must instead be sampled from an extension field. This distinction is crucial because multiplications in an extension field are significantly more expensive than those in the base field, making them a major cost driver in the protocol.
This section focuses on pinpointing where these extension-field multiplications occur and analyzing their impact on the prover’s overall computational cost. By carefully breaking down the algorithm’s structure and its associated costs, we aim to build an intuitive understanding of why sumcheck is computationally intensive.
One key aspect of this analysis is the *round polynomial*, a fundamental component computed at each stage of sumcheck. Understanding its structure and how extension-field multiplications affect its evaluation is essential for designing a more efficient prover.
### Round polynomial representation
At round $i$ of the sumcheck protocol, the round polynomial $s_i(c)$ is defined as:
\begin{equation}
s_i(c) := \sum_{x \in \{0,1\}^{\ell - i}} g(r_1, r_2, \dots, r_{i-1}, c, x),
\end{equation}
where $g$ is the $\ell$-variate polynomial being summed, with its first $i-1$ variables fixed to the verifier's challenges and the remaining ones summed over the Boolean hypercube.
Typically, $g$ itself is expressed as a product of $d$ multilinear polynomials $p_1, \dots, p_d$:
\begin{equation}
g(r_1, r_2, \dots, r_{i-1}, c, x) = \underbrace{\prod_{j=1}^d p_j(r_1, r_2, \dots, r_{i-1}, c, x)}_{\text{Product of extension field elements}}.
\end{equation}
This formulation highlights a critical computational challenge in the sumcheck protocol: **the overhead introduced by extension-field multiplications**.
### Example: Evaluating $g$ at round 2
To illustrate the mechanics of the sumcheck protocol, let’s examine how the polynomial $g$ is evaluated at **round 2**. This serves as a simple example to help understand the recursive nature of sumcheck before generalizing to later rounds.
At round 2, an arbitrary polynomial $p$ among the $d$ multilinear polynomials $p_1, \dots, p_d$ is computed as:
\begin{equation}
p(r_1, c, x) = (1 - r_1) \cdot \color{red}{p(0, c, x)} + r_1 \cdot \color{green}{p(1, c, x)},
\end{equation}
where:
- $\color{red}{p(0, c, x)}$ and $\color{green}{p(1, c, x)}$ are evaluations of $p$ with its first variable fixed to $0$ and $1$, available from the previous round.
- $r_1$ is a randomly sampled challenge that ensures the soundness of the protocol.
This recursive decomposition is fundamental to sumcheck, as each polynomial evaluation in a given round is derived from its values in the previous round.
It is worth convincing ourselves that this expression is true, so let us make a small optional digression to justify the decomposition used to compute $p(r_1, c, x)$.
:::info
### Multilinear polynomial decomposition
A multilinear polynomial $p: \mathbb{F}^\ell \to \mathbb{F}$ is linear in each variable when the others are fixed. This property allows us to express $p$ in a structured form that simplifies computations.
Focusing on a single variable, $r_1$, while keeping the others fixed as $x'$, the polynomial $p(r_1, x')$ behaves as a degree-1 polynomial in $r_1$:
$$
p(r_1, x') = a \cdot r_1 + b,
$$
where $a$ and $b$ depend on $x'$.
1. **Defining boundary conditions**
The values of $p(r_1, x')$ at specific points determine $a$ and $b$:
- **When $r_1 = 1$:**
$$ p(1, x') = a + b. $$
- **When $r_1 = 0$:**
$$ p(0, x') = b. $$
2. **Solving for $a$ and $b$**
- By subtracting the second equation from the first:
$$ a = p(1, x') - p(0, x'). $$
- Directly substituting $b$:
$$ b = p(0, x'). $$
3. **Reconstructing the polynomial**
Substituting these values back into $p(r_1, x') = a \cdot r_1 + b$, we get:
$$
p(r_1, x') = r_1 \cdot p(1, x') + (1 - r_1) \cdot p(0, x').
$$
:::
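To complement the derivation, here is a quick numeric sanity check in Python (over a toy prime modulus chosen only for readability; the polynomial and evaluation point are random): a multilinear polynomial in three variables satisfies the decomposition in its first variable.

```python
import random

P = 2**31 - 1  # toy prime modulus, standing in for whatever field is in use

# A random multilinear polynomial in 3 variables, stored by its coefficients
# on the monomials x1^e1 * x2^e2 * x3^e3 with each e_i in {0, 1}.
coeffs = {(e1, e2, e3): random.randrange(P)
          for e1 in (0, 1) for e2 in (0, 1) for e3 in (0, 1)}

def p(x1, x2, x3):
    return sum(c * x1**e1 * x2**e2 * x3**e3
               for (e1, e2, e3), c in coeffs.items()) % P

r1, x2, x3 = (random.randrange(P) for _ in range(3))
assert p(r1, x2, x3) == ((1 - r1) * p(0, x2, x3) + r1 * p(1, x2, x3)) % P
```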
### Expanding Multiplications in Sumcheck
Now, let’s consider a function $g$ as the product of two polynomials $p(r_1, c, x)$ and $q(r_1, c, x)$:
\begin{equation}
\begin{aligned}
p(r_1, c, x) &= (1 - r_1) \cdot \color{red}{p(0, c, x)} + r_1 \cdot \color{green}{p(1, c, x)}, \\
q(r_1, c, x) &= (1 - r_1) \cdot \color{red}{q(0, c, x)} + r_1 \cdot \color{green}{q(1, c, x)}.
\end{aligned}
\end{equation}
Expanding the product $p(r_1, c, x) \cdot q(r_1, c, x)$ results in:
\begin{equation}
\begin{aligned}
g(r_1, c, x) &= p(r_1, c, x) \cdot q(r_1, c, x) \\
&= (1 - r_1)^2 \cdot \color{red}{p(0, c, x) \, q(0, c, x)} \\
&\quad+ (1 - r_1) \, r_1 \cdot \color{red}{p(0, c, x)} \, \color{green}{q(1, c, x)} \\
&\quad+ r_1 (1 - r_1) \cdot \color{green}{p(1, c, x)} \, \color{red}{q(0, c, x)} \\
&\quad+ r_1^2 \cdot \color{green}{p(1, c, x) \, q(1, c, x)}.
\end{aligned}
\end{equation}
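The four-term expansion is a plain algebraic identity, which a one-line check makes tangible (Python; the values stand in for the evaluations $p(0, c, x)$, $p(1, c, x)$, $q(0, c, x)$, $q(1, c, x)$ and the challenge $r_1$, and the modulus is a toy choice):

```python
P = 2**31 - 1                        # toy modulus
p0, p1, q0, q1, r1 = 3, 7, 5, 11, 9  # placeholder evaluations and challenge

lhs = ((1 - r1) * p0 + r1 * p1) * ((1 - r1) * q0 + r1 * q1) % P
rhs = ((1 - r1)**2 * p0 * q0
       + (1 - r1) * r1 * p0 * q1
       + r1 * (1 - r1) * p1 * q0
       + r1**2 * p1 * q1) % P
assert lhs == rhs
```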
This example provides a concrete view of how sumcheck recursively expands polynomials while maintaining soundness through random challenges. In the next section, we generalize this expansion to an arbitrary round $i$ in the protocol.
### Decomposing polynomial terms in sumcheck
As the sumcheck protocol progresses, the round polynomial expands into a structured decomposition involving **challenge terms** and **witness terms**. Specifically, each polynomial term takes the form:
\begin{equation}
p_j(r_1, \dots, r_{i-1}, c, x) = \sum_{b \in \{0,1\}^{i-1}} \underbrace{\prod_{k=1}^{i-1} \left( (1 - r_k)^{1 - b_k} \, r_k^{\, b_k} \right)}_{\text{challenge term}} \cdot \underbrace{p_j(b_1, \dots, b_{i-1}, c, x)}_{\text{witness term}},
\end{equation}
a sum over all Boolean assignments $b$ of the first $i-1$ variables, where each challenge term picks the factor $(1 - r_k)$ when $b_k = 0$ and $r_k$ when $b_k = 1$.
Written out term by term, with each challenge term paired with its witness term, this reads:
\begin{equation}
\begin{aligned}
p_j(r_1, \dots, r_{i-1}, c, x) ={} &(1 - r_1) \cdots (1 - r_{i-2})(1 - r_{i-1}) \cdot p_j(0, \dots, 0, 0, c, x) \\
&+ (1 - r_1) \cdots (1 - r_{i-2}) \, r_{i-1} \cdot p_j(0, \dots, 0, 1, c, x) \\
&+ (1 - r_1) \cdots r_{i-2} \, (1 - r_{i-1}) \cdot p_j(0, \dots, 1, 0, c, x) \\
&+ (1 - r_1) \cdots r_{i-2} \, r_{i-1} \cdot p_j(0, \dots, 1, 1, c, x) \\
&\;\;\vdots \\
&+ r_1 \cdots r_{i-2} \, r_{i-1} \cdot p_j(1, \dots, 1, 1, c, x).
\end{aligned}
\end{equation}
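The $2^{i-1}$ challenge terms need not be recomputed from scratch for each $b$: extending one challenge at a time produces all of them with $O(2^{i-1})$ multiplications in total. A minimal sketch in Python (the function name, ordering, and toy prime modulus are our own choices for illustration):

```python
P = 2**31 - 1  # toy prime modulus standing in for the extension field

def challenge_terms(rs: list[int], p: int = P) -> list[int]:
    """All products prod_k ((1 - r_k) if b_k == 0 else r_k) over b in {0,1}^len(rs).
    Entry j corresponds to the assignment where bit (k-1) of j equals b_k."""
    terms = [1]
    for r in rs:
        terms = [t * (1 - r) % p for t in terms] + [t * r % p for t in terms]
    return terms

# With two challenges, the four terms pair with the witness evaluations at
# b = (0,0), (1,0), (0,1), (1,1) respectively, under the bit ordering above.
print(challenge_terms([5, 7]))
```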
### Computational impact on the prover
These computations introduce a significant multiplication overhead in extension fields, which becomes a bottleneck on the prover's side of the sumcheck protocol. This inefficiency is particularly pronounced in the early rounds, where the number of terms is still large.
### Linear-time sumcheck prover
The classical linear-time sumcheck prover is designed to compute the sumcheck for the expression:
\begin{equation}
\sum_{x \in \{0,1\}^\ell} g(x),
\end{equation}
where $g(x)$ is a product of polynomials:
\begin{equation}
g(x) = \prod_{i=1}^d p_i(x).
\end{equation}
For simplicity, let's consider the case where $d = 2$, meaning:
\begin{equation}
g(x) = p(x) \cdot q(x).
\end{equation}
At a high level, the prover's goal is to progressively reduce the complexity of this computation by iteratively simplifying the problem over **$\ell$ rounds**. For this, the prover organizes evaluations of $p$ and $q$ into structured arrays and updates them in a way that allows computations to be performed in **linear time** relative to the input size.
### Maintaining arrays
The prover maintains two arrays, $A$ and $B$, which initially store **all** evaluations of $p(x)$ and $q(x)$ over the entire hypercube $\{0,1\}^\ell$. These arrays are indexed by $x$, meaning that each entry corresponds to a specific assignment of binary values to the input variables.
At initialization, the prover sets:
\begin{equation}
A[x] = p(x), \quad B[x] = q(x), \quad \forall x \in \{0,1\}^\ell.
\end{equation}
At this stage, both $A$ and $B$ contain **$2^\ell$ elements**. However, in each round of sumcheck, the prover **halves the size** of these arrays by updating their values based on random challenges provided by the verifier. This halving process is a crucial reason why the **early rounds** of sumcheck are the most computationally expensive: the arrays are at their largest, and the cost of processing them is highest.
After the first round, the prover updates $A$ and $B$ as follows:
\begin{equation}
\begin{aligned}
A[x] &\gets A[0, x] + r_1 \left( A[1, x] - A[0, x] \right), \\
B[x] &\gets B[0, x] + r_1 \left( B[1, x] - B[0, x] \right),
\end{aligned}
\end{equation}
where $r_1$ is a random element chosen by the verifier from an extension field.
This transformation ensures that each update step preserves correctness while reducing the number of values stored. This process continues for $\ell$ rounds, progressively reducing the arrays until only a single value remains, which serves as the final output of the sumcheck protocol.
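Putting the pieces together, here is a compact sketch of the whole loop for $d = 2$ (Python; a toy 31-bit prime field replaces the binary tower purely for readability, and the snippet plays both prover and verifier, so the challenge sampling stands in for real interaction):

```python
import random

P = 2**31 - 1  # toy prime field standing in for base and extension fields

def fold(A, r):
    """One round's update: A[x] <- A[0, x] + r * (A[1, x] - A[0, x]).
    The variable being fixed is the high-order bit of the index."""
    half = len(A) // 2
    return [(A[x] + r * (A[half + x] - A[x])) % P for x in range(half)]

def round_poly(A, B):
    """Evaluations s_i(0), s_i(1), s_i(2) of the degree-2 round polynomial."""
    half = len(A) // 2
    s0 = s1 = s2 = 0
    for x in range(half):
        a0, a1, b0, b1 = A[x], A[half + x], B[x], B[half + x]
        s0 = (s0 + a0 * b0) % P                      # c = 0
        s1 = (s1 + a1 * b1) % P                      # c = 1
        s2 = (s2 + (2*a1 - a0) * (2*b1 - b0)) % P    # c = 2, by linearity
    return s0, s1, s2

def eval_at(s, r):
    """Evaluate the quadratic interpolating (0, s[0]), (1, s[1]), (2, s[2]) at r."""
    s0, s1, s2 = s
    inv2 = pow(2, P - 2, P)
    return (s0 * (r - 1) * (r - 2) % P * inv2
            + s1 * r * (2 - r)
            + s2 * r * (r - 1) % P * inv2) % P

l = 3
A = [random.randrange(P) for _ in range(1 << l)]  # A[x] = p(x) on the hypercube
B = [random.randrange(P) for _ in range(1 << l)]  # B[x] = q(x) on the hypercube
claim = sum(a * b for a, b in zip(A, B)) % P      # claimed sum of p(x) * q(x)

for _ in range(l):
    s = round_poly(A, B)
    assert (s[0] + s[1]) % P == claim             # verifier's round check
    r = random.randrange(P)                       # verifier's challenge
    claim = eval_at(s, r)                         # next claim: s_i(r)
    A, B = fold(A, r), fold(B, r)                 # arrays halve in size

assert (A[0] * B[0]) % P == claim                 # final check: g(r_1, ..., r_l)
```

Each iteration touches every surviving array entry a constant number of times, which is what makes the prover linear-time overall, and also why the first folds, over the full arrays, dominate the cost.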
### Computational cost of the sumcheck prover
For a polynomial $g(x)$ of degree **$d = 2$**, the total computational cost for the prover, over all **$\ell$ rounds**, is given by:
\begin{equation}
\text{Total Cost} = \left( \frac{3n}{2} \cdot \mathbf{t_{bb}} + n \cdot \mathbf{t_{be}} \right) + \sum_{i=2}^\ell \frac{4n}{2^i} \cdot \mathbf{t_{ee}},
\end{equation}
where:
- **$n$**: The size of the hypercube, given by $n = 2^\ell$.
- **$\mathbf{t_{bb}}$**: The cost of a base field $\times$ base field multiplication.
- **$\mathbf{t_{be}}$**: The cost of a base field $\times$ extension field multiplication.
- **$\mathbf{t_{ee}}$**: The cost of an extension field $\times$ extension field multiplication.
This formula breaks down into three main components, each representing different aspects of the prover’s workload.
#### Breaking down the cost
1. **First term -** $\dfrac{3n}{2} \cdot \mathbf{t_{bb}}$: This accounts for the computations in **round 1**, specifically for constructing the first sumcheck messages: $s_1(0)$, $s_1(1)$, and $s_1(2)$.
2. **Second term -** $n \cdot \mathbf{t_{be}}$: This term corresponds to the first update of the arrays after the verifier sends $r_1$: each of the $n/2$ entries of each of the two arrays requires one base-field $\times$ extension-field multiplication.
3. **Summation term -** $\sum_{i=2}^\ell \dfrac{4n}{2^i} \cdot \mathbf{t_{ee}}$: This captures the **iterative updates** for subsequent rounds ($i \geq 2$), where both operands now live in the extension field and the array size is progressively halved (see the short calculation below).
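Since the summation term is a geometric series, its total can be made explicit (recalling that $n = 2^\ell$):
\begin{equation}
\sum_{i=2}^\ell \frac{4n}{2^i} \cdot \mathbf{t_{ee}} = 4n \left( \frac{1}{2} - \frac{1}{2^\ell} \right) \cdot \mathbf{t_{ee}} = (2n - 4) \cdot \mathbf{t_{ee}},
\end{equation}
of which the $i = 2$ term alone contributes $n \cdot \mathbf{t_{ee}}$. Roughly half of all extension-field work therefore happens in round 2, confirming that the early rounds dominate the prover's cost.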
## Conclusion
Through this analysis, we have explored the sumcheck protocol in binary fields and identified extension-field multiplications as a major computational bottleneck. While working over binary fields provides significant efficiency gains in polynomial commitment schemes and SNARKs, the need for random sampling from large extension fields introduces expensive multiplications that dominate the prover's workload.
By breaking down the round polynomial computations and the linear-time sumcheck prover, we have pinpointed where these costly multiplications occur and how they scale with the number of rounds. Specifically, in early rounds, when arrays are largest, the prover incurs the highest overhead, making optimization in these stages crucial.
Understanding the structure of binary fields, extension towers, and the arithmetic involved allows us to better design and optimize proof systems that leverage the speed of binary computation while mitigating the costs of working in extension fields. Future work will focus on reducing these overheads, improving the efficiency of sumcheck, and exploring alternative constructions that minimize the impact of extension-field multiplications in the prover’s computation.