Haschaster - Part 1: what are the specificities of Sumcheck over small fields?

# Haschaster - Part 1: Sumcheck protocol ## Why do we need Sumcheck? Polynomials are fundamental to modern cryptographic protocols, especially zk-SNARKs (zero-knowledge, succinct, non-interactive arguments of knowledge). These protocols transform complex computational statements into polynomial equations, enabling efficient verification without exposing sensitive information. The sum-check protocol is a cornerstone of modern SNARKs and it is used all over the place. It ensures that the prover performs meaningful computation while minimizing the cryptographic commitment overhead. By harnessing multivariate polynomials, the protocol allows a prover to convince a verifier of the correctness of a polynomial sum over a Boolean hypercube. This is achieved interactively, with the verifier performing minimal computation while preserving soundness. Consider a $v$-variate polynomial $g$ defined over a finite field $\mathbb{F}$. The objective of the sum-check protocol is for the prover to convince the verifier of the correctness of the following sum: \begin{equation} H := \sum_{b_1 \in \{0,1\}} \sum_{b_2 \in \{0,1\}} \dots \sum_{b_v \in \{0,1\}} g(b_1, \dots, b_v), \end{equation} where the summation is taken over all Boolean assignments to the variables $b_1, \dots, b_v$. In this protocol, the prover has access to the polynomial $g(x)$, and the verifier sends a sequence of random challenges $r_k$ from a challenge set $\mathbb{F}$. The prover responds with polynomials $g_k(x)$, which allow the verifier to confirm the correctness of the claimed sum $H$. The key advantage of the sum-check protocol lies in its ability to significantly reduce the computational burden on the verifier. Instead of directly evaluating the $v$-variate polynomial $g(x)$ over all $2^v$ possible inputs (e.g., for $v = 16$, this would require $2^{16}$ evaluations which is very expensive), the verifier only needs to perform a single evaluation of $g$ at a randomly chosen point in $\mathbb{F}^v$. This is complemented by a small number of lightweight checks, enabling efficient verification while maintaining soundness. ## Description of the protocol As described by J. Thaler in [his book](https://people.cs.georgetown.edu/jthaler/ProofsArgsAndZK.pdf) called "Proofs, Arguments, and Zero-Knowledge", the Sumcheck protocol can be described in a self-contained way as follows: - At the start of the protocol, the prover sends a value $C_1$ claimed to equal the value $H$. The sum-check protocol then proceeds in $v$ rounds, one for each variable of $g$. At the end of the protocol, the objective for the prover ($\mathcal{P}$) is to demonstrate to the verifier ($\mathcal{V}$) that $C_1$ is indeed equal to $H$ without ever revealing $g$ to $\mathcal{V}$. - In the first round, $\mathcal{P}$ sends the univariate polynomial $g_1(X_1)$ claimed to equal \begin{equation} s_1 \left ( X_1 \right ) := \sum_{(x_2,\dots,x_v) \in \{0,1\}^{v-1}} g(X_1, x_2, \dots, x_v). \end{equation} $\mathcal{V}$ checks that \begin{equation} C_1 = g_1(0) + g_1(1), \end{equation} and that $g_1$ is a univariate polynomial of degree at most $\deg_1(g)$, rejecting if not. Here, $\deg_j(g)$ denotes the degree of $g(X_1, \dots, X_v)$ in variable $X_j$. - The polynomial $g_1 \left ( X_1 \right )$ sent by the prover in the first round is claimed to equal the polynomial $s_1 \left (X_1 \right )$. The idea of the sum-check protocol is that $\mathcal{V}$ will probabilistically check this equality of polynomials holds by picking a random field element $r_1 \in \mathbb{F}$ and confirm that $g_1 \left (r_1 \right ) = s_1 \left (r_1 \right )$. So here, $\mathcal{V}$ chooses a random element $r_1 \in \mathbb{F}$, and sends $r_1$ to $\mathcal{P}$. - In the $j$th round, for $1 < j < v$, $\mathcal{P}$ sends to $\mathcal{V}$ a univariate polynomial $g_j(X_j)$ claimed to equal \begin{equation} s_j = \sum_{(x_{j+1},\dots,x_v) \in \{0,1\}^{v-j}} g(r_1, \dots, r_{j-1}, X_j, x_{j+1}, \dots, x_v). \end{equation} $\mathcal{V}$ checks that $g_j$ is a univariate polynomial of degree at most $\deg_j(g)$, and that $g_{j-1}(r_{j-1}) = g_j(0) + g_j(1)$, rejecting if not. - $\mathcal{V}$ chooses a random element $r_j \in \mathbb{F}$, and sends $r_j$ to $\mathcal{P}$. - In round $v$ (last round), $\mathcal{P}$ sends to $\mathcal{V}$ a univariate polynomial $g_v(X_v)$ claimed to equal \begin{equation} g(r_1, \dots, r_{v-1}, X_v). \end{equation} $\mathcal{V}$ checks that $g_v$ is a univariate polynomial of degree at most $\deg_v(g)$, rejecting if not, and also checks that $g_{v-1}(r_{v-1}) = g_v(0) + g_v(1)$. - $\mathcal{V}$ chooses a random element $r_v \in \mathbb{F}$ and evaluates $g(r_1, \dots, r_v)$ with a single oracle query to $g$. $\mathcal{V}$ checks that $g_v(r_v) = g(r_1, \dots, r_v)$, rejecting if not. - If $\mathcal{V}$ has not yet rejected, $\mathcal{V}$ halts and accepts. ![image](https://hackmd.io/_uploads/r1y-Wg_uJx.png) ## Sumcheck walkthrough via an example In order to describe the protocol in a clearer way and illustrate each step, let us take an example. It is important to note that during this example, you can forget about the notion of finite field. All calculations are made in the classical way for the sake of simplicity and pedagogy, without loss of generality. The goal is to provide to the reader a complete overview of the protocol. Let us consider the following polynomial: \begin{equation} g \left (X_1, X_2, X_3 \right ) = 2 X_1^3 + X_1 X_3 + X_2 X_3 \end{equation} - At the start of the protocol, the prover sends a value $C_1$ claimed to equal the value $H$ (sum of $g$ over the boolean hypercube). With our example: \begin{equation} \begin{aligned} C_1 &= g \left (0, 0, 0 \right ) + g \left (0, 1, 0 \right ) + g \left (0, 0, 1 \right ) + g \left (0, 1, 1 \right ) \\ &+ g \left (1, 0, 0 \right ) + g \left (1, 1, 0 \right ) + g \left (1, 0, 1 \right ) + g \left (1, 1, 1 \right ) \\ &= \underbrace{\left(2 \cdot 0^3 + 0 \cdot 0 + 0 \cdot 0\right)}_{g(0,0,0)} + \underbrace{\left(2 \cdot 0^3 + 0 \cdot 0 + 1 \cdot 0\right)}_{g(0,1,0)} + \underbrace{\left(2 \cdot 0^3 + 0 \cdot 1 + 0 \cdot 1\right)}_{g(0,0,1)} \\ &+ \underbrace{\left(2 \cdot 0^3 + 0 \cdot 1 + 1 \cdot 1\right)}_{g(0,1,1)} + \quad \underbrace{\left(2 \cdot 1^3 + 1 \cdot 0 + 0 \cdot 0\right)}_{g(1,0,0)} + \underbrace{\left(2 \cdot 1^3 + 1 \cdot 0 + 1 \cdot 0\right)}_{g(1,1,0)} \\ &+ \underbrace{\left(2 \cdot 1^3 + 1 \cdot 1 + 0 \cdot 1\right)}_{g(1,0,1)} + \underbrace{\left(2 \cdot 1^3 + 1 \cdot 1 + 1 \cdot 1\right)}_{g(1,1,1)} \\ &= 0 + 0 + 0 + 1 + 2 + 2 + 3 + 4 \\ &= 12 \end{aligned} \end{equation} - In the first round, the honest prover $\mathcal{P}$ sends the univariate polynomial $g_1(X_1)$ defined as: \begin{equation} g_1(X_1) = \sum_{(X_2, X_3) \in \{0,1\}^2} g(X_1, X_2, X_3). \end{equation} This gives: \begin{equation} \begin{aligned} g_1(X_1) &= g(X_1, 0, 0) + g(X_1, 0, 1) + g(X_1, 1, 0) + g(X_1, 1, 1) \\ &= (2X_1^3) + (2X_1^3 + X_1) + (2X_1^3) + (2X_1^3 + X_1 + 1) \\ &= 8X_1^3 + 2X_1 + 1. \end{aligned} \end{equation} The prover then sends the polynomial: \begin{equation} g_1(X_1) = 8X_1^3 + 2X_1 + 1. \end{equation} The verifier $\mathcal{V}$ checks: \begin{equation} C_1 = g_1(0) + g_1(1) = 1 + 11 = 12, \end{equation} which matches the claim. Also the verifier checks that $g_1$ is at most of degree 3 (which is the degree of $X_1$ in $g$), otherwise the claim is false. Indeed, if the prover is honest, the polynomial $g_1 \left ( X_1 \right )$ has degree $\deg_1 \left ( g \right )$. - The idea of the sum-check protocol is then to check in a probabilistic way that the polynomial $g_1 \left ( X_1 \right )$ sent by the prover in the first round really equals $s_1 \left (X_1 \right ) := \sum_{\left ( x_2, ..., x_v \right ) \in \{0,1\}^{v-1}} g \left ( X_1, x_2, ..., x_v \right )$. Then, the verifier chooses $r_1 = 2$ and sends it to $\mathcal{P}$. - In the second round, $\mathcal{P}$ sends a univariate polynomial $g_2(X_2)$ defined as: \begin{align*} g_2(X_2) &= \sum_{X_3 \in \{0,1\}} g(r_1, X_2, X_3) \\ &= g \left (2, X_2, 0 \right ) + g \left (2, X_2, 1 \right ) \\ &= \left ( 2 \cdot 2^3 + 2 \cdot 0 + X_2 \cdot 0 \right ) + \left ( 2 \cdot 2^3 + 2 \cdot 1 + X_2 \cdot 1 \right ) \\ &= 34 + X_2 \end{align*} The prover sends: \begin{equation} g_2(X_2) = 34 + X_2. \end{equation} The verifier checks: \begin{equation} g_2(0) + g_2(1) = 34 + 35 = 69, \end{equation} and ensures this matches $g_1(r_1) = g_1(2)$, which computes to: \begin{equation} g_1(2) = 8(2^3) + 2(2) + 1 = 64 + 4 + 1 = 69. \end{equation} The verifier rejects if inconsistent but accepts here. - $\mathcal{V}$ chooses a random element $r_2 = 3$ and sends it to $\mathcal{P}$. - In the third round, $\mathcal{P}$ sends the polynomial $g_3(X_3)$ defined as: \begin{align*} g_3(X_3) &= g(r_1, r_2, X_3) \\ &= g(2, 3, X_3) \\ &= 16 + 2 X_3 + 3 X_3 \\ &= 16 + 5 X_3 \end{align*} The prover sends: \begin{equation} g_3(X_3) = 16 + 5X_3. \end{equation} - The verifier checks: \begin{equation} g_3(0) + g_3(1) = 16 + 21 = 37, \end{equation} and ensures this matches $g_2(r_2) = g_2(3)$, which computes to: \begin{equation} g_2(3) = 34 + 3 = 37. \end{equation} The verifier proceeds as the values match. - Finally, $\mathcal{V}$ picks a random field element $r_3 = 6$ and evaluates $g(2, 3, 6) = 46$ with a single oracle query to $g$. The verifier confirms that $g_3(r_3) = g(2, 3, 6)$ by evaluating: \begin{equation} g_3(6) = 16 + 5(6) = 16 + 30 = 46. \end{equation} - If all checks pass, $\mathcal{V}$ halts and accepts. ## Completeness and soundness This section is for readers who wish to understand why the sum-check protocol satisfies completeness and soundness. It provides theoretical justifications that, while insightful, are not essential for grasping the protocol, which has already been thoroughly explained above. Those uninterested in the theoretical aspects may choose to stop here. Let us recall some definitions which may have been forgotten along the way. Let $g$ be a $v$-variate polynomial of degree at most $d$ in each variable, defined over a finite field $\mathbb{F}$. For any specified $H \in \mathbb{F}$, let $\mathcal{L}$ be the language of polynomials $g$ (given as an oracle) such that \begin{equation} H := \sum_{b_1 \in \left \{ 0,1\right \}} \sum_{b_2 \in \left \{ 0,1\right \}} \dots \sum_{b_v \in \left \{ 0,1\right \}} g \left ( b_1, \dots, b_v \right ) \end{equation} The sum-check protocol is an interactive proof system for $\mathcal{L}$ with: - Completeness error $\delta_c = 0$ - Soundness error $\delta_s \leq v \cdot d / |\mathbb{F}|$ Let us demonstrate that properly. ### Completeness Completeness is obvious: if the prover sends the prescribed polynomial $g_j \left ( X_j \right )$ at all rounds $j$, then $\mathbb{V}$ will accept with probability 1. ### Soundness We now provide a proof of the soundness property of the sum-check protocol, explaining why a dishonest prover cannot convince an honest verifier to accept an incorrect claim with high probability Suppose the claim made by the prover about the sum $H$ is incorrect, meaning \begin{equation} H \neq \sum_{(x_1, \dots, x_v) \in \{0,1\}^v} g(x_1, x_2, \dots, x_v). \end{equation} The verifier must detect this discrepancy during the protocol. For the prover to convince the verifier to accept despite the incorrect claim, there must exist at least one round $i$ where the prover sends a polynomial $g_i(X_i)$ that does not equal the prescribed polynomial $s_i(X_i)$, yet the prover manages to deceive the verifier by making $g_i(r_i) = s_i(r_i)$. Let us now break this down step by step: #### What is $s_i(X_i)$? In each round $i$, the verifier expects the polynomial $s_i(X_i)$, defined as: \begin{equation} s_i(X_i) := \sum_{(x_{i+1}, \dots, x_v) \in \{0,1\}^{v-i}} g(r_1, \dots, r_{i-1}, X_i, x_{i+1}, \dots, x_v), \end{equation} where $r_1, \dots, r_{i-1}$ are the random field elements chosen by the verifier in previous rounds. This is the correct polynomial for round $i$, determined by fixing the values of variables $X_1, \dots, X_{i-1}$ to $r_1, \dots, r_{i-1}$, and summing $g$ over all remaining Boolean inputs. #### What happens if $g_i(X_i) \neq s_i(X_i)$? If the polynomial $g_i(X_i)$ sent by the prover is not equal to the expected polynomial $s_i(X_i)$, the verifier may still fail to detect the discrepancy if the prover is lucky enough to pick $g_i(X_i)$ such that: \begin{equation} g_i(r_i) = s_i(r_i), \end{equation} where $r_i$ is the random field element chosen by the verifier for round $i$. Since both $g_i(X_i)$ and $s_i(X_i)$ are univariate polynomials of degree at most $d$, they can agree on at most $d$ points in the field $\mathbb{F}$ (by the fundamental theorem of algebra). ![image](https://hackmd.io/_uploads/SyHrpx_Okg.png) The probability that the verifier’s randomly chosen $r_i$ happens to be one of these $d$ points is therefore at most: \begin{equation} \Pr[g_i(r_i) = s_i(r_i) \mid g_i(X_i) \neq s_i(X_i)] \leq \frac{d}{|\mathbb{F}|}. \end{equation} #### What is the total probability of error? The prover must successfully deceive the verifier in every round $i$ in order to have the verifier accept the incorrect claim $H$. Using the union bound, the probability that the verifier fails to detect any discrepancy across all $v$ rounds is at most: \begin{equation} \Pr[\text{Verifier accepts incorrect claim}] \leq \sum_{i=1}^v \Pr[g_i(r_i) = s_i(r_i) \mid g_i(X_i) \neq s_i(X_i)]. \end{equation} Since there are $v$ rounds, and the probability of $g_i(r_i) = s_i(r_i)$ in any single round is at most $d / |\mathbb{F}|$, we have: \begin{equation} \Pr[\text{Verifier accepts incorrect claim}] \leq v \cdot \frac{d}{|\mathbb{F}|}. \end{equation} As a conclusion, the probability that a dishonest prover can convince the verifier to accept an incorrect claim is at most: \begin{equation} \frac{v \cdot d}{|\mathbb{F}|}. \end{equation} This bound shows that the soundness error of the sum-check protocol decreases as the size of the field $|\mathbb{F}|$ increases, or as the degree $d$ of the polynomial or the number of rounds $v$ decreases. For a sufficiently large field $\mathbb{F}$, the probability of the verifier being deceived can be made arbitrarily small.