# Small-Field Zerocheck: Continued

References:

1. [ZeroCheck](https://eprint.iacr.org/2024/108.pdf)
1. "A proposal for hybrid proof design" (unpublished note by Ulrich)
1. https://hackmd.io/AZ7BNUfuT9y_V39sFo8Xqw

## Background

The essential idea of the univariate skip is to define "virtual" polynomials $\left( \hat{M}_i \right)_{i = 0}^{n - 1}$ on the rectangular domain $D \times \mathcal{B}_{\nu - k}$. These are defined so as to take the same values that the functions $\left( M_i \right)_{i = 0}^{n - 1}$ respectively do over the hypercube (though of course "rearranged" using some canonical mapping $D \times \mathcal{B}_{\nu - k} \to \mathcal{B}_\nu$).

To make this slightly more precise (this will pay off below), let's fix an $\mathbb{F}_2$-basis $(\beta_0, \ldots , \beta_{k - 1})$ of $D \subset K$. We then have the identification $\mathcal{B}_k \to D$, given by $(u_0, \ldots , u_{k - 1}) \mapsto \sum_{j = 0}^{k - 1} u_j \cdot \beta_j$. For notation, I will write $\{u\} := \sum_{j = 0}^{k - 1} u_j \cdot \beta_j$ for this latter thing. (This is _different_ from the notation in Binius. Here, instead of mapping $\{0, 1\}^k \to \{0, \ldots , 2^k - 1\}$, we want a mapping $\{0, 1\}^k \to D$.) Now the definitional property of the $\left( \hat{M}_i \right)_{i = 0}^{n - 1}$ is that for each $i \in \{0, \ldots , n - 1\}$, for each $u \in \mathcal{B}_k$ and each $v \in \mathcal{B}_{\nu - k}$, we have $M_i(u_0, \ldots , u_{k - 1}, v_0, \ldots , v_{\nu - k - 1}) = \hat{M}_i(\{u\}, v_0, \ldots , v_{\nu - k - 1})$.

The main point of [Gru, § 5] is that the zerocheck on [a composition of] the $\left( \hat{M}_i \right)_{i = 0}^{n - 1}$ is more efficient than that on the $\left( M_i \right)_{i = 0}^{n - 1}$. The problem is that we can't directly commit to or evaluate the $\left( \hat{M}_i \right)_{i = 0}^{n - 1}$, since we don't have the relevant oracles. The point of [Jim's note](https://hackmd.io/AZ7BNUfuT9y_V39sFo8Xqw) is to give an expression for the "virtual" oracles $\left( \hat{M}_i \right)_{i = 0}^{n - 1}$ in terms of the plain oracles for the $\left( M_i \right)_{i = 0}^{n - 1}$.

## Interlude: Ulrich's Idea

I will now quickly summarize, in generic language, Ulrich's multilinear–univariate adaptation. Basically, the idea is a way to _emulate_ a univariate commitment scheme using a multilinear one. In other words, given access to a multilinear scheme, how do we make it walk and quack like a univariate one? Ulrich gives a way to do this which is succinct for the verifier; internally, it uses a bivariate Lagrange quotient polynomial, as well as his fractional sumcheck.

### Bivariate Lagrange in Characteristic 2

Let's fix once and for all a size parameter $k \in \mathbb{N}$ and a domain $D \subset K$ of size $2^k$. Angus describes the polynomial $\delta_D(X, Y) \in K[X, Y]$:
\begin{equation}\delta_D(X, Y) = \sum_{u \in D} \prod_{w \neq u} \frac{(X - w) \cdot (Y - w)}{(u - w)^2}.\end{equation}
You can check by hand that $\delta_D(X, Y)$'s restriction to $D \times D \subset K \times K$ is the $\delta$-function. (For $x \neq y$ in $D$, for each index $u \in D$ of the outer sum, the inner product will contain some term whose numerator is 0. For $x = y$ in $D$, the summand corresponding to $u = x = y$ will equal 1.)

The starting point is that, in the special case that $D \subset K$ is an $\mathbb{F}_2$-subspace, we can write down a bivariate _quotient_ which agrees identically with $\delta_D(X, Y)$ outside of its denominator's vanishing locus.
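As a sanity check in the smallest case, take $k = 1$ and $D = \mathbb{F}_2 = \{0, 1\} \subset K$. The sum above gives
\begin{equation}\delta_D(X, Y) = (X + 1)(Y + 1) + XY = X + Y + 1,\end{equation}
which is indeed $1$ on the diagonal of $D \times D$ and $0$ off it. The quotient form derived next recovers the same polynomial: with $W_D(X) = X^2 + X$ and $c = 1$, we get $\frac{W_D(X) + W_D(Y)}{X + Y} = \frac{(X + Y)^2 + (X + Y)}{X + Y} = X + Y + 1$.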
Let's write $W_D(X) := \prod_{u \in D} (X - u) \in K[X]$ for the usual subspace vanishing polynomial of $D$; recall that this thing has at most $k + 1$ nonzero terms. Now I claim moreover that we have the divisibility $X + Y \mid W_D(X) + W_D(Y)$ in $K[X, Y]$. To prove this, note the factorization
\begin{equation}W_D(X) + W_D(Y) = \prod_{u \in D} (X + Y + u) = (X + Y) \cdot \prod_{u \neq 0} (X + Y + u).\end{equation}
The validity of the first equality is slightly subtle; it amounts to the standard fact that the subspace polynomial $W_D$ is $\mathbb{F}_2$-linear, so that $W_D(X) + W_D(Y) = W_D(X + Y) = \prod_{u \in D} (X + Y + u)$.

Assuming the above equality, I now claim:
\begin{equation}\delta_D(X, Y) = \frac{1}{c} \cdot \frac{W_D(X) + W_D(Y)}{X + Y} = \frac{1}{c} \cdot \prod_{u \in D \setminus \{0\}} (X + Y + u),\end{equation}
where $c := \prod_{u \in D \setminus \{0\}} u$ is a constant normalization factor that can be computed in advance. This thing is a well-defined _polynomial_ by the factorization just given. As before, the above polynomial restricts to the $\delta$-function on $D \times D$. (For unequal elements $x \neq y$ of $D$, $W_D(x) + W_D(y) = 0 + 0 = 0$, whereas $x + y \neq 0$; we conclude from the above factorization that $\delta_D(x, y) = 0$. On the other hand, for $x = y$ in $D$, $\delta_D(x, y) = \frac{1}{c} \cdot \prod_{u \in D \setminus \{0\}} u = 1$.) Moreover, this thing is of individual degree less than $2^k$ in both $X$ and $Y$. We conclude that it's the same polynomial as Angus's $\delta_D(X, Y)$.

**Remark.** In the special case that $D \subset K$ is a _subfield_, things become even nicer. (This in turn holds whenever $k$ divides the degree of $K$ over $\mathbb{F}_2$, and provided that the basis $(\beta_0, \ldots , \beta_{k - 1})$ is chosen sensibly. For $K$ a tower, this essentially amounts to requiring that $k$ is a power of 2, as well as using the multilinear basis to define $D$.) In this case, we have $c = 1$, thanks to [this fact](https://math.stackexchange.com/a/170614); moreover, $W_D(X) \in K[X]$ takes the especially simple form $X^{2^k} + X$.

### The protocol

Let's now fix a univariate polynomial $h(X)$ of degree less than $2^k$. The prover commits to the _multilinear_ $h'(X_0, \ldots , X_{k - 1})$ defined by the evaluation table $\left( h(\{u\}) \right)_{u \in \mathcal{B}_k}$. The verifier then samples the evaluation point $s'$. By all the above facts, the value the verifier's interested in learning is exactly:
\begin{equation}h(s') = \sum_{u \in \mathcal{B}_k} \delta_D(\{u\}, s') \cdot h(\{u\}) = \frac{1}{c} \cdot \sum_{u \in \mathcal{B}_k} \frac{W_D(\{u\}) - W_D(s')}{\{u\} - s'} \cdot h'(u).\end{equation}
Now on the one hand, $W_D(\{u\})$ is identically zero for $u \in \mathcal{B}_k$, so we can just nuke it entirely from the above expression. On the other hand, the map which, on the cube, sends $u \mapsto \{u\}$, is as multilinear as it gets (in fact it's linear); it has the succinct expression $e(X_0, \ldots , X_{k - 1}) := \sum_{j = 0}^{k - 1} X_j \cdot \beta_j$. So the sum we're left with is:
\begin{equation}h(s') = -\frac{W_D(s')}{c} \cdot \sum_{u \in \mathcal{B}_k} \frac{h'(u)}{e(u) - s'}. \end{equation}
What we're interested in learning is thus the sum over the cube of a perfectly good quotient of multilinears. This plugs directly into the fractional GKR protocol described in [[Papini and Haböck](https://eprint.iacr.org/2023/1284.pdf), § 3].
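Since everything here is concrete, a quick numerical sanity check is possible. The following is a minimal sketch in Python, under assumptions of my own choosing (a toy field $K = \mathrm{GF}(16)$ with reduction polynomial $x^4 + x + 1$, $k = 2$, and ad-hoc helpers `gf_mul`, `gf_inv`; none of this is from the note). It verifies both that the quotient form of $\delta_D$ restricts to the $\delta$-function on $D \times D$, and that the final evaluation identity for $h(s')$ agrees with direct Lagrange extrapolation:

```python
# Sanity check of the quotient form of delta_D and of the h(s') identity,
# over the toy field GF(16) = F_2[x] / (x^4 + x + 1). Elements are 4-bit
# integers; addition is XOR.
import itertools
import random

MOD = 0b10011  # x^4 + x + 1, irreducible over F_2

def gf_mul(a, b):
    """Carry-less multiplication, reduced modulo x^4 + x + 1."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0b10000:
            a ^= MOD
        b >>= 1
    return r

def gf_inv(a):
    """Inverse via a^(2^4 - 2) = a^(-1), for a != 0."""
    r, base, e = 1, a, 14
    while e:
        if e & 1:
            r = gf_mul(r, base)
        base = gf_mul(base, base)
        e >>= 1
    return r

beta = [1, 2]  # F_2-basis (beta_0, beta_1) of D; so k = 2
D = [(u[0] * beta[0]) ^ (u[1] * beta[1])
     for u in itertools.product([0, 1], repeat=2)]
# D = {0, 1, 2, 3}: an F_2-subspace of GF(16), but not a subfield, so c != 1.

def W_D(x):
    """Subspace vanishing polynomial W_D(x) = prod over u in D of (x - u)."""
    r = 1
    for u in D:
        r = gf_mul(r, x ^ u)
    return r

c = 1
for u in D:
    if u:
        c = gf_mul(c, u)  # c = product of the nonzero elements of D

def delta(x, y):
    """Quotient form: (1/c) * prod over nonzero u in D of (x + y + u)."""
    r = gf_inv(c)
    for u in D:
        if u:
            r = gf_mul(r, x ^ y ^ u)
    return r

# 1. The quotient form restricts to the delta-function on D x D.
for x in D:
    for y in D:
        assert delta(x, y) == (1 if x == y else 0)

# 2. The evaluation identity (the sign is immaterial in characteristic 2):
#    h(s') = (W_D(s') / c) * sum_{u in D} h(u) / (u + s'), for s' outside D,
#    where (h(u))_{u in D} is h's evaluation table, i.e., h' on the cube.
h_vals = {u: random.randrange(16) for u in D}

for s in range(16):
    if s in D:
        continue
    acc = 0
    for u in D:
        acc ^= gf_mul(h_vals[u], gf_inv(u ^ s))
    via_identity = gf_mul(gf_mul(W_D(s), gf_inv(c)), acc)
    # Cross-check against textbook Lagrange extrapolation over D.
    lag = 0
    for u in D:
        term = h_vals[u]
        for w in D:
            if w != u:
                term = gf_mul(term, gf_mul(s ^ w, gf_inv(u ^ w)))
        lag ^= term
    assert via_identity == lag

print("all checks passed")
```

Here $D$ is a subspace but not a subfield, so the $c \neq 1$ normalization is genuinely exercised.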
So after running the fractional sumcheck, which takes $O(k)$ work for the verifier, the verifier will be left with the task of learning $W_D(s')$, which it can do locally in $O(k)$ work, as well as querying both $e(X)$ and $h'(X)$ at some _single_ random point $r'' \in K^k$ sampled during the fractional sumcheck. It can learn $e(r'')$ itself locally using $O(k)$ work. Finally, $h'(r'')$ will come from a query to the oracle committed by the prover.

## Application to Zerocheck

Here is how we can apply these ideas. For notation, let's fix an integer $k' \in \mathbb{N}$ such that $2^{k'} \geq d \cdot (2^k - 1)$, as well as an extended domain $D' \supset D$ of size $2^{k'}$.

### The prover's first step

The very first thing the prover will need to do is, for each $i \in \{0, \ldots , n - 1\}$ and each $v \in \mathcal{B}_{\nu - k}$, obtain the values $\hat{M}_i(y, v_0, \ldots , v_{\nu - k - 1})$ for $(d - 1) \cdot (2^k - 1)$ _further_ points $y \in K \setminus D$ (we are exploiting [Gru, § 3.2] in this count). Unwinding what this means, we see that it amounts to, again for each $i \in \{0, \ldots , n - 1\}$ and $v \in \mathcal{B}_{\nu - k}$, performing an LDE on the data $\left( M_i(u_0, \ldots , u_{k - 1}, v_0, \ldots , v_{\nu - k - 1}) \right)_{u \in \mathcal{B}_k}$; i.e., extrapolating this data. The first remark is that we can use [LCH14] for this, and extrapolate to the power-of-2-sized domain $D'$. Note, by the way, that the resulting extrapolations will also live over relatively small fields; indeed, their field of definition will be the larger of the $M_i$'s field of definition and that of the domain $D'$.

### The verifier's first step

At this point, the verifier will have to interpret the prover's data as the evaluations of _some_ univariate polynomial $h(X)$ of degree less than $2^{k'}$ on $D'$, which moreover vanishes on $D$; the verifier will then have to sample an element $s' \gets \mathcal{T}_\tau$ and determine $h(s')$. The naïve way of doing this would be for the verifier to do a standard barycentric extrapolation. This would take $\Theta(d \cdot 2^k)$ work.

In order to make this succinct, Angus and I had the following idea. The prover and verifier instead run Ulrich's protocol above. In particular, the prover must commit to the "virtual" univariate oracle $h(X)$; the verifier then learns its value at $s'$. From the point of view of the prover, this requires committing to a $k'$-variate multilinear $h'(X_0, \ldots , X_{k' - 1})$, as well as running fractional GKR on something of size $2^{k'}$. Finally, in order to check that $h(X)$ vanishes identically on $D \subset K$, the verifier may moreover query $h'(r'_0, \ldots , r'_{k - 1}, 0, \ldots , 0)$, for $r' \gets K^k$ random (here we use that the basis of $D'$ extends that of $D$). This is essentially the "Schwartz–Zippel test" for identical vanishing of multilinears; it proves that the partially evaluated multilinear $h'(X_0, \ldots , X_{k - 1}, 0, \ldots , 0)$ vanishes identically on $\mathcal{B}_k$, which is what we want to show.

### The verifier's last step

In the very last step, the verifier is interested in learning, for each $i \in \{0, \ldots , n - 1\}$, the value $\hat{M}_i(s', r'_0, \ldots , r'_{\nu - k - 1})$, where the values $(s', r'_0, \ldots , r'_{\nu - k - 1})$ are sampled during the (rectangular) sumcheck.
The observation of @jimpo-ulvt is that, for each $i \in \{0, \ldots , n - 1\}$, the desired virtual quantity $\hat{M}_i(s', r'_0, \ldots , r'_{\nu - k - 1})$ stands in relation to the committed multilinears $\left( M_i( u_0, \ldots , u_{k - 1}, r'_0, \ldots , r'_{\nu - k - 1}) \right)_{u \in \mathcal{B}_k}$ as:
\begin{equation}\hat{M}_i(s', r'_0, \ldots , r'_{\nu - k - 1}) = \sum_{u \in \mathcal{B}_k} \delta_D(\{u\}, s') \cdot M_i(u_0, \ldots , u_{k - 1}, r'_0, \ldots , r'_{\nu - k - 1}).\end{equation}
In the worst case, the verifier can always learn this right-hand quantity manually. We'll call this:

**Option 1.** The verifier, for each $i$, makes $2^k$ queries to the (classical, hypercube) oracle, in order to learn the values $\left( M_i(u_0, \ldots , u_{k - 1}, r'_0, \ldots , r'_{\nu - k - 1}) \right)_{u \in \mathcal{B}_k}$, and then finally does a Lagrange extrapolation of size $2^k$ at the point $s' \in K$ on this data.

Using barycentric extrapolation, this latter task should be doable in $\Theta(2^k)$ time (per $M_i$); on the other hand, getting the $2^k$ evaluations could be painful.

@jimpo-ulvt has noted that a sumcheck can also be run; that is, we can define
\begin{equation}S(Y_0, \ldots , Y_{k - 1}) := \tilde{\delta}_{D, s'}(Y_0, \ldots , Y_{k - 1}) \cdot M_i(Y_0, \ldots , Y_{k - 1}, r'_0, \ldots , r'_{\nu - k - 1}),\end{equation}
where $\tilde{\delta}_{D, s'}$ is the MLE of the function which, on the hypercube, acts by sending $u \mapsto \delta_D(\{u\}, s')$. The benefit of doing this is that we only need to evaluate each $M_i$ once, as opposed to $2^k$ times. Jim's approach is the following:

**Option 2.** Prover and verifier sumcheck $S(Y_0, \ldots , Y_{k - 1})$. In the very final step, the verifier will need to evaluate $\tilde{\delta}_{D, s'}(r''_0, \ldots , r''_{k - 1})$, for some point $(r''_0, \ldots , r''_{k - 1})$ sampled during the sumcheck. To do this, the verifier locally evaluates and collects the $2^k$ evaluations $\left( \tilde{\delta}_{D, s'}(u) \right)_{u \in \mathcal{B}_k} = \left( \delta_D(u, s') \right)_{u \in D}$. The desired final result can then be learned by tensor-expanding $(r''_0, \ldots , r''_{k - 1})$, and then taking the standard multilinear evaluation $\bigotimes_{j = 0}^{k - 1} ( 1 - r''_j, r''_j) \cdot \left( \delta_D(u, s') \right)_{u \in D}$.

Efficiency-wise, the verifier can learn the above dot-product using the [barycentric algorithm](https://people.maths.ox.ac.uk/trefethen/barycentric.pdf) (specifically, we can use the [division-free variant](https://gitlab.com/IrreducibleOSS/binius/-/blob/main/crates/core/src/polynomial/univariate.rs?ref_type=heads#L100-108)). Indeed, note that the desired dot-product is nothing other than the _Lagrange extrapolation_ of the _univariate_ polynomial which takes the values $\bigotimes_{j = 0}^{k - 1} ( 1 - r''_j, r''_j)$ over the domain $D$, at the point $s'$. Upon computing this vector of "evaluations", namely the tensor, which takes $O(2^k)$ time, the rest of the work is a "standard" barycentric extrapolation (in disguise), which again takes $O(2^k)$ time (assuming precomputation dependent on $D$). The sumcheck itself should be efficient for the verifier: just $O(k)$ work.

The third option is to note that this setting _again_ plugs exactly into the fractional GKR setting dealt with by Ulrich. Even better, we now have the relevant "multilinears" $M_i(Y_0, \ldots , Y_{k - 1}, r'_0, \ldots , r'_{\nu - k - 1})$ ready and committed to; indeed, we can evaluate these anywhere with one query, which is what we need.
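To spell out why this plugs in: by the interlude's manipulation, $\delta_D(\{u\}, s') = -\frac{W_D(s')}{c} \cdot \frac{1}{e(u) - s'}$ for $u \in \mathcal{B}_k$, so the right-hand side above becomes
\begin{equation}\hat{M}_i(s', r'_0, \ldots , r'_{\nu - k - 1}) = -\frac{W_D(s')}{c} \cdot \sum_{u \in \mathcal{B}_k} \frac{M_i(u_0, \ldots , u_{k - 1}, r'_0, \ldots , r'_{\nu - k - 1})}{e(u) - s'},\end{equation}
i.e., a sum over the cube of a quotient of multilinears. Here is a minimal sketch of how the instance is assembled (Python; `FractionalInstance`, `gf_add`, and `m_i_table` are hypothetical stand-ins of mine, and the fractional GKR of [Papini and Haböck] itself is left abstract):

```python
# Sketch: assembling the Option-3 fractional-sumcheck instance for a single
# oracle M_i. Names here are hypothetical stand-ins (not from the note):
# `m_i_table` holds the prover's values M_i(u, r') for u in B_k, and the
# run of fractional GKR itself is abstracted away.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class FractionalInstance:
    numerators: List[int]    # p(u) = M_i(u, r'),  u in B_k
    denominators: List[int]  # q(u) = e(u) - s',   u in B_k

def assemble_instance(k: int,
                      beta: List[int],       # F_2-basis of D
                      s_prime: int,
                      m_i_table: List[int],  # M_i(u, r') for u in B_k
                      gf_add: Callable[[int, int], int]) -> FractionalInstance:
    nums, dens = [], []
    for idx in range(2 ** k):
        # e(u) = sum_j u_j * beta_j, reading u off the bits of idx
        e_u = 0
        for j in range(k):
            if (idx >> j) & 1:
                e_u = gf_add(e_u, beta[j])
        nums.append(m_i_table[idx])
        dens.append(gf_add(e_u, s_prime))  # subtraction = addition in char 2
    return FractionalInstance(nums, dens)

# Toy usage (k = 2, addition = XOR): fractional GKR is then run on this
# instance, and the verifier scales the claimed sum by -W_D(s') / c.
inst = assemble_instance(2, beta=[1, 2], s_prime=7,
                         m_i_table=[5, 9, 3, 12], gf_add=lambda a, b: a ^ b)
assert inst.denominators == [7, 6, 5, 4]
```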
**Option 3.** Prover and verifier run fractional sumcheck on the $k$-variate multilinears $M_i(Y, r')$ and $e(Y) - s'$ as above. At the end, the verifier will have to make one query to $M_i(r'', r')$, as well as locally evaluate both $W_D(s')$ and $e(r'')$, where $r''$ is a random point sampled during the sumcheck. It can do all of this, including the fractional sumcheck, in $O(k)$ work.

## A discussion of costs

For parameters $k$ and $\nu$, the costs are essentially as follows. If we use one of the non-succinct approaches above, then there isn't much to say; the essential point is that we can't let $k$ go beyond something small (small enough that the verifier can do $\Theta(2^k)$ work).

If we do everything fully succinctly (that is, the prover and verifier use Ulrich's idea in both the first round and the last), then the cost tradeoff is essentially as follows. The prover needs to commit to an auxiliary multilinear of size $\Theta(d \cdot 2^k)$; the prover and verifier then need to do a fractional GKR (which looks like a number of sumchecks of degree 3) on a pair of multilinears of size $\Theta(d \cdot 2^k)$. The verifier's work is only $O(\log(d \cdot 2^k))$. At the end, they need to do a final further fractional GKR of size $2^k$, for each oracle $i$ (though these can probably be batched). The benefit is that the positive-indexed rounds $\{1, \ldots , k - 1\}$ get completely eliminated from the zerocheck. These rounds represent the bulk of the prover's work, since they need to be carried out over the big field.

### Should we take this to the extreme?

If we can make this succinct for the verifier, why not let $k = \nu$? This is a bit problematic, since we are now replacing a zerocheck of size $2^\nu$ by a fractional sumcheck of size $d \cdot 2^\nu$. _And yet_ in certain extreme cases, even this might make sense. The point is that the fractional GKR we run is going to involve a sumcheck with constraints of degree 3, which moreover are relatively simple. If the initial constraint is very complicated and the field is small, it might make sense to do an enormous amount of small-field constraint evaluations, followed by a larger sumcheck, but where the constraint is relatively simple. In any case, I suspect that the most practically useful value for $k$ will be around 4–6, at which point it's conceivable that the succinct tricks might not even be necessary.
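Finally, for a feel of the numbers in the cost discussion above, here is a throwaway parameter calculator implementing nothing beyond the sizes just quoted (the function name and dictionary keys are my own):

```python
import math

def skip_parameters(d: int, k: int, nu: int, n: int) -> dict:
    """Sizes implied by the discussion above, for constraint degree d,
    skip width k, nu total variables, and n committed multilinears."""
    # Smallest k' with 2^{k'} >= d * (2^k - 1); D' has size 2^{k'}.
    k_prime = max(k, math.ceil(math.log2(d * (2 ** k - 1))))
    return {
        "aux_multilinear_size": 2 ** k_prime,       # the committed h'
        "first_fractional_gkr_size": 2 ** k_prime,  # Theta(d * 2^k)
        "rounds_eliminated": k - 1,                 # rounds {1, ..., k - 1}
        "final_fractional_gkr_count": n,            # one per oracle; batchable
        "final_fractional_gkr_size": 2 ** k,
        "remaining_sumcheck_variables": nu - k,     # the rectangular part
    }

# E.g., d = 3, k = 5, nu = 20, n = 4: d * (2^k - 1) = 93 forces k' = 7.
print(skip_parameters(3, 5, 20, 4))
```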