# Understanding Binius, Part II

In the [previous article](https://hackmd.io/@l2iterative/binius), we discussed Ligero-Binius, the Binius-family proof system released in 2023 by the [Irreducible](https://www.irreducible.com/) team. This year, the Irreducible team released another Binius-family proof system, FRI-Binius, which is a direct competitor to [Plonky3](https://github.com/Plonky3/Plonky3), the proof system from Polygon Zero.

FRI-Binius improves over Ligero-Binius by having a smaller verification overhead, making it more suitable for recursive proof verification, while the prover efficiency remains competitive.

FRI-Binius consists of many tools that are being used in binary fields for the first time. As a result, it requires a very different introduction, and we need to provide the necessary background.

## Commit a polynomial

Let us start with the [FRI](https://drops.dagstuhl.de/storage/00lipics/lipics-vol107-icalp2018/LIPIcs.ICALP.2018.14/LIPIcs.ICALP.2018.14.pdf) protocol, which stands for "Fast Reed-Solomon Interactive Oracle Proofs of Proximity" (abbreviated as "Fast RS IOPP" and therefore FRI). Here, we will present the more modern version of FRI from [BaseFold](https://eprint.iacr.org/2023/1705.pdf), combined with the generalization in Binius.

Similar to our treatment of Ligero in [the previous article](https://hackmd.io/@l2iterative/binius), we will first talk about how one commits to a polynomial in FRI, and then how to open the polynomial at a specific point (which is more complicated). We now present the general version of commitment in FRI. The handling that is specific to binary fields will be discussed later.

<style>
img[src*='#center'] {
    display: block;
    margin: auto;
}
</style>

**Committing the polynomial into a single hash.** To commit a multilinear polynomial $t(\vec{x})$ that takes a vector $\vec{x}$ of length $\ell$, we first compute its evaluations over the boolean hypercube, which takes $\vec{x}$ from $(0, 0, 0, ..., 0)$ to $(1, 1, 1, ..., 1)$, covering all possible $\vec{x}$ using only 0 and 1.

![tx](https://hackmd.io/_uploads/BydQyLS71x.gif#center =35%x)

The evaluations are arranged into a row, as shown above in $2^\ell$ blue blocks. These evaluations uniquely define the multilinear polynomial $t(\vec{x})$.

Then, we apply a "foldable" error-correcting code. We will discuss what foldability means later; for now we just treat it as a regular error-correcting code. This creates another polynomial $f(\vec{x})$ that takes a vector $\vec{x}$ of length $\ell + \mathcal{R}$, where we call the nonnegative integer $\mathcal{R}$ the log inverse rate. In other words, this polynomial is now $2^\mathcal{R}$ times the size of $t(\vec{x})$, but it can be decoded back to $t(\vec{x})$. In practice, $\mathcal{R}$ can be 1, 2, or 4, meaning that $f(\vec{x})$ at least doubles in size compared with $t(\vec{x})$.

![encode](https://hackmd.io/_uploads/HJnROvSm1g.png#center =60%x)

The commitment of the polynomial $t(\vec{x})$ is the Merkle tree root hash of the $2^{\ell+\mathcal{R}}$ entries of $f(\vec{x})$ shown above. Compared with Ligero, FRI does not split the $2^{\ell}$ evaluations over the boolean hypercube into a matrix of $2^{\ell_0} \times 2^{\ell_1}$, but instead encodes all the $2^{\ell}$ elements directly.
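To make the data flow concrete, here is a minimal sketch of the commitment step. It is illustrative only: `encode` is a stand-in repetition code that merely shows the $2^\mathcal{R}$ blowup (the real scheme requires a foldable Reed-Solomon-style code over a binary field), SHA-256 stands in for the Merkle hash, and all names are ours, not from the Binius codebase.

```python
import hashlib

def encode(t_evals, log_inv_rate):
    # Stand-in for the foldable error-correcting code: a trivial repetition
    # code that only illustrates the 2^R size blowup. The real construction
    # uses a Reed-Solomon-style code over a binary tower field.
    return t_evals * (1 << log_inv_rate)

def merkle_root(leaves):
    # Hash every leaf, then repeatedly pair-and-hash up to a single root
    # (the leaf count is a power of two here, so no padding is needed).
    level = [hashlib.sha256(leaf).digest() for leaf in leaves]
    while len(level) > 1:
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0]

def commit(t_evals, log_inv_rate):
    # t_evals: the 2^l evaluations of t over the boolean hypercube,
    # modeled as small integers; the commitment is one hash.
    f_evals = encode(t_evals, log_inv_rate)            # 2^(l + R) entries
    return merkle_root([e.to_bytes(16, "little") for e in f_evals])

root = commit(list(range(8)), 1)                       # l = 3, R = 1
```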
Encoding the whole vector at once, rather than as a matrix, results in two differences between Ligero and FRI:

- If the error-correcting code is a Reed-Solomon code, Ligero is expected to be about $2\times$ faster in the commitment phase, because Reed-Solomon encoding overhead is $O(n\log n)$, and Ligero's ability to encode smaller chunks of size $2^{\ell_1}\approx \sqrt{2^{\ell}}$ reduces its overall encoding overhead. This, however, does not hold if one uses an error-correcting code with $O(n)$ encoding time, but such codes, often built from expander graphs, seem to lack the nice folding structure that we desire.
- As we will show in the opening protocol, FRI's proof size is smaller than that of Ligero, because FRI does not need to send an entire row (of size $2^{\ell_1}$) or a few columns (each of size $2^{\ell_0}$), which amounts to $O(\lambda\cdot \sqrt{2^{\ell}})$, but only the Merkle tree paths to a few locations of the encoded vector, which amounts to $O(\lambda\cdot \ell)$.

**Opening the polynomial at a point $\vec{\beta}$.** Opening the polynomial at a point $\vec{\beta}$, a vector of length $\ell$, is equivalent to computing a linear combination over the evaluations of $t(\vec{x})$ on the boolean hypercube (i.e., the blue blocks in the animations above). Specifically, define a function $W_\beta(\vec{x})$ that gives the weight for each $\vec{x}\in\{0,1\}^\ell$ on the boolean hypercube.

$$W_\beta(\vec{x}) = \prod_{i=0}^{\ell-1}\left((1-\beta_i)(1-x_i) + \beta_i\cdot x_i\right)$$

The desired evaluation $t(\vec{\beta})$ is exactly the inner product (aka the dot product) of $t(\vec{x})$ and $W_\beta(\vec{x})$ over the evaluations on the boolean hypercube.

$$t(\vec{\beta}) = \sum_{\vec{x}\in\{0,1\}^\ell} t(\vec{x})\cdot W_\beta(\vec{x})$$

This linear-combination computation is illustrated in the following animation.

![tr](https://hackmd.io/_uploads/rJ-dZaHX1l.gif#center =35%x)

One can think of this evaluation by linear combination as "playing the piano". If the $2^{\ell}$ blue blocks constitute the keys of a giant "piano", then the evaluation is done by pressing all the keys on the piano at once, each with a different force (aka the weight $W_\beta(\vec{x})$), creating a chord that represents $t(\vec{\beta})$.

So far, we know how $t(\vec{\beta})$ can be computed by the prover, but we have not shown how a prover can convince a verifier about the value of $t(\vec{\beta})$ without sending the entire $t(\vec{x})$ polynomial to the verifier. BaseFold and Binius present an idea that, to check $t(\vec{\beta})$, we can instead check the sum of a special polynomial $h(\vec{x})$.

$$h(\vec{x}) = t(\vec{x})\cdot W_\beta(\vec{x})$$

By definition, the sum of $h(\vec{x})$ over the boolean hypercube is $t(\vec{\beta})$.

$$\sum_{\vec{x}\in\{0,1\}^\ell} h(\vec{x}) = t(\vec{\beta})$$

And this is exactly what the verifier wants to know. Now, for the prover to convince the verifier about $t(\vec{\beta})$, the prover can instead show that the sum of $h(\vec{x})$ equals the claimed value of $t(\vec{\beta})$.
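Before moving on to how this sum is proven, here is a minimal sketch of the weight function and the inner-product evaluation. For readability the arithmetic is over a toy prime field rather than a binary tower field (Binius works over the latter); the names are ours.

```python
P = 2**61 - 1  # toy prime field for readability; Binius uses binary towers

def w_weight(beta, x_bits):
    # W_beta(x) = prod_i ((1 - beta_i)(1 - x_i) + beta_i * x_i):
    # the i-th factor is beta_i if x_i = 1, and 1 - beta_i if x_i = 0.
    w = 1
    for b, x in zip(beta, x_bits):
        w = w * (b if x else (1 - b) % P) % P
    return w

def evaluate(t_evals, beta):
    # t(beta) = sum over the hypercube of t(x) * W_beta(x).
    # Convention: x_0 is the most significant bit of the table index.
    ell = len(beta)
    acc = 0
    for idx, tv in enumerate(t_evals):
        x_bits = [(idx >> (ell - 1 - i)) & 1 for i in range(ell)]
        acc = (acc + tv * w_weight(beta, x_bits)) % P
    return acc

# Opening at a boolean point just reads off the table entry:
assert evaluate([3, 1, 4, 1], [0, 1]) == 1   # t(0, 1) = entry at index 0b01
```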
And it reduces the dimensionality of the boolean hypercube one by one, until the sum becomes trivial, as illustrated below. $$\sum_{\vec{x}\in\{0,1\}^\ell}\rightarrow \sum_{\vec{x}\in\{0,1\}^{\ell-1}}\rightarrow \sum_{\vec{x}\in\{0,1\}^{\ell-2}}\rightarrow\cdots\rightarrow \sum_{\vec{x}\in\{0,1\}^{2}}\rightarrow \sum_{\vec{x}\in\{0,1\}}$$ We start with the first question that the verifier asks the prover, which is about $h(\vec{x})$ and a claimed value $s_0$ from the prover. The question is whether they are equal. $$\text{Question 1:}~\sum_{\vec{x}\in\{0,1\}^\ell} h(\vec{x}) \stackrel{?}{=} s_0$$ If so, then by definition, $s_0 = t(\vec{\beta})$. The sumcheck protocol aims to reduce this question to another question with lower dimensionality. Observe that, by definition, one can split the sums over an $\ell$-dimension boolean hypercube into two sums, each over a smaller, $(\ell-1)$-dimension boolean hypercube. $$\sum_{\vec{x}\in\{0,1\}^\ell} h(\vec{x}) = h_0 + h_1$$ where $h_0 = \sum_{\vec{x}\in\{0,1\}^{\ell-1}} h(0\parallel \vec{x})$ and $h_1=\sum_{\vec{x}\in\{0,1\}^{\ell-1}}h(1\parallel\vec{x})$. Here $\parallel$ denotes concatenation, here appending 0 or 1 at the beginning of a vector. In other words, the boolean hypercube is split by half in the first dimension, and the sum can be seen as adding the individual sums of the two sub-dimensions. In the sumcheck protocol, the prover tells the verifier $h_0$ and $h_1$, and the verifier will check if $h_0+h_1=s_1$. If so, the verifier chooses a random number $r_0$ from a large enough field and interpolates another sum: $$s_1 = (1-r_0)\cdot h_0 + r_0\cdot h_1$$ and the verifier will then ask the prover another question. $$\text{Question 2:}~\sum_{\vec{x}\in\{0,1\}^{\ell-1}}h(r_0\parallel\vec{x}) \stackrel{?}{=} s_1$$ And note that the dimensionality has been reduced by 1. Repeat the same process above. We can similarly sample another random number $r_1$ and have the next question where the dimensionality is again reduced by 1. $$\text{Question 3:}~\sum_{\vec{x}\in\{0,1\}^{\ell-2}}h(r_0, r_1\parallel\vec{x}) \stackrel{?}{=} s_2$$ Continue with the protocol until the entire hypercube vanishes. We end up with the final claim, which is no longer a sum but a very specific point on $h(\vec{x})$. $$\text{Question $\ell + 1$:}~h(r_0, r_1, ..., r_{\ell-1}) = h(\vec{r}) \stackrel{?}{=} s_\ell$$ Since $h(\vec{x}) = t(\vec{x}) \cdot W_\beta(\vec{x})$ and the verifier can compute $W_\beta(\vec{r})$ efficiently, this question can be further reduced into: $$\text{Final Question:}~t(r_0, r_1, ..., r_{\ell-1}) = t(\vec{r}) \stackrel{?}{=} s^*$$ where $s^*$ is $s_\ell$ divided by $W_\beta(\vec{r})$. In summary, our original goal is to open the polynomial $t(\vec{x})$ at a point $\vec{\beta}$, and we convert this into a question related to $h(\vec{x}) = t(\vec{x})\cdot W_\beta(\vec{x})$. We then perform the sumcheck protocol over $h(\vec{x})$, ending up with a leftover task of opening $t(\vec{x})$ at the random point $\vec{r}$. The interaction between the prover and the verifier can be summarized in the following animation. ![sumcheck-seq](https://hackmd.io/_uploads/rk2tLyI7yg.gif#center =40%x) So far it seems that sumcheck is not making any progress because we still need to open a point, just a different one: not $\vec{\beta}$, but $\vec{r}$. Before we dive into how to open at $\vec{r}$, we want to remark that this protocol can batch a number of points together into a single random point. 
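Before turning to batching, here is a minimal sketch of the sumcheck rounds, continuing the toy prime-field setting from the sketch above. It mirrors the simplified presentation here (prover sends $h_0$ and $h_1$ per round); real protocols also derive the randomness via Fiat-Shamir rather than a local `random` call.

```python
import random

P = 2**61 - 1  # same toy prime field as before

def sumcheck_round(h_evals, claim):
    # One round: the prover sends h0 and h1, the verifier checks the claim,
    # samples r, and both sides move to the (l-1)-dimension hypercube.
    # Convention: the first variable is the most significant bit, so
    # h(0 || x) is the first half of the table and h(1 || x) the second.
    half = len(h_evals) // 2
    h0 = sum(h_evals[:half]) % P
    h1 = sum(h_evals[half:]) % P
    assert (h0 + h1) % P == claim, "verifier rejects"
    r = random.randrange(P)                       # Fiat-Shamir in practice
    folded = [((1 - r) * a + r * b) % P           # table of h(r || x)
              for a, b in zip(h_evals[:half], h_evals[half:])]
    next_claim = ((1 - r) * h0 + r * h1) % P
    return folded, next_claim

h = [3, 1, 4, 1, 5, 9, 2, 6]                      # l = 3
claim = sum(h) % P                                # the prover's s_0
while len(h) > 1:
    h, claim = sumcheck_round(h, claim)
assert h[0] == claim                              # Question l+1 checks out
```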
Say that we want to open $t(\vec{x})$ at three points $\vec{\beta}$, $\vec{\gamma}$, and $\vec{\delta}$. We can build one single $h(\vec{x})$:

$$h(\vec{x}) = t(\vec{x})\cdot (W_\beta(\vec{x}) + \eta_1\cdot W_\gamma(\vec{x}) + \eta_2 \cdot W_\delta(\vec{x}))$$

where $\eta_1$ and $\eta_2$ are random numbers sampled by the verifier. The sum of $h(\vec{x})$ over the boolean hypercube is expected to be $t(\vec{\beta}) + \eta_1\cdot t(\vec{\gamma}) + \eta_2 \cdot t(\vec{\delta})$. Interestingly, although we start with three points, the sumcheck protocol still reduces everything to $t(\vec{r})$ for a single random point $\vec{r}$.

## A seemingly useless protocol

Before we unveil the secret method to open $t(\vec{x})$ at $\vec{r}$, we need to first present a seemingly useless protocol from BaseFold, which will serve as the foundation.

**The seemingly useless protocol.** There is a seemingly useless protocol that enables the verifier to learn a point $\vec{r'}$ of $t(\vec{x})$, but the verifier cannot choose a specific $\vec{r'}$. In fact, for the protocol to work securely, the verifier needs to choose $\vec{r'}$ completely randomly. This is essentially like opening a mystery box. Every time the verifier runs the protocol, it learns a point of the polynomial, but it cannot pick the point.

![mystery-box](https://hackmd.io/_uploads/H1s59JIQkg.gif#center =40%x)

At a high level, the protocol "folds" the polynomial $t(\vec{x})$ $\ell$ times, each time using a fresh random number $r'_i$. Each folding step reduces the dimensionality of the polynomial by 1, and therefore, after $\ell$ rounds of folding, the polynomial $t(\vec{x})$ becomes a constant, which happens to be $t(\vec{r'})$. The verifier checks that the prover has performed the folding correctly and honestly. Since the numbers $\{r'_i\}_{i=0, 1, ..., \ell-1}$ used in the folding steps must be random, this protocol, as mentioned above, only allows the verifier to learn $t(\vec{r'})$ for a random point, not a predetermined one.

**Folding.** How does folding work? For a polynomial $g^{(0)}(\vec{x})\rvert_{\vec{x}\in\{0,1\}^\ell}$, one can fold it into a smaller polynomial $g^{(1)}(\vec{x})\rvert_{\vec{x}\in\{0,1\}^{\ell-1}}$ that takes a shorter $\vec{x}$ as input. We want the folding to be done in such a way that the folded polynomial has some meaningful connection to the original polynomial. Folding can be done many times, and each time it reduces the dimensionality by one.

$$g^{(0)}(\vec{x})\rvert_{\vec{x}\in\{0,1\}^{\ell}}\rightarrow g^{(1)}(\vec{x})\rvert_{\vec{x}\in\{0,1\}^{\ell-1}}\rightarrow\cdots\rightarrow g^{(\ell-1)}(\vec{x})\rvert_{\vec{x}\in\{0,1\}}\rightarrow g^{(\ell)}$$

If we fold $g^{(0)}(\vec{x})$ $\ell$ times, we end up with $g^{(\ell)}$, which is just a constant and has no free variable. In our protocol, we do not fold $t(\vec{x})$ directly; instead, we fold its encoded form, $f(\vec{x})$, in a special way that makes the folded result meaningful.
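To see why folding with random numbers lands exactly on $t(\vec{r'})$, here is a minimal sketch of the plain (unencoded) fold over the evaluation table, reusing `evaluate` and the toy prime field from the earlier sketches. The actual protocol folds the encoded $f(\vec{x})$ instead, which is exactly what requires a foldable code.

```python
P = 2**61 - 1  # same toy prime field as before

def fold_once(g_evals, r):
    # g^{(i+1)}(x) = (1 - r) * g^{(i)}(0 || x) + r * g^{(i)}(1 || x)
    half = len(g_evals) // 2
    return [((1 - r) * a + r * b) % P
            for a, b in zip(g_evals[:half], g_evals[half:])]

def fold_to_point(t_evals, rs):
    # Fold l times; the surviving constant is exactly t(r'_0, ..., r'_{l-1}).
    g = list(t_evals)
    for r in rs:
        g = fold_once(g, r)
    return g[0]

t = [3, 1, 4, 1, 5, 9, 2, 6]                      # l = 3
rs = [7, 11, 13]                                  # the random r'_i
assert fold_to_point(t, rs) == evaluate(t, rs)    # same as a direct opening
```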
**Foldable error-correcting code.** Let $f^{(0)}(\vec{x}) = f(\vec{x})$ be the encoded form of $t^{(0)}(\vec{x}) = t(\vec{x})$. We require a foldable error-correcting code with a folding algorithm such that, if we fold $f^{(0)}(\vec{x})\rvert_{\vec{x}\in\{0,1\}^{\ell+\mathcal{R}}}$ using the random number $r'_0$, we obtain another foldable encoded form $f^{(1)}(\vec{x})\rvert_{\vec{x}\in\{0,1\}^{\ell+\mathcal{R}-1}}$ which is exactly the encoded form of $t^{(1)}(\vec{x})\rvert_{\vec{x}\in\{0,1\}^{\ell-1}}$, defined as follows:

$$t^{(1)}(\vec{x}) = t(r'_0\parallel \vec{x})$$

The verifier is the one who chooses the random numbers $\{r_i'\}_{i=0,1,...,\ell-1}$ and sends them to the prover, who performs the folding. After $\ell$ rounds, we have $f^{(\ell)}(\vec{x})\rvert_{\vec{x}\in\{0,1\}^\mathcal{R}}$, which is supposed to be a constant function, and this constant should equal:

$$t^{(\ell)} = t(r'_0, r'_1, ..., r'_{\ell-1}) = t(\vec{r'})$$

The verifier then just needs to verify that the prover has performed the folding correctly. This is done by having the prover commit to the intermediate results of folding and allow the verifier to check random locations of these intermediate results, thereby making the folding verifiable.

**Verifying the folding.** The folding verification follows the low-degree testing protocol in FRI. To make folding verifiable, in each step of folding, the prover creates a Merkle tree of the intermediate folded result $f^{(i)}(\vec{x})$ and sends the corresponding Merkle tree root hash to the verifier. After all the $\ell$ steps are done, the verifier receives the claimed value of $t(\vec{r'})$. The verifier wants to check that:

- each step of folding is correct
- after $\ell$ steps of folding, $f^{(\ell)}(\vec{x})\rvert_{\vec{x}\in\{0,1\}^\mathcal{R}}$ is a constant function
- this constant is the claimed value of $t(\vec{r'})$

This is done by having the verifier choose random locations $q_1$, $q_2$, ..., $q_k$, all in $\{0,1\}^{\ell+\mathcal{R}}$. For each location $q_i\in\{0,1\}^{\ell+\mathcal{R}}$, the prover opens the Merkle trees at the positions related to that location, across all the steps, and the verifier checks that the folding has been performed correctly there and that the chain ends at the claimed $t(\vec{r'})$. We want the folding scheme to have good locality, so that this verification only needs to access a small portion of each Merkle tree and the corresponding intermediate results. Any folding scheme that satisfies these requirements can be used.

Overall, this seemingly useless protocol has the following interaction between the prover and the verifier (folding verification not included). The random numbers that the verifier chooses, $\{r_i'\}_{i=0,1,...,\ell-1}$, determine the constant after $\ell$ steps of folding, which is $t(\vec{r'})$.

![seemingly-seq](https://hackmd.io/_uploads/rktWtl8Xke.gif#center =40%x)

But the seemingly useless protocol still seems useless: the sumcheck protocol reduces to opening $t(\vec{x})$ at a random point $\vec{r}$ chosen by the verifier, and the present protocol reduces to opening $t(\vec{x})$ at another random point $\vec{r'}$, also chosen by the verifier. Is there a way to make $\vec{r} = \vec{r'}$, so that this protocol answers the leftover question from the sumcheck protocol and concludes the entire polynomial opening? It turns out the answer is yes.

## Stick the two protocols together

Note that in both protocols, $\vec{r}$ and $\vec{r'}$ are chosen completely randomly by the verifier. The verifier can, in fact, flip coins in open air to decide $\vec{r}$ and $\vec{r'}$, as long as they are sufficiently random to achieve the desired level of security.
The verifier's coin tosses do not need to depend on anything else. Such protocols are formally called [public-coin](https://en.wikipedia.org/wiki/Interactive_proof_system#Public_coin_protocol_versus_private_coin_protocol) protocols. The two protocols here, sumcheck and folding, also do not depend on each other. Note that sumcheck starts with the prover's claim about $t(\vec{\beta})$, while folding is independent of $\vec{r}$ and only involves the commitment to the encoded form $f(\vec{x})$.

Therefore, we can stick these two protocols together so that they share the same random numbers, i.e., $\vec{r} = \vec{r'}$, resulting in the following combined protocol. BaseFold and FRI-Binius both provide a formal proof showing that both protocols remain secure under this combination.

![combined-seq](https://hackmd.io/_uploads/BkV3gqLXJl.gif#center =45%x)

What interests us is that this combined protocol reduces the sumcheck to opening $t(\vec{r})$ and, at the same time, makes the final folded result $t(\vec{r})$ available. This allows us to prove the original statement about $t(\vec{\beta})$, which finishes the polynomial opening protocol.

## Reconcile FRI with small fields

So far we have covered the general idea of FRI. Before Binius, anyone who wanted to use FRI over small fields such as $\mathbb{F}_2$ would encounter a few challenges.

- FRI requires some sort of foldable error-correcting code, and we often desire Reed-Solomon codes due to a number of benefits, but, as in Ligero-Binius, $\mathbb{F}_2$ does not have meaningful Reed-Solomon codes. Simply packing, for example, $2^k$ $\mathbb{F}_2$ elements into $\mathbb{F}_{2^k}$ to allow a Reed-Solomon code, as we did in Ligero-Binius, is still not enough for the full proof system, because we need a special folding structure that allows us to obtain $t(\vec{r})$.
- To achieve a sufficient level of security, $\vec{r}$ needs to be random enough, and therefore its coordinates need to come from a large enough field. This means that the computation will involve a lot of small-large field multiplications.

Recall that in Ligero-Binius, the approach is to use a larger extension of $\mathbb{F}_2$, called towers, and every 16 elements are packed together before encoding. FRI-Binius follows a similar approach and packs every 128 elements of $\mathbb{F}_2$ into a single element of $\color{red}{\tau_7'}$, and the Reed-Solomon code is defined over this field. $\color{red}{\tau_7'}$ has the same structure as $\color{blue}{\tau_7}$, but we want to distinguish the two for now, considering $\color{red}{\tau_7'}$ the "packed" representation. This creates another polynomial $\color{red}{t'}(\vec{x})\rvert_{\vec{x}\in\{0,1\}^{\ell-7}}$ whose evaluations are in $\color{red}{\tau'_7}$ and are defined over a boolean hypercube of dimensionality $(\ell-7)$.

The point $\color{blue}{\vec{\beta}}$ in Binius will be sampled from $\color{blue}{\tau_7}$. One can consider $\color{blue}{\tau_7}$ as the field extension using symbols $X_6, X_5, ..., X_0$.

$$\begin{array}{l}\color{blue}{\tau_0} = \mathbb{F}_2\\\color{blue}{\tau_1} = \color{blue}{\tau_0}[X_0] / (X_0^2 + X_0 + 1)\cong\mathbb{F}_{2^2}\\\color{blue}{\tau_2} = \color{blue}{\tau_1}[X_1]/(X_1^2 + X_0\cdot X_1 + 1)\cong \mathbb{F}_{2^4}\\...\\\color{blue}{\tau_7}=\color{blue}{\tau_6}[X_6]/(X_6^2+X_5\cdot X_6 + 1)\cong \mathbb{F}_{2^{128}}\end{array}$$

And $\color{red}{\tau_7'}$ is the same field extension using symbols $Y_6, Y_5, ..., Y_0$; it is intended for the packed representations used in the error-correcting code.

$$\begin{array}{l}\color{red}{\tau'_0} = \mathbb{F}_2\\\color{red}{\tau'_1} = \color{red}{\tau'_0}[Y_0] / (Y_0^2 + Y_0 + 1)\cong\mathbb{F}_{2^2}\\\color{red}{\tau'_2} = \color{red}{\tau'_1}[Y_1]/(Y_1^2 + Y_0\cdot Y_1 + 1)\cong \mathbb{F}_{2^4}\\...\\\color{red}{\tau'_7}=\color{red}{\tau'_6}[Y_6]/(Y_6^2+Y_5\cdot Y_6 + 1)\cong \mathbb{F}_{2^{128}}\end{array}$$
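This tower structure admits a simple recursive multiplication algorithm. Below is a minimal sketch (ours, not the Binius implementation): a $\tau_k$ element is a $2^k$-bit integer whose low and high halves are its two $\tau_{k-1}$ coordinates, and the reduction uses the defining relation that each generator squares to (previous generator) $\times$ itself $+~1$.

```python
def tower_mul(a, b, k):
    # Multiply two elements of tau_k, each encoded as a 2^k-bit integer
    # (low half = constant coordinate, high half = top-generator coordinate).
    if k == 0:
        return a & b                        # F_2: multiplication is AND
    half = 1 << (k - 1)                     # bits per tau_{k-1} coordinate
    mask = (1 << half) - 1
    a0, a1 = a & mask, a >> half
    b0, b1 = b & mask, b >> half
    z0 = tower_mul(a0, b0, k - 1)
    z2 = tower_mul(a1, b1, k - 1)
    # Karatsuba: a0*b1 + a1*b0 (addition in characteristic 2 is XOR)
    z1 = tower_mul(a0 ^ a1, b0 ^ b1, k - 1) ^ z0 ^ z2
    # Reduce with X^2 = X_prev * X + 1, where X_prev is the generator one
    # level below (X_prev := 1 at the bottom level).
    x_prev = 1 if k == 1 else 1 << (half >> 1)
    hi = z1 ^ tower_mul(z2, x_prev, k - 1)
    lo = z0 ^ z2
    return lo | (hi << half)

# In tau_1 (= F_4, with 0b10 encoding X_0): X_0^2 = X_0 + 1 = 0b11.
assert tower_mul(0b10, 0b10, 1) == 0b11
# tau_7 elements are 128-bit integers; X_6 * X_6 = X_5 * X_6 + 1:
assert tower_mul(1 << 64, 1 << 64, 7) == (1 << 96) | 1
```

In real implementations, multiplying a subfield element by a $\tau_7$ element is much cheaper than a full $\tau_7$-by-$\tau_7$ multiplication; this asymmetry is exactly what the comparison section later relies on.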
Now, to evaluate $t(\color{blue}{\vec{\beta}})$ with $\color{blue}{\vec{\beta}}\in(\color{blue}{\tau_7})^\ell$, we first remove the first 7 elements of $\color{blue}{\vec{\beta}}$ and obtain a truncated version $\color{blue}{\vec{\beta'}}\in({\color{blue}{\tau_7}})^{\ell-7}$. We compute $\color{red}{t'}(\color{blue}{\vec{\beta'}})$ using linear combinations as usual, but now it involves two different fields: over the boolean hypercube, $\color{red}{t'}(\vec{x})$ takes values in $\color{red}{\tau_7'}$, while $\color{blue}{W_{\beta'}}(\vec{x})$ takes values in $\color{blue}{\tau_7}$.

$$\color{red}{t'}(\color{blue}{\vec{\beta'}}) = \sum_{\vec{x}\in\{0,1\}^{\ell-7}} \color{red}{t'}(\vec{x})\cdot \color{blue}{W_{\beta'}}(\vec{x})$$

The resulting $\color{red}{t'}(\color{blue}{\vec{\beta'}})$ lives in [an algebra over the field](https://en.wikipedia.org/wiki/Algebra_over_a_field) that has $128\times 128$ elements and involves both $X$ and $Y$ symbols. It is easier to express $\color{red}{t'}(\color{blue}{\vec{\beta'}})$ as a matrix:

![matrix2](https://hackmd.io/_uploads/SJpe_Y0Q1g.png#center =50%x)

This matrix has 128 rows and 128 columns, 16384 cells in total. Each cell is just a single bit in $\mathbb{F}_2$.

Given this matrix from the prover, the verifier can compute $t(\color{blue}{\vec{\beta}})$ from $\color{red}{t'}(\color{blue}{\vec{\beta'}})$ by applying a linear combination over the columns. The weight applied to the $q$-th column is $\prod_{i=0}^{6} \left((1-\color{blue}{\beta_i})(1-q_i) + \color{blue}{\beta_i}\cdot q_i\right)$, where $\{q_i\}_{i=0,1,...,6}$ is the bit representation of $q$. It is easy to see that the result of this linear combination is exactly $t(\color{blue}{\vec{\beta}})$, defined over $\color{blue}{\tau_7}$. It is not in $\color{red}{\tau_7'}$, because the packing has been removed.

Alternatively, one can treat such a linear combination as replacing $Y_0$, $Y_1$, ..., $Y_6$ with the following values, summing the columns together, and multiplying the sum by $\prod_{i=0}^{6}(1-\color{blue}{\beta_i})$.

$$\begin{array}{cccc}Y_0 = \frac{\color{blue}{\beta_0}}{1-\color{blue}{\beta_0}} & Y_1 =\frac{\color{blue}{\beta_1}}{1-\color{blue}{\beta_1}} & Y_2 = \frac{\color{blue}{\beta_2}}{1-\color{blue}{\beta_2}} & Y_3 = \frac{\color{blue}{\beta_3}}{1-\color{blue}{\beta_3}}\\Y_4 = \frac{\color{blue}{\beta_4}}{1-\color{blue}{\beta_4}} & Y_5 = \frac{\color{blue}{\beta_5}}{1-\color{blue}{\beta_5}} & Y_6 = \frac{\color{blue}{\beta_6}}{1-\color{blue}{\beta_6}}\end{array}$$

Each replacement shrinks the number of columns by half, resulting in a single column, representing an element in $\color{blue}{\tau_7}$, which is $t(\color{blue}{\vec{\beta}})$.

![matrix2](https://hackmd.io/_uploads/SkXV290mJl.gif#center =50%x)
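Here is a minimal sketch of this column recombination, reusing `tower_mul` from the tower sketch above. Encoding the matrix as 128 column integers is our own illustrative choice, not a Binius data structure.

```python
def unpack_eval(columns, beta7):
    # columns[q]: the q-th column of the 128x128 bit matrix t'(beta'),
    # encoded as a 128-bit integer, i.e., an element of tau_7.
    # beta7: the first seven coordinates of beta, each an element of tau_7.
    # Returns t(beta) = sum_q weight_q * column_q, computed in tau_7.
    acc = 0
    for q, col in enumerate(columns):
        w = 1
        for i, b in enumerate(beta7):
            # (1 - b)(1 - q_i) + b * q_i equals b if q_i = 1, else 1 + b
            # (in characteristic 2, 1 - b = 1 + b = b XOR 1).
            factor = b if (q >> i) & 1 else b ^ 1
            w = tower_mul(w, factor, 7)
        acc ^= tower_mul(w, col, 7)         # addition in char 2 is XOR
    return acc

assert unpack_eval([0] * 128, [0] * 7) == 0   # trivial sanity check
```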
Note that the need to use both $\color{blue}{\tau_7}$ and $\color{red}{\tau'_7}$ arises exactly because we need the matrix representation of $\color{red}{t'}(\color{blue}{\vec{\beta'}})$ in order to recover $t(\color{blue}{\vec{\beta}})$ from the packed representation. If we instead set both in $\color{red}{\tau_7'}$, the resulting representation (which would be a single row) would not be sufficient to recover $t(\color{blue}{\vec{\beta}})$. Merely applying the weights to each column of that single row would not work, because $\color{blue}{\vec{\beta'}}$ mixes the different entries of the packed representation with each other, which breaks the packing. Using both $\color{blue}{\tau_7}$ and $\color{red}{\tau'_7}$ prevents $\color{blue}{\vec{\beta'}}$ from breaking the packed representations.

Now that the verifier can compute $t(\color{blue}{\vec{\beta}})$ from the $\color{red}{t'}(\color{blue}{\vec{\beta'}})$ provided by the prover, the remaining question is to check that this matrix, $\color{red}{t'}(\color{blue}{\vec{\beta'}})$, is correct. As we discussed above, we want to use the sumcheck protocol to convert this into finding $\color{red}{t'}(\vec{r})$ for some random $\vec{r}$, and the value of $\color{red}{t'}(\vec{r})$ can be found using the folding protocol that runs alongside. The sumcheck protocol starts with a polynomial $\color{purple}{h}(\vec{x})$ as follows:

$$\color{purple}{h}(\vec{x}) = \color{red}{t'}(\vec{x})\cdot \color{blue}{W_{\beta'}}(\vec{x})$$

And we consider its sum over the $(\ell-7)$-dimension boolean hypercube.

$$\sum_{\vec{x}\in\{0,1\}^{\ell-7}} \color{purple}{h}(\vec{x}) = \color{red}{t'}(\color{blue}{\vec{\beta'}})$$

The sumcheck protocol for this could work by choosing $\color{blue}{\vec{r}}\in(\color{blue}{\tau_7})^{\ell-7}$, but this immediately feels less efficient, because all the intermediate sums are $128\times128$ matrices. This also impacts the folding protocol: after folding $\color{red}{f^{(0)}}(\vec{x})$ with $r_0$, the folded result $\color{purple}{f^{(1)}}(\vec{x})$ consists of $\color{blue}{\tau_7}\times\color{red}{\tau'_7}$ matrices all over the boolean hypercube, and the overhead of committing to these matrices is basically 128 times the overhead of committing to $\color{blue}{\tau_7}$ or $\color{red}{\tau'_7}$ elements.

To avoid matrices, it is important to have $\color{red}{\vec{r}}$ in $(\color{red}{\tau_7'})^{\ell-7}$ instead of $(\color{blue}\tau_7)^{\ell-7}$. This immediately helps the folding protocol, which now only deals with $\color{red}{\tau_7'}$, and the blowup disappears. It does not, however, help the sumcheck. The sumcheck protocol still suffers from matrices, because each evaluation of $\color{purple}{h}(\vec{x})$ over the boolean hypercube $\{0,1\}^{\ell-7}$ is already a matrix. Remember that $\color{purple}{h}(\vec{x}) = \color{red}{t'}(\vec{x})\cdot \color{blue}{W_{\beta'}}(\vec{x})$.

- $\color{red}{t'}(\vec{x})$ has evaluations in $\color{red}{\tau_7'}$ over the boolean hypercube, and its evaluation at $\color{red}{\vec{r}}$ would also be in $\color{red}{\tau_7'}$, which is great.
- $\color{blue}{W_{\beta'}}(\vec{x})$ has evaluations in $\color{blue}{\tau_7}$ over the boolean hypercube because $\color{blue}{\vec{\beta'}}\in(\color{blue}{\tau_7})^{\ell-7}$, but this means that its evaluation at $\color{red}{\vec{r}}\in(\color{red}{\tau_7'})^{\ell-7}$ would live in $\color{blue}{\tau_7}\times\color{red}{\tau_7'}$, the matrices.

How do we get rid of the $\color{blue}{\tau_7}$ that comes from $\color{blue}{W_{\beta'}}(\vec{x})$? Remember that the sumcheck protocol, which we use to reduce $t(\vec{\beta})$ to $t(\vec{r})$, can also batch multiple points into one. For example, if we want to check $t(\vec{\beta})$, $t(\vec{\gamma})$, and $t(\vec{\delta})$, we can use randomizers $\eta_1$ and $\eta_2$ to batch them.
$$h(\vec{x}) = t(\vec{x})\cdot (W_\beta(\vec{x}) + \eta_1\cdot W_\gamma(\vec{x}) + \eta_2 \cdot W_\delta(\vec{x}))$$

And we are still able to reduce the sum of $h(\vec{x})$ over the boolean hypercube to calculating $t(\vec{r})$ (through folding) as well as $W_\beta(\vec{r})$, $W_\gamma(\vec{r})$, and $W_\delta(\vec{r})$ (which the verifier can compute directly). This inspires us to do the following.

Note that $\color{blue}{W_{\beta'}}(\vec{x})$ has evaluations in $\color{blue}{\tau_7}$ over the boolean hypercube. We can split $\color{blue}{W_{\beta'}}(\vec{x})$ into 128 polynomials, denoted by $\{W_{\beta'}^{(i)}(\vec{x})\}\rvert_{i=0,1,...,127}$, such that the evaluations of each of these polynomials over the boolean hypercube are all in $\mathbb{F}_2$. Doing so removes all the $X$ symbols. We can set $\color{red}{\eta_0}=1$, have the verifier sample 127 randomizers, say $\color{red}{\eta_1}, \color{red}{\eta_2}, ..., \color{red}{\eta_{127}}$, all in $\color{red}{\tau_7'}$, and consider the following construction of $\color{red}{h}(\vec{x})$.

$$\color{purple}{h}(\vec{x}) = \color{red}{t'}(\vec{x})\cdot \color{blue}{W_{\beta'}}(\vec{x}) ~~~\longrightarrow~~~\color{red}{h}(\vec{x}) = \color{red}{t'}(\vec{x})\cdot \sum_{i=0}^{127}\color{red}{\eta_i}\cdot W_{\beta'}^{(i)}(\vec{x})$$

The sumcheck for this new polynomial, which samples $\color{red}{\vec{r}}\in(\color{red}{\tau_7'})^{\ell-7}$, avoids $\color{blue}{\tau_7}$ entirely. To summarize,

- the reason that we have to use both $\color{blue}{\tau_7}$ and $\color{red}{\tau_7'}$ is that the polynomial $\color{red}{t'}(\vec{x})$ consists of packed elements. If we treated packed elements as regular $\color{blue}{\tau_7}$ elements and performed the multiplications directly, we would end up with a single $\color{blue}{\tau_7}$ element, which is insufficient for computing $t(\color{blue}{\vec{\beta}})$ out of $\color{red}{t'}(\color{blue}{\vec{\beta'}})$. Dedicating $\color{red}{\tau_7'}$ to packed representations allows us to "protect" the packing until we learn $t(\color{blue}{\vec{\beta}})$.
- the issue that $\color{blue}{\tau_7}$ and $\color{red}{\tau_7'}$ together make the sumcheck operate over a tensor product is alleviated by splitting the $\color{blue}{W_{\beta'}}(\vec{x})$ polynomial into a number of smaller polynomials and taking a random linear combination of them, allowing the entire computation to be over $\color{red}{\tau_7'}$.

Binius, instead of sampling $127$ random numbers $\color{red}{\eta_1}, \color{red}{\eta_2}, ..., \color{red}{\eta_{127}}$, shows that one can sample $7$ random elements $\color{red}{r_0''}, \color{red}{r_1''}, ..., \color{red}{r_6''}$ and compute:

$$\color{red}{\eta_q} = \prod_{i=0}^{6} \left((1-\color{red}{r''_i})(1-q_i) + \color{red}{r''_i}\cdot q_i\right)$$

where $\{q_i\}_{i=0,1,...,6}$ is the bit representation of $q$. Reducing the number of random elements sampled from a cryptographic hash function (modeled as a [random oracle](https://en.wikipedia.org/wiki/Random_oracle)) is important because, in the recursive proof generation setting, the verifier needs to invoke the hash function in order to generate these random numbers. Instead of generating $127$ random numbers independently, it often has a much lower recursive verification overhead to generate only $7$ numbers and then expand them through the tensor product.
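Here is a minimal sketch of this tensor expansion, again reusing `tower_mul` from the tower sketch above; the $\eta_q$ live in $\tau_7'$, represented as 128-bit integers, and the sample values are ours.

```python
def tensor_expand(r_primes):
    # Expand [r''_0, ..., r''_6] into the 128 weights eta_q by doubling:
    # each r'' splits every existing weight into a (1 + r'') copy (q_i = 0)
    # and an r'' copy (q_i = 1); note 1 - r = 1 + r = r XOR 1 in char 2.
    etas = [1]
    for r in r_primes:
        etas = [tower_mul(e, r ^ 1, 7) for e in etas] + \
               [tower_mul(e, r, 7) for e in etas]
    return etas

etas = tensor_expand([(1 << 64) | q for q in range(7)])   # toy r'' values
assert len(etas) == 128 and etas[0] != 0                  # 7 samples -> 128 weights
```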
One may ask why we break down $\color{blue}{W_{\beta'}}(\vec{x})$ instead of breaking down $\color{red}{t'}(\vec{x})$, the latter of which would allow the entire $\color{blue}{h}(\vec{x})$ to be trivially defined over $\color{blue}{\tau_7}$, letting us sample $\color{blue}{\vec{r}}\in(\color{blue}{\tau_7})^{\ell-7}$. This may even seem more natural: when we compute $t(\color{blue}{\vec{\beta}})$ from $\color{red}{t'}(\color{blue}{\vec{\beta'}})$, we apply linear combinations over the different columns, and breaking down $\color{red}{t'}(\vec{x})$ as suggested here is in essence similar to breaking it down into columns.

The issue with this other direction is that we would not be able to "break down and apply a random linear combination" on $\color{red}{t'}(\vec{x})$ when folding over the encoded form $\color{red}{f}(\vec{x})$. Folding relies on the linearity of the Reed-Solomon code: folding the codeword in a linear way performs the same linear folding over the message being encoded. This linearity, however, is over the entire packed element $\color{red}{\tau_7'}$ in $\color{red}{t'}(\vec{x})$, not over the individual columns of $\color{red}{\tau_7'}$. In other words, one can multiply a packed $\color{red}{\tau_7'}$ element with another $\color{red}{\tau_7'}$ element, which essentially applies the multiplication over each column of the packed element and sums the results together, because this counts as a linear operation over $\color{red}{t'}(\vec{x})$. But applying different weights to each of the columns, which is what the idea suggested here requires, is not a linear operation over $\color{red}{\tau_7'}$, and therefore the folding method, which operates over the encoded form, does not preserve the packing.

To put it more generally, although in Binius we apply the Reed-Solomon code over the packed representations, it is important to realize that this is not a match made in heaven. The Reed-Solomon code is linear over the packed representations and allows folding at the packed level, but the linearity does not penetrate to the individual elements inside a packed representation.

The other challenge with small fields, finding an error-correcting code that is foldable, is addressed in Binius through a careful design that leverages [an FFT-friendly characteristic-2 polynomial basis](https://ieee-focs.org/FOCS-2014-Papers/6517a316.pdf) by Sian-Jheng Lin, Wei-Ho Chung, and Yunghsiang S. Han. The Irreducible team found a novel way to fold over this polynomial basis. The specific construction is:

- place the packed $\color{red}{\tau_7'}$ elements as the coefficients of a polynomial of degree less than $2^{\ell-7}$, expressed under this polynomial basis
- use the FFT algorithm from the Lin-Chung-Han paper to encode the data
- each folding reduces the Lin-Chung-Han polynomial basis for degree $2^i$ exactly to the polynomial basis for $2^{i-1}$

The Irreducible team proves a few new results relating to the security of such folding.

## Comparison

We now compare FRI-Binius with Ligero-Binius, to show why, even with Ligero-Binius available, there is a need for FRI-Binius.

**Verifier work in Ligero-Binius.** Consider a polynomial of length $n=2^\ell$. In Ligero-Binius, the polynomial is expressed as a matrix of size $2^{\ell_0} \times 2^{\ell_1}$ (with $\ell = \ell_0 + \ell_1$).
The verifier needs to read a proof that consists of:

- the weighted sum of the rows, of size $O_\lambda(2^{\ell_1})$
- a number of columns for the consistency check, of size $O_\lambda(2^{\ell_0})$

where $O_\lambda(\cdot)$ is similar to the big-O notation but ignores polynomial factors in the security parameter $\lambda$, to avoid ambiguity. Choosing $\ell_0 = \ell_1$, the proof size is about $O_\lambda(\sqrt{2^\ell}) = O_\lambda(\sqrt{n})$. The verifier's computation is dominated by encoding the weighted sum, which is $O_\lambda(2^{\ell_1})$, and the consistency check, which is $O_\lambda(2^{\ell_0})$. If we choose $\ell_0 = \ell_1$, the total verifier computation complexity is similarly $O_\lambda(\sqrt{2^\ell}) = O_\lambda(\sqrt{n})$.

The issue with Ligero-Binius is that the verification overhead grows with the square root of the polynomial size, which can be significant in practice. Even if one wants to use recursive proof verification to reduce the complexity, the verifier of Ligero-Binius is still a headache. Many other zero-knowledge proof systems have already avoided this issue, and there is a desire to reduce the verifier overhead of Binius-family protocols. In particular, recursive proof verification is impacted by the large proof size because it needs to open the Merkle tree during the consistency check, and the verifier needs to hash $O(\sqrt{n})$ elements.

**Verifier work in FRI-Binius.** The good thing about FRI-based protocols is that it is straightforward to obtain smaller proofs and a faster verifier. The proof size is $O_\lambda(\ell) = O_\lambda(\log n)$, and the same applies to the verifier's work. However, although FRI-Binius is a clear winner in terms of verifier work, Ligero-Binius tends to be faster in proof generation.

**Committing a polynomial.** Both Ligero-Binius and FRI-Binius apply an error-correcting code to the evaluations of the polynomial in the packed representation, but Ligero-Binius does less encoding work, as we discussed. For a polynomial of length $2^\ell$, Ligero-Binius can split it into $\sqrt{2^\ell}$ FFTs, each of size $\sqrt{2^\ell}$, and end up doing the FFT work of:

$$2^\ell\cdot \log\sqrt{2^\ell} = \frac{1}{2}n\log n$$

while FRI-Binius needs to apply the error-correcting code over the entire polynomial at once, resulting in the FFT work of:

$$2^\ell\cdot \log 2^\ell = n\log n$$

The rest is to commit the codeword into a Merkle tree and obtain the Merkle tree root hash. If the hash function is Blake2/3 or SHA-256, this part can be very efficient, but if a SNARK-friendly hash function has to be used (for recursive proof generation), it can also be a significant part of the proof generation. Both Ligero-Binius and FRI-Binius have to commit the entire encoded result, so this cost is the same for both. In summary, Ligero-Binius is expected to have a faster commitment time.

**Opening the polynomial commitment.** We now study the overhead of opening the polynomial at a certain point. For Ligero-Binius, the only computation that the prover needs to do is the weighted sum; the rest is just revealing some paths of the Merkle tree. Computing the weighted sum has complexity $O(2^\ell) = O(n)$. In contrast, FRI-Binius has a lot more work to do. It starts with the prover providing the claimed $t'(\vec{\beta'})$, which already carries a computation overhead of $O(2^\ell)=O(n)$, roughly the same as computing the weighted sum in Ligero-Binius.
But to support this claim, the prover needs to perform the sumcheck and the folding, which is another overhead of $O(n)$. For reference, the evaluation in [the FRI-Binius paper](https://eprint.iacr.org/2024/504.pdf) shows that, for $\mathbb{F}_2$, the proving time of FRI-Binius is about twice that of Ligero-Binius.

It is useful to note that if the base characteristic-2 field is not $\mathbb{F}_2$ but some $\tau_k$, the proving time gap between Ligero-Binius and FRI-Binius can be larger. This is because Ligero-Binius always multiplies the unpacked representation with $\tau_7$, while FRI-Binius, during the sumcheck and the folding, has to perform a lot of $\tau_7$ (i.e., packed representation) by $\tau_7$ multiplications. For example, when the base field becomes $\tau_3\cong \mathbb{F}_{2^8}$ instead of $\tau_0 = \mathbb{F}_2$, FRI-Binius will be at least $8\times$ more expensive, because the packed representations for the same number of elements are $8\times$ longer: a single $\tau_7$ can hold 16 $\tau_3$ elements or 128 $\tau_0$ elements. Ligero-Binius behaves a little unusually here, in that the cost of multiplying $\tau_0$ with $\tau_7$ is very similar to the cost of multiplying $\tau_3$ with $\tau_7$, if the computation is done on a CPU with suitable vectorization instructions. We confirmed this observation by running the [subfield multiplication benchmark](https://github.com/IrreducibleOSS/binius/blob/main/crates/field/benches/packed_field_subfield_ops.rs) in the Binius codebase on AWS R6id instances and Apple M3 laptops. Nevertheless, it is important to note that FPGA or ASIC, which are where Binius will eventually go, do not care about special CPU instructions. From [the FRI-Binius paper](https://eprint.iacr.org/2024/504.pdf), one can observe a slowdown of about $8\times$ in total for opening the polynomial commitment, between Ligero-Binius and FRI-Binius.

**Batching.** We want to emphasize that FRI-Binius has some advantages when it comes to batching, which further improves FRI-Binius prover and verifier efficiency. In practice, we often commit a number of polynomials and open them together at the same time, or we may open the same polynomial at multiple different points. Both are very common and standard in zero-knowledge proof systems. In the original Ligero-Binius polynomial commitment, opening the polynomial at a different point uses a different linear combination, effectively requiring a recomputation of the weighted sum. The same is true for the verifier. In FRI-Binius, we still need to compute the matrices, but starting from the sumcheck and the folding, the prover and the verifier operate on a single sum, which is reduced to opening the related polynomials at a single point $\vec{r}$. This is particularly useful when opening the same polynomial at different points, as the other points come basically for free. The verifier computation cost and the proof size also benefit from the batching setting.

## Summary

In this article we presented the high-level techniques of FRI-Binius, which should help readers familiar with [Ligero-Binius](https://hackmd.io/@l2iterative/binius) understand this new algorithm, how it reduces the verifier overhead, and how that makes it more suitable for recursive proof verification.

There is a lot of development around Binius, including hardware acceleration and using it for a full-fledged zkVM. We will continue to take a close look at Binius and participate more actively in its development.
We would like to thank Radisav Cojbasic, Jim Posen, and Benjamin Diamond from Irreducible for reviewing this article and providing feedback.