# Understanding Binius, Part I In the past year, there have been a number of new ZK proof systems. One that interests us the most is [Binius](https://www.irreducible.com/binius) from [Irreducible](https://www.irreducible.com/) (formerly known as Ulvetanna). We pay specific attention to this proof system because our portfolio company, [RISC Zero](https://risczero.com/blog/designing-high-performance-zkVMs), who has been developing the RISC-V zkVM, told us that Binius will be the next generation proof system and we should keep an eye on. [Eddy Lazzarin](https://x.com/eddylazzarin/status/1784922354931397077), CTO of a16zcrypto, also has put Binius "as a top priority" for their proof system, [Jolt](https://a16zcrypto.com/posts/article/building-jolt/). We want to provide a high-level explanation of the Binius protocol. There are two papers on Binius: - [Ligero-Binius](https://eprint.iacr.org/2023/1784.pdf) (2023) - [FRI-Binius](https://eprint.iacr.org/2024/504.pdf) (2024) We will dedicate two articles for Binius. In this article, we will explain Ligero-Binius, which is simpler, as the first step for understanding Binius. In the next article, we will discuss FRI-Binius, which is the more recent version. ## Starting point: What is Ligero? Ligero-Binius can be seen as an improved version of the Ligero polynomial commitment scheme, but targeted at smaller fields, and it is crucial to first understand [Ligero](https://eprint.iacr.org/2022/1608.pdf). Ligero provides a way for a prover to commit a polynomial $f(x)$ and later opens it at a specific point, say $f(r)$. For the use cases that interest us, $r$ would be chosen at random. **Committing the polynomial into a single hash.** Ligero starts by turning the polynomial $f(x)$ into a matrix, or more intuitively, a rectangle. Consider that our polynomial is of size $2^\ell$, we will put it in a matrix with $2^{\ell_0}$ rows and $2^{\ell_1}$ columns, where $\ell = \ell_0 + \ell_1$. <style> img[src*='#center'] { display: block; margin: auto; } </style> ![Presentation1](https://hackmd.io/_uploads/H1iqXqgXke.gif#center =50%x) The matrix is being filled in the following manner. - First, we convert the polynomial into a list of evaluations over the boolean hypercube of size $2^\ell$, which encodes all the information necessary to uniquely represent this polynomial. - Then, we fill the matrix with the evaluations starting from the top row, and from left to right on each row, as illustrated in the animation above. Next, Ligero applies an error-correcting code on each row. Consider an error-correcting code that encodes $2^{\ell_1}$ elements into $2^{\ell_2}$ elements (where $\ell_2$ > $\ell_1$). The matrix after encoding will have more columns and become "longer". ![encoding](https://hackmd.io/_uploads/BktOrqe7yx.gif#center =50%x) Ligero then builds a Merkle tree of this final matrix of size $2^{\ell}$ times $2^{\ell_2}$. The root hash of the Merkle tree is the commitment of this polynomial. **Opening the polynomial at a random point $\vec{r}$.** We now discuss how a prover can prove to a verifier that the polynomial encoded inside that Merkle tree root hash will evaluate to a certain value at point $\vec{r}$, where $\vec{r}$ here is a vector of size $\ell = \ell_0 + \ell_1$. The verifier, in the Ligero protocol, sends the first $\ell_0$ elements of the vector $\vec{r}$ to the prover. The prover is asked to compute a weighted sum of the rows in the original matrix. The weight for each row is computed from the following formula, where $b_i$ is the $i$-th bit of the row index. $$W_{b_{\ell_0-1}...b_1b_0}=\prod_{i=0}^{\ell_0-1}(1-r_i)^{1-b_i}\cdot r_i^{b_i}$$ For example, the first row (on the top) will be assigned row index $0$, and all its bits are zero. Its weight would be the product of all $(1-r_i)$. The second row, right below the first row, will have the assigned row index $1$, and its weight differs from the previous one on $r_0$. $$W_0 = \prod_{i=0}^{\ell_0-1} (1-r_i),~W_1=r_0\cdot \prod_{i=1}^{\ell_0-1}(1-r_i)$$ The weighted sum is computed by multiplying each row of the un-encoded matrix (i.e., rows in the “blue” matrix) with the corresponding weights computed above for each row. ![sum](https://hackmd.io/_uploads/BkgC8cg7kl.gif#center =50%x) Remember that this sum is over the un-encoded matrix, not the encoded matrix. In the animation, we distinguish the final row that is the weighted sum by adding the letter “S” on it. The prover sends the weighted sum (as an entire row) to the verifier. The verifier now applies the same error-correcting code over the weighted sum. Since the error-correcting code that we use is a linear code, it appears that (1) the encoded form of the weighted sum equals to (2) the weighted sum of the encoded form. $$\mathsf{Encode}\left(\sum_{i=0}^{2^{\ell_0-1}}W_i\cdot \mathsf{row}_i\right) = \sum_{i=0}^{2^{\ell_0-1}}W_i\cdot \mathsf{Encode}(\mathsf{row}_i)$$ In other words, each column in the encoded form of the weighted sum (shown above as the dark green row with letter "S") should be the weighted sum of the column in the encoded matrix. ![linear](https://hackmd.io/_uploads/rJaxuqemJl.gif#center =50%x) The verifier, however, needs to check if this is the case—a malicious prover may have sent the incorrect weighted sum (dark blue with letter "S"), and it is incorrect in that it is inconsistent with the cells in the encoded matrix in the commitment. Therefore, the verifier will randomly select some columns among the $2^{\ell_2}$ columns of the encoded matrix. It is important to note that this random selection now occurs in the encoded matrix, not the original matrix. The number of columns selected depends on the necessary level of cryptographic security. The verifier then asks the prover to open these columns in their entirety against the Merkle tree root hash. Next, the verifier computes the weighted sum for these columns and checks if they match (1) the encoded form of the weighted sum. ![check](https://hackmd.io/_uploads/HJK7GhgXkl.gif#center =50%x) If the encoded form matches the Merkle trees on these columns, then the verifier is convinced that the prover is honest, and the verifier can compute $f(r)$, which is another layer of weighted sum over the previous weighted sum (dark blue with letter "S") sent by the prover. To compute this "weighted weighted" sum, we first compute the new weights, now over the $2^{\ell_1}$ columns. The new weights do not depend on the first $\ell_0$ elements of the vector $\vec{r}$ that were used previously, but only depend on the last $\ell_1$ elements. $$W'_{b_{\ell_0+\ell_1-1}...b_{\ell_0}} = \prod_{i=\ell_0}^{\ell_0+\ell_1-1}(1-r_i)^{1-b_i}r_i^{b_i}$$ The evaluation $f(r)$ will be an inner product of the weights and the previous weighted sum. ![fr](https://hackmd.io/_uploads/r1V77hlmye.gif#center =15%x) To see why this holds, one can firstly look at the structure of f(x), which is a multilinear extension of the boolean hypercube. $$f(x) = \sum_{u\in\{0,1\}^\ell}\mathsf{cell}_u\cdot \prod_{k=0}^{\ell-1}(1-x_k)(1-u_k)+x_k\cdot u_k$$ We can rewrite the term in the middle to make it more intuitive. Consider the following function. $$g_a(b) = \begin{cases}1-b & \text{if $a=0$}\\b & \text{if $a=1$}\end{cases}$$ Since the cells are all over the boolean hypercube, we know that for each cell, it is always one case or the other. This allows us to simplify $f(x)$ as follows. $$f(x) = \sum_{u\in\{0,1\}^\ell}\mathsf{cell}_u\cdot \prod_{k=0}^{\ell-1} g_{u_k}(x_k)$$ The "weighted weighted" sum is basically summing up each cell, multiplied by $W_i$ and $W'_j$, which equals to the product shown above. This concludes our discussion of how Ligero is used to open the polynomial at a random point $\vec{r}$, by converting it into a problem of "weighted weighted" sum. ## Ligero and small fields Ligero has an important benefit. The proof generation overhead tends to be smaller, as each step is usually over a row or a column, rather than the entire matrix all together at once. This makes it pretty suitable for hardware acceleration, as is why Irreducible has been paying a lot of attention into building a hardware accelerator for it. The question is how to achieve good performance of Ligero over small fields. Previously, we have proof systems that use small fields: Plonky2 pioneers the Goldilocks field, RISC Zero uses the BabyBear field, and Stwo uses the Circle M31 field. The error-correcting code is applied over the small field, while the random point r would be over a larger field to achieve a desired security level (such as 100 bits of security). This larger field is usually an extension of the smaller field, defined by the extension degree. The error-correcting code is naturally generalized into the extension field. | | **Goldilocks** | **BabyBear** | **Circle M31** | | -------- | -------- | -------- | -------- | | Prime | $p=2^{64}-2^{32}+1$ | $p=15\cdot 2^{27}+1$ | $p=2^{31}-1$ | | Bits | 64 | 31 | 31 | | Code | Reed-Solomon | Reed-Solomon | Circle Reed-Solomon | | Extension degree | 2 | 4 | 4 | For Binius, which targets at the smallest possible field $\mathbb{F}_2$ (which consists of only two elements, zero and one), there are two questions to answer. - **The first question is about the extension.** To achieve the desired security level, the degree of extension would be like $128$, in order to obtain close to $100$ bits of security. However, settling down the extension degree is not enough, because there are many different ways to construct extension of $\mathbb{F}_2$ by picking different irreducible polynomials. We will later show how Binius designs a special sequence of extension fields, called "towers", to build flexible and hardware-friendly extensions. - **The second question is about the error-correcting code.** The small field we are dealing with, $\mathbb{F}_2$, is too small for regular Reed-Solomon code. One can either pick or construct an error-correcting code over $\mathbb{F}_2$ (which works and can indeed provide somewhat okay performance) or pack a number of $\mathbb{F}_2$ elements into blocks and combine them into an element in a larger field (which can be another extension of $\mathbb{F}_2$) that permits Reed-Solomon code. We will later show the different proposals that Binius makes. With these two answers in mind, you basically have Binius. The rest is just to use Binius in replacement of Ligero in a zero-knowledge proof system that defines the circuit in binary fields. We now explain in more detail about the extension and the error-correcting code. ## Towers: Extension of fields Binius has two extension sequences: one for CPU with special instructions, one for other hardware devices that do not have these instructions, such as [GPU](https://www.irreducible.com/posts/slicing-up-binary-towers) and [FPGA](https://www.irreducible.com/posts/binary-tower-fields-are-the-future-of-verifiable-computing). We start with the second one, which is simpler, called [the Wiedemann construction](https://www.fq.math.ca/Scanned/26-4/wiedemann.pdf). It starts with $\mathbb{F}_2$, and then generates the next field one after another, by extending the previous field with a fixed degree-two polynomial. $$\begin{array}{l}\tau_0 = \mathbb{F}_2\\\tau_1 = \tau_0[X_0] / (X_0^2 + X_0 + 1)\cong\mathbb{F}_{2^2}\\\tau_2 = \tau_1[X_1]/(X_1^2 + X_0\cdot X_1 + 1)\cong \mathbb{F}_{2^4}\\\tau_3=\tau_2[X_2]/(X_2^2 + X_1\cdot X_2 + 1)\cong\mathbb{F}_{2^8}\\\tau_4=\tau_3[X_3]/(X_3^2 + X_2\cdot X_3 + 1)\cong \mathbb{F}_{2^{16}}\\\tau_5 = \tau_4[X_4]/(X_4^2+X_3\cdot X_4 + 1)\cong\mathbb{F}_{2^{32}}\\\tau_6 = \tau_5[X_5]/(X_5^2+X_4\cdot X_5+1)\cong\mathbb{F}_{2^{64}}\\\tau_7=\tau_6[X_6]/(X_6^2+X_5\cdot X_6 + 1)\cong \mathbb{F}_{2^{128}}\end{array}$$ One can see that these eight towers build on a previous tower (or \mathbb{F}_2) and extend from it. This construction has a benefit that it is easy to "lift" elements from the previous field to the next field, as it would be a simple copy, and it is easy to multiply field elements of different levels because of the explicit structure from the previous level to the next. The first one, on the contrary, is designed for CPUs and leverages special instructions these CPUs provide, and it made two modifications to the Wiedemann construction. Many CPUs have support for AES field multiplication that can handle it more efficiently. To use this instruction, Binius changes how $\tau_4$ is being generated by extending from the AES field. $$\begin{array}{l}\tau_0 = \mathbb{F}_2\\\tau_1 = \tau_0[X_0] / (X_0^2 + X_0 + 1)\cong\mathbb{F}_{2^2}\\\tau_2 = \tau_1[X_1]/(X_1^2 + X_0\cdot X_1 + 1)\cong \mathbb{F}_{2^4}\\\tau_3=\tau_2[X_2]/(X_2^2 + X_1\cdot X_2 + 1)\cong\mathbb{F}_{2^8}\\\color{blue}{\mathbb{F}_\mathsf{AES}=\mathbb{F}_2[X_0]/(X_0^8 + X_0^4 + X_0^3+X_0+1)\cong\mathbb{F}_{2^8}}\\\tau_4=\color{blue}{\mathbb{F}_\mathsf{AES}[X_3]}/(X_3^2 + \widetilde{X_2}\cdot X_3 + 1)\cong \mathbb{F}_{2^{16}}\\\tau_5 = \tau_4[X_4]/(X_4^2+X_3\cdot X_4 + 1)\cong\mathbb{F}_{2^{32}}\\\tau_6 = \tau_5[X_5]/(X_5^2+X_4\cdot X_5+1)\cong\mathbb{F}_{2^{64}}\\\tau_7=\tau_6[X_6]/(X_6^2+X_5\cdot X_6 + 1)\cong \mathbb{F}_{2^{128}}\end{array}$$ Here, $\widetilde{X_2} = X_0^7 + X_0^6 + X_0^4 + X_0 + 1$ is the result of mapping $X_2$ in $\tau_3$ into $\mathbb{F}_\mathsf{AES}$. Note that since $\tau_4$ has been changed, the subsequent fields, $\tau_5$, $\tau_6$, $\tau_7$, are also different, and they become extensions of the AES field. To bridge $\tau_3$ to higher levels, a conversation step is made to convert $\tau_3$ into a corresponding element in the AES field. Another change is to note that the POLYVAL field, which is over $2^{128}$, can also be implemented very efficiently. Therefore, in Binius, when we multiply two elements in $\tau_7$, it is preferred to convert them into the POLYVAL field and perform the multiplication over the POLYVAL field. $$\begin{array}{l}\tau_0 = \mathbb{F}_2\\\tau_1 = \tau_0[X_0] / (X_0^2 + X_0 + 1)\cong\mathbb{F}_{2^2}\\\tau_2 = \tau_1[X_1]/(X_1^2 + X_0\cdot X_1 + 1)\cong \mathbb{F}_{2^4}\\\tau_3=\tau_2[X_2]/(X_2^2 + X_1\cdot X_2 + 1)\cong\mathbb{F}_{2^8}\\\mathbb{F}_\mathsf{AES}=\mathbb{F}_2[X_0]/(X_0^8 + X_0^4 + X_0^3+X_0+1)\cong\mathbb{F}_{2^8}\\\tau_4=\mathbb{F}_\mathsf{AES}[X_3]/(X_3^2 + \widetilde{X_2}\cdot X_3 + 1)\cong \mathbb{F}_{2^{16}}\\\tau_5 = \tau_4[X_4]/(X_4^2+X_3\cdot X_4 + 1)\cong\mathbb{F}_{2^{32}}\\\tau_6 = \tau_5[X_5]/(X_5^2+X_4\cdot X_5+1)\cong\mathbb{F}_{2^{64}}\\\tau_7=\tau_6[X_6]/(X_6^2+X_5\cdot X_6 + 1)\cong \mathbb{F}_{2^{128}}\\\color{blue}{\mathbb{F}_\mathsf{POLYVAL}=\mathbb{F}_2[X_0]/(X_0^{128}+X_0^{127}+X_0^{126}+X_0^{121}+1)\cong\mathbb{F}_{2^{128}}}\end{array}$$ The introduction of the POLYVAL field does not break the sequence of extension fields, as it is usually the last extension field that we care about in the sequence (as it offers enough bits of security). In the simple case of Binius, all the data will be encoded over $\tau_0$, and the random point $\vec{r}$ would be over $\tau_7$, so most of the intermediate levels that we show above are not going to be used, but just serve as steps toward making $\tau_7$. Error-correcting code, which we will discuss next, will likely take one of the levels in the middle (such as $\tau_4$) to define the Reed-Solomon code. ## Packed Reed-Solomon: Error-correcting code There is indeed a lot of flexibility in choosing an error-correcting code here, but Binius strives to get Reed-Solomon code, which is simple and fast to do encoding, to work for small fields. As we previously discussed, the main limitation is that $\tau_0$ (aka $\mathbb{F}_2$) does not have useful Reed-Solomon codes because it is too small. Yet, if we move up the levels in the extension field, we will soon see that for larger fields such as $\tau_4$ or higher, Reed-Solomon codes do exist. The most straightforward way is to "lift" each element in $\tau_0$ to $\tau_4$ and perform the Reed-Solomon code there, but it seems quite wasteful as the message space can handle $16$ bits, but only one bit is used. Binius borrows an idea from the concatenated code and "packs" sixteen $\tau_0$ into $\tau_4$, and considers these sixteen elements "a block", so the entire message space is being used. Reed-Solomon code continues from $\tau_4$. ![pack](https://hackmd.io/_uploads/HkobQTlQyx.png#center =50%x) This packing mechanism also changes how the "columns" are randomly selected in the Ligero protocol. Previously, each column was a $\mathbb{F}_2$ element, or a bit, but now each column is a $\tau_4$ element. So, Binius is now testing blocks and blocks of $16$ bits rather than bits by bits. ## Performance The Binius paper, updated October 2024, provides some results on the performance of the proof system on the CPU. It can be summarized as follows. - The smaller the field is, the faster Binius is. For example, the 1-bit field $\mathbb{F}_2$ is about 13 times faster than a 32-bit field. - When the field is of the same size as BabyBear (about 32 bits), Binius achieves a similar performance compared with BabyBear. As for end-to-end performance, it has been shown that Keccak can be proven at least 2x faster than Plonky3 (the one that SP1 is built over). This is an encouraging note as our current approach before Binius, FRI-based proof systems, is close to reaching its limit, and to make ZK proof systems faster, either we need a new innovation in theory, or we find more efficient hardware acceleration—for example, a key reason that FRI is getting faster in the past few years has to do with Apple’s Metal chips and Nvidia’s new GPUs. This is why Binius is useful: on the one hand, it shows an approach different from FRI with evidence that it can be faster in some difficult use cases that FRI has been struggling with, and on the other hand, its less reliance on the 32-bit integer processing units of CPU and GPU opens the possibility of FPGA and ASIC accelerators that can be faster than CPU or GPU. In fact, Irreducible has released the [benchmark](https://www.irreducible.com/posts/binary-tower-fields-are-the-future-of-verifiable-computing) of their in-progress ASIC design for Binius’s towers and compares it with Mersenne-31. The key metrics over the 14nm process are summarized below. | | Mersenne-31 | $\tau_5$ | | -------- | -------- | -------- | | Silicon Area ($\mu m^2$) | 1357 | 748 | | Critical Path ($ns$) | 2.0 | 0.7 | Both metrics are crucial for ASIC development—the smaller the silicon area is, the more units we can put on a single chip, and the shorter the critical path is, the more likely we can run it at a high frequency and have a high processing speed on a single unit. These metrics are often inherent to a specific type of computation and their implementations have been studied for more than a few decades, meaning that the numbers here are likely the best we can do. One can draw another conclusion from this comparison that Binius's towers would be more efficient than the BabyBear field that we are using today, as the BabyBear field would be more complicated than the Mersenne-31 field to be implemented on customized silicon. Irreducible also unveils their ASIC design, called X-Check, in a recent [talk](https://www.youtube.com/watch?v=RQ_hoKtifzg). ## Summary In this article we explain the core idea of Ligero-Binius and show how it can be useful in constructing new proof systems that break the current barriers of efficient proof generation. We will continue working on Binius, especially using Binius in Jolt, for next-generation zkVM. The next article will discuss FRI-Binius, the more recent version of Binius. We would like to thank Radisav Cojbasic, Jim Posen, and Benjamin Diamond from Irreducible for reviewing this article and providing feedback.