
Understanding Binius, Part I

In the past year, there have been a number of new ZK proof systems. One that interests us the most is Binius from Irreducible (formerly known as Ulvetanna).

We pay specific attention to this proof system because our portfolio company RISC Zero, which has been developing a RISC-V zkVM, told us that Binius will be the next-generation proof system and that we should keep an eye on it. Eddy Lazzarin, CTO of a16z crypto, has also made Binius "a top priority" for their proof system, Jolt.

We want to provide a high-level explanation of the Binius protocol. There are two papers on Binius, corresponding to the two versions of the protocol.

We will dedicate two articles to Binius. In this article, we explain Ligero-Binius, the simpler version, as a first step toward understanding Binius.

In the next article, we will discuss FRI-Binius, which is the more recent version.

Starting point: What is Ligero?

Ligero-Binius can be seen as an improved version of the Ligero polynomial commitment scheme targeted at smaller fields, so it is crucial to first understand Ligero.

Ligero provides a way for a prover to commit to a polynomial $f(x)$ and later open it at a specific point, say $f(r)$. For the use cases that interest us, $r$ would be chosen at random.

Committing the polynomial into a single hash. Ligero starts by turning the polynomial $f(x)$ into a matrix, or more intuitively, a rectangle. Suppose our polynomial is of size $2^\ell$; we will put it in a matrix with $2^{\ell_0}$ rows and $2^{\ell_1}$ columns, where $\ell = \ell_0 + \ell_1$.

[Animation: filling the matrix with the polynomial's evaluations, row by row]

The matrix is filled in the following manner.

  • First, we convert the polynomial into a list of evaluations over the boolean hypercube of size $2^\ell$, which encodes all the information necessary to uniquely represent this polynomial.
  • Then, we fill the matrix with the evaluations starting from the top row, and from left to right on each row, as illustrated in the animation above.
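As a small illustration, the two steps above can be sketched in Python (the function and variable names here are our own, not from any Binius implementation):

```python
# Sketch: arrange a polynomial's 2^l evaluations over the boolean
# hypercube into a 2^l0 x 2^l1 matrix, filled row by row, left to right.

def evals_to_matrix(evals, l0, l1):
    assert len(evals) == 1 << (l0 + l1)
    width = 1 << l1                      # 2^l1 columns
    return [evals[i * width:(i + 1) * width] for i in range(1 << l0)]

# Example: l = 3, so 8 evaluations, split into 2 rows of 4 columns.
matrix = evals_to_matrix([3, 1, 4, 1, 5, 9, 2, 6], l0=1, l1=2)
# matrix is [[3, 1, 4, 1], [5, 9, 2, 6]]
```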

Next, Ligero applies an error-correcting code to each row. Consider an error-correcting code that encodes $2^{\ell_1}$ elements into $2^{\ell_2}$ elements (where $\ell_2 > \ell_1$). The matrix after encoding will have more columns and become "longer".

[Image: the matrix after encoding each row, now with more columns]

Ligero then builds a Merkle tree over this final matrix of size $2^{\ell_0} \times 2^{\ell_2}$. The root hash of the Merkle tree is the commitment to this polynomial.

Opening the polynomial at a random point $r$. We now discuss how a prover can prove to a verifier that the polynomial encoded behind that Merkle tree root hash evaluates to a certain value at point $r$, where $r$ here is a vector of size $\ell = \ell_0 + \ell_1$.

The verifier, in the Ligero protocol, sends the first $\ell_0$ elements of the vector $r$ to the prover. The prover is asked to compute a weighted sum of the rows in the original matrix. The weight for each row is computed from the following formula, where $b_i$ is the $i$-th bit of the row index.

$$W_{b_{\ell_0-1} \cdots b_1 b_0} = \prod_{i=0}^{\ell_0-1} (1-r_i)^{1-b_i} \cdot r_i^{b_i}$$

For example, the first row (on the top) is assigned row index $0$, and all its bits are zero. Its weight is the product of all $(1-r_i)$. The second row, right below the first row, has row index $1$, and its weight differs from the previous one in $r_0$.

$$W_0 = \prod_{i=0}^{\ell_0-1} (1-r_i), \qquad W_1 = r_0 \prod_{i=1}^{\ell_0-1} (1-r_i)$$

The weighted sum is computed by multiplying each row of the un-encoded matrix (i.e., rows in the “blue” matrix) with the corresponding weights computed above for each row.

[Animation: the weighted sum of the rows, shown as an extra row marked "S"]

Remember that this sum is over the un-encoded matrix, not the encoded matrix. In the animation, we distinguish the final row that is the weighted sum by adding the letter “S” on it. The prover sends the weighted sum (as an entire row) to the verifier.
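The prover's computation can be sketched as follows, with the weight formula written out (again over a toy prime field; the names are ours):

```python
P = 97  # toy prime field, standing in for the extension field

def row_weight(index, r):
    # W = prod over i of (1 - r_i)^(1 - b_i) * r_i^(b_i),
    # where b_i is the i-th bit of the row index.
    w = 1
    for i, r_i in enumerate(r):
        b_i = (index >> i) & 1
        w = w * (r_i if b_i else 1 - r_i) % P
    return w

def weighted_row_sum(matrix, r):
    # Linear combination of the un-encoded rows with the weights above.
    width = len(matrix[0])
    s = [0] * width
    for index, row in enumerate(matrix):
        w = row_weight(index, r)
        for j in range(width):
            s[j] = (s[j] + w * row[j]) % P
    return s

# Sanity check: with r = [0], only row 0 gets a nonzero weight.
assert weighted_row_sum([[1, 2], [3, 4]], [0]) == [1, 2]
```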

The verifier now applies the same error-correcting code to the weighted sum. Since the error-correcting code we use is a linear code, it holds that (1) the encoding of the weighted sum equals (2) the weighted sum of the encoded rows.

$$\mathsf{Encode}\left(\sum_{i=0}^{2^{\ell_0}-1} W_i \cdot \mathsf{row}_i\right) = \sum_{i=0}^{2^{\ell_0}-1} W_i \cdot \mathsf{Encode}(\mathsf{row}_i)$$

In other words, each column in the encoded form of the weighted sum (shown above as the dark green row with letter "S") should be the weighted sum of the column in the encoded matrix.
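This linearity is easy to check numerically with any linear code. Below, `encode` is a toy systematic linear code (the message plus parity symbols), not the code Binius actually uses; the identity holds for any linear map:

```python
P = 97  # toy prime field

def encode(msg):
    # Toy linear code: append the sum and the alternating sum as parity.
    s = sum(msg) % P
    alt = sum((-1) ** j * x for j, x in enumerate(msg)) % P
    return list(msg) + [s, alt]

def lincomb(weights, vectors):
    # Weighted sum of equal-length vectors, componentwise.
    return [sum(w * v[j] for w, v in zip(weights, vectors)) % P
            for j in range(len(vectors[0]))]

rows = [[1, 2, 3, 4], [5, 6, 7, 8]]
weights = [10, 20]

# Encoding the weighted sum ...
lhs = encode(lincomb(weights, rows))
# ... equals the weighted sum of the encodings.
rhs = lincomb(weights, [encode(row) for row in rows])
assert lhs == rhs
```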

[Image: each column of the encoded "S" row equals the weighted sum of the corresponding column of the encoded matrix]

The verifier, however, needs to check if this is the case: a malicious prover may have sent an incorrect weighted sum (dark blue with letter "S"), incorrect in that it is inconsistent with the cells of the encoded matrix in the commitment.

Therefore, the verifier randomly selects some columns among the $2^{\ell_2}$ columns of the encoded matrix. It is important to note that this random selection occurs in the encoded matrix, not the original matrix. The number of columns selected depends on the desired level of cryptographic security.

The verifier then asks the prover to open these columns in their entirety against the Merkle tree root hash. Next, the verifier computes the weighted sum for these columns and checks if they match (1) the encoded form of the weighted sum.
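The verifier's spot check can be sketched as follows (toy linear code and prime field again, our own naming). An honest prover always passes; a prover whose "S" row disagrees with the committed columns is caught with probability growing in the number of opened columns:

```python
import random

P = 97  # toy prime field

def encode(msg):
    # Toy linear code standing in for Reed-Solomon: append a parity sum.
    return list(msg) + [sum(msg) % P]

rows = [[1, 2, 3], [4, 5, 6]]          # the un-encoded matrix
weights = [7, 11]                      # row weights derived from r

encoded = [encode(row) for row in rows]            # committed matrix
s = [sum(w * row[j] for w, row in zip(weights, rows)) % P
     for j in range(len(rows[0]))]
encoded_s = encode(s)                  # prover's claimed "S" row, encoded

# Open a few random columns of the encoded matrix and compare.
for j in random.sample(range(len(encoded[0])), 2):
    column_sum = sum(w * enc[j] for w, enc in zip(weights, encoded)) % P
    assert column_sum == encoded_s[j]  # honest prover: always consistent
```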

[Image: the verifier checks the randomly selected columns against the encoded "S" row]

If the encoded form matches the Merkle tree openings on these columns, then the verifier is convinced that the prover is honest, and the verifier can compute $f(r)$, which is another layer of weighted sum over the previous weighted sum (dark blue with letter "S") sent by the prover.

To compute this "weighted weighted" sum, we first compute the new weights, now over the $2^{\ell_1}$ columns. The new weights do not depend on the first $\ell_0$ elements of the vector $r$ that were used previously, but only on the last $\ell_1$ elements.

$$W_{b_{\ell_0+\ell_1-1} \cdots b_{\ell_0}} = \prod_{i=\ell_0}^{\ell_0+\ell_1-1} (1-r_i)^{1-b_i} \cdot r_i^{b_i}$$

The evaluation $f(r)$ will be an inner product of these weights and the previous weighted sum.

[Image: the inner product of the column weights and the "S" row yields $f(r)$]

To see why this holds, one can first look at the structure of $f(x)$, which is a multilinear extension of the evaluations over the boolean hypercube.

$$f(x) = \sum_{u \in \{0,1\}^\ell} \mathsf{cell}_u \cdot \prod_{k=0}^{\ell-1} \left[ (1-x_k)(1-u_k) + x_k u_k \right]$$

We can rewrite the term in the middle to make it more intuitive. Consider the following function.

$$g_a(b) = \begin{cases} 1-b & \text{if } a = 0 \\ b & \text{if } a = 1 \end{cases}$$

Since the cells all range over the boolean hypercube, each $u_k$ is either $0$ or $1$, so every term is always one case or the other. This allows us to simplify $f(x)$ as follows.

$$f(x) = \sum_{u \in \{0,1\}^\ell} \mathsf{cell}_u \cdot \prod_{k=0}^{\ell-1} g_{u_k}(x_k)$$

The "weighted weighted" sum is basically summing up each cell multiplied by $W_i$ and $W_j$, which equals the product shown above.
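Putting the pieces together, here is a small numeric check over a toy prime field, with our own indexing convention (the low $\ell_0$ bits of a hypercube point select the row, the remaining $\ell_1$ bits select the column), that the two-stage weighted sum really equals the multilinear extension evaluated at $r$:

```python
P = 97                               # toy prime field (illustrative only)
l0, l1 = 1, 2                        # 2 rows, 4 columns; l = 3 variables
evals = [3, 1, 4, 1, 5, 9, 2, 6]     # cell_u for u in {0,1}^3

def bits_of(n, width):
    return [(n >> k) & 1 for k in range(width)]

def eq_weight(bits, rs):
    # prod over i of (1 - r_i)^(1 - b_i) * r_i^(b_i)
    w = 1
    for b, r in zip(bits, rs):
        w = w * (r if b else 1 - r) % P
    return w

def f_at(r):
    # Multilinear extension of the evaluations, computed directly.
    return sum(evals[u] * eq_weight(bits_of(u, l0 + l1), r)
               for u in range(1 << (l0 + l1))) % P

r = [13, 42, 7]
# Cell (row i, column j) holds evals[i + (j << l0)]: row bits are the
# low l0 bits of u, column bits are the high l1 bits.
matrix = [[evals[i + (j << l0)] for j in range(1 << l1)]
          for i in range(1 << l0)]

# Step 1: weighted sum of rows, weights from the first l0 coordinates of r.
s = [sum(eq_weight(bits_of(i, l0), r[:l0]) * matrix[i][j]
         for i in range(1 << l0)) % P
     for j in range(1 << l1)]

# Step 2: inner product with weights from the last l1 coordinates of r.
result = sum(eq_weight(bits_of(j, l1), r[l0:]) * s[j]
             for j in range(1 << l1)) % P

assert result == f_at(r)             # the two-stage sum equals f(r)
```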

This concludes our discussion of how Ligero is used to open the polynomial at a random point $r$, by converting it into a problem of a "weighted weighted" sum.

Ligero and small fields

Ligero has an important benefit: the proof generation overhead tends to be smaller, as each step usually operates over a row or a column rather than the entire matrix at once. This makes it quite suitable for hardware acceleration, which is why Irreducible has been paying a lot of attention to building a hardware accelerator for it.

The question is how to achieve good performance of Ligero over small fields.

We already have proof systems that use small fields: Plonky2 pioneered the Goldilocks field, RISC Zero uses the BabyBear field, and Stwo uses the Circle M31 field. The error-correcting code is applied over the small field, while the random point $r$ would be over a larger field to achieve a desired security level (such as 100 bits of security). This larger field is usually an extension of the smaller field, defined by the extension degree. The error-correcting code generalizes naturally to the extension field.

| | Goldilocks | BabyBear | Circle M31 |
|---|---|---|---|
| Prime | $p = 2^{64} - 2^{32} + 1$ | $p = 15 \cdot 2^{27} + 1$ | $p = 2^{31} - 1$ |
| Bits | 64 | 31 | 31 |
| Code | Reed-Solomon | Reed-Solomon | Circle Reed-Solomon |
| Extension degree | 2 | 4 | 4 |

For Binius, which targets the smallest possible field $\mathbb{F}_2$ (which consists of only two elements, zero and one), there are two questions to answer.

  • The first question is about the extension. To achieve the desired security level, the degree of extension would be around $128$, in order to obtain close to $100$ bits of security. However, settling on the extension degree is not enough, because there are many different ways to construct extensions of $\mathbb{F}_2$ by picking different irreducible polynomials. We will later show how Binius designs a special sequence of extension fields, called "towers", to build flexible and hardware-friendly extensions.
  • The second question is about the error-correcting code. The small field we are dealing with, $\mathbb{F}_2$, is too small for a regular Reed-Solomon code. One can either pick or construct an error-correcting code over $\mathbb{F}_2$ (which works and can indeed provide somewhat okay performance) or pack a number of $\mathbb{F}_2$ elements into blocks and combine them into an element of a larger field (which can be another extension of $\mathbb{F}_2$) that permits a Reed-Solomon code. We will later show the different proposals that Binius makes.

With these two answers in mind, you basically have Binius. The rest is just to use Binius in place of Ligero in a zero-knowledge proof system that defines the circuit over binary fields. We now explain the extension and the error-correcting code in more detail.

Towers: Extension of fields

Binius has two extension sequences: one for CPUs with special instructions, and one for other hardware devices that do not have these instructions, such as GPUs and FPGAs.

We start with the second one, which is simpler, called the Wiedemann construction. It starts with $\mathbb{F}_2$ and then generates the next field one after another, by extending the previous field with a fixed degree-two polynomial.

$$\begin{aligned}
\tau_0 &= \mathbb{F}_2 \\
\tau_1 &= \tau_0[X_0]/(X_0^2+X_0+1) \cong \mathbb{F}_{2^2} \\
\tau_2 &= \tau_1[X_1]/(X_1^2+X_0X_1+1) \cong \mathbb{F}_{2^4} \\
\tau_3 &= \tau_2[X_2]/(X_2^2+X_1X_2+1) \cong \mathbb{F}_{2^8} \\
\tau_4 &= \tau_3[X_3]/(X_3^2+X_2X_3+1) \cong \mathbb{F}_{2^{16}} \\
\tau_5 &= \tau_4[X_4]/(X_4^2+X_3X_4+1) \cong \mathbb{F}_{2^{32}} \\
\tau_6 &= \tau_5[X_5]/(X_5^2+X_4X_5+1) \cong \mathbb{F}_{2^{64}} \\
\tau_7 &= \tau_6[X_6]/(X_6^2+X_5X_6+1) \cong \mathbb{F}_{2^{128}}
\end{aligned}$$

One can see that each of these towers builds on the previous tower (or $\mathbb{F}_2$) and extends from it. This construction has the benefit that it is easy to "lift" elements from the previous field to the next field, as it is a simple copy, and it is easy to multiply field elements of different levels because of the explicit structure from one level to the next.
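To make the tower concrete, here is a sketch of multiplication in these fields, with an element of $\tau_k$ represented as a $2^k$-bit integer in the tower basis (our own toy implementation, not Binius's optimized one). The recursion uses the defining relations $X_0^2 = X_0 + 1$ and $X_k^2 = X_{k-1} X_k + 1$:

```python
def tower_mul(a, b, k):
    # Multiply a, b in tau_k; elements are 2^k-bit integers in the tower
    # basis (bit 0 = 1, bit 1 = X_0, bit 2 = X_1, bit 3 = X_0*X_1, ...).
    if k == 0:
        return a & b                     # F_2 multiplication is AND
    half = 1 << (k - 1)
    mask = (1 << half) - 1
    a0, a1 = a & mask, a >> half         # a = a1 * X_{k-1} + a0
    b0, b1 = b & mask, b >> half
    lo = tower_mul(a0, b0, k - 1)
    hi = tower_mul(a0, b1, k - 1) ^ tower_mul(a1, b0, k - 1)
    t = tower_mul(a1, b1, k - 1)
    # Reduce X_{k-1}^2 = x * X_{k-1} + 1, where x is X_{k-2} (or 1 if k = 1).
    x = 1 if k == 1 else 1 << (half >> 1)
    hi ^= tower_mul(t, x, k - 1)
    lo ^= t
    return (hi << half) | lo

# In tau_1 = F_4: X_0 * X_0 = X_0 + 1, i.e. 2 * 2 = 3.
assert tower_mul(2, 2, 1) == 3
# tau_2 = F_16 is a field: no nonzero zero divisors.
assert all(tower_mul(a, b, 2) != 0
           for a in range(1, 16) for b in range(1, 16))
```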

The first sequence, on the contrary, is designed for CPUs and leverages special instructions these CPUs provide; it makes two modifications to the Wiedemann construction.

Many CPUs have support for AES field multiplication and can handle it very efficiently. To use this instruction, Binius changes how $\tau_4$ is generated, by extending from the AES field instead.

$$\begin{aligned}
\tau_0 &= \mathbb{F}_2 \\
\tau_1 &= \tau_0[X_0]/(X_0^2+X_0+1) \cong \mathbb{F}_{2^2} \\
\tau_2 &= \tau_1[X_1]/(X_1^2+X_0X_1+1) \cong \mathbb{F}_{2^4} \\
\tau_3 &= \tau_2[X_2]/(X_2^2+X_1X_2+1) \cong \mathbb{F}_{2^8} \\
{\color{blue} F_{\mathsf{AES}}} &= {\color{blue} \mathbb{F}_2[X_0]/(X_0^8+X_0^4+X_0^3+X_0+1) \cong \mathbb{F}_{2^8}} \\
\tau_4 &= {\color{blue} F_{\mathsf{AES}}}[X_3]/(X_3^2+\widetilde{X_2}X_3+1) \cong \mathbb{F}_{2^{16}} \\
\tau_5 &= \tau_4[X_4]/(X_4^2+X_3X_4+1) \cong \mathbb{F}_{2^{32}} \\
\tau_6 &= \tau_5[X_5]/(X_5^2+X_4X_5+1) \cong \mathbb{F}_{2^{64}} \\
\tau_7 &= \tau_6[X_6]/(X_6^2+X_5X_6+1) \cong \mathbb{F}_{2^{128}}
\end{aligned}$$

Here, $\widetilde{X_2} = X_0^7+X_0^6+X_0^4+X_0+1$ is the result of mapping $X_2$ in $\tau_3$ into $F_{\mathsf{AES}}$. Note that since $\tau_4$ has been changed, the subsequent fields, $\tau_5$, $\tau_6$, $\tau_7$, are also different, and they become extensions of the AES field. To bridge $\tau_3$ to higher levels, a conversion step is made to convert a $\tau_3$ element into the corresponding element of the AES field.

Another change is to note that the POLYVAL field, which is over $\mathbb{F}_{2^{128}}$, can also be implemented very efficiently. Therefore, in Binius, when we multiply two elements in $\tau_7$, it is preferred to convert them into the POLYVAL field and perform the multiplication there.

$$\begin{aligned}
\tau_0 &= \mathbb{F}_2 \\
\tau_1 &= \tau_0[X_0]/(X_0^2+X_0+1) \cong \mathbb{F}_{2^2} \\
\tau_2 &= \tau_1[X_1]/(X_1^2+X_0X_1+1) \cong \mathbb{F}_{2^4} \\
\tau_3 &= \tau_2[X_2]/(X_2^2+X_1X_2+1) \cong \mathbb{F}_{2^8} \\
F_{\mathsf{AES}} &= \mathbb{F}_2[X_0]/(X_0^8+X_0^4+X_0^3+X_0+1) \cong \mathbb{F}_{2^8} \\
\tau_4 &= F_{\mathsf{AES}}[X_3]/(X_3^2+\widetilde{X_2}X_3+1) \cong \mathbb{F}_{2^{16}} \\
\tau_5 &= \tau_4[X_4]/(X_4^2+X_3X_4+1) \cong \mathbb{F}_{2^{32}} \\
\tau_6 &= \tau_5[X_5]/(X_5^2+X_4X_5+1) \cong \mathbb{F}_{2^{64}} \\
\tau_7 &= \tau_6[X_6]/(X_6^2+X_5X_6+1) \cong \mathbb{F}_{2^{128}} \\
{\color{blue} F_{\mathsf{POLYVAL}}} &= {\color{blue} \mathbb{F}_2[X_0]/(X_0^{128}+X_0^{127}+X_0^{126}+X_0^{121}+1) \cong \mathbb{F}_{2^{128}}}
\end{aligned}$$

The introduction of the POLYVAL field does not break the sequence of extension fields, as it is usually the last extension field that we care about in the sequence (as it offers enough bits of security).

In the simple case of Binius, all the data will be encoded over $\tau_0$, and the random point $r$ would be over $\tau_7$, so most of the intermediate levels shown above are not used directly, but just serve as steps toward building $\tau_7$. The error-correcting code, which we discuss next, will likely take one of the levels in the middle (such as $\tau_4$) to define the Reed-Solomon code.

Packed Reed-Solomon: Error-correcting code

There is indeed a lot of flexibility in choosing an error-correcting code here, but Binius strives to make the Reed-Solomon code, which is simple and fast to encode, work for small fields.

As we previously discussed, the main limitation is that $\tau_0$ (aka $\mathbb{F}_2$) does not admit useful Reed-Solomon codes because it is too small. Yet, if we move up the levels of the extension fields, we will soon see that for larger fields such as $\tau_4$ or higher, Reed-Solomon codes do exist. The most straightforward way is to "lift" each element of $\tau_0$ to $\tau_4$ and perform the Reed-Solomon code there, but this seems quite wasteful, as the message space can handle $16$ bits, but only one bit is used.

Binius borrows an idea from concatenated codes and "packs" sixteen $\tau_0$ elements into one $\tau_4$ element, treating these sixteen elements as "a block", so the entire message space is used. The Reed-Solomon code then proceeds over $\tau_4$.

[Figure: packing sixteen $\tau_0$ elements into one $\tau_4$ element]
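The packing step itself is just a grouping of bits (a sketch with our own naming; a real implementation would also fix a basis of $\tau_4$ over $\tau_0$, which we gloss over here):

```python
def pack(bits):
    # Pack sixteen tau_0 (F_2) elements into one 16-bit tau_4 block,
    # least significant bit first.
    assert len(bits) == 16
    block = 0
    for i, b in enumerate(bits):
        block |= (b & 1) << i
    return block

def unpack(block):
    # Recover the sixteen F_2 elements from a block.
    return [(block >> i) & 1 for i in range(16)]

bits = [1, 0, 1, 1] + [0] * 12
assert pack(bits) == 0b1101          # = 13
assert unpack(pack(bits)) == bits
```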

This packing mechanism also changes how the "columns" are randomly selected in the Ligero protocol. Previously, each column was an $\mathbb{F}_2$ element, or a bit, but now each column is a $\tau_4$ element. So, Binius now tests blocks of $16$ bits rather than bit by bit.

Performance

The Binius paper, updated October 2024, provides some results on the performance of the proof system on the CPU. It can be summarized as follows.

  • The smaller the field is, the faster Binius is. For example, the 1-bit field $\mathbb{F}_2$ is about 13 times faster than a 32-bit field.
  • When the field is of the same size as BabyBear (about 32 bits), Binius achieves performance similar to BabyBear's.

As for end-to-end performance, it has been shown that Keccak can be proven at least 2x faster than with Plonky3 (the proof system SP1 is built on). This is encouraging, because our current approach before Binius, FRI-based proof systems, is close to reaching its limit; to make ZK proof systems faster, we either need a new innovation in theory or more efficient hardware acceleration. For example, a key reason that FRI has been getting faster in the past few years has to do with Apple's M-series chips and Nvidia's new GPUs.

This is why Binius is useful: on the one hand, it demonstrates an approach different from FRI, with evidence that it can be faster in some difficult use cases that FRI has been struggling with; on the other hand, its lighter reliance on the 32-bit integer processing units of CPUs and GPUs opens the possibility of FPGA and ASIC accelerators that can be faster than CPUs or GPUs.

In fact, Irreducible has released benchmarks of their in-progress ASIC design for Binius's towers and compared it with Mersenne-31. The key metrics on a 14nm process are summarized below.

| | Mersenne-31 | $\tau_5$ |
|---|---|---|
| Silicon Area ($\mu m^2$) | 1357 | 748 |
| Critical Path (ns) | 2.0 | 0.7 |

Both metrics are crucial for ASIC development: the smaller the silicon area, the more units we can put on a single chip; and the shorter the critical path, the higher the frequency we can run it at, giving a higher processing speed per unit. These metrics are often inherent to a specific type of computation, and their implementations have been studied for decades, meaning that the numbers here are likely close to the best we can do.

One can draw another conclusion from this comparison: Binius's towers would be more efficient than the BabyBear field that we use today, as the BabyBear field is more complicated than the Mersenne-31 field to implement on custom silicon.

Irreducible also unveiled their ASIC design, called X-Check, in a recent talk.

Summary

In this article, we explained the core idea of Ligero-Binius and showed how it can be useful in constructing new proof systems that break the current barriers of efficient proof generation. We will continue working on Binius, especially on using Binius in Jolt, for next-generation zkVMs.

The next article will discuss FRI-Binius, the more recent version of Binius.

We would like to thank Radisav Cojbasic, Jim Posen, and Benjamin Diamond from Irreducible for reviewing this article and providing feedback.