#### Pun Wai, Lyron Co Ting Keh
---
![](https://hackmd.io/_uploads/HJ4VOl1w2.png)
## What are proto-danksharding blobs?
Ethereum's rollups need [data availability](https://ethereum.org/en/developers/docs/data-availability/) (DA) to remain permissionless. Many production rollups today ensure DA by storing transactions directly [in Ethereum calldata](https://medium.com/ethereum-optimism/the-road-to-sub-dollar-transactions-part-2-compression-edition-6bb2890e3e92#:~:text=Calldata%20Overview,store%20Optimism%20transactions%20in%20calldata). Doing so inherits the security of Ethereum. This, however, comes with the drawback of high costs. Rollups that use Ethereum for DA must compete with all other network activity for the same available space per block, thus driving up L2 transaction costs.
Proto-danksharding (described in [EIP-4844](https://www.eip4844.com/)) is an upcoming upgrade that increases the data available in each Ethereum block from ~100KB to 1MB. More space means cheaper storage. Cheaper storage means cheaper DA. Cheaper DA means cheaper txs for users.
How does proto-danksharding make this increase in block storage economically viable for nodes? **Rather than storing data blobs in the execution layer, proto-danksharding has raw data stored temporarily in the consensus layer, with only the commitments stored in the execution layer.** Raw blob data in the consensus layer can then be deleted after a small time window to avoid blowing up disk space requirements for nodes.
## Why is it useful to open blob commitments in-circuit?
These blob commitments are KZG vector commitments. Higher level reasoning about these commitments (e.g. "transactions $[A, B, C]$ were involved in $D$ rollup state transition and resulted in an $E$ absolute change to this user's balance") is prohibitively expensive when done on-chain. These types of statements are only practical when done within a SNARK that's then verified on-chain.
The first step in writing these circuits is proving KZG multi-opens. In particular, we need Halo2 circuits to prove that a given KZG commitment $C$ opens to the points $\{(x_1, b_1), (x_2, b_2), ..., (x_m, b_m)\}$. This is what we focused on for this work.
## Writing the prover for KZG multi-opens
### Setting up and committing to blob data
KZG is instantiated over a pairing-friendly curve $E(F_p)$ with scalar field $F_r$ and bilinear map $e: G_1 \times G_2 \rightarrow G_T$ for groups $G_1 = \langle H_1 \rangle$ and $G_2 = \langle H_2 \rangle$. The construction requires a trusted setup to generate $pp = (g_1 = \{[\tau^0]_1, [\tau^1]_1, ..., [\tau^n]_1\}, g_2 = \{[\tau^0]_2, [\tau^1]_2, ..., [\tau^m]_2\})$, where $[x]_i$ denotes $x \cdot H_i$.
Blobs are treated as vectors $b \in F_r^n$, where $n$ is the constant $4096$. Let $\omega$ be a primitive $n$-th root of unity in $F_r$. We can commit to $b$ using the function below.
$Commit(pp, b) \rightarrow C:$
1. Interpolate a polynomial $p(X)$ through the points $\{(\omega^0, b_0), (\omega^1, b_1), ..., (\omega^{n-1}, b_{n-1})\}$.
2. Output $C = [p(\tau)]_1 = <p_{coeffs}, pp.g_1>$. Note that $<>$ denotes inner product.
It is the commitment $C$ (*small*, a single group element) that's stored in the EVM while the raw blob data $b$ (*big*, 4096 field elements) is temporarily stored in the consensus layer of nodes.
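To make the two steps of $Commit$ concrete, here is a minimal sketch over a toy field. Everything in it is an assumption made for illustration: the field is $F_{257}$ rather than the real scalar field, the group element $[\tau^i]_1$ is modelled by the plain scalar $\tau^i$ (so there is no hiding or binding), and the names `toy_setup`, `interpolate` and `commit` are hypothetical and not taken from our repo.

```rust
// Toy field F_257; the real scheme works over the scalar field F_r.
const P: u64 = 257;

fn add(a: u64, b: u64) -> u64 { (a + b) % P }
fn sub(a: u64, b: u64) -> u64 { (a + P - b) % P }
fn mul(a: u64, b: u64) -> u64 { (a * b) % P }

fn pow(mut base: u64, mut exp: u64) -> u64 {
    let mut acc = 1;
    while exp > 0 {
        if exp & 1 == 1 { acc = mul(acc, base); }
        base = mul(base, base);
        exp >>= 1;
    }
    acc
}

fn inv(a: u64) -> u64 { pow(a, P - 2) } // Fermat inverse, a != 0

/// Multiply a polynomial (coefficients low-to-high) by (X - root).
fn mul_linear(poly: &[u64], root: u64) -> Vec<u64> {
    let mut out = vec![0u64; poly.len() + 1];
    for (i, &c) in poly.iter().enumerate() {
        out[i + 1] = add(out[i + 1], c);
        out[i] = sub(out[i], mul(c, root));
    }
    out
}

/// Step 1: Lagrange interpolation, returning monomial coefficients.
fn interpolate(xs: &[u64], ys: &[u64]) -> Vec<u64> {
    let n = xs.len();
    let mut result = vec![0u64; n];
    for i in 0..n {
        let mut basis = vec![1u64];
        let mut denom = 1u64;
        for j in 0..n {
            if j != i {
                basis = mul_linear(&basis, xs[j]);
                denom = mul(denom, sub(xs[i], xs[j]));
            }
        }
        let scale = mul(ys[i], inv(denom));
        for (k, &c) in basis.iter().enumerate() {
            result[k] = add(result[k], mul(c, scale));
        }
    }
    result
}

/// Horner evaluation, used here only to sanity-check the result.
fn eval(poly: &[u64], x: u64) -> u64 {
    poly.iter().rev().fold(0, |acc, &c| add(mul(acc, x), c))
}

/// Stand-in for pp.g_1: the scalars tau^0..tau^(n-1) instead of group elements.
fn toy_setup(tau: u64, n: usize) -> Vec<u64> {
    (0..n as u64).map(|i| pow(tau, i)).collect()
}

/// Step 2: inner product of the coefficients with the setup.
fn commit(pp: &[u64], b: &[u64], omega: u64) -> u64 {
    let xs: Vec<u64> = (0..b.len() as u64).map(|i| pow(omega, i)).collect();
    let coeffs = interpolate(&xs, b);
    coeffs.iter().zip(pp).fold(0, |acc, (&c, &t)| add(acc, mul(c, t)))
}
```

For example, with $n = 4$ and $\omega = 16$ (a primitive 4th root of unity in $F_{257}$), `commit(&toy_setup(123, 4), &[5, 7, 11, 13], 16)` returns $p(123)$ for the interpolated $p$. In the real scheme the same inner product is an MSM over $G_1$ and $\tau$ is never known to anyone.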
### Generating the opening proofs
We'd like to prove that a commitment $C$ opens to some subset $s$ of the blob's evaluation points, where each element of $s$ has the form $(\omega^i, b_i)$. Let $m = |s|$. The naive way to do this is to create a KZG opening proof for every element in $s$. This, however, requires $m$ expensive pairing checks. We instead employ a multi-open scheme as outlined below.
$Open(pp, C, s) \rightarrow \pi:$
1. Interpolate a polynomial $r(X)$ through $s$.
2. Construct vanishing polynomial $Z(X) = \prod_{\omega^i \in s} (X - \omega^i)$.
3. Compute quotient polynomial $q(X) = \frac{p(X) - r(X)}{Z(X)}$.
4. Output multi-open proof $\pi = [q(\tau)]_1 = <q_{coeffs}, pp.g_1>$.
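The four steps above can be sketched as follows. This is an illustration under stated assumptions, not the prover from our demo: it works over the toy field $F_{257}$, it stops at the quotient coefficients $q_{coeffs}$ (the real proof is the MSM $[q(\tau)]_1$), and all function names are hypothetical.

```rust
const P: u64 = 257; // toy field, assumption for illustration

fn add(a: u64, b: u64) -> u64 { (a + b) % P }
fn sub(a: u64, b: u64) -> u64 { (a + P - b) % P }
fn mul(a: u64, b: u64) -> u64 { (a * b) % P }
fn pow(mut base: u64, mut exp: u64) -> u64 {
    let mut acc = 1;
    while exp > 0 {
        if exp & 1 == 1 { acc = mul(acc, base); }
        base = mul(base, base);
        exp >>= 1;
    }
    acc
}
fn inv(a: u64) -> u64 { pow(a, P - 2) }

fn eval(poly: &[u64], x: u64) -> u64 {
    poly.iter().rev().fold(0, |acc, &c| add(mul(acc, x), c))
}

/// Multiply a polynomial (coefficients low-to-high) by (X - root).
fn mul_linear(poly: &[u64], root: u64) -> Vec<u64> {
    let mut out = vec![0u64; poly.len() + 1];
    for (i, &c) in poly.iter().enumerate() {
        out[i + 1] = add(out[i + 1], c);
        out[i] = sub(out[i], mul(c, root));
    }
    out
}

fn interpolate(xs: &[u64], ys: &[u64]) -> Vec<u64> {
    let n = xs.len();
    let mut result = vec![0u64; n];
    for i in 0..n {
        let (mut basis, mut denom) = (vec![1u64], 1u64);
        for j in (0..n).filter(|&j| j != i) {
            basis = mul_linear(&basis, xs[j]);
            denom = mul(denom, sub(xs[i], xs[j]));
        }
        let scale = mul(ys[i], inv(denom));
        for (k, &c) in basis.iter().enumerate() {
            result[k] = add(result[k], mul(c, scale));
        }
    }
    result
}

fn poly_sub(a: &[u64], b: &[u64]) -> Vec<u64> {
    (0..a.len().max(b.len()))
        .map(|i| sub(*a.get(i).unwrap_or(&0), *b.get(i).unwrap_or(&0)))
        .collect()
}

/// Long division by a monic divisor; returns (quotient, remainder).
fn div_by_monic(num: &[u64], den: &[u64]) -> (Vec<u64>, Vec<u64>) {
    let mut rem = num.to_vec();
    let d = den.len() - 1;
    if rem.len() <= d { return (vec![0], rem); }
    let mut quot = vec![0u64; rem.len() - d];
    for k in (0..quot.len()).rev() {
        let c = rem[k + d]; // current leading coefficient
        quot[k] = c;
        for (j, &dj) in den.iter().enumerate() {
            rem[k + j] = sub(rem[k + j], mul(c, dj));
        }
    }
    rem.truncate(d);
    (quot, rem)
}

/// Steps 1-3 of Open: returns (r, Z, q), or None if s is inconsistent with p.
fn open(p: &[u64], xs: &[u64], ys: &[u64]) -> Option<(Vec<u64>, Vec<u64>, Vec<u64>)> {
    let r = interpolate(xs, ys);
    let z = xs.iter().fold(vec![1u64], |acc, &x| mul_linear(&acc, x));
    let (q, rem) = div_by_monic(&poly_sub(p, &r), &z);
    if rem.iter().all(|&c| c == 0) { Some((r, z, q)) } else { None }
}
```

The division is exact only when $p$ and $r$ agree on every point of $s$, which is why `open` returns `None` for a claimed value that does not match the blob.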
Our prover implementation is included in the demo below. We do not cover any of the intuition behind KZG's correctness and evaluation binding. For more information, we recommend [Dankrad's explainer](https://dankradfeist.de/ethereum/2020/06/16/kate-polynomial-commitments.html).
### A note on monomial vs Lagrange basis
We use the monomial basis from the trusted setup even though EIP-4844 specifies that $[p(\tau)]_1$ is computed using the Lagrange basis. This is to maintain consistency with operations required in the verifier for multi-opens. Computing $[r(\tau)]_1$ requires the monomial basis in $G_1$. Otherwise $r(X)$ would unnecessarily be treated as a degree-$n$ (blob size) polynomial when it only needs to pass through $m$ (opening size) points. Additionally, computing $[Z(\tau)]_2$ requires the monomial basis in $G_2$.
Using the monomial basis for evaluating $[p(\tau)]_1$ isn't a problem since 1) it would result in the same commitment as reference implementations due to the uniqueness of polynomial interpolation and 2) we have access to enough powers of tau according to the [Ethereum KZG ceremony specs](https://github.com/ethereum/kzg-ceremony-specs/).
## A Halo2 primitive for the KZG multi-open verifier
We implement the verifier in a `KZGChip`, which comes with a `PolyChip` to do polynomial evaluations. They're both [merged into Axiom's halo2 fork](https://github.com/axiom-crypto/halo2-lib/pull/70).
### Executing the verifier in-circuit
The verifier checks whether a given commitment $C$ opens to a claimed $s$. It does so using the logic outlined below.
$Verify(pp, C, \pi, s) \rightarrow \{0, 1\}:$
1. Interpolate $r(X)$ through $s$ outside of the circuit and load $r_{coeffs}$ directly into advice cells. Constrain $r(\omega^i) = b_i \ \forall (\omega^i, b_i) \in s$ in the circuit so provers cannot cheat. Compute $[r(\tau)]_1$ with an MSM $<r_{coeffs}, pp.g_1>$.
2. Construct $Z(X)$ over $s$ outside of the circuit and load $Z_{coeffs}$ directly into advice cells. Constrain $Z(\omega^i) = 0 \ \forall \omega^i \in s$ and $Z(0) = \prod_{\omega^i \in s} (-\omega^i)$ in the circuit so provers cannot cheat. Compute $[Z(\tau)]_2$ with an MSM $<Z_{coeffs}, pp.g_2>$.
3. Check $e(\pi, [Z(\tau)]_2) = e(C - [r(\tau)]_1, H_2)$.
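For readers who want the one-line reason the final check works, here is a short derivation. It assumes an honest proof $\pi = [q(\tau)]_1$ with $q(X) Z(X) = p(X) - r(X)$, and writes $[x]_i$ for $x \cdot H_i$; it does not cover soundness.

$$
\begin{aligned}
e(\pi, [Z(\tau)]_2) &= e(H_1, H_2)^{q(\tau) Z(\tau)} \\
&= e(H_1, H_2)^{p(\tau) - r(\tau)} \\
&= e([p(\tau)]_1 - [r(\tau)]_1, H_2) \\
&= e(C - [r(\tau)]_1, H_2)
\end{aligned}
$$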
### Properly constraining $r(X)$ and $Z(X)$
Interpolating $r(X)$ requires Lagrange interpolation. Constructing $Z(X)$ requires polynomial multiplication. Both are expensive. To avoid incurring the proving cost, we construct the two polynomials outside of the circuit, then check that they are properly constructed inside of it.
We need to take special care to ensure that these polynomials are properly constrained since our interpolation and polynomial multiplication are unconstrained. Think of the "single arrow" (`<--`) in circom. $r(X)$ is a polynomial of degree $m - 1$, so checking it on $m$ points is sufficient. $Z(X)$, however, is a degree $m$ polynomial, so an additional evaluation is necessary. We use $(0, Z(0))$ in our circuit since it saves on $m$ subtractions in $F_r$, but any evaluation point outside of $s$ works as well.
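The checks described above can be sketched outside of a circuit as follows. This is a plain-Rust model over the toy field $F_{257}$, not the `KZGChip` constraints themselves; `vanishing`, `check_r` and `check_z` are hypothetical names, and the length checks stand in for the fact that a circuit fixes the number of coefficient cells.

```rust
const P: u64 = 257; // toy field, assumption for illustration

fn add(a: u64, b: u64) -> u64 { (a + b) % P }
fn sub(a: u64, b: u64) -> u64 { (a + P - b) % P }
fn mul(a: u64, b: u64) -> u64 { (a * b) % P }

/// Horner evaluation; this is the work a polynomial-evaluation gadget does.
fn eval(poly: &[u64], x: u64) -> u64 {
    poly.iter().rev().fold(0, |acc, &c| add(mul(acc, x), c))
}

/// Unconstrained witness generation: Z(X) = prod (X - x_i), low-to-high.
fn vanishing(xs: &[u64]) -> Vec<u64> {
    let mut z = vec![1u64];
    for &x in xs {
        let mut next = vec![0u64; z.len() + 1];
        for (i, &c) in z.iter().enumerate() {
            next[i + 1] = add(next[i + 1], c);
            next[i] = sub(next[i], mul(c, x));
        }
        z = next;
    }
    z
}

/// r has m coefficients, so m evaluations determine it uniquely.
fn check_r(r: &[u64], xs: &[u64], ys: &[u64]) -> bool {
    r.len() == xs.len() && xs.iter().zip(ys).all(|(&x, &y)| eval(r, x) == y)
}

/// Z has m + 1 coefficients: m roots plus the extra point (0, Z(0)).
fn check_z(z: &[u64], xs: &[u64]) -> bool {
    let z0 = xs.iter().fold(1, |acc, &x| mul(acc, sub(0, x)));
    z.len() == xs.len() + 1
        && xs.iter().all(|&x| eval(z, x) == 0)
        && eval(z, 0) == z0
}
```

The extra point matters: a dishonest prover could supply $2 \cdot Z(X)$, which still vanishes on all of $s$, and only the $Z(0)$ check rejects it.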
### Benchmarks for the KZGChip
We ran benchmarks on an M1 Max with 10 cores and 64GB of RAM. The table below shows metrics for blob length $n = 4096$ and opening size $m = 64$.
| | Poly Eval | MSM | Pairing Check |
| ------------------- | --------- | --- | ------------- |
| Num advice cells | 5M | 25M | 9M |
| Circuit compilation | 10s | 59s | 23s |
| Proving time | 11s | 61s | 28s |
| Verification time | 1ms | 1ms | 1ms |
### Optimizing the pairing check
Notice that the pairing check is a heavy component in our circuit. There's [a well-established technique](https://hackmd.io/@benjaminion/bls12-381) for optimizing it that we saw as worthwhile. A pairing has two primary steps: a Miller loop and a final exponentiation. If we define $e'$ as the truncated function that only executes the Miller loop and $x$ as the exponent used in the final exponentiation, we can view our check as
$(e'(\pi, [Z(\tau)]_2))^x = (e'(C - [r(\tau)]_1, H_2))^x$
Since we know 1) how to invert either side by negating any of the inputs to $e'$ and 2) how to multiply two elements in $G_T$ with a multi-Miller loop that's [already implemented](https://github.com/axiom-crypto/halo2-lib/blob/d3d271bfef0b726afe42e6a3317afe676149f838/halo2-ecc/src/bn254/pairing.rs#L492), we can rearrange the check into
$(e'(-\pi, [Z(\tau)]_2) \cdot e'(C - [r(\tau)]_1, H_2))^x = 1$
This rearrangement only requires one final exponentiation, reducing the number of required context cells by 30%. We implemented this efficient pairing check in [PR#65](https://github.com/axiom-crypto/halo2-lib/pull/65/files) and incorporated it into our project.
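Spelled out, the rearrangement uses two properties of the full pairing $e(P, Q) = (e'(P, Q))^x$: negating an input inverts the result, and the final exponentiation distributes over products. The symbols $A$ and $B$ below are shorthand introduced only for this sketch.

$$
\begin{aligned}
A &= e'(\pi, [Z(\tau)]_2), \qquad B = e'(C - [r(\tau)]_1, H_2) \\
A^x = B^x &\iff (A^x)^{-1} \cdot B^x = 1 \\
&\iff (e'(-\pi, [Z(\tau)]_2))^x \cdot B^x = 1 \\
&\iff (e'(-\pi, [Z(\tau)]_2) \cdot B)^x = 1
\end{aligned}
$$

The left-hand side of the last line needs two Miller loops but only a single final exponentiation.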
### What else is this useful for?
The `KZGChip` is a general-purpose primitive that can be used for any construction built on top of KZG. Aside from our immediate use case with proto-danksharding, we're also excited for future work to use the chip for Halo2 composition with KZG-based proof systems (including itself).
## Running our demo repo
You can begin by cloning [our repository](https://github.com/lyronctk/kzg-blob).
```
git clone git@github.com:lyronctk/kzg-blob.git
```
You can run our demo with the following command. It runs through what a sample workflow would look like for a rollup proving transaction inclusion in a blob commitment.
```
cargo run --example demo
```
We first initialize 16 random transactions.
```
[
DemoTx { from: "0xJDp5PvyZPVfOt3YDqabZwi4mYA9LVC4O8Q1NUUaj", to: "0xKUGZ8LOdC2HL6Q6dDNtMb0Arwv9yRk9CL38RrZ4N", gas_limit: 12173, max_fee_per_gas: 126, max_priority_fee_per_gas: 14, nonce: 22, value: 123008058146 },
...,
DemoTx { from: "0x7selZXOPMPraS5guZACuJW4YexSpzVZi6bZxhsES", to: "0xXd225ynUdqKTlmZlFtfVbpVWhBQeoSX1hjTeT4Vj", gas_limit: 16162, max_fee_per_gas: 423, max_priority_fee_per_gas: 14, nonce: 25, value: 62855985777 }
]
```
The transactions are then packed into elements in $F_r$. Each demo transaction can be represented by two elements, so the blob length $n$ in this case is 32.
```
[
0x00006000009dc030000000000000002a000000000000002a00006000009dc000,
...,
0x00000000000001a70000000000003f22000000000000002a000000000000002a
]
```
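As a rough illustration of the packing, the last element shown above is consistent with four 64-bit big-endian limbs, two of which match `max_fee_per_gas` (423) and `gas_limit` (16162) of the last transaction. The sketch below reproduces only that byte layout. Which transaction field maps to which limb, and how the demo actually packs them, are assumptions here, and `pack_limbs` and `to_hex` are hypothetical names.

```rust
/// Pack four u64 limbs into a 32-byte big-endian word.
/// Assumption: the first limb is small enough that the resulting
/// integer stays below the scalar field modulus.
fn pack_limbs(limbs: [u64; 4]) -> [u8; 32] {
    let mut out = [0u8; 32];
    for (i, limb) in limbs.iter().enumerate() {
        out[i * 8..(i + 1) * 8].copy_from_slice(&limb.to_be_bytes());
    }
    out
}

/// Render bytes as a 0x-prefixed lowercase hex string.
fn to_hex(bytes: &[u8]) -> String {
    let mut s = String::from("0x");
    for b in bytes {
        s.push_str(&format!("{:02x}", b));
    }
    s
}
```

Calling `to_hex(&pack_limbs([423, 16162, 42, 42]))` yields the same string as the last element printed above.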
We then commit to the blob data, producing a $C$ that is only a single group element in $G_1$. This is the only item that is stored in the EVM. Cheap!
```
(0x196decccff1936530fe5b88d5e529b9460f056a515342b33bed55a88c740e16c, 0x0c93b3a9f6b70434c0fe06f8f758f6c0eba61583342606e89e51be0b765e85dc)
```
The commitment $C$ anchors transactions in the EVM. At some later date, when the rollup wants to prove a statement about a set of historical transactions $s$, it must first prove that $s$ is consistent with $C$. [TODO when PR is merged]
## What's next?
One limitation with the current chip is that it's specific to the BN254 curve. An immediate next step following this project is adding support for other pairing-friendly curves, most notably BLS12-381 since this is what's officially used for proto-danksharding.
Upcoming use cases will also require concurrent reads from multiple blob commitments. Concretely, this refers to opening a set of commitments $\{C_1, C_2, ..., C_d\}$ at multiple non-overlapping points. This can be done efficiently using [SHPLONK](https://eprint.iacr.org/2020/081.pdf?ref=hackernoon.com).
## Acknowledgements
- [Yi Sun](https://twitter.com/theyisun?lang=en) and [Jonathan Wang](https://twitter.com/jonathanpwang?lang=en) for introducing us to Halo2 and guiding us through their fork. Contributions were made under the [Axiom Open Source Initiative](https://www.axiom.xyz/open-source).
- [Dan Boneh](https://crypto.stanford.edu/~dabo/) for supporting the project as a whole and for walking us through KZG.