# Ethereum zkVM - Proving L1 Blocks with Block-In-Blobs (BIOB)

This document shares findings on the L1 proving overhead when the execution payload is included in blobs, i.e., DASed by CL clients.

## Context

An inherent requirement of re-execution to check the State Transition Function (STF) validity is having full access to the execution payload — this is required since you can’t re-execute if you don’t have, for example, the transactions of a block.

When STF validity proofs are used in the network, propagating the full execution payload isn’t strictly required, since it isn’t needed for proof verification. The claim being proven is that the block with the expected `block_hash` corresponds to a valid state transition. This means the set of zkCL attestors (attestors relying on STF proofs instead of re-execution) doesn’t need to download this data, freeing up bandwidth.

Although downloading the entire payload isn’t required, we still need to ensure the data is recoverable. Said differently, we can view L1 the same way we view L2s.

## Why does this reality affect L1 proving?

Let’s dig a bit deeper into what is going on from the zkCL attestor's perspective.

### Without execution payload data-sampling

In a reality where DA isn’t done for the execution payload, the proof public inputs will contain the `execution_payload_hash` being proven. The zkCL must check (simplifying a bit) that this `execution_payload_hash` corresponds to the execution payload in the CL block. This check is just computing the hash of this data in the CL block and verifying it matches the public input of the proof.

### With execution payload data-sampling

In a reality where DA is done, the zkCL does not have the full execution payload; it only performed DA sampling to assert that this data is available. The public input of the proof won’t contain `execution_payload_hash` but `execution_payload_header_hash`, which is easily checked as mentioned before, since the execution payload **header** is available to the zkCL. But the zkCL still has to check that the DASed blobs claimed to correspond to the execution payload indeed correspond to the execution payload, i.e., the KZG commitments alone can’t give you any information on what this data is about. This is the same problem that zk-L2s must solve.

To solve this problem, we can leverage the same trick used by L2s: a proof of equivalence between the KZG commitment and the block body (private input):

- The KZG commitment of a blob is a commitment to some polynomial `P`.
- The block body must correspond to the same polynomial, but that’s what we want to prove, so let’s call it `P'`.
- We define a random evaluation point `z = Hash(P' || KZG_Commitment)`.
- If `P(z) = P'(z)`, then by the Schwartz–Zippel lemma, both polynomials are equal with overwhelming probability.
- `P(z)` is obtained from the host, which provides a `KZG_Proof` of the evaluation of `P` at `z`. This proof is generated outside the guest program and passed as private input; the guest only verifies it.
- `P'(z)` is computed using the barycentric formula. This work doesn’t require heavy cryptography like pairings, and is done inside the guest program.
- This establishes that the provided (private input) block body matches the committed blob.

Note that hashing `P' || KZG_Commitment` binds the challenge to both polynomials, so neither the blob nor the private input can be chosen adversarially after `z` is fixed.
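To make the `P'(z)` side concrete, below is a minimal, self-contained Python sketch of the barycentric evaluation over the BLS12-381 scalar field. It is illustrative only: it uses a toy domain of `N = 8` points instead of the real 4096-point blob domain, skips the bit-reversal permutation that real blob codecs apply, and fakes the commitment bytes when deriving the challenge, since there is no real KZG setup here.

```python
import hashlib

# BLS12-381 scalar field modulus (the field blob elements live in).
R = 52435875175126190479447740508185965837690552500527637822603658699938581184513
N = 8  # toy domain size; real blobs use N = 4096
OMEGA = pow(7, (R - 1) // N, R)  # primitive N-th root of unity (7 generates Fr*)

def eval_from_coeffs(coeffs, z):
    """Horner evaluation of P(z) from coefficients (what the commitment fixes)."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * z + c) % R
    return acc

def eval_barycentric(evals, z):
    """Evaluate P'(z) from evaluations d_i = P'(omega^i) only, via
       P'(z) = (z^N - 1)/N * sum_i d_i * omega^i / (z - omega^i)."""
    total = 0
    for i, d in enumerate(evals):
        w = pow(OMEGA, i, R)
        total = (total + d * w * pow((z - w) % R, R - 2, R)) % R  # Fermat inverse
    factor = (pow(z, N, R) - 1) * pow(N, R - 2, R) % R
    return factor * total % R

# The "blob": evaluations of some polynomial over the domain.
coeffs = [3, 1, 4, 1, 5, 9, 2, 6]
evals = [eval_from_coeffs(coeffs, pow(OMEGA, i, R)) for i in range(N)]

# Fiat-Shamir-style challenge z = Hash(P' || KZG_Commitment); the commitment
# bytes are faked here since this sketch has no real trusted setup.
fake_commitment = b"\x01" * 48
z = int.from_bytes(hashlib.sha256(bytes(coeffs) + fake_commitment).digest(), "big") % R

# In the guest, the left-hand side would come from verifying the host's
# KZG_Proof; here we evaluate both sides directly to show they agree.
assert eval_barycentric(evals, z) == eval_from_coeffs(coeffs, z)
print("P'(z) via barycentric formula matches P(z)")
```

In the actual guest, `P(z)` would instead come from verifying the host-provided `KZG_Proof` against the commitment, so the guest never needs the polynomial behind the commitment itself.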
## Experiment run

We ran an experiment proving L1 blocks as they would look in a future reality where the execution payload is stored in blobs and the extra checks described above are applied.

Additionally, we considered adding both uncompressed and compressed blobs of the execution payload to explore the relationship between saved blobs and decompression overhead in the guest program. This allows us to compare three scenarios:

- `none` proves the STF without the execution payload in blobs, i.e., the baseline scenario.
- `raw` proves: i) the proof-of-equivalence check, ii) the STF as usual.
- `snappy` proves: i) the proof-of-equivalence check, ii) decompression using snappy, since we need the block body for the next step, iii) the STF as usual. Snappy was chosen since ELs already use it (not for the STF, but in other layers of their codebases).

All runs were performed over 30 consecutive mainnet blocks in the range [23326233, 23326262].

### A peek at the execution payload compression outputs

Before even proving the described scenarios, we can already analyze the impact of snappy compression on the blob count compared to the uncompressed flavor.

```text
Fixture                Raw Size   Compressed   Ratio   Raw Blobs   Snappy Blobs   Saved
----------------------------------------------------------------------------------------
rpc_block_23326239      255,304      103,516   40.5%           3              1       2
rpc_block_23326252      199,023       82,226   41.3%           2              1       1
rpc_block_23326236      131,001       53,682   41.0%           2              1       1
rpc_block_23326235      127,529       52,499   41.2%           2              1       1
rpc_block_23326234      213,452       87,860   41.2%           2              1       1
rpc_block_23326247      227,735       92,260   40.5%           2              1       1
rpc_block_23326255      232,799       93,425   40.1%           2              1       1
rpc_block_23326257      144,435       59,602   41.3%           2              1       1
rpc_block_23326259      149,271       58,465   39.2%           2              1       1
rpc_block_23326251      151,297       57,274   37.9%           2              1       1
rpc_block_23326246      191,138       75,751   39.6%           2              1       1
rpc_block_23326244      164,730       67,094   40.7%           2              1       1
rpc_block_23326262      152,801       59,842   39.2%           2              1       1
rpc_block_23326261      250,433       97,759   39.0%           2              1       1
rpc_block_23326250      220,775       84,135   38.1%           2              1       1
rpc_block_23326241      194,114       76,954   39.6%           2              1       1
rpc_block_23326238      217,842       86,972   39.9%           2              1       1
rpc_block_23326248      141,985       55,988   39.4%           2              1       1
rpc_block_23326256      189,247       77,546   41.0%           2              1       1
rpc_block_23326245      129,039       52,268   40.5%           2              1       1
rpc_block_23326243      123,234       50,033   40.6%           1              1       0
rpc_block_23326253       73,301       30,557   41.7%           1              1       0
rpc_block_23326242      114,254       50,488   44.2%           1              1       0
rpc_block_23326237       92,871       40,632   43.8%           1              1       0
rpc_block_23326240       88,082       35,435   40.2%           1              1       0
rpc_block_23326258      122,815       43,551   35.5%           1              1       0
rpc_block_23326249       77,448       33,054   42.7%           1              1       0
rpc_block_23326260       86,756       36,621   42.2%           1              1       0
rpc_block_23326254      116,775       49,428   42.3%           1              1       0
rpc_block_23326233       74,732       29,442   39.4%           1              1       0
----------------------------------------------------------------------------------------
TOTAL (30 fixtures)   4,654,218    1,874,359   40.3%          51             30      21
```

Some notes on the output above:

- The block body is serialized with [bincode](https://docs.rs/bincode/latest/bincode/), not RLP. This is done for efficiency reasons, since deserializing bincode is faster than RLP. This corresponds to the `Raw Size` column.
- The `Compressed` and `Ratio` columns correspond to the result of compressing it with Snappy (`Ratio` is the compressed size as a percentage of the raw size).
- `Raw Blobs` and `Snappy Blobs` correspond to the number of blobs required to fit them. The transformation of block body bytes into blobs is done by mapping 31-byte chunks into the corresponding field elements; 31 bytes are used per field element to avoid having elements bigger than the modulus (see the sketch after this list).
- Example interpretations:
    - For block 23326239, the uncompressed block body spans 3 blobs, whereas the compressed one spans only 1.
    - For block 23326233, the uncompressed and compressed variants both fit in a single blob.
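As a sanity check, the blob counts follow directly from the byte sizes. A minimal sketch, assuming the EIP-4844 blob shape (4096 field elements per blob) and the 31-usable-bytes-per-field-element packing described above:

```python
# Sanity check on the blob accounting in the table above.
FIELD_ELEMENT_BYTES = 31            # usable bytes per field element
FIELD_ELEMENTS_PER_BLOB = 4096      # EIP-4844 blob shape
USABLE_BYTES_PER_BLOB = FIELD_ELEMENT_BYTES * FIELD_ELEMENTS_PER_BLOB  # 126,976

def blobs_needed(num_bytes: int) -> int:
    """Ceiling division: how many blobs are needed to fit `num_bytes`."""
    return -(-num_bytes // USABLE_BYTES_PER_BLOB)

# Reproduce the rows for blocks 23326239 and 23326233 from the table.
assert blobs_needed(255_304) == 3 and blobs_needed(103_516) == 1
assert blobs_needed(74_732) == 1 and blobs_needed(29_442) == 1
```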
To reiterate, the tension being explored is the relationship between the blobs saved by compression and the decompression overhead in the guest program.

### Host, Guest and Prover configurations

The measurements were done on a prover machine with a single L40 GPU — this should be okay, since we are interested in relative proving times.

The following are code pointers to the host, guest, and prover:

- Host:
    - The block body is passed as an [independent field from the rest of the block](https://github.com/eth-act/zkevm-benchmark-workload/blob/67435e99e0332394de7b16dd85af0ef04de84217/ere-guests/stateless-validator/reth/io/src/lib.rs#L24-L25). This is required since raw bytes are passed depending on whether we want the compressed or uncompressed variant.
    - The [block body](https://github.com/eth-act/zkevm-benchmark-workload/blob/67435e99e0332394de7b16dd85af0ef04de84217/ere-guests/libs/src/blobs.rs#L10-L15) representation covers the new variants and contains an optional KZG proof as private input, since the guest program supports the three variants `none`, `raw`, and `snappy`.
    - The [block body compression analysis tool code](https://github.com/eth-act/zkevm-benchmark-workload/blob/67435e99e0332394de7b16dd85af0ef04de84217/crates/benchmark-runner/src/stateless_validator.rs#L49-L115).
- Guest:
    - Contains new logic to [properly check the proof of equivalence and deserialize the potentially compressed block body](https://github.com/eth-act/zkevm-benchmark-workload/blob/67435e99e0332394de7b16dd85af0ef04de84217/ere-guests/stateless-validator/reth/guest/src/guest.rs#L36-L68) (a simplified sketch of this flow is shown after the results below).
- Prover setup:
    - We use Risc0 for the zkVM, since it is the only zkVM with a patched `c-kzg` precompile crate that already contains convenient methods for proof-of-equivalence checks.
    - Other zkVMs could also be used, but they don’t have convenient zkVM-patched KZG crates with the proof of equivalence already implemented. In theory, it shouldn’t require much work, since the barycentric formula isn’t hard to implement, and the KZG proof check should naturally align with support for the EVM point evaluation precompile.

### Results

The following are the collected wall-clock proving times:

![image](https://hackmd.io/_uploads/rJpA17JfZg.png)
*Open the image in a separate tab to see it bigger.*

The above image is a high-level overview of the proving time per block for the baseline (no block body in blobs), the uncompressed block body in blobs, and the snappy-compressed block body in blobs. At a glance, snappy compression pays for itself — the decompression overhead looks negligible.

To understand this better, let’s group the blocks by how many blobs snappy compression saved:

![image](https://hackmd.io/_uploads/HkjklQ1Gbg.png)
*Open the image in a separate tab to see it bigger.*

Interpretation:

- The overhead of the blob-checking variants against the baseline is in the double-digit percentage range.
- The high variance is mostly related to the relative cost of the proof-of-equivalence evaluation versus the corresponding block gas limit (which is independent of block body size).
- For blocks where snappy compression didn’t save any blobs (x-axis = 0), uncompressed and compressed proving times are almost the same. This suggests snappy decompression isn’t a significant overhead compared to the rest of the guest program logic.
- For blocks where snappy compression saved one blob (x-axis = 1) or two blobs (x-axis = 2), proving times are clearly lower than in the uncompressed case, which makes sense since one (or two) blob proof-of-equivalence checks are saved. This is expected, since the proof of equivalence should be linear in the number of blobs, dominated by the barycentric evaluation of the block body. (Caveat for x-axis = 2: only one of the 30 blocks falls in this group.)
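For orientation, here is a simplified, hypothetical sketch of the guest-side flow across the three variants. The helper functions are illustrative stand-ins injected as callables; none of these names come from the actual (Rust) guest code linked above.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class GuestInput:
    variant: str                       # "none" | "raw" | "snappy"
    block_body_bytes: bytes            # bincode bytes, possibly snappy-compressed
    kzg_commitments: list              # blob commitments taken from the CL block
    kzg_proofs: Optional[list] = None  # evaluation proofs (private input), blob variants only

def run_guest(inp: GuestInput, verify_equivalence, snappy_decompress, decode_and_run_stf):
    """Dispatch over the three benchmarked variants (helpers injected as callables)."""
    body = inp.block_body_bytes
    if inp.variant != "none":
        # Bind the private body bytes to the committed blobs
        # (the proof of equivalence sketched earlier).
        verify_equivalence(body, inp.kzg_commitments, inp.kzg_proofs)
    if inp.variant == "snappy":
        # Recover the bincode-serialized body before decoding.
        body = snappy_decompress(body)
    # Deserialize and re-execute the block exactly as in the baseline scenario.
    return decode_and_run_stf(body)
```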
## Potential future explorations

Since we are still in the early days of exploring zkVMs for L1 with the block body in blobs, the following is worth keeping in mind as potential future work:

- zkVMs haven’t optimized the workload related to this feature at all, so most numbers should be taken with a grain of salt. If they add a proof-of-equivalence zkVM precompile that does the barycentric formula evaluation and internally leverages the already existing point-evaluation precompile, the overhead could be lower.
- A benchmark could be run on a bigger set of blocks. Instead of taking consecutive blocks, they could be hand-picked to include mainnet cases with bigger block bodies.
- Since the overhead doesn’t depend on the block body contents, we could also avoid hand-picking mainnet blocks and use synthetic data to measure raw proof-of-equivalence performance for a theoretical N blobs, although we already expect the cost to be linear.
- The pairing-check work for the KZG proof verification could be removed from the guest program by making the barycentric evaluation a public input and having the host evaluate it (h/t to Kev for suggesting this).
- The snappy compression could be applied dynamically, only when it saves at least one blob, so no decompression overhead is mandatory when there is no gain from compressing. This might not be worth it, since the data suggests the overhead is small.
- We could consider introducing other compression algorithms, such as zstd, to explore further compression ratios and decompression overheads.