
Offloading Proof Computation from Beacon Nodes to Transaction Sender

TL;DR

  • With EIP-4844, transaction senders compute KZG proofs and send them along with the blob transactions.
  • Under the current PeerDAS design, verifying samples requires KZG proofs at the sample (cell) level, which are currently computed in the CL to avoid leaking DAS cryptography to the EL. However, this creates a computation bottleneck and limits scalability.
  • The proposed solution shifts this computation to transaction senders, so they compute KZG proofs for individual cells instead of per blob, removing the need for nodes to compute any KZG proofs in the block production window.

Intro

[Image not available] (source: Dune)

We need more blobs today, and even more tomorrow.

Can the current form of PeerDAS scale to the theoretical limit of 1D PeerDAS (64 to 72 blobs max)? The short answer: unlikely.

There are some known bottlenecks, and we're not even sure if scaling to 32 blobs is safe (more on this below). If we start with a conservative increase in Fusaka, say 16-18 blobs max, we'd have to wait for the G fork (2026-2027) to increase it again, unless we have BPO forks that let us change the blob count between forks. That's likely too late and reduces the value of pushing PeerDAS now.

In this post, we'll explore a solution that lets us scale the blob count further (possibly to the theoretical limit of 1D PeerDAS) and more safely, and present a case for shipping it in the Fusaka fork. This solution was discussed during Devcon, and Francesco advocated including it with PeerDAS.

Problem

Today on mainnet (Deneb), sending a blob transaction (EIP-4844) requires the transaction sender to include a computed KZG commitment and proof as part of the raw transaction. The commitment ensures fake blob data cannot be substituted, and the blob proofs are validated both in the execution layer (EL) when the transaction enters the mempool, and in the consensus layer (CL) when the blobs are transmitted across the network.

With the PeerDAS upgrade, the network form of blob data changes in the consensus layer. Instead of Blobs, blob data will be sent in the form of DataColumns, which are made up of Cells, with a proof accompanying each cell[1]. Nodes would use these cell proofs (instead of the blob KZG proofs from Deneb) to verify all transmitted cells against the KZG commitments.

In the current specification, these cell KZG proofs are computed by the proposer during block production. Computing these proofs is quite expensive: ~150ms for each blob on a single thread, although the work can be parallelised across blobs. This has been a known bottleneck for a while, and there have been new optimisations to bring this number down, including optimising KZG library performance and distributed blob building.

These optimisations will allow us to scale safely to some limited extent without increasing full node hardware and bandwidth requirements. However, the proof computation time increases as the blob count increases, and regardless of who computes the proofs (the proposer or another, more powerful node), someone has to compute them in the 4 second block proposal critical path. This limits how far we can scale with PeerDAS in its current form, and going for the theoretical limit of 64-72 blobs may be a bit risky.

The table below shows estimated proof computation time, based on benchmarks, for various blob counts and CPU thread counts. Note that we use "available CPU threads" here because the node would also be spending cycles on other tasks within the CL, and potentially on an EL and other processes running on the same machine.

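| Blob Count \ Available CPU Threads | 8      | 16    | 32    |
|------------------------------------|--------|-------|-------|
| 16                                 | 300ms  | 150ms | 150ms |
| 32                                 | 600ms  | 300ms | 150ms |
| 64                                 | 1200ms | 600ms | 300ms |
| 72                                 | 1350ms | 750ms | 450ms |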

As you can see above, for 32 blobs, even a powerful node (with 16 available threads) that contributes to proof building would still take 300ms to compute the proofs, and it also has to publish them to its mesh peers.
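The numbers in the table are consistent with a simple "rounds" model: each available thread handles one blob at a time at ~150ms per blob, so the total time is the number of rounds times 150ms. The sketch below is only an illustration of that assumed model (the function name and the model itself are mine, not taken from the benchmarks), but it reproduces every cell in the table.

```python
import math

# Single-thread estimate quoted in this post.
PROOF_TIME_PER_BLOB_MS = 150


def estimated_proof_time_ms(blob_count: int, available_threads: int) -> int:
    """Assumed model: one blob per thread at a time, processed in rounds."""
    rounds = math.ceil(blob_count / available_threads)
    return rounds * PROOF_TIME_PER_BLOB_MS


if __name__ == "__main__":
    thread_counts = (8, 16, 32)
    for blobs in (16, 32, 64, 72):
        row = [estimated_proof_time_ms(blobs, t) for t in thread_counts]
        print(blobs, row)
```

Under this model the cost grows linearly with blob count for a fixed thread budget, which is why the bottleneck gets worse as we approach 64-72 blobs.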

Possible Solutions

There are a few possible solutions that would allow scaling blobs to a higher degree:

  1. Distributed Blob Building that allows computing a subset of all blobs[2].
  2. Offload Proof Computation to Transaction Sender.

The main downside of Option 1 is complexity and the substantial changes required to the existing PeerDAS implementation.

Option 2 was discussed during the R&D workshop at Devcon 2024 and in multiple CL breakout calls afterward, with agreement from all participants. As a next step, we'd like more eyes on this proposal - especially from the EL teams, who would be involved in implementing some of the changes.

How it works

Instead of computing the cell KZG proofs in the CL during block production, we have the transaction sender compute the cell proofs and send them along with the transaction, similar to how it's done in EIP-4844.

Sequence (Blob Sender, Execution Client, Beacon Node):

  1. Blob Sender: compute blob cell proofs and KZG commitments
  2. Blob Sender → Execution Client: send signed EIP-4844 transaction
  3. Execution Client: verify tx, cell proofs and add to blob mempool
  4. Beacon Node: block production begins
  5. Beacon Node → Execution Client: get blobs bundle
  6. Execution Client → Beacon Node: returns blobs, commitments, and cell proofs
  7. Beacon Node: "extend" blobs, assemble data columns and publish

Replacing the unused blob KZG proofs with cell KZG proofs across all layers is probably a worthy cleanup too, as they will no longer be used in the CL.
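To get a feel for what swapping one blob proof for a full set of cell proofs means for transaction size, here is a rough back-of-the-envelope sketch. The constants are not stated in this post; they are my understanding of the EIP-4844 and PeerDAS parameters (4096 field elements of 32 bytes per blob, 48-byte KZG proofs, 128 cells per extended blob), so treat them as assumptions.

```python
# Assumed parameters (from EIP-4844 / PeerDAS as I understand them, not from this post).
BYTES_PER_BLOB = 4096 * 32   # 131072 bytes per blob
BYTES_PER_PROOF = 48         # size of one KZG proof
CELLS_PER_EXT_BLOB = 128     # cells (and cell proofs) per extended blob


def proof_bytes_per_blob(use_cell_proofs: bool) -> int:
    """Bytes of proofs carried alongside one blob in the transaction wrapper."""
    proof_count = CELLS_PER_EXT_BLOB if use_cell_proofs else 1
    return proof_count * BYTES_PER_PROOF


before = proof_bytes_per_blob(use_cell_proofs=False)
after = proof_bytes_per_blob(use_cell_proofs=True)
overhead = (after - before) / BYTES_PER_BLOB

if __name__ == "__main__":
    print(f"proof bytes per blob: {before} -> {after}")
    print(f"extra bytes relative to blob size: {overhead:.1%}")
```

Under these assumptions the extra proof data is a few percent of the blob itself, which is small next to the cost of computing the proofs in the block production window.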

Limitations

Initially this was not widely accepted because it leaks DAS cryptography into the EL and would create a coupling that potentially makes future changes harder (e.g. cell size reduction, encoding changes, etc.). However, despite requiring changes in multiple layers, this is arguably a simpler and necessary change, as it doesn't require an extensive amount of optimisation. Longer term, I suspect the EL may end up having knowledge of DAS cryptography anyway if we implement a vertically sharded blob mempool.

High Level List of Required Changes

See appendix section below.

Q&A

It's not 100% clear to me from the proposal, but if we shift the proof computation from the CL to the tx sender, is 72 blobs as the new max realistic, or is there another limitation that wouldn't allow us to reach that?

Based on what I know so far, I believe it's realistic. The main known bottlenecks are proof computation time and the bandwidth required to propagate blobs during block proposal. This proposal eliminates the proof computation bottleneck during block production, and distributed blob building has been quite effective at solving the latter in testing so far.

We expect an increase in EL bandwidth as the blob count increases, but it is relatively small compared to the bandwidth usage on the CL (gossip is bandwidth intensive), and I believe the potential optimisations on CL gossip could offset the increase in EL bandwidth. Some numbers here: https://blog.sigmaprime.io/peerdas-distributed-blob-building.html#impact-on-node-operators

The 64-72 max blob count remains a theoretical limit until we have a version to test with; I've only tested up to 32 blobs. AFAIK there are no other known bottlenecks on the CL side, but there are possible limitations:

  • Unexpected bottlenecks in the EL with a max blob count of 64+. Are there any limitations with the blob mempool?
  • Distributed Blob Building only works well for public blob transactions, which is what most blob transactions are today. We might need to think about a solution to cover private transactions, but this doesn't seem to be an immediate blocker.

Conclusion

If there are no major issues with this approach, the goal is to finalise the spec across all layers ASAP so client teams can start working on an implementation. We'll also need teams to help drive spec changes listed below.

This could be the final spec change needed to ship the first iteration of PeerDAS - hopefully in 2025. 🚀

Thanks for reading!

Appendix 1: High Level List of Required Changes

EIP

Update EIP-7594: include cell proofs in network wrapper of blob txs #9378

EL Changes

  • RPC API: Replace blob KZG proofs with cell KZG proofs in the EIP-4844 transaction.
  • Mempool and data distribution: See Francesco's post here: https://hackmd.io/@fradamt/mempool-change
  • Engine API:
    • getPayloadV5: changes BlobsBundle to include cell KZG proofs instead of blob KZG proofs.
    • getBlobsV2: changes to return list of blobs and cell KZG proofs (instead of blob KZG proofs).

CL changes

  • Update integration with EL getBlobsBundle and EL getBlobs
  • Update code to assemble DataColumnSidecar:
    • Extend blobs
    • Use cell proofs from EL (no computation)
  • GetBlobSidecar API
    • We no longer have the blob KZG proofs, may need a v2 to return cell proofs.
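The "assemble DataColumnSidecar" step above is essentially a transpose: column j collects cell j and cell proof j from every blob. The sketch below illustrates that reshaping only. `DataColumn` is a hypothetical, simplified stand-in for the real sidecar type, the flat blob-major proof layout is an assumption about how the EL might return proofs, and the value 128 for cells per extended blob is my understanding of the PeerDAS parameter rather than something stated in this post. The cells are assumed to come from extending the blobs without computing proofs.

```python
from dataclasses import dataclass
from typing import List

CELLS_PER_EXT_BLOB = 128  # assumed PeerDAS parameter


@dataclass
class DataColumn:
    """Hypothetical, simplified stand-in for a data column sidecar."""
    index: int
    cells: List[bytes]   # one cell per blob, in blob order
    proofs: List[bytes]  # one cell proof per blob, in blob order


def assemble_columns(
    cells_per_blob: List[List[bytes]],
    flat_cell_proofs: List[bytes],
) -> List[DataColumn]:
    """Transpose per-blob cells and sender-provided proofs into columns."""
    blob_count = len(cells_per_blob)
    if any(len(cells) != CELLS_PER_EXT_BLOB for cells in cells_per_blob):
        raise ValueError("each extended blob must have CELLS_PER_EXT_BLOB cells")
    if len(flat_cell_proofs) != blob_count * CELLS_PER_EXT_BLOB:
        raise ValueError("expected CELLS_PER_EXT_BLOB proofs per blob")

    columns = []
    for col in range(CELLS_PER_EXT_BLOB):
        columns.append(
            DataColumn(
                index=col,
                cells=[cells_per_blob[b][col] for b in range(blob_count)],
                # Assumed layout: proofs are stored blob by blob.
                proofs=[
                    flat_cell_proofs[b * CELLS_PER_EXT_BLOB + col]
                    for b in range(blob_count)
                ],
            )
        )
    return columns
```

Note that nothing here touches KZG: with the proofs supplied by the transaction sender, the CL only has to extend blobs and rearrange data.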

KZG libraries

Changes to support the above operations:

  • method for tx sender to compute cell proofs: computeProofsWithoutExtendingBlob
  • method for EL to verify blob and extended cell proofs: verifyBlobWithExtendedProof
  • method for CL to extend blobs without computing proofs
    (Thanks @Kev for the inputs 🙏)

Appendix 2: Distributed Blob Building Refresher

Here's a diagram illustrating the computation and bandwidth bottleneck in PeerDAS:

Timeline (Proposer with 8 CPU cores, Peers):

  • t=1000ms: Proposer has produced the block and starts computing data columns for 32 blobs
  • t=1000ms: Proposer → Peers: publish beacon block
  • t=1500ms: beacon block sent
  • t=1900ms: Proposer has computed data columns and proofs
  • t=1900ms: Proposer → Peers: publish data columns for 32 blobs (8MB to 8 mesh peers, *in parallel*)
  • t=4000ms: attestation deadline, block not available yet
  • t=10000ms+: all data columns (64MB) sent, but too late ☠️

Distributed blob building was proposed to alleviate these problems and distribute the computation and data propagation work across more powerful nodes in the network:

Timeline (Proposer with 8 CPU cores, Supernode Peer with 32 CPU cores, Other nodes):

  • t=1000ms: Proposer has produced the block and starts computing data columns for 32 blobs
  • t=1000ms: Proposer: publish beacon block
  • t=1500ms: beacon block sent
  • t=1500ms: Supernode Peer fetches 32 blobs from EL and starts computing columns
  • t=1800ms: Supernode Peer has computed all data columns, block available 🎉
  • all supernodes publish *some* data columns
  • t=1900ms: Proposer has computed data columns and starts publishing too
  • t=~2500ms: Other nodes have received all columns, block available 🎉

Note: numbers are estimates only, based on benchmarks and tests conducted earlier with 16 blobs. They assume 50 Mbps upload bandwidth, 8 cores with 6 available cores for proof computation, and 8 mesh peers.
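The "too late" outcome in the first diagram follows from simple arithmetic on the stated assumptions. The sketch below is a rough estimate only: it ignores protocol overhead, latency and compression, and the function name is mine.

```python
ATTESTATION_DEADLINE_MS = 4000
PUBLISH_START_MS = 1900      # proposer finishes computing columns and proofs


def upload_time_ms(total_megabytes: int, upload_mbps: int) -> float:
    """Time to push `total_megabytes` through a link of `upload_mbps` megabits/s."""
    megabits = total_megabytes * 8
    return megabits * 1000 / upload_mbps


columns_mb = 8               # data columns for 32 blobs, per the diagram
mesh_peers = 8
total_mb = columns_mb * mesh_peers          # 64 MB sent in total

transfer_ms = upload_time_ms(total_mb, upload_mbps=50)
finish_ms = PUBLISH_START_MS + transfer_ms

if __name__ == "__main__":
    print(f"transfer takes ~{transfer_ms:.0f}ms, finishing at t=~{finish_ms:.0f}ms")
    print("misses attestation deadline:", finish_ms > ATTESTATION_DEADLINE_MS)
```

With these inputs the transfer alone takes longer than 10 seconds, which matches the "t=10000ms+" entry in the diagram and is why spreading publication across supernodes matters.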


  1. The network form of blob data changes from Blobs to Columns, which are made up of cells. [image not available] ↩︎

  2. This approach changes data column gossip to allow transmitting a subset of cells and proofs. This will enable a more efficient and more effective form of distributed blob building, where nodes can compute and distribute cell proofs without having all the blobs. (Currently the network form DataColumnSidecar requires all blobs to compute.) ↩︎