# Offloading Proof Computation from Beacon Nodes to Transaction Sender
**TL;DR**
- With EIP-4844, transaction senders compute KZG proofs and send them along with the blob transactions.
- Under the current PeerDAS design, verifying samples requires KZG proofs at the sample (cell) level, which are currently computed in the CL to avoid leaking DAS cryptography to the EL. However, this creates a computation bottleneck and limits scalability.
- The proposed solution shifts this computation to transaction senders, so they compute KZG proofs for individual cells instead of per blob, removing the need for nodes to compute any KZG proofs in the block production window.
## Intro

*(source: [Dune](https://dune.com/hildobby/blobs))*
We need more blobs today, and even more tomorrow.
Can the current form of PeerDAS scale to the theoretical limit of 1D PeerDAS - **64 to 72 blobs max?** The short answer is **unlikely**.
There are some known bottlenecks, and we're not even sure if scaling to 32 blobs is safe (more on this below). If we start with a conservative increase in Fusaka - say 16-18 blobs max - we'd have to wait for the G fork (2026-2027) to increase it again. That's likely too late and reduces the value of pushing PeerDAS now.
*(unless we have [BPO forks](https://ethereum-magicians.org/t/blob-parameter-only-bpo-forks/22623) to allow us to change blob count between forks)*
In this post, we'll explore a solution that enables us to scale blob count further (possibly up to the theoretical limit of 1D PeerDAS) and more safely, and present a case for shipping it in the Fusaka fork. This solution was discussed during Devcon, and Francesco advocated including it with PeerDAS.
## Problem
Today on mainnet (Deneb), sending a blob transaction (EIP-4844) requires the transaction sender to include a computed KZG commitment and proof as part of the raw transaction. The commitment ensures fake blob data cannot be substituted, and the blob proofs are validated both in the execution layer (EL), when the transaction enters the mempool, and in the consensus layer (CL), when the blobs are transmitted across the network.
With the PeerDAS upgrade, the network form of blob data changes in the consensus layer - instead of `Blobs`, blob data will be sent in the form of `DataColumns` made up of `Cells`, with a proof accompanying each cell[^1]. Nodes will use these proofs (instead of the blob KZG proofs from Deneb) to verify all transmitted cells against the KZG commitments.
In the current specification, these cell KZG proofs are computed by the proposer during block production. Computing these proofs is quite expensive: **~150ms** per blob on a single thread, although it can be parallelised. This has been a known bottleneck for a while, and there have been new optimisations to bring this number down, including optimising KZG library performance and [distributed blob building](#Appendix-2-Distributed-Blob-Building-Refresher).
This will allow us to safely scale to some limited extent without increasing full node hardware and bandwidth requirements. However, the proof computation time grows with the blob count, and **regardless of who computes the proofs, the proposer or another more powerful node, *someone* has to compute them in the 4 second block proposal critical path**. This limits how far we can scale PeerDAS in its current form, and going for the theoretical limit of 64-72 blobs may be a bit risky.
The table below shows the estimated **Proof Computation Time**, based on benchmarks, for various blob counts and CPU thread counts. Note that we use "available CPU threads" here because the node will also be spending cycles on other tasks within the CL, and potentially on an EL and other processes running on the same machine.
| Blob Count \ **Available** CPU Threads | 8 | 16 | 32 |
| -------------------------------------- | ------ | ----- | ----- |
| 16 | 300ms | 150ms | 150ms |
| 32 | 600ms | 300ms | 150ms |
| 64 | 1200ms | 600ms | 300ms |
| 72 | 1350ms | 750ms | 450ms |
As you can see above, in the best case for 32 blobs, a powerful node (with 16 available threads) that contributes to proof building would still take 300ms to compute the proofs, and it also has to publish them to its mesh peers.
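
The figures in the table are consistent with a simple model: each thread works on one blob at a time, so the wall-clock time is the number of "rounds" multiplied by the ~150ms per-blob cost. The sketch below is only an illustration of that model; the function name and the one-blob-per-thread assumption are ours, not part of any client or spec.

```python
import math

# Single-thread proof computation cost per blob, as quoted in the text above.
PER_BLOB_PROOF_MS = 150


def estimate_proof_time_ms(blob_count: int, available_threads: int,
                           per_blob_ms: int = PER_BLOB_PROOF_MS) -> int:
    """Estimate wall-clock proof time, assuming one blob per thread per round.

    This is a hypothetical helper for illustration; real timings depend on
    the KZG library, the CPU, and whatever else the node is doing.
    """
    if blob_count < 0 or available_threads < 1:
        raise ValueError("blob_count must be >= 0 and available_threads >= 1")
    rounds = math.ceil(blob_count / available_threads)
    return rounds * per_blob_ms


if __name__ == "__main__":
    # Reprint the table rows: blob count, then times for 8, 16 and 32 threads.
    for blobs in (16, 32, 64, 72):
        print(blobs, [estimate_proof_time_ms(blobs, t) for t in (8, 16, 32)])
```

Under this model the cost never drops below one round (150ms), no matter how many threads are available, which is why adding hardware alone does not remove the proof computation from the critical path.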
## Possible Solutions
There are a few possible solutions that would allow scaling blobs to a higher degree:
1. **Distributed Blob Building that allows computing a subset of all blobs**[^2].
2. **Offload Proof Computation to Transaction Sender**.
The main downside of Option 1 is complexity and the substantial changes required to the existing PeerDAS implementation.
**Option 2** was discussed during the R&D workshop at Devcon 2024 and in multiple CL breakout calls afterward, with agreement from all participants. As a next step, we'd like more eyes on this proposal - especially from the EL teams, who would be involved in implementing some of the changes.
### How it works
Instead of computing the cell KZG proofs in the CL during block production, we have the transaction sender compute the cell proofs and send them along with the transaction, similar to how it's done in EIP-4844.
```mermaid
sequenceDiagram
participant sender as Blob Sender
participant EL as Execution Client
participant CL as Beacon Node
Note over sender: compute blob cell proofs and KZG commitments
sender->>EL: send signed EIP-4844 transaction
Note over EL: verify tx, cell proofs and add to blob mempool
Note over CL: block production begins
CL->>EL: get blobs bundle
EL->>CL: returns blobs, commitments, and cell proofs
Note over CL: "extend" blobs, assemble data columns and publish
```
Replacing the unused blob KZG proofs with cell KZG proofs across all layers is probably a worthy cleanup too, as they will no longer be used in the CL.
## Limitations
Initially this was not widely accepted because it leaks DAS cryptography into the EL, and would result in a coupling that potentially makes future changes harder (e.g. cell size reduction, encoding changes etc). However, despite requiring changes in multiple layers, this is arguably a simpler and necessary change, as it doesn't require an extensive amount of optimisation. Longer term, I suspect the EL *may* end up having knowledge of DAS cryptography anyway if we end up implementing a vertically sharded blob mempool.
## High Level List of Required Changes
See [appendix section below](#Appendix-1-High-Level-List-of-Required-Changes).
## Q&A
> It's not 100% clear to me from the proposal, but if we shift the proof computation from the CL to the tx sender, are 72 blobs as new max realistic or is there another limitation that wouldn't allow us to reach that?
Based on what I know so far, I believe it's realistic - the main known bottlenecks are proof computation time and bandwidth requirements to propagate blobs during block proposal. This new proposal eliminates the proof computation bottleneck during block production, and distributed blob building is quite effective at solving the latter based on the testing so far.
We expect an increase in EL bandwidth as the blob count increases, but it is relatively small compared to the bandwidth usage on the CL (gossip is bandwidth intensive), and I believe the potential optimisations on CL gossip could offset the increase in EL bandwidth. Some numbers here: https://blog.sigmaprime.io/peerdas-distributed-blob-building.html#impact-on-node-operators
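
One part of the EL-side increase can be estimated up front: the extra proof bytes carried by each blob transaction once it ships cell proofs instead of a single blob proof. The sketch below assumes the PeerDAS constants of 128 cells per extended blob and 48-byte proofs, and that the transaction wrapper carries one proof per cell; treat these as assumptions to check against the final EIP text rather than confirmed wire sizes.

```python
# Assumed constants (from the EIP-4844 / PeerDAS specs as we understand them).
BYTES_PER_BLOB = 4096 * 32   # 4096 field elements of 32 bytes each
BYTES_PER_PROOF = 48         # compressed G1 point
CELLS_PER_EXT_BLOB = 128     # cells in an extended blob


def proof_bytes_per_blob(cell_proofs: bool) -> int:
    """Proof bytes attached to one blob, with or without cell proofs."""
    proofs = CELLS_PER_EXT_BLOB if cell_proofs else 1
    return proofs * BYTES_PER_PROOF


def proof_overhead_ratio() -> float:
    """Extra proof bytes per blob, relative to the blob payload itself."""
    extra = proof_bytes_per_blob(True) - proof_bytes_per_blob(False)
    return extra / BYTES_PER_BLOB


if __name__ == "__main__":
    print(proof_bytes_per_blob(False), proof_bytes_per_blob(True))
    print(f"{proof_overhead_ratio():.2%}")
```

Under these assumptions the proofs add a few kilobytes per blob, which is small next to the blob payload itself.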
That said, 64-72 max blobs is still a theoretical limit until we have a version to test with; I've only tested up to 32 blobs. AFAIK there are no other known bottlenecks on the CL side, but there are possible limitations:
- Unexpected bottlenecks in the EL with a max blobs of 64+. Are there any limitations with the blob mempool?
- Distributed Blob Building only works well for public blob transactions, which is what most blob transactions are today. We might need to think about a solution to cover private transactions, but this doesn't seem to be an _immediate_ blocker.
## Conclusion
**If there are no major issues with this approach, the goal is to finalise the spec across all layers ASAP so client teams can start working on an implementation.** We'll also need teams to help drive spec changes listed below.
This could be the final spec change needed to ship the first iteration of PeerDAS - hopefully in 2025. 🚀
Thanks for reading!
## Appendix 1: High Level List of Required Changes
### EIP
[Update EIP-7594: include cell proofs in network wrapper of blob txs #9378](https://github.com/ethereum/EIPs/pull/9378)
### EL Changes
- **RPC API**: Replace blob KZG proofs with cell KZG proofs in the EIP-4844 transaction.
- **Mempool and data distribution**: See Francesco's post here: https://hackmd.io/@fradamt/mempool-change
- **Engine API**:
    - `getPayloadV5`: changes `BlobsBundle` to include cell KZG proofs instead of blob KZG proofs.
    - `getBlobsV2`: changes to return list of blobs and cell KZG proofs (instead of blob KZG proofs).
### CL changes
- Update integration with EL `getBlobsBundle` and EL `getBlobs`
- Update code to assemble `DataColumnSidecar`:
    - Extend blobs
    - Use cell proofs from EL (no computation)
- `GetBlobSidecar` API
    - We no longer have the blob KZG proofs, so we may need a v2 to return cell proofs.
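
The column assembly step above can be sketched as a transpose: once blobs are extended into cells and the cell proofs arrive from the EL, column `j` takes cell `j` and proof `j` from every blob. The `ColumnSketch` type and `assemble_columns` function are hypothetical names for illustration; the real `DataColumnSidecar` has additional fields (commitments, block header, inclusion proof) that are omitted here, and the blob extension itself is assumed to have been done by the KZG library.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class ColumnSketch:
    """Simplified stand-in for a data column: one cell and proof per blob."""
    index: int
    cells: List[bytes]
    proofs: List[bytes]


def assemble_columns(ext_cells: Sequence[Sequence[bytes]],
                     cell_proofs: Sequence[Sequence[bytes]]) -> List[ColumnSketch]:
    """Transpose per-blob rows of cells and proofs into per-column groups.

    ext_cells[i][j] is cell j of extended blob i; cell_proofs[i][j] is its
    proof as supplied by the transaction sender via the EL. No proofs are
    computed here.
    """
    if len(ext_cells) != len(cell_proofs):
        raise ValueError("need one row of proofs per blob")
    if not ext_cells:
        return []
    n_cols = len(ext_cells[0])
    for cells, proofs in zip(ext_cells, cell_proofs):
        if len(cells) != n_cols or len(proofs) != n_cols:
            raise ValueError("every blob must have the same number of cells and proofs")
    return [
        ColumnSketch(
            index=j,
            cells=[row[j] for row in ext_cells],
            proofs=[row[j] for row in cell_proofs],
        )
        for j in range(n_cols)
    ]
```

For example, with two blobs of four cells each, `assemble_columns` returns four columns, and column 1 holds the second cell and second proof of both blobs.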
### KZG libraries
Changes to support the above operations:
- method for tx sender to compute cell proofs: `computeProofsWithoutExtendingBlob`
- method for EL to verify blob and extended cell proofs: `verifyBlobWithExtendedProof`
- method for CL to extend blobs without computing proofs
(Thanks @Kev for the inputs 🙏)
## Appendix 2: Distributed Blob Building Refresher
Here's a diagram illustrating the computation and bandwidth bottleneck in PeerDAS:
```mermaid
sequenceDiagram
participant Proposer as Proposer with 8 CPU cores
activate Proposer
Note over Proposer: t=1000ms, Produced block, and start computing Data Columns for 32 blobs
Proposer ->> Peers: t=1000ms, Publish Beacon Block
Proposer ->> Peers: t=1500ms, Beacon Block Sent
Note over Proposer: t=1900ms Computed Data Columns and Proofs
Proposer ->> Peers: t=1900ms, Publish Data Columns for 32 blobs (8MB to 8 mesh peers, *in parallel*)
Note over Peers: t=4000ms, attestation deadline, block not available yet
Proposer ->> Peers: t=10000ms+, All Data Columns (64MB) sent, but too late ☠️
```
Distributed blob building was proposed to alleviate these problems and distribute the computation and data propagation work across more powerful nodes in the network:
```mermaid
sequenceDiagram
participant Proposer as Proposer with 8 CPU cores
participant Peers as Supernode Peer with 32 CPU cores
participant Others as Other nodes
Note over Proposer: t=1000ms, Produced block, and Start computing Data Columns for 32 blobs
Proposer ->> Peers: t=1000ms, Publish Beacon Block
Proposer ->> Peers: t=1500ms, Beacon Block Sent
Note over Peers: t=1500ms, Fetch 32 blobs from EL, start computing columns
Note over Peers: t=1800ms, Computed all data columns, block available 🎉
Peers ->> Others: all supernodes publish *some* data columns
Note over Proposer: t=1900ms Computed Data Columns, start publishing too
Note over Others: t= ~2500ms received all columns, block available 🎉
```
*Note: numbers are estimates only, based on benchmarks and [tests conducted earlier with 16 blobs](https://blog.sigmaprime.io/peerdas-distributed-blob-building.html). They assume 50 Mbps upload bandwidth, 8 cores with 6 **available** cores for proof computation, and 8 mesh peers.*
[^1]: The network form of blob data changes from blobs to columns, which are made up of cells.
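
The bandwidth figures in the first diagram follow from simple arithmetic, sketched below. It assumes 128 KiB blobs that double in size when extended, ignores proof bytes and gossip overhead, and treats 1 Mbps as 10^6 bits per second; the function names are ours and purely illustrative.

```python
BYTES_PER_BLOB = 131072   # 128 KiB per blob
EXTENSION_FACTOR = 2      # erasure coding doubles the data


def column_payload_bytes(blob_count: int) -> int:
    """Total size of all data columns for a block (cells only, no proofs)."""
    return blob_count * BYTES_PER_BLOB * EXTENSION_FACTOR


def upload_seconds(blob_count: int, mesh_peers: int, upload_mbps: float) -> float:
    """Time for one node to send every column to each of its mesh peers."""
    total_bits = column_payload_bytes(blob_count) * mesh_peers * 8
    return total_bits / (upload_mbps * 1_000_000)


if __name__ == "__main__":
    print(column_payload_bytes(32) // (1024 * 1024), "MiB of columns")
    print(round(upload_seconds(32, 8, 50), 1), "seconds to reach 8 peers")
```

With 32 blobs, 8 mesh peers and 50 Mbps of upload, this gives 8 MiB of columns and over 10 seconds of upload time, which is where the "too late" outcome in the first diagram comes from.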

[^2]: This approach changes data column gossip to allow transmitting a subset of cells and proofs. This will enable a more efficient and more effective form of distributed blob building, where nodes can compute and distribute cell proofs without having all the blobs. (Currently the network form `DataColumnSidecar` requires all blobs to compute.)