## Blob gossip and validation before and after PeerDAS

<sup>$\uparrow$ https://whyy.org/episodes/why-we-gossip/</sup>
$\cdot$
**tl;dr;** Blobs contribute to Ethereum scaling, in part, by providing a confirmation rule for L2 transactions. The value of this confirmation rule depends on the L2's sequencer model and can lead to public or private L1 blob transactions (80% of blobs are gossiped in the public mempool today). With this context, we examine how the protocol presently handles blobs by considering the execution layer (abbr. EL) mempool and the consensus layer (abbr. CL) blob validation. We then turn our attention to the changes introduced by PeerDAS (the next step along the path of Ethereum's data scaling roadmap) to demonstrate that, while the CL only samples a subset of the total blob data, the EL mempool will, by default, still receive all public blobs. This reduces the benefit of sharding the blobs in the first place, and we conclude by examining a few candidate mechanisms to shard the EL mempool horizontally or vertically and the tradeoffs they make.
$\cdot$
by [mike neuder](https://x.com/mikeneuder) · *april 15, 2025 (happy tax day!)*
$\cdot$
*Thanks to [Julian Ma](https://x.com/_julianma) and [Francesco D'Amato](https://x.com/fradamt) for extensive comments. Further thanks to [Alex Stokes](https://x.com/ralexstokes) and [lightclients](https://x.com/lightclients) for helpful discussions.*
#### Specs and protocol docs
| Description | Link |
|-------------|------|
| 4844 networking spec | [ethereum/EIPs – EIP-4844: Networking](https://github.com/ethereum/EIPs/blob/master/EIPS/eip-4844.md#networking) |
| EL mempool spec | [ethereum/devp2p – `NewPooledTransactionHashes`](https://github.com/ethereum/devp2p/blob/master/caps/eth.md#newpooledtransactionhashes-0x08) |
| Deneb p2p interface | [ethereum/consensus-specs – Deneb: Blob Subnets](https://github.com/ethereum/consensus-specs/blob/dev/specs/deneb/p2p-interface.md#blob-subnets) |
| Cancun engine API | [ethereum/execution-apis – `engine_getBlobsV1`](https://github.com/ethereum/execution-apis/blob/main/src/engine/cancun.md#engine_getblobsv1) |
| Fulu p2p interface | [ethereum/consensus-specs – Fulu: Data Column Subnets](https://github.com/ethereum/consensus-specs/blob/9d377fd53d029536e57cfda1a4d2c700c59f86bf/specs/fulu/p2p-interface.md#data_column_sidecar_subnet_id) |
#### Related articles, notes, and dashboards
| Description | Link |
|-------------|------|
| Davide's article on Taiko sequencing | [Understanding Based Rollups & Total Anarchy – ethresear.ch](https://ethresear.ch/t/understanding-based-rollups-pga-challenges-total-anarchy-and-potential-solutions/21320) |
| Pop, Nishant, Chirag on improving CL gossip | [Doubling the Blob Count with GossipSub v2.0 – ethresear.ch](https://ethresear.ch/t/doubling-the-blob-count-with-gossipsub-v2-0/21893) |
| Francesco's blob mempool tickets | [Blob Mempool Tickets – HackMD](https://hackmd.io/@fradamt/blob-mempool-tickets) |
| Dankrad's mempool sharding document | [Mempool Sharding – Ethereum Notes](https://notes.ethereum.org/@dankrad/BkJMU8d0R) |
| DataAlways' orderflow dashboard | [Private Order Flow – Dune](https://dune.com/dataalways/private-order-flow) |
---
### Contents
[(1). L2 transaction lifecycles](#1-L2-transaction-lifecycles)
[(1.1). Centralized sequencers $\implies$ patient blobs](#11-Centralized-sequencers-implies-patient-blobs)
[(1.2). Total anarchy $\implies$ impatient blobs](#12-Total-anarchy-implies-impatient-blobs)
[(1.3). Aside: blobs in a "preconf" world](#13-Aside-blobs-in-a-preconf-world)
[(2). Blob gossip and validation pre-PeerDAS](#2-Blob-gossip-and-validation-pre-PeerDAS)
[(2.1). Blob gossip and the mempool](#21-Blob-gossip-and-the-mempool)
[(2.2). Block validation and blobs](#22-Block-validation-and-blobs)
[(2.3). The full pre-PeerDAS picture](#23-The-full-pre-PeerDAS-picture)
[(3). Blob gossip and validation post-PeerDAS](#3-Blob-gossip-and-validation-post-PeerDAS)
[(3.1). Block validation and blobs](#31-Block-validation-and-blobs)
[(3.2). Blob gossip changes](#32-Blob-gossip-changes)
[(3.2.1). Horizontally shard the EL mempool](#321-Horizontally-shard-the-EL-mempool)
[(3.2.2). Vertically shard the EL mempool](#322-Vertically-shard-the-EL-mempool)
[(4). Summary and conclusion](#4-Summary-and-conclusion)
---
### (1). L2 transaction lifecycles
To understand the properties of blob transactions, we first need to understand the **service** that the blobs provide to L2 users. Blobs are the vehicle through which Ethereum rollups post L2 transaction data to the L1. In this way, an L2 user who sees their transaction sequenced in a blob and included on the L1 can use that as a "confirmation rule" on their transaction inclusion and ordering.
> **Definition (informal)** – *A <u>confirmation rule</u> is a signal indicating that a transaction has been included and ordered.*
This definition is vague because confirmation rules can come in many flavors. We will discuss this ad nauseam below, but here are a few examples that should feel familiar.
<u>Example confirmation rules</u>:
1. *Six bitcoin blocks* – The bitcoin core client marks any transaction with six or more blocks built on the including block as [confirmed](https://en.bitcoin.it/wiki/Confirmation).
2. *Ethereum finality* – Ethereum blocks are finalized in batches (called epochs). Once a transaction is finalized, it will only be reverted if 1/3 of the validator set is provably slashable. Finality is a robust guarantee, but it is a bit slow.

As seen in the image above, Etherscan lets you know how strong this confirmation rule is.
3. *Ethereum block inclusion* – Even before finality, Ethereum transactions are confirmed by being included in a block.

As seen in the image above, Etherscan gives you a green checkmark, with some text advising that the block is not yet finalized. Still, for most transactions, this confirmation rule is sufficient.
4. *Base centralized sequencer green check* – Since Coinbase is the only party that sequences Base transactions, you only need confirmation from their sequencer.

The image above shows that Basescan gives you the green check because the sequencer confirmed the transaction.
Returning to blobs, an L2 transaction being included in a blob that landed on the L1 is a confirmation rule, but the importance of this specific confirmation depends significantly on how the L2 sequences transactions. For a centrally sequenced L2 (e.g., Base, OP Mainnet, Arbitrum), the green checkmark you get from the sequencer is the only confirmation rule you care about, while the actual posting of data to the L1 is not that meaningful to the L2 users.[^3] In contrast, for a based rollup using [total anarchy sequencing](https://vitalik.eth.limo/general/2021/01/05/rollup.html) (e.g., [Taiko](https://ethresear.ch/t/understanding-based-rollups-pga-challenges-total-anarchy-and-potential-solutions/21320)), the L2 transaction inclusion in a blob that lands in an L1 block is *the first* and most crucial confirmation you get. This distinction is *vital* because it determines the properties of blob transactions on the L1, which we should consider when designing the L1.
Sections (1.1) and (1.2) further describe the L2 transaction lifecycle for centralized and total anarchy sequencing, respectively. We examine these two modalities in detail because they are what exists today. Section (1.3) briefly considers the potential implications of a world with based & native rollups that give "preconfs."
[^3]: Of course, the blob posting is essential for fraud proofs and forced exits, two critical features of L2s. We emphasize the confirmation rule aspect because the default path for L2 transactions will be to treat the sequencer confirmation as final.
#### (1.1). Centralized sequencers $\implies$ patient blobs
Let's start with the most basic rollup construction: a centralized sequencer occasionally posting L2 transaction data as blobs to the L1. The figure below demonstrates this flow.

<u>Step-by-step annotation:</u>
1. The user submits their L2 transaction to the centralized sequencer.
2. The sequencer immediately confirms the transaction for the user. We call this `conf #1`, the first confirmation.
3. The sequencer batches many L2 transactions into an L1 blob, which they submit to the public mempool.
4. The Ethereum builder/proposer observes the mempool, picks up the blob to include in a block, and publishes the block to the consensus layer network.
5. The user receives their second confirmation when the blob that includes their transaction is published to the L1.[^1]
[^1]: The user could further wait for the finality of the L1 block as a third confirmation.
**Key point:** *almost all L2 transactions will rely on the centralized sequencer confirmation (`conf #1`) and won't demand timely blob inclusion on the L1 (`conf #2`).* There are many proposed fallbacks to the centralized sequencer in the case of outages or censorship (e.g., [Arbitrum's "Censorship Timeout"](https://docs.arbitrum.io/how-arbitrum-works/sequencer#censorship-timeout) or [Optimism's "OptimismPortal"](https://docs.optimism.io/stack/rollup/outages#about-the-optimismportal)). Still, the overwhelming majority of transactions will mainly rely on the sequencer confirmation. *Critically, this implies that blobs posted by centralized sequencer rollups will not be latency-sensitive.*[^2] We categorize these blobs as "patient" (borrowing the definition from Noam's [Serial Monopoly](https://arxiv.org/abs/2311.12731) paper), because they are indifferent (over reasonable time horizons) about which exact L1 block the blob is included in.
[^2]: Terence [mentioned some other reasons](https://x.com/terencechain/status/1907485321459372417) centralized sequencer L2s might be time-sensitive, but there is still a long time window during which the blob can be posted without meaningfully affecting rollup operations.
#### (1.2). Total anarchy $\implies$ impatient blobs
Moving to a much different sequencer model, let's consider Taiko, which uses a "total anarchy" permissionless sequencer model (for now – they are planning to upgrade to an allow list for block builders partly because of the problems outlined below). The figure below demonstrates the L2 transaction lifecycle in this case.

<u>Step-by-step annotation:</u>
1. The user submits their L2 transaction to the Taiko mempool.
2. Searchers listen to the mempool, construct L1 blobs containing the L2 transactions, and send the blobs to the L1 builders directly (more on this private connection below).
3. The builder/proposer includes the blob in their published block.
4. The user receives their first confirmation when they see an L1 block containing a blob containing their transaction.
**Key point:** *all these L2 transactions will rely on timely blob inclusion on the L1 (`conf #1`).* Until then, their transaction will remain pending. Searchers will compete to submit L1 blobs with profitable L2 transaction sequencing. We call these blobs "impatient" because both (i) their timely inclusion and (ii) their order within the L1 block are critical to the L2 functioning. We already see this empirically; see Davide Rezzoli's [recent article](https://ethresear.ch/t/understanding-based-rollups-pga-challenges-total-anarchy-and-potential-solutions/21320) outlining how Taiko Labs faces adverse selection when posting blobs and is often outbid by more competitive searchers.
One subtlety alluded to in step 2 above: we expect the vast majority of these blobs to go directly to builders instead of going through the public mempool. We also see this empirically as described by DataAlways in [this tweet](https://x.com/Data_Always/status/1904298479448375450). When there is an open competition to sequence L2 transactions, blobs will be carrying MEV and thus must flow through private channels to avoid being front-run and/or unbundled. DataAlways summarizes nicely in [this tweet](https://x.com/Data_Always/status/1907479889407590757); see the surrounding thread for further context.
#### (1.3). Aside: blobs in a "preconf" world
["Preconf" rollups](https://ethresear.ch/t/based-preconfirmations/17353) aim to give L2 sequencing authority to L1 validators who opt-in to an out-of-protocol service. With this authority, the L1 proposer who is elected as the next leader to propose an L2 block can issue "preconfirmations" (promises of inclusion and/or ordering) to L2 transactions (a preconf is, itself, a confirmation rule). Thus, the L1 proposer who also builds the L2 block receives payments and MEV from building the L2 block.
We aren't going to spend too much time here because preconf rollups don't exist yet, but it is worth touching on. We believe blobs built by L1 proposers (or builders/relays) who are giving preconfs to L2 users may hit the public mempool. Consider a validator (the next enrolled preconfer) who is the L1 proposer eight slots into the future. Thus, they have sole sequencing rights over the L2 for 96 seconds. Each preconf they issue corresponds to an L2 transaction, which they must pack into a blob and post to the L1 (in a specific order). This validator can publish the blobs in order and doesn't necessarily need to wait for their slot to include the blob in their own block. Again, this is all a bit speculative and dependent on the L2 construction, but it seems possible that these blobs will need to be included over the next eight slots but won't be as latency-sensitive as those that use total anarchy to sequence (as discussed in the previous section); these blobs might be best modeled as "quasi-patient" transactions (e.g., see [this paper](https://arxiv.org/abs/2405.17334)).
Of course, once it is the validator's slot, they can simply include any remaining blobs with the preconfed L2 transactions directly. Existing designs have these preconfs enforced by slashing conditions, so the validator would be strongly incentivized to ensure the blobs make it on a chain in the order they promised. We close this topic here, but it will be important to discuss if we see increased usage of preconf rollups.
### (2). Blob gossip and validation pre-PeerDAS
Let's take stock of where we are today based on Section (1). We partition blobs into two categories:
1. Patient, public mempool blobs.
2. Impatient, private mempool blobs.
From this DataAlways [dashboard](https://dune.com/queries/4266705/7172423), we see that about 80% of blobs hit the public mempool, and only the Taiko sequencers (a permissionless set, as described above) are consistently sending [private blobs](https://dune.com/queries/4266826/7172681). For now, this partition accurately characterizes the existing blob flow. We return to the L1 and consider how blobs consume network bandwidth for validators participating in consensus. A validator has the following blob-related roles:
1. gossiping blob transactions, and
2. validating that blobs a block commits to are available before attesting.
These roles have very different implications for each validator's network resource consumption based on when they happen in the slot.
#### (2.1). Blob gossip and the mempool
Today, validators connect to different peers with their EL and CL clients. The "mempool" refers to the set of transactions the EL client hears about before they are included in a block. As specified in EIP-4844, blob transactions are gossiped in a pull-based manner.
> "Nodes MUST NOT automatically broadcast blob transactions to their peers. Instead, those transactions are only announced using NewPooledTransactionHashes messages and can then be manually requested via GetPooledTransactions."
> – [Networking, EIP-4844](https://github.com/ethereum/EIPs/blob/master/EIPS/eip-4844.md#networking).
The [`NewPooledTransactionHashes` message](https://github.com/ethereum/devp2p/blob/master/caps/eth.md#newpooledtransactionhashes-0x08) serves as an announcement of a blob, and any peer who hasn't yet downloaded that blob responds directly with a [`GetPooledTransactions` request](https://github.com/ethereum/devp2p/blob/master/caps/eth.md#getpooledtransactions-0x09). In this manner, all blobs that hit the public mempool are propagated quickly to every node. The sequence diagram below demonstrates this process.

<u>Step-by-step annotation:</u>
1. Alice notifies Bob of a new blob transaction with `NewPooledTransactionHashes`, which contains the transaction type, size, and hash.
2. If Bob doesn't already have that blob, he requests it from Alice with `GetPooledTransactions`.
3. Alice responds by sending the full blob to Bob.
4. Bob notifies his peers with a `NewPooledTransactionHashes` message.
**Key point:** Each node should download each blob only once: it requests a newly announced blob from a single peer at a time, starting with the first peer to announce it. Afterwards, it ignores any `NewPooledTransactionHashes` messages that include blobs it has already downloaded.
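To make this concrete, here is a minimal Python sketch of the announce-and-pull deduplication logic. The message names come from the devp2p spec quoted above; the `MempoolNode` structure and its fields are illustrative, not actual client code.

```python
from dataclasses import dataclass, field

BLOB_TX_TYPE = 0x03  # EIP-4844 blob transaction type byte

@dataclass
class MempoolNode:
    pool: dict = field(default_factory=dict)     # tx_hash -> full transaction
    requested: set = field(default_factory=set)  # hashes currently being fetched

    def on_new_pooled_transaction_hashes(self, types, sizes, hashes):
        """Handle an announcement; return the hashes to pull via
        GetPooledTransactions (from one peer at a time)."""
        wanted = [
            h for t, h in zip(types, hashes)
            if t == BLOB_TX_TYPE and h not in self.pool and h not in self.requested
        ]
        self.requested.update(wanted)
        return wanted  # empty list => ignore the announcement entirely

    def on_pooled_transactions(self, txs):
        """Store the full blob transactions; return their hashes so the
        caller can re-announce them (never re-broadcast the bodies)."""
        announced = []
        for tx_hash, tx in txs:
            self.requested.discard(tx_hash)
            self.pool[tx_hash] = tx
            announced.append(tx_hash)
        return announced
```

In the happy path, each blob body crosses each node's wire exactly once, which is what makes the bandwidth estimate below a reasonable approximation.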
With this heuristic, today's blob mempool bandwidth consumption should be around 32 kB/s = (128 kB/blob × 3 blob/slot) / 12 s/slot. The figure below shows the empirical data is close to this theoretical value.

> *Blob mempool ingress bandwidth consumption. 33.8 kB/s is only slightly higher than the expected 32 kB/s, resulting from 3 blobs per slot.*
**Key point:** Public mempool blobs are spread out over the 12-second slot, distributing the network load over the interval. Additionally, each node expects to see *every public mempool blob*.
#### (2.2). Block validation and blobs
The mempool accounts for blob transactions not yet included in a block. Separately, when a validator receives a new block, they must ensure that the blobs the block commits to are available to determine overall block validity. Today, validators ascertain this blob availability by fully downloading the blobs. As mentioned above, the CL has an entirely different gossip network and peers than the EL. Validators use a combination of both to receive all the blob data needed to attest to a block. The first and primary source of blobs for the CL is the [blob subnets](https://github.com/ethereum/consensus-specs/blob/dev/specs/deneb/p2p-interface.md#blob-subnets) (`blob_sidecar_{subnet_id}`). With a maximum of six blobs per block, there are six subnets that every validator connects to. When gossiping a block, the corresponding blobs are gossiped over their respective subnet (e.g., the blob committed to at index two is gossiped over `blob_sidecar_2`). If a validator doesn't receive a blob over their CL gossip,[^a] they can check if their EL client has it in the mempool (received over EL gossip) via the [`engine_getBlobsV1` API](https://github.com/ethereum/execution-apis/blob/main/src/engine/cancun.md#engine_getblobsv1). Lastly, the validator can directly ask their CL peers for a blob (instead of just waiting to hear it over gossip) with the [`blob_sidecars_by_range` API](https://github.com/ethereum/consensus-specs/blob/dev/specs/deneb/p2p-interface.md#blobsidecarsbyrange-v1). Note, however, that the req/resp model is not usually used on the critical path and is unlikely to help retrieve missing blobs in the time between hearing about the beacon block and the attestation deadline. Still, we include it here because it is part of the spec and worth highlighting. The sequence diagram below shows this flow for three blobs, which Bob receives in three distinct ways.
[^a]: Many technical details within the CL gossip can impact the probability of a blob coming over the subnet. Control messages like `IHAVE, IWANT, IDONTWANT` signal to your peers what data you have and need. For this document, we elide these details.

<u>Step-by-step annotation:</u>
1. Bob receives a `beacon_block` over the pub-sub topic and needs to attest to its validity. The block contains three blobs (but the blobs are gossiped separately).
2. Bob hears about `blob_1` over CL gossip on the `blob_sidecar_1` topic. He still doesn't have `blob_2` or `blob_3` from either subnet.
3. Bob calls `engine_getBlobsV1` to see if the EL has heard about any blobs over mempool gossip. The engine call returns `blob_2`, but not `blob_3`.
4. Bob makes a direct request to his CL peers for `blob_3`.
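A rough sketch of this three-way fallback, in priority order, is below. Assumptions: `subnet_blobs` holds sidecars already pushed over the `blob_sidecar_{subnet_id}` topics, `el.engine_get_blobs_v1` is an assumed wrapper around the Engine API call, `versioned_hash` stands in for the spec's commitment-to-hash helper, and `peers.request` abstracts the req/resp domain.

```python
def gather_blobs(block, subnet_blobs, el, peers):
    """Illustrative: collect every blob a block commits to before attesting."""
    commitments = block.blob_kzg_commitments
    blobs = {sc.index: sc.blob for sc in subnet_blobs}   # 1. CL push gossip

    missing = [i for i in range(len(commitments)) if i not in blobs]
    if missing:                                          # 2. EL mempool fallback
        hashes = [versioned_hash(commitments[i]) for i in missing]
        for i, blob in zip(missing, el.engine_get_blobs_v1(hashes)):
            if blob is not None:                         # None => not in mempool
                blobs[i] = blob

    for i in range(len(commitments)):                    # 3. req/resp last resort
        if i not in blobs:                               # rarely fast enough
            blobs[i] = peers.request(block_root=block.root, index=i)
    return blobs
```

Note that path (2) only ever helps for blobs that went through the public mempool; this detail becomes the crux of the PeerDAS discussion in Section (3).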
**Key point:** The blob subnets are a *push* model instead of a pull. When Bob receives a blob over the subnet, he forwards it to his CL peers, even though they haven't explicitly asked for it.
The push model can lead to redundancy, where a node receives blobs multiple times, which we see empirically.

> *Ingress traffic over `blob_sidecar` topics. At an average of 153 kB/s ($\approx$ 14 blobs per slot), we see about 4x more blobs than are included in each block.*
A pull-based model would consume bandwidth more efficiently but at the cost of latency (given an extra round trip of control messages before the blob is transmitted). See [Gossipsub v2.0](https://ethresear.ch/t/doubling-the-blob-count-with-gossipsub-v2-0/21893) from Pop, Nishant, and Chirag, which aims to reduce this amplification.
#### (2.3). The full pre-PeerDAS picture
The figure below highlights the timeline of events within a slot for both blob gossip and block validation.

<u>Step-by-step annotation:</u>
1. The green (public) blobs arrive through the mempool (EL) at a uniform rate throughout the slot (they are patient and don't need to be strategic with timing).
2. The block arrives after the slot begins but before the attestation deadline. It commits to some blobs, which may be private or public.
3. The purple (private) blobs arrive through the consensus layer network along with the block (they are impatient and thus strategic with timing and propagation).
### (3). Blob gossip and validation post-PeerDAS
[PeerDAS](https://efdn.notion.site/DAS-what-why-how-1c2d9895554180ceaca7d173c797e52d), widely held to be the main priority for the Fulu/Osaka hardfork, changes how the protocol interacts with blobs. For the purposes of this post, the only piece of PeerDAS we need to cover is the columnar approach used by the CL to verify data availability. The figure below demonstrates this distinction.

For simplicity, let's assume that the quantity of data needed to perform the two tasks is approximately the same. In other words, PeerDAS *does* increase the number of blobs per block, but it *doesn't* significantly increase the amount of blob data downloaded by the CL because each validator only downloads a subset of each blob (e.g., with a 48-blob target, if each validator downloads 1/8 of every blob, then in aggregate they download six blobs' worth of data, the Prague/Electra [target](https://eips.ethereum.org/EIPS/eip-7691)).
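As a quick gut check on that parenthetical (using the example numbers from the text, not final protocol parameters):

```python
BLOB_KB = 128
deneb_download_kb = 6 * BLOB_KB           # today: all 6 blobs, in full
peerdas_download_kb = 48 * BLOB_KB / 8    # 48 blobs, 1/8 of each sampled
assert deneb_download_kb == peerdas_download_kb == 768  # kB per block, both ways
```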
With this setup, we can consider how the validator interactions with blobs change. We will reverse the order by examining the CL block validation rule before discussing the mempool and EL gossip.
#### (3.1). Block validation and blobs
On the CL side, the validators still have to determine block validity based on the availability of blobs. Instead of downloading the full set of blobs, they download a random subset of columns from each blob. The blob subnets described above are deprecated in favor of data column sidecar subnets (`data_column_sidecar_{subnet_id}`), which is the topic where full columns of blobs are gossiped. Critically, there is no concept of *partial columns*; thus, each column depends on the complete set of blobs committed to by a block. To validate a block, the CL checks the result of the [`is_data_available` function](https://github.com/ethereum/consensus-specs/blob/9d377fd53d029536e57cfda1a4d2c700c59f86bf/specs/fulu/fork-choice.md#modified-is_data_available), which ensures that the node has access to their assigned columns for each blob in the block. As before, let's consider the three ways to retrieve their columns of data:
1. over [gossip](https://github.com/ethereum/consensus-specs/blob/9d377fd53d029536e57cfda1a4d2c700c59f86bf/specs/fulu/p2p-interface.md#data_column_sidecar_subnet_id) on the `data_column_sidecar` subnets,
2. from the blobs fetched from the EL with `engine_getBlobsV1` (same as before, more on this below), or
3. from the request/response domain with the new [`data_column_sidecars_by_root` API](https://github.com/ethereum/consensus-specs/blob/9d377fd53d029536e57cfda1a4d2c700c59f86bf/specs/fulu/p2p-interface.md#the-gossip-domain-gossipsub).
**Key point:** Step (2) above returns the *blobs* themselves instead of just the columns the validator needs to check. To construct the entire column, the EL will only be helpful if *it returns every blob* in the block (meaning the block cannot have any private blobs).
There are multiple problems with this. First (and most obviously), if the EL has the entire set of blobs for the block, then what was the point of the CL sampling only a subset of each blob? (More on this in the following section.) The CL could just confirm the data is available directly by fetching every blob from the EL. Secondly, if *any* of the blobs in the block is private, then the EL call won't aid in constructing the entire column (recall: no partial columns). So ... this is awkward. We sharded the CL blob validation, but by doing so, we eliminated the value of the public EL mempool for block validation (it is still potentially useful for block building, especially considering the patient, public mempool blobs described in Section 1). Consider the figure below as an example.

As before, the green blobs arrive uniformly throughout the slot (there are now 48), while the purple blobs are not gossiped until the block has been published. Now, the honest attester faces the following situation:

They need these full columns, but because of some private (purple) blobs in the block, the public mempool is insufficient for the full-column construction. Thus, if they don't receive the full column over gossip via `data_column_sidecar`, they cannot consider the block valid (as before, we assume that the request/response domain API isn't dependable for the critical path of block validation).
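The sketch below illustrates the failure mode: a full column needs the corresponding cell from *every* blob, so a single private blob makes the column unbuildable from the mempool alone. Here `compute_cell` is a placeholder for the erasure-coding helper, and the structure is illustrative.

```python
def try_build_column(column_index, commitments, el_pool):
    """Attempt to construct one full column from public-mempool blobs.

    el_pool maps commitment -> full blob for blobs seen in the EL mempool.
    There is no concept of a partial column, so one missing (private) blob
    means the mempool contributes nothing to validating this column.
    """
    cells = []
    for commitment in commitments:
        blob = el_pool.get(commitment)
        if blob is None:          # private blob: never hit the public mempool
            return None           # the whole column is unbuildable
        cells.append(compute_cell(blob, column_index))  # placeholder helper
    return cells
```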
Note that the builder/proposer is *highly incentivized* to make sure the columns of data are distributed over the CL gossip because the validity of their block depends on it. It *may* be OK to just leave it to them to figure out how far to push the latency when distributing columns. This comes with some risk, as builders may push latency to the limit and leave low-bandwidth attesters unable to download the columns fast enough to attest properly. Even today, builders make difficult decisions about which blobs to include, especially during periods of high volatility; see [discussion here](https://ethresear.ch/t/introduction-to-optimistic-v3-relays/22066#p-53641-data-size-and-network-efficiency-comparison-6). Even if we are happy to leave most of the blob distribution to the builders (albeit a big assumption), we still have the issue of blob gossip under a significant increase in blob throughput.
#### (3.2). Blob gossip changes
The reality remains that the EL blob gossip *still receives all public blobs*, eliminating the benefit of sampling only a subset of each blob on the CL. We calculated an average of 32 kB/s with three blobs; under a 48-blob regime, this would be 512 kB/s = (128 kB/blob × 48 blob/slot) / 12 s/slot. This is a significant increase and potentially too much bandwidth for many solo/small-scale operators. Thus, it is worth considering changes to gossip to alleviate this. To conclude this article, we will consider horizontal and vertical sharding of the EL mempool, each of which could be implemented with various levels of complexity and effectiveness; this list is by no means exhaustive, and each proposal is probably worthy of an entire article analyzing the tradeoffs. Instead, we aim to give a feel for the design space and defer the complete analysis and recommendations to future work.
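For reference, the two mempool ingress figures side by side (same arithmetic as above, assuming every public blob is downloaded exactly once):

```python
BLOB_KB, SLOT_SECONDS = 128, 12

def mempool_ingress_kbps(blobs_per_slot):
    return BLOB_KB * blobs_per_slot / SLOT_SECONDS

print(mempool_ingress_kbps(3))   # 32.0 kB/s  (today's 3-blob target)
print(mempool_ingress_kbps(48))  # 512.0 kB/s (hypothetical 48-blob regime)
```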
##### (3.2.1). Horizontally shard the EL mempool
We use "horizontal sharding" to describe the process of the EL downloading only a subset of the total set of public blobs. It is horizontal because the messages are still complete blobs (instead of just some columns within the blob). There are several different approaches to achieve this; generally, these are easy to implement, but they don't resolve the issue of block validation requiring the full column (the same cell of every blob). Here are a few candidate ideas:
1. *Hardcode an ingress-byte limit.* For example, the EL could limit the number of blobs they download to no more than 12 per slot. This rate limiting will mean that each node only receives a fraction of the total blobs (in particular, the first ones they heard announced).
2. *Only download a random subset of blobs.* For example, based on some randomness, only download blobs sent from peers with a `0x01` prefix. This is another rate-limiting method that is not restricted to the first blobs the node hears about.
3. *Store only the top-N blobs ordered by priority fee.* For example, the EL could maintain a list of 12 blobs ordered by priority fee. When they hear a new blob announced (we would need to add a fee to the blob metadata), they pull it only if its fee is higher than the lowest-fee blob in their list (see the sketch below).
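Here is a sketch of candidate (3), the top-N-by-priority-fee heuristic. It assumes, as noted above, that fee metadata is added to the announcement; the class itself is illustrative.

```python
import heapq

class TopNBlobPool:
    """Keep only the N highest-priority-fee blobs; decide from the
    announcement alone whether a blob is worth downloading."""

    def __init__(self, n=12):
        self.n = n
        self.heap = []  # min-heap of (priority_fee, tx_hash)

    def should_pull(self, priority_fee):
        """Pull iff the pool has room or the fee beats the cheapest entry."""
        return len(self.heap) < self.n or priority_fee > self.heap[0][0]

    def insert(self, priority_fee, tx_hash):
        if len(self.heap) < self.n:
            heapq.heappush(self.heap, (priority_fee, tx_hash))
        elif priority_fee > self.heap[0][0]:
            heapq.heapreplace(self.heap, (priority_fee, tx_hash))  # evict cheapest
```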
Pros
- *Simple to implement.* Most of these are local heuristics that can be implemented in the clients without a hard fork as they are out of protocol.
- *Effectively minimizes the bandwidth consumption.* They do accomplish the stated goal of reducing the worst-case bandwidth consumption of the EL mempool.
Cons
- *Changes the model of the blob mempool today.* As constructed today, nodes expect to (eventually) have very similar views of the mempool. This model breaks down if not every node expects to download every transaction. We don't know what the resulting fragmentation of mempool views would cause.
- *May cause/contribute to/hasten the death of the public mempool.* The mempool guarantees worsen under a horizontal sharding mechanism because you no longer expect every node to hear about every blob transaction. Given we have a robust public mempool (recall that 80% of blobs are public, and we expect this to remain so long as the centralized sequencers continue to be the predominant blob consumers), making the mempool less effective for blobs certainly increases the odds that rollups go direct-to-builder to post blobs. (There are different views about the longevity of the public mempool as is, and we won't make a value judgment on that here.)
##### (3.2.2). Vertically shard the EL mempool
The CL uses vertical sharding (it only downloads a subset of columns of each blob), so if the EL could download only the columns the CL needs, we would completely solve the issue. However, implementing this is not simple because of the potential DoS risk: without the full blob transaction, a node receiving a subset of columns cannot confirm that they correspond to a valid, fee-paying blob transaction. Thus, the naïve approach of simply gossiping columns/cells instead of complete blob transactions is untenable. Turning our attention to DoS prevention, there are a few promising threads.
1. *Blob mempool tickets.*[^4] As proposed by Francesco in [this article](https://hackmd.io/@fradamt/blob-mempool-tickets), blob mempool tickets create an explicit, in-protocol market for allocating write access to the blob mempool. As such, the vertical sharding of the mempool is no longer a DoS risk because the (limited number of) tickets ensure that only authorized senders have access to the mempool (see the sketch below the list). This gives strong guarantees on the upper bound of blobs flowing through the mempool at any time.
2. *Blob mempool reputation system.* As proposed by Dankrad in [this article](https://notes.ethereum.org/@dankrad/BkJMU8d0R), limiting write access to the mempool to nodes who have either successfully landed blobs or have burned a small amount of ETH can also mitigate this DoS risk. This doesn't give as strong of a bound on blobs in the mempool, but it may be simpler to implement.
[^4]: This article was originally going to be about the market design of blob mempool tickets... but here we are, lol.
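To see why tickets address the DoS concern, consider a sketch of ticket-gated admission. Every detail here is hypothetical: the per-ticket limit, the message shape, and `verify_signature` are placeholders; the real design lives in Francesco's article.

```python
MAX_BLOBS_PER_TICKET = 1  # assumed protocol parameter

def accept_cell_gossip(msg, ticket_holders, sent_count):
    """Admit a columns/cells message only from an authorized ticket holder.

    ticket_holders[slot] is the set of senders with mempool write access
    for that slot. Without the full blob we cannot check fee validity, so
    the bounded ticket set is what caps the data an attacker can push."""
    if msg.sender not in ticket_holders.get(msg.slot, set()):
        return False  # no ticket: drop before downloading anything
    if not verify_signature(msg):  # placeholder authentication check
        return False
    key = (msg.slot, msg.sender)
    if sent_count.get(key, 0) >= MAX_BLOBS_PER_TICKET:
        return False  # ticket exhausted: rate-limit further sends
    sent_count[key] = sent_count.get(key, 0) + 1
    return True
```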
Pros
- *Minimizes bandwidth consumption and mirrors the CL.* From first principles, it is clear that this is the "more correct" method as it mirrors the vertical sharding of the CL. Recall that the whole point of data sharding on the CL was eliminating the need for the nodes to download the full set of blobs. Sharding the EL horizontally and the CL vertically severely limits the utility of the EL mempool.
- *Preserves the public mempool.* There is a clear path to preserving the public mempool as the default path for patient blobs to utilize by explicitly resolving the bandwidth concern.
- *Blob tickets may serve as a blob inclusion list mechanism.* (more speculative) Because the protocol is explicitly aware of who has access to send blobs over the mempool, the attesters could also be employed to enforce the inclusion of timely blobs (similar to [FOCIL](https://eips.ethereum.org/EIPS/eip-7805)).
Cons
- *Complex to implement.* Clearly, either of these systems is much harder to implement than any horizontal sharding option listed above; the engineering overhead is significant (especially if the fork-choice rule is modified).
- *Complex to reason about economically.* Beyond just the engineering challenge, a suite of economic questions accompany these proposals. How do we price the tickets? How will rollups strategize about purchasing mempool access versus going direct-to-builder? What heuristics would be necessary for a robust reputation system? The design space here is vast, and it may be the case that simple rules work, but it is not obvious.
### (4). Summary and conclusion
We covered a lot. In summary:
1. L2 transactions have a variety of confirmation rules, one of which is inclusion in an L1 blob. The L1 blob confirmation provides very different utility depending on the sequencing model of the L2. These confirmations also have implications for the properties of the L1 blob transactions.
- 1.1. Blobs generated by L2s with centralized sequencers are neither MEV-carrying nor particularly latency-sensitive. This makes them likely candidates for the public mempool, and 80% of today's blobs fit this model.
- 1.2. Blobs generated by L2s with permissionless sequencers (e.g., in Taiko's total anarchy) *are MEV-carrying* and will compete for L1 inclusion. This paradigm leads to private blob flow, and 20% of today's blobs follow this path (exclusively Taiko's blobs).
- 1.3. In a based/native rollup where the L1 validators issue preconfirmations, the blobs containing the L2 transactions are likely to be latency-sensitive to some extent, depending on the construction of the L2. We don't spend much time here because these rollups don't yet exist.
2. We examine how blobs are handled by the protocol today by considering the EL mempool and the CL blob validation.
- 2.1. The EL downloads blobs into the mempool using a pull-based model. Each node is expected to eventually have the same view of the public set of blobs. This blob download is spread evenly over the slot.
- 2.2. The CL must determine blob availability as part of the block validation logic. The main venue for hearing about blobs on the CL is from the CL gossip over the blob subnets, but they can also check their EL mempool for blobs that a block references. The CL only has four seconds to check that they have access to all blobs before attesting to the block.
3. PeerDAS changes how the protocol interacts with blobs. Specifically, the CL now only downloads a subset of each blob instead of the entire thing.
- 3.1. Data columns and the beacon block are gossiped on the CL (full blobs are no longer gossiped over the blob subnets). As a result, any blobs that are gossiped on the EL are only helpful in constructing the columns *if there are no private blobs*, which seems unlikely. By default, the EL mempool will try to download *all public blobs*. This eliminates the benefit of the CL only needing to observe some subset of the total blob data.
- 3.2. We need to address this asymmetry by sharding the EL mempool somehow (or acknowledging that the public mempool is not viable in the long term).
- 3.2.1. Horizontal (blob-wise) sharding of the mempool is easiest but has significant drawbacks that may limit the value of the public mempool.
- 3.2.2. Vertical (column-wise) sharding of the mempool aligns with the vertical sharding of blobs done in the CL. However, it is more difficult to implement and requires serious anti-DoS mechanisms.
---
<span style="color:red;">*made with ♥ and markdown. thanks for reading! –mike*</span>