Blob mempool tickets

Introduction

What

The core ideas are:

  • Mirrored mempool sampling: using a vertically sharded mempool, where nodes download all blob txs but only samples of the associated blobs, with indices chosen to mirror the column sampling done for blocks, i.e. a node that downloads column i would also download sample i of all blob txs in the mempool (see the sketch after this list).
  • Mempool space allocation: to avoid DoS concerns on the sharded mempool, the mempool is restricted to a fixed blob capacity per slot, and this scarce mempool write access is allocated by selling it in the form of mempool tickets.
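
A minimal sketch of the mirroring, assuming a PeerDAS-style get_custody_columns helper; the function name get_mempool_sample_indices is illustrative, not part of the proposal.

def get_mempool_sample_indices(node_id: NodeID, custody_subnet_count: uint64) -> Sequence[CellIndex]:
    # The mempool cell indices a node downloads are exactly the column indices
    # it custodies and samples for blocks
    return [CellIndex(column_index) for column_index in get_custody_columns(node_id, custody_subnet_count)]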

Why

  • It's a relatively simple way to implement the necessary restrictions on a vertically sharded mempool, with the added benefit that mempool throughput is limited exactly by the desired network throughput.
  • For all blobs that go through the mempool, sampling can already happen there, on more relaxed timelines. This implies:
    • Peak bandwidth is not as much of a constraint
    • We can more safely afford to use pull-based gossip (like the blob mempool already does today). This reduces redundancy, because each blob is sent and received only once per node on average.
    • No duplication of bandwidth between CL and EL, only mempool propagation.
  • No need for a separate blob IL mechanism: instead of allocating the right of making blob ILs to validators, by randomly electing them in the IL committee, we can think of each mempool ticket as allocating a seat in the blob IL committee as well. In other words, each blob tx in the mempool can itself act as a blob IL. Since there's only a small number of these, there's no need for validators to package them into ILs, and we can instead rely on direct mempool observation (though we still need attesters that don't run a full mempool to have some kind of availability signal, like a committee's vote. See here).

Mempool

For now let's not worry about how the tickets are allocated, and let's just focus on how they are used to build a blob mempool. For this, the only thing we are going to need is that the consensus layer knows about the tickets somehow. Let's just say that the BeaconState keeps a record of the ticket holders for the next SLOTS_PER_EPOCH slots, in the circular array state.blob_tickets_record. In particular, state.blob_tickets_record[slot % SLOTS_PER_EPOCH] records the ticket holders for slot, in the form of BLS pubkeys.

class BeaconState(Container):
    ...
    # Ticket holders (BLS pubkeys) for each of the next SLOTS_PER_EPOCH slots,
    # indexed by slot % SLOTS_PER_EPOCH
    blob_tickets_record: Vector[List[BLSPubkey, MAX_BLOBS_PER_BLOCK], SLOTS_PER_EPOCH]

The blob mempool lives entirely on the CL p2p network, and consists of a single blob_transaction_envelope global topic used for sharing blob transactions, and NUMBER_OF_COLUMNS mempool_cells topics used for sharing the blob samples associated with them, together with proofs. Sending messages in these topics at slot N requires holding a ticket for slot N, which can be checked against the pubkeys recorded for it, i.e., pubkeys = state.blob_tickets_record[N % SLOTS_PER_EPOCH]. In particular, a message has to include a ticket_index and be signed with pubkeys[ticket_index].
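
Concretely, the pubkey that a message using a given ticket must be signed with can be looked up directly from this record; a minimal sketch (the helper name is illustrative):

def get_blob_ticket_holder(state: BeaconState, slot: Slot, ticket_index: uint8) -> BLSPubkey:
    # Pubkey of the holder of ticket `ticket_index` for `slot`
    return state.blob_tickets_record[slot % SLOTS_PER_EPOCH][ticket_index]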

Blob transaction topic

blob_transaction_envelope

class BlobTransactionEnvelope(Container):
    transaction: Transaction
    kzg_commitments: List[KZGCommitment, MAX_BLOBS_PER_TRANSACTION]
    slot: Slot
    ticket_indices: List[uint8, MAX_BLOBS_PER_TRANSACTION]

class SignedBlobTransactionEnvelope(Container):
    message: BlobTransactionEnvelope
    signature: BLSSignature

Note: we could remove kzg_commitments from BlobTransactionEnvelope if we had SSZ-encoded transactions, because we could just access the versioned hashes and check them against the kzg commitments in the cells. Without that, we also have to give the EL the versioned hashes corresponding to kzg_commitments and have it check that they match the ones in transaction.
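
For illustration, a sketch of this handoff using the Deneb helper kzg_commitment_to_versioned_hash; the EL-side comparison against the transaction's own versioned hashes is not shown, since the transaction bytes are opaque to the CL.

def get_envelope_versioned_hashes(envelope: BlobTransactionEnvelope) -> Sequence[VersionedHash]:
    # Versioned hashes derived from the envelope's commitments; the EL is asked to
    # check that these match the blob versioned hashes inside `transaction`
    return [kzg_commitment_to_versioned_hash(commitment) for commitment in envelope.kzg_commitments]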

Gossip checks

Let blob_tickets = state.blob_tickets_record[slot % SLOTS_PER_EPOCH]. Before forwarding, we essentially just check that the message is correctly signed by the holder of unused tickets, by requiring that (see the sketch after this list):

  • All holders of the tickets referenced by ticket_indices are the same, i.e. blob_tickets[ticket_index] is the same pubkey for all ticket_index in ticket_indices.
  • For each ticket_index in ticket_indices, the message is the first valid message received for that ticket_index.
  • signature is a valid signature of message by pubkey, where pubkey is the unique pubkey that holds all the tickets, i.e., blob_tickets[ticket_index] for any ticket_index in ticket_indices.
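
Putting these together, a minimal sketch of the validation, assuming a hypothetical compute_mempool_signing_root helper for domain separation and a seen_ticket_indices set tracking tickets already consumed by a valid message:

def validate_blob_transaction_envelope(state: BeaconState,
                                       signed_envelope: SignedBlobTransactionEnvelope,
                                       seen_ticket_indices: Set[Tuple[Slot, uint8]]) -> bool:
    envelope = signed_envelope.message
    blob_tickets = state.blob_tickets_record[envelope.slot % SLOTS_PER_EPOCH]

    # All referenced tickets must be held by the same pubkey
    pubkeys = set(blob_tickets[ticket_index] for ticket_index in envelope.ticket_indices)
    if len(pubkeys) != 1:
        return False

    # Each referenced ticket must not have been used by an earlier valid message
    if any((envelope.slot, ticket_index) in seen_ticket_indices for ticket_index in envelope.ticket_indices):
        return False

    # The message must be signed by the ticket holder
    signing_root = compute_mempool_signing_root(envelope)  # hypothetical domain-separated signing root
    return bls.Verify(pubkeys.pop(), signing_root, signed_envelope.signature)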

Sample topics

mempool_cells_{subnet_id}

class CellSidecar(Container):
    cell: Cell
    cell_index: CellIndex
    kzg_proof: KZGProof
    kzg_commitment: KZGCommitment
    
class MempoolCells(Container):
    cell_sidecars: List[CellSidecar, MAX_BLOBS_PER_TRANSACTION]
    slot: Slot
    ticket_indices: List[uint8, MAX_BLOBS_PER_TRANSACTION]  
    
class SignedMempoolCells(Container):
    message: MempoolCells
    signature: BLSSignature

Note: the overhead of a SignedMempoolCells compared to a bare CellSidecar is 105 bytes (mostly coming from the signature), or ~5% with NUMBER_OF_COLUMNS = 128 and ~10% with NUMBER_OF_COLUMNS = 256. The overhead compared to sending a blob with proofs in a single (signed) message comes only from the kzg_commitment, cell_index, slot and ticket_index in each cell, for a total of 65 bytes per cell.
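
For reference, here is the byte accounting behind these numbers (a sketch ignoring SSZ list offsets), assuming the standard field sizes and a 128 KB blob extended 2x over the columns:

SIGNATURE_SIZE = 96      # BLSSignature
COMMITMENT_SIZE = 48     # KZGCommitment
SLOT_SIZE = 8            # Slot (uint64)
CELL_INDEX_SIZE = 8      # CellIndex (uint64)
TICKET_INDEX_SIZE = 1    # uint8

# Overhead of a SignedMempoolCells carrying a single CellSidecar vs the bare sidecar:
signed_overhead = SIGNATURE_SIZE + SLOT_SIZE + TICKET_INDEX_SIZE  # = 105 bytes
# Per-cell overhead vs sending a whole blob with proofs in one signed message:
per_cell_overhead = COMMITMENT_SIZE + CELL_INDEX_SIZE + SLOT_SIZE + TICKET_INDEX_SIZE  # = 65 bytes

cell_size_128 = 2 * 128 * 1024 // 128  # 2048 bytes per cell with NUMBER_OF_COLUMNS = 128
cell_size_256 = 2 * 128 * 1024 // 256  # 1024 bytes per cell with NUMBER_OF_COLUMNS = 256
assert round(100 * signed_overhead / cell_size_128) == 5    # ~5%
assert round(100 * signed_overhead / cell_size_256) == 10   # ~10%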

Gossip checks

Before forwarding we do exactly the same checks as in the BlobTransactionEnvelope topic, based on slot and ticket_indices, which amount to checking that the message is correctly signed by the holder of unused tickets for this slot. We also check that the subnet is the correct one:

  • For each cell_sidecar in message.cell_sidecars, compute_subnet_for_cell_sidecar(cell_sidecar.cell_index) == subnet_id
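
The subnet mapping itself could simply be the analogue of compute_subnet_for_data_column_sidecar in PeerDAS, with one mempool subnet per column; a minimal sketch:

def compute_subnet_for_cell_sidecar(cell_index: CellIndex) -> SubnetID:
    # One mempool_cells subnet per column, mirroring the column subnets
    return SubnetID(cell_index % NUMBER_OF_COLUMNS)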

Validity

When gossiping, we do not require the following check, which is however required in order for a MempoolCells object to be valid (together with the gossip checks):

  • verify that cell_sidecar.cell is an opening of cell_sidecar.kzg_commitment at cell_sidecar.cell_index, through cell_sidecar.kzg_proof.

In other words, we do not require verifying the cell sidecar itself. The reason we do not do it is that verifying a single cell, though reasonably cheap (~3ms), is much less efficient than batched verification (~16ms for a whole blob for example). For the purpose of preventing DoS, checking the signature is enough. The full verification can be done later, once all cells for a certain blob have been retrieved (or anyway, a client is free to schedule this as it pleases). The cells are ultimately only considered valid once the proof is verified.
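
For illustration, a sketch of the deferred check once the cells for one or more blobs have been collected, using a batch verifier in the style of verify_cell_kzg_proof_batch from the KZG libraries (whose exact signature varies between versions):

def verify_collected_cells(sidecars: Sequence[CellSidecar]) -> bool:
    # Batch-verify all collected cells at once, which is much cheaper than
    # verifying them one by one
    commitments = [sidecar.kzg_commitment for sidecar in sidecars]
    cell_indices = [sidecar.cell_index for sidecar in sidecars]
    cells = [sidecar.cell for sidecar in sidecars]
    proofs = [sidecar.kzg_proof for sidecar in sidecars]
    return verify_cell_kzg_proof_batch(commitments, cell_indices, cells, proofs)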

Alternative designs

Since verifying the signature is "only" ~2-3x faster than verifying a cell proof, an alternative design could be to not sign MempoolCells (which saves 96 bytes, ~10%) and to instead verify the proof immediately, without waiting for batch verification. However, a node would then only verify and forward a MempoolCells object if it knows a corresponding BlobTransactionEnvelope, because that's the only place where signature verification (i.e. gating of the mempool by tickets) would happen in this design.

Yet another design, which reintroduces a bit of bandwidth overhead but completely eliminates any verification overhead, is to include a field mempool_cells_hashes: List[Root, NUMBER_OF_COLUMNS] in the BlobTransactionEnvelope, storing the hash of all MempoolCells objects associated with the transaction. This way, once a node has the BlobTransactionEnvelope, it can forward any CellSidecar whose hash matches, without waiting for kzg proof verification, which can be done whenever convenient for batching. There is then no verification overhead, while the bandwidth overhead is only 32 bytes per cell (assuming the worst case of a transaction with a single blob), 3x less than with a signature.
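
A sketch of this third design, with illustrative container and helper names:

class BlobTransactionEnvelopeWithHashes(Container):
    transaction: Transaction
    kzg_commitments: List[KZGCommitment, MAX_BLOBS_PER_TRANSACTION]
    slot: Slot
    ticket_indices: List[uint8, MAX_BLOBS_PER_TRANSACTION]
    # hash_tree_root of the MempoolCells object expected on each column's subnet
    mempool_cells_hashes: List[Root, NUMBER_OF_COLUMNS]

def can_forward_mempool_cells(envelope: BlobTransactionEnvelopeWithHashes,
                              mempool_cells: MempoolCells,
                              subnet_id: SubnetID) -> bool:
    # Forwarding requires only a hash comparison; kzg proof verification can be
    # batched whenever convenient
    return hash_tree_root(mempool_cells) == envelope.mempool_cells_hashes[subnet_id]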

Summarizing, there are at least three possible designs, with these tradeoffs:

                                   Verification overhead   Bandwidth overhead    Need envelope to forward cells
  Sign cells                       1ms                     96 bytes per cell     No
  Verify kzg proof                 2-3ms                   0                     Yes
  Check against hash in envelope   0                       32 bytes per cell     Yes

Ticket allocation

1559 sale

Tickets could be sold through a system contract on the EL, implementing a 1559-like mechanism, much like the one used for EIP-7002 (EL-triggered withdrawals). However, it would probably make sense to sell tickets in bulk, for a period of time longer than one slot. For example, we could sell tickets for a whole epoch at a time.

The target and max could be set to the same values that are used for the fee market of blob transactions, since we want the mempool to have enough capacity to support the network's throughput, but not more than that: ideally, we don't want transactions that go through the mempool but don't end up onchain. For example, eventually we might have a target of 128 tickets sold per slot, and a max of 256.
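
For concreteness, a sketch of what the pricing rule could look like, in the spirit of EIP-7002's fee mechanism; all constants and names below are placeholders.

TARGET_TICKETS_PER_SLOT = 128
MAX_TICKETS_PER_SLOT = 256
MIN_TICKET_PRICE = 1               # wei, placeholder
TICKET_PRICE_UPDATE_FRACTION = 17  # placeholder, controls how fast the price reacts

def fake_exponential(factor: int, numerator: int, denominator: int) -> int:
    # Same integer approximation of factor * e**(numerator / denominator) used by EIP-4844/7002
    i = 1
    output = 0
    numerator_accum = factor * denominator
    while numerator_accum > 0:
        output += numerator_accum
        numerator_accum = (numerator_accum * numerator) // (denominator * i)
        i += 1
    return output // denominator

def get_ticket_price(excess_tickets: int) -> int:
    # Price rises exponentially in the accumulated excess over the target
    return fake_exponential(MIN_TICKET_PRICE, excess_tickets, TICKET_PRICE_UPDATE_FRACTION)

def update_excess_tickets(excess_tickets: int, tickets_sold_this_slot: int) -> int:
    # Accumulate (tickets sold - target) per slot, floored at 0
    return max(0, excess_tickets + tickets_sold_this_slot - TARGET_TICKETS_PER_SLOT)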

Note: one weakness of this approach is that a ticket is wasted whenever its slot is missed. We could refund tickets for missed slots, but then the block proposer could do things like buy all tickets and miss the slot on purpose if it cannot resell them for a sufficient profit.

Refunds

The ticket allocation mechanism has to ensure DoS resistance: we only have a finite amount of blob transactions that we want to allow in the mempool, and thus only a finite amount of tickets we can allocate. If we could know in advance which senders are going to land blobs onchain, we could allocate tickets to them. In practice we of course do not have that knowledge, and we might be forced to charge a price for tickets.

To avoid discouraging blob transaction senders from using the mempool, we can refund the price of tickets that land onchain, without giving up DoS resistance: a ticket is only free if you land a corresponding transaction onchain, in which case the transaction is already paying a fee and we're ok with having allowed it in the mempool. However, refunding a ticket whenever its tx lands onchain can be abused: someone that normally lands a few blob txs per epoch could buy up all mempool tickets for some period of time, much in excess of their own demand for them, and then slowly get refunded by sending their (real) blob txs directly to builders. It's not much of an attack, but it still freely allows one to temporarily prevent mempool access to others, which is not ideal. To prevent this, we can shorten the refund period. If we set it to one slot, meaning you only get refunded for a slot N ticket if you land a tx in slot N+1, the attack vector no longer exists. Essentially, you only get refunded if you prove that you did in fact have a legitimate reason to use the mempool in slot N.
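
A sketch of the refund rule with a one-slot window; get_used_ticket_indices and refund_ticket are purely illustrative helpers, not defined here.

def process_ticket_refunds(state: BeaconState, block: BeaconBlock) -> None:
    # Refund slot N tickets whose corresponding blob transaction landed in the slot N+1 block
    previous_slot = Slot(block.slot - 1)
    tickets = state.blob_tickets_record[previous_slot % SLOTS_PER_EPOCH]
    for ticket_index in get_used_ticket_indices(block, previous_slot):  # illustrative helper
        refund_ticket(state, tickets[ticket_index])                     # illustrative helper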

Why not blob tickets

A previous post explored the idea of the protocol selling blob tickets, which would both serve to gate access to the blob mempool and to give inclusion rights. The idea was to kill two birds with one stone: get a DoS resistant vertically sharded blob mempool, and ensure that it is used to sample blob txs in advance, rather than in the critical path of block propagation.

This solution seems very well suited to the current world, where sequencing is mostly not done by the L1 and thus blob submission is not particularly time sensitive, because it only confirms onchain something that is already known at the rollup level. However, things might become much more complicated in a future full of based rollups, and even more so in one full of based and native rollups that constantly interact. At that point, blob txs allow one to access a unified state across L1 and all such L2s, and inclusion time and ordering of blob txs are as important in this extended state as they are for regular L1 state today, e.g. the first blob tx in a block might arbitrage between AMMs on L1 and multiple L2s.

In such a world, selling inclusion rights for blob txs in advance would be akin to enshrining some form of partial block building today, since each blob would essentially be a partial block for the unified state. As with partial block building, the surface for potential issues due to state contention is large, and it's not clear that the system would not collapse back to a single superbuilder buying most tickets. Moreover, timing games would now apply to blob txs as well (again, they're basically just txs for this unified state), so there's very little reason to expect that mempool propagation would happen throughout the slot rather than as late as possible.