Blob mempool tickets

Introduction

What

The core ideas are:

  • Mirrored mempool sampling: using a vertically sharded mempool, where nodes download all blob txs but only samples of the associated blobs, with indices chosen to mirror the column sampling done for blocks, i.e. a node that downloads column i would also download sample i of all blob txs in the mempool (see the sketch after this list).
  • Mempool space allocation: to avoid DoS concerns on the sharded mempool, the mempool is restricted to a fixed blob capacity per slot, and this scarce mempool write access is allocated by selling it in the form of mempool tickets.
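
A minimal sketch of the mirroring, assuming a PeerDAS-style get_custody_columns helper; the function name get_mempool_sample_indices is illustrative, not part of the proposal.

def get_mempool_sample_indices(node_id: NodeID, custody_subnet_count: uint64) -> Sequence[CellIndex]:
    # The mempool cell indices a node downloads are exactly the column indices
    # it custodies and samples for blocks
    return [CellIndex(column_index) for column_index in get_custody_columns(node_id, custody_subnet_count)]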

Why

  • It's a relatively simple way to implement the necessary restrictions on a vertically sharded mempool, with the added benefit that mempool throughput is limited exactly by the desired network throughput.
  • For all blobs that go through the mempool, sampling can already happen there, on more relaxed timelines. This implies:
    • Peak bandwidth is not as much of a constraint
    • We can more safely afford to use pull-based gossip (like the blob mempool already does today). This reduces redundancy, because each blob is sent and received only once per node on average.
    • No duplication of bandwidth between CL and EL, only mempool propagation.
  • No need for a separate blob IL mechanism: instead of allocating the right of making blob ILs to validators, by randomly electing them in the IL committee, we can think of each mempool ticket as allocating a seat in the blob IL committee as well. In other words, each blob tx in the mempool can itself act as a blob IL. Since there's only a small number of these, there's no need for validators to package them into ILs, and we can instead rely on direct mempool observation (though we still need attesters that don't run a full mempool to have some kind of availability signal, like a committee's vote. See here).

Mempool

For now let's not worry about how the tickets are allocated, and let's just focus on how they are used to build a blob mempool. For this, the only thing we are going to need is that the consensus layer knows about the tickets somehow. Let's just say that the BeaconState keeps a record of the ticket holders for the next SLOTS_PER_EPOCH slots, in the circular array state.blob_tickets_record. In particular, state.blob_tickets_record[slot % SLOTS_PER_EPOCH] records the ticket holders for slot, in the form of BLS pubkeys.

class BeaconState(Container):
    ...
    # Ticket holders (BLS pubkeys) for each of the next SLOTS_PER_EPOCH slots,
    # indexed by slot % SLOTS_PER_EPOCH
    blob_tickets_record: Vector[List[BLSPubkey, MAX_BLOBS_PER_BLOCK], SLOTS_PER_EPOCH]

The blob mempool lives entirely on the CL p2p network, and consists of a single blob_transaction_envelope global topic used for sharing blob transactions, and NUMBER_OF_COLUMNS mempool_cells topics used for sharing the blob samples associated with them, together with proofs. Sending messages in these topics at slot N requires holding a ticket for slot N, which can be checked against the pubkeys recorded for it, i.e., pubkeys = state.blob_tickets_record[N % SLOTS_PER_EPOCH]. In particular, a message has to include a ticket_index and be signed with pubkeys[ticket_index].
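
Concretely, the pubkey that a message using a given ticket must be signed with can be looked up directly from this record; a minimal sketch (the helper name is illustrative):

def get_blob_ticket_holder(state: BeaconState, slot: Slot, ticket_index: uint8) -> BLSPubkey:
    # Pubkey of the holder of ticket `ticket_index` for `slot`
    return state.blob_tickets_record[slot % SLOTS_PER_EPOCH][ticket_index]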

Blob transaction topic

blob_transaction_envelope

class BlobTransactionEnvelope(Container):
    transaction: Transaction
    kzg_commitments: List[KZGCommitment, MAX_BLOBS_PER_TRANSACTION]
    slot: Slot
    ticket_indices: List[uint8, MAX_BLOBS_PER_TRANSACTION]

class SignedBlobTransactionEnvelope(Container):
    message: BlobTransactionEnvelope
    signature: BLSSignature

Note: we could remove kzg_commitments from BlobTransactionEnvelope if we had SSZ-encoded transactions, because we could just access the versioned hashes and check them against the kzg commitments in the cells. Without that, we also have to give the EL the versioned hashes corresponding to kzg_commitments and have it check that they match the ones in transaction.
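
For illustration, a sketch of this handoff using the Deneb helper kzg_commitment_to_versioned_hash; the EL-side comparison against the transaction's own versioned hashes is not shown, since the transaction bytes are opaque to the CL.

def get_envelope_versioned_hashes(envelope: BlobTransactionEnvelope) -> Sequence[VersionedHash]:
    # Versioned hashes derived from the envelope's commitments; the EL is asked to
    # check that these match the blob versioned hashes inside `transaction`
    return [kzg_commitment_to_versioned_hash(commitment) for commitment in envelope.kzg_commitments]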

Gossip checks

Let blob_tickets = state.blob_tickets_record[slot % SLOTS_PER_EPOCH]. Before forwarding, we essentially just check that the message is correctly signed by the holder of unused tickets, by requiring that (see the sketch after this list):

  • All holders of the tickets referenced by ticket_indices are the same, i.e. blob_tickets[ticket_index] is the same pubkey for all ticket_index in ticket_indices.
  • For each ticket_index in ticket_indices, the message is the first valid message received for that ticket_index.
  • signature is a valid signature of message by pubkey, where pubkey is the unique pubkey that holds all the tickets, i.e., blob_tickets[ticket_index] for any ticket_index in ticket_indices.
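
Putting these together, a minimal sketch of the validation, assuming a hypothetical compute_mempool_signing_root helper for domain separation and a seen_ticket_indices set tracking tickets already consumed by a valid message:

def validate_blob_transaction_envelope(state: BeaconState,
                                       signed_envelope: SignedBlobTransactionEnvelope,
                                       seen_ticket_indices: Set[Tuple[Slot, uint8]]) -> bool:
    envelope = signed_envelope.message
    blob_tickets = state.blob_tickets_record[envelope.slot % SLOTS_PER_EPOCH]

    # All referenced tickets must be held by the same pubkey
    pubkeys = set(blob_tickets[ticket_index] for ticket_index in envelope.ticket_indices)
    if len(pubkeys) != 1:
        return False

    # Each referenced ticket must not have been used by an earlier valid message
    if any((envelope.slot, ticket_index) in seen_ticket_indices for ticket_index in envelope.ticket_indices):
        return False

    # The message must be signed by the ticket holder
    signing_root = compute_mempool_signing_root(envelope)  # hypothetical domain-separated signing root
    return bls.Verify(pubkeys.pop(), signing_root, signed_envelope.signature)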

Sample topics

mempool_cells_{subnet_id}

class CellSidecar(Container):
    cell: Cell
    cell_index: CellIndex
    kzg_proof: KZGProof
    kzg_commitment: KZGCommitment
    
class MempoolCells(Container):
    cell_sidecars: List[CellSidecar, MAX_BLOBS_PER_TRANSACTION]
    slot: Slot
    ticket_indices: List[uint8, MAX_BLOBS_PER_TRANSACTION]  
    
class SignedMempoolCells(Container):
    message: MempoolCells
    signature: BLSSignature

Note: the overhead of a SignedMempoolCells compared to a bare CellSidecar is 105 bytes (mostly coming from the signature), or ~5% with NUMBER_OF_COLUMNS = 128 and ~10% with NUMBER_OF_COLUMNS = 256. The overhead compared to sending a blob with proofs in a single (signed) message comes only from the kzg_commitment, cell_index, slot and ticket_index in each cell, for a total of 65 bytes per cell.
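
For reference, here is the byte accounting behind these numbers (a sketch ignoring SSZ list offsets), assuming the standard field sizes and a 128 KB blob extended 2x over the columns:

SIGNATURE_SIZE = 96      # BLSSignature
COMMITMENT_SIZE = 48     # KZGCommitment
SLOT_SIZE = 8            # Slot (uint64)
CELL_INDEX_SIZE = 8      # CellIndex (uint64)
TICKET_INDEX_SIZE = 1    # uint8

# Overhead of a SignedMempoolCells carrying a single CellSidecar vs the bare sidecar:
signed_overhead = SIGNATURE_SIZE + SLOT_SIZE + TICKET_INDEX_SIZE  # = 105 bytes
# Per-cell overhead vs sending a whole blob with proofs in one signed message:
per_cell_overhead = COMMITMENT_SIZE + CELL_INDEX_SIZE + SLOT_SIZE + TICKET_INDEX_SIZE  # = 65 bytes

cell_size_128 = 2 * 128 * 1024 // 128  # 2048 bytes per cell with NUMBER_OF_COLUMNS = 128
cell_size_256 = 2 * 128 * 1024 // 256  # 1024 bytes per cell with NUMBER_OF_COLUMNS = 256
assert round(100 * signed_overhead / cell_size_128) == 5    # ~5%
assert round(100 * signed_overhead / cell_size_256) == 10   # ~10%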

Gossip checks

Before forwarding we do exactly the same checks as in the BlobTransactionEnvelope topic, based on slot and ticket_indices, which amount to checking that the message is correctly signed by the holder of unused tickets for this slot. We also check that the subnet is the correct one:

  • For each cell_sidecar in message.cell_sidecars, compute_subnet_for_cell_sidecar(cell_sidecar.cell_index) == subnet_id
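
The subnet mapping itself could simply be the analogue of compute_subnet_for_data_column_sidecar in PeerDAS, with one mempool subnet per column; a minimal sketch:

def compute_subnet_for_cell_sidecar(cell_index: CellIndex) -> SubnetID:
    # One mempool_cells subnet per column, mirroring the column subnets
    return SubnetID(cell_index % NUMBER_OF_COLUMNS)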

Validity

When gossiping, we do not require the following check, which is however required in order for a MempoolCells object to be valid (together with the gossip checks):

  • verify that cell_sidecar.cell is an opening of cell_sidecar.kzg_commitment at cell_sidecar.cell_index, through cell_sidecar.kzg_proof.

In other words, we do not require verifying the cell sidecar itself. The reason we do not do it is that verifying a single cell, though reasonably cheap (~3ms), is much less efficient than batched verification (~16ms for a whole blob for example). For the purpose of preventing DoS, checking the signature is enough. The full verification can be done later, once all cells for a certain blob have been retrieved (or anyway, a client is free to schedule this as it pleases). The cells are ultimately only considered valid once the proof is verified.
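
For illustration, a sketch of the deferred check once the cells for one or more blobs have been collected, using a batch verifier in the style of verify_cell_kzg_proof_batch from the KZG libraries (whose exact signature varies between versions):

def verify_collected_cells(sidecars: Sequence[CellSidecar]) -> bool:
    # Batch-verify all collected cells at once, which is much cheaper than
    # verifying them one by one
    commitments = [sidecar.kzg_commitment for sidecar in sidecars]
    cell_indices = [sidecar.cell_index for sidecar in sidecars]
    cells = [sidecar.cell for sidecar in sidecars]
    proofs = [sidecar.kzg_proof for sidecar in sidecars]
    return verify_cell_kzg_proof_batch(commitments, cell_indices, cells, proofs)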

Alternative designs

Since verifying the signature is "only" ~2-3x faster than verifying a cell proof, an alternative design could be to not sign MempoolCells (which saves 96 bytes, ~10%) and to instead verify the proof immediately, without waiting for batch verification. However, a node would then only verify and forward a MempoolCells object if it knows a corresponding BlobTransactionEnvelope, because that's the only place where signature verification (i.e. gating of the mempool by tickets) would happen in this design.

Yet another design, which reintroduces a bit of bandwidth overhead but completely eliminates any verification overhead, is to include a field mempool_cells_hashes: List[Root, NUMBER_OF_COLUMNS] in the BlobTransactionEnvelope, storing the hash of all MempoolCells objects associated with the transaction. This way, once a node has the BlobTransactionEnvelope, it can forward any CellSidecar whose hash matches, without waiting for kzg proof verification, which can be done whenever convenient for batching. There is then no verification overhead, while the bandwidth overhead is only 32 bytes per cell (assuming the worst case of a transaction with a single blob), 3x less than with a signature.
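
A sketch of this third design, with illustrative container and helper names:

class BlobTransactionEnvelopeWithHashes(Container):
    transaction: Transaction
    kzg_commitments: List[KZGCommitment, MAX_BLOBS_PER_TRANSACTION]
    slot: Slot
    ticket_indices: List[uint8, MAX_BLOBS_PER_TRANSACTION]
    # hash_tree_root of the MempoolCells object expected on each column's subnet
    mempool_cells_hashes: List[Root, NUMBER_OF_COLUMNS]

def can_forward_mempool_cells(envelope: BlobTransactionEnvelopeWithHashes,
                              mempool_cells: MempoolCells,
                              subnet_id: SubnetID) -> bool:
    # Forwarding requires only a hash comparison; kzg proof verification can be
    # batched whenever convenient
    return hash_tree_root(mempool_cells) == envelope.mempool_cells_hashes[subnet_id]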

Summarizing, there are at least three possible designs, with these tradeoffs:

                                   Verification overhead   Bandwidth overhead    Need envelope to forward cells
  Sign cells                       1ms                     96 bytes per cell     No
  Verify kzg proof                 2-3ms                   0                     Yes
  Check against hash in envelope   0                       32 bytes per cell     Yes

Ticket allocation

1559 sale

Tickets could be sold through a system contract on the EL, implementing a 1559-like mechanism, much like the one used for EIP-7002 (EL-triggered withdrawals). However, it would probably make sense to sell tickets in bulk, for a period of time longer than one slot. For example, we could sell tickets for a whole epoch at a time.

The target and max could be set to the same values that are used for the fee market of blob transactions, since we want the mempool to have enough capacity to support the network's throughput, but not more than that: ideally, we don't want transactions that go through the mempool but don't end up onchain. For example, eventually we might have a target of 128 tickets sold per slot, and a max of 256.
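
For concreteness, a sketch of what the pricing rule could look like, in the spirit of EIP-7002's fee mechanism; all constants and names below are placeholders.

TARGET_TICKETS_PER_SLOT = 128
MAX_TICKETS_PER_SLOT = 256
MIN_TICKET_PRICE = 1               # wei, placeholder
TICKET_PRICE_UPDATE_FRACTION = 17  # placeholder, controls how fast the price reacts

def fake_exponential(factor: int, numerator: int, denominator: int) -> int:
    # Same integer approximation of factor * e**(numerator / denominator) used by EIP-4844/7002
    i = 1
    output = 0
    numerator_accum = factor * denominator
    while numerator_accum > 0:
        output += numerator_accum
        numerator_accum = (numerator_accum * numerator) // (denominator * i)
        i += 1
    return output // denominator

def get_ticket_price(excess_tickets: int) -> int:
    # Price rises exponentially in the accumulated excess over the target
    return fake_exponential(MIN_TICKET_PRICE, excess_tickets, TICKET_PRICE_UPDATE_FRACTION)

def update_excess_tickets(excess_tickets: int, tickets_sold_this_slot: int) -> int:
    # Accumulate (tickets sold - target) per slot, floored at 0
    return max(0, excess_tickets + tickets_sold_this_slot - TARGET_TICKETS_PER_SLOT)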

Note: one weakness of this approach is that a ticket is wasted whenever its slot is missed. We could refund tickets for missed slots, but then the block proposer could do things like buy all tickets and miss the slot on purpose if it cannot resell them for a sufficient profit.

Refunds

The ticket allocation mechanism has to ensure DoS resistance: we only have a finite amount of blob transactions that we want to allow in the mempool, and thus only a finite amount of tickets we can allocate. If we could know in advance which senders are going to land blobs onchain, we could allocate tickets to them. In practice we of course do not have that knowledge, and we might be forced to charge a price for tickets.

To avoid discouraging blob transaction senders from using the mempool, we can refund the price of tickets that land onchain, without giving up DoS resistance: a ticket is only free if you land a corresponding transaction onchain, in which case the transaction is already paying a fee and we're ok with having allowed it in the mempool. However, refunding a ticket whenever its tx lands onchain can be abused: someone that normally lands a few blob txs per epoch could buy up all mempool tickets for some period of time, much in excess of their own demand for them, and then slowly get refunded by sending their (real) blob txs directly to builders. It's not much of an attack, but it still freely allows one to temporarily prevent mempool access to others, which is not ideal. To prevent this, we can shorten the refund period. If we set it to one slot, meaning you only get refunded for a slot N ticket if you land a tx in slot N+1, the attack vector no longer exists. Essentially, you only get refunded if you prove that you did in fact have a legitimate reason to use the mempool in slot N.
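
A sketch of the refund rule with a one-slot window; get_used_ticket_indices and refund_ticket are purely illustrative helpers, not defined here.

def process_ticket_refunds(state: BeaconState, block: BeaconBlock) -> None:
    # Refund slot N tickets whose corresponding blob transaction landed in the slot N+1 block
    previous_slot = Slot(block.slot - 1)
    tickets = state.blob_tickets_record[previous_slot % SLOTS_PER_EPOCH]
    for ticket_index in get_used_ticket_indices(block, previous_slot):  # illustrative helper
        refund_ticket(state, tickets[ticket_index])                     # illustrative helper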

Why not blob tickets

A previous post explored the idea of the protocol selling blob tickets, which would both serve to gate access to the blob mempool and to give inclusion rights. The idea was to kill two birds with one stone: get a DoS resistant vertically sharded blob mempool, and ensure that it is used to sample blob txs in advance, rather than in the critical path of block propagation.

This solution seems very well suited to the current world, where sequencing is mostly not done by the L1 and thus blob submission is not particularly time sensitive, because it only confirms onchain something that is already known at the rollup level. However, things might become much more complicated in a future full of based rollups, and even more so in one full of based and native rollups that constantly interact. At that point, blob txs allow one to access a unified state across L1 and all such L2s, and inclusion time and ordering of blob txs are as important in this extended state as they are for regular L1 state today, e.g. the first blob tx in a block might arbitrage between AMMs on L1 and multiple L2s.

In such a world, selling inclusion rights for blob txs in advance would be akin to enshrining some form of partial block building today, since each blob would essentially be a partial block for the unified state. As with partial block building, the surface for potential issues due to state contention is large, and it's not clear that the system would not collapse back to a single superbuilder buying most tickets. Moreover, timing games would now apply to blob txs as well (again, they're basically just txs for this unified state), so there's very little reason to expect that mempool propagation would happen throughout the slot rather than as late as possible.