
ERC-4337 P2P Spec Feedback

Edit Date: 2023/10/12

Spec Commit: 2351f17

The following is a collection of feedback for the ERC-4337 P2P specification. I'll be adding to this as I learn more.

Discovery

There is currently no described mechanism for a node to find peers that support a particular mempool. The discovery process should provide at least some efficiency mechanisms to improve this. mempool_nets is undefined in the spec beyond "mempool subnet subscriptions", which is not given any meaning. There is a partial definition of how this looks in the ENR here.

Status Quo

A reasonable goal for a node is to try to maintain N peers per mempool to improve latency for receiving new UOs. To do this with the current protocol:

  1. Discover any peer (no way to filter)
  2. Dial that peer
  3. Request status/metadata
  4. Determine whether mempools match; either keep the peer or drop it.

Proposal

We are limited to 300 bytes of information that we can pack into an ENR. Thus, we cannot put each supported mempool ID into the ENR directly.

Instead, we can define a "mempool subnet" as a mempool bucket. A mempool is assigned to a bucket by: mempool_id % MEMPOOL_ID_SUBNET_COUNT. The ENR structure is:

Key: mempool_subnets
Value: SSZ Bitvector[MEMPOOL_ID_SUBNET_COUNT]

Where MEMPOOL_ID_SUBNET_COUNT = 1024 (a 128-byte SSZ bitvector).
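For concreteness, a minimal sketch of the bucket assignment and bitvector construction described above (function names are illustrative; mempool IDs are assumed to be 32-byte values interpreted as big-endian integers):

MEMPOOL_ID_SUBNET_COUNT = 1024

def mempool_subnet(mempool_id: bytes) -> int:
    # Bucket assignment: interpret the mempool ID as an integer and take it mod the subnet count.
    return int.from_bytes(mempool_id, "big") % MEMPOOL_ID_SUBNET_COUNT

def build_subnet_bitvector(supported_mempools: list) -> bytearray:
    # 1024 bits -> 128 bytes, matching the ENR size noted above.
    bits = bytearray(MEMPOOL_ID_SUBNET_COUNT // 8)
    for mempool_id in supported_mempools:
        subnet = mempool_subnet(mempool_id)
        bits[subnet // 8] |= 1 << (subnet % 8)
    return bits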

Note: A much more efficient structure would be a "sparse" bitvector; nothing says we have to use SSZ here. We could store only the indexes and values of the non-zero bytes in the bitvector. If MEMPOOL_ID_SUBNET_COUNT >> MAX_SUPPORTED_MEMPOOLS this should compress very nicely. This is worth exploring further, as increasing MEMPOOL_ID_SUBNET_COUNT reduces the chance of a false positive: dialing a peer that advertises the subnet but does not support the particular mempool.
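As a rough illustration of the sparse idea (the encoding below is just an example, not a proposed wire format), only the non-zero bytes of the bitvector are stored as (index, value) pairs:

def sparse_encode(bits: bytearray) -> bytes:
    # Store only (byte_index, byte_value) pairs for non-zero bytes. With
    # MEMPOOL_ID_SUBNET_COUNT >> MAX_SUPPORTED_MEMPOOLS most bytes are zero,
    # so this is far smaller than the full 128-byte bitvector.
    out = bytearray()
    for index, value in enumerate(bits):
        if value:
            out += index.to_bytes(2, "big") + bytes([value])
    return bytes(out)

def sparse_decode(data: bytes, length: int = 128) -> bytearray:
    # Rebuild the dense 1024-bit (128-byte) bitvector.
    bits = bytearray(length)
    for i in range(0, len(data), 3):
        index = int.from_bytes(data[i:i + 2], "big")
        bits[index] = data[i + 2]
    return bits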

Now to discover a peer that supports the mempool ID:

  1. Map a mempool ID to its subnet
  2. Discover peers, filtering for an ENR that advertises the subnet
  3. Dial peer
  4. Request status/metadata
  5. Determine whether mempools match; either keep the peer or drop it.

This should reduce the amount of dialing needed to find a peer that supports a particular mempool by roughly a factor of MEMPOOL_ID_SUBNET_COUNT (assuming mempools are distributed evenly across subnets).
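A sketch of what the filtered discovery loop could look like, reusing mempool_subnet from the sketch above (the discovery calls and ENR accessors are hypothetical placeholders, not a real discv5 API):

def peer_supports_subnet(mempool_subnets: bytes, subnet: int) -> bool:
    # Check the subnet's bit in the peer's advertised bitvector.
    return bool(mempool_subnets[subnet // 8] & (1 << (subnet % 8)))

def find_peer_for_mempool(discovery, mempool_id: bytes):
    target_subnet = mempool_subnet(mempool_id)       # step 1
    for enr in discovery.random_nodes():             # step 2 (hypothetical iterator)
        subnets = enr.get("mempool_subnets")         # hypothetical ENR accessor
        if subnets and peer_supports_subnet(subnets, target_subnet):
            return enr                               # steps 3-5: dial and confirm via status/metadata
    return None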

Request/Response

status/metadata

status should be reserved for more dynamic data, while metadata should be used for more static data. metadata should only be requested upon the initial handshake, and when a ping notices that the metadata seq_number has changed.

We should move supported_mempools to metadata and add syncing data to status.

status:

(
    block_hash: uint256
    block_number: uint256
)

block_hash: Hash of the last processed block.
block_number: Number of the last processed block.

Rationale: status can be used by nodes to determine the syncing progress of a peer. Peers need to keep their mempools up to date or they risk sending invalid user operations. By adding these fields to status, a node can determine when a peer is unhealthy.

This better tracks how both devp2p and the consensus spec define their status endpoints.
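As an illustration, a node could gate peer health on these fields roughly as follows (MAX_BLOCK_LAG is an assumed threshold, not part of the spec):

MAX_BLOCK_LAG = 5  # assumed tolerance, in blocks

def peer_is_healthy(peer_status, local_block_number: int) -> bool:
    # A peer far behind the local head is likely gossiping stale,
    # potentially invalid user operations.
    return local_block_number - peer_status.block_number <= MAX_BLOCK_LAG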

metadata:

(
  seq_number: uint64
  mempool_nets: Bitvector[MEMPOOL_ID_SUBNET_COUNT]
  supported_mempools: List[Bytes32, MAX_SUPPORTED_MEMPOOLS]
)

seq_number: metadata sequence number.
mempool_nets: Mempool subnets as defined above.
supported_mempools: List of supported mempool IDs.

Rationale: Changing the list of supported mempools should be a relatively rare occurrence and should require special processing by the node, including incrementing its metadata sequence number.
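A sketch of the ping-driven refresh, assuming hypothetical request helpers on the peer object:

async def on_ping_response(peer, remote_seq_number: int):
    cached = peer.metadata
    if cached is None or remote_seq_number > cached.seq_number:
        # Only re-request metadata when the peer signals a change,
        # e.g. its supported_mempools list was updated.
        peer.metadata = await peer.request_metadata()  # hypothetical helper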

Current implementation

If we keep the current implementation, we should change the framing for the status endpoint: its request/response should be sent in a single chunk, rather than:

Responses that consist of a single SSZ-list send each list item as a response_chunk.

pooled_user_ops_hashes

Suggestion: Drop the s => pooled_user_op_hashes

Drop mempool from the request

Peers have already negotiated their common mempools during connection setup. Peers should only need to send a single pooled_user_op_hashes request and should receive hashes for each mempool in common. This will limit the number of requests a node needs to send.

It is not important which mempool a UO is associated with. The peer needs to re-validate the UO anyway, and can determine its mempool association at that point.

New request:

(
  offset: uint64
)

offset: Offset into a list of UO hashes. This list should be materialized once by the responding peer and should time out.

Offset list timeout

The offset field only has meaning in the context of a previously sent pooled_user_op_hashes request. We should define it as follows (a responder-side sketch is given after the list):

  1. When offset = 0 a node should retrieve a list of UO hashes for shared mempools, up to a limit (maybe MAX_OPS_PER_REQUEST * 8). If this list is longer than MAX_OPS_PER_REQUEST, it should persist the portion that it hasn't yet sent.
  2. When offset != 0 the node should return UO hashes from the above list, skipping any already sent.
  3. After N seconds a node should time out this list and any request with offset != 0 should be treated as invalid. Suggestion: N = 1.
  4. Any request with offset = 0 causes a new list retrieval and timer restart.
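A minimal responder-side sketch of the above, assuming hypothetical helpers (collect_shared_mempool_hashes, ProtocolViolation) and a placeholder value for MAX_OPS_PER_REQUEST:

import time

OFFSET_LIST_TTL = 1.0        # seconds; the suggested N above
MAX_OPS_PER_REQUEST = 4096   # placeholder; the spec defines the actual constant

def handle_pooled_user_op_hashes(peer, offset: int):
    now = time.monotonic()
    if offset == 0:
        # (1)/(4): materialize a fresh list and restart the timer.
        peer.hash_list = collect_shared_mempool_hashes(limit=MAX_OPS_PER_REQUEST * 8)
        peer.hash_list_created = now
    elif peer.hash_list is None or now - peer.hash_list_created > OFFSET_LIST_TTL:
        # (3): reject offsets against an expired or missing list.
        raise ProtocolViolation("offset refers to an expired hash list")
    # (2): return the next chunk, skipping hashes already sent.
    return peer.hash_list[offset:offset + MAX_OPS_PER_REQUEST]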

pooled_user_ops_by_hash

The size of this response is currently unbounded. We need to cap the maximum size of the response and mark it as a protocol violation if a sender sends more than this size.

Suggested cap: 10MB
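Receiver-side enforcement could look roughly like this (peer.penalize and ProtocolViolation are assumed helpers):

MAX_POOLED_OPS_RESPONSE_BYTES = 10 * 1024 * 1024  # suggested 10MB cap

def check_response_size(response_bytes: bytes, peer):
    # Treat an oversized response as a protocol violation and penalize the sender.
    if len(response_bytes) > MAX_POOLED_OPS_RESPONSE_BYTES:
        peer.penalize(reason="pooled_user_ops_by_hash response over size cap")  # assumed helper
        raise ProtocolViolation("response exceeds 10MB cap")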

Gossip

user_ops_with_entry_point

As defined, user_ops_with_entry_point will introduce significant inefficiency into the gossip network.

Libp2p gossipsub deduplicates messages by message ID. Nodes gossip IHAVE messages to notify peers of the message IDs that they currently have for a topic. Peers then check their local cache of message IDs, and if they have not yet received a message for a given message ID they issue an IWANT message to the peer that gossiped it.

The current spec defines message IDs as:

SHA256(MESSAGE_DOMAIN_VALID_SNAPPY + snappy_decompress(message.data))[:20]

and user_ops_with_entry_point as:

class UserOperationsWithEntryPoint(Container):
    entry_point_contract: Address
    verified_at_block_hash: uint256
    chain_id: uint256
    user_operations: List[UserOp, MAX_OPS_PER_REQUEST]

If two messages have a different list of user ops, they will have different message IDs. Peers will be required to issue IWANT messages for these full lists, and then do the UserOp deduplication internally. This will massively increase network utilization.

Suggestion

The gossip message for user operations should be redefined as verified_user_operation.

class VerifiedUserOperation(Container):
    verified_at_block_hash: uint256
    user_operation: UserOperation

Note: Even this adds inefficiencies as there may be multiple messages for a given UO that was verified at different block hashes. Ideally a UO should only be verified at 1-2 different blocks so this shouldn't add too much overhead to the network.

Gossipsub internally batches message IDs into IHAVE and IWANT messages, so there is no need to do batching in this message itself.

entry_point_contract and chain_id are redundant pieces of information. entry_point_contract should be implied from the mempool definition. chain_id should be implied from the network itself. If we feel we need chain_id somewhere we could add it to the ENR or to the metadata.

Mempool Topic Deduplication

The libp2p Gossipsub specification contains the following.

In all pubsub implementations, we can first check the seen cache before forwarding messages to avoid wastefully republishing the same message multiple times.

This seen cache is shared across topics and caches message IDs.

If a caller attempts to publish the same message ID on two different topics, the 2nd attempt will be deduplicated and fail.

Rust implementation

Go implementation

The definition of message ID above doesn't incorporate the topic string it's being sent on; thus, if the same message is sent to two mempool topics, the second publish will be deduplicated and will not reach subscribers.

If a node has a UO that matches two separate mempools it can only send that UO to the subscribers of a single pool.

More recent definitions of the consensus spec include the message topic in the message ID. See here.
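For reference, a consensus-spec style computation looks roughly like the following; mixing the topic (and its length) into the hash gives the same payload a distinct ID per topic. The domain value here is the one suggested under Misc below:

import hashlib

MESSAGE_DOMAIN_VALID_SNAPPY = bytes.fromhex("0100000000")  # value suggested under Misc below

def message_id_with_topic(topic: str, decompressed_data: bytes) -> bytes:
    # Include the topic length and topic bytes ahead of the message data.
    topic_bytes = topic.encode()
    return hashlib.sha256(
        MESSAGE_DOMAIN_VALID_SNAPPY
        + len(topic_bytes).to_bytes(8, "little")
        + topic_bytes
        + decompressed_data
    ).digest()[:20]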

Problem

If we incorporate the topic into the message ID, peers lose the ability to deduplicate their IWANT messages across topics. Each UO will be requested by each peer once for every mempool that it matches and the peer supports.

Options

  1. The current message ID definition should change to incorporate the topic ID. This will at least ensure that a UO is sent to each mempool it matches on. The downside here will be network inefficiencies.
  2. A small change to Gossipsub would remove this issue: split the seen cache into a read-side and a write-side cache. The read-side cache can be cross-topic, while the write-side cache should be per topic. This would ensure that a message ID is broadcast on all topics it applies to, while a peer knows to only read that message once (see the sketch after this list).

I asked a question about this on the libp2p forum. It would be helpful to talk to the maintainers.

  3. This is a more radical suggestion, but we could switch to an entirely different pubsub implementation, or just use the request/response domain. For now, I'm much more inclined to go with option (1) and push for the change described in (2) before considering this.
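A minimal sketch of the cache split described in option (2) (illustrative only; not the actual data structures used by the Rust or Go implementations):

class SplitSeenCache:
    def __init__(self):
        self.read_seen = set()    # message IDs already delivered to the application (cross-topic)
        self.write_seen = {}      # per-topic sets of message IDs already forwarded

    def should_deliver(self, msg_id: bytes) -> bool:
        # Deliver a given message to the application at most once, across all topics.
        if msg_id in self.read_seen:
            return False
        self.read_seen.add(msg_id)
        return True

    def should_forward(self, topic: str, msg_id: bytes) -> bool:
        # Forward the message on every topic it was published to, but only once per topic.
        seen = self.write_seen.setdefault(topic, set())
        if msg_id in seen:
            return False
        seen.add(msg_id)
        return True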

Misc

MESSAGE_DOMAIN_VALID_SNAPPY is undefined.

Suggestion: 0x0100000000

MESSAGE_DOMAIN_INVALID_SNAPPY is undefined.

Suggestion: 0x0000000000