Edit Date: 2023/10/12
Spec Commit: 2351f17
The following is a collection of feedback for the ERC-4337 P2P specification. I'll be adding to this as I learn more.
There is currently no described mechanism for a node to find peers that support a particular mempool. The discovery process should provide at least some efficiency mechanisms to improve this.
`mempool_nets` is undefined in the spec beyond "mempool subnet subscriptions", which has no meaning. There is a partial definition of how this looks in the ENR here.
A reasonable goal for a node is to try to maintain N peers per mempool to improve latency for receiving new UOs. To do this with the current protocol:
We are limited to 300 bytes of information that we can pack into an ENR. Thus, we cannot put each supported mempool ID into the ENR directly.
Instead, we can define a "mempool subnet" as a mempool bucket. A mempool is assigned to a bucket by `mempool_id % MEMPOOL_ID_SUBNET_COUNT`. The ENR structure is:
| Key | Value |
|---|---|
| mempool_subnets | SSZ Bitvector[MEMPOOL_ID_SUBNET_COUNT] |

Where `MEMPOOL_ID_SUBNET_COUNT`: 1024 (128 bytes when SSZ encoded).
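A minimal sketch of the bucket assignment and ENR bitvector described above (treating `mempool_id` as an integer and eliding SSZ serialization details):

```python
MEMPOOL_ID_SUBNET_COUNT = 1024  # 1024-bit bitvector -> 128 bytes in the ENR

def mempool_subnet(mempool_id: int) -> int:
    # Bucket assignment: mempool_id % MEMPOOL_ID_SUBNET_COUNT.
    return mempool_id % MEMPOOL_ID_SUBNET_COUNT

def build_mempool_subnets(supported_mempools: list[int]) -> bytes:
    # Build the 128-byte bitvector advertised under the mempool_subnets ENR key.
    bits = bytearray(MEMPOOL_ID_SUBNET_COUNT // 8)
    for mempool_id in supported_mempools:
        subnet = mempool_subnet(mempool_id)
        bits[subnet // 8] |= 1 << (subnet % 8)
    return bytes(bits)
```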
Note: A much more efficient structure would be a "sparse" bitvector. Nothing says we have to use SSZ here; we could store only the indexes/values of the non-zero bytes in the bitvector. If `MEMPOOL_ID_SUBNET_COUNT` >> `MAX_SUPPORTED_MEMPOOLS`, this should compress very nicely. This is worth exploring further, as increasing `MEMPOOL_ID_SUBNET_COUNT` reduces the chance of a false positive: a peer that supports a subnet but not the particular mempool.
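One possible sparse encoding along those lines, purely illustrative (not SSZ): store only (byte index, byte value) pairs for the non-zero bytes of the bitvector.

```python
def sparse_encode(bits: bytes) -> bytes:
    # (byte_index, byte_value) pairs for each non-zero byte.
    # Assumes len(bits) <= 256 so the index fits in one byte (128 bytes here).
    out = bytearray()
    for i, b in enumerate(bits):
        if b:
            out += bytes([i, b])
    return bytes(out)

def sparse_decode(encoded: bytes, length: int = 128) -> bytearray:
    # Expand back to the full bitvector.
    bits = bytearray(length)
    for i in range(0, len(encoded), 2):
        bits[encoded[i]] = encoded[i + 1]
    return bits
```

A node that supports only a handful of mempools produces only a few (index, value) pairs instead of the full 128 bytes.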
Now, to discover a peer that supports a given mempool ID, a node can compute the mempool's subnet and only dial peers whose ENR advertises that subnet, confirming the exact mempool list after the handshake (see the sketch below). This should reduce the amount of dialing needed to find a peer that supports a particular mempool by a factor of `MEMPOOL_ID_SUBNET_COUNT`.
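A sketch of how discovery could use the subnet bit; `enr.get(...)` is a hypothetical accessor for the `mempool_subnets` entry and the ENR iterator is assumed to come from a discv5 lookup:

```python
MEMPOOL_ID_SUBNET_COUNT = 1024

def subnet_bit_set(mempool_subnets: bytes, mempool_id: int) -> bool:
    subnet = mempool_id % MEMPOOL_ID_SUBNET_COUNT
    return bool(mempool_subnets[subnet // 8] & (1 << (subnet % 8)))

def candidate_peers(enrs, mempool_id: int):
    # Only dial peers whose ENR advertises the mempool's subnet. Multiple mempools
    # map to the same subnet, so the exact match is confirmed against the peer's
    # supported mempool list after the handshake.
    for enr in enrs:
        subnets = enr.get("mempool_subnets")  # hypothetical ENR accessor
        if subnets is not None and subnet_bit_set(subnets, mempool_id):
            yield enr
```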
`status`/`metadata`
`status` should be reserved for more dynamic data while `metadata` should be used for more static data. `metadata` should only be requested upon the initial handshake, and when a `ping` notices that the `metadata_seq_number` has changed.
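A sketch of that refresh flow, with hypothetical peer fields and a hypothetical `request_metadata` req/resp helper:

```python
def on_ping(peer, ping_metadata_seq_number: int) -> None:
    # Only re-request metadata when the peer's advertised sequence number has
    # moved past what we cached at handshake time.
    if ping_metadata_seq_number > peer.metadata_seq_number:
        metadata = peer.request_metadata()  # hypothetical req/resp call
        peer.metadata = metadata
        peer.metadata_seq_number = metadata.seq_number
```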
We should move `supported_mempools` to `metadata` and add syncing data to `status`.
`status`:
- `block_hash`: Hash of the last processed block.
- `block_number`: Number of the last processed block.
Rationale: `status` can be used by nodes to determine the syncing progress of a peer. Peers need to keep their mempools up to date or they risk sending invalid user operations. By adding these fields to `status`, a node can determine when a peer is unhealthy.
This better tracks how both devp2p and the consensus spec define their `status` endpoints.
`metadata`:
- `seq_number`: Metadata sequence number.
- `mempool_nets`: Mempool subnets as defined above.
- `supported_mempools`: List of supported mempool IDs.
Rationale: Changing the list of supported mempools should be a relatively rare occurrence and should require special processing by the node, including incrementing its metadata sequence number.
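A sketch of the resulting message shapes; the type names and widths here are assumptions, not the spec's SSZ definitions:

```python
# Sketch only: type names/widths are assumptions.
class Status:
    block_hash: "Bytes32"    # hash of the last processed block
    block_number: "uint64"   # number of the last processed block

class Metadata:
    seq_number: "uint64"                 # metadata sequence number
    mempool_nets: "Bitvector[1024]"      # mempool subnets (MEMPOOL_ID_SUBNET_COUNT)
    supported_mempools: "List[Bytes32]"  # supported mempool IDs
```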
If we keep the current implementation, we should change the framing for the `status` endpoint. Its request/response should each be sent in a single chunk, not streamed as multiple response chunks.
`pooled_user_ops_hashes`
Suggestion: drop the `s` => `pooled_user_op_hashes`.
Suggestion: drop `mempool` from the request. Peers have already negotiated the common mempools in their connection. Peers should only need to send a single `pooled_user_op_hashes` request and should receive hashes for each mempool in common. This will limit the number of requests a node needs to send.
It is not important which mempool a UO is associated with. The peer needs to re-validate the UO anyway, and can determine its mempool association at that point.
New request:
- `offset`: Offset into a list of UO hashes. This list should be determined once by a peer and should time out.
The offset
field only has meaning in the context of a previously sent pooled_user_op_hashes
request. We should define this as the following:
offset = 0
a node should retrieve a list of UO hashes on shared mempools, up to a limit (maybe MAX_OPS_PER_REQUEST * 8). If this list > MAX_OPS_PER_REQUEST
it should persist the portion that it hasn't yet sent.offset != 0
the node should return UO hashes from the above list, skipping any already sent.offset != 0
should be treated as invalid. Suggestion: N = 1
.offset = 0
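A sketch of the server-side handling under that definition; the constant values and the `hashes_for_shared_mempools` helper are assumptions:

```python
import time

MAX_OPS_PER_REQUEST = 4096          # assumed value, not taken from the spec
MAX_HASH_LIST = MAX_OPS_PER_REQUEST * 8
LIST_TTL_SECONDS = 60               # the "N = 1" (minute) suggestion

class ProtocolViolation(Exception):
    pass

class HashList:
    def __init__(self, hashes: list[bytes]):
        self.hashes = hashes
        self.created_at = time.monotonic()

    def expired(self) -> bool:
        return time.monotonic() - self.created_at > LIST_TTL_SECONDS

def handle_pooled_user_op_hashes(peer, offset: int) -> list[bytes]:
    if offset == 0:
        # offset = 0: build a fresh list across all shared mempools, restart the timer.
        hashes = peer.hashes_for_shared_mempools(limit=MAX_HASH_LIST)  # hypothetical helper
        peer.hash_list = HashList(hashes)
    else:
        current = getattr(peer, "hash_list", None)
        if current is None or current.expired():
            # offset != 0 after the timeout (or with no prior request) is invalid.
            raise ProtocolViolation("offset != 0 without a live hash list")
    # Return the next chunk, skipping hashes already sent at lower offsets.
    return peer.hash_list.hashes[offset:offset + MAX_OPS_PER_REQUEST]
```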
`pooled_user_ops_by_hash`
The size of this response is currently unbounded. We need to cap the maximum response size and treat it as a protocol violation if a sender exceeds that cap.
Suggested cap: 10MB
`user_ops_with_entry_point`
As defined, `user_ops_with_entry_point` will introduce significant inefficiency in the gossip network.
Libp2p gossipsub deduplicates messages by message ID. Nodes gossip `IHAVE` messages to notify peers of the message IDs that they currently have for a topic. Peers then look at their local cache of message IDs, and if they have not yet received a message for that message ID they issue an `IWANT` message to the peer that gossiped the message ID.
The current spec defines message IDs as:
`SHA256(MESSAGE_DOMAIN_VALID_SNAPPY + snappy_decompress(message.data))[:20]`
and `user_ops_with_entry_point` as a message carrying a list of user operations along with an entry point, chain ID, and verification block hash.
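For reference, a rough sketch of that message, with field names and types inferred from the discussion in this section rather than copied from the spec:

```python
# Sketch only: fields inferred from this section, not the spec's SSZ container.
class UserOpsWithEntryPoint:
    entry_point_contract: "Address"
    chain_id: "uint256"
    verified_at_block_hash: "Bytes32"
    user_operations: "List[UserOperation]"  # the per-message list discussed below
```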
If two messages have a different list of user ops, they will have different message IDs. Peers will be required to issue `IWANT` messages for these full lists, and then do the `UserOp` deduplication internally. This will massively increase network utilization.
The gossip message for user operations should be redefined as `verified_user_operation`.
Note: Even this adds inefficiencies as there may be multiple messages for a given UO that was verified at different block hashes. Ideally a UO should only be verified at 1-2 different blocks so this shouldn't add too much overhead to the network.
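A sketch of what the proposed per-UO message could carry, with the entry point and chain ID dropped as redundant (see below):

```python
# Sketch only: one user operation per gossip message.
class VerifiedUserOperation:
    user_operation: "UserOperation"
    verified_at_block_hash: "Bytes32"
```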
Gossipsub internally batches message IDs into `IHAVE` and `IWANT` messages, so there is no need to do batching in this message itself.
`entry_point_contract` and `chain_id` are redundant pieces of information. `entry_point_contract` should be implied from the `mempool` definition. `chain_id` should be implied from the network itself. If we feel we need `chain_id` somewhere, we could add it to the ENR or to the `metadata`.
The libp2p Gossipsub specification contains the following:
> In all pubsub implementations, we can first check the seen cache before forwarding messages to avoid wastefully republishing the same message multiple times.
This `seen` cache is shared across topics and caches message IDs.
If a caller attempts to publish the same message ID on two different topics, the 2nd attempt will be deduplicated and fail.
The definition of message ID above doesn't incorporate the topic string it's being sent on, and thus if the same message is sent to two mempool topics, the second publish will be deduplicated and will not reach subscribers.
If a node has a UO that matches two separate mempools it can only send that UO to the subscribers of a single pool.
More recent definitions of the consensus spec now include the message topic in the message id. See here.
If we incorporate the topic into the message ID, peers lose the ability to deduplicate their `IWANT` messages. Each UO will be requested by each peer on each mempool that it matches and the peer supports.
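For illustration, a sketch of the two message-ID options; the length-prefix encoding for the topic follows the consensus-spec style but is an assumption here:

```python
import hashlib

def message_id_current(domain: bytes, decompressed_data: bytes) -> bytes:
    # Current spec: the topic is not part of the hash, so the same message published
    # on two mempool topics collides in the cross-topic seen cache.
    return hashlib.sha256(domain + decompressed_data).digest()[:20]

def message_id_with_topic(domain: bytes, topic: str, decompressed_data: bytes) -> bytes:
    # Topic mixed in: no cross-topic collision, but the same UO now has a distinct
    # message ID per mempool topic, so peers cannot dedupe their IWANT requests.
    topic_bytes = topic.encode()
    return hashlib.sha256(
        domain + len(topic_bytes).to_bytes(8, "little") + topic_bytes + decompressed_data
    ).digest()[:20]
```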
A change to gossipsub could remove this issue. The `seen` cache should be split into a read-side and a write-side cache. The read-side cache can be cross-topic; the write-side cache should be per topic. This would ensure that a message ID is broadcast on all topics it applies to, while a peer knows to only read that message once. I asked a question about this on the libp2p forum; it would be helpful to talk to the maintainers.
`MESSAGE_DOMAIN_VALID_SNAPPY` is undefined. Suggestion: `0x0100000000`.
`MESSAGE_DOMAIN_INVALID_SNAPPY` is undefined. Suggestion: `0x0000000000`.