###### tags: `pubsub` `design`
# P2p Publish-Subscribe mechanism for Data-availability
The data-availability design only works if the p2p layer allows different actors in the network to exchange different pieces of data on "separate channels", so that they don't all have to download all the data.
We will implement an ad-hoc Publish-Subscribe mechanism to address this problem.
## Recap of DA actors and their interaction with the network
In the current DAL design, we have $sl=256$ **slots**, each of which is split into $sh=2048$ **shards**.
A slot's initial data is 1 MB, which is expanded with an erasure code by a factor of 16.
This gives us $n=sl * sh$ shards per level, each of size slightly greater than $1024kB * 16 /2048 = 8kB$ due to the encoding of bytes into points.
These values may change, but they give an order of magnitude of what we need to handle.
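The parameters above can be checked with a quick back-of-the-envelope computation (the values are the illustrative ones from this document, not final):

```python
# Back-of-the-envelope check of the DAL parameters above.
SLOTS = 256             # sl
SHARDS_PER_SLOT = 2048  # sh
SLOT_SIZE_KB = 1024     # 1 MB of initial data per slot
REDUNDANCY = 16         # erasure-code expansion factor

shards_per_level = SLOTS * SHARDS_PER_SLOT
shard_size_kb = SLOT_SIZE_KB * REDUNDANCY / SHARDS_PER_SLOT

print(shards_per_level)  # 524288 shards per level
print(shard_size_kb)     # 8.0 kB (slightly more in practice, due to the
                         # encoding of bytes into points)
```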
Once a slot header is selected in a head block, the corresponding data can be pushed into the network.
There is a choice to make here on whether we wait for the block to be finalised or not; probably not.
- Slot producers:
  produce and distribute, at some levels (can be all levels), all the shards of a slot.
- Endorsers:
  receive and store, at some levels, some shards of all slots.
- Slot consumers (rollup nodes, rollup indexers, ...?):
  need enough data to reconstruct the original slot data
  (at least 1/16 of all shards, for one or more slots).
- Samplers:
  sample a few shards at each level
  (they may change their topics more frequently).
## How we will use topic subscription
A topic is an element of the product $shard*slot*level$.
Subscribing to such a topic means that
- either we have the slot's shard for the given level,
- or we are willing to get it and to transfer it (or announce that we have it and serve it) to any peer on the same topic.
This choice implies that at each new level (~30s), there are $2048*256=524,288$ new topics, which is too many.
However, most actors will subscribe either to all shards (slot producers/consumers) or to all slots (endorsers).
Thus, the topic sets exchanged and stored to announce subscriptions can be compactly represented by '$shard* {\tt any} *level$' or '${\tt any}*slot*level$', where ${\tt any}$ is a wildcard value announcing interest in the whole range of slots or shards. This narrows the number of topic representations per level (if we exclude samplers) down to $2048+256=2304$.
Samplers will only subscribe to a few random topics of the form '$shard*slot*level$', and advertising these shouldn't have a strong impact.
This wildcard logic has to be taken into account when selecting peers to connect to.
For example,
- An endorser assigned to shard 1002 and 1003 for level 4000 will subscribe to $1002 * {\tt any} * 4000$ and $1003 * {\tt any} * 4000$.
- A slot producer for slot 22 will subscribe to ${\tt any} * 22 * 4000$.
- A rollup node using slot 22 can subscribe to ${\tt any} * 22 * {\tt any}$.
- The slot producer for slot 22 will satisfy the endorser's need for a connection to a node on topics $1002 * 22 * 4000$ and $1003 * 22 * 4000$.
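The wildcard intersection logic behind these examples can be sketched as follows; the `Topic` type and the `ANY` encoding are illustrative assumptions, not the actual implementation:

```python
from typing import NamedTuple, Optional

ANY = None  # wildcard component: "all shards", "all slots" or "all levels"

class Topic(NamedTuple):
    shard: Optional[int]
    slot: Optional[int]
    level: Optional[int]

def component_meet(a, b):
    """Intersect one component: a wildcard matches anything."""
    if a is ANY:
        return (True, b)
    if b is ANY or a == b:
        return (True, a)
    return (False, None)

def intersect(t1: Topic, t2: Topic) -> Optional[Topic]:
    """Return the most specific common topic of two (possibly wildcard)
    subscriptions, or None if they do not overlap."""
    parts = []
    for a, b in zip(t1, t2):
        ok, v = component_meet(a, b)
        if not ok:
            return None
        parts.append(v)
    return Topic(*parts)

# The endorser / slot-producer example above:
endorser = Topic(shard=1002, slot=ANY, level=4000)
producer = Topic(shard=ANY, slot=22, level=4000)
print(intersect(endorser, producer))  # Topic(shard=1002, slot=22, level=4000)
```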
The advantage of this topic definition, versus a definition that would not embed levels, is that everyone can announce in advance to which topic it will belong in the coming levels, which should minimize the amount of reorganisation of connections.
Note that messages should not be targeted at wildcard topics; a message is always published on a fully specified topic.
## Node discovery
In our current p2p layer, lists of known nodes are advertised on request.
For the pubsub version, we will annotate these advertisements with the most recently subscribed topics known for each peer.
This will not allow finding nodes that are serving old topics (only useful for sampling).
To do so, we could implement several mechanisms:
- crawling the network:
    - inefficient: O(network-size) connections to establish
    - easy to implement (almost done in the current maintenance)
- in-network relay of contact-request messages for topics:
    - O(network-size) messages; can be less in practice, as we can aggregate and memoise answers and limit the propagation of the query message
    - relatively easy to implement
- DHT:
    - TODO: check the details of Discv5 (Ethereum's node discovery) and libp2p's Kademlia DHT
    - common DHT implementations use UDP
        - high risk that nobody cares to properly set up port opening
        - can we have a routing mechanism to avoid that?
    - heavy to implement
## Data dissemination
To send data in the network, there are two possibilities:
- lazy push:
    - the sender just advertises that it has the data for the given shard; the recipient sends a query for the full data if it doesn't already have it
    - downside: it incurs a bit more latency, as there are two extra messages to handle (`IHave x`, `IWant x`, `Full x_with_payload` instead of just sending `x_with_payload`)
    - upside: it prevents receiving the same payload multiple times
- eager push + lazy push:
    - the sender selects some peers to which it pushes the data immediately, and uses lazy push with all other peers
In a first design we will test lazy push, as it clearly spares a lot of bandwidth:
full messages (a slot's shard) are ~9kB, while the metadata for announcements and requests is either a slot header of ~58B, or even unit, as the topic fully characterises the expected data when using the slot header provided by L1.
If we can't get enough responsiveness to quickly propagate data, we will consider pushing eagerly on some connections.
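The lazy-push dialogue above can be sketched as a small state machine; the message encoding and the `LazyPushPeer` class are hypothetical, only the `IHave`/`IWant`/`Full` names come from this document:

```python
from dataclasses import dataclass, field

@dataclass
class LazyPushPeer:
    store: dict = field(default_factory=dict)  # shard_id -> payload we hold
    wanted: set = field(default_factory=set)   # shards already requested

    def on_ihave(self, shard_id):
        """A neighbour advertised a shard: request it only if missing."""
        if shard_id in self.store or shard_id in self.wanted:
            return None                  # avoid downloading the payload twice
        self.wanted.add(shard_id)
        return ("IWant", shard_id)

    def on_full(self, shard_id, payload):
        """The full shard arrived: store it and advertise it onwards."""
        self.wanted.discard(shard_id)
        self.store[shard_id] = payload
        return ("IHave", shard_id)       # to be relayed to other peers

peer = LazyPushPeer()
print(peer.on_ihave(42))         # ('IWant', 42)
print(peer.on_ihave(42))         # None: already requested
print(peer.on_full(42, b"..."))  # ('IHave', 42)
print(peer.on_ihave(42))         # None: already stored
```

This is where the bandwidth saving comes from: however many neighbours advertise `IHave x`, the ~9kB payload is downloaded at most once.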
## Data late retrieval
To retrieve data after the dissemination wave has traversed the network, one will rely on pull:
- lazy pull: broadcast `Have_you x` to relevant neighbours; some reply `IHave x`, then proceed with one peer through the `IWant x`, `Full x_with_payload` dialogue
- eager pull: just send `IWant x` to neighbours and process multiple `Full x_with_payload` messages
Having the level in topics allows finding peers that are supposed to have the data, using the peer discovery mechanism.
## Performance issues
**Massive number of topics:**
As seen before, having wildcard topics greatly simplifies the management of topic registration, as long as we have an efficient way to detect topic intersections.
We can have rolling dal-nodes that only stay in topics for a bounded time. Nonetheless, for, say, archive dal-nodes, we will have to manage an ever-growing number of registered topics.
As the number of topics to which nodes subscribe can grow without bound, we have to accept that we cannot know all the subscriptions of our peers without being subject to spam attacks, and that we cannot maintain a fair number of connections for all the topics we have subscribed to:
- To address the spamming issue:
    - we will memorise peers' topics in a ring. The ring should be able to handle topics for a short period around the current level, say 10 levels.
    - we will advertise only a subset of the known topics of our neighbours.
- To address the bookkeeping of relevant connections, we will add a notion of topics-of-interest, which reflects the fact that at a given time some topics are more interesting to us than others:
    - the request for a peer list can carry those topics-of-interest, and the replier will preferentially send peers that match them
    - when contacting a peer, we will have a mechanism to find out whether it follows the topics we are interested in
    - the maintenance will have to focus on keeping connections to nodes on topics-of-interest (which lie in a window around the current level), with a higher score for those that cover the current level.
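The bounded memorisation of peers' topics described above (the "ring") can be sketched as follows, assuming entries are keyed by level and kept in a window around the head; the names and the window arithmetic are illustrative:

```python
from collections import defaultdict

WINDOW = 10  # levels kept around the head, as suggested above

class TopicRing:
    def __init__(self):
        self.by_level = defaultdict(set)  # level -> {(peer, topic), ...}
        self.head = 0

    def record(self, peer, topic, level):
        """Remember a subscription only if it falls inside the window,
        so a spammer cannot make us store unbounded state."""
        if self.head - WINDOW < level <= self.head + WINDOW:
            self.by_level[level].add((peer, topic))
            return True
        return False  # silently dropped: out of window

    def advance(self, new_head):
        """Move the window forward, dropping levels that fell out of it."""
        self.head = new_head
        for level in list(self.by_level):
            if level <= new_head - WINDOW:
                del self.by_level[level]

ring = TopicRing()
ring.advance(4000)
print(ring.record("peer1", "1002*any*4000", 4000))  # True
print(ring.record("peer2", "7*7*100", 100))         # False: outside window
ring.advance(4020)
print(len(ring.by_level))  # 0: old entries were dropped
```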
## Security issues
### Anti-spam
#### Using L1 as a filter
The fact that we only accept values that match the slot headers published on L1 makes it easy to spot nodes that provide garbage data, and to ban them at the first attempt.
### Eclipse
Having L1 as an oracle allows us to detect situations where we expect to receive some data but it is not coming.
We can use a timeout to trigger a discovery pass to find more peers on the topics for which we didn't receive any data.
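The timeout-based detection above can be sketched as a small tracker: L1 tells us which topics to expect data on, and any topic still pending after the deadline triggers a discovery pass. The class, the 30s default, and the API are illustrative assumptions:

```python
import time

class ExpectedData:
    def __init__(self, timeout_s=30.0):
        self.timeout_s = timeout_s
        self.pending = {}  # topic -> deadline (monotonic seconds)

    def expect(self, topic, now=None):
        """L1 published a slot header: data on this topic is expected."""
        now = time.monotonic() if now is None else now
        self.pending[topic] = now + self.timeout_s

    def received(self, topic):
        """Data arrived in time: stop tracking the topic."""
        self.pending.pop(topic, None)

    def overdue(self, now=None):
        """Topics to run a discovery pass for (possible eclipse)."""
        now = time.monotonic() if now is None else now
        return [t for t, d in self.pending.items() if d < now]

ed = ExpectedData(timeout_s=30.0)
ed.expect("1002*22*4000", now=0.0)
ed.expect("1003*22*4000", now=0.0)
ed.received("1002*22*4000")
print(ed.overdue(now=31.0))  # ['1003*22*4000']
```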