# Replication Networks
A proposal for simple replication networks with swarm-based membership. At a high level, the network is a mesh network (e.g. gossipsub or hyperswarm) using a Kademlia DHT, where datasets are replicated across all members of the swarm.
## Assumptions
- The main requirement is to support two use cases: archival storage (message bundles) and blob ("large file") storage. Both use cases require *some* level of durability guarantees, but they need not be the same level: blob storage likely requires weaker guarantees than archival storage.
- A DHT is not necessarily required, as it may create too much overhead when files are "small", eg blobs.
- The number of connections is limited.
- A block exchange protocol can be simplified to gossip messages, or even waku messages.
- Replication across all nodes in a swarm is sufficient for durability guarantees.
- Each node in the swarm COULD have a direct connection to every other node in the swarm.
- A single node can be part of multiple replication networks.^1^
- For the sake of simplicity, we can assume that all nodes are altruistic and honest in nature. In other words, nodes will not maliciously act outside the bounds of the protocol.
- At least one member of the swarm must remain online at all times. If all members of a swarm go offline, the data is considered lost.
## Scalability concerns
There are several limitations when creating a gossipsub network. Typically these include the number of connections a single node can create and the amount of bandwidth the node can consume in the network.
A node in the replication network would, at a minimum, be paired with a Waku node that already has its own gossipsub network overhead (connections and bandwidth consumption). Additionally, each node could join multiple replication networks, further increasing the quantity of required connections and bandwidth.
These factors may severely limit the scalability of this type of network. The order of magnitude would likely be in the tens of nodes, though this needs exploration.
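A back-of-envelope calculation illustrates the concern, under the assumption (stated above) that each swarm is a full mesh. The function name and the numbers below are illustrative, not measured values:

```python
# Connection count for a node participating in k fully-meshed swarms
# of n members each, before adding the paired Waku node's own
# gossipsub connections. Illustrative assumption, not a measurement.

def replication_connections(swarms: int, members_per_swarm: int) -> int:
    """Direct connections needed if every swarm is a full mesh."""
    return swarms * (members_per_swarm - 1)

# e.g. 3 swarms of 20 nodes each -> 57 replication connections,
# on top of the Waku node's existing mesh.
total = replication_connections(3, 20)
```

Even modest swarm sizes multiplied across several memberships quickly approach typical per-node connection budgets, which is what bounds the network to the tens of nodes.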
## Ad-hoc network creation
To create or join a network, a node uses an SPR (signed peer record) and a swarm id. The node connects to the bootstrap node using the SPR, either forming a new network between the two nodes or joining an existing gossipsub network that the bootstrap node is part of. It additionally subscribes to the gossip pubsub topic for the swarm id.
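The two steps can be sketched as follows. `Node`, `dial`, `subscribe`, and the topic naming scheme are illustrative assumptions, not a real API:

```python
# Hypothetical sketch of ad-hoc network creation: dial the bootstrap
# node via its SPR and subscribe to the swarm's pubsub topic.

class Node:
    def __init__(self):
        self.connections = set()
        self.topics = set()

    def dial(self, spr: str) -> None:
        # A real implementation would verify the SPR and resolve it to
        # transport addresses; here we just record the connection.
        self.connections.add(spr)

    def subscribe(self, topic: str) -> None:
        self.topics.add(topic)

def join_network(node: Node, bootstrap_spr: str, swarm_id: str) -> None:
    node.dial(bootstrap_spr)               # form or join the mesh
    node.subscribe(f"/swarm/{swarm_id}")   # listen for swarm control messages

node = Node()
join_network(node, "spr:bootstrap-1", "archival")
```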
## Join routine
Upon joining, a node will receive the swarm membership list and dataset list from the bootstrap node. It will create a connection to each member in the swarm and download each dataset from the list. This can be done in parallel for speed, and also in the background as a lower priority task. Bandwidth usage may need to be throttled to avoid clogging the swarm and preventing higher priority tasks from completing, like downloading new (non-replicated) datasets. Downloads should be resumable.
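A minimal sketch of this prioritization, assuming the bootstrap node hands back a membership list and a dataset list (`plan_join`, `HIGH`, and `LOW` are hypothetical names):

```python
# Sketch of the join routine: backlog replication is queued as
# low-priority background work so that new (non-replicated) datasets
# can preempt it.
import heapq

HIGH, LOW = 0, 1  # lower number = higher priority

def plan_join(members: list, datasets: list):
    """Return (connections to open, prioritized download queue)."""
    queue = []
    for cid in datasets:
        # Replicating already-durable datasets is background work.
        heapq.heappush(queue, (LOW, cid))
    return members, queue

members, queue = plan_join(["peer-a", "peer-b"], ["cid-1", "cid-2"])
# A new dataset announced later jumps ahead of the backlog.
heapq.heappush(queue, (HIGH, "cid-new"))
priority, cid = heapq.heappop(queue)
```

Resumability and actual bandwidth throttling are omitted here; only the priority ordering is shown.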
## Downloads
A dataset can be requested by CID from other peers in the swarm. Each peer must make a best effort to respond with the dataset.
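As a sketch, the responder side reduces to a best-effort lookup: return the dataset if it is held locally, otherwise signal a miss so the requester can try another peer (`handle_request` and the `None` convention are assumptions, not a wire format):

```python
# Minimal request handler for fetching a dataset by CID from a swarm
# peer. `None` stands in for "not found / try another peer".

def handle_request(store: dict, cid: str):
    """Best-effort response: return the dataset if locally available."""
    return store.get(cid)

store = {"cid-1": b"dataset bytes"}
reply = handle_request(store, "cid-1")      # dataset is held locally
missing = handle_request(store, "cid-404")  # peer doesn't have it
```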
## New datasets
To add a new dataset to the network, a control message containing the dataset CID is gossiped to the members of the swarm on the swarm id topic. Each node in the swarm then starts a download from the source node by sending a request with the CID over its connection to the source node. The source node replies with the dataset.
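The announce-then-pull flow can be sketched as below. The control message fields and the in-memory `store` dicts are illustrative assumptions; in practice each member would issue a resumable download rather than a direct copy:

```python
# Sketch of dataset announcement: the source gossips a control message
# carrying the CID on the swarm topic, and each other member pulls the
# dataset from the source.

def announce(swarm: list, cid: str, source: dict) -> None:
    msg = {"type": "new-dataset", "cid": cid, "source": source["id"]}
    for member in swarm:
        if member is source:
            continue
        # Each member reacts to the gossip by downloading from the source.
        member["store"][msg["cid"]] = source["store"][msg["cid"]]

source = {"id": "peer-a", "store": {"cid-9": b"blob"}}
peer = {"id": "peer-b", "store": {}}
announce([source, peer], "cid-9", source)
```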
## Upkeep
Maintenance of the DHT records is performed as per the Kademlia DHT protocol.
## Churn
Nodes can drop out of the swarm at any time. Because each dataset is assumed to be replicated across all members in the swarm, this has a low impact on the durability of the dataset. If $n$ represents the number of nodes in a swarm, the replication factor will remain at $n$.
## Guarantees: replication factor and priority
The guarantees in this system are purely redundancy-based. Remote auditing and repair could be added later, however this may increase the gossip over the network, further reducing its scalability. Replication factor and priority could be set at the swarm-level and this metadata could be distributed when nodes join. This would allow for higher "guarantees" for certain swarms (eg message archival).
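One way to picture the swarm-level metadata distributed at join time is a small policy record; the field names below are assumptions, not a defined schema:

```python
# Hypothetical swarm-level policy metadata, handed to nodes on join.
# Field names and values are illustrative assumptions.
swarm_policy = {
    "swarm_id": "archival",
    "replication": "all-members",  # full replication across the swarm
    "priority": "high",            # archival ranks above blob storage
    "audit": False,                # remote auditing/repair deferred
}
```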
## Use case: message archival (bundles)
Let's assume, for the sake of simplicity, that every store node in the Waku network participates. Store nodes would join the replication network using a swarm id dedicated to message archival. Bundles could be pushed to the network and replicated, and could also be retrieved from the network.
## Use case: blob (large file storage)
In the blob ("large file") use case, peers join a swarm with a swarm id (topic) that is different from the message archival swarm id. Files are pushed to the network and replicated by all the peers.
## SDK
-----
^1^ The limitations of multiple replication network membership need to be understood, not the least of which would include limitations on connections and swarm upkeep.