# The Rewrite - Some Ideas

As I mentioned during the offsite, I have a starting point to think about how to improve the filesharing client. Here are a few unorganized thoughts.

### Blockstore (Repostore)

I haven't thought about this that much. We'll of course need an underlying store in which we can address indexed blocks per content-addressable dataset, but how exactly it should be implemented - and whether it should be different from what we have now - is not something I gave much consideration. For instance:

1. since we're caching frontends locally as the user navigates the "web", a quota/eviction mechanism is probably still desirable; even more so because every dataset a user maintains locally also translates into network overhead (announcement/swarm maintenance traffic). But we can probably manage evictions per dataset instead of per block, which would be much more efficient;
2. we understand that using SQLite for storing/retrieving metadata and managing quota counters is a hotspot in the current implementation - since we're rewriting, I suppose a bit of consideration as to whether this is needed would also be reasonable.

Some considerations on (2) could be:

1. preallocating space for entire datasets on download and updating the quota only when datasets are added/deleted (as opposed to doing that at every block) would likely remove that particular hotspot;
2. storage of the Merkle trees can probably be simplified by allocating a single array on-disk along with the dataset and updating it in place, instead of using SQLite for this as we do (a sketch follows the interface below).

**Data layout.** This is a complicated[^1] problem, and maybe a "can of worms" not worth opening. We have considered storing files sequentially, but this probably won't make much of a difference as block requests are not sequential. I'd go for something simple, correct, and fast enough.

A super tentative high-level interface for this could look something like:

```python
class DatasetStore:

    def __new__(cls, quota: int) -> Self:
        pass

    def create_dataset(self, manifest: Manifest) -> Dataset:
        """
        Creates a new `Dataset` in the `DatasetStore`. The dataset is
        initially empty. Creating a dataset deducts its size from the
        quota.
        """
        pass

    def delete_dataset(self, cid: Cid) -> None:
        """
        Removes a `Dataset` from the `DatasetStore`, together with its
        associated blocks.
        """
        pass

    # def lru(self) -> Iterator[Dataset]:
    #     """
    #     Iterates over the `Datasets` in the `DatasetStore` in
    #     Least Recently Used (read or written to) order.
    #     """
    #     pass


class Dataset:

    def blockmap(self) -> Bitset:
        """
        Returns a compact representation of which blocks this node
        has/does not have from this `Dataset`. Bits that are set mean
        blocks that we have.
        """
        pass

    def completed(self) -> int:
        """
        Returns the number of blocks that we currently have for this
        dataset. Equivalent to:

        >>> dataset.blockmap().cardinality()
        """
        pass

    def put_block(
        self,
        block: Block,
        index: int,
        proof: MerkleProof,
    ) -> None:
        """
        Stores a `Block` and a `MerkleProof` for this `Dataset`. Does
        nothing if the `Block` is already on disk, raises an error if
        the `MerkleProof` doesn't match.
        """
        pass

    def get_block(self, index: int) -> Optional[Tuple[Block, MerkleProof]]:
        """
        Reads a `Block` for this `Dataset` and its associated
        `MerkleProof`.
        """
        pass

    def has_block(self, index: int) -> bool:
        """
        Do we have this particular `Dataset` `Block`?
        """
        pass

    # No "del block" as you usually don't delete blocks
    # within a dataset - you only delete whole datasets.
```
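For concreteness, here's a minimal sketch of how a download loop might drive this interface. This is illustration only: `incoming_blocks` and `manifest.num_blocks` are placeholders, not part of the proposal.

```python
async def download(store: DatasetStore, manifest: Manifest):
    dataset = store.create_dataset(manifest)

    # `incoming_blocks` stands in for whatever the protocol layers
    # below end up producing: (index, block, proof) triples for
    # blocks this node is still missing.
    async for (index, block, proof) in incoming_blocks(dataset.blockmap()):
        dataset.put_block(block, index, proof)
        if dataset.completed() == manifest.num_blocks:
            break
```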
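As for the Merkle tree consideration above - storing the tree as a single on-disk array next to the dataset - one possible shape for that file is sketched below, assuming a heap-indexed complete binary tree with the leaf count padded to a power of two. None of this is settled; it's just to show that the layout can be dead simple.

```python
import os

HASH_SIZE = 32  # assuming 32-byte digests


class FlatMerkleFile:
    """A single preallocated on-disk array holding the nodes of a
    dataset's Merkle tree, heap-indexed: node `i` has its children at
    `2i + 1` and `2i + 2`, and the leaves occupy the last `num_leaves`
    slots (leaf count assumed padded to a power of two)."""

    def __init__(self, path: str, num_leaves: int):
        self.num_nodes = 2 * num_leaves - 1
        mode = "r+b" if os.path.exists(path) else "w+b"
        self.file = open(path, mode)
        # Preallocate the whole tree up front, in line with the
        # preallocation consideration above.
        self.file.truncate(self.num_nodes * HASH_SIZE)

    def write_node(self, index: int, digest: bytes) -> None:
        self.file.seek(index * HASH_SIZE)
        self.file.write(digest)

    def read_node(self, index: int) -> bytes:
        self.file.seek(index * HASH_SIZE)
        return self.file.read(HASH_SIZE)
```

`put_block` could then write a proof's hashes into their slots and `get_block` could read them back, without any per-block metadata queries.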
Under the interface we can do whatever we want: use a "chunkstore" per dataset (like [Webtorrent](https://webtorrent.io/) does), or use a "block soup" store as we do today. This also decouples download strategies from dataset storage - all the store does is store blocks and proofs, and expose a sorted iterator for eviction.

### Protocol Layers

I believe the protocol should be layered - a neighborhood management mechanism, followed by block knowledge management, followed by block request scheduling.

#### Neighborhood Management

Random swarms are simple to manage - you keep your neighbor count between $\delta_{\min}$ and $\delta_{\max}$, with a target $\delta_s$ such that $\delta_{\min} \leq \delta_s \leq \delta_{\max}$:

```python
class Swarm:
    dataset: Dataset
    neighbors: Set[Peer]
    delta_s: int
    delta_min: int

    async def maintain(self: Swarm):
        target = self.delta_s - len(self.neighbors)
        peers = await sampler.random_peers(self.dataset.cid, target)
        for peer in peers:
            if await peer.connect() == SUCCESS:
                self.neighbors.add(peer)

    async def join(self: Swarm):
        while True:
            await condition(len(self.neighbors) < self.delta_min)
            await self.maintain()
```

Here `sampler` could be a DHT-based discovery server, or something like gossip-based peer sampling.

#### Block Knowledge Updates

For every peer in your neighborhood, you keep up-to-date knowledge about which blocks they have, and keep them informed about which blocks you have too. Sending an update per block will likely incur too many update messages, so in practice some kind of batching mechanism is likely to be required.

```python
async def peer_connected(swarm: Swarm, peer: Peer):
    response = await peer.send(
        Handshake(
            swarm.dataset.cid,
            swarm.dataset.blockmap()))

    swarm.update_knowledge(response.blockmap, peer)


async def push_update_loop(swarm: Swarm):
    batch = Batch()

    # As blocks are stored, they get added to a push batch.
    async def on_block_stored(block: Block):
        batch.add(block)

    swarm.on_block_stored = on_block_stored

    while True:
        # If we have enough blocks, or if we haven't sent updates
        # for long enough, send the batch.
        await condition(len(batch) >= THRESHOLD or
                        batch.duration > MAX_DELAY)
        await multicast(batch)
        batch.clear()
```

#### Block Request Scheduling

Block request scheduling is probably the most complicated part of this. At its simplest, we want to ensure that:

1. every block is scheduled onto _at least_ one live peer that claims to have it;
2. we keep enough scheduled requests per peer in flight, so as to fill the bandwidth-delay product of the channels towards them.

Since:

* peers can disconnect or time out;
* the information on which peers have which blocks changes over time;
* the bandwidth-delay product is not known in advance, and may change (see the adaptive window sketch below);

we need to handle all of that. Very roughly, the schedule building process could look something like:

```python
class Block:
    cid: Cid
    requested: Set[Peer]


class Dataset:
    blocks: Dict[Cid, Block]


class Peer:
    bandwidth_delay_product: int
    inflight_requests: int

    def is_saturated(self: Peer) -> bool:
        return self.inflight_requests >= self.bandwidth_delay_product


async def build_schedule(swarm: Swarm, peer: Peer):
    blocks = [
        block for block in swarm.dataset.blocks.values()
        if len(block.requested) == 0
    ]

    # Can sort for rarity once we have that.
    blocks.sort(key=swarm.rarest_first)

    for block in blocks:
        if peer.is_saturated():
            break
        if swarm.has_block(peer, block):
            block.requested.add(peer)
            await peer.schedule_request(block)


async def block_knowledge_received(
    swarm: Swarm,
    peer: Peer,
    presences: Bitset,
):
    swarm.update_presences(peer, presences)
    await build_schedule(swarm, peer)


async def peer_connected(swarm: Swarm, peer: Peer):
    await build_schedule(swarm, peer)
```
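On the last point - the bandwidth-delay product not being known in advance - one option is to treat the per-peer request window as an adaptive quantity rather than something we measure: grow it while the peer keeps responding in time, shrink it when requests time out, AIMD-style. A rough sketch, where the class name and all constants are placeholders:

```python
class AdaptiveWindow:
    """Adaptively sizes the number of in-flight requests for a peer.

    The window grows while responses come back in time and shrinks
    multiplicatively when they don't, so over time it hovers around
    the channel's bandwidth-delay product without us ever measuring
    it directly.
    """

    MIN_WINDOW = 2      # placeholder values
    MAX_WINDOW = 256
    BACKOFF = 0.5

    def __init__(self):
        self.window = self.MIN_WINDOW

    def on_timely_response(self) -> None:
        # Additive increase: one extra slot per timely block.
        self.window = min(self.window + 1, self.MAX_WINDOW)

    def on_timeout(self) -> None:
        # Multiplicative decrease on a timed-out request.
        self.window = max(int(self.window * self.BACKOFF), self.MIN_WINDOW)
```

`Peer.is_saturated()` would then compare `inflight_requests` against this window instead of a fixed `bandwidth_delay_product`.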
**Peer disconnections.** We then need to implement proper peer timeout policies; i.e., peers that don't reply to requests should be disconnected. In fact, we should probably keep tabs on _all_ requests issued to a given peer, and implement an orderly decision rule as to when to disconnect it. Again, looking at how libtorrent manages this could be useful.

[^1]: https://blog.libtorrent.org/2014/12/a-bittorrent-filesystem/
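As a starting point for that decision rule, here's one possible shape for the bookkeeping - a sketch only, with `REQUEST_TIMEOUT`, `MAX_TIMED_OUT` and the tracker itself being placeholders rather than anything settled:

```python
import time

REQUEST_TIMEOUT = 30.0   # seconds; placeholder value
MAX_TIMED_OUT = 3        # placeholder value


class RequestTracker:
    """Keeps tabs on every request issued to a single peer."""

    def __init__(self):
        self.issued_at: dict[int, float] = {}   # block index -> timestamp
        self.timed_out = 0

    def on_request(self, index: int) -> None:
        self.issued_at[index] = time.monotonic()

    def on_block(self, index: int) -> None:
        self.issued_at.pop(index, None)

    def should_keep(self) -> bool:
        """Returns False once the peer has accumulated enough stale
        requests that it should be disconnected."""
        now = time.monotonic()
        stale = [i for i, t in self.issued_at.items()
                 if now - t > REQUEST_TIMEOUT]
        for i in stale:
            del self.issued_at[i]
        self.timed_out += len(stale)
        return self.timed_out < MAX_TIMED_OUT
```

On disconnection, any block whose `requested` set contained that peer would go back to the unrequested pool and get picked up again by `build_schedule` on the remaining neighbors.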