BlockExchange Improvements
==========================
This document briefly summarizes the changes I have envisioned for the BlockExchange during my initial time at Codex, as well as some insights gathered since. In summary, these are:
1. [use swarms](#use-swarms);
2. [use more efficient want list encoding](#efficient-encoding-for-block-discovery);
3. [minor implementation improvements](#implementation-improvements).
## Use Swarms
Advertising each individual block on the DHT can incur large costs, and the problem grows with a file's popularity.
As a back-of-the-envelope calculation, a single 1TB slot ($2^{40}$ bytes) on a network with a block size of $2^{16}$ bytes ($64\text{KiB}$) requires publishing $2^{40 - 16} = 2^{24}$ CIDs to the DHT. These publications must be refreshed, meaning this traffic recurs at intervals $\ll \delta$, where currently $\delta = 24\text{h}$ for Codex. Furthermore, as a block gains providers (i.e., downloaders), these costs multiply accordingly: $10$ downloaders means $10$ times more DHT (re)publish traffic.
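To make the arithmetic concrete, here is a minimal sketch of the estimate above. The 100-byte provider record size is an assumption for illustration (actual record sizes depend on the provider record format), and real publishes involve multi-hop DHT routing, so these figures are a lower bound:

```python
# Back-of-the-envelope estimate of DHT (re)publish traffic for one slot.

SLOT_BYTES = 2**40           # 1 TB slot (approximated as 2^40 bytes)
BLOCK_BYTES = 2**16          # 64 KiB network block size
RECORD_BYTES = 100           # ASSUMED wire size of one provider record
REFRESH_SECONDS = 24 * 3600  # delta = 24h expiry; refresh at least this often

cids = SLOT_BYTES // BLOCK_BYTES         # 2^24 ~= 16.8M CIDs
bytes_per_refresh = cids * RECORD_BYTES  # ~1.6 GiB per refresh cycle
rate_bps = bytes_per_refresh / REFRESH_SECONDS

print(f"CIDs to publish: {cids:,}")
print(f"Traffic per refresh: {bytes_per_refresh / 2**30:.1f} GiB")
print(f"Sustained publish rate: {rate_bps / 1024:.1f} KiB/s per provider")
```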
For popular files this becomes a serious problem: DHTs do not naturally load-balance against hotspots, so nodes that have no incentive to serve heavy DHT traffic (they are not paid for it) may end up being drafted to do so anyway.
One idea we have discussed is having one swarm per file[^1][^6][^2] instead of relying on the DHT for lookups of blocks that neighboring peers do not have. A swarm is simply an overlay shaped as a random graph in which all participants (nodes) are interested in a given file, either as storage providers or as downloaders.
Lookups are replaced by:
1. passive waiting, which relies on the swarm's percolation time [^2];
2. active resampling, i.e., replacing neighbors with other peers in the same swarm if downloads get too slow (see the sketch after this list).
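As a rough illustration of active resampling, the sketch below drops the slowest neighbors and replaces them with fresh random samples from the swarm membership. All names and thresholds here are illustrative assumptions, not part of any existing Codex API:

```python
import random

MIN_RATE_BPS = 50_000    # ASSUMED per-neighbor throughput floor
RESAMPLE_FRACTION = 0.25 # ASSUMED cap on churn per resampling round

def resample(neighbors: dict[str, float], swarm_members: set[str]) -> dict[str, float]:
    """neighbors maps peer id -> observed throughput (bytes/s)."""
    slow = sorted(neighbors, key=neighbors.get)  # slowest first
    to_drop = [p for p in slow if neighbors[p] < MIN_RATE_BPS]
    to_drop = to_drop[: max(1, int(len(neighbors) * RESAMPLE_FRACTION))]
    candidates = list(swarm_members - neighbors.keys())
    random.shuffle(candidates)
    for dropped, fresh in zip(to_drop, candidates):
        del neighbors[dropped]
        neighbors[fresh] = 0.0  # throughput unknown until we exchange blocks
    return neighbors
```

How aggressively to resample is exactly the open performance/overhead trade-off listed below: each swap incurs connection churn and discards learned state about a peer.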
Swarms are by nature more resilient and scalable, as BitTorrent has shown. This should also be far more efficient [than our current blind peer selection mechanism for block discovery](https://github.com/codex-storage/nim-codex/blob/master/codex/blockexchange/engine/engine.nim#L189), which does not exploit locality at all.
Open issues include:
1. establishing mechanisms for resampling which offer reasonable performance while incurring acceptable overhead (e.g. connection churn). These likely hinge on the assumptions we make on swarm homogeneity, node specs, and node behavior;
2. little ability, by default, to control the order in which blocks arrive. This can be a problem for applications like video streaming;
3. byzantine resilience, in particular to Sybil attacks. Sybil nodes can disrupt or slow down a swarm, either by diluting honest nodes or by attempting to eclipse the trackers (the DHT nodes tracking swarm membership). Adopting S/Kademlia-like mechanisms may be useful here[^5].
## Efficient encoding for Block Discovery
Block discovery is currently done by exchanging long lists of CIDs among peers, a trait inherited from IPFS's BitSwap. Since [our data is a lot more structured than IPFS's](https://hackmd.io/PkCL3kq3Rcadgc_IVm-eoA), however, we can leverage sparse encoding techniques like Run-Length Encoding and bitmaps to gain efficiency[^3] (a sketch follows below). The literature has in fact evolved significantly over the past decade, with efficient _set reconciliation_ algorithms[^4] becoming a research area in their own right.
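A minimal sketch of the idea, assuming blocks within a dataset can be addressed by a dense index (dataset CID + block number) rather than one CID per block, which is what the structured-data argument above enables:

```python
def to_bitmap(wanted: set[int], n_blocks: int) -> bytes:
    """Encode wanted block indices as a bitmap over the dataset."""
    bits = bytearray((n_blocks + 7) // 8)
    for i in wanted:
        bits[i // 8] |= 1 << (i % 8)
    return bytes(bits)

def to_runs(wanted: set[int]) -> list[tuple[int, int]]:
    """Collapse sorted indices into (start, length) runs (RLE)."""
    runs, start, prev = [], None, None
    for i in sorted(wanted):
        if start is None:
            start = prev = i
        elif i == prev + 1:
            prev = i
        else:
            runs.append((start, prev - start + 1))
            start = prev = i
    if start is not None:
        runs.append((start, prev - start + 1))
    return runs

# A downloader missing the last half of a 2^24-block dataset:
wanted = set(range(2**23, 2**24))
print(len(to_bitmap(wanted, 2**24)))  # 2 MiB bitmap...
print(to_runs(wanted))                # ...or a single (start, length) run
# versus ~8.4M raw CID entries at tens of bytes each: hundreds of megabytes.
```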
## Implementation Improvements
Some issues came up while I was fixing bugs. They [are described here](https://github.com/codex-storage/nim-codex/issues/719):
* bounding resource consumption for keeping peer context, in particular want lists;
* implementing fairer block scheduling, so that peers with large want lists do not starve peers with small ones during block sends (a round-robin sketch follows).
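One simple shape a fairer scheduler could take is round-robin interleaving across peers instead of draining one peer's (possibly huge) want list before serving the next. This is a sketch only; peer and block identifiers are placeholders, not Codex types:

```python
from collections import deque

def fair_schedule(want_lists: dict[str, list[str]], budget: int) -> list[tuple[str, str]]:
    """Interleave sends round-robin so small want lists are not starved."""
    queues = {p: deque(blocks) for p, blocks in want_lists.items() if blocks}
    sends = []
    while queues and len(sends) < budget:
        for peer in list(queues):
            sends.append((peer, queues[peer].popleft()))
            if not queues[peer]:
                del queues[peer]  # fully served; drop from this round
            if len(sends) >= budget:
                break
    return sends

# A peer wanting 3 blocks is served alongside one wanting 1000:
print(fair_schedule({"big": [f"b{i}" for i in range(1000)], "small": ["x", "y", "z"]}, 8))
```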
[^1]: [Block Discovery (in Swarms)](https://rpubs.com/giuliano_mega/1067876)
[^2]: [(Diameter and Degree of) Codex Swarm Overlays](https://rpubs.com/giuliano_mega/1082104)
[^3]: [Exchanging Want Lists with Bitsets](https://hackmd.io/PkCL3kq3Rcadgc_IVm-eoA)
[^4]: [Space- and Computationally-Efficient Set Reconciliation via Parity Bitmap Sketch (PBS)](https://arxiv.org/abs/2007.14569)
[^5]: Discv5 does already embody some level of Sybil resilience, but it is unclear to me how much of S/Kademlia it actually implements.
[^6]: [Publishing and Downloading in Codex](https://hackmd.io/@TIZ3miARQNKm97IukDubJw/HJErTLW32)