Exchanging Want Lists with Bitsets
==================================

$$
\newcommand{\concat}[0]{\mid\mid}
\newcommand{\id}[0]{\text{id}}
$$

Bitswap relies on exchanging want lists and have lists as a way for peers to figure out who has which blocks in a Bitswap swarm. Such lists contain the CIDs of each requested block, and allow the flexibility IPFS needs to reconstruct arbitrary Merkle DAGs. The question I am trying to address here is: _do we really need this? Can we do better?_

## Integer Lists and Bitsets

Unlike IPFS, we do not represent our data as an arbitrary DAG. Codex files are _totally ordered, finite sets of blocks_, which means they can be mapped into sets of consecutive integers[^5]. If we make the intuitive convention that this set of integers must start at $1$, then we can uniquely identify the blocks in a file $F = \{b_1, \cdots, b_n\}$ by mapping them into an integer set $I_n = \{1, \cdots, n\}$; i.e., $\id(b_i) = i$.

Identifying a block within the whole network still requires a pair $\left(\text{id}(F), i\right)$, where $\text{id}(F)$ is some system-wide identifier for $F$ -- e.g., its Merkle root -- but for peers that are _already_ in a swarm and sharing $F$, the integer $i$ is all they need. The question, then, becomes: _Why can't we exchange simple sets of integer-valued indexes instead of long CID lists?_

**Drawbacks.** Clients looking for a given block can no longer rely on being able to look up the CID for the block in the DHT -- they must use the swarm. Arguably, however, _that is what we want_, so this could be seen as a plus.

**Advantages.** The main advantages revolve around efficiency. Because blocks map into consecutive integers, we can, for instance, tell which blocks there should be between $i$ and $j$ by looking at $i$ and $j$ alone. This means integer lists are, on their own, amenable to compression by encoding ranges[^6].

Finite integer sets can also be efficiently encoded as _bitsets_. Given a set $S \subseteq I_n$, we can construct a _bitset_ $B_n(S) = \{e_1, \cdots, e_n\}$, $e_i \in \{0, 1\}$, such that $e_i = 1 \iff i \in S$. This means a bitset of size $n$ can encode an arbitrary subset of $I_n$ using $n$ bits. For dense subsets, the savings are immediately obvious, but bitsets have been around for long enough that sparse and mixed sets can also be encoded efficiently. For instance, because of their limited alphabet, sparse bitsets will typically benefit from Run-Length Encoding[^1], and more sophisticated encodings like Roaring[^2][^3] can reduce the number of bits required even further when the encoded subset is much smaller than $I_n$. Indeed, on sorted data like ours, Roaring has been shown to require as little as $0.34$ bits per integer. Roaring is also available as an open source library[^4], which could make things simpler.
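As an illustration of how compact this can get, the sketch below (plain Python, not Codex code; the helper names are made up for illustration) encodes a want list of block indexes as a bitset and then run-length encodes it. Sparse want lists collapse into a handful of runs, which is where the savings come from.

```python
def to_bitset(want: set[int], n: int) -> list[int]:
    """Encode a subset of {1, ..., n} as a bitset of n bits."""
    return [1 if i in want else 0 for i in range(1, n + 1)]

def run_length_encode(bits: list[int]) -> list[tuple[int, int]]:
    """Compress a bitset into (bit value, run length) pairs."""
    runs: list[tuple[int, int]] = []
    for bit in bits:
        if runs and runs[-1][0] == bit:
            runs[-1] = (bit, runs[-1][1] + 1)
        else:
            runs.append((bit, 1))
    return runs

# A sparse want list over a 32-block file: the bitset is 32 bits,
# but the run-length encoding collapses the long runs of zeros.
want = {3, 4, 5, 6, 17}
bits = to_bitset(want, 32)
print(run_length_encode(bits))  # [(0, 2), (1, 4), (0, 10), (1, 1), (0, 15)]
```

In practice one would likely delegate this to a Roaring library[^4] rather than hand-roll the encoding, but the principle is the same.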
## Integers and Merkle Trees

Integer indexes bear no trace of the actual content of the block, and the downloader node knows nothing about $F$ but its root hash. This could intuitively raise questions about how straightforward it would be for a peer to return: _i)_ a different block than what was requested, since every block in the tree has a valid Merkle proof; or _ii)_ a made-up block. We look into these issues a little closer here.

**Made-up blocks.** Making a block up is hard because it reduces to a preimage attack on the hashing function. The simplest case of a tree with height 1 (Figure 1) is enough to see it: let $h_i$ and $h_l$ be the hash functions used for internal and leaf nodes, respectively. Also, let $\concat$ denote a string concatenation operator.

<center>
<img src="https://hackmd.io/_uploads/HkzGPPfpn.png" width="150px">
</center>

**Figure 1.** A simple Merkle tree.

Let $l_i = h_l(b_i)$. We know that $i_1 = h_i(h_l(b_1) \concat h_l(b_2)) = h_i(l_1 \concat l_2)$. This means that for the proof to pass with a fake block at position $2$, say one with leaf hash $l'_2$, the attacker must find an $l'_1$ satisfying $h_i(l'_1 \concat l'_2) = i_1$, which is computationally hard for the hashing functions currently deployed.

**Wrong block.** Sending back the wrong block, however, does seem more plausible. After all, for any block $b_i$ there is always a valid Merkle proof for that block that evaluates to the right root ($i_1$). Not only that -- if the number of blocks is a power of two, all proofs are of the same size (Figure 2).

![](https://hackmd.io/_uploads/H1OAtLZph.png)

**Figure 2.** A Merkle tree with a proof (orange nodes) for block $b_4$.

We will now show that a Merkle proof for a different block always fails, by showing that the unique path leading to the requested block is the only one that allows the proof to pass. The proof is by induction (and some amount of handwaving).

**Base case.** We start with the simple case of a path with length 1 (Figure 1). Suppose that block $b_1$ was requested, but block $b_2$ was sent, together with a Merkle proof for $b_2$. Since the provider is lying, it will send $(b_2, l_1)$ instead of $(b_1, l_2)$. But $i_1 = h_i(l_1 \concat l_2) \neq h_i(l_2 \concat l_1)$. When the downloader checks the proof, it will therefore obtain $h_i(h_l(b_2) \concat l_1) \neq i_1$, thus failing the proof.

**Induction step.** Now suppose we know that in paths of length up to $n - 1$ it is not possible to cheat, meaning that a node has to return the correct hashes in the tree all the way down to height $n - 1$ or the proof fails. We are then left with a correct length-$(n - 1)$ prefix of the path to the correct leaf node, and the only option a cheating node has left is to return the wrong block among the two leaf nodes at height $n$. For instance, it could try returning node $l_3$ (and the value of $l_4$'s hash) instead of $l_4$ (and the value of $l_3$'s hash) in Figure 2. But because $h_i(l_3 \concat l_4) \neq h_i(l_4 \concat l_3)$, this would fail to compute the right value for $i_9$. And because we know the tree is exactly the same from that point up, this would cause the proof to fail. It is therefore the case that the only proof that holds is the one following the exact path to the requested block. $\blacksquare$

Another way of looking at this is that the expansion of the computed values -- which is the responsibility of the downloader -- induces a unique composite hashing function for each leaf. For instance, the proof for $l_4$ looks like:

$$
\begin{equation}
i_1 = h_i(h_i(h_i(i_8 \concat h_i(l_3 \concat h_l(b_4))) \concat i_5) \concat i_3) = f(i_8, l_3, h_l(b_4), i_5, i_3)
\end{equation}
$$

$f$ encodes the path from the root to the block indexed at position $4$, and anything other than the block at position $4$ and the right values for $l_3$, $i_8$, $i_5$, and $i_3$ will cause the proof to fail. The scheme, therefore, should work as one expects.
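To make the verification step concrete, the sketch below (plain Python; SHA-256 and the domain-separation prefixes are assumptions for illustration, not Codex's actual hashing scheme) shows how a downloader can recompute the root from a block, its index, and the sibling hashes. The index fixes the left/right ordering at every level, which is exactly why a proof carrying a neighbouring block fails.

```python
import hashlib

def h_leaf(block: bytes) -> bytes:
    """Leaf hash h_l; the 0x00 prefix for domain separation is an assumption."""
    return hashlib.sha256(b"\x00" + block).digest()

def h_internal(left: bytes, right: bytes) -> bytes:
    """Internal hash h_i over the concatenation of the two children."""
    return hashlib.sha256(b"\x01" + left + right).digest()

def verify_proof(root: bytes, index: int, block: bytes, siblings: list[bytes]) -> bool:
    """Recompute the root from a block, its (1-based) index, and the sibling
    hashes along the path. The index determines the left/right order at every
    level, so a proof for a different leaf cannot reproduce the same root."""
    node = h_leaf(block)
    pos = index - 1  # 0-based position within the current level
    for sibling in siblings:
        if pos % 2 == 0:      # node is a left child
            node = h_internal(node, sibling)
        else:                 # node is a right child
            node = h_internal(sibling, node)
        pos //= 2
    return node == root

# Build a toy 8-block tree (as in Figure 2) and check the proof for b_4.
blocks = [bytes([i]) * 4 for i in range(1, 9)]
level = [h_leaf(b) for b in blocks]
tree = [level]
while len(level) > 1:
    level = [h_internal(level[i], level[i + 1]) for i in range(0, len(level), 2)]
    tree.append(level)
root = tree[-1][0]

# Siblings for b_4 (index 4): l_3, then the sibling internal nodes up to the root.
siblings = [tree[0][2], tree[1][0], tree[2][1]]
assert verify_proof(root, 4, blocks[3], siblings)      # correct block passes
assert not verify_proof(root, 4, blocks[2], siblings)  # wrong block (b_3) fails
```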
"[Better bitmap performance with Roaring bitmaps](https://onlinelibrary.wiley.com/doi/10.1002/spe.2325)", _Software: Practice and Experience_, 46(5), pp 709--719, 2015. [^3]: Daniel Lemire, Gregoy Ssi-Yan-Kay, and Owen Kaser. "[Consistently faster and smaller compressed bitmaps with Roaring](https://dl.acm.org/doi/10.1002/spe.2402), _Software: Practice and Experience_, 46(11), pp 1547--1569, 2016. [^4]: https://github.com/RoaringBitmap [^5]: DAGs can also be topologically sorted into a total order but, because those are not unique, we would need extra constraints to make sure everyone agrees on what that order would be.