# Building Something Like BitTorrent Part 1: Introduction to (Modified) Kademlia
BitTorrent is a distributed file storage application that allows files to be distributed and replicated across multiple hosts rather than being stored behind a well-defined server interface. I made a similar application that allows multiple hosts to store/index a shared set of files with no enforced client/server relationship.
This video demonstrates the file upload/download flow:
[Demo: file upload/download flow](https://www.youtube.com/watch?v=1sxCVfoZV2g)
- A server ("founder") bootstraps a session and starts running against a discoverable endpoint (in this case `founder_container:8000` on the Docker network)
- A client ("transient") joins the session using the remote endpoint above and with its own discoverable endpoint (`transient_container:8000`)
- The founder and transient are able to list available files in the session
- The founder is able to upload files (`map.jpg` and `chart.jpg`) to the session - these files may be sharded across multiple hosts (including either itself or other transients)
- The transient then sees the available files in the session's list and can download the files from the session to a local destination
The underlying data structure for this application is a **distributed hash table** (DHT) based on the Kademlia protocol. Through this set of notes I'm going to work my way from the ground up, starting from the main Kademlia building blocks and ending with the file storage application interface.
## Fundamental Definitions
The fundamental entities that Kademlia works with are:
- **Chunks**: Blocks of stored data (in a file storage application these will correspond to portions of each file). There is no requirement that chunks have a constant size. Chunks are going to be distributed and replicated across multiple **nodes** (hosts).
- **Peers**: Independently running nodes that can store Chunks and query for Chunks from other Peers. Peers may join a DHT and leave and return; we enforce no guarantees on individual peer behavior.
- **Key**: A $K$-bit string. All chunks and peers will be identified by a Key. $K$ should be large enough that, with high probability, no two entities share the same Key (in the paper $K=160$; in this post I'll set $K=3$).
For peers, this key will always be randomly generated. For now, we'll assume that for chunks this key will be randomly generated too (there will be exceptions to this rule added in Part 5 to make the top-level file storage application work).
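To make these definitions concrete, here's a minimal sketch of the three entities in Python. The names (`Chunk`, `Peer`, `random_key`) and the use of dataclasses are my own illustration, not the post's implementation; the only grounded details are the $K$-bit random keys and the discoverable peer endpoints.

```python
import secrets
from dataclasses import dataclass, field

K = 3  # key width in bits; K=3 for illustration (the paper uses K=160)

def random_key(k: int = K) -> int:
    """A uniformly random k-bit Key."""
    return secrets.randbits(k)

@dataclass
class Chunk:
    data: bytes                                  # a block of stored data; no fixed size
    key: int = field(default_factory=random_key)  # randomly generated, for now

@dataclass
class Peer:
    endpoint: str                                # discoverable endpoint, e.g. "founder_container:8000"
    key: int = field(default_factory=random_key)  # always randomly generated
```

Note that at $K=3$ there are only $2^3 = 8$ possible keys, so collisions are all but guaranteed; that's fine for walking through the protocol by hand, but a real deployment needs the full 160-bit key space.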
## DHT Interface
It'll take a while to fully implement all parts of the modified Kademlia protocol; however, I'll introduce the Kademlia interface I want to implement here so that we can see the intermediate destination we're working towards. The DHT interface has four functions:
- `join(Peer self, Peer other)`: a peer (self) joins an existing DHT using the advertised endpoint of another peer (other)
- `get(Peer self, Key k)`: retrieves the chunk associated with the key (k) in the DHT via a peer (self)
- `insert(Peer self, Chunk c)`: inserts a chunk (c) into the DHT via a peer (self)
- `leave(Peer self)`: a peer (self) leaves the DHT
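The four functions above can be written down as an abstract interface. This is a hypothetical sketch (the post doesn't specify an implementation language); concrete peers would implement these operations over the network in later parts.

```python
from abc import ABC, abstractmethod

class DHT(ABC):
    """The four-function DHT interface we're working towards."""

    @abstractmethod
    def join(self, self_peer, other_peer):
        """self_peer joins an existing DHT via other_peer's advertised endpoint."""

    @abstractmethod
    def get(self, self_peer, key):
        """Retrieve the chunk associated with key, via self_peer."""

    @abstractmethod
    def insert(self, self_peer, chunk):
        """Insert a chunk into the DHT, via self_peer."""

    @abstractmethod
    def leave(self, self_peer):
        """self_peer leaves the DHT."""
```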
There are a few important differences here from the Kademlia protocol, and more generally from an actual Hash Table. First, the DHT does not expose a key-value pair interface for chunk insertion; instead the DHT is better thought of as a Distributed Hash Set, where the Key is the hash of the corresponding Chunk rather than an explicit user-specified key. This is notable because for a long-term file storage application like BitTorrent the DHT should not be used as an actual Hash Table; more specifically, **it shouldn't easily expose a silent "overwrite" interface** (i.e., where the DHT contains the pair (k, v1), and the user can then insert (k, v2) to overwrite the former pair).
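Here's a toy illustration of those hash-set semantics (a local dictionary standing in for the distributed store, not the post's implementation). Because the key is derived from the chunk's content, inserting different content can never silently overwrite an existing entry. SHA-1 is an assumption on my part; it matches the paper's 160-bit key space.

```python
import hashlib

def chunk_key(data: bytes) -> int:
    """Content-addressed Key: the SHA-1 hash of the chunk (160 bits, as in the paper)."""
    return int.from_bytes(hashlib.sha1(data).digest(), "big")

store = {}  # local stand-in for the distributed store

def insert(data: bytes) -> int:
    key = chunk_key(data)
    store[key] = data  # same content -> same key, so re-insertion is idempotent
    return key

k1 = insert(b"v1")
k2 = insert(b"v2")
assert k1 != k2 and store[k1] == b"v1"  # inserting v2 did not overwrite v1
```

Contrast this with a true Hash Table, where `insert(k, v2)` on an existing key `k` would silently replace `v1`; here there is simply no way to express that operation.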
Why? After all, for a file storage application we need to associate constant file names with data that could change if a user needs to modify the corresponding file.
The use case for Kademlia is explicitly mentioned in the paper: **a system where a few nodes are long-lived and most nodes are short-lived**. Kademlia is a peer-based alternative to a client-server architecture to try to get the best of both worlds:
- we get the security of a peer-based architecture, since no single node controls data access or acts as a single point of failure
- we get the performance of a client-server architecture, since the Kademlia routing protocol will be designed to prioritize long-lived ("founder") peers over short-lived ("transient") peers
But we can narrow this use case even further, and in the process modify the Kademlia interface (which is that of a bona fide Hash Table) for our purposes. **We want to make BitTorrent.**
General file *systems* need to support (both explicitly through their interface, and implicitly through performance-driven design) frequent reads and surgical writes for a multitude of use cases (append-only log files, OS swap files, office documents, databases, ...); we don't need this for file *storage*.
In our scenario, the "short-lived" nodes in the system are mostly going to be *read-only*; i.e., clients that are downloading existing data. These nodes don't need an interface to modify existing data, so I'm not going to give them one. The "long-lived" nodes in the system (servers) are going to be uploading new data to the DHT frequently, and modifying file data rarely (if we wanted to we could even disallow this entirely by only allowing file uploads and deletes - no modifications to existing files)$^1$.
Why is this non-mutating interface better for a file storage solution? As we'll see, **Kademlia has practically no consistency guarantee if we allow for overwrites**. It has a search protocol very similar to that of Chord but, again, better handles scenarios with a high churn of short-lived nodes. However, it does this by aggressively replicating data across nodes, taking the "as long as some node has the data, we're good" approach. There is no guarantee that a given overwrite will actually overwrite all relevant replicated data in the DHT. In my mind, this setup was just begging to be reformulated as a Hash Set.
---
[1] In a perfect world we wouldn't need to modify data corresponding to file storage: we'd use a separate replication system like Raft for storing file storage metadata (namely, the pointers from file names to the keys of chunks of file data in the Kademlia DHT) to ensure total replication over all long-lived nodes. Then, when we occasionally need to modify an existing portion of a file, we would update the metadata accordingly, and upload new chunks and delete stale chunks using the existing DHT interface. Since this isn't a perfect world (namely, because I don't want to), I'm going to use the DHT for file metadata (and cheat a little bit, which I'll explain in more detail when covering the top-level file storage application on top of the DHT - see Part 5).