With the project on the enhanced Distributed Hash Table (DHT) with rated lists and hierarchical levels formally decided in the first week, the initial step is to understand the basic Kademlia Distributed Hash Table. This table utilizes libp2p to facilitate peer discovery and content routing. Next week, the primary focus will be on developing the enhanced DHT specification with rated lists.
# DHT primitive
## DHT, WHY?
In Distributed Hash Tables (DHTs), data assignment to nodes typically follows these steps:
- Generate a unique key for each piece of data, typically using a hash function: Key = hash(data).
- Assign a unique ID to each participating node in the network.
- Assign each key to the node with the 'closest' ID based on a specific metric (e.g., the XOR metric).
size of keyspace = number of bits in keys. What if the hash space is real big?
## Kademlia DHT
question 1: Which parties take which part of hash map?
A: In Kademlia, 'distance' is defined using the XOR operator, applied to the IDs (interpreted as non-negative integers) of two nodes, or between a node ID and a key. Essentially, the 'closest' node to a key in Kademlia is the one with the smallest XOR result when the key is XORed with the node's ID.
question 2: Findability problem. How to find the exact data from the distributed storage?
A: Routing tables. Kademlia nodes store contact information about each other to route query messages. For each 0<=i<4, everyone node keeps a list of <IP address, UDP port, Node ID> triples for nodes of distance between $2^i$ and $2^{i+1}$ from itself. We call these lists k-buckets.
Node Lookup Procedure๏ผ
1. Find the ๐ closest nodes to a given key in your routing table using the XOR metric.
2. Continue the following steps until responses are received from all ๐ closest nodes:
a. Query these ๐ closest nodes for their own ๐ closest nodes.
b. Update your list of the ๐ closest nodes based on the new information received.
3. If a node leaves the network and cannot respond to queries, it will be replaced in the k-buckets of other nodes, thereby maintaining the network's robustness.
## Algorithm & Implementation
Before implement DHT, I went through the `libp2p` based on this [doc] (https://github.com/chaors/Ethereum_read/blob/master/Docs/0xa1%20p2p%E5%AE%9E%E7%8E%B0.md)
```
package dht
import (
"context"
"fmt"
"math"
"math/rand"
"sync"
"time"
"github.com/libp2p/go-libp2p-core/host"
"github.com/libp2p/go-libp2p-core/network"
"github.com/libp2p/go-libp2p-core/peer"
"github.com/libp2p/go-libp2p-core/peerstore"
"github.com/libp2p/go-libp2p-core/protocol"
"github.com/libp2p/go-libp2p-core/routing"
"github.com/libp2p/go-libp2p-kad-dht/internal"
dhtcfg "github.com/libp2p/go-libp2p-kad-dht/internal/config"
"github.com/libp2p/go-libp2p-kad-dht/internal/net"
"github.com/libp2p/go-libp2p-kad-dht/metrics"
pb "github.com/libp2p/go-libp2p-kad-dht/pb"
"github.com/libp2p/go-libp2p-kad-dht/providers"
"github.com/libp2p/go-libp2p-kad-dht/rtrefresh"
kb "github.com/libp2p/go-libp2p-kbucket"
"github.com/libp2p/go-libp2p-kbucket/peerdiversity"
record "github.com/libp2p/go-libp2p-record"
recpb "github.com/libp2p/go-libp2p-record/pb"
"github.com/gogo/protobuf/proto"
ds "github.com/ipfs/go-datastore"
logging "github.com/ipfs/go-log"
"github.com/jbenet/goprocess"
goprocessctx "github.com/jbenet/goprocess/context"
"github.com/multiformats/go-base32"
ma "github.com/multiformats/go-multiaddr"
"go.opencensus.io/tag"
"go.uber.org/zap"
)
```
```
func (dht *IpfsDHT) maybeAddAddrs(p peer.ID, addrs []ma.Multiaddr, ttl time.Duration) {
// Don't add addresses for self or our connected peers. We have better ones.
if p == dht.self || dht.host.Network().Connectedness(p) == network.Connected {
return
}
dht.peerstore.AddAddrs(p, addrs, ttl)
}
```
In the Kademlia algorithm, each node has only four commands:
`PING`: Tests if a node is online.
`STORE`: Requests a node to store a piece of data.
`FIND_NODE`: Searches for a node based on its node ID.
`FIND_VALUE`: Searches for data based on a key, which is very similar to FIND_NODE.
Reference:
- [Distributed Hash Tables with Kademlia](https://codethechange.stanford.edu/guides/guide_kademlia.html)
- [Kademlia: A Peer-to-peer information system based on the XOR metric](https://pdos.csail.mit.edu/~petar/papers/maymounkov-kademlia-lncs.pdf)
- [The Distributed Hash Table - libp2p](https://https://pl-launchpad.io/curriculum/libp2p/dht/)
- [libp2p Kademlia DHT specification](https://github.com/libp2p/specs/blob/master/kad-dht/README.md)