Secure Kademlia Library
Summary
Data availability sampling (DAS) networking in Ethereum requires many small blob samples to be placed in a distributed network, where arbitrary nodes can query "the network" to verify that these samples are available.
What this "network" actually looks like is still an open question, but there seems to be high level consensus on its process and design:
- Initial Dissemination - Builder sends rows and columns to validators via gossipsub-like protocol (unstructured network)
- Continued Dissemination - Validators distribute samples to the rest of the network via a Kademlia-style DHT (structured network)
- Querying DHT for samples - Sampling nodes ask DHT for individual samples
- Reconstruction - If there's a liveness failure within the network and enough individual samples are held among participants, participants send samples to at least one honest validator (or full node?) to reconstruct (and begin redistributing?) the blob.
Motivation
This structured network, where anyone can participate, has a lot of nice properties for distributing small chunks of data across an ever-changing set of data and peers, but it comes with major drawbacks.
Most notably, anyone can spin up a large number of node IDs and gain a disproportionate amount of control over data and data flow within the network. This is known as a Sybil attack.
A secure Kademlia overlay can help mitigate such adversaries and seems to be an important piece of the puzzle for making DAS networking happen.
Within this crate, I modify libp2p's Rust implementation of the Kademlia protocol to create a "hardened version" of the Kademlia DHT.
High Level Roadmap
1. Fork libp2p's Kademlia implementation.
Questions:
- Does it matter that libp2p's DHT relies on libp2p streams?
Resources:
Inspired by Dankrad's S/Kademlia document:
- Parallel, disjoint-path key-value lookups -
This is already implemented in libp2p. See here.
It might be worth implementing the modified version of parallel lookups, where disjoint lookups kick in if the original lookup doesn't work, as seen in Dankrad's optimizations of S/Kademlia.
- Validator-only peer logic for routing table -
Might stick with a proof-of-work requirement for node ID generation in the initial version of the library (if it's easier) to get the interface down.
- Sibling broadcast list -
Instead of defining the replicated storage for a value by the k-bucket size (which really serves redundancy of overlay connectivity), we define the replicated storage of a sample by a separate variable.
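The sibling-list idea above can be sketched in a few lines. This is a minimal illustration, not the crate's API: the `Id` alias, the `Params` struct, and the `siblings` function are hypothetical names, and a 256-bit ID width is assumed. The point is only that the number of nodes storing a sample (`sibling_count`) is chosen independently of the k-bucket size.

```rust
/// 256-bit identifier shared by node IDs and sample keys (assumed width).
pub type Id = [u8; 32];

/// Hypothetical parameters: bucket size and replication are tuned separately.
pub struct Params {
    /// k: maximum peers per k-bucket (overlay connectivity redundancy).
    pub bucket_size: usize,
    /// s: number of sibling nodes that store each sample.
    pub sibling_count: usize,
}

/// XOR distance between two IDs. Byte arrays compare lexicographically,
/// which matches big-endian integer ordering.
pub fn xor_distance(a: &Id, b: &Id) -> Id {
    let mut out = [0u8; 32];
    for (o, (x, y)) in out.iter_mut().zip(a.iter().zip(b.iter())) {
        *o = x ^ y;
    }
    out
}

/// Returns the `sibling_count` known peers closest to `key` by XOR distance.
/// `bucket_size` plays no role here; that is the whole point of the split.
pub fn siblings(key: &Id, peers: &[Id], params: &Params) -> Vec<Id> {
    let mut sorted = peers.to_vec();
    sorted.sort_by_key(|p| xor_distance(key, p));
    sorted.truncate(params.sibling_count);
    sorted
}
```

With this split, raising `sibling_count` increases how many nodes hold each sample without changing how many peers each bucket tracks.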
Expose same functionality as Kademlia:
- Ping
- Store
- FindNode
- FindValue
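As a rough sketch of what exposing these four operations could look like, the enum below models them as request and response messages handled by a toy in-memory node. The type names (`Request`, `Response`, `Node`) and the 256-bit `Id` are assumptions made for illustration; they are not libp2p's types, and no networking, authentication, or hardening is shown.

```rust
use std::collections::HashMap;

/// 256-bit identifier for nodes and keys (assumed width).
pub type Id = [u8; 32];

/// The four classic Kademlia RPCs.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Request {
    Ping,
    Store { key: Id, value: Vec<u8> },
    FindNode { target: Id },
    FindValue { key: Id },
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Response {
    Pong,
    Stored,
    /// Closest known peers to the requested target or key.
    Nodes(Vec<Id>),
    Value(Vec<u8>),
}

fn xor_distance(a: &Id, b: &Id) -> Id {
    let mut out = [0u8; 32];
    for (o, (x, y)) in out.iter_mut().zip(a.iter().zip(b.iter())) {
        *o = x ^ y;
    }
    out
}

/// Toy node: a flat peer list instead of k-buckets, and a local key-value store.
pub struct Node {
    peers: Vec<Id>,
    store: HashMap<Id, Vec<u8>>,
    k: usize,
}

impl Node {
    pub fn new(peers: Vec<Id>, k: usize) -> Self {
        Node { peers, store: HashMap::new(), k }
    }

    /// Up to `k` known peers, ordered by XOR distance to `target`.
    fn closest(&self, target: &Id) -> Vec<Id> {
        let mut sorted = self.peers.clone();
        sorted.sort_by_key(|p| xor_distance(target, p));
        sorted.truncate(self.k);
        sorted
    }

    pub fn handle(&mut self, req: Request) -> Response {
        match req {
            Request::Ping => Response::Pong,
            Request::Store { key, value } => {
                self.store.insert(key, value);
                Response::Stored
            }
            Request::FindNode { target } => Response::Nodes(self.closest(&target)),
            // FindValue falls back to FindNode behaviour when the value is absent.
            Request::FindValue { key } => match self.store.get(&key) {
                Some(v) => Response::Value(v.clone()),
                None => Response::Nodes(self.closest(&key)),
            },
        }
    }
}
```

Keeping the message surface identical to plain Kademlia means the hardening (peer admission, disjoint lookups, sibling lists) can live behind `handle` without changing what callers see.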
3. Integrate with discv5 (via overlay protocol)
Our secure routing table needs to follow specialized rules. Where and how should these rules be implemented?
I believe these questions fall under the umbrella of the Portal Network's Overlay Routing Table functionality.
Questions:
- How can this library plug into overlay protocol routing tables?
- How do sampling nodes issue queries against a validator-only routing table?
- Should I be creating a specialized routing table library?
Or some sort of logical interface that plugs into arbitrary DHTs?
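One possible shape for the "logical interface" option is a small admission-policy trait that any routing table can be parameterised over. This is only a sketch under assumptions: `AdmissionPolicy`, `ValidatorOnly`, `AllowAll`, and `RoutingTable` are hypothetical names, the table is a flat list rather than real k-buckets, and how the validator set would actually be obtained and verified is left open.

```rust
use std::collections::HashSet;

/// 256-bit node identifier (assumed width).
pub type Id = [u8; 32];

/// Decides whether a candidate peer may enter the routing table.
pub trait AdmissionPolicy {
    fn admit(&self, candidate: &Id) -> bool;
}

/// Admits only IDs in a known validator set (how the set is sourced is out of scope).
pub struct ValidatorOnly {
    validators: HashSet<Id>,
}

impl ValidatorOnly {
    pub fn new(validators: impl IntoIterator<Item = Id>) -> Self {
        ValidatorOnly { validators: validators.into_iter().collect() }
    }
}

impl AdmissionPolicy for ValidatorOnly {
    fn admit(&self, candidate: &Id) -> bool {
        self.validators.contains(candidate)
    }
}

/// Plain Kademlia behaviour: every peer is admitted.
pub struct AllowAll;

impl AdmissionPolicy for AllowAll {
    fn admit(&self, _candidate: &Id) -> bool {
        true
    }
}

/// Simplified routing table (flat list, no buckets) that defers to a policy.
pub struct RoutingTable<P: AdmissionPolicy> {
    policy: P,
    peers: Vec<Id>,
}

impl<P: AdmissionPolicy> RoutingTable<P> {
    pub fn new(policy: P) -> Self {
        RoutingTable { policy, peers: Vec::new() }
    }

    /// Inserts the peer if the policy admits it and it is not already present.
    pub fn try_insert(&mut self, candidate: Id) -> bool {
        if !self.policy.admit(&candidate) || self.peers.contains(&candidate) {
            return false;
        }
        self.peers.push(candidate);
        true
    }

    pub fn peers(&self) -> &[Id] {
        &self.peers
    }
}
```

Under this design a proof-of-work rule or a validator-only rule becomes one more `AdmissionPolicy` implementation, so the same table code could sit behind either libp2p's DHT or an overlay protocol.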
Resources: