## Naive altruistic codex idea

### Assumptions

- The only goal for now is to replace BitTorrent in status-go for Status Desktop.
- libp2p gossipsub mesh network.
- Kademlia DHT with two roles: `client` and `server` (see [spec](https://github.com/libp2p/specs/tree/master/kad-dht#client-and-server-mode)). The client role is reserved for "light clients", i.e. more ephemeral, high-churn nodes. The server role is reserved for more reliable nodes (Status Desktop nodes or Waku store nodes).
- Storage Providers (SPs) MUST act in the Kademlia DHT server role.
  - These nodes MUST advertise the libp2p Kademlia protocol identifier via the [identify protocol](https://github.com/libp2p/specs/blob/master/identify/README.md).
- Nodes MUST operate in an honest manner and will not try to game the protocol.
- Average dataset size of 50MB for now.
- Retain as much existing code as possible.
- No erasure coding; entire datasets only.
- The system MAY provide zero guarantees that the data exists, which means no proving system is required.
- SPs will store downloaded files indefinitely, for now.
- SPs will participate in every storage request, i.e. download everything.
- The sales process will not use availabilities at all; simply download and bounce if the quota limit is reached.
- No integration tests for now.
- Datasets are downloadable by any node in the swarm.
- No encryption.

### Roles

1. **Storage Provider + Client**. Stores files in the network. Should be a reliable node. Also supports a client role, where files can be "uploaded" to the network (broadcast).

### Sequence

1. Start codex node, bootstrapped to the network.
1. `[UPLOADER]` Client node uploads a dataset:
   1. hashes the dataset, generating a `CID`
   1. produces a protected manifest with its own `CID`
1. `[UPLOADER]` broadcasts a `StorageRequest` (includes the protected manifest `CID`).
1. `[SP]` Upon receipt of a storage request:
   1. establish a connection with the uploader
   2. start download of the dataset

## Open Issues

1. NAT holepunching
2. Discovery speed
3. **Connection limits**.
   When a storage request is broadcast, many peers will attempt to establish connections with the uploader, which may exhaust available connections quite quickly. Additionally, Codex nodes will be sharing resources with Waku or Status Desktop nodes, potentially limiting available connections further.
4. Scalability
5. Unknown level of guarantees

### Potential solutions

A naive idea for potentially solving the first two issues is to use a decentralised "tracker" network that replaces the DHT. Maintaining a DHT creates a lot of gossip across the network. Instead of replicating a DHT across all nodes in the mesh, provider records could be replicated and maintained by a smaller, separate swarm. Requests for provider records would go to that separate swarm instead of being broadcast across the network, thereby reducing congestion.

- A similar idea may have already been attempted (and failed): https://github.com/libp2p/hydra-booster
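The upload/storage-request sequence and the quota-based "download and bounce" sales logic described above can be sketched as a minimal, self-contained Python model. All names here (`make_cid`, `Manifest`, `StorageRequest`, `StorageProvider`) are hypothetical, and the CID is just a hex-encoded SHA-256 for illustration; a real Codex node would use multihash-based CIDs and exchange data over libp2p, none of which is modelled.

```python
import hashlib
from dataclasses import dataclass


def make_cid(data: bytes) -> str:
    # Illustrative stand-in for a real content identifier:
    # hex-encoded SHA-256 of the raw bytes.
    return "cid-" + hashlib.sha256(data).hexdigest()


@dataclass(frozen=True)
class Manifest:
    dataset_cid: str
    size: int


@dataclass(frozen=True)
class StorageRequest:
    manifest_cid: str  # broadcast to the swarm


class Uploader:
    """Client node: hashes a dataset, produces a manifest, broadcasts a request."""

    def __init__(self) -> None:
        self.store: dict = {}  # cid -> dataset bytes or Manifest

    def upload(self, dataset: bytes) -> StorageRequest:
        dataset_cid = make_cid(dataset)
        self.store[dataset_cid] = dataset
        manifest = Manifest(dataset_cid=dataset_cid, size=len(dataset))
        manifest_cid = make_cid(repr(manifest).encode())
        self.store[manifest_cid] = manifest
        return StorageRequest(manifest_cid=manifest_cid)


class StorageProvider:
    """SP node: participates in every storage request, bouncing only on quota."""

    def __init__(self, quota: int) -> None:
        self.quota = quota
        self.used = 0
        self.store: dict = {}

    def on_storage_request(self, req: StorageRequest, uploader: Uploader) -> bool:
        # Fetch the manifest, then the dataset, directly from the uploader
        # (stands in for establishing a connection and downloading).
        manifest = uploader.store[req.manifest_cid]
        if self.used + manifest.size > self.quota:
            return False  # bounce: quota limit reached, no availabilities used
        self.store[manifest.dataset_cid] = uploader.store[manifest.dataset_cid]
        self.used += manifest.size
        return True
```

Because every SP handles every request, the only admission control in this sketch is the quota check, matching the assumption that the sales process does not consult availabilities.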