## Naive altruistic codex idea
### Assumptions
- libp2p gossipsub mesh network
- kademlia DHT with two roles: `client` and `server` (see [spec](https://github.com/libp2p/specs/tree/master/kad-dht#client-and-server-mode)). The client role is reserved for "light clients" or more ephemeral, high-churn nodes. The server role is reserved for more reliable nodes (e.g. a Status Desktop node or a Waku store node).
- Storage Providers (SPs) MUST act in the Kademlia DHT server role
    - These nodes MUST advertise the libp2p Kademlia protocol identifier via the [identify protocol](https://github.com/libp2p/specs/blob/master/identify/README.md) (see the role-inference sketch after this list).
- Light clients (clients) MUST act in the Kademlia DHT client role
    - Per the spec above, these nodes MUST NOT advertise the libp2p Kademlia protocol identifier via the [identify protocol](https://github.com/libp2p/specs/blob/master/identify/README.md).
- Nodes MUST operate honestly and will not attempt to game the protocol
- Submission of a proof indicates that an SP actually has the data
- Proof system can be swapped out later for a real proving system
- Attempt to use existing Codex modules
- Average dataset size of 50MB for now
- Retain as much existing code as possible, including erasure coding
- System MAY provide a non-zero level of guarantees that the data exists
- Proofs of storage MAY be provided
- The proving system SHOULD be swapped for a more robust system later on
- SPs will store downloaded files indefinitely, for now
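
The client/server split above can be observed directly from what a peer advertises over identify. A small sketch, assuming the conventional libp2p Kademlia protocol ID `/ipfs/kad/1.0.0`; the helper is illustrative and not part of any real libp2p API:

```python
# Sketch: inferring a peer's DHT role from the protocols it advertises via
# identify. "/ipfs/kad/1.0.0" is the conventional libp2p Kademlia protocol ID;
# this helper is illustrative and not part of any real libp2p API.
KAD_PROTOCOL_ID = "/ipfs/kad/1.0.0"

def dht_role(advertised_protocols: list[str]) -> str:
    """Server-role nodes advertise the Kademlia protocol ID; clients do not."""
    return "server" if KAD_PROTOCOL_ID in advertised_protocols else "client"

assert dht_role(["/ipfs/id/1.0.0", KAD_PROTOCOL_ID]) == "server"  # e.g. an SP
assert dht_role(["/ipfs/id/1.0.0"]) == "client"                   # light client
```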
### Roles
1. **Storage Provider (SP)**. Stores files in the network; should be a reliable node. Also supports a client role, where files can be "uploaded" to the network (chunk + EC + broadcast).
2. **Light client**. Can "upload" (chunk + EC + broadcast) and download datasets.
### Sequence
1. Start a Codex node, bootstrapped to the network
2. `[UPLOADER]` Client node uploads a dataset:
    1. $\geq 50$ MB: `mode = EC`
        1. Chunk and erasure-code the dataset, producing $[CID_0 \ldots CID_n]$
    2. $< 50$ MB: `mode = repl`
        1. No chunking, no EC; produces a single $[CID]$
    3. Produces a protected manifest with its own `CID` (see the upload sketch after this list)
3. `[UPLOADER]` Broadcasts a `StorageRequest` (includes `mode` and the protected manifest `CID`), starting the handshake
    * `StorageRequest` fields:
        * "topic"
        * timestamp
        * `requestId`
4. `[SP]` Upon receipt of a storage request:
    1. Establish a connection with the uploader
    2. Send an `intent to store` message (i.e. a slot reservation)
        1. Payload: signed message containing the $RequestId$
5. `[UPLOADER]` Upon receipt of an `intent to store`:
    1. Verify the signature
    2. Buffer verified intents
    3. Once enough verified `intent to store` messages are received, send an `intent approved` message to all SPs with buffered intents
        1. Payload: signed $hash(RequestId|SlotIndex)$, where $SlotIndex$ is randomly assigned to each SP
6. `[SP]` Upon receipt of `intent approved`:
    1. Verify the signature
    2. Start downloading the slot
7. `[FUTURE?]` `[SP proving]` At regular intervals, each SP submits a "fake proof" (a signature over $hash(CID|entropy)$, for some agreed entropy) to the original uploader, who can verify this signature (see the handshake sketch after this list).
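
The mode selection in step 2 is a simple size threshold. A minimal sketch, with SHA-256 hex digests standing in for real Codex CIDs; the 1 MiB chunk size and helper names are made up for illustration, and the erasure-coding step itself is elided:

```python
# Sketch of step 2: choose `mode` by dataset size and derive block CIDs.
# SHA-256 hex digests stand in for real Codex CIDs; the 50 MB threshold is
# from the assumptions above, the 1 MiB chunk size is a made-up example, and
# erasure coding of the chunks is elided.
import hashlib

THRESHOLD = 50 * 1024 * 1024  # 50 MB cutoff
CHUNK_SIZE = 1024 * 1024      # hypothetical 1 MiB chunks

def cid(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def upload(dataset: bytes) -> tuple[str, list[str]]:
    """Return (mode, block CIDs); a protected manifest would reference these."""
    if len(dataset) >= THRESHOLD:
        # mode = EC: chunk (and, in the real flow, erasure-code) the dataset
        chunks = [dataset[i:i + CHUNK_SIZE]
                  for i in range(0, len(dataset), CHUNK_SIZE)]
        return "EC", [cid(c) for c in chunks]
    # mode = repl: no chunking, no EC; a single CID covers the whole dataset
    return "repl", [cid(dataset)]

mode, cids = upload(b"\x00" * (2 * CHUNK_SIZE))
assert mode == "repl" and len(cids) == 1
```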
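The handshake in steps 4 through 7 reduces to a few signatures and hashes. The sketch below uses Ed25519 via PyNaCl and SHA-256 as stand-ins for whatever scheme Codex actually adopts; the slot count, key distribution, and message handling are assumptions for illustration:

```python
# Sketch of steps 4-7: signed intent, approval over hash(RequestId|SlotIndex),
# and the periodic "fake proof" over hash(CID|entropy). Ed25519 (PyNaCl) and
# SHA-256 are stand-ins; key exchange/distribution is out of scope here.
import hashlib
import secrets
from nacl.signing import SigningKey
from nacl.exceptions import BadSignatureError

def h(*parts: bytes) -> bytes:
    """hash(a|b|...) as written in the sequence above (SHA-256 stand-in)."""
    return hashlib.sha256(b"|".join(parts)).digest()

sp_key = SigningKey.generate()        # SP identity key (hypothetical)
uploader_key = SigningKey.generate()  # uploader identity key (hypothetical)

# Step 4: SP signals intent to store with a signed message over the RequestId.
request_id = secrets.token_bytes(32)  # requestId from the StorageRequest
intent = sp_key.sign(request_id)

# Step 5: uploader verifies the signature before buffering the intent.
try:
    sp_key.verify_key.verify(intent)
except BadSignatureError:
    raise SystemExit("invalid intent; drop it")

# Step 5.3: uploader approves, randomly assigning this SP a slot index.
slot_index = secrets.randbelow(8)     # assumes an 8-slot erasure-coded dataset
approval = uploader_key.sign(h(request_id, slot_index.to_bytes(2, "big")))

# Step 6: SP verifies the approval, then starts downloading its slot.
uploader_key.verify_key.verify(approval)

# Step 7 (future): the "fake proof" is a signature over hash(CID|entropy),
# which the uploader can check with the SP's verify key.
manifest_cid = hashlib.sha256(b"protected-manifest").digest()  # stand-in CID
entropy = secrets.token_bytes(16)     # some agreed per-interval entropy
proof = sp_key.sign(h(manifest_cid, entropy))
sp_key.verify_key.verify(proof)       # uploader-side verification
```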
## Open Issues
1. NAT holepunching
2. Discovery speed
A naive idea for potentially solving the first two issues is to use a decentralised "tracker" network in place of the DHT. Maintaining a DHT generates a lot of gossip across the network. Instead of replicating the DHT across all nodes in the mesh, provider records could be replicated and maintained by a smaller, separate swarm. Requests for provider records would go to that swarm instead of being broadcast across the network, reducing congestion.
3. **Connection limits**. When a storage request is broadcast, many peers will attempt to establish connections with the uploader, which may exhaust available connections quickly. Additionally, Codex nodes will be sharing resources with Waku or Status Desktop nodes, potentially further limiting available connections.
    1. Possible mitigation: an SP continues with the handshake only if its `NodeID` is within distance $k$ of the protected manifest `CID`, bounding the number of peers that respond to any given request (sketched below).
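
A minimal sketch of the distance check in 3.1, treating `NodeID` and the manifest `CID` as 256-bit digests under the Kademlia XOR metric; reading "within distance $k$" as a shared $k$-bit prefix is an assumption, as is the threshold value:

```python
# Sketch of the mitigation in 3.1: an SP joins the handshake only when its
# NodeID is XOR-close to the protected manifest CID. IDs are modeled as
# 32-byte digests; the 8-bit prefix threshold is an assumed tuning parameter.
import hashlib

def xor_distance(a: bytes, b: bytes) -> int:
    """Kademlia distance: XOR of the two IDs as big-endian integers."""
    return int.from_bytes(a, "big") ^ int.from_bytes(b, "big")

def should_respond(node_id: bytes, manifest_cid: bytes, prefix_bits: int = 8) -> bool:
    # "Within distance k" read here as: NodeID and CID share their top
    # prefix_bits bits, i.e. the XOR distance fits in the remaining bits.
    return xor_distance(node_id, manifest_cid) < (1 << (256 - prefix_bits))

node_id = hashlib.sha256(b"some-node").digest()
manifest_cid = hashlib.sha256(b"protected-manifest").digest()
print(should_respond(node_id, manifest_cid))  # ~1 in 256 nodes respond at 8 bits
```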