# Altruistic Mode - List of Problems
## Churn & repair
Altruism seems to carry no incentive to maintain uptime, so heavy churn must be expected. _Active repair under heavy churn_ is probably a bad idea.
Stated as research questions:
- In an altruistic mode, how do we balance repair frequency against bandwidth costs under high churn, while still maintaining (some) acceptable data durability?
- How do we design a repair system that distinguishes between temporary churn (peers logging off briefly but still holding data) and permanent churn (peers leaving the network entirely)? (A minimal sketch follows this list.)
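To make the second question concrete, below is a minimal sketch of a churn-aware repair trigger that counts a peer's shares as available while the peer is within a grace period, and only triggers repair once availability drops below a redundancy threshold. The grace period and threshold are hypothetical parameters, not part of any existing Codex design.

```python
import time

# Hypothetical parameters, not taken from any existing Codex design.
GRACE_PERIOD = 6 * 3600    # seconds a peer may be offline before we treat it as gone
MIN_AVAILABLE_SHARES = 8   # trigger repair once fewer shares than this are available


def available_shares(holders, last_seen, now=None):
    """Count shares whose holders are online or only temporarily offline."""
    now = now if now is not None else time.time()
    return sum(1 for peer in holders if now - last_seen.get(peer, 0.0) < GRACE_PERIOD)


def should_repair(holders, last_seen, now=None):
    """Trigger repair only when availability drops below the redundancy threshold.

    Peers inside the grace period still count as available, so brief churn
    (temporary disconnects) does not cause unnecessary repair traffic.
    """
    return available_shares(holders, last_seen, now) < MIN_AVAILABLE_SHARES
```

The tension in the first question shows up directly in the two constants: a longer grace period and a lower threshold save bandwidth but risk durability, and vice versa.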
## Incentives
**Reputation/slashing.**
Lightweight verification works well for ensuring that repair is triggered when required, even under Byzantine assumptions.
Curbing bad peers, however, requires a reputation mechanism; otherwise peers can simply keep re-offending, for instance by purposefully triggering repair over and over to destabilize the system. A toy reputation sketch follows the questions below.
- In an altruistic setting, how should repair be triggered? Who can initiate repair, and how can the system validate/ensure that repair is genuinely required?
- What attacks can malicious peers mount to overwhelm the system with repair, and how can a reputation mechanism prevent them?
- When using reputation/slashing to discourage spurious repair, we need to consider false positives and false negatives: how do we handle such cases without over-punishing honest peers?
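As a strawman for these questions, the sketch below keeps a per-peer score that is credited when a repair request is confirmed by verification and slashed when the request turns out to be spurious; peers whose score falls below a ban threshold lose the right to trigger repair. All names, values and thresholds are illustrative assumptions, not an existing design.

```python
from collections import defaultdict

# Illustrative parameters only; real values would come out of the incentive design.
REWARD = 1.0           # score gained for a repair request confirmed by verification
PENALTY = 5.0          # score lost for a repair request that verification rejects
BAN_THRESHOLD = -20.0  # peers below this score may no longer trigger repair


class ReputationTracker:
    """Toy reputation ledger for peers that trigger repair."""

    def __init__(self):
        self.scores = defaultdict(float)

    def record_repair_request(self, peer_id: str, verified: bool) -> None:
        # Reward requests that lightweight verification confirms,
        # slash requests that turn out to be spurious.
        self.scores[peer_id] += REWARD if verified else -PENALTY

    def may_trigger_repair(self, peer_id: str) -> bool:
        return self.scores[peer_id] > BAN_THRESHOLD
```

The false-positive question maps onto the ratio `PENALTY / REWARD`: the harsher the slash relative to the reward, the more a single wrong verification verdict hurts an honest peer.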
**Tokenomics.**
Tokens can be used to reward/punish storage providers for storing the data. This requires consensus on how the rewards are distributed.
We probably need to re-visit the old tokenomics design to account for rewards based on:
- file size (since we would probably have different sizes or tiers),
- the work done on erasure coding (if outsourced),
- the durability guarantee provided, i.e. the number of proofs and how frequently they are submitted (this is the case if we design Codex with multiple durability options/tiers). A strawman reward shape is sketched after this list.
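As a starting point for that revision, one could imagine a per-provider reward of roughly the following shape; this is purely illustrative, and every symbol and weight is an assumption rather than an existing Codex formula:

$$
R \;\approx\; \alpha \cdot s \cdot d(t) \;+\; \beta \cdot w_{\mathrm{ec}} \;+\; \gamma \cdot p(t),
$$

where $s$ is the file (or bundle) size, $d(t)$ a multiplier for durability tier $t$, $w_{\mathrm{ec}}$ the erasure-coding work performed when it is outsourced, $p(t)$ the number of valid proofs submitted over the reward period (which grows with the tier's proving frequency), and $\alpha, \beta, \gamma$ protocol weights to be determined.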
## Sybil resistance
Another vector of abuse would be trying to fill the system with garbage to deplete resources (bandwidth or storage).
Peers with bad reputation must be banned, but banning is only effective if peers cannot cheaply rejoin under fresh identities; this leads to the need for Sybil resistance.
- Is a rate-limiting mechanism (e.g. RLN) suitable in an altruistic mode? Which attack vectors does it prevent, how does it work in practice, and how can it be enforced? We might need RLN to prevent DDoS, especially if we support lightweight devices (mobile phones) that only send requests and don't participate in storage.
## Replication & Erasure Coding
**Repair under compaction.** Outsourcing erasure coding and compacting small files addresses the problem of increased proof costs and traffic, but this needs to play well with the network-level erasure coding used for repair.
**Replication vs Erasure coding.**
Can we support both? Which use-cases/settings make one option more favourable than the other (e.g. small vs large files)?
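A quick back-of-the-envelope comparison (standard erasure-coding arithmetic, not a statement about actual Codex parameters) makes the trade-off concrete:

$$
\text{overhead}_{\text{replication}} = r \quad (\text{tolerates } r-1 \text{ lost copies}), \qquad
\text{overhead}_{\mathrm{EC}(k,n)} = \frac{n}{k} \quad (\text{tolerates } n-k \text{ lost shares}).
$$

For example, $3\times$ replication tolerates 2 lost copies at $3\times$ storage cost, while a $(k{=}10,\, n{=}15)$ code tolerates any 5 lost shares at $1.5\times$ storage cost. The price is that reading or repairing an erasure-coded file requires contacting $k$ peers and decoding, which is one reason replication may still win for small files or very high churn.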
## Lightweight Verification & Proofs
In order to have any durability guarantees, we need some verification (periodic checks) to detect missing data, even if it is through lightweight proofs that only provide probabilistic or low durability guarantees. We can have different tiers of durability, starting with best-effort (lightweight verification, e.g. Merkle proofs; a minimal verifier sketch follows the questions below) up to full durability (e.g. the original Codex storage proofs), and we can roll out these proving systems over time with increasing levels of durability guarantees and efficiency. However, for each tier, we must be clear on a few things:
- What threat model do we assume in each tier? What guarantees does its proving system provide? How efficient is it (proof size? bandwidth required?)? How does it scale with the number of (small/large) files?
- Depending on the setting/use-case, durability, and other factors, there is a large menu of ZK systems that could be used; research is needed to decide which one to adopt.
- The choice of proof system also depends on who is verifying (proof size matters): is verification off-chain or on-chain?
- Choice of field & hash function?
- Choice of commitment? Merkle tree construction/organization?
- Do we require proof aggregation? How expensive? And who is performing the aggregation? Do we need proof compression?
- Data availability vs retrievability: clarify what the proofs are actually guaranteeing.
- Merging/bundling small files? Proving such a merge? Updating the DHT entry with every merge, and does that require a proof of merge? Merging might require erasure coding to amplify the power of the proofs (so that fewer proofs suffice without missing files).
- Outsourcing erasure coding? And proving correctness of erasure coding?
- Entropy/randomness sources?
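For the best-effort tier, lightweight verification could be as simple as challenging a provider for a randomly chosen leaf and checking its Merkle inclusion proof against a previously stored root. The sketch below shows only the verifier side of such a check, for a plain binary Merkle tree with SHA-256; the actual hash, field, commitment and challenge-sampling choices are exactly the open questions listed above.

```python
import hashlib


def h(data: bytes) -> bytes:
    # Placeholder hash; the field/hash choice is one of the open questions above.
    return hashlib.sha256(data).digest()


def verify_inclusion(root: bytes, leaf: bytes, index: int, path: list) -> bool:
    """Check that `leaf` sits at position `index` in the tree committed to by `root`.

    `path` lists the sibling hashes from the leaf level up to (but excluding) the root.
    """
    node = h(leaf)
    for sibling in path:
        if index % 2 == 0:   # current node is a left child
            node = h(node + sibling)
        else:                # current node is a right child
            node = h(sibling + node)
        index //= 2
    return node == root
```

A single spot-check like this only gives a probabilistic guarantee that the data is still held; the number and frequency of such challenges is precisely what would distinguish the best-effort tier from the stronger ones.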
## Validator Network & Consensus
A validator network is a set of nodes that performs the periodic checks for data availability. This would require consensus and incentives, which could live on the same consensus layer as the one used to reward storage providers or on a separate one. There were previous attempts at this: [Decentralised Validator Network](https://hackmd.io/@codex-storage/Hy-Kb7bweg) and the Proof Aggregation Network [V1](https://hackmd.io/@mghazwi/SyEu4E1Wel), [V2](https://hackmd.io/@mghazwi/ryiC2Xe7xx).
However, the following questions remain open for research:
- How would nodes/validators reach consensus on which proofs failed or which datasets/slots require repair? (A strawman threshold vote is sketched below.)
- What are the incentives for being a validator, given that it requires CPU and bandwidth (for proof verification)?
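One naive answer to the consensus question, sketched below purely as a strawman, is per-slot threshold voting: each validator in a committee reports whether it saw the proof for a slot fail, and the slot is marked for repair once a quorum agrees. The committee selection, quorum value and handling of equivocation are all open questions, not design decisions.

```python
FAILURE_QUORUM = 2 / 3  # illustrative quorum, not a design decision


def needs_repair(votes: dict, committee_size: int) -> bool:
    """Decide from validator votes whether a slot should be marked for repair.

    `votes` maps validator id -> True if that validator saw the slot's proof fail;
    validators that did not vote simply do not appear (counted as abstentions).
    """
    failures = sum(1 for failed in votes.values() if failed)
    return failures >= FAILURE_QUORUM * committee_size
```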
## DHT Scaling & Security
Considering the use-cases that Codex would like to aim for, we might see a need for storing a large number of small files. These would need (if not bundled somehow) a metadata DHT entry each, causing scaling issues for the DHT. Can a DHT like the one used in Codex handle this?
- How many entries/content keys can a DHT with, say, $10^4 - 10^5$ nodes handle? What would be the expected bandwidth for a peer in such a network?
We have some [results](https://hackmd.io/@codex-storage/HkGRPTTYlg) on this for the specific case of a BitTorrent replacement for the Status community history archive.
- Can we use a trie (= prefix tree) bundling structure to reduce the number of DHT entries, so that instead of, say, 1 billion DHT entries, we have roughly 10 million "bundled entries", each containing on average 100 "actual entries"? (See the sketch after this list.)
- Given the possibility of poisoning the DHT with false entries, what type of validations/proofs should be required prior to updating DHT entries?
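To make the trie-bundling idea concrete, here is a minimal sketch that groups content keys into bundle entries by a fixed-length hex prefix; the prefix length is the tuning knob (a prefix of $p$ hex characters allows at most $16^p$ bundles), and nothing here reflects the actual Codex DHT record format.

```python
from collections import defaultdict

PREFIX_LEN = 5  # hex characters; 16**5 is ~1M possible bundles (illustrative)


def bundle_keys(content_keys: list) -> dict:
    """Group content keys (hex strings) into one bundle per shared prefix."""
    bundles = defaultdict(list)
    for key in content_keys:
        bundles[key[:PREFIX_LEN]].append(key)
    return bundles

# Instead of publishing one DHT entry per content key, a node would publish one
# entry per bundle, keyed by the prefix and listing (or committing to) the keys
# it contains. A lookup first resolves the prefix entry, then the key inside it.
```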
## Privacy, Encryption, Anonymity & Plausible Deniability
For data privacy, we need encryption and we have a [proposal](https://github.com/codex-storage/codex-docs-obsidian/blob/main/10%20Notes/Codex%20Encryption%20Basis.md) and [design](https://github.com/codex-storage/codex-docs-obsidian/blob/main/10%20Notes/Codex%20Encryption%20Design.md) for this.
Anonymity, especially in the Status and Waku use-cases, is something we need to look into. Ideally, no participant in the protocol should be able to link the data to its sender/receiver; I would assume this is especially needed in the Waku use-case of sending large messages.
The same research problem related to plausible deniability applies here, as described in [Plausible Deniability in Storage Networks](https://medium.com/@ramsesfv/plausible-deniability-in-storage-networks-e449dbc66160).
## Dynamic Data
The same old [research problem](https://hackmd.io/@codex-storage/S1THTouPlg) of supporting dynamic data remains, but under the different assumptions we have for the altruistic mode; the research and design need to be adjusted accordingly. The move to supporting small files that can be appended might simplify the problem, especially if we have a design for merging/bundling small files, proving the merge, proving the erasure coding, and then proving availability. If these components work, I would assume we can support dynamic data via append-only files, but more research and design is needed here.