Swarm Data Chain

---
tags: Swarm Data Chain
WIP: <to be assigned>
title: Swarm Data Chain
author: Mohamed Zahoor (jmozah)
discussions-to: https://discord.com/channels/799027393297514537/1068161013934985287/1239528605013377085
status: Draft
type: Standards Track
category: Layer 2
created: 2024-05-15
---

## Simple Summary

One of the biggest issues any blockchain faces is storing and managing its ledger (data). The faster a chain executes transactions, the harder it becomes to keep its data available and retrievable for all its clients. This SWIP proposes to solve the data availability and retrievability problems with a generic data chain that manages and stores the data of other blockchains. This data chain will use Swarm as its storage layer to achieve this goal.

## Abstract

Blockchain scaling usually means scaling along the following dimensions:

- Transaction execution
- Data storage
- Bandwidth

Lately, rollups have helped scale Ethereum to a degree by offloading transaction execution and compressing (rolling up) block data. Data storage issues, such as availability and retrievability, remain open problems.

The proposed solution addresses the data availability and retrievability problems of blockchains, especially Ethereum Layer 2s. Chains will be able to store their data (blobs, blocks, state, logs, receipts, etc.) and allow their clients to check its availability and retrieve it later if needed.

## Motivation

Modular blockchains are gaining popularity because they let new chains be built quickly and easily. A modular storage layer for blockchains will make the data storage of these chains more manageable. Storing blockchain data in decentralised storage also opens new possibilities for applications that are centralised today, such as Etherscan.

The following are some of the motivations for creating a generic Data Chain:

- Solving data availability problems:
  - Light nodes need strong data availability assurances without having to download the entire block.
  - Ethereum Layer 2s are another example, where the data must be available to other nodes for liveness.
  - This is also required to build future "stateless" clients, which do not need to download and store the data.
- Solving data retrievability problems:
  - Blockchains have special archive nodes to store the entire data, and most other clients rely on them to get the full data. This is a problem, especially when the number of archive nodes is small.
  - Future chains can eliminate data storage in each client entirely and instead support a stateless client model, which is lighter and thereby increases decentralisation.
- Using Swarm as the base layer:
  - Highly distributed storage network
  - Provable data storage (Merkle proofs)
  - Censorship resistance
  - Data redundancy (each chunk is stored across its whole neighbourhood)
  - Efficient use of bandwidth when a chunk is requested often

## Specification

The design consists of a new "Data Chain" and the existing Swarm network.

- New Data Chain
  - This is a new blockchain that uses Delegated Proof of Stake (DPoS) consensus, with multiple validators managing the network. Validators need to stake a certain amount of BZZ to become active. Other BZZ holders can delegate their BZZ tokens to any of the existing validators. The voting power of a validator will be proportional to the number of BZZ tokens it has staked.
  - Validators reach consensus about new data (e.g. a block) created by a supported blockchain. Once a greater than 2/3 majority is reached about the data, it is permanently stored in the Swarm network. The validators take care of all pre-processing work, such as data sampling and organising the data, before storing it in Swarm. A request for the original data, or for a piece of it (a sample), gets the necessary mapping from the validators, including the respective Swarm hash, which is then used to access the data from the Swarm network.
  - New data sources (blockchains) and data types (blocks, state, logs, receipts, blobs, etc.) can be added for ingestion over time through governance.
  - The state of the Data Chain should be posted to a smart contract on Layer 1 (Ethereum) so that it inherits the same security guarantees as Ethereum.

## Rationale

1) [SWIP-42](https://github.com/ethersphere/SWIPs/pull/42/files#diff-b0a6bcf1f6e706ea47edb89ad8b82c36c4ef6dee3576e1e91b2b0248fd31a5a8) proposes a similar design where every piece of ingested data is committed to Layer 1. That approach is prohibitively expensive and offers less data capacity, since it uses Layer 1.
2) Using a separate blockchain to manage the storage makes the design much more data-centric and helps create a generic solution where new dataspaces can be added on the fly.
3) Using Swarm as the final resting place for data inherits all the features that have been built into Swarm over the years.
4) Later, this chain can be upgraded with an EVM to enable more programmable usage and control of data.
5) Having a separate Swarm chain that brings more chain data into the network will directly help Swarm operators.
6) Future blockchains will be highly decentralised, as the resource requirements of a client will come down drastically while they keep the same security as if they had stored all the chain data.

## Backwards Compatibility

This is a new design for a Data Chain, so there is no specific backward compatibility requirement. Special care must be taken to ensure that the data sampling algorithms remain backward compatible. With respect to Swarm, the validators will use the Swarm API to push data and store the resulting BZZ addresses as part of their state data.

## Test Cases

At first, we should run a testnet that captures data from the testnets of other blockchains. This will help test the chain's operation and its integration with the Swarm network.
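Before a testnet with live data sources exists, the consensus and indexing rules from the Specification could be prototyped offline. The Python sketch below is a minimal, hypothetical model, not the proposed implementation: `Validator`, `has_supermajority`, `DataChainState`, `MIN_SELF_STAKE` and the `store` callback are illustrative names. It assumes that a validator's voting power is its own stake plus everything delegated to it (the Specification only says voting power is proportional to staked BZZ), that the quorum is strictly more than 2/3 of the active voting power, and that the Swarm upload is hidden behind a callback returning a reference.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable

# Hypothetical minimum self-stake (in BZZ base units) for a validator to be active.
MIN_SELF_STAKE = 1_000

# Hypothetical key for ingested data: (source chain, data type, item id).
DataKey = tuple[str, str, str]


@dataclass
class Validator:
    address: str
    self_stake: int
    # Delegator address -> BZZ delegated to this validator.
    delegations: dict[str, int] = field(default_factory=dict)

    def voting_power(self) -> int:
        # Assumption: voting power = self-stake + all delegated stake.
        return self.self_stake + sum(self.delegations.values())

    def is_active(self) -> bool:
        return self.self_stake >= MIN_SELF_STAKE


def has_supermajority(validators: list[Validator], votes: set[str]) -> bool:
    """True if active validators holding strictly more than 2/3 of the
    active voting power voted for the data."""
    active = [v for v in validators if v.is_active()]
    total = sum(v.voting_power() for v in active)
    in_favour = sum(v.voting_power() for v in active if v.address in votes)
    # Integer form of in_favour / total > 2/3, avoiding float rounding.
    return total > 0 and 3 * in_favour > 2 * total


@dataclass
class DataChainState:
    # Maps each committed data item to the reference returned by `store`.
    index: dict[DataKey, str] = field(default_factory=dict)

    def finalize(
        self,
        validators: list[Validator],
        votes: set[str],
        key: DataKey,
        data: bytes,
        store: Callable[[bytes], str],
    ) -> str | None:
        """Store `data` and index its reference only after a >2/3
        stake-weighted majority has voted for it."""
        if not has_supermajority(validators, votes):
            return None
        ref = store(data)  # hypothetical: an upload to a Swarm node in practice
        self.index[key] = ref
        return ref

    def lookup(self, key: DataKey) -> str | None:
        # Clients get the mapping from the Data Chain, then fetch from Swarm.
        return self.index.get(key)
```

With three active validators of equal stake, two votes give exactly 2/3 of the power and are not enough; all three are needed. In a real deployment, consensus would come from the chosen framework (for example CometBFT, as suggested under Implementation) and `store` would upload to Swarm and return the Swarm reference.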
## Implementation

To start with, the Data Chain can be built with CometBFT, with the Cosmos SDK on top of it. We can start with a few validators and then increase their number as we test. BZZ can be brought in using a bridge from Ethereum Layer 1 so that validators can stake and other users can delegate to them to run the network.

## Copyright

Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/).