# blockchain on swarm 2022

This document describes a product that demonstrates the power of swarm and provides a long-desired yet still unsolved solution for blockchain data availability. The ethereum state uses a data structure called the Patricia trie and references nodes by the keccak hash of their RLP serialisation. Patricia trie nodes naturally lend themselves to being represented by chunks in swarm. The idea is to provide an implementation of the ETH RPC API using existing client implementations, but switching out local disk access of trie nodes by keccak hash for swarm retrieval.

### objectives and potential

- redecentralise the broken p2p model of ETH state provision to dapps
- unburden eth miners from syncing traffic, with a potential improvement in transaction throughput
- combined with swarm decentralised database services, serve as a decentralised (backend for a) chain explorer
- eliminate the problem of full/archival nodes; moreover, it allows any swarm node to run a gateway and serve an archival API
- incentives for light servers

### architecture

The product can be implemented in several ways, but the simplest approach seems to be a stand-alone proxy server that talks to a swarm node using the chunk API.

Retrieval functionality:

- independent of bee, provides an effective ETH RPC API endpoint
- uses the bee chunk API to retrieve Patricia trie nodes

Seeding functionality:

- potentially exposes the ETH API for profit
- has a connection to a real ETH node and
  - retrieves missing ETH state trie nodes from there and
  - stores them in its own cache AND
  - potentially uploads them with a postage stamp

### implementation

We could use go-ethereum's libs that implement the API endpoints (essentially doing state trie traversals) as a backend of the proxy. The problem is that go-ethereum uses a series of optimisations to enhance disk-based storage of state with leveldb, such as tiling and block-indexing by blocknumber+hash.
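The storage swap at the heart of the proxy can be sketched as a minimal node-reader abstraction: wherever the trie traversal would hit leveldb by keccak hash, it instead fetches a chunk from a bee node. This is a sketch under stated assumptions: `NodeReader`, `SwarmNodeReader`, and `chunkAddress` are illustrative names (not go-ethereum or bee APIs), and the keccak-to-chunk-address mapping is left as a placeholder since it depends on the special SOC scheme discussed below.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
)

// NodeReader is the minimal interface the proxy's trie traversal needs:
// given the keccak hash of a trie node's RLP serialisation, return its bytes.
// In go-ethereum this role is played by the key-value store backing the trie
// database; here it is satisfied by swarm retrieval instead.
type NodeReader interface {
	Node(keccakHash [32]byte) ([]byte, error)
}

// SwarmNodeReader fetches trie nodes through a bee node's HTTP chunk API.
type SwarmNodeReader struct {
	beeURL string // e.g. "http://localhost:1633"
}

// Node maps the keccak hash to a swarm chunk address and fetches the chunk.
func (r *SwarmNodeReader) Node(keccakHash [32]byte) ([]byte, error) {
	addr := chunkAddress(keccakHash)
	resp, err := http.Get(fmt.Sprintf("%s/chunks/%x", r.beeURL, addr))
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("chunk not found: status %d", resp.StatusCode)
	}
	return io.ReadAll(resp.Body)
}

// chunkAddress is a placeholder: the real keccak-hash-to-chunk-address
// mapping is defined by the special SOC scheme, not by identity.
func chunkAddress(keccakHash [32]byte) [32]byte {
	return keccakHash
}
```

With this split, the go-ethereum-derived traversal code never needs to know whether a node came from disk or from the network, which is what keeps the required adaptation small.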
Ideally, we only need a slight adaptation of the go client's ETH API and state-trie related libraries, switching only the leveldb storage to swarm API client calls.

### DISC-level (layer 1) prerequisites

In order to do traversal on state nodes, swarm needs to provide retrieval by keccak hash. For this we need a genuine way to map keccak hashes to swarm chunks. SOCs as they are currently defined would require permissioned seeding, so a novel type of SOC is devised and needs to be implemented by bee:

- a special SOC (with a deterministic relation between SOC ID and content) to facilitate retrieval by eth hashes from swarm
- the postage stamp signature to include the BMT hash of the content, in order for storage incentives to cover special SOCs

### bootstrapping

- if the proxy has access to a traditional ETH node, then missing trie nodes are
  - requested from the ETH node
  - cached
  - uploaded to swarm
- this costs money, so it can be incentivised
  - taking money for remote API services
  - keeping quality by uploading
- this way bootstrapping is optimised for demand
- possibly a gateway run by the foundation

### security

- fairly low liability since it is only an optimisation
- completely reproducible dataset
- not dependent on postage stamps and storage incentives

### MVP description by Dani

We outline how the Swarm network can possibly be used by Ethereum light clients as a state cache and a load balancer. This proposal seems to be the lowest hanging fruit, requiring the minimum of implementation effort on both sides (geth and bee) while resulting in the most value for users.

#### Preliminaries

Ethereum's state consists of many small binary blobs addressed by their Keccak hashes. The vast majority of these blobs are less than 4 kilobytes in size. Requests for such state data constitute the bulk of light client requests to light servers. Swarm has been specifically engineered to retrieve and cache binary blobs (called "chunks") that are at most 4 kilobytes in size.
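The special SOC from the DISC-level prerequisites above hinges on a deterministic relation between SOC ID and content: if the ID is defined as the keccak hash of the payload and the owner is a fixed, well-known address, anyone holding an eth trie-node hash can compute the swarm address, and anyone can seed the chunk without forging content. A rough sketch of that relation follows; the fixed owner address is an assumption, and sha256 stands in for keccak256 purely to keep the sketch stdlib-only (swarm itself uses keccak256).

```go
package main

import "crypto/sha256"

// specialOwner is an assumed fixed, well-known owner address under which
// all special SOCs are seeded, making seeding permissionless.
var specialOwner = [20]byte{}

// hashStandIn plays the role of keccak256 in this sketch.
func hashStandIn(data []byte) [32]byte { return sha256.Sum256(data) }

// specialSOCAddress maps an ethereum trie-node hash to a swarm chunk
// address the way a SOC address is formed: H(id || owner), where the id
// of a special SOC is defined to be the hash of the payload itself.
func specialSOCAddress(ethHash [32]byte) [32]byte {
	return hashStandIn(append(ethHash[:], specialOwner[:]...))
}

// validSpecialSOC checks the deterministic id-content relation: the SOC id
// must equal the hash of the payload, so content served under an eth hash
// cannot be forged even though anyone may upload it.
func validSpecialSOC(id [32]byte, payload []byte) bool {
	return id == hashStandIn(payload)
}
```

The validity check is what replaces the owner's signature as the integrity guarantee, which is exactly why this SOC variant needs explicit support in bee and in the postage stamp signature.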
While Swarm's content addressing is different from Ethereum's, it is relatively easy to define an extension to Swarm that would allow retrieving a specific chunk payload by its Keccak hash. If light clients could request such data from Swarm, which in turn would either serve it from cache or, in the absence of a cached copy, request it from different Ethereum nodes (not necessarily light servers), it would considerably alleviate the load on light servers and thus improve the overall user experience of using light clients.

#### Protocol Outline

- When requesting a blob of state data, the light client first turns to Swarm.
- The Swarm request either gets served from cache or gets routed to the neighborhood that is closest in the XOR metric to the content address of the blob.
- The node in the neighborhood that receives the request requests the given blob from the Ethereum node to which it is connected. In case the blob is found and is no greater than 4 kilobytes, it is cached and served as a response.
- If the blob is not found or exceeds 4 kB in size, a response to this effect is routed back to the requesting light client, which then requests the blob directly from a full client, if applicable.

The caching and serving of state data is implicitly incentivized by the bandwidth incentives already present in Swarm. Storage incentives are not involved, so there is no need to wait for them in order to work on this proposal.
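The serving node's side of the protocol outline can be sketched as follows. This is illustrative only, not bee's actual request handling: `Fetcher`, `StateServer`, and the in-memory cache are hypothetical names standing in for the connection to the attached Ethereum node and for swarm's real chunk store.

```go
package main

// maxChunkSize is the 4 kB limit on swarm chunk payloads.
const maxChunkSize = 4096

// Fetcher returns a state blob from the Ethereum node this swarm node is
// connected to, reporting whether the blob was found.
type Fetcher func(keccakHash [32]byte) ([]byte, bool)

// StateServer models a swarm node in the neighborhood closest to a blob's
// content address, with a local cache and a connected Ethereum node.
type StateServer struct {
	cache map[[32]byte][]byte
	eth   Fetcher
}

func NewStateServer(eth Fetcher) *StateServer {
	return &StateServer{cache: make(map[[32]byte][]byte), eth: eth}
}

// Serve follows the protocol outline: serve from cache if possible,
// otherwise fetch from the connected Ethereum node and cache the result.
// Missing blobs and blobs over 4 kB yield a negative response, telling the
// light client to fall back to a full client.
func (s *StateServer) Serve(hash [32]byte) ([]byte, bool) {
	if blob, ok := s.cache[hash]; ok {
		return blob, true
	}
	blob, ok := s.eth(hash)
	if !ok || len(blob) > maxChunkSize {
		return nil, false
	}
	s.cache[hash] = blob
	return blob, true
}
```

Note that the cache fills on demand, so the bandwidth incentives mentioned above reward exactly the nodes that answer popular state requests.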