# Note on Data Availability Sampling (DAS)

> :bulb: This blog draws comparisons between DAS proposals (as of 26th May '24) and may become stale later.
> Thanks to the codex.storage team, the Prysm team, and various research blogs on ethresear.ch for providing me with the right context on DAS.

### What is DAS?

Data Availability Sampling works by picking random samples of small portions of block data and, relying on the properties of erasure-coded reconstruction, verifying with high probability that the data for the full block is available to anyone interested - without requiring any single node in the system to hold the whole block. This means that light clients can contribute to network security without storage requirements like those of operating a full node.

There are a few different avenues in which DAS is being implemented, each aiming at different optimisations. Before diving into individual proposals, let's first understand the common features these proposals share.

**[DAS data structure]()**

Most of the proposals refer to the 2D data availability encoding from the original Danksharding proposal: a square structure using the same [Reed-Solomon](https://en.wikipedia.org/wiki/Reed–Solomon_error_correction) code with $K=256$ and $N=512$ in both dimensions. This means that any row or column can be fully recovered if at least $K=256$ of its segments are available.

![DAS_data_structure](https://hackmd.io/_uploads/BJRKYyWVA.png)

SubnetDAS is the exception: it employs a 1D extension instead of the 2D extension, which is discussed further in this article.

**[Interest]()**

An *interest* is the data segment that is delivered to a node. There are two types of interest, based on the client's sync method:

1. **custody**: segments that participating full-sync nodes receive, both to serve samples to light clients or non-full-sync nodes and to provide availability at the same time.
2. **sample**: segments delivered to a node so that it can convince itself, with high probability, that the data is available. Sample segments can be delivered by any of the various nodes that hold custody.

Interest can change over time, for example from epoch to epoch for sample selection, or with some other time granularity for custody. Changing it or keeping it fixed has both security and network-efficiency implications.

**[DAS phases]()**

The DAS function can be fundamentally bifurcated into two phases:

1. **dispersal**: the block is sharded and distributed over the p2p network to provide availability and custody.
2. **sampling**: nodes collect samples from custody nodes (see the sketch below).
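To get a feel for the sampling phase, here's a minimal sketch in Python (illustrative, not taken from any of the proposals). Following the analysis in the LossyDAS post, the smallest erasure pattern that makes the 2D square unrecoverable is the intersection of $(N-K+1)$ rows and $(N-K+1)$ columns, i.e. $257^2$ cells, slightly over 25% of the square, so in the worst case each uniform random sample hits a withheld segment with probability of about 0.25:

```python
# Worst-case confidence of uniform random sampling over the 2D structure
# described above (sampling with replacement, so this is an upper bound).
K, N = 256, 512                        # original / extended size per dimension

# Smallest unrecoverable erasure pattern: the intersections of
# (N - K + 1) rows with (N - K + 1) columns.
min_unrecoverable = (N - K + 1) ** 2   # 257^2 = 66,049 cells
p_hit = min_unrecoverable / N ** 2     # ~25.2% of the 512x512 square

def false_positive_bound(samples: int) -> float:
    """Chance that `samples` random queries all succeed even though the
    data is unrecoverable (worst-case adversarial erasure pattern)."""
    return (1 - p_hit) ** samples

for s in (20, 50, 73, 100):
    print(f"{s:>3} samples -> false positive <= {false_positive_bound(s):.1e}")
```

Roughly 73 samples already push the worst-case false-positive probability below $10^{-9}$, which is why a node can convince itself of availability while downloading only a tiny fraction of the block.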
Here's a quick dive into how the proposals are largely similar but differ subtly to cater to different requirements.

### [FullDAS](https://ethresear.ch/t/fulldas-towards-massive-scalability-with-32mb-blocks-and-beyond/19529)

Danksharding was planned for 32MB blocks, but the current networking stack can't handle that and becomes the bottleneck. With hardware-accelerated KZG on the horizon for block encoding, the networking stack will need to scale even more. Data Availability Sampling (DAS) encompasses two different concepts: data availability achieved by dispersal to custody, and sampling from custody. This distinction can be leveraged to design an efficient dispersal and sampling protocol. liteDAS is a sampling protocol designed to provide low-latency, bandwidth-efficient, and robust sampling.

Dispersal can be done with protocols similar to GossipSub, though modifications are necessary. The combination of deterministic custody assignments and topic routing enables fast peer discovery for both dispersal and sampling. To facilitate sampling, Ephemeral Connect, currently unsupported by the existing stack, should be enabled. 2D encoding (or another form of locally repairable code) is required for in-network repair, which is key to availability amplification.

### [LossyDAS](https://ethresear.ch/t/lossydas-lossy-incremental-and-diagonal-sampling-for-data-availability/18963)

Sharding and Data Availability Sampling (DAS) have been under development for some time. While initial papers and the Danksharding proposal focused on a two-dimensional Reed-Solomon (RS) encoding, recent posts propose a transitional one-dimensional encoding. This discussion focuses on aspects of sampling and its probabilistic guarantees, proposing three system improvements. The techniques described have been developed for the original 2D RS encoding but can also apply to transitional proposals.

First, LossyDAS is introduced, providing the same probabilistic guarantees as random uniform sampling with added error tolerance in exchange for slightly larger sample sizes. Second, IncrementalDAS is discussed, forming the basis for a dynamic sampling strategy with increasing sample sizes. Finally, Diagonal Sampling for Data Availability (DiDAS) is introduced, a method that improves sampling performance over the specific 2D erasure code, offering better performance than uniform random sampling for worst-case "adversarial" erasure patterns.

### [PeerDAS](https://ethresear.ch/t/peerdas-a-simpler-das-approach-using-battle-tested-p2p-components/16541)

The intent of the PeerDAS design is to reuse well-known, battle-tested peer-to-peer components already in production in Ethereum to bring additional data availability (DA) scale beyond that of EIP-4844, while keeping the minimum amount of work for honest nodes similar to EIP-4844 (downloading less than 1MB per slot). This exploration aims to better understand the scale achievable with a relatively simple network structure and various distributions of node types, without relying on a more advanced Distributed Hash Table (DHT)-like solution.

### [SubnetDAS](https://ethresear.ch/t/subnetdas-an-intermediate-das-approach/17169)

This is a subnet-based Data Availability Sampling (DAS) proposal designed to bridge the gap between EIP-4844 and full Danksharding, similar to PeerDAS. The proposal involves creating a subnet for each sample, with nodes obtaining their samples by connecting to these subnets. This approach aims to increase scalability without jeopardizing the network's liveness or increasing node requirements compared to EIP-4844.

However, this approach sacrifices the unlinkability of queries. To retrieve samples, nodes must join the corresponding subnets, exposing their queries in the process. This exposure allows an attacker to publish only those samples, misleading nodes into believing data is available when it is not. While query linkability is a common issue in all DAS constructions, it is particularly challenging to address in a subnet-based model. Although this does not affect the availability guarantees of the entire chain or the safety of rollups, it weakens the safety of full nodes against double spends. The issue is discussed in detail in the proposal, with the argument that this weakening is not as significant as it might seem and that it could be a reasonable trade-off for an intermediate DAS solution.
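To make the linkability concern concrete, here is a hypothetical sketch of how per-epoch sample queries could map to subnet subscriptions. The constants (`NUM_SUBNETS`, `SAMPLES_PER_NODE`) and the hash-based derivation are illustrative assumptions, not taken from the SubnetDAS spec:

```python
import hashlib

NUM_SUBNETS = 64         # hypothetical: one subnet per sample index
SAMPLES_PER_NODE = 8     # hypothetical: samples each node queries per epoch

def sample_subnets(node_id: bytes, epoch: int) -> list[int]:
    """Derive the subnets (one per sample) a node joins for an epoch.

    Because subnet subscriptions are visible on the p2p layer, an observer
    learns exactly which samples this node will query -- and an attacker
    can publish only those samples to make data look available to it.
    """
    subnets: list[int] = []
    counter = 0
    while len(subnets) < SAMPLES_PER_NODE:
        seed = node_id + epoch.to_bytes(8, "big") + counter.to_bytes(8, "big")
        subnet = int.from_bytes(hashlib.sha256(seed).digest(), "big") % NUM_SUBNETS
        if subnet not in subnets:  # distinct subnets, i.e. distinct samples
            subnets.append(subnet)
        counter += 1
    return subnets

print(sample_subnets(b"example-node-id", epoch=4242))
```

Whether the mapping is deterministic (as sketched) or randomly chosen by the node, the subscriptions themselves leak it, which is the root of the unlinkability trade-off described above.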
## References

- [Introduction to PeerDAS](https://ethresear.ch/t/peerdas-a-simpler-das-approach-using-battle-tested-p2p-components/16541)
- [Dankrad's blog on data availability checks](https://dankradfeist.de/ethereum/2019/12/20/data-availability-checks.html)
- [Fradamt introducing SubnetDAS](https://ethresear.ch/t/subnetdas-an-intermediate-das-approach/17169)
- [FullDAS proposal](https://ethresear.ch/t/fulldas-towards-massive-scalability-with-32mb-blocks-and-beyond/19529)
- [LossyDAS proposal](https://ethresear.ch/t/lossydas-lossy-incremental-and-diagonal-sampling-for-data-availability/18963)
- [Codex.storage blog on DAS](https://blog.codex.storage/data-availability-sampling/)