---
title: 'PeerDAS: Everything you need to know'
disqus: blog
---
Everything PeerDAS: A complete guide for upcoming devnet
===
## Table of Contents
[TOC]
## Summary
In this blog I discuss everything related to PeerDAS: what it is, why it is necessary, the general idea proposed for its construction, what DAS is, how sampling is conducted, various methods to improve sampling, and the current consensus client implementations.
## PeerDAS: A general overview
PeerDAS (Peer-to-Peer Data Availability Sampling) is an approach designed to enhance the scalability and reliability of Ethereum's data availability layer by utilizing existing, proven peer-to-peer components. It is necessary because as Ethereum scales, the amount of data that needs to be verified for availability increases, potentially overwhelming individual nodes if they were required to download and verify all data. PeerDAS addresses this challenge by distributing the workload across multiple nodes, each responsible for verifying only a subset of the data, thereby ensuring that the network can handle larger volumes of data efficiently without compromising on data availability or security.
## Construction
The basic construction of PeerDAS can be divided into the following six components (these deal with the p2p implementation of PeerDAS, not the DAS construction itself):
```graphviz
digraph hierarchy {
nodesep=1.0 // increases the separation between nodes
node [color=Red,fontname=Courier,shape=box] // all nodes will have this shape and colour
edge [color=Blue, style=dashed] //All the lines look like this
Custody->{PeerDiscovery}
PeerDiscovery->{RowColumnGossip}
RowColumnGossip->PeerSampling
PeerSampling->PeerScoring // peer scoring builds on sampling results
}
```
with the following spec parameters: `NUMBER_OF_ROWS_AND_COLUMNS`, `SAMPLES_PER_ROW_COLUMN`, `CUSTODY_REQUIREMENT`, `SAMPLES_PER_SLOT`, and `NUMBER_OF_PEERS`; the uses of these parameters are discussed in detail alongside the construction.
### Custody
**Overview**
Custody in PeerDAS refers to the responsibility of nodes to store and serve specific pieces of data (samples) from the blockchain. Each node in the network is assigned to download and custody a minimum number of rows and columns from each block.
**How it Works**
1. **Assignment of Rows and Columns:** Each node is pseudo-randomly assigned specific rows and columns to custody. This assignment is determined by the node's ID, the current epoch, and the custody size parameter (see the sketch after this list).
2. **Custody Requirement:** Nodes are required to download and serve at least the minimum number of samples specified by `CUSTODY_REQUIREMENT`. They may choose to handle more, advertising their increased capacity through the peer discovery mechanism.
3. **Duration:** Nodes store the custodied data for the duration of the pruning period, which is the time frame in which data must be kept before it can be safely discarded.
4. **Serving Requests:** Nodes respond to peer requests for samples from the rows and columns they have been assigned to custody. This ensures data availability across the network.
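To make the assignment concrete, here is a minimal Python sketch of a deterministic custody derivation. It only illustrates the idea of hashing the node ID until enough distinct columns are drawn; the hashing scheme and parameter values are illustrative assumptions, not the exact EIP-7594 algorithm, and the epoch input mentioned above is omitted for brevity.
```python
from hashlib import sha256

NUMBER_OF_ROWS_AND_COLUMNS = 128  # assumed value, for illustration only
CUSTODY_REQUIREMENT = 4           # assumed minimum number of custodied columns

def get_custody_columns(node_id: int, custody_size: int = CUSTODY_REQUIREMENT) -> list[int]:
    """Deterministically derive which columns a node must custody.

    Any peer can recompute this from the node ID alone, which is what
    makes custody verifiable later during peer scoring.
    """
    assert custody_size >= CUSTODY_REQUIREMENT
    columns: set[int] = set()
    counter = 0
    while len(columns) < custody_size:
        # Hash the node ID with a counter until enough distinct columns are drawn.
        digest = sha256(node_id.to_bytes(32, "big") + counter.to_bytes(8, "big")).digest()
        columns.add(int.from_bytes(digest[:8], "big") % NUMBER_OF_ROWS_AND_COLUMNS)
        counter += 1
    return sorted(columns)
```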
### Peer Discovery
**Overview**
Peer discovery is the process by which nodes find and maintain connections with other nodes in the network. Effective peer discovery is crucial for ensuring that nodes can efficiently request and receive data samples.
**How It Works**
1. **Maintaining Peers:** Each node aims to maintain a diverse and reliable set of peers, with at least a target number of peers specified by `NUMBER_OF_PEERS`.
2. **Custody Distribution:** Nodes seek peers with a variety of custody distributions, meaning they look for peers handling different rows and columns to ensure comprehensive data coverage.
3. **Discovery Mechanisms:**
* **DHT-based Discovery:** Utilizing a Distributed Hash Table (DHT) to find peers. The current Ethereum network uses discv5 for similar purposes.
* **Gossipsub:** Leveraging libp2p gossipsub for latent peer discovery to add redundancy and resilience against attacks on peer discovery methods.
4. **Peer Capacity:** Nodes prioritize finding peers that can meet their sampling requirements without overly centralizing the network, maintaining a balance across node capacities.
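A sketch of the coverage check behind point 2 is shown below: the union of your peers' custody assignments should cover every column, and discovery keeps running until it does. The data shapes and constants here are assumptions for illustration.
```python
NUMBER_OF_ROWS_AND_COLUMNS = 128  # assumed value, as in the custody sketch
NUMBER_OF_PEERS = 70              # assumed target peer count

def uncovered_columns(peer_custody: dict[str, set[int]]) -> set[int]:
    """Return the columns no current peer custodies.

    `peer_custody` maps a peer ID to the columns it advertises
    (recomputable from its node ID via the custody function).
    """
    covered: set[int] = set()
    for columns in peer_custody.values():
        covered |= columns
    return set(range(NUMBER_OF_ROWS_AND_COLUMNS)) - covered

def needs_more_peers(peer_custody: dict[str, set[int]]) -> bool:
    # Keep searching if either the peer count or the column coverage falls short.
    return len(peer_custody) < NUMBER_OF_PEERS or bool(uncovered_columns(peer_custody))
```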
### Row/Column Gossip
**Overview**
Row and column gossip refers to the communication channels used by nodes to disseminate and receive data samples related to specific rows and columns they custody.
**How it works**
1. **Gossip Subnets:** The network is divided into gossip subnets for each row and column (e.g., column_X and row_Y).
2. **Joining Subnets:** Nodes join the gossip subnets corresponding to the rows and columns they are assigned to custody.
3. **Gossiping Samples:** Nodes publish verifiable samples on their assigned subnets. This allows other nodes to receive the necessary samples for data availability verification.
4. **Reconstruction and Cross-seeding:** If a node does not receive all samples for its assigned rows/columns but can reconstruct them (based on coding rates), it gossips the reconstructed samples. Additionally, nodes can cross-seed samples they have obtained via other methods (e.g., ancillary gossip).
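Below is a minimal sketch of the reconstruction trigger in point 4, assuming a rate-1/2 Reed-Solomon extension (so any half of an extended column determines the rest). The constants and helper names are illustrative; `recover_cells` stands in for the client's actual RS decoder.
```python
EXTENDED_COLUMN_LENGTH = 128  # assumed cells per extended column (2x the original data)
RECONSTRUCTION_THRESHOLD = EXTENDED_COLUMN_LENGTH // 2  # rate-1/2 RS extension

def can_reconstruct(received_indices: set[int]) -> bool:
    """Any half of a rate-1/2 extended column determines the whole column."""
    return len(received_indices) >= RECONSTRUCTION_THRESHOLD

def maybe_cross_seed(received: dict[int, bytes], recover_cells, publish) -> None:
    """If the column is incomplete but recoverable, reconstruct the missing
    cells and gossip them back onto the subnet.

    `recover_cells` stands in for the Reed-Solomon decoding routine and
    `publish` for the gossipsub publish call.
    """
    if len(received) < EXTENDED_COLUMN_LENGTH and can_reconstruct(set(received)):
        for index, cell in recover_cells(received).items():
            if index not in received:
                publish(index, cell)
```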
### Peer Sampling
**Overview**
Peer sampling involves querying peers for specific data samples to ensure data availability and integrity across the network.
**How it works**
1. **Querying Peers:** At each slot, nodes make a predetermined number of sample queries to their peers.
2. **Sample Requests:** Nodes use the `custodied_rows()` and `custodied_columns()` functions to determine which peers to query for specific samples.
3. **`DO_YOU_HAVE` Packet:** Nodes send a `DO_YOU_HAVE` packet to peers to request samples. Peers respond with a bitfield indicating the samples they possess.
4. **Handling Responses:** Upon receiving a sample, nodes distribute it to other peers that did not previously have it, based on their initial `DO_YOU_HAVE` responses.
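A rough sketch of this exchange follows; apart from `DO_YOU_HAVE` itself, the function and field names are assumptions about the message shape, not the actual wire format.
```python
def build_do_you_have_response(local_samples: set[int], requested: list[int]) -> list[bool]:
    """Answer a DO_YOU_HAVE query with a bitfield over the requested indices."""
    return [index in local_samples for index in requested]

def on_sample_received(index: int, sample: bytes,
                       peer_bitfields: dict[str, list[bool]],
                       requested: list[int], send) -> None:
    """After retrieving a sample, forward it to every peer whose earlier
    DO_YOU_HAVE response showed it was missing (point 4 above)."""
    position = requested.index(index)
    for peer, bitfield in peer_bitfields.items():
        if not bitfield[position]:
            send(peer, index, sample)
```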
### Peer Scoring
**Overview**
Peer scoring is the mechanism by which nodes evaluate the reliability and honesty of their peers based on their responses to sample requests.
**How it works**
1. **Deterministic Custody Verification:** Nodes know exactly what samples a peer should have based on deterministic custody functions.
2. **Scoring Criteria:**
* **Honesty:** If a peer does not respond with the correct samples they are supposed to custody, they are scored negatively.
* **Bandwidth:** If a peer fails to respond due to bandwidth issues, they are considered less reliable.
* **Sample Acquisition:** Peers that consistently fail to acquire the necessary samples are also downscored.
3. **Actions Based on Scores:** Nodes may downscore or disconnect from peers that do not consistently meet sampling requirements, optimizing for better peer connections.
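The sketch below shows one way the rules above could be encoded. The penalty weights and threshold are invented for illustration; real clients tune these within their own peer scoring systems.
```python
from dataclasses import dataclass

# Illustrative weights only; real clients tune these values.
MISSING_CUSTODY_PENALTY = -10.0  # peer lacked a sample its custody assignment requires
TIMEOUT_PENALTY = -2.0           # no response at all, likely a bandwidth problem
DISCONNECT_THRESHOLD = -50.0

@dataclass
class PeerScore:
    score: float = 0.0

    def on_sample_request(self, responded: bool, had_required_sample: bool) -> None:
        if not responded:
            self.score += TIMEOUT_PENALTY
        elif not had_required_sample:
            # Custody is deterministic, so a missing required sample is
            # evidence of dishonesty rather than bad luck.
            self.score += MISSING_CUSTODY_PENALTY

    def should_disconnect(self) -> bool:
        return self.score <= DISCONNECT_THRESHOLD
```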
### DAS Providers
**Overview**
DAS providers are high-capacity or super-full nodes that offer consistent and reliable data availability services to the network.
**How it works**
1. **High Capacity Nodes:** DAS providers are nodes that handle a significant portion of the network's data, either as high-capacity nodes (handling more than the baseline requirement) or super-full nodes (handling 100% of the data).
2. **Discovery and Configuration:** These nodes can be found through the normal peer discovery process or configured directly by other nodes for prioritized connections.
3. **Public Goods:** Some organizations, like L2 DAOs, might support several super-full nodes as a public good, and these can be added to nodes' configurations to enhance DAS service quality.
4. **Redundancy and Security:** Direct peering with DAS providers complements other peer discovery mechanisms, reducing the risk of attacks and improving the overall robustness of data availability.
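As a sketch of point 2, a client could keep a user-supplied list of provider peer IDs and rank dial candidates so providers come first. The configuration shape and ranking rule below are hypothetical.
```python
# Hypothetical operator configuration: peer IDs of trusted DAS providers,
# e.g. super-full nodes run as a public good.
TRUSTED_DAS_PROVIDERS = {"provider-peer-id-1", "provider-peer-id-2"}

def rank_peer_candidates(candidates: dict[str, int]) -> list[str]:
    """Order dial candidates so trusted providers come first, then peers by
    advertised custody count (larger custody = broader column coverage).

    `candidates` maps a peer ID to its advertised custody count.
    """
    return sorted(
        candidates,
        key=lambda peer: (peer in TRUSTED_DAS_PROVIDERS, candidates[peer]),
        reverse=True,
    )
```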
We have discussed how the p2p components for implementing PeerDAS will be developed; now let's study the DAS part of PeerDAS and its stages of development.
## Stages of DAS Development
Achieving full Danksharding is divided into the following three stages:
1. **stage-0:** stage 0 is the release of **EIP-4844**.
Subnets are introduced to distribute blobs, but nodes are required to participate in all subnets and download all the data. There is no Data Availability Sampling (DAS) at this stage. (We're currently at this stage of development.)

2. **stage-1:** also called **1-D PeerDAS**, this stage is expected to follow with the upcoming **Electra fork**; all consensus client teams have currently implemented stage-1 of PeerDAS development.
Blobs are extended horizontally in a 1D manner. Blob distribution is sharded by introducing column subnets, and nodes participate in only a subset of these subnets. Blob/row subnets might be phased out at this stage. Networking components for peer sampling are introduced, including the discovery of sampling peers and peer sampling using a request/response protocol. Nodes perform Data Availability Sampling (DAS) on columns, and the number of column subnets each node participates in is selected so that a node's peer set can reliably cover all columns. During this stage, the maximum blob count is expected to gradually increase from 32 to 64 and possibly to 128.

3. **stage-2:** also called **2-D PeerDAS** or **full Danksharding**, this stage is the ultimate goal of scaling the data availability layer of Ethereum.
In this stage, we implement full Danksharding, with a focus on peer sampling as the core of the sampling infrastructure. Blobs are extended in two dimensions (2D), adding a vertical extension to the existing horizontal extension. Peer sampling now operates on cells within the extended matrix rather than entire columns, making the process lightweight and resulting in negligible bandwidth consumption. Additionally, light sampling nodes that do not participate in the distribution of data may be supported.
Distributed reconstruction, when implemented, enhances robustness against subnet failures, ensuring greater data availability and reliability. During this stage, the initial blob count is expected to start at 64 or even 128, with a gradual increase towards the full Danksharding maximum throughput of 256 blobs. This incremental approach allows for continuous monitoring and optimization of the system's performance and scalability.

## Further efficiency improvements in sampling mechanisms
The sampling efficiency and redundancy of the DAS methods discussed above can be further improved with a few mathematical techniques, described in the following three strategies:
### LossyDAS (Lossy Data Availability Sampling)
**Concept:**
* LossyDAS involves tolerating a small number of missing segments in the sampling process. This method adjusts the sample size to maintain the same probabilistic guarantees for data availability despite some segments being unavailable.
**Mechanism:**
* **Sample Size Adjustment:** Instead of requiring all sampled segments to be present, LossyDAS allows for a predefined number of segments to be missing. This adjustment increases the sample size slightly to account for potential losses.
* **Error Tolerance:** By allowing a few segments to be missing, the system becomes more robust to transient network errors and temporary unavailability of segments.
**Mathematical Foundation:**
* The false positive (FP) rate, which is the probability of a non-available block passing the test, is a critical parameter. LossyDAS calculates the required sample size $s$ based on the target FP rate and the number of allowed missing segments $m$.
* Using the hypergeometric distribution, LossyDAS determines the probability of successfully sampling the required number of segments despite allowing some losses.
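This calculation can be sketched in a few lines of Python. Here the worst-case adversary is assumed to withhold the cheapest erasure that defeats reconstruction of a 2D rate-1/2 extension (just over a quarter of the cells); the matrix size is an illustrative assumption.
```python
from scipy.stats import hypergeom

def lossy_sample_size(n_cells: int, n_unavailable: int,
                      allowed_missing: int, target_fp: float) -> int:
    """Smallest sample size s such that an unavailable block passes the test
    (at most `allowed_missing` sampled cells missing) with probability at
    most `target_fp`, sampling uniformly without replacement."""
    for s in range(allowed_missing + 1, n_cells + 1):
        # X ~ Hypergeometric: number of unavailable cells among s draws from n_cells.
        if hypergeom.cdf(allowed_missing, n_cells, n_unavailable, s) <= target_fp:
            return s
    return n_cells

# Example: a 256x256 block extended to a 512x512 matrix. The cheapest erasure
# that prevents reconstruction withholds 257 cells in each of 257 rows,
# i.e. 257**2 cells (~25.2% of the matrix).
N, U = 512 * 512, 257 ** 2
print(lossy_sample_size(N, U, allowed_missing=0, target_fp=1e-9))  # classic DAS
print(lossy_sample_size(N, U, allowed_missing=2, target_fp=1e-9))  # LossyDAS, m=2
```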
**Benefits:**
* **Increased Robustness:** The system can handle transient errors more effectively, reducing the need for perfect network conditions.
* **Flexibility:** Nodes can sample more segments initially, anticipating some losses, which leads to a more reliable data availability test.
### IncrementalDAS (Incremental Data Availability Sampling)
**Concept:**
* IncrementalDAS dynamically adjusts the sample size based on the initial sampling results. If the initial sample does not meet the availability criteria, the sample size is incrementally increased.
**Mechanism:**
* **Dynamic Sampling:** The node begins with a smaller sample size and a tolerance for missing segments. If the sample fails to provide sufficient data, the node increases the sample size and attempts to retrieve additional segments.
* **Conditional Probability:** The extended sample includes previously sampled segments, ensuring that the probabilistic guarantees remain valid.
**Procedure:**
* **Initial Sample:** Select an initial sample size $s(l_1)$ and allow $l_1$ losses. If the test fails (i.e., not enough segments are retrieved), the node increases the sample size to $s(l_2)$, allowing $l_2$ losses, and samples additional segments.
* **Iterative Extension:** This process can be repeated, incrementally increasing the sample size until the required data availability is confirmed or a predetermined limit is reached.
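A sketch of this iterative procedure is shown below, assuming a precomputed schedule of (sample size, allowed losses) pairs, each of which alone meets the target FP rate; all names here are illustrative.
```python
import random

TOTAL_CELLS = 512 * 512  # assumed extended-matrix size, as in the LossyDAS sketch

def incremental_das(retrieve, schedule) -> bool:
    """`retrieve(indices)` returns the subset of indices actually obtained;
    `schedule` is a list of (sample_size, allowed_losses) pairs with
    non-decreasing sample sizes, e.g. precomputed with lossy_sample_size."""
    sampled: set[int] = set()
    for sample_size, allowed_losses in schedule:
        # Extend the existing sample rather than redrawing it, so earlier
        # queries are reused and the conditional guarantees stay valid.
        remaining = [i for i in range(TOTAL_CELLS) if i not in sampled]
        sampled |= set(random.sample(remaining, sample_size - len(sampled)))
        received = retrieve(sampled)
        if len(sampled) - len(received) <= allowed_losses:
            return True   # availability confirmed at this level
    return False          # every level failed; treat the block as unavailable
```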
**Benefits:**
* **Adaptive Strategy:** Nodes can adapt to network conditions and transient errors by increasing the sample size only when necessary, reducing unnecessary data retrieval.
* **Improved Efficiency:** IncrementalDAS ensures that the sampling process is efficient by minimizing the initial sample size and only increasing it when needed.
### DiDAS (Distinct or Diagonal Data Availability Sampling)
**Concept:**
* DiDAS improves the sampling process by ensuring that sampled segments are distributed across distinct rows and columns of the 2D Reed-Solomon encoded grid. This method targets the detection of worst-case erasure patterns more effectively than uniform random sampling.
**Mechanism:**
* **Distinct Row and Column Sampling:** Instead of selecting segments uniformly at random, DiDAS ensures that each sampled segment is in a distinct row and column. This reduces the likelihood of missing critical data due to clustered erasures.
* **Enhanced Detection:** By distributing samples more evenly across the grid, DiDAS increases the chances of detecting worst-case erasures, where data is missing in specific patterns designed to evade detection.
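Generating such a sample amounts to drawing a random partial permutation, as the sketch below shows; `grid_size` refers to the side length of the extended square matrix.
```python
import random

def didas_sample(grid_size: int, sample_size: int) -> list[tuple[int, int]]:
    """Draw `sample_size` cells with pairwise-distinct rows and columns.

    Shuffling rows and columns independently and zipping them yields a
    random partial permutation, so clustered erasures confined to a few
    rows or columns cannot hide from the whole sample."""
    assert sample_size <= grid_size
    rows = random.sample(range(grid_size), sample_size)
    cols = random.sample(range(grid_size), sample_size)
    return list(zip(rows, cols))

# Example: 73 samples over a 512x512 extended matrix, one per row and column.
print(didas_sample(512, 73))
```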
**Mathematical Foundation:**
* The probability of detecting worst-case erasures using DiDAS can be modeled using a combination of hypergeometric distributions, providing a more accurate assessment of the sampling process's effectiveness.
**Benefits:**
* **Improved Performance:** DiDAS offers better detection of worst-case erasures compared to uniform random sampling, especially for larger sample sizes.
* **Simple Implementation:** Implementing DiDAS is straightforward, as it only requires ensuring that sampled segments are chosen from distinct rows and columns.
This guide is under active development; the consensus specs for EIP-7594 and implementation guides for Lighthouse and Prysm will be added shortly.