# PeerDAS: Scaling Ethereum's Data Availability Layer
Ethereum’s long-term scalability roadmap depends fundamentally on cheap and reliable data availability. The next pivotal milestone in this progression, after [Proto-Danksharding](https://eips.ethereum.org/EIPS/eip-4844), is **Peer Data Availability Sampling (PeerDAS)**. Introduced in the Fusaka upgrade and formalized in [EIP-7594](https://eips.ethereum.org/EIPS/eip-7594), PeerDAS significantly increases the network's blob throughput without raising each node's hardware requirements.
## Before Danksharding: The `CALLDATA` Era
### Rollup Architecture
Ethereum's Layer 2 solutions generally fall into two categories:
* Optimistic Rollups (Arbitrum, Optimism, etc.)
* Zero-Knowledge Rollups (Starknet, Scroll, etc.)
Both Optimistic and ZK rollups rely on an L1 smart contract and a batcher to facilitate data settlement. Prior to the introduction of blobs, batchers were restricted to using `CALLDATA` to post transaction data to the Ethereum L1.

### Disadvantages of `CALLDATA`
* **Irrelevant Computation:** Rollup data is used for off-chain reconstruction or dispute resolution, not for EVM execution. However, because `CALLDATA` is part of the execution layer, it is priced the same as input to smart contract execution.
* **Cost:** Posting data to `CALLDATA` is expensive for users, as it competes for the same limited gas as high-value L1 transactions.
* **Permanent Storage:** Ethereum nodes traditionally store transaction data forever, but rollups only need their data to be available for a few weeks. Using `CALLDATA` therefore permanently bloats the chain's history.
## The Introduction of Blobs
To address the limitations of `CALLDATA`, the Dencun upgrade introduced **Proto-Danksharding (EIP-4844)**. This fundamentally changed how Ethereum handles Layer 2 data by introducing a new transaction type that carries **blobs (Binary Large Objects)**.
### What are blobs?
Unlike `CALLDATA`, which lives in the execution layer, blob data is propagated and stored on the consensus layer. This separation allows Ethereum to treat rollup data differently from standard transactions.
### How Blobs Solved the Issues
* **Dedicated Fee Market:** Blobs have their own fee market and do not compete with smart contract execution for gas. This multi-dimensional fee market ensures that a spike in L1 activity doesn't automatically make L2 transactions more expensive (see the pricing sketch after this list).
* **Ephemeral Storage:** Blobs are automatically pruned from consensus nodes after roughly 18 days (4096 epochs). This window is long enough for rollups to verify data integrity, while protecting the network from the unbounded history growth of `CALLDATA`.
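
For intuition, here is the pricing rule behind that dedicated fee market, as specified in EIP-4844: the blob gas price is an integer approximation of `MIN_BLOB_GASPRICE * e^(excess_blob_gas / BLOB_GASPRICE_UPDATE_FRACTION)`. The sketch below uses the EIP's own `fake_exponential` helper and its original constants (later BPO forks retune them).

```python
# Blob gas pricing rule from EIP-4844 (original constants).
MIN_BLOB_GASPRICE = 1                    # wei
BLOB_GASPRICE_UPDATE_FRACTION = 3338477

def fake_exponential(factor: int, numerator: int, denominator: int) -> int:
    """Integer approximation of factor * e^(numerator / denominator),
    computed via the Taylor series exactly as specified in EIP-4844."""
    i = 1
    output = 0
    numerator_accum = factor * denominator
    while numerator_accum > 0:
        output += numerator_accum
        numerator_accum = (numerator_accum * numerator) // (denominator * i)
        i += 1
    return output // denominator

def blob_gasprice(excess_blob_gas: int) -> int:
    # The fee grows exponentially with the accumulated backlog of blob gas.
    return fake_exponential(MIN_BLOB_GASPRICE, excess_blob_gas,
                            BLOB_GASPRICE_UPDATE_FRACTION)
```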

### Limitations of blobs
While EIP-4844 was a massive leap forward, it introduced a new constraint: **bandwidth**. To keep the network decentralized and accessible to solo validators, Ethereum currently limits the amount of data added to each block through a **Target** and **Max** blob system. The Target represents the ideal network load; if demand exceeds it, the protocol automatically raises blob fees to push usage back down. The Max acts as a hard ceiling, ensuring the data load never exceeds what a standard internet connection can handle.
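The sketch below shows how the Target feeds the exponential rule above (again using the original EIP-4844 parameters): blob gas consumed above the target accumulates as `excess_blob_gas`, while under-target blocks drain it.

```python
# How the Target/Max system drives the fee update (EIP-4844 rule, original
# Dencun parameters of target 3 / max 6 blobs; BPO forks change these).
GAS_PER_BLOB = 2**17                          # 131,072 blob gas per blob
TARGET_BLOB_GAS_PER_BLOCK = 3 * GAS_PER_BLOB
MAX_BLOB_GAS_PER_BLOCK = 6 * GAS_PER_BLOB     # hard ceiling per block

def next_excess_blob_gas(parent_excess: int, parent_blob_gas_used: int) -> int:
    """Usage above target accumulates; usage below target drains the excess.
    This excess is what the exponential fee rule sketched earlier consumes."""
    if parent_excess + parent_blob_gas_used < TARGET_BLOB_GAS_PER_BLOCK:
        return 0
    return parent_excess + parent_blob_gas_used - TARGET_BLOB_GAS_PER_BLOCK

# Three full blocks push the excess (and thus fees) up; empty blocks drain it.
excess = 0
for blobs in [6, 6, 6, 0, 0]:
    excess = next_excess_blob_gas(excess, blobs * GAS_PER_BLOB)
    print(blobs, excess)
```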
To scale beyond these conservative limits without crashing smaller nodes, Ethereum needed a way to verify data availability without requiring every node to download every single blob.
## PeerDAS: Architecture
PeerDAS solves this bandwidth problem by ensuring that no single node needs to download every blob to verify data availability.
### How It Works
PeerDAS introduces a structured way of organizing blob data into a two-dimensional matrix of rows and columns. In this model, the rows represent individual blobs submitted by rollups, while the columns are vertical slices that cut across every blob in a single block.
Instead of downloading a full blob (a complete row), a node only needs to download and verify specific columns. Because each column contains a small piece, or *cell*, of every blob in that block, a node can verify that data for all rollups is present just by checking these vertical slices.
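As a toy model of this layout, the sketch below uses placeholder strings where the real matrix holds field elements carrying KZG proofs; the 128-column width matches the extended matrix of EIP-7594.

```python
# Toy model of the PeerDAS matrix: rows are blobs, columns cut across them.
NUM_COLUMNS = 128  # width of the extended matrix in EIP-7594

def build_matrix(num_blobs: int) -> list[list[str]]:
    # matrix[row][col] is the cell of blob `row` that falls in column `col`
    return [[f"blob{r}/cell{c}" for c in range(NUM_COLUMNS)]
            for r in range(num_blobs)]

def get_column(matrix: list[list[str]], col: int) -> list[str]:
    # A column holds exactly one cell from every blob in the block, which is
    # why sampling a few columns covers the data of every rollup at once.
    return [row[col] for row in matrix]

matrix = build_matrix(num_blobs=6)
print(get_column(matrix, 17))  # one cell from each of the 6 blobs
```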
To ensure this sampling is reliable, PeerDAS utilizes **1D Reed-Solomon Erasure Coding**. The original blob data is mathematically stretched to double its size. This creates a powerful guarantee: even if half of the data is missing, the original information can be fully reconstructed from any 50% of the extended dataset. By extending the blobs into these larger sets of columns, the network ensures that data isn't just present, but recoverable.

*(Figure: the extended data matrix; the yellow cells are the erasure-coded extension of the blobs.)*
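The guarantee behind this stretching can be demonstrated with plain Lagrange interpolation. In the sketch below, a "blob" is n numbers read as evaluations of a polynomial of degree below n; extending means evaluating that same polynomial at n further points, after which any n of the 2n samples recover everything. This is a toy: the real protocol works over the BLS12-381 scalar field with roots-of-unity evaluation points and far more efficient algorithms.

```python
# Toy 1D erasure coding via Lagrange interpolation over a prime field.
P = 2**127 - 1  # a Mersenne prime, purely for illustration

def lagrange_eval(points: list[tuple[int, int]], x: int) -> int:
    """Evaluate the unique polynomial through `points` at `x`, mod P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total

def extend(data: list[int]) -> list[int]:
    """Stretch n samples into 2n samples of the same polynomial."""
    pts = list(enumerate(data))
    return data + [lagrange_eval(pts, x) for x in range(len(data), 2 * len(data))]

def reconstruct(survivors: list[tuple[int, int]], total: int) -> list[int]:
    """Any n of the 2n samples pin down the polynomial, so the full
    extended dataset can be rebuilt from any surviving half."""
    return [lagrange_eval(survivors, x) for x in range(total)]

blob = [5, 11, 42, 7]                            # n = 4 original samples
ext = extend(blob)                               # 2n = 8 extended samples
survivors = [(i, ext[i]) for i in (1, 3, 4, 6)]  # any half survives
assert reconstruct(survivors, 8) == ext          # full recovery from 50%
```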
Once the data is stretched, nodes can verify its availability without downloading the whole matrix. This is done through probabilistic sampling: by checking a small, random selection of columns, a node can confirm with near-certainty that the rest of the data is available.
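The arithmetic behind that near-certainty is short. To make reconstruction impossible, an attacker must withhold more than half of the 128 columns, leaving at most 63 available; the chance that k distinct random samples all land in that available set anyway falls off roughly like (1/2)^k:

```python
from math import comb

NUM_COLUMNS = 128
MAX_AVAILABLE = 63  # the most an attacker can serve while still preventing
                    # reconstruction (just under half of the columns)

def prob_fooled(samples: int) -> float:
    """Probability that every sampled column is served even though the
    data as a whole is unrecoverable (sampling without replacement)."""
    return comb(MAX_AVAILABLE, samples) / comb(NUM_COLUMNS, samples)

for k in (4, 8, 16):
    print(k, prob_fooled(k))  # drops roughly like (1/2) ** k
```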
A critical innovation here is the **Cell KZG Proof**. Each small unit of data (a cell) comes with a fixed-size cryptographic proof. This allows a node to download a single cell from a peer and immediately verify that it correctly belongs to the original blob commitment. This removes the need for trust: the math proves the data is authentic.
### Why KZG instead of Merkle Trees?
While Merkle proofs are a common way to verify data, they become heavier as the data grows. PeerDAS uses KZG proofs because:
* **Constant Size:** A KZG proof is always 48 bytes, whether it's proving a single cell or a whole blob. A Merkle proof would grow significantly with the number of cells.
* **Unified Commitment:** The same KZG commitment and proof work for both the original blob and the extended version, since both correspond to evaluations of the same underlying polynomial. Merkle proofs would require different paths for the original and the extended data, adding significant complexity.
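
To make this concrete, here is a minimal single-point KZG commit/open/verify cycle over BLS12-381, using the `py_ecc` library. It is a sketch only: the "trusted setup" is generated locally from a known secret (insecure by construction), and the real protocol uses the ceremony-generated setup and proves whole cells (many evaluations at once) rather than single points.

```python
# Minimal KZG sketch over BLS12-381 with py_ecc (pip install py_ecc).
from py_ecc.optimized_bls12_381 import (
    G1, G2, Z1, add, multiply, neg, pairing, curve_order)

SECRET = 123456789  # toy secret; a real setup discards this via a ceremony
DEGREE = 4
SETUP_G1 = [multiply(G1, pow(SECRET, i, curve_order)) for i in range(DEGREE)]
SETUP_G2 = multiply(G2, SECRET)

def commit(coeffs):
    """Commitment = p(s)*G1: one curve point (48 bytes compressed)."""
    acc = Z1
    for c, pt in zip(coeffs, SETUP_G1):
        acc = add(acc, multiply(pt, c % curve_order))
    return acc

def evaluate(coeffs, z):
    return sum(c * pow(z, i, curve_order)
               for i, c in enumerate(coeffs)) % curve_order

def open_at(coeffs, z):
    """Proof = q(s)*G1, where q(x) = (p(x) - p(z)) / (x - z)."""
    y = evaluate(coeffs, z)
    quotient, rem = [], coeffs[-1] % curve_order
    for c in reversed(coeffs[:-1]):      # synthetic division by (x - z)
        quotient.append(rem)
        rem = (c + rem * z) % curve_order
    assert rem == y                      # remainder must equal p(z)
    return commit(list(reversed(quotient))), y

def verify(commitment, z, y, proof):
    """Check e(proof, [s - z]_2) == e(commitment - y*G1, G2)."""
    lhs = pairing(add(SETUP_G2, neg(multiply(G2, z % curve_order))), proof)
    rhs = pairing(G2, add(commitment, neg(multiply(G1, y % curve_order))))
    return lhs == rhs

coeffs = [3, 1, 4, 1]              # p(x) = 3 + x + 4x^2 + x^3
C = commit(coeffs)
proof, y = open_at(coeffs, 5)      # prove p(5) = y without revealing p
assert verify(C, 5, y, proof)      # one pairing check, constant-size proof
```

Note that the proof is a single G1 point no matter how much data the polynomial encodes, which is exactly the constant-size property described above.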
### Distributed Custody
PeerDAS distributes the storage workload across the network based on each node's identity. Rather than leaving storage to chance, the peer-to-peer (P2P) network is divided into subnets, with each node assigned to custody (store and serve) specific columns derived from its Node ID.
* **Standard Nodes:** These nodes are responsible for a minimum of 8 data columns. Nodes running validators must custody additional columns as their total stake grows (one more for every 32 ETH).
* **Supernodes (DAS Providers):** These are high-capacity nodes—often operated by explorers, indexers, or large staking pools—that download, store, and serve the entire data matrix (all 128 columns) to ensure the network can always "heal" and reconstruct data if parts of the matrix become unreachable.
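
The assignment itself is deterministic and publicly computable. Below is a simplified sketch, loosely modeled on the custody-assignment logic in the consensus specs (the exact hashing and byte-encoding details here are illustrative):

```python
# Simplified, illustrative custody assignment: hash the node ID with an
# incrementing counter until enough distinct columns are selected.
from hashlib import sha256

NUMBER_OF_COLUMNS = 128

def get_custody_columns(node_id: int, custody_count: int) -> list[int]:
    columns: list[int] = []
    current = node_id
    while len(columns) < custody_count:
        digest = sha256(current.to_bytes(32, "big")).digest()
        column = int.from_bytes(digest[:8], "little") % NUMBER_OF_COLUMNS
        if column not in columns:
            columns.append(column)
        current = (current + 1) % 2**256
    return sorted(columns)

# Any peer can recompute another node's assignment from its public ID,
# so everyone knows exactly whom to ask for which column.
print(get_custody_columns(node_id=0xC0FFEE, custody_count=8))
```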
### Why Custody Columns Instead of Rows?
In a Row Custody model, nodes would store and serve full individual blobs. While intuitive, this creates a major vulnerability: **targeted DDoS attacks**.
Imagine a block with 6 blobs. If each validator only custodies 1 or 2 specific blobs, an attacker could launch a DDoS attack against the specific subset of nodes responsible for a particular blob, let's say, Blob #3. If those few nodes are knocked offline, Blob #3 becomes unavailable to the entire network, even if every other part of the block is perfectly fine.
By using Column Custody, PeerDAS ensures that every node holds a vertical slice (cells) of every single blob in the block. To make a single blob unavailable, an attacker would have to take down a massive, randomized cross-section of the entire network rather than a small, predictable cluster.
## Potential Attacks & Fork Choice Defenses
The most dangerous threat in a sampling-based system is a **Data Withholding Attack**. This occurs when a malicious proposer publishes a block header (making the block appear valid) but withholds just enough blob data to make reconstruction impossible.
To prevent the network from building on top of this invalid data, PeerDAS updates the fork choice rule to include a mandatory Data Availability (DA) check.
* **The Availability Check:** A block is only considered valid by a node if it passes the DA check. This means a validator must successfully sample their assigned columns. If the data is missing or the peer-to-peer sampling fails, the validator simply ignores that block. It treats the slot as empty, regardless of how many other signatures the block might have.
* **Tight & Trailing Fork Choice:** Ethereum uses a two-layered approach to ensure that unavailable blocks are identified and discarded before they can harm the chain:
1. **Tight Fork Choice (The Immediate Defense):** During the current slot, validators must perform their sampling immediately upon receiving a block. If the sampling fails, they refuse to attest to that block. This ensures that an unavailable block fails to gain the 2/3rds supermajority required to progress, leading to an immediate reorg where the network switches to a different, available branch.
2. **Trailing Fork Choice (The Final Safety Net):** If a block somehow passes the tight check but is later found to be unavailable (e.g., during a period of high network latency), the trailing fork choice prevents the node from advancing its justified checkpoint. Nodes will refuse to justify or finalize any checkpoint if a block in that chain is missing data. This ensures that honest validators are never locked onto an unavailable chain that they cannot later exit.
* **Protection Against Reorgs:** PeerDAS prevents blind chain growth by requiring proposers to verify the availability of the parent block before building a new one. If an attacker tricks a proposer into building on an unavailable block, other honest nodes will see that the parent data is missing and reject the entire new branch. This creates a self-healing effect where the canonical chain only grows on a foundation of verifiable data.
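
Putting the availability check and the tight fork choice together, a client's block handling looks roughly like the sketch below. `Block` and `Store` are stand-ins for client internals, not spec objects, and the sampling is reduced to set membership:

```python
# Illustrative DA gate in fork choice; real clients implement this inside
# their on_block handling per the consensus specs.
import random
from dataclasses import dataclass, field

NUM_COLUMNS = 128

@dataclass
class Block:
    root: str
    served_columns: set[int]  # stand-in for what peers actually serve

@dataclass
class Store:
    blocks: dict[str, Block] = field(default_factory=dict)

def is_data_available(block: Block, custody: set[int], extra: int = 8) -> bool:
    # A node must retrieve its custody columns plus a few random samples.
    samples = custody | set(random.sample(range(NUM_COLUMNS), extra))
    return samples <= block.served_columns

def on_block(store: Store, block: Block, custody: set[int]) -> None:
    # Tight fork choice: an unavailable block is ignored outright, so it
    # can never attract the attestation weight needed to stay canonical.
    if not is_data_available(block, custody):
        return  # treat the slot as empty: no attestation, nothing to build on
    store.blocks[block.root] = block

store, custody = Store(), {3, 17, 42, 99}
on_block(store, Block("0xgood", set(range(NUM_COLUMNS))), custody)
on_block(store, Block("0xbad", set(range(40))), custody)  # data withheld
assert "0xgood" in store.blocks and "0xbad" not in store.blocks
```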
## What’s Next: The Path to Full Danksharding
PeerDAS is the foundational unlock for Ethereum’s scalability, but it is not the final destination. It serves as the bridge between Proto-Danksharding (1D sampling) and Full Danksharding (2D sampling).
* **2D Data Availability Sampling:** While PeerDAS currently uses 1D erasure coding, the next evolution involves 2D erasure coding, in which the entire data matrix is erasure-coded together. This provides even stronger redundancy: even if massive portions of the matrix go offline, the data can be recovered more efficiently. This transition is intended to let Ethereum scale safely from today's single-digit blob targets to 64 or more blobs per block.
* **BPO Forks: Scaling on the Fly:** With the activation of the Fusaka upgrade, Ethereum introduced Blob-Parameter-Only (BPO) forks. These are pre-programmed, streamlined updates that allow the network to increase blob capacity (e.g., moving to a target of 14 and a max of 21) without requiring a massive, coordinated hard fork (see the schedule sketch after this list). PeerDAS provides the bandwidth efficiency that makes these rapid capacity increases possible.
* **The Rise of DAS Providers:** As the data load grows, we will see a clearer distinction in node roles. While standard nodes maintain decentralization by custodying only a fraction of the data, supernodes will ensure the network's healing capability by storing 100% of the data matrix. This hybrid approach allows Ethereum to approach the throughput of centralized data-distribution systems while keeping the consensus layer secure and decentralized.
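
The mechanics of a BPO fork reduce to a parameter table baked into client configs. The sketch below is purely illustrative: the activation epochs are invented, while the (target, max) pairs echo the values mentioned in this list.

```python
# Hypothetical blob-parameter schedule: a BPO fork just swaps the active
# (target, max) pair at a preset epoch, with no other protocol changes.
BLOB_SCHEDULE = [
    # (activation_epoch, target_blobs, max_blobs) -- epochs are made up
    (0,       3,  6),   # EIP-4844 / Dencun starting point
    (100_000, 6,  9),   # illustrative intermediate step
    (120_000, 14, 21),  # the BPO values mentioned above
]

def blob_params_at(epoch: int) -> tuple[int, int]:
    """Return the (target, max) pair active at `epoch`."""
    target, maximum = BLOB_SCHEDULE[0][1:]
    for activation, t, m in BLOB_SCHEDULE:
        if epoch >= activation:
            target, maximum = t, m
    return target, maximum

assert blob_params_at(0) == (3, 6)
assert blob_params_at(130_000) == (14, 21)
```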
## Conclusion
The transition from the expensive, history-bloating era of `CALLDATA` to the efficient, sampled architecture of PeerDAS marks the most significant evolution in Ethereum since The Merge. By decoupling data verification from data storage, PeerDAS effectively breaks the bandwidth bottleneck that previously limited Layer 2 scaling.
Through the combined use of 1D erasure coding, Cell KZG Proofs, and a distributed custody model, Ethereum has created a system in which solo validators can help secure a network scaling toward tens of thousands of transactions per second. With PeerDAS now live, the Surge is no longer just a roadmap item; it is the operational reality of a maturing, scalable global settlement layer.