### **Week 0: PeerDAS and Data Availability Sampling**

#### **Key Takeaways**

* **What is DAS?** Data Availability Sampling (DAS) is a technique that allows network nodes to verify that a large amount of data is available for download without having to download all of it. This is crucial for scaling blockchains by making it possible to handle large data blocks (like those from rollups) efficiently.
* **What is PeerDAS?** PeerDAS is the next major step in Ethereum's data availability roadmap, following the introduction of data "blobs" in EIP-4844. It introduces the "sampling" component to the network, where clients verify data availability by downloading small, random pieces of the total data instead of the entire set.
* **How it Works:** In PeerDAS, data is arranged into a matrix. Each row represents a "blob" of data that has been mathematically extended for redundancy using a Reed-Solomon code. Clients download commitments for each row and then sample and verify entire columns of this matrix to ensure the whole dataset is available.
* **Why it Matters:** PeerDAS is a critical component of "The Surge" in Ethereum's roadmap, designed to massively increase the data capacity of the network. This will significantly lower transaction fees for Layer 2 rollups, making Ethereum more scalable and accessible. It's expected to increase the blob count by 2-4x initially and by more over time.

---

#### **1. The Challenge: Scaling Ethereum**

Modern blockchains face a scalability challenge: how to increase transaction throughput without increasing the hardware requirements for nodes to a point that harms decentralization. Layer 2 rollups are a key solution, as they batch thousands of transactions off-chain and post a summary to the main Ethereum chain (Layer 1). However, the L1 still needs to guarantee that the full data for these transactions is available, so anyone can verify the rollup's state.

Initially, with EIP-4844 (Proto-Danksharding), this was solved by requiring nodes to download all the data contained in the new "blob" data structures. While this provided dedicated space for rollup data, it did not solve the underlying issue that as data capacity grows, so does the burden on nodes.

#### **2. PeerDAS: The Solution**

PeerDAS introduces the principle of Data Availability Sampling to solve this problem. It's an intermediate step toward the final vision of "full Danksharding."

**Encoding and Commitments**

In PeerDAS, the data handling process is as follows (a small worked sketch of the extension step appears after the list):

1. **Blobs as Rows:** Data is split into chunks called blobs. Each blob is treated as a row in a matrix.
2. **Reed-Solomon Extension:** Each blob (row) is encoded using a Reed-Solomon code. This process can be viewed as defining a polynomial from the blob's data and then evaluating that polynomial at additional points. The result is an "extended blob" that contains redundant data, allowing the original blob to be reconstructed even if parts of it are missing.
3. **KZG Commitments:** A cryptographic commitment, specifically a Kate-Zaverucha-Goldberg (KZG) commitment, is created for each extended blob (row). This commitment is a small, fixed-size value that uniquely represents the data in that row. The full commitment for all the data is simply the list of commitments for all the rows.
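The extension step in item 2 can be made concrete with a toy example. The sketch below is a minimal Python illustration of Reed-Solomon extension by polynomial evaluation over a small prime field; the field size, blob length, and function names are chosen for readability and are not the production parameters (real blobs live over the BLS12-381 scalar field, are encoded and recovered with FFT-based techniques [But18], and are committed to with KZG, none of which is reproduced here).

```python
# Toy sketch of the row-wise Reed-Solomon extension used in PeerDAS.
# Assumptions: a small prime field and a tiny blob stand in for the real
# parameters; KZG commitments are omitted entirely.

P = 2**31 - 1  # small prime modulus standing in for the real field

def lagrange_eval(xs, ys, x):
    """Evaluate, at point x, the unique polynomial through the points (xs, ys)."""
    total = 0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        num, den = 1, 1
        for j, xj in enumerate(xs):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total

def extend_blob(blob):
    """Treat blob[i] as the polynomial's value at x = i, then evaluate the
    same polynomial at k additional points to obtain the extended blob."""
    k = len(blob)
    xs = list(range(k))
    extension = [lagrange_eval(xs, blob, x) for x in range(k, 2 * k)]
    return blob + extension

def reconstruct(points, k):
    """Recover the original blob from ANY k cells of the extended blob.
    `points` is a list of (index, value) pairs."""
    xs = [x for x, _ in points[:k]]
    ys = [y for _, y in points[:k]]
    return [lagrange_eval(xs, ys, x) for x in range(k)]

blob = [5, 11, 42, 7]                       # original row (k = 4 cells)
extended = extend_blob(blob)                # 2k = 8 cells with redundancy
# Drop half the cells, keep an arbitrary k of them, and recover the row.
sample = [(i, extended[i]) for i in (1, 4, 6, 7)]
assert reconstruct(sample, k=4) == blob
```

The final assertion shows the property that sampling relies on: any half of the extended cells is enough to reconstruct the original row.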
**Sampling and Verification**

Once the data is encoded and committed to, clients can verify its availability (a sketch of the sampling loop follows the list):

1. **Download Commitments:** A client downloads the list of KZG commitments for all rows. This is a very small amount of data.
2. **Sample Columns:** Instead of downloading all the extended blobs, the client requests random *columns* from the data matrix. A column consists of one "cell" (a small segment of the extended blob) from each row.
3. **Verify Openings:** Along with the data in the column, the client receives a "multiproof" for each cell. This proof demonstrates that the cell's data correctly corresponds to the KZG commitment for its row. Thanks to an optimization called **batch verification**, a client can check all the proofs for a full column with a single, efficient equation that uses only two pairing operations. This makes the verification process extremely fast.
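To tie the three steps together, here is a minimal sketch of a sampling client. It is a sketch under stated assumptions: the network fetch and the proof check are placeholder callbacks (`fetch_column` and `verify_cell_proof` are illustrative names, not consensus-spec APIs), and in the real protocol all of a column's proofs are batch-verified against the row commitments in a single pairing equation, as described above.

```python
# Minimal sketch of a PeerDAS-style sampling client. The proof check is a
# placeholder: in the real protocol each column's cell proofs are
# batch-verified against the row commitments with two pairing operations.

import random
from dataclasses import dataclass

@dataclass
class Cell:
    row: int          # which extended blob this cell belongs to
    column: int       # position within the extended blob
    data: bytes       # the cell's contents
    proof: bytes      # opening proof against the row's KZG commitment

def sample_availability(row_commitments, fetch_column, verify_cell_proof,
                        num_columns, samples=8):
    """Return True iff `samples` randomly chosen columns are available and
    every cell in them verifies against its row's commitment."""
    for col in random.sample(range(num_columns), samples):
        cells = fetch_column(col)                  # one cell per row
        if len(cells) != len(row_commitments):
            return False                           # column is incomplete
        for cell in cells:
            commitment = row_commitments[cell.row]
            if not verify_cell_proof(commitment, cell):
                return False                       # invalid opening
    return True                                    # all sampled columns check out
```

If any sampled column is missing a cell or fails verification, the client treats the data as unavailable; otherwise, enough successful samples give it high statistical confidence that the full matrix can be reconstructed.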
#### **3. Security Guarantees**

The security of PeerDAS is formally proven and relies on two core cryptographic properties:

* **Position-Binding:** It is computationally infeasible for a malicious actor to create two different values for the same data cell that both verify against the same commitment.
* **Code-Binding:** Any collection of validly opened cells is guaranteed to be consistent with a single, unique, and correctly encoded data matrix. This ensures that all clients who successfully sample will converge on the same underlying data.

These security properties are based on the d-BSDH assumption, a standard cryptographic hardness assumption.

#### **4. The Road to Full Danksharding**

PeerDAS is a major milestone, but not the final step. The long-term vision for Ethereum involves a two-dimensional version of this scheme:

* **Tensor Codes:** After the data matrix is created by extending the rows (as in PeerDAS), it will also be extended *vertically*. Each column will be treated as a polynomial and extended.
* **Cell Sampling:** In this future version, clients will only need to sample individual *cells* rather than entire columns. This further reduces the amount of data a client needs to download, improving bandwidth efficiency and scalability even more.

The introduction of PeerDAS is scheduled for the Fusaka hard fork, expected in 2025, and is a critical step toward enabling Ethereum to handle over 100,000 transactions per second.

---

#### **References**

* **[ASBK21]** Mustafa Al-Bassam, Alberto Sonnino, Vitalik Buterin, and Ismail Khoffi. Fraud and data availability proofs: Detecting invalid blocks in light clients. In Nikita Borisov and Claudia Díaz, editors, Financial Cryptography and Data Security - 25th International Conference, FC 2021, Virtual Event, March 1-5, 2021, Revised Selected Papers, Part II, volume 12675 of Lecture Notes in Computer Science, pages 279-298. Springer, 2021.
* **[BFL+22]** Vitalik Buterin, Dankrad Feist, Diederik Loerakker, George Kadianakis, Matt Garnett, Mofi Taiwo, and Ansgar Dietrichs. EIP-4844: Shard Blob Transactions. https://eips.ethereum.org/EIPS/eip-4844, 2022. Accessed: 2024-07-10.
* **[But18]** Vitalik Buterin. Reed-Solomon erasure code recovery in n*log^2(n) time with FFTs. https://ethresear.ch/t/reed-solomon-erasure-code-recovery-in-n-log-2-n-time-with-ffts/3039, 2018. Accessed: 2024-06-27.
* **[D'A23]** Francesco D'Amato. From 4844 to Danksharding: a path to scaling Ethereum DA. https://ethresear.ch/t/from-4844-to-danksharding-a-path-to-scaling-ethereum-da/18046, 2023. Accessed: 2024-06-27.
* **[Eth24a]** Ethereum. Ethereum Consensus Specs - Commit 54093964c95f. https://github.com/ethereum/consensus-specs/commit/54093964c95fbd2e48be5de672e3baae8531a964, 2024. Accessed: 2024-08-09.
* **[FK23]** Dankrad Feist and Dmitry Khovratovich. Fast amortized KZG proofs. Cryptology ePrint Archive, Report 2023/033, 2023. https://eprint.iacr.org/2023/033.
* **[HASW23]** Mathias Hall-Andersen, Mark Simkin, and Benedikt Wagner. Foundations of data availability sampling. Cryptology ePrint Archive, Paper 2023/1079, 2023. https://eprint.iacr.org/2023/1079.