In blockchain technology, Data Availability (DA) ensures that all participants, whether or not they belong to a supermajority of full nodes, can access the data needed to validate transactions. DA guarantees that data has been published and is not being withheld, which is crucial for maintaining blockchain integrity and security. DA is distinct from long-term data storage and from continuous availability: it concerns whether data can be accessed when it is needed.
DA is critical for scaling solutions like rollups. For example:
To scale DA, several methods are employed, including DA sampling and erasure coding. These methods reduce the workload for network participants while ensuring data integrity.
DA sampling divides the data into chunks and has each participant download a randomly selected subset of those chunks. If all sampled chunks are available, the rest of the data is likely available too. However, simple random sampling on its own may not be sufficient, because a small number of withheld chunks is unlikely to be hit by any sample. This is why more sophisticated techniques like erasure coding are needed.
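To make the limitation of simple sampling concrete, the following is a minimal sketch of the chance that sampling misses withheld data. It assumes each sample is drawn independently and uniformly (with replacement), which is a simplification. The function name `undetected_probability` is hypothetical and not taken from any protocol specification.

```python
def undetected_probability(withheld_fraction: float, samples: int) -> float:
    """Chance that every sample lands on an available chunk.

    Assumes independent, uniform sampling with replacement (a simplification).
    """
    if not 0.0 <= withheld_fraction <= 1.0:
        raise ValueError("withheld_fraction must be between 0 and 1")
    if samples < 0:
        raise ValueError("samples must be non-negative")
    # Each sample independently hits an available chunk with
    # probability (1 - withheld_fraction).
    return (1.0 - withheld_fraction) ** samples


if __name__ == "__main__":
    # Withholding 1% of the chunks is usually missed by 30 samples...
    print(undetected_probability(0.01, 30))
    # ...while withholding 50% is almost never missed.
    print(undetected_probability(0.50, 30))
```

Under these assumptions, withholding 1% of the chunks goes unnoticed by 30 samples roughly three times out of four, whereas withholding half of the chunks is missed with probability below one in a billion. Forcing an attacker to withhold a large fraction is what erasure coding provides.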
Erasure coding extends the data using a Reed-Solomon code so that it can be reconstructed even if some chunks are missing. With a coding rate of 0.5, N chunks of data are extended to 2N chunks, and any N of those 2N chunks (that is, any 50%) are enough to reconstruct the entire data set. This keeps the data available even if parts are lost or corrupted.
In terms of polynomials: a polynomial of degree < N is uniquely determined by its evaluations at any N distinct points. If the N original chunks define such a polynomial and it is evaluated at 2N distinct points, any N of those evaluations can reconstruct the polynomial and hence the original data.
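The polynomial view can be illustrated with a small sketch based on Lagrange interpolation over a prime field. The modulus `P`, the evaluation points 0..2N-1, and the function names are assumptions made for illustration. Production systems use specific fields and faster FFT-based encoders.

```python
P = 2**31 - 1  # a prime modulus, chosen only for illustration


def lagrange_eval(points, x, p=P):
    """Evaluate at x the unique polynomial of degree < len(points)
    passing through the given (x_i, y_i) pairs, working modulo p."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % p
                den = den * (xi - xj) % p
        # pow(den, -1, p) is the modular inverse (Python 3.8+).
        total = (total + yi * num * pow(den, -1, p)) % p
    return total


def extend(data, p=P):
    """Treat data[i] as the evaluation at x = i and append the
    evaluations at x = N..2N-1, giving 2N chunks in total."""
    n = len(data)
    points = list(enumerate(data))
    return list(data) + [lagrange_eval(points, x, p) for x in range(n, 2 * n)]


def reconstruct(known, n, p=P):
    """Recover the n original chunks from any n (x, y) pairs of the
    extended data; the x values must be distinct."""
    if len(known) < n:
        raise ValueError("need at least n known chunks")
    points = known[:n]
    return [lagrange_eval(points, x, p) for x in range(n)]


if __name__ == "__main__":
    data = [5, 17, 42, 7]
    extended = extend(data)
    # Keep only 4 of the 8 chunks, mostly from the extension.
    survivors = [(x, extended[x]) for x in (1, 4, 6, 7)]
    print(reconstruct(survivors, len(data)) == data)
```

Here the original values must be smaller than `P`, and any N surviving chunks with distinct positions give back the same N original chunks.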
To ensure correct erasure coding, we use KZG commitments.
KZG commitments enhance DA by committing to a polynomial rather than to individual data points, so that every opened point can be shown to lie on the same polynomial. This is a stronger guarantee than a traditional Merkle root, which commits to the individual data points but proves no relationship among them.
Danksharding is an advanced DA solution involving committees responsible for confirming data availability. Validators pick random rows and columns and attest only if the assigned data is available for the entire epoch. This significantly reduces the chance of an unavailable block passing validation.
A key principle behind DA solutions that rely on erasure coding is polynomial interpolation: if you have d + 1 points, at distinct positions, of a polynomial of degree d, you can uniquely determine the polynomial. This principle is what allows data to be extended and later recovered, ensuring its availability.
The KZG 2D scheme extends data row-wise and column-wise, including commitments in the data and extending it two-dimensionally. This allows for efficient verification and ensures data lies on the same polynomial, enabling robust DA sampling.
To construct the extended 2D matrix from a k × k matrix, apply Reed-Solomon encoding to each row and each column to extend the data horizontally and vertically. Then encode horizontally again on the vertically extended portion to create a 2k × 2k matrix.
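The construction steps above can be sketched as follows. This is a minimal illustration that assumes a Reed-Solomon extension done by Lagrange interpolation over a prime field, with evaluation points 0..2k-1. The modulus `P` and the names `extend` and `extend_2d` are hypothetical.

```python
P = 2**31 - 1  # a prime modulus, chosen only for illustration


def extend(values, p=P):
    """Extend n values (evaluations at x = 0..n-1) to 2n values by
    evaluating the interpolating polynomial at x = n..2n-1."""
    n = len(values)
    out = list(values)
    for x in range(n, 2 * n):
        total = 0
        for i, yi in enumerate(values):
            num, den = 1, 1
            for j in range(n):
                if j != i:
                    num = num * (x - j) % p
                    den = den * (i - j) % p
            total = (total + yi * num * pow(den, -1, p)) % p
        out.append(total)
    return out


def extend_2d(matrix, p=P):
    """Build the 2k x 2k extended matrix from a k x k matrix."""
    k = len(matrix)
    if any(len(row) != k for row in matrix):
        raise ValueError("matrix must be square")
    # Step 1: extend every original row horizontally (k rows, 2k columns).
    top = [extend(row, p) for row in matrix]
    # Step 2: extend every original column vertically.
    cols = [extend([matrix[r][c] for r in range(k)], p) for c in range(k)]
    bottom_left = [[cols[c][r] for c in range(k)] for r in range(k, 2 * k)]
    # Step 3: extend the vertically extended portion horizontally again.
    bottom = [extend(row, p) for row in bottom_left]
    return top + bottom


if __name__ == "__main__":
    for row in extend_2d([[1, 2], [3, 4]]):
        print(row)
```

Because the encoding is linear, every row and every column of the result is itself a valid codeword, so a missing chunk can be recovered from either its row or its column.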
For any chunk in the 2D matrix to be unrecoverable, at least (k + 1)^2 out of the (2k)^2 chunks must be unavailable. Light clients randomly sample s distinct chunks from the extended matrix, with 0 < s < (k + 1)^2, and accept the block only if all sampled chunks are received. Light clients gossip the data chunks they receive to the network, allowing honest full nodes to recover the full block.
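The sampling guarantee can be quantified with a short sketch. It assumes the attacker withholds exactly the minimum (k + 1)^2 chunks needed to make the block unrecoverable and that a single light client samples s distinct chunks uniformly at random. The function name `fooled_probability` is hypothetical.

```python
def fooled_probability(k: int, s: int) -> float:
    """Chance that all s distinct samples hit available chunks even though
    (k + 1)^2 of the (2k)^2 chunks are withheld (sampling without replacement)."""
    total = (2 * k) ** 2
    missing = (k + 1) ** 2
    available = total - missing
    if not 0 < s < missing:
        raise ValueError("s must satisfy 0 < s < (k + 1)^2")
    prob = 1.0
    for i in range(s):
        if available - i <= 0:
            return 0.0  # more samples than available chunks: always detected
        # i chunks were already drawn, all of them available.
        prob *= (available - i) / (total - i)
    return prob


if __name__ == "__main__":
    # Probability that one light client is fooled for k = 64 and 15 samples.
    print(fooled_probability(64, 15))
```

Under these assumptions, a single client taking 15 samples with k = 64 is fooled only about 1% of the time, and the chance that many independent clients are all fooled shrinks further.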
This scheme achieves DA without a trusted setup or intermediate committees.
Note that this scheme is intended to improve security during DA sampling and to reduce the chance of wrongly verifying a malicious block's availability.
EIP-4844 introduces blobs, large chunks of data included in blocks, to improve DA. This proposal enables block + blob data storage, which can be further enhanced with erasure coding. This paves the way for full sharding, expected around 2025, where DA sampling of data and blocks will be implemented.
The roadmap for DA includes:
In full sharding, as long as one supernode is honest, data remains available because supernodes store all DA samples, ensuring data integrity.
Data Availability is crucial for blockchain scalability, ensuring all necessary data is accessible for transaction validation and security. Advanced solutions like DA sampling, erasure coding, and KZG commitments, along with innovations like EIP-4844 and danksharding, provide robust mechanisms to enhance DA. Understanding and implementing these solutions is vital for the future of scalable and secure blockchain networks.