This is a non-exhaustive list of questions that have come up in Eric Wall's discussions regarding data availability.
Data availability refers to the availability of the transactions in a block that is being appended to the tip of the chain. During consensus, validators download the block to verify its availability. If any of the block’s transactions are withheld by the validator that proposed it, the block is unavailable and will be rejected as invalid.
Data availability is concerned only with the availability of a block while it is being proposed by a validator. Once the block has completed the consensus process, been appended to the tip of the chain, and propagated throughout the network, the ability to download transactions from that block is what we call retrievability. This distinction is important because retrievability is a different problem from availability.
The data availability problem is concerned with the ability of nodes to verify that a block is available. For validators, this occurs during consensus. For non-consensus nodes, this occurs once the block has passed consensus and is propagated throughout the network.
In the context of light clients, the data availability problem is concerned with their ability to verify that a block is available without downloading the entire block. In Celestia, this is achieved through data availability sampling.
The data retrievability problem refers to the ability to retrieve historical data, where any block that has had another block built on top of it counts as historical. Fortunately, the ability to retrieve data relies on the weakest possible assumption, 1-of-N, where only a single copy of the data needs to be retrievable from anywhere on the internet. Additionally, history can be stored on inexpensive hard drives.
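The 1-of-N assumption can be illustrated with a minimal sketch. It assumes, as a simplification, that a block is identified by the SHA-256 hash of its raw bytes; real chains commit to block data through a data root in the header. The function name `retrieve_block` and the idea of modelling each source as a callable are hypothetical and chosen only for illustration.

```python
import hashlib
from typing import Callable, Iterable, Optional

# Hypothetical model: each source is a callable that returns the block bytes,
# returns None if it does not have the block, or raises if it is unreachable.
Source = Callable[[], Optional[bytes]]


def retrieve_block(expected_hash: str, sources: Iterable[Source]) -> Optional[bytes]:
    """Return the block from the first source whose data matches the hash.

    Assumption: the block is identified by the SHA-256 hex digest of its
    raw bytes. This is a simplification made for this sketch.
    """
    for fetch in sources:
        try:
            data = fetch()
        except Exception:
            # An unreachable or faulty source is skipped, not fatal.
            continue
        if data is not None and hashlib.sha256(data).hexdigest() == expected_hash:
            # One honest copy is enough: the hash check makes it safe to
            # accept data from an untrusted source.
            return data
    return None


def offline_source() -> Optional[bytes]:
    """Simulates a source that cannot be reached."""
    raise ConnectionError("source unreachable")
```

Because the retrieved data is checked against a commitment the node already trusts, it does not matter which source serves it or how many sources fail, as long as one of them returns the correct bytes.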
No. It is not the job of a blockchain to guarantee that its history is retrievable in perpetuity. This is evident from the fact that the majority of blockchains, including Bitcoin, don’t incentivize storage of historical data. However, this does not mean that other parties won’t have a motive to store the history partially or fully without incentives from the protocol.
Some examples include:
Yes.
No. The purpose of sampling is to provide the node a probabilistic guarantee that data is available. This is the same case for full nodes – they don’t need to be aware of other full nodes downloading the block because security guarantees are derived from downloading the block themselves.
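The probabilistic guarantee can be sketched numerically. The example below assumes a 2D erasure coding scheme in which a block producer must withhold at least roughly 25% of the shares to make a block unrecoverable, and it assumes samples are drawn independently and uniformly. Both the 25% figure and the function names are assumptions of this sketch, not parameters taken from this document.

```python
import math


def availability_confidence(samples: int, min_withheld_fraction: float = 0.25) -> float:
    """Lower bound on the chance of catching an unrecoverable block.

    If at least `min_withheld_fraction` of the shares are withheld, each
    independent uniform sample succeeds with probability at most
    (1 - min_withheld_fraction), so all `samples` succeed with probability
    at most (1 - min_withheld_fraction) ** samples.
    """
    return 1.0 - (1.0 - min_withheld_fraction) ** samples


def samples_needed(target_confidence: float, min_withheld_fraction: float = 0.25) -> int:
    """Smallest number of samples whose confidence reaches the target."""
    if not 0.0 < target_confidence < 1.0:
        raise ValueError("target_confidence must be strictly between 0 and 1")
    return math.ceil(
        math.log(1.0 - target_confidence) / math.log(1.0 - min_withheld_fraction)
    )
```

Under these assumptions, the confidence depends only on the number of samples the node itself takes, which is why a node does not need to know what other nodes are sampling.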
Yes. This is also true for full nodes in all blockchains: they don’t conduct data availability sampling, but they require a connection to at least one other node that will provide them with historical data over the p2p network. Nodes don’t receive block data out of thin air.
Celestia assumes that a minimum number of light nodes are sampling blocks, so that each block can be fully reconstructed from the stored samples. Since Celestia plans to change its block size elastically, increasing the block size requires a larger number of light nodes to be present in the network.
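A rough sketch shows why larger blocks need more light nodes. It is a back-of-the-envelope model, not Celestia’s actual parameter derivation: it assumes each light node stores `samples_per_node` distinct shares chosen uniformly at random, that nodes sample independently, and that reconstruction needs about 75% of the shares. It also uses expected coverage, whereas a real analysis would target a high probability of collecting enough distinct shares. All names and the 75% threshold are assumptions.

```python
import math


def expected_coverage(light_nodes: int, samples_per_node: int, total_shares: int) -> float:
    """Expected fraction of distinct shares held by all light nodes together.

    A given share is missed by one node with probability
    (1 - samples_per_node / total_shares); nodes are assumed independent.
    """
    miss = 1.0 - samples_per_node / total_shares
    return 1.0 - miss ** light_nodes


def min_light_nodes(total_shares: int, samples_per_node: int,
                    required_fraction: float = 0.75) -> int:
    """Smallest node count whose expected coverage reaches the threshold."""
    if samples_per_node >= total_shares:
        return 1  # a single node already holds every share
    miss = 1.0 - samples_per_node / total_shares
    return math.ceil(math.log(1.0 - required_fraction) / math.log(miss))


# Hypothetical block sizes: quadrupling the number of shares roughly
# quadruples the number of light nodes required in this model.
small_block = min_light_nodes(total_shares=128 * 128, samples_per_node=16)
large_block = min_light_nodes(total_shares=256 * 256, samples_per_node=16)
```

In this model the required number of light nodes grows roughly linearly with the number of shares in the block, which matches the intuition that block size can only grow as the sampling population grows.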
The other assumption that light nodes make is that they are connected to at least one honest full node. This ensures that they can receive fraud proofs for incorrectly erasure coded blocks. If a light node is not connected to an honest full node, such as during an eclipse attack, it can’t learn that a block was improperly constructed.
Yes. Even if some history cannot be retrieved, this does not impact the overall security and function of the blockchain. Validators can continue voting and producing new blocks. In PoS, new full nodes can bootstrap using only the state at the weak subjectivity checkpoint and the historical data that leads up to it. Any party that requires historical data for certain functions, such as applications or rollups, should store its own data if it requires stronger retrievability guarantees.
A couple of things: