# A data-availability layer for Tezos
The Mumbai protocol, which introduced smart rollups, marked the
inception of Layer-2 solutions in the blockchain landscape of
Tezos. Layer-2 is fundamentally designed to enhance the scalability of
blockchain systems by increasing transactional throughput, commonly
measured in transactions per second (TPS).
In an earlier announcement, we unveiled the potential to achieve a
staggering 1 million TPS via these smart rollups. Facilitating such
elevated throughput, however, requires moving the content of
operations off-chain. Indeed, if every transaction of a rollup passed
through Layer 1 (L1), the maximum throughput of a rollup would be
approximately 5000 TPS[^compute] for the Nairobi protocol, a figure
that pales in comparison to the stated 1 million TPS. Moving data
off-chain, in turn, confronts us with a critical question of
consensus: once the data is off-chain, how can we agree on its
content, and on whether it actually exists and can be deemed
available? This is the data-availability problem.
The data-availability conundrum is a recognized issue within the
blockchain sphere, and several solutions have already been proposed to
address this complexity.
One such methodology proposes the use of a specialized committee that
operates independently of the Layer-1 (L1) stake. This concept is
known as a Data-Availability Committee, or DAC for short. A DAC
solution has been meticulously crafted for Tezos, and is explored in
more detail in this article (link to DAC article).
Because the committee can be composed of arbitrary members (typically
governed by a multisig), this solution is not decentralized. Even
though this can be suitable in practice for some use-cases, others
call for a fully decentralized solution. This is the data-availability
layer: a decentralized solution that aims to solve the
data-availability problem.
[^compute]: Assuming an L2 transaction is `10` bytes. Roughly, an L1
    block can carry at most 512 KB of transactions, meaning we can fit at
    most 51.2K transactions per block. Since we have a block every 15
    seconds, this yields a throughput of roughly 3.4K TPS, i.e. at most
    on the order of 5K TPS. This is far from the 1M TPS advertised.
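For concreteness, here is the footnote's arithmetic as a short Python snippet; the values are rough assumptions, not precise protocol constants:

```python
# Back-of-the-envelope estimate from the footnote (illustrative values only).
l2_tx_bytes = 10          # assumed size of one L2 transaction
block_bytes = 512_000     # ~512 KB of transaction data per L1 block
block_time_s = 15         # one L1 block every 15 seconds

txs_per_block = block_bytes // l2_tx_bytes   # 51,200 transactions per block
tps = txs_per_block / block_time_s           # ~3,413 TPS
print(f"at most ~{tps:,.0f} TPS")            # far from the advertised 1M TPS
```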
## The data-availability layer
The Data Availability Layer, or DAL, is a solution designed to achieve
consensus on data availability, relying on Layer-1 (L1)
stakeholders. This approach introduces an independent peer-to-peer
(P2P) network where data can be both submitted and retrieved. Unlike
the P2P protocol employed by L1, where each node receives all data,
the P2P protocol[^protocol] utilized by the DAL is designed such that
nodes only receive data of interest to them. This effectively
circumvents the bandwidth limitation inherent to L1, optimizing data
transfer and accessibility.
[^protocol]: A re-implementation of the [gossipsub algorithm](https://docs.libp2p.io/concepts/pubsub/overview/), which we will detail in a later blog post.
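To see why topic-based dissemination saves bandwidth, consider the following deliberately simplified toy model. It is not the actual gossipsub implementation used by the DAL, only the core idea: a node receives messages solely for the topics it subscribed to.

```python
# Toy publish/subscribe network (not the real gossipsub implementation):
# a node only receives messages for the topics it subscribed to, e.g. the
# slot indices it cares about.
from collections import defaultdict

class ToyPubSub:
    def __init__(self):
        self.subscribers = defaultdict(list)  # topic -> list of node inboxes

    def subscribe(self, topic, inbox):
        self.subscribers[topic].append(inbox)

    def publish(self, topic, message):
        # Unlike the L1 P2P layer, where every node receives all data,
        # only the subscribers of `topic` receive this message.
        for inbox in self.subscribers[topic]:
            inbox.append((topic, message))

net = ToyPubSub()
node_a, node_b = [], []
net.subscribe("slot-0", node_a)   # node A only follows slot 0
net.subscribe("slot-1", node_b)   # node B only follows slot 1
net.publish("slot-0", b"a shard of slot 0's data")
assert node_a and not node_b      # node B never downloaded slot 0's data
```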
The high-level workflow for using the DAL is as follows:
1. A user desiring to submit new data posts a commitment to that data
(similar to a *hash*[^hash]) onto the L1.
2. Following validation of this hash by L1, the user proceeds to
submit the actual data to the DAL P2P network.
3. L1 attesters verify the availability of this data and communicate
their findings using their respective attestations.
4. Finally, L1 collates these results to ascertain the availability of
the data.
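Schematically, the four steps above could be pictured as follows. Every function name, value, and the two-thirds quorum below are hypothetical stand-ins for illustration, not actual Octez APIs or protocol rules.

```python
# Hypothetical sketch of the DAL workflow; names and rules are illustrative.

def publish_commitment_on_l1(commitment):        # step 1
    print("L1: commitment included:", commitment)

def publish_data_on_dal(slot_index, shards):     # step 2
    print(f"DAL P2P: {len(shards)} shards published for slot {slot_index}")

def attest(attester, slot_index):                # step 3
    return (attester, slot_index, True)          # "I could download my shards"

def l1_collates(attestations, quorum=2 / 3):     # step 4
    seen = sum(1 for (_, _, ok) in attestations if ok)
    return seen / len(attestations) >= quorum

publish_commitment_on_l1("commitment-of-the-data")  # a KZG commitment, see below
publish_data_on_dal(slot_index=0, shards=[b"chunk"] * 16)
attestations = [attest(a, 0) for a in ("alice", "bob", "carol")]
print("deemed available:", l1_collates(attestations))
```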
It is worth noting that what is transferred to the DAL isn't the raw
data in its initial form. Instead, the submitted elements can be
broadly characterized as chunks of the original data, accompanied by
an erasure code. This arrangement ensures that even if attesters on
the DAL each receive only a portion of these chunks, rather than all
of them, it remains feasible to reconstruct the original data.
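To make the erasure-coding idea concrete, here is a toy Reed-Solomon-style sketch; the production implementation is far more efficient, and only the mathematical idea is shown. The original chunks are read as the coefficients of a polynomial over a prime field, more evaluations than chunks are published, and any `k` of the `n` shards suffice to reconstruct the data.

```python
# Toy Reed-Solomon-style erasure code: k chunks become the coefficients of a
# degree-(k-1) polynomial; n > k evaluations ("shards") are published, and
# any k of them recover the chunks by Lagrange interpolation.
P = 2**31 - 1  # a prime modulus (toy-sized for the example)

def poly_mul(a, b):
    # multiply two polynomials given as coefficient lists (mod P)
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] = (out[i + j] + ai * bj) % P
    return out

def encode(chunks, n):
    # shard at x = evaluation of the polynomial with coefficients `chunks`
    return [(x, sum(c * pow(x, j, P) for j, c in enumerate(chunks)) % P)
            for x in range(1, n + 1)]

def decode(shards, k):
    # Lagrange interpolation through any k shards recovers the k coefficients
    pts = shards[:k]
    chunks = [0] * k
    for i, (xi, yi) in enumerate(pts):
        num, den = [1], 1  # i-th Lagrange basis: numerator poly, denominator
        for j, (xj, _) in enumerate(pts):
            if i != j:
                num = poly_mul(num, [(-xj) % P, 1])
                den = den * ((xi - xj) % P) % P
        inv_den = pow(den, P - 2, P)  # modular inverse (Fermat's little theorem)
        for d in range(k):
            chunks[d] = (chunks[d] + yi * num[d] * inv_den) % P
    return chunks

data = [104, 101, 108, 108]             # 4 original chunks
shards = encode(data, n=8)              # 8 shards, any 4 of which suffice
assert decode(shards[4:], k=4) == data  # reconstruct from the last 4 only
```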
Once L1 has confirmed the availability of this data, it can be
utilized by various applications, such as smart rollups.
The preliminary requirement for the user to post a commitment allows
consensus to be reached on the data that is eligible for submission to
the DAL. This precautionary step effectively deters an attacker from
flooding the DAL with arbitrary data.
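As a rough intuition, here is a toy sketch of the idea with SHA-256 standing in for the real KZG commitment scheme: L1 stores only a short commitment, and any data on the DAL network that does not match a committed value can simply be discarded.

```python
# Toy stand-in for the commitment step (the real DAL uses KZG commitments).
import hashlib

data = b"some slot data, potentially up to a full slot in size"
commitment = hashlib.sha256(data).hexdigest()  # short, fixed-size, posted on L1

spam = b"arbitrary data from an attacker"
assert hashlib.sha256(spam).hexdigest() != commitment  # rejected by DAL nodes
assert hashlib.sha256(data).hexdigest() == commitment  # the committed data
```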
[^hash]: Technically, this is not really a hash but a [KZG commitment](https://dankradfeist.de/ethereum/2020/06/16/kate-polynomial-commitments.html).
## The smart-rollups integration
Smart rollups are designed to be compatible with the DAL, and will be
able to leverage it once it is activated. The integration of the DAL
with smart rollups relies on the reveal data channel[^reveal]. This
particular channel enables the PVM (Proof-generating Virtual Machine)
to request the data corresponding to a hash or, more generally, to a
unique identifier. From the kernel's perspective, data submitted onto
the DAL that has been duly recognized as available can be uniquely
identified via three numerical values:
- The level at which the commitment to the data was submitted
- The slot index: This parameter will be elaborated upon in the
following section
- The page index: This is a technical detail arising from the fact
  that the WASM PVM can import at most 4 KiB of data at a time, while
  data submitted to the DAL can be larger (such as 1 MiB). The
  submitted data is therefore split into pages of 4 KiB each, and the
  page index identifies the specific page of the original data to be
  imported, as sketched below.
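To fix intuition, here is how these three values pin down a 4 KiB page; all concrete numbers below are made up for the example, and the dictionary layout is not the actual API.

```python
# Hypothetical illustration of the (published_level, slot_index, page_index)
# triple identifying one 4 KiB page of a DAL slot.
PAGE_SIZE = 4 * 1024                        # the WASM PVM imports at most 4 KiB

slot_size = 1024 * 1024                     # e.g. a 1 MiB slot
num_pages = slot_size // PAGE_SIZE          # -> 256 pages of 4 KiB each

request = {
    "published_level": 1_234_567,           # level of the commitment on L1
    "slot_index": 3,                        # which slot at that level
    "page_index": 42,                       # which 4 KiB page of the slot
}
offset = request["page_index"] * PAGE_SIZE  # bytes [offset, offset + 4 KiB)
print(num_pages, offset)
```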
The kernel is allowed to request any data that has been validated on
the DAL, ranging from any point in the past up to the current
level. The obligation falls on the rollup node to fetch this data from
the DAL P2P network. This will require a DAL node as discussed in the
Infrastructure section. More details about this will be thoroughly
discussed in an upcoming blog article.
## Slots
The L1 maintains control over the maximum quantity of data that can be
submitted to the DAL per L1 block level. From L1's perspective, data
introduced to the DAL is organized into slots. Each user aiming to
submit data to the DAL must specify the slot for which they are
posting the data. This allows multiple users at each level to
submit data to the DAL simultaneously. Consequently, a smart rollup
operator intending to utilize the DAL need not depend on a third
party for submitting the data, since operators can do it themselves.
In instances where two different users aspire to submit data for the
same slot index at the same level, only the first operation will prove
successful. Given that the order is determined by the baker, and the
baker uses fees to sequence operations, it follows that the fee market
will effectively determine which operation claims the slot.
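The rule can be pictured with a small toy model (hypothetical code, shown after the baker's fee-based ordering has already been applied):

```python
# Toy model of the "first operation wins" rule for claiming a slot
# (hypothetical code, not the actual protocol implementation).
slot_headers = {}  # (level, slot_index) -> commitment

def publish_slot_header(level, slot_index, commitment):
    key = (level, slot_index)
    if key in slot_headers:
        return False                    # slot already claimed at this level
    slot_headers[key] = commitment      # the first claim wins
    return True

assert publish_slot_header(100, 0, "commit-A")      # succeeds
assert not publish_slot_header(100, 0, "commit-B")  # same slot and level: fails
assert publish_slot_header(100, 1, "commit-B")      # a different slot is free
```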
## The DAL is made to evolve
The implementation of the DAL hinges on several key parameters, such as:
- The number of slots
- The size of a slot
Both of these are parameters established by the economic protocol. These two constants are pivotal, as they control the DAL's bandwidth:
$$\text{bandwidth}=\frac{\text{number of slots} \times \text{size of a slot}}{\text{time between blocks}}$$
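For illustration, plugging in hypothetical parameters (not the actual protocol constants): 256 slots of 128 KiB each, with one block every 15 seconds, would give
$$\frac{256 \times 128\ \text{KiB}}{15\ \text{s}} \approx 2.1\ \text{MiB/s}$$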
The DAL's architecture is designed to accommodate a bandwidth of at
least $10$ MiB/s. However, as will be outlined in the DAL's roadmap, we
intend to initially launch the DAL with a lower bandwidth,
incrementally expanding it over time. The advantage of this approach
lies in our ability to first verify the seamless operation of the P2P
protocol, and bolster its resilience on the test network as we
gradually enhance the bandwidth.
Latency is another critical factor to consider. Given that it takes
time for attesters to fetch data from the DAL, a certain degree of
latency is to be expected between the moment the data's commitment is
posted on Layer-1 (L1) and the time when the data is officially marked
as available by L1. This latency is governed by a parameter known as
`attestation_lag`, which can be fine-tuned in tandem with the
`time_between_blocks` parameter.
The attestation lag regulates the time window (in terms of blocks)
that an attester has to retrieve the data. Reducing the time interval
between blocks necessitates an increase in the attestation
lag. However, as a general rule, the latency should ideally be kept
under one minute.
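To make this concrete: the end-to-end latency is roughly $\text{attestation\_lag} \times \text{time\_between\_blocks}$. With illustrative values (not the actual constants) of a 4-block attestation lag and 15-second blocks, data committed at some level would be marked available about $4 \times 15 = 60$ seconds later, right at the one-minute target.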
## Infrastructure
Primarily, there will be three types of users engaging with the DAL:
- Slot producers will submit a commitment of the data to Layer-1 (L1)
and then proceed to upload the data to the DAL. For the
smart rollup use-case, a rollup operator can be a slot producer.
- Attesters will confirm the availability of the data submitted by the
slot producers.
- Slot consumers will fetch the data uploaded to the DAL, typically
data that has been validated as available by L1. For the smart
rollup use-case, a slot consumer can be anybody interested in
tracking the activity of a given rollup.
Given that the DAL employs a different P2P protocol than the one used
by L1, we've decided to implement a separate binary to facilitate
connection to the DAL network, namely the DAL node. Anyone wishing to
utilize the DAL will need to operate a DAL node (we aim to make the
command line interface and the configuration closely resemble those of
the octez-node).
Initially, this means that both rollup operators and attesters will
need to run a DAL node. However, based on feedback from the community,
we might consider adjusting this user experience in the future. For
instance, we could directly integrate the DAL node with the baker or
the rollup node.
## Roadmap for the DAL
Our current agenda involves rolling out the DAL on Mondaynet by the
end of June. Concurrently, we will be preparing the DAL for production
and conducting stress tests on the DAL P2P protocol. As mentioned
earlier, for safety reasons, we prefer to initially launch the DAL
with a lower bandwidth and then gradually increase it over time.
We anticipate a release on Mainnet towards the end of the year, or at
the onset of the following year.
For those intrigued by the technical aspects of the DAL, you can delve
into our design document available [here](https://hackmd.io/@p-cUv0l5RNaDKBCowZ0IzA/HJgFgSzpo/https%3A%2F%2Fhackmd.io%2FUQuA_59QRdOjU47fGM9CsQ).
## Conclusion
The DAL represents a ground-breaking solution that offers a
decentralized data-availability approach for Tezos. We plan to
introduce the DAL on test networks by the end of June, with an
ultimate goal of launching on the mainnet by the end of this year or
at the start of the next.
Additional blog articles will follow, delving deeper into the design
of the DAL and explaining the technical compromises we've decided
upon. We also intend to describe other data-availability solutions
that have been proposed within the blockchain ecosystem and contrast
them with ours.
[^reveal]: The reveal data channel was already described in the DAC article here TODO.