We suggest improvements of the DAL design for a future version.
Unlike Etheurem or Celestia, Tezos DAL design allows multiple slot producers for one L1 level.
This design choice makes the sampling more limited:
We discuss below the technical difficulties in implementing the sampling.
With sampling, the P2P protocol must ensure the following properties simultaneously:
While the first two points are properties already provided by the current design, the third one seems quite challenging and still needs to be solved in the blockchain community (as mentioned here).
Moreover, implementing the sampling requires an answer to the following questions:
Assume \(256\) slot producers. Furthermore, assume that there is one slot not available. The total number of shards is \(524288\), and the number of available shards is \(522368\). The probability of drawing an available shard is \(\frac{522368}{524288}=0.996337\). If the sampling uniformly selects shards, it means that for a node sampling \(20\) shards, the odds of detecting the problem are
\(1 - \frac{522368}{524288}^{20} \approx 7\)%
Is it enough to be able to stop the chain? How such a failure could be detected?
Two possible ways to make this percentage increase:
However, we believe this approach gives a new perspective that could be helpful in the future. In particular, it could help with weakening the honest majority of attesters assumption.
Given a finite family of commitments \(c_i\), it may be possible to aggregate them into a linear combination
\[
c = \sum c_i
\]
giving a new commitment \(c\).
With this scheme, \(c\) is called an aggregated commitment and it could be computed from previous confirmed commitments.
Associated with an aggregated commitment, the L1 could draw a random point \(x\) which is outside of the shards' domain.
Given a valuation \(y\) and a proof \(p\), the L1 is able to check whether \(y=P(x)\) where \(P\) is the polynomial associated with the commitment \(c\).
Recall that a commitment is actually a polynomial commitment.
Computing \(P\) requires to reconstruct the polynomial \(P_i\) associated to the commitments \(c_i\).
Attestations of attesters could be extended with a valuation of a point and a proof that this point belongs to the aggregated commitment computed by the L1.
At level \(n\), an attestation contains the bitset attesting the availability of the data plus a valuation of a point and its proof of the aggregated commitment the attester must commit on.
To ensure that the attester has the time to fetch all the polynomials, the commitments involved in the aggregation could be published a few levels ago (the constant remains to be determined). But notice it is only a matter of latency here.
Any time an attester is chosen for to commit on a point associated with an aggregated commitment, the bandwidth required to compute the valuation is around \(\frac{\text{slot size}}{\text{time between blocks}}\) which is assumed to be constant (about \(33\)KB/s). However, it requires more computational resources to compute the corresponding valuation.
Regarding the P2P protocol, such a scheme seems easier to support but may require more memory and more bandwith. Such a scheme would be compatible with the notion of topic mentioned previously.
Even though this scheme demands more resources in terms of bandwidth or computation, the number of resources is still proportional to the stake.
Morever, as we detail below we can mitigate bandwidth issues if attesters run multiple DAL nodes.
In the current design, only one DAL node can be used to fetch the shards assigned to an attestor. In the case of large attesters, this could have a cost in term of bandwidth.
A way to overcome this limitation is to allow an attestor to run several DAL nodes. There are two ways to split the bandwidth: Either by splitting the slots or by splitting the shards.
Splitting the slots sounds easier and fits better with the current design.
Each DAL node would fetch the shards for a subset of the slots. Doing so fits naturally with the current design since the slots are already part of the notion of a topic.
An attestor wishes to split its bandwidth into two could run \(3\) DAL nodes:
This does not change the interactions between the attestor and the DAL node. Communication between the DAL nodes can be done via RPCs.
Each DAL node would be responsible for handling one fraction of the stake of the attestor.
An implementation for that is introducing a new L1 operation allowing an attestor to split its stake into \(n\) parts.
Hence, for the DAL committee, the shards of this attestor would be split into \(n\) parts. Consequently, the definition of a topic should be changed.
The current design assumes a limited number of attestors since the number of outbound connections for slot producers depends on the number of attestors. To make the DAL/P2P network scaling, we should consider introducing some increase in the replication factor parameter (i.e. multiple attestors may route a shard) so that there is a benefit for attestors to be interconnected. The simulation via the relayed shards parameter shows a benefit but also highlights a bandwidth increase.
This can be mitigated by splitting the infrastructure into multiple DAL nodes, as described here. Another way to reduce this bandwidth overhead is lazy message propagation.