[toc]
## References
- [KR 2.1/DAL](https://hackmd.io/HLoGjZyKSeuNXxvbnSEcVQ?view#Key-results-for-DAL-6pw)
- [KR 2.3/P2P](https://hackmd.io/wOcQhCcCRNGWVFHUes4APQ)
- [P2P design](https://hackmd.io/JUHDNI69RSKiuPGwG7-5eQ)
## Objectives
- DAL and P2P without sampling
- Show 1M TPS in a non-adversarial setting on a test network
## Status Update
### General status
- 75% to have automatic end-to-end tests without P2P (rollup node + DAL node + L1)
- 75% to have a realistic P2P test with producers, endorsers and consumers exchanging massive amounts of data, and analysis tools to assess the success of the experiment
### Current state (POC/DAL)
- Can post slot headers
- Attest slot headers
- Part of the refutation code merged
- DAL node can use the cryptographic primitives: split slots/reconstruct slots
- DAL node tracks the L1 and monitors the slot headers posted
- Rollup node can store slot headers and slots
- Integration of the cryptographic primitives into the Tezos library and the environment for the protocol (a protocol can change the constants of the design, e.g. size of a slot, number of shards, ...)
- Integration of the SRS into Tezos and the CI (in particular for long tests)
- Several integration tests (via Tezt) with partial end-to-end tests
### Major design change (with respect to May)
- Refutation game: Remove the slot subscription from the DAL **and** the sequential ordering
### Current state (P2P)
Implemented functionality:
- Topic subscription
- Publishing with lazy-push on topics (without data validation step)
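As an illustration of what topic subscription and lazy-push publishing mean here, the following is a toy in-memory sketch, not the prototype's code. All names (`Node`, `subscribe`, `publish`, `on_announce`) are hypothetical. It assumes lazy push works as in common gossip designs: a node announces only a message id to the connected peers subscribed to the topic, and a peer pulls the payload only if it does not already hold it. As in the current prototype, no data validation step is modelled.

```python
class Node:
    """Toy peer: topic subscription plus lazy-push dissemination (hypothetical API)."""

    def __init__(self, name):
        self.name = name
        self.topics = set()   # topics this node is subscribed to
        self.peers = []       # directly connected nodes
        self.store = {}       # message id -> (topic, payload)
        self.pulls = 0        # number of payloads fetched from peers

    def subscribe(self, topic):
        self.topics.add(topic)

    def connect(self, other):
        self.peers.append(other)
        other.peers.append(self)

    def publish(self, topic, msg_id, payload):
        # Store locally, then announce only the id (lazy push).
        self.store[msg_id] = (topic, payload)
        self._announce(topic, msg_id)

    def _announce(self, topic, msg_id):
        for peer in self.peers:
            if topic in peer.topics:
                peer.on_announce(self, topic, msg_id)

    def on_announce(self, sender, topic, msg_id):
        if msg_id in self.store:
            return  # already known: the payload is not transferred again
        # Pull the payload from the announcing peer, then re-announce.
        self.store[msg_id] = (topic, sender.store[msg_id][1])
        self.pulls += 1
        self._announce(topic, msg_id)


# Usage: a triangle of subscribed nodes plus one unsubscribed node.
a, b, c, d = Node("a"), Node("b"), Node("c"), Node("d")
for n in (a, b, c):
    n.subscribe("slot-0")
a.connect(b); b.connect(c); c.connect(a); a.connect(d)
a.publish("slot-0", "shard-42", b"dummy bytes")
```

In this sketch each subscribed peer fetches the payload once even though it hears the announcement from several neighbours, and the unsubscribed node `d` receives nothing.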
Implemented large scale test infrastructure:
- Binary with support for behaviour descriptions on top of the P2P layer
- Infrastructure for collecting the logs of all nodes
- Live monitoring of some metrics on the experiment
#### Experiment
Ran an experiment with 656 nodes:
- 256 naive "slot-producers" (shards are just dummy bytes),
- 400 "endorsers", downloading their shard data, with no change in shard assignment.
- Shard size : ~512B
- 5-6 shards per "endorser"
- 256 slots x 2048 shards
- 50 connections per node
256MB/30 sec -> ~8MB/s (64Mb/s)
**Still processing data**, but first estimates indicate that,
averaged over shards, about $\frac{1}{4}$ of the slots are received within 30 sec.
Hypothesis:
- Data loss during sending (need force connect)
Retrieving the experiments' logs is currently a bottleneck for experiment analysis.
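The experiment's data volume can be cross-checked with a short computation. This is only a sketch using the parameters listed above. It assumes shards of exactly 512 bytes (the notes say "~512B"), a 30-second round, and that the "5-6 shards" figure is per endorser and per slot; the function and key names are illustrative.

```python
# Parameters taken from the experiment description.
SLOTS = 256
SHARDS_PER_SLOT = 2048
SHARD_SIZE_BYTES = 512   # assumption: exactly 512 B
ENDORSERS = 400
ROUND_SECONDS = 30       # assumption: one round lasts 30 s


def experiment_figures():
    """Return the total volume and the average throughput of one round."""
    total_bytes = SLOTS * SHARDS_PER_SLOT * SHARD_SIZE_BYTES
    total_mib = total_bytes / 2**20
    mib_per_s = total_mib / ROUND_SECONDS
    return {
        "total_mib": total_mib,
        "mib_per_s": mib_per_s,        # mebibytes per second
        "mibit_per_s": mib_per_s * 8,  # mebibits per second
        # Assumption: shards of a slot are spread evenly over endorsers.
        "shards_per_endorser_per_slot": SHARDS_PER_SLOT / ENDORSERS,
    }


figures = experiment_figures()
for name, value in figures.items():
    print(f"{name}: {value:.2f}")
```

Under these assumptions the total is 256 MiB per round, a little over 8 MiB/s, with an average of about 5.1 shards per endorser per slot, in line with the rounded figures in the notes.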
## Next steps (POC/DAL)
### Short term (< 3 months)
- Refutation game complete (end of September)
- Expose L1 RPCs related to the DAL committee (MR waiting to be reviewed/merged)
- Ensure validity of slot headers (No date yet, not very hard)
- DAL node should be able to stream slot headers via RPCs
- DAL node should export a convenient RPC for the endorser
- Communicate between two DAL nodes via RPCs
- Rollup node can push DAL pages to the PVM and fetch the slots from a DAL node
- Integrate the new FFT that enables better performance by reducing the padding
- Integrate the DAL node with the endorser
### Mid term (< 6-9 months)
- Cryptographic primitives should be better tested and audited (internally and externally)
- Some design work/reflection on sharing code with the other nodes
- Better automation for writing end-to-end tests
- Ensure scalability
- Fast node should support the DAL
- Plug the DAL node with the P2P layer (equivalent of the DDB for the `tezos-node`)
### Long term (< 1.5 years)
- Making the nodes resilient to various scenarios such as attacks, reboots, errors, disconnections, ...
- Handling protocol upgrades from the rollup nodes and the DAL nodes
- Detect corner cases/edge cases
- Preparing integration with the ecosystem
- External documentation
## Next steps (P2P)
### Short term (< 3 months)
- Fix "slow logs retrieval problems" for experiments
- Experiment with every node raised to $500$ connections
- Experiment with mainnet's shard distribution and committee changes at various rates, over long runs
- Implement:
  - "Force connection mechanism": smartly use current topics and force connections to some topics for the time needed to send data
  - A "topics of interest" window in the maintenance process, to focus on the most relevant topics (around the current level)
### Mid term (< 6-9 months)
- Better tooling to analyse/visualize test results
- Integration with DAL-node
- Reuse test infrastructure with full DAL node
- Last iteration on the prototype to handle bandwidth needs
### Long term (< 1.5 years)
- Going from prototype to product
- Topic discovery (for data retrieval)
- Ensure P2P layer security (bounded messages, peer scoring, ...)
- Integrate pubsub to tezos-node's lib_p2p, with backward compatibility.
- Heavily test code (unit/integration/large scale)
- Ensure maintainability of the code
## Difficulties
- Several resources are new to the code base
- Few resources (more focus on higher priority projects/SCORU)
- Summer vacation
- Employee leaving the company
- Employees on several tasks (helping higher priority projects)
- DAC
## Alternatives
### For the demo
- Kernel: SC/Tx rollup
- 1000 rollups
- Bandwidth of the DAL: 256 MiB/seconds. Number of slots to be adapted.
- +6 months
### Model of the current design:
- A cryptographic redundancy of `16` means that we need to **trust only** 20% of the stake (explanations are [here](https://hackmd.io/g8l2M47eR1eN2WNB-m_6Rg)). With a redundancy factor of `8`, it will be about 30%
- P2P bandwidth with this model (factor of $16$) for an endorser with more than $\frac{100}{16}\%$ of the stake will be around $\frac{1 \times 8 \times 256}{30} \approx 68$ Mib/s (padding is not accounted for)
- P2P bandwidth for a slot producer will be around $\frac{1 \times 8 \times 16}{30} \approx 4.3$ Mib/s (padding is not accounted for)
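The arithmetic behind the two bandwidth estimates above can be reproduced with the sketch below. The reading of each factor is an assumption inferred from the formulas, not stated in the notes: `1` is taken as the slot size in MiB, `8` as bits per byte, `256` as the number of slots, `16` as the redundancy factor, and `30` as the block time in seconds. Padding is ignored, as in the text, and the function name is illustrative.

```python
SLOT_SIZE_MIB = 1     # assumption: one slot is 1 MiB
BITS_PER_BYTE = 8
NUMBER_OF_SLOTS = 256
REDUNDANCY = 16
BLOCK_TIME_S = 30     # assumption: one block every 30 s


def bandwidth_mibit_per_s(slot_size_mib, multiplier, period_s=BLOCK_TIME_S):
    """Bandwidth in Mib/s for `multiplier` slot-sized payloads per period."""
    return slot_size_mib * BITS_PER_BYTE * multiplier / period_s


# Stake share (in percent) above which the endorser estimate applies.
stake_threshold_percent = 100 / REDUNDANCY

# Endorser above the threshold: about one slot's worth of shards for every slot.
endorser = bandwidth_mibit_per_s(SLOT_SIZE_MIB, NUMBER_OF_SLOTS)

# Slot producer: one slot expanded by the redundancy factor.
producer = bandwidth_mibit_per_s(SLOT_SIZE_MIB, REDUNDANCY)

print(f"stake threshold: {stake_threshold_percent:.2f}%")
print(f"endorser: {endorser:.1f} Mib/s")
print(f"slot producer: {producer:.1f} Mib/s")
```

Changing `REDUNDANCY` to `8` in this sketch shows how the producer-side estimate scales with the redundancy factor; the endorser-side figure depends only on the slot size and the number of slots under the reading assumed here.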
### Release on mainnet
- Change Tenderbake committee every `x` blocks?
- Benefit: Fewer switches for the P2P
- About sampling
- Benefit: Liveness/Malicious bakers
- Break pipelining
- Design is still open
- Its integration into the code base will take time
## Identified problems
- Which latency is allowed?
- Is it OK to assume that an endorser may run several DAL nodes?
- Is it OK if the design depends on archive DAL nodes?