# Substrate Sync 2.0 design

This document describes the design of Substrate's new syncing algorithm and outlines the architectural changes needed in `sc-network-sync` to add support for the new protocol.

## Goals

* clear split between syncing strategies
* high test coverage
* good documentation

## Non-goals

* backwards compatibility
  * Sync 2.0 is a separate syncing algorithm that lives alongside Sync 1.0

## Sync 2.0 design

Sync 2.0 uses finality to simplify the syncing process. All nodes advertise their best finalized block in the handshake and block announcement messages. Since there is only one finalized chain, all nodes with the same genesis share the same chain, and Sync 2.0 can start requesting ranges of blocks in parallel from all of these nodes without having to perform an ancestry search.

Below are the message formats of the handshake and block announcement messages used by Sync 2.0:

```protobuf
/// Block announces handshake.
message Handshake {
	/// Genesis hash.
	bytes genesis = 1;

	/// Hash of the best finalized block.
	bytes best_finalized = 2;

	/// Number of the best finalized block.
	bytes best_number = 3;

	/// Block hashes of all unfinalized leaf blocks.
	repeated bytes leaves = 4;
}

/// Block announcement.
message Announcement {
	/// Hash of the best finalized block.
	bytes best_finalized = 1;

	/// Number of the best finalized block.
	bytes best_number = 2;

	/// New block header.
	bytes header = 3;

	/// Data associated with this block announcement, e.g., a candidate message.
	bytes data = 4; // optional
}
```

The node's finalized block is advertised in both messages so that a node doing initial sync can add these blocks directly to its block request scheduler without having to perform an ancestry search on them first.

### Initial sync

Initial sync starts when the node is starting from genesis or when it has been offline for a while. The node receives connections from peers that advertise blocks with unknown parents, which starts the process of downloading the missing blocks. Since the peers advertise finalized blocks, the local node doesn't have to do an ancestry search as the chain is known to be the same.

Blocks between the local best and the network's best are stored as one long range of blocks from which the request scheduler allocates subranges of blocks that it then requests from peers. These subranges are downloaded in parallel from multiple peers, and multiple peers can act as providers for any given subrange. When the blocks of a given subrange have been downloaded, the subrange is marked as completed and another subrange is allocated for the peer. If a peer disconnects or the request fails while a subrange download is in progress, the peer is removed from the list of providers and the subrange is allocated to some other peer.

As blocks must be imported in order while they are downloaded in parallel, the request scheduler contains a wait queue which temporarily stores downloaded blocks that cannot yet be drained to the import queue. When a subrange is received that contains a direct descendant of the local best block, all the blocks in the wait queue that form a contiguous range of blocks are drained to the import queue. The wait queue is bounded, so if the block import process is slower than the rate at which blocks are received from the network, the request scheduler throttles itself and stops sending requests until it has received the subrange that allows draining the wait queue to the import queue.
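The scheduling and draining logic described above could look roughly like the following. This is a minimal sketch only: the type and method names (`RequestScheduler`, `allocate_subrange`, `on_subrange_downloaded`, `on_peer_dropped`) are hypothetical and not part of the current `sc-network-sync` API, and the real implementation would use the proper network and block types.

```rust
use std::collections::{BTreeMap, HashMap};

/// Hypothetical stand-ins for the real `sc-network` / `sp-runtime` types.
type PeerId = u64;
type BlockNumber = u64;
type Block = Vec<u8>;

/// A contiguous range of block numbers, inclusive on both ends.
#[derive(Clone, Copy)]
struct Subrange {
    start: BlockNumber,
    end: BlockNumber,
}

/// Sketch of the Sync 2.0 block request scheduler.
struct RequestScheduler {
    /// Next block number that has not yet been allocated to any peer.
    next_unallocated: BlockNumber,
    /// Number of the network's best finalized block (the download target).
    target: BlockNumber,
    /// Subranges whose previous provider disconnected or failed; reallocated first.
    retry: Vec<Subrange>,
    /// Subranges currently being downloaded, keyed by the peer serving them.
    in_flight: HashMap<PeerId, Subrange>,
    /// Downloaded subranges that cannot be drained to the import queue yet,
    /// keyed by their starting block number. This is the bounded wait queue.
    wait_queue: BTreeMap<BlockNumber, Vec<Block>>,
    /// Maximum number of subranges kept in the wait queue before throttling.
    max_waiting: usize,
    /// Number of blocks requested per subrange.
    subrange_len: u64,
}

impl RequestScheduler {
    /// Allocate the next subrange for `peer`, or return `None` if the target has
    /// been reached or the wait queue is full (throttling).
    fn allocate_subrange(&mut self, peer: PeerId) -> Option<Subrange> {
        if self.wait_queue.len() >= self.max_waiting {
            // Block import is falling behind; stop sending requests until the
            // wait queue can be drained.
            return None;
        }
        // Prefer subranges that were dropped by a disconnected/failed peer.
        let subrange = if let Some(subrange) = self.retry.pop() {
            subrange
        } else if self.next_unallocated <= self.target {
            let start = self.next_unallocated;
            let end = (start + self.subrange_len - 1).min(self.target);
            self.next_unallocated = end + 1;
            Subrange { start, end }
        } else {
            return None;
        };
        self.in_flight.insert(peer, subrange);
        Some(subrange)
    }

    /// Called when `peer` has delivered all blocks of its subrange. Returns the
    /// blocks that can be drained to the import queue, i.e., the longest
    /// contiguous run of downloaded blocks starting right after `local_best`.
    fn on_subrange_downloaded(
        &mut self,
        peer: PeerId,
        blocks: Vec<Block>,
        local_best: BlockNumber,
    ) -> Vec<Block> {
        if let Some(subrange) = self.in_flight.remove(&peer) {
            self.wait_queue.insert(subrange.start, blocks);
        }

        // Drain subranges from the wait queue as long as they form a contiguous
        // range of blocks directly descending from the local best block.
        let mut drained = Vec::new();
        let mut next_expected = local_best + 1;
        while let Some(blocks) = self.wait_queue.remove(&next_expected) {
            next_expected += blocks.len() as u64;
            drained.extend(blocks);
        }
        drained
    }

    /// Called when `peer` disconnects or its request fails; the subrange goes
    /// back into the pool so it can be allocated to some other peer.
    fn on_peer_dropped(&mut self, peer: PeerId) {
        if let Some(subrange) = self.in_flight.remove(&peer) {
            self.retry.push(subrange);
        }
    }
}
```

Keying the wait queue by the subrange's starting block number keeps the drain step a simple walk over contiguous keys, and bounding the queue size is what produces the throttling behaviour described above.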
### Keep-up sync

After the initial sync has downloaded all the blocks up to the initially set target block, the initial sync is complete and the algorithm switches to keep-up sync. This sync mode is responsible for keeping the node up to date with the network by downloading the data it receives in block announcements. These block announcements may be for competing forks, and the node must download, import and announce all of these forks to the network.

Especially during initial sync, the fork target queue may grow large as the initial sync is prioritized over fork targets. Care must be taken when storing the fork targets so as not to waste memory and get the node OOM-killed. The algorithm can remove stale forks that are not part of the canonical chain and compress long fork targets into a start hash + number of blocks.

### Warp sync and the block history (gap sync) phase

Warp sync stays unmodified in Sync 2.0, and the same code that is used for Sync 1.0 is used for the new implementation as well. The only modification that has to be introduced in `sc-network-sync` is moving warp sync out of `ChainSync` into the syncing controller so that there is only one warp/state sync running per `SyncingEngine`.

After warp sync has finished, the node starts to download the block history in the background. The process is the same as for initial sync, except that the node is already at the tip and can process received block announcements immediately: it downloads the new blocks with a higher priority than the block history download, imports them and announces them to the network.

## Coexistence of two syncing algorithms

Since Sync 1.0 cannot be deprecated in one release and instead must be slowly phased out, `sc-network-sync` must handle peers using both the new and the old syncing algorithm. Having two fully functioning syncing algorithms is not desirable as they compete with each other and download the same blocks over and over again unless they coordinate with each other.

As Sync 2.0 provides a chance to refactor the syncing subsystem and introducing a coordination API for Sync 1.0 would be difficult, the goal instead is to move the code that is common between the syncing implementations, namely warp and state sync and block request handling, out of `ChainSync` and into a new chain syncing implementation. The old `ChainSync` is used for peers on Sync 1.0 to do an ancestry search on them and find the common block and fork targets. This information is then given to the new chain syncing implementation, which is responsible for syncing the chain and downloading blocks from the peers of both implementations. Since the block request/response scheme stays the same between the syncing implementations, the request handling code can treat both kinds of peers the same way.

Sync 1.0 must be called only when a peer using the old algorithm connects or a block announcement from `/block-announces/1` is received. This is handled by `SyncingEngine`, which receives the peer connections and block announcements and knows which protocol the peer is using, as indicated in the `NotificationStreamOpened` message received from `Notifications` (a sketch of this routing is shown below). Once the network has fully transitioned to Sync 2.0, the old Sync 1.0 code can be fully removed.
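As a rough illustration of how `SyncingEngine` could route peers by protocol version, the sketch below uses hypothetical names (`SyncVersion`, `on_peer_connected`, `chain_sync`, `block_sync`) rather than the existing `sc-network-sync` API; it only shows the dispatch decision, not the surrounding machinery.

```rust
/// Which syncing protocol a peer negotiated, as reported by `Notifications`
/// when the block announcement substream is opened (hypothetical type).
enum SyncVersion {
    /// Peer speaks `/block-announces/1` (Sync 1.0).
    V1,
    /// Peer speaks `/block-announces/2` (Sync 2.0).
    V2,
}

/// Hypothetical stand-ins for the real components.
type PeerId = u64;
struct ChainSync; // Sync 1.0 state machine (ancestry search, fork targets).
struct BlockSync; // New chain syncing implementation shared by both protocols.

impl ChainSync {
    fn add_peer(&mut self, _peer: PeerId) { /* start ancestry search */ }
}

impl BlockSync {
    fn add_peer(&mut self, _peer: PeerId, _best_finalized: Option<u64>) {
        /* register the peer as a block provider */
    }
}

struct SyncingEngine {
    chain_sync: ChainSync,
    block_sync: BlockSync,
}

impl SyncingEngine {
    /// Called when `Notifications` reports a `NotificationStreamOpened` event.
    fn on_peer_connected(
        &mut self,
        peer: PeerId,
        version: SyncVersion,
        best_finalized: Option<u64>,
    ) {
        match version {
            // Sync 1.0 peers go through `ChainSync` first to find the common
            // block; the result is then handed to the new implementation.
            SyncVersion::V1 => self.chain_sync.add_peer(peer),
            // Sync 2.0 peers advertise their best finalized block in the
            // handshake and can be used as block providers right away.
            SyncVersion::V2 => self.block_sync.add_peer(peer, best_finalized),
        }
    }
}
```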
## Integration into existing architecture

![](https://hackmd.io/_uploads/HyGqL6Bv2.png)

Most of the changes needed for Sync 2.0 can be done before the new syncing algorithm is introduced.

The goal is to move code from `ChainSync` to a new `BlockSync` (TODO: rename maybe) object that handles chain syncing in coordination with `SyncingEngine` (which should be responsible for all syncing-related I/O). This also makes it possible to refactor the old syncing code in incremental pieces and keep the individual tasks as small as possible.

First, all I/O- and polling-related code must be moved to `SyncingEngine`. This means all request/response handling, block announcement validation and block importing must be handled by `SyncingEngine`. This also implies that `ChainSync` is converted into a pure state machine, and that a set of events describing what `ChainSync` wants `SyncingEngine` to do must be introduced (a sketch of such an event set is shown just before the task list below). This change also makes it possible to write a lot of tests for the `SyncingEngine <-> ChainSync` and `SyncingEngine <-> BlockSync` communication, as the interface between them is then described as a set of state changes without any need for I/O.

After that, the code that is common between the syncing implementations (warp and state sync, block requests/responses) is moved out of `ChainSync` into a new `BlockSync` object. The goal is to make `BlockSync` the main state machine for the syncing implementation and keep the Sync 1.0-related code in `ChainSync`. `BlockSync` also contains a new block request scheduler that is designed for Sync 2.0 and works under the assumption that the chain it is downloading is finalized, so requests can be scheduled in parallel using the block number. The request scheduler is able to handle Sync 1.0 peers, which don't provide information about finality, by either associating the received information with already-available finality information received from Sync 2.0 peers or by downloading the blocks from Sync 1.0 peers the same way as it's done in `ChainSync`.

The `Notifications` code is modified to introduce the concept of an actual fallback protocol. This entails introducing versioning to the notification protocol, where each new version of the protocol may contain a unique handshake. The old `fallback_names` is kept unmodified, so while each protocol may have multiple versions with different handshakes and implementations, the protocol may also have aliases that refer to the same protocol implementation but under a different name.

After all these preparation tasks are completed, the new main protocol Sync 2.0 is introduced, which is just a matter of introducing the new message formats and setting `/block-announces/2` as the default protocol for the block announces substream.
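To make the state machine split concrete, the events could look roughly like the enum below. This is only a sketch under assumed names (`SyncEvent`, `BlockRequest`, `SyncStateMachine`) and placeholder types; the actual event set would be defined during the refactoring.

```rust
/// Hypothetical stand-ins for the real `sc-network` / `sp-runtime` types.
type PeerId = u64;
type BlockHash = [u8; 32];
type BlockNumber = u64;

/// A request for a range of blocks from a given peer (hypothetical shape).
struct BlockRequest {
    peer: PeerId,
    start: BlockNumber,
    count: u32,
}

/// Events emitted by the pure `ChainSync`/`BlockSync` state machines, telling
/// `SyncingEngine` which I/O actions to perform on their behalf.
enum SyncEvent {
    /// Send a block request to a peer.
    SendBlockRequest(BlockRequest),
    /// Cancel an in-flight request to a peer.
    CancelBlockRequest(PeerId),
    /// Hand a batch of downloaded blocks (opaque here) to the import queue.
    ImportBlocks(Vec<Vec<u8>>),
    /// Announce a newly imported block to the network.
    AnnounceBlock { hash: BlockHash, number: BlockNumber },
    /// Disconnect and optionally report a misbehaving peer.
    DropPeer { peer: PeerId, banned: bool },
}

/// The state machine side: pure functions that consume network inputs and
/// return `SyncEvent`s, leaving all actual I/O to `SyncingEngine`.
trait SyncStateMachine {
    fn on_block_announcement(&mut self, peer: PeerId, header: Vec<u8>) -> Vec<SyncEvent>;
    fn on_block_response(&mut self, peer: PeerId, blocks: Vec<Vec<u8>>) -> Vec<SyncEvent>;
    fn on_peer_disconnected(&mut self, peer: PeerId) -> Vec<SyncEvent>;
}
```

Because the state machines only return events, the `SyncingEngine <-> ChainSync` and `SyncingEngine <-> BlockSync` interactions can be tested by feeding inputs and asserting on the returned events, without any networking.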
### Task list

* [ ] Move all polling code to `SyncingEngine`
* [ ] Convert `ChainSync` into a pure state machine
* [ ] Introduce a set of events that inform `SyncingEngine` what `ChainSync` wants to do
* [ ] Move all block announcement handling to `SyncingEngine`
* [ ] Introduce `BlockSync` and move warp and state sync to `BlockSync`
* [ ] Implement the block request scheduler (already partially implemented)
* [ ] Move all request/response handling for initial, keep-up and gap sync to `BlockSync`
* [ ] Introduce the concept of a fallback protocol for `Notifications` (different from the current `fallback_names`/alias support)
* [ ] Introduce the Sync 2.0 messages and set `/block-announces/2` as the default notification protocol for syncing
* [ ] Move all syncing code from `sc-network-common` to `sc-network-sync`
* [ ] Move all request handlers to `SyncingEngine` and handle requests only from accepted peers

## References

* [Node refactoring](https://www.notion.so/paritytechnologies/Node-Refactoring-937b5770d14c494991903a4b7ce52012)
* [Sync 2.0](https://github.com/paritytech/substrate/issues/10740)
* [Extracting import queue out of `sc-network-sync`](https://github.com/paritytech/substrate/issues/11295)
* [Break apart verification and import](https://github.com/paritytech/substrate/issues/11294)
* [Break down/split Sync](https://github.com/paritytech/substrate/issues/11293)