# Byzantine View Synchronization: Design for `hotstuff-rs` Implementation

## Protocol sketch

Here is a sketch of the [Lewis-Pye](https://arxiv.org/pdf/2201.01107.pdf) view synchronisation protocol, with some optimisations and adaptations tailored to `hotstuff-rs`.

<!-- ("**(NEW)**" label). What is already part of `hotstuff-rs` is labelled as "**(CUR)**". -->

### Preliminaries

* The total validator power is $3f+1$.
* The epoch length is $l_{e}$. Ideally, it should be greater than the upper bound on the number of byzantine nodes across validator set updates (to guarantee the invariant that each epoch can produce at least one QC), but this is not essential.
* $l_{e}$ successive views form an epoch, such that $\forall k \in \mathbb{Z}^{+}$ the views $\{(k-1)l_{e}+1, \dots, kl_{e}\}$ form the $k^{th}$ epoch.
* View $0$ is the *null view* of the *null epoch* 0, and is not part of any epoch. The idea is that all replicas start in the *null view* before they can enter any epoch, and there is no progress mode in the *null view*, i.e., replicas go straight to "epoch sync" (defined below). This helps synchronise the replicas at the beginning of the protocol.
* Let $l(v)$ be the leader of view $v$.
* Let a replica $R$ be in the set $L(e)$, i.e., be a leader of epoch $e$, iff $R = l((e-1)l_{e} + k)$ for some $k \in \{1, \dots, l_{e}\}$.
* Let a view $v$ such that $v \equiv 0 \pmod{l_{e}}$ be called an "epoch view", i.e., the last view of an epoch. Any view $v$ such that $v \not\equiv 0 \pmod{l_{e}}$ is called a "non-epoch view". We define the predicate `is_epoch_view(v)` to be true if and only if $v \equiv 0 \pmod{l_{e}}$.
* Let $C(v)$ be called the set of "collectors" for view $v$. For a non-epoch view $v$, $C(v) = \{l(v+1)\}$, i.e., the next leader is the unique collector, but for an epoch view $v$, $C(v) =$ `committed_validator_set`. So we make every validator a collector for an epoch view. This guards the liveness of the protocol in case:
  - we decide to introduce a leader selection mechanism different from Round-Robin, or
  - the "at least one replica in every $f+1$ replicas is honest" property doesn't hold because $f$ varies **due to dynamic validator sets**.
* By a "quorum" we always mean a collection of votes for the same value from distinct validators whose power sums up to at least $2f+1$.

### Assumptions

1. Leader selection: any mechanism that satisfies **fairness**, i.e., the probability or frequency of being selected is proportional to power.
2. For simplicity, for this section one can assume that replicas' **blockchains are always in sync**. In case they are not, the block sync mechanism specified below can be used to catch up.

### High-level idea

The idea is that replicas may fall out of sync with each other during an epoch if the collector for a view is byzantine, but they will necessarily synchronise upon epoch change, because at least one of the epoch change collectors must be honest. The epoch synchronisation also ensures that each replica has approximately the same deadline for each view in the next epoch.

Additionally, an honest and up-to-date leader should always be able to bring all honest nodes to its view, either via

1. All honest replicas timing out in the previous view at approximately the same time, or
2. Broadcasting evidence for the completion of the previous view (a QC).

Since successfully bringing all honest replicas to the same view via (2) **may require that the honest replicas' state (the local blockchain) is in sync with that of the leader** (since the QC must pass the `correct` and `safe_qc` checks), ensuring that, at least with high likelihood, the local blockchains are in sync is left to the block sync mechanism (next section).

<!--
1. Ensuring that at least with high likelihood the local blockchains are in sync is left to the block sync mechanism (next section), and
2. We provide two "tracks" for a replica lagging behind to sync with the leader and enter its view:
   - "fast track": "QC v" processed immediately, if correct and safe then advance to v+1, possibly skipping views, else "slow track",
   - "slow track": "QC v" stored in the buffer and only processed once the replica is in view v (after receiving QCs or timing out in previous views), if correct and safe then advance to v.

**The key idea is that, upon being rejected, a QC or TC from the future, is not lost, but put in the buffer for future consideration.**
-->

<!-- This protects against a liveness threat due to a replica's blockchain being out of sync (whether because block sync failed, or for any other reason): It provides a fallback mechanism in case the quorum is broken due to $f$ byzantines refusing to collaborate, and no future QCs are sent, but an initial attempt of the lagging replica to advance to the view of the quorum failed because its blockchain was outdated (and possibly block sync failed). -->

<!-- This "slow track" fallback mechanism is particularly helpful if an offline replica comes back online and cannot advance to the view of the quorum via the "fast track" because its local blockchain is outdated (and possibly block sync failed). It guarantees liveness in the non-byzantine case. -->

<!-- However, in case byzantine nodes omit a replica when broadcasting proposals, a block sync mechanism is needed to guarantee liveness, else a replica may never find out about the missed blocks. -->

### The protocol

If in a view $v$, $c \in C(v)$ collects a quorum of votes for the proposal in $v$ (i.e., the QC for $v$), then $c$:

1. Broadcasts the QC via an `AdvanceView` message.

**Invariant:** If a replica receives a QC for a view $v$, this means that a quorum of replicas is advancing to or has already advanced to $v+1$.

<!-- If in any view $v$, $c \in C(v-1)$ or $c \in C(v)$ collects a quorum of `NewView` messages for view $v-1$ **or** $v$ respectively (i.e., the TC for $v-1$ or $v$, of `TimeoutCertificate` type), then $c$: 1. Broadcasts the TC. -->

If in an **epoch view** $v$, $c \in C(v)$ collects a quorum of `NewEpoch` messages for view $v$ (i.e., the TC for $v$, of `TimeoutCertificate` type), then $c$:

1. Broadcasts the TC in an `AdvanceEpoch` message.

**Invariant:** If a replica receives a TC for an epoch view $v$, this means that a quorum of replicas is advancing to or has already advanced to $v+1$, the first view of a new epoch.

**Anytime**, if a replica receives a **correct** and **safe** QC `qc` for a view $v \geq$ `cur_view`, the replica:

1. Sets its `highest_qc` equal to `qc`.
2. If $v$ is an epoch view, then also broadcasts the received QC.
3. Exits `cur_view`.
4. Enters $v+1$.
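
As a rough illustration, here is a minimal sketch of this "fast track" QC handling. All type and method names below are hypothetical placeholders for the corresponding `hotstuff-rs` constructs; the `correct`/`safe_qc` checking, signature verification, and networking are elided.

```rust
// Hypothetical, simplified stand-ins for the real hotstuff-rs types.
type ViewNumber = u64;

#[derive(Clone)]
struct QuorumCertificate {
    view: ViewNumber,
    // signatures, block hash, phase, ... elided
}

struct Replica {
    cur_view: ViewNumber,
    highest_qc: QuorumCertificate,
    epoch_length: ViewNumber, // l_e
}

impl Replica {
    fn is_epoch_view(&self, v: ViewNumber) -> bool {
        v > 0 && v % self.epoch_length == 0
    }

    /// "Fast track": called whenever a QC that passed the `correct` and `safe_qc`
    /// checks arrives for a view v >= cur_view.
    fn on_correct_safe_qc(&mut self, qc: QuorumCertificate) {
        debug_assert!(qc.view >= self.cur_view);
        // 1. Adopt the QC as the highest known QC.
        self.highest_qc = qc.clone();
        // 2. If it certifies an epoch view, re-broadcast it so that all replicas synchronise.
        if self.is_epoch_view(qc.view) {
            self.broadcast_qc(&qc);
        }
        // 3.-4. Exit the current view and enter qc.view + 1 (possibly skipping views).
        self.cur_view = qc.view + 1;
    }

    fn broadcast_qc(&self, _qc: &QuorumCertificate) {
        // Network broadcast elided in this sketch.
    }
}
```
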

Any QC or TC for a view $v \geq$ `cur_view` is also **stored in the buffer**, so that in case advancing to $v+1$ via the "fast track" described above fails because the QC or TC is not `correct` or not `safe_qc`, the replica can still advance to $v+1$ via the "slow track", by reconsidering the QC or TC for $v$ when in view $v$ (since then the QC should be `correct` and `safe_qc`).

**Note:** To advance to a view via the "fast track" of receiving a QC for the previous view, a replica's blockchain must be in sync with the quorum's blockchain. In case it is not in sync, the block sync mechanism can be used to catch up. Moreover, because the QCs are stored in the buffer, the "slow track" is still available as a fallback mechanism.

**Invariant (ideally to be guaranteed by block sync):** If an honest replica advances to $v$, then it is (at least with high probability) ready to accept a proposal from an honest leader in $v$.

**Anytime**, if a replica receives a **correct** TC `tc` for an **epoch view** $v \geq$ `cur_view`, the replica:

1. Sets its `highest_tc` equal to `tc`.
2. Broadcasts the received TC.
3. Exits `cur_view`.
4. Enters $v+1$.

Again, any TC for a view $v \geq$ `cur_view` is also **stored in the buffer**, so that in case advancing to $v+1$ via the "fast track" described above fails because the TC is not `correct`, the replica can still advance to $v+1$ via the "slow track".

If a replica times out in view $v$, the replica:

* If $v$ is a non-epoch view, then:
  1. Sends `NewView` for $v$ to all $c \in C(v)$, i.e., to the next leader (as in current `hotstuff-rs`).
  2. Exits $v$.
  3. Enters $v+1$.
* If $v$ is an epoch view, then:
  1. If `!epoch_sync`, then sends `NewView` as usual.
  2. Sends `NewEpoch` for $v$ to all validators from the `committed_validator_set`.
  3. Resets the timeout for $v$ to `now()` + $\Delta_{view}$ and sets `epoch_sync = true` (otherwise it is always `false` on entering each `execute_view`).
  4. Calls `execute_view` on the same view $v$ again, and stays in `execute_view` for $v$ until a TC or QC for $v' \geq v$ is received; if none is received by the timeout, it goes back to step 1.

So the "epoch sync" mode is just `execute_view`, but

1. With resetting the view timer on timeout, so that a replica will stay in `execute_view` for an epoch view until it receives a TC or QC.
2. Without the leader broadcasting the proposal (since it has already been broadcasted).
3. Without sending `NewView`, since it was sent on entering the "epoch sync" mode.

### Timers

Each view has a deadline. The view deadlines are set on entering the first view of an epoch, i.e., for any $v$ such that `is_epoch_view(v-1)` holds. The deadline for an epoch view can also be reset upon reaching its initial deadline.

Concretely, let $\Delta_{view}$ be a constant corresponding to the intended maximum duration for a view (it should be a bit bigger than the upper bound on how long it takes to execute a view), and let $s$ be the time when a new epoch is entered. Then the deadline for a non-epoch view $v$ is $d_{v} = s + (v \bmod l_{e})\Delta_{view}$, and for an epoch view $v$ it is $d_{v} = s + l_{e}\Delta_{view}$.
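
To make the timer arithmetic concrete, here is a small sketch of these deadline formulas (a hypothetical helper, not part of the existing `Pacemaker` API):

```rust
use std::time::{Duration, Instant};

/// Deadline d_v for view `v`, given the time `epoch_start` (s) at which the current
/// epoch was entered, the epoch length `l_e`, and the per-view budget `delta_view` (Δ_view).
/// Mirrors d_v = s + (v mod l_e) * Δ_view for non-epoch views, and
/// d_v = s + l_e * Δ_view for the epoch view (where v mod l_e == 0).
fn view_deadline(v: u64, epoch_start: Instant, l_e: u64, delta_view: Duration) -> Instant {
    let offset = if v % l_e == 0 { l_e } else { v % l_e };
    epoch_start + delta_view * (offset as u32)
}
```

For example, with $l_{e} = 5$, the five views of an epoch entered at time $s$ get the deadlines $s + \Delta_{view}, s + 2\Delta_{view}, \dots, s + 5\Delta_{view}$.
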

In terms of implementation, the `Pacemaker` will store a `timeout` vector with $l_{e}$ elements corresponding to the deadlines for each of the $l_{e}$ views within an epoch (the epoch number is also stored in the `Pacemaker`), and every time a view $v$ such that `is_epoch_view(v-1)` is entered, `timeout` will be reset such that:

`timeout[i]` = $d_{i}$ for $v \leq i < v + l_{e}$

Note that this implies that if replicas make progress faster in earlier views, then they will have to wait longer in subsequent views. In other words, the total duration of an epoch is constant, and equals the worst case whenever there is at least one unsuccessful view (which leads to a timeout). However, as will be shown in the proofs of the liveness-related properties, this design ensures that:

1. Replicas enter each epoch sync phase at approximately the same time, ensuring a smooth epoch change with a quorum,
2. Unless byzantine collectors decide to selectively send the QCs to only some of the honest replicas, the honest replicas should be in sync.
3. An honest leader can always bring replicas to the same view.

Additionally, **if a replica enters a view $v$ of a new epoch $e$ mid-epoch** (i.e., $v$ such that `!is_epoch_view(v-1)`) at time $s$ due to hearing about a QC for $v-1$, it will reset `timeout` as follows:

`timeout[i]` = $s + (i-v + 1)\Delta_{view}$ for $v \leq i \leq v_{e}$, where $v_{e}$ is the last view of epoch $e$.

Note that this means that for **a replica that was lagging at least one epoch behind** and synced with others, or **a replica that became a validator mid-epoch**, its timeouts for the current epoch may differ from the timeouts of other up-to-date replicas. Worst-case, the lagging replica will finish the epoch earlier than others and sync with others on epoch change.

### Setting epoch lengths

$l_{e}$ is a constant, configurable parameter. It **must** be the same for all replicas, and can change only through a version upgrade. This is made possible by the decision to do all-to-all broadcast in epoch change, instead of sending `Vote` or `NewEpoch` only to the $f+1$ leaders of the next epoch, as in [Lewis-Pye](https://arxiv.org/pdf/2201.01107.pdf). This does not introduce any degradation in the message complexity, since $f$, defined as $\frac{1}{3}n$, is in $O(n)$ anyway.

### Setting the $\Delta_{view}$ parameter

Same as above, it should be a constant, configurable parameter. It **must** be the same for all replicas. Since each view involves a leader's broadcast and waiting for votes, if $\Delta$ is an upper bound on message delivery time, then $\Delta_{view} > 2\Delta$. Again, this parameter can be tuned experimentally.

<!--
### Deviations from Lewis-Pye

The original protocol has been optimised, and extended to include specifications of the protocol details missing in the original protocol. Here are some major deviations:

1. Instead of "view v" messages we divide messages that call for a view change into (1) `Vote` messages, and (2) `NewView` messages. This is to avoid adding more `ProgressMessage` instances, and also because the `Vote` and `NewView` messages are sent in the two scenarios in which a replica might wish to change view: `Vote` means that a replica has finished its work for its current view, and `NewView` means that the replica's view timer reached the timeout for the view.
2. Accordingly, we collect `Vote` messages into QCs (which is what we have been doing for hotstuff-rs anyways), and `NewView` messages into TCs (`TimeoutCertificate`), which only requires minor changes to `NewViewCollector`.
3.
Instead of the leader broadcasting the certificates for view change together with the new proposal, the leader broadcasts the certificates in the same view in which they were collected. This is to avoid changing the fundamental order of events within `execute_view`, since the former would require that the leader of v+1 broadcasts its proposal for v+1 in v, and that v+1 starts with voting instead.
4. The original Lewis-Pye protocol is for non-chained hotstuff but we adapt it to chained hotstuff.
5. The *null view* and *null epoch*.
-->

### Correctness sketch

Here are some properties of the view sync mechanism described above relevant for the liveness of the hotstuff-rs protocol.

**Note:** These properties were specified before the shift towards all-to-all broadcast and a constant epoch length, but they will hold as long as the epoch length is greater than the upper bound on the number of byzantine nodes. Relaxing leader selection from round robin to any fair selection mechanism also should not affect the properties.

Important: The properties below hold **assuming that the replicas' local blockchains are always in sync**. The block sync mechanism should be designed such that at any moment, at least with high probability, a replica's blockchain is in sync with the quorum.

Let $\Delta$ be an upper bound on the message delivery time.

**(1) Synchronization through Epoch Change:**

> Let $t^{*}$ be the time when the first honest replica $r$ enters the first view $v$ of epoch $e$. Then all honest replicas will enter $v$ by $t^{*} + \Delta$.

**Proof:** Suppose $r$ enters the first view $v$ of an epoch $e$. Then $r$ must have received a QC or a TC for $v-1$, in which case it must have broadcasted the received QC or TC to all other replicas right before entering $v$. It takes a maximum of $\Delta$ time to deliver the TC or QC, and after receiving it the honest replicas that haven't entered $v$ yet must enter $v$.

**(2) Progress Guarantee for Each Epoch**

> In every epoch $e$, a view $v$ with an honest leader will produce a new valid block extending the block associated with the `highest_qc` known to the honest leader, and all honest nodes will vote for the proposal.

**Proof:** Since an epoch has $f+1$ distinct view leaders, at least one is guaranteed to be honest. Let $h$ be the honest leader of view $v$ in epoch $e$. Then $h$ is the collector for $v-1$, and either:

1. $h$ collects a QC in $v-1$, in which case it broadcasts it, bringing all honest replicas to $v$ within $\Delta$ time, or
2. $h$ times out in $v-1$ at $d_{v-1}$, and so do the other replicas regardless of whether their views were initially in sync, since without having received a QC for $v-1$ they must time out at around $d_{v-1}$.

Therefore, all honest replicas enter $v$ at approximately the same time. Since all $\geq 2f+1$ honest replicas are in $v$ and $h$, the leader of $v$, is honest, the proposal by $h$ must be accepted. This means that all honest replicas insert the same new block and vote for it.

**(3) Block Commit Guarantee for Every 3 Epochs**

> Every sequence of 3 consecutive epochs is guaranteed to produce at least one committed block.

**Proof:** **Assuming round-robin leader selection**, the first $3f+1$ out of the $3f+3$ leaders of the 3 epochs will be distinct, and the last two leaders of epoch 3 will be the same as the first two leaders of epoch 1. By the Pigeonhole Principle there must be a subsequence of 3 consecutive honest leaders, call them $h_{1}, \: h_{2}, \: h_{3}$.

By **(2)**, $h_{1}$ will bring all honest replicas to the same view $v$ and propose a valid block $B$ that all honest replicas will vote for. Since $h_{2}$ is honest, it will collect a QC for $B$ and broadcast it, bringing all honest replicas to view $v+1$. Then $h_{2}$ will propose another valid block $B'$ extending $B$, and the other honest replicas will vote for it and set the QC for $B$ to be their `highest_qc`. Then $h_{3}$ will collect the votes, broadcast the QC, and bring all replicas to $v+2$. Then it will propose $B''$, and all honest replicas will vote for it, thus setting their `locked_view` equal to $v$. This effectively locks $B$, and the next proposed block, whether proposed by an honest or byzantine node, will have to extend $B$ in order to be valid, thereby committing $B$.

**(4) Synchronised Timeout**

> All honest replicas that time out in view $v$ do so at approximately the same time.

**Discussion:** Since the view deadlines for the views in the epoch are set upon entering the epoch, it is guaranteed that any two honest replicas that time out in a view will do so at approximately the same time. This is something we might want to verify experimentally, at least if $f$ grows big and hence each epoch takes longer, but for small $f$ the periodic synchronization on entering each epoch should suffice. If there is a risk that the view deadlines within an epoch might go out of sync for some replicas, a possible solution is to broadcast the TCs also for non-epoch views. For now, I decided against it in order to avoid complicating the implementation of the `recv` method of `ProgressMessageBuffer` (since then a collector might need to process `NewView` messages for a previous view).

**(5) Stable quorum**

> If a set of replicas $R$ is ahead of the remaining replicas $R'$, say $R$ is in view $v$ and $R'$ is in $v' < v$, then there must be a correct QC for $v-1$ known to all replicas in $R$. In other words, there is always a quorum ahead, because the byzantines require a quorum to move faster.

**Proof:** Suppose there is no correct QC for $v-1$ known to all replicas in $R$. Then some replicas in $R$ must have timed out in $v-1$ at $d_{v-1}$. However, replicas in $R'$ also time out in $v-1$ at $d_{v-1}$, so those replicas in $R$ are not ahead of $R'$, which contradicts the assumption that all replicas in $R$ are ahead of $R'$.

### Performance

**Message complexity:** The view sync protocol only adds an $O(n^{2})$ epoch change every $l_{e}$ views to our otherwise $O(n)$ protocol, but this cost is amortized, so the average-case message complexity stays linear (see the short calculation at the end of this section).

**Latency:** $O(l_{e}\Delta_{view})$ in the worst case when there is at least one byzantine replica, hence $O(n\Delta_{view})$.

**Optimistic responsiveness:** Guaranteed only when there are no byzantine nodes. The presence of even a single byzantine leader in an epoch can bring the execution time for an epoch up to $l_{e}\Delta_{view}$. However, otherwise the epoch execution can move at network speed.

**Overall:**

* The latency is not ideal, but it is better than the current latency, which is unbounded.
* The protocol implementation doesn't require adding new threads, which helps us save on CPU usage.
* Optimistic responsiveness is also not ideal, but probably still better than what we have right now and than previous protocols such as Cogsworth and Naor-Keidar - there is perfect optimistic responsiveness in case all replicas are honest.
* The protocol implementation is simple and easy to reason about.
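
As a rough sanity check on the amortized message-complexity claim above (under the assumption, consistent with the note in the correctness sketch, that $l_{e}$ is chosen at least proportional to $f$, and hence to $n$): averaging the per-epoch cost over the $l_{e}$ views of an epoch gives

$$
\frac{l_{e} \cdot O(n) + O(n^{2})}{l_{e}} \;=\; O\!\left(n + \frac{n^{2}}{l_{e}}\right) \;=\; O(n) \quad \text{when } l_{e} = \Omega(n).
$$

If $l_{e}$ were instead a small constant independent of $n$, the per-view average would degrade towards $O(n^{2})$, which is another reason to pick $l_{e}$ at least on the order of the validator set size.
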

### The Lumiere protocol for improved performance

[Lumiere](https://arxiv.org/abs/2311.08091v1), co-authored by Lewis-Pye with a pre-print published last week, introduces two performance improvements relative to the Lewis-Pye view sync protocol:

1. Latency depends on the actual number of byzantine replicas, not on the upper bound $f$. Hence, it is more optimistically responsive than LP.
2. The heavy $O(n^{2})$ epoch change can be eliminated in case of persistent progress.

However, this comes at the cost of increased complexity; here are some changes relative to LP:

1. Two certificate sizes, $2f+1$ and $f+1$, not just $f+1$, and the following new certificate types:
   - Epoch Certificate: $2f+1$ NewView messages for an epoch view.
   - Timeout Certificate: $f+1$ NewView messages for an epoch view.
   - View Certificate: $f+1$ NewView messages for an "initial" non-epoch view.
2. Even views are "initial" and odd views "non-initial"; each pair has the same leader:
   - Initial view: entered on either a timeout in the previous view, a QC for the previous view, or a VC for the previous view. Upon entering, a NewView message is sent.
   - Non-initial view: only entered if a QC for the previous view is received, otherwise skipped according to the view timeout. This is meant as a "grace period", allowing the initial view to produce a QC even after timeout.
3. NewView messages for all skipped views.
4. Increasing epoch lengths.
5. Resetting timers when the view is advanced on receiving a QC before timeout. This is what enables latency depending only on the actual number of byzantine replicas.
6. Optimised epoch change:
   - A replica can advance to the new epoch on finding out that the previous epoch was "successful" (this can mean that each epoch leader managed to collect a QC), without the need for the $O(n^{2})$ message exchange.

## Implementation sketch

### Modified flow of a `view`

The following changes concern the body of the `start_algorithm` loop:

1. After setting `cur_view`, we check if we have just entered a new epoch, and if so reset the timers.
2. After setting `cur_view`, we check if we have just entered an epoch view, and if so we sync with a random peer.
3. The return type of `execute_view` will be `Result<(), EpochEnd>`:
   - The `ShouldSync` error is not needed anymore, since block sync should be called from `execute_view`, thus counting towards view execution time,
   - A new `EpochEnd` error signals that the epoch view has timed out, and the replica should enter `epoch_sync`.
4. If `execute_view` returns `Err(EpochEnd)`, then call `epoch_sync`.

### New `epoch_sync` method

When called, `epoch_sync` will:

1. Reset the timer for the current epoch view, and
2. Broadcast a `NewEpoch` message, and
3. Enter a loop, bound by the timeout of the current epoch view, where:
   - `NewView`, `NewEpoch`, `AdvanceView`, `AdvanceEpoch` messages are processed, including collecting a TC,
   - On seeing a QC or TC for `>= cur_view` we process the QC/TC as usual (if safe and correct, set `highest_qc`/`highest_tc`; if for an unknown block, then sync) and exit the loop.
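
A rough control-flow sketch of `epoch_sync` following the description above; all names, types, and helpers here are placeholders for illustration (not the actual `hotstuff-rs` API), and certificate checks, TC collection, and networking are elided:

```rust
use std::time::{Duration, Instant};

// Placeholder message type; the design above suggests a `ViewSyncMessage` enum.
enum ViewSyncMessage {
    NewView,                       // payload elided
    NewEpoch,                      // payload elided
    AdvanceView { qc_view: u64 },  // carries a QC
    AdvanceEpoch { tc_view: u64 }, // carries a TC (or QC)
}

struct Algorithm {
    cur_view: u64,
    delta_view: Duration,
}

impl Algorithm {
    fn epoch_sync(&mut self) {
        // 1. Reset the timer for the current epoch view.
        let deadline = Instant::now() + self.delta_view;

        // 2. Broadcast a `NewEpoch` message for the current (epoch) view.
        self.broadcast_new_epoch(self.cur_view);

        // 3. Process view-sync messages until a QC or TC for a view >= cur_view is seen,
        //    or until the (reset) epoch-view timeout is reached.
        while Instant::now() < deadline {
            match self.recv_message(deadline) {
                Some(ViewSyncMessage::AdvanceView { qc_view }) if qc_view >= self.cur_view => {
                    // Process the QC as usual (correct/safe checks, update highest_qc,
                    // block sync if the block is unknown), then leave epoch sync.
                    self.cur_view = qc_view + 1;
                    return;
                }
                Some(ViewSyncMessage::AdvanceEpoch { tc_view }) if tc_view >= self.cur_view => {
                    // Same for a TC: update highest_tc, then leave epoch sync.
                    self.cur_view = tc_view + 1;
                    return;
                }
                Some(ViewSyncMessage::NewView) | Some(ViewSyncMessage::NewEpoch) => {
                    // Collect signatures; if a TC forms, broadcast it as an AdvanceEpoch.
                }
                _ => {} // stale certificates or a recv timeout: keep looping until the deadline.
            }
        }
        // Timed out without a QC/TC: the caller re-enters epoch sync for the same view.
    }

    fn broadcast_new_epoch(&self, _view: u64) { /* network send elided */ }
    fn recv_message(&self, _deadline: Instant) -> Option<ViewSyncMessage> {
        // Would block until a message arrives or `_deadline` passes.
        None
    }
}
```

In the real implementation, the loop would presumably be driven by the `ProgressMessageBuffer`'s `recv`, and received certificates would go through the `correct`/`safe_qc` checks described earlier.
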

### New `TimeoutCertificate` type

```rust
#[derive(Clone, BorshSerialize, BorshDeserialize, PartialEq, Eq)]
pub struct TimeoutCertificate {
    pub chain_id: ChainID,
    pub view: ViewNumber,
    pub signatures: SignatureSet,
}
```

Just like for `QuorumCertificate`, there will be an `is_correct` method for `TimeoutCertificate` to check if the certificate is cryptographically correct and signed by validators from the `committed_validator_set` (regardless of the speculation phase) with total power greater than or equal to the quorum threshold of $2f+1$.

### New `highest_tc` field for the `BlockTree`

```rust
mod paths {
    // All other fields...
    pub(super) const HIGHEST_TC: [u8; 1] = [12];
}
```

### New `ProgressMessage` instances

<!--
```rust
enum ProgressMessage {
    Proposal(Proposal),
    Nudge(Nudge),
    Vote(Vote),
    NewView(NewView),
    QuorumCertificate(QuorumCertificate), // new
    TimeoutCertificate(TimeoutCertificate), // new
}
```

Alternatively, the following names can be seen as more meaningful:
-->

```rust
enum ProgressMessage {
    Proposal(Proposal),
    Nudge(Nudge),
    Vote(Vote),
    NewView(NewView),
    NewEpoch(NewEpoch),             // new
    AdvanceView(QuorumCertificate), // new
    AdvanceEpoch(Certificate),      // new
}
```

where

```rust
struct NewEpoch {
    chain_id: ChainID,
    view: ViewNumber,
    signature: Signature,
    highest_tc: TimeoutCertificate,
}
```

```rust
enum Certificate {
    QuorumCertificate(QuorumCertificate),
    TimeoutCertificate(TimeoutCertificate),
}
```

and:

* The `view()` of an `AdvanceEpoch(tc)` or `AdvanceView(qc)` message is the view of the `tc` or `qc` respectively,
* The `chain_id()` of an `AdvanceEpoch(tc)` or `AdvanceView(qc)` message is the chain id of the `tc` or `qc` respectively.

Optionally, we can define a `ViewSyncMessage` enum and add the three new message instances as its instances.

### `ProgressMessageBuffer` updates

```rust
enum ProgressMessageReceiveError {
    Timeout,
    ReceivedQCFromTheFuture, // TODO: delete
    ReceivedQCFromTheFuture{qc: QuorumCertificate, sender: VerifyingKey}, // new
    ReceivedTCFromTheFuture{tc: TimeoutCertificate, sender: VerifyingKey}, // new
}
```

As for the `recv` method:

* An `AdvanceView` or `AdvanceEpoch` message for `cur_view` will be returned like all other messages for the current view, i.e., via `Ok((msg, sender))`,
* Only QCs and TCs for a future view are returned via an error (contained in `AdvanceView`, `AdvanceEpoch`, `NewView`, or `Proposal`).

The idea is that a TC or QC for the current or a future view is immediately returned to the algorithm thread (via `Ok` for the current view, and via the corresponding error instances for future views). Upon getting one of these errors, the algorithm thread will check if the QC or TC is `correct` (and, for a QC, if it is `safe_qc`), and if so it will update its `highest_tc` or `highest_qc` field and return from `execute_view`.

### `I_have_voted` variable

A variable in the algorithm thread that keeps track of whether the replica has voted in `cur_view`. Most likely we need two such variables, one for each of the (at most) two recognised leaders. This ensures that a replica doesn't vote twice (for the same leader) in a view.

Optional: once this variable becomes true, a non-leader replica should stop processing messages other than `AdvanceView`, `AdvanceEpoch`, `NewEpoch`, `NewView` (i.e., it only processes view-sync-related messages).

### Modular `Pacemaker` implementation

New methods for the trait:

```rust
pub trait Pacemaker: Send {
    fn view_timeout(
        &mut self,
        cur_view: ViewNumber,
        highest_qc_view_number: ViewNumber, // TODO: delete, no longer needed.
    ) -> Duration;

    // new provided method.
    fn reset_epoch_view_timeout(
        &mut self,
        epoch_number: EpochNumber,
    ) -> ();

    // new provided method.
    // sets the timers for a new epoch.
    fn start_epoch(
        &mut self,
        epoch_number: EpochNumber,
    ) -> ();

    fn view_leader(&mut self, cur_view: ViewNumber, validator_set: &ValidatorSet) -> VerifyingKey;
}
```

`DefaultPacemaker` implements the trait by storing, reading from, and writing to a `timeout` vector:

```rust
pub struct DefaultPacemaker {
    view_timeout: Duration,
    cur_epoch: u64,
    timeout: Vec<Instant>,
}

impl DefaultPacemaker {
    /// # Safety
    /// `view_timeout` must not be larger than [u32::MAX] seconds for reasons explained in [Pacemaker].
    pub fn new(view_timeout: Duration) -> DefaultPacemaker {
        Self {
            view_timeout,
            cur_epoch: 0,
            timeout: vec![Instant::now()], // timeout = [..., now()]
        }
    }
}

impl Pacemaker for DefaultPacemaker {
    // ...
}
```

For this implementation of the `Pacemaker` trait, `view_timeout` will panic if called on a view from a different epoch than the current epoch.

### `NewEpochCollector`

Now stores signatures, so that a TC can be returned from the `collect` method.

```rust
struct NewEpochCollector {
    chain_id: ChainID,
    view: ViewNumber,
    validator_set: ValidatorSet,
    validator_set_total_power: TotalPower,
    signature_set: SignatureSet,
}
```

# Block sync

Now that view sync takes care of view synchronization, the only remaining concern for sync is if data, i.e., blocks, are not synchronised. There are some subtle dependencies here:

* Unsynchronised blocks **can, in some circumstances,** disable view sync, namely when the future QC/TC prompting the replica to move to the view is signed by an updated validator set, and for some solutions also if it is not a `safe_qc`.
* However, to synchronise blocks, synchronised views are **not required**.

This seems to imply that view sync should happen simultaneously with block sync (if needed). More concretely:

1. Demand-driven approach: when views are about to be synced, check if there is a need to sync blocks, and if so, sync blocks first.
2. Preventive approach: try to sync blocks at the beginning or end of every view, to ensure (or increase the probability of) the ability to vote for an honest proposal and advance views on receiving a QC or a TC.

We also need to define "need to sync". The three candidate definitions, listed in order of strictness:

1. Replica needs to sync when it is not making progress, or
2. Replica needs to sync when it knows there is a quorum ahead, or
3. Replica needs to sync when it enters a view, to ensure it can make progress in the view.

I think 2 or 3 are more appropriate than 1. The reason is that 1 does not cover situations in which a validator replica goes temporarily offline or its execution of the protocol is suspended for whatever reason: in this case, assuming that the others still have a quorum, upon returning to protocol execution a replica will still receive buffered messages for future views from the quorum ahead, thus making progress. However, ideally it should catch up with others faster via sync.

Other desired **properties of block sync** relevant for liveness:

1. **Required: The time spent syncing counts towards the view time, and a replica may re-enter the progress mode for the same view after syncing.** This is important, since otherwise the liveness properties of the view sync protocol might be violated; in particular, an honest leader might not be able to bring all honest replicas to the same view.
2. **Required: No sync cycles that lead to skipping views** (or not spending enough time in a view to make progress).
This is particularly relevant for timeout-based approaches, but can be relevant to other approaches too. If sync doesn't bring about progress, it is because either (1) the sync peer is byzantine, or (2) there has been no progress overall among the other replicas, e.g., because of a sequence of byzantine leaders. In case (2), view sync ensures that there will be progress eventually, thus a replica must be ready to vote when this happens. One solution for the timeout-based approach is: in case sync (triggered after $\Delta_{sync\_trigger\_timeout}$ of no progress) does not bring about progress, we schedule another sync attempt after $\Delta_{retry\_sync}$, unless progress is made in the meantime. The idea is that $\Delta_{retry\_sync} \ll \Delta_{sync\_trigger\_timeout}$. However, more generally we can maintain a `last_sync_complete_time` variable and ensure that another sync is not called before $\Delta_{retry\_sync}$ from `last_sync_complete_time`.
3. **Enhancement: If the leader of view $v$ is honest, then every honest (and online) replica is "ahead or in sync" with the leader within some $\Delta_{sync}$ time from entering $v$**. By "ahead or in sync" I mean that its blockchain is the same as or extends the leader's blockchain. This is to account for the possibility of an honest leader lagging behind on blocks, in which case its proposal would not be accepted. **This property, together with property (2) of view sync, guarantees that a proposal of an honest leader in sync with a quorum will always be endorsed by the honest replicas.**

## Possible solutions

Below I list a few ideas for the block sync mechanism, which can be classified as follows:

1. Timeout-based, or
2. Periodic, or
3. Event-based.

We can also classify them as:

1. Preventive - sync to ensure or increase the likelihood that "progress" will happen, or
2. Remedial - sync when "progress" doesn't happen for long enough.

And:

1. Random - the sync peer is random, or
2. Targeting - the sync peer is chosen for a reason, usually the leader.

### #1 Sync at the start of each view (periodic)

Try to sync with a peer at the start of each view (even before proposing). We can make it targeting by **syncing with the leader of the view**, instead of syncing with a random peer.

Additionally, on receiving a QC for $v \geq$ `cur_view`, a replica will only check if it is `correct`, but not if it is `safe_qc`. If `correct`, it will exit the current view. (This is in case a replica didn't receive a proposal from a byzantine leader, so that it can advance and sync once it receives a proposal from an honest leader.)

This ensures (probabilistically) that:

1. The replica will always be able to accept the proposal of an honest leader if it is in the same view.
2. If the replica is in $v' \leq v$ and the validator set has not changed between $v'$ and $v$, then it will be able to advance to $v+1$ via the "fast track", i.e., on receiving a QC for $v$.

<!-- But it doesn't necessarily satisfy the **enhancement property 3** which says that an honest leader should always be able to bring all honest replicas to its view $v$.
* In case of validator set changes, it all hinges upon whether the replica can find out about the validator set changes via block sync at the start of a view before the QC is received.
* However, in the worst-case scenario, the lagging replica should be able to synchronise views with the quorum on epoch change. -->

**Enhancement property 3** should be satisfied if the sync peer is the leader of the view.
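
A minimal sketch of the "targeting" variant of this approach, using hypothetical names (the real sync client, leader selection, and view loop in `hotstuff-rs` look different):

```rust
use std::time::Duration;

type ViewNumber = u64;

#[derive(Clone, Copy)]
struct PeerId(u64);

struct Replica {
    cur_view: ViewNumber,
    num_validators: u64,
}

impl Replica {
    /// Solution #1: before doing anything else in a view, attempt a bounded block sync
    /// with that view's leader, so that the replica can (probabilistically) vote for
    /// the leader's proposal.
    fn execute_view(&mut self) {
        let leader = self.view_leader(self.cur_view);
        self.block_sync_with(leader, Duration::from_secs(1)); // bounded; counts towards view time

        // ... then the usual propose / vote / collect phases of the view ...
    }

    fn view_leader(&self, v: ViewNumber) -> PeerId {
        // Toy round-robin leader selection for this sketch.
        PeerId(v % self.num_validators)
    }

    fn block_sync_with(&mut self, _peer: PeerId, _budget: Duration) {
        // Request blocks extending our highest committed block from `_peer`; elided.
    }
}
```
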

<!-- We can also introduce the constraint that if the replica voted in the previous view, then it doesn't need to sync, because in this case its blocks must be up-to-date. However, this makes the solution compatible only with the definition no. 1 of "need to sync". Alternatively, sync if proposal is not received within some time from the view start. But same problem as above. -->

Note that:

* (+) Simple.
* (+) **If the sync peer is up-to-date and honest, the syncing replica will be able to accept the proposal of an honest and up-to-date leader in the current view**.
* (+) Consistent with the view sync design, i.e., this way block sync is called after every view sync, "probabilistically ensuring" (probabilistically because chances are the peer is honest and up-to-date, with probability > 1/3) that when views are in sync, blocks are also in sync, so an honest proposal will be accepted and a correct QC or TC from the future will let the replica advance (since its validator set is up-to-date).
* (-) Overapproximates the space of situations in which a replica needs to sync, i.e., oftentimes sync will be triggered when the replica is in sync with the quorum.
* (-) The leader replica's sync server will receive sync requests from all other validators at the same time.
* (-) Issue with advancing views on receiving a `correct` (but not necessarily `safe_qc`) QC: even if a QC is correct given the current validator set, it might not be correct given a future validator set. This can lead to liveness violations in some very unlikely, hand-picked scenarios in which a byzantine replica takes advantage of the validator set change and engineers the QC from the votes of a few byzantine and confused honest replicas, and sends it to the replica lagging behind.

### #2 Sync on seeing a correct QC for an unknown block (event-based)

The QC can be either a `justify` of a proposal or a QC message. The high-level idea is that a replica syncs blocks:

1. **On receiving a correct QC for an unknown block, contained in either a QC message or a proposal's `justify`**: so that it can advance to the view and accept the proposal of an honest leader. (If the QC is from the current view, the block is known, and the QC is correct, a replica will advance via the view sync mechanism above.)
2. **On epoch change:** in case it couldn't catch up with the quorum earlier because its validator set is outdated and so previously received QCs were incorrect.

<!-- 2. **On receiving a proposal for a block with an unknown parent**: assuming the replica is in the correct view because of timeout (not because of receiving a QC like in 1), so that it can accept the proposal of an honest leader.
3. **Before sending `NewView` for an epoch view**: so that a replica can synchronise views with the quorum in case it is not aware of the new validator set. -->

When **a proposal with a QC** for view $v \geq$ `cur_view` is received by the `ProgressMessageStub`, it is immediately returned via a `ReceivedQCFromTheFuture{qc, sender}` error (but also cached).

Note: As mentioned in the "view sync" section, when **a QC message** with a QC for view $v \geq$ `cur_view` is received by the `ProgressMessageStub`, it is also immediately returned via a `ReceivedQCFromTheFuture{qc, sender}` error (but also cached).

On receiving a `ReceivedQCFromTheFuture{qc, sender}` error, the algorithm thread:

* If `qc` is `correct` and `safe_qc`, sets `highest_qc = qc` and exits the current view (as specified in the "view sync" section).
* Else, if `qc` is `correct` (cryptographically correct, "generic" for a non-validator-set-updating block, etc.) **but for an unknown block**, it enters sync with `sender`. After sync, it checks if the `qc` is `safe_qc` now, and
  - if so, updates its `highest_qc` and exits the view. **(NEW: resolves [issue 12](https://github.com/parallelchain-io/hotstuff_rs/issues/12))**
  - else returns to processing other progress messages for the current view.
* Else it ignores it.

Additionally, the algorithm thread should check if the sender is the leader of `qc.view + 1` (unless `qc.view` is an epoch view), and if not then ignore the error.

On receiving a `ReceivedTCFromTheFuture{tc, sender}` error, the algorithm thread:

* If `tc` is `correct`, sets `highest_tc = tc` and exits the current view.
* Else ignores it.

Additionally, a **replica enters sync with a random peer upon entering each epoch view**. This way, a replica can catch up with others **even if its validator set is outdated**.

**(!)** Assumption about the Dynamic Validator Set implementation: a validator knows immediately when it is no longer part of the validator set, and it doesn't vote anymore. Otherwise, a `qc` that passes the `correct` check might not actually be correct.

**Required changes:**

1. **Important:** If sync starts at time $t$ in view $v$, then it will time out at $\min(d_{v}, t + \Delta_{sync})$.
2. **Important:** If a replica receives a proposal for the current view for a known block, it doesn't insert it, but it should vote for it. In other words, **we need to have different criteria for inserting vs. voting for a block**. This is to prevent liveness issues in case a replica receives the proposed block via sync, and then on entering the view of the proposal it can't vote for it.

**Key idea behind view sync vs. block sync:**

* Early QC broadcast on its collection serves to bring progress-making replicas to a new view as soon as the current view achieves progress.
* If a replica is lagging behind and not participating in the progress made by the quorum, it needs to undergo block sync.
* Block sync is triggered by receiving a QC or a proposal with a QC **for an unknown block**. Since an honest leader will always broadcast a proposal, this guarantees that once the leader is honest, a replica can sync with the quorum (even if a byzantine leader that collected the QC failed to send it to some nodes). It also guarantees that in case the replicas ahead no longer form a quorum, their honest leaders will keep broadcasting proposals that extend the last-QC-supported block until the replicas lagging behind can sync (unlike broadcasting a newly collected QC, which happens only once). This is particularly important for liveness with validator set updates.

**Note:**

* (+) This is most compatible with the view sync design, since for an honest leader to be able to bring all honest validators to the same view it is necessary that:
  - Honest replicas receive QCs and TCs from the future immediately.
  - If the received QC or TC is `correct` and `safe_qc`, then the replica can advance its view immediately, and be ready to accept the entered view's leader's proposal.
  - If the QC is for an unknown block, then sync is called only once a proposal extending this unknown block is received, so that (1) we avoid unnecessary sync because of an unused QC, and (2) if progress can't be made ahead, this QC is persisted by subsequent honest leaders until lagging replicas catch up and a new quorum collects a new QC.
* (+) Satisfies the enhancement property (3) above.
**If the leader is honest and up-to-date, other honest replicas will also be up-to-date and vote for its proposal**.
* (+) Compatible with definition no. 2 of "need to sync", i.e., it lets replicas sync whenever there is a quorum ahead.
* (+) The design above should avoid the liveness-threatening situation from [issue 12](https://github.com/parallelchain-io/hotstuff_rs/issues/12), since if the lagging replica receives a QC from the future for a block B* and with view $v$, but the sync peer only sends it blocks up to B* and a `highest_qc` for B*'s parent, the replica will update its `highest_qc` to the QC for B*, and enter the view in which B*'s child, call it B, is proposed, i.e., $v+1$. This should happen within $\delta + \Delta_{sync}$ from the moment when the QC for B* was sent, so the replica should enter soon enough to receive and process the proposal for B, without triggering sync again.
* (+) This design, unlike the current block sync trigger, cannot be exploited for a DoS attack by overloading the replica with tons of incorrect QC or TC messages for a future view, which are prioritized despite being incorrect. Now incorrect QCs are immediately rejected by the algorithm thread, and epoch change sync exists as a fallback mechanism in case a replica finds received QCs incorrect because of unknown validator set updates.
* (-) If there have been validator set updates, a replica will only sync if a QC for the validator-set-updating block from its known validator set is first received (it should be received from the buffer), or on entering a new epoch.

<!-- ### #3 Sync on proposal with unknown parent and on epoch change
Sync on:
1. Receiving a proposal from the current leader with an unknown justify: -->

### #3 Sync timeout on lack of progress (timeout-based)

Define progress as:

> Progress =
>
> 1. Receiving an acceptable proposal for a block **that extends the blockchain**
> 2. Receiving an acceptable nudge **with a new QC**
> 3. Receiving an acceptable block **that extends the blockchain via sync**

Assume $\Delta_{retry\_sync} \ll \Delta_{sync\_trigger\_timeout}$. If no progress is made for $\Delta_{sync\_trigger\_timeout}$ in a view $> 0$, then trigger sync (as part of the current view; it may spill over to the next view). If no progress is made via sync, then schedule another sync attempt after $\Delta_{retry\_sync}$, unless progress is made in the meantime.

Design for compatibility with view sync:

* When a QC or TC from the future is received, then if it is correct, the view is advanced, otherwise the QC is ignored.
* In case the validator set has changed, and hence the QC is incorrect, the lagging replica needs to keep advancing according to buffered messages or timeouts until it can process a QC or TC from the updated validator set (for instance because it syncs with another replica and learns about the new validator set).

Note that:

* (+) In case of lack of progress due to a sequence of byzantine leaders failing to send valid proposals, replicas will not be stuck in a sync cycle, but rather they will try to sync every $\Delta_{retry\_sync}$, and eventually stop when an honest leader brings about progress. Hence, [this](https://github.com/parallelchain-io/hotstuff_rs/pull/18#issuecomment-1772013312) should not be an issue.
* (+) Unlike in the [hotstuff-rs 0.3 discussion of this solution](https://github.com/parallelchain-io/hotstuff_rs/pull/18#issuecomment-1772013312), the scenario in which there is no progress because there is no quorum in sync doesn't happen here because of view sync (see property (5) of view sync). By the above view sync mechanism, replicas can only go out of sync if the byzantines with a byzantine leader decide to send QCs to $\geq f+1$ honest replicas, leaving $\leq f$ replicas behind to move slower according to view timeouts. Yet in this case, even if partially byzantine, there is a quorum ahead, so there is someone to sync with.
* (-) Yet the risk is that when an honest leader comes, her proposal will be at the same height as some blocks that didn't get a quorum of votes, so periodic sync will still continue until this block has a child. **It is essential that the periodic block sync does not prevent the quorum from voting for the above-mentioned block**. Yet the **timeouts are user-defined, hence nothing can be guaranteed here. Liveness will depend on the config values set by the user.**
* (-) **Issue (!)**: This doesn't cover the situation in which a replica's protocol execution is suspended for some time and upon coming back, it is still making progress by processing buffered messages from the quorum ahead, but in reality it should sync. Even if the replica's sync timer causes it to sync right upon coming back online, a byzantine sync peer can prevent the replica from syncing, and it won't retry syncing because the replica will be making progress with the buffered messages.
* (-) It doesn't offer the "fast track" for either view or block sync.
* (-) Not very compatible with view sync; it is difficult to reason about the correctness and properties of view sync.

<!-- ### #4 Sync on receiving a proposal for a block with unknown parent (event-based, preventive, targeting)

Let $v$ be any view, and $l(v)$ be its leader. If a replica $r$ receives a proposal (or nudge) from $l(v)$ in $v$ such that it doesn't know of the parent of the proposed block (even though the `justify` is correct) then it:

1. Enters sync with $l(v)$ as its peer.
2. Goes back to processing the proposal. It can still vote for the proposal if it became acceptable thanks to sync.

Note:

* (+) This ensures the enhancement property (3) above. **If the leader is honest and up-to-date, other honest replicas will also be up-to-date and vote for its proposal**.
* (+) Fairly simple, affects only `on_receive_proposal` and `on_receive_nudge`.
* (+) Doesn't introduce any delays, since a QC will be broadcasted as soon as the leader collects it, but ensures that an honest and up-to-date leader will collect a QC.
* (-) Still it might happen that the leader is not up-to-date (only in the case when the leader timed out in the previous view). Can we ensure that the leader is up-to-date? We can (1) make the leader sync if it receives a vote for an unknown block with a valid `justify`, or (2) make the leader sync if it receives a `NewView` with a valid and higher `highest_qc` for an unknown block.
* (-) This works best if the replica's view is in sync with others. If it is lagging behind, it might not receive the "most recent" proposal at all, but it might receive a buffered proposal and if the proposing leader is in the quorum ahead, the replica should be able to sync.

Possible **improvements to make sure an honest leader is up-to-date**:

* If a leader receives votes for an unknown but valid block `b`, it syncs with the voter.
  + Only if the leader hasn't received a proposal in this view.
  + `b.justify.block` must be known.
  + How to ensure this is not exploited? Still, replicas can send random blocks...
  + Anyway, if the leader missed the proposal for this view, the proposal can be anything.
  + The current design already favours leader failure in case it is not up-to-date, since the leader can't accept votes for an unknown block, so perhaps the solution above is enough.
* If a leader receives a `NewView` with a `qc` such that `qc.block` is unknown, it syncs with the sender.
* Periodic sync at the beginning of each epoch.
-->

# Fix: Enable voting for a proposal received early via sync

Regardless of the chosen view sync and block sync mechanism, and even for the current implementation, it might happen that a lagging replica $r$ will receive a sync response from a peer $p$ in view $v$ that includes the block proposed in $v$; call this block $B$. In this case, even upon entering $v$ and receiving the proposal for $B$, $r$ will not be able to vote for the proposal. If this means that the proposal cannot obtain a quorum of votes, then the view will time out, and the next view can obtain the quorum. Hence, this situation does not threaten liveness, but it may constitute a violation of the enhancement property 3 (that an honest leader can always obtain a QC for its proposal, even if initially some replicas are behind on blocks or views).

Proposed fix - in `on_receive_proposal`:

* If the proposed block is identical to the highest block in the replica's block tree, then the replica doesn't insert the proposed block, but it sends a vote for it to the next leader.

# Leader selection

TODO