---
tags: Time, Ethereum2.0
---
# Lightweight Clock Sync Protocol for beacon chain
This is a follow-up to the [Sensor Fusion for BFT Clock Sync](https://ethresear.ch/t/sensor-fusion-for-bft-clock-sync/7142), [Time as a Public Service in Byzantine context](https://ethresear.ch/t/time-as-a-public-service-in-byzantine-context/6895) and
[Time attacks and security models](https://ethresear.ch/t/time-attacks-and-security-models/6936) write-ups.
The goal of this write-up is to design a protocol architecture that addresses clock sync problems in a BFT context. Clock sync protocols can be rather simple, but it can be hard to reach precision and accuracy guarantees when arbitrary faults are possible and there are lots of participating nodes. There are three factors which can help to solve the problem in practice: a) reference clocks, synchronized to a common time standard, b) independent local clocks and c) bounded uncertainty of clock drift and network delays.
# Requirements
## Protocol requirements
There are two groups of requirements for the protocol:
- precision and accuracy bounds, i.e. the time estimates of participants should be close to each other (precision) and close to the common time standard (accuracy)
- scalability, since we assume there can be thousands or tens of thousands of participating nodes
The requirements are discussed in more detail [here](https://hackmd.io/x6CaC2OXQ-OTnaobQp39yA).
## Design requirements
However, reasoning about time in distributed systems can be difficult, e.g. establishing worst-case bounds can be non-trivial. It can become even more complicated if new features are added to the protocol to solve particular problems (e.g. found during simulations).
We therefore formulate two additional requirements at the design level:
- the protocol should have a clean and extensible structure
- it should be relatively easy to reason about the protocol properties (precision, accuracy, scalability, resources)
# "Mathematical" design
Reasoning about the protocol is one of the most important problems. In general, we use the [Sensor Fusion Framework](https://ethresear.ch/t/sensor-fusion-for-bft-clock-sync/7142), which establishes connections to already known methods and their properties. In particular, we employ the following "tools" and simplifications:
- intervals and interval arithmetic, to model uncertainty
- three sources of uncertainty and faults: a reference clock, a local clock and network delays
- a single kind of fault: a true value is beyond the interval bounds (of the estimate of the value)
- interval fusion procedures (robust interval aggregation)
- theorems about interval fusion and its properties
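To make the interval toolkit concrete, here is a minimal Kotlin sketch of the interval type used throughout the rest of this write-up (the names are ours and purely illustrative; they are not taken from the implementation linked in the Implementation section):

```kotlin
// Minimal interval arithmetic for modeling uncertainty (illustrative sketch).
data class Interval(val lo: Double, val hi: Double) {
    init { require(lo <= hi) { "malformed interval" } }
    // [a, b] + [c, d] = [a + c, b + d]
    operator fun plus(other: Interval) = Interval(lo + other.lo, hi + other.hi)
    // Scaling by a non-negative factor, e.g. by elapsed local time
    operator fun times(k: Double): Interval {
        require(k >= 0.0) { "negative scaling not needed here" }
        return Interval(lo * k, hi * k)
    }
    // The single fault kind: an estimate is faulty iff the true value is not in it
    operator fun contains(v: Double) = v in lo..hi
    val width get() = hi - lo
    val midpoint get() = (lo + hi) / 2
}
```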
## Uncertainty, faults and intervals
Even correctly functioning devices are susceptible to fluctuations: clocks can output readings meandering about the true value, and the network can delay messages for varying periods of time. This directly affects time-related operations.
The uncertainty is inevitable, but it can be bounded. We represent the uncertainty with intervals and assume a fault occurs only when an output value is beyond admissible bounds. As a consequence, certain faults can be hidden implicitly when summing up several intervals: e.g. a reference clock can go wrong, but a network delay can compensate for this in some cases, so that the resulting estimate still includes the true value.
We assume three sources of uncertainty and, correspondingly, of faults:
- reference clocks, which output intervals that should contain the true time (as defined by a common time standard).
- local clocks, which can drift, but the drift rate is limited, i.e. the clock skew belongs to an interval around 1 (second per second).
- the network, which delays messages, but under normal conditions the delays should belong to an interval too.
All other sources of uncertainty can be bounded by corresponding intervals; however, we ignore them (e.g. they can be subsumed by the network delay interval).
As we assume that there is only one kind of fault (a value doesn't belong to the interval it should belong to), other fault kinds should be mapped to this one (see [here](https://ethresear.ch/t/sensor-fusion-for-bft-clock-sync/7142) for more details). We assume a limited amount of simultaneous faults, e.g. < 1/3 typically. Intervals can be widened to reduce the probability of a fault, but this lowers accuracy and/or precision.
## Mathematical goals
There are two main properties that we care about:
- the interval contains the true value
- the interval width, i.e. the estimate's uncertainty, meets the required bounds
As we assume reference clocks synchronized to a common standard, the most important kind of value is time as understood by the common standard (e.g. UTC).
The second property ensures that time estimates are useful; otherwise one could choose a very wide but uninformative interval.
The properties are assumed to hold for input data, and sensor fusion procedures should preserve them, so that they hold for outputs too.
## Reasoning about intervals
In a BFT context, however, faults are possible, so we assume the properties hold for the majority of inputs, but not for all of them. Sensor fusion procedures should therefore be robust and preserve the properties for the output despite partially incorrect inputs, under appropriate majority assumptions.
We need an instrument to reason about such properties. With intervals and the single fault kind, the reasoning actually becomes rather simple: if the majority of estimates are correct, then we can construct an interval which is correct too (see [here](https://ethresear.ch/t/sensor-fusion-for-bft-clock-sync/7142) for more details), i.e. it contains the true value and its uncertainty is bounded. This is the main "theorem", which is to be discussed more rigorously and formally in a separate write-up.
Weaker forms of the theorem are possible, which give reasonable estimates even if such an interval doesn't exist. For example, one can state that lower interval boundaries are wrong in less than 1/3 of cases, and the same holds for upper boundaries. Then we can still construct an interval containing the true value. In other words, even "honest" clocks can run fast or slow, but only a minority of less than a third is wrong in the same direction: there are < 1/3 slow clocks, < 1/3 fast clocks and > 1/3 true clocks.
An additional property, which is important if iterative/recursive estimation is desired, is that the estimate fusion process is a contraction, i.e. it shrinks estimates. Then it can be applied iteratively, e.g. to obtain better estimates or to employ implied timestamps. Otherwise, output uncertainty can grow and become useless at some point.
## Estimate model
Thus, in the presence of reference clocks, the main goal of an honest node is to obtain the most accurate estimate of true time and to exchange estimates in the most accurate way possible.
Any correct estimate is modeled by $E(t) = t + acc(t) + net(t) + drift(t)$, where $acc(t)$, $net(t)$ and $drift(t)$ are intervals containing zero. Therefore, a correct estimate contains $t$.
A fault is modeled by adding an additional unbounded term $e(t)$, which is zero (or small) for correct estimates and arbitrary for faulty ones.
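As a worked illustration with made-up magnitudes (matching the conservative parameters used later in the simulations), the estimate model translates directly into the `Interval` sketch above:

```kotlin
// A correct estimate per E(t) = t + acc(t) + net(t) + drift(t),
// where each uncertainty term is an interval containing zero (values illustrative).
fun correctEstimate(t: Double): Interval {
    val acc = Interval(-0.5, 0.5)    // reference clock accuracy, e.g. ±500 ms
    val net = Interval(0.0, 2.0)     // network delay bounds, e.g. [0 ms, 2000 ms]
    val drift = Interval(-0.1, 0.1)  // accumulated local clock drift
    val e = Interval(t, t) + acc + net + drift
    check(t in e)                    // each term contains zero, so E(t) contains t
    return e
}
```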
# Framework
If the majority of estimates are correct, then we can construct a correct estimate. But it should be tight enough to be useful. In general, precision and accuracy bounds are proportional to the sum of the reference clock, network delay and local clock drift uncertainties. If faults are uncorrelated, then the bounds should be tight; however, an attacker may construct correlated faults so that worst-case bounds are reached.
Measures can be taken to improve the bounds, e.g. when better estimates are known and/or can be constructed. However, the additional information can be costly to communicate when there are lots of nodes.
In this section, we explore which data is available and required for constructing time estimates, and which processes are needed to produce final time estimates.
## Data and time sources
Raw data is obtained from time sources, either **reference clocks** (RC) or **local clocks** (LC). We assume all nodes possess at least one reference clock and at least one local clock. These constitute **local time sources**. A node can fuse its local time sources and produce a **local estimate**.
**NB** In general, some nodes can operate without any reference clock, since they can rely on the estimates received from peers. That can be modeled as if such a node possesses a faulty RC.
Besides its own local data, a node can send data to peers and receive data from them. The data received from peers constitutes **remote time sources**. Remote and local time sources are then combined to produce a **final estimate**. In general, this can be done in an iterative/recursive manner, i.e. final estimates obtained at round N can be used as input data in subsequent rounds.
Historical data can be used to improve estimates, e.g. to calibrate clocks and/or filter out erroneous readings.
Besides time data, there are parameters which are very important for achieving good accuracy and precision, e.g. reference clock accuracy, local clock drift and network delay bounds.
### Reference clocks
Correct reference clocks are synchronized to a common time standard. Let's denote the unobservable "true" time, as defined by the standard, with the small letter $t$. Reference clocks predict it up to some offset and scale (prescribed by the common time standard), which we ignore.
We model a *reference clock* as a function from real time to an interval, i.e. $R_q(t)$ means the estimate of reference clock $q$ at the point of real time $t$. If the clock is correct, then $t \in R_q(t)$.
### Local clocks
A node has immediate access to an internal local clock, which has an unobservable offset relative to a reference clock. The clock rate can also drift; however, for a correct local clock, the drift is bounded.
We model local clocks in a similar way, i.e. a local clock is a function from real time to a point estimate (in general, it can be an interval too).
The defining property of a correct local clock $q$ is that it can estimate intervals of time: $\exists \rho \gt 0\ \forall t_1 \gt t_2: {1 \over 1+\rho}(t_1-t_2) \le L_q(t_1)-L_q(t_2) \le (1+\rho)(t_1-t_2)$. An immediate consequence is that a real time interval is bounded by local measurements in the same way, i.e. ${1 \over 1+\rho}(L_q(t_1)-L_q(t_2)) \le t_1-t_2 \le (1+\rho)(L_q(t_1)-L_q(t_2))$.
Using interval arithmetic, $L_q(t_1)-L_q(t_2) \in (t_1-t_2) \cdot \delta$, where $\delta$ is the interval $[{1 \over 1+\rho}, 1+\rho]$.
## Time operations
In order to construct reliable estimates in the presence of faults, nodes need to exchange their clock readings and/or estimates, i.e. to perform **time transfer**. However, they cannot do that too often, especially assuming there are lots of nodes. So, nodes need to perform **time keeping** in between, i.e. update their estimates as time passes. In a BFT context, remote clock readings can be faulty, including Byzantine faults, so a node should **filter** out erroneous readings. In general, we consider **filtering** a part of **sensor fusion**, since robust **sensor fusion** procedures explicitly filter out outliers.
Nodes' clocks may have varying properties (clock accuracy, stability, drift, etc). In order to reduce the amount of data to exchange, a common standard is assumed, which is necessarily conservative. Real devices can be more accurate. Thus, devices can be **calibrated**, and tighter bounds can be used, leading to improved accuracy and precision of final estimates.
### Time keeping
Let's assume a node $p$ obtained a time estimate $T(t_1)$ at time $t_1$ (e.g. read a reference clock or received an estimate over the network from another node). Assume also that it measured its local clock $L(t_1)$ at $t_1$. Then, later, at time $t_2$, it can calculate an updated estimate using its local clock: $T(t_2) = T(t_1) + \delta \cdot (L(t_2)-L(t_1))$, where the interval $\delta$ accounts for the local clock drift during the period. In general, it's more correct to call it $p$'s estimate of $T$ at time $t_2$, i.e. $E_p T(t_2)$, where $E_p$ denotes 'estimate by $p$'. For example, process $q$'s estimate of process $p$'s time can be expressed as $E_q E_p T(t)$.
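The time keeping step is a one-liner over the `Interval` sketch above ($\rho$ is the nominal drift bound; names are illustrative):

```kotlin
// delta = [1/(1+rho), 1+rho], the drift interval around 1 (second per second)
fun driftInterval(rho: Double) = Interval(1.0 / (1.0 + rho), 1.0 + rho)

// Time keeping: T(t2) = T(t1) + delta * (L(t2) - L(t1))
fun keepTime(storedT: Interval, lThen: Double, lNow: Double, rho: Double): Interval =
    storedT + driftInterval(rho) * (lNow - lThen)
```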
### Time transfer
Nodes can access remote clocks via message exchange, which entails a message delay. The delay is unknown, but for a correctly functioning network at the moment of message transmission, we assume the delay is bounded, which we denote with the letter $\Delta$. The bound can be path dependent, i.e. we use $\Delta_{pq}$ to denote the delay bounds when sending a message from $p$ to $q$.
We model time transfer in the following way:
- process $q$ maintains an estimate of real time, which can be a reference clock reading or a derived estimate. Typically, we assume it's a real time estimate, i.e. $t \in E_q(t)$, if the process is correct around $t$.
- at the moment $t_1$, $q$ sends its estimate $E_q(t_1)$ to process $p$.
- $p$ receives the message at time $t_2$ and records the local time $L_p(t_2)$. At $t_2$, $p$ can estimate $E_q(t_2) = E_q(t_1) + \Delta_{qp}$, since we assume that the network delay $t_2-t_1 \in \Delta_{qp}$. We assume that clock drift over the period is negligible (it can be incorporated into $\Delta_{qp}$).
So, process $p$ can use the above *time keeping* formula to obtain an estimate $E_q(t)$ at any later time $t$, by calculating the interval $E_q(t) = E_q(t_1) + \Delta_{qp} + \delta \cdot (L_p(t)-L_p(t_2))$.
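Continuing the sketch, the time transfer step widens the received estimate by the assumed delay bound and then keeps it up to date with the local clock (`deltaQP` stands for $\Delta_{qp}$; names are ours):

```kotlin
// A remote estimate as stored by the receiver p: q's estimate widened by the
// delay bound, together with p's local clock reading at the moment of receipt.
data class RemoteEstimate(val atReceive: Interval, val lRecv: Double)

fun onReceive(sent: Interval, deltaQP: Interval, lRecv: Double) =
    RemoteEstimate(sent + deltaQP, lRecv)      // E_q(t2) = E_q(t1) + delay bound

// E_q(t) = E_q(t1) + delay bound + drift-widened elapsed local time
fun estimateNow(r: RemoteEstimate, lNow: Double, rho: Double): Interval =
    keepTime(r.atReceive, r.lRecv, lNow, rho)
```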
### Filtering
Erroneous readings can be filtered out. For example, if a reference clock is accurate within $\pm 500ms$, then two clock readings can be up to 2 seconds apart. Additionally, the local clock can drift: for example, if two reference clock readings are taken 10000 seconds apart, the local clock can drift by $\pm 10000 \cdot 0.0001 = \pm 1$ sec. So, if two such readings disagree by more than 3 seconds, then one of them is wrong. If there is a history of reference and other clock readings, then a minority of erroneous readings can be filtered out.
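One plausible formalization of such a check (the threshold formula is our assumption, using the constants of the example above): two readings can both be correct only if their difference matches the locally measured elapsed time, up to the accuracy and drift allowances.

```kotlin
import kotlin.math.abs

// True iff readings r1 and r2, taken elapsedLocal seconds apart (per the local
// clock), are mutually consistent given per-reading accuracy and drift bound rho.
fun mutuallyConsistent(r1: Double, r2: Double, elapsedLocal: Double,
                       accuracy: Double = 0.5, rho: Double = 1e-4): Boolean =
    abs((r2 - r1) - elapsedLocal) <= 2 * accuracy + rho * elapsedLocal
```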
In general, we consider filtering to be a part of robust sensor fusion, though it may be beneficial to consider filtering explicitly in some cases.
### Sensor fusion
There exist numerous procedures for robust sensor fusion (see examples [here](http://users.cis.fiu.edu/~iyengar/publication/On-Precision-Bound-of-Distributed-Fault-Tolerant-Sensor-Fusion-Algorithms-2015.pdf)). We mainly use [Marzullo's algorithm](https://en.wikipedia.org/wiki/Marzullo%27s_algorithm), which is simple and worked best in our experiments, though simulations and analysis with other fusion procedures are still to be done.
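Below is a sketch of fault-tolerant interval fusion in the spirit of Marzullo's algorithm (our transcription, parameterized by the expected maximum number of faults $f$, like the $k$ parameter in the simulations below; the linked implementation may differ in details). It returns the span of all points covered by at least $n - f$ of the $n$ input intervals; since the true time is covered by at least $n - f$ correct intervals whenever the fault bound holds, the result contains true time.

```kotlin
// Fuse n intervals, at most f of which are assumed faulty: return the smallest
// interval containing every point covered by at least n - f inputs, or null if
// no point has that much support.
fun marzulloFuse(intervals: List<Interval>, f: Int): Interval? {
    val need = intervals.size - f
    // Edge events: -1 opens an interval, +1 closes it; opens sort first at ties.
    val edges = intervals
        .flatMap { listOf(Pair(it.lo, -1), Pair(it.hi, +1)) }
        .sortedWith(compareBy({ it.first }, { it.second }))
    var count = 0
    var lo: Double? = null
    var hi: Double? = null
    for ((offset, type) in edges) {
        if (type == -1) {
            count++
            if (count >= need && lo == null) lo = offset  // leftmost point with enough support
        } else {
            if (count >= need) hi = offset                // rightmost point with enough support
            count--
        }
    }
    return if (lo != null && hi != null) Interval(lo, hi) else null
}
```

For example, `marzulloFuse(listOf(Interval(8.0, 12.0), Interval(11.0, 13.0), Interval(14.0, 15.0)), f = 1)` yields `Interval(11.0, 12.0)`, discarding the outlier `[14, 15]`.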
### Calibration
Calibration of local clocks can improve accuracy of estimates dramatically. Network delays between nodes can be measured and calibrated too.
As the resulting uncertainty depends on input data uncertainties, and nodes can keep historical data, it can be highly desirable to build more accurate models of clocks and network delays.
The general idea is that [Theil-Sen](https://en.wikipedia.org/wiki/Theil%E2%80%93Sen_estimator) or [repeated median regression](https://en.wikipedia.org/wiki/Repeated_median_regression) can be adapted to the interval setting and thus used to robustly estimate simple linear models from interval-valued historical data.
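For intuition, here is the classic point-valued Theil-Sen slope estimate (the interval-valued adaptation is left for a future write-up). Applied to historical (local time, remote time) pairs, the slope is a robust estimate of the relative clock skew:

```kotlin
// Theil-Sen estimate: the median of pairwise slopes, robust to ~29% of outliers.
fun theilSenSkew(samples: List<Pair<Double, Double>>): Double {
    require(samples.size >= 2)
    val slopes = mutableListOf<Double>()
    for (i in samples.indices)
        for (j in i + 1 until samples.size) {
            val (x1, y1) = samples[i]
            val (x2, y2) = samples[j]
            if (x1 != x2) slopes.add((y2 - y1) / (x2 - x1))  // pairwise slope
        }
    return slopes.sorted()[slopes.size / 2]                  // median ≈ clock skew
}
```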
So, a node can produce **local estimates** which filter out outlier reference clock readings with respect to historical data. The same approach can be applied to filter out outlier peer time estimates. As a result, a node produces its **final estimates** using not only local and remote time sources, but historical records too. This gives a greater ability to tolerate periods when the amount of faults exceeds the expected level, for a reasonable period of time (say, less than 1/3 or 1/2 of the period that historical data is kept in memory).
The approach can be difficult to implement and to assess. We therefore leave discussion of details until further writings.
# Design choices
In theory, there are multiple ways in which clock synchronization performance and fault tolerance can be improved. In practice, however, there are bandwidth limitations, which are critical in our case, since we assume tens of thousands of nodes. In this section, we overview the main design choices and their consequences.
## Protocol kind
We know three main ways in which a BFT clock sync protocol can be implemented:
- adapted BFT agreement
- iterative In-exact/Approximate Agreement
- non-iterative In-exact/Approximate Agreement
### Adapted BFT
There are clock sync protocols which are adapted versions of BFT Agreement protocols. We do not consider this option, since such protocols require a lot of communication, and because of a circular dependency problem: our clock sync protocol is intended to harden BFT protocols, which depend on clock synchronicity assumptions.
However, reliable exchange of parameters may require a BFT protocol.
### Iterative
There are Approximate/In-exact BFT Agreement protocols (see details [here](https://ethresear.ch/t/sensor-fusion-for-bft-clock-sync/7142)) which are iterative: participants have initial estimates and exchange them iteratively, reducing the discrepancy between honest nodes' estimates.
As we assume reference clocks synchronized to a common Time Standard, iterativity is not required. However, it can be helpful either to improve accuracy and precision of estimates or to reduce network traffic.
If the clock sync protocol is a complement to some other BFT protocol operating in a lock-step fashion (like the Ethereum 2.0 beacon chain protocol), then the existing message flow can be employed to transfer clock sync data. A particularly promising case is to use implied timestamps of the message flow, if the protocol prescribes that certain messages be sent at predefined moments of time. However, such moments of time will be defined relative to **final estimates** in most cases. That means the time estimation is iterative/recursive.
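As a minimal sketch of an implied timestamp (using the beacon chain slot timing, under which an attestation is due one third into its slot; the function name and signature are ours): the message contents alone imply a send time.

```kotlin
// Implied send time of an attestation for a given slot, assuming it is sent
// exactly at one third of the slot (in reality the send time is ambiguous,
// e.g. an attestation may be sent earlier once a valid block arrives, so the
// implied timestamp is itself an interval).
fun impliedAttestationTime(genesisTime: Double, slot: Long, slotDuration: Double = 12.0) =
    genesisTime + slot * slotDuration + slotDuration / 3.0
```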
### Non-iterative
Non-iterative fusion of clocks is the simplest option, though it requires transferring **local estimates** (and/or other local data). In particular, it does not require the fusion procedure to be a contraction, so it can give reliable estimates in a broader set of situations.
Therefore, we choose it as the main option, which can be augmented with iteration and/or a BFT protocol for parameter exchange.
## Data transfers
### Parameters
In order to calculate final estimates, nodes need to know the bounds of reference and local clock accuracies and of network delays. The data can be specific to a particular clock or network path and/or can vary over time. Exploiting tighter bounds can be beneficial for accuracy and precision. However, transferring that amount of data can be very costly.
The main option is therefore to assume predefined bounds, which are necessarily conservative, or to refresh them relatively rarely.
In some cases, tighter bounds can be obtained via **calibration** using historical data.
### Relaying peer's data
Relaying other nodes' data can be very helpful to fight Byzantine faults: for example, a faulty node can send different estimates to different peers (which can also be a consequence of varying network conditions). If nodes relay data received from peers, that can help to filter out certain faults.
However, it can easily overload network resources.
In a p2p network, nodes transit messages anyway, so it's useful to employ data relaying in a limited form. For example, a node $p$ can receive clock readings from node $q$ via different routes, due to message routing/diffusion in the p2p graph. So, this is additional information which is available "for free" to the clock sync protocol, given that it's already necessary for some other protocol.
### Local data
There are also several variants of which local data can be sent to peers.
#### All available data
A node can have several local sensors, i.e. one or several local clocks and one or several reference clocks.
**NB** We assume reference clocks to be local resources, i.e. specific to a particular node, though in practice a reference clock is an interface to receive data from a remote clock, e.g. a [GNSS](https://en.wikipedia.org/wiki/Satellite_navigation) satellite or an [NTP](https://en.wikipedia.org/wiki/Network_Time_Protocol) server. Transferring all this info can be beneficial for accuracy and precision, but would require a lot of traffic.
#### Local estimates only
To reduce traffic, instead of sending all its info to other nodes, a node can send only its **local estimate**, which is a fusion of its local and reference clocks.
#### Final estimates
As pointed out earlier, a node can send its **final estimates** only, which are a fusion of local data sources together with data received from other nodes. That can reduce network traffic if the node participates in another protocol where it should send messages at predefined moments of time.
Recipients of such messages can then calculate the implied timestamps of when each message had to be sent. As a result, the protocol can piggyback on the existing message flow with zero overhead, which can be very important in some cases.
For example, in the Ethereum 2.0 beacon chain protocol, attestation messages are to be aggregated. The aggregation format is very efficient, i.e. one bit per attester is used to denote the attestation producer, and attester signatures are aggregated using the BLS signature aggregation scheme. Adding even one new field to attestation messages would increase the size of an aggregate considerably.
# Implementation
Before discussing particular implementation strategies, let's overview general assumptions and terminology.
## Node-level assumptions
Each node:
- has one or several reference clocks (RC)
- has one (or several) local clocks (LC)
- maintains a local estimate (LE), by fusing its local data (RC and LC)
- maintains a final estimate (FE), by fusing data received from other nodes with local data (e.g. LE, or RC+LC)
- exchanges estimates with peer nodes (PE)
## Network-level assumptions
Connectivity and membership:
- membership is static (we will extend to dynamic membership in the future)
- communication graph is p2p, though it can be fully connected in a simple case
## Parameter-level assumptions
Parameters:
- three groups of interval bounds:
  - network delay: can be the same (for all paths) or path dependent (e.g. $\Delta_{pq}$)
  - reference clock accuracy: can be the same (for all nodes) or node/clock dependent
    - known to nodes, typically need not be estimated
  - local clock drift: typically the same for all nodes (e.g. $100\ ppm$)
    - real (vs nominal) clock drift can be smaller in magnitude; it can vary from node to node and can be estimated. We assume it's not known to nodes, though it can be specified during simulations
- adversary power limitations:
  - unforgeable signatures
  - can control a minority of nodes (or their clocks), typically $\lfloor {(n-1) \over 3} \rfloor$ or $\lfloor {(n-1) \over 2} \rfloor$
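For concreteness, the assumed parameters could be bundled as follows (names and defaults are illustrative, matching the simulation settings below):

```kotlin
// Protocol parameters assumed to be known beforehand (all values illustrative).
data class ClockSyncParams(
    val refClockAccuracy: Interval = Interval(-0.5, 0.5),  // ±500 ms
    val networkDelay: Interval = Interval(0.0, 2.0),       // [0 ms, 2000 ms]
    val rho: Double = 100e-6,                              // local clock drift, 100 ppm
    val maxFaults: (Int) -> Int = { n -> (n - 1) / 3 }     // adversary bound per node count
)
```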
## Initial design
We start evaluation of the approach with the simplest and most lightweight design:
- one RC and one LC per node
- LE is maintained by simple time keeping (no filtering, calibration, etc)
- exchanged estimates are either LE or FE (the latter results in a form of recursion)
- FE is a fusion of LE plus remote estimates (peers' LE or FE)
- n-to-n connectivity, nodes communicate by direct channels
- membership is static
- all parameters are known beforehand
### Viability
Below is a sketch of a proof that the approach is viable. A more formal and rigorous proof with worst-case bounds analysis is to be presented in a future work.
A correct node can maintain an LE estimate (i.e. the latest RC reading adjusted with elapsed time): $LE(t) = RC(t_1) + \delta \cdot (LC(t) - LC(t_1))$, s.t. $t \in LE(t)$, if no fault occurred.
A correct node $p$ can maintain estimates of the LEs (or FEs) of other nodes $q$: $E_p(LE_q(t)) = LE_q(t_1) + \delta \cdot (L_p(t) - L_p(t_1)) + \Delta_{qp}$, s.t. $t \in E_p(LE_q(t))$, if no fault occurred.
Under the third-minority assumption (the total amount of faulty intervals is $\le \lfloor {(n-1) \over 3} \rfloor$), a correct node can calculate a fused interval which still contains true time.
The width of the fused interval is less than the max width of the correct input intervals if the fusion procedure is a contraction. That is important if nodes exchange FE-kind estimates. However, if the width expansion is limited, it can be acceptable for nodes to exchange LE-kind estimates. The latter can be beneficial, since the fusion procedure can then give meaningful results in a broader set of cases (e.g. when almost half of the reference clocks are faulty).
**NB** Care should be taken in the case of recursive estimation sent over the network, since network delays widen the estimate intervals. To the best of our knowledge, this can be handled by some sensor fusion approaches, and our preliminary experiments support this. We will investigate the issue in more detail and more rigorously in future works.
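Putting the pieces together, a node's behavior under the initial design can be sketched as follows (reusing the helpers sketched in the Framework section; this is a simplification for exposition, not a copy of the linked implementation):

```kotlin
// One RC, one LC; LE kept via time keeping; FE fused from LE plus peer estimates.
class Node(val rho: Double, val deltaNet: Interval) {
    private var lastRC = Interval(Double.NEGATIVE_INFINITY, Double.POSITIVE_INFINITY)
    private var lastRCLocalTime = 0.0                    // LC reading at the last RC reading
    private val peers = mutableMapOf<Int, RemoteEstimate>()

    fun onRefClockReading(rc: Interval, lNow: Double) {
        lastRC = rc
        lastRCLocalTime = lNow
    }
    fun onPeerEstimate(peerId: Int, sent: Interval, lNow: Double) {
        peers[peerId] = onReceive(sent, deltaNet, lNow)  // widen by the delay bound
    }
    // LE(t) = RC(t1) + delta * (LC(t) - LC(t1))
    fun localEstimate(lNow: Double) = keepTime(lastRC, lastRCLocalTime, lNow, rho)
    // FE: robust fusion of LE and the kept-up-to-date peer estimates
    fun finalEstimate(lNow: Double): Interval {
        val inputs = peers.values.map { estimateNow(it, lNow, rho) } + localEstimate(lNow)
        val f = (inputs.size - 1) / 3                    // third-minority fault bound
        return marzulloFuse(inputs, f) ?: localEstimate(lNow)
    }
}
```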
### Implementation code
We have implemented the simple protocol as well as simulation code in Kotlin, aiming for simple and concise code which is easy to analyze. The code is available [here](https://github.com/ericsson49/research/blob/master/clock_sync/src/clock_sync.kt).
#### Preliminary simulation results
We have simulated a simple attack, where an adversary controls some fraction of the nodes' reference clocks (emulating [NTP attacks](https://hackmd.io/bSE1ArzpSfWLw6F-6wyWhQ)).
Using Marzullo's algorithm, the protocol can tolerate large clock offsets incurred by the adversary, for < 1/3 or < 1/2 faulty clocks, depending on algorithm settings.
Simulation parameters:
- N = 1000 nodes
- ref clock accuracy $\pm 500 ms$
- network delay $\Delta = [0, 2000ms]$
- (N-1)/3 or (N-1)/2 faulty clocks
- k (expected maximum amount of faults) = (N-1)/3 or (N-1)/2
- one- or two-sided attack (i.e. adversary shifts faulty clocks in the same or in both directions)
- faulty clock offset: 10000 ms
FE correct means the FE interval contains the true value.
Precision means the discrepancy between nodes' FE midpoints.
Offset means the approximate offset of the FE (it should be less than 1500 ms for the FE to be correct).
|attack type|#faults|k|FE is correct| offset|precision|
|---|---|---|---|---|---|
|no attack|0|333|yes|$\pm$ 100ms|~170ms|
|one-sided|333|333|yes|200ms|~200ms|
|one-sided|499|333|no|~5000ms|~200ms|
|one-sided|0|499|yes|$\pm$ 100ms|~200ms|
|one-sided|333|499|yes|~600ms|~240ms|
|one-sided|499|499|yes|<1500ms|~250ms|
|two-sided|666|333|yes|$\pm$ 100ms|~300ms|
The resulting time estimates remain correct under attack (i.e. the resulting interval includes true time). Accuracy becomes worse, but stays within the expected bound (< RC accuracy + max delay / 2 = 1500 ms), with the exception of the case when the amount of faults exceeds $k$ (the maximum expected amount of faults), where failure is expected. The discrepancy between nodes' estimates (a form of precision measure) doesn't change much, even under severe attacks.
An interesting point is that the protocol can tolerate about 2/3 faulty clocks if < 1/3 of the clocks are too fast and < 1/3 are too slow. The precision loss is moderate, and estimates remain accurate on average (quite similar to the no-attack case).
Besides attacks, network delays seem to be the major factor affecting precision, since network delays are Byzantine to some degree, i.e. when a node broadcasts its estimate to peers, different peers can receive the message with varying delays (including message omissions, modeled as delivery in some distant future).
Simulation details are to be described in an upcoming write-up.
## Further improvements
The initial design can be improved:
- several reference clocks can be used (a typical NTP setup includes 4 NTP servers), as well as several local clocks
- historical data can be employed:
  - the skew of the node's own local clock can be estimated from data
  - the skew of peer local clocks can be estimated too
  - actual network delays can be estimated
  - outlier reference clock readings can be filtered out
  - outlier peer time estimates can be filtered out
- p2p graph support:
  - clock readings received via different routes can be supported (e.g. employing transit traffic)
  - network delays along paths can be estimated (if a message contains information about its path)
We are going to implement and evaluate these designs in the near future.
<!--
## Parameters transfer and estimation
Each node needs to know the parameters to calculate estimates; however, they have to be communicated somehow. Reliable data exchange in a BFT context can be a problem, so we assume parameters are known beforehand.
In practice, this means a limited set of parameters, i.e. the same local drift bound (e.g. 100 ppm), the same reference clock accuracy (e.g. $\pm 500ms$), the same network delay (e.g. 2 sec).
It's safe to assume a bigger interval, so this is reasonable in many cases. However, it lowers the available precision and accuracy. So, it's desirable to communicate more accurate estimates somehow.
If the p2p-graph topology is known or messages contain their path, then path-dependent delay estimates can be constructed (e.g. 1 second per hop). Nodes can disseminate their reference clock accuracy periodically (to lower the load on the network).
An alternative to transfer is estimation from historical data. Drift of own local clocks and local clocks of peers can be estimated with a robust model. Network delays can be estimated too.
### Local clocks
In reality, a local clock returns discrete measurements, so two consecutive reads can return the same result, breaking the above assumption. Indeed, there is some accuracy associated with reading a local clock. We can either add a margin to local clock readings or assume that the clock is not used to measure too-short intervals.
Actually, in practice there are other delays, e.g. there can be a delay between an event and the time when it is measured.
We assume that a node fully controls its local clock, with the exception of minor random fluctuations, which cannot be controlled by an attacker either. So, we assume the local clock inaccuracy is insignificant (or can be made so) compared to network delays. Local clock faults are possible, but they cannot be induced by an attacker and are independent.
Basically, such inaccuracy can be subsumed by network delays. And local clock faults can be subsumed by node faults.
## Communication costs
There can be many ways to organize fusion of the available clocks. The main limiting factor is the network resources required to transfer data. As we assume there can be tens of thousands of nodes, mutual clock exchange can be very expensive. For example, if each node sends its clock to every other node, then there are $n(n-1)$ messages in total per round. But each node has lots of time estimates: its own local clock (or clocks), typically several reference clocks (e.g. 4 time servers in the default NTP setup) and time estimates from other nodes.
In a BFT context, where two-faced clocks are expected, propagating other nodes' estimates via multiple paths would be beneficial, though very expensive.
Therefore, we assume that each node sends a small number of estimates per round. There can be the following cases:
- piggyback existing message flow, i.e. use existing network-level timestamps or implied timestamps, which can be calculated from message contents (e.g. slots in blocks and attestations).
- send an explicit estimate
- send an explicit estimate plus accuracy information (e.g. interval width)
- send several estimates, e.g. own local clock plus reference clock (with or without interval widths)
Sending interval widths can improve accuracy and precision, since otherwise one has to assume the maximum admissible interval, which is necessarily wider.
## Sensor Fusion to estimate Time
### Most simple approach
#### Clock exchange
Each node $p$ maintains its local estimate of true time, $t \in ET_p(t)$, combining the available clocks in some way, including past records if needed.
When appropriate, it sends the estimate to others, either implicitly or explicitly, either as an interval or as a point estimate.
#### Estimate fusion
Each node also receives other nodes' estimates. At any point in time, it can use the *time keeping* formula to calculate adjusted estimates, based on the actual time the estimate was received.
**NB** Due to clock drift, the more time has passed since receiving, the worse the accuracy/precision will be. Therefore, each peer estimate should be refreshed within an appropriate period of time (e.g. if $\rho = 100 ppm$ and a refresh happens about once every $D = 10000$ seconds, then two clocks can differ by about $2 \rho D = 2$ sec).
So, each node maintains a set of estimates $ET_q(t)$ from its peers, with the property that $t \in ET_q(t)$ if there has been no fault, or the faults cancel each other.
As we assume a limited amount of faults, the node can apply a robust interval averaging function, like Marzullo's algorithm or the Brooks-Iyengar algorithm. Non-interval averaging can be adapted too.
# Rest/Drivel
In the SF Framework, a clock synchronization protocol typically consists of two parts:
- a clock offset exchange protocol, where nodes exchange relative clock offsets
- time source fusion, where gathered data is fused with local and remote reference clock readings
## Clock offset exchange
In order to minimize changes to the beacon chain protocol, it's tempting to build a clock offset exchange protocol by piggybacking on the existing message flow.
### Message flow and implied timestamps
Validators issue two main kinds of messages which are propagated to all participants, which makes them great candidates for clock synchronization: blocks and attestations.
However, blocks are proposed relatively rarely: e.g. if there are 300K validators and the slot duration is 12 seconds, then each validator will propose on average once every 300K * 12 / 3600 / 24 = 41.667 days, which is very rare.
An attestation should be issued each epoch by each validator, which is a much better rate. An attestation should be issued after one third of a slot has transpired or after a valid block for the slot has been received, whichever comes first. This ambiguity makes it more difficult to calculate the implied timestamp of an attestation; however, given the high rate of such messages (every validator should issue an attestation once per 32*12 seconds), this gives plenty of opportunities to obtain precise clock differences. Worst-case bounds may be not so good, however.
An alternative is to append a field recording an offset, which specifies the sending time relative to the one-third-of-a-slot point in time.
However, the beacon chain protocol assumes an attestation aggregator role (one per committee), which sends an aggregate once two thirds of a slot have elapsed. There are at most 64 committees per slot and 32 slots per epoch, which means 2048 validators send an aggregate each epoch. For 300K validators, one validator will send an aggregate once per ~150 epochs; viewed from a different angle, ~150 epochs are needed to gather aggregate messages from all participants, which means 150 * 32 * 12 sec = 57600 sec. Assuming 100 ppm clock drift, a validator clock can drift by up to around 11.5 sec (2 * 57600 * 100 / 1000000).
However, it's expected that there will be far fewer nodes than validators, e.g. around 10K nodes. Clocks are attached to nodes, so in general nodes will send messages much more often, depending on the distribution of the number of validators per node. So, messages from all nodes will be gathered several times faster.
Summing up, there is an inherent uncertainty of several seconds, due to either the uncertainty of when an attestation has been sent or the clock drift over the period required to gather attestation aggregates from all nodes.
One improvement is to require an attestation to be sent within a 1 sec window before the one-third-of-a-slot point, which limits the uncertainty of the implied timestamp of an attestation to 1 second. As significant time is required to prepare and propagate a block and to verify it by attesters, a 1 second window is a reasonable compromise.
## Time source fusion
In the case of implied timestamps, each node has three kinds of time sources:
- local clock(s)
- reference clocks
- implied timestamps of incoming messages
A very simple approach is to use Marzullo's algorithm (or the slightly more complex Brooks-Iyengar algorithm) to fuse the readings of the clocks, which behaved quite well in simulations.
There can be different ways to implement that, but the simplest one is to use the local clock disciplined by the node's reference clocks to provide local intermediate estimates of world time (since reference clocks are assumed to be synchronized to the world time standard). The local estimate can be combined with other nodes' estimates, which are obtained from the implied timestamps of the incoming message flow. E.g. when an incoming message arrives, a node records the receiving timestamp, calculates the implied timestamp of the message, calculates an interval estimating the sender-receiver clock offset and stores it in a table.
When a fused time estimate is required, a node forms an array of intermediate estimates, which consists of:
- local intermediate estimate, calculated as described above
- for other nodes, the most recent clock offset interval is retrieved from the corresponding table, and the interval is adjusted according to the estimated clock drift, proportionally to the time elapsed since the last message was received
Then Marzullo's (Brooks-Iyengar) algorithm can be used to calculate the required fused estimate.
One simple modification is that a node can keep a history of relative clock offset estimates per remote node, as well as a history of its own reference clock readings. The histories can be filtered using robust statistics or the robust regression approaches described earlier.
This will result in an approach which uses "spatial" as well as "temporal" fusion of clock readings.
In general, if a clock rate is estimated more accurately with a robust clock calibration approach, the info can be propagated to other nodes, so that more accurate clock drift adjustments can be made. However, this would require a BFT broadcast protocol, which could complicate the clock sync protocol significantly.
-->