---
tags: Time
---
# Time Considerations of Ethereum 2.0
Goals of the doc are:
- problem statement
- problem analysis
- establishing terminology
- overview of implementation opportunities
- analysis of problems and attacks
- overview of solutions and ways to solve problems
prelim notes
terminology
time sync
time keeping
XO props
## Preliminary notes
Initially, this work started as research into node time synchronization, based on Vitalik's proposal (link??). Over the course of discussions with colleagues, the scope has expanded.
First, different people have different uses of time in mind, which leads to frequent misunderstandings. Time turned out to be a very familiar, yet far from trivial, concept.
Second, terminology differs among people, which leads to the same kind of misunderstanding.
Third, the Ethereum 2.0 specs lack a comprehensive overview of time requirements (and an analysis of the problems that arise when the requirements are not met).
Fourth, Ethereum 2.0 research takes a somewhat different route than the more traditional academic research on BFT protocols. In particular, the latter treats the time aspect as foundational, while the former arguably treats it as a third-class citizen.
This work attempts to resolve these issues or, at least, to start the process.
### BFT considerations
In a non-BFT context (assuming fail-stop or similar faults), it's typical to rely on NTP to synchronize clocks. Other methods could be employed, such as GPS or radio-wave time synchronization. While these are not expensive, it's hardly appropriate to require anything besides NTP.
However, relying on NTP raises the question of whether such a distributed system is really decentralised. And what are the BFT properties of such a system?
Ethereum 2.0 relies on the assumption that validator clocks are synchronised. If clocks are synchronized via NTP (as is expected), then the overall system is vulnerable to NTP-level attacks.
However, nodes have on-board crystal oscillators (XO), either in the form of a Real Time Clock or the XO driving the CPU. The former typically has a battery, which preserves the clock offset across restarts. So the idea arises that a time synchronization protocol can be implemented that relies on the nodes' internal clocks.
There is a limited set of world time standards (atomic clocks are somewhat expensive), so in general, BFT properties become limited by the ability of an adversary to control clocks or the sync paths to them.
#### Validator clock synchronization
Clock phase synchronization is needed so that validator actions (block/attestation production) are coordinated. However, one can investigate whether `logical clocks`, i.e. message-level synchronization, can be employed here.
#### World time synchronization/syntonization
Slot duration should remain as specified by the specs, despite potential "pressure" from the clock synchronization protocol and/or validators' incentive to receive more frequent rewards.
That can be specified as clock rate synchronization, or syntonization, with a world time standard.
Around `GENESIS`, validator clock phase should be synchronized with a world time standard, so that the chain can start at the correct moment. But that should not be a big deal.
A more interesting question is: should validator clock offset/phase continue to be synchronised with world time throughout subsequent chain operation?
It's definitely beneficial on the application layer, probably required on the network layer, and necessary for debugging purposes; however, an NTP service should be enough for these goals.
## Time requirements
### Initial sweep through specs
In the case of Ethereum 2, the starting point related to time is the following quote (from [fork-choice specs](https://github.com/ethereum/eth2.0-specs/blob/dev/specs/core/0_fork-choice.md#fork-choice))
> Honest nodes are assumed to have clocks synchronized within SECONDS_PER_SLOT seconds of each other
An additional requirement is that slot duration should be `SECONDS_PER_SLOT` (as specified by the [specs](https://github.com/ethereum/eth2.0-specs/blob/dev/specs/core/0_beacon-chain.md#time-parameters)). This requirement may seem trivial or even excessive, but a clock synchronization protocol may synchronize the local clocks of participating nodes with each other while those clocks drift relative to the world clock. Clock rate translates into reward rate, e.g. a faster rate results in increased rewards and vice versa. So cryptoeconomic assumptions must hold as well.
There may be an incentive to speed the whole system up, as well as a disincentive for participants if the whole system is slowed down.
However, the [p2p-interface spec](https://github.com/ethereum/eth2.0-specs/blob/dev/specs/phase0/p2p-interface.md#configuration) assumes `MAXIMUM_GOSSIP_CLOCK_DISPARITY` to be `500ms` at the time of writing.
This is not necessarily a contradiction, because these are requirements on different layers.
## Terminology
Accuracy
Trueness
Precision
Bias
Rate
Offset
Skew
Drift
Handover
Synchronization
Syntonization
## Requirements analysis
### Clock synchronization
The reason for clocks to be roughly synchronized is that (honest) validators should behave in a lockstep fashion. The main actions, block proposal and attesting to a head, should be performed at moments that are specified relative to the slot start, so slots should start at roughly the same moment for different (honest) validators.
If a validator's clock is severely slow, then other validators will ignore its votes. That effectively means one less honest validator. The more validators there are with slow clocks, the higher the risk that there will not be enough votes to justify an epoch.
Another case is block proposal. If a proposer has a slow clock, then attesters won't see its block when they are due to attest to the latest head.
TODO Obviously, the validator clocks should be synchronized to within a second or even better, since any clock disparity reduces the available communication window (to propagate blocks/attestations).
The latest p2p-interface spec update also assumes a maximum clock disparity of 500ms.
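To make the effect of a clock offset concrete, here is a minimal Python sketch. It assumes that slots are counted from a genesis timestamp in steps of `SECONDS_PER_SLOT` (set to 12 here purely for illustration); the helper names `slot_at` and `seconds_into_slot` are hypothetical and are not taken from the specs.

```python
SECONDS_PER_SLOT = 12  # assumed value, for illustration only


def slot_at(local_time: float, genesis_time: float) -> int:
    """Slot number a node believes it is in, given its own clock reading."""
    return int((local_time - genesis_time) // SECONDS_PER_SLOT)


def seconds_into_slot(local_time: float, genesis_time: float) -> float:
    """How far into its current slot the node believes it is."""
    return (local_time - genesis_time) % SECONDS_PER_SLOT


genesis = 1_000_000.0
true_time = genesis + 5 * SECONDS_PER_SLOT + 0.25  # 250 ms after slot 5 starts

accurate_node = true_time    # clock agrees with the reference
slow_node = true_time - 0.5  # clock is 500 ms behind

print(slot_at(accurate_node, genesis))  # 5
print(slot_at(slow_node, genesis))      # 4, the slow node is still in slot 4
```

In this toy setup the slow node would act on slot 5 half a second later than its peers, which is time taken out of the window available for propagating blocks and attestations.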
### Clock syntonization with global clock
### Local clock accuracy
A modern computer typically has a Real Time Clock (RTC), which is built around a quartz crystal oscillator (XO). Even if an RTC is absent, the CPU is driven by a quartz XO.
A typical RTC is not stable in the long term:
- Internal (RTC) clocks are not stable in general, so most users rely on NTP services to synchronize clocks. However, NTP servers are semi-decentralized, i.e. there are only several "true" clocks and other servers are just relaying the information.
Can we build distributed clock synchronization based on local node clocks?
There is extensive literature on Distributed Clock Synchronization, including Byzantine-tolerant variants.
I won't cover it at the moment, but the main factors limiting accuracy seem to be network delay, delay instability, and graph diameter. E.g. if network delay is about 200ms and graph diameter is about 5, then we can expect accuracy of around 1-2 seconds.
If we reduce the diameter by building an n-to-n graph, then the communication overhead grows quadratically, which can be a problem for 10K-node networks.
Overall, that might be acceptable. However, I need to consult the literature for details.
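As a rough reading of the numbers above, assume (as a simplification, not a result from the literature) that the error accumulates by one network delay $d$ per hop across a graph of diameter $D$:

$$
\text{error} \approx d \cdot D = 0.2\,\text{s} \cdot 5 = 1\,\text{s}
$$

For the n-to-n alternative, each synchronization round needs a message between every ordered pair of nodes, so with $n = 10^4$:

$$
n(n-1) = 10^4 \cdot (10^4 - 1) \approx 10^8 \text{ messages per round}
$$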
### Problems with local clock consensus
If nodes synchronize clocks with a distributed consensus algorithm, they will have roughly the same bias and clock frequency. However, both bias and frequency can differ from the world clock.
E.g. slot time can be 10 seconds instead of 12, but if all participants have roughly 10-second slot durations, it's not a problem from the local clock synchronization perspective.
However, this is a problem from the cryptoeconomic point of view, since rewards are recalculated each epoch and the reward size doesn't depend on the actual time spent.
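To spell out the arithmetic behind this example, assume the per-epoch reward is fixed and only the real slot duration $T$ changes. The reward per unit of world time then scales as $1/T$:

$$
\frac{\text{reward rate at } T = 10\,\text{s}}{\text{reward rate at } T = 12\,\text{s}} = \frac{12}{10} = 1.2
$$

Under this assumption, validators would collect rewards about 20% faster in world time than the specs intend, even though their clocks agree with each other.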
## Terminology
### World time - UTC
The ultimate goal is to synchronize node clocks with the universal world time standard (UTC, for example).
In the ML sense, we should design a world time prediction algorithm based on the available time sources.
If prediction errors are small for all participants, then discrepancies between local times are also small. If they are not, we may need to impose an additional restriction on pairwise discrepancies.
### Local (quartz) clocks
Each node should have a local quartz clock, either an RTC (Real Time Clock) or a processor quartz/tick counter.
The actual frequency is not known exactly and may vary due to temperature changes (the most important factor) or aging.
So local quartz clocks can drift by around several seconds a day, which is a problem for Eth2.
If calibrated properly, we can expect better stability.
The main advantage of a local quartz clock is that it's always accessible and cannot be attacked by an adversary (or such an attack is extremely difficult).
We assume that local quartz clocks are accessible in the form of $\frac{ticks}{rf}$, where `rf` is a reference frequency and `ticks` is the number of ticks since system start.
We assume that `ticks/rf` may slowly deviate from world time, at a rate of about several seconds per day.
We also assume that `rf` may slowly deviate over time from its initial value, so that, when calibrated, $\frac{b \cdot ticks}{rf}$ deviates from world time much more slowly, e.g. several seconds per month, with `b` being a calibration factor.
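The following Python sketch illustrates the role of the calibration factor `b`. The function names, the 32,768 Hz nominal frequency and the "2 seconds fast per day" figure are assumptions chosen for illustration; only the form `b * ticks / rf` comes from the text above.

```python
def calibrated_seconds(ticks: int, rf: float, b: float = 1.0) -> float:
    """Elapsed time since system start, computed as b * ticks / rf."""
    return b * ticks / rf


def estimate_b(ticks: int, rf: float, reference_elapsed: float) -> float:
    """Calibration factor that makes b * ticks / rf match a trusted elapsed time."""
    return reference_elapsed * rf / ticks


RF = 32_768.0   # nominal reference frequency in Hz (illustrative)
DAY = 86_400.0  # seconds of world time

# Hypothetical crystal that runs fast: it produces 2 extra seconds' worth
# of ticks during one day of world time.
ticks_after_one_day = int(RF * (DAY + 2.0))

uncalibrated = calibrated_seconds(ticks_after_one_day, RF)   # 2 s ahead
b = estimate_b(ticks_after_one_day, RF, DAY)                 # slightly below 1
calibrated = calibrated_seconds(ticks_after_one_day, RF, b)  # back to about one day
```

A single calibration like this only removes the rate error observed so far; since `rf` itself may keep changing with temperature and aging, `b` would need to be re-estimated from time to time.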
### NTP/correction oracle
Atomic clocks are expensive, so cheaper means of synchronizing clocks are used. RF clock synchronization exists, but it is rare. We assume that most systems will use NTP to correct their local clocks. However, relying on NTP servers makes the whole system less decentralized, e.g. NTP servers may be used for attacks.
One way to mitigate the problem is to use multiple NTP servers. Still, the NTP ecosystem is semi-centralised, because the number of "true" time sources is limited.
Our assumption, however, is that most of the time NTP servers are more or less correct (i.e. the discrepancy between NTP and world time is low), while it's possible that an NTP server occasionally deviates significantly from world time.
We also assume that such periods of deviation are not correlated between different NTP groups, i.e. the probability that two NTP groups are broken at the same time is negligible.
### Rough idea
We have several sources of time:
- local clocks
  - stable in the short term, but not reliable enough in the long term
- NTP/oracle corrections
  - stable in the long term, but can occasionally glitch in the short term (e.g. an attack or network problems)
- consensus time
  - participants can exchange their local time information to form some kind of distributed clock synchronization activity
  - in some sense, we have timestamps from other participants; however, we assume there is an aggregate value, like a median of delay-corrected timestamps or so.
One may build a corrected time source based on the three sources of time, so that:
- the corrected local time is synchronized between participants
- the corrected local time is synchronized with world time
- the corrected local time is tolerant to NTP attacks as well as to byzantine behaviour of some participants
  - to some degree
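As one possible reading of the "median of delay-corrected timestamps" idea mentioned above, here is a minimal Python sketch. The function name `consensus_time` and the sample layout are hypothetical, and it assumes each peer timestamp comes with an estimate of its one-way network delay.

```python
from statistics import median


def consensus_time(samples: list[tuple[float, float]]) -> float:
    """Median of peer timestamps, each shifted by its estimated one-way delay.

    Each sample is (peer_timestamp, estimated_delay), both in seconds.
    """
    if not samples:
        raise ValueError("need at least one sample")
    return median(ts + delay for ts, delay in samples)


samples = [
    (100.00, 0.10),
    (100.05, 0.08),
    (99.95, 0.12),
    (100.02, 0.10),
    (500.00, 0.10),  # a faulty or byzantine peer
]
estimate = consensus_time(samples)  # stays near 100.1 despite the outlier
```

The median is used here because it is unaffected by a minority of arbitrarily wrong values, which matches the "tolerant to some degree" goal; a mean would be pulled far off by the single bad peer.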
## Problem statement
We consider calibration of base clocks in the form of $a + b * t$ so that the discrepancies between different reference times are minimised in some sense.
## Robust time calibration
Consider a "local" setup, where we have a local quartz clock and a time oracle (e.g. NTP server(s)).
The local quartz clock is reliable in the short term (e.g. hours, or even days/weeks when calibrated), but may deviate in the long term (days, weeks, months).
We also have a world time oracle, which can be queried for the current world time, so that we can calculate a time correction, i.e. the difference between "world time" and local quartz time.
We assume the oracle time contains two errors:
- network delay noise, i.e. request delay differs from response delay, expected value is zero
- occasional attacks, which are relatively rare and don't last long, but can add arbitrary error.
Ignoring the last error, the expected value of oracle time is equal to the world time (we can also assume negligibly small systematic error here).
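To show where the network delay noise comes from, here is a sketch of the standard four-timestamp offset calculation used by NTP-style protocols. The text above does not specify how the oracle is queried, so treating it as a request/response exchange with these four timestamps is an assumption, and `offset_and_delay` is a hypothetical helper name.

```python
def offset_and_delay(t0: float, t1: float, t2: float, t3: float) -> tuple[float, float]:
    """Estimate clock offset and round-trip delay from one request/response.

    t0: request sent      (local clock)
    t1: request received  (oracle clock)
    t2: response sent     (oracle clock)
    t3: response received (local clock)
    """
    offset = ((t1 - t0) + (t2 - t3)) / 2  # exact only if both directions take equally long
    delay = (t3 - t0) - (t2 - t1)         # time spent on the network, both ways
    return offset, delay


# Local clock is 2.0 s behind the oracle; the request takes 30 ms,
# the oracle needs 10 ms to answer, and the response takes 50 ms.
offset, delay = offset_and_delay(t0=10.00, t1=12.03, t2=12.04, t3=10.09)
# offset is about 1.99 (true value 2.0), delay is about 0.08
```

The 10 ms error in the estimated offset is half the difference between the request and response delays, which is exactly the kind of error the first bullet describes.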
The idea is to pose a robust linear regression problem, possibly in recursive form (robust recursive filtering).
I.e. $y_t = a + b*l_t + e_t + a_t$, where:
- $l_t$ - local quartz time samples (taken periodically, once an hour or a day or so)
- $y_t$ - oracle responses at $l_t$
- $e_t$ - zero-mean network noise
- $a_t$ - occasional attack noise, seen as outliers
A robust regression estimator of the $a$ and $b$ coefficients (bias and rate) can be constructed, e.g. RANSAC or (sliding-window) LAD (piecewise-)linear regression.
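As an illustration of the RANSAC option, here is a minimal pure-Python sketch that estimates $a$ and $b$ from samples $(l_t, y_t)$. It is a sketch under stated assumptions, not a tuned implementation: the function names, the inlier `threshold`, the iteration count, and the synthetic data (hourly samples, 10 ms noise, two 30-second attack outliers) are all chosen for illustration.

```python
import random


def least_squares(ls, ys, idx):
    """Ordinary least-squares fit of y = a + b*l over the samples in idx."""
    m = len(idx)
    mean_l = sum(ls[k] for k in idx) / m
    mean_y = sum(ys[k] for k in idx) / m
    sxx = sum((ls[k] - mean_l) ** 2 for k in idx)
    sxy = sum((ls[k] - mean_l) * (ys[k] - mean_y) for k in idx)
    b = sxy / sxx
    return mean_y - b * mean_l, b


def ransac_line(ls, ys, threshold, iterations=200, seed=0):
    """Estimate (a, b) in y = a + b*l while ignoring outliers.

    Returns (a, b, inliers), where inliers are the sample indices used.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    n = len(ls)
    best = []
    for _ in range(iterations):
        i, j = rng.sample(range(n), 2)
        if ls[i] == ls[j]:
            continue  # no line through two samples taken at the same l
        b = (ys[j] - ys[i]) / (ls[j] - ls[i])
        a = ys[i] - b * ls[i]
        inliers = [k for k in range(n) if abs(ys[k] - (a + b * ls[k])) <= threshold]
        if len(inliers) > len(best):
            best = inliers
    if len(best) < 2:
        raise ValueError("no consensus line found")
    a, b = least_squares(ls, ys, best)  # refit on the largest consensus set
    return a, b, best


# Synthetic data: one oracle query per hour for a day.
TRUE_A, TRUE_B = 2.0, 1.00002
ls = [3600.0 * k for k in range(24)]
ys = [TRUE_A + TRUE_B * l + 0.01 * ((k % 3) - 1) for k, l in enumerate(ls)]
for k in (5, 17):
    ys[k] += 30.0  # simulated attack: the oracle is 30 s off

a, b, inliers = ransac_line(ls, ys, threshold=0.1)
```

With this data the two attacked samples end up outside the consensus set, and the refit gives values close to `TRUE_A` and `TRUE_B`. A recursive or sliding-window variant, as suggested above, would rerun the fit over only the most recent samples.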