# Week 02 - Time
###### tags: `WS2020-IN2259-DS`
## 2-1 Time Introduction
### Why is time important for us?
* 在分散式系統中,我們需要:
* **Coordination**:**協調** nodes 之間的操作,nodes 必須**同意 (agree)** 於某些事情。
* High degree of parallelism:nodes 可以獨立完成工作。
* Time 在其中扮演所謂的「**參考點 (point of reference)**」,以此讓每個 node 在無需額外溝通手段的情況下跟上進度。
* 然而,Time-keeping 是無法一直完美下去的,因此我們需要同步手段。
### Real time clock (RTC, CMOSC, HWC)
* RTC is used even when the PC is **`hibernated`** or **`switched off`**
* Based on alternative low power source
* Cheap quartz crystal (<$1), inaccurate (+/- 1-15 secs/day)
* Referred to as “**`wall clock`**” time
* Synchronizes the **system clock** when computer on
* Should not be confused with **real-time computing**
* Neither with hardware clock
* IRQ 8

* sysfs interface **`/sys/class/rtc/rtc0 ... n`**
* UNIX:
* cat /sys/class/rtc/rtc0/since_epoch
* cat /sys/class/rtc/rtc0/wakealarm
### Computer “clocks”

### Universal Time Coordinated (UTC)
* Universal
* Standard used around world & Internet (e.g., NTP)
* Independent from time zones (UTC 0)
* Converted to local time by adding/subtracting local time zone (EST: UTC-5; CET: UTC+2)
* Coordinated
* 400 institutions contribute their estimates of current time (using atomic clocks)
* UTC is built by combining these estimates
### Caesium-133 fountain atomic clock in Switzerland
* Uncertainty of one second in 30 million years!
* Uncertainty of one second in 15 billion years
* 圖片:

### Atomic clocks
* Atomic clock on the market May 11, 2011. Quoted $1500 with an accuracy of less than 0.5 micro seconds per day.
* Chip-Scale Atomic Clock. The ultimate in precision--the caesium clock--has been miniaturized By Willie D. Jones Posted 16 Mar 2011.
* 圖片:

### Measuring latency in distributed systems experiments
* 測量第一個伺服器與最後一個伺服器之間的延遲:

### Clock skew & drift
* Clock skew:兩個 clocks 間的**瞬時差異** (**instantaneous difference** )。
* Clock drift:兩個 clocks 間,隨著時間流逝而**持續變化的頻率差** (**variation in frequency** )。
### Summary
* Bad news: Clocks drifts
* Time keeping is not perfect
### Self-study questions
* Bring all clock accuracies reported in this unit to the same reference frame, e.g., seconds per day
* Find typical clock accuracies and submit a detailed table with references to the instructor
* Best submission will be aired in a future lecture
## 2-2 Cristian's Clock
### External synchronization

### Probabilistic clock synchronization
* Proposed by F. Cristian (IBM), 1989
* 所謂外部的時鐘同步。
* 因此也有內部的時鐘同步 (見 [Berkeley Clock](#2-3-Berkeley-Clock))。
* Transmission delay 不受限制,但通常很短,特別是在區域網路內部。
* 因此無法保證獲得一個先驗的 (priori) 時鐘精準度。
* 透過夠多的嘗試次數,便可以很高的機率得到想要的時間精度 (desired clock precision)。
* 主要用於 LANs。
### Synchronization request and reply
* 向 reference time source 要時間:
* Request involves network **`round trip time (RTT)`**
* Response **no longer current** by time client receives it
* Client 必須針對 RTT 來調整從 response 獲得的時間。
* 圖示:

### Cristian’s algorithm (1989)
* Client 測量至 server 的 RTT。
* Client 假設來回的 delay 是相等的。
* 公式:
$$
T = T^{\prime} + RTT / 2
$$
* 圖示:

### Small improvement
* Make multiple timing requests
* 就是多量幾次 RTT 的意思。
* 選 RTT 最小的。
* 精度為:$\pm \dfrac{RTT}{2}$
* 圖示:

### RTT/2 Accuracy bound
* 假設 transmission delays 為對稱的 ($RTT/2$)。
* 因此可以估計 $RTT = T_1 - T_0$。
* 最後計算所得的時間為:
$$
T_{new} = T_{server} + (T_1 - T_0)/2
$$
### Accuracy bound
* $T_{min}$ 為 transmission delay 的最小值 (目前未知)。
* Accuracy:
$$
\pm |(T_1 - T_0)/2 -T_{min}|
$$
* 圖示:

* 更清楚的圖示:

### Nota Bene: Factors contributing to RTT

### Nota Bene: Errors are cumulative
* For time requests over a series of hops
* Say Node B synchronizes time with Node C with an accuracy of **`+/- 5 ms`**
* Then, Node A synchronizes its time with Node B with an accuracy of **`+/- 7 ms`**
* Then, the net accuracy at Node A is **`+/-(7+5) ms = +/- 12 ms`**
## 2-3 Berkeley Clock
### Berkeley algorithm overview
* Physical clock synchronization algorithm developed in 1989 as part of TEMPO in BSD 4.3
* Internal clock synchronization: **No node has accurate time source**
* Performs clock synchronization to set clocks of all nodes to **within a bound of each other**
* Intended for use in **intranets (LANs)**
* TEMPO synchronized clocks to within 20-25 msin LAN of 15 DEC VAX machines (1989)
### Berkeley algorithm
* a.k.a. TEMPO algorithm in original literature
* Has a **time daemon** running on all nodes
* Assumes **nodes may be faulty** (crash failure model)
* Key idea – runs periodically at **`designated node`**
* **Measures time difference** between its clock and clock of all nodes
* Rejects outliers (based on threshold)
* Averages measurements
* Requests all nodes to adjust their clocks
* Clock adjustments at each done to respect clocks' monotonicity
### Determine clock differences
* Leader requests time from followers

* Nodes reply with their time

* Leader computes network time
$$
\dfrac{0-5+4+14+2}{5} = \dfrac{15}{5} = 3
$$
* Compute clock adjustments (Network time is 14:03)

### Determining clock outliers
* Avoid adversely effecting “healthy” clocks
* Outliers represent faulty clocks/nodes (**malfunctioning**, **fast drift**)
* Leader determines outliers among clock values based on a threshold $\gamma$
* Outliers are not used in averaging, i.e., in computing network time
* **Leader’s clock may represent an outlier itself!**
* Faulty clocks are adjusted as well
* Clock considered faulty if its value is more than $\gamma$ away from the majority of clocks in system
* $\gamma$ is system-specific configuration parameter
### Summarizing observations
* **Leader election** is separate algorithm
* Local clock adjustment must respect **clock’s monotonicity requirement** – separate algorithm
* Nodes’ clocks can be fast or slow
* Leader’s request for nodes’ times is impacted by **transmission delay** – needs to be adjusted for
* Clock adjustments (corrections) are expressed as time differences as opposed to absolute times – no impact from transmission delay
### Self-study questions
* What are some use case scenarios of internal clock synchronization?
* Would resetting a fast clock cause problems?
* Would advancing a slow clock cause problems?
* Think of an efficient way to determine outliers.
* What factors may impact transmission delay of timing requests?
* What is the impact of transmission delay on absolute timing measures vs. time differences?
* What is a good choice for $\gamma$?
## 2-4 Clock Adjustment
### Clock adjustment – the wrong way
* Adjusting the clock is not straight forward
* T = T’ + RTT/2 (strict no-no - ☹ !)
* Time must be **continuous** and **increase monotonically**
* Cannot **go back to the past**
* Timestamps are important, can't repeat them
* Cannot **jump into future – show sudden jumps**
* Lose time and miss deadlines
* 因此,時間的調整必須用加速或減速的方式!
### Hardware vs. software clock
* At real time $t$, $H(t)$ represents time on node’s **hardware clock**
* $C(t)$ is time calculated on computer’s **software clock**:
$$
C(t) = a * H(t) + b
$$
* **Clock monotonicity requirement**:
$$
t^{\prime} > t: C(t^{\prime}) > C(t)
$$
* Achieve monotonicity by adjusting a and b.
### Clock adjustment – the right way
* Two parameters are constant **offset** and **slope**:
$$
C(t) = a * H(t) + b
$$
* Determine “catch up” value for scaling time down or up in a linear fashion
* Run software clock slower vs. faster
* Until it attains the real time (i.e., time measured) or
* Until another synchronization is taking place
### Perfect clock

### Catch up” Wrong Example

### Catch up” Right Example

### Summary
* Clock is a continuous function, cannot show sudden jumps (discontinuities)
* Clock must increase monotonically
* Either slow clock down or speed it up
### Self-study questions
* Identify a few scenarios where resetting the clock would be detrimental.
* Identify a few scenarios where setting the clock forward would be detrimental.
* Lookup how to read the hardware clock on your computer from the command line.
* How are time zones and daylight savings time accounted for?
## 2-5 Lamport Clock
### Events
* **Event** is an abstraction for anything we’d like to track (timestamp) in a (distributed) system
* E.g., event may represent instruction execution, method call, order entry, access request, exception...
* Events represent **`message send`** and **`receive`**
* Often sufficient to know **order of events** instead of events’ exact physical time
### Events within and across nodes
* Within a single node **event order** determined by execution sequence (e.g., relative to a physical clock)
* Between two different nodes **event order** cannot be determined using local physical clocks, since those **clocks cannot be perfectly synchronized**
* How then are we to represent time?
### Logical clocks
* Key insight is to **abandon idea of physical time**
* Only care about **order of events**, not when exactly they happened (or how much time between events)
* Lamport introduced **logical time** and method to synchronize logical clocks in 1978
### The happened-before relation
* Denoted by “**`→`**”
* Describes causal **order of events** in a system
* Definition “**`→`**”:
* If **a and b are events** in the same node and **a occurred before b** then **`a → b`**
* If **a** is the event of **sending a message m** in one node and **b** is the event of **receiving m** in another node then **`a → b`**
* Relation “**`→`**” is **transitive**: If **`a → b`** and **`b → c`** then **`a → c`**
* If neither **`a → b`** nor **`b → a`** then a and b are concurrent, denoted by **`a || b`**
### Causality of “→”-relation
* a.k.a. causality relation
* Intuitively, past events influence future events
* Influence among causally related events is referred to as causal effects
* If **`a → b`**, assume event a causally effects event b
* Concurrent events **do not causally effect** each other (e.g., neither `a → b` nor `b → a`)
### Example: Happened-before relation

### Lamport clock (logical clock)
* **數值化**的追蹤 “**`→`**” (Happened-before relation)。
* 每個 node $P_i$ 都有一個 **logical clock** $C_i$ (a local variable)。
* $C_i$ 會為 $P_i$ 內的每一個事件指派一個**時間 (值)** $C_i(a)$。
* $C_i(a)$ 稱作事件 $a$ 位於 $P_i$ 內的 **timestamp**。
* Timestamps **與 physical time 沒有關係**,因此稱作 logical clock。
* Logical clocks 可以指派 **monotonically increasing timestamps**。
* 可以用 **integer counter** 實作。
### Clock conditions
* 同一個 node 內 ($P_i$):
* If $a → b$ then $C_i(a) < C_i(b)$
* 不同 nodes 內 ($P_i$ 與 $P_k$):
* If $a$ is the event of **sending a message** at node $P_i$ and $b$ is the event of **receiving that same message** at a different node $P_k$ then $C_i(a) < C_k(b)$
### Logical clock implementation

### Example: Logical clock


### “→” is a unique partial order of events
* Happened-before relation is a **unique partial order** of events in system
* a, e are concurrent events
### Induced total order
* Induce a **non-unique total order** as follows
* Use logical time stamps to order events
* Break ties by **using an arbitrary total ordering** of nodes,
* e.g., $P_1 < P_2$ (numerically compare node identifiers)
* Thus, timestamp is comprised of logical clock value and identifier,
* i.e., $(C_i(a), i)$
### Total order of events in system
* Denoted by “⇒”
* Timestamps are $(C_i(a), i)$ and $(C_k(b), k)$
* 先比 $C_i(a)$ 與 $C_k(b)$,如果相同再比 $i$ 與 $k$。
* 如此就有 **total order of all events** 了!
### Potential causality of “→”-relation
* “→” captures **potential flow** of information between events
* In $a → b$, $a$ **might** or **might not** have caused $b$ (relation assumes it has, but we don’t know for sure)
* Information may have flown in system in ways other than via message passing (not modeled by “→” )
### Summary
* Do away with physical time, focus on **order of events**
* **Happened-before relation** as unique partial order of events in system
* Relation tracks **potential causality** of events in system
* Can induce a **non-unique total order** by agreeing on an arbitrary order of nodes
* Implement happened-before relation with **integer variable** (essentially a counter) at each node and rules for updating it
* You cannot determine whether two events are causally related from timestamps alone!
### Self-study questions
* Can all events in a distributed system be ordered?
* What is the difference between a partial order and a total order?
* Why is it important to totally order events?
* If event timestamps are equal, does it always follow that the associated events occurred at different nodes?
* If event timestamps are equal, does it always follow that the associated events are concurrent?
* If clocks are initialized to zero at the beginning of time and dis always one, what conclusions can we draw from looking at timestamps?
* What are applications of Lamport clocks?
## 2-6 Vector Clock
* $n$ 個 nodes 的系統。
* 各個 node 的時鐘 $C_i$ 都記錄著 n 個「時間」 (此為對於 “global” system time 的認知):
$$
C_i = (C_i[1], C_i[2], \cdots, C_i[n])
$$
* 其中 $C_i[i]$ 為 **local logical time**。
### Vector clock implementation

### Example: Vector clock

### Comparing vectors

### Example: Comparing vectors

### Example: Vector clock


### Comparing vector clocks

### Summary
* Similar context as Lamport Clocks
* Represent knowledge about **logical time** with a **vector of size $n$**
* Each vector component $i$ represents node’s knowledge of **logical clock** at node $i$
* Node’s own vector component **tracks number of events it has timestamped** (assuming $d$ = 1)
* Comparing timestamps based on comparing vectors
* **Isomorphic relationship between timestamps and event order** (i.e., inferences in both direction possible, fundamental difference to Lamport clock)
* 可以用比較的方式來確認 event order。
* Lamport clock 不行!
### Self-study questions
* Can all events in a distributed system be ordered?
* Does a vector clock induce a partial or total order?
* Why is it important to totally order events?
* Can two events have equal vector clock timestamps?
* For two events to be concurrent, what does this imply for their timestamps?
* If clocks are initialized to zero at the beginning of time and dis always one, what conclusions can we draw from looking at timestamps?
* What are applications of vector clocks?