---
# System prepended metadata

title: Week 02 - Time
tags: [WS2020-IN2259-DS]

---

# Week 02 - Time
###### tags: `WS2020-IN2259-DS`

## 2-1 Time Introduction

### Why is time important for us?
* 在分散式系統中，我們需要：
    * **Coordination**：**協調** nodes 之間的操作，nodes 必須**同意 (agree)** 於某些事情。
    * High degree of parallelism：nodes 可以獨立完成工作。
* Time 在其中扮演所謂的「**參考點 (point of reference)**」，以此讓每個 node 在無需額外溝通手段的情況下跟上進度。
* 然而，Time-keeping 是無法一直完美下去的，因此我們需要同步手段。

### Real time clock (RTC, CMOSC, HWC)
* RTC is used even when the PC is **`hibernated`** or **`switched off`**
    * Based on alternative low power source
    * Cheap quartz crystal (<$1), inaccurate (+/- 1-15 secs/day) 
* Referred to as “**`wall clock`**” time
    * Synchronizes the **system clock** when computer on 
    * Should not be confused with **real-time computing**
    * Neither with hardware clock 
* IRQ 8
    ![](https://i.imgur.com/FEdKHZN.png)
* sysfs interface **`/sys/class/rtc/rtc0 ... n`**
    * UNIX: 
        * cat /sys/class/rtc/rtc0/since_epoch 
        * cat /sys/class/rtc/rtc0/wakealarm

### Computer “clocks”
![](https://i.imgur.com/T4lYdpD.png)

### Universal Time Coordinated (UTC)
* Universal 
    * Standard used around world & Internet (e.g., NTP) 
    * Independent from time zones (UTC 0) 
    * Converted to local time by adding/subtracting local time zone (EST: UTC-5; CET: UTC+2) 
* Coordinated 
    * 400 institutions contribute their estimates of current time (using atomic clocks) 
    * UTC is built by combining these estimates

### Caesium-133 fountain atomic clock in Switzerland
* Uncertainty of one  second in 30 million  years!
* Uncertainty of one second in 15 billion years
* 圖片：
    ![](https://i.imgur.com/Z4NfmOm.png)

### Atomic clocks
* Atomic clock on the market May 11, 2011.   Quoted $1500 with an accuracy of less than 0.5 micro seconds per day.
* Chip-Scale Atomic Clock. The ultimate in precision--the caesium clock--has been miniaturized By Willie D. Jones Posted 16 Mar 2011.
* 圖片：
    ![](https://i.imgur.com/YuqqIqk.png =300x)

### Measuring latency in distributed systems experiments
* 測量第一個伺服器與最後一個伺服器之間的延遲：
    ![](https://i.imgur.com/bK26XoF.png =300x)


### Clock skew & drift
* Clock skew：兩個 clocks 間的**瞬時差異** (**instantaneous difference** )。
* Clock drift：兩個 clocks 間，隨著時間流逝而**持續變化的頻率差** (**variation in frequency** )。

### Summary
* Bad news: Clocks drifts 
* Time keeping is not perfect

### Self-study questions
* Bring all clock accuracies reported in this unit to the same reference frame, e.g., seconds per day 
* Find typical clock accuracies and submit a detailed table with references to the instructor 
    * Best submission will be aired in a future lecture

## 2-2 Cristian's Clock 

### External synchronization
![](https://i.imgur.com/I2FXa0K.png =500x)

### Probabilistic clock synchronization 
* Proposed by F. Cristian (IBM), 1989
* 所謂外部的時鐘同步。
    * 因此也有內部的時鐘同步 (見 [Berkeley Clock](#2-3-Berkeley-Clock))。
* Transmission delay 不受限制，但通常很短，特別是在區域網路內部。
* 因此無法保證獲得一個先驗的 (priori) 時鐘精準度。
* 透過夠多的嘗試次數，便可以很高的機率得到想要的時間精度 (desired clock precision)。
* 主要用於 LANs。

### Synchronization request and reply
* 向 reference time source 要時間：
    * Request involves network **`round trip time (RTT)`**
    * Response **no longer current** by time client receives it 
* Client 必須針對 RTT 來調整從 response 獲得的時間。
* 圖示：
    ![](https://i.imgur.com/QX3OGCj.png =500x)

### Cristian’s algorithm (1989)
* Client 測量至 server 的 RTT。
* Client 假設來回的 delay 是相等的。
* 公式：
$$
T = T^{\prime} + RTT / 2
$$
* 圖示：
    ![](https://i.imgur.com/arxIFTj.png =500x)

### Small improvement
* Make multiple timing requests
    * 就是多量幾次 RTT 的意思。
* 選 RTT 最小的。
* 精度為：$\pm \dfrac{RTT}{2}$
* 圖示：
    ![](https://i.imgur.com/0XY5ACx.png =200x)
    
### RTT/2 Accuracy bound
* 假設 transmission delays 為對稱的 ($RTT/2$)。
* 因此可以估計 $RTT = T_1 - T_0$。
* 最後計算所得的時間為：
    $$
    T_{new} = T_{server} + (T_1 - T_0)/2
    $$
    
### Accuracy bound
* $T_{min}$ 為 transmission delay 的最小值 (目前未知)。
* Accuracy：
    $$
    \pm |(T_1 - T_0)/2 -T_{min}|
    $$
* 圖示：
    ![](https://i.imgur.com/HhuI52G.png =300x)
* 更清楚的圖示：
    ![](https://i.imgur.com/aJk1uH4.png)

### Nota Bene: Factors contributing to RTT
![](https://i.imgur.com/bMtE7j5.png =500x)

### Nota Bene: Errors are cumulative 
* For time requests over a series of hops
* Say Node B synchronizes time with Node C with an accuracy of **`+/- 5 ms`** 
* Then, Node A synchronizes its time with Node B with an accuracy of **`+/- 7 ms`** 
* Then, the net accuracy at Node A is **`+/-(7+5) ms =         +/- 12 ms`**

## 2-3 Berkeley Clock

### Berkeley algorithm overview
* Physical clock synchronization algorithm developed in 1989 as part of TEMPO in BSD 4.3
* Internal clock synchronization: **No node has accurate time source** 
* Performs clock synchronization to set clocks of all nodes to **within a bound of each other** 
* Intended for use in **intranets (LANs)**
* TEMPO synchronized clocks to within 20-25 msin LAN of 15 DEC VAX machines (1989)

### Berkeley algorithm 
* a.k.a. TEMPO algorithm in original literature
* Has a **time daemon** running on all nodes
* Assumes **nodes may be faulty** (crash failure model)
* Key idea – runs periodically at **`designated node`**
    * **Measures time difference** between its clock and clock of all nodes
    * Rejects outliers (based on threshold) 
    * Averages measurements
    * Requests all nodes to adjust their clocks 
* Clock adjustments at each done to respect clocks' monotonicity

### Determine clock differences 
* Leader requests time from followers
    ![](https://i.imgur.com/rWQSgCR.png =500x)
* Nodes reply with their time
    ![](https://i.imgur.com/JTwlGIj.png =350x)
* Leader computes  network time
    $$
    \dfrac{0-5+4+14+2}{5} = \dfrac{15}{5} = 3
    $$
* Compute clock adjustments (Network time is 14:03)
    ![](https://i.imgur.com/anG6bBK.png =400x)

### Determining clock outliers
* Avoid adversely effecting “healthy” clocks
* Outliers represent faulty clocks/nodes (**malfunctioning**, **fast drift**) 
* Leader determines outliers among clock values based on a threshold $\gamma$
* Outliers are not used in averaging, i.e., in computing network time 
* **Leader’s clock may represent an outlier itself!**
* Faulty clocks are adjusted as well
* Clock considered faulty if its value is more than $\gamma$ away from the majority of clocks in system
* $\gamma$ is system-specific configuration parameter

### Summarizing observations
* **Leader election** is separate algorithm
* Local clock adjustment must respect **clock’s monotonicity requirement** – separate algorithm
* Nodes’ clocks can be fast or slow 
* Leader’s request for nodes’ times is impacted by **transmission delay** – needs to be adjusted for 
* Clock adjustments (corrections) are expressed as time differences as opposed to absolute times – no impact from transmission delay

### Self-study questions
* What are some use case scenarios of internal clock synchronization?
* Would resetting a fast clock cause problems? 
* Would advancing a slow clock cause problems? 
* Think of an efficient way to determine outliers. 
* What factors may impact transmission delay of timing requests? 
* What is the impact of transmission delay on absolute timing measures vs. time differences? 
* What is a good choice for $\gamma$?

## 2-4 Clock Adjustment 

### Clock adjustment – the wrong way
* Adjusting the clock is not straight forward
    * T = T’ + RTT/2 (strict no-no - ☹ !)
    * Time must be **continuous** and **increase monotonically**
* Cannot **go back to the past**
    * Timestamps are important, can't repeat them
* Cannot **jump into future – show sudden jumps** 
    * Lose time and miss deadlines
* 因此，時間的調整必須用加速或減速的方式！

### Hardware vs. software clock
* At real time $t$, $H(t)$ represents time on node’s **hardware clock**
* $C(t)$ is time calculated on computer’s **software clock**:
    $$
    C(t) = a * H(t) + b
    $$
* **Clock monotonicity requirement**: 
    $$
    t^{\prime} > t: C(t^{\prime}) > C(t)
    $$
* Achieve monotonicity by adjusting a and b.

### Clock adjustment – the right way
* Two parameters are constant **offset** and **slope**: 
    $$
    C(t) = a * H(t) + b
    $$
* Determine “catch up” value for scaling time down or up in a linear fashion 
    * Run software clock slower vs. faster
    * Until it attains the real time (i.e., time measured) or
    * Until another synchronization is taking place

### Perfect clock
![](https://i.imgur.com/cWBrxGW.png =500x)
 
### Catch up” Wrong Example
![](https://i.imgur.com/DDBEP5y.png =500x)

### Catch up” Right Example
![](https://i.imgur.com/OzTt6Nz.png =500x)

### Summary
* Clock is a continuous function, cannot show sudden jumps (discontinuities) 
* Clock must increase monotonically 
* Either slow clock down or speed it up

### Self-study questions
* Identify a few scenarios where resetting the clock would be detrimental. 
* Identify a few scenarios where setting the clock forward would be detrimental. 
* Lookup how to read the hardware clock on your computer from the command line. 
* How are time zones and daylight savings time accounted for?

## 2-5 Lamport Clock

### Events
* **Event** is an abstraction for anything we’d like to track (timestamp) in a (distributed) system
    * E.g., event may represent instruction execution, method call, order entry, access request, exception...
* Events represent **`message send`** and **`receive`** 
* Often sufficient to know **order of events** instead of events’ exact physical time

### Events within and across nodes
* Within a single node **event order** determined by execution sequence (e.g., relative to a physical clock) 
* Between two different nodes **event order** cannot be determined using local physical clocks, since those **clocks cannot be perfectly synchronized** 
* How then are we to represent time?

### Logical clocks
* Key insight is to **abandon idea of physical time**
* Only care about **order of events**, not when exactly they happened (or how much time between events) 
* Lamport introduced **logical time** and method to synchronize logical clocks in 1978

### The happened-before relation
* Denoted by “**`→`**”
* Describes causal **order of events** in a system
* Definition “**`→`**”: 
    * If **a and b are events** in the same node and **a occurred before b** then **`a → b`**
    * If **a** is the event of **sending a message m** in one node and **b** is the event of **receiving m** in another node then **`a → b`**
    * Relation “**`→`**” is **transitive**: If **`a → b`** and **`b → c`** then **`a → c`** 
* If neither **`a → b`** nor **`b → a`** then a and b are concurrent, denoted by **`a || b`** 

### Causality of “→”-relation
* a.k.a. causality relation
* Intuitively, past events influence future events 
* Influence among causally related events is referred to as causal effects
* If **`a → b`**, assume event a causally effects event b
* Concurrent events **do not causally effect** each other (e.g., neither `a → b` nor `b → a`)

### Example: Happened-before relation
![](https://i.imgur.com/hIFUcCY.png)

### Lamport clock (logical clock)
* **數值化**的追蹤 “**`→`**” (Happened-before relation)。
* 每個 node $P_i$ 都有一個 **logical clock** $C_i$ (a local variable)。
* $C_i$ 會為 $P_i$ 內的每一個事件指派一個**時間 (值)** $C_i(a)$。
* $C_i(a)$ 稱作事件 $a$ 位於 $P_i$ 內的 **timestamp**。
* Timestamps **與 physical time 沒有關係**，因此稱作 logical clock。
* Logical clocks 可以指派 **monotonically increasing timestamps**。
* 可以用 **integer counter** 實作。

### Clock conditions
* 同一個 node 內 ($P_i$)：
    * If $a → b$ then $C_i(a) < C_i(b)$
* 不同 nodes 內 ($P_i$ 與 $P_k$)：
    * If $a$ is the event of **sending a message** at node $P_i$ and $b$ is the event of **receiving that same message** at a different node $P_k$ then $C_i(a) < C_k(b)$

### Logical clock implementation
![](https://i.imgur.com/2804Iap.png =500x)

### Example: Logical clock
![](https://i.imgur.com/bKxr27d.png =500x)
![](https://i.imgur.com/TOvgQWf.png =500x)

### “→” is a unique partial order of events
* Happened-before relation is a **unique partial order** of events in system 
* a, e are concurrent events

### Induced total order
* Induce a **non-unique total order** as follows
    * Use logical time stamps to order events
    * Break ties by **using an arbitrary total ordering** of nodes, 
        * e.g., $P_1 < P_2$ (numerically compare node identifiers) 
    * Thus, timestamp is comprised of logical clock value and identifier, 
        * i.e., $(C_i(a), i)$ 

### Total order of events in system 
* Denoted by “⇒” 
* Timestamps are $(C_i(a), i)$ and $(C_k(b), k)$ 
* 先比 $C_i(a)$ 與 $C_k(b)$，如果相同再比 $i$ 與 $k$。
* 如此就有 **total order of all events** 了！

### Potential causality of “→”-relation 
* “→” captures **potential flow** of information between events 
* In $a → b$, $a$ **might** or **might not** have caused $b$ (relation assumes it has, but we don’t know for sure) 
* Information may have flown in system in ways other than via message passing (not modeled by “→” )

### Summary
* Do away with physical time, focus on **order of events** 
* **Happened-before relation** as unique partial order of events in system 
* Relation tracks **potential causality** of events in system 
* Can induce a **non-unique total order** by agreeing on an arbitrary order of nodes 
* Implement happened-before relation with **integer variable** (essentially a counter) at each node and rules for updating it 
* You cannot determine whether two events are causally related from timestamps alone!

### Self-study questions
* Can all events in a distributed system be ordered?
* What is the difference between a partial order and a total order?
* Why is it important to totally order events?
* If event timestamps are equal, does it always follow that the associated events occurred at different nodes?
* If event timestamps are equal, does it always follow that the associated events are concurrent?
* If clocks are initialized to zero at the beginning of time and dis always one, what conclusions can we draw from looking at timestamps?
* What are applications of Lamport clocks?

## 2-6 Vector Clock
* $n$ 個 nodes 的系統。
* 各個 node 的時鐘 $C_i$ 都記錄著 n 個「時間」 (此為對於 “global” system time 的認知)：
    $$
    C_i = (C_i[1], C_i[2], \cdots, C_i[n])
    $$
* 其中 $C_i[i]$ 為 **local logical time**。

### Vector clock implementation
![](https://i.imgur.com/mXPDEif.png =500x)

### Example: Vector clock 
![](https://i.imgur.com/W4RlbeJ.png)

### Comparing vectors
![](https://i.imgur.com/iMTh4AT.png =500x)

### Example: Comparing vectors
![](https://i.imgur.com/t09PX0Y.png =500x)

### Example: Vector clock
![](https://i.imgur.com/BEBSKg7.png =500x)
![](https://i.imgur.com/0w5GKT0.png =500x)

### Comparing vector clocks
![](https://i.imgur.com/LtgmlY9.png =500x)

### Summary
* Similar context as Lamport Clocks
* Represent knowledge about **logical time** with a **vector of size $n$**
* Each vector component $i$ represents node’s knowledge of **logical clock** at node $i$
* Node’s own vector component **tracks number of events it has timestamped** (assuming $d$ = 1)
* Comparing timestamps based on comparing vectors
* **Isomorphic relationship between timestamps and event order** (i.e., inferences in both direction possible, fundamental difference to Lamport clock)
    * 可以用比較的方式來確認 event order。
    * Lamport clock 不行！

### Self-study questions
* Can all events in a distributed system be ordered?
* Does a vector clock induce a partial or total order?
* Why is it important to totally order events?
* Can two events have equal vector clock timestamps?
* For two events to be concurrent, what does this imply for their timestamps?
* If clocks are initialized to zero at the beginning of time and dis always one, what conclusions can we draw from looking at timestamps?
* What are applications of vector clocks?