# Week 02 - Time ###### tags: `WS2020-IN2259-DS` ## 2-1 Time Introduction ### Why is time important for us? * 在分散式系統中,我們需要: * **Coordination**:**協調** nodes 之間的操作,nodes 必須**同意 (agree)** 於某些事情。 * High degree of parallelism:nodes 可以獨立完成工作。 * Time 在其中扮演所謂的「**參考點 (point of reference)**」,以此讓每個 node 在無需額外溝通手段的情況下跟上進度。 * 然而,Time-keeping 是無法一直完美下去的,因此我們需要同步手段。 ### Real time clock (RTC, CMOSC, HWC) * RTC is used even when the PC is **`hibernated`** or **`switched off`** * Based on alternative low power source * Cheap quartz crystal (<$1), inaccurate (+/- 1-15 secs/day) * Referred to as “**`wall clock`**” time * Synchronizes the **system clock** when computer on * Should not be confused with **real-time computing** * Neither with hardware clock * IRQ 8 ![](https://i.imgur.com/FEdKHZN.png) * sysfs interface **`/sys/class/rtc/rtc0 ... n`** * UNIX: * cat /sys/class/rtc/rtc0/since_epoch * cat /sys/class/rtc/rtc0/wakealarm ### Computer “clocks” ![](https://i.imgur.com/T4lYdpD.png) ### Universal Time Coordinated (UTC) * Universal * Standard used around world & Internet (e.g., NTP) * Independent from time zones (UTC 0) * Converted to local time by adding/subtracting local time zone (EST: UTC-5; CET: UTC+2) * Coordinated * 400 institutions contribute their estimates of current time (using atomic clocks) * UTC is built by combining these estimates ### Caesium-133 fountain atomic clock in Switzerland * Uncertainty of one second in 30 million years! * Uncertainty of one second in 15 billion years * 圖片: ![](https://i.imgur.com/Z4NfmOm.png) ### Atomic clocks * Atomic clock on the market May 11, 2011. Quoted $1500 with an accuracy of less than 0.5 micro seconds per day. * Chip-Scale Atomic Clock. The ultimate in precision--the caesium clock--has been miniaturized By Willie D. Jones Posted 16 Mar 2011. * 圖片: ![](https://i.imgur.com/YuqqIqk.png =300x) ### Measuring latency in distributed systems experiments * 測量第一個伺服器與最後一個伺服器之間的延遲: ![](https://i.imgur.com/bK26XoF.png =300x) ### Clock skew & drift * Clock skew:兩個 clocks 間的**瞬時差異** (**instantaneous difference** )。 * Clock drift:兩個 clocks 間,隨著時間流逝而**持續變化的頻率差** (**variation in frequency** )。 ### Summary * Bad news: Clocks drifts * Time keeping is not perfect ### Self-study questions * Bring all clock accuracies reported in this unit to the same reference frame, e.g., seconds per day * Find typical clock accuracies and submit a detailed table with references to the instructor * Best submission will be aired in a future lecture ## 2-2 Cristian's Clock ### External synchronization ![](https://i.imgur.com/I2FXa0K.png =500x) ### Probabilistic clock synchronization * Proposed by F. Cristian (IBM), 1989 * 所謂外部的時鐘同步。 * 因此也有內部的時鐘同步 (見 [Berkeley Clock](#2-3-Berkeley-Clock))。 * Transmission delay 不受限制,但通常很短,特別是在區域網路內部。 * 因此無法保證獲得一個先驗的 (priori) 時鐘精準度。 * 透過夠多的嘗試次數,便可以很高的機率得到想要的時間精度 (desired clock precision)。 * 主要用於 LANs。 ### Synchronization request and reply * 向 reference time source 要時間: * Request involves network **`round trip time (RTT)`** * Response **no longer current** by time client receives it * Client 必須針對 RTT 來調整從 response 獲得的時間。 * 圖示: ![](https://i.imgur.com/QX3OGCj.png =500x) ### Cristian’s algorithm (1989) * Client 測量至 server 的 RTT。 * Client 假設來回的 delay 是相等的。 * 公式: $$ T = T^{\prime} + RTT / 2 $$ * 圖示: ![](https://i.imgur.com/arxIFTj.png =500x) ### Small improvement * Make multiple timing requests * 就是多量幾次 RTT 的意思。 * 選 RTT 最小的。 * 精度為:$\pm \dfrac{RTT}{2}$ * 圖示: ![](https://i.imgur.com/0XY5ACx.png =200x) ### RTT/2 Accuracy bound * 假設 transmission delays 為對稱的 ($RTT/2$)。 * 因此可以估計 $RTT = T_1 - T_0$。 * 最後計算所得的時間為: $$ T_{new} = T_{server} + (T_1 - T_0)/2 $$ ### Accuracy bound * $T_{min}$ 為 transmission delay 的最小值 (目前未知)。 * Accuracy: $$ \pm |(T_1 - T_0)/2 -T_{min}| $$ * 圖示: ![](https://i.imgur.com/HhuI52G.png =300x) * 更清楚的圖示: ![](https://i.imgur.com/aJk1uH4.png) ### Nota Bene: Factors contributing to RTT ![](https://i.imgur.com/bMtE7j5.png =500x) ### Nota Bene: Errors are cumulative * For time requests over a series of hops * Say Node B synchronizes time with Node C with an accuracy of **`+/- 5 ms`** * Then, Node A synchronizes its time with Node B with an accuracy of **`+/- 7 ms`** * Then, the net accuracy at Node A is **`+/-(7+5) ms = +/- 12 ms`** ## 2-3 Berkeley Clock ### Berkeley algorithm overview * Physical clock synchronization algorithm developed in 1989 as part of TEMPO in BSD 4.3 * Internal clock synchronization: **No node has accurate time source** * Performs clock synchronization to set clocks of all nodes to **within a bound of each other** * Intended for use in **intranets (LANs)** * TEMPO synchronized clocks to within 20-25 msin LAN of 15 DEC VAX machines (1989) ### Berkeley algorithm * a.k.a. TEMPO algorithm in original literature * Has a **time daemon** running on all nodes * Assumes **nodes may be faulty** (crash failure model) * Key idea – runs periodically at **`designated node`** * **Measures time difference** between its clock and clock of all nodes * Rejects outliers (based on threshold) * Averages measurements * Requests all nodes to adjust their clocks * Clock adjustments at each done to respect clocks' monotonicity ### Determine clock differences * Leader requests time from followers ![](https://i.imgur.com/rWQSgCR.png =500x) * Nodes reply with their time ![](https://i.imgur.com/JTwlGIj.png =350x) * Leader computes network time $$ \dfrac{0-5+4+14+2}{5} = \dfrac{15}{5} = 3 $$ * Compute clock adjustments (Network time is 14:03) ![](https://i.imgur.com/anG6bBK.png =400x) ### Determining clock outliers * Avoid adversely effecting “healthy” clocks * Outliers represent faulty clocks/nodes (**malfunctioning**, **fast drift**) * Leader determines outliers among clock values based on a threshold $\gamma$ * Outliers are not used in averaging, i.e., in computing network time * **Leader’s clock may represent an outlier itself!** * Faulty clocks are adjusted as well * Clock considered faulty if its value is more than $\gamma$ away from the majority of clocks in system * $\gamma$ is system-specific configuration parameter ### Summarizing observations * **Leader election** is separate algorithm * Local clock adjustment must respect **clock’s monotonicity requirement** – separate algorithm * Nodes’ clocks can be fast or slow * Leader’s request for nodes’ times is impacted by **transmission delay** – needs to be adjusted for * Clock adjustments (corrections) are expressed as time differences as opposed to absolute times – no impact from transmission delay ### Self-study questions * What are some use case scenarios of internal clock synchronization? * Would resetting a fast clock cause problems? * Would advancing a slow clock cause problems? * Think of an efficient way to determine outliers. * What factors may impact transmission delay of timing requests? * What is the impact of transmission delay on absolute timing measures vs. time differences? * What is a good choice for $\gamma$? ## 2-4 Clock Adjustment ### Clock adjustment – the wrong way * Adjusting the clock is not straight forward * T = T’ + RTT/2 (strict no-no - ☹ !) * Time must be **continuous** and **increase monotonically** * Cannot **go back to the past** * Timestamps are important, can't repeat them * Cannot **jump into future – show sudden jumps** * Lose time and miss deadlines * 因此,時間的調整必須用加速或減速的方式! ### Hardware vs. software clock * At real time $t$, $H(t)$ represents time on node’s **hardware clock** * $C(t)$ is time calculated on computer’s **software clock**: $$ C(t) = a * H(t) + b $$ * **Clock monotonicity requirement**: $$ t^{\prime} > t: C(t^{\prime}) > C(t) $$ * Achieve monotonicity by adjusting a and b. ### Clock adjustment – the right way * Two parameters are constant **offset** and **slope**: $$ C(t) = a * H(t) + b $$ * Determine “catch up” value for scaling time down or up in a linear fashion * Run software clock slower vs. faster * Until it attains the real time (i.e., time measured) or * Until another synchronization is taking place ### Perfect clock ![](https://i.imgur.com/cWBrxGW.png =500x) ### Catch up” Wrong Example ![](https://i.imgur.com/DDBEP5y.png =500x) ### Catch up” Right Example ![](https://i.imgur.com/OzTt6Nz.png =500x) ### Summary * Clock is a continuous function, cannot show sudden jumps (discontinuities) * Clock must increase monotonically * Either slow clock down or speed it up ### Self-study questions * Identify a few scenarios where resetting the clock would be detrimental. * Identify a few scenarios where setting the clock forward would be detrimental. * Lookup how to read the hardware clock on your computer from the command line. * How are time zones and daylight savings time accounted for? ## 2-5 Lamport Clock ### Events * **Event** is an abstraction for anything we’d like to track (timestamp) in a (distributed) system * E.g., event may represent instruction execution, method call, order entry, access request, exception... * Events represent **`message send`** and **`receive`** * Often sufficient to know **order of events** instead of events’ exact physical time ### Events within and across nodes * Within a single node **event order** determined by execution sequence (e.g., relative to a physical clock) * Between two different nodes **event order** cannot be determined using local physical clocks, since those **clocks cannot be perfectly synchronized** * How then are we to represent time? ### Logical clocks * Key insight is to **abandon idea of physical time** * Only care about **order of events**, not when exactly they happened (or how much time between events) * Lamport introduced **logical time** and method to synchronize logical clocks in 1978 ### The happened-before relation * Denoted by “**`→`**” * Describes causal **order of events** in a system * Definition “**`→`**”: * If **a and b are events** in the same node and **a occurred before b** then **`a → b`** * If **a** is the event of **sending a message m** in one node and **b** is the event of **receiving m** in another node then **`a → b`** * Relation “**`→`**” is **transitive**: If **`a → b`** and **`b → c`** then **`a → c`** * If neither **`a → b`** nor **`b → a`** then a and b are concurrent, denoted by **`a || b`** ### Causality of “→”-relation * a.k.a. causality relation * Intuitively, past events influence future events * Influence among causally related events is referred to as causal effects * If **`a → b`**, assume event a causally effects event b * Concurrent events **do not causally effect** each other (e.g., neither `a → b` nor `b → a`) ### Example: Happened-before relation ![](https://i.imgur.com/hIFUcCY.png) ### Lamport clock (logical clock) * **數值化**的追蹤 “**`→`**” (Happened-before relation)。 * 每個 node $P_i$ 都有一個 **logical clock** $C_i$ (a local variable)。 * $C_i$ 會為 $P_i$ 內的每一個事件指派一個**時間 (值)** $C_i(a)$。 * $C_i(a)$ 稱作事件 $a$ 位於 $P_i$ 內的 **timestamp**。 * Timestamps **與 physical time 沒有關係**,因此稱作 logical clock。 * Logical clocks 可以指派 **monotonically increasing timestamps**。 * 可以用 **integer counter** 實作。 ### Clock conditions * 同一個 node 內 ($P_i$): * If $a → b$ then $C_i(a) < C_i(b)$ * 不同 nodes 內 ($P_i$ 與 $P_k$): * If $a$ is the event of **sending a message** at node $P_i$ and $b$ is the event of **receiving that same message** at a different node $P_k$ then $C_i(a) < C_k(b)$ ### Logical clock implementation ![](https://i.imgur.com/2804Iap.png =500x) ### Example: Logical clock ![](https://i.imgur.com/bKxr27d.png =500x) ![](https://i.imgur.com/TOvgQWf.png =500x) ### “→” is a unique partial order of events * Happened-before relation is a **unique partial order** of events in system * a, e are concurrent events ### Induced total order * Induce a **non-unique total order** as follows * Use logical time stamps to order events * Break ties by **using an arbitrary total ordering** of nodes, * e.g., $P_1 < P_2$ (numerically compare node identifiers) * Thus, timestamp is comprised of logical clock value and identifier, * i.e., $(C_i(a), i)$ ### Total order of events in system * Denoted by “⇒” * Timestamps are $(C_i(a), i)$ and $(C_k(b), k)$ * 先比 $C_i(a)$ 與 $C_k(b)$,如果相同再比 $i$ 與 $k$。 * 如此就有 **total order of all events** 了! ### Potential causality of “→”-relation * “→” captures **potential flow** of information between events * In $a → b$, $a$ **might** or **might not** have caused $b$ (relation assumes it has, but we don’t know for sure) * Information may have flown in system in ways other than via message passing (not modeled by “→” ) ### Summary * Do away with physical time, focus on **order of events** * **Happened-before relation** as unique partial order of events in system * Relation tracks **potential causality** of events in system * Can induce a **non-unique total order** by agreeing on an arbitrary order of nodes * Implement happened-before relation with **integer variable** (essentially a counter) at each node and rules for updating it * You cannot determine whether two events are causally related from timestamps alone! ### Self-study questions * Can all events in a distributed system be ordered? * What is the difference between a partial order and a total order? * Why is it important to totally order events? * If event timestamps are equal, does it always follow that the associated events occurred at different nodes? * If event timestamps are equal, does it always follow that the associated events are concurrent? * If clocks are initialized to zero at the beginning of time and dis always one, what conclusions can we draw from looking at timestamps? * What are applications of Lamport clocks? ## 2-6 Vector Clock * $n$ 個 nodes 的系統。 * 各個 node 的時鐘 $C_i$ 都記錄著 n 個「時間」 (此為對於 “global” system time 的認知): $$ C_i = (C_i[1], C_i[2], \cdots, C_i[n]) $$ * 其中 $C_i[i]$ 為 **local logical time**。 ### Vector clock implementation ![](https://i.imgur.com/mXPDEif.png =500x) ### Example: Vector clock ![](https://i.imgur.com/W4RlbeJ.png) ### Comparing vectors ![](https://i.imgur.com/iMTh4AT.png =500x) ### Example: Comparing vectors ![](https://i.imgur.com/t09PX0Y.png =500x) ### Example: Vector clock ![](https://i.imgur.com/BEBSKg7.png =500x) ![](https://i.imgur.com/0w5GKT0.png =500x) ### Comparing vector clocks ![](https://i.imgur.com/LtgmlY9.png =500x) ### Summary * Similar context as Lamport Clocks * Represent knowledge about **logical time** with a **vector of size $n$** * Each vector component $i$ represents node’s knowledge of **logical clock** at node $i$ * Node’s own vector component **tracks number of events it has timestamped** (assuming $d$ = 1) * Comparing timestamps based on comparing vectors * **Isomorphic relationship between timestamps and event order** (i.e., inferences in both direction possible, fundamental difference to Lamport clock) * 可以用比較的方式來確認 event order。 * Lamport clock 不行! ### Self-study questions * Can all events in a distributed system be ordered? * Does a vector clock induce a partial or total order? * Why is it important to totally order events? * Can two events have equal vector clock timestamps? * For two events to be concurrent, what does this imply for their timestamps? * If clocks are initialized to zero at the beginning of time and dis always one, what conclusions can we draw from looking at timestamps? * What are applications of vector clocks?