Distributed Systems Lecture 1 (Introduction, Clocks)
===
{%hackmd theme-dark %}

Tanenbaum: Chapters 6.1, 6.2, 6.3

Distributed Systems
---
![](https://i.imgur.com/LP4Fw6u.png)

Distributed middleware encompasses all services that sit between the OS/hardware and the applications.

3 metrics to assess a distributed systems protocol:
* Scalability
* Correctness
* Availability (Fault Tolerance)

Clocks
===

Physical Clock
---

### Centralized Systems

The clock synchronization problem also exists in a multiprocessor machine. However, since the processors reside in the same machine, the clocks can be synchronized by making a system call.

### Distributed Systems

In distributed systems, each machine has its own timer, so message passing is required in order to synchronize the system clocks. Timers are based on the oscillation of a quartz crystal, which isn't perfect and drifts away from the true time.

Synchronizing timers is important since some applications require a certain level of physical time accuracy. Since it's not possible to build clock hardware that is without error, clock synchronization has to be done at the software level. In designing applications, it's important to have some tolerance for time inaccuracies.

### Resynchronizing Clocks

A typical clock has a relative error of 10^-5^. Normally, the timer has 100 ticks/second, which implies 360,000 ticks/hour. With this error, an hour could be off by roughly &pm;4 ticks (360,000 &times; 10^-5^ = 3.6). This means that in the worst case, two systems drift 8 ticks apart in an hour.

e.g. Assuming that we need to keep all clocks in a distributed system within 2 ticks of each other, we need to sync all the clocks 4 times per hour.

### Cristian's Algorithm

![](https://i.imgur.com/u90EfwZ.png)

Building block of the Network Time Protocol.
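Before looking at the algorithm, the resynchronization arithmetic from the previous section can be checked with a short calculation (a sketch; the 10^-5^ relative error and the 2-tick bound are the example values from the text):

```python
# Sanity check of the resynchronization numbers above.
relative_error = 1e-5                # typical quartz-clock relative error
ticks_per_hour = 100 * 3600          # 100 ticks/second -> 360,000 ticks/hour

one_clock_drift = relative_error * ticks_per_hour  # ~3.6 ticks/hour, rounded up to 4
worst_case_gap = 2 * 4                             # two clocks drifting in opposite directions

max_skew = 2                                       # required bound between any two clocks
syncs_per_hour = worst_case_gap / max_skew         # how often we must resynchronize
print(one_clock_drift, worst_case_gap, syncs_per_hour)
```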
Algorithm:
- Client sends a request to the time server
- Time server responds with a timestamp
- Client synchronizes its local time with the server-provided timestamp

![](https://i.imgur.com/aVbwO71.png)

#### Addressing the Assumptions

![](https://i.imgur.com/hM5l6xl.png)

**Assumption 1** is resolved by accounting for T~delay~.

**Assumption 2** is still required to hold.

![](https://i.imgur.com/JmSxjpy.png)

**Assumption 3**: take a weighted average of the adjusted times. If the &alpha;~i~ are close enough, it makes sense to use this algorithm. However, with larger `n`, it makes more sense to take the average of the errors (i.e. $\alpha_i - T_{\text{client}_i}$) rather than the average of the raw times &alpha;~i~. This is because the error values are more stable than the raw time values. The client clock is then adjusted according to the weighted average of the errors, i.e. $Adjust(n) = T_{\text{client}_n} + avg\_client\_error(n)$.

The solution for assumption 3, however, does not solve the issue of asymmetric time taken in the RTT. In order to address that:
1. When the server receives the sync request from the client, it records the current server time, $T_{\text{server}}$.
2. The server then records the time taken for it to process the request, so that the client can subtract this processing time from the measured RTT before halving it to estimate the one-way delay.

#### Assessment

Scalability: :negative_squared_cross_mark: With an increasing number of clients, the time server will be overloaded.

Availability: :negative_squared_cross_mark: If the time server is down / malicious, it will mess up the clients' clocks.

Correctness: :heavy_check_mark:

In addition, the algorithm does not address the case where the client clock is faster than the server clock. If the client clock jumps back to an earlier time, it may disturb the order of events being sent. The clock should instead be slowed down.

### Berkeley Algorithm

![](https://i.imgur.com/BSQxtIH.png)

1. Elect the time server.
2. The elected time server will (periodically) poll all machines that are connected to it and record the local time of each machine.
3.
It will then calculate the adjustment that each machine needs to make and send it back to them (adjustment = speed the clock up / slow it down).

The idea of election means that there is no single server that is always the time server: if the currently elected time server crashes, the remaining machines will (according to the Berkeley Algorithm) elect another time server.

#### Assumptions in the Algorithm

We follow the same set of assumptions as Cristian's Algorithm:
1. RTT is symmetric
2. Client has negligible processing delay
3. Clock drift is negligible

### Network Time Protocol (NTP)

![](https://i.imgur.com/IoPVDo7.png)

Time servers are arranged in strata, with the lowest stratum (stratum 1) being the most accurate.

The reason behind the multiple strata is availability (servers in the lower strata could be less responsive than those in the higher strata). The reason behind the multiple servers in the same stratum is backup (redundancy).

Different messages are exchanged depending on the sender and receiver.

#### Between Stratum X and X+1 Servers

![](https://i.imgur.com/nqjlCk7.png)

Similar to Cristian's Algorithm.

#### Within Stratum X Servers

Pairs are chosen at random and run Cristian's Algorithm to synchronize among servers in the same stratum.

#### Between Client & Stratum X Server

There are 2 things to consider when a client chooses which time server to sync from:
1. Accuracy of the time server (lower stratum = more accurate)
2. Low value / predictability of the RTT (if the standard deviation is too high, the time server is likely to be overloaded). The RTT should also be low, since this minimizes the impact of clock drift on the client's side.

Otherwise, it follows the same algorithm as Cristian's to synchronize with the server.

Logical Clock
---

Sometimes, in distributed systems, we don't need the absolute time at which an action is committed. We just need to know the relative time (i.e. the ordering of events).
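The sections below make this precise. As a minimal preview sketch (class and method names are illustrative, not from the lecture), a logical clock is just a per-process counter that every event advances and every message carries:

```python
class LamportClock:
    """Minimal sketch of a per-process logical clock."""

    def __init__(self, start: int = 0):
        self.time = start            # may start at 0 or 1

    def local_event(self) -> int:
        self.time += 1               # every local event gets a fresh timestamp
        return self.time

    def send(self) -> int:
        # sending is itself an event; the timestamp travels with the message
        return self.local_event()

    def receive(self, msg_time: int) -> int:
        # merge: jump past both clocks so the receive is ordered after the send
        self.time = max(self.time, msg_time) + 1
        return self.time


p1, p2 = LamportClock(), LamportClock()
t = p1.send()                        # p1's clock becomes 1
r = p2.receive(t)                    # p2's clock becomes max(0, 1) + 1 = 2
assert t < r                         # timestamp(send) < timestamp(receive)
```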
With a logical clock, it is not possible to determine the absolute time at which an event occurred.

### Happens-before Relation

If event A happens before event B on one machine, every machine must observe A as preceding B. The relation is transitive: if A->B and B->C, then A->C.

However, if neither A->B nor B->A holds, then A and B are concurrent, which means A and B may occur in any unknown order. A distributed system must be able to handle this (regardless of the order) without error.

### Lamport's Logical Clock

![](https://i.imgur.com/cSZV9SE.png)

Objective: assign timestamps to events such that if A&rightarrow;B, then timestamp(A) < timestamp(B).

Every time there is a local event at machine $i$, the logical clock $L_i$ is incremented. Each event in the machine should have a unique timestamp. When a message is sent to another machine, the timestamp is included in the message. When machine $j$ receives the message from $i$, it sets $L_j = max(L_i, L_j) + 1$ in order to respect the happens-before relation.

Note that the guarantee is one-directional: if there is a happens-before relation between two events, then the logical clock value at event $e_i$ is less than the logical clock value at event $e_j$; the converse does not hold.

*Note: $L_i$ can be initialized as 0 or 1.*

Lamport's protocol creates a partial order of events (i.e. it is not the absolute order of events). He proposed that in order to derive the total (global) order of events, ties (i.e. when there are 2 events from different machines with the same logical clock value) should be broken based on machine ID.

### Causality Violation

Due to the partial-ordering nature, there are implications to this protocol. For example, process P1 may receive A first and then B, while P2 might receive B before A.

![](https://i.imgur.com/MWl0iug.png)

In this diagram, the partial order (happens-before) is as follows:
1. send(m1) -> send(m2)
2. rec(m2) -> send(m3)
3. send(m3) -> rec(m3)
4.
rec(m3) -> rec(m1)

Chaining relations 2-4 gives: rec(m2) -> rec(m1). This contradicts relation 1: m1 was sent before m2 was even sent, yet the receiver delivers m3, a message that causally depends on m2, before m1. This is a causality violation.

The issue arises because the protocol doesn't provide information about where a logical clock value is derived from. At L2.1 in the diagram, the logical clock was incremented due to synchronization (the value came from a remote clock rather than a local event). This is a problem that cannot be solved with the original Lamport's algorithm.

#### Potential Causality Violation

In designing systems, we assume that:
* Any action a host takes may be affected by any message it has previously received.
* Note that making such a definition more specific would require application-specific knowledge.

### Vector Clock

![](https://i.imgur.com/L8ep4xQ.png)

#### Algorithm

* Local computation: increment only the process's own component of the vector clock on each local event.
* Sender: send the message along with the local vector clock.
* Receiver: merge the incoming vector clock into the local one (element-wise maximum), then increment the local component.

#### Properties of Protocol

- Comparison of vector timestamps is crucial for the detection of causality violations.
- If corresponding elements in 2 vector clocks are identical, the 2 events are the same event; vector clocks of different events should never be identical.
- Comparing scalar variables is a trivial operation.
- If A &rightarrow; B, then each element of A's VC is less than or equal to the corresponding element of B's VC, and at least one element is strictly less than the corresponding element in B's VC.
- Comparing vector clocks / timestamps requires element-wise comparison.
- If 2 events are concurrent, they will have "mixed" vector clocks such that at least one pair of corresponding elements is "greater than" and at least one pair is "less than". (e.g.
`(1,2,0,0)` and `(2,0,1,0)` are concurrent)

#### Exercise

![](https://i.imgur.com/oBAWeaO.png)

### Revisiting Causality Violation

With Lamport's LC, it was not possible to detect potential causality violations, since there was no information about concurrency among the machines / processes. With VC, this problem is solved.

Intuition: if clock B is updated with clock A's value via C, and B then receives a message directly from A with a timestamp that is strictly less than the one B received via C, there is a potential causality violation.

![](https://i.imgur.com/qW1cegy.png)
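The vector clock rules and the comparison properties above can be put together in one sketch (class and function names are illustrative, not from the lecture):

```python
class VectorClock:
    """Sketch of a vector clock for a system of n processes."""

    def __init__(self, n: int, pid: int):
        self.v = [0] * n             # one component per process
        self.pid = pid               # which component is "ours"

    def local_event(self):
        self.v[self.pid] += 1        # only the local component is incremented

    def send(self) -> list:
        self.local_event()           # sending counts as a local event
        return list(self.v)          # a copy travels with the message

    def receive(self, msg_v: list):
        # element-wise max merges the sender's history, then tick locally
        self.v = [max(a, b) for a, b in zip(self.v, msg_v)]
        self.v[self.pid] += 1


def happened_before(a: list, b: list) -> bool:
    # every element <=, and at least one strictly < (i.e. the clocks differ)
    return all(x <= y for x, y in zip(a, b)) and a != b


def concurrent(a: list, b: list) -> bool:
    # "mixed" clocks: neither event causally precedes the other
    return not happened_before(a, b) and not happened_before(b, a)


# The example from the text: these two events are concurrent.
assert concurrent([1, 2, 0, 0], [2, 0, 1, 0])
assert happened_before([1, 2, 0, 0], [2, 2, 1, 0])
```

The element-wise comparison is what Lamport's scalar clock cannot provide: a single integer collapses "greater in some components, smaller in others" into one value, losing the concurrency information.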