<style> H2{color:#BF0060 !important;} H3{color:#009393 !important;} p{color:Black !important;} li strong {color:#4682b4 !important;} .alert-info{ background-color:#e4edf6 ; color: black; } .alert-warning{color: black;} td{font-weight:bold;} </style> # Coordination and agreement ## **Distributed mutual exclusion** ### **Mutual Exclusion** * **三條件** * ME1(safety) : only 1 in CS * ME2(liveness) : enter/exit CS eventually success ⇨ deadlock-free / no starvation * ME3(ordering) : only required in distribued system * distributed system 才需要此條件 ### **Central Server Algorithm** * **不滿足 ME3 !** * 令 pj 先到 server,但其實 pi HB pj * **步驟** * Process * To enter critical section, process ==sends request== to server * Server * If the token is held then the server doesn't reply but ==queue== the request * Choose the oldest request from queue and ==grant token== * Process * release token ### **Ring-Based Algorithm** * **步驟** * All processes arranged in a logical cycle * token is passed clockwise * 不需要則pass * 需要則 hold, exit CS 再 pass ### **Multicast and logical clock** * **Algo** * Initially :::info state = released ::: * To enter the CS, do 3 steps with request processing deferred (delayed) :::info 1. state = ==WANTED== 2. multicast request to all other processes 3. T = request's timestamp <br> * Wait until (number of replies received == N-1) * state = HELD ::: * On receipt of request〈Timstamp~i~, pi〉at pj ≠ pi :::info If (state == HELD) or (state == WANTED and〈T,pj〉<〈Ti,pi) &nbsp; queue request from pi without replying else &nbsp; reply immediately to pi ::: * To exit CS :::info state = RELEASED reply to any queued requests ::: ### **Maekawa's Voting Algorithm** * **概念** * Permission from ==voting set== 即可 * Intersection of 2 sets cast their ==votes to only one== candidate to guarantee ME1(safety) * Besides repuest, reply, add a new type of msg --- release * **Voting Set** * 條件 :::warning * pi ∈ Vi : 自己要在 Vi * Vi ⋂ Vj ≠ ∅ for any Vi, Vj * |Vi| = K : not necessary, but to be fair * Each pi is contained in M voting set : not necessary ::: * 取法 * To minimize K, K~√N, M = K * V~11~ = row3 U col3 * 由此可見 |Vi| ~ 2√N * | 1 | 2 | ==3== | 4 | | - | - | - | - | | 5 | 6 | ==7== | 8 | | ==9== | ==10== | ==11== | ==12== | | 13 | 14 | ==15== | 16 | * **Algo** * Initally :::info state = RELEASED and voted = FALSE ::: * For pi to enter CS :::info state = WANTED Multicast requests to Vi Wait until (number of replies ==K) state = HELD ::: * On receipt of request from pi at pj (pi can be pj so there might be deadlock) :::info If (state == HELD) or (voted == TRUE) &nbsp; queue request from pi without replying else &nbsp; send reply to pi // modify this to avoid deadlock &nbsp; voted = TRUE ::: * Correct the problem of deadlock : * Deadlock : 都先投自己導致大家 voted 都是 true * Instead of simply sending reply, queue the outstanding requests in HB order. outstanding : 已發出待回應 ## **Election** ### **Ring-Based Election Algo** * **Assume : Asynchronous System** and no failure occur * **Algo** * Initially :::info every process = non-participant ::: * Any process can begin an election :::info Mark itself = participant Send election msg <id~itself~> to the clockwise neighbor ::: * On receipt of an election msg :::info If id~arrived~ > id~itself~ &nbsp; mark itself = participant &nbsp; forward else &nbsp; if I'm already a participant : &nbsp; &nbsp; drop msg &nbsp; else : &nbsp; &nbsp; mark itself = participant, substitute the id and forward * Leader setting :::info * Receive id equal to its own * The largest pi declares itself as th leader &nbsp; Mark itself non-participant &nbsp; Sends an elected msg to a neighbor (differ from election) * When pj ≠ pi received elected msg &nbsp; pj mark itself non-participant, elected~j~ = pi &nbsp; forward the elected msg to its neighbor * 看 p30 圖 ### **Hirschberg-Sinclair Algo** * **Synchronous** * **Algo** * In round k, 送給左右各 2^(k-1)^ 個neighbors,比左右都大才能記進入 round k+1,直到最後剩一個 winner * 看 p32 圖 ### **Bully Algo** * **Synchronous** * **Msg Type** * election * Begins an election when coordinator failed (timeout) * 可能有很多人發現老大掛了,都發動 election * pi with lower id begins election by sending election msgs to higher ids * If no answer &nbsp; It knows it has the highest id &nbsp; ⇨ Elect itself as coordinator by sending coordinator msg to all lower ids * Otherwise &nbsp; wait for coordinator msg to arrive from the new coordinator * coordinator * If pi receives a coordinator msg, set elected~i~ to that id * answer * If a higher id pj receives an election msg, reply answer msg (alive msg) ## **Msg ordering** ### **Notation** * s~s' / r~r' * s≺s' / r≺r' ### **Paradigms** * Asynchronous ⇨ FIFO ⇨ Causality ⇨ Synchronous execution ### **Asynchronous executions** * an execution (E,≺) is an irreflexive partial order * irreflexive : A不會≺自己 * Assymetric:若A≺B,則B不會≺A * Transitive ### **FIFO executions** * an A-execution in which :::info * for all (s,r) and (s',r') ∈ T, &nbsp; (s ~ s' and r ~ r' and s ≺ s') ⇨ r ≺ r'. ::: * can be implimented over a non-FIFO channel by numbering scheme : 給 msg 編號,若收到2號,則 queue 住等1號 ### **Causally Ordered executions (CO)** * an A-execution in which :::info * for all (s,r) and (s',r') ∈ T, &nbsp; (r ~ r' and s ≺ s') ⇨ r ≺ r'. ::: * 看 p41 圖 * 先檢查條件:s ≺ s' 和 r ~ r',其一不成立 ⇨ CO * 兩個都成立則檢查 r ≺ r',成立 ⇨ CO * **實作CO : Delivery event** * 若 s ≺ s' 且 r ~ r',但 m' 先到, 則延遲 delivery(to application) 使 delivery m ≺ m' ## **Synchronous execution by A-execution** * **review** : Asynchronous ⇨ FIFO ⇨ Causality ⇨ Synchronous execution * A-execution (E,≺) is irrefexive partial order * S-execution (E,≺≺) is partial order ### **S-execution** * **概念** * send 跟 receive 視為同時(atomic),不違反 causality * (s, r) ∈ T, s ≺≺ r 且 r ≺≺ s, note that ≺≺ is ==partial order== * **A to S-execution** * An Asynchronous execution is synchronous iff ∃ a mapping from ==E to T== such that * T(s(m)) = T(r(m)) * e~i~ ≺ e~i~' ⇨ T(e~i~) < T(e~i~') for each pi * e~i~ ≺ e~j~ ⇨ T(e~i~) < T(e~j~) if e~i~ and e~j~ 不是一對(s,r) ### **A to S 的 Deadlock scenario** * **原理** * 一次 (s,r) 要等對方收才能完成,當兩人互等 ⇨ deadlock * (s,r) pair 的 sequence 有 cyclic dependency * **RSC (realizable with synchronous communication)** * An A-execution (E,≺) is an RSC-execution iff ∃ ==non-separated linear extension== * linear extension : a total order that satisfies partial order * non-separated : 一組(s,r)之間沒有其他 event * **Crown** * 一串有 HB 的 (s,r) 形成迴圈 * s^0^≺r^1^, s^1^≺r^2^,...s^k-1^≺r^0^ * non-separated linear extension = exp time,所以改找 crown * ∃ cyclic dependency 就不可能有 non-separated 的 linear extension * 有 crown 就不是 RSC * ==crown test== :::info * directed edge : (s1,r1) 和 (s2, r2) 四種≺組合 * 寫出 (s1,r1), (s2, r2),...,依 hb relation 加上 directed edge * 找 directed cycle ⇨ crown ::: * p68 整理 ## **Simulating Syn and Asyn Channel** * **Sync Channel** * 對於不能 RSC 的 A-execution * Decouple pi, pj by model ==Cij== with a control process ==pij== * So that commu. between pi and pij (pij and pj) are syn (while the logical channel Cij is asyn) * **Syn Programs on Asyn systems** * Using ==middleware== to provide sybchrony * msg send is scheduled, middleware waits for an ack * After the ack is received, synchronous send primitive completes. * 對 pi, pj來說自己是 sync/receive,實際上中間利用 Asyn primitives 溝通 ## **Non-determinism** * **造成 non-determinism 原因** * msg delay * not specify the sender (non-deterministic receive call) * send and receive enabled at the same process ### **Msg scheduling problem** * **看 p81** * point out safe schedule * ###### tags: `OS` <img src="https://i.imgur.com/.png" width = "500"/></img>