<style>
H2{color:#BF0060 !important;}
H3{color:#009393 !important;}
p{color:Black !important;}
li strong {color:#4682b4 !important;}
.alert-info{
background-color:#e4edf6 ;
color: black;
}
.alert-warning{color: black;}
td{font-weight:bold;}
</style>
# Coordination and agreement
## **Distributed mutual exclusion**
### **Mutual Exclusion**
* **三條件**
* ME1(safety) : only 1 in CS
* ME2(liveness) : enter/exit CS eventually success ⇨ deadlock-free / no starvation
* ME3(ordering) : only required in distribued system
* distributed system 才需要此條件
### **Central Server Algorithm**
* **不滿足 ME3 !**
* 令 pj 先到 server,但其實 pi HB pj
* **步驟**
* Process
* To enter critical section, process ==sends request== to server
* Server
* If the token is held then the server doesn't reply but ==queue== the request
* Choose the oldest request from queue and ==grant token==
* Process
* release token
### **Ring-Based Algorithm**
* **步驟**
* All processes arranged in a logical cycle
* token is passed clockwise
* 不需要則pass
* 需要則 hold, exit CS 再 pass
### **Multicast and logical clock**
* **Algo**
* Initially
:::info
state = released
:::
* To enter the CS, do 3 steps with request processing deferred (delayed)
:::info
1. state = ==WANTED==
2. multicast request to all other processes
3. T = request's timestamp
<br>
* Wait until (number of replies received == N-1)
* state = HELD
:::
* On receipt of request〈Timstamp~i~, pi〉at pj ≠ pi
:::info
If (state == HELD) or (state == WANTED and〈T,pj〉<〈Ti,pi)
queue request from pi without replying
else
reply immediately to pi
:::
* To exit CS
:::info
state = RELEASED
reply to any queued requests
:::
### **Maekawa's Voting Algorithm**
* **概念**
* Permission from ==voting set== 即可
* Intersection of 2 sets cast their ==votes to only one== candidate to guarantee ME1(safety)
* Besides repuest, reply, add a new type of msg --- release
* **Voting Set**
* 條件
:::warning
* pi ∈ Vi : 自己要在 Vi
* Vi ⋂ Vj ≠ ∅ for any Vi, Vj
* |Vi| = K : not necessary, but to be fair
* Each pi is contained in M voting set : not necessary
:::
* 取法
* To minimize K, K~√N, M = K
* V~11~ = row3 U col3
* 由此可見 |Vi| ~ 2√N
* | 1 | 2 | ==3== | 4 |
| - | - | - | - |
| 5 | 6 | ==7== | 8 |
| ==9== | ==10== | ==11== | ==12== |
| 13 | 14 | ==15== | 16 |
* **Algo**
* Initally
:::info
state = RELEASED and voted = FALSE
:::
* For pi to enter CS
:::info
state = WANTED
Multicast requests to Vi
Wait until (number of replies ==K)
state = HELD
:::
* On receipt of request from pi at pj (pi can be pj so there might be deadlock)
:::info
If (state == HELD) or (voted == TRUE)
queue request from pi without replying
else
send reply to pi // modify this to avoid deadlock
voted = TRUE
:::
* Correct the problem of deadlock :
* Deadlock : 都先投自己導致大家 voted 都是 true
* Instead of simply sending reply, queue the outstanding requests in HB order.
outstanding : 已發出待回應
## **Election**
### **Ring-Based Election Algo**
* **Assume : Asynchronous System** and no failure occur
* **Algo**
* Initially
:::info
every process = non-participant
:::
* Any process can begin an election
:::info
Mark itself = participant
Send election msg <id~itself~> to the clockwise neighbor
:::
* On receipt of an election msg
:::info
If id~arrived~ > id~itself~
mark itself = participant
forward
else
if I'm already a participant :
drop msg
else :
mark itself = participant, substitute the id and forward
* Leader setting
:::info
* Receive id equal to its own
* The largest pi declares itself as th leader
Mark itself non-participant
Sends an elected msg to a neighbor (differ from election)
* When pj ≠ pi received elected msg
pj mark itself non-participant, elected~j~ = pi
forward the elected msg to its neighbor
* 看 p30 圖
### **Hirschberg-Sinclair Algo**
* **Synchronous**
* **Algo**
* In round k, 送給左右各 2^(k-1)^ 個neighbors,比左右都大才能記進入 round k+1,直到最後剩一個 winner
* 看 p32 圖
### **Bully Algo**
* **Synchronous**
* **Msg Type**
* election
* Begins an election when coordinator failed (timeout)
* 可能有很多人發現老大掛了,都發動 election
* pi with lower id begins election by sending election msgs to higher ids
* If no answer
It knows it has the highest id
⇨ Elect itself as coordinator by sending coordinator msg to all lower ids
* Otherwise
wait for coordinator msg to arrive from the new coordinator
* coordinator
* If pi receives a coordinator msg, set elected~i~ to that id
* answer
* If a higher id pj receives an election msg, reply answer msg (alive msg)
## **Msg ordering**
### **Notation**
* s~s' / r~r'
* s≺s' / r≺r'
### **Paradigms**
* Asynchronous ⇨ FIFO ⇨ Causality ⇨ Synchronous execution
### **Asynchronous executions**
* an execution (E,≺) is an irreflexive partial order
* irreflexive : A不會≺自己
* Assymetric:若A≺B,則B不會≺A
* Transitive
### **FIFO executions**
* an A-execution in which
:::info
* for all (s,r) and (s',r') ∈ T,
(s ~ s' and r ~ r' and s ≺ s') ⇨ r ≺ r'.
:::
* can be implimented over a non-FIFO channel by numbering scheme : 給 msg 編號,若收到2號,則 queue 住等1號
### **Causally Ordered executions (CO)**
* an A-execution in which
:::info
* for all (s,r) and (s',r') ∈ T,
(r ~ r' and s ≺ s') ⇨ r ≺ r'.
:::
* 看 p41 圖
* 先檢查條件:s ≺ s' 和 r ~ r',其一不成立 ⇨ CO
* 兩個都成立則檢查 r ≺ r',成立 ⇨ CO
* **實作CO : Delivery event**
* 若 s ≺ s' 且 r ~ r',但 m' 先到,
則延遲 delivery(to application) 使 delivery m ≺ m'
## **Synchronous execution by A-execution**
* **review** : Asynchronous ⇨ FIFO ⇨ Causality ⇨ Synchronous execution
* A-execution (E,≺) is irrefexive partial order
* S-execution (E,≺≺) is partial order
### **S-execution**
* **概念**
* send 跟 receive 視為同時(atomic),不違反 causality
* (s, r) ∈ T, s ≺≺ r 且 r ≺≺ s, note that ≺≺ is ==partial order==
* **A to S-execution**
* An Asynchronous execution is synchronous iff ∃ a mapping from ==E to T== such that
* T(s(m)) = T(r(m))
* e~i~ ≺ e~i~' ⇨ T(e~i~) < T(e~i~') for each pi
* e~i~ ≺ e~j~ ⇨ T(e~i~) < T(e~j~) if e~i~ and e~j~ 不是一對(s,r)
### **A to S 的 Deadlock scenario**
* **原理**
* 一次 (s,r) 要等對方收才能完成,當兩人互等 ⇨ deadlock
* (s,r) pair 的 sequence 有 cyclic dependency
* **RSC (realizable with synchronous communication)**
* An A-execution (E,≺) is an RSC-execution iff ∃ ==non-separated linear extension==
* linear extension : a total order that satisfies partial order
* non-separated : 一組(s,r)之間沒有其他 event
* **Crown**
* 一串有 HB 的 (s,r) 形成迴圈
* s^0^≺r^1^, s^1^≺r^2^,...s^k-1^≺r^0^
* non-separated linear extension = exp time,所以改找 crown
* ∃ cyclic dependency 就不可能有 non-separated 的 linear extension
* 有 crown 就不是 RSC
* ==crown test==
:::info
* directed edge : (s1,r1) 和 (s2, r2) 四種≺組合
* 寫出 (s1,r1), (s2, r2),...,依 hb relation 加上 directed edge
* 找 directed cycle ⇨ crown
:::
* p68 整理
## **Simulating Syn and Asyn Channel**
* **Sync Channel**
* 對於不能 RSC 的 A-execution
* Decouple pi, pj by model ==Cij== with a control process ==pij==
* So that commu. between pi and pij (pij and pj) are syn (while the logical channel Cij is asyn)
* **Syn Programs on Asyn systems**
* Using ==middleware== to provide sybchrony
* msg send is scheduled, middleware waits for an ack
* After the ack is received, synchronous send primitive completes.
* 對 pi, pj來說自己是 sync/receive,實際上中間利用 Asyn primitives 溝通
## **Non-determinism**
* **造成 non-determinism 原因**
* msg delay
* not specify the sender (non-deterministic receive call)
* send and receive enabled at the same process
### **Msg scheduling problem**
* **看 p81**
* point out safe schedule
*
###### tags: `OS`
<img src="https://i.imgur.com/.png" width = "500"/></img>