# Week 05 - Consistency Model
###### tags: `WS2020-IN2259-DS`
## Consistency Model(一致性模型)
### Definition (data-centric consistency model)
* 在 distributed data store 與 a set of clients 之間,達成 **concurrent read/write operations 一致的 contract**。
* Distributed data store:指 replicas, distributed database, shared memory, shared files 等等。
* **Data-centric** consistency models **dictate** the **outcome** of **concurrent reads and writes**。
### Concurrent Operations
* 所有 operation:有特定的順序。
* **Global ordering**:too costly、not scalable
* 如 consensus algorithm。
* 解決方式:使用適合 (**suitable**) 的 **weaker** consistency 就好。
## 5-1 CM - Strict Consistency
### Strict Consistency (嚴格的一致)
* **定義**:所有 read 均取得 **most recent write**。
* Uni-processor systems (單處理器系統) 使用 strict consistency。
* 定義需要假設 **absolute global time** 的存在。
### Special sample execution

### Interpretation of Strict Consistency
* 對於所有節點,write 是**可見的**,直接寫到記憶體 (cache 或 store buffer):instantaneously **visible** to all nodes。
* 需要絕對全域時間:**absolute global time** order is maintained。
* 簡而言之,不管什麼情況之下,**read 一定會得到最近一次 write 的值**。
### Notation
* W(X)a:將 a 寫入到記憶體 X。
* R(X)a:於記憶體 X 讀取到 a。
### Strict Consistency: The Bad News
* 不可能有瞬時的複寫操作 (instantaneously replicate operations)。
* 不可能完美同步時間(Fallacy #2: Latency is zero)。
### Self-study questions
* Verify the timing assumptions in the thought experiment on strict consistency?
* What aspect of a consistency model could be relaxed and why?
* Create some concurrent read and write sequences that conform to strict consistency.
* Create some concurrent operation sequences that violate strict consistency.
* See if the projected outcome of r=0, s=0 can occur on an x86 or due to compiler optimizations.
* Would a quantum tunnel help us achieve stric consistency in a replicated system?
## 5-2 CM - SEQUENTIAL CONSISTENCY
### Definition
* All **processes** 遵守 **sequential order** 執行。
* 而單個 process 裡的 **operations**,同樣是遵守各自 program 賦予的 **sequential order** 執行。
* 比 strict consistency 弱 (weak):**logical** time vs. **physical** time。
### Example
* 同一 process 內,R(x)b 不能在 R(x)a 之前。

### Self-study questions
* Create some concurrent read and write sequences that conform to sequential consistency.
* Create some concurrent operation sequences that violate sequential consistency.
* Is a strictly consistent system, sequentially consistent and vice versa, discuss?
* How could sequential consistency be implemented?
## 5-3 CM - LINEARIZABILITY
### Definition
* 擁有 Sequential consistency。
* $\mathrm{ts_{OP}(x)}$:操作 (operation) read (R) or write (W) 的 timestamp。
* $\mathrm{ts_{OP_1}(x)} \lt \mathrm{ts_{OP_2}(y)}$:$\mathrm{OP_1(x)}$ 先操作。
* 比 strict consistency 弱,比 sequential consistency 強。
* 假設有 **global time** (同 strict consistency),但**不是 absolute** global time (不同於 strict consistency)。
* 假設有 **physical clocks**,只是同步上有個 **bounded error**。
* 如果先後有 W(x)a 與 W(x)b:
* **no overlapping**:讀取會得到 b。
* **overlapping**:讀取結果可能是 a 或 b。
* 不同於 Sequential consistency:基於 **synchronized clocks** 的 ordering。
* 防止 **stale reads** (過時讀取):保證 write 完成後才能 read。
### Example






### Self-study questions
* Create some concurrent read and write sequences that conform to linearizability.
* Create some concurrent operation sequences that violate linearizability.
* Is a sequentially consistent data store linearizable, argue for or against?
* Is a strictly consistent system, sequentially consistent and vice versa, discuss?
* How could linearizability be implemented?
## 5-4 CM - CAUSAL CONSISTENCY (因果一致性)
### Intuition
* 比 sequential consistency 弱。
* 以因果關係來劃分事件 (read or write),分成 **causally related** 與 **not causally related**。
* 與 **happened-before** relation 相似 (cf. Lamport clock)。
* 假設 write A → write B:所有 processes 先看到 A 的結果,再看到 B 的結果。
* **Concurrent writes** **可能會發生**在不同 processes 的不同 order 中。
### Causal Relationship
* **相同 process** 中的 **read followed by write**:causally related
* R(x)a → W(y)b (有可能 y=f(x))
* **相同或不同 process** 中,針對相同空間 (x) 的 **write followed by read**:
* W(x)a → R(x)a
* **Transitivity**:有遞移性,可以畫箭頭串接起來。
* **Independent writes** by **different processes**:concurrent
* W(x)a || W(x)b
### “Potentially” Causally Related

* W(x)a in P1 與 R(x)a in P2:causally related。
* R(x)a in P2 與 W(y)b in P2:**potentially** causally related。
* y 可能會與 x 有關。
* 例如:P2 讀取 x 後進行運算得到 y,然後再將 y 寫回記憶體。
* W(x)a in P1 與 W(y)b in P2:**causal chain** (透過 R(x)a in P2)
### Definition
* **Potentially** causally related **writes**:在所有 processes 視角中,必須是 **same order**。
* **Concurrent writes**:在不同 processes 視角中,可以是 different order。
* 最後,Program order must be met!
### Example 1

### Example 2
* 對於 P3 而言:W(x)c 先於 W(x)b 完成同步,
* 對於 P4 而言:W(x)b 先於 W(x)c 完成同步,

### Example 3
* 要記得同一 process 內的本地因果順序。
* W(x)a in P1 → W(x)c in P1 → R(x)c in P5。
* R(x)a in P5 無法在 W(x)c in P1 後得到。

### Self-study questions
* Create some concurrent read and write sequences that conform to causal consistency.
* Create some concurrent operation sequences that violate causal consistency.
* Is a sequentially consistent data store causally consistent, argue for or against?
* Is a causally consistent data store sequentially consistent, argue for or against?
* How could causal consistency be implemented?
## 5-5 CM - FIFO CONSISTENCY
### Definition
* 來自同一 process 的 writes,在其他 processes 看來都是 in order。
* 來自不同 processes 的 writes,在其他 processes 看來有可能是 diffrent order。
* Easy to maintain:很簡單,就是使用 TCP 直接 in order 順序傳送資料。
### Example 1
* W(x)b in P2 → R(x)b in P3 → W(x)a in P1 → R(x)a in P3 → W(x)c in P2 → R(x)c in P3
* W(x)c in P2 → R(x)c in P3 → W(x)a in P1 →R(x)a in P3 → 不可能 R(x)b in P3!

::: warning
思考:第一張圖是否滿足 causal consistency?
:::
### Example 2
* W(x)a in P1 → R(x)a in P3 → W(x)b in P2 → R(x)b in P3 → W(x)c in P2 → R(x)c in P3
* W(x)b in P2 → R(x)b in P3 → W(x)c in P2 → R(x)c in P3 → W(x)a in P1 → R(x)a in P3

### Self-study Questions
* Create some concurrent read and write sequences that conform to FIFO consistency.
* Create some concurrent operation sequences that violate FIFO consistency.
* Is a sequentially consistent system, FIFO consistent, argue for or against?
* Is a causally consistent system, FIFO consistent, argue for or against?
* Find sample applications of FIFO consistency.
## 5-6 CM - WEAK CONSISTENCY
### Definition
* 並非所有的 process 都需要「看到」所有的 write,更不用說 in same order。
* 透過 **synchronization variable S** 來完成一個同步操作 **Synchronize(S)**。
* 在 **synchronization 之前**的所有操作,order 都是不一致的 (**not consistent**)。
* 在 **synchronization 之後**的所有操作,每個 processes 都可以看到**相同的 outcome**。
### 解釋 (Interpretation)
* Process 會強制把剛寫入的 value,輸出到所有同步中的 replicas。
* Process 會保證「最近一次」的 value 在讀取之前寫入。
* WC Model 強制 **group of operations** 的 **consistency** (而非單獨的 Read and Write)。
* 注意**一整組 (group)**讀寫的**影響 (effect)**!
### Example 1
* P2 & P3 have yet to synchronize, no guarantees about values read

### Example 2
* P1 & P2 agree on value a

### Example 3
* 若 P1 較 P2 晚完成同步,P1 寫入 b 到所有 processes 的 x 裡。
* 若 P2 較 P1 晚完成同步,P2 寫入 a 到所有 processes 的 x 裡。
* 因此,P1 與 P2 在同步後一定會讀取到相同的值。

### Self-study Questions
* Create some concurrent read and write sequences that conform to weak consistency.
* Can weak consistency emulate any of the other consistency models we study?
* How can a weakly consistent data store be implemented?
* Find sample applications of weak consistency.
## 5-7 SUMMARY - DATA-CENTRIC CONSISTENCY MODELS SUMMARY

eager replication
### Where does ... come into the picture?
* Strict:不可能實現。
* Linearizability:最強大的分散式解決方案,透過 eager replication (synchronous)、chain replication 實現,Hbase 與 Bigtable 都有支援。
* Sequential:System-wide consistent reads,如每個人都可以看到依樣順序的貼文與留言。
* Causal:每個人在留言 reply 之前,必須先等待 post 抵達 (送出的 reply 會被郵件伺服器 hold 住,等到收到 post 後才送出)。
* FIFO:收到一堆信,對於同一個寄件人的信件而言,是按照順序的。但是收到不同寄件人的信之間的順序,可能會與真實寄出的順序不同。
* Weak:同步的重責大任交給 developer 進行 explicit 定義。
### Self-study Questions
* Create some concurrent read and write sequences that conform to different consistency models.
* How do consistency models relate to one another, if at all?
* Find sample applications for each consistency model.
## 5-8 CLIENT-CENTRIC CONSISTENCY
### 背景
* 之前討論的一致性模型都是針對 **concurrent read** and **write** operations。
* 但仍然有很多使用案例 (use cases) 是**沒有或很少** **concurrent writes to the same key**,如:
* **DNS**:No write-write conflicts,一個 domain (disjoint partitions) 只有一個負責更新的授權伺服器 (authority)。
* **Key-value stores**:Usually no write-write conflicts,負責更新的伺服器根據 key 分成好幾個 partition,如:Dynamo、Cassandra。
* **WWW**:大量地使用 client-side caching,閱讀相對於較舊的頁面在很多情形下是可以接受的。
### 目標
* **Client-centric** consistency 側重 **single process** 的 **consistent view**,而非強調系統裡的資料一致性。
* 注重 **read-write conflicts**,假設 write-write conflicts 不存在 (也就是沒有或很少 concurrent writes)。
* Client-centric consistency models 描述了當 **single client** 向 **multiple replicas** 進行 **writes/reads** 的情境。
### Eventual Consistency
* Eventual consistency states that all replicas **eventually converge** **(最終會收斂!)** when write operations stop。
* 如:lazy replication using gossiping (cf. replication)
* 沒有 time bound,非常弱 (weak) 的 consistency form,但具有高可用度 (highly available,意即總是可以提供資料,但是可能是舊資料)。
* 如果 client 總是從同一個 replica 讀取資料,那運作上不會有問題。但是,**client** 可能**會從多個 replica 上進行讀取**,這樣就會發生怪異的同步問題 (因為有的 replica 更新了,有的 replica 還沒跟新)。如:
* Client mobility:在公司與家裡連上的 replica 是不同台的。
* Replica failure:原本用的 replica 壞掉惹,自動 failover 到另一台。
### Read Your Writes
* Client 在 A 地 write,之後在 B 地 read。
* Client 在 B 地必須先進行 write,以取得之前在 A 地的 write 結果。
* 案例:

### Monotonic-reads
* Client 在 A 地 read,之後在 B 地 read。
* Client 在 B 地先進行的 write 資料中**必須包含** A 地的 write 資料,目標是 **more recent data**。
* 公式:
* 案例:

### Writes follow reads
* Client 在 A 地 read,之後在 B 地 write。
* Client 在 B 地先進行的的 write 資料中必須**包含** A 地的 read 資料,目標是 **most recently read**。
* 案例:

### Monotonic-writes
* Client 在 A 地 write,之後在 B 地 write。
* B 地必須先進行 A 地進行過的 write,這是為了確保 process 依序完成兩個 writes。
* 就像是上 patch,必須從 baseline 開始一個一個往後執行。
* 案例:

### Summary
* **Eventual consistency**:
* 使用前提:非常少、甚至幾近於無的 concurrent writes。
* 提供:high availability、scalability。
* 案例:Dynamo, Cassandra, Riak, et al.
* **Client-centric consistency** dictates **how reads and writes** should look from the **client’s perspective**:
* Read your writes
* Monotonic-reads
* Monotonic-writes
* Writes follow reads
### Self-study Questions
* Write some concrete operation sequences that illustrate write-write and read-write conflicts in a data store.
* Create some concurrent read and write sequences that conform to different client-centric consistency models.
* Is there a correspondence among data-centric and client-centric consistency models? Could one be used to emulate the other, argue for or against.
* How do client-centric consistency models relate to one another, if at all?
* Find sample applications for each consistency model.
* How can client-centric consistency models be implemented?