# consistency models

Some old notes on consistency models. A consistency model is a set of histories of operations that a system allows.

### Read Uncommitted

[https://jepsen.io/consistency/models/read-uncommitted](https://jepsen.io/consistency/models/read-uncommitted)

- Prohibits *dirty writes*, where two transactions concurrently modify the same data, but allows all other behavior, including
    - $w_1(x) … r_2(x)$ - *dirty read* (write by `tx1`, then read by `tx2` before `tx1` commits)
    - $r_1(x) … w_2(x)$ - *fuzzy read* (read by `tx1`, then write by `tx2`)

### Read Committed

[https://jepsen.io/consistency/models/read-committed](https://jepsen.io/consistency/models/read-committed)

- Strengthens Read Uncommitted: like Read Uncommitted it prohibits dirty writes, and now also prohibits *dirty reads*, where a transaction observes a write of a transaction that has not committed (and may never commit). Still allows *fuzzy reads* and *phantom reads*.

### Snapshot Isolation

[https://jepsen.io/consistency/models/snapshot-isolation](https://jepsen.io/consistency/models/snapshot-isolation)

- A snapshot-isolated system appears to run transactions on fully consistent snapshots of the system state. Transaction execution occupies an interval from `StartTimestamp` to `CommitTimestamp`, and when a transaction starts executing it does so on a consistent *snapshot*. Any writes the transaction makes are visible only to the transaction itself; *externally*, writes become visible to all other transactions only at commit time, and they become visible atomically, thus "creating" the next snapshot at that point. The rule is: if, before `tx1` commits, some `tx2` whose writes overlap `tx1`'s commits inside `tx1`'s interval `[StartTimestamp, CommitTimestamp]`, then `tx1` aborts, since `tx2` committed first and we want to guarantee that no writes are lost. Snapshot isolation is therefore "first committer wins".
- In another formulation, it can be specified in terms of four properties:
    1. *internal consistency* - writes that are local to `tx1` are visible to subsequent reads by `tx1`
    2. *external consistency* - reads by `tx1` without preceding local writes must observe the state as written by some transaction `tx0` that committed before `tx1`'s `StartTimestamp`
    3. transactions become visible to all nodes in the same order
    4. if two transactions `tx1` and `tx2` modify the same object, then one of them must be *visible* to the other, i.e. it became visible as committed before the "later" transaction started execution (any local reads)

### Serializability

- Extends Snapshot Isolation to enforce a total order on transactions, as if each transaction executed atomically at a single point in time
- Example of the difference between Snapshot Isolation and Serializability: imagine a bag (the equivalent of a storage table) holding black and white marbles, and two transactions:
    - `tx1` - turn all marbles black
    - `tx2` - turn all marbles white

    If we run these two transactions under the Serializable consistency level, only two final states are possible: all marbles black, or all marbles white. Why? Under serializable execution there are only two valid serial orders:

    1. initial_state → `tx1` → `tx2` → final_state
    2. initial_state → `tx2` → `tx1` → final_state

    If we run them under the Snapshot Isolation consistency level, one extra state is possible: assume both transactions start execution at the *same exact time*; then `tx1` turns all white marbles black and `tx2` does the opposite, and since each transaction writes only the marbles that were the other color in its snapshot, their write sets are disjoint, both commit, and we are left with a mixed state with the colors reversed.
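The marble anomaly can be sketched with a toy snapshot-isolated store that detects only write-write conflicts, aborting on the first-committer-wins rule. This is a minimal sketch: `SnapshotStore`, `Transaction`, and the timestamp bookkeeping are illustrative inventions, not any real database's API.

```python
import itertools

class SnapshotStore:
    """Toy store enforcing snapshot isolation's first-committer-wins
    rule via write-write conflict detection only (no predicate locks)."""
    def __init__(self, data):
        self.data = dict(data)
        self.commit_log = []              # list of (commit_ts, written_keys)
        self.clock = itertools.count(1)   # logical timestamps

    def begin(self):
        # each transaction gets a private copy of the current state
        return Transaction(self, next(self.clock), dict(self.data))

class Transaction:
    def __init__(self, store, start_ts, snapshot):
        self.store, self.start_ts = store, start_ts
        self.snapshot, self.writes = snapshot, {}

    def read(self, key):
        # internal consistency: own writes shadow the snapshot
        return self.writes.get(key, self.snapshot[key])

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        # first committer wins: abort if anyone who committed after our
        # snapshot was taken wrote a key that we also wrote
        for commit_ts, keys in self.store.commit_log:
            if commit_ts > self.start_ts and keys & self.writes.keys():
                return False                                   # abort
        self.store.data.update(self.writes)
        self.store.commit_log.append((next(self.store.clock),
                                      set(self.writes)))
        return True

store = SnapshotStore({"m1": "white", "m2": "black"})
tx1, tx2 = store.begin(), store.begin()     # start concurrently
for k in ("m1", "m2"):                      # tx1: turn white marbles black
    if tx1.read(k) == "white":
        tx1.write(k, "black")
for k in ("m1", "m2"):                      # tx2: turn black marbles white
    if tx2.read(k) == "black":
        tx2.write(k, "white")
assert tx1.commit() and tx2.commit()        # disjoint write sets: both commit
print(store.data)                           # → {'m1': 'black', 'm2': 'white'}
```

Both transactions commit because their write sets never overlap, yielding the color-reversed mixed state that no serial order could produce; a serializable system would have to abort or re-run one of them.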
### Strict Serializability

- Extends Serializability to enforce a real-time ordering on transactions, with real-time ordering given by a theoretical "observer" clock: each operation has an invocation time and, should it complete, a strictly greater completion time, both given by this perfectly synchronized, globally accessible clock
- Can think of a strictly serializable database as a single linearizable object in which the object is the entire database state
- Fixes the issues that arise with serializability without linearizability when running in a distributed setting

### Serializability vs. "Strict" Serializability

- On single-machine deployments, serializable systems are in practice also strictly serializable
- When running in a distributed system and attempting distributed transactions, problems arise such as
    - *Stale read* — `tx1` reads and modifies state on replica 1 while `tx2` is routed to replica 2, which has not yet "heard from" the primary replica (perhaps because replication is asynchronous), resulting in the order `tx2` → `tx1`, which violates strict serializability because it violates the real-time order in which the transactions arrived
    - *Causal reverse* — when performing two transactions on disjoint data, such as transferring funds from a savings account to a checking account and then paying another account out of the checking account, if the two transactions get re-ordered, the intermediate state of the checking account may differ and violate a safety constraint (such as the balance staying non-negative).
Example:

```python
Checking = 0
Savings = 100
Other_bank_account = 0

def tx_1():  # transfer savings -> checking
    global Checking, Savings
    Checking += 100
    Savings -= 100

def tx_2():  # pay another account out of checking
    global Checking, Other_bank_account
    Checking -= 100
    Other_bank_account += 100
```

If `tx_1` and `tx_2` get re-ordered, the balance of `Checking` dips below zero before arriving at the final balance.

---

## Single-object consistencies

### Causal consistency

- Causally related events must appear in the same order on all processes, while causally independent events may be ordered arbitrarily
- Captures the notion of the "happened before" relation from Lamport's definition
- Example — in a chat application, assume Alice sends a message "lunch?", after which Bob responds "yes" and Zoe responds "no". Causal consistency allows Alice to observe
    1. `"yes"` → `"no"`
    2. `"lunch?"` → `"no"` → `"yes"`

    but never `"yes"` or `"no"` before `"lunch?"`, since the replies are causally related to the original message

### Sequential consistency

- All operations across processes appear to take place in *some* total order, and that order is consistent with the order of operations on each individual process

### Linearizability

- Strengthens Sequential Consistency such that operations appear to take place atomically in a single total order that is consistent with the *real-time ordering* of those operations
- If operation `A` completes before operation `B` begins, then operation `B` must logically take effect on the state after operation `A`

---

### Some notes on "**ANSI SQL Isolation Levels**"

*A Critique of ANSI SQL Isolation Levels* - [https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/tr-95-51.pdf](https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/tr-95-51.pdf)

"Isolation" is defined via three phenomena:

1. P1 (Dirty Read) - `tx1` applies a write, then `tx2` reads the modified data before `tx1` commits or aborts; if `tx1` ends up aborting, `tx2` has read a value that is no longer there and thus never really existed
2. P2 (Fuzzy Read or Non-repeatable Read) - `tx1` performs a read, then `tx2` writes or deletes the record that `tx1` has read and commits; if `tx1` re-reads the value, it finds it modified or gone, not equal to the previous read
3. P3 (Phantom) - `tx1` performs a read that satisfies a `< search condition filter >`, then `tx2` applies a write that also satisfies the same `< search condition filter >` and commits. If `tx1` performs the same read again, it finds a record that was not there in the previous read.
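P2 and P3 can be made concrete with a bare in-memory "table" that has no concurrency control at all, so a committed write by one transaction is immediately visible to another. This is a minimal sketch: the table, the `select` helper, and the row values are all made up for illustration.

```python
# A bare list of rows standing in for a table, with no concurrency
# control, so a committed write by tx2 is immediately visible to tx1.
employees = [
    {"name": "alice", "dept": "eng"},
    {"name": "bob",   "dept": "sales"},
]

def select(table, predicate):
    """Stand-in for a SQL read with a <search condition>; copies rows."""
    return [dict(row) for row in table if predicate(row)]

# P2 (fuzzy / non-repeatable read): tx1 reads a row, tx2 updates it and
# commits, and tx1's re-read no longer matches its first read.
first = select(employees, lambda r: r["name"] == "bob")[0]   # tx1: "sales"
employees[1]["dept"] = "eng"                                 # tx2: update + commit
second = select(employees, lambda r: r["name"] == "bob")[0]  # tx1 re-reads
assert (first["dept"], second["dept"]) == ("sales", "eng")

# P3 (phantom): tx1 evaluates the same search condition twice, and tx2's
# committed insert makes a new row appear in between.
in_eng = lambda r: r["dept"] == "eng"
before = select(employees, in_eng)                  # tx1: alice, bob
employees.append({"name": "carol", "dept": "eng"})  # tx2: insert + commit
after = select(employees, in_eng)                   # tx1: a phantom appears
assert len(before) == 2 and len(after) == 3
```

The distinction the paper draws: P2 is about re-reading an existing record that changed, while P3 is about a predicate matching a record that did not exist at the first read, which is why row-level locking alone prevents P2 but not P3.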