# consistency models
(Some old notes on consistency models.) A consistency model is the set of operation histories a system allows.
### Read Uncommitted
[https://jepsen.io/consistency/models/read-uncommitted](https://jepsen.io/consistency/models/read-uncommitted)
- Prohibits *dirty writes*, where two transactions modify the same data before either commits, but allows all other behavior, including
    - $w_1(x) … r_2(x)$ - *dirty read* (write by `tx1`, then read by `tx2` before `tx1` commits)
    - $r_1(x) … w_2(x)$ - *fuzzy read* (read by `tx1`, then write by `tx2`)
### Read Committed
[https://jepsen.io/consistency/models/read-committed](https://jepsen.io/consistency/models/read-committed)
- Strengthens Read Uncommitted by also prohibiting *dirty reads*: a transaction is not allowed to observe a write of a transaction that has not committed. Still allows *fuzzy reads* and *phantom reads*
### Snapshot Isolation
[https://jepsen.io/consistency/models/snapshot-isolation](https://jepsen.io/consistency/models/snapshot-isolation)
- A snapshot-isolated system appears to run each transaction on a full, consistent snapshot of the system state. A transaction's execution spans the interval from its `StartTimestamp` to its `CommitTimestamp`, and when it starts it executes against a consistent *snapshot*. Any writes the transaction makes are visible only to the transaction itself; *externally*, writes become visible to all other transactions atomically at commit time, at that point "creating" the next snapshot. The rule: if, while `tx1` is executing, another transaction `tx2` commits a write to data that `tx1` also writes, and `tx2`'s commit falls within `tx1`'s interval `[StartTimestamp, CommitTimestamp]`, then `tx1` aborts, since `tx2` committed first and we want to guarantee that no writes are lost. Snapshot isolation is therefore "first committer wins".
- In another formulation, snapshot isolation can be specified in terms of four properties:
1. *internal consistency* - writes made locally by `tx1` are visible to subsequent reads within `tx1`
2. *external consistency* - a read by `tx1` with no preceding local write must observe the state as written by some transaction `tx0` that committed before `tx1`'s `StartTimestamp`
3. committed transactions become visible to all nodes in the same order
4. if two transactions `tx1` and `tx2` modify the same object, then one of them must be *visible* to the other, i.e. it must have committed and become visible before the later transaction started executing (before any of its local reads)
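The first-committer-wins rule above can be sketched as a commit-time validation step. A minimal sketch; `CommitRecord`, `can_commit`, and the timestamp parameters are illustrative names, not any particular system's API:

```python
# Sketch of snapshot-isolation commit validation (first committer wins).
# All names here are illustrative.

class CommitRecord:
    def __init__(self, commit_ts, write_set):
        self.commit_ts = commit_ts
        self.write_set = write_set

def can_commit(start_ts, write_set, commit_log, now_ts):
    """Abort if any transaction with an overlapping write set
    committed inside our (start_ts, now_ts] interval."""
    for rec in commit_log:
        if start_ts < rec.commit_ts <= now_ts and rec.write_set & write_set:
            return False  # a conflicting tx committed first; we abort
    return True

log = [CommitRecord(commit_ts=5, write_set={"x"})]
# Disjoint write sets: commit allowed
assert can_commit(start_ts=1, write_set={"y"}, commit_log=log, now_ts=7)
# Overlapping write set, commit inside our interval: abort
assert not can_commit(start_ts=1, write_set={"x"}, commit_log=log, now_ts=7)
```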
### Serializability
- Extends Snapshot Isolation to enforce a total order on transactions, as if they were executed one at a time, each at a single point in time
- Example of difference between Snapshot Isolation vs. Serializability
Imagine a bag (equivalent of a storage table) with black and white marbles - we have 2 transactions:
`tx1` - turn all marbles black
`tx2` - turn all marbles white
If we run these two transactions under Serializable consistency level, then only 2 states are possible: all black marbles or all white marbles. Why? Under serializable execution we can have two valid serializable orders:
1. initial_state →`tx1` → `tx2` → final_state
2. initial_state →`tx2` → `tx1` → final_state
If we run these two transactions under Snapshot Isolation, one extra final state is possible. Assume both transactions start execution at the *same exact time*, on the same snapshot: `tx1` turns every white marble black and `tx2` turns every black marble white. Their write sets are disjoint, so both commit, and when both apply their writes we still have a mixed state, with the colors reversed.
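The mixed outcome can be sketched directly: both transactions read the same snapshot, their write sets turn out to be disjoint, so snapshot isolation lets both commit. A minimal sketch (the dict-of-writes representation is an assumption for illustration):

```python
# Sketch: the marble example under snapshot isolation.
# Both transactions read the same initial snapshot, so each
# rewrites only the marbles of the opposite color it saw.

marbles = ["black", "white", "black", "white"]
snapshot = list(marbles)  # both transactions start on the same snapshot

# tx1: turn all marbles black (based on its snapshot reads)
tx1_writes = {i: "black" for i, c in enumerate(snapshot) if c == "white"}
# tx2: turn all marbles white
tx2_writes = {i: "white" for i, c in enumerate(snapshot) if c == "black"}

# The write sets are disjoint, so first-committer-wins lets both commit:
assert not (tx1_writes.keys() & tx2_writes.keys())
for i, c in {**tx1_writes, **tx2_writes}.items():
    marbles[i] = c

print(marbles)  # colors flipped: a state no serial order can produce
```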
### Strict Serializability
- Extends Serializability to also enforce a real-time ordering on events. Real-time order is given by a theoretical "observer" clock: each operation has an invocation time and, should it complete, a strictly greater completion time, both read from this perfectly synchronized, globally accessible clock
- Can think of a strictly serializable database as a single linearizable object, where the object is the entire database state
- Fixes issues with serializability without linearizability when running in a distributed manner
### Serializability vs. “Strict” Serializability
- When running on single-machine deployments, serializable systems are practically also strictly serializable
- When running in a distributed system and attempting distributed transactions, problems arise such as:
    - *Stale read* - `tx1` reads and modifies state on replica 1 while `tx2` is routed to replica 2, which has not yet heard from the primary via (possibly asynchronous) replication. The resulting order `tx2` → `tx1` violates strict serializability, since it breaks the real-time order in which the transactions arrived
    - *Causal reverse* - two transactions get re-ordered, such as a transfer from a savings account into a checking account followed by a payment from the checking account to another bank. If the two transactions are re-ordered, the checking account passes through a different sequence of states and may violate a safety constraint (such as the balance staying non-negative). Example:
```python
checking = 0
savings = 100
other_bank_account = 0

def tx_1():  # transfer savings -> checking
    global checking, savings
    checking += 100
    savings -= 100

def tx_2():  # pay another bank from checking
    global checking, other_bank_account
    checking -= 100
    other_bank_account += 100
```
if `tx_1` and `tx_2` get re-ordered, the balance of `checking` dips below zero before arriving at the final state
---
## Single-object consistencies
### Causal consistency
- Causally related events should appear in the same order on all processes, although causally independent events are arbitrarily ordered
- Captures the notion of "happened before" relation from Lamport's definition
- Example - In a chat application, assume Alice sends a message "lunch?", after which Bob responds "yes" and Zoe responds "no". Causal consistency allows an observer to see either
1. `"lunch?"` → `"yes"` → `"no"`
2. `"lunch?"` → `"no"` → `"yes"`
but never `"yes"` or `"no"` before `"lunch?"`, since each reply is causally related to the original message
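The allowed orders can be checked mechanically: a delivery order is causally consistent if every message's dependencies have been delivered before it. A minimal sketch, with the dependency sets assumed from the example:

```python
# Sketch: each message carries the set of messages it causally depends on.
msgs = {
    "lunch?": set(),        # no dependencies
    "yes":    {"lunch?"},   # Bob saw "lunch?" before replying
    "no":     {"lunch?"},   # Zoe saw "lunch?" before replying
}

def causally_valid(order):
    """True if every message is delivered after all of its dependencies."""
    seen = set()
    for m in order:
        if not msgs[m] <= seen:  # some dependency not yet delivered
            return False
        seen.add(m)
    return True

assert causally_valid(["lunch?", "yes", "no"])
assert causally_valid(["lunch?", "no", "yes"])
assert not causally_valid(["yes", "lunch?", "no"])  # reply before question
```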
### Sequential consistency
- All operations on processes appear to take place in *some* total order, and that order is consistent with the order of operations on each individual process
### Linearizability
- Strengthens Sequential Consistency such that operations appear to take place atomically in a single total order that is consistent with *real time ordering* of those operations
- If operation `A` completes before operation `B` begins, then operation `B` should logically take effect on the state after operation `A`
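This real-time constraint can be sketched as a predicate over a candidate total order. An illustrative check on invocation/completion times only, not a full linearizability checker (which would also validate object semantics):

```python
# Sketch: a total order respects real time if, whenever op A completes
# before op B is invoked, A precedes B in the order.

ops = {  # op name -> (invocation time, completion time)
    "A": (0, 2),
    "B": (3, 5),
    "C": (1, 4),  # concurrent with both A and B
}

def respects_real_time(total_order):
    pos = {op: i for i, op in enumerate(total_order)}
    return all(
        pos[a] < pos[b]
        for a in ops for b in ops
        if ops[a][1] < ops[b][0]  # a completed before b was invoked
    )

assert respects_real_time(["A", "C", "B"])
assert not respects_real_time(["B", "A", "C"])  # B ordered before A
```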
---
### Some notes on "**ANSI SQL Isolation Levels**"
*A Critique of ANSI SQL Isolation Levels* - [https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/tr-95-51.pdf](https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/tr-95-51.pdf)
"Isolation" is defined in terms of three phenomena -
1. P1 (Dirty Read) - `tx1` applies a write, then `tx2` reads the modified data before `tx1` commits or aborts. If `tx1` ends up aborting, `tx2` has read a value that is no longer there and thus never really existed
2. P2 (Fuzzy Read or Non-repeatable Read) - `tx1` performs a read, then `tx2` writes or deletes the record `tx1` read and commits. If `tx1` attempts to re-read the value, it finds it modified or gone, not equal to the previous read
3. P3 (Phantom) - `tx1` performs a read satisfying a `<search condition>`, then `tx2` applies a write that also satisfies the same `<search condition>` and commits. If `tx1` performs the same read of `<search condition>` again, it finds a record that was not there in the previous read
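P1 can be illustrated with a toy in-memory sketch (no real database; `committed` and `tx1_local` are hypothetical names standing in for the committed store and a transaction's uncommitted write buffer):

```python
# Toy illustration of P1 (dirty read): tx2 observes tx1's
# uncommitted write, which disappears when tx1 aborts.

committed = {"x": 1}          # the committed database state
tx1_local = dict(committed)   # tx1's private write buffer

tx1_local["x"] = 2            # tx1 writes x = 2 (uncommitted)

# Under Read Uncommitted, tx2 may read tx1's uncommitted value:
dirty_value = tx1_local["x"]
assert dirty_value == 2

tx1_local = None              # tx1 aborts; its buffer is discarded
assert committed["x"] == 1    # tx2 read a value that never really existed
```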