# PACELC > CAP

:::info
This note provides a summary of [PACELC](https://www.cs.umd.edu/~abadi/papers/abadi-pacelc.pdf), a framework for understanding the tradeoff between consistency and latency in distributed systems.
:::

***Table of Contents:***

[ToC]

## What is the CAP Theorem?

<center>
<img src="https://hackmd.io/_uploads/SkJ8sjAfA.png" width="450" height="350">
</center>

In case you're unfamiliar, the [CAP theorem](https://en.wikipedia.org/wiki/CAP_theorem) states that any distributed data store can provide only two of the following three guarantees:

**Consistency** -
> "Consistency, informally, simply means that each server returns the right response to each request, i.e., a response that is correct according to the desired service specification. (There may, of course, be multiple possible correct responses.) The meaning of consistency depends on the service." [[1]](https://groups.csail.mit.edu/tds/papers/Gilbert/Brewer2.pdf)

**Availability** -
> "Availability simply means that each request eventually receives a response. Obviously, a fast response is better than a slow response, but for the purpose of CAP, it turns out that even requiring an eventual response is sufficient to create problems. (In most real systems, of course, a response that is sufficiently late is just as bad as a response that never occurs.)" [[1]](https://groups.csail.mit.edu/tds/papers/Gilbert/Brewer2.pdf)

**Partition tolerance** -
> "The third requirement of the CAP theorem is that the service be partition tolerant. Unlike the other two requirements, this property can be seen as a statement regarding the underlying system: communication among the servers is not reliable, and the servers may be partitioned into multiple groups that cannot communicate with each other. For our purposes, we simply treat communication as faulty: messages may be delayed and, sometimes, lost forever. (Again, it is worth pointing out that a message that is delayed for sufficiently long may as well be considered lost, at least in the context of practical systems.)" [[1]](https://groups.csail.mit.edu/tds/papers/Gilbert/Brewer2.pdf)

:::success
:bulb: **The CAP Theorem**: In a network subject to communication failures, it is impossible for any web service to implement an atomic read/write shared memory that guarantees a response to every request. [[1]](https://groups.csail.mit.edu/tds/papers/Gilbert/Brewer2.pdf)
:::

**Proof sketch**: Consider an execution in which the servers are partitioned into two disjoint sets: {`p1`} and {`p2`, ..., `pn`}. Some client sends a read request to server `p2`. Since `p1` is in a different component of the partition from `p2`, every message from `p1` to `p2` is lost. Thus, it is impossible for `p2` to distinguish the following two cases:

- There has been a previous write of value `v1` requested of `p1`, and `p1` has sent an ok response.
- There has been a previous write of value `v2` requested of `p1`, and `p1` has sent an ok response.

No matter how long `p2` waits, it cannot distinguish these two cases, and hence it cannot determine whether to return response `v1` or response `v2`. It has the choice to either eventually return a response (and risk returning the wrong response) or to never return a response. [[1]](https://groups.csail.mit.edu/tds/papers/Gilbert/Brewer2.pdf)
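To make the indistinguishability argument concrete, here is a minimal Python sketch of the two executions (the data structures and values are invented for this illustration, not taken from the paper). Because `p2` observes exactly the same thing in both executions (no messages at all), any deterministic read handler returns the same answer in both, and that answer is necessarily wrong in at least one of them.

```python
# Toy model of the proof sketch: p2 is cut off from p1 by the partition,
# so its view of the world (the messages it received) is empty either way.

def p2_read_handler(messages_received: list) -> str:
    """Any deterministic strategy p2 could follow, based only on what it saw.
    Here it falls back to guessing 'v1' when it has seen nothing."""
    return messages_received[-1] if messages_received else "v1"

# Two executions that differ only on p1's side of the partition.
executions = {
    "A": {"value_acked_by_p1": "v1", "messages_delivered_to_p2": []},
    "B": {"value_acked_by_p1": "v2", "messages_delivered_to_p2": []},
}

for name, execution in executions.items():
    response = p2_read_handler(execution["messages_delivered_to_p2"])
    correct = response == execution["value_acked_by_p1"]
    print(f"execution {name}: p2 returns {response!r}, correct={correct}")

# p2's input is identical in both executions, so its response is identical too:
# correct in one execution, wrong in the other. The only way to avoid ever
# answering incorrectly is to never answer, i.e., to give up availability.
```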
## Why is the CAP Theorem Misleading?

The CAP theorem tells us that we can't pick all three of Consistency, Availability, and Partition Tolerance. Therefore, only CA systems (consistent and highly available, but not partition-tolerant), CP systems (consistent and partition-tolerant, but not highly available), and AP systems (highly available and partition-tolerant, but not consistent) are possible.

But it is not partition tolerance alone that necessitates a tradeoff between consistency and availability; rather, it is the combination of:

- partition tolerance, and
- the occurrence of an actual network partition.

The theorem simply states that a network partition forces the system to choose between reducing availability and reducing consistency. In reality, how often network partitions occur depends heavily on the details of your system's implementation. Is it using highly redundant machines? Are the machines widely dispersed? Is it a local cluster? Designing your system by choosing two out of three CAP properties is misleading, because a typical distributed system will ideally not experience partitions often. It is therefore better to think through which properties your system should have in the event of a partition, and which it should have during normal operation with no partition.

## Data Replication or "Do You Want Availability?"

Every component in a system is prone to eventual failure; whether the cause is hardware or software does not matter, as cosmic rays will treat your system the same. Thus, if we want our system to be highly available, we need data replication so that the system can keep responding to requests even when multiple components go offline. The mere risk of failure dictates the need for ongoing data replication to meet availability requirements.

Introducing data replication in a distributed system immediately creates a tradeoff between consistency and latency. To understand why, we can think through all of the ways data can be updated: synchronized across all replicas simultaneously, sent to a designated master node, or directed first to a randomly chosen node. Each method has its own implications for balancing consistency with latency.

### Three Types of Data Replication

**Case 1: Data updates sent to all replicas at the same time:**

- If updates do not first pass through a preprocessing layer or some other agreement protocol, replica divergence—a clear lack of consistency—could ensue (assuming multiple updates are submitted concurrently, for example, from different clients), as each replica might choose a different order in which to apply the updates (see the sketch after this list).
- On the other hand, if updates first pass through a preprocessing layer, or all nodes involved in the write use an agreement protocol to decide on the order of operations, then it is possible to ensure that all replicas agree on the order in which to process the updates. However, this introduces several sources of increased latency; in the case of the agreement protocol, the protocol itself is the additional source of latency.
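As a minimal illustration of the divergence described in the first bullet, here is a toy Python sketch (the replica and update names are invented for this example): two replicas receive the same pair of concurrent, non-commutative updates, apply them in the order each happened to receive them, and end up in different states.

```python
# Toy illustration of replica divergence: two concurrent "set" updates arrive
# at each replica in a different order, and "set" is not commutative.

concurrent_updates = [
    {"client": "a", "op": "set", "key": "x", "value": 1},
    {"client": "b", "op": "set", "key": "x", "value": 2},
]

def apply_in_arrival_order(updates):
    """Apply updates in whatever order this replica happened to receive them."""
    state = {}
    for update in updates:
        state[update["key"]] = update["value"]
    return state

# Without a preprocessing layer or agreement protocol, nothing forces the
# replicas to agree on a single order for concurrent updates.
replica_1 = apply_in_arrival_order(concurrent_updates)            # client a's update first
replica_2 = apply_in_arrival_order(reversed(concurrent_updates))  # client b's update first

print(replica_1, replica_2)                  # {'x': 2} {'x': 1}
print("replicas diverged:", replica_1 != replica_2)
```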
**Case 2: Data updates sent to an agreed-upon location first:**

1. **Synchronous** - the master node waits until updates have made it to the replicas.
    - *Latency is limited by the slowest entity.*
2. **Asynchronous** - the system treats the update as replicated as soon as the master applies it, and hopes that the update will propagate to the other replicas.
    - *The consistency/latency tradeoff here depends on how the system deals with reads:*
        - If the system routes all reads to the master node and serves them from there, then there is no reduction in consistency. However, there are two latency problems with this approach:
            - It is likely that for some nodes on the network the master node is far away, ***increasing latency***.
            - If the master node is overloaded with other requests, or has failed, there is no option but to wait for it, ***increasing latency***.
        - If reads can be served from anywhere, read latency is better, but writes may still have to travel to a distant master. Serving reads from anywhere can also result in inconsistent reads of the same data item, as different locations hold different versions of it while the system is still propagating updates, and a read could be sent to any of these locations.
3. **A combination of (1) and (2)** - the system sends updates to some subset of replicas synchronously, and to the rest asynchronously. The consistency/latency tradeoff is again determined by how the system deals with reads (see the quorum sketch after this list):
    - If it routes reads to at least one node that has been synchronously updated—for example, when R + W > N in a quorum protocol, where R is the number of nodes involved in a synchronous read, W is the number of nodes involved in a synchronous write, and N is the number of replicas—then consistency can be preserved. The latency problems above are all still present, though to somewhat lesser degrees, as the number of nodes involved in the synchronization is smaller, and more than one node can potentially serve read requests.
    - If the system can serve reads from nodes that have not been synchronously updated, for example, when R + W ≤ N, then inconsistent reads are possible, as above.
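To make the R + W > N condition concrete, here is a toy Python sketch (the replica count, version numbers, and quorum sizes are chosen just for this illustration): a write is applied synchronously to W of the N replicas, a read queries some set of R replicas and returns the highest-versioned value it sees. When R + W > N, every read set overlaps the write set, so the read observes the latest write; when R + W ≤ N, a read can miss it.

```python
import itertools

N = 3                              # total replicas
replicas = [(0, None)] * N         # each replica holds a (version, value) pair

def write(replicas, w, version, value):
    """Synchronously update w of the replicas; the rest lag behind."""
    return [(version, value) if i < w else replicas[i] for i in range(len(replicas))]

def read(replicas, read_set):
    """Query the replicas in read_set and return the freshest (version, value) seen."""
    return max(replicas[i] for i in read_set)

def always_sees_latest(w, r):
    """Check every possible read set of size r against a single synchronous write."""
    updated = write(replicas, w, version=1, value="new")
    return all(read(updated, rs) == (1, "new")
               for rs in itertools.combinations(range(N), r))

print("W=2, R=2 (R+W>N): ", always_sees_latest(w=2, r=2))   # True: every read set overlaps the write set
print("W=1, R=1 (R+W<=N):", always_sees_latest(w=1, r=1))   # False: a read can land on a stale replica
```

The same counting argument is what real quorum systems rely on; the larger W and R are, the more nodes each operation has to wait for, which is exactly the latency cost described above.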
**Case 3: Data updates sent to an arbitrary location first**

The system performs updates at that location, and then propagates them to the other replicas. The difference between this case and Case 2 is that the location the system sends updates to for a particular data item is not always the same. For example, two different updates for a particular data item can be initiated at two different locations simultaneously. The consistency/latency tradeoff again depends on two options:

- If replication is synchronous, then the latency problems above are present. Additionally, the system can incur extra latency to detect and resolve cases of simultaneous updates to the same data item initiated at two different locations.
- If replication is asynchronous, then consistency problems are present.

:::success
:bulb: Data replication is essential for a high-availability system in the event of a failure of one or more components. This necessity introduces a tradeoff between consistency and latency, influenced by the chosen method of data update: synchronized across all replicas, through a master node, or to a random node. Each approach has distinct implications for the system's performance, balancing the two factors differently.
:::

## PACELC

![Screenshot 2024-05-12 at 5.47.36 PM](https://hackmd.io/_uploads/SyH-Y2CGC.png)

A more complete portrayal of the space of potential consistency tradeoffs for distributed systems can be achieved by rewriting CAP as PACELC (pronounced “pass-elk”):

:::success
**:star: PACELC Framework:** If there is a partition (P), how does the system trade off availability and consistency (A and C); else (E), when the system is running normally in the absence of partitions, how does the system trade off latency (L) and consistency (C)?
:::

Note that the latency/consistency tradeoff (ELC) only applies to systems that replicate data. Otherwise, the system suffers from availability issues upon any type of failure or overloaded node. Because such issues are just instances of extreme latency, the latency part of the ELC tradeoff can incorporate the choice of whether or not to replicate data.

Popular databases like Dynamo, Cassandra, and Riak operate as PA/EL systems by default: if a partition occurs, they give up consistency for availability, and under normal operation they give up consistency for lower latency. Giving up both Cs in PACELC makes the design simpler; once a system is configured to handle inconsistencies, it makes sense to give up consistency for both availability and lower latency.
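As a rough sketch of what the EL choice looks like in normal operation (a toy in-process model, not the API of any of these systems), the store below can answer a read either from the nearest replica immediately, tolerating staleness for lower latency, or after consulting a read quorum, paying the latency of the slowest replica contacted in exchange for seeing the latest acknowledged write.

```python
import time

# Toy replicated store: each replica holds a (version, value) pair and has a
# simulated network round-trip time in seconds. Two of the three replicas have
# already acknowledged the latest write (version 2).
replicas = [
    {"rtt": 0.001, "data": (1, "old")},   # nearest replica, not yet updated
    {"rtt": 0.040, "data": (2, "new")},
    {"rtt": 0.080, "data": (2, "new")},
]

def read_low_latency(replicas):
    """EL choice: answer from the nearest replica, tolerating staleness."""
    nearest = min(replicas, key=lambda r: r["rtt"])
    time.sleep(nearest["rtt"])                     # simulate one fast round trip
    return nearest["data"]

def read_consistent(replicas, quorum=2):
    """EC choice: contact a read quorum and return the freshest version seen.
    Latency is bounded by the slowest replica in the quorum."""
    contacted = sorted(replicas, key=lambda r: r["rtt"])[:quorum]
    time.sleep(max(r["rtt"] for r in contacted))   # simulate waiting for the quorum
    return max(r["data"] for r in contacted)

print(read_low_latency(replicas))   # (1, 'old')  -- fast but stale
print(read_consistent(replicas))    # (2, 'new')  -- consistent but slower
```

Exposing this kind of per-request choice is roughly what the "user-adjustable settings" mentioned in the examples below refer to.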
## Examples

Below is a list of examples from the original paper outlining which parts of the CAP + PACELC tradeoff space various distributed database systems inhabit:

| System | CAP Classification | PACELC Classification | Use Case | Justification |
|--------|--------------------|-----------------------|----------|---------------|
| Dynamo, Cassandra, and Riak | AP | PA/EL | E-commerce, social media, big data | Prioritizes transaction completion over consistency; user settings can alter the trade-off. |
| PNUTS | CP/CA | PC/EL | Global data distribution for low-latency access | Gives up consistency for latency in normal operation; gives up availability rather than further consistency during partitions. |
| MongoDB | CP/AP | PA/EC | Wide range of applications requiring fast development and schema flexibility | Balances availability and consistency; adapts to conditions by electing a new master during partitions. |
| VoltDB/H-Store | CP | PC/EC | Financial services, telecommunications, real-time analytics | Focuses on strong consistency and high performance, sacrificing availability and latency if necessary. |

- **Dynamo, Cassandra, and Riak**:
    - **Use Case**: E-commerce, social media, and big data, all domains where the ability to complete transactions without interruption is more critical than absolute consistency across all nodes at all times.
    - **CAP Classification**: All three fall into the AP category—highly available and partition-tolerant.
    - **PACELC Classification**: **PA/EL** - if a partition occurs, they give up consistency for availability, and under normal operation they give up consistency for lower latency (the default discussed above). However, these systems offer user-adjustable settings to alter the ELC tradeoff.
- **PNUTS**:
    - **Use Case**: Targets applications needing global distribution of data to ensure low-latency access across different regions.
    - **CAP Classification**: Balances consistency and latency, often leaning towards CP or CA depending on the operational mode.
    - **PACELC Classification**: **PC/EL** - during normal operation, PNUTS gives up some consistency in exchange for lower latency. When a partition occurs, it sacrifices availability rather than reducing consistency any further. This makes PNUTS somewhat unique in that it appears to become *more* consistent during a network partition, unlike many systems that reduce consistency to maintain availability.
- **MongoDB**:
    - **Use Case**: Popular in a wide range of applications, from small apps to major enterprise solutions requiring fast development cycles and flexible schema evolution.
    - **CAP Classification**: Can operate as either CP or AP depending on configuration, with a default leaning towards high availability.
    - **PACELC Classification**: **PA/EC** - in normal operation, MongoDB aims to ensure reads and writes are consistent. If a partition occurs, MongoDB maintains availability by electing a new master node, although this can cause temporary inconsistencies until the partition is resolved and the data is reconciled. In other words, MongoDB emphasizes availability over consistency during partitions.
- **VoltDB/H-Store**:
    - **Design Focus**: VoltDB and the underlying H-Store architecture are optimized for in-memory transaction processing, offering extremely high performance and strong consistency.
    - **Use Case**: Ideal for financial services, telecommunications, and any real-time analytics that require immediate consistency.
    - **CAP Classification**: These systems are generally considered CP, focusing on consistency and partition tolerance.
    - **PACELC Classification**: **PC/EC** - they refuse to give up consistency, and will pay the availability and latency costs to achieve it. BigTable and related systems such as HBase are also PC/EC.

## References

[1] : https://groups.csail.mit.edu/tds/papers/Gilbert/Brewer2.pdf