CAP in distributed systems

# CAP for distributed database nodes (and, indirectly, the applications that rely on those databases)

### C: every read gets the same, newest response

* Every read receives the most recent write or fails (strong consistency).
* In a distributed system, once data is written to one node, all subsequent reads (from any node) must return the most up-to-date value. Replication is synchronous, so a read request may incur some latency while the node obtains the up-to-date data.

Example: If you write X=100 on Node A, a subsequent read from Node B must return X=100, not an older version.

### A: get a response even if some nodes fail

* The system does not refuse requests as long as there is a reachable node, even if that means returning stale (outdated) data.

Example: If a database cluster has 3 nodes and 1 node crashes, the other 2 nodes must still respond to requests.

### P: the system continues to operate despite network failures (partitions)

* Even if network failures prevent some nodes from communicating with each other, the system must still function.
* A network partition means that some nodes cannot talk to each other at all (not just slowly, but completely disconnected).

Example: If a network split happens (e.g., nodes in Europe and the US lose their connection), both sides must continue operating independently.

### CP

1. Write: rejected when some nodes are unavailable.
2. Read: because new writes are rejected, all nodes (even isolated ones) return the same consistent response.

In case of a network partition (node isolation), we want to ensure that all nodes return the same data (strong consistency). If a node cannot communicate with the others, it rejects requests instead of serving stale or inconsistent data. By doing this, we give up Availability (A) to ensure Consistency (C), meaning we refuse to serve requests even though some nodes are still alive.

* A system following CP stops responding rather than risk returning inconsistent data.
* The system guarantees that all nodes have the same data, even if some are temporarily unavailable.
* With consistency and partition tolerance, a node that does not have the newest version returns an error or times out. Example, a bank system: if a network partition causes inconsistency, the system returns an error until the inconsistency is resolved.

Example: MySQL Group Replication (with strong consistency enabled): if a network partition happens, members that cannot reach a majority of the group block writes until they rejoin and synchronize. Reads return the latest committed data, ensuring consistency.

### AP

1. Write: accepted even when some nodes are unavailable.
2. Read: may return different responses, since some nodes may return stale data due to replication lag.

In case of a network partition (node isolation), we prioritize serving responses (Availability). Since the nodes cannot communicate, different nodes may return different (potentially stale) versions of the data. By doing this, we sacrifice Consistency (C) for Availability (A), meaning the responses from different nodes may differ due to replication lag or eventual consistency.

* The system always responds, even if it returns outdated data.
* The inconsistency happens because some nodes may hold old versions of the data.
* Consistency is sacrificed: even under network latency, the non-failed nodes still return whatever response is available, so the system keeps accepting reads even though the data may be stale.

Example: Cassandra / DynamoDB: if a network partition occurs, writes are accepted and reads return eventually consistent data. Some nodes might lag behind, leading to temporary inconsistencies.
* For writes, take a three-node cluster in which n3 is partitioned away: n1 and n2 keep accepting writes, and the data is synced to n3 once the network partition is resolved. Until then, writes cannot be replicated to n3, so reads from n3 may be stale.

### AC

An AC system (usually written CA) does not tolerate partitions; this mostly applies to databases running in standalone mode. While there is no latency, every non-failed node can return the newest version. If latency occurs, the request fails with an error or a timeout. Because network latency cannot be avoided, no distributed system is AC in practice.

### Common case

In practice, network latency cannot be controlled, so we choose between CP and AP.

* CP (Consistency / Partition Tolerance): Wait for a response from the partitioned node, which could result in a timeout error. The system can also choose to return an error, depending on the scenario you want. Choose Consistency over Availability when your business requirements dictate atomic reads and writes. (Non-failed nodes return a timeout or an error.)
* AP (Availability / Partition Tolerance): Return the most recent version of the data you have, which could be stale. In this state the system also accepts writes, which are processed later when the partition is resolved. Choose Availability over Consistency when your business requirements allow some flexibility around when the data in the system synchronizes. Availability is also a compelling option when the system needs to keep functioning in spite of external errors (shopping carts, etc.). (The system tolerates the partition, but a response may be stale.)
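
To make the CP versus AP trade-off concrete, here is a minimal in-memory sketch of a three-node cluster. Everything in it is an assumption made for illustration: the `Cluster`, `Node`, and `Unavailable` names, the single global version counter, and the last-write-wins sync in `heal` are hypothetical and do not describe how MySQL, Cassandra, or DynamoDB work internally. It only mirrors the rules in these notes: CP rejects writes during a partition and refuses reads on an isolated node, while AP keeps answering and may serve stale data.

```python
class Unavailable(Exception):
    """Raised when a CP cluster refuses a request during a partition."""


class Node:
    def __init__(self, name):
        self.name = name
        self.data = {}          # key -> (version, value)
        self.isolated = False   # True while cut off from the other nodes


class Cluster:
    """Toy cluster; mode is "CP" or "AP" (hypothetical model, not a real DB)."""

    def __init__(self, mode, names=("n1", "n2", "n3")):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.nodes = {name: Node(name) for name in names}
        self.version = 0        # global counter, used for last-write-wins

    def partition(self, name):
        """Cut one node off from the rest of the cluster."""
        self.nodes[name].isolated = True

    def _partitioned(self):
        return any(node.isolated for node in self.nodes.values())

    def write(self, via, key, value):
        target = self.nodes[via]
        if self.mode == "CP" and self._partitioned():
            # CP: reject the write rather than let replicas diverge.
            raise Unavailable("write rejected: cluster is partitioned")
        self.version += 1
        # AP (or a healthy cluster): apply the write to every node on the
        # same side of the partition as the node that received the request.
        for node in self.nodes.values():
            if node is target or (not node.isolated and not target.isolated):
                node.data[key] = (self.version, value)

    def read(self, via, key):
        node = self.nodes[via]
        if self.mode == "CP" and node.isolated:
            # CP: an isolated node cannot prove its copy is current.
            raise Unavailable(f"{via} cannot confirm it has the latest data")
        entry = node.data.get(key)
        return entry[1] if entry else None

    def heal(self):
        """End the partition and sync every node to the newest version."""
        newest = {}
        for node in self.nodes.values():
            node.isolated = False
            for key, entry in node.data.items():
                if key not in newest or entry[0] > newest[key][0]:
                    newest[key] = entry
        for node in self.nodes.values():
            node.data.update(newest)


ap = Cluster("AP")
ap.write("n1", "x", 100)
ap.partition("n3")
ap.write("n1", "x", 200)                        # accepted by n1 and n2 only
print(ap.read("n2", "x"), ap.read("n3", "x"))   # 200 100 (n3 is stale)
ap.heal()
print(ap.read("n3", "x"))                       # 200 once the partition is resolved

cp = Cluster("CP")
cp.write("n1", "x", 100)
cp.partition("n3")
try:
    cp.write("n1", "x", 200)
except Unavailable as err:
    print("CP:", err)                           # the write is refused
```

The same workload gives two different outcomes: the AP cluster answers every request and converges after `heal`, while the CP cluster raises `Unavailable` so that no node ever serves a value the others do not have.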