# A. Architecture Basics
## Scalability
The ability of a system to handle ever-increasing demand.
### Vertical vs Horizontal
#### Vertical scaling
Vertical scaling is achieved by adding resources, such as CPU or memory, to an existing machine. By doing so, the machine can service additional customers or perform compute tasks more quickly.

##### Disadvantages
Eventually, maximum machine size will constrain your ability to scale - either technically or from a cost perspective.
#### Horizontal scaling
Horizontal scaling is achieved by adding machines to a pool of resources, each of which provides the same service. Horizontal scaling avoids the size limitations of vertical scaling and can scale to nearly unlimited levels, but it requires application support to scale effectively.

##### Disadvantages
* Scaling horizontally introduces complexity and involves cloning servers
* Servers should be stateless: they should not contain any user-related data like sessions or profile pictures
* Session state can instead be moved to a centralized data store such as a database (SQL, NoSQL) or a persistent cache (Redis, Memcached); see the sketch after this list
* Downstream servers such as caches and databases need to handle more simultaneous connections as upstream servers scale out
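
As a concrete illustration of externalizing session state, here is a minimal Python sketch using the `redis` client. The host, key scheme, and TTL are assumptions for illustration, not a prescribed setup.

```python
import json
import redis  # pip install redis

# Assumption: a shared Redis instance reachable at localhost:6379.
store = redis.Redis(host="localhost", port=6379, decode_responses=True)

SESSION_TTL_SECONDS = 30 * 60  # hypothetical 30-minute session lifetime

def save_session(session_id: str, data: dict) -> None:
    # Any web server in the pool can write the session...
    store.setex(f"session:{session_id}", SESSION_TTL_SECONDS, json.dumps(data))

def load_session(session_id: str) -> dict | None:
    # ...and any other server can read it back, keeping the servers stateless.
    raw = store.get(f"session:{session_id}")
    return json.loads(raw) if raw else None
```

Because no session lives on an individual web server, any server in the pool can handle any request, which is what makes cloning servers behind a load balancer practical.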
### Performance vs scalability
If you have a **performance problem**, your system is **slow for a single user**.
If you have a **scalability problem**, your system is fast for a single user but **slow under heavy load**.
### Latency vs throughput
**Latency** is the time to perform some action or to produce some result.
**Throughput** is the number of such actions or results per unit of time.
Generally, you should aim for maximal throughput with acceptable latency.
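
To make the two terms concrete, here is a minimal Python sketch that times a placeholder operation and derives both numbers; the operation and iteration count are purely illustrative.

```python
import time

def measure(operation, n: int = 1000) -> tuple[float, float]:
    """Run `operation` n times; return (average latency, throughput)."""
    start = time.perf_counter()
    for _ in range(n):
        operation()
    elapsed = time.perf_counter() - start
    latency = elapsed / n      # seconds per action
    throughput = n / elapsed   # actions per second
    return latency, throughput

lat, tput = measure(lambda: sum(range(100)))
print(f"latency: {lat * 1e6:.1f} us, throughput: {tput:,.0f} ops/s")
```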
## Availability vs consistency
### CAP theorem
The consistency, availability, and partition tolerance theorem (CAP theorem) states that a particular storage design can fully achieve only two of the three properties.

**Consistency** - Every read receives the most recent write or an error
**Availability** - Every request receives a response, without guarantee that it contains the most recent version of the information
**Partition Tolerance** - The system continues to operate despite arbitrary partitioning due to network failures
Networks aren't reliable, so in modern distributed systems partition tolerance is not optional; it is a necessity. That leaves a software tradeoff between consistency and availability.
**CP** - consistency and partition tolerance
Waiting for a response from the partitioned node might result in a timeout error. CP is a good choice if your business needs require atomic reads and writes.
**AP** - availability and partition tolerance
Responses return the most recent version of the data available on a node, which might not be the latest. Writes might take some time to propagate when the partition is resolved.
AP is a good choice if the business needs allow for eventual consistency or when the system needs to continue working despite external errors.
<details>
<summary><b>You cannot choose AC</b></summary>
> You cannot, however, choose both consistency and availability in a distributed system.

As a thought experiment, imagine a distributed system which keeps track of a single piece of data using three nodes (**A**, **B**, and **C**) and which claims to be both consistent and available in the face of network partitions. Misfortune strikes, and that system is partitioned into two components: **{A,B}** and **{C}**. In this state, a write request arrives at node C to update the single piece of data.
That node only has two options:
* Accept the write, knowing that neither A nor B will know about this new data until the partition heals.
* Refuse the write, knowing that the client might not be able to contact A or B until the partition heals.
You either choose availability (Door #1) or you choose consistency (Door #2). You cannot choose both.
To claim both is to claim either that the system operates on a single node (and is therefore not distributed) or that an update applied to a node in one component of a network partition will somehow also be applied to a node in the other partition component.
This is, as you might imagine, rarely true.
</details>
## Consistency patterns
With multiple copies of the same data, we are faced with options on how to synchronize them so clients have a consistent view of the data. Recall the definition of consistency from the CAP theorem - Every read receives the most recent write or an error.
### Weak consistency
After a write, reads may or may not see it. A best effort approach is taken.
This approach is seen in systems such as memcached. Weak consistency works well in real-time use cases such as VoIP, video chat, and real-time multiplayer games. For example, if you are on a phone call and lose reception for a few seconds, you do not hear what was spoken during the connection loss once you reconnect.
### Eventual consistency
After a write, reads will eventually see it (typically within milliseconds). Data is replicated asynchronously.
This approach is seen in systems such as DNS and email. Eventual consistency works well in highly available systems.
### Strong consistency
After a write, reads will see it. Data is replicated synchronously.
This approach is seen in file systems and RDBMSes. Strong consistency works well in systems that need transactions.
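
To contrast the last two patterns, here is a schematic Python sketch; the in-memory dictionary replicas and the sleep-based lag are stand-ins for real nodes and network delay, purely for illustration.

```python
import threading
import time

replicas = [{}, {}, {}]  # stand-ins for three replica nodes

def write_strong(key: str, value: str) -> str:
    # Synchronous replication: apply the write to every replica before
    # acknowledging, so any subsequent read sees it.
    for replica in replicas:
        replica[key] = value
    return "ack"

def write_eventual(key: str, value: str) -> str:
    # Asynchronous replication: acknowledge after the first copy; the other
    # replicas catch up in the background, so reads may briefly see stale data.
    replicas[0][key] = value

    def propagate() -> None:
        time.sleep(0.05)  # simulated replication lag
        for replica in replicas[1:]:
            replica[key] = value

    threading.Thread(target=propagate, daemon=True).start()
    return "ack"
```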
## Availability patterns
There are two main patterns to support high availability: fail-over and replication.
### Fail-over
#### Active-passive
With active-passive fail-over, heartbeats are sent between the active and the passive server on standby. If the heartbeat is interrupted, the passive server takes over the active's IP address and resumes service.
The length of downtime is determined by whether the passive server is already running in 'hot' standby or whether it needs to start up from 'cold' standby. Only the active server handles traffic.
Active-passive failover can also be referred to as master-slave failover.
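
A heartbeat check on the passive side can be as simple as the following sketch; the health endpoint, interval, failure threshold, and takeover step are all illustrative assumptions.

```python
import time
import urllib.request

ACTIVE_HEALTH_URL = "http://active.internal/health"  # hypothetical endpoint
CHECK_INTERVAL_SECONDS = 2
MAX_MISSED_HEARTBEATS = 3

def active_is_alive() -> bool:
    try:
        with urllib.request.urlopen(ACTIVE_HEALTH_URL, timeout=1) as resp:
            return resp.status == 200
    except OSError:
        return False

def take_over() -> None:
    # Placeholder: in practice this would claim the active server's IP
    # (e.g. via a virtual IP / VRRP) and start serving traffic.
    print("passive server promoting itself to active")

def monitor() -> None:
    missed = 0
    while True:
        missed = 0 if active_is_alive() else missed + 1
        if missed >= MAX_MISSED_HEARTBEATS:
            take_over()
            return
        time.sleep(CHECK_INTERVAL_SECONDS)
```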
#### Active-active
In active-active, both servers are managing traffic, spreading the load between them.
If the servers are public-facing, the DNS would need to know about the public IPs of both servers. If the servers are internal-facing, application logic would need to know about both servers.
Active-active failover can also be referred to as master-master failover.
#### Disadvantage(s): failover
* Fail-over adds more hardware and additional complexity.
* There is a potential for loss of data if the active system fails before any newly written data can be replicated to the passive.
### Replication
The two main replication patterns are master-slave replication and master-master replication.
#### Availability in numbers
Availability is often quantified by uptime (or downtime) as a percentage of time the service is available. Availability is generally measured in the number of 9s: a service with 99.99% availability is described as having four 9s.
##### 99.9% availability - three 9s

| Duration | Acceptable downtime |
|--|--|
| Per year | 8h 45min 57s |
| Per month | 43m 49.7s |
| Per week | 10m 4.8s |
| Per day | 1m 26.4s |
##### 99.99% availability - four 9s

| Duration | Acceptable downtime |
|--|--|
| Per year | 52min 35.7s |
| Per month | 4m 23s |
| Per week | 1m 5s |
| Per day | 8.6s |
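
The downtime figures above follow directly from the availability percentage; a minimal sketch of the arithmetic (using a 365.25-day year):

```python
def acceptable_downtime(availability: float) -> dict[str, float]:
    """Seconds of allowed downtime per period for a given availability."""
    seconds_per_year = 365.25 * 24 * 3600
    periods = {
        "year": seconds_per_year,
        "month": seconds_per_year / 12,
        "week": 7 * 24 * 3600,
        "day": 24 * 3600,
    }
    return {name: (1 - availability) * secs for name, secs in periods.items()}

print(acceptable_downtime(0.999))   # three 9s: ~31557.6 s/year = 8h 45min 57.6s
print(acceptable_downtime(0.9999))  # four 9s:  ~3155.8 s/year  = 52min 35.8s
```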
#### Availability in parallel vs in sequence
If a service consists of multiple components prone to failure, the service's overall availability depends on whether the components are in sequence or in parallel.
##### In sequence
Overall availability decreases when two components with availability < 100% are in sequence:
`Availability (Total) = Availability (Foo) * Availability (Bar)`
If both Foo and Bar each had 99.9% availability, their total availability in sequence would be 99.8%.
##### In parallel
Overall availability increases when two components with availability < 100% are in parallel:
`Availability (Total) = 1 - (1 - Availability (Foo)) * (1 - Availability (Bar))`
If both Foo and Bar each had 99.9% availability, their total availability in parallel would be 99.9999%.
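
Both formulas are easy to check numerically; a minimal sketch:

```python
def availability_in_sequence(*components: float) -> float:
    # Every component must be up, so multiply the availabilities.
    total = 1.0
    for a in components:
        total *= a
    return total

def availability_in_parallel(*components: float) -> float:
    # The service fails only if every component fails, so multiply
    # the failure probabilities and subtract from 1.
    failure = 1.0
    for a in components:
        failure *= 1 - a
    return 1 - failure

print(availability_in_sequence(0.999, 0.999))  # 0.998001 -> ~99.8%
print(availability_in_parallel(0.999, 0.999))  # 0.999999 -> 99.9999%
```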