> System Design based on [Roadmap.sh - System Design](https://roadmap.sh/system-design)

## Introduction to System Design (Pt1)

# What is System Design?

System design is the process of defining the components of a system, as well as their interactions and relationships, to meet a set of specified requirements. It involves taking a problem statement, breaking it down into smaller components, and designing each component to work together effectively to achieve the overall goal of the system. It follows the [Divide and Conquer](https://en.wikipedia.org/wiki/Divide-and-conquer_algorithm) technique.

This process typically includes analyzing the current system (if any) and determining any deficiencies, creating a detailed plan for the new system, and testing the design to ensure that it meets the requirements. It is an iterative process that may involve many rounds of design, testing, and refinement.

# How To: System Design?

There are several steps that can be taken when approaching a system design:

- **Understand the problem:** Gather information about the problem you are trying to solve and the requirements of the system. Identify the users and their needs, as well as any constraints or limitations of the system.
- **Identify the scope of the system:** Define the boundaries of the system, including what the system will do and what it will not do.
- **Research and analyze existing systems:** Look at similar systems that have been built in the past and identify what worked well and what didn't. Use this information to inform your design decisions.
- **Create a high-level design:** Outline the main components of the system and how they will interact with each other. This can include a rough diagram of the system's architecture or a flowchart outlining the process the system will follow.
- **Refine the design:** As you work on the details of the design, iterate and refine it until you have a complete and detailed design that meets all the requirements.
- **Document the design:** Create detailed documentation of your design for future reference and maintenance.
- **Continuously monitor and improve the system:** System design is not a one-time process; the system needs to be continuously monitored and improved to meet changing requirements.

# Systems Properties

There are properties that are common to all systems, but when we talk about distributed systems we must consider additional factors in the design.

:::info
:mage: A distributed system is a system whose components are located on different networked computers, which communicate and coordinate their actions by passing messages to one another. The components interact with one another in order to achieve a common goal. - Wikipedia
:::

## Performance vs Scalability

Scalability is the measure of a system's ability to increase or decrease in performance and cost in response to changes in application and system processing demands. Examples include how well a hardware system performs when the number of users is increased, how well a database withstands growing numbers of queries, or how well an operating system performs on different classes of hardware.

A service is scalable if adding resources results in increased performance in a manner proportional to the resources added. Generally, increasing performance means serving more units of work, but it can also mean handling larger units of work, such as when datasets grow. Applications that are growing rapidly should pay special attention to scalability when evaluating hardware and software.

Another way to look at performance vs scalability:

- If you have a performance problem, your system is slow for a single user.
- If you have a scalability problem, your system is fast for a single user but slow under heavy load.

![Performance vs Scalability](https://hackmd.io/_uploads/BJiqwShqh.png)
Figure 1. *Performance vs Scalability*

A system that scales well keeps performance consistent no matter what load the system is dealing with.

![Good Scalability](https://hackmd.io/_uploads/SJw3urh5h.png)
Figure 2. *Scalability is making sure that the performance is constant no matter how many people are coming in.*

## Latency vs Throughput

Latency and throughput are two important measures of a system's performance. Latency refers to the amount of time it takes for a system to respond to a request. Throughput refers to the number of requests that a system can handle at the same time.

### Latency

Latency is a measure of how long it takes for a system to complete a task or process data. It is often measured in milliseconds or microseconds and is an important metric for determining the performance of a system. High latency leads to slow performance, while low latency results in faster and more responsive systems. Latency can be caused by various factors: the network itself (the time it takes for data to travel through the network; more hops mean higher latency), network congestion, inefficient algorithms, load on resources, and so on. Generally, you should aim for maximal throughput with acceptable latency.

### Throughput

Throughput refers to the number of requests that a system can handle at the same time, or the number of units of data that can be processed in a given period of time. Throughput is often measured in requests per second, transactions per second, or bits per second. Throughput can be limited by various factors, such as the capacity of the systems involved, the number of available resources, and the efficiency of the algorithms used to process the data. For example, in a network, throughput can be limited by the available bandwidth or the number of connections that can be made at the same time. In a computer, it can be limited by CPU or memory capacity. High throughput leads to more responsive systems and more efficient use of resources, while low throughput results in slow performance and increased latency.

### Relationship Between Latency and Throughput

There is a trade-off between latency and throughput: increasing throughput often requires sacrificing some of the time it takes for the system to respond to each individual request (latency). Therefore, when designing and evaluating systems, it is important to consider both latency and throughput to find the right balance.
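
To make the two metrics concrete, here is a minimal, self-contained Python sketch that measures both for a toy `handle_request` function; the 5 ms sleep is a stand-in for real work, and all names are illustrative:

```python
import statistics
import time

def handle_request() -> None:
    time.sleep(0.005)  # stand-in for real work (~5 ms per request)

latencies = []
start = time.perf_counter()
for _ in range(200):
    t0 = time.perf_counter()
    handle_request()
    latencies.append(time.perf_counter() - t0)  # per-request latency
elapsed = time.perf_counter() - start

print(f"median latency: {statistics.median(latencies) * 1000:.1f} ms")
print(f"p99 latency:    {sorted(latencies)[int(len(latencies) * 0.99)] * 1000:.1f} ms")
print(f"throughput:     {len(latencies) / elapsed:.0f} requests/s")
```

Note that in this sequential loop throughput is roughly `1 / latency`; real systems raise throughput beyond that limit by handling requests concurrently, which is why the two metrics have to be evaluated together.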

## Availability vs Consistency

Availability refers to the ability of a system to provide its services to clients even in the presence of failures. This is often measured in terms of the percentage of time that the system is up and running, also known as its uptime.

Consistency, on the other hand, refers to the property that all clients see the same data at the same time. This is important for maintaining the integrity of the data stored in the system.

In distributed systems, there is often a trade-off between availability and consistency. Systems that prioritize high availability may sacrifice consistency, while systems that prioritize consistency may sacrifice availability. Different distributed systems use different approaches to balance this trade-off, such as using replication or consensus algorithms.

### CAP Theorem

The CAP Theorem states that, in a distributed system (a collection of interconnected nodes that share data), you can only have two out of the following three guarantees across a write/read pair; one of them must be sacrificed:

- **Consistency** - Every read receives the most recent write or an error
- **Availability** - Every request receives a response, without guarantee that it contains the most recent version of the information
- **Partition Tolerance** - The system continues to operate despite arbitrary partitioning due to network failures

![CAP Theorem](https://hackmd.io/_uploads/ryqKxP392.png)
Figure 3. *CAP Theorem*

#### A distributed computing fallacy: networks are reliable

One of the fallacies of distributed computing states that networks are reliable. They are not. Networks and parts of networks go down frequently and unexpectedly. Network failures happen to your system and you don't get to choose when they occur. Given that networks aren't completely reliable, you must tolerate partitions in a distributed system. Fortunately, though, you get to choose what to do when a partition does occur. According to the CAP theorem, this means we are left with two options: Consistency and Availability.

:::warning
:warning: No distributed system is safe from network failures, thus network partitioning generally has to be tolerated.
:::

##### CP - Consistency/Partition Tolerance

Waiting for a response from the partitioned node might result in a timeout error. CP is a good choice if your business needs require atomic reads and writes.

##### AP - Availability/Partition Tolerance

Responses return the most readily available version of the data on any node, which might not be the latest. Writes might take some time to propagate when the partition is resolved. AP is a good choice if the business needs to allow for eventual consistency or when the system needs to continue working despite external errors.

# System Design Patterns

Design patterns are typical solutions to commonly occurring problems in software design. A pattern describes the problem, the solution, when to apply the solution, and its consequences.

## Consistency Patterns

Consistency patterns refer to the ways in which data is stored and managed in a distributed system, and how that data is made available to users and applications. There are three main types of consistency patterns:

- Strong consistency
- Weak consistency
- Eventual consistency

Each of these patterns has its own advantages and disadvantages, and the choice of which pattern to use will depend on the specific requirements of the application or system.

### Strong Consistency

In a strongly consistent system, any updates to some data are immediately propagated to all locations. This ensures that all locations have the same version of the data, but it also means that the system is not highly available and has high latency.

**Example.** A financial system where users can transfer money between accounts. The system is designed for high data integrity, so the data is stored in a single location and updates to that data are immediately propagated to all other locations. This ensures that all users and applications are working with the same, accurate data.
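
As a rough illustration of what "immediately propagated" means in code, here is a minimal sketch contrasting synchronous (strong) replication with the asynchronous propagation used by the weaker patterns below; the in-memory `replicas` list and all names are illustrative stand-ins for nodes in different locations:

```python
import queue
import threading

replicas = [{}, {}, {}]  # three in-memory copies of the data

def write_strong(key, value):
    """Synchronous replication: acknowledge only after every replica has the value."""
    for replica in replicas:
        replica[key] = value  # in a real system, a blocking network call per replica
    return "ok"

update_log = queue.Queue()

def write_eventual(key, value):
    """Asynchronous replication: acknowledge at once, propagate in the background."""
    replicas[0][key] = value      # the primary applies the write
    update_log.put((key, value))  # followers catch up later
    return "ok"

def replicator():
    while True:
        key, value = update_log.get()
        for replica in replicas[1:]:
            replica[key] = value

threading.Thread(target=replicator, daemon=True).start()

write_strong("balance:alice", 100)  # all replicas agree before we return
write_eventual("balance:bob", 50)   # returns at once; followers catch up shortly
```

With `write_strong`, the caller waits for every replica, trading latency and availability for a single view of the data. With `write_eventual`, a read from a follower can return stale data until the background replicator catches up, which is exactly the window the weaker patterns below accept.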

### Weak Consistency

In a weakly consistent system, updates to the data may not be immediately propagated. This can lead to inconsistencies and conflicts between different versions of the data, but it also allows for high availability and low latency.

**Example.** A gaming platform where users can play online multiplayer games. When a user plays a game, their actions are immediately visible to other players in the same data center, but if there is lag or a temporary connection loss, the actions may not be seen by some of the users and the game will continue. This can lead to inconsistencies between different versions of the game state, but it also allows for a high level of availability and low latency.

### Eventual Consistency

Eventual consistency is a form of weak consistency. After an update is made to the data, it will eventually be visible to any subsequent read operations. The data is replicated asynchronously, ensuring that all copies of the data are eventually updated. This means that the system is highly available and has low latency, but it also means that there may be inconsistencies and conflicts between different versions of the data.

**Example.** A social media platform where users can post updates, comments, and messages. The platform is designed for high availability and low latency, so the data is stored in multiple data centers around the world. When a user posts an update, the update is immediately visible to other users in the same data center, but it may take some time for the update to propagate to other data centers. This means that some users may see the update while others may not, depending on which data center they are connected to. This can lead to inconsistencies between different versions of the data, but it also allows for a high level of availability and low latency.

## Availability Patterns

Availability is measured as a percentage of uptime, and defines the proportion of time that a system is functional and working. Availability is affected by system errors, infrastructure problems, malicious attacks, and system load. Cloud applications typically provide users with a service level agreement (SLA), which means that applications must be designed and implemented to maximize availability.

:::info
:mage: A **service-level agreement** (SLA) is an agreement between a service provider and a customer. Particular aspects of the service - quality, availability, responsibilities - are agreed between the service provider and the service user.
:::

### How is it measured?

Availability is often quantified by uptime (or downtime) as a percentage of time the service is available. Availability is generally measured in number of 9s; a service with 99.99% availability is described as having four 9s.

#### 99.9% Availability - Three 9s

Duration | Acceptable downtime
------------- | -------------
Downtime per year | 8h 45min 57s
Downtime per month | 43m 49.8s
Downtime per week | 10m 4.8s
Downtime per day | 1m 26.4s

#### 99.99% Availability - Four 9s

Duration | Acceptable downtime
------------- | -------------
Downtime per year | 52min 35.8s
Downtime per month | 4m 23s
Downtime per week | 1m 0.5s
Downtime per day | 8.6s
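
These downtime figures are simple arithmetic on the availability percentage; here is a small Python sketch that reproduces them (using a 365.25-day year):

```python
def downtime_per_year(availability_percent: float) -> str:
    """Convert an availability percentage into acceptable downtime per year."""
    seconds = 365.25 * 24 * 3600 * (1 - availability_percent / 100)
    hours, rem = divmod(int(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}h {minutes}m {secs}s"

for nines in (99.9, 99.99, 99.999):
    print(nines, "->", downtime_per_year(nines))
# 99.9   -> 8h 45m 57s   (three 9s)
# 99.99  -> 0h 52m 35s   (four 9s)
# 99.999 -> 0h 5m 15s    (five 9s)
```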

#### Availability in parallel vs in sequence

If a service consists of multiple components prone to failure, the service's overall availability depends on whether the components are in sequence or in parallel.

**In sequence**

Overall availability decreases when two components with availability < 100% are in sequence:

```bash
Availability (Total) = Availability (Foo) * Availability (Bar)
```

If both Foo and Bar each had 99.9% availability, their total availability in sequence would be 99.8%.

**In parallel**

Overall availability increases when two components with availability < 100% are in parallel:

```bash
Availability (Total) = 1 - (1 - Availability (Foo)) * (1 - Availability (Bar))
```

If both Foo and Bar each had 99.9% availability, their total availability in parallel would be 99.9999%.

### How do we achieve High Availability?

The key to coming up with a solid design for a highly available system lies in identifying and addressing single points of failure. A single point of failure is any part whose failure will result in a complete system shutdown. Production servers are complex systems whose availability depends on multiple factors, including hardware, software, and communication links; each of these factors is a potential point of failure.

High availability is the ability of a system to maintain operation despite the failure of components. To increase availability, we can use redundancy by duplicating or adding hardware components (servers or storage). Introducing redundancy is the surest way to address single points of failure. Redundancy guarantees that there will always be a secondary component available to take over in the event a critical component fails. Redundancy relies on the assumption that the system cannot simultaneously experience multiple faults. For example, a system with two identical web servers behind a load balancer can continue operating even if one of the servers goes down, as the load balancer will redirect traffic to the remaining server. So by adding redundancy, we can make the system more resilient to failure.

- **Passive redundancy:** Only some of the components (servers or storage devices) are active at any given time, and backup components are available in case of a failure. If a component fails, a backup component takes over and becomes active, allowing the system to continue to operate and maintain availability.
- **Active redundancy:** Multiple active components (servers or storage devices) work simultaneously to perform the task. In the event of a failure of one of the active components, the other active components can take over and maintain the availability of the system.

It is important to note that redundancy alone is not enough to guarantee high availability. Failure detection mechanisms must also be in place to identify failures. This requires regular high-availability testing and the ability to take corrective action whenever one of the components in the system becomes unavailable.

#### Difference between High Availability and Fault Tolerance

Both high availability and fault tolerance are strategies used to achieve high uptime in systems, but they approach the problem differently. High availability is about a system or component's ability to remain operational and accessible with minimal downtime. Fault tolerance, on the other hand, is about a system or component's ability to continue functioning normally even in the event of a failure.

- **Fault tolerance** involves the utilization of multiple systems that run in parallel. In the event of a failure in the main system, another system can take over without any loss of uptime.
This requires advanced hardware that can detect component faults and enable the systems to operate in coordination. However, it may take longer for complex networks and devices to respond to malfunctions, and technical issues that result in a system crash may also cause the failure of redundant systems running in parallel, leading to a system-wide failure.
- **High availability** uses a software-based approach to minimize server downtime rather than relying on hardware redundancy. A high-availability cluster uses a collection of servers together to achieve maximum redundancy. This can be more flexible and easier to implement than a fault-tolerant system, but it may not provide the same level of protection against system failures.

### Fail-Over

Failover is an availability pattern that is used to ensure that a system can continue to function in the event of a failure. It involves having a backup component or system that can take over when a failure occurs.

In a failover system, there is a primary component that is responsible for handling requests, and a secondary (or backup) component that is on standby. The primary component is monitored for failures, and if it fails, the secondary component is activated to take over its duties. This allows the system to continue functioning with minimal disruption. Failover can be implemented in various ways, such as active-passive, active-active, and hot-standby.

#### Active-passive

With active-passive failover, heartbeats are sent between the active server and the passive server on standby. If the heartbeat is interrupted, the passive server takes over the active server's IP address and resumes service. The length of downtime is determined by whether the passive server is already running in 'hot' standby or whether it needs to start up from 'cold' standby. Only the active server handles traffic. Active-passive failover can also be referred to as master-slave failover.

#### Active-active

In active-active, both servers are managing traffic, spreading the load between them. If the servers are public-facing, the DNS would need to know about the public IPs of both servers. If the servers are internal-facing, application logic would need to know about both servers. Active-active failover can also be referred to as master-master failover.

#### Disadvantages of Failover

- Failover adds more hardware and additional complexity.
- There is a potential for loss of data if the active system fails before any newly written data can be replicated to the passive.

### Replication

Replication is an availability pattern that involves keeping multiple copies of the same data in different locations. In the event of a failure, the data can be retrieved from a different location. There are two main types of replication: master-master replication and master-slave replication.

- **Master-Master replication**: Multiple servers are configured as "masters," and each one can accept read and write operations. This allows for high availability and allows any of the servers to take over if one of them fails. However, this type of replication can lead to conflicts if multiple servers update the same data at the same time, so some conflict resolution mechanism is needed to handle this.

  ![Master-Master replication](https://hackmd.io/_uploads/r1JAq7in3.png)
  Source: [Scalability, Availability & Stability Patterns](https://www.slideshare.net/jboner/scalability-availability-stability-patterns)

- **Master-Slave replication**: One server is designated as the "master" and handles all write operations, while multiple "slave" servers handle read operations. If the master fails, one of the slaves can be promoted to take its place. This type of replication is simpler to set up and maintain compared to master-master replication.

  ![Master-Slave replication](https://hackmd.io/_uploads/r1AvqXs3n.png)
  Source: [Scalability, Availability & Stability Patterns](https://www.slideshare.net/jboner/scalability-availability-stability-patterns)
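
To show how an application might sit on top of master-slave replication, here is a minimal sketch that routes writes to the master and load-balances reads across slaves; the connection objects and their `run` method are hypothetical, not a specific driver API:

```python
import random

class ReplicatedDatabase:
    """Route writes to the master and spread reads across the slaves."""

    def __init__(self, master, slaves):
        self.master = master  # connection to the single writable node
        self.slaves = slaves  # connections to read-only replicas

    def execute(self, query: str):
        if query.lstrip().upper().startswith(("INSERT", "UPDATE", "DELETE")):
            return self.master.run(query)             # writes go to the master
        return random.choice(self.slaves).run(query)  # reads are load-balanced

    def promote(self):
        """If the master fails, promote one slave so writes can continue."""
        self.master = self.slaves.pop()
```

Reads served by a slave can be slightly stale until replication catches up, and `promote` is the failover step described above.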

# Background Jobs

Background jobs in system design refer to tasks that are executed in the background, independently of the main execution flow of the system. These tasks are typically initiated by the system itself, rather than by a user or another external agent. Background jobs can be used for a variety of purposes, such as:

- Performing maintenance tasks: such as cleaning up old data, generating reports, or backing up the database.
- Processing large volumes of data: such as data import, data export, or data transformation.
- Sending notifications or messages: such as sending email notifications or push notifications to users.
- Performing long-running computations: such as machine learning or data analysis.

## Triggers

Background jobs can be initiated in several different ways. They fall into one of the following categories:

- **Event-driven triggers**. The task is started in response to an event, typically an action taken by a user or a step in a workflow.
- **Schedule-driven triggers**. The task is invoked on a schedule based on a timer. This might be a recurring schedule or a one-off invocation that is specified for a later time.

## Returning results

Background jobs execute asynchronously in a separate process, or even in a separate location, from the process that invoked the background task. Ideally, background tasks are "fire and forget" operations, and their execution progress has no impact on the calling process. This means that the calling process does not wait for completion of the tasks. Therefore, it cannot automatically detect when the task ends.

If you require a background task to communicate with the calling task to indicate progress or completion, you must implement a mechanism for this. Some examples are listed below (a sketch of the first approach follows the list):

- Write a status indicator value to storage that is accessible to the calling task, which can monitor or check this value when required. Other data that the background task must return to the caller can be placed into the same storage.
- Establish a reply queue that the caller listens on. The background task can send messages to the queue that indicate status and completion. Data that the background task must return to the caller can be placed into the messages.
- Expose an API or endpoint from the background task that the caller can access to obtain status information. Data that the background task must return to the caller can be included in the response.
- Have the background task call back to the caller through an API to indicate status at predefined points or on completion. This might be through events raised locally or through a publish-and-subscribe mechanism. Data that the background task must return to the caller can be included in the request or event payload.
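
A minimal sketch of the first option, status in shared storage, using a thread as the background worker and a plain dict standing in for the shared store (all names are illustrative):

```python
import threading
import time
import uuid

job_status = {}  # stand-in for shared storage (e.g., a database row or cache key)

def background_job(job_id: str) -> None:
    """Do the slow work, writing progress where the caller can read it."""
    for done in (25, 50, 75):
        time.sleep(0.1)  # placeholder for real work
        job_status[job_id] = {"state": "running", "progress": done}
    job_status[job_id] = {"state": "done", "progress": 100, "result": "report.csv"}

job_id = str(uuid.uuid4())
job_status[job_id] = {"state": "running", "progress": 0}
threading.Thread(target=background_job, args=(job_id,), daemon=True).start()

# The caller is not blocked; it polls the status indicator when convenient.
while job_status[job_id]["state"] != "done":
    time.sleep(0.05)
print(job_status[job_id])  # {'state': 'done', 'progress': 100, 'result': 'report.csv'}
```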

# Domain Name Systems

A Domain Name System (DNS) translates a domain name such as www.example.com to an IP address. DNS is hierarchical, with a few authoritative servers at the top level. Your router or ISP provides information about which DNS server(s) to contact when doing a lookup. Lower-level DNS servers cache mappings, which could become stale due to DNS propagation delays. DNS results can also be cached by your browser or OS for a certain period of time, determined by the time to live (TTL).

- NS record (name server) - Specifies the DNS servers for your domain/subdomain.
- MX record (mail exchange) - Specifies the mail servers for accepting messages.
- A record (address) - Points a name to an IP address.
- CNAME (canonical) - Points a name to another name or CNAME (example.com to www.example.com) or to an A record.

Services such as CloudFlare and Route53 provide managed DNS services. Some DNS services can route traffic through various methods:

- **Weighted Round Robin:** Similar to Round Robin in that requests are still assigned to nodes cyclically, but nodes with higher specs are apportioned a greater number of requests.
- **Latency Based:** When you have resources in multiple regions and want to route traffic to the region that provides the best latency.
- **Geolocation Based:** To route traffic based on the location of your users.

# Content Delivery Networks (CDN)

A content delivery network (CDN) is a globally distributed network of proxy servers, serving content from locations closer to the user. Generally, static files such as HTML/CSS/JS, photos, and videos are served from the CDN, although some CDNs support dynamic content. The site's DNS resolution will tell clients which server to contact.

Serving content from CDNs can significantly improve performance in two ways:

- Users receive content from data centers close to them
- Your servers do not have to serve requests that the CDN fulfills

## Push CDNs

Push CDNs receive new content whenever changes occur on your server. You take full responsibility for providing content, uploading directly to the CDN, and rewriting URLs to point to the CDN. You can configure when content expires and when it is updated. Content is uploaded only when it is new or changed, minimizing traffic but maximizing storage.

Sites with a small amount of traffic or sites with content that isn't often updated work well with push CDNs. Content is placed on the CDNs once, instead of being re-pulled at regular intervals.

## Pull CDNs

Pull CDNs grab new content from your server when the first user requests the content. You leave the content on your server and rewrite URLs to point to the CDN. This results in a slower request until the content is cached on the CDN. A time-to-live (TTL) determines how long content is cached. Pull CDNs minimize storage space on the CDN, but can create redundant traffic if files expire and are pulled before they have actually changed.

Sites with heavy traffic work well with pull CDNs, as traffic is spread out more evenly with only recently-requested content remaining on the CDN.

## CDN: Disadvantages

- CDN costs could be significant depending on traffic, although this should be weighed against the additional costs you would incur by not using a CDN.
- Content might be stale if it is updated before the TTL expires it.
- CDNs require changing URLs for static content to point to the CDN.
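
The pull model is essentially a lazy cache with a TTL sitting in front of the origin. Here is a toy sketch; the `PullCDNEdge` class and the `fetch_from_origin` callable are invented for illustration:

```python
import time

class PullCDNEdge:
    """Toy edge server: fetch from the origin on the first request,
    then serve from cache until the TTL expires."""

    def __init__(self, fetch_from_origin, ttl_seconds: float = 60.0):
        self.fetch_from_origin = fetch_from_origin  # callable simulating the origin server
        self.ttl = ttl_seconds
        self.cache = {}  # url -> (content, expiry_timestamp)

    def get(self, url: str) -> str:
        entry = self.cache.get(url)
        if entry and entry[1] > time.monotonic():
            return entry[0]                    # cache hit: served from the edge
        content = self.fetch_from_origin(url)  # cache miss: slower first request
        self.cache[url] = (content, time.monotonic() + self.ttl)
        return content

edge = PullCDNEdge(lambda url: f"<contents of {url}>", ttl_seconds=1.0)
edge.get("/logo.png")  # miss: pulled from the origin
edge.get("/logo.png")  # hit: served from the edge cache until the TTL expires
```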

# Load Balancers

Load balancers distribute incoming client requests to computing resources such as application servers and databases. In each case, the load balancer returns the response from the computing resource to the appropriate client. Load balancers are effective at:

- Preventing requests from going to unhealthy servers
- Preventing overloading resources
- Helping to eliminate a single point of failure

Load balancers can be implemented with hardware (expensive) or with software such as HAProxy. Additional benefits include:

- **SSL termination** - Decrypt incoming requests and encrypt server responses so backend servers do not have to perform these potentially expensive operations. This also removes the need to install X.509 certificates on each server.
- **Session persistence** - Issue cookies and route a specific client's requests to the same instance if the web apps do not keep track of sessions.

## Disadvantages of load balancer

- The load balancer can become a performance bottleneck if it does not have enough resources or if it is not configured properly.
- A single load balancer is a single point of failure; configuring multiple load balancers further increases complexity.

## Load Balancer vs Reverse Proxy

Deploying a load balancer is useful when you have multiple servers. Often, load balancers route traffic to a set of servers serving the same function. Reverse proxies can be useful even with just one web server or application server. Solutions such as NGINX and HAProxy can support both layer 7 reverse proxying and load balancing.

### Disadvantages of Reverse Proxy

- Introducing a reverse proxy results in increased complexity.
- A single reverse proxy is a single point of failure; configuring multiple reverse proxies (i.e. a failover) further increases complexity.

## Load Balancing Algorithms

A load balancer is a software or hardware device that keeps any one server from becoming overloaded. A load balancing algorithm is the logic that a load balancer uses to distribute network traffic between servers. There are two primary approaches to load balancing:

- Dynamic load balancing uses algorithms that take into account the current state of each server and distribute traffic accordingly.
- Static load balancing distributes traffic without making these adjustments. Some static algorithms send an equal amount of traffic to each server in a group, either in a specified order or at random.

### Layer 7 Load Balancing

Layer 7 load balancers look at the application layer to decide how to distribute requests. This can involve the contents of the header, message, and cookies. A layer 7 load balancer terminates network traffic, reads the message, makes a load-balancing decision, then opens a connection to the selected server. For example, a layer 7 load balancer can direct video traffic to servers that host videos while directing more sensitive user billing traffic to security-hardened servers.

### Layer 4 Load Balancing

Layer 4 load balancers look at info at the transport layer to decide how to distribute requests. Generally, this involves the source and destination IP addresses and ports in the header, but not the contents of the packet. Layer 4 load balancers forward network packets to and from the upstream server, performing Network Address Translation (NAT).

:::info
:mage: At the cost of flexibility, Layer 4 load balancing requires less time and computing resources than Layer 7, although the performance impact can be minimal on modern commodity hardware.
:::
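
To make the static vs dynamic distinction concrete, here is a minimal sketch of one algorithm of each kind; the server names are placeholders:

```python
import itertools

class RoundRobinBalancer:
    """Static algorithm: cycle through the servers regardless of their load."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        return next(self._cycle)

class LeastConnectionsBalancer:
    """Dynamic algorithm: pick the server with the fewest active connections."""

    def __init__(self, servers):
        self.active = {server: 0 for server in servers}

    def pick(self):
        server = min(self.active, key=self.active.get)
        self.active[server] += 1  # caller must call release() when the request finishes
        return server

    def release(self, server):
        self.active[server] -= 1

rr = RoundRobinBalancer(["app1", "app2", "app3"])
print([rr.pick() for _ in range(4)])  # ['app1', 'app2', 'app3', 'app1']
```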

## Horizontal Scaling

Load balancers can also help with horizontal scaling, improving performance and availability. Scaling out using commodity machines is more cost-efficient and results in higher availability than scaling up a single server on more expensive hardware, called Vertical Scaling. It is also easier to hire for talent working on commodity hardware than it is for specialized enterprise systems.

### Disadvantages of horizontal scaling

- Scaling horizontally introduces complexity and involves cloning servers.
- Servers should be stateless: they should not contain any user-related data like sessions or profile pictures.
- Sessions can be stored in a centralized data store such as a database (SQL, NoSQL) or a persistent cache (Redis, Memcached).
- Downstream servers such as caches and databases need to handle more simultaneous connections as upstream servers scale out.

# Application Layer

Separating out the web layer from the application layer (also known as the platform layer) allows you to scale and configure both layers independently. Adding a new API results in adding application servers without necessarily adding additional web servers. The single responsibility principle advocates for small and autonomous services that work together. Small teams with small services can plan more aggressively for rapid growth.

![Application Layer](https://hackmd.io/_uploads/ryuCohDA3.png)

### Disadvantages

Adding an application layer with loosely coupled services requires a different approach from an architectural, operations, and process viewpoint (vs a monolithic system). Microservices can add complexity in terms of deployments and operations.

## Microservices

Related to the "Application Layer" discussion are microservices, which can be described as a suite of independently deployable, small, modular services. Each service runs a unique process and communicates through a well-defined, lightweight mechanism to serve a business goal. Pinterest, for example, could have the following microservices: user profile, follower, feed, search, photo upload, etc.

## Service Discovery

Systems such as Consul, Etcd, and Zookeeper can help services find each other by keeping track of registered names, addresses, and ports. Health checks help verify service integrity and are often done using an HTTP endpoint. Both Consul and Etcd have a built-in key-value store that can be useful for storing config values and other shared data.

# Databases

Picking the right database for a system is an important decision, as it can have a significant impact on the performance, scalability, and overall success of the system. Some of the key reasons why it's important to pick the right database include:

- **Performance**: Different databases have different performance characteristics, and choosing the wrong one can lead to poor performance and slow response times.
- **Scalability**: As the system grows and the volume of data increases, the database needs to be able to scale accordingly. Some databases are better suited for handling large amounts of data than others.
- **Data Modeling**: Different databases have different data modeling capabilities, and choosing the right one can help to keep the data consistent and organized.
- **Data Integrity**: Different databases have different capabilities for maintaining data integrity, such as enforcing constraints, and can have different levels of data security.
- **Support and maintenance**: Some databases have more active communities and better documentation, making it easier to find help and resources.

Overall, by choosing the right database, you can ensure that your system will perform well, scale as needed, and be maintainable in the long run.

## SQL vs noSQL

### What is a SQL database?

SQL, which stands for "Structured Query Language," is the programming language that's been widely used in managing data in relational database management systems (RDBMS) since the 1970s. In the early years, when storage was expensive, SQL databases focused on reducing data duplication.

SQL databases have been a primary data storage mechanism for more than four decades. Usage exploded in the late 1990s with the rise of web applications and open-source options such as MySQL, PostgreSQL, and SQLite.

SQL databases are best suited for structured, relational data and use a fixed schema. They provide robust ACID (Atomicity, Consistency, Isolation, Durability) transactions and support complex queries and joins.

### What is a NoSQL database?

NoSQL is a non-relational database, meaning it allows different structures than a SQL database (not rows and columns) and more flexibility to use a format that best fits the data. The term "NoSQL" was not coined until the early 2000s. It doesn't mean the systems don't use SQL, as NoSQL databases do sometimes support some SQL commands. More accurately, "NoSQL" is sometimes defined as "not only SQL."

NoSQL databases have existed since the 1960s, but have recently been gaining traction with popular options such as MongoDB, CouchDB, Redis, and Apache Cassandra. NoSQL databases are best suited for unstructured, non-relational data and use a flexible schema. They provide high scalability and performance for large amounts of data and are often used in big data and real-time web applications.

NoSQL is a collection of data items represented in a key-value store, document store, wide column store, or a graph database. Data is denormalized, and joins are generally done in the application code. Most NoSQL stores lack true ACID transactions and favor eventual consistency.

BASE is often used to describe the properties of NoSQL databases. In comparison with the CAP Theorem, BASE chooses **availability over consistency**.

- Basically Available - the system guarantees availability.
- Soft state - the state of the system may change over time, even without input.
- Eventual consistency - the system will become consistent over a period of time, given that the system doesn't receive input during that period.

#### Key-Value Store

A key-value store generally allows for O(1) reads and writes and is often backed by memory or SSD. Data stores can maintain keys in lexicographic order, allowing efficient retrieval of key ranges. Key-value stores can allow for storing of metadata with a value. They provide high performance and are often used for simple data models or for rapidly-changing data, such as an in-memory cache layer. Since they offer only a limited set of operations, complexity is shifted to the application layer if additional operations are needed.

![Key-Value Store](https://hackmd.io/_uploads/Hk8Kn5Fmp.png)
Image by Clescop, via Wikimedia Commons

#### Document Store

A document store is centered around documents (XML, JSON, binary, etc.), where a document stores all information for a given object. Document stores provide APIs or a query language to query based on the internal structure of the document itself. Note that many key-value stores include features for working with a value's metadata, blurring the lines between these two storage types.

Based on the underlying implementation, documents are organized by collections, tags, metadata, or directories. Although documents can be organized or grouped together, documents may have fields that are completely different from each other.
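
A toy sketch of the document model, using plain Python dicts as documents and a list as the collection; real document stores expose a much richer query API, but the idea is the same:

```python
books = []  # a "collection" of schemaless documents

books.append({"title": "JavaScript: Novice to Ninja", "format": "ebook", "price": 29.00})
books.append({"title": "Jump Start Git", "price": 29.00, "tags": ["git", "vcs"]})  # different fields: fine

def find(collection, **criteria):
    """Query documents by the fields inside them, like a document store's query API."""
    return [doc for doc in collection
            if all(doc.get(field) == value for field, value in criteria.items())]

print(find(books, price=29.00))  # matches both documents despite their differing shapes
```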

#### Wide Column Store

A wide column store's basic unit of data is a column (a name/value pair). A column can be grouped in column families (analogous to a SQL table). Super column families further group column families. You can access each column independently with a row key, and columns with the same row key form a row. Each value contains a timestamp for versioning and for conflict resolution.

Google introduced Bigtable as the first wide column store, which influenced the open-source HBase, often used in the Hadoop ecosystem, and Cassandra from Facebook. Stores such as Bigtable, HBase, and Cassandra maintain keys in lexicographic order, allowing efficient retrieval of selective key ranges.

#### Graph Databases

In a graph database, each node is a record and each arc is a relationship between two nodes. Graph databases are optimized to represent complex relationships with many foreign keys or many-to-many relationships. Graph databases offer high performance for data models with complex relationships, such as a social network. They are relatively new and are not yet widely used; it might be more difficult to find development tools and resources. Many graph databases can only be accessed with REST APIs.

![Graph Databases](https://hackmd.io/_uploads/SyLJCqK7T.png)
Image by Ahzfl, via Wikimedia Commons

### Myths

There are myths that we must put aside before focusing on the differences between one and the other.

- MYTH: NoSQL supersedes SQL
- MYTH: NoSQL is better / worse than SQL
- MYTH: SQL vs NoSQL is a clear distinction
- MYTH: the language/framework determines the database

Some projects are better suited to using an SQL database. Some are better suited to NoSQL, but there is no such thing as "the best."

### The Main Differences

SQL and NoSQL differ in whether they are relational (SQL) or non-relational (NoSQL), whether their schemas are predefined or dynamic, how they scale, the type of data they include, and whether they are better suited to multi-row transactions or unstructured data.

#### SQL Tables vs NoSQL Documents

SQL databases provide a store of related data tables. For example, if you run an online book store, book information can be added to a table named book:

| ISBN | title | author | format | price |
| -------- | -------- | -------- | -------- | -------- |
| 9780992461225 | JavaScript: Novice to Ninja | Darren Jones | ebook | 29.00 |
| 9780994182654 | Jump Start Git | Shaumik Daityari | ebook | 29.00 |

Every row is a different book record. The design is rigid; you cannot use the same table to store different information or insert a string where a number is expected.

NoSQL databases store JSON-like field-value pair documents, e.g.

```json
{
  "ISBN": 9780992461225,
  "title": "JavaScript: Novice to Ninja",
  "author": "Darren Jones",
  "format": "ebook",
  "price": 29.00
}
```

Similar documents can be stored in a collection, which is analogous to an SQL table. However, you can store any data you like in any document; the NoSQL database won't complain.

:::info
SQL tables create a strict data template, so it's difficult to make mistakes. NoSQL is more flexible and forgiving, but being able to store any data anywhere can lead to consistency issues.
:::
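
For contrast with the document example above, here is a runnable sketch of the same book table as a strict SQL schema, using Python's built-in sqlite3 module (table and column names mirror the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# The schema fixes the columns and their types up front.
conn.execute("""
    CREATE TABLE book (
        ISBN   TEXT PRIMARY KEY,
        title  TEXT NOT NULL,
        author TEXT NOT NULL,
        format TEXT NOT NULL,
        price  REAL NOT NULL
    )
""")

conn.execute(
    "INSERT INTO book VALUES (?, ?, ?, ?, ?)",
    ("9780992461225", "JavaScript: Novice to Ninja", "Darren Jones", "ebook", 29.00),
)

# Every row must supply the same columns; an INSERT missing a
# NOT NULL column is rejected by the schema.
for row in conn.execute("SELECT title, price FROM book"):
    print(row)  # ('JavaScript: Novice to Ninja', 29.0)
```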

#### SQL Schema vs NoSQL Schemaless

In an SQL database, it's impossible to add data until you define tables and field types in what's referred to as a schema. The schema optionally contains other information, such as:

- primary keys - unique identifiers such as the ISBN which apply to a single record
- indexes - commonly queried fields indexed to aid quick searching
- relationships - logical links between data fields
- functionality such as triggers and stored procedures.

Your data schema must be designed and implemented before any business logic can be developed to manipulate data. It's possible to make updates later, but large changes can be complicated.

In a NoSQL database, data can be added anywhere, at any time. There's no need to specify a document design or even a collection up-front. A NoSQL database may be more suited to projects where the initial data requirements are difficult to ascertain.

#### SQL Normalization vs NoSQL Denormalization

Normalization is a technique that minimizes data redundancy and has practical benefits: we can update a single row without touching the rest of the data. Normalization techniques can also be applied in NoSQL, but they are not always practical there, so NoSQL data is often denormalized instead. Denormalization introduces data redundancy but leads to faster queries; the cost is that updating data held in multiple records is significantly slower.

#### SQL Relational JOIN vs NoSQL

SQL queries offer a powerful JOIN clause. We can obtain related data in multiple tables using a single SQL statement. For example:

```sql
SELECT book.title, book.author, publisher.name
FROM book
LEFT JOIN publisher ON book.publisher_id = publisher.id;
```

This returns all book titles, authors, and associated publisher names (presuming one has been set).

NoSQL has no equivalent of JOIN, and this can shock those with SQL experience. If we used normalized collections as described above, we would need to fetch all book documents, retrieve all associated publisher documents, and manually link the two in our program logic. This is one reason denormalization is often essential.

#### SQL vs NoSQL Data Integrity

Most SQL databases allow you to enforce data integrity rules using foreign key constraints. Our book store could ensure all books have:

- a valid publisher_id code that matches one entry in the publisher table, and
- not permit publishers to be removed if one or more books are assigned to them.

The schema enforces these rules for the database to follow. It's impossible for developers or users to add, edit, or remove records in a way that would result in invalid data or orphan records.

The same data integrity options are not available in NoSQL databases; you can store what you want regardless of any other documents. Ideally, a single document will be the sole source of all information about an item.

#### SQL vs NoSQL Transactions

In SQL databases, two or more updates can be executed in a transaction, an all-or-nothing wrapper that guarantees success or failure. In a NoSQL database, modification of a single document is atomic. In other words, if you're updating three values within a document, either all three are updated successfully or it remains unchanged. However, there's no transaction equivalent for updates to multiple documents. There are transaction-like options, but, at the time of writing, these must be manually processed in your code.

#### SQL vs NoSQL CRUD Syntax

Creating, reading, updating, and deleting data is the basis of all database systems. SQL is a lightweight declarative language. It's deceptively powerful, and has become an international standard (ANSI SQL), although most systems implement subtly different syntaxes. NoSQL databases use JavaScript-like queries with JSON-like arguments. Basic operations are simple, but nested JSON can become increasingly convoluted for more complex queries.
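
A small runnable illustration of CRUD on the SQL side (sqlite3 again), with a roughly equivalent document-store call shown as a MongoDB-style comment next to each operation; the document-side syntax in the comments is indicative only:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (ISBN TEXT, title TEXT, price REAL)")

# CREATE  -- document store (MongoDB-style): db.book.insert_one({...})
conn.execute("INSERT INTO book VALUES (?, ?, ?)",
             ("9780994182654", "Jump Start Git", 29.00))

# READ    -- document store: db.book.find({"price": {"$lt": 30}})
rows = conn.execute("SELECT title FROM book WHERE price < 30").fetchall()
print(rows)  # [('Jump Start Git',)]

# UPDATE  -- document store: db.book.update_one({"ISBN": ...}, {"$set": {"price": 19.0}})
conn.execute("UPDATE book SET price = ? WHERE ISBN = ?", (19.00, "9780994182654"))

# DELETE  -- document store: db.book.delete_one({"ISBN": ...})
conn.execute("DELETE FROM book WHERE ISBN = ?", ("9780994182654",))
```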

#### SQL vs NoSQL Performance

Perhaps the most controversial comparison: NoSQL is regularly quoted as being faster than SQL. This isn't surprising; NoSQL's simpler denormalized store allows you to retrieve all information about a specific item in a single request. There's no need for related JOINs or complex SQL queries. That said, a well-designed SQL database will almost certainly perform better than a badly designed NoSQL equivalent, and vice versa.

#### SQL vs NoSQL Scaling

As your data grows, you may find it necessary to distribute the load among multiple servers. This can be tricky for SQL-based systems. How do you allocate related data? Clustering is possibly the simplest option; multiple servers access the same central store, but even this has challenges. NoSQL's simpler data models can make the process easier, and many NoSQL databases have been built with scaling functionality from the start.

#### SQL vs NoSQL Practicalities

Finally, let's consider security and system problems. The most popular NoSQL databases have been around only a few years; they are more likely to exhibit issues than more mature SQL products. Many problems have been reported, but most boil down to a single issue: knowledge. Developers and sysadmins have less experience with newer database systems, so mistakes are made. Opting for NoSQL because it feels fresher, or because you want to avoid schema design, inevitably leads to problems later.

### When to use SQL vs NoSQL

The choice between SQL and NoSQL depends on the specific use case and requirements of the project. If you need to store and query structured data with complex relationships, an SQL database is likely the better choice. If you need to store and query large amounts of unstructured data with high scalability and performance, a NoSQL database may be the better choice.

#### When to use SQL

- SQL is a good choice when working with related data. Relational databases are efficient, flexible, and easily accessed by any application. A benefit of a relational database is that when one user updates a specific record, every instance of the database automatically refreshes, and that information is provided in real time.
- SQL and a relational database make it easy to handle a great deal of information, scale as necessary, and allow flexible access to data - only needing to update data once instead of changing multiple files, for instance. It's also best for assessing data integrity. Since each piece of information is stored in a single place, there's no problem with former versions confusing the picture.
- Most of the big tech companies use SQL, including Uber, Netflix, and Airbnb. Even major companies like Google, Facebook, and Amazon, which build their own database systems, use SQL to query and analyze data.

#### When to use NoSQL

- While SQL is valued for ensuring data validity, NoSQL is a good choice when it is more important that big data is available quickly. It's also a good choice when a company will need to scale because of changing requirements. NoSQL is easy to use, flexible, and offers high performance.
- NoSQL is also a good choice when there are large amounts of (or ever-changing) data sets, or when working with flexible data models or needs that don't fit into a relational model. When working with large amounts of unstructured data, document databases (e.g., CouchDB, MongoDB, and Amazon DocumentDB) are a good fit. For quick access to a key-value store without strong integrity guarantees, Redis may be the best choice. When a complex or flexible search across a lot of data is needed, Elasticsearch is a good choice.
- Scalability is a significant benefit of NoSQL databases. Unlike SQL, their built-in sharding and high-availability capabilities allow horizontal scaling. Furthermore, NoSQL databases like Cassandra, developed by Facebook, handle massive amounts of data spread across many servers, having no single points of failure and providing maximum availability.
- Other big companies that use NoSQL systems because they depend on large volumes of data not suited to a relational database include Amazon, Google, and Netflix. In general, the more extensive the dataset, the more likely it is that NoSQL is the better choice.

## RDBMS

A relational database like SQL is a collection of data items organized in tables. ACID is a set of properties of relational database transactions:

- Atomicity - Each transaction is all or nothing
- Consistency - Any transaction will bring the database from one valid state to another
- Isolation - Executing transactions concurrently has the same results as if the transactions were executed serially
- Durability - Once a transaction has been committed, it will remain so

There are many techniques to scale a relational database: master-slave replication, master-master replication, federation, sharding, denormalization, and SQL tuning.

### Replication

Replication is the process of copying data from one database to another. Replication is used to increase the availability and scalability of databases. There are two types of replication: master-slave and master-master.

#### Master-slave Replication

The master serves reads and writes, replicating writes to one or more slaves, which serve only reads. Slaves can also replicate to additional slaves in a tree-like fashion. If the master goes offline, the system can continue to operate in read-only mode until a slave is promoted to a master or a new master is provisioned.

#### Master-master Replication

Both masters serve reads and writes and coordinate with each other on writes. If either master goes down, the system can continue to operate with both reads and writes.

### Sharding

Sharding distributes data across different databases such that each database only manages a subset of the data. Taking a users database as an example, as the number of users increases, more shards are added to the cluster. Similar to the advantages of federation, sharding results in less read and write traffic, less replication, and more cache hits. Index size is also reduced, which generally improves performance with faster queries. If one shard goes down, the other shards are still operational, although you'll want to add some form of replication to avoid data loss. Like federation, there is no single central master serializing writes, allowing you to write in parallel with increased throughput.
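
A minimal sketch of hash-based sharding for the users example; the shard names are placeholders:

```python
import hashlib

SHARDS = ["users_db_0", "users_db_1", "users_db_2", "users_db_3"]

def shard_for(user_id: str) -> str:
    """Map a user to a shard by hashing its id, so data spreads evenly."""
    digest = hashlib.md5(user_id.encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

print(shard_for("alice"))  # always the same shard for the same user
print(shard_for("bob"))    # a different user may land on a different shard
```

Note that with plain modulo hashing, adding a shard remaps most keys; production systems often use consistent hashing to limit how much data has to move.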

### Federation

Federation (or functional partitioning) splits up databases by function. For example, instead of a single, monolithic database, you could have three databases: forums, users, and products, resulting in less read and write traffic to each database and therefore less replication lag. Smaller databases result in more data that can fit in memory, which in turn results in more cache hits due to improved cache locality. With no single central master serializing writes, you can write in parallel, increasing throughput.

### Denormalization

Denormalization attempts to improve read performance at the expense of some write performance. Redundant copies of the data are written in multiple tables to avoid expensive joins. Some RDBMSs, such as PostgreSQL and Oracle, support materialized views, which handle the work of storing redundant information and keeping redundant copies consistent.

Once data becomes distributed with techniques such as federation and sharding, managing joins across data centers further increases complexity. Denormalization might circumvent the need for such complex joins.

### SQL Tuning

SQL tuning is the iterative process of improving SQL statement performance to meet specific, measurable, and achievable goals.

#### Purpose of SQL Tuning

A SQL statement becomes a problem when it fails to perform according to a predetermined and measurable standard. After you have identified the problem, a typical tuning session has one of the following goals:

- Reduce user response time, which means decreasing the time between when a user issues a statement and receives a response
- Improve throughput, which means using the least amount of resources necessary to process all rows accessed by a statement

For a response time problem, consider an online book seller application that hangs for three minutes after a customer updates the shopping cart. Contrast this with a three-minute parallel query in a data warehouse that consumes all of the database host CPU, preventing other queries from running. In each case, the user response time is three minutes, but the cause of the problem is different, and so is the tuning goal.

SQL tuning is a broad topic and many books have been written as reference. It's important to benchmark and profile to simulate and uncover bottlenecks.

- Benchmark - Simulate high-load situations with tools such as ab.
- Profile - Enable tools such as the slow query log to help track performance issues.

Benchmarking and profiling might point you to the right optimizations.

# Caching

Caching is the process of storing frequently accessed data in a temporary storage location, called a cache, in order to quickly retrieve it without the need to query the original data source. This can improve the performance of an application by reducing the number of times a data source must be accessed.

## Caching strategies

### Refresh Ahead

You can configure the cache to automatically refresh any recently accessed cache entry prior to its expiration. Refresh-ahead can result in reduced latency vs read-through if the cache can accurately predict which items are likely to be needed in the future.

#### Disadvantages

Not accurately predicting which items are likely to be needed in the future can result in worse performance than without refresh-ahead.

![Refresh Ahead Cache](https://hackmd.io/_uploads/SJzNWKlw6.png)
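
A toy sketch of the idea, refreshing entries that are read close to their expiry; a real cache would do the refresh asynchronously, and `load_from_db` is an illustrative callable:

```python
import time

class RefreshAheadCache:
    """Toy cache that refreshes a hot entry before it expires."""

    def __init__(self, load_from_db, ttl=60.0, refresh_window=10.0):
        self.load_from_db = load_from_db  # callable hitting the real data store
        self.ttl = ttl
        self.refresh_window = refresh_window
        self.entries = {}                 # key -> (value, expiry_timestamp)

    def get(self, key):
        value, expiry = self.entries.get(key, (None, 0.0))
        now = time.monotonic()
        if now >= expiry:                         # miss or expired: load synchronously
            value = self.load_from_db(key)
            self.entries[key] = (value, now + self.ttl)
        elif expiry - now < self.refresh_window:  # near expiry: refresh ahead of time
            # In a real cache this reload happens asynchronously, so the
            # caller still gets the cached value with no added latency.
            self.entries[key] = (self.load_from_db(key), now + self.ttl)
        return value
```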

### Write-Behind

In write-behind, the application does the following:

- Add/update entry in cache
- Asynchronously write entry to the data store, improving write performance

#### Disadvantages

- There could be data loss if the cache goes down prior to its contents hitting the data store.
- It is more complex to implement write-behind than it is to implement cache-aside or write-through.

![Write-Behind Cache](https://hackmd.io/_uploads/SJ8KWFlDp.png)

### Write-through

The application uses the cache as the main data store, reading and writing data to it, while the cache is responsible for reading and writing to the database:

- Application adds/updates entry in cache
- Cache synchronously writes entry to data store
- Return

Application code:

```python
set_user(12345, {"foo": "bar"})
```

Cache code:

```python
def set_user(user_id, values):
    # Synchronously persist to the data store, then cache the result.
    user = db.query("UPDATE Users WHERE id = {0}", user_id, values)
    cache.set(user_id, user)
```

Write-through is a slow overall operation due to the write operation, but subsequent reads of just-written data are fast. Users are generally more tolerant of latency when updating data than reading data. Data in the cache is not stale.

#### Disadvantages

- When a new node is created due to failure or scaling, the new node will not cache entries until the entry is updated in the database. Cache-aside in conjunction with write-through can mitigate this issue.
- Most data written might never be read, which can be minimized with a TTL.

![Write-through Cache](https://hackmd.io/_uploads/BJXIMteP6.png)

### Cache Aside

The application is responsible for reading and writing from storage. The cache does not interact with storage directly. The application does the following:

- Look for entry in cache, resulting in a cache miss
- Load entry from the database
- Add entry to cache
- Return entry

```python
def get_user(self, user_id):
    user = cache.get("user.{0}".format(user_id))
    if user is None:
        user = db.query("SELECT * FROM users WHERE user_id = {0}", user_id)
        if user is not None:
            key = "user.{0}".format(user_id)
            cache.set(key, json.dumps(user))
    return user
```

Memcached is generally used in this manner. Subsequent reads of data added to the cache are fast. Cache-aside is also referred to as lazy loading. Only requested data is cached, which avoids filling up the cache with data that isn't requested.

![Cache Aside](https://hackmd.io/_uploads/BJXIMteP6.png)

## Where to implement cache

### Client Caching

Client-side caching refers to the practice of storing frequently accessed data on the client's device rather than the server. This type of caching can help improve the performance of an application by reducing the number of times the client needs to request data from the server.

One common example of client-side caching is web browsers caching frequently accessed web pages and resources. When a user visits a web page, the browser stores a copy of the page and its resources (such as images, stylesheets, and scripts) in the browser's cache. If the user visits the same page again, the browser can retrieve the cached version of the page and its resources instead of requesting them from the server, which can reduce the load time of the page.

Another example of client-side caching is application-level caching. Some applications, such as mobile apps, can cache data on the client's device to improve performance and reduce the amount of data that needs to be transferred over the network.
Client-side caching has some advantages, like reducing server load, faster page load times, and reduced network traffic. However, it also has some drawbacks, like the potential for stale data if the client-side cache is not properly managed, or consuming memory or disk space on the client's device.

### CDN Caching

A Content Delivery Network (CDN) is a distributed network of servers that are strategically placed in various locations around the world. The main purpose of a CDN is to serve content to end-users with high availability and high performance by caching frequently accessed content on servers that are closer to the end-users.

When a user requests content from a website that is using a CDN, the CDN will first check if the requested content is available in the cache of a nearby server. If the content is found in the cache, it is served to the user from the nearby server. If the content is not found in the cache, it is requested from the origin server (the original source of the content) and then cached on the nearby server for future requests.

CDN caching can significantly improve the performance and availability of a website by reducing the distance that data needs to travel, reducing the load on the origin server, and allowing for faster delivery of content to end-users.

### Web Server Caching

Reverse proxies and caches such as Varnish can serve static and dynamic content directly. Web servers can also cache requests, returning responses without having to contact application servers.

### Database Caching

Your database usually includes some level of caching in a default configuration, optimized for a generic use case. Tweaking these settings for specific usage patterns can further boost performance.

### Application Caching

In-memory caches such as Memcached and Redis are key-value stores between your application and your data storage. Since the data is held in RAM, it is much faster than typical databases where data is stored on disk. RAM is more limited than disk, so cache eviction policies such as least recently used (LRU) can help evict 'cold' entries and keep 'hot' data in RAM.

Generally, you should try to avoid file-based caching, as it makes cloning and auto-scaling more difficult.

# Asynchronism

Asynchronous workflows help reduce request times for expensive operations that would otherwise be performed in-line. They can also help by doing time-consuming work in advance, such as periodic aggregation of data.

## Back Pressure

If queues start to grow significantly, the queue size can become larger than memory, resulting in cache misses, disk reads, and even slower performance. Back pressure can help by limiting the queue size, thereby maintaining a high throughput rate and good response times for jobs already in the queue. Once the queue fills up, clients get a server busy or HTTP 503 status code to try again later. Clients can retry the request at a later time, perhaps with exponential backoff (see the bounded-queue sketch below).

## Task Queues

Task queues receive tasks and their related data, run them, then deliver their results. They can support scheduling and can be used to run computationally intensive jobs in the background.

## Message Queues

Message queues receive, hold, and deliver messages.
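A minimal sketch tying back pressure and queues together, using Python's standard `queue` and `threading` modules (the 100-job limit and the status codes are illustrative):

```python
import queue
import threading

# Bounded queue: at most 100 jobs may wait; this is the back-pressure limit.
jobs = queue.Queue(maxsize=100)

def submit_job(job):
    """Called on the request path; never blocks on a full queue."""
    try:
        jobs.put_nowait(job)
        return 202            # accepted: job will run in the background
    except queue.Full:
        return 503            # server busy: client should retry with backoff

def worker():
    while True:
        job = jobs.get()      # blocks until a job is available
        job()                 # run the task; results/signals would go here
        jobs.task_done()

threading.Thread(target=worker, daemon=True).start()
```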
If an operation is too slow to perform inline, you can use a message queue with the following workflow:

- An application publishes a job to the queue, then notifies the user of the job status
- A worker picks up the job from the queue, processes it, then signals that the job is complete

The user is not blocked and the job is processed in the background. During this time, the client might optionally do a small amount of processing to make it seem like the task has completed. For example, if posting a tweet, the tweet could be instantly posted to your timeline, but it could take some time before your tweet is actually delivered to all of your followers.

# Idempotent Operations

Idempotent operations are operations that can be applied multiple times without changing the result beyond the initial application. In other words, if an operation is idempotent, it will have the same effect whether it is executed once or multiple times.

It is also important to understand the benefits of idempotent operations, especially when using message or task queues that do not guarantee exactly-once processing. Many queueing systems guarantee at-least-once delivery or processing: such systems are not completely synchronized, for instance across geographic regions, which simplifies some aspects of their implementation or design. Designing the operations that a task queue executes to be idempotent allows you to use a queueing system that has accepted this design trade-off.

# Communication

Network protocols are a key part of systems today, as no system can exist in isolation; they all need to communicate with each other. You should learn about networking protocols such as HTTP, TCP, and UDP, as well as architectural styles such as RPC, REST, GraphQL, and gRPC.

## Protocols

### HTTP

HTTP is a method for encoding and transporting data between a client and a server. It is a request/response protocol: clients issue requests and servers issue responses with relevant content and completion status info about the request. HTTP is self-contained, allowing requests and responses to flow through many intermediate routers and servers that perform load balancing, caching, encryption, and compression.

A basic HTTP request consists of a verb (method) and a resource (endpoint). Below are common HTTP verbs:

| Verb   | Description                              | Idempotent | Safe | Cacheable                                |
|--------|------------------------------------------|------------|------|------------------------------------------|
| GET    | Reads a resource                         | Yes        | Yes  | Yes                                      |
| POST   | Creates a resource or triggers a process | No         | No   | Yes, if response contains freshness info |
| PUT    | Creates or replaces a resource           | Yes        | No   | No                                       |
| PATCH  | Partially updates a resource             | No         | No   | Yes, if response contains freshness info |
| DELETE | Deletes a resource                       | Yes        | No   | No                                       |

HTTP is an application layer protocol relying on lower-level protocols such as TCP and UDP.

### TCP

TCP is a connection-oriented protocol over an IP network. Connections are established and terminated using handshakes. All packets sent are guaranteed to reach the destination in the original order and without corruption through:

- Sequence numbers and checksum fields for each packet
- Acknowledgement packets and automatic retransmission

If the sender does not receive a correct response, it will resend the packets. If there are multiple timeouts, the connection is dropped. TCP also implements flow control and congestion control. These guarantees cause delays and generally result in less efficient transmission than UDP.
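To see those guarantees from the application's side, here is a minimal sketch with Python's standard `socket` module: the three-way handshake happens inside `create_connection()`, and ordered, checksummed delivery is handled by the OS (the host and request are illustrative):

```python
import socket

# connect() performs the TCP three-way handshake before returning.
with socket.create_connection(("example.com", 80), timeout=5) as sock:
    sock.sendall(b"GET / HTTP/1.1\r\nHost: example.com\r\nConnection: close\r\n\r\n")
    response = b""
    while chunk := sock.recv(4096):   # bytes arrive in order, or not at all
        response += chunk

print(response.split(b"\r\n")[0])     # e.g. b'HTTP/1.1 200 OK'
```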
To ensure high throughput, web servers can keep a large number of TCP connections open, resulting in high memory usage. It can be expensive to have a large number of open connections between web server threads and, say, a memcached server. Connection pooling can help, in addition to switching to UDP where applicable.

TCP is useful for applications that require high reliability but are less time-critical. Some examples include web servers, database info, SMTP, FTP, and SSH.

Use TCP over UDP when:

- You need all of the data to arrive intact
- You want to automatically make the best use of the available network throughput

### UDP

UDP is connectionless. Datagrams (analogous to packets) are guaranteed only at the datagram level. Datagrams might reach their destination out of order or not at all. UDP does not support congestion control. Without the guarantees that TCP supports, UDP is generally more efficient.

UDP can broadcast, sending datagrams to all devices on the subnet. This is useful with DHCP because the client has not yet received an IP address, and TCP cannot establish a stream without one.

UDP is less reliable but works well in real-time use cases such as VoIP, video chat, streaming, and realtime multiplayer games.

Use UDP over TCP when:

- You need the lowest latency
- Late data is worse than loss of data
- You want to implement your own error correction

## Strategies

### RPC

In an RPC, a client causes a procedure to execute on a different address space, usually a remote server. The procedure is coded as if it were a local procedure call, abstracting away the details of how to communicate with the server from the client program. Remote calls are usually slower and less reliable than local calls, so it is helpful to distinguish RPC calls from local calls. Popular RPC frameworks include gRPC (built on Protocol Buffers), Apache Thrift, and Avro.

RPC is a request-response protocol:

- Client program - Calls the client stub procedure. The parameters are pushed onto the stack like a local procedure call.
- Client stub procedure - Marshals (packs) the procedure id and arguments into a request message.
- Client communication module - The OS sends the message from the client to the server.
- Server communication module - The OS passes the incoming packets to the server stub procedure.
- Server stub procedure - Unmarshals (unpacks) the request, calls the server procedure matching the procedure id, and passes the given arguments.
- The server response repeats the steps above in reverse order.

#### Sample RPC calls:

```
GET /someoperation?data=anId

POST /anotheroperation
{
  "data": "anId",
  "anotherdata": "another value"
}
```

RPC is focused on exposing behaviors. RPCs are often used for performance reasons with internal communications, as you can hand-craft native calls to better fit your use cases.

#### Disadvantages of RPC

- RPC clients become tightly coupled to the service implementation.
- A new API must be defined for every new operation or use case.
- It can be difficult to debug RPC.
- You might not be able to leverage existing technologies out of the box. For example, it might require additional effort to ensure RPC calls are properly cached on caching servers such as Squid.

### gRPC

gRPC is a high-performance, open-source framework for building remote procedure call (RPC) APIs. It is based on the Protocol Buffers data serialization format and supports a variety of programming languages, including C#, Java, and Python.
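To make the stub-and-marshalling idea concrete, here is a minimal sketch using Python's built-in `xmlrpc` modules (a much simpler protocol than the frameworks above; the host and port are illustrative). The client calls `add` as if it were local, and the stub marshals the call into a request behind the scenes:

```python
# server.py - exposes a procedure over RPC
from xmlrpc.server import SimpleXMLRPCServer

def add(a, b):
    return a + b

server = SimpleXMLRPCServer(("localhost", 8000), logRequests=False)
server.register_function(add)   # the procedure id the server stub dispatches to
server.serve_forever()
```

```python
# client.py - the client stub marshals arguments and unmarshals the result
import xmlrpc.client

proxy = xmlrpc.client.ServerProxy("http://localhost:8000/")
print(proxy.add(2, 3))  # looks like a local call; prints 5
```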
### REST

REST is an architectural style enforcing a client/server model where the client acts on a set of resources managed by the server. The server provides a representation of resources and actions that can either manipulate or get a new representation of resources. All communication must be stateless and cacheable.

There are four qualities of a RESTful interface:

- Identify resources (URI in HTTP) - use the same URI regardless of any operation.
- Change with representations (verbs in HTTP) - use verbs, headers, and body.
- Self-descriptive error message (status response in HTTP) - use status codes, don't reinvent the wheel.
- HATEOAS (HTML interface for HTTP) - your web service should be fully accessible in a browser.

REST is focused on exposing data. It minimizes the coupling between client and server and is often used for public HTTP APIs. REST uses a more generic and uniform method of exposing resources through URIs, representation through headers, and actions through verbs such as GET, POST, PUT, DELETE, and PATCH. Being stateless, REST is great for horizontal scaling and partitioning.

### GraphQL

GraphQL is a query language and runtime for building APIs. It allows clients to define the structure of the data they need, and the server will return exactly that. This is in contrast to traditional REST APIs, where the server exposes a fixed set of endpoints and the client must work with the data as it is returned.

# Performance Antipatterns

Performance antipatterns in system design refer to common mistakes or suboptimal practices that can lead to poor performance in a system. These patterns can occur at different levels of the system and can be caused by a variety of factors such as poor design, lack of optimization, or lack of understanding of the workload. Some examples of performance antipatterns include:

- **N+1 queries:** This occurs when a system makes multiple queries to a database to retrieve related data, instead of using a single query to retrieve all the necessary data.
- **Chatty interfaces:** This occurs when a system makes too many small and frequent requests to an external service or API, instead of making fewer, larger requests.
- **Unbounded data:** This occurs when a system retrieves or processes more data than is necessary for the task at hand, leading to increased resource usage and reduced performance.
- **Inefficient algorithms:** This occurs when a system uses an algorithm that is not well suited to the task at hand, leading to increased resource usage and reduced performance.

## Busy Database

A busy database in system design refers to a database that is handling a high volume of requests or transactions. This can occur when a system is experiencing high traffic or when a database is not properly optimized for the workload it is handling, and it can lead to performance degradation, increased resource utilization, deadlocks and contention, and data inconsistencies. To address a busy database, a number of approaches can be taken, such as scaling out, optimizing the schema, caching, and indexing.

## Busy Frontend

Performing asynchronous work on a large number of background threads can starve other concurrent foreground tasks of resources, decreasing response times to unacceptable levels. Resource-intensive tasks can increase the response times for user requests and cause high latency. One way to improve response times is to offload a resource-intensive task to a separate thread.
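A minimal sketch of that offloading, using Python's standard `concurrent.futures` with a deliberately bounded pool (`generate_report` and the pool size are hypothetical):

```python
from concurrent.futures import ThreadPoolExecutor

# Bounded pool: background work never occupies more than 4 threads,
# leaving capacity for the threads that serve user requests.
background_pool = ThreadPoolExecutor(max_workers=4)

def generate_report(report_id):
    ...  # hypothetical resource-intensive task, runs on a pool thread

def handle_request(report_id):
    background_pool.submit(generate_report, report_id)  # returns immediately
    return {"status": "accepted", "report_id": report_id}
```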
This approach lets the application stay responsive while processing happens in the background. However, tasks that run on a background thread still consume resources. If there are too many of them, they can starve the threads that are handling requests.

## Chatty I/O

The cumulative effect of a large number of I/O requests can have a significant impact on performance and responsiveness. Network calls and other I/O operations are inherently slow compared to compute tasks. Each I/O request typically has significant overhead, and the cumulative effect of numerous I/O operations can slow down the system. Here are some common causes of chatty I/O:

- Reading and writing individual records to a database as distinct requests
- Implementing a single logical operation as a series of HTTP requests
- Reading and writing to a file on disk

## Extraneous Fetching

Extraneous fetching in system design refers to the practice of retrieving more data than is needed for a specific task or operation. This can occur when a system is not optimized for the specific workload or when the system is not properly designed to handle the data requirements. Extraneous fetching can lead to a number of issues, such as:

- Performance degradation
- Increased resource utilization
- Increased network traffic
- Poor user experience

## Improper Instantiation

Improper instantiation in system design refers to the practice of creating unnecessary instances of an object, class, or service, which can lead to performance and scalability issues. This can happen when the system is not properly designed, when the code is not written in an efficient way, or when the code is not optimized for the specific use case.

## Monolithic Persistence

Monolithic persistence refers to the use of a single, monolithic database to store all of the data for an application or system. This approach can work for simple, small-scale systems, but as the system grows and evolves it can become a bottleneck, resulting in poor scalability, limited flexibility, and increased complexity. To address these limitations, a number of approaches can be taken, such as microservices, sharding, and NoSQL databases.

## No Caching

The no-caching antipattern occurs when a cloud application that handles many concurrent requests repeatedly fetches the same data. This can reduce performance and scalability.

When data is not cached, it can cause a number of undesirable behaviors, including:

- Repeatedly fetching the same information from a resource that is expensive to access, in terms of I/O overhead or latency.
- Repeatedly constructing the same objects or data structures for multiple requests.
- Making excessive calls to a remote service that has a service quota and throttles clients past a certain limit.

In turn, these problems can lead to poor response times, increased contention in the data store, and poor scalability.

## Noisy Neighbor

Noisy neighbor refers to a situation in which one or more components of a system are utilizing a disproportionate amount of shared resources, leading to resource contention and reduced performance for other components. This can occur when a system is not properly designed or configured to handle the workload, or when a component is behaving unexpectedly. Examples of noisy neighbor scenarios include:

- One user on a shared server utilizing a large amount of CPU or memory, leading to reduced performance for other users on the same server.
- One process on a shared server utilizing a large amount of I/O, causing other processes to experience slow I/O and increased latency.
- One application consuming a large amount of network bandwidth, causing other applications to experience reduced throughput.

## Retry Storm

A retry storm refers to a situation in which a large number of retries are triggered in a short period of time, leading to a significant increase in traffic and resource usage. This can occur when a system is not properly designed to handle failures or when a component is behaving unexpectedly, and it can lead to performance degradation, increased resource utilization, increased network traffic, and poor user experience. To address retry storms, a number of approaches can be taken, such as exponential backoff, circuit breaking, and monitoring and alerting.

## Synchronous I/O

Blocking the calling thread while I/O completes can reduce performance and affect vertical scalability. A synchronous I/O operation blocks the calling thread while the I/O completes. The calling thread enters a wait state and is unable to perform useful work during this interval, wasting processing resources.

Common examples of I/O include:

- Retrieving or persisting data to a database or any type of persistent storage.
- Sending a request to a web service.
- Posting a message or retrieving a message from a queue.
- Writing to or reading from a local file.

This antipattern typically occurs because:

- It appears to be the most intuitive way to perform an operation.
- The application requires a response from a request.
- The application uses a library that only provides synchronous methods for I/O.
- An external library performs synchronous I/O operations internally.

A single synchronous I/O call can block an entire call chain.

# Monitoring

## Health Monitoring

A system is healthy if it is running and capable of processing requests. The purpose of health monitoring is to generate a snapshot of the current health of the system so that you can verify that all components of the system are functioning as expected.

## Availability Monitoring

A truly healthy system requires that the components and subsystems that compose the system are available. Availability monitoring is closely related to health monitoring. But whereas health monitoring provides an immediate view of the current health of the system, availability monitoring is concerned with tracking the availability of the system and its components to generate statistics about the uptime of the system.

## Performance Monitoring

As the system is placed under more and more stress (by increasing the volume of users), the size of the datasets that these users access grows, and the possibility of failure of one or more components becomes more likely. Frequently, component failure is preceded by a decrease in performance. If you're able to detect such a decrease, you can take proactive steps to remedy the situation.

## Security Monitoring

All commercial systems that include sensitive data must implement a security structure. The complexity of the security mechanism is usually a function of the sensitivity of the data. In a system that requires users to be authenticated, you should record:

- All sign-in attempts, whether they fail or succeed.
- All operations performed by an authenticated user, together with the details of all resources accessed.
- When a user ends a session and signs out.

Monitoring might be able to help detect attacks on the system.
For example, a large number of failed sign-in attempts might indicate a brute-force attack. An unexpected surge in requests might be the result of a distributed denial-of-service (DDoS) attack. You must be prepared to monitor all requests to all resources regardless of the source of these requests. A system that has a sign-in vulnerability might accidentally expose resources to the outside world without requiring a user to actually sign in.

## Usage Monitoring

Usage monitoring tracks how the features and components of an application are used. An operator can use the gathered data to:

- Determine which features are heavily used and identify any potential hotspots in the system. High-traffic elements might benefit from functional partitioning or even replication to spread the load more evenly. An operator can also use this information to ascertain which features are infrequently used and are possible candidates for retirement or replacement in a future version of the system.
- Obtain information about the operational events of the system under normal use. For example, in an e-commerce site, you can record statistical information about the number of transactions and the volume of customers that are responsible for them. This information can be used for capacity planning as the number of customers grows.
- Detect (possibly indirectly) user satisfaction with the performance or functionality of the system. For example, if a large number of customers in an e-commerce system regularly abandon their shopping carts, this might be due to a problem with the checkout functionality.
- Generate billing information. A commercial application or multitenant service might charge customers for the resources that they use.
- Enforce quotas. If a user in a multitenant system exceeds their paid quota of processing time or resource usage during a specified period, their access can be limited or processing can be throttled.

## Instrumentation

Instrumentation is a critical part of the monitoring process. You can make meaningful decisions about the performance and health of a system only if you first capture the data that enables you to make these decisions. The information that you gather by using instrumentation should be sufficient to enable you to assess performance, diagnose problems, and make decisions without requiring you to sign in to a remote production server to perform tracing (and debugging) manually. Instrumentation data typically comprises metrics and information that's written to trace logs.

## Visualization and Alerts

An important aspect of any monitoring system is the ability to present the data in such a way that an operator can quickly spot any trends or problems. Also important is the ability to quickly inform an operator if a significant event has occurred that might require attention.

# References

- [Roadmap.sh - System Design](https://roadmap.sh/system-design)
- [Scalability](https://www.gartner.com/en/information-technology/glossary/scalability)
- [Performance vs Scalability](https://blog.professorbeekums.com/performance-vs-scalability/)
- [System Design: Latency vs Throughput](https://cs.fyi/guide/latency-vs-throughput)
- [CAP Theorem Revisited](https://robertgreiner.com/cap-theorem-revisited/)
- [Consistency Patterns in Distributed Systems](https://cs.fyi/guide/consistency-patterns-week-strong-eventual)
- [SQL vs NoSQL: The Differences](https://www.sitepoint.com/sql-vs-nosql-differences/)
- [SQL vs. NoSQL Databases: What's the Difference?](https://www.ibm.com/blog/sql-vs-nosql/)
- [An Unorthodox Approach To Database Design: The Coming Of The Shard](http://highscalability.com/blog/2009/8/6/an-unorthodox-approach-to-database-design-the-coming-of-the.html)
- [Facade](https://refactoring.guru/design-patterns/facade)