AWS Certified Solutions Architect - Associate (Database)

# AWS Certified Solutions Architect - Associate (Database) ###### tags: `AWS` ## Database Two features: - Multi-AZ - for Disaster Recovery ![](https://i.imgur.com/FR57UIS.png) - Read Replicas - for performance ![](https://i.imgur.com/7R23kWB.png) ### ElastiCache ElastiCache supports tow open-source in-memory caching engines: - Memcached - Redis Tips: **RDS(OLTP)** - SQL - MySQL - PostgreSQL - Oracle - Aurora - MariaDB **DynamoDB(No SQL)** **Red Shifted OLAP** > Notes: > Redshift for Business Intelligence or Data Warehousing. > Elasticache to speed up performance of existing databases (frequent identical queries) Tips: - RDS runs on VM - You cannot log in to these operating systems however. - Patching of the RDS OS and DB is Amazon's responsibility - RDS is NOT Serverless - Aurora Serverless IS Serverless ### RDS - backup - Automated Backups - automated - enabled by default - get S3 storage equal to the size of the DB - DB Snapshots - manually done - they are stored after you delete the original RDS instance Notes: :::Whenever you restore either an automatic backup or a manual snapshots, the restored version of DB will be a new RDS instance with new DNS endpoints::: ![](https://i.imgur.com/tVOkgcn.png) **Encryption is done using the AWS Key Management Service(KMS)** ### RDS - Multi-AZ & Read Replicas - Multi-AZ - for Disaster Recovery - Used for DR - Can force a failover from one AZ to another by rebooting the RDS instance. - An exact copy of your prod DB in another AZ - The write will automatically be synchronized to the stand by DB - Failover to the standby DB without administrative intervention - Available for the DB 1. **SQL server** 2. Oracle 3. MySQL server 4. PostgresSQL 5. MariaDB >Notes: >- **Multi-AZ is for disaster Recovery only** >- **Not primarily used for improving performance** - Read Replicas - for performance - READ-ONLY, can be multi-AZ, different regions - increase perfomance, and must turned on backups - Can be promoted to master, this will break the read replica - Asynchronous replication from the primary RDS instance to the read replica. - For very read-heavy DB workloads. - Available for the DB 1. **Aurora** 2. Oracle 3. MySQL server 4. PostgresSQL 5. MariaDB >Notes: > - Used for scaling, not for DR!! > - Must have automatic backups > - You can have up to 5 read replica copies > - You can have read replicas of read replicas(Latency?) > - Each read replica will have its own DNS > - You can have read replicas that have Multi-AZ > - You can create read replicas of Multi-AZ source DB > - Read replicas can be promoted to be their own DB. This breaks the replication. > - You can hava a read replica in a second region ### DynamoDB DynamoDB is a fast and flexible NoSQL DB service for all applications that need consistent, single-digit ms latency at any scale. - Stored on SSD - Spread across 3 geographically distinct data centres - Eventual Consistent Reads(default) - Strongly Consistent Reads(within one second) #### DynamoDB Accelerator (DAX) - Fully managed, high available, in-memory cache - 10x performance - Reduces requests time from ms to us - No need to manage caching logic ![](https://i.imgur.com/7OYO3fu.png) #### Transactions - Multiple "all-or-nothing" operations - Financial transactions - Fulfilling orders - Two underlying reads or writes - prepare/commit - Up to 25 items or 4MB of data ##### On-Demand Capacity - Pay-per-request pricing - Balance cost and performance - No minimum capacity - No charge for read/write - only storage and backups - **Pay more per request** than with provisioned capacity - Use for new product launches ##### On-Demand Backup and Restore - Full backups at any time - Zero impact on table performance or availability - Consistent within seconds and **retained until deleted** - Operates within same region as the source table ##### Point-In-Time Recovery - Protects against accidental writes or deletes - Restore to any point in the last **35 days** - Incremental backups - Not enabled by default - Lastest restorable: **five minutes** in the past ##### Streams - Time-ordered sequence of item-level changes in a table ![](https://i.imgur.com/uDybIqg.png) - Stored for **24 hours** - Inserts, updates, and deletes - Combine with Lambda functions for functionality like stored procedures ##### Global Tables Managed Multi-Master, Multi-Region Replication - Globally distributed applications - Based on DynamoDB streams - Multi-region redundancy for DR or HA - No applicartion rewrites - Replication latency under **one second** ##### DMS DMS is a cloud service that makes it easy to migrate RDB, data warehouse, NoSQL databases, and other types of data stores. ![](https://i.imgur.com/T0s6wDe.png) ![](https://i.imgur.com/GJXlNzH.png) ![](https://i.imgur.com/NQ180Xi.png) - homogeneous ![](https://i.imgur.com/f48prip.png) - heterogeneous ![](https://i.imgur.com/PTyoxC8.png) **You do not need AWS Schema Conversion Tool(SCT) if you are migrating to identical DB** Tips: - DMS allows you to migrate DB from one source to AWS - The source can either be on-premises, or inside AWS itself or another cloud provider such as Azure - You can do **homogeneous** migrations(same DB engines) or **heterogeneous** migrations - If you do a heterogeneous migration, you will need to **AWS Schema Conversion Tool**(SCT). ##### Security - Encryption at rest using **KMS** - Site-to-site VPN - Direct Connect(DX) - IAM policies and roles - Fine-grained access - CloudWatch and CloudTrail - VPC endpoints ### Redshift is a fast and powerful, fully managed, petabyte-scale data warehouse service in the cloud. #### Configuration - Single Node(160Gb) - Multi-Node - Leader Node (manages client connections and receives queries) - Compute Node (store data and preform queries and computations) Up to 128 Compute Nodes #### Compression - Columnar data stores can be compresed much more than row based data stores. - Doesn't require indexed or materialized views. #### Massively Parallel Processing (MPP) Amazon Redshift automatically distributes data and query load across all nodes. #### Backups - 1 day retention period by default - Maximum retention period is 35 days - Redshift always attempts to maintain at least 3 copies of your daya (the **original**, **replica** on compute nodes and **backup** in S3) - Redshift can also asynchronously replicate your snapshots to S3 in another region for disaster recovery. #### Pricing - Compute Node Hours - Backup - Data transfer (only within a VPC) #### Security - Encrypted in transit using SSL - Encrypted at rest using AES-256 encryption - By defalut RedShif takes care of key management - Manage your own keys through HSM - AWS KMS #### Availability - Currently only available in 1AZ - Can restore snapshots to new AZs in the event of an outage Tips: - Redshift is used for BI - Available in only 1AZ - 1~35 days retention period - Attempts to maintain at least 3 copies of your data - asynchronously replicate your snapshots to S3 in another region for disaster recovery. #### Aurora Aurora is a MYSQL(5x Performance) and PostgresSQL(3x Performance)-compatible RDB engine. - Storage: Start with 10GB, Scales in 10GB increments to 64TB (autoscaling) - Compute resourses: Scale up to 32vCPUs and 244GB of Memory. - Availability Zone: 2 copies of your data is contained in each availability zone, with MIN of 3 AZ. 6 copies of your data. ![](https://i.imgur.com/PCGfP7e.png) Aurora Serverless provides a relatively simple, cost-effective option for infrequent, intermittent, or unpredictable workloads.(pay-per-invocation) Tips: - 2 copies in each AZ, MIN of 3 AZ, 6 copies - You can share Aurora snapshot with other AWS accounts - Automated backups turned on by default - 3 types of replica available. Aurora, MySQL & PostgresQL - Aurora Serverless if you want a simple, cost-effective option for infrequent, intermittent, or unpredictable workloads. #### Elasticache ElastiCache is a web service that makes it easy to deploy, operate, and scale an in-memory cache in the cloud. - Memcached - Redis - ![](https://i.imgur.com/4nVkQiD.png) Tips: - Use Elasticahe to increase DB and web application performance - Redis is Multi-AZ - Memcached is scale horizontally - You can do back ups and restores of Redis #### Caching Strategies Caching is a balancing act between **up-to-date, accurate information and latency**. - CloudFront - API Gateway - ElastiCache - **Memcached and Redis** - DynamoDB Accelerator(DAX) ![](https://i.imgur.com/eIP6gyr.png) ### Elastic Map Reduce (EMR) EMR is the industry-leading cloud big data platform for processing vast amounts of data using open-source tools(Spark, Hive, HBase, Flink, Hudi, presto) PB-scale analysis at **less than half the cost of traditional on-premisees solutions** and 3x faster that standard Spark. **The central component of EMR is the cluster. A collection of EC2 instances** #### Node types of EMR - Master node(exactly one in the cluster) - A node that manages the cluster(track status) - Core node(at least one in the cluster) - A node with software components that **runs tasks and stores data ** in the Hadoop Distributed File System(HDFS) on your cluster. - Task node(optional) - A node with software components that only runs tasks and **does not sore data in HDFS** ![](https://i.imgur.com/L23jOO6.png) ![](https://i.imgur.com/Xqv5jzr.png) Solution: You can configure a cluster **periodically archive the log files stored on the master node to Amazon S3.** > EMR archives the log files to Amazon S3 at ** five-minute intervals. (first set up only) Tips: - Big data processing - Consists of a **master node**, **core node** and a **task node** - By default, log data is **stored on the master node** - You can configure replication to S3 on ** five-minutes intervals for all log data from the master node**.(only be configured when creating) > MongoDB is NoSQL > If I wanted to run a database on an EC2 instance, which of the following storage options would Amazon recommend? EBS > Redshift would be the most suitable for OLAP, DW and BI. > RDS would be the most suitable for OLTP.