# AWS Certified Solutions Architect - Associate (Database)
###### tags: `AWS`
## Database
Two features:
- Multi-AZ - for Disaster Recovery

- Read Replicas - for performance

### ElastiCache
ElastiCache supports tow open-source in-memory caching engines:
- Memcached
- Redis
Tips:
**RDS(OLTP)**
- SQL
- MySQL
- PostgreSQL
- Oracle
- Aurora
- MariaDB
**DynamoDB(No SQL)**
**Red Shifted OLAP**
> Notes:
> Redshift for Business Intelligence or Data Warehousing.
> Elasticache to speed up performance of existing databases (frequent identical queries)
Tips:
- RDS runs on VM
- You cannot log in to these operating systems however.
- Patching of the RDS OS and DB is Amazon's responsibility
- RDS is NOT Serverless
- Aurora Serverless IS Serverless
### RDS - backup
- Automated Backups
- automated
- enabled by default
- get S3 storage equal to the size of the DB
- DB Snapshots
- manually done
- they are stored after you delete the original RDS instance
Notes: :::Whenever you restore either an automatic backup or a manual snapshots, the restored version of DB will be a new RDS instance with new DNS endpoints:::

**Encryption is done using the AWS Key Management Service(KMS)**
### RDS - Multi-AZ & Read Replicas
- Multi-AZ - for Disaster Recovery
- Used for DR
- Can force a failover from one AZ to another by rebooting the RDS instance.
- An exact copy of your prod DB in another AZ
- The write will automatically be synchronized to the stand by DB
- Failover to the standby DB without administrative intervention
- Available for the DB
1. **SQL server**
2. Oracle
3. MySQL server
4. PostgresSQL
5. MariaDB
>Notes:
>- **Multi-AZ is for disaster Recovery only**
>- **Not primarily used for improving performance**
- Read Replicas - for performance
- READ-ONLY, can be multi-AZ, different regions
- increase perfomance, and must turned on backups
- Can be promoted to master, this will break the read replica
- Asynchronous replication from the primary RDS instance to the read replica.
- For very read-heavy DB workloads.
- Available for the DB
1. **Aurora**
2. Oracle
3. MySQL server
4. PostgresSQL
5. MariaDB
>Notes:
> - Used for scaling, not for DR!!
> - Must have automatic backups
> - You can have up to 5 read replica copies
> - You can have read replicas of read replicas(Latency?)
> - Each read replica will have its own DNS
> - You can have read replicas that have Multi-AZ
> - You can create read replicas of Multi-AZ source DB
> - Read replicas can be promoted to be their own DB. This breaks the replication.
> - You can hava a read replica in a second region
### DynamoDB
DynamoDB is a fast and flexible NoSQL DB service for all applications that need consistent, single-digit ms latency at any scale.
- Stored on SSD
- Spread across 3 geographically distinct data centres
- Eventual Consistent Reads(default)
- Strongly Consistent Reads(within one second)
#### DynamoDB Accelerator (DAX)
- Fully managed, high available, in-memory cache
- 10x performance
- Reduces requests time from ms to us
- No need to manage caching logic

#### Transactions
- Multiple "all-or-nothing" operations
- Financial transactions
- Fulfilling orders
- Two underlying reads or writes - prepare/commit
- Up to 25 items or 4MB of data
##### On-Demand Capacity
- Pay-per-request pricing
- Balance cost and performance
- No minimum capacity
- No charge for read/write - only storage and backups
- **Pay more per request** than with provisioned capacity
- Use for new product launches
##### On-Demand Backup and Restore
- Full backups at any time
- Zero impact on table performance or availability
- Consistent within seconds and **retained until deleted**
- Operates within same region as the source table
##### Point-In-Time Recovery
- Protects against accidental writes or deletes
- Restore to any point in the last **35 days**
- Incremental backups
- Not enabled by default
- Lastest restorable: **five minutes** in the past
##### Streams
- Time-ordered sequence of item-level changes in a table

- Stored for **24 hours**
- Inserts, updates, and deletes
- Combine with Lambda functions for functionality like stored procedures
##### Global Tables
Managed Multi-Master, Multi-Region Replication
- Globally distributed applications
- Based on DynamoDB streams
- Multi-region redundancy for DR or HA
- No applicartion rewrites
- Replication latency under **one second**
##### DMS
DMS is a cloud service that makes it easy to migrate RDB, data warehouse, NoSQL databases, and other types of data stores.



- homogeneous

- heterogeneous

**You do not need AWS Schema Conversion Tool(SCT) if you are migrating to identical DB**
Tips:
- DMS allows you to migrate DB from one source to AWS
- The source can either be on-premises, or inside AWS itself or another cloud provider such as Azure
- You can do **homogeneous** migrations(same DB engines) or **heterogeneous** migrations
- If you do a heterogeneous migration, you will need to **AWS Schema Conversion Tool**(SCT).
##### Security
- Encryption at rest using **KMS**
- Site-to-site VPN
- Direct Connect(DX)
- IAM policies and roles
- Fine-grained access
- CloudWatch and CloudTrail
- VPC endpoints
### Redshift
is a fast and powerful, fully managed, petabyte-scale data warehouse service in the cloud.
#### Configuration
- Single Node(160Gb)
- Multi-Node
- Leader Node (manages client connections and receives queries)
- Compute Node (store data and preform queries and computations) Up to 128 Compute Nodes
#### Compression
- Columnar data stores can be compresed much more than row based data stores.
- Doesn't require indexed or materialized views.
#### Massively Parallel Processing (MPP)
Amazon Redshift automatically distributes data and query load across all nodes.
#### Backups
- 1 day retention period by default
- Maximum retention period is 35 days
- Redshift always attempts to maintain at least 3 copies of your daya (the **original**, **replica** on compute nodes and **backup** in S3)
- Redshift can also asynchronously replicate your snapshots to S3 in another region for disaster recovery.
#### Pricing
- Compute Node Hours
- Backup
- Data transfer (only within a VPC)
#### Security
- Encrypted in transit using SSL
- Encrypted at rest using AES-256 encryption
- By defalut RedShif takes care of key management
- Manage your own keys through HSM
- AWS KMS
#### Availability
- Currently only available in 1AZ
- Can restore snapshots to new AZs in the event of an outage
Tips:
- Redshift is used for BI
- Available in only 1AZ
- 1~35 days retention period
- Attempts to maintain at least 3 copies of your data
- asynchronously replicate your snapshots to S3 in another region for disaster recovery.
#### Aurora
Aurora is a MYSQL(5x Performance) and PostgresSQL(3x Performance)-compatible RDB engine.
- Storage: Start with 10GB, Scales in 10GB increments to 64TB (autoscaling)
- Compute resourses: Scale up to 32vCPUs and 244GB of Memory.
- Availability Zone: 2 copies of your data is contained in each availability zone, with MIN of 3 AZ. 6 copies of your data.

Aurora Serverless provides a relatively simple, cost-effective option for infrequent, intermittent, or unpredictable workloads.(pay-per-invocation)
Tips:
- 2 copies in each AZ, MIN of 3 AZ, 6 copies
- You can share Aurora snapshot with other AWS accounts
- Automated backups turned on by default
- 3 types of replica available. Aurora, MySQL & PostgresQL
- Aurora Serverless if you want a simple, cost-effective option for infrequent, intermittent, or unpredictable workloads.
#### Elasticache
ElastiCache is a web service that makes it easy to deploy, operate, and scale an in-memory cache in the cloud.
- Memcached
- Redis
- 
Tips:
- Use Elasticahe to increase DB and web application performance
- Redis is Multi-AZ
- Memcached is scale horizontally
- You can do back ups and restores of Redis
#### Caching Strategies
Caching is a balancing act between **up-to-date, accurate information and latency**.
- CloudFront
- API Gateway
- ElastiCache - **Memcached and Redis**
- DynamoDB Accelerator(DAX)

### Elastic Map Reduce (EMR)
EMR is the industry-leading cloud big data platform for processing vast amounts of data using open-source tools(Spark, Hive, HBase, Flink, Hudi, presto)
PB-scale analysis at **less than half the cost of traditional on-premisees solutions** and 3x faster that standard Spark.
**The central component of EMR is the cluster. A collection of EC2 instances**
#### Node types of EMR
- Master node(exactly one in the cluster)
- A node that manages the cluster(track status)
- Core node(at least one in the cluster)
- A node with software components that **runs tasks and stores data ** in the Hadoop Distributed File System(HDFS) on your cluster.
- Task node(optional)
- A node with software components that only runs tasks and **does not sore data in HDFS**


Solution: You can configure a cluster **periodically archive the log files stored on the master node to Amazon S3.**
> EMR archives the log files to Amazon S3 at ** five-minute intervals. (first set up only)
Tips:
- Big data processing
- Consists of a **master node**, **core node** and a **task node**
- By default, log data is **stored on the master node**
- You can configure replication to S3 on ** five-minutes intervals for all log data from the master node**.(only be configured when creating)
> MongoDB is NoSQL
> If I wanted to run a database on an EC2 instance, which of the following storage options would Amazon recommend? EBS
> Redshift would be the most suitable for OLAP, DW and BI.
> RDS would be the most suitable for OLTP.