
# AWS Topics

# I. AWS Architecture
## 1. Region:
- A region is a cluster of data centers whose name is tied to a specific geographic area. Here are some examples of region names:
    - us-east-1 (US East - N. Virginia)
    - af-south-1 (Africa - Cape Town)
    - ap-east-1 (Asia Pacific - Hong Kong)

## 2. Availability Zones:
- Each region has several availability zones (usually 3; the minimum is 3 and the maximum is 6). Here are example availability zone names within the AWS region ap-southeast-2 (Sydney):
    - ap-southeast-2a
    - ap-southeast-2b
    - ap-southeast-2c
- Each availability zone (AZ) is one or more discrete data centers with redundant power, networking and connectivity
- They are separate from each other so that they're isolated from disasters
- The availability zones are connected with high-bandwidth, low-latency networking to form a region

![](https://i.imgur.com/FrNoE3X.png)

# II. IAM (Identity and Access Management)
## 1. Introduction:
- IAM is AWS's identity and access management service. It is not an authentication service for the end users of your application; it organizes the users of your AWS account and their permissions to access AWS
- **Root account** is created by default. It shouldn't be used or shared
- **Users** are people within your organization, and can be grouped
- **Groups** only contain users, not other groups
- Users don't have to belong to a group, and a user can belong to multiple groups

![](https://i.imgur.com/msy19aU.png)

## 2. Permissions:
- **Policies**: users or groups can be assigned JSON documents called policies
- These policies define the permissions of the users
- In AWS, you apply the **least privilege principle**: don't give a user more permissions than they need

## 3. Policies:
- A policy is a set of permissions and access controls that defines which actions an AWS user or user group is allowed or denied to perform
- A policy can be attached to a user or to a user group. A user inherits the policies of every group they belong to, so one user can end up with multiple policies coming from different groups.
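To make the inheritance rule above concrete, here is a minimal sketch in plain Python (no AWS calls). The `Group` and `User` classes, the group names and the policy names are all hypothetical and only model the idea that a user's effective policies are the union of the policies attached directly to the user and the policies of every group the user belongs to.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Group:
    name: str
    policies: Set[str] = field(default_factory=set)


@dataclass
class User:
    name: str
    groups: List[Group] = field(default_factory=list)
    attached_policies: Set[str] = field(default_factory=set)


def effective_policies(user: User) -> Set[str]:
    """Union of the user's own policies and the policies of all their groups."""
    result = set(user.attached_policies)
    for group in user.groups:
        result |= group.policies
    return result


# Hypothetical groups and policy names, for illustration only.
developers = Group("Developers", {"ReadCodeRepos"})
audit = Group("Audit", {"ReadBillingReports"})

# charles belongs to two groups; fred belongs to none and only has a
# policy attached directly to the user.
charles = User("charles", groups=[developers, audit])
fred = User("fred", attached_policies={"ReadSupportTickets"})

print(sorted(effective_policies(charles)))
print(sorted(effective_policies(fred)))
```

This only models which policies reach a user; it does not model how Allow and Deny statements inside those policies are evaluated.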
![](https://i.imgur.com/GAS4z6f.png)

- IAM policy structure. A policy consists of:
    - Version: policy language version
    - Id (optional): identifier for the policy
    - Statement (required): one or more individual statements. A statement consists of:
        - Sid (optional): identifier for the statement
        - Effect: whether the statement allows or denies access (Allow, Deny)
        - Principal: account/user/role to which this policy applies
        - Action: list of actions this policy allows or denies
        - Resource: list of resources to which the actions apply
        - Condition (optional): conditions for when this policy is in effect

![](https://i.imgur.com/s9ByKEx.png)

- Password policy: in AWS, the root user and admin users can set up a custom password policy

## 4. MFA (Multi Factor Authentication):
- You want to protect your root account and IAM users
- MFA = password you know + security device you own
- Benefit of MFA: if a password is stolen or hacked, the account is not compromised

![](https://i.imgur.com/ViTbQxG.png)

- MFA options:
    - Virtual MFA device: a phone or a laptop with an authenticator app
    - Universal 2nd Factor (U2F) security key: a physical device such as a special USB device
    - Hardware key fob MFA device: a key fob that displays the MFA passcode

![](https://i.imgur.com/R1pus8I.png)

## 5. AWS Access Key, CLI and SDK:
- To access AWS, you have 3 options:
    - AWS Management Console (protected by password + MFA)
    - AWS Command Line Interface (CLI - protected by access keys)
    - AWS Software Development Kit (SDK - protected by access keys)
- See how to set up CLI access on Mac OS X in video 20

## 6. IAM roles:
- Some AWS services need to perform actions on your behalf
- To allow this, we assign permissions to AWS services with IAM roles
- Common roles:
    - EC2 Instance Roles
    - Lambda Function Roles
    - Roles for CloudFormation

![](https://i.imgur.com/u4YPO17.png)

## 7. IAM Security Tools:
- IAM Credentials Report (account-level):
    - A report that lists all your account's users and the status of their various credentials
    - How to access? IAM > Access Report > Credential Reports > Download the report
- IAM Access Advisor (user-level):
    - Access Advisor shows the service permissions granted to a user and when those services were last accessed
    - You can use this information to revise your policies
    - How to access? IAM > Users > choose a user > click on the Access Advisor tab

## 8. Best Practices & Summary
- Don't use the root account except for AWS account setup
- One physical user = one AWS user
- Assign users to groups and assign permissions to groups
- Create a strong password policy
- Use and enforce the use of MFA
- Create and use roles for giving permissions to AWS services
- Use access keys for programmatic access (CLI/SDK)
- Audit the permissions of your account using the IAM Credentials Report & IAM Access Advisor
- Never share IAM users & access keys

![](https://i.imgur.com/8A4wO7U.png)

# III. Amazon EC2:
## 1. What is EC2 ?
- EC2 = Elastic Compute Cloud = Infrastructure as a Service
- It mainly consists of the ability to:
    - Rent virtual machines (EC2)
    - Store data on virtual drives (EBS)
    - Distribute load across machines (ELB)
    - Scale the services using an auto-scaling group (ASG)
- Configuration options:
    - Operating system
    - Compute power & cores (CPU)
    - Random-access memory (RAM)
    - Storage space:
        - Network attached (EBS & EFS)
        - Hardware (EC2 Instance Store)
    - Network card: speed of the card, public IP
    - Firewall rules: security group
    - Bootstrap script (configured at first launch): EC2 User Data
- Bootstrapping means launching commands when a machine starts. The script is only run once, at the instance's first start. Bootstrapping can automate:
    - Installing updates
    - Installing software
    - Downloading files or any other tasks

## 2. EC2 Instance Types:
- AWS has the following naming convention.
  Example: m5.2xlarge
    - m: instance class
    - 5: generation (AWS improves them over time)
    - 2xlarge: size within the instance class
- Different EC2 instance types: https://aws.amazon.com/ec2/instance-types/

| Instance Type | Great For | Examples / Use cases |
| -------- | -------- | -------- |
|General Purpose|Balance between compute, memory and networking|t2.micro|
|Compute Optimized|Compute-intensive tasks that require high-performance processors|Batch processing workloads, media transcoding, high-performance web servers, scientific modeling & machine learning, dedicated gaming servers|
|Memory Optimized|Fast performance for workloads that process large data sets in memory (RAM)|High-performance databases, distributed web-scale cache stores, in-memory databases optimized for BI, applications performing real-time processing of big unstructured data|
|Storage Optimized|Storage-intensive tasks that require high, sequential read and write access to large data sets on local storage|High-frequency online transaction processing systems, databases, cache for in-memory databases, data warehousing applications, distributed systems|

![](https://i.imgur.com/uUQu71w.png)

- Find the right instance for your project: https://instances.vantage.sh/

## 3. Security Groups:
### a. Introduction
- Security groups control how traffic is allowed into or out of our EC2 instances
- Security groups only contain allow rules. Rules can reference IP ranges or other security groups
- Security groups act as a firewall on EC2 instances. They regulate:
    - Access to ports
    - Authorized IP ranges - IPv4 and IPv6
    - Control of inbound network (from others to the instance) and outbound network (from the instance to others)

![](https://i.imgur.com/EUUCeeg.png)
![](https://i.imgur.com/p2vDK7E.png)

- To troubleshoot an EC2 instance, you can connect to it over SSH

### b. Further information:
- Security groups can be attached to multiple instances
- Security groups are locked down to a region / VPC combination
- Security groups live outside the EC2 instance: if traffic is blocked, the instance never sees it
- It's good to maintain one separate security group for SSH access
- If your application is not accessible (timed out), then it's most likely a security group issue
- If your application returns a "connection refused" error, then it's an application error or the application is not launched
- By default:
    - All inbound traffic is blocked
    - All outbound traffic is authorized
- Classic ports to know:
    - 22 = SSH (Secure Shell) - log into a Linux instance
    - 21 = FTP (File Transfer Protocol) - upload files into a file share
    - 22 = SFTP (Secure File Transfer Protocol) - upload files using SSH
    - 80 = HTTP - access unsecured websites
    - 443 = HTTPS - access secured websites
    - 3389 = RDP (Remote Desktop Protocol) - log into a Windows instance

### c. Referencing other security group:
- A security group rule can reference another security group instead of an IP range, and the groups can then be assigned to different EC2 instances. That way you don't have to maintain custom IP addresses within the security groups
- Watch the use of Security Group 1 and Security Group 2 in this diagram:

![](https://i.imgur.com/CMhBkoA.png)

## 4. Purchasing Options:
### a. On-Demand Instances
- Short workloads, predictable pricing, pay by the second
- Linux or Windows: billing per second, after the first minute
- All other OS: billing per hour
- Has the highest cost but no upfront payment
- No long-term commitment

=> Recommended for **short-term** and **un-interrupted** workloads, where you can't predict how the application will behave

### b. EC2 Reserved Instances
- Up to 72% discount compared to On-Demand
- You reserve specific instance attributes (Instance Type, Region, Tenancy, OS)
- Reservation Period: 1 year or 3 years.
  A 3-year term gives a larger discount
- Payment Options:
    - No Upfront
    - Partial Upfront
    - All Upfront
- Reserved Instance's Scope: Regional or Zonal (reserve capacity in an AZ)
- You can buy and sell Reserved Instances in the Marketplace

=> Recommended for steady-state usage applications (think database)

**Convertible Reserved Instance:**
- Can change the EC2 instance type, instance family, OS, scope and tenancy
- Up to 66% discount

### c. EC2 Savings Plans
- Get a discount based on long-term usage (up to 72%, same as Reserved Instances)
- Commit to a certain amount of usage (e.g. $10/hour for 1 or 3 years)
- Usage beyond the EC2 Savings Plan is billed at the On-Demand price
- Locked to a specific instance family & AWS region
- Flexible across:
    - Instance size (e.g. m5.xlarge, m5.2xlarge)
    - OS (e.g. Linux, Windows)
    - Tenancy (host, dedicated, default)

### d. EC2 Spot Instances
- Can get a discount of up to 90% compared to On-Demand
- Instances that you can lose at any point in time if your max price is less than the current spot price
- These are the MOST cost-efficient instances in AWS
- Useful for workloads that are resilient to failure:
    - Batch jobs
    - Data analysis
    - Image processing
    - Any distributed workloads
    - Workloads with a flexible start and end time
- Not suitable for critical jobs or databases

### e. EC2 Dedicated Host
- A physical server with EC2 instance capacity fully dedicated to your use
- Allows you to address compliance requirements and use your existing server-bound software licenses (per-socket, per-core, per-VM software licenses)
- Purchasing Options:
    - On-Demand: pay per second for an active Dedicated Host
    - Reserved: 1 or 3 years (No Upfront, Partial Upfront, All Upfront)
- This is the most expensive option
- Useful for software that has a complicated licensing model (BYOL - Bring Your Own License)
- Or for companies that have strong regulatory or compliance needs

### f. EC2 Dedicated Instances:
- Instances run on hardware that's dedicated to you
- May share hardware with other instances in the same account
- You have no control over instance placement (hardware can be moved after Stop/Start)
- With EC2 Dedicated Instances, even though you have dedicated hardware, you don't have access to the low-level hardware, only to the instance

### g. EC2 Capacity Reservations:
- Reserve On-Demand instance capacity in a specific AZ (Availability Zone) for any duration
- You always have access to EC2 capacity when you need it
- No time commitment (create / cancel anytime), no billing discounts
- Combine with Regional Reserved Instances and Savings Plans to benefit from billing discounts
- You're charged at the On-Demand rate whether you run instances or not

=> Suitable for short-term, uninterrupted workloads that need to be in a specific AZ

The Resort Analogy:

![](https://i.imgur.com/oquQAGU.jpg)
![](https://i.imgur.com/q7af0rE.png)

# IV. EC2 Storage Options:
## 1. EBS Volume (Elastic Block Store):
### a. What is it ?
- An EBS (Elastic Block Store) volume is a **network drive** (think of it as a network USB stick) that you can attach to your instances while they run
- It allows your instances to persist data, even after termination. It can also serve as an additional option to expand storage, store a database, etc.
- An EBS volume can only be mounted to one instance at a time (apart from Multi-Attach on io1/io2) and is bound to a **specific availability zone**
- **Free tier:** 30 GB of free EBS storage of type SSD or Magnetic per month

### b. EBS characteristics:
- It's a network drive:
    - It uses the network to communicate with the instance, which means there might be a bit of latency
    - It can be detached from an EC2 instance and attached to another one quickly
- It's locked to an Availability Zone (AZ):
    - An EBS volume in us-east-1a cannot be attached to an instance in us-east-1b
    - To move a volume across AZs, you first need to snapshot it
- It has a provisioned capacity:
    - Size in GB
    - Performance in IOPS (I/O operations per second)

Example: one or more EBS volumes can be attached to one instance, and an EBS volume can also be left unattached

![](https://hackmd.io/_uploads/rJXArND4h.png)

### c. Delete on termination:
- Controls the EBS behavior when an EC2 instance terminates:
    - By default, the root EBS volume is deleted (attribute enabled)
    - By default, any other attached EBS volume is not deleted (attribute disabled)
- This can be controlled from the AWS console

![](https://hackmd.io/_uploads/HyiuROP43.png)

### d. Volume Types:
- EBS volumes are characterized by size / IOPS
- 6 EBS volume types:
    - gp2 / gp3 (SSD): general purpose SSD volumes that balance price and performance for a wide variety of workloads
    - io1 / io2 (SSD): highest-performance SSD volumes for mission-critical low-latency or high-throughput workloads
    - st1 (HDD): low-cost HDD volume designed for frequently accessed, throughput-intensive workloads
    - sc1 (HDD): lowest-cost HDD volume designed for less frequently accessed workloads

## 2. EBS Snapshot:
### a. What is it ?
- Make a backup (snapshot) of your EBS volume at a point in time
- It is not necessary to detach the volume to take a snapshot, but it is recommended
- You can copy snapshots across AZs or Regions

![](https://hackmd.io/_uploads/rkdgj4wEn.png)

### b. EBS snapshot features:
- EBS Snapshot Archive:
    - Move a snapshot to an archive tier that is 75% cheaper
    - Restoring from the archive takes 24 to 72 hours

![](https://hackmd.io/_uploads/SkMf34wN2.png)

- Recycle Bin for EBS Snapshots:
    - Set up rules to retain deleted snapshots so you can recover them after an accidental deletion
    - You can specify the retention period of the bin (1 day to 1 year)

![](https://hackmd.io/_uploads/SJ6fhEwE2.png)

- Fast Snapshot Restore (FSR):
    - Forces full initialization of the snapshot so there is no latency on first use (this one is costly)

See video 46 for creating a snapshot

## 3. EBS Multi-Attach:
- For the io1 and io2 families, you can attach the same EBS volume to multiple EC2 instances in the same AZ
- Each instance has full read & write permissions to the high-performance volume
- Use case:
    - Achieve higher application availability in clustered Linux applications
    - The application must manage concurrent write operations
- A Multi-Attach volume can be attached to up to 16 EC2 instances at a time
- For this to work, you must use a file system that's cluster-aware (not XFS, EXT4, etc.)

![](https://hackmd.io/_uploads/r1RNVPPE3.png)

## 4. AMI (Amazon Machine Image):
### a. What is it ?
- An AMI is a customization of an EC2 instance:
    - You add your own software, configuration, OS, monitoring, etc.
    - Faster boot / configuration time because all your software is pre-packaged
- AMIs are built for a specific AWS Region and are unique to that Region:
    - You can't launch an EC2 instance using an AMI that lives in another AWS Region
    - But you can copy the AMI to the target AWS Region and then use the copy to create your EC2 instances
- You can launch EC2 instances from:
    - A public AMI: provided by AWS
    - Your own AMI: you make and maintain it yourself
    - An AWS Marketplace AMI: made (and potentially sold) by someone else

### b. AMI Process (from an EC2 instance):
- Start an EC2 instance and customize it
- Stop the instance (for data integrity)
- Build an AMI - this will also create EBS snapshots
- Launch other instances from the AMI

![](https://hackmd.io/_uploads/r1awuBwEn.png)

## 5. EC2 Instance Store:
- Similar to EBS but with higher I/O performance
- It is physically attached to the EC2 instance
- Characteristics:
    - Better I/O performance
    - An EC2 instance store loses its data if the instance is stopped
    - Good for buffer / cache / scratch data / temporary content
    - Risk of data loss if the hardware fails

=> Use this when you need very high IOPS (I/O operations per second)

## 6. EFS (Elastic File System):
### a. What is it ?
- Managed NFS (network file system) that can be mounted on many EC2 instances and secured with a security group
- EFS works with EC2 instances in multiple AZs
- Highly available, scalable and expensive (about 3x gp2), pay per use

![](https://hackmd.io/_uploads/SkDVBdDN2.png)

### b. More information:
- Use cases: content management, web serving, data sharing, WordPress
- Uses a security group to control access to EFS
- Compatible with Linux-based AMIs (not Windows)
- The file system scales automatically, pay per use, no capacity planning

For performance and storage classes, see slides 89 and 90 in this link: https://media.datacumulus.com/aws-dva/AWS%20Certified%20Developer%20Slides%20v15.pdf

## 7. EBS vs EFS vs Instance Store:
- EBS:
    - Elastic Block Store
    - Is bound to 1 AZ and dedicated to 1 EC2 instance
    - With EBS Multi-Attach, a volume can be attached to a maximum of 16 EC2 instances in the same AZ
    - Can be accessed under EC2 Dashboard > Elastic Block Store > Volumes or Snapshots
    - Use snapshots to transfer EBS data between AZs

![](https://hackmd.io/_uploads/ByllauDEh.png)

- EFS:
    - Can be mounted on 100s of instances across multiple AZs
    - Has a much higher price point than EBS
    - Can leverage EFS-IA for cost savings
    - Use case: EFS shares website files (WordPress)
    - Only for Linux instances, not other OS instances
    - Can be accessed under: EFS Dashboard

![](https://hackmd.io/_uploads/SkjWp_PEn.png)

- Instance Store:
    - Physically attached to the EC2 instance
    - Has high I/O performance

# V. AWS Fundamentals: ELB + ASG
## 1. Scalability
- Scalability means the ability to scale a system so that it can handle a given workload without sacrificing performance or availability

### a. Vertical Scalability:
- Vertical scalability means increasing or decreasing the capacity of the instance
- For example, scaling from a t2.micro instance to a t2.large
- This is very common for non-distributed systems such as a database
- In this type of scalability you can **scale up** or **scale down**

**Call center example:**

![](https://hackmd.io/_uploads/SJybODOEh.png)

### b. Horizontal Scalability:
- Horizontal scalability means increasing the number of instances / systems for your application
- Horizontal scaling implies a distributed system, which is common for web applications / modern applications
- In this type of scalability, you can **scale out** or **scale in**

**Call center example:**

![](https://hackmd.io/_uploads/BkbuuDu43.png)

## 2. High Availability:
- High availability usually goes hand in hand with horizontal scaling
- This means running your application in at least 2 data centers (== Availability Zones)
- The goal of high availability is to survive a data center loss

**Call center example:**

![](https://hackmd.io/_uploads/S1RptwdEh.png)

## 3. Load Balancer Overview:
### a. What is it and why do we use it ?
- Load balancers are servers that forward traffic to multiple downstream servers, such as EC2 instances
- Reasons for using load balancers:
    - Spread load across multiple downstream instances
    - Expose a single point of access (DNS) to your application
    - Seamlessly handle failures of downstream instances
    - Perform regular health checks on your instances
    - Provide SSL termination (HTTPS) for your websites
    - Enforce stickiness with cookies
    - High availability across zones
    - Separate public traffic from private traffic

![](https://hackmd.io/_uploads/S1aGVduVh.png)

- Why do we use the Elastic Load Balancer (ELB) from AWS ?
    - ELB is a managed load balancer. AWS takes care of maintenance, upgrades and high availability
    - It takes minimal effort to set up
    - It's integrated with many AWS services

### b. Health Check & Security Groups:
- Load balancers perform health checks on a given port and route of an instance to make sure the instance is healthy

![](https://hackmd.io/_uploads/Sy4hBduV3.png)

- Load Balancer Security Group:
    - Anyone should be able to access the load balancer
    - Only the ELB should have access to the EC2 instances

![](https://hackmd.io/_uploads/HkEHIuOV2.png)

### c. Types of Load Balancers:
- Classic Load Balancer (v1 - old generation) - 2009:
    - Short name: CLB
    - Deprecated but still available
    - Allows: HTTP, HTTPS, TCP, SSL (secure TCP)
- Application Load Balancer (v2 - new generation) - 2016:
    - Short name: ALB
    - Allows: HTTP, HTTPS, WebSocket
- Network Load Balancer (v2 - new generation) - 2017:
    - Short name: NLB
    - Allows: TCP, TLS (secure TCP), UDP
- Gateway Load Balancer - 2020:
    - Short name: GWLB
    - Operates at layer 3 (Network Layer) - IP protocol

=> Overall, it's recommended to use the newer generation of ELB as it provides more features

=> Some ELBs can be set up as internal (private) or external (public)

## 4. Application Load Balancer (ALB):
### a. Introduction:
- ALB operates at layer 7 (HTTP) and balances HTTP traffic across groups of targets (target groups)
- Load balancing features:
    - Load balancing to multiple HTTP applications across multiple machines (target groups)
    - Load balancing to multiple applications on the same machine (containers)
    - Support for HTTP/2, WebSocket and redirects (HTTP to HTTPS for example)

### b. Routing Capabilities:
- Routing tables to different target groups:
    - Routing based on the path in the URL (example.com/users & example.com/posts)
    - Routing based on the hostname in the URL (one.example.com & other.example.com)
    - Routing based on query string and headers (example.com/users?id=123&order=false)
- Has a port mapping feature to redirect to a dynamic port in ECS

![](https://hackmd.io/_uploads/BJekIyKu42.png)

- ALBs are a great fit for micro-services & container-based applications (e.g. Docker & Amazon ECS)

### c. What are target groups ?
- Target group: a group of resources of the same type that can handle traffic. When you create a target group, you specify the set of targets that should receive traffic.
- A target group can be:
    - A group of EC2 instances
    - ECS tasks
    - Lambda functions
    - IP addresses - must be private IPs
- An ALB can route traffic to multiple target groups
- Health checks are at the target group level

![](https://hackmd.io/_uploads/BJxDJYdNn.png)

### d. Good to know:
- You get a fixed hostname with your ALB (XXX.region.elb.amazonaws.com)
- The application servers (such as EC2 instances) don't see the IP of the client directly:
    - The true IP of the client is inserted in the header X-Forwarded-For
    - We can also get the port (X-Forwarded-Port) and the protocol (X-Forwarded-Proto)
    - This is because the targets only talk to the load balancer, not to the client, so the source IP they see on the connection is the IP of the load balancer
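
As an illustration of the headers above, here is a minimal Python sketch of how an application behind the ALB might recover the client details. The function name `client_info`, the `ClientInfo` class and the sample addresses are hypothetical. The sketch assumes the ALB is the only proxy in front of the application and that it appends the client IP as the last entry of X-Forwarded-For; with other proxy chains the right entry to trust may differ.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class ClientInfo:
    ip: str
    port: Optional[int]
    proto: Optional[str]


def client_info(headers: Mapping[str, str], peer_ip: str) -> ClientInfo:
    """Recover client details from X-Forwarded-* headers, falling back to the peer IP."""
    # Header names are case-insensitive, so normalise them first.
    h = {name.lower(): value for name, value in headers.items()}

    # Assumption: the load balancer is the only proxy, so the last
    # X-Forwarded-For entry is the one it added itself.
    hops = [part.strip() for part in h.get("x-forwarded-for", "").split(",") if part.strip()]
    ip = hops[-1] if hops else peer_ip

    port = h.get("x-forwarded-port")
    return ClientInfo(
        ip=ip,
        port=int(port) if port and port.isdigit() else None,
        proto=h.get("x-forwarded-proto"),
    )


# Hypothetical request as seen by an instance behind the load balancer.
info = client_info(
    {
        "X-Forwarded-For": "203.0.113.7",
        "X-Forwarded-Port": "443",
        "X-Forwarded-Proto": "https",
    },
    peer_ip="10.0.1.25",  # private IP of the load balancer, as seen by the instance
)
print(info)
```

Without the headers, the function falls back to `peer_ip`, which is exactly the load balancer address described in the last bullet.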