# AWS Certified Solutions Architect - Associate (IAM, S3)
###### tags: `AWS`
## IAM
### IAM
IAM allows you to manage users and their level of access to the AWS console.
#### IAM features
- Centralized control of your AWS account
- Shared Access to your AWS account
- Granular Permissions
- Identity Federation
- Multifactor Authentication
- Provide temporary access for users/devices and services where necessary
- Allows you to set up your own password rotation policy
- Integrates with many different AWS services
- Supports PCI DSS Compliance
#### Key Terminology for IAM
- Users: End users such as people, employees of an organization etc.
- Groups: A collection of users. Each user in the group will inherit the permissions of the group.
- Policies: Policies are made up of documents, called policy documents. These documents are written in JSON and define what a user, group, or role is allowed to do.
- Roles: You create roles and then assign them to AWS Resources.
Tips:
- **IAM is universal**. It does not apply to regions at this time.
- The "**root account**" is simply the account created when you first set up your AWS account. It has complete Admin access.
- New users are assigned **Access Key ID & Secret Access Keys** when first created.
- **These are not the same as a password.** You cannot use the Access Key ID & Secret Access Key to log in to the console. You can use them to access AWS via the APIs and CLI.
- **You only get to view these once.** If you lose them, you have to regenerate them.
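
As a rough illustration of what a policy document looks like, here is a minimal sketch that builds one as a Python dict and prints it as JSON. The helper name `read_only_bucket_policy` and the bucket name `example-bucket` are made up for this example; the `Version`, `Statement`, `Effect`, `Action`, and `Resource` keys are the standard policy elements.

```python
import json


def read_only_bucket_policy(bucket_name: str) -> dict:
    """Build a policy document allowing read-only access to one S3 bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:ListBucket"],
                "Resource": [
                    f"arn:aws:s3:::{bucket_name}",      # the bucket itself (ListBucket)
                    f"arn:aws:s3:::{bucket_name}/*",    # objects in the bucket (GetObject)
                ],
            }
        ],
    }


# "example-bucket" is a hypothetical bucket name.
print(json.dumps(read_only_bucket_policy("example-bucket"), indent=2))
```

A document like this would be attached to a user, group, or role; nothing here talks to AWS.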
## S3
https://www.amazonaws.cn/en/s3/faqs/
### S3 Security & Encryption
- Encryption in Transit is achieved by
    - SSL/TLS
- Encryption At Rest (Server Side) is achieved by
    - S3 Managed Keys - **SSE-S3**
    - AWS Key Management Service, Managed Keys - **SSE-KMS**
    - Server Side Encryption With Customer Provided Keys - **SSE-C**
- Client Side Encryption
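
Server-side encryption is requested per upload through request headers. The sketch below only builds those headers for SSE-S3 and SSE-KMS; the function name `sse_request_headers` and the key ID in the usage line are hypothetical, and SSE-C (which needs the customer key sent with every request) is left out to keep it short.

```python
from typing import Dict, Optional


def sse_request_headers(mode: str, kms_key_id: Optional[str] = None) -> Dict[str, str]:
    """Return the PUT Object headers that ask S3 to encrypt the object at rest."""
    if mode == "SSE-S3":
        # S3 managed keys
        return {"x-amz-server-side-encryption": "AES256"}
    if mode == "SSE-KMS":
        # Keys managed in AWS KMS; the key ID is optional
        headers = {"x-amz-server-side-encryption": "aws:kms"}
        if kms_key_id is not None:
            headers["x-amz-server-side-encryption-aws-kms-key-id"] = kms_key_id
        return headers
    raise ValueError(f"unsupported mode: {mode}")


# "example-key-id" is a placeholder, not a real KMS key.
print(sse_request_headers("SSE-KMS", "example-key-id"))
```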
### S3 Versioning
#### Using Versioning with S3
- Stores all versions of an object (including all writes, even if you delete an object)
- Great backup tool
- Once enabled, Versioning **CANNOT** be disabled, only suspended.
- Integrates with Lifecycle rules
- Versioning's MFA Delete capability, which uses multi-factor authentication, can be used to provide an additional layer of security
#### S3 Lifecycle management, IA & Glacier
- Automates moving your objects between the different storage tiers
- Can be used in conjunction with versioning
- Can be applied to current versions and previous versions
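
A lifecycle rule covering both current and previous versions might look like the sketch below. It only builds the configuration dict in the shape that boto3's `put_bucket_lifecycle_configuration` accepts as `LifecycleConfiguration`; the helper name, rule ID, prefix, and day counts are assumptions chosen for illustration.

```python
def lifecycle_configuration(prefix: str) -> dict:
    """Build a lifecycle configuration that tiers objects down over time."""
    return {
        "Rules": [
            {
                "ID": "tier-down-example",  # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},
                # Current versions: Standard -> IA after 30 days -> Glacier after 90
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 90, "StorageClass": "GLACIER"},
                ],
                # Previous versions (needs versioning): Glacier 30 days after
                # they stop being the current version
                "NoncurrentVersionTransitions": [
                    {"NoncurrentDays": 30, "StorageClass": "GLACIER"},
                ],
            }
        ]
    }


config = lifecycle_configuration("logs/")
print(config["Rules"][0]["Transitions"])
```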
#### S3 Object Lock & Glacier Vault Lock
- Use S3 Object Lock to store objects using a write once, read many (WORM) model. It can help you prevent objects from being deleted or modified for a fixed amount of time or indefinitely.
- Use it to meet regulatory requirements that require WORM storage, or to add an extra layer of protection against object changes and deletion.

1. S3 Object Lock modes: Governance mode
   In governance mode, users can't overwrite or delete an object version or alter its lock settings unless they have special permissions.
2. S3 Object Lock modes: Compliance mode
   In compliance mode, a protected object version can't be overwritten or deleted by any user, including the root user in your AWS account.
> Notes:
> - A **Retention Period** protects an object version for a fixed amount of time.
> - A **Legal Hold** also prevents an object version from being overwritten or deleted, but it has no retention period and stays in effect until it is removed.
> - **S3 Glacier Vault Lock** allows you to easily deploy and enforce compliance controls for individual S3 Glacier vaults with a Vault Lock policy. You can **specify controls such as WORM in a Vault Lock policy and lock the policy from future edits**. Once locked, the policy can no longer be changed.
#### Tips:
1. Use S3 Object Lock to store objects using a write once, read many (WORM) model.
2. Object locks can be on individual objects or applied across the bucket as a whole.
3. Object locks come in two modes: **governance mode** and **compliance mode**
### S3 Performance
#### What are S3 prefixes
- [x] mybucketname/folder1/subfolder1/myfile.jpg > **/folder1/subfolder1**
- [x] mybucketname/folder2/subfolder1/myfile.jpg > **/folder2/subfolder1**
- [x] mybucketname/folder3/myfile.jpg > **/folder3**
- [x] mybucketname/folder4/subfolder4/myfile.jpg > **/folder4/subfolder4**
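
In other words, the prefix is whatever sits between the bucket name and the object name. A small sketch of that rule (the function name `s3_prefix` is made up for this example):

```python
def s3_prefix(path: str) -> str:
    """Return the prefix of 'bucket/some/folders/object' as '/some/folders'."""
    _bucket, _, key = path.partition("/")    # drop the bucket name
    folders, _, _name = key.rpartition("/")  # drop the object name
    return "/" + folders if folders else ""


for path in (
    "mybucketname/folder1/subfolder1/myfile.jpg",
    "mybucketname/folder3/myfile.jpg",
):
    print(path, ">", s3_prefix(path))
```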
#### S3 Performance by prefixes
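S3 supports 3,500 PUT/COPY/POST/DELETE and 5,500 GET/HEAD requests per second per prefix, so spreading reads across more prefixes increases throughput:
1. Two prefixes -> 11,000 GET requests/s
2. Four prefixes -> 22,000 GET requests/s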
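
These figures follow from the per-prefix limit of 5,500 GET/HEAD requests per second (3,500 for PUT/COPY/POST/DELETE). The arithmetic, as a minimal sketch with made-up function names:

```python
GET_PER_PREFIX = 5_500  # GET/HEAD requests per second per prefix
PUT_PER_PREFIX = 3_500  # PUT/COPY/POST/DELETE requests per second per prefix


def max_get_rate(prefixes: int) -> int:
    """Aggregate read rate when reads are spread evenly across prefixes."""
    return prefixes * GET_PER_PREFIX


def max_put_rate(prefixes: int) -> int:
    """Aggregate write rate when writes are spread evenly across prefixes."""
    return prefixes * PUT_PER_PREFIX


print(max_get_rate(2), max_get_rate(4))
```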
#### S3 Limitations when using KMS
- If you are using SSE-KMS to encrypt your object in S3, you must keep in mind the **KMS limits**.
- When you **upload** a file, you will call **GenerateDataKey** in the KMS API.
- When you **download** a file, you will call **Decrypt** in the KMS API.
- [x] Uploading/downloading will count toward the **KMS quota**.
- [x] The quota is Region-specific: it is either **5,500, 10,000 or 30,000** requests per second.
- [x] Currently, you **cannot** request a quota increase for KMS.
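
A back-of-the-envelope check of whether a workload fits under the KMS quota, assuming (as a simplification) one `GenerateDataKey` call per upload and one `Decrypt` call per download. The function names and the request rates in the usage lines are illustrative only.

```python
def kms_calls_per_second(uploads_per_s: int, downloads_per_s: int) -> int:
    """Uploads call GenerateDataKey, downloads call Decrypt; both count."""
    return uploads_per_s + downloads_per_s


def within_kms_quota(uploads_per_s: int, downloads_per_s: int, quota: int) -> bool:
    """True if the combined KMS request rate stays at or below the quota."""
    return kms_calls_per_second(uploads_per_s, downloads_per_s) <= quota


print(within_kms_quota(3_000, 2_000, quota=5_500))
print(within_kms_quota(4_000, 2_000, quota=5_500))
```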
#### Multipart uploads
- Recommended for files **over 100 MB**
- Required for files **over 5 GB**
- Parallelize uploads (increases **efficiency**)
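
A multipart upload splits the file into numbered parts that can be sent in parallel. The sketch below only plans the part boundaries; `plan_parts` and the 100 MiB default are assumptions for illustration, while the 5 MiB minimum part size and the 10,000-part cap are S3 limits.

```python
from typing import List, Tuple

MIB = 1024 * 1024
MIN_PART_SIZE = 5 * MIB  # every part except the last must be at least this big
MAX_PARTS = 10_000       # S3 limit on parts per upload


def plan_parts(file_size: int, part_size: int = 100 * MIB) -> List[Tuple[int, int, int]]:
    """Return (part_number, start, end_exclusive) for each part of the file."""
    if file_size <= 0:
        raise ValueError("file_size must be positive")
    if part_size < MIN_PART_SIZE:
        raise ValueError("part_size is below the 5 MiB minimum")
    parts = []
    start = 0
    number = 1  # part numbers start at 1
    while start < file_size:
        end = min(start + part_size, file_size)
        parts.append((number, start, end))
        start = end
        number += 1
    if len(parts) > MAX_PARTS:
        raise ValueError("too many parts; use a larger part_size")
    return parts


for part in plan_parts(250 * MIB):
    print(part)
```

Each tuple could then be handed to a separate worker, which is where the parallelism comes from.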

#### S3 Byte-Range Fetches
- Parallelize **downloads** by specifying byte ranges.
- If there's a failure in the download, it's only for a specific byte range.
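
Byte-range fetches use the standard HTTP `Range` header, whose end offset is inclusive. A minimal sketch that produces the header values for each chunk (the function name and sizes are made up):

```python
from typing import List


def byte_ranges(object_size: int, chunk_size: int) -> List[str]:
    """Return one HTTP Range header value per chunk of the object."""
    ranges = []
    for start in range(0, object_size, chunk_size):
        end = min(start + chunk_size, object_size) - 1  # Range ends are inclusive
        ranges.append(f"bytes={start}-{end}")
    return ranges


print(byte_ranges(2500, 1000))
```

If one range fails, only that range needs to be requested again.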

- Features

#### S3 Select
S3 Select enables applications to retrieve only a subset of data from an object by using simple SQL expressions. **By retrieving only the data needed by your application**, you can **achieve drastic performance increases**, in many cases as much as a 400% improvement.

- [x] S3 Select is used to **retrieve only a subset of data** from an object by using simple SQL expressions.
- [x] Get data by rows or columns using simple SQL expressions
- [x] Save money on **data transfer** and increase speed.
Example: querying rows from a zipped CSV file.
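
For the zipped CSV case, a request might be assembled as in the sketch below. It only builds the keyword arguments in the shape that boto3's `select_object_content` expects; the bucket, key, and column names are hypothetical, and no call to AWS is made.

```python
def select_request(bucket: str, key: str, expression: str) -> dict:
    """Build S3 Select parameters for a gzip-compressed CSV with a header row."""
    return {
        "Bucket": bucket,
        "Key": key,
        "ExpressionType": "SQL",
        "Expression": expression,
        "InputSerialization": {
            "CSV": {"FileHeaderInfo": "USE"},  # refer to columns by header name
            "CompressionType": "GZIP",
        },
        "OutputSerialization": {"CSV": {}},
    }


# Bucket, key, and columns below are placeholders for illustration.
params = select_request(
    "example-bucket",
    "data/customers.csv.gz",
    "SELECT s.name, s.city FROM S3Object s WHERE s.city = 'Seattle'",
)
print(params["Expression"])
```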
#### Glacier select
Glacier Select allows you to run SQL queries directly against data stored in Glacier.
#### AWS Organizations & Consolidated billing
- Advantages of Consolidated billing
    - One bill for multiple AWS accounts
    - Very easy to track charges and allocate costs
    - Volume pricing discount
#### Sharing S3 buckets across accounts
- Using Bucket Policies & IAM (applies across the entire bucket). Programmatic Access Only
- Using Bucket ACLs & IAM (individual objects). Programmatic Access Only
- Cross-account IAM Roles. Programmatic AND Console access.
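
For the bucket policy option, the policy names the other account as the principal. A minimal sketch, where the helper name, the bucket name, and the 12-digit account ID are placeholders:

```python
import json


def cross_account_read_policy(bucket: str, account_id: str) -> dict:
    """Build a bucket policy that lets another AWS account read the bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "CrossAccountRead",
                "Effect": "Allow",
                # The other account; its own IAM policies must also allow the access
                "Principal": {"AWS": f"arn:aws:iam::{account_id}:root"},
                "Action": ["s3:GetObject", "s3:ListBucket"],
                "Resource": [
                    f"arn:aws:s3:::{bucket}",
                    f"arn:aws:s3:::{bucket}/*",
                ],
            }
        ],
    }


# "111122223333" is a dummy account ID.
print(json.dumps(cross_account_read_policy("example-bucket", "111122223333"), indent=2))
```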
#### Cross region replication
- Versioning must be enabled on both the source and destination buckets.
- Files in an existing bucket are not replicated automatically.
- All subsequent updated files will be replicated automatically.
- Delete markers are not replicated.
- Deleting individual versions or delete markers will not be replicated.
- Understand what Cross Region Replication is at a high level.
#### S3 transfer Acceleration
It utilises the **CloudFront Edge Network** to accelerate your uploads to S3.
http://s3-accelerate-speedtest.s3-accelerate.amazonaws.com/en/accelerate-speed-comparsion.html
#### AWS DataSync

- Used to move **large amounts** of data from on-premises to AWS.
- Used with **NFS-** and **SMB-** compatible file systems.
- **Replication** can be done hourly, daily, or weekly.
- Install the **DataSync agent** to start the replication.
- Can be used to replicate **EFS to EFS**.
#### CloudFront
CloudFront is AWS's content delivery network (CDN). A CDN is a system of distributed servers (a network) that delivers webpages and other web content to users based on their geographic location.
##### Key Terminology
- Edge Location - Where content will be cached. This is separate to an AWS region/AZ.
- Origin - This is the origin of all the files that the CDN will distribute.
- Distribution - This is the name given to the CDN, which consists of a collection of Edge Locations.
- Web Distribution - Typically used for websites.
- RTMP - Used for Media streaming.
> Notes:
> - Edge locations are not just READ only; you can also WRITE to them.
> - Objects are cached for the life of TTL.
> - You can clear cached objects, but you will be charged.
#### CloudFront signed URLs & cookies
1. A signed URL is for individual files.
**1 file = 1 URL**
2. A signed cookie is for multiple files.
**1 cookie = multiple files**

When we create a signed URL or signed cookie, we attach a policy.
The policy can include:
- URL expiration
- IP ranges
- Trusted signers (which AWS accounts can create **signed URLs**)
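
A custom policy carrying an expiration and an IP range might be built as in the sketch below. The `Statement`, `Resource`, `DateLessThan`, `AWS:EpochTime`, `IpAddress`, and `AWS:SourceIp` keys follow CloudFront's custom policy format; the function name, distribution domain, timestamp, and CIDR range are placeholders. Signing the policy with the private key is not shown.

```python
import json
from typing import Optional


def custom_policy(resource_url: str, expires_epoch: int,
                  source_ip_cidr: Optional[str] = None) -> str:
    """Return a CloudFront custom policy as compact JSON (no whitespace)."""
    condition = {"DateLessThan": {"AWS:EpochTime": expires_epoch}}  # URL expiration
    if source_ip_cidr is not None:
        condition["IpAddress"] = {"AWS:SourceIp": source_ip_cidr}   # allowed IP range
    policy = {"Statement": [{"Resource": resource_url, "Condition": condition}]}
    return json.dumps(policy, separators=(",", ":"))


# Domain, timestamp, and CIDR below are dummy values.
print(custom_policy("https://d111111abcdef8.cloudfront.net/videos/*",
                    1767225600, "192.0.2.0/24"))
```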

##### CloudFront signed URL
- Can have different origins. Does not have to be **EC2**.
- Key-pair is account wide and managed by the root user.
- Can utilize **caching** features.
- Can filter by date, path, IP address, expiration, etc.
##### S3 signed URL
- Issues a request as the **IAM user** who creates the presigned URL.
- Limit **lifetime**.
Tips:
- [x] Use signed **URLs/cookies** when you want to secure content so that only the people you authorize are able to access it.
- [x] A signed URL is for individual files.
**1 file = 1 URL**
- [x] A signed cookie is for multiple files.
**1 cookie = multiple files**
- [x] If your origin is EC2, then use CloudFront.
- [x] If your origin is S3 and you only need to share a single file, use an S3 signed URL.
#### Snowball
Snowball is a petabyte-scale data transport solution that uses secure appliances to transfer large amounts of data into and out of AWS.
Tips:
- Understand what Snowball is
- Snowball can
    - Import to S3
    - Export from S3
#### Storage Gateway
Storage Gateway is a service that connects an **on-premises software appliance with cloud-based storage** to provide seamless and secure integration between an organization's on-premises IT environment and AWS's storage infrastructure.

The three different types of Storage Gateway:
- File Gateway (NFS & SMB)
- Volume Gateway (iSCSI)
    - Stored Volumes
    - Cached Volumes
- Tape Gateway (VTL)

Tips:
- File Gateway
    - For flat files, stored directly on S3.
- Volume Gateway
    - Stored Volumes - Entire dataset is stored on site and is asynchronously backed up to S3.
    - Cached Volumes - Entire dataset is stored on S3 and the most frequently accessed data is cached on site.
- Gateway Virtual Tape Library
#### Athena vs Macie
##### Athena
Interactive query service which enables you to analyse and query data located in S3 using standard SQL
- Serverless, nothing to provision, pay per query / per TB scanned
- No need to set up complex Extract/Transform/Load (ETL) processes
- Works directly with data stored in S3
##### Athena is Used for
- Query log files stored in S3, e.g. ELB logs, S3 access logs etc.
- Generate business reports on data stored in S3.
- Analyse AWS cost and usage reports.
- Run queries on click-stream data.
##### Macie
What is PII (Personally Identifiable Information)?
- Personal data used to establish an individual's identity.
- This data could be exploited by criminals, used in identity theft and financial fraud.
- Home address, email address, SSN
- Passport number, driver's license number.
- D.O.B, phone number, bank account, credit card number.
Macie is a security service which uses Machine Learning and NLP (Natural Language Processing) to discover, classify and protect sensitive data stored in S3.
- Uses AI to recognise if your S3 objects contain sensitive data such as PII
- Dashboards, reporting and alerts
- Works directly with data stored in S3
- Can also analyze CloudTrail logs
- Great for PCI-DSS and preventing ID theft
Tips:
**Athena**(secret query)
- Athena is an interactive query service
- It allows you to query data located in S3 using standard SQL
- Serverless
- Commonly used to analyse log data stored in S3
**Macie**(security service for PII)
- Macie uses AI to analyse data in S3 and helps identify PII
- Can also be used to analyse CloudTrail logs for suspicious API activity
- Includes Dashboards, Reports and Alerting
- Great for PCI-DSS compliance and preventing ID theft.
test:
- Virtual-hosted style puts your bucket name 1st, s3 2nd, and the region 3rd.
- Path style puts s3 1st, the region 2nd, and your bucket name as the first segment of the path.
- Legacy Global Endpoint has no region.
- S3 static hosting can use your own domain, or your bucket name 1st, s3-website 2nd, followed by the region.

AWS is in the process of phasing out Path style, and support for the Legacy Global Endpoint format is limited and discouraged. However, it is still useful to be able to recognize them should they show up in logs. https://docs.aws.amazon.com/AmazonS3/latest/dev/VirtualHosting.html
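
To make the endpoint formats easier to recognize, here is a minimal sketch that prints each one for the same object. The function name, bucket, region, and key are placeholders; note that in path style the bucket name is the first path segment, and that some regions use `s3-website.` (dot) rather than `s3-website-` (dash) in the website endpoint.

```python
def s3_urls(bucket: str, region: str, key: str) -> dict:
    """Return the same object's URL in each S3 endpoint style."""
    return {
        # bucket 1st, s3 2nd, region 3rd
        "virtual_hosted": f"https://{bucket}.s3.{region}.amazonaws.com/{key}",
        # s3 1st, region 2nd, bucket as the first path segment
        "path": f"https://s3.{region}.amazonaws.com/{bucket}/{key}",
        # no region in the hostname
        "legacy_global": f"https://{bucket}.s3.amazonaws.com/{key}",
        # website endpoints are HTTP only; dash form shown here
        "static_website": f"http://{bucket}.s3-website-{region}.amazonaws.com/{key}",
    }


for style, url in s3_urls("example-bucket", "us-east-1", "index.html").items():
    print(f"{style}: {url}")
```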
