---
title: Parallel and Distributed Systems Lab 5
---
<h1 style='border: none'><center>Parallel and Distributed Systems Lab 5</center></h1>
<h2 style='border: none'><center>Amazon Web Services (AWS)<br>S3 Storage and Data Management</center></h2>
<h5><center>The Islamic University of Gaza<br>Engineering Faculty<br>Department of Computer Engineering</center></h5>
<h6>Author: Usama R. Al Zayan<span style="float:right">2022/10/28</span></h6>
---
## Introduction
* Amazon S3 is one of the main building blocks of AWS
* It’s advertised as “infinitely scaling” storage
* It’s widely popular and deserves its own section
* Many websites use Amazon S3 as a backbone
* Many AWS services use Amazon S3 for integration as well
## Amazon S3 Overview
### Amazon S3 - Buckets
* Amazon S3 allows people to store objects (files) in “buckets” (directories)
* Buckets must have a **globally unique name**
* Buckets are defined at the region level
* Naming convention
• No uppercase
• No underscore
• 3-63 characters long
• Not an IP
• Must start with lowercase letter or number
### Amazon S3 - Objects
* Objects (files) have a Key
* The <span style="color:#33ccff" >key</span> is the **FULL** path:
• s3://my-bucket/<span style="color:#33ccff" >my_file.txt</span>
• s3://my-bucket/<span style="color:#33ccff" >my_folder1/another_folder/my_file.txt</span>
* The key is composed of <span style="color:#ff8000" >prefix</span> + <span style="color:#2eb82e" >object name</span>
• s3://my-bucket/<span style="color:#ff8000" >my_folder1/another_folder/</span><span style="color:#2eb82e" >my_file.txt</span>
* There’s no concept of “directories” within buckets
(although the UI will trick you into thinking otherwise)
* Just keys with very long names that contain slashes (“/”) – see the sketch after this list
* Object values are the content of the body:
• Max Object Size is 5TB (5000GB)
• If uploading more than 5GB, must use “multi-part upload”
* Metadata (list of text key / value pairs – system or user metadata)
* Tags (Unicode key / value pair – up to 10) – useful for security / lifecycle
* Version ID (if versioning is enabled)
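
To make the flat-key model concrete, below is a minimal sketch using the `boto3` Python SDK (the bucket and key names are hypothetical, and AWS credentials are assumed to be configured); the “folder” part of the key is nothing more than a prefix string:

```python
import boto3

s3 = boto3.client("s3")
bucket = "my-bucket"  # hypothetical bucket name

# The "folders" below are just part of the key string; S3 stores flat keys.
s3.put_object(
    Bucket=bucket,
    Key="my_folder1/another_folder/my_file.txt",
    Body=b"hello from lab 5",
)

# Listing with Prefix= filters keys by their leading characters,
# which is what the console presents as folders.
resp = s3.list_objects_v2(Bucket=bucket, Prefix="my_folder1/another_folder/")
for obj in resp.get("Contents", []):
    print(obj["Key"])
```
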
### Amazon S3 - Versioning
* You can version your files in Amazon S3
* It is enabled at the **bucket level**
* Same key overwrite will increment the “version”: 1, 2, 3….
* It is best practice to version your buckets
• Protect against unintended deletes (ability to restore a version)
• Easy roll back to previous version
* **Notes**:
• Any file that is not versioned prior to enabling versioning will have version “null”
• Suspending versioning does not delete the previous versions
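
A minimal `boto3` sketch of enabling versioning and listing version IDs (the bucket and key names are hypothetical):

```python
import boto3

s3 = boto3.client("s3")
bucket = "my-bucket"  # hypothetical bucket name

# Versioning is enabled at the bucket level.
s3.put_bucket_versioning(
    Bucket=bucket,
    VersioningConfiguration={"Status": "Enabled"},
)

# Overwriting the same key now creates a new version instead of replacing it.
s3.put_object(Bucket=bucket, Key="notes.txt", Body=b"version 1")
s3.put_object(Bucket=bucket, Key="notes.txt", Body=b"version 2")

# Objects uploaded before versioning was enabled show up with VersionId "null".
versions = s3.list_object_versions(Bucket=bucket, Prefix="notes.txt")
for v in versions.get("Versions", []):
    print(v["Key"], v["VersionId"], v["IsLatest"])
```
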
### Amazon S3 - Security
* **User based**
• IAM policies - which API calls should be allowed for a specific user from the IAM console
* **Resource Based**
• Bucket Policies - bucket wide rules from the S3 console - allows cross account
• Object Access Control List (ACL) – finer grain
• Bucket Access Control List (ACL) – less common
* **Note**: an IAM principal can access an S3 object if
• the user IAM permissions allow it OR the resource policy ALLOWS it
• AND there’s no explicit DENY
### Amazon S3 Bucket Policies
* JSON based policies
• Resources: buckets and objects
• Actions: Set of API to Allow or Deny
• Effect: Allow / Deny
• Principal: The account or user to apply the policy to
* Use an S3 bucket policy to:
• Grant public access to the bucket
• Force objects to be encrypted at upload
• Grant access to another account (Cross Account)
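
As an illustration, here is a hedged `boto3` sketch that attaches a public-read policy to a bucket (the bucket name is a placeholder, and the bucket’s Block Public Access settings must also allow public policies for this to take effect):

```python
import json

import boto3

s3 = boto3.client("s3")
bucket = "my-bucket"  # hypothetical bucket name

# The policy elements mirror the list above: Effect, Principal, Action, Resource.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "PublicRead",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": f"arn:aws:s3:::{bucket}/*",
        }
    ],
}

# Bucket policies are passed to S3 as a JSON string.
s3.put_bucket_policy(Bucket=bucket, Policy=json.dumps(policy))
```
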
## Amazon S3 Replication
* Must enable versioning in source and destination
* Cross Region Replication (CRR)
* Same Region Replication (SRR)
* Buckets can be in different accounts
* Copying is asynchronous
* Must give proper IAM permissions to S3
* **CRR - Use cases**: compliance, lower latency access, replication across accounts
* **SRR – Use cases**: log aggregation, live replication between production and test accounts
### Amazon S3 Replication – Notes
* After activating, only new objects are replicated
* Optionally, you can replicate existing objects using S3 Batch Replication
• Replicates existing objects and objects that failed replication
* For DELETE operations:
• Can replicate delete markers from source to target (optional setting)
• Deletions with a version ID are not replicated (to avoid malicious deletes)
* There is no “chaining” of replication
• If bucket 1 has replication into bucket 2, which has replication into bucket 3
• Then objects created in bucket 1 are not replicated to bucket 3
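
A rough `boto3` sketch of configuring replication on a source bucket (the bucket names, account ID, and role ARN are placeholders; both buckets must already exist with versioning enabled, and the role must let S3 read the source and write the destination):

```python
import boto3

s3 = boto3.client("s3")

source_bucket = "my-source-bucket"  # hypothetical names
replication_role = "arn:aws:iam::123456789012:role/s3-replication-role"

s3.put_bucket_replication(
    Bucket=source_bucket,
    ReplicationConfiguration={
        "Role": replication_role,
        "Rules": [
            {
                "ID": "replicate-everything",
                "Status": "Enabled",
                "Priority": 1,
                "Filter": {},  # empty filter = replicate all new objects
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {"Bucket": "arn:aws:s3:::my-destination-bucket"},
            }
        ],
    },
)
```
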
## How to create Amazon S3 Buckets
1. Search for `S3` service in the top search bar.

2. Click `Create bucket` Button.

3. Enter `Bucket name` (Bucket name must be globally unique and must not contain spaces or uppercase letters) and Choose `AWS Region`.

4. Click `Create bucket` Button.

5. Now you can upload objects (files) and create folders inside this bucket as needed.

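The same steps can also be scripted. A minimal `boto3` sketch, with a placeholder bucket name and region:

```python
import boto3

s3 = boto3.client("s3", region_name="eu-west-1")  # placeholder region

# Bucket names must be globally unique, lowercase, 3-63 characters, no underscores.
bucket = "pds-lab5-example-bucket"  # placeholder name

# Outside us-east-1 the region is passed as a LocationConstraint;
# in us-east-1 the CreateBucketConfiguration argument must be omitted.
s3.create_bucket(
    Bucket=bucket,
    CreateBucketConfiguration={"LocationConstraint": "eu-west-1"},
)
```
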
## How to Upload And Download Files From AWS S3
Depending on the framework you are using, search for how to upload and download files from AWS S3, for example (Python): [How to Upload And Download Files From AWS S3 Using Python](https://towardsdatascience.com/how-to-upload-and-download-files-from-aws-s3-using-python-2022-4c9b787b15f2)
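
As a minimal illustration (placeholder bucket, key, and file names; credentials assumed to be configured):

```python
import boto3

s3 = boto3.client("s3")
bucket = "my-bucket"  # hypothetical bucket name

# Upload a local file; upload_file switches to multi-part upload
# automatically for large files.
s3.upload_file("report.pdf", bucket, "reports/report.pdf")

# Download the object back to a local path.
s3.download_file(bucket, "reports/report.pdf", "report_copy.pdf")
```
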
## [Amazon S3 Storage Classes](https://aws.amazon.com/s3/storage-classes/)
* Amazon S3 Standard - General Purpose
* Amazon S3 Standard-Infrequent Access (IA)
* Amazon S3 One Zone-Infrequent Access
* Amazon S3 Glacier Instant Retrieval
* Amazon S3 Glacier Flexible Retrieval
* Amazon S3 Glacier Deep Archive
* Amazon S3 Intelligent Tiering
* Can move between classes manually or using S3 Lifecycle configurations
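
The storage class can be chosen per object at upload time; a small `boto3` sketch with placeholder names:

```python
import boto3

s3 = boto3.client("s3")
bucket = "my-bucket"  # hypothetical bucket name

# Store this object directly in Standard-IA instead of the default Standard class.
s3.upload_file(
    "backup.zip", bucket, "backups/backup.zip",
    ExtraArgs={"StorageClass": "STANDARD_IA"},
)
```
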
### S3 Durability and Availability
* **Durability:**
• High durability (99.999999999%, 11 9’s) of objects across multiple AZs
• If you store 10,000,000 objects with Amazon S3, you can on average expect to incur a loss of a single object once every 10,000 years
• Same for all storage classes
* **Availability:**
• Measures how readily available a service is
• Varies depending on storage class
• Example: S3 standard has 99.99% availability = not available 53 minutes a year
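
Both figures follow directly from the percentages; a quick back-of-the-envelope check:

```python
# Availability: 99.99% uptime leaves 0.01% of a year unavailable.
minutes_per_year = 365 * 24 * 60
print((1 - 0.9999) * minutes_per_year)  # ~52.6 minutes per year

# Durability: 11 nines means an annual loss probability of about 1e-11 per object.
objects = 10_000_000
expected_losses_per_year = objects * 1e-11
print(1 / expected_losses_per_year)  # one lost object every ~10,000 years
```
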
### S3 Standard – General Purpose
* 99.99% Availability
* Used for frequently accessed data
* Low latency and high throughput
* Sustain 2 concurrent facility failures
* Use Cases: Big Data analytics, mobile & gaming applications, content distribution
### S3 Storage Classes – Infrequent Access
* For data that is less frequently accessed, but requires rapid access when needed
* Lower cost than S3 Standard
#### Amazon S3 Standard-Infrequent Access (S3 Standard-IA)
* 99.9% Availability
* Use cases: Disaster Recovery, backups
#### Amazon S3 One Zone-Infrequent Access (S3 One Zone-IA)
* High durability (99.999999999%) in a single AZ; data lost when AZ is destroyed
* 99.5% Availability
* Use Cases: Storing secondary backup copies of on-premises data, or data you can recreate
### Amazon S3 Glacier Storage Classes
* Low-cost object storage meant for archiving / backup
* Pricing: price for storage + object retrieval cost
#### Amazon S3 Glacier Instant Retrieval
* Millisecond retrieval, great for data accessed once a quarter
* Minimum storage duration of 90 days
#### Amazon S3 Glacier Flexible Retrieval (formerly Amazon S3 Glacier):
* Expedited (1 to 5 minutes), Standard (3 to 5 hours), Bulk (5 to 12 hours) – free
* Minimum storage duration of 90 days
#### Amazon S3 Glacier Deep Archive – for long term storage:
* Standard (12 hours), Bulk (48 hours)
* Minimum storage duration of 180 days
## S3 Intelligent-Tiering
* Small monthly monitoring and auto-tiering fee
* Moves objects automatically between Access Tiers based on usage
* There are no retrieval charges in S3 Intelligent-Tiering
* Frequent Access tier (automatic): default tier
* Infrequent Access tier (automatic): objects not accessed for 30 days
* Archive Instant Access tier (automatic): objects not accessed for 90 days
* Archive Access tier (optional): configurable from 90 days to 700+ days
* Deep Archive Access tier (optional): configurable from 180 days to 700+ days
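
A tentative `boto3` sketch of enabling the optional archive tiers on a bucket (placeholder names; objects must still be stored in the INTELLIGENT_TIERING storage class for these tiers to apply):

```python
import boto3

s3 = boto3.client("s3")
bucket = "my-bucket"  # hypothetical bucket name

# Opt in to the Archive Access and Deep Archive Access tiers.
s3.put_bucket_intelligent_tiering_configuration(
    Bucket=bucket,
    Id="archive-tiers",
    IntelligentTieringConfiguration={
        "Id": "archive-tiers",
        "Status": "Enabled",
        "Tierings": [
            {"Days": 90, "AccessTier": "ARCHIVE_ACCESS"},
            {"Days": 180, "AccessTier": "DEEP_ARCHIVE_ACCESS"},
        ],
    },
)
```
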
## S3 Storage Classes Comparison

## S3 – Moving between storage classes
* You can transition objects between storage classes
* For infrequently accessed objects, move them to STANDARD_IA
* For archive objects you don’t need in real-time, use GLACIER or DEEP_ARCHIVE
* Moving objects can be automated using a lifecycle configuration
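
A manual transition can also be done from code by copying an object over itself with a new storage class; a minimal `boto3` sketch with placeholder names:

```python
import boto3

s3 = boto3.client("s3")
bucket = "my-bucket"  # hypothetical bucket and key names

# Copy the object onto itself, changing only its storage class.
s3.copy_object(
    Bucket=bucket,
    Key="logs/2022-10.log",
    CopySource={"Bucket": bucket, "Key": "logs/2022-10.log"},
    StorageClass="GLACIER",
)
```
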
## S3 Lifecycle Rules
* **Transition actions**: define when objects are transitioned to another storage class.
• Move objects to Standard IA class 60 days after creation
• Move to Glacier for archiving after 6 months
* **Expiration actions**: configure objects to expire (delete) after some time
• Access log files can be set to delete after 365 days
• Can be used to delete old versions of files (if versioning is enabled)
• Can be used to delete incomplete multi-part uploads
* Rules can be created for a certain prefix (ex - s3://mybucket/mp3/*)
* Rules can be created for certain object tags (ex - Department: Finance)
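
A `boto3` sketch of a lifecycle configuration along the lines of the rules above (placeholder bucket name and prefix):

```python
import boto3

s3 = boto3.client("s3")
bucket = "my-bucket"  # hypothetical bucket name

# One rule: transition after 60/180 days, expire after 365 days,
# and clean up incomplete multi-part uploads.
s3.put_bucket_lifecycle_configuration(
    Bucket=bucket,
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "mp3-archive-and-expire",
                "Status": "Enabled",
                "Filter": {"Prefix": "mp3/"},
                "Transitions": [
                    {"Days": 60, "StorageClass": "STANDARD_IA"},
                    {"Days": 180, "StorageClass": "GLACIER"},
                ],
                "Expiration": {"Days": 365},
                "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
            }
        ]
    },
)
```
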
###### tags: `Parallel and Distributed Systems` `Cloud computing` `IUG` `Computer Engineering`
<center>End Of Lab 5</center>