---
title: Parallel and Distributed Systems Lab 5
---

<h1 style='border: none'><center>Parallel and Distributed Systems Lab 5</center></h1>
<h2 style='border: none'><center>Amazon Web Services (AWS)<br>S3 Storage and Data Management</center></h2>
<h5><center>The Islamic University of Gaza<br>Engineering Faculty<br>Department of Computer Engineering</center></h5>
<h6>Author: Usama R. Al Zayan<span style="float:right">2022/10/28</span></h6>

---

## Introduction

* Amazon S3 is one of the main building blocks of AWS
* It’s advertised as “infinitely scaling” storage
* It’s widely popular and deserves its own section
* Many websites use Amazon S3 as a backbone
* Many AWS services use Amazon S3 as an integration as well

## Amazon S3 Overview

### Amazon S3 - Buckets

* Amazon S3 allows people to store objects (files) in “buckets” (directories)
* Buckets must have a **globally unique name**
* Buckets are defined at the region level
* Naming convention
  * No uppercase
  * No underscores
  * 3-63 characters long
  * Must not be an IP address
  * Must start with a lowercase letter or a number

### Amazon S3 - Objects

* Objects (files) have a Key
* The <span style="color:#33ccff">key</span> is the **FULL** path:
  * s3://my-bucket/<span style="color:#33ccff">my_file.txt</span>
  * s3://my-bucket/<span style="color:#33ccff">my_folder1/another_folder/my_file.txt</span>
* The key is composed of <span style="color:#ff8000">prefix</span> + <span style="color:#2eb82e">object name</span>
  * s3://my-bucket/<span style="color:#ff8000">my_folder1/another_folder/</span><span style="color:#2eb82e">my_file.txt</span>
* There’s no concept of “directories” within buckets (although the UI will trick you into thinking otherwise)
* Just keys with very long names that contain slashes (“/”)
* Object values are the content of the body:
  * Max object size is 5 TB (5000 GB)
  * If uploading more than 5 GB, you must use a “multi-part upload”
* Metadata (list of text key / value pairs – system or user metadata)
* Tags (Unicode key / value pairs – up to 10) – useful for security / lifecycle
* Version ID (if versioning is enabled)

### Amazon S3 - Versioning

* You can version your files in Amazon S3
* It is enabled at the **bucket level**
* Overwriting the same key increments the “version”: 1, 2, 3, …
* It is best practice to version your buckets
  * Protects against unintended deletes (ability to restore a version)
  * Easy rollback to a previous version
* **Notes**:
  * Any file that is not versioned prior to enabling versioning will have version “null”
  * Suspending versioning does not delete the previous versions
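To see versioning in action from code, here is a minimal Python (boto3) sketch. It assumes boto3 is installed and AWS credentials are already configured; the bucket name and object key are placeholders:

```python
import boto3

s3 = boto3.client("s3")

# Turn versioning on for an existing bucket (placeholder bucket name)
s3.put_bucket_versioning(
    Bucket="my-bucket",
    VersioningConfiguration={"Status": "Enabled"},
)

# Overwriting the same key now creates a new version instead of replacing the object
s3.put_object(Bucket="my-bucket", Key="my_folder1/my_file.txt", Body=b"version 1")
s3.put_object(Bucket="my-bucket", Key="my_folder1/my_file.txt", Body=b"version 2")

# List every stored version of that key
response = s3.list_object_versions(Bucket="my-bucket", Prefix="my_folder1/my_file.txt")
for version in response.get("Versions", []):
    print(version["VersionId"], version["IsLatest"])
```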
### Amazon S3 - Security

* **User based**
  * IAM policies - which API calls should be allowed for a specific user, from the IAM console
* **Resource based**
  * Bucket policies - bucket-wide rules from the S3 console - allow cross-account access
  * Object Access Control List (ACL) – finer grain
  * Bucket Access Control List (ACL) – less common
* **Note**: an IAM principal can access an S3 object if
  * the user’s IAM permissions allow it OR the resource policy ALLOWS it
  * AND there’s no explicit DENY

### Amazon S3 Bucket Policies

* JSON based policies
  * Resources: buckets and objects
  * Actions: set of APIs to Allow or Deny
  * Effect: Allow / Deny
  * Principal: the account or user to apply the policy to
* Use an S3 bucket policy to:
  * Grant public access to the bucket
  * Force objects to be encrypted at upload
  * Grant access to another account (cross account)

<center>

![](https://i.imgur.com/OhG6mrt.png =500x)

</center>

## Amazon S3 Replication

* Must enable versioning in the source and destination buckets
* Cross-Region Replication (CRR)
* Same-Region Replication (SRR)
* Buckets can be in different accounts
* Copying is asynchronous
* Must give proper IAM permissions to S3
* **CRR - use cases**: compliance, lower-latency access, replication across accounts
* **SRR – use cases**: log aggregation, live replication between production and test accounts

### Amazon S3 Replication – Notes

* After activating, only new objects are replicated
* Optionally, you can replicate existing objects using S3 Batch Replication
  * Replicates existing objects and objects that failed replication
* For DELETE operations:
  * Can replicate delete markers from source to target (optional setting)
  * Deletions with a version ID are not replicated (to avoid malicious deletes)
* There is no “chaining” of replication
  * If bucket 1 has replication into bucket 2, which has replication into bucket 3
  * Then objects created in bucket 1 are not replicated to bucket 3

<center>

![](https://i.imgur.com/FwZm0MV.png =400x)

</center>

## How to create Amazon S3 Buckets

1. Search for the `S3` service in the top search bar.
![](https://i.imgur.com/NajItZ0.png)
2. Click the `Create bucket` button.
![](https://i.imgur.com/t4IivSh.png)
3. Enter a `Bucket name` (the bucket name must be globally unique and must not contain spaces or uppercase letters) and choose an `AWS Region`.
![](https://i.imgur.com/o76oRgw.png)
4. Click the `Create bucket` button.
![](https://i.imgur.com/VPc9EbE.png)
5. Now you can upload objects (files) and create folders inside this bucket as you want.
![](https://i.imgur.com/es79qRu.png)

## How to Upload And Download Files From AWS S3

Depending on the framework you are using, search for how to upload and download files from AWS S3. For example (Python): [How to Upload And Download Files From AWS S3 Using Python](https://towardsdatascience.com/how-to-upload-and-download-files-from-aws-s3-using-python-2022-4c9b787b15f2)
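As an illustration, the console steps above and a basic upload/download can also be scripted with boto3. This is only a minimal sketch: the bucket name, region, and file names are placeholders, and it assumes boto3 is installed and AWS credentials are configured:

```python
import boto3

REGION = "eu-west-1"                    # placeholder region
BUCKET = "my-unique-bucket-name-2022"   # bucket names must be globally unique

s3 = boto3.client("s3", region_name=REGION)

# Create the bucket; outside us-east-1 the region is passed as a LocationConstraint
s3.create_bucket(
    Bucket=BUCKET,
    CreateBucketConfiguration={"LocationConstraint": REGION},
)

# Upload a local file as an object (the key prefix acts like a folder in the console)
s3.upload_file("report.txt", BUCKET, "my_folder1/report.txt")

# Download the object back to a local file
s3.download_file(BUCKET, "my_folder1/report.txt", "report_copy.txt")
```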
## [Amazon S3 Storage Classes](https://aws.amazon.com/s3/storage-classes/)

* Amazon S3 Standard - General Purpose
* Amazon S3 Standard-Infrequent Access (IA)
* Amazon S3 One Zone-Infrequent Access
* Amazon S3 Glacier Instant Retrieval
* Amazon S3 Glacier Flexible Retrieval
* Amazon S3 Glacier Deep Archive
* Amazon S3 Intelligent-Tiering
* Objects can be moved between classes manually or using S3 Lifecycle configurations

### S3 Durability and Availability

* **Durability:**
  * High durability (99.999999999%, 11 9’s) of objects across multiple AZs
  * If you store 10,000,000 objects with Amazon S3, you can on average expect to incur a loss of a single object once every 10,000 years
  * Same for all storage classes
* **Availability:**
  * Measures how readily available a service is
  * Varies depending on the storage class
  * Example: S3 Standard has 99.99% availability = not available 53 minutes a year

### S3 Standard – General Purpose

* 99.99% availability
* Used for frequently accessed data
* Low latency and high throughput
* Can sustain 2 concurrent facility failures
* Use cases: big data analytics, mobile & gaming applications, content distribution

### S3 Storage Classes – Infrequent Access

* For data that is less frequently accessed, but requires rapid access when needed
* Lower cost than S3 Standard

#### Amazon S3 Standard-Infrequent Access (S3 Standard-IA)

* 99.9% availability
* Use cases: disaster recovery, backups

#### Amazon S3 One Zone-Infrequent Access (S3 One Zone-IA)

* High durability (99.999999999%) in a single AZ; data is lost if the AZ is destroyed
* 99.5% availability
* Use cases: storing secondary backup copies of on-premises data, or data you can recreate

### Amazon S3 Glacier Storage Classes

* Low-cost object storage meant for archiving / backup
* Pricing: price for storage + object retrieval cost

#### Amazon S3 Glacier Instant Retrieval

* Millisecond retrieval, great for data accessed once a quarter
* Minimum storage duration of 90 days

#### Amazon S3 Glacier Flexible Retrieval (formerly Amazon S3 Glacier)

* Retrieval options: Expedited (1 to 5 minutes), Standard (3 to 5 hours), Bulk (5 to 12 hours, free)
* Minimum storage duration of 90 days

#### Amazon S3 Glacier Deep Archive – for long-term storage

* Retrieval options: Standard (12 hours), Bulk (48 hours)
* Minimum storage duration of 180 days

## S3 Intelligent-Tiering

* Small monthly monitoring and auto-tiering fee
* Moves objects automatically between access tiers based on usage
* There are no retrieval charges in S3 Intelligent-Tiering
* Frequent Access tier (automatic): default tier
* Infrequent Access tier (automatic): objects not accessed for 30 days
* Archive Instant Access tier (automatic): objects not accessed for 90 days
* Archive Access tier (optional): configurable from 90 days to 700+ days
* Deep Archive Access tier (optional): configurable from 180 days to 700+ days
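For illustration, a storage class can be chosen when an object is uploaded, or changed afterwards by copying the object onto itself with a new class. A minimal boto3 sketch, with placeholder bucket/key/file names and credentials assumed to be configured:

```python
import boto3

s3 = boto3.client("s3")

# Upload directly into Standard-IA instead of the default S3 Standard
with open("db_dump.sql", "rb") as data:
    s3.put_object(
        Bucket="my-bucket",
        Key="backups/db_dump.sql",
        Body=data,
        StorageClass="STANDARD_IA",
    )

# Later, move the object to Glacier Instant Retrieval by copying it over itself
s3.copy_object(
    Bucket="my-bucket",
    Key="backups/db_dump.sql",
    CopySource={"Bucket": "my-bucket", "Key": "backups/db_dump.sql"},
    StorageClass="GLACIER_IR",
)
```

In practice such transitions are usually automated with S3 Lifecycle rules, as described in the lifecycle rules section below.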
## S3 Storage Classes Comparison

![](https://i.imgur.com/JBHyVwu.png)

## S3 – Moving between storage classes

* You can transition objects between storage classes
* For infrequently accessed objects, move them to STANDARD_IA
* For archive objects you don’t need in real time, use GLACIER or DEEP_ARCHIVE
* Moving objects can be automated using a lifecycle configuration

<center>

![](https://i.imgur.com/v0Eh5PB.png =500x)

</center>

## S3 Lifecycle Rules

* **Transition actions**: define when objects are transitioned to another storage class
  * Move objects to the Standard-IA class 60 days after creation
  * Move to Glacier for archiving after 6 months
* **Expiration actions**: configure objects to expire (be deleted) after some time
  * Access log files can be set to be deleted after 365 days
  * Can be used to delete old versions of files (if versioning is enabled)
  * Can be used to delete incomplete multi-part uploads
* Rules can be created for a certain prefix (ex - s3://mybucket/mp3/*)
* Rules can be created for certain object tags (ex - Department: Finance)

###### tags: `Parallel and Distributed Systems` `Cloud computing` `IUG` `Computer Engineering`

<center>End Of Lab 5</center>