---
title: (DONE) AWS 101
tags: cloud-service, aws
---

# Introduction to AWS

## Important terms to remember

- `IaaS`: Infrastructure-as-a-Service, the most basic kind of cloud computing. It gives users internet-based access to computing infrastructure such as servers, storage, and networking.
- `PaaS`: Platform-as-a-Service, lets developers build and host web and mobile applications on servers reached over the internet, without managing the underlying infrastructure.
- `SaaS`: Software-as-a-Service, lets users access the same complete applications from any device over the internet.

**Amazon S3 (Amazon Simple Storage Service)**:
- A service offered by AWS that provides object storage through a web service interface.
- S3 is object storage (unlike DynamoDB, which is a database) and is well suited to storing unstructured data. S3 has no real folder hierarchy: everything is stored as an object.
- S3 stores objects in flat containers called `Buckets`.
- S3 uses a unique identifier called a `Key` to retrieve each object from its bucket. The maximum size of a single object is 5 TB, so S3 is suitable for storing large objects.
- Besides S3, Amazon provides other storage services, such as `Amazon Glacier` (a tier for long-term cold storage) and `Amazon Elastic Block Store`:
    + `Amazon Glacier`: low-cost storage for cold data, letting users keep infrequently accessed data cheaply in exchange for long retrieval times.
    + `Amazon Elastic Block Store`: block-level storage for persistent data, which stays available even when the `Elastic Compute Cloud (EC2)` instance it is attached to is stopped.
    + `Amazon Elastic Compute Cloud (EC2)`: a web service that provides virtual servers on which businesses run applications. These virtual servers are known as `Instances` and give developers access to compute capacity in AWS data centers around the world.
- [S3 on AWS documents](https://docs.aws.amazon.com/AmazonS3/latest/dev/Introduction.html)

**AWS Glue** - ETL service to prepare and load data for analytics:
- AWS Glue Data Catalog: an Apache Hive Metastore-compatible central repository in which AWS Glue records the data it discovers and stores the associated metadata (table definitions and schemas).
- **AWS Glue Crawler**: processes the files you upload to S3, extracts their metadata, and creates table definitions in the AWS Glue Data Catalog.
- AWS Glue ETL Job: the business logic that performs ETL work in AWS Glue. Glue generates a PySpark or Scala script for the job, which runs on Apache Spark.

**Amazon Redshift**:
- A fully managed, petabyte-scale data warehouse service in the cloud.
- To start a data warehouse, you first launch a set of nodes (called an `Amazon Redshift cluster`), then load your data set, and then run analysis queries on it.
- An `Amazon Redshift cluster` consists of a leader node and one or more compute nodes. [More on this](https://docs.aws.amazon.com/redshift/latest/mgmt/overview.html)
- When you provision a cluster, Amazon Redshift creates one database, into which you load data and against which you run queries. You can create additional databases in the same cluster. (Each Redshift cluster runs its own Redshift engine and contains at least one database.)
- Amazon Redshift is a direct alternative to traditional on-premises data warehouses in the following areas:
    - Performance
    - Cost
    - Scalability
    - Security

**Amazon Relational Database Service**: also called `Amazon RDS`, the AWS service for managing relational databases, which lets users migrate, back up, and recover their data.
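To make the S3 points above concrete (buckets, keys, no real folders), here is a toy in-memory model of a bucket. It is plain Python, not the AWS SDK, and the `ToyBucket` class and its methods are hypothetical. It shows that a bucket is just a flat mapping from keys to objects: "folders" are nothing more than shared key prefixes, and a listing by prefix and delimiter rebuilds a folder-like view. The real S3 API exposes the same idea through the `Prefix` and `Delimiter` parameters of `ListObjectsV2`, which returns matching keys plus `CommonPrefixes`.

```python
from dataclasses import dataclass, field

# Per-object size limit stated above (5 TB), expressed in bytes.
MAX_OBJECT_SIZE = 5 * 1024**4


@dataclass
class ToyBucket:
    """Hypothetical in-memory stand-in for an S3 bucket: one flat key -> bytes map."""

    name: str
    objects: dict[str, bytes] = field(default_factory=dict)

    def put_object(self, key: str, body: bytes) -> None:
        if len(body) > MAX_OBJECT_SIZE:
            raise ValueError("object exceeds the 5 TB limit")
        # No directories are created: the whole key, slashes included, is one string.
        self.objects[key] = body

    def get_object(self, key: str) -> bytes:
        return self.objects[key]

    def list_objects(self, prefix: str = "", delimiter: str = "") -> tuple[list[str], list[str]]:
        """Return (keys, common_prefixes), imitating how S3 listings present 'folders'."""
        keys: list[str] = []
        common_prefixes: set[str] = set()
        for key in sorted(self.objects):
            if not key.startswith(prefix):
                continue
            rest = key[len(prefix):]
            if delimiter and delimiter in rest:
                # Collapse everything up to the next delimiter into one pseudo-folder.
                common_prefixes.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
            else:
                keys.append(key)
        return keys, sorted(common_prefixes)


bucket = ToyBucket("my-example-bucket")
bucket.put_object("raw/2024/sales.csv", b"id,amount\n1,10\n")
bucket.put_object("raw/2024/costs.csv", b"id,amount\n1,4\n")
bucket.put_object("readme.txt", b"hello")

print(bucket.list_objects())
# (['raw/2024/costs.csv', 'raw/2024/sales.csv', 'readme.txt'], [])
print(bucket.list_objects(delimiter="/"))
# (['readme.txt'], ['raw/'])
print(bucket.list_objects(prefix="raw/2024/", delimiter="/"))
# (['raw/2024/costs.csv', 'raw/2024/sales.csv'], [])
```

Without a delimiter, the listing is simply every key in the bucket; the "raw/" folder only appears when the listing asks for keys to be grouped at "/".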
**AWS Migration Hub, AWS Snowball**: services that let users migrate their data, applications, servers, and databases to the AWS public cloud. See also `DMS: Database Migration Service`.

**Amazon Virtual Private Cloud (VPC)**: networking service that gives users full control over a logically isolated section of the AWS cloud, including how network traffic flows into and out of it.

**AWS Config, AWS Config Rules, AWS Trusted Advisor**: help users manage and audit the configuration of their cloud resources.

**AWS Identity and Access Management (IAM)**: services that help manage access to cloud resources (for security purposes).

**Amazon Messaging Services:** `Amazon Simple Queue Service (Amazon SQS)` for message queues between application components, `Amazon Simple Notification Service (Amazon SNS)` for push notifications and SMS, and `Amazon Simple Email Service (Amazon SES)` for sending emails.

**AWS Development Tools**: the AWS Command Line Interface and Software Development Kits (SDKs) help manage applications and services, along with more services such as `AWS Tools for PowerShell`, `AWS Serverless Application Model`, `Amazon API Gateway`, etc.

**Other Products and Services in the fields of AI:**
- Amazon Rekognition: to add visual analysis to your applications
- Amazon Lex: to build conversational interfaces using voice and text
- Amazon EMR: for Big Data
- Amazon Chime: for making online meetings efficient
- Amazon Alexa: to help you out with basic tasks
- Amazon Connect: for Cloud Contact Centers
- Amazon Smart Drone: ?
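Going back to IAM from the list above: access is decided by evaluating JSON policy documents. The sketch below is a deliberately simplified, hypothetical evaluator (`is_allowed` and its helpers are illustrative names, not an AWS API) that follows IAM's core rule. An explicit `Deny` always wins, otherwise a matching `Allow` grants access, and anything not allowed is implicitly denied. It assumes identity-based policies only and ignores conditions, principals, `NotAction`/`NotResource`, permission boundaries, and the other policy types real IAM combines. The example policy mimics the shape of a real IAM policy document, with a made-up bucket name.

```python
from fnmatch import fnmatchcase


def _as_list(value):
    # IAM accepts either a single value or a list for Statement, Action, and Resource.
    return [value] if isinstance(value, (str, dict)) else list(value)


def _matches(patterns, value):
    # IAM patterns use '*' and '?' wildcards; fnmatchcase is a close but not exact stand-in.
    return any(fnmatchcase(value, pattern) for pattern in _as_list(patterns))


def is_allowed(policies, action, resource):
    """Simplified IAM decision: explicit Deny > explicit Allow > implicit deny."""
    allowed = False
    for policy in policies:
        for statement in _as_list(policy["Statement"]):
            if not (_matches(statement["Action"], action)
                    and _matches(statement["Resource"], resource)):
                continue
            if statement["Effect"] == "Deny":
                return False  # an explicit deny overrides every allow
            if statement["Effect"] == "Allow":
                allowed = True
    return allowed  # no matching Allow means the request is implicitly denied


# Hypothetical policy: read-only access to a reports bucket, except its private/ prefix.
read_only_reports = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow",
         "Action": ["s3:GetObject", "s3:ListBucket"],
         "Resource": ["arn:aws:s3:::example-reports", "arn:aws:s3:::example-reports/*"]},
        {"Effect": "Deny",
         "Action": "s3:*",
         "Resource": "arn:aws:s3:::example-reports/private/*"},
    ],
}

policies = [read_only_reports]
print(is_allowed(policies, "s3:GetObject", "arn:aws:s3:::example-reports/2024/q1.csv"))  # True
print(is_allowed(policies, "s3:GetObject", "arn:aws:s3:::example-reports/private/pay.csv"))  # False
print(is_allowed(policies, "s3:PutObject", "arn:aws:s3:::example-reports/2024/q1.csv"))  # False
```

The second call is refused even though the `Allow` statement also matches, because the explicit `Deny` takes precedence; the third is refused simply because nothing allows `s3:PutObject`.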
**AWS Kinesis**:
- Used to work with real-time streaming data in the AWS cloud

**AWS Lambda**:
- A compute service that runs your code in response to events and automatically manages the compute resources for you

:::info
- [More on basic terms of AWS](https://www.northeastern.edu/graduate/blog/aws-terminology/)
:::

## AWS Analytics Services

:::info
- Amazon EMR: https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-what-is-emr.html
- SageMaker: [1](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-what-is-emr.html), [2](https://towardsdatascience.com/building-fully-custom-machine-learning-models-on-aws-sagemaker-a-practical-guide-c30df3895ef7)
- [AWS Datalakes and Analytics](https://aws.amazon.com/big-data/datalakes-and-analytics/)
:::

## AWS Architecture

![High-level AWS Architecture](https://miro.medium.com/max/1400/0*-H0h2GdZjC3_muET.png)
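Tying the S3 and Lambda entries together, here is a minimal sketch of Lambda's event-driven model in Python. It assumes the function is configured to be triggered by S3 "object created" notifications. Lambda calls the handler with the triggering `event` (a dict) and a `context` object, and for S3 triggers each record carries the bucket name and the URL-encoded object key. The handler name `lambda_handler` and the returned summary are illustrative choices. The handler only reads the event and makes no AWS SDK calls, so it can be exercised locally with a hand-written event.

```python
from urllib.parse import unquote_plus


def lambda_handler(event, context):
    """Hypothetical handler: summarize which S3 objects triggered this invocation."""
    uploaded = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 notifications (e.g. a space becomes '+').
        key = unquote_plus(record["s3"]["object"]["key"])
        size = record["s3"]["object"].get("size", 0)
        uploaded.append({"bucket": bucket, "key": key, "size": size})
    # S3 invokes Lambda asynchronously, so this return value mainly helps logging and tests.
    return {"processed": len(uploaded), "objects": uploaded}


# Local run with a hand-written event (bucket and key are made up).
sample_event = {
    "Records": [
        {"eventName": "ObjectCreated:Put",
         "s3": {"bucket": {"name": "my-example-bucket"},
                "object": {"key": "raw/2024/sales+report.csv", "size": 1024}}}
    ]
}
print(lambda_handler(sample_event, context=None))
# {'processed': 1, 'objects': [{'bucket': 'my-example-bucket', 'key': 'raw/2024/sales report.csv', 'size': 1024}]}
```

In a real deployment, the handler body is where you would fetch the object (for example with the AWS SDK) and process it; the sketch stops at parsing the event so it stays runnable anywhere.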