Building Batch Data Analytics Solutions on AWS

Welcome to the Building Batch Data Analytics Solutions on AWS class. This page collects supporting material that you can take with you.

EMR

Title Link
Deep Dive and Best Practices https://www.youtube.com/watch?v=dU40df0Suoo
What's new 2021 https://www.youtube.com/watch?v=lGm8qe4tBrg
What's new 2020 https://pages.awscloud.com/Deep-Dive-into-Whats-New-in-Amazon-EMR_2020_0230-ABD_OD.html
EMR 6.0.0 https://www.youtube.com/watch?v=M_EOXbJhD3g
Self-Service Environments with AWS Service Catalog https://aws.amazon.com/de/blogs/big-data/build-a-self-service-environment-for-each-line-of-business-using-amazon-emr-and-aws-service-catalog/
Record-Level Changes with Apache Hudi and AWS DMS https://aws.amazon.com/de/blogs/big-data/apply-record-level-changes-from-relational-databases-to-amazon-s3-data-lake-using-apache-hudi-on-amazon-emr-and-aws-database-migration-service/
Flink https://aws.amazon.com/de/blogs/big-data/use-apache-flink-on-amazon-emr/
File Formats https://www.youtube.com/watch?v=aIcxFIyL6xo
Spark Optimization https://www.youtube.com/watch?v=daXEp4HmS-E
Spark on EMR https://www.youtube.com/watch?v=aIwJlfEAlHQ
EMR vs Glue https://aws.amazon.com/de/blogs/big-data/how-drop-used-the-amazon-emr-runtime-for-apache-spark-to-halve-costs-and-get-results-5-4-times-faster/
EMR Data Access Controls https://www.youtube.com/watch?v=qOoWnBhnbuU
EMR Serverless https://www.youtube.com/watch?v=qk3TDZ4OkNE
Lake Formation Tag-Based Access Control https://docs.aws.amazon.com/lake-formation/latest/dg/TBAC-overview.html

Spark

Title Link
Spark Jobs on EKS https://www.youtube.com/watch?v=Om8RRGbZ6zA
Spark on EKS Best Practices https://www.youtube.com/watch?v=3EbTr79wLkU
RDDs, Dataframes, Datasets https://www.youtube.com/watch?v=Ofk7G3GD9jk

Hive

Title Link
ACID Transactions https://aws.amazon.com/de/blogs/big-data/amazon-emr-supports-apache-hive-acid-transactions/

Step Functions

Title Link
EMR Orchestration https://aws.amazon.com/de/blogs/aws/new-using-step-functions-to-orchestrate-amazon-emr-workloads/
Managed Airflow https://aws.amazon.com/de/managed-workflows-for-apache-airflow/
X-Ray Support https://aws.amazon.com/about-aws/whats-new/2020/09/aws-step-functions-adds-support-for-aws-x-ray/
MWAA https://aws.amazon.com/de/blogs/aws/introducing-amazon-managed-workflows-for-apache-airflow-mwaa/

Lambda

Title Link
Lambda Destinations https://aws.amazon.com/blogs/compute/introducing-aws-lambda-destinations/
Chaining Lambda Functions https://www.refinery.io/post/how-to-chain-serverless-functions-call-invoke-a-lambda-from-another-lambda
3 https://www.youtube.com/watch?v=Jkx6kVbDpL4
Mitigating Serverless Lock-in Fears https://www.thoughtworks.com/insights/blog/mitigating-serverless-lock-fears

Glue

Title Link
DataBrew https://aws.amazon.com/glue/features/databrew/
Data Lakes with Glue https://www.youtube.com/watch?v=JsNR8uBVSiA
PySpark For Glue https://www.youtube.com/watch?v=DICsZiwuHJo
1 https://www.youtube.com/watch?v=S_xeHvP7uMo
Serverless Data Analytics Pipeline Reference Architecture https://aws.amazon.com/de/blogs/big-data/aws-serverless-data-analytics-pipeline-reference-architecture/
Serverless Streaming ETL Jobs with Glue https://aws.amazon.com/de/blogs/big-data/crafting-serverless-streaming-etl-jobs-with-aws-glue/

Athena

Title Link
1 https://www.youtube.com/watch?v=tzoXRRCVmIQ
2 https://www.youtube.com/watch?v=JIviltfpul0

Big Data Architecture

Title Link
Spark on EKS https://www.youtube.com/watch?v=lHM96P5kP2k
Use case 1 https://www.youtube.com/watch?v=XpFNznmRoQ0
Use case 2 https://www.youtube.com/watch?v=wbh51O3QrE4
Data Lake https://www.youtube.com/watch?v=7i1tj59pvYw
Hearst Corp https://www.youtube.com/watch?v=6cwbbqi36k8

Resiliency

Title Link
Capacity-Optimized Spot Instances for EMR https://aws.amazon.com/de/blogs/big-data/optimizing-amazon-emr-for-resilience-and-cost-with-capacity-optimized-spot-instances/
2 https://www.youtube.com/watch?v=Fup5vHEvU50
Chaos Lambda (BBC) https://github.com/bbc/chaos-lambda

MSK Managed Kafka

Title Link
1 https://www.youtube.com/watch?v=HtU9pb18g5Q

DynamoDB

Title Link
1 https://www.youtube.com/watch?v=HaEPXoXVf2k
Dynamo Paper (SOSP 2007) https://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf

Kinesis

Title Link
Kinesis Best Practices https://www.youtube.com/watch?v=jKPlGznbfZ0
Enhanced Fan-Out https://aws.amazon.com/blogs/aws/kds-enhanced-fanout/
Streaming ETL https://aws.amazon.com/de/blogs/big-data/unified-serverless-streaming-etl-architecture-with-amazon-kinesis-data-analytics/
Storm Integration https://github.com/amazon-archives/kinesis-storm-spout
Flink and Kinesis https://aws.amazon.com/blogs/big-data/streaming-etl-with-apache-flink-and-amazon-kinesis-data-analytics/

Nitro System

Title Link
AWS Hypervisor Security https://www.youtube.com/watch?v=0qcUOKupt7Y

re:Invent 2021

Title Link
re:Invent https://reinvent.awsevents.com