# The Machine Learning Pipeline

Welcome to The Machine Learning Pipeline class. This page collects supporting material that you can take with you.

### Exam Readiness Workshop

https://www.aws.training/Details/eLearning?id=42183

https://medium.com/@adam.dejans/my-path-to-passing-the-aws-machine-learning-certification-e8fc45ad7762

### Machine Learning

Title | Link
--- | ---
TensorFlow Without a PhD | https://www.youtube.com/watch?v=vq2nnJ4g6N0
TensorFlow Without a PhD Project | https://github.com/GoogleCloudPlatform/tensorflow-without-a-phd
Deep Learning Book | https://d2l.ai/chapter_computer-vision/anchor.html
Managing ML Projects | https://d1.awsstatic.com/whitepapers/aws-managing-ml-projects.pdf
Deep Learning on AWS | https://d1.awsstatic.com/whitepapers/Deep_Learning_on_AWS.pdf
tf-idf | https://www.youtube.com/watch?v=4vT4fzjkGCQ
Boosting | https://www.youtube.com/watch?v=UHBmv7qCey4
Feature Scaling | https://towardsdatascience.com/all-about-feature-scaling-bcc0ad75cb35
CNNs | https://www.youtube.com/watch?v=AjtX1N_VT9E
Visualization: Matplotlib vs. Seaborn | https://www.kaggle.com/fazilbtopal/visualization-matplotlib-vs-seaborn
Anonymisation with PCA | https://medium.com/lizuna/beacon-the-use-of-principal-components-analysis-to-mask-sensitive-data-in-machine-learning-7904b01445d0
Anonymisation with PCA | https://arxiv.org/pdf/1903.11700.pdf
Encoding Cyclic Features | https://towardsdatascience.com/ml-intro-5-one-hot-encoding-cyclic-representations-normalization-6f6e2f4ec001
Bootstrapping | https://www.analyticsvidhya.com/blog/2020/02/what-is-bootstrap-sampling-in-statistics-and-machine-learning/

### Fraud Detection

Title | Link
--- | ---
Fraud Detection with Amazon SageMaker Intro | https://www.youtube.com/watch?v=wzwkLV9gDXk
Fraud Detection with Amazon SageMaker Intermediate | https://www.youtube.com/watch?v=elRQPCHDBPE
Deep Fake Detection | https://arxiv.org/pdf/1909.11573.pdf

### SageMaker

Title | Link
--- | ---
Hyperparameter Tuning | https://aws.amazon.com/de/blogs/machine-learning/amazon-sagemaker-automatic-model-tuning-now-supports-random-search-and-hyperparameter-scaling/
Ground Truth | https://www.youtube.com/watch?v=6WJxzKsIFKA
Custom Labeling in Ground Truth | https://aws.amazon.com/de/blogs/machine-learning/build-a-custom-data-labeling-workflow-with-amazon-sagemaker-ground-truth/
Example Notebooks | https://github.com/aws/amazon-sagemaker-examples
Custom Algorithms | https://www.youtube.com/watch?v=Oy_sCAKChhI
Using EFS and FSx | https://aws.amazon.com/de/blogs/machine-learning/speed-up-training-on-amazon-sagemaker-using-amazon-efs-or-amazon-fsx-for-lustre-file-systems/
Multi-Model Endpoints | https://docs.aws.amazon.com/sagemaker/latest/dg/multi-model-endpoints.html
API Gateway Integration | https://aws.amazon.com/de/blogs/machine-learning/creating-a-machine-learning-powered-rest-api-with-amazon-api-gateway-mapping-templates-and-amazon-sagemaker/
Invoke Endpoint API | https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_runtime_InvokeEndpoint.html

### Deep Dive SageMaker Built-In Algorithms

Title | Link
--- | ---
Overview | https://github.com/awsdocs/amazon-sagemaker-developer-guide/blob/master/doc_source/algos.md
BlazingText | https://www.youtube.com/watch?v=G2tX0YpNHfc
DeepAR Forecasting | https://www.youtube.com/watch?v=g8UYGh0tlK0
Factorization Machines | https://www.csie.ntu.edu.tw/~b97053/paper/Rendle2010FM.pdf
K-Means | https://www.youtube.com/watch?v=RPInYpI9MjY
LDA | https://www.youtube.com/watch?v=NMDL8Atim1k
Linear Learner | https://www.youtube.com/watch?v=ae08a6Bp5lM
NTM | https://www.youtube.com/watch?v=eAMjEv7EABM
Object2Vec | https://www.youtube.com/watch?v=ggVWnnRXtYc
PCA | https://www.youtube.com/watch?v=RPInYpI9MjY
RCF | https://www.youtube.com/watch?v=9BWHR4JsTNU
ResNet | https://www.youtube.com/watch?v=CBDwEZtjFDE
Seq2Seq | https://www.youtube.com/watch?v=pZIV5NWfGIU
XGBoost | https://www.youtube.com/watch?v=THcH0tMdZ6o
XGBoost Demo | https://www.youtube.com/watch?v=GrJP9FLV3FE

### New in SageMaker

Title | Link
--- | ---
AutoML | https://www.youtube.com/watch?v=lPQqm5aqXJE
Data Wrangler | https://www.youtube.com/watch?v=_bsat_2N8LI
Feature Store | https://www.youtube.com/watch?v=pEg5c6d4etI

### Stream Data Ingestion with Kinesis

Title | Link
--- | ---
Kinesis Best Practices | https://www.youtube.com/watch?v=jKPlGznbfZ0
Enhanced Fanout | https://aws.amazon.com/blogs/aws/kds-enhanced-fanout/
Streaming ETL | https://aws.amazon.com/de/blogs/big-data/unified-serverless-streaming-etl-architecture-with-amazon-kinesis-data-analytics/
Storm Integration | https://github.com/amazon-archives/kinesis-storm-spout
Flink and Kinesis | https://aws.amazon.com/blogs/big-data/streaming-etl-with-apache-flink-and-amazon-kinesis-data-analytics/
Managed Kafka | https://www.youtube.com/watch?v=HtU9pb18g5Q

### Lambda

Title | Link
--- | ---
Lambda Destinations | https://aws.amazon.com/blogs/compute/introducing-aws-lambda-destinations/
2 | https://www.refinery.io/post/how-to-chain-serverless-functions-call-invoke-a-lambda-from-another-lambda
3 | https://www.youtube.com/watch?v=Jkx6kVbDpL4
4 | https://www.thoughtworks.com/insights/blog/mitigating-serverless-lock-fears

### Orchestrating with Step Functions

Title | Link
--- | ---
EMR Orchestration | https://aws.amazon.com/de/blogs/aws/new-using-step-functions-to-orchestrate-amazon-emr-workloads/
Managed Airflow | https://aws.amazon.com/de/managed-workflows-for-apache-airflow/
X-Ray Support | https://aws.amazon.com/about-aws/whats-new/2020/09/aws-step-functions-adds-support-for-aws-x-ray/

### Processing with Big Data

Title | Link
--- | ---
Spark on EKS | https://www.youtube.com/watch?v=lHM96P5kP2k
Use case 1 | https://www.youtube.com/watch?v=XpFNznmRoQ0
Use case 2 | https://www.youtube.com/watch?v=wbh51O3QrE4
Data Lake | https://www.youtube.com/watch?v=7i1tj59pvYw
Spark Jobs on EKS | https://www.youtube.com/watch?v=Om8RRGbZ6zA
Spark on EKS Best Practices | https://www.youtube.com/watch?v=3EbTr79wLkU
Athena 1 | https://www.youtube.com/watch?v=tzoXRRCVmIQ
Athena 2 | https://www.youtube.com/watch?v=JIviltfpul0
Glue 1 | https://www.youtube.com/watch?v=S_xeHvP7uMo
Glue Reference Architecture | https://aws.amazon.com/de/blogs/big-data/aws-serverless-data-analytics-pipeline-reference-architecture/
Glue Streaming | https://aws.amazon.com/de/blogs/big-data/crafting-serverless-streaming-etl-jobs-with-aws-glue/
DataBrew | https://aws.amazon.com/glue/features/databrew/
Data Lakes with Glue | https://www.youtube.com/watch?v=JsNR8uBVSiA

### ETL with EMR

Title | Link
--- | ---
Deep Dive and Best Practices | https://www.youtube.com/watch?v=dU40df0Suoo
What's New 2020 | https://pages.awscloud.com/Deep-Dive-into-Whats-New-in-Amazon-EMR_2020_0230-ABD_OD.html
EMR 6.0.0 | https://www.youtube.com/watch?v=M_EOXbJhD3g
2 | https://aws.amazon.com/de/blogs/big-data/build-a-self-service-environment-for-each-line-of-business-using-amazon-emr-and-aws-service-catalog/
3 | https://aws.amazon.com/de/blogs/big-data/apply-record-level-changes-from-relational-databases-to-amazon-s3-data-lake-using-apache-hudi-on-amazon-emr-and-aws-database-migration-service/
Flink | https://aws.amazon.com/de/blogs/big-data/use-apache-flink-on-amazon-emr/
File Formats | https://www.youtube.com/watch?v=aIcxFIyL6xo
Spark Optimization | https://www.youtube.com/watch?v=daXEp4HmS-E
Spark on EMR | https://www.youtube.com/watch?v=aIwJlfEAlHQ
EMR vs Glue | https://aws.amazon.com/de/blogs/big-data/how-drop-used-the-amazon-emr-runtime-for-apache-spark-to-halve-costs-and-get-results-5-4-times-faster/

### Resiliency

Title | Link
--- | ---
1 | https://aws.amazon.com/de/blogs/big-data/optimizing-amazon-emr-for-resilience-and-cost-with-capacity-optimized-spot-instances/
2 | https://www.youtube.com/watch?v=Fup5vHEvU50
3 | https://github.com/bbc/chaos-lambda

### DynamoDB

Title | Link
--- | ---
1 | https://www.youtube.com/watch?v=HaEPXoXVf2k
2 | https://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf

### Nitro System

Title | Link
--- | ---
AWS Hypervisor Security | https://www.youtube.com/watch?v=0qcUOKupt7Y

### re:Invent 2020

Title | Link
--- | ---
re:Invent | https://reinvent.awsevents.com