# 2022-12-31 Data Warehousing on AWS (3 days)
###### tags: `classdoc`

### Please bookmark this LIVE document for the rest of class: **`https://hackmd.io/`**

Class Start (CST) | Lunch | End of Day
------------------|-------|-----------
09:00 | 12:00-13:00 | 16:30

Marko Sluga (markoaws@amazon.com)
<sub><sub>Senior Technical Trainer, AWS Training and Certification</sub></sub>

## UPON YOUR ARRIVAL

### 1) Create a lab account here: https://aws.qwiklabs.com
Create an account here. Please be certain to use the same email address you signed up for this class with.

### You will have access to the labs 24 hours a day until one month after end of class!
You can run each lab up to 3 times; I cannot reset this limit.

### 2) VERIFY EBOOK ACCESS: https://evantage.gilmoreglobal.com
We will send these ebook licenses out shortly. We do this AFTER class is underway to ensure attendance. You will receive an email from noreply@gilmoreglobal.ca with a link to Gilmore and your code. If you do not receive it, please send me a chat message.

---

## Lab 4 Errata
At step 11 in lab 4, here are some quick notes to perform the lab using the new UI:

* Go to the Kinesis console
* In the sidebar, choose Delivery Stream
* Create Delivery Stream
    * Source: Direct PUT
    * Destination: Redshift
    * Delivery stream name: redshift-game-stream
    * Destination Cluster: your redshift cluster
    * user: admin
    * pass: Redshift123
    * db: lab
    * table: game_score
    * columns: record_time, user_id, game_id, score
    * Create new intermediate S3 bucket (pick a creative name for it)
* Resume at step 21 in lab...
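
### Lab 4 step 11 as an API request (sketch)
If you would rather script this than click through the console, the settings above map roughly onto the Firehose `CreateDeliveryStream` API. This is a minimal sketch, assuming boto3, not something the lab requires. It only builds the request arguments; the role ARN, JDBC URL, and bucket name are hypothetical placeholders (use the values from your own lab account), and `build_delivery_stream_request` is an illustrative helper name. To send it, you would pass the result to `boto3.client("firehose").create_delivery_stream(**request)`.

```python
def build_delivery_stream_request(role_arn, jdbc_url, bucket_name):
    """Build keyword arguments for Firehose create_delivery_stream.

    Mirrors the console settings from the errata notes. All three
    parameters are placeholders you supply from your own lab account.
    """
    return {
        "DeliveryStreamName": "redshift-game-stream",
        "DeliveryStreamType": "DirectPut",  # console: "Source: Direct PUT"
        "RedshiftDestinationConfiguration": {
            "RoleARN": role_arn,
            # The database name ("lab") is the last path segment of the JDBC URL.
            "ClusterJDBCURL": jdbc_url,
            "Username": "admin",
            "Password": "Redshift123",
            "CopyCommand": {
                "DataTableName": "game_score",
                "DataTableColumns": "record_time,user_id,game_id,score",
            },
            # Firehose stages records in S3 before issuing COPY into Redshift.
            "S3Configuration": {
                "RoleARN": role_arn,
                "BucketARN": f"arn:aws:s3:::{bucket_name}",
            },
        },
    }


# Hypothetical values for illustration only.
request = build_delivery_stream_request(
    role_arn="arn:aws:iam::123456789012:role/hypothetical-firehose-role",
    jdbc_url="jdbc:redshift://hypothetical-cluster.example.us-west-2.redshift.amazonaws.com:5439/lab",
    bucket_name="hypothetical-intermediate-bucket",
)
```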

---

# History highlights of Redshift from re:Invent 2021:
(direct timestamp link: https://youtu.be/x0xmqJrAVM8?t=176)
- 2012 - Launch https://aws.amazon.com/about-aws/whats-new/2012/11/28/announcing-amazon-redshift/
- 2017 - Spectrum https://aws.amazon.com/about-aws/whats-new/2017/04/introducing-amazon-redshift-spectrum-run-amazon-redshift-queries-directly-on-datasets-as-large-as-an-exabyte-in-amazon-s3/
- 2018 - Elastic Resize https://aws.amazon.com/about-aws/whats-new/2018/11/amazon-redshift-elastic-resize/
- 2019 - Redshift Managed Storage (RMS) https://aws.amazon.com/about-aws/whats-new/2019/12/amazon-redshift-announces-ra3-nodes-managed-storage/
- 2020 - Redshift ML https://aws.amazon.com/about-aws/whats-new/2020/12/aws-announces-amazon-redshift-ml-preview/
- 2020 - AQUA https://aws.amazon.com/about-aws/whats-new/2020/12/aws-announces-aqua-for-amazon-redshift-preview/
- 2020 - Compilation as a Service (CaaS) https://aws.amazon.com/about-aws/whats-new/2020/06/amazon-redshift-now-delivers-better-cold-query-performance/
- 2020 - Federated queries https://aws.amazon.com/blogs/big-data/announcing-amazon-redshift-federated-querying-to-amazon-aurora-mysql-and-amazon-rds-for-mysql/
- 2021 - Cross-region data sharing https://aws.amazon.com/about-aws/whats-new/2021/11/amazon-redshift-region-data-sharing/

# Other "recent" announcements
- 2019 - Stored Procedure support https://aws.amazon.com/about-aws/whats-new/2019/05/amazon-redshift-now-supports-stored-procedures/
- 2019 - Redshift export to Parquet https://aws.amazon.com/about-aws/whats-new/2019/12/announcing-amazon-redshift-data-lake-export/
- 2021 - AWS Data Exchange for Redshift - https://aws.amazon.com/redshift/features/aws-data-exchange-for-amazon-redshift/

---

## Additional resources
* If you are curious to see the script that generates the stream of data aimed at “redshift-game-stream” https://us-west-2-tcprod.s3.amazonaws.com/courses/AWS-200-DWH/v1.2.2/lab-4-firehose/scripts/lab-producer.py
* If you want to
experience DMS... Step-by-step Tutorials: https://docs.aws.amazon.com/dms/latest/sbs/DMS-SBS-Welcome.html
* Asynchronous API for Redshift: https://aws.amazon.com/blogs/big-data/using-the-amazon-redshift-data-api-to-interact-with-amazon-redshift-clusters/
* Choosing a Distribution Style and Sort Keys: https://docs.aws.amazon.com/redshift/latest/dg/c_choosing_dist_sort.html
* Managing Unsorted Region https://docs.aws.amazon.com/redshift/latest/dg/r_vacuum_diskspacereqs.html
* An AWS use case for making Redshift public - Kinesis Firehose https://docs.aws.amazon.com/firehose/latest/dev/controlling-access.html#using-iam-rs

---

## New Features missing from course

### Performance
* Materialized Views GA - 2020-03-12 - https://aws.amazon.com/about-aws/whats-new/2020/03/amazon-redshift-introduces-support-for-materialized-views/
* Materialized Views Automatic Refresh + Rewrite - 2020-11-11 - https://aws.amazon.com/about-aws/whats-new/2020/11/amazon-redshift-announces-automatic-refresh-and-query-rewrite-for-materialized-views/
* AQUA now GA - 2021-04-14 https://aws.amazon.com/about-aws/whats-new/2021/04/aws-announces-general-availability-of-aqua-for-amazon-redshift/
* Amazon Redshift now recommends distribution keys for improved query performance, August 20, 2019
    * Announcement: https://aws.amazon.com/about-aws/whats-new/2019/08/amazon-redshift-now-recommends-distribution-keys-for-improved-query-performance/
* Heuristics for choosing the DIST STYLE:
> * https://docs.aws.amazon.com/redshift/latest/dg/c_best-practices-best-dist-key.html
> * https://docs.aws.amazon.com/redshift/latest/dg/c_choosing_dist_sort.html

### New Automation
* Automatic Table Optimization - 2020-12-09 https://aws.amazon.com/about-aws/whats-new/2020/12/amazon-redshift-announces-automatic-table-optimization/

### Availability
* Cluster Relocation between AZs for RA3 clusters - 2020-12-09
https://aws.amazon.com/about-aws/whats-new/2020/12/amazon-redshift-launches-ability-easily-move-clusters-between-aws-availability-zones/
* Data Sharing GA - 2021-03-10 - https://aws.amazon.com/about-aws/whats-new/2021/03/announcing-general-availability-of-amazon-redshift-data-sharing/
* Data Sharing while Producers are Paused - 2021-04-12 https://aws.amazon.com/about-aws/whats-new/2021/04/amazon-redshift-supports-data-sharing-producer-clusters-paused/

### Programmability
* SUPER Datatype - 2020-12-09 https://aws.amazon.com/about-aws/whats-new/2020/12/amazon-redshift-announces-support-native-json-semi-structured-data-processing/
* SUPER Datatype documentation https://docs.aws.amazon.com/redshift/latest/dg/r_SUPER_type.html
* Amazon Redshift now automatically picks the best distribution style, announced March 28, 2019:
    * Announcement: https://aws.amazon.com/about-aws/whats-new/2019/03/amazon-redshift-now-automatically-picks-the-best-distribution-st/
* Redshift Concurrency Scaling, announced March 27, 2019:
    * Jeff Barr https://aws.amazon.com/blogs/aws/new-concurrency-scaling-for-amazon-redshift-peak-performance-at-all-times/
    * Documentation https://docs.aws.amazon.com/redshift/latest/dg/concurrency-scaling.html
* Redshift Auto Analyze, announced January 18, 2019
    * Announcement https://aws.amazon.com/about-aws/whats-new/2019/01/amazon-redshift-auto-analyze/
    * Documentation https://docs.aws.amazon.com/redshift/latest/dg/t_Analyzing_tables.html#t_Analyzing_tables-auto-analyze
    * Enabled by default, can be disabled by setting parameter auto_analyze to false
* VACUUM DELETE now automatically performed, announced December 19, 2018: https://aws.amazon.com/about-aws/whats-new/2018/12/amazon-redshift-automatic-vacuum/
* Redshift Short Query Acceleration, announced August 8, 2018
    * Announcement https://aws.amazon.com/about-aws/whats-new/2018/08/amazon-redshift-automatically-enables-short-query-acceleration/
    * Documentation
https://docs.aws.amazon.com/redshift/latest/dg/wlm-short-query-acceleration.html
* Lake Formation Presentation from 2018 re:Invent https://www.youtube.com/watch?v=nsiLMqg654s
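
### Turning off Auto Analyze (sketch)
The Auto Analyze note above says the feature can be disabled by setting the `auto_analyze` parameter to false. One way to do that is through the cluster's parameter group. A minimal sketch, assuming boto3 and a custom parameter group already attached to your cluster (default parameter groups cannot be modified); the group name and the helper `build_auto_analyze_request` are hypothetical. To apply it, you would pass the result to `boto3.client("redshift").modify_cluster_parameter_group(**request)`.

```python
def build_auto_analyze_request(parameter_group_name, enabled):
    """Build keyword arguments for Redshift modify_cluster_parameter_group.

    parameter_group_name is a placeholder for your own custom group.
    """
    return {
        "ParameterGroupName": parameter_group_name,
        "Parameters": [
            {
                "ParameterName": "auto_analyze",
                # Parameter values are passed as strings, not booleans.
                "ParameterValue": "true" if enabled else "false",
            }
        ],
    }


# Hypothetical group name for illustration only.
request = build_auto_analyze_request(
    "hypothetical-custom-parameter-group", enabled=False
)
```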