# 2022-12-31 Data Warehousing on AWS (3 days)
###### tags: `classdoc`
### Please bookmark this LIVE document for the rest of class: **`https://hackmd.io/`**
Class Start (CST) | Lunch | End of Day
------------------|-------|-----------
09:00 | 12:00-13:00 | 16:30
Marko Sluga (markoaws@amazon.com)
<sub><sub>Senior Technical Trainer, AWS Training and Certification</sub></sub>
## UPON YOUR ARRIVAL
### 1) Create a lab account here: https://aws.qwiklabs.com
Please be certain to use the same email address you used to sign up for this class.
### You will have access to the labs 24 hours a day until one month after end of class! You can run each lab up to 3 times; I cannot reset this limit.
### 2) VERIFY EBOOK ACCESS: https://evantage.gilmoreglobal.com
We will send these ebook licenses out shortly. We do this AFTER class is underway to ensure attendance.
You will receive an email from noreply@gilmoreglobal.ca with a link to Gilmore and your code. If you do not receive it, please send me a chat message.
---
## Lab 4 Errata
At step 11 in Lab 4, use these quick notes to perform the lab in the new console UI:
* Go to the Kinesis console.
* In the sidebar, choose Delivery streams.
* Choose Create delivery stream and enter:
  * Source: Direct PUT
  * Destination: Redshift
  * Delivery stream name: redshift-game-stream
  * Destination cluster: your Redshift cluster
  * User: admin
  * Password: Redshift123
  * Database: lab
  * Table: game_score
  * Columns: record_time, user_id, game_id, score
* Create a new intermediate S3 bucket and pick a creative name for it.
* Resume at step 21 in the lab...
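
For anyone who prefers scripting to clicking, the console settings above map roughly onto the Firehose `create_delivery_stream` API. The following is a minimal sketch, not part of the lab: the helper name `build_delivery_stream_request` is made up for illustration, and the JDBC URL, IAM role ARN, and bucket ARN are placeholders you would have to supply yourself (the console normally helps you create the role and bucket).

```python
def build_delivery_stream_request(jdbc_url, role_arn, bucket_arn):
    """Build keyword arguments for Firehose create_delivery_stream.

    jdbc_url, role_arn and bucket_arn are placeholders (assumptions);
    the remaining values come from the Lab 4 errata notes.
    """
    return {
        "DeliveryStreamName": "redshift-game-stream",
        "DeliveryStreamType": "DirectPut",  # Source: Direct PUT
        "RedshiftDestinationConfiguration": {
            "RoleARN": role_arn,
            # The database name (lab) is part of the JDBC URL,
            # e.g. jdbc:redshift://<cluster-endpoint>:5439/lab
            "ClusterJDBCURL": jdbc_url,
            "CopyCommand": {
                "DataTableName": "game_score",
                "DataTableColumns": "record_time,user_id,game_id,score",
            },
            "Username": "admin",
            "Password": "Redshift123",
            # Intermediate S3 bucket that Firehose stages data in
            # before issuing COPY into Redshift.
            "S3Configuration": {
                "RoleARN": role_arn,
                "BucketARN": bucket_arn,
            },
        },
    }


# Hypothetical usage (requires boto3 and AWS credentials, so left commented):
# import boto3
# request = build_delivery_stream_request(
#     "jdbc:redshift://<cluster-endpoint>:5439/lab",
#     "arn:aws:iam::<account-id>:role/<firehose-role>",
#     "arn:aws:s3:::<your-creative-bucket-name>",
# )
# boto3.client("firehose").create_delivery_stream(**request)
```

Building the request as a plain dictionary keeps the sketch inspectable without touching AWS; the commented call shows where it would be passed to boto3.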
---
# History highlights of Redshift from re:Invent 2021 (direct timestamp link: https://youtu.be/x0xmqJrAVM8?t=176)
- 2012 - Launch https://aws.amazon.com/about-aws/whats-new/2012/11/28/announcing-amazon-redshift/
- 2017 - Spectrum https://aws.amazon.com/about-aws/whats-new/2017/04/introducing-amazon-redshift-spectrum-run-amazon-redshift-queries-directly-on-datasets-as-large-as-an-exabyte-in-amazon-s3/
- 2018 - Elastic Resize https://aws.amazon.com/about-aws/whats-new/2018/11/amazon-redshift-elastic-resize/
- 2019 - Redshift Managed Storage (RMS) https://aws.amazon.com/about-aws/whats-new/2019/12/amazon-redshift-announces-ra3-nodes-managed-storage/
- 2020 - Redshift ML https://aws.amazon.com/about-aws/whats-new/2020/12/aws-announces-amazon-redshift-ml-preview/
- 2020 - AQUA https://aws.amazon.com/about-aws/whats-new/2020/12/aws-announces-aqua-for-amazon-redshift-preview/
- 2020 - Compilation as a Service (CaaS) https://aws.amazon.com/about-aws/whats-new/2020/06/amazon-redshift-now-delivers-better-cold-query-performance/
- 2020 - Federated queries https://aws.amazon.com/blogs/big-data/announcing-amazon-redshift-federated-querying-to-amazon-aurora-mysql-and-amazon-rds-for-mysql/
- 2021 - Cross-region data sharing https://aws.amazon.com/about-aws/whats-new/2021/11/amazon-redshift-region-data-sharing/
# Other "recent" announcements
- 2019 - Stored Procedure support https://aws.amazon.com/about-aws/whats-new/2019/05/amazon-redshift-now-supports-stored-procedures/
- 2019 - Redshift export to Parquet https://aws.amazon.com/about-aws/whats-new/2019/12/announcing-amazon-redshift-data-lake-export/
- 2021 - AWS Data Exchange for Redshift https://aws.amazon.com/redshift/features/aws-data-exchange-for-amazon-redshift/
---
## Additional resources
* If you are curious to see the script that generates the stream of data aimed at “redshift-game-stream”:
https://us-west-2-tcprod.s3.amazonaws.com/courses/AWS-200-DWH/v1.2.2/lab-4-firehose/scripts/lab-producer.py
* If you want to experience DMS...
Step-by-step Tutorials: https://docs.aws.amazon.com/dms/latest/sbs/DMS-SBS-Welcome.html
* Asynchronous API for Redshift:
https://aws.amazon.com/blogs/big-data/using-the-amazon-redshift-data-api-to-interact-with-amazon-redshift-clusters/
* Choosing a distribution style and sort keys:
https://docs.aws.amazon.com/redshift/latest/dg/c_choosing_dist_sort.html
* Managing the size of the unsorted region:
https://docs.aws.amazon.com/redshift/latest/dg/r_vacuum_diskspacereqs.html
* An AWS use case for making a Redshift cluster publicly accessible: Kinesis Firehose
https://docs.aws.amazon.com/firehose/latest/dev/controlling-access.html#using-iam-rs
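
The asynchronous Redshift Data API listed above works in three steps: submit a statement, poll its status, then fetch the result. Here is a minimal sketch of that pattern, assuming a boto3 `redshift-data` client is passed in. The helper name `run_statement`, the polling interval, and the poll limit are illustrative choices, and only the first page of results is returned.

```python
import time

# Statuses after which the Data API will not change the statement again.
TERMINAL_STATES = {"FINISHED", "FAILED", "ABORTED"}


def run_statement(client, sql, cluster_id, database, db_user,
                  poll_seconds=1.0, max_polls=60):
    """Submit SQL through the Data API and wait for it to finish.

    `client` is expected to behave like boto3.client("redshift-data").
    Returns the first page of records, or [] if there is no result set.
    """
    submitted = client.execute_statement(
        ClusterIdentifier=cluster_id,
        Database=database,
        DbUser=db_user,
        Sql=sql,
    )
    statement_id = submitted["Id"]

    for _ in range(max_polls):
        description = client.describe_statement(Id=statement_id)
        status = description["Status"]
        if status in TERMINAL_STATES:
            break
        time.sleep(poll_seconds)
    else:
        raise TimeoutError(f"statement {statement_id} did not finish in time")

    if status != "FINISHED":
        raise RuntimeError(description.get("Error", status))

    # Statements such as DDL have no result set to fetch.
    if description.get("HasResultSet"):
        return client.get_statement_result(Id=statement_id)["Records"]
    return []


# Hypothetical usage (requires boto3 and AWS credentials, so left commented):
# import boto3
# rows = run_statement(boto3.client("redshift-data"),
#                      "select count(*) from game_score",
#                      "<your-cluster-id>", "lab", "admin")
```

Passing the client in as an argument keeps the helper easy to try out with a stub before pointing it at a real cluster.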
---
## New Features missing from course
### Performance
* Materialized Views GA - 2020-03-12 - https://aws.amazon.com/about-aws/whats-new/2020/03/amazon-redshift-introduces-support-for-materialized-views/
* Materialized Views Automatic Refresh + Rewrite - 2020-11-11 - https://aws.amazon.com/about-aws/whats-new/2020/11/amazon-redshift-announces-automatic-refresh-and-query-rewrite-for-materialized-views/
* AQUA now GA - 2021-04-14
https://aws.amazon.com/about-aws/whats-new/2021/04/aws-announces-general-availability-of-aqua-for-amazon-redshift/
* Amazon Redshift now recommends distribution keys for improved query performance, August 20, 2019
Announcement:
https://aws.amazon.com/about-aws/whats-new/2019/08/amazon-redshift-now-recommends-distribution-keys-for-improved-query-performance/
* Heuristics for choosing the distribution style:
  * https://docs.aws.amazon.com/redshift/latest/dg/c_best-practices-best-dist-key.html
  * https://docs.aws.amazon.com/redshift/latest/dg/c_choosing_dist_sort.html
### New Automation
* Automatic Table Optimization - 2020-12-09
https://aws.amazon.com/about-aws/whats-new/2020/12/amazon-redshift-announces-automatic-table-optimization/
### Availability
* Cluster Relocation between AZs for RA3 clusters - 2020-12-09
https://aws.amazon.com/about-aws/whats-new/2020/12/amazon-redshift-launches-ability-easily-move-clusters-between-aws-availability-zones/
* Data Sharing GA - 2021-03-10 - https://aws.amazon.com/about-aws/whats-new/2021/03/announcing-general-availability-of-amazon-redshift-data-sharing/
* Data Sharing while Producers are Paused - 2021-04-12
https://aws.amazon.com/about-aws/whats-new/2021/04/amazon-redshift-supports-data-sharing-producer-clusters-paused/
### Programmability
* SUPER Datatype - 2020-12-09
https://aws.amazon.com/about-aws/whats-new/2020/12/amazon-redshift-announces-support-native-json-semi-structured-data-processing/
* SUPER Datatype documentation
https://docs.aws.amazon.com/redshift/latest/dg/r_SUPER_type.html
* Amazon Redshift now automatically picks the best distribution style, announced March 28, 2019:
Announcement: https://aws.amazon.com/about-aws/whats-new/2019/03/amazon-redshift-now-automatically-picks-the-best-distribution-st/
* Redshift Concurrency Scaling, announced March 27, 2019:
Blog post by Jeff Barr
https://aws.amazon.com/blogs/aws/new-concurrency-scaling-for-amazon-redshift-peak-performance-at-all-times/
Documentation
https://docs.aws.amazon.com/redshift/latest/dg/concurrency-scaling.html
* Redshift Auto Analyze, announced January 18, 2019
Announcement https://aws.amazon.com/about-aws/whats-new/2019/01/amazon-redshift-auto-analyze/
Documentation https://docs.aws.amazon.com/redshift/latest/dg/t_Analyzing_tables.html#t_Analyzing_tables-auto-analyze
Enabled by default; can be disabled by setting the `auto_analyze` parameter to `false`.
* VACUUM DELETE now automatically performed, announced December 19, 2018:
https://aws.amazon.com/about-aws/whats-new/2018/12/amazon-redshift-automatic-vacuum/
* Redshift Short Query Acceleration, announced August 8, 2018
Announcement
https://aws.amazon.com/about-aws/whats-new/2018/08/amazon-redshift-automatically-enables-short-query-acceleration/
Documentation
https://docs.aws.amazon.com/redshift/latest/dg/wlm-short-query-acceleration.html
* Lake Formation presentation from re:Invent 2018
https://www.youtube.com/watch?v=nsiLMqg654s