# AZURE DP203: Day 2
# Lab Overviews:
## Module 3: Data exploration and transformation in Azure Databricks
* Describe Azure Databricks
* Read and write data in Azure Databricks
* Work with DataFrames in Azure Databricks
* Work with DataFrames advanced methods in Azure Databricks
## Module 4:
* Understand big data engineering with Apache Spark in Azure Synapse Analytics
* Ingest data with Apache Spark notebooks in Azure Synapse Analytics
* Transform data with DataFrames in Apache Spark Pools in Azure Synapse Analytics
* Integrate SQL and Apache Spark Pools in Azure Synapse Analytics
## Module 5:
* Use data loading best practices in Azure Synapse Analytics
* Petabyte-scale ingestion with Azure Data Factory or Azure Synapse Pipelines
# Course Overview:
* Explore Azure Synapse serverless SQL pool capabilities: data exploration, data transformation, logical data warehouse
* Users can access data in storage through an Azure Synapse Analytics dedicated SQL pool, which supports both SQL authentication and Azure AD authentication (different levels of credentials)
* Different levels of credentials (authentication from low to high): user/group, service principal/app registration, managed identity (system-assigned, user-assigned [the highest])
* E.g., a VM assigned a managed identity can read the database (the managed identity handles database authentication)

* Securing access to data in a data lake when using Azure Synapse Analytics: Authentication -> Authorization -> Access to storage accounts

* What can Azure Synapse serverless & SQL pools do? https://learn.microsoft.com/en-us/azure/synapse-analytics/sql/on-demand-workspace-overview
* Query different data types (e.g., Parquet, JSON, CSV…)
* What is Azure Synapse Studio platform?
* How can the user add role assignments to storage (data lake/SQL pools)? : https://learn.microsoft.com/zh-tw/azure/role-based-access-control/role-assignments-portal
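The serverless SQL pool reads Parquet, JSON, and CSV files directly from the data lake (via `OPENROWSET` in T-SQL). Purely as a local illustration of the idea, not the Synapse API, here is how the same logical rows look when parsed from CSV and JSON with Python's standard library (Parquet would need an extra library such as pyarrow):

```python
import csv
import io
import json

# Hypothetical sample records, as they might sit in data lake files.
csv_text = "id,name\n1,Anna\n2,Ben\n"
json_text = '[{"id": 1, "name": "Anna"}, {"id": 2, "name": "Ben"}]'

# CSV: each data row becomes a dict keyed by the header line
# (note all CSV values come back as strings).
csv_rows = list(csv.DictReader(io.StringIO(csv_text)))

# JSON: the document parses straight into native Python objects,
# so numeric fields keep their types.
json_rows = json.loads(json_text)

print(csv_rows[0]["name"])   # Anna
print(json_rows[1]["id"])    # 2
```

The point of the comparison: the format changes how values are typed and parsed, but the query layer (serverless SQL here, Python dicts in the sketch) exposes the same logical rows.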


* Manage user access to data lake files: https://docs.informatica.com/data-integration/powerexchange-adapters-for-informatica/h2l/1357-prerequisites-to-create-a-microsoft-azure-data-lake-storage/prerequisites-to-create-a-microsoft-azure-data-lake-storage-gen2/setting-permissions-for-microsoft-azure-data-lake-store-gen2--ac.html

* How to change the setting for “Manage ACL”? https://learn.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-acl-azure-portal
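ADLS Gen2 ACLs are POSIX-like: each entry grants some combination of read/write/execute (rwx) to a user or group. As a toy sketch only (not the Azure SDK — the real checks happen inside the storage service), evaluating such entries could look like:

```python
# Toy evaluation of POSIX-style ACL entries, as ADLS Gen2 uses
# (illustration only; principals and permission strings are made up).

ACTIONS = {"read": "r", "write": "w", "execute": "x"}

def is_allowed(acl, principal, action):
    """Return True if the principal's ACL entry contains the permission bit."""
    perms = acl.get(principal, "---")   # no entry means no access
    return ACTIONS[action] in perms

acl = {"user:anna": "r-x", "group:engineers": "rwx"}

print(is_allowed(acl, "user:anna", "read"))    # True
print(is_allowed(acl, "user:anna", "write"))   # False
print(is_allowed(acl, "user:bob", "read"))     # False
```

One ADLS-specific detail worth remembering from the lab: to list or traverse a directory, a principal needs the execute (x) bit on that directory, not just read on the files inside it.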
* Understand Azure Databricks:

* Introduction to big data engineering with Apache Spark in Azure Synapse Analytics:

* How do Apache Spark pools work in Azure Synapse Analytics?
* Create a new serverless Apache Spark pool using the Azure portal:

* Transform data with DataFrames in Apache Spark Pools in Azure Synapse Analytics:
* Step 1. Define a function for flattening
* Step 2. Flatten the nested schema
* Step 3. Explode arrays
* Step 4. Flatten the child nested schema
* Quick knowledge & practice: https://learn.microsoft.com/en-us/training/modules/understand-big-data-engineering-with-apache-spark-azure-synapse-analytics/
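The lab performs these steps on Spark DataFrames with `select`/`alias` and `explode`. Purely to illustrate the idea in a runnable form (plain Python on dicts, not the PySpark API), the flatten and explode steps might be sketched as:

```python
# Sketch of the flatten/explode idea in plain Python; the lab does the
# same on Spark DataFrames. Record shapes below are made-up examples.

def flatten(record, prefix=""):
    """Steps 1-2: recursively pull nested fields to the top level,
    joining names with underscores (user.name -> user_name)."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}_"))
        else:
            flat[name] = value
    return flat

def explode(records, array_field):
    """Step 3: turn one record holding a list into one record per
    element, like Spark's explode()."""
    out = []
    for rec in records:
        for item in rec.get(array_field, []):
            row = {k: v for k, v in rec.items() if k != array_field}
            row[array_field] = item
            out.append(row)
    return out

nested = {"id": 1,
          "user": {"name": "Anna", "address": {"city": "Oslo"}},
          "tags": ["a", "b"]}

flat = flatten(nested)          # {'id': 1, 'user_name': 'Anna', ...}
rows = explode([flat], "tags")  # one row per tag value
```

Step 4 (flattening a child nested schema) is just `flatten` applied again after an explode, when the exploded array elements are themselves structs.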
* Integrate SQL and Apache Spark pools in Azure Synapse Analytics
* Quick knowledge & practice: https://learn.microsoft.com/en-us/training/modules/integrate-sql-apache-spark-pools-azure-synapse-analytics/2-describe-integration-methods-between-sql
* Dedicated SQL Pool architecture
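A dedicated SQL pool shards every table across 60 distributions; for a hash-distributed table, a deterministic hash of the distribution column decides which distribution each row lands in. A toy sketch of that assignment (the real hash function is internal to Synapse; `crc32` here is just a stand-in):

```python
import zlib

# A dedicated SQL pool always shards data across 60 distributions.
NUM_DISTRIBUTIONS = 60

def assign_distribution(key):
    """Toy stand-in for the engine's hash: same key -> same distribution."""
    return zlib.crc32(str(key).encode("utf-8")) % NUM_DISTRIBUTIONS

# Rows sharing a distribution-column value land in the same distribution,
# which is why joins/aggregations on that column avoid data movement.
orders = [("cust-1", 10), ("cust-2", 25), ("cust-1", 40)]
placement = [(cust, amount, assign_distribution(cust))
             for cust, amount in orders]
```

This is also why the choice of distribution column matters in the labs: a skewed column puts most rows in a few distributions and wastes the parallelism.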

* Azure Data Factory / Synapse pipeline revision

* Understand integration runtimes: Azure Integration Runtime, Self-hosted Integration Runtime