# AZURE DP203: Day 2

# Labs Overviews:

## Module 3: Data exploration and transformation in Azure Databricks

* Describe Azure Databricks
* Read and write data in Azure Databricks
* Work with DataFrames in Azure Databricks
* Work with DataFrames advanced methods in Azure Databricks

## Module 4:

* Understand big data engineering with Apache Spark in Azure Synapse Analytics
* Ingest data with Apache Spark notebooks in Azure Synapse Analytics
* Transform data with DataFrames in Apache Spark pools in Azure Synapse Analytics
* Integrate SQL and Apache Spark pools in Azure Synapse Analytics

## Module 5:

* Use data loading best practices in Azure Synapse Analytics
* Petabyte-scale ingestion with Azure Data Factory or Azure Synapse pipelines

# Course Overviews:

* Explore Azure Synapse serverless SQL pool capabilities: data exploration, data transformation, logical data warehouse
* Users can use an Azure Synapse Analytics dedicated SQL pool, which supports SQL authentication and Azure AD authentication, to access the data in storage (with different levels of credentials)
* Different levels of credentials (authentication from low to high): user/group, service principal/app registration, managed identity (system-assigned, user-assigned [the highest])
* E.g. a VM assigned a managed identity can read the database (the managed identity authenticates to the database)
![](https://i.imgur.com/JN6b6RQ.jpg)
* Securing access to data in a data lake when using Azure Synapse Analytics: Authentication -> Authorization -> Access to storage accounts
![](https://i.imgur.com/S7WNbOw.png)
* What can Azure Synapse serverless SQL pools do? https://learn.microsoft.com/en-us/azure/synapse-analytics/sql/on-demand-workspace-overview
* Query different types of data (e.g. Parquet, JSON, CSV, ...)
* What is the Azure Synapse Studio platform?
* How can the user add a role assignment to storage (data lake/SQL pools)? https://learn.microsoft.com/zh-tw/azure/role-based-access-control/role-assignments-portal
![](https://i.imgur.com/t9IrVTa.png)
![](https://i.imgur.com/j4WjgZ7.png)
* Manage user access to data lake files: https://docs.informatica.com/data-integration/powerexchange-adapters-for-informatica/h2l/1357-prerequisites-to-create-a-microsoft-azure-data-lake-storage/prerequisites-to-create-a-microsoft-azure-data-lake-storage-gen2/setting-permissions-for-microsoft-azure-data-lake-store-gen2--ac.html
![](https://i.imgur.com/w4h8ZCL.png)
* How to change the settings under "Manage ACL"? https://learn.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-acl-azure-portal
* Understand Azure Databricks:
![](https://i.imgur.com/lIXAG6N.png)
* Introduction to big data engineering with Apache Spark in Azure Synapse Analytics:
![](https://i.imgur.com/hOO1h6Z.png)
* How do Apache Spark pools work in Azure Synapse Analytics?
* Create a new serverless Apache Spark pool using the Azure portal:
![](https://i.imgur.com/Yp3LNXT.png)
* Transform data with DataFrames in Apache Spark pools in Azure Synapse Analytics (see the sketches after this list):
    * Step 1. Define a function for flattening
    * Step 2. Flatten the nested schema
    * Step 3. Explore arrays
    * Flatten the child nested schema
* Quick knowledge & practice: https://learn.microsoft.com/en-us/training/modules/understand-big-data-engineering-with-apache-spark-azure-synapse-analytics/
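The ingestion items above (reading and writing data in Databricks, ingesting data with Apache Spark notebooks in Synapse) boil down to loading files from the linked ADLS Gen2 account into a DataFrame. A minimal PySpark sketch, assuming a Spark session is already available as `spark`; the storage account, container, path, and table name below are hypothetical placeholders:

```python
# Hedged sketch: read raw Parquet files from ADLS Gen2 into a DataFrame and
# persist them as a table. Account, container, and path are placeholders.
raw_path = "abfss://raw@mydatalakestorage.dfs.core.windows.net/sales/2023/"

df = spark.read.parquet(raw_path)   # schema is inferred from the Parquet footer
df.printSchema()                    # quick data exploration
print(df.count())

# Write the ingested data back to the lake and register it as a table
(df.write
   .mode("overwrite")
   .saveAsTable("bronze_sales"))
```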
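For the flattening steps listed above (define a function, flatten the nested schema, explode arrays), the following is a minimal sketch of one common approach rather than the exact lab code; `spark`, the source path, and the DataFrame name are assumptions:

```python
# Hedged sketch: repeatedly expand struct columns and explode array columns
# until no nested fields remain.
from pyspark.sql import DataFrame
from pyspark.sql.functions import col, explode_outer
from pyspark.sql.types import ArrayType, StructType

def flatten(df: DataFrame) -> DataFrame:
    """Step 1: a helper that flattens struct and array columns."""
    complex_fields = {f.name: f.dataType for f in df.schema.fields
                      if isinstance(f.dataType, (StructType, ArrayType))}
    while complex_fields:
        name, dtype = next(iter(complex_fields.items()))
        if isinstance(dtype, StructType):
            # Step 2: promote each child field to a top-level column
            expanded = [col(f"{name}.{child.name}").alias(f"{name}_{child.name}")
                        for child in dtype.fields]
            df = df.select("*", *expanded).drop(name)
        else:
            # Step 3: explode arrays so each element becomes its own row
            df = df.withColumn(name, explode_outer(col(name)))
        complex_fields = {f.name: f.dataType for f in df.schema.fields
                          if isinstance(f.dataType, (StructType, ArrayType))}
    return df

# Example usage (hypothetical nested JSON source):
# nested_df = spark.read.json("abfss://raw@mydatalakestorage.dfs.core.windows.net/orders/")
# flatten(nested_df).printSchema()
```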
* Integrate SQL and Apache Spark pools in Azure Synapse Analytics (a minimal sketch appears at the end of this section)
* Quick knowledge & practice: https://learn.microsoft.com/en-us/training/modules/integrate-sql-apache-spark-pools-azure-synapse-analytics/2-describe-integration-methods-between-sql
![](https://i.imgur.com/EAqn1F2.png)
* Dedicated SQL pool architecture
![](https://i.imgur.com/HROf6YT.png)
* Azure Data Factory/Synapse pipeline revision
![](https://i.imgur.com/mWyuIa1.jpg)
* Understand integration runtimes: Azure Integration Runtime, Self-hosted Integration Runtime
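To accompany the "Integrate SQL and Apache Spark pools" item above: one generic way to read a dedicated SQL pool table from a Spark pool is the standard Spark JDBC reader (the Synapse-native dedicated SQL pool connector is the alternative covered in the linked module). A minimal sketch; the workspace, database, table, and login names are hypothetical, and in practice a managed identity or Key Vault-backed secret is preferable to an inline password:

```python
# Hedged sketch: read a dedicated SQL pool table into a Spark DataFrame over JDBC.
jdbc_url = (
    "jdbc:sqlserver://myworkspace.sql.azuresynapse.net:1433;"
    "database=SalesDW;encrypt=true;trustServerCertificate=false"
)

fact_sales = (
    spark.read.format("jdbc")
    .option("url", jdbc_url)
    .option("dbtable", "dbo.FactSales")
    .option("user", "sqladminuser")
    .option("password", "<password>")
    .load()
)

fact_sales.show(5)
```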