# Azure Synapse Analytics 2020/10/14
###### tags: `Azure` `Azure Synapse` `Microsoft` `Synapse` `Data Warehouse` `雲端` `分析` `資料倉儲` `數據倉儲`
2020/10/14 13:00-17:00
微軟研討會
打破資料孤島,構建資料與業務中台的關鍵服務:Azure Synapse Analytics 實作坊
### Azure Portal上目前有3種選擇
* Azure Synapse Analytics(formerly SQl DW)
* Azure Synapse Analytics (workspaces preview)
* Azure Synapse Analytics (private link hubs preview)
#### Auzre Synapse Analytics
先前為Azure DW,Azure Synapse Analytics是一組服務
建立後會出現以下Azure resource
1. Azure Synapse Analytics(formerly SQl DW)
* Synapse SQL pool (data warehouse)
* Storage Account (Azure Data Lake Storage)
* SQL server (xxx.database.windows.net)
2. Azure Synapse Analytics (workspaces preview)
* Synapse workspace
* SQL pool (如果有建立)(xxx.sql.azuresynapse.net 是由微軟配置的SQL Server)
* Apache Spark pool (如果有建立)
![](https://i.imgur.com/OQaWKZX.png)
Azure Synapse Analytics(formerly SQl DW)
Synapse SQL pool (data warehouse)
![](https://i.imgur.com/JvgOsXq.png)
Azure Synapse Analytics (workspaces preview)
Synapse workspace
![](https://i.imgur.com/PGp5Z4N.png)
官方文件
Azure Synapse 有四個元件:
* Synapse SQL:完成以 T-SQL 為基礎的分析 - 正式推出
* SQL 集區 (依據佈建的 DWU 付費)
* SQL 隨選 (依據處理的 TB 量付費) (預覽)
* Spark:深入整合的 Apache Spark (預覽)
* Synapse Pipelines:混合式資料整合 (預覽)
* Studio:整合的使用者體驗。 (預覽)
[Lab File](https://1drv.ms/u/s!AtRjVGjLsVVticlQEgKXYVryZUv-PA?e=JvfS9F)
Lab 1
Azure Synapse Analytics(formerly SQl DW) 傳統Data warehouse(沒有spark)
1.Create Azure Synapse Analytics(formerly SQl DW) in Azure portal
[Synapse SQL pool]
SQL Server: synapsebrad.database.windows.net
Pool: 例如設定為 sqlpool01 (在sql物件就是db name)
sqladmin
Pa55w.rd1234
performance level
因為是測試所以將scale等級從預設1000先降到200
平時不用Pause就只會計算儲存空間的費用
2.create workload group in Azure portal
Workload management-new workload group
設定Min esources % 例如設定為 20
Min. resources % per request 例如設定為 5
Cap resources % 預設 100
3.CREATE a LOGIN in SSMS(connect to [Synapse SQL pool])
```
--master
IF NOT EXISTS (SELECT * FROM sys.sql_logins WHERE name = 'ELTLogin')
BEGIN
CREATE LOGIN [ELTLogin] WITH PASSWORD='Aasd456++'
END;
--sqlpool01
IF NOT EXISTS (SELECT * FROM sys.database_principals WHERE name = 'ELTLogin')
BEGIN
CREATE USER [ELTLogin] FOR LOGIN [ELTLogin]
GRANT CONTROL ON DATABASE::sqlpool01 TO ELTLogin
END;
```
4.Create WORKLOAD CLASSIFIER for the login in SSMS
select * from sys.workload_management_workload_groups
select * from sys.workload_management_workload_classifiers
select * from sys.workload_management_workload_classifier_details
CREATE WORKLOAD CLASSIFIER ELTLogin
WITH
(
WORKLOAD_GROUP = 'WGMedium'
,MEMBERNAME = 'ELTLogin'
,IMPORTANCE = HIGH
,WLM_LABEL = 'loadsfact'
);
SELECT c.[name], c.group_name, c.importance, cd.classifier_type, cd.classifier_value
FROM sys.workload_management_workload_classifiers c
JOIN sys.workload_management_workload_classifier_details cd
ON cd.classifier_id = c.classifier_id
WHERE c.name = 'ELTLogin'
-- gen sample load data
CREATE TABLE factstaging (ColA int)
INSERT INTO factstaging VALUES(0)
INSERT INTO factstaging VALUES(1)
INSERT INTO factstaging VALUES(2)
GO
5.登入ETLlogin帳號測試insert data in SSMS
用CREATE AS SELECT寫入資料模擬分析查詢的workload
-switch user to ELTLogin
CREATE TABLE testclassifierfact2 WITH (DISTRIBUTION = ROUND_ROBIN)
AS
SELECT * FROM factstaging
OPTION (LABEL='loadsfact')
-- 查詢剛剛語法是否有落入指定的CLASSIFIER
SELECT TOP 1 request_id, classifier_name, group_name, resource_allocation_percentage, submit_time, [status], [label], command
FROM sys.dm_pdw_exec_requests
WHERE [label] = 'loadsfact'
ORDER BY submit_time DESC
Lab 2
Azure Synapse Analytics (workspaces preview)
1.Create Azure Synapse Analytics(formerly SQl DW) in Azure portal
workspace: synapasewsbrad
storage account: synapasestbrad
container: synapase
預設管理員帳號 sqladminuser
輸入自訂密碼 Pa55w.rd1234
2.firewall
Allow Azure services and resources to access this workspace
change to ON
3.create a SQL Pool
name: sqlpool01
performance level同樣降到200
因為是測試所以將scale等級從預設1000先降到200
因為目前是preview所以workload group功能還沒有出現
synapsewsbrad.sql.azuresynapse.net
sqlpool01
4.create a spark pool
pool name: spark01
node size 因為是測試降成small
autospale 因為是測試diskable
number of nodes: 因為是測試降低4
建立很快只是配置檔完成
後續第一次執行notebook語法才會真正初始化主機
5.回到Ovewview
點選 Workspace web URL
進入 Azure Synapse Analytics的Web開發與管理介面
左邊選擇 Develop,然後按下+號,選SQL Scripts
-- type your sql script here, we now have intellisense
CREATE TABLE [dbo].[Trip]
(
[DateID] int NOT NULL,
[MedallionID] int NOT NULL,
[HackneyLicenseID] int NOT NULL,
[PickupTimeID] int NOT NULL,
[DropoffTimeID] int NOT NULL,
[PickupGeographyID] int NULL,
[DropoffGeographyID] int NULL,
[PickupLatitude] float NULL,
[PickupLongitude] float NULL,
[PickupLatLong] varchar(50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[DropoffLatitude] float NULL,
[DropoffLongitude] float NULL,
[DropoffLatLong] varchar(50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[PassengerCount] int NULL,
[TripDurationSeconds] int NULL,
[TripDistanceMiles] float NULL,
[PaymentType] varchar(50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[FareAmount] money NULL,
[SurchargeAmount] money NULL,
[TaxAmount] money NULL,
[TipAmount] money NULL,
[TollsAmount] money NULL,
[TotalAmount] money NULL
)
WITH
(
DISTRIBUTION = ROUND_ROBIN,
CLUSTERED COLUMNSTORE INDEX
);
COPY INTO [dbo].[Trip]
FROM 'https://nytaxiblob.blob.core.windows.net/2013/Trip2013/QID6392_20171107_05910_0.txt.gz'
WITH
(
FILE_TYPE = 'CSV',
FIELDTERMINATOR = '|',
FIELDQUOTE = '',
ROWTERMINATOR='0X0A',
COMPRESSION = 'GZIP'
)
OPTION (LABEL = 'COPY : Load [dbo].[Trip] - Taxi dataset');
SELECT PassengerCount,
SUM(TripDistanceMiles) as SumTripDistance,
AVG(TripDistanceMiles) as AvgTripDistance
FROM dbo.Trip
WHERE TripDistanceMiles > 0 AND PassengerCount > 0
GROUP BY PassengerCount
ORDER BY PassengerCount
6.進入 Azure Synapse Analytics的Web開發與管理介面
左邊選擇 Develop,然後按下+號,選Import
選 Analyze with Apache Spark.ipynb
執行第一段就會開始初始化主機(會稍微久一點)
from azureml.opendatasets import NycTlcYellow
data = NycTlcYellow()
data_df = data.to_spark_dataframe()
# Display 10 rows
display(data_df.limit(10))
7.進入 Azure Synapse Analytics的Web開發與管理介面
左邊選擇 Develop,然後按下+號,選Import
Analyze data with SQL on-demand.sql
Connect to 改成SQL on-demand
-- 此URL路徑下是有很多個檔案,以下語法一次查詢出來 (約36 sec)
SELECT
TOP 100 *
FROM
OPENROWSET(
BULK 'https://azureopendatastorage.blob.core.windows.net/nyctlc/yellow/puYear=*/puMonth=*/*.parquet',
FORMAT = 'parquet'
) AS [result];
-- 下方這句須等6都做完才會有資料
SELECT *
FROM nyctaxi.dbo.passengercountstats
8.Azure Synapse Link for CosmosDB
自動複寫CosmosDB即時資料到背景空間給Azure Synapse取用分析
Create CosmosDB in Azure Portal
Account Type可以改成production
Multi-region wrtite改成disable
neworking
改成All Network
設定
Feature
啟用Azure Synapse Link (約3分鐘,會在背後建立一個空間自動即時同步資料給Synapse用)
Data Explorer
1.new一個database
RetailData
level 400
2.new container
container id: StoreDemoGraphics (依據csv檔名來新增)
partition key: /id
analyitca store: on(預設也必須是on)
container id: RetailSales (依據csv檔名來新增)
partition key: /id
analyitca store: on(預設也必須是on)
container id: Products (依據csv檔名來新增)
partition key: /id
analyitca store: on(預設也必須是on)
進入 Azure Synapse Analytics的Web開發與管理介面
Data
的Linked + Connect to external data 加入 CosmosDB
9.到storage account建立一個retaildata資料夾(Container)
,然後上傳csv檔
Products.csv
RetailSales.csv
StoreDemoGraphics.csv
10.Power BI帳號
個人帳號無法使用PowreBI
所以要用Azure AD建立的帳號才能登入
建立後到PowreBI網站,使用此Azure AD新帳號申請試用
登入Web PowerBI後,改用心檢視畫面
左邊workspaces,啟用PowerBI Pro 60天免費試用
new workspaces
加入存取權給Azure AD管理員
回到 Azure Synapse Analytics的Web開發與管理介面
new linked 選擇powerBI
切到develop就有一個PowerBI
下載一個檔案,就可以用Power BI Desktop開發
開發完就發佈到Power BI Web工作區
此時就可以在Azure Synapse Analytics的Web開發與管理介面
develop的PowerBI就會出現此報表,並且可以修改開發,未來就直接在Azure Synapse Analytics開發報表