Dremio Performance
==
###### tags: `infortrend`
[TOC]
# Prerequisites (!!安裝前需要確認!!)
:::danger
* 執行測試時,監控 CPU和 Memory 的使用量
* 相關測試文件
* [**Infortrend 測試報告**](https://docs.google.com/spreadsheets/d/1HzZeIaoYSQRguL9ltCo2yXsaZRZUpLmi9VaDA5DA-5I/edit#gid=48282311)
* [**Dell 測試文件**](https://www.delltechnologies.com/asset/en-us/products/storage/industry-market/h18538-dremio-perf-on-powerscale-ecs.pdf)
* [**Dremio Benchmarking**](https://www.dremio.com/blog/dremio-benchmarking-methodology/#h-benchmarking-test-suite)
:::
# Test Step
## Step 1. Install Dremio
* 安裝Dremio前須要先安裝minio server
* 規格:


* minio的minio-data-pvc建立30TB的空間

* minio上建立一個叫benchmarking的bucket

* 安裝Dremio, 1 coordinator and 4 executor nodes.
* 規格:




* 安裝Dremio時App Setting的bucket name輸入為benchmarking

## Step 2. Generate data
### Step 2-1. Prepare Dremio data folder
* 在K8S Cluster上建立一個dremio-data的PVC, Storage Size為30TB

* 選Many Nodes Read-Write就建立

* 找到以下圖中PVC所在的k8s node並用ssh連線到那個node上

* 需要在有跑etcd的node上執行以下的cmd
* 找到pv對應的(部分)volumeId
```bash=
[root@ks-9202636 ~] kubectl get pv
```

* 透過etcd找到這個pv的資訊
```bash=
[root@ks-9202636 ~] docker exec etcd etcdctl get --prefix /hci/pvi | grep 9e7130f3
```

* deviceId跟hostname的對照
```bash=
[root@ks-9202636 ~] docker exec etcd etcdctl get --prefix /hci/de
```

/Pool_head/9e7130f3a4f643b5a2e9677ef8136682/pvc_9e7130f3a4f643b5a2e9677ef8136682/是ks-9202636上PVC所在的路徑
* 下以下指令啟動ubuntu為dremio產生資料準備
```bash=
docker run -it -v /Pool_head/9e7130f3a4f643b5a2e9677ef8136682/pvc_9e7130f3a4f643b5a2e9677ef8136682/:/mnt infortrend/dremio-client
```
### Step 2-2. 使用 TPC-DS dsdgen utility 來產生 10 TB 的資料
```bash=
mkdir /mnt/data
mkdir /mnt/result
cd /root/tpcds-kit/tools
./dsdgen -scale 10000 -f -dir /mnt/data -terminate n -parallel 15 -child 1 &
./dsdgen -scale 10000 -f -dir /mnt/data -terminate n -parallel 15 -child 2 &
./dsdgen -scale 10000 -f -dir /mnt/data -terminate n -parallel 15 -child 3 &
./dsdgen -scale 10000 -f -dir /mnt/data -terminate n -parallel 15 -child 4 &
./dsdgen -scale 10000 -f -dir /mnt/data -terminate n -parallel 15 -child 5 &
./dsdgen -scale 10000 -f -dir /mnt/data -terminate n -parallel 15 -child 6 &
./dsdgen -scale 10000 -f -dir /mnt/data -terminate n -parallel 15 -child 7 &
./dsdgen -scale 10000 -f -dir /mnt/data -terminate n -parallel 15 -child 8
```
:::danger
Note: 等待一下全部的thread執行完
:::
### Step 2-3. 執行以下 script,將產生出來的 data 加上[第一列的名稱](https://github.com/sergeleo/dremio-tpc-ds/blob/master/Scripts/TPC-DS%20Data%20Prep/DatasetHeaders/tables_list.txt)
一樣在docker中下以下指令
```bash=
cd /root/dremio-tpc-ds/Scripts/TPC-DS\ Data\ Prep/
./data_convertion.sh
```
:::danger
Note: 產生完所有檔案後/mnt/result/底下要有24個檔案
:::
## Step 3. Data Conversion to Parquet
* 使用 Dremio CTAS (Create Table as Select) 將資料轉成 Parquet 格式
* 將產生的 TPC-DS 資料集移至 AWS S3(minio server) 中
```bash=
cd /mnt/result/
mc alias set benchmarking http://<minio LoadBalancer IP>:9000 admin admin123
mc cp ./ benchmarking/benchmarking/tpcds_sf10000/csv/ --recursive
```
* 進入Dremio的網頁(http://< dremio loadbalancer IP>:9047)後註冊username為dremio, password為dremio00, 如下圖:

* 註冊成功後選擇Add source
* 選擇Amazon s3後General menu中填寫從minio server抓過來的資料夾名稱, minio access key, minio secret key

* 移除打勾Encrypt connection

* Advanced Options中Enable compatibility mode要打勾

* 按Add properties後填寫name為fs.s3a.endpoint, value為minio loadbalancer IP與api port(9000)並按save

```bash=
cd /root/dremio-tpc-ds/Scripts/TPC-DS\ Data\ Prep/
vi PromoteCSVtoPDS.sh #內容修改成<dremio loadbalancer IP>
```
```bash=
#!/bin/bash
# Authenticate and extract the token
TOKEN=$(curl -X POST http://<dremio loadbalancer IP>:9047/apiv2/login -H 'Content-Type: application/json' -d '{"userName": "dremio", "password": "dremio00"}'| jq -r '.token' )
# Iterate through datasets
for i in `cat DatasetHeaders/tables_list.txt`
do
DATAOUTPUT=$(curl -X PUT http://<dremio loadbalancer IP>:9047/apiv2/source/benchmarking/folder_format/benchmarking/tpcds_sf10000/csv/$i.csv -H 'Authorization: _dremio'"$TOKEN"'' -H 'Content-Type: application/json' -d '{"type": "Text", "fieldDelimiter": "|", "skipFirstLine": false, "extractHeader": true, "quote": "\"", "comment": "#", "escape": "\"", "lineDelimiter": "\n", "autoGenerateColumnNames": true, "trimHeader": true}');
echo $DATAOUTPUT;
done
```
* 儲存完後如下執行
```bash=
./PromoteCSVtoPDS.sh
```
* 最後所有CSV檔轉換成Parquet file format
```bash=
create table "benchmarking"."benchmarking"."tpcds_sf10000"."call_center" as SELECT CONVERT_TO_INTEGER(cc_call_center_sk, 1, 1, 0) AS cc_call_center_sk, cc_call_center_id, cc_rec_start_date, cc_rec_end_date, cc_closed_date_sk, cc_open_date_sk, cc_name, cc_class, CONVERT_TO_INTEGER(cc_employees, 1, 1, 0) AS cc_employees, CONVERT_TO_INTEGER(cc_sq_ft, 1, 1, 0) AS cc_sq_ft, cc_hours, cc_manager, CONVERT_TO_INTEGER(cc_mkt_id, 1, 1, 0) AS cc_mkt_id, cc_mkt_class, cc_mkt_desc, cc_market_manager, CONVERT_TO_INTEGER(cc_division, 1, 1, 0) AS cc_division, cc_division_name, CONVERT_TO_INTEGER(cc_company, 1, 1, 0) AS cc_company, cc_company_name, CONVERT_TO_INTEGER(cc_street_number, 1, 1, 0) AS cc_street_number, cc_street_name, cc_street_type, cc_suite_number, cc_city, cc_county, cc_state, cc_zip, cc_country, CONVERT_TO_INTEGER(cc_gmt_offset, 1, 1, 0) AS cc_gmt_offset, CONVERT_TO_FLOAT(cc_tax_percentage, 1, 1, 0) AS cc_tax_percentage FROM "benchmarking"."benchmarking"."tpcds_sf10000"."csv"."call_center.csv";
create table "benchmarking"."benchmarking"."tpcds_sf10000"."catalog_returns" as SELECT CONVERT_TO_INTEGER(cr_returned_date_sk, 1, 1, 0) AS cr_returned_date_sk, CONVERT_TO_INTEGER(cr_returned_time_sk, 1, 1, 0) AS cr_returned_time_sk, CONVERT_TO_INTEGER(cr_item_sk, 1, 1, 0) AS cr_item_sk, CONVERT_TO_INTEGER(cr_refunded_customer_sk, 1, 1, 0) AS cr_refunded_customer_sk, CONVERT_TO_INTEGER(cr_refunded_cdemo_sk, 1, 1, 0) AS cr_refunded_cdemo_sk, CONVERT_TO_INTEGER(cr_refunded_hdemo_sk, 1, 1, 0) AS cr_refunded_hdemo_sk, CONVERT_TO_INTEGER(cr_refunded_addr_sk, 1, 1, 0) AS cr_refunded_addr_sk, CONVERT_TO_INTEGER(cr_returning_customer_sk, 1, 1, 0) AS cr_returning_customer_sk, CONVERT_TO_INTEGER(cr_returning_cdemo_sk, 1, 1, 0) AS cr_returning_cdemo_sk, CONVERT_TO_INTEGER(cr_returning_hdemo_sk, 1, 1, 0) AS cr_returning_hdemo_sk, CONVERT_TO_INTEGER(cr_returning_addr_sk, 1, 1, 0) AS cr_returning_addr_sk, CONVERT_TO_INTEGER(cr_call_center_sk, 1, 1, 0) AS cr_call_center_sk, CONVERT_TO_INTEGER(cr_catalog_page_sk, 1, 1, 0) AS cr_catalog_page_sk, CONVERT_TO_INTEGER(cr_ship_mode_sk, 1, 1, 0) AS cr_ship_mode_sk, CONVERT_TO_INTEGER(cr_warehouse_sk, 1, 1, 0) AS cr_warehouse_sk, CONVERT_TO_INTEGER(cr_reason_sk, 1, 1, 0) AS cr_reason_sk, CONVERT_TO_INTEGER(cr_order_number, 1, 1, 0) AS cr_order_number, CONVERT_TO_INTEGER(cr_return_quantity, 1, 1, 0) AS cr_return_quantity, CONVERT_TO_FLOAT(cr_return_amount, 1, 1, 0) AS cr_return_amount, CONVERT_TO_FLOAT(cr_return_tax, 1, 1, 0) AS cr_return_tax, CONVERT_TO_FLOAT(cr_return_amt_inc_tax, 1, 1, 0) AS cr_return_amt_inc_tax, CONVERT_TO_FLOAT(cr_fee, 1, 1, 0) AS cr_fee, CONVERT_TO_FLOAT(cr_return_ship_cost, 1, 1, 0) AS cr_return_ship_cost, CONVERT_TO_FLOAT(cr_refunded_cash, 1, 1, 0) AS cr_refunded_cash, CONVERT_TO_FLOAT(cr_reversed_charge, 1, 1, 0) AS cr_reversed_charge, CONVERT_TO_FLOAT(cr_store_credit, 1, 1, 0) AS cr_store_credit, CONVERT_TO_FLOAT(cr_net_loss, 1, 1, 0) AS cr_net_loss FROM "benchmarking"."benchmarking"."tpcds_sf10000"."csv"."catalog_returns.csv";
create table "benchmarking"."benchmarking"."tpcds_sf10000"."catalog_sales" as SELECT CONVERT_TO_INTEGER(cs_sold_date_sk, 1, 1, 0) AS cs_sold_date_sk, CONVERT_TO_INTEGER(cs_sold_time_sk, 1, 1, 0) AS cs_sold_time_sk, CONVERT_TO_INTEGER(cs_ship_date_sk, 1, 1, 0) AS cs_ship_date_sk, CONVERT_TO_INTEGER(cs_bill_customer_sk, 1, 1, 0) AS cs_bill_customer_sk, CONVERT_TO_INTEGER(cs_bill_cdemo_sk, 1, 1, 0) AS cs_bill_cdemo_sk, CONVERT_TO_INTEGER(cs_bill_hdemo_sk, 1, 1, 0) AS cs_bill_hdemo_sk, CONVERT_TO_INTEGER(cs_bill_addr_sk, 1, 1, 0) AS cs_bill_addr_sk, CONVERT_TO_INTEGER(cs_ship_customer_sk, 1, 1, 0) AS cs_ship_customer_sk, CONVERT_TO_INTEGER(cs_ship_cdemo_sk, 1, 1, 0) AS cs_ship_cdemo_sk, CONVERT_TO_INTEGER(cs_ship_hdemo_sk, 1, 1, 0) AS cs_ship_hdemo_sk, CONVERT_TO_INTEGER(cs_ship_addr_sk, 1, 1, 0) AS cs_ship_addr_sk, CONVERT_TO_INTEGER(cs_call_center_sk, 1, 1, 0) AS cs_call_center_sk, CONVERT_TO_INTEGER(cs_catalog_page_sk, 1, 1, 0) AS cs_catalog_page_sk, CONVERT_TO_INTEGER(cs_ship_mode_sk, 1, 1, 0) AS cs_ship_mode_sk, CONVERT_TO_INTEGER(cs_warehouse_sk, 1, 1, 0) AS cs_warehouse_sk, CONVERT_TO_INTEGER(cs_item_sk, 1, 1, 0) AS cs_item_sk, CONVERT_TO_INTEGER(cs_promo_sk, 1, 1, 0) AS cs_promo_sk, CONVERT_TO_INTEGER(cs_order_number, 1, 1, 0) AS cs_order_number, CONVERT_TO_INTEGER(cs_quantity, 1, 1, 0) AS cs_quantity, CONVERT_TO_FLOAT(cs_wholesale_cost, 1, 1, 0) AS cs_wholesale_cost, CONVERT_TO_FLOAT(cs_list_price, 1, 1, 0) AS cs_list_price, CONVERT_TO_FLOAT(cs_sales_price, 1, 1, 0) AS cs_sales_price, CONVERT_TO_FLOAT(cs_ext_discount_amt, 1, 1, 0) AS cs_ext_discount_amt, CONVERT_TO_FLOAT(cs_ext_sales_price, 1, 1, 0) AS cs_ext_sales_price, CONVERT_TO_FLOAT(cs_ext_wholesale_cost, 1, 1, 0) AS cs_ext_wholesale_cost, CONVERT_TO_FLOAT(cs_ext_list_price, 1, 1, 0) AS cs_ext_list_price, CONVERT_TO_FLOAT(cs_ext_tax, 1, 1, 0) AS cs_ext_tax, CONVERT_TO_FLOAT(cs_coupon_amt, 1, 1, 0) AS cs_coupon_amt, CONVERT_TO_FLOAT(cs_ext_ship_cost, 1, 1, 0) AS cs_ext_ship_cost, CONVERT_TO_FLOAT(cs_net_paid, 1, 1, 0) AS cs_net_paid, CONVERT_TO_FLOAT(cs_net_paid_inc_tax, 1, 1, 0) AS cs_net_paid_inc_tax, CONVERT_TO_FLOAT(cs_net_paid_inc_ship, 1, 1, 0) AS cs_net_paid_inc_ship, CONVERT_TO_FLOAT(cs_net_paid_inc_ship_tax, 1, 1, 0) AS cs_net_paid_inc_ship_tax, CONVERT_TO_FLOAT(cs_net_profit, 1, 1, 0) AS cs_net_profit FROM "benchmarking"."benchmarking"."tpcds_sf10000"."csv"."catalog_sales.csv";
create table "benchmarking"."benchmarking"."tpcds_sf10000"."catalog_page" as SELECT CONVERT_TO_INTEGER(cp_catalog_page_sk, 1, 1, 0) AS cp_catalog_page_sk, cp_catalog_page_id, CONVERT_TO_INTEGER(cp_start_date_sk, 1, 1, 0) AS cp_start_date_sk, CONVERT_TO_INTEGER(cp_end_date_sk, 1, 1, 0) AS cp_end_date_sk, cp_department, CONVERT_TO_INTEGER(cp_catalog_number, 1, 1, 0) AS cp_catalog_number, CONVERT_TO_INTEGER(cp_catalog_page_number, 1, 1, 0) AS cp_catalog_page_number, cp_description, cp_type FROM "benchmarking"."benchmarking"."tpcds_sf10000"."csv"."catalog_page.csv";
create table "benchmarking"."benchmarking"."tpcds_sf10000"."customer_address" as SELECT CONVERT_TO_INTEGER(ca_address_sk, 1, 1, 0) AS ca_address_sk, ca_address_id, CONVERT_TO_INTEGER(ca_street_number, 1, 1, 0) AS ca_street_number, ca_street_name, ca_street_type, ca_suite_number, ca_city, ca_county, ca_state, ca_zip, ca_country, CONVERT_TO_INTEGER(ca_gmt_offset, 1, 1, 0) AS ca_gmt_offset, ca_location_type FROM "benchmarking"."benchmarking"."tpcds_sf10000"."csv"."customer_address.csv";
create table "benchmarking"."benchmarking"."tpcds_sf10000"."customer_demographics" as SELECT CONVERT_TO_INTEGER(cd_demo_sk, 1, 1, 0) AS cd_demo_sk, cd_gender, cd_marital_status, cd_education_status, CONVERT_TO_INTEGER(cd_purchase_estimate, 1, 1, 0) AS cd_purchase_estimate, cd_credit_rating, CONVERT_TO_INTEGER(cd_dep_count, 1, 1, 0) AS cd_dep_count, CONVERT_TO_INTEGER(cd_dep_employed_count, 1, 1, 0) AS cd_dep_employed_count, CONVERT_TO_INTEGER(cd_dep_college_count, 1, 1, 0) AS cd_dep_college_count FROM "benchmarking"."benchmarking"."tpcds_sf10000"."csv"."customer_demographics.csv";
create table "benchmarking"."benchmarking"."tpcds_sf10000"."customer" as SELECT CONVERT_TO_INTEGER(c_customer_sk, 1, 1, 0) AS c_customer_sk, c_customer_id, CONVERT_TO_INTEGER(c_current_cdemo_sk, 1, 1, 0) AS c_current_cdemo_sk, CONVERT_TO_INTEGER(c_current_hdemo_sk, 1, 1, 0) AS c_current_hdemo_sk, CONVERT_TO_INTEGER(c_current_addr_sk, 1, 1, 0) AS c_current_addr_sk, CONVERT_TO_INTEGER(c_first_shipto_date_sk, 1, 1, 0) AS c_first_shipto_date_sk, CONVERT_TO_INTEGER(c_first_sales_date_sk, 1, 1, 0) AS c_first_sales_date_sk, c_salutation, c_first_name, c_last_name, c_preferred_cust_flag, CONVERT_TO_INTEGER(c_birth_day, 1, 1, 0) AS c_birth_day, CONVERT_TO_INTEGER(c_birth_month, 1, 1, 0) AS c_birth_month, CONVERT_TO_INTEGER(c_birth_year, 1, 1, 0) AS c_birth_year, c_birth_country, c_login, c_email_address, CONVERT_TO_INTEGER(c_last_review_date_sk, 1, 1, 0) AS c_last_review_date_sk FROM "benchmarking"."benchmarking"."tpcds_sf10000"."csv"."customer.csv";
create table "benchmarking"."benchmarking"."tpcds_sf10000"."date_dim" as SELECT CONVERT_TO_INTEGER(d_date_sk, 1, 1, 0) AS d_date_sk, d_date_id, d_date, CONVERT_TO_INTEGER(d_month_seq, 1, 1, 0) AS d_month_seq, CONVERT_TO_INTEGER(d_week_seq, 1, 1, 0) AS d_week_seq, CONVERT_TO_INTEGER(d_quarter_seq, 1, 1, 0) AS d_quarter_seq, CONVERT_TO_INTEGER(d_year, 1, 1, 0) AS d_year, CONVERT_TO_INTEGER(d_dow, 1, 1, 0) AS d_dow, CONVERT_TO_INTEGER(d_moy, 1, 1, 0) AS d_moy, CONVERT_TO_INTEGER(d_dom, 1, 1, 0) AS d_dom, CONVERT_TO_INTEGER(d_qoy, 1, 1, 0) AS d_qoy, CONVERT_TO_INTEGER(d_fy_year, 1, 1, 0) AS d_fy_year, CONVERT_TO_INTEGER(d_fy_quarter_seq, 1, 1, 0) AS d_fy_quarter_seq, CONVERT_TO_INTEGER(d_fy_week_seq, 1, 1, 0) AS d_fy_week_seq, d_day_name, d_quarter_name, d_holiday, d_weekend, d_following_holiday, CONVERT_TO_INTEGER(d_first_dom, 1, 1, 0) AS d_first_dom, CONVERT_TO_INTEGER(d_last_dom, 1, 1, 0) AS d_last_dom, CONVERT_TO_INTEGER(d_same_day_ly, 1, 1, 0) AS d_same_day_ly, CONVERT_TO_INTEGER(d_same_day_lq, 1, 1, 0) AS d_same_day_lq, d_current_day, d_current_week, d_current_month, d_current_quarter, d_current_year FROM "benchmarking"."benchmarking"."tpcds_sf10000"."csv"."date_dim.csv";
create table "benchmarking"."benchmarking"."tpcds_sf10000"."household_demographics" as SELECT CONVERT_TO_INTEGER(hd_demo_sk, 1, 1, 0) AS hd_demo_sk, CONVERT_TO_INTEGER(hd_income_band_sk, 1, 1, 0) AS hd_income_band_sk, hd_buy_potential, CONVERT_TO_INTEGER(hd_dep_count, 1, 1, 0) AS hd_dep_count, CONVERT_TO_INTEGER(hd_vehicle_count, 1, 1, 0) AS hd_vehicle_count FROM "benchmarking"."benchmarking"."tpcds_sf10000"."csv"."household_demographics.csv";
create table "benchmarking"."benchmarking"."tpcds_sf10000"."income_band" as SELECT CONVERT_TO_INTEGER(ib_income_band_sk, 1, 1, 0) AS ib_income_band_sk, CONVERT_TO_INTEGER(ib_lower_bound, 1, 1, 0) AS ib_lower_bound, CONVERT_TO_INTEGER(ib_upper_bound, 1, 1, 0) AS ib_upper_bound FROM "benchmarking"."benchmarking"."tpcds_sf10000"."csv"."income_band.csv";
create table "benchmarking"."benchmarking"."tpcds_sf10000"."inventory" as SELECT CONVERT_TO_INTEGER(inv_date_sk, 1, 1, 0) AS inv_date_sk, CONVERT_TO_INTEGER(inv_item_sk, 1, 1, 0) AS inv_item_sk, CONVERT_TO_INTEGER(inv_warehouse_sk, 1, 1, 0) AS inv_warehouse_sk, CONVERT_TO_INTEGER(inv_quantity_on_hand, 1, 1, 0) AS inv_quantity_on_hand FROM "benchmarking"."benchmarking"."tpcds_sf10000"."csv"."inventory.csv";
create table "benchmarking"."benchmarking"."tpcds_sf10000"."item" as SELECT CONVERT_TO_INTEGER(i_item_sk, 1, 1, 0) AS i_item_sk, i_item_id, i_rec_start_date, i_rec_end_date, i_item_desc, CONVERT_TO_FLOAT(i_current_price, 1, 1, 0) AS i_current_price, CONVERT_TO_FLOAT(i_wholesale_cost, 1, 1, 0) AS i_wholesale_cost, CONVERT_TO_INTEGER(i_brand_id, 1, 1, 0) AS i_brand_id, i_brand, CONVERT_TO_INTEGER(i_class_id, 1, 1, 0) AS i_class_id, i_class, CONVERT_TO_INTEGER(i_category_id, 1, 1, 0) AS i_category_id, i_category, CONVERT_TO_INTEGER(i_manufact_id, 1, 1, 0) AS i_manufact_id, i_manufact, i_size, i_formulation, i_color, i_units, i_container, CONVERT_TO_INTEGER(i_manager_id, 1, 1, 0) AS i_manager_id, i_product_name FROM "benchmarking"."benchmarking"."tpcds_sf10000"."csv"."item.csv";
create table "benchmarking"."benchmarking"."tpcds_sf10000"."promotion" as SELECT CONVERT_TO_INTEGER(p_promo_sk, 1, 1, 0) AS p_promo_sk, p_promo_id, CONVERT_TO_INTEGER(p_start_date_sk, 1, 1, 0) AS p_start_date_sk, CONVERT_TO_INTEGER(p_end_date_sk, 1, 1, 0) AS p_end_date_sk, CONVERT_TO_INTEGER(p_item_sk, 1, 1, 0) AS p_item_sk, CONVERT_TO_FLOAT(p_cost, 1, 1, 0) AS p_cost, p_response_target, p_promo_name, p_channel_dmail, p_channel_email, p_channel_catalog, p_channel_tv, p_channel_radio, p_channel_press, p_channel_event, p_channel_demo, p_channel_details, p_purpose, p_discount_active FROM "benchmarking"."benchmarking"."tpcds_sf10000"."csv"."promotion.csv";
create table "benchmarking"."benchmarking"."tpcds_sf10000"."reason" as SELECT CONVERT_TO_INTEGER(r_reason_sk, 1, 1, 0) AS r_reason_sk, r_reason_id, r_reason_desc FROM "benchmarking"."benchmarking"."tpcds_sf10000"."csv"."reason.csv";
create table "benchmarking"."benchmarking"."tpcds_sf10000"."ship_mode" as SELECT CONVERT_TO_INTEGER(sm_ship_mode_sk, 1, 1, 0) AS sm_ship_mode_sk, sm_ship_mode_id, sm_type, sm_code, sm_carrier, sm_contract FROM "benchmarking"."benchmarking"."tpcds_sf10000"."csv"."ship_mode.csv";
create table "benchmarking"."benchmarking"."tpcds_sf10000"."store_returns" as SELECT CONVERT_TO_INTEGER(sr_returned_date_sk, 1, 1, 0) AS sr_returned_date_sk, CONVERT_TO_INTEGER(sr_return_time_sk, 1, 1, 0) AS sr_return_time_sk, CONVERT_TO_INTEGER(sr_item_sk, 1, 1, 0) AS sr_item_sk, CONVERT_TO_INTEGER(sr_customer_sk, 1, 1, 0) AS sr_customer_sk, CONVERT_TO_INTEGER(sr_cdemo_sk, 1, 1, 0) AS sr_cdemo_sk, CONVERT_TO_INTEGER(sr_hdemo_sk, 1, 1, 0) AS sr_hdemo_sk, CONVERT_TO_INTEGER(sr_addr_sk, 1, 1, 0) AS sr_addr_sk, CONVERT_TO_INTEGER(sr_store_sk, 1, 1, 0) AS sr_store_sk, CONVERT_TO_INTEGER(sr_reason_sk, 1, 1, 0) AS sr_reason_sk, CONVERT_TO_INTEGER(sr_ticket_number, 1, 1, 0) AS sr_ticket_number, CONVERT_TO_INTEGER(sr_return_quantity, 1, 1, 0) AS sr_return_quantity, CONVERT_TO_FLOAT(sr_return_amt, 1, 1, 0) AS sr_return_amt, CONVERT_TO_FLOAT(sr_return_tax, 1, 1, 0) AS sr_return_tax, CONVERT_TO_FLOAT(sr_return_amt_inc_tax, 1, 1, 0) AS sr_return_amt_inc_tax, CONVERT_TO_FLOAT(sr_fee, 1, 1, 0) AS sr_fee, CONVERT_TO_FLOAT(sr_return_ship_cost, 1, 1, 0) AS sr_return_ship_cost, CONVERT_TO_FLOAT(sr_refunded_cash, 1, 1, 0) AS sr_refunded_cash, CONVERT_TO_FLOAT(sr_reversed_charge, 1, 1, 0) AS sr_reversed_charge, CONVERT_TO_FLOAT(sr_store_credit, 1, 1, 0) AS sr_store_credit, CONVERT_TO_FLOAT(sr_net_loss, 1, 1, 0) AS sr_net_loss FROM "benchmarking"."benchmarking"."tpcds_sf10000"."csv"."store_returns.csv";
create table "benchmarking"."benchmarking"."tpcds_sf10000"."store_sales" as SELECT CONVERT_TO_INTEGER(ss_sold_date_sk, 1, 1, 0) AS ss_sold_date_sk, CONVERT_TO_INTEGER(ss_sold_time_sk, 1, 1, 0) AS ss_sold_time_sk, CONVERT_TO_INTEGER(ss_item_sk, 1, 1, 0) AS ss_item_sk, CONVERT_TO_INTEGER(ss_customer_sk, 1, 1, 0) AS ss_customer_sk, CONVERT_TO_INTEGER(ss_cdemo_sk, 1, 1, 0) AS ss_cdemo_sk, CONVERT_TO_INTEGER(ss_hdemo_sk, 1, 1, 0) AS ss_hdemo_sk, CONVERT_TO_INTEGER(ss_addr_sk, 1, 1, 0) AS ss_addr_sk, CONVERT_TO_INTEGER(ss_store_sk, 1, 1, 0) AS ss_store_sk, CONVERT_TO_INTEGER(ss_promo_sk, 1, 1, 0) AS ss_promo_sk, CONVERT_TO_INTEGER(ss_ticket_number, 1, 1, 0) AS ss_ticket_number, CONVERT_TO_INTEGER(ss_quantity, 1, 1, 0) AS ss_quantity, CONVERT_TO_FLOAT(ss_wholesale_cost, 1, 1, 0) AS ss_wholesale_cost, CONVERT_TO_FLOAT(ss_list_price, 1, 1, 0) AS ss_list_price, CONVERT_TO_FLOAT(ss_sales_price, 1, 1, 0) AS ss_sales_price, CONVERT_TO_FLOAT(ss_ext_discount_amt, 1, 1, 0) AS ss_ext_discount_amt, CONVERT_TO_FLOAT(ss_ext_sales_price, 1, 1, 0) AS ss_ext_sales_price, CONVERT_TO_FLOAT(ss_ext_wholesale_cost, 1, 1, 0) AS ss_ext_wholesale_cost, CONVERT_TO_FLOAT(ss_ext_list_price, 1, 1, 0) AS ss_ext_list_price, CONVERT_TO_FLOAT(ss_ext_tax, 1, 1, 0) AS ss_ext_tax, CONVERT_TO_FLOAT(ss_coupon_amt, 1, 1, 0) AS ss_coupon_amt, CONVERT_TO_FLOAT(ss_net_paid, 1, 1, 0) AS ss_net_paid, CONVERT_TO_FLOAT(ss_net_paid_inc_tax, 1, 1, 0) AS ss_net_paid_inc_tax, CONVERT_TO_FLOAT(ss_net_profit, 1, 1, 0) AS ss_net_profit FROM "benchmarking"."benchmarking"."tpcds_sf10000"."csv"."store_sales.csv";
create table "benchmarking"."benchmarking"."tpcds_sf10000"."store" as SELECT CONVERT_TO_INTEGER(s_store_sk, 1, 1, 0) AS s_store_sk, s_store_id, s_rec_start_date, s_rec_end_date, s_closed_date_sk, s_store_name, CONVERT_TO_INTEGER(s_number_employees, 1, 1, 0) AS s_number_employees, CONVERT_TO_INTEGER(s_floor_space, 1, 1, 0) AS s_floor_space, s_hours, S_manager, CONVERT_TO_INTEGER(S_market_id, 1, 1, 0) AS S_market_id, S_geography_class, S_market_desc, s_market_manager, CONVERT_TO_INTEGER(s_division_id, 1, 1, 0) AS s_division_id, s_division_name, CONVERT_TO_INTEGER(s_company_id, 1, 1, 0) AS s_company_id, s_company_name, CONVERT_TO_INTEGER(s_street_number, 1, 1, 0) AS s_street_number, s_street_name, s_street_type, s_suite_number, s_city, s_county, s_state, s_zip, s_country, CONVERT_TO_INTEGER(s_gmt_offset, 1, 1, 0) AS s_gmt_offset, CONVERT_TO_FLOAT(s_tax_percentage, 1, 1, 0) AS s_tax_percentage FROM "benchmarking"."benchmarking"."tpcds_sf10000"."csv"."store.csv";
create table "benchmarking"."benchmarking"."tpcds_sf10000"."time_dim" as SELECT CONVERT_TO_INTEGER(t_time_sk, 1, 1, 0) AS t_time_sk, t_time_id, CONVERT_TO_INTEGER(t_time, 1, 1, 0) AS t_time, CONVERT_TO_INTEGER(t_hour, 1, 1, 0) AS t_hour, CONVERT_TO_INTEGER(t_minute, 1, 1, 0) AS t_minute, CONVERT_TO_INTEGER(t_second, 1, 1, 0) AS t_second, t_am_pm, t_shift, t_sub_shift, t_meal_time FROM "benchmarking"."benchmarking"."tpcds_sf10000"."csv"."time_dim.csv";
create table "benchmarking"."benchmarking"."tpcds_sf10000"."warehouse" as SELECT CONVERT_TO_INTEGER(w_warehouse_sk, 1, 1, 0) AS w_warehouse_sk, w_warehouse_id, w_warehouse_name, CONVERT_TO_INTEGER(w_warehouse_sq_ft, 1, 1, 0) AS w_warehouse_sq_ft, CONVERT_TO_INTEGER(w_street_number, 1, 1, 0) AS w_street_number, w_street_name, w_street_type, w_suite_number, w_city, w_county, w_state, w_zip, w_country, CONVERT_TO_INTEGER(w_gmt_offset, 1, 1, 0) AS w_gmt_offset FROM "benchmarking"."benchmarking"."tpcds_sf10000"."csv"."warehouse.csv";
create table "benchmarking"."benchmarking"."tpcds_sf10000"."web_page" as SELECT CONVERT_TO_INTEGER(wp_web_page_sk, 1, 1, 0) AS wp_web_page_sk, wp_web_page_id, wp_rec_start_date, wp_rec_end_date, CONVERT_TO_INTEGER(wp_creation_date_sk, 1, 1, 0) AS wp_creation_date_sk, CONVERT_TO_INTEGER(wp_access_date_sk, 1, 1, 0) AS wp_access_date_sk, wp_autogen_flag, CONVERT_TO_INTEGER(wp_customer_sk, 1, 1, 0) AS wp_customer_sk, wp_url, wp_type, CONVERT_TO_INTEGER(wp_char_count, 1, 1, 0) AS wp_char_count, CONVERT_TO_INTEGER(wp_link_count, 1, 1, 0) AS wp_link_count, CONVERT_TO_INTEGER(wp_image_count, 1, 1, 0) AS wp_image_count, CONVERT_TO_INTEGER(wp_max_ad_count, 1, 1, 0) AS wp_max_ad_count FROM "benchmarking"."benchmarking"."tpcds_sf10000"."csv"."web_page.csv";
create table "benchmarking"."benchmarking"."tpcds_sf10000"."web_returns" as SELECT CONVERT_TO_INTEGER(wr_returned_date_sk, 1, 1, 0) AS wr_returned_date_sk, CONVERT_TO_INTEGER(wr_returned_time_sk, 1, 1, 0) AS wr_returned_time_sk, CONVERT_TO_INTEGER(wr_item_sk, 1, 1, 0) AS wr_item_sk, CONVERT_TO_INTEGER(wr_refunded_customer_sk, 1, 1, 0) AS wr_refunded_customer_sk, CONVERT_TO_INTEGER(wr_refunded_cdemo_sk, 1, 1, 0) AS wr_refunded_cdemo_sk, CONVERT_TO_INTEGER(wr_refunded_hdemo_sk, 1, 1, 0) AS wr_refunded_hdemo_sk, CONVERT_TO_INTEGER(wr_refunded_addr_sk, 1, 1, 0) AS wr_refunded_addr_sk, CONVERT_TO_INTEGER(wr_returning_customer_sk, 1, 1, 0) AS wr_returning_customer_sk, CONVERT_TO_INTEGER(wr_returning_cdemo_sk, 1, 1, 0) AS wr_returning_cdemo_sk, CONVERT_TO_INTEGER(wr_returning_hdemo_sk, 1, 1, 0) AS wr_returning_hdemo_sk, CONVERT_TO_INTEGER(wr_returning_addr_sk, 1, 1, 0) AS wr_returning_addr_sk, CONVERT_TO_INTEGER(wr_web_page_sk, 1, 1, 0) AS wr_web_page_sk, CONVERT_TO_INTEGER(wr_reason_sk, 1, 1, 0) AS wr_reason_sk, CONVERT_TO_INTEGER(wr_order_number, 1, 1, 0) AS wr_order_number, CONVERT_TO_INTEGER(wr_return_quantity, 1, 1, 0) AS wr_return_quantity, CONVERT_TO_FLOAT(wr_return_amt, 1, 1, 0) AS wr_return_amt, CONVERT_TO_FLOAT(wr_return_tax, 1, 1, 0) AS wr_return_tax, CONVERT_TO_FLOAT(wr_return_amt_inc_tax, 1, 1, 0) AS wr_return_amt_inc_tax, CONVERT_TO_FLOAT(wr_fee, 1, 1, 0) AS wr_fee, CONVERT_TO_FLOAT(wr_return_ship_cost, 1, 1, 0) AS wr_return_ship_cost, CONVERT_TO_FLOAT(wr_refunded_cash, 1, 1, 0) AS wr_refunded_cash, CONVERT_TO_FLOAT(wr_reversed_charge, 1, 1, 0) AS wr_reversed_charge, CONVERT_TO_FLOAT(wr_account_credit, 1, 1, 0) AS wr_account_credit, CONVERT_TO_FLOAT(wr_net_loss, 1, 1, 0) AS wr_net_loss FROM "benchmarking"."benchmarking"."tpcds_sf10000"."csv"."web_returns.csv";
create table "benchmarking"."benchmarking"."tpcds_sf10000"."web_sales" as SELECT CONVERT_TO_INTEGER(ws_sold_date_sk, 1, 1, 0) AS ws_sold_date_sk, CONVERT_TO_INTEGER(ws_sold_time_sk, 1, 1, 0) AS ws_sold_time_sk, CONVERT_TO_INTEGER(ws_ship_date_sk, 1, 1, 0) AS ws_ship_date_sk, CONVERT_TO_INTEGER(ws_item_sk, 1, 1, 0) AS ws_item_sk, CONVERT_TO_INTEGER(ws_bill_customer_sk, 1, 1, 0) AS ws_bill_customer_sk, CONVERT_TO_INTEGER(ws_bill_cdemo_sk, 1, 1, 0) AS ws_bill_cdemo_sk, CONVERT_TO_INTEGER(ws_bill_hdemo_sk, 1, 1, 0) AS ws_bill_hdemo_sk, CONVERT_TO_INTEGER(ws_bill_addr_sk, 1, 1, 0) AS ws_bill_addr_sk, CONVERT_TO_INTEGER(ws_ship_customer_sk, 1, 1, 0) AS ws_ship_customer_sk, CONVERT_TO_INTEGER(ws_ship_cdemo_sk, 1, 1, 0) AS ws_ship_cdemo_sk, CONVERT_TO_INTEGER(ws_ship_hdemo_sk, 1, 1, 0) AS ws_ship_hdemo_sk, CONVERT_TO_INTEGER(ws_ship_addr_sk, 1, 1, 0) AS ws_ship_addr_sk, CONVERT_TO_INTEGER(ws_web_page_sk, 1, 1, 0) AS ws_web_page_sk, CONVERT_TO_INTEGER(ws_web_site_sk, 1, 1, 0) AS ws_web_site_sk, CONVERT_TO_INTEGER(ws_ship_mode_sk, 1, 1, 0) AS ws_ship_mode_sk, CONVERT_TO_INTEGER(ws_warehouse_sk, 1, 1, 0) AS ws_warehouse_sk, CONVERT_TO_INTEGER(ws_promo_sk, 1, 1, 0) AS ws_promo_sk, CONVERT_TO_INTEGER(ws_order_number, 1, 1, 0) AS ws_order_number, CONVERT_TO_INTEGER(ws_quantity, 1, 1, 0) AS ws_quantity, CONVERT_TO_FLOAT(ws_wholesale_cost, 1, 1, 0) AS ws_wholesale_cost, CONVERT_TO_FLOAT(ws_list_price, 1, 1, 0) AS ws_list_price, CONVERT_TO_FLOAT(ws_sales_price, 1, 1, 0) AS ws_sales_price, CONVERT_TO_FLOAT(ws_ext_discount_amt, 1, 1, 0) AS ws_ext_discount_amt, CONVERT_TO_FLOAT(ws_ext_sales_price, 1, 1, 0) AS ws_ext_sales_price, CONVERT_TO_FLOAT(ws_ext_wholesale_cost, 1, 1, 0) AS ws_ext_wholesale_cost, CONVERT_TO_FLOAT(ws_ext_list_price, 1, 1, 0) AS ws_ext_list_price, CONVERT_TO_FLOAT(ws_ext_tax, 1, 1, 0) AS ws_ext_tax, CONVERT_TO_FLOAT(ws_coupon_amt, 1, 1, 0) AS ws_coupon_amt, CONVERT_TO_FLOAT(ws_ext_ship_cost, 1, 1, 0) AS ws_ext_ship_cost, CONVERT_TO_FLOAT(ws_net_paid, 1, 1, 0) AS ws_net_paid, CONVERT_TO_FLOAT(ws_net_paid_inc_tax, 1, 1, 0) AS ws_net_paid_inc_tax, CONVERT_TO_FLOAT(ws_net_paid_inc_ship, 1, 1, 0) AS ws_net_paid_inc_ship, CONVERT_TO_FLOAT(ws_net_paid_inc_ship_tax, 1, 1, 0) AS ws_net_paid_inc_ship_tax, CONVERT_TO_FLOAT(ws_net_profit, 1, 1, 0) AS ws_net_profit FROM "benchmarking"."benchmarking"."tpcds_sf10000"."csv"."web_sales.csv";
create table "benchmarking"."benchmarking"."tpcds_sf10000"."web_site" as SELECT CONVERT_TO_INTEGER(web_site_sk, 1, 1, 0) AS web_site_sk, web_site_id, web_rec_start_date, web_rec_end_date, web_name, CONVERT_TO_INTEGER(web_open_date_sk, 1, 1, 0) AS web_open_date_sk, CONVERT_TO_INTEGER(web_close_date_sk, 1, 1, 0) AS web_close_date_sk, web_class, web_manager, CONVERT_TO_INTEGER(web_mkt_id, 1, 1, 0) AS web_mkt_id, web_mkt_class, web_mkt_desc, web_market_manager, CONVERT_TO_INTEGER(web_company_id, 1, 1, 0) AS web_company_id, web_company_name, CONVERT_TO_INTEGER(web_street_number, 1, 1, 0) AS web_street_number, web_street_name, web_street_type, web_suite_number, web_city, web_county, web_state, web_zip, web_country, CONVERT_TO_INTEGER(web_gmt_offset, 1, 1, 0) AS web_gmt_offset, CONVERT_TO_FLOAT(web_tax_percentage, 1, 1, 0) AS web_tax_percentage FROM "benchmarking"."benchmarking"."tpcds_sf10000"."csv"."web_site.csv";
```
* 以上內容貼到dremio的SQL Runner藍色框框中, 並按Run執行

## Step 4. Test Benchmark
* 進入Benchmark路徑
```bash=
cd /root/dremio-tpc-ds/Scripts/TPC-DS\ Benchmark/
```
* DremioTestPlanTPC-DS.jmx與DremioTestPlanTPC-DS-metadatarefresh.jmx中修改dremio loadbalancer IP並儲存
* WORKERS數改成K8S上部署的executor數, 如下
```bash=
vi runDremioBenchmark.sh
```
```bash=
#!/bin/bash
WORKERS='four' # executor數
echo "First Run"
jmeter -n -t ./DremioTestPlanTPC-DS-metadatarefresh.jmx
jmeter -n -t ./DremioTestPlanTPC-DS.jmx
mv DremioSummary-tpc-ds.csv 'DremioSummary-tpc-ds-'$WORKERS'-workers-first.csv'
echo "Second Run"
jmeter -n -t ./DremioTestPlanTPC-DS.jmx
mv DremioSummary-tpc-ds.csv 'DremioSummary-tpc-ds-'$WORKERS'-workers-second.csv'
echo "Third Run"
jmeter -n -t ./DremioTestPlanTPC-DS.jmx
mv DremioSummary-tpc-ds.csv 'DremioSummary-tpc-ds-'$WORKERS'-workers-third.csv'
echo "Done"
```
* 儲存完後如下執行
```bash=
./runDremioBenchmark.sh
```
## Step 5. Scenario
* Scenario 1: 1 coordinator and 4 executor nodes.
* Scenario 2: 1 coordinator and 6 executor nodes.
* Scenario 3: 1 coordinator and 8 executor nodes.
* Scenario 4: 1 coordinator and 12 executor nodes.
* Scenario 5: 1 coordinator and 16 executor nodes.
* Scenario 6: 1 coordinator and 20 executor nodes.
:::danger
Note:
* The test is conducted with (4) Dremio executors and scaled to (8) Dremio executors.
* Compare the eight-node Dremio results using Dell ECS against the eightnode Dremio results on AWS, and forecast the linear scalability for 12, 16, and 20 nodes.
:::
# Monitor cpu & memory
* 在執行測試的期間,執行 iostat 來監看 CPU 和 Memory
## Install iostat
* 於任意一個有跑 kafka pod 的 Node 進行安裝
```bash=
# CentOS
yum install sysstat
# Ubuntu
apt-get install sysstat
```
## iostat 指令
* 需要先知道 Kafka 的 data 存在哪個 Device 並監看其 **MB_read/s**
```bash=
# -m : 以 M 為單位顯示所有訊息
# 每 60 秒刷新訊息,且顯示 180 次,測量時間約 3H
iostat -m 60 180
# ------------------------------
# tps :該設備每秒的傳輸次數
# MB_read/s : 每秒從設備讀取的資料量
# MB_wrtn/s : 每秒從設備寫入的資料量
# MB_read : 讀取的總資料量
# MB_wrtn : 寫入的總資料量
# ------------------------------
```