Dremio Performance == ###### tags: `infortrend` [TOC] # Prerequisites (!!安裝前需要確認!!) :::danger * 執行測試時,監控 CPU和 Memory 的使用量 * 相關測試文件 * [**Infortrend 測試報告**](https://docs.google.com/spreadsheets/d/1HzZeIaoYSQRguL9ltCo2yXsaZRZUpLmi9VaDA5DA-5I/edit#gid=48282311) * [**Dell 測試文件**](https://www.delltechnologies.com/asset/en-us/products/storage/industry-market/h18538-dremio-perf-on-powerscale-ecs.pdf) * [**Dremio Benchmarking**](https://www.dremio.com/blog/dremio-benchmarking-methodology/#h-benchmarking-test-suite) ::: # Test Step ## Step 1. Install Dremio * 安裝Dremio前須要先安裝minio server * 規格: ![image](https://hackmd.io/_uploads/r1araCHqp.png) ![image](https://hackmd.io/_uploads/rJ0plBc_a.png) * minio的minio-data-pvc建立30TB的空間 ![image](https://hackmd.io/_uploads/SkauWBcdp.png) * minio上建立一個叫benchmarking的bucket ![image](https://hackmd.io/_uploads/H1xkLL9ua.png) * 安裝Dremio, 1 coordinator and 4 executor nodes. * 規格: ![image](https://hackmd.io/_uploads/SyNeLE9_a.png) ![image](https://hackmd.io/_uploads/rJhYsEcua.png) ![image](https://hackmd.io/_uploads/S1isq4qOp.png) ![image](https://hackmd.io/_uploads/SyYp2Vqu6.png) * 安裝Dremio時App Setting的bucket name輸入為benchmarking ![image](https://hackmd.io/_uploads/ryxo64qdT.png) ## Step 2. Generate data ### Step 2-1. Prepare Dremio data folder * 在K8S Cluster上建立一個dremio-data的PVC, Storage Size為30TB ![image](https://hackmd.io/_uploads/BJEEND5Op.png) * 選Many Nodes Read-Write就建立 ![image](https://hackmd.io/_uploads/SyMYiL5_p.png) * 找到以下圖中PVC所在的k8s node並用ssh連線到那個node上 ![image](https://hackmd.io/_uploads/Hy7ZjD9dT.png) * 需要在有跑etcd的node上執行以下的cmd * 找到pv對應的(部分)volumeId ```bash= [root@ks-9202636 ~] kubectl get pv ``` ![image](https://hackmd.io/_uploads/S1-OojiOa.png) * 透過etcd找到這個pv的資訊 ```bash= [root@ks-9202636 ~] docker exec etcd etcdctl get --prefix /hci/pvi | grep 9e7130f3 ``` ![image](https://hackmd.io/_uploads/Hk6wnjjOT.png) * deviceId跟hostname的對照 ```bash= [root@ks-9202636 ~] docker exec etcd etcdctl get --prefix /hci/de ``` ![image](https://hackmd.io/_uploads/rJPARis_6.png) /Pool_head/9e7130f3a4f643b5a2e9677ef8136682/pvc_9e7130f3a4f643b5a2e9677ef8136682/是ks-9202636上PVC所在的路徑 * 下以下指令啟動ubuntu為dremio產生資料準備 ```bash= docker run -it -v /Pool_head/9e7130f3a4f643b5a2e9677ef8136682/pvc_9e7130f3a4f643b5a2e9677ef8136682/:/mnt infortrend/dremio-client ``` ### Step 2-2. 使用 TPC-DS dsdgen utility 來產生 10 TB 的資料 ```bash= mkdir /mnt/data mkdir /mnt/result cd /root/tpcds-kit/tools ./dsdgen -scale 10000 -f -dir /mnt/data -terminate n -parallel 15 -child 1 & ./dsdgen -scale 10000 -f -dir /mnt/data -terminate n -parallel 15 -child 2 & ./dsdgen -scale 10000 -f -dir /mnt/data -terminate n -parallel 15 -child 3 & ./dsdgen -scale 10000 -f -dir /mnt/data -terminate n -parallel 15 -child 4 & ./dsdgen -scale 10000 -f -dir /mnt/data -terminate n -parallel 15 -child 5 & ./dsdgen -scale 10000 -f -dir /mnt/data -terminate n -parallel 15 -child 6 & ./dsdgen -scale 10000 -f -dir /mnt/data -terminate n -parallel 15 -child 7 & ./dsdgen -scale 10000 -f -dir /mnt/data -terminate n -parallel 15 -child 8 ``` :::danger Note: 等待一下全部的thread執行完 ::: ### Step 2-3. 執行以下 script,將產生出來的 data 加上[第一列的名稱](https://github.com/sergeleo/dremio-tpc-ds/blob/master/Scripts/TPC-DS%20Data%20Prep/DatasetHeaders/tables_list.txt) 一樣在docker中下以下指令 ```bash= cd /root/dremio-tpc-ds/Scripts/TPC-DS\ Data\ Prep/ ./data_convertion.sh ``` :::danger Note: 產生完所有檔案後/mnt/result/底下要有24個檔案 ::: ## Step 3. Data Conversion to Parquet * 使用 Dremio CTAS (Create Table as Select) 將資料轉成 Parquet 格式 * 將產生的 TPC-DS 資料集移至 AWS S3(minio server) 中 ```bash= cd /mnt/result/ mc alias set benchmarking http://<minio LoadBalancer IP>:9000 admin admin123 mc cp ./ benchmarking/benchmarking/tpcds_sf10000/csv/ --recursive ``` * 進入Dremio的網頁(http://< dremio loadbalancer IP>:9047)後註冊username為dremio, password為dremio00, 如下圖: ![image](https://hackmd.io/_uploads/HkrGYKo_a.png) * 註冊成功後選擇Add source * 選擇Amazon s3後General menu中填寫從minio server抓過來的資料夾名稱, minio access key, minio secret key ![image](https://hackmd.io/_uploads/rkt7kmcO6.png) * 移除打勾Encrypt connection ![image](https://hackmd.io/_uploads/rkkKy79uT.png) * Advanced Options中Enable compatibility mode要打勾 ![image](https://hackmd.io/_uploads/rk9iyQqd6.png) * 按Add properties後填寫name為fs.s3a.endpoint, value為minio loadbalancer IP與api port(9000)並按save ![image](https://hackmd.io/_uploads/ryi1gQ9uT.png) ```bash= cd /root/dremio-tpc-ds/Scripts/TPC-DS\ Data\ Prep/ vi PromoteCSVtoPDS.sh #內容修改成<dremio loadbalancer IP> ``` ```bash= #!/bin/bash # Authenticate and extract the token TOKEN=$(curl -X POST http://<dremio loadbalancer IP>:9047/apiv2/login -H 'Content-Type: application/json' -d '{"userName": "dremio", "password": "dremio00"}'| jq -r '.token' ) # Iterate through datasets for i in `cat DatasetHeaders/tables_list.txt` do DATAOUTPUT=$(curl -X PUT http://<dremio loadbalancer IP>:9047/apiv2/source/benchmarking/folder_format/benchmarking/tpcds_sf10000/csv/$i.csv -H 'Authorization: _dremio'"$TOKEN"'' -H 'Content-Type: application/json' -d '{"type": "Text", "fieldDelimiter": "|", "skipFirstLine": false, "extractHeader": true, "quote": "\"", "comment": "#", "escape": "\"", "lineDelimiter": "\n", "autoGenerateColumnNames": true, "trimHeader": true}'); echo $DATAOUTPUT; done ``` * 儲存完後如下執行 ```bash= ./PromoteCSVtoPDS.sh ``` * 最後所有CSV檔轉換成Parquet file format ```bash= create table "benchmarking"."benchmarking"."tpcds_sf10000"."call_center" as SELECT CONVERT_TO_INTEGER(cc_call_center_sk, 1, 1, 0) AS cc_call_center_sk, cc_call_center_id, cc_rec_start_date, cc_rec_end_date, cc_closed_date_sk, cc_open_date_sk, cc_name, cc_class, CONVERT_TO_INTEGER(cc_employees, 1, 1, 0) AS cc_employees, CONVERT_TO_INTEGER(cc_sq_ft, 1, 1, 0) AS cc_sq_ft, cc_hours, cc_manager, CONVERT_TO_INTEGER(cc_mkt_id, 1, 1, 0) AS cc_mkt_id, cc_mkt_class, cc_mkt_desc, cc_market_manager, CONVERT_TO_INTEGER(cc_division, 1, 1, 0) AS cc_division, cc_division_name, CONVERT_TO_INTEGER(cc_company, 1, 1, 0) AS cc_company, cc_company_name, CONVERT_TO_INTEGER(cc_street_number, 1, 1, 0) AS cc_street_number, cc_street_name, cc_street_type, cc_suite_number, cc_city, cc_county, cc_state, cc_zip, cc_country, CONVERT_TO_INTEGER(cc_gmt_offset, 1, 1, 0) AS cc_gmt_offset, CONVERT_TO_FLOAT(cc_tax_percentage, 1, 1, 0) AS cc_tax_percentage FROM "benchmarking"."benchmarking"."tpcds_sf10000"."csv"."call_center.csv"; create table "benchmarking"."benchmarking"."tpcds_sf10000"."catalog_returns" as SELECT CONVERT_TO_INTEGER(cr_returned_date_sk, 1, 1, 0) AS cr_returned_date_sk, CONVERT_TO_INTEGER(cr_returned_time_sk, 1, 1, 0) AS cr_returned_time_sk, CONVERT_TO_INTEGER(cr_item_sk, 1, 1, 0) AS cr_item_sk, CONVERT_TO_INTEGER(cr_refunded_customer_sk, 1, 1, 0) AS cr_refunded_customer_sk, CONVERT_TO_INTEGER(cr_refunded_cdemo_sk, 1, 1, 0) AS cr_refunded_cdemo_sk, CONVERT_TO_INTEGER(cr_refunded_hdemo_sk, 1, 1, 0) AS cr_refunded_hdemo_sk, CONVERT_TO_INTEGER(cr_refunded_addr_sk, 1, 1, 0) AS cr_refunded_addr_sk, CONVERT_TO_INTEGER(cr_returning_customer_sk, 1, 1, 0) AS cr_returning_customer_sk, CONVERT_TO_INTEGER(cr_returning_cdemo_sk, 1, 1, 0) AS cr_returning_cdemo_sk, CONVERT_TO_INTEGER(cr_returning_hdemo_sk, 1, 1, 0) AS cr_returning_hdemo_sk, CONVERT_TO_INTEGER(cr_returning_addr_sk, 1, 1, 0) AS cr_returning_addr_sk, CONVERT_TO_INTEGER(cr_call_center_sk, 1, 1, 0) AS cr_call_center_sk, CONVERT_TO_INTEGER(cr_catalog_page_sk, 1, 1, 0) AS cr_catalog_page_sk, CONVERT_TO_INTEGER(cr_ship_mode_sk, 1, 1, 0) AS cr_ship_mode_sk, CONVERT_TO_INTEGER(cr_warehouse_sk, 1, 1, 0) AS cr_warehouse_sk, CONVERT_TO_INTEGER(cr_reason_sk, 1, 1, 0) AS cr_reason_sk, CONVERT_TO_INTEGER(cr_order_number, 1, 1, 0) AS cr_order_number, CONVERT_TO_INTEGER(cr_return_quantity, 1, 1, 0) AS cr_return_quantity, CONVERT_TO_FLOAT(cr_return_amount, 1, 1, 0) AS cr_return_amount, CONVERT_TO_FLOAT(cr_return_tax, 1, 1, 0) AS cr_return_tax, CONVERT_TO_FLOAT(cr_return_amt_inc_tax, 1, 1, 0) AS cr_return_amt_inc_tax, CONVERT_TO_FLOAT(cr_fee, 1, 1, 0) AS cr_fee, CONVERT_TO_FLOAT(cr_return_ship_cost, 1, 1, 0) AS cr_return_ship_cost, CONVERT_TO_FLOAT(cr_refunded_cash, 1, 1, 0) AS cr_refunded_cash, CONVERT_TO_FLOAT(cr_reversed_charge, 1, 1, 0) AS cr_reversed_charge, CONVERT_TO_FLOAT(cr_store_credit, 1, 1, 0) AS cr_store_credit, CONVERT_TO_FLOAT(cr_net_loss, 1, 1, 0) AS cr_net_loss FROM "benchmarking"."benchmarking"."tpcds_sf10000"."csv"."catalog_returns.csv"; create table "benchmarking"."benchmarking"."tpcds_sf10000"."catalog_sales" as SELECT CONVERT_TO_INTEGER(cs_sold_date_sk, 1, 1, 0) AS cs_sold_date_sk, CONVERT_TO_INTEGER(cs_sold_time_sk, 1, 1, 0) AS cs_sold_time_sk, CONVERT_TO_INTEGER(cs_ship_date_sk, 1, 1, 0) AS cs_ship_date_sk, CONVERT_TO_INTEGER(cs_bill_customer_sk, 1, 1, 0) AS cs_bill_customer_sk, CONVERT_TO_INTEGER(cs_bill_cdemo_sk, 1, 1, 0) AS cs_bill_cdemo_sk, CONVERT_TO_INTEGER(cs_bill_hdemo_sk, 1, 1, 0) AS cs_bill_hdemo_sk, CONVERT_TO_INTEGER(cs_bill_addr_sk, 1, 1, 0) AS cs_bill_addr_sk, CONVERT_TO_INTEGER(cs_ship_customer_sk, 1, 1, 0) AS cs_ship_customer_sk, CONVERT_TO_INTEGER(cs_ship_cdemo_sk, 1, 1, 0) AS cs_ship_cdemo_sk, CONVERT_TO_INTEGER(cs_ship_hdemo_sk, 1, 1, 0) AS cs_ship_hdemo_sk, CONVERT_TO_INTEGER(cs_ship_addr_sk, 1, 1, 0) AS cs_ship_addr_sk, CONVERT_TO_INTEGER(cs_call_center_sk, 1, 1, 0) AS cs_call_center_sk, CONVERT_TO_INTEGER(cs_catalog_page_sk, 1, 1, 0) AS cs_catalog_page_sk, CONVERT_TO_INTEGER(cs_ship_mode_sk, 1, 1, 0) AS cs_ship_mode_sk, CONVERT_TO_INTEGER(cs_warehouse_sk, 1, 1, 0) AS cs_warehouse_sk, CONVERT_TO_INTEGER(cs_item_sk, 1, 1, 0) AS cs_item_sk, CONVERT_TO_INTEGER(cs_promo_sk, 1, 1, 0) AS cs_promo_sk, CONVERT_TO_INTEGER(cs_order_number, 1, 1, 0) AS cs_order_number, CONVERT_TO_INTEGER(cs_quantity, 1, 1, 0) AS cs_quantity, CONVERT_TO_FLOAT(cs_wholesale_cost, 1, 1, 0) AS cs_wholesale_cost, CONVERT_TO_FLOAT(cs_list_price, 1, 1, 0) AS cs_list_price, CONVERT_TO_FLOAT(cs_sales_price, 1, 1, 0) AS cs_sales_price, CONVERT_TO_FLOAT(cs_ext_discount_amt, 1, 1, 0) AS cs_ext_discount_amt, CONVERT_TO_FLOAT(cs_ext_sales_price, 1, 1, 0) AS cs_ext_sales_price, CONVERT_TO_FLOAT(cs_ext_wholesale_cost, 1, 1, 0) AS cs_ext_wholesale_cost, CONVERT_TO_FLOAT(cs_ext_list_price, 1, 1, 0) AS cs_ext_list_price, CONVERT_TO_FLOAT(cs_ext_tax, 1, 1, 0) AS cs_ext_tax, CONVERT_TO_FLOAT(cs_coupon_amt, 1, 1, 0) AS cs_coupon_amt, CONVERT_TO_FLOAT(cs_ext_ship_cost, 1, 1, 0) AS cs_ext_ship_cost, CONVERT_TO_FLOAT(cs_net_paid, 1, 1, 0) AS cs_net_paid, CONVERT_TO_FLOAT(cs_net_paid_inc_tax, 1, 1, 0) AS cs_net_paid_inc_tax, CONVERT_TO_FLOAT(cs_net_paid_inc_ship, 1, 1, 0) AS cs_net_paid_inc_ship, CONVERT_TO_FLOAT(cs_net_paid_inc_ship_tax, 1, 1, 0) AS cs_net_paid_inc_ship_tax, CONVERT_TO_FLOAT(cs_net_profit, 1, 1, 0) AS cs_net_profit FROM "benchmarking"."benchmarking"."tpcds_sf10000"."csv"."catalog_sales.csv"; create table "benchmarking"."benchmarking"."tpcds_sf10000"."catalog_page" as SELECT CONVERT_TO_INTEGER(cp_catalog_page_sk, 1, 1, 0) AS cp_catalog_page_sk, cp_catalog_page_id, CONVERT_TO_INTEGER(cp_start_date_sk, 1, 1, 0) AS cp_start_date_sk, CONVERT_TO_INTEGER(cp_end_date_sk, 1, 1, 0) AS cp_end_date_sk, cp_department, CONVERT_TO_INTEGER(cp_catalog_number, 1, 1, 0) AS cp_catalog_number, CONVERT_TO_INTEGER(cp_catalog_page_number, 1, 1, 0) AS cp_catalog_page_number, cp_description, cp_type FROM "benchmarking"."benchmarking"."tpcds_sf10000"."csv"."catalog_page.csv"; create table "benchmarking"."benchmarking"."tpcds_sf10000"."customer_address" as SELECT CONVERT_TO_INTEGER(ca_address_sk, 1, 1, 0) AS ca_address_sk, ca_address_id, CONVERT_TO_INTEGER(ca_street_number, 1, 1, 0) AS ca_street_number, ca_street_name, ca_street_type, ca_suite_number, ca_city, ca_county, ca_state, ca_zip, ca_country, CONVERT_TO_INTEGER(ca_gmt_offset, 1, 1, 0) AS ca_gmt_offset, ca_location_type FROM "benchmarking"."benchmarking"."tpcds_sf10000"."csv"."customer_address.csv"; create table "benchmarking"."benchmarking"."tpcds_sf10000"."customer_demographics" as SELECT CONVERT_TO_INTEGER(cd_demo_sk, 1, 1, 0) AS cd_demo_sk, cd_gender, cd_marital_status, cd_education_status, CONVERT_TO_INTEGER(cd_purchase_estimate, 1, 1, 0) AS cd_purchase_estimate, cd_credit_rating, CONVERT_TO_INTEGER(cd_dep_count, 1, 1, 0) AS cd_dep_count, CONVERT_TO_INTEGER(cd_dep_employed_count, 1, 1, 0) AS cd_dep_employed_count, CONVERT_TO_INTEGER(cd_dep_college_count, 1, 1, 0) AS cd_dep_college_count FROM "benchmarking"."benchmarking"."tpcds_sf10000"."csv"."customer_demographics.csv"; create table "benchmarking"."benchmarking"."tpcds_sf10000"."customer" as SELECT CONVERT_TO_INTEGER(c_customer_sk, 1, 1, 0) AS c_customer_sk, c_customer_id, CONVERT_TO_INTEGER(c_current_cdemo_sk, 1, 1, 0) AS c_current_cdemo_sk, CONVERT_TO_INTEGER(c_current_hdemo_sk, 1, 1, 0) AS c_current_hdemo_sk, CONVERT_TO_INTEGER(c_current_addr_sk, 1, 1, 0) AS c_current_addr_sk, CONVERT_TO_INTEGER(c_first_shipto_date_sk, 1, 1, 0) AS c_first_shipto_date_sk, CONVERT_TO_INTEGER(c_first_sales_date_sk, 1, 1, 0) AS c_first_sales_date_sk, c_salutation, c_first_name, c_last_name, c_preferred_cust_flag, CONVERT_TO_INTEGER(c_birth_day, 1, 1, 0) AS c_birth_day, CONVERT_TO_INTEGER(c_birth_month, 1, 1, 0) AS c_birth_month, CONVERT_TO_INTEGER(c_birth_year, 1, 1, 0) AS c_birth_year, c_birth_country, c_login, c_email_address, CONVERT_TO_INTEGER(c_last_review_date_sk, 1, 1, 0) AS c_last_review_date_sk FROM "benchmarking"."benchmarking"."tpcds_sf10000"."csv"."customer.csv"; create table "benchmarking"."benchmarking"."tpcds_sf10000"."date_dim" as SELECT CONVERT_TO_INTEGER(d_date_sk, 1, 1, 0) AS d_date_sk, d_date_id, d_date, CONVERT_TO_INTEGER(d_month_seq, 1, 1, 0) AS d_month_seq, CONVERT_TO_INTEGER(d_week_seq, 1, 1, 0) AS d_week_seq, CONVERT_TO_INTEGER(d_quarter_seq, 1, 1, 0) AS d_quarter_seq, CONVERT_TO_INTEGER(d_year, 1, 1, 0) AS d_year, CONVERT_TO_INTEGER(d_dow, 1, 1, 0) AS d_dow, CONVERT_TO_INTEGER(d_moy, 1, 1, 0) AS d_moy, CONVERT_TO_INTEGER(d_dom, 1, 1, 0) AS d_dom, CONVERT_TO_INTEGER(d_qoy, 1, 1, 0) AS d_qoy, CONVERT_TO_INTEGER(d_fy_year, 1, 1, 0) AS d_fy_year, CONVERT_TO_INTEGER(d_fy_quarter_seq, 1, 1, 0) AS d_fy_quarter_seq, CONVERT_TO_INTEGER(d_fy_week_seq, 1, 1, 0) AS d_fy_week_seq, d_day_name, d_quarter_name, d_holiday, d_weekend, d_following_holiday, CONVERT_TO_INTEGER(d_first_dom, 1, 1, 0) AS d_first_dom, CONVERT_TO_INTEGER(d_last_dom, 1, 1, 0) AS d_last_dom, CONVERT_TO_INTEGER(d_same_day_ly, 1, 1, 0) AS d_same_day_ly, CONVERT_TO_INTEGER(d_same_day_lq, 1, 1, 0) AS d_same_day_lq, d_current_day, d_current_week, d_current_month, d_current_quarter, d_current_year FROM "benchmarking"."benchmarking"."tpcds_sf10000"."csv"."date_dim.csv"; create table "benchmarking"."benchmarking"."tpcds_sf10000"."household_demographics" as SELECT CONVERT_TO_INTEGER(hd_demo_sk, 1, 1, 0) AS hd_demo_sk, CONVERT_TO_INTEGER(hd_income_band_sk, 1, 1, 0) AS hd_income_band_sk, hd_buy_potential, CONVERT_TO_INTEGER(hd_dep_count, 1, 1, 0) AS hd_dep_count, CONVERT_TO_INTEGER(hd_vehicle_count, 1, 1, 0) AS hd_vehicle_count FROM "benchmarking"."benchmarking"."tpcds_sf10000"."csv"."household_demographics.csv"; create table "benchmarking"."benchmarking"."tpcds_sf10000"."income_band" as SELECT CONVERT_TO_INTEGER(ib_income_band_sk, 1, 1, 0) AS ib_income_band_sk, CONVERT_TO_INTEGER(ib_lower_bound, 1, 1, 0) AS ib_lower_bound, CONVERT_TO_INTEGER(ib_upper_bound, 1, 1, 0) AS ib_upper_bound FROM "benchmarking"."benchmarking"."tpcds_sf10000"."csv"."income_band.csv"; create table "benchmarking"."benchmarking"."tpcds_sf10000"."inventory" as SELECT CONVERT_TO_INTEGER(inv_date_sk, 1, 1, 0) AS inv_date_sk, CONVERT_TO_INTEGER(inv_item_sk, 1, 1, 0) AS inv_item_sk, CONVERT_TO_INTEGER(inv_warehouse_sk, 1, 1, 0) AS inv_warehouse_sk, CONVERT_TO_INTEGER(inv_quantity_on_hand, 1, 1, 0) AS inv_quantity_on_hand FROM "benchmarking"."benchmarking"."tpcds_sf10000"."csv"."inventory.csv"; create table "benchmarking"."benchmarking"."tpcds_sf10000"."item" as SELECT CONVERT_TO_INTEGER(i_item_sk, 1, 1, 0) AS i_item_sk, i_item_id, i_rec_start_date, i_rec_end_date, i_item_desc, CONVERT_TO_FLOAT(i_current_price, 1, 1, 0) AS i_current_price, CONVERT_TO_FLOAT(i_wholesale_cost, 1, 1, 0) AS i_wholesale_cost, CONVERT_TO_INTEGER(i_brand_id, 1, 1, 0) AS i_brand_id, i_brand, CONVERT_TO_INTEGER(i_class_id, 1, 1, 0) AS i_class_id, i_class, CONVERT_TO_INTEGER(i_category_id, 1, 1, 0) AS i_category_id, i_category, CONVERT_TO_INTEGER(i_manufact_id, 1, 1, 0) AS i_manufact_id, i_manufact, i_size, i_formulation, i_color, i_units, i_container, CONVERT_TO_INTEGER(i_manager_id, 1, 1, 0) AS i_manager_id, i_product_name FROM "benchmarking"."benchmarking"."tpcds_sf10000"."csv"."item.csv"; create table "benchmarking"."benchmarking"."tpcds_sf10000"."promotion" as SELECT CONVERT_TO_INTEGER(p_promo_sk, 1, 1, 0) AS p_promo_sk, p_promo_id, CONVERT_TO_INTEGER(p_start_date_sk, 1, 1, 0) AS p_start_date_sk, CONVERT_TO_INTEGER(p_end_date_sk, 1, 1, 0) AS p_end_date_sk, CONVERT_TO_INTEGER(p_item_sk, 1, 1, 0) AS p_item_sk, CONVERT_TO_FLOAT(p_cost, 1, 1, 0) AS p_cost, p_response_target, p_promo_name, p_channel_dmail, p_channel_email, p_channel_catalog, p_channel_tv, p_channel_radio, p_channel_press, p_channel_event, p_channel_demo, p_channel_details, p_purpose, p_discount_active FROM "benchmarking"."benchmarking"."tpcds_sf10000"."csv"."promotion.csv"; create table "benchmarking"."benchmarking"."tpcds_sf10000"."reason" as SELECT CONVERT_TO_INTEGER(r_reason_sk, 1, 1, 0) AS r_reason_sk, r_reason_id, r_reason_desc FROM "benchmarking"."benchmarking"."tpcds_sf10000"."csv"."reason.csv"; create table "benchmarking"."benchmarking"."tpcds_sf10000"."ship_mode" as SELECT CONVERT_TO_INTEGER(sm_ship_mode_sk, 1, 1, 0) AS sm_ship_mode_sk, sm_ship_mode_id, sm_type, sm_code, sm_carrier, sm_contract FROM "benchmarking"."benchmarking"."tpcds_sf10000"."csv"."ship_mode.csv"; create table "benchmarking"."benchmarking"."tpcds_sf10000"."store_returns" as SELECT CONVERT_TO_INTEGER(sr_returned_date_sk, 1, 1, 0) AS sr_returned_date_sk, CONVERT_TO_INTEGER(sr_return_time_sk, 1, 1, 0) AS sr_return_time_sk, CONVERT_TO_INTEGER(sr_item_sk, 1, 1, 0) AS sr_item_sk, CONVERT_TO_INTEGER(sr_customer_sk, 1, 1, 0) AS sr_customer_sk, CONVERT_TO_INTEGER(sr_cdemo_sk, 1, 1, 0) AS sr_cdemo_sk, CONVERT_TO_INTEGER(sr_hdemo_sk, 1, 1, 0) AS sr_hdemo_sk, CONVERT_TO_INTEGER(sr_addr_sk, 1, 1, 0) AS sr_addr_sk, CONVERT_TO_INTEGER(sr_store_sk, 1, 1, 0) AS sr_store_sk, CONVERT_TO_INTEGER(sr_reason_sk, 1, 1, 0) AS sr_reason_sk, CONVERT_TO_INTEGER(sr_ticket_number, 1, 1, 0) AS sr_ticket_number, CONVERT_TO_INTEGER(sr_return_quantity, 1, 1, 0) AS sr_return_quantity, CONVERT_TO_FLOAT(sr_return_amt, 1, 1, 0) AS sr_return_amt, CONVERT_TO_FLOAT(sr_return_tax, 1, 1, 0) AS sr_return_tax, CONVERT_TO_FLOAT(sr_return_amt_inc_tax, 1, 1, 0) AS sr_return_amt_inc_tax, CONVERT_TO_FLOAT(sr_fee, 1, 1, 0) AS sr_fee, CONVERT_TO_FLOAT(sr_return_ship_cost, 1, 1, 0) AS sr_return_ship_cost, CONVERT_TO_FLOAT(sr_refunded_cash, 1, 1, 0) AS sr_refunded_cash, CONVERT_TO_FLOAT(sr_reversed_charge, 1, 1, 0) AS sr_reversed_charge, CONVERT_TO_FLOAT(sr_store_credit, 1, 1, 0) AS sr_store_credit, CONVERT_TO_FLOAT(sr_net_loss, 1, 1, 0) AS sr_net_loss FROM "benchmarking"."benchmarking"."tpcds_sf10000"."csv"."store_returns.csv"; create table "benchmarking"."benchmarking"."tpcds_sf10000"."store_sales" as SELECT CONVERT_TO_INTEGER(ss_sold_date_sk, 1, 1, 0) AS ss_sold_date_sk, CONVERT_TO_INTEGER(ss_sold_time_sk, 1, 1, 0) AS ss_sold_time_sk, CONVERT_TO_INTEGER(ss_item_sk, 1, 1, 0) AS ss_item_sk, CONVERT_TO_INTEGER(ss_customer_sk, 1, 1, 0) AS ss_customer_sk, CONVERT_TO_INTEGER(ss_cdemo_sk, 1, 1, 0) AS ss_cdemo_sk, CONVERT_TO_INTEGER(ss_hdemo_sk, 1, 1, 0) AS ss_hdemo_sk, CONVERT_TO_INTEGER(ss_addr_sk, 1, 1, 0) AS ss_addr_sk, CONVERT_TO_INTEGER(ss_store_sk, 1, 1, 0) AS ss_store_sk, CONVERT_TO_INTEGER(ss_promo_sk, 1, 1, 0) AS ss_promo_sk, CONVERT_TO_INTEGER(ss_ticket_number, 1, 1, 0) AS ss_ticket_number, CONVERT_TO_INTEGER(ss_quantity, 1, 1, 0) AS ss_quantity, CONVERT_TO_FLOAT(ss_wholesale_cost, 1, 1, 0) AS ss_wholesale_cost, CONVERT_TO_FLOAT(ss_list_price, 1, 1, 0) AS ss_list_price, CONVERT_TO_FLOAT(ss_sales_price, 1, 1, 0) AS ss_sales_price, CONVERT_TO_FLOAT(ss_ext_discount_amt, 1, 1, 0) AS ss_ext_discount_amt, CONVERT_TO_FLOAT(ss_ext_sales_price, 1, 1, 0) AS ss_ext_sales_price, CONVERT_TO_FLOAT(ss_ext_wholesale_cost, 1, 1, 0) AS ss_ext_wholesale_cost, CONVERT_TO_FLOAT(ss_ext_list_price, 1, 1, 0) AS ss_ext_list_price, CONVERT_TO_FLOAT(ss_ext_tax, 1, 1, 0) AS ss_ext_tax, CONVERT_TO_FLOAT(ss_coupon_amt, 1, 1, 0) AS ss_coupon_amt, CONVERT_TO_FLOAT(ss_net_paid, 1, 1, 0) AS ss_net_paid, CONVERT_TO_FLOAT(ss_net_paid_inc_tax, 1, 1, 0) AS ss_net_paid_inc_tax, CONVERT_TO_FLOAT(ss_net_profit, 1, 1, 0) AS ss_net_profit FROM "benchmarking"."benchmarking"."tpcds_sf10000"."csv"."store_sales.csv"; create table "benchmarking"."benchmarking"."tpcds_sf10000"."store" as SELECT CONVERT_TO_INTEGER(s_store_sk, 1, 1, 0) AS s_store_sk, s_store_id, s_rec_start_date, s_rec_end_date, s_closed_date_sk, s_store_name, CONVERT_TO_INTEGER(s_number_employees, 1, 1, 0) AS s_number_employees, CONVERT_TO_INTEGER(s_floor_space, 1, 1, 0) AS s_floor_space, s_hours, S_manager, CONVERT_TO_INTEGER(S_market_id, 1, 1, 0) AS S_market_id, S_geography_class, S_market_desc, s_market_manager, CONVERT_TO_INTEGER(s_division_id, 1, 1, 0) AS s_division_id, s_division_name, CONVERT_TO_INTEGER(s_company_id, 1, 1, 0) AS s_company_id, s_company_name, CONVERT_TO_INTEGER(s_street_number, 1, 1, 0) AS s_street_number, s_street_name, s_street_type, s_suite_number, s_city, s_county, s_state, s_zip, s_country, CONVERT_TO_INTEGER(s_gmt_offset, 1, 1, 0) AS s_gmt_offset, CONVERT_TO_FLOAT(s_tax_percentage, 1, 1, 0) AS s_tax_percentage FROM "benchmarking"."benchmarking"."tpcds_sf10000"."csv"."store.csv"; create table "benchmarking"."benchmarking"."tpcds_sf10000"."time_dim" as SELECT CONVERT_TO_INTEGER(t_time_sk, 1, 1, 0) AS t_time_sk, t_time_id, CONVERT_TO_INTEGER(t_time, 1, 1, 0) AS t_time, CONVERT_TO_INTEGER(t_hour, 1, 1, 0) AS t_hour, CONVERT_TO_INTEGER(t_minute, 1, 1, 0) AS t_minute, CONVERT_TO_INTEGER(t_second, 1, 1, 0) AS t_second, t_am_pm, t_shift, t_sub_shift, t_meal_time FROM "benchmarking"."benchmarking"."tpcds_sf10000"."csv"."time_dim.csv"; create table "benchmarking"."benchmarking"."tpcds_sf10000"."warehouse" as SELECT CONVERT_TO_INTEGER(w_warehouse_sk, 1, 1, 0) AS w_warehouse_sk, w_warehouse_id, w_warehouse_name, CONVERT_TO_INTEGER(w_warehouse_sq_ft, 1, 1, 0) AS w_warehouse_sq_ft, CONVERT_TO_INTEGER(w_street_number, 1, 1, 0) AS w_street_number, w_street_name, w_street_type, w_suite_number, w_city, w_county, w_state, w_zip, w_country, CONVERT_TO_INTEGER(w_gmt_offset, 1, 1, 0) AS w_gmt_offset FROM "benchmarking"."benchmarking"."tpcds_sf10000"."csv"."warehouse.csv"; create table "benchmarking"."benchmarking"."tpcds_sf10000"."web_page" as SELECT CONVERT_TO_INTEGER(wp_web_page_sk, 1, 1, 0) AS wp_web_page_sk, wp_web_page_id, wp_rec_start_date, wp_rec_end_date, CONVERT_TO_INTEGER(wp_creation_date_sk, 1, 1, 0) AS wp_creation_date_sk, CONVERT_TO_INTEGER(wp_access_date_sk, 1, 1, 0) AS wp_access_date_sk, wp_autogen_flag, CONVERT_TO_INTEGER(wp_customer_sk, 1, 1, 0) AS wp_customer_sk, wp_url, wp_type, CONVERT_TO_INTEGER(wp_char_count, 1, 1, 0) AS wp_char_count, CONVERT_TO_INTEGER(wp_link_count, 1, 1, 0) AS wp_link_count, CONVERT_TO_INTEGER(wp_image_count, 1, 1, 0) AS wp_image_count, CONVERT_TO_INTEGER(wp_max_ad_count, 1, 1, 0) AS wp_max_ad_count FROM "benchmarking"."benchmarking"."tpcds_sf10000"."csv"."web_page.csv"; create table "benchmarking"."benchmarking"."tpcds_sf10000"."web_returns" as SELECT CONVERT_TO_INTEGER(wr_returned_date_sk, 1, 1, 0) AS wr_returned_date_sk, CONVERT_TO_INTEGER(wr_returned_time_sk, 1, 1, 0) AS wr_returned_time_sk, CONVERT_TO_INTEGER(wr_item_sk, 1, 1, 0) AS wr_item_sk, CONVERT_TO_INTEGER(wr_refunded_customer_sk, 1, 1, 0) AS wr_refunded_customer_sk, CONVERT_TO_INTEGER(wr_refunded_cdemo_sk, 1, 1, 0) AS wr_refunded_cdemo_sk, CONVERT_TO_INTEGER(wr_refunded_hdemo_sk, 1, 1, 0) AS wr_refunded_hdemo_sk, CONVERT_TO_INTEGER(wr_refunded_addr_sk, 1, 1, 0) AS wr_refunded_addr_sk, CONVERT_TO_INTEGER(wr_returning_customer_sk, 1, 1, 0) AS wr_returning_customer_sk, CONVERT_TO_INTEGER(wr_returning_cdemo_sk, 1, 1, 0) AS wr_returning_cdemo_sk, CONVERT_TO_INTEGER(wr_returning_hdemo_sk, 1, 1, 0) AS wr_returning_hdemo_sk, CONVERT_TO_INTEGER(wr_returning_addr_sk, 1, 1, 0) AS wr_returning_addr_sk, CONVERT_TO_INTEGER(wr_web_page_sk, 1, 1, 0) AS wr_web_page_sk, CONVERT_TO_INTEGER(wr_reason_sk, 1, 1, 0) AS wr_reason_sk, CONVERT_TO_INTEGER(wr_order_number, 1, 1, 0) AS wr_order_number, CONVERT_TO_INTEGER(wr_return_quantity, 1, 1, 0) AS wr_return_quantity, CONVERT_TO_FLOAT(wr_return_amt, 1, 1, 0) AS wr_return_amt, CONVERT_TO_FLOAT(wr_return_tax, 1, 1, 0) AS wr_return_tax, CONVERT_TO_FLOAT(wr_return_amt_inc_tax, 1, 1, 0) AS wr_return_amt_inc_tax, CONVERT_TO_FLOAT(wr_fee, 1, 1, 0) AS wr_fee, CONVERT_TO_FLOAT(wr_return_ship_cost, 1, 1, 0) AS wr_return_ship_cost, CONVERT_TO_FLOAT(wr_refunded_cash, 1, 1, 0) AS wr_refunded_cash, CONVERT_TO_FLOAT(wr_reversed_charge, 1, 1, 0) AS wr_reversed_charge, CONVERT_TO_FLOAT(wr_account_credit, 1, 1, 0) AS wr_account_credit, CONVERT_TO_FLOAT(wr_net_loss, 1, 1, 0) AS wr_net_loss FROM "benchmarking"."benchmarking"."tpcds_sf10000"."csv"."web_returns.csv"; create table "benchmarking"."benchmarking"."tpcds_sf10000"."web_sales" as SELECT CONVERT_TO_INTEGER(ws_sold_date_sk, 1, 1, 0) AS ws_sold_date_sk, CONVERT_TO_INTEGER(ws_sold_time_sk, 1, 1, 0) AS ws_sold_time_sk, CONVERT_TO_INTEGER(ws_ship_date_sk, 1, 1, 0) AS ws_ship_date_sk, CONVERT_TO_INTEGER(ws_item_sk, 1, 1, 0) AS ws_item_sk, CONVERT_TO_INTEGER(ws_bill_customer_sk, 1, 1, 0) AS ws_bill_customer_sk, CONVERT_TO_INTEGER(ws_bill_cdemo_sk, 1, 1, 0) AS ws_bill_cdemo_sk, CONVERT_TO_INTEGER(ws_bill_hdemo_sk, 1, 1, 0) AS ws_bill_hdemo_sk, CONVERT_TO_INTEGER(ws_bill_addr_sk, 1, 1, 0) AS ws_bill_addr_sk, CONVERT_TO_INTEGER(ws_ship_customer_sk, 1, 1, 0) AS ws_ship_customer_sk, CONVERT_TO_INTEGER(ws_ship_cdemo_sk, 1, 1, 0) AS ws_ship_cdemo_sk, CONVERT_TO_INTEGER(ws_ship_hdemo_sk, 1, 1, 0) AS ws_ship_hdemo_sk, CONVERT_TO_INTEGER(ws_ship_addr_sk, 1, 1, 0) AS ws_ship_addr_sk, CONVERT_TO_INTEGER(ws_web_page_sk, 1, 1, 0) AS ws_web_page_sk, CONVERT_TO_INTEGER(ws_web_site_sk, 1, 1, 0) AS ws_web_site_sk, CONVERT_TO_INTEGER(ws_ship_mode_sk, 1, 1, 0) AS ws_ship_mode_sk, CONVERT_TO_INTEGER(ws_warehouse_sk, 1, 1, 0) AS ws_warehouse_sk, CONVERT_TO_INTEGER(ws_promo_sk, 1, 1, 0) AS ws_promo_sk, CONVERT_TO_INTEGER(ws_order_number, 1, 1, 0) AS ws_order_number, CONVERT_TO_INTEGER(ws_quantity, 1, 1, 0) AS ws_quantity, CONVERT_TO_FLOAT(ws_wholesale_cost, 1, 1, 0) AS ws_wholesale_cost, CONVERT_TO_FLOAT(ws_list_price, 1, 1, 0) AS ws_list_price, CONVERT_TO_FLOAT(ws_sales_price, 1, 1, 0) AS ws_sales_price, CONVERT_TO_FLOAT(ws_ext_discount_amt, 1, 1, 0) AS ws_ext_discount_amt, CONVERT_TO_FLOAT(ws_ext_sales_price, 1, 1, 0) AS ws_ext_sales_price, CONVERT_TO_FLOAT(ws_ext_wholesale_cost, 1, 1, 0) AS ws_ext_wholesale_cost, CONVERT_TO_FLOAT(ws_ext_list_price, 1, 1, 0) AS ws_ext_list_price, CONVERT_TO_FLOAT(ws_ext_tax, 1, 1, 0) AS ws_ext_tax, CONVERT_TO_FLOAT(ws_coupon_amt, 1, 1, 0) AS ws_coupon_amt, CONVERT_TO_FLOAT(ws_ext_ship_cost, 1, 1, 0) AS ws_ext_ship_cost, CONVERT_TO_FLOAT(ws_net_paid, 1, 1, 0) AS ws_net_paid, CONVERT_TO_FLOAT(ws_net_paid_inc_tax, 1, 1, 0) AS ws_net_paid_inc_tax, CONVERT_TO_FLOAT(ws_net_paid_inc_ship, 1, 1, 0) AS ws_net_paid_inc_ship, CONVERT_TO_FLOAT(ws_net_paid_inc_ship_tax, 1, 1, 0) AS ws_net_paid_inc_ship_tax, CONVERT_TO_FLOAT(ws_net_profit, 1, 1, 0) AS ws_net_profit FROM "benchmarking"."benchmarking"."tpcds_sf10000"."csv"."web_sales.csv"; create table "benchmarking"."benchmarking"."tpcds_sf10000"."web_site" as SELECT CONVERT_TO_INTEGER(web_site_sk, 1, 1, 0) AS web_site_sk, web_site_id, web_rec_start_date, web_rec_end_date, web_name, CONVERT_TO_INTEGER(web_open_date_sk, 1, 1, 0) AS web_open_date_sk, CONVERT_TO_INTEGER(web_close_date_sk, 1, 1, 0) AS web_close_date_sk, web_class, web_manager, CONVERT_TO_INTEGER(web_mkt_id, 1, 1, 0) AS web_mkt_id, web_mkt_class, web_mkt_desc, web_market_manager, CONVERT_TO_INTEGER(web_company_id, 1, 1, 0) AS web_company_id, web_company_name, CONVERT_TO_INTEGER(web_street_number, 1, 1, 0) AS web_street_number, web_street_name, web_street_type, web_suite_number, web_city, web_county, web_state, web_zip, web_country, CONVERT_TO_INTEGER(web_gmt_offset, 1, 1, 0) AS web_gmt_offset, CONVERT_TO_FLOAT(web_tax_percentage, 1, 1, 0) AS web_tax_percentage FROM "benchmarking"."benchmarking"."tpcds_sf10000"."csv"."web_site.csv"; ``` * 以上內容貼到dremio的SQL Runner藍色框框中, 並按Run執行 ![image](https://hackmd.io/_uploads/HJfleV5Oa.png) ## Step 4. Test Benchmark * 進入Benchmark路徑 ```bash= cd /root/dremio-tpc-ds/Scripts/TPC-DS\ Benchmark/ ``` * DremioTestPlanTPC-DS.jmx與DremioTestPlanTPC-DS-metadatarefresh.jmx中修改dremio loadbalancer IP並儲存 * WORKERS數改成K8S上部署的executor數, 如下 ```bash= vi runDremioBenchmark.sh ``` ```bash= #!/bin/bash WORKERS='four' # executor數 echo "First Run" jmeter -n -t ./DremioTestPlanTPC-DS-metadatarefresh.jmx jmeter -n -t ./DremioTestPlanTPC-DS.jmx mv DremioSummary-tpc-ds.csv 'DremioSummary-tpc-ds-'$WORKERS'-workers-first.csv' echo "Second Run" jmeter -n -t ./DremioTestPlanTPC-DS.jmx mv DremioSummary-tpc-ds.csv 'DremioSummary-tpc-ds-'$WORKERS'-workers-second.csv' echo "Third Run" jmeter -n -t ./DremioTestPlanTPC-DS.jmx mv DremioSummary-tpc-ds.csv 'DremioSummary-tpc-ds-'$WORKERS'-workers-third.csv' echo "Done" ``` * 儲存完後如下執行 ```bash= ./runDremioBenchmark.sh ``` ## Step 5. Scenario * Scenario 1: 1 coordinator and 4 executor nodes. * Scenario 2: 1 coordinator and 6 executor nodes. * Scenario 3: 1 coordinator and 8 executor nodes. * Scenario 4: 1 coordinator and 12 executor nodes. * Scenario 5: 1 coordinator and 16 executor nodes. * Scenario 6: 1 coordinator and 20 executor nodes. :::danger Note: * The test is conducted with (4) Dremio executors and scaled to (8) Dremio executors. * Compare the eight-node Dremio results using Dell ECS against the eightnode Dremio results on AWS, and forecast the linear scalability for 12, 16, and 20 nodes. ::: # Monitor cpu & memory * 在執行測試的期間,執行 iostat 來監看 CPU 和 Memory ## Install iostat * 於任意一個有跑 kafka pod 的 Node 進行安裝 ```bash= # CentOS yum install sysstat # Ubuntu apt-get install sysstat ``` ## iostat 指令 * 需要先知道 Kafka 的 data 存在哪個 Device 並監看其 **MB_read/s** ```bash= # -m : 以 M 為單位顯示所有訊息 # 每 60 秒刷新訊息,且顯示 180 次,測量時間約 3H iostat -m 60 180 # ------------------------------ # tps :該設備每秒的傳輸次數 # MB_read/s : 每秒從設備讀取的資料量 # MB_wrtn/s : 每秒從設備寫入的資料量 # MB_read : 讀取的總資料量 # MB_wrtn : 寫入的總資料量 # ------------------------------ ```