Druid on AKS
====

## References

### Druid architecture
https://medium.com/@aeli/apache-druid-setup-monitoring-and-auto-scaling-on-kubernetes-91739e350fac
https://anskarl.github.io/post/2019/druid-part-1/
https://druid.apache.org/docs/latest/design/processes.html#server-types

### Bitmap
https://levelup.gitconnected.com/insights-into-indexing-using-bitmap-index-c28a3db1ad97

### Apache DataSketches --- approximate count distinct
https://datasketches.apache.org/docs/Theta/InverseEstimate.html

## Access to AKS

### by PowerShell
```
az login  # authenticate through the browser
az aks install-cli  # install kubectl for PowerShell
```

#### Create a resource group
https://docs.microsoft.com/en-us/azure/aks/kubernetes-walkthrough#create-a-resource-group
or
https://docs.microsoft.com/en-us/azure/aks/kubernetes-walkthrough-powershell#create-a-resource-group

#### Create an AKS cluster
https://docs.microsoft.com/en-us/azure/aks/kubernetes-walkthrough-portal#create-an-aks-cluster

#### Connect to the cluster
```
az account set --subscription $AZURE_SUBSCRIPTION
az aks get-credentials --resource-group kg-one-id-deploy-dev --name $AZURE_KUBERNETES_NAME
```
In our case:
```
az account set --subscription 35015be4-a4d5-4a9d-a705-d88eabccfa2f
az aks get-credentials --resource-group kg-one-id-deploy-dev --name kg-aks-druid-dev2
```
Use `kubectl get all` to check that the connection succeeded.

### by WSL
```
sudo apt install kubectl
```
Then use PowerShell first to run
```
az account set --subscription $AZURE_SUBSCRIPTION
az aks get-credentials --resource-group kg-one-id-deploy-dev --name $AZURE_KUBERNETES_NAME
```
which writes the kubeconfig under `C:\Users\eugene_wang\.kube`; copy it to `~/.kube/config` in WSL.

### by local/web cloud shell
The only local-vs-web difference is that the web Azure Cloud Shell has already done `az login` for you, and Cloud Shell comes with many tools preinstalled. Note that we can't `su` or `sudo` in Cloud Shell.
Be careful: the web/local Cloud Shell closes itself after a timeout, and you will lose all printout.

#### for "no storage mounted" error in cloud shell
![](https://i.imgur.com/1rmhOrF.png)
![](https://i.imgur.com/lIvbIMY.png)
Create a throwaway storage account in "advanced settings". You can then log in and manage AKS, but files are deleted after logout.
![](https://i.imgur.com/4rHX1KH.png)

##### if you have a proper storage afterward
See: https://techcommunity.microsoft.com/t5/azure/azure-cloud-shell-error/m-p/71089#M489
1. Run `clouddrive unmount`
2. Restart Cloud Shell via the restart icon, or exit and relaunch
3. You should be prompted with the storage creation dialog again

## Deployment with Helm
```
helm install myReleaseName myRepo/myChart --namespace myNamespace -f myValues.yaml
```
Clone the chart from:
https://github.com/helm/charts (the original)
or
https://github.com/y56/charts/blob/master/incubator/druid/values.yaml
values.yaml: https://github.com/ASUS-AICS/kg-oneid-analytics-system/wiki/druid-settings
```bash
helm repo add incubator https://charts.helm.sh/incubator
helm install druid incubator/druid
helm install druid incubator/druid -f my_values.yaml
```
:::warning
`values.yaml` may contain secrets
:::

### Helm 2 vs Helm 3
Both the command syntax and the chart hub URL changed between Helm 2 and Helm 3.

### Get in the GUI by port-forward
Port-forward the Druid "router service" port 8888 to localhost 5566:
```
kubectl port-forward svc/druid-router 5566:8888
```
Open http://127.0.0.1:5566 in the browser.
Note that if you do the forwarding in Windows (e.g., in PowerShell), WSL2 can't reach the port. If you are using Cloud Shell, be careful: some ports are already in use and we can't `sudo`. Alternatively, set druid-router as an ingress pod to obtain a public IP (assigned by AKS).
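Before opening the GUI, it's worth confirming the release's pods are actually Running. A small sketch (the sample `kubectl get pods` output below is illustrative; in real use pipe `kubectl get pods | grep druid` instead):

```shell
# Count druid pods that are not yet Running; 0 means the release looks healthy.
# `sample` simulates `kubectl get pods | grep druid` output.
sample='druid-broker-0       1/1   Running   0   5m
druid-router-7d9c    1/1   Running   0   5m
druid-historical-0   0/1   Pending   0   5m'

not_ready=$(printf '%s\n' "$sample" | awk '$3 != "Running"' | wc -l)
echo "$not_ready"
```

With a live cluster, replace the here-string with the real command and only start the port-forward once the count reaches 0.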
### Add Azure Storage extensions through values.yaml
https://druid.apache.org/docs/latest/development/extensions.html#loading-core-extensions
https://druid.apache.org/docs/latest/development/extensions-core/azure.html
```
druid_storage_type: azure
```
```
druid_azure_account: druidsampledata
druid_azure_key: abcdxxxxxxxx==
druid_azure_container: test-active-user
druid_azure_prefix: myPrefix
druid_azure_protocol: https
druid_azure_maxTries: '3'
druid_azure_maxListingLength: '1024'
```
:::info
Apache DataSketches is installed by default.
:::

### Updating k8s config by helm

### `helm uninstall`
```
helm uninstall myReleaseName --namespace myNamespace
helm uninstall druid
```

## some useful commands/tools

### kill all pods in kubernetes, or delete things
[Ref](https://stackoverflow.com/questions/33509194/command-to-delete-all-pods-in-all-kubernetes-namespaces)
```
kubectl delete pods --all --all-namespaces
```
Avoid deleting system pods: recreating them wastes time and might break the cluster, forcing a restart.
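A safer variant of the blanket delete above: generate delete commands only for pods outside system namespaces. This is a sketch with simulated `kubectl get pods --all-namespaces` output (the pod names are made up); review the printed commands before running them.

```shell
# Build `kubectl delete` commands for every pod except those in kube-system.
# `pods_all` simulates `kubectl get pods --all-namespaces` output.
pods_all='NAMESPACE     NAME
kube-system   coredns-558bd4d5db-x2lpq
default       druid-broker-0
default       druid-historical-0'

cmds=$(printf '%s\n' "$pods_all" \
  | awk 'NR > 1 && $1 != "kube-system" {print "kubectl delete pod -n " $1 " " $2}')
echo "$cmds"
```

Printing instead of executing acts as a dry run; pipe the output through `sh` once the list looks right.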
### start/stop cluster
https://docs.microsoft.com/en-us/azure/aks/start-stop-cluster
```
az aks stop --name myAKSCluster --resource-group myResourceGroup
az aks start --name myAKSCluster --resource-group myResourceGroup
```
```
az aks stop --name kg-aks-druid-dev2 --resource-group kg-one-id-deploy-dev2
az aks start --name kg-aks-druid-dev2 --resource-group kg-one-id-deploy-dev2
```
Stop and start can each take up to 10 minutes.

### restart nodes
https://pumpingco.de/blog/restart-a-node-in-aks/
or just stop then start the whole cluster

### [activity log](https://portal.azure.com/#@ASUS.com/resource/subscriptions/35015be4-a4d5-4a9d-a705-d88eabccfa2f/resourceGroups/kg-one-id-deploy-dev/providers/Microsoft.ContainerService/managedClusters/kg-aks-druid-dev2/eventlogs)
Access is required to see history logs.

### hello-world service to check if pod/node/cluster works well
http://20.195.53.3/
http://20.43.168.59/

### delete all matching pods (here: locust)
```
for A in `k get po | grep locust | sed 's/ .*//'`; do kubectl delete pod/$A; done
```

### log in to a pod
```
kubectl exec --stdin --tty pod/druid-historical-0 -- /bin/sh
```

### kubectl describe XXXX

### kubectl logs XXXX

### kubernetes dashboard
https://github.com/kubernetes/dashboard
https://github.com/kubernetes/dashboard/blob/master/docs/user/access-control/creating-sample-user.md
Execute:
```
kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/v2.2.0/aio/deploy/recommended.yaml
kubectl proxy
```
Execute:
```
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: ServiceAccount
metadata:
  name: admin-user
  namespace: kubernetes-dashboard
EOF
```
Execute:
```
cat <<EOF | kubectl apply -f -
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: admin-user
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: admin-user
  namespace: kubernetes-dashboard
EOF
```
Execute:
```
kubectl -n kubernetes-dashboard get secret $(kubectl -n \
kubernetes-dashboard get sa/admin-user -o jsonpath="{.secrets[0].name}") \
  -o go-template="{{.data.token | base64decode}}"
```
It should print something very long like: `eyJhbGciO......Tuuw2svnV6NYQ`
Finally, go to:
http://localhost:8001/api/v1/namespaces/kubernetes-dashboard/services/https:kubernetes-dashboard:/proxy/
and use the token.

### Setup Monitoring using Prometheus and Grafana
https://medium.com/@aeli/apache-druid-setup-monitoring-and-auto-scaling-on-kubernetes-91739e350fac
```
helm repo add stable https://charts.helm.sh/stable
helm install demo stable/prometheus-operator
```
[[WIP]]

----
----

## some AKS/WSL/K8S/minikube issues
Whenever you hit an error, check the internet connection with `nslookup google.com` and try [wsl-fix-resolvconf.sh](https://gist.github.com/nfekete/7a277bf9e25e89e1c8bfb8b64dcc08ed).

### wsl internet issue: bad virtual DNS
172.17.64.65 ==> Ethernet adapter vEthernet (WSL) // virtual DNS
172.23.44.53 ==> corpnet.asus // ASUS LAN DNS
By the way, we can't use 8.8.8.8 as the DNS server inside the ASUS LAN, so Stack Overflow solutions that hardcode it won't work here.
![](https://i.imgur.com/hxR41ko.png)
Check the interfaces with `ipconfig /all` in PowerShell.

### [Pod Stuck In Terminating State](https://containersolutions.github.io/runbooks/posts/kubernetes/pod-stuck-in-terminating-state)

### pods stuck in `ImagePullBackOff` or `ErrImagePull`
When `kubectl get po` says a pod's status is ErrImagePull or ImagePullBackOff, run `kubectl describe pod/the-pod-name` to check the events. It may be caused by a broken internet connection or lack of access.

#### broken internet because of WSL DNS
```
Failed to pull image "apache/druid:0.19.0": rpc error: code = Unknown desc = Error response from daemon: Get https://registry-1.docker.io/v2/: dial tcp: lookup registry-1.docker.io on 192.168.65.2:53: read udp 192.168.49.2:57045->192.168.65.2:53: i/o timeout
```

#### some helm charts use private images
```
Failed to pull image "gcr.io/zenatix-data-archiver/kube-druid:0.0.1": rpc error: code = Unknown desc = Error response from daemon:
unauthorized: You don't have the needed permissions to perform this operation, and you may have invalid credentials. To authenticate your request, follow the steps in: https://cloud.google.com/container-registry/docs/advanced-authentication
Normal BackOff 33m (x5 over 34m) kubelet Back-off pulling image "gcr.io/zenatix-data-archiver/kube-druid:0.0.1"
```

### `kubectl get all -A` fails because of WSL DNS
```
Unable to connect to the server: dial tcp: lookup kg-aks-druid-dev2-dns-2b85eb85.hcp.southeastasia.azmk8s.io on 172.17.177.225:53: read udp 172.17.177.234:54522->172.17.177.225:53: i/o timeout
```

### nodes are `NotReady`
They may simply be shut down. Get Azure access to start the VMSS/nodes.
See: https://ithelp.ithome.com.tw/articles/10193944

### get access to the resource group `MC_rgname_aksname_southeastasia`
Add a tag to the resource group to keep it on; otherwise it will be shut down at 9 pm.
You need access to the MC (managed cluster) resource group to start the VMSS (virtual machine scale sets).
`MC_rgname_aksname_southeastasia` is itself a resource group; it is generated automatically when the AKS cluster is created inside our original resource group. All the services AKS uses (VMs, DNS, IPs, disks, etc.) belong to `MC_XXX_YYY_ZZZ`.
![](https://i.imgur.com/2VgCm67.png)

#### to start nodes/VMs via VMSS
![](https://i.imgur.com/cOSIDnW.png)

### get access to ACR (Azure Container Registry)
Access to ACR is needed to pull images.
```
Failed to pull image "kgacrdruiddev.azurecr.io/webfrontend:v1": rpc error: code = Unknown desc = Error response from daemon: Get https://kgacrdruiddev.azurecr.io/v2/webfrontend/manifests/v1: unauthorized: authentication required, visit https://aka.ms/acr/authorization for more information.
```

### wsl2 can't access ports forwarded by powershell
Still an open issue (as of 2021-03-26):
https://github.com/microsoft/WSL/issues/5211#issuecomment-628565569
Windows can see ports opened by WSL, but not vice versa.
:::warning
Use WSL to forward ports.
:::

### minikube setup
WSL2 + Windows Docker; make sure WSL2 has internet access.

### switch between k8s clusters (i.e., AKS and minikube)
```
kubectl config view  # lists clusters
kubectl config use-context kg-aks-druid-dev2  # switch to another cluster
kubectl config use-context minikube
```
:::warning
Need to fix the WSL2 DNS after switching clusters!!!
:::

----
----

The most practical questions:
1. Do you think Druid has a chance to replace Synapse, considering cost & performance?
    1. given data of the same size:
        1. ingestion time:
        2. query response time:
        3. cost given the same uptime:
    2. cost-performance plot

data size: 18 GB (2 days, active_user)
ingestion time (including processing) = ?
query response time = ?
MC's cost (3 VMs) = ?
usage scenario = ? querying over one week? What would the SQL look like?

----
----

Implementation side:
1. If queries turn out slow after we load the data, which settings can we tune to optimize?
    - I expect there are different indexes / partitions / schema adjustments / settings based on the time column; the more complex part is probably the connection and resource settings between components, where I have less experience.
2. If queries are slow, how do we observe where the bottleneck is, and decide which machines to scale up or which settings to adjust?
    - From the EMR team's Elasticsearch experience, they look at VM CPU utilization and check whether the computation is parallelized (but I don't know whether Druid has parallel-computation settings).
3. How long does loading one day or one month of data take, and can the current loading method be automated later?
    - An earlier survey suggested the HTTP API should work, but loading directly from the data lake would be even better; we just need to confirm it can be automated.
4. I'd like to know the process of deploying to AKS, and what actually got installed.
5. If I also want to start developing today, how do I begin?
    - I imagine: locally run `az aks get-credentials` -> `kubectl get services` -> open localhost?

Druid side:
1. The question Brett asked: what is the principle that makes his queries faster?
- Roll-up should be a big part of the foundation
- no further benefits from roll-up and granularity
- approx distinct count

----
----

## some druid issues

### debug -- general
```bash
# `k` and `kl` are aliases for `kubectl` and `kubectl logs`
for A in `k get po | sed 's/ .*//'`; do echo //////////$A; kl pod/$A | egrep -i "error|fail|ExecutionException"; done
```

### [Heisenbug] java.lang.IllegalStateException: Already connected at sun.net.www.protocol.http.HttpURLConnection
This happened during ingestion. We really don't know why; it succeeded on retry and never happened again.

### middle managers being blacklisted
https://druid.apache.org/docs/latest/design/overlord.html#blacklisted-workers

### ingesting with Theta fails and all middle managers die out

#### shrink the file size
18-GB files are too large; 100~200 MB files are safe.

#### ingestion parameters
Reduce these to 512 MB (the default is 1 GB):
```
"maxFetchCapacityBytes": 536870912,
"maxCacheCapacityBytes": 536870912
```

#### give the middle manager JVM more space
Each subtask is processed by one JVM. Tune the middle manager's peon JVM options array (the property name is Druid's `druid.indexer.runner.javaOptsArray`; the sizes below are illustrative):
```
druid_indexer_runner_javaOptsArray: '["-server", "-Xms512m", "-Xmx2g", "-XX:MaxDirectMemorySize=2g"]'
```

### [solved] segments not 100% available because the historicals' volume is not big enough
```
2021-04-11T10:24:27,300 ERROR [ZKCoordinator--0] org.apache.druid.server.coordination.SegmentLoadDropHandler - Failed to load segment for dataSource: {class=org.apache.druid.server.coordination.SegmentLoadDropHandler, exceptionType=class org.apache.druid.segment.loading.SegmentLoadingException, exceptionMessage=Exception loading segment[active_user_all_split_12000_hll_sub16_001_2020-12-03T00:00:00.000Z_2020-12-04T00:00:00.000Z_2021-04-10T19:11:02.644Z_16], segment=DataSegment{binaryVersion=9, id=active_user_all_split_12000_hll_sub16_001_2020-12-03T00:00:00.000Z_2020-12-04T00:00:00.000Z_2021-04-10T19:11:02.644Z_16, loadSpec={type=>azure, containerName=>personal, blobPath=>active_user_all_split_12000_hll_sub16_001/20201203T000000.000Z_20201204T000000.000Z/2021-04-10T19_11_02.644Z/16/index.zip}, dimensions=[app_type, browser, bu, city, first_date, is_member, is_new, os, source, news_category_lv1_id, news_category_lv2_id], metrics=[sum_log_count, hll_one_id], shardSpec=NumberedShardSpec{partitionNum=16, partitions=98}, lastCompactionState=null, size=753977}}
```
https://github.com/apache/druid/issues/8706

### brokers die during large ingestion
Put brokers and middle managers on different nodes. [WIP]
:::info
keyword: inter-pod anti-affinity
:::

### Error: Resource limit exceeded
```sql
SELECT APPROX_COUNT_DISTINCT_DS_THETA("theta_one_id"),
  news_category_lv1_id,
  news_category_lv2_id
FROM active_user_all_split_12000_theta_sub8_001
WHERE ("__time" BETWEEN TIMESTAMP '2020-10-01' AND TIMESTAMP '2020-10-31')
GROUP BY news_category_lv1_id, news_category_lv2_id
```
```
Error: Resource limit exceeded
Not enough aggregation buffer space to execute this query. Try increasing druid.processing.buffer.sizeBytes or enable disk spilling by setting druid.query.groupBy.maxOnDiskStorage to a positive number.
org.apache.druid.query.ResourceLimitExceededException
```
:::success
historical:
  enabled: true
  name: historical
  replicaCount: 14
  port: 8083
  serviceType: ClusterIP
  config:
    # DRUID_XMX: 512m  # STRANGE QQ
    # DRUID_XMS: 512m
    DRUID_MAXDIRECTMEMORYSIZE: 5g
    # (druid.processing.numThreads + druid.processing.numMergeBuffers + 1) * druid.processing.buffer.sizeBytes
    druid_processing_buffer_sizeBytes: '100000000'
    # This specifies a buffer size for the storage of intermediate results. The computation engine in both the Historical and Realtime processes will use a scratch buffer of this size to do all of their intermediate computations off-heap. Larger values allow for more aggregations in a single pass over the data, while smaller values can require more passes depending on the query being executed.
    # default: auto (max 1 GB)
    druid_processing_numMergeBuffers: 1  # processing threads / 4
    druid_processing_numThreads: 3  # number of cores - 1
    # druid_monitoring_monitors: '["org.apache.druid.client.cache.CacheMonitor", "org.apache.druid.server.metrics.HistoricalMetricsMonitor", "org.apache.druid.server.metrics.QueryCountStatsMonitor"]'
    # druid_segmentCache_locations: '[{"path":"/var/druid/segment-cache","maxSize":300000000000}]'
    # druid_cache_sizeInBytes: '2000000000'  # 2 GB
    # A general rule of thumb for sizing the Historical heap is (0.5 GB * number of CPU cores), with an upper limit of ~24 GB.
    # default: min(1GiB, Runtime.maxMemory / 10)
    druid_historical_cache_useCache: true
    druid_historical_cache_populateCache: true
:::

### Error: Channel disconnected
```sql
SELECT APPROX_COUNT_DISTINCT_DS_THETA("theta_one_id"),
  news_category_lv2_id
FROM active_user_all_split_12000_theta_sub8_001
-- WHERE ("__time" BETWEEN TIMESTAMP '2020-10-01' AND TIMESTAMP '2020-10-31')
GROUP BY news_category_lv2_id
LIMIT 100
```
```
Error: Unknown exception
org.jboss.netty.channel.ChannelException: Channel disconnected
java.util.concurrent.ExecutionException
```
```
Warning Unhealthy 43m (x2 over 43m) kubelet Liveness probe failed: Get http://10.244.2.81:8083/status/health: dial tcp 10.244.2.81:8083: connect: connection refused
Warning Unhealthy 42m (x2 over 43m) kubelet Readiness probe failed: Get http://10.244.2.81:8083/status/health: dial tcp 10.244.2.81:8083: connect: connection refused
```

## tuning

### choosing hardware

#### VM instance
See: https://docs.imply.io/2021.02/planning/#data-servers

| (AWS) | vCPU | ECU | Memory (GiB) | Instance Storage (GB) | Linux/UNIX usage |
| ----- | ---- | --- | ------------ | --------------------- | ---------------- |
| r4.2xlarge | 8 | 31 | 61 | EBS only | $0.532 per hour |

We will try Azure B8MS.
:::info
By default, 4 pods per node.
:::

### Persistent Volume and Persistent Volume Claim
To learn about PVs and PVCs, see: https://kubernetes.io/docs/concepts/storage/persistent-volumes/
For Azure's PVs, see: https://docs.microsoft.com/en-us/azure/aks/azure-disks-dynamic-pv#built-in-storage-classes
We are using `storageClass: "managed-premium"`, already the fastest built-in class.
For the PV options of Azure: https://docs.microsoft.com/en-us/azure/aks/concepts-storage#storage-classes
Remember to delete old PVs/PVCs if you are changing the storage class, or try to ***migrate*** (survey needed).

#### [WIP] better PV
![](https://i.imgur.com/GY1Ej73.png)
https://docs.microsoft.com/en-us/azure/aks/azure-disks-dynamic-pv
https://docs.microsoft.com/en-us/azure/aks/concepts-storage

### Middle Manager / data ingestion
Middle Managers are in charge of ingestion.
* raw data file size
    * 18 GB is too large; a few hundred MB is good
    * an 18-GB file takes 1.0 hr (same on B4ms/B2ms, for https/deep storage)
    * a file is the smallest unit; one sub-task takes care of one file
    * 500 files of 150 MB w/ B4ms*6 w/ 5 middle managers ==> 8~9 min
* number of middle manager pods
    * don't let two middle manager pods sit on one node
* number of sub-tasks during ingestion
    * set to the number of total slots

### cache and thread
Use the defaults first; they are aligned with the [Basic Cluster Tuning guide](http://druid.apache.org/docs/latest/operations/basic-cluster-tuning.html). Change them only when encountering JVM out-of-memory errors.

### JVM tuning
http://druid.apache.org/docs/latest/operations/basic-cluster-tuning.html#jvm-tuning

### the second partition

## experiment

### sample data
18 GB (10 days)
https://druidsampledata.blob.core.windows.net/test-active-user/part-00000-tid-3812279347746115856-60712cdf-da41-4bbc-b99b-04ffd513ea6f-48-1-c000.json
azure://test-active-user/part-00000-tid-3812279347746115856-60712cdf-da41-4bbc-b99b-04ffd513ea6f-48-1-c000.json
azure://test-active-user/test-small-active-user.json

#### not preprocessed
```sql
SELECT APPROX_COUNT_DISTINCT("one_id")
FROM "sampledruiddata-apr8-sub4"
WHERE ("bu" = 'ettoday')
  AND ("__time" BETWEEN TIMESTAMP '2020-10-01' AND TIMESTAMP '2020-10-02')
```
![](https://i.imgur.com/LYEbEIi.png)
```sql
SELECT APPROX_COUNT_DISTINCT_DS_HLL("one_id")
FROM "sampledruiddata-apr8-sub4"
WHERE ("bu" = 'ettoday')
  AND ("__time" BETWEEN TIMESTAMP '2020-10-01' AND TIMESTAMP '2020-10-02')
```
![](https://i.imgur.com/zq18tMR.png)
```sql
SELECT APPROX_COUNT_DISTINCT_DS_THETA("one_id")
FROM "sampledruiddata-apr8-sub4"
WHERE ("bu" = 'ettoday')
  AND ("__time" BETWEEN TIMESTAMP '2020-10-01' AND TIMESTAMP '2020-10-02')
```
![](https://i.imgur.com/OCf4Yhw.png)

#### preprocessed
```sql
SELECT APPROX_COUNT_DISTINCT_DS_HLL("hll_one_id")
FROM "sampledruiddata-HHLSB-allDefault_001"
WHERE ("__time" BETWEEN TIMESTAMP '2020-10-01' AND TIMESTAMP '2020-10-02')
  AND ("bu" =
  'ettoday')
```
![](https://i.imgur.com/F3oHc1A.png)
```sql
SELECT APPROX_COUNT_DISTINCT_DS_HLL("hll_one_id"), "__time"
FROM "sampledruiddata-HHLSB-allDefault_001"
WHERE ("__time" BETWEEN TIMESTAMP '2020-10-01' AND TIMESTAMP '2020-10-02')
  AND ("bu" = 'ettoday')
GROUP BY 2
```
![](https://i.imgur.com/JwCChuw.png)
```sql
SELECT APPROX_COUNT_DISTINCT_DS_HLL("hll_one_id")
FROM "sampledruiddata-HHLSB-allDefault_001"
WHERE ("__time" BETWEEN TIMESTAMP '2020-10-01' AND TIMESTAMP '2020-10-02')
  AND ("bu" = 'ettoday')
  AND ("city" = 'Tainan')
```
![](https://i.imgur.com/xzcTTvu.png)

### from 10/1~10/31 // ingested in 8 min with preprocessing

#### accuracy: about 4%
![](https://i.imgur.com/9TrTQ7Y.png)
![](https://i.imgur.com/shmCqXh.png)

#### 0.98 sec
```sql
SELECT APPROX_COUNT_DISTINCT_DS_HLL("hll_one_id")
FROM "DrSaDa-active_user_20201001_20201031_split500-HHL-sub18"
WHERE ("__time" BETWEEN TIMESTAMP '2020-10-01' AND TIMESTAMP '2020-10-02')
  AND ("bu" = 'ettoday')
```
![](https://i.imgur.com/QVkvKtv.png)

#### 0.55 sec (this is CACHED!)
```sql
SELECT APPROX_COUNT_DISTINCT_DS_HLL("hll_one_id"), "__time"
FROM "DrSaDa-active_user_20201001_20201031_split500-HHL-sub18"
WHERE ("__time" BETWEEN TIMESTAMP '2020-10-01' AND TIMESTAMP '2020-10-02')
  AND ("bu" = 'ettoday')
GROUP BY 2
```

#### 0.79 sec
```sql
SELECT APPROX_COUNT_DISTINCT_DS_HLL("hll_one_id")
FROM "DrSaDa-active_user_20201001_20201031_split500-HHL-sub18"
WHERE ("__time" BETWEEN TIMESTAMP '2020-10-01' AND TIMESTAMP '2020-10-02')
  AND ("bu" = 'ettoday')
  AND ("city" = 'Tainan')
```

### preprocessed -- Theta
```sql
SELECT APPROX_COUNT_DISTINCT_DS_THETA("theta_one_id")
FROM active_user_oct_split3000_theta_002
WHERE ("__time" BETWEEN TIMESTAMP '2020-10-01' AND TIMESTAMP '2020-10-02')
  AND ("bu" = 'ettoday')
-- 0.57 s
```
```sql
SELECT APPROX_COUNT_DISTINCT_DS_THETA("theta_one_id")
FROM active_user_oct_split3000_theta_002
WHERE ("__time" BETWEEN TIMESTAMP '2020-10-01' AND TIMESTAMP '2020-10-02')
  AND ("bu" = 'ettoday')
  AND ("city" = 'Tainan')
-- 0.67 s
```
```sql
SELECT APPROX_COUNT_DISTINCT_DS_THETA("theta_one_id"), "__time"
FROM active_user_oct_split3000_theta_002
WHERE ("__time" BETWEEN TIMESTAMP '2020-10-01' AND TIMESTAMP '2020-10-02')
  AND ("bu" = 'ettoday')
GROUP BY 2
-- ????????
```

### preprocessed -- HLL -- 254 days
k8s settings: https://gist.github.com/y56/c6a1493690380b48e6d52cfc47acbab6
```sql
SELECT APPROX_COUNT_DISTINCT_DS_HLL("hll_one_id")
FROM active_user_all_split_12000_hll_sub8_001
WHERE ("__time" BETWEEN TIMESTAMP '2020-10-01' AND TIMESTAMP '2020-10-31')
```
8.84 s => 1.76 s
```sql
SELECT APPROX_COUNT_DISTINCT_DS_HLL("hll_one_id")
FROM active_user_all_split_12000_hll_sub8_001
WHERE ("__time" BETWEEN TIMESTAMP '2020-10-01' AND TIMESTAMP '2020-10-31')
  AND ("bu" = 'ettoday')
```
16.76 s => 2.68 s
```sql
SELECT APPROX_COUNT_DISTINCT_DS_HLL("hll_one_id")
FROM active_user_all_split_12000_hll_sub8_001
WHERE ("__time" BETWEEN TIMESTAMP '2020-10-01' AND TIMESTAMP '2020-10-31')
  AND ("bu" = 'ettoday')
  AND ("city" = 'Tainan')
```
3.63 s

### preprocessed -- Theta -- 254 days
k8s settings: https://gist.github.com/y56/c6a1493690380b48e6d52cfc47acbab6
```sql
SELECT APPROX_COUNT_DISTINCT_DS_THETA("theta_one_id")
FROM active_user_all_split_12000_theta_sub8_001
WHERE ("__time" BETWEEN TIMESTAMP '2020-10-01' AND TIMESTAMP '2020-10-31')
```
3.58 s (0.7 s for the 2nd query; converges to ~0.22 s if repeated many times)
```sql
SELECT APPROX_COUNT_DISTINCT_DS_THETA("theta_one_id")
FROM active_user_all_split_12000_theta_sub8_001
WHERE ("__time" BETWEEN TIMESTAMP '2020-10-01' AND TIMESTAMP '2020-10-31')
  AND ("bu" = 'ettoday')
```
2.25 s (0.68 s for the 2nd query; converges to ~0.22 s if repeated many times)
```sql
SELECT APPROX_COUNT_DISTINCT_DS_THETA("theta_one_id")
FROM active_user_all_split_12000_theta_sub8_001
WHERE ("__time" BETWEEN TIMESTAMP '2020-10-01' AND TIMESTAMP '2020-10-31')
  AND ("bu" = 'ettoday')
  AND ("city" = 'Tainan')
```
0.63 s (max 0.21 s when repeated)
:::warning
restart
:::
```sql
SELECT APPROX_COUNT_DISTINCT_DS_THETA("theta_one_id")
FROM active_user_all_split_12000_theta_sub8_001
WHERE ("__time" BETWEEN TIMESTAMP '2020-10-01' AND TIMESTAMP '2020-10-31')
  AND ("bu" = 'ettoday')
  AND ("city" = 'Tainan')
```
4.80 s, 2.28 s, 1.38 s, 0.92, 0.45, 0.38, 0.30, 0.28,
0.27, 0.35
```sql
SELECT APPROX_COUNT_DISTINCT_DS_THETA("theta_one_id")
FROM active_user_all_split_12000_theta_sub8_001
WHERE ("__time" BETWEEN TIMESTAMP '2020-10-01' AND TIMESTAMP '2020-10-31')
  AND ("bu" = 'ettoday')
```
2.47, 1.11, 0.70, 0.39, 0.34, 0.32, 0.32
```sql
SELECT APPROX_COUNT_DISTINCT_DS_THETA("theta_one_id")
FROM active_user_all_split_12000_theta_sub8_001
WHERE ("__time" BETWEEN TIMESTAMP '2020-10-01' AND TIMESTAMP '2020-10-31')
```
0.94, 0.59, 0.45, 0.37, 0.37
:::info
restart
:::
```sql
SELECT APPROX_COUNT_DISTINCT_DS_THETA("theta_one_id"), __time
FROM active_user_all_split_12000_theta_sub8_001
WHERE ("__time" BETWEEN TIMESTAMP '2020-10-01' AND TIMESTAMP '2020-10-31')
GROUP BY 2
```
5.49 ==> 6.25 on Standard_D8s_v3
```sql
SELECT APPROX_COUNT_DISTINCT_DS_THETA("theta_one_id"), news_category_lv1_id
FROM active_user_all_split_12000_theta_sub8_001
WHERE ("__time" BETWEEN TIMESTAMP '2020-10-01' AND TIMESTAMP '2020-10-31')
GROUP BY news_category_lv1_id
```
6.29 ==> 6.14 on Standard_D8s_v3
```sql
SELECT APPROX_COUNT_DISTINCT_DS_THETA("theta_one_id"), news_category_lv1_id
FROM active_user_all_split_12000_theta_sub8_001
WHERE ("__time" BETWEEN TIMESTAMP '2020-10-01' AND TIMESTAMP '2020-10-31')
  AND ("bu" = 'ettoday')
GROUP BY news_category_lv1_id
```
5.84 ==> 6.11 on Standard_D8s_v3
```sql
SELECT APPROX_COUNT_DISTINCT_DS_THETA("theta_one_id"), news_category_lv1_id
FROM active_user_all_split_12000_theta_sub8_001
WHERE ("__time" BETWEEN TIMESTAMP '2020-10-01' AND TIMESTAMP '2020-10-31')
  AND ("bu" = 'ettoday')
  AND ("city" = 'Tainan')
GROUP BY news_category_lv1_id
```
1.31 ==> 1.76 on Standard_D8s_v3
```sql
SELECT APPROX_COUNT_DISTINCT_DS_THETA("theta_one_id"), news_category_lv1_id
FROM active_user_all_split_12000_theta_sub8_001
WHERE ("__time" BETWEEN TIMESTAMP '2020-10-01' AND TIMESTAMP '2020-10-31')
  AND ("bu" = 'ettoday')
  AND ("city" = 'Taipei')
GROUP BY news_category_lv1_id
```
2.66 ==> 3.25 on Standard_D8s_v3
```sql
SELECT APPROX_COUNT_DISTINCT_DS_THETA("theta_one_id"),
  news_category_lv2_id
FROM active_user_all_split_12000_theta_sub8_001
-- WHERE ("__time" BETWEEN TIMESTAMP '2020-10-01' AND TIMESTAMP '2020-10-31')
GROUP BY news_category_lv2_id
LIMIT 100
```
6.79 s, then ERROR

### druid_processing_buffer_sizeBytes: '100000000'
1 GB * 14 historical pods, Standard_D8s_v3 (6 instances) // daily cost 2000 NTD
k8s settings: https://gist.github.com/y56/abd07e37ff8aaf98fadd925a1f8a19a2
```sql
SELECT APPROX_COUNT_DISTINCT_DS_THETA("theta_one_id"),
  news_category_lv1_id,
  news_category_lv2_id
FROM active_user_all_split_12000_theta_sub8_001
WHERE ("__time" BETWEEN TIMESTAMP '2020-10-01' AND TIMESTAMP '2020-10-31')
GROUP BY news_category_lv1_id, news_category_lv2_id
```
7.06 s, 3.94, 2.59, 2.52, 2.29, 2.46, 2.26, 10.96, 7.5
```sql
SELECT APPROX_COUNT_DISTINCT_DS_THETA("theta_one_id"), news_category_lv2_id
FROM active_user_all_split_12000_theta_sub8_001
GROUP BY news_category_lv2_id
```
65.41 s (peak 23.4 cores with 6 VMs * 8 cores)
```sql
SELECT APPROX_COUNT_DISTINCT_DS_THETA("theta_one_id"),
  news_category_lv1_id,
  news_category_lv2_id
FROM active_user_all_split_12000_theta_sub8_001
GROUP BY news_category_lv1_id, news_category_lv2_id
```
62.30, 60.03 // ugh

### Standard_D8s_v3 (6 instances) // daily cost 2000 NTD // 28 historical pods
```sql
SELECT APPROX_COUNT_DISTINCT_DS_THETA("theta_one_id"),
  news_category_lv1_id,
  news_category_lv2_id
FROM active_user_all_split_12000_theta_sub8_001
GROUP BY news_category_lv1_id, news_category_lv2_id
```
26.78 s
![](https://i.imgur.com/NFIe4SY.png)

### Standard_D8s_v3 (6 instances) // daily cost 2000 NTD // 35 historical pods
26.56 s
![](https://i.imgur.com/9V67WJS.png)
```
historical:
  replicaCount: 35
  config:
    DRUID_MAXDIRECTMEMORYSIZE: 5g
    druid_processing_buffer_sizeBytes: '100000000'
    druid_processing_numMergeBuffers: 1
    druid_processing_numThreads: 3
```

### Standard_D8s_v3 (6 instances) // daily cost 2000 NTD // 35 historical pods
```
historical:
  replicaCount: 35
  config:
    DRUID_MAXDIRECTMEMORYSIZE: 5g
    druid_processing_buffer_sizeBytes: '100000000'
    druid_processing_numMergeBuffers: 2
    druid_processing_numThreads: 7
```
```sql
SELECT APPROX_COUNT_DISTINCT_DS_THETA("theta_one_id"),
  news_category_lv1_id,
  news_category_lv2_id
FROM active_user_all_split_12000_theta_sub8_001
GROUP BY news_category_lv1_id, news_category_lv2_id
```
FAIL
```sql
SELECT APPROX_COUNT_DISTINCT_DS_THETA("theta_one_id"), news_category_lv1_id
FROM active_user_all_split_12000_theta_sub8_001
GROUP BY news_category_lv1_id
```
21.32 s
```sql
SELECT APPROX_COUNT_DISTINCT_DS_THETA("theta_one_id"), news_category_lv2_id
FROM active_user_all_split_12000_theta_sub8_001
GROUP BY news_category_lv2_id
```
23.96 s

### druid_processing_buffer_sizeBytes: '2000000000' // 2 GB
```sql
SELECT APPROX_COUNT_DISTINCT_DS_THETA("theta_one_id"),
  news_category_lv1_id,
  news_category_lv2_id
FROM active_user_all_split_12000_theta_sub8_001
GROUP BY news_category_lv1_id, news_category_lv2_id
```
28.93, 23.55 // 35 historical pods // bound by disk IO
![](https://i.imgur.com/asPTrqQ.png)
27.18 for 50 historical pods; 23.51 for the 2nd query

### 30k NTD/mo: B4ms * 6
:::success
seems to be the best recipe so far
:::
https://gist.github.com/y56/292f0b4745c06bb31278df5ca09e7213
:::info
restart
:::
![](https://i.imgur.com/PV7hF8W.png)
![](https://i.imgur.com/yGvdrAW.png)
![](https://i.imgur.com/8QJhtPH.png)

### PV 128G P10
```sql
SELECT APPROX_COUNT_DISTINCT_DS_THETA("theta_one_id")
FROM active_user_all_split_12000_theta_002
```
121 s, then 77.7 s, then 42.8 s

----
----
----

## observation

### ingestion time
* 4 or 5 middle managers on B4ms on 4 pods (filling all slots) (4 slots each)
    * one-month active_user (5.1 GB)
        * ==> ingested in 16 min
    * 3-day data
        * ==> ingested in 3 min
    * 254-day data
        * ==> ingested in 1 hr 15 min (for HLL Sketch Builder)
        * ==> ingested in 1 hr 15 min (for Theta Sketch)
* 2 middle managers on B4ms on 2 pods (filling all slots) (4 slots each)
    * 254-day data
        * ==> ingested in 2 hr 30 min (for HLL Sketch Builder)
        * ==> ingested in ? hr ??
min (for Theta Sketch)

## some tuning tips
* want higher ingestion concurrency/speed:
    * more middle managers
* batch ingestion file size:
    * 100~300 MB

## some empirical knowledge
Druid loses cached query results when AKS restarts! Try to make the difficult queries right after a restart, for example:
```sql
SELECT APPROX_COUNT_DISTINCT_DS_THETA("theta_one_id"),
  news_category_lv1_id,
  news_category_lv2_id
FROM active_user_all_split_12000_theta_sub8_001
GROUP BY news_category_lv1_id, news_category_lv2_id
```
or
```sql
SELECT APPROX_COUNT_DISTINCT_DS_THETA("theta_one_id")
FROM active_user_all_split_12000_theta_sub8_001
```

## sharing

### pros and cons for use cases
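The note above about losing caches on restart can be scripted: replay the heavy GROUP BY queries against the router's SQL endpoint. This is a sketch; `/druid/v2/sql` is Druid's standard SQL API, but the `warm`/`build_sql_payload` helper names and the forwarded port (from the port-forward section earlier) are assumptions.

```shell
# Warm Druid caches after an AKS restart by replaying heavy queries.
ROUTER=${ROUTER:-http://127.0.0.1:5566}   # assumes `kubectl port-forward svc/druid-router 5566:8888`

build_sql_payload() {
  # Wrap a SQL string in the JSON body /druid/v2/sql expects, escaping double quotes.
  printf '{"query":"%s"}' "$(printf '%s' "$1" | sed 's/"/\\"/g')"
}

warm() {
  # Fire one query and discard the result; only the cache side effect matters.
  curl -s -X POST -H 'Content-Type: application/json' \
    -d "$(build_sql_payload "$1")" "$ROUTER/druid/v2/sql" > /dev/null
}

# Example (run against a live cluster):
# warm 'SELECT APPROX_COUNT_DISTINCT_DS_THETA("theta_one_id") FROM active_user_all_split_12000_theta_sub8_001'
```

Printing `build_sql_payload` output first is a cheap way to sanity-check the JSON before pointing the script at a live router.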