Druid on AKS
====
## References
### Druid architecture
https://medium.com/@aeli/apache-druid-setup-monitoring-and-auto-scaling-on-kubernetes-91739e350fac
https://anskarl.github.io/post/2019/druid-part-1/
https://druid.apache.org/docs/latest/design/processes.html#server-types
### Bitmap
https://levelup.gitconnected.com/insights-into-indexing-using-bitmap-index-c28a3db1ad97
### Apache DataSketches --- approximate count distinct
https://datasketches.apache.org/docs/Theta/InverseEstimate.html
## Access to AKS
### by powershell
```
az login # auth thru browser
az aks install-cli # install kubectl for PowerShell
```
#### Create a resource group
https://docs.microsoft.com/en-us/azure/aks/kubernetes-walkthrough#create-a-resource-group
or
https://docs.microsoft.com/en-us/azure/aks/kubernetes-walkthrough-powershell#create-a-resource-group
#### Create AKS cluster
https://docs.microsoft.com/en-us/azure/aks/kubernetes-walkthrough-portal#create-an-aks-cluster
#### Connect to the cluster
```
az account set --subscription $AZURE_SUBSCRIPTION
az aks get-credentials --resource-group kg-one-id-deploy-dev --name $AZURE_KUBERNETES_NAME
```
in our case,
```
az account set --subscription 35015be4-a4d5-4a9d-a705-d88eabccfa2f
az aks get-credentials --resource-group kg-one-id-deploy-dev --name kg-aks-druid-dev2
```
Then the credentials are merged into the local kubeconfig.
Use `kubectl get all` to check whether it succeeded.
### by WSL
```
sudo apt install kubectl
```
then use PowerShell first to run
```
az account set --subscription $AZURE_SUBSCRIPTION
az aks get-credentials --resource-group kg-one-id-deploy-dev --name $AZURE_KUBERNETES_NAME
```
to generate the kubeconfig under `C:\Users\eugene_wang\.kube`, and then copy it to `~/.kube/config` (see the sketch below)
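A minimal sketch of copying that kubeconfig from Windows into WSL (the Windows username/path is the one from this setup; adjust to your machine):
```bash
# copy the kubeconfig written by "az aks get-credentials" (run in PowerShell) into WSL
mkdir -p ~/.kube
cp /mnt/c/Users/eugene_wang/.kube/config ~/.kube/config
kubectl get nodes   # verify kubectl in WSL can reach the cluster
```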
### by local/web cloud shell
The only difference between local and web is that the web Azure Cloud Shell has already done `az login` for you, and Cloud Shell comes with many tools pre-installed.
Note that we can't `su` or `sudo` in Cloud Shell.
Be careful: the web/local Cloud Shell session closes itself after a while, and you will lose all output.
#### for "no storage mounted" error in cloud shell


Create a throwaway storage account under "advanced settings". You can then log in and manage AKS, but files get deleted after logout.

##### if you have a proper storage afterward
See: https://techcommunity.microsoft.com/t5/azure/azure-cloud-shell-error/m-p/71089#M489
1. Run "clouddrive unmount"
2. Restart Cloud Shell via restart icon or exit and relaunch
3. You should be prompted with the storage creation dialog again
## Deployment with Helm
```
helm install myReleaseName myRepo/myChart --namespace myNamespace -f myValues.yaml
```
Clone the chart from: https://github.com/helm/charts (the original)
or https://github.com/y56/charts/blob/master/incubator/druid/values.yaml
values.yaml: https://github.com/ASUS-AICS/kg-oneid-analytics-system/wiki/druid-settings
```bash
helm repo add incubator https://charts.helm.sh/incubator
helm install druid incubator/druid
helm install druid incubator/druid -f my_values.yaml
```
:::warning
`values.yaml` may contain secrets
:::
### Helm 2 vs Helm 3
Both the CLI syntax and the chart hub URLs changed between Helm 2 and Helm 3.
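For example (a sketch from memory; double-check against the Helm docs):
```bash
# Helm 2: release name passed as a flag, charts pulled from the old Google-hosted repos
helm install --name druid incubator/druid
# Helm 3: release name is positional, repos moved to charts.helm.sh
helm repo add incubator https://charts.helm.sh/incubator
helm install druid incubator/druid
```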
### Get in the GUI by port-forward
Port-forward the Druid router service port 8888 to localhost port 5566:
```
kubectl port-forward svc/druid-router 5566:8888
```
open http://127.0.0.1:5566 in the browser
Note that if you do the forwarding in Windows (e.g., in PowerShell), WSL2 can't reach the port.
If you are using Cloud Shell, be careful that some ports are already in use and we can't sudo.
Or, we can expose druid-router through an ingress to obtain a public IP (assigned by AKS).
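A simpler variant of that idea (a LoadBalancer service instead of a full ingress) is to patch the existing router service and let AKS assign the public IP; sketch only, assuming the service is named `druid-router` as above:
```bash
# switch the router service to type LoadBalancer so AKS provisions a public IP
kubectl patch svc druid-router -p '{"spec": {"type": "LoadBalancer"}}'
kubectl get svc druid-router   # wait until EXTERNAL-IP is populated
```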
### Add Azure Storage extensions through values.yaml
https://druid.apache.org/docs/latest/development/extensions.html#loading-core-extensions
https://druid.apache.org/docs/latest/development/extensions-core/azure.html
```
druid_storage_type: azure
```
```
druid_azure_account: druidsampledata
druid_azure_key: abcdxxxxxxxx==
druid_azure_container: test-active-user
druid_azure_prefix: myPrefix
druid_azure_protocol: https
druid_azure_maxTries: '3'
druid_azure_maxListingLength: '1024'
```
:::info
Apache DataSketches is installed by default.
:::
### Updating k8s config by helm
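Presumably `helm upgrade` with the edited values file, e.g. (a sketch using the release/chart names from above):
```bash
# apply changes from my_values.yaml to the running release
helm upgrade druid incubator/druid -f my_values.yaml
# then watch the affected workloads roll, e.g. the historicals
kubectl rollout status statefulset/druid-historical
```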
### `Helm uninstall`
```
helm uninstall myReleaseName --namespace myNamespace
helm uninstall druid
```
## some useful commands/tools
### kill all pods in kubernetes, or delete things
[Ref](https://stackoverflow.com/questions/33509194/command-to-delete-all-pods-in-all-kubernetes-namespaces)
```
kubectl delete pods --all --all-namespaces
```
Avoid deleting system pods: recreating them wastes time and might break the cluster, requiring a restart.
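A safer variant is to limit the deletion to the namespace the Druid release lives in (assuming the `default` namespace here):
```bash
# delete only the pods in one namespace instead of cluster-wide
kubectl delete pods --all -n default
```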
### start/stop cluster
https://docs.microsoft.com/en-us/azure/aks/start-stop-cluster
```
az aks stop --name myAKSCluster --resource-group myResourceGroup
az aks start --name myAKSCluster --resource-group myResourceGroup
```
```
az aks stop --name kg-aks-druid-dev2 --resource-group kg-one-id-deploy-dev2
az aks start --name kg-aks-druid-dev2 --resource-group kg-one-id-deploy-dev2
```
// stop and start can take up to 10 minutes
### restart nodes
https://pumpingco.de/blog/restart-a-node-in-aks/
or just stop then start the whole cluster
### [activity log](https://portal.azure.com/#@ASUS.com/resource/subscriptions/35015be4-a4d5-4a9d-a705-d88eabccfa2f/resourceGroups/kg-one-id-deploy-dev/providers/Microsoft.ContainerService/managedClusters/kg-aks-druid-dev2/eventlogs)
need access to see history logs
### hello-world service
to check if pod/node/cluster works well
http://20.195.53.3/
http://20.43.168.59/
### delete pods matching a pattern (e.g., locust)
```
for A in $(kubectl get po | grep locust | sed 's/ .*//'); do kubectl delete pod/$A; done
```
### login into a pod
```
kubectl exec --stdin --tty pod/druid-historical-0 -- /bin/sh
```
### kubectl describe XXXX
### kubectl logs XXXX
### kubernetes dashboard
https://github.com/kubernetes/dashboard
https://github.com/kubernetes/dashboard/blob/master/docs/user/access-control/creating-sample-user.md
execute:
```
kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/v2.2.0/aio/deploy/recommended.yaml
kubectl proxy
```
execute:
```
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: ServiceAccount
metadata:
  name: admin-user
  namespace: kubernetes-dashboard
EOF
```
execute:
```
cat <<EOF | kubectl apply -f -
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: admin-user
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: admin-user
  namespace: kubernetes-dashboard
EOF
```
execute:
```
kubectl -n kubernetes-dashboard get secret $(kubectl -n kubernetes-dashboard get sa/admin-user -o jsonpath="{.secrets[0].name}") -o go-template="{{.data.token | base64decode}}"
```
It should print something very long like: `eyJhbGciO......Tuuw2svnV6NYQ`
Finally, go to:
http://localhost:8001/api/v1/namespaces/kubernetes-dashboard/services/https:kubernetes-dashboard:/proxy/
use the token to log in
### Setup Monitoring using Prometheus and Grafana
https://medium.com/@aeli/apache-druid-setup-monitoring-and-auto-scaling-on-kubernetes-91739e350fac
```
helm repo add stable https://charts.helm.sh/stable
helm install demo stable/prometheus-operator
```
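To reach the bundled Grafana, port-forward its service; the service name below assumes the release name `demo` used above, so verify it first:
```bash
kubectl get svc | grep grafana            # find the actual Grafana service name
kubectl port-forward svc/demo-grafana 3000:80
# then open http://127.0.0.1:3000
```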
[[WIP]]
----
----
## some AKS/WSL/K8S/minikube issues
Whenever you hit an error, check the internet connection with `nslookup google.com` and try [wsl-fix-resolvconf.sh](https://gist.github.com/nfekete/7a277bf9e25e89e1c8bfb8b64dcc08ed).
### wsl internet issue: bad virtual DNS
172.17.64.65 ==> Ethernet adapter vEthernet (WSL) // virtual DNS
172.23.44.53 ==> corpnet.asus // ASUS LAN DNS
btw we can't use 8.8.8.8 as the DNS server on the ASUS LAN, so some solutions from Stack Overflow won't work.

Check the interfaces with `ipconfig /all` in PowerShell.
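A manual sketch of what the linked script automates, using the corporate DNS server listed above (the IP is specific to this network):
```bash
# point WSL at the working DNS server instead of the broken virtual one
sudo rm /etc/resolv.conf
echo "nameserver 172.23.44.53" | sudo tee /etc/resolv.conf
nslookup google.com   # verify name resolution works again
# note: WSL may regenerate resolv.conf on restart; the linked script handles that part
```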
### [Pod Stuck In Terminating State](https://containersolutions.github.io/runbooks/posts/kubernetes/pod-stuck-in-terminating-state)
### pods stuck in `ImagePullBackOff` or `ErrImagePull`
When `kubectl get po` says a pod's status is ErrImagePull or ImagePullBackOff,
run `kubectl describe pod/the-pod-name` to check the events.
It may be caused by a broken internet connection or a lack of registry access.
#### broken internet bc of wsl dns
```
Failed to pull image "apache/druid:0.19.0": rpc error: code = Unknown desc = Error response from daemon: Get https://registry-1.docker.io/v2/: dial tcp: lookup registry-1.docker.io on 192.168.65.2:53: read udp 192.168.49.2:57045->192.168.65.2:53: i/o timeout
```
#### some helm chart uses private images
```
Failed to pull image "gcr.io/zenatix-data-archiver/kube-druid:0.0.1": rpc error: code = Unknown desc = Error response from daemon: unauthorized: You don't have the needed permissions to perform this operation, and you may have invalid credentials. To authenticate your request, follow the steps in: https://cloud.google.com/container-registry/docs/advanced-authentication
Normal BackOff 33m (x5 over 34m) kubelet Back-off pulling image "gcr.io/zenatix-data-archiver/kube-druid:0.0.1"
```
### `kubectl get all -A` fails bc of wsl dns
```
Unable to connect to the server: dial tcp: lookup kg-aks-druid-dev2-dns-2b85eb85.hcp.southeastasia.azmk8s.io on 172.17.177.225:53: read udp 172.17.177.234:54522->172.17.177.225:53: i/o timeout
```
### nodes are `notReady`
The nodes may have been shut down. Get Azure access to start the VMSS/nodes.
See: https://ithelp.ithome.com.tw/articles/10193944
### get access of the resource group `MC_rgname_aksname_southeastasia`
Add a tag to the resource group to keep it running; otherwise it will be shut down at 9 pm.
You need access to the MC (managed cluster) resource group to start the VMSS (virtual machine scale sets).
MC_rgname_aksname_southeastasia is itself a resource group.
It is created automatically when the AKS cluster is created inside our original resource group.
All the services AKS uses (VM, DNS, IP, disk, etc.) are managed under MC_XXX_YYY_ZZZ.

#### to start nodes/VMs by VMSS
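Presumably something like the following once you have access to the MC_* resource group (the VMSS name is illustrative; list it first):
```bash
# find the node-pool VMSS inside the auto-generated MC_* resource group
az vmss list --resource-group MC_kg-one-id-deploy-dev_kg-aks-druid-dev2_southeastasia -o table
# start the whole scale set (substitute the VMSS name from the listing)
az vmss start --resource-group MC_kg-one-id-deploy-dev_kg-aks-druid-dev2_southeastasia --name <vmss-name>
```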

### get access to ACR(Azure Container Registry)
Need access to ACR to pull images.
```
Failed to pull image "kgacrdruiddev.azurecr.io/webfrontend:v1": rpc error: code = Unknown desc = Error response from daemon: Get https://kgacrdruiddev.azurecr.io/v2/webfrontend/manifests/v1: unauthorized: authentication required, visit https://aka.ms/acr/authorization for more information.
```
### wsl2 can't access port forwarded by powershell
GG, still an open issue (as of 2021-03-26):
https://github.com/microsoft/WSL/issues/5211#issuecomment-628565569
Windows can see ports opened by WSL, but not vice versa.
:::warning
use WSL to forward ports
:::
### minikube setup
WSL2 + Windows Docker.
Make sure WSL2 has internet access first.
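A sketch of the usual start sequence for this setup:
```bash
# start a local cluster on the Docker Desktop backend from inside WSL2
minikube start --driver=docker
kubectl config use-context minikube   # switching contexts is covered in the next section
```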
### switch between k8s clusters (i.e., AKS and minikube)
```
kubectl config view # lists clusters
kubectl config use-context kg-aks-druid-dev2 # switch to another cluster
kubectl config use-context minikube
```
:::warning
Need to fix WSL2 DNS after switching clusters!!!
:::
----
----
The most practical questions:
1. Do you think Druid has a chance of replacing Synapse? XD (considering cost & performance)
    1. given data of the same size
        1. ingestion time:
        2. query response time:
        3. cost given the same uptime:
    2. cost-performance plot

data size 18 GB (for 2 days, active_user)
ingestion time (including processing) = ?
query response time = ?
MC's cost (3 VMs) = ?
usage scenario = ? queries over one week?
what will the SQL look like?
----
----
Implementation side:
1. If we load the data in and find that queries are slow, which settings can we tune to optimize?
    - I expect there are different indexes to set / partitioning / schema adjustments / settings based on the time column; the more complicated part is probably the connection and resource settings between the components, where I have less experience..
2. If queries are slow, how can we find the bottleneck and decide which machines to scale up or which settings to adjust?
    - From the EMR team's experience with Elasticsearch, they look at the VMs' CPU utilization and check whether the computation is parallelized (but I don't know whether Druid has settings for parallel computation).
3. How long does loading one day or one month of data take, and can the current way of loading data be automated later?
    - An earlier survey suggested the HTTP API should work, but loading directly from the data lake would also be nice; we just need to confirm whether it can be automated.
4. Would like to know the process of deploying to AKS, and what actually gets installed.
5. If I also want to start developing today, how do I begin?
    - I imagine... locally run az get-credentials -> `kubectl get services` -> open localhost?
On the Druid side:
1. The question Brett asked: what is the principle that makes queries fast?
    - roll-up should be a big part of the foundation
    - no further benefits from roll-up and granularity
    - approx distinct count
----
----
## some Druid issues
### debug -- general
```bash
# "k" / "kl" were shell aliases for kubectl / kubectl logs
for A in $(kubectl get po --no-headers | sed 's/ .*//'); do
  echo //////////$A
  kubectl logs pod/$A | egrep -i "error|fail|ExecutionException"
done
```
### [Heisenbug] java.lang.IllegalStateException: Already connected at sun.net.www.protocol.http.HttpURLConnection
This happened during ingestion. I really don't know why; it succeeded on retry and never happened again.
### middle managers being blacklisted
https://druid.apache.org/docs/latest/design/overlord.html#blacklisted-workers
### ingestion with Theta sketches fails and all middle managers die
#### shrink file size
18-GB files are too large.
100~200 MB files are safe.
#### ingestion parameter
reduce these to 512 MB (the default is 1 GB):
```
"maxFetchCapacityBytes": 536870912,
"maxCacheCapacityBytes": 536870912
```
#### give middle manager JVM more space
Each subtask is processed by one JVM
Tune middle manager parameters:
```
# Sketch only (values illustrative, untested): raise heap and direct memory for the
# peon JVMs spawned by the middle managers, via the chart's middleManager config
druid_indexer_runner_javaOptsArray: '["-server", "-Xms1g", "-Xmx2g", "-XX:MaxDirectMemorySize=2g"]'
```
### [solved] segments not 100% available
Caused by insufficient volume size on the historicals.
```
2021-04-11T10:24:27,300 ERROR [ZKCoordinator--0] org.apache.druid.server.coordination.SegmentLoadDropHandler - Failed to load segment for dataSource: {class=org.apache.druid.server.coordination.SegmentLoadDropHandler, exceptionType=class org.apache.druid.segment.loading.SegmentLoadingException, exceptionMessage=Exception loading segment[active_user_all_split_12000_hll_sub16_001_2020-12-03T00:00:00.000Z_2020-12-04T00:00:00.000Z_2021-04-10T19:11:02.644Z_16], segment=DataSegment{binaryVersion=9, id=active_user_all_split_12000_hll_sub16_001_2020-12-03T00:00:00.000Z_2020-12-04T00:00:00.000Z_2021-04-10T19:11:02.644Z_16, loadSpec={type=>azure, containerName=>personal, blobPath=>active_user_all_split_12000_hll_sub16_001/20201203T000000.000Z_20201204T000000.000Z/2021-04-10T19_11_02.644Z/16/index.zip}, dimensions=[app_type, browser, bu, city, first_date, is_member, is_new, os, source, news_category_lv1_id, news_category_lv2_id], metrics=[sum_log_count, hll_one_id], shardSpec=NumberedShardSpec{partitionNum=16, partitions=98}, lastCompactionState=null, size=753977}}
```
https://github.com/apache/druid/issues/8706
### brokers die during large ingestion
Put brokers and middle managers on different nodes. [WIP]
:::info
keyword: inter-pod anti-affinity
:::
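A quick way to check whether they currently share nodes (the anti-affinity itself would go into the chart's values, which is the WIP part):
```bash
# show which node each broker / middle-manager pod landed on
kubectl get pods -o wide | egrep "broker|middle"
```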
### Error: Resource limit exceeded
```sql
SELECT APPROX_COUNT_DISTINCT_DS_THETA("theta_one_id"), news_category_lv1_id, news_category_lv2_id
FROM active_user_all_split_12000_theta_sub8_001
WHERE ("__time" BETWEEN TIMESTAMP '2020-10-01' AND TIMESTAMP '2020-10-31')
GROUP BY news_category_lv1_id, news_category_lv2_id
```
```
Error: Resource limit exceeded
Not enough aggregation buffer space to execute this query. Try increasing druid.processing.buffer.sizeBytes or enable disk spilling by setting druid.query.groupBy.maxOnDiskStorage to a positive number.
org.apache.druid.query.ResourceLimitExceededException
```
:::success
historical:
  enabled: true
  name: historical
  replicaCount: 14
  port: 8083
  serviceType: ClusterIP
  config:
    # DRUID_XMX: 512m # STRANGE QQ
    # DRUID_XMS: 512m
    DRUID_MAXDIRECTMEMORYSIZE: 5g # (druid.processing.numThreads + druid.processing.numMergeBuffers + 1) * druid.processing.buffer.sizeBytes
    # Buffer size for intermediate results. The computation engine in both the Historical
    # and Realtime processes uses an off-heap scratch buffer of this size for intermediate
    # computations. Larger values allow more aggregations in a single pass over the data;
    # smaller values can require more passes depending on the query.
    # default: auto (max 1GB)
    druid_processing_buffer_sizeBytes: '100000000'
    druid_processing_numMergeBuffers: 1 # processing threads / 4
    druid_processing_numThreads: 3 # number of cores - 1
    # druid_monitoring_monitors: '["org.apache.druid.client.cache.CacheMonitor", "org.apache.druid.server.metrics.HistoricalMetricsMonitor", "org.apache.druid.server.metrics.QueryCountStatsMonitor"]'
    # druid_segmentCache_locations: '[{"path":"/var/druid/segment-cache","maxSize":300000000000}]'
    # druid_cache_sizeInBytes: '2000000000' # 2 GB # rule of thumb for Historical heap: (0.5GB * number of CPU cores), upper limit ~24GB # default: min(1GiB, Runtime.maxMemory / 10)
    druid_historical_cache_useCache: true
    druid_historical_cache_populateCache: true
:::
### Error: Unknown exception (Channel disconnected)
```sql
SELECT APPROX_COUNT_DISTINCT_DS_THETA("theta_one_id"), news_category_lv2_id
FROM active_user_all_split_12000_theta_sub8_001
-- WHERE ("__time" BETWEEN TIMESTAMP '2020-10-01' AND TIMESTAMP '2020-10-31')
GROUP BY news_category_lv2_id
LIMIT 100
```
```
Error: Unknown exception
org.jboss.netty.channel.ChannelException: Channel disconnected
java.util.concurrent.ExecutionException
```
```
Warning Unhealthy 43m (x2 over 43m) kubelet Liveness probe failed: Get http://10.244.2.81:8083/status/health: dial tcp 10.244.2.81:8083: connect: connection refused
Warning Unhealthy 42m (x2 over 43m) kubelet Readiness probe failed: Get http://10.244.2.81:8083/status/health: dial tcp 10.244.2.81:8083: connect: connection refused
```
## tuning
### choosing hardware
#### VM instance
See: https://docs.imply.io/2021.02/planning/#data-servers
```
(AWS)       vCPU  ECU  Memory (GiB)  Instance Storage (GB)  Linux/UNIX Usage
r4.2xlarge  8     31   61            EBS Only                $0.532 per Hour
```
will try Azure B8MS
:::info
by default, 4 pods per node
:::
### Persistent Volume and Persistent Volume Claim
To know PV and PVC, see: https://kubernetes.io/docs/concepts/storage/persistent-volumes/
To know Azure's PV, see: https://docs.microsoft.com/en-us/azure/aks/azure-disks-dynamic-pv#built-in-storage-classes
We are using `storageClass: "managed-premium"`, which is already the fastest built-in class.
For PV options of Azure: https://docs.microsoft.com/en-us/azure/aks/concepts-storage#storage-classes
Remember to delete the old PV/PVC if you are changing the storage class, or try to ***migrate*** (survey needed).
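To see what is currently provisioned before changing anything:
```bash
kubectl get storageclass   # available classes (managed-premium, default, ...)
kubectl get pvc,pv         # claims and volumes currently bound to the Druid pods
```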
#### [WIP] better PV

https://docs.microsoft.com/en-us/azure/aks/azure-disks-dynamic-pv
https://docs.microsoft.com/en-us/azure/aks/concepts-storage
### Middle Manager / data ingestion
Middle Managers are in charge of ingestion.
* raw data file size
  * 18 GB is too large; a few hundred MB is good
  * an 18-GB file takes 1.0 hr (same on B4ms/B2ms, via HTTPS or deep storage)
  * a file is the smallest unit; one sub-task handles one file
  * 500 files of 150 MB with B4ms*6 and 5 middle managers ==> 8~9 min
* number of middle manager pods
  * don't let two middle manager pods sit on one node
* number of sub-tasks during ingestion
  * set it to the total number of task slots (see the curl sketch below)
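For the sub-task count, a sketch of submitting a native batch (`index_parallel`) spec through the router port-forward set up earlier; here `spec.json` is a hypothetical ingestion spec with `tuningConfig.maxNumConcurrentSubTasks` set to the total number of task slots:
```bash
# POST the ingestion spec to the Overlord (proxied by the router on the forwarded port)
curl -X POST -H 'Content-Type: application/json' \
  -d @spec.json \
  http://127.0.0.1:5566/druid/indexer/v1/task
```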
### cache and thread
Use the defaults first; they are aligned with the [Basic Tuning Guide](http://druid.apache.org/docs/latest/operations/basic-cluster-tuning.html).
Tune them only when you hit JVM out-of-memory errors.
### JVM tuning
http://druid.apache.org/docs/latest/operations/basic-cluster-tuning.html#jvm-tuning
### the second partition
## experiment
### sample data 18GB(10 days)
https://druidsampledata.blob.core.windows.net/test-active-user/part-00000-tid-3812279347746115856-60712cdf-da41-4bbc-b99b-04ffd513ea6f-48-1-c000.json
azure://test-active-user/part-00000-tid-3812279347746115856-60712cdf-da41-4bbc-b99b-04ffd513ea6f-48-1-c000.json
azure://test-active-user/test-small-active-user.json
#### not preprocessed
```sql
SELECT APPROX_COUNT_DISTINCT("one_id")
FROM "sampledruiddata-apr8-sub4"
WHERE ("bu" = 'ettoday') AND ("__time" BETWEEN TIMESTAMP '2020-10-01' AND TIMESTAMP '2020-10-02')
```

```sql
SELECT APPROX_COUNT_DISTINCT_DS_HLL("one_id")
FROM "sampledruiddata-apr8-sub4"
WHERE ("bu" = 'ettoday') AND ("__time" BETWEEN TIMESTAMP '2020-10-01' AND TIMESTAMP '2020-10-02')
```

```sql
SELECT APPROX_COUNT_DISTINCT_DS_THETA("one_id")
FROM "sampledruiddata-apr8-sub4"
WHERE ("bu" = 'ettoday') AND ("__time" BETWEEN TIMESTAMP '2020-10-01' AND TIMESTAMP '2020-10-02')
```

#### preprocessed
```sql
SELECT APPROX_COUNT_DISTINCT_DS_HLL("hll_one_id")
FROM "sampledruiddata-HHLSB-allDefault_001"
WHERE ("__time" BETWEEN TIMESTAMP '2020-10-01' AND TIMESTAMP '2020-10-02') AND ("bu" = 'ettoday')
```

```sql
SELECT APPROX_COUNT_DISTINCT_DS_HLL("hll_one_id"), "__time"
FROM "sampledruiddata-HHLSB-allDefault_001"
WHERE ("__time" BETWEEN TIMESTAMP '2020-10-01' AND TIMESTAMP '2020-10-02') AND ("bu" = 'ettoday')
GROUP BY 2
```

```sql
SELECT APPROX_COUNT_DISTINCT_DS_HLL("hll_one_id")
FROM "sampledruiddata-HHLSB-allDefault_001"
WHERE ("__time" BETWEEN TIMESTAMP '2020-10-01' AND TIMESTAMP '2020-10-02') AND ("bu" = 'ettoday') AND ("city" = 'Tainan')
```

### from 10/1~10/31 // ingested in 8 min with preprocessing
#### accuracy, about 4%


#### 0.98 sec
```sql
SELECT APPROX_COUNT_DISTINCT_DS_HLL("hll_one_id")
FROM "DrSaDa-active_user_20201001_20201031_split500-HHL-sub18"
WHERE ("__time" BETWEEN TIMESTAMP '2020-10-01' AND TIMESTAMP '2020-10-02') AND ("bu" = 'ettoday')
```

#### 0.55 sec (this is CACHED!)
```sql
SELECT APPROX_COUNT_DISTINCT_DS_HLL("hll_one_id"), "__time"
FROM "DrSaDa-active_user_20201001_20201031_split500-HHL-sub18"
WHERE ("__time" BETWEEN TIMESTAMP '2020-10-01' AND TIMESTAMP '2020-10-02') AND ("bu" = 'ettoday')
GROUP BY 2
```
#### 0.79 sec
```sql
SELECT APPROX_COUNT_DISTINCT_DS_HLL("hll_one_id")
FROM "DrSaDa-active_user_20201001_20201031_split500-HHL-sub18"
WHERE ("__time" BETWEEN TIMESTAMP '2020-10-01' AND TIMESTAMP '2020-10-02') AND ("bu" = 'ettoday') AND ("city" = 'Tainan')
```
### preprocessed -- Theta
```sql
SELECT APPROX_COUNT_DISTINCT_DS_THETA("theta_one_id")
FROM active_user_oct_split3000_theta_002
WHERE ("__time" BETWEEN TIMESTAMP '2020-10-01' AND TIMESTAMP '2020-10-02') AND ("bu" = 'ettoday')
```
0.57 sec
```sql
SELECT APPROX_COUNT_DISTINCT_DS_THETA("theta_one_id")
FROM active_user_oct_split3000_theta_002
WHERE ("__time" BETWEEN TIMESTAMP '2020-10-01' AND TIMESTAMP '2020-10-02') AND ("bu" = 'ettoday') AND ("city" = 'Tainan')
```
0.67 sec
```sql
SELECT APPROX_COUNT_DISTINCT_DS_THETA("theta_one_id"), "__time"
FROM active_user_oct_split3000_theta_002
WHERE ("__time" BETWEEN TIMESTAMP '2020-10-01' AND TIMESTAMP '2020-10-02') AND ("bu" = 'ettoday')
GROUP BY 2
```
????????
### preprocessed -- HLL -- 254 days
k8s settings: https://gist.github.com/y56/c6a1493690380b48e6d52cfc47acbab6
```sql
SELECT APPROX_COUNT_DISTINCT_DS_HLL("hll_one_id")
FROM active_user_all_split_12000_hll_sub8_001
WHERE ("__time" BETWEEN TIMESTAMP '2020-10-01' AND TIMESTAMP '2020-10-31')
```
8.84 => 1.76
```sql
SELECT APPROX_COUNT_DISTINCT_DS_HLL("hll_one_id")
FROM active_user_all_split_12000_hll_sub8_001
WHERE ("__time" BETWEEN TIMESTAMP '2020-10-01' AND TIMESTAMP '2020-10-31')
AND ("bu" = 'ettoday')
```
16.76s => 2.68s
```sql
SELECT APPROX_COUNT_DISTINCT_DS_HLL("hll_one_id")
FROM active_user_all_split_12000_hll_sub8_001
WHERE ("__time" BETWEEN TIMESTAMP '2020-10-01' AND TIMESTAMP '2020-10-31')
AND ("bu" = 'ettoday')
AND ("city" = 'Tainan')
```
3.63s
### preprocessed -- Theta -- 254 days
k8s settings: https://gist.github.com/y56/c6a1493690380b48e6d52cfc47acbab6
```sql
SELECT APPROX_COUNT_DISTINCT_DS_THETA("theta_one_id")
FROM active_user_all_split_12000_theta_sub8_001
WHERE ("__time" BETWEEN TIMESTAMP '2020-10-01' AND TIMESTAMP '2020-10-31')
```
3.58s (0.7s for 2nd query, converge to ~0.22s if repeated many times)
```sql
SELECT APPROX_COUNT_DISTINCT_DS_THETA("theta_one_id")
FROM active_user_all_split_12000_theta_sub8_001
WHERE ("__time" BETWEEN TIMESTAMP '2020-10-01' AND TIMESTAMP '2020-10-31')
AND ("bu" = 'ettoday')
```
2.25s (0.68s for 2nd query, converge to ~0.22s if repeated many times)
```sql
SELECT APPROX_COUNT_DISTINCT_DS_THETA("theta_one_id")
FROM active_user_all_split_12000_theta_sub8_001
WHERE ("__time" BETWEEN TIMESTAMP '2020-10-01' AND TIMESTAMP '2020-10-31')
AND ("bu" = 'ettoday')
AND ("city" = 'Tainan')
```
0.63 (max 0.21s for repeated)
:::warning
restart
:::
```sql
SELECT APPROX_COUNT_DISTINCT_DS_THETA("theta_one_id")
FROM active_user_all_split_12000_theta_sub8_001
WHERE ("__time" BETWEEN TIMESTAMP '2020-10-01' AND TIMESTAMP '2020-10-31')
AND ("bu" = 'ettoday')
AND ("city" = 'Tainan')
```
4.80s, 2.28s, 1.38s, 0.92, 0.45, 0.38, 0.30, 0.28, 0.27, 0.35
```sql
SELECT APPROX_COUNT_DISTINCT_DS_THETA("theta_one_id")
FROM active_user_all_split_12000_theta_sub8_001
WHERE ("__time" BETWEEN TIMESTAMP '2020-10-01' AND TIMESTAMP '2020-10-31')
AND ("bu" = 'ettoday')
```
2.47 1.11 0.70, 0.39, 0.34, 0.32, 0.32
```sql
SELECT APPROX_COUNT_DISTINCT_DS_THETA("theta_one_id")
FROM active_user_all_split_12000_theta_sub8_001
WHERE ("__time" BETWEEN TIMESTAMP '2020-10-01' AND TIMESTAMP '2020-10-31')
```
0.94 0.59 0.45 0.37 0.37
:::info
restart
:::
```sql
SELECT APPROX_COUNT_DISTINCT_DS_THETA("theta_one_id"), __time
FROM active_user_all_split_12000_theta_sub8_001
WHERE ("__time" BETWEEN TIMESTAMP '2020-10-01' AND TIMESTAMP '2020-10-31')
GROUP BY 2
```
5.49 ==> 6.25 Standard_D8s_v3
```sql
SELECT APPROX_COUNT_DISTINCT_DS_THETA("theta_one_id"), news_category_lv1_id
FROM active_user_all_split_12000_theta_sub8_001
WHERE ("__time" BETWEEN TIMESTAMP '2020-10-01' AND TIMESTAMP '2020-10-31')
GROUP BY news_category_lv1_id
```
6.29 ==> 6.14 Standard_D8s_v3
```sql
SELECT APPROX_COUNT_DISTINCT_DS_THETA("theta_one_id"), news_category_lv1_id
FROM active_user_all_split_12000_theta_sub8_001
WHERE ("__time" BETWEEN TIMESTAMP '2020-10-01' AND TIMESTAMP '2020-10-31')
AND ("bu" = 'ettoday')
GROUP BY news_category_lv1_id
```
5.84 ==> 6.11 Standard_D8s_v3
```sql
SELECT APPROX_COUNT_DISTINCT_DS_THETA("theta_one_id"), news_category_lv1_id
FROM active_user_all_split_12000_theta_sub8_001
WHERE ("__time" BETWEEN TIMESTAMP '2020-10-01' AND TIMESTAMP '2020-10-31')
AND ("bu" = 'ettoday')
AND ("city" = 'Tainan')
GROUP BY news_category_lv1_id
```
1.31 ==> 1.76 Standard_D8s_v3
```sql
SELECT APPROX_COUNT_DISTINCT_DS_THETA("theta_one_id"), news_category_lv1_id
FROM active_user_all_split_12000_theta_sub8_001
WHERE ("__time" BETWEEN TIMESTAMP '2020-10-01' AND TIMESTAMP '2020-10-31')
AND ("bu" = 'ettoday')
AND ("city" = 'Taipei')
GROUP BY news_category_lv1_id
```
2.66 ==> 3.25 Standard_D8s_v3
```sql
SELECT APPROX_COUNT_DISTINCT_DS_THETA("theta_one_id"), news_category_lv2_id
FROM active_user_all_split_12000_theta_sub8_001
-- WHERE ("__time" BETWEEN TIMESTAMP '2020-10-01' AND TIMESTAMP '2020-10-31')
GROUP BY news_category_lv2_id
LIMIT 100
```
6.79s ERROR
### druid_processing_buffer_sizeBytes: '100000000' 1G * 14 historical pods
Standard_D8s_v3 (6 instances) // daily cost 2000NTD
k8s settings: https://gist.github.com/y56/abd07e37ff8aaf98fadd925a1f8a19a2
```sql
SELECT APPROX_COUNT_DISTINCT_DS_THETA("theta_one_id"), news_category_lv1_id, news_category_lv2_id
FROM active_user_all_split_12000_theta_sub8_001
WHERE ("__time" BETWEEN TIMESTAMP '2020-10-01' AND TIMESTAMP '2020-10-31')
GROUP BY news_category_lv1_id, news_category_lv2_id
```
7.06s 3.94 2.59 2.52 2.29 2.46 2.26
10.96 7.5
```sql
SELECT APPROX_COUNT_DISTINCT_DS_THETA("theta_one_id"), news_category_lv2_id
FROM active_user_all_split_12000_theta_sub8_001
GROUP BY news_category_lv2_id
```
65.41 (peak 23.4 core with 6 VM * 8 core)
```sql=
SELECT APPROX_COUNT_DISTINCT_DS_THETA("theta_one_id"), news_category_lv1_id, news_category_lv2_id
FROM active_user_all_split_12000_theta_sub8_001
GROUP BY news_category_lv1_id, news_category_lv2_id
```
62.30 60.03 // ugh
### Standard_D8s_v3 (6 instances) // daily cost 2000NTD // 28 histo pods
```sql
SELECT APPROX_COUNT_DISTINCT_DS_THETA("theta_one_id"), news_category_lv1_id, news_category_lv2_id
FROM active_user_all_split_12000_theta_sub8_001
GROUP BY news_category_lv1_id, news_category_lv2_id
```
26.78

### Standard_D8s_v3 (6 instances) // daily cost 2000NTD // 35 histo pods
26.56

```
historical:
  replicaCount: 35
  config:
    DRUID_MAXDIRECTMEMORYSIZE: 5g
    druid_processing_buffer_sizeBytes: '100000000'
    druid_processing_numMergeBuffers: 1
    druid_processing_numThreads: 3
```
### Standard_D8s_v3 (6 instances) // daily cost 2000NTD // 35 histo pods
```
historical:
  replicaCount: 35
  config:
    DRUID_MAXDIRECTMEMORYSIZE: 5g
    druid_processing_buffer_sizeBytes: '100000000'
    druid_processing_numMergeBuffers: 2
    druid_processing_numThreads: 7
```
```sql=
SELECT APPROX_COUNT_DISTINCT_DS_THETA("theta_one_id"), news_category_lv1_id, news_category_lv2_id
FROM active_user_all_split_12000_theta_sub8_001
GROUP BY news_category_lv1_id, news_category_lv2_id
```
FAIL
```sql=
SELECT APPROX_COUNT_DISTINCT_DS_THETA("theta_one_id"), news_category_lv1_id
FROM active_user_all_split_12000_theta_sub8_001
GROUP BY news_category_lv1_id
```
21.32
```sql=
SELECT APPROX_COUNT_DISTINCT_DS_THETA("theta_one_id"), news_category_lv2_id
FROM active_user_all_split_12000_theta_sub8_001
GROUP BY news_category_lv2_id
```
23.96
### druid_processing_buffer_sizeBytes: '2000000000' // 2 GB
```sql=
SELECT APPROX_COUNT_DISTINCT_DS_THETA("theta_one_id"), news_category_lv1_id, news_category_lv2_id
FROM active_user_all_split_12000_theta_sub8_001
GROUP BY news_category_lv1_id, news_category_lv2_id
```
28.93, 23.55 // 35 histo pods // bound by disk IO

27.18 for 50 histo pods. 23.51 for the 2nd query
### 30k/mo B4ms * 6
:::success
seems to be the best recipe so far
:::
https://gist.github.com/y56/292f0b4745c06bb31278df5ca09e7213
:::info
restart
:::



### PV 128G P10
```sql
SELECT APPROX_COUNT_DISTINCT_DS_THETA("theta_one_id")
FROM active_user_all_split_12000_theta_002
```
121s, then 77.7s, then 42.8s
----
----
----
## observation
### ingestion time
* 4 or 5 middle managers on B4ms nodes, 4 pods (filling all slots, 4 slots each)
  * one-month active_user (5.1 GB)
    * ==> ingested in 16 min
  * 3-day data
    * ==> ingested in 3 min
  * 254-day data
    * ==> ingested in 1 hr 15 min (for HLL Sketch Build)
    * ==> ingested in 1 hr 15 min (for Theta Sketch)
* 2 middle managers on B4ms nodes, 2 pods (filling all slots, 4 slots each)
  * 254-day data
    * ==> ingested in 2 hr 30 min (for HLL Sketch Build)
    * ==> ingested in ? hr ?? min (for Theta Sketch)
## some tuning tips
* want higher ingestion concurrency/speed:
  * more middle managers
* batch ingestion file size
  * 100~300 MB
## some Empirical knowledge
Druid loses cached query results if AKS restarts!
Try re-running the difficult queries after a restart (to warm the cache again),
for example:
```sql=
SELECT APPROX_COUNT_DISTINCT_DS_THETA("theta_one_id"), news_category_lv1_id, news_category_lv2_id
FROM active_user_all_split_12000_theta_sub8_001
GROUP BY news_category_lv1_id, news_category_lv2_id
```
or
```sql=
SELECT APPROX_COUNT_DISTINCT_DS_THETA("theta_one_id")
FROM active_user_all_split_12000_theta_sub8_001
```
## sharing
### pros and cons for use cases