# SUSE Observability Cluster Deployment
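* SUSE Observability (formerly StackState) is used to observe Kubernetes clusters and their workloads.
* SUSE Observability has two parts, a Server and an Agent: the Server stores and presents the data, and the Agent collects data and sends it to the Server.

The Server components are: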
1. Topology (StackGraph)
1. Metrics (VictoriaMetrics)
1. Traces (ClickHouse)
1. Logs (ElasticSearch)

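## 1. Key installation notes
1. A CSI storage provisioner is required; this guide uses local-path-storage. If the cluster has no default StorageClass, set local-path as the default.
2. The OBS Server runs Hadoop-based components (such as the HBase pods behind StackGraph), so the nodes need more RAM than usual.
3. The trial sizing profile has no HA.
4. Assign static IP addresses to every node in the environment.
5. Traces require the applications to use the OpenTelemetry libraries.
6. If you installed OBS before adding local-path-storage, delete the failed PVCs before reinstalling.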
## 2. IP addresses and resources
obsm1: 192.168.11.75
obsw1: 192.168.11.76
obsw2: 192.168.11.77
Resources per node: 4 cores, 12 GB memory
* Check name resolution
```
$ host obs1.example.com
obs1.example.com has address 192.168.11.76
$ host otlp-stackstate.example.com
otlp-stackstate.example.com has address 192.168.11.76
$ host otlp-http-stackstate.example.com
otlp-http-stackstate.example.com has address 192.168.11.76
```
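To re-check all three names in one step, the same `host` lookups can be wrapped in a small function. This is a sketch that assumes the `host` utility (bind-utils/dnsutils) is installed and that every name should point at the same worker (192.168.11.76 in this lab); `first_address` and `check_hosts` are hypothetical helper names.

```shell
# Print the first IPv4 address from `host` output read on stdin.
first_address() {
  awk '/ has address / { print $NF; exit }'
}

# Succeed only if every hostname resolves to the expected IP.
# Usage: check_hosts <expected-ip> <hostname>...
check_hosts() {
  local expected=$1 name ip rc=0
  shift
  for name in "$@"; do
    ip=$(host "$name" 2>/dev/null | first_address)
    if [ "$ip" != "$expected" ]; then
      echo "MISMATCH: $name -> ${ip:-unresolved} (expected $expected)" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

For example: `check_hosts 192.168.11.76 obs1.example.com otlp-stackstate.example.com otlp-http-stackstate.example.com && echo "DNS OK"`.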
## 3. Install local-path-storage as the default storage backend
* Install local-path
```
$ wget -O - https://raw.githubusercontent.com/rancher/local-path-provisioner/v0.0.22/deploy/local-path-storage.yaml | kubectl apply -f -
```
* Set it as the default StorageClass; some PVCs need a default backend when they are created.
```
$ kubectl patch storageclass local-path -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
$ kubectl get sc
NAME                   PROVISIONER             RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
local-path (default)   rancher.io/local-path   Delete          WaitForFirstConsumer   false                  349d
```
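Because some PVCs rely on the default backend, it is worth confirming that exactly one StorageClass carries the `(default)` marker. A minimal sketch that parses `kubectl get sc` output from stdin; `count_default_sc` and `has_single_default_sc` are hypothetical names, and the check assumes the default column layout shown above.

```shell
# Count StorageClasses flagged "(default)" in `kubectl get sc` output on stdin.
count_default_sc() {
  awk 'NR > 1 && $2 == "(default)" { n++ } END { print n + 0 }'
}

# Succeed only when exactly one default StorageClass exists.
# Usage: kubectl get sc | has_single_default_sc && echo OK
has_single_default_sc() {
  [ "$(count_default_sc)" -eq 1 ]
}
```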
## 4. Install the OBS services with Helm
:::info
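The Helm charts and values have already been downloaded and included in the full offline bundle.
If you have not downloaded them, follow the air-gapped installation procedure in the official documentation.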
:::
### 4.1 Prerequisite: generate the values files
:::warning
Obtain your own registration code (license key) from SCC (SUSE Customer Center).
:::
:::warning
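Note the baseUrl: this is the URL used to access the OBS service, and it needs a matching DNS record.
For this lab, pointing it at any one worker in the OBS cluster is enough.
In production, deploy with DNS plus a load balancer.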
:::
* Install Helm; version 3.13.1 or later is required.
```
$ curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3
$ chmod 700 get_helm.sh
$ ./get_helm.sh
$ helm version
version.BuildInfo{Version:"v3.17.1", GitCommit:"980d8ac1939e39138101364400756af2bdee1da5", GitTreeState:"clean", GoVersion:"go1.23.5"}
```
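To check the 3.13.1 minimum in a script rather than by eye, a version comparison can rely on `sort -V`. This sketch assumes GNU coreutils (`sort -V`) and the `helm version --short` format of Helm 3 (for example `v3.17.1+g980d8ac`); `version_ge` is a hypothetical helper.

```shell
# Return 0 if version $1 >= version $2. A leading "v" and any "+build"
# suffix (as printed by `helm version --short`) are ignored.
version_ge() {
  local have=${1#v} need=${2#v}
  have=${have%%+*}
  need=${need%%+*}
  # sort -V orders version strings; if the lowest one is $need, then $have >= $need.
  [ "$(printf '%s\n%s\n' "$need" "$have" | sort -V | head -n 1)" = "$need" ]
}
```

Usage: `version_ge "$(helm version --short)" 3.13.1 || echo "Helm is older than 3.13.1"`.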
* Download the Helm charts
```
$ helm repo add suse-observability https://charts.rancher.com/server-charts/prime/suse-observability
$ helm repo update
$ helm fetch suse-observability/suse-observability
$ helm fetch suse-observability/suse-observability-values
$ ls -l | grep suse
-rw-r--r-- 1 root root 561319 Mar 4 17:56 suse-observability-2.3.0.tgz
-rw-r--r-- 1 root root 8420 Mar 4 17:57 suse-observability-values-1.0.7.tgz
```
* The following command generates `$VALUES_DIR/suse-observability-values/templates/baseConfig_values.yaml` and `$VALUES_DIR/suse-observability-values/templates/sizing_values.yaml`, which contain the settings needed to install the SUSE Observability Helm chart.
```
$ export VALUES_DIR=.
# Replace with your own license key
$ helm template \
    --set license='xxxx-xxxx-xxxx' \
    --set baseUrl='https://obs1.example.com' \
    --set sizing.profile='trial' \
    suse-observability-values suse-observability-values-1.0.7.tgz \
    --output-dir $VALUES_DIR
$ ls -l suse-observability-values/templates/
total 8
-rw-r--r-- 1 root root 511 Mar 4 17:58 baseConfig_values.yaml
-rw-r--r-- 1 root root 3961 Mar 4 17:58 sizing_values.yaml
```
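Before moving on, a quick check can confirm that `helm template` really wrote both values files and that neither is empty. A sketch assuming the same `$VALUES_DIR` layout as above; `check_values_files` is a hypothetical name.

```shell
# Verify that both generated values files exist and are non-empty.
# Usage: check_values_files [values-dir]   (defaults to the current directory)
check_values_files() {
  local dir=${1:-.}/suse-observability-values/templates f rc=0
  for f in baseConfig_values.yaml sizing_values.yaml; do
    if [ ! -s "$dir/$f" ]; then
      echo "missing or empty: $dir/$f" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

Usage: `check_values_files "$VALUES_DIR" && echo "values files OK"`.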
### 4.2 Check the ingress configuration (ingress-values.yaml)
:::warning
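Note the host: this is the URL used to access the OBS service, and it needs a matching DNS record.
For this lab, pointing it at any one worker in the OBS cluster is enough.
In production, deploy with DNS plus a load balancer.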
:::
```
$ vim ingress-values.yaml
ingress:
  enabled: true
  annotations:
    nginx.ingress.kubernetes.io/proxy-body-size: "50m"
  hosts:
    - host: obs1.example.com
```
### 4.3 Check the traces configuration (ingress_otel_values.yaml)
:::warning
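Note the hosts: these are the URLs used to reach the OBS trace (OTLP) endpoints, and they need matching DNS records.
For this lab, pointing them at any one worker in the OBS cluster is enough.
In production, deploy with DNS plus a load balancer.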
:::
```
$ vim ingress_otel_values.yaml
opentelemetry-collector:
  ingress:
    enabled: true
    annotations:
      nginx.ingress.kubernetes.io/proxy-body-size: "50m"
      nginx.ingress.kubernetes.io/backend-protocol: GRPC
    hosts:
      - host: otlp-stackstate.example.com
        paths:
          - path: /
            pathType: Prefix
            port: 4317
    additionalIngresses:
      - name: otlp-http
        annotations:
          nginx.ingress.kubernetes.io/proxy-body-size: "50m"
        hosts:
          - host: otlp-http-stackstate.example.com
            paths:
              - path: /
                pathType: Prefix
                port: 4318
```
### 4.4 Email settings
```
$ vim mail.yaml
stackstate:
  email:
    enabled: true
    sender: "obs@lab.com"
    server:
      host: "smtp.google.com"
      port: 25
      protocol: smtp
      auth:
        username: "null"
        password: "null"
```
### 4.5 Install the OBS cluster
```
$ helm upgrade --install \
--namespace suse-observability \
--create-namespace \
--values $VALUES_DIR/suse-observability-values/templates/baseConfig_values.yaml \
--values $VALUES_DIR/suse-observability-values/templates/sizing_values.yaml \
--values ingress-values.yaml \
--values ingress_otel_values.yaml \
--values mail.yaml \
suse-observability \
suse-observability-2.3.0.tgz
```
* Check that all pods are running.
```
$ kubectl -n suse-observability get pod
NAME READY STATUS RESTARTS AGE
suse-observability-backup-conf-05t132644-s6c2p 0/1 Completed 0 6m48s
suse-observability-clickhouse-shard0-0 2/2 Running 0 6m48s
suse-observability-correlate-6b6bfdf686-k7k2n 1/1 Running 3 (2m41s ago) 6m48s
suse-observability-e2es-9586bcddb-nj5vt 1/1 Running 2 (3m4s ago) 6m48s
suse-observability-elasticsearch-master-0 1/1 Running 0 6m48s
suse-observability-hbase-stackgraph-0 1/1 Running 1 (5m30s ago) 6m48s
suse-observability-hbase-tephra-0 1/1 Running 0 6m48s
suse-observability-kafka-0 2/2 Running 2 (5m21s ago) 6m37s
suse-observability-kafkaup-operator-kafkaup-84658fb49d-c72dw 1/1 Running 0 6m48s
suse-observability-otel-collector-0 1/1 Running 0 6m48s
suse-observability-prometheus-elasticsearch-exporter-6fb6bpxtlk 1/1 Running 1 (5m49s ago) 6m48s
suse-observability-receiver-58b7bc9fbc-cqd25 1/1 Running 3 (2m16s ago) 6m48s
suse-observability-router-7f964cfbc4-mnjjv 1/1 Running 1 (5m47s ago) 6m48s
suse-observability-server-5578945476-w9qwg 1/1 Running 3 (35s ago) 6m48s
suse-observability-topic-create-05t132644-p9mtp 0/1 Completed 0 6m48s
suse-observability-ui-59c88887db-xr8pm 2/2 Running 0 6m48s
suse-observability-victoria-metrics-0-0 1/1 Running 0 6m48s
suse-observability-vmagent-0 1/1 Running 0 6m48s
suse-observability-zookeeper-0 1/1 Running 0 6m48s
$ kubectl -n suse-observability get ing
NAME                                          CLASS    HOSTS                              ADDRESS                                     PORTS   AGE
suse-observability                            <none>   obs1.example.com                   192.168.11.75,192.168.11.76,192.168.11.77   80      27h
suse-observability-otel-collector             <none>   otlp-stackstate.example.com        192.168.11.75,192.168.11.76,192.168.11.77   80      27h
suse-observability-otel-collector-otlp-http   <none>   otlp-http-stackstate.example.com   192.168.11.75,192.168.11.76,192.168.11.77   80      3m1s
```
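Several pods restart a few times while their dependencies come up, so a single `get pod` can be misleading. A minimal polling sketch, assuming `kubectl` already points at the OBS cluster: `not_ready_pods` parses the READY and STATUS columns and skips `Completed` Job pods (such as `backup-conf` and `topic-create`). Both function names, the 10-second interval and the 60-attempt default are assumptions, not values from the chart.

```shell
# Print pods (name, READY, STATUS) from `kubectl get pod` output on stdin
# that are not fully ready. Completed Job pods are skipped.
not_ready_pods() {
  awk 'NR > 1 && $3 != "Completed" {
    split($2, r, "/")
    if (r[1] != r[2] || $3 != "Running") print $1, $2, $3
  }'
}

# Poll the namespace until every pod is ready; give up after N attempts.
# Usage: wait_obs_pods [namespace] [attempts]
wait_obs_pods() {
  local ns=${1:-suse-observability} tries=${2:-60} out pending=""
  while [ "$tries" -gt 0 ]; do
    out=$(kubectl -n "$ns" get pod 2>/dev/null)
    pending=$(printf '%s\n' "$out" | not_ready_pods)
    # An empty listing means kubectl failed or no pods exist yet: keep waiting.
    if [ -n "$out" ] && [ -z "$pending" ]; then
      return 0
    fi
    sleep 10
    tries=$((tries - 1))
  done
  printf '%s\n' "${pending:-no pods found in $ns}" >&2
  return 1
}
```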
* Get the API_KEY
```
$ kubectl -n suse-observability get secret suse-observability-api-key -o jsonpath='{.data.API_KEY}' | base64 -d
6MkFG3Ve942N9tdRChlvKzKRBwrlj2Ci
```
## Log in through the ingress

* The default account is admin; its password is recorded in the `baseConfig_values.yaml` file.
```
$ cat suse-observability-values/templates/baseConfig_values.yaml | grep password
# Your SUSE Observability admin password is: x83LJNMco5Yzvo44
```
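The password can also be extracted directly, for example for a password manager. A sketch assuming the comment format shown above (`... admin password is: <password>`), which may differ between chart versions; `obs_admin_password` is a hypothetical helper.

```shell
# Print the admin password recorded as a comment in baseConfig_values.yaml.
# Usage: obs_admin_password [path-to-baseConfig_values.yaml]
obs_admin_password() {
  local file=${1:-suse-observability-values/templates/baseConfig_values.yaml}
  sed -n 's/^.*admin password is:[[:space:]]*//p' "$file" | head -n 1
}
```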

## 5. Add a monitored cluster
:::info
Agent cluster registration: install the agent on the downstream cluster you want to monitor so that OBS can observe it.
:::
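* If the OBS UI does not show the expected start page after you log in, the license may be the cause; check the `suse-observability-server` pod for related errors.
* Click Kubernetes in the upper-left corner.

* In the OBS UI, enter the name of the cluster to add.

* The registration command for the agent cluster is shown below it.

* Get the Helm chart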
```
$ helm repo add suse-observability https://charts.rancher.com/server-charts/prime/suse-observability
$ helm repo update
```
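1. Adjust the command to skip certificate validation (the `skipSslValidation` settings below), because the OBS ingress in this lab does not use a trusted certificate.
2. Run the following install command on the downstream cluster to be monitored.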
```
$ helm upgrade --install \
--namespace suse-observability \
--create-namespace \
--set-string 'stackstate.apiKey'='6MkFG3Ve942N9tdRChlvKzKRBwrlj2Ci' \
--set-string 'stackstate.cluster.name'='rke2' \
--set-string 'stackstate.url'='https://obs1.example.com/receiver/stsAgent' \
--set-string 'global.skipSslValidation'='true' \
--set-string 'nodeAgent.skipSslValidation'='true' \
--set-string 'clusterAgent.skipSslValidation'='true' \
--set-string 'logsAgent.skipSslValidation'='true' \
--set-string 'checksAgent.skipSslValidation'='true' \
suse-observability-agent suse-observability/suse-observability-agent
```
```
$ kubectl -n suse-observability get pod
NAME READY STATUS RESTARTS AGE
suse-observability-agent-checks-agent-56b6c94bdd-tscg8 1/1 Running 1 (58s ago) 2m6s
suse-observability-agent-cluster-agent-7d87c4dbc7-mbpvl 1/1 Running 1 (92s ago) 2m6s
suse-observability-agent-logs-agent-294gt 1/1 Running 0 2m6s
suse-observability-agent-logs-agent-qmttk 1/1 Running 0 2m6s
suse-observability-agent-logs-agent-rf9rp 1/1 Running 0 2m6s
suse-observability-agent-node-agent-g88bd 2/2 Running 1 (57s ago) 2m6s
suse-observability-agent-node-agent-qxn7t 2/2 Running 0 2m6s
suse-observability-agent-node-agent-vlwtx 2/2 Running 0 2m6s
```
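The node-agent and logs-agent pods follow DaemonSet naming, so on a healthy cluster there should be one Running pod of each per node (three here). A sketch that compares those counts with `kubectl get nodes`; it assumes the release name `suse-observability-agent` used above and that no taints keep the agents off any node. `count_running` and `check_agent_daemonsets` are hypothetical names.

```shell
# Count Running pods whose name starts with prefix $1 (`kubectl get pod` output on stdin).
count_running() {
  awk -v p="$1" 'NR > 1 && index($1, p) == 1 && $3 == "Running" { n++ } END { print n + 0 }'
}

# Compare per-node agent pods with the number of nodes (needs kubectl access).
# Usage: check_agent_daemonsets [namespace]
check_agent_daemonsets() {
  local ns=${1:-suse-observability} nodes pods prefix rc=0
  nodes=$(kubectl get nodes --no-headers | wc -l | tr -d ' ')
  pods=$(kubectl -n "$ns" get pod)
  for prefix in suse-observability-agent-node-agent- suse-observability-agent-logs-agent-; do
    if [ "$(printf '%s\n' "$pods" | count_running "$prefix")" -ne "$nodes" ]; then
      echo "${prefix%-}: expected $nodes Running pods" >&2
      rc=1
    fi
  done
  return "$rc"
}
```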
* After deployment, return to the OBS UI and confirm that the cluster has been added.

* The agent cluster's data can now be observed.

## 6. Install the OpenTelemetry Collector
* To collect traces from applications, install the OpenTelemetry Collector.
* All of the following steps are run on the agent cluster.
```
$ vim otel-collector.yaml
extraEnvsFrom:
  - secretRef:
      name: open-telemetry-collector
mode: deployment
image:
  repository: "otel/opentelemetry-collector-k8s"
ports:
  metrics:
    enabled: true
presets:
  kubernetesAttributes:
    enabled: true
    extractAllPodLabels: true
config:
  extensions:
    bearertokenauth:
      scheme: SUSEObservability
      # API_KEY is injected from the open-telemetry-collector secret via extraEnvsFrom
      token: "${env:API_KEY}"
  exporters:
    otlp/stackstate:
      auth:
        authenticator: bearertokenauth
      endpoint: otlp-stackstate.example.com:443
      tls:
        insecure: false
        insecure_skip_verify: true
    otlphttp/stackstate:
      auth:
        authenticator: bearertokenauth
      endpoint: otlp-http-stackstate.example.com:4318
      tls:
        insecure: false
        insecure_skip_verify: true
  processors:
    tail_sampling:
      decision_wait: 10s
      policies:
        - name: rate-limited-composite
          type: composite
          composite:
            max_total_spans_per_second: 500
            policy_order: [errors, slow-traces, rest]
            composite_sub_policy:
              - name: errors
                type: status_code
                status_code:
                  status_codes: [ ERROR ]
              - name: slow-traces
                type: latency
                latency:
                  threshold_ms: 1000
              - name: rest
                type: always_sample
            rate_allocation:
              - policy: errors
                percent: 33
              - policy: slow-traces
                percent: 33
              - policy: rest
                percent: 34
    resource:
      attributes:
        - key: k8s.cluster.name
          action: upsert
          value: rke2 # change to your own cluster name
        - key: service.instance.id
          from_attribute: k8s.pod.uid
          action: insert
    filter/dropMissingK8sAttributes:
      error_mode: ignore
      traces:
        span:
          - resource.attributes["k8s.node.name"] == nil
          - resource.attributes["k8s.pod.uid"] == nil
          - resource.attributes["k8s.namespace.name"] == nil
          - resource.attributes["k8s.pod.name"] == nil
  connectors:
    spanmetrics:
      metrics_expiration: 5m
      namespace: otel_span
    routing/traces:
      error_mode: ignore
      table:
        - statement: route()
          pipelines: [traces/sampling, traces/spanmetrics]
  service:
    extensions:
      - health_check
      - bearertokenauth
    pipelines:
      traces:
        receivers: [otlp]
        processors: [filter/dropMissingK8sAttributes, memory_limiter, resource]
        exporters: [routing/traces]
      traces/spanmetrics:
        receivers: [routing/traces]
        processors: []
        exporters: [spanmetrics]
      traces/sampling:
        receivers: [routing/traces]
        processors: [tail_sampling, batch]
        exporters: [debug, otlp/stackstate]
      metrics:
        receivers: [otlp, spanmetrics, prometheus]
        processors: [memory_limiter, resource, batch]
        exporters: [debug, otlp/stackstate]
```
```
$ helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
```
* When installing, put your own SUSE Observability API key into the secret
```
$ kubectl create ns open-telemetry
$ kubectl create secret generic open-telemetry-collector \
--namespace open-telemetry \
--from-literal=API_KEY='6MkFG3Ve942N9tdRChlvKzKRBwrlj2Ci'
$ helm upgrade \
--install opentelemetry-collector open-telemetry/opentelemetry-collector \
--values otel-collector.yaml \
--namespace open-telemetry
```
```
$ kubectl -n open-telemetry get pod
NAME READY STATUS RESTARTS AGE
opentelemetry-collector-865cdcdf5b-v2pr5 1/1 Running 0 96s
```
## Verify the trace feature
1. Download the task tool
```
$ wget https://github.com/go-task/task/releases/download/v3.41.0/task_linux_amd64.tar.gz
$ tar -zxvf task_linux_amd64.tar.gz
$ cp task /usr/local/bin/
```
2. Download the sample application used by StackState
```
$ git clone https://github.com/ravan/observability-hands-on; cd observability-hands-on
```
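* First create a `.env` file inside the observability-hands-on directory.
* The key parameters are `KUBECONFIG_FILE_PATH` and `KUBECONFIG_FILE_NAME`, which give the kubeconfig directory and file name; the usual location is used here.
* Also replace `STS_API_KEY` with your own API key.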
```
$ vim .env
STS_API_KEY=6MkFG3Ve942N9tdRChlvKzKRBwrlj2Ci
LOCAL_CLUSTER=false
CLUSTER_NAME=rke2
CLUSTER_OTLP_HTTP_ENDPOINT=otlp-stackstate.example.com:4318
KUBECONFIG_FILE_PATH=/root/.kube
KUBECONFIG_FILE_NAME=config
HELM_REPO=stackstate-addons
```
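A typo in `.env` tends to surface only later as a failed task run, so it can help to check the file first. A sketch that lists required keys that are missing or empty, and prints the kubeconfig path that the two `KUBECONFIG_*` values combine into; the key list mirrors the file above, and `missing_env_keys` and `kubeconfig_from_env` are hypothetical helpers.

```shell
# Print each required key that is missing or empty in the given .env file.
# Usage: missing_env_keys .env KEY...
missing_env_keys() {
  local file=$1 key
  shift
  for key in "$@"; do
    grep -Eq "^${key}=.+" "$file" || echo "$key"
  done
}

# Print the kubeconfig path built from KUBECONFIG_FILE_PATH and KUBECONFIG_FILE_NAME.
kubeconfig_from_env() {
  local dir name
  dir=$(sed -n 's/^KUBECONFIG_FILE_PATH=//p' "$1")
  name=$(sed -n 's/^KUBECONFIG_FILE_NAME=//p' "$1")
  printf '%s/%s\n' "$dir" "$name"
}
```

Usage: `missing_env_keys .env STS_API_KEY CLUSTER_NAME CLUSTER_OTLP_HTTP_ENDPOINT KUBECONFIG_FILE_PATH KUBECONFIG_FILE_NAME` prints nothing when every key is set, and `test -f "$(kubeconfig_from_env .env)"` confirms the kubeconfig exists.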
* Edit values.yaml
:::danger
Caution! This address does not point to the Observability cluster's external ingress. It points to the Service of the OpenTelemetry Collector that was installed on this cluster.
:::
```
$ cd charts/dino-kiosk/
$ vim values.yaml
blockQueue: 'no'
nameOverride: ''
fullnameOverride: ''
otelHttpEndpoint: opentelemetry-collector.open-telemetry.svc.cluster.local:4318 # change this line
```
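Since the usual mistake here is pointing `otelHttpEndpoint` at the external ingress, a quick check can flag it. A sketch assuming the default cluster domain `cluster.local` and an unquoted value as in the file above; `otel_endpoint` and `is_in_cluster` are hypothetical helpers.

```shell
# Print the otelHttpEndpoint value from a values.yaml (unquoted; trailing comment ignored).
otel_endpoint() {
  sed -n 's/^otelHttpEndpoint:[[:space:]]*\([^[:space:]#]*\).*/\1/p' "$1"
}

# Succeed if host:port refers to an in-cluster Service DNS name.
is_in_cluster() {
  case "${1%%:*}" in
    *.svc | *.svc.cluster.local) return 0 ;;
    *) return 1 ;;
  esac
}
```

Usage: `is_in_cluster "$(otel_endpoint values.yaml)" || echo "otelHttpEndpoint points outside the cluster"`.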
3. Run the sample
```
rke2:~/observability-hands-on/charts/dino-kiosk # task labs:dino-kiosk:setup
```
* The following pods start in the `museum-dino-kiosk` namespace.
```
$ kubectl -n museum-dino-kiosk get pod
NAME READY STATUS RESTARTS AGE
ai-sim-service-85d4888d6-m24l4 1/1 Running 0 75s
build-a-dino-7c698b4df4-sgcvt 1/1 Running 0 75s
kiosk-visitors-56b4ff794f-2dtxc 1/1 Running 0 76s
kiosk-web-5d59848c57-km8q6 1/1 Running 0 76s
printer-3d-8499b76d95-llxcn 1/1 Running 0 76s
printing-queue-bd7c9b856-r9lr5 1/1 Running 0 75s
printing-service-7598cc487-d6clv 1/1 Running 0 76s
shipping-5b844565d9-wbwwh 1/1 Running 0 76s
```
* Simulate errors
```
rke2:~/observability-hands-on/charts/dino-kiosk # task labs:dino-kiosk:trigger
```
* In the OBS UI, select the corresponding cluster and namespace.

* Open the target pod and click Traces to see the trace details.

## Rancher UI Extensions
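To connect Rancher to SUSE Observability, the OBS URL must use a valid certificate (not a self-signed one).
* In the lower-left corner of the OBS UI, click CLI and copy the following command to install the `sts` CLI.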

```
$ curl -o- https://dl.stackstate.com/stackstate-cli/install.sh | STS_URL="http://obs1.example.com" STS_API_TOKEN="HPoktLn_wz0J4Y7ffD0Eh7GN2dDOyc5V" bash
$ sts service-token --help
Manage service tokens.
Usage:
sts service-token [command]
Available Commands:
create Create a service token
delete Delete a service token
list List service tokens
Use "sts service-token [command] --help" for more information about a command.
```
```
$ sts service-token create --name my-service-token --roles stackstate-k8s-troubleshooter
✅ Service token created: svctok-M101Ky4Ol6wffCm6hFp23npL9klLTsLH
```
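When token creation is scripted, the token can be captured from the CLI output. A sketch assuming `sts` prints the `Service token created: <token>` line shown above to stdout and that this wording stays stable across versions; `extract_service_token` is a hypothetical helper.

```shell
# Print the token from `sts service-token create` output on stdin.
extract_service_token() {
  sed -n 's/^.*Service token created:[[:space:]]*//p' | head -n 1
}
```

Usage: `TOKEN=$(sts service-token create --name my-service-token --roles stackstate-k8s-troubleshooter | extract_service_token)`.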
* Install the OBS extension

* Click Install


## Email notification setup
* Add a notification and choose E-mail as the channel; after configuring it, click TEST to send a test email.



## Uninstall
* OBS Cluster
```
$ helm -n suse-observability uninstall suse-observability
```
* OBS Agent
```
$ helm -n suse-observability uninstall suse-observability-agent
$ helm -n open-telemetry uninstall opentelemetry-collector
```
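`helm uninstall` does not remove PersistentVolumeClaims created from StatefulSet volume claim templates, so data volumes can survive an uninstall and be reused by a reinstall (the same PVC issue noted in section 1). A cleanup sketch for the OBS cluster, assuming nothing else in the `suse-observability` namespace needs its PVCs; with `DRY_RUN=1` it only prints the commands. `run_or_echo` and `obs_server_cleanup` are hypothetical names.

```shell
# Run a command, or only print it when DRY_RUN=1.
run_or_echo() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Uninstall the OBS server release, then delete the PVCs it leaves behind.
# Usage: [DRY_RUN=1] obs_server_cleanup [namespace]
obs_server_cleanup() {
  local ns=${1:-suse-observability}
  run_or_echo helm -n "$ns" uninstall suse-observability
  run_or_echo kubectl -n "$ns" delete pvc --all
}
```

With the local-path StorageClass from section 3 (reclaim policy `Delete`), deleting the PVCs also removes the backing volumes.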
## References
* https://docs.stackstate.com/get-started/k8s-suse-rancher-prime#license-key
* https://play.stackstate.com/#/components/urn:kubernetes:%2Francher-rke2-cluster-0%2Fpod%2F2c829a95-cafc-413a-9476-a1b4b53b72e6/metrics?detachedFilters=cluster-name%3Arancher-rke2-cluster-0&timeRange=1741130921846_1741152521846