# OBS Cluster Installation Notes - Online Edition

## 1. Key Installation Notes

1. A CSI is required - local-path-storage. If there is no default storage class, set this one as the default.
2. Because this is a Hadoop cluster, one node will need noticeably more RAM than the others.
3. The OBS feature still requires a trial key; contact Sam when the key expires.
4. Trial mode has no HA mechanism.
5. Installation is done entirely through helm; with the cluster kubeconfig in place, you can install directly from the harbor node via helm.
6. Use static IPs throughout the environment.
7. The traces feature requires the application to be instrumented with OpenTelemetry.
8. The relevant repos in the Harbor registry must be set to public.
9. If OBS was created before local-path-storage was added, delete the failed PVCs before reinstalling.
10. Joining a cluster requires installing the collector.

## 2. IP Configuration Commands and Notes

1. Check the available network interfaces: `nmcli connection`
2. Configure the NIC via nmcli: `nmcli connection modify ens33 ipv4.addresses 192.168.1.210/24 ipv4.gateway 192.168.1.254 ipv4.dns 192.168.1.100 ipv4.method manual`
3. Confirm the image registry is reachable (e.g. harbor): `ping harbor.example.com`
4. Confirm the FQDN to be used resolves correctly: `dig obs1.example.com`
5. Confirm the otel FQDN resolves correctly: `dig otlp-stackstate.example.com`
6. Save one kubeconfig each for the obs cluster and the agent cluster in advance.

### 2.1. IP Records

- obsm1: 192.168.1.210
- obsw1: 192.168.1.211
- obsw2: 192.168.1.212
- obsw3: 192.168.1.213
- obsw4: 192.168.1.214

## 3. Install local-path-storage as the Default Storage Backend

Fix the image URL: `harbor.example.com/`. If the customer's registry address is different, remember to change it; this file is already included in the offline installation package.

local-path-storage.yaml
:::spoiler
```yaml!
apiVersion: v1
kind: Namespace
metadata:
  name: local-path-storage
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: local-path-provisioner-service-account
  namespace: local-path-storage
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: local-path-provisioner-role
  namespace: local-path-storage
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list", "watch", "create", "patch", "update", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: local-path-provisioner-role
rules:
  - apiGroups: [""]
    resources: ["nodes", "persistentvolumeclaims", "configmaps", "pods", "pods/log"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["persistentvolumes"]
    verbs: ["get", "list", "watch", "create", "patch", "update", "delete"]
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["create", "patch"]
  - apiGroups: ["storage.k8s.io"]
    resources: ["storageclasses"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: local-path-provisioner-bind
  namespace: local-path-storage
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: local-path-provisioner-role
subjects:
  - kind: ServiceAccount
    name: local-path-provisioner-service-account
    namespace: local-path-storage
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: local-path-provisioner-bind
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: local-path-provisioner-role
subjects:
  - kind: ServiceAccount
    name: local-path-provisioner-service-account
    namespace: local-path-storage
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: local-path-provisioner
  namespace: local-path-storage
spec:
  replicas: 1
  selector:
    matchLabels:
      app: local-path-provisioner
  template:
    metadata:
      labels:
        app: local-path-provisioner
    spec:
      serviceAccountName: local-path-provisioner-service-account
      containers:
        - name: local-path-provisioner
          image: harbor.example.com/rancher/local-path-provisioner:v0.0.28
          imagePullPolicy: IfNotPresent
          command:
            - local-path-provisioner
            - --debug
            - start
            - --config
            - /etc/config/config.json
          volumeMounts:
            - name: config-volume
              mountPath: /etc/config/
          env:
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
            - name: CONFIG_MOUNT_PATH
              value: /etc/config/
      volumes:
        - name: config-volume
          configMap:
            name: local-path-config
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-path
provisioner: rancher.io/local-path
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Delete
---
kind: ConfigMap
apiVersion: v1
metadata:
  name: local-path-config
  namespace: local-path-storage
data:
  config.json: |-
    {
      "nodePathMap":[
        {
          "node":"DEFAULT_PATH_FOR_NON_LISTED_NODES",
          "paths":["/opt/local-path-provisioner"]
        }
      ]
    }
  setup: |-
    #!/bin/sh
    set -eu
    mkdir -m 0777 -p "$VOL_DIR"
  teardown: |-
    #!/bin/sh
    set -eu
    rm -rf "$VOL_DIR"
  helperPod.yaml: |-
    apiVersion: v1
    kind: Pod
    metadata:
      name: helper-pod
    spec:
      priorityClassName: system-node-critical
      tolerations:
        - key: node.kubernetes.io/disk-pressure
          operator: Exists
          effect: NoSchedule
      containers:
        - name: helper-pod
          image: registry.suse.com/bci/bci-busybox:15.6
          imagePullPolicy: IfNotPresent
```
:::

:::warning
Set this as the default storage class from the Rancher UI; some PVCs need a default backend at creation time.
:::

## 4. helm install the OBS Application Services

Download the o11y helm charts:

```shell=
# helm repo add suse-observability https://charts.rancher.com/server-charts/prime/suse-observability
# helm repo update
# helm fetch suse-observability/suse-observability
# helm fetch suse-observability/suse-observability-values
```

### 4.1. Preparation: Generate the values Parameter Files

:::warning
Note the baseUrl: this is the URL used to access the OBS service and requires DNS configuration.
Pointing it at any one worker of the OBS Cluster is sufficient.
For production, deploy with DNS + LB.
:::

```shell!
export VALUES_DIR=.
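# Optional sanity check (sketch; assumes kubectl is pointed at the OBS cluster
# kubeconfig): confirm exactly one StorageClass is marked "(default)" before
# generating the values, otherwise the OBS PVCs will stay Pending (see section 1).
kubectl get storageclass
# If local-path is not yet the default, it can be patched like this:
# kubectl patch storageclass local-path \
#   -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'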
helm template \
  --set license='XXXXX-XXXXX-XXXXX' \
  --set baseUrl='https://obs1.example.com' \
  --set sizing.profile='trial' \
  suse-observability-values suse-observability-values-1.0.7.tgz \
  --output-dir $VALUES_DIR
```

### 4.2. Confirm the ingress Configuration (ingress-values.yaml)

:::warning
Note the host: this is the URL used to access the OBS service and requires DNS configuration.
Pointing it at any one worker of the OBS Cluster is sufficient.
For production, deploy with DNS + LB.
Save the following content as ingress-values.yaml.
:::

```yaml!
ingress:
  enabled: true
  annotations:
    nginx.ingress.kubernetes.io/proxy-body-size: "50m"
  hosts:
    - host: obs1.example.com
```

### 4.3. Confirm the traces Configuration (ingress_otel_values.yaml)

:::warning
Note the host: this is the URL used to access the OBS trace service and requires DNS configuration.
Pointing it at any one worker of the OBS Cluster is sufficient.
For production, deploy with DNS + LB.
Save the following content as ingress_otel_values.yaml.
:::

```yaml!
opentelemetry-collector:
  ingress:
    enabled: true
    annotations:
      nginx.ingress.kubernetes.io/proxy-body-size: "50m"
      nginx.ingress.kubernetes.io/backend-protocol: GRPC
    hosts:
      - host: otlp-stackstate.example.com
        paths:
          - path: /
            pathType: Prefix
            port: 4317
    additionalIngresses:
      - name: otlp-http
        annotations:
          nginx.ingress.kubernetes.io/proxy-body-size: "50m"
        hosts:
          - host: otlp-http-stackstate.example.com
            paths:
              - path: /
                pathType: Prefix
                port: 4318
```

### 4.4. Install the OBS Cluster

```shell!
helm \
  upgrade --install \
  --namespace suse-observability \
  --create-namespace \
  --values $VALUES_DIR/suse-observability-values/templates/baseConfig_values.yaml \
  --values $VALUES_DIR/suse-observability-values/templates/sizing_values.yaml \
  --values ingress-values.yaml \
  --values ingress_otel_values.yaml \
  suse-observability \
  suse-observability-2.3.0.tgz
```

Command output:

:::spoiler
```shell!
WARNING: Kubernetes configuration file is group-readable. This is insecure. Location: obs.config
WARNING: Kubernetes configuration file is world-readable. This is insecure. Location: obs.config
Release "suse-observability" does not exist. Installing it now.
coalesce.go:237: warning: skipped value for kafka.zookeeper.topologySpreadConstraints: Not a table.
coalesce.go:237: warning: skipped value for clickhouse.zookeeper.topologySpreadConstraints: Not a table.
coalesce.go:286: warning: cannot overwrite table with non table for suse-observability.opentelemetry-collector.config.exporters.logging (map[])
coalesce.go:286: warning: cannot overwrite table with non table for suse-observability.opentelemetry-collector.config.receivers.jaeger (map[protocols:map[grpc:map[endpoint:${env:MY_POD_IP}:14250] thrift_compact:map[endpoint:${env:MY_POD_IP}:6831] thrift_http:map[endpoint:${env:MY_POD_IP}:14268]]])
coalesce.go:286: warning: cannot overwrite table with non table for suse-observability.opentelemetry-collector.config.receivers.prometheus (map[config:map[scrape_configs:[map[job_name:opentelemetry-collector scrape_interval:10s static_configs:[map[targets:[${env:MY_POD_IP}:8888]]]]]]])
coalesce.go:286: warning: cannot overwrite table with non table for suse-observability.opentelemetry-collector.config.receivers.zipkin (map[endpoint:${env:MY_POD_IP}:9411])
NAME: suse-observability
LAST DEPLOYED: Fri Nov 29 23:54:46 2024
NAMESPACE: suse-observability
STATUS: deployed
REVISION: 1
TEST SUITE: None
```
:::

## 5. Cluster Registration Command

Worker cluster registration command:

:::info
1. The command is generated in the OBS UI; as generated, it is not fully offline-capable, so modify it according to the sample below.
2. After modifying it, note that the command must be run by helm against the kubeconfig of the cluster being joined.
3. For how to download the helm chart, refer to section 4, "helm install the OBS Application Services".
:::

Join Sample

:::spoiler
```shell!
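# NOTE (sample values generated in the OBS UI; replace before running):
#   stackstate.apiKey       - the receiver API key for this OBS instance
#   stackstate.cluster.name - a unique name for the cluster being joined
#   stackstate.url          - the OBS baseUrl plus /receiver/stsAgent
# Run with KUBECONFIG pointing at the agent cluster, e.g. (assumed filename):
#   export KUBECONFIG=agent.config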
helm repo add suse-observability https://charts.rancher.com/server-charts/prime/suse-observability
helm repo update
helm fetch suse-observability/suse-observability-agent
helm upgrade --install \
  --namespace suse-observability \
  --create-namespace \
  --set-string 'stackstate.apiKey'='2lkBE1svp5POJu5YQDdYheE582amN8qY' \
  --set-string 'stackstate.cluster.name'='democluster' \
  --set-string 'stackstate.url'='https://obs1.example.com/receiver/stsAgent' \
  --set-string 'global.skipSslValidation'='true' \
  --set-string 'nodeAgent.skipSslValidation'='true' \
  --set-string 'clusterAgent.skipSslValidation'='true' \
  --set-string 'logsAgent.skipSslValidation'='true' \
  --set-string 'checksAgent.skipSslValidation'='true' \
  suse-observability-agent ./suse-observability-agent-1.0.20.tgz
```
:::

## 6. Install the collector

To collect application traces, the OpenTelemetry collector must be installed.
Create the otel-collector.yaml file with the following content:

:::spoiler
```yaml!
extraEnvsFrom:
  - secretRef:
      name: open-telemetry-collector
mode: deployment
image:
  repository: "otel/opentelemetry-collector-k8s"
ports:
  metrics:
    enabled: true
presets:
  kubernetesAttributes:
    enabled: true
    extractAllPodLabels: true
config:
  extensions:
    bearertokenauth:
      scheme: SUSEObservability
      token: "2lkBE1svp5POJu5YQDdYheE582amN8qY"
  exporters:
    otlp/stackstate:
      auth:
        authenticator: bearertokenauth
      endpoint: otlp-stackstate.example.com:443
      tls:
        insecure: false
        insecure_skip_verify: true
    otlphttp/stackstate:
      auth:
        authenticator: bearertokenauth
      endpoint: otlp-http-stackstate.example.com:4318
      tls:
        insecure: false
        insecure_skip_verify: true
  processors:
    tail_sampling:
      decision_wait: 10s
      policies:
        - name: rate-limited-composite
          type: composite
          composite:
            max_total_spans_per_second: 500
            policy_order: [errors, slow-traces, rest]
            composite_sub_policy:
              - name: errors
                type: status_code
                status_code:
                  status_codes: [ERROR]
              - name: slow-traces
                type: latency
                latency:
                  threshold_ms: 1000
              - name: rest
                type: always_sample
            rate_allocation:
              - policy: errors
                percent: 33
              - policy: slow-traces
                percent: 33
              - policy: rest
                percent: 34
    resource:
      attributes:
        - key: k8s.cluster.name
          action: upsert
          value: rhelcluster
        - key: service.instance.id
          from_attribute: k8s.pod.uid
          action: insert
    filter/dropMissingK8sAttributes:
      error_mode: ignore
      traces:
        span:
          - resource.attributes["k8s.node.name"] == nil
          - resource.attributes["k8s.pod.uid"] == nil
          - resource.attributes["k8s.namespace.name"] == nil
          - resource.attributes["k8s.pod.name"] == nil
  connectors:
    spanmetrics:
      metrics_expiration: 5m
      namespace: otel_span
    routing/traces:
      error_mode: ignore
      table:
        - statement: route()
          pipelines: [traces/sampling, traces/spanmetrics]
  service:
    extensions:
      - health_check
      - bearertokenauth
    pipelines:
      traces:
        receivers: [otlp]
        processors: [filter/dropMissingK8sAttributes, memory_limiter, resource]
        exporters: [routing/traces]
      traces/spanmetrics:
        receivers: [routing/traces]
        processors: []
        exporters: [spanmetrics]
      traces/sampling:
        receivers: [routing/traces]
        processors: [tail_sampling, batch]
        exporters: [debug, otlp/stackstate]
      metrics:
        receivers: [otlp, spanmetrics, prometheus]
        processors: [memory_limiter, resource, batch]
        exporters: [debug, otlp/stackstate]
```
:::

helm charts:

```shell!
# helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
# helm fetch open-telemetry/opentelemetry-collector
# ls -al
total 36
drwxr-xr-x 1 root root    70 Dec  5 14:48 .
drwxr-xr-x 1 root root   200 Nov 30 08:57 ..
-rw-r--r-- 1 root root 35260 Dec  5 14:47 opentelemetry-collector-0.117.3.tgz
```

Install. The stackstate api key is obtained from the OBS UI.

```shell!
kubectl create ns open-telemetry
kubectl create secret generic open-telemetry-collector \
  --namespace open-telemetry \
  --from-literal=API_KEY='2lkBE1svp5POJu5YQDdYheE582amN8qY'
helm upgrade \
  --install opentelemetry-collector opentelemetry-collector-0.117.3.tgz \
  --values otel-collector.yaml \
  --namespace open-telemetry
```

## 7. Uninstall

Uninstall - OBS Cluster:

```shell!
helm \
  -n suse-observability \
  uninstall suse-observability
```

Uninstall - OBS Agent:

```shell!
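# Run against the agent cluster's kubeconfig (the OBS cluster uninstall uses
# the OBS kubeconfig); the release name matches the install in section 5.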
harbor:~/work/obs-agent1 # helm -n suse-observability uninstall suse-observability-agent
```

## 8. References

[1. SUSE Observability docs](https://docs.stackstate.com/)