# OBS Cluster Installation Notes - Online Edition
## 1. Key Installation Notes
1. Install the CSI driver local-path-storage; if the cluster has no default storage class, set this one as the default.
2. Because OBS runs a Hadoop cluster, one node will need noticeably more RAM than the others.
3. The OBS feature currently still requires a trial key; contact Sam if the key expires.
4. Trial mode has no HA mechanism.
5. Installation is done entirely through Helm; with the cluster kubeconfig in place, you can install directly from the Harbor node via Helm.
6. Configure static IPs across the entire environment.
7. The traces feature requires the application to be instrumented with an OpenTelemetry library.
8. The relevant repos in the Harbor registry must be set to public.
9. If OBS was created before adding local-path-storage, delete the faulty PVCs before reinstalling.
10. Joining a cluster requires installing the collector.
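For notes 1 and 9 above, a minimal sketch of the related kubectl commands, assuming the `local-path` StorageClass name installed in section 3 (`<pvc-name>` is a placeholder):

```shell
# Mark local-path as the cluster's default StorageClass
# (the annotation key is the standard Kubernetes one):
kubectl patch storageclass local-path \
  -p '{"metadata": {"annotations": {"storageclass.kubernetes.io/is-default-class": "true"}}}'

# If OBS was installed before local-path-storage existed (note 9),
# find and remove the stuck Pending PVCs before reinstalling:
kubectl -n suse-observability get pvc
kubectl -n suse-observability delete pvc <pvc-name>
```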
## 2. IP Configuration Commands and Notes
1. List available network connections: nmcli connection
2. Configure the NIC with nmcli: nmcli connection modify ens33 ipv4.addresses 192.168.1.210/24 ipv4.gateway 192.168.1.254 ipv4.dns 192.168.1.100 ipv4.method manual
3. Confirm the image registry is reachable (e.g. Harbor): ping harbor.example.com
4. Confirm the FQDN to be used resolves correctly: dig obs1.example.com
5. Confirm the OTel FQDN resolves correctly: dig otlp-stackstate.example.com
6. Save one kubeconfig each for the OBS cluster and the agent cluster in advance.
### 2.1. IP Records
obsm1: 192.168.1.210
obsw1: 192.168.1.211
obsw2: 192.168.1.212
obsw3: 192.168.1.213
obsw4: 192.168.1.214
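If the lab has no DNS server, the FQDNs checked in section 2 can be mapped in /etc/hosts instead; a sketch assuming obs1 resolves to worker obsw1 (adjust the addresses to your own records):

```shell
cat >> /etc/hosts <<'EOF'
192.168.1.211 obs1.example.com
192.168.1.211 otlp-stackstate.example.com otlp-http-stackstate.example.com
EOF
```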
## 3. Install local-path-storage as the Default Storage Backend
Fix the image URL prefix: harbor.example.com/
If the customer's registry address differs, remember to change it; this file is already included in the offline installation package.
local-path-storage.yaml
:::spoiler
```yaml!
apiVersion: v1
kind: Namespace
metadata:
  name: local-path-storage
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: local-path-provisioner-service-account
  namespace: local-path-storage
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: local-path-provisioner-role
  namespace: local-path-storage
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list", "watch", "create", "patch", "update", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: local-path-provisioner-role
rules:
  - apiGroups: [""]
    resources: ["nodes", "persistentvolumeclaims", "configmaps", "pods", "pods/log"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["persistentvolumes"]
    verbs: ["get", "list", "watch", "create", "patch", "update", "delete"]
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["create", "patch"]
  - apiGroups: ["storage.k8s.io"]
    resources: ["storageclasses"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: local-path-provisioner-bind
  namespace: local-path-storage
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: local-path-provisioner-role
subjects:
  - kind: ServiceAccount
    name: local-path-provisioner-service-account
    namespace: local-path-storage
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: local-path-provisioner-bind
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: local-path-provisioner-role
subjects:
  - kind: ServiceAccount
    name: local-path-provisioner-service-account
    namespace: local-path-storage
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: local-path-provisioner
  namespace: local-path-storage
spec:
  replicas: 1
  selector:
    matchLabels:
      app: local-path-provisioner
  template:
    metadata:
      labels:
        app: local-path-provisioner
    spec:
      serviceAccountName: local-path-provisioner-service-account
      containers:
        - name: local-path-provisioner
          image: harbor.example.com/rancher/local-path-provisioner:v0.0.28
          imagePullPolicy: IfNotPresent
          command:
            - local-path-provisioner
            - --debug
            - start
            - --config
            - /etc/config/config.json
          volumeMounts:
            - name: config-volume
              mountPath: /etc/config/
          env:
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
            - name: CONFIG_MOUNT_PATH
              value: /etc/config/
      volumes:
        - name: config-volume
          configMap:
            name: local-path-config
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-path
provisioner: rancher.io/local-path
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Delete
---
kind: ConfigMap
apiVersion: v1
metadata:
  name: local-path-config
  namespace: local-path-storage
data:
  config.json: |-
    {
      "nodePathMap": [
        {
          "node": "DEFAULT_PATH_FOR_NON_LISTED_NODES",
          "paths": ["/opt/local-path-provisioner"]
        }
      ]
    }
  setup: |-
    #!/bin/sh
    set -eu
    mkdir -m 0777 -p "$VOL_DIR"
  teardown: |-
    #!/bin/sh
    set -eu
    rm -rf "$VOL_DIR"
  helperPod.yaml: |-
    apiVersion: v1
    kind: Pod
    metadata:
      name: helper-pod
    spec:
      priorityClassName: system-node-critical
      tolerations:
        - key: node.kubernetes.io/disk-pressure
          operator: Exists
          effect: NoSchedule
      containers:
        - name: helper-pod
          image: registry.suse.com/bci/bci-busybox:15.6
          imagePullPolicy: IfNotPresent
```
:::
:::warning
Switch this to the default storage class from the Rancher UI; some PVCs require a default backend at creation time.
:::
## 4. Install the OBS Application Services with Helm
Download the o11y Helm charts
```shell=
# helm repo add suse-observability https://charts.rancher.com/server-charts/prime/suse-observability
# helm repo update
# helm fetch suse-observability/suse-observability
# helm fetch suse-observability/suse-observability-values
```
### 4.1. Preparation: Generate the Values Files
:::warning
Note the baseUrl: this is the URL used to access the OBS service and requires DNS configuration.
Pointing it at any single worker of the OBS cluster is sufficient.
For production, deploy behind DNS plus a load balancer.
:::
```shell!
export VALUES_DIR=.
helm template \
--set license='XXXXX-XXXXX-XXXXX' \
--set baseUrl='https://obs1.example.com' \
--set sizing.profile='trial' \
suse-observability-values suse-observability-values-1.0.7.tgz \
--output-dir $VALUES_DIR
```
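After the template renders, the values files referenced in section 4.4 should exist under $VALUES_DIR; a quick sanity check:

```shell
ls $VALUES_DIR/suse-observability-values/templates/
# baseConfig_values.yaml and sizing_values.yaml should be listed
```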
### 4.2. Confirm the Ingress Configuration (ingress-values.yaml)
:::warning
Note the host: this is the URL used to access the OBS service and requires DNS configuration.
Pointing it at any single worker of the OBS cluster is sufficient.
For production, deploy behind DNS plus a load balancer.
Save the following content as ingress-values.yaml.
:::
```yaml!
ingress:
  enabled: true
  annotations:
    nginx.ingress.kubernetes.io/proxy-body-size: "50m"
  hosts:
    - host: obs1.example.com
```
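The file can also be produced straight from the shell so that the install step in section 4.4 picks it up from the working directory; the heredoc simply mirrors the content above:

```shell
cat > ingress-values.yaml <<'EOF'
ingress:
  enabled: true
  annotations:
    nginx.ingress.kubernetes.io/proxy-body-size: "50m"
  hosts:
    - host: obs1.example.com
EOF
grep -c 'obs1.example.com' ingress-values.yaml   # confirm the host landed in the file
```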
### 4.3. Confirm the Traces Configuration (ingress_otel_values.yaml)
:::warning
Note the host: this is the URL used to access the OBS traces service and requires DNS configuration.
Pointing it at any single worker of the OBS cluster is sufficient.
For production, deploy behind DNS plus a load balancer.
Save the following content as ingress_otel_values.yaml.
:::
```yaml!
opentelemetry-collector:
  ingress:
    enabled: true
    annotations:
      nginx.ingress.kubernetes.io/proxy-body-size: "50m"
      nginx.ingress.kubernetes.io/backend-protocol: GRPC
    hosts:
      - host: otlp-stackstate.example.com
        paths:
          - path: /
            pathType: Prefix
            port: 4317
    additionalIngresses:
      - name: otlp-http
        annotations:
          nginx.ingress.kubernetes.io/proxy-body-size: "50m"
        hosts:
          - host: otlp-http-stackstate.example.com
            paths:
              - path: /
                pathType: Prefix
                port: 4318
```
### 4.4. Install the OBS Cluster
```shell!
helm \
upgrade --install \
--namespace suse-observability \
--create-namespace \
--values $VALUES_DIR/suse-observability-values/templates/baseConfig_values.yaml \
--values $VALUES_DIR/suse-observability-values/templates/sizing_values.yaml \
--values ingress-values.yaml \
--values ingress_otel_values.yaml \
suse-observability \
suse-observability-2.3.0.tgz
```
Command output:
:::spoiler
```shell!
WARNING: Kubernetes configuration file is group-readable. This is insecure. Location: obs.config
WARNING: Kubernetes configuration file is world-readable. This is insecure. Location: obs.config
Release "suse-observability" does not exist. Installing it now.
coalesce.go:237: warning: skipped value for kafka.zookeeper.topologySpreadConstraints: Not a table.
coalesce.go:237: warning: skipped value for clickhouse.zookeeper.topologySpreadConstraints: Not a table.
coalesce.go:286: warning: cannot overwrite table with non table for suse-observability.opentelemetry-collector.config.exporters.logging (map[])
coalesce.go:286: warning: cannot overwrite table with non table for suse-observability.opentelemetry-collector.config.receivers.jaeger (map[protocols:map[grpc:map[endpoint:${env:MY_POD_IP}:14250] thrift_compact:map[endpoint:${env:MY_POD_IP}:6831] thrift_http:map[endpoint:${env:MY_POD_IP}:14268]]])
coalesce.go:286: warning: cannot overwrite table with non table for suse-observability.opentelemetry-collector.config.receivers.prometheus (map[config:map[scrape_configs:[map[job_name:opentelemetry-collector scrape_interval:10s static_configs:[map[targets:[${env:MY_POD_IP}:8888]]]]]]])
coalesce.go:286: warning: cannot overwrite table with non table for suse-observability.opentelemetry-collector.config.receivers.zipkin (map[endpoint:${env:MY_POD_IP}:9411])
NAME: suse-observability
LAST DEPLOYED: Fri Nov 29 23:54:46 2024
NAMESPACE: suse-observability
STATUS: deployed
REVISION: 1
TEST SUITE: None
```
:::
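Once Helm reports `deployed`, a few checks before moving on to cluster registration (these assume the OBS cluster's kubeconfig is active):

```shell
kubectl -n suse-observability get pods      # all pods should reach Running/Completed
kubectl -n suse-observability get ingress   # hosts should match the ingress values files
kubectl -n suse-observability get pvc       # PVCs should be Bound via local-path
```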
## 5. Cluster Registration Commands
Worker cluster registration commands
:::info
1. These commands are generated in the OBS UI, but as generated they are not fully offline-ready; modify them according to the sample below.
2. After modifying, note that the command must be run through Helm together with the kubeconfig of the cluster being joined.
3. For the Helm chart download procedure, refer to section 4.
:::
Join Sample
:::spoiler
```shell!
helm repo add suse-observability https://charts.rancher.com/server-charts/prime/suse-observability
helm repo update
helm fetch suse-observability/suse-observability-agent
helm upgrade --install \
--namespace suse-observability \
--create-namespace \
--set-string 'stackstate.apiKey'='2lkBE1svp5POJu5YQDdYheE582amN8qY' \
--set-string 'stackstate.cluster.name'='democluster' \
--set-string 'stackstate.url'='https://obs1.example.com/receiver/stsAgent' \
--set-string 'global.skipSslValidation'='true' \
--set-string 'nodeAgent.skipSslValidation'='true' \
--set-string 'clusterAgent.skipSslValidation'='true' \
--set-string 'logsAgent.skipSslValidation'='true' \
--set-string 'checksAgent.skipSslValidation'='true' \
suse-observability-agent ./suse-observability-agent-1.0.20.tgz
```
:::
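After running the join sample against the agent cluster, the agent pods can be verified the same way; `agent.config` below is a placeholder for the agent cluster kubeconfig saved in section 2:

```shell
kubectl --kubeconfig agent.config -n suse-observability get pods
```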
## 6. Install the Collector
To collect application traces, install the OpenTelemetry collector.
Create an otel-collector.yaml file with the following content.
:::spoiler
```yaml!
extraEnvsFrom:
  - secretRef:
      name: open-telemetry-collector
mode: deployment
image:
  repository: "otel/opentelemetry-collector-k8s"
ports:
  metrics:
    enabled: true
presets:
  kubernetesAttributes:
    enabled: true
    extractAllPodLabels: true
config:
  extensions:
    bearertokenauth:
      scheme: SUSEObservability
      token: "2lkBE1svp5POJu5YQDdYheE582amN8qY"
  exporters:
    otlp/stackstate:
      auth:
        authenticator: bearertokenauth
      endpoint: otlp-stackstate.example.com:443
      tls:
        insecure: false
        insecure_skip_verify: true
    otlphttp/stackstate:
      auth:
        authenticator: bearertokenauth
      endpoint: otlp-http-stackstate.example.com:4318
      tls:
        insecure: false
        insecure_skip_verify: true
  processors:
    tail_sampling:
      decision_wait: 10s
      policies:
        - name: rate-limited-composite
          type: composite
          composite:
            max_total_spans_per_second: 500
            policy_order: [errors, slow-traces, rest]
            composite_sub_policy:
              - name: errors
                type: status_code
                status_code:
                  status_codes: [ ERROR ]
              - name: slow-traces
                type: latency
                latency:
                  threshold_ms: 1000
              - name: rest
                type: always_sample
            rate_allocation:
              - policy: errors
                percent: 33
              - policy: slow-traces
                percent: 33
              - policy: rest
                percent: 34
    resource:
      attributes:
        - key: k8s.cluster.name
          action: upsert
          value: rhelcluster
        - key: service.instance.id
          from_attribute: k8s.pod.uid
          action: insert
    filter/dropMissingK8sAttributes:
      error_mode: ignore
      traces:
        span:
          - resource.attributes["k8s.node.name"] == nil
          - resource.attributes["k8s.pod.uid"] == nil
          - resource.attributes["k8s.namespace.name"] == nil
          - resource.attributes["k8s.pod.name"] == nil
  connectors:
    spanmetrics:
      metrics_expiration: 5m
      namespace: otel_span
    routing/traces:
      error_mode: ignore
      table:
        - statement: route()
          pipelines: [traces/sampling, traces/spanmetrics]
  service:
    extensions:
      - health_check
      - bearertokenauth
    pipelines:
      traces:
        receivers: [otlp]
        processors: [filter/dropMissingK8sAttributes, memory_limiter, resource]
        exporters: [routing/traces]
      traces/spanmetrics:
        receivers: [routing/traces]
        processors: []
        exporters: [spanmetrics]
      traces/sampling:
        receivers: [routing/traces]
        processors: [tail_sampling, batch]
        exporters: [debug, otlp/stackstate]
      metrics:
        receivers: [otlp, spanmetrics, prometheus]
        processors: [memory_limiter, resource, batch]
        exporters: [debug, otlp/stackstate]
```
:::
helm charts
```shell!
# helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
# helm fetch open-telemetry/opentelemetry-collector
# ls -al
total 36
drwxr-xr-x 1 root root 70 Dec 5 14:48 .
drwxr-xr-x 1 root root 200 Nov 30 08:57 ..
-rw-r--r-- 1 root root 35260 Dec 5 14:47 opentelemetry-collector-0.117.3.tgz
```
Install
The API key (stackstate-api-key) is obtained from the OBS UI.
```shell!
kubectl create ns open-telemetry
kubectl create secret generic open-telemetry-collector \
--namespace open-telemetry \
--from-literal=API_KEY='2lkBE1svp5POJu5YQDdYheE582amN8qY'
helm upgrade \
--install opentelemetry-collector opentelemetry-collector-0.117.3.tgz \
--values otel-collector.yaml \
--namespace open-telemetry
```
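As note 7 in section 1 states, traces only appear if the application itself is instrumented with OpenTelemetry. A minimal sketch using the standard OTLP environment variables and Python auto-instrumentation; the package names, demo app, and in-cluster service name are illustrative assumptions, not part of this install:

```shell
# Install the OpenTelemetry Python distro and OTLP exporter (illustrative):
pip install opentelemetry-distro opentelemetry-exporter-otlp

# Point the SDK at the collector installed above; the service DNS name is an
# assumption based on the Helm release name and namespace used in this section:
export OTEL_EXPORTER_OTLP_ENDPOINT=http://opentelemetry-collector.open-telemetry.svc:4317
export OTEL_SERVICE_NAME=demo-app

# Run a hypothetical app with zero-code auto-instrumentation:
opentelemetry-instrument python app.py
```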
## 7. Uninstall
OBS Cluster
```shell!
helm \
-n suse-observability \
uninstall suse-observability
```
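Helm uninstall leaves PersistentVolumeClaims behind; for a clean reinstall (see note 9 in section 1) they can be removed explicitly:

```shell
kubectl -n suse-observability get pvc
kubectl -n suse-observability delete pvc --all   # irreversible: deletes OBS data
```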
Uninstall - OBS Agent
```shell!
harbor:~/work/obs-agent1 # helm -n suse-observability uninstall suse-observability-agent
```
## 8. References
[1. SUSE Observability docs](https://docs.stackstate.com/)