This article shows how to implement log-based alerts using OpenShift Logging 6.x together with Loki. The environment consists of:
- OpenShift 4.16+
- Logging Operator 6+
- Loki Operator 6+
- Cluster Observability Operator

The logical architecture is roughly as follows:

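1. Following the installation guide at https://docs.redhat.com/en/documentation/red_hat_openshift_logging/6.3/html/installing_logging/installing-logging#installing-loki-and-logging-gui_installing-logging, install:
- Logging Operator 6+
- Loki Operator 6+
- Cluster Observability Operator

Then create a LokiStack. Make sure it is created in the namespace specified in the guide, and enable the rules feature under `spec.rules`.
The `selector` under `rules` determines which AlertingRule objects the Loki ruler loads: only those whose labels match it.

LokiStack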
```yaml=
apiVersion: loki.grafana.com/v1
kind: LokiStack
metadata:
  annotations:
  name: lokistack-sample
  namespace: openshift-logging
spec:
  hashRing:
    type: memberlist
  limits:
    global:
      queries:
        queryTimeout: 3m
  managementState: Managed
  rules:
    enabled: true
    selector:
      matchLabels:
        openshift.io/cluster-monitoring: 'true'
  size: 1x.extra-small
  storage:
    schemas:
    - effectiveDate: '2020-10-11'
      version: v11
    secret:
      credentialMode: static
      name: logging-loki-s3
      type: s3
  storageClassName: gp2-csi
  tenants:
    mode: openshift-logging
```
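
The LokiStack above references an object storage secret named `logging-loki-s3`, which has to exist before the LokiStack can become ready. The following is a minimal sketch of such a secret, assuming AWS S3 and static credentials. The key names follow the Loki Operator's S3 secret format as I understand it; every value is a placeholder (an assumption, not taken from the original environment) and must be replaced with your own.

```yaml=
apiVersion: v1
kind: Secret
metadata:
  name: logging-loki-s3            # must match spec.storage.secret.name in the LokiStack
  namespace: openshift-logging
stringData:
  access_key_id: "REPLACE_WITH_ACCESS_KEY_ID"        # placeholder
  access_key_secret: "REPLACE_WITH_SECRET_ACCESS_KEY" # placeholder
  bucketnames: "REPLACE_WITH_BUCKET_NAME"             # placeholder
  endpoint: "https://s3.REPLACE_WITH_REGION.amazonaws.com"  # placeholder
  region: "REPLACE_WITH_REGION"                       # placeholder
```

Using `stringData` keeps the manifest readable; the API server stores the values base64-encoded under `data`.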
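After installing the Cluster Logging Operator, create a ClusterLogForwarder.
**Note that you must explicitly list the sources under `spec.inputs[].infrastructure` and include `node`; otherwise the journal logs are not collected.**

ClusterLogForwarder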
```yaml=
apiVersion: observability.openshift.io/v1
kind: ClusterLogForwarder
metadata:
  name: instance
  namespace: openshift-logging
spec:
  inputs:
  - infrastructure:
      sources:
      - node # <------- this entry is required
      - container
    name: infra-logs
    type: infrastructure
  managementState: Managed
  outputs:
  - lokiStack:
      authentication:
        token:
          from: serviceAccount
      target:
        name: lokistack-sample
        namespace: openshift-logging
    name: lokistack-out
    tls:
      ca:
        configMapName: openshift-service-ca.crt
        key: service-ca.crt
    type: lokiStack
  pipelines:
  - inputRefs:
    - application
    - infra-logs
    name: infra-app-logs
    outputRefs:
    - lokistack-out
  serviceAccount:
    name: logging-collector
```
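
The ClusterLogForwarder runs the collector as the `logging-collector` service account, so that account needs permission to collect the selected log types and to write to the LokiStack. The following is a sketch of the service account and bindings, assuming the cluster roles `logging-collector-logs-writer`, `collect-application-logs`, and `collect-infrastructure-logs` provided by the operators, as described in the installation guide. Check that they exist with `oc get clusterrole` before applying. The binding names are arbitrary and chosen here for illustration.

```yaml=
apiVersion: v1
kind: ServiceAccount
metadata:
  name: logging-collector
  namespace: openshift-logging
---
# Allows the collector to write logs to the LokiStack
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: logging-collector-logs-writer   # illustrative binding name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: logging-collector-logs-writer
subjects:
- kind: ServiceAccount
  name: logging-collector
  namespace: openshift-logging
---
# Allows the collector to read application logs
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: logging-collector-application-logs   # illustrative binding name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: collect-application-logs
subjects:
- kind: ServiceAccount
  name: logging-collector
  namespace: openshift-logging
---
# Allows the collector to read infrastructure logs (node journal and infra containers)
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: logging-collector-infrastructure-logs   # illustrative binding name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: collect-infrastructure-logs
subjects:
- kind: ServiceAccount
  name: logging-collector
  namespace: openshift-logging
```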
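Create the Loki AlertingRule.
**The label is required and must match the `selector` under `spec.rules` in the LokiStack.**

AlertingRule

This rule selects all logs in the LokiStack with `log_type=infrastructure`, keeps those containing `soft lockup`, parses them as JSON, and finally keeps only logs with `log_source=node`.
The alert fires if a matching log appears at least once within one day.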
```yaml=
apiVersion: loki.grafana.com/v1
kind: AlertingRule
metadata:
  labels:
    openshift.io/cluster-monitoring: 'true'
  name: loki-operator-alerts-01
  namespace: openshift-logging
spec:
  groups:
  - interval: 1m
    name: soft-lockup-alert
    rules:
    - alert: SoftLockupDetected
      annotations:
        description: |
          Watchdog BUG: soft lockup found in logs on node - {{ $labels.k8s_node_name }} .
          Full message - {{ $labels.message }}
        summary: Soft lockup detected in kernel logs
      expr: |
        count_over_time(
          { log_type=~"infrastructure" } |~ "soft lockup" | json | log_source="node"
          [1d]
        ) > 0
      for: 0s
      labels:
        severity: critical
  tenantID: infrastructure
```
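
Because the expression parses each line with `| json` and does not aggregate, every distinct set of extracted labels (including `message`) becomes its own series, and with a `[1d]` window each alert stays active for a full day after the last match. If one alert per node with a shorter window is preferred, a variant along the following lines may work. This is a sketch, not the configuration used above: the rule name, group name, and the 5-minute window are assumptions, and since the aggregation drops the `message` label, the description only references the node name.

```yaml=
apiVersion: loki.grafana.com/v1
kind: AlertingRule
metadata:
  labels:
    openshift.io/cluster-monitoring: 'true'   # must match the LokiStack rules selector
  name: loki-operator-alerts-02               # hypothetical name
  namespace: openshift-logging
spec:
  groups:
  - interval: 1m
    name: soft-lockup-alert-per-node          # hypothetical group name
    rules:
    - alert: SoftLockupDetectedPerNode        # hypothetical alert name
      annotations:
        description: |
          Watchdog BUG: soft lockup found in logs on node - {{ $labels.k8s_node_name }} .
        summary: Soft lockup detected in kernel logs
      # Sum the per-stream counts so that only one series per node remains
      expr: |
        sum by (k8s_node_name) (
          count_over_time(
            { log_type=~"infrastructure" } |~ "soft lockup" | json | log_source="node"
            [5m]
          )
        ) > 0
      for: 0s
      labels:
        severity: critical
  tenantID: infrastructure
```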
Once the rule is applied, it appears on the Alerting page of the OpenShift console.

To test it, we can manually generate matching log entries on a node:
```shell=
[lab-user@bastion ~]$ oc debug node/ip-10-0-53-13.ap-southeast-1.compute.internal
Temporary namespace openshift-debug-kq82v is created for debugging node...
Starting pod/ip-10-0-53-13ap-southeast-1computeinternal-debug-xkj22 ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.53.13
If you don't see a command prompt, try pressing enter.
sh-5.1# chroot /host
sh-5.1# logger -t kernel "watchdog: BUG: soft lockup - CPU#$((RANDOM % 32)) stuck for $((RANDOM % 500 + 100))s! [kube-rbac-proxy:$((RANDOM % 9000 + 1000))]"
sh-5.1# logger -t kernel "watchdog: BUG: soft lockup - CPU#$((RANDOM % 32)) stuck for $((RANDOM % 500 + 100))s! [kube-rbac-proxy:$((RANDOM % 9000 + 1000))]"
sh-5.1# logger -t kernel "watchdog: BUG: soft lockup - CPU#$((RANDOM % 32)) stuck for $((RANDOM % 500 + 100))s! [kube-rbac-proxy:$((RANDOM % 9000 + 1000))]"
sh-5.1# logger -t kernel "watchdog: BUG: soft lockup - CPU#$((RANDOM % 32)) stuck for $((RANDOM % 500 + 100))s! [kube-rbac-proxy:$((RANDOM % 9000 + 1000))]"
sh-5.1# logger -t kernel "watchdog: BUG: soft lockup - CPU#$((RANDOM % 32)) stuck for $((RANDOM % 500 + 100))s! [kube-rbac-proxy:$((RANDOM % 9000 + 1000))]"
sh-5.1# logger -t kernel "watchdog: BUG: soft lockup - CPU#$((RANDOM % 32)) stuck for $((RANDOM % 500 + 100))s! [kube-rbac-proxy:$((RANDOM % 9000 + 1000))]"
sh-5.1# logger -t kernel "watchdog: BUG: soft lockup - CPU#$((RANDOM % 32)) stuck for $((RANDOM % 500 + 100))s! [kube-rbac-proxy:$((RANDOM % 9000 + 1000))]"
sh-5.1# logger -t kernel "watchdog: BUG: soft lockup - CPU#$((RANDOM % 32)) stuck for $((RANDOM % 500 + 100))s! [kube-rbac-proxy:$((RANDOM % 9000 + 1000))]"
sh-5.1# logger -t kernel "watchdog: BUG: soft lockup - CPU#$((RANDOM % 32)) stuck for $((RANDOM % 500 + 100))s! [kube-rbac-proxy:$((RANDOM % 9000 + 1000))]"
sh-5.1# logger -t kernel "watchdog: BUG: soft lockup - CPU#$((RANDOM % 32)) stuck for $((RANDOM % 500 + 100))s! [kube-rbac-proxy:$((RANDOM % 9000 + 1000))]"
sh-5.1# logger -t kernel "watchdog: BUG: soft lockup - CPU#$((RANDOM % 32)) stuck for $((RANDOM % 500 + 100))s! [kube-rbac-proxy:$((RANDOM % 9000 + 1000))]"
sh-5.1# logger -t kernel "watchdog: BUG: soft lockup - CPU#$((RANDOM % 32)) stuck for $((RANDOM % 500 + 100))s! [kube-rbac-proxy:$((RANDOM % 9000 + 1000))]"
```
The alert fires successfully.

## References
- Custom logging alerts - https://docs.redhat.com/en/documentation/red_hat_openshift_logging/6.3/html/logging_alerts/custom-logging-alerts-1#configuring-logging-loki-ruler_custom-logging-
- Cluster Log forwarder example - https://github.com/openshift/cluster-logging-operator/blob/0bbb53dc1ebbfa9838339ea5667d3982fd3f2095/docs/reference/samples/observability.inputs-app-audit-infra.yaml#L4