# RKE logging: collecting only the k8s component logs

## Test background

* In a k8s cluster built with rke1, the k8s system components (kubelet, etcd, kube-scheduler, and so on) are all deployed as docker containers. On rancher v2.6+, upgrading rancher logging to v2 is recommended. rancher logging is based on the open-source logging operator; git repo: https://github.com/kube-logging/logging-operator , documentation: https://kube-logging.dev/docs/ .
* By default the logging operator only handles k8s pod logs and k8s events, and it does not cope well with logs from containers started on the host with `docker run`. To address this, rancher adds a rancher-logging-rke-aggregator DaemonSet on top of the logging operator to collect the container logs of the rke1 k8s system components. The corresponding chart configuration is at https://github.com/rancher/charts/tree/release-v2.8/charts/rancher-logging/103.1.2%2Bup4.4.0/templates/loggings/rke .
* Testing showed, however, that rancher-logging-rke-aggregator collects only the raw logs of the rke1 k8s system components, without labels or other metadata identifying the component. The collected records therefore cannot be told apart by component, and if you want to send the rke1 k8s logs to a dedicated topic, there is nothing to filter on when adding ClusterFlows or Flows.

## Solution

### Create a HostTailer

* To make the logs distinguishable, mount the raw log files of the rke1 k8s system components into a container and print them there, so that they become that container's standard output. For example, mount the kubelet container log into a container named kubelet-log; when the logging operator collects the logs of the kubelet-log container, it has in effect collected the kubelet logs.
* The logging operator supports a HostTailer feature. Once configured, it automatically creates a DaemonSet and mounts each preconfigured log file into its own container. See the configuration below, or https://kube-logging.dev/docs/configuration/extensions/kubernetes-host-tailer/ .

## Implementation

* To avoid sending logs twice, edit the rancher logging app in cluster tools and set rke.enabled to false. This disables the rancher-logging-rke-aggregator DaemonSet, so the rke1 system component logs are not shipped a second time.

![image](https://hackmd.io/_uploads/Skgrf5CpA.png)

Notes on the HostTailer manifest that follows:

1. `/var/pos` is the directory of the fluent-bit position db. By default it is mounted from the host so that the read offsets persist; otherwise the logs would be printed again after a pod restart and collected twice. Adjust the path to your environment.
2. `/var/lib/docker` is the docker root directory. Adjust it to your environment.
3. If any nodes are tainted, add tolerations under workloadOverrides.

```
apiVersion: logging-extensions.banzaicloud.io/v1alpha1
kind: HostTailer
metadata:
  name: rke1-k8s-system-component-logs
  namespace: cattle-logging-system
spec:
  fileTailers:
    - containerOverrides:
        volumeMounts:
          - mountPath: /var/lib/rancher
            name: var-lib-rancher-rke-log
          - mountPath: /var/lib/docker
            name: var-lib-docker-rke-log
          - mountPath: /var/pos
            name: positions
      name: etcd-log
      path: /var/lib/rancher/rke/log/etcd_*.log
    - containerOverrides:
        volumeMounts:
          - mountPath: /var/lib/rancher
            name: var-lib-rancher-rke-log
          - mountPath: /var/lib/docker
            name: var-lib-docker-rke-log
          - mountPath: /var/pos
            name: positions
      name: etcd-rolling-snapshots-log
      path: /var/lib/rancher/rke/log/etcd-rolling-snapshots*.log
    - containerOverrides:
        volumeMounts:
          - mountPath: /var/lib/rancher
            name: var-lib-rancher-rke-log
          - mountPath: /var/pos
            name: positions
          - mountPath: /var/lib/docker
            name: var-lib-docker-rke-log
      name: kube-apiserver-log
      path: /var/lib/rancher/rke/log/kube-apiserver*.log
    - containerOverrides:
        volumeMounts:
          - mountPath: /var/lib/rancher
            name: var-lib-rancher-rke-log
          - mountPath: /var/pos
            name: positions
          - mountPath: /var/lib/docker
            name: var-lib-docker-rke-log
      name: kube-controller-manager-log
      path: /var/lib/rancher/rke/log/kube-controller-manager*.log
    - containerOverrides:
        volumeMounts:
          - mountPath: /var/lib/rancher
            name: var-lib-rancher-rke-log
          - mountPath: /var/pos
            name: positions
          - mountPath: /var/lib/docker
            name: var-lib-docker-rke-log
      name: kubelet-log
      path: /var/lib/rancher/rke/log/kubelet*.log
    - containerOverrides:
        volumeMounts:
          - mountPath: /var/lib/rancher
            name: var-lib-rancher-rke-log
          - mountPath: /var/pos
            name: positions
          - mountPath: /var/lib/docker
            name: var-lib-docker-rke-log
      name: kube-proxy-log
      path: /var/lib/rancher/rke/log/kube-proxy_*.log
    - containerOverrides:
        volumeMounts:
          - mountPath: /var/lib/rancher
            name: var-lib-rancher-rke-log
          - mountPath: /var/pos
            name: positions
          - mountPath: /var/lib/docker
            name: var-lib-docker-rke-log
      name: kube-scheduler-log
      path: /var/lib/rancher/rke/log/kube-scheduler_*.log
  workloadOverrides:
    # tolerations:
    #   - effect: string
    #     key: string
    #     operator: string
    #     tolerationSeconds: int
    #     value: string
    volumes:
      - hostPath:
          path: /var/lib/rancher
        name: var-lib-rancher-rke-log
      - hostPath:
          path: /var/lib/docker
        name: var-lib-docker-rke-log
      - hostPath:
          path: /var/pos
        name: positions
```

* Once the resource is applied, a rke1-k8s-system-component-logs-host-tailer DaemonSet is created in the cattle-logging-system namespace.

```
$ kubectl -n cattle-logging-system get ds
NAME                                         DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE
rancher-logging-root-fluentbit               4         4         4       4            4           kubernetes.io/os=linux   46d
rke1-k8s-system-component-logs-host-tailer   4         4         4       4            4           <none>                   64s
```

![image](https://hackmd.io/_uploads/B1o8XqR6A.png)

## Create ClusterOutputs & ClusterFlows

* kafka is used for the test below.
* The example below collects only the kubelet log.

```
apiVersion: logging.banzaicloud.io/v1beta1
kind: ClusterOutput
metadata:
  name: kafka
  namespace: cattle-logging-system
spec:
  kafka:
    brokers: my-cluster-kafka-bootstrap.kafka:9092
    default_topic: andy2
    format:
      type: json
    buffer:
      chunk_limit_size: 100Ki
---
apiVersion: logging.banzaicloud.io/v1beta1
kind: ClusterFlow
metadata:
  name: "kafka"
  namespace: "cattle-logging-system"
spec:
  globalOutputRefs:
    - "kafka"
  match:
    - select:
        container_names:
          - kubelet-log
```

* The logs of several k8s components can also be collected at the same time.

![image](https://hackmd.io/_uploads/BJ7yu9A6A.png)

## Verification

* Check on the kafka side that the kubelet log is being received.

```
$ kubectl -n kafka run kafka-consumer -ti --image=quay.io/strimzi/kafka:0.41.0-kafka-3.7.0 --rm=true --restart=Never -- bin/kafka-console-consumer.sh --bootstrap-server my-cluster-kafka-bootstrap:9092 --topic andy2 --from-beginning
......
{"log":"I0923 07:18:05.725865 19282 scope.go:117] \"RemoveContainer\" containerID=\"d1e6f6a427ffe4cac72b07d0c80178a07f590a5598b7bb0dc128eb3d4ecee8c9\"","stream":"stderr","time":"2024-09-23T07:18:05.726532698Z","kubernetes":{"pod_name":"rke1-k8s-system-component-logs-host-tailer-wjhfh","namespace_name":"cattle-logging-system","pod_id":"01537b87-baba-4837-bc95-9416697d0eae","labels":{"app.kubernetes.io/instance":"rke1-k8s-system-component-logs-host-tailer","app.kubernetes.io/name":"host-tailer","controller-revision-hash":"65469789fb","pod-template-generation":"1"},"annotations":{"cni.projectcalico.org/containerID":"c38f3ef2d7bf42608fccdc2fe939e500f74828e50d9490d8fd080a970af9a3f0","cni.projectcalico.org/podIP":"10.42.3.133/32","cni.projectcalico.org/podIPs":"10.42.3.133/32"},"host":"rke-m3","container_name":"kubelet-log","docker_id":"e39b99207dc3acda4f6518ee2b451f78c895c525b8a21f3e56f5f4e673ea4c28","container_hash":"fluent/fluent-bit@sha256:1c8bdb90eb65902a65b7cd32126a621690dc36128fef78951775afbe37dfa01f","container_image":"fluent/fluent-bit:2.1.8"}}
{"log":"E0923 07:19:48.799249 5048 remote_runtime.go:432] \"ContainerStatus from runtime service failed\" err=\"rpc error: code = Canceled desc = context canceled\" containerID=\"b6e2f943e0fc74e3cdb4981399a6ffefa47325b190e6cebb38e3311a060ce061\"","stream":"stderr","time":"2024-09-23T07:19:48.799827691Z","kubernetes":{"pod_name":"rke1-k8s-system-component-logs-host-tailer-4dzpk","namespace_name":"cattle-logging-system","pod_id":"a245fd3b-f153-496b-a5fe-3de5efcb6639","labels":{"app.kubernetes.io/instance":"rke1-k8s-system-component-logs-host-tailer","app.kubernetes.io/name":"host-tailer","controller-revision-hash":"65469789fb","pod-template-generation":"1"},"annotations":{"cni.projectcalico.org/containerID":"3d810f20cbabff6fc177839081db340827ec94597deebc78a9276e0a810956d9","cni.projectcalico.org/podIP":"10.42.0.71/32","cni.projectcalico.org/podIPs":"10.42.0.71/32"},"host":"rke-m1","container_name":"kubelet-log","docker_id":"673d782aebf6a9a7aee2ce3d32ab183784f35b75ec2dbd94bb6fb2393f92fb43","container_image":"fluent/fluent-bit:2.1.8"}
```

## References

https://www.xtplayer.cn/rancher/rancher-logging-v2-collect-rke1-k8s-logs/#%E5%88%9B%E5%BB%BA-ClusterFlows-%E6%88%96%E8%80%85-Flows
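
## Appendix: one ClusterFlow for several components

The ClusterFlow in this note selects a single container name, and the multi-component variant is only shown as a screenshot. As a rough sketch of what that variant might look like, the selector below lists several of the container names defined by the HostTailer (`kubelet-log`, `etcd-log`, `kube-apiserver-log`) and reuses the `kafka` ClusterOutput. The resource name `rke1-components` is hypothetical, and the manifest has not been tested against a cluster here; it assumes that a record whose container name matches any entry in the list is selected.

```yaml
apiVersion: logging.banzaicloud.io/v1beta1
kind: ClusterFlow
metadata:
  # Hypothetical name, chosen only for this sketch.
  name: rke1-components
  namespace: cattle-logging-system
spec:
  globalOutputRefs:
    # The ClusterOutput defined earlier in this note.
    - kafka
  match:
    - select:
        # Container names come from the HostTailer fileTailers entries.
        container_names:
          - kubelet-log
          - etcd-log
          - kube-apiserver-log
```

If each component should go to its own topic instead, one option is to keep one ClusterFlow per container name, each pointing at a separate ClusterOutput with a different `default_topic`.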