---
# System prepended metadata

title: Implementing a Prometheus Agent to Collect K8s Component Metrics
tags: [tools]

---

# Implementing a Prometheus Agent to Collect K8s Component Metrics

## Background


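* By default (for example, in clusters built with kubeadm), the metrics ports of etcd, controller-manager, scheduler, and kube-proxy (etcd on 2381, controller-manager on 10257, scheduler on 10259, kube-proxy on 10249) listen only on 127.0.0.1, so a Prometheus server running in a Pod elsewhere in the cluster cannot scrape them directly.
* One workaround is to deploy Prometheus Agent as a DaemonSet: an agent on each node collects these metrics locally and pushes them to the Prometheus server via Remote Write.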


![image](https://hackmd.io/_uploads/HJGUub2v-l.png)
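To check this on a control-plane node, you can list the listening TCP sockets and filter for these ports. A minimal sketch, assuming iproute2's `ss` (with `-ltnH`, the fourth column is the local `address:port`) and the kubeadm default ports above; `loopback_only_ports` is a hypothetical helper name, not part of any tool.

```shell
# Hypothetical helper: read `ss -ltnH` output on stdin and print the component
# metrics ports (kubeadm defaults assumed) that listen ONLY on a loopback address.
loopback_only_ports() {
  awk '
    BEGIN {
      # Start with empty arrays: ports seen on loopback vs. on any other address
      split("", on_lo); split("", on_other)
      split("2381 10257 10259 10249", wanted_list, " ")
      for (i in wanted_list) wanted[wanted_list[i]] = 1
    }
    {
      addr = $4                          # "Local Address:Port" column
      port = addr; sub(/.*:/, "", port)  # text after the last colon
      host = addr; sub(/:[^:]*$/, "", host)
      if (!(port in wanted)) next
      if (host == "127.0.0.1" || host == "[::1]") on_lo[port] = 1
      else on_other[port] = 1
    }
    END {
      for (p in on_lo) if (!(p in on_other)) print p
    }
  ' | sort -n
}

# Usage on a control-plane node:
#   ss -ltnH | loopback_only_ports
```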

## Deploy the monitoring stack


```
$ helm repo add prometheus-community https://prometheus-community.github.io/helm-charts

$ helm repo update
```

* Check the available chart versions
```
$ helm search repo prometheus-community/kube-prometheus-stack --versions
```

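* Write the Helm values file (saved here as `monitoring.yaml`)
  - Because the agents push metrics via Remote Write, the Prometheus server must have `enableRemoteWriteReceiver` turned on.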
```
$ nano monitoring.yaml
prometheus:
  service:
    type: NodePort
    nodePort: 30090
  prometheusSpec:
    enableRemoteWriteReceiver: true
grafana:
  adminPassword: "admin123"
  service:
    type: NodePort
    nodePort: 30080
kubelet:
  enabled: true
  serviceMonitor:
    cAdvisorMetricRelabelings: []
# node-exporter is still deployed, but the agents scrape it instead of a ServiceMonitor
prometheus-node-exporter:
  prometheus:
    monitor:
      enabled: false
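# The components below stay enabled so that Grafana still installs their default
# dashboards, but no ServiceMonitor is created for them: the agents scrape them instead.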
kubeEtcd:
  enabled: true
  serviceMonitor:
    enabled: false
kubeControllerManager:
  enabled: true
  serviceMonitor:
    enabled: false
kubeScheduler:
  enabled: true
  serviceMonitor:
    enabled: false
kubeProxy:
  enabled: true
  serviceMonitor:
    enabled: false
```
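Once the chart is installed, one way to confirm the receiver is active is to read Prometheus' runtime flags from its `/api/v1/status/flags` HTTP endpoint. This sketch assumes the operator translates `enableRemoteWriteReceiver: true` into the `--web.enable-remote-write-receiver` flag (the case for recent Prometheus versions) and that the server is reachable on the NodePort `30090` set above; `remote_write_receiver_enabled` is a hypothetical helper name and `<node-ip>` a placeholder.

```shell
# Hypothetical helper: read the JSON returned by /api/v1/status/flags on stdin
# and succeed (exit 0) only if the remote-write receiver flag is "true".
remote_write_receiver_enabled() {
  grep -q '"web.enable-remote-write-receiver": *"true"'
}

# Usage (<node-ip> is any node address; 30090 is the NodePort from the values file):
#   curl -fsS http://<node-ip>:30090/api/v1/status/flags \
#     | remote_write_receiver_enabled && echo "remote write receiver enabled"
```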

* Install chart version `80.14.4`
```
$ helm install monitoring prometheus-community/kube-prometheus-stack --version=80.14.4 --namespace monitoring-system --create-namespace -f monitoring.yaml
```
* Check the deployment status
```
$ kubectl -n monitoring-system get pod -owide
NAME                                                     READY   STATUS    RESTARTS   AGE     IP              NODE   NOMINATED NODE   READINESS GATES
alertmanager-monitoring-kube-prometheus-alertmanager-0   2/2     Running   0          6m9s    10.244.190.70   w1     <none>           <none>
monitoring-grafana-6754c8b79d-zlfgd                      3/3     Running   0          6m25s   10.244.190.67   w1     <none>           <none>
monitoring-kube-prometheus-operator-7cc4577cbb-rtm2w     1/1     Running   0          6m25s   10.244.190.66   w1     <none>           <none>
monitoring-kube-state-metrics-6f79bc78d5-zq66d           1/1     Running   0          6m25s   10.244.190.68   w1     <none>           <none>
monitoring-prometheus-node-exporter-5l7fx                1/1     Running   0          6m25s   192.168.1.55    m1     <none>           <none>
monitoring-prometheus-node-exporter-rw4ls                1/1     Running   0          6m25s   192.168.1.56    m2     <none>           <none>
monitoring-prometheus-node-exporter-v92j9                1/1     Running   0          6m25s   192.168.1.58    w1     <none>           <none>
monitoring-prometheus-node-exporter-vzm5r                1/1     Running   0          6m25s   192.168.1.57    m3     <none>           <none>
prometheus-monitoring-kube-prometheus-prometheus-0       2/2     Running   0          6m9s    10.244.190.71   w1     <none>           <none>
```
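Rather than scanning the READY and STATUS columns by eye, you can filter them. A minimal sketch, assuming the default column order of `kubectl get pod` (NAME, READY, STATUS, ...), which `-owide` keeps; `not_ready_pods` is a hypothetical helper name.

```shell
# Hypothetical helper: read `kubectl get pod --no-headers` output on stdin and
# print every pod whose READY count is incomplete (e.g. 1/2) or whose STATUS
# is not Running.
not_ready_pods() {
  awk '{
    split($2, ready, "/")   # READY column, e.g. "2/2" -> ready[1]=2, ready[2]=2
    if (ready[1] != ready[2] || $3 != "Running") print $1
  }'
}

# Usage:
#   kubectl -n monitoring-system get pod --no-headers | not_ready_pods
```

Empty output means every pod is Running with all containers ready. Pods left behind by completed Jobs would also be printed, since their STATUS is `Completed`.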

## Deploy the Prometheus Agent
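* The Prometheus Agent is deployed as two DaemonSets:
  - prometheus-agent-cp: collects metrics from the core components on the control-plane (master) nodes (etcd, kube-controller-manager, kube-scheduler).
  - prometheus-agent-node: collects node exporter and kube-proxy metrics on every node.
* Because both DaemonSets use `hostNetwork: true`, each agent reaches the components through 127.0.0.1 on its own node, then pushes the samples to the Prometheus server itself via Remote Write.
* The `relabel_configs` rewrite the `instance` label from `127.0.0.1:<port>` to `<node IP>:<port>`, so series scraped on different nodes remain distinguishable.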

```
$ nano prom-agent.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-agent-cp-config
  namespace: monitoring-system
data:
  prometheus.yml.tpl: |
    global:
      scrape_interval: 15s
      external_labels:
        node_name: ${NODE_NAME}
        node_ip: ${NODE_IP}

    scrape_configs:
      - job_name: 'etcd'
        static_configs:
          - targets: ['127.0.0.1:2381']
        relabel_configs:
          - action: replace
            replacement: "${NODE_IP}:2381"
            target_label: instance

      - job_name: 'kube-controller-manager'
        scheme: https
        tls_config:
          insecure_skip_verify: true
        authorization:
          credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        static_configs:
          - targets: ['127.0.0.1:10257']
        relabel_configs:
          - action: replace
            replacement: "${NODE_IP}:10257"
            target_label: instance

      - job_name: 'kube-scheduler'
        scheme: https
        tls_config:
          insecure_skip_verify: true
        authorization:
          credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        static_configs:
          - targets: ['127.0.0.1:10259']
        relabel_configs:
          - action: replace
            replacement: "${NODE_IP}:10259"
            target_label: instance

    remote_write:
      - url: "http://monitoring-kube-prometheus-prometheus.monitoring-system.svc:9090/api/v1/write"
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: prometheus-agent-cp
  namespace: monitoring-system
  labels:
    app: prometheus-agent-cp
spec:
  selector:
    matchLabels:
      app: prometheus-agent-cp
  template:
    metadata:
      labels:
        app: prometheus-agent-cp
    spec:
      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet
      serviceAccountName: monitoring-kube-prometheus-prometheus
      nodeSelector:
        node-role.kubernetes.io/control-plane: ""
      tolerations:
        - key: node-role.kubernetes.io/master
          operator: Exists
          effect: NoSchedule
        - key: node-role.kubernetes.io/control-plane
          operator: Exists
          effect: NoSchedule

      initContainers:
        - name: init-config
          image: busybox:latest
          command:
            - sh
            - -c
            - |
              # Whichever node this Pod is scheduled on, sed fills in that node's own NODE_IP
              # and NODE_NAME, so the labels the agent attaches to scraped samples are correct.
              sed -e "s/\${NODE_IP}/$NODE_IP/g" -e "s/\${NODE_NAME}/$NODE_NAME/g" /etc/prometheus-tpl/prometheus.yml.tpl > /etc/prometheus/prometheus.yml
          env:
            - name: NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            - name: NODE_IP
              valueFrom:
                fieldRef:
                  fieldPath: status.podIP
          volumeMounts:
            - name: config-volume
              mountPath: /etc/prometheus-tpl
            - name: config-shared
              mountPath: /etc/prometheus

      containers:
        - name: prometheus-agent
          image: quay.io/prometheus/prometheus:v3.9.1
          args:
            - "--config.file=/etc/prometheus/prometheus.yml"
            - "--storage.agent.path=/prometheus"
            - "--agent"
            - "--web.listen-address=:9091"
            - "--log.level=info"
          volumeMounts:
            - name: config-shared
              mountPath: /etc/prometheus
          resources:
            requests:
              cpu: 50m
              memory: 128Mi
      volumes:
        - name: config-volume
          configMap:
            name: prometheus-agent-cp-config
        - name: config-shared
          emptyDir: {}
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-agent-node-config
  namespace: monitoring-system
data:
  prometheus.yml.tpl: |
    global:
      scrape_interval: 15s
      external_labels:
        node_name: ${NODE_NAME}
        node_ip: ${NODE_IP}

    scrape_configs:
      # 1. 抓取 Node Exporter
      - job_name: 'node-exporter'
        static_configs:
          - targets: ['127.0.0.1:9100']
        relabel_configs:
          - action: replace
            replacement: "${NODE_IP}:9100"
            target_label: instance

      # 2. 抓取 Kube Proxy
      - job_name: 'kube-proxy'
        static_configs:
          - targets: ['127.0.0.1:10249']
        relabel_configs:
          - action: replace
            replacement: "${NODE_IP}:10249"
            target_label: instance

    remote_write:
      - url: "http://monitoring-kube-prometheus-prometheus.monitoring-system.svc:9090/api/v1/write"

---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: prometheus-agent-node
  namespace: monitoring-system
  labels:
    app: prometheus-agent-node
spec:
  selector:
    matchLabels:
      app: prometheus-agent-node
  template:
    metadata:
      labels:
        app: prometheus-agent-node
    spec:
      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet
      serviceAccountName: monitoring-kube-prometheus-prometheus
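      tolerations:
        # An entry with no key and operator Exists tolerates every taint, so the
        # control-plane entries below are redundant but harmless.
        - operator: Exists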
        - key: node-role.kubernetes.io/master
          operator: Exists
          effect: NoSchedule
        - key: node-role.kubernetes.io/control-plane
          operator: Exists
          effect: NoSchedule

      initContainers:
        - name: init-config
          image: busybox:latest
          command:
            - sh
            - -c
            - |
              sed -e "s/\${NODE_IP}/$NODE_IP/g" -e "s/\${NODE_NAME}/$NODE_NAME/g" /etc/prometheus-tpl/prometheus.yml.tpl > /etc/prometheus/prometheus.yml
          env:
            - name: NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            - name: NODE_IP
              valueFrom:
                fieldRef:
                  fieldPath: status.podIP
          volumeMounts:
            - name: config-volume
              mountPath: /etc/prometheus-tpl
            - name: config-shared
              mountPath: /etc/prometheus

      containers:
        - name: prometheus-agent
          image: quay.io/prometheus/prometheus:v3.9.1
          args:
            - "--config.file=/etc/prometheus/prometheus.yml"
            - "--storage.agent.path=/prometheus"
            - "--agent"
            - "--web.listen-address=:9092"
            - "--log.level=info"
          volumeMounts:
            - name: config-shared
              mountPath: /etc/prometheus
          resources:
            requests:
              cpu: 50m
              memory: 128Mi
      volumes:
        - name: config-volume
          configMap:
            name: prometheus-agent-node-config
        - name: config-shared
          emptyDir: {}
```
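The init containers only perform a text substitution: they replace the `${NODE_IP}` and `${NODE_NAME}` placeholders in the ConfigMap template with values from the Downward API (`status.podIP`, which equals the node IP under `hostNetwork: true`, and `spec.nodeName`). To preview what a given node's agent will load, you can run the same `sed` expressions locally. A sketch; `render_agent_config` is a hypothetical helper, and the node name and IP in the usage line are taken from the pod listing above.

```shell
# Hypothetical helper mirroring the init container's sed command:
# read prometheus.yml.tpl on stdin and write the rendered config to stdout.
#   $1 = node name, $2 = node IP
# Assumes neither value contains "/" or "&", which would break the sed expression.
render_agent_config() {
  sed -e "s/\${NODE_IP}/$2/g" -e "s/\${NODE_NAME}/$1/g"
}

# Usage: preview the control-plane template as rendered on node m1 (192.168.1.55):
#   kubectl -n monitoring-system get configmap prometheus-agent-cp-config \
#     -o jsonpath='{.data.prometheus\.yml\.tpl}' | render_agent_config m1 192.168.1.55
```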


* Deploy the Prometheus Agent
```
$ kubectl apply -f prom-agent.yaml
```
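Because both DaemonSets use `hostNetwork: true`, each agent's web server listens directly on its node's IP (`:9091` for `prometheus-agent-cp`, `:9092` for `prometheus-agent-node`, per the `--web.listen-address` flags). A minimal readiness check, assuming the agents' `/-/ready` endpoint is reachable from where you run it (no firewall in between); `check_agents` is a hypothetical helper name.

```shell
# Hypothetical helper: probe each agent's /-/ready endpoint.
# Arguments are "<node-ip>:<port>" pairs; prints one status line per target.
check_agents() {
  for target in "$@"; do
    if curl -fsS -o /dev/null --max-time 5 "http://$target/-/ready"; then
      echo "$target ready"
    else
      echo "$target NOT ready"
    fi
  done
}

# Usage with the node IPs from the pod listing, assuming m1, m2, m3 are the
# control-plane nodes (cp agents on :9091) and every node runs a node agent (:9092):
#   check_agents 192.168.1.55:9091 192.168.1.56:9091 192.168.1.57:9091 \
#                192.168.1.55:9092 192.168.1.56:9092 192.168.1.57:9092 192.168.1.58:9092
```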


## Access the Prometheus UI

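* Open the Prometheus UI through the NodePort (`http://<node-ip>:30090`).
* Query `up{job="etcd"}`, `up{job="kube-proxy"}`, `up{job="kube-controller-manager"}`, `up{job="kube-scheduler"}`, and `up{job="node-exporter"}`; each returns series, confirming that the metrics are being collected.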


![image](https://hackmd.io/_uploads/H1Tj8Z2wZe.png)
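The same check can be scripted against the Prometheus HTTP API (`/api/v1/query`) instead of the UI. A sketch assuming the NodePort `30090` from the values file; `count_up_by_job` is a hypothetical helper name. The query counts healthy targets per job; if m1, m2 and m3 are the control-plane nodes, as the pod listing suggests, you would expect 3 for `etcd`, `kube-controller-manager` and `kube-scheduler`, and 4 for `node-exporter` and `kube-proxy`.

```shell
# Hypothetical helper: count the targets that are up, grouped by job.
#   $1 = Prometheus base URL, e.g. http://<node-ip>:30090
# curl -G sends the URL-encoded query as a GET query string.
count_up_by_job() {
  curl -fsS -G "$1/api/v1/query" \
    --data-urlencode 'query=count by (job) (up{job=~"etcd|kube-controller-manager|kube-scheduler|kube-proxy|node-exporter"} == 1)'
}

# Usage:
#   count_up_by_job http://<node-ip>:30090
```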

## Access the Grafana UI
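* Open the Grafana UI through the NodePort (`http://<node-ip>:30080`) and log in as `admin` / `admin123` (the `adminPassword` set in the values file).
* Go to Dashboards -> etcd and check that the panels show data.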

![image](https://hackmd.io/_uploads/HJRbPWhDZx.png)
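To confirm the dashboards exist without clicking through the UI, Grafana's HTTP search API (`/api/search`) can be queried with the same admin credentials. A sketch assuming basic auth is enabled (the Grafana default) and the NodePort `30080`; `search_dashboards` is a hypothetical helper name.

```shell
# Hypothetical helper: list Grafana dashboards whose title matches a search term.
#   $1 = Grafana base URL, e.g. http://<node-ip>:30080
#   $2 = search term, e.g. etcd
# Uses the admin credentials from the values file via HTTP basic auth.
search_dashboards() {
  curl -fsS -u admin:admin123 -G "$1/api/search" --data-urlencode "query=$2"
}

# Usage:
#   search_dashboards http://<node-ip>:30080 etcd
```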











## References
https://prometheus.io/blog/2021/11/16/agent/