# Deploying Prometheus Agent to Collect K8s Component Metrics
## Background
* By default, the metrics ports of etcd, controller-manager, scheduler, and kube-proxy (2381 for etcd, 10257 for controller-manager, 10259 for scheduler, 10249 for kube-proxy) usually listen only on 127.0.0.1.
* Deploying Prometheus Agent as a DaemonSet lets an agent on each node scrape these metrics locally and write them to Prometheus through Remote Write.
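
To confirm which address each component listens on, one option is to filter the output of `ss -lnt` on a control-plane node. The sketch below defines a hypothetical helper, `loopback_only`, which is not part of any tool used in this article. It assumes the usual `ss -lnt` layout, where the fourth column is the local `address:port`.

```shell
#!/bin/sh
# Hypothetical helper: read `ss -lnt` output on stdin and print the component
# ports from this article (2381, 10257, 10259, 10249) that are bound to 127.0.0.1.
loopback_only() {
  awk '
    $1 == "LISTEN" {
      # Split "address:port" on ":" and take the last field as the port.
      n = split($4, parts, ":")
      port = parts[n]
      addr = substr($4, 1, length($4) - length(port) - 1)
      if (addr == "127.0.0.1" && port ~ /^(2381|10257|10259|10249)$/)
        print port
    }
  ' | sort -n
}

# Usage on a node (assumes iproute2 is installed):
#   ss -lnt | loopback_only
```

Any port printed by the helper is reachable only from the node itself, which is why the scraper has to run on that node with host networking.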

## Deploying monitoring
```
$ helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
$ helm repo update
```
* Check the available chart versions
```
$ helm search repo prometheus-community/kube-prometheus-stack --versions
```
* Write the values file (`monitoring.yaml`)
  - Because the agents push data through Remote Write, the `enableRemoteWriteReceiver` option must be enabled.
```
$ nano monitoring.yaml
prometheus:
  service:
    type: NodePort
    nodePort: 30090
  prometheusSpec:
    enableRemoteWriteReceiver: true
grafana:
  adminPassword: "admin123"
  service:
    type: NodePort
    nodePort: 30080
kubelet:
  enabled: true
  serviceMonitor:
    cAdvisorMetricRelabelings: []
prometheus-node-exporter:
  prometheus:
    monitor:
      enabled: false
# The settings below keep the default Grafana dashboards enabled, but skip creating the ServiceMonitors that would scrape these components
kubeEtcd:
  enabled: true
  serviceMonitor:
    enabled: false
kubeControllerManager:
  enabled: true
  serviceMonitor:
    enabled: false
kubeScheduler:
  enabled: true
  serviceMonitor:
    enabled: false
kubeProxy:
  enabled: true
  serviceMonitor:
    enabled: false
```
* Deploy version `80.14.4`
```
$ helm install monitoring prometheus-community/kube-prometheus-stack --version=80.14.4 --namespace monitoring-system --create-namespace -f monitoring.yaml
```
* Check the deployment status
```
$ kubectl -n monitoring-system get pod -owide
NAME                                                     READY   STATUS    RESTARTS   AGE     IP              NODE   NOMINATED NODE   READINESS GATES
alertmanager-monitoring-kube-prometheus-alertmanager-0   2/2     Running   0          6m9s    10.244.190.70   w1     <none>           <none>
monitoring-grafana-6754c8b79d-zlfgd                      3/3     Running   0          6m25s   10.244.190.67   w1     <none>           <none>
monitoring-kube-prometheus-operator-7cc4577cbb-rtm2w     1/1     Running   0          6m25s   10.244.190.66   w1     <none>           <none>
monitoring-kube-state-metrics-6f79bc78d5-zq66d           1/1     Running   0          6m25s   10.244.190.68   w1     <none>           <none>
monitoring-prometheus-node-exporter-5l7fx                1/1     Running   0          6m25s   192.168.1.55    m1     <none>           <none>
monitoring-prometheus-node-exporter-rw4ls                1/1     Running   0          6m25s   192.168.1.56    m2     <none>           <none>
monitoring-prometheus-node-exporter-v92j9                1/1     Running   0          6m25s   192.168.1.58    w1     <none>           <none>
monitoring-prometheus-node-exporter-vzm5r                1/1     Running   0          6m25s   192.168.1.57    m3     <none>           <none>
prometheus-monitoring-kube-prometheus-prometheus-0       2/2     Running   0          6m9s    10.244.190.71   w1     <none>           <none>
```
## Deploying Prometheus Agent
* Prometheus Agent is deployed as two DaemonSets:
  - prometheus-agent-cp: collects metrics from the core components on the master nodes (etcd, kube-controller-manager, kube-scheduler).
  - prometheus-agent-node: collects node exporter and kube-proxy metrics on every node.
* On each node, Prometheus Agent scrapes the metrics through 127.0.0.1 and then pushes them to the Prometheus server itself.
```
$ nano prom-agent.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-agent-cp-config
  namespace: monitoring-system
data:
  prometheus.yml.tpl: |
    global:
      scrape_interval: 15s
      external_labels:
        node_name: ${NODE_NAME}
        node_ip: ${NODE_IP}
    scrape_configs:
      - job_name: 'etcd'
        static_configs:
          - targets: ['127.0.0.1:2381']
        relabel_configs:
          - action: replace
            replacement: "${NODE_IP}:2381"
            target_label: instance
      - job_name: 'kube-controller-manager'
        scheme: https
        tls_config:
          insecure_skip_verify: true
        authorization:
          credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        static_configs:
          - targets: ['127.0.0.1:10257']
        relabel_configs:
          - action: replace
            replacement: "${NODE_IP}:10257"
            target_label: instance
      - job_name: 'kube-scheduler'
        scheme: https
        tls_config:
          insecure_skip_verify: true
        authorization:
          credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        static_configs:
          - targets: ['127.0.0.1:10259']
        relabel_configs:
          - action: replace
            replacement: "${NODE_IP}:10259"
            target_label: instance
    remote_write:
      - url: "http://monitoring-kube-prometheus-prometheus.monitoring-system.svc:9090/api/v1/write"
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: prometheus-agent-cp
  namespace: monitoring-system
  labels:
    app: prometheus-agent-cp
spec:
  selector:
    matchLabels:
      app: prometheus-agent-cp
  template:
    metadata:
      labels:
        app: prometheus-agent-cp
    spec:
      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet
      serviceAccountName: monitoring-kube-prometheus-prometheus
      nodeSelector:
        node-role.kubernetes.io/control-plane: ""
      tolerations:
        - key: node-role.kubernetes.io/master
          operator: Exists
          effect: NoSchedule
        - key: node-role.kubernetes.io/control-plane
          operator: Exists
          effect: NoSchedule
      initContainers:
        - name: init-config
          image: busybox:latest
          command:
            - sh
            - -c
            - |
              # Whichever node this DaemonSet pod lands on, the init container uses sed to write that node's own values into the config file, so the labels Prometheus Agent attaches to scraped data are correct
              sed -e "s/\${NODE_IP}/$NODE_IP/g" -e "s/\${NODE_NAME}/$NODE_NAME/g" /etc/prometheus-tpl/prometheus.yml.tpl > /etc/prometheus/prometheus.yml
          env:
            - name: NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            - name: NODE_IP
              valueFrom:
                fieldRef:
                  # With hostNetwork: true, the pod IP is the node IP
                  fieldPath: status.podIP
          volumeMounts:
            - name: config-volume
              mountPath: /etc/prometheus-tpl
            - name: config-shared
              mountPath: /etc/prometheus
      containers:
        - name: prometheus-agent
          image: quay.io/prometheus/prometheus:v3.9.1
          args:
            - "--config.file=/etc/prometheus/prometheus.yml"
            - "--storage.agent.path=/prometheus"
            - "--agent"
            - "--web.listen-address=:9091"
            - "--log.level=info"
          volumeMounts:
            - name: config-shared
              mountPath: /etc/prometheus
          resources:
            requests:
              cpu: 50m
              memory: 128Mi
      volumes:
        - name: config-volume
          configMap:
            name: prometheus-agent-cp-config
        - name: config-shared
          emptyDir: {}
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-agent-node-config
  namespace: monitoring-system
data:
  prometheus.yml.tpl: |
    global:
      scrape_interval: 15s
      external_labels:
        node_name: ${NODE_NAME}
        node_ip: ${NODE_IP}
    scrape_configs:
      # 1. Scrape Node Exporter
      - job_name: 'node-exporter'
        static_configs:
          - targets: ['127.0.0.1:9100']
        relabel_configs:
          - action: replace
            replacement: "${NODE_IP}:9100"
            target_label: instance
      # 2. Scrape Kube Proxy
      - job_name: 'kube-proxy'
        static_configs:
          - targets: ['127.0.0.1:10249']
        relabel_configs:
          - action: replace
            replacement: "${NODE_IP}:10249"
            target_label: instance
    remote_write:
      - url: "http://monitoring-kube-prometheus-prometheus.monitoring-system.svc:9090/api/v1/write"
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: prometheus-agent-node
  namespace: monitoring-system
  labels:
    app: prometheus-agent-node
spec:
  selector:
    matchLabels:
      app: prometheus-agent-node
  template:
    metadata:
      labels:
        app: prometheus-agent-node
    spec:
      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet
      serviceAccountName: monitoring-kube-prometheus-prometheus
      tolerations:
        - operator: Exists
        - key: node-role.kubernetes.io/master
          operator: Exists
          effect: NoSchedule
        - key: node-role.kubernetes.io/control-plane
          operator: Exists
          effect: NoSchedule
      initContainers:
        - name: init-config
          image: busybox:latest
          command:
            - sh
            - -c
            - |
              sed -e "s/\${NODE_IP}/$NODE_IP/g" -e "s/\${NODE_NAME}/$NODE_NAME/g" /etc/prometheus-tpl/prometheus.yml.tpl > /etc/prometheus/prometheus.yml
          env:
            - name: NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            - name: NODE_IP
              valueFrom:
                fieldRef:
                  fieldPath: status.podIP
          volumeMounts:
            - name: config-volume
              mountPath: /etc/prometheus-tpl
            - name: config-shared
              mountPath: /etc/prometheus
      containers:
        - name: prometheus-agent
          image: quay.io/prometheus/prometheus:v3.9.1
          args:
            - "--config.file=/etc/prometheus/prometheus.yml"
            - "--storage.agent.path=/prometheus"
            - "--agent"
            - "--web.listen-address=:9092"
            - "--log.level=info"
          volumeMounts:
            - name: config-shared
              mountPath: /etc/prometheus
          resources:
            requests:
              cpu: 50m
              memory: 128Mi
      volumes:
        - name: config-volume
          configMap:
            name: prometheus-agent-node-config
        - name: config-shared
          emptyDir: {}
```
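
The substitution step done by the init container can be tried locally before deploying. The sketch below reuses the same `sed` expression on a small template fragment. `render_template` is a hypothetical helper name, and the node name `m1` and IP `192.168.1.55` are sample values taken from the pod listing above.

```shell
#!/bin/sh
# Hypothetical helper: render a template read from stdin by replacing the
# literal placeholders ${NODE_IP} and ${NODE_NAME}, as the init container does.
render_template() {
  sed -e "s/\${NODE_IP}/$NODE_IP/g" -e "s/\${NODE_NAME}/$NODE_NAME/g"
}

# Sample values; in the DaemonSet these come from the pod's env (fieldRef).
NODE_NAME=m1
NODE_IP=192.168.1.55

# The quoted heredoc keeps the placeholders literal until sed replaces them.
render_template <<'EOF'
external_labels:
  node_name: ${NODE_NAME}
  node_ip: ${NODE_IP}
replacement: "${NODE_IP}:2381"
EOF
```

Running it prints the fragment with `m1` and `192.168.1.55` filled in, which is the shape of the file the agent container then reads from `/etc/prometheus/prometheus.yml`.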
* Deploy Prometheus Agent
```
$ kubectl apply -f prom-agent.yaml
```
## Opening the Prometheus UI
* Connect to the Prometheus UI through the NodePort
* Query `up{job="etcd"}`, `up{job="kube-proxy"}`, `up{job="kube-controller-manager"}`, and `up{job="kube-scheduler"}` to confirm that the metrics are being collected
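
The same check can also be scripted against the Prometheus HTTP API instead of the UI. This is a minimal sketch: `up_query` and `check_job` are hypothetical helper names, and the default `PROM_URL` assumes the node IP `192.168.1.55` from the pod listing together with NodePort `30090` from `monitoring.yaml`.

```shell
#!/bin/sh
# Assumption: Prometheus is reachable on a node IP at the NodePort set in monitoring.yaml.
PROM_URL="${PROM_URL:-http://192.168.1.55:30090}"

# Build the PromQL expression for one job.
up_query() {
  printf 'up{job="%s"}' "$1"
}

# Send an instant query to /api/v1/query; curl URL-encodes the expression.
check_job() {
  curl -sG "$PROM_URL/api/v1/query" --data-urlencode "query=$(up_query "$1")"
}

# Usage (needs network access to the cluster):
#   for job in etcd kube-controller-manager kube-scheduler kube-proxy; do
#     check_job "$job"; echo
#   done
```

In the JSON response, a sample value of `1` for a series means the corresponding target was scraped successfully.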

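## Opening the Grafana UI
* Connect to the Grafana UI through the NodePort; the credentials are `admin/admin123`, as set in `monitoring.yaml`
* Open Dashboards -> etcd and confirm that the panels show the collected data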

## References
https://prometheus.io/blog/2021/11/16/agent/