
Grafana Agent: Checking Whether Grafana Agent Picked Up the PodMonitor and ServiceMonitor and Actually Scrapes Metrics

This guide applies to Prometheus running in "Prometheus Agent" mode as well as to the Grafana Agent managed by the Grafana Agent Operator.

Imagine that you have installed the Grafana Agent Operator in the grafana namespace with Helm:

helm upgrade --install -n grafana --create-namespace grafana-agent-operator grafana/grafana-agent-operator
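If you haven't configured the grafana chart repository yet, add it first:

helm repo add grafana https://grafana.github.io/helm-charts
helm repo update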

You have created a PodMonitor or a ServiceMonitor but nothing is showing in Grafana.
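As a quick sanity check, first confirm that the objects actually exist in the cluster:

kubectl get podmonitor,servicemonitor -A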

Unfortunately, neither the Grafana Agent Operator nor the Grafana Agent logs anything that helps you diagnose whether your ServiceMonitor has been picked up.

Check 1: Has the Grafana Agent Operator loaded the PodMonitor or ServiceMonitor into the config Secret?

kubectl get secret -n grafana grafana-agent-config -ojson | jq '.data."agent.yml"' -r \
  | base64 -d \
  | grep job_name:

It should show the list of monitors that the Grafana Agent Operator has found and rendered into the grafana-agent-config Secret; each job name has the form kind/namespace/name/endpoint-index:

job_name: podMonitor/default/example/0
job_name: serviceMonitor/cert-manager/cert-manager/0

If your PodMonitor or ServiceMonitor doesn't show in the list:

  1. Try restarting the Grafana Agent Operator:

    kubectl rollout restart -n grafana deploy grafana-agent-operator
    
  2. Check that the labels match in the MetricsInstance:

    kubectl get metricsinstance -A -oyaml
    

    For example, if you have:

    kind: MetricsInstance
    spec:
      podMonitorNamespaceSelector: {}
      podMonitorSelector:
        matchLabels:
          instance: primary
      serviceMonitorNamespaceSelector: {}
      serviceMonitorSelector:
        matchLabels:
          instance: primary
    

    Then, look at your PodMonitors and ServiceMonitors and check that they carry that label (if it's missing, see the command after this list):

    kubectl get servicemonitor,podmonitor -A -ojson | jq '.items[].metadata | {"name": .name, "namespace": .namespace, "labels": .labels}'
    

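If the label is missing, you can add it in place. A sketch, assuming a ServiceMonitor named foo in the default namespace and the instance: primary label from the MetricsInstance above:

kubectl label servicemonitor -n default foo instance=primary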
Check 2: Has Grafana Agent Loaded the PodMonitor or ServiceMonitor?

The first thing you will want to do is create a Service to access the Grafana Agent's API. Since the agent runs in the grafana namespace, the command will be:

kubectl expose pod -n grafana grafana-agent-0
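Alternatively, if you'd rather not create a Service, a temporary port-forward works too (a sketch assuming the pod is named grafana-agent-0 and serves its API on port 8080, as in the proxy URL below):

kubectl port-forward -n grafana pod/grafana-agent-0 8080:8080 &
curl -s http://localhost:8080/agent/api/v1/metrics/targets | jq .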

Then, query the Grafana Agent's API:

kubectl get --raw /api/v1/namespaces/grafana/services/grafana-agent-0:8080/proxy/agent/api/v1/metrics/targets \
  | jq -r '.data[] | "\(.target_group)\t\(.state)\t\(.scrape_error)"' \
  | column -t

That command will list each scrape target and show its state:

JOB NAME                                     STATE  SCRAPE ERROR
serviceMonitor/cert-manager/cert-manager/0   up
serviceMonitor/default/cadvisor-monitor/0    up
serviceMonitor/default/kubelet-monitor/0     up
serviceMonitor/cert-manager/cert-manager/0   up
serviceMonitor/default/foo/0                 down  Get  "http://10.76.0.20:8080/metrics":  dial  tcp  10.76.0.20:8080:  connect:  connection  refused

As you can see, the target behind the ServiceMonitor foo in the namespace default can't be reached. If the Service is also named foo, we can check whether the /metrics endpoint is reachable through the API server proxy with the following command:

kubectl get --raw /api/v1/namespaces/default/services/foo:8080/proxy/metrics

That should help you debug the issue further.
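If the proxy request also fails, you can test the endpoint from inside the cluster. A sketch, using the pod IP from the scrape error above (the debug-curl name and the curlimages/curl image are assumptions):

kubectl run debug-curl --rm -it --restart=Never --image=curlimages/curl -- \
  curl -sv http://10.76.0.20:8080/metrics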

I have found that when I forget to set the correct labels, the Grafana Agent Operator doesn't render the corresponding entry into the grafana-agent-config Secret, which you can inspect with the view-secret plugin (installable with krew):

kubectl view-secret -n grafana grafana-agent-config

When a service monitor doesn't show in the Grafana Agent API:

  1. Check that the ServiceMonitor references a Service (not a Deployment, or a Pod) by labels, and that it references the port by the port's name in the Service. This port name is optional in Kubernetes, but it must be set for the ServiceMonitor to work. It is not the same as the port name on the Pod or container, although it can be; a sketch of a matching pair follows below. (source: Prometheus Troubleshooting)
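For illustration, here is a minimal sketch of a Service and ServiceMonitor that match. The name foo, the app: foo labels, and the port number are assumptions; the instance: primary label matches the MetricsInstance selector shown earlier:

apiVersion: v1
kind: Service
metadata:
  name: foo
  namespace: default
  labels:
    app: foo
spec:
  selector:
    app: foo
  ports:
    - name: metrics        # the name the ServiceMonitor refers to
      port: 8080
      targetPort: 8080
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: foo
  namespace: default
  labels:
    instance: primary      # must match the MetricsInstance's serviceMonitorSelector
spec:
  selector:
    matchLabels:
      app: foo             # selects the Service by its labels
  endpoints:
    - port: metrics        # the Service's port *name*, not its number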