
## Blackbox Operator: ServiceMonitor? Probe? How?

Written by Emin Aktaş, Furkan Türkal, Necatican Yıldırım, developer-guy, Yasintahaerol

Probing endpoints matters a great deal when you run multiple internal or external endpoints. When one of them fails or goes down, you need to be informed immediately. Blackbox exporter plays an important role here because it can observe a wide variety of endpoints. So, let us look at blackbox exporter closely. 🌚

Blackbox exporter produces metrics based on the responses and response times of many kinds of internal or external endpoints, such as HTTP/S, TCP, ICMP, and DNS. It also gathers information about SSL certificates, so you can create alerts for an expired or invalid certificate. All of these metrics can be sent to Grafana to build detailed dashboards (DNS lookup time, HTTP latencies, etc.).

Blackbox exporter can be run in different ways. One is as a systemd service; another is deploying it on Kubernetes. Today we will focus on the Kubernetes deployment and use a Helm chart to configure it.

Before we start, I would like to introduce some prometheus-operator concepts. The Prometheus operator continuously watches the Kubernetes API server for configuration changes and compares the actual state with the desired state. It then tries to sync them without manual intervention. The Operator has many custom resource definitions (CRDs). One of them is ServiceMonitor.

### What is ServiceMonitor ?

According to the documentation, a ServiceMonitor declaratively specifies how groups of Kubernetes services should be monitored. The Operator automatically generates Prometheus scrape configuration based on the current state of the objects in the API server. This lets us specify a set of targets to be monitored by Prometheus without making any changes on the Prometheus server side.

### What is probe ❓

According to the documentation, a Probe defines monitoring for a set of static targets or ingresses.
It is a declarative way of defining how a set of ingresses or static targets is monitored. In terms of what they do, a Probe resembles a ServiceMonitor. When a Probe is created in the cluster, Prometheus starts scraping its targets automatically.

#### Why are these CRDs important ❓

- Either resource (Probe or ServiceMonitor) is easy to set up independently, so you do not have to make any manual change to the Prometheus configuration. The CRDs take care of the integration.
- Teams can create their own resources without affecting each other.
- Deployment and troubleshooting are easy.

### How we can use blackbox-exporter ❓

Now it is time for some hands-on work. This article shows three different methods, with an example of how to configure each one. To keep things clear, I will split the services into external and internal. Let me start with external services.

---

![](https://i.imgur.com/fPA4WiL.png)

---

#### External Services

There are two ways to probe external services. The first is creating a ServiceMonitor; the second is creating a Probe. For our use case, the Probe resource is the better fit: each team can scrape metrics for their own external services by creating a Probe.

1. ServiceMonitor

If you deploy blackbox-exporter with Helm, configuring a ServiceMonitor is easy. The chart values contain a section that lets us enable the ServiceMonitor. When this property is enabled, the necessary configuration is created automatically for you. All URLs to be probed are listed in the `targets` section.

```
serviceMonitor:
  enabled: true
  defaults:
    additionalMetricsRelabels: {}
    labels: {}
    interval: 30s
    scrapeTimeout: 30s
    module: http_2xx
  scheme: http
  tlsConfig: {}
  bearerTokenFile:
  # Each target can override the defaults above.
  targets:
    - name: google
      url: http://google.com/
      interval: 60s
      scrapeTimeout: 60s
      module: http_2xx
```

2. Probe

Alternatively, a Probe resource can be deployed in the cluster to probe external services.
At the end of the day, both approaches produce similar results. In the `prober` section, enter the FQDN of the relevant blackbox-exporter service.

```
apiVersion: monitoring.coreos.com/v1
kind: Probe
metadata:
  name: blackbox-exporter
  namespace: monitoring
spec:
  jobName: http-get
  interval: 60s
  module: http_2xx
  prober:
    url: <blackbox-exporter-svc>.<ns>.svc:19115
    scheme: http
    path: /probe
  targets:
    staticConfig:
      static: []
```

#### Internal Services

There is a feature in prometheus-operator that we can use here. We will create a job in prometheus.yml. With the `kubernetes_sd_configs` feature (using the `service` role), development teams can add an annotation to their services so that blackbox-exporter collects metrics for them. With this job in place, if a service has the annotation `prometheus.io/probe: true`, Prometheus starts sending requests to blackbox-exporter automatically. Also, thanks to the Prometheus relabeling mechanism, it is possible to probe a variety of other sources such as the Consul catalog, endpoints, etc. Moreover, additional module definitions can be added to the prometheus-operator configuration. Common modules are HTTP/S, TCP, ICMP, etc.

    kubernetes_sd_configs:
      - role: service
    metrics_path: /probe
    params:
      module:
        - http_2xx
    relabel_configs:
      - action: keep
        regex: true
        source_labels:
          - __meta_kubernetes_service_annotation_prometheus_io_probe
      - source_labels:
          - __address__
        target_label: __param_target
      - replacement: blackbox-exporter-prometheus-blackbox-exporter:9115
        target_label: __address__
      - source_labels:
          - __param_target
        target_label: instance
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - source_labels:
          - __meta_kubernetes_namespace
        target_label: kubernetes_namespace
      - source_labels:
          - __meta_kubernetes_service_name
        target_label: kubernetes_name

Here is a simple example of a service with the `prometheus.io/probe: true` annotation.
```
$ kubectl describe svc nginx
Name:                     nginx
Namespace:                monitoring
Labels:                   app=nginx
Annotations:              prometheus.io/probe: true
Selector:                 app=nginx
Type:                     NodePort
IP Families:              <none>
IP:                       10.233.40.80
IPs:                      10.233.40.80
Port:                     http  80/TCP
TargetPort:               80/TCP
NodePort:                 http  30301/TCP
Endpoints:                10.233.71.180:80
Session Affinity:         None
External Traffic Policy:  Cluster
Events:                   <none>
```

### 🔔How to create ALERTRULES ❓

There are numerous alert rules that can be configured for blackbox-exporter. Alerts can be created for different problems such as SSL certificate expiration, probe slowdown, or an unreachable service. These alerts can be broadcast to different channels via webhook.

→ https://awesome-prometheus-alerts.grep.to/rules.html#blackbox-1

![](https://i.imgur.com/0Ny9r2G.png)

### 🌅How to visualize data ❓

Blackbox metrics can be turned into a human-readable format with detailed Grafana dashboards. Many dashboard templates for blackbox exporter are available, depending on your needs.

![](https://i.imgur.com/OATbivh.png)

## BONUS

So far we have only supplied the necessary parameters through Prometheus, which does the dirty work for us. To understand how the Blackbox Exporter works, let us get our hands dirty and do it ourselves.

![](https://i.imgur.com/RCUovz1.png)

Blackbox Exporter gets its many abilities through modules. Here are a couple of examples: it can make basic HTTP requests such as GET or POST and expect to receive a 2xx status code within the timeout period, or it can match the response body or headers against a regex. If you want more details about the probes and options, check the [documentation](https://github.com/prometheus/blackbox_exporter/blob/master/CONFIGURATION.md).
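To illustrate the regex matching mentioned above, a module along the following lines could be added under `modules:` in the exporter's configuration file. This is only a minimal sketch: the module name `http_2xx_body_check` and the regex values `healthy` and `application/json` are made up for the example, and the field names follow the upstream configuration reference, so they should be checked against the exporter version you run.

```yaml
modules:
  # Hypothetical module name, chosen for this example only.
  http_2xx_body_check:
    prober: http
    timeout: 5s
    http:
      method: GET
      # An empty list falls back to the default: any 2xx status code.
      valid_status_codes: []
      # The probe fails unless the body matches every regex listed here.
      fail_if_body_not_matches_regexp:
        - "healthy"
      # The probe fails unless the named header matches the regex.
      fail_if_header_not_matches:
        - header: Content-Type
          regexp: "application/json"
```

A module defined this way would be selected by name through the `module` query parameter of `/probe`, in the same way as `http_2xx`.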
```bash
# Create a tmp directory and get the example config file
$ pushd "$(mktemp -d -q "/tmp/blackbox_exporter.XXXXXX")"

# Download a good configuration file
$ wget https://raw.githubusercontent.com/prometheus/blackbox_exporter/master/config/testdata/blackbox-good.yml -O blackbox.yml

# Run the Blackbox Exporter as a Docker container
$ docker run --rm -d -p 9115:9115 --name blackbox_exporter -v "$(pwd)":/config prom/blackbox-exporter:master --config.file=/config/blackbox.yml

# When you are done, you can leave the directory with
$ popd
```

We can now probe any target with the `http_2xx` module, which is defined in the configuration file along with other module configurations. Simply calling the URL http://localhost:9115/probe?target=www.trendyol.com&module=http_2xx returns Prometheus metrics. `probe_success` is the first metric we should check: 1 means that the probe succeeded.

We can also debug a probe by adding `debug=true` to the end of the URL, like this: http://localhost:9115/probe?target=www.trendyol.com&module=http_2xx&debug=true

The response then shows more details, along with our module configuration.
```
Logs for the probe:
ts=2021-11-10T12:03:19.539609322Z caller=main.go:320 module=http_2xx target=www.trendyol.com level=info msg="Beginning probe" probe=http timeout_seconds=5
ts=2021-11-10T12:03:19.539685705Z caller=http.go:335 module=http_2xx target=www.trendyol.com level=info msg="Resolving target address" ip_protocol=ip6
ts=2021-11-10T12:03:19.570921716Z caller=http.go:335 module=http_2xx target=www.trendyol.com level=info msg="Resolved target address" ip=104.17.133.16
ts=2021-11-10T12:03:19.570980068Z caller=client.go:251 module=http_2xx target=www.trendyol.com level=info msg="Making HTTP request" url=http://104.17.133.16 host=www.trendyol.com
ts=2021-11-10T12:03:19.74709647Z caller=client.go:492 module=http_2xx target=www.trendyol.com level=info msg="Received redirect" location=https://www.trendyol.com/
ts=2021-11-10T12:03:19.747202186Z caller=client.go:251 module=http_2xx target=www.trendyol.com level=info msg="Making HTTP request" url=https://www.trendyol.com/ host=
ts=2021-11-10T12:03:19.747223777Z caller=client.go:251 module=http_2xx target=www.trendyol.com level=info msg="Address does not match first address, not sending TLS ServerName" first=104.17.133.16 address=www.trendyol.com
ts=2021-11-10T12:03:20.085912327Z caller=main.go:130 module=http_2xx target=www.trendyol.com level=info msg="Received HTTP response" status_code=200
ts=2021-11-10T12:03:20.309809321Z caller=main.go:130 module=http_2xx target=www.trendyol.com level=info msg="Response timings for roundtrip" roundtrip=0 start=2021-11-10T12:03:19.571052069Z dnsDone=2021-11-10T12:03:19.571052069Z connectDone=2021-11-10T12:03:19.631688228Z gotConn=2021-11-10T12:03:19.631718525Z responseStart=2021-11-10T12:03:19.747027915Z tlsStart=0001-01-01T00:00:00Z tlsDone=0001-01-01T00:00:00Z end=0001-01-01T00:00:00Z
ts=2021-11-10T12:03:20.309844977Z caller=main.go:130 module=http_2xx target=www.trendyol.com level=info msg="Response timings for roundtrip" roundtrip=1 start=2021-11-10T12:03:19.747300002Z dnsDone=2021-11-10T12:03:19.751055881Z connectDone=2021-11-10T12:03:19.846510737Z gotConn=2021-11-10T12:03:19.914806905Z responseStart=2021-11-10T12:03:20.085834663Z tlsStart=2021-11-10T12:03:19.846537968Z tlsDone=2021-11-10T12:03:19.914701661Z end=2021-11-10T12:03:20.309796122Z
ts=2021-11-10T12:03:20.309911491Z caller=main.go:320 module=http_2xx target=www.trendyol.com level=info msg="Probe succeeded" duration_seconds=0.770276769

Metrics that would have been returned:
# HELP probe_dns_lookup_time_seconds Returns the time taken for probe dns lookup in seconds
# TYPE probe_dns_lookup_time_seconds gauge
probe_dns_lookup_time_seconds 0.031248997
# HELP probe_duration_seconds Returns how long the probe took to complete in seconds
# TYPE probe_duration_seconds gauge
probe_duration_seconds 0.770276769
# HELP probe_failed_due_to_regex Indicates if probe failed due to regex
# TYPE probe_failed_due_to_regex gauge
probe_failed_due_to_regex 0
# HELP probe_http_content_length Length of http content response
# TYPE probe_http_content_length gauge
probe_http_content_length -1
# HELP probe_http_duration_seconds Duration of http request by phase, summed over all redirects
# TYPE probe_http_duration_seconds gauge
probe_http_duration_seconds{phase="connect"} 0.156121312
probe_http_duration_seconds{phase="processing"} 0.286337175
probe_http_duration_seconds{phase="resolve"} 0.035004882
probe_http_duration_seconds{phase="tls"} 0.068163702
probe_http_duration_seconds{phase="transfer"} 0.223961445
# HELP probe_http_redirects The number of redirects
# TYPE probe_http_redirects gauge
probe_http_redirects 1
# HELP probe_http_ssl Indicates if SSL was used for the final redirect
# TYPE probe_http_ssl gauge
probe_http_ssl 1
# HELP probe_http_status_code Response HTTP status code
# TYPE probe_http_status_code gauge
probe_http_status_code 200
# HELP probe_http_uncompressed_body_length Length of uncompressed response body
# TYPE probe_http_uncompressed_body_length gauge
probe_http_uncompressed_body_length 222945
# HELP probe_http_version Returns the version of HTTP of the probe response
# TYPE probe_http_version gauge
probe_http_version 2
# HELP probe_ip_addr_hash Specifies the hash of IP address. It's useful to detect if the IP address changes.
# TYPE probe_ip_addr_hash gauge
probe_ip_addr_hash 1.231528671e+09
# HELP probe_ip_protocol Specifies whether probe ip protocol is IP4 or IP6
# TYPE probe_ip_protocol gauge
probe_ip_protocol 4
# HELP probe_ssl_earliest_cert_expiry Returns earliest SSL cert expiry in unixtime
# TYPE probe_ssl_earliest_cert_expiry gauge
probe_ssl_earliest_cert_expiry 1.652864248e+09
# HELP probe_ssl_last_chain_expiry_timestamp_seconds Returns last SSL chain expiry in timestamp seconds
# TYPE probe_ssl_last_chain_expiry_timestamp_seconds gauge
probe_ssl_last_chain_expiry_timestamp_seconds 1.652864248e+09
# HELP probe_ssl_last_chain_info Contains SSL leaf certificate information
# TYPE probe_ssl_last_chain_info gauge
probe_ssl_last_chain_info{fingerprint_sha256="0315524193aa6ceb020b85a8311534d51d7b32d0344895687c57b9f0928eb9bb"} 1
# HELP probe_success Displays whether or not the probe was a success
# TYPE probe_success gauge
probe_success 1
# HELP probe_tls_version_info Contains the TLS version used
# TYPE probe_tls_version_info gauge
probe_tls_version_info{version="TLS 1.3"} 1

Module configuration:
prober: http
timeout: 5s
http:
    ip_protocol_fallback: true
    follow_redirects: true
tcp:
    ip_protocol_fallback: true
icmp:
    ip_protocol_fallback: true
dns:
    ip_protocol_fallback: true
```

## References

- https://sysdig.com/blog/blackbox-exporter-sysdig/
- https://github.com/prometheus/blackbox_exporter
- https://medium.com/codex/prometheus-blackbox-what-why-how-28290dbb22ce
