## Blackbox Operator: ServiceMonitor? Probe? How?
Written by Emin Aktaş, Furkan Türkal, Necatican Yıldırım, developer-guy, Yasintahaerol
Probing endpoints is essential when you run many internal or external endpoints: if any of them fails or goes down, you need to know immediately. Blackbox exporter plays an important role in observing this variety of endpoints, so let us look at it closely.
🌚 Blackbox exporter generates metrics based on the responses and response times of many kinds of internal or external endpoints, such as HTTP/S, TCP, ICMP, and DNS.
It gathers information about SSL certificates, so you can create alerts for expired or invalid certificates.
All of these metrics can be sent to Grafana to build detailed dashboards (DNS lookup time, HTTP latencies, etc.).
Blackbox exporter can be run in different ways. One is running it as a systemd service; another is deploying it on Kubernetes. Today we will focus on the Kubernetes deployment and use the Helm chart to configure it.
Before we start, let me introduce a few prometheus-operator concepts.
Prometheus operator continuously watches the Kubernetes API server for configuration changes and compares the actual state with the desired state. It then reconciles the two without manual intervention. The operator comes with several custom resource definitions (CRDs). One of them is ServiceMonitor.
### What is ServiceMonitor?
The documentation describes it as follows: "ServiceMonitor, which declaratively specifies how groups of Kubernetes services should be monitored. The Operator automatically generates Prometheus scrape configuration based on the current state of the objects in the API server."
This lets us specify a set of targets for Prometheus to monitor without changing anything on the Prometheus server side.
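As a rough illustration of the idea, a ServiceMonitor manifest might look like the sketch below. The name `example-app`, the `app: example-app` label, and the `web` port name are hypothetical and are not taken from the setup described in this article.

```yaml
# Minimal ServiceMonitor sketch (all names are hypothetical).
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-app
  namespace: monitoring
spec:
  # Select the Services to scrape by label.
  selector:
    matchLabels:
      app: example-app
  endpoints:
    # "web" must match a named port on the selected Service.
    - port: web
      path: /metrics
      interval: 30s
```

Whether Prometheus actually picks this object up depends on the `serviceMonitorSelector` configured on your Prometheus resource.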
### What is probe ❓
The documentation says: "Probe defines monitoring for a set of static targets or ingresses."
It is a declarative way of defining how a set of ingresses or static targets should be monitored. In what it does, Probe closely resembles ServiceMonitor. When a Probe is created in the cluster, the operator generates the matching scrape configuration and Prometheus starts probing the targets automatically.
#### Why are these CRDs important ❓
Either resource (Probe or ServiceMonitor) is easy to set up independently, so you do not have to make any manual change to the Prometheus configuration. The CRDs take care of the integration.
Teams can create their own resources without affecting each other.
Deployment and troubleshooting are easy.
### How can we use blackbox-exporter ❓
Now it is time for some hands-on work. This article shows three different methods, with an example of how to configure each. To keep things clear, I will separate services into external and internal. Let me start with external services.
#### External Services
There are two ways to probe external services: creating a ServiceMonitor or creating a Probe. For our use case, the Probe resource is the better fit, because each team can scrape metrics about its own external services simply by creating a Probe.
1. ServiceMonitor
If you deploy blackbox-exporter with Helm, configuring a ServiceMonitor is easy. The chart values contain a `serviceMonitor` section; when you enable it, the necessary configuration is created automatically for you. All URLs to be probed are listed in the `targets` section.
```yaml
serviceMonitor:
  enabled: true
  # Defaults applied to every entry under `targets`
  defaults:
    additionalMetricsRelabels: {}
    labels: {}
    interval: 30s
    scrapeTimeout: 30s
    module: http_2xx
  scheme: http
  tlsConfig: {}
  bearerTokenFile:
  # URLs to probe; per-target settings override the defaults
  targets:
    - name: google
      url: http://google.com/
      interval: 60s
      scrapeTimeout: 60s
      module: http_2xx
```
2. Probe
Alternatively, a Probe resource can be deployed in the cluster for external targets. At the end of the day, both approaches give similar results. In the `prober` section, enter the FQDN of the relevant blackbox-exporter service.
```yaml
apiVersion: monitoring.coreos.com/v1
kind: Probe
metadata:
  name: blackbox-exporter
  namespace: monitoring
spec:
  jobName: http-get
  interval: 60s
  module: http_2xx
  prober:
    url: <blackbox-exporter-svc>.<ns>.svc:19115
    scheme: http
    path: /probe
  targets:
    staticConfig:
      static: []
```
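The `static` list in the manifest above is left empty. A filled-in variant might look like the sketch below, assuming two example URLs and a hypothetical `team` label; the resource name `external-websites` is also made up for illustration, and the prober URL keeps the same placeholders.

```yaml
# Probe sketch with concrete targets (name, URLs and label are hypothetical).
apiVersion: monitoring.coreos.com/v1
kind: Probe
metadata:
  name: external-websites
  namespace: monitoring
spec:
  jobName: http-get
  interval: 60s
  module: http_2xx
  prober:
    # Replace the placeholders with your blackbox-exporter service and namespace.
    url: <blackbox-exporter-svc>.<ns>.svc:19115
    scheme: http
    path: /probe
  targets:
    staticConfig:
      # Each entry is passed to blackbox-exporter as the probe target.
      static:
        - https://example.com
        - https://example.org/healthz
      # Extra labels attached to the resulting metrics.
      labels:
        team: example-team
```

As with other operator resources, the Probe is only picked up if it matches the `probeSelector` of your Prometheus resource.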
#### Internal Services
For internal services, we rely on a feature available with prometheus-operator: we will create a job in prometheus.yml. With `kubernetes_sd_configs` (using the `service` role), development teams can add an annotation to their services to have them probed by blackbox-exporter. With the job below, if a service carries the annotation `prometheus.io/probe: true`, Prometheus automatically starts sending requests for it to blackbox-exporter. Thanks to the Prometheus relabeling mechanism, it is also possible to probe targets from a variety of other sources, such as the Consul catalog, endpoints, etc. Moreover, other module definitions can be used in this configuration. Common modules are HTTP/S, TCP, ICMP, etc.
```yaml
kubernetes_sd_configs:
  - role: service
metrics_path: /probe
params:
  module:
    - http_2xx
relabel_configs:
  - action: keep
    regex: true
    source_labels:
      - __meta_kubernetes_service_annotation_prometheus_io_probe
  - source_labels:
      - __address__
    target_label: __param_target
  - replacement: blackbox-exporter-prometheus-blackbox-exporter:9115
    target_label: __address__
  - source_labels:
      - __param_target
    target_label: instance
  - action: labelmap
    regex: __meta_kubernetes_service_label_(.+)
  - source_labels:
      - __meta_kubernetes_namespace
    target_label: kubernetes_namespace
  - source_labels:
      - __meta_kubernetes_service_name
    target_label: kubernetes_name
```
Here is a simple example of a service with the `prometheus.io/probe: true` annotation.
```
$ kubectl describe svc nginx
Name: nginx
Namespace: monitoring
Labels: app=nginx
Annotations: prometheus.io/probe: true
Selector: app=nginx
Type: NodePort
IP Families: <none>
IP: 10.233.40.80
IPs: 10.233.40.80
Port: http 80/TCP
TargetPort: 80/TCP
NodePort: http 30301/TCP
Endpoints: 10.233.71.180:80
Session Affinity: None
External Traffic Policy: Cluster
Events: <none>
```
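A Service manifest that would produce output like the above could be sketched as follows. The field values are reconstructed from the `kubectl describe` output rather than taken from the authors' actual manifest, so treat it as an assumption.

```yaml
# Service sketch reconstructed from the describe output above.
apiVersion: v1
kind: Service
metadata:
  name: nginx
  namespace: monitoring
  labels:
    app: nginx
  annotations:
    # Annotation values must be strings, hence the quotes.
    prometheus.io/probe: "true"
spec:
  type: NodePort
  selector:
    app: nginx
  ports:
    - name: http
      port: 80
      targetPort: 80
      nodePort: 30301
```

With the `service` role, Prometheus sets `__address__` to the service DNS name and port, so with the relabeling shown earlier this Service would be probed as the target `nginx.monitoring.svc:80`.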
### 🔔 How to create alert rules ❓
There are numerous alert rules that can be configured for blackbox-exporter. Alerts can be created for different problems such as SSL certificate expiration, slow probes, or unreachable services. These alerts can be sent to different channels via webhooks.
→ https://awesome-prometheus-alerts.grep.to/rules.html#blackbox-1
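
To give an idea of what such rules look like, here is a minimal sketch in the spirit of the linked collection. It assumes the rules are deployed through the operator's PrometheusRule resource; the resource name, thresholds, `for` durations, and severity labels are illustrative choices, not recommendations from this article.

```yaml
# PrometheusRule sketch (name, thresholds and severities are assumptions).
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: blackbox-alerts
  namespace: monitoring
spec:
  groups:
    - name: blackbox
      rules:
        # Fires when a probe has been failing for 5 minutes.
        - alert: BlackboxProbeFailed
          expr: probe_success == 0
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "Probe failed for {{ $labels.instance }}"
        # Fires when the earliest certificate expires in less than 30 days.
        - alert: BlackboxSslCertificateWillExpireSoon
          expr: probe_ssl_earliest_cert_expiry - time() < 86400 * 30
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "Certificate for {{ $labels.instance }} expires in less than 30 days"
        # Fires when probes take longer than 1 second on average.
        - alert: BlackboxSlowProbe
          expr: avg_over_time(probe_duration_seconds[1m]) > 1
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "Probe for {{ $labels.instance }} is slow"
```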

### 🌅 How to visualize data ❓
Blackbox metrics can be turned into a human-readable form with detailed Grafana dashboards. Many dashboard templates for blackbox exporter are available, so you can pick one depending on your needs.

## BONUS
So far we have supplied the necessary parameters through Prometheus, which does the dirty job for us. To understand how the Blackbox Exporter works, let us get our hands dirty and do it ourselves.

Blackbox Exporter gets its abilities through modules. Here are a couple of examples: it can make basic HTTP requests such as GET or POST and expect a 2xx status code within the timeout period, or it can match the response body or headers against a regex. If you want more details about the probes and options, check the [documentation](https://github.com/prometheus/blackbox_exporter/blob/master/CONFIGURATION.md).
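
The module abilities described above could be configured roughly as in the sketch below. Only `http_2xx` appears in this article; the other two module names, the request body, and the regexes are hypothetical and only show the shape of the configuration.

```yaml
# blackbox.yml sketch: module names other than http_2xx are hypothetical.
modules:
  # Plain GET; any 2xx status counts as success by default.
  http_2xx:
    prober: http
    timeout: 5s
    http:
      method: GET
      follow_redirects: true
  # POST with a JSON body.
  http_post_2xx:
    prober: http
    timeout: 5s
    http:
      method: POST
      headers:
        Content-Type: application/json
      body: '{}'
  # Fail unless the body and a response header match the given regexes.
  http_body_contains_ok:
    prober: http
    timeout: 5s
    http:
      fail_if_body_not_matches_regexp:
        - "ok"
      fail_if_header_not_matches:
        - header: Content-Type
          regexp: "text/html.*"
```
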
```bash
# Create a tmp directory and get the example config file
$ pushd $(mktemp -d -q "/tmp/blackbox_exporter.XXXXXX")
# Download a good configuration file
$ wget https://raw.githubusercontent.com/prometheus/blackbox_exporter/master/config/testdata/blackbox-good.yml -O blackbox.yml
# Run the Blackbox Exporter as a Docker container
$ docker run --rm -d -p 9115:9115 --name blackbox_exporter -v `pwd`:/config prom/blackbox-exporter:master --config.file=/config/blackbox.yml
# When you are done, you can get out the directory with
$ popd
```
We can now probe any target with the `http_2xx` module, which is defined in the configuration file along with other module configurations.
Simply calling the URL http://localhost:9115/probe?target=www.trendyol.com&module=http_2xx returns Prometheus metrics.
`probe_success` is the first metric we should check. A value of 1 means that the probe succeeded.
We can also debug a probe by adding `debug=true` to the end of the URL, like this: http://localhost:9115/probe?target=www.trendyol.com&module=http_2xx&debug=true
The response then shows more details, including the probe logs and our module configuration.
```
Logs for the probe:
ts=2021-11-10T12:03:19.539609322Z caller=main.go:320 module=http_2xx target=www.trendyol.com level=info msg="Beginning probe" probe=http timeout_seconds=5
ts=2021-11-10T12:03:19.539685705Z caller=http.go:335 module=http_2xx target=www.trendyol.com level=info msg="Resolving target address" ip_protocol=ip6
ts=2021-11-10T12:03:19.570921716Z caller=http.go:335 module=http_2xx target=www.trendyol.com level=info msg="Resolved target address" ip=104.17.133.16
ts=2021-11-10T12:03:19.570980068Z caller=client.go:251 module=http_2xx target=www.trendyol.com level=info msg="Making HTTP request" url=http://104.17.133.16 host=www.trendyol.com
ts=2021-11-10T12:03:19.74709647Z caller=client.go:492 module=http_2xx target=www.trendyol.com level=info msg="Received redirect" location=https://www.trendyol.com/
ts=2021-11-10T12:03:19.747202186Z caller=client.go:251 module=http_2xx target=www.trendyol.com level=info msg="Making HTTP request" url=https://www.trendyol.com/ host=
ts=2021-11-10T12:03:19.747223777Z caller=client.go:251 module=http_2xx target=www.trendyol.com level=info msg="Address does not match first address, not sending TLS ServerName" first=104.17.133.16 address=www.trendyol.com
ts=2021-11-10T12:03:20.085912327Z caller=main.go:130 module=http_2xx target=www.trendyol.com level=info msg="Received HTTP response" status_code=200
ts=2021-11-10T12:03:20.309809321Z caller=main.go:130 module=http_2xx target=www.trendyol.com level=info msg="Response timings for roundtrip" roundtrip=0 start=2021-11-10T12:03:19.571052069Z dnsDone=2021-11-10T12:03:19.571052069Z connectDone=2021-11-10T12:03:19.631688228Z gotConn=2021-11-10T12:03:19.631718525Z responseStart=2021-11-10T12:03:19.747027915Z tlsStart=0001-01-01T00:00:00Z tlsDone=0001-01-01T00:00:00Z end=0001-01-01T00:00:00Z
ts=2021-11-10T12:03:20.309844977Z caller=main.go:130 module=http_2xx target=www.trendyol.com level=info msg="Response timings for roundtrip" roundtrip=1 start=2021-11-10T12:03:19.747300002Z dnsDone=2021-11-10T12:03:19.751055881Z connectDone=2021-11-10T12:03:19.846510737Z gotConn=2021-11-10T12:03:19.914806905Z responseStart=2021-11-10T12:03:20.085834663Z tlsStart=2021-11-10T12:03:19.846537968Z tlsDone=2021-11-10T12:03:19.914701661Z end=2021-11-10T12:03:20.309796122Z
ts=2021-11-10T12:03:20.309911491Z caller=main.go:320 module=http_2xx target=www.trendyol.com level=info msg="Probe succeeded" duration_seconds=0.770276769
Metrics that would have been returned:
# HELP probe_dns_lookup_time_seconds Returns the time taken for probe dns lookup in seconds
# TYPE probe_dns_lookup_time_seconds gauge
probe_dns_lookup_time_seconds 0.031248997
# HELP probe_duration_seconds Returns how long the probe took to complete in seconds
# TYPE probe_duration_seconds gauge
probe_duration_seconds 0.770276769
# HELP probe_failed_due_to_regex Indicates if probe failed due to regex
# TYPE probe_failed_due_to_regex gauge
probe_failed_due_to_regex 0
# HELP probe_http_content_length Length of http content response
# TYPE probe_http_content_length gauge
probe_http_content_length -1
# HELP probe_http_duration_seconds Duration of http request by phase, summed over all redirects
# TYPE probe_http_duration_seconds gauge
probe_http_duration_seconds{phase="connect"} 0.156121312
probe_http_duration_seconds{phase="processing"} 0.286337175
probe_http_duration_seconds{phase="resolve"} 0.035004882
probe_http_duration_seconds{phase="tls"} 0.068163702
probe_http_duration_seconds{phase="transfer"} 0.223961445
# HELP probe_http_redirects The number of redirects
# TYPE probe_http_redirects gauge
probe_http_redirects 1
# HELP probe_http_ssl Indicates if SSL was used for the final redirect
# TYPE probe_http_ssl gauge
probe_http_ssl 1
# HELP probe_http_status_code Response HTTP status code
# TYPE probe_http_status_code gauge
probe_http_status_code 200
# HELP probe_http_uncompressed_body_length Length of uncompressed response body
# TYPE probe_http_uncompressed_body_length gauge
probe_http_uncompressed_body_length 222945
# HELP probe_http_version Returns the version of HTTP of the probe response
# TYPE probe_http_version gauge
probe_http_version 2
# HELP probe_ip_addr_hash Specifies the hash of IP address. It's useful to detect if the IP address changes.
# TYPE probe_ip_addr_hash gauge
probe_ip_addr_hash 1.231528671e+09
# HELP probe_ip_protocol Specifies whether probe ip protocol is IP4 or IP6
# TYPE probe_ip_protocol gauge
probe_ip_protocol 4
# HELP probe_ssl_earliest_cert_expiry Returns earliest SSL cert expiry in unixtime
# TYPE probe_ssl_earliest_cert_expiry gauge
probe_ssl_earliest_cert_expiry 1.652864248e+09
# HELP probe_ssl_last_chain_expiry_timestamp_seconds Returns last SSL chain expiry in timestamp seconds
# TYPE probe_ssl_last_chain_expiry_timestamp_seconds gauge
probe_ssl_last_chain_expiry_timestamp_seconds 1.652864248e+09
# HELP probe_ssl_last_chain_info Contains SSL leaf certificate information
# TYPE probe_ssl_last_chain_info gauge
probe_ssl_last_chain_info{fingerprint_sha256="0315524193aa6ceb020b85a8311534d51d7b32d0344895687c57b9f0928eb9bb"} 1
# HELP probe_success Displays whether or not the probe was a success
# TYPE probe_success gauge
probe_success 1
# HELP probe_tls_version_info Contains the TLS version used
# TYPE probe_tls_version_info gauge
probe_tls_version_info{version="TLS 1.3"} 1
Module configuration:
prober: http
timeout: 5s
http:
  ip_protocol_fallback: true
  follow_redirects: true
tcp:
  ip_protocol_fallback: true
icmp:
  ip_protocol_fallback: true
dns:
  ip_protocol_fallback: true
```