# POC: Investigate OpenTelemetry as replacement for federation to garden from aggregate Prometheus
## Scratchpad
- [x] check otel-collector-contrib
- [x] check prometheus receiver configuration
### Local Setup
```shell
make kind-multi-zone-up operator-seed-up
```
Annotate the `ManagedResource` `garden/prometheus-garden` with `resources.gardener.cloud/ignore: "true"` so that manual changes to the resources it manages are not reverted.
Edit the `Prometheus` `garden/garden` and set `.spec.enableOTLPReceiver: true`.
Apply the `OpenTelemetryCollector` resource to the seed cluster.
Port-forward to either `prometheus-garden-{0,1}` `Pod` and query for a federated metric, e.g., `seed:images:count`.
If the query returns results, the federation is working.
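The manual steps above can be scripted roughly as follows. This is a minimal sketch, assuming `kubectl` points at the local kind cluster and the garden Prometheus pods listen on port 9090. The helper function names and the manifest file name `otel-collector.yaml` are hypothetical and only used for illustration.

```shell
# kubectl binary to use; override it with a stub for dry runs.
KUBECTL="${KUBECTL:-kubectl}"

# Stop the ManagedResource from being reconciled so manual edits are kept.
ignore_garden_prometheus() {
  "$KUBECTL" annotate managedresource prometheus-garden --namespace garden \
    resources.gardener.cloud/ignore=true --overwrite
}

# Enable the OTLP receiver on the Prometheus resource garden/garden.
enable_otlp_receiver() {
  "$KUBECTL" patch prometheus garden --namespace garden --type merge \
    -p '{"spec":{"enableOTLPReceiver":true}}'
}

# Apply the collector manifest; the default file name is an assumption.
apply_collector() {
  "$KUBECTL" apply -f "${1:-otel-collector.yaml}"
}

# Query one garden Prometheus pod through a temporary port-forward.
query_garden_prometheus() {
  pod="${1:-prometheus-garden-0}"
  metric="${2:-seed:images:count}"
  "$KUBECTL" port-forward "pod/$pod" --namespace garden 9090:9090 >/dev/null &
  pf_pid=$!
  sleep 2 # give the port-forward a moment to come up
  curl -sG "http://localhost:9090/api/v1/query" --data-urlencode "query=$metric"
  kill "$pf_pid"
}
```

With a cluster available, calling `ignore_garden_prometheus`, `enable_otlp_receiver`, `apply_collector`, and `query_garden_prometheus` in that order mirrors the steps above. A non-empty `result` array in the JSON response indicates that federated samples arrived.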
*Note:* If federation (i.e., ingestion into the garden Prometheus) is temporarily not working, e.g., because of `NetworkPolicies`, the collector later ends up trying to ingest out-of-order samples, which Prometheus rejects by default.
This can be remedied by configuring the `.spec.tsdb.outOfOrderTimeWindow` field on the `Prometheus` resource ([ref](https://prometheus-operator.dev/docs/api-reference/api/#monitoring.coreos.com/v1.TSDBSpec)).
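As an illustration, the field could be set with a merge patch like the one below. This is only a sketch: the function name is hypothetical, and the default window of `10m` is an arbitrary example value, not one that was validated in this PoC.

```shell
# kubectl binary to use; override it with a stub for dry runs.
KUBECTL="${KUBECTL:-kubectl}"

# Let the garden Prometheus accept samples that arrive out of order
# within the given window (default 10m is an assumed example value).
set_out_of_order_window() {
  window="${1:-10m}"
  "$KUBECTL" patch prometheus garden --namespace garden --type merge \
    -p "{\"spec\":{\"tsdb\":{\"outOfOrderTimeWindow\":\"${window}\"}}}"
}
```

Calling `set_out_of_order_window 30m` would widen the window to 30 minutes. The patch only sticks while the `ManagedResource` is ignored as described above.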
## OTel Collector
Luckily `@rrhubenov` anticipated the hackathon and added the `prometheusreceiver` to Gardener's `opentelemetry-collector` a while back:
* https://github.com/gardener/opentelemetry-collector/blob/636ec4ddf8f4a7f61ddfff86a66939fbb706764c/manifest.yml#L46
* https://github.com/gardener/opentelemetry-collector/pull/13
### Version 1
```yaml=
apiVersion: v1
kind: ServiceAccount
metadata:
  name: opentelemetry-collector
  namespace: garden
---
apiVersion: opentelemetry.io/v1beta1
kind: OpenTelemetryCollector
metadata:
  annotations:
    networking.resources.gardener.cloud/from-all-scrape-targets-allowed-ports: '[{"protocol":"TCP","port":8888}]'
  labels:
    networking.gardener.cloud/to-dns: allowed
    networking.resources.gardener.cloud/to-prometheus-aggregate-tcp-9090: allowed
    networking.resources.gardener.cloud/to-prometheus-garden-tcp-9090: allowed
  name: opentelemetry-collector
  namespace: garden
spec:
  config:
    receivers:
      prometheus:
        config:
          scrape_configs:
            - job_name: 'otel-collector'
              scrape_interval: 5s
              static_configs:
                - targets: ['prometheus-aggregate:80']
              metrics_path: '/federate'
              params:
                match[]:
                  - '{__name__=~"seed:(.+):count"}'
    exporters:
      debug:
        verbosity: detailed
        sampling_initial: 5
        sampling_thereafter: 0
      otlphttp:
        endpoint: "http://prometheus-garden:80/api/v1/otlp"
        tls:
          insecure: true
    processors:
      batch:
        timeout: 10s
        send_batch_size: 100
    service:
      pipelines:
        metrics:
          receivers: [prometheus]
          processors: [batch]
          exporters: [debug, otlphttp]
      telemetry:
        logs:
          level: info
  configVersions: 3
  image: europe-docker.pkg.dev/sap-se-gcp-k8s-delivery/releases-public/europe-docker_pkg_dev/gardener-project/releases/gardener/observability/opentelemetry-collector@sha256:d33d65ab8aa41e188d9055463f72f50d5def0ddc652b7c108039918e027a351f
  ipFamilyPolicy: SingleStack
  managementState: managed
  mode: deployment
  priorityClassName: gardener-system-100
  replicas: 1
  resources:
    requests:
      cpu: 10m
      memory: 50Mi
  securityContext:
    allowPrivilegeEscalation: false
  serviceAccount: opentelemetry-collector
  targetAllocator:
    allocationStrategy: consistent-hashing
    collectorNotReadyGracePeriod: 30s
    collectorTargetReloadInterval: 30s
    filterStrategy: relabel-config
    prometheusCR:
      scrapeInterval: 30s
  upgradeStrategy: none
```
### Version 2
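This version uses two exporters, one per pod, to push data to both garden Prometheus pods directly instead of going through the `prometheus-garden` `Service` as in Version 1.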
```yaml=
apiVersion: v1
kind: ServiceAccount
metadata:
  name: opentelemetry-collector
  namespace: garden
---
apiVersion: opentelemetry.io/v1beta1
kind: OpenTelemetryCollector
metadata:
  annotations:
    networking.resources.gardener.cloud/from-all-scrape-targets-allowed-ports: '[{"protocol":"TCP","port":8888}]'
  labels:
    networking.gardener.cloud/to-dns: allowed
    networking.resources.gardener.cloud/to-prometheus-aggregate-tcp-9090: allowed
    networking.resources.gardener.cloud/to-prometheus-garden-tcp-9090: allowed
  name: opentelemetry-collector
  namespace: garden
spec:
  config:
    receivers:
      prometheus:
        config:
          scrape_configs:
            - job_name: 'otel-collector'
              scrape_interval: 5s
              static_configs:
                - targets: ['prometheus-aggregate:80']
              metrics_path: '/federate'
              params:
                match[]:
                  - '{__name__=~"seed:(.+):count"}'
    exporters:
      debug:
        verbosity: detailed
        sampling_initial: 5
        sampling_thereafter: 0
      otlphttp/0:
        endpoint: "http://prometheus-garden-0.prometheus-operated:9090/api/v1/otlp"
        tls:
          insecure: true
      otlphttp/1:
        endpoint: "http://prometheus-garden-1.prometheus-operated:9090/api/v1/otlp"
        tls:
          insecure: true
    processors:
      batch:
        timeout: 10s
        send_batch_size: 100
    service:
      pipelines:
        metrics:
          receivers: [prometheus]
          processors: [batch]
          exporters: [debug, otlphttp/0, otlphttp/1]
      telemetry:
        logs:
          level: info
  configVersions: 3
  image: europe-docker.pkg.dev/sap-se-gcp-k8s-delivery/releases-public/europe-docker_pkg_dev/gardener-project/releases/gardener/observability/opentelemetry-collector@sha256:d33d65ab8aa41e188d9055463f72f50d5def0ddc652b7c108039918e027a351f
  ipFamilyPolicy: SingleStack
  managementState: managed
  mode: deployment
  priorityClassName: gardener-system-100
  replicas: 1
  resources:
    requests:
      cpu: 10m
      memory: 50Mi
  securityContext:
    allowPrivilegeEscalation: false
  serviceAccount: opentelemetry-collector
  targetAllocator:
    allocationStrategy: consistent-hashing
    collectorNotReadyGracePeriod: 30s
    collectorTargetReloadInterval: 30s
    filterStrategy: relabel-config
    prometheusCR:
      scrapeInterval: 30s
  upgradeStrategy: none
```
## Summary
This final version configures two OTel collectors to push data from the seeds to the garden Prometheus in the runtime cluster.
The first OTel collector, located in the seed, uses a Prometheus receiver to scrape the required metrics from the `/federate` endpoint of the aggregate Prometheus and an OTLP exporter to push these metrics to the second OTel collector in the runtime cluster. The second collector configures an OTLP receiver and two OTLP exporters, each targeting one garden Prometheus pod. This ensures that metrics are sent to both Prometheus pods.
To maintain high availability of the garden Prometheus, the OTel collector in the runtime cluster is deployed with two replicas. These replicas are exposed behind a Kubernetes `Service` that load-balances the incoming streams of metrics.
In this setup, the OTel collector from the seed pushes streams of metrics only once to the runtime cluster. This differs from the current setup, where both Prometheus pods federate from the aggregate Prometheus, sending the same metrics twice over the network.
This PoC was developed in a local setup where the seed and runtime clusters are the same, so the ingress can be skipped. In a real landscape, the seed would have to pass through the garden Prometheus ingress; how it could obtain the right credentials for that remains an open question.

### Resources
#### Service Account
```
apiVersion: v1
kind: ServiceAccount
metadata:
  name: opentelemetry-collector
  namespace: garden
```
#### OTel Collector in the seed
```
apiVersion: opentelemetry.io/v1beta1
kind: OpenTelemetryCollector
metadata:
  annotations:
    networking.resources.gardener.cloud/from-all-scrape-targets-allowed-ports: '[{"protocol":"TCP","port":8888}]'
  labels:
    networking.gardener.cloud/to-dns: allowed
    networking.resources.gardener.cloud/to-prometheus-aggregate-tcp-9090: allowed
    networking.resources.gardener.cloud/to-prometheus-garden-tcp-9090: allowed
    networking.resources.gardener.cloud/to-runtime-collector-tcp-4318: allowed
  name: seed
  namespace: garden
spec:
  config:
    receivers:
      prometheus:
        config:
          scrape_configs:
            - job_name: "otel-collector"
              scrape_interval: 5s
              static_configs:
                - targets: ["prometheus-aggregate:80"]
              metrics_path: "/federate"
              params:
                match[]:
                  - '{__name__=~"seed:(.+):count"}'
                  - '{__name__=~"seed:(.+):sum"}'
                  - '{__name__=~"seed:(.+):sum_cp"}'
                  - '{__name__=~"seed:(.+):sum_by_pod",namespace=~"extension-(.+)"}'
                  - '{__name__=~"seed:(.+):sum_by_container",__name__!="seed:kube_pod_container_status_restarts_total:sum_by_container",container="kube-apiserver"}'
                  - '{__name__=~"shoot:(.+):(.+)",__name__!="shoot:apiserver_storage_objects:sum_by_resource",__name__!="shoot:apiserver_watch_duration:quantile"}'
                  - '{__name__="ALERTS"}'
                  - '{__name__="shoot:availability"}'
                  - '{__name__="prometheus_tsdb_lowest_timestamp"}'
                  - '{__name__="prometheus_tsdb_storage_blocks_bytes"}'
                  - '{__name__="seed:persistentvolume:inconsistent_size"}'
                  - '{__name__="seed:kube_pod_container_status_restarts_total:max_by_namespace"}'
                  - '{__name__=~"metering:.+:(sum_by_namespace|sum_by_instance_type)"}'
                  - '{__name__="kube_node_spec_taint"}'
    exporters:
      debug:
        verbosity: detailed
        sampling_initial: 5
        sampling_thereafter: 0
      otlphttp:
        endpoint: "http://runtime-collector:4318"
        tls:
          insecure: true
    processors:
      batch:
        timeout: 10s
        send_batch_size: 100
    service:
      pipelines:
        metrics:
          receivers: [prometheus]
          processors: [batch]
          exporters: [debug, otlphttp]
      telemetry:
        logs:
          level: info
  configVersions: 3
  image: europe-docker.pkg.dev/sap-se-gcp-k8s-delivery/releases-public/europe-docker_pkg_dev/gardener-project/releases/gardener/observability/opentelemetry-collector@sha256:d33d65ab8aa41e188d9055463f72f50d5def0ddc652b7c108039918e027a351f
  ipFamilyPolicy: SingleStack
  managementState: managed
  mode: deployment
  priorityClassName: gardener-system-100
  replicas: 1
  resources:
    requests:
      cpu: 10m
      memory: 50Mi
  securityContext:
    allowPrivilegeEscalation: false
  serviceAccount: opentelemetry-collector
  targetAllocator:
    allocationStrategy: consistent-hashing
    collectorNotReadyGracePeriod: 30s
    collectorTargetReloadInterval: 30s
    filterStrategy: relabel-config
    prometheusCR:
      scrapeInterval: 30s
  upgradeStrategy: none
```
#### OTel Collector in the runtime
```
apiVersion: opentelemetry.io/v1beta1
kind: OpenTelemetryCollector
metadata:
  annotations:
    networking.resources.gardener.cloud/from-all-scrape-targets-allowed-ports: '[{"protocol":"TCP","port":8888}]'
  labels:
    networking.gardener.cloud/to-dns: allowed
    networking.resources.gardener.cloud/to-prometheus-aggregate-tcp-9090: allowed
    networking.resources.gardener.cloud/to-prometheus-garden-tcp-9090: allowed
  name: runtime
  namespace: garden
spec:
  config:
    receivers:
      otlp:
        protocols:
          http:
            endpoint: 0.0.0.0:4318
    exporters:
      debug:
        verbosity: detailed
        sampling_initial: 5
        sampling_thereafter: 0
      otlphttp/0:
        endpoint: "http://prometheus-garden-0.prometheus-operated:9090/api/v1/otlp"
        tls:
          insecure: true
      otlphttp/1:
        endpoint: "http://prometheus-garden-1.prometheus-operated:9090/api/v1/otlp"
        tls:
          insecure: true
    processors:
      batch:
        timeout: 10s
        send_batch_size: 100
    service:
      pipelines:
        metrics:
          receivers: [otlp]
          processors: [batch]
          exporters: [debug, otlphttp/0, otlphttp/1]
      telemetry:
        logs:
          level: info
  configVersions: 3
  image: europe-docker.pkg.dev/sap-se-gcp-k8s-delivery/releases-public/europe-docker_pkg_dev/gardener-project/releases/gardener/observability/opentelemetry-collector@sha256:d33d65ab8aa41e188d9055463f72f50d5def0ddc652b7c108039918e027a351f
  ipFamilyPolicy: SingleStack
  managementState: managed
  mode: deployment
  priorityClassName: gardener-system-100
  replicas: 2
  resources:
    requests:
      cpu: 10m
      memory: 50Mi
  securityContext:
    allowPrivilegeEscalation: false
  serviceAccount: opentelemetry-collector
  targetAllocator:
    allocationStrategy: consistent-hashing
    collectorNotReadyGracePeriod: 30s
    collectorTargetReloadInterval: 30s
    filterStrategy: relabel-config
    prometheusCR:
      scrapeInterval: 30s
  upgradeStrategy: none
```
## References
* [Enable the OTLP receiver](https://prometheus.io/docs/guides/opentelemetry/#enable-the-otlp-receiver)
* [Prometheus Receiver](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/prometheusreceiver)
* [gardener/opentelemetry-collector](https://github.com/gardener/opentelemetry-collector)