> The helm charts mentioned in this doc are primarily from [Prometheus Community](https://github.com/prometheus-community/helm-charts/).

## Prometheus Stack

The [kube-prometheus stack](https://github.com/prometheus-operator/kube-prometheus) is a collection of Kubernetes manifests, [Grafana](http://grafana.com/) dashboards, and [Prometheus rules](https://prometheus.io/docs/prometheus/latest/configuration/recording_rules/) combined with documentation and scripts to provide easy-to-operate, end-to-end Kubernetes cluster monitoring with [Prometheus](https://prometheus.io/) using the [Prometheus Operator](https://github.com/prometheus-operator/prometheus-operator).

### Installation

```bash
helm pull kube-prometheus-stack --repo https://prometheus-community.github.io/helm-charts
helm install prometheus-stack kube-prometheus-stack-48.3.1.tgz --create-namespace --namespace monitoring
```

* **Note:** The release name is `prometheus-stack`. It will be referenced multiple times in this doc.

## Redis Exporter

### Installation

```bash
helm pull prometheus-redis-exporter --repo https://prometheus-community.github.io/helm-charts
helm install redis-exporter prometheus-redis-exporter-5.5.0.tgz --namespace redis -f redis-exporter.yaml
```

### Values file `redis-exporter.yaml`

```yaml
env:
  - name: REDIS_EXPORTER_IS_CLUSTER
    value: "true"
redisAddress: ml-redis-leader:6379
serviceMonitor:
  enabled: true
  namespace: monitoring
  labels:
    release: prometheus-stack
  interval: 60s
  timeout: 30s
auth:
  enabled: true
  secret:
    name: ml-redis-auth
    key: redis-password
```

### Grafana Dashboards

#### [redis-dashboard-for-prometheus-redis-exporter-1](https://grafana.com/grafana/dashboards/11835-redis-dashboard-for-prometheus-redis-exporter-helm-stable-redis-ha/)

*Currently used*

#### [redis-dashboard-for-prometheus-redis-exporter-2](https://grafana.com/grafana/dashboards/11692-redis-dashboard-for-prometheus-redis-exporter-1-x/)

## RabbitMQ Exporter

### Installation

```bash
helm pull prometheus-rabbitmq-exporter --repo https://prometheus-community.github.io/helm-charts
helm install rabbitmq-exporter prometheus-rabbitmq-exporter-1.8.0.tgz --namespace rabbitmq -f rabbitmq-exporter.yaml
```

The Helm chart has a bug: the `rabbitmq.password` set in the values file doesn't take effect, so a Kubernetes Secret has to be used instead. See the [link](https://github.com/prometheus-community/helm-charts/pull/3649) for details. I modified the Helm chart to make `rabbitmq.password` work.

Why not use the RabbitMQ Secret created by the RabbitMQ Operator? Because the username/password stored in that Secret are somehow the default ones, not the ones I set in `RabbitmqCluster`, a Kubernetes CR handled by the RabbitMQ Operator.
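To see what the Operator actually stored, the default-user Secret can be decoded directly (a minimal sketch; the Secret name `ml-rmq-default-user` is an assumption, based on the Operator's `<cluster-name>-default-user` naming convention and the `ml-rmq` cluster referenced elsewhere in this doc):

```bash
# Decode the credentials the RabbitMQ Operator stored in its default-user Secret
# (Secret name is an assumption; adjust to your RabbitmqCluster name)
kubectl -n rabbitmq get secret ml-rmq-default-user -o jsonpath='{.data.username}' | base64 -d; echo
kubectl -n rabbitmq get secret ml-rmq-default-user -o jsonpath='{.data.password}' | base64 -d; echo
```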
The patch is as follows:

```diff
diff --git a/charts/prometheus-rabbitmq-exporter/templates/deployment.yaml b/charts/prometheus-rabbitmq-exporter/templates/deployment.yaml
index 7c4bfd0b..4b66890b 100644
--- a/charts/prometheus-rabbitmq-exporter/templates/deployment.yaml
+++ b/charts/prometheus-rabbitmq-exporter/templates/deployment.yaml
@@ -47,7 +47,7 @@ spec:
           {{- if .Values.rabbitmq.configMapOverrideReference }}
            - configMapRef:
                name: {{ .Values.rabbitmq.configMapOverrideReference }}
-           {{- end }}
+          {{- end }}
           env:
           {{- if .Values.rabbitmq.existingPasswordSecret }}
           - name: RABBIT_PASSWORD
@@ -55,6 +55,9 @@ spec:
              secretKeyRef:
                name: "{{ .Values.rabbitmq.existingPasswordSecret }}"
                key: {{ .Values.rabbitmq.existingPasswordSecretKey }}
+          {{- else if .Values.rabbitmq.password }}
+          - name: RABBIT_PASSWORD
+            value: {{ .Values.rabbitmq.password }}
           {{- end }}
           ports:
             - containerPort: {{ .Values.service.internalPort }}
```

### Values file `rabbitmq-exporter.yaml`

```yaml
rabbitmq:
  url: http://ml-rmq.rabbitmq:15672
  user: ml-rabbitmq
  password: s1t2c3b4@rabbitmq
  capabilities: bert,no_sort
  include_queues: ".*"
  include_vhost: ".*"
  skip_queues: "^$"
  skip_verify: "false"
  skip_vhost: "^$"
  exporters: "exchange,node,overview,queue"
  output_format: "TTY"
  timeout: 30
  max_queues: 0
  excludeMetrics: ""
prometheus:
  monitor:
    enabled: true
    interval: 60s
    namespace:
      - rabbitmq
    additionalLabels:
      release: prometheus-stack
```

### Grafana Dashboards

#### [The Dashboard for prometheus rabbitmq exporter](https://grafana.com/grafana/dashboards/4279-rabbitmq-monitoring/)

*Currently used*

#### [The Official Dashboard](https://grafana.com/grafana/dashboards/10991-rabbitmq-overview/)

*Currently used*

The official dashboard doesn't consume data from the RabbitMQ exporter; it consumes data directly from the RabbitMQ server. It requires the `rabbitmq_prometheus` plugin to be enabled, a built-in plugin since [RabbitMQ v3.8.0](https://github.com/rabbitmq/rabbitmq-server/releases/tag/v3.8.0). The plugin is enabled by default in most cases.

```bash
# Enable the plugin
# Run the command within a rabbitmq container
rabbitmq-plugins enable rabbitmq_prometheus
```

So Prometheus has to be configured to scrape data from the RabbitMQ server. Thus, a `ServiceMonitor` is needed (if the Prometheus Operator is being used, which is the case here). The `ServiceMonitor` can be installed with the following commands. See the [official link](https://www.rabbitmq.com/kubernetes/operator/operator-monitoring.html) for details.

```bash
curl -O https://raw.githubusercontent.com/rabbitmq/cluster-operator/main/observability/prometheus/monitors/rabbitmq-servicemonitor.yml
# Edit the manifest file, see the diff below
kubectl -n monitoring apply -f rabbitmq-servicemonitor.yml
```

```diff
--- rabbitmq-servicemonitor.yml.orig	2023-08-08 15:58:22.505454451 -0700
+++ rabbitmq-servicemonitor.yml	2023-08-08 16:02:14.368766617 -0700
@@ -3,7 +3,8 @@
 kind: ServiceMonitor
 metadata:
   name: rabbitmq
-  # If labels are defined in spec.serviceMonitorSelector.matchLabels of your deployed Prometheus object, make sure to include them here.
+  labels:
+    release: prometheus-stack
 spec:
   endpoints:
   - port: prometheus
@@ -45,4 +46,5 @@
     matchLabels:
       app.kubernetes.io/component: rabbitmq
   namespaceSelector:
-    any: true
+    matchNames:
+    - rabbitmq
```

* **Note:** The label `release: prometheus-stack` is very important; until the label is added, the `ServiceMonitor` won't work. Why?
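Before digging into the reason, a quick sanity check that the label actually landed (a sketch, assuming the manifest above was applied unchanged):

```bash
# The ServiceMonitor must carry release=prometheus-stack to be selected
kubectl -n monitoring get servicemonitor rabbitmq --show-labels
```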
To see why, run the following command to check the spec of the `Prometheus` CR:

```bash
kubectl -n monitoring get prometheus prometheus-stack-kube-prom-prometheus -o yaml
```

And note the following part:

```yaml
spec:
  serviceMonitorNamespaceSelector: {}
  serviceMonitorSelector:
    matchLabels:
      release: prometheus-stack
```

It means the Prometheus Operator keeps watching `ServiceMonitor`s in all namespaces but only pays attention to those carrying the label `release: prometheus-stack`.

## ElasticSearch Exporter

### Installation

```bash
helm pull prometheus-elasticsearch-exporter --repo https://prometheus-community.github.io/helm-charts
helm install elasticsearch-exporter prometheus-elasticsearch-exporter-5.2.0.tgz --namespace opensearch -f elasticsearch-exporter.yaml
```

### Values file `elasticsearch-exporter.yaml`

```yaml
env:
  ES_USERNAME: ml-elastic
extraEnvSecrets:
  ES_PASSWORD:
    secret: opensearch-extra-admin-password
    key: password
es:
  uri: https://ml-os:9200
  sslSkipVerify: true
  all: true
  indices: true
  indices_settings: true
  indices_mappings: true
  aliases: false
  shards: true
  snapshots: true
  cluster_settings: false
  slm: false
  data_stream: false
  timeout: 30s
serviceMonitor:
  enabled: true
  namespace: monitoring
  labels:
    release: prometheus-stack
  interval: 30s
  scrapeTimeout: 10s
```

### Grafana Dashboards

#### [ElasticSearch](https://grafana.com/grafana/dashboards/6483-elasticsearch/)

*Currently used*

Bugs

- The "cluster health" panel displays "N/A".

#### [Elasticsearch Exporter Quickstart and Dashboard](https://grafana.com/grafana/dashboards/14191-elasticsearch-overview/)

*Currently used*

Similar to the one above, with more recent updates.

#### [Elasticsearch - Index Stats](https://grafana.com/grafana/dashboards/13072-elasticsearch-index-stats/)

*Currently used*

#### [Elasticsearch Cluster - Indices](https://grafana.com/grafana/dashboards/3598-elasticsearch-cluster-indices/)

> This dashboard monitors a cluster using the data collected through the x-pack monitoring collector.

I haven't figured out how to add an x-pack data source for this dashboard yet.

## OpenCTI Server

Two kinds of metrics:

- General NodeJS metrics: there are plenty of third-party Grafana dashboards (see the [Grafana Dashboards](#Grafana-Dashboards3) below).
- OpenCTI-specific metrics: there are no third-party Grafana dashboards; I have to create one on my own. Access `http://<opencti-server-exporter>:14269/metrics` to find out what OpenCTI-specific metrics there are.

### How a NodeJS Application exposes its metrics

NodeJS metrics work this way:

```mermaid
graph LR;
a("NodeJS Application<br/>(Also a Prometheus Exporter)") --> |scraped by| b(Prometheus) --> |fetched by| c(Grafana) --> |displayed in| d(A Dashboard)
```

That is, it works as long as the NodeJS application makes itself a Prometheus exporter. AFAIK, there are two ways for a NodeJS application to be a Prometheus exporter: one is to use [Prom-Client](https://www.npmjs.com/package/prom-client), another is to use [Express Prometheus Middleware](https://www.npmjs.com/package/express-prometheus-middleware).

### Enable Prometheus Exporter of OpenCTI Server

Via configuration file

```json
"app": {
  "telemetry": {
    "metrics": {
      "enabled": true,
      "exporter_prometheus": 14269
    }
  }
}
```

Via environment variables

```bash
APP_TELEMETRY__METRICS__ENABLED="true"
APP_TELEMETRY__METRICS__EXPORTER_PROMETHEUS="14269"
```

### Kubernetes Resources

#### The changes to the `Service`

*Used to open the exporter port for Prometheus*

Example

```yaml
apiVersion: v1
kind: Service
metadata:
  ...
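  # metadata elided in the original example; the spec below simply publishes
  # the exporter port (14269) so Prometheus can scrape it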
spec:
  ports:
    - name: prometheus
      port: 14269
      targetPort: 14269
```

The `prometheus` port can be added to the existing OpenCTI `Service`. Or create a new `Service` for the exporter only, if the existing OpenCTI `Service` is a public-facing service (e.g., of type `NodePort` or `LoadBalancer`) and you don't want to expose the `prometheus` port to the public.

#### A new `ServiceMonitor`

*Used to configure Prometheus to scrape data from OpenCTI*

Example

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    app: opencti-server
    app.kubernetes.io/component: opencti-server
    prometheus-exporter: opencti-server
    release: prometheus-stack
  name: opencti-prometheus-exporter-opencti-server
  namespace: monitoring
spec:
  endpoints:
    - honorLabels: true
      port: prometheus
  jobLabel: opencti
  namespaceSelector:
    matchNames:
      - opencti
  selector:
    matchLabels:
      app: opencti-server
      app.kubernetes.io/component: opencti-server
      prometheus-exporter: opencti-server
```

### Grafana Dashboards

#### [Node.js and Express Metrics](https://grafana.com/grafana/dashboards/14565-node-js-dashboard/)

*Currently Used*

Works with NodeJS applications that use [Express Prometheus Middleware](https://www.npmjs.com/package/express-prometheus-middleware) to turn themselves into Prometheus exporters. Some panels don't work (they have no data to display), because the current OpenCTI no longer uses [Express Prometheus Middleware](https://www.npmjs.com/package/express-prometheus-middleware). (It used to; here is the [evidence](https://github.com/OpenCTI-Platform/opencti/pull/1598/files): take a look at the diff of `package.json`.)

#### [NodeJS Application Dashboard](https://grafana.com/grafana/dashboards/11159-nodejs-application-dashboard/)

*Currently Used*

*The most downloaded NodeJS dashboard*

Works with NodeJS applications that use [Prom-Client](https://www.npmjs.com/package/prom-client) to turn themselves into Prometheus exporters.

## OpenCTI Worker

### Enable Prometheus Exporter of OpenCTI Worker

Via environment variables

```bash
WORKER_TELEMETRY_ENABLED="TRUE"
WORKER_PROMETHEUS_TELEMETRY_PORT="14270"
```

### Kubernetes Resources

#### A new `Service`

```yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    app: opencti-worker
    app.kubernetes.io/component: opencti-worker
    prometheus-exporter: opencti-worker
  name: opencti-prometheus-exporter-opencti-worker
  namespace: opencti
spec:
  ports:
    - name: prometheus
      port: 14270
      protocol: TCP
      targetPort: 14270
  selector:
    app: opencti-worker
    app.kubernetes.io/component: opencti-worker
    app.kubernetes.io/instance: opencti
    app.kubernetes.io/name: opencti
```

#### A new `ServiceMonitor`

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    app: opencti-worker
    app.kubernetes.io/component: opencti-worker
    prometheus-exporter: opencti-worker
    release: prometheus-stack
  name: opencti-prometheus-exporter-opencti-worker
  namespace: monitoring
spec:
  endpoints:
    - honorLabels: true
      port: prometheus
  jobLabel: opencti
  namespaceSelector:
    matchNames:
      - opencti
  selector:
    matchLabels:
      app: opencti-worker
      app.kubernetes.io/component: opencti-worker
      prometheus-exporter: opencti-worker
```

### Grafana Dashboards

No predefined (third-party or official) Grafana dashboards are available; I have to create one on my own. Access `http://<opencti-worker-exporter-service>:14270/metrics` to find out what metrics are exposed.
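A quick way to inspect those metrics from a workstation is a temporary port-forward (a minimal sketch; the Service name and namespace come from the worker `Service` manifest above):

```bash
# Forward the worker exporter Service locally, then dump the first metrics
kubectl -n opencti port-forward svc/opencti-prometheus-exporter-opencti-worker 14270:14270 &
sleep 2  # give the port-forward a moment to establish
curl -s http://localhost:14270/metrics | head -n 30
kill %1  # stop the port-forward
```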