Prometheus Midware

# Prometheus Midware ## github link [midware prometheus](https://github.com/s3cc0mp/midware_prometheus/tree/main) ## Hierarchy and Architecture modules and contents: ``` - midware_prometheus/ - daemonize.py - prometheus_main.py - prometheus.py - config.json ``` * **`daemonize.py`**: This module assists user to run prometheus middleware as a daemon. * **`prometheus_main.py`**: This module contains the user interface of prometheus middlware. * **`prometheus.py`**: This module contains the main control logic of the entire prometheus middleware, including grabbing data from the prometheus server, formatting scraped data, outputting the data into a csv file, etc. * **`config.json`**: This is the main configuration file of prometheus middleware. ## Usage ```bash $ python3 prometheus_main.py [daemon|restart|stop|start] ``` Prometheus midware can be run in both background and foreground. ### Foreground * `start`: start argument runs the midware normally in the foreground. ```bash $ python3 prometheus_main.py start Starting... Daemonize_off ... ``` ### Background * `daemon`: Daemon arguemnt input would daemonize the process using daemonize module implemented by 鎮寧. The midware will be detached from the current terminal, then run in the background. * `stop`: Stop argument kills the background prometheus middleware process. ```bash $ python3 zabbix_main.py daemon Starting... $ python3 zabbix_main.py stop Stopping... Daemon_has_stoped ``` ## Configuration File Every settings related to prometheus middleware can be configured in config.json. Below is a sample `config.json`: ```json= { "out_Dir": "test", "url": "http://dev.k8s:31390", "configs": [ { "ip": "172.16.1.99", "exporter": "kubelet", "probe": "kubernetes_probe", "metrics": { "kubernetes_cpu_usage_sum": "sum(rate(container_cpu_usage_seconds_total{container!=\"POD\",pod!=\"\"}[3m]))", "kubernetes_cpu_usage_request": "sum(kube_pod_container_resource_requests_cpu_cores)", "kubernetes_memory_usage_sum": "sum(rate(container_memory_usage_bytes{container!=\"POD\",pod!=\"\"}[3m]))", "kubernetes_memory_usage_request": "sum(kube_pod_container_resource_requests_memory_bytes)", "kubernetes_network_transmit_bytes_total": "sum(rate(container_network_transmit_bytes_total{container!=\"POD\"}[3m]))", "kubernetes_network_receive_bytes_total": "sum(rate(container_network_receive_bytes_total{container!=\"POD\"}[3m]))", "kubernetes_container_restart_total": "sum(kube_pod_container_status_restarts_total)" }, "write_metrics": [ ] }, { "ip": "172.16.1.99", "exporter": "kubelet", "probe": "kubernetes_container_probe", "metrics": { "kubernetes_container_cpu_usage": "rate(container_cpu_usage_seconds_total{container!=\"POD\",pod!=\"\"}[3m])", "kubernetes_container_memory_usage": "rate(container_memory_usage_bytes{container!=\"POD\",pod!=\"\"}[3m])" }, "write_metrics": [ "namespace", "pod", "container" ] }, { "ip": "172.16.1.99", "exporter": "kubelet", "probe": "kubernetes_pod_probe", "metrics": { "kubernetes_pod_cpu_usage": "sum(rate(container_cpu_usage_seconds_total{container!=\"POD\",pod!=\"\"}[3m])) by (pod)", "kubernetes_pod_memory_usage": "sum(rate(container_memory_usage_bytes{container!=\"POD\",pod!=\"\"}[3m])) by (pod)" }, "write_metrics": [ "pod" ] }, { "ip": "172.16.1.99", "exporter": "kubelet", "probe": "kubernetes_node_probe", "metrics": { "kubernetes_node_allocable_pods": "kube_node_status_allocatable_pods", "kubernetes_node_allocable_cpu_core": "kube_node_status_allocatable_cpu_cores", "kubernetes_node_allocable_memory": "kube_node_status_allocatable_memory_bytes" }, "write_metrics": [ "node" ] }, { "ip": "172.16.1.99", "exporter": "kubelet", "probe": "kubernetes_namespace_probe", "metrics": { "kubernetes_namespace_cpu_usage": "sum(rate(container_cpu_usage_seconds_total{container!=\"POD\",namespace!=\"\"}[3m])) by (namespace)", "kubernetes_namespace_memory_usage": "sum(rate(container_memory_usage_bytes{container!=\"POD\",namespace!=\"\"}[3m])) by (namespace)" }, "write_metrics": [ "namespace" ] } ] } ``` * **`out_Dir`**: This field specifies the output directory of the csv files. * **`url`**: Specifies the url of your prometheus server. * **`configs`**: This field is an array that contains the setting of different probes. Each element in this array is an object that contains the following fields: * **`ip`**: The IP address of your target. * **`exporter`**: The exporter that the probe uses. * **`probe`**: The name of the probe. * **`metrics`**: An array of metrics that you want to use in the probe. The metrics must be a *PromQL* expression. ([More information for PromQL](https://prometheus.io/docs/prometheus/latest/querying/basics/)) * **`write_metrics`**: An array of metric labels that you want write to your csv file. ## Output CSV The name format of the csv file is shown below: | **Format** | ***probe name*** | @ | ***date*** | . | csv | | - | -| - | - | - | - | | **Example 1** | kubernetes_probe | @ | 20201204_16_08 | . | csv | | **Example 2** | kubernetes_node_probe | @ | 20201204_16_10 | . | csv | :::info :memo: Date format is `"%Y%m%d_%H_%M"` ::: The format of the content is shown below: | **Format** | ***value*** | ***subprobe name*** | ***subfields*** | ***subfields*** | ... | ***target IP*** | ***date*** | | - | -| - | - | - | - | - | - | | **Example 1** | 0.010182706464034987 | kubernetes_container_cpu_usage | kube-system (namespace) | calico-node (container) | calico-node-2pnlr (pod) | 172.16.1.99 | 20201204_16:08:17 | | **Example 2** | 110 | kubernetes_node_allocable_pods | 1.dev.k8s (node) | | | 172.16.1.99 | 20201204_16:10:13 | :::info :memo: Date format is `"%Y%m%d_%H:%M:%S"` ::: ### Sample Below is a sample output csv file `kubernetes_probe@20201204_16_08.csv`: ```csv 23.3665936816224,kubernetes_cpu_usage_sum,172.16.1.99,20201204_16:08:17 1.65,kubernetes_cpu_usage_request,172.16.1.99,20201204_16:08:17 16913905.219381742,kubernetes_memory_usage_sum,172.16.1.99,20201204_16:08:17 817889280,kubernetes_memory_usage_request,172.16.1.99,20201204_16:08:17 54819.22700615068,kubernetes_network_transmit_bytes_total,172.16.1.99,20201204_16:08:17 49244.03774023048,kubernetes_network_receive_bytes_total,172.16.1.99,20201204_16:08:17 11660,kubernetes_container_restart_total,172.16.1.99,20201204_16:08:17 ``` ## Kubernetes Exporter ### kubelet metrics * Provides metrics via *cAdvisor*. * Provides container-level metrics such as resource usage from running containers. ### kube-state-metrics * A simple service that listens to the Kubernetes API server and generates metrics about the state of Kubernetes. * It focuses on the state of the various objects *inside Kubernetes*, such as metrics based on pod, deployments, replica sets, etc. ### apiserver metrics * Provides metrics via *kube-apiserver* * Provides cluster level metrics that monitors *noncontainerized* workloads, such as load-balanced cluster services, client certificates, and so on. ## Kuberentes Metrics The following is the metric I used in this project： * **`container_cpu_usage_seconds_total`** * **Exporter**: kubelet * **Description**: The current cumulative CPU usage time of the container * **`container_memory_usage_bytes`** * **Exporter**: kubelet * **Description**: The current cumulative memory usage (in bytes) * **`container_network_transmit_bytes_total`** * **Exporter**: kubelet * **Description**: The cumulative amount of data transmitted in the container network * **`container_network_receive_bytes_total`** * **Exporter**: kubelet * **Description**: The cumulative amount of data received in the container network * **`kube_pod_container_resource_requests_cpu_cores`** * **Exporter**: kube-state-metrics * **Description**: The number of CPU cores currently required by the Pod * **`kube_pod_container_status_restarts_total`** * **Exporter**: kube-state-metrics * **Description**: Cumulative number of Pods that restarts * **`kube_pod_container_resource_requests_memory_bytes`** * **Exporter**: kube-state-metrics * **Description**: The number of memory (in bytes) currently required by the Pod * **`kube_node_status_allocatable_cpu_cores`** * **Exporter**: kube-state-metrics * **Description**: CPU resources currently provided by Node * **`kube_node_status_allocatable_memory_bytes`** * **Exporter**: kube-state-metrics * **Description**: Memory resources currently provided by Node * **`kube_node_status_allocatable_pods`** * **Exporter**: kube-state-metrics * **Description**: Number of pods currently provided by Node * **`apiserver_request_total`** * **Exporter**: kube-apiserver * **Description**: Monitor the source requests, destination request, and whether the request were successful. ### Kubernetes Probe * **`kubernetes_cpu_usage_sum`** * **Metric**: `sum(rate(container_cpu_usage_seconds_total{container!="POD",pod!=""}[3m]))` * **Description**: Collect the cumulative CPU usage time of the entire Kubernetes in the past 3 minutes. * **`kubernetes_memory_usage_sum`** * **Metric**: `sum(rate(container_memory_usage_bytes{container!="POD",pod!=""}[3m]))` * **Description**: Collect the cumulative memory usage of the entire Kubernetes in the past 3 minutes. * **`kubernetes_cpu_usage_request`** * Metric: `sum(kube_pod_container_resource_requests_cpu_cores)` * **Description**: Collect the CPU cores required and used by Pods in the entire Kubernetes. * **`kubernetes_memory_usage_request`** * Metric: `sum(kube_pod_container_resource_requests_memory_bytes)` * **Description**: Collect the memory usage required and used by Pods in the entire Kubernetes. * **`kubernetes_network_transmit_bytes_total`** * **Metric**: `sum(rate(container_network_transmit_bytes_total{container!="POD"}[3m]))` * **Description**: Collect the cumulative data transmission of the entire Kubernetes in the past 3 minutes. * **`kubernetes_network_receive_bytes_total`** * **Metric**: `sum(rate(container_network_receive_bytes_total{container!="POD"}[3m]))` * **Description**: Collect the cumulative received data traffic of the entire Kubernetes in the past 3 minutes. * **`kubernetes_container_restart_total`** * **Metric**: `sum(kube_pod_container_status_restarts_total)` * **Description**: Collect the cumulative number of Pod restarts in the entire Kubernetes. ### Kubernetes Container Probe * **`kubernetes_container_cpu_usage`** * **Metric**: `rate(container_cpu_usage_seconds_total{container!="POD",pod!=""}[3m])` * **Description**: Collect the cumulative CPU usage time of each container in the past 3 minutes. * **`kubernetes_container_memory_usage`** * **Metric**: `rate(container_memory_usage_bytes{container!="POD",pod!=""}[3m])` * **Description**: Collect the cumulative memory usage of each container in the past 3 minutes. ### Kubernetes Pod Probe * **`kubernetes_pod_cpu_usage`** * **Metric**: `sum(rate(container_cpu_usage_seconds_total{container!="POD",pod!=""}[3m])) by (pod)` * **Description**: Collect the cumulative CPU usage time of different Pods in the past 3 minutes. * **`kubernetes_pod_memory_usage`** * **Metric**: `sum(rate(container_memory_usage_bytes{container!="POD",pod!=""}[3m])) by (pod)` * **Description**: Collect the cumulative memory usage of different Pods in the past 3 minutes. ### Kubernetes Node Probe * **`kubernetes_node_allocable_pods`** * **Metric**: `kube_node_status_allocatable_pods` * **Description**: Collect the pod resources currently provided by each Node. * **`kubernetes_node_allocable_cpu_core`** * **Metric**: `kube_node_status_allocatable_cpu_cores` * **Description**: Collect the CPU resource usage currently provided by each Node. * **`kubernetes_node_allocable_memory`** * **Metric**: `kube_node_status_allocatable_memory_bytes` * **Description**: Collect the number of memory bytes currently provided by each Node. ### Kubernetes Apiserver Probe * **`kubernetes_apiserver_success_requests`** * **Metric**: `sum(rate(apiserver_request_total{code=~"2.."}[3m]))` * **Description**: Collect all the successful requests from kube-apiserver. * **`kubernetes_apiserver_failed_requests`** * **Metric**: `sum(rate(apiserver_request_total{code=~"[45].."}[3m]))` * **Description**: Collect all the failed requests from kube-apiserver. ## Reference 1. [Prometheus documentation](https://prometheus.io/docs/introduction/overview/) 2. [Kubernetes in Production: The Ultimate Guide to Monitoring Resource Metrics with Prometheus](https://www.replex.io/blog/kubernetes-in-production-the-ultimate-guide-to-monitoring-resource-metrics) 3. [Metrics used in Alamada](https://github.com/containers-ai/alameda/blob/master/docs/metrics_used_in_Alameda.md)