# Prometheus

## Introduction

## Architecture Overview

![](https://i.imgur.com/V4gL274.png)

### Prometheus Server

#### TSDB

Prometheus includes a **TSDB** (time series database), a database optimized for handling time series data. Prometheus stores the samples that belong to the same metric and label set as one time series. Each sample consists of three parts: the metric (name and labels), the value, and the timestamp (in ms).

The **metric name** specifies the feature of a system that is measured. For instance, the ```http_requests_total``` metric counts the total number of HTTP requests received.

A **label** identifies one dimension of a metric. For example, ```http_requests_total{method="GET"}``` counts all HTTP GET requests, while ```http_requests_total{method="POST"}``` is a separate time series of the same metric that counts POST requests.

The **timestamp** records when a sample was taken, as a 64-bit integer in milliseconds. The sample value itself is stored as a 64-bit float.

The Prometheus client libraries support four metric types:

1. Counter: A cumulative metric that only increases (or resets to zero on restart), such as the number of HTTP GET requests received.
2. Gauge: A single value that can go up and down and reflects the current state, such as memory usage.
3. Histogram: Samples observations over a period of time and counts them in configurable buckets.
4. Summary: Similar to Histogram, it samples observations and reports configurable quantiles over a sliding time window.

#### PromQL

**PromQL** (Prometheus Query Language) is the query language provided by Prometheus. It allows the user to select, examine, and aggregate time series data.

#### Retrieval

Prometheus mainly collects metrics with a pull model over HTTP. Data can also be pushed through the **Pushgateway**, although this is not commonly used.

### HTTP Server

The Prometheus server provides an HTTP API that lets us query the database.
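The API can also be called from a program. The following is a minimal sketch in Go, using only the standard library, of how an instant query URL might be built and how the JSON envelope returned by the query endpoint might be decoded. The names `apiResponse`, `sample`, `queryURL`, and `parseResponse` are hypothetical helpers made up for this illustration, the server address `http://localhost:9090` is an assumption, and the response body is hard-coded so that the sketch runs without a live server.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
)

// apiResponse mirrors the JSON envelope of an instant query (hypothetical type).
type apiResponse struct {
	Status string `json:"status"`
	Data   struct {
		ResultType string   `json:"resultType"`
		Result     []sample `json:"result"`
	} `json:"data"`
}

// sample is one element of a "vector" result (hypothetical type).
type sample struct {
	Metric map[string]string `json:"metric"`
	// Value is a two-element array: [unix timestamp in seconds, value as a string].
	Value [2]interface{} `json:"value"`
}

// queryURL builds the instant-query URL for a PromQL expression,
// URL-encoding the expression.
func queryURL(base, expr string) string {
	q := url.Values{}
	q.Set("query", expr)
	return base + "/api/v1/query?" + q.Encode()
}

// parseResponse decodes a response body into an apiResponse.
func parseResponse(body []byte) (apiResponse, error) {
	var r apiResponse
	err := json.Unmarshal(body, &r)
	return r, err
}

func main() {
	// Assumed server address; adjust to your own setup.
	fmt.Println(queryURL("http://localhost:9090", "up"))

	// Hard-coded sample body so the sketch runs without a live server.
	body := []byte(`{"status":"success","data":{"resultType":"vector","result":[
		{"metric":{"__name__":"up","job":"prometheus","instance":"localhost:9090"},
		 "value":[1435781451.781,"1"]}]}}`)

	r, err := parseResponse(body)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	for _, s := range r.Data.Result {
		fmt.Printf("%s{job=%q} = %v\n", s.Metric["__name__"], s.Metric["job"], s.Value[1])
	}
}
```

In a real client, the body would come from an HTTP GET on the URL returned by `queryURL`. Note that the sample value arrives as a string, so it has to be parsed before it can be used as a number.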
Example query:

```
up
```

We can use wget or curl to query the Prometheus server:

```bash
curl 'http://localhost:9090/api/v1/query?query=up'
```

The Prometheus server then returns the result in JSON format:

```json
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {
          "__name__": "up",
          "job": "prometheus",
          "instance": "localhost:9090"
        },
        "value": [1435781451.781, "1"]
      },
      {
        "metric": {
          "__name__": "up",
          "job": "node",
          "instance": "localhost:9100"
        },
        "value": [1435781451.781, "0"]
      }
    ]
  }
}
```

### Jobs/exporters

Exporters expose the metrics of third-party services to the Prometheus server once they are installed on the monitored target. Prometheus retrieves metrics by periodically scraping the HTTP endpoints of the monitored targets. For instance, the Node Exporter exposes hardware and OS metrics provided by \*NIX kernels.

#### Exporter Sample code

```go
// Excerpt from a Node Exporter-style CPU collector. Imports and package-level
// helpers (namespace, cpuCollectorSubsystem, procPath, enableCPUInfo,
// Collector, registerCollector, defaultEnabled) are defined elsewhere in the
// package and are omitted here.

// cpuCollector holds the metric descriptors and state for the CPU metrics.
type cpuCollector struct {
	fs                 procfs.FS
	cpu                *prometheus.Desc
	cpuInfo            *prometheus.Desc
	cpuGuest           *prometheus.Desc
	cpuCoreThrottle    *prometheus.Desc
	cpuPackageThrottle *prometheus.Desc
	logger             log.Logger
	cpuStats           []procfs.CPUStat
	cpuStatsMutex      sync.Mutex
}

// NewCPUCollector returns a collector that exposes CPU metrics read from procfs.
func NewCPUCollector(logger log.Logger) (Collector, error) {
	fs, err := procfs.NewFS(*procPath)
	if err != nil {
		return nil, fmt.Errorf("failed to open procfs: %w", err)
	}
	return &cpuCollector{
		fs: fs,
		cpu: prometheus.NewDesc(
			prometheus.BuildFQName(namespace, cpuCollectorSubsystem, "seconds_total"),
			"Seconds the cpus spent in each mode.",
			[]string{"cpu", "mode"}, nil,
		),
		cpuInfo: prometheus.NewDesc(
			prometheus.BuildFQName(namespace, cpuCollectorSubsystem, "info"),
			"CPU information from /proc/cpuinfo.",
			[]string{"package", "core", "cpu", "vendor", "family", "model",
				"model_name", "microcode", "stepping", "cachesize"}, nil,
		),
		cpuGuest: prometheus.NewDesc(
			prometheus.BuildFQName(namespace, cpuCollectorSubsystem, "guest_seconds_total"),
			"Seconds the cpus spent in guests (VMs) for each mode.",
			[]string{"cpu", "mode"}, nil,
		),
		cpuCoreThrottle: prometheus.NewDesc(
			prometheus.BuildFQName(namespace, cpuCollectorSubsystem, "core_throttles_total"),
			"Number of times this cpu core has been throttled.",
			[]string{"package", "core"}, nil,
		),
		cpuPackageThrottle: prometheus.NewDesc(
			prometheus.BuildFQName(namespace, cpuCollectorSubsystem, "package_throttles_total"),
			"Number of times this cpu package has been throttled.",
			[]string{"package"}, nil,
		),
		logger: logger,
	}, nil
}

// Update collects the current CPU metrics and sends them on ch.
func (c *cpuCollector) Update(ch chan<- prometheus.Metric) error {
	if *enableCPUInfo {
		if err := c.updateInfo(ch); err != nil {
			return err
		}
	}
	if err := c.updateStat(ch); err != nil {
		return err
	}
	if err := c.updateThermalThrottle(ch); err != nil {
		return err
	}
	return nil
}

// ...

// Register the collector under the name "cpu" when the package is loaded.
func init() {
	registerCollector("cpu", defaultEnabled, NewCPUCollector)
}
```

### Alertmanager

Alerting rules are defined in Prometheus, and Prometheus evaluates them periodically. When a rule's trigger condition is met, Prometheus pushes an alert to the Alertmanager. The Alertmanager can then notify the administrator of abnormal events via email, PagerDuty, etc.

(Not used in this lab.)

### Service Discovery

In a cloud environment there is no fixed set of monitoring targets, and nearly every monitored object changes dynamically. Prometheus therefore cannot rely on a static list of every device in the cloud. The solution is to introduce an intermediate agent that knows all current monitoring targets, so Prometheus only needs to ask the agent which targets exist. This mechanism is called **service discovery**. For instance, Kubernetes manages all container and service information, so Prometheus only needs to interact with Kubernetes to find all the containers and service objects that need to be monitored.
(For reference.)

### Pushgateway

The Prometheus Pushgateway allows batch jobs to expose their metrics to Prometheus. Since such jobs may not live long enough to be scraped, they push their metrics to a Pushgateway instead, and the Pushgateway then exposes these metrics to Prometheus.

(Not used in this lab.)

### Consoles and Dashboards

#### Prometheus Console Template

Prometheus includes simple built-in console support: users can build console pages with the Go template language, and the Prometheus server serves them under its own access paths.

#### Grafana

Grafana is a general-purpose visualization tool that can display data stored in different databases, including Prometheus.

:::info
Default user/password for Grafana installed using Helm (this command prints the admin password):
``kubectl get secret --namespace default grafana -o jsonpath="{.data.admin-password}" | base64 --decode ; echo``
:::

## Demo

Demo

## Monitor Kubernetes with Prometheus

###