Prometheus is an open-source systems monitoring and alerting toolkit. It is written in Go and licensed under the Apache License 2.0.
Prometheus collects metrics from configured targets at given intervals and stores the time series data in its database.
Prometheus runs rules over the collected data to aggregate it or to generate alerts.
Several dashboards are available for administrators to visualize the collected data.
Features
Prometheus stores all data as time series, each identified by a metric name and optional labels.
Prometheus provides a query language called PromQL that allows the user to query and aggregate time series data.
Prometheus collects data by pulling it over HTTP. Alternatively, a push mechanism is supported through the Pushgateway.
Prometheus can trigger alerts when a specified condition is observed to be true.
Outline
Introduction
Architecture
Components
Prometheus Server
Exporters
Pushgateway
Alertmanager
Consoles and Dashboards
Service Discovery
Pros and Cons
Prometheus Server
The Prometheus server retrieves data from monitored targets, stores the time series data in its database, and provides an interface for users to query that database.
In general, it consists of three components:
Time Series Database (TSDB)
HTTP Server
Prometheus Query Language (PromQL)
TSDB
The Prometheus server includes a time series database (TSDB), which is a database optimized for handling time series data.
Prometheus stores all data as time series. Every time series is uniquely identified by its metric name and optional key-value pairs called labels.
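To make the identification rule concrete, here is a minimal Python sketch of an in-memory store that keeps one list of samples per series. The `TinySeriesStore` class and its methods are hypothetical names used only for illustration; this is not how the Prometheus TSDB is implemented.

```python
from collections import defaultdict


class TinySeriesStore:
    """Toy in-memory store: one list of (timestamp, value) samples per series."""

    def __init__(self):
        self._series = defaultdict(list)

    @staticmethod
    def _key(name, labels):
        # A series is identified by its metric name plus its full label set.
        return (name, frozenset(labels.items()))

    def append(self, name, labels, timestamp, value):
        self._series[self._key(name, labels)].append((timestamp, value))

    def samples(self, name, labels):
        # Return a copy so callers cannot mutate the stored samples.
        return list(self._series.get(self._key(name, labels), []))


store = TinySeriesStore()
store.append("prometheus_http_requests_total", {"method": "POST"}, 1000, 3.0)
store.append("prometheus_http_requests_total", {"method": "POST"}, 1015, 5.0)
store.append("prometheus_http_requests_total", {"method": "GET"}, 1000, 40.0)

# Same metric name, different label value: these are two separate series.
post_samples = store.samples("prometheus_http_requests_total", {"method": "POST"})
print(post_samples)
```

Changing any label value yields a different series, which is why the GET sample above does not appear in the POST series.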
Metrics
For example, a time series with the metric name prometheus_http_requests_total (the accumulated number of HTTP requests to the Prometheus server) and the label method="POST" (which restricts the series to POST requests) is written as follows:
prometheus_http_requests_total{method="POST"}
Prometheus defines four core metric types:
Counter: A cumulative metric whose value only increases (or resets to zero on restart), such as the number of HTTP GET requests served.
Gauge: A metric representing a single value that can go up or down, such as memory usage.
Histogram: Samples observations (such as request durations) and counts them in configurable buckets, along with their sum and count.
Summary: Similar to a histogram, but it calculates configurable quantiles of the sampled observations over a sliding time window.
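The semantics of the first three types can be sketched in plain Python. The classes below are simplified stand-ins written for this illustration, not the official client library API, and Summary is omitted because quantile estimation would make the sketch much longer.

```python
import bisect


class Counter:
    """Cumulative value that can only increase."""

    def __init__(self):
        self.value = 0.0

    def inc(self, amount=1.0):
        if amount < 0:
            raise ValueError("counters can only increase")
        self.value += amount


class Gauge:
    """Single value that may go up or down."""

    def __init__(self):
        self.value = 0.0

    def set(self, value):
        self.value = float(value)


class Histogram:
    """Counts observations per bucket and tracks their sum and count."""

    def __init__(self, upper_bounds):
        self.upper_bounds = sorted(upper_bounds)
        # One extra slot at the end plays the role of the +Inf bucket.
        self.bucket_counts = [0] * (len(self.upper_bounds) + 1)
        self.sum = 0.0
        self.count = 0

    def observe(self, value):
        # bisect_left places a value equal to a bound into that bound's bucket.
        index = bisect.bisect_left(self.upper_bounds, value)
        self.bucket_counts[index] += 1
        self.sum += value
        self.count += 1

    def cumulative(self):
        # Buckets are exposed cumulatively: each includes all smaller ones.
        total, result = 0, []
        for bucket_count in self.bucket_counts:
            total += bucket_count
            result.append(total)
        return result


requests = Counter()
requests.inc()
requests.inc()

memory = Gauge()
memory.set(512)
memory.set(480)

latency = Histogram([0.1, 0.5, 1.0])
for seconds in (0.05, 0.3, 0.5, 2.0):
    latency.observe(seconds)

print(requests.value, memory.value, latency.cumulative())
```

The cumulative bucket counts mirror how histogram buckets are exposed, where each bucket counts every observation less than or equal to its upper bound.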
PromQL
PromQL (Prometheus Query Language) is the query language provided by Prometheus. It allows the user to select, examine, and aggregate time series data.
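As a small illustration, the Python sketch below builds request URLs for two PromQL expressions against the Prometheus HTTP API's instant query endpoint (/api/v1/query). The server address http://localhost:9090 and the `instant_query_url` helper are assumptions made for this example, and nothing is actually sent over the network.

```python
from urllib.parse import urlencode


def instant_query_url(base_url, promql):
    """Build the URL for an instant query against the Prometheus HTTP API."""
    query_string = urlencode({"query": promql})
    return base_url.rstrip("/") + "/api/v1/query?" + query_string


# Select a single series by metric name and label.
select_query = 'prometheus_http_requests_total{method="POST"}'

# Aggregate: per-second request rate over the last 5 minutes, summed over all series.
aggregate_query = "sum(rate(prometheus_http_requests_total[5m]))"

# Assumed server address; adjust to wherever Prometheus is reachable.
select_url = instant_query_url("http://localhost:9090", select_query)
aggregate_url = instant_query_url("http://localhost:9090/", aggregate_query)
print(select_url)
print(aggregate_url)
```

The same expressions can be typed directly into the expression browser without any URL encoding.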
Exporters
Exporters are used to expose metrics of third-party services to the Prometheus server. An exporter is installed on the monitored device.
Each exporter exposes an HTTP endpoint from which the Prometheus server retrieves metrics. Prometheus mainly collects metrics by pulling: it periodically scrapes the HTTP endpoints of the monitored targets.
Exporters are written using the Prometheus client libraries, which are available for many different languages. The client library provides an API for defining metrics and returns their current values when Prometheus scrapes the target's HTTP endpoint.
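The pull model can be sketched with nothing but the Python standard library. The example below serves a /metrics endpoint in the Prometheus text exposition format. A real exporter would normally use an official client library instead, and the metric name demo_requests_total, the port 8000, and the helper names are all assumptions made for this illustration.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical application state that the exporter reports.
REQUESTS_TOTAL = {"GET": 0, "POST": 0}


def render_metrics(counts):
    """Render per-method request counts in the text exposition format."""
    lines = [
        "# HELP demo_requests_total Requests handled by the demo service.",
        "# TYPE demo_requests_total counter",
    ]
    for method, value in sorted(counts.items()):
        lines.append(f'demo_requests_total{{method="{method}"}} {value}')
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Prometheus scrapes this path; everything else is rejected.
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(REQUESTS_TOTAL).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Block forever, serving /metrics for Prometheus to scrape."""
    HTTPServer(("", port), MetricsHandler).serve_forever()


print(render_metrics({"GET": 2, "POST": 1}), end="")
```

Calling serve() starts the endpoint. Prometheus would then need a scrape configuration pointing at that host and port.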
Exporter
Node Exporter is one of the most common official exporters. It exposes hardware and OS metrics of UNIX-like kernels, for example CPU usage, memory statistics, disk I/O statistics, and network statistics. (Node Exporter GitHub page)
MySQL Server Exporter is another common official exporter. It allows us to monitor and measure database performance, examine resource utilization, and so on. (MySQL Exporter GitHub page)
If no existing exporter meets our needs, we can write our own using the Prometheus client libraries.
Pushgateway
Occasionally, we need to monitor components that cannot be scraped. The Pushgateway addresses this case: such components push their metrics to the Pushgateway first, and Prometheus then periodically pulls the metrics from the Pushgateway.
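A push from a batch job can be sketched as follows, using only the Python standard library. The Pushgateway accepts metrics in the text exposition format at the path /metrics/job/<job_name>. The gateway address http://localhost:9091, the job name user_cleanup, the metric users_deleted_total, and the `build_push_request` helper are assumptions for this example, which also assumes a simple job name without slashes.

```python
from urllib.parse import quote
from urllib.request import Request


def build_push_request(gateway_url, job, metrics_text):
    """Prepare (but do not send) an HTTP request that pushes metrics for a job."""
    url = gateway_url.rstrip("/") + "/metrics/job/" + quote(job, safe="")
    return Request(
        url,
        data=metrics_text.encode("utf-8"),
        # PUT replaces all metrics previously pushed for this job grouping.
        method="PUT",
        headers={"Content-Type": "text/plain; version=0.0.4"},
    )


# Hypothetical result of a batch job that deleted 42 users.
payload = "users_deleted_total 42\n"
request = build_push_request("http://localhost:9091", "user_cleanup", payload)
print(request.get_method(), request.full_url)

# To actually send it, a Pushgateway must be reachable at that address:
# from urllib.request import urlopen
# urlopen(request)
```

Prometheus then scrapes the Pushgateway like any other target, so the batch job does not need to be running at scrape time.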
The official documentation states that "Usually, the only valid use case for the Pushgateway is for capturing the outcome of a service-level batch job". An example of a "service-level" batch job is deleting a number of users for an entire service. Such a job is discrete and is not related to a specific machine.
In conclusion, the Pushgateway is seldom used.
Alertmanager
Alerting rules are defined in rule files referenced from the Prometheus configuration, and Prometheus evaluates them periodically. When a rule's trigger condition is met, Prometheus pushes an alert to the Alertmanager.
The Alertmanager then notifies the administrator of abnormal events via email, PagerDuty, and other channels.
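The periodic evaluation described above can be modelled roughly in Python. The sketch assumes a rule with a threshold and a hold duration, similar in spirit to the "for" clause of a Prometheus alerting rule, and reuses the state names inactive, pending, and firing. The `AlertRule` class, the rule name, and the sample values are hypothetical, and the model is a simplification rather than the actual rule engine.

```python
class AlertRule:
    """Simplified alerting rule: fire when value > threshold for long enough."""

    def __init__(self, name, threshold, for_seconds):
        self.name = name
        self.threshold = threshold
        self.for_seconds = for_seconds
        self._active_since = None  # time at which the condition first held

    def evaluate(self, now, value):
        """Return the alert state after one evaluation cycle."""
        if value <= self.threshold:
            self._active_since = None
            return "inactive"
        if self._active_since is None:
            self._active_since = now
        if now - self._active_since >= self.for_seconds:
            # At this point Prometheus would push the alert to the Alertmanager.
            return "firing"
        return "pending"


rule = AlertRule("HighMemoryUsage", threshold=0.9, for_seconds=60)

# (timestamp in seconds, memory usage ratio), one entry per evaluation cycle.
samples = [(0, 0.5), (30, 0.95), (60, 0.97), (90, 0.96), (120, 0.4)]
states = [rule.evaluate(timestamp, value) for timestamp, value in samples]
print(states)
```

The hold duration keeps a brief spike from notifying anyone: the alert stays pending until the condition has held for the full 60 seconds.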
Expression Browser
The expression browser is available at /graph on the Prometheus server. It allows us to enter any PromQL query and see its result in a table or a graph.
Grafana
Grafana is a general-purpose visualization tool that can display data stored in many different data sources, including Prometheus.
Console Template
Prometheus includes simple built-in console templates that allow users to create their own console interfaces.
Service Discovery
In a cloud environment, there are no fixed monitoring targets, and nearly every monitored object changes dynamically. Thus, we cannot maintain a static list of every device to monitor.
The solution to this problem is to introduce an intermediate agent that knows all currently monitored targets.
Prometheus only needs to ask the agent which monitoring targets exist. This mechanism is called service discovery.
Example
In some cloud environments such as AWS, Prometheus can find all cloud hosts that need to be monitored by using the API provided by the platform.
In Kubernetes, the master node manages the information of all nodes. Thus, Prometheus only needs to interact with the master node to find all the containers and service objects that need to be monitored.
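One simple way to picture the agent's answer is the target-group format used by Prometheus's file-based and HTTP-based service discovery: a JSON list of objects, each with "targets" and "labels". The Python sketch below converts a list of discovered instances into that shape. The instance records, addresses, and the `to_target_groups` helper are hypothetical.

```python
import json


def to_target_groups(instances):
    """Group discovered instances by service into target groups."""
    groups = {}
    for instance in instances:
        address = f'{instance["host"]}:{instance["port"]}'
        groups.setdefault(instance["service"], []).append(address)
    return [
        {"targets": sorted(targets), "labels": {"service": service}}
        for service, targets in sorted(groups.items())
    ]


# Hypothetical answer from the intermediate agent at one point in time.
discovered = [
    {"service": "web", "host": "10.0.0.6", "port": 9100},
    {"service": "web", "host": "10.0.0.5", "port": 9100},
    {"service": "db", "host": "10.0.0.9", "port": 9104},
]

target_groups = to_target_groups(discovered)
print(json.dumps(target_groups, indent=2))
```

Regenerating this list whenever hosts appear or disappear lets the set of scrape targets follow the environment without manual edits.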
Pros
Prometheus is mainly used for event monitoring and alerting. It works particularly well for recording purely numeric time series data.
Prometheus is well suited to monitoring dynamic, service-oriented cloud environments such as Kubernetes.
Prometheus is highly reliable because each Prometheus server is a standalone monitoring system that does not depend on network storage or other remote services.
Cons
Prometheus does not offer durable long-term storage. Its local data storage is treated as ephemeral because Prometheus is mainly used for event monitoring and alerting.
Prometheus does not support logging. It is designed to collect and process metrics and is not an event logging system.
Conclusion
In our project, the Elastic Stack can be used for long-term data storage, monitoring, and data retrieval, while Prometheus can be used for short-term event monitoring and alerting.
Since Prometheus works well for monitoring cloud environments, it can be deployed into our Kubernetes environment to monitor the entire cluster.