# Prometheus
## Introduction
## Architecture Overview

### Prometheus Server
#### TSDB
Prometheus includes a **TSDB** (time series database), a database optimized for handling time series data. Specifically, Prometheus stores the samples that belong to the same metric and label set as one time series. Each sample has three parts: the metric (name and labels), a value, and a timestamp (in ms).
The **metric name** specifies the feature of the system being measured. For instance, the ```http_requests_total``` metric counts the total number of HTTP requests received.
A **label** distinguishes different dimensions of the same metric. For example, ```prometheus_http_request_total{method="Get"}``` counts all HTTP GET requests, while ```prometheus_http_request_total{method="Post"}``` is a separate time series that counts POST requests.
The **timestamp** records when the sample was taken and is stored as a 64-bit integer in milliseconds; the sample value itself is a 64-bit float.
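The data model described above can be sketched in a few lines of Go. This is only an illustration, not Prometheus's internal representation: the names `Sample` and `seriesKey` are hypothetical, and the sketch assumes millisecond timestamps held as 64-bit integers and values held as 64-bit floats.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Sample is one data point of a series (hypothetical type for illustration).
type Sample struct {
	TimestampMs int64   // milliseconds since the Unix epoch
	Value       float64 // measured value
}

// seriesKey builds a canonical identifier from a metric name and its labels.
// Labels are sorted so the same label set always yields the same key.
func seriesKey(name string, labels map[string]string) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	pairs := make([]string, 0, len(keys))
	for _, k := range keys {
		pairs = append(pairs, fmt.Sprintf("%s=%q", k, labels[k]))
	}
	return name + "{" + strings.Join(pairs, ",") + "}"
}

func main() {
	// Each distinct name + label set maps to its own list of samples.
	store := map[string][]Sample{}
	get := seriesKey("http_requests_total", map[string]string{"method": "GET"})
	post := seriesKey("http_requests_total", map[string]string{"method": "POST"})

	store[get] = append(store[get], Sample{TimestampMs: 1435781451781, Value: 10})
	store[get] = append(store[get], Sample{TimestampMs: 1435781466781, Value: 12})
	store[post] = append(store[post], Sample{TimestampMs: 1435781451781, Value: 3})

	fmt.Println(len(store), "series;", get, "has", len(store[get]), "samples")
}
```

Changing any label value produces a different key, which is why the GET and POST counters end up as two separate series.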
The Prometheus client libraries support four metric types:
1. Counter: A cumulative metric that only increases (or resets to zero on restart), such as the number of HTTP GET requests served.
2. Gauge: A single value that can go up or down at any time, such as memory usage.
3. Histogram: Samples observations (such as request durations) and counts them in configurable buckets, together with their sum and count.
4. Summary: Similar to a histogram, but it calculates configurable quantiles on the client side over a sliding time window.
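To make the difference between the types concrete, here is a minimal standard-library sketch of counter, gauge, and histogram semantics. It is not the API of the official client library; the types `Counter`, `Gauge`, and `Histogram` below are simplified stand-ins written for this note, and the summary type is left out for brevity.

```go
package main

import "fmt"

// Counter only ever increases (simplified stand-in).
type Counter struct{ value float64 }

func (c *Counter) Inc() { c.value++ }

// Gauge can be set to any value, so it can go up or down.
type Gauge struct{ value float64 }

func (g *Gauge) Set(v float64) { g.value = v }

// Histogram counts observations in cumulative buckets and tracks sum and count.
type Histogram struct {
	upperBounds []float64 // sorted bucket upper bounds
	counts      []uint64  // counts[i] = observations <= upperBounds[i]
	sum         float64
	count       uint64
}

func NewHistogram(bounds ...float64) *Histogram {
	return &Histogram{upperBounds: bounds, counts: make([]uint64, len(bounds))}
}

// Observe records one value in every bucket whose upper bound covers it.
func (h *Histogram) Observe(v float64) {
	for i, ub := range h.upperBounds {
		if v <= ub {
			h.counts[i]++
		}
	}
	h.sum += v
	h.count++
}

func main() {
	var requests Counter
	requests.Inc()
	requests.Inc()

	var memoryMB Gauge
	memoryMB.Set(512)
	memoryMB.Set(256) // a gauge may decrease

	latency := NewHistogram(0.1, 0.5, 1)
	for _, seconds := range []float64{0.05, 0.3, 0.7, 2} {
		latency.Observe(seconds)
	}

	fmt.Println("requests:", requests.value)
	fmt.Println("memory:", memoryMB.value)
	fmt.Println("latency buckets:", latency.counts, "total:", latency.count)
}
```

The observation of 2 seconds falls outside every configured bucket, so it shows up only in the total count and the sum.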
#### PromQL
**PromQL** (Prometheus Query Language) is the query language provided by Prometheus. It allows the user to select, examine, and aggregate time series data.
#### Retrieval
Prometheus mainly collects metrics by pulling them over HTTP. Data can also be pushed through the **Push Gateway**, although this is not commonly used.
### HTTP Server
The Prometheus server provides an HTTP API that lets us query the database.
Example query:
```
up
```
We can use wget or curl to query the Prometheus server:
```bash
curl 'http://localhost:9090/api/v1/query?query=up'
```
The Prometheus server will then return the result in JSON format:
```json
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {
          "__name__": "up",
          "job": "prometheus",
          "instance": "localhost:9090"
        },
        "value": [1435781451.781, "1"]
      },
      {
        "metric": {
          "__name__": "up",
          "job": "node",
          "instance": "localhost:9100"
        },
        "value": [1435781451.781, "0"]
      }
    ]
  }
}
```
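The response above can also be decoded programmatically. The following is a minimal sketch using only the Go standard library; the struct `queryResponse` and the helper `parseUp` are names made up for this note, and the sketch handles only the instant-vector shape shown in the example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// queryResponse mirrors the fields of the instant-vector response shown above.
type queryResponse struct {
	Status string `json:"status"`
	Data   struct {
		ResultType string `json:"resultType"`
		Result     []struct {
			Metric map[string]string `json:"metric"`
			Value  [2]interface{}    `json:"value"` // [unix seconds, value as string]
		} `json:"result"`
	} `json:"data"`
}

// parseUp returns a map from instance to sample value for each result entry.
func parseUp(body []byte) (map[string]string, error) {
	var resp queryResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, err
	}
	if resp.Status != "success" {
		return nil, fmt.Errorf("query failed: status %q", resp.Status)
	}
	out := make(map[string]string)
	for _, r := range resp.Data.Result {
		v, ok := r.Value[1].(string)
		if !ok {
			return nil, fmt.Errorf("unexpected sample value %v", r.Value[1])
		}
		out[r.Metric["instance"]] = v
	}
	return out, nil
}

func main() {
	body := []byte(`{"status":"success","data":{"resultType":"vector","result":[
	  {"metric":{"__name__":"up","job":"prometheus","instance":"localhost:9090"},"value":[1435781451.781,"1"]},
	  {"metric":{"__name__":"up","job":"node","instance":"localhost:9100"},"value":[1435781451.781,"0"]}]}}`)
	states, err := parseUp(body)
	if err != nil {
		panic(err)
	}
	for instance, up := range states {
		fmt.Printf("%s up=%s\n", instance, up)
	}
}
```

Note that the sample value arrives as a JSON string (`"1"` or `"0"`), not as a number, so it has to be converted explicitly if arithmetic is needed.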
### Jobs/exporters
Jobs/exporters expose the metrics of third-party services to the Prometheus server once the exporter is installed on the monitored target. Prometheus retrieves metrics by periodically scraping the monitored target's HTTP endpoint. For instance, the Node Exporter exposes hardware and OS metrics provided by \*NIX kernels.
#### Exporter Sample code
```go
// cpuCollector holds the metric descriptors and state of the CPU collector.
type cpuCollector struct {
	fs                 procfs.FS
	cpu                *prometheus.Desc
	cpuInfo            *prometheus.Desc
	cpuGuest           *prometheus.Desc
	cpuCoreThrottle    *prometheus.Desc
	cpuPackageThrottle *prometheus.Desc
	logger             log.Logger
	cpuStats           []procfs.CPUStat
	cpuStatsMutex      sync.Mutex
}

// NewCPUCollector creates a cpuCollector and defines one descriptor
// (name, help text, label names) for each metric it exposes.
func NewCPUCollector(logger log.Logger) (Collector, error) {
	fs, err := procfs.NewFS(*procPath)
	if err != nil {
		return nil, fmt.Errorf("failed to open procfs: %w", err)
	}
	return &cpuCollector{
		fs: fs,
		cpu: prometheus.NewDesc(
			prometheus.BuildFQName(namespace, cpuCollectorSubsystem, "seconds_total"),
			"Seconds the cpus spent in each mode.",
			[]string{"cpu", "mode"}, nil,
		),
		cpuInfo: prometheus.NewDesc(
			prometheus.BuildFQName(namespace, cpuCollectorSubsystem, "info"),
			"CPU information from /proc/cpuinfo.",
			[]string{"package", "core", "cpu", "vendor", "family", "model", "model_name", "microcode", "stepping", "cachesize"}, nil,
		),
		cpuGuest: prometheus.NewDesc(
			prometheus.BuildFQName(namespace, cpuCollectorSubsystem, "guest_seconds_total"),
			"Seconds the cpus spent in guests (VMs) for each mode.",
			[]string{"cpu", "mode"}, nil,
		),
		cpuCoreThrottle: prometheus.NewDesc(
			prometheus.BuildFQName(namespace, cpuCollectorSubsystem, "core_throttles_total"),
			"Number of times this cpu core has been throttled.",
			[]string{"package", "core"}, nil,
		),
		cpuPackageThrottle: prometheus.NewDesc(
			prometheus.BuildFQName(namespace, cpuCollectorSubsystem, "package_throttles_total"),
			"Number of times this cpu package has been throttled.",
			[]string{"package"}, nil,
		),
		logger: logger,
	}, nil
}

// Update collects the current CPU metrics and sends them on ch.
func (c *cpuCollector) Update(ch chan<- prometheus.Metric) error {
	if *enableCPUInfo {
		if err := c.updateInfo(ch); err != nil {
			return err
		}
	}
	if err := c.updateStat(ch); err != nil {
		return err
	}
	if err := c.updateThermalThrottle(ch); err != nil {
		return err
	}
	return nil
}

// ...

// Register the collector when the package is loaded.
func init() {
	registerCollector("cpu", defaultEnabled, NewCPUCollector)
}
```
### Alertmanager
Once alerting rules are defined in Prometheus, it evaluates them periodically. If a rule's trigger condition is met, Prometheus pushes an alert to the Alertmanager. The Alertmanager can then notify the administrator of abnormal events via email, PagerDuty, etc. (Not used in this lab.)
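The evaluate-then-notify flow can be sketched as follows. This is a deliberately simplified model, not Prometheus's rule engine: real alerting rules are written in PromQL inside rule files, whereas here a rule is a plain Go function, and the names `Rule`, `Alert`, and `evaluate` are hypothetical.

```go
package main

import "fmt"

// Alert is what would be handed to the Alertmanager (hypothetical type).
type Alert struct {
	Name  string
	Value float64
}

// Rule pairs an alert name with a trigger condition on a metric value.
type Rule struct {
	Name      string
	Condition func(value float64) bool
}

// evaluate checks every rule against the current value and returns the alerts
// whose condition holds. Prometheus repeats such an evaluation periodically.
func evaluate(rules []Rule, value float64) []Alert {
	var alerts []Alert
	for _, r := range rules {
		if r.Condition(value) {
			alerts = append(alerts, Alert{Name: r.Name, Value: value})
		}
	}
	return alerts
}

func main() {
	rules := []Rule{
		{Name: "HighMemoryUsage", Condition: func(v float64) bool { return v > 0.9 }},
	}
	// Three hypothetical evaluation cycles with different memory usage ratios.
	for _, usage := range []float64{0.42, 0.95, 0.60} {
		for _, a := range evaluate(rules, usage) {
			fmt.Printf("send to Alertmanager: %s (value=%.2f)\n", a.Name, a.Value)
		}
	}
}
```

Only the second cycle exceeds the threshold, so only one alert is produced; routing it to email or another receiver is the Alertmanager's job, not the rule's.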
### Service Discovery
In a cloud environment there are no fixed monitoring targets: nearly every monitored object changes dynamically, so Prometheus cannot rely on a static list of targets. The solution is to introduce an intermediate agent that knows all current monitoring targets. Prometheus only needs to ask the agent which targets exist. This mechanism is called **service discovery**.
For instance, Kubernetes manages all container and service information. Thus, Prometheus only needs to query Kubernetes to find all the containers and services that need to be monitored.
(Reference)
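The idea of asking an intermediate agent for the current targets can be sketched like this. It is an illustrative model only: the `Discoverer` interface, the `fakeRegistry` type, and the addresses are invented for this note and do not correspond to Prometheus's actual service discovery code or to the Kubernetes API.

```go
package main

import (
	"fmt"
	"sort"
)

// Discoverer is the intermediate agent: it knows the current targets.
type Discoverer interface {
	Targets() []string
}

// fakeRegistry stands in for a system such as Kubernetes (hypothetical).
type fakeRegistry struct {
	pods map[string]string // pod name -> scrape address
}

func (r *fakeRegistry) Targets() []string {
	addrs := make([]string, 0, len(r.pods))
	for _, addr := range r.pods {
		addrs = append(addrs, addr)
	}
	sort.Strings(addrs)
	return addrs
}

// refresh asks the discoverer for the current targets and reports which
// ones appeared or disappeared since the previous cycle.
func refresh(prev []string, d Discoverer) (current, added, removed []string) {
	current = d.Targets()
	seen := make(map[string]bool, len(prev))
	for _, t := range prev {
		seen[t] = true
	}
	for _, t := range current {
		if !seen[t] {
			added = append(added, t)
		}
		delete(seen, t)
	}
	for _, t := range prev {
		if seen[t] {
			removed = append(removed, t)
		}
	}
	return current, added, removed
}

func main() {
	reg := &fakeRegistry{pods: map[string]string{
		"web-1": "10.0.0.1:8080",
		"web-2": "10.0.0.2:8080",
	}}
	targets, added, removed := refresh(nil, reg)
	fmt.Println("targets:", targets, "added:", added, "removed:", removed)

	// The environment changes: one pod disappears and another one starts.
	delete(reg.pods, "web-1")
	reg.pods["web-3"] = "10.0.0.3:8080"
	targets, added, removed = refresh(targets, reg)
	fmt.Println("targets:", targets, "added:", added, "removed:", removed)
}
```

The scraper never holds a hard-coded target list; it simply re-asks the registry on each cycle and adjusts.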
### Push Gateway
The Prometheus Pushgateway allows batch jobs to expose their metrics to Prometheus. Since these jobs may not live long enough to be scraped, they push their metrics to a Pushgateway instead, and the Pushgateway then exposes these metrics to Prometheus. (Not used in this lab.)
### Consoles and Dashboards
#### Prometheus Console Template
Prometheus includes a simple built-in console that lets users create custom console pages with the Go template language and serves them through the Prometheus server.
#### Grafana
Grafana is a general-purpose visualization tool that can display data stored in many different databases, including Prometheus.
## Demo
Demo
## Monitor Kubernetes with Prometheus
###