Monitoring and Alert

---
title: Monitoring and Alert
tags: SRE
image:
---

<style>
.prometheus-color { color: #E8512C; }
.grafana-color { color: #FBD123; }
</style>

# Monitoring and Alert

:::info
==**Benefits**==
* **Continuous Monitoring & Observability**
    * Performance Monitoring & Health
    * Failure Detection & Prevention
* **Continuous Feedback**
    * Alerts & Notifications
    * Ensure SLAs & enhance user experience
:::

[toc]

![](https://i.imgur.com/NtIPlyl.png)

# Prometheus server

:::info
* [Prometheus](https://prometheus.io/docs/introduction/overview/) is an open-source systems monitoring and alerting toolkit originally built at SoundCloud.
* Prometheus is not an event logging system. If you want to extract Prometheus metrics from application logs, use [Grafana Loki](https://grafana.com/oss/loki/), which is designed for log aggregation.
:::

**Why do we need Prometheus?**

![](https://i.imgur.com/Rs4vXjf.png)

> Installation

[Download](https://prometheus.io/download/)

```
...
...
$ prometheus --version
```

## Prometheus web UI

## Grafana

:::info
[Grafana](https://grafana.com/) is the open source analytics & monitoring solution for every database.
:::

```
curl https://raw.githubusercontent.com/grafana/grafana/main/latest.json | jq -r ".stable"
```

## Jobs / exporters

* Center

|Name|Service name|Port|Alert|
|--|--|--|--|
|Prometheus|prometheus|9090|-|
|Node Exporter|node_exporter|9100|Host High CPU Load, Host Out of Memory, Node disk will fill in 24 hours, Device not responding|
|Alertmanager|alertmanager|9093, 9094|-|
|Alertmanager Discord|prometheus-alertmanager-discord|9095|-|
|Ping Exporter|prometheus-ping-exporter|9427|Network Disconnect|
|Mosquitto Exporter|prometheus-mosquitto-exporter|9234||
|Systemd Exporter|prometheus-systemd-exporter|9558||
|Grafana|grafana-server|3000||

* Gateway / Media Player

|Name|Port|
|--|--|
|Node Exporter|9100|

## Ping Exporter

### Alert List

* GW 1 - 4
* MP 1 - 4
* MQTT Bridge

## Alertmanager

:::info
The Alertmanager handles alerts sent by client applications such as the Prometheus server.
:::

* Config - `/etc/prometheus/alertmanager/alertmanager.yml`

### Discord

* Config
    * Debian - `/etc/systemd/system/prometheus-alertmanager-discord.service`
    * Ubuntu - `/lib/systemd/system/prometheus-alertmanager-discord.service`

## Glossary

* ``Metrics``: Numeric measurements recorded together with a timestamp
* ``SLA``: Service Level Agreement
* ``Targets``: A target is something to be monitored, such as a host or service endpoint that Prometheus scrapes

## References

* https://hackmd.io/GJtTdYBsQQuBlmyqCeLO9Q
* [Prometheus Intro](https://prometheus.io/docs/introduction/overview/)
* [Prometheus Architecture Explained](https://scoutapm.com/blog/prometheus-architecture)
* [Can Prometheus be made highly available?](https://prometheus.io/docs/introduction/faq/#can-prometheus-be-made-highly-available)
* [Why do you pull rather than push?](https://prometheus.io/docs/introduction/faq/#why-do-you-pull-rather-than-push)
* [Timezone support in the display layer #500](https://github.com/prometheus/prometheus/issues/500)
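
As a rough illustration of how the services listed under Jobs / exporters might be wired into Prometheus, the fragment below sketches a possible `prometheus.yml`. Only the ports come from the table in this note. The job names, the 15-second scrape interval, the assumption that all Center services run on `localhost`, and the `gw1` / `mp1` host names are hypothetical placeholders, not the actual deployment.

```yaml
# Hypothetical prometheus.yml sketch. Hosts, job names, and the interval
# are assumptions; only the port numbers are taken from the table above.
global:
  scrape_interval: 15s            # assumed value

alerting:
  alertmanagers:
    - static_configs:
        - targets: ["localhost:9093"]   # Alertmanager API port

scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ["localhost:9090"]

  - job_name: node
    static_configs:
      - targets:
          - "localhost:9100"      # Center
          - "gw1:9100"            # placeholder Gateway host
          - "mp1:9100"            # placeholder Media Player host

  - job_name: ping
    static_configs:
      - targets: ["localhost:9427"]

  - job_name: mosquitto
    static_configs:
      - targets: ["localhost:9234"]

  - job_name: systemd
    static_configs:
      - targets: ["localhost:9558"]
```

Ports 9094 and 9095 are left out of `scrape_configs` on purpose in this sketch: 9094 is assumed to be the Alertmanager cluster port, and 9095 is assumed to be the webhook endpoint that Alertmanager calls to forward alerts to Discord, so neither is treated as a scrape target here.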