---
title: Monitoring and Alert
tags: SRE
image:
---
<style>
.prometheus-color {
color: #E8512C;
}
.grafana-color {
color: #FBD123;
}
</style>
# Monitoring and Alert
:::info
==**Benefits**==
* **Continuous Monitoring & Observability**
    * Performance Monitoring & Health
    * Failure Detection & Prevention
* **Continuous Feedback**
    * Alerts & Notifications
    * Ensure SLAs and enhance the user experience
:::
[toc]

# <span class="prometheus-color">Prometheus server</span>
:::info
* [Prometheus](https://prometheus.io/docs/introduction/overview/) is an open-source systems monitoring and alerting toolkit originally built at SoundCloud.
* Prometheus is a metrics system, not an event logging system. For log aggregation, or to derive metrics from application logs, use a tool built for logs such as [Grafana Loki](https://grafana.com/oss/loki/).
:::
**Why do we need Prometheus?**

> Installation [Download](https://prometheus.io/download/)
```
...
...
$ prometheus --version
```
## <span class="prometheus-color">Prometheus web UI</span>
## <span class="grafana-color">Grafana</span>
:::info
[Grafana](https://grafana.com/) is the open source analytics & monitoring solution for every database.
:::
```
# Print the latest stable Grafana version number (requires jq)
curl https://raw.githubusercontent.com/grafana/grafana/main/latest.json | jq -r ".stable"
```
## Jobs / exporters
* Center
|Name|Service name|Port|Alert|
|--|--|--|--|
|Prometheus|prometheus|9090|-|
|Node Exporter|node_exporter|9100|Host High CPU Load <br> Host Out of Memory <br> Node disk will fill in 24 hours <br> Device not responding|
|AlertManager|alertmanager|9093, 9094|-|
|AlertManagerDiscord|prometheus-alertmanager-discord|9095|-|
|Ping Exporter|prometheus-ping-exporter|9427|Network Disconnect|
|Mosquitto Exporter|prometheus-mosquitto-exporter|9234|-|
|Systemd Exporter|prometheus-systemd-exporter|9558|-|
|Grafana|grafana-server|3000|-|
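
The Node Exporter alerts named in the table above could be written as Prometheus alerting rules roughly as follows. This is a minimal sketch, not the rules actually deployed here: the expressions follow commonly used node_exporter patterns, and the group name, thresholds, `for` durations, severities, and the `job="node_exporter"` label are all assumptions.

```yaml
# Hypothetical rules file; load it through rule_files in prometheus.yml.
groups:
  - name: node_exporter_alerts        # assumed group name
    rules:
      - alert: HostHighCpuLoad
        # Busy CPU percentage per instance; the 80% threshold is an assumption.
        expr: 100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Host high CPU load ({{ $labels.instance }})"

      - alert: HostOutOfMemory
        # Less than 10% of memory available; threshold is an assumption.
        expr: node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes * 100 < 10
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "Host out of memory ({{ $labels.instance }})"

      - alert: NodeDiskWillFillIn24Hours
        # Linear extrapolation of free space over the next 24 hours.
        expr: predict_linear(node_filesystem_avail_bytes{fstype!~"tmpfs"}[1h], 24 * 3600) < 0
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Disk will fill within 24 hours ({{ $labels.instance }})"

      - alert: DeviceNotResponding
        # The scrape target is down; the job label value is an assumption.
        expr: up{job="node_exporter"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Device not responding ({{ $labels.instance }})"
```

A rules file like this can be validated with `promtool check rules <file>` before reloading Prometheus.
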
* Gateway / Media Player
|Name|Port|
|--|--|
|Node Exporter|9100|
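
The ports in the tables above map onto scrape targets in `prometheus.yml`. The following is a minimal sketch under stated assumptions: the job names, the scrape interval, the rules path, and the gateway and media player hostnames are hypothetical placeholders, and all Center services are assumed to run on the same host as Prometheus.

```yaml
global:
  scrape_interval: 15s                  # assumed interval

rule_files:
  - /etc/prometheus/rules/*.yml         # hypothetical path to alerting rules

alerting:
  alertmanagers:
    - static_configs:
        - targets: ['localhost:9093']   # Alertmanager port from the table

scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ['localhost:9090']

  - job_name: node_exporter
    static_configs:
      - targets:
          - 'localhost:9100'                 # Center
          - 'gateway-1.example:9100'         # hypothetical gateway host
          - 'media-player-1.example:9100'    # hypothetical media player host

  - job_name: ping_exporter
    static_configs:
      - targets: ['localhost:9427']

  - job_name: mosquitto_exporter
    static_configs:
      - targets: ['localhost:9234']

  - job_name: systemd_exporter
    static_configs:
      - targets: ['localhost:9558']
```

The file can be checked with `promtool check config prometheus.yml` before restarting the service.
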
## Ping Exporter
### Alert List
* GW (Gateway) 1 - 4
* MP (Media Player) 1 - 4
* MQTT Bridge
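
A "Network Disconnect" alert for the targets listed above might look like the rule below. This is a sketch only: it assumes the exporter exposes a `ping_loss_ratio` metric with a `target` label (some older releases expose `ping_loss_percent` instead, so check the exporter's `/metrics` output first), and the group name and the one-minute duration are assumptions as well.

```yaml
groups:
  - name: ping_exporter_alerts          # assumed group name
    rules:
      - alert: NetworkDisconnect
        # Fires when every ping to a target has been lost for one minute.
        # ping_loss_ratio is an assumed metric name; verify it on port 9427.
        expr: ping_loss_ratio == 1
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Network disconnect ({{ $labels.target }})"
```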
## <span class="prometheus-color">Alertmanager</span>
:::info
The Alertmanager handles alerts sent by client applications such as the Prometheus server.
:::
* Config - `/etc/prometheus/alertmanager/alertmanager.yml`
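
For orientation, a minimal `alertmanager.yml` that forwards every alert to a local webhook receiver could look like the sketch below. It is not the deployed configuration: the receiver name, grouping, and timing values are assumptions, and the URL assumes the Discord bridge listens on `localhost:9095` (the port listed for AlertManagerDiscord) and accepts webhooks at the root path.

```yaml
route:
  receiver: discord                     # assumed receiver name
  group_by: ['alertname']
  group_wait: 30s                       # assumed timings
  group_interval: 5m
  repeat_interval: 4h

receivers:
  - name: discord
    webhook_configs:
      - url: 'http://localhost:9095'    # assumed bridge address and path
        send_resolved: true
```

The configuration can be validated with `amtool check-config <file>` before reloading Alertmanager.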
### Discord
* Config
    * Debian - `/etc/systemd/system/prometheus-alertmanager-discord.service`
    * Ubuntu - `/lib/systemd/system/prometheus-alertmanager-discord.service`
## Glossary
* ``Metrics``: Numeric measurements recorded over time, each with a timestamp
* ``SLA``: Service Level Agreement
* ``Targets``: The endpoints that Prometheus scrapes for metrics, i.e. the things being monitored
## References
* https://hackmd.io/GJtTdYBsQQuBlmyqCeLO9Q
* [Prometheus Intro](https://prometheus.io/docs/introduction/overview/)
* [Prometheus Architecture Explained](https://scoutapm.com/blog/prometheus-architecture)
* [Can Prometheus be made highly available?](https://prometheus.io/docs/introduction/faq/#can-prometheus-be-made-highly-available)
* [Why do you pull rather than push?](https://prometheus.io/docs/introduction/faq/#why-do-you-pull-rather-than-push)
* [Timezone support in the display layer #500](https://github.com/prometheus/prometheus/issues/500)