Note
For monitoring and observability (MnO) of a Kubernetes cluster, you nowadays have many options to handle this configuration, such as
Each tool serves a different purpose, such as
Most components of the MnO system are open-source projects that provide Helm charts, which make them easy to set up on your cluster. You can check them out on
Tip
You have multiple ways to apply these charts to your cluster. Ideally, you can use Helm with Terraform through the Helm provider's helm_release resource, which is truly impressive.
For example, you can configure Loki like this:
loki/main.tf
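A minimal sketch of what loki/main.tf could contain, assuming the Helm provider 2.x, cluster credentials in the default kubeconfig (~/.kube/config), and the loki chart from the public Grafana Helm repository. The release name and the values.yaml file next to main.tf are hypothetical; chart values vary between Loki chart versions, so they are kept in a separate file rather than guessed here.

```hcl
terraform {
  required_providers {
    helm = {
      source  = "hashicorp/helm"
      version = "~> 2.12"
    }
  }
}

# Assumption: the AKS credentials are stored in the default kubeconfig.
provider "helm" {
  kubernetes {
    config_path = "~/.kube/config"
  }
}

# Installs the grafana/loki chart into the "monitoring" namespace.
resource "helm_release" "loki" {
  name             = "loki"
  repository       = "https://grafana.github.io/helm-charts"
  chart            = "loki"
  namespace        = "monitoring"
  create_namespace = true

  # Hypothetical values file holding your Loki overrides (storage, retention, ...).
  values = [file("${path.module}/values.yaml")]
}
```

Running terraform init and then terraform apply in that folder would create the release.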
After you apply the Terraform configuration, your MnO components are released into the monitoring namespace.
You can apply the same idea to Grafana, Prometheus, Promtail, Tempo and Pyroscope:
kube_stack_prometheus/main.tf
promtail/main.tf
pyroscope/main.tf
tempo/main.tf
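The files above keep one main.tf per component. As an alternative pattern (swapped in here, not the layout used above), a single helm_release with for_each can install the remaining charts with identical settings. This sketch assumes an already configured Helm provider and hypothetical per-component values files under values/; the repository URLs and chart names are those of the public Grafana and prometheus-community Helm repositories.

```hcl
# Remaining MnO components, keyed by release name.
locals {
  mno_charts = {
    "grafana" = {
      repository = "https://grafana.github.io/helm-charts"
      chart      = "grafana"
    }
    "kube-prometheus-stack" = {
      repository = "https://prometheus-community.github.io/helm-charts"
      chart      = "kube-prometheus-stack"
    }
    "promtail" = {
      repository = "https://grafana.github.io/helm-charts"
      chart      = "promtail"
    }
    "tempo" = {
      repository = "https://grafana.github.io/helm-charts"
      chart      = "tempo"
    }
    "pyroscope" = {
      repository = "https://grafana.github.io/helm-charts"
      chart      = "pyroscope"
    }
  }
}

# One release per map entry, all in the "monitoring" namespace.
resource "helm_release" "mno" {
  for_each = local.mno_charts

  name             = each.key
  repository       = each.value.repository
  chart            = each.value.chart
  namespace        = "monitoring"
  create_namespace = true

  # Hypothetical layout: values/grafana.yaml, values/tempo.yaml, ...
  values = [file("${path.module}/values/${each.key}.yaml")]
}
```

Separate main.tf files isolate each component's state and changes; the for_each form keeps the shared settings in one place. Either way, every release ends up in the monitoring namespace.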
If you want to dive into Grafana, you will face many topics to learn, including
Note
Therefore, Grafana is a good option when you want to build the whole MnO system for free. You can also use Grafana Cloud for enterprise, or a managed offering from Azure/AWS, but it's up to you :+1:
If you want to set up a full set of dashboards for AKS, consider implementing these dashboards inside your cluster:
Grafana provides a PostgreSQL datasource that, once granted permission, can query performance metrics from PostgreSQL, so you can leverage it and create a couple of dashboards for yourself. Explore at Integration Performance Query for MySQL or PostgreSQL.
Prometheus supports exporters, so you can install an exporter to expose metrics from your MongoDB cluster; Prometheus scrapes them and you visualize them in Grafana. Explore at mongodb_exporter.
As with MongoDB, you can install an exporter for RabbitMQ. Explore at Monitoring with Prometheus and Grafana.
As with MongoDB and RabbitMQ, you can install an exporter for Elasticsearch. Explore at elasticsearch_exporter.
Redis works the same way. Explore at redis_exporter.
If you want to dive into the alerting system with Grafana, don't forget to check out my blog post Deploy your alert with Grafana by Terraform. Here are some common errors with K8s:
Error: Occurs when storage has problems with any component.
Note
All components in the MnO system already store their data on Azure Disk, and each disk name contains the name of its service.
Troubleshoot: Check the component on the AKS portal dashboard. Is it attached to an Azure Disk? Does the Azure Disk for the service exist? Can you attach one to your service through the Helm chart values?
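For the last question, a minimal sketch of requesting persistent storage through chart values, using Grafana as the example. It assumes an already configured Helm provider, the persistence values of the grafana/grafana chart, and that managed-csi is an Azure Disk storage class available in your AKS cluster; the 10Gi size is an illustrative choice.

```hcl
# Grafana release whose chart values ask for a PVC backed by an Azure Disk.
resource "helm_release" "grafana" {
  name             = "grafana"
  repository       = "https://grafana.github.io/helm-charts"
  chart            = "grafana"
  namespace        = "monitoring"
  create_namespace = true

  values = [
    yamlencode({
      persistence = {
        enabled          = true          # create a PersistentVolumeClaim
        storageClassName = "managed-csi" # assumption: AKS Azure Disk CSI class
        size             = "10Gi"        # illustrative size
      }
    })
  ]
}
```

After applying, the PersistentVolumeClaim and its Azure Disk should appear next to the Grafana pod in the AKS portal.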
Error: Prometheus is restarting or not currently running.
Troubleshoot: Wait 30 seconds for Prometheus to restart, then query the dashboard again. If that does not help, use kubectl commands to check the status of Prometheus.
Error: Occurs when you use wrong queries or the datasource does not respond.
Troubleshoot: Check your queries again. If they are correct, use kubectl to look up what is happening with the datasource you want to query (e.g. Loki, Tempo,…).
Error: Occurs when the query time range is set too large, or the logs stored in that time range are out of range.
Troubleshoot: Reduce the time range and choose it more specifically to narrow down to the exact logs you want to check.
Error: Occurs when an exporter restarts or fails, or DNS scraping is not working.
Troubleshoot:
Check the exporter pod and restart it if its state is failed.
Check the helm-chart values: the DNS of the exporter must match the pattern <name-of-service>.<namespace>.svc:<port-service>.
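A minimal sketch of a scrape target that follows this DNS pattern, passed to the kube-prometheus-stack chart through its prometheus.prometheusSpec.additionalScrapeConfigs value. The service name mongodb-exporter and the monitoring namespace are hypothetical; 9216 is the default listen port of mongodb_exporter. It also assumes an already configured Helm provider.

```hcl
resource "helm_release" "kube_prometheus_stack" {
  name             = "kube-prometheus-stack"
  repository       = "https://prometheus-community.github.io/helm-charts"
  chart            = "kube-prometheus-stack"
  namespace        = "monitoring"
  create_namespace = true

  values = [
    yamlencode({
      prometheus = {
        prometheusSpec = {
          # Extra static scrape jobs appended to the Prometheus configuration.
          additionalScrapeConfigs = [
            {
              job_name = "mongodb-exporter"
              static_configs = [
                {
                  # <name-of-service>.<namespace>.svc:<port-service>
                  targets = ["mongodb-exporter.monitoring.svc:9216"]
                }
              ]
            }
          ]
        }
      }
    })
  ]
}
```

If the target shows as down on the Prometheus targets page, compare the service name, namespace and port in this string with the exporter's actual Kubernetes Service.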
NULL information
Error: Occurs when your alert spams notifications because the Loki query is over length.
Troubleshoot:
Use silence mode for the spamming alert.
Silence the alerts and ignore them.
Note
In this topic, you will learn about