# Monitoring with ELK Stack

## Goals
An ELK deployment has multiple benefits, but in this document we look at ELK as a tool for operational monitoring of a [12-factor app](https://12factor.net/). The [eleventh factor](https://12factor.net/logs) of a 12-factor app is to write logs, unbuffered, to stdout. During local development, the developer views this stream in the foreground of their terminal to observe the app's behavior.
In staging or production deploys, each process' stream is captured by the execution environment, collated together with all other streams from the app, and routed to one or more final destinations for viewing and long-term archival. These archival destinations are not visible to or configurable by the app; they are managed entirely by the execution environment. Open-source log routers (in our case Filebeat) are available for this purpose.
## Approach

The log ingestion & visualization workflow is the following:
1. Application pods in Kubernetes log to stdout
2. Filebeat collects the logs and ships them to Logstash (see the configuration sketch after this list)
3. Logstash processes the logs and sends them to Elasticsearch
4. Elasticsearch indexes and stores the data
5. Kibana is used for visualization / dashboarding / querying of the data collected, transformed and indexed in Elasticsearch
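
To make the hand-off points in this chain concrete, below is a minimal, hedged sketch of a `filebeat.yml` that reads container logs on each node and ships them to Logstash. The Service name `logstash` and port `5044` are assumptions that must match the Logstash Service actually deployed, and the `container` input shown here is the newer (7.x+) equivalent of the `docker` prospector used in the 6.x PoC.

```yaml
# filebeat.yml sketch - host names and ports are assumptions, adjust to the deployed Services
filebeat.inputs:
  - type: container                    # read the container log files on the node
    paths:
      - /var/log/containers/*.log      # mounted from the host by the Filebeat DaemonSet
    processors:
      - add_kubernetes_metadata:       # enrich events with pod / namespace / label metadata
          host: ${NODE_NAME}           # NODE_NAME is expected to be injected by the DaemonSet (Downward API)
          matchers:
            - logs_path:
                logs_path: "/var/log/containers/"

output.logstash:
  hosts: ["logstash:5044"]             # hand-off to Logstash (see the pipeline sketch in the PoC section)
```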
## PoC
In the PoC, the following was covered:
1. Provisioning the ELK stack in k8s
   - Elasticsearch
     - Service account, cluster role & cluster role binding for the RBAC implementation
     - Service for networking
     - StatefulSet for running Elasticsearch
     - **Note: there is an init container (running Alpine Linux) that sets a kernel parameter. Setting `vm.max_map_count=262144` via `/sbin/sysctl` is mandatory** (see the sketch below).
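
       A minimal, hedged sketch of such an init container in the StatefulSet pod template; only the `initContainers` fragment is shown and the image tag is an assumption (the PoC only states Alpine Linux):

       ```yaml
       # Fragment of the Elasticsearch StatefulSet pod template - illustrative only
       initContainers:
         - name: init-sysctl
           image: alpine:3.18              # assumed tag
           command: ["/sbin/sysctl", "-w", "vm.max_map_count=262144"]
           securityContext:
             privileged: true              # needed to change the kernel parameter on the node
       ```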
   - Filebeat
     - Service account, cluster role & cluster role binding for the RBAC implementation
     - ConfigMap with the Filebeat options and prospectors / inputs. **This configuration file is the central piece for the data collection methods and critical for our needs.**
     - DaemonSet - apart from running the Filebeat instances, it is critical here that the ConfigMap with the Filebeat options & prospectors / inputs is mounted correctly. Essentially, the Filebeat configuration is laid out as follows (a sketch of the `kubernetes.yml` prospector follows the layout):
       ```
       /usr/share/filebeat/
         filebeat.yml
         prospectors.d/
           kubernetes.yml
           [PROSPECTOR].yml
         modules.d/
           [MODULE].yml
           [MODULE].yml.disabled
       ```
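
       For reference, a hedged sketch of what `prospectors.d/kubernetes.yml` contains, based on the stock 6.x Filebeat Kubernetes manifests (values are assumptions; note the `docker` type, which is also listed under known risks):

       ```yaml
       # prospectors.d/kubernetes.yml - sketch based on the stock 6.x manifests
       - type: docker                     # collects logs of the local Docker containers
         containers.ids:
           - "*"                          # all containers on the node
         processors:
           - add_kubernetes_metadata:     # enrich events with pod / namespace / label metadata
               in_cluster: true
       ```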
   - Kibana
     - Service for networking
     - Deployment for running the Kibana instance
     - IngressRoute for routing the Kibana traffic to the public internet
   - Logstash
     - Service for networking
     - ConfigMap with the Logstash endpoint and pipeline configs
     - Deployment for running the Logstash instance. It is critical here that the ConfigMap with the Logstash endpoint and pipeline options is mounted correctly (a ConfigMap sketch follows the layout below). The Logstash configuration is laid out as follows:
       ```
       /usr/share/logstash/
         config/
           logstash.yml
         pipeline/
           logstash.conf
       ```
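
       A hedged sketch of that ConfigMap; the beats port, the Elasticsearch Service name and the index pattern are assumptions, and the pipeline below does no parsing yet, it simply forwards events:

       ```yaml
       # Logstash ConfigMap sketch - names, ports and index pattern are assumptions
       apiVersion: v1
       kind: ConfigMap
       metadata:
         name: logstash-config
       data:
         logstash.yml: |
           http.host: "0.0.0.0"
           path.config: /usr/share/logstash/pipeline
         logstash.conf: |
           input {
             beats { port => 5044 }                     # Filebeat ships to this port
           }
           output {
             elasticsearch {
               hosts => ["http://elasticsearch:9200"]   # assumed Service name
               index => "logstash-%{+YYYY.MM.dd}"       # daily indices
             }
           }
       ```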
2. Traefik routing
   - IngressRoutes for the Dev, Test and Acceptance environments.
   - Routing is from the **default** to the **kube-system** namespace.
   - Routing is based on subdomains, not on URL paths (see the DNS entries below and the IngressRoute sketch after this list).
3. Configuring DNS entries in Azure for Dev, Test and Acceptance
   - dev-kibana.alkem.io
   - test-kibana.alkem.io
   - acc-kibana.alkem.io
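
As an illustration of the subdomain-based routing, below is a hedged sketch of a Traefik `IngressRoute` for the Dev environment. The entry point name and the Kibana Service name/port are assumptions and have to match the actual Traefik and Kibana deployments; namespace placement follows the default / kube-system note above.

```yaml
# IngressRoute sketch for dev - entry point and service names are assumptions
apiVersion: traefik.containo.us/v1alpha1 # traefik.io/v1alpha1 on newer Traefik releases
kind: IngressRoute
metadata:
  name: kibana-dev
spec:
  entryPoints:
    - websecure                            # assumed HTTPS entry point name
  routes:
    - match: Host(`dev-kibana.alkem.io`)   # subdomain-based routing, not path-based
      kind: Rule
      services:
        - name: kibana                     # assumed Kibana Service name
          port: 5601                       # default Kibana port
```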
## Progress
- [x] Cluster provisioned with all nodes of the stack operational
- [x] Routing configured so Kibana endpoints are accessible externally
- [x] Basic, indexable configuration operational on dev environment
- [ ] Log aggregation of Alkemio services
- [ ] Custom dashboards
- [ ] Advanced filtering - proper tagging of k8s resources so they can be indexed & filtered
- [ ] Multiple operational environments (AWS & Azure)
- [ ] Data persistence of logs
- [ ] Log rotation
- [ ] Enabled Filebeat modules for services in our stack (mysql, traefik, mariadb, postgres etc.)
- [ ] Protected Kibana endpoints
## Next steps
- [ ] Filebeat configuration to enable data ingestion from Alkemio services
- [ ] Logstash configuration for data indexing of Alkemio logs
- [ ] Creating tags for k8s resources so they are indexed
- [ ] Traefik, mysql, mariadb, postgres monitoring
- [ ] Data persistence of logs (see the sketch after this list)
- [ ] Setup on the Azure environment (currently a working example exists only on AWS)
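
For the data persistence item, the likely direction is to back the Elasticsearch StatefulSet with persistent volumes. A hedged sketch of the relevant fragment, where the storage class and size are assumptions rather than decisions:

```yaml
# Fragment of the Elasticsearch StatefulSet - illustrative values only
volumeClaimTemplates:
  - metadata:
      name: data                       # to be mounted by the container at /usr/share/elasticsearch/data
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: default        # assumed; differs between the Azure and AWS clusters
      resources:
        requests:
          storage: 30Gi                # assumed size
```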
## Known risks
- [ ] No Elasticsearch data is found at all on acc (Azure), although all services are operational
- [ ] Elasticsearch needs to be updated - the PoC runs version 6.8.4, while 8.1.2 is already available and some of the configuration files differ
- [ ] Data indexing requires know-how of elasticsearch specifics
- [ ] The `logstash*` index on dev appears and disappears without any service crashes
- [ ] Although the kubernetes.yml Filebeat prospector is named 'kubernetes', its type is 'docker'
## References
1. [Log aggregation using ELK stack](https://www.magalix.com/blog/kubernetes-observability-log-aggregation-using-elk-stack)
2. [12-factor app](https://12factor.net/logs)
3. [Filebeat prospectors](https://www.oreilly.com/library/view/learning-elastic-stack/9781787281868/b130c3aa-cfc6-49f0-be6a-5cc5865d0c39.xhtml)
4. [Filebeat inputs vs prospectors](https://discuss.elastic.co/t/error-setting-filebeat-config-prospectors-has-been-removed-after-upgrade-to-7-2/188489)
5. [Configuring filebeat inputs](https://www.elastic.co/guide/en/cloud/current/ec-getting-started-search-use-cases-node-logs.html)
6. [Configuring filebeat options](https://www.elastic.co/guide/en/beats/filebeat/current/configuration-filebeat-options.html)
7. [Deploy logstash and filebeat on kubernetes](https://raphaeldelio.medium.com/deploy-logstash-and-filebeat-on-kubernetes-with-eck-ssl-and-filebeat-d9f616737390)