In my current job I am a MarkLogic administrator, which is a fairly complex role. In the version we currently run, monitoring MarkLogic events is particularly important, as the vendor itself admits that its standard metrics system may report unrealistic data. A few months ago I decided to research the options for monitoring MarkLogic with independent (open-source) tools. The product is not widely known, so there is little to choose from, and in the end I decided to build my own stack of monitoring components.
Proper application monitoring should cover every event on the server where the application is hosted. Applications on UNIX-like systems typically use the system logging library to pass messages to syslog. To be sure the application is healthy, we should therefore collect all application events, system logs, and system metrics.
The main goal was to create a solution that consumes few system resources and integrates flexibly with other monitoring components. I have divided the description into two parts: in the first I describe the components for logs, and afterwards the components for monitoring system metrics.
To redirect log streams I chose Rsyslog, which has many advantages and is available on most UNIX distributions. I configured Rsyslog on every server running the application so that all system and application logs are forwarded to a central Rsyslog server. Our production environment is divided into several projects, so the logs arriving at the central server are already pre-filtered and sorted.
The central Rsyslog server is configured to listen on a specific port for messages from all Rsyslog clients. Each message is written to a file according to a template and sorted into folders by the hostname of the client that sent it. Writing to files is not strictly necessary; I could have redirected the logs directly to any application that indexes log streams. However, the initial segregation and persistence makes it easier to configure the remaining monitoring components and gives me many options for archiving old messages.
/etc/rsyslog.d/client-collector.conf
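The original file contents are not reproduced here. For illustration, a minimal client-side configuration along these lines might look like the following; the central server name and port are placeholders:

```conf
# Sketch only — hostname and port are placeholders for this setup.
# Queue messages on disk so nothing is lost if the central server is down.
$ActionQueueType LinkedList
$ActionQueueFileName fwdRule1
$ActionResumeRetryCount -1
$ActionQueueSaveOnShutdown on
# Forward all facilities and severities over TCP (@@ = TCP, @ = UDP)
*.* @@central-rsyslog.example.com:514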
/etc/rsyslog.d/central-collector.conf
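On the receiving side, a sketch of the central configuration could look like this; the listening port and directory layout are assumptions, not the exact production values:

```conf
# Sketch — port and paths are assumptions.
module(load="imtcp")
input(type="imtcp" port="514")

# Sort incoming messages into per-host directories by client hostname
template(name="PerHostFile" type="string"
         string="/var/log/remote/%HOSTNAME%/%PROGRAMNAME%.log")

if ($fromhost-ip != "127.0.0.1") then {
    action(type="omfile" dynaFile="PerHostFile")
    stop
}
```

The `dynaFile` parameter is what makes Rsyslog create the per-hostname folder structure described above on the fly.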
/etc/rsyslog.d/def_name.json
The logs preprocessed by Rsyslog are now on the central server, so the next step is to configure the search/indexing and visualization software. In my solution I used the Promtail-Loki-Grafana stack, configured as a cluster on Docker. The configuration runs three Loki instances behind an Nginx gateway that routes read and write load from the clients (Grafana, Promtail). The result is a much smoother-running front-end application.
docker-compose-ha.yaml
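A compose file for such a cluster might be sketched as below; the image tags, service names, and volume paths are my assumptions, not the original file:

```yaml
# Sketch — image tags, ports, and paths are assumptions.
version: "3.8"
services:
  loki-1: &loki
    image: grafana/loki:2.9.0
    command: -config.file=/etc/loki/loki-memberlist.yaml
    volumes:
      - ./loki-memberlist.yaml:/etc/loki/loki-memberlist.yaml
  loki-2: *loki   # YAML anchor: identical config for the other two instances
  loki-3: *loki
  gateway:
    image: nginx:alpine
    volumes:
      - ./nginx-loki.conf:/etc/nginx/nginx.conf:ro
    ports:
      - "3100:3100"
  promtail:
    image: grafana/promtail:2.9.0
    command: -config.file=/etc/promtail/promtail.yaml
    volumes:
      - ./promtail.yaml:/etc/promtail/promtail.yaml
      - /var/log/remote:/var/log/remote:ro
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
```

Promtail mounts the directory that the central Rsyslog server writes to, so the file-based handoff between the two stacks needs no extra plumbing.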
promtail.yaml
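A minimal Promtail configuration for this layout could look as follows; the paths and labels are placeholders chosen to match the per-host directory structure:

```yaml
# Sketch — paths, labels, and the gateway URL are assumptions.
server:
  http_listen_port: 9080
positions:
  filename: /tmp/positions.yaml
clients:
  # Push through the Nginx gateway instead of a single Loki instance
  - url: http://gateway:3100/loki/api/v1/push
scrape_configs:
  - job_name: rsyslog-files
    static_configs:
      - targets: [localhost]
        labels:
          job: syslog
          __path__: /var/log/remote/**/*.log
```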
nginx-loki.conf
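The gateway itself can be a plain round-robin reverse proxy; in this sketch the `loki-1`..`loki-3` hostnames are hypothetical container names:

```nginx
# Sketch — upstream hostnames and ports are assumptions.
worker_processes 1;
events { worker_connections 1024; }
http {
  upstream loki {
    server loki-1:3100;
    server loki-2:3100;
    server loki-3:3100;
  }
  server {
    listen 3100;
    location / {
      proxy_pass http://loki;
    }
  }
}
```

Because both Grafana (reads) and Promtail (writes) talk to this one endpoint, load is spread across all three Loki instances transparently.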
loki-memberlist.yaml
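For the Loki instances to form a ring, each needs a memberlist configuration. A rough sketch, with ports, schema dates, and storage paths as assumptions:

```yaml
# Sketch — member names, ports, and storage paths are assumptions.
auth_enabled: false
server:
  http_listen_port: 3100
memberlist:
  join_members:
    - loki-1:7946
    - loki-2:7946
    - loki-3:7946
common:
  path_prefix: /loki
  replication_factor: 3
  ring:
    kvstore:
      store: memberlist
  storage:
    filesystem:
      chunks_directory: /loki/chunks
      rules_directory: /loki/rules
schema_config:
  configs:
    - from: 2023-01-01
      store: boltdb-shipper
      object_store: filesystem
      schema: v12
      index:
        prefix: index_
        period: 24h
```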
To monitor system metrics I chose Telegraf and InfluxDB.
Telegraf is installed on every server running the monitored application. Its configuration file tells the agent to collect system metrics and send them to the central server, where the InfluxDB container is running.
telegraf.conf
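A minimal agent configuration along these lines might look like this; the InfluxDB URL and database name are placeholders:

```toml
# Sketch — URL and database name are placeholders.
[agent]
  interval = "10s"
  flush_interval = "10s"

# Send collected metrics to InfluxDB on the central server
[[outputs.influxdb]]
  urls = ["http://central-server:8086"]
  database = "telegraf"

# Basic system metrics
[[inputs.cpu]]
  percpu = true
  totalcpu = true
[[inputs.mem]]
[[inputs.disk]]
[[inputs.system]]
```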
InfluxDB settings were added to the docker-compose-ha.yaml file
docker-compose-ha.yaml
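The added service entry might be sketched as below (to be placed under the existing `services:` key); the image tag and volume name are assumptions:

```yaml
# Sketch — image tag and volume name are assumptions.
  influxdb:
    image: influxdb:1.8
    env_file:
      - ./influxdb.env
    ports:
      - "8086:8086"
    volumes:
      - influxdb-data:/var/lib/influxdb
```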
influxdb.env
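For the 1.x Docker image, the environment file could contain something like the following; the values are placeholders:

```conf
# Sketch — values are placeholders; variable names follow the influxdb:1.x image.
INFLUXDB_DB=telegraf
INFLUXDB_ADMIN_USER=admin
INFLUXDB_ADMIN_PASSWORD=changeme
INFLUXDB_HTTP_AUTH_ENABLED=true
```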
At the very end, all that remains is to start all the components and connect the data sources in Grafana through the browser. This solution makes it possible to monitor any application, both on our own server infrastructure and in the cloud.