# Monitoring ( cockpit ) Design

###### tags: `Design`

## High level overview

### Tools used

* [grafana](https://grafana.com/docs/)
* [telegraf](https://docs.influxdata.com/telegraf/v1.14/)
* [influxdb](https://docs.influxdata.com/influxdb/v1.8/)
* [mariadb](https://mariadb.com/kb/en/documentation/)
* python
* [docker / dockerfiles](https://docs.docker.com/engine/reference/builder/)

### Design

Use small independent [python tools and scripts](https://github.com/rdo-infra/ci-config/tree/master/ci-scripts/infra-setup/roles/rrcockpit/files/telegraf) to pull data from various systems. Take the data and [dump it in csv](https://github.com/rdo-infra/ci-config/blob/master/ci-scripts/infra-setup/roles/rrcockpit/files/telegraf/last_promotions.py#L19-L32) format on the local system.

[Telegraf is configured to trigger](https://github.com/rdo-infra/ci-config/blob/master/ci-scripts/infra-setup/roles/rrcockpit/files/telegraf/telegraf.d/zuulv3_job_builds.conf) the python scripts on the system. It takes the stdout from the commands and writes the output to [influxdb](https://github.com/rdo-infra/ci-config/blob/master/ci-scripts/infra-setup/roles/rrcockpit/files/telegraf/telegraf.conf#L68-L84).

To get started and understand the workflow, we recommend writing a simple python program that writes simple csv formatted output to stdout, then configuring the python command to execute with a telegraf config.

## Database design

Why do we have both influxdb and mariadb?

### Two use cases

At this time the tooling is configured to get data from an api, transform the output and dump the data directly to the database. It does not update records. One could write tooling to update records sent to mariadb, but updating records is not recommended for influxdb metric data.

#### job data and status

Data from the [zuul api](https://zuul.openstack.org/api/) is meant to be retained for historical information. It's important to know the pass/fail rates over time etc.
A time series database like influxdb does a very good job at processing this kind of data.

#### launchpad and bugzilla

The tooling today is only concerned with open bugs, not with tracking historical bug data. To avoid recreating a bug database, the mariadb tables are dropped, the latest bug data is pulled from launchpad and bugzilla, and the tables are repopulated.

-----

# Getting started

Most of the configuration, pages and tooling are upstream.

```
git clone https://github.com/rdo-infra/ci-config.git
```

Internal only pages are available at

```
https://url.corp.redhat.com/rrockpit-git
```

At the moment grafana monitoring is colocated with other tools and scripts. Change into the right directory.

```
cd ci-config/ci-scripts/infra-setup/roles/rrcockpit
```

## starting a local development environment

There is a simple script included in this repo that helps start up the required service containers with docker-compose - see [development_script.sh](https://github.com/rdo-infra/ci-config/blob/master/ci-scripts/infra-setup/roles/rrcockpit/files/development_script.sh). Running it with -s will start up the cockpit:

```
cd files
[m@192 files]$ ./development_script.sh -s
+ '[' -z -s ']'
+ '[' -s '!=' '' ']'
+ case $1 in
+ shift
+ start
+ docker volume create telegraf-volume
telegraf-volume
+ docker volume create grafana-volume
grafana-volume
+ docker volume create influxdb-volume
influxdb-volume
+ docker volume create mariadb-volume
mariadb-volume
+ docker-compose up
Starting nginx ... done
Starting mariadb ... done
Starting influxdb ... done
Starting mariadb-sidecar ... done
Starting telegraf ... done
Starting grafana ... done
```

## Walk through the start up.

Life starts with the [docker-compose.yml](https://github.com/rdo-infra/ci-config/blob/master/ci-scripts/infra-setup/roles/rrcockpit/files/docker-compose.yml)

There is a dockerfile for each container in the corresponding directory, e.g.
[dockerfile](https://github.com/rdo-infra/ci-config/blob/master/ci-scripts/infra-setup/roles/rrcockpit/files/telegraf/Dockerfile). This installs the required packages, sets up the local configuration and launches the service. Note the 'env' file in each directory as well, which holds the docker environment variables.

----

# updating or creating new grafana pages

### create a key

* log into the webui at http://localhost:8080 w/ admin/admin

First create the required key using [create-api-key.py](https://github.com/rdo-infra/ci-config/blob/master/ci-scripts/infra-setup/roles/rrcockpit/files/grafana/create-api-key.py).

```
./create-api-key.py --key-name foo > grafana.key
```

### update your panel or page.

Add a panel to grafana [overview doc](https://grafana.com/docs/grafana/latest/panels/panels-overview/)

Add a dashboard ( new page ) [doc](https://grafana.com/docs/grafana/latest/features/dashboard/dashboards/)

### export gui changes to grafana json

Use the [export-grafana.py](https://github.com/rdo-infra/ci-config/blob/master/ci-scripts/infra-setup/roles/rrcockpit/files/grafana/export-grafana.py) script

```
./export-grafana.py --key foo
```

------

# Working with the cockpit

After your development environment starts up, I suggest letting it collect data for 10 to 15 minutes.

## logs

`docker-compose logs -f`

## exec to the containers

`docker exec -ti hash /bin/bash`

### influx

once inside the influx container

```
influx
use telegraf;
select
```
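
## example: a minimal csv script

The Design section recommends starting with a simple python program that writes csv formatted output to stdout. A minimal sketch of such a program might look like the following. The column names, the `fetch_jobs` helper and the sample rows are all hypothetical and only stand in for a call to a real api; the actual scripts in the repo define their own fields.

```python
#!/usr/bin/env python
"""Write hypothetical job results to stdout as csv for telegraf to collect."""
import csv
import io
import sys
import time

# Hypothetical columns, chosen for illustration only.
FIELDS = ["timestamp", "job_name", "result", "duration"]


def fetch_jobs():
    """Stand-in for a call to a real api; returns hard-coded sample rows."""
    now = int(time.time())
    return [
        {"timestamp": now, "job_name": "example-job-a",
         "result": "SUCCESS", "duration": 3600},
        {"timestamp": now, "job_name": "example-job-b",
         "result": "FAILURE", "duration": 1250},
    ]


def to_csv(rows):
    """Render the rows as csv text with a single header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    # Telegraf reads whatever the command prints to stdout.
    sys.stdout.write(to_csv(fetch_jobs()))
```

Running the script by hand should print a header line followed by one line per job. Telegraf could then be pointed at it in the same way the existing configs under `telegraf.d` trigger the other scripts.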