# Monitoring ( cockpit ) Design
###### tags: `Design`
## High level overview
### Tools used
* [grafana](https://grafana.com/docs/)
* [telegraf](https://docs.influxdata.com/telegraf/v1.14/)
* [influxdb](https://docs.influxdata.com/influxdb/v1.8/)
* [mariadb](https://mariadb.com/kb/en/documentation/)
* python
* [docker / dockerfiles](https://docs.docker.com/engine/reference/builder/)
### Design
Use small independent [python tools and scripts](https://github.com/rdo-infra/ci-config/tree/master/ci-scripts/infra-setup/roles/rrcockpit/files/telegraf) to pull data from various systems. Take the data and [dump it in csv](https://github.com/rdo-infra/ci-config/blob/master/ci-scripts/infra-setup/roles/rrcockpit/files/telegraf/last_promotions.py#L19-L32) format on the local system.
[Telegraf is configured to trigger](https://github.com/rdo-infra/ci-config/blob/master/ci-scripts/infra-setup/roles/rrcockpit/files/telegraf/telegraf.d/zuulv3_job_builds.conf) the python scripts on the system. It takes the stdout from the commands and writes that output to [influxdb](https://github.com/rdo-infra/ci-config/blob/master/ci-scripts/infra-setup/roles/rrcockpit/files/telegraf/telegraf.conf#L68-L84).
To get started and understand the workflow, we recommend writing a simple python program that writes simple csv formatted output to stdout, then configuring telegraf to execute that python command.
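As a starting point for that exercise, here is a minimal sketch of such a script. The column names, the `example_builds` measurement and the sample rows are all hypothetical, and whether a header row is wanted depends on how the telegraf csv parser is configured, so treat this as an illustration rather than a copy of the real scripts.

```python
#!/usr/bin/env python3
"""Hypothetical example: emit a few rows of csv on stdout for telegraf to collect."""
import csv
import sys
import time

# Column names are illustrative only; the real scripts define their own.
FIELDS = ["measurement", "job_name", "result", "duration", "timestamp"]


def collect_rows():
    """Return sample rows; a real script would query an api here."""
    now = int(time.time())
    return [
        {"measurement": "example_builds", "job_name": "job-a",
         "result": "SUCCESS", "duration": 1200, "timestamp": now},
        {"measurement": "example_builds", "job_name": "job-b",
         "result": "FAILURE", "duration": 950, "timestamp": now},
    ]


def write_csv(rows, stream=sys.stdout):
    """Write a header line followed by one csv line per row."""
    writer = csv.DictWriter(stream, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    write_csv(collect_rows())
```

Running the script by hand and checking the output in a terminal is a quick way to validate it before pointing a telegraf config at it.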
## Database design
Why do we have both influxdb and mariadb?
### Two use cases
At this time the tooling is configured to get data from an api, transform the output and dump data directly to the database. It does not update records.
One could write tooling to update records sent to mariadb, but updating records is not recommended for influxdb metric data.
#### job data and status
Data from the [zuul api](https://zuul.openstack.org/api/) is meant to be retained for historical information. It's important to know the pass/fail rates over time etc. A time series database like influxdb does a very good job at processing this kind of data.
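To make the "pass/fail rates over time" idea concrete, here is a small sketch that computes a daily pass rate from a list of build records. The field names `result` and `end_time` are modeled on a zuul build record but should be treated as assumptions, and the sample data is hand-written. In the cockpit itself this kind of aggregation is left to influxdb and grafana; the sketch only shows the question being asked of the data.

```python
from collections import defaultdict
from datetime import datetime


def pass_rate_by_day(builds):
    """Map each day (YYYY-MM-DD) to the fraction of builds that succeeded."""
    totals = defaultdict(int)
    passed = defaultdict(int)
    for build in builds:
        day = datetime.fromisoformat(build["end_time"]).date().isoformat()
        totals[day] += 1
        if build["result"] == "SUCCESS":
            passed[day] += 1
    return {day: passed[day] / totals[day] for day in sorted(totals)}


# Hand-written sample records, not real api output.
SAMPLE_BUILDS = [
    {"job_name": "job-a", "result": "SUCCESS", "end_time": "2020-05-01T10:00:00"},
    {"job_name": "job-a", "result": "FAILURE", "end_time": "2020-05-01T12:00:00"},
    {"job_name": "job-a", "result": "SUCCESS", "end_time": "2020-05-02T09:30:00"},
]

if __name__ == "__main__":
    for day, rate in pass_rate_by_day(SAMPLE_BUILDS).items():
        print(f"{day} {rate:.0%}")
```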
#### launchpad and bugzilla
The tooling today is only concerned with open bugs, not with tracking historical bug data. To avoid recreating a bug database, the mariadb tables are dropped, the latest bug data is pulled from launchpad and bugzilla, and the tables are repopulated.
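The drop-and-repopulate pattern can be sketched as follows. To keep the example runnable without a database server it uses the standard library `sqlite3` module as a stand-in for mariadb; the `open_bugs` table, its columns and the sample rows are hypothetical and not taken from the real tooling.

```python
import sqlite3


def refresh_open_bugs(conn, bugs):
    """Drop the table, recreate it, and load the latest open bugs."""
    with conn:  # the context manager commits when the block succeeds
        conn.execute("DROP TABLE IF EXISTS open_bugs")
        conn.execute(
            "CREATE TABLE open_bugs ("
            "id INTEGER PRIMARY KEY, source TEXT, title TEXT, status TEXT)"
        )
        conn.executemany(
            "INSERT INTO open_bugs (id, source, title, status) VALUES (?, ?, ?, ?)",
            bugs,
        )


def demo():
    """Refresh twice; only the rows from the second pull should remain."""
    conn = sqlite3.connect(":memory:")
    refresh_open_bugs(conn, [(1, "launchpad", "example bug", "New")])
    refresh_open_bugs(conn, [
        (2, "bugzilla", "another example", "NEW"),
        (3, "launchpad", "third example", "Triaged"),
    ])
    return conn.execute("SELECT id FROM open_bugs ORDER BY id").fetchall()


if __name__ == "__main__":
    print(demo())
```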

-----
# Getting started
Most of the configuration, pages and tooling are upstream.
```
git clone https://github.com/rdo-infra/ci-config.git
```
Internal only pages are available at
```
https://url.corp.redhat.com/rrockpit-git
```
At the moment the grafana monitoring code is colocated with other tools and scripts. Change into the right directory.
```
cd ci-config/ci-scripts/infra-setup/roles/rrcockpit
```
## starting a local development environment
There is a simple script included in this repo that helps start up the required service containers with docker-compose - see [development_script.sh](https://github.com/rdo-infra/ci-config/blob/master/ci-scripts/infra-setup/roles/rrcockpit/files/development_script.sh). Running it with `-s` will start up the cockpit:
```
cd files
[m@192 files]$ ./development_script.sh -s
+ '[' -z -s ']'
+ '[' -s '!=' '' ']'
+ case $1 in
+ shift
+ start
+ docker volume create telegraf-volume
telegraf-volume
+ docker volume create grafana-volume
grafana-volume
+ docker volume create influxdb-volume
influxdb-volume
+ docker volume create mariadb-volume
mariadb-volume
+ docker-compose up
Starting nginx ... done
Starting mariadb ... done
Starting influxdb ... done
Starting mariadb-sidecar ... done
Starting telegraf ... done
Starting grafana ... done
```
## Walk through the start up.
Life starts with the [docker-compose.yml](https://github.com/rdo-infra/ci-config/blob/master/ci-scripts/infra-setup/roles/rrcockpit/files/docker-compose.yml).
There is a dockerfile for each container in the corresponding directory, e.g. [dockerfile](https://github.com/rdo-infra/ci-config/blob/master/ci-scripts/infra-setup/roles/rrcockpit/files/telegraf/Dockerfile). It installs the required packages, sets up the local configuration and launches the service. Note the 'env' file in each directory as well, which holds the docker environment variables.

----
# updating or creating new grafana pages
### create a key
* Log into the web UI at http://localhost:8080 with admin/admin.

First, create the required key using [create-api-key.py](https://github.com/rdo-infra/ci-config/blob/master/ci-scripts/infra-setup/roles/rrcockpit/files/grafana/create-api-key.py).
```
./create-api-key.py --key-name foo > grafana.key
```
### update your panel or page.
* Add a panel to grafana: [overview doc](https://grafana.com/docs/grafana/latest/panels/panels-overview/)
* Add a dashboard (new page): [doc](https://grafana.com/docs/grafana/latest/features/dashboard/dashboards/)
### export gui changes to grafana json
Use the [export-grafana.py](https://github.com/rdo-infra/ci-config/blob/master/ci-scripts/infra-setup/roles/rrcockpit/files/grafana/export-grafana.py) script
```
./export-grafana.py --key foo
```
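For orientation, the sketch below shows roughly what fetching one dashboard's JSON from the Grafana HTTP API looks like with an API key. It is not a description of how `export-grafana.py` works internally; the function names are hypothetical, and the dashboard uid and key passed in would come from your own environment. Only the `/api/dashboards/uid/<uid>` endpoint and the `Authorization: Bearer` header are taken from the Grafana API.

```python
import json
import urllib.request

# Local development address used elsewhere in this document.
GRAFANA_URL = "http://localhost:8080"


def dashboard_request(uid, api_key, base_url=GRAFANA_URL):
    """Build an authenticated request for one dashboard's JSON."""
    req = urllib.request.Request(f"{base_url}/api/dashboards/uid/{uid}")
    req.add_header("Authorization", f"Bearer {api_key}")
    return req


def fetch_dashboard(uid, api_key, base_url=GRAFANA_URL):
    """Return the dashboard model; needs a running grafana to succeed."""
    with urllib.request.urlopen(dashboard_request(uid, api_key, base_url)) as resp:
        return json.load(resp)["dashboard"]
```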
------
# Working with the cockpit
After your development environment starts up, let it collect data for 10 to 15 minutes before you start working with it.
## logs
`docker-compose logs -f`
## exec to the containers
`docker exec -ti hash /bin/bash` (replace `hash` with the container ID or name, e.g. from `docker ps`)
### influx
Once inside the influx container:
```
influx
use telegraf;
select
```
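The same database can also be queried from outside the container over the InfluxDB 1.x HTTP API. The sketch below assumes the influxdb port is reachable at `http://localhost:8086` (the InfluxDB default, which may not match the docker-compose setup) and uses `SHOW MEASUREMENTS` purely as an example statement; the function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def influx_query_url(query, db="telegraf", base_url="http://localhost:8086"):
    """Build a URL for the InfluxDB 1.x /query endpoint."""
    params = urllib.parse.urlencode({"db": db, "q": query})
    return f"{base_url}/query?{params}"


def run_query(query, **kwargs):
    """Run the query and return the decoded JSON; needs a reachable influxdb."""
    with urllib.request.urlopen(influx_query_url(query, **kwargs)) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Only prints the URL, so it works without a running database.
    print(influx_query_url("SHOW MEASUREMENTS"))
```

Listing the measurements first is a convenient way to find out which series telegraf has written so far.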