# EESSI CVMFS monitoring sync meeting (20240830)
- Ansible playbooks implemented for Grafana/Prometheus and Prometheus exporters on CVMFS Stratum servers
  - currently a test setup, with "dummy" CVMFS servers reporting back data
  - see also https://github.com/EESSI/cvmfs-servers/pull/12
    - PR is a bit outdated, Bob needs to sync recent changes
  - `node-exporter.json` (24k) is an export of the project's Grafana dashboard, so it can easily be restored elsewhere if needed
  - only the monitoring server can access the Prometheus exporter endpoints on the CVMFS servers
  - Ansible playbooks are run from Bob's laptop, or from the Stratum 0 as jumphost
- should the monitoring server be responsible for all alerting (to Slack/email)? => YES
- the status page provides a JSON file that can be pulled in by Prometheus, see https://status.eessi.io/test/status.json
- results from the EESSI test suite could also be pushed into Prometheus
  - but only for a selected CPU target/system/subset of tests, as a "canary in the coal mine"
- can we set up client systems that report back to the monitoring server?
  - could run a daily Slurm job on the HPC clusters we have access to
- can also monitor the S3 bucket used by the CVMFS sync server via AWS CloudFront
- we need to create an overview of the EESSI infrastructure
  - who can access what
  - who's responsible (+ backup) for each component
  - should cover:
    - CVMFS servers (incl. jumphost)
    - EESSI status page
    - build clusters in AWS/Azure
    - GitHub org + repos
- shared secrets via a KeePass database file, with a master password known to the people who need access
  - see https://keepassxc.org
- next meeting
  - Fri 18 Oct 2024 at 13:30 CEST
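
For restoring `node-exporter.json` on another Grafana instance, a minimal sketch using Grafana's HTTP API (`POST /api/dashboards/db`) could look like this. It assumes the file is a plain dashboard model rather than an "export for sharing externally" (which contains `${DS_...}` datasource placeholders that this endpoint does not resolve), and that a service account token with dashboard write access is available. The function names, URL and token are hypothetical.

```python
import json
import urllib.request


def build_dashboard_payload(dashboard_json: str, overwrite: bool = True) -> dict:
    """Wrap an exported dashboard model in the body expected by
    Grafana's POST /api/dashboards/db endpoint."""
    dashboard = json.loads(dashboard_json)
    # A null id makes Grafana create the dashboard instead of trying to
    # update one with the same numeric id on the target server.
    dashboard["id"] = None
    return {"dashboard": dashboard, "overwrite": overwrite}


def restore_dashboard(grafana_url: str, token: str, path: str) -> int:
    """POST the dashboard stored at `path` to a Grafana instance.

    grafana_url and token are placeholders (hypothetical): point them at
    the target server and a token with dashboard write access.
    """
    with open(path, encoding="utf-8") as f:
        payload = build_dashboard_payload(f.read())
    request = urllib.request.Request(
        grafana_url.rstrip("/") + "/api/dashboards/db",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.status
```

Usage would be along the lines of `restore_dashboard("https://grafana.example.org", token, "node-exporter.json")`, with a placeholder hostname. With `overwrite=True`, re-importing replaces an existing dashboard that has the same `uid`.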
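
To confirm that the Prometheus exporter endpoints on the CVMFS servers really are reachable only from the monitoring server, a plain TCP probe can be run from some other host. This is a sketch: port 9100 is node_exporter's default and is only an assumption about what the playbooks configure, and the hostname below is hypothetical.

```python
import socket


def port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Run from a machine that is NOT the monitoring server: for a properly
    restricted CVMFS server this should return False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timeouts (filtered by a firewall) and DNS
        # failures all count as "not reachable" here.
        return False
```

For example, `port_reachable("stratum1.example.org", 9100)` should give `False` everywhere except on the monitoring server. A `False` caused by a timeout cannot distinguish a firewall drop from a host that is down, so it complements rather than replaces the "up" metrics in Prometheus.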
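
Prometheus scrapes its own text exposition format rather than arbitrary JSON, so pulling in the status page's `status.json` needs a translation step, for example the prometheus-community `json_exporter` or a small script. The sketch below takes the script route and makes no assumptions about the schema of `status.json`: every numeric or boolean leaf becomes a gauge named after its path, and strings are dropped. The `eessi_status` prefix is hypothetical.

```python
import re


def _metric_name(parts: list) -> str:
    # Prometheus metric names must match [a-zA-Z_:][a-zA-Z0-9_:]*
    name = re.sub(r"[^a-zA-Z0-9_:]", "_", "_".join(parts))
    if not re.match(r"[a-zA-Z_:]", name):
        name = "_" + name
    return name


def json_to_exposition(data, prefix: str = "eessi_status") -> str:
    """Flatten numeric and boolean leaves of a parsed JSON document into
    Prometheus text-format lines; strings and nulls are skipped."""
    lines = []

    def walk(value, path):
        # bool is checked first because it is a subclass of int.
        if isinstance(value, bool):
            lines.append(f"{_metric_name(path)} {int(value)}")
        elif isinstance(value, (int, float)):
            lines.append(f"{_metric_name(path)} {value}")
        elif isinstance(value, dict):
            for key, child in value.items():
                walk(child, path + [str(key)])
        elif isinstance(value, list):
            for index, child in enumerate(value):
                walk(child, path + [str(index)])

    walk(data, [prefix])
    return "\n".join(lines) + "\n"
```

The input would come from something like `json.load(urllib.request.urlopen("https://status.eessi.io/test/status.json"))`. One way to expose the result is to write it to a `*.prom` file in the directory watched by node_exporter's textfile collector (`--collector.textfile.directory`), writing to a temporary file and renaming it so a scrape never sees a partial file. Encoding list indices into metric names is crude; labels would be cleaner once the schema is settled.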
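
Pushing a selected subset of EESSI test suite results into Prometheus is commonly done through a Pushgateway, which Prometheus then scrapes. A sketch using only the standard library, assuming a Pushgateway at a hypothetical address (9091 is its default port) and hypothetical metric and label names:

```python
import base64
import urllib.parse
import urllib.request


def _label_segment(name: str, value: str) -> str:
    # The Pushgateway uses "/" as a path separator, so label values that
    # contain it (e.g. CPU targets like x86_64/amd/zen3) are sent
    # base64url-encoded, with "@base64" appended to the label name.
    if "/" in value:
        encoded = base64.urlsafe_b64encode(value.encode("utf-8")).decode("ascii")
        return f"/{name}@base64/{encoded}"
    return f"/{name}/" + urllib.parse.quote(value, safe="")


def build_push_request(gateway: str, job: str, labels: dict, metrics: dict):
    """Return (url, body) for pushing `metrics` to the Pushgateway group
    identified by `job` plus the grouping labels."""
    path = "/metrics/job/" + urllib.parse.quote(job, safe="")
    for name, value in labels.items():
        path += _label_segment(name, str(value))
    # Text exposition format: one "<metric> <value>" line per sample.
    body = "".join(f"{name} {value}\n" for name, value in metrics.items())
    return gateway.rstrip("/") + path, body.encode("utf-8")


def push(gateway: str, job: str, labels: dict, metrics: dict) -> int:
    url, body = build_push_request(gateway, job, labels, metrics)
    # PUT replaces every metric in the group; POST would only replace
    # metrics that have the same name.
    request = urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "text/plain; version=0.0.4"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

A call could look like `push("http://monitoring.example.org:9091", "eessi_test_suite", {"system": "cluster-a", "cpu_target": "x86_64/amd/zen3"}, {"eessi_tests_failed": 0})`. Keeping system and CPU target as grouping labels means each "canary" combination overwrites only its own results.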
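
For the S3 bucket behind AWS CloudFront, one possible health check is to fetch the repository manifest `.cvmfspublished` through the CloudFront URL and compare its revision and publish time with what the Stratum 0 reports: a lagging revision or a stale timestamp points at a sync problem. A sketch, assuming the CloudFront distribution serves the usual `/cvmfs/<repo>/` layout; the base URL in the usage line is hypothetical.

```python
import time
import urllib.request


def parse_manifest(raw: bytes) -> dict:
    """Parse the text part of a CVMFS .cvmfspublished manifest.

    Each line before the '--' separator is a one-letter key followed by
    its value (e.g. N = repository name, S = revision, T = publish time
    in seconds since the epoch). The signature after '--' is ignored,
    so this does NOT verify the manifest.
    """
    fields = {}
    for line in raw.split(b"\n"):
        if line == b"--":
            break
        if line:
            fields[chr(line[0])] = line[1:].decode("utf-8", "replace")
    return fields


def manifest_age(base_url: str, repo: str, now: float = None) -> tuple:
    """Fetch <base_url>/cvmfs/<repo>/.cvmfspublished and return
    (revision, seconds since the last publish)."""
    url = f"{base_url.rstrip('/')}/cvmfs/{repo}/.cvmfspublished"
    with urllib.request.urlopen(url, timeout=10) as response:
        fields = parse_manifest(response.read())
    now = time.time() if now is None else now
    return int(fields["S"]), now - int(fields["T"])
```

Usage would be `manifest_age("https://<distribution>.cloudfront.net", "software.eessi.io")`; running the same call against the Stratum 0 and Stratum 1 URLs gives revisions that can be compared directly.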
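
The infrastructure overview could be kept as structured data so that gaps show up automatically. A hypothetical sketch: the `Component` fields mirror the questions in the notes (who is responsible, who is backup, who can access what), and the person names are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    name: str
    responsible: str
    backup: str
    access: list[str] = field(default_factory=list)  # who can access it


def problems(components: list[Component]) -> list[str]:
    """Return human-readable gaps: a missing responsible/backup, a backup
    identical to the responsible person, or an owner without access."""
    issues = []
    for c in components:
        if not c.responsible or not c.backup:
            issues.append(f"{c.name}: missing responsible or backup")
        elif c.responsible == c.backup:
            issues.append(f"{c.name}: backup is the same person as responsible")
        for person in (c.responsible, c.backup):
            if person and person not in c.access:
                issues.append(f"{c.name}: {person} has no access")
    return issues
```

For example, `Component("EESSI status page", "carol", "carol", ["carol"])` is flagged because it has no real backup, and `Component("GitHub org + repos", "dave", "erin", ["dave"])` because the backup cannot actually get in.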