# EESSI CVMFS monitoring sync meeting (20240830)

- Ansible playbooks implemented for Grafana/Prometheus and Prometheus exporters on CVMFS Stratum servers
  - currently a test setup, with "dummy" CVMFS servers reporting back data
  - see also https://github.com/EESSI/cvmfs-servers/pull/12
    - bit outdated, Bob needs to sync recent changes
  - `node-exporter.json` (24k file) is an export of the project dashboard, so it can be easily restored somewhere if needed
  - only the monitoring server can access the endpoint for the Prometheus exporter on CVMFS servers
  - Ansible playbooks are run from Bob's laptop or from Stratum 0 as jumphost
- should the monitoring server be responsible for all alerting to Slack/email? => YES
- status page provides a JSON file that can be pulled in by Prometheus, see https://status.eessi.io/test/status.json
- results from EESSI test suite could also be pushed into Prometheus
  - but only for a selected CPU target/system/subset of tests, as "canary in the coal mine"
- can we set up client systems that can report back to the monitoring server?
  - could run a daily Slurm job from the HPC clusters we have access to
- can also monitor the S3 bucket used by the CVMFS sync server via AWS CloudFront
- we need to create an overview of EESSI infrastructure
  - who can access what
  - who's responsible (+ backup) for each component
  - should cover:
    - CVMFS servers (incl. jumphost)
    - EESSI status page
    - build clusters in AWS/Azure
    - GitHub org + repos
- shared secrets via KeePass database file with a master password known by the people who need it
  - see https://keepassxc.org
- next meeting
  - Fri 18 Oct 2024 at 13:30 CEST
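
Rough sketch for the status page item above: one possible way to get the status JSON into Prometheus is to convert it to the Prometheus text exposition format, for example for the node exporter's textfile collector. The schema of `status.json` was not discussed in the meeting, so the code makes no assumption about it and simply flattens every numeric or boolean leaf into a metric. The `eessi_status` prefix and the function names `flatten`, `to_prometheus` and `fetch_status` are made up for illustration and are not part of the existing playbooks.

```python
import json
import re
import urllib.request


def flatten(obj, prefix=""):
    """Yield (path, value) pairs for every leaf of a nested JSON structure."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            yield from flatten(value, f"{prefix}_{key}" if prefix else str(key))
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            yield from flatten(value, f"{prefix}_{index}" if prefix else str(index))
    else:
        yield prefix, obj


def to_prometheus(status, namespace="eessi_status"):
    """Render numeric/boolean leaves as Prometheus text exposition lines.

    The namespace is a hypothetical prefix; strings and nulls are skipped.
    """
    lines = []
    for path, value in flatten(status):
        # bool must be checked before int, since bool is a subclass of int
        if isinstance(value, bool):
            number = 1 if value else 0
        elif isinstance(value, (int, float)):
            number = value
        else:
            continue
        # metric names may only contain letters, digits and underscores here
        name = re.sub(r"[^a-zA-Z0-9_]", "_", f"{namespace}_{path}")
        lines.append(f"{name} {number}\n")
    return "".join(lines)


def fetch_status(url, timeout=10):
    """Download and parse the status JSON (not called automatically)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

Possible usage on the monitoring server, e.g. from a cron job that writes a `.prom` file: `print(to_prometheus(fetch_status("https://status.eessi.io/test/status.json")))`. String-valued fields (such as a textual status) would need an explicit mapping to numbers once the actual schema is known.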