---
tags: SA, monitoring, k8s
---
# OME UoD Kubernetes review
2020-12-10
- Move to hosted solution or EBI?
- https://github.com/ome/kubernetes-apps
- minio -> Docker
- Redmine -> requires PSQL + data volume
- jupyterhub -> mybinder/EBI binderhub
- grafana/prometheus: possible replacement for check_mk. OME monitoring. Also pulls a subset of the IDR metrics
- hosted solution alternatives: DataDog etc
- Question of how much data to export/scrape
- trying to cut down usually raises the fear of losing data
- need to have discussion around the types of queries/stats that would be useful
- Time to be comfortable with monitoring
- < 1 month
- main thing missing is alerts
- https://github.com/openmicroscopy/management_tools/blob/ef27ae1e71092458f2c10e9dda2641536844b378/k8s/config/prometheus-global/prometheus.yml#L14-L17
- https://github.com/openmicroscopy/management_tools/blob/ef27ae1e71092458f2c10e9dda2641536844b378/k8s/config/prometheus-global/prometheus.yml#L31-L32
- Simon: inclination to get rid of prometheus-global and replicate prometheus-ome in Docker
- Redmine
- standalone PG server: ome-k8s-pg-01.openmicroscopy.org, dumped nightly
- Docker compose
- Volume on GPFS
- https://github.com/ome/idr-redmine-tracker/blob/2c9e270ad1baacfb7aeaa65c38c6237cd9390d9b/k8s/secret-config.yaml.example
- https://github.com/ome/idr-redmine-tracker/blob/2c9e270ad1baacfb7aeaa65c38c6237cd9390d9b/SETUP.md
- K8S cluster timelines
- nothing external. Potentially some internal certs expired
- security issues
- main risk is not being able to restart/upgrade it (cf. OpenStack)
- EBI K8s cluster
- Mid-term decommission and point at Craig's instances
see also: https://openmicroscopy.slack.com/archives/C0K5ME1EW/p1607606588067400
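
Sizing sketch for the "how much data to export/scrape" question above. This is only a rough aid for that discussion, not something taken from the existing setup. It uses the rule of thumb from the Prometheus storage docs (disk = retention seconds × ingested samples per second × bytes per sample, with roughly 1-2 bytes per sample). The function names and all example numbers (targets, series per target, scrape interval, retention) are hypothetical and would need replacing with real values.

```python
# Rough Prometheus disk-usage estimate.
# Assumption: ~1-2 bytes per sample after compression (Prometheus docs rule of thumb).
# All names and figures here are illustrative, not measured from the OME/IDR setup.

SECONDS_PER_DAY = 24 * 60 * 60


def samples_per_second(num_targets, series_per_target, scrape_interval_seconds):
    """Ingestion rate if every target exposes the same number of series."""
    return num_targets * series_per_target / scrape_interval_seconds


def estimate_disk_bytes(retention_days, ingest_rate, bytes_per_sample=2.0):
    """Disk needed = retention (s) * samples per second * bytes per sample."""
    return retention_days * SECONDS_PER_DAY * ingest_rate * bytes_per_sample


if __name__ == "__main__":
    # Hypothetical figures: 20 targets, 500 series each, 15 s scrape interval.
    rate = samples_per_second(num_targets=20, series_per_target=500,
                              scrape_interval_seconds=15)
    for days in (15, 30, 90):
        gb = estimate_disk_bytes(days, rate) / 1e9
        print(f"{days:>3} days retention: ~{gb:.2f} GB")
```

Comparing a few retention/interval combinations this way may help decide whether cutting down scraped metrics is needed at all before discussing which queries/stats are useful.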