---
tags: SA, monitoring, k8s
---

# OME UoD Kubernetes review 2020-12-10

- Move to hosted solution or EBI?
- https://github.com/ome/kubernetes-apps
    - minio -> Docker
    - Redmine -> requires PSQL + data volume
    - jupyterhub -> mybinder/EBI binderhub
    - grafana/prometheus: possible replacement of check_mk. OME monitoring. Also pulls a subset of the IDR metrics
        - hosted solution alternatives: DataDog etc
        - Question of how much data to export/scrape
            - usually, trying to cut down results in fear of losing data
            - need to have a discussion around the types of queries/stats that would be useful
        - Time to be comfortable with monitoring
            - < 1 month
            - main thing missing is alerts
        - https://github.com/openmicroscopy/management_tools/blob/ef27ae1e71092458f2c10e9dda2641536844b378/k8s/config/prometheus-global/prometheus.yml#L14-L17
        - https://github.com/openmicroscopy/management_tools/blob/ef27ae1e71092458f2c10e9dda2641536844b378/k8s/config/prometheus-global/prometheus.yml#L31-L32
        - Simon: inclination to get rid of prometheus-global and replicate prometheus-ome in Docker
- Redmine
    - standalone PG server. ome-k8s-pg-01.openmicroscopy.org. Dumped nightly
    - Docker compose
    - Volume on GPFS
    - https://github.com/ome/idr-redmine-tracker/blob/2c9e270ad1baacfb7aeaa65c38c6237cd9390d9b/k8s/secret-config.yaml.example
    - https://github.com/ome/idr-redmine-tracker/blob/2c9e270ad1baacfb7aeaa65c38c6237cd9390d9b/SETUP.md
- K8S cluster timelines
    - nothing external. Potentially some internal certs expired
    - security issues
    - main risk is not being able to restart/upgrade it cf OpenStack
- EBI K8s cluster
    - Mid-term decommission and point at Craig's instances

see also: https://openmicroscopy.slack.com/archives/C0K5ME1EW/p1607606588067400