# Associate users with groups for usage monitoring
## :beginner: Project Info
:small_blue_diamond:GitHub issue: [#5315 · 2i2c-org/infrastructure](https://github.com/2i2c-org/infrastructure/issues/5315)
:small_blue_diamond:Champion: Jenny
:small_blue_diamond:Target Audience: Hub admins and other partners who monitor usage with Grafana
:small_blue_diamond:Timebox: 3 sprints (6 weeks)
## :triangular_flag_on_post: Background
Hub admins would like to monitor usage and costs by user group. This allows them to advocate for better funding and cost recovery based on their own internal reporting and needs.
Example communities that would benefit from this feature include HHMI, CryoCloud, CIROH and EarthScope.
## :star: Goal
Hub Admins can use Grafana to filter usage metrics by user group.
### Out of scope
- BinderHubs, since they do not have a concept of users and groups by design
- Shared password hubs, for the same reason as BinderHubs
- CILogon, or any other auth method that does not currently permit user group control
- Updates to cost monitoring work will come later
- Logic for handling users who belong to multiple user groups
## :pencil: Definition of Done
Each item makes incremental progress to achieving the overall goal.
1. User and group information from JupyterHub is available to Prometheus via a service
2. User groups can be joined to existing usage metrics through PromQL and visualised in Grafana
## :wrench: Technical Requirements
These items and their tasks can map to a sub-issue of the overall [initiative](https://github.com/2i2c-org/infrastructure/issues/5315), depending on their effort estimates.
:::info
A note on the "Questions" below:
@yuvipanda you don't need to answer all of the questions -- they represent current gaps in my knowledge, so you can skip the "basic" low-level questions that require me to just brush up on our current setup, and answer the more "architectural" high-level ones.
:::
### 0. Preliminary work
This work was refined and/or undertaken during the current sprint (iteration 12):
- [Use annotations instead of labels for usernames on pods](https://github.com/2i2c-org/infrastructure/issues/5804)
- [Use kube_pod_annotations instead of kube_pod_labels for Prometheus queries](https://github.com/2i2c-org/infrastructure/issues/5806)
- [Enable manage_groups for all hub deployments](https://github.com/2i2c-org/infrastructure/issues/5805)
The tasks below are refined and planned for subsequent sprints.
### 1. User and group information from JupyterHub is available to Prometheus
We configure users and groups with a [variety of external identity providers](https://infrastructure.2i2c.org/hub-deployment-guide/configure-auth/), such as GitHub, CILogon, etc. [JupyterHub itself handles authentication internally](https://jupyterhub.readthedocs.io/en/stable/explanation/oauth.html), and the [`manage_groups`](https://jupyterhub.readthedocs.io/en/latest/reference/authenticators.html#authenticator-managed-group-membership) setting saves user <-> group mappings to JupyterHub's state/database. We can create a [JupyterHub Service](https://jupyterhub.readthedocs.io/en/latest/reference/services.html) to expose this mapping and format this information with a Prometheus exporter for scraping.
For authentication, EarthScope uses OIDC, Google and CILogon. Some of these usernames are not human-readable/friendly, e.g. `google-2doauth2-<21-char-hash>`, so being able to segment the user base into user groups becomes even more beneficial for monitoring purposes.
In terms of security boundaries, the JupyterHub service should be limited to the following [scopes](https://jupyterhub.readthedocs.io/en/latest/rbac/scopes.html#available-scopes):
- `list:users`
- `list:groups`
Here we explicitly define the user group data to be queried from the hub's REST API and passed to the Prometheus exporter, all formatted as [gauges](https://prometheus.io/docs/concepts/metric_types/#gauge); a sketch of such an exporter follows the list below:
- From [`list:users`](https://jupyterhub.readthedocs.io/en/latest/reference/rest-api.html#operation/get-users) we need
- `name` (string)
- `groups` (list)
- no `auth_state` information is required, since we do not need to pull through information from a third party identity provider
- no information on `roles` is required
- no information on `activity` or `servers` is required
- From [`list:groups`](https://jupyterhub.readthedocs.io/en/latest/reference/rest-api.html#operation/get-groups) we need
- `name` (string)
- `users` (list)
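As a rough illustration only, here is a minimal sketch of what such an exporter could look like, assuming it runs as a hub-managed JupyterHub service (so `JUPYTERHUB_API_URL` and `JUPYTERHUB_API_TOKEN` are injected) and that the metric is named `jupyterhub_user_groups` with illustrative `username`/`usergroup` labels:

```python
# Minimal sketch, not a final design: a JupyterHub service that exposes the
# user <-> group mapping as a Prometheus gauge. Assumes the service is granted
# the list:users and list:groups scopes and that JupyterHub injects
# JUPYTERHUB_API_URL / JUPYTERHUB_API_TOKEN into its environment.
import os
import time

import requests
from prometheus_client import Gauge, start_http_server

API_URL = os.environ["JUPYTERHUB_API_URL"]
HEADERS = {"Authorization": f"token {os.environ['JUPYTERHUB_API_TOKEN']}"}

# Gauge and label names are illustrative; the value is 1 while the membership exists.
USER_GROUPS = Gauge(
    "jupyterhub_user_groups",
    "User <-> group membership reported from the JupyterHub database",
    ["username", "usergroup"],
)

def refresh_user_groups():
    """Re-read the user <-> group mapping from the hub's REST API."""
    # GET /groups returns [{"name": ..., "users": [...]}, ...]; a real
    # implementation would follow the paginated responses for large hubs.
    groups = requests.get(f"{API_URL}/groups", headers=HEADERS).json()
    USER_GROUPS.clear()  # drop memberships removed since the last refresh
    for group in groups:
        for user in group["users"]:
            USER_GROUPS.labels(username=user, usergroup=group["name"]).set(1)

if __name__ == "__main__":
    start_http_server(9090)  # Prometheus scrapes this port on its own schedule
    while True:
        refresh_user_groups()
        time.sleep(300)  # refresh the cached mapping less often than the scrape interval
```

Caching the mapping in the service and refreshing it on its own schedule, rather than querying the hub on every scrape, is one way to handle the cadence mismatch discussed in the questions below.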
#### Tasks (32h)
- [ ] Set up JupyterHub service (8h)
- [ ] Write Prometheus exporter for a `jupyterhub_user_groups` metric that exposes user <-> group mapping (16h)
- [ ] Deploy service to staging area for testing and validation (8h)
- [ ] Upstream the exporter and publish it as a software package (4h)
#### Risks
- This component needs to be agnostic to external auth providers that are out of scope, e.g. CILogon
- Handling the case where a Hub Admin creates, updates or deletes (CRUD) user groups
:::info
Questions
- JupyterHub stores the user <-> group mappings, but will not store the group labels. These will need to be pulled from the external OAuth provider. This introduces complexity but is important for the Hub Admin. Out of scope for the MVP for now?
- Are we handling the general case where `manage_groups` may or may not be enabled by the JupyterHub Base OAuthenticator?
- `manage_groups = False` is out of scope for this initial iteration
- `manage_groups = True` populates the User Groups model in JHub's state. JupyterHealth and Earthscope are communities with this setting enabled.
- How often should we emit user group data?
- We have 3 components that need to talk to each other:
1. JupyterHub
2. JupyterHub service
3. Prometheus
- We have two infinite loops that pass information between these components
- Prometheus hits the JupyterHub service every minute
- The JupyterHub service can be tuned to balance the load of the infrastructure resources available
- This presents a mismatch in the cadence of communication between them
- We can cache the result in the JupyterHub service
- The Prometheus async client library takes care of concurrency
- Will we need to deal with [pagination](https://jupyterhub.readthedocs.io/en/latest/howto/rest.html#api-pagination), e.g. in case there is a very long list of users?
- Yes, see [prometheus-dirsize-exporter](https://github.com/yuvipanda/prometheus-dirsize-exporter/) as an example. This is one of many reasons why the JupyterHub service cannot collect real-time information.
- Do we really need a separate JupyterHub service? Can we pass info from hub's REST API directly to the custom Prometheus exporter?
- Yes: the JupyterHub service provides the appropriate credentials and scopes, and we can deploy it as a standalone service.
:::
#### Contributors
- App Eng
- Infra Eng
### 2. User and user group data can be associated to existing usage metrics through PromQL and visualised in Grafana
Once [Part 1.](#1-User-and-group-information-from-JupyterHub-is-available-to-Prometheus-via-a-service) is completed, we need to update the PromQL queries used to populate our Grafana dashboards to allow filtering on user groups.
Our infrastructure scrapes the following Prometheus exporters:
- [Default JupyterHub Prometheus metrics](https://jupyterhub.readthedocs.io/en/latest/reference/monitoring.html): JupyterHub exposes the `/metrics` endpoint and provides [these metrics](https://jupyterhub.readthedocs.io/en/latest/reference/metrics.html) as standard.
- [kubernetes/kube-state-metrics](https://github.com/kubernetes/kube-state-metrics): a service that listens to the Kubernetes API server and generates metrics about the state of the objects.
- (See the [Support helm chart](https://github.com/2i2c-org/infrastructure/blob/2fc74019a8e6fb21d24dd1ac267dd4f476b59c2d/helm-charts/support/values.yaml#L90) for the labels enabled for `kube-state-metrics`.)
- The [Linux Node exporter](https://grafana.com/oss/prometheus/exporters/node-exporter/): collects Linux system metrics like CPU load and disk I/O.
- [Home directory exporter](https://github.com/yuvipanda/prometheus-dirsize-exporter/tree/main): custom exporter to measure a user's home directory storage usage.
We also make use of the following labelsets:
- `kube_pod_labels`
- `kube_node_labels`
- `kube_service_labels`
- `kube_volumeattachment_labels`
...and we have a view to use `kube_pod_annotations` so that we can join on unescaped JupyterHub usernames where possible, to avoid restrictive label naming (see [GH issue](https://github.com/2i2c-org/infrastructure/issues/5804)).
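For example, once the annotation allowlist from that issue is in place, each user pod should surface a `kube_pod_annotations` series carrying the unescaped username (the exact `annotation_*` label name below is an assumption based on kube-state-metrics' prefixing of `hub.jupyter.org/username`):

```promql
# One series per user pod, value 1, with the unescaped username as a label.
kube_pod_annotations{annotation_hub_jupyter_org_username!=""}
```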
Our default [JupyterHub Grafana Dashboards](https://jupyterhub-grafana.readthedocs.io/en/latest/index.html) organise usage metrics into the following dashboards:
- Activity
- Cluster Information
- Home Directory Usage
- JupyterHub
- NFS and Support Information
- Usage Report
- User Diagnostics
Let's take a look at each of the PromQL queries in turn to see what is needed to allow filtering on user groups.
:small_blue_diamond: Activity
*Running servers:* This uses the metric `jupyterhub_running_servers`, which is scraped from the [default JupyterHub Prometheus metrics](https://jupyterhub.readthedocs.io/en/latest/reference/metrics.html). This does not expose any pod/user-level labels, since the metric is derived from the JupyterHub database. [jupyterhub/user.py](https://github.com/jupyterhub/jupyterhub/blob/742de1311e2671c2f5ea1187c508e60d299a9f63/jupyterhub/user.py#L186) is the upstream JupyterHub code that counts the number of active user servers.
*Daily/Weekly/Monthly Active Users:* This uses the metric `jupyterhub_active_users`, which is scraped from the [default JupyterHub Prometheus metrics](https://jupyterhub.readthedocs.io/en/latest/reference/metrics.html). Again, this does not expose any pod/user-level labels. The upstream code that calculates this metric is in [jupyterhub/metrics.py](https://github.com/jupyterhub/jupyterhub/blob/742de1311e2671c2f5ea1187c508e60d299a9f63/jupyterhub/metrics.py#L378)
:point_right: Verdict: Possible to filter on user groups with upstream work on JupyterHub Prometheus metrics.
:small_blue_diamond: Cluster Information
*Running users:* This uses the metric `kube_pod_status_phase` and sums by `namespace/hub` through grouping on `kube_pod_labels` from `kube-state-metrics`. In contrast to Activity, we are summing over a list of all running pods rather than querying the JupyterHub database for this metric.
*Node count:* This sums counts of `kube_node_labels` and joins them to nodegroup labels (set by `eksctl`, for example) via the node address in the `node` key, e.g. `node="ip-192-168-14-8.us-west-2.compute.internal"`. This is a node-level metric, so it would not be meaningful to segment by user groups, i.e. a user group could have $n$ users spread across 1 node or $n$ nodes.
*Pods not in Running state:* This uses the `kube_pod_status_phase` metric, which contains the pod name that we can use to relate to users and groups.
*Out of Memory Kill Count:* This uses the `node_vmstat_oom_kill` metric scraped from the `node-exporter`. This again is a node level metric.
*Node Stats:* These all make use of metrics from the `node-exporter` and are node-level metrics.
:point_right: Verdict: Currently possible to enable user group filtering on *Running users*.
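To make the join concrete, here is a hedged sketch of *Running users* split by group. It assumes the Part 1 service exposes a `jupyterhub_user_groups{namespace, username, usergroup}` gauge (label names are illustrative), that `kube_pod_annotations` carries the unescaped username as described above, and that each user belongs to a single group (multi-group membership is out of scope):

```promql
# Sketch only: count running user pods per user group rather than per hub.
sum by (namespace, usergroup) (
  label_replace(
    kube_pod_status_phase{phase="Running"}
      * on (namespace, pod) group_left(annotation_hub_jupyter_org_username)
        kube_pod_annotations{annotation_hub_jupyter_org_username!=""},
    "username", "$1", "annotation_hub_jupyter_org_username", "(.+)"
  )
  * on (namespace, username) group_left(usergroup)
    jupyterhub_user_groups
)
```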
:::info
Question
I am less sure about the PromQL for the node CPU and memory commit % / commitment % panels, since those queries are fairly complex.
:::
:small_blue_diamond: Home Directory Usage
All of the metrics in this dashboard are scraped from the [Home directory exporter](https://github.com/yuvipanda/prometheus-dirsize-exporter/tree/main). Every metric is grouped by `directory`, which is derived from `{escaped_username}` when mounting the NFS filesystem in our [basehub/values.yaml config](https://github.com/2i2c-org/infrastructure/blob/cf88ed040292c13f4c24ecf456f001d9ad854c14/helm-charts/basehub/values.yaml#L549).
:point_right: Verdict: Currently possible to enable user group filtering through `{escaped_username}`.
:::info
**Question**
- How does `{escaped_username}` work? Looks like we need to track both unescaped and escaped usernames in the annotations.
- We could make some upstream changes to the [Home directory exporter](https://github.com/yuvipanda/prometheus-dirsize-exporter/tree/main)
:::
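One possible resolution to the question above, sketched purely under assumptions: if the Part 1 exporter also exposed the escaped username (say as an `escaped_username` label), the per-directory metrics could be rolled up per group along these lines (the dirsize metric name is illustrative and not checked against the exporter):

```promql
# Sketch only: total home directory size per user group, joining the exporter's
# directory label (escaped username) to an assumed escaped_username label.
sum by (namespace, usergroup) (
  max by (namespace, directory) (dirsize_total_size_bytes)
  * on (namespace, directory) group_left(usergroup)
    label_replace(jupyterhub_user_groups, "directory", "$1", "escaped_username", "(.+)")
)
```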
:small_blue_diamond: JupyterHub
*Images used by pods:* This sums over `kube_pod_container_info` only. For user group information, we can join `kube_pod_annotations` on the `pod` key.
*User <active age/CPU usage/memory usage> distribution:* These queries all group by `kube_pod_labels` filtered on the `singleuser-server` component, so it should be straightforward to pass through user group information.
*Server start times:* This uses `jupyterhub_server_spawn_duration_seconds_bucket` from the JupyterHub metrics endpoint, which does not expose any user-level information.
*Server start failures:* This uses `jupyterhub_server_spawn_duration_seconds_bucket` filtered on `status!=success`. It does not expose any user-level information.
*Hub response latency/response status codes:* This can be omitted from user group filtering.
*All JupyterHub Components CPU/Memory (Working Set):* This can be omitted from user group filtering.
*Users per node:* This makes use of `kube_pod_info` and `kube_pod_labels`, which can be mapped to user names and therefore user groups.
*Non running pods:* This is the same as *Pods not in Running state* above.
*Hub DB disk space availability:* This can be omitted from user group filtering.
*Free space (%) in shared volume (Home directories, etc.):* This is a node-level metric that does not make sense to segment by user groups.
*Anomalous user pods:* All of these metrics expose pod-level data that can be related to users and user groups.
:point_right: Verdict: Currently possible to enable user group filtering for some but not all metrics.
:small_blue_diamond: NFS and Support Information
:point_right: Verdict: All of the dashboards in this section concern NFS diagnostics and Prometheus support information that does not require user group filtering.
:small_blue_diamond: Usage Report
All of the dashboards in this section are presented as [bar gauges](https://grafana.com/docs/grafana/latest/panels-visualizations/visualizations/bar-gauge/) for each user pod, and display the latest data point from a time series. For user groups, we can sum over metrics by user group instead of by user pod.
*User pod memory usage:* Tracks `container_memory_working_set_bytes` (a cAdvisor metric scraped via the kubelet). This metric contains the `pod` name and can therefore be joined with `kube_pod_labels`.
*Dask-gateway worker pod memory usage:* Dask worker pods have `label_app_kubernetes_io_component="dask-worker"` set in `kube_pod_labels`, which includes `label_hub_jupyter_org_username`.
*Dask-gateway scheduler pod memory usage:* Dask scheduler pods have `label_app_kubernetes_io_component="dask-scheduler"` set in `kube_pod_labels`, which includes `label_hub_jupyter_org_username`.
*GPU pod memory usage:* This grabs GPU usage by filtering `container_memory_working_set_bytes` on nodepool labels, e.g. `"nb-gpu-k80"`. This is again joined with `kube_pod_labels`, which includes `label_hub_jupyter_org_username`.
:point_right: Verdict: Currently possible to enable user group filtering since all usage reports are at the pod level.
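For instance, the per-pod memory bar gauge could be re-aggregated per group using the same join pattern as the *Running users* sketch above (same assumed metric and label names):

```promql
# Sketch only: latest working-set memory summed per user group instead of per pod.
sum by (namespace, usergroup) (
  label_replace(
    container_memory_working_set_bytes{name!=""}
      * on (namespace, pod) group_left(annotation_hub_jupyter_org_username)
        kube_pod_annotations{annotation_hub_jupyter_org_username!=""},
    "username", "$1", "annotation_hub_jupyter_org_username", "(.+)"
  )
  * on (namespace, username) group_left(usergroup)
    jupyterhub_user_groups
)
```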
:small_blue_diamond: User Diagnostics (technically Pod Diagnostics)
*Memory usage:* This is broadly the same PromQL as *Usage Report > User pod memory usage*, except presented as a time series.
*CPU usage:* This uses `container_cpu_usage_seconds_total` and is grouped by `kube_pod_labels`. Similar to *JupyterHub > User CPU Usage distribution* except presented as time series rather than a heat map.
*Home directory usage:* Similar to *Home Directory Usage* except presented as time series rather than a table showing the latest data.
*Memory requests:* This uses the metric `kube_pod_container_resource_requests{resource="memory"}`. This displays a time series per pod per server (not per user per server). This needs to be cleaned up to exclude namespaces such as `support` and `kube-system`, and other service pods such as `shared-dirsize-metrics`, `proxy` and `hub`. Dask scheduler, Dask worker and GPU pods need to be joined to the user name through `kube_pod_labels`.
*CPU requests:* This uses the metric `kube_pod_container_resource_requests{resource="cpu"}`. The same comments as above apply.
:point_right: Verdict: Currently possible to enable user group filtering.
**Grafana dashboard UX**
We want to enable a hub admin to filter and aggregate usage metrics by user group in Grafana. This could be another dropdown box at the top (similar to filtering by "Hub" from a cluster) to filter metrics by user group.
We use jsonnet to configure Grafana dashboard templates, e.g. we use the upstream [jupyterhub/grafana-dashboards](https://github.com/jupyterhub/grafana-dashboards) project for default Grafana dashboards. Existing customisation for filtering by "Hub" from a cluster lives in [infrastructure/grafana-dashboards/common.libsonnet](https://github.com/2i2c-org/infrastructure/blob/main/grafana-dashboards/common.libsonnet), which can be adapted to allow filtering by user group.
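For example, the dropdown could be a Grafana template variable whose query lists the distinct group names from the Part 1 metric, scoped to the currently selected hub (the `$hub` variable name and the `usergroup` label are assumptions carried over from the sketches above):

```promql
# Grafana variable query (Prometheus datasource), not a panel query:
label_values(jupyterhub_user_groups{namespace=~"$hub"}, usergroup)
```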
#### Tasks (>20h)
- [ ] Conduct a spike to learn how existing data processing works in Prometheus (8h)
- [ ] Write PromQL queries that join users/groups to existing metrics where appropriate (16h)
- [ ] Adapt jsonnet to allow filtering by user group UI/UX in Grafana dashboards (8h)
- [ ] Deploy Grafana dashboards to Earthscope staging area for testing and validation (8h)
#### Risks
:::info
Questions
- Is the "User diagnostics" dashboard technically a "Pod diagnostics" dashboard?
- Yes!
- `kube_pod_labels` records `label_hub_jupyter_org_username="google-oauth2-1173224---1e69b61c"` and `pod="jupyter-google-oauth2-117322480787655244438---1e69b61c"`. This is set in [helm-charts/support/values.yaml](https://github.com/2i2c-org/infrastructure/blob/8b9b5627009b358033724203fc0e61e7c82d162e/helm-charts/support/values.yaml#L90). How easy would it be to add something like `label_hub_jupyter_org_groupname`?
:::
#### Contributors
- App Eng
- Infra Eng
### General tasks (16h)
The following sub-tasks apply to the definition of done for each task above:
- [ ] Code review (2x2h = 4h)
- [ ] Documentation (2x2h = 4h)
- [ ] Appropriate tests, including overall integration tests (2x4h = 8h)
## :timer_clock: Timeline
| **Item** | **Hours** | **Note** |
|:----------------------:|:--------:|:--------:|
| 1. Export user and group information from JupyterHub to Prometheus | 32 | |
| 2. Write Prometheus queries joining users/groups to existing metrics | >20 | |
| 3. Add configuration to Grafana dashboards that adds a dropdown menu to filter by user group | 26 | |
| General tasks | 16 | |
| **Total** | > 102 | > 12 working days |