# Configuration Refactor Proposal

### What do we want to be able to support?

#### Main goals

- To have standalone helm chart values files for our _helm releases_ (helm chart installations)
  - With standalone values files, we can more easily validate the helm chart values we pass via `helm template`, assuming the helm charts we rely on are bundled with a `values.schema.json` file.
    - This validation should be run on all Pull Requests that touch helm configuration files, since merging is equivalent to deploying and invalid helm config can lead to a misconfigured helm release. This could expose security risks, and helm doesn't necessarily feed this back to us on the command line.
    - Out of scope for this refactoring proposal: defining `values.schema.yaml` files to generate `values.schema.json` files for the helm charts we have defined is tracked in https://github.com/2i2c-org/infrastructure/issues/937. Note that we don't have to do this for the JupyterHub helm chart we depend on, only for the helm charts we define ourselves.
  - With standalone helm chart values files, we can also begin to explore ways in which we can deploy _hubs_ in parallel (not just the clusters)

#### Additional goals

- Clarifying/standardising vocabulary
- To let a helm release's values files be explicitly declared in a `*.cluster.yaml` file
  - An alternative to this would be to rely on a naming convention (which we currently do).
  - While not relying on a naming convention, we look to be consistent for improved human readability.

### Batches of work

_We envision these to be singular Pull Requests that can be made without breaking changes to the main branch_

1. Rename `hub-templates` to `helm-charts` since that folder includes locally defined helm charts
2. Relocate the `support` chart into the `helm-charts` folder since it is itself a helm chart
3. Rename `/config/hubs` to `/config/clusters` since each file currently represents a cluster, not a hub

Instead of listing the helm chart values directly in the `*.cluster.yaml` files, pull them out into individual helm chart values files and then define a list of values files in the `*.cluster.yaml` (example below)

4. Migrate the `support.config` key to `support.helm_chart_values_files` and separate out the config
5. Refactor configuration structure
   i. Migrate the `hubs.*.template` key to be `hubs.*.helm_chart` and the `hubs.*.config` key to be `hubs.*.helm_chart_values_files`, and separate the config into individual files.
   ii. We let each cluster have its own folder under `/config/clusters` that contains all the helm chart values files for each deployment (see example below)
   iii. Each cluster should be recognisable from a file with a fixed name (like `Chart.yaml` for helm charts), e.g. `cluster.yaml`. Note: The value of the `name` field _inside_ `cluster.yaml` should match the name of the cluster folder.
6. Add a new function to the deployer that can generate:
   i. A list of helm chart values files from a given cluster name and hub name
   ii. the reverse

   We can use this in CI to decide which hubs need updating. For example, input a list of changed files into the function, and get out a list of hubs that need to be redeployed, and their cluster names.
7. Update the `deployer validate` function to run `helm template`

   Whenever `helm template` is run, the `helm` CLI will validate the provided configuration values against the helm chart's embedded schema (`values.schema.json`). This will help us catch configuration errors.

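Step 6 could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the deployer's actual code: the function names `values_files_for_hub` and `hubs_for_changed_files`, the `CONFIG_ROOT` constant, and the in-memory `CLUSTERS` dict (standing in for already-parsed `cluster.yaml` files in the proposed layout) are all hypothetical.

```python
from pathlib import PurePosixPath

# Hypothetical root of the proposed layout: config/clusters/<cluster>/...
CONFIG_ROOT = PurePosixPath("config/clusters")

# Hypothetical stand-in for parsed cluster.yaml files, keyed by cluster folder name.
CLUSTERS = {
    "cluster_1": {
        "name": "cluster_1",
        "support": {"helm_chart_values_files": ["support.values.yaml"]},
        "hubs": [
            {"name": "staging", "helm_chart_values_files": ["staging.hub.values.yaml"]},
            {"name": "prod", "helm_chart_values_files": ["prod.hub.values.yaml"]},
        ],
    },
}


def values_files_for_hub(clusters, cluster_name, hub_name):
    """Return the repo-relative values file paths declared for one hub."""
    for hub in clusters[cluster_name]["hubs"]:
        if hub["name"] == hub_name:
            return [
                str(CONFIG_ROOT / cluster_name / filename)
                for filename in hub["helm_chart_values_files"]
            ]
    raise KeyError(f"no hub {hub_name!r} in cluster {cluster_name!r}")


def hubs_for_changed_files(clusters, changed_files):
    """The reverse lookup: sorted (cluster, hub) pairs whose values files changed."""
    changed = {str(PurePosixPath(path)) for path in changed_files}
    affected = set()
    for cluster_name, cluster in clusters.items():
        for hub in cluster["hubs"]:
            files = values_files_for_hub(clusters, cluster_name, hub["name"])
            if changed.intersection(files):
                affected.add((cluster_name, hub["name"]))
    return sorted(affected)
```

In CI, the list of changed files from a Pull Request could be passed to `hubs_for_changed_files` to decide which hubs to redeploy. The sketch deliberately ignores changes to `cluster.yaml` itself and to the support chart's values files; how those should trigger redeployments is left open here.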
### Examples

#### Example directory tree

```
config
└── clusters
    ├── cluster_1
    │   ├── cluster.yaml
    │   ├── prod.hub.values.yaml
    │   ├── staging.hub.values.yaml
    │   └── support.values.yaml
    └── cluster_2
        ├── cluster.yaml
        ├── prod.hub.values.yaml
        ├── staging.hub.values.yaml
        └── support.values.yaml
```

#### Example `cluster.yaml` config

By defining hub deployments as an ordered list of values files, we can share config via `common.values.yaml` or just by listing the required files.

```yaml
name: cluster_1
image_repo: ...
provider: {{ aws | azure | gcp }}
gcp:  # For example...
  key: ...
  project: ...
  cluster: ...
  zone: ...
support:
  helm_chart_values_files:
    - support.values.yaml
hubs:
  - name: staging
    domain: staging.2i2c.cloud
    helm_chart: basehub
    auth0:
      connection: github
    helm_chart_values_files:
      - staging.hub.values.yaml
  - name: prod
    domain: prod.2i2c.cloud
    helm_chart: basehub
    auth0:
      connection: github
    helm_chart_values_files:
      - prod.hub.values.yaml
```

---

# Dump of notes from Sarah + Erik 2022-01-21

## Ideas of PR, ordered chronologically

- we rename hub-templates to helm-charts
- we relocate support folder to helm-charts/support
- we rename config/hubs to config/clusters
- we change cluster config: `support.config` to be `support.helm-chart-values-files`
- we change cluster config: `hubs.*.templates` to be `hubs.*.helm-chart`
- we change cluster config: `hubs.*.config` to be `hubs.*.helm-chart-values-files`
- we let each cluster have its own folder under config/clusters
- we let each cluster config have its own config file that can be recognized by a fixed name (like Chart.yaml for Helm charts).
  - cluster.yaml
  - cluster-config.yaml
  - deployer-script-cluster-config.yaml
- what do we name the folders?
  - we are not sure, it's not critical
  - we probably want those names to match the name declared in the cluster.yaml
- what do we name and how do we organize the helm chart values files?
  - we are not sure, it's not critical
  - maybe something.values.yaml
  - maybe hubname.hub.values.yaml

## Action points

- [name=Sarah] Cleanup this document
  - Sarah aims to clean this up by Monday 24th
- [name=Sarah] Request feedback from the tech team on Monday 24th, with a "please before" some day.

# Dump of notes from Sarah + Erik 2022-01-20

# Action points 2022-01-20

- [name=Erik] Create issue about values.schema.json files
  - DONE: https://github.com/2i2c-org/infrastructure/issues/937
- [name=Erik] Experiment with github workflow files based on an installation graph
  - DONE: conclusion is that it's hard, the exploration didn't lead to clear ideas on how to structure our deployer script config.
- [name=Sarah] Black autoformatting etc in pre-commit.
  - IN PROGRESS: https://github.com/2i2c-org/infrastructure/issues/938
- [name=Erik&Sarah] Follow up on Friday 21st Jan about github workflow experimentation

# Ramble of notes 2022-01-20

### What is staff.yaml?

It is read by the deployer script to inject some credentials etc. To conclude, it is very relevant configuration for the deployer script.

### UX of deployer script

1. The deployer script can only deploy a single helm release. When doing so, it will not consider the dependency structure etc, but instead straight up try to deploy the individual helm chart release.
2. The deployer script has configuration to allow it to create an installation graph.
3. The deployer script can filter the installation graph based on changed files if passed a git reference to compare against
4. Our github workflows should be able to upgrade the helm chart releases based on the installation graph.
   - [name=Erik] will explore if/how this is practically possible

github workflow file:
- you let the deployer script decide on a graph on how to install multiple separate releases

locally:
- you use the deployer script to install individual releases, ignoring `needs`.
  `needs` is only used to generate installation graphs

### Example deployer cluster config file

```yaml
# cluster deployer-script configuration file:
#   lists releases to install
#   each release has a "needs"
#
# example:
support chart release:
  needs: []
staging hub release:
  needs: [support]
  values: [file1, file2]
prod hub release:
  needs: [staging-hub, support]
  values: [file1, file3]
```

### Discussion about helm chart values schema files

- locate them in the Helm chart directory, in values.schema.yaml next to values.yaml and Chart.yaml
- generate a .json representation whenever they are needed

Related issue: https://github.com/2i2c-org/infrastructure/issues/937

### Let the deployer script work with helm releases generically

- avoid having a concept of "support chart" vs "hub prod chart" vs "staging hub deployment"
- helm chart release dependencies (`needs`)
- NOTE: `deployer deploy-support` is a chart installation process that also installs cert-manager first. cert-manager may or may not be something we can install with helm like the other charts, so we should investigate whether we can do that.
- NOTE: `deployer deploy-grafana` isn't a chart installation process, it's a process of updating grafana dashboards

### Detecting relevant files by naming convention vs explicit declaration

To have a convention by naming, or explicit declaration. Erik suggests we aim for explicit declarations as much as possible.

### Out of scope idea

Maybe we can have a way to detect the deletion of a helm chart release, which would then trigger `helm delete`.

# Other notes copied from issues of relevance

This issue was created about one goal, but I hope to see multiple goals solved with the proposed API changes to our configuration that the deployer script reads.

- To have pure helm chart values files allowing us to run `helm template --validate` etc without trouble.
- To enable GitHub workflows to define jobs that trigger effectively, to not redeploy too much or too little.
- To enable a dependency on misc helm charts, like for example jupyterhub-ssh
- To enable a dependency on a cluster support chart or similar.

I'm not sure what proposal of changes to our custom configuration structure is needed, but I'd like us to consider these kinds of challenges when coming up with a config structure proposal.

---

- What we have referred to as a "hub" is practically a helm chart release (an installation of a helm chart).
- Each hub / helm chart release explicitly lists its values files
- Each hub / helm chart release explicitly lists dependencies to other hub / helm chart releases
- We attempt to make the support deployment another hub / helm chart release that the hubs in the cluster explicitly depend on
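
The `needs` idea in the notes above, where each release explicitly lists the releases it depends on, amounts to a dependency graph that can be ordered with a topological sort. The following is a minimal sketch only, assuming Python 3.9+ for the standard library `graphlib` module; the `RELEASES` dict, its release names, and the `installation_order` function are hypothetical and not part of the deployer.

```python
from graphlib import TopologicalSorter

# Hypothetical releases, each listing the releases it `needs` installed first.
RELEASES = {
    "support": {"needs": []},
    "staging-hub": {"needs": ["support"]},
    "prod-hub": {"needs": ["staging-hub", "support"]},
}


def installation_order(releases):
    """Return release names ordered so each release comes after its `needs`."""
    # TopologicalSorter expects a mapping of node -> set of predecessors.
    graph = {name: set(config["needs"]) for name, config in releases.items()}
    # static_order() raises graphlib.CycleError if the `needs` form a cycle.
    return list(TopologicalSorter(graph).static_order())


print(installation_order(RELEASES))
# → ['support', 'staging-hub', 'prod-hub']
```

A workflow generator could walk this order to emit jobs, while local single-release deploys would skip it entirely, matching the note that `needs` is only used to generate installation graphs.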