# Installation ## Operator The scenarios related to Installation only cover the targeted version of OpenShift i.e 4.13 ### Positive: - [ ] Kepler operator installs fine from Community Catalog :::warning This is to be verified from OpenShift Console ::: - [ ] Kepler operator installs fine from Operator-SDK. :::warning This is to be verified by Upstream/Downstream CI. ::: - [ ] Kepler operator should be able to deploy/handle any dependencies required. :::warning This includes the ability of the operator to provision, configure, and manage resources such as pods, services, volumes, etc. ::: - [ ] Following resources should get deployed as part of Operator Installation: - [ ] `kepler-operator-controller-manager deployment` - [ ] `kube-rbac-proxy` - [ ] `manager` - [ ] `kepler-operator-controller-manager replicaset` - [ ] `kepler-operator-controller-manager-metrics-service` - [ ] `Kepler CRD` - [ ] `kepler-operator-controller-manager Service Account` - [ ] `roles` - [ ] `rolebindings` - [ ] `lease` - [ ] `servicemonitor` -> TBD - [ ] `kepler-operator CSV` - [ ] `installplan` - [ ] `subscription` - [ ] `packagemanifests` - [ ] `catalog` ### Negative: - [ ] OpenShift console should log appropiate error in case of unsucessful installation of Operator. :::warning This is to be verified from OpenShift Console. It can be related to any failure scenario or in-case of unsuccessful installation. ::: :::spoiler **Out of Dev Preview** - [ ] Kepler operator installs fine on namespace other than openshift-operators. :::warning Try deploying the operator on different namespaces. ::: - [ ] Kepler Operator should log appropriate status in case of successful installation :::warning On the OpenShift Console side we should be able to see the success status ::: ::: ## Kepler Instance ### Positive: - [ ] Kepler operator when installed from Community Catalog can create a Kepler instance successfully. :::warning This is to be verified from OpenShift console Kepler operator can reconcile `kepler` CR ::: - [ ] Kepler operator when installed from Operator-SDK can deploy Kepler instance when Kepler template is applied. :::warning This is to be verified by Upstream/Downstream CI. ::: - [ ] Following resources should get deployed as part of Kepler Instance Creation: - [ ] `openshift-kepler-operator namespace` - [ ] `clusterrole` - [ ] `scc` - [ ] `clusterrolebindings` - [ ] `kepler-exporter-ds` - [ ] `kepler-exporter-cm` - [ ] `kepler-exporter-svc` - [ ] Kepler operator only exposes `exporter port` at the time of configuring the Kepler Instance. - [ ] Kepler Instance should be able to run with different port number. :::warning This is to be verified from OpenShift console ::: - [ ] Appropriate status should be reflected for creation of Instance(Reconcillation status). :::warning This includes whether Kepler Instance was deployed successfully on cluster nodes and logs any error on controller manager. ::: :::spoiler **Out of Dev Preview** - [ ] After Kepler Instance creation Operator should show Instance related objects. :::warning This includes what all get deployed as part of Instance creation like daemonsets, services, namespace etc. ::: - [ ] Kepler exporter daemonsets should only be created on nodes that has label attached to them. :::warning If the label is attached to nodes then Kepler instance should respect that else it should stick to the default behavior i.e deploy on all the available nodes. ::: ::: ### Negative: - [ ] OpenShift/Kubernetes cluster log error if an invalid port value is passed in the Instance configuration. - [ ] OpenShift/Kubernetes cluster logs an appropriate error if an invalid/wrong configuration key/value is passed. - [ ] Kepler Operator logs an appropriate error if no nodes are available for deployment of Kepler Instance. :::warning Appropriate status should be reflected in this case. ::: - [ ] Appropriate logging of error if an invalid name of Kepler instance is provided. :::warning Kepler-Operator should only respect `kepler` as the instance name. Other than that should be considered invalid. ::: - [ ] Kepler controller manager logs an appropriate error if Instance creation is unable to create required resources. :::warning Availability status should also be updated ::: - [ ] Kepler controller manager logs an appropriate error if it is unable to pull Kepler Image with a pre-configured tag. :::warning Availability status should also be updated ::: # Metrics :::spoiler **Kepler Operator Metrics(Out of scope for Dev preview)** - [ ] Kepler controller manager should be able to expose metrics at the configured port. - [ ] Metric HTTP endpoint should be reachable either via port-forwarding or creating a route on OpenShift. - [ ] Following metrics should be available and updated accordingly: - [ ] `controller_runtime_reconcile_total` - [ ] `controller_runtime_max_concurrent_reconciles` - [ ] `leader_election_master_status` ::: ## Kepler Exporter Metrics - [ ] Kepler exporter should be able to expose metrics at port 9103 - [ ] Metric HTTP endpoint should be reachable either via port-forwarding or creating a route on OpenShift. - [ ] Following metrics should be available and updated accordingly: :::info These metrics are used inside our grafana dashboard. ::: - [ ] `kepler_container_package_joules_total` - [ ] `kepler_container_dram_joules_total` - [ ] `kepler_container_other_host_components_joules_total` - [ ] `kepler_container_gpu_joules_total` # Uninstallation ## Kepler Operator ### Positive - [ ] Kepler Operator should get uninstalled successfully when deployed using OpenShift console. - [ ] Kepler Operator should get uninstalled successfully when deployed using Operator-SDK. - [ ] Following resources should get uninstalled: - [ ] `kepler-operator-controller-manager deployment` - [ ] `kube-rbac-proxy` - [ ] `manager` - [ ] `kepler-operator-controller-manager replicaset` - [ ] `kepler-operator-controller-manager-metrics-service` - [ ] `kepler-operator-controller-manager Service Account` - [ ] `kepler-operator CSV` - [ ] `subscription` - [ ] `servicemonitor` -> **TBD** ### Negative - [ ] Appropriate error must be logged in case of unable to remove Operator from the cluster. :::warning Reconcillation should report the appropiate reason. ::: ## Kepler Instance ### Positive - [ ] Kepler Instance should get uninstalled successfully using OpenShift Console. :::warning This is to be verified from OpenShift console ::: - [ ] Kepler Operator should get uninstalled successfully using Operator-SDK. :::warning This is to be verified by Upstream/Downstream CI. ::: - [ ] Following resources should get uninstalled: - [ ] `openshift-kepler-operator namespace` - [ ] `clusterrole` - [ ] `scc` - [ ] `clusterrolebindings` - [ ] `kepler-exporter-ds` - [ ] `kepler-exporter-cm` - [ ] `kepler-exporter-svc` ### Negative - [ ] Controller manager logs an appropriate error if it is unable to remove Kepler Instance from the cluster and availability status should be updated accordingly. :::warning If something isn't able to uninstall then the related objects should be present on cluster for debugging purposes. ::: # Grafana Dashboard ## Positive - [ ] `deploy-grafana.sh` should be able to deploy/configure the Grafana dashboard. - [ ] Configuring `user-workload-monitoring` if not already configured. - [ ] Install Grafana Operator inside `openshift-kepler-operator` namespace. - [ ] Configure tokens - [ ] Configure Service monitors - [ ] Add Grafana dashboard source. - [ ] Grafana dashboard should not show CO2 carbon emission panels when deployed. (**For Dev Preview**) - [ ] Grafana dashboard should be able to query data from `user-workload-monitoring` Prometheus. - [ ] Grafana dashboard should show tooltips with relevant information on each panel. - [ ] All the panels inside dashboard should update accordingly. ## Negative - [ ] `deploy-grafana.sh` should log an appropriate error in case of any failure.