#### Prow Status

* [Operators - Check/Gate failures](https://prow.ci.openshift.org/?job=pull-ci-openstack-k8s-operators*&state=failure)
    * For now, we just look (and investigate) if a test has multiple failures across different PRs
* [Operators - Check/Gate status](https://prow.ci.openshift.org/?job=pull-ci-openstack-k8s-operators*)
* [pre-commit jobs failures](https://prow.ci.openshift.org/?job=pull-ci-openstack-k8s-operators*-main-precommit-check&state=failure)
* [pre-commit jobs status](https://prow.ci.openshift.org/?job=pull-ci-openstack-k8s-operators*-main-precommit-check)
* [unit tests jobs failures](https://prow.ci.openshift.org/?job=pull-ci-openstack-k8s-operators*-operator-main-unit&state=failure)
* [unit test jobs status](https://prow.ci.openshift.org/?job=pull-ci-openstack-k8s-operators*-operator-main-unit)
* [kuttl tests jobs failures](https://prow.ci.openshift.org/?job=pull-ci-openstack-k8s-operators*-operator-build-deploy-kuttl&state=failure)
* [kuttl tests jobs status](https://prow.ci.openshift.org/?job=pull-ci-openstack-k8s-operators*-operator-build-deploy-kuttl)
* [Tempest jobs failures](https://prow.ci.openshift.org/?job=pull-ci-openstack-k8s-operators*-operator-build-deploy-tempest&state=failure)
* [Tempest jobs status](https://prow.ci.openshift.org/?job=pull-ci-openstack-k8s-operators*-operator-build-deploy-tempest)

### How to debug a Prow job failure

* precommit
    * artifacts -> build.log
    * usually linter type errors in make-operator-lint, golangci-lint
* build/deploy
    * artifacts -> build.log
    * search for ...
* kuttl
    * check if it's actually a kuttl failure - it might be the build/deploy step
    * look in artifacts -> -operator-build-deploy-kuttl and check if there is even a kuttl log
    * [example of real kuttl failure](https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_release/40096/rehearse-40096-pull-ci-openstack-k8s-operators-openstack-operator-main-openstack-operator-build-deploy-kuttl/1673755615094116352)
        * openstack-k8s-operators-kuttl/build-log.txt: look for the error there
    * look for the job definition file in https://github.com/openshift/release
        * https://github.com/openshift/release/tree/master/ci-operator/step-registry/openstack-k8s-operators/kuttl
        * example, look in: https://github.com/openshift/release/blob/master/ci-operator/jobs/openstack-k8s-operators/glance-operator/openstack-k8s-operators-glance-operator-main-presubmits.yaml#L118 (all in precommit definition - in the operator folder)
        * also in config: https://github.com/openshift/release/blob/master/ci-operator/config/openstack-k8s-operators/glance-operator/openstack-k8s-operators-glance-operator-main.yaml#L96
    * to find the per operator definition: https://github.com/openstack-k8s-operators/keystone-operator/tree/main/tests/kuttl/tests/change_keystone_config
* tempest

##### Hive status

[Hive metrics to watch cluster pools](https://grafana-route-ci-grafana.apps.ci.l2s4.p1.openshiftapps.com/d/22491886c1e19dde8d2984bca82154c1/cluster-pool-dashboard?orgId=1&from=now-7d&to=now)

What to look for in Hive metrics:

* https://docs.ci.openshift.org/docs/how-tos/cluster-claim/#existing-cluster-pools
* Search for openstack
* Look at the "Ready" number - it shows availability and whether we are short on clusters for scheduled jobs

##### Operator Image Build status

[Dashboard for monitoring operator Container builds](https://github.com/gibizer/openstack-k8s-status)

#### Podified Container Build Lines

* [openstack-periodic-container-master-centos9](https://review.rdoproject.org/zuul/buildsets?pipeline=openstack-periodic-container-master-centos9)
* [openstack-periodic-container-antelope-centos9](https://review.rdoproject.org/zuul/buildsets?pipeline=openstack-periodic-container-antelope-centos9&skip=0)

#### EDPM Tests

CI-framework

* https://review.rdoproject.org/zuul/builds?job_name=cifmw-end-to-end&skip=0
* https://review.rdoproject.org/zuul/builds?job_name=cifmw-end-to-end-nobuild-tagged&skip=0
* https://review.rdoproject.org/zuul/builds?job_name=ci-framework-crc-podified-edpm-deployment&skip=0
* https://review.rdoproject.org/zuul/builds?job_name=ci-framework-crc-podified-edpm-baremetal&skip=0

Edpm-ansible

* https://review.rdoproject.org/zuul/builds?job_name=edpm-ansible-crc-podified-edpm-deployment&skip=0
* https://review.rdoproject.org/zuul/builds?job_name=edpm-ansible-crc-podified-edpm-baremetal&skip=0

Dataplane-operator

* https://review.rdoproject.org/zuul/builds?job_name=dataplane-operator-crc-podified-edpm-baremetal&skip=0
* https://review.rdoproject.org/zuul/builds?job_name=dataplane-operator-crc-podified-edpm-deployment&skip=0

Openstack-operator

* https://review.rdoproject.org/zuul/builds?job_name=openstack-operator-crc-podified-edpm-baremetal&skip=0
* https://review.rdoproject.org/zuul/builds?job_name=openstack-operator-crc-podified-edpm-deployment&skip=0

Openstack-ansibleee-operator

* https://review.rdoproject.org/zuul/builds?job_name=ansibleee-operator-crc-podified-edpm-deployment&skip=0

Openstack-baremetal-operator

* https://review.rdoproject.org/zuul/builds?job_name=openstack-baremetal-operator-crc-podified-edpm-baremetal&skip=0

Periodic Jobs

* https://review.rdoproject.org/zuul/builds?job_name=periodic-podified-edpm-deployment-antelope-ocp-crc-1cs9
* https://review.rdoproject.org/zuul/builds?job_name=periodic-podified-edpm-baremetal-antelope-ocp-crct

### How to debug an EDPM job failure

Example log:

* https://review.rdoproject.org/zuul/build/62a2f4396564439490779bab749bd377 will show the top level error
* "View log" -> job-output.txt to confirm the error
* ci-framework-data/logs/: ansible.log-**date** will show the ansible failure
    * example `"stderr_lines": ["error: timed out waiting for the condition on openstackcontrolplanes/openstack-network-isolation"]`
* edpm folder:
    * events.log - all the events that happened on the cluster
    * operator_pods.txt (any failed operators - copy the pod name)
    * openstack_pods.txt (openstack service pods created during deployment)
    * pods (search for pod name.txt)
    * pv.log (useful for any storage related failure)
    * crs - custom resources generated for each service and operator during deployment
* crc folder:
    * contains all the logs related to the crc cluster
* system-config/libvirt/:
    * Logs related to libvirt vms
* Look for the latest changes to impacted operators in https://github.com/openstack-k8s-operators

Common errors:

* ` *** [Makefile:393: namespace] Error 1\n", "stderr_lines": ["error: You must be logged in to the server (Unauthorized)"`

### Common Errors

#### Memory

```
could not run steps: step unit failed: failed to create or restart test pod: unable to create pod: Pod "unit" is invalid: spec.containers[0].resources.requests: Invalid value: "2171836734": must be less than or equal to memory limit of 2Gi
```

Upper limit error: https://github.com/openshift/release/blob/master/ci-operator/config/openstack-k8s-operators/ironic-operator/openstack-k8s-operators-ironic-operator-main.yaml#L54-L56

```
resources:
  '*':
    limits:
      memory: 2Gi
    requests:
      cpu: 100m
      memory: 200Mi
```

The memory limit needs to be bumped to 4Gi.
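To see why the pod is rejected, the request value from the error message can be compared against the configured limit. The following is a minimal sketch, not part of any CI tooling: the helper names `to_bytes` and `request_fits` are hypothetical, and it assumes only plain byte counts and the binary suffixes Ki/Mi/Gi/Ti appear in the quantities.

```python
# Binary suffixes used by Kubernetes memory quantities.
# Assumption: this sketch handles only plain byte counts and Ki/Mi/Gi/Ti.
BINARY_SUFFIXES = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4}


def to_bytes(quantity: str) -> int:
    """Convert a quantity such as '2Gi' or '2171836734' to a byte count."""
    for suffix, factor in BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # no suffix: already a byte count


def request_fits(request: str, limit: str) -> bool:
    """A pod is only valid when its memory request does not exceed its limit."""
    return to_bytes(request) <= to_bytes(limit)


if __name__ == "__main__":
    request = "2171836734"  # value taken from the error message above
    for limit in ("2Gi", "4Gi"):
        print(limit, request_fits(request, limit))
```

The request of 2171836734 bytes is slightly above 2Gi (2147483648 bytes), so the 2Gi limit fails the check while 4Gi passes.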