# CI flakes/failures notes

This document collects CI test flakes and failures tracked by the CI team during the v1.4.0 release cycle.

## 12.12.2022 - 16.12.2022, week 2:

Responsible: Furkat Gofurov

* **12.12.2022:**
  * 2 jobs, [capi-e2e-release-1-2-1-23-1-24](https://testgrid.k8s.io/sig-cluster-lifecycle-cluster-api-1.2#capi-e2e-release-1-2-1-23-1-24) and [capi-e2e-release-1-2-1-22-1-23](https://testgrid.k8s.io/sig-cluster-lifecycle-cluster-api-1.2#capi-e2e-release-1-2-1-22-1-23), have been failing constantly for 5 days:
    - [x] CAPI Issue: https://github.com/kubernetes-sigs/cluster-api/issues/7738
    - [x] Test infra issue: https://github.com/kubernetes/test-infra/issues/28233
    - [x] Debug PRs in test-infra and fix in CAPI:
      - [x] https://github.com/kubernetes/test-infra/pull/28238
      - [x] https://github.com/kubernetes/test-infra/pull/28243
      - [x] https://github.com/kubernetes/test-infra/pull/28241
      - [x] https://github.com/kubernetes-sigs/cluster-api/pull/7505
  * 1 job, [capi-pr-verify-main](https://testgrid.k8s.io/sig-cluster-lifecycle-cluster-api#capi-pr-verify-main), has failed a couple of times and was triaged as a flake:
    - [x] Issue: https://github.com/kubernetes-sigs/cluster-api/issues/7736
* **13.12.2022:**
  * Monitoring the CI signal before the patch releases (v1.2.x & v1.3.x) scheduled for tomorrow. So far the signal is 🟢.
* **14.12.2022:**
  * CI is 🟢, and the new patch releases are out ([v1.3.1](https://github.com/kubernetes-sigs/cluster-api/releases/tag/v1.3.1) and [v1.2.8](https://github.com/kubernetes-sigs/cluster-api/releases/tag/v1.2.8)).
* **15.12.2022:**
  * 1 job, [capi-e2e-release-1-1-1-23-1-24](https://testgrid.k8s.io/sig-cluster-lifecycle-cluster-api-1.1#capi-e2e-release-1-1-1-23-1-24), has been failing since 9th December (it runs every other day).
    - [x] CAPI Issue: https://github.com/kubernetes-sigs/cluster-api/issues/7768
* **16.12.2022:**
  * Working on the fix for #7768.

----

## 19.12.2022 - 23.12.2022, week 3:

Responsible: Alexander Demicev

* **20.12.2022:**
  * [capi-e2e-release-1-1-1-23-1-24](https://testgrid.k8s.io/sig-cluster-lifecycle-cluster-api-1.1#capi-e2e-release-1-1-1-23-1-24) is still failing.
* **21.12.2022:**
  * Same as yesterday.
* **22.12.2022:**
  * All jobs were green except for the one from 20.12.2022.
* **23.12.2022:**

----

## 26.12.2022 - 30.12.2022, week 4:

* 2 jobs, [capi-e2e-release-1-0-1-22-1-23](https://testgrid.k8s.io/sig-cluster-lifecycle-cluster-api-1.0#capi-e2e-release-1-0-1-22-1-23) and [capi-e2e-release-0-4-1-22-1-23](https://testgrid.k8s.io/sig-cluster-lifecycle-cluster-api-0.4#capi-e2e-release-0-4-1-22-1-23), are failing:
  - [ ] CAPI Issue: https://github.com/kubernetes-sigs/cluster-api/issues/7811 ==> this can be ignored until it is decided whether the tests are kept. Leaving it open. It can be marked as done once the tests are dropped or fixed.

----

## January 2nd 2023 - January 6th 2023, week 5:

Watcher: Nawaz

* **January 2nd 2023**
  * Everything looks good so far.
* **January 3rd 2023**
  * The following test jobs seem to be failing:
    * https://testgrid.k8s.io/sig-cluster-lifecycle-cluster-api#capi-e2e-mink8s-main
      * Initial failure on Jan 3rd.
      * **Resolution**: Was a flake. Running green as of EOD.
    * https://testgrid.k8s.io/sig-cluster-lifecycle-cluster-api-0.4#capi-e2e-release-0-4-1-22-1-23
      * Failing consistently since Dec 20th.
      * **Resolution**: [We might not want to keep tests for unsupported version for long](https://github.com/kubernetes-sigs/cluster-api/issues/7811#issuecomment-1370573057)
    * https://testgrid.k8s.io/sig-cluster-lifecycle-cluster-api-1.0#capi-e2e-release-1-0-1-22-1-23
      * Failing consistently since Dec 20th.
      * **Resolution**: [We might not want to keep tests for unsupported version for long](https://github.com/kubernetes-sigs/cluster-api/issues/7811#issuecomment-1370573057)
    * https://testgrid.k8s.io/sig-cluster-lifecycle-cluster-api#capi-e2e-main
      * Has been flaky: it failed 4 times consecutively but then went green again.
      * **Resolution**: Was a flake. Running green as of EOD.
* **January 4th 2023**
  * https://testgrid.k8s.io/sig-cluster-lifecycle-cluster-api#capi-pr-e2e-main
    * This job is flaky, but PR jobs can be flaky for a multitude of reasons, so it need not be pursued.
* **January 5th 2023**
  * https://testgrid.k8s.io/sig-cluster-lifecycle-cluster-api-0.4#capi-e2e-release-0-4-1-22-1-23
    * Still failing since Dec 20th.
    * There is a PR from Stefan addressing this: https://github.com/kubernetes-sigs/cluster-api/pull/7856
  * https://testgrid.k8s.io/sig-cluster-lifecycle-cluster-api-1.0#capi-e2e-release-1-0-1-22-1-23
    * Still failing since Dec 20th.
    * There is a PR from Stefan addressing this: https://github.com/kubernetes-sigs/cluster-api/pull/7856
  * https://testgrid.k8s.io/sig-cluster-lifecycle-cluster-api-1.2#capi-e2e-release-1-2
    * Has been flaky: parts of it have failed intermittently, making the whole job fail four times consecutively.
    * This job runs every 4 hours. Depending on the next run's output, a new issue will be raised for it.
  * https://testgrid.k8s.io/sig-cluster-lifecycle-cluster-api-1.3#capi-pr-e2e-informing-release-1-3
    * Has failed three times since Jan 4th 23:20 PST.
    * Will wait for another run and raise the alarm if it fails again.
    * Raised https://github.com/kubernetes-sigs/cluster-api/issues/7858 to track this.

----

## 09.01.2023 - 13.01.2023, week 6:

Responsible: Furkat Gofurov

* **09.01.2023:**
  * 2 jobs have been failing since last week:
    * [capi-e2e-release-1-3](https://testgrid.k8s.io/sig-cluster-lifecycle-cluster-api-1.3#capi-e2e-release-1-3)
    * [capi-e2e-release-1-2](https://testgrid.k8s.io/sig-cluster-lifecycle-cluster-api-1.2#capi-e2e-release-1-2)
    - [x] Tracking CAPI [Issue](https://github.com/kubernetes-sigs/cluster-api/issues/7858)
    - [x] Expected fix for the above issue: backport the main-branch PR [7856](https://github.com/kubernetes-sigs/cluster-api/pull/7856) into the release-1.2/1.3 branches
      - [x] Backport [PR](https://github.com/kubernetes-sigs/cluster-api/pull/7871) to release-1.3
      - [x] Backport [PR](https://github.com/kubernetes-sigs/cluster-api/pull/7870) to release-1.2
* **10.01.2023:**
  - Both backport PRs landed. Watching the CI closely.
* **11.01.2023:**
  - CI is green, and the patch releases are out.
* **12.01.2023:**
  - All green.
* **13.01.2023:**
  - [x] [capi-pr-e2e-full-main](https://testgrid.k8s.io/sig-cluster-lifecycle-cluster-api#capi-pr-e2e-full-main) failed twice, but it is a PR job. Monitoring for now; no action needed.

----

## 16.01.2023 - 20.01.2023, week 7:

Responsible: Aniruddha Basak

* **16.01.2023:**
  * All green, but the majority of the [sig-cluster-lifecycle-cluster-api-1.3](https://testgrid.k8s.io/sig-cluster-lifecycle-cluster-api-1.3) dashboard is flaky.
* **17.01.2023:**
  - Same as yesterday.
* **18.01.2023:**
  - Most of the [sig-cluster-lifecycle-cluster-api-1.3](https://testgrid.k8s.io/sig-cluster-lifecycle-cluster-api-1.3) dashboard is flaky.
  - Failing tests:
    - [capi-e2e-release-1-0-1-22-1-23](https://testgrid.k8s.io/sig-cluster-lifecycle-cluster-api-1.0#capi-e2e-release-1-0-1-22-1-23)
    - [capi-e2e-release-0-4-1-22-1-23](https://testgrid.k8s.io/sig-cluster-lifecycle-cluster-api-0.4#capi-e2e-release-0-4-1-22-1-23)
* **19.01.2023:**
  - Same as yesterday.
* **20.01.2023:**
  - All flaky tests in [sig-cluster-lifecycle-cluster-api-1.3](https://testgrid.k8s.io/sig-cluster-lifecycle-cluster-api-1.3) are green.

**Conclusion:** CI was mostly green. The only open follow-up item is tracking the open issue https://github.com/kubernetes-sigs/cluster-api/issues/7858 and closing it depending on the CI signal.

----

## 23.01.2023 - 27.01.2023, week 8:

Responsible: Alex Demicev

* **23.01.2023:**
  - mink8s e2e is failing: https://testgrid.k8s.io/sig-cluster-lifecycle-cluster-api#capi-e2e-mink8s-main. Issue tracking thread: https://kubernetes.slack.com/archives/C8TSNPY4T/p1674469385575859
  - capi-e2e-release-1-0-1-22-1-23 and capi-e2e-release-0-4-1-22-1-23 are still failing since last week: https://kubernetes.slack.com/archives/C8TSNPY4T/p1674057996218879
* **24.01.2023:**
  - mink8s is OK now.
* **25.01.2023:**
  - Every job looks fine.
* **26.01.2023:**
  - All fine, apart from some failures that we decided to ignore.
* **27.01.2023:**
  - Same as yesterday.

----

## January 30th 2023 - February 3rd 2023, week 9:

Watcher: Nawaz

* **January 30th 2023**
  * [capi-test-release-0-3](https://testgrid.k8s.io/sig-cluster-lifecycle-cluster-api-0.3#capi-test-release-0-3) has been failing since Jan 25th with the error below:

    ```
    === RUN TestKubeadmConfigReconciler_Reconcile_GenerateCloudConfigData
    I0130 03:49:32.253250 8681 kubeadmconfig_controller.go:366] "msg"="Creating BootstrapData for the init control plane" "kind"="Machine" "kubeadmconfig"={"Namespace":"default","Name":"control-plane-init-cfg"} "name"="control-plane-init-machine" "version"="1"
    I0130 03:49:32.254219 8681 kubeadmconfig_controller.go:813] "msg"="Altering ClusterConfiguration" "kubeadmconfig"="default/control-plane-init-cfg" "ClusterName"="cluster"
    I0130 03:49:32.254256 8681 kubeadmconfig_controller.go:839] "msg"="Altering ClusterConfiguration" "kubeadmconfig"="default/control-plane-init-cfg" "KubernetesVersion"="v1.19.1"
    --- FAIL: TestKubeadmConfigReconciler_Reconcile_GenerateCloudConfigData (1.43s)
    panic: runtime error: invalid memory address or nil pointer dereference [recovered]
    	panic: runtime error: invalid memory address or nil pointer dereference
    [signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0xaa1d4a]
    ```
  * [periodic-cluster-api-e2e-workload-upgrade-1-22-1-23-release-0-4](https://testgrid.k8s.io/sig-cluster-lifecycle-cluster-api-0.4#capi-e2e-release-0-4-1-22-1-23) has been failing since Jan 17th. Need to follow up on whether this test should be ignored.
  * [periodic-cluster-api-e2e-workload-upgrade-1-22-1-23-release-1-0](https://testgrid.k8s.io/sig-cluster-lifecycle-cluster-api-1.0#capi-e2e-release-1-0-1-22-1-23) has been failing since Jan 17th. Need to follow up on whether this test should be ignored.
  * [pull-cluster-api-e2e-informing-release-1-3](https://testgrid.k8s.io/sig-cluster-lifecycle-cluster-api-1.3#capi-pr-e2e-informing-release-1-3) has been failing since Jan 18th. Need to check whether this is an important test signal.
* **January 31st 2023**
  * Everything looks normal, except:
    * https://testgrid.k8s.io/sig-cluster-lifecycle-cluster-api-0.3#capi-test-release-0-3
    * https://testgrid.k8s.io/sig-cluster-lifecycle-cluster-api-0.4#capi-test-release-0-4
    * https://testgrid.k8s.io/sig-cluster-lifecycle-cluster-api-1.0#capi-test-release-1-0
* **February 1st 2023**
  * Everything looks normal, except:
    * https://testgrid.k8s.io/sig-cluster-lifecycle-cluster-api-0.3#capi-test-release-0-3
    * https://testgrid.k8s.io/sig-cluster-lifecycle-cluster-api-0.4#capi-test-release-0-4
    * https://testgrid.k8s.io/sig-cluster-lifecycle-cluster-api-1.0#capi-test-release-1-0
* **February 2nd 2023**
  * Everything looks normal.
* **February 3rd 2023**
  * Everything looks normal.

----

## 06.02.2023 - 10.02.2023, week 10:

Responsible: Furkat Gofurov

* **06.02.2023:**
  * All green (disregarding failing jobs in the EOL branches, which are soon to be removed).
* **07.02.2023:**
  * A few jobs were failing due to an image bump. The fix in https://github.com/kubernetes/test-infra/pull/28654 has landed.
* **08.02.2023:**
  * All green, with a few flaky tests in some jobs (capi-e2e, capi-e2e-mink8s).
* **09.02.2023:**
  * All green.
* **10.02.2023:**
  * All green.

----

## 20.02.2023 - 24.02.2023, week 12:

Responsible: Aniruddha Basak

* **20.02.2023:**
  * CI failing:
    * [capi-e2e-main](https://testgrid.k8s.io/sig-cluster-lifecycle-cluster-api#capi-e2e-main)
    * [capi-e2e-mink8s-main](https://testgrid.k8s.io/sig-cluster-lifecycle-cluster-api#capi-e2e-mink8s-main)
* **21.02.2023:**
  * All green.
* **22.02.2023:**
  * All green.
* **23.02.2023:**
  * All green.
* **24.02.2023:**
  * All green.

----

## February 27th 2023 - March 3rd 2023, week 13:

Watcher: Nawaz

* February 27th
  * Everything looks green.
* February 28th
  * Everything looks green.
* March 1st
  * Everything looks green.
* March 2nd
  * Everything looks green.
* March 3rd
  * Everything looks green.

----

## March 6th 2023 - March 10th 2023, week 14:

Responsible: Furkat

* March 6th
  * All 🟢
* March 7th
  * All 🟢
* March 8th
  * `capi-e2e-main` job is flaky; tracking.
* March 9th
  * `capi-e2e-main` job is flaky.
* March 10th
  * `capi-e2e-main` job is extremely flaky but not red yet. It might need escalation (e.g. an issue).

----

## March 20th 2023 - March 24th 2023, week 16:

Responsible: Furkat

* March 20
  * All 🟢
* March 21
  * All 🟢
* March 22
  * All 🟢
* March 23
  * All 🟢
* March 24
  * All 🟢

----

## March 27th - March 31st 2023, week 17:

Watcher: Nawaz K

* March 27th
  * Mostly green 🟢, with some flakes
* March 28th
  * Mostly green 🟢
* March 29th
  * Mostly green 🟢
* March 30th
  * Mostly green 🟢
* March 31st
  * Mostly green 🟢