# CI flakes/failures notes
This document collects the CI test flakes and failures tracked by the CI team during the v1.4.0 release cycle.
## 12.12.2022 - 16.12.2022, week2:
Responsible: Furkat Gofurov
* **12.12.2022:**
* 2 jobs [capi-e2e-release-1-2-1-23-1-24](https://testgrid.k8s.io/sig-cluster-lifecycle-cluster-api-1.2#capi-e2e-release-1-2-1-23-1-24) and [capi-e2e-release-1-2-1-22-1-23](https://testgrid.k8s.io/sig-cluster-lifecycle-cluster-api-1.2#capi-e2e-release-1-2-1-22-1-23) have been failing constantly for 5 days:
- [x] CAPI Issue: https://github.com/kubernetes-sigs/cluster-api/issues/7738
- [x] Test infra issue: https://github.com/kubernetes/test-infra/issues/28233
- [x] Debug PRs in test-infra and fix in CAPI:
- [x] https://github.com/kubernetes/test-infra/pull/28238
- [x] https://github.com/kubernetes/test-infra/pull/28243
- [x] https://github.com/kubernetes/test-infra/pull/28241
- [x] https://github.com/kubernetes-sigs/cluster-api/pull/7505
* 1 job [capi-pr-verify-main](https://testgrid.k8s.io/sig-cluster-lifecycle-cluster-api#capi-pr-verify-main) has failed a couple of times, triaged as a flake:
- [x] Issue: https://github.com/kubernetes-sigs/cluster-api/issues/7736
* **13.12.2022:**
* Monitoring the CI signal before the patch releases (v1.2.x & v1.3.x) scheduled for tomorrow; so far we have a 🟢 signal
* **14.12.2022:**
* CI is 🟢, new patch releases are out ([v1.3.1](https://github.com/kubernetes-sigs/cluster-api/releases/tag/v1.3.1) and [v1.2.8](https://github.com/kubernetes-sigs/cluster-api/releases/tag/v1.2.8))
* **15.12.2022:**
* 1 job [capi-e2e-release-1-1-1-23-1-24](https://testgrid.k8s.io/sig-cluster-lifecycle-cluster-api-1.1#capi-e2e-release-1-1-1-23-1-24) failing since 9th December (runs every other day).
- [x] CAPI Issue: https://github.com/kubernetes-sigs/cluster-api/issues/7768
* **16.12.2022:**
* Working on the fix for #7768
----
## 19.12.2022 - 23.12.2022, week3:
Responsible: Alexander Demicev
* **20.12.2022:**
* [capi-e2e-release-1-1-1-23-1-24](https://testgrid.k8s.io/sig-cluster-lifecycle-cluster-api-1.1#capi-e2e-release-1-1-1-23-1-24) is still failing.
* **21.12.2022:**
* same as yesterday
* **22.12.2022:**
* all jobs were green except for the one from 20.12.2022
* **23.12.2022:**
----
## 26.12.2022 - 30.12.2022, week4:
* 2 jobs [capi-e2e-release-1-0-1-22-1-23](https://testgrid.k8s.io/sig-cluster-lifecycle-cluster-api-1.0#capi-e2e-release-1-0-1-22-1-23) and [capi-e2e-release-0-4-1-22-1-23](https://testgrid.k8s.io/sig-cluster-lifecycle-cluster-api-0.4#capi-e2e-release-0-4-1-22-1-23) are failing:
- [ ] CAPI Issue: https://github.com/kubernetes-sigs/cluster-api/issues/7811 ==> this can be ignored until it is decided whether the tests will be kept. Leaving it open; it can be marked as done once the tests are dropped or fixed.
----
## January 2nd 2023 - January 6th 2023, week 5:
Watcher: Nawaz
* **January 2nd 2023**
* everything looks good so far.
* **January 3rd 2023**
* The following test jobs seem to be failing:
* https://testgrid.k8s.io/sig-cluster-lifecycle-cluster-api#capi-e2e-mink8s-main
* Initial failure on Jan 3rd.
* **Resolution**: Was a flake. Running green by EOD.
* https://testgrid.k8s.io/sig-cluster-lifecycle-cluster-api-0.4#capi-e2e-release-0-4-1-22-1-23
* Failing consistently since Dec 20th
* **Resolution**: [We might not want to keep tests for unsupported version for long](https://github.com/kubernetes-sigs/cluster-api/issues/7811#issuecomment-1370573057)
* https://testgrid.k8s.io/sig-cluster-lifecycle-cluster-api-1.0#capi-e2e-release-1-0-1-22-1-23
* Failing consistently since Dec 20th
* **Resolution**: [We might not want to keep tests for unsupported version for long](https://github.com/kubernetes-sigs/cluster-api/issues/7811#issuecomment-1370573057)
* https://testgrid.k8s.io/sig-cluster-lifecycle-cluster-api#capi-e2e-main
* Has been flaky; failed 4 times consecutively but then went green again
* **Resolution**: Was a flake. Running green by EOD.
* **January 4th 2023**
* https://testgrid.k8s.io/sig-cluster-lifecycle-cluster-api#capi-pr-e2e-main
* This is flaky, but it could be flaky for a multitude of reasons and need not be pursued, as these are PR tests.
* **January 5th 2023**
* https://testgrid.k8s.io/sig-cluster-lifecycle-cluster-api-0.4#capi-e2e-release-0-4-1-22-1-23
* Still failing since Dec 20th.
* We have a PR addressing this from Stefan https://github.com/kubernetes-sigs/cluster-api/pull/7856
* https://testgrid.k8s.io/sig-cluster-lifecycle-cluster-api-1.0#capi-e2e-release-1-0-1-22-1-23
* Still failing since Dec 20th.
* We have a PR addressing this from Stefan https://github.com/kubernetes-sigs/cluster-api/pull/7856
* https://testgrid.k8s.io/sig-cluster-lifecycle-cluster-api-1.2#capi-e2e-release-1-2
* Has been flaky; parts of it have failed intermittently, making the whole test fail four times consecutively.
* This test runs every 4 hours. Depending on the next run's output, will raise a new issue for it.
* https://testgrid.k8s.io/sig-cluster-lifecycle-cluster-api-1.3#capi-pr-e2e-informing-release-1-3
* Has failed three times since Jan 4th, 23:20 PST.
* Will wait for another run and raise an alarm if it fails again.
* Raised https://github.com/kubernetes-sigs/cluster-api/issues/7858 to track this.
----
## 09.01.2023 - 13.01.2023, week6:
Responsible: Furkat Gofurov
* **09.01.2023:**
* 2 jobs have been failing since last week:
* [capi-e2e-release-1-3](https://testgrid.k8s.io/sig-cluster-lifecycle-cluster-api-1.3#capi-e2e-release-1-3)
* [capi-e2e-release-1-2](https://testgrid.k8s.io/sig-cluster-lifecycle-cluster-api-1.2#capi-e2e-release-1-2)
- [x] Tracking CAPI [Issue](https://github.com/kubernetes-sigs/cluster-api/issues/7858)
- [x] Expected fix in main branch for above issue: backport PRs of [7856](https://github.com/kubernetes-sigs/cluster-api/pull/7856) into release-1.2/1.3 branches
- [x] backport [PR](https://github.com/kubernetes-sigs/cluster-api/pull/7871) to release-1.3
- [x] backport [PR](https://github.com/kubernetes-sigs/cluster-api/pull/7870) to release-1.2
* **10.01.2023:**
- both backport PRs landed, watching the CI closely.
* **11.01.2023:**
- CI is green, patch releases are out.
* **12.01.2023:**
- All green.
* **13.01.2023:**
- [x] [capi-pr-e2e-full-main](https://testgrid.k8s.io/sig-cluster-lifecycle-cluster-api#capi-pr-e2e-full-main) failed twice, but it is a PR job. Monitoring for now, no actions needed.
----
## 16.01.2023 - 20.01.2023, week7:
Responsible: Aniruddha Basak
* **16.01.2023:**
* All green, but the majority of the [sig-cluster-lifecycle-cluster-api-1.3](https://testgrid.k8s.io/sig-cluster-lifecycle-cluster-api-1.3) dashboard is flaky.
* **17.01.2023:**
- Same as yesterday.
* **18.01.2023:**
- Most of the [sig-cluster-lifecycle-cluster-api-1.3](https://testgrid.k8s.io/sig-cluster-lifecycle-cluster-api-1.3) is flaky.
- Failing tests -
- [capi-e2e-release-1-0-1-22-1-23](https://testgrid.k8s.io/sig-cluster-lifecycle-cluster-api-1.0#capi-e2e-release-1-0-1-22-1-23)
- [capi-e2e-release-0-4-1-22-1-23](https://testgrid.k8s.io/sig-cluster-lifecycle-cluster-api-0.4#capi-e2e-release-0-4-1-22-1-23)
* **19.01.2023:**
- Same as yesterday.
* **20.01.2023:**
- All [sig-cluster-lifecycle-cluster-api-1.3](https://testgrid.k8s.io/sig-cluster-lifecycle-cluster-api-1.3) flaky tests are green.
**Conclusion:** CI was mostly green. The only open item to follow up on is tracking the open issue https://github.com/kubernetes-sigs/cluster-api/issues/7858 and closing it depending on the CI signal.
----
## 23.01.2023 - 27.01.2023, week8:
Responsible: Alex Demicev
* **23.01.2023:**
- mink8s e2e is failing: https://testgrid.k8s.io/sig-cluster-lifecycle-cluster-api#capi-e2e-mink8s-main, issue tracking thread: https://kubernetes.slack.com/archives/C8TSNPY4T/p1674469385575859
- capi-e2e-release-1-0-1-22-1-23, capi-e2e-release-0-4-1-22-1-23: still failing from last week - https://kubernetes.slack.com/archives/C8TSNPY4T/p1674057996218879
* **24.01.2023:**
- mink8s is OK now
* **25.01.2023:**
- Every job looks fine
* **26.01.2023:**
- All was fine, apart from some failures that we decided to ignore
* **27.01.2023:**
- same as yesterday
----
## January 30th 2023 - February 3rd 2023, week 9:
Watcher: Nawaz
* **January 30th 2023**
* [capi-test-release-0-3](https://testgrid.k8s.io/sig-cluster-lifecycle-cluster-api-0.3#capi-test-release-0-3) has been failing since Jan 25th with the below error:
- ```
=== RUN TestKubeadmConfigReconciler_Reconcile_GenerateCloudConfigData
I0130 03:49:32.253250 8681 kubeadmconfig_controller.go:366] "msg"="Creating BootstrapData for the init control plane" "kind"="Machine" "kubeadmconfig"={"Namespace":"default","Name":"control-plane-init-cfg"} "name"="control-plane-init-machine" "version"="1"
I0130 03:49:32.254219 8681 kubeadmconfig_controller.go:813] "msg"="Altering ClusterConfiguration" "kubeadmconfig"="default/control-plane-init-cfg" "ClusterName"="cluster"
I0130 03:49:32.254256 8681 kubeadmconfig_controller.go:839] "msg"="Altering ClusterConfiguration" "kubeadmconfig"="default/control-plane-init-cfg" "KubernetesVersion"="v1.19.1"
--- FAIL: TestKubeadmConfigReconciler_Reconcile_GenerateCloudConfigData (1.43s)
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0xaa1d4a]
```
- [periodic-cluster-api-e2e-workload-upgrade-1-22-1-23-release-0-4](https://testgrid.k8s.io/sig-cluster-lifecycle-cluster-api-0.4#capi-e2e-release-0-4-1-22-1-23) has been failing since Jan 17th; need to follow up on whether this test is to be ignored.
- [periodic-cluster-api-e2e-workload-upgrade-1-22-1-23-release-1-0](https://testgrid.k8s.io/sig-cluster-lifecycle-cluster-api-1.0#capi-e2e-release-1-0-1-22-1-23) has been failing since Jan 17th; need to follow up on whether this test needs to be ignored.
- [pull-cluster-api-e2e-informing-release-1-3](https://testgrid.k8s.io/sig-cluster-lifecycle-cluster-api-1.3#capi-pr-e2e-informing-release-1-3) has been failing since Jan 18th; need to check whether this is an important test signal.
- **January 31st 2023**
- Everything looks normal, except
- https://testgrid.k8s.io/sig-cluster-lifecycle-cluster-api-0.3#capi-test-release-0-3
- https://testgrid.k8s.io/sig-cluster-lifecycle-cluster-api-0.4#capi-test-release-0-4
- https://testgrid.k8s.io/sig-cluster-lifecycle-cluster-api-1.0#capi-test-release-1-0
- **February 1st 2023**
- Everything looks normal, except
- https://testgrid.k8s.io/sig-cluster-lifecycle-cluster-api-0.3#capi-test-release-0-3
- https://testgrid.k8s.io/sig-cluster-lifecycle-cluster-api-0.4#capi-test-release-0-4
- https://testgrid.k8s.io/sig-cluster-lifecycle-cluster-api-1.0#capi-test-release-1-0
- **February 2nd 2023**
- Everything looks normal
- **February 3rd 2023**
- Everything looks normal
----
## 06.02.2023 - 10.02.2023, week 10:
Responsible: Furkat Gofurov
* **06.02.2023:**
* all green (disregarding failing jobs in the EOL branches, which are soon to be removed)
* **07.02.2023:**
* A few jobs were failing due to an image bump; the fix in https://github.com/kubernetes/test-infra/pull/28654 has landed
* **08.02.2023:**
* All green, with a few flaky tests in some jobs (capi e2e, capi-e2e-mink8s)
* **09.02.2023:**
* all green
* **10.02.2023:**
* all green
----
## 20.02.2023 - 24.02.2023, week 12:
Responsible: Aniruddha Basak
* **20.02.2023:**
* CI Failing -
* [capi-e2e-main](https://testgrid.k8s.io/sig-cluster-lifecycle-cluster-api#capi-e2e-main)
* [capi-e2e-mink8s-main](https://testgrid.k8s.io/sig-cluster-lifecycle-cluster-api#capi-e2e-mink8s-main)
* **21.02.2023:**
* All green
* **22.02.2023:**
* All green
* **23.02.2023:**
* All green
* **24.02.2023:**
* All green
----
## February 27th 2023 - March 3rd 2023, week 13:
Watcher: Nawaz
* February 27th
* Everything looks green
* February 28th
* Everything looks green
* March 1st
* Everything looks green
* March 2nd
* Everything looks green
* March 3rd
* Everything looks green
----
## March 6th 2023 - March 10th 2023, week 14:
Responsible: Furkat
* March 6th
* All 🟢
* March 7th
* All 🟢
* March 8th
* `capi-e2e-main` job is flaky, tracking
* March 9th
* `capi-e2e-main` job is flaky
* March 10th
* `capi-e2e-main` job is extremely flaky, but not red yet; might need an escalation (e.g. opening an issue)
----
## March 20th 2023 - March 24th 2023, week 16:
Responsible: Furkat
* March 20
* All 🟢
* March 21
* All 🟢
* March 22
* All 🟢
* March 23
* All 🟢
* March 24
* All 🟢
----
## March 27th - March 31st 2023, week 17:
Watcher: Nawaz K
* March 27th
* Mostly green 🟢 with some flakes
* March 28th
* Mostly green 🟢
* March 29th
* Mostly green 🟢
* March 30th
* Mostly green 🟢
* March 31st
* Mostly green 🟢