---
title: kapp on GKE autopilot clusters
tags: kapp
---

## Summary

GKE autopilot clusters use [Cilium](https://cilium.io/) for network connectivity and come pre-installed with it. When we create an app using kapp that contains a resource which creates Pods (like a Deployment), kapp injects its ownership labels into the Pods and the ReplicaSets as well. On a GKE autopilot cluster, these labels also end up on the `CiliumIdentity` resource. While deleting the app, since `CiliumIdentity` is a cluster-owned resource, kapp waits for the cluster to delete it. These resources are not deleted immediately, so kapp gets stuck waiting for the cluster to delete them.

Example:

```bash
$ kapp deploy -a simple-app -f https://github.com/carvel-dev/kapp/blob/develop/examples/simple-app-example/config-1.yml
Target cluster 'https://xx.xxx.xx.xxx' (nodes: gk3-xxx-xxx-default-pool, 2+)

Changes

Namespace  Name        Kind        Age  Op      Op st.  Wait to    Rs  Ri
default    simple-app  Deployment  -    create  -       reconcile  -   -
^          simple-app  Service     -    create  -       reconcile  -   -

Op:      2 create, 0 delete, 0 update, 0 noop, 0 exists
Wait to: 2 reconcile, 0 delete, 0 noop

Continue? [yN]: y

11:07:03AM: ---- applying 2 changes [0/2 done] ----
Warning: Autopilot set default resource requests for Deployment default/simple-app, as resource requests were not specified. See http://g.co/gke/autopilot-defaults
11:07:04AM: create service/simple-app (v1) namespace: default
11:07:06AM: create deployment/simple-app (apps/v1) namespace: default
...snip...
11:09:13AM: ---- applying complete [2/2 done] ----
11:09:13AM: ---- waiting complete [2/2 done] ----

Succeeded
```

```bash
$ kapp delete -a simple-app
Target cluster 'https://xx.xxx.xx.xxx' (nodes: gk3-xxx-xxx-default-pool, 2+)

Changes

Namespace  Name                         Kind            Age  Op      Op st.  Wait to  Rs  Ri
(cluster)  22690                        CiliumIdentity  3m   -       -       delete   ok  -
default    simple-app                   Deployment      5m   delete  -       delete   ok  -
^          simple-app                   Endpoints       5m   -       -       delete   ok  -
^          simple-app                   Service         5m   delete  -       delete   ok  -
^          simple-app-64dccdbdf5        ReplicaSet      5m   -       -       delete   ok  -
^          simple-app-64dccdbdf5-smkjb  CiliumEndpoint  3m   -       -       delete   ok  -
^          simple-app-64dccdbdf5-smkjb  Pod             5m   -       -       delete   ok  -
^          simple-app-64dccdbdf5-smkjb  PodMetrics      2s   -       -       delete   ok  -
^          simple-app-7mdbq             EndpointSlice   5m   -       -       delete   ok  -

Op:      0 create, 2 delete, 0 update, 7 noop, 0 exists
Wait to: 0 reconcile, 9 delete, 0 noop

Continue? [yN]: y

11:12:13AM: ---- applying 9 changes [0/9 done] ----
11:12:13AM: noop ciliumendpoint/simple-app-64dccdbdf5-smkjb (cilium.io/v2) namespace: default
11:12:13AM: noop pod/simple-app-64dccdbdf5-smkjb (v1) namespace: default
11:12:13AM: noop replicaset/simple-app-64dccdbdf5 (apps/v1) namespace: default
11:12:13AM: noop endpoints/simple-app (v1) namespace: default
11:12:13AM: noop endpointslice/simple-app-7mdbq (discovery.k8s.io/v1) namespace: default
11:12:13AM: noop podmetrics/simple-app-64dccdbdf5-smkjb (metrics.k8s.io/v1beta1) namespace: default
11:12:13AM: noop ciliumidentity/22690 (cilium.io/v2) cluster
11:12:13AM: delete deployment/simple-app (apps/v1) namespace: default
11:12:13AM: delete service/simple-app (v1) namespace: default
...snip...
11:12:17AM: ---- waiting on 1 changes [8/9 done] ----
11:13:15AM: ongoing: delete ciliumidentity/22690 (cilium.io/v2) cluster
11:13:18AM: ---- waiting on 1 changes [8/9 done] ----
11:14:16AM: ongoing: delete ciliumidentity/22690 (cilium.io/v2) cluster
11:14:19AM: ---- waiting on 1 changes [8/9 done] ----
^C
```

## Current Workarounds

### Use `--filter` option during delete

We can use the `--filter` option to exclude the `CiliumIdentity` resource from deletion. The disadvantage is that kapp deletes the app itself only once all of its resources are deleted, so the app remains while the filtered-out `CiliumIdentity` still exists.

```bash
$ kapp delete -a simple-app --filter '{"not":{"resource":{"kinds":["CiliumIdentity"]}}}'
```

### Use `--apply-ignored` option during delete

Setting `--apply-ignored` makes kapp delete the cluster-owned resources as well, so it can have side effects.

```bash
$ kapp delete -a simple-app --apply-ignored
```

### Add annotations to not inject the ownership labels

By default, kapp injects its ownership labels into Pods, ReplicaSets, etc. via the resources that own the Pods (like Deployment, Job). We can add the `kapp.k14s.io/disable-default-ownership-label-rules` and `kapp.k14s.io/disable-default-label-scoping-rules` annotations to those Pod-owning resources to prevent kapp from injecting its labels into `CiliumIdentity` resources.

## Proposal

Since all GKE autopilot clusters use Cilium, kapp needs to support it out of the box. We can either skip waiting for the `CiliumIdentity` resource by default or mark it as owned for deletion by kapp. If we skip waiting for it, kapp will error out before deleting the app because not all of the app's resources have been deleted. If we somehow mark the resource as owned for deletion, kapp would need permissions to delete it, which might not always be present.

### Option 1

Provide a way to exclude resources based on resource matchers through kapp config (or a flag). This will prevent kapp from searching for these resources.

Thoughts:

- If we use kapp config, then resource matchers can be of various types (GKs, GVs, namespace and name matchers, etc.). Excluding various types would be tricky.
- If we use a flag to exclude just GKs, a lot of the functionality would overlap with filters, which could get confusing.
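
As a rough sketch of how the kapp config variant of Option 1 could look, the fragment below excludes `CiliumIdentity` by group and kind. The `excludeResourceMatchers` key is purely hypothetical and does not exist in kapp; only the `Config` kind and the `apiGroupKindMatcher` shape are borrowed from kapp's existing configuration format, and the `cilium.io` group is taken from the delete output above.

```yaml
apiVersion: kapp.k14s.io/v1alpha1
kind: Config

# HYPOTHETICAL key, not part of kapp: illustrates the proposal only.
# Resources matching any entry would be skipped when kapp searches
# for the resources that belong to an app.
excludeResourceMatchers:
- apiGroupKindMatcher: {apiGroup: cilium.io, kind: CiliumIdentity}
```

Reusing the existing matcher syntax would keep the config familiar, but it also exposes the concern raised above: every matcher type that is accepted here would need well-defined exclusion semantics.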