# Hack The Garden 2024-12 - Topics
Also see https://github.com/gardener-community/hackathon.
## Participants (22)
- Stefan M.
- Gerrit S.
- Valentin K.
- Tim E.
- Lukas H.
- Claus-Theodor R.
- Maximilian G.
- Michael E.
- Marcel B.
- Rafael F.
- Johannes S.
- Andreas F.
- Simon M.
- Luca B.
- Plamen K.
- Oliver G.
- Alexander H.
- Lukas F.
- Damyan Y.
- Erik S.
- Ismail A.
- Tobias S.
## Votings
See all proposals including links/details [here](#All-Proposals).
### Initial Assignment / First Topics
This is the list of the most-voted topics, i.e., those with the highest interest. For each topic, there is a list of people who wanted to work on it (or didn't state any topic preference).
This is just a suggestion for how to get started at the event without wasting time on topic and team coordination. Everybody is free to choose another topic if the grouping doesn't work for them. More topics can also be picked up freely as desired.
- 🚧 IPv6 support on additional infrastructure (14 votes)
- Johannes S.
- Stefan M.
- Damyan Y.
- PRs so far:
- [extension-provider-ironcore](https://github.com/ironcore-dev/gardener-extension-provider-ironcore/pull/669)
- [mcm ironcore](https://github.com/ironcore-dev/machine-controller-manager-provider-ironcore/pull/436)
- [ccm ironcore](https://github.com/ironcore-dev/cloud-provider-ironcore/pull/473)
- Open Issues
- LoadBalancer service does not work for dual stack clusters, no matter the IP family. Service creation (including external IP assignment) works, though the service is not reachable.
- Node-to-Node communication does not work for IPv6 (ironcore infrastructure issue)
- currently the root IPv6 prefix is hard-coded
- during `shoot` deletion the allocated prefixes are not deleted
- the pod IPAM works only for dual stack clusters, the pod ranges are hard-coded. IPv4-only clusters are not possible.
- no tests for the introduced changes
- Achievements
- dual stack shoot reconciles to 100%, no matter the primary IP family
- nodes, pods and services receive IP addresses from the proper ranges
- in-cluster node-to-node communication works
- bugs found in various `IronCore` components; a list of issues will follow, and some of them are already fixed
- 🚧 CloudProfile: list of version states with start date per classification (9 votes)
- Gerrit S.
- Claus R.
- Valentin K.
- PRs
- https://github.com/gardener/gardener/pull/10982
- https://github.com/metal-stack/gardener/pull/9
- ✅ Gardener SLIs: cluster creation/deletion times, machine creation times (8 votes)
- Tim E.
- Simon M.
- Luca B.
- PRs
- https://github.com/gardener/ci-infra/pull/2807
- https://github.com/gardener/gardener/pull/10964
- https://github.com/gardener/gardener/pull/10967
- https://github.com/gardener/gardener/pull/10965
- https://github.com/gardener/gardener/pull/10971
- 🚧 Enhance {seed,node-agent} authorizers to evaluate field/label selectors (8 votes)
- Rafael F.
- Marcel B.
- 🚧 etcd encryption with cloud provider key management service (aka. bring your own encryption key) (7 votes)
- Plamen K.
- Lukas H.
- Alexander H.
- 🚧 Load Balancing of Shoot KAPIs (7 votes)
- Oliver G.
- Michael E.
- 🚧 VPA maxAllowed so that pods still fit on seed nodes (6 votes)
- Ismail A.
- Max G.
- Tobias S.
- PRs:
- https://github.com/kubernetes/autoscaler/pull/7560
- https://github.com/ialidzhikov/gardener/commits/enh/seed-and-shoot-vpa-max-allowed - depends on the above VPA PR
- 🚧 in-place node upgrade (4 votes)
- Erik S.
- Andreas F.
- Lukas F.
### More Topics To Pick Up
- Adapt e2e tests to gardener-operator setup (7 votes)
- nobody
- 🚧 Generic Monitoring Extension to manage remote writes in different Prometheus instances (7 votes)
- Max G.
- Simon M.
- Issues:
- https://github.com/gardener/gardener/issues/10985
- Gardener scale-out tests (5 votes)
- Tim E.
- Michael E.
- Marcel B.
- Plamen K.
- Integrate skaffold to provider-extensions for local deployments (5 votes)
- Ismail A.
- Alexander H.
- Inject correct CA bundle name into Prometheus scrape configs to prevent Gardener and all components from fetching it (5 votes)
- Rafael F.
- Tim E.
- Claus R.
- Marcel B.
- Add generic tool for evaluating shoot cluster compliance (5 votes)
- Stefan M.
- Johannes S.
- shared config for gardenctl (5 votes)
- nobody
- 🚧 deploy gardener prow with Flux (5 votes)
- Tim E.
- Oliver G.
- PRs
- https://github.com/gardener/ci-infra/pull/2812
- https://github.com/gardener/ci-infra/pull/2813
- https://github.com/gardener/ci-infra/pull/2828
- https://github.com/gardener/ci-infra/pull/2830
- https://github.com/gardener/ci-infra/pull/2832
- https://github.com/gardener/ci-infra/pull/2833
- https://github.com/gardener/ci-infra/pull/2834
- https://github.com/gardener/ci-infra/pull/2835
- https://github.com/gardener/ci-infra/pull/2836
- https://github.com/gardener/ci-infra/pull/2838
- https://github.com/gardener/ci-infra/pull/2839
- https://github.com/gardener/ci-infra/pull/2840
- https://github.com/gardener/ci-infra/pull/2842
- TODOs
- [ ] add/update docs for bootstrapping, secrets management
- Implement a CLI to analyze / search Prow Job failures (4 votes)
- Tobias S.
- Valentin K.
- ✅ Evaluate the ServiceTrafficDistribution feature as replacement of the TopologyAwareHints for topology-aware routing (4 votes)
- Max G.
- Ismail A.
- Johannes S.
- Oliver G.
- PRs:
- https://github.com/gardener/gardener/pull/10973
- 🚧 Trigger Credentials Rotation Per Worker Pool (3 votes)
- Rafael F.
- Migrate VPA and HPA state during shoot control plane migrations (3 votes)
- Ismail A.
- Plamen K.
### Fast Track
- ✅ GNA: Persist “applied-state” after each step (instead of only at the end)
- https://github.com/gardener/gardener/pull/10969
- ✅ Watch `ManagedResource`s in Shoot Care Controller
- https://github.com/gardener/gardener/pull/10987
- 🚧 Refactoring: Original OSC Controller should not write a secret
- ~~rfranzke~~ timebertt will open a PR 🫠
- ✅ CTRLreg controller should watch seed and requeue immediately when gardenlet removes finalizer or deletion is finished
- https://github.com/gardener/gardener/pull/10989
- ✅ (maboehm) gardener-resource-manager token-requestor: creation of kubeconfigs with shoot CA
- https://github.com/gardener/gardener/pull/10988
- 🚧 (maboehm) gardener-resource-manager token-requestor: customize watched namespaces
- 🚧 (LucaBernstein) refactor gardener e2e tests to [ordered ginkgo containers](https://onsi.github.io/ginkgo/#ordered-containers)
- Allow overriding the Kubernetes version compatibility check of gardenlet and extensions so that they can start on newer Kubernetes versions
- 🚧 Make cluster-autoscaler work in provider-local setup
- 🚧 (timebertt) drop internal version of component config APIs
- ✅ Switch `kind-up` seed authorizer configuration to Structured Authorization with match conditions
- https://github.com/gardener/gardener/pull/10984
- 🚧 Implementation of a Cluster API shim (2 votes)
### Discussion Topics (~~probably no hacking~~)
- ✅ autonomous shoot clusters (GEP-28): deep dive in PoC, come up with implementation plan
- ✅ TODO(timebertt): Check whether `podExecutor` (note the `o`) can be improved
- ✅ TODO(maboehm (silently 🤐 accepted 😺)): Investigate whether a linter exists to avoid this. Result: none exists 🥹
- gardener-operator: enable/fix deployment of provider extensions -> knowledge transfer session
- non-required extensions that allow a cluster to still reconcile even if the extension runs into an error
- gardener-extension-provider-metal: naming overlap?
- Quote of the week: "broom this room 🧹"
## All Proposals
Add your proposal in the following sections 🚀
### Core
- Gardener scale-out tests: run hollow gardenlets similar to kubemark’s hollow nodes to generate load on garden cluster (5 votes)
- Gardener SLIs: cluster creation/deletion times, machine creation times (8 votes)
- interesting for gardener developers and Kubernetes service providers
- filtered per cloud provider/seed?
- also collect for e2e tests in CI
- instrumenting the flow library (per task, identified by task name)
- (https://github.com/gardener/gardener/pull/10967)
- duration of entire flow -> might be useless because of mixed usage of flows: cluster creation/hibernation/wake-up, workerless shoots, etc.
- duration of individual flow tasks
- respect `SkipIf` in duration, skip counter
- time waiting for task dependencies
- timeouts
- errors
- time from flow start to task start ("delay")
- operation duration on cluster including retries (might include multiple flow executions)
- (https://github.com/gardener/gardener/pull/10965)
- done in shoot controller when finishing the `Create` or `Delete` operation -> recorded to controller-runtime metrics registry
- for service providers, only creation and deletion operations can provide meaningful insights -> clusters are highly heterogeneous
- other operations have a high cardinality (e.g., workerless, number of nodes, performing an upgrade or not, hibernating or normal reconciliation, force or normal deletion, etc.) -> only meaningful in CI environments, where cluster operations are homogeneous -> even then: how to distinguish a normal reconciliation from a reconciliation with a node roll?
- would need to store operation start time
- store in status.lastOperation.startTime
- reset on generation bump
- categorize by operation type, workerless, hibernation state, force deletion, etc.
- display in to-be-built seed plutono dashboard
- record timing of e2e test steps
- would be solved out of the box by refactoring e2e tests to ordered `It`s
- -> out of scope for this track
- CloudProfile: list of version states with start date per classification, i.e., automatically promote/deprecate K8s/OS versions (9 votes)
- HA-cluster only seeds (aka. allow disable non-HA clusters) (0 votes)
- Enhance seed authorizer to evaluate field/label selectors (8 votes)
- Adapt e2e tests to gardener-operator setup (7 votes)
- Migrate VPA and HPA state to new control plane during shoot control plane migrations to avoid unnecessary evictions for clusters that were under high load (3 votes)
- Trigger Credentials Rotation Per Worker Pool: https://github.com/gardener/gardener/issues/10121 (3 votes)
- in-place node upgrade: https://github.com/gardener/gardener/pull/10828 (4 votes)
- improve cross-cluster webhook deployments (certificate rotation etc.) for workerless shoots (1 vote)
### Extensions
- gardener-extension-provider-stackit open source: general code review, extensions rework (0 votes)
- etcd encryption with cloud provider key management service (aka. bring your own encryption key) (7 votes)
- Forbid replacing secret with new account for existing Shoots: https://github.com/gardener/gardener-extension-provider-aws/issues/40 (1 vote)
- applicable to all infrastructures except azure where such validation already exists
- Integrate skaffold to provider-extensions for local deployments (5 votes)
### Networking
- IPv6 support on additional infrastructure, e.g. [ironcore](https://github.com/ironcore-dev), [metal-stack](https://github.com/metal-stack) or on STACKIT openstack (14 votes)
- Load Balancing of Shoot KAPIs: https://github.com/gardener/gardener/issues/8810 (7 votes)
- Evaluate the ServiceTrafficDistribution feature as replacement of the TopologyAwareHints for topology-aware routing: https://github.com/gardener/gardener/issues/10421 (4 votes)
### Observability
- Inject correct CA bundle name into Prometheus scrape configs to prevent Gardener and all components from fetching it (5 votes)
- Put Plutono dashboards in a central namespace to not duplicate them in each shoot namespace (4 votes)
- Generic Monitoring Extension to manage remote writes in different Prometheus instances (8 votes)
- Interested parties: IronCore, STACKIT, x-cellent
### Security
- Add generic tool for evaluating shoot cluster compliance with standards or make existing tools aware of Gardener architecture (5 votes)
- e.g., kubebench from Aqua security
- see existing work on diki: https://github.com/gardener/diki
### Autoscaling
- VPA maxAllowed so that pods still fit on seed nodes, related to https://github.com/gardener/gardener/pull/10413 (https://github.com/kubernetes/autoscaler/issues/7147) (6 votes)
- [one implementation](https://github.com/sapcc/vpa_butler/blob/main/internal/controllers/vpa_runnable.go#L162)
### Other
- shared config for gardenctl (5 votes)
- deploy gardener prow with Flux (5 votes)
- Improve Prometheus Operator deployment in gardener prow (reduce resource consumption) (2 votes)
- prow go cache for speeding up e2e test jobs
- Webhook development with `mirrord` (replace legacy `hook-me.sh`) (2 votes)
- Implement a CLI to analyze / search Prow Job failures (4 votes)
## Notes
### seed authorizer
- ❌ chore: add feature gate to admission controller to enable the feature for k8s 1.31
- ✅ move `helper.IsSeedReadyForMigration` out of gardenlet, so that we never have to read other seeds
- ✅ when gardenlet reconciles ManagedSeeds, it needs to delete the Seed when the MS is deleted. This is not part of the graph yet ~~, and probably complicated~~. We need to add labels to the Seeds indicating which ManagedSeeds caused them to be created
- getting rid of alwaysAllowedVerbs
- ~~add edge from seed to itself~~ (not needed)
- ✅ and from seed to ManagedSeed
- ✅ label Seed with labels for "itself" and the "managing-seed"
- via admission plugin
- this will allow us to add label selectors to the seed cache in `gardenlet/app.go`
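
As a rough illustration of the labeling idea above, the sketch below labels each Seed for "itself" and for its managing seed, and only allows a list/watch request whose label selector is restricted to the requesting seed. The label prefix `seed.example.org/` and all function names are assumptions made for this sketch; they are not taken from the Gardener code base, and selectors are simplified to equality-based key/value maps.

```go
package main

import "fmt"

// labelPrefix is a hypothetical label key prefix used only in this sketch.
const labelPrefix = "seed.example.org/"

// SeedLabels returns the labels for a Seed: one for "itself" and, if the Seed
// was created via a ManagedSeed, one for the seed that manages it.
func SeedLabels(name, managingSeed string) map[string]string {
	labels := map[string]string{labelPrefix + name: "true"}
	if managingSeed != "" {
		labels[labelPrefix+managingSeed] = "true"
	}
	return labels
}

// SelectorFor returns the equality-based selector a gardenlet responsible for
// the given seed could use for its Seed cache.
func SelectorFor(seed string) map[string]string {
	return map[string]string{labelPrefix + seed: "true"}
}

// Matches reports whether labels contain every key/value pair of the selector.
func Matches(selector, labels map[string]string) bool {
	for k, v := range selector {
		if got, ok := labels[k]; !ok || got != v {
			return false
		}
	}
	return true
}

// AllowList permits a list/watch only if the selector restricts the result
// to Seeds labeled for the requesting seed.
func AllowList(requestingSeed string, selector map[string]string) bool {
	return selector[labelPrefix+requestingSeed] == "true"
}

func main() {
	child := SeedLabels("child", "parent")
	fmt.Println(Matches(SelectorFor("parent"), child))      // true
	fmt.Println(Matches(SelectorFor("other"), child))       // false
	fmt.Println(AllowList("parent", SelectorFor("parent"))) // true
	fmt.Println(AllowList("parent", nil))                   // false: unrestricted list
}
```

Using one label key per seed name lets a single selector cover both the seed's own object and the Seeds it manages, since equality-based selectors cannot express an OR across two keys.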
### VPA maxAllowed so that pods still fit on seed nodes
https://hackmd.io/GwTNubtZTg-D1mNhzV5VOw