Cluster API Karpenter Feature Group Notes

meeting zoom
(passcode: 77777)
Meetings are scheduled for 19:00 UTC, immediately following the Cluster API office hours.

Please add agenda items for future meetings here:

<add name and agenda topic above here ex. [name] topic>

2025-04-30 @ 19:00 (UTC)

~~recording~~

Attendees

elmiko - Red Hat

Agenda

elmiko - given the velocity and involvement on the provider, should we reduce the cadence of this meeting to once a month?
- moving to a once every 4 weeks cadence for this meeting.
- if the group has no further goals, we should probably stop the feature group meetings and convert to project meetings.
elmiko - we have some regular maintenance tasks to catch up on, rebasing k8s, cluster api, and karpenter core.

2025-04-16 @ 19:00 (UTC)

~~recording~~

Attendees

elmiko - Red Hat

Agenda

cancelled due to lack of quorum, will raise cadence issue at next cluster-api community meeting.

2025-03-19 @ 19:00 (UTC)

recording

Attendees

elmiko, Marco - Red Hat
Paweł Bek

Agenda

elmiko - provider updated to 1.30 kube and 1.9 cluster api
- next steps, update to 1.31 kube and 1.10 cluster api
- Pawel - want to ask controller-runtime maintainers about making it easier to consume setup-envtest
  - https://github.com/kubernetes/community/tree/master/sig-api-machinery

2025-03-05 @ 19:00 (UTC)

recording

Attendees

elmiko, Julio - Red Hat

Agenda

elmiko - fixed up https://github.com/kubernetes-sigs/karpenter-provider-cluster-api/pull/20
- could use a review
- Julio going to take a look, perhaps do next update as well
we might want to review the getting started docs to see how easy it is.

2025-02-19 @ 19:00 (UTC)

recording

Attendees

elmiko, Julio - Red Hat
Paweł Bek

Agenda

Paweł - End-End stuff based on one of quite complex from operational perspective PR where K8s,CAPI and Karpenter get an update: https://github.com/kubernetes-sigs/karpenter-provider-cluster-api/pull/20
- Looks like tests fail there due to unability to start test cluster env due to lack of kubebuilder, tests continue to run but there are nil pointers due to structs which were never filled with data. I could check this out but there is a PR already opened. Trying to rejoin.
- I guess I could just try to adjust yamls but it does not seem like a problem here
- There's Nil Pointer Deref error in the end, and I think it's caused by missing Struct because Cluster failed to spin up)
- Runner can't fetch kubebuilder from GCS (Goole Cloud Storage)
- Control Planes cannot then gets bootstrapped for a cluster to run tests on
- etcd binary cannot run (ETCD by default is ControlPlanes and Nodes failed to spin for that runner sandbox env, so it's only inquired by the aforementioned;
- etcd k/v binary cannot be found in PATH - probably script setting envs fails because there is no machine to even set PATHs on to include etcd3 binary.
  Therefore, Diagnose is: Lack Of ControlPlane Node(s)
- (Looks like it's kubernetes karpenter-provider for CAPI but it includes updates for k8s, CAPI and Karpenter vs CI/CD)
- Trying to find a proper owner/project for this but it looks like it may be:
  https://docs.google.com/document/d/12v6NFr7xal9RH3GEB_1IR6HENq2GMuzALS3l5elvrMg
- if not
  https://docs.google.com/document/d/1AUiuvapS3ldYVJfKucDhIoH6IJIPS009jqwnSTwS0E
  - think you want https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/community/20231018-karpenter-integration.md
- I decided to get involved in NoBo to gain better insight and because it aligns with my own plans/ambition (it is certainly important from DevOps point of view, and the PR pasted, even though it's provider should make certain considerations such as below clearer)
Seperate but slightly related
- Paweł : Ubuntu update of the runner OS in github actions processes may come in with usual package manager repositories changes, depending on deb image and its contents some pipelines could get affected. Worth looking into if not already done.
- elmiko - we are using the github provided ubuntu image

2025-02-05 @ 19:00 (UTC)

recording

Attendees

elmiko, Marco - Red Hat
Paweł Bek
Fabrizio Pandini - VMware

Agenda

Fabrizio - big picture questions, thinking back to previous designs, we talked about targeted Machine instead of MachineDeployment?
- (see recording for discussion)

2025-01-22 @ 19:00 (UTC)

recording

Attendees

elmiko, Julio - Red Hat
Paweł Bek

Agenda

elmiko - updated provider to 1.8 capi, 1.0 karpenter, and 1.31 k8s
- working on next update, 1.9/1.1.0/1.32
- https://github.com/kubernetes-sigs/karpenter-provider-cluster-api/pull/20
adding Julio as a reviewer
- https://github.com/kubernetes-sigs/karpenter-provider-cluster-api/pull/19

2025-01-08 @ 19:00 (UTC)

recording

Attendees

elmiko, Marco Braga, Julio - Red Hat
Fabrizio Pandini - VMWare
Paweł Bek

Agenda

Luke - curious about status of project
- want to integrate with Cluster API, curious about how this work with MachinePools?
elmiko - slack channel
- reference https://www.kubernetes.dev/docs/comms/slack/
- we are limited to 21 characters
- "karpenter-provider-cluster-api" won't work, what should we use?
- karpenter-cluster-api ?
- Julio - what do other karpenter provider do for their channels?
  - maybe karpenter-provider-capi
  - k7r-capi
  - kapi
- Marco - do other capi providers use a prefix?
elmiko - updates to code
- https://github.com/kubernetes-sigs/karpenter-provider-cluster-api/pull/17
- https://github.com/kubernetes-sigs/karpenter-provider-cluster-api/pull/16
elmiko - experimenting with asynchronous NodeClaim/Machine creation
- https://github.com/kubernetes-sigs/karpenter/issues/1273
Luke - curious if there will be helm charts
- AI: Luke - make an issue about this on the repo
Fabrizio - big picture questions, thinking back to previous designs, we talked about targeted Machine instead of MachineDeployment?
- how to bring Karpenter features/arch to cluster-api
- elmiko - let's bring this up next time too

2024-12-25 Cancelled for holiday

2024-12-11 Lack of quorum

recording

Attendees

elmiko, julio - Red Hat

Agenda

next meeting, Wednesday January 8 2025
- 25 December meeting is canceled
- happy holidays and new year!
ended early due to lack of quorum, no recording

2024-11-26 Cancelled for USA holiday

2024-11-13 Cancelled for KubeCon

2024-10-30 @ 19:00 (UTC)

recording

Attendees

elmiko, Julio - Red Hat
Jeremy Lieb - Adobe

Agenda

elmiko - dependency update, karp 1.0, capi 1.8, k8s 1.29
- https://github.com/kubernetes-sigs/karpenter-provider-cluster-api/pull/16

2024-10-16 @ 19:00 (UTC)

recording

Attendees

elmiko, Marco Braga - Red Hat
Mike Tougeron - Adobe
Jeremy Lieb - Adobe

Agenda

elmiko - updating to 1.0 karpenter
elmiko - asynchronous machine creation
- touge - perhaps similar to early kube unknown provider id
touge - code reviews, anything we can help with, anything we should be on the lookout for?
- happy for any participation, we are in full foss mode!
- at the moment things are slow, but keep on the lookout for open pull requests and issues

2024-10-02 @ 19:00 (UTC)

recording

Attendees

elmiko - Red Hat
Jeremy - Adobe

Agenda

[elmiko] - https://github.com/kubernetes-sigs/cluster-api/pull/11250
(discussion about user experience and what we want to achieve as a feature group)

2024-09-18 @ 19:00 (UTC)

recording

Attendees

elmiko, Marco, Julio - Red Hat
Jeremy - Adobe

Agenda

[elmiko] current status of repo migration
[elmiko] things to do
- update feature group charter
- start draft of caep
- create slack channel
[elmiko] alternate designs, focused on api contracts, karpenter as a provider for capi

2024-09-03 @ 19:00 (UTC)

recording

Attendees

elmiko, Marco Braga - Red Hat

Agenda

[elmiko] looking to add maintainers to the repo
- several people here have been instrumental in our progress, would anyone like to volunteer?
- [marco] will talk more offline
[elmiko] should we continue the feature group?
- if so, what topics should we get into next?
- perhaps landing enhancement is next
- [marco] maybe change the cadence to once a month
- [elmiko] looking at the charter, the 2nd point of scope might make sense to keep things going until we have an enhancement
  - https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/community/20231018-karpenter-integration.md
  - [marco] does point 2 also include the notion of maintainership and having a healthy community?
    - [elmiko] i didn't think so when i wrote it, but now it might be nice to update with those points.
[elmiko] status of repo migration
[elmiko] creating a CAEP for karpenter integration in capi repo
- [elmiko] will start working on draft to capture current work, this can also inspire the documentation we create
[elmiko] creating a slack channel for devel discussions
- is it appropriate to field questions in the cluster-api channel?
  - let's bring this up at the next capi meeting
- channel name ideas
  - #karpenter-cluster-api (this seems logical)
  - #cluster-api-karpenter
  - #karpenter-provider-cluster-api
  - (current capi providers do not use "provider" in their names)
AI: elmiko update feature group charter with info about enhancement and maintainership

2024-08-21 @ 19:00 (UTC)

recording

Attendees

elmiko, Marco Braga - Red Hat

Agenda

[elmiko] issue created to migrate repo into kubernetes-sigs org
- https://github.com/kubernetes/org/issues/5097
- we discussed this at the sig cluster lifecycle meeting this week
  - no objections to adoption
  - want to wait for 2 chairs on PTO to review
  - planning for lazy consensus to end on 1 September
[elmiko] retrospective on design process
- https://notes.elmiko.dev/2024/08/18/developing-the-karpenter-cluster-api-provider.html
[elmiko] next steps after migration
- adding issues to the repo
- experimenting with other designs
- starting to draft a CAEP
- should we keep the feature group going?
- [marco] should we create a slack channel for the project?
  - yes, good idea

2024-08-07 @ 19:00 (UTC)

recording

Attendees

elmiko - Red Hat
Scott - TeraSky

Agenda

[elmiko] status on create interface
- it's working! demo video
- opted to back off on the separate controller idea in favor of a simple blocking mechanism
- this should be fine for the PoC, but i would like to explore better patterns if it becomes a performance issue
[elmiko] moving repo to kubernetes-sigs org
- i would like to get the create interface working before moving
- getting some requests to move, want to gauge how people feel about it
[elmiko] hoping to spurn more conversations in CAPI community
- maybe revisit ClusterClass with respect to installing non-CAPI resources
- can we build better U/X
[scott] thinking about capacity, does the PoC have logic to pull capacity from the InfraMachineTemplates?
- not yet, but it's high on the list of improvements
blocking create might be a problem on some platforms, e.g. vsphere can sometimes take a few minutes to get a provider id
- maybe timeout should be configurable, as a flag to karpenter

2024-07-24 @ 19:00 (UTC)

Attendees

elmiko

Agenda

lack of quorum, will revisit topics in the next meeting

2024-07-10 @ 19:00 (UTC)

recording

Attendees

elmiko, Marco - Red Hat
Scott - TeraSky
Jeremy - Adobe

Agenda

[elmiko] delete interface complications
[elmiko] create interface status
[elmiko] hoping to have an alpha ready by end of July

2024-06-26 @ 19:00 (UTC)

recording

Attendees

elmiko - Red Hat
Scott - TeraSky
Jeremy - Adobe

Agenda

[elmiko] update on provider status, solving the asynchronous instance creation problem
- [scott] create with async, how to handle multiple replicas created at once? how to get 1:1 mapping
  - will create a fake provider id on first call (eg clusterapi://machine-deployment-name-<random number>)
  - upon reconciling new Machines, check MachineDeployment owner name against open NodeClaims
  - associate Machine from MachineDeployment with NodeClaim of similar
- [scott] with delete how does this work?
  - similar to cluster autoscaler
  - mark Machine for deletion
  - reduce MachineDeployment replicas

2024-06-12 @ 19:00 (UTC)

[elmiko] - I will be traveling on Wednesday and might not be at my hotel by the time of this meeting. It is probably prudent for us to cancel.

By way of update, this is the current state of the PoC:

completed GetInstanceTypes interface
working solving asynchronous provider ID through Create
implementing Get and Delete
exploring Karpenter code to better understand NodeClaim lifecycle.

Questions

If you have any questions about Karpenter Cluster API provider or about the current state of the PoC, please add them below and I will answer asynchronously:

[name] - question

2024-05-29 @ 19:00 (UTC)

Canceled

2024-05-15 @ 19:00 (UTC)

recording

Attendees

elmiko - Red Hat
Jeremy - Adobe

Agenda

[elmiko] prep for deep dive on 16 May
[jeremy] adding more abstractions on top of karpenter and cluster-api could lead to worse performance. don't want the layer cake to get so thick that it adds to time it takes to make nodes.
- in general, performance of scaling speed is important to optimization. having nodes quickly reducing the pressure on overprovisioning.

2024-05-01 @ 19:00 (UTC)

recording

Attendees

elmiko - Red Hat
Fabrizio - vMWare
Tony G, Jeremy L - Adobe

Agenda

[elmiko] using scalable resource types to back NodeClaims
- giving some background
- [fabrizio] want to avoid us going down the path of re-engineering the scalable resources to account for the collections of machines that are created
- [fabrizio] what if we could invert the layers a little and have a MachineDeployment as the owner, perhaps with some options to differentiate the machines
- [fabrizio] happy to help with some pairing
- AI: elmiko to gather notes about technical difficulties with scalable resources, will schedule a deep dive meeting with Fabrizio, and the community, to investigate further

2024-04-17 @ 19:00 (UTC)

recording

Attendees

elmiko - Red Hat
Mike T, Jeremy L - Adobe
Pau - Giant Swarm

Agenda

[elmiko] making my repo public
- it's not ready for use, but i would feel better to have it open
- [mike t] +1 if it's at the point and advertises its status
[elmiko] review some architecture desicions for the PoC
- joined cluster to make things simple
- labeling machines for ownership
  - [mike t] +1 short term, longer term want to be able to delete specific machine object, how will this work in karpenter?
    - this is a good question and we might use owernship with the NodeClaim in some way
    - [jeremy] from my experience working with karpenter, it seems like there are too many options for delete style operations. eg delete the node or delete the nodeclaim.
- specifying infra templates by name in ClusterAPINodeClass
  - [mike t] i'm a fan of label selectors

2024-04-03 @ 19:00 (UTC)

recording

Attendees

elmiko, Marco - Red Hat

Agenda

[elmiko] - first technical challenge, working through multiplexing client issues.
[scott] - have we looked at how karpenter will look at the NodeClaims and NodePools, can it be configured to run multiple copies in a specific namespace, how will we handle multiple clusters in the same management cluster?
- yes, somewhat, we will have to instruct users that running karpenter is best suited for individual namespaces
- AI: elmiko to look into namespacing karpenter and can it differentiate NodePools, NodeClaims.
[scott] how will we indicate which infrastructure templates are ok for karpenter to use?
- elmiko: initially just going to use annotations, but we'll need to talk with wider community
- scott: maybe list on Cluster object to indicate which can be used?

2024-03-20 @ 19:00 (UTC)

Cancelled due to KubeCon

next scheduled meeting 2024-04-17

2024-03-06 @ 19:00 (UTC)

recording

Attendees

Tony Gosselin, Mike Tougeron, Adobe
elmiko - Red Hat
Scott Rosenberg
Jack Francis - Microsoft

Agenda

[elmiko] where should the karpenter resources live?
- as i'm working through the deployment options it gives me pause for thought about how we will consume the karpenter and cluster api resources. would love to talk through the various patterns.
- [jack] do the karpenter objects have a lifecycle beyond a specific reconciliation, does NoidClaim have owner-like properties?
  - think so
- [jack] in CAS, we namespace the app in the management cluster, but it only has application no specific operands. many to many relation between karpenters and workload clusters.
  - [scott] have tested large number of clusters with capi and cas, seemed to work well, needed a little more resource for the cas instances
- [jack] if karp is the same, would want to have namespace separation for karpenters, with similar separation for the crds. having these in the management cluster makes sense in this scenario.
- seems like we are talking about having a similar pattern for karpenter, recommend to run in management cluster in same namespace as capi artifacts.
  - [scott] open question here, how will multiple karpenters in the same namespace handle living next to each other, is this even possible? we use a similar pattern in CAS now.
- [jack] for MVP we can focus on karpenter in management cluster, and then later we can address some of the wider questions, or even adjust our assumptions.
- [mike] my team has hesitancy to run everything through a management cluster, not for anything speific beyond "all the eggs in one basket", i do think running in management cluster is better approach.
- [scott] running karpenter in the workload starts to get into that self-managing pattern that we've seen be problematic in the past. having CAS for management cluster and karpenter for workloads is another pattern that would be beneficial.
- [jack] we need to make sure that we build it in such a way that it is idempotent and can restart easily
- [elmiko] (describing architecture and progress)
- [scott] infra templates in capi are much more specific than those that karpenter normally deals with
  - [elmiko] would like to see this get better over time, maybe we need more from capi
- [jack] do you think we'll need capi changes for the poc?
  - don't think so, we should be able to use the existing mechanisms for scale from zero with the kubemark provider for the poc

2024-02-21 @ 19:00 (UTC)

recording

Attendees

elmiko - Red Hat
Jack Francis - Microsoft

Agenda

[elmiko] repo progress
- hope to have repo setup in next couple weeks
- [jack] are we talking about a kubebuilder init type repo?
  - [elmiko] not quite sure yet, want it to be easy to hack on and build
[elmiko] orphan machine progress
- making decent progress, doesn't seem blocked by controllers
- [jack] might want to stop using "orphan" term as it's not quite accurate
  - "karpenter owned"
  - "singleton machine"
  - "ownerless machine"
- [jack] concept of "externally" owned machine
  - elmiko ++
[elmiko] price information / dynamic inventory / inventory discovery
- [jack] this will be cloud provider specific, not sure if aws/azure have enough to use as a common implementation or api. potential for other providers to emulate behavior if we can agree on a contract (aws/azure already have this written).
- what about some sort of discovery information about instances on the provider (eg pricing)

2024-02-07 @ 19:00 (UTC)

recording

Attendees

elmiko - Red Hat
Cameron McAvoy - Indeed
Jeremy Lieb - Adobe

Agenda

[elmiko] orphan machine discussion
- concerned about owner reference after reading the docs
- https://cluster-api.sigs.k8s.io/developer/architecture/controllers/machine
- still working on some kubemark experiments with orphans
- [elmiko] not sure if owner reference is causing me issues
- [cameron] looked into something similar with machinepools, this might be in the pr conversation or enhancement
  - look at machinepool machines, provider can make machines that might not have owner
[elmiko] repo progress
- talked with sig autoscaling, cluster api community, and karpenter working group, no objections to creating repo
- would like to get a bootstrap in place before requesting
- [cameron] would be nice if we could consume the early versions to be able try things out

2024-01-24 @ 19:00 (UTC)

recording

Attendees

elmiko - Red Hat
jeremy - Adobe
jonathan - AWS

Agenda

[jeremy] looking at how to run karpenter on cluster
- karpenter doc recomends not running in the same cluster
- thinking about some sort of dual-mode management where CAS could manage some nodes
- don't like have a bifurcated approach
- like being able to specify instance types for use and lose some of that when using ASGs
- set up a cascade style topology with karpenter nodes managing other karpenter nodes
  - karp admin node pool, small maybe 2 nodes, managing scaling for the rest of the cluster (in a separate node pool)
  - a separate node pool, also running karpenter, manages the admin node pool
  - feel this is better than managing the admin pool with CAS
- same cluster, multiple node pools, multiple karpenters monitoring
- is this like a ring or more like primary/secondary?
  - ring
- node group a - running karp a - managing node pool b
- node group b - running karp b - managing node pool a
- scaling to zero is an issue because it would starve one of the karpenters
- question that predicated this investigation was about how to size the nodes for running karpenter, thinking about in relation to ASGs and CAS, weren't quite sure what we wanted, karpenter seemed better aligned
- want to have admin node group have the ability to vertically scale when karpenter needs more resources

2024-01-10 @ 19:00 (UTC)

recording

Attendees

elmiko, Marco Braga - Red Hat
jackfrancis - Microsoft
Cameron McAvoy - Indeed
Jeremy Lieb - Adobe

Agenda

[elmiko] should we increase the frequency of this meeting?
- +1 from jack
- elmiko, i'm neutral, kinda prefer if we change when needed
- Cameron, maybe once we have a repo and some code to hack on
- let's revisit in a month
[elmiko] orphan machine pattern and infrastructure templates
- [elmiko] i am investigating this direction, hoping to build traction here
- [jack] no objection on this direction, would love to see a prototype of what the integration would look like. what does it look like to create the machine with infra template and metadata. active capi cluster running a provider, skip the owner object and just create machines on their own. would like to confirm this.
  - [elmiko] i am investigating this with kubemark provider, can create a demo for next time
- [cameron] no objection, would like to better understand the primary purpose for the orphan machines.
  - [elmiko] orphan machines are like a read-only way for users to understand what is happening
- [jack] on deletion, why wouldn't we just delete the machine and allow capa (for example) to do the cleanup?
  - we will rely on the machine being deleted by provider id from the core interface that karpenter exposes for node claims
[elmiko] do we have enough consensus to start think about creating a code repo?
- should we make a karpenter-provider-cluster-api in kube-sigs org?
  - alternative cluster-api-provider-karpenter
- we want to target a generic cluster-api implementation
- if any contracts are needed from providers (eg infra machine capacity), we will put that material into the CAEP and socialize through the community. if/when needed, providers will be responsible for operating with karpenter.
- [jack] let's bring this up at monday's sig autoscaling meeting

2023-12-13 @ 19:00 (UTC)

recording

Attendees

elmiko - Red Hat
Marco Braga - Red Hat
jonathan-innis - AWS
njtran - AWS
Mike Tougeron, Nathan Romriell, Jonathan Raymond, Andrew Rafferty - Adobe

Agenda

[elmiko] MachineDeployments, what do folks think about the idea of different modes for them?
- [jack] might expect a mode property at the top level, different modes could then signal heter or homo geneous. is there anything about the machine template that says it has to be homogenous, or represent a single type?
  - would we need a new infrastructure type?
    - karpenter machine template?
      - would this make duplicated fields or objects?
      - could something create these automatically?
  - how to reconcile this with a single karpenter provider for capi?
- [jack] what is the "lift" for creating a pan-infra provider that plugs into karpenter in the same manner as other providers?
- do we even need the MachineDeployment?
generic capi provider will be different than the specific providers
- [jack] not convinced that there is an architectural advantage either way
is there some way to bring the concepts of NodeClaim and Machine together?
karpenter replaces capi controller manager altogether, assuming we don't need a MacheDeployment
- [mike] from my perspective, interacting with the Machine is a large part of the value of this integration
maybe we should meet weekly in the new year?

2023-11-29 @ 19:00 (UTC)

recording

Attendees

elmiko - Red Hat
Mike Tougeron, Jeremy Lieb - Adobe
Cameron McAvoy - Indeed

Agenda

[elmiko] should we cancel the meeting on 27 December?
- we will cancel
[elmiko] NodeClass, still learning about this but wondering if it would be appropriate to wrap an InfrastructureMachine in this type of object?
- cloud provider gives some nuts and bolts about launching
- NodeClaim is the "i want a machine"
  - [elmiko] does NodeClass get applied to k8s?
- core karp generates NodeClaims based its matching
- NodeClass contains provider specific details
- karpenter wants to own the provisioning lifecycle
- maybe possible to just have karp created Machine objects on its own, without a parent object
- would the community be ok with orphan machines?
[jonathan] where do we want to integrate?
- karp doing provisioning or capi doing the provisioning?
- is it inefficient to have capi doing the provisioning?
- are we creating extra CRs by having both together?
[jack] what about scenarios where karpenter is running on a previously created CAPI cluster?
- [cameron] this can be done today with eks out of the box, additional work required for kubeadm clusters
what is our goal?
- we want to integrate with karpeneter in a manner that allows cluster api users to continue using cluster api interfaces and CRs. basically so that cluster api users can leverage their experience and also gain access to karpenter's benefits. (if possible)

2023-11-15 @ 19:00 (UTC)

recording

Attendees

elmiko - Red Hat
Cameron McAvoy - Indeed
Mikhail Fedosin - New Relic
Chris Negus - AWS

Agenda

[elmiko] is it feasible to build a pure CAPI provider for karpenter?
- what issues are we bound to face?
[ellis] what about making NodeClaims something that CAPI could understand, perhaps a different way to look at the layer cake approach (eg capi on the bottom).
[pau] sharing some experiences from giant swarm

2023-09-14 @ 9:00 (GMT-4)

recording

Attendees

elmiko - Red Hat
Mike Tougeron, dan mcweeney - Adobe
Lorenzo Soligo, Andreas Sommer, Pau Rosello - Giant Swarm
Jack Francis - Microsoft

Agenda

intro, what are we doing here?
what are people looking to get from cluster api and karpenter integration?
- is there a cost savings that can come from using karpenter?
  - would be interesting to see some A/B testing results here with cluster autoscaler and karpenter
  - seems there is enough evidence to warrant pushing forward with doing something with karpenter and capi
- Scenario 1: replace cluster-autoscaler w/ karpenter on my existing capi cluster
- easier to manage multiple instance types
- some folks are experimenting with karpenter and capa
- how can provisioners be created from the launch templates?
  - karpenter folks might be deprecating this
- want to have parity between instances in cloud and Machine CRs
  - might be some scenarios where users don't care about seeing the CAPI resources
    - this is non-ideal, but workable in some cases
- "bring you own nodes" approach, how do nodes join the cluster?
- spot instances are more usable in karpenter than ASG, EKS-specific
- how does karpenter handle node drift? e.g. when updating AMI on a running cluster
  - a capi mgmt cluster central approach would be nice here
would the community want a cluster api provider for karpenter?
- how would this be done? perhaps we would need some sort of provider-provider for karpenter, e.g. capi api informs about which provider is deployed is with karpenter.
- karpenter running on workload, is able to use a cloud-specific provider on that cluster, and then makes Machine objects as it creates instances
- this approach might require more involvement from cloud-specific contributors to ensure that problems which arise on that cloud could have the best attention.
- capi machinepool may be another approach to fit into the karpenter logic
  - machinepools are different between providers, this may require a lot of work on the capi side
- karpenter hybrid-cloud topology might be difficult to make work, there are many other questions beyond provisioning here.
would the community want to use karpenter with capa in some sort of managed mode?
- this might look a little like the karpenter in workload option from above
do we have enough interest for a feature group and followup meetings?
- pau +1 to continuing this effort
- dan +1
- jack +1
- [jack] I would like to see us come up with a CAEP that describes how a karpenter provider would look, and then choose one cloud (probably AWS) to implement first of all in this “general karpenter provider” implementation (assuming that that’s what the CAEP describes)
- [dan] let's sync with the karpenter folks as well
  - elmiko +1
  - jack on the wg with sig autoscaling for karpenter inclusion, happy to share knowledge
    - tl;dr karpenter in the process of donating itself to cncf, many things to work through
    - https://github.com/kubernetes/org/issues/4258
    - https://docs.google.com/document/d/1_KCCr5CzxmurFX_6TGLav6iMwCxPOj0P/edit
  - karpenter working group calendar item