# Design changes based on review

## Design keypoints

### From review decision

- CAPI is responsible for orchestrating machine updates
- Split changes and allow multiple external updaters to collaborate
- An InPlaceUpdatePlan CR is introduced to build the update blueprint and then execute it
- Leverage existing CRs (KubeadmControlPlane/MachineDeployment/MachineSet/Machine) to handle in-place updates

### From my idea

- CAPI leverages an updaterUpdateTask contract to exchange status
  - Implementing a webhook is complex. A CR contract is a widely used technique for communicating between the CAPI core and CAPI providers.
- Define an `aborting` updatePhase
  - For in-place updates, we need to handle abort carefully to avoid the machine entering a chaotic state.
  - When a new change arrives while an in-place update is running, we shall call `abort` to stop the ongoing in-place update and wait until it has completed.

## Challenges

- How to identify the changeSet in different situations:
  - starting an update on a cluster where all machines are in the target state
  - starting an update on a cluster with inconsistent state, where some machines are updated and some are not
- How to arrange the sequence of updaters
- Do all current infra providers assume the Machine spec is immutable?
- How to create a generic Kubernetes component upgrade updater?
- How to expose a shell interface from the infra provider?
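The first challenge above, identifying the changeSet on a cluster with inconsistent state, can be sketched by diffing each machine's current values against the desired target values. This is a minimal sketch, assuming hypothetical names (`ChangeEntry`, `machineChangeSet`) and a flattened path-to-value view of the machine spec; it is not part of any existing CAPI API:

```go
package main

import "fmt"

// ChangeEntry is a hypothetical representation of a single desired change,
// mirroring the `responsibilities` entries (a JSON path plus a target value).
type ChangeEntry struct {
	Path  string
	Value string
}

// machineChangeSet keeps only the entries whose target value differs from the
// machine's current value, so a machine already at the target state yields an
// empty changeSet and can be skipped by the update orchestration.
func machineChangeSet(current map[string]string, desired []ChangeEntry) []ChangeEntry {
	var out []ChangeEntry
	for _, d := range desired {
		if current[d.Path] != d.Value {
			out = append(out, d)
		}
	}
	return out
}

func main() {
	desired := []ChangeEntry{
		{Path: "/spec/version", Value: "v1.30.1"},
		{Path: "/spec/image", Value: "azurelinux3.0-240602-k8s"},
	}
	// machineA is fully updated; machineB still runs an old image.
	machineA := map[string]string{"/spec/version": "v1.30.1", "/spec/image": "azurelinux3.0-240602-k8s"}
	machineB := map[string]string{"/spec/version": "v1.30.1", "/spec/image": "azurelinux2.0"}
	fmt.Println(len(machineChangeSet(machineA, desired)), len(machineChangeSet(machineB, desired))) // prints "0 1"
}
```

With this shape, both starting situations reduce to the same per-machine computation: a fully updated cluster produces empty changeSets everywhere, and an inconsistent cluster produces non-empty changeSets only for the machines that still need work.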
## Sequence diagram

```plantuml
@startuml
actor User as user
participant "Topology\ncontroller" as top
participant "InPlaceUpdatePlan\ncontroller" as clustert
participant "ControlPlane\ncontroller" as cpt
participant "MachineDeployment\ncontroller" as mdt
participant "Machine\ncontroller" as mt
box "external"
collections updaters
end box

activate top
activate cpt
activate mdt
activate mt

user -> top: request update
loop iterate updaters
  top -> updaters: call Lifecycle Runtime hook\n**ExternalUpdateRequest**
  updaters --> top: accept with `updaterName`, `responsibilities` and `updateTaskTemplate`
end
alt#white #pink not all changes picked
  top -> top: fallback to rollout\nor mark update failed
end
top -> clustert: create InPlaceUpdatePlan
activate clustert
alt no ongoing update
  clustert -> cpt: set `update-task-name` label
  clustert -> mdt: set `update-task-name` label
else #pink has ongoing update
  clustert -> clustert: wait and reenqueue
end

clustert -> cpt: set updatePhase to start\nsequence 1 update: [cluster1-cp]
loop iterate machines
  cpt -> cpt: preflight check
  cpt -> mt: pick 1 machine, set `update-task-name` label\nand updatePhase
  loop sort by sequence and iterate updaters
    mt -> updaters: create updaterUpdateTask
    activate updaters
    updaters -> updaters: operate on\nmachine
    updaters --> mt: report .status.updateStatus
    deactivate updaters
  end
  mt --> cpt: report .status.updateStatus
end
cpt --> clustert: report .status.updateStatus

clustert -> mdt: set updatePhase to start\nsequence 2 update: [cluster-md1, cluster-md2]
mdt -> mdt: create MachineSet with new spec
loop iterate machines
  mdt -> mdt: preflight check
  mdt -> mt: pick 1 machine, set `update-task-name` label\nand updatePhase
  loop sort by sequence and iterate updaters
    mt -> updaters: create updaterUpdateTask\nand set owner
    activate updaters
    updaters -> updaters: operate on\nmachine
    updaters --> mt: report .status.updateStatus
    deactivate updaters
  end
  mt --> mdt: report .status.updateStatus
  mdt -> mt: move machine from old MachineSet to\nnew MachineSet
end
mdt --> clustert: report .status.updateStatus
clustert -> clustert: update .status.updateStatus
deactivate clustert
@enduml
```

## Webhook definitions

```
controlPlaneUpdateRequest {
    clusterName
    controlplaneName
    changeSet
}

machineDeploymentUpdateRequest {
    clusterName
    machineDeploymentName
    changeSet
}

response {
    status: accept|reject|error
    message
    responsibilities
    updaterName
    updateTaskTemplateRef
}
```

## CRD definitions

```yaml
---
# InPlaceUpdatePlanController is responsible for reconciling in-place updates at the cluster level
# reconcile:
# - if updatePhase == `updating`
#   - set the cluster.x-k8s.io/update-task-name label on the cp/md to be updated
#   - aggregate cp/md update status
#   - if the cp/md in the current update sequence have finished, start the cp/md updates in the next sequence number
# - if updatePhase == `aborting`
#   - change updatePhase to `aborting` for cp/md which are in the `updating` phase
#   - remove the cluster.x-k8s.io/update-task-name label
InPlaceUpdatePlan
spec:
  updatePhase: updating|aborting
  controlPlane:
    name: cluster-cp1
    sequence: 1
    updaters:
    - name: osUpdater
      updaterUpdateTemplateRef:
        apiVersion: capzupdate.cluster.x-k8s.io/v1beta1
        kind: capzVmOsUpdateTaskTemplate
        name: capzVmOsUpdateTask
      sequence: 1
      responsibilities:
      - path: /spec/image
        selectResource: InfraMachine
        value: azurelinux3.0-240602-k8s
    - name: versionUpdater
      updaterUpdateTemplateRef:
        apiVersion: update.cluster.x-k8s.io/v1beta1
        kind: nodeVersionUpdateTaskTemplate
        name: nodeVersionUpdateTask-control-plane
      sequence: 3
      responsibilities:
      - path: /spec/version
        selectResource: Machine
        value: v1.30.1
    - name: kubeadmNtpUpdater
      updaterUpdateTemplateRef:
        apiVersion: kubeadmupdate.cluster.x-k8s.io/v1beta1
        kind: kubeadmNtpUpdateTaskTemplate
        name: kubeadmNtpUpdateTask
      sequence: 2
      responsibilities:
      - path: /spec/NTP
        selectResource: Bootstrap
        value: |
          server:
          - time-a-g.nist.gov
          enabled: true
  machineDeployments:
  - name: cluster1-md1
    sequence: 2
    updaters: [...]
    # ignored
  - name: cluster1-md2
    sequence: 2
    updaters: [...] # ignored
status:
  updateStatus: updating|updated|aborting|aborted
  conditions:
  - type: controlPlaneUpdate
  - type: machineDeploymentUpdate
---
# KubeadmControlPlaneController is responsible for reconciling in-place updates on a KubeadmControlPlane
# reconcile:
# - preflight check
# - aggregate machine in-place update status
# - pick a machine and do the in-place update
# - if updatePhase == `aborting`, change updatePhase to `aborting` for machines which are in the `updating` phase
KubeadmControlPlane
metadata:
  labels:
    cluster.x-k8s.io/update-task-name: updatetask1
spec:
  updatePhase: updating|aborting
status:
  updateStatus: updating|updated|aborting|aborted
  conditions:
  - type: InPlaceUpdate # aggregated from the machine InPlaceUpdate condition
---
# MachineDeploymentController is responsible for reconciling in-place updates on a MachineDeployment
# reconcile:
# - create a MachineSet with the new spec
# - move updated machines from the old MachineSet to the new MachineSet
# - preflight check
# - aggregate machine in-place update status
# - pick a machine and do the in-place update
# - if updatePhase == `aborting`, change updatePhase to `aborting` for machines which are in the `updating` phase
MachineDeployment
metadata:
  labels:
    cluster.x-k8s.io/update-task-name: updatetask1
spec:
  updatePhase: updating|aborting
status:
  updateStatus: updating|updated|aborting|aborted
  conditions:
  - type: InPlaceUpdate # aggregated from the machine InPlaceUpdate condition
---
# MachineInPlaceUpdateController is responsible for reconciling in-place updates at the machine level
# reconcile:
# - orchestrate updaters and create updaterUpdateTasks
# - when updatePhase == `updating`, aggregate updaterUpdateTask status, then update the `updateStatus` field and the `InPlaceUpdate` condition
# - when updatePhase == `aborting`, abort the updaterUpdateTasks, aggregate their status, then update the `updateStatus` field and the `InPlaceUpdate` condition (a timeout may be needed)
Machine:
metadata:
  labels:
    cluster.x-k8s.io/update-task-name: updatetask1
spec:
  updatePhase: updating|aborting
status:
  updateStatus: updating|updated|aborting|aborted
  conditions:
  - type: InPlaceUpdate # aggregated from updaterUpdateTasks
---
# the contract of UpdaterUpdateTask.
# it MUST have spec.updatePhase to indicate the reconcile target, and status.updateStatus to indicate the reconcile result
# it SHALL have conditions to report errors and the detailed reasons
MachineUpdateContract:
spec:
  updatePhase: updating|aborting
status:
  updateStatus: updating|updated|aborting|aborted
  conditions:
---
# an updaterUpdateTask is responsible for part of the update work on the target machine
# its responsibility is decided by the ControlPlaneInPlaceUpdateTask/MachineDeploymentInPlaceUpdateTask responsibilities field.
# reconcile:
# - when updatePhase == `updating`, update the machine based on the responsibility, then update the `updateStatus` field and the `ready` condition
# - when updatePhase == `aborting`, abort the update operation if one is running, then update the `updateStatus` field and the `ready` condition
CapzVmOsUpdateTask
metadata:
  labels:
    cluster.x-k8s.io/cluster-name: cluster1
    cluster.x-k8s.io/control-plane-name: cluster1-cp
    cluster.x-k8s.io/update-task-name: updatetask1
    cluster.x-k8s.io/machine-name: cluster1-cp-1
    cluster.x-k8s.io/updater-name: versionUpdater
spec:
  updatePhase: updating
status:
  updateStatus: updated
  conditions:
  - type:
```

### Remediation snippet

One thing to notice: after using an **external remediator** to recover a machine, the machine may still use the old spec, so it's necessary to rerun the in-place update plan on the remediated machine.

Another thing: Machine Health Check (MHC) rules shall be extended so that they can monitor machine in-place update progress.

#### Machine Health Check

- There shall be a way to identify whether a machine is under in-place update, and when the update started.
- An in-place update timeout setting shall be added to the MHC spec.
  When a machine stays in the in-place update stage for too long and exceeds the timeout threshold, MHC shall set the machine's `HealthCheckSucceeded` condition to false.
- A contract with the external updater is needed to determine whether the update operation has failed and is in a terminal state. When a machine falls into this state, MHC shall set the machine's `HealthCheckSucceeded` condition to false. [TODO: need to consider this in contract design]
- When calculating the allowed remediation count, updating machines shall be treated as `unhealthy`, because an updating machine is expected to possibly go offline during the update.
  `RemediationCount = maxAllowedUnhealthy - unhealthyMachineCount - inUpdateHealthyMachineCount`
- Corner case 1: for a new machine which is not yet initialized and is in the updating stage, the timeout shall be `initializeTimeout + inPlaceUpdateTimeout`.