Proposal: containerd shim lifecycle operator

As more containerd shims are created to extend the behavior of containerd, there is a need for an operator to coordinate containerd shim lifecycle operations. For example, shims need to be installed, updated, and removed; registered in the containerd config as handlers; runtime classes need to be applied and removed; and node labels need to be applied and removed to indicate to the Kubernetes scheduler that a node has a given shim version installed, so the scheduler can selectively schedule workloads on that node. All of these tasks can be done manually, but they are error prone and easily automated. These types of operations are prime tasks for an operator to perform.

Kwasm

With the introduction of Kwasm, the community has made it easy to install deislabs/containerd-wasm-shims, enabling cluster operators to extend the functionality of their Kubernetes clusters to run WebAssembly (Wasm) workloads.

Kwasm has demonstrated that an operator is a convenient means of shim lifecycle management in comparison to other methods of preparing cluster nodes to run Wasm workloads. For example, in the kubernetes-sigs/image-builder project, which is used by Cluster API to produce Kubernetes node virtual machine images, the containerd-wasm-shims must be downloaded and installed, and the containerd config mutated, when the virtual machine image is built. This produces a static virtual machine image with a specific set of Wasm shim versions. Additionally, to change the Wasm shims installed in a cluster, a new virtual machine image would need to be built, published, and rolled out to clusters, replacing existing cluster nodes.

Generic containerd shim lifecycle management

To accomplish the goal of building a generic containerd shim lifecycle manager, the manager will need to handle the following tasks:

  1. Fetching the shim binary from some location via some protocol
  2. Mutating the containerd config on nodes
  3. Mutating node labels
  4. Mutating cluster runtime classes

All of the aforementioned tasks can be performed by an operator running within the cluster. Automating them with an operator makes these operations reliable and goal seeking, and removes human error from the process of installing and configuring a shim.
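To make the containerd config mutation concrete: registering a shim as a handler typically amounts to placing the shim binary on the node's PATH and appending a runtime entry to the node's containerd config. The handler and runtime names below are illustrative, not prescribed by this proposal:

# /etc/containerd/config.toml (containerd 1.x, CRI plugin)
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.myshim]
  # containerd resolves this runtime type to a binary named
  # containerd-shim-myshim-v1 on the node's PATH.
  runtime_type = "io.containerd.myshim.v1"

containerd must be restarted for the new runtime to take effect, which is why the controller lifecycle described later treats the restart as an explicit, verifiable step.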

Shim CRD

Below is a proposed CRD structure for describing the information needed to install and configure a shim on a cluster.

apiVersion: containerd.x-k8s.io/v1alpha1
kind: Shim
metadata:
  # The Shim resource is a cluster wide resource, no namespace.
  name: my-shim-v0.1.2
spec:
  # optional: label selector for nodes to target with shim.
  # If not supplied, the shim should be installed on all non-control-plane nodes.
  nodeSelector:
    wasm: "true"
    
  # required: The method for fetching a shim.
  # This could be any number of strategies for fetching. For example, OCI.
  fetchStrategy:
    type: anonymousHttp
    anonHttp:
      location: https://github.com/some-org/some-project/releases/v0.8.0/shims.tar.gz
      
  # required: The runtime class to be applied in the cluster for the shim.
  # 
  # The validation for this structure should also validate that the `handler`
  # maps to the name / path of the shim binary installed on the node.
  #
  # Upon installation of a shim to a node, a label should be added to the node
  # to indicate a specific shim is installed on the node. This label must be
  # used to inform the K8s scheduler where to schedule workloads for the given
  # runtime class.
  #
  # ---
  # apiVersion: node.k8s.io/v1
  # kind: RuntimeClass
  # metadata:
  #   name: myshim-v0.1.2
  # handler: myshim_v0_1_2
  # scheduling:
  #   nodeSelector:
  #     myshim_v0_1_2: "true"
  runtimeClass:
    name: my-shim-v0.1.2
    
  # rolloutStrategy describes how a change to this shim will be applied to nodes.
  rolloutStrategy:
    type: rolling
    rolling:
      maxUpdate: 5 # could also be a percentage of nodes, like 10%.
status:
  # conditions should provide the status of the resource and its progression
  # toward the goal state.
  # 
  # similar to: https://github.com/kubernetes-sigs/cluster-api/blob/82eff49867008365d3b26f82b55ff29a67880aa7/api/v1beta1/condition_types.go#L54-L85 
  conditions:
    - type: foo
      status: "True" # True, False, Unknown
      conditionSeverity: info # error, warning, info
      lastTransitionTime: "2023-08-18T19:21:00Z"
      reason: "some reason"
      message: "some message"

The preceding YAML describes the following.

  1. A nodeSelector for targeting a subset of nodes in the cluster for installing the shim. This optional configuration would default to installing the shim on all non-control-plane nodes in the cluster.
  2. A fetchStrategy for informing the controller on how to fetch the shim.
  3. A runtimeClass structure for describing the runtime class to be created in the cluster for the shim, including its name.
  4. Conditions for describing the status of the resource and its progression toward goal state.

Each top-level key in the CRD is purposefully designed as a structure so that, as new configuration is needed, the structure can be extended without major structural changes, like changing a string to a struct. A sketch of corresponding Go API types follows.
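To illustrate, the spec above could map onto Go API types along the following lines. This is a minimal sketch, not a final API; the type names and field layout beyond what the YAML shows are assumptions:

// Sketch of API types for the proposed Shim CRD; names are illustrative.
package v1alpha1

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// Shim is the cluster-scoped resource described above.
type Shim struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   ShimSpec   `json:"spec,omitempty"`
	Status ShimStatus `json:"status,omitempty"`
}

type ShimSpec struct {
	// Optional: an empty selector means "all non-control-plane nodes".
	NodeSelector map[string]string `json:"nodeSelector,omitempty"`

	FetchStrategy   FetchStrategy    `json:"fetchStrategy"`
	RuntimeClass    RuntimeClassSpec `json:"runtimeClass"`
	RolloutStrategy RolloutStrategy  `json:"rolloutStrategy,omitempty"`
}

// FetchStrategy is a discriminated union: Type names the strategy in use, and
// each strategy is an optional member, so new strategies (e.g. OCI) can be
// added without touching existing fields.
type FetchStrategy struct {
	Type     string        `json:"type"`
	AnonHTTP *AnonHTTPSpec `json:"anonHttp,omitempty"`
}

type AnonHTTPSpec struct {
	Location string `json:"location"`
}

type RuntimeClassSpec struct {
	Name string `json:"name"`
}

type RolloutStrategy struct {
	Type    string       `json:"type"`
	Rolling *RollingSpec `json:"rolling,omitempty"`
}

type RollingSpec struct {
	MaxUpdate int `json:"maxUpdate"`
}

type ShimStatus struct {
	// Conditions as described above; cluster-api-style conditions would add a
	// severity field on top of the standard metav1.Condition.
	Conditions []metav1.Condition `json:"conditions,omitempty"`
}

Because each strategy member is an optional pointer discriminated by a type field, adding, say, an oci fetch strategy later is a purely additive, non-breaking change.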

Controller Lifecycle

The controller must be able to install, upgrade, and remove shims, along with performing the related mutations of node labels, containerd configs, and runtime classes.

Install

When the Shim CRD is applied, the controller should do the following (a reconciler sketch follows the list):

  1. Select the cluster nodes that match the nodeSelector configuration and do not already have the specific shim node label applied, limiting the results by the configured rollout strategy.
  2. Determine if this is an upgrade or a new install.
  3. Install or upgrade the shim and containerd config on the node.
  4. Restart containerd to use the updated configuration.
  5. Determine if containerd started successfully. If not, rollback local changes and stop rollout.
  6. Apply node label for the specific shim.
  7. Apply the runtime class targeting the node label and the handler name (if it does not exist).
  8. Requeue the reconciler until all nodes have been updated.
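For illustration, the node-selection and requeue portion of this flow could look like the following controller-runtime sketch, reusing the hypothetical v1alpha1 types above. The module path, the node label scheme, and the installOnNode helper are all assumptions; the per-node work itself (steps 2-7) would typically be delegated to a privileged installer pod or node agent:

// Sketch of the install reconcile loop; not a complete controller.
package controller

import (
	"context"
	"time"

	corev1 "k8s.io/api/core/v1"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"

	v1alpha1 "example.com/shim-operator/api/v1alpha1" // hypothetical module path
)

type ShimReconciler struct {
	client.Client
}

func (r *ShimReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	var shim v1alpha1.Shim
	if err := r.Get(ctx, req.NamespacedName, &shim); err != nil {
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}

	// Step 1: select the nodes matching the shim's nodeSelector...
	var nodes corev1.NodeList
	if err := r.List(ctx, &nodes, client.MatchingLabels(shim.Spec.NodeSelector)); err != nil {
		return ctrl.Result{}, err
	}

	// ...that do not already carry the shim's "installed" node label.
	var pending []corev1.Node
	for _, n := range nodes.Items {
		if n.Labels[shim.Name] != "true" { // simplified label scheme
			pending = append(pending, n)
		}
	}
	if len(pending) == 0 {
		return ctrl.Result{}, nil // goal state reached
	}

	// ...and cap the batch at the rollout strategy's maxUpdate.
	if rs := shim.Spec.RolloutStrategy.Rolling; rs != nil && len(pending) > rs.MaxUpdate {
		pending = pending[:rs.MaxUpdate]
	}

	// Steps 2-7 happen per node: install or upgrade the shim, mutate the
	// containerd config, restart and verify containerd, label the node, and
	// ensure the RuntimeClass exists.
	for i := range pending {
		if err := r.installOnNode(ctx, &shim, &pending[i]); err != nil {
			return ctrl.Result{}, err
		}
	}

	// Step 8: requeue until every selected node has been updated.
	return ctrl.Result{RequeueAfter: 30 * time.Second}, nil
}

// installOnNode is a stand-in for the per-node work, typically performed by
// scheduling a privileged installer pod onto the target node.
func (r *ShimReconciler) installOnNode(ctx context.Context, shim *v1alpha1.Shim, node *corev1.Node) error {
	return nil // elided in this sketch
}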

Uninstall

When the Shim CRD is deleted, the controller should do the following (a finalizer sketch follows the list):

  1. Select the cluster nodes that match the nodeSelector configuration and have the specific shim node label applied, limiting the results by the configured rollout strategy.
  2. Remove node label for the specific shim from target node(s).
  3. Remove the shim and configuration from containerd config on the node(s).
  4. Restart containerd to use the updated configuration.
  5. Determine if containerd started successfully. If not, rollback local changes and stop rollout.
  6. Requeue the reconciler until all nodes have been updated.
  7. Delete the runtime class associated with the shim once all nodes have been updated.
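Because the node labels, containerd config entries, and RuntimeClass must be cleaned up before the Shim object disappears, the deletion flow is naturally gated by a finalizer. Below is a minimal sketch that would sit near the top of the Reconcile function above; the finalizer name and the uninstallBatch / deleteRuntimeClass helpers are hypothetical (requires "sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"):

const shimFinalizer = "containerd.x-k8s.io/shim-cleanup" // hypothetical name

if !shim.DeletionTimestamp.IsZero() {
	if controllerutil.ContainsFinalizer(&shim, shimFinalizer) {
		// Steps 1-5: unlabel a batch of nodes, remove the shim binary and
		// its containerd config entry, and restart + verify containerd.
		done, err := r.uninstallBatch(ctx, &shim) // hypothetical helper
		if err != nil {
			return ctrl.Result{}, err
		}
		if !done {
			// Step 6: requeue until every node has been cleaned up.
			return ctrl.Result{RequeueAfter: 30 * time.Second}, nil
		}
		// Step 7: all nodes are clean; delete the RuntimeClass, then
		// release the finalizer so the API server can remove the object.
		if err := r.deleteRuntimeClass(ctx, &shim); err != nil { // hypothetical helper
			return ctrl.Result{}, err
		}
		controllerutil.RemoveFinalizer(&shim, shimFinalizer)
		if err := r.Update(ctx, &shim); err != nil {
			return ctrl.Result{}, err
		}
	}
	return ctrl.Result{}, nil
}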