# De-coupling APIs and controllers
## What really _is_ an API?
In Kubernetes, operators are controller implementations of custom APIs (as opposed to a non-operator controller, which primarily operates on APIs that already exist in the cluster, either built-in or provided by other operators).
The custom APIs are defined in one of two ways:
- [Custom resource definitions](https://kubernetes.io/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definitions/)
- [Aggregated API servers](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/apiserver-aggregation/)
Note that for a given resource, it is not possible for a single control plane to serve an API via both a CRD and an aggregated API server because they both require a URL path reservation for the apiserver to use.
### CustomResourceDefinition
A CustomResourceDefinition declaratively describes an API and its versions. It defines the API's schema as well as configurations for admission and conversion webhooks. Thus the collection of resources that are typically found associated with a CustomResourceDefinition are:
- `CustomResourceDefinition` - defines the API
- `Namespace` - location to run any provided webhooks
- `Deployment` - definitions of webhook servers
- `Service` - mechanism for apiserver to reach webhook servers
- RBAC (`ClusterRole`, `ClusterRoleBinding`, `Role`, `RoleBinding`, `ServiceAccount`) - permissions required by webhook servers to function
- `ValidatingWebhookConfiguration` - informs API server about how to connect to validating webhook
- `MutatingWebhookConfiguration` - informs API server about how to connect to mutating webhook
- cert-manager `Certificate` or similar - often used to automatically provision/roll client/server certs and CAs for authentication between apiserver and webhook server
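To make this concrete, here is a minimal sketch of the conversion-webhook portion of a CRD. The `widgets.example.com` group, `widget-system` namespace, and service name are hypothetical; only the `spec.conversion` stanza and the webhook `Service` reference are the point:

```yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: widgets.example.com
spec:
  group: example.com
  names:
    kind: Widget
    plural: widgets
    singular: widget
  scope: Namespaced
  versions:
  - name: v1alpha1
    served: true
    storage: false
    schema:
      openAPIV3Schema:
        type: object
  - name: v1
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        type: object
  conversion:
    strategy: Webhook
    webhook:
      conversionReviewVersions: ["v1"]
      clientConfig:
        # The apiserver reaches the webhook server via this Service,
        # which fronts the webhook Deployment in the bundle.
        service:
          namespace: widget-system
          name: widget-conversion-webhook
          path: /convert
```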
### APIService
Typically, an aggregated API server is a pod that serves part of the Kubernetes API, paired with an `APIService` configuration object that tells the control plane how to reach that pod. The pod handles CRUD/REST-style requests that the apiserver proxies to it on behalf of the apiserver's clients.
Thus the collection of resources that are typically found associated with an APIService are:
- `APIService` - informs the apiserver about where to proxy requests for the API
- `Namespace` - location to run the aggregated API service
- `Deployment` - specification for running the aggregated API service
- RBAC (`ClusterRole`, `ClusterRoleBinding`, `Role`, `RoleBinding`, `ServiceAccount`) - permissions required by aggregated API server to function
- cert-manager `Certificate` or similar - often used to automatically provision/roll client/server certs and CAs for authentication between apiserver and aggregated API server
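A minimal `APIService` sketch. The `metrics.example.com` group, namespace, and service name are hypothetical:

```yaml
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  name: v1beta1.metrics.example.com
spec:
  group: metrics.example.com
  version: v1beta1
  groupPriorityMinimum: 1000
  versionPriority: 15
  # The apiserver proxies requests for this group/version to this Service,
  # which fronts the aggregated API server's Deployment.
  service:
    namespace: metrics-system
    name: metrics-apiserver
    port: 443
  # caBundle: <base64-encoded CA used to verify the aggregated server's cert>,
  # typically injected by cert-manager or a similar mechanism.
```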
## Use cases
1. CRD conversion webhooks must be lifecycled with CRDs.
- Conversion webhooks configured on the CRD must be available for the entire lifetime of the CRD's existence. Otherwise the core garbage collection controller, which lists and watches every resource type in the cluster, fails (i.e. stops garbage collecting across the cluster).
1. Admission webhooks
- Admission webhooks that apply to all objects of a CRD should be lifecycled with CRDs. Controller variants may require running their own admission webhooks, scoped to the namespaces that the controller is watching. Admission webhook configurations are cluster-scoped objects, so care must be taken to ensure that bundles for controller variants do not conflict on the name of the admission webhook configs. (Perhaps this situation calls for 3 bundles: CRD, AdmissionWebhookConfigs, Controller+Webhook servers?)
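   A sketch of what a variant-scoped admission webhook config might look like, assuming a hypothetical `widgets.example.com` API and a label that marks the namespaces a given controller variant watches. Note the variant-specific object name, which avoids the naming conflict described above:

   ```yaml
   apiVersion: admissionregistration.k8s.io/v1
   kind: ValidatingWebhookConfiguration
   metadata:
     # Cluster-scoped: the name must be unique per controller variant.
     name: widget-validation-variant-a
   webhooks:
   - name: validate.widgets.example.com
     admissionReviewVersions: ["v1"]
     sideEffects: None
     rules:
     - apiGroups: ["example.com"]
       apiVersions: ["v1"]
       operations: ["CREATE", "UPDATE"]
       resources: ["widgets"]
     # Scope the webhook to the namespaces this variant watches.
     namespaceSelector:
       matchLabels:
         widget-controller: variant-a
     clientConfig:
       service:
         namespace: widget-variant-a
         name: widget-webhook
         path: /validate
   ```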
1. Finalizers
- When an object that has a finalizer is deleted, Kubernetes marks it with a deletion timestamp, but waits for something to process the deletion and remove the finalizer. If a finalizer is applicable to all objects of that API, a finalizer controller should be packaged with the API bundle. If a finalizer implementation is specific to a particular controller variant, things become somewhat complicated. Often the finalizer logic exists in the reconciling controller itself. In that case, it means the controller must exist and be running in the cluster until all of the CRs with finalizers are removed. If nothing exists to process the finalizer, the deletion of the CR will be blocked forever (or until someone manually removes the finalizer). And a blocked CR deletion cascades and causes a CRD deletion to be blocked as well.
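   For illustration, a custom resource with a finalizer (the `Widget` kind and finalizer key are hypothetical):

   ```yaml
   apiVersion: example.com/v1
   kind: Widget
   metadata:
     name: my-widget
     finalizers:
     - widgets.example.com/cleanup
     # After `kubectl delete`, the object is not removed; the apiserver only
     # sets metadata.deletionTimestamp. The object (and, transitively, its CRD)
     # cannot be fully deleted until some controller performs the cleanup and
     # removes the finalizer entry above.
   ```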
1. Upgrade lifecycling
- If decoupled from their controllers, API upgrades can focus purely on the validity of _just_ APIs being upgraded. Concerns related to schema changes, stored version migration, webhook deployment strategy, etc. can all be handled without concern for controller-related lifecycling. OLM's dependency resolution would ensure that APIs cannot be upgraded unless all installed controllers allow it (via their declared dependency on the API package).
- If decoupled from their APIs, upgrading and downgrading controllers becomes much easier. Rollbacks are no longer necessarily contingent on the ability to _also_ roll back an API.
1. Uninstall
- Moving to a declarative model for operator lifecycling, we will likely need to treat bundles as a unit for install/uninstall. If a bundle contains CRDs, deleting that bundle from the cluster will delete the CRDs. Deleting CRDs cascades to deletion of CRs and user workloads. If CRDs and controllers are split into separate bundles, a controller can be atomically deleted without affecting CRDs and CRs.
1. De-scoping
- If we separate APIs from controllers, our de-scoping story is likely simplified down to API de-scoping. There are already several very visible examples of multiple controllers for the same API being installed together in a cluster. The most obvious example is the Ingress API and all of the ingress controllers.
- Kubernetes already enforces this constraint for us: there can be only one definition of a given group/kind per cluster.
- We can potentially back off on the verbiage around "only one operator per owned API per cluster". Ingress controllers prove this is an unnecessary constraint. OLM would delegate to controller implementations to determine how to shard ownership of custom resources. Options include namespace-based sharding, label selectors, and implementation-specific class names (e.g. IngressClass, StorageClass).
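   The class-name sharding pattern already exists in core Kubernetes. For example, an `IngressClass` names the controller implementation that claims matching `Ingress` objects (the `k8s.io/ingress-nginx` controller string shown is the one used by the ingress-nginx project):

   ```yaml
   apiVersion: networking.k8s.io/v1
   kind: IngressClass
   metadata:
     name: nginx
   spec:
     # Only the controller implementation that recognizes this string
     # reconciles Ingress objects whose spec.ingressClassName is "nginx".
     controller: k8s.io/ingress-nginx
   ```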
1. KCP
- In KCP every application stack will get its own workspace (i.e. virtual cluster, separate control plane). This means different applications (in different workspaces) can use different versions of the same API because APIs are installed at the workspace level. However controllers may be installed in physical clusters or in such a way that their view of the cluster spans multiple workspaces. In this model, any operator targeting KCP will very likely need to de-couple the APIs it provides from the controllers it runs.
1. Canary rollouts
- If I want to be able to run multiple versions of the same operator for the purposes of a canary rollout, both of those operators must rely on the same CRD. That means CRDs cannot be changed during a canary rollout. To ensure that invariant, the only way to enable canary rollouts for every release of an operator is to manage the CRD separately from the operator.
- Related to item (1), CRD conversion webhooks are the domain of the CRD, not the controller. Conversion webhooks cannot be canaried because there can be only one active conversion webhook per CRD. It follows that any bundle containing a conversion webhook cannot be canaried, and so a bundle that combines a conversion webhook with its API and controller cannot be canaried either.
- Admission webhooks _do_ have configurable selectors such that they could theoretically be configured to align with canary rollout selectors. More thought and experimentation is necessary to understand if admission webhooks can be lifecycled with controllers instead of CRDs.
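   As a sketch of that idea, an admission webhook config could use an `objectSelector` aligned with the labels that select the canary track (the names and the `rollout-track` label are hypothetical):

   ```yaml
   apiVersion: admissionregistration.k8s.io/v1
   kind: MutatingWebhookConfiguration
   metadata:
     name: widget-defaulting-canary
   webhooks:
   - name: default.widgets.example.com
     admissionReviewVersions: ["v1"]
     sideEffects: None
     rules:
     - apiGroups: ["example.com"]
       apiVersions: ["v1"]
       operations: ["CREATE", "UPDATE"]
       resources: ["widgets"]
     # Only objects labeled for the canary track are sent to this webhook;
     # the stable webhook config would select the complementary label.
     objectSelector:
       matchLabels:
         rollout-track: canary
     clientConfig:
       service:
         namespace: widget-canary
         name: widget-webhook
         path: /mutate
   ```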