Cluster API Addon Provider Fleet

## Motivation

Currently in the CAPI ecosystem there are several solutions that address deploying applications onto clusters provisioned by CAPI, but this idea has not been actively explored upstream so far. One existing solution is the `HelmChartProxy` resource provided by CAPI Addon Provider Helm, which allows users to install Helm charts on provisioned clusters automatically.

Fleet is capable of the same set of operations as Addon Provider Helm, but supports additional tasks; overall the project is mature and is used in production. Additionally, Fleet provides alternatives to Helm-based deployment models, such as kustomize and GitOps-based deployment. Introducing the addon provider makes all of them natively compatible with the CAPI ecosystem.

## User stories

### User story 1.

As an infrastructure provider I want to deploy my Helm application into every provisioned child cluster, so I can provide immediate functionality after cluster bootstrap.

### User story 2.

As a DevOps engineer I'd like to use GitOps practices to deploy applications into my CAPI clusters, so I can manage all of my cluster configuration and deployed applications from a single location.

### User story 3.

As a user I'd like to deploy applications into my CAPI clusters and configure those applications based on the cluster infrastructure templates, so the applications will be provisioned correctly for the cluster environment.

### User story 4.

As a cluster operator I want help provisioning a Cluster API child cluster that sits behind NAT, so my cluster can be provisioned successfully without manual intervention.

## Proposal

For the Cluster API scenario, Cluster API Addon Provider Fleet (CAAPF) is intended to be a wrapper around the [Fleet](https://github.com/rancher/fleet) project, allowing users to automatically integrate CAPI clusters into the Fleet ecosystem and deploy a predefined set of configured applications, based on matching selectors, onto newly provisioned child CAPI clusters.

The main goals of the addon provider are:

- Automatically create a Fleet cluster on discovering a newly created matching CAPI cluster resource in the management cluster, and ensure a successful connection.
- Automatically create a default `ClusterGroup` on discovering a CAPI `ClusterClass`.

To achieve these goals, Cluster API and Addon Provider Fleet will take on several responsibilities.

### Responsibility 1.

The Addon Provider Fleet will be capable of automatically deploying a configured Fleet installation and the Fleet CRDs into the management CAPI cluster, to aid configuration tasks and spare users from running the Helm-based installation manually.

### Responsibility 2.

The Addon Provider Fleet will be capable of automatically installing the Fleet agent into the child cluster, using either a manager-initiated or an agent-initiated connection depending on the desired provisioning mode.

### Responsibility 3.

Cluster API and Addon Provider Fleet will aid in the process of creating Fleet bundles to deploy applications on freshly provisioned clusters.

### Responsibility 4.

The Addon Provider Fleet will define simple operations providing automatic templating capabilities based on the CAPI cluster infrastructure definition, so that applications can modify or alter their content based on the provided infrastructure configuration.

### Responsibility 5.

Cluster API and Addon Provider Fleet will use the ClusterClass [Lifecycle Hook Runtime Extension](https://cluster-api.sigs.k8s.io/tasks/experimental-features/runtime-sdk/implement-lifecycle-hooks#implementing-lifecycle-hook-runtime-extensions) and be able to inject the required prerequisite steps into the bootstrap configuration of the CAPI cluster using the `BeforeClusterCreate` and `BeforeClusterUpgrade` hooks. This feature will initially be available for a set of bootstrap providers, starting with Kubeadm and RKE2. CAAPF will serve as the backend for the lifecycle hook extensions, and will be able to modify bootstrap definitions to inject installation steps into the provisioning definition.

### Responsibility 6.

Addon Provider Fleet will define a custom resource named `FleetAddonConfig`. This singleton, cluster-scoped resource will be responsible for configuring the provider functionality, such as enabling or disabling specific features of the addon provider, and defining on which CAPI Clusters the Fleet agent deployment should be configured and in which configuration.

#### Fleet installation configuration

The resource will expose fields for configuring the Fleet installation, such as:

1. API server URL - a declarative value used when installing Fleet into the management cluster, specifying the public API server URL that the agent will be able to connect to. This value cannot be derived automatically for every Kubernetes installation.
2. API server certificate - allows the user to declare a CA for the Fleet agent to use when connecting to the management cluster. Users will be able to provide their own certificates if they want to override the default behavior or if the default behavior does not work for them. By default these certificates will be collected from a dedicated service account secret for a newly created cluster-provisioner SA.
3. Install - a bool flag to enable or disable automatic installation and configuration of the Fleet instance in the management cluster. When disabled, the user is expected to have configured the management cluster with the Fleet prerequisites.

#### Fleet import configuration

By default CAAPF will use the manager-initiated installation method for all CAPI clusters. The CAAPF `FleetAddonConfig` resource will allow defining the behavior when a cluster needs to use an alternative communication mechanism, by exposing a field with label selectors that apply to the cluster(s) in question. These labels will be used for selection. If no labels are provided, all clusters will be imported by the manager-initiated mechanism.

The CAAPF `FleetAddonConfig` will also define configuration for limiting which clusters are automatically imported into Fleet. This will be possible with:

1. A list of label selectors. A cluster matching any selector will be imported into Fleet automatically. `OR` logic is applied across the individual label selectors.
2. A list of namespace name regular expressions (`regex`). If a cluster is created in any of the matching namespaces, it will be imported; otherwise it will not. An empty list causes all Clusters to be imported.
3. Future selection logic.

The general rule: if a field is a list of selector rules, the `OR` operation applies across them. An empty value is equivalent to automatic import of every Cluster.

### Helm operations

Fleet uses [helm](https://fleet.rancher.io/quickstart#install) to install all of its components. CAAPF will embed the `helm` image inside the build image to leverage the full CLI functionality of this tool, because no suitable alternatives are available in the cargo (Rust crate) ecosystem. It will wrap the commands required to install and deploy Fleet components, such as fleet, the fleet-agent definitions, and the Fleet CRDs, on demand.

#### CAAPF installation

The mechanism for installing the add-on itself will be consistent with `clusterctl` and CAPI Operator best practices. The addon provider will publish the components required to roll out a new CAAPF replica from a release.
These manifests can be installed using `AddonProvider` with CAPI Operator. Example:

```yaml
apiVersion: operator.cluster.x-k8s.io/v1alpha2
kind: AddonProvider
metadata:
  name: fleet
  namespace: default
spec:
  version: v0.2.1
  fetchConfig:
    url: https://github.com/rancher-sandbox/cluster-api-addon-provider-fleet/releases/latest/addon-components.yaml
```

Once the add-on provider is installed, the user can apply the `FleetAddonConfig`. Depending on the configuration, this may trigger automatic installation of the `Fleet` helm chart and its dependencies in the management cluster.

#### Cluster Class Fleet Agent provisioning injection

In addition to watching the CAPI clusters and creating Fleet clusters for each external CAPI cluster, the Fleet add-on provider will define serving code to handle Cluster API [lifecycle hooks](https://cluster-api.sigs.k8s.io/tasks/experimental-features/runtime-sdk/implement-lifecycle-hooks) requests. It will serve `BeforeClusterCreateHook` to modify the provided cluster class bootstrap template, injecting the commands required to install and roll out an Addon Provider Fleet instance on the child CAPI cluster during the bootstrap procedure. This will initially be supported for kubeadm and RKE2.

Addon installation in the child cluster will serve these roles:

- Configure and install the Fleet agent, allowing it to connect to the `Fleet` replica on the management cluster. The agent itself would act as an envelope around the installed components, reporting the health state of deployed resources in the child cluster to the management cluster.
- Orchestrate the installation process and report metrics, including the cause of failure, if needed.
- Aid in upgrades of the `fleet-agent` on the child cluster.

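
To make the injection idea above more concrete, here is a minimal sketch of what a kubeadm control plane template *might* look like after the hook has patched it. It is illustrative only and not the provider's actual output: the resource name, the file path, the chart reference, and every value in angle brackets are placeholders, and it assumes both that a `helm` binary is present on the node image and that the agent chart accepts the listed registration values. Only `files` and `postKubeadmCommands` are standard kubeadm bootstrap fields.

```yaml
apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: KubeadmControlPlaneTemplate
metadata:
  name: example-control-plane            # hypothetical name
  namespace: default
spec:
  template:
    spec:
      kubeadmConfigSpec:
        files:
          # Registration values for the agent (assumed keys, placeholder values).
          - path: /etc/fleet/agent-values.yaml   # hypothetical path
            owner: root:root
            permissions: "0600"
            content: |
              apiServerURL: "<management-api-server-url>"
              apiServerCA: "<management-ca-bundle>"
              token: "<cluster-registration-token>"
              clientID: "<fleet-cluster-client-id>"
        postKubeadmCommands:
          # Runs after kubeadm completes; assumes helm exists on the node.
          # <fleet-agent-chart-reference> must be replaced before use.
          - >-
            helm install fleet-agent <fleet-agent-chart-reference>
            --kubeconfig /etc/kubernetes/admin.conf
            --namespace cattle-fleet-system --create-namespace --wait
            --values /etc/fleet/agent-values.yaml
```

Writing the registration values to a file rather than passing them on the command line avoids shell-escaping a multi-line CA bundle; whether CAAPF would take this approach is an open design choice.
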
Depending on the downstream cluster registration model, the agent will be installed either by the management cluster, using the CAPI-provided cluster kubeconfig, or deployed with a set of:

- A Fleet `ClusterRegistrationToken`
- A `Cluster` without a kubeconfig specified, but with the Cluster ClientId set, which will initiate the connection from the child cluster side.

The second approach allows the connection to be initiated towards the publicly exposed and known API server instance of the management cluster, while the child cluster is enclosed behind NAT or any other kind of private network. Typically in this scenario, Cluster API is unable to connect to the child cluster via the provisioned kubeconfig and so cannot observe the readiness of the provisioned infrastructure and cluster configuration, which leaves it unable to mark the cluster as ready and healthy.

#### Application templating based on Cluster definition

To support application-specific configuration based on the provided `Cluster` infrastructure, CAAPF will leverage the regular helm [templating](https://github.com/rancher/fleet/blob/5bc51b0fab630bb54de72764f21832c53bdb77b9/pkg/apis/fleet.cattle.io/v1alpha1/bundledeployment_types.go#L213) mechanisms exposed on a bundle deployment. All basic infrastructure configuration resources, such as `Cluster`, `InfrastructureCluster` or `ControlPlane`, will be stored in a secret so that the user-defined bundle definition can reference them using `valuesFrom.secretKeyRef`.

#### Automatic Cluster Move and Pivot

The Fleet addon provider will be able to assist in cluster move and pivot by leveraging the functionality implemented for the `clusterctl` [move](https://cluster-api.sigs.k8s.io/clusterctl/commands/move#clusterctl-move) and [pivot](https://cluster-api.sigs.k8s.io/clusterctl/commands/move#pivot) procedure, in which the CAPI cluster originally created on the management side, together with the related Cluster API definitions including cluster resources and installed CAPI providers, will be moved to the newly created child cluster.
This procedure will rely on the `Cluster API Operator`, as this project allows declaring a set of providers as custom resources, which can easily be transferred to the child cluster without requiring the `clusterctl` tool. Because the Cluster API Operator is installed with `Helm`, it is a good candidate for addition to the agent installation steps. This makes it possible for the agent instance rolled out in the child cluster to install the CAPI Operator definitions together with the Fleet agent during the bootstrap procedure.

The move procedure executed by the CAAPF replica on the child cluster will:

1. Stop the existing CAPI Cluster related resources in the management cluster.
2. Bundle the CAPI Cluster and related resources inside a set of `Fleet` `Bundle` resources, targeted at the `Fleet` Cluster created for the CAPI cluster.
3. Bundle, as `Fleet` `Bundle` resources, all CAPI Operator providers associated with the Cluster in the current configuration, and target them at the CAPI cluster.

These steps will lead to automatic migration of the CAPI infrastructure to the CAPI cluster after a successful bootstrap, making the child cluster self-managed from day 1. The original resources can later be removed from the management cluster.

---

# POC / Ideas

### Provisioning konnectivity proxy

In scenarios where it is not desirable to move the original cluster to the private child cluster, and thereby remove its definitions from the management cluster, the described Fleet provisioning mechanism can allow automatic installation of the [konnectivity](https://kubernetes.io/docs/tasks/extend-kubernetes/setup-konnectivity/) agent in the child cluster. This scenario will require installation of the CAPI Operator in the child cluster, but will not require bundling Cluster resources. The proxy definition, on the other hand, will be bundled or injected into bootstrap to be deployed on the child cluster.
The management cluster will host the konnectivity server, and the konnectivity agent will be deployed on the child cluster. FleetAddon will be able to connect both to the agent and to the child cluster where it is hosted, in order to orchestrate the CAPI Cluster definition on the agent-proxied management cluster, treating the local child cluster as a remote downstream cluster. This *might* require a specific kubeconfig deployed in the management cluster to allow this type of communication.

### Air-gapped installation

This proposal does not cover fully air-gapped installation, because a specific load balancing and network configuration solution would have to be in place beforehand to allow the Fleet agent to initiate a connection to the management cluster, which breaks some of the assumptions of an air-gapped installation. Likewise, it is not a goal of the project to solve prerequisites such as images or configuration values that are not initially available in the child cluster within the existing bootstrap implementation, and that would additionally have to be handled in a way that is agnostic to the provider environment.