# RFC-NNNN Gating Flux reconciliation

**Status:** provisional

**Creation date:** 2023-03-01

**Last update:** 2023-04-26

## Summary

Flux should offer a mechanism for cluster admins and other teams involved in the release process to manually approve the rollout of changes onto clusters.

## Motivation

Flux watches sources (e.g. GitRepositories, OCIRepositories, HelmRepositories, S3-compatible Buckets) and automatically reconciles the changes onto clusters as described with Flux Kustomizations and HelmReleases.

The teams involved in the delivery process (e.g. dev, qa, sre) can decide when changes are delivered to production by reviewing and approving the proposed changes in a collaborative manner with pull requests. Once a pull request is merged onto a branch that defines the desired state of the production system, Flux kicks off the reconciliation process.

There are situations when users want to have a gating mechanism after the desired state changes are merged in Git:

- Manual approval of container image updates (e.g. https://github.com/fluxcd/flux2/discussions/870)
- Manual approval of infrastructure upgrades (e.g. https://github.com/fluxcd/flux2/issues/959)
- Maintenance window (e.g. https://github.com/fluxcd/flux2/discussions/1004)
- Planned releases
- No Deploy Friday

### Goals

- Offer a dedicated API for defining gates in a declarative manner.
- Extend the current Flux APIs and controllers to support gating.

### Non-Goals

- Provide a controller that implements custom logic for opening/closing the `Gate` objects.

## Proposal

In order to support manual gating, Flux could be extended with a dedicated API that would allow users to define `Gate` objects and perform operations like `open` and `close`.
For example, the following manifest represents a closed gate:

```yaml
apiVersion: gating.toolkit.fluxcd.io/v1alpha1
kind: Gate
metadata:
  name: change-freeze
  namespace: flux-system
spec:
  closed: true
```

A `Gate` object could then be referenced in sources (Buckets, Git, Helm, OCI Repositories) to block the reconciliation until the gate is opened.

For example, the following manifest represents a `GitRepository` object that is controlled via a gate:

```yaml
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: GitRepository
metadata:
  name: flux-system
  namespace: flux-system
spec:
  gates:
    # all (default): all gates must be open for the reconciliation to go ahead.
    # oneOf: at least one of the gates must be open for the reconciliation to go ahead.
    require: all # <all|oneOf>
    refs:
      - change-freeze
```

A `Gate` can be opened or closed by updating its spec, much like any other resource.

When the gate is open for a `GitRepository` resource, source-controller will reconcile it as normal. When the gate is closed, source-controller will stop reconciling the resource, and no changes that originate from the gated `GitRepository` resource will be applied to the cluster.

If a gate referenced by the source resource is not found, source-controller will raise an error.

#### Multiple Gates

Flux source objects can specify multiple gates. By default, all gates specified must be open for the reconciliation to go ahead. To change this behavior, `spec.gates.require` can be set to `oneOf` instead:

```yaml
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: GitRepository
metadata:
  name: flux-system
  namespace: flux-system
spec:
  gates:
    # all (default): all gates must be open for the reconciliation to go ahead.
    # oneOf: at least one of the gates must be open for the reconciliation to go ahead.
    require: oneOf # <all|oneOf>
    refs:
      - change-freeze # gate that enforces a change freeze time window
      - bypass-signoff # gate that allows other gates to be overridden.
```

When `oneOf` is used, a single open gate is enough for reconciliations to proceed.

#### New Flux CLI commands

The Flux CLI will be extended to include commands that manage the state of gates.

Given the following gate that exists in the cluster:

```yaml
apiVersion: gating.toolkit.fluxcd.io/v1alpha1
kind: Gate
metadata:
  name: no-deploy-fridays
  namespace: flux-system
spec:
  closed: true
```

The following command will open the gate by setting the `spec.closed` key to `false`:

`flux open gate no-deploy-fridays --namespace flux-system`

The following command will close the gate by setting the `spec.closed` key to `true`:

`flux close gate no-deploy-fridays --namespace flux-system`

### User Stories

#### Story 1

> As a member of the SRE team, I want to ensure no deployments happen on Fridays.

Define a gate and reference it from a `GitRepository`:

```yaml
apiVersion: gating.toolkit.fluxcd.io/v1alpha1
kind: Gate
metadata:
  name: no-deploy-fridays
  namespace: flux-system
spec:
  closed: true
```

```yaml
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: GitRepository
metadata:
  name: flux-system
  namespace: flux-system
spec:
  gates:
    refs:
      - no-deploy-fridays # gate that opens/closes based on a cron schedule
```

Once the above objects are reconciled, no further changes that originate from the gated `GitRepository` object are applied to the cluster until the gate opens.

A pair of Kubernetes CronJob resources can be used to open and close the gate on the desired schedule by setting the `spec.closed` key accordingly on the gate object.
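For such jobs to modify a gate from inside the cluster, their pods need RBAC permission on `Gate` objects, which a namespace's default service account would not normally have. The following is a minimal sketch and not part of this proposal: the `gate-operator` names are hypothetical, the `gates` resource plural is assumed from the proposed `Gate` kind, and the exact verbs required depend on how the CLI commands end up being implemented.

```yaml
# Hypothetical service account for jobs that open/close gates.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: gate-operator
  namespace: flux-system
---
# Grants access to Gate objects in the flux-system namespace only.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: gate-operator
  namespace: flux-system
rules:
  - apiGroups: ["gating.toolkit.fluxcd.io"]
    resources: ["gates"] # assumed plural of the proposed Gate kind
    verbs: ["get", "list", "patch", "update"] # assumed; depends on the CLI implementation
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: gate-operator
  namespace: flux-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: gate-operator
subjects:
  - kind: ServiceAccount
    name: gate-operator
    namespace: flux-system
```

A CronJob would opt in by setting `serviceAccountName: gate-operator` in its pod template spec.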
The following CronJob resources demonstrate the scheduled approach:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: close-gate
  namespace: flux-system
spec:
  schedule: "0 0 * * 5" # At 00:00 every Friday
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: flux
              image: ghcr.io/fluxcd/flux-cli:v0.41.2
              imagePullPolicy: IfNotPresent
              # A container command must be a list of strings, not a single string.
              command: ["flux", "close", "gate", "-n", "flux-system", "no-deploy-fridays"]
          restartPolicy: OnFailure
```

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: open-gate
  namespace: flux-system
spec:
  schedule: "0 0 * * 1" # At 00:00 every Monday
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: flux
              image: ghcr.io/fluxcd/flux-cli:v0.41.2
              imagePullPolicy: IfNotPresent
              # A container command must be a list of strings, not a single string.
              command: ["flux", "open", "gate", "-n", "flux-system", "no-deploy-fridays"]
          restartPolicy: OnFailure
```

### Alternatives

#### Building a dedicated gating controller as part of Flux

Implementing a new gating controller that manages `Gate` objects could be an option for certain scenarios such as time-based gates. However, gating requirements are expected to vary across Flux users, and building a controller that can accommodate all possible gating scenarios is not a simple task.

Note that this proposal does not prevent Flux users from implementing their own gating logic in the form of a custom controller that uses the `Gate` object as a building block/abstraction.

#### Users to implement gating outside of Flux

##### Before Flux source

Users could implement their own gating mechanisms as part of their development processes, ensuring that their custom rules are applied before the changes reach their Flux sources (i.e. the target Git repository). For example, if deployments are not allowed on Fridays, no PRs would be merged on those days.

The disadvantage is that some source types may not provide easy ways for users to enforce such rules. When using different source types (e.g. Git, OCI, Helm), multiple implementations may be required.
##### CronJobs and Flux Suspend

Users can implement a gating mechanism within Kubernetes by leveraging CronJobs and using the built-in suspend feature in Flux that allows for a Flux object to stop being reconciled until it is resumed.

This alternative does not scale well when considering hundreds of Flux objects.

## Design Details

A new `Gate` API object will be defined with its `.spec` consisting of a simple boolean field `closed`.

A new `gates` object will be added to the `.spec` of all source objects and will be optional. It will include a `refs` field which lists all `Gate` objects that the source object will depend upon. It will also include an optional `require` field which will be used to indicate whether all gates (AND semantics) or at least one of the gates (OR semantics) listed in `refs` need to be open, in order for new artifacts to be available in the cluster. The default behaviour if the `require` field is not specified is that all the gates need to be open.

As part of its reconciliation loop, the source-controller will check whether the source object references any `Gate` objects:

- If no `Gate` objects are being referenced, then source-controller will proceed with reconciling the object as normal, maintaining the current behaviour.
- If one or more `Gate` objects are defined and the `require` field is not set or set to `all`, then source-controller will inspect all gate objects and will only make the most recent artifacts of the source object available when **all** gates are open. Otherwise, it will inspect all gates again after the next interval (defined in the source object) or when the source object changes. In both cases, a new condition will be added to the source object status indicating the gates inspected and their status at inspection time.
- If one or more `Gate` objects are defined and the `require` field is set to `oneOf`, then source-controller will inspect the gate objects and, if it finds **at least one** gate open, will make the most recent artifacts of the source object available in the cluster. Otherwise, it will inspect them again after the next interval (defined in the source object) or when the source object changes. In both cases, a new condition will be added to the source object status indicating the gates inspected and their status at inspection time.

It is worth pointing out that the action of closing the gate will be reflected in the cluster instantly. However, because Kubernetes controllers execute their logic at certain intervals, the *side effects* of closing a gate that is referenced by a source object will only become visible at the next reconciliation loop of that source object.

## Implementation History