# Support Custom Bundle Content Probes
## Summary
Expand the plain+v0 bundle format and enable users (e.g. operator authors) to define custom content probes through a dedicated metadata file. Ensure the plain provisioner can accommodate this use case, and update the registry+v1 bundle provisioner to dynamically generate probe definitions when it decomposes OLM's CSV resource into a plain bundle format.
## Motivation
The plain provisioner is responsible for ensuring that the unpacked Bundle source contents are persisted to the cluster. In that vein, the provisioner implementation only cares whether a resource can be successfully applied. If an individual manifest is invalid, and the Kubernetes API server rejects that manifest, then the BundleDeployment will be correctly updated to reflect that failed rollout.
Where this behavior starts to break down is when the plain provisioner successfully applies manifests that have runtime issues. For example, if a Deployment resource has been applied and it references an invalid container image, the result is an unavailable container. Despite these runtime issues, the BundleDeployment would reflect a successful installation state.
This has larger implications for higher-level components that wish to build logic on top of rukpak primitives. These components may elect to evaluate Bundle state before making any pivoting decisions. Because the plain+v0 bundle format supports defining arbitrary Kubernetes resources, the plain provisioner implementation cannot monitor the state of all managed resources, and delegating this process to a higher-level system makes it nearly impossible to determine the right set of resources to monitor.
## Goals
- Improve UX for users and higher-level systems that interface with rukpak
- Introduce a mechanism that enables declarative custom probe logic
- Enable users packaging plain Kubernetes manifests to add custom probe logic
- Provide escape hatches for users
## Non-Goals
- Introduce this mechanism as the default behavior
## Open Questions
- Does this need a first-class Bundle probe definition API?
- Does the BundleDeployment (BD) API need to be able to override this mechanism?
## Downsides
- Defining probes through a metadata file may lead to visibility UX concerns
- Future requirements may require a first-class API to handle these outlined scenarios
## Proposal
The Bundle API is the closest layer to the desired bundle source's manifests. Introducing this behavior to higher-level APIs risks misconfigured custom probe values. This proposal recommends expanding the plain+v0 bundle format and introducing a dedicated metadata file. This metadata file enables users to define custom probes for their bundle contents.
When the plain provisioner processes unpacked Bundle contents and a metadata file is present, it is responsible for deserializing that file and evaluating any custom probes it defines. These custom probes specify a set of JSONPath templates that declaratively describe each check.
The plain provisioner implementation will be updated to run those probes after it reports that the unpacked bundle contents were applied successfully. When an individual probe evaluation fails, that state will be reflected in the BundleDeployment's status.
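As a rough sketch of what that evaluation step could look like, the following assumes each probe carries the `name`, `kind`, `path`, and `value` fields used in this proposal's examples, and that paths are simple dotted field accesses. The helper names `resolve_path` and `run_probes` are hypothetical, and a real implementation would likely rely on a full JSONPath library rather than this minimal resolver.

```python
from typing import Any


def resolve_path(obj: dict, path: str) -> Any:
    """Resolve a simple dotted path such as '.status.availableReplicas'.

    Only plain field access is handled; filters, wildcards, and array
    indexing from full JSONPath are out of scope for this sketch.
    Returns None when any segment is missing.
    """
    current: Any = obj
    for segment in path.strip(".").split("."):
        if not isinstance(current, dict) or segment not in current:
            return None
        current = current[segment]
    return current


def run_probes(probes: list, objects: dict) -> list:
    """Evaluate each probe and return a message for every failure.

    `objects` maps (kind, name) to the live resource (an assumption of
    this sketch). Values are compared as strings because the example
    metadata file quotes its expected values (e.g. '1').
    """
    failures = []
    for probe in probes:
        live = objects.get((probe["kind"], probe["name"]))
        actual = resolve_path(live, probe["path"]) if live is not None else None
        if actual is None or str(actual) != probe["value"]:
            failures.append(
                f"{probe['kind']}/{probe['name']}: "
                f"{probe['path']} is {actual!r}, want {probe['value']!r}"
            )
    return failures


if __name__ == "__main__":
    probes = [{
        "name": "prometheus-operator",
        "kind": "deployment",
        "path": ".status.availableReplicas",
        "value": "1",
    }]
    # A Deployment whose container never became available.
    live = {("deployment", "prometheus-operator"): {"status": {"availableReplicas": 0}}}
    for message in run_probes(probes, live):
        print(message)
```

Any returned failure messages are the kind of information the provisioner could surface in the BundleDeployment's status; an empty list would correspond to all probes passing.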
The registry+v1 provisioner has logic that converts OLM's registry+v1 bundle format into a plain bundle format. The ClusterServiceVersion (CSV) resource can be decomposed into a set of core Kubernetes manifests (e.g. RBAC, Deployment, etc.). Given OLM's registry+v1 bundle format supports a known subset of Kubernetes resources, the conversion logic would be able to handle configuring probes for the underlying Deployment resource.
This would help bridge the gap between the current limitations of reconciling arbitrary Kubernetes manifests, and higher-level systems that want to apply registry+v1 bundle contents and have rukpak manage those contents.
The registry+v1 conversion logic may output the following metadata file:
```yaml=
apiVersion: bundles.rukpak.io/v1alpha1
kind: Plain
metadata:
  name: prometheus-operator.v0.0.1
spec:
  health:
  ...
  - name: prometheus-operator
    group: apps
    version: v1
    kind: deployment
    path: '.status.availableReplicas'
    value: '1'
```
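One way the conversion logic might derive such a file from a CSV is sketched below. It assumes the CSV lists its Deployments under `spec.install.spec.deployments`, each with a `name` and a Deployment `spec`. Expecting `.status.availableReplicas` to equal the declared replica count (defaulting to 1) is an assumption made for illustration, and the function names `probes_from_csv` and `plain_metadata` are hypothetical.

```python
def probes_from_csv(csv: dict) -> list:
    """Build one availability probe per Deployment declared in a CSV.

    Assumption of this sketch: a Deployment is healthy once its
    availableReplicas matches spec.replicas (1 when unset).
    """
    deployments = (
        csv.get("spec", {})
        .get("install", {})
        .get("spec", {})
        .get("deployments", [])
    )
    probes = []
    for dep in deployments:
        replicas = dep.get("spec", {}).get("replicas", 1)
        probes.append({
            "name": dep["name"],
            "group": "apps",
            "version": "v1",
            "kind": "deployment",
            "path": ".status.availableReplicas",
            "value": str(replicas),
        })
    return probes


def plain_metadata(csv: dict) -> dict:
    """Wrap the generated probes in the metadata file shape shown above."""
    return {
        "apiVersion": "bundles.rukpak.io/v1alpha1",
        "kind": "Plain",
        "metadata": {"name": csv["metadata"]["name"]},
        "spec": {"health": probes_from_csv(csv)},
    }


if __name__ == "__main__":
    csv = {
        "metadata": {"name": "prometheus-operator.v0.0.1"},
        "spec": {"install": {"strategy": "deployment", "spec": {
            "deployments": [{"name": "prometheus-operator", "spec": {"replicas": 1}}],
        }}},
    }
    print(plain_metadata(csv))
```

Because registry+v1 bundles only support a known subset of resources, a generator like this can stay small; probes for other resource kinds would need their own rules.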
Alternatives may include introducing a new API entirely, and letting users inject that API's resources in their own bundle. Any design in the short term will need to accommodate the registry+v1 controller dynamically converting registry+v1 bundle contents into a plain bundle format.
Those alternatives were ultimately rejected because higher-level components manage BundleDeployment resources themselves. Having those components determine these probe values upfront can be difficult if new spec fields, or a new API entirely, are introduced. This behavior may also be provisioner specific and apply only to the plain provisioner implementation, given the wide range of manifests it supports.
The ProvisionerClass API, which doesn't exist today, would be a candidate resource that provides an escape hatch for configuring this mechanism's behavior. In the case that a probe was misconfigured, an admin may elect to toggle this behavior off entirely. Without that API existing today, the `plain.core.rukpak.io/enable-probes: "false"` BundleDeployment annotation may provide a short term solution. The plain provisioner implementation would be responsible for respecting that annotation key.
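A minimal sketch of how the provisioner might honor that annotation follows. It assumes that only the literal value `"false"` disables probes and that a missing annotation leaves any probes defined in the metadata file active; the function name `probes_enabled` is hypothetical.

```python
ENABLE_PROBES_ANNOTATION = "plain.core.rukpak.io/enable-probes"


def probes_enabled(bundle_deployment: dict) -> bool:
    """Return False only when the annotation is explicitly set to "false".

    Assumption of this sketch: any other value, or no annotation at all,
    leaves probe evaluation on for bundles that define probes.
    """
    annotations = bundle_deployment.get("metadata", {}).get("annotations") or {}
    value = annotations.get(ENABLE_PROBES_ANNOTATION, "true")
    return value.strip().lower() != "false"


if __name__ == "__main__":
    bd = {"metadata": {"annotations": {ENABLE_PROBES_ANNOTATION: "false"}}}
    print(probes_enabled(bd))
```

Treating unknown values as "enabled" keeps a typo in the annotation from silently turning probes off, though the opposite default would also be defensible.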
## Alternatives
### Bundle Probe Definition API
Introduce a BundleProbeDefinition API which defines a series of probe definitions. These probe definitions can use JSONPath templates to declaratively define custom probe definitions:
```yaml=
apiVersion: core.rukpak.io/v1alpha1
kind: BundleProbeDefinition
metadata:
  name: prometheus-operator.v0.0.1
spec:
  health:
  - name: thanosrulers.monitoring.coreos.com
    group: apiextensions.k8s.io
    version: v1
    kind: customresourcedefinition
    path: '.status.acceptedNames.singular'
    value: "thanosruler"
  - name: prometheus-operator
    group: apps
    version: v1
    kind: deployment
    path: '.status.availableReplicas'
    value: '1'
```
In order to use this custom probe definition resource, the BundleDeployment API can be updated to reference a BundleProbeDefinition resource:
```yaml=
apiVersion: core.rukpak.io/v1alpha1
kind: BundleDeployment
metadata:
  name: resolveset-asdf1234
spec:
  provisionerClassName: core.rukpak.io/plain
  probes:
    health:
      ref: prometheus-operator.v0.0.1
  template:
    spec:
      provisionerClassName: core.rukpak.io/registry
      source:
        type: image
        image:
          ref: ...
```
This alternative was ultimately rejected due to UX/visibility concerns, and how higher-level components would accommodate this design in their implementation.
### Bundle Deployment Status API
Introduce a BundleDeploymentState (naming TBD) API, which helps decouple performing and evaluating probe definitions from the plain provisioner implementation. A controller would watch for BundleDeployment resources and propagate the underlying state for unpacking and applying Bundle contents, in addition to running the defined probes. Clients may elect to generate BundleDeployment resources, but watch the BundleDeploymentState when proxying information back to higher-level systems.
This alternative was ultimately rejected, but may be revisited in the future as a longer-term solution. Because it decouples external information from the plain bundle format, it would keep the plain provisioner implementation tightly scoped. This API may still run into the same configurability issues over time, but that depends on how we enable Bundle authors and clients to interface with rukpak APIs and behaviors, and whether it's possible to define these probe checks at the lowest layer.