---
author: timflannagan
title: Platform Operators Sourcing Catalog Content
---
# Platform Operators Content Sourcing
- Owner: Tim Flannagan
- Co-authors: Dan Sover, Tyler Slaton
[^1]: <https://issues.redhat.com/browse/OLM-2513>
[^2]: <https://github.com/operator-framework/rukpak/blob/c6a0ec82aa85cb899e22e66ff9b81144ae655384/internal/provisioner/plain/controllers/bundle_controller.go#L312>
## Summary
As we continue iterating on the platform operators (PO) deliverable, the question of how we handle `plain+v0` and `registry+v1` bundles in the same world needs to be addressed. So far, the discussion around POs has evolved across internal threads and meetings about how to manage both formats. The intent of this document is to aggregate those thoughts, discuss trade-offs, and ultimately come to a consensus on the best path forward.
## Motivation
[Rukpak](https://github.com/operator-framework/rukpak) has made great progress in recent sprints towards becoming productizable. During the development process, the team focused on building out the plain provisioner, which is responsible for managing Bundles in the `plain+v0` bundle format. This bundle format contains a series of arbitrary Kubernetes manifests. As the project matures, the question of how to best handle the existing ecosystem of OLM catalog content has emerged.
Supporting the legacy OLM bundle format is extremely important for adoption of our product, as it allows bundle authors to continue using their existing format without performing any manual migration steps in the short term. This legacy bundle format may make sense for legacy OLM, but OLM's ClusterServiceVersion (CSV) resource has been a consistent pain point for bundle and operator authors.
Supporting this existing format out-of-the-gate may not be attractive for existing or new users that simply want to apply Kubernetes manifests in a structured way without the complexity burden that CSVs introduce. As a result, there is likely a world where both the `plain+v0` and `registry+v1` bundle formats are present in catalog content. The goal of this document is to determine whether this is the best path forward and, if so, how we should go about navigating that path.
## Goal
Outline and explore several alternative designs for sourcing catalog content for the platform operators (PO) initiative [^1] that build on top of OLM's new rukpak component.
## Prior Work
- [RukPak's plain provisioner implementation](https://github.com/operator-framework/rukpak/blob/main/internal/provisioner/plain/README.md)
- [RukPak's plain bundle format](https://github.com/operator-framework/rukpak/blob/main/docs/plain-bundle-spec.md)
- [RukPak's BundleInstance API enables declarative updates for Bundles](https://github.com/operator-framework/rukpak/pull/293)
- [Platform Operators POC that interacts with OLM's CatalogSource API](https://github.com/timflannagan/platform-operators)
- [Support guarantees around introducing v1alpha1 APIs into the core OCP payload](https://github.com/openshift/openshift-docs/pull/41018#issuecomment-1027327520)
## Requirements
- Support installing POs in day 0 operations
- Support upgrading a PO when the cluster upgrades
- Support a PO impacting cluster upgradeability
- Restrict PO constraints to maxOCPVersion and minKubeVersion
- Restrict POs from expressing dependencies
## Open Questions
- Should installing and removing POs be supported in day 2 operations?
- What integrations need to be present for clusters that run both legacy OLM and the PO stack(s)?
- Can a layered product be installed by both OLM and this PO mechanism?
- What are the requirements for the KCP use case?
- Does there need to be a console integration with platform operators when console is present on a cluster?
- Can we package legacy OLM itself as a PO?
  - Use case: resource-constrained environments, e.g. MicroShift, edge
- The local UX around `kubectl get co` vs. `kubectl get po`
## Alternatives
### Converting registry+v1 bundles at runtime
Introduce a registry+v1 Bundle provisioner that's responsible for converting registry+v1 bundle contents to a decomposed set of plain Kubernetes manifests.
The BundleInstance API supports sourcing and unpacking Bundle contents using a different provisioner implementation than the provisioner that's responsible for applying those Bundle contents.
Due to RukPak's descoped tenancy model, where Bundle content can only be applied once per cluster, this alternative only supports registry+v1 bundles that specify the AllNamespaces install mode. See below for further restrictions on which registry+v1 bundles are supported.
<details>
<summary>Implementation details</summary>
The following example showcases a BundleInstance resource that references different, unique provisioner IDs:
> Note: Both of the controllers in rukpak's plain provisioner implementation already filter out Bundles that don't match the "core.rukpak.io/plain" unique provisioner ID.
```yaml
apiVersion: core.rukpak.io/v1alpha1
kind: BundleInstance
metadata:
  name: cert-manager
spec:
  provisionerClassName: core.rukpak.io/plain
  template:
    spec:
      provisionerClassName: core.rukpak.io/converter
      source:
        type: image
        image:
          ref: registry.redhat.io/redhat/redhat-operator-index:v4.12
```
The converter provisioner implementation would then be responsible for filtering out any Bundle resources that don't specify the `core.rukpak.io/converter` unique provisioner ID.
This provisioner implementation can then re-use most of the existing plain provisioner Bundle controller code, which creates and manages a Pod that's responsible for sourcing Bundle contents and running the "unpacker" component's container image.
> Note: The unpacker component returns a compressed representation of the sourced Bundle contents and surfaces those contents through the Pod's log sub-resource.
When the unpack Pod reports a completed phase [^2], the converter can parse the log sub-resource for the sourced Bundle contents and manipulate the registry+v1 bundle manifests using the existing [internal/convert][convert] rukpak library. That library translates a CSV resource into core Kubernetes resources.
The controllers within the plain provisioner implementation share an in-memory cache of unpacked Bundle contents that are stored as compressed tar.gz files. The converter provisioner can take the tar.gz file read from the unpack Pod's log sub-resource, convert those resources into a decomposed set of plain Kubernetes manifests, and then create and store a new tar archive in the shared in-memory cache. From the perspective of the core.rukpak.io/plain BundleInstance controller, it's processing a set of arbitrary Kubernetes manifests in the plain bundle format.
> Note: The conversion detailed above may require re-archiving a tar.gz file twice, since tar doesn't support in-place updates and appending files to an existing archive doesn't work when the archive is compressed. Working around this behavior may require interface changes.
</details>
---
Advantages:
- No immediate migration is needed for (platform) operator authors or third party vendors
- No major infrastructure (e.g. "pipelines") changes are required
- Hook into OLM's existing ecosystem of catalog content
- May meet KCP's requirements around managing registry+v1 catalog content
- Alleviates current (legacy) OLM issues
- Improved visibility into unpacked Bundle contents
- Deletion of a BundleInstance deletes all associated resources
---
Disadvantages:
- No support for non-AllNamespaces install modes due to the conflicting tenancy models between OLM and rukpak
- No support for declaring APIService or webhook resource definitions due to increased complexity of the design
- Decreased UX with multiple entrypoints for installing operator content on an OCP cluster
  - This is likely an issue for any of the outlined alternative approaches
- No first-class discovery API within the OLM stack that communicates which operators and their APIs are already present on the cluster
- Ecosystem remains stuck on a complex, opaque bundle format that is strongly coupled to legacy OLM
- Users may be hesitant to use this if the conversion is not handled in a way that feels intuitive
- Requires changes to the current plain provisioner implementation
  - The storage interface only supports fs.FS parameters, [which isn't writable](https://github.com/golang/go/issues/45757)
  - The tar format doesn't support in-place updates to existing archives
  - The plain provisioner uses an in-memory, ephemeral cache of sourced Bundle contents in the form of tar.gz files
  - ~~Configuring a PVC resource to persist cache contents across multiple provisioner Pod(s) leads to a difficult UX for bare metal clusters~~
  - ~~The converter (naming TBD) provisioner would need to live in the same Pod as the plain provisioner container~~
---
Open Questions:
- Does the conversion logic respect the CSV's `operatorframework.io/suggested-namespace` annotation, or does it hardcode the installation namespace in the form of `<package-name>-system`? (See the annotation sketch below.)
- How to handle OLM's namespace-scoped OperatorConditions API?
- Are we breaking console's integration with the CSV's `alm-examples` annotation field?
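For context, both of the CSV-related questions above refer to metadata that registry+v1 bundles carry today. A minimal sketch, with illustrative values only:
```yaml
apiVersion: operators.coreos.com/v1alpha1
kind: ClusterServiceVersion
metadata:
  name: cert-manager.v1.8.0
  annotations:
    # Namespace hint that the conversion logic would either honor or ignore.
    operatorframework.io/suggested-namespace: cert-manager
    # Example custom resources that the OpenShift console renders today; a
    # converted plain bundle has no obvious place to carry this metadata.
    alm-examples: |-
      [{"apiVersion": "cert-manager.io/v1", "kind": "Certificate", "metadata": {"name": "example"}}]
```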
### Support plain bundle catalog content only
Utilize the rukpak plain provisioner implementation, which only supports reconciling Bundle content that's in the [plain bundle format][plain].
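For illustration, a BundleInstance in this model references the plain provisioner for both the instance and the templated Bundle, mirroring the structure of the converter example above; the bundle image reference below is a placeholder:
```yaml
apiVersion: core.rukpak.io/v1alpha1
kind: BundleInstance
metadata:
  name: my-platform-operator
spec:
  provisionerClassName: core.rukpak.io/plain
  template:
    spec:
      provisionerClassName: core.rukpak.io/plain
      source:
        type: image
        image:
          # Placeholder image containing plain+v0 manifests.
          ref: quay.io/example/my-platform-operator-bundle:v0.1.0
```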
---
Advantages:
- Keep scope creep minimal by focusing on the mechanism and behaviors that will act as a baseline for installing a small set of layered product operators in 4.12
- The registry+v1 bundle format may have a limited shelf life as the OLM team continues to make progress on the set of components outlined in OLM's v1 vision
- We don't have to deal with OLM's existing tenancy model
- Minimal migration for CVO style manifests
---
Disadvantages:
- The plain bundle format is still an alpha level concept
- The plain bundle format may need to be updated in the future to support declaring dependencies
- Infrastructure components need to be able to work with rukpak's plain bundle format
- Candidate POs need to perform a migration (either through tooling or manually)
- Other use cases, e.g. KCP, may require a registry+v1 provisioner
- Platform Operator candidates may need to maintain multiple bundle formats during the 4.12+ OCP releases
- File-based catalogs don't have a first-class plain bundle schema
- Legacy OLM only supports registry+v1 bundle formats
---
Open Questions:
- How does a Bundle communicate maxOCPVersion or minKubeVersion constraints? (See the CSV sketch after this list.)
- How do higher level components communicate PO status?
  - Ensure that every PO has a ClusterOperator resource?
  - Aggregate all PO ClusterOperator resources into a single condition?
- What's the entrypoint for catalog content?
  - The CatalogSource custom resource?
  - File-based catalogs that are served statically without the existing gRPC API?
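For comparison, registry+v1 bundles communicate these constraints through the CSV today; a plain bundle currently has no equivalent field. The values below are illustrative:
```yaml
apiVersion: operators.coreos.com/v1alpha1
kind: ClusterServiceVersion
metadata:
  annotations:
    # Blocks cluster upgrades past the stated OpenShift minor version.
    olm.properties: '[{"type": "olm.maxOpenShiftVersion", "value": "4.12"}]'
spec:
  # Minimum Kubernetes version the operator supports.
  minKubeVersion: 1.24.0
```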
### Support hybrid catalog content
In this model, catalog sources would contain bundles in both the `registry+v1` and `plain+v0` formats. OLM would continue working with the existing `registry+v1` bundles and ignore bundles in other formats. Meanwhile, plain bundles would be available on the cluster for the plain provisioner to reconcile via the Bundle resource.
This approach is desirable because it allows for a gradual migration away from the existing `registry+v1` format by supporting multiple bundle formats. This comes at the expense of increased complexity, both for catalog maintainers and cluster admins.
Rukpak could also potentially support `registry+v1` in this scenario via a conversion mechanism. There would then be two separate ways of installing a bundle from a CatalogSource.
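To make the hybrid model concrete, the sketch below shows what a file-based catalog carrying both formats might look like. It is purely hypothetical: the image references are placeholders, channel entries and most properties are omitted, and neither a plain bundle schema nor a `format` property exists in FBC today (they are among the open questions further down):
```yaml
---
schema: olm.package
name: cert-manager
defaultChannel: stable
---
# Existing registry+v1 bundle entry; legacy OLM continues to consume these.
schema: olm.bundle
name: cert-manager.v1.8.0
package: cert-manager
image: quay.io/example/cert-manager-bundle:v1.8.0
---
# Hypothetical plain+v0 entry that only the plain (or converter) provisioner
# would act on; the "format" property is one of the options discussed below.
schema: olm.bundle
name: cert-manager.v1.9.0
package: cert-manager
image: quay.io/example/cert-manager-plain-bundle:v1.9.0
properties:
  - type: format
    value: plain+v0
```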
---
Advantages:
- Can support legacy (registry+v1) and next generation (plain+v0) formats at the same time
- Migration to only using plain+v0 becomes clearer
---
Disadvantages:
- Increases the complexity of building a catalog (somewhat alleviated by FBC)
- Existing catalog tooling doesn't support non-registry-v1 bundle formats
- Catalog needs to become aware of what plain+v0 bundles are. New endpoint?
- OLM needs to become aware of plain+v0 bundles and not act on them
- Need to think about how RukPak and OLM avoid stepping on each other's toes
- PlatformOperator and OLM need to be aware of each other for a good UX
---
Open Questions:
- How to map which bundles are compatible with individual provisioner implementations?
- How to handle the case where multiple provisioners can interact with the same bundle format?
- How to expose a bundle's format at the catalog level?
  - Introduce a new plain.bundle FBC schema?
    - Downside: backwards compatibility
    - Advantage: enables provisioners to select which bundle formats are compatible through [matchSchema selection logic][strawman]
  - Add an explicit `format` olm.bundle property?
    - Defaults to the registry+v1 mediatype when the format property is missing
    - Downside: short-term hack
    - Advantage: avoids changes to the gRPC registry APIs
- What runtime changes are needed for OLM to work with hybrid catalog content?
  - Do we need to filter out plain bundles from packages surfaced in the PackageManifest API?
  - Are updates to the gRPC registry APIs required?
  - Are clients responsible for filtering out bundle formats that are incompatible?
- How to handle the conflict between installing a plain bundle and a registry+v1 bundle when both provide a version of the same API?
## Notes (05/23)
- Joe: the cert-manager bundle could be converted given those constraints
- Nick: is there an effort to build pipelines for plain manifests from the ground-up?
- Joe: likely targeting SBO + cert-manager as phase 0 candidates, which are already in the registry+v1 bundle format
- Ben: long term vision is to move current OLM payload components to be installed through the PO mechanism (e.g. console, image registry, monitoring, etc.)
- Ben: concerned with another large migration for bundle formats
- Nick: PM wants to be able to use the existing catalog content for KCP. Needs a migration path from legacy OLM to the new OLM world
  - Conversion constraints in place might be sufficient?
- Kevin: I want my operator to be installed once on a cluster, but only watch a single namespace
- Kevin: explore a couple of different options to make migrations as smooth as possible
- Per: can we restrict a set of constraints for phase 0?
- There's no support for `opm registry add ...` that pipelines can use
- Pipelines are still working towards using the FBC format
- Introduce a new schema type vs. a new `format` or `type` olm.bundle property?
- Nick: clients may only care about properties
- Nick: revisit consuming the gRPC registry API, and instead consume a low-level FBC JSON file
- Joe: realistically we won't have the pipeline in place to use the plain bundle format. Let's just focus on registry+v1 bundles for phase 0.
- Per: what's the downside to a registry+v1 provisioner implementation?
- The platform operator's controller may need to interface with the OperatorConditions API (or another API that exposes operator status) for upgradeability
- Joe: do we need an API to signal upgradeability?
- Joe: cluster admins can signal when POs can upgrade (manual by default vs. automatic for cluster upgrades)
- Nick: pursue android-style approval vs. manual approval
- Next steps:
  - Start bootstrapping deppy in parallel
    - No rush for fleshing out the deppy component in 4.12 given we have a small set of available constraints for POs
  - Start prototyping a registry+v1 converter implementation
    - In-flight PR: https://github.com/operator-framework/rukpak/pull/387
  - Design/POC around android-style approval
  - Revisit the usage of OperatorConditions vs. a first-class upgrade API
[convert]: <https://github.com/operator-framework/rukpak/blob/c6a0ec82aa85cb899e22e66ff9b81144ae655384/internal/convert/registryv1.go>
[plain]: <https://github.com/operator-framework/rukpak/blob/main/docs/plain-bundle-spec.md>
[strawman]: <https://hackmd.io/hppR60SiRGKPcqKQuByvVg?view#ResolveSet-Provisioner-Configuration>